author Greg KH <gregkh@suse.de> 2005-09-12 15:45:04 -0400
committer Greg Kroah-Hartman <gregkh@suse.de> 2005-09-12 15:45:04 -0400
commit d58dde0f552a5c5c4485b962d8b6e9dd54fefb30
tree d9a7e35eb88fea6265d5aadcc3d4ed39122b052a /Documentation
parent 877599fdef5ea4a7dd1956e22fa9d6923add97f8
parent 2ade81473636b33aaac64495f89a7dc572c529f0
Merge ../torvalds-2.6/
Diffstat (limited to 'Documentation')
-rw-r--r-- Documentation/00-INDEX | 4
-rw-r--r-- Documentation/CodingStyle | 3
-rw-r--r-- Documentation/DMA-API.txt | 2
-rw-r--r-- Documentation/DMA-ISA-LPC.txt | 151
-rw-r--r-- Documentation/DocBook/journal-api.tmpl | 4
-rw-r--r-- Documentation/DocBook/kernel-hacking.tmpl | 310
-rw-r--r-- Documentation/DocBook/usb.tmpl | 2
-rw-r--r-- Documentation/MSI-HOWTO.txt | 2
-rw-r--r-- Documentation/RCU/RTFP.txt | 36
-rw-r--r-- Documentation/RCU/UP.txt | 79
-rw-r--r-- Documentation/RCU/checklist.txt | 23
-rw-r--r-- Documentation/RCU/rcu.txt | 48
-rw-r--r-- Documentation/RCU/rcuref.txt | 74
-rw-r--r-- Documentation/RCU/whatisRCU.txt | 902
-rw-r--r-- Documentation/applying-patches.txt | 439
-rw-r--r-- Documentation/cpu-freq/cpufreq-stats.txt | 2
-rw-r--r-- Documentation/cpusets.txt | 2
-rw-r--r-- Documentation/crypto/descore-readme.txt | 2
-rw-r--r-- Documentation/dvb/bt8xx.txt | 89
-rw-r--r-- Documentation/dvb/ci.txt | 9
-rw-r--r-- Documentation/fb/cyblafb/bugs | 14
-rw-r--r-- Documentation/fb/cyblafb/credits | 7
-rw-r--r-- Documentation/fb/cyblafb/documentation | 17
-rw-r--r-- Documentation/fb/cyblafb/fb.modes | 155
-rw-r--r-- Documentation/fb/cyblafb/performance | 80
-rw-r--r-- Documentation/fb/cyblafb/todo | 32
-rw-r--r-- Documentation/fb/cyblafb/usage | 206
-rw-r--r-- Documentation/fb/cyblafb/whycyblafb | 85
-rw-r--r-- Documentation/fb/intel810.txt | 56
-rw-r--r-- Documentation/fb/modedb.txt | 73
-rw-r--r-- Documentation/feature-removal-schedule.txt | 8
-rw-r--r-- Documentation/filesystems/files.txt | 123
-rw-r--r-- Documentation/filesystems/fuse.txt | 315
-rw-r--r-- Documentation/filesystems/proc.txt | 42
-rw-r--r-- Documentation/filesystems/v9fs.txt | 95
-rw-r--r-- Documentation/filesystems/vfs.txt | 435
-rw-r--r-- Documentation/ioctl/cdrom.txt | 2
-rw-r--r-- Documentation/kbuild/makefiles.txt | 14
-rw-r--r-- Documentation/kdump/kdump.txt | 16
-rw-r--r-- Documentation/kernel-parameters.txt | 10
-rw-r--r-- Documentation/mono.txt | 2
-rw-r--r-- Documentation/networking/bonding.txt | 4
-rw-r--r-- Documentation/networking/wan-router.txt | 4
-rw-r--r-- Documentation/pci.txt | 2
-rw-r--r-- Documentation/powerpc/eeh-pci-error-recovery.txt | 2
-rw-r--r-- Documentation/s390/s390dbf.txt | 2
-rw-r--r-- Documentation/scsi/ibmmca.txt | 2
-rw-r--r-- Documentation/sound/alsa/ALSA-Configuration.txt | 2
-rw-r--r-- Documentation/sparse.txt | 2
-rw-r--r-- Documentation/sysrq.txt | 2
-rw-r--r-- Documentation/uml/UserModeLinux-HOWTO.txt | 2
-rw-r--r-- Documentation/usb/gadget_serial.txt | 2
-rw-r--r-- Documentation/video4linux/CARDLIST.bttv | 4
-rw-r--r-- Documentation/video4linux/CARDLIST.saa7134 | 3
-rw-r--r-- Documentation/video4linux/CARDLIST.tuner | 1
-rw-r--r-- Documentation/video4linux/Zoran | 2
-rw-r--r-- Documentation/x86_64/boot-options.txt | 5
57 files changed, 3580 insertions, 431 deletions
diff --git a/Documentation/00-INDEX b/Documentation/00-INDEX
index f28a24e0279b..433cf5e9ae04 100644
--- a/Documentation/00-INDEX
+++ b/Documentation/00-INDEX
@@ -46,6 +46,8 @@ SubmittingPatches
 	- procedure to get a source patch included into the kernel tree.
 VGA-softcursor.txt
 	- how to change your VGA cursor from a blinking underscore.
+applying-patches.txt
+	- description of various trees and how to apply their patches.
 arm/
 	- directory with info about Linux on the ARM architecture.
 basic_profiling.txt
@@ -275,7 +277,7 @@ tty.txt
 unicode.txt
 	- info on the Unicode character/font mapping used in Linux.
 uml/
-	- directory with infomation about User Mode Linux.
+	- directory with information about User Mode Linux.
 usb/
 	- directory with info regarding the Universal Serial Bus.
 video4linux/
diff --git a/Documentation/CodingStyle b/Documentation/CodingStyle
index f25b3953f513..22e5f9036f3c 100644
--- a/Documentation/CodingStyle
+++ b/Documentation/CodingStyle
@@ -236,6 +236,9 @@ ugly), but try to avoid excess. Instead, put the comments at the head
 of the function, telling people what it does, and possibly WHY it does
 it.
 
+When commenting the kernel API functions, please use the kerneldoc format.
+See the files Documentation/kernel-doc-nano-HOWTO.txt and scripts/kernel-doc
+for details.
 
 		Chapter 8: You've made a mess of it
 
diff --git a/Documentation/DMA-API.txt b/Documentation/DMA-API.txt
index 6ee3cd6134df..1af0f2d50220 100644
--- a/Documentation/DMA-API.txt
+++ b/Documentation/DMA-API.txt
@@ -121,7 +121,7 @@ pool's device.
 			dma_addr_t addr);
 
 This puts memory back into the pool. The pool is what was passed to
-the the pool allocation routine; the cpu and dma addresses are what
+the pool allocation routine; the cpu and dma addresses are what
 were returned when that routine allocated the memory being freed.
 
 
diff --git a/Documentation/DMA-ISA-LPC.txt b/Documentation/DMA-ISA-LPC.txt
new file mode 100644
index 000000000000..705f6be92bdb
--- /dev/null
+++ b/Documentation/DMA-ISA-LPC.txt
@@ -0,0 +1,151 @@
+	DMA with ISA and LPC devices
+	============================
+
+	Pierre Ossman <drzeus@drzeus.cx>
+
+This document describes how to do DMA transfers using the old ISA DMA
+controller. Even though ISA is more or less dead today the LPC bus
+uses the same DMA system so it will be around for quite some time.
+
+Part I - Headers and dependencies
+---------------------------------
+
+To do ISA style DMA you need to include two headers:
+
+#include <linux/dma-mapping.h>
+#include <asm/dma.h>
+
+The first is the generic DMA API used to convert virtual addresses to
+physical addresses (see Documentation/DMA-API.txt for details).
+
+The second contains the routines specific to ISA DMA transfers. Since
+this is not present on all platforms make sure you construct your
+Kconfig to be dependent on ISA_DMA_API (not ISA) so that nobody tries
+to build your driver on unsupported platforms.
+
+Part II - Buffer allocation
+---------------------------
+
+The ISA DMA controller has some very strict requirements on which
+memory it can access so extra care must be taken when allocating
+buffers.
+
+(You usually need a special buffer for DMA transfers instead of
+transferring directly to and from your normal data structures.)
+
+The DMA-able address space is the lowest 16 MB of _physical_ memory.
+Also the transfer block may not cross page boundaries (which are 64
+or 128 KiB depending on which channel you use).
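
The two constraints above can be illustrated with a small userspace sketch. The helpers isa_dma_page_size() and isa_dma_buffer_ok() are hypothetical names, not kernel functions, and the mapping of 64 KiB pages to channels 0-3 and 128 KiB pages to channels 5-7 is an assumption based on the usual 8-bit/16-bit channel split:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helpers for illustration only; limits taken from the text. */
#define ISA_DMA_LIMIT (16u * 1024 * 1024)	/* lowest 16 MB of physical memory */

/* Assumed DMA "page" size: 64 KiB on channels 0-3, 128 KiB on channels 5-7. */
static uint32_t isa_dma_page_size(unsigned int channel)
{
	return channel <= 3 ? 64u * 1024 : 128u * 1024;
}

/* True if [phys, phys + len) lies below 16 MB and inside one DMA page. */
static bool isa_dma_buffer_ok(uint32_t phys, size_t len, unsigned int channel)
{
	uint32_t page = isa_dma_page_size(channel);
	uint32_t last;

	if (len == 0 || len > page)
		return false;
	if (phys >= ISA_DMA_LIMIT || len > ISA_DMA_LIMIT - phys)
		return false;

	last = phys + (uint32_t)len - 1;
	return (phys / page) == (last / page);
}
```

A real driver would not normally need such a check for buffers obtained with GFP_DMA as described next; the sketch only makes the address arithmetic concrete.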
+
+In order to allocate a piece of memory that satisfies all these
+requirements you pass the flag GFP_DMA to kmalloc.
+
+Unfortunately the memory available for ISA DMA is scarce so unless you
+allocate the memory during boot-up it's a good idea to also pass
+__GFP_REPEAT and __GFP_NOWARN to make the allocator try a bit harder.
+
+(This scarcity also means that you should allocate the buffer as
+early as possible and not release it until the driver is unloaded.)
+
+Part III - Address translation
+------------------------------
+
+To translate the virtual address to a physical address use the normal
+DMA API. Do _not_ use isa_virt_to_phys() even though it does the same
+thing. The reason for this is that the function isa_virt_to_phys()
+will require a Kconfig dependency to ISA, not just ISA_DMA_API which
+is really all you need. Remember that even though the DMA controller
+has its origins in ISA it is used elsewhere.
+
+Note: x86_64 had a broken DMA API when it came to ISA but has since
+been fixed. If your arch has problems then fix the DMA API instead of
+reverting to the ISA functions.
+
+Part IV - Channels
+------------------
+
+A normal ISA DMA controller has 8 channels. The lower four are for
+8-bit transfers and the upper four are for 16-bit transfers.
+
+(Actually the DMA controller is really two separate controllers where
+channel 4 is used to give DMA access for the second controller (0-3).
+This means that of the four 16-bit channels only three are usable.)
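
The channel layout described above can be summarised in a few lines of C. This is only a sketch: isa_dma_channel_width() is a hypothetical helper, not part of <asm/dma.h>, and it simply encodes the numbering given in the text (0-3 are 8-bit, 4 is the cascade channel, 5-7 are 16-bit):

```c
/*
 * Hypothetical helper: transfer width in bits for an ISA DMA channel,
 * or 0 if the channel cannot be used by a driver (the cascade channel 4,
 * or a number outside 0-7).
 */
static int isa_dma_channel_width(unsigned int channel)
{
	if (channel <= 3)
		return 8;	/* first controller: 8-bit transfers */
	if (channel >= 5 && channel <= 7)
		return 16;	/* second controller: 16-bit transfers */
	return 0;		/* channel 4 cascades the first controller */
}
```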
+
+You allocate these in a similar fashion as all basic resources:
+
+extern int request_dma(unsigned int dmanr, const char * device_id);
+extern void free_dma(unsigned int dmanr);
+
+The ability to use 16-bit or 8-bit transfers is _not_ up to you as a
+driver author but depends on what the hardware supports. Check your
+specs or test different channels.
+
+Part V - Transfer data
+----------------------
+
+Now for the good stuff, the actual DMA transfer. :)
+
+Before you use any ISA DMA routines you need to claim the DMA lock
+using claim_dma_lock(). The reason is that some DMA operations are
+not atomic so only one driver may fiddle with the registers at a
+time.
+
+The first time you use the DMA controller you should call
+clear_dma_ff(). This clears an internal register in the DMA
+controller that is used for the non-atomic operations. As long as you
+(and everyone else) uses the locking functions then you only need to
+reset this once.
+
+Next, you tell the controller in which direction you intend to do the
+transfer using set_dma_mode(). Currently you have the options
+DMA_MODE_READ and DMA_MODE_WRITE.
+
+Set the address from where the transfer should start (this needs to
+be 16-bit aligned for 16-bit transfers) and how many bytes to
+transfer. Note that it's _bytes_. The DMA routines will do all the
+required translation to values that the DMA controller understands.
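
To give a feel for the translation mentioned above, here is a userspace model of how a byte address and byte count might be turned into controller register values. It is a sketch under stated assumptions, loosely following how the x86 helpers are understood to behave (counts are programmed as "transfers minus one", and the 16-bit channels work in 16-bit words); struct isa_dma_regs and isa_dma_program() are hypothetical names, and other architectures may differ:

```c
#include <stdint.h>

/* Hypothetical model of the per-channel registers, for illustration. */
struct isa_dma_regs {
	uint8_t  page;	/* page register: physical address bits 16-23 */
	uint16_t addr;	/* 16-bit address register */
	uint16_t count;	/* 16-bit count register (transfers minus one) */
};

/*
 * Assumed translation from bytes to register values.
 * num_bytes must be non-zero; for channels 5-7 phys and num_bytes
 * are expected to be even.
 */
static struct isa_dma_regs isa_dma_program(unsigned int channel,
					   uint32_t phys, uint32_t num_bytes)
{
	struct isa_dma_regs r;

	if (channel <= 3) {
		/* 8-bit channels count bytes within a 64 KiB page. */
		r.page  = (phys >> 16) & 0xff;
		r.addr  = phys & 0xffff;
		r.count = (num_bytes - 1) & 0xffff;
	} else {
		/* 16-bit channels count words within a 128 KiB page. */
		r.page  = (phys >> 16) & 0xfe;
		r.addr  = (phys >> 1) & 0xffff;
		r.count = ((num_bytes - 1) >> 1) & 0xffff;
	}
	return r;
}
```

The point of the sketch is only that the driver keeps thinking in bytes; set_dma_addr() and set_dma_count() are the real interfaces and should always be used instead of touching the registers directly.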
+
+The final step is enabling the DMA channel and releasing the DMA
+lock.
+
+Once the DMA transfer is finished (or timed out) you should disable
+the channel again. You should also check get_dma_residue() to make
+sure that all data has been transferred.
+
+Example:
+
+unsigned long flags;
+int residue;
+
+flags = claim_dma_lock();
+
+clear_dma_ff(channel);
+
+set_dma_mode(channel, DMA_MODE_WRITE);
+set_dma_addr(channel, phys_addr);
+set_dma_count(channel, num_bytes);
+
+enable_dma(channel);
+
+release_dma_lock(flags);
+
+while (!device_done());
+
+flags = claim_dma_lock();
+
+disable_dma(channel);
+
+residue = get_dma_residue(channel);
+if (residue != 0)
+	printk(KERN_ERR "driver: Incomplete DMA transfer!"
+		" %d bytes left!\n", residue);
+
+release_dma_lock(flags);
+
+Part VI - Suspend/resume
+------------------------
+
+It is the driver's responsibility to make sure that the machine isn't
+suspended while a DMA transfer is in progress. Also, all DMA settings
+are lost when the system suspends so if your driver relies on the DMA
+controller being in a certain state then you have to restore these
+registers upon resume.
diff --git a/Documentation/DocBook/journal-api.tmpl b/Documentation/DocBook/journal-api.tmpl
index 1ef6f43c6d8f..341aaa4ce481 100644
--- a/Documentation/DocBook/journal-api.tmpl
+++ b/Documentation/DocBook/journal-api.tmpl
@@ -116,7 +116,7 @@ filesystem. Almost.
 
 You still need to actually journal your filesystem changes, this
 is done by wrapping them into transactions. Additionally you
-also need to wrap the modification of each of the the buffers
+also need to wrap the modification of each of the buffers
 with calls to the journal layer, so it knows what the modifications
 you are actually making are. To do this use journal_start() which
 returns a transaction handle.
@@ -128,7 +128,7 @@ and its counterpart journal_stop(), which indicates the end of a transaction
 are nestable calls, so you can reenter a transaction if necessary,
 but remember you must call journal_stop() the same number of times as
 journal_start() before the transaction is completed (or more accurately
-leaves the the update phase). Ext3/VFS makes use of this feature to simplify
+leaves the update phase). Ext3/VFS makes use of this feature to simplify
 quota support.
 </para>
 
diff --git a/Documentation/DocBook/kernel-hacking.tmpl b/Documentation/DocBook/kernel-hacking.tmpl
index 49a9ef82d575..6367bba32d22 100644
--- a/Documentation/DocBook/kernel-hacking.tmpl
+++ b/Documentation/DocBook/kernel-hacking.tmpl
@@ -8,8 +8,7 @@
 
 <authorgroup>
 <author>
-<firstname>Paul</firstname>
-<othername>Rusty</othername>
+<firstname>Rusty</firstname>
 <surname>Russell</surname>
 <affiliation>
 <address>
@@ -20,7 +19,7 @@
 </authorgroup>
 
 <copyright>
-<year>2001</year>
+<year>2005</year>
 <holder>Rusty Russell</holder>
 </copyright>
 
@@ -64,7 +63,7 @@
 <chapter id="introduction">
 <title>Introduction</title>
 <para>
-Welcome, gentle reader, to Rusty's Unreliable Guide to Linux
+Welcome, gentle reader, to Rusty's Remarkably Unreliable Guide to Linux
 Kernel Hacking. This document describes the common routines and
 general requirements for kernel code: its goal is to serve as a
 primer for Linux kernel development for experienced C
@@ -96,13 +95,13 @@
 
 <listitem>
 <para>
-not associated with any process, serving a softirq, tasklet or bh;
+not associated with any process, serving a softirq or tasklet;
 </para>
 </listitem>
 
 <listitem>
 <para>
-running in kernel space, associated with a process;
+running in kernel space, associated with a process (user context);
 </para>
 </listitem>
 
@@ -114,11 +113,12 @@
 </itemizedlist>
 
 <para>
-There is a strict ordering between these: other than the last
-category (userspace) each can only be pre-empted by those above.
-For example, while a softirq is running on a CPU, no other
-softirq will pre-empt it, but a hardware interrupt can. However,
-any other CPUs in the system execute independently.
+There is an ordering between these. The bottom two can preempt
+each other, but above that is a strict hierarchy: each can only be
+preempted by the ones above it. For example, while a softirq is
+running on a CPU, no other softirq will preempt it, but a hardware
+interrupt can. However, any other CPUs in the system execute
+independently.
 </para>
 
 <para>
@@ -130,10 +130,10 @@
 <title>User Context</title>
 
 <para>
-User context is when you are coming in from a system call or
-other trap: you can sleep, and you own the CPU (except for
-interrupts) until you call <function>schedule()</function>.
-In other words, user context (unlike userspace) is not pre-emptable.
+User context is when you are coming in from a system call or other
+trap: like userspace, you can be preempted by more important tasks
+and by interrupts. You can sleep, by calling
+<function>schedule()</function>.
 </para>
 
 <note>
@@ -153,7 +153,7 @@
 
 <caution>
 <para>
-Beware that if you have interrupts or bottom halves disabled
+Beware that if you have preemption or softirqs disabled
 (see below), <function>in_interrupt()</function> will return a
 false positive.
 </para>
@@ -168,10 +168,10 @@
 <hardware>keyboard</hardware> are examples of real
 hardware which produce interrupts at any time. The kernel runs
 interrupt handlers, which services the hardware. The kernel
-guarantees that this handler is never re-entered: if another
+guarantees that this handler is never re-entered: if the same
 interrupt arrives, it is queued (or dropped). Because it
 disables interrupts, this handler has to be fast: frequently it
-simply acknowledges the interrupt, marks a `software interrupt'
+simply acknowledges the interrupt, marks a 'software interrupt'
 for execution and exits.
 </para>
 
@@ -188,60 +188,52 @@
 </sect1>
 
 <sect1 id="basics-softirqs">
-<title>Software Interrupt Context: Bottom Halves, Tasklets, softirqs</title>
+<title>Software Interrupt Context: Softirqs and Tasklets</title>
 
 <para>
 Whenever a system call is about to return to userspace, or a
-hardware interrupt handler exits, any `software interrupts'
+hardware interrupt handler exits, any 'software interrupts'
 which are marked pending (usually by hardware interrupts) are
 run (<filename>kernel/softirq.c</filename>).
 </para>
 
 <para>
 Much of the real interrupt handling work is done here. Early in
-the transition to <acronym>SMP</acronym>, there were only `bottom
+the transition to <acronym>SMP</acronym>, there were only 'bottom
 halves' (BHs), which didn't take advantage of multiple CPUs. Shortly
 after we switched from wind-up computers made of match-sticks and snot,
-we abandoned this limitation.
+we abandoned this limitation and switched to 'softirqs'.
 </para>
 
 <para>
 <filename class="headerfile">include/linux/interrupt.h</filename> lists the
-different BH's. No matter how many CPUs you have, no two BHs will run at
-the same time. This made the transition to SMP simpler, but sucks hard for
-scalable performance. A very important bottom half is the timer
-BH (<filename class="headerfile">include/linux/timer.h</filename>): you
-can register to have it call functions for you in a given length of time.
+different softirqs. A very important softirq is the
+timer softirq (<filename
+class="headerfile">include/linux/timer.h</filename>): you can
+register to have it call functions for you in a given length of
+time.
 </para>
 
 <para>
-2.3.43 introduced softirqs, and re-implemented the (now
-deprecated) BHs underneath them. Softirqs are fully-SMP
-versions of BHs: they can run on as many CPUs at once as
-required. This means they need to deal with any races in shared
-data using their own locks. A bitmask is used to keep track of
-which are enabled, so the 32 available softirqs should not be
-used up lightly. (<emphasis>Yes</emphasis>, people will
-notice).
-</para>
-
-<para>
-tasklets (<filename class="headerfile">include/linux/interrupt.h</filename>)
-are like softirqs, except they are dynamically-registrable (meaning you
-can have as many as you want), and they also guarantee that any tasklet
-will only run on one CPU at any time, although different tasklets can
-run simultaneously (unlike different BHs).
+Softirqs are often a pain to deal with, since the same softirq
+will run simultaneously on more than one CPU. For this reason,
+tasklets (<filename
+class="headerfile">include/linux/interrupt.h</filename>) are more
+often used: they are dynamically-registrable (meaning you can have
+as many as you want), and they also guarantee that any tasklet
+will only run on one CPU at any time, although different tasklets
+can run simultaneously.
 </para>
 <caution>
 <para>
-The name `tasklet' is misleading: they have nothing to do with `tasks',
+The name 'tasklet' is misleading: they have nothing to do with 'tasks',
 and probably more to do with some bad vodka Alexey Kuznetsov had at the
 time.
 </para>
 </caution>
 
 <para>
-You can tell you are in a softirq (or bottom half, or tasklet)
+You can tell you are in a softirq (or tasklet)
 using the <function>in_softirq()</function> macro
 (<filename class="headerfile">include/linux/interrupt.h</filename>).
 </para>
@@ -288,11 +280,10 @@
 <term>A rigid stack limit</term>
 <listitem>
 <para>
-The kernel stack is about 6K in 2.2 (for most
-architectures: it's about 14K on the Alpha), and shared
-with interrupts so you can't use it all. Avoid deep
-recursion and huge local arrays on the stack (allocate
-them dynamically instead).
+Depending on configuration options the kernel stack is about 3K to 6K for most 32-bit architectures: it's
+about 14K on most 64-bit archs, and often shared with interrupts
+so you can't use it all. Avoid deep recursion and huge local
+arrays on the stack (allocate them dynamically instead).
 </para>
 </listitem>
 </varlistentry>
@@ -339,7 +330,7 @@ asmlinkage long sys_mycall(int arg)
 
 <para>
 If all your routine does is read or write some parameter, consider
-implementing a <function>sysctl</function> interface instead.
+implementing a <function>sysfs</function> interface instead.
 </para>
 
 <para>
@@ -417,7 +408,10 @@ cond_resched(); /* Will sleep */
 </para>
 
 <para>
-You will eventually lock up your box if you break these rules.
+You should always compile your kernel
+<symbol>CONFIG_DEBUG_SPINLOCK_SLEEP</symbol> on, and it will warn
+you if you break these rules. If you <emphasis>do</emphasis> break
+the rules, you will eventually lock up your box.
 </para>
 
 <para>
@@ -515,8 +509,7 @@ printk(KERN_INFO "my ip: %d.%d.%d.%d\n", NIPQUAD(ipaddress));
 success).
 </para>
 </caution>
-[Yes, this moronic interface makes me cringe. Please submit a
-patch and become my hero --RR.]
+[Yes, this moronic interface makes me cringe. The flamewar comes up every year or so. --RR.]
 </para>
 <para>
 The functions may sleep implicitly. This should never be called
@@ -587,10 +580,11 @@ printk(KERN_INFO "my ip: %d.%d.%d.%d\n", NIPQUAD(ipaddress));
 </variablelist>
 
 <para>
-If you see a <errorname>kmem_grow: Called nonatomically from int
-</errorname> warning message you called a memory allocation function
-from interrupt context without <constant>GFP_ATOMIC</constant>.
-You should really fix that. Run, don't walk.
+If you see a <errorname>sleeping function called from invalid
+context</errorname> warning message, then maybe you called a
+sleeping allocation function from interrupt context without
+<constant>GFP_ATOMIC</constant>. You should really fix that.
+Run, don't walk.
 </para>
 
 <para>
@@ -639,16 +633,16 @@ printk(KERN_INFO "my ip: %d.%d.%d.%d\n", NIPQUAD(ipaddress));
 </sect1>
 
 <sect1 id="routines-udelay">
-<title><function>udelay()</function>/<function>mdelay()</function>
+<title><function>mdelay()</function>/<function>udelay()</function>
 <filename class="headerfile">include/asm/delay.h</filename>
 <filename class="headerfile">include/linux/delay.h</filename>
 </title>
 
 <para>
-The <function>udelay()</function> function can be used for small pauses.
-Do not use large values with <function>udelay()</function> as you risk
+The <function>udelay()</function> and <function>ndelay()</function> functions can be used for small pauses.
+Do not use large values with them as you risk
 overflow - the helper function <function>mdelay()</function> is useful
-here, or even consider <function>schedule_timeout()</function>.
+here, or consider <function>msleep()</function>.
 </para>
 </sect1>
 
@@ -698,8 +692,8 @@ printk(KERN_INFO "my ip: %d.%d.%d.%d\n", NIPQUAD(ipaddress));
 These routines disable soft interrupts on the local CPU, and
 restore them. They are reentrant; if soft interrupts were
 disabled before, they will still be disabled after this pair
-of functions has been called. They prevent softirqs, tasklets
-and bottom halves from running on the current CPU.
+of functions has been called. They prevent softirqs and tasklets
+from running on the current CPU.
 </para>
 </sect1>
 
@@ -708,10 +702,16 @@ printk(KERN_INFO "my ip: %d.%d.%d.%d\n", NIPQUAD(ipaddress));
 <filename class="headerfile">include/asm/smp.h</filename></title>
 
 <para>
-<function>smp_processor_id()</function> returns the current
-processor number, between 0 and <symbol>NR_CPUS</symbol> (the
-maximum number of CPUs supported by Linux, currently 32). These
-values are not necessarily continuous.
+<function>get_cpu()</function> disables preemption (so you won't
+suddenly get moved to another CPU) and returns the current
+processor number, between 0 and <symbol>NR_CPUS</symbol>. Note
+that the CPU numbers are not necessarily continuous. You return
+it again with <function>put_cpu()</function> when you are done.
+</para>
+<para>
+If you know you cannot be preempted by another task (ie. you are
+in interrupt context, or have preemption disabled) you can use
+smp_processor_id().
 </para>
 </sect1>
 
@@ -722,19 +722,14 @@ printk(KERN_INFO "my ip: %d.%d.%d.%d\n", NIPQUAD(ipaddress));
 <para>
 After boot, the kernel frees up a special section; functions
 marked with <type>__init</type> and data structures marked with
-<type>__initdata</type> are dropped after boot is complete (within
-modules this directive is currently ignored). <type>__exit</type>
+<type>__initdata</type> are dropped after boot is complete: similarly
+modules discard this memory after initialization. <type>__exit</type>
 is used to declare a function which is only required on exit: the
 function will be dropped if this file is not compiled as a module.
 See the header file for use. Note that it makes no sense for a function
 marked with <type>__init</type> to be exported to modules with
 <function>EXPORT_SYMBOL()</function> - this will break.
 </para>
-<para>
-Static data structures marked as <type>__initdata</type> must be initialised
-(as opposed to ordinary static data which is zeroed BSS) and cannot be
-<type>const</type>.
-</para>
 
 </sect1>
 
@@ -762,9 +757,8 @@ printk(KERN_INFO "my ip: %d.%d.%d.%d\n", NIPQUAD(ipaddress));
 <para>
 The function can return a negative error number to cause
 module loading to fail (unfortunately, this has no effect if
-the module is compiled into the kernel). For modules, this is
-called in user context, with interrupts enabled, and the
-kernel lock held, so it can sleep.
+the module is compiled into the kernel). This function is
+called in user context with interrupts enabled, so it can sleep.
 </para>
 </sect1>
 
@@ -779,6 +773,34 @@ printk(KERN_INFO "my ip: %d.%d.%d.%d\n", NIPQUAD(ipaddress));
779 reached zero. This function can also sleep, but cannot fail: 773 reached zero. This function can also sleep, but cannot fail:
780 everything must be cleaned up by the time it returns. 774 everything must be cleaned up by the time it returns.
781 </para> 775 </para>
776
777 <para>
778 Note that this macro is optional: if it is not present, your
779 module will not be removable (except for 'rmmod -f').
780 </para>
781 </sect1>
782
783 <sect1 id="routines-module-use-counters">
784 <title> <function>try_module_get()</function>/<function>module_put()</function>
785 <filename class="headerfile">include/linux/module.h</filename></title>
786
787 <para>
788 These manipulate the module usage count, to protect against
789 removal (a module also can't be removed if another module uses one
790 of its exported symbols: see below). Before calling into module
791 code, you should call <function>try_module_get()</function> on
792 that module: if it fails, then the module is being removed and you
793 should act as if it wasn't there. Otherwise, you can safely enter
794 the module, and call <function>module_put()</function> when you're
795 finished.
796 </para>
797
798 <para>
799 Most registerable structures have an
800 <structfield>owner</structfield> field, such as in the
801 <structname>file_operations</structname> structure. Set this field
802 to the macro <symbol>THIS_MODULE</symbol>.
803 </para>
782 </sect1> 804 </sect1>
783 805
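The get/call/put pattern described in the hunk above can be modelled in userspace. This is only a minimal, single-threaded sketch: `struct fake_module`, `fake_try_get()`, `fake_put()`, `fake_remove()` and `call_into()` are hypothetical names invented for illustration, and the real `try_module_get()`/`module_put()` are atomic and operate on `struct module`.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for a module; NOT struct module. */
struct fake_module {
	int  refcount;	/* users currently inside the module */
	bool going;	/* set once removal has started */
};

/* Models try_module_get(): refuse new users once removal has begun. */
bool fake_try_get(struct fake_module *m)
{
	if (m->going)
		return false;
	m->refcount++;
	return true;
}

/* Models module_put(): drop a reference taken by fake_try_get(). */
void fake_put(struct fake_module *m)
{
	assert(m->refcount > 0);
	m->refcount--;
}

/* Removal may only start while nobody holds a reference. */
bool fake_remove(struct fake_module *m)
{
	if (m->refcount != 0)
		return false;
	m->going = true;
	return true;
}

/* Caller pattern from the text: get, call in, put. */
int call_into(struct fake_module *m, int (*fn)(void))
{
	int ret;

	if (!fake_try_get(m))
		return -1;	/* act as if the module wasn't there */
	ret = fn();
	fake_put(m);
	return ret;
}

/* Example "module code" entered through call_into(). */
int fake_op(void)
{
	return 42;
}
```

In this model, a caller that loses the race with removal simply sees the failure return and never enters the module code.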
784 <!-- add info on new-style module refcounting here --> 806 <!-- add info on new-style module refcounting here -->
@@ -821,7 +843,7 @@ printk(KERN_INFO "my ip: %d.%d.%d.%d\n", NIPQUAD(ipaddress));
821 There is a macro to do this: 843 There is a macro to do this:
822 <function>wait_event_interruptible()</function> 844 <function>wait_event_interruptible()</function>
823 845
824 <filename class="headerfile">include/linux/sched.h</filename> The 846 <filename class="headerfile">include/linux/wait.h</filename> The
825 first argument is the wait queue head, and the second is an 847 first argument is the wait queue head, and the second is an
826 expression which is evaluated; the macro returns 848 expression which is evaluated; the macro returns
827 <returnvalue>0</returnvalue> when this expression is true, or 849 <returnvalue>0</returnvalue> when this expression is true, or
@@ -847,10 +869,11 @@ printk(KERN_INFO "my ip: %d.%d.%d.%d\n", NIPQUAD(ipaddress));
847 <para> 869 <para>
848 Call <function>wake_up()</function> 870 Call <function>wake_up()</function>
849 871
850 <filename class="headerfile">include/linux/sched.h</filename>;, 872 <filename class="headerfile">include/linux/wait.h</filename>;,
851 which will wake up every process in the queue. The exception is 873 which will wake up every process in the queue. The exception is
852 if one has <constant>TASK_EXCLUSIVE</constant> set, in which case 874 if one has <constant>TASK_EXCLUSIVE</constant> set, in which case
853 the remainder of the queue will not be woken. 875 the remainder of the queue will not be woken. There are other variants
876 of this basic function available in the same header.
854 </para> 877 </para>
855 </sect1> 878 </sect1>
856 </chapter> 879 </chapter>
@@ -863,7 +886,7 @@ printk(KERN_INFO "my ip: %d.%d.%d.%d\n", NIPQUAD(ipaddress));
863 first class of operations work on <type>atomic_t</type> 886 first class of operations work on <type>atomic_t</type>
864 887
865 <filename class="headerfile">include/asm/atomic.h</filename>; this 888 <filename class="headerfile">include/asm/atomic.h</filename>; this
866 contains a signed integer (at least 24 bits long), and you must use 889 contains a signed integer (at least 32 bits long), and you must use
867 these functions to manipulate or read atomic_t variables. 890 these functions to manipulate or read atomic_t variables.
868 <function>atomic_read()</function> and 891 <function>atomic_read()</function> and
869 <function>atomic_set()</function> get and set the counter, 892 <function>atomic_set()</function> get and set the counter,
@@ -882,13 +905,12 @@ printk(KERN_INFO "my ip: %d.%d.%d.%d\n", NIPQUAD(ipaddress));
882 905
883 <para> 906 <para>
884 Note that these functions are slower than normal arithmetic, and 907 Note that these functions are slower than normal arithmetic, and
885 so should not be used unnecessarily. On some platforms they 908 so should not be used unnecessarily.
886 are much slower, like 32-bit Sparc where they use a spinlock.
887 </para> 909 </para>
888 910
889 <para> 911 <para>
890 The second class of atomic operations is atomic bit operations on a 912 The second class of atomic operations is atomic bit operations on an
891 <type>long</type>, defined in 913 <type>unsigned long</type>, defined in
892 914
893 <filename class="headerfile">include/linux/bitops.h</filename>. These 915 <filename class="headerfile">include/linux/bitops.h</filename>. These
894 operations generally take a pointer to the bit pattern, and a bit 916 operations generally take a pointer to the bit pattern, and a bit
@@ -899,7 +921,7 @@ printk(KERN_INFO "my ip: %d.%d.%d.%d\n", NIPQUAD(ipaddress));
899 <function>test_and_clear_bit()</function> and 921 <function>test_and_clear_bit()</function> and
900 <function>test_and_change_bit()</function> do the same thing, 922 <function>test_and_change_bit()</function> do the same thing,
901 except return true if the bit was previously set; these are 923 except return true if the bit was previously set; these are
902 particularly useful for very simple locking. 924 particularly useful for atomically setting flags.
903 </para> 925 </para>
904 926
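The "return true if the bit was previously set" behaviour can be sketched in portable C11. This is an illustration under assumptions, not the kernel's implementation: the `sketch_` names are invented here, and `atomic_fetch_or()`/`atomic_fetch_and()` from `<stdatomic.h>` stand in for the architecture-specific code behind `include/linux/bitops.h`.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Atomically set bit nr; return true if it was already set. */
bool sketch_test_and_set_bit(unsigned int nr, atomic_ulong *addr)
{
	unsigned long mask = 1UL << (nr % SKETCH_BITS_PER_LONG);
	atomic_ulong *word;

	assert(addr != NULL);
	word = addr + nr / SKETCH_BITS_PER_LONG;
	/* fetch_or returns the old word, so the old bit can be tested. */
	return (atomic_fetch_or(word, mask) & mask) != 0;
}

/* Atomically clear bit nr; return true if it was previously set. */
bool sketch_test_and_clear_bit(unsigned int nr, atomic_ulong *addr)
{
	unsigned long mask = 1UL << (nr % SKETCH_BITS_PER_LONG);
	atomic_ulong *word;

	assert(addr != NULL);
	word = addr + nr / SKETCH_BITS_PER_LONG;
	return (atomic_fetch_and(word, ~mask) & mask) != 0;
}
```

Because the old value comes back from the same atomic operation that changes the bit, exactly one of several racing callers sees "was clear", which is what makes this usable for flags.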
905 <para> 927 <para>
@@ -907,12 +929,6 @@ printk(KERN_INFO "my ip: %d.%d.%d.%d\n", NIPQUAD(ipaddress));
907 than BITS_PER_LONG. The resulting behavior is strange on big-endian 929 than BITS_PER_LONG. The resulting behavior is strange on big-endian
908 platforms though so it is a good idea not to do this. 930 platforms though so it is a good idea not to do this.
909 </para> 931 </para>
910
911 <para>
912 Note that the order of bits depends on the architecture, and in
913 particular, the bitfield passed to these operations must be at
914 least as large as a <type>long</type>.
915 </para>
916 </chapter> 932 </chapter>
917 933
918 <chapter id="symbols"> 934 <chapter id="symbols">
@@ -932,11 +948,8 @@ printk(KERN_INFO "my ip: %d.%d.%d.%d\n", NIPQUAD(ipaddress));
932 <filename class="headerfile">include/linux/module.h</filename></title> 948 <filename class="headerfile">include/linux/module.h</filename></title>
933 949
934 <para> 950 <para>
935 This is the classic method of exporting a symbol, and it works 951 This is the classic method of exporting a symbol: dynamically
936 for both modules and non-modules. In the kernel all these 952 loaded modules will be able to use the symbol as normal.
937 declarations are often bundled into a single file to help
938 genksyms (which searches source files for these declarations).
939 See the comment on genksyms and Makefiles below.
940 </para> 953 </para>
941 </sect1> 954 </sect1>
942 955
@@ -949,7 +962,8 @@ printk(KERN_INFO "my ip: %d.%d.%d.%d\n", NIPQUAD(ipaddress));
949 symbols exported by <function>EXPORT_SYMBOL_GPL()</function> can 962 symbols exported by <function>EXPORT_SYMBOL_GPL()</function> can
950 only be seen by modules with a 963 only be seen by modules with a
951 <function>MODULE_LICENSE()</function> that specifies a GPL 964 <function>MODULE_LICENSE()</function> that specifies a GPL
952 compatible license. 965 compatible license. It implies that the function is considered
966 an internal implementation issue, and not really an interface.
953 </para> 967 </para>
954 </sect1> 968 </sect1>
955 </chapter> 969 </chapter>
@@ -962,12 +976,13 @@ printk(KERN_INFO "my ip: %d.%d.%d.%d\n", NIPQUAD(ipaddress));
962 <filename class="headerfile">include/linux/list.h</filename></title> 976 <filename class="headerfile">include/linux/list.h</filename></title>
963 977
964 <para> 978 <para>
965 There are three sets of linked-list routines in the kernel 979 There used to be three sets of linked-list routines in the kernel
966 headers, but this one seems to be winning out (and Linus has 980 headers, but this one is the winner. If you don't have some
967 used it). If you don't have some particular pressing need for 981 particular pressing need for a single list, it's a good choice.
968 a single list, it's a good choice. In fact, I don't care 982 </para>
969 whether it's a good choice or not, just use it so we can get 983
970 rid of the others. 984 <para>
985 In particular, <function>list_for_each_entry</function> is useful.
971 </para> 986 </para>
972 </sect1> 987 </sect1>
973 988
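To show why an entry-based iterator is convenient, here is a userspace sketch of an embedded doubly-linked list. It is a simplification under assumptions: the `sketch_` macros and functions are invented for illustration, the iterator takes the entry type explicitly (the kernel's `list_for_each_entry` derives it with `typeof`), and the pointer arithmetic mirrors the usual `container_of` idiom.

```c
#include <assert.h>
#include <stddef.h>

struct list_head {
	struct list_head *next, *prev;
};

/* Recover the enclosing structure from a pointer to its member. */
#define sketch_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Walk the entries directly instead of the raw list_head nodes. */
#define sketch_list_for_each_entry(pos, head, type, member)		\
	for (pos = sketch_container_of((head)->next, type, member);	\
	     &pos->member != (head);					\
	     pos = sketch_container_of(pos->member.next, type, member))

/* An empty list is a head that points at itself. */
void sketch_list_init(struct list_head *head)
{
	head->next = head;
	head->prev = head;
}

/* Link entry in just before head, i.e. at the tail of the list. */
void sketch_list_add_tail(struct list_head *entry, struct list_head *head)
{
	assert(entry != NULL && head != NULL);
	entry->prev = head->prev;
	entry->next = head;
	head->prev->next = entry;
	head->prev = entry;
}

/* Hypothetical payload type with the list node embedded in it. */
struct item {
	int value;
	struct list_head node;
};

int sketch_sum(struct list_head *head)
{
	struct item *it;
	int sum = 0;

	sketch_list_for_each_entry(it, head, struct item, node)
		sum += it->value;
	return sum;
}
```

The loop body only ever sees `struct item` pointers, so callers do not need to repeat the member-to-container conversion at every use.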
@@ -979,14 +994,13 @@ printk(KERN_INFO "my ip: %d.%d.%d.%d\n", NIPQUAD(ipaddress));
979 convention, and return <returnvalue>0</returnvalue> for success, 994 convention, and return <returnvalue>0</returnvalue> for success,
980 and a negative error number 995 and a negative error number
981 (eg. <returnvalue>-EFAULT</returnvalue>) for failure. This can be 996 (eg. <returnvalue>-EFAULT</returnvalue>) for failure. This can be
982 unintuitive at first, but it's fairly widespread in the networking 997 unintuitive at first, but it's fairly widespread in the kernel.
983 code, for example.
984 </para> 998 </para>
985 999
986 <para> 1000 <para>
987 The filesystem code uses <function>ERR_PTR()</function> 1001 Using <function>ERR_PTR()</function>
988 1002
989 <filename class="headerfile">include/linux/fs.h</filename>; to 1003 <filename class="headerfile">include/linux/err.h</filename>; to
990 encode a negative error number into a pointer, and 1004 encode a negative error number into a pointer, and
991 <function>IS_ERR()</function> and <function>PTR_ERR()</function> 1005 <function>IS_ERR()</function> and <function>PTR_ERR()</function>
992 to get it back out again: avoids a separate pointer parameter for 1006 to get it back out again: avoids a separate pointer parameter for
@@ -1040,7 +1054,7 @@ static struct block_device_operations opt_fops = {
1040 supported, due to lack of general use, but the following are 1054 supported, due to lack of general use, but the following are
1041 considered standard (see the GCC info page section "C 1055 considered standard (see the GCC info page section "C
1042 Extensions" for more details - Yes, really the info page, the 1056 Extensions" for more details - Yes, really the info page, the
1043 man page is only a short summary of the stuff in info): 1057 man page is only a short summary of the stuff in info).
1044 </para> 1058 </para>
1045 <itemizedlist> 1059 <itemizedlist>
1046 <listitem> 1060 <listitem>
@@ -1091,7 +1105,7 @@ static struct block_device_operations opt_fops = {
1091 </listitem> 1105 </listitem>
1092 <listitem> 1106 <listitem>
1093 <para> 1107 <para>
1094 Function names as strings (__FUNCTION__) 1108 Function names as strings (__func__).
1095 </para> 1109 </para>
1096 </listitem> 1110 </listitem>
1097 <listitem> 1111 <listitem>
@@ -1164,63 +1178,35 @@ static struct block_device_operations opt_fops = {
1164 <listitem> 1178 <listitem>
1165 <para> 1179 <para>
1166 Usually you want a configuration option for your kernel hack. 1180 Usually you want a configuration option for your kernel hack.
1167 Edit <filename>Config.in</filename> in the appropriate directory 1181 Edit <filename>Kconfig</filename> in the appropriate directory.
1168 (but under <filename>arch/</filename> it's called 1182 The Config language is simple to use by cut and paste, and there's
1169 <filename>config.in</filename>). The Config Language used is not 1183 complete documentation in
1170 bash, even though it looks like bash; the safe way is to use only 1184 <filename>Documentation/kbuild/kconfig-language.txt</filename>.
1171 the constructs that you already see in
1172 <filename>Config.in</filename> files (see
1173 <filename>Documentation/kbuild/kconfig-language.txt</filename>).
1174 It's good to run "make xconfig" at least once to test (because
1175 it's the only one with a static parser).
1176 </para>
1177
1178 <para>
1179 Variables which can be Y or N use <type>bool</type> followed by a
1180 tagline and the config define name (which must start with
1181 CONFIG_). The <type>tristate</type> function is the same, but
1182 allows the answer M (which defines
1183 <symbol>CONFIG_foo_MODULE</symbol> in your source, instead of
1184 <symbol>CONFIG_FOO</symbol>) if <symbol>CONFIG_MODULES</symbol>
1185 is enabled.
1186 </para> 1185 </para>
1187 1186
1188 <para> 1187 <para>
1189 You may well want to make your CONFIG option only visible if 1188 You may well want to make your CONFIG option only visible if
1190 <symbol>CONFIG_EXPERIMENTAL</symbol> is enabled: this serves as a 1189 <symbol>CONFIG_EXPERIMENTAL</symbol> is enabled: this serves as a
1191 warning to users. There are many other fancy things you can do: see 1190 warning to users. There are many other fancy things you can do: see
1192 the various <filename>Config.in</filename> files for ideas. 1191 the various <filename>Kconfig</filename> files for ideas.
1193 </para> 1192 </para>
1194 </listitem>
1195 1193
1196 <listitem>
1197 <para> 1194 <para>
1198 Edit the <filename>Makefile</filename>: the CONFIG variables are 1195 In your description of the option, make sure you address both the
1199 exported here so you can conditionalize compilation with `ifeq'. 1196 expert user and the user who knows nothing about your feature. Mention
1200 If your file exports symbols then add the names to 1197 incompatibilities and issues here. <emphasis> Definitely
1201 <varname>export-objs</varname> so that genksyms will find them. 1198 </emphasis> end your description with <quote> if in doubt, say N
1202 <caution> 1199 </quote> (or, occasionally, `Y'); this is for people who have no
1203 <para> 1200 idea what you are talking about.
1204 There is a restriction on the kernel build system that objects
1205 which export symbols must have globally unique names.
1206 If your object does not have a globally unique name then the
1207 standard fix is to move the
1208 <function>EXPORT_SYMBOL()</function> statements to their own
1209 object with a unique name.
1210 This is why several systems have separate exporting objects,
1211 usually suffixed with ksyms.
1212 </para>
1213 </caution>
1214 </para> 1201 </para>
1215 </listitem> 1202 </listitem>
1216 1203
1217 <listitem> 1204 <listitem>
1218 <para> 1205 <para>
1219 Document your option in Documentation/Configure.help. Mention 1206 Edit the <filename>Makefile</filename>: the CONFIG variables are
1220 incompatibilities and issues here. <emphasis> Definitely 1207 exported here so you can usually just add a "obj-$(CONFIG_xxx) +=
1221 </emphasis> end your description with <quote> if in doubt, say N 1208 xxx.o" line. The syntax is documented in
1222 </quote> (or, occasionally, `Y'); this is for people who have no 1209 <filename>Documentation/kbuild/makefiles.txt</filename>.
1223 idea what you are talking about.
1224 </para> 1210 </para>
1225 </listitem> 1211 </listitem>
1226 1212
@@ -1253,20 +1239,12 @@ static struct block_device_operations opt_fops = {
1253 </para> 1239 </para>
1254 1240
1255 <para> 1241 <para>
1256 <filename>include/linux/brlock.h:</filename> 1242 <filename>include/asm-i386/delay.h:</filename>
1257 </para> 1243 </para>
1258 <programlisting> 1244 <programlisting>
1259extern inline void br_read_lock (enum brlock_indices idx) 1245#define ndelay(n) (__builtin_constant_p(n) ? \
1260{ 1246 ((n) > 20000 ? __bad_ndelay() : __const_udelay((n) * 5ul)) : \
1261 /* 1247 __ndelay(n))
1262 * This causes a link-time bug message if an
1263 * invalid index is used:
1264 */
1265 if (idx >= __BR_END)
1266 __br_lock_usage_bug();
1267
1268 read_lock(&amp;__brlock_array[smp_processor_id()][idx]);
1269}
1270 </programlisting> 1248 </programlisting>
1271 1249
1272 <para> 1250 <para>
diff --git a/Documentation/DocBook/usb.tmpl b/Documentation/DocBook/usb.tmpl
index f3ef0bf435e9..705c442c7bf4 100644
--- a/Documentation/DocBook/usb.tmpl
+++ b/Documentation/DocBook/usb.tmpl
@@ -841,7 +841,7 @@ usbdev_ioctl (int fd, int ifno, unsigned request, void *param)
841 File modification time is not updated by this request. 841 File modification time is not updated by this request.
842 </para><para> 842 </para><para>
843 Those struct members are from some interface descriptor 843 Those struct members are from some interface descriptor
844 applying to the the current configuration. 844 applying to the current configuration.
845 The interface number is the bInterfaceNumber value, and 845 The interface number is the bInterfaceNumber value, and
846 the altsetting number is the bAlternateSetting value. 846 the altsetting number is the bAlternateSetting value.
847 (This resets each endpoint in the interface.) 847 (This resets each endpoint in the interface.)
diff --git a/Documentation/MSI-HOWTO.txt b/Documentation/MSI-HOWTO.txt
index d5032eb480aa..63edc5f847c4 100644
--- a/Documentation/MSI-HOWTO.txt
+++ b/Documentation/MSI-HOWTO.txt
@@ -430,7 +430,7 @@ which may result in system hang. The software driver of specific
430MSI-capable hardware is responsible for whether calling 430MSI-capable hardware is responsible for whether calling
431pci_enable_msi or not. A return of zero indicates the kernel 431pci_enable_msi or not. A return of zero indicates the kernel
432successfully initializes the MSI/MSI-X capability structure of the 432successfully initializes the MSI/MSI-X capability structure of the
433device funtion. The device function is now running on MSI/MSI-X mode. 433device function. The device function is now running on MSI/MSI-X mode.
434 434
4355.6 How to tell whether MSI/MSI-X is enabled on device function 4355.6 How to tell whether MSI/MSI-X is enabled on device function
436 436
diff --git a/Documentation/RCU/RTFP.txt b/Documentation/RCU/RTFP.txt
index 9c6d450138ea..fcbcbc35b122 100644
--- a/Documentation/RCU/RTFP.txt
+++ b/Documentation/RCU/RTFP.txt
@@ -2,7 +2,8 @@ Read the F-ing Papers!
2 2
3 3
4This document describes RCU-related publications, and is followed by 4This document describes RCU-related publications, and is followed by
5the corresponding bibtex entries. 5the corresponding bibtex entries. A number of the publications may
6be found at http://www.rdrop.com/users/paulmck/RCU/.
6 7
7The first thing resembling RCU was published in 1980, when Kung and Lehman 8The first thing resembling RCU was published in 1980, when Kung and Lehman
8[Kung80] recommended use of a garbage collector to defer destruction 9[Kung80] recommended use of a garbage collector to defer destruction
@@ -113,6 +114,10 @@ describing how to make RCU safe for soft-realtime applications [Sarma04c],
113and a paper describing SELinux performance with RCU [JamesMorris04b]. 114and a paper describing SELinux performance with RCU [JamesMorris04b].
114 115
115 116
1172005 has seen further adaptation of RCU to realtime use, permitting
118preemption of RCU realtime critical sections [PaulMcKenney05a,
119PaulMcKenney05b].
120
116Bibtex Entries 121Bibtex Entries
117 122
118@article{Kung80 123@article{Kung80
@@ -410,3 +415,32 @@ Oregon Health and Sciences University"
410\url{http://www.livejournal.com/users/james_morris/2153.html} 415\url{http://www.livejournal.com/users/james_morris/2153.html}
411[Viewed December 10, 2004]" 416[Viewed December 10, 2004]"
412} 417}
418
419@unpublished{PaulMcKenney05a
420,Author="Paul E. McKenney"
421,Title="{[RFC]} {RCU} and {CONFIG\_PREEMPT\_RT} progress"
422,month="May"
423,year="2005"
424,note="Available:
425\url{http://lkml.org/lkml/2005/5/9/185}
426[Viewed May 13, 2005]"
427,annotation="
428 First publication of working lock-based deferred free patches
429 for the CONFIG_PREEMPT_RT environment.
430"
431}
432
433@conference{PaulMcKenney05b
434,Author="Paul E. McKenney and Dipankar Sarma"
435,Title="Towards Hard Realtime Response from the Linux Kernel on SMP Hardware"
436,Booktitle="linux.conf.au 2005"
437,month="April"
438,year="2005"
439,address="Canberra, Australia"
440,note="Available:
441\url{http://www.rdrop.com/users/paulmck/RCU/realtimeRCU.2005.04.23a.pdf}
442[Viewed May 13, 2005]"
443,annotation="
444 Realtime turns into making RCU yet more realtime friendly.
445"
446}
diff --git a/Documentation/RCU/UP.txt b/Documentation/RCU/UP.txt
index 3bfb84b3b7db..aab4a9ec3931 100644
--- a/Documentation/RCU/UP.txt
+++ b/Documentation/RCU/UP.txt
@@ -8,7 +8,7 @@ is that since there is only one CPU, it should not be necessary to
8wait for anything else to get done, since there are no other CPUs for 8wait for anything else to get done, since there are no other CPUs for
9anything else to be happening on. Although this approach will -sort- -of- 9anything else to be happening on. Although this approach will -sort- -of-
10work a surprising amount of the time, it is a very bad idea in general. 10work a surprising amount of the time, it is a very bad idea in general.
11This document presents two examples that demonstrate exactly how bad an 11This document presents three examples that demonstrate exactly how bad an
12idea this is. 12idea this is.
13 13
14 14
@@ -26,6 +26,9 @@ from softirq, the list scan would find itself referencing a newly freed
26element B. This situation can greatly decrease the life expectancy of 26element B. This situation can greatly decrease the life expectancy of
27your kernel. 27your kernel.
28 28
29This same problem can occur if call_rcu() is invoked from a hardware
30interrupt handler.
31
29 32
30Example 2: Function-Call Fatality 33Example 2: Function-Call Fatality
31 34
@@ -44,8 +47,37 @@ its arguments would cause it to fail to make the fundamental guarantee
44underlying RCU, namely that call_rcu() defers invoking its arguments until 47underlying RCU, namely that call_rcu() defers invoking its arguments until
45all RCU read-side critical sections currently executing have completed. 48all RCU read-side critical sections currently executing have completed.
46 49
47Quick Quiz: why is it -not- legal to invoke synchronize_rcu() in 50Quick Quiz #1: why is it -not- legal to invoke synchronize_rcu() in
48this case? 51 this case?
52
53
54Example 3: Death by Deadlock
55
56Suppose that call_rcu() is invoked while holding a lock, and that the
57callback function must acquire this same lock. In this case, if
58call_rcu() were to directly invoke the callback, the result would
59be self-deadlock.
60
61In some cases, it would be possible to restructure the code so that
62the call_rcu() is delayed until after the lock is released. However,
63there are cases where this can be quite ugly:
64
651. If a number of items need to be passed to call_rcu() within
66 the same critical section, then the code would need to create
67 a list of them, then traverse the list once the lock was
68 released.
69
702. In some cases, the lock will be held across some kernel API,
71 so that delaying the call_rcu() until the lock is released
72 requires that the data item be passed up via a common API.
73 It is far better to guarantee that callbacks are invoked
74 with no locks held than to have to modify such APIs to allow
75 arbitrary data items to be passed back up through them.
76
77If call_rcu() directly invokes the callback, painful locking restrictions
78or API changes would be required.
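Example 3 can be illustrated with a toy, single-threaded userspace model. This is only a sketch under assumptions: every `toy_` name is invented here, the "lock" is a plain flag whose failed acquisition stands in for self-deadlock, and grace periods are not modelled at all. It shows only that queueing callbacks lets them run later, with no locks held.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy non-recursive lock: taking it twice would be self-deadlock. */
struct toy_lock {
	bool held;
};

bool toy_trylock(struct toy_lock *l)
{
	if (l->held)
		return false;	/* a real spinlock would spin forever */
	l->held = true;
	return true;
}

void toy_unlock(struct toy_lock *l)
{
	assert(l->held);
	l->held = false;
}

struct toy_rcu_head {
	struct toy_rcu_head *next;
	void (*func)(struct toy_rcu_head *head);
};

static struct toy_rcu_head *pending;

/* Queue the callback instead of invoking it directly. */
void toy_call_rcu(struct toy_rcu_head *head,
		  void (*func)(struct toy_rcu_head *head))
{
	head->func = func;
	head->next = pending;
	pending = head;
}

/* Run queued callbacks later, from a context holding no locks. */
int toy_run_callbacks(void)
{
	int n = 0;

	while (pending) {
		struct toy_rcu_head *head = pending;

		pending = head->next;
		head->func(head);
		n++;
	}
	return n;
}

struct toy_lock mylock;
int freed;	/* callbacks that managed to take mylock */

/* Callback that needs the same lock the updater holds. */
void toy_free_cb(struct toy_rcu_head *head)
{
	(void)head;
	if (toy_trylock(&mylock)) {
		freed++;
		toy_unlock(&mylock);
	}
}

/* Updater: retires two items while holding mylock. */
void toy_update(struct toy_rcu_head *a, struct toy_rcu_head *b)
{
	bool ok = toy_trylock(&mylock);

	assert(ok);
	(void)ok;
	toy_call_rcu(a, toy_free_cb);
	toy_call_rcu(b, toy_free_cb);
	toy_unlock(&mylock);
}
```

Had toy_call_rcu() invoked the callback immediately, toy_free_cb() would have found mylock already held; with the queue, the updater needs neither a private list of items nor an API change.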
79
80Quick Quiz #2: What locking restriction must RCU callbacks respect?
49 81
50 82
51Summary 83Summary
@@ -53,12 +85,35 @@ Summary
53Permitting call_rcu() to immediately invoke its arguments or permitting 85Permitting call_rcu() to immediately invoke its arguments or permitting
54synchronize_rcu() to immediately return breaks RCU, even on a UP system. 86synchronize_rcu() to immediately return breaks RCU, even on a UP system.
55So do not do it! Even on a UP system, the RCU infrastructure -must- 87So do not do it! Even on a UP system, the RCU infrastructure -must-
56respect grace periods. 88respect grace periods, and -must- invoke callbacks from a known environment
57 89in which no locks are held.
58 90
59Answer to Quick Quiz 91
60 92Answer to Quick Quiz #1:
61The calling function is scanning an RCU-protected linked list, and 93 Why is it -not- legal to invoke synchronize_rcu() in this case?
62is therefore within an RCU read-side critical section. Therefore, 94
63the called function has been invoked within an RCU read-side critical 95 Because the calling function is scanning an RCU-protected linked
64section, and is not permitted to block. 96 list, and is therefore within an RCU read-side critical section.
97 Therefore, the called function has been invoked within an RCU
98 read-side critical section, and is not permitted to block.
99
100Answer to Quick Quiz #2:
101 What locking restriction must RCU callbacks respect?
102
103 Any lock that is acquired within an RCU callback must be
104 acquired elsewhere using an _irq variant of the spinlock
105 primitive. For example, if "mylock" is acquired by an
106 RCU callback, then a process-context acquisition of this
107 lock must use something like spin_lock_irqsave() to
108 acquire the lock.
109
110 If the process-context code were to simply use spin_lock(),
111 then, since RCU callbacks can be invoked from softirq context,
112 the callback might be called from a softirq that interrupted
113 the process-context critical section. This would result in
114 self-deadlock.
115
116 This restriction might seem gratuitous, since very few RCU
117 callbacks acquire locks directly. However, a great many RCU
118 callbacks do acquire locks -indirectly-, for example, via
119 the kfree() primitive.
diff --git a/Documentation/RCU/checklist.txt b/Documentation/RCU/checklist.txt
index 8f3fb77c9cd3..e118a7c1a092 100644
--- a/Documentation/RCU/checklist.txt
+++ b/Documentation/RCU/checklist.txt
@@ -43,6 +43,10 @@ over a rather long period of time, but improvements are always welcome!
43 rcu_read_lock_bh()) in the read-side critical sections, 43 rcu_read_lock_bh()) in the read-side critical sections,
44 and are also an excellent aid to readability. 44 and are also an excellent aid to readability.
45 45
46 As a rough rule of thumb, any dereference of an RCU-protected
47 pointer must be covered by rcu_read_lock() or rcu_read_lock_bh()
48 or by the appropriate update-side lock.
49
463. Does the update code tolerate concurrent accesses? 503. Does the update code tolerate concurrent accesses?
47 51
48 The whole point of RCU is to permit readers to run without 52 The whole point of RCU is to permit readers to run without
@@ -90,7 +94,11 @@ over a rather long period of time, but improvements are always welcome!
90 94
91 The rcu_dereference() primitive is used by the various 95 The rcu_dereference() primitive is used by the various
92 "_rcu()" list-traversal primitives, such as the 96 "_rcu()" list-traversal primitives, such as the
93 list_for_each_entry_rcu(). 97 list_for_each_entry_rcu(). Note that it is perfectly
98 legal (if redundant) for update-side code to use
99 rcu_dereference() and the "_rcu()" list-traversal
100 primitives. This is particularly useful in code
101 that is common to readers and updaters.
94 102
95 b. If the list macros are being used, the list_add_tail_rcu() 103 b. If the list macros are being used, the list_add_tail_rcu()
96 and list_add_rcu() primitives must be used in order 104 and list_add_rcu() primitives must be used in order
@@ -150,16 +158,9 @@ over a rather long period of time, but improvements are always welcome!
150 158
151 Use of the _rcu() list-traversal primitives outside of an 159 Use of the _rcu() list-traversal primitives outside of an
152 RCU read-side critical section causes no harm other than 160 RCU read-side critical section causes no harm other than
153 a slight performance degradation on Alpha CPUs and some 161 a slight performance degradation on Alpha CPUs. It can
154 confusion on the part of people trying to read the code. 162 also be quite helpful in reducing code bloat when common
155 163 code is shared between readers and updaters.
156 Another way of thinking of this is "If you are holding the
157 lock that prevents the data structure from changing, why do
158 you also need RCU-based protection?" That said, there may
159 well be situations where use of the _rcu() list-traversal
160 primitives while the update-side lock is held results in
161 simpler and more maintainable code. The jury is still out
162 on this question.
163 164
16410. Conversely, if you are in an RCU read-side critical section, 16510. Conversely, if you are in an RCU read-side critical section,
165 you -must- use the "_rcu()" variants of the list macros. 166 you -must- use the "_rcu()" variants of the list macros.
diff --git a/Documentation/RCU/rcu.txt b/Documentation/RCU/rcu.txt
index eb444006683e..6fa092251586 100644
--- a/Documentation/RCU/rcu.txt
+++ b/Documentation/RCU/rcu.txt
@@ -64,6 +64,54 @@ o I hear that RCU is patented? What is with that?
64 Of these, one was allowed to lapse by the assignee, and the 64 Of these, one was allowed to lapse by the assignee, and the
65 others have been contributed to the Linux kernel under GPL. 65 others have been contributed to the Linux kernel under GPL.
66 66
67o I hear that RCU needs work in order to support realtime kernels?
68
69 Yes, work in progress.
70
67o Where can I find more information on RCU? 71o Where can I find more information on RCU?
68 72
69 See the RTFP.txt file in this directory. 73 See the RTFP.txt file in this directory.
74 Or point your browser at http://www.rdrop.com/users/paulmck/RCU/.
75
76o What are all these files in this directory?
77
78
79 NMI-RCU.txt
80
81 Describes how to use RCU to implement dynamic
82 NMI handlers, which can be revectored on the fly,
83 without rebooting.
84
85 RTFP.txt
86
87 List of RCU-related publications and web sites.
88
89 UP.txt
90
91 Discussion of RCU usage in UP kernels.
92
93 arrayRCU.txt
94
95 Describes how to use RCU to protect arrays, with
96 resizeable arrays whose elements reference other
97 data structures being of the most interest.
98
99 checklist.txt
100
101 Lists things to check for when inspecting code that
102 uses RCU.
103
104 listRCU.txt
105
106 Describes how to use RCU to protect linked lists.
107 This is the simplest and most common use of RCU
108 in the Linux kernel.
109
110 rcu.txt
111
112 You are reading it!
113
114 whatisRCU.txt
115
116 Overview of how the RCU implementation works. Along
117 the way, presents a conceptual view of RCU.
diff --git a/Documentation/RCU/rcuref.txt b/Documentation/RCU/rcuref.txt
new file mode 100644
index 000000000000..a23fee66064d
--- /dev/null
+++ b/Documentation/RCU/rcuref.txt
@@ -0,0 +1,74 @@
1Refcounter framework for elements of lists/arrays protected by
2RCU.
3
4Refcounting on elements of lists which are protected by traditional
5reader/writer spinlocks or semaphores is straightforward, as in:
6
71.                                 2.
8add()                              search_and_reference()
9{                                  {
10    alloc_object                      read_lock(&list_lock);
11    ...                               search_for_element
12    atomic_set(&el->rc, 1);           atomic_inc(&el->rc);
13    write_lock(&list_lock);           ...
14    add_element                       read_unlock(&list_lock);
15    ...                               ...
16    write_unlock(&list_lock);     }
17}
18
193.                                4.
20release_referenced()              delete()
21{                                 {
22    ...                               write_lock(&list_lock);
23    atomic_dec(&el->rc, relfunc)      ...
24    ...                               delete_element
25}                                     write_unlock(&list_lock);
26                                      ...
27                                      if (atomic_dec_and_test(&el->rc))
28                                          kfree(el);
29                                      ...
30                                  }
31
32If this list/array is made lock-free using RCU, by changing the
33write_lock in add() and delete() to spin_lock and changing read_lock
34in search_and_reference() to rcu_read_lock(), the rcuref_get in
35search_and_reference() could potentially hold a reference to an element which
36has already been deleted from the list/array. rcuref_lf_get_rcu takes
37care of this scenario. search_and_reference() should look as follows:
38
391.                                2.
40add()                             search_and_reference()
41{                                 {
42    alloc_object                      rcu_read_lock();
43    ...                               search_for_element
44    atomic_set(&el->rc, 1);           if (rcuref_inc_lf(&el->rc)) {
45    write_lock(&list_lock);               rcu_read_unlock();
46                                          return FAIL;
47    add_element                       }
48    ...                               ...
49    write_unlock(&list_lock);         rcu_read_unlock();
50}                                 }
513.                                4.
52release_referenced()              delete()
53{                                 {
54    ...                               write_lock(&list_lock);
55    rcuref_dec(&el->rc, relfunc)      ...
56    ...                               delete_element
57}                                     write_unlock(&list_lock);
58                                      ...
59                                      if (rcuref_dec_and_test(&el->rc))
60                                          call_rcu(&el->head, el_free);
61                                      ...
62                                  }
63
64Sometimes, a reference to the element needs to be obtained in the
65update (write) stream. In such cases, rcuref_inc_lf() might be overkill,
66since the spinlock serialising list updates is held. rcuref_inc()
67is to be used in such cases.
68For arches which do not have cmpxchg, the rcuref_inc_lf()
69API uses a hashed spinlock implementation, and the same hashed spinlock
70is acquired in all rcuref_xxx primitives to preserve atomicity.
71Note: Use the rcuref_inc() API only if you need to use rcuref_inc_lf() on
72the refcounter in at least one place. Mixing the rcuref_inc() and atomic_xxx
73APIs might lead to races. rcuref_inc_lf() must be used in lock-free
74RCU critical sections only.
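
On architectures that do have cmpxchg, the "take a reference only if the
count is still nonzero" behaviour described above can be sketched in
user-space C11 roughly as follows. This is an illustration only, not the
kernel's rcuref implementation: ref_inc_not_zero() and ref_dec_and_test()
are hypothetical names, and the convention that a true return means the
reference was taken is an assumption of this sketch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Hypothetical helper: take a reference only if the count has not
 * already dropped to zero.  Returns true if the reference was taken.
 */
static bool ref_inc_not_zero(atomic_int *rc)
{
	int old = atomic_load(rc);

	while (old != 0) {
		/* On failure, "old" is refreshed with the current value. */
		if (atomic_compare_exchange_weak(rc, &old, old + 1))
			return true;
	}
	return false;	/* count already hit zero; element is going away */
}

/* Drop a reference; returns true if this was the last one. */
static bool ref_dec_and_test(atomic_int *rc)
{
	int old = atomic_fetch_sub(rc, 1);

	assert(old > 0);	/* dropping a reference nobody holds is a bug */
	return old == 1;
}
```

A lock-free reader would call ref_inc_not_zero() on the element it found
and treat a false return as "element not found".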
diff --git a/Documentation/RCU/whatisRCU.txt b/Documentation/RCU/whatisRCU.txt
new file mode 100644
index 000000000000..354d89c78377
--- /dev/null
+++ b/Documentation/RCU/whatisRCU.txt
@@ -0,0 +1,902 @@
1What is RCU?
2
3RCU is a synchronization mechanism that was added to the Linux kernel
4during the 2.5 development effort that is optimized for read-mostly
5situations. Although RCU is actually quite simple once you understand it,
6getting there can sometimes be a challenge. Part of the problem is that
7most of the past descriptions of RCU have been written with the mistaken
8assumption that there is "one true way" to describe RCU. Instead,
9the experience has been that different people must take different paths
10to arrive at an understanding of RCU. This document provides several
11different paths, as follows:
12
131. RCU OVERVIEW
142. WHAT IS RCU'S CORE API?
153. WHAT ARE SOME EXAMPLE USES OF CORE RCU API?
164. WHAT IF MY UPDATING THREAD CANNOT BLOCK?
175. WHAT ARE SOME SIMPLE IMPLEMENTATIONS OF RCU?
186. ANALOGY WITH READER-WRITER LOCKING
197. FULL LIST OF RCU APIs
208. ANSWERS TO QUICK QUIZZES
21
22People who prefer starting with a conceptual overview should focus on
23Section 1, though most readers will profit by reading this section at
24some point. People who prefer to start with an API that they can then
25experiment with should focus on Section 2. People who prefer to start
26with example uses should focus on Sections 3 and 4. People who need to
27understand the RCU implementation should focus on Section 5, then dive
28into the kernel source code. People who reason best by analogy should
29focus on Section 6. Section 7 serves as an index to the docbook API
30documentation, and Section 8 is the traditional answer key.
31
32So, start with the section that makes the most sense to you and your
33preferred method of learning. If you need to know everything about
34everything, feel free to read the whole thing -- but if you are really
35that type of person, you have perused the source code and will therefore
36never need this document anyway. ;-)
37
38
391. RCU OVERVIEW
40
41The basic idea behind RCU is to split updates into "removal" and
42"reclamation" phases. The removal phase removes references to data items
43within a data structure (possibly by replacing them with references to
44new versions of these data items), and can run concurrently with readers.
45The reason that it is safe to run the removal phase concurrently with
46readers is that the semantics of modern CPUs guarantee that readers will see
47either the old or the new version of the data structure rather than a
48partially updated reference. The reclamation phase does the work of reclaiming
49(e.g., freeing) the data items removed from the data structure during the
50removal phase. Because reclaiming data items can disrupt any readers
51concurrently referencing those data items, the reclamation phase must
52not start until readers no longer hold references to those data items.
53
54Splitting the update into removal and reclamation phases permits the
55updater to perform the removal phase immediately, and to defer the
56reclamation phase until all readers active during the removal phase have
57completed, either by blocking until they finish or by registering a
58callback that is invoked after they finish. Only readers that are active
59during the removal phase need be considered, because any reader starting
60after the removal phase will be unable to gain a reference to the removed
61data items, and therefore cannot be disrupted by the reclamation phase.
62
63So the typical RCU update sequence goes something like the following:
64
65a. Remove pointers to a data structure, so that subsequent
66 readers cannot gain a reference to it.
67
68b. Wait for all previous readers to complete their RCU read-side
69 critical sections.
70
71c. At this point, there cannot be any readers who hold references
72 to the data structure, so it now may safely be reclaimed
73 (e.g., kfree()d).
74
75Step (b) above is the key idea underlying RCU's deferred destruction.
76The ability to wait until all readers are done allows RCU readers to
77use much lighter-weight synchronization, in some cases, absolutely no
78synchronization at all. In contrast, in more conventional lock-based
79schemes, readers must use heavy-weight synchronization in order to
80prevent an updater from deleting the data structure out from under them.
81This is because lock-based updaters typically update data items in place,
82and must therefore exclude readers. In contrast, RCU-based updaters
83typically take advantage of the fact that writes to single aligned
84pointers are atomic on modern CPUs, allowing atomic insertion, removal,
85and replacement of data items in a linked structure without disrupting
86readers. Concurrent RCU readers can then continue accessing the old
87versions, and can dispense with the atomic operations, memory barriers,
88and communications cache misses that are so expensive on present-day
89SMP computer systems, even in the absence of lock contention.
90
91In the three-step procedure shown above, the updater is performing both
92the removal and the reclamation step, but it is often helpful for an
93entirely different thread to do the reclamation, as is in fact the case
94in the Linux kernel's directory-entry cache (dcache). Even if the same
95thread performs both the update step (step (a) above) and the reclamation
96step (step (c) above), it is often helpful to think of them separately.
97For example, RCU readers and updaters need not communicate at all,
98but RCU provides implicit low-overhead communication between readers
99and reclaimers, namely, in step (b) above.
100
101So how the heck can a reclaimer tell when a reader is done, given
102that readers are not doing any sort of synchronization operations???
103Read on to learn about how RCU's API makes this easy.
104
105
1062. WHAT IS RCU'S CORE API?
107
108The core RCU API is quite small:
109
110a. rcu_read_lock()
111b. rcu_read_unlock()
112c. synchronize_rcu() / call_rcu()
113d. rcu_assign_pointer()
114e. rcu_dereference()
115
116There are many other members of the RCU API, but the rest can be
117expressed in terms of these five, though most implementations instead
118express synchronize_rcu() in terms of the call_rcu() callback API.
119
120The five core RCU APIs are described below; the other 18 are enumerated
121later. See the kernel docbook documentation for more information, or look
122directly at the function header comments.
123
124rcu_read_lock()
125
126 void rcu_read_lock(void);
127
128 Used by a reader to inform the reclaimer that the reader is
129 entering an RCU read-side critical section. It is illegal
130 to block while in an RCU read-side critical section, though
131 kernels built with CONFIG_PREEMPT_RCU can preempt RCU read-side
132 critical sections. Any RCU-protected data structure accessed
133 during an RCU read-side critical section is guaranteed to remain
134 unreclaimed for the full duration of that critical section.
135 Reference counts may be used in conjunction with RCU to maintain
136 longer-term references to data structures.
137
138rcu_read_unlock()
139
140 void rcu_read_unlock(void);
141
142 Used by a reader to inform the reclaimer that the reader is
143 exiting an RCU read-side critical section. Note that RCU
144 read-side critical sections may be nested and/or overlapping.
145
146synchronize_rcu()
147
148 void synchronize_rcu(void);
149
150 Marks the end of updater code and the beginning of reclaimer
151 code. It does this by blocking until all pre-existing RCU
152 read-side critical sections on all CPUs have completed.
153 Note that synchronize_rcu() will -not- necessarily wait for
154 any subsequent RCU read-side critical sections to complete.
155 For example, consider the following sequence of events:
156
157     CPU 0               CPU 1                      CPU 2
158     -----------------   ------------------------   -----------------
159 1.  rcu_read_lock()
160 2.                      enters synchronize_rcu()
161 3.                                                 rcu_read_lock()
162 4.  rcu_read_unlock()
163 5.                      exits synchronize_rcu()
164 6.                                                 rcu_read_unlock()
165
166 To reiterate, synchronize_rcu() waits only for ongoing RCU
167 read-side critical sections to complete, not necessarily for
168 any that begin after synchronize_rcu() is invoked.
169
170 Of course, synchronize_rcu() does not necessarily return
171 -immediately- after the last pre-existing RCU read-side critical
172 section completes. For one thing, there might well be scheduling
173 delays. For another thing, many RCU implementations process
174 requests in batches in order to improve efficiencies, which can
175 further delay synchronize_rcu().
176
177 Since synchronize_rcu() is the API that must figure out when
178 readers are done, its implementation is key to RCU. For RCU
179 to be useful in all but the most read-intensive situations,
180 synchronize_rcu()'s overhead must also be quite small.
181
182 The call_rcu() API is a callback form of synchronize_rcu(),
183 and is described in more detail in a later section. Instead of
184 blocking, it registers a function and argument which are invoked
185 after all ongoing RCU read-side critical sections have completed.
186 This callback variant is particularly useful in situations where
187 it is illegal to block.
188
189rcu_assign_pointer()
190
191 typeof(p) rcu_assign_pointer(p, typeof(p) v);
192
193 Yes, rcu_assign_pointer() -is- implemented as a macro, though it
194 would be cool to be able to declare a function in this manner.
195 (Compiler experts will no doubt disagree.)
196
197 The updater uses this function to assign a new value to an
198 RCU-protected pointer, in order to safely communicate the change
199 in value from the updater to the reader. This function returns
200 the new value, and also executes any memory-barrier instructions
201 required for a given CPU architecture.
202
203 Perhaps more important, it serves to document which pointers
204 are protected by RCU. That said, rcu_assign_pointer() is most
205 frequently used indirectly, via the _rcu list-manipulation
206 primitives such as list_add_rcu().
207
208rcu_dereference()
209
210 typeof(p) rcu_dereference(p);
211
212 Like rcu_assign_pointer(), rcu_dereference() must be implemented
213 as a macro.
214
215 The reader uses rcu_dereference() to fetch an RCU-protected
216 pointer, which returns a value that may then be safely
217 dereferenced. Note that rcu_dereference() does not actually
218 dereference the pointer; instead, it protects the pointer for
219 later dereferencing. It also executes any needed memory-barrier
220 instructions for a given CPU architecture. Currently, only Alpha
221 needs memory barriers within rcu_dereference() -- on other CPUs,
222 it compiles to nothing, not even a compiler directive.
223
224 Common coding practice uses rcu_dereference() to copy an
225 RCU-protected pointer to a local variable, then dereferences
226 this local variable, for example as follows:
227
228 p = rcu_dereference(head.next);
229 return p->data;
230
231 However, in this case, one could just as easily combine these
232 into one statement:
233
234 return rcu_dereference(head.next)->data;
235
236 If you are going to be fetching multiple fields from the
237 RCU-protected structure, using the local variable is of
238 course preferred. Repeated rcu_dereference() calls look
239 ugly and incur unnecessary overhead on Alpha CPUs.
240
241 Note that the value returned by rcu_dereference() is valid
242 only within the enclosing RCU read-side critical section.
243 For example, the following is -not- legal:
244
245 rcu_read_lock();
246 p = rcu_dereference(head.next);
247 rcu_read_unlock();
248 x = p->address;
249 rcu_read_lock();
250 y = p->data;
251 rcu_read_unlock();
252
253 Holding a reference from one RCU read-side critical section
254 to another is just as illegal as holding a reference from
255 one lock-based critical section to another! Similarly,
256 using a reference outside of the critical section in which
257 it was acquired is just as illegal as doing so with normal
258 locking.
259
260 As with rcu_assign_pointer(), an important function of
261 rcu_dereference() is to document which pointers are protected
262 by RCU. And, again like rcu_assign_pointer(), rcu_dereference()
263 is typically used indirectly, via the _rcu list-manipulation
264 primitives, such as list_for_each_entry_rcu().
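
For readers who want to experiment outside the kernel, the publish and
fetch roles of rcu_assign_pointer() and rcu_dereference() can be loosely
imitated with C11 atomics. The sketch below is an analogy under stated
assumptions, not the kernel implementation: publish_foo() and
read_foo_a() are hypothetical names, and acquire ordering is used on the
read side because it is simple and portable, even though it is stronger
than the dependency ordering that rcu_dereference() relies on.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct foo {
	int a;
};

/* Shared pointer, playing the role of an RCU-protected pointer. */
static _Atomic(struct foo *) gbl_foo;

/*
 * Updater side: initialize the structure fully, then publish it with
 * release ordering so that readers never see uninitialized fields.
 */
static void publish_foo(struct foo *new_fp, int a)
{
	assert(new_fp != NULL);
	new_fp->a = a;
	atomic_store_explicit(&gbl_foo, new_fp, memory_order_release);
}

/*
 * Reader side: fetch the pointer once into a local variable, then
 * dereference the local.  Returns -1 if nothing has been published.
 */
static int read_foo_a(void)
{
	struct foo *p = atomic_load_explicit(&gbl_foo, memory_order_acquire);

	return p ? p->a : -1;
}
```

Note that this sketch covers only the ordering of the pointer update; it
does nothing to keep the old structure alive, which is the job of the
read-side markers and the grace-period primitives.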
265
266The following diagram shows how each API communicates among the
267reader, updater, and reclaimer.
268
269
270         rcu_assign_pointer()
271                             +--------+
272     +---------------------->| reader |---------+
273     |                       +--------+         |
274     |                            |             |
275     |                            |             | Protect:
276     |                            |             | rcu_read_lock()
277     |                            |             | rcu_read_unlock()
278     |        rcu_dereference()   |             |
279 +---------+                      |             |
280 | updater |<---------------------+             |
281 +---------+                                    V
282     |                                    +-----------+
283     +----------------------------------->| reclaimer |
284                                          +-----------+
285       Defer:
286       synchronize_rcu() & call_rcu()
287
288
289The RCU infrastructure observes the time sequence of rcu_read_lock(),
290rcu_read_unlock(), synchronize_rcu(), and call_rcu() invocations in
291order to determine when (1) synchronize_rcu() invocations may return
292to their callers and (2) call_rcu() callbacks may be invoked. Efficient
293implementations of the RCU infrastructure make heavy use of batching in
294order to amortize their overhead over many uses of the corresponding APIs.
295
296There are no fewer than three RCU mechanisms in the Linux kernel; the
297diagram above shows the first one, which is by far the most commonly used.
298The rcu_dereference() and rcu_assign_pointer() primitives are used for
299all three mechanisms, but different defer and protect primitives are
300used as follows:
301
302         Defer                   Protect
303
304 a.      synchronize_rcu()       rcu_read_lock() / rcu_read_unlock()
305         call_rcu()
306
307 b.      call_rcu_bh()           rcu_read_lock_bh() / rcu_read_unlock_bh()
308
309 c.      synchronize_sched()     preempt_disable() / preempt_enable()
310                                 local_irq_save() / local_irq_restore()
311                                 hardirq enter / hardirq exit
312                                 NMI enter / NMI exit
313
314These three mechanisms are used as follows:
315
316a. RCU applied to normal data structures.
317
318b. RCU applied to networking data structures that may be subjected
319 to remote denial-of-service attacks.
320
321c. RCU applied to scheduler and interrupt/NMI-handler tasks.
322
323Again, most uses will be of (a). The (b) and (c) cases are important
324for specialized uses, but are relatively uncommon.
325
326
3273. WHAT ARE SOME EXAMPLE USES OF CORE RCU API?
328
329This section shows a simple use of the core RCU API to protect a
330global pointer to a dynamically allocated structure. More typical
331uses of RCU may be found in listRCU.txt, arrayRCU.txt, and NMI-RCU.txt.
332
333 struct foo {
334 int a;
335 char b;
336 long c;
337 };
338 DEFINE_SPINLOCK(foo_mutex);
339
340 struct foo *gbl_foo;
341
342 /*
343 * Create a new struct foo that is the same as the one currently
344 * pointed to by gbl_foo, except that field "a" is replaced
345 * with "new_a". Points gbl_foo to the new structure, and
346 * frees up the old structure after a grace period.
347 *
348 * Uses rcu_assign_pointer() to ensure that concurrent readers
349 * see the initialized version of the new structure.
350 *
351 * Uses synchronize_rcu() to ensure that any readers that might
352 * have references to the old structure complete before freeing
353 * the old structure.
354 */
355 void foo_update_a(int new_a)
356 {
357 struct foo *new_fp;
358 struct foo *old_fp;
359
360 new_fp = kmalloc(sizeof(*new_fp), GFP_KERNEL);
361 spin_lock(&foo_mutex);
362 old_fp = gbl_foo;
363 *new_fp = *old_fp;
364 new_fp->a = new_a;
365 rcu_assign_pointer(gbl_foo, new_fp);
366 spin_unlock(&foo_mutex);
367 synchronize_rcu();
368 kfree(old_fp);
369 }
370
371 /*
372 * Return the value of field "a" of the current gbl_foo
373 * structure. Use rcu_read_lock() and rcu_read_unlock()
374 * to ensure that the structure does not get deleted out
375 * from under us, and use rcu_dereference() to ensure that
376 * we see the initialized version of the structure (important
377 * for DEC Alpha and for people reading the code).
378 */
379 int foo_get_a(void)
380 {
381 int retval;
382
383 rcu_read_lock();
384 retval = rcu_dereference(gbl_foo)->a;
385 rcu_read_unlock();
386 return retval;
387 }
388
389So, to sum up:
390
391o Use rcu_read_lock() and rcu_read_unlock() to guard RCU
392 read-side critical sections.
393
394o Within an RCU read-side critical section, use rcu_dereference()
395 to dereference RCU-protected pointers.
396
397o Use some solid scheme (such as locks or semaphores) to
398 keep concurrent updates from interfering with each other.
399
400o Use rcu_assign_pointer() to update an RCU-protected pointer.
401 This primitive protects concurrent readers from the updater,
402 -not- concurrent updates from each other! You therefore still
403 need to use locking (or something similar) to keep concurrent
404 rcu_assign_pointer() primitives from interfering with each other.
405
406o Use synchronize_rcu() -after- removing a data element from an
407 RCU-protected data structure, but -before- reclaiming/freeing
408 the data element, in order to wait for the completion of all
409 RCU read-side critical sections that might be referencing that
410 data item.
411
412See checklist.txt for additional rules to follow when using RCU.
413
414
4154. WHAT IF MY UPDATING THREAD CANNOT BLOCK?
416
417In the example above, foo_update_a() blocks until a grace period elapses.
418This is quite simple, but in some cases one cannot afford to wait so
419long -- there might be other high-priority work to be done.
420
421In such cases, one uses call_rcu() rather than synchronize_rcu().
422The call_rcu() API is as follows:
423
424 void call_rcu(struct rcu_head * head,
425 void (*func)(struct rcu_head *head));
426
427This function invokes func(head) after a grace period has elapsed.
428This invocation might happen from either softirq or process context,
429so the function is not permitted to block. The foo struct needs to
430have an rcu_head structure added, perhaps as follows:
431
432 struct foo {
433 int a;
434 char b;
435 long c;
436 struct rcu_head rcu;
437 };
438
439The foo_update_a() function might then be written as follows:
440
441 /*
442 * Create a new struct foo that is the same as the one currently
443 * pointed to by gbl_foo, except that field "a" is replaced
444 * with "new_a". Points gbl_foo to the new structure, and
445 * frees up the old structure after a grace period.
446 *
447 * Uses rcu_assign_pointer() to ensure that concurrent readers
448 * see the initialized version of the new structure.
449 *
450 * Uses call_rcu() to ensure that any readers that might have
451 * references to the old structure complete before freeing the
452 * old structure.
453 */
454 void foo_update_a(int new_a)
455 {
456 struct foo *new_fp;
457 struct foo *old_fp;
458
459 new_fp = kmalloc(sizeof(*new_fp), GFP_KERNEL);
460 spin_lock(&foo_mutex);
461 old_fp = gbl_foo;
462 *new_fp = *old_fp;
463 new_fp->a = new_a;
464 rcu_assign_pointer(gbl_foo, new_fp);
465 spin_unlock(&foo_mutex);
466 call_rcu(&old_fp->rcu, foo_reclaim);
467 }
468
469The foo_reclaim() function might appear as follows:
470
471 void foo_reclaim(struct rcu_head *rp)
472 {
473 struct foo *fp = container_of(rp, struct foo, rcu);
474
475 kfree(fp);
476 }
477
478The container_of() primitive is a macro that, given a pointer into a
479struct, the type of the struct, and the pointed-to field within the
480struct, returns a pointer to the beginning of the struct.
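
The pointer arithmetic behind container_of() can be tried out in user
space with a simplified version of the macro. The sketch below assumes a
stand-in struct rcu_head whose fields are illustrative only, and omits
the type checking that the kernel's macro performs; foo_from_rcu_head()
is a hypothetical helper name.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the kernel's struct rcu_head; fields are illustrative. */
struct rcu_head {
	struct rcu_head *next;
	void (*func)(struct rcu_head *head);
};

struct foo {
	int a;
	char b;
	long c;
	struct rcu_head rcu;
};

/*
 * Simplified container_of(): subtract the member's offset from the
 * member's address to get back to the start of the enclosing struct.
 */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Recover the struct foo that embeds the given rcu_head. */
static struct foo *foo_from_rcu_head(struct rcu_head *rp)
{
	assert(rp != NULL);
	return container_of(rp, struct foo, rcu);
}
```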
481
482The use of call_rcu() permits the caller of foo_update_a() to
483immediately regain control, without needing to worry further about the
484old version of the newly updated element. It also clearly shows the
485RCU distinction between updater, namely foo_update_a(), and reclaimer,
486namely foo_reclaim().
487
488The summary of advice is the same as for the previous section, except
489that we are now using call_rcu() rather than synchronize_rcu():
490
491o Use call_rcu() -after- removing a data element from an
492 RCU-protected data structure in order to register a callback
493 function that will be invoked after the completion of all RCU
494 read-side critical sections that might be referencing that
495 data item.
496
497Again, see checklist.txt for additional rules governing the use of RCU.
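
To get a feel for the callback mechanics, here is a single-threaded
user-space model of the call_rcu() pattern. It is a sketch under loud
assumptions: toy_call_rcu(), toy_grace_period_ended(), and foo_retire()
are hypothetical names, the struct rcu_head layout is illustrative, and
the caller simply declares when the grace period is over rather than
having it detected.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct rcu_head {
	struct rcu_head *next;
	void (*func)(struct rcu_head *head);
};

struct foo {
	int a;
	struct rcu_head rcu;
};

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

static struct rcu_head *pending;	/* callbacks awaiting a grace period */
static int reclaimed;			/* callbacks run so far, for illustration */

/* Toy call_rcu(): remember the callback and return without blocking. */
static void toy_call_rcu(struct rcu_head *head,
			 void (*func)(struct rcu_head *head))
{
	head->func = func;
	head->next = pending;
	pending = head;
}

/*
 * Hypothetical hook: the caller asserts that a grace period has ended,
 * so every queued callback may now be invoked.
 */
static void toy_grace_period_ended(void)
{
	struct rcu_head *list = pending;

	pending = NULL;
	while (list) {
		struct rcu_head *next = list->next;	/* func may free list */

		list->func(list);
		list = next;
	}
}

/* Reclaimer: free the structure that embeds the rcu_head. */
static void foo_reclaim(struct rcu_head *rp)
{
	struct foo *fp = container_of(rp, struct foo, rcu);

	free(fp);
	reclaimed++;
}

/* Updater: queue the old version for deferred freeing and return. */
static void foo_retire(struct foo *old_fp)
{
	assert(old_fp != NULL);
	toy_call_rcu(&old_fp->rcu, foo_reclaim);
}
```

The point of the model is the split in time: foo_retire() returns at
once, and the memory is freed only when the grace period is declared
over.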
498
499
5005. WHAT ARE SOME SIMPLE IMPLEMENTATIONS OF RCU?
501
502One of the nice things about RCU is that it has extremely simple "toy"
503implementations that are a good first step towards understanding the
504production-quality implementations in the Linux kernel. This section
505presents two such "toy" implementations of RCU, one that is implemented
506in terms of familiar locking primitives, and another that more closely
507resembles "classic" RCU. Both are way too simple for real-world use,
508lacking both functionality and performance. However, they are useful
509in getting a feel for how RCU works. See kernel/rcupdate.c for a
510production-quality implementation, and see:
511
512 http://www.rdrop.com/users/paulmck/RCU
513
514for papers describing the Linux kernel RCU implementation. The OLS'01
515and OLS'02 papers are a good introduction, and the dissertation provides
516more details on the current implementation.
517
518
5195A. "TOY" IMPLEMENTATION #1: LOCKING
520
521This section presents a "toy" RCU implementation that is based on
522familiar locking primitives. Its overhead makes it a non-starter for
523real-life use, as does its lack of scalability. It is also unsuitable
524for realtime use, since it allows scheduling latency to "bleed" from
525one read-side critical section to another.
526
527However, it is probably the easiest implementation to relate to, so is
528a good starting point.
529
530It is extremely simple:
531
532 static DEFINE_RWLOCK(rcu_gp_mutex);
533
534 void rcu_read_lock(void)
535 {
536 read_lock(&rcu_gp_mutex);
537 }
538
539 void rcu_read_unlock(void)
540 {
541 read_unlock(&rcu_gp_mutex);
542 }
543
544 void synchronize_rcu(void)
545 {
546 write_lock(&rcu_gp_mutex);
547 write_unlock(&rcu_gp_mutex);
548 }
549
550[You can ignore rcu_assign_pointer() and rcu_dereference() without
551missing much. But here they are anyway. And whatever you do, don't
552forget about them when submitting patches making use of RCU!]
553
554 #define rcu_assign_pointer(p, v) ({ \
555 smp_wmb(); \
556 (p) = (v); \
557 })
558
559 #define rcu_dereference(p) ({ \
560 typeof(p) _________p1 = p; \
561 smp_read_barrier_depends(); \
562 (_________p1); \
563 })
564
565
566The rcu_read_lock() and rcu_read_unlock() primitives read-acquire
567and release a global reader-writer lock. The synchronize_rcu()
568primitive write-acquires this same lock, then immediately releases
569it. This means that once synchronize_rcu() exits, all RCU read-side
570critical sections that were in progress before synchronize_rcu() was
571called are guaranteed to have completed -- there is no way that
572synchronize_rcu() would have been able to write-acquire the lock
573otherwise.
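
The same idea can be tried in user space with a POSIX reader-writer
lock. The sketch below is an assumption-laden translation, not kernel
code: the toy_ prefixes are hypothetical, pthread locks stand in for the
kernel's rwlock and spinlock, malloc()/free() stand in for
kmalloc()/kfree(), and C11 atomics stand in for rcu_assign_pointer() and
rcu_dereference().

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

struct foo {
	int a;
};

static pthread_rwlock_t rcu_gp_mutex = PTHREAD_RWLOCK_INITIALIZER;
static pthread_mutex_t foo_mutex = PTHREAD_MUTEX_INITIALIZER;
static _Atomic(struct foo *) gbl_foo;

static void toy_rcu_read_lock(void)
{
	pthread_rwlock_rdlock(&rcu_gp_mutex);
}

static void toy_rcu_read_unlock(void)
{
	pthread_rwlock_unlock(&rcu_gp_mutex);
}

/*
 * Write-acquire, then release: this can return only after every reader
 * that held the lock beforehand has dropped it.
 */
static void toy_synchronize_rcu(void)
{
	int rc = pthread_rwlock_wrlock(&rcu_gp_mutex);

	assert(rc == 0);
	(void)rc;
	pthread_rwlock_unlock(&rcu_gp_mutex);
}

/* Reader: returns field "a", or -1 if nothing has been published yet. */
static int foo_get_a(void)
{
	struct foo *fp;
	int retval;

	toy_rcu_read_lock();
	fp = atomic_load_explicit(&gbl_foo, memory_order_acquire);
	retval = fp ? fp->a : -1;
	toy_rcu_read_unlock();
	return retval;
}

/* Updater: copy, modify, publish, wait for readers, then free. */
static int foo_update_a(int new_a)
{
	struct foo *new_fp = malloc(sizeof(*new_fp));
	struct foo *old_fp;

	if (!new_fp)
		return -1;
	pthread_mutex_lock(&foo_mutex);
	old_fp = atomic_load_explicit(&gbl_foo, memory_order_relaxed);
	if (old_fp)
		*new_fp = *old_fp;
	new_fp->a = new_a;
	atomic_store_explicit(&gbl_foo, new_fp, memory_order_release);
	pthread_mutex_unlock(&foo_mutex);
	toy_synchronize_rcu();	/* wait out readers of old_fp */
	free(old_fp);
	return 0;
}
```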
574
575It is possible to nest rcu_read_lock(), since reader-writer locks may
576be recursively acquired. Note also that rcu_read_lock() is immune
577from deadlock (an important property of RCU). The reason for this is
578that the only thing that can block rcu_read_lock() is a synchronize_rcu().
579But synchronize_rcu() does not acquire any locks while holding rcu_gp_mutex,
580so there can be no deadlock cycle.
581
582Quick Quiz #1: Why is this argument naive? How could a deadlock
583 occur when using this algorithm in a real-world Linux
584 kernel? How could this deadlock be avoided?
585
586
5875B. "TOY" EXAMPLE #2: CLASSIC RCU
588
589This section presents a "toy" RCU implementation that is based on
590"classic RCU". It is also short on performance (but only for updates) and
591on features such as hotplug CPU and the ability to run in CONFIG_PREEMPT
592kernels. The definitions of rcu_dereference() and rcu_assign_pointer()
593are the same as those shown in the preceding section, so they are omitted.
594
595 void rcu_read_lock(void) { }
596
597 void rcu_read_unlock(void) { }
598
599 void synchronize_rcu(void)
600 {
601 int cpu;
602
603 for_each_cpu(cpu)
604 run_on(cpu);
605 }
606
607Note that rcu_read_lock() and rcu_read_unlock() do absolutely nothing.
608This is the great strength of classic RCU in a non-preemptive kernel:
609read-side overhead is precisely zero, at least on non-Alpha CPUs.
610And there is absolutely no way that rcu_read_lock() can possibly
611participate in a deadlock cycle!
612
613The implementation of synchronize_rcu() simply schedules itself on each
614CPU in turn. The run_on() primitive can be implemented straightforwardly
615in terms of the sched_setaffinity() primitive. Of course, a somewhat less
616"toy" implementation would restore the affinity upon completion rather
617than just leaving all tasks running on the last CPU, but when I said
618"toy", I meant -toy-!
619
620So how the heck is this supposed to work???
621
622Remember that it is illegal to block while in an RCU read-side critical
623section. Therefore, if a given CPU executes a context switch, we know
624that it must have completed all preceding RCU read-side critical sections.
625Once -all- CPUs have executed a context switch, then -all- preceding
626RCU read-side critical sections will have completed.
627
628So, suppose that we remove a data item from its structure and then invoke
629synchronize_rcu(). Once synchronize_rcu() returns, we are guaranteed
630that there are no RCU read-side critical sections holding a reference
631to that data item, so we can safely reclaim it.
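
The reasoning above boils down to simple bookkeeping: a grace period has
elapsed once every CPU has passed through at least one context switch.
The single-threaded simulation below illustrates just that bookkeeping.
It is not how the toy synchronize_rcu() shown earlier works (that one
migrates itself across CPUs), and NR_CPUS, note_context_switch(),
gp_start(), and gp_elapsed() are hypothetical names chosen for this
sketch.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 4	/* illustrative fixed CPU count */

/* Per-CPU count of context switches observed so far. */
static unsigned long ctxt_switches[NR_CPUS];

struct gp_snapshot {
	unsigned long seen[NR_CPUS];
};

/* Called (conceptually) by the scheduler on each context switch. */
static void note_context_switch(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	ctxt_switches[cpu]++;
}

/* Start of a grace period: record where every CPU currently stands. */
static void gp_start(struct gp_snapshot *snap)
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		snap->seen[cpu] = ctxt_switches[cpu];
}

/*
 * The grace period is over once every CPU has switched context at
 * least once since the snapshot was taken.
 */
static bool gp_elapsed(const struct gp_snapshot *snap)
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		if (ctxt_switches[cpu] == snap->seen[cpu])
			return false;
	return true;
}
```

Because readers may not block, a CPU that has switched context since the
snapshot cannot still be inside a read-side critical section that began
before it.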
632
633Quick Quiz #2: Give an example where Classic RCU's read-side
634 overhead is -negative-.
635
636Quick Quiz #3: If it is illegal to block in an RCU read-side
637 critical section, what the heck do you do in
638 PREEMPT_RT, where normal spinlocks can block???
639
640
6416. ANALOGY WITH READER-WRITER LOCKING
642
643Although RCU can be used in many different ways, a very common use of
644RCU is analogous to reader-writer locking. The following unified
645diff shows how closely related RCU and reader-writer locking can be.
646
647 @@ -13,15 +14,15 @@
648 struct list_head *lp;
649 struct el *p;
650
651 - read_lock();
652 - list_for_each_entry(p, head, lp) {
653 + rcu_read_lock();
654 + list_for_each_entry_rcu(p, head, lp) {
655 if (p->key == key) {
656 *result = p->data;
657 - read_unlock();
658 + rcu_read_unlock();
659 return 1;
660 }
661 }
662 - read_unlock();
663 + rcu_read_unlock();
664 return 0;
665 }
666
667 @@ -29,15 +30,16 @@
668 {
669 struct el *p;
670
671 - write_lock(&listmutex);
672 + spin_lock(&listmutex);
673 list_for_each_entry(p, head, lp) {
674 if (p->key == key) {
675 list_del(&p->list);
676 - write_unlock(&listmutex);
677 + spin_unlock(&listmutex);
678 + synchronize_rcu();
679 kfree(p);
680 return 1;
681 }
682 }
683 - write_unlock(&listmutex);
684 + spin_unlock(&listmutex);
685 return 0;
686 }
687
688Or, for those who prefer a side-by-side listing:
689
690  1 struct el {                            1 struct el {
691  2   struct list_head list;               2   struct list_head list;
692  3   long key;                            3   long key;
693  4   spinlock_t mutex;                    4   spinlock_t mutex;
694  5   int data;                            5   int data;
695  6   /* Other data fields */              6   /* Other data fields */
696  7 };                                     7 };
697  8 spinlock_t listmutex;                  8 spinlock_t listmutex;
698  9 struct el head;                        9 struct el head;
699
700  1 int search(long key, int *result)      1 int search(long key, int *result)
701  2 {                                      2 {
702  3   struct list_head *lp;                3   struct list_head *lp;
703  4   struct el *p;                        4   struct el *p;
704  5                                        5
705  6   read_lock();                         6   rcu_read_lock();
706  7   list_for_each_entry(p, head, lp) {   7   list_for_each_entry_rcu(p, head, lp) {
707  8     if (p->key == key) {               8     if (p->key == key) {
708  9       *result = p->data;               9       *result = p->data;
709 10       read_unlock();                  10       rcu_read_unlock();
710 11       return 1;                       11       return 1;
711 12     }                                 12     }
712 13   }                                   13   }
713 14   read_unlock();                      14   rcu_read_unlock();
714 15   return 0;                           15   return 0;
715 16 }                                     16 }
716
717  1 int delete(long key)                   1 int delete(long key)
718  2 {                                      2 {
719  3   struct el *p;                        3   struct el *p;
720  4                                        4
721  5   write_lock(&listmutex);              5   spin_lock(&listmutex);
722  6   list_for_each_entry(p, head, lp) {   6   list_for_each_entry(p, head, lp) {
723  7     if (p->key == key) {               7     if (p->key == key) {
724  8       list_del(&p->list);              8       list_del(&p->list);
725  9       write_unlock(&listmutex);        9       spin_unlock(&listmutex);
726                                          10       synchronize_rcu();
727 10       kfree(p);                       11       kfree(p);
728 11       return 1;                       12       return 1;
729 12     }                                 13     }
730 13   }                                   14   }
731 14   write_unlock(&listmutex);           15   spin_unlock(&listmutex);
732 15   return 0;                           16   return 0;
733 16 }                                     17 }
734
735Either way, the differences are quite small. Read-side locking moves
736to rcu_read_lock() and rcu_read_unlock(), update-side locking moves
737from a reader-writer lock to a simple spinlock, and a synchronize_rcu()
738precedes the kfree().
739
740However, there is one potential catch: the read-side and update-side
741critical sections can now run concurrently. In many cases, this will
742not be a problem, but it is necessary to check carefully regardless.
743For example, if multiple independent list updates must be seen as
744a single atomic update, converting to RCU will require special care.
745
746Also, the presence of synchronize_rcu() means that the RCU version of
747delete() can now block. If this is a problem, there is a callback-based
748mechanism that never blocks, namely call_rcu(), that can be used in
749place of synchronize_rcu().
750
751
7527. FULL LIST OF RCU APIs
753
754The RCU APIs are documented in docbook-format header comments in the
755Linux-kernel source code, but it helps to have a full list of the
756APIs, since there does not appear to be a way to categorize them
757in docbook. Here is the list, by category.
758
759Markers for RCU read-side critical sections:
760
761 rcu_read_lock
762 rcu_read_unlock
763 rcu_read_lock_bh
764 rcu_read_unlock_bh
765
766RCU pointer/list traversal:
767
768 rcu_dereference
769 list_for_each_rcu (to be deprecated in favor of
770 list_for_each_entry_rcu)
771 list_for_each_safe_rcu (deprecated, not used)
772 list_for_each_entry_rcu
773 list_for_each_continue_rcu (to be deprecated in favor of new
774 list_for_each_entry_continue_rcu)
775 hlist_for_each_rcu (to be deprecated in favor of
776 hlist_for_each_entry_rcu)
777 hlist_for_each_entry_rcu
778
779RCU pointer update:
780
781 rcu_assign_pointer
782 list_add_rcu
783 list_add_tail_rcu
784 list_del_rcu
785 list_replace_rcu
786 hlist_del_rcu
787 hlist_add_head_rcu
788
789RCU grace period:
790
791 synchronize_kernel (deprecated)
792 synchronize_net
793 synchronize_sched
794 synchronize_rcu
795 call_rcu
796 call_rcu_bh
797
798See the comment headers in the source code (or the docbook generated
799from them) for more information.
800
801
8028. ANSWERS TO QUICK QUIZZES
803
804Quick Quiz #1: Why is this argument naive? How could a deadlock
805 occur when using this algorithm in a real-world Linux
806 kernel? [Referring to the lock-based "toy" RCU
807 algorithm.]
808
809Answer: Consider the following sequence of events:
810
811 1. CPU 0 acquires some unrelated lock, call it
812 "problematic_lock".
813
814 2. CPU 1 enters synchronize_rcu(), write-acquiring
815 rcu_gp_mutex.
816
817 3. CPU 0 enters rcu_read_lock(), but must wait
818 because CPU 1 holds rcu_gp_mutex.
819
820 4. CPU 1 is interrupted, and the irq handler
821 attempts to acquire problematic_lock.
822
823 The system is now deadlocked.
824
825 One way to avoid this deadlock is to use an approach like
826 that of CONFIG_PREEMPT_RT, where all normal spinlocks
827 become blocking locks, and all irq handlers execute in
828 the context of special tasks. In this case, in step 4
829 above, the irq handler would block, allowing CPU 1 to
830 release rcu_gp_mutex, avoiding the deadlock.
831
832 Even in the absence of deadlock, this RCU implementation
833 allows latency to "bleed" from readers to other
834 readers through synchronize_rcu(). To see this,
835 consider task A in an RCU read-side critical section
836 (thus read-holding rcu_gp_mutex), task B blocked
837 attempting to write-acquire rcu_gp_mutex, and
838 task C blocked in rcu_read_lock() attempting to
839 read-acquire rcu_gp_mutex. Task A's RCU read-side
840 latency is holding up task C, albeit indirectly via
841 task B.
842
843 Realtime RCU implementations therefore use a counter-based
844 approach where tasks in RCU read-side critical sections
845 cannot be blocked by tasks executing synchronize_rcu().
846
847Quick Quiz #2: Give an example where Classic RCU's read-side
848 overhead is -negative-.
849
850Answer: Imagine a single-CPU system with a non-CONFIG_PREEMPT
851 kernel where a routing table is used by process-context
852 code, but can be updated by irq-context code (for example,
853 by an "ICMP REDIRECT" packet). The usual way of handling
854 this would be to have the process-context code disable
855 interrupts while searching the routing table. Use of
856 RCU allows such interrupt-disabling to be dispensed with.
857 Thus, without RCU, you pay the cost of disabling interrupts,
858 and with RCU you don't.
859
860 One can argue that the overhead of RCU in this
861 case is negative with respect to the single-CPU
862 interrupt-disabling approach. Others might argue that
863 the overhead of RCU is merely zero, and that replacing
864 the positive overhead of the interrupt-disabling scheme
865 with the zero-overhead RCU scheme does not constitute
866 negative overhead.
867
868 In real life, of course, things are more complex. But
869 even the theoretical possibility of negative overhead for
870 a synchronization primitive is a bit unexpected. ;-)
871
872Quick Quiz #3: If it is illegal to block in an RCU read-side
873 critical section, what the heck do you do in
874 PREEMPT_RT, where normal spinlocks can block???
875
876Answer: Just as PREEMPT_RT permits preemption of spinlock
877 critical sections, it permits preemption of RCU
878 read-side critical sections. It also permits
879 spinlocks blocking while in RCU read-side critical
880 sections.
881
882 Why the apparent inconsistency? Because it is
883 possible to use priority boosting to keep the RCU
884 grace periods short if need be (for example, if running
885 short of memory). In contrast, if blocking waiting
886 for (say) network reception, there is no way to know
887 what should be boosted. Especially given that the
888 process we need to boost might well be a human being
889 who just went out for a pizza or something. And although
890 a computer-operated cattle prod might arouse serious
891 interest, it might also provoke serious objections.
892 Besides, how does the computer know what pizza parlor
893 the human being went to???
894
895
896ACKNOWLEDGEMENTS
897
898My thanks to the people who helped make this human-readable, including
899Jon Walpole, Josh Triplett, Serge Hallyn, and Suzanne Wood.
900
901
902For more information, see http://www.rdrop.com/users/paulmck/RCU.
diff --git a/Documentation/applying-patches.txt b/Documentation/applying-patches.txt
new file mode 100644
index 000000000000..681e426e2482
--- /dev/null
+++ b/Documentation/applying-patches.txt
@@ -0,0 +1,439 @@
1
2 Applying Patches To The Linux Kernel
3 ------------------------------------
4
5 (Written by Jesper Juhl, August 2005)
6
7
8
9A frequently asked question on the Linux Kernel Mailing List is how to apply
10a patch to the kernel or, more specifically, what base kernel a patch for
11one of the many trees/branches should be applied to. Hopefully this document
12will explain this to you.
13
14In addition to explaining how to apply and revert patches, a brief
15description of the different kernel trees (and examples of how to apply
16their specific patches) is also provided.
17
18
19What is a patch?
20---
21 A patch is a small text document containing a delta of changes between two
22different versions of a source tree. Patches are created with the `diff'
23program.
24To correctly apply a patch you need to know what base it was generated from
25and what new version the patch will change the source tree into. These
26should both be present in the patch file metadata or be possible to deduce
27from the filename.
28
29
30How do I apply or revert a patch?
31---
32 You apply a patch with the `patch' program. The patch program reads a diff
33(or patch) file and makes the changes to the source tree described in it.
34
35Patches for the Linux kernel are generated relative to the parent directory
36holding the kernel source dir.
37
38This means that paths to files inside the patch file contain the name of the
39kernel source directories it was generated against (or some other directory
40names like "a/" and "b/").
41Since this is unlikely to match the name of the kernel source dir on your
42local machine (but is often useful info to see what version an otherwise
43unlabeled patch was generated against) you should change into your kernel
44source directory and then strip the first element of the path from filenames
45in the patch file when applying it (the -p1 argument to `patch' does this).
46
47To revert a previously applied patch, use the -R argument to patch.
48So, if you applied a patch like this:
49 patch -p1 < ../patch-x.y.z
50
51You can revert (undo) it like this:
52 patch -R -p1 < ../patch-x.y.z
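
The apply/revert cycle described above can be tried out on a throwaway tree.
This is only a sketch: the directory and file names (linux-old, linux-new,
tree, patch-demo) are made up for illustration and GNU diff and patch are
assumed to be installed.

```shell
#!/bin/sh
# Sketch: build a tiny patch with diff, apply it with -p1, then revert it.
# All names below are hypothetical and exist only for this demonstration.
set -e
work=$(mktemp -d)
cd "$work"

mkdir -p linux-old linux-new
printf 'one\ntwo\nthree\n' > linux-old/file.txt
printf 'one\nTWO\nthree\n' > linux-new/file.txt

# diff exits with status 1 when the trees differ, so keep set -e from aborting.
diff -urN linux-old linux-new > patch-demo || true

# Work on a copy of the old tree; -p1 strips the leading "linux-old/" or
# "linux-new/" element from the paths recorded in the patch.
cp -r linux-old tree
cd tree
patch -p1 < ../patch-demo
applied=$(cat file.txt)

# Revert the same patch with -R.
patch -R -p1 < ../patch-demo
reverted=$(cat file.txt)
cd ..
```

After the first patch run the copy matches linux-new, and after the -R run
it matches linux-old again.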
53
54
55How do I feed a patch/diff file to `patch'?
56---
57 This (as usual with Linux and other UNIX like operating systems) can be
58done in several different ways.
59In all the examples below I feed the file (in uncompressed form) to patch
60via stdin using the following syntax:
61 patch -p1 < path/to/patch-x.y.z
62
63If you just want to be able to follow the examples below and don't want to
64know of more than one way to use patch, then you can stop reading this
65section here.
66
67Patch can also get the name of the file to use via the -i argument, like
68this:
69 patch -p1 -i path/to/patch-x.y.z
70
71If your patch file is compressed with gzip or bzip2 and you don't want to
72uncompress it before applying it, then you can feed it to patch like this
73instead:
74 zcat path/to/patch-x.y.z.gz | patch -p1
75 bzcat path/to/patch-x.y.z.bz2 | patch -p1
76
77If you wish to uncompress the patch file by hand first before applying it
78(what I assume you've done in the examples below), then you simply run
79gunzip or bunzip2 on the file - like this:
80 gunzip patch-x.y.z.gz
81 bunzip2 patch-x.y.z.bz2
82
83Which will leave you with a plain text patch-x.y.z file that you can feed to
84patch via stdin or the -i argument, as you prefer.
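
As a rough illustration of the alternatives above, the following sketch
applies the same small patch once via -i and once from a gzip compressed
copy via zcat. The file names (demo.patch, tree1, tree2) are invented for
the example and GNU patch and gzip are assumed.

```shell
#!/bin/sh
# Sketch: feed the same patch to patch via -i and via a zcat pipe.
# All file and directory names are hypothetical.
set -e
work=$(mktemp -d)
cd "$work"

mkdir a b tree1 tree2
printf 'x\n' > a/f
printf 'y\n' > b/f
diff -u a/f b/f > demo.patch || true    # diff returns 1 when files differ
gzip -c demo.patch > demo.patch.gz      # keep the uncompressed copy as well

cp a/f tree1/f
cp a/f tree2/f

# Uncompressed patch, named with -i.
(cd tree1 && patch -p1 -i ../demo.patch)

# Compressed patch, decompressed on the fly and read from stdin.
(cd tree2 && zcat ../demo.patch.gz | patch -p1)

via_i=$(cat tree1/f)
via_zcat=$(cat tree2/f)
```

Both trees should end up with the same content; which form you use is
purely a matter of taste.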
85
86A few other nice arguments for patch are -s which causes patch to be silent
87except for errors which is nice to prevent errors from scrolling out of the
88screen too fast, and --dry-run which causes patch to just print a listing of
89what would happen, but doesn't actually make any changes. Finally --verbose
90tells patch to print more information about the work being done.
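
One way to combine these arguments is to do a silent dry run first and only
apply the patch for real if the dry run succeeds. The sketch below assumes
GNU patch; the names demo.patch and tree are made up.

```shell
#!/bin/sh
# Sketch: check a patch with --dry-run before really applying it.
# File and directory names are hypothetical.
set -e
work=$(mktemp -d)
cd "$work"

mkdir a b
printf 'hello\n' > a/f
printf 'goodbye\n' > b/f
diff -u a/f b/f > demo.patch || true    # diff returns 1 when files differ

cp -r a tree
cd tree

# -s keeps patch quiet unless something goes wrong; --dry-run changes nothing.
if patch -p1 -s --dry-run < ../demo.patch; then
    dry_ok=yes
else
    dry_ok=no
fi
after_dry=$(cat f)

# The dry run was clean, so apply the patch for real.
if [ "$dry_ok" = yes ]; then
    patch -p1 -s < ../demo.patch
fi
after_real=$(cat f)
```

The file is untouched after the dry run and only changes on the second
invocation.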
91
92
93Common errors when patching
94---
95 When patch applies a patch file it attempts to verify the sanity of the
96file in different ways.
97Checking that the file looks like a valid patch file and checking that the
98code around the bits being modified matches the context provided in the patch
99are just two of the basic sanity checks patch does.
100
101If patch encounters something that doesn't look quite right it has two
102options. It can either refuse to apply the changes and abort or it can try
103to find a way to make the patch apply with a few minor changes.
104
105One example of something that's not 'quite right' that patch will attempt to
106fix up is if all the context matches, the lines being changed match, but the
107line numbers are different. This can happen, for example, if the patch makes
108a change in the middle of the file but for some reason a few lines have
109been added or removed near the beginning of the file. In that case
110everything looks good, it has just moved up or down a bit, and patch will
111usually adjust the line numbers and apply the patch.
112
113Whenever patch applies a patch that it had to modify a bit to make it fit
114it'll tell you about it by saying the patch applied with 'fuzz'.
115You should be wary of such changes since even though patch probably got it
116right it doesn't /always/ get it right, and the result will sometimes be
117wrong.
118
119When patch encounters a change that it can't fix up with fuzz it rejects it
120outright and leaves a file with a .rej extension (a reject file). You can
121read this file to see exactly what change couldn't be applied, so you can
122go fix it up by hand if you wish.
123
124If you don't have any third party patches applied to your kernel source, but
125only patches from kernel.org and you apply the patches in the correct order,
126and have made no modifications yourself to the source files, then you should
127never see a fuzz or reject message from patch. If you do see such messages
128anyway, then there's a high risk that either your local source tree or the
129patch file is corrupted in some way. In that case you should probably try
130redownloading the patch and if things are still not OK then you'd be advised
131to start with a fresh tree downloaded in full from kernel.org.
132
133Let's look a bit more at some of the messages patch can produce.
134
135If patch stops and presents a "File to patch:" prompt, then patch could not
136find a file to be patched. Most likely you forgot to specify -p1 or you are
137in the wrong directory. Less often, you'll find patches that need to be
138applied with -p0 instead of -p1 (reading the patch file should reveal if
139this is the case - if so, then this is an error by the person who created
140the patch but is not fatal).
141
142If you get "Hunk #2 succeeded at 1887 with fuzz 2 (offset 7 lines)." or a
143message similar to that, then it means that patch had to adjust the location
144of the change (in this example it needed to move 7 lines from where it
145expected to make the change to make it fit).
146The resulting file may or may not be OK, depending on the reason the file
147was different than expected.
148This often happens if you try to apply a patch that was generated against a
149different kernel version than the one you are trying to patch.
150
151If you get a message like "Hunk #3 FAILED at 2387.", then it means that the
152patch could not be applied correctly and the patch program was unable to
153fuzz its way through. This will generate a .rej file with the change that
154caused the patch to fail and also a .orig file showing you the original
155content that couldn't be changed.
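
To see what a rejected hunk looks like without risking a real kernel tree,
something like the following sketch can be used. It deliberately applies a
patch to a file whose content does not match the patch context at all. The
names (demo.patch, tree) are invented and GNU patch is assumed.

```shell
#!/bin/sh
# Sketch: provoke a "Hunk #1 FAILED" message and look for the reject file.
# File and directory names are hypothetical.
set -e
work=$(mktemp -d)
cd "$work"

mkdir old new tree
printf 'alpha\nbeta\ngamma\n' > old/f
printf 'alpha\nBETA\ngamma\n' > new/f
diff -urN old new > demo.patch || true  # diff returns 1 when trees differ

# The file to be patched shares no lines with the patch context.
printf 'something\nentirely\ndifferent\n' > tree/f

cd tree
if patch -p1 < ../demo.patch; then
    status=applied
else
    status=rejected                     # patch exits non-zero on failed hunks
fi
ls
```

The rejected change ends up in f.rej next to the file that could not be
patched, and the file itself is left as it was.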
156
157If you get "Reversed (or previously applied) patch detected! Assume -R? [n]"
158then patch detected that the change contained in the patch seems to have
159already been made.
160If you actually did apply this patch previously and you just re-applied it
161in error, then just say [n]o and abort this patch. If you applied this patch
162previously and actually intended to revert it, but forgot to specify -R,
163then you can say [y]es here to make patch revert it for you.
164This can also happen if the creator of the patch reversed the source and
165destination directories when creating the patch, and in that case reverting
166the patch will in fact apply it.
167
168A message similar to "patch: **** unexpected end of file in patch" or "patch
169unexpectedly ends in middle of line" means that patch could make no sense of
170the file you fed to it. Either your download is broken or you tried to feed
171patch a compressed patch file without uncompressing it first.
172
173As I already mentioned above, these errors should never happen if you apply
174a patch from kernel.org to the correct version of an unmodified source tree.
175So if you get these errors with kernel.org patches then you should probably
176assume that either your patch file or your tree is broken and I'd advise you
177to start over with a fresh download of a full kernel tree and the patch you
178wish to apply.
179
180
181Are there any alternatives to `patch'?
182---
183 Yes there are alternatives. You can use the `interdiff' program
184(http://cyberelk.net/tim/patchutils/) to generate a patch representing the
185differences between two patches and then apply the result.
186This will let you move from something like 2.6.12.2 to 2.6.12.3 in a single
187step. The -z flag to interdiff will even let you feed it patches in gzip or
188bzip2 compressed form directly without the use of zcat or bzcat or manual
189decompression.
190
191Here's how you'd go from 2.6.12.2 to 2.6.12.3 in a single step:
192 interdiff -z ../patch-2.6.12.2.bz2 ../patch-2.6.12.3.gz | patch -p1
193
194Although interdiff may save you a step or two you are generally advised to
195do the additional steps since interdiff can get things wrong in some cases.
196
197 Another alternative is `ketchup', which is a python script for automatic
198downloading and applying of patches (http://www.selenic.com/ketchup/).
199
200Other nice tools are diffstat which shows a summary of changes made by a
201patch, lsdiff which displays a short listing of affected files in a patch
202file, along with (optionally) the line numbers of the start of each patch
203and grepdiff which displays a list of the files modified by a patch where
204the patch contains a given regular expression.
205
206
207Where can I download the patches?
208---
209 The patches are available at http://kernel.org/
210Most recent patches are linked from the front page, but they also have
211specific homes.
212
213The 2.6.x.y (-stable) and 2.6.x patches live at
214 ftp://ftp.kernel.org/pub/linux/kernel/v2.6/
215
216The -rc patches live at
217 ftp://ftp.kernel.org/pub/linux/kernel/v2.6/testing/
218
219The -git patches live at
220 ftp://ftp.kernel.org/pub/linux/kernel/v2.6/snapshots/
221
222The -mm kernels live at
223 ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/
224
225In place of ftp.kernel.org you can use ftp.cc.kernel.org, where cc is a
226country code. This way you'll be downloading from a mirror site that's most
227likely geographically closer to you, resulting in faster downloads for you,
228less bandwidth used globally and less load on the main kernel.org servers -
229these are good things, do use mirrors when possible.
230
231
232The 2.6.x kernels
233---
234 These are the base stable releases released by Linus. The highest numbered
235release is the most recent.
236
237If regressions or other serious flaws are found then a -stable fix patch
238will be released (see below) on top of this base. Once a new 2.6.x base
239kernel is released, a patch is made available that is a delta between the
240previous 2.6.x kernel and the new one.
241
242To apply a patch moving from 2.6.11 to 2.6.12 you'd do the following (note
243that such patches do *NOT* apply on top of 2.6.x.y kernels but on top of the
244base 2.6.x kernel - if you need to move from 2.6.x.y to 2.6.x+1 you need to
245first revert the 2.6.x.y patch).
246
247Here are some examples:
248
249# moving from 2.6.11 to 2.6.12
250$ cd ~/linux-2.6.11 # change to kernel source dir
251$ patch -p1 < ../patch-2.6.12 # apply the 2.6.12 patch
252$ cd ..
253$ mv linux-2.6.11 linux-2.6.12 # rename source dir
254
255# moving from 2.6.11.1 to 2.6.12
256$ cd ~/linux-2.6.11.1 # change to kernel source dir
257$ patch -p1 -R < ../patch-2.6.11.1 # revert the 2.6.11.1 patch
258 # source dir is now 2.6.11
259$ patch -p1 < ../patch-2.6.12 # apply new 2.6.12 patch
260$ cd ..
261$ mv linux-2.6.11.1 linux-2.6.12 # rename source dir
262
263
264The 2.6.x.y kernels
265---
266 Kernels with 4 digit versions are -stable kernels. They contain small(ish)
267critical fixes for security problems or significant regressions discovered
268in a given 2.6.x kernel.
269
270This is the recommended branch for users who want the most recent stable
271kernel and are not interested in helping test development/experimental
272versions.
273
274If no 2.6.x.y kernel is available, then the highest numbered 2.6.x kernel is
275the current stable kernel.
276
277These patches are not incremental, meaning that for example the 2.6.12.3
278patch does not apply on top of the 2.6.12.2 kernel source, but rather on top
279of the base 2.6.12 kernel source.
280So, in order to apply the 2.6.12.3 patch to your existing 2.6.12.2 kernel
281source you have to first back out the 2.6.12.2 patch (so you are left with a
282base 2.6.12 kernel source) and then apply the new 2.6.12.3 patch.
283
284Here's a small example:
285
286$ cd ~/linux-2.6.12.2 # change into the kernel source dir
287$ patch -p1 -R < ../patch-2.6.12.2 # revert the 2.6.12.2 patch
288$ patch -p1 < ../patch-2.6.12.3 # apply the new 2.6.12.3 patch
289$ cd ..
290$ mv linux-2.6.12.2 linux-2.6.12.3 # rename the kernel source dir
291
292
293The -rc kernels
294---
295 These are release-candidate kernels. These are development kernels released
296by Linus whenever he deems the current git (the kernel's source management
297tool) tree to be in a reasonably sane state adequate for testing.
298
299These kernels are not stable and you should expect occasional breakage if
300you intend to run them. This is however the most stable of the main
301development branches and is also what will eventually turn into the next
302stable kernel, so it is important that it be tested by as many people as
303possible.
304
305This is a good branch to run for people who want to help out testing
306development kernels but do not want to run some of the really experimental
307stuff (such people should see the sections about -git and -mm kernels below).
308
309The -rc patches are not incremental, they apply to a base 2.6.x kernel, just
310like the 2.6.x.y patches described above. The kernel version before the -rcN
311suffix denotes the version of the kernel that this -rc kernel will eventually
312turn into.
313So, 2.6.13-rc5 means that this is the fifth release candidate for the 2.6.13
314kernel and the patch should be applied on top of the 2.6.12 kernel source.
315
316Here are 3 examples of how to apply these patches:
317
318# first an example of moving from 2.6.12 to 2.6.13-rc3
319$ cd ~/linux-2.6.12 # change into the 2.6.12 source dir
320$ patch -p1 < ../patch-2.6.13-rc3 # apply the 2.6.13-rc3 patch
321$ cd ..
322$ mv linux-2.6.12 linux-2.6.13-rc3 # rename the source dir
323
324# now let's move from 2.6.13-rc3 to 2.6.13-rc5
325$ cd ~/linux-2.6.13-rc3 # change into the 2.6.13-rc3 dir
326$ patch -p1 -R < ../patch-2.6.13-rc3 # revert the 2.6.13-rc3 patch
327$ patch -p1 < ../patch-2.6.13-rc5 # apply the new 2.6.13-rc5 patch
328$ cd ..
329$ mv linux-2.6.13-rc3 linux-2.6.13-rc5 # rename the source dir
330
331# finally let's try and move from 2.6.12.3 to 2.6.13-rc5
332$ cd ~/linux-2.6.12.3 # change to the kernel source dir
333$ patch -p1 -R < ../patch-2.6.12.3 # revert the 2.6.12.3 patch
334$ patch -p1 < ../patch-2.6.13-rc5 # apply new 2.6.13-rc5 patch
335$ cd ..
336$ mv linux-2.6.12.3 linux-2.6.13-rc5 # rename the kernel source dir
337
338
339The -git kernels
340---
341 These are daily snapshots of Linus' kernel tree (managed in a git
342repository, hence the name).
343
344These patches are usually released daily and represent the current state of
345Linus' tree. They are more experimental than -rc kernels since they are
346generated automatically without even a cursory glance to see if they are
347sane.
348
349-git patches are not incremental and apply either to a base 2.6.x kernel or
350a base 2.6.x-rc kernel - you can see which from their name.
351A patch named 2.6.12-git1 applies to the 2.6.12 kernel source and a patch
352named 2.6.13-rc3-git2 applies to the source of the 2.6.13-rc3 kernel.
353
354Here are some examples of how to apply these patches:
355
356# moving from 2.6.12 to 2.6.12-git1
357$ cd ~/linux-2.6.12 # change to the kernel source dir
358$ patch -p1 < ../patch-2.6.12-git1 # apply the 2.6.12-git1 patch
359$ cd ..
360$ mv linux-2.6.12 linux-2.6.12-git1 # rename the kernel source dir
361
362# moving from 2.6.12-git1 to 2.6.13-rc2-git3
363$ cd ~/linux-2.6.12-git1 # change to the kernel source dir
364$ patch -p1 -R < ../patch-2.6.12-git1 # revert the 2.6.12-git1 patch
365 # we now have a 2.6.12 kernel
366$ patch -p1 < ../patch-2.6.13-rc2 # apply the 2.6.13-rc2 patch
367 # the kernel is now 2.6.13-rc2
368$ patch -p1 < ../patch-2.6.13-rc2-git3 # apply the 2.6.13-rc2-git3 patch
369 # the kernel is now 2.6.13-rc2-git3
370$ cd ..
371$ mv linux-2.6.12-git1 linux-2.6.13-rc2-git3 # rename source dir
372
373
374The -mm kernels
375---
376 These are experimental kernels released by Andrew Morton.
377
378The -mm tree serves as a sort of proving ground for new features and other
379experimental patches.
380Once a patch has proved its worth in -mm for a while Andrew pushes it on to
381Linus for inclusion in mainline.
382
383Although it's encouraged that patches flow to Linus via the -mm tree, this
384is not always enforced.
385Subsystem maintainers (or individuals) sometimes push their patches directly
386to Linus, even though (or after) they have been merged and tested in -mm (or
387sometimes even without prior testing in -mm).
388
389You should generally strive to get your patches into mainline via -mm to
390ensure maximum testing.
391
392This branch is in constant flux and contains many experimental features, a
393lot of debugging patches not appropriate for mainline etc and is the most
394experimental of the branches described in this document.
395
396These kernels are not appropriate for use on systems that are supposed to be
397stable and they are more risky to run than any of the other branches (make
398sure you have up-to-date backups - that goes for any experimental kernel but
399even more so for -mm kernels).
400
401These kernels in addition to all the other experimental patches they contain
402usually also contain any changes in the mainline -git kernels available at
403the time of release.
404
405Testing of -mm kernels is greatly appreciated since the whole point of the
406tree is to weed out regressions, crashes, data corruption bugs, build
407breakage (and any other bug in general) before changes are merged into the
408more stable mainline Linus tree.
409But testers of -mm should be aware that breakage in this tree is more common
410than in any other tree.
411
412The -mm kernels are not released on a fixed schedule, but usually a few -mm
413kernels are released in between each -rc kernel (1 to 3 is common).
414The -mm kernels apply to either a base 2.6.x kernel (when no -rc kernels
415have been released yet) or to a Linus -rc kernel.
416
417Here are some examples of applying the -mm patches:
418
419# moving from 2.6.12 to 2.6.12-mm1
420$ cd ~/linux-2.6.12 # change to the 2.6.12 source dir
421$ patch -p1 < ../2.6.12-mm1 # apply the 2.6.12-mm1 patch
422$ cd ..
423$ mv linux-2.6.12 linux-2.6.12-mm1 # rename the source appropriately
424
425# moving from 2.6.12-mm1 to 2.6.13-rc3-mm3
426$ cd ~/linux-2.6.12-mm1
427$ patch -p1 -R < ../2.6.12-mm1 # revert the 2.6.12-mm1 patch
428 # we now have a 2.6.12 source
429$ patch -p1 < ../patch-2.6.13-rc3 # apply the 2.6.13-rc3 patch
430 # we now have a 2.6.13-rc3 source
431$ patch -p1 < ../2.6.13-rc3-mm3 # apply the 2.6.13-rc3-mm3 patch
432$ cd ..
433$ mv linux-2.6.12-mm1 linux-2.6.13-rc3-mm3 # rename the source dir
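
The rules for which base each kind of patch applies to can be summed up in a
small shell function. This is only a sketch of the naming rules described in
this document; the function name base_of is made up, and it does no checking
beyond the version patterns shown here.

```shell
#!/bin/sh
# Sketch: print the version of the tree that a given patch applies to.
# base_of is a hypothetical helper, not part of any kernel tooling.
base_of() {
    v=$1
    case $v in
        *-git[0-9]*)                    # 2.6.13-rc3-git2 -> 2.6.13-rc3
            echo "${v%-git*}" ;;
        *-mm[0-9]*)                     # 2.6.13-rc3-mm3  -> 2.6.13-rc3
            echo "${v%-mm*}" ;;
        *-rc[0-9]*)                     # 2.6.13-rc5      -> 2.6.12
            rel=${v%-rc*}
            minor=${rel##*.}
            echo "${rel%.*}.$((minor - 1))" ;;
        *.*.*.*)                        # 2.6.12.3        -> 2.6.12
            echo "${v%.*}" ;;
        *.*.*)                          # 2.6.12          -> 2.6.11
            minor=${v##*.}
            echo "${v%.*}.$((minor - 1))" ;;
        *)
            echo "unrecognised version: $v" >&2
            return 1 ;;
    esac
}

base_of 2.6.12.3
base_of 2.6.13-rc5
base_of 2.6.13-rc3-git2
base_of 2.6.12-mm1
```

If the tree you have is not the base that is printed, revert whatever patch
is currently applied until you are back at a base 2.6.x kernel and work
forward from there.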
434
435
436This concludes this list of explanations of the various kernel trees and I
437hope you are now crystal clear on how to apply the various patches and help
438test the kernel.
439
diff --git a/Documentation/cpu-freq/cpufreq-stats.txt b/Documentation/cpu-freq/cpufreq-stats.txt
index e2d1e760b4ba..6a82948ff4bd 100644
--- a/Documentation/cpu-freq/cpufreq-stats.txt
+++ b/Documentation/cpu-freq/cpufreq-stats.txt
@@ -36,7 +36,7 @@ cpufreq stats provides following statistics (explained in detail below).
36 36
37All the statistics will be from the time the stats driver has been inserted 37All the statistics will be from the time the stats driver has been inserted
38to the time when a read of a particular statistic is done. Obviously, stats 38to the time when a read of a particular statistic is done. Obviously, stats
39driver will not have any information about the the frequcny transitions before 39driver will not have any information about the frequency transitions before
40the stats driver insertion. 40the stats driver insertion.
41 41
42-------------------------------------------------------------------------------- 42--------------------------------------------------------------------------------
diff --git a/Documentation/cpusets.txt b/Documentation/cpusets.txt
index 47f4114fbf54..d17b7d2dd771 100644
--- a/Documentation/cpusets.txt
+++ b/Documentation/cpusets.txt
@@ -277,7 +277,7 @@ rewritten to the 'tasks' file of its cpuset. This is done to avoid
277impacting the scheduler code in the kernel with a check for changes 277impacting the scheduler code in the kernel with a check for changes
278in a tasks processor placement. 278in a tasks processor placement.
279 279
280There is an exception to the above. If hotplug funtionality is used 280There is an exception to the above. If hotplug functionality is used
281to remove all the CPUs that are currently assigned to a cpuset, 281to remove all the CPUs that are currently assigned to a cpuset,
282then the kernel will automatically update the cpus_allowed of all 282then the kernel will automatically update the cpus_allowed of all
283tasks attached to CPUs in that cpuset to allow all CPUs. When memory 283tasks attached to CPUs in that cpuset to allow all CPUs. When memory
diff --git a/Documentation/crypto/descore-readme.txt b/Documentation/crypto/descore-readme.txt
index 166474c2ee0b..16e9e6350755 100644
--- a/Documentation/crypto/descore-readme.txt
+++ b/Documentation/crypto/descore-readme.txt
@@ -1,4 +1,4 @@
1Below is the orginal README file from the descore.shar package. 1Below is the original README file from the descore.shar package.
2------------------------------------------------------------------------------ 2------------------------------------------------------------------------------
3 3
4des - fast & portable DES encryption & decryption. 4des - fast & portable DES encryption & decryption.
diff --git a/Documentation/dvb/bt8xx.txt b/Documentation/dvb/bt8xx.txt
index 4b8c326c6aac..cb63b7a93c82 100644
--- a/Documentation/dvb/bt8xx.txt
+++ b/Documentation/dvb/bt8xx.txt
@@ -1,55 +1,74 @@
1How to get the Nebula Electronics DigiTV, Pinnacle PCTV Sat, Twinhan DST + clones working 1How to get the Nebula, PCTV and Twinhan DST cards working
2========================================================================================= 2=========================================================
3 3
41) General information 4This class of cards has a bt878a as the PCI interface, and
5====================== 5require the bttv driver.
6 6
7This class of cards has a bt878a chip as the PCI interface. 7Please pay close attention to the warning about the bttv module
8The different card drivers require the bttv driver to provide the means 8options below for the DST card.
9to access the i2c bus and the gpio pins of the bt8xx chipset.
10 9
112) Compilation rules for Kernel >= 2.6.12 101) General informations
12========================================= 11=======================
13 12
14Enable the following options: 13These drivers require the bttv driver to provide the means to access
14the i2c bus and the gpio pins of the bt8xx chipset.
15 15
16Because of this, you need to enable
16"Device drivers" => "Multimedia devices" 17"Device drivers" => "Multimedia devices"
17 => "Video For Linux" => "BT848 Video For Linux" 18 => "Video For Linux" => "BT848 Video For Linux"
19
20Furthermore you need to enable
18"Device drivers" => "Multimedia devices" => "Digital Video Broadcasting Devices" 21"Device drivers" => "Multimedia devices" => "Digital Video Broadcasting Devices"
19 => "DVB for Linux" "DVB Core Support" "BT8xx based PCI cards" 22 => "DVB for Linux" "DVB Core Support" "BT8xx based PCI cards"
20 23
213) Loading Modules, described by two approaches 242) Loading Modules
22=============================================== 25==================
23 26
24In general you need to load the bttv driver, which will handle the gpio and 27In general you need to load the bttv driver, which will handle the gpio and
25i2c communication for us, plus the common dvb-bt8xx device driver, 28i2c communication for us, plus the common dvb-bt8xx device driver.
26which is called the backend. 29The frontends for Nebula (nxt6000), Pinnacle PCTV (cx24110) and
27The frontends for Nebula DigiTV (nxt6000), Pinnacle PCTV Sat (cx24110), 30TwinHan (dst) are loaded automatically by the dvb-bt8xx device driver.
28TwinHan DST + clones (dst and dst-ca) are loaded automatically by the backend.
29For further details about TwinHan DST + clones see /Documentation/dvb/ci.txt.
30 31
313a) The manual approach 323a) Nebula / Pinnacle PCTV
32----------------------- 33--------------------------
33 34
34Loading modules: 35 $ modprobe bttv (normally bttv is being loaded automatically by kmod)
35modprobe bttv 36 $ modprobe dvb-bt8xx (or just place dvb-bt8xx in /etc/modules for automatic loading)
36modprobe dvb-bt8xx
37 37
38Unloading modules:
39modprobe -r dvb-bt8xx
40modprobe -r bttv
41 38
423b) The automatic approach 393b) TwinHan and Clones
43-------------------------- 40--------------------------
44 41
45If not already done by installation, place a line either in 42 $ modprobe bttv i2c_hw=1 card=0x71
46/etc/modules.conf or in /etc/modprobe.conf containing this text: 43 $ modprobe dvb-bt8xx
47alias char-major-81 bttv 44 $ modprobe dst
45
46The value 0x71 will override the PCI type detection for dvb-bt8xx,
47which is necessary for TwinHan cards.
48
49If you're having an older card (blue color circuit) and card=0x71 locks
50your machine, try using 0x68, too. If that does not work, ask on the
51mailing list.
52
53The DST module takes a couple of useful parameters.
54
55verbose takes values 0 to 4. These values control the verbosity level,
56and can be used to debug also.
57
58verbose=0 means complete disabling of messages
59 1 only error messages are displayed
60 2 notifications are also displayed
61 3 informational messages are also displayed
62 4 debug setting
63
64dst_addons takes values 0 and 0x20. A value of 0 means it is a FTA card.
650x20 means it has a Conditional Access slot.
66
67The autodetected values are determined by the card's 'response
68string' which you can see in your logs e.g.
48 69
49Then place a line in /etc/modules containing this text: 70dst_get_device_id: Recognise [DSTMCI]
50dvb-bt8xx
51 71
52Reboot your system and have fun!
53 72
54-- 73--
55Authors: Richard Walker, Jamie Honan, Michael Hunold, Manu Abraham, Uwe Bugla 74Authors: Richard Walker, Jamie Honan, Michael Hunold, Manu Abraham
diff --git a/Documentation/dvb/ci.txt b/Documentation/dvb/ci.txt
index 62e0701b542a..95f0e73b2135 100644
--- a/Documentation/dvb/ci.txt
+++ b/Documentation/dvb/ci.txt
@@ -23,7 +23,6 @@ This application requires the following to function properly as of now.
23 eg: $ szap -c channels.conf -r "TMC" -x 23 eg: $ szap -c channels.conf -r "TMC" -x
24 24
25 (b) a channels.conf containing a valid PMT PID 25 (b) a channels.conf containing a valid PMT PID
26
27 eg: TMC:11996:h:0:27500:278:512:650:321 26 eg: TMC:11996:h:0:27500:278:512:650:321
28 27
29 here 278 is a valid PMT PID. the rest of the values are the 28 here 278 is a valid PMT PID. the rest of the values are the
@@ -31,13 +30,7 @@ This application requires the following to function properly as of now.
31 30
32 (c) after running a szap, you have to run ca_zap, for the 31 (c) after running a szap, you have to run ca_zap, for the
33 descrambler to function, 32 descrambler to function,
34 33 eg: $ ca_zap channels.conf "TMC"
35 eg: $ ca_zap patched_channels.conf "TMC"
36
37 The patched means a patch to apply to scan, such that scan can
38 generate a channels.conf_with pmt, which has this PMT PID info
39 (NOTE: szap cannot use this channels.conf with the PMT_PID)
40
41 34
42 (d) Hopefully enjoy your favourite subscribed channel as you do with 35 (d) Hopefully enjoy your favourite subscribed channel as you do with
43 a FTA card. 36 a FTA card.
diff --git a/Documentation/fb/cyblafb/bugs b/Documentation/fb/cyblafb/bugs
new file mode 100644
index 000000000000..f90cc66ea919
--- /dev/null
+++ b/Documentation/fb/cyblafb/bugs
@@ -0,0 +1,14 @@
1Bugs
2====
3
4I currently don't know of any bugs. Please do send reports to:
5 - linux-fbdev-devel@lists.sourceforge.net
6 - Knut_Petersen@t-online.de.
7
8
9Untested features
10=================
11
12All LCD stuff is untested. If it worked in tridentfb, it should work in
13cyblafb. Please test and report the results to Knut_Petersen@t-online.de.
14
diff --git a/Documentation/fb/cyblafb/credits b/Documentation/fb/cyblafb/credits
new file mode 100644
index 000000000000..0eb3b443dc2b
--- /dev/null
+++ b/Documentation/fb/cyblafb/credits
@@ -0,0 +1,7 @@
1Thanks to
2=========
3 * Alan Hourihane, for writing the X trident driver
4 * Jani Monoses, for writing the tridentfb driver
5 * Antonino A. Daplas, for review of the first published
6 version of cyblafb and some code
7 * Jochen Hein, for testing and a helpful bug report
diff --git a/Documentation/fb/cyblafb/documentation b/Documentation/fb/cyblafb/documentation
new file mode 100644
index 000000000000..bb1aac048425
--- /dev/null
+++ b/Documentation/fb/cyblafb/documentation
@@ -0,0 +1,17 @@
1Available Documentation
2=======================
3
4Apollo PLE 133 Chipset VT8601A North Bridge Datasheet, Rev. 1.82, October 22,
52001, available from VIA:
6
7 http://www.viavpsd.com/product/6/15/DS8601A182.pdf
8
9The datasheet is incomplete: some registers that need to be programmed are not
10explained at all, and important bits are listed as "reserved". But you really
11need the datasheet to understand the code. "p. xxx" comments refer to page
12numbers of this document.
13
14XFree/XOrg drivers are available and of good quality; looking at their code
15is a good idea if the datasheet does not provide enough information
16or if the datasheet seems to be wrong.
17
diff --git a/Documentation/fb/cyblafb/fb.modes b/Documentation/fb/cyblafb/fb.modes
new file mode 100644
index 000000000000..cf4351fc32ff
--- /dev/null
+++ b/Documentation/fb/cyblafb/fb.modes
@@ -0,0 +1,155 @@
1#
2# Sample fb.modes file
3#
4# Provides an incomplete list of working modes for
5# the cyberblade/i1 graphics core.
6#
7# The value 4294967256 is used instead of -40. Of course, -40 is not
8# a really reasonable value, but chip design does not always follow
9# logic. Believe me, it's ok, and it's the way the BIOS does it.
10#
11# fbset requires 4294967256 in fb.modes and -40 as an argument to
12# the -t parameter. That's also not too reasonable, and it might change
13# in the future or might even be different for your current version.
14#
15
16mode "640x480-50"
17 geometry 640 480 640 3756 8
18 timings 47619 4294967256 24 17 0 216 3
19endmode
20
21mode "640x480-60"
22 geometry 640 480 640 3756 8
23 timings 39682 4294967256 24 17 0 216 3
24endmode
25
26mode "640x480-70"
27 geometry 640 480 640 3756 8
28 timings 34013 4294967256 24 17 0 216 3
29endmode
30
31mode "640x480-72"
32 geometry 640 480 640 3756 8
33 timings 33068 4294967256 24 17 0 216 3
34endmode
35
36mode "640x480-75"
37 geometry 640 480 640 3756 8
38 timings 31746 4294967256 24 17 0 216 3
39endmode
40
41mode "640x480-80"
42 geometry 640 480 640 3756 8
43 timings 29761 4294967256 24 17 0 216 3
44endmode
45
46mode "640x480-85"
47 geometry 640 480 640 3756 8
48 timings 28011 4294967256 24 17 0 216 3
49endmode
50
51mode "800x600-50"
52 geometry 800 600 800 3221 8
53 timings 30303 96 24 14 0 136 11
54endmode
55
56mode "800x600-60"
57 geometry 800 600 800 3221 8
58 timings 25252 96 24 14 0 136 11
59endmode
60
61mode "800x600-70"
62 geometry 800 600 800 3221 8
63 timings 21645 96 24 14 0 136 11
64endmode
65
66mode "800x600-72"
67 geometry 800 600 800 3221 8
68 timings 21043 96 24 14 0 136 11
69endmode
70
71mode "800x600-75"
72 geometry 800 600 800 3221 8
73 timings 20202 96 24 14 0 136 11
74endmode
75
76mode "800x600-80"
77 geometry 800 600 800 3221 8
78 timings 18939 96 24 14 0 136 11
79endmode
80
81mode "800x600-85"
82 geometry 800 600 800 3221 8
83 timings 17825 96 24 14 0 136 11
84endmode
85
86mode "1024x768-50"
87 geometry 1024 768 1024 2815 8
88 timings 19054 144 24 29 0 120 3
89endmode
90
91mode "1024x768-60"
92 geometry 1024 768 1024 2815 8
93 timings 15880 144 24 29 0 120 3
94endmode
95
96mode "1024x768-70"
97 geometry 1024 768 1024 2815 8
98 timings 13610 144 24 29 0 120 3
99endmode
100
101mode "1024x768-72"
102 geometry 1024 768 1024 2815 8
103 timings 13232 144 24 29 0 120 3
104endmode
105
106mode "1024x768-75"
107 geometry 1024 768 1024 2815 8
108 timings 12703 144 24 29 0 120 3
109endmode
110
111mode "1024x768-80"
112 geometry 1024 768 1024 2815 8
113 timings 11910 144 24 29 0 120 3
114endmode
115
116mode "1024x768-85"
117 geometry 1024 768 1024 2815 8
118 timings 11209 144 24 29 0 120 3
119endmode
120
121mode "1280x1024-50"
122 geometry 1280 1024 1280 2662 8
123 timings 11114 232 16 39 0 160 3
124endmode
125
126mode "1280x1024-60"
127 geometry 1280 1024 1280 2662 8
128 timings 9262 232 16 39 0 160 3
129endmode
130
131mode "1280x1024-70"
132 geometry 1280 1024 1280 2662 8
133 timings 7939 232 16 39 0 160 3
134endmode
135
136mode "1280x1024-72"
137 geometry 1280 1024 1280 2662 8
138 timings 7719 232 16 39 0 160 3
139endmode
140
141mode "1280x1024-75"
142 geometry 1280 1024 1280 2662 8
143 timings 7410 232 16 39 0 160 3
144endmode
145
146mode "1280x1024-80"
147 geometry 1280 1024 1280 2662 8
148 timings 6946 232 16 39 0 160 3
149endmode
150
151mode "1280x1024-85"
152 geometry 1280 1024 1280 2662 8
153 timings 6538 232 16 39 0 160 3
154endmode
155
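The timings lines in the sample fb.modes above appear to follow the usual fb.modes field order (pixclock in picoseconds, then left, right, upper and lower margin, hsync length, vsync length). A minimal Python sketch, assuming that field order and treating the left margin as a 32-bit two's-complement value, shows how 4294967256 stands for -40 and how the refresh rate in each mode name follows from the numbers. The function names are illustrative only; they are not part of fbset or cyblafb.

```python
def as_signed32(value):
    """Reinterpret an unsigned 32-bit number as a signed one."""
    return value - (1 << 32) if value >= (1 << 31) else value


def refresh_hz(xres, yres, timings):
    """Vertical refresh rate implied by an fb.modes timings tuple.

    timings = (pixclock_ps, left, right, upper, lower, hslen, vslen)
    """
    pixclock_ps, left, right, upper, lower, hslen, vslen = timings
    # Total pixels per line and lines per frame, including blanking.
    htotal = xres + as_signed32(left) + right + hslen
    vtotal = yres + upper + lower + vslen
    dotclock_hz = 1e12 / pixclock_ps  # pixclock is given in picoseconds
    return dotclock_hz / (htotal * vtotal)


# Values copied from the "640x480-60" and "800x600-60" entries above.
print(as_signed32(4294967256))  # -40
print(round(refresh_hz(640, 480, (39682, 4294967256, 24, 17, 0, 216, 3)), 2))
print(round(refresh_hz(800, 600, (25252, 96, 24, 14, 0, 136, 11)), 2))
```

Both sample modes come out within a small fraction of a hertz of the 60 Hz in their names, which suggests the entries differ only in pixclock for each resolution.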
diff --git a/Documentation/fb/cyblafb/performance b/Documentation/fb/cyblafb/performance
new file mode 100644
index 000000000000..eb4e47a9cea6
--- /dev/null
+++ b/Documentation/fb/cyblafb/performance
@@ -0,0 +1,80 @@
1Speed
2=====
3
4CyBlaFB is much faster than tridentfb and vesafb. Compare the performance data
5for mode 1280x1024-[8,16,32]@61 Hz.
6
7Test 1: Cat a file with 2000 lines of 0 characters.
8Test 2: Cat a file with 2000 lines of 80 characters.
9Test 3: Cat a file with 2000 lines of 160 characters.
10
11All values show system time used, in seconds; kernel 2.6.12 was used for
12the measurements. 2.6.13 is a bit slower; 2.6.14 will hopefully include a
13patch that speeds up kernel bitblitting a lot (> 20%).
14
15+-----------+-----------------------------------------------------+
16|           |                   not accelerated                   |
17| TRIDENTFB +-----------------+-----------------+-----------------+
18| of 2.6.12 |      8 bpp      |     16 bpp      |     32 bpp      |
19|           | noypan |  ypan  | noypan |  ypan  | noypan |  ypan  |
20+-----------+--------+--------+--------+--------+--------+--------+
21|  Test 1   |   4.31 |   4.33 |   6.05 |  12.81 |  ----  |  ----  |
22|  Test 2   |  67.94 |   5.44 | 123.16 |  14.79 |  ----  |  ----  |
23|  Test 3   | 131.36 |   6.55 | 240.12 |  16.76 |  ----  |  ----  |
24+-----------+--------+--------+--------+--------+--------+--------+
25|  Comments |                 |                 | completely bro- |
26|           |                 |                 | ken, monitor    |
27|           |                 |                 | switches off    |
28+-----------+-----------------+-----------------+-----------------+
29
30
31+-----------+-----------------------------------------------------+
32|           |                     accelerated                     |
33| TRIDENTFB +-----------------+-----------------+-----------------+
34| of 2.6.12 |      8 bpp      |     16 bpp      |     32 bpp      |
35|           | noypan |  ypan  | noypan |  ypan  | noypan |  ypan  |
36+-----------+--------+--------+--------+--------+--------+--------+
37|  Test 1   |  ----  |  ----  |  20.62 |   1.22 |  ----  |  ----  |
38|  Test 2   |  ----  |  ----  |  22.61 |   3.19 |  ----  |  ----  |
39|  Test 3   |  ----  |  ----  |  24.59 |   5.16 |  ----  |  ----  |
40+-----------+--------+--------+--------+--------+--------+--------+
41|  Comments | broken, writing | broken, ok only | completely bro- |
42|           | to wrong places | if bgcolor is   | ken, monitor    |
43|           | on screen + bug | black, bug in   | switches off    |
44|           | in fillrect()   | fillrect()      |                 |
45+-----------+-----------------+-----------------+-----------------+
46
47
48+-----------+-----------------------------------------------------+
49|           |                   not accelerated                   |
50|  VESAFB   +-----------------+-----------------+-----------------+
51| of 2.6.12 |      8 bpp      |     16 bpp      |     32 bpp      |
52|           | noypan |  ypan  | noypan |  ypan  | noypan |  ypan  |
53+-----------+--------+--------+--------+--------+--------+--------+
54|  Test 1   |   4.26 |   3.76 |   5.99 |   7.23 |  ----  |  ----  |
55|  Test 2   |  65.65 |   4.89 | 120.88 |   9.08 |  ----  |  ----  |
56|  Test 3   | 126.91 |   5.94 | 235.77 |  11.03 |  ----  |  ----  |
57+-----------+--------+--------+--------+--------+--------+--------+
58|  Comments | vga=0x307       | vga=0x31a       | vga=0x31b not   |
59|           | fh=80kHz        | fh=80kHz        | supported by    |
60|           | fv=75Hz         | fv=75Hz         | video BIOS and  |
61|           |                 |                 | hardware        |
62+-----------+-----------------+-----------------+-----------------+
63
64
65+-----------+-----------------------------------------------------+
66|           |                     accelerated                     |
67|  CYBLAFB  +-----------------+-----------------+-----------------+
68|           |      8 bpp      |     16 bpp      |     32 bpp      |
69|           | noypan |  ypan  | noypan |  ypan  | noypan |  ypan  |
70+-----------+--------+--------+--------+--------+--------+--------+
71|  Test 1   |   8.02 |   0.23 |  19.04 |   0.61 |  57.12 |   2.74 |
72|  Test 2   |   8.38 |   0.55 |  19.39 |   0.92 |  57.54 |   3.13 |
73|  Test 3   |   8.73 |   0.86 |  19.74 |   1.24 |  57.95 |   3.51 |
74+-----------+--------+--------+--------+--------+--------+--------+
75|  Comments |                 |                 |                 |
76|           |                 |                 |                 |
77|           |                 |                 |                 |
78|           |                 |                 |                 |
79+-----------+-----------------+-----------------+-----------------+
80
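To put a number on "much faster", the following Python sketch divides the system times of the other drivers by the cyblafb time for one column of the tables above (Test 3, 8 bpp, ypan, using the non-accelerated tridentfb and vesafb figures, since accelerated tridentfb has no working 8 bpp result). The choice of column is just an illustration; other columns give different ratios.

```python
# System time in seconds for Test 3, 8 bpp, ypan, copied from the tables.
test3_8bpp_ypan = {
    "tridentfb": 6.55,  # not accelerated
    "vesafb": 5.94,     # not accelerated
    "cyblafb": 0.86,    # accelerated
}

baseline = test3_8bpp_ypan["cyblafb"]
speedup = {
    driver: seconds / baseline
    for driver, seconds in test3_8bpp_ypan.items()
    if driver != "cyblafb"
}

for driver, factor in sorted(speedup.items()):
    print(f"cyblafb needs {factor:.1f}x less system time than {driver}")
```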
diff --git a/Documentation/fb/cyblafb/todo b/Documentation/fb/cyblafb/todo
new file mode 100644
index 000000000000..80fb2f89b6c1
--- /dev/null
+++ b/Documentation/fb/cyblafb/todo
@@ -0,0 +1,32 @@
1TODO / Missing features
2=======================
3
4Verify LCD stuff "stretch" and "center" options are
5 completely untested ... this code needs to be
6 verified. As I don't have access to such
7 hardware, please contact me if you are
8 willing to run some tests.
9
10Interlaced video modes The reason that interlaced
11 modes are disabled is that I do not know
12 the meaning of the vertical interlace
13 parameter. Also the datasheet mentions a
14 bit d8 of a horizontal interlace parameter,
15 but nowhere mentions the lower 8 bits. Please help
16 if you can.
17
18low-res double scan modes Who needs it?
19
20accelerated color blitting Who needs it? The console driver does use color
21 blitting for nothing but drawing the penguin,
22 everything else is done using color expanding
23 blitting of 1bpp character bitmaps.
24
25xpanning Who needs it?
26
27ioctls Who needs it?
28
29TV-out Will be done later
30
31??? Feel free to contact me if you have any
32 feature requests
diff --git a/Documentation/fb/cyblafb/usage b/Documentation/fb/cyblafb/usage
new file mode 100644
index 000000000000..e627c8f54211
--- /dev/null
+++ b/Documentation/fb/cyblafb/usage
@@ -0,0 +1,206 @@
1CyBlaFB is a framebuffer driver for the Cyberblade/i1 graphics core integrated
2into the VIA Apollo PLE133 (aka vt8601) north bridge. It is developed and
3tested using a VIA EPIA 5000 board.
4
5Cyblafb - compiled into the kernel or as a module?
6==================================================
7
8You might compile cyblafb either as a module or compile it permanently into the
9kernel.
10
11Unless you have a real reason to do so you should not compile both vesafb and
12cyblafb permanently into the kernel. It's possible and it helps during the
13development cycle, but it's useless and will at least block some otherwise
14useful memory for ordinary users.
15
16Selecting Modes
17===============
18
19 Startup Mode
20 ============
21
22 First of all, you might use the "vga=???" boot parameter as it is
23 documented in vesafb.txt and svga.txt. Cyblafb will detect the video
24 mode selected and will use the geometry and timings found by
25 inspecting the hardware registers.
26
27 video=cyblafb vga=0x317
28
29 Alternatively you might use a combination of the mode, ref and bpp
30 parameters. If you compiled the driver into the kernel, add something
31 like this to the kernel command line:
32
33 video=cyblafb:1280x1024,bpp=16,ref=50 ...
34
35 If you compiled the driver as a module, the same mode would be
36 selected by the following command:
37
38 modprobe cyblafb mode=1280x1024 bpp=16 ref=50 ...
39
40 None of the modes possible to select as startup modes are affected by
41 the problems described at the end of the next subsection.
42
43 Mode changes using fbset
44 ========================
45
46 You might use fbset to change the video mode, see "man fbset". Cyblafb
47 generally does assume that you know what you are doing. But it does
48 some checks, especially those that are needed to prevent you from
49 damaging your hardware.
50
51 - only 8, 16, 24 and 32 bpp video modes are accepted
52 - interlaced video modes are not accepted
53 - double scan video modes are not accepted
54 - if a flat panel is found, cyblafb does not allow you
55 to program a resolution higher than the physical
56 resolution of the flat panel monitor
57 - cyblafb does not allow xres to differ from xres_virtual
58 - cyblafb does not allow vclk to exceed 230 MHz. As 32 bpp
59 and (currently) 24 bit modes use a doubled vclk internally,
60 the dotclock limit as seen by fbset is 115 MHz for those
61 modes and 230 MHz for 8 and 16 bpp modes.
62
63 Any request that violates the rules given above will be ignored and
64 fbset will return an error.
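The vclk rule in the list above can be modelled in a few lines. This is only a Python sketch of the rule as described in this text (230 MHz limit, doubled vclk for 24 and 32 bpp); the driver's actual check may differ in detail, and the function names are made up for illustration. The htotal/vtotal figures assume the usual fb.modes timings order and the 1280x1024 entries of the sample fb.modes.

```python
VCLK_LIMIT_HZ = 230e6  # limit stated in the list above


def dotclock_hz(pixclock_ps):
    """Dot clock as seen by fbset; pixclock is in picoseconds."""
    return 1e12 / pixclock_ps


def mode_accepted(pixclock_ps, bpp):
    """True if the internal vclk stays within the 230 MHz limit."""
    multiplier = 2 if bpp in (24, 32) else 1  # doubled vclk for 24/32 bpp
    return dotclock_hz(pixclock_ps) * multiplier <= VCLK_LIMIT_HZ


def max_refresh_hz(htotal, vtotal, bpp):
    """Highest whole-number refresh rate the vclk limit allows."""
    multiplier = 2 if bpp in (24, 32) else 1
    return int(VCLK_LIMIT_HZ / multiplier // (htotal * vtotal))


# pixclock 6538 ps is the "1280x1024-85" entry of the sample fb.modes.
print(mode_accepted(6538, 16))  # True
print(mode_accepted(6538, 32))  # False

# 1280x1024 sample timings: htotal = 1280 + 232 + 16 + 160,
# vtotal = 1024 + 39 + 0 + 3.
print(max_refresh_hz(1688, 1066, 32))  # 63
```

Under these assumptions the 32 bpp ceiling for 1280x1024 comes out at 63 Hz.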
65
66 If you program a virtual y resolution higher than the hardware limit,
67 cyblafb will silently decrease that value to the highest possible
68 value.
69
70 Attempts to disable acceleration are ignored.
71
72 Some video modes that should work do not work as expected. If you use
73 the standard fb.modes, fbset 640x480-60 will program that mode, but
74 you will see a vertical area, about two characters wide, with only
75 much darker characters than the other characters on the screen.
76 Cyblafb does allow that mode to be set, as it does not violate the
77 official specifications. It would need a lot of code to reliably sort
78 out all invalid modes; playing around with the margin values will
79 give a valid mode quickly. And if cyblafb would detect such an invalid
80 mode, should it silently alter the requested values or should it
81 report an error? Both options have some pros and cons. As stated
82 above, none of the startup modes are affected, and if you set
83 verbosity to 1 or higher, cyblafb will print the fbset command that
84 would be needed to program that mode using fbset.
85
86
87Other Parameters
88================
89
90
91crt             don't autodetect, assume monitor connected to
92                standard VGA connector
93
94fp              don't autodetect, assume flat panel display
95                connected to flat panel monitor interface
96
97nativex         inform driver about native x resolution of
98                flat panel monitor connected to special
99                interface (should be autodetected)
100
101stretch         stretch image to adapt low resolution modes to
102                higher resolutions of flat panel monitors
103                connected to special interface
104
105center          center image to adapt low resolution modes to
106                higher resolutions of flat panel monitors
107                connected to special interface
108
109memsize         use if autodetected memsize is wrong ...
110                should never be necessary
111
112nopcirr         disable PCI read retry
113nopciwr         disable PCI write retry
114nopcirb         disable PCI read bursts
115nopciwb         disable PCI write bursts
116
117bpp             bpp for specified modes
118                valid values: 8 || 16 || 24 || 32
119
120ref             refresh rate for specified mode
121                valid values: 50 <= ref <= 85
122
123mode            640x480 or 800x600 or 1024x768 or 1280x1024
124                if not specified, the startup mode will be detected
125                and used, so you might also use the vga=??? parameter
126                described in vesafb.txt. If you do not specify a mode,
127                bpp and ref parameters are ignored.
128
129verbosity       0 is the default, increase to at least 2 for every
130                bug report!
131
132vesafb          allows cyblafb to be loaded after vesafb has been
133                loaded. See sections "Module unloading ...".
134
135
136Development hints
137=================
138
139It's much faster to compile a module and to load the new version after
140unloading the old module than to compile a new kernel and to reboot. So if you
141try to work on cyblafb, it might be a good idea to use cyblafb as a module.
142In real life, fast often means dangerous, and that's also the case here. If
143you introduce a serious bug when cyblafb is compiled into the kernel, the
144kernel will lock or oops with a high probability before the file system is
145mounted, and the danger for your data is low. If you load your own broken version
146of cyblafb on a running system, the danger for the integrity of the file
147system is much higher as you might need a hard reset afterwards. Decide
148yourself.
149
150Module unloading, the vfb method
151================================
152
153If you want to unload/reload cyblafb using the virtual framebuffer, you need
154to enable vfb support in the kernel first. After that, load the modules as
155shown below:
156
157 modprobe vfb vfb_enable=1
158 modprobe fbcon
159 modprobe cyblafb
160 fbset -fb /dev/fb1 1280x1024-60 -vyres 2662
161 con2fb /dev/fb1 /dev/tty1
162 ...
163
164If you now made some changes to cyblafb and want to reload it, you might do it
165as shown below:
166
167 con2fb /dev/fb0 /dev/tty1
168 ...
169 rmmod cyblafb
170 modprobe cyblafb
171 con2fb /dev/fb1 /dev/tty1
172 ...
173
174Of course, you might choose another mode, and most certainly you also want to
175map some other /dev/tty* to the real framebuffer device. You might also choose
176to compile fbcon as a kernel module or place it permanently in the kernel.
177
178I do not know of any way to unload fbcon, and fbcon will prevent the
179framebuffer device loaded first from unloading. [If there is a way, then
180please add a description here!]
181
182Module unloading, the vesafb method
183===================================
184
185Configure the kernel:
186
187 <*> Support for frame buffer devices
188 [*] VESA VGA graphics support
189 <M> Cyberblade/i1 support
190
191Add e.g. "video=vesafb:ypan vga=0x307" to the kernel parameters. The ypan
192parameter is important, choose any vga parameter you like as long as it is
193a graphics mode.
194
195After booting, load cyblafb without any mode and bpp parameter and assign
196cyblafb to individual ttys using con2fb, e.g.:
197
198 modprobe cyblafb vesafb=1
199 con2fb /dev/fb1 /dev/tty1
200
201Unloading cyblafb works without problems after you assign vesafb to all
202ttys again, e.g.:
203
204 con2fb /dev/fb0 /dev/tty1
205 rmmod cyblafb
206
diff --git a/Documentation/fb/cyblafb/whycyblafb b/Documentation/fb/cyblafb/whycyblafb
new file mode 100644
index 000000000000..a123bc11e698
--- /dev/null
+++ b/Documentation/fb/cyblafb/whycyblafb
@@ -0,0 +1,85 @@
1I tried the following framebuffer drivers:
2
3 - TRIDENTFB is full of bugs. Acceleration is broken for Blade3D
4 graphics cores like the cyberblade/i1. It claims to support a great
5 number of devices, but documentation for most of these devices is
6 unfortunately not available. There is _no_ reason to use tridentfb
7 for cyberblade/i1 + CRT users. VESAFB is faster, and the one
8 advantage, mode switching, is broken in tridentfb.
9
10 - VESAFB is used by many distributions as a standard. Vesafb does
11 not support mode switching. VESAFB is a bit faster than the working
12 configurations of TRIDENTFB, but it is still too slow, even if you
13 use ypan.
14
15 - EPIAFB (you'll find it on sourceforge) supports the Cyberblade/i1
16 graphics core, but it still has serious bugs and development seems
17 to have stopped. This is the one driver with TV-out support. If you
18 do need this feature, try epiafb.
19
20None of these drivers was a real option for me.
21
22I believe that it is unreasonable to change code that claims to support 20
23devices if I only have more or less sufficient documentation for exactly one
24of these. The risk of breaking device foo while fixing device bar is too high.
25
26So I decided to start CyBlaFB as a stripped down tridentfb.
27
28All code specific to other Trident chips has been removed. After that there
29were a lot of cosmetic changes to increase the readability of the code. All
30register names were changed to those mnemonics used in the datasheet. Function
31and macro names were changed if they hindered easy understanding of the code.
32
33After that I debugged the code and implemented some new features. I'll try to
34give a little summary of the main changes:
35
36 - calculation of vertical and horizontal timings was fixed
37
38 - video signal quality has been improved dramatically
39
40 - acceleration:
41
42 - fillrect and copyarea were fixed and reenabled
43
44 - color expanding imageblit was newly implemented, color
45 imageblit (only used to draw the penguin) still uses the
46 generic code.
47
48 - init of the acceleration engine was improved and moved to a
49 place where it really works ...
50
51 - sync function has a timeout now and tries to reset and
52 reinit the accel engine if necessary
53
54 - fewer slow copyarea calls when doing ypan scrolling by using
55 undocumented bit d21 of screen start address stored in
56 CR2B[5]. BIOS does use it also, so this should be safe.
57
58 - cyblafb rejects any attempt to set modes that would cause vclk
59 values above reasonable 230 MHz. 32bit modes use a clock
60 multiplicator of 2, so fbset does show the correct values for
61 pixclock but not for vclk in this case. The fbset limit is 115 MHz
62 for 32 bpp modes.
63
64 - cyblafb rejects modes known to be broken or unimplemented (all
65 interlaced modes, all doublescan modes for now)
66
67 - cyblafb now works independent of the video mode in effect at startup
68 time (tridentfb does not init all needed registers to reasonable
69 values)
70
71 - switching between video modes does work reliably now
72
73 - the first video mode now is the one selected on startup using the
74 vga=???? mechanism or any of
75 - 640x480, 800x600, 1024x768, 1280x1024
76 - 8, 16, 24 or 32 bpp
77 - refresh between 50 Hz and 85 Hz, 1 Hz steps (1280x1024-32
78 is limited to 63Hz)
79
80 - pci retry and pci burst mode are settable (try to disable if you
81 experience latency problems)
82
83 - built as a module cyblafb might be unloaded and reloaded using
84 the vfb module and con2fb or might be used together with vesafb
85
diff --git a/Documentation/fb/intel810.txt b/Documentation/fb/intel810.txt
index fd68b162e4a1..4f0d6bc789ef 100644
--- a/Documentation/fb/intel810.txt
+++ b/Documentation/fb/intel810.txt
@@ -5,6 +5,7 @@ Intel 810/815 Framebuffer driver
5 March 17, 2002 5 March 17, 2002
6 6
7 First Released: July 2001 7 First Released: July 2001
8 Last Update: September 12, 2005
8================================================================ 9================================================================
9 10
10A. Introduction 11A. Introduction
@@ -44,6 +45,8 @@ B. Features
44 45
45 - Hardware Cursor Support 46 - Hardware Cursor Support
46 47
48 - Supports EDID probing either by DDC/I2C or through the BIOS
49
47C. List of available options 50C. List of available options
48 51
49 a. "video=i810fb" 52 a. "video=i810fb"
@@ -52,14 +55,17 @@ C. List of available options
52 Recommendation: required 55 Recommendation: required
53 56
54 b. "xres:<value>" 57 b. "xres:<value>"
55 select horizontal resolution in pixels 58 select horizontal resolution in pixels. (This parameter will be
59 ignored if 'mode_option' is specified. See 'o' below).
56 60
57 Recommendation: user preference 61 Recommendation: user preference
58 (default = 640) 62 (default = 640)
59 63
60 c. "yres:<value>" 64 c. "yres:<value>"
61 select vertical resolution in scanlines. If Discrete Video Timings 65 select vertical resolution in scanlines. If Discrete Video Timings
62 is enabled, this will be ignored and computed as 3*xres/4. 66 is enabled, this will be ignored and computed as 3*xres/4. (This
67 parameter will be ignored if 'mode_option' is specified. See 'o'
68 below)
63 69
64 Recommendation: user preference 70 Recommendation: user preference
65 (default = 480) 71 (default = 480)
@@ -86,7 +92,8 @@ C. List of available options
86 g. "hsync1/hsync2:<value>" 92 g. "hsync1/hsync2:<value>"
87 select the minimum and maximum Horizontal Sync Frequency of the 93 select the minimum and maximum Horizontal Sync Frequency of the
88 monitor in KHz. If using a fixed frequency monitor, hsync1 must 94 monitor in KHz. If using a fixed frequency monitor, hsync1 must
89 be equal to hsync2. 95 be equal to hsync2. If EDID probing is successful, these will be
96 ignored and values will be taken from the EDID block.
90 97
91 Recommendation: check monitor manual for correct values 98 Recommendation: check monitor manual for correct values
92 default (29/30) 99 default (29/30)
@@ -94,7 +101,8 @@ C. List of available options
94 h. "vsync1/vsync2:<value>" 101 h. "vsync1/vsync2:<value>"
95 select the minimum and maximum Vertical Sync Frequency of the monitor 102 select the minimum and maximum Vertical Sync Frequency of the monitor
96 in Hz. You can also use this option to lock your monitor's refresh 103 in Hz. You can also use this option to lock your monitor's refresh
97 rate. 104 rate. If EDID probing is successful, these will be ignored and values
105 will be taken from the EDID block.
98 106
99 Recommendation: check monitor manual for correct values 107 Recommendation: check monitor manual for correct values
100 (default = 60/60) 108 (default = 60/60)
@@ -154,7 +162,11 @@ C. List of available options
154 162
155 Recommendation: do not set 163 Recommendation: do not set
156 (default = not set) 164 (default = not set)
157 165 o. <xres>x<yres>[-<bpp>][@<refresh>]
166 The driver will now accept specification of boot mode option. If this
167 is specified, the options 'xres' and 'yres' will be ignored. See
168 Documentation/fb/modedb.txt for usage.
169
158D. Kernel booting 170D. Kernel booting
159 171
160Separate each option/option-pair by commas (,) and the option from its value 172Separate each option/option-pair by commas (,) and the option from its value
@@ -176,7 +188,10 @@ will be computed based on the hsync1/hsync2 and vsync1/vsync2 values.
176 188
177IMPORTANT: 189IMPORTANT:
178You must include hsync1, hsync2, vsync1 and vsync2 to enable video modes 190You must include hsync1, hsync2, vsync1 and vsync2 to enable video modes
179better than 640x480 at 60Hz. 191better than 640x480 at 60Hz. HOWEVER, if your chipset/display combination
192supports I2C and has an EDID block, you can safely exclude hsync1, hsync2,
193vsync1 and vsync2 parameters. These parameters will be taken from the EDID
194block.
180 195
181E. Module options 196E. Module options
182 197
@@ -217,32 +232,21 @@ F. Setup
217 This is required. The option is under "Character Devices" 232 This is required. The option is under "Character Devices"
218 233
219 d. Under "Graphics Support", select "Intel 810/815" either statically 234 d. Under "Graphics Support", select "Intel 810/815" either statically
220 or as a module. Choose "use VESA GTF for video timings" if you 235 or as a module. Choose "use VESA Generalized Timing Formula" if
221 need to maximize the capability of your display. To be on the 236 you need to maximize the capability of your display. To be on the
222 safe side, you can leave this unselected. 237 safe side, you can leave this unselected.
223 238
224 e. If you want a framebuffer console, enable it under "Console 239 e. If you want support for DDC/I2C probing (Plug and Play Displays),
240 set 'Enable DDC Support' to 'y'. To make this option appear, set
241 'use VESA Generalized Timing Formula' to 'y'.
242
243 f. If you want a framebuffer console, enable it under "Console
225 Drivers" 244 Drivers"
226 245
227 f. Compile your kernel. 246 g. Compile your kernel.
228 247
229 g. Load the driver as described in section D and E. 248 h. Load the driver as described in section D and E.
230 249
231 Optional:
232 h. If you are going to run XFree86 with its native drivers, the
233 standard XFree86 4.1.0 and 4.2.0 drivers should work as is.
234 However, there's a bug in the XFree86 i810 drivers. It attempts
235 to use XAA even when switched to the console. This will crash
236 your server. I have a fix at this site:
237
238 http://i810fb.sourceforge.net.
239
240 You can either use the patch, or just replace
241
242 /usr/X11R6/lib/modules/drivers/i810_drv.o
243
244 with the one provided at the website.
245
246 i. Try the DirectFB (http://www.directfb.org) + the i810 gfxdriver 250 i. Try the DirectFB (http://www.directfb.org) + the i810 gfxdriver
247 patch to see the chipset in action (or inaction :-). 251 patch to see the chipset in action (or inaction :-).
248 252
diff --git a/Documentation/fb/modedb.txt b/Documentation/fb/modedb.txt
index e04458b319d5..4fcdb4cf4cca 100644
--- a/Documentation/fb/modedb.txt
+++ b/Documentation/fb/modedb.txt
@@ -20,12 +20,83 @@ in a video= option, fbmem considers that to be a global video mode option.
20 20
21Valid mode specifiers (mode_option argument): 21Valid mode specifiers (mode_option argument):
22 22
23 <xres>x<yres>[-<bpp>][@<refresh>] 23 <xres>x<yres>[M][R][-<bpp>][@<refresh>][i][m]
24 <name>[-<bpp>][@<refresh>] 24 <name>[-<bpp>][@<refresh>]
25 25
26with <xres>, <yres>, <bpp> and <refresh> decimal numbers and <name> a string. 26with <xres>, <yres>, <bpp> and <refresh> decimal numbers and <name> a string.
27Things between square brackets are optional. 27Things between square brackets are optional.
28 28
29If 'M' is specified in the mode_option argument (after <yres> and before
30<bpp> and <refresh>, if specified) the timings will be calculated using
31VESA(TM) Coordinated Video Timings instead of looking up the mode from a table.
32If 'R' is specified, do a 'reduced blanking' calculation for digital displays.
33If 'i' is specified, calculate for an interlaced mode. And if 'm' is
34specified, add margins to the calculation (1.8% of xres rounded down to 8
35pixels and 1.8% of yres).
36
37 Sample usage: 1024x768M@60m - CVT timing with margins
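The first specifier form above can be read as a small grammar. The following Python sketch parses it with a regular expression written directly from that grammar; it is an illustration only and not the kernel's parser, which may accept or reject strings differently. The function and key names are hypothetical.

```python
import re

# <xres>x<yres>[M][R][-<bpp>][@<refresh>][i][m], as written in the text.
MODE_RE = re.compile(
    r"^(?P<xres>\d+)x(?P<yres>\d+)"
    r"(?P<cvt>M)?(?P<rb>R)?"
    r"(?:-(?P<bpp>\d+))?"
    r"(?:@(?P<refresh>\d+))?"
    r"(?P<interlace>i)?(?P<margins>m)?$"
)


def parse_mode_option(text):
    """Split a mode_option string into its fields, or return None."""
    match = MODE_RE.match(text)
    if match is None:
        return None
    fields = match.groupdict()
    return {
        "xres": int(fields["xres"]),
        "yres": int(fields["yres"]),
        "cvt": fields["cvt"] is not None,            # 'M': use CVT timings
        "reduced_blanking": fields["rb"] is not None,  # 'R'
        "bpp": int(fields["bpp"]) if fields["bpp"] else None,
        "refresh": int(fields["refresh"]) if fields["refresh"] else None,
        "interlaced": fields["interlace"] is not None,  # 'i'
        "margins": fields["margins"] is not None,       # 'm'
    }


print(parse_mode_option("1024x768M@60m"))
```

For the sample string, the sketch reports CVT timing at 60 Hz with margins and no explicit bpp, matching the description in the text.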
38
39***** oOo ***** oOo ***** oOo ***** oOo ***** oOo ***** oOo ***** oOo *****
40
41What is the VESA(TM) Coordinated Video Timings (CVT)?
42
43From the VESA(TM) Website:
44
45 "The purpose of CVT is to provide a method for generating a consistent
46 and coordinated set of standard formats, display refresh rates, and
47 timing specifications for computer display products, both those
48 employing CRTs, and those using other display technologies. The
49 intention of CVT is to give both source and display manufacturers a
50 common set of tools to enable new timings to be developed in a
51 consistent manner that ensures greater compatibility."
52
53This is the third standard approved by VESA(TM) concerning video timings. The
54first was the Discrete Video Timings (DVT) which is a collection of
55pre-defined modes approved by VESA(TM). The second is the Generalized Timing
56Formula (GTF) which is an algorithm to calculate the timings, given the
57pixelclock, the horizontal sync frequency, or the vertical refresh rate.
58
59The GTF is limited by the fact that it is designed mainly for CRT displays.
60It artificially increases the pixelclock because of its high blanking
61requirement. This is inappropriate for digital display interfaces, whose high
62data rates require conserving the pixelclock as much as possible.
63Also, GTF does not take into account the aspect ratio of the display.
64
65The CVT addresses these limitations. If used with CRTs, the formula used
66is a derivation of GTF with a few modifications. If used with digital
67displays, the "reduced blanking" calculation can be used.
68
69From the framebuffer subsystem perspective, new formats need not be added
70to the global mode database whenever a new mode is released by display
71manufacturers. Specifying for CVT will work for most, if not all, relatively
72new CRT displays and probably with most flatpanels, if 'reduced blanking'
73calculation is specified. (The CVT compatibility of the display can be
74determined from its EDID. The version 1.3 of the EDID has extra 128-byte
75blocks where additional timing information is placed. As of this time, there
76is no support yet in the layer to parse these additional blocks.)
77
78CVT also introduced a new naming convention (should be seen from dmesg output):
79
80 <pix>M<a>[-R]
81
82 where: pix = total amount of pixels in MB (xres x yres)
83 M = always present
84 a = aspect ratio (3 - 4:3; 4 - 5:4; 9 - 15:9, 16:9; A - 16:10)
85 -R = reduced blanking
86
87 example: .48M3-R - 800x600 with reduced blanking
88
89Note: VESA(TM) has restrictions on what is a standard CVT timing:
90
91 - aspect ratio can only be one of the above values
92 - acceptable refresh rates are 50, 60, 70 or 85 Hz only
93 - if reduced blanking, the refresh rate must be at 60Hz
94
95If any of the above is not satisfied, the kernel will print a warning but the
96timings will still be calculated.
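
The naming convention can be illustrated with a short C sketch. The
helper names are hypothetical, the aspect test only accepts exact
ratios, and printing the pixel count truncated to two decimals (with
the leading zero dropped, as in the ".48M3-R" example) is an assumption
of this sketch; the kernel's own formatting may differ.

```c
#include <stdio.h>
#include <stddef.h>
#include <string.h>

/* Map a resolution to the aspect code from the table; 0 if none matches. */
static char cvt_aspect_code(unsigned int xres, unsigned int yres)
{
	if (xres * 3 == yres * 4)
		return '3';			/* 4:3 */
	if (xres * 4 == yres * 5)
		return '4';			/* 5:4 */
	if (xres * 9 == yres * 15 || xres * 9 == yres * 16)
		return '9';			/* 15:9 or 16:9 */
	if (xres * 10 == yres * 16)
		return 'A';			/* 16:10 */
	return 0;
}

/* Build "<pix>M<a>[-R]" into buf; returns 0 on success, -1 otherwise. */
static int cvt_name(char *buf, size_t len, unsigned int xres,
		    unsigned int yres, int reduced)
{
	char aspect = cvt_aspect_code(xres, yres);
	/* pixel count in hundredths of a megapixel, truncated */
	unsigned int hundredths =
		(unsigned int)((unsigned long long)xres * yres / 10000);
	char pix[16];
	int n;

	if (!aspect)
		return -1;	/* not one of the standard aspect ratios */

	if (hundredths < 100)
		snprintf(pix, sizeof(pix), ".%02u", hundredths);
	else
		snprintf(pix, sizeof(pix), "%u.%02u",
			 hundredths / 100, hundredths % 100);

	n = snprintf(buf, len, "%sM%c%s", pix, aspect, reduced ? "-R" : "");
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

Under these assumptions, 800x600 with reduced blanking yields ".48M3-R",
matching the example above, and a resolution with a non-standard aspect
ratio is rejected.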
97
98***** oOo ***** oOo ***** oOo ***** oOo ***** oOo ***** oOo ***** oOo *****
99
29To find a suitable video mode, you just call 100To find a suitable video mode, you just call
30 101
31int __init fb_find_mode(struct fb_var_screeninfo *var, 102int __init fb_find_mode(struct fb_var_screeninfo *var,
diff --git a/Documentation/feature-removal-schedule.txt b/Documentation/feature-removal-schedule.txt
index 5f95d4b3cab1..784e08c1c80a 100644
--- a/Documentation/feature-removal-schedule.txt
+++ b/Documentation/feature-removal-schedule.txt
@@ -17,14 +17,6 @@ Who: Greg Kroah-Hartman <greg@kroah.com>
17 17
18--------------------------- 18---------------------------
19 19
20What: ACPI S4bios support
21When: May 2005
22Why: Noone uses it, and it probably does not work, anyway. swsusp is
23 faster, more reliable, and people are actually using it.
24Who: Pavel Machek <pavel@suse.cz>
25
26---------------------------
27
28What: io_remap_page_range() (macro or function) 20What: io_remap_page_range() (macro or function)
29When: September 2005 21When: September 2005
30Why: Replaced by io_remap_pfn_range() which allows more memory space 22Why: Replaced by io_remap_pfn_range() which allows more memory space
diff --git a/Documentation/filesystems/files.txt b/Documentation/filesystems/files.txt
new file mode 100644
index 000000000000..8c206f4e0250
--- /dev/null
+++ b/Documentation/filesystems/files.txt
@@ -0,0 +1,123 @@
1File management in the Linux kernel
2-----------------------------------
3
4This document describes how locking for files (struct file)
5and file descriptor table (struct files) works.
6
7Up until 2.6.12, the file descriptor table has been protected
8with a lock (files->file_lock) and reference count (files->count).
9->file_lock protected accesses to all the file related fields
10of the table. ->count was used for sharing the file descriptor
11table between tasks cloned with CLONE_FILES flag. Typically
12this would be the case for posix threads. As with the common
13refcounting model in the kernel, the last task doing
14a put_files_struct() frees the file descriptor (fd) table.
15The files (struct file) themselves are protected using
16reference count (->f_count).
17
18In the new lock-free model of file descriptor management,
19the reference counting is similar, but the locking is
20based on RCU. The file descriptor table contains multiple
21elements - the fd sets (open_fds and close_on_exec), the
22array of file pointers, the sizes of the sets and the array,
23etc. In order for the updates to appear atomic to
24a lock-free reader, all the elements of the file descriptor
25table are in a separate structure - struct fdtable.
26files_struct contains a pointer to struct fdtable through
27which the actual fd table is accessed. Initially the
28fdtable is embedded in files_struct itself. On a subsequent
29expansion of fdtable, a new fdtable structure is allocated
30and files->fdtab points to the new structure. The fdtable
31structure is freed with RCU and lock-free readers either
32see the old fdtable or the new fdtable making the update
33appear atomic. Here are the locking rules for
34the fdtable structure -
35
361. All references to the fdtable must be done through
37 the files_fdtable() macro :
38
39 struct fdtable *fdt;
40
41 rcu_read_lock();
42
43 fdt = files_fdtable(files);
44 ....
45 if (n <= fdt->max_fds)
46 ....
47 ...
48 rcu_read_unlock();
49
50 files_fdtable() uses rcu_dereference() macro which takes care of
51 the memory barrier requirements for lock-free dereference.
52 The fdtable pointer must be read within the read-side
53 critical section.
54
552. Reading of the fdtable as described above must be protected
56 by rcu_read_lock()/rcu_read_unlock().
57
583. For any update to the fd table, files->file_lock must
59 be held.
60
614. To look up the file structure given an fd, a reader
62 must use either fcheck() or fcheck_files() APIs. These
63 take care of barrier requirements due to lock-free lookup.
64 An example :
65
66 struct file *file;
67
68 rcu_read_lock();
69 file = fcheck(fd);
70 if (file) {
71 ...
72 }
73 ....
74 rcu_read_unlock();
75
765. Handling of the file structures is special. Since the look-up
77 of the fd (fget()/fget_light()) is lock-free, it is possible
78 that look-up may race with the last put() operation on the
79 file structure. This is avoided using the rcuref APIs
80 on ->f_count :
81
82 rcu_read_lock();
83 file = fcheck_files(files, fd);
84 if (file) {
85 if (rcuref_inc_lf(&file->f_count))
86 *fput_needed = 1;
87 else
88 /* Didn't get the reference, someone's freed */
89 file = NULL;
90 }
91 rcu_read_unlock();
92 ....
93 return file;
94
95 rcuref_inc_lf() detects if the refcount is already zero or
96 goes to zero during increment. If it does, we fail
97 fget()/fget_light().
98
996. Since both fdtable and file structures can be looked up
100 lock-free, they must be installed using rcu_assign_pointer()
101 API. If they are looked up lock-free, rcu_dereference()
102 must be used. However it is advisable to use files_fdtable()
103 and fcheck()/fcheck_files() which take care of these issues.
104
1057. While updating, the fdtable pointer must be looked up while
106 holding files->file_lock. If ->file_lock is dropped, then
107 another thread may expand the files, thereby creating a new
108 fdtable and making the earlier fdtable pointer stale.
109 For example :
110
111 spin_lock(&files->file_lock);
112 fd = locate_fd(files, file, start);
113 if (fd >= 0) {
114 /* locate_fd() may have expanded fdtable, load the ptr */
115 fdt = files_fdtable(files);
116 FD_SET(fd, fdt->open_fds);
117 FD_CLR(fd, fdt->close_on_exec);
118 spin_unlock(&files->file_lock);
119 .....
120
121 Since locate_fd() can drop ->file_lock (and reacquire ->file_lock),
122 the fdtable pointer (fdt) must be loaded after locate_fd().
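
Two of the ideas in the rules above, publishing a fully initialized
table through a single pointer and taking a reference only while the
count is non-zero, can be modelled in userspace with C11 atomics. This
is a rough sketch, not the kernel implementation: every toy_* name is
invented for the example, acquire/release ordering stands in for
rcu_dereference()/rcu_assign_pointer(), a compare-and-swap loop stands
in for the rcuref API, and no grace period is implemented, so the
caller must not free the old table while readers may still hold it.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_fdtable {
	unsigned int max_fds;
};

struct toy_files {
	_Atomic(struct toy_fdtable *) fdt;	/* the published table */
};

/* Reader: load the pointer once and use only that snapshot. */
static unsigned int toy_max_fds(struct toy_files *files)
{
	struct toy_fdtable *fdt =
		atomic_load_explicit(&files->fdt, memory_order_acquire);

	return fdt->max_fds;
}

/*
 * Updater: new_fdt must be fully initialized before this call.
 * Returns the old table, which may only be freed once no reader
 * can still be using it (the part RCU handles in the kernel).
 */
static struct toy_fdtable *toy_publish(struct toy_files *files,
				       struct toy_fdtable *new_fdt)
{
	return atomic_exchange_explicit(&files->fdt, new_fdt,
					memory_order_acq_rel);
}

/* Take a reference only if the object has not already dropped to zero. */
static bool toy_ref_inc_not_zero(atomic_int *count)
{
	int old = atomic_load(count);

	while (old != 0) {
		/* on failure, old is refreshed with the current value */
		if (atomic_compare_exchange_weak(count, &old, old + 1))
			return true;
	}
	return false;	/* count was zero: the object is being freed */
}
```

A reader sees either the old or the new table as a whole, never a
mixture, which is the property the separate fdtable structure is meant
to provide.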
123
diff --git a/Documentation/filesystems/fuse.txt b/Documentation/filesystems/fuse.txt
new file mode 100644
index 000000000000..6b5741e651a2
--- /dev/null
+++ b/Documentation/filesystems/fuse.txt
@@ -0,0 +1,315 @@
1Definitions
2~~~~~~~~~~~
3
4Userspace filesystem:
5
6 A filesystem in which data and metadata are provided by an ordinary
7 userspace process. The filesystem can be accessed normally through
8 the kernel interface.
9
10Filesystem daemon:
11
12 The process(es) providing the data and metadata of the filesystem.
13
14Non-privileged mount (or user mount):
15
16 A userspace filesystem mounted by a non-privileged (non-root) user.
17 The filesystem daemon is running with the privileges of the mounting
18 user. NOTE: this is not the same as mounts allowed with the "user"
19 option in /etc/fstab, which is not discussed here.
20
21Mount owner:
22
23 The user who does the mounting.
24
25User:
26
27 The user who is performing filesystem operations.
28
29What is FUSE?
30~~~~~~~~~~~~~
31
32FUSE is a userspace filesystem framework. It consists of a kernel
33module (fuse.ko), a userspace library (libfuse.*) and a mount utility
34(fusermount).
35
36One of the most important features of FUSE is allowing secure,
37non-privileged mounts. This opens up new possibilities for the use of
38filesystems. A good example is sshfs: a secure network filesystem
39using the sftp protocol.
40
41The userspace library and utilities are available from the FUSE
42homepage:
43
44 http://fuse.sourceforge.net/
45
46Mount options
47~~~~~~~~~~~~~
48
49'fd=N'
50
51 The file descriptor to use for communication between the userspace
52 filesystem and the kernel. The file descriptor must have been
53 obtained by opening the FUSE device ('/dev/fuse').
54
55'rootmode=M'
56
57 The file mode of the filesystem's root in octal representation.
58
59'user_id=N'
60
61 The numeric user id of the mount owner.
62
63'group_id=N'
64
65 The numeric group id of the mount owner.
66
67'default_permissions'
68
69 By default FUSE doesn't check file access permissions; the
70 filesystem is free to implement its access policy or leave it to
71 the underlying file access mechanism (e.g. in case of network
72 filesystems). This option enables permission checking, restricting
73 access based on file mode. This option is usually useful
74 together with the 'allow_other' mount option.
75
76'allow_other'
77
78 This option overrides the security measure restricting file access
79 to the user mounting the filesystem. This option is by default only
80 allowed to root, but this restriction can be removed with a
81 (userspace) configuration option.
82
83'max_read=N'
84
85 With this option the maximum size of read operations can be set.
86 The default is infinite. Note that the size of read requests is
87 limited anyway to 32 pages (which is 128kbyte on i386).
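
As an illustration of how the options above fit together, the following
sketch assembles an option string from its parts. The function name
and the sample values are assumptions of this example; the real work is
normally done by fusermount and libfuse, whose code is not shown here.

```c
#include <stdio.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: format the basic FUSE mount options.
 * rootmode is printed in octal, as the option description requires.
 * Returns 0 on success, -1 if the buffer is too small.
 */
static int toy_fuse_opts(char *buf, size_t len, int fd,
			 unsigned int rootmode,
			 unsigned int uid, unsigned int gid)
{
	int n = snprintf(buf, len,
			 "fd=%d,rootmode=%o,user_id=%u,group_id=%u",
			 fd, rootmode, uid, gid);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

For example, fd 3, a root mode of 040755 (a directory with rwxr-xr-x
permissions) and uid/gid 1000 give
"fd=3,rootmode=40755,user_id=1000,group_id=1000".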
88
89How do non-privileged mounts work?
90~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
91
92Since the mount() system call is a privileged operation, a helper
93program (fusermount) is needed, which is installed setuid root.
94
95The implication of providing non-privileged mounts is that the mount
96owner must not be able to use this capability to compromise the
97system. Obvious requirements arising from this are:
98
99 A) mount owner should not be able to get elevated privileges with the
100 help of the mounted filesystem
101
102 B) mount owner should not get illegitimate access to information from
103 other users' and the super user's processes
104
105 C) mount owner should not be able to induce undesired behavior in
106 other users' or the super user's processes
107
108How are requirements fulfilled?
109~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
110
111 A) The mount owner could gain elevated privileges by either:
112
113 1) creating a filesystem containing a device file, then opening
114 this device
115
116 2) creating a filesystem containing a suid or sgid application,
117 then executing this application
118
119 The solution is not to allow opening device files and ignore
120 setuid and setgid bits when executing programs. To ensure this
121 fusermount always adds "nosuid" and "nodev" to the mount options
122 for non-privileged mounts.
123
124 B) If another user is accessing files or directories in the
125 filesystem, the filesystem daemon serving requests can record the
126 exact sequence and timing of operations performed. This
127 information is otherwise inaccessible to the mount owner, so this
128 counts as an information leak.
129
130 The solution to this problem will be presented in point 2) of C).
131
132 C) There are several ways in which the mount owner can induce
133 undesired behavior in other users' processes, such as:
134
135 1) mounting a filesystem over a file or directory which the mount
136 owner could otherwise not be able to modify (or could only
137 make limited modifications).
138
139 This is solved in fusermount, by checking the access
140 permissions on the mountpoint and only allowing the mount if
141 the mount owner can do unlimited modification (has write
142 access to the mountpoint, and mountpoint is not a "sticky"
143 directory)
144
145 2) Even if 1) is solved the mount owner can change the behavior
146 of other users' processes.
147
148 i) It can slow down or indefinitely delay the execution of a
149 filesystem operation creating a DoS against the user or the
150 whole system. For example a suid application locking a
151 system file, and then accessing a file on the mount owner's
152 filesystem could be stopped, and thus causing the system
153 file to be locked forever.
154
155 ii) It can present files or directories of unlimited length, or
156 directory structures of unlimited depth, possibly causing a
157 system process to eat up diskspace, memory or other
158 resources, again causing DoS.
159
160 The solution to this as well as B) is not to allow processes
161 to access the filesystem, which could otherwise not be
162 monitored or manipulated by the mount owner. Since if the
163 mount owner can ptrace a process, it can do all of the above
164 without using a FUSE mount, the same criteria as used in
165 ptrace can be used to check if a process is allowed to access
166 the filesystem or not.
167
168 Note that the ptrace check is not strictly necessary to
169 prevent C/2/i; it is enough to check if mount owner has enough
170 privilege to send signal to the process accessing the
171 filesystem, since SIGSTOP can be used to get a similar effect.
172
173I think these limitations are unacceptable?
174~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
175
176If a sysadmin trusts the users enough, or can ensure through other
177measures, that system processes will never enter non-privileged
178mounts, it can relax the last limitation with a "user_allow_other"
179config option. If this config option is set, the mounting user can
180add the "allow_other" mount option which disables the check for other
181users' processes.
182
183Kernel - userspace interface
184~~~~~~~~~~~~~~~~~~~~~~~~~~~~
185
186The following diagram shows how a filesystem operation (in this
187example unlink) is performed in FUSE.
188
189NOTE: everything in this description is greatly simplified
190
191 | "rm /mnt/fuse/file" | FUSE filesystem daemon
192 | |
193 | | >sys_read()
194 | | >fuse_dev_read()
195 | | >request_wait()
196 | | [sleep on fc->waitq]
197 | |
198 | >sys_unlink() |
199 | >fuse_unlink() |
200 | [get request from |
201 | fc->unused_list] |
202 | >request_send() |
203 | [queue req on fc->pending] |
204 | [wake up fc->waitq] | [woken up]
205 | >request_wait_answer() |
206 | [sleep on req->waitq] |
207 | | <request_wait()
208 | | [remove req from fc->pending]
209 | | [copy req to read buffer]
210 | | [add req to fc->processing]
211 | | <fuse_dev_read()
212 | | <sys_read()
213 | |
214 | | [perform unlink]
215 | |
216 | | >sys_write()
217 | | >fuse_dev_write()
218 | | [look up req in fc->processing]
219 | | [remove from fc->processing]
220 | | [copy write buffer to req]
221 | [woken up] | [wake up req->waitq]
222 | | <fuse_dev_write()
223 | | <sys_write()
224 | <request_wait_answer() |
225 | <request_send() |
226 | [add request to |
227 | fc->unused_list] |
228 | <fuse_unlink() |
229 | <sys_unlink() |
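
The life cycle of a request in the diagram above can be summarized as a
small state machine. The sketch below is a simplified model, not FUSE
code: the state and event names are invented, and each state simply
corresponds to the list a request sits on (fc->unused_list,
fc->pending, fc->processing) or to the point where the answer has been
copied back.

```c
/* Where a request currently lives (names are illustrative only). */
enum toy_req_state {
	TOY_REQ_UNUSED,		/* on fc->unused_list */
	TOY_REQ_PENDING,	/* queued on fc->pending */
	TOY_REQ_PROCESSING,	/* read by the daemon, on fc->processing */
	TOY_REQ_ANSWERED	/* reply copied, caller woken up */
};

enum toy_req_event {
	TOY_EV_SEND,		/* request_send() queues the request */
	TOY_EV_DAEMON_READ,	/* daemon read() picks it up */
	TOY_EV_DAEMON_WRITE,	/* daemon write() delivers the answer */
	TOY_EV_RELEASE		/* caller returns it to the unused list */
};

/* Returns the next state, or -1 if the event is not valid in this state. */
static int toy_req_next(enum toy_req_state s, enum toy_req_event e)
{
	switch (s) {
	case TOY_REQ_UNUSED:
		return e == TOY_EV_SEND ? TOY_REQ_PENDING : -1;
	case TOY_REQ_PENDING:
		return e == TOY_EV_DAEMON_READ ? TOY_REQ_PROCESSING : -1;
	case TOY_REQ_PROCESSING:
		return e == TOY_EV_DAEMON_WRITE ? TOY_REQ_ANSWERED : -1;
	case TOY_REQ_ANSWERED:
		return e == TOY_EV_RELEASE ? TOY_REQ_UNUSED : -1;
	}
	return -1;
}
```

The model deliberately leaves out interruption, which the deadlock
scenarios in this document are concerned with.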
230
231There are a couple of ways in which to deadlock a FUSE filesystem.
232Since we are talking about unprivileged userspace programs,
233something must be done about these.
234
235Scenario 1 - Simple deadlock
236-----------------------------
237
238 | "rm /mnt/fuse/file" | FUSE filesystem daemon
239 | |
240 | >sys_unlink("/mnt/fuse/file") |
241 | [acquire inode semaphore |
242 | for "file"] |
243 | >fuse_unlink() |
244 | [sleep on req->waitq] |
245 | | <sys_read()
246 | | >sys_unlink("/mnt/fuse/file")
247 | | [acquire inode semaphore
248 | | for "file"]
249 | | *DEADLOCK*
250
251The solution for this is to allow requests to be interrupted while
252they are in userspace:
253
254 | [interrupted by signal] |
255 | <fuse_unlink() |
256 | [release semaphore] | [semaphore acquired]
257 | <sys_unlink() |
258 | | >fuse_unlink()
259 | | [queue req on fc->pending]
260 | | [wake up fc->waitq]
261 | | [sleep on req->waitq]
262
263If the filesystem daemon was single threaded, this will stop here,
264since there's no other thread to dequeue and execute the request.
265In this case the solution is to kill the FUSE daemon as well. If
266there are multiple serving threads, you just have to kill them as
267long as any remain.
268
269Moral: a filesystem which deadlocks, can soon find itself dead.
270
271Scenario 2 - Tricky deadlock
272----------------------------
273
274This one needs a carefully crafted filesystem. It's a variation on
275the above, only the call back to the filesystem is not explicit,
276but is caused by a pagefault.
277
278 | Kamikaze filesystem thread 1 | Kamikaze filesystem thread 2
279 | |
280 | [fd = open("/mnt/fuse/file")] | [request served normally]
281 | [mmap fd to 'addr'] |
282 | [close fd] | [FLUSH triggers 'magic' flag]
283 | [read a byte from addr] |
284 | >do_page_fault() |
285 | [find or create page] |
286 | [lock page] |
287 | >fuse_readpage() |
288 | [queue READ request] |
289 | [sleep on req->waitq] |
290 | | [read request to buffer]
291 | | [create reply header before addr]
292 | | >sys_write(addr - headerlength)
293 | | >fuse_dev_write()
294 | | [look up req in fc->processing]
295 | | [remove from fc->processing]
296 | | [copy write buffer to req]
297 | | >do_page_fault()
298 | | [find or create page]
299 | | [lock page]
300 | | * DEADLOCK *
301
302The solution is again to let the request be interrupted (not
303elaborated further).
304
305An additional problem is that while the write buffer is being
306copied to the request, the request must not be interrupted. This
307is because the destination address of the copy may not be valid
308after the request is interrupted.
309
310This is solved by doing the copy atomically, and allowing
311interruption while the page(s) belonging to the write buffer are
312faulted with get_user_pages(). The 'req->locked' flag indicates
313when the copy is taking place, and interruption is delayed until
314this flag is unset.
315
diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt
index 5024ba7a592c..d4773565ea2f 100644
--- a/Documentation/filesystems/proc.txt
+++ b/Documentation/filesystems/proc.txt
@@ -1241,16 +1241,38 @@ swap-intensive.
1241overcommit_memory 1241overcommit_memory
1242----------------- 1242-----------------
1243 1243
1244This file contains one value. The following algorithm is used to decide if 1244Controls overcommit of system memory, possibly allowing processes
1245there's enough memory: if the value of overcommit_memory is positive, then 1245to allocate (but not use) more memory than is actually available.
1246there's always enough memory. This is a useful feature, since programs often 1246
1247malloc() huge amounts of memory 'just in case', while they only use a small 1247
1248part of it. Leaving this value at 0 will lead to the failure of such a huge 12480 - Heuristic overcommit handling. Obvious overcommits of
1249malloc(), when in fact the system has enough memory for the program to run. 1249 address space are refused. Used for a typical system. It
1250 1250 ensures a seriously wild allocation fails while allowing
1251On the other hand, enabling this feature can cause you to run out of memory 1251 overcommit to reduce swap usage. root is allowed to
1252and thrash the system to death, so large and/or important servers will want to 1252 allocate slightly more memory in this mode. This is the
1253set this value to 0. 1253 default.
1254
12551 - Always overcommit. Appropriate for some scientific
1256 applications.
1257
12582 - Don't overcommit. The total address space commit
1259 for the system is not permitted to exceed swap plus a
1260 configurable percentage (default is 50) of physical RAM.
1261 Depending on the percentage you use, in most situations
1262 this means a process will not be killed while attempting
1263 to use already-allocated memory but will receive errors
1264 on memory allocation as appropriate.
1265
1266overcommit_ratio
1267----------------
1268
1269Percentage of physical memory size to include in overcommit calculations
1270(see above.)
1271
1272Memory allocation limit = swapspace + physmem * (overcommit_ratio / 100)
1273
1274 swapspace = total size of all swap areas
1275 physmem = size of physical memory in system
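
The formula above is easy to get wrong with integer arithmetic, since
(overcommit_ratio / 100) truncates to zero for any ratio below 100.
The sketch below multiplies before dividing; the function name and the
choice of bytes as the unit are assumptions of this example.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: commit limit for overcommit_memory mode 2.
 * All sizes are in the same unit (bytes here).
 */
static uint64_t toy_commit_limit(uint64_t swapspace, uint64_t physmem,
				 unsigned int overcommit_ratio)
{
	/* guard the multiplication against overflow */
	assert(overcommit_ratio == 0 ||
	       physmem <= UINT64_MAX / overcommit_ratio);

	/* multiply first so that ratio / 100 is not truncated to 0 */
	return swapspace + physmem * overcommit_ratio / 100;
}
```

With 2 GiB of swap, 4 GiB of RAM and the default ratio of 50, the limit
comes out at 4 GiB.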
1254 1276
1255nr_hugepages and hugetlb_shm_group 1277nr_hugepages and hugetlb_shm_group
1256---------------------------------- 1278----------------------------------
diff --git a/Documentation/filesystems/v9fs.txt b/Documentation/filesystems/v9fs.txt
new file mode 100644
index 000000000000..4e92feb6b507
--- /dev/null
+++ b/Documentation/filesystems/v9fs.txt
@@ -0,0 +1,95 @@
1 V9FS: 9P2000 for Linux
2 ======================
3
4ABOUT
5=====
6
7v9fs is a Unix implementation of the Plan 9 9p remote filesystem protocol.
8
9This software was originally developed by Ron Minnich <rminnich@lanl.gov>
10and Maya Gokhale <maya@lanl.gov>. Additional development by Greg Watson
11<gwatson@lanl.gov> and most recently Eric Van Hensbergen
12<ericvh@gmail.com> and Latchesar Ionkov <lucho@ionkov.net>.
13
14USAGE
15=====
16
17For remote file server:
18
19 mount -t 9P 10.10.1.2 /mnt/9
20
21For Plan 9 From User Space applications (http://swtch.com/plan9)
22
23 mount -t 9P `namespace`/acme /mnt/9 -o proto=unix,name=$USER
24
25OPTIONS
26=======
27
28 proto=name select an alternative transport. Valid options are
29 currently:
30 unix - specifying a named pipe mount point
31 tcp - specifying a normal TCP/IP connection
32 fd - use passed file descriptors for connection
33 (see rfdno and wfdno)
34
35 name=name user name to attempt mount as on the remote server. The
36 server may override or ignore this value. Certain user
37 names may require authentication.
38
39 aname=name aname specifies the file tree to access when the server is
40 offering several exported file systems.
41
42 debug=n specifies debug level. The debug level is a bitmask.
43 0x01 = display verbose error messages
44 0x02 = developer debug (DEBUG_CURRENT)
45 0x04 = display 9P trace
46 0x08 = display VFS trace
47 0x10 = display Marshalling debug
48 0x20 = display RPC debug
49 0x40 = display transport debug
50 0x80 = display allocation debug
51
52 rfdno=n the file descriptor for reading with proto=fd
53
54 wfdno=n the file descriptor for writing with proto=fd
55
56 maxdata=n the number of bytes to use for 9P packet payload (msize)
57
58 port=n port to connect to on the remote server
59
60 timeout=n request timeouts (in ms) (default 60000ms)
61
62 noextend force legacy mode (no 9P2000.u semantics)
63
64 uid attempt to mount as a particular uid
65
66 gid attempt to mount with a particular gid
67
68 afid security channel - used by Plan 9 authentication protocols
69
70 nodevmap do not map special files - represent them as normal files.
71 This can be used to share devices/named pipes/sockets between
72 hosts. This functionality will be expanded in later versions.
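
Since the debug level is a bitmask, several kinds of output are enabled
by OR-ing the values listed under debug=n. A small sketch follows; the
TOY_DBG_* names are made up for this example (only the numeric values
come from the list above).

```c
#include <assert.h>
#include <stddef.h>

/* Bit values from the debug=n list; the names are illustrative. */
enum toy_9p_debug {
	TOY_DBG_ERROR	= 0x01,	/* verbose error messages */
	TOY_DBG_CURRENT	= 0x02,	/* developer debug */
	TOY_DBG_9P	= 0x04,	/* 9P trace */
	TOY_DBG_VFS	= 0x08,	/* VFS trace */
	TOY_DBG_CONV	= 0x10,	/* marshalling debug */
	TOY_DBG_RPC	= 0x20,	/* RPC debug */
	TOY_DBG_TRANS	= 0x40,	/* transport debug */
	TOY_DBG_ALLOC	= 0x80	/* allocation debug */
};

/* OR together the requested bits to obtain the value for debug=n. */
static unsigned int toy_debug_mask(const unsigned int *bits, size_t n)
{
	unsigned int mask = 0;
	size_t i;

	assert(bits != NULL || n == 0);
	for (i = 0; i < n; i++)
		mask |= bits[i];
	return mask;
}
```

For instance, verbose errors plus the 9P and VFS traces combine to
0x0d (13).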
73
74RESOURCES
75=========
76
77The Linux version of the 9P server, along with some client-side utilities
78can be found at http://v9fs.sf.net (along with a CVS repository of the
79development branch of this module). There are user and developer mailing
80lists here, as well as a bug-tracker.
81
82For more information on the Plan 9 Operating System check out
83http://plan9.bell-labs.com/plan9
84
85For information on Plan 9 from User Space (Plan 9 applications and libraries
86ported to Linux/BSD/OSX/etc) check out http://swtch.com/plan9
87
88
89STATUS
90======
91
92The 2.6 kernel support is working on PPC and x86.
93
94PLEASE USE THE SOURCEFORGE BUG-TRACKER TO REPORT PROBLEMS.
95
diff --git a/Documentation/filesystems/vfs.txt b/Documentation/filesystems/vfs.txt
index 3f318dd44c77..f042c12e0ed2 100644
--- a/Documentation/filesystems/vfs.txt
+++ b/Documentation/filesystems/vfs.txt
@@ -1,35 +1,27 @@
1/* -*- auto-fill -*- */
2 1
3 Overview of the Virtual File System 2 Overview of the Linux Virtual File System
4 3
5 Richard Gooch <rgooch@atnf.csiro.au> 4 Original author: Richard Gooch <rgooch@atnf.csiro.au>
6 5
7 5-JUL-1999 6 Last updated on August 25, 2005
8 7
8 Copyright (C) 1999 Richard Gooch
9 Copyright (C) 2005 Pekka Enberg
9 10
10Conventions used in this document <section> 11 This file is released under the GPLv2.
11=================================
12 12
13Each section in this document will have the string "<section>" at the
14right-hand side of the section title. Each subsection will have
15"<subsection>" at the right-hand side. These strings are meant to make
16it easier to search through the document.
17 13
18NOTE that the master copy of this document is available online at: 14What is it?
19http://www.atnf.csiro.au/~rgooch/linux/docs/vfs.txt
20
21
22What is it? <section>
23=========== 15===========
24 16
25The Virtual File System (otherwise known as the Virtual Filesystem 17The Virtual File System (otherwise known as the Virtual Filesystem
26Switch) is the software layer in the kernel that provides the 18Switch) is the software layer in the kernel that provides the
27filesystem interface to userspace programs. It also provides an 19filesystem interface to userspace programs. It also provides an
28abstraction within the kernel which allows different filesystem 20abstraction within the kernel which allows different filesystem
29implementations to co-exist. 21implementations to coexist.
30 22
31 23
32A Quick Look At How It Works <section> 24A Quick Look At How It Works
33============================ 25============================
34 26
35In this section I'll briefly describe how things work, before 27In this section I'll briefly describe how things work, before
@@ -38,7 +30,8 @@ when user programs open and manipulate files, and then look from the
38other view which is how a filesystem is supported and subsequently 30other view which is how a filesystem is supported and subsequently
39mounted. 31mounted.
40 32
41Opening a File <subsection> 33
34Opening a File
42-------------- 35--------------
43 36
44The VFS implements the open(2), stat(2), chmod(2) and similar system 37The VFS implements the open(2), stat(2), chmod(2) and similar system
@@ -77,7 +70,7 @@ back to userspace.
77 70
78Opening a file requires another operation: allocation of a file 71Opening a file requires another operation: allocation of a file
79structure (this is the kernel-side implementation of file 72structure (this is the kernel-side implementation of file
80descriptors). The freshly allocated file structure is initialised with 73descriptors). The freshly allocated file structure is initialized with
81a pointer to the dentry and a set of file operation member functions. 74a pointer to the dentry and a set of file operation member functions.
82These are taken from the inode data. The open() file method is then 75These are taken from the inode data. The open() file method is then
83called so the specific filesystem implementation can do its work. You 76called so the specific filesystem implementation can do its work. You
@@ -102,7 +95,8 @@ filesystem or driver code at the same time, on different
102processors. You should ensure that access to shared resources is 95processors. You should ensure that access to shared resources is
103protected by appropriate locks. 96protected by appropriate locks.
104 97
105Registering and Mounting a Filesystem <subsection> 98
99Registering and Mounting a Filesystem
106------------------------------------- 100-------------------------------------
107 101
108If you want to support a new kind of filesystem in the kernel, all you 102If you want to support a new kind of filesystem in the kernel, all you
@@ -123,17 +117,21 @@ updated to point to the root inode for the new filesystem.
123It's now time to look at things in more detail. 117It's now time to look at things in more detail.
124 118
125 119
126struct file_system_type <section> 120struct file_system_type
127======================= 121=======================
128 122
129This describes the filesystem. As of kernel 2.1.99, the following 123This describes the filesystem. As of kernel 2.6.13, the following
130members are defined: 124members are defined:
131 125
132struct file_system_type { 126struct file_system_type {
133 const char *name; 127 const char *name;
134 int fs_flags; 128 int fs_flags;
135 struct super_block *(*read_super) (struct super_block *, void *, int); 129 struct super_block *(*get_sb) (struct file_system_type *, int,
136 struct file_system_type * next; 130 const char *, void *);
131 void (*kill_sb) (struct super_block *);
132 struct module *owner;
133 struct file_system_type * next;
134 struct list_head fs_supers;
137}; 135};
138 136
139 name: the name of the filesystem type, such as "ext2", "iso9660", 137 name: the name of the filesystem type, such as "ext2", "iso9660",
@@ -141,51 +139,97 @@ struct file_system_type {
141 139
142 fs_flags: various flags (e.g. FS_REQUIRES_DEV, FS_NO_DCACHE, etc.) 140 fs_flags: various flags (e.g. FS_REQUIRES_DEV, FS_NO_DCACHE, etc.)
143 141
144 read_super: the method to call when a new instance of this 142 get_sb: the method to call when a new instance of this
145 filesystem should be mounted 143 filesystem should be mounted
146 144
147 next: for internal VFS use: you should initialise this to NULL 145 kill_sb: the method to call when an instance of this filesystem
146 should be unmounted
147
148 owner: for internal VFS use: you should initialize this to THIS_MODULE in
149 most cases.
148 150
149The read_super() method has the following arguments: 151 next: for internal VFS use: you should initialize this to NULL
152
153The get_sb() method has the following arguments:
150 154
151 struct super_block *sb: the superblock structure. This is partially 155 struct super_block *sb: the superblock structure. This is partially
152 initialised by the VFS and the rest must be initialised by the 156 initialized by the VFS and the rest must be initialized by the
153 read_super() method 157 get_sb() method
158
159 int flags: mount flags
160
161 const char *dev_name: the device name we are mounting.
154 162
155 void *data: arbitrary mount options, usually comes as an ASCII 163 void *data: arbitrary mount options, usually comes as an ASCII
156 string 164 string
157 165
158 int silent: whether or not to be silent on error 166 int silent: whether or not to be silent on error
159 167
160The read_super() method must determine if the block device specified 168The get_sb() method must determine if the block device specified
161in the superblock contains a filesystem of the type the method 169in the superblock contains a filesystem of the type the method
162supports. On success the method returns the superblock pointer, on 170supports. On success the method returns the superblock pointer, on
163failure it returns NULL. 171failure it returns NULL.
164 172
165The most interesting member of the superblock structure that the 173The most interesting member of the superblock structure that the
166read_super() method fills in is the "s_op" field. This is a pointer to 174get_sb() method fills in is the "s_op" field. This is a pointer to
167a "struct super_operations" which describes the next level of the 175a "struct super_operations" which describes the next level of the
168filesystem implementation. 176filesystem implementation.
169 177
178Usually, a filesystem uses one of the generic get_sb()
179implementations and provides a fill_super() method instead. The
180generic methods are:
181
182 get_sb_bdev: mount a filesystem residing on a block device
170 183
171struct super_operations <section> 184 get_sb_nodev: mount a filesystem that is not backed by a device
185
186 get_sb_single: mount a filesystem which shares the instance between
187 all mounts
188
189A fill_super() method implementation has the following arguments:
190
191 struct super_block *sb: the superblock structure. The method fill_super()
192 must initialize this properly.
193
194 void *data: arbitrary mount options, usually comes as an ASCII
195 string
196
197 int silent: whether or not to be silent on error
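
The "data" and "silent" arguments above can be illustrated with a small
user-space sketch. It is not kernel code: struct demo_opts, the option
names "uid" and "mode", and demo_parse_options() are all hypothetical and
only show one plausible way of handling an ASCII option string while
honouring "silent".

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical per-mount settings; not a kernel structure. */
struct demo_opts {
	unsigned int uid;
	unsigned int mode;
};

/*
 * Parse a comma-separated option string such as "uid=1000,mode=755".
 * Returns 0 on success and -1 on an unknown or over-long option string.
 * When "silent" is non-zero, no diagnostic is printed.
 */
static int demo_parse_options(const char *data, struct demo_opts *opts,
			      int silent)
{
	char buf[128];
	char *tok, *eq;

	assert(opts != NULL);
	if (!data)
		return 0;		/* no options given: keep defaults */
	if (strlen(data) >= sizeof(buf))
		return -1;
	strcpy(buf, data);		/* strtok() modifies its input */

	for (tok = strtok(buf, ","); tok; tok = strtok(NULL, ",")) {
		eq = strchr(tok, '=');
		if (eq && !strncmp(tok, "uid=", 4)) {
			opts->uid = (unsigned int)strtoul(eq + 1, NULL, 10);
		} else if (eq && !strncmp(tok, "mode=", 5)) {
			/* permission bits are conventionally octal */
			opts->mode = (unsigned int)strtoul(eq + 1, NULL, 8);
		} else {
			if (!silent)
				fprintf(stderr, "demo: bad option \"%s\"\n", tok);
			return -1;
		}
	}
	return 0;
}
```

A real fill_super() would do this kind of parsing before setting up the
superblock, and would report failure to its caller rather than return a
half-initialized superblock.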
198
199
200struct super_operations
172======================= 201=======================
173 202
174This describes how the VFS can manipulate the superblock of your 203This describes how the VFS can manipulate the superblock of your
175filesystem. As of kernel 2.1.99, the following members are defined: 204filesystem. As of kernel 2.6.13, the following members are defined:
176 205
177struct super_operations { 206struct super_operations {
178 void (*read_inode) (struct inode *); 207 struct inode *(*alloc_inode)(struct super_block *sb);
179 int (*write_inode) (struct inode *, int); 208 void (*destroy_inode)(struct inode *);
180 void (*put_inode) (struct inode *); 209
181 void (*drop_inode) (struct inode *); 210 void (*read_inode) (struct inode *);
182 void (*delete_inode) (struct inode *); 211
183 int (*notify_change) (struct dentry *, struct iattr *); 212 void (*dirty_inode) (struct inode *);
184 void (*put_super) (struct super_block *); 213 int (*write_inode) (struct inode *, int);
185 void (*write_super) (struct super_block *); 214 void (*put_inode) (struct inode *);
186 int (*statfs) (struct super_block *, struct statfs *, int); 215 void (*drop_inode) (struct inode *);
187 int (*remount_fs) (struct super_block *, int *, char *); 216 void (*delete_inode) (struct inode *);
188 void (*clear_inode) (struct inode *); 217 void (*put_super) (struct super_block *);
218 void (*write_super) (struct super_block *);
219 int (*sync_fs)(struct super_block *sb, int wait);
220 void (*write_super_lockfs) (struct super_block *);
221 void (*unlockfs) (struct super_block *);
222 int (*statfs) (struct super_block *, struct kstatfs *);
223 int (*remount_fs) (struct super_block *, int *, char *);
224 void (*clear_inode) (struct inode *);
225 void (*umount_begin) (struct super_block *);
226
227 void (*sync_inodes) (struct super_block *sb,
228 struct writeback_control *wbc);
229 int (*show_options)(struct seq_file *, struct vfsmount *);
230
231 ssize_t (*quota_read)(struct super_block *, int, char *, size_t, loff_t);
232 ssize_t (*quota_write)(struct super_block *, int, const char *, size_t, loff_t);
189}; 233};
190 234
191All methods are called without any locks being held, unless otherwise 235All methods are called without any locks being held, unless otherwise
@@ -193,43 +237,62 @@ noted. This means that most methods can block safely. All methods are
193only called from a process context (i.e. not from an interrupt handler 237only called from a process context (i.e. not from an interrupt handler
194or bottom half). 238or bottom half).
195 239
240 alloc_inode: this method is called by inode_alloc() to allocate memory
241 for struct inode and initialize it.
242
243 destroy_inode: this method is called by destroy_inode() to release
244 resources allocated for struct inode.
245
196 read_inode: this method is called to read a specific inode from the 246 read_inode: this method is called to read a specific inode from the
197 mounted filesystem. The "i_ino" member in the "struct inode" 247 mounted filesystem. The i_ino member in the struct inode is
198 will be initialised by the VFS to indicate which inode to 248 initialized by the VFS to indicate which inode to read. Other
199 read. Other members are filled in by this method 249 members are filled in by this method.
250
251 You can set this to NULL and use iget5_locked() instead of iget()
252 to read inodes. This is necessary for filesystems for which the
253 inode number is not sufficient to identify an inode.
254
255 dirty_inode: this method is called by the VFS to mark an inode dirty.
200 256
201 write_inode: this method is called when the VFS needs to write an 257 write_inode: this method is called when the VFS needs to write an
202 inode to disc. The second parameter indicates whether the write 258 inode to disc. The second parameter indicates whether the write
203 should be synchronous or not, not all filesystems check this flag. 259 should be synchronous or not, not all filesystems check this flag.
204 260
205 put_inode: called when the VFS inode is removed from the inode 261 put_inode: called when the VFS inode is removed from the inode
206 cache. This method is optional 262 cache.
207 263
208 drop_inode: called when the last access to the inode is dropped, 264 drop_inode: called when the last access to the inode is dropped,
209 with the inode_lock spinlock held. 265 with the inode_lock spinlock held.
210 266
211 This method should be either NULL (normal unix filesystem 267 This method should be either NULL (normal UNIX filesystem
212 semantics) or "generic_delete_inode" (for filesystems that do not 268 semantics) or "generic_delete_inode" (for filesystems that do not
213 want to cache inodes - causing "delete_inode" to always be 269 want to cache inodes - causing "delete_inode" to always be
214 called regardless of the value of i_nlink) 270 called regardless of the value of i_nlink)
215 271
216 The "generic_delete_inode()" behaviour is equivalent to the 272 The "generic_delete_inode()" behavior is equivalent to the
217 old practice of using "force_delete" in the put_inode() case, 273 old practice of using "force_delete" in the put_inode() case,
218 but does not have the races that the "force_delete()" approach 274 but does not have the races that the "force_delete()" approach
219 had. 275 had.
220 276
221 delete_inode: called when the VFS wants to delete an inode 277 delete_inode: called when the VFS wants to delete an inode
222 278
223 notify_change: called when VFS inode attributes are changed. If this
224 is NULL the VFS falls back to the write_inode() method. This
225 is called with the kernel lock held
226
227 put_super: called when the VFS wishes to free the superblock 279 put_super: called when the VFS wishes to free the superblock
228 (i.e. unmount). This is called with the superblock lock held 280 (i.e. unmount). This is called with the superblock lock held
229 281
230 write_super: called when the VFS superblock needs to be written to 282 write_super: called when the VFS superblock needs to be written to
231 disc. This method is optional 283 disc. This method is optional
232 284
285 sync_fs: called when VFS is writing out all dirty data associated with
286 a superblock. The second parameter indicates whether the method
287 should wait until the write out has been completed. Optional.
288
289 write_super_lockfs: called when VFS is locking a filesystem and forcing
290 it into a consistent state. This function is currently used by the
291 Logical Volume Manager (LVM).
292
293 unlockfs: called when VFS is unlocking a filesystem and making it writable
294 again.
295
233 statfs: called when the VFS needs to get filesystem statistics. This 296 statfs: called when the VFS needs to get filesystem statistics. This
234 is called with the kernel lock held 297 is called with the kernel lock held
235 298
@@ -238,21 +301,31 @@ or bottom half).
238 301
239 clear_inode: called when the VFS clears the inode. Optional 302 clear_inode: called when the VFS clears the inode. Optional
240 303
304 umount_begin: called when the VFS is unmounting a filesystem.
305
306 sync_inodes: called when the VFS is writing out dirty data associated with
307 a superblock.
308
309 show_options: called by the VFS to show mount options for /proc/<pid>/mounts.
310
311 quota_read: called by the VFS to read from filesystem quota file.
312
313 quota_write: called by the VFS to write to filesystem quota file.
314
241The read_inode() method is responsible for filling in the "i_op" 315The read_inode() method is responsible for filling in the "i_op"
242field. This is a pointer to a "struct inode_operations" which 316field. This is a pointer to a "struct inode_operations" which
243describes the methods that can be performed on individual inodes. 317describes the methods that can be performed on individual inodes.
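
The drop_inode description above (NULL for normal UNIX semantics, or
"generic_delete_inode" to never cache inodes) can be modelled in user
space as follows. This is a sketch under assumptions, not the kernel
implementation: every demo_* name is hypothetical, and the "cached" and
"deleted" flags merely stand in for what the inode cache and
delete_inode() would do.

```c
#include <assert.h>
#include <stddef.h>

struct demo_inode;

/* Hypothetical cut-down method table; both members may be NULL. */
struct demo_super_ops {
	void (*drop_inode)(struct demo_inode *);
	void (*delete_inode)(struct demo_inode *);
};

struct demo_inode {
	const struct demo_super_ops *ops;
	unsigned int nlink;	/* link count, like i_nlink */
	int deleted;		/* set by the delete_inode model */
	int cached;		/* set when the inode stays in the cache */
};

/* Model of "generic_delete_inode": delete regardless of nlink. */
static void demo_generic_delete_inode(struct demo_inode *inode)
{
	if (inode->ops->delete_inode)
		inode->ops->delete_inode(inode);
	inode->cached = 0;
}

/* Model of the default: delete only when no links remain. */
static void demo_generic_drop_inode(struct demo_inode *inode)
{
	if (inode->nlink == 0)
		demo_generic_delete_inode(inode);
	else
		inode->cached = 1;
}

/* Called when the last access to the inode is dropped. */
static void demo_last_put(struct demo_inode *inode)
{
	assert(inode && inode->ops);
	if (inode->ops->drop_inode)
		inode->ops->drop_inode(inode);
	else
		demo_generic_drop_inode(inode);
}

static void demo_mark_deleted(struct demo_inode *inode)
{
	inode->deleted = 1;
}

/* Normal UNIX semantics: drop_inode left NULL. */
static const struct demo_super_ops demo_caching_ops = {
	.drop_inode = NULL,
	.delete_inode = demo_mark_deleted,
};

/* Filesystem that does not want to cache inodes. */
static const struct demo_super_ops demo_nocache_ops = {
	.drop_inode = demo_generic_delete_inode,
	.delete_inode = demo_mark_deleted,
};
```

With demo_caching_ops an inode that still has links stays cached; with
demo_nocache_ops the delete_inode model runs on every final put, whatever
the link count.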
244 318
245 319
246struct inode_operations <section> 320struct inode_operations
247======================= 321=======================
248 322
249This describes how the VFS can manipulate an inode in your 323This describes how the VFS can manipulate an inode in your
250filesystem. As of kernel 2.1.99, the following members are defined: 324filesystem. As of kernel 2.6.13, the following members are defined:
251 325
252struct inode_operations { 326struct inode_operations {
253 struct file_operations * default_file_ops; 327 int (*create) (struct inode *,struct dentry *,int, struct nameidata *);
254 int (*create) (struct inode *,struct dentry *,int); 328 struct dentry * (*lookup) (struct inode *,struct dentry *, struct nameidata *);
255 int (*lookup) (struct inode *,struct dentry *);
256 int (*link) (struct dentry *,struct inode *,struct dentry *); 329 int (*link) (struct dentry *,struct inode *,struct dentry *);
257 int (*unlink) (struct inode *,struct dentry *); 330 int (*unlink) (struct inode *,struct dentry *);
258 int (*symlink) (struct inode *,struct dentry *,const char *); 331 int (*symlink) (struct inode *,struct dentry *,const char *);
@@ -261,25 +334,22 @@ struct inode_operations {
261 int (*mknod) (struct inode *,struct dentry *,int,dev_t); 334 int (*mknod) (struct inode *,struct dentry *,int,dev_t);
262 int (*rename) (struct inode *, struct dentry *, 335 int (*rename) (struct inode *, struct dentry *,
263 struct inode *, struct dentry *); 336 struct inode *, struct dentry *);
264 int (*readlink) (struct dentry *, char *,int); 337 int (*readlink) (struct dentry *, char __user *,int);
265 struct dentry * (*follow_link) (struct dentry *, struct dentry *); 338 void * (*follow_link) (struct dentry *, struct nameidata *);
266 int (*readpage) (struct file *, struct page *); 339 void (*put_link) (struct dentry *, struct nameidata *, void *);
267 int (*writepage) (struct page *page, struct writeback_control *wbc);
268 int (*bmap) (struct inode *,int);
269 void (*truncate) (struct inode *); 340 void (*truncate) (struct inode *);
270 int (*permission) (struct inode *, int); 341 int (*permission) (struct inode *, int, struct nameidata *);
271 int (*smap) (struct inode *,int); 342 int (*setattr) (struct dentry *, struct iattr *);
272 int (*updatepage) (struct file *, struct page *, const char *, 343 int (*getattr) (struct vfsmount *mnt, struct dentry *, struct kstat *);
273 unsigned long, unsigned int, int); 344 int (*setxattr) (struct dentry *, const char *,const void *,size_t,int);
274 int (*revalidate) (struct dentry *); 345 ssize_t (*getxattr) (struct dentry *, const char *, void *, size_t);
346 ssize_t (*listxattr) (struct dentry *, char *, size_t);
347 int (*removexattr) (struct dentry *, const char *);
275}; 348};
276 349
277Again, all methods are called without any locks being held, unless 350Again, all methods are called without any locks being held, unless
278otherwise noted. 351otherwise noted.
279 352
280 default_file_ops: this is a pointer to a "struct file_operations"
281 which describes how to open and then manipulate open files
282
283 create: called by the open(2) and creat(2) system calls. Only 353 create: called by the open(2) and creat(2) system calls. Only
284 required if you want to support regular files. The dentry you 354 required if you want to support regular files. The dentry you
285 get should not have an inode (i.e. it should be a negative 355 get should not have an inode (i.e. it should be a negative
@@ -328,31 +398,143 @@ otherwise noted.
328 you want to support reading symbolic links 398 you want to support reading symbolic links
329 399
330 follow_link: called by the VFS to follow a symbolic link to the 400 follow_link: called by the VFS to follow a symbolic link to the
331 inode it points to. Only required if you want to support 401 inode it points to. Only required if you want to support
332 symbolic links 402 symbolic links. This function returns a void pointer cookie
403 that is passed to put_link().
404
405 put_link: called by the VFS to release resources allocated by
406 follow_link(). The cookie returned by follow_link() is passed to
407 this function as the last parameter. It is used by filesystems
408 such as NFS where page cache is not stable (i.e. page that was
409 installed when the symbolic link walk started might not be in the
410 page cache at the end of the walk).
411
412 truncate: called by the VFS to change the size of a file. The i_size
413 field of the inode is set to the desired size by the VFS before
414 this function is called. This function is called by the truncate(2)
415 system call and related functionality.
416
417 permission: called by the VFS to check for access rights on a POSIX-like
418 filesystem.
419
420 setattr: called by the VFS to set attributes for a file. This function is
421 called by chmod(2) and related system calls.
422
423 getattr: called by the VFS to get attributes of a file. This function is
424 called by stat(2) and related system calls.
425
426 setxattr: called by the VFS to set an extended attribute for a file.
427 Extended attribute is a name:value pair associated with an inode. This
428 function is called by setxattr(2) system call.
429
430 getxattr: called by the VFS to retrieve the value of an extended attribute
431 name. This function is called by getxattr(2) system call.
432
433 listxattr: called by the VFS to list all extended attributes for a given
434 file. This function is called by listxattr(2) system call.
435
436 removexattr: called by the VFS to remove an extended attribute from a file.
437 This function is called by removexattr(2) system call.
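
The follow_link()/put_link() pairing described above is a
"cookie" pattern: whatever follow_link() allocates is handed back,
unchanged, to put_link() for release. The user-space sketch below only
illustrates that contract. struct demo_nameidata, demo_follow_link(),
demo_put_link() and the demo_live_buffers counter are hypothetical and
do not match the kernel signatures.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the lookup state passed to both methods. */
struct demo_nameidata {
	const char *link;	/* link target, valid until put_link */
};

static int demo_live_buffers;	/* buffers allocated but not yet released */

/*
 * Copy the link target into a private buffer, publish it in "nd" and
 * return the buffer as the cookie. Returns NULL if allocation fails.
 */
static void *demo_follow_link(const char *target, struct demo_nameidata *nd)
{
	size_t len;
	char *buf;

	assert(target && nd);
	len = strlen(target) + 1;
	buf = malloc(len);
	if (!buf)
		return NULL;
	memcpy(buf, target, len);
	nd->link = buf;
	demo_live_buffers++;
	return buf;
}

/* Release what follow_link allocated; the cookie is the last parameter. */
static void demo_put_link(struct demo_nameidata *nd, void *cookie)
{
	if (!cookie)
		return;
	free(cookie);
	nd->link = NULL;
	demo_live_buffers--;
}
```

Because the release step receives the cookie directly, it does not need
to look the resource up again, which is the property the text mentions
for filesystems whose page cache is not stable.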
438
439
440struct address_space_operations
441===============================
442
443This describes how the VFS can manipulate mapping of a file to page cache in
444your filesystem. As of kernel 2.6.13, the following members are defined:
445
446struct address_space_operations {
447 int (*writepage)(struct page *page, struct writeback_control *wbc);
448 int (*readpage)(struct file *, struct page *);
449 int (*sync_page)(struct page *);
450 int (*writepages)(struct address_space *, struct writeback_control *);
451 int (*set_page_dirty)(struct page *page);
452 int (*readpages)(struct file *filp, struct address_space *mapping,
453 struct list_head *pages, unsigned nr_pages);
454 int (*prepare_write)(struct file *, struct page *, unsigned, unsigned);
455 int (*commit_write)(struct file *, struct page *, unsigned, unsigned);
456 sector_t (*bmap)(struct address_space *, sector_t);
457 int (*invalidatepage) (struct page *, unsigned long);
458 int (*releasepage) (struct page *, int);
459 ssize_t (*direct_IO)(int, struct kiocb *, const struct iovec *iov,
460 loff_t offset, unsigned long nr_segs);
461 struct page* (*get_xip_page)(struct address_space *, sector_t,
462 int);
463};
464
465 writepage: called by the VM to write a dirty page to backing store.
466
467 readpage: called by the VM to read a page from backing store.
468
469 sync_page: called by the VM to notify the backing store to perform all
470 queued I/O operations for a page. I/O operations for other pages
471 associated with this address_space object may also be performed.
472
473 writepages: called by the VM to write out pages associated with the
474 address_space object.
475
476 set_page_dirty: called by the VM to set a page dirty.
477
478 readpages: called by the VM to read pages associated with the address_space
479 object.
333 480
481 prepare_write: called by the generic write path in VM to set up a write
482 request for a page.
334 483
335struct file_operations <section> 484 commit_write: called by the generic write path in VM to write page to
485 its backing store.
486
487 bmap: called by the VFS to map a logical block offset within object to
488 physical block number. This method is use by for the legacy FIBMAP
489 ioctl. Other uses are discouraged.
490
491 invalidatepage: called by the VM on truncate to disassociate a page from its
492 address_space mapping.
493
494 releasepage: called by the VFS to release filesystem specific metadata from
495 a page.
496
497 direct_IO: called by the VM for direct I/O writes and reads.
498
499 get_xip_page: called by the VM to translate a block number to a page.
500 The page is valid until the corresponding filesystem is unmounted.
501 Filesystems that want to use execute-in-place (XIP) need to implement
502 it. An example implementation can be found in fs/ext2/xip.c.
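
To make the bmap description above concrete, here is a minimal
user-space sketch of mapping a logical block offset to a physical block
number through an extent table. The extent layout, struct demo_extent
and demo_bmap() are assumptions made for illustration. Real filesystems
use their own on-disk structures, and 0 is used here simply to mean
"not mapped".

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical extent: "count" logical blocks starting at "logical"
 * are stored in consecutive physical blocks starting at "physical".
 */
struct demo_extent {
	unsigned long logical;
	unsigned long physical;
	unsigned long count;
};

/*
 * Translate a logical block offset within an object into a physical
 * block number. Returns 0 when the block falls in no extent (a hole).
 */
static unsigned long demo_bmap(const struct demo_extent *map, size_t n,
			       unsigned long block)
{
	size_t i;

	assert(map != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (block >= map[i].logical &&
		    block - map[i].logical < map[i].count)
			return map[i].physical + (block - map[i].logical);
	}
	return 0;
}
```

For a table with extents {0, 100, 4} and {8, 200, 2}, logical block 3
maps to physical block 103, while logical block 4 lies in a hole and
yields 0.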
503
504
505struct file_operations
336====================== 506======================
337 507
338This describes how the VFS can manipulate an open file. As of kernel 508This describes how the VFS can manipulate an open file. As of kernel
3392.1.99, the following members are defined: 5092.6.13, the following members are defined:
340 510
341struct file_operations { 511struct file_operations {
342 loff_t (*llseek) (struct file *, loff_t, int); 512 loff_t (*llseek) (struct file *, loff_t, int);
343 ssize_t (*read) (struct file *, char *, size_t, loff_t *); 513 ssize_t (*read) (struct file *, char __user *, size_t, loff_t *);
344 ssize_t (*write) (struct file *, const char *, size_t, loff_t *); 514 ssize_t (*aio_read) (struct kiocb *, char __user *, size_t, loff_t);
515 ssize_t (*write) (struct file *, const char __user *, size_t, loff_t *);
516 ssize_t (*aio_write) (struct kiocb *, const char __user *, size_t, loff_t);
345 int (*readdir) (struct file *, void *, filldir_t); 517 int (*readdir) (struct file *, void *, filldir_t);
346 unsigned int (*poll) (struct file *, struct poll_table_struct *); 518 unsigned int (*poll) (struct file *, struct poll_table_struct *);
347 int (*ioctl) (struct inode *, struct file *, unsigned int, unsigned long); 519 int (*ioctl) (struct inode *, struct file *, unsigned int, unsigned long);
520 long (*unlocked_ioctl) (struct file *, unsigned int, unsigned long);
521 long (*compat_ioctl) (struct file *, unsigned int, unsigned long);
348 int (*mmap) (struct file *, struct vm_area_struct *); 522 int (*mmap) (struct file *, struct vm_area_struct *);
349 int (*open) (struct inode *, struct file *); 523 int (*open) (struct inode *, struct file *);
524 int (*flush) (struct file *);
350 int (*release) (struct inode *, struct file *); 525 int (*release) (struct inode *, struct file *);
351 int (*fsync) (struct file *, struct dentry *); 526 int (*fsync) (struct file *, struct dentry *, int datasync);
352 int (*fasync) (struct file *, int); 527 int (*aio_fsync) (struct kiocb *, int datasync);
353 int (*check_media_change) (kdev_t dev); 528 int (*fasync) (int, struct file *, int);
354 int (*revalidate) (kdev_t dev);
355 int (*lock) (struct file *, int, struct file_lock *); 529 int (*lock) (struct file *, int, struct file_lock *);
530 ssize_t (*readv) (struct file *, const struct iovec *, unsigned long, loff_t *);
531 ssize_t (*writev) (struct file *, const struct iovec *, unsigned long, loff_t *);
532 ssize_t (*sendfile) (struct file *, loff_t *, size_t, read_actor_t, void *);
533 ssize_t (*sendpage) (struct file *, struct page *, int, size_t, loff_t *, int);
534 unsigned long (*get_unmapped_area)(struct file *, unsigned long, unsigned long, unsigned long, unsigned long);
535 int (*check_flags)(int);
536 int (*dir_notify)(struct file *filp, unsigned long arg);
537 int (*flock) (struct file *, int, struct file_lock *);
356}; 538};
357 539
358Again, all methods are called without any locks being held, unless 540Again, all methods are called without any locks being held, unless
@@ -362,8 +544,12 @@ otherwise noted.
362 544
363 read: called by read(2) and related system calls 545 read: called by read(2) and related system calls
364 546
547 aio_read: called by io_submit(2) and other asynchronous I/O operations
548
365 write: called by write(2) and related system calls 549 write: called by write(2) and related system calls
366 550
551 aio_write: called by io_submit(2) and other asynchronous I/O operations
552
367 readdir: called when the VFS needs to read the directory contents 553 readdir: called when the VFS needs to read the directory contents
368 554
369 poll: called by the VFS when a process wants to check if there is 555 poll: called by the VFS when a process wants to check if there is
@@ -372,18 +558,25 @@ otherwise noted.
372 558
373 ioctl: called by the ioctl(2) system call 559 ioctl: called by the ioctl(2) system call
374 560
561 unlocked_ioctl: called by the ioctl(2) system call. Filesystems that do not
562 require the BKL should use this method instead of the ioctl() above.
563
564 compat_ioctl: called by the ioctl(2) system call when 32 bit system calls
565 are used on 64 bit kernels.
566
375 mmap: called by the mmap(2) system call 567 mmap: called by the mmap(2) system call
376 568
377 open: called by the VFS when an inode should be opened. When the VFS 569 open: called by the VFS when an inode should be opened. When the VFS
378 opens a file, it creates a new "struct file" and initialises 570 opens a file, it creates a new "struct file". It then calls the
379 the "f_op" file operations member with the "default_file_ops" 571 open method for the newly allocated file structure. You might
380 field in the inode structure. It then calls the open method 572 think that the open method really belongs in
381 for the newly allocated file structure. You might think that 573 "struct inode_operations", and you may be right. I think it's
382 the open method really belongs in "struct inode_operations", 574 done the way it is because it makes filesystems simpler to
383 and you may be right. I think it's done the way it is because 575 implement. The open() method is a good place to initialize the
384 it makes filesystems simpler to implement. The open() method 576 "private_data" member in the file structure if you want to point
385 is a good place to initialise the "private_data" member in the 577 to a device structure
386 file structure if you want to point to a device structure 578
579 flush: called by the close(2) system call to flush a file
387 580
388 release: called when the last reference to an open file is closed 581 release: called when the last reference to an open file is closed
389 582
@@ -392,6 +585,23 @@ otherwise noted.
392 fasync: called by the fcntl(2) system call when asynchronous 585 fasync: called by the fcntl(2) system call when asynchronous
393 (non-blocking) mode is enabled for a file 586 (non-blocking) mode is enabled for a file
394 587
588 lock: called by the fcntl(2) system call for F_GETLK, F_SETLK, and F_SETLKW
589 commands
590
591 readv: called by the readv(2) system call
592
593 writev: called by the writev(2) system call
594
595 sendfile: called by the sendfile(2) system call
596
597 get_unmapped_area: called by the mmap(2) system call
598
599 check_flags: called by the fcntl(2) system call for F_SETFL command
600
601 dir_notify: called by the fcntl(2) system call for F_NOTIFY command
602
603 flock: called by the flock(2) system call
604
395Note that the file operations are implemented by the specific 605Note that the file operations are implemented by the specific
396filesystem in which the inode resides. When opening a device node 606filesystem in which the inode resides. When opening a device node
397(character or block special) most filesystems will call special 607(character or block special) most filesystems will call special
@@ -400,29 +610,28 @@ driver information. These support routines replace the filesystem file
400operations with those for the device driver, and then proceed to call 610operations with those for the device driver, and then proceed to call
401the new open() method for the file. This is how opening a device file 611the new open() method for the file. This is how opening a device file
402in the filesystem eventually ends up calling the device driver open() 612in the filesystem eventually ends up calling the device driver open()
403method. Note the devfs (the Device FileSystem) has a more direct path 613method.
404from device node to device driver (this is an unofficial kernel
405patch).
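
The device node hand-off described above (filesystem file operations
replaced by the driver's, then the new open() called) can be sketched in
user space as below. This is a simplified model and not the kernel code
path: all demo_* names are hypothetical, and the lookup of the driver's
operations is reduced to a single fixed table.

```c
#include <assert.h>
#include <stddef.h>

struct demo_file;

/* Hypothetical cut-down method table with only an open method. */
struct demo_file_ops {
	int (*open)(struct demo_file *);
};

struct demo_file {
	const struct demo_file_ops *f_op;
	void *private_data;
};

static int demo_device_state = 42;	/* stands in for a device structure */

/* The driver's open(): point private_data at the device structure. */
static int demo_driver_open(struct demo_file *file)
{
	file->private_data = &demo_device_state;
	return 0;
}

static const struct demo_file_ops demo_driver_fops = {
	.open = demo_driver_open,
};

/*
 * Support routine used for device nodes: swap in the driver's file
 * operations, then call the new open() method.
 */
static int demo_special_open(struct demo_file *file)
{
	file->f_op = &demo_driver_fops;
	if (file->f_op->open)
		return file->f_op->open(file);
	return 0;
}

static const struct demo_file_ops demo_special_fops = {
	.open = demo_special_open,
};

/* Model of the VFS creating a file and calling its open method. */
static int demo_vfs_open(struct demo_file *file,
			 const struct demo_file_ops *inode_fops)
{
	assert(file && inode_fops);
	file->f_op = inode_fops;
	file->private_data = NULL;
	return file->f_op->open ? file->f_op->open(file) : 0;
}
```

After demo_vfs_open(&f, &demo_special_fops) returns, f.f_op points at
the driver's table, so later operations on the file go to the driver
rather than to the filesystem.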
406 614
407 615
408Directory Entry Cache (dcache) <section> 616Directory Entry Cache (dcache)
409------------------------------ 617==============================
618
410 619
411struct dentry_operations 620struct dentry_operations
412======================== 621------------------------
413 622
414This describes how a filesystem can overload the standard dentry 623This describes how a filesystem can overload the standard dentry
415operations. Dentries and the dcache are the domain of the VFS and the 624operations. Dentries and the dcache are the domain of the VFS and the
416individual filesystem implementations. Device drivers have no business 625individual filesystem implementations. Device drivers have no business
417here. These methods may be set to NULL, as they are either optional or 626here. These methods may be set to NULL, as they are either optional or
418the VFS uses a default. As of kernel 2.1.99, the following members are 627the VFS uses a default. As of kernel 2.6.13, the following members are
419defined: 628defined:
420 629
421struct dentry_operations { 630struct dentry_operations {
422 int (*d_revalidate)(struct dentry *); 631 int (*d_revalidate)(struct dentry *, struct nameidata *);
423 int (*d_hash) (struct dentry *, struct qstr *); 632 int (*d_hash) (struct dentry *, struct qstr *);
424 int (*d_compare) (struct dentry *, struct qstr *, struct qstr *); 633 int (*d_compare) (struct dentry *, struct qstr *, struct qstr *);
425 void (*d_delete)(struct dentry *); 634 int (*d_delete)(struct dentry *);
426 void (*d_release)(struct dentry *); 635 void (*d_release)(struct dentry *);
427 void (*d_iput)(struct dentry *, struct inode *); 636 void (*d_iput)(struct dentry *, struct inode *);
428}; 637};
@@ -451,6 +660,7 @@ Each dentry has a pointer to its parent dentry, as well as a hash list
451of child dentries. Child dentries are basically like files in a 660of child dentries. Child dentries are basically like files in a
452directory. 661directory.
453 662
663
454Directory Entry Cache APIs 664Directory Entry Cache APIs
455-------------------------- 665--------------------------
456 666
@@ -471,7 +681,7 @@ manipulate dentries:
471 "d_delete" method is called 681 "d_delete" method is called
472 682
473 d_drop: this unhashes a dentry from its parents hash list. A 683 d_drop: this unhashes a dentry from its parents hash list. A
474 subsequent call to dput() will dellocate the dentry if its 684 subsequent call to dput() will deallocate the dentry if its
475 usage count drops to 0 685 usage count drops to 0
476 686
477 d_delete: delete a dentry. If there are no other open references to 687 d_delete: delete a dentry. If there are no other open references to
@@ -507,16 +717,16 @@ up by walking the tree starting with the first component
507of the pathname and using that dentry along with the next 717of the pathname and using that dentry along with the next
508component to look up the next level and so on. Since it 718component to look up the next level and so on. Since it
509is a frequent operation for workloads like multiuser 719is a frequent operation for workloads like multiuser
510environments and webservers, it is important to optimize 720environments and web servers, it is important to optimize
511this path. 721this path.
512 722
513Prior to 2.5.10, dcache_lock was acquired in d_lookup and thus 723Prior to 2.5.10, dcache_lock was acquired in d_lookup and thus
514in every component during path look-up. Since 2.5.10 onwards, 724in every component during path look-up. Since 2.5.10 onwards,
515fastwalk algorithm changed this by holding the dcache_lock 725fast-walk algorithm changed this by holding the dcache_lock
516at the beginning and walking as many cached path component 726at the beginning and walking as many cached path component
517dentries as possible. This signficantly decreases the number 727dentries as possible. This significantly decreases the number
518of acquisition of dcache_lock. However it also increases the 728of acquisition of dcache_lock. However it also increases the
519lock hold time signficantly and affects performance in large 729lock hold time significantly and affects performance in large
520SMP machines. Since 2.5.62 kernel, dcache has been using 730SMP machines. Since 2.5.62 kernel, dcache has been using
521a new locking model that uses RCU to make dcache look-up 731a new locking model that uses RCU to make dcache look-up
522lock-free. 732lock-free.
@@ -527,7 +737,7 @@ protected the hash chain, d_child, d_alias, d_lru lists as well
527as d_inode and several other things like mount look-up. RCU-based 737as d_inode and several other things like mount look-up. RCU-based
528changes affect only the way the hash chain is protected. For everything 738changes affect only the way the hash chain is protected. For everything
529else the dcache_lock must be taken for both traversing as well as 739else the dcache_lock must be taken for both traversing as well as
530updating. The hash chain updations too take the dcache_lock. 740updating. The hash chain updates too take the dcache_lock.
531The significant change is the way d_lookup traverses the hash chain, 741The significant change is the way d_lookup traverses the hash chain,
532it doesn't acquire the dcache_lock for this and relies on RCU to 742it doesn't acquire the dcache_lock for this and relies on RCU to
533ensure that the dentry has not been *freed*. 743ensure that the dentry has not been *freed*.
@@ -535,14 +745,15 @@ ensure that the dentry has not been *freed*.
535 745
536Dcache locking details 746Dcache locking details
537---------------------- 747----------------------
748
538For many multi-user workloads, open() and stat() on files are 749For many multi-user workloads, open() and stat() on files are
539very frequently occurring operations. Both involve walking 750very frequently occurring operations. Both involve walking
540of path names to find the dentry corresponding to the 751of path names to find the dentry corresponding to the
541concerned file. In 2.4 kernel, dcache_lock was held 752concerned file. In 2.4 kernel, dcache_lock was held
542during look-up of each path component. Contention and 753during look-up of each path component. Contention and
543cacheline bouncing of this global lock caused significant 754cache-line bouncing of this global lock caused significant
544scalability problems. With the introduction of RCU 755scalability problems. With the introduction of RCU
545in linux kernel, this was worked around by making 756in Linux kernel, this was worked around by making
546the look-up of path components during path walking lock-free. 757the look-up of path components during path walking lock-free.
547 758
548 759
@@ -562,7 +773,7 @@ Some of the important changes are :
5622. Insertion of a dentry into the hash table is done using 7732. Insertion of a dentry into the hash table is done using
563 hlist_add_head_rcu() which take care of ordering the writes - 774 hlist_add_head_rcu() which take care of ordering the writes -
564 the writes to the dentry must be visible before the dentry 775 the writes to the dentry must be visible before the dentry
565 is inserted. This works in conjuction with hlist_for_each_rcu() 776 is inserted. This works in conjunction with hlist_for_each_rcu()
566 while walking the hash chain. The only requirement is that 777 while walking the hash chain. The only requirement is that
567 all initialization to the dentry must be done before hlist_add_head_rcu() 778 all initialization to the dentry must be done before hlist_add_head_rcu()
568 since we don't have dcache_lock protection while traversing 779 since we don't have dcache_lock protection while traversing
@@ -584,7 +795,7 @@ Some of the important changes are :
584 the same. In some sense, dcache_rcu path walking looks like 795 the same. In some sense, dcache_rcu path walking looks like
585 the pre-2.5.10 version. 796 the pre-2.5.10 version.
586 797
5875. All dentry hash chain updations must take the dcache_lock as well as 7985. All dentry hash chain updates must take the dcache_lock as well as
588 the per-dentry lock in that order. dput() does this to ensure 799 the per-dentry lock in that order. dput() does this to ensure
589 that a dentry that has just been looked up in another CPU 800 that a dentry that has just been looked up in another CPU
590 doesn't get deleted before dget() can be done on it. 801 doesn't get deleted before dget() can be done on it.
@@ -640,10 +851,10 @@ handled as described below :
640 Since we redo the d_parent check and compare name while holding 851 Since we redo the d_parent check and compare name while holding
641 d_lock, lock-free look-up will not race against d_move(). 852 d_lock, lock-free look-up will not race against d_move().
642 853
6434. There can be a theoritical race when a dentry keeps coming back 8544. There can be a theoretical race when a dentry keeps coming back
644 to original bucket due to double moves. Due to this look-up may 855 to original bucket due to double moves. Due to this look-up may
645 consider that it has never moved and can end up in a infinite loop. 856 consider that it has never moved and can end up in a infinite loop.
646 But this is not any worse that theoritical livelocks we already 857 But this is not any worse that theoretical livelocks we already
647 have in the kernel. 858 have in the kernel.
648 859
649 860
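
Note on item 2 of the dcache hunk above (initialize the dentry fully, then insert it with hlist_add_head_rcu()): the ordering rule can be sketched in userspace with C11 atomics. This is only an analogy under stated assumptions. struct entry, chain_head, publish() and lookup() are hypothetical names, not kernel APIs; the release store stands in for the write ordering that hlist_add_head_rcu() provides, and the acquire loads are a stronger stand-in for the RCU read side. Removal and grace periods are not modeled.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a dentry on a hash chain (not struct dentry). */
struct entry {
    const char *name;
    int ino;
    struct entry *_Atomic next;
};

/* Hypothetical single hash bucket; static storage starts out NULL. */
static struct entry *_Atomic chain_head;

/* Writer: complete every plain store to the entry, then publish it.
 * The release store makes the earlier writes visible to any reader
 * that observes the new head pointer. */
static void publish(struct entry *e, const char *name, int ino)
{
    e->name = name;
    e->ino = ino;
    atomic_store_explicit(&e->next,
                          atomic_load_explicit(&chain_head, memory_order_relaxed),
                          memory_order_relaxed);
    atomic_store_explicit(&chain_head, e, memory_order_release);
}

/* Reader: walks the chain without taking a lock. The acquire loads pair
 * with the release store in publish(). */
static struct entry *lookup(const char *name)
{
    struct entry *e = atomic_load_explicit(&chain_head, memory_order_acquire);

    while (e != NULL) {
        if (strcmp(e->name, name) == 0)
            return e;
        e = atomic_load_explicit(&e->next, memory_order_acquire);
    }
    return NULL;
}
```

If the stores to name and ino were moved after the store to chain_head, a concurrent reader could find the entry and compare against uninitialized fields, which is the situation the documentation warns about.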
diff --git a/Documentation/ioctl/cdrom.txt b/Documentation/ioctl/cdrom.txt
index 4ccdcc6fe364..8ec32cc49eb1 100644
--- a/Documentation/ioctl/cdrom.txt
+++ b/Documentation/ioctl/cdrom.txt
@@ -878,7 +878,7 @@ DVD_READ_STRUCT Read structure
878 878
879 error returns: 879 error returns:
880 EINVAL physical.layer_num exceeds number of layers 880 EINVAL physical.layer_num exceeds number of layers
881 EIO Recieved invalid response from drive 881 EIO Received invalid response from drive
882 882
883 883
884 884
diff --git a/Documentation/kbuild/makefiles.txt b/Documentation/kbuild/makefiles.txt
index 9a1586590d82..d802ce88bedc 100644
--- a/Documentation/kbuild/makefiles.txt
+++ b/Documentation/kbuild/makefiles.txt
@@ -31,7 +31,7 @@ This document describes the Linux kernel Makefiles.
31 31
32 === 6 Architecture Makefiles 32 === 6 Architecture Makefiles
33 --- 6.1 Set variables to tweak the build to the architecture 33 --- 6.1 Set variables to tweak the build to the architecture
34 --- 6.2 Add prerequisites to prepare: 34 --- 6.2 Add prerequisites to archprepare:
35 --- 6.3 List directories to visit when descending 35 --- 6.3 List directories to visit when descending
36 --- 6.4 Architecture specific boot images 36 --- 6.4 Architecture specific boot images
37 --- 6.5 Building non-kbuild targets 37 --- 6.5 Building non-kbuild targets
@@ -734,18 +734,18 @@ When kbuild executes the following steps are followed (roughly):
734 for loadable kernel modules. 734 for loadable kernel modules.
735 735
736 736
737--- 6.2 Add prerequisites to prepare: 737--- 6.2 Add prerequisites to archprepare:
738 738
739 The prepare: rule is used to list prerequisites that needs to be 739 The archprepare: rule is used to list prerequisites that needs to be
740 built before starting to descend down in the subdirectories. 740 built before starting to descend down in the subdirectories.
741 This is usual header files containing assembler constants. 741 This is usual header files containing assembler constants.
742 742
743 Example: 743 Example:
744 #arch/s390/Makefile 744 #arch/arm/Makefile
745 prepare: include/asm-$(ARCH)/offsets.h 745 archprepare: maketools
746 746
747 In this example the file include/asm-$(ARCH)/offsets.h will 747 In this example the file target maketools will be processed
748 be built before descending down in the subdirectories. 748 before descending down in the subdirectories.
749 See also chapter XXX-TODO that describe how kbuild supports 749 See also chapter XXX-TODO that describe how kbuild supports
750 generating offset header files. 750 generating offset header files.
751 751
diff --git a/Documentation/kdump/kdump.txt b/Documentation/kdump/kdump.txt
index 7ff213f4becd..1f5f7d28c9e6 100644
--- a/Documentation/kdump/kdump.txt
+++ b/Documentation/kdump/kdump.txt
@@ -39,8 +39,7 @@ SETUP
39 and apply http://lse.sourceforge.net/kdump/patches/kexec-tools-1.101-kdump.patch 39 and apply http://lse.sourceforge.net/kdump/patches/kexec-tools-1.101-kdump.patch
40 and after that build the source. 40 and after that build the source.
41 41
422) Download and build the appropriate (latest) kexec/kdump (-mm) kernel 422) Download and build the appropriate (2.6.13-rc1 onwards) vanilla kernel.
43 patchset and apply it to the vanilla kernel tree.
44 43
45 Two kernels need to be built in order to get this feature working. 44 Two kernels need to be built in order to get this feature working.
46 45
@@ -84,15 +83,16 @@ SETUP
84 83
854) Load the second kernel to be booted using: 844) Load the second kernel to be booted using:
86 85
87 kexec -p <second-kernel> --crash-dump --args-linux --append="root=<root-dev> 86 kexec -p <second-kernel> --args-linux --elf32-core-headers
88 init 1 irqpoll" 87 --append="root=<root-dev> init 1 irqpoll"
89 88
90 Note: i) <second-kernel> has to be a vmlinux image. bzImage will not work, 89 Note: i) <second-kernel> has to be a vmlinux image. bzImage will not work,
91 as of now. 90 as of now.
92 ii) By default ELF headers are stored in ELF32 format (for i386). This 91 ii) By default ELF headers are stored in ELF64 format. Option
93 is sufficient to represent the physical memory up to 4GB. To store 92 --elf32-core-headers forces generation of ELF32 headers. gdb can
94 headers in ELF64 format, specifiy "--elf64-core-headers" on the 93 not open ELF64 headers on 32 bit systems. So creating ELF32
95 kexec command line additionally. 94 headers can come handy for users who have got non-PAE systems and
95 hence have memory less than 4GB.
96 iii) Specify "irqpoll" as command line parameter. This reduces driver 96 iii) Specify "irqpoll" as command line parameter. This reduces driver
97 initialization failures in second kernel due to shared interrupts. 97 initialization failures in second kernel due to shared interrupts.
98 98
diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index d2f0c67ba1fb..7086f0a90d14 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -164,6 +164,15 @@ running once the system is up.
164 over-ride platform specific driver. 164 over-ride platform specific driver.
165 See also Documentation/acpi-hotkey.txt. 165 See also Documentation/acpi-hotkey.txt.
166 166
167 enable_timer_pin_1 [i386,x86-64]
168 Enable PIN 1 of APIC timer
169 Can be useful to work around chipset bugs (in particular on some ATI chipsets)
170 The kernel tries to set a reasonable default.
171
172 disable_timer_pin_1 [i386,x86-64]
173 Disable PIN 1 of APIC timer
174 Can be useful to work around chipset bugs.
175
167 ad1816= [HW,OSS] 176 ad1816= [HW,OSS]
168 Format: <io>,<irq>,<dma>,<dma2> 177 Format: <io>,<irq>,<dma>,<dma2>
169 See also Documentation/sound/oss/AD1816. 178 See also Documentation/sound/oss/AD1816.
@@ -549,6 +558,7 @@ running once the system is up.
549 keyboard and can not control its state 558 keyboard and can not control its state
550 (Don't attempt to blink the leds) 559 (Don't attempt to blink the leds)
551 i8042.noaux [HW] Don't check for auxiliary (== mouse) port 560 i8042.noaux [HW] Don't check for auxiliary (== mouse) port
561 i8042.nokbd [HW] Don't check/create keyboard port
552 i8042.nomux [HW] Don't check presence of an active multiplexing 562 i8042.nomux [HW] Don't check presence of an active multiplexing
553 controller 563 controller
554 i8042.nopnp [HW] Don't use ACPIPnP / PnPBIOS to discover KBD/AUX 564 i8042.nopnp [HW] Don't use ACPIPnP / PnPBIOS to discover KBD/AUX
diff --git a/Documentation/mono.txt b/Documentation/mono.txt
index 6739ab9615ef..807a0c7b4737 100644
--- a/Documentation/mono.txt
+++ b/Documentation/mono.txt
@@ -30,7 +30,7 @@ other program after you have done the following:
30 Read the file 'binfmt_misc.txt' in this directory to know 30 Read the file 'binfmt_misc.txt' in this directory to know
31 more about the configuration process. 31 more about the configuration process.
32 32
333) Add the following enries to /etc/rc.local or similar script 333) Add the following entries to /etc/rc.local or similar script
34 to be run at system startup: 34 to be run at system startup:
35 35
36# Insert BINFMT_MISC module into the kernel 36# Insert BINFMT_MISC module into the kernel
diff --git a/Documentation/networking/bonding.txt b/Documentation/networking/bonding.txt
index 24d029455baa..a55f0f95b171 100644
--- a/Documentation/networking/bonding.txt
+++ b/Documentation/networking/bonding.txt
@@ -1241,7 +1241,7 @@ traffic while still maintaining carrier on.
1241 1241
1242 If running SNMP agents, the bonding driver should be loaded 1242 If running SNMP agents, the bonding driver should be loaded
1243before any network drivers participating in a bond. This requirement 1243before any network drivers participating in a bond. This requirement
1244is due to the the interface index (ipAdEntIfIndex) being associated to 1244is due to the interface index (ipAdEntIfIndex) being associated to
1245the first interface found with a given IP address. That is, there is 1245the first interface found with a given IP address. That is, there is
1246only one ipAdEntIfIndex for each IP address. For example, if eth0 and 1246only one ipAdEntIfIndex for each IP address. For example, if eth0 and
1247eth1 are slaves of bond0 and the driver for eth0 is loaded before the 1247eth1 are slaves of bond0 and the driver for eth0 is loaded before the
@@ -1937,7 +1937,7 @@ switches currently available support 802.3ad.
1937 If not explicitly configured (with ifconfig or ip link), the 1937 If not explicitly configured (with ifconfig or ip link), the
1938MAC address of the bonding device is taken from its first slave 1938MAC address of the bonding device is taken from its first slave
1939device. This MAC address is then passed to all following slaves and 1939device. This MAC address is then passed to all following slaves and
1940remains persistent (even if the the first slave is removed) until the 1940remains persistent (even if the first slave is removed) until the
1941bonding device is brought down or reconfigured. 1941bonding device is brought down or reconfigured.
1942 1942
1943 If you wish to change the MAC address, you can set it with 1943 If you wish to change the MAC address, you can set it with
diff --git a/Documentation/networking/wan-router.txt b/Documentation/networking/wan-router.txt
index aea20cd2a56e..c96897aa08b6 100644
--- a/Documentation/networking/wan-router.txt
+++ b/Documentation/networking/wan-router.txt
@@ -355,7 +355,7 @@ REVISION HISTORY
355 There is no functional difference between the two packages 355 There is no functional difference between the two packages
356 356
3572.0.7 Aug 26, 1999 o Merged X25API code into WANPIPE. 3572.0.7 Aug 26, 1999 o Merged X25API code into WANPIPE.
358 o Fixed a memeory leak for X25API 358 o Fixed a memory leak for X25API
359 o Updated the X25API code for 2.2.X kernels. 359 o Updated the X25API code for 2.2.X kernels.
360 o Improved NEM handling. 360 o Improved NEM handling.
361 361
@@ -514,7 +514,7 @@ beta2-2.2.0 Jan 8 2001
514 o Patches for 2.4.0 kernel 514 o Patches for 2.4.0 kernel
515 o Patches for 2.2.18 kernel 515 o Patches for 2.2.18 kernel
516 o Minor updates to PPP and CHLDC drivers. 516 o Minor updates to PPP and CHLDC drivers.
517 Note: No functinal difference. 517 Note: No functional difference.
518 518
519beta3-2.2.9 Jan 10 2001 519beta3-2.2.9 Jan 10 2001
520 o I missed the 2.2.18 kernel patches in beta2-2.2.0 520 o I missed the 2.2.18 kernel patches in beta2-2.2.0
diff --git a/Documentation/pci.txt b/Documentation/pci.txt
index 76d28d033657..711210b38f5f 100644
--- a/Documentation/pci.txt
+++ b/Documentation/pci.txt
@@ -84,7 +84,7 @@ Each entry consists of:
84 84
85Most drivers don't need to use the driver_data field. Best practice 85Most drivers don't need to use the driver_data field. Best practice
86for use of driver_data is to use it as an index into a static list of 86for use of driver_data is to use it as an index into a static list of
87equivalant device types, not to use it as a pointer. 87equivalent device types, not to use it as a pointer.
88 88
89Have a table entry {PCI_ANY_ID, PCI_ANY_ID, PCI_ANY_ID, PCI_ANY_ID} 89Have a table entry {PCI_ANY_ID, PCI_ANY_ID, PCI_ANY_ID, PCI_ANY_ID}
90to have probe() called for every PCI device known to the system. 90to have probe() called for every PCI device known to the system.
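
Note on the driver_data advice in the pci.txt hunk above: a minimal userspace sketch of using driver_data as an index into a static table rather than as a pointer. Everything here is an assumption made for illustration: struct mock_device_id is a cut-down mock (the real struct pci_device_id has more fields), and the vendor/device numbers, board names and board_for() helper are hypothetical.

```c
#include <stddef.h>

/* Hypothetical board types; the enum value doubles as the table index. */
enum board_type { BOARD_BASIC = 0, BOARD_FAST, BOARD_COUNT };

struct board_info {
    const char *name;
    unsigned int max_ports;
};

/* Static list of equivalent device types, indexed by enum board_type. */
static const struct board_info board_table[BOARD_COUNT] = {
    [BOARD_BASIC] = { "basic", 2 },
    [BOARD_FAST]  = { "fast", 8 },
};

/* Simplified mock of an ID-table entry (not the kernel structure). */
struct mock_device_id {
    unsigned int vendor, device;
    unsigned long driver_data;  /* index into board_table, not a pointer */
};

static const struct mock_device_id id_table[] = {
    { 0x1234, 0x0001, BOARD_BASIC },
    { 0x1234, 0x0002, BOARD_FAST },
    { 0, 0, 0 }                 /* terminator */
};

/* Map a vendor/device pair to its board description, or NULL if unknown. */
static const struct board_info *board_for(unsigned int vendor,
                                          unsigned int device)
{
    const struct mock_device_id *id;

    for (id = id_table; id->vendor != 0; id++) {
        if (id->vendor == vendor && id->device == device) {
            if (id->driver_data >= BOARD_COUNT)
                return NULL;    /* an index can be range-checked */
            return &board_table[id->driver_data];
        }
    }
    return NULL;
}
```

An index keeps the ID table free of addresses and lets the lookup validate the value before use, which a raw pointer stored in driver_data would not allow.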
diff --git a/Documentation/powerpc/eeh-pci-error-recovery.txt b/Documentation/powerpc/eeh-pci-error-recovery.txt
index 2bfe71beec5b..e75d7474322c 100644
--- a/Documentation/powerpc/eeh-pci-error-recovery.txt
+++ b/Documentation/powerpc/eeh-pci-error-recovery.txt
@@ -134,7 +134,7 @@ pci_get_device_by_addr() will find the pci device associated
134with that address (if any). 134with that address (if any).
135 135
136The default include/asm-ppc64/io.h macros readb(), inb(), insb(), 136The default include/asm-ppc64/io.h macros readb(), inb(), insb(),
137etc. include a check to see if the the i/o read returned all-0xff's. 137etc. include a check to see if the i/o read returned all-0xff's.
138If so, these make a call to eeh_dn_check_failure(), which in turn 138If so, these make a call to eeh_dn_check_failure(), which in turn
139asks the firmware if the all-ff's value is the sign of a true EEH 139asks the firmware if the all-ff's value is the sign of a true EEH
140error. If it is not, processing continues as normal. The grand 140error. If it is not, processing continues as normal. The grand
diff --git a/Documentation/s390/s390dbf.txt b/Documentation/s390/s390dbf.txt
index e24fdeada970..e321a8ed2a2d 100644
--- a/Documentation/s390/s390dbf.txt
+++ b/Documentation/s390/s390dbf.txt
@@ -468,7 +468,7 @@ The hex_ascii view shows the data field in hex and ascii representation
468The raw view returns a bytestream as the debug areas are stored in memory. 468The raw view returns a bytestream as the debug areas are stored in memory.
469 469
470The sprintf view formats the debug entries in the same way as the sprintf 470The sprintf view formats the debug entries in the same way as the sprintf
471function would do. The sprintf event/expection fuctions write to the 471function would do. The sprintf event/expection functions write to the
472debug entry a pointer to the format string (size = sizeof(long)) 472debug entry a pointer to the format string (size = sizeof(long))
473and for each vararg a long value. So e.g. for a debug entry with a format 473and for each vararg a long value. So e.g. for a debug entry with a format
474string plus two varargs one would need to allocate a (3 * sizeof(long)) 474string plus two varargs one would need to allocate a (3 * sizeof(long))
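
Note on the sizing rule in the s390dbf.txt hunk above: the sprintf view stores one long for the format-string pointer plus one long per vararg, so the per-entry size follows directly. The helper name sprintf_entry_size() is hypothetical and only restates that arithmetic.

```c
#include <stddef.h>

/* Bytes needed for one sprintf-view debug entry: one long holding the
 * format-string pointer plus one long for each vararg. */
static size_t sprintf_entry_size(size_t nr_varargs)
{
    return (1 + nr_varargs) * sizeof(long);
}
```

For a format string with two varargs this gives 3 * sizeof(long), matching the figure quoted in the text.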
diff --git a/Documentation/scsi/ibmmca.txt b/Documentation/scsi/ibmmca.txt
index 2814491600ff..2ffb3ae0ef4d 100644
--- a/Documentation/scsi/ibmmca.txt
+++ b/Documentation/scsi/ibmmca.txt
@@ -344,7 +344,7 @@
344 /proc/scsi/ibmmca/<host_no>. ibmmca_proc_info() provides this information. 344 /proc/scsi/ibmmca/<host_no>. ibmmca_proc_info() provides this information.
345 345
346 This table is quite informative for interested users. It shows the load 346 This table is quite informative for interested users. It shows the load
347 of commands on the subsystem and wether you are running the bypassed 347 of commands on the subsystem and whether you are running the bypassed
348 (software) or integrated (hardware) SCSI-command set (see below). The 348 (software) or integrated (hardware) SCSI-command set (see below). The
349 amount of accesses is shown. Read, write, modeselect is shown separately 349 amount of accesses is shown. Read, write, modeselect is shown separately
350 in order to help debugging problems with CD-ROMs or tapedrives. 350 in order to help debugging problems with CD-ROMs or tapedrives.
diff --git a/Documentation/sound/alsa/ALSA-Configuration.txt b/Documentation/sound/alsa/ALSA-Configuration.txt
index 5c49ba07e709..ebfcdf28485f 100644
--- a/Documentation/sound/alsa/ALSA-Configuration.txt
+++ b/Documentation/sound/alsa/ALSA-Configuration.txt
@@ -1459,7 +1459,7 @@ devices where %i is sound card number from zero to seven.
1459To auto-load an ALSA driver for OSS services, define the string 1459To auto-load an ALSA driver for OSS services, define the string
1460'sound-slot-%i' where %i means the slot number for OSS, which 1460'sound-slot-%i' where %i means the slot number for OSS, which
1461corresponds to the card index of ALSA. Usually, define this 1461corresponds to the card index of ALSA. Usually, define this
1462as the the same card module. 1462as the same card module.
1463 1463
1464An example configuration for a single emu10k1 card is like below: 1464An example configuration for a single emu10k1 card is like below:
1465----- /etc/modprobe.conf 1465----- /etc/modprobe.conf
diff --git a/Documentation/sparse.txt b/Documentation/sparse.txt
index f97841478459..5df44dc894e5 100644
--- a/Documentation/sparse.txt
+++ b/Documentation/sparse.txt
@@ -57,7 +57,7 @@ With BK, you can just get it from
57 57
58and DaveJ has tar-balls at 58and DaveJ has tar-balls at
59 59
60 http://www.codemonkey.org.uk/projects/bitkeeper/sparse/ 60 http://www.codemonkey.org.uk/projects/git-snapshots/sparse/
61 61
62 62
63Once you have it, just do 63Once you have it, just do
diff --git a/Documentation/sysrq.txt b/Documentation/sysrq.txt
index 136d817c01ba..baf17b381588 100644
--- a/Documentation/sysrq.txt
+++ b/Documentation/sysrq.txt
@@ -171,7 +171,7 @@ the header 'include/linux/sysrq.h', this will define everything else you need.
171Next, you must create a sysrq_key_op struct, and populate it with A) the key 171Next, you must create a sysrq_key_op struct, and populate it with A) the key
172handler function you will use, B) a help_msg string, that will print when SysRQ 172handler function you will use, B) a help_msg string, that will print when SysRQ
173prints help, and C) an action_msg string, that will print right before your 173prints help, and C) an action_msg string, that will print right before your
174handler is called. Your handler must conform to the protoype in 'sysrq.h'. 174handler is called. Your handler must conform to the prototype in 'sysrq.h'.
175 175
176After the sysrq_key_op is created, you can call the macro 176After the sysrq_key_op is created, you can call the macro
177register_sysrq_key(int key, struct sysrq_key_op *op_p) that is defined in 177register_sysrq_key(int key, struct sysrq_key_op *op_p) that is defined in
diff --git a/Documentation/uml/UserModeLinux-HOWTO.txt b/Documentation/uml/UserModeLinux-HOWTO.txt
index 0c7b654fec99..544430e39980 100644
--- a/Documentation/uml/UserModeLinux-HOWTO.txt
+++ b/Documentation/uml/UserModeLinux-HOWTO.txt
@@ -2176,7 +2176,7 @@
2176 If you want to access files on the host machine from inside UML, you 2176 If you want to access files on the host machine from inside UML, you
2177 can treat it as a separate machine and either nfs mount directories 2177 can treat it as a separate machine and either nfs mount directories
2178 from the host or copy files into the virtual machine with scp or rcp. 2178 from the host or copy files into the virtual machine with scp or rcp.
2179 However, since UML is running on the the host, it can access those 2179 However, since UML is running on the host, it can access those
2180 files just like any other process and make them available inside the 2180 files just like any other process and make them available inside the
2181 virtual machine without needing to use the network. 2181 virtual machine without needing to use the network.
2182 2182
diff --git a/Documentation/usb/gadget_serial.txt b/Documentation/usb/gadget_serial.txt
index a938c3dd13d6..815f5c2301ff 100644
--- a/Documentation/usb/gadget_serial.txt
+++ b/Documentation/usb/gadget_serial.txt
@@ -20,7 +20,7 @@ License along with this program; if not, write to the Free
20Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, 20Software Foundation, Inc., 59 Temple Place, Suite 330, Boston,
21MA 02111-1307 USA. 21MA 02111-1307 USA.
22 22
23This document and the the gadget serial driver itself are 23This document and the gadget serial driver itself are
24Copyright (C) 2004 by Al Borchers (alborchers@steinerpoint.com). 24Copyright (C) 2004 by Al Borchers (alborchers@steinerpoint.com).
25 25
26If you have questions, problems, or suggestions for this driver 26If you have questions, problems, or suggestions for this driver
diff --git a/Documentation/video4linux/CARDLIST.bttv b/Documentation/video4linux/CARDLIST.bttv
index 62a12a08e2ac..ec785f9f15a3 100644
--- a/Documentation/video4linux/CARDLIST.bttv
+++ b/Documentation/video4linux/CARDLIST.bttv
@@ -126,10 +126,12 @@ card=124 - AverMedia AverTV DVB-T 761
126card=125 - MATRIX Vision Sigma-SQ 126card=125 - MATRIX Vision Sigma-SQ
127card=126 - MATRIX Vision Sigma-SLC 127card=126 - MATRIX Vision Sigma-SLC
128card=127 - APAC Viewcomp 878(AMAX) 128card=127 - APAC Viewcomp 878(AMAX)
129card=128 - DVICO FusionHDTV DVB-T Lite 129card=128 - DViCO FusionHDTV DVB-T Lite
130card=129 - V-Gear MyVCD 130card=129 - V-Gear MyVCD
131card=130 - Super TV Tuner 131card=130 - Super TV Tuner
132card=131 - Tibet Systems 'Progress DVR' CS16 132card=131 - Tibet Systems 'Progress DVR' CS16
133card=132 - Kodicom 4400R (master) 133card=132 - Kodicom 4400R (master)
134card=133 - Kodicom 4400R (slave) 134card=133 - Kodicom 4400R (slave)
135card=134 - Adlink RTV24 135card=134 - Adlink RTV24
136card=135 - DViCO FusionHDTV 5 Lite
137card=136 - Acorp Y878F
diff --git a/Documentation/video4linux/CARDLIST.saa7134 b/Documentation/video4linux/CARDLIST.saa7134
index 1b5a3a9ffbe2..dc57225f39be 100644
--- a/Documentation/video4linux/CARDLIST.saa7134
+++ b/Documentation/video4linux/CARDLIST.saa7134
@@ -62,3 +62,6 @@
62 61 -> Philips TOUGH DVB-T reference design [1131:2004] 62 61 -> Philips TOUGH DVB-T reference design [1131:2004]
63 62 -> Compro VideoMate TV Gold+II 63 62 -> Compro VideoMate TV Gold+II
64 63 -> Kworld Xpert TV PVR7134 64 63 -> Kworld Xpert TV PVR7134
65 64 -> FlyTV mini Asus Digimatrix [1043:0210,1043:0210]
66 65 -> V-Stream Studio TV Terminator
67 66 -> Yuan TUN-900 (saa7135)
diff --git a/Documentation/video4linux/CARDLIST.tuner b/Documentation/video4linux/CARDLIST.tuner
index f3302e1b1b9c..f5876be658a6 100644
--- a/Documentation/video4linux/CARDLIST.tuner
+++ b/Documentation/video4linux/CARDLIST.tuner
@@ -64,3 +64,4 @@ tuner=62 - Philips TEA5767HN FM Radio
64tuner=63 - Philips FMD1216ME MK3 Hybrid Tuner 64tuner=63 - Philips FMD1216ME MK3 Hybrid Tuner
65tuner=64 - LG TDVS-H062F/TUA6034 65tuner=64 - LG TDVS-H062F/TUA6034
66tuner=65 - Ymec TVF66T5-B/DFF 66tuner=65 - Ymec TVF66T5-B/DFF
67tuner=66 - LG NTSC (TALN mini series)
diff --git a/Documentation/video4linux/Zoran b/Documentation/video4linux/Zoran
index 01425c21986b..52c94bd7dca1 100644
--- a/Documentation/video4linux/Zoran
+++ b/Documentation/video4linux/Zoran
@@ -222,7 +222,7 @@ was introduced in 1991, is used in the DC10 old
222can generate: PAL , NTSC , SECAM 222can generate: PAL , NTSC , SECAM
223 223
224The adv717x, should be able to produce PAL N. But you find nothing PAL N 224The adv717x, should be able to produce PAL N. But you find nothing PAL N
225specific in the the registers. Seem that you have to reuse a other standard 225specific in the registers. Seem that you have to reuse a other standard
226to generate PAL N, maybe it would work if you use the PAL M settings. 226to generate PAL N, maybe it would work if you use the PAL M settings.
227 227
228========================== 228==========================
diff --git a/Documentation/x86_64/boot-options.txt b/Documentation/x86_64/boot-options.txt
index 678e8f192db2..ffe1c062088b 100644
--- a/Documentation/x86_64/boot-options.txt
+++ b/Documentation/x86_64/boot-options.txt
@@ -11,6 +11,11 @@ Machine check
11 If your BIOS doesn't do that it's a good idea to enable though 11 If your BIOS doesn't do that it's a good idea to enable though
12 to make sure you log even machine check events that result 12 to make sure you log even machine check events that result
13 in a reboot. 13 in a reboot.
14 mce=tolerancelevel (number)
15 0: always panic, 1: panic if deadlock possible,
16 2: try to avoid panic, 3: never panic or exit (for testing)
17 default is 1
18 Can be also set using sysfs which is preferable.
14 19
15 nomce (for compatibility with i386): same as mce=off 20 nomce (for compatibility with i386): same as mce=off
16 21