Diffstat (limited to 'Documentation/DocBook/kernel-hacking.tmpl')
-rw-r--r-- | Documentation/DocBook/kernel-hacking.tmpl | 1349 |
1 files changed, 1349 insertions, 0 deletions
diff --git a/Documentation/DocBook/kernel-hacking.tmpl b/Documentation/DocBook/kernel-hacking.tmpl
new file mode 100644
index 000000000000..49a9ef82d575
--- /dev/null
+++ b/Documentation/DocBook/kernel-hacking.tmpl
@@ -0,0 +1,1349 @@
1 | <?xml version="1.0" encoding="UTF-8"?> | ||
2 | <!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN" | ||
3 | "http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd" []> | ||
4 | |||
5 | <book id="lk-hacking-guide"> | ||
6 | <bookinfo> | ||
7 | <title>Unreliable Guide To Hacking The Linux Kernel</title> | ||
8 | |||
9 | <authorgroup> | ||
10 | <author> | ||
11 | <firstname>Paul</firstname> | ||
12 | <othername>Rusty</othername> | ||
13 | <surname>Russell</surname> | ||
14 | <affiliation> | ||
15 | <address> | ||
16 | <email>rusty@rustcorp.com.au</email> | ||
17 | </address> | ||
18 | </affiliation> | ||
19 | </author> | ||
20 | </authorgroup> | ||
21 | |||
22 | <copyright> | ||
23 | <year>2001</year> | ||
24 | <holder>Rusty Russell</holder> | ||
25 | </copyright> | ||
26 | |||
27 | <legalnotice> | ||
28 | <para> | ||
29 | This documentation is free software; you can redistribute | ||
30 | it and/or modify it under the terms of the GNU General Public | ||
31 | License as published by the Free Software Foundation; either | ||
32 | version 2 of the License, or (at your option) any later | ||
33 | version. | ||
34 | </para> | ||
35 | |||
36 | <para> | ||
37 | This program is distributed in the hope that it will be | ||
38 | useful, but WITHOUT ANY WARRANTY; without even the implied | ||
39 | warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. | ||
40 | See the GNU General Public License for more details. | ||
41 | </para> | ||
42 | |||
43 | <para> | ||
44 | You should have received a copy of the GNU General Public | ||
45 | License along with this program; if not, write to the Free | ||
46 | Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, | ||
47 | MA 02111-1307 USA | ||
48 | </para> | ||
49 | |||
50 | <para> | ||
51 | For more details see the file COPYING in the source | ||
52 | distribution of Linux. | ||
53 | </para> | ||
54 | </legalnotice> | ||
55 | |||
56 | <releaseinfo> | ||
57 | This is the first release of this document as part of the kernel tarball. | ||
58 | </releaseinfo> | ||
59 | |||
60 | </bookinfo> | ||
61 | |||
62 | <toc></toc> | ||
63 | |||
64 | <chapter id="introduction"> | ||
65 | <title>Introduction</title> | ||
66 | <para> | ||
67 | Welcome, gentle reader, to Rusty's Unreliable Guide to Linux | ||
68 | Kernel Hacking. This document describes the common routines and | ||
69 | general requirements for kernel code: its goal is to serve as a | ||
70 | primer for Linux kernel development for experienced C | ||
71 | programmers. I avoid implementation details: that's what the | ||
72 | code is for, and I ignore whole tracts of useful routines. | ||
73 | </para> | ||
74 | <para> | ||
75 | Before you read this, please understand that I never wanted to | ||
76 | write this document, being grossly under-qualified, but I always | ||
77 | wanted to read it, and this was the only way. I hope it will | ||
78 | grow into a compendium of best practice, common starting points | ||
79 | and random information. | ||
80 | </para> | ||
81 | </chapter> | ||
82 | |||
83 | <chapter id="basic-players"> | ||
84 | <title>The Players</title> | ||
85 | |||
86 | <para> | ||
87 | At any time each of the CPUs in a system can be: | ||
88 | </para> | ||
89 | |||
90 | <itemizedlist> | ||
91 | <listitem> | ||
92 | <para> | ||
93 | not associated with any process, serving a hardware interrupt; | ||
94 | </para> | ||
95 | </listitem> | ||
96 | |||
97 | <listitem> | ||
98 | <para> | ||
99 | not associated with any process, serving a softirq, tasklet or bh; | ||
100 | </para> | ||
101 | </listitem> | ||
102 | |||
103 | <listitem> | ||
104 | <para> | ||
105 | running in kernel space, associated with a process; | ||
106 | </para> | ||
107 | </listitem> | ||
108 | |||
109 | <listitem> | ||
110 | <para> | ||
111 | running a process in user space. | ||
112 | </para> | ||
113 | </listitem> | ||
114 | </itemizedlist> | ||
115 | |||
116 | <para> | ||
117 | There is a strict ordering between these: other than the last | ||
118 | category (userspace) each can only be pre-empted by those above. | ||
119 | For example, while a softirq is running on a CPU, no other | ||
120 | softirq will pre-empt it, but a hardware interrupt can. However, | ||
121 | any other CPUs in the system execute independently. | ||
122 | </para> | ||
123 | |||
124 | <para> | ||
125 | We'll see a number of ways that the user context can block | ||
126 | interrupts, to become truly non-preemptable. | ||
127 | </para> | ||
128 | |||
129 | <sect1 id="basics-usercontext"> | ||
130 | <title>User Context</title> | ||
131 | |||
132 | <para> | ||
133 | User context is when you are coming in from a system call or | ||
134 | other trap: you can sleep, and you own the CPU (except for | ||
135 | interrupts) until you call <function>schedule()</function>. | ||
136 | In other words, user context (unlike userspace) is not pre-emptable. | ||
137 | </para> | ||
138 | |||
139 | <note> | ||
140 | <para> | ||
141 | You are always in user context on module load and unload, | ||
142 | and on operations on the block device layer. | ||
143 | </para> | ||
144 | </note> | ||
145 | |||
146 | <para> | ||
147 | In user context, the <varname>current</varname> pointer (indicating | ||
148 | the task we are currently executing) is valid, and | ||
149 | <function>in_interrupt()</function> | ||
150 | (<filename>include/linux/interrupt.h</filename>) is <returnvalue>false | ||
151 | </returnvalue>. | ||
152 | </para> | ||
153 | |||
154 | <caution> | ||
155 | <para> | ||
156 | Beware that if you have interrupts or bottom halves disabled | ||
157 | (see below), <function>in_interrupt()</function> will return a | ||
158 | false positive. | ||
159 | </para> | ||
160 | </caution> | ||
161 | </sect1> | ||
162 | |||
163 | <sect1 id="basics-hardirqs"> | ||
164 | <title>Hardware Interrupts (Hard IRQs)</title> | ||
165 | |||
166 | <para> | ||
167 | Timer ticks, <hardware>network cards</hardware> and | ||
168 | <hardware>keyboard</hardware> are examples of real | ||
169 | hardware which produce interrupts at any time. The kernel runs | ||
170 | interrupt handlers, which service the hardware. The kernel | ||
171 | guarantees that this handler is never re-entered: if another | ||
172 | interrupt arrives, it is queued (or dropped). Because it | ||
173 | disables interrupts, this handler has to be fast: frequently it | ||
174 | simply acknowledges the interrupt, marks a `software interrupt' | ||
175 | for execution and exits. | ||
176 | </para> | ||
177 | |||
178 | <para> | ||
179 | You can tell you are in a hardware interrupt, because | ||
180 | <function>in_irq()</function> returns <returnvalue>true</returnvalue>. | ||
181 | </para> | ||
182 | <caution> | ||
183 | <para> | ||
184 | Beware that this will return a false positive if interrupts are disabled | ||
185 | (see below). | ||
186 | </para> | ||
187 | </caution> | ||
188 | </sect1> | ||
189 | |||
190 | <sect1 id="basics-softirqs"> | ||
191 | <title>Software Interrupt Context: Bottom Halves, Tasklets, softirqs</title> | ||
192 | |||
193 | <para> | ||
194 | Whenever a system call is about to return to userspace, or a | ||
195 | hardware interrupt handler exits, any `software interrupts' | ||
196 | which are marked pending (usually by hardware interrupts) are | ||
197 | run (<filename>kernel/softirq.c</filename>). | ||
198 | </para> | ||
199 | |||
200 | <para> | ||
201 | Much of the real interrupt handling work is done here. Early in | ||
202 | the transition to <acronym>SMP</acronym>, there were only `bottom | ||
203 | halves' (BHs), which didn't take advantage of multiple CPUs. Shortly | ||
204 | after we switched from wind-up computers made of match-sticks and snot, | ||
205 | we abandoned this limitation. | ||
206 | </para> | ||
207 | |||
208 | <para> | ||
209 | <filename class="headerfile">include/linux/interrupt.h</filename> lists the | ||
210 | different BHs. No matter how many CPUs you have, no two BHs will run at | ||
211 | the same time. This made the transition to SMP simpler, but sucks hard for | ||
212 | scalable performance. A very important bottom half is the timer | ||
213 | BH (<filename class="headerfile">include/linux/timer.h</filename>): you | ||
214 | can register to have it call functions for you in a given length of time. | ||
215 | </para> | ||
216 | |||
217 | <para> | ||
218 | 2.3.43 introduced softirqs, and re-implemented the (now | ||
219 | deprecated) BHs underneath them. Softirqs are fully-SMP | ||
220 | versions of BHs: they can run on as many CPUs at once as | ||
221 | required. This means they need to deal with any races in shared | ||
222 | data using their own locks. A bitmask is used to keep track of | ||
223 | which are enabled, so the 32 available softirqs should not be | ||
224 | used up lightly. (<emphasis>Yes</emphasis>, people will | ||
225 | notice). | ||
226 | </para> | ||
227 | |||
228 | <para> | ||
229 | Tasklets (<filename class="headerfile">include/linux/interrupt.h</filename>) | ||
230 | are like softirqs, except they are dynamically-registrable (meaning you | ||
231 | can have as many as you want), and they also guarantee that any tasklet | ||
232 | will only run on one CPU at any time, although different tasklets can | ||
233 | run simultaneously (unlike different BHs). | ||
234 | </para> | ||
235 | <caution> | ||
236 | <para> | ||
237 | The name `tasklet' is misleading: they have nothing to do with `tasks', | ||
238 | and probably more to do with some bad vodka Alexey Kuznetsov had at the | ||
239 | time. | ||
240 | </para> | ||
241 | </caution> | ||
242 | |||
243 | <para> | ||
244 | You can tell you are in a softirq (or bottom half, or tasklet) | ||
245 | using the <function>in_softirq()</function> macro | ||
246 | (<filename class="headerfile">include/linux/interrupt.h</filename>). | ||
247 | </para> | ||
248 | <caution> | ||
249 | <para> | ||
250 | Beware that this will return a false positive if a bh lock (see below) | ||
251 | is held. | ||
252 | </para> | ||
253 | </caution> | ||
254 | </sect1> | ||
255 | </chapter> | ||
256 | |||
257 | <chapter id="basic-rules"> | ||
258 | <title>Some Basic Rules</title> | ||
259 | |||
260 | <variablelist> | ||
261 | <varlistentry> | ||
262 | <term>No memory protection</term> | ||
263 | <listitem> | ||
264 | <para> | ||
265 | If you corrupt memory, whether in user context or | ||
266 | interrupt context, the whole machine will crash. Are you | ||
267 | sure you can't do what you want in userspace? | ||
268 | </para> | ||
269 | </listitem> | ||
270 | </varlistentry> | ||
271 | |||
272 | <varlistentry> | ||
273 | <term>No floating point or <acronym>MMX</acronym></term> | ||
274 | <listitem> | ||
275 | <para> | ||
276 | The <acronym>FPU</acronym> context is not saved; even in user | ||
277 | context the <acronym>FPU</acronym> state probably won't | ||
278 | correspond with the current process: you would mess with some | ||
279 | user process' <acronym>FPU</acronym> state. If you really want | ||
280 | to do this, you would have to explicitly save/restore the full | ||
281 | <acronym>FPU</acronym> state (and avoid context switches). It | ||
282 | is generally a bad idea; use fixed point arithmetic first. | ||
283 | </para> | ||
284 | </listitem> | ||
285 | </varlistentry> | ||
286 | |||
287 | <varlistentry> | ||
288 | <term>A rigid stack limit</term> | ||
289 | <listitem> | ||
290 | <para> | ||
291 | The kernel stack is about 6K in 2.2 (for most | ||
292 | architectures: it's about 14K on the Alpha), and shared | ||
293 | with interrupts so you can't use it all. Avoid deep | ||
294 | recursion and huge local arrays on the stack (allocate | ||
295 | them dynamically instead). | ||
296 | </para> | ||
297 | </listitem> | ||
298 | </varlistentry> | ||
299 | |||
300 | <varlistentry> | ||
301 | <term>The Linux kernel is portable</term> | ||
302 | <listitem> | ||
303 | <para> | ||
304 | Let's keep it that way. Your code should be 64-bit clean, | ||
305 | and endian-independent. You should also minimize CPU | ||
306 | specific stuff, e.g. inline assembly should be cleanly | ||
307 | encapsulated and minimized to ease porting. Generally it | ||
308 | should be restricted to the architecture-dependent part of | ||
309 | the kernel tree. | ||
310 | </para> | ||
311 | </listitem> | ||
312 | </varlistentry> | ||
313 | </variablelist> | ||
314 | </chapter> | ||
315 | |||
316 | <chapter id="ioctls"> | ||
317 | <title>ioctls: Not writing a new system call</title> | ||
318 | |||
319 | <para> | ||
320 | A system call generally looks like this: | ||
321 | </para> | ||
322 | |||
323 | <programlisting> | ||
324 | asmlinkage long sys_mycall(int arg) | ||
325 | { | ||
326 | return 0; | ||
327 | } | ||
328 | </programlisting> | ||
329 | |||
330 | <para> | ||
331 | First, in most cases you don't want to create a new system call. | ||
332 | You create a character device and implement an appropriate ioctl | ||
333 | for it. This is much more flexible than system calls, doesn't have | ||
334 | to be entered in every architecture's | ||
335 | <filename class="headerfile">include/asm/unistd.h</filename> and | ||
336 | <filename>arch/kernel/entry.S</filename> file, and is much more | ||
337 | likely to be accepted by Linus. | ||
338 | </para> | ||
339 | |||
340 | <para> | ||
341 | If all your routine does is read or write some parameter, consider | ||
342 | implementing a <function>sysctl</function> interface instead. | ||
343 | </para> | ||
344 | |||
345 | <para> | ||
346 | Inside the ioctl you're in user context to a process. When an | ||
347 | error occurs you return a negated errno (see | ||
348 | <filename class="headerfile">include/linux/errno.h</filename>), | ||
349 | otherwise you return <returnvalue>0</returnvalue>. | ||
350 | </para> | ||
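<para>
As a rough userspace sketch of that return convention: the handler
name <function>mydev_ioctl_sketch()</function>, the command number
<constant>MYDEV_SET_LEVEL</constant> and the 0..10 range are all made
up for illustration, and a real handler takes whatever arguments your
kernel version's ioctl method expects.
</para>
<programlisting><![CDATA[
```c
#include <assert.h>
#include <errno.h>

/* Hypothetical command number, for illustration only. */
#define MYDEV_SET_LEVEL 1

/* Hypothetical device state that the command updates. */
static int mydev_level;

/*
 * Userspace model of an ioctl-style handler: returns 0 on success
 * and a negated errno value on failure.
 */
int mydev_ioctl_sketch(unsigned int cmd, long arg)
{
	switch (cmd) {
	case MYDEV_SET_LEVEL:
		if (arg < 0 || arg > 10)
			return -EINVAL;	/* argument out of range */
		mydev_level = (int)arg;
		return 0;
	default:
		return -ENOTTY;		/* command not known to this device */
	}
}
```
]]></programlisting>
<para>
The caller never sees a positive error code: anything below zero is
an error, and zero means the command was carried out.
</para>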
351 | |||
352 | <para> | ||
353 | After you have slept you should check if a signal occurred: the | ||
354 | Unix/Linux way of handling signals is to temporarily exit the | ||
355 | system call with the <constant>-ERESTARTSYS</constant> error. The | ||
356 | system call entry code will switch back to user context, process | ||
357 | the signal handler and then your system call will be restarted | ||
358 | (unless the user disabled that). So you should be prepared to | ||
359 | process the restart, e.g. if you're in the middle of manipulating | ||
360 | some data structure. | ||
361 | </para> | ||
362 | |||
363 | <programlisting> | ||
364 | if (signal_pending(current)) | ||
365 | return -ERESTARTSYS; | ||
366 | </programlisting> | ||
367 | |||
368 | <para> | ||
369 | If you're doing longer computations: first think userspace. If you | ||
370 | <emphasis>really</emphasis> want to do it in kernel you should | ||
371 | regularly check if you need to give up the CPU (remember there is | ||
372 | cooperative multitasking per CPU). Idiom: | ||
373 | </para> | ||
374 | |||
375 | <programlisting> | ||
376 | cond_resched(); /* Will sleep */ | ||
377 | </programlisting> | ||
378 | |||
379 | <para> | ||
380 | A short note on interface design: the UNIX system call motto is | ||
381 | "Provide mechanism not policy". | ||
382 | </para> | ||
383 | </chapter> | ||
384 | |||
385 | <chapter id="deadlock-recipes"> | ||
386 | <title>Recipes for Deadlock</title> | ||
387 | |||
388 | <para> | ||
389 | You cannot call any routines which may sleep, unless: | ||
390 | </para> | ||
391 | <itemizedlist> | ||
392 | <listitem> | ||
393 | <para> | ||
394 | You are in user context. | ||
395 | </para> | ||
396 | </listitem> | ||
397 | |||
398 | <listitem> | ||
399 | <para> | ||
400 | You do not own any spinlocks. | ||
401 | </para> | ||
402 | </listitem> | ||
403 | |||
404 | <listitem> | ||
405 | <para> | ||
406 | You have interrupts enabled (actually, Andi Kleen says | ||
407 | that the scheduling code will enable them for you, but | ||
408 | that's probably not what you wanted). | ||
409 | </para> | ||
410 | </listitem> | ||
411 | </itemizedlist> | ||
412 | |||
413 | <para> | ||
414 | Note that some functions may sleep implicitly: common ones are | ||
415 | the user space access functions (*_user) and memory allocation | ||
416 | functions without <symbol>GFP_ATOMIC</symbol>. | ||
417 | </para> | ||
418 | |||
419 | <para> | ||
420 | You will eventually lock up your box if you break these rules. | ||
421 | </para> | ||
422 | |||
423 | <para> | ||
424 | Really. | ||
425 | </para> | ||
426 | </chapter> | ||
427 | |||
428 | <chapter id="common-routines"> | ||
429 | <title>Common Routines</title> | ||
430 | |||
431 | <sect1 id="routines-printk"> | ||
432 | <title> | ||
433 | <function>printk()</function> | ||
434 | <filename class="headerfile">include/linux/kernel.h</filename> | ||
435 | </title> | ||
436 | |||
437 | <para> | ||
438 | <function>printk()</function> feeds kernel messages to the | ||
439 | console, dmesg, and the syslog daemon. It is useful for debugging | ||
440 | and reporting errors, and can be used inside interrupt context, | ||
441 | but use with caution: a machine which has its console flooded with | ||
442 | printk messages is unusable. It uses a format string mostly | ||
443 | compatible with ANSI C printf, and C string concatenation to give | ||
444 | it a first "priority" argument: | ||
445 | </para> | ||
446 | |||
447 | <programlisting> | ||
448 | printk(KERN_INFO "i = %u\n", i); | ||
449 | </programlisting> | ||
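<para>
The "priority" trick is plain C: adjacent string literals are merged
by the compiler. The sketch below shows this in userspace with
<function>snprintf()</function>. The macro
<constant>KERN_INFO_SKETCH</constant> and its
<literal>&lt;6&gt;</literal> value are assumptions modelled on the
classic level prefixes; check the header in your own tree for the
real definitions.
</para>
<programlisting><![CDATA[
```c
#include <assert.h>
#include <stdio.h>

/* Assumed stand-in for a KERN_ level prefix; not taken from a header. */
#define KERN_INFO_SKETCH "<6>"

/*
 * KERN_INFO_SKETCH "i = %u\n" is one format string, "<6>i = %u\n",
 * because the compiler concatenates adjacent string literals.
 * Returns what snprintf() returns: the length of the full message.
 */
int format_info_sketch(char *buf, size_t len, unsigned int i)
{
	assert(buf != NULL);
	return snprintf(buf, len, KERN_INFO_SKETCH "i = %u\n", i);
}
```
]]></programlisting>
<para>
Note that there is no comma between the level and the format: the
level is part of the string, not a separate argument.
</para>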
450 | |||
451 | <para> | ||
452 | See <filename class="headerfile">include/linux/kernel.h</filename> | ||
453 | for other KERN_ values; these are interpreted by syslog as the | ||
454 | level. Special case: for printing an IP address use | ||
455 | </para> | ||
456 | |||
457 | <programlisting> | ||
458 | __u32 ipaddress; | ||
459 | printk(KERN_INFO "my ip: %d.%d.%d.%d\n", NIPQUAD(ipaddress)); | ||
460 | </programlisting> | ||
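<para>
To see why one macro argument can feed four <literal>%d</literal>
conversions, here is a userspace stand-in. The names
<function>NIPQUAD_SKETCH()</function> and
<function>format_ip_sketch()</function> are hypothetical, and the
sketch assumes the address is held in network byte order, so its
bytes sit in memory in dotted-quad order.
</para>
<programlisting><![CDATA[
```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Stand-in for an IP-printing helper macro: expands to the four bytes
 * of addr, in memory order, as four separate printf arguments.
 */
#define NIPQUAD_SKETCH(addr) \
	((const unsigned char *)&(addr))[0], \
	((const unsigned char *)&(addr))[1], \
	((const unsigned char *)&(addr))[2], \
	((const unsigned char *)&(addr))[3]

/* Format an address given as four octets in dotted-quad order. */
int format_ip_sketch(char *buf, size_t len, const unsigned char octets[4])
{
	uint32_t ipaddress;	/* plays the role of the __u32 above */

	assert(buf != NULL && octets != NULL);
	/* Storing the octets in memory order yields network byte order. */
	memcpy(&ipaddress, octets, sizeof(ipaddress));
	return snprintf(buf, len, "my ip: %d.%d.%d.%d\n",
			NIPQUAD_SKETCH(ipaddress));
}
```
]]></programlisting>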
461 | |||
462 | <para> | ||
463 | <function>printk()</function> internally uses a 1K buffer and does | ||
464 | not catch overruns. Make sure that will be enough. | ||
465 | </para> | ||
466 | |||
467 | <note> | ||
468 | <para> | ||
469 | You will know when you are a real kernel hacker | ||
470 | when you start typoing printf as printk in your user programs :) | ||
471 | </para> | ||
472 | </note> | ||
473 | |||
474 | <!--- From the Lions book reader department --> | ||
475 | |||
476 | <note> | ||
477 | <para> | ||
478 | Another sidenote: the original Unix Version 6 sources had a | ||
479 | comment on top of its printf function: "Printf should not be | ||
480 | used for chit-chat". You should follow that advice. | ||
481 | </para> | ||
482 | </note> | ||
483 | </sect1> | ||
484 | |||
485 | <sect1 id="routines-copy"> | ||
486 | <title> | ||
487 | <function>copy_[to/from]_user()</function> | ||
488 | / | ||
489 | <function>get_user()</function> | ||
490 | / | ||
491 | <function>put_user()</function> | ||
492 | <filename class="headerfile">include/asm/uaccess.h</filename> | ||
493 | </title> | ||
494 | |||
495 | <para> | ||
496 | <emphasis>[SLEEPS]</emphasis> | ||
497 | </para> | ||
498 | |||
499 | <para> | ||
500 | <function>put_user()</function> and <function>get_user()</function> | ||
501 | are used to get and put single values (such as an int, char, or | ||
502 | long) from and to userspace. A pointer into userspace should | ||
503 | never be simply dereferenced: data should be copied using these | ||
504 | routines. Both return <constant>-EFAULT</constant> or 0. | ||
505 | </para> | ||
506 | <para> | ||
507 | <function>copy_to_user()</function> and | ||
508 | <function>copy_from_user()</function> are more general: they copy | ||
509 | an arbitrary amount of data to and from userspace. | ||
510 | <caution> | ||
511 | <para> | ||
512 | Unlike <function>put_user()</function> and | ||
513 | <function>get_user()</function>, they return the amount of | ||
514 | uncopied data (i.e. <returnvalue>0</returnvalue> still means | ||
515 | success). | ||
516 | </para> | ||
517 | </caution> | ||
518 | [Yes, this moronic interface makes me cringe. Please submit a | ||
519 | patch and become my hero --RR.] | ||
520 | </para> | ||
521 | <para> | ||
522 | These functions may sleep implicitly. They should never be called | ||
523 | outside user context (it makes no sense), with interrupts | ||
524 | disabled, or with a spinlock held. | ||
525 | </para> | ||
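<para>
Because the copy routines return a byte count rather than an error
code, callers usually turn any non-zero result into
<constant>-EFAULT</constant> themselves. The sketch below models that
idiom in userspace:
<function>copy_from_user_mock()</function> is a made-up stand-in
that treats a NULL source as a faulting pointer, which is only an
assumption of this example.
</para>
<programlisting><![CDATA[
```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * Userspace mock of a copy-from-userspace routine: returns the number
 * of bytes that could NOT be copied. A NULL source stands in for a
 * pointer that faults.
 */
unsigned long copy_from_user_mock(void *to, const void *from,
				  unsigned long n)
{
	assert(to != NULL);
	if (from == NULL)
		return n;	/* nothing was copied */
	memcpy(to, from, n);
	return 0;		/* everything was copied */
}

/* The usual idiom: any uncopied bytes become -EFAULT for the caller. */
int read_arg_sketch(int *dst, const void *user_ptr)
{
	if (copy_from_user_mock(dst, user_ptr, sizeof(*dst)))
		return -EFAULT;
	return 0;
}
```
]]></programlisting>
<para>
Returning the raw count from your own routine would hand the caller
a positive number where it expects 0 or a negated errno.
</para>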
526 | </sect1> | ||
527 | |||
528 | <sect1 id="routines-kmalloc"> | ||
529 | <title><function>kmalloc()</function>/<function>kfree()</function> | ||
530 | <filename class="headerfile">include/linux/slab.h</filename></title> | ||
531 | |||
532 | <para> | ||
533 | <emphasis>[MAY SLEEP: SEE BELOW]</emphasis> | ||
534 | </para> | ||
535 | |||
536 | <para> | ||
537 | These routines are used to dynamically request pointer-aligned | ||
538 | chunks of memory, like malloc and free do in userspace, but | ||
539 | <function>kmalloc()</function> takes an extra flag word. | ||
540 | Important values: | ||
541 | </para> | ||
542 | |||
543 | <variablelist> | ||
544 | <varlistentry> | ||
545 | <term> | ||
546 | <constant> | ||
547 | GFP_KERNEL | ||
548 | </constant> | ||
549 | </term> | ||
550 | <listitem> | ||
551 | <para> | ||
552 | May sleep and swap to free memory. Only allowed in user | ||
553 | context, but is the most reliable way to allocate memory. | ||
554 | </para> | ||
555 | </listitem> | ||
556 | </varlistentry> | ||
557 | |||
558 | <varlistentry> | ||
559 | <term> | ||
560 | <constant> | ||
561 | GFP_ATOMIC | ||
562 | </constant> | ||
563 | </term> | ||
564 | <listitem> | ||
565 | <para> | ||
566 | Don't sleep. Less reliable than <constant>GFP_KERNEL</constant>, | ||
567 | but may be called from interrupt context. You should | ||
568 | <emphasis>really</emphasis> have a good out-of-memory | ||
569 | error-handling strategy. | ||
570 | </para> | ||
571 | </listitem> | ||
572 | </varlistentry> | ||
573 | |||
574 | <varlistentry> | ||
575 | <term> | ||
576 | <constant> | ||
577 | GFP_DMA | ||
578 | </constant> | ||
579 | </term> | ||
580 | <listitem> | ||
581 | <para> | ||
582 | Allocate ISA DMA-capable memory below 16MB. If you don't know what that | ||
583 | is you don't need it. Very unreliable. | ||
584 | </para> | ||
585 | </listitem> | ||
586 | </varlistentry> | ||
587 | </variablelist> | ||
588 | |||
589 | <para> | ||
590 | If you see a <errorname>kmem_grow: Called nonatomically from int | ||
591 | </errorname> warning message, you called a memory allocation function | ||
592 | from interrupt context without <constant>GFP_ATOMIC</constant>. | ||
593 | You should really fix that. Run, don't walk. | ||
594 | </para> | ||
595 | |||
596 | <para> | ||
597 | If you are allocating at least <constant>PAGE_SIZE</constant> | ||
598 | (<filename class="headerfile">include/asm/page.h</filename>) bytes, | ||
599 | consider using <function>__get_free_pages()</function> | ||
600 | |||
601 | (<filename class="headerfile">include/linux/mm.h</filename>). It | ||
602 | takes an order argument (0 for page sized, 1 for double page, 2 | ||
603 | for four pages etc.) and the same memory priority flag word as | ||
604 | above. | ||
605 | </para> | ||
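<para>
The order argument is a power-of-two exponent, so a byte count has
to be rounded up to the next power-of-two number of pages. A
userspace sketch of that arithmetic follows;
<function>order_for_size_sketch()</function> and the 4096-byte
<constant>PAGE_SIZE_SKETCH</constant> are assumptions of this
example, not kernel definitions.
</para>
<programlisting><![CDATA[
```c
#include <assert.h>
#include <stddef.h>

/* Assumed page size for the sketch; real code uses PAGE_SIZE. */
#define PAGE_SIZE_SKETCH 4096UL

/*
 * Smallest order such that (PAGE_SIZE_SKETCH << order) >= size:
 * order 0 is one page, order 1 is two pages, order 2 is four pages.
 */
unsigned int order_for_size_sketch(size_t size)
{
	unsigned int order = 0;

	assert(size > 0);
	while ((PAGE_SIZE_SKETCH << order) < size)
		order++;
	return order;
}
```
]]></programlisting>
<para>
Note the waste at the boundaries: one byte more than two pages
already costs four pages.
</para>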
606 | |||
607 | <para> | ||
608 | If you are allocating more than a page worth of bytes you can use | ||
609 | <function>vmalloc()</function>. It'll allocate virtual memory in | ||
610 | the kernel map. This block is not contiguous in physical memory, | ||
611 | but the <acronym>MMU</acronym> makes it look like it is for you | ||
612 | (so it'll only look contiguous to the CPUs, not to external device | ||
613 | drivers). If you really need large physically contiguous memory | ||
614 | for some weird device, you have a problem: it is poorly supported | ||
615 | in Linux because after some time memory fragmentation in a running | ||
616 | kernel makes it hard. The best way is to allocate the block early | ||
617 | in the boot process via the <function>alloc_bootmem()</function> | ||
618 | routine. | ||
619 | </para> | ||
620 | |||
621 | <para> | ||
622 | Before inventing your own cache of often-used objects consider | ||
623 | using a slab cache (see | ||
624 | <filename class="headerfile">include/linux/slab.h</filename>). | ||
625 | </para> | ||
626 | </sect1> | ||
627 | |||
628 | <sect1 id="routines-current"> | ||
629 | <title><function>current</function> | ||
630 | <filename class="headerfile">include/asm/current.h</filename></title> | ||
631 | |||
632 | <para> | ||
633 | This global variable (really a macro) contains a pointer to | ||
634 | the current task structure, so it is only valid in user context. | ||
635 | For example, when a process makes a system call, this will | ||
636 | point to the task structure of the calling process. It is | ||
637 | <emphasis>not NULL</emphasis> in interrupt context. | ||
638 | </para> | ||
639 | </sect1> | ||
640 | |||
641 | <sect1 id="routines-udelay"> | ||
642 | <title><function>udelay()</function>/<function>mdelay()</function> | ||
643 | <filename class="headerfile">include/asm/delay.h</filename> | ||
644 | <filename class="headerfile">include/linux/delay.h</filename> | ||
645 | </title> | ||
646 | |||
647 | <para> | ||
648 | The <function>udelay()</function> function can be used for small pauses. | ||
649 | Do not use large values with <function>udelay()</function> as you risk | ||
650 | overflow - the helper function <function>mdelay()</function> is useful | ||
651 | here, or even consider <function>schedule_timeout()</function>. | ||
652 | </para> | ||
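<para>
One plausible way for a millisecond helper to sidestep the overflow
is to issue many small delays instead of one large one. The sketch
below models that in userspace with a stub that only totals the
requested time; <function>udelay_stub()</function> and
<function>mdelay_sketch()</function> are hypothetical names, and
this is not claimed to be how your kernel implements
<function>mdelay()</function>.
</para>
<programlisting><![CDATA[
```c
#include <assert.h>

/* Total microseconds requested so far; the stub does not really wait. */
static unsigned long long total_usecs_model;

/* Stub standing in for a short busy-wait of the given length. */
void udelay_stub(unsigned long usecs)
{
	assert(usecs <= 1000);	/* keep every single request small */
	total_usecs_model += usecs;
}

/* Split a long delay into 1000us slices so no single call is large. */
void mdelay_sketch(unsigned long msecs)
{
	while (msecs--)
		udelay_stub(1000);
}
```
]]></programlisting>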
653 | </sect1> | ||
654 | |||
655 | <sect1 id="routines-endian"> | ||
656 | <title><function>cpu_to_be32()</function>/<function>be32_to_cpu()</function>/<function>cpu_to_le32()</function>/<function>le32_to_cpu()</function> | ||
657 | <filename class="headerfile">include/asm/byteorder.h</filename> | ||
658 | </title> | ||
659 | |||
660 | <para> | ||
661 | The <function>cpu_to_be32()</function> family (where the "32" can | ||
662 | be replaced by 64 or 16, and the "be" can be replaced by "le") are | ||
663 | the general way to do endian conversions in the kernel: they | ||
664 | return the converted value. All variations supply the reverse as | ||
665 | well: <function>be32_to_cpu()</function>, etc. | ||
666 | </para> | ||
667 | |||
668 | <para> | ||
669 | There are two major variations of these functions: the pointer | ||
670 | variation, such as <function>cpu_to_be32p()</function>, which takes | ||
671 | a pointer to the given type and returns the converted value. The | ||
672 | other variation is the "in-situ" family, such as | ||
673 | <function>cpu_to_be32s()</function>, which converts the value referred | ||
674 | to by the pointer and returns void. | ||
675 | </para> | ||
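<para>
The three calling styles can be sketched in portable userspace C as
below. The <literal>_sketch</literal> names are hypothetical
stand-ins written with shifts and <function>memcpy()</function> so
that they behave the same on any host; the kernel versions are
supplied per architecture and need not look like this.
</para>
<programlisting><![CDATA[
```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Value style: return x laid out as big-endian bytes in memory. */
uint32_t cpu_to_be32_sketch(uint32_t x)
{
	unsigned char b[4];
	uint32_t out;

	b[0] = (unsigned char)(x >> 24);	/* most significant byte first */
	b[1] = (unsigned char)(x >> 16);
	b[2] = (unsigned char)(x >> 8);
	b[3] = (unsigned char)x;
	memcpy(&out, b, sizeof(out));
	return out;
}

/* The reverse: read big-endian bytes back into a CPU-order value. */
uint32_t be32_to_cpu_sketch(uint32_t x)
{
	unsigned char b[4];

	memcpy(b, &x, sizeof(b));
	return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
	       ((uint32_t)b[2] << 8) | (uint32_t)b[3];
}

/* Pointer style: take a pointer, return the converted value. */
uint32_t cpu_to_be32p_sketch(const uint32_t *p)
{
	assert(p != NULL);
	return cpu_to_be32_sketch(*p);
}

/* In-situ style: convert the pointed-to value in place, return void. */
void cpu_to_be32s_sketch(uint32_t *p)
{
	assert(p != NULL);
	*p = cpu_to_be32_sketch(*p);
}
```
]]></programlisting>
<para>
On a big-endian host all four reduce to copying the value unchanged,
which is why code using them stays endian-independent.
</para>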
676 | </sect1> | ||
677 | |||
678 | <sect1 id="routines-local-irqs"> | ||
679 | <title><function>local_irq_save()</function>/<function>local_irq_restore()</function> | ||
680 | <filename class="headerfile">include/asm/system.h</filename> | ||
681 | </title> | ||
682 | |||
683 | <para> | ||
684 | These routines disable hard interrupts on the local CPU, and | ||
685 | restore them. They are reentrant, saving the previous state in | ||
686 | their one <varname>unsigned long flags</varname> argument. If you | ||
687 | know that interrupts are enabled, you can simply use | ||
688 | <function>local_irq_disable()</function> and | ||
689 | <function>local_irq_enable()</function>. | ||
690 | </para> | ||
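<para>
Why saving the flags makes the pair safe to nest can be seen in a
small userspace model. Everything here is hypothetical: a boolean
plays the CPU's interrupt-enable bit, and the model functions take a
pointer where the kernel macros take the
<varname>flags</varname> variable by name.
</para>
<programlisting><![CDATA[
```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Model of one CPU's interrupt-enable bit; not real hardware state. */
static bool irqs_enabled_model = true;

/* Remember the current state in *flags, then disable. */
void irq_save_model(unsigned long *flags)
{
	assert(flags != NULL);
	*flags = irqs_enabled_model ? 1UL : 0UL;
	irqs_enabled_model = false;
}

/* Put back whatever state was saved, whether enabled or disabled. */
void irq_restore_model(unsigned long flags)
{
	irqs_enabled_model = (flags != 0);
}

/* Nest two save/restore pairs and record the state after each restore. */
void nested_demo(bool seen[2])
{
	unsigned long outer, inner;

	irq_save_model(&outer);		/* saves "enabled", disables */
	irq_save_model(&inner);		/* saves "disabled" */
	irq_restore_model(inner);
	seen[0] = irqs_enabled_model;	/* inner restore keeps them off */
	irq_restore_model(outer);
	seen[1] = irqs_enabled_model;	/* outer restore turns them on */
}
```
]]></programlisting>
<para>
An unconditional enable in place of the inner restore would have
switched interrupts back on while the outer section still expected
them off.
</para>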
691 | </sect1> | ||
692 | |||
693 | <sect1 id="routines-softirqs"> | ||
694 | <title><function>local_bh_disable()</function>/<function>local_bh_enable()</function> | ||
695 | <filename class="headerfile">include/linux/interrupt.h</filename></title> | ||
696 | |||
697 | <para> | ||
698 | These routines disable soft interrupts on the local CPU, and | ||
699 | restore them. They are reentrant; if soft interrupts were | ||
700 | disabled before, they will still be disabled after this pair | ||
701 | of functions has been called. They prevent softirqs, tasklets | ||
702 | and bottom halves from running on the current CPU. | ||
703 | </para> | ||
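<para>
One way to get this nesting behaviour is a per-CPU depth counter:
each disable increments it, each enable decrements it, and soft
interrupts may run only at depth zero. The userspace model below is
a sketch under that assumption; the names are hypothetical and a
single global stands in for the per-CPU state.
</para>
<programlisting><![CDATA[
```c
#include <assert.h>

/* Model: nesting depth of disable calls on "this CPU". */
static int bh_disable_depth_model;

/* Each disable goes one level deeper. */
void bh_disable_model(void)
{
	bh_disable_depth_model++;
}

/* Each enable undoes exactly one earlier disable. */
void bh_enable_model(void)
{
	assert(bh_disable_depth_model > 0);	/* unbalanced enable is a bug */
	bh_disable_depth_model--;
}

/* Soft interrupts may run only when nothing has them disabled. */
int softirqs_allowed_model(void)
{
	return bh_disable_depth_model == 0;
}
```
]]></programlisting>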
704 | </sect1> | ||
705 | |||
706 | <sect1 id="routines-processorids"> | ||
707 | <title><function>smp_processor_id()</function> | ||
708 | <filename class="headerfile">include/asm/smp.h</filename></title> | ||
709 | |||
710 | <para> | ||
711 | <function>smp_processor_id()</function> returns the current | ||
712 | processor number, from 0 up to (but not including) <symbol>NR_CPUS</symbol> (the | ||
713 | maximum number of CPUs supported by Linux, currently 32). These | ||
714 | values are not necessarily contiguous. | ||
715 | </para> | ||
716 | </sect1> | ||
717 | |||
718 | <sect1 id="routines-init"> | ||
719 | <title><type>__init</type>/<type>__exit</type>/<type>__initdata</type> | ||
720 | <filename class="headerfile">include/linux/init.h</filename></title> | ||
721 | |||
722 | <para> | ||
723 | After boot, the kernel frees up a special section; functions | ||
724 | marked with <type>__init</type> and data structures marked with | ||
725 | <type>__initdata</type> are dropped after boot is complete (within | ||
726 | modules this directive is currently ignored). <type>__exit</type> | ||
727 | is used to declare a function which is only required on exit: the | ||
728 | function will be dropped if this file is not compiled as a module. | ||
729 | See the header file for use. Note that it makes no sense for a function | ||
730 | marked with <type>__init</type> to be exported to modules with | ||
731 | <function>EXPORT_SYMBOL()</function> - this will break. | ||
732 | </para> | ||
733 | <para> | ||
734 | Static data structures marked as <type>__initdata</type> must be initialised | ||
735 | (as opposed to ordinary static data which is zeroed BSS) and cannot be | ||
736 | <type>const</type>. | ||
737 | </para> | ||
738 | |||
739 | </sect1> | ||
740 | |||
741 | <sect1 id="routines-init-again"> | ||
742 | <title><function>__initcall()</function>/<function>module_init()</function> | ||
743 | <filename class="headerfile">include/linux/init.h</filename></title> | ||
744 | <para> | ||
745 | Many parts of the kernel are well served as a module | ||
746 | (dynamically-loadable parts of the kernel). Using the | ||
747 | <function>module_init()</function> and | ||
748 | <function>module_exit()</function> macros it is easy to write code | ||
749 | without #ifdefs which can operate either as a module or built into | ||
750 | the kernel. | ||
751 | </para> | ||
752 | |||
753 | <para> | ||
754 | The <function>module_init()</function> macro defines which | ||
755 | function is to be called at module insertion time (if the file is | ||
756 | compiled as a module), or at boot time: if the file is not | ||
757 | compiled as a module the <function>module_init()</function> macro | ||
758 | becomes equivalent to <function>__initcall()</function>, which | ||
759 | through linker magic ensures that the function is called on boot. | ||
760 | </para> | ||
761 | |||
762 | <para> | ||
763 | The function can return a negative error number to cause | ||
764 | module loading to fail (unfortunately, this has no effect if | ||
765 | the module is compiled into the kernel). For modules, this is | ||
766 | called in user context, with interrupts enabled, and the | ||
767 | kernel lock held, so it can sleep. | ||
768 | </para> | ||
769 | </sect1> | ||
770 | |||
771 | <sect1 id="routines-moduleexit"> | ||
772 | <title> <function>module_exit()</function> | ||
773 | <filename class="headerfile">include/linux/init.h</filename> </title> | ||
774 | |||
775 | <para> | ||
776 | This macro defines the function to be called at module removal | ||
777 | time (or never, in the case of the file compiled into the | ||
778 | kernel). It will only be called if the module usage count has | ||
779 | reached zero. This function can also sleep, but cannot fail: | ||
780 | everything must be cleaned up by the time it returns. | ||
781 | </para> | ||
782 | </sect1> | ||
783 | |||
784 | <!-- add info on new-style module refcounting here --> | ||
785 | </chapter> | ||
786 | |||
787 | <chapter id="queues"> | ||
788 | <title>Wait Queues | ||
789 | <filename class="headerfile">include/linux/wait.h</filename> | ||
790 | </title> | ||
791 | <para> | ||
792 | <emphasis>[SLEEPS]</emphasis> | ||
793 | </para> | ||
794 | |||
795 | <para> | ||
796 | A wait queue is used to wait for someone to wake you up when a | ||
797 | certain condition is true. Wait queues must be used carefully to ensure | ||
798 | there is no race condition. You declare a | ||
799 | <type>wait_queue_head_t</type>, and then processes which want to | ||
800 | wait for that condition declare a <type>wait_queue_t</type> | ||
801 | referring to themselves, and place that in the queue. | ||
802 | </para> | ||
803 | |||
804 | <sect1 id="queue-declaring"> | ||
805 | <title>Declaring</title> | ||
806 | |||
807 | <para> | ||
808 | You declare a <type>wait_queue_head_t</type> using the | ||
809 | <function>DECLARE_WAIT_QUEUE_HEAD()</function> macro, or using the | ||
810 | <function>init_waitqueue_head()</function> routine in your | ||
811 | initialization code. | ||
812 | </para> | ||
813 | </sect1> | ||
814 | |||
815 | <sect1 id="queue-waitqueue"> | ||
816 | <title>Queuing</title> | ||
817 | |||
818 | <para> | ||
819 | Placing yourself in the waitqueue is fairly complex, because you | ||
820 | must put yourself in the queue before checking the condition. | ||
821 | There is a macro to do this: | ||
822 | <function>wait_event_interruptible()</function> | ||
823 | |||
824 | <filename class="headerfile">include/linux/sched.h</filename>. The | ||
825 | first argument is the wait queue head, and the second is an | ||
826 | expression which is evaluated; the macro returns | ||
827 | <returnvalue>0</returnvalue> when this expression is true, or | ||
828 | <returnvalue>-ERESTARTSYS</returnvalue> if a signal is received. | ||
829 | The <function>wait_event()</function> version ignores signals. | ||
830 | </para> | ||
831 | <para> | ||
832 | Do not use the <function>sleep_on()</function> function family - | ||
833 | it is very easy to accidentally introduce races; almost certainly | ||
834 | one of the <function>wait_event()</function> family will do, or a | ||
835 | loop around <function>schedule_timeout()</function>. If you choose | ||
836 | to loop around <function>schedule_timeout()</function> remember | ||
837 | you must set the task state (with | ||
838 | <function>set_current_state()</function>) on each iteration to avoid | ||
839 | busy-looping. | ||
840 | </para> | ||
841 | |||
842 | </sect1> | ||
843 | |||
844 | <sect1 id="queue-waking"> | ||
845 | <title>Waking Up Queued Tasks</title> | ||
846 | |||
847 | <para> | ||
848 | Call <function>wake_up()</function> | ||
849 | |||
850 | <filename class="headerfile">include/linux/sched.h</filename>, | ||
851 | which will wake up every process in the queue. The exception is | ||
852 | if one has <constant>TASK_EXCLUSIVE</constant> set, in which case | ||
853 | the remainder of the queue will not be woken. | ||
854 | </para> | ||
855 | </sect1> | ||
856 | </chapter> | ||
857 | |||
858 | <chapter id="atomic-ops"> | ||
859 | <title>Atomic Operations</title> | ||
860 | |||
861 | <para> | ||
862 | Certain operations are guaranteed atomic on all platforms. The | ||
863 | first class of operations work on <type>atomic_t</type> | ||
864 | |||
865 | <filename class="headerfile">include/asm/atomic.h</filename>; this | ||
866 | contains a signed integer (at least 24 bits long), and you must use | ||
867 | these functions to manipulate or read atomic_t variables. | ||
868 | <function>atomic_read()</function> and | ||
869 | <function>atomic_set()</function> get and set the counter, | ||
870 | <function>atomic_add()</function>, | ||
871 | <function>atomic_sub()</function>, | ||
872 | <function>atomic_inc()</function>, | ||
873 | <function>atomic_dec()</function>, and | ||
874 | <function>atomic_dec_and_test()</function> (returns | ||
875 | <returnvalue>true</returnvalue> if it was decremented to zero). | ||
876 | </para> | ||
877 | |||
878 | <para> | ||
879 | Yes. It returns <returnvalue>true</returnvalue> (i.e. != 0) if the | ||
880 | atomic variable is zero. | ||
881 | </para> | ||
882 | |||
883 | <para> | ||
884 | Note that these functions are slower than normal arithmetic, and | ||
885 | so should not be used unnecessarily. On some platforms they | ||
886 | are much slower, like 32-bit Sparc where they use a spinlock. | ||
887 | </para> | ||
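<para>
The "decrement and test for zero" idiom is most often seen in
reference counting. The following is only a userspace sketch of that
pattern: it uses C11 <filename class="headerfile">stdatomic.h</filename>
as a stand-in for <type>atomic_t</type>, and
<type>struct widget</type>, <function>widget_get()</function>,
<function>widget_put()</function> and
<function>dec_and_test()</function> are hypothetical names invented
for illustration, not kernel interfaces.
</para>

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical refcounted object; the field names are illustrative only. */
struct widget {
	atomic_int refcount;
	bool released;		/* set when the last reference is dropped */
};

static void widget_init(struct widget *w)
{
	atomic_init(&w->refcount, 1);	/* the creator holds one reference */
	w->released = false;
}

static void widget_get(struct widget *w)
{
	atomic_fetch_add(&w->refcount, 1);
}

/*
 * Userspace stand-in for atomic_dec_and_test(): decrement, and
 * return true only if the new value is zero.
 */
static bool dec_and_test(atomic_int *v)
{
	/* atomic_fetch_sub() returns the value held before the subtraction. */
	return atomic_fetch_sub(v, 1) == 1;
}

/* Drop a reference; returns true if this was the last one. */
static bool widget_put(struct widget *w)
{
	if (dec_and_test(&w->refcount)) {
		w->released = true;	/* real code would free the object here */
		return true;
	}
	return false;
}
```

<para>
Only the caller that sees <returnvalue>true</returnvalue> may tear the
object down, which is why the decrement and the test have to be a
single atomic step.
</para>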
888 | |||
889 | <para> | ||
890 | The second class of atomic operations is atomic bit operations on a | ||
891 | <type>long</type>, defined in | ||
892 | |||
893 | <filename class="headerfile">include/linux/bitops.h</filename>. These | ||
894 | operations generally take a pointer to the bit pattern, and a bit | ||
895 | number: 0 is the least significant bit. | ||
896 | <function>set_bit()</function>, <function>clear_bit()</function> | ||
897 | and <function>change_bit()</function> set, clear, and flip the | ||
898 | given bit. <function>test_and_set_bit()</function>, | ||
899 | <function>test_and_clear_bit()</function> and | ||
900 | <function>test_and_change_bit()</function> do the same thing, | ||
901 | except return true if the bit was previously set; these are | ||
902 | particularly useful for very simple locking. | ||
903 | </para> | ||
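<para>
As a rough illustration of the "very simple locking" mentioned above,
here is a userspace sketch of a try-lock built on a test-and-set
operation. It assumes C11 atomics in place of the kernel bit
operations; <function>test_and_set_bit_sketch()</function>,
<function>clear_bit_sketch()</function>,
<function>try_claim()</function>, <function>release()</function> and
<constant>BUSY_BIT</constant> are made-up names, not the kernel's.
</para>

```c
#include <stdatomic.h>
#include <stdbool.h>

#define BUSY_BIT 0	/* hypothetical bit number; 0 is the least significant bit */

/* Stand-in for test_and_set_bit(): set the bit and return its old value. */
static bool test_and_set_bit_sketch(int nr, atomic_ulong *addr)
{
	unsigned long mask = 1UL << nr;

	return (atomic_fetch_or(addr, mask) & mask) != 0;
}

/* Stand-in for clear_bit(). */
static void clear_bit_sketch(int nr, atomic_ulong *addr)
{
	atomic_fetch_and(addr, ~(1UL << nr));
}

/* Try to claim the resource; returns true if the caller now owns it. */
static bool try_claim(atomic_ulong *flags)
{
	/* The bit was previously clear only for the one caller that wins. */
	return !test_and_set_bit_sketch(BUSY_BIT, flags);
}

static void release(atomic_ulong *flags)
{
	clear_bit_sketch(BUSY_BIT, flags);
}
```

<para>
This never sleeps or spins: a caller that loses the race simply gets
<returnvalue>false</returnvalue> back and must decide what to do next.
</para>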
904 | |||
905 | <para> | ||
906 | It is possible to call these operations with bit indices greater | ||
907 | than BITS_PER_LONG. The resulting behavior is strange on big-endian | ||
908 | platforms, though, so it is a good idea not to do this. | ||
909 | </para> | ||
910 | |||
911 | <para> | ||
912 | Note that the order of bits depends on the architecture, and in | ||
913 | particular, the bitfield passed to these operations must be at | ||
914 | least as large as a <type>long</type>. | ||
915 | </para> | ||
916 | </chapter> | ||
917 | |||
918 | <chapter id="symbols"> | ||
919 | <title>Symbols</title> | ||
920 | |||
921 | <para> | ||
922 | Within the kernel proper, the normal linking rules apply | ||
923 | (ie. unless a symbol is declared to be file scope with the | ||
924 | <type>static</type> keyword, it can be used anywhere in the | ||
925 | kernel). However, for modules, a special exported symbol table is | ||
926 | kept which limits the entry points to the kernel proper. Modules | ||
927 | can also export symbols. | ||
928 | </para> | ||
929 | |||
930 | <sect1 id="sym-exportsymbols"> | ||
931 | <title><function>EXPORT_SYMBOL()</function> | ||
932 | <filename class="headerfile">include/linux/module.h</filename></title> | ||
933 | |||
934 | <para> | ||
935 | This is the classic method of exporting a symbol, and it works | ||
936 | for both modules and non-modules. In the kernel all these | ||
937 | declarations are often bundled into a single file to help | ||
938 | genksyms (which searches source files for these declarations). | ||
939 | See the comment on genksyms and Makefiles below. | ||
940 | </para> | ||
941 | </sect1> | ||
942 | |||
943 | <sect1 id="sym-exportsymbols-gpl"> | ||
944 | <title><function>EXPORT_SYMBOL_GPL()</function> | ||
945 | <filename class="headerfile">include/linux/module.h</filename></title> | ||
946 | |||
947 | <para> | ||
948 | Similar to <function>EXPORT_SYMBOL()</function> except that the | ||
949 | symbols exported by <function>EXPORT_SYMBOL_GPL()</function> can | ||
950 | only be seen by modules with a | ||
951 | <function>MODULE_LICENSE()</function> that specifies a GPL | ||
952 | compatible license. | ||
953 | </para> | ||
954 | </sect1> | ||
955 | </chapter> | ||
956 | |||
957 | <chapter id="conventions"> | ||
958 | <title>Routines and Conventions</title> | ||
959 | |||
960 | <sect1 id="conventions-doublelinkedlist"> | ||
961 | <title>Double-linked lists | ||
962 | <filename class="headerfile">include/linux/list.h</filename></title> | ||
963 | |||
964 | <para> | ||
965 | There are three sets of linked-list routines in the kernel | ||
966 | headers, but this one seems to be winning out (and Linus has | ||
967 | used it). If you don't have some particular pressing need for | ||
968 | a single list, it's a good choice. In fact, I don't care | ||
969 | whether it's a good choice or not, just use it so we can get | ||
970 | rid of the others. | ||
971 | </para> | ||
972 | </sect1> | ||
973 | |||
974 | <sect1 id="convention-returns"> | ||
975 | <title>Return Conventions</title> | ||
976 | |||
977 | <para> | ||
978 | For code called in user context, it's very common to defy C | ||
979 | convention, and return <returnvalue>0</returnvalue> for success, | ||
980 | and a negative error number | ||
981 | (eg. <returnvalue>-EFAULT</returnvalue>) for failure. This can be | ||
982 | unintuitive at first, but it's fairly widespread in the networking | ||
983 | code, for example. | ||
984 | </para> | ||
985 | |||
986 | <para> | ||
987 | The filesystem code uses <function>ERR_PTR()</function> | ||
988 | |||
989 | <filename class="headerfile">include/linux/fs.h</filename> to | ||
990 | encode a negative error number into a pointer, and | ||
991 | <function>IS_ERR()</function> and <function>PTR_ERR()</function> | ||
992 | to get it back out again: this avoids a separate pointer parameter for | ||
993 | the error number. Icky, but in a good way. | ||
994 | </para> | ||
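<para>
Both conventions can be tried out in userspace. The sketch below
copies the three macros from the
<filename>include/linux/fs.h</filename> excerpt quoted in the Kernel
Cantrips chapter and assumes that a <type>long</type> is as wide as a
pointer; <type>struct entry</type>, <function>lookup()</function> and
<function>lookup_value()</function> are hypothetical names used only
for illustration.
</para>

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Same definitions as the include/linux/fs.h excerpt in this guide. */
#define ERR_PTR(err)	((void *)((long)(err)))
#define PTR_ERR(ptr)	((long)(ptr))
#define IS_ERR(ptr)	((unsigned long)(ptr) > (unsigned long)(-1000))

/* Hypothetical table entry, for illustration only. */
struct entry {
	const char *name;
	int value;
};

static struct entry table[] = { { "alpha", 1 }, { "beta", 2 } };

/* Returns a valid pointer on success, or an encoded negative errno. */
static struct entry *lookup(const char *name)
{
	size_t i;

	if (!name)
		return ERR_PTR(-EINVAL);
	for (i = 0; i < sizeof(table) / sizeof(table[0]); i++)
		if (strcmp(table[i].name, name) == 0)
			return &table[i];
	return ERR_PTR(-ENOENT);
}

/* Returns 0 for success or a negative error number for failure. */
static int lookup_value(const char *name, int *out)
{
	struct entry *e = lookup(name);

	if (IS_ERR(e))
		return PTR_ERR(e);	/* pass the error number up unchanged */
	*out = e->value;
	return 0;
}
```

<para>
Note that the caller tests with <function>IS_ERR()</function> rather
than comparing against <constant>NULL</constant>: an encoded error is
never a null pointer.
</para>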
995 | </sect1> | ||
996 | |||
997 | <sect1 id="conventions-borkedcompile"> | ||
998 | <title>Breaking Compilation</title> | ||
999 | |||
1000 | <para> | ||
1001 | Linus and the other developers sometimes change function or | ||
1002 | structure names in development kernels; this is not done just to | ||
1003 | keep everyone on their toes: it reflects a fundamental change | ||
1004 | (eg. can no longer be called with interrupts on, or does extra | ||
1005 | checks, or doesn't do checks which were caught before). Usually | ||
1006 | this is accompanied by a fairly complete note to the linux-kernel | ||
1007 | mailing list; search the archive. Simply doing a global replace | ||
1008 | on the file usually makes things <emphasis>worse</emphasis>. | ||
1009 | </para> | ||
1010 | </sect1> | ||
1011 | |||
1012 | <sect1 id="conventions-initialising"> | ||
1013 | <title>Initializing structure members</title> | ||
1014 | |||
1015 | <para> | ||
1016 | The preferred method of initializing structures is to use | ||
1017 | designated initializers, as defined by ISO C99, eg: | ||
1018 | </para> | ||
1019 | <programlisting> | ||
1020 | static struct block_device_operations opt_fops = { | ||
1021 | .open = opt_open, | ||
1022 | .release = opt_release, | ||
1023 | .ioctl = opt_ioctl, | ||
1024 | .check_media_change = opt_media_change, | ||
1025 | }; | ||
1026 | </programlisting> | ||
1027 | <para> | ||
1028 | This makes it easy to grep for, and makes it clear which | ||
1029 | structure fields are set. You should do this because it looks | ||
1030 | cool. | ||
1031 | </para> | ||
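<para>
One practical consequence is that any member you do not name is
zero-initialized, so an unset function pointer is simply
<constant>NULL</constant> and can be tested for. The following is a
self-contained sketch of that behaviour; <type>struct toy_ops</type>
and the <function>toy_*</function> functions are hypothetical and do
not correspond to any real kernel structure.
</para>

```c
#include <stddef.h>

/* Hypothetical operations table; not a real kernel structure. */
struct toy_ops {
	int (*open)(int id);
	int (*release)(int id);
	int (*ioctl)(int id, unsigned int cmd);
	int (*flush)(int id);
};

static int toy_open(int id)
{
	return id;
}

static int toy_release(int id)
{
	(void)id;
	return 0;
}

static struct toy_ops toy_fops = {
	.open		= toy_open,
	.release	= toy_release,
	/* .ioctl and .flush are not named, so they are NULL. */
};

/* Call the ioctl operation if one was provided. */
static int toy_do_ioctl(struct toy_ops *ops, int id, unsigned int cmd)
{
	if (!ops->ioctl)
		return -1;	/* operation not provided */
	return ops->ioctl(id, cmd);
}
```

<para>
Adding a member to the structure later, or reordering its members,
does not silently shift the existing initializers, which is the other
reason to prefer this form over positional initialization.
</para>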
1032 | </sect1> | ||
1033 | |||
1034 | <sect1 id="conventions-gnu-extns"> | ||
1035 | <title>GNU Extensions</title> | ||
1036 | |||
1037 | <para> | ||
1038 | GNU Extensions are explicitly allowed in the Linux kernel. | ||
1039 | Note that some of the more complex ones are not very well | ||
1040 | supported, due to lack of general use, but the following are | ||
1041 | considered standard (see the GCC info page section "C | ||
1042 | Extensions" for more details - Yes, really the info page, the | ||
1043 | man page is only a short summary of the stuff in info): | ||
1044 | </para> | ||
1045 | <itemizedlist> | ||
1046 | <listitem> | ||
1047 | <para> | ||
1048 | Inline functions | ||
1049 | </para> | ||
1050 | </listitem> | ||
1051 | <listitem> | ||
1052 | <para> | ||
1053 | Statement expressions (ie. the ({ and }) constructs). | ||
1054 | </para> | ||
1055 | </listitem> | ||
1056 | <listitem> | ||
1057 | <para> | ||
1058 | Declaring attributes of a function / variable / type | ||
1059 | (__attribute__) | ||
1060 | </para> | ||
1061 | </listitem> | ||
1062 | <listitem> | ||
1063 | <para> | ||
1064 | typeof | ||
1065 | </para> | ||
1066 | </listitem> | ||
1067 | <listitem> | ||
1068 | <para> | ||
1069 | Zero length arrays | ||
1070 | </para> | ||
1071 | </listitem> | ||
1072 | <listitem> | ||
1073 | <para> | ||
1074 | Macro varargs | ||
1075 | </para> | ||
1076 | </listitem> | ||
1077 | <listitem> | ||
1078 | <para> | ||
1079 | Arithmetic on void pointers | ||
1080 | </para> | ||
1081 | </listitem> | ||
1082 | <listitem> | ||
1083 | <para> | ||
1084 | Non-Constant initializers | ||
1085 | </para> | ||
1086 | </listitem> | ||
1087 | <listitem> | ||
1088 | <para> | ||
1089 | Assembler Instructions (not outside arch/ and include/asm/) | ||
1090 | </para> | ||
1091 | </listitem> | ||
1092 | <listitem> | ||
1093 | <para> | ||
1094 | Function names as strings (__FUNCTION__) | ||
1095 | </para> | ||
1096 | </listitem> | ||
1097 | <listitem> | ||
1098 | <para> | ||
1099 | __builtin_constant_p() | ||
1100 | </para> | ||
1101 | </listitem> | ||
1102 | </itemizedlist> | ||
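<para>
Statement expressions and typeof are commonly combined to write macros
that evaluate each argument exactly once. The sketch below shows the
idea with GCC or a compatible compiler; <function>min_once()</function>,
<function>min_naive()</function> and
<function>next_value()</function> are hypothetical names, and
<literal>__typeof__</literal> is used instead of
<literal>typeof</literal> only so that the example also compiles in
strict ISO modes.
</para>

```c
/* Each argument is evaluated once, inside a statement expression. */
#define min_once(x, y) ({		\
	__typeof__(x) _x = (x);		\
	__typeof__(y) _y = (y);		\
	_x < _y ? _x : _y; })

/* Naive version for comparison: the smaller argument is evaluated twice. */
#define min_naive(x, y) ((x) < (y) ? (x) : (y))

/* A function with a side effect, to make double evaluation visible. */
static int calls;

static int next_value(void)
{
	return ++calls;
}
```

<para>
With <varname>calls</varname> starting at zero,
<literal>min_once(next_value(), 10)</literal> calls the function once
and yields 1, while <literal>min_naive(next_value(), 10)</literal>
calls it twice and yields 2.
</para>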
1103 | |||
1104 | <para> | ||
1105 | Be wary when using long long in the kernel: the code gcc generates for | ||
1106 | it is horrible, and worse, division and multiplication do not work | ||
1107 | on i386 because the GCC runtime functions for them are missing from | ||
1108 | the kernel environment. | ||
1109 | </para> | ||
1110 | |||
1111 | <!-- FIXME: add a note about ANSI aliasing cleanness --> | ||
1112 | </sect1> | ||
1113 | |||
1114 | <sect1 id="conventions-cplusplus"> | ||
1115 | <title>C++</title> | ||
1116 | |||
1117 | <para> | ||
1118 | Using C++ in the kernel is usually a bad idea, because the | ||
1119 | kernel does not provide the necessary runtime environment | ||
1120 | and the include files are not tested for it. It is still | ||
1121 | possible, but not recommended. If you really want to do | ||
1122 | this, forget about exceptions at least. | ||
1123 | </para> | ||
1124 | </sect1> | ||
1125 | |||
1126 | <sect1 id="conventions-ifdef"> | ||
1127 | <title>#if</title> | ||
1128 | |||
1129 | <para> | ||
1130 | It is generally considered cleaner to use macros in header files | ||
1131 | (or at the top of .c files) to abstract away functions rather than | ||
1132 | using `#if' pre-processor statements throughout the source code. | ||
1133 | </para> | ||
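<para>
A minimal sketch of this convention follows: the conditional appears
once, near the top of the file, and the code below it calls the macro
unconditionally. <symbol>CONFIG_TOY_DEBUG</symbol>,
<function>toy_debug()</function> and <function>toy_sum()</function>
are hypothetical names chosen for the example, and userspace
<function>fprintf()</function> stands in for whatever the real
function would be.
</para>

```c
#include <stdio.h>

/* Hypothetical option, normally set by the configuration system. */
/* #define CONFIG_TOY_DEBUG 1 */

/* The only conditional in the file lives here... */
#ifdef CONFIG_TOY_DEBUG
#define toy_debug(fmt, ...) fprintf(stderr, "toy: " fmt, ##__VA_ARGS__)
#else
#define toy_debug(fmt, ...) do { } while (0)
#endif

/* ...so the code itself contains no #if at all. */
static int toy_sum(const int *v, int n)
{
	int i, total = 0;

	for (i = 0; i < n; i++) {
		toy_debug("adding %d\n", v[i]);
		total += v[i];
	}
	return total;
}
```

<para>
When the option is off, the macro expands to an empty statement and
the call sites cost nothing, yet they still read like ordinary
function calls.
</para>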
1134 | </sect1> | ||
1135 | </chapter> | ||
1136 | |||
1137 | <chapter id="submitting"> | ||
1138 | <title>Putting Your Stuff in the Kernel</title> | ||
1139 | |||
1140 | <para> | ||
1141 | In order to get your stuff into shape for official inclusion, or | ||
1142 | even to make a neat patch, there's administrative work to be | ||
1143 | done: | ||
1144 | </para> | ||
1145 | <itemizedlist> | ||
1146 | <listitem> | ||
1147 | <para> | ||
1148 | Figure out whose pond you've been pissing in. Look at the top of | ||
1149 | the source files, inside the <filename>MAINTAINERS</filename> | ||
1150 | file, and last of all in the <filename>CREDITS</filename> file. | ||
1151 | You should coordinate with this person to make sure you're not | ||
1152 | duplicating effort, or trying something that's already been | ||
1153 | rejected. | ||
1154 | </para> | ||
1155 | |||
1156 | <para> | ||
1157 | Make sure you put your name and email address at the top of | ||
1158 | any files you create or mangle significantly. This is the | ||
1159 | first place people will look when they find a bug, or when | ||
1160 | <emphasis>they</emphasis> want to make a change. | ||
1161 | </para> | ||
1162 | </listitem> | ||
1163 | |||
1164 | <listitem> | ||
1165 | <para> | ||
1166 | Usually you want a configuration option for your kernel hack. | ||
1167 | Edit <filename>Config.in</filename> in the appropriate directory | ||
1168 | (but under <filename>arch/</filename> it's called | ||
1169 | <filename>config.in</filename>). The Config Language used is not | ||
1170 | bash, even though it looks like bash; the safe way is to use only | ||
1171 | the constructs that you already see in | ||
1172 | <filename>Config.in</filename> files (see | ||
1173 | <filename>Documentation/kbuild/kconfig-language.txt</filename>). | ||
1174 | It's good to run "make xconfig" at least once to test (because | ||
1175 | it's the only one with a static parser). | ||
1176 | </para> | ||
1177 | |||
1178 | <para> | ||
1179 | Variables which can be Y or N use <type>bool</type> followed by a | ||
1180 | tagline and the config define name (which must start with | ||
1181 | CONFIG_). The <type>tristate</type> function is the same, but | ||
1182 | allows the answer M (which defines | ||
1183 | <symbol>CONFIG_FOO_MODULE</symbol> in your source, instead of | ||
1184 | <symbol>CONFIG_FOO</symbol>) if <symbol>CONFIG_MODULES</symbol> | ||
1185 | is enabled. | ||
1186 | </para> | ||
1187 | |||
1188 | <para> | ||
1189 | You may well want to make your CONFIG option only visible if | ||
1190 | <symbol>CONFIG_EXPERIMENTAL</symbol> is enabled: this serves as a | ||
1191 | warning to users. There are many other fancy things you can do: see | ||
1192 | the various <filename>Config.in</filename> files for ideas. | ||
1193 | </para> | ||
1194 | </listitem> | ||
1195 | |||
1196 | <listitem> | ||
1197 | <para> | ||
1198 | Edit the <filename>Makefile</filename>: the CONFIG variables are | ||
1199 | exported here so you can conditionalize compilation with `ifeq'. | ||
1200 | If your file exports symbols then add the names to | ||
1201 | <varname>export-objs</varname> so that genksyms will find them. | ||
1202 | <caution> | ||
1203 | <para> | ||
1204 | There is a restriction on the kernel build system that objects | ||
1205 | which export symbols must have globally unique names. | ||
1206 | If your object does not have a globally unique name then the | ||
1207 | standard fix is to move the | ||
1208 | <function>EXPORT_SYMBOL()</function> statements to their own | ||
1209 | object with a unique name. | ||
1210 | This is why several systems have separate exporting objects, | ||
1211 | usually suffixed with ksyms. | ||
1212 | </para> | ||
1213 | </caution> | ||
1214 | </para> | ||
1215 | </listitem> | ||
1216 | |||
1217 | <listitem> | ||
1218 | <para> | ||
1219 | Document your option in Documentation/Configure.help. Mention | ||
1220 | incompatibilities and issues here. <emphasis> Definitely | ||
1221 | </emphasis> end your description with <quote> if in doubt, say N | ||
1222 | </quote> (or, occasionally, `Y'); this is for people who have no | ||
1223 | idea what you are talking about. | ||
1224 | </para> | ||
1225 | </listitem> | ||
1226 | |||
1227 | <listitem> | ||
1228 | <para> | ||
1229 | Put yourself in <filename>CREDITS</filename> if you've done | ||
1230 | something noteworthy, usually beyond a single file (your name | ||
1231 | should be at the top of the source files anyway). | ||
1232 | <filename>MAINTAINERS</filename> means you want to be consulted | ||
1233 | when changes are made to a subsystem, and hear about bugs; it | ||
1234 | implies a more-than-passing commitment to some part of the code. | ||
1235 | </para> | ||
1236 | </listitem> | ||
1237 | |||
1238 | <listitem> | ||
1239 | <para> | ||
1240 | Finally, don't forget to read <filename>Documentation/SubmittingPatches</filename> | ||
1241 | and possibly <filename>Documentation/SubmittingDrivers</filename>. | ||
1242 | </para> | ||
1243 | </listitem> | ||
1244 | </itemizedlist> | ||
1245 | </chapter> | ||
1246 | |||
1247 | <chapter id="cantrips"> | ||
1248 | <title>Kernel Cantrips</title> | ||
1249 | |||
1250 | <para> | ||
1251 | Some favorites from browsing the source. Feel free to add to this | ||
1252 | list. | ||
1253 | </para> | ||
1254 | |||
1255 | <para> | ||
1256 | <filename>include/linux/brlock.h</filename>: | ||
1257 | </para> | ||
1258 | <programlisting> | ||
1259 | extern inline void br_read_lock (enum brlock_indices idx) | ||
1260 | { | ||
1261 | /* | ||
1262 | * This causes a link-time bug message if an | ||
1263 | * invalid index is used: | ||
1264 | */ | ||
1265 | if (idx >= __BR_END) | ||
1266 | __br_lock_usage_bug(); | ||
1267 | |||
1268 | read_lock(&__brlock_array[smp_processor_id()][idx]); | ||
1269 | } | ||
1270 | </programlisting> | ||
1271 | |||
1272 | <para> | ||
1273 | <filename>include/linux/fs.h</filename>: | ||
1274 | </para> | ||
1275 | <programlisting> | ||
1276 | /* | ||
1277 | * Kernel pointers have redundant information, so we can use a | ||
1278 | * scheme where we can return either an error code or a dentry | ||
1279 | * pointer with the same return value. | ||
1280 | * | ||
1281 | * This should be a per-architecture thing, to allow different | ||
1282 | * error and pointer decisions. | ||
1283 | */ | ||
1284 | #define ERR_PTR(err) ((void *)((long)(err))) | ||
1285 | #define PTR_ERR(ptr) ((long)(ptr)) | ||
1286 | #define IS_ERR(ptr) ((unsigned long)(ptr) > (unsigned long)(-1000)) | ||
1287 | </programlisting> | ||
1288 | |||
1289 | <para> | ||
1290 | <filename>include/asm-i386/uaccess.h</filename>: | ||
1291 | </para> | ||
1292 | |||
1293 | <programlisting> | ||
1294 | #define copy_to_user(to,from,n) \ | ||
1295 | (__builtin_constant_p(n) ? \ | ||
1296 | __constant_copy_to_user((to),(from),(n)) : \ | ||
1297 | __generic_copy_to_user((to),(from),(n))) | ||
1298 | </programlisting> | ||
1299 | |||
1300 | <para> | ||
1301 | <filename>arch/sparc/kernel/head.S</filename>: | ||
1302 | </para> | ||
1303 | |||
1304 | <programlisting> | ||
1305 | /* | ||
1306 | * Sun people can't spell worth damn. "compatability" indeed. | ||
1307 | * At least we *know* we can't spell, and use a spell-checker. | ||
1308 | */ | ||
1309 | |||
1310 | /* Uh, actually Linus it is I who cannot spell. Too much murky | ||
1311 | * Sparc assembly will do this to ya. | ||
1312 | */ | ||
1313 | C_LABEL(cputypvar): | ||
1314 | .asciz "compatability" | ||
1315 | |||
1316 | /* Tested on SS-5, SS-10. Probably someone at Sun applied a spell-checker. */ | ||
1317 | .align 4 | ||
1318 | C_LABEL(cputypvar_sun4m): | ||
1319 | .asciz "compatible" | ||
1320 | </programlisting> | ||
1321 | |||
1322 | <para> | ||
1323 | <filename>arch/sparc/lib/checksum.S</filename>: | ||
1324 | </para> | ||
1325 | |||
1326 | <programlisting> | ||
1327 | /* Sun, you just can't beat me, you just can't. Stop trying, | ||
1328 | * give up. I'm serious, I am going to kick the living shit | ||
1329 | * out of you, game over, lights out. | ||
1330 | */ | ||
1331 | </programlisting> | ||
1332 | </chapter> | ||
1333 | |||
1334 | <chapter id="credits"> | ||
1335 | <title>Thanks</title> | ||
1336 | |||
1337 | <para> | ||
1338 | Thanks to Andi Kleen for the idea, answering my questions, fixing | ||
1339 | my mistakes, filling content, etc. Philipp Rumpf for more spelling | ||
1340 | and clarity fixes, and some excellent non-obvious points. Werner | ||
1341 | Almesberger for giving me a great summary of | ||
1342 | <function>disable_irq()</function>, and Jes Sorensen and Andrea | ||
1343 | Arcangeli added caveats. Michael Elizabeth Chastain for checking | ||
1344 | and adding to the Configure section. <!-- Rusty insisted on this | ||
1345 | bit; I didn't do it! --> Telsa Gwynne for teaching me DocBook. | ||
1346 | </para> | ||
1347 | </chapter> | ||
1348 | </book> | ||
1349 | |||