Diffstat (limited to 'Documentation')
172 files changed, 12649 insertions, 6775 deletions
diff --git a/Documentation/00-INDEX b/Documentation/00-INDEX index 5f7f7d7f77d2..02457ec9c94f 100644 --- a/Documentation/00-INDEX +++ b/Documentation/00-INDEX | |||
@@ -184,6 +184,8 @@ mtrr.txt | |||
184 | - how to use PPro Memory Type Range Registers to increase performance. | 184 | - how to use PPro Memory Type Range Registers to increase performance. |
185 | nbd.txt | 185 | nbd.txt |
186 | - info on a TCP implementation of a network block device. | 186 | - info on a TCP implementation of a network block device. |
187 | netlabel/ | ||
188 | - directory with information on the NetLabel subsystem. | ||
187 | networking/ | 189 | networking/ |
188 | - directory with info on various aspects of networking with Linux. | 190 | - directory with info on various aspects of networking with Linux. |
189 | nfsroot.txt | 191 | nfsroot.txt |
diff --git a/Documentation/ABI/obsolete/devfs b/Documentation/ABI/removed/devfs index b8b87399bc8f..8195c4e0d0a1 100644 --- a/Documentation/ABI/obsolete/devfs +++ b/Documentation/ABI/removed/devfs | |||
@@ -1,13 +1,12 @@ | |||
1 | What: devfs | 1 | What: devfs |
2 | Date: July 2005 | 2 | Date: July 2005 (scheduled), finally removed in kernel v2.6.18 |
3 | Contact: Greg Kroah-Hartman <gregkh@suse.de> | 3 | Contact: Greg Kroah-Hartman <gregkh@suse.de> |
4 | Description: | 4 | Description: |
5 | devfs has been unmaintained for a number of years, has unfixable | 5 | devfs has been unmaintained for a number of years, has unfixable |
6 | races, contains a naming policy within the kernel that is | 6 | races, contains a naming policy within the kernel that is |
7 | against the LSB, and can be replaced by using udev. | 7 | against the LSB, and can be replaced by using udev. |
8 | The files fs/devfs/*, include/linux/devfs_fs*.h will be removed, | 8 | The files fs/devfs/*, include/linux/devfs_fs*.h were removed, |
9 | along with the assorted devfs function calls throughout the | 9 | along with the assorted devfs function calls throughout the |
10 | kernel tree. | 10 | kernel tree. |
11 | 11 | ||
12 | Users: | 12 | Users: |
13 | |||
diff --git a/Documentation/ABI/testing/sysfs-power b/Documentation/ABI/testing/sysfs-power new file mode 100644 index 000000000000..d882f8093871 --- /dev/null +++ b/Documentation/ABI/testing/sysfs-power | |||
@@ -0,0 +1,88 @@ | |||
1 | What: /sys/power/ | ||
2 | Date: August 2006 | ||
3 | Contact: Rafael J. Wysocki <rjw@sisk.pl> | ||
4 | Description: | ||
5 | The /sys/power directory will contain files that will | ||
6 | provide a unified interface to the power management | ||
7 | subsystem. | ||
8 | |||
9 | What: /sys/power/state | ||
10 | Date: August 2006 | ||
11 | Contact: Rafael J. Wysocki <rjw@sisk.pl> | ||
12 | Description: | ||
13 | The /sys/power/state file controls the system power state. | ||
14 | Reading from this file returns what states are supported, | ||
15 | which is hard-coded to 'standby' (Power-On Suspend), 'mem' | ||
16 | (Suspend-to-RAM), and 'disk' (Suspend-to-Disk). | ||
17 | |||
18 | Writing one of these strings to this file causes the system to | |
19 | transition into that state. Please see the file | ||
20 | Documentation/power/states.txt for a description of each of | ||
21 | these states. | ||
22 | |||
23 | What: /sys/power/disk | ||
24 | Date: August 2006 | ||
25 | Contact: Rafael J. Wysocki <rjw@sisk.pl> | ||
26 | Description: | ||
27 | The /sys/power/disk file controls the operating mode of the | ||
28 | suspend-to-disk mechanism. Reading from this file returns | ||
29 | the name of the method by which the system will be put to | ||
30 | sleep on the next suspend. There are four methods supported: | ||
31 | 'firmware' - means that the memory image will be saved to disk | ||
32 | by some firmware, in which case we also assume that the | ||
33 | firmware will handle the system suspend. | ||
34 | 'platform' - the memory image will be saved by the kernel and | ||
35 | the system will be put to sleep by the platform driver (e.g. | ||
36 | ACPI or other PM registers). | ||
37 | 'shutdown' - the memory image will be saved by the kernel and | ||
38 | the system will be powered off. | ||
39 | 'reboot' - the memory image will be saved by the kernel and | ||
40 | the system will be rebooted. | ||
41 | |||
42 | The suspend-to-disk method may be chosen by writing to this | ||
43 | file one of the accepted strings: | ||
44 | |||
45 | 'firmware' | ||
46 | 'platform' | ||
47 | 'shutdown' | ||
48 | 'reboot' | ||
49 | |||
50 | It will only change to 'firmware' or 'platform' if the system | ||
51 | supports that. | ||
52 | |||
53 | What: /sys/power/image_size | ||
54 | Date: August 2006 | ||
55 | Contact: Rafael J. Wysocki <rjw@sisk.pl> | ||
56 | Description: | ||
57 | The /sys/power/image_size file controls the size of the image | ||
58 | created by the suspend-to-disk mechanism. It can be written a | ||
59 | string representing a non-negative integer that will be used | ||
60 | as an upper limit of the image size, in bytes. The kernel's | ||
61 | suspend-to-disk code will do its best to ensure the image size | ||
62 | will not exceed this number. However, if it turns out to be | ||
63 | impossible, the kernel will try to suspend anyway using the | ||
64 | smallest image possible. In particular, if "0" is written to | ||
65 | this file, the suspend image will be as small as possible. | ||
66 | |||
67 | Reading from this file will display the current image size | ||
68 | limit, which is set to 500 MB by default. | ||
69 | |||
70 | What: /sys/power/pm_trace | ||
71 | Date: August 2006 | ||
72 | Contact: Rafael J. Wysocki <rjw@sisk.pl> | ||
73 | Description: | ||
74 | The /sys/power/pm_trace file controls the code which saves the | ||
75 | last PM event point in the RTC across reboots, so that you can | ||
76 | debug a machine that just hangs during suspend (or more | ||
77 | commonly, during resume). Namely, the RTC is only used to save | ||
78 | the last PM event point if this file contains '1'. Initially | ||
79 | it contains '0' which may be changed to '1' by writing a | ||
80 | string representing a nonzero integer into it. | ||
81 | |||
82 | To use this debugging feature you should attempt to suspend | ||
83 | the machine, then reboot it and run | ||
84 | |||
85 | dmesg -s 1000000 | grep 'hash matches' | ||
86 | |||
87 | CAUTION: Using it will cause your machine's real-time (CMOS) | ||
88 | clock to be set to a random invalid time after a resume. | ||
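
A minimal sketch of how a userspace tool might check the list read from /sys/power/state before writing a state name back. The helper name pm_state_supported() is hypothetical and not part of any kernel or library interface; it only assumes that the file content is a whitespace-separated list such as "standby mem disk".

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper: return true if `state` appears as a whole word in
 * `supported`, a whitespace-separated list such as the text read from
 * /sys/power/state (for example "standby mem disk\n").
 */
bool pm_state_supported(const char *supported, const char *state)
{
	const char *p = supported;
	size_t len;

	assert(supported && state);
	len = strlen(state);
	if (len == 0)
		return false;

	while (*p) {
		size_t word;

		/* Skip separators in front of the next word. */
		p += strspn(p, " \t\n");
		/* Measure the next word and compare it as a whole. */
		word = strcspn(p, " \t\n");
		if (word == len && strncmp(p, state, len) == 0)
			return true;
		p += word;
	}
	return false;
}
```

A caller would read the file into a buffer, pass it as `supported`, and only write the chosen string back to /sys/power/state when the helper returns true.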
diff --git a/Documentation/Changes b/Documentation/Changes index b02f476c2973..abee7f58c1ed 100644 --- a/Documentation/Changes +++ b/Documentation/Changes | |||
@@ -37,15 +37,14 @@ o e2fsprogs 1.29 # tune2fs | |||
37 | o jfsutils 1.1.3 # fsck.jfs -V | 37 | o jfsutils 1.1.3 # fsck.jfs -V |
38 | o reiserfsprogs 3.6.3 # reiserfsck -V 2>&1|grep reiserfsprogs | 38 | o reiserfsprogs 3.6.3 # reiserfsck -V 2>&1|grep reiserfsprogs |
39 | o xfsprogs 2.6.0 # xfs_db -V | 39 | o xfsprogs 2.6.0 # xfs_db -V |
40 | o pcmciautils 004 | 40 | o pcmciautils 004 # pccardctl -V |
41 | o pcmcia-cs 3.1.21 # cardmgr -V | ||
42 | o quota-tools 3.09 # quota -V | 41 | o quota-tools 3.09 # quota -V |
43 | o PPP 2.4.0 # pppd --version | 42 | o PPP 2.4.0 # pppd --version |
44 | o isdn4k-utils 3.1pre1 # isdnctrl 2>&1|grep version | 43 | o isdn4k-utils 3.1pre1 # isdnctrl 2>&1|grep version |
45 | o nfs-utils 1.0.5 # showmount --version | 44 | o nfs-utils 1.0.5 # showmount --version |
46 | o procps 3.2.0 # ps --version | 45 | o procps 3.2.0 # ps --version |
47 | o oprofile 0.9 # oprofiled --version | 46 | o oprofile 0.9 # oprofiled --version |
48 | o udev 071 # udevinfo -V | 47 | o udev 081 # udevinfo -V |
49 | 48 | ||
50 | Kernel compilation | 49 | Kernel compilation |
51 | ================== | 50 | ================== |
@@ -181,8 +180,8 @@ Intel IA32 microcode | |||
181 | -------------------- | 180 | -------------------- |
182 | 181 | ||
183 | A driver has been added to allow updating of Intel IA32 microcode, | 182 | A driver has been added to allow updating of Intel IA32 microcode, |
184 | accessible as both a devfs regular file and as a normal (misc) | 183 | accessible as a normal (misc) character device. If you are not using |
185 | character device. If you are not using devfs you may need to: | 184 | udev you may need to: |
186 | 185 | ||
187 | mkdir /dev/cpu | 186 | mkdir /dev/cpu |
188 | mknod /dev/cpu/microcode c 10 184 | 187 | mknod /dev/cpu/microcode c 10 184 |
@@ -201,7 +200,9 @@ with programs using shared memory. | |||
201 | udev | 200 | udev |
202 | ---- | 201 | ---- |
203 | udev is a userspace application for populating /dev dynamically with | 202 | udev is a userspace application for populating /dev dynamically with |
204 | only entries for devices actually present. udev replaces devfs. | 203 | only entries for devices actually present. udev replaces the basic |
204 | functionality of devfs, while allowing persistent device naming for | |
205 | devices. | ||
205 | 206 | ||
206 | FUSE | 207 | FUSE |
207 | ---- | 208 | ---- |
@@ -231,18 +232,13 @@ The PPP driver has been restructured to support multilink and to | |||
231 | enable it to operate over diverse media layers. If you use PPP, | 232 | enable it to operate over diverse media layers. If you use PPP, |
232 | upgrade pppd to at least 2.4.0. | 233 | upgrade pppd to at least 2.4.0. |
233 | 234 | ||
234 | If you are not using devfs, you must have the device file /dev/ppp | 235 | If you are not using udev, you must have the device file /dev/ppp |
235 | which can be made by: | 236 | which can be made by: |
236 | 237 | ||
237 | mknod /dev/ppp c 108 0 | 238 | mknod /dev/ppp c 108 0 |
238 | 239 | ||
239 | as root. | 240 | as root. |
240 | 241 | ||
241 | If you use devfsd and build ppp support as modules, you will need | ||
242 | the following in your /etc/devfsd.conf file: | ||
243 | |||
244 | LOOKUP PPP MODLOAD | ||
245 | |||
246 | Isdn4k-utils | 242 | Isdn4k-utils |
247 | ------------ | 243 | ------------ |
248 | 244 | ||
@@ -271,7 +267,7 @@ active clients. | |||
271 | 267 | ||
272 | To enable this new functionality, you need to: | 268 | To enable this new functionality, you need to: |
273 | 269 | ||
274 | mount -t nfsd nfsd /proc/fs/nfs | 270 | mount -t nfsd nfsd /proc/fs/nfsd |
275 | 271 | ||
276 | before running exportfs or mountd. It is recommended that all NFS | 272 | before running exportfs or mountd. It is recommended that all NFS |
277 | services be protected from the internet-at-large by a firewall where | 273 | services be protected from the internet-at-large by a firewall where |
diff --git a/Documentation/CodingStyle b/Documentation/CodingStyle index 6d2412ec91ed..29c18966b050 100644 --- a/Documentation/CodingStyle +++ b/Documentation/CodingStyle | |||
@@ -532,6 +532,40 @@ appears outweighs the potential value of the hint that tells gcc to do | |||
532 | something it would have done anyway. | 532 | something it would have done anyway. |
533 | 533 | ||
534 | 534 | ||
535 | Chapter 16: Function return values and names | ||
536 | |||
537 | Functions can return values of many different kinds, and one of the | ||
538 | most common is a value indicating whether the function succeeded or | ||
539 | failed. Such a value can be represented as an error-code integer | ||
540 | (-Exxx = failure, 0 = success) or a "succeeded" boolean (0 = failure, | ||
541 | non-zero = success). | ||
542 | |||
543 | Mixing up these two sorts of representations is a fertile source of | ||
544 | difficult-to-find bugs. If the C language included a strong distinction | ||
545 | between integers and booleans then the compiler would find these mistakes | ||
546 | for us... but it doesn't. To help prevent such bugs, always follow this | ||
547 | convention: | ||
548 | |||
549 | If the name of a function is an action or an imperative command, | ||
550 | the function should return an error-code integer. If the name | ||
551 | is a predicate, the function should return a "succeeded" boolean. | ||
552 | |||
553 | For example, "add work" is a command, and the add_work() function returns 0 | ||
554 | for success or -EBUSY for failure. In the same way, "PCI device present" is | ||
555 | a predicate, and the pci_dev_present() function returns 1 if it succeeds in | ||
556 | finding a matching device or 0 if it doesn't. | ||
557 | |||
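
The naming convention above can be illustrated with a small userspace sketch. The work_queue type, the fixed slot count, and the work_present() predicate are assumptions made for illustration; this add_work() is a stand-in with the same return convention as the example in the text, not the kernel function.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define QUEUE_SLOTS 2

/* Hypothetical fixed-size work queue, for illustration only. */
struct work_queue {
	int items[QUEUE_SLOTS];
	size_t count;
};

/* Command name: returns 0 on success or a negative error code. */
int add_work(struct work_queue *q, int item)
{
	assert(q);
	if (q->count == QUEUE_SLOTS)
		return -EBUSY;
	q->items[q->count++] = item;
	return 0;
}

/* Predicate name: returns true when the item is queued, false otherwise. */
bool work_present(const struct work_queue *q, int item)
{
	size_t i;

	assert(q);
	for (i = 0; i < q->count; i++)
		if (q->items[i] == item)
			return true;
	return false;
}
```

With this split, `if (add_work(&q, item))` reads as the failure path, while `if (work_present(&q, item))` reads as the success path, which is the distinction the convention is meant to keep visible in the name.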
558 | All EXPORTed functions must respect this convention, and so should all | ||
559 | public functions. Private (static) functions need not, but it is | ||
560 | recommended that they do. | ||
561 | |||
562 | Functions whose return value is the actual result of a computation, rather | ||
563 | than an indication of whether the computation succeeded, are not subject to | ||
564 | this rule. Generally they indicate failure by returning some out-of-range | ||
565 | result. Typical examples would be functions that return pointers; they use | ||
566 | NULL or the ERR_PTR mechanism to report failure. | ||
567 | |||
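
For pointer-returning functions, the idea of encoding an error in an out-of-range pointer value can be sketched in userspace as follows. The lowercase helpers err_ptr(), is_err() and ptr_err() are imitations written for this example, and MAX_ERR and find_slot() are assumptions; the kernel's own ERR_PTR helpers may differ in detail.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_ERR 4095	/* assumed size of the reserved error range */

/* Encode a negative error code as a pointer at the top of the address space. */
void *err_ptr(long error)
{
	assert(error < 0 && error >= -MAX_ERR);
	return (void *)(intptr_t)error;
}

/* True if the pointer value lies in the reserved error range. */
bool is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERR;
}

/* Recover the negative error code from an encoded pointer. */
long ptr_err(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

static int table[4] = { 10, 20, 30, 40 };

/* Hypothetical lookup: returns a valid pointer or an encoded error. */
int *find_slot(int index)
{
	if (index < 0 || index >= 4)
		return err_ptr(-EINVAL);
	return &table[index];
}
```

The caller checks is_err() before dereferencing and uses ptr_err() to learn why the lookup failed, so a single return value carries either the result or the reason for failure.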
568 | |||
535 | 569 | ||
536 | Appendix I: References | 570 | Appendix I: References |
537 | 571 | ||
diff --git a/Documentation/DMA-mapping.txt b/Documentation/DMA-mapping.txt index 7c717699032c..63392c9132b4 100644 --- a/Documentation/DMA-mapping.txt +++ b/Documentation/DMA-mapping.txt | |||
@@ -698,12 +698,12 @@ these interfaces. Remember that, as defined, consistent mappings are | |||
698 | always going to be SAC addressable. | 698 | always going to be SAC addressable. |
699 | 699 | ||
700 | The first thing your driver needs to do is query the PCI platform | 700 | The first thing your driver needs to do is query the PCI platform |
701 | layer with your device's DAC addressing capabilities: | 701 | layer if it is capable of handling your device's DAC addressing |
702 | capabilities: | ||
702 | 703 | ||
703 | int pci_dac_set_dma_mask(struct pci_dev *pdev, u64 mask); | 704 | int pci_dac_dma_supported(struct pci_dev *hwdev, u64 mask); |
704 | 705 | ||
705 | This routine behaves identically to pci_set_dma_mask. You may not | 706 | You may not use the following interfaces if this routine fails. |
706 | use the following interfaces if this routine fails. | ||
707 | 707 | ||
708 | Next, DMA addresses using this API are kept track of using the | 708 | Next, DMA addresses using this API are kept track of using the |
709 | dma64_addr_t type. It is guaranteed to be big enough to hold any | 709 | dma64_addr_t type. It is guaranteed to be big enough to hold any |
diff --git a/Documentation/DocBook/Makefile b/Documentation/DocBook/Makefile index 5a2882d275ba..66e1cf733571 100644 --- a/Documentation/DocBook/Makefile +++ b/Documentation/DocBook/Makefile | |||
@@ -10,7 +10,8 @@ DOCBOOKS := wanbook.xml z8530book.xml mcabook.xml videobook.xml \ | |||
10 | kernel-hacking.xml kernel-locking.xml deviceiobook.xml \ | 10 | kernel-hacking.xml kernel-locking.xml deviceiobook.xml \ |
11 | procfs-guide.xml writing_usb_driver.xml \ | 11 | procfs-guide.xml writing_usb_driver.xml \ |
12 | kernel-api.xml journal-api.xml lsm.xml usb.xml \ | 12 | kernel-api.xml journal-api.xml lsm.xml usb.xml \ |
13 | gadget.xml libata.xml mtdnand.xml librs.xml rapidio.xml | 13 | gadget.xml libata.xml mtdnand.xml librs.xml rapidio.xml \ |
14 | genericirq.xml | ||
14 | 15 | ||
15 | ### | 16 | ### |
16 | # The build process is as follows (targets): | 17 | # The build process is as follows (targets): |
diff --git a/Documentation/DocBook/genericirq.tmpl b/Documentation/DocBook/genericirq.tmpl new file mode 100644 index 000000000000..0f4a4b6321e4 --- /dev/null +++ b/Documentation/DocBook/genericirq.tmpl | |||
@@ -0,0 +1,474 @@ | |||
1 | <?xml version="1.0" encoding="UTF-8"?> | ||
2 | <!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN" | ||
3 | "http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd" []> | ||
4 | |||
5 | <book id="Generic-IRQ-Guide"> | ||
6 | <bookinfo> | ||
7 | <title>Linux generic IRQ handling</title> | ||
8 | |||
9 | <authorgroup> | ||
10 | <author> | ||
11 | <firstname>Thomas</firstname> | ||
12 | <surname>Gleixner</surname> | ||
13 | <affiliation> | ||
14 | <address> | ||
15 | <email>tglx@linutronix.de</email> | ||
16 | </address> | ||
17 | </affiliation> | ||
18 | </author> | ||
19 | <author> | ||
20 | <firstname>Ingo</firstname> | ||
21 | <surname>Molnar</surname> | ||
22 | <affiliation> | ||
23 | <address> | ||
24 | <email>mingo@elte.hu</email> | ||
25 | </address> | ||
26 | </affiliation> | ||
27 | </author> | ||
28 | </authorgroup> | ||
29 | |||
30 | <copyright> | ||
31 | <year>2005-2006</year> | ||
32 | <holder>Thomas Gleixner</holder> | ||
33 | </copyright> | ||
34 | <copyright> | ||
35 | <year>2005-2006</year> | ||
36 | <holder>Ingo Molnar</holder> | ||
37 | </copyright> | ||
38 | |||
39 | <legalnotice> | ||
40 | <para> | ||
41 | This documentation is free software; you can redistribute | ||
42 | it and/or modify it under the terms of the GNU General Public | ||
43 | License version 2 as published by the Free Software Foundation. | ||
44 | </para> | ||
45 | |||
46 | <para> | ||
47 | This program is distributed in the hope that it will be | ||
48 | useful, but WITHOUT ANY WARRANTY; without even the implied | ||
49 | warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. | ||
50 | See the GNU General Public License for more details. | ||
51 | </para> | ||
52 | |||
53 | <para> | ||
54 | You should have received a copy of the GNU General Public | ||
55 | License along with this program; if not, write to the Free | ||
56 | Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, | ||
57 | MA 02111-1307 USA | ||
58 | </para> | ||
59 | |||
60 | <para> | ||
61 | For more details see the file COPYING in the source | ||
62 | distribution of Linux. | ||
63 | </para> | ||
64 | </legalnotice> | ||
65 | </bookinfo> | ||
66 | |||
67 | <toc></toc> | ||
68 | |||
69 | <chapter id="intro"> | ||
70 | <title>Introduction</title> | ||
71 | <para> | ||
72 | The generic interrupt handling layer is designed to provide a | ||
73 | complete abstraction of interrupt handling for device drivers. | ||
74 | It is able to handle all the different types of interrupt controller | ||
75 | hardware. Device drivers use generic API functions to request, enable, | ||
76 | disable and free interrupts. The drivers do not have to know anything | ||
77 | about interrupt hardware details, so they can be used on different | ||
78 | platforms without code changes. | ||
79 | </para> | ||
80 | <para> | ||
81 | This documentation is provided to developers who want to implement | ||
82 | an interrupt subsystem for their architecture, with the help | |
83 | of the generic IRQ handling layer. | ||
84 | </para> | ||
85 | </chapter> | ||
86 | |||
87 | <chapter id="rationale"> | ||
88 | <title>Rationale</title> | ||
89 | <para> | ||
90 | The original implementation of interrupt handling in Linux uses | |
91 | the __do_IRQ() super-handler, which is able to deal with every | ||
92 | type of interrupt logic. | ||
93 | </para> | ||
94 | <para> | ||
95 | Originally, Russell King identified different types of handlers to | ||
96 | build a quite universal set for the ARM interrupt handler | ||
97 | implementation in Linux 2.5/2.6. He distinguished between: | ||
98 | <itemizedlist> | ||
99 | <listitem><para>Level type</para></listitem> | ||
100 | <listitem><para>Edge type</para></listitem> | ||
101 | <listitem><para>Simple type</para></listitem> | ||
102 | </itemizedlist> | ||
103 | In the SMP world of the __do_IRQ() super-handler another type | ||
104 | was identified: | ||
105 | <itemizedlist> | ||
106 | <listitem><para>Per CPU type</para></listitem> | ||
107 | </itemizedlist> | ||
108 | </para> | ||
109 | <para> | ||
110 | This split implementation of highlevel IRQ handlers allows us to | ||
111 | optimize the flow of the interrupt handling for each specific | ||
112 | interrupt type. This reduces complexity in that particular codepath | ||
113 | and allows the optimized handling of a given type. | ||
114 | </para> | ||
115 | <para> | ||
116 | The original general IRQ implementation used hw_interrupt_type | ||
117 | structures and their ->ack(), ->end() [etc.] callbacks to | ||
118 | differentiate the flow control in the super-handler. This leads to | ||
119 | a mix of flow logic and lowlevel hardware logic, and it also leads | ||
120 | to unnecessary code duplication: for example in i386, there is an | |
121 | ioapic_level_irq and an ioapic_edge_irq irq-type which share many | |
122 | of the lowlevel details but have different flow handling. | ||
123 | </para> | ||
124 | <para> | ||
125 | A more natural abstraction is the clean separation of the | ||
126 | 'irq flow' and the 'chip details'. | ||
127 | </para> | ||
128 | <para> | ||
129 | Analysing the IRQ subsystem implementations of a couple of architectures | |
130 | reveals that most of them can use a generic set of 'irq flow' | ||
131 | methods and only need to add the chip level specific code. | ||
132 | The separation is also valuable for (sub)architectures | ||
133 | which need specific quirks in the irq flow itself but not in the | ||
134 | chip-details - and thus provides a more transparent IRQ subsystem | ||
135 | design. | ||
136 | </para> | ||
137 | <para> | ||
138 | Each interrupt descriptor is assigned its own highlevel flow | ||
139 | handler, which is normally one of the generic | ||
140 | implementations. (This highlevel flow handler implementation also | ||
141 | makes it simple to provide demultiplexing handlers which can be | ||
142 | found in embedded platforms on various architectures.) | ||
143 | </para> | ||
144 | <para> | ||
145 | The separation makes the generic interrupt handling layer more | ||
146 | flexible and extensible. For example, a (sub)architecture can | |
147 | use a generic irq-flow implementation for 'level type' interrupts | ||
148 | and add a (sub)architecture specific 'edge type' implementation. | ||
149 | </para> | ||
150 | <para> | ||
151 | To make the transition to the new model easier and prevent the | ||
152 | breakage of existing implementations, the __do_IRQ() super-handler | ||
153 | is still available. This leads to a kind of duality for the time | ||
154 | being. Over time the new model should be used in more and more | ||
155 | architectures, as it enables smaller and cleaner IRQ subsystems. | ||
156 | </para> | ||
157 | </chapter> | ||
158 | <chapter id="bugs"> | ||
159 | <title>Known Bugs And Assumptions</title> | ||
160 | <para> | ||
161 | None (knock on wood). | ||
162 | </para> | ||
163 | </chapter> | ||
164 | |||
165 | <chapter id="Abstraction"> | ||
166 | <title>Abstraction layers</title> | ||
167 | <para> | ||
168 | There are three main levels of abstraction in the interrupt code: | ||
169 | <orderedlist> | ||
170 | <listitem><para>Highlevel driver API</para></listitem> | ||
171 | <listitem><para>Highlevel IRQ flow handlers</para></listitem> | ||
172 | <listitem><para>Chiplevel hardware encapsulation</para></listitem> | ||
173 | </orderedlist> | ||
174 | </para> | ||
175 | <sect1> | ||
176 | <title>Interrupt control flow</title> | ||
177 | <para> | ||
178 | Each interrupt is described by an interrupt descriptor structure | ||
179 | irq_desc. The interrupt is referenced by an 'unsigned int' numeric | ||
180 | value which selects the corresponding interrupt description structure | |
181 | in the descriptor structures array. | ||
182 | The descriptor structure contains status information and pointers | ||
183 | to the interrupt flow method and the interrupt chip structure | ||
184 | which are assigned to this interrupt. | ||
185 | </para> | ||
186 | <para> | ||
187 | Whenever an interrupt triggers, the lowlevel arch code calls into | ||
188 | the generic interrupt code by calling desc->handle_irq(). | ||
189 | This highlevel IRQ handling function only uses desc->chip primitives | ||
190 | referenced by the assigned chip descriptor structure. | ||
191 | </para> | ||
192 | </sect1> | ||
193 | <sect1> | ||
194 | <title>Highlevel Driver API</title> | ||
195 | <para> | ||
196 | The highlevel Driver API consists of the following functions: | |
197 | <itemizedlist> | ||
198 | <listitem><para>request_irq()</para></listitem> | ||
199 | <listitem><para>free_irq()</para></listitem> | ||
200 | <listitem><para>disable_irq()</para></listitem> | ||
201 | <listitem><para>enable_irq()</para></listitem> | ||
202 | <listitem><para>disable_irq_nosync() (SMP only)</para></listitem> | ||
203 | <listitem><para>synchronize_irq() (SMP only)</para></listitem> | ||
204 | <listitem><para>set_irq_type()</para></listitem> | ||
205 | <listitem><para>set_irq_wake()</para></listitem> | ||
206 | <listitem><para>set_irq_data()</para></listitem> | ||
207 | <listitem><para>set_irq_chip()</para></listitem> | ||
208 | <listitem><para>set_irq_chip_data()</para></listitem> | ||
209 | </itemizedlist> | ||
210 | See the autogenerated function documentation for details. | ||
211 | </para> | ||
212 | </sect1> | ||
213 | <sect1> | ||
214 | <title>Highlevel IRQ flow handlers</title> | ||
215 | <para> | ||
216 | The generic layer provides a set of pre-defined irq-flow methods: | ||
217 | <itemizedlist> | ||
218 | <listitem><para>handle_level_irq</para></listitem> | ||
219 | <listitem><para>handle_edge_irq</para></listitem> | ||
220 | <listitem><para>handle_simple_irq</para></listitem> | ||
221 | <listitem><para>handle_percpu_irq</para></listitem> | ||
222 | </itemizedlist> | ||
223 | The interrupt flow handlers (either predefined or architecture | ||
224 | specific) are assigned to specific interrupts by the architecture | ||
225 | either during bootup or during device initialization. | ||
226 | </para> | ||
227 | <sect2> | ||
228 | <title>Default flow implementations</title> | ||
229 | <sect3> | ||
230 | <title>Helper functions</title> | ||
231 | <para> | ||
232 | The helper functions call the chip primitives and | ||
233 | are used by the default flow implementations. | ||
234 | The following helper functions are implemented (simplified excerpt): | ||
235 | <programlisting> | ||
236 | default_enable(irq) | ||
237 | { | ||
238 | desc->chip->unmask(irq); | ||
239 | } | ||
240 | |||
241 | default_disable(irq) | ||
242 | { | ||
243 | if (!delay_disable(irq)) | ||
244 | desc->chip->mask(irq); | ||
245 | } | ||
246 | |||
247 | default_ack(irq) | ||
248 | { | ||
249 | chip->ack(irq); | ||
250 | } | ||
251 | |||
252 | default_mask_ack(irq) | ||
253 | { | ||
254 | if (chip->mask_ack) { | ||
255 | chip->mask_ack(irq); | ||
256 | } else { | ||
257 | chip->mask(irq); | ||
258 | chip->ack(irq); | ||
259 | } | ||
260 | } | ||
261 | |||
262 | noop(irq) | ||
263 | { | ||
264 | } | ||
265 | |||
266 | </programlisting> | ||
267 | </para> | ||
268 | </sect3> | ||
269 | </sect2> | ||
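
The mask_ack fallback in the helper excerpt above can be tried out in userspace with a sketch like the following. The irq_chip_sketch structure, the demo_* callbacks and the call log are assumptions for illustration; they only mimic the shape of the chip primitives and are not the kernel's definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for the chip descriptor; not the kernel's struct. */
struct irq_chip_sketch {
	void (*ack)(unsigned int irq);
	void (*mask)(unsigned int irq);
	void (*mask_ack)(unsigned int irq);	/* optional, may be NULL */
};

/* Call log so the order of primitive calls can be inspected. */
char call_log[16];
size_t call_len;

static void log_call(char c)
{
	if (call_len < sizeof(call_log) - 1)
		call_log[call_len++] = c;
	call_log[call_len] = '\0';
}

/* Demo primitives that only record that they were called. */
void demo_ack(unsigned int irq)      { (void)irq; log_call('a'); }
void demo_mask(unsigned int irq)     { (void)irq; log_call('m'); }
void demo_mask_ack(unsigned int irq) { (void)irq; log_call('M'); }

/* Use the combined primitive when present, else mask and then ack. */
void mask_ack_irq(const struct irq_chip_sketch *chip, unsigned int irq)
{
	assert(chip && chip->mask && chip->ack);
	if (chip->mask_ack) {
		chip->mask_ack(irq);
	} else {
		chip->mask(irq);
		chip->ack(irq);
	}
}
```

A chip that leaves mask_ack unset still works through the two-call fallback; providing the combined primitive simply saves one indirect call per interrupt.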
270 | <sect2> | ||
271 | <title>Default flow handler implementations</title> | ||
272 | <sect3> | ||
273 | <title>Default Level IRQ flow handler</title> | ||
274 | <para> | ||
275 | handle_level_irq provides a generic implementation | ||
276 | for level-triggered interrupts. | ||
277 | </para> | ||
278 | <para> | ||
279 | The following control flow is implemented (simplified excerpt): | ||
280 | <programlisting> | ||
281 | desc->chip->start(); | ||
282 | handle_IRQ_event(desc->action); | ||
283 | desc->chip->end(); | ||
284 | </programlisting> | ||
285 | </para> | ||
286 | </sect3> | ||
287 | <sect3> | ||
288 | <title>Default Edge IRQ flow handler</title> | ||
289 | <para> | ||
290 | handle_edge_irq provides a generic implementation | ||
291 | for edge-triggered interrupts. | ||
292 | </para> | ||
293 | <para> | ||
294 | The following control flow is implemented (simplified excerpt): | ||
295 | <programlisting> | ||
296 | if (desc->status & running) { | ||
297 | desc->chip->hold(); | ||
298 | desc->status |= pending | masked; | ||
299 | return; | ||
300 | } | ||
301 | desc->chip->start(); | ||
302 | desc->status |= running; | ||
303 | do { | ||
304 | if (desc->status & masked) | ||
305 | desc->chip->enable(); | ||
306 | desc->status &= ~pending; | |
307 | handle_IRQ_event(desc->action); | |
308 | } while (desc->status & pending); | |
309 | desc->status &= ~running; | |
310 | desc->chip->end(); | ||
311 | </programlisting> | ||
312 | </para> | ||
313 | </sect3> | ||
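
The running/pending bookkeeping of the edge flow can be exercised with a single-threaded userspace model. Everything here is an assumption for illustration: the ST_* flags, the edge_desc structure, and the retrigger counter, which injects a new edge from inside the action to imitate an interrupt arriving while the handler is still running.

```c
#include <assert.h>

#define ST_RUNNING 0x1u
#define ST_PENDING 0x2u
#define ST_MASKED  0x4u

struct edge_desc {
	unsigned int status;
	int handled;		/* number of times the action ran */
	int retrigger;		/* edges to inject while the action runs */
};

void edge_flow(struct edge_desc *desc);

/* Stand-in for handle_IRQ_event(): runs the device action. */
static void run_action(struct edge_desc *desc)
{
	desc->handled++;
	if (desc->retrigger > 0) {
		desc->retrigger--;
		edge_flow(desc);	/* a new edge arrives mid-handler */
	}
}

void edge_flow(struct edge_desc *desc)
{
	assert(desc);
	if (desc->status & ST_RUNNING) {
		/* Already being handled: note the edge as pending and masked. */
		desc->status |= ST_PENDING | ST_MASKED;
		return;
	}
	desc->status |= ST_RUNNING;
	do {
		if (desc->status & ST_MASKED)
			desc->status &= ~ST_MASKED;	/* unmask again */
		desc->status &= ~ST_PENDING;
		run_action(desc);
	} while (desc->status & ST_PENDING);
	desc->status &= ~ST_RUNNING;
}
```

Each edge that arrives during handling is folded into one more pass of the loop instead of a nested handler run, so no edge is lost and the action never runs re-entrantly.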
314 | <sect3> | ||
315 | <title>Default simple IRQ flow handler</title> | ||
316 | <para> | ||
317 | handle_simple_irq provides a generic implementation | ||
318 | for simple interrupts. | ||
319 | </para> | ||
320 | <para> | ||
321 | Note: The simple flow handler does not call any | ||
322 | handler/chip primitives. | ||
323 | </para> | ||
324 | <para> | ||
325 | The following control flow is implemented (simplified excerpt): | ||
326 | <programlisting> | ||
327 | handle_IRQ_event(desc->action); | ||
328 | </programlisting> | ||
329 | </para> | ||
330 | </sect3> | ||
331 | <sect3> | ||
332 | <title>Default per CPU flow handler</title> | ||
333 | <para> | ||
334 | handle_percpu_irq provides a generic implementation | ||
335 | for per CPU interrupts. | ||
336 | </para> | ||
337 | <para> | ||
338 | Per CPU interrupts are only available on SMP and | ||
339 | the handler provides a simplified version without | ||
340 | locking. | ||
341 | </para> | ||
342 | <para> | ||
343 | The following control flow is implemented (simplified excerpt): | ||
344 | <programlisting> | ||
345 | desc->chip->start(); | ||
346 | handle_IRQ_event(desc->action); | ||
347 | desc->chip->end(); | ||
348 | </programlisting> | ||
349 | </para> | ||
350 | </sect3> | ||
351 | </sect2> | ||
352 | <sect2> | ||
353 | <title>Quirks and optimizations</title> | ||
354 | <para> | ||
355 | The generic functions are intended for 'clean' architectures and chips, | ||
356 | which have no platform-specific IRQ handling quirks. If an architecture | ||
357 | needs to implement quirks on the 'flow' level then it can do so by | ||
358 | overriding the highlevel irq-flow handler. | ||
359 | </para> | ||
360 | </sect2> | ||
361 | <sect2> | ||
362 | <title>Delayed interrupt disable</title> | ||
363 | <para> | ||
364 | This per interrupt selectable feature, which was introduced by Russell | ||
365 | King in the ARM interrupt implementation, does not mask an interrupt | ||
366 | at the hardware level when disable_irq() is called. The interrupt is | ||
367 | kept enabled and is masked in the flow handler when an interrupt event | ||
368 | happens. This prevents losing edge interrupts on hardware which does | ||
369 | not store an edge interrupt event while the interrupt is disabled at | ||
370 | the hardware level. When an interrupt arrives while the IRQ_DISABLED | ||
371 | flag is set, then the interrupt is masked at the hardware level and | ||
372 | the IRQ_PENDING bit is set. When the interrupt is re-enabled by | ||
373 | enable_irq() the pending bit is checked and if it is set, the | ||
374 | interrupt is resent either via hardware or by a software resend | ||
375 | mechanism. (It's necessary to enable CONFIG_HARDIRQS_SW_RESEND when | ||
376 | you want to use the delayed interrupt disable feature and your | ||
377 | hardware is not capable of retriggering an interrupt.) | ||
378 | The delayed interrupt disable can be runtime enabled, per interrupt, | ||
379 | by setting the IRQ_DELAYED_DISABLE flag in the irq_desc status field. | ||
380 | </para> | ||
381 | </sect2> | ||
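
The delayed disable described above amounts to a small state machine, sketched here in userspace. The lazy_irq structure and the lazy_* function names are hypothetical; the fields loosely correspond to the IRQ_DISABLED and IRQ_PENDING flags and to the mask state of the line, and the replay in lazy_enable() stands in for a software resend.

```c
#include <assert.h>
#include <stdbool.h>

struct lazy_irq {
	bool disabled;		/* software flag, like IRQ_DISABLED */
	bool pending;		/* like IRQ_PENDING */
	bool hw_masked;		/* state of the line at the controller */
	int handled;		/* times the device handler ran */
};

/* disable_irq() analogue: only mark the interrupt disabled in software. */
void lazy_disable(struct lazy_irq *irq)
{
	assert(irq);
	irq->disabled = true;
}

/* Flow-handler analogue: called when the hardware raises the interrupt. */
void lazy_event(struct lazy_irq *irq)
{
	assert(irq);
	if (irq->disabled) {
		irq->hw_masked = true;	/* mask now, at the first event */
		irq->pending = true;	/* remember the event for later */
		return;
	}
	irq->handled++;
}

/* enable_irq() analogue: unmask and replay a remembered event. */
void lazy_enable(struct lazy_irq *irq)
{
	assert(irq);
	irq->disabled = false;
	irq->hw_masked = false;
	if (irq->pending) {
		irq->pending = false;
		lazy_event(irq);	/* software resend */
	}
}
```

The line stays unmasked after lazy_disable() until an event actually arrives, which is why an edge that occurs during the disabled window is recorded and replayed rather than dropped.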
382 | </sect1> | ||
383 | <sect1> | ||
384 | <title>Chiplevel hardware encapsulation</title> | ||
385 | <para> | ||
386 | The chip level hardware descriptor structure irq_chip | ||
387 | contains all the direct chip relevant functions, which | ||
388 | can be utilized by the irq flow implementations. | ||
389 | <itemizedlist> | ||
390 | <listitem><para>ack()</para></listitem> | ||
391 | <listitem><para>mask_ack() - Optional, recommended for performance</para></listitem> | ||
392 | <listitem><para>mask()</para></listitem> | ||
393 | <listitem><para>unmask()</para></listitem> | ||
394 | <listitem><para>retrigger() - Optional</para></listitem> | ||
395 | <listitem><para>set_type() - Optional</para></listitem> | ||
396 | <listitem><para>set_wake() - Optional</para></listitem> | ||
397 | </itemizedlist> | ||
398 | These primitives are strictly intended to mean what they say: ack means | ||
399 | ACK, masking means masking of an IRQ line, etc. It is up to the flow | ||
400 | handler(s) to use these basic units of lowlevel functionality. | ||
401 | </para> | ||
402 | </sect1> | ||
403 | </chapter> | ||
404 | |||
405 | <chapter id="doirq"> | ||
406 | <title>__do_IRQ entry point</title> | ||
407 | <para> | ||
408 | The original implementation __do_IRQ() is an alternative entry | ||
409 | point for all types of interrupts. | ||
410 | </para> | ||
411 | <para> | ||
412 | This handler turned out to be not suitable for all | ||
413 | interrupt hardware and was therefore reimplemented with split | ||
414 | functionality for edge/level/simple/percpu interrupts. This is not | ||
415 | only a functional optimization. It also shortens code paths for | ||
416 | interrupts. | ||
417 | </para> | ||
418 | <para> | ||
419 | To make use of the split implementation, replace the call to | ||
420 | __do_IRQ by a call to desc->handle_irq() and associate | ||
421 | the appropriate handler function to desc->handle_irq(). | ||
422 | In most cases the generic handler implementations should | ||
423 | be sufficient. | ||
424 | </para> | ||
425 | </chapter> | ||
426 | |||
427 | <chapter id="locking"> | ||
428 | <title>Locking on SMP</title> | ||
429 | <para> | ||
430 | The locking of chip registers is up to the architecture that | ||
431 | defines the chip primitives. There is a chip->lock field that can be used | ||
432 | for serialization, but the generic layer does not touch it. The per-irq | ||
433 | structure is protected via desc->lock, by the generic layer. | ||
434 | </para> | ||
435 | </chapter> | ||
436 | <chapter id="structs"> | ||
437 | <title>Structures</title> | ||
438 | <para> | ||
439 | This chapter contains the autogenerated documentation of the structures which are | ||
440 | used in the generic IRQ layer. | ||
441 | </para> | ||
442 | !Iinclude/linux/irq.h | ||
443 | </chapter> | ||
444 | |||
445 | <chapter id="pubfunctions"> | ||
446 | <title>Public Functions Provided</title> | ||
447 | <para> | ||
448 | This chapter contains the autogenerated documentation of the kernel API functions | ||
449 | which are exported. | ||
450 | </para> | ||
451 | !Ekernel/irq/manage.c | ||
452 | !Ekernel/irq/chip.c | ||
453 | </chapter> | ||
454 | |||
455 | <chapter id="intfunctions"> | ||
456 | <title>Internal Functions Provided</title> | ||
457 | <para> | ||
458 | This chapter contains the autogenerated documentation of the internal functions. | ||
459 | </para> | ||
460 | !Ikernel/irq/handle.c | ||
461 | !Ikernel/irq/chip.c | ||
462 | </chapter> | ||
463 | |||
464 | <chapter id="credits"> | ||
465 | <title>Credits</title> | ||
466 | <para> | ||
467 | The following people have contributed to this document: | ||
468 | <orderedlist> | ||
469 | <listitem><para>Thomas Gleixner<email>tglx@linutronix.de</email></para></listitem> | ||
470 | <listitem><para>Ingo Molnar<email>mingo@elte.hu</email></para></listitem> | ||
471 | </orderedlist> | ||
472 | </para> | ||
473 | </chapter> | ||
474 | </book> | ||
diff --git a/Documentation/DocBook/kernel-api.tmpl b/Documentation/DocBook/kernel-api.tmpl index 31b727ceb127..6d4b1ef5b6f1 100644 --- a/Documentation/DocBook/kernel-api.tmpl +++ b/Documentation/DocBook/kernel-api.tmpl | |||
@@ -59,9 +59,14 @@ | |||
59 | !Iinclude/linux/hrtimer.h | 59 | !Iinclude/linux/hrtimer.h |
60 | !Ekernel/hrtimer.c | 60 | !Ekernel/hrtimer.c |
61 | </sect1> | 61 | </sect1> |
62 | <sect1><title>Workqueues and Kevents</title> | ||
63 | !Ekernel/workqueue.c | ||
64 | </sect1> | ||
62 | <sect1><title>Internal Functions</title> | 65 | <sect1><title>Internal Functions</title> |
63 | !Ikernel/exit.c | 66 | !Ikernel/exit.c |
64 | !Ikernel/signal.c | 67 | !Ikernel/signal.c |
68 | !Iinclude/linux/kthread.h | ||
69 | !Ekernel/kthread.c | ||
65 | </sect1> | 70 | </sect1> |
66 | 71 | ||
67 | <sect1><title>Kernel objects manipulation</title> | 72 | <sect1><title>Kernel objects manipulation</title> |
@@ -114,6 +119,29 @@ X!Ilib/string.c | |||
114 | </sect1> | 119 | </sect1> |
115 | </chapter> | 120 | </chapter> |
116 | 121 | ||
122 | <chapter id="kernel-lib"> | ||
123 | <title>Basic Kernel Library Functions</title> | ||
124 | |||
125 | <para> | ||
126 | The Linux kernel provides more basic utility functions. | ||
127 | </para> | ||
128 | |||
129 | <sect1><title>Bitmap Operations</title> | ||
130 | !Elib/bitmap.c | ||
131 | !Ilib/bitmap.c | ||
132 | </sect1> | ||
133 | |||
134 | <sect1><title>Command-line Parsing</title> | ||
135 | !Elib/cmdline.c | ||
136 | </sect1> | ||
137 | |||
138 | <sect1><title>CRC Functions</title> | ||
139 | !Elib/crc16.c | ||
140 | !Elib/crc32.c | ||
141 | !Elib/crc-ccitt.c | ||
142 | </sect1> | ||
143 | </chapter> | ||
144 | |||
117 | <chapter id="mm"> | 145 | <chapter id="mm"> |
118 | <title>Memory Management in Linux</title> | 146 | <title>Memory Management in Linux</title> |
119 | <sect1><title>The Slab Cache</title> | 147 | <sect1><title>The Slab Cache</title> |
@@ -153,27 +181,6 @@ X!Ilib/string.c | |||
153 | </sect1> | 181 | </sect1> |
154 | </chapter> | 182 | </chapter> |
155 | 183 | ||
156 | <chapter id="proc"> | ||
157 | <title>The proc filesystem</title> | ||
158 | |||
159 | <sect1><title>sysctl interface</title> | ||
160 | !Ekernel/sysctl.c | ||
161 | </sect1> | ||
162 | |||
163 | <sect1><title>proc filesystem interface</title> | ||
164 | !Ifs/proc/base.c | ||
165 | </sect1> | ||
166 | </chapter> | ||
167 | |||
168 | <chapter id="debugfs"> | ||
169 | <title>The debugfs filesystem</title> | ||
170 | |||
171 | <sect1><title>debugfs interface</title> | ||
172 | !Efs/debugfs/inode.c | ||
173 | !Efs/debugfs/file.c | ||
174 | </sect1> | ||
175 | </chapter> | ||
176 | |||
177 | <chapter id="vfs"> | 184 | <chapter id="vfs"> |
178 | <title>The Linux VFS</title> | 185 | <title>The Linux VFS</title> |
179 | <sect1><title>The Filesystem types</title> | 186 | <sect1><title>The Filesystem types</title> |
@@ -206,6 +213,50 @@ X!Ilib/string.c | |||
206 | </sect1> | 213 | </sect1> |
207 | </chapter> | 214 | </chapter> |
208 | 215 | ||
216 | <chapter id="proc"> | ||
217 | <title>The proc filesystem</title> | ||
218 | |||
219 | <sect1><title>sysctl interface</title> | ||
220 | !Ekernel/sysctl.c | ||
221 | </sect1> | ||
222 | |||
223 | <sect1><title>proc filesystem interface</title> | ||
224 | !Ifs/proc/base.c | ||
225 | </sect1> | ||
226 | </chapter> | ||
227 | |||
228 | <chapter id="sysfs"> | ||
229 | <title>The Filesystem for Exporting Kernel Objects</title> | ||
230 | !Efs/sysfs/file.c | ||
231 | !Efs/sysfs/symlink.c | ||
232 | !Efs/sysfs/bin.c | ||
233 | </chapter> | ||
234 | |||
235 | <chapter id="debugfs"> | ||
236 | <title>The debugfs filesystem</title> | ||
237 | |||
238 | <sect1><title>debugfs interface</title> | ||
239 | !Efs/debugfs/inode.c | ||
240 | !Efs/debugfs/file.c | ||
241 | </sect1> | ||
242 | </chapter> | ||
243 | |||
244 | <chapter id="relayfs"> | ||
245 | <title>relay interface support</title> | ||
246 | |||
247 | <para> | ||
248 | Relay interface support | ||
249 | is designed to provide an efficient mechanism for tools and | ||
250 | facilities to relay large amounts of data from kernel space to | ||
251 | user space. | ||
252 | </para> | ||
253 | |||
254 | <sect1><title>relay interface</title> | ||
255 | !Ekernel/relay.c | ||
256 | !Ikernel/relay.c | ||
257 | </sect1> | ||
258 | </chapter> | ||
259 | |||
209 | <chapter id="netcore"> | 260 | <chapter id="netcore"> |
210 | <title>Linux Networking</title> | 261 | <title>Linux Networking</title> |
211 | <sect1><title>Networking Base Types</title> | 262 | <sect1><title>Networking Base Types</title> |
@@ -275,20 +326,19 @@ X!Ekernel/module.c | |||
275 | </sect1> | 326 | </sect1> |
276 | 327 | ||
277 | <sect1><title>Resources Management</title> | 328 | <sect1><title>Resources Management</title> |
278 | !Ekernel/resource.c | 329 | !Ikernel/resource.c |
279 | </sect1> | 330 | </sect1> |
280 | 331 | ||
281 | <sect1><title>MTRR Handling</title> | 332 | <sect1><title>MTRR Handling</title> |
282 | !Earch/i386/kernel/cpu/mtrr/main.c | 333 | !Earch/i386/kernel/cpu/mtrr/main.c |
283 | </sect1> | 334 | </sect1> |
335 | |||
284 | <sect1><title>PCI Support Library</title> | 336 | <sect1><title>PCI Support Library</title> |
285 | !Edrivers/pci/pci.c | 337 | !Edrivers/pci/pci.c |
286 | !Edrivers/pci/pci-driver.c | 338 | !Edrivers/pci/pci-driver.c |
287 | !Edrivers/pci/remove.c | 339 | !Edrivers/pci/remove.c |
288 | !Edrivers/pci/pci-acpi.c | 340 | !Edrivers/pci/pci-acpi.c |
289 | <!-- kerneldoc does not understand to __devinit | 341 | !Edrivers/pci/search.c |
290 | X!Edrivers/pci/search.c | ||
291 | --> | ||
292 | !Edrivers/pci/msi.c | 342 | !Edrivers/pci/msi.c |
293 | !Edrivers/pci/bus.c | 343 | !Edrivers/pci/bus.c |
294 | <!-- FIXME: Removed for now since no structured comments in source | 344 | <!-- FIXME: Removed for now since no structured comments in source |
@@ -315,16 +365,11 @@ X!Earch/i386/kernel/mca.c | |||
315 | </sect1> | 365 | </sect1> |
316 | </chapter> | 366 | </chapter> |
317 | 367 | ||
318 | <chapter id="devfs"> | 368 | <chapter id="firmware"> |
319 | <title>The Device File System</title> | 369 | <title>Firmware Interfaces</title> |
320 | !Efs/devfs/base.c | 370 | <sect1><title>DMI Interfaces</title> |
321 | </chapter> | 371 | !Edrivers/firmware/dmi_scan.c |
322 | 372 | </sect1> | |
323 | <chapter id="sysfs"> | ||
324 | <title>The Filesystem for Exporting Kernel Objects</title> | ||
325 | !Efs/sysfs/file.c | ||
326 | !Efs/sysfs/symlink.c | ||
327 | !Efs/sysfs/bin.c | ||
328 | </chapter> | 373 | </chapter> |
329 | 374 | ||
330 | <chapter id="security"> | 375 | <chapter id="security"> |
@@ -357,6 +402,7 @@ X!Iinclude/linux/device.h | |||
357 | --> | 402 | --> |
358 | !Edrivers/base/driver.c | 403 | !Edrivers/base/driver.c |
359 | !Edrivers/base/core.c | 404 | !Edrivers/base/core.c |
405 | !Edrivers/base/class.c | ||
360 | !Edrivers/base/firmware_class.c | 406 | !Edrivers/base/firmware_class.c |
361 | !Edrivers/base/transport_class.c | 407 | !Edrivers/base/transport_class.c |
362 | !Edrivers/base/dmapool.c | 408 | !Edrivers/base/dmapool.c |
@@ -403,17 +449,29 @@ X!Edrivers/pnp/system.c | |||
403 | </sect1> | 449 | </sect1> |
404 | </chapter> | 450 | </chapter> |
405 | 451 | ||
406 | |||
407 | <chapter id="blkdev"> | 452 | <chapter id="blkdev"> |
408 | <title>Block Devices</title> | 453 | <title>Block Devices</title> |
409 | !Eblock/ll_rw_blk.c | 454 | !Eblock/ll_rw_blk.c |
410 | </chapter> | 455 | </chapter> |
411 | 456 | ||
457 | <chapter id="chrdev"> | ||
458 | <title>Char devices</title> | ||
459 | !Efs/char_dev.c | ||
460 | </chapter> | ||
461 | |||
412 | <chapter id="miscdev"> | 462 | <chapter id="miscdev"> |
413 | <title>Miscellaneous Devices</title> | 463 | <title>Miscellaneous Devices</title> |
414 | !Edrivers/char/misc.c | 464 | !Edrivers/char/misc.c |
415 | </chapter> | 465 | </chapter> |
416 | 466 | ||
467 | <chapter id="parportdev"> | ||
468 | <title>Parallel Port Devices</title> | ||
469 | !Iinclude/linux/parport.h | ||
470 | !Edrivers/parport/ieee1284.c | ||
471 | !Edrivers/parport/share.c | ||
472 | !Idrivers/parport/daisy.c | ||
473 | </chapter> | ||
474 | |||
417 | <chapter id="viddev"> | 475 | <chapter id="viddev"> |
418 | <title>Video4Linux</title> | 476 | <title>Video4Linux</title> |
419 | !Edrivers/media/video/videodev.c | 477 | !Edrivers/media/video/videodev.c |
diff --git a/Documentation/DocBook/kernel-locking.tmpl b/Documentation/DocBook/kernel-locking.tmpl index 158ffe9bfade..644c3884fab9 100644 --- a/Documentation/DocBook/kernel-locking.tmpl +++ b/Documentation/DocBook/kernel-locking.tmpl | |||
@@ -1590,7 +1590,7 @@ the amount of locking which needs to be done. | |||
1590 | <para> | 1590 | <para> |
1591 | Our final dilemma is this: when can we actually destroy the | 1591 | Our final dilemma is this: when can we actually destroy the |
1592 | removed element? Remember, a reader might be stepping through | 1592 | removed element? Remember, a reader might be stepping through |
1593 | this element in the list right now: it we free this element and | 1593 | this element in the list right now: if we free this element and |
1594 | the <symbol>next</symbol> pointer changes, the reader will jump | 1594 | the <symbol>next</symbol> pointer changes, the reader will jump |
1595 | off into garbage and crash. We need to wait until we know that | 1595 | off into garbage and crash. We need to wait until we know that |
1596 | all the readers who were traversing the list when we deleted the | 1596 | all the readers who were traversing the list when we deleted the |
diff --git a/Documentation/DocBook/libata.tmpl b/Documentation/DocBook/libata.tmpl index e97c32314541..065e8dc23e3a 100644 --- a/Documentation/DocBook/libata.tmpl +++ b/Documentation/DocBook/libata.tmpl | |||
@@ -868,18 +868,18 @@ and other resources, etc. | |||
868 | 868 | ||
869 | <chapter id="libataExt"> | 869 | <chapter id="libataExt"> |
870 | <title>libata Library</title> | 870 | <title>libata Library</title> |
871 | !Edrivers/scsi/libata-core.c | 871 | !Edrivers/ata/libata-core.c |
872 | </chapter> | 872 | </chapter> |
873 | 873 | ||
874 | <chapter id="libataInt"> | 874 | <chapter id="libataInt"> |
875 | <title>libata Core Internals</title> | 875 | <title>libata Core Internals</title> |
876 | !Idrivers/scsi/libata-core.c | 876 | !Idrivers/ata/libata-core.c |
877 | </chapter> | 877 | </chapter> |
878 | 878 | ||
879 | <chapter id="libataScsiInt"> | 879 | <chapter id="libataScsiInt"> |
880 | <title>libata SCSI translation/emulation</title> | 880 | <title>libata SCSI translation/emulation</title> |
881 | !Edrivers/scsi/libata-scsi.c | 881 | !Edrivers/ata/libata-scsi.c |
882 | !Idrivers/scsi/libata-scsi.c | 882 | !Idrivers/ata/libata-scsi.c |
883 | </chapter> | 883 | </chapter> |
884 | 884 | ||
885 | <chapter id="ataExceptions"> | 885 | <chapter id="ataExceptions"> |
@@ -1600,12 +1600,12 @@ and other resources, etc. | |||
1600 | 1600 | ||
1601 | <chapter id="PiixInt"> | 1601 | <chapter id="PiixInt"> |
1602 | <title>ata_piix Internals</title> | 1602 | <title>ata_piix Internals</title> |
1603 | !Idrivers/scsi/ata_piix.c | 1603 | !Idrivers/ata/ata_piix.c |
1604 | </chapter> | 1604 | </chapter> |
1605 | 1605 | ||
1606 | <chapter id="SILInt"> | 1606 | <chapter id="SILInt"> |
1607 | <title>sata_sil Internals</title> | 1607 | <title>sata_sil Internals</title> |
1608 | !Idrivers/scsi/sata_sil.c | 1608 | !Idrivers/ata/sata_sil.c |
1609 | </chapter> | 1609 | </chapter> |
1610 | 1610 | ||
1611 | <chapter id="libataThanks"> | 1611 | <chapter id="libataThanks"> |
diff --git a/Documentation/DocBook/mtdnand.tmpl b/Documentation/DocBook/mtdnand.tmpl index 6e463d0db266..a8c8cce50633 100644 --- a/Documentation/DocBook/mtdnand.tmpl +++ b/Documentation/DocBook/mtdnand.tmpl | |||
@@ -109,7 +109,7 @@ | |||
109 | for most of the implementations. These functions can be replaced by the | 109 | for most of the implementations. These functions can be replaced by the |
110 | board driver if necessary. Those functions are called via pointers in the | 110 | board driver if necessary. Those functions are called via pointers in the |
111 | NAND chip description structure. The board driver can set the functions which | 111 | NAND chip description structure. The board driver can set the functions which |
112 | should be replaced by board dependend functions before calling nand_scan(). | 112 | should be replaced by board dependent functions before calling nand_scan(). |
113 | If the function pointer is NULL on entry to nand_scan() then the pointer | 113 | If the function pointer is NULL on entry to nand_scan() then the pointer |
114 | is set to the default function which is suitable for the detected chip type. | 114 | is set to the default function which is suitable for the detected chip type. |
115 | </para></listitem> | 115 | </para></listitem> |
@@ -133,7 +133,7 @@ | |||
133 | [REPLACEABLE]</para><para> | 133 | [REPLACEABLE]</para><para> |
134 | Replaceable members hold hardware related functions which can be | 134 | Replaceable members hold hardware related functions which can be |
135 | provided by the board driver. The board driver can set the functions which | 135 | provided by the board driver. The board driver can set the functions which |
136 | should be replaced by board dependend functions before calling nand_scan(). | 136 | should be replaced by board dependent functions before calling nand_scan(). |
137 | If the function pointer is NULL on entry to nand_scan() then the pointer | 137 | If the function pointer is NULL on entry to nand_scan() then the pointer |
138 | is set to the default function which is suitable for the detected chip type. | 138 | is set to the default function which is suitable for the detected chip type. |
139 | </para></listitem> | 139 | </para></listitem> |
@@ -156,9 +156,8 @@ | |||
156 | <title>Basic board driver</title> | 156 | <title>Basic board driver</title> |
157 | <para> | 157 | <para> |
158 | For most boards it will be sufficient to provide just the | 158 | For most boards it will be sufficient to provide just the |
159 | basic functions and fill out some really board dependend | 159 | basic functions and fill out some really board dependent |
160 | members in the nand chip description structure. | 160 | members in the nand chip description structure. |
161 | See drivers/mtd/nand/skeleton for reference. | ||
162 | </para> | 161 | </para> |
163 | <sect1> | 162 | <sect1> |
164 | <title>Basic defines</title> | 163 | <title>Basic defines</title> |
@@ -189,9 +188,9 @@ static unsigned long baseaddr; | |||
189 | <sect1> | 188 | <sect1> |
190 | <title>Partition defines</title> | 189 | <title>Partition defines</title> |
191 | <para> | 190 | <para> |
192 | If you want to divide your device into parititions, then | 191 | If you want to divide your device into partitions, then |
193 | enable the configuration switch CONFIG_MTD_PARITIONS and define | 192 | enable the configuration switch CONFIG_MTD_PARTITIONS and define |
194 | a paritioning scheme suitable to your board. | 193 | a partitioning scheme suitable to your board. |
195 | </para> | 194 | </para> |
196 | <programlisting> | 195 | <programlisting> |
197 | #define NUM_PARTITIONS 2 | 196 | #define NUM_PARTITIONS 2 |
@@ -1295,7 +1294,9 @@ in this page</entry> | |||
1295 | </para> | 1294 | </para> |
1296 | !Idrivers/mtd/nand/nand_base.c | 1295 | !Idrivers/mtd/nand/nand_base.c |
1297 | !Idrivers/mtd/nand/nand_bbt.c | 1296 | !Idrivers/mtd/nand/nand_bbt.c |
1298 | !Idrivers/mtd/nand/nand_ecc.c | 1297 | <!-- No internal functions for kernel-doc: |
1298 | X!Idrivers/mtd/nand/nand_ecc.c | ||
1299 | --> | ||
1299 | </chapter> | 1300 | </chapter> |
1300 | 1301 | ||
1301 | <chapter id="credits"> | 1302 | <chapter id="credits"> |
diff --git a/Documentation/DocBook/usb.tmpl b/Documentation/DocBook/usb.tmpl index 320af25de3a2..3608472d7b74 100644 --- a/Documentation/DocBook/usb.tmpl +++ b/Documentation/DocBook/usb.tmpl | |||
@@ -43,59 +43,52 @@ | |||
43 | 43 | ||
44 | <para>A Universal Serial Bus (USB) is used to connect a host, | 44 | <para>A Universal Serial Bus (USB) is used to connect a host, |
45 | such as a PC or workstation, to a number of peripheral | 45 | such as a PC or workstation, to a number of peripheral |
46 | devices. USB uses a tree structure, with the host at the | 46 | devices. USB uses a tree structure, with the host as the |
47 | root (the system's master), hubs as interior nodes, and | 47 | root (the system's master), hubs as interior nodes, and |
48 | peripheral devices as leaves (and slaves). | 48 | peripherals as leaves (and slaves). |
49 | Modern PCs support several such trees of USB devices, usually | 49 | Modern PCs support several such trees of USB devices, usually |
50 | one USB 2.0 tree (480 Mbit/sec each) with | 50 | one USB 2.0 tree (480 Mbit/sec each) with |
51 | a few USB 1.1 trees (12 Mbit/sec each) that are used when you | 51 | a few USB 1.1 trees (12 Mbit/sec each) that are used when you |
52 | connect a USB 1.1 device directly to the machine's "root hub". | 52 | connect a USB 1.1 device directly to the machine's "root hub". |
53 | </para> | 53 | </para> |
54 | 54 | ||
55 | <para>That master/slave asymmetry was designed in part for | 55 | <para>That master/slave asymmetry was designed-in for a number of |
56 | ease of use. It is not physically possible to assemble | 56 | reasons, one being ease of use. It is not physically possible to |
57 | (legal) USB cables incorrectly: all upstream "to-the-host" | 57 | assemble (legal) USB cables incorrectly: all upstream "to the host" |
58 | connectors are the rectangular type, matching the sockets on | 58 | connectors are the rectangular type (matching the sockets on |
59 | root hubs, and the downstream type are the squarish type | 59 | root hubs), and all downstream connectors are the squarish type |
60 | (or they are built in to the peripheral). | 60 | (or they are built into the peripheral). |
61 | Software doesn't need to deal with distributed autoconfiguration | 61 | Also, the host software doesn't need to deal with distributed |
62 | since the pre-designated master node manages all that. | 62 | auto-configuration since the pre-designated master node manages all that. |
63 | At the electrical level, bus protocol overhead is reduced by | 63 | And finally, at the electrical level, bus protocol overhead is reduced by |
64 | eliminating arbitration and moving scheduling into host software. | 64 | eliminating arbitration and moving scheduling into the host software. |
65 | </para> | 65 | </para> |
66 | 66 | ||
67 | <para>USB 1.0 was announced in January 1996, and was revised | 67 | <para>USB 1.0 was announced in January 1996 and was revised |
68 | as USB 1.1 (with improvements in hub specification and | 68 | as USB 1.1 (with improvements in hub specification and |
69 | support for interrupt-out transfers) in September 1998. | 69 | support for interrupt-out transfers) in September 1998. |
70 | USB 2.0 was released in April 2000, including high speed | 70 | USB 2.0 was released in April 2000, adding high-speed |
71 | transfers and transaction translating hubs (used for USB 1.1 | 71 | transfers and transaction-translating hubs (used for USB 1.1 |
72 | and 1.0 backward compatibility). | 72 | and 1.0 backward compatibility). |
73 | </para> | 73 | </para> |
74 | 74 | ||
75 | <para>USB support was added to Linux early in the 2.2 kernel series | 75 | <para>Kernel developers added USB support to Linux early in the 2.2 kernel |
76 | shortly before the 2.3 development forked off. Updates | 76 | series, shortly before 2.3 development forked. Updates from 2.3 were |
77 | from 2.3 were regularly folded back into 2.2 releases, bringing | 77 | regularly folded back into 2.2 releases, which improved reliability and |
78 | new features such as <filename>/sbin/hotplug</filename> support, | 78 | brought <filename>/sbin/hotplug</filename> support as well as more drivers. |
79 | more drivers, and more robustness. | 79 | Such improvements were continued in the 2.5 kernel series, where they added |
80 | The 2.5 kernel series continued such improvements, and also | 80 | USB 2.0 support, improved performance, and made the host controller drivers |
81 | worked on USB 2.0 support, | 81 | (HCDs) more consistent. They also simplified the API (to make bugs less |
82 | higher performance, | 82 | likely) and added internal "kerneldoc" documentation. |
83 | better consistency between host controller drivers, | ||
84 | API simplification (to make bugs less likely), | ||
85 | and providing internal "kerneldoc" documentation. | ||
86 | </para> | 83 | </para> |
87 | 84 | ||
88 | <para>Linux can run inside USB devices as well as on | 85 | <para>Linux can run inside USB devices as well as on |
89 | the hosts that control the devices. | 86 | the hosts that control the devices. |
90 | Because the Linux 2.x USB support evolved to support mass market | 87 | But USB device drivers running inside those peripherals |
91 | platforms such as Apple Macintosh or PC-compatible systems, | ||
92 | it didn't address design concerns for those types of USB systems. | ||
93 | So it can't be used inside mass-market PDAs, or other peripherals. | ||
94 | USB device drivers running inside those Linux peripherals | ||
95 | don't do the same things as the ones running inside hosts, | 88 | don't do the same things as the ones running inside hosts, |
96 | and so they've been given a different name: | 89 | so they've been given a different name: |
97 | they're called <emphasis>gadget drivers</emphasis>. | 90 | <emphasis>gadget drivers</emphasis>. |
98 | This document does not present gadget drivers. | 91 | This document does not cover gadget drivers. |
99 | </para> | 92 | </para> |
100 | 93 | ||
101 | </chapter> | 94 | </chapter> |
@@ -103,17 +96,14 @@ | |||
103 | <chapter id="host"> | 96 | <chapter id="host"> |
104 | <title>USB Host-Side API Model</title> | 97 | <title>USB Host-Side API Model</title> |
105 | 98 | ||
106 | <para>Within the kernel, | 99 | <para>Host-side drivers for USB devices talk to the "usbcore" APIs. |
107 | host-side drivers for USB devices talk to the "usbcore" APIs. | 100 | There are two. One is intended for |
108 | There are two types of public "usbcore" APIs, targetted at two different | 101 | <emphasis>general-purpose</emphasis> drivers (exposed through |
109 | layers of USB driver. Those are | 102 | driver frameworks), and the other is for drivers that are |
110 | <emphasis>general purpose</emphasis> drivers, exposed through | 103 | <emphasis>part of the core</emphasis>. |
111 | driver frameworks such as block, character, or network devices; | 104 | Such core drivers include the <emphasis>hub</emphasis> driver |
112 | and drivers that are <emphasis>part of the core</emphasis>, | 105 | (which manages trees of USB devices) and several different kinds |
113 | which are involved in managing a USB bus. | 106 | of <emphasis>host controller drivers</emphasis>, |
114 | Such core drivers include the <emphasis>hub</emphasis> driver, | ||
115 | which manages trees of USB devices, and several different kinds | ||
116 | of <emphasis>host controller driver (HCD)</emphasis>, | ||
117 | which control individual busses. | 107 | which control individual busses. |
118 | </para> | 108 | </para> |
119 | 109 | ||
@@ -122,21 +112,21 @@ | |||
122 | 112 | ||
123 | <itemizedlist> | 113 | <itemizedlist> |
124 | 114 | ||
125 | <listitem><para>USB supports four kinds of data transfer | 115 | <listitem><para>USB supports four kinds of data transfers |
126 | (control, bulk, interrupt, and isochronous). Two transfer | 116 | (control, bulk, interrupt, and isochronous). Two of them (control |
127 | types use bandwidth as it's available (control and bulk), | 117 | and bulk) use bandwidth as it's available, |
128 | while the other two types of transfer (interrupt and isochronous) | 118 | while the other two (interrupt and isochronous) |
129 | are scheduled to provide guaranteed bandwidth. | 119 | are scheduled to provide guaranteed bandwidth. |
130 | </para></listitem> | 120 | </para></listitem> |
131 | 121 | ||
132 | <listitem><para>The device description model includes one or more | 122 | <listitem><para>The device description model includes one or more |
133 | "configurations" per device, only one of which is active at a time. | 123 | "configurations" per device, only one of which is active at a time. |
134 | Devices that are capable of high speed operation must also support | 124 | Devices that are capable of high-speed operation must also support |
135 | full speed configurations, along with a way to ask about the | 125 | full-speed configurations, along with a way to ask about the |
136 | "other speed" configurations that might be used. | 126 | "other speed" configurations which might be used. |
137 | </para></listitem> | 127 | </para></listitem> |
138 | 128 | ||
139 | <listitem><para>Configurations have one or more "interface", each | 129 | <listitem><para>Configurations have one or more "interfaces", each |
140 | of which may have "alternate settings". Interfaces may be | 130 | of which may have "alternate settings". Interfaces may be |
141 | standardized by USB "Class" specifications, or may be specific to | 131 | standardized by USB "Class" specifications, or may be specific to |
142 | a vendor or device.</para> | 132 | a vendor or device.</para> |
@@ -162,7 +152,7 @@ | |||
162 | </para></listitem> | 152 | </para></listitem> |
163 | 153 | ||
164 | <listitem><para>The Linux USB API supports synchronous calls for | 154 | <listitem><para>The Linux USB API supports synchronous calls for |
165 | control and bulk messaging. | 155 | control and bulk messages. |
166 | It also supports asynchronous calls for all kinds of data transfer, | 156 | It also supports asynchronous calls for all kinds of data transfer, |
167 | using request structures called "URBs" (USB Request Blocks). | 157 | using request structures called "URBs" (USB Request Blocks). |
168 | </para></listitem> | 158 | </para></listitem> |
@@ -463,14 +453,25 @@ | |||
463 | file in your Linux kernel sources. | 453 | file in your Linux kernel sources. |
464 | </para> | 454 | </para> |
465 | 455 | ||
466 | <para>Otherwise the main use for this file from programs | 456 | <para>This file, in combination with the poll() system call, can |
467 | is to poll() it to get notifications of usb devices | 457 | also be used to detect when devices are added or removed: |
468 | as they're plugged or unplugged. | 458 | <programlisting>int fd; |
469 | To see what changed, you'd need to read the file and | 459 | struct pollfd pfd; |
470 | compare "before" and "after" contents, scan the filesystem, | 460 | |
471 | or see its hotplug event. | 461 | fd = open("/proc/bus/usb/devices", O_RDONLY); |
462 | pfd = (struct pollfd){ .fd = fd, .events = POLLIN }; | ||
463 | for (;;) { | ||
464 | /* The first time through, this call will return immediately. */ | ||
465 | poll(&pfd, 1, -1); | ||
466 | |||
467 | /* To see what's changed, compare the file's previous and current | ||
468 | contents or scan the filesystem. (Scanning is more precise.) */ | ||
469 | }</programlisting> | ||
470 | Note that this behavior is intended to be used for informational | ||
471 | and debug purposes. It would be more appropriate to use programs | ||
472 | such as udev or HAL to initialize a device or start a user-mode | ||
473 | helper program, for instance. | ||
472 | </para> | 474 | </para> |
473 | |||
474 | </sect1> | 475 | </sect1> |
475 | 476 | ||
476 | <sect1> | 477 | <sect1> |
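The poll() listing in the usb.tmpl hunk above is a fragment with no includes or error handling. A more complete version might look like the sketch below. The helper names usb_watch_open() and usb_watch_wait() are hypothetical, chosen only for illustration. The sketch assumes the caller passes the usbfs devices file named in the text (/proc/bus/usb/devices) and keeps the descriptor open between waits.

```c
#include <fcntl.h>
#include <poll.h>
#include <unistd.h>

/*
 * Open the file to be watched, e.g. "/proc/bus/usb/devices".
 * Returns a file descriptor, or -1 if the file cannot be opened.
 */
int usb_watch_open(const char *path)
{
	return open(path, O_RDONLY);
}

/*
 * Wait until poll() reports an event on fd, or until timeout_ms
 * milliseconds have passed (-1 waits indefinitely).
 * Returns poll()'s result: 1 if an event was reported, 0 on timeout,
 * -1 on error.
 */
int usb_watch_wait(int fd, int timeout_ms)
{
	struct pollfd pfd;

	pfd.fd = fd;
	pfd.events = POLLIN;
	pfd.revents = 0;

	return poll(&pfd, 1, timeout_ms);
}
```

A caller would open the file once, then call usb_watch_wait(fd, -1) in a loop and rescan the device list each time it returns 1. As the text notes, this is meant for informational and debugging use, with device setup left to tools such as udev or HAL.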
diff --git a/Documentation/DocBook/videobook.tmpl b/Documentation/DocBook/videobook.tmpl index fdff984a5161..b629da33951d 100644 --- a/Documentation/DocBook/videobook.tmpl +++ b/Documentation/DocBook/videobook.tmpl | |||
@@ -976,7 +976,7 @@ static int camera_close(struct video_device *dev) | |||
976 | <title>Interrupt Handling</title> | 976 | <title>Interrupt Handling</title> |
977 | <para> | 977 | <para> |
978 | Our example handler is for an ISA bus device. If it was PCI you would be | 978 | Our example handler is for an ISA bus device. If it was PCI you would be |
979 | able to share the interrupt and would have set SA_SHIRQ to indicate a | 979 | able to share the interrupt and would have set IRQF_SHARED to indicate a |
980 | shared IRQ. We pass the device pointer as the interrupt routine argument. We | 980 | shared IRQ. We pass the device pointer as the interrupt routine argument. We |
981 | don't need to since we only support one card but doing this will make it | 981 | don't need to since we only support one card but doing this will make it |
982 | easier to upgrade the driver for multiple devices in the future. | 982 | easier to upgrade the driver for multiple devices in the future. |
diff --git a/Documentation/HOWTO b/Documentation/HOWTO index 915ae8c986c6..1d6560413cc5 100644 --- a/Documentation/HOWTO +++ b/Documentation/HOWTO | |||
@@ -358,7 +358,8 @@ Here is a list of some of the different kernel trees available: | |||
358 | quilt trees: | 358 | quilt trees: |
359 | - USB, PCI, Driver Core, and I2C, Greg Kroah-Hartman <gregkh@suse.de> | 359 | - USB, PCI, Driver Core, and I2C, Greg Kroah-Hartman <gregkh@suse.de> |
360 | kernel.org/pub/linux/kernel/people/gregkh/gregkh-2.6/ | 360 | kernel.org/pub/linux/kernel/people/gregkh/gregkh-2.6/ |
361 | 361 | - x86-64, partly i386, Andi Kleen <ak@suse.de> | |
362 | ftp.firstfloor.org:/pub/ak/x86_64/quilt/ | ||
362 | 363 | ||
363 | Bug Reporting | 364 | Bug Reporting |
364 | ------------- | 365 | ------------- |
diff --git a/Documentation/IPMI.txt b/Documentation/IPMI.txt index bf1cf98d2a27..0256805b548f 100644 --- a/Documentation/IPMI.txt +++ b/Documentation/IPMI.txt | |||
@@ -10,7 +10,7 @@ standard for controlling intelligent devices that monitor a system. | |||
10 | It provides for dynamic discovery of sensors in the system and the | 10 | It provides for dynamic discovery of sensors in the system and the |
11 | ability to monitor the sensors and be informed when the sensor's | 11 | ability to monitor the sensors and be informed when the sensor's |
12 | values change or go outside certain boundaries. It also has a | 12 | values change or go outside certain boundaries. It also has a |
13 | standardized database for field-replacable units (FRUs) and a watchdog | 13 | standardized database for field-replaceable units (FRUs) and a watchdog |
14 | timer. | 14 | timer. |
15 | 15 | ||
16 | To use this, you need an interface to an IPMI controller in your | 16 | To use this, you need an interface to an IPMI controller in your |
@@ -64,7 +64,7 @@ situation, you need to read the section below named 'The SI Driver' or | |||
64 | IPMI defines a standard watchdog timer. You can enable this with the | 64 | IPMI defines a standard watchdog timer. You can enable this with the |
65 | 'IPMI Watchdog Timer' config option. If you compile the driver into | 65 | 'IPMI Watchdog Timer' config option. If you compile the driver into |
66 | the kernel, then via a kernel command-line option you can have the | 66 | the kernel, then via a kernel command-line option you can have the |
67 | watchdog timer start as soon as it intitializes. It also have a lot | 67 | watchdog timer start as soon as it initializes. It also have a lot |
68 | of other options, see the 'Watchdog' section below for more details. | 68 | of other options, see the 'Watchdog' section below for more details. |
69 | Note that you can also have the watchdog continue to run if it is | 69 | Note that you can also have the watchdog continue to run if it is |
70 | closed (by default it is disabled on close). Go into the 'Watchdog | 70 | closed (by default it is disabled on close). Go into the 'Watchdog |
diff --git a/Documentation/IRQ.txt b/Documentation/IRQ.txt new file mode 100644 index 000000000000..1011e7175021 --- /dev/null +++ b/Documentation/IRQ.txt | |||
@@ -0,0 +1,22 @@ | |||
1 | What is an IRQ? | ||
2 | |||
3 | An IRQ is an interrupt request from a device. | ||
4 | Currently they can come in over a pin, or over a packet. | ||
5 | Several devices may be connected to the same pin thus | ||
6 | sharing an IRQ. | ||
7 | |||
8 | An IRQ number is a kernel identifier used to talk about a hardware | ||
9 | interrupt source. Typically this is an index into the global irq_desc | ||
10 | array, but except for what linux/interrupt.h implements the details | ||
11 | are architecture specific. | ||
12 | |||
13 | An IRQ number is an enumeration of the possible interrupt sources on a | ||
14 | machine. Typically what is enumerated is the number of input pins on | ||
15 | all of the interrupt controller in the system. In the case of ISA | ||
16 | what is enumerated are the 16 input pins on the two i8259 interrupt | ||
17 | controllers. | ||
18 | |||
19 | Architectures can assign additional meaning to the IRQ numbers, and | ||
20 | are encouraged to in the case where there is any manual configuration | ||
21 | of the hardware involved. The ISA IRQs are a classic example of | ||
22 | assigning this kind of additional meaning. | ||
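
Note on the new IRQ.txt text: the idea that an IRQ number indexes a table of descriptors, and that several devices can share one line, can be pictured with a small userspace model. This is only a sketch. Every name in it (toy_irq_table, toy_request_irq, toy_do_irq, toy_count_handler) is hypothetical and is not the kernel's irq_desc or request_irq() interface.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_NR_IRQS 16	/* models the 16 ISA input pins on the two i8259s */

typedef int (*toy_handler_t)(void *dev_id);

/* One registered handler; handlers sharing an IRQ are chained together. */
struct toy_action {
	toy_handler_t handler;
	void *dev_id;
	struct toy_action *next;
};

/* One descriptor per IRQ number; the IRQ number is the array index. */
struct toy_irq_desc {
	struct toy_action *actions;
};

static struct toy_irq_desc toy_irq_table[TOY_NR_IRQS];

/* Attach a handler to an IRQ number; returns -1 on a bad argument. */
static int toy_request_irq(unsigned int irq, struct toy_action *act)
{
	if (irq >= TOY_NR_IRQS || !act || !act->handler)
		return -1;
	act->next = toy_irq_table[irq].actions;
	toy_irq_table[irq].actions = act;
	return 0;
}

/*
 * Deliver one interrupt: call every handler sharing the line and
 * return how many of them claimed it.
 */
static int toy_do_irq(unsigned int irq)
{
	int handled = 0;
	struct toy_action *a;

	assert(irq < TOY_NR_IRQS);
	for (a = toy_irq_table[irq].actions; a; a = a->next)
		handled += a->handler(a->dev_id);
	return handled;
}

/* Example handler: bump the counter passed as dev_id and claim the IRQ. */
static int toy_count_handler(void *dev_id)
{
	(*(int *)dev_id)++;
	return 1;
}
```

Registering two toy_action entries on the same number and then calling toy_do_irq() on it runs both handlers, which is the sharing case the text describes.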
diff --git a/Documentation/RCU/checklist.txt b/Documentation/RCU/checklist.txt index 49e27cc19385..1d50cf0c905e 100644 --- a/Documentation/RCU/checklist.txt +++ b/Documentation/RCU/checklist.txt | |||
@@ -144,9 +144,47 @@ over a rather long period of time, but improvements are always welcome! | |||
144 | whether the increased speed is worth it. | 144 | whether the increased speed is worth it. |
145 | 145 | ||
146 | 8. Although synchronize_rcu() is a bit slower than is call_rcu(), | 146 | 8. Although synchronize_rcu() is a bit slower than is call_rcu(), |
147 | it usually results in simpler code. So, unless update performance | 147 | it usually results in simpler code. So, unless update |
148 | is important or the updaters cannot block, synchronize_rcu() | 148 | performance is critically important or the updaters cannot block, |
149 | should be used in preference to call_rcu(). | 149 | synchronize_rcu() should be used in preference to call_rcu(). |
150 | |||
151 | An especially important property of the synchronize_rcu() | ||
152 | primitive is that it automatically self-limits: if grace periods | ||
153 | are delayed for whatever reason, then the synchronize_rcu() | ||
154 | primitive will correspondingly delay updates. In contrast, | ||
155 | code using call_rcu() should explicitly limit update rate in | ||
156 | cases where grace periods are delayed, as failing to do so can | ||
157 | result in excessive realtime latencies or even OOM conditions. | ||
158 | |||
159 | Ways of gaining this self-limiting property when using call_rcu() | ||
160 | include: | ||
161 | |||
162 | a. Keeping a count of the number of data-structure elements | ||
163 | used by the RCU-protected data structure, including those | ||
164 | waiting for a grace period to elapse. Enforce a limit | ||
165 | on this number, stalling updates as needed to allow | ||
166 | previously deferred frees to complete. | ||
167 | |||
168 | Alternatively, limit only the number awaiting deferred | ||
169 | free rather than the total number of elements. | ||
170 | |||
171 | b. Limiting update rate. For example, if updates occur only | ||
172 | once per hour, then no explicit rate limiting is required, | ||
173 | unless your system is already badly broken. The dcache | ||
174 | subsystem takes this approach -- updates are guarded | ||
175 | by a global lock, limiting their rate. | ||
176 | |||
177 | c. Trusted update -- if updates can only be done manually by | ||
178 | superuser or some other trusted user, then it might not | ||
179 | be necessary to automatically limit them. The theory | ||
180 | here is that superuser already has lots of ways to crash | ||
181 | the machine. | ||
182 | |||
183 | d. Use call_rcu_bh() rather than call_rcu(), in order to take | ||
184 | advantage of call_rcu_bh()'s faster grace periods. | ||
185 | |||
186 | e. Periodically invoke synchronize_rcu(), permitting a limited | ||
187 | number of updates per grace period. | ||
150 | 188 | ||
151 | 9. All RCU list-traversal primitives, which include | 189 | 9. All RCU list-traversal primitives, which include |
152 | list_for_each_rcu(), list_for_each_entry_rcu(), | 190 | list_for_each_rcu(), list_for_each_entry_rcu(), |
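
Note on approach (a) from the checklist hunk above: limiting the number of elements awaiting a deferred free can be sketched in plain userspace C. This is a model under stated assumptions, not kernel code. The names limiter, limiter_defer_free and limiter_grace_period, and the MAX_PENDING value, are hypothetical. A real updater would use call_rcu() and wait for an actual grace period where this sketch simply frees the backlog.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical limit on elements awaiting a deferred free. */
#define MAX_PENDING 4

struct deferred {
	struct deferred *next;
	void *obj;
};

struct limiter {
	struct deferred *pending;	/* objects waiting for a "grace period" */
	size_t npending;		/* length of the pending list */
	size_t nstalls;			/* times an updater had to wait */
};

/* Stand-in for a grace period elapsing: free everything deferred so far. */
static void limiter_grace_period(struct limiter *l)
{
	while (l->pending) {
		struct deferred *d = l->pending;

		l->pending = d->next;
		free(d->obj);
		free(d);
	}
	l->npending = 0;
}

/*
 * Queue obj for a deferred free. If the backlog has reached the limit,
 * stall the update until the backlog drains, so that the backlog can
 * never grow without bound.
 */
static int limiter_defer_free(struct limiter *l, void *obj)
{
	struct deferred *d;

	if (l->npending >= MAX_PENDING) {
		l->nstalls++;
		limiter_grace_period(l);
	}
	d = malloc(sizeof(*d));
	if (!d)
		return -1;
	d->obj = obj;
	d->next = l->pending;
	l->pending = d;
	l->npending++;
	assert(l->npending <= MAX_PENDING);
	return 0;
}
```

The point of the design is that the cost of delayed grace periods lands on the updater as a stall rather than on the system as unbounded memory growth.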
diff --git a/Documentation/RCU/torture.txt b/Documentation/RCU/torture.txt index e4c38152f7f7..a4948591607d 100644 --- a/Documentation/RCU/torture.txt +++ b/Documentation/RCU/torture.txt | |||
@@ -7,7 +7,7 @@ The CONFIG_RCU_TORTURE_TEST config option is available for all RCU | |||
7 | implementations. It creates an rcutorture kernel module that can | 7 | implementations. It creates an rcutorture kernel module that can |
8 | be loaded to run a torture test. The test periodically outputs | 8 | be loaded to run a torture test. The test periodically outputs |
9 | status messages via printk(), which can be examined via the dmesg | 9 | status messages via printk(), which can be examined via the dmesg |
10 | command (perhaps grepping for "rcutorture"). The test is started | 10 | command (perhaps grepping for "torture"). The test is started |
11 | when the module is loaded, and stops when the module is unloaded. | 11 | when the module is loaded, and stops when the module is unloaded. |
12 | 12 | ||
13 | However, actually setting this config option to "y" results in the system | 13 | However, actually setting this config option to "y" results in the system |
@@ -35,6 +35,19 @@ stat_interval The number of seconds between output of torture | |||
35 | be printed -only- when the module is unloaded, and this | 35 | be printed -only- when the module is unloaded, and this |
36 | is the default. | 36 | is the default. |
37 | 37 | ||
38 | shuffle_interval | ||
39 | The number of seconds to keep the test threads affinitied | ||
40 | to a particular subset of the CPUs. Used in conjunction | ||
41 | with test_no_idle_hz. | ||
42 | |||
43 | test_no_idle_hz Whether or not to test the ability of RCU to operate in | ||
44 | a kernel that disables the scheduling-clock interrupt to | ||
45 | idle CPUs. Boolean parameter, "1" to test, "0" otherwise. | ||
46 | |||
47 | torture_type The type of RCU to test: "rcu" for the rcu_read_lock() | ||
48 | API, "rcu_bh" for the rcu_read_lock_bh() API, and "srcu" | ||
49 | for the "srcu_read_lock()" API. | ||
50 | |||
38 | verbose Enable debug printk()s. Default is disabled. | 51 | verbose Enable debug printk()s. Default is disabled. |
39 | 52 | ||
40 | 53 | ||
@@ -42,14 +55,14 @@ OUTPUT | |||
42 | 55 | ||
43 | The statistics output is as follows: | 56 | The statistics output is as follows: |
44 | 57 | ||
45 | rcutorture: --- Start of test: nreaders=16 stat_interval=0 verbose=0 | 58 | rcu-torture: --- Start of test: nreaders=16 stat_interval=0 verbose=0 |
46 | rcutorture: rtc: 0000000000000000 ver: 1916 tfle: 0 rta: 1916 rtaf: 0 rtf: 1915 | 59 | rcu-torture: rtc: 0000000000000000 ver: 1916 tfle: 0 rta: 1916 rtaf: 0 rtf: 1915 |
47 | rcutorture: Reader Pipe: 1466408 9747 0 0 0 0 0 0 0 0 0 | 60 | rcu-torture: Reader Pipe: 1466408 9747 0 0 0 0 0 0 0 0 0 |
48 | rcutorture: Reader Batch: 1464477 11678 0 0 0 0 0 0 0 0 | 61 | rcu-torture: Reader Batch: 1464477 11678 0 0 0 0 0 0 0 0 |
49 | rcutorture: Free-Block Circulation: 1915 1915 1915 1915 1915 1915 1915 1915 1915 1915 0 | 62 | rcu-torture: Free-Block Circulation: 1915 1915 1915 1915 1915 1915 1915 1915 1915 1915 0 |
50 | rcutorture: --- End of test | 63 | rcu-torture: --- End of test |
51 | 64 | ||
52 | The command "dmesg | grep rcutorture:" will extract this information on | 65 | The command "dmesg | grep torture:" will extract this information on |
53 | most systems. On more esoteric configurations, it may be necessary to | 66 | most systems. On more esoteric configurations, it may be necessary to |
54 | use other commands to access the output of the printk()s used by | 67 | use other commands to access the output of the printk()s used by |
55 | the RCU torture test. The printk()s use KERN_ALERT, so they should | 68 | the RCU torture test. The printk()s use KERN_ALERT, so they should |
@@ -115,8 +128,9 @@ The following script may be used to torture RCU: | |||
115 | modprobe rcutorture | 128 | modprobe rcutorture |
116 | sleep 100 | 129 | sleep 100 |
117 | rmmod rcutorture | 130 | rmmod rcutorture |
118 | dmesg | grep rcutorture: | 131 | dmesg | grep torture: |
119 | 132 | ||
120 | The output can be manually inspected for the error flag of "!!!". | 133 | The output can be manually inspected for the error flag of "!!!". |
121 | One could of course create a more elaborate script that automatically | 134 | One could of course create a more elaborate script that automatically |
122 | checked for such errors. | 135 | checked for such errors. The "rmmod" command forces a "SUCCESS" or |
136 | "FAILURE" indication to be printk()ed. | ||
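
Note on checking for errors automatically, as the torture.txt text suggests: a minimal sketch of such a checker in C. It assumes the "!!!" flag appears on the same line as the "torture:" prefix. The function names count_torture_errors and line_contains are hypothetical, and the log text is passed in as a string rather than read from dmesg.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return 1 if the first len bytes of line contain needle, else 0. */
static int line_contains(const char *line, size_t len, const char *needle)
{
	size_t nlen = strlen(needle);
	size_t i;

	if (nlen > len)
		return 0;
	for (i = 0; i + nlen <= len; i++)
		if (memcmp(line + i, needle, nlen) == 0)
			return 1;
	return 0;
}

/*
 * Count the lines in log that carry both the "torture:" prefix and
 * the "!!!" error flag. Lines are separated by '\n'.
 */
static int count_torture_errors(const char *log)
{
	int errors = 0;

	assert(log != NULL);
	while (*log) {
		const char *end = strchr(log, '\n');
		size_t len = end ? (size_t)(end - log) : strlen(log);

		if (line_contains(log, len, "torture:") &&
		    line_contains(log, len, "!!!"))
			errors++;
		log += len;
		if (*log == '\n')
			log++;
	}
	return errors;
}
```

A nonzero return value would mean the run needs manual inspection. Lines from other subsystems that happen to contain "!!!" are ignored because they lack the "torture:" prefix.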
diff --git a/Documentation/RCU/whatisRCU.txt b/Documentation/RCU/whatisRCU.txt index 6e459420ee9f..318df44259b3 100644 --- a/Documentation/RCU/whatisRCU.txt +++ b/Documentation/RCU/whatisRCU.txt | |||
@@ -184,7 +184,17 @@ synchronize_rcu() | |||
184 | blocking, it registers a function and argument which are invoked | 184 | blocking, it registers a function and argument which are invoked |
185 | after all ongoing RCU read-side critical sections have completed. | 185 | after all ongoing RCU read-side critical sections have completed. |
186 | This callback variant is particularly useful in situations where | 186 | This callback variant is particularly useful in situations where |
187 | it is illegal to block. | 187 | it is illegal to block or where update-side performance is |
188 | critically important. | ||
189 | |||
190 | However, the call_rcu() API should not be used lightly, as use | ||
191 | of the synchronize_rcu() API generally results in simpler code. | ||
192 | In addition, the synchronize_rcu() API has the nice property | ||
193 | of automatically limiting update rate should grace periods | ||
194 | be delayed. This property results in system resilience in face | ||
195 | of denial-of-service attacks. Code using call_rcu() should limit | ||
196 | update rate in order to gain this same sort of resilience. See | ||
197 | checklist.txt for some approaches to limiting the update rate. | ||
188 | 198 | ||
189 | rcu_assign_pointer() | 199 | rcu_assign_pointer() |
190 | 200 | ||
@@ -677,8 +687,9 @@ diff shows how closely related RCU and reader-writer locking can be. | |||
677 | + spin_lock(&listmutex); | 687 | + spin_lock(&listmutex); |
678 | list_for_each_entry(p, head, lp) { | 688 | list_for_each_entry(p, head, lp) { |
679 | if (p->key == key) { | 689 | if (p->key == key) { |
680 | list_del(&p->list); | 690 | - list_del(&p->list); |
681 | - write_unlock(&listmutex); | 691 | - write_unlock(&listmutex); |
692 | + list_del_rcu(&p->list); | ||
682 | + spin_unlock(&listmutex); | 693 | + spin_unlock(&listmutex); |
683 | + synchronize_rcu(); | 694 | + synchronize_rcu(); |
684 | kfree(p); | 695 | kfree(p); |
@@ -726,7 +737,7 @@ Or, for those who prefer a side-by-side listing: | |||
726 | 5 write_lock(&listmutex); 5 spin_lock(&listmutex); | 737 | 5 write_lock(&listmutex); 5 spin_lock(&listmutex); |
727 | 6 list_for_each_entry(p, head, lp) { 6 list_for_each_entry(p, head, lp) { | 738 | 6 list_for_each_entry(p, head, lp) { 6 list_for_each_entry(p, head, lp) { |
728 | 7 if (p->key == key) { 7 if (p->key == key) { | 739 | 7 if (p->key == key) { 7 if (p->key == key) { |
729 | 8 list_del(&p->list); 8 list_del(&p->list); | 740 | 8 list_del(&p->list); 8 list_del_rcu(&p->list); |
730 | 9 write_unlock(&listmutex); 9 spin_unlock(&listmutex); | 741 | 9 write_unlock(&listmutex); 9 spin_unlock(&listmutex); |
731 | 10 synchronize_rcu(); | 742 | 10 synchronize_rcu(); |
732 | 10 kfree(p); 11 kfree(p); | 743 | 10 kfree(p); 11 kfree(p); |
diff --git a/Documentation/README.DAC960 b/Documentation/README.DAC960 index 98ea617a0dd6..0e8f618ab534 100644 --- a/Documentation/README.DAC960 +++ b/Documentation/README.DAC960 | |||
@@ -78,9 +78,9 @@ also known as "System Drives", and Drive Groups are also called "Packs". Both | |||
78 | terms are in use in the Mylex documentation; I have chosen to standardize on | 78 | terms are in use in the Mylex documentation; I have chosen to standardize on |
79 | the more generic "Logical Drive" and "Drive Group". | 79 | the more generic "Logical Drive" and "Drive Group". |
80 | 80 | ||
81 | DAC960 RAID disk devices are named in the style of the Device File System | 81 | DAC960 RAID disk devices are named in the style of the obsolete Device File |
82 | (DEVFS). The device corresponding to Logical Drive D on Controller C is | 82 | System (DEVFS). The device corresponding to Logical Drive D on Controller C |
83 | referred to as /dev/rd/cCdD, and the partitions are called /dev/rd/cCdDp1 | 83 | is referred to as /dev/rd/cCdD, and the partitions are called /dev/rd/cCdDp1 |
84 | through /dev/rd/cCdDp7. For example, partition 3 of Logical Drive 5 on | 84 | through /dev/rd/cCdDp7. For example, partition 3 of Logical Drive 5 on |
85 | Controller 2 is referred to as /dev/rd/c2d5p3. Note that unlike with SCSI | 85 | Controller 2 is referred to as /dev/rd/c2d5p3. Note that unlike with SCSI |
86 | disks the device names will not change in the event of a disk drive failure. | 86 | disks the device names will not change in the event of a disk drive failure. |
diff --git a/Documentation/SubmitChecklist b/Documentation/SubmitChecklist index 8230098da529..a6cb6ffd2933 100644 --- a/Documentation/SubmitChecklist +++ b/Documentation/SubmitChecklist | |||
@@ -1,57 +1,66 @@ | |||
1 | Linux Kernel patch sumbittal checklist | 1 | Linux Kernel patch sumbittal checklist |
2 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | 2 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
3 | 3 | ||
4 | Here are some basic things that developers should do if they | 4 | Here are some basic things that developers should do if they want to see their |
5 | want to see their kernel patch submittals accepted quicker. | 5 | kernel patch submissions accepted more quickly. |
6 | 6 | ||
7 | These are all above and beyond the documentation that is provided | 7 | These are all above and beyond the documentation that is provided in |
8 | in Documentation/SubmittingPatches and elsewhere about submitting | 8 | Documentation/SubmittingPatches and elsewhere regarding submitting Linux |
9 | Linux kernel patches. | 9 | kernel patches. |
10 | 10 | ||
11 | 11 | ||
12 | 12 | ||
13 | - Builds cleanly with applicable or modified CONFIG options =y, =m, and =n. | 13 | 1: Builds cleanly with applicable or modified CONFIG options =y, =m, and |
14 | No gcc warnings/errors, no linker warnings/errors. | 14 | =n. No gcc warnings/errors, no linker warnings/errors. |
15 | 15 | ||
16 | - Passes allnoconfig, allmodconfig | 16 | 2: Passes allnoconfig, allmodconfig |
17 | 17 | ||
18 | - Builds on multiple CPU arch-es by using local cross-compile tools | 18 | 3: Builds on multiple CPU architectures by using local cross-compile tools |
19 | or something like PLM at OSDL. | 19 | or something like PLM at OSDL. |
20 | 20 | ||
21 | - ppc64 is a good architecture for cross-compilation checking because it | 21 | 4: ppc64 is a good architecture for cross-compilation checking because it |
22 | tends to use `unsigned long' for 64-bit quantities. | 22 | tends to use `unsigned long' for 64-bit quantities. |
23 | 23 | ||
24 | - Matches kernel coding style(!) | 24 | 5: Matches kernel coding style(!) |
25 | 25 | ||
26 | - Any new or modified CONFIG options don't muck up the config menu. | 26 | 6: Any new or modified CONFIG options don't muck up the config menu. |
27 | 27 | ||
28 | - All new Kconfig options have help text. | 28 | 7: All new Kconfig options have help text. |
29 | 29 | ||
30 | - Has been carefully reviewed with respect to relevant Kconfig | 30 | 8: Has been carefully reviewed with respect to relevant Kconfig |
31 | combinations. This is very hard to get right with testing -- | 31 | combinations. This is very hard to get right with testing -- brainpower |
32 | brainpower pays off here. | 32 | pays off here. |
33 | 33 | ||
34 | - Check cleanly with sparse. | 34 | 9: Check cleanly with sparse. |
35 | 35 | ||
36 | - Use 'make checkstack' and 'make namespacecheck' and fix any | 36 | 10: Use 'make checkstack' and 'make namespacecheck' and fix any problems |
37 | problems that they find. Note: checkstack does not point out | 37 | that they find. Note: checkstack does not point out problems explicitly, |
38 | problems explicitly, but any one function that uses more than | 38 | but any one function that uses more than 512 bytes on the stack is a |
39 | 512 bytes on the stack is a candidate for change. | 39 | candidate for change. |
40 | 40 | ||
41 | - Include kernel-doc to document global kernel APIs. (Not required | 41 | 11: Include kernel-doc to document global kernel APIs. (Not required for |
42 | for static functions, but OK there also.) Use 'make htmldocs' | 42 | static functions, but OK there also.) Use 'make htmldocs' or 'make |
43 | or 'make mandocs' to check the kernel-doc and fix any issues. | 43 | mandocs' to check the kernel-doc and fix any issues. |
44 | 44 | ||
45 | - Has been tested with CONFIG_PREEMPT, CONFIG_DEBUG_PREEMPT, | 45 | 12: Has been tested with CONFIG_PREEMPT, CONFIG_DEBUG_PREEMPT, |
46 | CONFIG_DEBUG_SLAB, CONFIG_DEBUG_PAGEALLOC, CONFIG_DEBUG_MUTEXES, | 46 | CONFIG_DEBUG_SLAB, CONFIG_DEBUG_PAGEALLOC, CONFIG_DEBUG_MUTEXES, |
47 | CONFIG_DEBUG_SPINLOCK, CONFIG_DEBUG_SPINLOCK_SLEEP all simultaneously | 47 | CONFIG_DEBUG_SPINLOCK, CONFIG_DEBUG_SPINLOCK_SLEEP all simultaneously |
48 | enabled. | 48 | enabled. |
49 | 49 | ||
50 | - Has been build- and runtime tested with and without CONFIG_SMP and | 50 | 13: Has been build- and runtime tested with and without CONFIG_SMP and |
51 | CONFIG_PREEMPT. | 51 | CONFIG_PREEMPT. |
52 | 52 | ||
53 | - If the patch affects IO/Disk, etc: has been tested with and without | 53 | 14: If the patch affects IO/Disk, etc: has been tested with and without |
54 | CONFIG_LBD. | 54 | CONFIG_LBD. |
55 | 55 | ||
56 | 15: All codepaths have been exercised with all lockdep features enabled. | ||
56 | 57 | ||
57 | 2006-APR-27 | 58 | 16: All new /proc entries are documented under Documentation/ |
59 | |||
60 | 17: All new kernel boot parameters are documented in | ||
61 | Documentation/kernel-parameters.txt. | ||
62 | |||
63 | 18: All new module parameters are documented with MODULE_PARM_DESC() | ||
64 | |||
65 | 19: All new userspace interfaces are documented in Documentation/ABI/. | ||
66 | See Documentation/ABI/README for more information. | ||
diff --git a/Documentation/SubmittingDrivers b/Documentation/SubmittingDrivers index 6bd30fdd0786..58bead05eabb 100644 --- a/Documentation/SubmittingDrivers +++ b/Documentation/SubmittingDrivers | |||
@@ -59,11 +59,11 @@ Copyright: The copyright owner must agree to use of GPL. | |||
59 | are the same person/entity. If not, the name of | 59 | are the same person/entity. If not, the name of |
60 | the person/entity authorizing use of GPL should be | 60 | the person/entity authorizing use of GPL should be |
61 | listed in case it's necessary to verify the will of | 61 | listed in case it's necessary to verify the will of |
62 | the copright owner. | 62 | the copyright owner. |
63 | 63 | ||
64 | Interfaces: If your driver uses existing interfaces and behaves like | 64 | Interfaces: If your driver uses existing interfaces and behaves like |
65 | other drivers in the same class it will be much more likely | 65 | other drivers in the same class it will be much more likely |
66 | to be accepted than if it invents gratuitous new ones. | 66 | to be accepted than if it invents gratuitous new ones. |
67 | If you need to implement a common API over Linux and NT | 67 | If you need to implement a common API over Linux and NT |
68 | drivers do it in userspace. | 68 | drivers do it in userspace. |
69 | 69 | ||
@@ -88,7 +88,7 @@ Clarity: It helps if anyone can see how to fix the driver. It helps | |||
88 | it will go in the bitbucket. | 88 | it will go in the bitbucket. |
89 | 89 | ||
90 | Control: In general if there is active maintainance of a driver by | 90 | Control: In general if there is active maintainance of a driver by |
91 | the author then patches will be redirected to them unless | 91 | the author then patches will be redirected to them unless |
92 | they are totally obvious and without need of checking. | 92 | they are totally obvious and without need of checking. |
93 | If you want to be the contact and update point for the | 93 | If you want to be the contact and update point for the |
94 | driver it is a good idea to state this in the comments, | 94 | driver it is a good idea to state this in the comments, |
@@ -100,7 +100,7 @@ What Criteria Do Not Determine Acceptance | |||
100 | Vendor: Being the hardware vendor and maintaining the driver is | 100 | Vendor: Being the hardware vendor and maintaining the driver is |
101 | often a good thing. If there is a stable working driver from | 101 | often a good thing. If there is a stable working driver from |
102 | other people already in the tree don't expect 'we are the | 102 | other people already in the tree don't expect 'we are the |
103 | vendor' to get your driver chosen. Ideally work with the | 103 | vendor' to get your driver chosen. Ideally work with the |
104 | existing driver author to build a single perfect driver. | 104 | existing driver author to build a single perfect driver. |
105 | 105 | ||
106 | Author: It doesn't matter if a large Linux company wrote the driver, | 106 | Author: It doesn't matter if a large Linux company wrote the driver, |
@@ -116,17 +116,13 @@ Linux kernel master tree: | |||
116 | ftp.??.kernel.org:/pub/linux/kernel/... | 116 | ftp.??.kernel.org:/pub/linux/kernel/... |
117 | ?? == your country code, such as "us", "uk", "fr", etc. | 117 | ?? == your country code, such as "us", "uk", "fr", etc. |
118 | 118 | ||
119 | Linux kernel mailing list: | 119 | Linux kernel mailing list: |
120 | linux-kernel@vger.kernel.org | 120 | linux-kernel@vger.kernel.org |
121 | [mail majordomo@vger.kernel.org to subscribe] | 121 | [mail majordomo@vger.kernel.org to subscribe] |
122 | 122 | ||
123 | Linux Device Drivers, Third Edition (covers 2.6.10): | 123 | Linux Device Drivers, Third Edition (covers 2.6.10): |
124 | http://lwn.net/Kernel/LDD3/ (free version) | 124 | http://lwn.net/Kernel/LDD3/ (free version) |
125 | 125 | ||
126 | Kernel traffic: | ||
127 | Weekly summary of kernel list activity (much easier to read) | ||
128 | http://www.kerneltraffic.org/kernel-traffic/ | ||
129 | |||
130 | LWN.net: | 126 | LWN.net: |
131 | Weekly summary of kernel development activity - http://lwn.net/ | 127 | Weekly summary of kernel development activity - http://lwn.net/ |
132 | 2.6 API changes: | 128 | 2.6 API changes: |
@@ -145,11 +141,8 @@ KernelNewbies: | |||
145 | Linux USB project: | 141 | Linux USB project: |
146 | http://www.linux-usb.org/ | 142 | http://www.linux-usb.org/ |
147 | 143 | ||
148 | How to NOT write kernel driver by arjanv@redhat.com | 144 | How to NOT write kernel driver by Arjan van de Ven: |
149 | http://people.redhat.com/arjanv/olspaper.pdf | 145 | http://www.fenrus.org/how-to-not-write-a-device-driver-paper.pdf |
150 | 146 | ||
151 | Kernel Janitor: | 147 | Kernel Janitor: |
152 | http://janitor.kernelnewbies.org/ | 148 | http://janitor.kernelnewbies.org/ |
153 | |||
154 | -- | ||
155 | Last updated on 17 Nov 2005. | ||
diff --git a/Documentation/SubmittingPatches b/Documentation/SubmittingPatches index c2c85bcb3d43..302d148c2e18 100644 --- a/Documentation/SubmittingPatches +++ b/Documentation/SubmittingPatches | |||
@@ -10,7 +10,9 @@ kernel, the process can sometimes be daunting if you're not familiar | |||
10 | with "the system." This text is a collection of suggestions which | 10 | with "the system." This text is a collection of suggestions which |
11 | can greatly increase the chances of your change being accepted. | 11 | can greatly increase the chances of your change being accepted. |
12 | 12 | ||
13 | If you are submitting a driver, also read Documentation/SubmittingDrivers. | 13 | Read Documentation/SubmitChecklist for a list of items to check |
14 | before submitting code. If you are submitting a driver, also read | ||
15 | Documentation/SubmittingDrivers. | ||
14 | 16 | ||
15 | 17 | ||
16 | 18 | ||
@@ -74,9 +76,6 @@ There are a number of scripts which can aid in this: | |||
74 | Quilt: | 76 | Quilt: |
75 | http://savannah.nongnu.org/projects/quilt | 77 | http://savannah.nongnu.org/projects/quilt |
76 | 78 | ||
77 | Randy Dunlap's patch scripts: | ||
78 | http://www.xenotime.net/linux/scripts/patching-scripts-002.tar.gz | ||
79 | |||
80 | Andrew Morton's patch scripts: | 79 | Andrew Morton's patch scripts: |
81 | http://www.zip.com.au/~akpm/linux/patches/ | 80 | http://www.zip.com.au/~akpm/linux/patches/ |
82 | Instead of these scripts, quilt is the recommended patch management | 81 | Instead of these scripts, quilt is the recommended patch management |
@@ -174,15 +173,15 @@ For small patches you may want to CC the Trivial Patch Monkey | |||
174 | trivial@kernel.org managed by Adrian Bunk; which collects "trivial" | 173 | trivial@kernel.org managed by Adrian Bunk; which collects "trivial" |
175 | patches. Trivial patches must qualify for one of the following rules: | 174 | patches. Trivial patches must qualify for one of the following rules: |
176 | Spelling fixes in documentation | 175 | Spelling fixes in documentation |
177 | Spelling fixes which could break grep(1). | 176 | Spelling fixes which could break grep(1) |
178 | Warning fixes (cluttering with useless warnings is bad) | 177 | Warning fixes (cluttering with useless warnings is bad) |
179 | Compilation fixes (only if they are actually correct) | 178 | Compilation fixes (only if they are actually correct) |
180 | Runtime fixes (only if they actually fix things) | 179 | Runtime fixes (only if they actually fix things) |
181 | Removing use of deprecated functions/macros (eg. check_region). | 180 | Removing use of deprecated functions/macros (eg. check_region) |
182 | Contact detail and documentation fixes | 181 | Contact detail and documentation fixes |
183 | Non-portable code replaced by portable code (even in arch-specific, | 182 | Non-portable code replaced by portable code (even in arch-specific, |
184 | since people copy, as long as it's trivial) | 183 | since people copy, as long as it's trivial) |
185 | Any fix by the author/maintainer of the file. (ie. patch monkey | 184 | Any fix by the author/maintainer of the file (ie. patch monkey |
186 | in re-transmission mode) | 185 | in re-transmission mode) |
187 | URL: <http://www.kernel.org/pub/linux/kernel/people/bunk/trivial/> | 186 | URL: <http://www.kernel.org/pub/linux/kernel/people/bunk/trivial/> |
188 | 187 | ||
@@ -210,6 +209,19 @@ Exception: If your mailer is mangling patches then someone may ask | |||
210 | you to re-send them using MIME. | 209 | you to re-send them using MIME. |
211 | 210 | ||
212 | 211 | ||
212 | WARNING: Some mailers like Mozilla send your messages with | ||
213 | ---- message header ---- | ||
214 | Content-Type: text/plain; charset=us-ascii; format=flowed | ||
215 | ---- message header ---- | ||
216 | The problem is that "format=flowed" makes some of the mailers | ||
217 | on receiving side to replace TABs with spaces and do similar | ||
218 | changes. Thus the patches from you can look corrupted. | ||
219 | |||
220 | To fix this just make your mozilla defaults/pref/mailnews.js file to look like: | ||
221 | pref("mailnews.send_plaintext_flowed", false); // RFC 2646 | ||
222 | pref("mailnews.display.disable_format_flowed_support", true); | ||
223 | |||
224 | |||
213 | 225 | ||
214 | 7) E-mail size. | 226 | 7) E-mail size. |
215 | 227 | ||
@@ -246,13 +258,13 @@ updated change. | |||
246 | It is quite common for Linus to "drop" your patch without comment. | 258 | It is quite common for Linus to "drop" your patch without comment. |
247 | That's the nature of the system. If he drops your patch, it could be | 259 | That's the nature of the system. If he drops your patch, it could be |
248 | due to | 260 | due to |
249 | * Your patch did not apply cleanly to the latest kernel version | 261 | * Your patch did not apply cleanly to the latest kernel version. |
250 | * Your patch was not sufficiently discussed on linux-kernel. | 262 | * Your patch was not sufficiently discussed on linux-kernel. |
251 | * A style issue (see section 2), | 263 | * A style issue (see section 2). |
252 | * An e-mail formatting issue (re-read this section) | 264 | * An e-mail formatting issue (re-read this section). |
253 | * A technical problem with your change | 265 | * A technical problem with your change. |
254 | * He gets tons of e-mail, and yours got lost in the shuffle | 266 | * He gets tons of e-mail, and yours got lost in the shuffle. |
255 | * You are being annoying (See Figure 1) | 267 | * You are being annoying. |
256 | 268 | ||
257 | When in doubt, solicit comments on linux-kernel mailing list. | 269 | When in doubt, solicit comments on linux-kernel mailing list. |
258 | 270 | ||
@@ -309,6 +321,8 @@ then you just add a line saying | |||
309 | 321 | ||
310 | Signed-off-by: Random J Developer <random@developer.example.org> | 322 | Signed-off-by: Random J Developer <random@developer.example.org> |
311 | 323 | ||
324 | using your real name (sorry, no pseudonyms or anonymous contributions.) | ||
325 | |||
312 | Some people also put extra tags at the end. They'll just be ignored for | 326 | Some people also put extra tags at the end. They'll just be ignored for |
313 | now, but you can do this to mark internal company procedures or just | 327 | now, but you can do this to mark internal company procedures or just |
314 | point out some special detail about the sign-off. | 328 | point out some special detail about the sign-off. |
@@ -475,22 +489,21 @@ SECTION 3 - REFERENCES | |||
475 | Andrew Morton, "The perfect patch" (tpp). | 489 | Andrew Morton, "The perfect patch" (tpp). |
476 | <http://www.zip.com.au/~akpm/linux/patches/stuff/tpp.txt> | 490 | <http://www.zip.com.au/~akpm/linux/patches/stuff/tpp.txt> |
477 | 491 | ||
478 | Jeff Garzik, "Linux kernel patch submission format." | 492 | Jeff Garzik, "Linux kernel patch submission format". |
479 | <http://linux.yyz.us/patch-format.html> | 493 | <http://linux.yyz.us/patch-format.html> |
480 | 494 | ||
481 | Greg Kroah-Hartman "How to piss off a kernel subsystem maintainer". | 495 | Greg Kroah-Hartman, "How to piss off a kernel subsystem maintainer". |
482 | <http://www.kroah.com/log/2005/03/31/> | 496 | <http://www.kroah.com/log/2005/03/31/> |
483 | <http://www.kroah.com/log/2005/07/08/> | 497 | <http://www.kroah.com/log/2005/07/08/> |
484 | <http://www.kroah.com/log/2005/10/19/> | 498 | <http://www.kroah.com/log/2005/10/19/> |
485 | <http://www.kroah.com/log/2006/01/11/> | 499 | <http://www.kroah.com/log/2006/01/11/> |
486 | 500 | ||
487 | NO!!!! No more huge patch bombs to linux-kernel@vger.kernel.org people!. | 501 | NO!!!! No more huge patch bombs to linux-kernel@vger.kernel.org people! |
488 | <http://marc.theaimsgroup.com/?l=linux-kernel&m=112112749912944&w=2> | 502 | <http://marc.theaimsgroup.com/?l=linux-kernel&m=112112749912944&w=2> |
489 | 503 | ||
490 | Kernel Documentation/CodingStyle | 504 | Kernel Documentation/CodingStyle: |
491 | <http://sosdg.org/~coywolf/lxr/source/Documentation/CodingStyle> | 505 | <http://sosdg.org/~coywolf/lxr/source/Documentation/CodingStyle> |
492 | 506 | ||
493 | Linus Torvald's mail on the canonical patch format: | 507 | Linus Torvalds's mail on the canonical patch format: |
494 | <http://lkml.org/lkml/2005/4/7/183> | 508 | <http://lkml.org/lkml/2005/4/7/183> |
495 | -- | 509 | -- |
496 | Last updated on 17 Nov 2005. | ||
diff --git a/Documentation/accounting/delay-accounting.txt b/Documentation/accounting/delay-accounting.txt new file mode 100644 index 000000000000..1443cd71d263 --- /dev/null +++ b/Documentation/accounting/delay-accounting.txt | |||
@@ -0,0 +1,112 @@ | |||
1 | Delay accounting | ||
2 | ---------------- | ||
3 | |||
4 | Tasks encounter delays in execution when they wait | ||
5 | for some kernel resource to become available e.g. a | ||
6 | runnable task may wait for a free CPU to run on. | ||
7 | |||
8 | The per-task delay accounting functionality measures | ||
9 | the delays experienced by a task while | ||
10 | |||
11 | a) waiting for a CPU (while being runnable) | ||
12 | b) waiting for completion of synchronous block I/O initiated by the task | ||
13 | c) swapping in pages | ||
14 | |||
15 | and makes these statistics available to userspace through | ||
16 | the taskstats interface. | ||
17 | |||
18 | Such delays provide feedback for setting a task's cpu priority, | ||
19 | io priority and rss limit values appropriately. A long delay for an | ||
20 | important task could be a trigger for raising its corresponding priority. | ||
21 | |||
22 | The functionality, through its use of the taskstats interface, also provides | ||
23 | delay statistics aggregated for all tasks (or threads) belonging to a | ||
24 | thread group (corresponding to a traditional Unix process). This is a commonly | ||
25 | needed aggregation that is more efficiently done by the kernel. | ||
26 | |||
27 | Userspace utilities, particularly resource management applications, can also | ||
28 | aggregate delay statistics into arbitrary groups. To enable this, delay | ||
29 | statistics of a task are available both during its lifetime and on its | ||
30 | exit, ensuring continuous and complete monitoring can be done. | ||
31 | |||
32 | |||
33 | Interface | ||
34 | --------- | ||
35 | |||
36 | Delay accounting uses the taskstats interface which is described | ||
37 | in detail in a separate document in this directory. Taskstats returns a | ||
38 | generic data structure to userspace corresponding to per-pid and per-tgid | ||
39 | statistics. The delay accounting functionality populates specific fields of | ||
40 | this structure. See | ||
41 | include/linux/taskstats.h | ||
42 | for a description of the fields pertaining to delay accounting. | ||
43 | These will generally be in the form of counters returning the cumulative | ||
44 | delay seen for cpu, sync block I/O, swapin etc. | ||
45 | |||
46 | Taking the difference of two successive readings of a given | ||
47 | counter (say cpu_delay_total) for a task will give the delay | ||
48 | experienced by the task waiting for the corresponding resource | ||
49 | in that interval. | ||
50 | |||
51 | When a task exits, records containing the per-task statistics | ||
52 | are sent to userspace without requiring a command. If it is the last exiting | ||
53 | task of a thread group, the per-tgid statistics are also sent. More details | ||
54 | are given in the taskstats interface description. | ||
55 | |||
56 | The getdelays.c userspace utility in this directory allows simple commands to | ||
57 | be run and the corresponding delay statistics to be displayed. It also serves | ||
58 | as an example of using the taskstats interface. | ||
59 | |||
60 | Usage | ||
61 | ----- | ||
62 | |||
63 | Compile the kernel with | ||
64 | CONFIG_TASK_DELAY_ACCT=y | ||
65 | CONFIG_TASKSTATS=y | ||
66 | |||
67 | Delay accounting is enabled by default at boot up. | ||
68 | To disable, add | ||
69 | nodelayacct | ||
70 | to the kernel boot options. The rest of the instructions | ||
71 | below assume this has not been done. | ||
72 | |||
73 | After the system has booted up, use a utility | ||
74 | similar to getdelays.c to access the delays | ||
75 | seen by a given task or a task group (tgid). | ||
76 | The utility also allows a given command to be | ||
77 | executed and the corresponding delays to be | ||
78 | seen. | ||
79 | |||
80 | General format of the getdelays command | ||
81 | |||
82 | getdelays [-t tgid] [-p pid] [-c cmd...] | ||
83 | |||
84 | |||
85 | Get delays, since system boot, for pid 10 | ||
86 | # ./getdelays -p 10 | ||
87 | (output similar to next case) | ||
88 | |||
89 | Get sum of delays, since system boot, for all pids with tgid 5 | ||
90 | # ./getdelays -t 5 | ||
91 | |||
92 | |||
93 | CPU count real total virtual total delay total | ||
94 | 7876 92005750 100000000 24001500 | ||
95 | IO count delay total | ||
96 | 0 0 | ||
97 | MEM count delay total | ||
98 | 0 0 | ||
99 | |||
100 | Get delays seen in executing a given simple command | ||
101 | # ./getdelays -c ls / | ||
102 | |||
103 | bin data1 data3 data5 dev home media opt root srv sys usr | ||
104 | boot data2 data4 data6 etc lib mnt proc sbin subdomain tmp var | ||
105 | |||
106 | |||
107 | CPU count real total virtual total delay total | ||
108 | 6 4000250 4000000 0 | ||
109 | IO count delay total | ||
110 | 0 0 | ||
111 | MEM count delay total | ||
112 | 0 0 | ||
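The interval arithmetic described in delay-accounting.txt above (taking the difference of two successive readings of a cumulative counter) can be sketched in C as follows. This is only an illustration: struct delay_sample, interval_delay() and interval_avg_delay() are hypothetical names, not part of taskstats.h or getdelays.c, and the sketch assumes the caller has already copied a count/delay pair such as cpu_count and cpu_delay_total out of two struct taskstats replies. The unit of the result is whatever unit the kernel reports; taskstats.h is the authority on that.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical snapshot of one cumulative counter pair taken from a
 * struct taskstats reply, e.g. cpu_count and cpu_delay_total. */
struct delay_sample {
	uint64_t count;		/* number of delay events recorded so far */
	uint64_t delay_total;	/* cumulative delay so far */
};

/* Delay accumulated between two readings of the same task.
 * The counters are cumulative, so the later reading must not be smaller. */
static uint64_t interval_delay(const struct delay_sample *prev,
			       const struct delay_sample *cur)
{
	assert(cur->delay_total >= prev->delay_total);
	return cur->delay_total - prev->delay_total;
}

/* Average delay per event within the interval, or 0 if no new events. */
static uint64_t interval_avg_delay(const struct delay_sample *prev,
				   const struct delay_sample *cur)
{
	uint64_t events = cur->count - prev->count;

	return events ? interval_delay(prev, cur) / events : 0;
}
```

A monitoring tool would take one sample per polling period for each pid or tgid of interest and feed consecutive pairs to these helpers.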
diff --git a/Documentation/accounting/getdelays.c b/Documentation/accounting/getdelays.c new file mode 100644 index 000000000000..795ca3911cc5 --- /dev/null +++ b/Documentation/accounting/getdelays.c | |||
@@ -0,0 +1,396 @@ | |||
1 | /* getdelays.c | ||
2 | * | ||
3 | * Utility to get per-pid and per-tgid delay accounting statistics | ||
4 | * Also illustrates usage of the taskstats interface | ||
5 | * | ||
6 | * Copyright (C) Shailabh Nagar, IBM Corp. 2005 | ||
7 | * Copyright (C) Balbir Singh, IBM Corp. 2006 | ||
8 | * Copyright (c) Jay Lan, SGI. 2006 | ||
9 | * | ||
10 | */ | ||
11 | |||
12 | #include <stdio.h> | ||
13 | #include <stdlib.h> | ||
14 | #include <errno.h> | ||
15 | #include <unistd.h> | ||
16 | #include <poll.h> | ||
17 | #include <string.h> | ||
18 | #include <fcntl.h> | ||
19 | #include <sys/types.h> | ||
20 | #include <sys/stat.h> | ||
21 | #include <sys/socket.h> | ||
22 | #include <sys/types.h> | ||
23 | #include <signal.h> | ||
24 | |||
25 | #include <linux/genetlink.h> | ||
26 | #include <linux/taskstats.h> | ||
27 | |||
28 | /* | ||
29 | * Generic macros for dealing with netlink sockets. Might be duplicated | ||
30 | * elsewhere. It is recommended that commercial grade applications use | ||
31 | * libnl or libnetlink and use the interfaces provided by the library | ||
32 | */ | ||
33 | #define GENLMSG_DATA(glh) ((void *)((char *)NLMSG_DATA(glh) + GENL_HDRLEN)) | ||
34 | #define GENLMSG_PAYLOAD(glh) (NLMSG_PAYLOAD(glh, 0) - GENL_HDRLEN) | ||
35 | #define NLA_DATA(na) ((void *)((char*)(na) + NLA_HDRLEN)) | ||
36 | #define NLA_PAYLOAD(len) ((len) - NLA_HDRLEN) | ||
37 | |||
38 | #define err(code, fmt, arg...) do { printf(fmt, ##arg); exit(code); } while (0) | ||
39 | int done = 0; | ||
40 | int rcvbufsz=0; | ||
41 | |||
42 | char name[100]; | ||
43 | int dbg=0, print_delays=0; | ||
44 | __u64 stime, utime; | ||
45 | #define PRINTF(fmt, arg...) { \ | ||
46 | if (dbg) { \ | ||
47 | printf(fmt, ##arg); \ | ||
48 | } \ | ||
49 | } | ||
50 | |||
51 | /* Maximum size of response requested or message sent */ | ||
52 | #define MAX_MSG_SIZE 1024 | ||
53 | /* Maximum number of cpus expected to be specified in a cpumask */ | ||
54 | #define MAX_CPUS 32 | ||
55 | /* Maximum length of pathname to log file */ | ||
56 | #define MAX_FILENAME 256 | ||
57 | |||
58 | struct msgtemplate { | ||
59 | struct nlmsghdr n; | ||
60 | struct genlmsghdr g; | ||
61 | char buf[MAX_MSG_SIZE]; | ||
62 | }; | ||
63 | |||
64 | char cpumask[100+6*MAX_CPUS]; | ||
65 | |||
66 | /* | ||
67 | * Create a raw netlink socket and bind | ||
68 | */ | ||
69 | static int create_nl_socket(int protocol) | ||
70 | { | ||
71 | int fd; | ||
72 | struct sockaddr_nl local; | ||
73 | |||
74 | fd = socket(AF_NETLINK, SOCK_RAW, protocol); | ||
75 | if (fd < 0) | ||
76 | return -1; | ||
77 | |||
78 | if (rcvbufsz) | ||
79 | if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF, | ||
80 | &rcvbufsz, sizeof(rcvbufsz)) < 0) { | ||
81 | printf("Unable to set socket rcv buf size to %d\n", | ||
82 | rcvbufsz); | ||
83 | goto error; | ||
84 | } | ||
85 | |||
86 | memset(&local, 0, sizeof(local)); | ||
87 | local.nl_family = AF_NETLINK; | ||
88 | |||
89 | if (bind(fd, (struct sockaddr *) &local, sizeof(local)) < 0) | ||
90 | goto error; | ||
91 | |||
92 | return fd; | ||
93 | error: | ||
94 | close(fd); | ||
95 | return -1; | ||
96 | } | ||
97 | |||
98 | |||
99 | int send_cmd(int sd, __u16 nlmsg_type, __u32 nlmsg_pid, | ||
100 | __u8 genl_cmd, __u16 nla_type, | ||
101 | void *nla_data, int nla_len) | ||
102 | { | ||
103 | struct nlattr *na; | ||
104 | struct sockaddr_nl nladdr; | ||
105 | int r, buflen; | ||
106 | char *buf; | ||
107 | |||
108 | struct msgtemplate msg; | ||
109 | |||
110 | msg.n.nlmsg_len = NLMSG_LENGTH(GENL_HDRLEN); | ||
111 | msg.n.nlmsg_type = nlmsg_type; | ||
112 | msg.n.nlmsg_flags = NLM_F_REQUEST; | ||
113 | msg.n.nlmsg_seq = 0; | ||
114 | msg.n.nlmsg_pid = nlmsg_pid; | ||
115 | msg.g.cmd = genl_cmd; | ||
116 | msg.g.version = 0x1; | ||
117 | na = (struct nlattr *) GENLMSG_DATA(&msg); | ||
118 | na->nla_type = nla_type; | ||
119 | na->nla_len = nla_len + 1 + NLA_HDRLEN; | ||
120 | memcpy(NLA_DATA(na), nla_data, nla_len); | ||
121 | msg.n.nlmsg_len += NLMSG_ALIGN(na->nla_len); | ||
122 | |||
123 | buf = (char *) &msg; | ||
124 | buflen = msg.n.nlmsg_len ; | ||
125 | memset(&nladdr, 0, sizeof(nladdr)); | ||
126 | nladdr.nl_family = AF_NETLINK; | ||
127 | while ((r = sendto(sd, buf, buflen, 0, (struct sockaddr *) &nladdr, | ||
128 | sizeof(nladdr))) < buflen) { | ||
129 | if (r > 0) { | ||
130 | buf += r; | ||
131 | buflen -= r; | ||
132 | } else if (errno != EAGAIN) | ||
133 | return -1; | ||
134 | } | ||
135 | return 0; | ||
136 | } | ||
137 | |||
138 | |||
139 | /* | ||
140 | * Probe the controller in genetlink to find the family id | ||
141 | * for the TASKSTATS family | ||
142 | */ | ||
143 | int get_family_id(int sd) | ||
144 | { | ||
145 | struct { | ||
146 | struct nlmsghdr n; | ||
147 | struct genlmsghdr g; | ||
148 | char buf[256]; | ||
149 | } ans; | ||
150 | |||
151 | int id = 0, rc; | ||
152 | struct nlattr *na; | ||
153 | int rep_len; | ||
154 | |||
155 | strcpy(name, TASKSTATS_GENL_NAME); | ||
156 | rc = send_cmd(sd, GENL_ID_CTRL, getpid(), CTRL_CMD_GETFAMILY, | ||
157 | CTRL_ATTR_FAMILY_NAME, (void *)name, | ||
158 | strlen(TASKSTATS_GENL_NAME)+1); | ||
159 | |||
160 | rep_len = recv(sd, &ans, sizeof(ans), 0); | ||
161 | if (rep_len < 0 || ans.n.nlmsg_type == NLMSG_ERROR || | ||
162 | !NLMSG_OK((&ans.n), rep_len)) | ||
163 | return 0; | ||
164 | |||
165 | na = (struct nlattr *) GENLMSG_DATA(&ans); | ||
166 | na = (struct nlattr *) ((char *) na + NLA_ALIGN(na->nla_len)); | ||
167 | if (na->nla_type == CTRL_ATTR_FAMILY_ID) { | ||
168 | id = *(__u16 *) NLA_DATA(na); | ||
169 | } | ||
170 | return id; | ||
171 | } | ||
172 | |||
173 | void print_delayacct(struct taskstats *t) | ||
174 | { | ||
175 | printf("\n\nCPU %15s%15s%15s%15s\n" | ||
176 | " %15llu%15llu%15llu%15llu\n" | ||
177 | "IO %15s%15s\n" | ||
178 | " %15llu%15llu\n" | ||
179 | "MEM %15s%15s\n" | ||
180 | " %15llu%15llu\n\n", | ||
181 | "count", "real total", "virtual total", "delay total", | ||
182 | t->cpu_count, t->cpu_run_real_total, t->cpu_run_virtual_total, | ||
183 | t->cpu_delay_total, | ||
184 | "count", "delay total", | ||
185 | t->blkio_count, t->blkio_delay_total, | ||
186 | "count", "delay total", t->swapin_count, t->swapin_delay_total); | ||
187 | } | ||
188 | |||
189 | int main(int argc, char *argv[]) | ||
190 | { | ||
191 | int c, rc, rep_len, aggr_len, len2, cmd_type; | ||
192 | __u16 id; | ||
193 | __u32 mypid; | ||
194 | |||
195 | struct nlattr *na; | ||
196 | int nl_sd = -1; | ||
197 | int len = 0; | ||
198 | pid_t tid = 0; | ||
199 | pid_t rtid = 0; | ||
200 | |||
201 | int fd = 0; | ||
202 | int count = 0; | ||
203 | int write_file = 0; | ||
204 | int maskset = 0; | ||
205 | char logfile[MAX_FILENAME + 1] = ""; | ||
206 | int loop = 0; | ||
207 | |||
208 | struct msgtemplate msg; | ||
209 | |||
210 | while (1) { | ||
211 | c = getopt(argc, argv, "dw:r:m:t:p:vl"); | ||
212 | if (c < 0) | ||
213 | break; | ||
214 | |||
215 | switch (c) { | ||
216 | case 'd': | ||
217 | printf("print delayacct stats ON\n"); | ||
218 | print_delays = 1; | ||
219 | break; | ||
220 | case 'w': | ||
221 | strncpy(logfile, optarg, MAX_FILENAME); | ||
222 | printf("write to file %s\n", logfile); | ||
223 | write_file = 1; | ||
224 | break; | ||
225 | case 'r': | ||
226 | rcvbufsz = atoi(optarg); | ||
227 | printf("receive buf size %d\n", rcvbufsz); | ||
228 | if (rcvbufsz < 0) | ||
229 | err(1, "Invalid rcv buf size\n"); | ||
230 | break; | ||
231 | case 'm': | ||
232 | strncpy(cpumask, optarg, sizeof(cpumask) - 1); | ||
233 | maskset = 1; | ||
234 | printf("cpumask %s maskset %d\n", cpumask, maskset); | ||
235 | break; | ||
236 | case 't': | ||
237 | tid = atoi(optarg); | ||
238 | if (!tid) | ||
239 | err(1, "Invalid tgid\n"); | ||
240 | cmd_type = TASKSTATS_CMD_ATTR_TGID; | ||
241 | print_delays = 1; | ||
242 | break; | ||
243 | case 'p': | ||
244 | tid = atoi(optarg); | ||
245 | if (!tid) | ||
246 | err(1, "Invalid pid\n"); | ||
247 | cmd_type = TASKSTATS_CMD_ATTR_PID; | ||
248 | print_delays = 1; | ||
249 | break; | ||
250 | case 'v': | ||
251 | printf("debug on\n"); | ||
252 | dbg = 1; | ||
253 | break; | ||
254 | case 'l': | ||
255 | printf("listen forever\n"); | ||
256 | loop = 1; | ||
257 | break; | ||
258 | default: | ||
259 | printf("Unknown option %d\n", c); | ||
260 | exit(-1); | ||
261 | } | ||
262 | } | ||
263 | |||
264 | if (write_file) { | ||
265 | fd = open(logfile, O_WRONLY | O_CREAT | O_TRUNC, | ||
266 | S_IRUSR | S_IWUSR | S_IRGRP | S_IROTH); | ||
267 | if (fd == -1) { | ||
268 | perror("Cannot open output file"); | ||
269 | exit(1); | ||
270 | } | ||
271 | } | ||
272 | |||
273 | if ((nl_sd = create_nl_socket(NETLINK_GENERIC)) < 0) | ||
274 | err(1, "error creating Netlink socket\n"); | ||
275 | |||
276 | |||
277 | mypid = getpid(); | ||
278 | id = get_family_id(nl_sd); | ||
279 | if (!id) { | ||
280 | printf("Error getting family id, errno %d\n", errno); | ||
281 | goto err; | ||
282 | } | ||
283 | PRINTF("family id %d\n", id); | ||
284 | |||
285 | if (maskset) { | ||
286 | rc = send_cmd(nl_sd, id, mypid, TASKSTATS_CMD_GET, | ||
287 | TASKSTATS_CMD_ATTR_REGISTER_CPUMASK, | ||
288 | cpumask, strlen(cpumask) + 1); | ||
289 | PRINTF("Sent register cpumask, retval %d\n", rc); | ||
290 | if (rc < 0) { | ||
291 | printf("error sending register cpumask\n"); | ||
292 | goto err; | ||
293 | } | ||
294 | } | ||
295 | |||
296 | if (tid) { | ||
297 | rc = send_cmd(nl_sd, id, mypid, TASKSTATS_CMD_GET, | ||
298 | cmd_type, &tid, sizeof(__u32)); | ||
299 | PRINTF("Sent pid/tgid, retval %d\n", rc); | ||
300 | if (rc < 0) { | ||
301 | printf("error sending tid/tgid cmd\n"); | ||
302 | goto done; | ||
303 | } | ||
304 | } | ||
305 | |||
306 | do { | ||
307 | int i; | ||
308 | |||
309 | rep_len = recv(nl_sd, &msg, sizeof(msg), 0); | ||
310 | PRINTF("received %d bytes\n", rep_len); | ||
311 | |||
312 | if (rep_len < 0) { | ||
313 | printf("nonfatal reply error: errno %d\n", errno); | ||
314 | continue; | ||
315 | } | ||
316 | if (msg.n.nlmsg_type == NLMSG_ERROR || | ||
317 | !NLMSG_OK((&msg.n), rep_len)) { | ||
318 | printf("fatal reply error, errno %d\n", errno); | ||
319 | goto done; | ||
320 | } | ||
321 | |||
322 | PRINTF("nlmsghdr size=%zu, nlmsg_len=%d, rep_len=%d\n", | ||
323 | sizeof(struct nlmsghdr), msg.n.nlmsg_len, rep_len); | ||
324 | |||
325 | |||
326 | rep_len = GENLMSG_PAYLOAD(&msg.n); | ||
327 | |||
328 | na = (struct nlattr *) GENLMSG_DATA(&msg); | ||
329 | len = 0; | ||
330 | i = 0; | ||
331 | while (len < rep_len) { | ||
332 | len += NLA_ALIGN(na->nla_len); | ||
333 | switch (na->nla_type) { | ||
334 | case TASKSTATS_TYPE_AGGR_TGID: | ||
335 | /* Fall through */ | ||
336 | case TASKSTATS_TYPE_AGGR_PID: | ||
337 | aggr_len = NLA_PAYLOAD(na->nla_len); | ||
338 | len2 = 0; | ||
339 | /* For nested attributes, na follows */ | ||
340 | na = (struct nlattr *) NLA_DATA(na); | ||
341 | done = 0; | ||
342 | while (len2 < aggr_len) { | ||
343 | switch (na->nla_type) { | ||
344 | case TASKSTATS_TYPE_PID: | ||
345 | rtid = *(int *) NLA_DATA(na); | ||
346 | if (print_delays) | ||
347 | printf("PID\t%d\n", rtid); | ||
348 | break; | ||
349 | case TASKSTATS_TYPE_TGID: | ||
350 | rtid = *(int *) NLA_DATA(na); | ||
351 | if (print_delays) | ||
352 | printf("TGID\t%d\n", rtid); | ||
353 | break; | ||
354 | case TASKSTATS_TYPE_STATS: | ||
355 | count++; | ||
356 | if (print_delays) | ||
357 | print_delayacct((struct taskstats *) NLA_DATA(na)); | ||
358 | if (fd) { | ||
359 | if (write(fd, NLA_DATA(na), na->nla_len) < 0) { | ||
360 | err(1,"write error\n"); | ||
361 | } | ||
362 | } | ||
363 | if (!loop) | ||
364 | goto done; | ||
365 | break; | ||
366 | default: | ||
367 | printf("Unknown nested nla_type %d\n", na->nla_type); | ||
368 | break; | ||
369 | } | ||
370 | len2 += NLA_ALIGN(na->nla_len); | ||
371 | na = (struct nlattr *) ((char *) na + NLA_ALIGN(na->nla_len)); | ||
372 | } | ||
373 | break; | ||
374 | |||
375 | default: | ||
376 | printf("Unknown nla_type %d\n", na->nla_type); | ||
377 | break; | ||
378 | } | ||
379 | na = (struct nlattr *) ((char *) GENLMSG_DATA(&msg) + len); | ||
380 | } | ||
381 | } while (loop); | ||
382 | done: | ||
383 | if (maskset) { | ||
384 | rc = send_cmd(nl_sd, id, mypid, TASKSTATS_CMD_GET, | ||
385 | TASKSTATS_CMD_ATTR_DEREGISTER_CPUMASK, | ||
386 | cpumask, strlen(cpumask) + 1); | ||
387 | printf("Sent deregister mask, retval %d\n", rc); | ||
388 | if (rc < 0) | ||
389 | err(rc, "error sending deregister cpumask\n"); | ||
390 | } | ||
391 | err: | ||
392 | close(nl_sd); | ||
393 | if (fd) | ||
394 | close(fd); | ||
395 | return 0; | ||
396 | } | ||
diff --git a/Documentation/accounting/taskstats.txt b/Documentation/accounting/taskstats.txt new file mode 100644 index 000000000000..92ebf29e9041 --- /dev/null +++ b/Documentation/accounting/taskstats.txt | |||
@@ -0,0 +1,181 @@ | |||
1 | Per-task statistics interface | ||
2 | ----------------------------- | ||
3 | |||
4 | |||
5 | Taskstats is a netlink-based interface for sending per-task and | ||
6 | per-process statistics from the kernel to userspace. | ||
7 | |||
8 | Taskstats was designed for the following benefits: | ||
9 | |||
10 | - efficiently provide statistics during lifetime of a task and on its exit | ||
11 | - unified interface for multiple accounting subsystems | ||
12 | - extensibility for use by future accounting patches | ||
13 | |||
14 | Terminology | ||
15 | ----------- | ||
16 | |||
17 | "pid", "tid" and "task" are used interchangeably and refer to the standard | ||
18 | Linux task defined by struct task_struct. per-pid stats are the same as | ||
19 | per-task stats. | ||
20 | |||
21 | "tgid", "process" and "thread group" are used interchangeably and refer to the | ||
22 | tasks that share an mm_struct i.e. the traditional Unix process. Despite the | ||
23 | use of tgid, there is no special treatment for the task that is thread group | ||
24 | leader - a process is deemed alive as long as it has any task belonging to it. | ||
25 | |||
26 | Usage | ||
27 | ----- | ||
28 | |||
29 | To get statistics during a task's lifetime, userspace opens a unicast netlink | ||
30 | socket (NETLINK_GENERIC family) and sends commands specifying a pid or a tgid. | ||
31 | The response contains statistics for a task (if pid is specified) or the sum of | ||
32 | statistics for all tasks of the process (if tgid is specified). | ||
33 | |||
34 | To obtain statistics for tasks which are exiting, the userspace listener | ||
35 | sends a register command and specifies a cpumask. Whenever a task exits on | ||
36 | one of the cpus in the cpumask, its per-pid statistics are sent to the | ||
37 | registered listener. Using cpumasks limits the data received by any one | ||
38 | listener and assists in flow control over the netlink interface, as | ||
39 | explained in more detail below. | ||
40 | |||
41 | If the exiting task is the last thread exiting its thread group, | ||
42 | an additional record containing the per-tgid stats is also sent to userspace. | ||
43 | The latter contains the sum of per-pid stats for all threads in the thread | ||
44 | group, both past and present. | ||
45 | |||
46 | getdelays.c is a simple utility demonstrating usage of the taskstats interface | ||
47 | for reporting delay accounting statistics. Users can register cpumasks, | ||
48 | send commands and process responses, listen for per-tid/tgid exit data, | ||
49 | write the data received to a file and do basic flow control by increasing | ||
50 | receive buffer sizes. | ||
51 | |||
52 | Interface | ||
53 | --------- | ||
54 | |||
55 | The user-kernel interface is encapsulated in include/linux/taskstats.h. | ||
56 | |||
57 | To avoid this documentation becoming obsolete as the interface evolves, only | ||
58 | an outline of the current version is given. taskstats.h always overrides the | ||
59 | description here. | ||
60 | |||
61 | struct taskstats is the common accounting structure for both per-pid and | ||
62 | per-tgid data. It is versioned and can be extended by each accounting subsystem | ||
63 | that is added to the kernel. The fields and their semantics are defined in the | ||
64 | taskstats.h file. | ||
65 | |||
66 | The data exchanged between user and kernel space is a netlink message belonging | ||
67 | to the NETLINK_GENERIC family and using the netlink attributes interface. | ||
68 | The messages are in the format | ||
69 | |||
70 | +----------+- - -+-------------+-------------------+ | ||
71 | | nlmsghdr | Pad | genlmsghdr | taskstats payload | | ||
72 | +----------+- - -+-------------+-------------------+ | ||
73 | |||
74 | |||
75 | The taskstats payload is one of the following three kinds: | ||
76 | |||
77 | 1. Commands: Sent from user to kernel. Commands to get data on | ||
78 | a pid/tgid consist of one attribute, of type TASKSTATS_CMD_ATTR_PID/TGID, | ||
79 | containing a u32 pid or tgid in the attribute payload. The pid/tgid denotes | ||
80 | the task/process for which userspace wants statistics. | ||
81 | |||
82 | Commands to register/deregister interest in exit data from a set of cpus | ||
83 | consist of one attribute, of type | ||
84 | TASKSTATS_CMD_ATTR_REGISTER/DEREGISTER_CPUMASK and contain a cpumask in the | ||
85 | attribute payload. The cpumask is specified as an ascii string of | ||
86 | comma-separated cpu ranges e.g. to listen to exit data from cpus 1,2,3,5,7,8 | ||
87 | the cpumask would be "1-3,5,7-8". If userspace forgets to deregister interest | ||
88 | in cpus before closing the listening socket, the kernel cleans up its interest | ||
89 | set over time. However, for the sake of efficiency, an explicit deregistration | ||
90 | is advisable. | ||
91 | |||
92 | 2. Response for a command: sent from the kernel in response to a userspace | ||
93 | command. The payload is a series of three attributes of type: | ||
94 | |||
95 | a) TASKSTATS_TYPE_AGGR_PID/TGID : attribute containing no payload, indicating | ||
96 | that a pid/tgid will be followed by some stats. | ||
97 | |||
98 | b) TASKSTATS_TYPE_PID/TGID: attribute whose payload is the pid/tgid whose stats | ||
99 | are being returned. | ||
100 | |||
101 | c) TASKSTATS_TYPE_STATS: attribute with a struct taskstats as payload. The | ||
102 | same structure is used for both per-pid and per-tgid stats. | ||
103 | |||
104 | 3. New message sent by kernel whenever a task exits. The payload consists of a | ||
105 | series of attributes of the following type: | ||
106 | |||
107 | a) TASKSTATS_TYPE_AGGR_PID: indicates next two attributes will be pid+stats | ||
108 | b) TASKSTATS_TYPE_PID: contains exiting task's pid | ||
109 | c) TASKSTATS_TYPE_STATS: contains the exiting task's per-pid stats | ||
110 | d) TASKSTATS_TYPE_AGGR_TGID: indicates next two attributes will be tgid+stats | ||
111 | e) TASKSTATS_TYPE_TGID: contains tgid of process to which task belongs | ||
112 | f) TASKSTATS_TYPE_STATS: contains the per-tgid stats for exiting task's process | ||
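The cpumask string format described above for the register/deregister commands (an ascii string of comma-separated cpu ranges such as "1-3,5,7-8") can be illustrated with a small userspace helper. This is a sketch under stated assumptions, not the kernel's parser: cpulist_to_mask() is a hypothetical name, and the sketch only handles cpus 0 to 63 so the result fits in one 64-bit word. A listener would still send the string itself as the attribute payload; the helper merely shows which cpus a given string names.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Hypothetical helper: expand a cpu list such as "1-3,5,7-8" into a
 * bitmask in which bit n stands for cpu n.
 * Returns 0 on success, -1 on malformed input or a cpu number >= 64.
 */
static int cpulist_to_mask(const char *s, unsigned long long *mask)
{
	assert(s && mask);
	*mask = 0;

	while (*s) {
		char *end;
		long lo, hi;

		/* First number of the range (or the only number). */
		lo = hi = strtol(s, &end, 10);
		if (end == s)
			return -1;

		/* Optional "-hi" part of a range. */
		if (*end == '-') {
			s = end + 1;
			hi = strtol(s, &end, 10);
			if (end == s)
				return -1;
		}
		if (lo < 0 || hi < lo || hi >= 64)
			return -1;

		for (; lo <= hi; lo++)
			*mask |= 1ULL << lo;

		/* Ranges are separated by commas; anything else is an error. */
		if (*end == ',')
			end++;
		else if (*end != '\0')
			return -1;
		s = end;
	}
	return 0;
}
```

For the example string from the text, "1-3,5,7-8", the helper sets bits 1, 2, 3, 5, 7 and 8.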
113 | |||
114 | |||
115 | per-tgid stats | ||
116 | -------------- | ||
117 | |||
118 | Taskstats provides per-process stats, in addition to per-task stats, since | ||
119 | resource management is often done at a process granularity and aggregating task | ||
120 | stats in userspace alone is inefficient and potentially inaccurate (due to lack | ||
121 | of atomicity). | ||
122 | |||
123 | However, maintaining per-process, in addition to per-task stats, within the | ||
124 | kernel has space and time overheads. To address this, the taskstats code | ||
125 | accumulates each exiting task's statistics into a process-wide data structure. | ||
126 | When the last task of a process exits, the process level data accumulated also | ||
127 | gets sent to userspace (along with the per-task data). | ||
128 | |||
129 | When a user queries to get per-tgid data, the stats of all live threads in | ||
130 | the group are summed and added to the accumulated total for previously exited | ||
131 | threads of the same thread group. | ||
132 | |||
133 | Extending taskstats | ||
134 | ------------------- | ||
135 | |||
136 | There are two ways to extend the taskstats interface to export more | ||
137 | per-task/process stats as patches to collect them get added to the kernel | ||
138 | in future: | ||
139 | |||
140 | 1. Adding more fields to the end of the existing struct taskstats. Backward | ||
141 | compatibility is ensured by the version number within the | ||
142 | structure. Userspace will use only the fields of the struct that correspond | ||
143 | to the version it is using. | ||
144 | |||
145 | 2. Defining separate statistic structs and using the netlink attributes | ||
146 | interface to return them. Since userspace processes each netlink attribute | ||
147 | independently, it can always ignore attributes whose type it does not | ||
148 | understand (because it is using an older version of the interface). | ||
149 | |||
150 | |||
151 | Choosing between 1. and 2. is a matter of trading off flexibility and | ||
152 | overhead. If only a few fields need to be added, then 1. is the preferable | ||
153 | path since the kernel and userspace don't need to incur the overhead of | ||
154 | processing new netlink attributes. But if the new fields expand the existing | ||
155 | struct too much, requiring disparate userspace accounting utilities to | ||
156 | unnecessarily receive large structures whose fields are of no interest, then | ||
157 | extending the attributes structure would be worthwhile. | ||
158 | |||
159 | Flow control for taskstats | ||
160 | -------------------------- | ||
161 | |||
162 | When the rate of task exits becomes large, a listener may not be able to keep | ||
163 | up with the kernel's rate of sending per-tid/tgid exit data, leading to data | ||
164 | loss. This possibility gets compounded when the taskstats structure gets | ||
165 | extended and the number of cpus grows large. | ||
166 | |||
167 | To avoid losing statistics, userspace should do one or more of the following: | ||
168 | |||
169 | - increase the receive buffer sizes for the netlink sockets opened by | ||
170 | listeners to receive exit data. | ||
171 | |||
172 | - create more listeners and reduce the number of cpus being listened to by | ||
173 | each listener. In the extreme case, there could be one listener for each cpu. | ||
174 | Users may also consider setting the cpu affinity of the listener to the subset | ||
175 | of cpus to which it listens, especially if they are listening to just one cpu. | ||
176 | |||
177 | Despite these measures, if userspace receives ENOBUFS error messages | ||
178 | indicating overflow of receive buffers, it should take measures to handle the | ||
179 | loss of data. | ||
180 | |||
181 | ---- | ||
diff --git a/Documentation/arm/IXP4xx b/Documentation/arm/IXP4xx index d4c6d3aa0c25..43edb4ecf27d 100644 --- a/Documentation/arm/IXP4xx +++ b/Documentation/arm/IXP4xx | |||
@@ -85,7 +85,7 @@ IXP4xx provides two methods of accessing PCI memory space: | |||
85 | 2) If > 64MB of memory space is required, the IXP4xx can be | 85 | 2) If > 64MB of memory space is required, the IXP4xx can be |
86 | configured to use indirect registers to access PCI. This allows | 86 | configured to use indirect registers to access PCI. This allows |
87 | for up to 128MB (0x48000000 to 0x4fffffff) of memory on the bus. | 87 | for up to 128MB (0x48000000 to 0x4fffffff) of memory on the bus. |
88 | The disadvantadge of this is that every PCI access requires | 88 | The disadvantage of this is that every PCI access requires |
89 | three local register accesses plus a spinlock, but in some | 89 | three local register accesses plus a spinlock, but in some |
90 | cases the performance hit is acceptable. In addition, you cannot | 90 | cases the performance hit is acceptable. In addition, you cannot |
91 | mmap() PCI devices in this case due to the indirect nature | 91 | mmap() PCI devices in this case due to the indirect nature |
diff --git a/Documentation/arm/Samsung-S3C24XX/Overview.txt b/Documentation/arm/Samsung-S3C24XX/Overview.txt index 8c6ee684174c..3e46d2a31158 100644 --- a/Documentation/arm/Samsung-S3C24XX/Overview.txt +++ b/Documentation/arm/Samsung-S3C24XX/Overview.txt | |||
@@ -7,11 +7,13 @@ Introduction | |||
7 | ------------ | 7 | ------------ |
8 | 8 | ||
9 | The Samsung S3C24XX range of ARM9 System-on-Chip CPUs are supported | 9 | The Samsung S3C24XX range of ARM9 System-on-Chip CPUs are supported |
10 | by the 's3c2410' architecture of ARM Linux. Currently the S3C2410 and | 10 | by the 's3c2410' architecture of ARM Linux. Currently the S3C2410, |
11 | the S3C2440 are supported CPUs. | 11 | S3C2440 and S3C2442 devices are supported. |
12 | 12 | ||
13 | Support for the S3C2400 series is in progress. | 13 | Support for the S3C2400 series is in progress. |
14 | 14 | ||
15 | Support for the S3C2412 and S3C2413 CPUs is being merged. | ||
16 | |||
15 | 17 | ||
16 | Configuration | 18 | Configuration |
17 | ------------- | 19 | ------------- |
@@ -43,9 +45,18 @@ Machines | |||
43 | 45 | ||
44 | Samsung's own development board, geared for PDA work. | 46 | Samsung's own development board, geared for PDA work. |
45 | 47 | ||
48 | Samsung/Aiji SMDK2412 | ||
49 | |||
50 | The S3C2412 version of the SMDK2440. | ||
51 | |||
52 | Samsung/Aiji SMDK2413 | ||
53 | |||
54 | The S3C2413 version of the SMDK2440. | ||
55 | |||
46 | Samsung/Meritech SMDK2440 | 56 | Samsung/Meritech SMDK2440 |
47 | 57 | ||
48 | The S3C2440 compatible version of the SMDK2440 | 58 | The S3C2440 compatible version of the SMDK2440, which has the |
59 | option of an S3C2440 or S3C2442 CPU module. | ||
49 | 60 | ||
50 | Thorcom VR1000 | 61 | Thorcom VR1000 |
51 | 62 | ||
@@ -211,24 +222,6 @@ Port Contributors | |||
211 | Lucas Correia Villa Real (S3C2400 port) | 222 | Lucas Correia Villa Real (S3C2400 port) |
212 | 223 | ||
213 | 224 | ||
214 | Document Changes | ||
215 | ---------------- | ||
216 | |||
217 | 05 Sep 2004 - BJD - Added Document Changes section | ||
218 | 05 Sep 2004 - BJD - Added Klaus Fetscher to list of contributors | ||
219 | 25 Oct 2004 - BJD - Added Dimitry Andric to list of contributors | ||
220 | 25 Oct 2004 - BJD - Updated the MTD from the 2.6.9 merge | ||
221 | 21 Jan 2005 - BJD - Added rx3715, added Shannon to contributors | ||
222 | 10 Feb 2005 - BJD - Added Guillaume Gourat to contributors | ||
223 | 02 Mar 2005 - BJD - Added SMDK2440 to list of machines | ||
224 | 06 Mar 2005 - BJD - Added Christer Weinigel | ||
225 | 08 Mar 2005 - BJD - Added LCVR to list of people, updated introduction | ||
226 | 08 Mar 2005 - BJD - Added section on adding machines | ||
227 | 09 Sep 2005 - BJD - Added section on platform data | ||
228 | 11 Feb 2006 - BJD - Added I2C, RTC and Watchdog sections | ||
229 | 11 Feb 2006 - BJD - Added Osiris machine, and S3C2400 information | ||
230 | |||
231 | |||
232 | Document Author | 225 | Document Author |
233 | --------------- | 226 | --------------- |
234 | 227 | ||
diff --git a/Documentation/arm/Samsung-S3C24XX/S3C2412.txt b/Documentation/arm/Samsung-S3C24XX/S3C2412.txt new file mode 100644 index 000000000000..cb82a7fc7901 --- /dev/null +++ b/Documentation/arm/Samsung-S3C24XX/S3C2412.txt | |||
@@ -0,0 +1,120 @@ | |||
1 | S3C2412 ARM Linux Overview | ||
2 | ========================== | ||
3 | |||
4 | Introduction | ||
5 | ------------ | ||
6 | |||
7 | The S3C2412 is part of the S3C24XX range of ARM9 System-on-Chip CPUs | ||
8 | from Samsung. This part has an ARM926EJ-S core, capable of running up | ||
9 | to 266MHz (see data-sheet for more information). | ||
10 | |||
11 | |||
12 | Clock | ||
13 | ----- | ||
14 | |||
15 | The core clock code provides a set of clocks to the drivers, and allows | ||
16 | for source selection and a number of other features. | ||
17 | |||
18 | |||
19 | Power | ||
20 | ----- | ||
21 | |||
22 | No support for suspend/resume to RAM in the current system. | ||
23 | |||
24 | |||
25 | DMA | ||
26 | --- | ||
27 | |||
28 | No current support for DMA. | ||
29 | |||
30 | |||
31 | GPIO | ||
32 | ---- | ||
33 | |||
34 | There is support for setting the GPIO to input/output/special function | ||
35 | and reading or writing to them. | ||
36 | |||
37 | |||
38 | UART | ||
39 | ---- | ||
40 | |||
41 | The UART hardware is similar to the S3C2440, and is supported by the | ||
42 | s3c2410 driver in the drivers/serial directory. | ||
43 | |||
44 | |||
45 | NAND | ||
46 | ---- | ||
47 | |||
48 | The NAND hardware is similar to the S3C2440, and is supported by the | ||
49 | s3c2410 driver in the drivers/mtd/nand directory. | ||
50 | |||
51 | |||
52 | USB Host | ||
53 | -------- | ||
54 | |||
55 | The USB hardware is similar to the S3C2410, with extended clock source | ||
56 | control. The OHCI portion is supported by the ohci-s3c2410 driver, and | ||
57 | the clock control selection is supported by the core clock code. | ||
58 | |||
59 | |||
60 | USB Device | ||
61 | ---------- | ||
62 | |||
63 | No current support in the kernel. | ||
64 | |||
65 | |||
66 | IRQs | ||
67 | ---- | ||
68 | |||
69 | All the standard and external interrupt sources are supported. The | ||
70 | extra sub-sources are not yet supported. | ||
71 | |||
72 | |||
73 | RTC | ||
74 | --- | ||
75 | |||
76 | The RTC hardware is similar to the S3C2410, and is supported by the | ||
77 | s3c2410-rtc driver. | ||
78 | |||
79 | |||
80 | Watchdog | ||
81 | -------- | ||
82 | |||
83 | The watchdog hardware is the same as the S3C2410, and is supported by | ||
84 | the s3c2410_wdt driver. | ||
85 | |||
86 | |||
87 | MMC/SD/SDIO | ||
88 | ----------- | ||
89 | |||
90 | No current support for the MMC/SD/SDIO block. | ||
91 | |||
92 | IIC | ||
93 | --- | ||
94 | |||
95 | The IIC hardware is the same as the S3C2410, and is supported by the | ||
96 | i2c-s3c24xx driver. | ||
97 | |||
98 | |||
99 | IIS | ||
100 | --- | ||
101 | |||
102 | No current support for the IIS interface. | ||
103 | |||
104 | |||
105 | SPI | ||
106 | --- | ||
107 | |||
108 | No current support for the SPI interfaces. | ||
109 | |||
110 | |||
111 | ATA | ||
112 | --- | ||
113 | |||
114 | No current support for the on-board ATA block. | ||
115 | |||
116 | |||
117 | Document Author | ||
118 | --------------- | ||
119 | |||
120 | Ben Dooks, (c) 2006 Simtec Electronics | ||
diff --git a/Documentation/arm/Samsung-S3C24XX/S3C2413.txt b/Documentation/arm/Samsung-S3C24XX/S3C2413.txt new file mode 100644 index 000000000000..ab2a88858f12 --- /dev/null +++ b/Documentation/arm/Samsung-S3C24XX/S3C2413.txt | |||
@@ -0,0 +1,21 @@ | |||
1 | S3C2413 ARM Linux Overview | ||
2 | ========================== | ||
3 | |||
4 | Introduction | ||
5 | ------------ | ||
6 | |||
7 | The S3C2413 is an extended version of the S3C2412, with a camera | ||
8 | interface and mobile DDR memory support. See the S3C2412 support | ||
9 | documentation for more information. | ||
10 | |||
11 | |||
12 | Camera Interface | ||
13 | ---------------- | ||
14 | |||
15 | This block is currently not supported. | ||
16 | |||
17 | |||
18 | Document Author | ||
19 | --------------- | ||
20 | |||
21 | Ben Dooks, (c) 2006 Simtec Electronics | ||
diff --git a/Documentation/atomic_ops.txt b/Documentation/atomic_ops.txt index 23a1c2402bcc..2a63d5662a93 100644 --- a/Documentation/atomic_ops.txt +++ b/Documentation/atomic_ops.txt | |||
@@ -157,13 +157,13 @@ For example, smp_mb__before_atomic_dec() can be used like so: | |||
157 | smp_mb__before_atomic_dec(); | 157 | smp_mb__before_atomic_dec(); |
158 | atomic_dec(&obj->ref_count); | 158 | atomic_dec(&obj->ref_count); |
159 | 159 | ||
160 | It makes sure that all memory operations preceeding the atomic_dec() | 160 | It makes sure that all memory operations preceding the atomic_dec() |
161 | call are strongly ordered with respect to the atomic counter | 161 | call are strongly ordered with respect to the atomic counter |
162 | operation. In the above example, it guarentees that the assignment of | 162 | operation. In the above example, it guarantees that the assignment of |
163 | "1" to obj->dead will be globally visible to other cpus before the | 163 | "1" to obj->dead will be globally visible to other cpus before the |
164 | atomic counter decrement. | 164 | atomic counter decrement. |
165 | 165 | ||
166 | Without the explicitl smp_mb__before_atomic_dec() call, the | 166 | Without the explicit smp_mb__before_atomic_dec() call, the |
167 | implementation could legally make the atomic counter update visible | 167 | implementation could legally make the atomic counter update visible |
168 | to other cpus before the "obj->dead = 1;" assignment. | 168 | to other cpus before the "obj->dead = 1;" assignment. |
169 | 169 | ||
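For readers more familiar with user-space code, the ordering described in this hunk can be approximated with C11 atomics. The sketch below is an analogue only, not kernel code: struct obj and obj_mark_dead_and_put() are hypothetical names, and atomic_thread_fence() merely plays the role that smp_mb__before_atomic_dec() plays in the kernel example.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical object with a "dead" flag and a reference count. */
struct obj {
	int dead;
	atomic_int ref_count;
};

/* Mark the object dead, then drop one reference. */
static void obj_mark_dead_and_put(struct obj *obj)
{
	assert(obj != NULL);

	obj->dead = 1;

	/*
	 * Full fence: orders the store to obj->dead before the relaxed
	 * decrement below, much as smp_mb__before_atomic_dec() does for
	 * atomic_dec() in the kernel example.
	 */
	atomic_thread_fence(memory_order_seq_cst);

	atomic_fetch_sub_explicit(&obj->ref_count, 1, memory_order_relaxed);
}
```

A thread that observes the decremented count through an acquire load (or after its own fence) can then rely on seeing obj->dead == 1.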
@@ -173,11 +173,11 @@ ordering with respect to memory operations after an atomic_dec() call | |||
173 | (smp_mb__{before,after}_atomic_inc()). | 173 | (smp_mb__{before,after}_atomic_inc()). |
174 | 174 | ||
175 | A missing memory barrier in the cases where they are required by the | 175 | A missing memory barrier in the cases where they are required by the |
176 | atomic_t implementation above can have disasterous results. Here is | 176 | atomic_t implementation above can have disastrous results. Here is |
177 | an example, which follows a pattern occuring frequently in the Linux | 177 | an example, which follows a pattern occurring frequently in the Linux |
178 | kernel. It is the use of atomic counters to implement reference | 178 | kernel. It is the use of atomic counters to implement reference |
179 | counting, and it works such that once the counter falls to zero it can | 179 | counting, and it works such that once the counter falls to zero it can |
180 | be guarenteed that no other entity can be accessing the object: | 180 | be guaranteed that no other entity can be accessing the object: |
181 | 181 | ||
182 | static void obj_list_add(struct obj *obj) | 182 | static void obj_list_add(struct obj *obj) |
183 | { | 183 | { |
@@ -291,9 +291,9 @@ to the size of an "unsigned long" C data type, and are least of that | |||
291 | size. The endianness of the bits within each "unsigned long" is the | 291 | size. The endianness of the bits within each "unsigned long" is the |
292 | native endianness of the cpu. | 292 | native endianness of the cpu. |
293 | 293 | ||
294 | void set_bit(unsigned long nr, volatils unsigned long *addr); | 294 | void set_bit(unsigned long nr, volatile unsigned long *addr); |
295 | void clear_bit(unsigned long nr, volatils unsigned long *addr); | 295 | void clear_bit(unsigned long nr, volatile unsigned long *addr); |
296 | void change_bit(unsigned long nr, volatils unsigned long *addr); | 296 | void change_bit(unsigned long nr, volatile unsigned long *addr); |
297 | 297 | ||
298 | These routines set, clear, and change, respectively, the bit number | 298 | These routines set, clear, and change, respectively, the bit number |
299 | indicated by "nr" on the bit mask pointed to by "addr". | 299 | indicated by "nr" on the bit mask pointed to by "addr". |
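The semantics of these three routines can be illustrated in user space with C11 atomics. This is a sketch under stated assumptions, not the kernel implementation: the sketch_* names and BITS_PER_ULONG are hypothetical, and relaxed ordering is used to mirror the absence of implicit memory barrier semantics.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stddef.h>

#define BITS_PER_ULONG (sizeof(unsigned long) * CHAR_BIT)

/* Atomically set bit nr in the bitmap starting at addr. */
static void sketch_set_bit(unsigned long nr, _Atomic unsigned long *addr)
{
	assert(addr != NULL);
	atomic_fetch_or_explicit(&addr[nr / BITS_PER_ULONG],
				 1UL << (nr % BITS_PER_ULONG),
				 memory_order_relaxed);
}

/* Atomically clear bit nr in the bitmap starting at addr. */
static void sketch_clear_bit(unsigned long nr, _Atomic unsigned long *addr)
{
	assert(addr != NULL);
	atomic_fetch_and_explicit(&addr[nr / BITS_PER_ULONG],
				  ~(1UL << (nr % BITS_PER_ULONG)),
				  memory_order_relaxed);
}

/* Atomically toggle bit nr in the bitmap starting at addr. */
static void sketch_change_bit(unsigned long nr, _Atomic unsigned long *addr)
{
	assert(addr != NULL);
	atomic_fetch_xor_explicit(&addr[nr / BITS_PER_ULONG],
				  1UL << (nr % BITS_PER_ULONG),
				  memory_order_relaxed);
}
```

Bit numbers larger than one word simply index into later "unsigned long" elements, which matches the multi-word bitmap layout described above.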
@@ -301,9 +301,9 @@ indicated by "nr" on the bit mask pointed to by "ADDR". | |||
301 | They must execute atomically, yet there are no implicit memory barrier | 301 | They must execute atomically, yet there are no implicit memory barrier |
302 | semantics required of these interfaces. | 302 | semantics required of these interfaces. |
303 | 303 | ||
304 | int test_and_set_bit(unsigned long nr, volatils unsigned long *addr); | 304 | int test_and_set_bit(unsigned long nr, volatile unsigned long *addr); |
305 | int test_and_clear_bit(unsigned long nr, volatils unsigned long *addr); | 305 | int test_and_clear_bit(unsigned long nr, volatile unsigned long *addr); |
306 | int test_and_change_bit(unsigned long nr, volatils unsigned long *addr); | 306 | int test_and_change_bit(unsigned long nr, volatile unsigned long *addr); |
307 | 307 | ||
308 | Like the above, except that these routines return a boolean which | 308 | Like the above, except that these routines return a boolean which |
309 | indicates whether the changed bit was set _BEFORE_ the atomic bit | 309 | indicates whether the changed bit was set _BEFORE_ the atomic bit |
@@ -335,7 +335,7 @@ subsequent memory operation is made visible. For example: | |||
335 | /* ... */; | 335 | /* ... */; |
336 | obj->killed = 1; | 336 | obj->killed = 1; |
337 | 337 | ||
338 | The implementation of test_and_set_bit() must guarentee that | 338 | The implementation of test_and_set_bit() must guarantee that |
339 | "obj->dead = 1;" is visible to cpus before the atomic memory operation | 339 | "obj->dead = 1;" is visible to cpus before the atomic memory operation |
340 | done by test_and_set_bit() becomes visible. Likewise, the atomic | 340 | done by test_and_set_bit() becomes visible. Likewise, the atomic |
341 | memory operation done by test_and_set_bit() must become visible before | 341 | memory operation done by test_and_set_bit() must become visible before |
@@ -474,7 +474,7 @@ Now, as far as memory barriers go, as long as spin_lock() | |||
474 | strictly orders all subsequent memory operations (including | 474 | strictly orders all subsequent memory operations (including |
475 | the cas()) with respect to itself, things will be fine. | 475 | the cas()) with respect to itself, things will be fine. |
476 | 476 | ||
477 | Said another way, _atomic_dec_and_lock() must guarentee that | 477 | Said another way, _atomic_dec_and_lock() must guarantee that |
478 | a counter dropping to zero is never made visible before the | 478 | a counter dropping to zero is never made visible before the |
479 | spinlock being acquired. | 479 | spinlock being acquired. |
480 | 480 | ||
diff --git a/Documentation/cciss.txt b/Documentation/cciss.txt index 15378422fc46..9c629ffa0e58 100644 --- a/Documentation/cciss.txt +++ b/Documentation/cciss.txt | |||
@@ -20,6 +20,7 @@ This driver is known to work with the following cards: | |||
20 | * SA P400i | 20 | * SA P400i |
21 | * SA E200 | 21 | * SA E200 |
22 | * SA E200i | 22 | * SA E200i |
23 | * SA E500 | ||
23 | 24 | ||
24 | If nodes are not already created in the /dev/cciss directory, run as root: | 25 | If nodes are not already created in the /dev/cciss directory, run as root: |
25 | 26 | ||
diff --git a/Documentation/connector/ucon.c b/Documentation/connector/ucon.c new file mode 100644 index 000000000000..d738cde2a8d5 --- /dev/null +++ b/Documentation/connector/ucon.c | |||
@@ -0,0 +1,206 @@ | |||
1 | /* | ||
2 | * ucon.c | ||
3 | * | ||
4 | * Copyright (c) 2004+ Evgeniy Polyakov <johnpol@2ka.mipt.ru> | ||
5 | * | ||
6 | * | ||
7 | * This program is free software; you can redistribute it and/or modify | ||
8 | * it under the terms of the GNU General Public License as published by | ||
9 | * the Free Software Foundation; either version 2 of the License, or | ||
10 | * (at your option) any later version. | ||
11 | * | ||
12 | * This program is distributed in the hope that it will be useful, | ||
13 | * but WITHOUT ANY WARRANTY; without even the implied warranty of | ||
14 | * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the | ||
15 | * GNU General Public License for more details. | ||
16 | * | ||
17 | * You should have received a copy of the GNU General Public License | ||
18 | * along with this program; if not, write to the Free Software | ||
19 | * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA | ||
20 | */ | ||
21 | |||
22 | #include <asm/types.h> | ||
23 | |||
24 | #include <sys/types.h> | ||
25 | #include <sys/socket.h> | ||
26 | #include <sys/poll.h> | ||
27 | |||
28 | #include <linux/netlink.h> | ||
29 | #include <linux/rtnetlink.h> | ||
30 | |||
31 | #include <arpa/inet.h> | ||
32 | |||
33 | #include <stdio.h> | ||
34 | #include <stdlib.h> | ||
35 | #include <unistd.h> | ||
36 | #include <string.h> | ||
37 | #include <errno.h> | ||
38 | #include <time.h> | ||
39 | |||
40 | #include <linux/connector.h> | ||
41 | |||
42 | #define DEBUG | ||
43 | #define NETLINK_CONNECTOR 11 | ||
44 | |||
45 | #ifdef DEBUG | ||
46 | #define ulog(f, a...) fprintf(stdout, f, ##a) | ||
47 | #else | ||
48 | #define ulog(f, a...) do {} while (0) | ||
49 | #endif | ||
50 | |||
51 | static int need_exit; | ||
52 | static __u32 seq; | ||
53 | |||
54 | static int netlink_send(int s, struct cn_msg *msg) | ||
55 | { | ||
56 | struct nlmsghdr *nlh; | ||
57 | unsigned int size; | ||
58 | int err; | ||
59 | char buf[128]; | ||
60 | struct cn_msg *m; | ||
61 | |||
62 | size = NLMSG_SPACE(sizeof(struct cn_msg) + msg->len); | ||
63 | |||
64 | nlh = (struct nlmsghdr *)buf; | ||
65 | nlh->nlmsg_seq = seq++; | ||
66 | nlh->nlmsg_pid = getpid(); | ||
67 | nlh->nlmsg_type = NLMSG_DONE; | ||
68 | nlh->nlmsg_len = NLMSG_LENGTH(size - sizeof(*nlh)); | ||
69 | nlh->nlmsg_flags = 0; | ||
70 | |||
71 | m = NLMSG_DATA(nlh); | ||
72 | #if 0 | ||
73 | ulog("%s: [%08x.%08x] len=%u, seq=%u, ack=%u.\n", | ||
74 | __func__, msg->id.idx, msg->id.val, msg->len, msg->seq, msg->ack); | ||
75 | #endif | ||
76 | memcpy(m, msg, sizeof(*m) + msg->len); | ||
77 | |||
78 | err = send(s, nlh, size, 0); | ||
79 | if (err == -1) | ||
80 | ulog("Failed to send: %s [%d].\n", | ||
81 | strerror(errno), errno); | ||
82 | |||
83 | return err; | ||
84 | } | ||
85 | |||
86 | int main(int argc, char *argv[]) | ||
87 | { | ||
88 | int s; | ||
89 | char buf[1024]; | ||
90 | int len; | ||
91 | struct nlmsghdr *reply; | ||
92 | struct sockaddr_nl l_local; | ||
93 | struct cn_msg *data; | ||
94 | FILE *out; | ||
95 | time_t tm; | ||
96 | struct pollfd pfd; | ||
97 | |||
98 | if (argc < 2) | ||
99 | out = stdout; | ||
100 | else { | ||
101 | out = fopen(argv[1], "a+"); | ||
102 | if (!out) { | ||
103 | ulog("Unable to open %s for writing: %s\n", | ||
104 | argv[1], strerror(errno)); | ||
105 | out = stdout; | ||
106 | } | ||
107 | } | ||
108 | |||
109 | memset(buf, 0, sizeof(buf)); | ||
110 | |||
111 | s = socket(PF_NETLINK, SOCK_DGRAM, NETLINK_CONNECTOR); | ||
112 | if (s == -1) { | ||
113 | perror("socket"); | ||
114 | return -1; | ||
115 | } | ||
116 | |||
117 | l_local.nl_family = AF_NETLINK; | ||
118 | l_local.nl_groups = 0x123; /* bitmask of requested groups */ | ||
119 | l_local.nl_pid = 0; | ||
120 | |||
121 | if (bind(s, (struct sockaddr *)&l_local, sizeof(struct sockaddr_nl)) == -1) { | ||
122 | perror("bind"); | ||
123 | close(s); | ||
124 | return -1; | ||
125 | } | ||
126 | |||
127 | #if 0 | ||
128 | { | ||
129 | int on = 0x57; /* Additional group number */ | ||
130 | setsockopt(s, SOL_NETLINK, NETLINK_ADD_MEMBERSHIP, &on, sizeof(on)); | ||
131 | } | ||
132 | #endif | ||
133 | if (0) { | ||
134 | int i, j; | ||
135 | |||
136 | memset(buf, 0, sizeof(buf)); | ||
137 | |||
138 | data = (struct cn_msg *)buf; | ||
139 | |||
140 | data->id.idx = 0x123; | ||
141 | data->id.val = 0x456; | ||
142 | data->seq = seq++; | ||
143 | data->ack = 0; | ||
144 | data->len = 0; | ||
145 | |||
146 | for (j=0; j<10; ++j) { | ||
147 | for (i=0; i<1000; ++i) { | ||
148 | len = netlink_send(s, data); | ||
149 | } | ||
150 | |||
151 | ulog("%d messages have been sent to %08x.%08x.\n", i, data->id.idx, data->id.val); | ||
152 | } | ||
153 | |||
154 | return 0; | ||
155 | } | ||
156 | |||
157 | |||
158 | pfd.fd = s; | ||
159 | |||
160 | while (!need_exit) { | ||
161 | pfd.events = POLLIN; | ||
162 | pfd.revents = 0; | ||
163 | switch (poll(&pfd, 1, -1)) { | ||
164 | case 0: | ||
165 | need_exit = 1; | ||
166 | break; | ||
167 | case -1: | ||
168 | if (errno != EINTR) { | ||
169 | need_exit = 1; | ||
170 | break; | ||
171 | } | ||
172 | continue; | ||
173 | } | ||
174 | if (need_exit) | ||
175 | break; | ||
176 | |||
177 | memset(buf, 0, sizeof(buf)); | ||
178 | len = recv(s, buf, sizeof(buf), 0); | ||
179 | if (len == -1) { | ||
180 | perror("recv buf"); | ||
181 | close(s); | ||
182 | return -1; | ||
183 | } | ||
184 | reply = (struct nlmsghdr *)buf; | ||
185 | |||
186 | switch (reply->nlmsg_type) { | ||
187 | case NLMSG_ERROR: | ||
188 | fprintf(out, "Error message received.\n"); | ||
189 | fflush(out); | ||
190 | break; | ||
191 | case NLMSG_DONE: | ||
192 | data = (struct cn_msg *)NLMSG_DATA(reply); | ||
193 | |||
194 | time(&tm); | ||
195 | fprintf(out, "%.24s : [%x.%x] [%08u.%08u].\n", | ||
196 | ctime(&tm), data->id.idx, data->id.val, data->seq, data->ack); | ||
197 | fflush(out); | ||
198 | break; | ||
199 | default: | ||
200 | break; | ||
201 | } | ||
202 | } | ||
203 | |||
204 | close(s); | ||
205 | return 0; | ||
206 | } | ||
diff --git a/Documentation/console/console.txt b/Documentation/console/console.txt new file mode 100644 index 000000000000..d3e17447321c --- /dev/null +++ b/Documentation/console/console.txt | |||
@@ -0,0 +1,144 @@ | |||
1 | Console Drivers | ||
2 | =============== | ||
3 | |||
4 | The Linux kernel has two general types of console drivers. The first type is | ||
5 | assigned by the kernel to all the virtual consoles during the boot process. | ||
6 | This type will be called 'system driver', and only one system driver is allowed | ||
7 | to exist. The system driver is persistent and it can never be unloaded, though | ||
8 | it may become inactive. | ||
9 | |||
10 | The second type has to be explicitly loaded and unloaded. This will be called | ||
11 | 'modular driver' by this document. Multiple modular drivers can coexist at | ||
12 | any time with each driver sharing the console with other drivers including | ||
13 | the system driver. However, modular drivers cannot take over the console | ||
14 | that is currently occupied by another modular driver. (Exception: Drivers that | ||
15 | call take_over_console() will succeed in the takeover regardless of the type | ||
16 | of driver occupying the consoles.) They can only take over the console that is | ||
17 | occupied by the system driver. By the same token, if the modular driver is | ||
18 | released by the console, the system driver will take over. | ||
19 | |||
20 | Modular drivers, from the programmer's point of view, have to call: | ||
21 | |||
22 | take_over_console() - load and bind driver to console layer | ||
23 | give_up_console() - unbind and unload driver | ||
24 | |||
25 | In newer kernels, the following are also available: | ||
26 | |||
27 | register_con_driver() | ||
28 | unregister_con_driver() | ||
29 | |||
30 | If sysfs is enabled, the contents of /sys/class/vtconsole can be | ||
31 | examined. This shows the console backends currently registered by the | ||
32 | system which are named vtcon<n> where <n> is an integer from 0 to 15. Thus: | ||
33 | |||
34 | ls /sys/class/vtconsole | ||
35 | . .. vtcon0 vtcon1 | ||
36 | |||
37 | Each directory in /sys/class/vtconsole has 3 files: | ||
38 | |||
39 | ls /sys/class/vtconsole/vtcon0 | ||
40 | . .. bind name uevent | ||
41 | |||
42 | What do these files signify? | ||
43 | |||
44 | 1. bind - this is a read/write file. It shows the status of the driver if | ||
45 | read, or acts to bind or unbind the driver to the virtual consoles | ||
46 | when written to. The possible values are: | ||
47 | |||
48 | 0 - means the driver is not bound and if echo'ed, commands the driver | ||
49 | to unbind | ||
50 | |||
51 | 1 - means the driver is bound and if echo'ed, commands the driver to | ||
52 | bind | ||
53 | |||
54 | 2. name - read-only file. Shows the name of the driver in this format: | ||
55 | |||
56 | cat /sys/class/vtconsole/vtcon0/name | ||
57 | (S) VGA+ | ||
58 | |||
59 | '(S)' stands for a (S)ystem driver, i.e., it cannot be directly | ||
60 | commanded to bind or unbind. | ||
61 | |||
62 | 'VGA+' is the name of the driver | ||
63 | |||
64 | cat /sys/class/vtconsole/vtcon1/name | ||
65 | (M) frame buffer device | ||
66 | |||
67 | In this case, '(M)' stands for a (M)odular driver, one that can be | ||
68 | directly commanded to bind or unbind. | ||
69 | |||
70 | 3. uevent - ignore this file | ||
71 | |||
72 | When unbinding, the modular driver is detached first, and then the system | ||
73 | driver takes over the consoles vacated by the driver. Binding, on the other | ||
74 | hand, will bind the driver to the consoles that are currently occupied by a | ||
75 | system driver. | ||
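Binding and unbinding as described above amounts to writing "1" or "0" to the bind file. A minimal user-space sketch follows; vtcon_set_bind() is a hypothetical helper, and the path in the usage note is just the vtcon1 example from this document.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Write "1" (bind) or "0" (unbind) to a vtconsole bind file.
 * Returns 0 on success, -1 on failure.
 */
static int vtcon_set_bind(const char *bind_path, int bound)
{
	FILE *f;
	int ok;

	assert(bind_path != NULL);

	f = fopen(bind_path, "w");
	if (!f)
		return -1;

	ok = fputs(bound ? "1\n" : "0\n", f) != EOF;
	if (fclose(f) == EOF)	/* the write may only fail at flush time */
		ok = 0;

	return ok ? 0 : -1;
}
```

For example, vtcon_set_bind("/sys/class/vtconsole/vtcon1/bind", 0) would ask the modular driver behind vtcon1 to unbind; this needs root privileges and fails for a system driver or while a console is in KD_GRAPHICS mode.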
76 | |||
77 | NOTE1: Binding and unbinding must be selected in Kconfig. It's under: | ||
78 | |||
79 | Device Drivers -> Character devices -> Support for binding and unbinding | ||
80 | console drivers | ||
81 | |||
82 | NOTE2: If any of the virtual consoles are in KD_GRAPHICS mode, then binding or | ||
83 | unbinding will not succeed. An example of an application that sets the console | ||
84 | to KD_GRAPHICS is X. | ||
85 | |||
86 | How useful is this feature? This is very useful for console driver | ||
87 | developers. By unbinding the driver from the console layer, one can unload the | ||
88 | driver, make changes, recompile, reload and rebind the driver without any need | ||
89 | for rebooting the kernel. For regular users who may want to switch from | ||
90 | framebuffer console to VGA console and vice versa, this feature also makes | ||
91 | this possible. (NOTE NOTE NOTE: Please read fbcon.txt under Documentation/fb | ||
92 | for more details). | ||
93 | |||
94 | Notes for developers: | ||
95 | ===================== | ||
96 | |||
97 | take_over_console() is now broken up into: | ||
98 | |||
99 | register_con_driver() | ||
100 | bind_con_driver() - private function | ||
101 | |||
102 | give_up_console() is a wrapper to unregister_con_driver(), and a driver must | ||
103 | be fully unbound for this call to succeed. con_is_bound() will check if the | ||
104 | driver is bound or not. | ||
105 | |||
106 | Guidelines for console driver writers: | ||
107 | ===================================== | ||
108 | |||
109 | In order for binding to and unbinding from the console to properly work, | ||
110 | console drivers must follow these guidelines: | ||
111 | |||
112 | 1. All drivers, except system drivers, must call either register_con_driver() | ||
113 | or take_over_console(). register_con_driver() will just add the driver to | ||
114 | the console's internal list. It won't take over the | ||
115 | console. take_over_console(), as its name implies, will also take over (or | ||
116 | bind to) the console. | ||
117 | |||
118 | 2. All resources allocated during con->con_init() must be released in | ||
119 | con->con_deinit(). | ||
120 | |||
121 | 3. All resources allocated in con->con_startup() must be released when the | ||
122 | driver, which was previously bound, becomes unbound. The console layer | ||
123 | does not have a complementary call to con->con_startup() so it's up to the | ||
124 | driver to check when it's legal to release these resources. Calling | ||
125 | con_is_bound() in con->con_deinit() will help. If the call returns | ||
126 | false, then it's safe to release the resources. This balance has to be | ||
127 | ensured because con->con_startup() can be called again when a request to | ||
128 | rebind the driver to the console arrives. | ||
129 | |||
130 | 4. Upon exit of the driver, ensure that the driver is totally unbound. If the | ||
131 | condition is satisfied, then the driver must call unregister_con_driver() | ||
132 | or give_up_console(). | ||
133 | |||
134 | 5. unregister_con_driver() can also be called on conditions which make it | ||
135 | impossible for the driver to service console requests. This can happen | ||
136 | with the framebuffer console that suddenly lost all of its drivers. | ||
137 | |||
138 | The current crop of console drivers should still work correctly, but binding | ||
139 | and unbinding them may cause problems. With minimal fixes, these drivers can | ||
140 | be made to work correctly. | ||
141 | |||
142 | ========================== | ||
143 | Antonino Daplas <adaplas@pol.net> | ||
144 | |||
diff --git a/Documentation/cpu-freq/user-guide.txt b/Documentation/cpu-freq/user-guide.txt index 7fedc00c3d30..555c8cf3650a 100644 --- a/Documentation/cpu-freq/user-guide.txt +++ b/Documentation/cpu-freq/user-guide.txt | |||
@@ -153,10 +153,13 @@ scaling_governor, and by "echoing" the name of another | |||
153 | that some governors won't load - they only | 153 | that some governors won't load - they only |
154 | work on some specific architectures or | 154 | work on some specific architectures or |
155 | processors. | 155 | processors. |
156 | scaling_min_freq and | 156 | scaling_min_freq and |
157 | scaling_max_freq show the current "policy limits" (in | 157 | scaling_max_freq show the current "policy limits" (in |
158 | kHz). By echoing new values into these | 158 | kHz). By echoing new values into these |
159 | files, you can change these limits. | 159 | files, you can change these limits. |
160 | NOTE: when setting a policy you need to | ||
161 | first set scaling_max_freq, then | ||
162 | scaling_min_freq. | ||
160 | 163 | ||
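The ordering in the NOTE above (scaling_max_freq first, then scaling_min_freq) can be sketched in user space as follows. The helper names write_khz() and set_policy_limits() are hypothetical, and the cpufreq directory is passed in by the caller rather than assumed.

```c
#include <assert.h>
#include <stdio.h>

/* Write a frequency in kHz to <dir>/<file>. Returns 0 on success. */
static int write_khz(const char *dir, const char *file, unsigned long khz)
{
	char path[512];
	FILE *f;
	int n, ok;

	assert(dir != NULL && file != NULL);

	n = snprintf(path, sizeof(path), "%s/%s", dir, file);
	if (n < 0 || (size_t)n >= sizeof(path))
		return -1;

	f = fopen(path, "w");
	if (!f)
		return -1;

	ok = fprintf(f, "%lu\n", khz) > 0;
	if (fclose(f) == EOF)
		ok = 0;

	return ok ? 0 : -1;
}

/* Set policy limits in the order given by the NOTE: max first, then min. */
static int set_policy_limits(const char *cpufreq_dir,
			     unsigned long min_khz, unsigned long max_khz)
{
	if (min_khz > max_khz)
		return -1;
	if (write_khz(cpufreq_dir, "scaling_max_freq", max_khz))
		return -1;
	return write_khz(cpufreq_dir, "scaling_min_freq", min_khz);
}
```

On a real system cpufreq_dir would typically be the per-CPU cpufreq directory in sysfs, and the call needs sufficient privileges.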
161 | 164 | ||
162 | If you have selected the "userspace" governor which allows you to | 165 | If you have selected the "userspace" governor which allows you to |
diff --git a/Documentation/cpu-hotplug.txt b/Documentation/cpu-hotplug.txt index 1bcf69996c9d..bc107cb157a8 100644 --- a/Documentation/cpu-hotplug.txt +++ b/Documentation/cpu-hotplug.txt | |||
@@ -251,16 +251,24 @@ A: This is what you would need in your kernel code to receive notifications. | |||
251 | return NOTIFY_OK; | 251 | return NOTIFY_OK; |
252 | } | 252 | } |
253 | 253 | ||
254 | static struct notifier_block foobar_cpu_notifer = | 254 | static struct notifier_block __cpuinitdata foobar_cpu_notifier = |
255 | { | 255 | { |
256 | .notifier_call = foobar_cpu_callback, | 256 | .notifier_call = foobar_cpu_callback, |
257 | }; | 257 | }; |
258 | 258 | ||
259 | You need to call register_cpu_notifier() from your init function. | ||
260 | Init functions could be of two types: | ||
261 | 1. early init (init function called when only the boot processor is online). | ||
262 | 2. late init (init function called _after_ all the CPUs are online). | ||
259 | 263 | ||
260 | In your init function, | 264 | For the first case, you should add the following to your init function |
261 | 265 | ||
262 | register_cpu_notifier(&foobar_cpu_notifier); | 266 | register_cpu_notifier(&foobar_cpu_notifier); |
263 | 267 | ||
268 | For the second case, you should add the following to your init function | ||
269 | |||
270 | register_hotcpu_notifier(&foobar_cpu_notifier); | ||
271 | |||
264 | You can fail PREPARE notifiers if something doesn't work to prepare resources. | 272 | You can fail PREPARE notifiers if something doesn't work to prepare resources. |
265 | This will stop the activity and send a following CANCELED event back. | 273 | This will stop the activity and send a following CANCELED event back. |
266 | 274 | ||
diff --git a/Documentation/cpusets.txt b/Documentation/cpusets.txt index 159e2a0c3e80..842f0d1ab216 100644 --- a/Documentation/cpusets.txt +++ b/Documentation/cpusets.txt | |||
@@ -217,6 +217,12 @@ exclusive cpuset. Also, the use of a Linux virtual file system (vfs) | |||
217 | to represent the cpuset hierarchy provides for a familiar permission | 217 | to represent the cpuset hierarchy provides for a familiar permission |
218 | and name space for cpusets, with a minimum of additional kernel code. | 218 | and name space for cpusets, with a minimum of additional kernel code. |
219 | 219 | ||
220 | The cpus and mems files in the root (top_cpuset) cpuset are | ||
221 | read-only. The cpus file automatically tracks the value of | ||
222 | cpu_online_map using a CPU hotplug notifier, and the mems file | ||
223 | automatically tracks the value of node_online_map using the | ||
224 | cpuset_track_online_nodes() hook. | ||
225 | |||
220 | 226 | ||
221 | 1.4 What are exclusive cpusets ? | 227 | 1.4 What are exclusive cpusets ? |
222 | -------------------------------- | 228 | -------------------------------- |
diff --git a/Documentation/crypto/api-intro.txt b/Documentation/crypto/api-intro.txt index 74dffc68ff9f..5a03a2801d67 100644 --- a/Documentation/crypto/api-intro.txt +++ b/Documentation/crypto/api-intro.txt | |||
@@ -19,15 +19,14 @@ At the lowest level are algorithms, which register dynamically with the | |||
19 | API. | 19 | API. |
20 | 20 | ||
21 | 'Transforms' are user-instantiated objects, which maintain state, handle all | 21 | 'Transforms' are user-instantiated objects, which maintain state, handle all |
22 | of the implementation logic (e.g. manipulating page vectors), provide an | 22 | of the implementation logic (e.g. manipulating page vectors) and provide an |
23 | abstraction to the underlying algorithms, and handle common logical | 23 | abstraction to the underlying algorithms. However, at the user |
24 | operations (e.g. cipher modes, HMAC for digests). However, at the user | ||
25 | level they are very simple. | 24 | level they are very simple. |
26 | 25 | ||
27 | Conceptually, the API layering looks like this: | 26 | Conceptually, the API layering looks like this: |
28 | 27 | ||
29 | [transform api] (user interface) | 28 | [transform api] (user interface) |
30 | [transform ops] (per-type logic glue e.g. cipher.c, digest.c) | 29 | [transform ops] (per-type logic glue e.g. cipher.c, compress.c) |
31 | [algorithm api] (for registering algorithms) | 30 | [algorithm api] (for registering algorithms) |
32 | 31 | ||
33 | The idea is to make the user interface and algorithm registration API | 32 | The idea is to make the user interface and algorithm registration API |
@@ -44,22 +43,27 @@ under development. | |||
44 | Here's an example of how to use the API: | 43 | Here's an example of how to use the API: |
45 | 44 | ||
46 | #include <linux/crypto.h> | 45 | #include <linux/crypto.h> |
46 | #include <linux/err.h> | ||
47 | #include <linux/scatterlist.h> | ||
47 | 48 | ||
48 | struct scatterlist sg[2]; | 49 | struct scatterlist sg[2]; |
49 | char result[128]; | 50 | char result[128]; |
50 | struct crypto_tfm *tfm; | 51 | struct crypto_hash *tfm; |
52 | struct hash_desc desc; | ||
51 | 53 | ||
52 | tfm = crypto_alloc_tfm("md5", 0); | 54 | tfm = crypto_alloc_hash("md5", 0, CRYPTO_ALG_ASYNC); |
53 | if (tfm == NULL) | 55 | if (IS_ERR(tfm)) |
54 | fail(); | 56 | fail(); |
55 | 57 | ||
56 | /* ... set up the scatterlists ... */ | 58 | /* ... set up the scatterlists ... */ |
59 | |||
60 | desc.tfm = tfm; | ||
61 | desc.flags = 0; | ||
57 | 62 | ||
58 | crypto_digest_init(tfm); | 63 | if (crypto_hash_digest(&desc, sg, 2, result)) |
59 | crypto_digest_update(tfm, &sg, 2); | 64 | fail(); |
60 | crypto_digest_final(tfm, result); | ||
61 | 65 | ||
62 | crypto_free_tfm(tfm); | 66 | crypto_free_hash(tfm); |
63 | 67 | ||
64 | 68 | ||
65 | Many real examples are available in the regression test module (tcrypt.c). | 69 | Many real examples are available in the regression test module (tcrypt.c). |
@@ -126,7 +130,7 @@ might already be working on. | |||
126 | BUGS | 130 | BUGS |
127 | 131 | ||
128 | Send bug reports to: | 132 | Send bug reports to: |
129 | James Morris <jmorris@redhat.com> | 133 | Herbert Xu <herbert@gondor.apana.org.au> |
130 | Cc: David S. Miller <davem@redhat.com> | 134 | Cc: David S. Miller <davem@redhat.com> |
131 | 135 | ||
132 | 136 | ||
@@ -134,13 +138,14 @@ FURTHER INFORMATION | |||
134 | 138 | ||
135 | For further patches and various updates, including the current TODO | 139 | For further patches and various updates, including the current TODO |
136 | list, see: | 140 | list, see: |
137 | http://samba.org/~jamesm/crypto/ | 141 | http://gondor.apana.org.au/~herbert/crypto/ |
138 | 142 | ||
139 | 143 | ||
140 | AUTHORS | 144 | AUTHORS |
141 | 145 | ||
142 | James Morris | 146 | James Morris |
143 | David S. Miller | 147 | David S. Miller |
148 | Herbert Xu | ||
144 | 149 | ||
145 | 150 | ||
146 | CREDITS | 151 | CREDITS |
@@ -238,8 +243,11 @@ Anubis algorithm contributors: | |||
238 | Tiger algorithm contributors: | 243 | Tiger algorithm contributors: |
239 | Aaron Grothe | 244 | Aaron Grothe |
240 | 245 | ||
246 | VIA PadLock contributors: | ||
247 | Michal Ludvig | ||
248 | |||
241 | Generic scatterwalk code by Adam J. Richter <adam@yggdrasil.com> | 249 | Generic scatterwalk code by Adam J. Richter <adam@yggdrasil.com> |
242 | 250 | ||
243 | Please send any credits updates or corrections to: | 251 | Please send any credits updates or corrections to: |
244 | James Morris <jmorris@redhat.com> | 252 | Herbert Xu <herbert@gondor.apana.org.au> |
245 | 253 | ||
diff --git a/Documentation/devices.txt b/Documentation/devices.txt index b2f593fc76ca..addc67b1d770 100644 --- a/Documentation/devices.txt +++ b/Documentation/devices.txt | |||
@@ -3,7 +3,7 @@ | |||
3 | 3 | ||
4 | Maintained by Torben Mathiasen <device@lanana.org> | 4 | Maintained by Torben Mathiasen <device@lanana.org> |
5 | 5 | ||
6 | Last revised: 01 March 2006 | 6 | Last revised: 15 May 2006 |
7 | 7 | ||
8 | This list is the Linux Device List, the official registry of allocated | 8 | This list is the Linux Device List, the official registry of allocated |
9 | device numbers and /dev directory nodes for the Linux operating | 9 | device numbers and /dev directory nodes for the Linux operating |
@@ -2543,6 +2543,9 @@ Your cooperation is appreciated. | |||
2543 | 64 = /dev/usb/rio500 Diamond Rio 500 | 2543 | 64 = /dev/usb/rio500 Diamond Rio 500 |
2544 | 65 = /dev/usb/usblcd USBLCD Interface (info@usblcd.de) | 2544 | 65 = /dev/usb/usblcd USBLCD Interface (info@usblcd.de) |
2545 | 66 = /dev/usb/cpad0 Synaptics cPad (mouse/LCD) | 2545 | 66 = /dev/usb/cpad0 Synaptics cPad (mouse/LCD) |
2546 | 67 = /dev/usb/adutux0 1st Ontrak ADU device | ||
2547 | ... | ||
2548 | 76 = /dev/usb/adutux9 10th Ontrak ADU device | ||
2546 | 96 = /dev/usb/hiddev0 1st USB HID device | 2549 | 96 = /dev/usb/hiddev0 1st USB HID device |
2547 | ... | 2550 | ... |
2548 | 111 = /dev/usb/hiddev15 16th USB HID device | 2551 | 111 = /dev/usb/hiddev15 16th USB HID device |
@@ -2565,10 +2568,10 @@ Your cooperation is appreciated. | |||
2565 | 243 = /dev/usb/dabusb3 Fourth dabusb device | 2568 | 243 = /dev/usb/dabusb3 Fourth dabusb device |
2566 | 2569 | ||
2567 | 180 block USB block devices | 2570 | 180 block USB block devices |
2568 | 0 = /dev/uba First USB block device | 2571 | 0 = /dev/uba First USB block device |
2569 | 8 = /dev/ubb Second USB block device | 2572 | 8 = /dev/ubb Second USB block device |
2570 | 16 = /dev/ubc Thrid USB block device | 2573 | 16 = /dev/ubc Third USB block device |
2571 | ... | 2574 | ... |
2572 | 2575 | ||
2573 | 181 char Conrad Electronic parallel port radio clocks | 2576 | 181 char Conrad Electronic parallel port radio clocks |
2574 | 0 = /dev/pcfclock0 First Conrad radio clock | 2577 | 0 = /dev/pcfclock0 First Conrad radio clock |
@@ -2791,6 +2794,7 @@ Your cooperation is appreciated. | |||
2791 | 170 = /dev/ttyNX0 Hilscher netX serial port 0 | 2794 | 170 = /dev/ttyNX0 Hilscher netX serial port 0 |
2792 | ... | 2795 | ... |
2793 | 185 = /dev/ttyNX15 Hilscher netX serial port 15 | 2796 | 185 = /dev/ttyNX15 Hilscher netX serial port 15 |
2797 | 186 = /dev/ttyJ0 JTAG1 DCC protocol based serial port emulation | ||
2794 | 2798 | ||
2795 | 205 char Low-density serial ports (alternate device) | 2799 | 205 char Low-density serial ports (alternate device) |
2796 | 0 = /dev/culu0 Callout device for ttyLU0 | 2800 | 0 = /dev/culu0 Callout device for ttyLU0 |
@@ -3108,6 +3112,10 @@ Your cooperation is appreciated. | |||
3108 | ... | 3112 | ... |
3109 | 240 = /dev/rfdp 16th RFD FTL layer | 3113 | 240 = /dev/rfdp 16th RFD FTL layer |
3110 | 3114 | ||
3115 | 257 char Phoenix Technologies Cryptographic Services Driver | ||
3116 | 0 = /dev/ptlsec Crypto Services Driver | ||
3117 | |||
3118 | |||
3111 | 3119 | ||
3112 | **** ADDITIONAL /dev DIRECTORY ENTRIES | 3120 | **** ADDITIONAL /dev DIRECTORY ENTRIES |
3113 | 3121 | ||
diff --git a/Documentation/digiepca.txt b/Documentation/digiepca.txt index 88820fe38dad..f2560e22f2c9 100644 --- a/Documentation/digiepca.txt +++ b/Documentation/digiepca.txt | |||
@@ -2,7 +2,7 @@ NOTE: This driver is obsolete. Digi provides a 2.6 driver (dgdm) at | |||
2 | http://www.digi.com for PCI cards. They no longer maintain this driver, | 2 | http://www.digi.com for PCI cards. They no longer maintain this driver, |
3 | and have no 2.6 driver for ISA cards. | 3 | and have no 2.6 driver for ISA cards. |
4 | 4 | ||
5 | This driver requires a number of user-space tools. They can be aquired from | 5 | This driver requires a number of user-space tools. They can be acquired from |
6 | http://www.digi.com, but only work with 2.4 kernels. | 6 | http://www.digi.com, but only work with 2.4 kernels. |
7 | 7 | ||
8 | 8 | ||
diff --git a/Documentation/dontdiff b/Documentation/dontdiff index 24adfe9af3ca..63c2d0c55aa2 100644 --- a/Documentation/dontdiff +++ b/Documentation/dontdiff | |||
@@ -135,6 +135,7 @@ tags | |||
135 | times.h* | 135 | times.h* |
136 | tkparse | 136 | tkparse |
137 | trix_boot.h | 137 | trix_boot.h |
138 | utsrelease.h* | ||
138 | version.h* | 139 | version.h* |
139 | vmlinux | 140 | vmlinux |
140 | vmlinux-* | 141 | vmlinux-* |
diff --git a/Documentation/driver-model/overview.txt b/Documentation/driver-model/overview.txt index ac4a7a737e43..2050c9ffc629 100644 --- a/Documentation/driver-model/overview.txt +++ b/Documentation/driver-model/overview.txt | |||
@@ -18,7 +18,7 @@ Traditional driver models implemented some sort of tree-like structure | |||
18 | (sometimes just a list) for the devices they control. There wasn't any | 18 | (sometimes just a list) for the devices they control. There wasn't any |
19 | uniformity across the different bus types. | 19 | uniformity across the different bus types. |
20 | 20 | ||
21 | The current driver model provides a comon, uniform data model for describing | 21 | The current driver model provides a common, uniform data model for describing |
22 | a bus and the devices that can appear under the bus. The unified bus | 22 | a bus and the devices that can appear under the bus. The unified bus |
23 | model includes a set of common attributes which all busses carry, and a set | 23 | model includes a set of common attributes which all busses carry, and a set |
24 | of common callbacks, such as device discovery during bus probing, bus | 24 | of common callbacks, such as device discovery during bus probing, bus |
diff --git a/Documentation/drivers/edac/edac.txt b/Documentation/drivers/edac/edac.txt index 70d96a62e5e1..7b3d969d2964 100644 --- a/Documentation/drivers/edac/edac.txt +++ b/Documentation/drivers/edac/edac.txt | |||
@@ -35,15 +35,14 @@ the vendor should tie the parity status bits to 0 if they do not intend | |||
35 | to generate parity. Some vendors do not do this, and thus the parity bit | 35 | to generate parity. Some vendors do not do this, and thus the parity bit |
36 | can "float" giving false positives. | 36 | can "float" giving false positives. |
37 | 37 | ||
38 | The PCI Parity EDAC device has the ability to "skip" known flaky | 38 | [There are patches in the kernel queue which will allow for storage of |
39 | cards during the parity scan. These are set by the parity "blacklist" | 39 | quirks of PCI devices reporting false parity positives. The 2.6.18 |
40 | interface in the sysfs for PCI Parity. (See the PCI section in the sysfs | 40 | kernel should have those patches included. When that becomes available, |
41 | section below.) There is also a parity "whitelist" which is used as | 41 | then EDAC will be patched to utilize that information to "skip" such |
42 | an explicit list of devices to scan, while the blacklist is a list | 42 | devices.] |
43 | of devices to skip. | ||
44 | 43 | ||
45 | EDAC will have future error detectors that will be added or integrated | 44 | EDAC will have future error detectors that will be integrated with |
46 | into EDAC in the following list: | 45 | EDAC or added to it, in the following list: |
47 | 46 | ||
48 | MCE Machine Check Exception | 47 | MCE Machine Check Exception |
49 | MCA Machine Check Architecture | 48 | MCA Machine Check Architecture |
@@ -93,22 +92,24 @@ EDAC lives in the /sys/devices/system/edac directory. Within this directory | |||
93 | there currently reside 2 'edac' components: | 92 | there currently reside 2 'edac' components: |
94 | 93 | ||
95 | mc memory controller(s) system | 94 | mc memory controller(s) system |
96 | pci PCI status system | 95 | pci PCI control and status system |
97 | 96 | ||
98 | 97 | ||
99 | ============================================================================ | 98 | ============================================================================ |
100 | Memory Controller (mc) Model | 99 | Memory Controller (mc) Model |
101 | 100 | ||
102 | First a background on the memory controller's model abstracted in EDAC. | 101 | First a background on the memory controller's model abstracted in EDAC. |
103 | Each mc device controls a set of DIMM memory modules. These modules are | 102 | Each 'mc' device controls a set of DIMM memory modules. These modules are |
104 | laid out in a Chip-Select Row (csrowX) and Channel table (chX). There can | 103 | laid out in a Chip-Select Row (csrowX) and Channel table (chX). There can |
105 | be multiple csrows and two channels. | 104 | be multiple csrows and multiple channels. |
106 | 105 | ||
107 | Memory controllers allow for several csrows, with 8 csrows being a typical value. | 106 | Memory controllers allow for several csrows, with 8 csrows being a typical value. |
108 | Yet, the actual number of csrows depends on the electrical "loading" | 107 | Yet, the actual number of csrows depends on the electrical "loading" |
109 | of a given motherboard, memory controller and DIMM characteristics. | 108 | of a given motherboard, memory controller and DIMM characteristics. |
110 | 109 | ||
111 | Dual channels allow for 128-bit data transfers to the CPU from memory. | 110 | Dual channels allow for 128-bit data transfers to the CPU from memory. |
111 | Some newer chipsets allow for more than 2 channels, like Fully Buffered DIMMs | ||
112 | (FB-DIMMs). The following example will assume 2 channels: | ||
112 | 113 | ||
113 | 114 | ||
114 | Channel 0 Channel 1 | 115 | Channel 0 Channel 1 |
@@ -234,23 +235,15 @@ Polling period control file: | |||
234 | The time period, in milliseconds, for polling for error information. | 235 | The time period, in milliseconds, for polling for error information. |
235 | Too small a value wastes resources. Too large a value might delay | 236 | Too small a value wastes resources. Too large a value might delay |
236 | necessary handling of errors and might lose valuable information for | 237 | necessary handling of errors and might lose valuable information for |
237 | locating the error. 1000 milliseconds (once each second) is about | 238 | locating the error. 1000 milliseconds (once each second) is the current |
238 | right for most uses. | 239 | default. Systems which require all the bandwidth they can get may |
240 | increase this. | ||
239 | 241 | ||
240 | LOAD TIME: module/kernel parameter: poll_msec=[0|1] | 242 | LOAD TIME: module/kernel parameter: poll_msec=[0|1] |
241 | 243 | ||
242 | RUN TIME: echo "1000" >/sys/devices/system/edac/mc/poll_msec | 244 | RUN TIME: echo "1000" >/sys/devices/system/edac/mc/poll_msec |
243 | 245 | ||
244 | 246 | ||
245 | Module Version read-only attribute file: | ||
246 | |||
247 | 'mc_version' | ||
248 | |||
249 | The EDAC CORE module's version and compile date are shown here to | ||
250 | indicate what EDAC is running. | ||
251 | |||
252 | |||
253 | |||
254 | ============================================================================ | 247 | ============================================================================ |
255 | 'mcX' DIRECTORIES | 248 | 'mcX' DIRECTORIES |
256 | 249 | ||
@@ -284,35 +277,6 @@ Seconds since last counter reset control file: | |||
284 | 277 | ||
285 | 278 | ||
286 | 279 | ||
287 | DIMM capability attribute file: | ||
288 | |||
289 | 'edac_capability' | ||
290 | |||
291 | The EDAC (Error Detection and Correction) capabilities/modes of | ||
292 | the memory controller hardware. | ||
293 | |||
294 | |||
295 | DIMM Current Capability attribute file: | ||
296 | |||
297 | 'edac_current_capability' | ||
298 | |||
299 | The EDAC capabilities available with the hardware | ||
300 | configuration. This may not be the same as "EDAC capability" | ||
301 | if the correct memory is not used. If a memory controller is | ||
302 | capable of EDAC, but DIMMs without check bits are in use, then | ||
303 | Parity, SECDED, S4ECD4ED capabilities will not be available | ||
304 | even though the memory controller might be capable of those | ||
305 | modes with the proper memory loaded. | ||
306 | |||
307 | |||
308 | Memory Type supported on this controller attribute file: | ||
309 | |||
310 | 'supported_mem_type' | ||
311 | |||
312 | This attribute file displays the memory type, usually | ||
313 | buffered and unbuffered DIMMs. | ||
314 | |||
315 | |||
316 | Memory Controller name attribute file: | 280 | Memory Controller name attribute file: |
317 | 281 | ||
318 | 'mc_name' | 282 | 'mc_name' |
@@ -321,16 +285,6 @@ Memory Controller name attribute file: | |||
321 | that is being utilized. | 285 | that is being utilized. |
322 | 286 | ||
323 | 287 | ||
324 | Memory Controller Module name attribute file: | ||
325 | |||
326 | 'module_name' | ||
327 | |||
328 | This attribute file displays the memory controller module name, | ||
329 | version and date built. The name of the memory controller | ||
330 | hardware - some drivers work with multiple controllers and | ||
331 | this field shows which hardware is present. | ||
332 | |||
333 | |||
334 | Total memory managed by this memory controller attribute file: | 288 | Total memory managed by this memory controller attribute file: |
335 | 289 | ||
336 | 'size_mb' | 290 | 'size_mb' |
@@ -432,6 +386,9 @@ Memory Type attribute file: | |||
432 | 386 | ||
433 | This attribute file will display what type of memory is currently | 387 | This attribute file will display what type of memory is currently |
434 | on this csrow. Normally, either buffered or unbuffered memory. | 388 | on this csrow. Normally, either buffered or unbuffered memory. |
389 | Examples: | ||
390 | Registered-DDR | ||
391 | Unbuffered-DDR | ||
435 | 392 | ||
436 | 393 | ||
437 | EDAC Mode of operation attribute file: | 394 | EDAC Mode of operation attribute file: |
@@ -446,8 +403,13 @@ Device type attribute file: | |||
446 | 403 | ||
447 | 'dev_type' | 404 | 'dev_type' |
448 | 405 | ||
449 | This attribute file will display what type of DIMM device is | 406 | This attribute file will display what type of DRAM device is |
450 | being utilized. Example: x4 | 407 | being utilized on this DIMM. |
408 | Examples: | ||
409 | x1 | ||
410 | x2 | ||
411 | x4 | ||
412 | x8 | ||
451 | 413 | ||
452 | 414 | ||
453 | Channel 0 CE Count attribute file: | 415 | Channel 0 CE Count attribute file: |
@@ -522,10 +484,10 @@ SYSTEM LOGGING | |||
522 | If logging for UEs and CEs are enabled then system logs will have | 484 | If logging for UEs and CEs are enabled then system logs will have |
523 | error notices indicating errors that have been detected: | 485 | error notices indicating errors that have been detected: |
524 | 486 | ||
525 | MC0: CE page 0x283, offset 0xce0, grain 8, syndrome 0x6ec3, row 0, | 487 | EDAC MC0: CE page 0x283, offset 0xce0, grain 8, syndrome 0x6ec3, row 0, |
526 | channel 1 "DIMM_B1": amd76x_edac | 488 | channel 1 "DIMM_B1": amd76x_edac |
527 | 489 | ||
528 | MC0: CE page 0x1e5, offset 0xfb0, grain 8, syndrome 0xb741, row 0, | 490 | EDAC MC0: CE page 0x1e5, offset 0xfb0, grain 8, syndrome 0xb741, row 0, |
529 | channel 1 "DIMM_B1": amd76x_edac | 491 | channel 1 "DIMM_B1": amd76x_edac |
530 | 492 | ||
531 | 493 | ||
@@ -610,64 +572,4 @@ Parity Count: | |||
610 | 572 | ||
611 | 573 | ||
612 | 574 | ||
613 | PCI Device Whitelist: | ||
614 | |||
615 | 'pci_parity_whitelist' | ||
616 | |||
617 | This control file allows for an explicit list of PCI devices to be | ||
618 | scanned for parity errors. Only devices found on this list will | ||
619 | be examined. The list is a line of hexadecimal VENDOR and DEVICE | ||
620 | ID tuples: | ||
621 | |||
622 | 1022:7450,1434:16a6 | ||
623 | |||
624 | One or more can be inserted, separated by a comma. | ||
625 | |||
626 | To write the above list doing the following as one command line: | ||
627 | |||
628 | echo "1022:7450,1434:16a6" | ||
629 | > /sys/devices/system/edac/pci/pci_parity_whitelist | ||
630 | |||
631 | |||
632 | |||
633 | To display what the whitelist is, simply 'cat' the same file. | ||
634 | |||
635 | |||
636 | PCI Device Blacklist: | ||
637 | |||
638 | 'pci_parity_blacklist' | ||
639 | |||
640 | This control file allows for a list of PCI devices to be | ||
641 | skipped for scanning. | ||
642 | The list is a line of hexadecimal VENDOR and DEVICE ID tuples: | ||
643 | |||
644 | 1022:7450,1434:16a6 | ||
645 | |||
646 | One or more can be inserted, separated by a comma. | ||
647 | |||
648 | To write the above list doing the following as one command line: | ||
649 | |||
650 | echo "1022:7450,1434:16a6" | ||
651 | > /sys/devices/system/edac/pci/pci_parity_blacklist | ||
652 | |||
653 | |||
654 | To display what the whitelist currently contains, | ||
655 | simply 'cat' the same file. | ||
656 | |||
657 | ======================================================================= | 575 | ======================================================================= |
658 | |||
659 | PCI Vendor and Devices IDs can be obtained with the lspci command. Using | ||
660 | the -n option lspci will display the vendor and device IDs. The system | ||
661 | administrator will have to determine which devices should be scanned or | ||
662 | skipped. | ||
663 | |||
664 | |||
665 | |||
666 | The two lists (white and black) are prioritized. blacklist is the lower | ||
667 | priority and will NOT be utilized when a whitelist has been set. | ||
668 | Turn OFF a whitelist by an empty echo command: | ||
669 | |||
670 | echo > /sys/devices/system/edac/pci/pci_parity_whitelist | ||
671 | |||
672 | and any previous blacklist will be utilized. | ||
673 | |||
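The "EDAC MC0: CE ..." messages shown in the SYSTEM LOGGING hunk of edac.txt follow a regular layout, so they can be picked apart with standard tools. The following is a rough sketch only. It assumes each message arrives on a single line laid out exactly like the two samples, and the helper name edac_ce_summary and its output layout are illustrative choices, not part of EDAC or of this patch.

```shell
# Sketch only: edac_ce_summary is a hypothetical helper, not an EDAC tool.
# Given one correctable-error (CE) log line in the format quoted in edac.txt,
# print the controller, csrow, channel and DIMM label on one line.
# Lines that do not match the assumed layout produce no output.
edac_ce_summary() {
    printf '%s\n' "$1" | sed -n -E \
        's/^.*EDAC (MC[0-9]+): CE .*row ([0-9]+), *channel ([0-9]+) "([^"]*)".*$/\1 row=\2 channel=\3 label=\4/p'
}
```

A possible use is to feed it the lines selected from the system log one at a time, for example from a `while read -r line` loop, and tally the results per DIMM label.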
diff --git a/Documentation/fb/fbcon.txt b/Documentation/fb/fbcon.txt index 08dce0f631bf..f373df12ed4c 100644 --- a/Documentation/fb/fbcon.txt +++ b/Documentation/fb/fbcon.txt | |||
@@ -135,10 +135,10 @@ C. Boot options | |||
135 | 135 | ||
136 | The angle can be changed anytime afterwards by 'echoing' the same | 136 | The angle can be changed anytime afterwards by 'echoing' the same |
137 | numbers to any one of the 2 attributes found in | 137 | numbers to any one of the 2 attributes found in |
138 | /sys/class/graphics/fb{x} | 138 | /sys/class/graphics/fbcon |
139 | 139 | ||
140 | con_rotate - rotate the display of the active console | 140 | rotate - rotate the display of the active console |
141 | con_rotate_all - rotate the display of all consoles | 141 | rotate_all - rotate the display of all consoles |
142 | 142 | ||
143 | Console rotation will only become available if Console Rotation | 143 | Console rotation will only become available if Console Rotation |
144 | Support is compiled in your kernel. | 144 | Support is compiled in your kernel. |
@@ -148,5 +148,177 @@ C. Boot options | |||
148 | Actually, the underlying fb driver is totally ignorant of console | 148 | Actually, the underlying fb driver is totally ignorant of console |
149 | rotation. | 149 | rotation. |
150 | 150 | ||
151 | --- | 151 | C. Attaching, Detaching and Unloading |
152 | |||
153 | Before going on to how to attach, detach and unload the framebuffer console, an | ||
154 | illustration of the dependencies may help. | ||
155 | |||
156 | The console layer, as with most subsystems, needs a driver that interfaces with | ||
157 | the hardware. Thus, in a VGA console: | ||
158 | |||
159 | console ---> VGA driver ---> hardware. | ||
160 | |||
161 | Assuming the VGA driver can be unloaded, one must first unbind the VGA driver | ||
162 | from the console layer before unloading the driver. The VGA driver cannot be | ||
163 | unloaded if it is still bound to the console layer. (See | ||
164 | Documentation/console/console.txt for more information). | ||
165 | |||
166 | This is more complicated in the case of the framebuffer console (fbcon), | ||
167 | because fbcon is an intermediate layer between the console and the drivers: | ||
168 | |||
169 | console ---> fbcon ---> fbdev drivers ---> hardware | ||
170 | |||
171 | The fbdev drivers cannot be unloaded if they're bound to fbcon, and fbcon cannot | ||
172 | be unloaded if it's bound to the console layer. | ||
173 | |||
174 | So to unload the fbdev drivers, one must first unbind fbcon from the console, | ||
175 | then unbind the fbdev drivers from fbcon. Fortunately, unbinding fbcon from | ||
176 | the console layer will automatically unbind framebuffer drivers from | ||
177 | fbcon. Thus, there is no need to explicitly unbind the fbdev drivers from | ||
178 | fbcon. | ||
179 | |||
180 | So, how do we unbind fbcon from the console? Part of the answer is in | ||
181 | Documentation/console/console.txt. To summarize: | ||
182 | |||
183 | Echo a value to the bind file that represents the framebuffer console | ||
184 | driver. So assuming vtcon1 represents fbcon, then: | ||
185 | |||
186 | echo 1 > /sys/class/vtconsole/vtcon1/bind - attach framebuffer console to | ||
187 | console layer | ||
188 | echo 0 > /sys/class/vtconsole/vtcon1/bind - detach framebuffer console from | ||
189 | console layer | ||
190 | |||
191 | If fbcon is detached from the console layer, your boot console driver (which is | ||
192 | usually VGA text mode) will take over. A few drivers (rivafb and i810fb) will | ||
193 | restore VGA text mode for you. With the rest, before detaching fbcon, you | ||
194 | must take a few additional steps to make sure that your VGA text mode is | ||
195 | restored properly. The following is one of several methods that you can use: | ||
196 | |||
197 | 1. Download or install vbetool. This utility is included with most | ||
198 | distributions nowadays, and is usually part of the suspend/resume tool. | ||
199 | |||
200 | 2. In your kernel configuration, ensure that CONFIG_FRAMEBUFFER_CONSOLE is set | ||
201 | to 'y' or 'm'. Enable one or more of your favorite framebuffer drivers. | ||
202 | |||
203 | 3. Boot into text mode and as root run: | ||
204 | |||
205 | vbetool vbestate save > <vga state file> | ||
206 | |||
207 | The above command saves the register contents of your graphics | ||
208 | hardware to <vga state file>. You need to do this step only once as | ||
209 | the state file can be reused. | ||
210 | |||
211 | 4. If fbcon is compiled as a module, load fbcon by doing: | ||
212 | |||
213 | modprobe fbcon | ||
214 | |||
215 | 5. Now to detach fbcon: | ||
216 | |||
217 | vbetool vbestate restore < <vga state file> && \ | ||
218 | echo 0 > /sys/class/vtconsole/vtcon1/bind | ||
219 | |||
220 | 6. That's it, you're back to VGA mode. And if you compiled fbcon as a module, | ||
221 | you can unload it by 'rmmod fbcon' | ||
222 | |||
223 | 7. To reattach fbcon: | ||
224 | |||
225 | echo 1 > /sys/class/vtconsole/vtcon1/bind | ||
226 | |||
227 | 8. Once fbcon is unbound, all drivers registered to the system will also | ||
228 | become unbound. This means that fbcon and individual framebuffer drivers | ||
229 | can be unloaded or reloaded at will. Reloading the drivers or fbcon will | ||
230 | automatically bind the console, fbcon and the drivers together. Unloading | ||
231 | all the drivers without unloading fbcon will make it impossible for the | ||
232 | console to bind fbcon. | ||
233 | |||
234 | Notes for vesafb users: | ||
235 | ======================= | ||
236 | |||
237 | Unfortunately, if your bootline includes a vga=xxx parameter that sets the | ||
238 | hardware in graphics mode, such as when loading vesafb, vgacon will not load. | ||
239 | Instead, vgacon will replace the default boot console with dummycon, and you | ||
240 | won't get any display after detaching fbcon. Your machine is still alive, so | ||
241 | you can reattach vesafb. However, to reattach vesafb, you need to do one of | ||
242 | the following: | ||
243 | |||
244 | Variation 1: | ||
245 | |||
246 | a. Before detaching fbcon, do | ||
247 | |||
248 | vbetool vbestate save > <vesa state file> # do once for each vesafb mode, | ||
249 | # the file can be reused | ||
250 | |||
251 | b. Detach fbcon as in step 5. | ||
252 | |||
253 | c. Attach fbcon | ||
254 | |||
255 | vbetool vbestate restore < <vesa state file> && \ | ||
256 | echo 1 > /sys/class/vtconsole/vtcon1/bind | ||
257 | |||
258 | Variation 2: | ||
259 | |||
260 | a. Before detaching fbcon, do: | ||
261 | echo <ID> > /sys/class/tty/console/bind | ||
262 | |||
263 | |||
264 | vbetool vbemode get | ||
265 | |||
266 | b. Take note of the mode number | ||
267 | |||
268 | c. Detach fbcon as in step 5. | ||
269 | |||
270 | d. Attach fbcon: | ||
271 | |||
272 | vbetool vbemode set <mode number> && \ | ||
273 | echo 1 > /sys/class/vtconsole/vtcon1/bind | ||
274 | |||
275 | Samples: | ||
276 | ======== | ||
277 | |||
278 | Here are 2 sample bash scripts that you can use to bind or unbind the | ||
279 | framebuffer console driver if you are on an x86 box: | ||
280 | |||
281 | --------------------------------------------------------------------------- | ||
282 | #!/bin/bash | ||
283 | # Unbind fbcon | ||
284 | |||
285 | # Change this to where your actual vgastate file is located | ||
286 | # Or Use VGASTATE=$1 to indicate the state file at runtime | ||
287 | VGASTATE=/tmp/vgastate | ||
288 | |||
289 | # path to vbetool | ||
290 | VBETOOL=/usr/local/bin | ||
291 | |||
292 | |||
293 | for (( i = 0; i < 16; i++)) | ||
294 | do | ||
295 | if test -x /sys/class/vtconsole/vtcon$i; then | ||
296 | if [ `cat /sys/class/vtconsole/vtcon$i/name | grep -c "frame buffer"` \ | ||
297 | = 1 ]; then | ||
298 | if test -x $VBETOOL/vbetool; then | ||
299 | echo Unbinding vtcon$i | ||
300 | $VBETOOL/vbetool vbestate restore < $VGASTATE | ||
301 | echo 0 > /sys/class/vtconsole/vtcon$i/bind | ||
302 | fi | ||
303 | fi | ||
304 | fi | ||
305 | done | ||
306 | |||
307 | --------------------------------------------------------------------------- | ||
308 | #!/bin/bash | ||
309 | # Bind fbcon | ||
310 | |||
311 | for (( i = 0; i < 16; i++)) | ||
312 | do | ||
313 | if test -x /sys/class/vtconsole/vtcon$i; then | ||
314 | if [ `cat /sys/class/vtconsole/vtcon$i/name | grep -c "frame buffer"` \ | ||
315 | = 1 ]; then | ||
316 | echo Binding vtcon$i | ||
317 | echo 1 > /sys/class/vtconsole/vtcon$i/bind | ||
318 | fi | ||
319 | fi | ||
320 | done | ||
321 | --------------------------------------------------------------------------- | ||
322 | |||
323 | -- | ||
152 | Antonino Daplas <adaplas@pol.net> | 324 | Antonino Daplas <adaplas@pol.net> |
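Both sample scripts in the fbcon.txt hunk above repeat the same search for the vtcon entry whose name attribute contains "frame buffer". A minimal sketch of that lookup as a reusable function might look like the following. The function name find_fbcon and its optional directory argument are assumptions made for illustration; the argument exists only so the function can be tried against a scratch directory instead of the real /sys/class/vtconsole.

```shell
# Sketch only: find_fbcon and its directory argument are hypothetical.
# Prints the entry name (e.g. vtcon1) of the first vtconsole whose "name"
# attribute mentions "frame buffer"; returns 1 if no such entry exists.
find_fbcon() {
    local root="${1:-/sys/class/vtconsole}"
    local dir
    for dir in "$root"/vtcon*; do
        # Skip entries without a readable name file (including an unmatched glob).
        [ -r "$dir/name" ] || continue
        if grep -q "frame buffer" "$dir/name"; then
            basename "$dir"
            return 0
        fi
    done
    return 1
}
```

With such a helper, detaching could be written as `con=$(find_fbcon) && echo 0 > /sys/class/vtconsole/$con/bind`, preceded by the vbetool restore step where the driver needs it.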
diff --git a/Documentation/fb/imacfb.txt b/Documentation/fb/imacfb.txt new file mode 100644 index 000000000000..759028545a7e --- /dev/null +++ b/Documentation/fb/imacfb.txt | |||
@@ -0,0 +1,31 @@ | |||
1 | |||
2 | What is imacfb? | ||
3 | =============== | ||
4 | |||
5 | This is a generic EFI platform driver for Intel based Apple computers. | ||
6 | Imacfb is only for EFI booted Intel Macs. | ||
7 | |||
8 | Supported Hardware | ||
9 | ================== | ||
10 | |||
11 | iMac 17"/20" | ||
12 | Macbook | ||
13 | Macbook Pro 15"/17" | ||
14 | MacMini | ||
15 | |||
16 | How to use it? | ||
17 | ============== | ||
18 | |||
19 | Imacfb does not have any kind of autodetection of your machine. | ||
20 | You have to add the following kernel parameters in your elilo.conf: | ||
21 | Macbook : | ||
22 | video=imacfb:macbook | ||
23 | MacMini : | ||
24 | video=imacfb:mini | ||
25 | Macbook Pro 15", iMac 17" : | ||
26 | video=imacfb:i17 | ||
27 | Macbook Pro 17", iMac 20" : | ||
28 | video=imacfb:i20 | ||
29 | |||
30 | -- | ||
31 | Edgar Hucek <gimli@dark-green.com> | ||
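Because imacfb does no autodetection, the model-to-parameter table in imacfb.txt has to be applied by hand. As a small sketch, the mapping can be written as a shell case statement. The function name imacfb_param and the short model keys (macbook, macmini, macbookpro15, and so on) are made up for this example; only the video= strings come from the document.

```shell
# Sketch only: imacfb_param and its model keys are hypothetical names.
# Prints the kernel parameter listed in imacfb.txt for the given model key,
# or returns 1 for a model the table does not cover.
imacfb_param() {
    case "$1" in
        macbook)              echo "video=imacfb:macbook" ;;
        macmini)              echo "video=imacfb:mini" ;;
        macbookpro15|imac17)  echo "video=imacfb:i17" ;;
        macbookpro17|imac20)  echo "video=imacfb:i20" ;;
        *)                    return 1 ;;
    esac
}
```

The printed string is what would be added to the kernel parameters in elilo.conf for that machine.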
diff --git a/Documentation/feature-removal-schedule.txt b/Documentation/feature-removal-schedule.txt index 027285d0c26c..436697cb9388 100644 --- a/Documentation/feature-removal-schedule.txt +++ b/Documentation/feature-removal-schedule.txt | |||
@@ -6,14 +6,18 @@ be removed from this file. | |||
6 | 6 | ||
7 | --------------------------- | 7 | --------------------------- |
8 | 8 | ||
9 | What: devfs | 9 | What: /sys/devices/.../power/state |
10 | When: July 2005 | 10 | dev->power.power_state |
11 | Files: fs/devfs/*, include/linux/devfs_fs*.h and assorted devfs | 11 | dpm_runtime_{suspend,resume}() |
12 | function calls throughout the kernel tree | 12 | When: July 2007 |
13 | Why: It has been unmaintained for a number of years, has unfixable | 13 | Why: Broken design for runtime control over driver power states, confusing |
14 | races, contains a naming policy within the kernel that is | 14 | driver-internal runtime power management with: mechanisms to support |
15 | against the LSB, and can be replaced by using udev. | 15 | system-wide sleep state transitions; event codes that distinguish |
16 | Who: Greg Kroah-Hartman <greg@kroah.com> | 16 | different phases of swsusp "sleep" transitions; and userspace policy |
17 | inputs. This framework was never widely used, and most attempts to | ||
18 | use it were broken. Drivers should instead be exposing domain-specific | ||
19 | interfaces either to kernel or to userspace. | ||
20 | Who: Pavel Machek <pavel@suse.cz> | ||
17 | 21 | ||
18 | --------------------------- | 22 | --------------------------- |
19 | 23 | ||
@@ -66,11 +70,15 @@ Who: Mauro Carvalho Chehab <mchehab@brturbo.com.br> | |||
66 | 70 | ||
67 | --------------------------- | 71 | --------------------------- |
68 | 72 | ||
69 | What: remove EXPORT_SYMBOL(insert_resource) | 73 | What: sys_sysctl |
70 | When: April 2006 | 74 | When: January 2007 |
71 | Files: kernel/resource.c | 75 | Why: The same information is available through /proc/sys and that is the |
72 | Why: No modular usage in the kernel. | 76 | interface user space prefers to use. And there do not appear to be |
73 | Who: Adrian Bunk <bunk@stusta.de> | 77 | any existing users in user space of sys_sysctl. The additional |
78 | maintenance overhead of keeping a set of binary names gets | ||
79 | in the way of doing a good job of maintaining this interface. | ||
80 | |||
81 | Who: Eric Biederman <ebiederm@xmission.com> | ||
74 | 82 | ||
75 | --------------------------- | 83 | --------------------------- |
76 | 84 | ||
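The sys_sysctl entry above says the same information is available through /proc/sys. A minimal sketch of reading one such value from user space with plain file I/O follows. The helper name `read_sysctl_file` is hypothetical and not part of any kernel or libc interface; it only assumes that the entry is a small text file whose first line holds the value.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: read the first line of a /proc/sys-style text
 * file into buf and strip the trailing newline.
 * Returns 0 on success, -1 if the file cannot be opened or read.
 */
static int read_sysctl_file(const char *path, char *buf, size_t len)
{
	FILE *f;

	assert(path != NULL && buf != NULL);
	if (len == 0)
		return -1;

	f = fopen(path, "r");
	if (!f)
		return -1;

	if (!fgets(buf, (int)len, f)) {
		fclose(f);
		return -1;
	}
	fclose(f);

	/* Cut the string at the first newline, if there is one. */
	buf[strcspn(buf, "\n")] = '\0';
	return 0;
}
```

A caller would pass a path such as /proc/sys/kernel/ostype together with a buffer, and check the return value before using the contents.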
@@ -132,16 +140,6 @@ Who: NeilBrown <neilb@suse.de> | |||
132 | 140 | ||
133 | --------------------------- | 141 | --------------------------- |
134 | 142 | ||
135 | What: au1x00_uart driver | ||
136 | When: January 2006 | ||
137 | Why: The 8250 serial driver now has the ability to deal with the differences | ||
138 | between the standard 8250 family of UARTs and their slightly strange | ||
139 | brother on Alchemy SOCs. The loss of features is not considered an | ||
140 | issue. | ||
141 | Who: Ralf Baechle <ralf@linux-mips.org> | ||
142 | |||
143 | --------------------------- | ||
144 | |||
145 | What: eepro100 network driver | 143 | What: eepro100 network driver |
146 | When: January 2007 | 144 | When: January 2007 |
147 | Why: replaced by the e100 driver | 145 | Why: replaced by the e100 driver |
@@ -149,6 +147,13 @@ Who: Adrian Bunk <bunk@stusta.de> | |||
149 | 147 | ||
150 | --------------------------- | 148 | --------------------------- |
151 | 149 | ||
150 | What: drivers depending on OSS_OBSOLETE_DRIVER | ||
151 | When: options in 2.6.20, code in 2.6.22 | ||
152 | Why: OSS drivers with ALSA replacements | ||
153 | Who: Adrian Bunk <bunk@stusta.de> | ||
154 | |||
155 | --------------------------- | ||
156 | |||
152 | What: pci_module_init(driver) | 157 | What: pci_module_init(driver) |
153 | When: January 2007 | 158 | When: January 2007 |
154 | Why: Is replaced by pci_register_driver(pci_driver). | 159 | Why: Is replaced by pci_register_driver(pci_driver). |
@@ -177,14 +182,13 @@ Who: Jean Delvare <khali@linux-fr.org> | |||
177 | 182 | ||
178 | --------------------------- | 183 | --------------------------- |
179 | 184 | ||
180 | What: remove EXPORT_SYMBOL(tasklist_lock) | 185 | What: Unused EXPORT_SYMBOL/EXPORT_SYMBOL_GPL exports |
181 | When: August 2006 | 186 | (temporary transition config option provided until then) |
182 | Files: kernel/fork.c | 187 | The transition config option will also be removed at the same time. |
183 | Why: tasklist_lock protects the kernel internal task list. Modules have | 188 | When: before 2.6.19 |
184 | no business looking at it, and all instances in drivers have been due | 189 | Why: Unused symbols are both increasing the size of the kernel binary |
185 | to use of too-lowlevel APIs. Having this symbol exported prevents | 190 | and are often a sign of "wrong API" |
186 | moving to more scalable locking schemes for the task list. | 191 | Who: Arjan van de Ven <arjan@linux.intel.com> |
187 | Who: Christoph Hellwig <hch@lst.de> | ||
188 | 192 | ||
189 | --------------------------- | 193 | --------------------------- |
190 | 194 | ||
@@ -224,3 +228,109 @@ Why: The interface no longer has any callers left in the kernel. It | |||
224 | Who: Nick Piggin <npiggin@suse.de> | 228 | Who: Nick Piggin <npiggin@suse.de> |
225 | 229 | ||
226 | --------------------------- | 230 | --------------------------- |
231 | |||
232 | What: Support for the Momentum / PMC-Sierra Jaguar ATX evaluation board | ||
233 | When: September 2006 | ||
234 | Why: Has not built for quite some time, and was never popular, | ||
235 | due to the platform being replaced by successor models. Apparently | ||
236 | no user base left. It also is one of the last users of | ||
237 | WANT_PAGE_VIRTUAL. | ||
238 | Who: Ralf Baechle <ralf@linux-mips.org> | ||
239 | |||
240 | --------------------------- | ||
241 | |||
242 | What: Support for the Momentum Ocelot, Ocelot 3, Ocelot C and Ocelot G | ||
243 | When: September 2006 | ||
244 | Why: Some no longer build and apparently there is no user base left | ||
245 | for these platforms. | ||
246 | Who: Ralf Baechle <ralf@linux-mips.org> | ||
247 | |||
248 | --------------------------- | ||
249 | |||
250 | What: Support for MIPS Technologies' Atlas and SEAD evaluation boards | ||
251 | When: September 2006 | ||
252 | Why: Some no longer build and apparently there is no user base left | ||
253 | for these platforms. Hardware out of production for several years. | ||
254 | Who: Ralf Baechle <ralf@linux-mips.org> | ||
255 | |||
256 | --------------------------- | ||
257 | |||
258 | What: Support for the IT8172-based platforms, ITE 8172G and Globespan IVR | ||
259 | When: September 2006 | ||
260 | Why: Code has not built since at least 2.6.0, and apparently there is | ||
261 | no user base left for these platforms. Hardware out of production | ||
262 | for several years and hardly a trace of the manufacturer left on | ||
263 | the net. | ||
264 | Who: Ralf Baechle <ralf@linux-mips.org> | ||
265 | |||
266 | --------------------------- | ||
267 | |||
268 | What: Interrupt only SA_* flags | ||
269 | When: January 2007 | ||
270 | Why: The interrupt related SA_* flags are replaced by IRQF_* to move them | ||
271 | out of the signal namespace. | ||
272 | |||
273 | Who: Thomas Gleixner <tglx@linutronix.de> | ||
274 | |||
275 | --------------------------- | ||
276 | |||
277 | What: i2c-ite and i2c-algo-ite drivers | ||
278 | When: September 2006 | ||
279 | Why: These drivers never compiled since they were added to the kernel | ||
280 | tree 5 years ago. This feature removal can be reevaluated if | ||
281 | someone shows interest in the drivers, fixes them and takes over | ||
282 | maintenance. | ||
283 | http://marc.theaimsgroup.com/?l=linux-mips&m=115040510817448 | ||
284 | Who: Jean Delvare <khali@linux-fr.org> | ||
285 | |||
286 | --------------------------- | ||
287 | |||
288 | What: Bridge netfilter deferred IPv4/IPv6 output hook calling | ||
289 | When: January 2007 | ||
290 | Why: The deferred output hooks are a layering violation causing unusual | ||
291 | and broken behaviour on bridge devices. Examples of things they | ||
292 | break include QoS classification using the MARK or CLASSIFY targets, | ||
293 | the IPsec policy match and connection tracking with VLANs on a | ||
294 | bridge. Their only use is to enable bridge output port filtering | ||
295 | within iptables with the physdev match, which can also be done by | ||
296 | combining iptables and ebtables using netfilter marks. Until it | ||
297 | is removed, the hook deferral is disabled by default and is | ||
298 | only enabled when needed. | ||
299 | |||
300 | Who: Patrick McHardy <kaber@trash.net> | ||
301 | |||
302 | --------------------------- | ||
303 | |||
304 | What: frame diverter | ||
305 | When: November 2006 | ||
306 | Why: The frame diverter is included in most distribution kernels, but is | ||
307 | broken. It does not correctly handle many things: | ||
308 | - IPV6 | ||
309 | - non-linear skb's | ||
310 | - network device RCU on removal | ||
311 | - input frames not correctly checked for protocol errors | ||
312 | It also adds allocation overhead even if not enabled. | ||
313 | It is not clear if anyone is still using it. | ||
314 | Who: Stephen Hemminger <shemminger@osdl.org> | ||
315 | |||
316 | --------------------------- | ||
317 | |||
318 | |||
319 | What: PHYSDEVPATH, PHYSDEVBUS, PHYSDEVDRIVER in the uevent environment | ||
320 | When: October 2008 | ||
321 | Why: The stacking of class devices makes these values misleading and | ||
322 | inconsistent. | ||
323 | Class devices should not carry any of these properties, and bus | ||
324 | devices have SUBSYSTEM and DRIVER as a replacement. | ||
325 | Who: Kay Sievers <kay.sievers@suse.de> | ||
326 | |||
327 | --------------------------- | ||
328 | |||
329 | What: i2c-isa | ||
330 | When: December 2006 | ||
331 | Why: i2c-isa is nonsense and doesn't fit in the device driver | ||
332 | model. Drivers relying on it are better implemented as platform | ||
333 | drivers. | ||
334 | Who: Jean Delvare <khali@linux-fr.org> | ||
335 | |||
336 | --------------------------- | ||
diff --git a/Documentation/filesystems/00-INDEX b/Documentation/filesystems/00-INDEX index 66fdc0744fe0..16dec61d7671 100644 --- a/Documentation/filesystems/00-INDEX +++ b/Documentation/filesystems/00-INDEX | |||
@@ -62,8 +62,8 @@ ramfs-rootfs-initramfs.txt | |||
62 | - info on the 'in memory' filesystems ramfs, rootfs and initramfs. | 62 | - info on the 'in memory' filesystems ramfs, rootfs and initramfs. |
63 | reiser4.txt | 63 | reiser4.txt |
64 | - info on the Reiser4 filesystem based on dancing tree algorithms. | 64 | - info on the Reiser4 filesystem based on dancing tree algorithms. |
65 | relayfs.txt | 65 | relay.txt |
66 | - info on relayfs, for efficient streaming from kernel to user space. | 66 | - info on relay, for efficient streaming from kernel to user space. |
67 | romfs.txt | 67 | romfs.txt |
68 | - description of the ROMFS filesystem. | 68 | - description of the ROMFS filesystem. |
69 | smbfs.txt | 69 | smbfs.txt |
diff --git a/Documentation/filesystems/Locking b/Documentation/filesystems/Locking index d31efbbdfe50..247d7f619aa2 100644 --- a/Documentation/filesystems/Locking +++ b/Documentation/filesystems/Locking | |||
@@ -142,8 +142,8 @@ see also dquot_operations section. | |||
142 | 142 | ||
143 | --------------------------- file_system_type --------------------------- | 143 | --------------------------- file_system_type --------------------------- |
144 | prototypes: | 144 | prototypes: |
145 | struct int (*get_sb) (struct file_system_type *, int, | 145 | int (*get_sb) (struct file_system_type *, int, |
146 | const char *, void *, struct vfsmount *); | 146 | const char *, void *, struct vfsmount *); |
147 | void (*kill_sb) (struct super_block *); | 147 | void (*kill_sb) (struct super_block *); |
148 | locking rules: | 148 | locking rules: |
149 | may block BKL | 149 | may block BKL |
diff --git a/Documentation/filesystems/automount-support.txt b/Documentation/filesystems/automount-support.txt index 58c65a1713e5..7cac200e2a85 100644 --- a/Documentation/filesystems/automount-support.txt +++ b/Documentation/filesystems/automount-support.txt | |||
@@ -19,7 +19,7 @@ following procedure: | |||
19 | 19 | ||
20 | (2) Have the follow_link() op do the following steps: | 20 | (2) Have the follow_link() op do the following steps: |
21 | 21 | ||
22 | (a) Call do_kern_mount() to call the appropriate filesystem to set up a | 22 | (a) Call vfs_kern_mount() to call the appropriate filesystem to set up a |
23 | superblock and gain a vfsmount structure representing it. | 23 | superblock and gain a vfsmount structure representing it. |
24 | 24 | ||
25 | (b) Copy the nameidata provided as an argument and substitute the dentry | 25 | (b) Copy the nameidata provided as an argument and substitute the dentry |
diff --git a/Documentation/filesystems/configfs/configfs_example.c b/Documentation/filesystems/configfs/configfs_example.c index 3d4713a6c207..2d6a14a463e0 100644 --- a/Documentation/filesystems/configfs/configfs_example.c +++ b/Documentation/filesystems/configfs/configfs_example.c | |||
@@ -264,6 +264,15 @@ static struct config_item_type simple_child_type = { | |||
264 | }; | 264 | }; |
265 | 265 | ||
266 | 266 | ||
267 | struct simple_children { | ||
268 | struct config_group group; | ||
269 | }; | ||
270 | |||
271 | static inline struct simple_children *to_simple_children(struct config_item *item) | ||
272 | { | ||
273 | return item ? container_of(to_config_group(item), struct simple_children, group) : NULL; | ||
274 | } | ||
275 | |||
267 | static struct config_item *simple_children_make_item(struct config_group *group, const char *name) | 276 | static struct config_item *simple_children_make_item(struct config_group *group, const char *name) |
268 | { | 277 | { |
269 | struct simple_child *simple_child; | 278 | struct simple_child *simple_child; |
@@ -304,7 +313,13 @@ static ssize_t simple_children_attr_show(struct config_item *item, | |||
304 | "items have only one attribute that is readable and writeable.\n"); | 313 | "items have only one attribute that is readable and writeable.\n"); |
305 | } | 314 | } |
306 | 315 | ||
316 | static void simple_children_release(struct config_item *item) | ||
317 | { | ||
318 | kfree(to_simple_children(item)); | ||
319 | } | ||
320 | |||
307 | static struct configfs_item_operations simple_children_item_ops = { | 321 | static struct configfs_item_operations simple_children_item_ops = { |
322 | .release = simple_children_release, | ||
308 | .show_attribute = simple_children_attr_show, | 323 | .show_attribute = simple_children_attr_show, |
309 | }; | 324 | }; |
310 | 325 | ||
@@ -345,10 +360,6 @@ static struct configfs_subsystem simple_children_subsys = { | |||
345 | * children of its own. | 360 | * children of its own. |
346 | */ | 361 | */ |
347 | 362 | ||
348 | struct simple_children { | ||
349 | struct config_group group; | ||
350 | }; | ||
351 | |||
352 | static struct config_group *group_children_make_group(struct config_group *group, const char *name) | 363 | static struct config_group *group_children_make_group(struct config_group *group, const char *name) |
353 | { | 364 | { |
354 | struct simple_children *simple_children; | 365 | struct simple_children *simple_children; |
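The hunks above add a to_simple_children() helper and a release callback that frees the structure containing the config_item. The user-space sketch below illustrates that container_of pattern only. The type definitions are simplified, hypothetical stand-ins for the real configfs structures, and `release_count` exists only so the effect of the callback can be observed; none of this is the kernel implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* User-space stand-in for the kernel macro: recover the enclosing
 * structure from a pointer to one of its members. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Simplified, hypothetical versions of the configfs types. */
struct config_item {
	const char *name;
};

struct config_group {
	struct config_item cg_item;
};

struct simple_children {
	struct config_group group;
};

static struct config_group *to_config_group(struct config_item *item)
{
	return item ? container_of(item, struct config_group, cg_item) : NULL;
}

/* Same shape as the helper added in the diff: item -> group -> container. */
static struct simple_children *to_simple_children(struct config_item *item)
{
	return item ? container_of(to_config_group(item),
				   struct simple_children, group) : NULL;
}

/* Illustration only: counts how many times the release hook ran. */
static int release_count;

/* Frees the outer structure, not the embedded item itself. */
static void simple_children_release(struct config_item *item)
{
	assert(item != NULL);
	free(to_simple_children(item));
	release_count++;
}
```

Because the item is embedded in the group, which is embedded in simple_children, the release hook has to walk back to the outermost allocation before freeing it. Freeing the item pointer directly would only be correct while the embedded member happens to sit at offset zero.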
diff --git a/Documentation/filesystems/devfs/ChangeLog b/Documentation/filesystems/devfs/ChangeLog deleted file mode 100644 index e5aba5246d7c..000000000000 --- a/Documentation/filesystems/devfs/ChangeLog +++ /dev/null | |||
@@ -1,1977 +0,0 @@ | |||
1 | /* -*- auto-fill -*- */ | ||
2 | =============================================================================== | ||
3 | Changes for patch v1 | ||
4 | |||
5 | - creation of devfs | ||
6 | |||
7 | - modified miscellaneous character devices to support devfs | ||
8 | =============================================================================== | ||
9 | Changes for patch v2 | ||
10 | |||
11 | - bug fix with manual inode creation | ||
12 | =============================================================================== | ||
13 | Changes for patch v3 | ||
14 | |||
15 | - bugfixes | ||
16 | |||
17 | - documentation improvements | ||
18 | |||
19 | - created a couple of scripts (one to save&restore a devfs and the | ||
20 | other to set up compatibility symlinks) | ||
21 | |||
22 | - devfs support for SCSI discs. New name format is: sd_hHcCiIlL | ||
23 | =============================================================================== | ||
24 | Changes for patch v4 | ||
25 | |||
26 | - bugfix for the directory reading code | ||
27 | |||
28 | - bugfix for compilation with kerneld | ||
29 | |||
30 | - devfs support for generic hard discs | ||
31 | |||
32 | - rationalisation of the various watchdog drivers | ||
33 | =============================================================================== | ||
34 | Changes for patch v5 | ||
35 | |||
36 | - support for mounting directly from entries in the devfs (it doesn't | ||
37 | need to be mounted to do this), including the root filesystem. | ||
38 | Mounting of swap partitions also works. Hence, now if you set | ||
39 | CONFIG_DEVFS_ONLY to 'Y' then you won't be able to access your discs | ||
40 | via ordinary device nodes. Naturally, the default is 'N' so that you | ||
41 | can still use your old device nodes. If you want to mount from devfs | ||
42 | entries, make sure you use: append = "root=/dev/sd_..." in your | ||
43 | lilo.conf. It seems LILO looks for the device number (major&minor) | ||
44 | and writes that into the kernel image :-( | ||
45 | |||
46 | - support for character memory devices (/dev/null, /dev/zero, /dev/full | ||
47 | and so on). Thanks to C. Scott Ananian <cananian@alumni.princeton.edu> | ||
48 | =============================================================================== | ||
49 | Changes for patch v6 | ||
50 | |||
51 | - support for subdirectories | ||
52 | |||
53 | - support for symbolic links (created by devfs_mk_symlink(), no | ||
54 | support yet for creation via symlink(2)) | ||
55 | |||
56 | - SCSI disc naming now cast in stone, with the format: | ||
57 | /dev/sd/c0b1t2u3 controller=0, bus=1, ID=2, LUN=3, whole disc | ||
58 | /dev/sd/c0b1t2u3p4 controller=0, bus=1, ID=2, LUN=3, 4th partition | ||
59 | |||
60 | - loop devices now appear in devfs | ||
61 | |||
62 | - tty devices, console, serial ports, etc. now appear in devfs | ||
63 | Thanks to C. Scott Ananian <cananian@alumni.princeton.edu> | ||
64 | |||
65 | - bugs with mounting devfs-only devices now fixed | ||
66 | =============================================================================== | ||
67 | Changes for patch v7 | ||
68 | |||
69 | - SCSI CD-ROMS, tapes and generic devices now appear in devfs | ||
70 | =============================================================================== | ||
71 | Changes for patch v8 | ||
72 | |||
73 | - bugfix with no-rewind SCSI tapes | ||
74 | |||
75 | - RAMDISCs now appear in devfs | ||
76 | |||
77 | - better cleaning up of devfs entries created by various modules | ||
78 | |||
79 | - interface change to <devfs_register> | ||
80 | =============================================================================== | ||
81 | Changes for patch v9 | ||
82 | |||
83 | - the v8 patch was corrupted somehow, which would affect the patch for | ||
84 | linux/fs/filesystems.c | ||
85 | I've also fixed the v8 patch file on the WWW | ||
86 | |||
87 | - MetaDevices (/dev/md*) should now appear in devfs | ||
88 | =============================================================================== | ||
89 | Changes for patch v10 | ||
90 | |||
91 | - bugfix in meta device support for devfs | ||
92 | |||
93 | - created this ChangeLog file | ||
94 | |||
95 | - added devfs support to the floppy driver | ||
96 | |||
97 | - added support for creating sockets in a devfs | ||
98 | =============================================================================== | ||
99 | Changes for patch v11 | ||
100 | |||
101 | - added DEVFS_FL_HIDE_UNREG flag | ||
102 | |||
103 | - incorporated better patch for ttyname() in libc 5.4.43 from H.J. Lu. | ||
104 | |||
105 | - interface change to <devfs_mk_symlink> | ||
106 | |||
107 | - support for creating symlinks with symlink(2) | ||
108 | |||
109 | - parallel port printer (/dev/lp*) now appears in devfs | ||
110 | =============================================================================== | ||
111 | Changes for patch v12 | ||
112 | |||
113 | - added inode check to <devfs_fill_file> function | ||
114 | |||
115 | - improved devfs support when mounting from devfs | ||
116 | |||
117 | - added call to <<release>> operation when removing swap areas on | ||
118 | devfs devices | ||
119 | |||
120 | - increased NR_SUPER to 128 to support large numbers of devfs mounts | ||
121 | (for chroot(2) gaols) | ||
122 | |||
123 | - fixed bug in SCSI disc support: was generating incorrect minors if | ||
124 | SCSI ID's did not start at 0 and increase by 1 | ||
125 | |||
126 | - support symlink traversal when mounting root | ||
127 | =============================================================================== | ||
128 | Changes for patch v13 | ||
129 | |||
130 | - added devfs support to soundcard driver | ||
131 | Thanks to Eric Dumas <dumas@linux.eu.org> and | ||
132 | C. Scott Ananian <cananian@alumni.princeton.edu> | ||
133 | |||
134 | - added devfs support to the joystick driver | ||
135 | |||
136 | - loop driver now has its own subdirectory "/dev/loop/" | ||
137 | |||
138 | - created <devfs_get_flags> and <devfs_set_flags> functions | ||
139 | |||
140 | - fix problem with SCSI disc compatibility names (sd{a,b,c,d,e,f}) | ||
141 | which assumes ID's start at 0 and increase by 1. Also only create | ||
142 | devfs entries for SCSI disc partitions which actually exist | ||
143 | Show new names in partition check | ||
144 | Thanks to Jakub Jelinek <jj@sunsite.ms.mff.cuni.cz> | ||
145 | =============================================================================== | ||
146 | Changes for patch v14 | ||
147 | |||
148 | - bug fix in floppy driver: would not compile without | ||
149 | CONFIG_DEVFS_FS='Y' | ||
150 | Thanks to Jurgen Botz <jbotz@nova.botz.org> | ||
151 | |||
152 | - bug fix in loop driver | ||
153 | Thanks to C. Scott Ananian <cananian@alumni.princeton.edu> | ||
154 | |||
155 | - do not create devfs entries for printers not configured | ||
156 | Thanks to C. Scott Ananian <cananian@alumni.princeton.edu> | ||
157 | |||
158 | - do not create devfs entries for serial ports not present | ||
159 | Thanks to C. Scott Ananian <cananian@alumni.princeton.edu> | ||
160 | |||
161 | - ensure <tty_register_devfs> is exported from tty_io.c | ||
162 | Thanks to C. Scott Ananian <cananian@alumni.princeton.edu> | ||
163 | |||
164 | - allow unregistering of devfs symlink entries | ||
165 | |||
166 | - fixed bug in SCSI disc naming introduced in last patch version | ||
167 | =============================================================================== | ||
168 | Changes for patch v15 | ||
169 | |||
170 | - ported to kernel 2.1.81 | ||
171 | =============================================================================== | ||
172 | Changes for patch v16 | ||
173 | |||
174 | - created <devfs_set_symlink_destination> function | ||
175 | |||
176 | - moved DEVFS_SUPER_MAGIC into header file | ||
177 | |||
178 | - added DEVFS_FL_HIDE flag | ||
179 | |||
180 | - created <devfs_get_maj_min> | ||
181 | |||
182 | - created <devfs_get_handle_from_inode> | ||
183 | |||
184 | - fixed bugs in searching by major&minor | ||
185 | |||
186 | - changed interface to <devfs_unregister>, <devfs_fill_file> and | ||
187 | <devfs_find_handle> | ||
188 | |||
189 | - fixed inode times when symlink created with symlink(2) | ||
190 | |||
191 | - change tty driver to do auto-creation of devfs entries | ||
192 | Thanks to C. Scott Ananian <cananian@alumni.princeton.edu> | ||
193 | |||
194 | - fixed bug in genhd.c: whole disc (non-SCSI) was not registered to | ||
195 | devfs | ||
196 | |||
197 | - updated libc 5.4.43 patch for ttyname() | ||
198 | =============================================================================== | ||
199 | Changes for patch v17 | ||
200 | |||
201 | - added CONFIG_DEVFS_TTY_COMPAT | ||
202 | Thanks to C. Scott Ananian <cananian@alumni.princeton.edu> | ||
203 | |||
204 | - bugfix in devfs support for drivers/char/lp.c | ||
205 | Thanks to C. Scott Ananian <cananian@alumni.princeton.edu> | ||
206 | |||
207 | - clean up serial driver so that PCMCIA devices unregister correctly | ||
208 | Thanks to C. Scott Ananian <cananian@alumni.princeton.edu> | ||
209 | |||
210 | - fixed bug in genhd.c: whole disc (non-SCSI) was not registered to | ||
211 | devfs [was missing in patch v16] | ||
212 | |||
213 | - updated libc 5.4.43 patch for ttyname() [was missing in patch v16] | ||
214 | |||
215 | - all SCSI devices now registered in /dev/sg | ||
216 | |||
217 | - support removal of devfs entries via unlink(2) | ||
218 | =============================================================================== | ||
219 | Changes for patch v18 | ||
220 | |||
221 | - added floppy/?u720 floppy entry | ||
222 | |||
223 | - fixed kerneld support for entries in devfs subdirectories | ||
224 | |||
225 | - incorporated latest patch for ttyname() in libc 5.4.43 from H.J. Lu. | ||
226 | =============================================================================== | ||
227 | Changes for patch v19 | ||
228 | |||
229 | - bug fix when looking up unregistered entries: kerneld was not called | ||
230 | |||
231 | - fixes for kernel 2.1.86 (now requires 2.1.86) | ||
232 | =============================================================================== | ||
233 | Changes for patch v20 | ||
234 | |||
235 | - only create available floppy entries | ||
236 | Thanks to Andrzej Krzysztofowicz <ankry@green.mif.pg.gda.pl> | ||
237 | |||
238 | - new IDE naming scheme following SCSI format (i.e. /dev/id/c0b0t0u0p1 | ||
239 | instead of /dev/hda1) | ||
240 | Thanks to Andrzej Krzysztofowicz <ankry@green.mif.pg.gda.pl> | ||
241 | |||
242 | - new XT disc naming scheme following SCSI format (i.e. /dev/xd/c0t0p1 | ||
243 | instead of /dev/xda1) | ||
244 | Thanks to Andrzej Krzysztofowicz <ankry@green.mif.pg.gda.pl> | ||
245 | |||
246 | - new non-standard CD-ROM names (i.e. /dev/sbp/c#t#) | ||
247 | Thanks to Andrzej Krzysztofowicz <ankry@green.mif.pg.gda.pl> | ||
248 | |||
249 | - allow symlink traversal when mounting the root filesystem | ||
250 | |||
251 | - Create entries for MD devices at MD init | ||
252 | Thanks to Christophe Leroy <christophe.leroy5@capway.com> | ||
253 | =============================================================================== | ||
254 | Changes for patch v21 | ||
255 | |||
256 | - ported to kernel 2.1.91 | ||
257 | =============================================================================== | ||
258 | Changes for patch v22 | ||
259 | |||
260 | - SCSI host number patch ("scsihosts=" kernel option) | ||
261 | Thanks to Andrzej Krzysztofowicz <ankry@green.mif.pg.gda.pl> | ||
262 | =============================================================================== | ||
263 | Changes for patch v23 | ||
264 | |||
265 | - Fixed persistence bug with device numbers for manually created | ||
266 | device files | ||
267 | |||
268 | - Fixed problem with recreating symlinks with different content | ||
269 | |||
270 | - Added CONFIG_DEVFS_MOUNT (mount devfs on /dev at boot time) | ||
271 | =============================================================================== | ||
272 | Changes for patch v24 | ||
273 | |||
274 | - Switched from CONFIG_KERNELD to CONFIG_KMOD: module autoloading | ||
275 | should now work again | ||
276 | |||
277 | - Hide entries which are manually unlinked | ||
278 | |||
279 | - Always invalidate devfs dentry cache when registering entries | ||
280 | |||
281 | - Support removal of devfs directories via rmdir(2) | ||
282 | |||
283 | - Ensure directories created by <devfs_mk_dir> are visible | ||
284 | |||
285 | - Default no access for "other" for floppy device | ||
286 | =============================================================================== | ||
287 | Changes for patch v25 | ||
288 | |||
289 | - Updates to CREDITS file and minor IDE numbering change | ||
290 | Thanks to Andrzej Krzysztofowicz <ankry@green.mif.pg.gda.pl> | ||
291 | |||
292 | - Invalidate devfs dentry cache when making directories | ||
293 | |||
294 | - Invalidate devfs dentry cache when removing entries | ||
295 | |||
296 | - More informative message if root FS mount fails when devfs | ||
297 | configured | ||
298 | |||
299 | - Fixed persistence bug with fifos | ||
300 | =============================================================================== | ||
301 | Changes for patch v26 | ||
302 | |||
303 | - ported to kernel 2.1.97 | ||
304 | |||
305 | - Changed serial directory from "/dev/serial" to "/dev/tts" and | ||
306 | "/dev/consoles" to "/dev/vc" to be more friendly to new procps | ||
307 | =============================================================================== | ||
308 | Changes for patch v27 | ||
309 | |||
310 | - Added support for IDE4 and IDE5 | ||
311 | Thanks to Andrzej Krzysztofowicz <ankry@green.mif.pg.gda.pl> | ||
312 | |||
313 | - Documented "scsihosts=" boot parameter | ||
314 | |||
315 | - Print process command when debugging kerneld/kmod | ||
316 | |||
317 | - Added debugging for register/unregister/change operations | ||
318 | |||
319 | - Added "devfs=" boot options | ||
320 | |||
321 | - Hide unregistered entries by default | ||
322 | =============================================================================== | ||
323 | Changes for patch v28 | ||
324 | |||
325 | - No longer lock/unlock superblock in <devfs_put_super> (cope with | ||
326 | recent VFS interface change) | ||
327 | |||
328 | - Do not automatically change ownership/protection of /dev/tty | ||
329 | |||
330 | - Drop negative dentries when they are released | ||
331 | |||
332 | - Manage dcache more efficiently | ||
333 | =============================================================================== | ||
334 | Changes for patch v29 | ||
335 | |||
336 | - Added DEVFS_FL_AUTO_DEVNUM flag | ||
337 | =============================================================================== | ||
338 | Changes for patch v30 | ||
339 | |||
340 | - No longer set unnecessary methods | ||
341 | |||
342 | - Ported to kernel 2.1.99-pre3 | ||
343 | =============================================================================== | ||
344 | Changes for patch v31 | ||
345 | |||
346 | - Added PID display to <call_kerneld> debugging message | ||
347 | |||
348 | - Added "diread" and "diwrite" options | ||
349 | |||
350 | - Ported to kernel 2.1.102 | ||
351 | |||
352 | - Fixed persistence problem with permissions | ||
353 | =============================================================================== | ||
354 | Changes for patch v32 | ||
355 | |||
356 | - Fixed devfs support in drivers/block/md.c | ||
357 | =============================================================================== | ||
358 | Changes for patch v33 | ||
359 | |||
360 | - Support legacy device nodes | ||
361 | |||
362 | - Fixed bug where recreated inodes were hidden | ||
363 | |||
364 | - New IDE naming scheme: everything is under /dev/ide | ||
365 | =============================================================================== | ||
366 | Changes for patch v34 | ||
367 | |||
368 | - Improved debugging in <get_vfs_inode> | ||
369 | |||
370 | - Prevent duplicate calls to <devfs_mk_dir> in SCSI layer | ||
371 | |||
372 | - No longer free old dentries in <devfs_mk_dir> | ||
373 | |||
374 | - Free all dentries for a given entry when deleting inodes | ||
375 | =============================================================================== | ||
376 | Changes for patch v35 | ||
377 | |||
378 | - Ported to kernel 2.1.105 (sound driver changes) | ||
379 | =============================================================================== | ||
380 | Changes for patch v36 | ||
381 | |||
382 | - Fixed sound driver port | ||
383 | =============================================================================== | ||
384 | Changes for patch v37 | ||
385 | |||
386 | - Minor documentation tweaks | ||
387 | =============================================================================== | ||
388 | Changes for patch v38 | ||
389 | |||
390 | - More documentation tweaks | ||
391 | |||
392 | - Fix for sound driver port | ||
393 | |||
394 | - Removed ttyname-patch (grab libc 5.4.44 instead) | ||
395 | |||
396 | - Ported to kernel 2.1.107-pre2 (loop driver fix) | ||
397 | =============================================================================== | ||
398 | Changes for patch v39 | ||
399 | |||
400 | - Ported to kernel 2.1.107 (hd.c hunk broke due to spelling "fixes"). Sigh | ||
401 | |||
402 | - Removed many #ifdef's, replaced with trickery in include/devfs_fs.h | ||
403 | =============================================================================== | ||
404 | Changes for patch v40 | ||
405 | |||
406 | - Fix for sound driver port | ||
407 | |||
408 | - Limit auto-device numbering to majors 128 to 239 | ||
409 | =============================================================================== | ||
410 | Changes for patch v41 | ||
411 | |||
412 | - Fixed inode times persistence problem | ||
413 | =============================================================================== | ||
414 | Changes for patch v42 | ||
415 | |||
416 | - Ported to kernel 2.1.108 (drivers/scsi/hosts.c hunk broke) | ||
417 | =============================================================================== | ||
418 | Changes for patch v43 | ||
419 | |||
420 | - Fixed spelling in <devfs_readlink> debug | ||
421 | |||
422 | - Fixed bug in <devfs_setup> parsing "dilookup" | ||
423 | |||
424 | - More #ifdef's removed | ||
425 | |||
426 | - Supported Sparc keyboard (/dev/kbd) | ||
427 | |||
428 | - Supported DSP56001 digital signal processor (/dev/dsp56k) | ||
429 | |||
430 | - Supported Apple Desktop Bus (/dev/adb) | ||
431 | |||
432 | - Supported Coda network file system (/dev/cfs*) | ||
433 | =============================================================================== | ||
434 | Changes for patch v44 | ||
435 | |||
436 | - Fixed devfs inode leak when manually recreating inodes | ||
437 | |||
438 | - Fixed permission persistence problem when recreating inodes | ||
439 | =============================================================================== | ||
440 | Changes for patch v45 | ||
441 | |||
442 | - Ported to kernel 2.1.110 | ||
443 | =============================================================================== | ||
444 | Changes for patch v46 | ||
445 | |||
446 | - Ported to kernel 2.1.112-pre1 | ||
447 | |||
448 | - Removed harmless "unused variable" compiler warning | ||
449 | |||
450 | - Fixed modes for manually recreated device nodes | ||
451 | =============================================================================== | ||
452 | Changes for patch v47 | ||
453 | |||
454 | - Added NULL devfs inode warning in <devfs_read_inode> | ||
455 | |||
456 | - Force all inode nlink values to 1 | ||
457 | =============================================================================== | ||
458 | Changes for patch v48 | ||
459 | |||
460 | - Added "dimknod" option | ||
461 | |||
462 | - Set inode nlink to 0 when freeing dentries | ||
463 | |||
464 | - Added support for virtual console capture devices (/dev/vcs*) | ||
465 | Thanks to Dennis Hou <smilax@mindmeld.yi.org> | ||
466 | |||
467 | - Fixed modes for manually recreated symlinks | ||
468 | =============================================================================== | ||
469 | Changes for patch v49 | ||
470 | |||
471 | - Ported to kernel 2.1.113 | ||
472 | =============================================================================== | ||
473 | Changes for patch v50 | ||
474 | |||
475 | - Fixed bugs in recreated directories and symlinks | ||
476 | =============================================================================== | ||
477 | Changes for patch v51 | ||
478 | |||
479 | - Improved robustness of rc.devfs script | ||
480 | Thanks to Roderich Schupp <rsch@experteam.de> | ||
481 | |||
482 | - Fixed bugs in recreated device nodes | ||
483 | |||
484 | - Fixed bug in currently unused <devfs_get_handle_from_inode> | ||
485 | |||
486 | - Defined new <devfs_handle_t> type | ||
487 | |||
488 | - Improved debugging when getting entries | ||
489 | |||
490 | - Fixed bug where directories could be emptied | ||
491 | |||
492 | - Ported to kernel 2.1.115 | ||
493 | =============================================================================== | ||
494 | Changes for patch v52 | ||
495 | |||
496 | - Replaced dummy .epoch inode with .devfsd character device | ||
497 | |||
498 | - Modified rc.devfs to take account of above change | ||
499 | |||
500 | - Removed spurious driver warning messages when CONFIG_DEVFS_FS=n | ||
501 | |||
502 | - Implemented devfsd protocol revision 0 | ||
503 | =============================================================================== | ||
504 | Changes for patch v53 | ||
505 | |||
506 | - Ported to kernel 2.1.116 (kmod change broke hunk) | ||
507 | |||
508 | - Updated Documentation/Configure.help | ||
509 | |||
510 | - Test and tty pattern patch for rc.devfs script | ||
511 | Thanks to Roderich Schupp <rsch@experteam.de> | ||
512 | |||
513 | - Added soothing message to warning in <devfs_d_iput> | ||
514 | =============================================================================== | ||
515 | Changes for patch v54 | ||
516 | |||
517 | - Ported to kernel 2.1.117 | ||
518 | |||
519 | - Fixed default permissions in sound driver | ||
520 | |||
521 | - Added support for frame buffer devices (/dev/fb*) | ||
522 | =============================================================================== | ||
523 | Changes for patch v55 | ||
524 | |||
525 | - Ported to kernel 2.1.119 | ||
526 | |||
527 | - Use GCC extensions for structure initialisations | ||
528 | |||
529 | - Implemented async open notification | ||
530 | |||
531 | - Incremented devfsd protocol revision to 1 | ||
532 | =============================================================================== | ||
533 | Changes for patch v56 | ||
534 | |||
535 | - Ported to kernel 2.1.120-pre3 | ||
536 | |||
537 | - Moved async open notification to end of <devfs_open> | ||
538 | =============================================================================== | ||
539 | Changes for patch v57 | ||
540 | |||
541 | - Ported to kernel 2.1.121 | ||
542 | |||
543 | - Prepended "/dev/" to module load request | ||
544 | |||
545 | - Renamed <call_kerneld> to <call_kmod> | ||
546 | |||
547 | - Created sample modules.conf file | ||
548 | =============================================================================== | ||
549 | Changes for patch v58 | ||
550 | |||
551 | - Fixed typo "AYSNC" -> "ASYNC" | ||
552 | =============================================================================== | ||
553 | Changes for patch v59 | ||
554 | |||
555 | - Added open flag for files | ||
556 | =============================================================================== | ||
557 | Changes for patch v60 | ||
558 | |||
559 | - Ported to kernel 2.1.123-pre2 | ||
560 | =============================================================================== | ||
561 | Changes for patch v61 | ||
562 | |||
563 | - Set i_blocks=0 and i_blksize=1024 in <devfs_read_inode> | ||
564 | =============================================================================== | ||
565 | Changes for patch v62 | ||
566 | |||
567 | - Ported to kernel 2.1.123 | ||
568 | =============================================================================== | ||
569 | Changes for patch v63 | ||
570 | |||
571 | - Ported to kernel 2.1.124-pre2 | ||
572 | =============================================================================== | ||
573 | Changes for patch v64 | ||
574 | |||
575 | - Fixed Unix98 pty support | ||
576 | |||
577 | - Increased buffer size in <get_partition_list> to avoid crash and | ||
578 | burn | ||
579 | =============================================================================== | ||
580 | Changes for patch v65 | ||
581 | |||
582 | - More Unix98 pty support fixes | ||
583 | |||
584 | - Added test for empty <<name>> in <devfs_find_handle> | ||
585 | |||
586 | - Renamed <generate_path> to <devfs_generate_path> and published | ||
587 | |||
588 | - Created /dev/root symlink | ||
589 | Thanks to Roderich Schupp <rsch@ExperTeam.de> | ||
590 | with further modifications by me | ||
591 | =============================================================================== | ||
592 | Changes for patch v66 | ||
593 | |||
594 | - Yet more Unix98 pty support fixes (now tested) | ||
595 | |||
596 | - Created <devfs_get_fops> | ||
597 | |||
598 | - Support media change checks when CONFIG_DEVFS_ONLY=y | ||
599 | |||
600 | - Abolished Unix98-style PTY names for old PTY devices | ||
601 | =============================================================================== | ||
602 | Changes for patch v67 | ||
603 | |||
604 | - Added inline declaration for dummy <devfs_generate_path> | ||
605 | |||
606 | - Removed spurious "unable to register... in devfs" messages when | ||
607 | CONFIG_DEVFS_FS=n | ||
608 | |||
609 | - Fixed misc. devices when CONFIG_DEVFS_FS=n | ||
610 | |||
611 | - Limit auto-device numbering to majors 144 to 239 | ||
612 | =============================================================================== | ||
613 | Changes for patch v68 | ||
614 | |||
615 | - Hide unopened virtual consoles from directory listings | ||
616 | |||
617 | - Added support for video capture devices | ||
618 | |||
619 | - Ported to kernel 2.1.125 | ||
620 | =============================================================================== | ||
621 | Changes for patch v69 | ||
622 | |||
623 | - Fix for CONFIG_VT=n | ||
624 | =============================================================================== | ||
625 | Changes for patch v70 | ||
626 | |||
627 | - Added support for non-OSS/Free sound cards | ||
628 | =============================================================================== | ||
629 | Changes for patch v71 | ||
630 | |||
631 | - Ported to kernel 2.1.126-pre2 | ||
632 | =============================================================================== | ||
633 | Changes for patch v72 | ||
634 | |||
635 | - #ifdef's for CONFIG_DEVFS_DISABLE_OLD_NAMES removed | ||
636 | =============================================================================== | ||
637 | Changes for patch v73 | ||
638 | |||
639 | - CONFIG_DEVFS_DISABLE_OLD_NAMES replaced with "nocompat" boot option | ||
640 | |||
641 | - CONFIG_DEVFS_BOOT_OPTIONS removed: boot options always available | ||
642 | =============================================================================== | ||
643 | Changes for patch v74 | ||
644 | |||
645 | - Removed CONFIG_DEVFS_MOUNT and "mount" boot option and replaced with | ||
646 | "nomount" boot option | ||
647 | |||
648 | - Documentation updates | ||
649 | |||
650 | - Updated sample modules.conf | ||
651 | =============================================================================== | ||
652 | Changes for patch v75 | ||
653 | |||
654 | - Updated sample modules.conf | ||
655 | |||
656 | - Remount devfs after initrd finishes | ||
657 | |||
658 | - Ported to kernel 2.1.127 | ||
659 | |||
660 | - Added support for ISDN | ||
661 | Thanks to Christophe Leroy <christophe.leroy5@capway.com> | ||
662 | =============================================================================== | ||
663 | Changes for patch v76 | ||
664 | |||
665 | - Updated an email address in ChangeLog | ||
666 | |||
667 | - CONFIG_DEVFS_ONLY replaced with "only" boot option | ||
668 | =============================================================================== | ||
669 | Changes for patch v77 | ||
670 | |||
671 | - Added DEVFS_FL_REMOVABLE flag | ||
672 | |||
673 | - Check for disc change when listing directories with removable media | ||
674 | devices | ||
675 | |||
676 | - Use DEVFS_FL_REMOVABLE in sd.c | ||
677 | |||
678 | - Ported to kernel 2.1.128 | ||
679 | =============================================================================== | ||
680 | Changes for patch v78 | ||
681 | |||
682 | - Only call <scan_dir_for_removable> on first call to <devfs_readdir> | ||
683 | |||
684 | - Ported to kernel 2.1.129-pre5 | ||
685 | |||
686 | - ISDN support improvements | ||
687 | Thanks to Christophe Leroy <christophe.leroy5@capway.com> | ||
688 | =============================================================================== | ||
689 | Changes for patch v79 | ||
690 | |||
691 | - Ported to kernel 2.1.130 | ||
692 | |||
693 | - Renamed miscdevice "apm" to "apm_bios" to be consistent with | ||
694 | devices.txt | ||
695 | =============================================================================== | ||
696 | Changes for patch v80 | ||
697 | |||
698 | - Ported to kernel 2.1.131 | ||
699 | |||
700 | - Updated <devfs_rmdir> for VFS change in 2.1.131 | ||
701 | =============================================================================== | ||
702 | Changes for patch v81 | ||
703 | |||
704 | - Fixed permissions on /dev/ptmx | ||
705 | =============================================================================== | ||
706 | Changes for patch v82 | ||
707 | |||
708 | - Ported to kernel 2.1.132-pre4 | ||
709 | |||
710 | - Changed initial permissions on /dev/pts/* | ||
711 | |||
712 | - Created <devfs_mk_compat> | ||
713 | |||
714 | - Added "symlinks" boot option | ||
715 | |||
716 | - Changed devfs_register_blkdev() back to register_blkdev() for IDE | ||
717 | |||
718 | - Check for partitions on removable media in <devfs_lookup> | ||
719 | =============================================================================== | ||
720 | Changes for patch v83 | ||
721 | |||
722 | - Fixed support for ramdisc when using string-based root FS name | ||
723 | |||
724 | - Ported to kernel 2.2.0-pre1 | ||
725 | =============================================================================== | ||
726 | Changes for patch v84 | ||
727 | |||
728 | - Ported to kernel 2.2.0-pre7 | ||
729 | =============================================================================== | ||
730 | Changes for patch v85 | ||
731 | |||
732 | - Compile fixes for drivers/sound/sound_common.c (non-module) and | ||
733 | drivers/isdn/isdn_common.c | ||
734 | Thanks to Christophe Leroy <christophe.leroy5@capway.com> | ||
735 | |||
736 | - Added support for registering regular files | ||
737 | |||
738 | - Created <devfs_set_file_size> | ||
739 | |||
740 | - Added /dev/cpu/mtrr as an alternative interface to /proc/mtrr | ||
741 | |||
742 | - Update devfs inodes from entries if not changed through FS | ||
743 | =============================================================================== | ||
744 | Changes for patch v86 | ||
745 | |||
746 | - Ported to kernel 2.2.0-pre9 | ||
747 | =============================================================================== | ||
748 | Changes for patch v87 | ||
749 | |||
750 | - Fixed bug when mounting non-devfs devices in a devfs | ||
751 | =============================================================================== | ||
752 | Changes for patch v88 | ||
753 | |||
754 | - Fixed <devfs_fill_file> to only initialise temporary inodes | ||
755 | |||
756 | - Trap for NULL fops in <devfs_register> | ||
757 | |||
758 | - Return -ENODEV in <devfs_fill_file> for non-driver inodes | ||
759 | |||
760 | - Fixed bug when unswapping non-devfs devices in a devfs | ||
761 | =============================================================================== | ||
762 | Changes for patch v89 | ||
763 | |||
764 | - Switched to C data types in include/linux/devfs_fs.h | ||
765 | |||
766 | - Switched from PATH_MAX to DEVFS_PATHLEN | ||
767 | |||
768 | - Updated Documentation/filesystems/devfs/modules.conf to take account | ||
769 | of reverse scanning (!) by modprobe | ||
770 | |||
771 | - Ported to kernel 2.2.0 | ||
772 | =============================================================================== | ||
773 | Changes for patch v90 | ||
774 | |||
775 | - CONFIG_DEVFS_DISABLE_OLD_TTY_NAMES replaced with "nottycompat" boot | ||
776 | option | ||
777 | |||
778 | - CONFIG_DEVFS_TTY_COMPAT removed: existing "symlinks" boot option now | ||
779 | controls this. This means you must have libc 5.4.44 or later, or a | ||
780 | recent version of libc 6 if you use the "symlinks" option | ||
781 | =============================================================================== | ||
782 | Changes for patch v91 | ||
783 | |||
784 | - Switch from <devfs_mk_symlink> to <devfs_mk_compat> in | ||
785 | drivers/char/vc_screen.c to fix problems with Midnight Commander | ||
786 | =============================================================================== | ||
787 | Changes for patch v92 | ||
788 | |||
789 | - Ported to kernel 2.2.2-pre5 | ||
790 | =============================================================================== | ||
791 | Changes for patch v93 | ||
792 | |||
793 | - Modified <sd_name> in drivers/scsi/sd.c to cope with devices that | ||
794 | don't exist (which happens with new RAID autostart code printk()s) | ||
795 | =============================================================================== | ||
796 | Changes for patch v94 | ||
797 | |||
798 | - Fixed bug in joystick driver: only first joystick was registered | ||
799 | =============================================================================== | ||
800 | Changes for patch v95 | ||
801 | |||
802 | - Fixed another bug in joystick driver | ||
803 | |||
804 | - Fixed <devfsd_read> to not overrun event buffer | ||
805 | =============================================================================== | ||
806 | Changes for patch v96 | ||
807 | |||
808 | - Ported to kernel 2.2.5-2 | ||
809 | |||
810 | - Created <devfs_auto_unregister> | ||
811 | |||
812 | - Fixed bugs: compatibility entries were not unregistered for: | ||
813 | loop driver | ||
814 | floppy driver | ||
815 | RAMDISC driver | ||
816 | IDE tape driver | ||
817 | SCSI CD-ROM driver | ||
818 | SCSI HDD driver | ||
819 | =============================================================================== | ||
820 | Changes for patch v97 | ||
821 | |||
822 | - Fixed bugs: compatibility entries were not unregistered for: | ||
823 | ALSA sound driver | ||
824 | partitions in generic disc driver | ||
825 | |||
826 | - Don't return unregistered entries in <devfs_find_handle> | ||
827 | |||
828 | - Panic in <devfs_unregister> if entry unregistered | ||
829 | |||
830 | - Don't panic in <devfs_auto_unregister> for duplicates | ||
831 | =============================================================================== | ||
832 | Changes for patch v98 | ||
833 | |||
834 | - Don't unregister already unregistered entries in <unregister> | ||
835 | |||
836 | - Register entry in <sd_detect> | ||
837 | |||
838 | - Unregister entry in <sd_detach> | ||
839 | |||
840 | - Changed to <devfs_*register_chrdev> in drivers/char/tty_io.c | ||
841 | |||
842 | - Ported to kernel 2.2.7 | ||
843 | =============================================================================== | ||
844 | Changes for patch v99 | ||
845 | |||
846 | - Ported to kernel 2.2.8 | ||
847 | |||
848 | - Fixed bug in drivers/scsi/sd.c when >16 SCSI discs | ||
849 | |||
850 | - Disable warning messages when unable to read partition table for | ||
851 | removable media | ||
852 | =============================================================================== | ||
853 | Changes for patch v100 | ||
854 | |||
855 | - Ported to kernel 2.3.1-pre5 | ||
856 | |||
857 | - Added "oops-on-panic" boot option | ||
858 | |||
859 | - Improved debugging in <devfs_register> and <devfs_unregister> | ||
860 | |||
861 | - Register entry in <sr_detect> | ||
862 | |||
863 | - Unregister entry in <sr_detach> | ||
864 | |||
865 | - Register entry in <sg_detect> | ||
866 | |||
867 | - Unregister entry in <sg_detach> | ||
868 | |||
869 | - Added support for ALSA drivers | ||
870 | =============================================================================== | ||
871 | Changes for patch v101 | ||
872 | |||
873 | - Ported to kernel 2.3.2 | ||
874 | =============================================================================== | ||
875 | Changes for patch v102 | ||
876 | |||
877 | - Update serial driver to register PCMCIA entries | ||
878 | Thanks to Roch-Alexandre Nomine-Beguin <roch@samarkand.infini.fr> | ||
879 | |||
880 | - Updated an email address in ChangeLog | ||
881 | |||
882 | - Hide virtual console capture entries from directory listings when | ||
883 | corresponding console device is not open | ||
884 | =============================================================================== | ||
885 | Changes for patch v103 | ||
886 | |||
887 | - Ported to kernel 2.3.3 | ||
888 | =============================================================================== | ||
889 | Changes for patch v104 | ||
890 | |||
891 | - Added documentation for some functions | ||
892 | |||
893 | - Added "doc" target to fs/devfs/Makefile | ||
894 | |||
895 | - Added "v4l" directory for video4linux devices | ||
896 | |||
897 | - Replaced call to <devfs_unregister> in <sd_detach> with call to | ||
898 | <devfs_register_partitions> | ||
899 | |||
900 | - Moved registration for sr and sg drivers from detect() to attach() | ||
901 | methods | ||
902 | |||
903 | - Register entries in <st_attach> and unregister in <st_detach> | ||
904 | |||
905 | - Work around IDE driver treating CD-ROM as gendisk | ||
906 | |||
907 | - Use <sed> instead of <tr> in rc.devfs | ||
908 | |||
909 | - Updated ToDo list | ||
910 | |||
911 | - Removed "oops-on-panic" boot option: now always Oops | ||
912 | =============================================================================== | ||
913 | Changes for patch v105 | ||
914 | |||
915 | - Unregister SCSI host from <scsi_host_no_list> in <scsi_unregister> | ||
916 | Thanks to Zoltán Böszörményi <zboszor@mail.externet.hu> | ||
917 | |||
918 | - Don't save /dev/log in rc.devfs | ||
919 | |||
920 | - Ported to kernel 2.3.4-pre1 | ||
921 | =============================================================================== | ||
922 | Changes for patch v106 | ||
923 | |||
924 | - Fixed silly typo in drivers/scsi/st.c | ||
925 | |||
926 | - Improved debugging in <devfs_register> | ||
927 | =============================================================================== | ||
928 | Changes for patch v107 | ||
929 | |||
930 | - Added "diunlink" and "nokmod" boot options | ||
931 | |||
932 | - Removed superfluous warning message in <devfs_d_iput> | ||
933 | =============================================================================== | ||
934 | Changes for patch v108 | ||
935 | |||
936 | - Remove entries when unloading sound module | ||
937 | =============================================================================== | ||
938 | Changes for patch v109 | ||
939 | |||
940 | - Ported to kernel 2.3.6-pre2 | ||
941 | =============================================================================== | ||
942 | Changes for patch v110 | ||
943 | |||
944 | - Took account of change to <d_alloc_root> | ||
945 | =============================================================================== | ||
946 | Changes for patch v111 | ||
947 | |||
948 | - Created separate event queue for each mounted devfs | ||
949 | |||
950 | - Removed <devfs_invalidate_dcache> | ||
951 | |||
952 | - Created new ioctl()s for devfsd | ||
953 | |||
954 | - Incremented devfsd protocol revision to 3 | ||
955 | |||
956 | - Fixed bug when re-creating directories: contents were lost | ||
957 | |||
958 | - Block access to inodes until devfsd updates permissions | ||
959 | =============================================================================== | ||
960 | Changes for patch v112 | ||
961 | |||
962 | - Modified patch so it applies against 2.3.5 and 2.3.6 | ||
963 | |||
964 | - Updated an email address in ChangeLog | ||
965 | |||
966 | - Do not automatically change ownership/protection of /dev/tty<n> | ||
967 | |||
968 | - Updated sample modules.conf | ||
969 | |||
970 | - Switched to sending process uid/gid to devfsd | ||
971 | |||
972 | - Renamed <call_kmod> to <try_modload> | ||
973 | |||
974 | - Added DEVFSD_NOTIFY_LOOKUP event | ||
975 | |||
976 | - Added DEVFSD_NOTIFY_CHANGE event | ||
977 | |||
978 | - Added DEVFSD_NOTIFY_CREATE event | ||
979 | |||
980 | - Incremented devfsd protocol revision to 4 | ||
981 | |||
982 | - Moved kernel-specific stuff to include/linux/devfs_fs_kernel.h | ||
983 | =============================================================================== | ||
984 | Changes for patch v113 | ||
985 | |||
986 | - Ported to kernel 2.3.9 | ||
987 | |||
988 | - Restricted permissions on some block devices | ||
989 | =============================================================================== | ||
990 | Changes for patch v114 | ||
991 | |||
992 | - Added support for /dev/netlink | ||
993 | Thanks to Dennis Hou <smilax@mindmeld.yi.org> | ||
994 | |||
995 | - Return EISDIR rather than EINVAL for read(2) on directories | ||
996 | |||
997 | - Ported to kernel 2.3.10 | ||
998 | =============================================================================== | ||
999 | Changes for patch v115 | ||
1000 | |||
1001 | - Added support for all remaining character devices | ||
1002 | Thanks to Dennis Hou <smilax@mindmeld.yi.org> | ||
1003 | |||
1004 | - Cleaned up netlink support | ||
1005 | =============================================================================== | ||
1006 | Changes for patch v116 | ||
1007 | |||
1008 | - Added support for /dev/parport%d | ||
1009 | Thanks to Tim Waugh <tim@cyberelk.demon.co.uk> | ||
1010 | |||
1011 | - Fixed parallel port ATAPI tape driver | ||
1012 | |||
1013 | - Fixed Atari SLM laser printer driver | ||
1014 | =============================================================================== | ||
1015 | Changes for patch v117 | ||
1016 | |||
1017 | - Added support for COSA card | ||
1018 | Thanks to Dennis Hou <smilax@mindmeld.yi.org> | ||
1019 | |||
1020 | - Fixed drivers/char/ppdev.c: missing #include <linux/init.h> | ||
1021 | |||
1022 | - Fixed drivers/char/ftape/zftape/zftape-init.c | ||
1023 | Thanks to Vladimir Popov <mashgrad@usa.net> | ||
1024 | =============================================================================== | ||
1025 | Changes for patch v118 | ||
1026 | |||
1027 | - Ported to kernel 2.3.15-pre3 | ||
1028 | |||
1029 | - Fixed bug in loop driver | ||
1030 | |||
1031 | - Unregister /dev/lp%d entries in drivers/char/lp.c | ||
1032 | Thanks to Maciej W. Rozycki <macro@ds2.pg.gda.pl> | ||
1033 | =============================================================================== | ||
1034 | Changes for patch v119 | ||
1035 | |||
1036 | - Ported to kernel 2.3.16 | ||
1037 | =============================================================================== | ||
1038 | Changes for patch v120 | ||
1039 | |||
1040 | - Fixed bug in drivers/scsi/scsi.c | ||
1041 | |||
1042 | - Added /dev/ppp | ||
1043 | Thanks to Dennis Hou <smilax@mindmeld.yi.org> | ||
1044 | |||
1045 | - Ported to kernel 2.3.17 | ||
1046 | =============================================================================== | ||
1047 | Changes for patch v121 | ||
1048 | |||
1049 | - Fixed bug in drivers/block/loop.c | ||
1050 | |||
1051 | - Ported to kernel 2.3.18 | ||
1052 | =============================================================================== | ||
1053 | Changes for patch v122 | ||
1054 | |||
1055 | - Ported to kernel 2.3.19 | ||
1056 | =============================================================================== | ||
1057 | Changes for patch v123 | ||
1058 | |||
1059 | - Ported to kernel 2.3.20 | ||
1060 | =============================================================================== | ||
1061 | Changes for patch v124 | ||
1062 | |||
1063 | - Ported to kernel 2.3.21 | ||
1064 | =============================================================================== | ||
1065 | Changes for patch v125 | ||
1066 | |||
1067 | - Created <devfs_get_info>, <devfs_set_info>, | ||
1068 | <devfs_get_first_child> and <devfs_get_next_sibling> | ||
1069 | Added <<dir>> parameter to <devfs_register>, <devfs_mk_compat>, | ||
1070 | <devfs_mk_dir> and <devfs_find_handle> | ||
1071 | Work sponsored by SGI | ||
1072 | |||
1073 | - Fixed apparent bug in COSA driver | ||
1074 | |||
1075 | - Re-instated "scsihosts=" boot option | ||
1076 | =============================================================================== | ||
1077 | Changes for patch v126 | ||
1078 | |||
1079 | - Always create /dev/pts if CONFIG_UNIX98_PTYS=y | ||
1080 | |||
1081 | - Fixed call to <devfs_mk_dir> in drivers/block/ide-disk.c | ||
1082 | Thanks to Dennis Hou <smilax@mindmeld.yi.org> | ||
1083 | |||
1084 | - Allow multiple unregistrations | ||
1085 | |||
1086 | - Created /dev/scsi hierarchy | ||
1087 | Work sponsored by SGI | ||
1088 | =============================================================================== | ||
1089 | Changes for patch v127 | ||
1090 | |||
1091 | Work sponsored by SGI | ||
1092 | |||
1093 | - No longer disable devpts if devfs enabled (caveat emptor) | ||
1094 | |||
1095 | - Added flags array to struct gendisk and removed code from | ||
1096 | drivers/scsi/sd.c | ||
1097 | |||
1098 | - Created /dev/discs hierarchy | ||
1099 | =============================================================================== | ||
1100 | Changes for patch v128 | ||
1101 | |||
1102 | Work sponsored by SGI | ||
1103 | |||
1104 | - Created /dev/cdroms hierarchy | ||
1105 | =============================================================================== | ||
1106 | Changes for patch v129 | ||
1107 | |||
1108 | Work sponsored by SGI | ||
1109 | |||
1110 | - Removed compatibility entries for sound devices | ||
1111 | |||
1112 | - Removed compatibility entries for printer devices | ||
1113 | |||
1114 | - Removed compatibility entries for video4linux devices | ||
1115 | |||
1116 | - Removed compatibility entries for parallel port devices | ||
1117 | |||
1118 | - Removed compatibility entries for frame buffer devices | ||
1119 | =============================================================================== | ||
1120 | Changes for patch v130 | ||
1121 | |||
1122 | Work sponsored by SGI | ||
1123 | |||
1124 | - Added major and minor number to devfsd protocol | ||
1125 | |||
1126 | - Incremented devfsd protocol revision to 5 | ||
1127 | |||
1128 | - Removed compatibility entries for SoundBlaster CD-ROMs | ||
1129 | |||
1130 | - Removed compatibility entries for netlink devices | ||
1131 | |||
1132 | - Removed compatibility entries for SCSI generic devices | ||
1133 | |||
1134 | - Removed compatibility entries for SCSI tape devices | ||
1135 | =============================================================================== | ||
1136 | Changes for patch v131 | ||
1137 | |||
1138 | Work sponsored by SGI | ||
1139 | |||
1140 | - Support info pointer for all devfs entry types | ||
1141 | |||
1142 | - Added <<info>> parameter to <devfs_mk_dir> and <devfs_mk_symlink> | ||
1143 | |||
1144 | - Removed /dev/st hierarchy | ||
1145 | |||
1146 | - Removed /dev/sg hierarchy | ||
1147 | |||
1148 | - Removed compatibility entries for loop devices | ||
1149 | |||
1150 | - Removed compatibility entries for IDE tape devices | ||
1151 | |||
1152 | - Removed compatibility entries for SCSI CD-ROMs | ||
1153 | |||
1154 | - Removed /dev/sr hierarchy | ||
1155 | =============================================================================== | ||
1156 | Changes for patch v132 | ||
1157 | |||
1158 | Work sponsored by SGI | ||
1159 | |||
1160 | - Removed compatibility entries for floppy devices | ||
1161 | |||
1162 | - Removed compatibility entries for RAMDISCs | ||
1163 | |||
1164 | - Removed compatibility entries for meta-devices | ||
1165 | |||
1166 | - Removed compatibility entries for SCSI discs | ||
1167 | |||
1168 | - Created <devfs_make_root> | ||
1169 | |||
1170 | - Removed /dev/sd hierarchy | ||
1171 | |||
1172 | - Support "../" when searching devfs namespace | ||
1173 | |||
1174 | - Created /dev/ide/host* hierarchy | ||
1175 | |||
1176 | - Supported IDE hard discs in /dev/ide/host* hierarchy | ||
1177 | |||
1178 | - Removed compatibility entries for IDE discs | ||
1179 | |||
1180 | - Removed /dev/ide/hd hierarchy | ||
1181 | |||
1182 | - Supported IDE CD-ROMs in /dev/ide/host* hierarchy | ||
1183 | |||
1184 | - Removed compatibility entries for IDE CD-ROMs | ||
1185 | |||
1186 | - Removed /dev/ide/cd hierarchy | ||
1187 | =============================================================================== | ||
1188 | Changes for patch v133 | ||
1189 | |||
1190 | Work sponsored by SGI | ||
1191 | |||
1192 | - Created <devfs_get_unregister_slave> | ||
1193 | |||
1194 | - Fixed bug in fs/partitions/check.c when rescanning | ||
1195 | =============================================================================== | ||
1196 | Changes for patch v134 | ||
1197 | |||
1198 | Work sponsored by SGI | ||
1199 | |||
1200 | - Removed /dev/sd, /dev/sr, /dev/st and /dev/sg directories | ||
1201 | |||
1202 | - Removed /dev/ide/hd directory | ||
1203 | |||
1204 | - Exported <devfs_get_parent> | ||
1205 | |||
1206 | - Created <devfs_register_tape> and /dev/tapes hierarchy | ||
1207 | |||
1208 | - Removed /dev/ide/mt hierarchy | ||
1209 | |||
1210 | - Removed /dev/ide/fd hierarchy | ||
1211 | |||
1212 | - Ported to kernel 2.3.25 | ||
1213 | =============================================================================== | ||
1214 | Changes for patch v135 | ||
1215 | |||
1216 | Work sponsored by SGI | ||
1217 | |||
1218 | - Removed compatibility entries for virtual console capture devices | ||
1219 | |||
1220 | - Removed unused <devfs_set_symlink_destination> | ||
1221 | |||
1222 | - Removed compatibility entries for serial devices | ||
1223 | |||
1224 | - Removed compatibility entries for console devices | ||
1225 | |||
1226 | - Do not hide entries from devfsd or children | ||
1227 | |||
1228 | - Removed DEVFS_FL_TTY_COMPAT flag | ||
1229 | |||
1230 | - Removed "nottycompat" boot option | ||
1231 | |||
1232 | - Removed <devfs_mk_compat> | ||
1233 | =============================================================================== | ||
1234 | Changes for patch v136 | ||
1235 | |||
1236 | Work sponsored by SGI | ||
1237 | |||
1238 | - Moved BSD pty devices to /dev/pty | ||
1239 | |||
1240 | - Added DEVFS_FL_WAIT flag | ||
1241 | =============================================================================== | ||
1242 | Changes for patch v137 | ||
1243 | |||
1244 | Work sponsored by SGI | ||
1245 | |||
1246 | - Really fixed bug in fs/partitions/check.c when rescanning | ||
1247 | |||
1248 | - Support new "disc" naming scheme in <get_removable_partition> | ||
1249 | |||
1250 | - Allow NULL fops in <devfs_register> | ||
1251 | |||
1252 | - Removed redundant name functions in SCSI disc and IDE drivers | ||
1253 | =============================================================================== | ||
1254 | Changes for patch v138 | ||
1255 | |||
1256 | Work sponsored by SGI | ||
1257 | |||
1258 | - Fixed old bugs in drivers/block/paride/pt.c, drivers/char/tpqic02.c, | ||
1259 | drivers/net/wan/cosa.c and drivers/scsi/scsi.c | ||
1260 | Thanks to Sergey Kubushin <ksi@ksi-linux.com> | ||
1261 | |||
1262 | - Fall back to major table if NULL fops given to <devfs_register> | ||
1263 | =============================================================================== | ||
1264 | Changes for patch v139 | ||
1265 | |||
1266 | Work sponsored by SGI | ||
1267 | |||
1268 | - Corrected and moved <get_blkfops> and <get_chrfops> declarations | ||
1269 | from arch/alpha/kernel/osf_sys.c to include/linux/fs.h | ||
1270 | |||
1271 | - Removed name function from struct gendisk | ||
1272 | |||
1273 | - Updated devfs FAQ | ||
1274 | =============================================================================== | ||
1275 | Changes for patch v140 | ||
1276 | |||
1277 | Work sponsored by SGI | ||
1278 | |||
1279 | - Ported to kernel 2.3.27 | ||
1280 | =============================================================================== | ||
1281 | Changes for patch v141 | ||
1282 | |||
1283 | Work sponsored by SGI | ||
1284 | |||
1285 | - Bug fix in arch/m68k/atari/joystick.c | ||
1286 | |||
1287 | - Moved ISDN and capi devices to /dev/isdn | ||
1288 | =============================================================================== | ||
1289 | Changes for patch v142 | ||
1290 | |||
1291 | Work sponsored by SGI | ||
1292 | |||
1293 | - Bug fix in drivers/block/ide-probe.c (patch confusion) | ||
1294 | =============================================================================== | ||
1295 | Changes for patch v143 | ||
1296 | |||
1297 | Work sponsored by SGI | ||
1298 | |||
1299 | - Bug fix in drivers/block/blkpg.c:partition_name() | ||
1300 | =============================================================================== | ||
1301 | Changes for patch v144 | ||
1302 | |||
1303 | Work sponsored by SGI | ||
1304 | |||
1305 | - Ported to kernel 2.3.29 | ||
1306 | |||
1307 | - Removed calls to <devfs_register> from cdu31a, cm206, mcd and mcdx | ||
1308 | CD-ROM drivers: generic driver handles this now | ||
1309 | |||
1310 | - Moved joystick devices to /dev/joysticks | ||
1311 | =============================================================================== | ||
1312 | Changes for patch v145 | ||
1313 | |||
1314 | Work sponsored by SGI | ||
1315 | |||
1316 | - Ported to kernel 2.3.30-pre3 | ||
1317 | |||
1318 | - Register whole-disc entry even for invalid partition tables | ||
1319 | |||
1320 | - Fixed bug in mounting root FS when initrd enabled | ||
1321 | |||
1322 | - Fixed device entry leak with IDE CD-ROMs | ||
1323 | |||
1324 | - Fixed compile problem with drivers/isdn/isdn_common.c | ||
1325 | |||
1326 | - Moved COSA devices to /dev/cosa | ||
1327 | |||
1328 | - Support fifos when unregistering | ||
1329 | |||
1330 | - Created <devfs_register_series> and used in many drivers | ||
1331 | |||
1332 | - Moved Coda devices to /dev/coda | ||
1333 | |||
1334 | - Moved parallel port IDE tapes to /dev/pt | ||
1335 | |||
1336 | - Moved parallel port IDE generic devices to /dev/pg | ||
1337 | =============================================================================== | ||
1338 | Changes for patch v146 | ||
1339 | |||
1340 | Work sponsored by SGI | ||
1341 | |||
1342 | - Removed obsolete DEVFS_FL_COMPAT and DEVFS_FL_TOLERANT flags | ||
1343 | |||
1344 | - Fixed compile problem with fs/coda/psdev.c | ||
1345 | |||
1346 | - Reinstate change to <devfs_register_blkdev> in | ||
1347 | drivers/block/ide-probe.c now that fs/isofs/inode.c is fixed | ||
1348 | |||
1349 | - Switched to <devfs_register_blkdev> in drivers/block/floppy.c, | ||
1350 | drivers/scsi/sr.c and drivers/block/md.c | ||
1351 | |||
1352 | - Moved DAC960 devices to /dev/dac960 | ||
1353 | =============================================================================== | ||
1354 | Changes for patch v147 | ||
1355 | |||
1356 | Work sponsored by SGI | ||
1357 | |||
1358 | - Ported to kernel 2.3.32-pre4 | ||
1359 | =============================================================================== | ||
1360 | Changes for patch v148 | ||
1361 | |||
1362 | Work sponsored by SGI | ||
1363 | |||
1364 | - Removed kmod support: use devfsd instead | ||
1365 | |||
1366 | - Moved miscellaneous character devices to /dev/misc | ||
1367 | =============================================================================== | ||
1368 | Changes for patch v149 | ||
1369 | |||
1370 | Work sponsored by SGI | ||
1371 | |||
1372 | - Ensure include/linux/joystick.h is OK for user-space | ||
1373 | |||
1374 | - Improved debugging in <get_vfs_inode> | ||
1375 | |||
1376 | - Ensure dentries created by devfsd will be cleaned up | ||
1377 | =============================================================================== | ||
1378 | Changes for patch v150 | ||
1379 | |||
1380 | Work sponsored by SGI | ||
1381 | |||
1382 | - Ported to kernel 2.3.34 | ||
1383 | =============================================================================== | ||
1384 | Changes for patch v151 | ||
1385 | |||
1386 | Work sponsored by SGI | ||
1387 | |||
1388 | - Ported to kernel 2.3.35-pre1 | ||
1389 | |||
1390 | - Created <devfs_get_name> | ||
1391 | =============================================================================== | ||
1392 | Changes for patch v152 | ||
1393 | |||
1394 | Work sponsored by SGI | ||
1395 | |||
1396 | - Updated sample modules.conf | ||
1397 | |||
1398 | - Ported to kernel 2.3.36-pre1 | ||
1399 | =============================================================================== | ||
1400 | Changes for patch v153 | ||
1401 | |||
1402 | Work sponsored by SGI | ||
1403 | |||
1404 | - Ported to kernel 2.3.42 | ||
1405 | |||
1406 | - Removed <devfs_fill_file> | ||
1407 | =============================================================================== | ||
1408 | Changes for patch v154 | ||
1409 | |||
1410 | Work sponsored by SGI | ||
1411 | |||
1412 | - Took account of device number changes for /dev/fb* | ||
1413 | =============================================================================== | ||
1414 | Changes for patch v155 | ||
1415 | |||
1416 | Work sponsored by SGI | ||
1417 | |||
1418 | - Ported to kernel 2.3.43-pre8 | ||
1419 | |||
1420 | - Moved /dev/tty0 to /dev/vc/0 | ||
1421 | |||
1422 | - Moved sequence number formatting from <_tty_make_name> to drivers | ||
1423 | =============================================================================== | ||
1424 | Changes for patch v156 | ||
1425 | |||
1426 | Work sponsored by SGI | ||
1427 | |||
1428 | - Fixed breakage in drivers/scsi/sd.c due to recent SCSI changes | ||
1429 | =============================================================================== | ||
1430 | Changes for patch v157 | ||
1431 | |||
1432 | Work sponsored by SGI | ||
1433 | |||
1434 | - Ported to kernel 2.3.45 | ||
1435 | =============================================================================== | ||
1436 | Changes for patch v158 | ||
1437 | |||
1438 | Work sponsored by SGI | ||
1439 | |||
1440 | - Ported to kernel 2.3.46-pre2 | ||
1441 | =============================================================================== | ||
1442 | Changes for patch v159 | ||
1443 | |||
1444 | Work sponsored by SGI | ||
1445 | |||
1446 | - Fixed drivers/block/md.c | ||
1447 | Thanks to Mike Galbraith <mikeg@weiden.de> | ||
1448 | |||
1449 | - Documentation fixes | ||
1450 | |||
1451 | - Moved device registration from <lp_init> to <lp_register> | ||
1452 | Thanks to Tim Waugh <twaugh@redhat.com> | ||
1453 | =============================================================================== | ||
1454 | Changes for patch v160 | ||
1455 | |||
1456 | Work sponsored by SGI | ||
1457 | |||
1458 | - Fixed drivers/char/joystick/joystick.c | ||
1459 | Thanks to Vojtech Pavlik <vojtech@suse.cz> | ||
1460 | |||
1461 | - Documentation updates | ||
1462 | |||
1463 | - Fixed arch/i386/kernel/mtrr.c if procfs and devfs not enabled | ||
1464 | |||
1465 | - Fixed drivers/char/stallion.c | ||
1466 | =============================================================================== | ||
1467 | Changes for patch v161 | ||
1468 | |||
1469 | Work sponsored by SGI | ||
1470 | |||
1471 | - Remove /dev/ide when ide-mod is unloaded | ||
1472 | |||
1473 | - Fixed bug in drivers/block/ide-probe.c when secondary but no primary | ||
1474 | |||
1475 | - Added DEVFS_FL_NO_PERSISTENCE flag | ||
1476 | |||
1477 | - Used new DEVFS_FL_NO_PERSISTENCE flag for Unix98 pty slaves | ||
1478 | |||
1479 | - Removed unnecessary call to <update_devfs_inode_from_entry> in | ||
1480 | <devfs_readdir> | ||
1481 | |||
1482 | - Only set auto-ownership for /dev/pty/s* | ||
1483 | =============================================================================== | ||
1484 | Changes for patch v162 | ||
1485 | |||
1486 | Work sponsored by SGI | ||
1487 | |||
1488 | - Set inode->i_size to correct size for symlinks | ||
1489 | Thanks to Jeremy Fitzhardinge <jeremy@goop.org> | ||
1490 | |||
1491 | - Only give lookup() method to directories to comply with new VFS | ||
1492 | assumptions | ||
1493 | |||
1494 | - Remove unnecessary tests in symlink methods | ||
1495 | |||
1496 | - Don't kill existing block ops in <devfs_read_inode> | ||
1497 | |||
1498 | - Restore auto-ownership for /dev/pty/m* | ||
1499 | =============================================================================== | ||
1500 | Changes for patch v163 | ||
1501 | |||
1502 | Work sponsored by SGI | ||
1503 | |||
1504 | - Don't create missing directories in <devfs_find_handle> | ||
1505 | |||
1506 | - Removed Documentation/filesystems/devfs/mk-devlinks | ||
1507 | |||
1508 | - Updated Documentation/filesystems/devfs/README | ||
1509 | =============================================================================== | ||
1510 | Changes for patch v164 | ||
1511 | |||
1512 | Work sponsored by SGI | ||
1513 | |||
1514 | - Fixed CONFIG_DEVFS breakage in drivers/char/serial.c introduced in | ||
1515 | linux-2.3.99-pre6-7 | ||
1516 | =============================================================================== | ||
1517 | Changes for patch v165 | ||
1518 | |||
1519 | Work sponsored by SGI | ||
1520 | |||
1521 | - Ported to kernel 2.3.99-pre6 | ||
1522 | =============================================================================== | ||
1523 | Changes for patch v166 | ||
1524 | |||
1525 | Work sponsored by SGI | ||
1526 | |||
1527 | - Added CONFIG_DEVFS_MOUNT | ||
1528 | =============================================================================== | ||
1529 | Changes for patch v167 | ||
1530 | |||
1531 | Work sponsored by SGI | ||
1532 | |||
1533 | - Updated Documentation/filesystems/devfs/README | ||
1534 | |||
1535 | - Updated sample modules.conf | ||
1536 | =============================================================================== | ||
1537 | Changes for patch v168 | ||
1538 | |||
1539 | Work sponsored by SGI | ||
1540 | |||
1541 | - Disabled multi-mount capability (use VFS bindings instead) | ||
1542 | |||
1543 | - Updated README from master HTML file | ||
1544 | =============================================================================== | ||
1545 | Changes for patch v169 | ||
1546 | |||
1547 | Work sponsored by SGI | ||
1548 | |||
1549 | - Removed multi-mount code | ||
1550 | |||
1551 | - Removed compatibility macros: VFS has changed too much | ||
1552 | =============================================================================== | ||
1553 | Changes for patch v170 | ||
1554 | |||
1555 | Work sponsored by SGI | ||
1556 | |||
1557 | - Updated README from master HTML file | ||
1558 | |||
1559 | - Merged devfs inode into devfs entry | ||
1560 | =============================================================================== | ||
1561 | Changes for patch v171 | ||
1562 | |||
1563 | Work sponsored by SGI | ||
1564 | |||
1565 | - Updated sample modules.conf | ||
1566 | |||
1567 | - Removed dead code in <devfs_register> which used to call | ||
1568 | <free_dentries> | ||
1569 | |||
1570 | - Ported to kernel 2.4.0-test2-pre3 | ||
1571 | =============================================================================== | ||
1572 | Changes for patch v172 | ||
1573 | |||
1574 | Work sponsored by SGI | ||
1575 | |||
1576 | - Changed interface to <devfs_register> | ||
1577 | |||
1578 | - Changed interface to <devfs_register_series> | ||
1579 | =============================================================================== | ||
1580 | Changes for patch v173 | ||
1581 | |||
1582 | Work sponsored by SGI | ||
1583 | |||
1584 | - Simplified interface to <devfs_mk_symlink> | ||
1585 | |||
1586 | - Simplified interface to <devfs_mk_dir> | ||
1587 | |||
1588 | - Simplified interface to <devfs_find_handle> | ||
1589 | =============================================================================== | ||
1590 | Changes for patch v174 | ||
1591 | |||
1592 | Work sponsored by SGI | ||
1593 | |||
1594 | - Updated README from master HTML file | ||
1595 | =============================================================================== | ||
1596 | Changes for patch v175 | ||
1597 | |||
1598 | Work sponsored by SGI | ||
1599 | |||
1600 | - DocBook update for fs/devfs/base.c | ||
1601 | Thanks to Tim Waugh <twaugh@redhat.com> | ||
1602 | |||
1603 | - Removed stale fs/tunnel.c (was never used or completed) | ||
1604 | =============================================================================== | ||
1605 | Changes for patch v176 | ||
1606 | |||
1607 | Work sponsored by SGI | ||
1608 | |||
1609 | - Updated ToDo list | ||
1610 | |||
1611 | - Removed sample modules.conf: now distributed with devfsd | ||
1612 | |||
1613 | - Updated README from master HTML file | ||
1614 | |||
1615 | - Ported to kernel 2.4.0-test3-pre4 (which had devfs-patch-v174) | ||
1616 | =============================================================================== | ||
1617 | Changes for patch v177 | ||
1618 | |||
1619 | - Updated README from master HTML file | ||
1620 | |||
1621 | - Documentation cleanups | ||
1622 | |||
1623 | - Ensure <devfs_generate_path> terminates string for root entry | ||
1624 | Thanks to Tim Jansen <tim@tjansen.de> | ||
1625 | |||
1626 | - Exported <devfs_get_name> to modules | ||
1627 | |||
1628 | - Make <devfs_mk_symlink> send events to devfsd | ||
1629 | |||
1630 | - Cleaned up option processing in <devfs_setup> | ||
1631 | |||
1632 | - Fixed bugs in handling symlinks: could leak or cause Oops | ||
1633 | |||
1634 | - Cleaned up directory handling by separating fops | ||
1635 | Thanks to Alexander Viro <viro@parcelfarce.linux.theplanet.co.uk> | ||
1636 | =============================================================================== | ||
1637 | Changes for patch v178 | ||
1638 | |||
1639 | - Fixed handling of inverted options in <devfs_setup> | ||
1640 | =============================================================================== | ||
1641 | Changes for patch v179 | ||
1642 | |||
1643 | - Adjusted <try_modload> to account for <devfs_generate_path> fix | ||
1644 | =============================================================================== | ||
1645 | Changes for patch v180 | ||
1646 | |||
1647 | - Fixed !CONFIG_DEVFS_FS stub declaration of <devfs_get_info> | ||
1648 | =============================================================================== | ||
1649 | Changes for patch v181 | ||
1650 | |||
1651 | - Answered question posed by Al Viro and removed his comments from <devfs_open> | ||
1652 | |||
1653 | - Moved setting of registered flag after other fields are changed | ||
1654 | |||
1655 | - Fixed race between <devfsd_close> and <devfsd_notify_one> | ||
1656 | |||
1657 | - Global VFS changes added bogus BKL to devfsd_close(): removed | ||
1658 | |||
1659 | - Widened locking in <devfs_readlink> and <devfs_follow_link> | ||
1660 | |||
1661 | - Replaced <devfsd_read> stack usage with <devfsd_ioctl> kmalloc | ||
1662 | |||
1663 | - Simplified locking in <devfsd_ioctl> and fixed memory leak | ||
1664 | =============================================================================== | ||
1665 | Changes for patch v182 | ||
1666 | |||
1667 | - Created <devfs_*alloc_major> and <devfs_*alloc_devnum> | ||
1668 | |||
1669 | - Removed broken devnum allocation and use <devfs_alloc_devnum> | ||
1670 | |||
1671 | - Fixed old devnum leak by calling new <devfs_dealloc_devnum> | ||
1672 | |||
1673 | - Created <devfs_*alloc_unique_number> | ||
1674 | |||
1675 | - Fixed number leak for /dev/cdroms/cdrom%d | ||
1676 | |||
1677 | - Fixed number leak for /dev/discs/disc%d | ||
1678 | =============================================================================== | ||
1679 | Changes for patch v183 | ||
1680 | |||
1681 | - Fixed bug in <devfs_setup> which could hang boot process | ||
1682 | =============================================================================== | ||
1683 | Changes for patch v184 | ||
1684 | |||
1685 | - Documentation typo fix for fs/devfs/util.c | ||
1686 | |||
1687 | - Fixed drivers/char/stallion.c for devfs | ||
1688 | |||
1689 | - Added DEVFSD_NOTIFY_DELETE event | ||
1690 | |||
1691 | - Updated README from master HTML file | ||
1692 | |||
1693 | - Removed #include <asm/segment.h> from fs/devfs/base.c | ||
1694 | =============================================================================== | ||
1695 | Changes for patch v185 | ||
1696 | |||
1697 | - Made <block_semaphore> and <char_semaphore> in fs/devfs/util.c | ||
1698 | private | ||
1699 | |||
1700 | - Fixed inode table races by removing it and using inode->u.generic_ip | ||
1701 | instead | ||
1702 | |||
1703 | - Moved <devfs_read_inode> into <get_vfs_inode> | ||
1704 | |||
1705 | - Moved <devfs_write_inode> into <devfs_notify_change> | ||
1706 | =============================================================================== | ||
1707 | Changes for patch v186 | ||
1708 | |||
1709 | - Fixed race in <devfs_do_symlink> for uni-processor | ||
1710 | |||
1711 | - Updated README from master HTML file | ||
1712 | =============================================================================== | ||
1713 | Changes for patch v187 | ||
1714 | |||
1715 | - Fixed drivers/char/stallion.c for devfs | ||
1716 | |||
1717 | - Fixed drivers/char/rocket.c for devfs | ||
1718 | |||
1719 | - Fixed bug in <devfs_alloc_unique_number>: limited to 128 numbers | ||
1720 | =============================================================================== | ||
1721 | Changes for patch v188 | ||
1722 | |||
1723 | - Updated major masks in fs/devfs/util.c up to Linus' "no new majors" | ||
1724 | proclamation. Block: were 126 now 122 free, char: were 26 now 19 free | ||
1725 | |||
1726 | - Updated README from master HTML file | ||
1727 | |||
1728 | - Removed remnant of multi-mount support in <devfs_mknod> | ||
1729 | |||
1730 | - Removed unused DEVFS_FL_SHOW_UNREG flag | ||
1731 | =============================================================================== | ||
1732 | Changes for patch v189 | ||
1733 | |||
1734 | - Removed nlink field from struct devfs_inode | ||
1735 | |||
1736 | - Removed auto-ownership for /dev/pty/* (BSD ptys) and used | ||
1737 | DEVFS_FL_CURRENT_OWNER|DEVFS_FL_NO_PERSISTENCE for /dev/pty/s* (just | ||
1738 | like Unix98 pty slaves) and made /dev/pty/m* rw-rw-rw- access | ||
1739 | =============================================================================== | ||
1740 | Changes for patch v190 | ||
1741 | |||
1742 | - Updated README from master HTML file | ||
1743 | |||
1744 | - Replaced BKL with global rwsem to protect symlink data (quick and | ||
1745 | dirty hack) | ||
1746 | =============================================================================== | ||
1747 | Changes for patch v191 | ||
1748 | |||
1749 | - Replaced global rwsem for symlink with per-link refcount | ||
1750 | =============================================================================== | ||
1751 | Changes for patch v192 | ||
1752 | |||
1753 | - Removed unnecessary #ifdef CONFIG_DEVFS_FS from arch/i386/kernel/mtrr.c | ||
1754 | |||
1755 | - Ported to kernel 2.4.10-pre11 | ||
1756 | |||
1757 | - Set inode->i_mapping->a_ops for block nodes in <get_vfs_inode> | ||
1758 | =============================================================================== | ||
1759 | Changes for patch v193 | ||
1760 | |||
1761 | - Went back to global rwsem for symlinks (refcount scheme no good) | ||
1762 | =============================================================================== | ||
1763 | Changes for patch v194 | ||
1764 | |||
1765 | - Fixed overrun in <devfs_link> by removing function (not needed) | ||
1766 | |||
1767 | - Updated README from master HTML file | ||
1768 | =============================================================================== | ||
1769 | Changes for patch v195 | ||
1770 | |||
1771 | - Fixed buffer underrun in <try_modload> | ||
1772 | |||
1773 | - Moved down_read() from <search_for_entry_in_dir> to <find_entry> | ||
1774 | =============================================================================== | ||
1775 | Changes for patch v196 | ||
1776 | |||
1777 | - Fixed race in <devfsd_ioctl> when setting event mask | ||
1778 | Thanks to Kari Hurtta <hurtta@leija.mh.fmi.fi> | ||
1779 | |||
1780 | - Avoid deadlock in <devfs_follow_link> by using temporary buffer | ||
1781 | =============================================================================== | ||
1782 | Changes for patch v197 | ||
1783 | |||
1784 | - First release of new locking code for devfs core (v1.0) | ||
1785 | |||
1786 | - Fixed bug in drivers/cdrom/cdrom.c | ||
1787 | =============================================================================== | ||
1788 | Changes for patch v198 | ||
1789 | |||
1790 | - Discard temporary buffer, now use "%s" for dentry names | ||
1791 | |||
1792 | - Don't generate path in <try_modload>: use fake entry instead | ||
1793 | |||
1794 | - Use "existing" directory in <_devfs_make_parent_for_leaf> | ||
1795 | |||
1796 | - Use slab cache rather than fixed buffer for devfsd events | ||
1797 | =============================================================================== | ||
1798 | Changes for patch v199 | ||
1799 | |||
1800 | - Removed obsolete usage of DEVFS_FL_NO_PERSISTENCE | ||
1801 | |||
1802 | - Send DEVFSD_NOTIFY_REGISTERED events in <devfs_mk_dir> | ||
1803 | |||
1804 | - Fixed locking bug in <devfs_d_revalidate_wait> due to typo | ||
1805 | |||
1806 | - Do not send CREATE, CHANGE, ASYNC_OPEN or DELETE events from devfsd | ||
1807 | or children | ||
1808 | =============================================================================== | ||
1809 | Changes for patch v200 | ||
1810 | |||
1811 | - Ported to kernel 2.5.1-pre2 | ||
1812 | =============================================================================== | ||
1813 | Changes for patch v201 | ||
1814 | |||
1815 | - Fixed bug in <devfsd_read>: was dereferencing freed pointer | ||
1816 | =============================================================================== | ||
1817 | Changes for patch v202 | ||
1818 | |||
1819 | - Fixed bug in <devfsd_close>: was dereferencing freed pointer | ||
1820 | |||
1821 | - Added process group check for devfsd privileges | ||
1822 | =============================================================================== | ||
1823 | Changes for patch v203 | ||
1824 | |||
1825 | - Use SLAB_ATOMIC in <devfsd_notify_de> from <devfs_d_delete> | ||
1826 | =============================================================================== | ||
1827 | Changes for patch v204 | ||
1828 | |||
1829 | - Removed long obsolete rc.devfs | ||
1830 | |||
1831 | - Return old entry in <devfs_mk_dir> for 2.4.x kernels | ||
1832 | |||
1833 | - Updated README from master HTML file | ||
1834 | |||
1835 | - Increment refcount on module in <check_disc_changed> | ||
1836 | |||
1837 | - Created <devfs_get_handle> and exported <devfs_put> | ||
1838 | |||
1839 | - Increment refcount on module in <devfs_get_ops> | ||
1840 | |||
1841 | - Created <devfs_put_ops> and used where needed to fix races | ||
1842 | |||
1843 | - Added clarifying comments in response to preliminary EMC code review | ||
1844 | |||
1845 | - Added poisoning to <devfs_put> | ||
1846 | |||
1847 | - Improved debugging messages | ||
1848 | |||
1849 | - Fixed unregister bugs in drivers/md/lvm-fs.c | ||
1850 | =============================================================================== | ||
1851 | Changes for patch v205 | ||
1852 | |||
1853 | - Corrected (made useful) debugging message in <unregister> | ||
1854 | |||
1855 | - Moved <kmem_cache_create> in <mount_devfs_fs> to <init_devfs_fs> | ||
1856 | |||
1857 | - Fixed drivers/md/lvm-fs.c to create "lvm" entry | ||
1858 | |||
1859 | - Added magic number to guard against scribbling drivers | ||
1860 | |||
1861 | - Only return old entry in <devfs_mk_dir> if a directory | ||
1862 | |||
1863 | - Defined macros for error and debug messages | ||
1864 | |||
1865 | - Updated README from master HTML file | ||
1866 | =============================================================================== | ||
1867 | Changes for patch v206 | ||
1868 | |||
1869 | - Added support for multiple Compaq cpqarray controllers | ||
1870 | |||
1871 | - Fixed (rare, old) race in <devfs_lookup> | ||
1872 | =============================================================================== | ||
1873 | Changes for patch v207 | ||
1874 | |||
1875 | - Fixed deadlock bug in <devfs_d_revalidate_wait> | ||
1876 | |||
1877 | - Tag VFS deletable in <devfs_mk_symlink> if handle ignored | ||
1878 | |||
1879 | - Updated README from master HTML file | ||
1880 | =============================================================================== | ||
1881 | Changes for patch v208 | ||
1882 | |||
1883 | - Added KERN_* to remaining messages | ||
1884 | |||
1885 | - Cleaned up declaration of <stat_read> | ||
1886 | |||
1887 | - Updated README from master HTML file | ||
1888 | =============================================================================== | ||
1889 | Changes for patch v209 | ||
1890 | |||
1891 | - Updated README from master HTML file | ||
1892 | |||
1893 | - Removed silently introduced calls to lock_kernel() and | ||
1894 | unlock_kernel() due to recent VFS locking changes. BKL isn't | ||
1895 | required in devfs | ||
1896 | |||
1897 | - Changed <devfs_rmdir> to allow later additions if not yet empty | ||
1898 | |||
1899 | - Added calls to <devfs_register_partitions> in drivers/block/blkpg.c | ||
1900 | <add_partition> and <del_partition> | ||
1901 | |||
1902 | - Fixed bug in <devfs_alloc_unique_number>: was clearing beyond | ||
1903 | bitfield | ||
1904 | |||
1905 | - Fixed bitfield data type for <devfs_*alloc_devnum> | ||
1906 | |||
1907 | - Made major bitfield type and initialiser 64 bit safe | ||
1908 | =============================================================================== | ||
1909 | Changes for patch v210 | ||
1910 | |||
1911 | - Updated fs/devfs/util.c to fix shift warning on 64 bit machines | ||
1912 | Thanks to Anton Blanchard <anton@samba.org> | ||
1913 | |||
1914 | - Updated README from master HTML file | ||
1915 | =============================================================================== | ||
1916 | Changes for patch v211 | ||
1917 | |||
1918 | - Do not put miscellaneous character devices in /dev/misc if they | ||
1919 | specify their own directory (i.e. contain a '/' character) | ||
1920 | |||
1921 | - Copied macro for error messages from fs/devfs/base.c to | ||
1922 | fs/devfs/util.c and made use of this macro | ||
1923 | |||
1924 | - Removed 2.4.x compatibility code from fs/devfs/base.c | ||
1925 | =============================================================================== | ||
1926 | Changes for patch v212 | ||
1927 | |||
1928 | - Added BKL to <devfs_open> because drivers still need it | ||
1929 | =============================================================================== | ||
1930 | Changes for patch v213 | ||
1931 | |||
1932 | - Protected <scan_dir_for_removable> and <get_removable_partition> | ||
1933 | from changing directory contents | ||
1934 | =============================================================================== | ||
1935 | Changes for patch v214 | ||
1936 | |||
1937 | - Switched to ISO C structure field initialisers | ||
1938 | |||
1939 | - Switch to set_current_state() and move before add_wait_queue() | ||
1940 | |||
1941 | - Updated README from master HTML file | ||
1942 | |||
1943 | - Fixed devfs entry leak in <devfs_readdir> when *readdir fails | ||
1944 | =============================================================================== | ||
1945 | Changes for patch v215 | ||
1946 | |||
1947 | - Created <devfs_find_and_unregister> | ||
1948 | |||
1949 | - Switched many functions from <devfs_find_handle> to | ||
1950 | <devfs_find_and_unregister> | ||
1951 | |||
1952 | - Switched many functions from <devfs_find_handle> to <devfs_get_handle> | ||
1953 | =============================================================================== | ||
1954 | Changes for patch v216 | ||
1955 | |||
1956 | - Switched arch/ia64/sn/io/hcl.c from <devfs_find_handle> to | ||
1957 | <devfs_get_handle> | ||
1958 | |||
1959 | - Removed deprecated <devfs_find_handle> | ||
1960 | =============================================================================== | ||
1961 | Changes for patch v217 | ||
1962 | |||
1963 | - Exported <devfs_find_and_unregister> and <devfs_only> to modules | ||
1964 | |||
1965 | - Updated README from master HTML file | ||
1966 | |||
1967 | - Fixed module unload race in <devfs_open> | ||
1968 | =============================================================================== | ||
1969 | Changes for patch v218 | ||
1970 | |||
1971 | - Removed DEVFS_FL_AUTO_OWNER flag | ||
1972 | |||
1973 | - Switched lingering structure field initialiser to ISO C | ||
1974 | |||
1975 | - Added locking when setting/clearing flags | ||
1976 | |||
1977 | - Documentation fix in fs/devfs/util.c | ||
diff --git a/Documentation/filesystems/devfs/README b/Documentation/filesystems/devfs/README deleted file mode 100644 index aabfba24bc2e..000000000000 --- a/Documentation/filesystems/devfs/README +++ /dev/null | |||
@@ -1,1959 +0,0 @@ | |||
1 | Devfs (Device File System) FAQ | ||
2 | |||
3 | |||
4 | Linux Devfs (Device File System) FAQ | ||
5 | Richard Gooch | ||
6 | 20-AUG-2002 | ||
7 | |||
8 | |||
9 | Document languages: | ||
10 | |||
11 | |||
12 | |||
13 | |||
14 | |||
15 | |||
16 | |||
17 | ----------------------------------------------------------------------------- | ||
18 | |||
19 | NOTE: the master copy of this document is available online at: | ||
20 | |||
21 | http://www.atnf.csiro.au/~rgooch/linux/docs/devfs.html | ||
22 | and looks much better than the text version distributed with the | ||
23 | kernel sources. A mirror site is available at: | ||
24 | |||
25 | http://www.ras.ucalgary.ca/~rgooch/linux/docs/devfs.html | ||
26 | |||
27 | There is also an optional daemon that may be used with devfs. You can | ||
28 | find out more about it at: | ||
29 | |||
30 | http://www.atnf.csiro.au/~rgooch/linux/ | ||
31 | |||
32 | A mailing list is available which you may subscribe to. Send email | ||
33 | |||
34 | to majordomo@oss.sgi.com with the following line in the | ||
35 | body of the message: | ||
36 | subscribe devfs | ||
37 | To unsubscribe, send the message body: | ||
38 | unsubscribe devfs | ||
39 | instead. The list is archived at | ||
40 | |||
41 | http://oss.sgi.com/projects/devfs/archive/. | ||
42 | |||
43 | ----------------------------------------------------------------------------- | ||
44 | |||
45 | Contents | ||
46 | |||
47 | |||
48 | What is it? | ||
49 | |||
50 | Why do it? | ||
51 | |||
52 | Who else does it? | ||
53 | |||
54 | How it works | ||
55 | |||
56 | Operational issues (essential reading) | ||
57 | |||
58 | Instructions for the impatient | ||
59 | Permissions persistence across reboots | ||
60 | Dealing with drivers without devfs support | ||
61 | All the way with Devfs | ||
62 | Other Issues | ||
63 | Kernel Naming Scheme | ||
64 | Devfsd Naming Scheme | ||
65 | Old Compatibility Names | ||
66 | SCSI Host Probing Issues | ||
67 | |||
68 | |||
69 | |||
70 | Device drivers currently ported | ||
71 | |||
72 | Allocation of Device Numbers | ||
73 | |||
74 | Questions and Answers | ||
75 | |||
76 | Making things work | ||
77 | Alternatives to devfs | ||
78 | What I don't like about devfs | ||
79 | How to report bugs | ||
80 | Strange kernel messages | ||
81 | Compilation problems with devfsd | ||
82 | |||
83 | |||
84 | Other resources | ||
85 | |||
86 | Translations of this document | ||
87 | |||
88 | |||
89 | ----------------------------------------------------------------------------- | ||
90 | |||
91 | |||
92 | What is it? | ||
93 | |||
94 | Devfs is an alternative to "real" character and block special devices | ||
95 | on your root filesystem. Kernel device drivers can register devices by | ||
96 | name rather than major and minor numbers. These devices will appear in | ||
97 | devfs automatically, with whatever default ownership and | ||
98 | protection the driver specified. A daemon (devfsd) can be used to | ||
99 | override these defaults. Devfs has been in the kernel since 2.3.46. | ||
100 | |||
101 | NOTE that devfs is entirely optional. If you prefer the old | ||
102 | disc-based device nodes, then simply leave CONFIG_DEVFS_FS=n (the | ||
103 | default). In this case, nothing will change. ALSO NOTE that if you do | ||
104 | enable devfs, the defaults are such that full compatibility is | ||
105 | maintained with the old device names. | ||
106 | |||
107 | There are two aspects to devfs: one is the underlying device | ||
108 | namespace, which is a namespace just like any mounted filesystem. The | ||
109 | other aspect is the filesystem code which provides a view of the | ||
110 | device namespace. The reason I make a distinction is because devfs | ||
111 | can be mounted many times, with each mount showing the same device | ||
112 | namespace. Changes made are global to all mounted devfs filesystems. | ||
113 | Also, because the devfs namespace exists without any devfs mounts, you | ||
114 | can easily mount the root filesystem by referring to an entry in the | ||
115 | devfs namespace. | ||
116 | |||
117 | |||
118 | The cost of devfs is a small increase in kernel code size and memory | ||
119 | usage. About 7 pages of code (some of that in __init sections) and 72 | ||
120 | bytes for each entry in the namespace. A modest system has only a | ||
121 | couple of hundred device entries, so this costs a few more | ||
122 | pages. Compare this with the suggestion to put /dev on a | ||
123 | ramdisc. | ||
124 | |||
125 | On a typical machine, the cost is under 0.2 percent. On a modest | ||
126 | system with 64 MBytes of RAM, the cost is under 0.1 percent. The | ||
127 | accusations of "bloatware" levelled at devfs are not justified. | ||
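The figures above can be checked with a little arithmetic. The sketch below is only an illustration: it assumes a 4096-byte page (typical for x86; the text does not state a page size) and reads "a couple of hundred" as 200 entries. The function names are hypothetical.

```python
# Rough memory cost of devfs, using the figures quoted in the text.
PAGE_SIZE = 4096        # assumption: 4 kByte pages, as on x86
CODE_PAGES = 7          # "about 7 pages of code"
BYTES_PER_ENTRY = 72    # per entry in the namespace
ENTRIES = 200           # assumption: "a couple of hundred" entries


def devfs_cost_bytes(entries=ENTRIES):
    """Return the approximate devfs footprint in bytes."""
    return CODE_PAGES * PAGE_SIZE + entries * BYTES_PER_ENTRY


def cost_percent(ram_mbytes, entries=ENTRIES):
    """Return the footprint as a percentage of total RAM."""
    return 100.0 * devfs_cost_bytes(entries) / (ram_mbytes * 1024 * 1024)


if __name__ == "__main__":
    print(devfs_cost_bytes())           # 43072 bytes
    print(round(cost_percent(64), 3))   # 0.064 (percent of 64 MBytes)
```

Under these assumptions the total is about 42 kBytes, which is consistent with the "under 0.1 percent" figure for a 64 MByte system.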
128 | |||
129 | ----------------------------------------------------------------------------- | ||
130 | |||
131 | |||
132 | Why do it? | ||
133 | |||
134 | There are several problems that devfs addresses. Some of these | ||
135 | problems are more serious than others (depending on your point of | ||
136 | view), and some can be solved without devfs. However, the totality of | ||
137 | these problems really calls out for devfs. | ||
138 | |||
139 | The choice is a patchwork of inefficient user space solutions, which | ||
140 | are complex and likely to be fragile, or to use a simple and efficient | ||
141 | devfs which is robust. | ||
142 | |||
143 | There have been many counter-proposals to devfs, all seeking to | ||
144 | provide some of the benefits without actually implementing devfs. So | ||
145 | far there has been an absence of code and no proposed alternative has | ||
146 | been able to provide all the features that devfs does. Further, | ||
147 | alternative proposals require far more complexity in user-space (and | ||
148 | still deliver less functionality than devfs). Some people have the | ||
149 | mantra of reducing "kernel bloat", but don't consider the effects on | ||
150 | user-space. | ||
151 | |||
152 | A good solution limits the total complexity of kernel-space and | ||
153 | user-space. | ||
154 | |||
155 | |||
156 | Major&minor allocation | ||
157 | |||
158 | The existing scheme requires the allocation of major and minor device | ||
159 | numbers for each and every device. This means that a central | ||
160 | co-ordinating authority is required to issue these device numbers | ||
161 | (unless you're developing a "private" device driver), in order to | ||
162 | preserve uniqueness. Devfs shifts the burden to a namespace. This may | ||
163 | not seem like a huge benefit, but actually it is. Since driver authors | ||
164 | will naturally choose a device name which reflects the functionality | ||
165 | of the device, there is far less potential for namespace conflict. | ||
166 | Solving this requires a kernel change. | ||
167 | |||
168 | /dev management | ||
169 | |||
170 | Because you currently access devices through device nodes, these must | ||
171 | be created by the system administrator. For standard devices you can | ||
172 | usually find a MAKEDEV programme which creates all these (hundreds!) | ||
173 | of nodes. This means that changes in the kernel must be reflected by | ||
174 | changes in the MAKEDEV programme, or else the system administrator | ||
175 | creates device nodes by hand. | ||
176 | |||
177 | The basic problem is that there are two separate databases of | ||
178 | major and minor numbers. One is in the kernel and one is in /dev (or | ||
179 | in a MAKEDEV programme, if you want to look at it that way). This is | ||
180 | duplication of information, which is not good practice. | ||
181 | Solving this requires a kernel change. | ||
182 | |||
183 | /dev growth | ||
184 | |||
185 | A typical /dev has over 1200 nodes! Most of these devices simply don't | ||
186 | exist because the hardware is not available. A huge /dev increases the | ||
187 | time to access devices (I'm just referring to the dentry lookup times | ||
188 | and the time taken to read inodes off disc: the next subsection shows | ||
189 | some more horrors). | ||
190 | |||
191 | An example of how big /dev can grow is if we consider SCSI devices: | ||
192 | |||
193 | host 6 bits (say up to 64 hosts on a really big machine) | ||
194 | channel 4 bits (say up to 16 SCSI buses per host) | ||
195 | id 4 bits | ||
196 | lun 3 bits | ||
197 | partition 6 bits | ||
198 | TOTAL 23 bits | ||
199 | |||
200 | |||
201 | This requires 8 Mega (1024*1024) inodes if we want to store all | ||
202 | possible device nodes. Even if we scrap everything but id,partition | ||
203 | and assume a single host adapter with a single SCSI bus and only one | ||
204 | logical unit per SCSI target (id), that's still 10 bits or 1024 | ||
205 | inodes. Each VFS inode takes around 256 bytes (kernel 2.1.78), so | ||
206 | that's 256 kBytes of inode storage on disc (assuming real inodes take | ||
207 | a similar amount of space as VFS inodes). This is actually not so bad, | ||
208 | because disc is cheap these days. Embedded systems would care about | ||
209 | 256 kBytes of /dev inodes, but you could argue that embedded systems | ||
210 | would have hand-tuned /dev directories. I've had to do just that on my | ||
211 | embedded systems, but I would rather just leave it to devfs. | ||
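The node counts in the SCSI example follow directly from the bit widths in the table. A minimal sketch of the arithmetic, using the 256-byte inode size quoted above (the helper name is hypothetical):

```python
# Bit widths taken from the SCSI table in the text.
SCSI_FIELD_BITS = {"host": 6, "channel": 4, "id": 4, "lun": 3, "partition": 6}
INODE_BYTES = 256   # approximate VFS inode size quoted for kernel 2.1.78


def node_count(fields):
    """Number of distinct device nodes addressable by the given fields."""
    return 2 ** sum(SCSI_FIELD_BITS[name] for name in fields)


if __name__ == "__main__":
    every_node = node_count(SCSI_FIELD_BITS)      # all 23 bits
    reduced = node_count(["id", "partition"])     # single host, bus and lun
    print(every_node)                             # 8388608, i.e. 8 * 1024 * 1024
    print(reduced)                                # 1024
    print(reduced * INODE_BYTES // 1024)          # 256 (kBytes of inodes)
```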
212 | |||
213 | Another issue is the time taken to look up an inode when first | ||
214 | referenced. Not only does this take time scanning through a list in | ||
215 | memory, but there are also the seek times to read the inodes off disc. | ||
216 | This could be solved in user-space using a clever programme which | ||
217 | scanned the kernel logs and deleted /dev entries which are not | ||
218 | available and created them when they were available. This programme | ||
219 | would need to be run every time a new module was loaded, which would | ||
220 | slow things down a lot. | ||
221 | |||
222 | There is an existing programme called scsidev which will automatically | ||
223 | create device nodes for SCSI devices. It can do this by scanning files | ||
224 | in /proc/scsi. Unfortunately, to extend this idea to other device | ||
225 | nodes would require significant modifications to existing drivers (so | ||
226 | they too would provide information in /proc). This is a non-trivial | ||
227 | change (I should know: devfs has had to do something similar). Once | ||
228 | you go to this much effort, you may as well use devfs itself (which | ||
229 | also provides this information). Furthermore, such a system would | ||
230 | likely be implemented in an ad-hoc fashion, as different drivers will | ||
231 | provide their information in different ways. | ||
232 | |||
233 | Devfs is much cleaner, because it (naturally) has a uniform mechanism | ||
234 | to provide this information: the device nodes themselves! | ||
235 | |||
236 | |||
237 | Node to driver file_operations translation | ||
238 | |||
239 | There is an important difference between the way disc-based character | ||
240 | and block nodes and devfs entries make the connection between an entry | ||
241 | in /dev and the actual device driver. | ||
242 | |||
243 | With the current 8 bit major and minor numbers the connection between | ||
244 | disc-based c&b nodes and per-major drivers is done through a | ||
245 | fixed-length table of 128 entries. The various filesystem types set | ||
246 | the inode operations for c&b nodes to {chr,blk}dev_inode_operations, | ||
247 | so when a device is opened a few quick levels of indirection bring us | ||
248 | to the driver file_operations. | ||
249 | |||
250 | For miscellaneous character devices a second step is required: there | ||
251 | is a scan for the driver entry with the same minor number as the file | ||
252 | that was opened, and the appropriate minor open method is called. This | ||
253 | scanning is done *every time* you open a device node. Potentially, you | ||
254 | may be searching through dozens of misc. entries before you find your | ||
255 | open method. While not an enormous performance overhead, this does | ||
256 | seem pointless. | ||
257 | |||
258 | Linux *must* move beyond the 8 bit major and minor barrier, | ||
259 | somehow. If we simply increase each to 16 bits, then the indexing | ||
260 | scheme used for major driver lookup becomes untenable, because the | ||
261 | major tables (one each for character and block devices) would need to | ||
262 | be 64 k entries long (512 kBytes on x86, 1 MByte for 64 bit | ||
263 | systems). So we would have to use a scheme like that used for | ||
264 | miscellaneous character devices, which means the search time goes up | ||
265 | linearly with the average number of major device drivers on your | ||
266 | system. Not all "devices" are hardware, some are higher-level drivers | ||
267 | like KGI, so you can get more "devices" without adding hardware. | ||
268 | You can improve this by creating an ordered (balanced:-) | ||
269 | binary tree, in which case your search time becomes log(N). | ||
270 | Alternatively, you can use hashing to speed up the search. | ||
271 | But why do that search at all if you don't have to? Once again, it | ||
272 | seems pointless. | ||
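The table sizes quoted above can be reproduced if each table entry is assumed to hold two pointers (for instance a name and a file_operations pointer). That layout is an assumption made for this sketch, not something the text states; it is chosen because it matches the 512 kByte and 1 MByte figures.

```python
POINTERS_PER_ENTRY = 2   # assumption: e.g. a name pointer plus an fops pointer


def major_table_bytes(major_bits, pointer_bytes):
    """Size of a directly indexed per-major driver table."""
    return (2 ** major_bits) * POINTERS_PER_ENTRY * pointer_bytes


if __name__ == "__main__":
    # 16 bit majors with 4-byte pointers (x86) and 8-byte pointers (64 bit)
    print(major_table_bytes(16, 4) // 1024)   # 512 (kBytes)
    print(major_table_bytes(16, 8) // 1024)   # 1024 (kBytes), i.e. 1 MByte
```

Note that this is the size of one table; the text points out that there is one each for character and block devices.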
273 | |||
274 | Note that devfs doesn't use the major&minor system. For devfs | ||
275 | entries, the connection is done when you look up the /dev entry. When | ||
276 | devfs_register() is called, an internal table is appended which has | ||
277 | the entry name and the file_operations. If the dentry cache doesn't | ||
278 | have the /dev entry already, this internal table is scanned to get the | ||
279 | file_operations, and an inode is created. If the dentry cache already | ||
280 | has the entry, there is *no lookup time* (other than the dentry scan | ||
281 | itself, but we can't avoid that anyway, and besides Linux dentries | ||
282 | cream other OS's which don't have them:-). Furthermore, the number of | ||
283 | node entries in a devfs is only the number of available device | ||
284 | entries, not the number of *conceivable* entries. Even if you remove | ||
285 | unnecessary entries in a disc-based /dev, the number of conceivable | ||
286 | entries remains the same: you just limit yourself in order to save | ||
287 | space. | ||
288 | |||
289 | Devfs provides a fast connection between a VFS node and the device | ||
290 | driver, in a scalable way. | ||
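The lookup path described above can be modelled in a few lines. This is a toy model in Python, not the kernel implementation: the class, its attributes and the dictionary standing in for the dentry cache are all hypothetical, and only the behaviour (append on registration, scan the table once, then hit the cache) is taken from the text.

```python
class ToyDevfs:
    """Toy model of name-based registration and lookup."""

    def __init__(self):
        self._table = []       # internal table, appended to on registration
        self._dcache = {}      # stands in for the dentry cache
        self.table_scans = 0   # counts how often the table had to be scanned

    def register(self, name, fops):
        """Model of devfs_register(): record the entry name and its fops."""
        self._table.append((name, fops))

    def lookup(self, name):
        """Return the fops for name, scanning the table only on a cache miss."""
        if name in self._dcache:
            return self._dcache[name]       # cached: no table lookup at all
        self.table_scans += 1
        for entry_name, fops in self._table:
            if entry_name == name:
                self._dcache[name] = fops   # models creating the inode
                return fops
        return None                         # no driver information found


if __name__ == "__main__":
    fs = ToyDevfs()
    fs.register("misc/example", "example_fops")
    fs.lookup("misc/example")
    fs.lookup("misc/example")
    print(fs.table_scans)   # 1: the second lookup was served from the cache
```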
291 | |||
292 | /dev as a system administration tool | ||
293 | |||
294 | Right now /dev contains a list of conceivable devices, most of which I | ||
295 | don't have. Devfs only shows those devices available on my | ||
296 | system. This means that listing /dev is a handy way of checking what | ||
297 | devices are available. | ||
298 | |||
299 | Major&minor size | ||
300 | |||
301 | Existing major and minor numbers are limited to 8 bits each. This is | ||
302 | now a limiting factor for some drivers, particularly the SCSI disc | ||
303 | driver, which consumes a single major number. Only 16 discs are | ||
304 | supported, and each disc may have only 15 partitions. Maybe this isn't | ||
305 | a problem for you, but some of us are building huge Linux systems with | ||
306 | disc arrays. With devfs an arbitrary pointer can be associated with | ||
307 | each device entry, which can be used to give an effective 32 bit | ||
308 | device identifier (i.e. that's like having a 32 bit minor | ||
309 | number). Since this is private to the kernel, there are no C library | ||
310 | compatibility issues which you would have with increasing major and | ||
311 | minor number sizes. See the section on "Allocation of Device Numbers" | ||
312 | for details on maintaining compatibility with userspace. | ||
313 | |||
314 | Solving this requires a kernel change. | ||
315 | |||
316 | Since writing this, the kernel has been modified so that the SCSI disc | ||
317 | driver has more major numbers allocated to it and now supports up to | ||
318 | 128 discs. Since these major numbers are non-contiguous (a result of | ||
319 | unplanned expansion), the implementation is a little more cumbersome | ||
320 | than originally. | ||
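The disc limits in this subsection follow from the 8 bit minor number. The sketch below assumes each disc uses 16 minors (one for the whole disc plus 15 partitions), which is what the "16 discs, 15 partitions" figures imply; the function names are hypothetical.

```python
MINOR_BITS = 8
MINORS_PER_DISC = 16   # assumption: whole-disc node plus 15 partitions


def discs_per_major():
    """How many discs fit in the minor space of a single major number."""
    return 2 ** MINOR_BITS // MINORS_PER_DISC


def majors_needed(discs):
    """Major numbers required for the given disc count (rounded up)."""
    return -(-discs // discs_per_major())


if __name__ == "__main__":
    print(discs_per_major())    # 16
    print(majors_needed(128))   # 8
```

So supporting 128 discs under the old scheme costs eight major numbers, which is why the later expansion needed several additional majors.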
321 | |||
322 | Just like the changes to IPv4 to fix impending limitations in the | ||
323 | address space, people find ways around the limitations. In the long | ||
324 | run, however, solutions like IPv6 or devfs can't be put off forever. | ||
325 | |||
326 | Read-only root filesystem | ||
327 | |||
328 | Having your device nodes on the root filesystem means that you can't | ||
329 | operate properly with a read-only root filesystem. This is because you | ||
330 | want to change ownerships and protections of tty devices. Existing | ||
331 | practice prevents you using a CD-ROM as your root filesystem for a | ||
332 | *real* system. Sure, you can boot off a CD-ROM, but you can't change | ||
333 | tty ownerships, so it's only good for installing. | ||
334 | |||
335 | Also, you can't use a shared NFS root filesystem for a cluster of | ||
336 | discless Linux machines (having tty ownerships changed on a common | ||
337 | /dev is not good). Nor can you embed your root filesystem in a | ||
338 | ROM-FS. | ||
339 | |||
340 | You can get around this by creating a RAMDISC at boot time, making | ||
341 | an ext2 filesystem in it, mounting it somewhere and copying the | ||
342 | contents of /dev into it, then unmounting it and mounting it over | ||
343 | /dev. | ||
344 | |||
345 | A devfs is a cleaner way of solving this. | ||
346 | |||
347 | Non-Unix root filesystem | ||
348 | |||
349 | Non-Unix filesystems (such as NTFS) can't be used for a root | ||
350 | filesystem because they variously don't support character and block | ||
351 | special files or symbolic links. You can't have a separate disc-based | ||
352 | or RAMDISC-based filesystem mounted on /dev because you need device | ||
353 | nodes before you can mount these. Devfs can be mounted without any | ||
354 | device nodes. Devlinks won't work because symlinks aren't supported. | ||
355 | An alternative solution is to use initrd to mount a RAMDISC initial | ||
356 | root filesystem (which is populated with a minimal set of device | ||
357 | nodes), and then construct a new /dev in another RAMDISC, and finally | ||
358 | switch to your non-Unix root filesystem. This requires clever boot | ||
359 | scripts and a fragile and conceptually complex boot procedure. | ||
360 | |||
361 | Devfs solves this in a robust and conceptually simple way. | ||
362 | |||
363 | PTY security | ||
364 | |||
365 | Current pseudo-tty (pty) devices are owned by root and read-writable | ||
366 | by everyone. The user of a pty-pair cannot change | ||
367 | ownership/protections without being suid-root. | ||
368 | |||
369 | This could be solved with a secure user-space daemon which runs as | ||
370 | root and does the actual creation of pty-pairs. Such a daemon would | ||
371 | require modification to *every* programme that wants to use this new | ||
372 | mechanism. It also slows down creation of pty-pairs. | ||
373 | |||
374 | An alternative is to create a new open_pty() syscall which does much | ||
375 | the same thing as the user-space daemon. Once again, this requires | ||
376 | modifications to pty-handling programmes. | ||
377 | |||
378 | The devfs solution allows a device driver to "tag" certain device | ||
379 | files so that when an unopened device is opened, the ownerships are | ||
380 | changed to the current euid and egid of the opening process, and the | ||
381 | protections are changed to the default registered by the driver. When | ||
382 | the device is closed ownership is set back to root and protections are | ||
383 | set back to read-write for everybody. No programme need be changed. | ||
384 | The devpts filesystem provides this auto-ownership feature for Unix98 | ||
385 | ptys. It doesn't support old-style pty devices, nor does it have all | ||
386 | the other features of devfs. | ||
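The auto-ownership behaviour can be illustrated with a small state model. This is a sketch of the semantics described above only: the class name, the 0600 default mode and the use of GID 0 on close are assumptions made for illustration.

```python
from dataclasses import dataclass

ROOT = 0
WORLD_RW = 0o666   # "read-write for everybody"


@dataclass
class ToyPtyNode:
    """Toy model of a device entry tagged for auto-ownership."""
    default_mode: int = 0o600   # assumption: default registered by the driver
    uid: int = ROOT
    gid: int = ROOT
    mode: int = WORLD_RW
    open_count: int = 0

    def open(self, euid, egid):
        # Only the first open of an unopened device changes ownership.
        if self.open_count == 0:
            self.uid, self.gid, self.mode = euid, egid, self.default_mode
        self.open_count += 1

    def close(self):
        self.open_count -= 1
        # The last close hands the node back to root.
        if self.open_count == 0:
            self.uid, self.gid, self.mode = ROOT, ROOT, WORLD_RW


if __name__ == "__main__":
    node = ToyPtyNode()
    node.open(euid=1000, egid=100)
    print(node.uid, oct(node.mode))   # 1000 0o600
    node.close()
    print(node.uid, oct(node.mode))   # 0 0o666
```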
387 | |||
388 | Intelligent device management | ||
389 | |||
390 | Devfs implements a simple yet powerful protocol for communication with | ||
391 | a device management daemon (devfsd) which runs in user space. It is | ||
392 | possible to send a message (either synchronously or asynchronously) to | ||
393 | devfsd on any event, such as registration/unregistration of device | ||
394 | entries, opening and closing devices, looking up inodes, scanning | ||
395 | directories and more. This has many possibilities. Some of these are | ||
396 | already implemented. See: | ||
397 | |||
398 | |||
399 | http://www.atnf.csiro.au/~rgooch/linux/ | ||
400 | |||
401 | Device entry registration events can be used by devfsd to change | ||
402 | permissions of newly-created device nodes. This is one mechanism to | ||
403 | control device permissions. | ||
404 | |||
405 | Device entry registration/unregistration events can be used to run | ||
406 | programmes or scripts. This can be used to provide automatic mounting | ||
407 | of filesystems when new block device media is inserted into the | ||
408 | drive. | ||
409 | |||
410 | Asynchronous device open and close events can be used to implement | ||
411 | clever permissions management. For example, the default permissions on | ||
412 | /dev/dsp do not allow everybody to read from the device. This is | ||
413 | sensible, as you don't want some remote user recording what you say at | ||
414 | your console. However, the console user is also prevented from | ||
415 | recording. This behaviour is not desirable. With asynchronous device | ||
416 | open and close events, you can have devfsd run a programme or script | ||
417 | when console devices are opened to change the ownerships for *other* | ||
418 | device nodes (such as /dev/dsp). On closure, you can run a different | ||
419 | script to restore permissions. An advantage of this scheme over | ||
420 | modifying the C library tty handling is that this works even if your | ||
421 | programme crashes (how many times have you seen the utmp database with | ||
422 | lingering entries for non-existent logins?). | ||
423 | |||
424 | Synchronous device open events can be used to perform intelligent | ||
425 | device access protections. Before the device driver open() method is | ||
426 | called, the daemon must first validate the open attempt, by running an | ||
427 | external programme or script. This is far more flexible than access | ||
428 | control lists, as access can be determined on the basis of other | ||
429 | system conditions instead of just the UID and GID. | ||
430 | |||
431 | Inode lookup events can be used to authenticate module autoload | ||
432 | requests. Instead of using kmod directly, the event is sent to | ||
433 | devfsd which can implement an arbitrary authentication before loading | ||
434 | the module itself. | ||
435 | |||
436 | Inode lookup events can also be used to construct arbitrary | ||
437 | namespaces, without having to resort to populating devfs with symlinks | ||
438 | to devices that don't exist. | ||
439 | |||
440 | Speculative Device Scanning | ||
441 | |||
442 | Consider an application (like cdparanoia) that wants to find all | ||
443 | CD-ROM devices on the system (SCSI, IDE and other types), whether or | ||
444 | not their respective modules are loaded. The application must | ||
445 | speculatively open certain device nodes (such as /dev/sr0 for the SCSI | ||
446 | CD-ROMs) in order to make sure the module is loaded. This requires | ||
447 | that all Linux distributions follow the standard device naming scheme | ||
448 | (last time I looked RedHat did things differently). Devfs solves the | ||
449 | naming problem. | ||
450 | |||
451 | The same application also wants to see which devices are actually | ||
452 | available on the system. With the existing system it needs to read the | ||
453 | /dev directory and speculatively open each /dev/sr* device to | ||
454 | determine if the device exists or not. With a large /dev this is an | ||
455 | inefficient operation, especially if there are many /dev/sr* nodes. A | ||
456 | solution like scsidev could reduce the number of /dev/sr* entries (but | ||
457 | of course that also requires all that inefficient directory scanning). | ||
458 | |||
459 | With devfs, the application can open the /dev/sr directory | ||
460 | (which triggers the module autoloading if required), and proceed to | ||
461 | read /dev/sr. Since only the available devices will have | ||
462 | entries, there are no inefficiencies in directory scanning or device | ||
463 | openings. | ||
464 | |||
465 | ----------------------------------------------------------------------------- | ||
466 | |||
467 | Who else does it? | ||
468 | |||
469 | FreeBSD has a devfs implementation. Solaris and AIX each have a | ||
470 | pseudo-devfs (something akin to scsidev but for all devices, with some | ||
471 | unspecified kernel support). BeOS, Plan9 and QNX also have it. SGI's | ||
472 | IRIX 6.4 and above also have a device filesystem. | ||
473 | |||
474 | While we shouldn't just automatically do something because others do | ||
475 | it, we should not ignore the work of others either. FreeBSD has a lot | ||
476 | of competent people working on it, so their opinion should not be | ||
477 | blithely ignored. | ||
478 | |||
479 | ----------------------------------------------------------------------------- | ||
480 | |||
481 | |||
482 | How it works | ||
483 | |||
484 | Registering device entries | ||
485 | |||
486 | For every entry (device node) in a devfs-based /dev a driver must call | ||
487 | devfs_register(). This adds the name of the device entry, the | ||
488 | file_operations structure pointer and a few other things to an | ||
489 | internal table. Device entries may be added and removed at any | ||
490 | time. When a device entry is registered, it automagically appears in | ||
491 | any mounted devfs. | ||
492 | |||
493 | Inode lookup | ||
494 | |||
495 | When a lookup operation is performed on an entry and there is no | ||
496 | driver information for that entry, devfs will attempt to call | ||
497 | devfsd. If still no driver information can be found then a negative | ||
498 | dentry is yielded and the next stage operation will be called by the | ||
499 | VFS (such as create() or mknod() inode methods). If driver information | ||
500 | can be found, an inode is created (if one does not exist already) and | ||
501 | all is well. | ||
502 | |||
503 | Manually creating device nodes | ||
504 | |||
505 | The mknod() method allows you to create an ordinary named pipe in the | ||
506 | devfs, or you can create a character or block special inode if one | ||
507 | does not already exist. You may wish to create a character or block | ||
508 | special inode so that you can set permissions and ownership. Later, if | ||
509 | a device driver registers an entry with the same name, the | ||
510 | permissions, ownership and times are retained. This is how you can set | ||
511 | the protections on a device even before the driver is loaded. Once you | ||
512 | create an inode it appears in the directory listing. | ||
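The way a manually created node keeps its permissions when a driver registers later can be sketched as follows. This is a toy model, not kernel code: the class names, the 0600 driver default and the root ownership of driver-created entries are assumptions.

```python
class ToyEntry:
    """A namespace entry: permissions plus an optional driver binding."""

    def __init__(self, mode, uid, gid):
        self.mode, self.uid, self.gid = mode, uid, gid
        self.fops = None   # no driver attached yet


class ToyNamespace:
    def __init__(self):
        self.entries = {}

    def mknod(self, name, mode, uid, gid):
        """Create a node by hand, before any driver has registered it."""
        if name in self.entries:
            raise FileExistsError(name)
        self.entries[name] = ToyEntry(mode, uid, gid)

    def register(self, name, fops, default_mode=0o600):
        """Driver registration: attach fops, keep any existing permissions."""
        entry = self.entries.get(name)
        if entry is None:
            entry = ToyEntry(default_mode, 0, 0)   # driver defaults apply
            self.entries[name] = entry
        entry.fops = fops
        return entry


if __name__ == "__main__":
    ns = ToyNamespace()
    ns.mknod("example", mode=0o660, uid=1000, gid=100)
    entry = ns.register("example", "example_fops")
    print(oct(entry.mode), entry.uid)   # 0o660 1000: mknod settings retained
```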
513 | |||
514 | Unregistering device entries | ||
515 | |||
516 | A device driver calls devfs_unregister() to unregister an entry. | ||
517 | |||
518 | Chroot() gaols | ||
519 | |||
520 | 2.2.x kernels | ||
521 | |||
522 | The semantics of inode creation are different when devfs is mounted | ||
523 | with the "explicit" option. Now, when a device entry is registered, it | ||
524 | will not appear until you use mknod() to create the device. It doesn't | ||
525 | matter if you mknod() before or after the device is registered with | ||
526 | devfs_register(). The purpose of this behaviour is to support | ||
527 | chroot(2) gaols, where you want to mount a minimal devfs inside the | ||
528 | gaol. Only the devices you specifically want to be available (through | ||
529 | your mknod() setup) will be accessible. | ||
530 | |||
531 | 2.4.x kernels | ||
532 | |||
533 | As of kernel 2.3.99, the VFS has had the ability to rebind parts of | ||
534 | the global filesystem namespace into another part of the namespace. | ||
535 | This now works even at the leaf-node level, which means that | ||
536 | individual files and device nodes may be bound into other parts of the | ||
537 | namespace. This is like making links, but better, because it works | ||
538 | across filesystems (unlike hard links) and works through chroot() | ||
539 | gaols (unlike symbolic links). | ||
540 | |||
541 | Because of these improvements to the VFS, the multi-mount capability | ||
542 | in devfs is no longer needed. The administrator may create a minimal | ||
543 | device tree inside a chroot(2) gaol by using VFS bindings. As this | ||
544 | provides most of the features of the devfs multi-mount capability, I | ||
545 | removed the multi-mount support code (after issuing an RFC). This | ||
546 | yielded code size reductions and simplifications. | ||
547 | |||
548 | If you want to construct a minimal chroot() gaol, the following | ||
549 | command should suffice: | ||
550 | |||
551 | mount --bind /dev/null /gaol/dev/null | ||
552 | |||
553 | |||
554 | Repeat for other device nodes you want to expose. Simple! | ||
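If several nodes need to be exposed, the command lines can be generated rather than typed by hand. The helper below is a hypothetical convenience, not part of devfs or devfsd. It only builds the command strings and does not run them. It assumes the gaol lives at /gaol as in the example above, and that each target file already exists, since a bind mount needs an existing mount point.

```python
import shlex


def bind_mount_commands(nodes, gaol="/gaol"):
    """Build one 'mount --bind' command line per device node."""
    return [
        "mount --bind {} {}".format(shlex.quote(node), shlex.quote(gaol + node))
        for node in nodes
    ]


if __name__ == "__main__":
    for command in bind_mount_commands(["/dev/null", "/dev/zero"]):
        print(command)
    # mount --bind /dev/null /gaol/dev/null
    # mount --bind /dev/zero /gaol/dev/zero
```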
555 | |||
556 | ----------------------------------------------------------------------------- | ||
557 | |||
558 | |||
559 | Operational issues | ||
560 | |||
561 | |||
562 | Instructions for the impatient | ||
563 | |||
564 | Nobody likes reading documentation. People just want to get in there | ||
565 | and play. So this section tells you quickly the steps you need to take | ||
566 | to run with devfs mounted over /dev. Skip these steps and you will end | ||
567 | up with a nearly unbootable system. Subsequent sections describe the | ||
568 | issues in more detail, and discuss non-essential configuration | ||
569 | options. | ||
570 | |||
571 | Devfsd | ||
572 | OK, if you're reading this, I assume you want to play with | ||
573 | devfs. First you should ensure that /usr/src/linux contains a | ||
574 | recent kernel source tree. Then you need to compile devfsd, the device | ||
575 | management daemon, available at | ||
576 | |||
577 | http://www.atnf.csiro.au/~rgooch/linux/. | ||
578 | Because the kernel has a naming scheme | ||
579 | which is quite different from the old naming scheme, you need to | ||
580 | install devfsd so that software and configuration files that use the | ||
581 | old naming scheme will not break. | ||
582 | |||
583 | Compile and install devfsd. You will be provided with a default | ||
584 | configuration file /etc/devfsd.conf which will provide | ||
585 | compatibility symlinks for the old naming scheme. Don't change this | ||
586 | config file unless you know what you're doing. Even if you think you | ||
587 | do know what you're doing, don't change it until you've followed all | ||
588 | the steps below and booted a devfs-enabled system and verified that it | ||
589 | works. | ||
590 | |||
591 | Now edit your main system boot script so that devfsd is started at the | ||
592 | very beginning (before any filesystem | ||
593 | checks). /etc/rc.d/rc.sysinit is often the main boot script | ||
594 | on systems with SysV-style boot scripts. On systems with BSD-style | ||
595 | boot scripts it is often /etc/rc. Also check | ||
596 | /sbin/rc. | ||
597 | |||
598 | NOTE that the line you put into the boot | ||
599 | script should be exactly: | ||
600 | |||
601 | /sbin/devfsd /dev | ||
602 | |||
603 | DO NOT use some special daemon-launching | ||
604 | programme, otherwise the boot script may not wait for devfsd to finish | ||
605 | initialising. | ||
606 | |||
607 | System Libraries | ||
608 | There may still be some problems because of broken software making | ||
609 | assumptions about device names. In particular, some software does not | ||
610 | handle devices which are symbolic links. If you are running a libc 5 | ||
611 | based system, install libc 5.4.44 (if you have libc 5.4.46, go back to | ||
612 | libc 5.4.44, which is actually correct). If you are running a glibc | ||
613 | based system, make sure you have glibc 2.1.3 or later. | ||
614 | |||
615 | /etc/securetty | ||
616 | PAM (Pluggable Authentication Modules) is supposed to be a flexible | ||
617 | mechanism for providing better user authentication and access to | ||
618 | services. Unfortunately, it's also fragile, complex and undocumented | ||
619 | (check out RedHat 6.1, and probably other distributions as well). PAM | ||
620 | has problems with symbolic links. Append the following lines to your | ||
621 | /etc/securetty file: | ||
622 | |||
623 | vc/1 | ||
624 | vc/2 | ||
625 | vc/3 | ||
626 | vc/4 | ||
627 | vc/5 | ||
628 | vc/6 | ||
629 | vc/7 | ||
630 | vc/8 | ||
631 | |||
632 | This will not weaken security. If you have a version of util-linux | ||
633 | earlier than 2.10.h, please upgrade to 2.10.h or later. If you | ||
634 | absolutely cannot upgrade, then also append the following lines to | ||
635 | your /etc/securetty file: | ||
636 | |||
637 | 1 | ||
638 | 2 | ||
639 | 3 | ||
640 | 4 | ||
641 | 5 | ||
642 | 6 | ||
643 | 7 | ||
644 | 8 | ||
645 | |||
646 | This may potentially weaken security by allowing root logins over the | ||
647 | network (a password is still required, though). However, since there | ||
648 | are problems with dealing with symlinks, I'm suspicious of the level | ||
649 | of security offered in any case. | ||
650 | |||
651 | XFree86 | ||
652 | While not essential, it's probably a good idea to upgrade to XFree86 | ||
653 | 4.0, as patches went in to make it more devfs-friendly. If you don't, | ||
654 | you'll probably need to apply the following patch to | ||
655 | /etc/security/console.perms so that ordinary users can run | ||
656 | startx. Note that not all distributions have this file (e.g. Debian), | ||
657 | so if it's not present, don't worry about it. | ||
658 | |||
659 | --- /etc/security/console.perms.orig Sat Apr 17 16:26:47 1999 | ||
660 | +++ /etc/security/console.perms Fri Feb 25 23:53:55 2000 | ||
661 | @@ -14,7 +14,7 @@ | ||
662 | # man 5 console.perms | ||
663 | |||
664 | # file classes -- these are regular expressions | ||
665 | -<console>=tty[0-9][0-9]* :[0-9]\.[0-9] :[0-9] | ||
666 | +<console>=tty[0-9][0-9]* vc/[0-9][0-9]* :[0-9]\.[0-9] :[0-9] | ||
667 | |||
668 | # device classes -- these are shell-style globs | ||
669 | <floppy>=/dev/fd[0-1]* | ||
670 | |||
671 | If the patch does not apply, then replace the line: | ||
672 | |||
673 | <console>=tty[0-9][0-9]* :[0-9]\.[0-9] :[0-9] | ||
674 | |||
675 | with: | ||
676 | |||
677 | <console>=tty[0-9][0-9]* vc/[0-9][0-9]* :[0-9]\.[0-9] :[0-9] | ||
678 | |||
679 | |||
680 | Disable devpts | ||
681 | I've had a report of devpts mounted on /dev/pts not working | ||
682 | correctly. Since devfs will also manage /dev/pts, there is no | ||
683 | need to mount devpts as well. You should either edit your | ||
684 | /etc/fstab so devpts is not mounted, or disable devpts from | ||
685 | your kernel configuration. | ||
686 | |||
687 | Unsupported drivers | ||
688 | Not all drivers have devfs support. If you depend on one of these | ||
689 | drivers, you will need to create a script or tarfile that you can use | ||
690 | at boot time to create device nodes as appropriate. There is a | ||
691 | section which describes this. Another | ||
692 | section lists the drivers which have | ||
693 | devfs support. | ||
694 | |||
695 | /dev/mouse | ||
696 | |||
697 | Many distributions configure /dev/mouse to be the mouse device | ||
698 | for XFree86 and GPM. I actually think this is a bad idea, because it | ||
699 | adds another level of indirection. When looking at a config file, if | ||
700 | you see /dev/mouse you're left wondering which mouse | ||
701 | is being referred to. Hence I recommend putting the actual mouse | ||
702 | device (for example /dev/psaux) into your | ||
703 | /etc/X11/XF86Config file (and similarly for the GPM | ||
704 | configuration file). | ||
705 | |||
706 | Alternatively, use the same technique used for unsupported drivers | ||
707 | described above. | ||
708 | |||
709 | The Kernel | ||
710 | Finally, you need to make sure devfs is compiled into your kernel. Set | ||
711 | CONFIG_EXPERIMENTAL=y, CONFIG_DEVFS_FS=y and CONFIG_DEVFS_MOUNT=y by | ||
712 | using your favourite configuration tool (e.g. make config or | ||
713 | make xconfig) and then make clean and then recompile your kernel and | ||
714 | modules. At boot, devfs will be mounted onto /dev. | ||
715 | |||
716 | If you encounter problems booting (for example if you forgot a | ||
717 | configuration step), you can pass devfs=nomount at the kernel | ||
718 | boot command line. This will prevent the kernel from mounting devfs at | ||
719 | boot time onto /dev. | ||
720 | |||
721 | In general, a kernel built with CONFIG_DEVFS_FS=y but without mounting | ||
722 | devfs onto /dev is completely safe, and requires no | ||
723 | configuration changes. One exception to take note of is when | ||
724 | LABEL= directives are used in /etc/fstab. In this | ||
725 | case you will be unable to boot properly. This is because the | ||
726 | mount(8) programme uses /proc/partitions as part of | ||
727 | the volume label search process, and the device names it finds are not | ||
728 | available, because setting CONFIG_DEVFS_FS=y changes the names in | ||
729 | /proc/partitions, irrespective of whether devfs is mounted. | ||
730 | |||
731 | Now you've finished all the steps required. You're now ready to boot | ||
732 | your shiny new kernel. Enjoy. | ||
733 | |||
734 | Changing the configuration | ||
735 | |||
736 | OK, you've now booted a devfs-enabled system, and everything works. | ||
737 | Now you may feel like changing the configuration (common targets are | ||
738 | /etc/fstab and /etc/devfsd.conf). Since you have a | ||
739 | system that works, if you make any changes and it doesn't work, you | ||
740 | now know that you only have to restore your configuration files to the | ||
741 | default and it will work again. | ||
742 | |||
743 | |||
744 | Permissions persistence across reboots | ||
745 | |||
746 | If you don't use mknod(2) to create a device file, nor use chmod(2) or | ||
747 | chown(2) to change the ownerships/permissions, the inode ctime will | ||
748 | remain at 0 (the epoch, 12 am, 1-JAN-1970, GMT). Anything with a ctime | ||
749 | later than this has had its ownership/permissions changed. Hence, a | ||
750 | simple script or programme may be used to tar up all changed inodes, | ||
751 | prior to shutdown. Although effective, many consider this approach a | ||
752 | kludge. | ||
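
The ctime test described above can be sketched in Python. This is only an illustration under stated assumptions: the function name changed_since and its devices_only parameter are hypothetical, and the sketch merely lists candidate inodes; it does not archive them.

```python
import os
import stat


def changed_since(root, epoch=0, devices_only=True):
    """List paths under root whose inode ctime is later than epoch.

    Hypothetical helper: on devfs an untouched entry keeps ctime 0,
    so anything later has had its ownership/permissions changed.
    """
    changed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)  # do not follow compatibility symlinks
            is_device = stat.S_ISCHR(st.st_mode) or stat.S_ISBLK(st.st_mode)
            if devices_only and not is_device:
                continue
            if st.st_ctime > epoch:
                changed.append(path)
    return sorted(changed)
```

The resulting list could then be handed to an archiver prior to shutdown, subject to the tar caveat discussed further down.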
753 | |||
754 | A much better approach is to use devfsd to save and restore | ||
755 | permissions. It may be configured to record changes in permissions and | ||
756 | will save them in a database (in fact a directory tree), and restore | ||
757 | these upon boot. This is an efficient method and results in immediate | ||
758 | saving of current permissions (unlike the tar approach, which saves | ||
759 | permissions at some unspecified future time). | ||
760 | |||
761 | The default configuration file supplied with devfsd has config entries | ||
762 | which you may uncomment to enable persistence management. | ||
763 | |||
764 | If you decide to use the tar approach anyway, be aware that tar will | ||
765 | first unlink(2) an inode before creating a new device node. The | ||
766 | unlink(2) has the effect of breaking the connection between a devfs | ||
767 | entry and the device driver. If you use the "devfs=only" boot option, | ||
768 | you lose access to the device driver, requiring you to reload the | ||
769 | module. I consider this a bug in tar (there is no real need to | ||
770 | unlink(2) the inode first). | ||
771 | |||
772 | Alternatively, you can use devfsd to provide more sophisticated | ||
773 | management of device permissions. You can use devfsd to store | ||
774 | permissions for whole groups of devices with a single configuration | ||
775 | entry, rather than the conventional one entry per device. | ||
776 | |||
777 | Permissions database stored in mounted-over /dev | ||
778 | |||
779 | If you wish to save and restore your device permissions into the | ||
780 | disc-based /dev while still mounting devfs onto /dev | ||
781 | you may do so. This requires a 2.4.x kernel (in fact, 2.3.99 or | ||
782 | later), which has the VFS binding facility. You need to do the | ||
783 | following to set this up: | ||
784 | |||
785 | |||
786 | |||
787 | make sure the kernel does not mount devfs at boot time | ||
788 | |||
789 | |||
790 | make sure you have a correct /dev/console entry in your | ||
791 | root file-system (where your disc-based /dev lives) | ||
792 | |||
793 | create the /dev-state directory | ||
794 | |||
795 | |||
796 | add the following lines near the very beginning of your boot | ||
797 | scripts: | ||
798 | |||
799 | mount --bind /dev /dev-state | ||
800 | mount -t devfs none /dev | ||
801 | devfsd /dev | ||
802 | |||
803 | |||
804 | |||
805 | |||
806 | add the following lines to your /etc/devfsd.conf file: | ||
807 | |||
808 | REGISTER ^pt[sy] IGNORE | ||
809 | CREATE ^pt[sy] IGNORE | ||
810 | CHANGE ^pt[sy] IGNORE | ||
811 | DELETE ^pt[sy] IGNORE | ||
812 | REGISTER .* COPY /dev-state/$devname $devpath | ||
813 | CREATE .* COPY $devpath /dev-state/$devname | ||
814 | CHANGE .* COPY $devpath /dev-state/$devname | ||
815 | DELETE .* CFUNCTION GLOBAL unlink /dev-state/$devname | ||
816 | RESTORE /dev-state | ||
817 | |||
818 | Note that the sample devfsd.conf file contains these lines, | ||
819 | as well as other sample configurations you may find useful. See the | ||
820 | devfsd distribution. | ||
821 | |||
822 | |||
823 | reboot. | ||
824 | |||
825 | |||
826 | |||
827 | |||
828 | Permissions database stored in normal directory | ||
829 | |||
830 | If you are using an older kernel which doesn't support VFS binding, | ||
831 | then you won't be able to have the permissions database in a | ||
832 | mounted-over /dev. However, you can still use a regular | ||
833 | directory to store the database. The sample /etc/devfsd.conf | ||
834 | file above may still be used. You will need to create the | ||
835 | /dev-state directory prior to installing devfsd. If you have | ||
836 | old permissions in /dev, then just copy (or move) the device | ||
837 | nodes over to the new directory. | ||
838 | |||
839 | Which method is better? | ||
840 | |||
841 | The best method is to have the permissions database stored in the | ||
842 | mounted-over /dev. This is because you will not need to copy | ||
843 | device nodes over to /dev-state, and because it allows you to | ||
844 | switch between devfs and non-devfs kernels, without requiring you to | ||
845 | copy permissions between /dev-state (for devfs) and | ||
846 | /dev (for non-devfs). | ||
847 | |||
848 | |||
849 | Dealing with drivers without devfs support | ||
850 | |||
851 | Currently, not all device drivers in the kernel have been modified to | ||
852 | use devfs. Device drivers which do not yet have devfs support will not | ||
853 | automagically appear in devfs. The simplest way to create device nodes | ||
854 | for these drivers is to unpack a tarfile containing the required | ||
855 | device nodes. You can do this in your boot scripts. All your drivers | ||
856 | will now work as before. | ||
857 | |||
858 | Hopefully for most people devfs will have enough support so that they | ||
859 | can mount devfs directly over /dev without losing most functionality | ||
860 | (i.e. losing access to various devices). As of 22-JAN-1998 (devfs | ||
861 | patch version 10) I am now running this way. All the devices I have | ||
862 | are available in devfs, so I don't lose anything. | ||
863 | |||
864 | WARNING: if your configuration requires the old-style device names | ||
865 | (e.g. /dev/hda1 or /dev/sda1), you must install devfsd and configure | ||
866 | it to maintain compatibility entries. It is almost certain that you | ||
867 | will require this. Note that the kernel creates a compatibility entry | ||
868 | for the root device, so you don't need initrd. | ||
869 | |||
870 | Note that you no longer need to mount devpts if you use Unix98 PTYs, | ||
871 | as devfs can manage /dev/pts itself. This saves you some RAM, as you | ||
872 | don't need to compile and install devpts. Note that some versions of | ||
873 | glibc have a bug with Unix98 pty handling on devfs systems. Contact | ||
874 | the glibc maintainers for a fix. Glibc 2.1.3 has the fix. | ||
875 | |||
876 | Note also that apart from editing /etc/fstab, other things will need | ||
877 | to be changed if you *don't* install devfsd. Some software (like the X | ||
878 | server) hard-wires device names in its source. It really is much | ||
879 | easier to install devfsd so that compatibility entries are created. | ||
880 | You can then slowly migrate your system to using the new device names | ||
881 | (for example, by starting with /etc/fstab), and then limiting the | ||
882 | compatibility entries that devfsd creates. | ||
883 | |||
884 | IF YOU CONFIGURE TO MOUNT DEVFS AT BOOT, MAKE SURE YOU INSTALL DEVFSD | ||
885 | BEFORE YOU BOOT A DEVFS-ENABLED KERNEL! | ||
886 | |||
887 | Now that devfs has gone into the 2.3.46 kernel, I'm getting a lot of | ||
888 | reports back. Many of these are because people are trying to run | ||
889 | without devfsd, and hence some things break. Please just run devfsd if | ||
890 | things break. I want to concentrate on real bugs rather than | ||
891 | misconfiguration problems at the moment. If people are willing to fix | ||
892 | bugs/false assumptions in other code (e.g. glibc, X server) and submit | ||
893 | that to the respective maintainers, that would be great. | ||
894 | |||
895 | |||
896 | All the way with Devfs | ||
897 | |||
898 | The devfs kernel patch creates a rationalised device tree. As stated | ||
899 | above, if you want to keep using the old /dev naming scheme, | ||
900 | you just need to configure devfsd appropriately (see the man | ||
901 | page). People who prefer the old names can ignore this section. For | ||
902 | those of us who like the rationalised names and an uncluttered | ||
903 | /dev, read on. | ||
904 | |||
905 | If you don't run devfsd, or don't enable compatibility entry | ||
906 | management, then you will have to configure your system to use the new | ||
907 | names. For example, you will then need to edit your | ||
908 | /etc/fstab to use the new disc naming scheme. If you want to | ||
909 | be able to boot non-devfs kernels, you will need compatibility | ||
910 | symlinks in the underlying disc-based /dev pointing back to | ||
911 | the old-style names for when you boot a kernel without devfs. | ||
912 | |||
913 | You can selectively decide which devices you want compatibility | ||
914 | entries for. For example, you may only want compatibility entries for | ||
915 | BSD pseudo-terminal devices (otherwise you'll have to patch your C | ||
916 | library or use Unix98 ptys instead). It's just a matter of putting | ||
917 | the correct regular expression into /etc/devfsd.conf. | ||
918 | |||
919 | There are other choices of naming schemes that you may prefer. For | ||
920 | example, I don't use the kernel-supplied | ||
921 | names, because they are too verbose. A common misconception is | ||
922 | that the kernel-supplied names are meant to be used directly in | ||
923 | configuration files. This is not the case. They are designed to | ||
924 | reflect the layout of the devices attached and to provide easy | ||
925 | classification. | ||
926 | |||
927 | If you like the kernel-supplied names, that's fine. If you don't then | ||
928 | you should be using devfsd to construct a namespace more to your | ||
929 | liking. Devfsd has built-in code to construct a | ||
930 | namespace that is both logical and easy to | ||
931 | manage. In essence, it creates a convenient abbreviation of the | ||
932 | kernel-supplied namespace. | ||
933 | |||
934 | You are of course free to build your own namespace. Devfsd has all the | ||
935 | infrastructure required to make this easy for you. All you need do is | ||
936 | write a script. You can even write some C code and devfsd can load the | ||
937 | shared object as a callable extension. | ||
938 | |||
939 | |||
940 | Other Issues | ||
941 | |||
942 | The init programme | ||
943 | Another thing to take note of is whether your init programme | ||
944 | creates a Unix socket /dev/telinit. Some versions of init | ||
945 | create /dev/telinit so that the telinit programme can | ||
946 | communicate with the init process. If you have such a system you need | ||
947 | to make sure that devfs is mounted over /dev *before* init | ||
948 | starts. In other words, you can't leave the mounting of devfs to | ||
949 | /etc/rc, since this is executed after init. Other | ||
950 | versions of init require a named pipe /dev/initctl | ||
951 | which must exist *before* init starts. Once again, you need to | ||
952 | mount devfs and then create the named pipe *before* init | ||
953 | starts. | ||
954 | |||
955 | The default behaviour now is not to mount devfs onto /dev at | ||
956 | boot time for 2.3.x and later kernels. You can correct this with the | ||
957 | "devfs=mount" boot option. This solves any problems with init, | ||
958 | and also prevents the dreaded: | ||
959 | |||
960 | Cannot open initial console | ||
961 | |||
962 | message. For 2.2.x kernels where you need to apply the devfs patch, | ||
963 | the default is to mount. | ||
964 | |||
965 | If you have automatic mounting of devfs onto /dev then you | ||
966 | may need to create /dev/initctl in your boot scripts. The | ||
967 | following lines should suffice: | ||
968 | |||
969 | mknod /dev/initctl p | ||
970 | kill -SIGUSR1 1 # tell init that /dev/initctl now exists | ||
971 | |||
972 | Alternatively, if you don't want the kernel to mount devfs onto | ||
973 | /dev then the following procedure is a | ||
974 | guideline for how to get around /dev/initctl problems: | ||
975 | |||
976 | # cd /sbin | ||
977 | # mv init init.real | ||
978 | # cat > init | ||
979 | #! /bin/sh | ||
980 | mount -n -t devfs none /dev | ||
981 | mknod /dev/initctl p | ||
982 | exec /sbin/init.real $* | ||
983 | [control-D] | ||
984 | # chmod a+x init | ||
985 | |||
986 | Note that newer versions of init create /dev/initctl | ||
987 | automatically, so you don't have to worry about this. | ||
988 | |||
989 | Module autoloading | ||
990 | You will need to configure devfsd to enable module | ||
991 | autoloading. The following lines should be placed in your | ||
992 | /etc/devfsd.conf file: | ||
993 | |||
994 | LOOKUP .* MODLOAD | ||
995 | |||
996 | |||
997 | As of devfsd-v1.3.10, a generic /etc/modules.devfs | ||
998 | configuration file is installed, which is used by the MODLOAD | ||
999 | action. This should be sufficient for most configurations. If you | ||
1000 | require further configuration, edit your /etc/modules.conf | ||
1001 | file. The way module autoloading works with devfs is: | ||
1002 | |||
1003 | |||
1004 | a process attempts to lookup a device node (e.g. /dev/fred) | ||
1005 | |||
1006 | |||
1007 | if that device node does not exist, the full pathname is passed to | ||
1008 | devfsd as a string | ||
1009 | |||
1010 | |||
1011 | devfsd will pass the string to the modprobe programme (provided the | ||
1012 | configuration line shown above is present), and specifies that | ||
1013 | /etc/modules.devfs is the configuration file | ||
1014 | |||
1015 | |||
1016 | /etc/modules.devfs includes /etc/modules.conf to | ||
1017 | access local configurations | ||
1018 | |||
1019 | modprobe will search its configuration files, looking for an alias | ||
1020 | that translates the pathname into a module name | ||
1021 | |||
1022 | |||
1023 | the translated pathname is then used to load the module. | ||
1024 | |||
1025 | |||
1026 | If you wanted a lookup of /dev/fred to load the | ||
1027 | mymod module, you would require the following configuration | ||
1028 | line in /etc/modules.conf: | ||
1029 | |||
1030 | alias /dev/fred mymod | ||
1031 | |||
1032 | The /etc/modules.devfs configuration file provides many such | ||
1033 | aliases for standard device names. If you look closely at this file, | ||
1034 | you will note that some modules require multiple alias configuration | ||
1035 | lines. This is required to support module autoloading for old and new | ||
1036 | device names. | ||
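
The alias lookup step can be illustrated with a short Python sketch. It is a simplification, not modprobe's actual algorithm: it handles only exact "alias <pathname> <module>" lines, and the names parse_aliases and module_for are hypothetical.

```python
def parse_aliases(config_text):
    """Collect exact 'alias <name> <module>' lines into a dictionary."""
    aliases = {}
    for line in config_text.splitlines():
        line = line.split("#", 1)[0]  # drop trailing comments
        fields = line.split()
        if len(fields) == 3 and fields[0] == "alias":
            aliases[fields[1]] = fields[2]
    return aliases


def module_for(pathname, aliases):
    """Return the module aliased to pathname, or None if there is none."""
    return aliases.get(pathname)


# The configuration line from the example above.
aliases = parse_aliases("alias /dev/fred mymod\n")
print(module_for("/dev/fred", aliases))
```

With the single alias line shown earlier, a lookup of /dev/fred resolves to mymod, and any other pathname resolves to nothing.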
1037 | |||
1038 | Mounting root off a devfs device | ||
1039 | If you wish to mount root off a devfs device when you pass the | ||
1040 | "devfs=only" boot option, then you need to pass in the | ||
1041 | "root=<device>" option to the kernel when booting. If you use | ||
1042 | LILO, then you must have this in lilo.conf: | ||
1043 | |||
1044 | append = "root=<device>" | ||
1045 | |||
1046 | Surprised? Yep, so was I. It turns out if you have (as most people | ||
1047 | do): | ||
1048 | |||
1049 | root = <device> | ||
1050 | |||
1051 | |||
1052 | then LILO will determine the device number of <device> and will | ||
1053 | write that device number into a special place in the kernel image | ||
1054 | before starting the kernel, and the kernel will use that device number | ||
1055 | to mount the root filesystem. So, using the "append" variety ensures | ||
1056 | that LILO passes the root filesystem device as a string, which devfs | ||
1057 | can then use. | ||
1058 | |||
1059 | Note that this isn't an issue if you don't pass "devfs=only". | ||
1060 | |||
1061 | TTY issues | ||
1062 | The ttyname(3) function in some versions of the C library makes | ||
1063 | false assumptions about device entries which are symbolic links. The | ||
1064 | tty(1) programme is one that depends on this function. I've | ||
1065 | written a patch to libc 5.4.43 which fixes this. This has been | ||
1066 | included in libc 5.4.44 and a similar fix is in glibc 2.1.3. | ||
1067 | |||
1068 | |||
1069 | Kernel Naming Scheme | ||
1070 | |||
1071 | The kernel provides a default naming scheme. This scheme is designed | ||
1072 | to make it easy to search for specific devices or device types, and to | ||
1073 | view the available devices. Some device types (such as hard discs), | ||
1074 | have a directory of entries, making it easy to see what devices of | ||
1075 | that class are available. Often, the entries are symbolic links into a | ||
1076 | directory tree that reflects the topology of available devices. The | ||
1077 | topological tree is useful for finding how your devices are arranged. | ||
1078 | |||
1079 | Below is a list of the naming schemes for the most common drivers. A | ||
1080 | list of reserved device names is | ||
1081 | available for reference. Please send email to | ||
1082 | rgooch@atnf.csiro.au to obtain an allocation. Please be | ||
1083 | patient (the maintainer is busy). An alternative name may be allocated | ||
1084 | instead of the requested name, at the discretion of the maintainer. | ||
1085 | |||
1086 | Disc Devices | ||
1087 | |||
1088 | All discs, whether SCSI, IDE or whatever, are placed under the | ||
1089 | /dev/discs hierarchy: | ||
1090 | |||
1091 | /dev/discs/disc0 first disc | ||
1092 | /dev/discs/disc1 second disc | ||
1093 | |||
1094 | |||
1095 | Each of these entries is a symbolic link to the directory for that | ||
1096 | device. The device directory contains: | ||
1097 | |||
1098 | disc for the whole disc | ||
1099 | part* for individual partitions | ||
1100 | |||
1101 | |||
1102 | CD-ROM Devices | ||
1103 | |||
1104 | All CD-ROMs, whether SCSI, IDE or whatever, are placed under the | ||
1105 | /dev/cdroms hierarchy: | ||
1106 | |||
1107 | /dev/cdroms/cdrom0 first CD-ROM | ||
1108 | /dev/cdroms/cdrom1 second CD-ROM | ||
1109 | |||
1110 | |||
1111 | Each of these entries is a symbolic link to the real device entry for | ||
1112 | that device. | ||
1113 | |||
1114 | Tape Devices | ||
1115 | |||
1116 | All tapes, whether SCSI, IDE or whatever, are placed under the | ||
1117 | /dev/tapes hierarchy: | ||
1118 | |||
1119 | /dev/tapes/tape0 first tape | ||
1120 | /dev/tapes/tape1 second tape | ||
1121 | |||
1122 | |||
1123 | Each of these entries is a symbolic link to the directory for that | ||
1124 | device. The device directory contains: | ||
1125 | |||
1126 | mt for mode 0 | ||
1127 | mtl for mode 1 | ||
1128 | mtm for mode 2 | ||
1129 | mta for mode 3 | ||
1130 | mtn for mode 0, no rewind | ||
1131 | mtln for mode 1, no rewind | ||
1132 | mtmn for mode 2, no rewind | ||
1133 | mtan for mode 3, no rewind | ||
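
The eight tape entry names follow a regular pattern, which the following Python sketch spells out. The function name tape_entry is hypothetical; only the names listed in the table above are produced.

```python
# Suffix letter per density mode, taken from the table above:
# mode 0 has no letter, then l, m and a for modes 1, 2 and 3.
_MODE_SUFFIX = ("", "l", "m", "a")


def tape_entry(mode, rewind=True):
    """Return the entry name in a tape device directory."""
    if mode not in range(4):
        raise ValueError("mode must be 0, 1, 2 or 3")
    return "mt" + _MODE_SUFFIX[mode] + ("" if rewind else "n")
```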
1134 | |||
1135 | |||
1136 | SCSI Devices | ||
1137 | |||
1138 | To uniquely identify any SCSI device requires the following | ||
1139 | information: | ||
1140 | |||
1141 | controller (host adapter) | ||
1142 | bus (SCSI channel) | ||
1143 | target (SCSI ID) | ||
1144 | unit (Logical Unit Number) | ||
1145 | |||
1146 | |||
1147 | All SCSI devices are placed under /dev/scsi (assuming devfs | ||
1148 | is mounted on /dev). Hence, a SCSI device with the following | ||
1149 | parameters: c=1,b=2,t=3,u=4 would appear as: | ||
1150 | |||
1151 | /dev/scsi/host1/bus2/target3/lun4 device directory | ||
1152 | |||
1153 | |||
1154 | Inside this directory, a number of device entries may be created, | ||
1155 | depending on which SCSI device-type drivers were installed. | ||
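
As a minimal sketch, the device directory can be composed from the four parameters like this. The function name scsi_device_dir and the mount_point parameter are hypothetical; the path layout is the one shown above.

```python
def scsi_device_dir(host, bus, target, lun, mount_point="/dev"):
    """Build the devfs directory name for one SCSI device."""
    return "{}/scsi/host{}/bus{}/target{}/lun{}".format(
        mount_point, host, bus, target, lun
    )


# The example from the text: c=1,b=2,t=3,u=4
print(scsi_device_dir(1, 2, 3, 4))
```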
1156 | |||
1157 | See the section on the disc naming scheme to see what entries the SCSI | ||
1158 | disc driver creates. | ||
1159 | |||
1160 | See the section on the tape naming scheme to see what entries the SCSI | ||
1161 | tape driver creates. | ||
1162 | |||
1163 | The SCSI CD-ROM driver creates: | ||
1164 | |||
1165 | cd | ||
1166 | |||
1167 | |||
1168 | The SCSI generic driver creates: | ||
1169 | |||
1170 | generic | ||
1171 | |||
1172 | |||
1173 | IDE Devices | ||
1174 | |||
1175 | To uniquely identify any IDE device requires the following | ||
1176 | information: | ||
1177 | |||
1178 | controller | ||
1179 | bus (aka. primary/secondary) | ||
1180 | target (aka. master/slave) | ||
1181 | unit | ||
1182 | |||
1183 | |||
1184 | All IDE devices are placed under /dev/ide, and use a similar | ||
1185 | naming scheme to the SCSI subsystem. | ||
1186 | |||
1187 | XT Hard Discs | ||
1188 | |||
1189 | All XT discs are placed under /dev/xd. The first XT disc has | ||
1190 | the directory /dev/xd/disc0. | ||
1191 | |||
1192 | TTY devices | ||
1193 | |||
1194 | The tty devices now appear as: | ||
1195 | |||
1196 | New name Old-name Device Type | ||
1197 | -------- -------- ----------- | ||
1198 | /dev/tts/{0,1,...} /dev/ttyS{0,1,...} Serial ports | ||
1199 | /dev/cua/{0,1,...} /dev/cua{0,1,...} Call out devices | ||
1200 | /dev/vc/0 /dev/tty Current virtual console | ||
1201 | /dev/vc/{1,2,...} /dev/tty{1...63} Virtual consoles | ||
1202 | /dev/vcc/{0,1,...} /dev/vcs{1...63} Virtual consoles | ||
1203 | /dev/pty/m{0,1,...} /dev/ptyp?? PTY masters | ||
1204 | /dev/pty/s{0,1,...} /dev/ttyp?? PTY slaves | ||
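
For the numbered serial, call-out and virtual console rows of this table, the renaming is mechanical. The Python sketch below covers only those three rows; the function name new_tty_name is hypothetical, and the pseudo-terminal and vcs rows are deliberately left out because the table does not spell out their numbering.

```python
import re

# (old-name pattern, new-name template) for three rows of the table above.
_RULES = [
    (re.compile(r"^/dev/ttyS([0-9]+)$"), "/dev/tts/{}"),      # serial ports
    (re.compile(r"^/dev/cua([0-9]+)$"), "/dev/cua/{}"),       # call out devices
    (re.compile(r"^/dev/tty([1-9][0-9]?)$"), "/dev/vc/{}"),   # virtual consoles
]


def new_tty_name(old_name):
    """Translate an old tty name to its devfs name, or return None."""
    for pattern, template in _RULES:
        match = pattern.match(old_name)
        if match:
            return template.format(match.group(1))
    return None
```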
1205 | |||
1206 | |||
1207 | RAMDISCS | ||
1208 | |||
1209 | The RAMDISCS are placed in their own directory, and are named thus: | ||
1210 | |||
1211 | /dev/rd/{0,1,2,...} | ||
1212 | |||
1213 | |||
1214 | Meta Devices | ||
1215 | |||
1216 | The meta devices are placed in their own directory, and are named | ||
1217 | thus: | ||
1218 | |||
1219 | /dev/md/{0,1,2,...} | ||
1220 | |||
1221 | |||
1222 | Floppy discs | ||
1223 | |||
1224 | Floppy discs are placed in the /dev/floppy directory. | ||
1225 | |||
1226 | Loop devices | ||
1227 | |||
1228 | Loop devices are placed in the /dev/loop directory. | ||
1229 | |||
1230 | Sound devices | ||
1231 | |||
1232 | Sound devices are placed in the /dev/sound directory | ||
1233 | (audio, sequencer, ...). | ||
1234 | |||
1235 | |||
1236 | Devfsd Naming Scheme | ||
1237 | |||
1238 | Devfsd provides a naming scheme which is a convenient abbreviation of | ||
1239 | the kernel-supplied namespace. In some | ||
1240 | cases, the kernel-supplied naming scheme is quite convenient, so | ||
1241 | devfsd does not provide another naming scheme. The convenience names | ||
1242 | that devfsd creates are in fact the same names as the original devfs | ||
1243 | kernel patch created (before Linus mandated the Big Name | ||
1244 | Change). These are referred to as "new compatibility entries". | ||
1245 | |||
1246 | In order to configure devfsd to create these convenience names, the | ||
1247 | following lines should be placed in your /etc/devfsd.conf: | ||
1248 | |||
1249 | REGISTER .* MKNEWCOMPAT | ||
1250 | UNREGISTER .* RMNEWCOMPAT | ||
1251 | |||
1252 | This will cause devfsd to create (and destroy) symbolic links which | ||
1253 | point to the kernel-supplied names. | ||
1254 | |||
1255 | SCSI Hard Discs | ||
1256 | |||
1257 | All SCSI discs are placed under /dev/sd (assuming devfs is | ||
1258 | mounted on /dev). Hence, a SCSI disc with the following | ||
1259 | parameters: c=1,b=2,t=3,u=4 would appear as: | ||
1260 | |||
1261 | /dev/sd/c1b2t3u4 for the whole disc | ||
1262 | /dev/sd/c1b2t3u4p5 for the 5th partition | ||
1263 | /dev/sd/c1b2t3u4p5s6 for the 6th slice in the 5th partition | ||
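
The convenience names are a compact encoding of the same four parameters. A Python sketch, with the hypothetical function name sd_name and the assumption that a slice is only meaningful inside a partition:

```python
def sd_name(c, b, t, u, partition=None, slice_=None):
    """Build a devfsd convenience name for a SCSI disc."""
    if slice_ is not None and partition is None:
        raise ValueError("a slice needs a partition")
    name = "/dev/sd/c{}b{}t{}u{}".format(c, b, t, u)
    if partition is not None:
        name += "p{}".format(partition)
    if slice_ is not None:
        name += "s{}".format(slice_)
    return name
```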
1264 | |||
1265 | |||
1266 | SCSI Tapes | ||
1267 | |||
1268 | All SCSI tapes are placed under /dev/st. A similar naming | ||
1269 | scheme is used as for SCSI discs. A SCSI tape with the | ||
1270 | parameters: c=1,b=2,t=3,u=4 would appear as: | ||
1271 | |||
1272 | /dev/st/c1b2t3u4m0 for mode 0 | ||
1273 | /dev/st/c1b2t3u4m1 for mode 1 | ||
1274 | /dev/st/c1b2t3u4m2 for mode 2 | ||
1275 | /dev/st/c1b2t3u4m3 for mode 3 | ||
1276 | /dev/st/c1b2t3u4m0n for mode 0, no rewind | ||
1277 | /dev/st/c1b2t3u4m1n for mode 1, no rewind | ||
1278 | /dev/st/c1b2t3u4m2n for mode 2, no rewind | ||
1279 | /dev/st/c1b2t3u4m3n for mode 3, no rewind | ||
1280 | |||
1281 | |||
1282 | SCSI CD-ROMs | ||
1283 | |||
1284 | All SCSI CD-ROMs are placed under /dev/sr. A similar naming | ||
1285 | scheme is used as for SCSI discs. A SCSI CD-ROM with the | ||
1286 | parameters: c=1,b=2,t=3,u=4 would appear as: | ||
1287 | |||
1288 | /dev/sr/c1b2t3u4 | ||
1289 | |||
1290 | |||
1291 | SCSI Generic Devices | ||
1292 | |||
1293 | The generic (aka. raw) interfaces for all SCSI devices are placed under | ||
1294 | /dev/sg. A similar naming scheme is used as for SCSI discs. A | ||
1295 | SCSI generic device with the parameters: c=1,b=2,t=3,u=4 would appear | ||
1296 | as: | ||
1297 | |||
1298 | /dev/sg/c1b2t3u4 | ||
1299 | |||
1300 | |||
1301 | IDE Hard Discs | ||
1302 | |||
1303 | All IDE discs are placed under /dev/ide/hd, using a similar | ||
1304 | convention to SCSI discs. The following mappings exist between the new | ||
1305 | and the old names: | ||
1306 | |||
1307 | /dev/hda /dev/ide/hd/c0b0t0u0 | ||
1308 | /dev/hdb /dev/ide/hd/c0b0t1u0 | ||
1309 | /dev/hdc /dev/ide/hd/c0b1t0u0 | ||
1310 | /dev/hdd /dev/ide/hd/c0b1t1u0 | ||
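
The four mappings above follow from the bus and target position of each drive, as this Python sketch shows. The function name ide_new_name is hypothetical, and the sketch refuses anything beyond /dev/hda to /dev/hdd, because the table does not say how further controllers are numbered.

```python
def ide_new_name(old_name):
    """Map /dev/hda../dev/hdd to the devfsd IDE disc name."""
    prefix = "/dev/hd"
    letters = "abcd"  # only the four drives listed in the table
    if (
        not old_name.startswith(prefix)
        or len(old_name) != len(prefix) + 1
        or old_name[-1] not in letters
    ):
        raise ValueError("only /dev/hda to /dev/hdd are covered")
    # Two drives per bus: a,b on the primary bus, c,d on the secondary.
    bus, target = divmod(letters.index(old_name[-1]), 2)
    return "/dev/ide/hd/c0b{}t{}u0".format(bus, target)
```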
1311 | |||
1312 | |||
1313 | IDE Tapes | ||
1314 | |||
1315 | A similar naming scheme is used as for IDE discs. The entries will | ||
1316 | appear in the /dev/ide/mt directory. | ||
1317 | |||
1318 | IDE CD-ROM | ||
1319 | |||
1320 | A similar naming scheme is used as for IDE discs. The entries will | ||
1321 | appear in the /dev/ide/cd directory. | ||
1322 | |||
1323 | IDE Floppies | ||
1324 | |||
1325 | A similar naming scheme is used as for IDE discs. The entries will | ||
1326 | appear in the /dev/ide/fd directory. | ||
1327 | |||
1328 | XT Hard Discs | ||
1329 | |||
1330 | All XT discs are placed under /dev/xd. The first XT disc | ||
1331 | would appear as /dev/xd/c0t0. | ||
1332 | |||
1333 | |||
1334 | Old Compatibility Names | ||
1335 | |||
1336 | The old compatibility names are the legacy device names, such as | ||
1337 | /dev/hda, /dev/sda, /dev/rtc and so on. | ||
1338 | Devfsd can be configured to create compatibility symlinks so that you | ||
1339 | may continue to use the old names in your configuration files and so | ||
1340 | that old applications will continue to function correctly. | ||
1341 | |||
1342 | In order to configure devfsd to create these legacy names, the | ||
1343 | following lines should be placed in your /etc/devfsd.conf: | ||
1344 | |||
1345 | REGISTER .* MKOLDCOMPAT | ||
1346 | UNREGISTER .* RMOLDCOMPAT | ||
1347 | |||
1348 | This will cause devfsd to create (and destroy) symbolic links which | ||
1349 | point to the kernel-supplied names. | ||
1350 | |||
1351 | |||
1352 | ----------------------------------------------------------------------------- | ||
1353 | |||
1354 | |||
1355 | Device drivers currently ported | ||
1356 | |||
1357 | - All miscellaneous character devices support devfs (this is done | ||
1358 | transparently through misc_register()) | ||
1359 | |||
1360 | - SCSI discs and generic hard discs | ||
1361 | |||
1362 | - Character memory devices (null, zero, full and so on) | ||
1363 | Thanks to C. Scott Ananian <cananian@alumni.princeton.edu> | ||
1364 | |||
1365 | - Loop devices (/dev/loop?) | ||
1366 | |||
1367 | - TTY devices (console, serial ports, terminals and pseudo-terminals) | ||
1368 | Thanks to C. Scott Ananian <cananian@alumni.princeton.edu> | ||
1369 | |||
1370 | - SCSI tapes (/dev/scsi and /dev/tapes) | ||
1371 | |||
1372 | - SCSI CD-ROMs (/dev/scsi and /dev/cdroms) | ||
1373 | |||
1374 | - SCSI generic devices (/dev/scsi) | ||
1375 | |||
1376 | - RAMDISCS (/dev/ram?) | ||
1377 | |||
1378 | - Meta Devices (/dev/md*) | ||
1379 | |||
1380 | - Floppy discs (/dev/floppy) | ||
1381 | |||
1382 | - Parallel port printers (/dev/printers) | ||
1383 | |||
1384 | - Sound devices (/dev/sound) | ||
1385 | Thanks to Eric Dumas <dumas@linux.eu.org> and | ||
1386 | C. Scott Ananian <cananian@alumni.princeton.edu> | ||
1387 | |||
1388 | - Joysticks (/dev/joysticks) | ||
1389 | |||
1390 | - Sparc keyboard (/dev/kbd) | ||
1391 | |||
1392 | - DSP56001 digital signal processor (/dev/dsp56k) | ||
1393 | |||
1394 | - Apple Desktop Bus (/dev/adb) | ||
1395 | |||
1396 | - Coda network file system (/dev/cfs*) | ||
1397 | |||
1398 | - Virtual console capture devices (/dev/vcc) | ||
1399 | Thanks to Dennis Hou <smilax@mindmeld.yi.org> | ||
1400 | |||
1401 | - Frame buffer devices (/dev/fb) | ||
1402 | |||
1403 | - Video capture devices (/dev/v4l) | ||
1404 | |||
1405 | |||
1406 | ----------------------------------------------------------------------------- | ||
1407 | |||
1408 | |||
1409 | Allocation of Device Numbers | ||
1410 | |||
1411 | Devfs allows you to write a driver which doesn't need to allocate a | ||
1412 | device number (major&minor numbers) for the internal operation of the | ||
1413 | kernel. However, there are a number of userspace programmes that use | ||
1414 | the device number as a unique handle for a device. An example is the | ||
1415 | find programme, which uses device numbers to determine whether | ||
1416 | an inode is on a different filesystem than another inode. The device | ||
1417 | number used is the one for the block device which a filesystem is | ||
1418 | using. To preserve compatibility with userspace programmes, block | ||
1419 | devices using devfs need to have unique device numbers allocated to | ||
1420 | them. Furthermore, POSIX specifies device numbers, so some kind of | ||
1421 | device number needs to be presented to userspace. | ||
1422 | |||
1423 | The simplest option (especially when porting drivers to devfs) is to | ||
1424 | keep using the old major and minor numbers. Devfs will take whatever | ||
1425 | values are given for major and minor and pass them on to userspace. | ||
1426 | |||
1427 | This device number is a 16 bit number, so this leaves plenty of space | ||
1428 | for large numbers of discs and partitions. This scheme can also be | ||
1429 | used for character devices, in particular the tty devices, which are | ||
1430 | currently limited to 256 pseudo-ttys (this limits the total number of | ||
1431 | simultaneous xterms and remote logins). Note that the device number | ||
1432 | is limited to the range 36864-61439 (majors 144-239), in order to | ||
1433 | avoid any possible conflicts with existing official allocations. | ||
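The range quoted above follows from packing the major number into the upper 8 bits and the minor number into the lower 8 bits of a 16 bit device number. A small sketch of that arithmetic, with a hypothetical helper name:

```python
def dev_number(major, minor):
    """Pack 8-bit major and minor numbers into a 16-bit device number."""
    if not (0 <= major <= 255 and 0 <= minor <= 255):
        raise ValueError("major and minor must each fit in 8 bits")
    return (major << 8) | minor


# Lowest and highest device numbers for majors 144-239.
lowest = dev_number(144, 0)      # 144 * 256
highest = dev_number(239, 255)   # 239 * 256 + 255
```

Here `lowest` is 36864 and `highest` is 61439, matching the limits given in the text.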
1434 | |||
1435 | Please note that using dynamically allocated block device numbers may | ||
1436 | break the NFS daemons (both user and kernel mode), which expect dev_t | ||
1437 | for a given device to be constant over the lifetime of remote mounts. | ||
1438 | |||
1439 | A final note on this scheme: since it doesn't increase the size of | ||
1440 | device numbers, there are no compatibility issues with userspace. | ||
1441 | |||
1442 | ----------------------------------------------------------------------------- | ||
1443 | |||
1444 | |||
1445 | Questions and Answers | ||
1446 | |||
1447 | |||
1448 | Making things work | ||
1449 | Alternatives to devfs | ||
1450 | What I don't like about devfs | ||
1451 | How to report bugs | ||
1452 | Strange kernel messages | ||
1453 | Compilation problems with devfsd | ||
1454 | |||
1455 | |||
1456 | |||
1457 | Making things work | ||
1458 | |||
1459 | Here are some common questions and answers. | ||
1460 | |||
1461 | |||
1462 | |||
1463 | Devfsd doesn't start | ||
1464 | |||
1465 | Make sure you have compiled and installed devfsd | ||
1466 | Make sure devfsd is being started from your boot | ||
1467 | scripts | ||
1468 | Make sure you have configured your kernel to enable devfs (see | ||
1469 | below) | ||
1470 | Make sure devfs is mounted (see below) | ||
1471 | |||
1472 | |||
1473 | Devfsd is not managing all my permissions | ||
1474 | |||
1475 | Make sure you are capturing the appropriate events. For example, | ||
1476 | device entries created by the kernel generate REGISTER events, | ||
1477 | but those created by devfsd generate CREATE events. | ||
1478 | |||
1479 | |||
1480 | Devfsd is not capturing all REGISTER events | ||
1481 | |||
1482 | See the previous entry: you may need to capture CREATE events. | ||
1483 | |||
1484 | |||
1485 | X will not start | ||
1486 | |||
1487 | Make sure you followed the steps | ||
1488 | outlined above. | ||
1489 | |||
1490 | |||
1491 | Why don't my network devices appear in devfs? | ||
1492 | |||
1493 | This is not a bug. Network devices have their own, completely separate | ||
1494 | namespace. They are accessed via socket(2) and | ||
1495 | setsockopt(2) calls, and thus require no device nodes. I have | ||
1496 | raised the possibility of moving network devices into the device | ||
1497 | namespace, but have had no response. | ||
1498 | |||
1499 | |||
1500 | How can I test if I have devfs compiled into my kernel? | ||
1501 | |||
1502 | All filesystems built-in or currently loaded are listed in | ||
1503 | /proc/filesystems. If you see a devfs entry, then | ||
1504 | you know that devfs was compiled into your kernel. If you have | ||
1505 | correctly configured and rebuilt your kernel, then devfs will be | ||
1506 | built-in. If you think you've configured it in, but | ||
1507 | /proc/filesystems doesn't show it, you've made a mistake. | ||
1508 | Common mistakes include: | ||
1509 | |||
1510 | Using a 2.2.x kernel without applying the devfs patch (if you | ||
1511 | don't know how to patch your kernel, use 2.4.x instead, don't bother | ||
1512 | asking me how to patch) | ||
1513 | Forgetting to set CONFIG_EXPERIMENTAL=y | ||
1514 | Forgetting to set CONFIG_DEVFS_FS=y | ||
1515 | Forgetting to set CONFIG_DEVFS_MOUNT=y (if you want devfs | ||
1516 | to be automatically mounted at boot) | ||
1517 | Editing your .config manually, instead of using make | ||
1518 | config or make xconfig | ||
1519 | Forgetting to run make dep; make clean after changing the | ||
1520 | configuration and before compiling | ||
1521 | Forgetting to compile your kernel and modules | ||
1522 | Forgetting to install your kernel | ||
1523 | Forgetting to install your modules | ||
1524 | |||
1525 | Please check twice that you've done all these steps before sending in | ||
1526 | a bug report. | ||
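The /proc/filesystems check described above can be scripted. This is a sketch under the assumption that each line of /proc/filesystems ends with the filesystem name, optionally preceded by a "nodev" marker; the function name is hypothetical.

```python
def has_filesystem(proc_filesystems_text, name="devfs"):
    """Return True if `name` is listed in the given /proc/filesystems text.

    Assumes the filesystem name is the last whitespace-separated field
    on each line (an optional "nodev" marker may precede it).
    """
    for line in proc_filesystems_text.splitlines():
        fields = line.split()
        if fields and fields[-1] == name:
            return True
    return False
```

On a running system the text would come from reading /proc/filesystems, for example `has_filesystem(open("/proc/filesystems").read())`.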
1527 | |||
1528 | |||
1529 | |||
1530 | How can I test if devfs is mounted on /dev? | ||
1531 | |||
1532 | The device filesystem will always create an entry called | ||
1533 | ".devfsd", which is used to communicate with the daemon. Even | ||
1534 | if the daemon is not running, this entry will exist. Testing for the | ||
1535 | existence of this entry is the approved method of determining if devfs | ||
1536 | is mounted or not. Note that the type of entry (i.e. regular file, | ||
1537 | character device, named pipe, etc.) may change without notice. Only | ||
1538 | the existence of the entry should be relied upon. | ||
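A script can apply this test by checking only for the existence of the ".devfsd" entry and nothing about its type. A minimal sketch, with a hypothetical function name:

```python
import os


def devfs_mounted(dev_dir="/dev"):
    """Return True if a ".devfsd" entry exists in dev_dir.

    Only existence is checked, because the type of the entry may
    change without notice. lexists() does not follow symbolic links,
    so the entry counts whatever kind of object it is.
    """
    return os.path.lexists(os.path.join(dev_dir, ".devfsd"))
```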
1539 | |||
1540 | |||
1541 | When I start devfsd, I see the error: | ||
1542 | Error opening file: ".devfsd" No such file or directory? | ||
1543 | |||
1544 | This means that devfs is not mounted. Make sure you have devfs mounted. | ||
1545 | |||
1546 | |||
1547 | How do I mount devfs? | ||
1548 | |||
1549 | First make sure you have devfs compiled into your kernel (see | ||
1550 | above). Then you will either need to: | ||
1551 | |||
1552 | set CONFIG_DEVFS_MOUNT=y in your kernel config | ||
1553 | pass devfs=mount to your boot loader | ||
1554 | mount devfs manually in your boot scripts with: | ||
1555 | mount -t devfs none /dev | ||
1556 | |||
1557 | |||
1558 | |||
1559 | Mount by volume LABEL=<label> doesn't work with | ||
1560 | devfs | ||
1561 | |||
1562 | Most probably you are not mounting devfs onto /dev. What | ||
1563 | happens is that if your kernel config has CONFIG_DEVFS_FS=y | ||
1564 | then the contents of /proc/partitions will have the devfs | ||
1565 | names (such as scsi/host0/bus0/target0/lun0/part1). The | ||
1566 | contents of /proc/partitions are used by mount(8) when | ||
1567 | mounting by volume label. If devfs is not mounted on /dev, | ||
1568 | then mount(8) will fail to find devices. The solution is to | ||
1569 | make sure that devfs is mounted on /dev. See above for how to | ||
1570 | do that. | ||
1571 | |||
1572 | |||
1573 | I have extra or incorrect entries in /dev | ||
1574 | |||
1575 | You may have stale entries in your dev-state area. Check for a | ||
1576 | RESTORE configuration line in your devfsd configuration | ||
1577 | (typically /etc/devfsd.conf). If you have this line, check | ||
1578 | the contents of the specified directory for stale entries. Remove | ||
1579 | any entries which are incorrect, then reboot. | ||
1580 | |||
1581 | |||
1582 | I get "Unable to open initial console" messages at boot | ||
1583 | |||
1584 | This usually happens when you don't have devfs automounted onto | ||
1585 | /dev at boot time, and there is no valid | ||
1586 | /dev/console entry on your root file-system. Create a valid | ||
1587 | /dev/console device node. | ||
1588 | |||
1589 | |||
1590 | |||
1591 | |||
1592 | |||
1593 | Alternatives to devfs | ||
1594 | |||
1595 | I've attempted to collate all the anti-devfs proposals and explain | ||
1596 | their limitations. Under construction. | ||
1597 | |||
1598 | |||
1599 | Why not just pass device create/remove events to a daemon? | ||
1600 | |||
1601 | Here the suggestion is to develop an API in the kernel so that devices | ||
1602 | can register create and remove events, and a daemon listens for those | ||
1603 | events. The daemon would then populate/depopulate /dev (which | ||
1604 | resides on disc). | ||
1605 | |||
1606 | This has several limitations: | ||
1607 | |||
1608 | |||
1609 | it only works for modules loaded and unloaded (or devices inserted | ||
1610 | and removed) after the kernel has finished booting. Without a database | ||
1611 | of events, there is no way the daemon could fully populate | ||
1612 | /dev | ||
1613 | |||
1614 | |||
1615 | if you add a database to this scheme, the question is then how to | ||
1616 | present that database to user-space. If you make it a list of strings | ||
1617 | with embedded event codes which are passed through a pipe to the | ||
1618 | daemon, then this is only of use to the daemon. I would argue that the | ||
1619 | natural way to present this data is via a filesystem (since many of | ||
1620 | the events will be of a hierarchical nature), such as devfs. | ||
1621 | Presenting the data as a filesystem makes it easy for the user to see | ||
1622 | what is available and also makes it easy to write scripts to scan the | ||
1623 | "database" | ||
1624 | |||
1625 | |||
1626 | the tight binding between device nodes and drivers is no longer | ||
1627 | possible (requiring the otherwise perfectly avoidable | ||
1628 | table lookups) | ||
1629 | |||
1630 | |||
1631 | you cannot catch inode lookup events on /dev which means | ||
1632 | that module autoloading requires device nodes to be created. This is a | ||
1633 | problem, particularly for drivers where only a few inodes are created | ||
1634 | from a potentially large set | ||
1635 | |||
1636 | |||
1637 | this technique can't be used when the root FS is mounted | ||
1638 | read-only | ||
1639 | |||
1640 | |||
1641 | |||
1642 | |||
1643 | Just implement a better scsidev | ||
1644 | |||
1645 | This suggestion involves taking the scsidev programme and | ||
1646 | extending it to scan for all devices, not just SCSI devices. The | ||
1647 | scsidev programme works by scanning /proc/scsi | ||
1648 | |||
1649 | Problems: | ||
1650 | |||
1651 | |||
1652 | the kernel does not currently provide a list of all devices | ||
1653 | available. Not all drivers register entries in /proc or | ||
1654 | generate kernel messages | ||
1655 | |||
1656 | |||
1657 | there is no uniform mechanism to register devices other than the | ||
1658 | devfs API | ||
1659 | |||
1660 | |||
1661 | implementing such an API is then the same as the | ||
1662 | proposal above | ||
1663 | |||
1664 | |||
1665 | |||
1666 | |||
1667 | Put /dev on a ramdisc | ||
1668 | |||
1669 | This suggestion involves creating a ramdisc and populating it with | ||
1670 | device nodes and then mounting it over /dev. | ||
1671 | |||
1672 | Problems: | ||
1673 | |||
1674 | |||
1675 | |||
1676 | this doesn't help when mounting the root filesystem, since you | ||
1677 | still need a device node to do that | ||
1678 | |||
1679 | |||
1680 | if you want to use this technique for the root device node as | ||
1681 | well, you need to use initrd. This complicates the booting sequence | ||
1682 | and makes it significantly harder to administer and configure. The | ||
1683 | initrd is essentially opaque, robbing the system administrator of easy | ||
1684 | configuration | ||
1685 | |||
1686 | |||
1687 | insufficient information is available to correctly populate the | ||
1688 | ramdisc. So we come back to the | ||
1689 | proposal above to "solve" this | ||
1690 | |||
1691 | |||
1692 | a ramdisc-based solution would take more kernel memory, since the | ||
1693 | backing store would be (at best) normal VFS inodes and dentries, which | ||
1694 | take 284 bytes and 112 bytes, respectively, for each entry. Compare | ||
1695 | that to 72 bytes for devfs | ||
1696 | |||
1697 | |||
1698 | |||
1699 | |||
1700 | Do nothing: there's no problem | ||
1701 | |||
1702 | Sometimes people can be heard to claim that the existing scheme is | ||
1703 | fine. This is what they're ignoring: | ||
1704 | |||
1705 | |||
1706 | device number size (8 bits each for major and minor) is a real | ||
1707 | limitation, and must be fixed somehow. Systems with large numbers of | ||
1708 | SCSI devices, for example, will continue to consume the remaining | ||
1709 | unallocated major numbers. USB will also need to push beyond the 8 bit | ||
1710 | minor limitation | ||
1711 | |||
1712 | |||
1713 | simply increasing the device number size is insufficient. Apart | ||
1714 | from causing a lot of pain, it doesn't solve the management issues | ||
1715 | of a /dev with thousands or more device nodes | ||
1716 | |||
1717 | |||
1718 | ignoring the problem of a huge /dev will not make it go | ||
1719 | away, and dismisses the legitimacy of a large number of people who | ||
1720 | want a dynamic /dev | ||
1721 | |||
1722 | |||
1723 | the standard response then becomes: "write a device management | ||
1724 | daemon", which brings us back to the | ||
1725 | proposal above | ||
1726 | |||
1727 | |||
1728 | |||
1729 | |||
1730 | What I don't like about devfs | ||
1731 | |||
1732 | Here are some common complaints about devfs, and some suggestions and | ||
1733 | solutions that may make it more palatable for you. I can't please | ||
1734 | everybody, but I do try :-) | ||
1735 | |||
1736 | I hate the naming scheme | ||
1737 | |||
1738 | First, remember that no naming scheme will please everybody. You hate | ||
1739 | the scheme, others love it. Who's to say who's right and who's wrong? | ||
1740 | Ultimately, the person who writes the code gets to choose, and what | ||
1741 | exists now is a combination of the choices made by the | ||
1742 | devfs author and the | ||
1743 | kernel maintainer (Linus). | ||
1744 | |||
1745 | However, not all is lost. If you want to create your own naming | ||
1746 | scheme, it is a simple matter to write a standalone script, hack | ||
1747 | devfsd, or write a script called by devfsd. You can create whatever | ||
1748 | naming scheme you like. | ||
1749 | |||
1750 | Further, if you want to remove all traces of the devfs naming scheme | ||
1751 | from /dev, you can mount devfs elsewhere (say | ||
1752 | /devfs) and populate /dev with links into | ||
1753 | /devfs. This population can be automated using devfsd if you | ||
1754 | wish. | ||
1755 | |||
1756 | You can even use the VFS binding facility to make the links, rather | ||
1757 | than using symbolic links. This way, you don't even have to see the | ||
1758 | "destination" of these symbolic links. | ||
1759 | |||
1760 | Devfs puts policy into the kernel | ||
1761 | |||
1762 | There's already policy in the kernel. Device numbers are in fact | ||
1763 | policy (why should the kernel dictate what device numbers I use?). | ||
1764 | Face it, some policy has to be in the kernel. The real difference | ||
1765 | between device names as policy and device numbers as policy is that | ||
1766 | no one will use device numbers directly, because device | ||
1767 | numbers are devoid of meaning to humans and are ugly. At least with | ||
1768 | the devfs device names, (even though you can add your own naming | ||
1769 | scheme) some people will use the devfs-supplied names directly. This | ||
1770 | offends some people :-) | ||
1771 | |||
1772 | Devfs is bloatware | ||
1773 | |||
1774 | This is not even remotely true. As shown above, | ||
1775 | both code and data size are quite modest. | ||
1776 | |||
1777 | |||
1778 | How to report bugs | ||
1779 | |||
1780 | If you have (or think you have) a bug with devfs, please follow the | ||
1781 | steps below: | ||
1782 | |||
1783 | |||
1784 | |||
1785 | make sure you have enabled debugging output when configuring your | ||
1786 | kernel. You will need to set (at least) the following config options: | ||
1787 | |||
1788 | CONFIG_DEVFS_DEBUG=y | ||
1789 | CONFIG_DEBUG_KERNEL=y | ||
1790 | CONFIG_DEBUG_SLAB=y | ||
1791 | |||
1792 | |||
1793 | |||
1794 | please make sure you have the latest devfs patches applied. The | ||
1795 | latest kernel version might not have the latest devfs patches applied | ||
1796 | yet (Linus is very busy) | ||
1797 | |||
1798 | |||
1799 | save a copy of your complete kernel logs (preferably by | ||
1800 | using the dmesg programme) for later inclusion in your bug | ||
1801 | report. You may need to use the -s switch to increase the | ||
1802 | internal buffer size so you can capture all the boot messages. | ||
1803 | Don't edit or trim the dmesg output | ||
1804 | |||
1805 | |||
1806 | |||
1807 | |||
1808 | try booting with devfs=dall passed to the kernel boot | ||
1809 | command line (read the documentation on your bootloader on how to do | ||
1810 | this), and save the result to a file. This may be quite verbose, and | ||
1811 | it may overflow the messages buffer, but try to get as much of it as | ||
1812 | you can | ||
1813 | |||
1814 | |||
1815 | send a copy of your devfsd configuration file(s) | ||
1816 | |||
1817 | send the bug report to me first. | ||
1818 | Don't expect that I will see it if you post it to the linux-kernel | ||
1819 | mailing list. Include all the information listed above, plus | ||
1820 | anything else that you think might be relevant. Put the string | ||
1821 | devfs somewhere in the subject line, so my mail filters mark | ||
1822 | it as urgent | ||
1823 | |||
1824 | |||
1825 | |||
1826 | |||
1827 | Here is a general guide on how to ask questions in a way that greatly | ||
1828 | improves your chances of getting a reply: | ||
1829 | |||
1830 | http://www.tuxedo.org/~esr/faqs/smart-questions.html. If you have | ||
1831 | a bug to report, you should also read | ||
1832 | |||
1833 | http://www.chiark.greenend.org.uk/~sgtatham/bugs.html. | ||
1834 | |||
1835 | |||
1836 | Strange kernel messages | ||
1837 | |||
1838 | You may see devfs-related messages in your kernel logs. Below are some | ||
1839 | messages and what they mean (and what you should do about them, if | ||
1840 | anything). | ||
1841 | |||
1842 | |||
1843 | |||
1844 | devfs_register(fred): could not append to parent, err: -17 | ||
1845 | |||
1846 | You need to check what the error code means, but usually 17 means | ||
1847 | EEXIST. This means that a driver attempted to create an entry | ||
1848 | fred in a directory, but there already was an entry with that | ||
1849 | name. This is often caused by flawed boot scripts which untar a bunch | ||
1850 | of inodes into /dev, as a way to restore permissions. This | ||
1851 | message is harmless, as the device nodes will still | ||
1852 | provide access to the driver (unless you use the devfs=only | ||
1853 | boot option, which is only for dedicated souls:-). If you want to get | ||
1854 | rid of these annoying messages, upgrade to devfsd-v1.3.20 and use the | ||
1855 | recommended RESTORE directive to restore permissions. | ||
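To check what an error code such as -17 means, the number can be looked up in the system errno table. The sketch below uses Python's standard errno module; the helper name is hypothetical, and the symbolic names are those of the machine the script runs on, which may differ from the machine that produced the log.

```python
import errno
import os


def describe_kernel_err(err):
    """Translate a negative kernel error value into (symbol, message).

    The kernel reports errors as negative errno values, so the sign is
    dropped before the lookup.
    """
    code = abs(err)
    symbol = errno.errorcode.get(code, "UNKNOWN")
    return symbol, os.strerror(code)
```

On Linux, `describe_kernel_err(-17)` should give `EEXIST` as the symbol, in line with the explanation above.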
1856 | |||
1857 | |||
1858 | devfs_mk_dir(bill): using old entry in dir: c1808724 "" | ||
1859 | |||
1860 | This is similar to the message above, except that a driver attempted | ||
1861 | to create a directory named bill, and the parent directory | ||
1862 | has an entry with the same name. In this case, to ensure that drivers | ||
1863 | continue to work properly, the old entry is re-used and given to the | ||
1864 | driver. In 2.5 kernels, the driver is given a NULL entry, and thus, | ||
1865 | under rare circumstances, may not create the required device nodes. | ||
1866 | The solution is the same as above. | ||
1867 | |||
1868 | |||
1869 | |||
1870 | |||
1871 | |||
1872 | Compilation problems with devfsd | ||
1873 | |||
1874 | Usually, you can compile devfsd just by typing in | ||
1875 | make in the source directory, followed by a make | ||
1876 | install (as root). Sometimes, you may have problems, particularly | ||
1877 | on broken configurations. | ||
1878 | |||
1879 | |||
1880 | |||
1881 | error messages relating to DEVFSD_NOTIFY_DELETE | ||
1882 | |||
1883 | This happened because you have an ancient set of kernel headers | ||
1884 | installed in /usr/include/linux or /usr/src/linux. | ||
1885 | Install kernel 2.4.10 or later. You may need to pass the | ||
1886 | KERNEL_DIR variable to make (if you did not install | ||
1887 | the new kernel sources as /usr/src/linux), or you may copy | ||
1888 | the devfs_fs.h file in the kernel source tree into | ||
1889 | /usr/include/linux. | ||
1890 | |||
1891 | |||
1892 | |||
1893 | |||
1894 | ----------------------------------------------------------------------------- | ||
1895 | |||
1896 | |||
1897 | Other resources | ||
1898 | |||
1899 | |||
1900 | |||
1901 | Douglas Gilbert has written a useful document at | ||
1902 | |||
1903 | http://www.torque.net/sg/devfs_scsi.html which | ||
1904 | explores the SCSI subsystem and how it interacts with devfs | ||
1905 | |||
1906 | |||
1907 | Douglas Gilbert has written another useful document at | ||
1908 | |||
1909 | http://www.torque.net/scsi/SCSI-2.4-HOWTO/ which | ||
1910 | discusses the Linux SCSI subsystem in 2.4. | ||
1911 | |||
1912 | |||
1913 | Johannes Erdfelt has started a discussion paper on Linux and | ||
1914 | hot-swap devices, describing what the requirements are for a scalable | ||
1915 | solution and how and why he's used devfs+devfsd. Note that this is an | ||
1916 | early draft only, available in plain text form at: | ||
1917 | |||
1918 | http://johannes.erdfelt.com/hotswap.txt. | ||
1919 | Johannes has promised an HTML version will follow. | ||
1920 | |||
1921 | |||
1922 | I presented an invited | ||
1923 | paper | ||
1924 | at the | ||
1925 | |||
1926 | 2nd Annual Storage Management Workshop held in Miami, Florida, | ||
1927 | U.S.A. in October 2000. | ||
1928 | |||
1929 | |||
1930 | |||
1931 | |||
1932 | ----------------------------------------------------------------------------- | ||
1933 | |||
1934 | |||
1935 | Translations of this document | ||
1936 | |||
1937 | This document has been translated into other languages. | ||
1938 | |||
1939 | |||
1940 | |||
1941 | |||
1942 | The document master (in English) by rgooch@atnf.csiro.au is | ||
1943 | available at | ||
1944 | |||
1945 | http://www.atnf.csiro.au/~rgooch/linux/docs/devfs.html | ||
1946 | |||
1947 | |||
1948 | |||
1949 | A Korean translation by viatoris@nownuri.net is available at | ||
1950 | |||
1951 | http://your.destiny.pe.kr/devfs/devfs.html | ||
1952 | |||
1953 | |||
1954 | |||
1955 | |||
1956 | ----------------------------------------------------------------------------- | ||
1957 | Most flags courtesy of ITA's | ||
1958 | Flags of All Countries | ||
1959 | used with permission. | ||
diff --git a/Documentation/filesystems/devfs/ToDo b/Documentation/filesystems/devfs/ToDo deleted file mode 100644 index afd5a8f2c19b..000000000000 --- a/Documentation/filesystems/devfs/ToDo +++ /dev/null | |||
@@ -1,40 +0,0 @@ | |||
1 | Device File System (devfs) ToDo List | ||
2 | |||
3 | Richard Gooch <rgooch@atnf.csiro.au> | ||
4 | |||
5 | 3-JUL-2000 | ||
6 | |||
7 | This is a list of things to be done for better devfs support in the | ||
8 | Linux kernel. If you'd like to contribute to the devfs, please have a | ||
9 | look at this list for anything that is unallocated. Also, if there are | ||
10 | items missing (surely), please contact me so I can add them to the | ||
11 | list (preferably with your name attached to them:-). | ||
12 | |||
13 | |||
14 | - >256 ptys | ||
15 | Thanks to C. Scott Ananian <cananian@alumni.princeton.edu> | ||
16 | |||
17 | - Amiga floppy driver (drivers/block/amiflop.c) | ||
18 | |||
19 | - Atari floppy driver (drivers/block/ataflop.c) | ||
20 | |||
21 | - SWIM3 (Super Woz Integrated Machine 3) floppy driver (drivers/block/swim3.c) | ||
22 | |||
23 | - Amiga ZorroII ramdisc driver (drivers/block/z2ram.c) | ||
24 | |||
25 | - Parallel port ATAPI CD-ROM (drivers/block/paride/pcd.c) | ||
26 | |||
27 | - Parallel port ATAPI floppy (drivers/block/paride/pf.c) | ||
28 | |||
29 | - AP1000 block driver (drivers/ap1000/ap.c, drivers/ap1000/ddv.c) | ||
30 | |||
31 | - Archimedes floppy (drivers/acorn/block/fd1772.c) | ||
32 | |||
33 | - MFM hard drive (drivers/acorn/block/mfmhd.c) | ||
34 | |||
35 | - I2O block device (drivers/message/i2o/i2o_block.c) | ||
36 | |||
37 | - ST-RAM device (arch/m68k/atari/stram.c) | ||
38 | |||
39 | - Raw devices | ||
40 | |||
diff --git a/Documentation/filesystems/devfs/boot-options b/Documentation/filesystems/devfs/boot-options deleted file mode 100644 index df3d33b03e0a..000000000000 --- a/Documentation/filesystems/devfs/boot-options +++ /dev/null | |||
@@ -1,65 +0,0 @@ | |||
1 | /* -*- auto-fill -*- */ | ||
2 | |||
3 | Device File System (devfs) Boot Options | ||
4 | |||
5 | Richard Gooch <rgooch@atnf.csiro.au> | ||
6 | |||
7 | 18-AUG-2001 | ||
8 | |||
9 | |||
10 | When CONFIG_DEVFS_DEBUG is enabled, you can pass several boot options | ||
11 | to the kernel to debug devfs. The boot options are prefixed by | ||
12 | "devfs=", and are separated by commas. Spaces are not allowed. The | ||
13 | syntax looks like this: | ||
14 | |||
15 | devfs=<option1>,<option2>,<option3> | ||
16 | |||
17 | and so on. For example, if you wanted to turn on debugging for module | ||
18 | load requests and device registration, you would do: | ||
19 | |||
20 | devfs=dmod,dreg | ||
21 | |||
22 | You may prefix "no" to any option. This will invert the option. | ||
23 | |||
24 | |||
25 | Debugging Options | ||
26 | ================= | ||
27 | |||
28 | These require CONFIG_DEVFS_DEBUG to be enabled. | ||
29 | Note that all debugging options have 'd' as the first character. By | ||
30 | default all options are off. All debugging output is sent to the | ||
31 | kernel logs. The debugging options do not take effect until the devfs | ||
32 | version message appears (just prior to the root filesystem being | ||
33 | mounted). | ||
34 | |||
35 | These are the options: | ||
36 | |||
37 | dmod print module load requests to <request_module> | ||
38 | |||
39 | dreg print device register requests to <devfs_register> | ||
40 | |||
41 | dunreg print device unregister requests to <devfs_unregister> | ||
42 | |||
43 | dchange print device change requests to <devfs_set_flags> | ||
44 | |||
45 | dilookup print inode lookup requests | ||
46 | |||
47 | diget print VFS inode allocations | ||
48 | |||
49 | diunlink print inode unlinks | ||
50 | |||
51 | dichange print inode changes | ||
52 | |||
53 | dimknod print calls to mknod(2) | ||
54 | |||
55 | dall some debugging turned on | ||
56 | |||
57 | |||
58 | Other Options | ||
59 | ============= | ||
60 | |||
61 | These control the default behaviour of devfs. The options are: | ||
62 | |||
63 | mount mount devfs onto /dev at boot time | ||
64 | |||
65 | only disable non-devfs device nodes for devfs-capable drivers | ||
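The "devfs=" syntax described in this file (comma-separated options, no spaces, an optional "no" prefix that inverts an option) can be illustrated with a small parser. This is a sketch for illustration only and is not how the kernel parses the argument; the function name and the returned dictionary are hypothetical, while the option names are taken from the lists above.

```python
KNOWN_OPTIONS = {
    "dmod", "dreg", "dunreg", "dchange", "dilookup", "diget",
    "diunlink", "dichange", "dimknod", "dall", "mount", "only",
}


def parse_devfs_options(arg):
    """Parse a "devfs=opt1,opt2" boot argument into {option: enabled}.

    A leading "no" inverts an option, so "nodmod" maps to dmod=False.
    """
    prefix = "devfs="
    if not arg.startswith(prefix):
        raise ValueError("argument must start with 'devfs='")
    if " " in arg:
        raise ValueError("spaces are not allowed")
    options = {}
    for token in arg[len(prefix):].split(","):
        if token in KNOWN_OPTIONS:
            options[token] = True
        elif token.startswith("no") and token[2:] in KNOWN_OPTIONS:
            options[token[2:]] = False
        else:
            raise ValueError(f"unknown option: {token!r}")
    return options
```

For the example given earlier, `parse_devfs_options("devfs=dmod,dreg")` returns a dictionary with both `dmod` and `dreg` enabled.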
diff --git a/Documentation/filesystems/ext3.txt b/Documentation/filesystems/ext3.txt index afb1335c05d6..4aecc9bdb273 100644 --- a/Documentation/filesystems/ext3.txt +++ b/Documentation/filesystems/ext3.txt | |||
@@ -113,6 +113,14 @@ noquota | |||
113 | grpquota | 113 | grpquota |
114 | usrquota | 114 | usrquota |
115 | 115 | ||
116 | bh (*) ext3 associates buffer heads to data pages to | ||
117 | nobh (a) cache disk block mapping information | ||
118 | (b) link pages into transaction to provide | ||
119 | ordering guarantees. | ||
120 | "bh" option forces use of buffer heads. | ||
121 | "nobh" option tries to avoid associating buffer | ||
122 | heads (supported only for "writeback" mode). | ||
123 | |||
116 | 124 | ||
117 | Specification | 125 | Specification |
118 | ============= | 126 | ============= |
diff --git a/Documentation/filesystems/fuse.txt b/Documentation/filesystems/fuse.txt index 33f74310d161..a584f05403a4 100644 --- a/Documentation/filesystems/fuse.txt +++ b/Documentation/filesystems/fuse.txt | |||
@@ -18,6 +18,14 @@ Non-privileged mount (or user mount): | |||
18 | user. NOTE: this is not the same as mounts allowed with the "user" | 18 | user. NOTE: this is not the same as mounts allowed with the "user" |
19 | option in /etc/fstab, which is not discussed here. | 19 | option in /etc/fstab, which is not discussed here. |
20 | 20 | ||
21 | Filesystem connection: | ||
22 | |||
23 | A connection between the filesystem daemon and the kernel. The | ||
24 | connection exists until either the daemon dies, or the filesystem is | ||
25 | umounted. Note that detaching (or lazy umounting) the filesystem | ||
26 | does _not_ break the connection; in this case it will exist until | ||
27 | the last reference to the filesystem is released. | ||
28 | |||
21 | Mount owner: | 29 | Mount owner: |
22 | 30 | ||
23 | The user who does the mounting. | 31 | The user who does the mounting. |
@@ -86,16 +94,20 @@ Mount options | |||
86 | The default is infinite. Note that the size of read requests is | 94 | The default is infinite. Note that the size of read requests is |
87 | limited anyway to 32 pages (which is 128kbyte on i386). | 95 | limited anyway to 32 pages (which is 128kbyte on i386). |
88 | 96 | ||
89 | Sysfs | 97 | Control filesystem |
90 | ~~~~~ | 98 | ~~~~~~~~~~~~~~~~~~ |
99 | |||
100 | There's a control filesystem for FUSE, which can be mounted by: | ||
91 | 101 | ||
92 | FUSE sets up the following hierarchy in sysfs: | 102 | mount -t fusectl none /sys/fs/fuse/connections |
93 | 103 | ||
94 | /sys/fs/fuse/connections/N/ | 104 | Mounting it under the '/sys/fs/fuse/connections' directory makes it |
105 | backwards compatible with earlier versions. | ||
95 | 106 | ||
96 | where N is an increasing number allocated to each new connection. | 107 | Under the fuse control filesystem each connection has a directory |
108 | named by a unique number. | ||
97 | 109 | ||
98 | For each connection the following attributes are defined: | 110 | For each connection the following files exist within this directory: |
99 | 111 | ||
100 | 'waiting' | 112 | 'waiting' |
101 | 113 | ||
@@ -110,7 +122,47 @@ For each connection the following attributes are defined: | |||
110 | connection. This means that all waiting requests will be aborted and an | 122 | connection. This means that all waiting requests will be aborted and an |
111 | error returned for all aborted and new requests. | 123 | error returned for all aborted and new requests. |
112 | 124 | ||
113 | Only a privileged user may read or write these attributes. | 125 | Only the owner of the mount may read or write these files. |
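As a rough illustration of the control filesystem layout described above, the following sketch lists each connection with its 'waiting' count and aborts a connection by writing to its 'abort' file. The function names fuse_list_waiting and fuse_abort are made up for this example, and the optional base-directory argument is an assumption added so the helpers can be tried on a scratch directory; on a real system it would default to the fusectl mount point.

```shell
#!/bin/sh
# Hypothetical helpers, not part of FUSE itself.

# Print "<connection> <waiting>" for every connection directory under the
# control filesystem (default mount point: /sys/fs/fuse/connections).
fuse_list_waiting() {
    base=${1:-/sys/fs/fuse/connections}
    for d in "$base"/*/; do
        # Skip entries without a readable 'waiting' file (or an empty glob).
        [ -r "${d}waiting" ] || continue
        printf '%s %s\n' "$(basename "$d")" "$(cat "${d}waiting")"
    done
}

# Abort connection $1 by writing to its 'abort' file.
fuse_abort() {
    conn=$1
    base=${2:-/sys/fs/fuse/connections}
    echo 1 > "$base/$conn/abort"
}
```

Only the owner of the mount can read or write these files, so on a real mount the helpers have to run as that user.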
126 | |||
127 | Interrupting filesystem operations | ||
128 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||
129 | |||
130 | If a process issuing a FUSE filesystem request is interrupted, the | ||
131 | following will happen: | ||
132 | |||
133 | 1) If the request is not yet sent to userspace AND the signal is | ||
134 | fatal (SIGKILL or unhandled fatal signal), then the request is | ||
135 | dequeued and returns immediately. | ||
136 | |||
137 | 2) If the request is not yet sent to userspace AND the signal is not | ||
138 | fatal, then an 'interrupted' flag is set for the request. When | ||
139 | the request has been successfully transferred to userspace and | ||
140 | this flag is set, an INTERRUPT request is queued. | ||
141 | |||
142 | 3) If the request is already sent to userspace, then an INTERRUPT | ||
143 | request is queued. | ||
144 | |||
145 | INTERRUPT requests take precedence over other requests, so the | ||
146 | userspace filesystem will receive queued INTERRUPTs before any others. | ||
147 | |||
148 | The userspace filesystem may ignore the INTERRUPT requests entirely, | ||
149 | or may honor them by sending a reply to the _original_ request, with | ||
150 | the error set to EINTR. | ||
151 | |||
152 | It is also possible that there's a race between processing the | ||
153 | original request and its INTERRUPT request. There are two possibilities: | ||
154 | |||
155 | 1) The INTERRUPT request is processed before the original request is | ||
156 | processed | ||
157 | |||
158 | 2) The INTERRUPT request is processed after the original request has | ||
159 | been answered | ||
160 | |||
161 | If the filesystem cannot find the original request, it should wait for | ||
162 | some timeout and/or a number of new requests to arrive, after which it | ||
163 | should reply to the INTERRUPT request with an EAGAIN error. In case | ||
164 | 1) the INTERRUPT request will be requeued. In case 2) the INTERRUPT | ||
165 | reply will be ignored. | ||
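The three interruption cases listed above can be summarised as a small decision table. This is only a model of the rules in the text, not kernel code; the function name and its "yes"/"no" arguments are invented for illustration.

```shell
#!/bin/sh
# Hypothetical model of the interruption rules described in the text.
# Usage: fuse_interrupt_action SENT FATAL   (each argument is "yes" or "no")
#   SENT  - has the request already been sent to userspace?
#   FATAL - is the signal fatal (SIGKILL or an unhandled fatal signal)?
fuse_interrupt_action() {
    sent=$1
    fatal=$2
    if [ "$sent" = yes ]; then
        # Case 3: already in userspace, so an INTERRUPT request is queued.
        echo "queue INTERRUPT"
    elif [ "$fatal" = yes ]; then
        # Case 1: not yet sent and the signal is fatal.
        echo "dequeue and return"
    else
        # Case 2: not yet sent, non-fatal signal; the INTERRUPT is queued
        # later, once the request has been transferred to userspace.
        echo "set interrupted flag"
    fi
}
```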
114 | 166 | ||
115 | Aborting a filesystem connection | 167 | Aborting a filesystem connection |
116 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | 168 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
@@ -139,8 +191,8 @@ the filesystem. There are several ways to do this: | |||
139 | - Use forced umount (umount -f). Works in all cases but only if | 191 | - Use forced umount (umount -f). Works in all cases but only if |
140 | filesystem is still attached (it hasn't been lazy unmounted) | 192 | filesystem is still attached (it hasn't been lazy unmounted) |
141 | 193 | ||
142 | - Abort filesystem through the sysfs interface. Most powerful | 194 | - Abort filesystem through the FUSE control filesystem. Most |
143 | method, always works. | 195 | powerful method, always works. |
144 | 196 | ||
145 | How do non-privileged mounts work? | 197 | How do non-privileged mounts work? |
146 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | 198 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
@@ -304,25 +356,7 @@ Scenario 1 - Simple deadlock | |||
304 | | | for "file"] | 356 | | | for "file"] |
305 | | | *DEADLOCK* | 357 | | | *DEADLOCK* |
306 | 358 | ||
307 | The solution for this is to allow requests to be interrupted while | 359 | The solution for this is to allow the filesystem to be aborted. |
308 | they are in userspace: | ||
309 | |||
310 | | [interrupted by signal] | | ||
311 | | <fuse_unlink() | | ||
312 | | [release semaphore] | [semaphore acquired] | ||
313 | | <sys_unlink() | | ||
314 | | | >fuse_unlink() | ||
315 | | | [queue req on fc->pending] | ||
316 | | | [wake up fc->waitq] | ||
317 | | | [sleep on req->waitq] | ||
318 | |||
319 | If the filesystem daemon was single threaded, this will stop here, | ||
320 | since there's no other thread to dequeue and execute the request. | ||
321 | In this case the solution is to kill the FUSE daemon as well. If | ||
322 | there are multiple serving threads, you just have to kill them as | ||
323 | long as any remain. | ||
324 | |||
325 | Moral: a filesystem which deadlocks, can soon find itself dead. | ||
326 | 360 | ||
327 | Scenario 2 - Tricky deadlock | 361 | Scenario 2 - Tricky deadlock |
328 | ---------------------------- | 362 | ---------------------------- |
@@ -355,24 +389,14 @@ but is caused by a pagefault. | |||
355 | | | [lock page] | 389 | | | [lock page] |
356 | | | * DEADLOCK * | 390 | | | * DEADLOCK * |
357 | 391 | ||
358 | Solution is again to let the the request be interrupted (not | 392 | Solution is basically the same as above. |
359 | elaborated further). | ||
360 | |||
361 | An additional problem is that while the write buffer is being | ||
362 | copied to the request, the request must not be interrupted. This | ||
363 | is because the destination address of the copy may not be valid | ||
364 | after the request is interrupted. | ||
365 | |||
366 | This is solved with doing the copy atomically, and allowing | ||
367 | interruption while the page(s) belonging to the write buffer are | ||
368 | faulted with get_user_pages(). The 'req->locked' flag indicates | ||
369 | when the copy is taking place, and interruption is delayed until | ||
370 | this flag is unset. | ||
371 | 393 | ||
372 | Scenario 3 - Tricky deadlock with asynchronous read | 394 | An additional problem is that while the write buffer is being copied |
373 | --------------------------------------------------- | 395 | to the request, the request must not be interrupted/aborted. This is |
396 | because the destination address of the copy may not be valid after the | ||
397 | request has returned. | ||
374 | 398 | ||
375 | The same situation as above, except thread-1 will wait on page lock | 399 | This is solved with doing the copy atomically, and allowing abort |
376 | and hence it will be uninterruptible as well. The solution is to | 400 | while the page(s) belonging to the write buffer are faulted with |
377 | abort the connection with forced umount (if mount is attached) or | 401 | get_user_pages(). The 'req->locked' flag indicates when the copy is |
378 | through the abort attribute in sysfs. | 402 | taking place, and abort is delayed until this flag is unset. |
diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt index 99902ae6804e..7240ee7515de 100644 --- a/Documentation/filesystems/proc.txt +++ b/Documentation/filesystems/proc.txt | |||
@@ -39,6 +39,8 @@ Table of Contents | |||
39 | 2.9 Appletalk | 39 | 2.9 Appletalk |
40 | 2.10 IPX | 40 | 2.10 IPX |
41 | 2.11 /proc/sys/fs/mqueue - POSIX message queues filesystem | 41 | 2.11 /proc/sys/fs/mqueue - POSIX message queues filesystem |
42 | 2.12 /proc/<pid>/oom_adj - Adjust the oom-killer score | ||
43 | 2.13 /proc/<pid>/oom_score - Display current oom-killer score | ||
42 | 44 | ||
43 | ------------------------------------------------------------------------------ | 45 | ------------------------------------------------------------------------------ |
44 | Preface | 46 | Preface |
@@ -1124,11 +1126,15 @@ debugging information is displayed on console. | |||
1124 | NMI switch that most IA32 servers have fires unknown NMI up, for example. | 1126 | NMI switch that most IA32 servers have fires unknown NMI up, for example. |
1125 | If a system hangs up, try pressing the NMI switch. | 1127 | If a system hangs up, try pressing the NMI switch. |
1126 | 1128 | ||
1127 | [NOTE] | 1129 | nmi_watchdog |
1128 | This function and oprofile share a NMI callback. Therefore this function | 1130 | ------------ |
1129 | cannot be enabled when oprofile is activated. | 1131 | |
1130 | And NMI watchdog will be disabled when the value in this file is set to | 1132 | Enables/Disables the NMI watchdog on x86 systems. When the value is non-zero |
1131 | non-zero. | 1133 | the NMI watchdog is enabled and will continuously test all online cpus to |
1134 | determine whether or not they are still functioning properly. | ||
1135 | |||
1136 | Because the NMI watchdog shares registers with oprofile, by disabling the NMI | ||
1137 | watchdog, oprofile may have more registers to utilize. | ||
1132 | 1138 | ||
1133 | 1139 | ||
1134 | 2.4 /proc/sys/vm - The virtual memory subsystem | 1140 | 2.4 /proc/sys/vm - The virtual memory subsystem |
@@ -1958,6 +1964,22 @@ a queue must be less or equal then msg_max. | |||
1958 | maximum message size value (it is every message queue's attribute set during | 1964 | maximum message size value (it is every message queue's attribute set during |
1959 | its creation). | 1965 | its creation). |
1960 | 1966 | ||
1967 | 2.12 /proc/<pid>/oom_adj - Adjust the oom-killer score | ||
1968 | ------------------------------------------------------ | ||
1969 | |||
1970 | This file can be used to adjust the score used to select which processes | ||
1971 | should be killed in an out-of-memory situation. Giving it a high score will | ||
1972 | increase the likelihood of this process being killed by the oom-killer. Valid | ||
1973 | values are in the range -16 to +15, plus the special value -17, which disables | ||
1974 | oom-killing altogether for this process. | ||
1975 | |||
1976 | 2.13 /proc/<pid>/oom_score - Display current oom-killer score | ||
1977 | ------------------------------------------------------------- | ||
1978 | |||
1979 | ------------------------------------------------------------------------------ | ||
1980 | This file can be used to check the current score used by the oom-killer for | ||
1981 | any given <pid>. Use it together with /proc/<pid>/oom_adj to tune which | ||
1982 | process should be killed in an out-of-memory situation. | ||
1961 | 1983 | ||
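A minimal sketch of how the two files might be used together, assuming a kernel that provides both /proc/<pid>/oom_adj and /proc/<pid>/oom_score as described above. The helper names valid_oom_adj and set_oom_adj are hypothetical; the range check simply encodes the -16 to +15 range plus the special value -17.

```shell
#!/bin/sh
# Hypothetical helpers, shown for illustration only.

# Succeed if $1 is an acceptable oom_adj value: -16..15, or -17 to disable
# oom-killing for the process.
valid_oom_adj() {
    case "$1" in
        ''|*[!0-9-]*) return 1 ;;    # reject empty or non-numeric input
    esac
    [ "$1" -ge -17 ] 2>/dev/null && [ "$1" -le 15 ] 2>/dev/null
}

# Write an adjustment for a pid, then print the resulting oom-killer score.
set_oom_adj() {
    pid=$1
    adj=$2
    valid_oom_adj "$adj" || { echo "oom_adj must be in -17..15" >&2; return 1; }
    echo "$adj" > "/proc/$pid/oom_adj" || return 1
    cat "/proc/$pid/oom_score"
}
```

For example, "set_oom_adj $$ 10" would make the current shell a more likely oom-killer victim, while "set_oom_adj $$ -17" would exempt it; lowering the value typically requires root privileges.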
1962 | ------------------------------------------------------------------------------ | 1984 | ------------------------------------------------------------------------------ |
1963 | Summary | 1985 | Summary |
diff --git a/Documentation/filesystems/ramfs-rootfs-initramfs.txt b/Documentation/filesystems/ramfs-rootfs-initramfs.txt index 60ab61e54e8a..25981e2e51be 100644 --- a/Documentation/filesystems/ramfs-rootfs-initramfs.txt +++ b/Documentation/filesystems/ramfs-rootfs-initramfs.txt | |||
@@ -70,11 +70,13 @@ tmpfs mounts. See Documentation/filesystems/tmpfs.txt for more information. | |||
70 | What is rootfs? | 70 | What is rootfs? |
71 | --------------- | 71 | --------------- |
72 | 72 | ||
73 | Rootfs is a special instance of ramfs, which is always present in 2.6 systems. | 73 | Rootfs is a special instance of ramfs (or tmpfs, if that's enabled), which is |
74 | (It's used internally as the starting and stopping point for searches of the | 74 | always present in 2.6 systems. You can't unmount rootfs for approximately the |
75 | kernel's doubly-linked list of mount points.) | 75 | same reason you can't kill the init process; rather than having special code |
76 | to check for and handle an empty list, it's smaller and simpler for the kernel | ||
77 | to just make sure certain lists can't become empty. | ||
76 | 78 | ||
77 | Most systems just mount another filesystem over it and ignore it. The | 79 | Most systems just mount another filesystem over rootfs and ignore it. The |
78 | amount of space an empty instance of ramfs takes up is tiny. | 80 | amount of space an empty instance of ramfs takes up is tiny. |
79 | 81 | ||
80 | What is initramfs? | 82 | What is initramfs? |
@@ -92,14 +94,16 @@ out of that. | |||
92 | 94 | ||
93 | All this differs from the old initrd in several ways: | 95 | All this differs from the old initrd in several ways: |
94 | 96 | ||
95 | - The old initrd was a separate file, while the initramfs archive is linked | 97 | - The old initrd was always a separate file, while the initramfs archive is |
96 | into the linux kernel image. (The directory linux-*/usr is devoted to | 98 | linked into the linux kernel image. (The directory linux-*/usr is devoted |
97 | generating this archive during the build.) | 99 | to generating this archive during the build.) |
98 | 100 | ||
99 | - The old initrd file was a gzipped filesystem image (in some file format, | 101 | - The old initrd file was a gzipped filesystem image (in some file format, |
100 | such as ext2, that had to be built into the kernel), while the new | 102 | such as ext2, that needed a driver built into the kernel), while the new |
101 | initramfs archive is a gzipped cpio archive (like tar only simpler, | 103 | initramfs archive is a gzipped cpio archive (like tar only simpler, |
102 | see cpio(1) and Documentation/early-userspace/buffer-format.txt). | 104 | see cpio(1) and Documentation/early-userspace/buffer-format.txt). The |
105 | kernel's cpio extraction code is not only extremely small, it's also | ||
106 | __init data that can be discarded during the boot process. | ||
103 | 107 | ||
104 | - The program run by the old initrd (which was called /initrd, not /init) did | 108 | - The program run by the old initrd (which was called /initrd, not /init) did |
105 | some setup and then returned to the kernel, while the init program from | 109 | some setup and then returned to the kernel, while the init program from |
@@ -124,13 +128,14 @@ Populating initramfs: | |||
124 | 128 | ||
125 | The 2.6 kernel build process always creates a gzipped cpio format initramfs | 129 | The 2.6 kernel build process always creates a gzipped cpio format initramfs |
126 | archive and links it into the resulting kernel binary. By default, this | 130 | archive and links it into the resulting kernel binary. By default, this |
127 | archive is empty (consuming 134 bytes on x86). The config option | 131 | archive is empty (consuming 134 bytes on x86). |
128 | CONFIG_INITRAMFS_SOURCE (for some reason buried under devices->block devices | 132 | |
129 | in menuconfig, and living in usr/Kconfig) can be used to specify a source for | 133 | The config option CONFIG_INITRAMFS_SOURCE (for some reason buried under |
130 | the initramfs archive, which will automatically be incorporated into the | 134 | devices->block devices in menuconfig, and living in usr/Kconfig) can be used |
131 | resulting binary. This option can point to an existing gzipped cpio archive, a | 135 | to specify a source for the initramfs archive, which will automatically be |
132 | directory containing files to be archived, or a text file specification such | 136 | incorporated into the resulting binary. This option can point to an existing |
133 | as the following example: | 137 | gzipped cpio archive, a directory containing files to be archived, or a text |
138 | file specification such as the following example: | ||
134 | 139 | ||
135 | dir /dev 755 0 0 | 140 | dir /dev 755 0 0 |
136 | nod /dev/console 644 0 0 c 5 1 | 141 | nod /dev/console 644 0 0 c 5 1 |
@@ -146,23 +151,84 @@ as the following example: | |||
146 | Run "usr/gen_init_cpio" (after the kernel build) to get a usage message | 151 | Run "usr/gen_init_cpio" (after the kernel build) to get a usage message |
147 | documenting the above file format. | 152 | documenting the above file format. |
148 | 153 | ||
149 | One advantage of the text file is that root access is not required to | 154 | One advantage of the configuration file is that root access is not required to |
150 | set permissions or create device nodes in the new archive. (Note that those | 155 | set permissions or create device nodes in the new archive. (Note that those |
151 | two example "file" entries expect to find files named "init.sh" and "busybox" in | 156 | two example "file" entries expect to find files named "init.sh" and "busybox" in |
152 | a directory called "initramfs", under the linux-2.6.* directory. See | 157 | a directory called "initramfs", under the linux-2.6.* directory. See |
153 | Documentation/early-userspace/README for more details.) | 158 | Documentation/early-userspace/README for more details.) |
154 | 159 | ||
155 | The kernel does not depend on external cpio tools, gen_init_cpio is created | 160 | The kernel does not depend on external cpio tools. If you specify a |
156 | from usr/gen_init_cpio.c which is entirely self-contained, and the kernel's | 161 | directory instead of a configuration file, the kernel's build infrastructure |
157 | boot-time extractor is also (obviously) self-contained. However, if you _do_ | 162 | creates a configuration file from that directory (usr/Makefile calls |
158 | happen to have cpio installed, the following command line can extract the | 163 | scripts/gen_initramfs_list.sh), and proceeds to package up that directory |
159 | generated cpio image back into its component files: | 164 | using the config file (by feeding it to usr/gen_init_cpio, which is created |
165 | from usr/gen_init_cpio.c). The kernel's build-time cpio creation code is | ||
166 | entirely self-contained, and the kernel's boot-time extractor is also | ||
167 | (obviously) self-contained. | ||
168 | |||
169 | The one thing you might need external cpio utilities installed for is creating | ||
170 | or extracting your own preprepared cpio files to feed to the kernel build | ||
171 | (instead of a config file or directory). | ||
172 | |||
173 | The following command line can extract a cpio image (created either by the | ||
174 | script below or by the kernel build) back into its component files: | ||
160 | 175 | ||
161 | cpio -i -d -H newc -F initramfs_data.cpio --no-absolute-filenames | 176 | cpio -i -d -H newc -F initramfs_data.cpio --no-absolute-filenames |
162 | 177 | ||
178 | The following shell script can create a prebuilt cpio archive you can | ||
179 | use in place of the above config file: | ||
180 | |||
181 | #!/bin/sh | ||
182 | |||
183 | # Copyright 2006 Rob Landley <rob@landley.net> and TimeSys Corporation. | ||
184 | # Licensed under GPL version 2 | ||
185 | |||
186 | if [ $# -ne 2 ] | ||
187 | then | ||
188 | echo "usage: mkinitramfs directory imagename.cpio.gz" | ||
189 | exit 1 | ||
190 | fi | ||
191 | |||
192 | if [ -d "$1" ] | ||
193 | then | ||
194 | echo "creating $2 from $1" | ||
195 | (cd "$1"; find . | cpio -o -H newc | gzip) > "$2" | ||
196 | else | ||
197 | echo "First argument must be a directory" | ||
198 | exit 1 | ||
199 | fi | ||
200 | |||
201 | Note: The cpio man page contains some bad advice that will break your initramfs | ||
202 | archive if you follow it. It says "A typical way to generate the list | ||
203 | of filenames is with the find command; you should give find the -depth option | ||
204 | to minimize problems with permissions on directories that are unwritable or not | ||
205 | searchable." Don't do this when creating initramfs.cpio.gz images, it won't | ||
206 | work. The Linux kernel cpio extractor won't create files in a directory that | ||
207 | doesn't exist, so the directory entries must go before the files that go in | ||
208 | those directories. The above script gets them in the right order. | ||
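The ordering requirement in the note above can be checked mechanically before a file list is fed to cpio. The sketch below is one possible check, with a made-up function name (dirs_first); it reads a list of paths on stdin and fails if any entry appears before its parent directory. It assumes one path per line, so file names containing newlines are not handled.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if every listed path comes after its
# parent directory, which is the order the kernel's cpio extractor needs.
dirs_first() {
    awk '
        {
            parent = $0
            # Strip the last path component; sub() returns 0 when there is
            # no "/" in the entry (e.g. "."), in which case nothing to check.
            if (sub("/[^/]*$", "", parent) && !(parent in seen))
                bad = 1
            seen[$0] = 1
        }
        END { exit bad }
    '
}
```

Running "find . | dirs_first" inside the initramfs directory should succeed, while "find . -depth | dirs_first" should fail as soon as the tree contains a file inside a subdirectory.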
209 | |||
210 | External initramfs images: | ||
211 | -------------------------- | ||
212 | |||
213 | If the kernel has initrd support enabled, an external cpio.gz archive can also | ||
214 | be passed into a 2.6 kernel in place of an initrd. In this case, the kernel | ||
215 | will autodetect the type (initramfs, not initrd) and extract the external cpio | ||
216 | archive into rootfs before trying to run /init. | ||
217 | |||
218 | This has the memory efficiency advantages of initramfs (no ramdisk block | ||
219 | device) but the separate packaging of initrd (which is nice if you have | ||
220 | non-GPL code you'd like to run from initramfs, without conflating it with | ||
221 | the GPL licensed Linux kernel binary). | ||
222 | |||
223 | It can also be used to supplement the kernel's built-in initramfs image. The | ||
224 | files in the external archive will overwrite any conflicting files in | ||
225 | the built-in initramfs archive. Some distributors also prefer to customize | ||
226 | a single kernel image with task-specific initramfs images, without recompiling. | ||
227 | |||
163 | Contents of initramfs: | 228 | Contents of initramfs: |
164 | ---------------------- | 229 | ---------------------- |
165 | 230 | ||
231 | An initramfs archive is a complete self-contained root filesystem for Linux. | ||
166 | If you don't already understand what shared libraries, devices, and paths | 232 | If you don't already understand what shared libraries, devices, and paths |
167 | you need to get a minimal root filesystem up and running, here are some | 233 | you need to get a minimal root filesystem up and running, here are some |
168 | references: | 234 | references: |
@@ -176,13 +242,36 @@ code against, along with some related utilities. It is BSD licensed. | |||
176 | 242 | ||
177 | I use uClibc (http://www.uclibc.org) and busybox (http://www.busybox.net) | 243 | I use uClibc (http://www.uclibc.org) and busybox (http://www.busybox.net) |
178 | myself. These are LGPL and GPL, respectively. (A self-contained initramfs | 244 | myself. These are LGPL and GPL, respectively. (A self-contained initramfs |
179 | package is planned for the busybox 1.2 release.) | 245 | package is planned for the busybox 1.3 release.) |
180 | 246 | ||
181 | In theory you could use glibc, but that's not well suited for small embedded | 247 | In theory you could use glibc, but that's not well suited for small embedded |
182 | uses like this. (A "hello world" program statically linked against glibc is | 248 | uses like this. (A "hello world" program statically linked against glibc is |
183 | over 400k. With uClibc it's 7k. Also note that glibc dlopens libnss to do | 249 | over 400k. With uClibc it's 7k. Also note that glibc dlopens libnss to do |
184 | name lookups, even when otherwise statically linked.) | 250 | name lookups, even when otherwise statically linked.) |
185 | 251 | ||
252 | A good first step is to get initramfs to run a statically linked "hello world" | ||
253 | program as init, and test it under an emulator like qemu (www.qemu.org) or | ||
254 | User Mode Linux, like so: | ||
255 | |||
256 | cat > hello.c << EOF | ||
257 | #include <stdio.h> | ||
258 | #include <unistd.h> | ||
259 | |||
260 | int main(int argc, char *argv[]) | ||
261 | { | ||
262 | printf("Hello world!\n"); | ||
263 | sleep(999999999); | ||
264 | } | ||
265 | EOF | ||
266 | gcc -static hello.c -o init | ||
267 | echo init | cpio -o -H newc | gzip > test.cpio.gz | ||
268 | # Testing external initramfs using the initrd loading mechanism. | ||
269 | qemu -kernel /boot/vmlinuz -initrd test.cpio.gz /dev/zero | ||
270 | |||
271 | When debugging a normal root filesystem, it's nice to be able to boot with | ||
272 | "init=/bin/sh". The initramfs equivalent is "rdinit=/bin/sh", and it's | ||
273 | just as useful. | ||
274 | |||
186 | Why cpio rather than tar? | 275 | Why cpio rather than tar? |
187 | ------------------------- | 276 | ------------------------- |
188 | 277 | ||
@@ -241,7 +330,7 @@ the above threads) is: | |||
241 | Future directions: | 330 | Future directions: |
242 | ------------------ | 331 | ------------------ |
243 | 332 | ||
244 | Today (2.6.14), initramfs is always compiled in, but not always used. The | 333 | Today (2.6.16), initramfs is always compiled in, but not always used. The |
245 | kernel falls back to legacy boot code that is reached only if initramfs does | 334 | kernel falls back to legacy boot code that is reached only if initramfs does |
246 | not contain an /init program. The fallback is legacy code, there to ensure a | 335 | not contain an /init program. The fallback is legacy code, there to ensure a |
247 | smooth transition and allowing early boot functionality to gradually move to | 336 | smooth transition and allowing early boot functionality to gradually move to |
@@ -258,8 +347,9 @@ and so on. | |||
258 | 347 | ||
259 | This kind of complexity (which inevitably includes policy) is rightly handled | 348 | This kind of complexity (which inevitably includes policy) is rightly handled |
260 | in userspace. Both klibc and busybox/uClibc are working on simple initramfs | 349 | in userspace. Both klibc and busybox/uClibc are working on simple initramfs |
261 | packages to drop into a kernel build, and when standard solutions are ready | 350 | packages to drop into a kernel build. |
262 | and widely deployed, the kernel's legacy early boot code will become obsolete | ||
263 | and a candidate for the feature removal schedule. | ||
264 | 351 | ||
265 | But that's a while off yet. | 352 | The klibc package has now been accepted into Andrew Morton's 2.6.17-mm tree. |
353 | The kernel's current early boot code (partition detection, etc) will probably | ||
354 | be migrated into a default initramfs, automatically created and used by the | ||
355 | kernel build. | ||
diff --git a/Documentation/filesystems/relay.txt b/Documentation/filesystems/relay.txt new file mode 100644 index 000000000000..d6788dae0349 --- /dev/null +++ b/Documentation/filesystems/relay.txt | |||
@@ -0,0 +1,479 @@ | |||
1 | relay interface (formerly relayfs) | ||
2 | ================================== | ||
3 | |||
4 | The relay interface provides a means for kernel applications to | ||
5 | efficiently log and transfer large quantities of data from the kernel | ||
6 | to userspace via user-defined 'relay channels'. | ||
7 | |||
8 | A 'relay channel' is a kernel->user data relay mechanism implemented | ||
9 | as a set of per-cpu kernel buffers ('channel buffers'), each | ||
10 | represented as a regular file ('relay file') in user space. Kernel | ||
11 | clients write into the channel buffers using efficient write | ||
12 | functions; these automatically log into the current cpu's channel | ||
13 | buffer. User space applications mmap() or read() from the relay files | ||
14 | and retrieve the data as it becomes available. The relay files | ||
15 | themselves are files created in a host filesystem, e.g. debugfs, and | ||
16 | are associated with the channel buffers using the API described below. | ||
17 | |||
18 | The format of the data logged into the channel buffers is completely | ||
19 | up to the kernel client; the relay interface does however provide | ||
20 | hooks which allow kernel clients to impose some structure on the | ||
21 | buffer data. The relay interface doesn't implement any form of data | ||
22 | filtering - this also is left to the kernel client. The purpose is to | ||
23 | keep things as simple as possible. | ||
24 | |||
25 | This document provides an overview of the relay interface API. The | ||
26 | details of the function parameters are documented along with the | ||
27 | functions in the relay interface code - please see that for details. | ||
28 | |||
29 | Semantics | ||
30 | ========= | ||
31 | |||
32 | Each relay channel has one buffer per CPU, each buffer has one or more | ||
33 | sub-buffers. Messages are written to the first sub-buffer until it is | ||
34 | too full to contain a new message, in which case it is written to | ||
35 | the next (if available). Messages are never split across sub-buffers. | ||
36 | At this point, userspace can be notified so it empties the first | ||
37 | sub-buffer, while the kernel continues writing to the next. | ||
38 | |||
39 | When notified that a sub-buffer is full, the kernel knows how many | ||
40 | bytes of it are padding i.e. unused space occurring because a complete | ||
41 | message couldn't fit into a sub-buffer. Userspace can use this | ||
42 | knowledge to copy only valid data. | ||
43 | |||
44 | After copying it, userspace can notify the kernel that a sub-buffer | ||
45 | has been consumed. | ||
46 | |||
47 | A relay channel can operate in a mode where it will overwrite data not | ||
48 | yet collected by userspace, and not wait for it to be consumed. | ||
49 | |||
50 | The relay channel itself does not provide for communication of such | ||
51 | data between userspace and kernel, allowing the kernel side to remain | ||
52 | simple and not impose a single interface on userspace. It does | ||
53 | provide a set of examples and a separate helper though, described | ||
54 | below. | ||
55 | |||
56 | The read() interface both removes padding and internally consumes the | ||
57 | read sub-buffers; thus in cases where read(2) is being used to drain | ||
58 | the channel buffers, special-purpose communication between kernel and | ||
59 | user isn't necessary for basic operation. | ||
60 | |||
61 | One of the major goals of the relay interface is to provide a low | ||
62 | overhead mechanism for conveying kernel data to userspace. While the | ||
63 | read() interface is easy to use, it's not as efficient as the mmap() | ||
64 | approach; the example code attempts to make the tradeoff between the | ||
65 | two approaches as small as possible. | ||
66 | |||
67 | klog and relay-apps example code | ||
68 | ================================ | ||
69 | |||
70 | The relay interface itself is ready to use, but to make things easier, | ||
71 | a couple of simple utility functions and a set of examples are provided. | ||
72 | |||
73 | The relay-apps example tarball, available on the relay sourceforge | ||
74 | site, contains a set of self-contained examples, each consisting of a | ||
75 | pair of .c files containing boilerplate code for each of the user and | ||
76 | kernel sides of a relay application. When combined these two sets of | ||
77 | boilerplate code provide glue to easily stream data to disk, without | ||
78 | having to bother with mundane housekeeping chores. | ||
79 | |||
80 | The 'klog debugging functions' patch (klog.patch in the relay-apps | ||
81 | tarball) provides a couple of high-level logging functions to the | ||
82 | kernel which allow writing formatted text or raw data to a channel, | ||
83 | regardless of whether a channel to write into exists or not, or even | ||
84 | whether the relay interface is compiled into the kernel or not. These | ||
85 | functions allow you to put unconditional 'trace' statements anywhere | ||
86 | in the kernel or kernel modules; only when there is a 'klog handler' | ||
87 | registered will data actually be logged (see the klog and kleak | ||
88 | examples for details). | ||
89 | |||
90 | It is of course possible to use the relay interface from scratch, | ||
91 | i.e. without using any of the relay-apps example code or klog, but | ||
92 | you'll have to implement communication between userspace and kernel, | ||
93 | allowing both to convey the state of buffers (full, empty, amount of | ||
94 | padding). The read() interface both removes padding and internally | ||
95 | consumes the read sub-buffers; thus in cases where read(2) is being | ||
96 | used to drain the channel buffers, special-purpose communication | ||
97 | between kernel and user isn't necessary for basic operation. Things | ||
98 | such as buffer-full conditions would still need to be communicated via | ||
99 | some channel though. | ||
100 | |||
101 | klog and the relay-apps examples can be found in the relay-apps | ||
102 | tarball on http://relayfs.sourceforge.net | ||
103 | |||
104 | The relay interface user space API | ||
105 | ================================== | ||
106 | |||
107 | The relay interface implements basic file operations for user space | ||
108 | access to relay channel buffer data. Here are the file operations | ||
109 | that are available and some comments regarding their behavior: | ||
110 | |||
111 | open() enables user to open an _existing_ channel buffer. | ||
112 | |||
113 | mmap() results in channel buffer being mapped into the caller's | ||
114 | memory space. Note that you can't do a partial mmap - you | ||
115 | must map the entire file, which is NRBUF * SUBBUFSIZE. | ||
116 | |||
117 | read() read the contents of a channel buffer. The bytes read are | ||
118 | 'consumed' by the reader, i.e. they won't be available | ||
119 | again to subsequent reads. If the channel is being used | ||
120 | in no-overwrite mode (the default), it can be read at any | ||
121 | time even if there's an active kernel writer. If the | ||
122 | channel is being used in overwrite mode and there are | ||
123 | active channel writers, results may be unpredictable - | ||
124 | users should make sure that all logging to the channel has | ||
125 | ended before using read() with overwrite mode. Sub-buffer | ||
126 | padding is automatically removed and will not be seen by | ||
127 | the reader. | ||
128 | |||
129 | sendfile() transfer data from a channel buffer to an output file | ||
130 | descriptor. Sub-buffer padding is automatically removed | ||
131 | and will not be seen by the reader. | ||
132 | |||
133 | poll() POLLIN/POLLRDNORM/POLLERR supported. User applications are | ||
134 | notified when sub-buffer boundaries are crossed. | ||
135 | |||
136 | close() decrements the channel buffer's refcount. When the refcount | ||
137 | reaches 0, i.e. when no process or kernel client has the | ||
138 | buffer open, the channel buffer is freed. | ||
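The mmap() rule above (the whole file must be mapped, never a part of it) can be sketched from user space roughly as follows. This is a minimal sketch, not code from the relay-apps tarball: the names example_map_len() and example_map_buffer() are hypothetical, and the choice of PROT_READ with MAP_SHARED is an assumption. The sub-buffer size and count must be the same values the kernel client passed to relay_open().

```c
#include <stddef.h>
#include <sys/mman.h>

/*
 * Length that has to be mapped: the whole per-cpu file,
 * i.e. number of sub-buffers times sub-buffer size.
 */
size_t example_map_len(size_t subbuf_size, size_t n_subbufs)
{
	return subbuf_size * n_subbufs;
}

/*
 * Map an already opened channel buffer file read-only.
 * Returns NULL on failure instead of MAP_FAILED.
 * (Hypothetical helper; the flags are an assumption.)
 */
void *example_map_buffer(int fd, size_t subbuf_size, size_t n_subbufs)
{
	void *p = mmap(NULL, example_map_len(subbuf_size, n_subbufs),
		       PROT_READ, MAP_SHARED, fd, 0);

	return p == MAP_FAILED ? NULL : p;
}
```

A caller would open() one of the per-cpu files, pass the descriptor to example_map_buffer(), and later munmap() the same length.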
139 | |||
140 | In order for a user application to make use of relay files, the | ||
141 | host filesystem must be mounted. For example, | ||
142 | |||
143 | mount -t debugfs debugfs /debug | ||
144 | |||
145 | NOTE: the host filesystem doesn't need to be mounted for kernel | ||
146 | clients to create or use channels - it only needs to be | ||
147 | mounted when user space applications need access to the buffer | ||
148 | data. | ||
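Once the host filesystem is mounted, draining a channel buffer with poll() and read() might look like the sketch below. It is only an illustration: example_drain_available() and example_wait_and_drain() are hypothetical names, and the code assumes that read() returns 0 when no further data is ready. Since read() removes padding and consumes sub-buffers itself, no extra kernel/user communication is shown.

```c
#include <errno.h>
#include <poll.h>
#include <unistd.h>

/*
 * Copy everything that is currently readable from in_fd to out_fd.
 * Returns the number of bytes copied, or -1 on error.
 */
ssize_t example_drain_available(int in_fd, int out_fd)
{
	char chunk[4096];
	ssize_t total = 0;
	ssize_t n, w, off;

	for (;;) {
		n = read(in_fd, chunk, sizeof(chunk));
		if (n < 0) {
			if (errno == EINTR)
				continue;
			return -1;
		}
		if (n == 0)
			break;		/* nothing more to read right now */

		/* write() may be partial, so loop until the chunk is out */
		for (off = 0; off < n; off += w) {
			w = write(out_fd, chunk + off, n - off);
			if (w < 0) {
				if (errno != EINTR)
					return -1;
				w = 0;	/* interrupted, retry */
			}
		}
		total += n;
	}
	return total;
}

/*
 * Wait up to timeout_ms for the buffer to become readable, then drain it.
 * Returns bytes copied, 0 on timeout, or -1 on error.
 */
ssize_t example_wait_and_drain(int in_fd, int out_fd, int timeout_ms)
{
	struct pollfd pfd = { .fd = in_fd, .events = POLLIN };
	int rc;

	do {
		rc = poll(&pfd, 1, timeout_ms);
	} while (rc < 0 && errno == EINTR);

	if (rc < 0 || (rc > 0 && (pfd.revents & POLLERR)))
		return -1;
	if (rc == 0)
		return 0;

	return example_drain_available(in_fd, out_fd);
}
```

With the mount command above and a channel opened with base filename "cpu", in_fd would come from open("/debug/cpu0", O_RDONLY), and out_fd would be a log file on disk; both paths are examples only.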
149 | |||
150 | |||
151 | The relay interface kernel API | ||
152 | ============================== | ||
153 | |||
154 | Here's a summary of the API the relay interface provides to in-kernel clients: | ||
155 | |||
156 | |||
157 | channel management functions: | ||
158 | |||
159 | relay_open(base_filename, parent, subbuf_size, n_subbufs, | ||
160 | callbacks) | ||
161 | relay_close(chan) | ||
162 | relay_flush(chan) | ||
163 | relay_reset(chan) | ||
164 | |||
165 | channel management typically called on instigation of userspace: | ||
166 | |||
167 | relay_subbufs_consumed(chan, cpu, subbufs_consumed) | ||
168 | |||
169 | write functions: | ||
170 | |||
171 | relay_write(chan, data, length) | ||
172 | __relay_write(chan, data, length) | ||
173 | relay_reserve(chan, length) | ||
174 | |||
175 | callbacks: | ||
176 | |||
177 | subbuf_start(buf, subbuf, prev_subbuf, prev_padding) | ||
178 | buf_mapped(buf, filp) | ||
179 | buf_unmapped(buf, filp) | ||
180 | create_buf_file(filename, parent, mode, buf, is_global) | ||
181 | remove_buf_file(dentry) | ||
182 | |||
183 | helper functions: | ||
184 | |||
185 | relay_buf_full(buf) | ||
186 | subbuf_start_reserve(buf, length) | ||
187 | |||
188 | |||
189 | Creating a channel | ||
190 | ------------------ | ||
191 | |||
192 | relay_open() is used to create a channel, along with its per-cpu | ||
193 | channel buffers. Each channel buffer will have an associated file | ||
194 | created for it in the host filesystem, which can be mmapped or | |
195 | read from in user space. The files are named basename0...basenameN-1 | ||
196 | where N is the number of online cpus, and by default will be created | ||
197 | in the root of the filesystem (if the parent param is NULL). If you | ||
198 | want a directory structure to contain your relay files, you should | ||
199 | create it using the host filesystem's directory creation function, | ||
200 | e.g. debugfs_create_dir(), and pass the parent directory to | ||
201 | relay_open(). Users are responsible for cleaning up any directory | ||
202 | structure they create, when the channel is closed - again the host | ||
203 | filesystem's directory removal functions should be used for that, | ||
204 | e.g. debugfs_remove(). | ||
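On the user space side, the basename0...basenameN-1 naming scheme means a reader can build the per-cpu file names itself. The helper below is a small sketch of that; example_relay_path() is a hypothetical name, and the directory argument is whatever mount point (plus any parent directory) the kernel client and administrator chose.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Build "<dir>/<base><cpu>", e.g. "/debug/cpu0".
 * Returns 0 on success, -1 if the result does not fit in out.
 */
int example_relay_path(char *out, size_t out_len, const char *dir,
		       const char *base, unsigned int cpu)
{
	int n = snprintf(out, out_len, "%s/%s%u", dir, base, cpu);

	return (n < 0 || (size_t)n >= out_len) ? -1 : 0;
}
```

A reader would typically call this once per online cpu and open each resulting path.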
205 | |||
206 | In order for a channel to be created and the host filesystem's files | ||
207 | associated with its channel buffers, the user must provide definitions | ||
208 | for two callback functions, create_buf_file() and remove_buf_file(). | ||
209 | create_buf_file() is called once for each per-cpu buffer from | ||
210 | relay_open() and allows the user to create the file which will be used | ||
211 | to represent the corresponding channel buffer. The callback should | ||
212 | return the dentry of the file created to represent the channel buffer. | ||
213 | remove_buf_file() must also be defined; it's responsible for deleting | ||
214 | the file(s) created in create_buf_file() and is called during | ||
215 | relay_close(). | ||
216 | |||
217 | Here are some typical definitions for these callbacks, in this case | ||
218 | using debugfs: | ||
219 | |||
220 | /* | ||
221 | * create_buf_file() callback. Creates relay file in debugfs. | ||
222 | */ | ||
223 | static struct dentry *create_buf_file_handler(const char *filename, | ||
224 | struct dentry *parent, | ||
225 | int mode, | ||
226 | struct rchan_buf *buf, | ||
227 | int *is_global) | ||
228 | { | ||
229 | return debugfs_create_file(filename, mode, parent, buf, | ||
230 | &relay_file_operations); | ||
231 | } | ||
232 | |||
233 | /* | ||
234 | * remove_buf_file() callback. Removes relay file from debugfs. | ||
235 | */ | ||
236 | static int remove_buf_file_handler(struct dentry *dentry) | ||
237 | { | ||
238 | debugfs_remove(dentry); | ||
239 | |||
240 | return 0; | ||
241 | } | ||
242 | |||
243 | /* | ||
244 | * relay interface callbacks | ||
245 | */ | ||
246 | static struct rchan_callbacks relay_callbacks = | ||
247 | { | ||
248 | .create_buf_file = create_buf_file_handler, | ||
249 | .remove_buf_file = remove_buf_file_handler, | ||
250 | }; | ||
251 | |||
252 | And an example relay_open() invocation using them: | ||
253 | |||
254 | chan = relay_open("cpu", NULL, SUBBUF_SIZE, N_SUBBUFS, &relay_callbacks); | ||
255 | |||
256 | If the create_buf_file() callback fails, or isn't defined, channel | ||
257 | creation and thus relay_open() will fail. | ||
258 | |||
259 | The total size of each per-cpu buffer is calculated by multiplying the | ||
260 | number of sub-buffers by the sub-buffer size passed into relay_open(). | ||
261 | The idea behind sub-buffers is that they're basically an extension of | ||
262 | double-buffering to N buffers, and they also allow applications to | ||
263 | easily implement random-access-on-buffer-boundary schemes, which can | ||
264 | be important for some high-volume applications. The number and size | ||
265 | of sub-buffers is completely dependent on the application and even for | ||
266 | the same application, different conditions will warrant different | ||
267 | values for these parameters at different times. Typically, the right | ||
268 | values to use are best decided after some experimentation; in general, | ||
269 | though, it's safe to assume that having only 1 sub-buffer is a bad | ||
270 | idea - you're guaranteed to either overwrite data or lose events | ||
271 | depending on the channel mode being used. | ||
272 | |||
273 | The create_buf_file() implementation can also be defined in such a way | ||
274 | as to allow the creation of a single 'global' buffer instead of the | ||
275 | default per-cpu set. This can be useful for applications interested | ||
276 | mainly in seeing the relative ordering of system-wide events without | ||
277 | the need to bother with saving explicit timestamps for the purpose of | ||
278 | merging/sorting per-cpu files in a postprocessing step. | ||
279 | |||
280 | To have relay_open() create a global buffer, the create_buf_file() | ||
281 | implementation should set the value of the is_global outparam to a | ||
282 | non-zero value in addition to creating the file that will be used to | ||
283 | represent the single buffer. In the case of a global buffer, | ||
284 | create_buf_file() and remove_buf_file() will be called only once. The | ||
285 | normal channel-writing functions, e.g. relay_write(), can still be | ||
286 | used - writes from any cpu will transparently end up in the global | ||
287 | buffer - but since it is a global buffer, callers should make sure | ||
288 | they use the proper locking for such a buffer, either by wrapping | ||
289 | writes in a spinlock, or by copying a write function from relay.h and | ||
290 | creating a local version that internally does the proper locking. | ||
291 | |||
292 | Channel 'modes' | ||
293 | --------------- | ||
294 | |||
295 | relay channels can be used in either of two modes - 'overwrite' or | ||
296 | 'no-overwrite'. The mode is entirely determined by the implementation | ||
297 | of the subbuf_start() callback, as described below. The default if no | ||
298 | subbuf_start() callback is defined is 'no-overwrite' mode. If the | ||
299 | default mode suits your needs, and you plan to use the read() | ||
300 | interface to retrieve channel data, you can ignore the details of this | ||
301 | section, as it pertains mainly to mmap() implementations. | ||
302 | |||
303 | In 'overwrite' mode, also known as 'flight recorder' mode, writes | ||
304 | continuously cycle around the buffer and will never fail, but will | ||
305 | unconditionally overwrite old data regardless of whether it's actually | ||
306 | been consumed. In no-overwrite mode, writes will fail, i.e. data will | ||
307 | be lost, if the number of unconsumed sub-buffers equals the total | ||
308 | number of sub-buffers in the channel. It should be clear that if | ||
309 | there is no consumer or if the consumer can't consume sub-buffers fast | ||
310 | enough, data will be lost in either case; the only difference is | ||
311 | whether data is lost from the beginning or the end of a buffer. | ||
312 | |||
313 | As explained above, a relay channel is made up of one or more | |
314 | per-cpu channel buffers, each implemented as a circular buffer | ||
315 | subdivided into one or more sub-buffers. Messages are written into | ||
316 | the current sub-buffer of the channel's current per-cpu buffer via the | ||
317 | write functions described below. Whenever a message can't fit into | ||
318 | the current sub-buffer, because there's no room left for it, the | ||
319 | client is notified via the subbuf_start() callback that a switch to a | ||
320 | new sub-buffer is about to occur. The client uses this callback to 1) | ||
321 | initialize the next sub-buffer if appropriate 2) finalize the previous | ||
322 | sub-buffer if appropriate and 3) return a boolean value indicating | ||
323 | whether or not to actually move on to the next sub-buffer. | ||
324 | |||
325 | To implement 'no-overwrite' mode, the kernel client would provide | |
326 | an implementation of the subbuf_start() callback something like the | ||
327 | following: | ||
328 | |||
329 | static int subbuf_start(struct rchan_buf *buf, | ||
330 | void *subbuf, | ||
331 | void *prev_subbuf, | ||
332 | unsigned int prev_padding) | ||
333 | { | ||
334 | if (prev_subbuf) | ||
335 | *((unsigned *)prev_subbuf) = prev_padding; | ||
336 | |||
337 | if (relay_buf_full(buf)) | ||
338 | return 0; | ||
339 | |||
340 | subbuf_start_reserve(buf, sizeof(unsigned int)); | ||
341 | |||
342 | return 1; | ||
343 | } | ||
344 | |||
345 | If the current buffer is full, i.e. all sub-buffers remain unconsumed, | ||
346 | the callback returns 0 to indicate that the buffer switch should not | ||
347 | occur yet, i.e. until the consumer has had a chance to read the | ||
348 | current set of ready sub-buffers. For the relay_buf_full() function | ||
349 | to make sense, the consumer is responsible for notifying the relay | |
350 | interface when sub-buffers have been consumed via | ||
351 | relay_subbufs_consumed(). Any subsequent attempts to write into the | ||
352 | buffer will again invoke the subbuf_start() callback with the same | ||
353 | parameters; only when the consumer has consumed one or more of the | ||
354 | ready sub-buffers will relay_buf_full() return 0, in which case the | ||
355 | buffer switch can continue. | ||
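The produced/consumed accounting described above can be illustrated with a small user space model. This is not the kernel implementation: struct model_buf and the model_*() functions are hypothetical, and the model only assumes that a buffer counts as full when the number of completed but unconsumed sub-buffers equals the total number of sub-buffers.

```c
#include <stddef.h>

/* Simplified stand-in for one per-cpu channel buffer. */
struct model_buf {
	size_t n_subbufs;	/* total sub-buffers in the buffer */
	size_t produced;	/* sub-buffers completed by the writer */
	size_t consumed;	/* sub-buffers released by the consumer */
	int stalled;		/* current sub-buffer already counted */
};

/* Models relay_buf_full(): every sub-buffer is still unconsumed. */
int model_buf_full(const struct model_buf *b)
{
	return b->produced - b->consumed >= b->n_subbufs;
}

/*
 * Models a sub-buffer switch in no-overwrite mode.
 * Returns 1 if the writer may move to the next sub-buffer,
 * 0 if the switch is refused because the buffer is full.
 */
int model_switch(struct model_buf *b)
{
	if (!b->stalled)
		b->produced++;	/* the current sub-buffer is now complete */

	if (model_buf_full(b)) {
		b->stalled = 1;	/* retries must not count it again */
		return 0;
	}

	b->stalled = 0;
	return 1;
}

/* Models relay_subbufs_consumed(): the consumer releases n sub-buffers. */
void model_subbufs_consumed(struct model_buf *b, size_t n)
{
	size_t ready = b->produced - b->consumed;

	b->consumed += n > ready ? ready : n;
}
```

With two sub-buffers, the first switch succeeds, the second is refused, and it keeps being refused until model_subbufs_consumed() releases at least one sub-buffer.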
356 | |||
357 | The implementation of the subbuf_start() callback for 'overwrite' mode | ||
358 | would be very similar: | ||
359 | |||
360 | static int subbuf_start(struct rchan_buf *buf, | ||
361 | void *subbuf, | ||
362 | void *prev_subbuf, | ||
363 | unsigned int prev_padding) | ||
364 | { | ||
365 | if (prev_subbuf) | ||
366 | *((unsigned *)prev_subbuf) = prev_padding; | ||
367 | |||
368 | subbuf_start_reserve(buf, sizeof(unsigned int)); | ||
369 | |||
370 | return 1; | ||
371 | } | ||
372 | |||
373 | In this case, the relay_buf_full() check is meaningless and the | ||
374 | callback always returns 1, causing the buffer switch to occur | ||
375 | unconditionally. It's also meaningless for the client to use the | ||
376 | relay_subbufs_consumed() function in this mode, as it's never | ||
377 | consulted. | ||
378 | |||
379 | The default subbuf_start() implementation, used if the client doesn't | ||
380 | define any callbacks, or doesn't define the subbuf_start() callback, | ||
381 | implements the simplest possible 'no-overwrite' mode, i.e. it does | ||
382 | nothing but return 0. | ||
383 | |||
384 | Header information can be reserved at the beginning of each sub-buffer | ||
385 | by calling the subbuf_start_reserve() helper function from within the | ||
386 | subbuf_start() callback. This reserved area can be used to store | ||
387 | whatever information the client wants. In the example above, room is | ||
388 | reserved in each sub-buffer to store the padding count for that | ||
389 | sub-buffer. This is filled in for the previous sub-buffer in the | ||
390 | subbuf_start() implementation; the padding value for the previous | ||
391 | sub-buffer is passed into the subbuf_start() callback along with a | ||
392 | pointer to the previous sub-buffer, since the padding value isn't | ||
393 | known until a sub-buffer is filled. The subbuf_start() callback is | ||
394 | also called for the first sub-buffer when the channel is opened, to | ||
395 | give the client a chance to reserve space in it. In this case the | ||
396 | previous sub-buffer pointer passed into the callback will be NULL, so | ||
397 | the client should check the value of the prev_subbuf pointer before | ||
398 | writing into the previous sub-buffer. | ||
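A user space consumer that mmap()s the buffer could use such a header to find the valid data in a completed sub-buffer. The sketch below assumes the layout of the example callbacks above, i.e. an unsigned int padding count at the very start of each sub-buffer, and it assumes that the padding count covers only unused bytes at the end of the sub-buffer. example_subbuf_payload() is a hypothetical helper, not part of the relay interface.

```c
#include <stddef.h>
#include <string.h>

/*
 * Locate the payload of one completed sub-buffer.
 *
 * On success, returns a pointer to the first payload byte and stores
 * the payload length in *len. Returns NULL if the stored padding count
 * cannot be valid for a sub-buffer of this size.
 */
const void *example_subbuf_payload(const void *subbuf, size_t subbuf_size,
				   size_t *len)
{
	unsigned int padding;

	if (subbuf_size < sizeof(padding))
		return NULL;

	/* memcpy() avoids alignment assumptions about the mapping */
	memcpy(&padding, subbuf, sizeof(padding));

	if (padding > subbuf_size - sizeof(padding))
		return NULL;

	*len = subbuf_size - sizeof(padding) - padding;
	return (const char *)subbuf + sizeof(padding);
}
```

The header is only filled in once the writer has moved on to the next sub-buffer, so this should be applied to sub-buffers that are known to be complete.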
399 | |||
400 | Writing to a channel | ||
401 | -------------------- | ||
402 | |||
403 | Kernel clients write data into the current cpu's channel buffer using | ||
404 | relay_write() or __relay_write(). relay_write() is the main logging | ||
405 | function - it uses local_irq_save() to protect the buffer and should be | |
406 | used if you might be logging from interrupt context. If you know | ||
407 | you'll never be logging from interrupt context, you can use | ||
408 | __relay_write(), which only disables preemption. These functions | ||
409 | don't return a value, so you can't determine whether or not they | ||
410 | failed - the assumption is that you wouldn't want to check a return | ||
411 | value in the fast logging path anyway, and that they'll always succeed | ||
412 | unless the buffer is full and no-overwrite mode is being used, in | ||
413 | which case you can detect a failed write in the subbuf_start() | ||
414 | callback by calling the relay_buf_full() helper function. | ||
415 | |||
416 | relay_reserve() is used to reserve a slot in a channel buffer which | ||
417 | can be written to later. This would typically be used in applications | ||
418 | that need to write directly into a channel buffer without having to | ||
419 | stage data in a temporary buffer beforehand. Because the actual write | ||
420 | may not happen immediately after the slot is reserved, applications | ||
421 | using relay_reserve() can keep a count of the number of bytes actually | ||
422 | written, either in space reserved in the sub-buffers themselves or as | ||
423 | a separate array. See the 'reserve' example in the relay-apps tarball | ||
424 | at http://relayfs.sourceforge.net for an example of how this can be | ||
425 | done. Because the write is under control of the client and is | ||
426 | separated from the reserve, relay_reserve() doesn't protect the buffer | ||
427 | at all - it's up to the client to provide the appropriate | ||
428 | synchronization when using relay_reserve(). | ||
429 | |||
430 | Closing a channel | ||
431 | ----------------- | ||
432 | |||
433 | The client calls relay_close() when it's finished using the channel. | ||
434 | The channel and its associated buffers are destroyed when there are no | ||
435 | longer any references to any of the channel buffers. relay_flush() | ||
436 | forces a sub-buffer switch on all the channel buffers, and can be used | ||
437 | to finalize and process the last sub-buffers before the channel is | ||
438 | closed. | ||
439 | |||
440 | Misc | ||
441 | ---- | ||
442 | |||
443 | Some applications may want to keep a channel around and re-use it | ||
444 | rather than open and close a new channel for each use. relay_reset() | ||
445 | can be used for this purpose - it resets a channel to its initial | ||
446 | state without reallocating channel buffer memory or destroying | ||
447 | existing mappings. It should however only be called when it's safe to | ||
448 | do so, i.e. when the channel isn't currently being written to. | ||
449 | |||
450 | Finally, there are a couple of utility callbacks that can be used for | ||
451 | different purposes. buf_mapped() is called whenever a channel buffer | ||
452 | is mmapped from user space and buf_unmapped() is called when it's | ||
453 | unmapped. The client can use this notification to trigger actions | ||
454 | within the kernel application, such as enabling/disabling logging to | ||
455 | the channel. | ||
456 | |||
457 | |||
458 | Resources | ||
459 | ========= | ||
460 | |||
461 | For news, example code, mailing list, etc. see the relay interface homepage: | ||
462 | |||
463 | http://relayfs.sourceforge.net | ||
464 | |||
465 | |||
466 | Credits | ||
467 | ======= | ||
468 | |||
469 | The ideas and specs for the relay interface came about as a result of | ||
470 | discussions on tracing involving the following: | ||
471 | |||
472 | Michel Dagenais <michel.dagenais@polymtl.ca> | ||
473 | Richard Moore <richardj_moore@uk.ibm.com> | ||
474 | Bob Wisniewski <bob@watson.ibm.com> | ||
475 | Karim Yaghmour <karim@opersys.com> | ||
476 | Tom Zanussi <zanussi@us.ibm.com> | ||
477 | |||
478 | Also thanks to Hubertus Franke for a lot of useful suggestions and bug | ||
479 | reports. | ||
diff --git a/Documentation/filesystems/relayfs.txt b/Documentation/filesystems/relayfs.txt deleted file mode 100644 index 5832377b7340..000000000000 --- a/Documentation/filesystems/relayfs.txt +++ /dev/null | |||
@@ -1,442 +0,0 @@ | |||
1 | |||
2 | relayfs - a high-speed data relay filesystem | ||
3 | ============================================ | ||
4 | |||
5 | relayfs is a filesystem designed to provide an efficient mechanism for | ||
6 | tools and facilities to relay large and potentially sustained streams | ||
7 | of data from kernel space to user space. | ||
8 | |||
9 | The main abstraction of relayfs is the 'channel'. A channel consists | ||
10 | of a set of per-cpu kernel buffers each represented by a file in the | ||
11 | relayfs filesystem. Kernel clients write into a channel using | ||
12 | efficient write functions which automatically log to the current cpu's | ||
13 | channel buffer. User space applications mmap() the per-cpu files and | ||
14 | retrieve the data as it becomes available. | ||
15 | |||
16 | The format of the data logged into the channel buffers is completely | ||
17 | up to the relayfs client; relayfs does however provide hooks which | ||
18 | allow clients to impose some structure on the buffer data. Nor does | ||
19 | relayfs implement any form of data filtering - this also is left to | ||
20 | the client. The purpose is to keep relayfs as simple as possible. | ||
21 | |||
22 | This document provides an overview of the relayfs API. The details of | ||
23 | the function parameters are documented along with the functions in the | ||
24 | filesystem code - please see that for details. | ||
25 | |||
26 | Semantics | ||
27 | ========= | ||
28 | |||
29 | Each relayfs channel has one buffer per CPU, each buffer has one or | ||
30 | more sub-buffers. Messages are written to the first sub-buffer until | ||
31 | it is too full to contain a new message, in which case it is | |
32 | written to the next (if available). Messages are never split across | ||
33 | sub-buffers. At this point, userspace can be notified so it empties | ||
34 | the first sub-buffer, while the kernel continues writing to the next. | ||
35 | |||
36 | When notified that a sub-buffer is full, the kernel knows how many | ||
37 | bytes of it are padding i.e. unused. Userspace can use this knowledge | ||
38 | to copy only valid data. | ||
39 | |||
40 | After copying it, userspace can notify the kernel that a sub-buffer | ||
41 | has been consumed. | ||
42 | |||
43 | relayfs can operate in a mode where it will overwrite data not yet | ||
44 | collected by userspace, and not wait for it to consume it. | ||
45 | |||
46 | relayfs itself does not provide for communication of such data between | ||
47 | userspace and kernel, allowing the kernel side to remain simple and | ||
48 | not impose a single interface on userspace. It does provide a set of | ||
49 | examples and a separate helper though, described below. | ||
50 | |||
51 | klog and relay-apps example code | ||
52 | ================================ | ||
53 | |||
54 | relayfs itself is ready to use, but to make things easier, a couple | ||
55 | simple utility functions and a set of examples are provided. | ||
56 | |||
57 | The relay-apps example tarball, available on the relayfs sourceforge | ||
58 | site, contains a set of self-contained examples, each consisting of a | ||
59 | pair of .c files containing boilerplate code for each of the user and | ||
60 | kernel sides of a relayfs application; combined these two sets of | ||
61 | boilerplate code provide glue to easily stream data to disk, without | ||
62 | having to bother with mundane housekeeping chores. | ||
63 | |||
64 | The 'klog debugging functions' patch (klog.patch in the relay-apps | ||
65 | tarball) provides a couple of high-level logging functions to the | ||
66 | kernel which allow writing formatted text or raw data to a channel, | ||
67 | regardless of whether a channel to write into exists or not, or | ||
68 | whether relayfs is compiled into the kernel or is configured as a | ||
69 | module. These functions allow you to put unconditional 'trace' | ||
70 | statements anywhere in the kernel or kernel modules; only when there | ||
71 | is a 'klog handler' registered will data actually be logged (see the | ||
72 | klog and kleak examples for details). | ||
73 | |||
74 | It is of course possible to use relayfs from scratch i.e. without | ||
75 | using any of the relay-apps example code or klog, but you'll have to | ||
76 | implement communication between userspace and kernel, allowing both to | ||
77 | convey the state of buffers (full, empty, amount of padding). | ||
78 | |||
79 | klog and the relay-apps examples can be found in the relay-apps | ||
80 | tarball on http://relayfs.sourceforge.net | ||
81 | |||
82 | |||
83 | The relayfs user space API | ||
84 | ========================== | ||
85 | |||
86 | relayfs implements basic file operations for user space access to | ||
87 | relayfs channel buffer data. Here are the file operations that are | ||
88 | available and some comments regarding their behavior: | ||
89 | |||
90 | open() enables user to open an _existing_ buffer. | ||
91 | |||
92 | mmap() results in channel buffer being mapped into the caller's | ||
93 | memory space. Note that you can't do a partial mmap - you must | ||
94 | map the entire file, which is NRBUF * SUBBUFSIZE. | ||
95 | |||
96 | read() read the contents of a channel buffer. The bytes read are | ||
97 | 'consumed' by the reader i.e. they won't be available again | ||
98 | to subsequent reads. If the channel is being used in | ||
99 | no-overwrite mode (the default), it can be read at any time | ||
100 | even if there's an active kernel writer. If the channel is | ||
101 | being used in overwrite mode and there are active channel | ||
102 | writers, results may be unpredictable - users should make | ||
103 | sure that all logging to the channel has ended before using | ||
104 | read() with overwrite mode. | ||
105 | |||
106 | poll() POLLIN/POLLRDNORM/POLLERR supported. User applications are | ||
107 | notified when sub-buffer boundaries are crossed. | ||
108 | |||
109 | close() decrements the channel buffer's refcount. When the refcount | ||
110 | reaches 0 i.e. when no process or kernel client has the buffer | ||
111 | open, the channel buffer is freed. | ||
112 | |||
113 | |||
114 | In order for a user application to make use of relayfs files, the | ||
115 | relayfs filesystem must be mounted. For example, | ||
116 | |||
117 | mount -t relayfs relayfs /mnt/relay | ||
118 | |||
119 | NOTE: relayfs doesn't need to be mounted for kernel clients to create | ||
120 | or use channels - it only needs to be mounted when user space | ||
121 | applications need access to the buffer data. | ||
122 | |||
123 | |||
124 | The relayfs kernel API | ||
125 | ====================== | ||
126 | |||
127 | Here's a summary of the API relayfs provides to in-kernel clients: | ||
128 | |||
129 | |||
130 | channel management functions: | ||
131 | |||
132 | relay_open(base_filename, parent, subbuf_size, n_subbufs, | ||
133 | callbacks) | ||
134 | relay_close(chan) | ||
135 | relay_flush(chan) | ||
136 | relay_reset(chan) | ||
137 | relayfs_create_dir(name, parent) | ||
138 | relayfs_remove_dir(dentry) | ||
139 | relayfs_create_file(name, parent, mode, fops, data) | ||
140 | relayfs_remove_file(dentry) | ||
141 | |||
142 | channel management typically called on instigation of userspace: | ||
143 | |||
144 | relay_subbufs_consumed(chan, cpu, subbufs_consumed) | ||
145 | |||
146 | write functions: | ||
147 | |||
148 | relay_write(chan, data, length) | ||
149 | __relay_write(chan, data, length) | ||
150 | relay_reserve(chan, length) | ||
151 | |||
152 | callbacks: | ||
153 | |||
154 | subbuf_start(buf, subbuf, prev_subbuf, prev_padding) | ||
155 | buf_mapped(buf, filp) | ||
156 | buf_unmapped(buf, filp) | ||
157 | create_buf_file(filename, parent, mode, buf, is_global) | ||
158 | remove_buf_file(dentry) | ||
159 | |||
160 | helper functions: | ||
161 | |||
162 | relay_buf_full(buf) | ||
163 | subbuf_start_reserve(buf, length) | ||
164 | |||
165 | |||
166 | Creating a channel | ||
167 | ------------------ | ||
168 | |||
169 | relay_open() is used to create a channel, along with its per-cpu | ||
170 | channel buffers. Each channel buffer will have an associated file | ||
171 | created for it in the relayfs filesystem, which can be opened and | ||
172 | mmapped from user space if desired. The files are named | ||
173 | basename0...basenameN-1 where N is the number of online cpus, and by | ||
174 | default will be created in the root of the filesystem. If you want a | ||
175 | directory structure to contain your relayfs files, you can create it | ||
176 | with relayfs_create_dir() and pass the parent directory to | ||
177 | relay_open(). Clients are responsible for cleaning up any directory | ||
178 | structure they create when the channel is closed - use | ||
179 | relayfs_remove_dir() for that. | ||
180 | |||
181 | The total size of each per-cpu buffer is calculated by multiplying the | ||
182 | number of sub-buffers by the sub-buffer size passed into relay_open(). | ||
183 | The idea behind sub-buffers is that they're basically an extension of | ||
184 | double-buffering to N buffers, and they also allow applications to | ||
185 | easily implement random-access-on-buffer-boundary schemes, which can | ||
186 | be important for some high-volume applications. The number and size | ||
187 | of sub-buffers is completely dependent on the application and even for | ||
188 | the same application, different conditions will warrant different | ||
189 | values for these parameters at different times. Typically, the right | ||
190 | values to use are best decided after some experimentation; in general, | ||
191 | though, it's safe to assume that having only 1 sub-buffer is a bad | ||
192 | idea - you're guaranteed to either overwrite data or lose events | ||
193 | depending on the channel mode being used. | ||
194 | |||
195 | Channel 'modes' | ||
196 | --------------- | ||
197 | |||
198 | relayfs channels can be used in either of two modes - 'overwrite' or | ||
199 | 'no-overwrite'. The mode is entirely determined by the implementation | ||
200 | of the subbuf_start() callback, as described below. In 'overwrite' | ||
201 | mode, also known as 'flight recorder' mode, writes continuously cycle | ||
202 | around the buffer and will never fail, but will unconditionally | ||
203 | overwrite old data regardless of whether it's actually been consumed. | ||
204 | In no-overwrite mode, writes will fail i.e. data will be lost, if the | ||
205 | number of unconsumed sub-buffers equals the total number of | ||
206 | sub-buffers in the channel. It should be clear that if there is no | ||
207 | consumer or if the consumer can't consume sub-buffers fast enough, | |
208 | data will be lost in either case; the only difference is whether data | ||
209 | is lost from the beginning or the end of a buffer. | ||
210 | |||
211 | As explained above, a relayfs channel is made up of one or more | |
212 | per-cpu channel buffers, each implemented as a circular buffer | ||
213 | subdivided into one or more sub-buffers. Messages are written into | ||
214 | the current sub-buffer of the channel's current per-cpu buffer via the | ||
215 | write functions described below. Whenever a message can't fit into | ||
216 | the current sub-buffer, because there's no room left for it, the | ||
217 | client is notified via the subbuf_start() callback that a switch to a | ||
218 | new sub-buffer is about to occur. The client uses this callback to 1) | ||
219 | initialize the next sub-buffer if appropriate 2) finalize the previous | ||
220 | sub-buffer if appropriate and 3) return a boolean value indicating | ||
221 | whether or not to actually go ahead with the sub-buffer switch. | ||
222 | |||
223 | To implement 'no-overwrite' mode, the client would provide | ||
224 | an implementation of the subbuf_start() callback something like the | ||
225 | following: | ||
226 | |||
227 | static int subbuf_start(struct rchan_buf *buf, | ||
228 | void *subbuf, | ||
229 | void *prev_subbuf, | ||
230 | unsigned int prev_padding) | ||
231 | { | ||
232 | if (prev_subbuf) | ||
233 | *((unsigned *)prev_subbuf) = prev_padding; | ||
234 | |||
235 | if (relay_buf_full(buf)) | ||
236 | return 0; | ||
237 | |||
238 | subbuf_start_reserve(buf, sizeof(unsigned int)); | ||
239 | |||
240 | return 1; | ||
241 | } | ||
242 | |||
243 | If the current buffer is full i.e. all sub-buffers remain unconsumed, | ||
244 | the callback returns 0 to indicate that the buffer switch should not | ||
245 | occur yet i.e. until the consumer has had a chance to read the current | ||
246 | set of ready sub-buffers. For the relay_buf_full() function to make | ||
247 | sense, the consumer is responsible for notifying relayfs when | ||
248 | sub-buffers have been consumed via relay_subbufs_consumed(). Any | ||
249 | subsequent attempts to write into the buffer will again invoke the | ||
250 | subbuf_start() callback with the same parameters; only when the | ||
251 | consumer has consumed one or more of the ready sub-buffers will | ||
252 | relay_buf_full() return 0, in which case the buffer switch can | ||
253 | continue. | ||
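The interplay between relay_buf_full() and relay_subbufs_consumed() can be pictured with a small userspace model. This is only a sketch of the accounting described above: the struct, its fields and the model_* function names are hypothetical and are not the kernel's data structures or API.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified userspace model of one channel buffer's sub-buffer
 * accounting. All names here are hypothetical; they mirror the behaviour
 * described in the text, not the kernel implementation. */
struct model_buf {
	size_t n_subbufs;        /* total sub-buffers in the buffer */
	size_t subbufs_produced; /* sub-buffers filled by the writer */
	size_t subbufs_consumed; /* sub-buffers the reader reported consumed */
};

/* Full when every sub-buffer is still waiting to be consumed. */
static int model_buf_full(const struct model_buf *buf)
{
	assert(buf->n_subbufs > 0);
	return buf->subbufs_produced - buf->subbufs_consumed >= buf->n_subbufs;
}

/* 'no-overwrite' policy: refuse the switch while the buffer is full. */
static int model_subbuf_switch(struct model_buf *buf)
{
	if (model_buf_full(buf))
		return 0;
	buf->subbufs_produced++;
	return 1;
}

/* Reader side: plays the role of calling relay_subbufs_consumed(). */
static void model_subbufs_consumed(struct model_buf *buf, size_t n)
{
	assert(buf->subbufs_consumed + n <= buf->subbufs_produced);
	buf->subbufs_consumed += n;
}
```

In this model, with two sub-buffers, a third switch attempt keeps failing until the reader reports at least one sub-buffer as consumed, which matches the retry behaviour described for the callback.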
254 | |||
255 | The implementation of the subbuf_start() callback for 'overwrite' mode | ||
256 | would be very similar: | ||
257 | |||
258 | static int subbuf_start(struct rchan_buf *buf, | ||
259 | void *subbuf, | ||
260 | void *prev_subbuf, | ||
261 | unsigned int prev_padding) | ||
262 | { | ||
263 | if (prev_subbuf) | ||
264 | *((unsigned *)prev_subbuf) = prev_padding; | ||
265 | |||
266 | subbuf_start_reserve(buf, sizeof(unsigned int)); | ||
267 | |||
268 | return 1; | ||
269 | } | ||
270 | |||
271 | In this case, the relay_buf_full() check is meaningless and the | ||
272 | callback always returns 1, causing the buffer switch to occur | ||
273 | unconditionally. It's also meaningless for the client to use the | ||
274 | relay_subbufs_consumed() function in this mode, as it's never | ||
275 | consulted. | ||
276 | |||
277 | The default subbuf_start() implementation, used if the client doesn't | ||
278 | define any callbacks, or doesn't define the subbuf_start() callback, | ||
279 | implements the simplest possible 'no-overwrite' mode i.e. it does | ||
280 | nothing but return 0. | ||
281 | |||
282 | Header information can be reserved at the beginning of each sub-buffer | ||
283 | by calling the subbuf_start_reserve() helper function from within the | ||
284 | subbuf_start() callback. This reserved area can be used to store | ||
285 | whatever information the client wants. In the example above, room is | ||
286 | reserved in each sub-buffer to store the padding count for that | ||
287 | sub-buffer. This is filled in for the previous sub-buffer in the | ||
288 | subbuf_start() implementation; the padding value for the previous | ||
289 | sub-buffer is passed into the subbuf_start() callback along with a | ||
290 | pointer to the previous sub-buffer, since the padding value isn't | ||
291 | known until a sub-buffer is filled. The subbuf_start() callback is | ||
292 | also called for the first sub-buffer when the channel is opened, to | ||
293 | give the client a chance to reserve space in it. In this case the | ||
294 | previous sub-buffer pointer passed into the callback will be NULL, so | ||
295 | the client should check the value of the prev_subbuf pointer before | ||
296 | writing into the previous sub-buffer. | ||
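On the consuming side, a reader that knows about this layout can use the stored padding count to find the valid data in each sub-buffer. The following is a minimal userspace sketch, assuming the layout of the example above (an unsigned int padding count at the start of the sub-buffer, and padding counting the unused bytes at its end); the helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Length of the valid payload in one sub-buffer laid out as: an
 * unsigned int padding count, then the data, then 'padding' unused
 * bytes. Returns 0 if the header is inconsistent with the size. */
static size_t subbuf_payload_len(const void *subbuf, size_t subbuf_size)
{
	unsigned int padding;

	assert(subbuf_size >= sizeof(padding));
	/* memcpy avoids assuming the sub-buffer is suitably aligned */
	memcpy(&padding, subbuf, sizeof(padding));

	if (padding > subbuf_size - sizeof(padding))
		return 0;
	return subbuf_size - sizeof(padding) - padding;
}

/* Start of the payload, just past the reserved header. */
static const void *subbuf_payload(const void *subbuf)
{
	return (const unsigned char *)subbuf + sizeof(unsigned int);
}
```

A reader would call subbuf_payload_len() on each ready sub-buffer and process only that many bytes starting at subbuf_payload().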
297 | |||
298 | Writing to a channel | ||
299 | -------------------- | ||
300 | |||
301 | kernel clients write data into the current cpu's channel buffer using | ||
302 | relay_write() or __relay_write(). relay_write() is the main logging | ||
303 | function - it uses local_irqsave() to protect the buffer and should be | ||
304 | used if you might be logging from interrupt context. If you know | ||
305 | you'll never be logging from interrupt context, you can use | ||
306 | __relay_write(), which only disables preemption. These functions | ||
307 | don't return a value, so you can't determine whether or not they | ||
308 | failed - the assumption is that you wouldn't want to check a return | ||
309 | value in the fast logging path anyway, and that they'll always succeed | ||
310 | unless the buffer is full and no-overwrite mode is being used, in | ||
311 | which case you can detect a failed write in the subbuf_start() | ||
312 | callback by calling the relay_buf_full() helper function. | ||
313 | |||
314 | relay_reserve() is used to reserve a slot in a channel buffer which | ||
315 | can be written to later. This would typically be used in applications | ||
316 | that need to write directly into a channel buffer without having to | ||
317 | stage data in a temporary buffer beforehand. Because the actual write | ||
318 | may not happen immediately after the slot is reserved, applications | ||
319 | using relay_reserve() can keep a count of the number of bytes actually | ||
320 | written, either in space reserved in the sub-buffers themselves or as | ||
321 | a separate array. See the 'reserve' example in the relay-apps tarball | ||
322 | at http://relayfs.sourceforge.net for an example of how this can be | ||
323 | done. Because the write is under control of the client and is | ||
324 | separated from the reserve, relay_reserve() doesn't protect the buffer | ||
325 | at all - it's up to the client to provide the appropriate | ||
326 | synchronization when using relay_reserve(). | ||
327 | |||
328 | Closing a channel | ||
329 | ----------------- | ||
330 | |||
331 | The client calls relay_close() when it's finished using the channel. | ||
332 | The channel and its associated buffers are destroyed when there are no | ||
333 | longer any references to any of the channel buffers. relay_flush() | ||
334 | forces a sub-buffer switch on all the channel buffers, and can be used | ||
335 | to finalize and process the last sub-buffers before the channel is | ||
336 | closed. | ||
337 | |||
338 | Creating non-relay files | ||
339 | ------------------------ | ||
340 | |||
341 | relay_open() automatically creates files in the relayfs filesystem to | ||
342 | represent the per-cpu kernel buffers; it's often useful for | ||
343 | applications to be able to create their own files alongside the relay | ||
344 | files in the relayfs filesystem as well e.g. 'control' files much like | ||
345 | those created in /proc or debugfs for similar purposes, used to | ||
346 | communicate control information between the kernel and user sides of a | ||
347 | relayfs application. For this purpose the relayfs_create_file() and | ||
348 | relayfs_remove_file() API functions exist. For relayfs_create_file(), | ||
349 | the caller passes in a set of user-defined file operations to be used | ||
350 | for the file and an optional void * to a user-specified data item, | ||
351 | which will be accessible via inode->u.generic_ip (see the relay-apps | ||
352 | tarball for examples). The file_operations are a required parameter | ||
353 | to relayfs_create_file() and thus the semantics of these files are | ||
354 | completely defined by the caller. | ||
355 | |||
356 | See the relay-apps tarball at http://relayfs.sourceforge.net for | ||
357 | examples of how these non-relay files are meant to be used. | ||
358 | |||
359 | Creating relay files in other filesystems | ||
360 | ----------------------------------------- | ||
361 | |||
362 | By default of course, relay_open() creates relay files in the relayfs | ||
363 | filesystem. Because relay_file_operations is exported, however, it's | ||
364 | also possible to create and use relay files in other pseudo-filesystems | ||
365 | such as debugfs. | ||
366 | |||
367 | For this purpose, two callback functions are provided, | ||
368 | create_buf_file() and remove_buf_file(). create_buf_file() is called | ||
369 | once for each per-cpu buffer from relay_open() to allow the client to | ||
370 | create a file to be used to represent the corresponding buffer; if | ||
371 | this callback is not defined, the default implementation will create | ||
372 | and return a file in the relayfs filesystem to represent the buffer. | ||
373 | The callback should return the dentry of the file created to represent | ||
374 | the relay buffer. Note that the parent directory passed to | ||
375 | relay_open() (and passed along to the callback), if specified, must | ||
376 | exist in the same filesystem the new relay file is created in. If | ||
377 | create_buf_file() is defined, remove_buf_file() must also be defined; | ||
378 | it's responsible for deleting the file(s) created in create_buf_file() | ||
379 | and is called during relay_close(). | ||
380 | |||
381 | The create_buf_file() implementation can also be defined in such a way | ||
382 | as to allow the creation of a single 'global' buffer instead of the | ||
383 | default per-cpu set. This can be useful for applications interested | ||
384 | mainly in seeing the relative ordering of system-wide events without | ||
385 | the need to bother with saving explicit timestamps for the purpose of | ||
386 | merging/sorting per-cpu files in a postprocessing step. | ||
387 | |||
388 | To have relay_open() create a global buffer, the create_buf_file() | ||
389 | implementation should set the value of the is_global outparam to a | ||
390 | non-zero value in addition to creating the file that will be used to | ||
391 | represent the single buffer. In the case of a global buffer, | ||
392 | create_buf_file() and remove_buf_file() will be called only once. The | ||
393 | normal channel-writing functions e.g. relay_write() can still be used | ||
394 | - writes from any cpu will transparently end up in the global buffer - | ||
395 | but since it is a global buffer, callers should make sure they use the | ||
396 | proper locking for such a buffer, either by wrapping writes in a | ||
397 | spinlock, or by copying a write function from relayfs_fs.h and | ||
398 | creating a local version that internally does the proper locking. | ||
399 | |||
400 | See the 'exported-relayfile' examples in the relay-apps tarball for | ||
401 | examples of creating and using relay files in debugfs. | ||
402 | |||
403 | Misc | ||
404 | ---- | ||
405 | |||
406 | Some applications may want to keep a channel around and re-use it | ||
407 | rather than open and close a new channel for each use. relay_reset() | ||
408 | can be used for this purpose - it resets a channel to its initial | ||
409 | state without reallocating channel buffer memory or destroying | ||
410 | existing mappings. It should however only be called when it's safe to | ||
411 | do so i.e. when the channel isn't currently being written to. | ||
412 | |||
413 | Finally, there are a couple of utility callbacks that can be used for | ||
414 | different purposes. buf_mapped() is called whenever a channel buffer | ||
415 | is mmapped from user space and buf_unmapped() is called when it's | ||
416 | unmapped. The client can use this notification to trigger actions | ||
417 | within the kernel application, such as enabling/disabling logging to | ||
418 | the channel. | ||
419 | |||
420 | |||
421 | Resources | ||
422 | ========= | ||
423 | |||
424 | For news, example code, mailing list, etc. see the relayfs homepage: | ||
425 | |||
426 | http://relayfs.sourceforge.net | ||
427 | |||
428 | |||
429 | Credits | ||
430 | ======= | ||
431 | |||
432 | The ideas and specs for relayfs came about as a result of discussions | ||
433 | on tracing involving the following: | ||
434 | |||
435 | Michel Dagenais <michel.dagenais@polymtl.ca> | ||
436 | Richard Moore <richardj_moore@uk.ibm.com> | ||
437 | Bob Wisniewski <bob@watson.ibm.com> | ||
438 | Karim Yaghmour <karim@opersys.com> | ||
439 | Tom Zanussi <zanussi@us.ibm.com> | ||
440 | |||
441 | Also thanks to Hubertus Franke for a lot of useful suggestions and bug | ||
442 | reports. | ||
diff --git a/Documentation/filesystems/vfs.txt b/Documentation/filesystems/vfs.txt index 9d3aed628bc1..1cb7e8be927a 100644 --- a/Documentation/filesystems/vfs.txt +++ b/Documentation/filesystems/vfs.txt | |||
@@ -113,8 +113,8 @@ members are defined: | |||
113 | struct file_system_type { | 113 | struct file_system_type { |
114 | const char *name; | 114 | const char *name; |
115 | int fs_flags; | 115 | int fs_flags; |
116 | struct int (*get_sb) (struct file_system_type *, int, | 116 | int (*get_sb) (struct file_system_type *, int, |
117 | const char *, void *, struct vfsmount *); | 117 | const char *, void *, struct vfsmount *); |
118 | void (*kill_sb) (struct super_block *); | 118 | void (*kill_sb) (struct super_block *); |
119 | struct module *owner; | 119 | struct module *owner; |
120 | struct file_system_type * next; | 120 | struct file_system_type * next; |
diff --git a/Documentation/hwmon/abituguru b/Documentation/hwmon/abituguru index 69cdb527d58f..b2c0d61b39a2 100644 --- a/Documentation/hwmon/abituguru +++ b/Documentation/hwmon/abituguru | |||
@@ -2,13 +2,36 @@ Kernel driver abituguru | |||
2 | ======================= | 2 | ======================= |
3 | 3 | ||
4 | Supported chips: | 4 | Supported chips: |
5 | * Abit uGuru (Hardware Monitor part only) | 5 | * Abit uGuru revision 1-3 (Hardware Monitor part only) |
6 | Prefix: 'abituguru' | 6 | Prefix: 'abituguru' |
7 | Addresses scanned: ISA 0x0E0 | 7 | Addresses scanned: ISA 0x0E0 |
8 | Datasheet: Not available, this driver is based on reverse engineering. | 8 | Datasheet: Not available, this driver is based on reverse engineering. |
9 | A "Datasheet" has been written based on the reverse engineering; it | 9 | A "Datasheet" has been written based on the reverse engineering; it |
10 | should be available in the same dir as this file under the name | 10 | should be available in the same dir as this file under the name |
11 | abituguru-datasheet. | 11 | abituguru-datasheet. |
12 | Note: | ||
13 | The uGuru is a microcontroller with onboard firmware which programs | ||
14 | it to behave as a hwmon IC. There are many different revisions of the | ||
15 | firmware and thus effectively many different revisions of the uGuru. | ||
16 | Below is an incomplete list with which revisions are used for which | ||
17 | Motherboards: | ||
18 | uGuru 1.00 ~ 1.24 (AI7, KV8-MAX3, AN7) (1) | ||
19 | uGuru 2.0.0.0 ~ 2.0.4.2 (KV8-PRO) | ||
20 | uGuru 2.1.0.0 ~ 2.1.2.8 (AS8, AV8, AA8, AG8, AA8XE, AX8) | ||
21 | uGuru 2.2.0.0 ~ 2.2.0.6 (AA8 Fatal1ty) | ||
22 | uGuru 2.3.0.0 ~ 2.3.0.9 (AN8) | ||
23 | uGuru 3.0.0.0 ~ 3.0.1.2 (AW8, AL8, NI8) | ||
24 | uGuru 4.xxxxx? (AT8 32X) (2) | ||
25 | 1) For revisions 2 and 3 uGuru's the driver can autodetect the | ||
26 | sensortype (Volt or Temp) for bank1 sensors, for revision 1 uGuru's | ||
27 | this does not always work. For these uGuru's the autodetection can | ||
28 | be overridden with the bank1_types module param. For all 3 known | ||
29 | revision 1 motherboards the correct use of this param is: | ||
30 | bank1_types=1,1,0,0,0,0,0,2,0,0,0,0,2,0,0,1 | ||
31 | You may also need to specify the fan_sensors option for these boards: | ||
32 | fan_sensors=5 | ||
33 | 2) The current version of the abituguru driver is known to NOT work | ||
34 | on these Motherboards | ||
12 | 35 | ||
13 | Authors: | 36 | Authors: |
14 | Hans de Goede <j.w.r.degoede@hhs.nl>, | 37 | Hans de Goede <j.w.r.degoede@hhs.nl>, |
@@ -22,6 +45,11 @@ Module Parameters | |||
22 | * force: bool Force detection. Note this parameter only causes the | 45 | * force: bool Force detection. Note this parameter only causes the |
23 | detection to be skipped, if the uGuru can't be read | 46 | detection to be skipped, if the uGuru can't be read |
24 | the module initialization (insmod) will still fail. | 47 | the module initialization (insmod) will still fail. |
48 | * bank1_types: int[] Bank1 sensortype autodetection override: | ||
49 | -1 autodetect (default) | ||
50 | 0 volt sensor | ||
51 | 1 temp sensor | ||
52 | 2 not connected | ||
25 | * fan_sensors: int Tell the driver how many fan speed sensors there are | 53 | * fan_sensors: int Tell the driver how many fan speed sensors there are |
26 | on your motherboard. Default: 0 (autodetect). | 54 | on your motherboard. Default: 0 (autodetect). |
27 | * pwms: int Tell the driver how many fan speed controls (fan | 55 | * pwms: int Tell the driver how many fan speed controls (fan |
@@ -29,7 +57,7 @@ Module Parameters | |||
29 | * verbose: int How verbose should the driver be? (0-3): | 57 | * verbose: int How verbose should the driver be? (0-3): |
30 | 0 normal output | 58 | 0 normal output |
31 | 1 + verbose error reporting | 59 | 1 + verbose error reporting |
32 | 2 + sensors type probing info\n" | 60 | 2 + sensors type probing info (default) |
33 | 3 + retryable error reporting | 61 | 3 + retryable error reporting |
34 | Default: 2 (the driver is still in the testing phase) | 62 | Default: 2 (the driver is still in the testing phase) |
35 | 63 | ||
diff --git a/Documentation/hwmon/it87 b/Documentation/hwmon/it87 index 9555be1ed999..e783fd62e308 100644 --- a/Documentation/hwmon/it87 +++ b/Documentation/hwmon/it87 | |||
@@ -13,12 +13,25 @@ Supported chips: | |||
13 | from Super I/O config space (8 I/O ports) | 13 | from Super I/O config space (8 I/O ports) |
14 | Datasheet: Publicly available at the ITE website | 14 | Datasheet: Publicly available at the ITE website |
15 | http://www.ite.com.tw/ | 15 | http://www.ite.com.tw/ |
16 | * IT8716F | ||
17 | Prefix: 'it8716' | ||
18 | Addresses scanned: from Super I/O config space (8 I/O ports) | ||
19 | Datasheet: Publicly available at the ITE website | ||
20 | http://www.ite.com.tw/product_info/file/pc/IT8716F_V0.3.ZIP | ||
21 | * IT8718F | ||
22 | Prefix: 'it8718' | ||
23 | Addresses scanned: from Super I/O config space (8 I/O ports) | ||
24 | Datasheet: Publicly available at the ITE website | ||
25 | http://www.ite.com.tw/product_info/file/pc/IT8718F_V0.2.zip | ||
26 | http://www.ite.com.tw/product_info/file/pc/IT8718F_V0%203_(for%20C%20version).zip | ||
16 | * SiS950 [clone of IT8705F] | 27 | * SiS950 [clone of IT8705F] |
17 | Prefix: 'it87' | 28 | Prefix: 'it87' |
18 | Addresses scanned: from Super I/O config space (8 I/O ports) | 29 | Addresses scanned: from Super I/O config space (8 I/O ports) |
19 | Datasheet: No longer available | 30 | Datasheet: No longer available |
20 | 31 | ||
21 | Author: Christophe Gauthron <chrisg@0-in.com> | 32 | Authors: |
33 | Christophe Gauthron <chrisg@0-in.com> | ||
34 | Jean Delvare <khali@linux-fr.org> | ||
22 | 35 | ||
23 | 36 | ||
24 | Module Parameters | 37 | Module Parameters |
@@ -43,26 +56,46 @@ Module Parameters | |||
43 | Description | 56 | Description |
44 | ----------- | 57 | ----------- |
45 | 58 | ||
46 | This driver implements support for the IT8705F, IT8712F and SiS950 chips. | 59 | This driver implements support for the IT8705F, IT8712F, IT8716F, |
47 | 60 | IT8718F and SiS950 chips. | |
48 | This driver also supports IT8712F, which adds SMBus access, and a VID | ||
49 | input, used to report the Vcore voltage of the Pentium processor. | ||
50 | The IT8712F additionally features VID inputs. | ||
51 | 61 | ||
52 | These chips are 'Super I/O chips', supporting floppy disks, infrared ports, | 62 | These chips are 'Super I/O chips', supporting floppy disks, infrared ports, |
53 | joysticks and other miscellaneous stuff. For hardware monitoring, they | 63 | joysticks and other miscellaneous stuff. For hardware monitoring, they |
54 | include an 'environment controller' with 3 temperature sensors, 3 fan | 64 | include an 'environment controller' with 3 temperature sensors, 3 fan |
55 | rotation speed sensors, 8 voltage sensors, and associated alarms. | 65 | rotation speed sensors, 8 voltage sensors, and associated alarms. |
56 | 66 | ||
67 | The IT8712F and IT8716F additionally feature VID inputs, used to report | ||
68 | the Vcore voltage of the processor. The early IT8712F have 5 VID pins, | ||
69 | the IT8716F and late IT8712F have 6. They are shared with other functions | ||
70 | though, so the functionality may not be available on a given system. | ||
71 | The driver dumbly assumes it is there. | ||
72 | |||
73 | The IT8718F also features VID inputs (up to 8 pins) but the value is | ||
74 | stored in the Super-I/O configuration space. Due to technical limitations, | ||
75 | this value can currently only be read once at initialization time, so | ||
76 | the driver won't notice and report changes in the VID value. The two | ||
77 | upper VID bits share their pins with voltage inputs (in5 and in6) so you | ||
78 | can't have both on a given board. | ||
79 | |||
80 | The IT8716F, IT8718F and later IT8712F revisions have support for | ||
81 | 2 additional fans. They are not yet supported by the driver. | ||
82 | |||
83 | The IT8716F and IT8718F, and late IT8712F and IT8705F also have optional | ||
84 | 16-bit tachometer counters for fans 1 to 3. This is better (no more fan | ||
85 | clock divider mess) but not compatible with the older chips and | ||
86 | revisions. For now, the driver only uses the 16-bit mode on the | ||
87 | IT8716F and IT8718F. | ||
88 | |||
57 | Temperatures are measured in degrees Celsius. An alarm is triggered once | 89 | Temperatures are measured in degrees Celsius. An alarm is triggered once |
58 | when the Overtemperature Shutdown limit is crossed. | 90 | when the Overtemperature Shutdown limit is crossed. |
59 | 91 | ||
60 | Fan rotation speeds are reported in RPM (rotations per minute). An alarm is | 92 | Fan rotation speeds are reported in RPM (rotations per minute). An alarm is |
61 | triggered if the rotation speed has dropped below a programmable limit. Fan | 93 | triggered if the rotation speed has dropped below a programmable limit. When |
62 | readings can be divided by a programmable divider (1, 2, 4 or 8) to give the | 94 | 16-bit tachometer counters aren't used, fan readings can be divided by |
63 | readings more range or accuracy. Not all RPM values can accurately be | 95 | a programmable divider (1, 2, 4 or 8) to give the readings more range or |
64 | represented, so some rounding is done. With a divider of 2, the lowest | 96 | accuracy. With a divider of 2, the lowest representable value is around |
65 | representable value is around 2600 RPM. | 97 | 2600 RPM. Not all RPM values can accurately be represented, so some rounding |
98 | is done. | ||
66 | 99 | ||
67 | Voltage sensors (also known as IN sensors) report their values in volts. An | 100 | Voltage sensors (also known as IN sensors) report their values in volts. An |
68 | alarm is triggered if the voltage has crossed a programmable minimum or | 101 | alarm is triggered if the voltage has crossed a programmable minimum or |
@@ -71,9 +104,9 @@ zero'; this is important for negative voltage measurements. All voltage | |||
71 | inputs can measure voltages between 0 and 4.08 volts, with a resolution of | 104 | inputs can measure voltages between 0 and 4.08 volts, with a resolution of |
72 | 0.016 volt. The battery voltage in8 does not have limit registers. | 105 | 0.016 volt. The battery voltage in8 does not have limit registers. |
73 | 106 | ||
74 | The VID lines (IT8712F only) encode the core voltage value: the voltage | 107 | The VID lines (IT8712F/IT8716F/IT8718F) encode the core voltage value: |
75 | level your processor should work with. This is hardcoded by the mainboard | 108 | the voltage level your processor should work with. This is hardcoded by |
76 | and/or processor itself. It is a value in volts. | 109 | the mainboard and/or processor itself. It is a value in volts. |
77 | 110 | ||
78 | If an alarm triggers, it will remain triggered until the hardware register | 111 | If an alarm triggers, it will remain triggered until the hardware register |
79 | is read at least once. This means that the cause for the alarm may already | 112 | is read at least once. This means that the cause for the alarm may already |
diff --git a/Documentation/hwmon/k8temp b/Documentation/hwmon/k8temp new file mode 100644 index 000000000000..bab445ab0f52 --- /dev/null +++ b/Documentation/hwmon/k8temp | |||
@@ -0,0 +1,52 @@ | |||
1 | Kernel driver k8temp | ||
2 | ==================== | ||
3 | |||
4 | Supported chips: | ||
5 | * AMD K8 CPU | ||
6 | Prefix: 'k8temp' | ||
7 | Addresses scanned: PCI space | ||
8 | Datasheet: http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/32559.pdf | ||
9 | |||
10 | Author: Rudolf Marek | ||
11 | Contact: Rudolf Marek <r.marek@sh.cvut.cz> | ||
12 | |||
13 | Description | ||
14 | ----------- | ||
15 | |||
16 | This driver permits reading temperature sensor(s) embedded inside AMD K8 CPUs. | ||
17 | Official documentation says that it works from revision F of K8 core, but | ||
18 | in fact it seems to be implemented for all revisions of K8 except the first | ||
19 | two revisions (SH-B0 and SH-B3). | ||
20 | |||
21 | There can be up to four temperature sensors inside a single CPU. The driver | ||
22 | will auto-detect the sensors and will display only temperatures from | ||
23 | implemented sensors. | ||
24 | |||
25 | Mapping of /sys files is as follows: | ||
26 | |||
27 | temp1_input - temperature of Core 0 and "place" 0 | ||
28 | temp2_input - temperature of Core 0 and "place" 1 | ||
29 | temp3_input - temperature of Core 1 and "place" 0 | ||
30 | temp4_input - temperature of Core 1 and "place" 1 | ||
31 | |||
32 | Temperatures are measured in degrees Celsius and measurement resolution is | ||
33 | 1 degree C. It is expected that future CPUs will have better resolution. The | ||
34 | temperature is updated once a second. Valid temperatures are from -49 to | ||
35 | 206 degrees C. | ||
36 | |||
37 | Temperature known as TCaseMax was specified for processors up to revision E. | ||
38 | This temperature is defined as temperature between heat-spreader and CPU | ||
39 | case, so the internal CPU temperature supplied by this driver can be higher. | ||
40 | There is no easy way to measure the temperature which will correlate | ||
41 | with TCaseMax temperature. | ||
42 | |||
43 | For newer revisions of CPU (rev F, socket AM2) there is a mathematically | ||
44 | computed temperature called TControl, which must be lower than TControlMax. | ||
45 | |||
46 | The relationship is as follows: | ||
47 | |||
48 | temp1_input - TjOffset*2 < TControlMax, | ||
49 | |||
50 | TjOffset is not yet exported by the driver, TControlMax is usually | ||
51 | 70 degrees C. As a rule of thumb, the CPU temperature should not go | ||
52 | much above 60 degrees C. | ||
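The TControl relationship above can be written as a small check. This is only a sketch: the function name is hypothetical, all values are taken in degrees C, and since TjOffset is not exported by the driver the caller would have to obtain it by other means.

```c
#include <assert.h>

/* Returns nonzero when temp1_input - TjOffset*2 < TControlMax holds.
 * Hypothetical helper; all arguments are in degrees C. */
static int k8_within_tcontrol(double temp1_input, double tj_offset,
			      double tcontrol_max)
{
	assert(tcontrol_max > 0.0);
	return temp1_input - tj_offset * 2.0 < tcontrol_max;
}
```

With TControlMax at the usual 70 degrees C and a TjOffset of 0, a reading of 60 passes the check and a reading of 75 does not.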
diff --git a/Documentation/hwmon/vt1211 b/Documentation/hwmon/vt1211 new file mode 100644 index 000000000000..77fa633b97a8 --- /dev/null +++ b/Documentation/hwmon/vt1211 | |||
@@ -0,0 +1,206 @@ | |||
1 | Kernel driver vt1211 | ||
2 | ==================== | ||
3 | |||
4 | Supported chips: | ||
5 | * VIA VT1211 | ||
6 | Prefix: 'vt1211' | ||
7 | Addresses scanned: none, address read from Super-I/O config space | ||
8 | Datasheet: Provided by VIA upon request and under NDA | ||
9 | |||
10 | Authors: Juerg Haefliger <juergh@gmail.com> | ||
11 | |||
12 | This driver is based on the driver for kernel 2.4 by Mark D. Studebaker and | ||
13 | its port to kernel 2.6 by Lars Ekman. | ||
14 | |||
15 | Thanks to Joseph Chan and Fiona Gatt from VIA for providing documentation and | ||
16 | technical support. | ||
17 | |||
18 | |||
19 | Module Parameters | ||
20 | ----------------- | ||
21 | |||
22 | * uch_config: int Override the BIOS default universal channel (UCH) | ||
23 | configuration for channels 1-5. | ||
24 | Legal values are in the range of 0-31. Bit 0 maps to | ||
25 | UCH1, bit 1 maps to UCH2 and so on. Setting a bit to 1 | ||
26 | enables the thermal input of that particular UCH and | ||
27 | setting a bit to 0 enables the voltage input. | ||
28 | |||
29 | * int_mode: int Override the BIOS default temperature interrupt mode. | ||
30 | The only possible value is 0 which forces interrupt | ||
31 | mode 0. In this mode, any pending interrupt is cleared | ||
32 | when the status register is read but is regenerated as | ||
33 | long as the temperature stays above the hysteresis | ||
34 | limit. | ||
35 | |||
36 | Be aware that overriding BIOS defaults might cause some unwanted side effects! | ||
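As an illustration of the uch_config bit layout described above, the following sketch builds the parameter value from a list of channels that should use their thermal input. The helper names are hypothetical and are not part of the driver.

```c
#include <assert.h>

/* Build a uch_config value: bit 0 maps to UCH1, bit 1 to UCH2, and so
 * on. Channels listed in 'uch' get their thermal input enabled (bit set
 * to 1); all other channels keep the voltage input (bit left at 0). */
static int uch_config_from_thermal(const int *uch, int count)
{
	int config = 0;

	for (int i = 0; i < count; i++) {
		assert(uch[i] >= 1 && uch[i] <= 5);
		config |= 1 << (uch[i] - 1);
	}
	return config;
}

/* Nonzero if the given UCH (1-5) is in thermal mode for this config. */
static int uch_is_thermal(int config, int uch)
{
	assert(uch >= 1 && uch <= 5);
	return (config >> (uch - 1)) & 1;
}
```

For example, enabling the thermal inputs of UCH1 and UCH3 gives bits 0 and 2, i.e. uch_config=5, and enabling all five channels gives the maximum legal value of 31.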
37 | |||
38 | |||
39 | Description | ||
40 | ----------- | ||
41 | |||
42 | The VIA VT1211 Super-I/O chip includes complete hardware monitoring | ||
43 | capabilities. It monitors 2 dedicated temperature sensor inputs (temp1 and | ||
44 | temp2), 1 dedicated voltage (in5) and 2 fans. Additionally, the chip | ||
45 | implements 5 universal input channels (UCH1-5) that can be individually | ||
46 | programmed to either monitor a voltage or a temperature. | ||
47 | |||
48 | This chip also provides manual and automatic control of fan speeds (according | ||
49 | to the datasheet). The driver only supports automatic control since the manual | ||
50 | mode doesn't seem to work as advertised in the datasheet. In fact I couldn't | ||
51 | get manual mode to work at all! Be aware that automatic mode hasn't been | ||
52 | tested very well (due to the fact that my EPIA M10000 doesn't have the fans | ||
53 | connected to the PWM outputs of the VT1211 :-(). | ||
54 | |||
55 | The following table shows the relationship between the vt1211 inputs and the | ||
56 | sysfs nodes. | ||
57 | |||
58 | Sensor Voltage Mode Temp Mode Default Use (from the datasheet) | ||
59 | ------ ------------ --------- -------------------------------- | ||
60 | Reading 1 temp1 Intel thermal diode | ||
61 | Reading 3 temp2 Internal thermal diode | ||
62 | UCH1/Reading2 in0 temp3 NTC type thermistor | ||
63 | UCH2 in1 temp4 +2.5V | ||
64 | UCH3 in2 temp5 VccP (processor core) | ||
65 | UCH4 in3 temp6 +5V | ||
66 | UCH5 in4 temp7 +12V | ||
67 | +3.3V in5 Internal VCC (+3.3V) | ||
68 | |||
69 | |||
70 | Voltage Monitoring | ||
71 | ------------------ | ||
72 | |||
73 | Voltages are sampled by an 8-bit ADC with a LSB of ~10mV. The supported input | ||
74 | range is thus from 0 to 2.60V. Voltage values outside of this range need | ||
75 | external scaling resistors. This external scaling needs to be compensated for | ||
76 | via compute lines in sensors.conf, like: | ||
77 | |||
78 | compute inx @*(1+R1/R2), @/(1+R1/R2) | ||
79 | |||
80 | The board level scaling resistors according to VIA's recommendation are as | ||
81 | follows. And this is of course totally dependent on the actual board | ||
82 | implementation :-) You will have to find documentation for your own | ||
83 | motherboard and edit sensors.conf accordingly. | ||
84 | |||
85 | Expected | ||
86 | Voltage R1 R2 Divider Raw Value | ||
87 | ----------------------------------------------- | ||
88 | +2.5V 2K 10K 1.2 2083 mV | ||
89 | VccP --- --- 1.0 1400 mV (1) | ||
90 | +5V 14K 10K 2.4 2083 mV | ||
91 | +12V 47K 10K 5.7 2105 mV | ||
92 | +3.3V (int) 2K 3.4K 1.588 3300 mV (2) | ||
93 | +3.3V (ext) 6.8K 10K 1.68 1964 mV | ||
94 | |||
95 | (1) Depending on the CPU (1.4V is for a VIA C3 Nehemiah). | ||
96 | (2) R1 and R2 for 3.3V (int) are internal to the VT1211 chip and the driver | ||
97 | performs the scaling and returns the properly scaled voltage value. | ||
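The divider arithmetic behind the compute line and the table above can be checked with a few lines of C. This is a sketch with hypothetical function names; R1 and R2 only need to be in the same unit, and the resistor values used are VIA's recommended ones from the table, which may not match a given board.

```c
#include <assert.h>

/* Pin voltage -> real voltage, i.e. the first compute expression
 * @*(1+R1/R2). */
static double vt1211_scale_up(double pin_mv, double r1, double r2)
{
	assert(r2 > 0.0);
	return pin_mv * (1.0 + r1 / r2);
}

/* Real voltage -> pin voltage, i.e. the second compute expression
 * @/(1+R1/R2). */
static double vt1211_scale_down(double real_mv, double r1, double r2)
{
	assert(r2 > 0.0);
	return real_mv / (1.0 + r1 / r2);
}
```

For the +5V rail with R1 = 14K and R2 = 10K the divider is 2.4, so 5000 mV appears at the pin as roughly 2083 mV, which is within the ADC input range.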
98 | |||
99 | Each measured voltage has an associated low and high limit which triggers an | ||
100 | alarm when crossed. | ||
101 | |||
102 | |||
103 | Temperature Monitoring | ||
104 | ---------------------- | ||
105 | |||
106 | Temperatures are reported in millidegree Celsius. Each measured temperature | ||
107 | has a high limit which triggers an alarm if crossed. There is an associated | ||
108 | hysteresis value with each temperature below which the temperature has to drop | ||
109 | before the alarm is cleared (this is only true for interrupt mode 0). The | ||
110 | interrupt mode can be forced to 0 in case the BIOS doesn't do it | ||
111 | automatically. See the 'Module Parameters' section for details. | ||
112 | |||
113 | All temperature channels except temp2 are external. Temp2 is the VT1211 | ||
114 | internal thermal diode and the driver does all the scaling for temp2 and | ||
115 | returns the temperature in millidegree Celsius. For the external channels | ||
116 | temp1 and temp3-temp7, scaling depends on the board implementation and needs | ||
117 | to be performed in userspace via sensors.conf. | ||
118 | |||
119 | Temp1 is an Intel-type thermal diode which requires the following formula to | ||
120 | convert between sysfs readings and real temperatures: | ||
121 | |||
122 | compute temp1 (@-Offset)/Gain, (@*Gain)+Offset | ||
123 | |||
124 | According to the VIA VT1211 BIOS porting guide, the following gain and offset | ||
125 | values should be used: | ||
126 | |||
127 | Diode Type Offset Gain | ||
128 | ---------- ------ ---- | ||
129 | Intel CPU 88.638 0.9528 | ||
130 | 65.000 0.9686 *) | ||
131 | VIA C3 Ezra 83.869 0.9528 | ||
132 | VIA C3 Ezra-T 73.869 0.9528 | ||
133 | |||
134 | *) This is the formula from the lm_sensors 2.10.0 sensors.conf file. I don't | ||
135 | know where it comes from or how it was derived, it's just listed here for | ||
136 | completeness. | ||
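The temp1 compute line can be mirrored in a few lines of Python to check gain and offset values before putting them into sensors.conf. This is only a sketch: the function names are made up for illustration, and the table values are the ones quoted above from the VIA BIOS porting guide.

```python
# (offset, gain) pairs copied from the table above.
DIODES = {
    "Intel CPU": (88.638, 0.9528),
    "VIA C3 Ezra": (83.869, 0.9528),
    "VIA C3 Ezra-T": (73.869, 0.9528),
}


def temp1_from_raw(raw, offset, gain):
    """First expression of the compute line: (@-Offset)/Gain."""
    return (raw - offset) / gain


def temp1_to_raw(real, offset, gain):
    """Second expression of the compute line: (@*Gain)+Offset."""
    return real * gain + offset


if __name__ == "__main__":
    offset, gain = DIODES["VIA C3 Ezra"]
    raw = temp1_to_raw(50.0, offset, gain)
    print("raw value for 50 degrees:", raw)
    print("converted back:", temp1_from_raw(raw, offset, gain))
```

The two functions are exact inverses of each other, which is what sensors.conf expects from the two halves of a compute statement.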
137 | |||
138 | Temp3-temp7 support NTC thermistors. For these channels, the driver returns | ||
139 | the voltages as seen at the individual pins of UCH1-UCH5. The voltage at the | ||
140 | pin (Vpin) is formed by a voltage divider made of the thermistor (Rth) and a | ||
141 | scaling resistor (Rs): | ||
142 | |||
143 | Vpin = 2200 * Rth / (Rs + Rth) (2200 is the ADC max limit of 2200 mV) | ||
144 | |||
145 | The equation for the thermistor is as follows (google it if you want to know | ||
146 | more about it): | ||
147 | |||
148 | Rth = Ro * exp(B * (1 / T - 1 / To)) (To is 298.15K (25C) and Ro is the | ||
149 | nominal resistance at 25C) | ||
150 | |||
151 | Combining the above two equations and assuming Rs = Ro and B = 3435 yields the | ||
152 | following formula for sensors.conf: | ||
153 | |||
154 | compute tempx 1 / (1 / 298.15 - (` (2200 / @ - 1)) / 3435) - 273.15, | ||
155 | 2200 / (1 + (^ (3435 / 298.15 - 3435 / (273.15 + @)))) | ||
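In sensors.conf syntax the backtick is the natural logarithm and ^ is the exponential function, so the compute line above can be transcribed into Python as follows. This is a sketch under the same assumptions as the text (Rs = Ro, B = 3435, 2200 mV full scale); the function names are hypothetical.

```python
import math

B = 3435.0     # thermistor B constant assumed in the text
T0 = 298.15    # 25 C expressed in kelvin
VMAX = 2200.0  # full-scale value used in the Vpin equation, in mV


def pin_mv_to_celsius(vpin_mv):
    """First compute expression: pin voltage (mV) to temperature (C).

    Only defined for 0 < vpin_mv < VMAX.
    """
    return 1.0 / (1.0 / T0 - math.log(VMAX / vpin_mv - 1.0) / B) - 273.15


def celsius_to_pin_mv(temp_c):
    """Second compute expression: temperature (C) to pin voltage (mV)."""
    return VMAX / (1.0 + math.exp(B / T0 - B / (273.15 + temp_c)))


if __name__ == "__main__":
    for mv in (600.0, 1100.0, 1600.0):
        print(f"{mv:.0f} mV -> {pin_mv_to_celsius(mv):.2f} C")
```

With Rs = Ro the divider sits at half scale (1100 mV) at 25 C, and because the thermistor is an NTC type the pin voltage falls as the temperature rises.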
156 | |||
157 | |||
158 | Fan Speed Control | ||
159 | ----------------- | ||
160 | |||
161 | The VT1211 provides 2 programmable PWM outputs to control the speeds of 2 | ||
162 | fans. Writing a 2 to any of the two pwm[1-2]_enable sysfs nodes will put the | ||
163 | PWM controller in automatic mode. There is only a single controller that | ||
164 | controls both PWM outputs but each PWM output can be individually enabled and | ||
165 | disabled. | ||
166 | |||
167 | Each PWM has 4 associated distinct output duty-cycles: full, high, low and | ||
168 | off. Full and off are internally hard-wired to 255 (100%) and 0 (0%), | ||
169 | respectively. High and low can be programmed via | ||
170 | pwm[1-2]_auto_point[2-3]_pwm. Each PWM output can be associated with a | ||
171 | different thermal input but - and here's the weird part - only one set of | ||
172 | thermal thresholds exists that controls both PWMs' output duty-cycles. The | ||
173 | thermal thresholds are accessible via pwm[1-2]_auto_point[1-4]_temp. Note | ||
174 | that even though there are 2 sets of 4 auto points each, they map to the same | ||
175 | registers in the VT1211 and programming one set is sufficient (actually only | ||
176 | the first set pwm1_auto_point[1-4]_temp is writable, the second set is | ||
177 | read-only). | ||
178 | |||
179 | PWM Auto Point PWM Output Duty-Cycle | ||
180 | ------------------------------------------------ | ||
181 | pwm[1-2]_auto_point4_pwm full speed duty-cycle (hard-wired to 255) | ||
182 | pwm[1-2]_auto_point3_pwm high speed duty-cycle | ||
183 | pwm[1-2]_auto_point2_pwm low speed duty-cycle | ||
184 | pwm[1-2]_auto_point1_pwm off duty-cycle (hard-wired to 0) | ||
185 | |||
186 | Temp Auto Point Thermal Threshold | ||
187 | --------------------------------------------- | ||
188 | pwm[1-2]_auto_point4_temp full speed temp | ||
189 | pwm[1-2]_auto_point3_temp high speed temp | ||
190 | pwm[1-2]_auto_point2_temp low speed temp | ||
191 | pwm[1-2]_auto_point1_temp off temp | ||
192 | |||
193 | Long story short, the controller implements the following algorithm to set the | ||
194 | PWM output duty-cycle based on the input temperature: | ||
195 | |||
196 | Thermal Threshold Output Duty-Cycle | ||
197 | (Rising Temp) (Falling Temp) | ||
198 | ---------------------------------------------------------- | ||
199 | full speed duty-cycle full speed duty-cycle | ||
200 | full speed temp | ||
201 | high speed duty-cycle full speed duty-cycle | ||
202 | high speed temp | ||
203 | low speed duty-cycle high speed duty-cycle | ||
204 | low speed temp | ||
205 | off duty-cycle low speed duty-cycle | ||
206 | off temp | ||
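One way to read the table above is as a four-level state machine with hysteresis: a rising temperature steps the level up when it crosses the low, high and full speed thresholds, and a falling temperature steps it down when it drops below the high, low and off thresholds. The Python model below is only an interpretation of that table, not the chip's documented internals; the function name, the example thresholds and the behaviour exactly at a threshold are assumptions.

```python
LEVELS = ("off", "low", "high", "full")


def next_level(level, temp, off_t, low_t, high_t, full_t):
    """Return the new duty-cycle level index (0 = off ... 3 = full).

    Assumes off_t <= low_t <= high_t <= full_t.
    """
    # Rising side: the level may not be lower than the number of
    # low/high/full thresholds already reached.
    floor = sum(temp >= t for t in (low_t, high_t, full_t))
    # Falling side: the level may not be higher than the number of
    # off/low/high thresholds still reached.
    ceiling = sum(temp >= t for t in (off_t, low_t, high_t))
    return min(max(level, floor), ceiling)


if __name__ == "__main__":
    thresholds = (40, 50, 60, 70)  # hypothetical off/low/high/full temps
    level = 0
    for temp in (45, 55, 65, 75, 65, 55, 45, 35):
        level = next_level(level, temp, *thresholds)
        print(temp, LEVELS[level])
```

Running the sweep up and back down walks through both columns of the table: the same temperature yields a lower duty-cycle on the way up than on the way down.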
diff --git a/Documentation/hwmon/w83627ehf b/Documentation/hwmon/w83627ehf new file mode 100644 index 000000000000..fae3b781d82d --- /dev/null +++ b/Documentation/hwmon/w83627ehf | |||
@@ -0,0 +1,85 @@ | |||
1 | Kernel driver w83627ehf | ||
2 | ======================= | ||
3 | |||
4 | Supported chips: | ||
5 | * Winbond W83627EHF/EHG (ISA access ONLY) | ||
6 | Prefix: 'w83627ehf' | ||
7 | Addresses scanned: ISA address retrieved from Super I/O registers | ||
8 | Datasheet: http://www.winbond-usa.com/products/winbond_products/pdfs/PCIC/W83627EHF_%20W83627EHGb.pdf | ||
9 | |||
10 | Authors: | ||
11 | Jean Delvare <khali@linux-fr.org> | ||
12 | Yuan Mu (Winbond) | ||
13 | Rudolf Marek <r.marek@sh.cvut.cz> | ||
14 | |||
15 | Description | ||
16 | ----------- | ||
17 | |||
18 | This driver implements support for the Winbond W83627EHF and W83627EHG | ||
19 | super I/O chips. We will refer to them collectively as Winbond chips. | ||
20 | |||
21 | The chips implement three temperature sensors, five fan rotation | ||
22 | speed sensors, ten analog voltage sensors, alarms with beep warnings (control | ||
23 | unimplemented), and some automatic fan regulation strategies (plus manual | ||
24 | fan control mode). | ||
25 | |||
26 | Temperatures are measured in degrees Celsius and measurement resolution is 1 | ||
27 | degC for temp1 and 0.5 degC for temp2 and temp3. An alarm is triggered when | ||
28 | the temperature gets higher than high limit; it stays on until the temperature | ||
29 | falls below the Hysteresis value. | ||
30 | |||
31 | Fan rotation speeds are reported in RPM (rotations per minute). An alarm is | ||
32 | triggered if the rotation speed has dropped below a programmable limit. Fan | ||
33 | readings can be divided by a programmable divider (1, 2, 4, 8, 16, 32, 64 or | ||
34 | 128) to give the readings more range or accuracy. The driver sets the most | ||
35 | suitable fan divisor itself. Some fans might not be present because they | ||
36 | share pins with other functions. | ||
37 | |||
38 | Voltage sensors (also known as IN sensors) report their values in millivolts. | ||
39 | An alarm is triggered if the voltage has crossed a programmable minimum | ||
40 | or maximum limit. | ||
41 | |||
42 | The driver supports automatic fan control mode known as Thermal Cruise. | ||
43 | In this mode, the chip attempts to keep the measured temperature in a | ||
44 | predefined temperature range. If the temperature goes out of range, the fan | ||
45 | is driven slower/faster to reach the predefined range again. | ||
46 | |||
47 | The mode works for fan1-fan4. Mapping of temperatures to pwm outputs is as | ||
48 | follows: | ||
49 | |||
50 | temp1 -> pwm1 | ||
51 | temp2 -> pwm2 | ||
52 | temp3 -> pwm3 | ||
53 | prog -> pwm4 (the programmable setting is not supported by the driver) | ||
54 | |||
55 | /sys files | ||
56 | ---------- | ||
57 | |||
58 | pwm[1-4] - this file stores PWM duty cycle or DC value (fan speed) in range: | ||
59 | 0 (stop) to 255 (full) | ||
60 | |||
61 | pwm[1-4]_enable - this file controls mode of fan/temperature control: | ||
62 | * 1 Manual Mode, write to pwm file any value 0-255 (full speed) | ||
63 | * 2 Thermal Cruise | ||
64 | |||
65 | Thermal Cruise mode | ||
66 | ------------------- | ||
67 | |||
68 | If the temperature is in the range defined by: | ||
69 | |||
70 | pwm[1-4]_target - set target temperature, unit millidegree Celsius | ||
71 | (range 0 - 127000) | ||
72 | pwm[1-4]_tolerance - tolerance, unit millidegree Celsius (range 0 - 15000) | ||
73 | |||
74 | there are no changes to fan speed. Once the temperature leaves the interval, | ||
75 | fan speed increases (temp is higher) or decreases if lower than desired. | ||
76 | The steps and times are defined by the chip, but not exported by the driver yet. | ||
77 | |||
78 | pwm[1-4]_min_output - minimum fan speed (range 1 - 255), used when the | ||
79 | temperature is below the defined range. | ||
80 | pwm[1-4]_stop_time - how many milliseconds [ms] must elapse before the | ||
81 | corresponding fan is switched off (when the temperature | ||
82 | is below the defined range). | ||
83 | |||
84 | Note: last two functions are influenced by other control bits, not yet exported | ||
85 | by the driver, so a change might not have any effect. | ||
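The Thermal Cruise decision described above can be sketched as a small Python function. It assumes that the "range" is the symmetric interval target +/- tolerance, which the text implies but does not state outright; the function name and return strings are purely illustrative, and the step sizes and timing that the chip applies are not modelled.

```python
def thermal_cruise_action(temp_mc, target_mc, tolerance_mc):
    """Classify a temperature against the cruise interval.

    All values are in millidegrees Celsius, like pwm[1-4]_target and
    pwm[1-4]_tolerance. Assumes the interval is target +/- tolerance.
    """
    if temp_mc > target_mc + tolerance_mc:
        return "faster"  # too warm: fan speed increases
    if temp_mc < target_mc - tolerance_mc:
        return "slower"  # too cool: fan speed decreases
    return "hold"        # inside the interval: no change


if __name__ == "__main__":
    for temp in (38000, 45000, 52000):
        print(temp, thermal_cruise_action(temp, 45000, 5000))
```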
diff --git a/Documentation/hwmon/w83791d b/Documentation/hwmon/w83791d index 83a3836289c2..19b2ed739fa1 100644 --- a/Documentation/hwmon/w83791d +++ b/Documentation/hwmon/w83791d | |||
@@ -5,7 +5,7 @@ Supported chips: | |||
5 | * Winbond W83791D | 5 | * Winbond W83791D |
6 | Prefix: 'w83791d' | 6 | Prefix: 'w83791d' |
7 | Addresses scanned: I2C 0x2c - 0x2f | 7 | Addresses scanned: I2C 0x2c - 0x2f |
8 | Datasheet: http://www.winbond-usa.com/products/winbond_products/pdfs/PCIC/W83791Da.pdf | 8 | Datasheet: http://www.winbond-usa.com/products/winbond_products/pdfs/PCIC/W83791D_W83791Gb.pdf |
9 | 9 | ||
10 | Author: Charles Spirakis <bezaur@gmail.com> | 10 | Author: Charles Spirakis <bezaur@gmail.com> |
11 | 11 | ||
@@ -20,6 +20,9 @@ Credits: | |||
20 | Chunhao Huang <DZShen@Winbond.com.tw>, | 20 | Chunhao Huang <DZShen@Winbond.com.tw>, |
21 | Rudolf Marek <r.marek@sh.cvut.cz> | 21 | Rudolf Marek <r.marek@sh.cvut.cz> |
22 | 22 | ||
23 | Additional contributors: | ||
24 | Sven Anders <anders@anduras.de> | ||
25 | |||
23 | Module Parameters | 26 | Module Parameters |
24 | ----------------- | 27 | ----------------- |
25 | 28 | ||
@@ -46,7 +49,8 @@ Module Parameters | |||
46 | Description | 49 | Description |
47 | ----------- | 50 | ----------- |
48 | 51 | ||
49 | This driver implements support for the Winbond W83791D chip. | 52 | This driver implements support for the Winbond W83791D chip. The W83791G |
53 | chip appears to be the same as the W83791D but is lead free. | ||
50 | 54 | ||
51 | Detection of the chip can sometimes be foiled because it can be in an | 55 | Detection of the chip can sometimes be foiled because it can be in an |
52 | internal state that allows no clean access (Bank with ID register is not | 56 | internal state that allows no clean access (Bank with ID register is not |
@@ -71,34 +75,36 @@ Voltage sensors (also known as IN sensors) report their values in millivolts. | |||
71 | An alarm is triggered if the voltage has crossed a programmable minimum | 75 | An alarm is triggered if the voltage has crossed a programmable minimum |
72 | or maximum limit. | 76 | or maximum limit. |
73 | 77 | ||
74 | Alarms are provided as output from a "realtime status register". The | 78 | The bit ordering for the alarm "realtime status register" and the |
75 | following bits are defined: | 79 | "beep enable registers" is different. |
76 | 80 | ||
77 | bit - alarm on: | 81 | in0 (VCORE) : alarms: 0x000001 beep_enable: 0x000001 |
78 | 0 - Vcore | 82 | in1 (VINR0) : alarms: 0x000002 beep_enable: 0x002000 <== mismatch |
79 | 1 - VINR0 | 83 | in2 (+3.3VIN): alarms: 0x000004 beep_enable: 0x000004 |
80 | 2 - +3.3VIN | 84 | in3 (5VDD) : alarms: 0x000008 beep_enable: 0x000008 |
81 | 3 - 5VDD | 85 | in4 (+12VIN) : alarms: 0x000100 beep_enable: 0x000100 |
82 | 4 - temp1 | 86 | in5 (-12VIN) : alarms: 0x000200 beep_enable: 0x000200 |
83 | 5 - temp2 | 87 | in6 (-5VIN) : alarms: 0x000400 beep_enable: 0x000400 |
84 | 6 - fan1 | 88 | in7 (VSB) : alarms: 0x080000 beep_enable: 0x010000 <== mismatch |
85 | 7 - fan2 | 89 | in8 (VBAT) : alarms: 0x100000 beep_enable: 0x020000 <== mismatch |
86 | 8 - +12VIN | 90 | in9 (VINR1) : alarms: 0x004000 beep_enable: 0x004000 |
87 | 9 - -12VIN | 91 | temp1 : alarms: 0x000010 beep_enable: 0x000010 |
88 | 10 - -5VIN | 92 | temp2 : alarms: 0x000020 beep_enable: 0x000020 |
89 | 11 - fan3 | 93 | temp3 : alarms: 0x002000 beep_enable: 0x000002 <== mismatch |
90 | 12 - chassis | 94 | fan1 : alarms: 0x000040 beep_enable: 0x000040 |
91 | 13 - temp3 | 95 | fan2 : alarms: 0x000080 beep_enable: 0x000080 |
92 | 14 - VINR1 | 96 | fan3 : alarms: 0x000800 beep_enable: 0x000800 |
93 | 15 - reserved | 97 | fan4 : alarms: 0x200000 beep_enable: 0x200000 |
94 | 16 - tart1 | 98 | fan5 : alarms: 0x400000 beep_enable: 0x400000 |
95 | 17 - tart2 | 99 | tart1 : alarms: 0x010000 beep_enable: 0x040000 <== mismatch |
96 | 18 - tart3 | 100 | tart2 : alarms: 0x020000 beep_enable: 0x080000 <== mismatch |
97 | 19 - VSB | 101 | tart3 : alarms: 0x040000 beep_enable: 0x100000 <== mismatch |
98 | 20 - VBAT | 102 | case_open : alarms: 0x001000 beep_enable: 0x001000 |
99 | 21 - fan4 | 103 | user_enable : alarms: -------- beep_enable: 0x800000 |
100 | 22 - fan5 | 104 | |
101 | 23 - reserved | 105 | *** NOTE: It is the responsibility of user-space code to handle the fact |
106 | that the beep enable and alarm bits are in different positions when using that | ||
107 | feature of the chip. | ||
102 | 108 | ||
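Since user space has to cope with the differing bit positions, a lookup table is probably the simplest approach. The Python sketch below copies the masks from the new table above and translates an alarms value into the corresponding beep enable mask; the dictionary and function names are hypothetical and are not part of the driver or of lm_sensors.

```python
# name: (alarms mask, beep_enable mask), copied from the table above.
# user_enable is left out because it has no alarm bit.
BITS = {
    "in0": (0x000001, 0x000001),
    "in1": (0x000002, 0x002000),
    "in2": (0x000004, 0x000004),
    "in3": (0x000008, 0x000008),
    "in4": (0x000100, 0x000100),
    "in5": (0x000200, 0x000200),
    "in6": (0x000400, 0x000400),
    "in7": (0x080000, 0x010000),
    "in8": (0x100000, 0x020000),
    "in9": (0x004000, 0x004000),
    "temp1": (0x000010, 0x000010),
    "temp2": (0x000020, 0x000020),
    "temp3": (0x002000, 0x000002),
    "fan1": (0x000040, 0x000040),
    "fan2": (0x000080, 0x000080),
    "fan3": (0x000800, 0x000800),
    "fan4": (0x200000, 0x200000),
    "fan5": (0x400000, 0x400000),
    "tart1": (0x010000, 0x040000),
    "tart2": (0x020000, 0x080000),
    "tart3": (0x040000, 0x100000),
    "case_open": (0x001000, 0x001000),
}


def alarms_to_beep_enable(alarms):
    """Translate an alarms bitmask into the matching beep_enable bitmask."""
    beep = 0
    for alarm_bit, beep_bit in BITS.values():
        if alarms & alarm_bit:
            beep |= beep_bit
    return beep


def mismatched():
    """Names whose alarm and beep enable bits sit in different positions."""
    return sorted(name for name, (a, b) in BITS.items() if a != b)


if __name__ == "__main__":
    print(hex(alarms_to_beep_enable(0x002001)))
    print(mismatched())
```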
103 | When an alarm goes off, you can be warned by a beeping signal through your | 109 | When an alarm goes off, you can be warned by a beeping signal through your |
104 | computer speaker. It is possible to enable all beeping globally, or only | 110 | computer speaker. It is possible to enable all beeping globally, or only |
@@ -109,5 +115,6 @@ often will do no harm, but will return 'old' values. | |||
109 | 115 | ||
110 | W83791D TODO: | 116 | W83791D TODO: |
111 | --------------- | 117 | --------------- |
112 | Provide a patch for per-file alarms as discussed on the mailing list | 118 | Provide a patch for per-file alarms and beep enables as defined in the hwmon |
119 | documentation (Documentation/hwmon/sysfs-interface) | ||
113 | Provide a patch for smart-fan control (still need appropriate motherboard/fans) | 120 | Provide a patch for smart-fan control (still need appropriate motherboard/fans) |
diff --git a/Documentation/i2c/busses/i2c-sis96x b/Documentation/i2c/busses/i2c-sis96x index 00a009b977e9..08d7b2dac69a 100644 --- a/Documentation/i2c/busses/i2c-sis96x +++ b/Documentation/i2c/busses/i2c-sis96x | |||
@@ -42,8 +42,8 @@ I suspect that this driver could be made to work for the following SiS | |||
42 | chipsets as well: 635, and 635T. If anyone owns a board with those chips | 42 | chipsets as well: 635, and 635T. If anyone owns a board with those chips |
43 | AND is willing to risk crashing & burning an otherwise well-behaved kernel | 43 | AND is willing to risk crashing & burning an otherwise well-behaved kernel |
44 | in the name of progress... please contact me at <mhoffman@lightlink.com> or | 44 | in the name of progress... please contact me at <mhoffman@lightlink.com> or |
45 | via the project's mailing list: <lm-sensors@lm-sensors.org>. Please | 45 | via the project's mailing list: <i2c@lm-sensors.org>. Please send bug |
46 | send bug reports and/or success stories as well. | 46 | reports and/or success stories as well. |
47 | 47 | ||
48 | 48 | ||
49 | TO DOs | 49 | TO DOs |
diff --git a/Documentation/i2c/busses/i2c-viapro b/Documentation/i2c/busses/i2c-viapro index 16775663b9f5..25680346e0ac 100644 --- a/Documentation/i2c/busses/i2c-viapro +++ b/Documentation/i2c/busses/i2c-viapro | |||
@@ -7,9 +7,12 @@ Supported adapters: | |||
7 | * VIA Technologies, Inc. VT82C686A/B | 7 | * VIA Technologies, Inc. VT82C686A/B |
8 | Datasheet: Sometimes available at the VIA website | 8 | Datasheet: Sometimes available at the VIA website |
9 | 9 | ||
10 | * VIA Technologies, Inc. VT8231, VT8233, VT8233A, VT8235, VT8237R | 10 | * VIA Technologies, Inc. VT8231, VT8233, VT8233A |
11 | Datasheet: available on request from VIA | 11 | Datasheet: available on request from VIA |
12 | 12 | ||
13 | * VIA Technologies, Inc. VT8235, VT8237R, VT8237A, VT8251 | ||
14 | Datasheet: available on request and under NDA from VIA | ||
15 | |||
13 | Authors: | 16 | Authors: |
14 | Kyösti Mälkki <kmalkki@cc.hut.fi>, | 17 | Kyösti Mälkki <kmalkki@cc.hut.fi>, |
15 | Mark D. Studebaker <mdsxyz123@yahoo.com>, | 18 | Mark D. Studebaker <mdsxyz123@yahoo.com>, |
@@ -39,6 +42,8 @@ Your lspci -n listing must show one of these : | |||
39 | device 1106:8235 (VT8231 function 4) | 42 | device 1106:8235 (VT8231 function 4) |
40 | device 1106:3177 (VT8235) | 43 | device 1106:3177 (VT8235) |
41 | device 1106:3227 (VT8237R) | 44 | device 1106:3227 (VT8237R) |
45 | device 1106:3337 (VT8237A) | ||
46 | device 1106:3287 (VT8251) | ||
42 | 47 | ||
43 | If none of these show up, you should look in the BIOS for settings like | 48 | If none of these show up, you should look in the BIOS for settings like |
44 | enable ACPI / SMBus or even USB. | 49 | enable ACPI / SMBus or even USB. |
diff --git a/Documentation/i2c/i2c-stub b/Documentation/i2c/i2c-stub index d6dcb138abf5..9cc081e69764 100644 --- a/Documentation/i2c/i2c-stub +++ b/Documentation/i2c/i2c-stub | |||
@@ -6,9 +6,12 @@ This module is a very simple fake I2C/SMBus driver. It implements four | |||
6 | types of SMBus commands: write quick, (r/w) byte, (r/w) byte data, and | 6 | types of SMBus commands: write quick, (r/w) byte, (r/w) byte data, and |
7 | (r/w) word data. | 7 | (r/w) word data. |
8 | 8 | ||
9 | You need to provide a chip address as a module parameter when loading | ||
10 | this driver, which will then only react to SMBus commands to this address. | ||
11 | |||
9 | No hardware is needed nor associated with this module. It will accept write | 12 | No hardware is needed nor associated with this module. It will accept write |
10 | quick commands to all addresses; it will respond to the other commands (also | 13 | quick commands to one address; it will respond to the other commands (also |
11 | to all addresses) by reading from or writing to an array in memory. It will | 14 | to one address) by reading from or writing to an array in memory. It will |
12 | also spam the kernel logs for every command it handles. | 15 | also spam the kernel logs for every command it handles. |
13 | 16 | ||
14 | A pointer register with auto-increment is implemented for all byte | 17 | A pointer register with auto-increment is implemented for all byte |
@@ -21,6 +24,11 @@ The typical use-case is like this: | |||
21 | 3. load the target sensors chip driver module | 24 | 3. load the target sensors chip driver module |
22 | 4. observe its behavior in the kernel log | 25 | 4. observe its behavior in the kernel log |
23 | 26 | ||
27 | PARAMETERS: | ||
28 | |||
29 | int chip_addr: | ||
30 | The SMBus address to emulate a chip at. | ||
31 | |||
24 | CAVEATS: | 32 | CAVEATS: |
25 | 33 | ||
26 | There are independent arrays for byte/data and word/data commands. Depending | 34 | There are independent arrays for byte/data and word/data commands. Depending |
@@ -33,6 +41,9 @@ If the hardware for your driver has banked registers (e.g. Winbond sensors | |||
33 | chips) this module will not work well - although it could be extended to | 41 | chips) this module will not work well - although it could be extended to |
34 | support that pretty easily. | 42 | support that pretty easily. |
35 | 43 | ||
44 | Only one chip address is supported - although this module could be | ||
45 | extended to support more. | ||
46 | |||
36 | If you spam it hard enough, printk can be lossy. This module really wants | 47 | If you spam it hard enough, printk can be lossy. This module really wants |
37 | something like relayfs. | 48 | something like relayfs. |
38 | 49 | ||
diff --git a/Documentation/i386/boot.txt b/Documentation/i386/boot.txt index 10312bebe55d..c51314b1a463 100644 --- a/Documentation/i386/boot.txt +++ b/Documentation/i386/boot.txt | |||
@@ -181,6 +181,7 @@ filled out, however: | |||
181 | 5 ELILO | 181 | 5 ELILO |
182 | 7 GRuB | 182 | 7 GRuB |
183 | 8 U-BOOT | 183 | 8 U-BOOT |
184 | 9 Xen | ||
184 | 185 | ||
185 | Please contact <hpa@zytor.com> if you need a bootloader ID | 186 | Please contact <hpa@zytor.com> if you need a bootloader ID |
186 | value assigned. | 187 | value assigned. |
diff --git a/Documentation/i386/zero-page.txt b/Documentation/i386/zero-page.txt index df28c7416781..c04a421f4a7c 100644 --- a/Documentation/i386/zero-page.txt +++ b/Documentation/i386/zero-page.txt | |||
@@ -63,6 +63,10 @@ Offset Type Description | |||
63 | 2 for bootsect-loader | 63 | 2 for bootsect-loader |
64 | 3 for SYSLINUX | 64 | 3 for SYSLINUX |
65 | 4 for ETHERBOOT | 65 | 4 for ETHERBOOT |
66 | 5 for ELILO | ||
67 | 7 for GRuB | ||
68 | 8 for U-BOOT | ||
69 | 9 for Xen | ||
66 | V = version | 70 | V = version |
67 | 0x211 char loadflags: | 71 | 0x211 char loadflags: |
68 | bit0 = 1: kernel is loaded high (bzImage) | 72 | bit0 = 1: kernel is loaded high (bzImage) |
diff --git a/Documentation/infiniband/ipoib.txt b/Documentation/infiniband/ipoib.txt index 187035560d7f..864ff3283780 100644 --- a/Documentation/infiniband/ipoib.txt +++ b/Documentation/infiniband/ipoib.txt | |||
@@ -51,8 +51,6 @@ Debugging Information | |||
51 | 51 | ||
52 | References | 52 | References |
53 | 53 | ||
54 | IETF IP over InfiniBand (ipoib) Working Group | ||
55 | http://ietf.org/html.charters/ipoib-charter.html | ||
56 | Transmission of IP over InfiniBand (IPoIB) (RFC 4391) | 54 | Transmission of IP over InfiniBand (IPoIB) (RFC 4391) |
57 | http://ietf.org/rfc/rfc4391.txt | 55 | http://ietf.org/rfc/rfc4391.txt |
58 | IP over InfiniBand (IPoIB) Architecture (RFC 4392) | 56 | IP over InfiniBand (IPoIB) Architecture (RFC 4392) |
diff --git a/Documentation/initrd.txt b/Documentation/initrd.txt index 7de1c80cd719..15f1b35deb34 100644 --- a/Documentation/initrd.txt +++ b/Documentation/initrd.txt | |||
@@ -67,12 +67,27 @@ initrd adds the following new options: | |||
67 | as the last process has closed it, all data is freed and /dev/initrd | 67 | as the last process has closed it, all data is freed and /dev/initrd |
68 | can't be opened anymore. | 68 | can't be opened anymore. |
69 | 69 | ||
70 | root=/dev/ram0 (without devfs) | 70 | root=/dev/ram0 |
71 | root=/dev/rd/0 (with devfs) | ||
72 | 71 | ||
73 | initrd is mounted as root, and the normal boot procedure is followed, | 72 | initrd is mounted as root, and the normal boot procedure is followed, |
74 | with the RAM disk still mounted as root. | 73 | with the RAM disk still mounted as root. |
75 | 74 | ||
75 | Compressed cpio images | ||
76 | ---------------------- | ||
77 | |||
78 | Recent kernels have support for populating a ramdisk from a compressed cpio | ||
79 | archive. On such systems, the creation of a ramdisk image doesn't need to | ||
80 | involve special block devices or loopbacks; you merely create a directory on | ||
81 | disk with the desired initrd content, cd to that directory, and run (as an | ||
82 | example): | ||
83 | |||
84 | find . | cpio --quiet -c -o | gzip -9 -n > /boot/imagefile.img | ||
85 | |||
86 | Examining the contents of an existing image file is just as simple: | ||
87 | |||
88 | mkdir /tmp/imagefile | ||
89 | cd /tmp/imagefile | ||
90 | gzip -cd /boot/imagefile.img | cpio -imd --quiet | ||
76 | 91 | ||
77 | Installation | 92 | Installation |
78 | ------------ | 93 | ------------ |
@@ -90,8 +105,7 @@ you're building an install floppy), the root file system creation | |||
90 | procedure should create the /initrd directory. | 105 | procedure should create the /initrd directory. |
91 | 106 | ||
92 | If initrd will not be mounted in some cases, its content is still | 107 | If initrd will not be mounted in some cases, its content is still |
93 | accessible if the following device has been created (note that this | 108 | accessible if the following device has been created: |
94 | does not work if using devfs): | ||
95 | 109 | ||
96 | # mknod /dev/initrd b 1 250 | 110 | # mknod /dev/initrd b 1 250 |
97 | # chmod 400 /dev/initrd | 111 | # chmod 400 /dev/initrd |
@@ -119,8 +133,7 @@ We'll describe the loopback device method: | |||
119 | (if space is critical, you may want to use the Minix FS instead of Ext2) | 133 | (if space is critical, you may want to use the Minix FS instead of Ext2) |
120 | 3) mount the file system, e.g. | 134 | 3) mount the file system, e.g. |
121 | # mount -t ext2 -o loop initrd /mnt | 135 | # mount -t ext2 -o loop initrd /mnt |
122 | 4) create the console device (not necessary if using devfs, but it can't | 136 | 4) create the console device: |
123 | hurt to do it anyway): | ||
124 | # mkdir /mnt/dev | 137 | # mkdir /mnt/dev |
125 | # mknod /mnt/dev/console c 5 1 | 138 | # mknod /mnt/dev/console c 5 1 |
126 | 5) copy all the files that are needed to properly use the initrd | 139 | 5) copy all the files that are needed to properly use the initrd |
@@ -152,12 +165,7 @@ have to be given: | |||
152 | 165 | ||
153 | root=/dev/ram0 init=/linuxrc rw | 166 | root=/dev/ram0 init=/linuxrc rw |
154 | 167 | ||
155 | if not using devfs, or | 168 | (rw is only necessary if writing to the initrd file system.) |
156 | |||
157 | root=/dev/rd/0 init=/linuxrc rw | ||
158 | |||
159 | if using devfs. (rw is only necessary if writing to the initrd file | ||
160 | system.) | ||
161 | 169 | ||
162 | With LOADLIN, you simply execute | 170 | With LOADLIN, you simply execute |
163 | 171 | ||
@@ -217,9 +225,9 @@ following command: | |||
217 | # exec chroot . what-follows <dev/console >dev/console 2>&1 | 225 | # exec chroot . what-follows <dev/console >dev/console 2>&1 |
218 | 226 | ||
219 | Where what-follows is a program under the new root, e.g. /sbin/init | 227 | Where what-follows is a program under the new root, e.g. /sbin/init |
220 | If the new root file system will be used with devfs and has no valid | 228 | If the new root file system will be used with udev and has no valid |
221 | /dev directory, devfs must be mounted before invoking chroot in order to | 229 | /dev directory, udev must be initialized before invoking chroot in order |
222 | provide /dev/console. | 230 | to provide /dev/console. |
223 | 231 | ||
224 | Note: implementation details of pivot_root may change with time. In order | 232 | Note: implementation details of pivot_root may change with time. In order |
225 | to ensure compatibility, the following points should be observed: | 233 | to ensure compatibility, the following points should be observed: |
@@ -236,7 +244,7 @@ Now, the initrd can be unmounted and the memory allocated by the RAM | |||
236 | disk can be freed: | 244 | disk can be freed: |
237 | 245 | ||
238 | # umount /initrd | 246 | # umount /initrd |
239 | # blockdev --flushbufs /dev/ram0 # /dev/rd/0 if using devfs | 247 | # blockdev --flushbufs /dev/ram0 |
240 | 248 | ||
241 | It is also possible to use initrd with an NFS-mounted root, see the | 249 | It is also possible to use initrd with an NFS-mounted root, see the |
242 | pivot_root(8) man page for details. | 250 | pivot_root(8) man page for details. |
diff --git a/Documentation/input/joystick.txt b/Documentation/input/joystick.txt index d53b857a3710..841c353297e6 100644 --- a/Documentation/input/joystick.txt +++ b/Documentation/input/joystick.txt | |||
@@ -39,7 +39,6 @@ them. Bug reports and success stories are also welcome. | |||
39 | 39 | ||
40 | The input project website is at: | 40 | The input project website is at: |
41 | 41 | ||
42 | http://www.suse.cz/development/input/ | ||
43 | http://atrey.karlin.mff.cuni.cz/~vojtech/input/ | 42 | http://atrey.karlin.mff.cuni.cz/~vojtech/input/ |
44 | 43 | ||
45 | There is also a mailing list for the driver at: | 44 | There is also a mailing list for the driver at: |
diff --git a/Documentation/ioctl-number.txt b/Documentation/ioctl-number.txt index 1543802ef53e..edc04d74ae23 100644 --- a/Documentation/ioctl-number.txt +++ b/Documentation/ioctl-number.txt | |||
@@ -119,7 +119,6 @@ Code Seq# Include File Comments | |||
119 | 'c' 00-7F linux/comstats.h conflict! | 119 | 'c' 00-7F linux/comstats.h conflict! |
120 | 'c' 00-7F linux/coda.h conflict! | 120 | 'c' 00-7F linux/coda.h conflict! |
121 | 'd' 00-FF linux/char/drm/drm/h conflict! | 121 | 'd' 00-FF linux/char/drm/drm/h conflict! |
122 | 'd' 00-1F linux/devfs_fs.h conflict! | ||
123 | 'd' 00-DF linux/video_decoder.h conflict! | 122 | 'd' 00-DF linux/video_decoder.h conflict! |
124 | 'd' F0-FF linux/digi1.h | 123 | 'd' F0-FF linux/digi1.h |
125 | 'e' all linux/digi1.h conflict! | 124 | 'e' all linux/digi1.h conflict! |
diff --git a/Documentation/irqflags-tracing.txt b/Documentation/irqflags-tracing.txt new file mode 100644 index 000000000000..6a444877ee0b --- /dev/null +++ b/Documentation/irqflags-tracing.txt | |||
@@ -0,0 +1,57 @@ | |||
1 | IRQ-flags state tracing | ||
2 | |||
3 | started by Ingo Molnar <mingo@redhat.com> | ||
4 | |||
5 | the "irq-flags tracing" feature "traces" hardirq and softirq state, in | ||
6 | that it gives interested subsystems an opportunity to be notified of | ||
7 | every hardirqs-off/hardirqs-on, softirqs-off/softirqs-on event that | ||
8 | happens in the kernel. | ||
9 | |||
10 | CONFIG_TRACE_IRQFLAGS_SUPPORT is needed for CONFIG_PROVE_SPIN_LOCKING | ||
11 | and CONFIG_PROVE_RW_LOCKING to be offered by the generic lock debugging | ||
12 | code. Otherwise only CONFIG_PROVE_MUTEX_LOCKING and | ||
13 | CONFIG_PROVE_RWSEM_LOCKING will be offered on an architecture - these | ||
14 | are locking APIs that are not used in IRQ context. (the one exception | ||
15 | for rwsems is worked around) | ||
16 | |||
17 | architecture support for this is certainly not in the "trivial" | ||
18 | category, because lots of lowlevel assembly code deals with irq-flags | ||
19 | state changes. But an architecture can be irq-flags-tracing enabled in a | ||
20 | rather straightforward and risk-free manner. | ||
21 | |||
22 | Architectures that want to support this need to do a couple of | ||
23 | code-organizational changes first: | ||
24 | |||
25 | - move their irq-flags manipulation code from their asm/system.h header | ||
26 | to asm/irqflags.h | ||
27 | |||
28 | - rename local_irq_disable()/etc to raw_local_irq_disable()/etc. so that | ||
29 | the linux/irqflags.h code can inject callbacks and can construct the | ||
30 | real local_irq_disable()/etc APIs. | ||
31 | |||
32 | - add and enable TRACE_IRQFLAGS_SUPPORT in their arch level Kconfig file | ||
33 | |||
34 | and then a couple of functional changes are needed as well to implement | ||
35 | irq-flags-tracing support: | ||
36 | |||
37 | - in lowlevel entry code add (build-conditional) calls to the | ||
38 | trace_hardirqs_off()/trace_hardirqs_on() functions. The lock validator | ||
39 | closely guards whether the 'real' irq-flags matches the 'virtual' | ||
40 | irq-flags state, and complains loudly (and turns itself off) if the | ||
41 | two do not match. Usually most of the time for arch support for | ||
42 | irq-flags-tracing is spent in this state: look at the lockdep | ||
43 | complaint, try to figure out the assembly code we did not cover yet, | ||
44 | fix and repeat. Once the system has booted up and works without a | ||
45 | lockdep complaint in the irq-flags-tracing functions arch support is | ||
46 | complete. | ||
47 | - if the architecture has non-maskable interrupts then those need to be | ||
48 | excluded from the irq-tracing [and lock validation] mechanism via | ||
49 | lockdep_off()/lockdep_on(). | ||
50 | |||
51 | In general there is no risk from having an incomplete irq-flags-tracing | ||
52 | implementation in an architecture: lockdep will detect that and will | ||
53 | turn itself off. I.e. the lock validator will still be reliable. There | ||
54 | should be no crashes due to irq-tracing bugs (except if the assembly | ||
55 | changes break other code by modifying conditions or registers that | ||
56 | shouldn't be modified). | ||
57 | |||
diff --git a/Documentation/kbuild/kconfig-language.txt b/Documentation/kbuild/kconfig-language.txt index ca1967f36423..003fccc14d24 100644 --- a/Documentation/kbuild/kconfig-language.txt +++ b/Documentation/kbuild/kconfig-language.txt | |||
@@ -67,19 +67,19 @@ applicable everywhere (see syntax). | |||
67 | - default value: "default" <expr> ["if" <expr>] | 67 | - default value: "default" <expr> ["if" <expr>] |
68 | A config option can have any number of default values. If multiple | 68 | A config option can have any number of default values. If multiple |
69 | default values are visible, only the first defined one is active. | 69 | default values are visible, only the first defined one is active. |
70 | Default values are not limited to the menu entry, where they are | 70 | Default values are not limited to the menu entry where they are |
71 | defined, this means the default can be defined somewhere else or be | 71 | defined. This means the default can be defined somewhere else or be |
72 | overridden by an earlier definition. | 72 | overridden by an earlier definition. |
73 | The default value is only assigned to the config symbol if no other | 73 | The default value is only assigned to the config symbol if no other |
74 | value was set by the user (via the input prompt above). If an input | 74 | value was set by the user (via the input prompt above). If an input |
75 | prompt is visible the default value is presented to the user and can | 75 | prompt is visible the default value is presented to the user and can |
76 | be overridden by him. | 76 | be overridden by him. |
77 | Optionally dependencies only for this default value can be added with | 77 | Optionally, dependencies only for this default value can be added with |
78 | "if". | 78 | "if". |
79 | 79 | ||
80 | - dependencies: "depends on"/"requires" <expr> | 80 | - dependencies: "depends on"/"requires" <expr> |
81 | This defines a dependency for this menu entry. If multiple | 81 | This defines a dependency for this menu entry. If multiple |
82 | dependencies are defined they are connected with '&&'. Dependencies | 82 | dependencies are defined, they are connected with '&&'. Dependencies |
83 | are applied to all other options within this menu entry (which also | 83 | are applied to all other options within this menu entry (which also |
84 | accept an "if" expression), so these two examples are equivalent: | 84 | accept an "if" expression), so these two examples are equivalent: |
85 | 85 | ||
@@ -153,7 +153,7 @@ Nonconstant symbols are the most common ones and are defined with the | |||
153 | 'config' statement. Nonconstant symbols consist entirely of alphanumeric | 153 | 'config' statement. Nonconstant symbols consist entirely of alphanumeric |
154 | characters or underscores. | 154 | characters or underscores. |
155 | Constant symbols are only part of expressions. Constant symbols are | 155 | Constant symbols are only part of expressions. Constant symbols are |
156 | always surrounded by single or double quotes. Within the quote any | 156 | always surrounded by single or double quotes. Within the quote, any |
157 | other character is allowed and the quotes can be escaped using '\'. | 157 | other character is allowed and the quotes can be escaped using '\'. |
158 | 158 | ||
159 | Menu structure | 159 | Menu structure |
@@ -237,7 +237,7 @@ choices: | |||
237 | <choice block> | 237 | <choice block> |
238 | "endchoice" | 238 | "endchoice" |
239 | 239 | ||
240 | This defines a choice group and accepts any of above attributes as | 240 | This defines a choice group and accepts any of the above attributes as |
241 | options. A choice can only be of type bool or tristate, while a boolean | 241 | options. A choice can only be of type bool or tristate, while a boolean |
242 | choice only allows a single config entry to be selected, a tristate | 242 | choice only allows a single config entry to be selected, a tristate |
243 | choice also allows any number of config entries to be set to 'm'. This | 243 | choice also allows any number of config entries to be set to 'm'. This |
diff --git a/Documentation/kbuild/makefiles.txt b/Documentation/kbuild/makefiles.txt index a9c00facdf40..e2cbd59cf2d0 100644 --- a/Documentation/kbuild/makefiles.txt +++ b/Documentation/kbuild/makefiles.txt | |||
@@ -22,7 +22,7 @@ This document describes the Linux kernel Makefiles. | |||
22 | === 4 Host Program support | 22 | === 4 Host Program support |
23 | --- 4.1 Simple Host Program | 23 | --- 4.1 Simple Host Program |
24 | --- 4.2 Composite Host Programs | 24 | --- 4.2 Composite Host Programs |
25 | --- 4.3 Defining shared libraries | 25 | --- 4.3 Defining shared libraries |
26 | --- 4.4 Using C++ for host programs | 26 | --- 4.4 Using C++ for host programs |
27 | --- 4.5 Controlling compiler options for host programs | 27 | --- 4.5 Controlling compiler options for host programs |
28 | --- 4.6 When host programs are actually built | 28 | --- 4.6 When host programs are actually built |
@@ -69,7 +69,7 @@ architecture-specific information to the top Makefile. | |||
69 | 69 | ||
70 | Each subdirectory has a kbuild Makefile which carries out the commands | 70 | Each subdirectory has a kbuild Makefile which carries out the commands |
71 | passed down from above. The kbuild Makefile uses information from the | 71 | passed down from above. The kbuild Makefile uses information from the |
72 | .config file to construct various file lists used by kbuild to build | 72 | .config file to construct various file lists used by kbuild to build |
73 | any built-in or modular targets. | 73 | any built-in or modular targets. |
74 | 74 | ||
75 | scripts/Makefile.* contains all the definitions/rules etc. that | 75 | scripts/Makefile.* contains all the definitions/rules etc. that |
@@ -86,7 +86,7 @@ any kernel Makefiles (or any other source files). | |||
86 | 86 | ||
87 | *Normal developers* are people who work on features such as device | 87 | *Normal developers* are people who work on features such as device |
88 | drivers, file systems, and network protocols. These people need to | 88 | drivers, file systems, and network protocols. These people need to |
89 | maintain the kbuild Makefiles for the subsystem that they are | 89 | maintain the kbuild Makefiles for the subsystem they are |
90 | working on. In order to do this effectively, they need some overall | 90 | working on. In order to do this effectively, they need some overall |
91 | knowledge about the kernel Makefiles, plus detailed knowledge about the | 91 | knowledge about the kernel Makefiles, plus detailed knowledge about the |
92 | public interface for kbuild. | 92 | public interface for kbuild. |
@@ -104,10 +104,10 @@ This document is aimed towards normal developers and arch developers. | |||
104 | === 3 The kbuild files | 104 | === 3 The kbuild files |
105 | 105 | ||
106 | Most Makefiles within the kernel are kbuild Makefiles that use the | 106 | Most Makefiles within the kernel are kbuild Makefiles that use the |
107 | kbuild infrastructure. This chapter introduce the syntax used in the | 107 | kbuild infrastructure. This chapter introduces the syntax used in the |
108 | kbuild makefiles. | 108 | kbuild makefiles. |
109 | The preferred name for the kbuild files are 'Makefile' but 'Kbuild' can | 109 | The preferred name for the kbuild files are 'Makefile' but 'Kbuild' can |
110 | be used and if both a 'Makefile' and a 'Kbuild' file exists then the 'Kbuild' | 110 | be used and if both a 'Makefile' and a 'Kbuild' file exists, then the 'Kbuild' |
111 | file will be used. | 111 | file will be used. |
112 | 112 | ||
113 | Section 3.1 "Goal definitions" is a quick intro, further chapters provide | 113 | Section 3.1 "Goal definitions" is a quick intro, further chapters provide |
@@ -124,7 +124,7 @@ more details, with real examples. | |||
124 | Example: | 124 | Example: |
125 | obj-y += foo.o | 125 | obj-y += foo.o |
126 | 126 | ||
127 | This tell kbuild that there is one object in that directory named | 127 | This tells kbuild that there is one object in that directory, named |
128 | foo.o. foo.o will be built from foo.c or foo.S. | 128 | foo.o. foo.o will be built from foo.c or foo.S. |
129 | 129 | ||
130 | If foo.o shall be built as a module, the variable obj-m is used. | 130 | If foo.o shall be built as a module, the variable obj-m is used. |
@@ -140,7 +140,7 @@ more details, with real examples. | |||
140 | --- 3.2 Built-in object goals - obj-y | 140 | --- 3.2 Built-in object goals - obj-y |
141 | 141 | ||
142 | The kbuild Makefile specifies object files for vmlinux | 142 | The kbuild Makefile specifies object files for vmlinux |
143 | in the lists $(obj-y). These lists depend on the kernel | 143 | in the $(obj-y) lists. These lists depend on the kernel |
144 | configuration. | 144 | configuration. |
145 | 145 | ||
146 | Kbuild compiles all the $(obj-y) files. It then calls | 146 | Kbuild compiles all the $(obj-y) files. It then calls |
@@ -154,8 +154,8 @@ more details, with real examples. | |||
154 | Link order is significant, because certain functions | 154 | Link order is significant, because certain functions |
155 | (module_init() / __initcall) will be called during boot in the | 155 | (module_init() / __initcall) will be called during boot in the |
156 | order they appear. So keep in mind that changing the link | 156 | order they appear. So keep in mind that changing the link |
157 | order may e.g. change the order in which your SCSI | 157 | order may e.g. change the order in which your SCSI |
158 | controllers are detected, and thus you disks are renumbered. | 158 | controllers are detected, and thus your disks are renumbered. |
159 | 159 | ||
160 | Example: | 160 | Example: |
161 | #drivers/isdn/i4l/Makefile | 161 | #drivers/isdn/i4l/Makefile |
@@ -203,11 +203,11 @@ more details, with real examples. | |||
203 | Example: | 203 | Example: |
204 | #fs/ext2/Makefile | 204 | #fs/ext2/Makefile |
205 | obj-$(CONFIG_EXT2_FS) += ext2.o | 205 | obj-$(CONFIG_EXT2_FS) += ext2.o |
206 | ext2-y := balloc.o bitmap.o | 206 | ext2-y := balloc.o bitmap.o |
207 | ext2-$(CONFIG_EXT2_FS_XATTR) += xattr.o | 207 | ext2-$(CONFIG_EXT2_FS_XATTR) += xattr.o |
208 | 208 | ||
209 | In this example xattr.o is only part of the composite object | 209 | In this example, xattr.o is only part of the composite object |
210 | ext2.o, if $(CONFIG_EXT2_FS_XATTR) evaluates to 'y'. | 210 | ext2.o if $(CONFIG_EXT2_FS_XATTR) evaluates to 'y'. |
211 | 211 | ||
212 | Note: Of course, when you are building objects into the kernel, | 212 | Note: Of course, when you are building objects into the kernel, |
213 | the syntax above will also work. So, if you have CONFIG_EXT2_FS=y, | 213 | the syntax above will also work. So, if you have CONFIG_EXT2_FS=y, |
@@ -221,16 +221,16 @@ more details, with real examples. | |||
221 | 221 | ||
222 | --- 3.5 Library file goals - lib-y | 222 | --- 3.5 Library file goals - lib-y |
223 | 223 | ||
224 | Objects listed with obj-* are used for modules or | 224 | Objects listed with obj-* are used for modules, or |
225 | combined in a built-in.o for that specific directory. | 225 | combined in a built-in.o for that specific directory. |
226 | There is also the possibility to list objects that will | 226 | There is also the possibility to list objects that will |
227 | be included in a library, lib.a. | 227 | be included in a library, lib.a. |
228 | All objects listed with lib-y are combined in a single | 228 | All objects listed with lib-y are combined in a single |
229 | library for that directory. | 229 | library for that directory. |
230 | Objects that are listed in obj-y and additional listed in | 230 | Objects that are listed in obj-y and additionally listed in |
231 | lib-y will not be included in the library, since they will anyway | 231 | lib-y will not be included in the library, since they will anyway |
232 | be accessible. | 232 | be accessible. |
233 | For consistency objects listed in lib-m will be included in lib.a. | 233 | For consistency, objects listed in lib-m will be included in lib.a. |
234 | 234 | ||
235 | Note that the same kbuild makefile may list files to be built-in | 235 | Note that the same kbuild makefile may list files to be built-in |
236 | and to be part of a library. Therefore the same directory | 236 | and to be part of a library. Therefore the same directory |
@@ -241,11 +241,11 @@ more details, with real examples. | |||
241 | lib-y := checksum.o delay.o | 241 | lib-y := checksum.o delay.o |
242 | 242 | ||
243 | This will create a library lib.a based on checksum.o and delay.o. | 243 | This will create a library lib.a based on checksum.o and delay.o. |
244 | For kbuild to actually recognize that there is a lib.a being build | 244 | For kbuild to actually recognize that there is a lib.a being built, |
245 | the directory shall be listed in libs-y. | 245 | the directory shall be listed in libs-y. |
246 | See also "6.3 List directories to visit when descending". | 246 | See also "6.3 List directories to visit when descending". |
247 | 247 | ||
248 | Usage of lib-y is normally restricted to lib/ and arch/*/lib. | 248 | Use of lib-y is normally restricted to lib/ and arch/*/lib. |
249 | 249 | ||
250 | --- 3.6 Descending down in directories | 250 | --- 3.6 Descending down in directories |
251 | 251 | ||
@@ -255,7 +255,7 @@ more details, with real examples. | |||
255 | invoke make recursively in subdirectories, provided you let it know of | 255 | invoke make recursively in subdirectories, provided you let it know of |
256 | them. | 256 | them. |
257 | 257 | ||
258 | To do so obj-y and obj-m are used. | 258 | To do so, obj-y and obj-m are used. |
259 | ext2 lives in a separate directory, and the Makefile present in fs/ | 259 | ext2 lives in a separate directory, and the Makefile present in fs/ |
260 | tells kbuild to descend down using the following assignment. | 260 | tells kbuild to descend down using the following assignment. |
261 | 261 | ||
@@ -353,8 +353,8 @@ more details, with real examples. | |||
353 | Special rules are used when the kbuild infrastructure does | 353 | Special rules are used when the kbuild infrastructure does |
354 | not provide the required support. A typical example is | 354 | not provide the required support. A typical example is |
355 | header files generated during the build process. | 355 | header files generated during the build process. |
356 | Another example is the architecture specific Makefiles which | 356 | Another example are the architecture specific Makefiles which |
357 | needs special rules to prepare boot images etc. | 357 | need special rules to prepare boot images etc. |
358 | 358 | ||
359 | Special rules are written as normal Make rules. | 359 | Special rules are written as normal Make rules. |
360 | Kbuild is not executing in the directory where the Makefile is | 360 | Kbuild is not executing in the directory where the Makefile is |
@@ -387,28 +387,47 @@ more details, with real examples. | |||
387 | 387 | ||
388 | --- 3.11 $(CC) support functions | 388 | --- 3.11 $(CC) support functions |
389 | 389 | ||
390 | The kernel may be build with several different versions of | 390 | The kernel may be built with several different versions of |
391 | $(CC), each supporting a unique set of features and options. | 391 | $(CC), each supporting a unique set of features and options. |
392 | kbuild provides basic support to check for valid options for $(CC). | 392 | kbuild provides basic support to check for valid options for $(CC). |
393 | $(CC) is usually the gcc compiler, but other alternatives are | 393 | $(CC) is usually the gcc compiler, but other alternatives are |
394 | available. | 394 | available. |
395 | 395 | ||
396 | as-option | 396 | as-option |
397 | as-option is used to check if $(CC) when used to compile | 397 | as-option is used to check if $(CC) -- when used to compile |
398 | assembler (*.S) files supports the given option. An optional | 398 | assembler (*.S) files -- supports the given option. An optional |
399 | second option may be specified if first option are not supported. | 399 | second option may be specified if the first option is not supported. |
400 | 400 | ||
401 | Example: | 401 | Example: |
402 | #arch/sh/Makefile | 402 | #arch/sh/Makefile |
403 | cflags-y += $(call as-option,-Wa$(comma)-isa=$(isa-y),) | 403 | cflags-y += $(call as-option,-Wa$(comma)-isa=$(isa-y),) |
404 | 404 | ||
405 | In the above example cflags-y will be assinged the the option | 405 | In the above example, cflags-y will be assigned the option |
406 | -Wa$(comma)-isa=$(isa-y) if it is supported by $(CC). | 406 | -Wa$(comma)-isa=$(isa-y) if it is supported by $(CC). |
407 | The second argument is optional, and if supplied will be used | 407 | The second argument is optional, and if supplied will be used |
408 | if first argument is not supported. | 408 | if first argument is not supported. |
409 | 409 | ||
410 | ld-option | ||
411 | ld-option is used to check if $(CC) -- when used to link object files -- | ||
412 | supports the given option. An optional second option may be | ||
413 | specified if the first option is not supported. | ||
414 | |||
415 | Example: | ||
416 | #arch/i386/kernel/Makefile | ||
417 | vsyscall-flags += $(call ld-option, -Wl$(comma)--hash-style=sysv) | ||
418 | |||
419 | In the above example, vsyscall-flags will be assigned the option | ||
420 | -Wl$(comma)--hash-style=sysv if it is supported by $(CC). | ||
421 | The second argument is optional, and if supplied will be used | ||
422 | if the first argument is not supported. | ||
423 | |||
424 | as-instr | ||
425 | as-instr checks if the assembler accepts a specific instruction | ||
426 | and then outputs either option1 or option2. | ||
427 | C escapes are supported in the test instruction. | ||
428 | |||
410 | cc-option | 429 | cc-option |
411 | cc-option is used to check if $(CC) support a given option, and not | 430 | cc-option is used to check if $(CC) supports a given option, and if not |
412 | supported to use an optional second option. | 431 | supported to use an optional second option. |
413 | 432 | ||
414 | Example: | 433 | Example: |
@@ -416,12 +435,12 @@ more details, with real examples. | |||
416 | cflags-y += $(call cc-option,-march=pentium-mmx,-march=i586) | 435 | cflags-y += $(call cc-option,-march=pentium-mmx,-march=i586) |
417 | 436 | ||
418 | In the above example cflags-y will be assigned the option | 437 | In the above example cflags-y will be assigned the option |
419 | -march=pentium-mmx if supported by $(CC), otherwise -march-i586. | 438 | -march=pentium-mmx if supported by $(CC), otherwise -march=i586. |
420 | The second argument to cc-option is optional, and if omitted | 439 | The second argument to cc-option is optional, and if omitted, |
421 | cflags-y will be assigned no value if first option is not supported. | 440 | cflags-y will be assigned no value if first option is not supported. |
422 | 441 | ||
423 | cc-option-yn | 442 | cc-option-yn |
424 | cc-option-yn is used to check if gcc supports a given option | 443 | cc-option-yn is used to check if gcc supports a given option |
425 | and return 'y' if supported, otherwise 'n'. | 444 | and return 'y' if supported, otherwise 'n'. |
426 | 445 | ||
427 | Example: | 446 | Example: |
@@ -429,32 +448,33 @@ more details, with real examples. | |||
429 | biarch := $(call cc-option-yn, -m32) | 448 | biarch := $(call cc-option-yn, -m32) |
430 | aflags-$(biarch) += -a32 | 449 | aflags-$(biarch) += -a32 |
431 | cflags-$(biarch) += -m32 | 450 | cflags-$(biarch) += -m32 |
432 | 451 | ||
433 | In the above example $(biarch) is set to y if $(CC) supports the -m32 | 452 | In the above example, $(biarch) is set to y if $(CC) supports the -m32 |
434 | option. When $(biarch) equals to y the expanded variables $(aflags-y) | 453 | option. When $(biarch) equals 'y', the expanded variables $(aflags-y) |
435 | and $(cflags-y) will be assigned the values -a32 and -m32. | 454 | and $(cflags-y) will be assigned the values -a32 and -m32, |
455 | respectively. | ||
436 | 456 | ||
437 | cc-option-align | 457 | cc-option-align |
438 | gcc version >= 3.0 shifted type of options used to speify | 458 | gcc versions >= 3.0 changed the type of options used to specify |
439 | alignment of functions, loops etc. $(cc-option-align) whrn used | 459 | alignment of functions, loops etc. $(cc-option-align), when used |
440 | as prefix to the align options will select the right prefix: | 460 | as prefix to the align options, will select the right prefix: |
441 | gcc < 3.00 | 461 | gcc < 3.00 |
442 | cc-option-align = -malign | 462 | cc-option-align = -malign |
443 | gcc >= 3.00 | 463 | gcc >= 3.00 |
444 | cc-option-align = -falign | 464 | cc-option-align = -falign |
445 | 465 | ||
446 | Example: | 466 | Example: |
447 | CFLAGS += $(cc-option-align)-functions=4 | 467 | CFLAGS += $(cc-option-align)-functions=4 |
448 | 468 | ||
449 | In the above example the option -falign-functions=4 is used for | 469 | In the above example, the option -falign-functions=4 is used for |
450 | gcc >= 3.00. For gcc < 3.00 -malign-functions=4 is used. | 470 | gcc >= 3.00. For gcc < 3.00, -malign-functions=4 is used. |
451 | 471 | ||
452 | cc-version | 472 | cc-version |
453 | cc-version return a numerical version of the $(CC) compiler version. | 473 | cc-version returns a numerical version of the $(CC) compiler version. |
454 | The format is <major><minor> where both are two digits. So for example | 474 | The format is <major><minor> where both are two digits. So for example |
455 | gcc 3.41 would return 0341. | 475 | gcc 3.41 would return 0341. |
456 | cc-version is useful when a specific $(CC) version is faulty in one | 476 | cc-version is useful when a specific $(CC) version is faulty in one |
457 | area, for example the -mregparm=3 were broken in some gcc version | 477 | area, for example -mregparm=3 was broken in some gcc versions |
458 | even though the option was accepted by gcc. | 478 | even though the option was accepted by gcc. |
459 | 479 | ||
460 | Example: | 480 | Example: |
@@ -463,20 +483,20 @@ more details, with real examples. | |||
463 | if [ $(call cc-version) -ge 0300 ] ; then \ | 483 | if [ $(call cc-version) -ge 0300 ] ; then \ |
464 | echo "-mregparm=3"; fi ;) | 484 | echo "-mregparm=3"; fi ;) |
465 | 485 | ||
466 | In the above example -mregparm=3 is only used for gcc version greater | 486 | In the above example, -mregparm=3 is only used for gcc version greater |
467 | than or equal to gcc 3.0. | 487 | than or equal to gcc 3.0. |
468 | 488 | ||
469 | cc-ifversion | 489 | cc-ifversion |
470 | cc-ifversion test the version of $(CC) and equals last argument if | 490 | cc-ifversion tests the version of $(CC) and equals last argument if |
471 | version expression is true. | 491 | version expression is true. |
472 | 492 | ||
473 | Example: | 493 | Example: |
474 | #fs/reiserfs/Makefile | 494 | #fs/reiserfs/Makefile |
475 | EXTRA_CFLAGS := $(call cc-ifversion, -lt, 0402, -O1) | 495 | EXTRA_CFLAGS := $(call cc-ifversion, -lt, 0402, -O1) |
476 | 496 | ||
477 | In this example EXTRA_CFLAGS will be assigned the value -O1 if the | 497 | In this example, EXTRA_CFLAGS will be assigned the value -O1 if the |
478 | $(CC) version is less than 4.2. | 498 | $(CC) version is less than 4.2. |
479 | cc-ifversion takes all the shell operators: | 499 | cc-ifversion takes all the shell operators: |
480 | -eq, -ne, -lt, -le, -gt, and -ge | 500 | -eq, -ne, -lt, -le, -gt, and -ge |
481 | The third parameter may be a text as in this example, but it may also | 501 | The third parameter may be a text as in this example, but it may also |
482 | be an expanded variable or a macro. | 502 | be an expanded variable or a macro. |
@@ -492,7 +512,7 @@ The first step is to tell kbuild that a host program exists. This is | |||
492 | done utilising the variable hostprogs-y. | 512 | done utilising the variable hostprogs-y. |
493 | 513 | ||
494 | The second step is to add an explicit dependency to the executable. | 514 | The second step is to add an explicit dependency to the executable. |
495 | This can be done in two ways. Either add the dependency in a rule, | 515 | This can be done in two ways. Either add the dependency in a rule, |
496 | or utilise the variable $(always). | 516 | or utilise the variable $(always). |
497 | Both possibilities are described in the following. | 517 | Both possibilities are described in the following. |
498 | 518 | ||
@@ -509,28 +529,28 @@ Both possibilities are described in the following. | |||
509 | Kbuild assumes in the above example that bin2hex is made from a single | 529 | Kbuild assumes in the above example that bin2hex is made from a single |
510 | c-source file named bin2hex.c located in the same directory as | 530 | c-source file named bin2hex.c located in the same directory as |
511 | the Makefile. | 531 | the Makefile. |
512 | 532 | ||
513 | --- 4.2 Composite Host Programs | 533 | --- 4.2 Composite Host Programs |
514 | 534 | ||
515 | Host programs can be made up based on composite objects. | 535 | Host programs can be made up based on composite objects. |
516 | The syntax used to define composite objects for host programs is | 536 | The syntax used to define composite objects for host programs is |
517 | similar to the syntax used for kernel objects. | 537 | similar to the syntax used for kernel objects. |
518 | $(<executeable>-objs) list all objects used to link the final | 538 | $(<executeable>-objs) lists all objects used to link the final |
519 | executable. | 539 | executable. |
520 | 540 | ||
521 | Example: | 541 | Example: |
522 | #scripts/lxdialog/Makefile | 542 | #scripts/lxdialog/Makefile |
523 | hostprogs-y := lxdialog | 543 | hostprogs-y := lxdialog |
524 | lxdialog-objs := checklist.o lxdialog.o | 544 | lxdialog-objs := checklist.o lxdialog.o |
525 | 545 | ||
526 | Objects with extension .o are compiled from the corresponding .c | 546 | Objects with extension .o are compiled from the corresponding .c |
527 | files. In the above example checklist.c is compiled to checklist.o | 547 | files. In the above example, checklist.c is compiled to checklist.o |
528 | and lxdialog.c is compiled to lxdialog.o. | 548 | and lxdialog.c is compiled to lxdialog.o. |
529 | Finally the two .o files are linked to the executable, lxdialog. | 549 | Finally, the two .o files are linked to the executable, lxdialog. |
530 | Note: The syntax <executable>-y is not permitted for host-programs. | 550 | Note: The syntax <executable>-y is not permitted for host-programs. |
531 | 551 | ||
532 | --- 4.3 Defining shared libraries | 552 | --- 4.3 Defining shared libraries |
533 | 553 | ||
534 | Objects with extension .so are considered shared libraries, and | 554 | Objects with extension .so are considered shared libraries, and |
535 | will be compiled as position independent objects. | 555 | will be compiled as position independent objects. |
536 | Kbuild provides support for shared libraries, but the usage | 556 | Kbuild provides support for shared libraries, but the usage |
@@ -543,7 +563,7 @@ Both possibilities are described in the following. | |||
543 | hostprogs-y := conf | 563 | hostprogs-y := conf |
544 | conf-objs := conf.o libkconfig.so | 564 | conf-objs := conf.o libkconfig.so |
545 | libkconfig-objs := expr.o type.o | 565 | libkconfig-objs := expr.o type.o |
546 | 566 | ||
547 | Shared libraries always require a corresponding -objs line, and | 567 | Shared libraries always require a corresponding -objs line, and |
548 | in the example above the shared library libkconfig is composed by | 568 | in the example above the shared library libkconfig is composed by |
549 | the two objects expr.o and type.o. | 569 | the two objects expr.o and type.o. |
@@ -564,7 +584,7 @@ Both possibilities are described in the following. | |||
564 | 584 | ||
565 | In the example above the executable is composed of the C++ file | 585 | In the example above the executable is composed of the C++ file |
566 | qconf.cc - identified by $(qconf-cxxobjs). | 586 | qconf.cc - identified by $(qconf-cxxobjs). |
567 | 587 | ||
568 | If qconf is composed by a mixture of .c and .cc files, then an | 588 | If qconf is composed by a mixture of .c and .cc files, then an |
569 | additional line can be used to identify this. | 589 | additional line can be used to identify this. |
570 | 590 | ||
@@ -573,34 +593,35 @@ Both possibilities are described in the following. | |||
573 | hostprogs-y := qconf | 593 | hostprogs-y := qconf |
574 | qconf-cxxobjs := qconf.o | 594 | qconf-cxxobjs := qconf.o |
575 | qconf-objs := check.o | 595 | qconf-objs := check.o |
576 | 596 | ||
577 | --- 4.5 Controlling compiler options for host programs | 597 | --- 4.5 Controlling compiler options for host programs |
578 | 598 | ||
579 | When compiling host programs, it is possible to set specific flags. | 599 | When compiling host programs, it is possible to set specific flags. |
580 | The programs will always be compiled utilising $(HOSTCC) passed | 600 | The programs will always be compiled utilising $(HOSTCC) passed |
581 | the options specified in $(HOSTCFLAGS). | 601 | the options specified in $(HOSTCFLAGS). |
582 | To set flags that will take effect for all host programs created | 602 | To set flags that will take effect for all host programs created |
583 | in that Makefile use the variable HOST_EXTRACFLAGS. | 603 | in that Makefile, use the variable HOST_EXTRACFLAGS. |
584 | 604 | ||
585 | Example: | 605 | Example: |
586 | #scripts/lxdialog/Makefile | 606 | #scripts/lxdialog/Makefile |
587 | HOST_EXTRACFLAGS += -I/usr/include/ncurses | 607 | HOST_EXTRACFLAGS += -I/usr/include/ncurses |
588 | 608 | ||
589 | To set specific flags for a single file the following construction | 609 | To set specific flags for a single file the following construction |
590 | is used: | 610 | is used: |
591 | 611 | ||
592 | Example: | 612 | Example: |
593 | #arch/ppc64/boot/Makefile | 613 | #arch/ppc64/boot/Makefile |
594 | HOSTCFLAGS_piggyback.o := -DKERNELBASE=$(KERNELBASE) | 614 | HOSTCFLAGS_piggyback.o := -DKERNELBASE=$(KERNELBASE) |
595 | 615 | ||
596 | It is also possible to specify additional options to the linker. | 616 | It is also possible to specify additional options to the linker. |
597 | 617 | ||
598 | Example: | 618 | Example: |
599 | #scripts/kconfig/Makefile | 619 | #scripts/kconfig/Makefile |
600 | HOSTLOADLIBES_qconf := -L$(QTDIR)/lib | 620 | HOSTLOADLIBES_qconf := -L$(QTDIR)/lib |
601 | 621 | ||
602 | When linking qconf it will be passed the extra option "-L$(QTDIR)/lib". | 622 | When linking qconf, it will be passed the extra option |
603 | 623 | "-L$(QTDIR)/lib". | |
624 | |||
604 | --- 4.6 When host programs are actually built | 625 | --- 4.6 When host programs are actually built |
605 | 626 | ||
606 | Kbuild will only build host-programs when they are referenced | 627 | Kbuild will only build host-programs when they are referenced |
@@ -615,7 +636,7 @@ Both possibilities are described in the following. | |||
615 | $(obj)/devlist.h: $(src)/pci.ids $(obj)/gen-devlist | 636 | $(obj)/devlist.h: $(src)/pci.ids $(obj)/gen-devlist |
616 | ( cd $(obj); ./gen-devlist ) < $< | 637 | ( cd $(obj); ./gen-devlist ) < $< |
617 | 638 | ||
618 | The target $(obj)/devlist.h will not be built before | 639 | The target $(obj)/devlist.h will not be built before |
619 | $(obj)/gen-devlist is updated. Note that references to | 640 | $(obj)/gen-devlist is updated. Note that references to |
620 | the host programs in special rules must be prefixed with $(obj). | 641 | the host programs in special rules must be prefixed with $(obj). |
621 | 642 | ||
@@ -634,7 +655,7 @@ Both possibilities are described in the following. | |||
634 | 655 | ||
635 | --- 4.7 Using hostprogs-$(CONFIG_FOO) | 656 | --- 4.7 Using hostprogs-$(CONFIG_FOO) |
636 | 657 | ||
637 | A typcal pattern in a Kbuild file lok like this: | 658 | A typical pattern in a Kbuild file looks like this: |
638 | 659 | ||
639 | Example: | 660 | Example: |
640 | #scripts/Makefile | 661 | #scripts/Makefile |
@@ -642,13 +663,13 @@ Both possibilities are described in the following. | |||
642 | 663 | ||
643 | Kbuild knows about both 'y' for built-in and 'm' for module. | 664 | Kbuild knows about both 'y' for built-in and 'm' for module. |
644 | So if a config symbol evaluates to 'm', kbuild will still build | 665 | So if a config symbol evaluates to 'm', kbuild will still build |
645 | the binary. In other words Kbuild handle hostprogs-m exactly | 666 | the binary. In other words, Kbuild handles hostprogs-m exactly |
646 | like hostprogs-y. But only hostprogs-y is recommend used | 667 | like hostprogs-y. But only hostprogs-y is recommended to be used |
647 | when no CONFIG symbol are involved. | 668 | when no CONFIG symbols are involved. |
648 | 669 | ||
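A minimal sketch of the pattern described above. CONFIG_FOO, gen_foo and gen_bar are hypothetical names used only for illustration; they do not come from the kernel tree.

```makefile
# Kbuild fragment (hypothetical names).
# CONFIG_FOO=y      -> hostprogs-y += gen_foo
# CONFIG_FOO=m      -> hostprogs-m += gen_foo (handled like hostprogs-y)
# CONFIG_FOO unset  -> "hostprogs-" += gen_foo, a variable kbuild never reads
hostprogs-$(CONFIG_FOO) += gen_foo

# No CONFIG symbol involved, so hostprogs-y is used directly.
hostprogs-y += gen_bar

# Ask kbuild to build the listed host programs even when no other
# rule references them as a prerequisite.
always := $(hostprogs-y) $(hostprogs-m)
```

Because the value of the config symbol becomes part of the variable name, the same line covers the built-in, module and disabled cases.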
649 | === 5 Kbuild clean infrastructure | 670 | === 5 Kbuild clean infrastructure |
650 | 671 | ||
651 | "make clean" deletes most generated files in the src tree where the kernel | 672 | "make clean" deletes most generated files in the obj tree where the kernel |
652 | is compiled. This includes generated files such as host programs. | 673 | is compiled. This includes generated files such as host programs. |
653 | Kbuild knows targets listed in $(hostprogs-y), $(hostprogs-m), $(always), | 674 | Kbuild knows targets listed in $(hostprogs-y), $(hostprogs-m), $(always), |
654 | $(extra-y) and $(targets). They are all deleted during "make clean". | 675 | $(extra-y) and $(targets). They are all deleted during "make clean". |
@@ -666,7 +687,8 @@ When executing "make clean", the two files "devlist.h classlist.h" will | |||
666 | be deleted. Kbuild will assume files to be in same relative directory as the | 687 | be deleted. Kbuild will assume files to be in same relative directory as the |
667 | Makefile except if an absolute path is specified (path starting with '/'). | 688 | Makefile except if an absolute path is specified (path starting with '/'). |
668 | 689 | ||
669 | To delete a directory hirachy use: | 690 | To delete a directory hierarchy use: |
691 | |||
670 | Example: | 692 | Example: |
671 | #scripts/package/Makefile | 693 | #scripts/package/Makefile |
672 | clean-dirs := $(objtree)/debian/ | 694 | clean-dirs := $(objtree)/debian/ |
@@ -709,29 +731,29 @@ be visited during "make clean". | |||
709 | 731 | ||
710 | The top level Makefile sets up the environment and does the preparation, | 732 | The top level Makefile sets up the environment and does the preparation, |
711 | before starting to descend down in the individual directories. | 733 | before starting to descend down in the individual directories. |
712 | The top level makefile contains the generic part, whereas the | 734 | The top level makefile contains the generic part, whereas |
713 | arch/$(ARCH)/Makefile contains what is required to set-up kbuild | 735 | arch/$(ARCH)/Makefile contains what is required to set up kbuild |
714 | to the said architecture. | 736 | for said architecture. |
715 | To do so arch/$(ARCH)/Makefile sets a number of variables, and defines | 737 | To do so, arch/$(ARCH)/Makefile sets up a number of variables and defines |
716 | a few targets. | 738 | a few targets. |
717 | 739 | ||
718 | When kbuild executes the following steps are followed (roughly): | 740 | When kbuild executes, the following steps are followed (roughly): |
719 | 1) Configuration of the kernel => produced .config | 741 | 1) Configuration of the kernel => produce .config |
720 | 2) Store kernel version in include/linux/version.h | 742 | 2) Store kernel version in include/linux/version.h |
721 | 3) Symlink include/asm to include/asm-$(ARCH) | 743 | 3) Symlink include/asm to include/asm-$(ARCH) |
722 | 4) Updating all other prerequisites to the target prepare: | 744 | 4) Updating all other prerequisites to the target prepare: |
723 | - Additional prerequisites are specified in arch/$(ARCH)/Makefile | 745 | - Additional prerequisites are specified in arch/$(ARCH)/Makefile |
724 | 5) Recursively descend down in all directories listed in | 746 | 5) Recursively descend down in all directories listed in |
725 | init-* core* drivers-* net-* libs-* and build all targets. | 747 | init-* core* drivers-* net-* libs-* and build all targets. |
726 | - The value of the above variables are extended in arch/$(ARCH)/Makefile. | 748 | - The values of the above variables are expanded in arch/$(ARCH)/Makefile. |
727 | 6) All object files are then linked and the resulting file vmlinux is | 749 | 6) All object files are then linked and the resulting file vmlinux is |
728 | located at the root of the src tree. | 750 | located at the root of the obj tree. |
729 | The very first objects linked are listed in head-y, assigned by | 751 | The very first objects linked are listed in head-y, assigned by |
730 | arch/$(ARCH)/Makefile. | 752 | arch/$(ARCH)/Makefile. |
731 | 7) Finally the architecture specific part does any required post processing | 753 | 7) Finally, the architecture specific part does any required post processing |
732 | and builds the final bootimage. | 754 | and builds the final bootimage. |
733 | - This includes building boot records | 755 | - This includes building boot records |
734 | - Preparing initrd images and the like | 756 | - Preparing initrd images and the like |
735 | 757 | ||
736 | 758 | ||
737 | --- 6.1 Set variables to tweak the build to the architecture | 759 | --- 6.1 Set variables to tweak the build to the architecture |
@@ -746,7 +768,7 @@ When kbuild executes the following steps are followed (roughly): | |||
746 | LDFLAGS := -m elf_s390 | 768 | LDFLAGS := -m elf_s390 |
747 | Note: EXTRA_LDFLAGS and LDFLAGS_$@ can be used to further customise | 769 | Note: EXTRA_LDFLAGS and LDFLAGS_$@ can be used to further customise |
748 | the flags used. See chapter 7. | 770 | the flags used. See chapter 7. |
749 | 771 | ||
750 | LDFLAGS_MODULE Options for $(LD) when linking modules | 772 | LDFLAGS_MODULE Options for $(LD) when linking modules |
751 | 773 | ||
752 | LDFLAGS_MODULE is used to set specific flags for $(LD) when | 774 | LDFLAGS_MODULE is used to set specific flags for $(LD) when |
@@ -756,7 +778,7 @@ When kbuild executes the following steps are followed (roughly): | |||
756 | LDFLAGS_vmlinux Options for $(LD) when linking vmlinux | 778 | LDFLAGS_vmlinux Options for $(LD) when linking vmlinux |
757 | 779 | ||
758 | LDFLAGS_vmlinux is used to specify additional flags to pass to | 780 | LDFLAGS_vmlinux is used to specify additional flags to pass to |
759 | the linker when linking the final vmlinux. | 781 | the linker when linking the final vmlinux image. |
760 | LDFLAGS_vmlinux uses the LDFLAGS_$@ support. | 782 | LDFLAGS_vmlinux uses the LDFLAGS_$@ support. |
761 | 783 | ||
762 | Example: | 784 | Example: |
@@ -766,7 +788,7 @@ When kbuild executes the following steps are followed (roughly): | |||
766 | OBJCOPYFLAGS objcopy flags | 788 | OBJCOPYFLAGS objcopy flags |
767 | 789 | ||
768 | When $(call if_changed,objcopy) is used to translate a .o file, | 790 | When $(call if_changed,objcopy) is used to translate a .o file, |
769 | then the flags specified in OBJCOPYFLAGS will be used. | 791 | the flags specified in OBJCOPYFLAGS will be used. |
770 | $(call if_changed,objcopy) is often used to generate raw binaries on | 792 | $(call if_changed,objcopy) is often used to generate raw binaries on |
771 | vmlinux. | 793 | vmlinux. |
772 | 794 | ||
@@ -778,7 +800,7 @@ When kbuild executes the following steps are followed (roughly): | |||
778 | $(obj)/image: vmlinux FORCE | 800 | $(obj)/image: vmlinux FORCE |
779 | $(call if_changed,objcopy) | 801 | $(call if_changed,objcopy) |
780 | 802 | ||
781 | In this example the binary $(obj)/image is a binary version of | 803 | In this example, the binary $(obj)/image is a binary version of |
782 | vmlinux. The usage of $(call if_changed,xxx) will be described later. | 804 | vmlinux. The usage of $(call if_changed,xxx) will be described later. |
783 | 805 | ||
784 | AFLAGS $(AS) assembler flags | 806 | AFLAGS $(AS) assembler flags |
@@ -795,7 +817,7 @@ When kbuild executes the following steps are followed (roughly): | |||
795 | Default value - see top level Makefile | 817 | Default value - see top level Makefile |
796 | Append or modify as required per architecture. | 818 | Append or modify as required per architecture. |
797 | 819 | ||
798 | Often the CFLAGS variable depends on the configuration. | 820 | Often, the CFLAGS variable depends on the configuration. |
799 | 821 | ||
800 | Example: | 822 | Example: |
801 | #arch/i386/Makefile | 823 | #arch/i386/Makefile |
@@ -816,7 +838,7 @@ When kbuild executes the following steps are followed (roughly): | |||
816 | ... | 838 | ... |
817 | 839 | ||
818 | 840 | ||
819 | The first examples utilises the trick that a config option expands | 841 | The first example utilises the trick that a config option expands |
820 | to 'y' when selected. | 842 | to 'y' when selected. |
821 | 843 | ||
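The trick mentioned above can be sketched as follows. CONFIG_FOO_CPU and the flag -march=foo are placeholders chosen for illustration, not real kernel options or compiler flags.

```makefile
# Arch makefile fragment (placeholder option and flag).
# With CONFIG_FOO_CPU=y the flag lands in cflags-y.
# With CONFIG_FOO_CPU unset it lands in "cflags-", which is never used.
cflags-$(CONFIG_FOO_CPU) += -march=foo

# Only the flags collected for selected options reach the compiler.
CFLAGS += $(cflags-y)
```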
822 | CFLAGS_KERNEL $(CC) options specific for built-in | 844 | CFLAGS_KERNEL $(CC) options specific for built-in |
@@ -829,18 +851,18 @@ When kbuild executes the following steps are followed (roughly): | |||
829 | $(CFLAGS_MODULE) contains extra C compiler flags used to compile code | 851 | $(CFLAGS_MODULE) contains extra C compiler flags used to compile code |
830 | for loadable kernel modules. | 852 | for loadable kernel modules. |
831 | 853 | ||
832 | 854 | ||
833 | --- 6.2 Add prerequisites to archprepare: | 855 | --- 6.2 Add prerequisites to archprepare: |
834 | 856 | ||
835 | The archprepare: rule is used to list prerequisites that needs to be | 857 | The archprepare: rule is used to list prerequisites that need to be |
836 | built before starting to descend down in the subdirectories. | 858 | built before starting to descend down in the subdirectories. |
837 | This is usual header files containing assembler constants. | 859 | This is usually used for header files containing assembler constants. |
838 | 860 | ||
839 | Example: | 861 | Example: |
840 | #arch/arm/Makefile | 862 | #arch/arm/Makefile |
841 | archprepare: maketools | 863 | archprepare: maketools |
842 | 864 | ||
843 | In this example the file target maketools will be processed | 865 | In this example, the file target maketools will be processed |
844 | before descending down in the subdirectories. | 866 | before descending down in the subdirectories. |
845 | See also chapter XXX-TODO that describes how kbuild supports | 867 | See also chapter XXX-TODO that describes how kbuild supports |
846 | generating offset header files. | 868 | generating offset header files. |
@@ -853,18 +875,19 @@ When kbuild executes the following steps are followed (roughly): | |||
853 | corresponding arch-specific section for modules; the module-building | 875 | corresponding arch-specific section for modules; the module-building |
854 | machinery is all architecture-independent. | 876 | machinery is all architecture-independent. |
855 | 877 | ||
856 | 878 | ||
857 | head-y, init-y, core-y, libs-y, drivers-y, net-y | 879 | head-y, init-y, core-y, libs-y, drivers-y, net-y |
858 | 880 | ||
859 | $(head-y) list objects to be linked first in vmlinux. | 881 | $(head-y) lists objects to be linked first in vmlinux. |
860 | $(libs-y) list directories where a lib.a archive can be located. | 882 | $(libs-y) lists directories where a lib.a archive can be located. |
861 | The rest list directories where a built-in.o object file can be located. | 883 | The rest lists directories where a built-in.o object file can be |
884 | located. | ||
862 | 885 | ||
863 | $(init-y) objects will be located after $(head-y). | 886 | $(init-y) objects will be located after $(head-y). |
864 | Then the rest follows in this order: | 887 | Then the rest follows in this order: |
865 | $(core-y), $(libs-y), $(drivers-y) and $(net-y). | 888 | $(core-y), $(libs-y), $(drivers-y) and $(net-y). |
866 | 889 | ||
867 | The top level Makefile define values for all generic directories, | 890 | The top level Makefile defines values for all generic directories, |
868 | and arch/$(ARCH)/Makefile only adds architecture specific directories. | 891 | and arch/$(ARCH)/Makefile only adds architecture specific directories. |
869 | 892 | ||
870 | Example: | 893 | Example: |
@@ -901,27 +924,27 @@ When kbuild executes the following steps are followed (roughly): | |||
901 | "$(Q)$(MAKE) $(build)=<dir>" is the recommended way to invoke | 924 | "$(Q)$(MAKE) $(build)=<dir>" is the recommended way to invoke |
902 | make in a subdirectory. | 925 | make in a subdirectory. |
903 | 926 | ||
904 | There are no rules for naming of the architecture specific targets, | 927 | There are no rules for naming architecture specific targets, |
905 | but executing "make help" will list all relevant targets. | 928 | but executing "make help" will list all relevant targets. |
906 | To support this $(archhelp) must be defined. | 929 | To support this, $(archhelp) must be defined. |
907 | 930 | ||
908 | Example: | 931 | Example: |
909 | #arch/i386/Makefile | 932 | #arch/i386/Makefile |
910 | define archhelp | 933 | define archhelp |
911 | echo '* bzImage - Image (arch/$(ARCH)/boot/bzImage)' | 934 | echo '* bzImage - Image (arch/$(ARCH)/boot/bzImage)' |
912 | endef | 935 | endef |
913 | 936 | ||
914 | When make is executed without arguments, the first goal encountered | 937 | When make is executed without arguments, the first goal encountered |
915 | will be built. In the top level Makefile the first goal present | 938 | will be built. In the top level Makefile the first goal present |
916 | is all:. | 939 | is all:. |
917 | An architecture shall always per default build a bootable image. | 940 | An architecture shall always, per default, build a bootable image. |
918 | In "make help" the default goal is highlighted with a '*'. | 941 | In "make help", the default goal is highlighted with a '*'. |
919 | Add a new prerequisite to all: to select a default goal different | 942 | Add a new prerequisite to all: to select a default goal different |
920 | from vmlinux. | 943 | from vmlinux. |
921 | 944 | ||
922 | Example: | 945 | Example: |
923 | #arch/i386/Makefile | 946 | #arch/i386/Makefile |
924 | all: bzImage | 947 | all: bzImage |
925 | 948 | ||
926 | When "make" is executed without arguments, bzImage will be built. | 949 | When "make" is executed without arguments, bzImage will be built. |
927 | 950 | ||
@@ -941,10 +964,10 @@ When kbuild executes the following steps are followed (roughly): | |||
941 | #arch/i386/kernel/Makefile | 964 | #arch/i386/kernel/Makefile |
942 | extra-y := head.o init_task.o | 965 | extra-y := head.o init_task.o |
943 | 966 | ||
944 | In this example extra-y is used to list object files that | 967 | In this example, extra-y is used to list object files that |
945 | shall be built, but shall not be linked as part of built-in.o. | 968 | shall be built, but shall not be linked as part of built-in.o. |
946 | 969 | ||
947 | 970 | ||
948 | --- 6.6 Commands useful for building a boot image | 971 | --- 6.6 Commands useful for building a boot image |
949 | 972 | ||
950 | Kbuild provides a few macros that are useful when building a | 973 | Kbuild provides a few macros that are useful when building a |
@@ -958,8 +981,8 @@ When kbuild executes the following steps are followed (roughly): | |||
958 | target: source(s) FORCE | 981 | target: source(s) FORCE |
959 | $(call if_changed,ld/objcopy/gzip) | 982 | $(call if_changed,ld/objcopy/gzip) |
960 | 983 | ||
961 | When the rule is evaluated it is checked to see if any files | 984 | When the rule is evaluated, it is checked to see if any files |
962 | needs an update, or the commandline has changed since last | 985 | need an update, or the command line has changed since the last |
963 | invocation. The latter will force a rebuild if any options | 986 | invocation. The latter will force a rebuild if any options |
964 | to the executable have changed. | 987 | to the executable have changed. |
965 | Any target that utilises if_changed must be listed in $(targets), | 988 | Any target that utilises if_changed must be listed in $(targets), |
@@ -977,8 +1000,8 @@ When kbuild executes the following steps are followed (roughly): | |||
977 | #WRONG!# $(call if_changed, ld/objcopy/gzip) | 1000 | #WRONG!# $(call if_changed, ld/objcopy/gzip) |
978 | 1001 | ||
979 | ld | 1002 | ld |
980 | Link target. Often LDFLAGS_$@ is used to set specific options to ld. | 1003 | Link target. Often, LDFLAGS_$@ is used to set specific options to ld. |
981 | 1004 | ||
982 | objcopy | 1005 | objcopy |
983 | Copy binary. Uses OBJCOPYFLAGS usually specified in | 1006 | Copy binary. Uses OBJCOPYFLAGS usually specified in |
984 | arch/$(ARCH)/Makefile. | 1007 | arch/$(ARCH)/Makefile. |
@@ -996,10 +1019,10 @@ When kbuild executes the following steps are followed (roughly): | |||
996 | $(obj)/setup $(obj)/bootsect: %: %.o FORCE | 1019 | $(obj)/setup $(obj)/bootsect: %: %.o FORCE |
997 | $(call if_changed,ld) | 1020 | $(call if_changed,ld) |
998 | 1021 | ||
999 | In this example there are two possible targets, requiring different | 1022 | In this example, there are two possible targets, requiring different |
1000 | options to the linker. the linker options are specified using the | 1023 | options to the linker. The linker options are specified using the |
1001 | LDFLAGS_$@ syntax - one for each potential target. | 1024 | LDFLAGS_$@ syntax - one for each potential target. |
1002 | $(targets) are assinged all potential targets, herby kbuild knows | 1025 | $(targets) are assigned all potential targets, by which kbuild knows |
1003 | the targets and will: | 1026 | the targets and will: |
1004 | 1) check for commandline changes | 1027 | 1) check for commandline changes |
1005 | 2) delete target during make clean | 1028 | 2) delete target during make clean |
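A sketch of how the pieces described above fit together. The rule itself is the one shown in the example; the linker flag values are illustrative and may differ from the actual boot makefile.

```makefile
# Boot makefile fragment (flag values are illustrative).
# One LDFLAGS_<target> variable per potential target; the ld command
# looks up the variable whose name matches the target being linked.
LDFLAGS_bootsect := -Ttext 0x0 -s --oformat binary
LDFLAGS_setup    := -Ttext 0x0 -s --oformat binary -e begtext

# Listing the targets lets kbuild record their command lines
# and remove them during "make clean".
targets += setup setup.o bootsect bootsect.o

$(obj)/setup $(obj)/bootsect: %: %.o FORCE
	$(call if_changed,ld)
```

Changing either LDFLAGS_ variable alters the recorded command line, so the affected target is relinked on the next build even if its object file is unchanged.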
@@ -1013,7 +1036,7 @@ When kbuild executes the following steps are followed (roughly): | |||
1013 | 1036 | ||
1014 | --- 6.7 Custom kbuild commands | 1037 | --- 6.7 Custom kbuild commands |
1015 | 1038 | ||
1016 | When kbuild is executing with KBUILD_VERBOSE=0 then only a shorthand | 1039 | When kbuild is executing with KBUILD_VERBOSE=0, then only a shorthand |
1017 | of a command is normally displayed. | 1040 | of a command is normally displayed. |
1018 | To enable this behaviour for custom commands kbuild requires | 1041 | To enable this behaviour for custom commands kbuild requires |
1019 | two variables to be set: | 1042 | two variables to be set: |
@@ -1031,34 +1054,34 @@ When kbuild executes the following steps are followed (roughly): | |||
1031 | $(call if_changed,image) | 1054 | $(call if_changed,image) |
1032 | @echo 'Kernel: $@ is ready' | 1055 | @echo 'Kernel: $@ is ready' |
1033 | 1056 | ||
1034 | When updating the $(obj)/bzImage target the line: | 1057 | When updating the $(obj)/bzImage target, the line |
1035 | 1058 | ||
1036 | BUILD arch/i386/boot/bzImage | 1059 | BUILD arch/i386/boot/bzImage |
1037 | 1060 | ||
1038 | will be displayed with "make KBUILD_VERBOSE=0". | 1061 | will be displayed with "make KBUILD_VERBOSE=0". |
1039 | 1062 | ||
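For a custom command, the two variables follow the quiet_cmd_<name> / cmd_<name> naming convention. The sketch below defines a hypothetical command named md5; the command name, the output file and the use of md5sum are assumptions made for illustration.

```makefile
# Kbuild fragment with a hypothetical custom command "md5".
# quiet_cmd_md5 is the shorthand printed with KBUILD_VERBOSE=0,
# cmd_md5 is the command that is actually executed.
quiet_cmd_md5 = MD5     $@
      cmd_md5 = md5sum $< > $@

targets += vmlinux.bin.md5
$(obj)/vmlinux.bin.md5: $(obj)/vmlinux.bin FORCE
	$(call if_changed,md5)
```

With "make KBUILD_VERBOSE=0" only the MD5 line with the target name would be shown; with KBUILD_VERBOSE=1 the full md5sum command is printed instead.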
1040 | 1063 | ||
1041 | --- 6.8 Preprocessing linker scripts | 1064 | --- 6.8 Preprocessing linker scripts |
1042 | 1065 | ||
1043 | When the vmlinux image is build the linker script: | 1066 | When the vmlinux image is built, the linker script |
1044 | arch/$(ARCH)/kernel/vmlinux.lds is used. | 1067 | arch/$(ARCH)/kernel/vmlinux.lds is used. |
1045 | The script is a preprocessed variant of the file vmlinux.lds.S | 1068 | The script is a preprocessed variant of the file vmlinux.lds.S |
1046 | located in the same directory. | 1069 | located in the same directory. |
1047 | kbuild knows .lds file and includes a rule *lds.S -> *lds. | 1070 | kbuild knows .lds files and includes a rule *lds.S -> *lds. |
1048 | 1071 | ||
1049 | Example: | 1072 | Example: |
1050 | #arch/i386/kernel/Makefile | 1073 | #arch/i386/kernel/Makefile |
1051 | always := vmlinux.lds | 1074 | always := vmlinux.lds |
1052 | 1075 | ||
1053 | #Makefile | 1076 | #Makefile |
1054 | export CPPFLAGS_vmlinux.lds += -P -C -U$(ARCH) | 1077 | export CPPFLAGS_vmlinux.lds += -P -C -U$(ARCH) |
1055 | 1078 | ||
1056 | The assigment to $(always) is used to tell kbuild to build the | 1079 | The assignment to $(always) is used to tell kbuild to build the |
1057 | target: vmlinux.lds. | 1080 | target vmlinux.lds. |
1058 | The assignment to $(CPPFLAGS_vmlinux.lds) tell kbuild to use the | 1081 | The assignment to $(CPPFLAGS_vmlinux.lds) tells kbuild to use the |
1059 | specified options when building the target vmlinux.lds. | 1082 | specified options when building the target vmlinux.lds. |
1060 | 1083 | ||
1061 | When building the *.lds target kbuild used the variakles: | 1084 | When building the *.lds target, kbuild uses the variables: |
1062 | CPPFLAGS : Set in top-level Makefile | 1085 | CPPFLAGS : Set in top-level Makefile |
1063 | EXTRA_CPPFLAGS : May be set in the kbuild makefile | 1086 | EXTRA_CPPFLAGS : May be set in the kbuild makefile |
1064 | CPPFLAGS_$(@F) : Target specific flags. | 1087 | CPPFLAGS_$(@F) : Target specific flags. |
@@ -1123,9 +1146,17 @@ The top Makefile exports the following variables: | |||
1123 | $(INSTALL_MOD_PATH)/lib/modules/$(KERNELRELEASE). The user may | 1146 | $(INSTALL_MOD_PATH)/lib/modules/$(KERNELRELEASE). The user may |
1124 | override this value on the command line if desired. | 1147 | override this value on the command line if desired. |
1125 | 1148 | ||
1149 | INSTALL_MOD_STRIP | ||
1150 | |||
1151 | If this variable is specified, modules will be stripped | ||
1152 | after they are installed. If INSTALL_MOD_STRIP is '1', then the | ||
1153 | default option --strip-debug will be used. Otherwise, | ||
1154 | INSTALL_MOD_STRIP will be used as the option(s) to the strip command. | ||
1155 | |||
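The selection rule described for INSTALL_MOD_STRIP could be expressed in GNU Make roughly as below. This is a standalone sketch under stated assumptions, not necessarily the code in the top-level Makefile; the variable name mod_strip_cmd and the "show" target are chosen here for illustration.

```makefile
# Standalone sketch: pick the strip command from INSTALL_MOD_STRIP.
STRIP ?= strip

ifdef INSTALL_MOD_STRIP
ifeq ($(INSTALL_MOD_STRIP),1)
# '1' selects the default option.
mod_strip_cmd = $(STRIP) --strip-debug
else
# Any other value is passed to strip as its option(s).
mod_strip_cmd = $(STRIP) $(INSTALL_MOD_STRIP)
endif
else
# Variable not given: do nothing to the installed modules.
mod_strip_cmd = true
endif

# Print the command that would be run on each installed module.
show:
	@echo '$(mod_strip_cmd)'
```

Running "make show INSTALL_MOD_STRIP=1" against this file displays the command with --strip-debug, while "make show INSTALL_MOD_STRIP=--strip-unneeded" displays the command with that option instead.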
1156 | |||
1126 | === 8 Makefile language | 1157 | === 8 Makefile language |
1127 | 1158 | ||
1128 | The kernel Makefiles are designed to run with GNU Make. The Makefiles | 1159 | The kernel Makefiles are designed to be run with GNU Make. The Makefiles |
1129 | use only the documented features of GNU Make, but they do use many | 1160 | use only the documented features of GNU Make, but they do use many |
1130 | GNU extensions. | 1161 | GNU extensions. |
1131 | 1162 | ||
@@ -1147,10 +1178,13 @@ is the right choice. | |||
1147 | Original version made by Michael Elizabeth Chastain, <mailto:mec@shout.net> | 1178 | Original version made by Michael Elizabeth Chastain, <mailto:mec@shout.net> |
1148 | Updates by Kai Germaschewski <kai@tp1.ruhr-uni-bochum.de> | 1179 | Updates by Kai Germaschewski <kai@tp1.ruhr-uni-bochum.de> |
1149 | Updates by Sam Ravnborg <sam@ravnborg.org> | 1180 | Updates by Sam Ravnborg <sam@ravnborg.org> |
1181 | Language QA by Jan Engelhardt <jengelh@gmx.de> | ||
1150 | 1182 | ||
1151 | === 10 TODO | 1183 | === 10 TODO |
1152 | 1184 | ||
1153 | - Describe how kbuild support shipped files with _shipped. | 1185 | - Describe how kbuild supports shipped files with _shipped. |
1154 | - Generating offset header files. | 1186 | - Generating offset header files. |
1155 | - Add more variables to section 7? | 1187 | - Add more variables to section 7? |
1156 | 1188 | ||
1189 | |||
1190 | |||
diff --git a/Documentation/kbuild/modules.txt b/Documentation/kbuild/modules.txt index 61fc079eb966..2e7702e94a78 100644 --- a/Documentation/kbuild/modules.txt +++ b/Documentation/kbuild/modules.txt | |||
@@ -1,7 +1,7 @@ | |||
1 | 1 | ||
2 | In this document you will find information about: | 2 | In this document you will find information about: |
3 | - how to build external modules | 3 | - how to build external modules |
4 | - how to make your module use kbuild infrastructure | 4 | - how to make your module use the kbuild infrastructure |
5 | - how kbuild will install a kernel | 5 | - how kbuild will install a kernel |
6 | - how to install modules in a non-standard location | 6 | - how to install modules in a non-standard location |
7 | 7 | ||
@@ -24,7 +24,7 @@ In this document you will find information about: | |||
24 | --- 6.1 INSTALL_MOD_PATH | 24 | --- 6.1 INSTALL_MOD_PATH |
25 | --- 6.2 INSTALL_MOD_DIR | 25 | --- 6.2 INSTALL_MOD_DIR |
26 | === 7. Module versioning & Module.symvers | 26 | === 7. Module versioning & Module.symvers |
27 | --- 7.1 Symbols fron the kernel (vmlinux + modules) | 27 | --- 7.1 Symbols from the kernel (vmlinux + modules) |
28 | --- 7.2 Symbols and external modules | 28 | --- 7.2 Symbols and external modules |
29 | --- 7.3 Symbols from another external module | 29 | --- 7.3 Symbols from another external module |
30 | === 8. Tips & Tricks | 30 | === 8. Tips & Tricks |
@@ -36,13 +36,13 @@ In this document you will find information about: | |||
36 | 36 | ||
37 | kbuild includes functionality for building modules both | 37 | kbuild includes functionality for building modules both |
38 | within the kernel source tree and outside the kernel source tree. | 38 | within the kernel source tree and outside the kernel source tree. |
39 | The latter is usually referred to as external modules and is used | 39 | The latter is usually referred to as external or "out-of-tree" |
40 | both during development and for modules that are not planned to be | 40 | modules and is used both during development and for modules that |
41 | included in the kernel tree. | 41 | are not planned to be included in the kernel tree. |
42 | 42 | ||
43 | What is covered within this file is mainly information to authors | 43 | What is covered within this file is mainly information to authors |
44 | of modules. The author of an external modules should supply | 44 | of modules. The author of an external module should supply |
45 | a makefile that hides most of the complexity so one only has to type | 45 | a makefile that hides most of the complexity, so one only has to type |
46 | 'make' to build the module. A complete example will be presented in | 46 | 'make' to build the module. A complete example will be presented in |
47 | chapter 4, "Creating a kbuild file for an external module". | 47 | chapter 4, "Creating a kbuild file for an external module". |
48 | 48 | ||
@@ -63,14 +63,15 @@ when building an external module. | |||
63 | For the running kernel use: | 63 | For the running kernel use: |
64 | make -C /lib/modules/`uname -r`/build M=`pwd` | 64 | make -C /lib/modules/`uname -r`/build M=`pwd` |
65 | 65 | ||
66 | For the above command to succeed the kernel must have been built with | 66 | For the above command to succeed, the kernel must have been |
67 | modules enabled. | 67 | built with modules enabled. |
68 | 68 | ||
69 | To install the modules that were just built: | 69 | To install the modules that were just built: |
70 | 70 | ||
71 | make -C <path-to-kernel> M=`pwd` modules_install | 71 | make -C <path-to-kernel> M=`pwd` modules_install |
72 | 72 | ||
73 | More complex examples later, the above should get you going. | 73 | More complex examples will be shown later; the above should |
74 | be enough to get you started. | ||
74 | 75 | ||
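The two commands above can be hidden behind a small wrapper makefile shipped with the module, so that typing "make" is enough. This is a minimal sketch: the KDIR variable and the target names are assumptions, and it omits the kbuild part (the obj-m assignment) that a real external module also needs.

```makefile
# Hypothetical wrapper Makefile for an external module.
# KDIR defaults to the build directory of the running kernel and can
# be overridden on the command line: make KDIR=/path/to/kernel
KDIR ?= /lib/modules/$(shell uname -r)/build

.PHONY: default install

# Build the module(s) in the current directory.
default:
	$(MAKE) -C $(KDIR) M=$(CURDIR)

# Install the module(s) that were just built.
install:
	$(MAKE) -C $(KDIR) M=$(CURDIR) modules_install
```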
75 | --- 2.2 Available targets | 76 | --- 2.2 Available targets |
76 | 77 | ||
@@ -89,13 +90,13 @@ when building an external module. | |||
89 | Same functionality as if no target was specified. | 90 | Same functionality as if no target was specified. |
90 | See description above. | 91 | See description above. |
91 | 92 | ||
92 | make -C $KDIR M=$PWD modules_install | 93 | make -C $KDIR M=`pwd` modules_install |
93 | Install the external module(s). | 94 | Install the external module(s). |
94 | Installation default is in /lib/modules/<kernel-version>/extra, | 95 | Installation default is in /lib/modules/<kernel-version>/extra, |
95 | but may be prefixed with INSTALL_MOD_PATH - see separate | 96 | but may be prefixed with INSTALL_MOD_PATH - see separate |
96 | chapter. | 97 | chapter. |
97 | 98 | ||
98 | make -C $KDIR M=$PWD clean | 99 | make -C $KDIR M=`pwd` clean |
99 | Remove all generated files for the module - the kernel | 100 | Remove all generated files for the module - the kernel |
100 | source directory is not modified. | 101 | source directory is not modified. |
101 | 102 | ||
@@ -129,29 +130,28 @@ when building an external module. | |||
129 | 130 | ||
130 | To make sure the kernel contains the information required to | 131 | To make sure the kernel contains the information required to |
131 | build external modules the target 'modules_prepare' must be used. | 132 | build external modules the target 'modules_prepare' must be used. |
132 | 'module_prepare' solely exists as a simple way to prepare | 133 | 'modules_prepare' exists solely as a simple way to prepare |
133 | a kernel for building external modules. | 134 | a kernel source tree for building external modules. |
134 | Note: modules_prepare will not build Module.symvers even if | 135 | Note: modules_prepare will not build Module.symvers even if |
135 | CONFIG_MODULEVERSIONING is set. | 136 | CONFIG_MODVERSIONS is set. Therefore a full kernel build |
136 | Therefore a full kernel build needs to be executed to make | 137 | needs to be executed to make module versioning work. |
137 | module versioning work. | ||
138 | 138 | ||
139 | --- 2.5 Building separate files for a module | 139 | --- 2.5 Building separate files for a module |
140 | It is possible to build single files which is part of a module. | 140 | It is possible to build single files which are part of a module. |
141 | This works equal for the kernel, a module and even for external | 141 | This works equally well for the kernel, a module and even for |
142 | modules. | 142 | external modules. |
143 | Examples (module foo.ko, consisting of bar.o, baz.o): | 143 | Examples (module foo.ko, consisting of bar.o, baz.o): |
144 | make -C $KDIR M=`pwd` bar.lst | 144 | make -C $KDIR M=`pwd` bar.lst |
145 | make -C $KDIR M=`pwd` bar.o | 145 | make -C $KDIR M=`pwd` bar.o |
146 | make -C $KDIR M=`pwd` foo.ko | 146 | make -C $KDIR M=`pwd` foo.ko |
147 | make -C $KDIR M=`pwd` / | 147 | make -C $KDIR M=`pwd` / |
148 | 148 | ||
149 | 149 | ||
150 | === 3. Example commands | 150 | === 3. Example commands |
151 | 151 | ||
152 | This example shows the actual commands to be executed when building | 152 | This example shows the actual commands to be executed when building |
153 | an external module for the currently running kernel. | 153 | an external module for the currently running kernel. |
154 | In the example below the distribution is supposed to use the | 154 | In the example below, the distribution is supposed to use the |
155 | facility to locate output files for a kernel compile in a different | 155 | facility to locate output files for a kernel compile in a different |
156 | directory than the kernel source - but the examples will also work | 156 | directory than the kernel source - but the examples will also work |
157 | when the source and the output files are mixed in the same directory. | 157 | when the source and the output files are mixed in the same directory. |
@@ -170,14 +170,14 @@ the following commands to build the module: | |||
170 | O=/lib/modules/`uname -r`/build \ | 170 | O=/lib/modules/`uname -r`/build \ |
171 | M=`pwd` | 171 | M=`pwd` |
172 | 172 | ||
173 | Then to install the module use the following command: | 173 | Then, to install the module use the following command: |
174 | 174 | ||
175 | make -C /usr/src/`uname -r`/source \ | 175 | make -C /usr/src/`uname -r`/source \ |
176 | O=/lib/modules/`uname -r`/build \ | 176 | O=/lib/modules/`uname -r`/build \ |
177 | M=`pwd` \ | 177 | M=`pwd` \ |
178 | modules_install | 178 | modules_install |
179 | 179 | ||
180 | If one looks closely you will see that this is the same commands as | 180 | If you look closely you will see that this is the same command as |
181 | listed before - with the directories spelled out. | 181 | listed before - with the directories spelled out. |
182 | 182 | ||
183 | The above are rather long commands, and the following chapter | 183 | The above are rather long commands, and the following chapter |
@@ -230,7 +230,7 @@ following files: | |||
230 | 230 | ||
231 | endif | 231 | endif |
232 | 232 | ||
233 | In example 1 the check for KERNELRELEASE is used to separate | 233 | In example 1, the check for KERNELRELEASE is used to separate |
234 | the two parts of the Makefile. kbuild will only see the two | 234 | the two parts of the Makefile. kbuild will only see the two |
235 | assignments whereas make will see everything except the two | 235 | assignments whereas make will see everything except the two |
236 | kbuild assignments. | 236 | kbuild assignments. |
@@ -255,7 +255,7 @@ following files: | |||
255 | echo "X" > 8123_bin_shipped | 255 | echo "X" > 8123_bin_shipped |
256 | 256 | ||
257 | 257 | ||
258 | In example 2 we are down to two fairly simple files and for simple | 258 | In example 2, we are down to two fairly simple files and for simple |
259 | files as used in this example the split is questionable. But some | 259 | files as used in this example the split is questionable. But some |
260 | external modules use Makefiles of several hundred lines and here it | 260 | external modules use Makefiles of several hundred lines and here it |
261 | really pays off to separate the kbuild part from the rest. | 261 | really pays off to separate the kbuild part from the rest. |
@@ -282,9 +282,9 @@ following files: | |||
282 | 282 | ||
283 | endif | 283 | endif |
284 | 284 | ||
285 | The trick here is to include the Kbuild file from Makefile so | 285 | The trick here is to include the Kbuild file from Makefile, so |
286 | if an older version of kbuild picks up the Makefile the Kbuild | 286 | if an older version of kbuild picks up the Makefile, the Kbuild |
287 | file will be included. | 287 | file will be included. |
288 | 288 | ||
289 | --- 4.2 Binary blobs included in a module | 289 | --- 4.2 Binary blobs included in a module |
290 | 290 | ||
@@ -301,18 +301,19 @@ following files: | |||
301 | obj-m := 8123.o | 301 | obj-m := 8123.o |
302 | 8123-y := 8123_if.o 8123_pci.o 8123_bin.o | 302 | 8123-y := 8123_if.o 8123_pci.o 8123_bin.o |
303 | 303 | ||
304 | In example 4 there is no distinction between the ordinary .c/.h files | 304 | In example 4, there is no distinction between the ordinary .c/.h files |
305 | and the binary file. But kbuild will pick up different rules to create | 305 | and the binary file. But kbuild will pick up different rules to create |
306 | the .o file. | 306 | the .o file. |
307 | 307 | ||
308 | 308 | ||
309 | === 5. Include files | 309 | === 5. Include files |
310 | 310 | ||
311 | Include files are a necessity when a .c file uses something from another .c | 311 | Include files are a necessity when a .c file uses something from other .c |
312 | files (not strictly in the sense of .c but if good programming practice is | 312 | files (not strictly in the sense of C, but if good programming practice is |
313 | used). Any module that consist of more than one .c file will have a .h file | 313 | used). Any module that consists of more than one .c file will have a .h file |
314 | for one of the .c files. | 314 | for one of the .c files. |
315 | - If the .h file only describes a module internal interface then the .h file | 315 | |
316 | - If the .h file only describes a module internal interface, then the .h file | ||
316 | shall be placed in the same directory as the .c files. | 317 | shall be placed in the same directory as the .c files. |
317 | - If the .h files describe an interface used by other parts of the kernel | 318 | - If the .h files describe an interface used by other parts of the kernel |
318 | located in different directories, the .h files shall be located in | 319 | located in different directories, the .h files shall be located in |
@@ -323,11 +324,11 @@ under include/ such as include/scsi. Another exception is arch-specific | |||
323 | .h files which are located under include/asm-$(ARCH)/*. | 324 | .h files which are located under include/asm-$(ARCH)/*. |
324 | 325 | ||
325 | External modules have a tendency to locate include files in a separate include/ | 326 | External modules have a tendency to locate include files in a separate include/ |
326 | directory and therefore needs to deal with this in their kbuild file. | 327 | directory and therefore need to deal with this in their kbuild file. |
327 | 328 | ||
328 | --- 5.1 How to include files from the kernel include dir | 329 | --- 5.1 How to include files from the kernel include dir |
329 | 330 | ||
330 | When a module needs to include a file from include/linux/ then one | 331 | When a module needs to include a file from include/linux/, then one |
331 | just uses: | 332 | just uses: |
332 | 333 | ||
333 | #include <linux/module.h> | 334 | #include <linux/module.h> |
@@ -348,7 +349,7 @@ directory and therefore needs to deal with this in their kbuild file. | |||
348 | The trick here is to use either EXTRA_CFLAGS (take effect for all .c | 349 | The trick here is to use either EXTRA_CFLAGS (take effect for all .c |
349 | files) or CFLAGS_$F.o (take effect only for a single file). | 350 | files) or CFLAGS_$F.o (take effect only for a single file). |
350 | 351 | ||
351 | In our example if we move 8123_if.h to a subdirectory named include/ | 352 | In our example, if we move 8123_if.h to a subdirectory named include/ |
352 | the resulting Kbuild file would look like: | 353 | the resulting Kbuild file would look like: |
353 | 354 | ||
354 | --> filename: Kbuild | 355 | --> filename: Kbuild |
@@ -362,19 +363,19 @@ directory and therefore needs to deal with this in their kbuild file. | |||
362 | 363 | ||
363 | --- 5.3 External modules using several directories | 364 | --- 5.3 External modules using several directories |
364 | 365 | ||
365 | If an external module does not follow the usual kernel style but | 366 | If an external module does not follow the usual kernel style, but |
366 | decide to spread files over several directories then kbuild can | 367 | decides to spread files over several directories, then kbuild can |
367 | support this too. | 368 | handle this too. |
368 | 369 | ||
369 | Consider the following example: | 370 | Consider the following example: |
370 | 371 | ||
371 | | | 372 | | |
372 | +- src/complex_main.c | 373 | +- src/complex_main.c |
373 | | +- hal/hardwareif.c | 374 | | +- hal/hardwareif.c |
374 | | +- hal/include/hardwareif.h | 375 | | +- hal/include/hardwareif.h |
375 | +- include/complex.h | 376 | +- include/complex.h |
376 | 377 | ||
377 | To build a single module named complex.ko we then need the following | 378 | To build a single module named complex.ko, we then need the following |
378 | kbuild file: | 379 | kbuild file: |
379 | 380 | ||
380 | Kbuild: | 381 | Kbuild: |
@@ -387,12 +388,12 @@ directory and therefore needs to deal with this in their kbuild file. | |||
387 | 388 | ||
388 | 389 | ||
389 | kbuild knows how to handle .o files located in another directory - | 390 | kbuild knows how to handle .o files located in another directory - |
390 | although this is NOT reccommended practice. The syntax is to specify | 391 | although this is NOT recommended practice. The syntax is to specify |
391 | the directory relative to the directory where the Kbuild file is | 392 | the directory relative to the directory where the Kbuild file is |
392 | located. | 393 | located. |
393 | 394 | ||
394 | To find the .h files we have to explicitly tell kbuild where to look | 395 | To find the .h files, we have to explicitly tell kbuild where to look |
395 | for the .h files. When kbuild executes current directory is always | 396 | for the .h files. When kbuild executes, the current directory is always |
396 | the root of the kernel tree (argument to -C) and therefore we have to | 397 | the root of the kernel tree (argument to -C) and therefore we have to |
397 | tell kbuild how to find the .h files using absolute paths. | 398 | tell kbuild how to find the .h files using absolute paths. |
398 | $(src) will specify the absolute path to the directory where the | 399 | $(src) will specify the absolute path to the directory where the |
@@ -412,7 +413,7 @@ External modules are installed in the directory: | |||
412 | 413 | ||
413 | --- 6.1 INSTALL_MOD_PATH | 414 | --- 6.1 INSTALL_MOD_PATH |
414 | 415 | ||
415 | Above are the default directories, but as always some level of | 416 | Above are the default directories, but as always, some level of |
416 | customization is possible. One can prefix the path using the variable | 417 | customization is possible. One can prefix the path using the variable |
417 | INSTALL_MOD_PATH: | 418 | INSTALL_MOD_PATH: |
418 | 419 | ||
@@ -420,17 +421,17 @@ External modules are installed in the directory: | |||
420 | => Install dir: /frodo/lib/modules/$(KERNELRELEASE)/kernel | 421 | => Install dir: /frodo/lib/modules/$(KERNELRELEASE)/kernel |
421 | 422 | ||
422 | INSTALL_MOD_PATH may be set as an ordinary shell variable or as in the | 423 | INSTALL_MOD_PATH may be set as an ordinary shell variable or as in the |
423 | example above be specified on the command line when calling make. | 424 | example above, can be specified on the command line when calling make. |
424 | INSTALL_MOD_PATH has effect both when installing modules included in | 425 | INSTALL_MOD_PATH has effect both when installing modules included in |
425 | the kernel as well as when installing external modules. | 426 | the kernel as well as when installing external modules. |
426 | 427 | ||
427 | --- 6.2 INSTALL_MOD_DIR | 428 | --- 6.2 INSTALL_MOD_DIR |
428 | 429 | ||
429 | When installing external modules they are default installed in a | 430 | When installing external modules they are by default installed to a |
430 | directory under /lib/modules/$(KERNELRELEASE)/extra, but one may wish | 431 | directory under /lib/modules/$(KERNELRELEASE)/extra, but one may wish |
431 | to locate modules for a specific functionality in a separate | 432 | to locate modules for a specific functionality in a separate |
432 | directory. For this purpose one can use INSTALL_MOD_DIR to specify an | 433 | directory. For this purpose, one can use INSTALL_MOD_DIR to specify an |
433 | alternative name than 'extra'. | 434 | alternative name to 'extra'. |
434 | 435 | ||
435 | $ make INSTALL_MOD_DIR=gandalf -C KERNELDIR \ | 436 | $ make INSTALL_MOD_DIR=gandalf -C KERNELDIR \ |
436 | M=`pwd` modules_install | 437 | M=`pwd` modules_install |
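The way INSTALL_MOD_PATH and INSTALL_MOD_DIR combine can be sketched in shell. This is only an illustration of the path layout described in sections 6.1 and 6.2; the helper name ext_mod_install_dir and the release string 2.6.18 are assumptions made for the example, not part of kbuild.

```shell
# Hypothetical helper, for illustration only: assembles the directory that
# an external module would be installed into, following sections 6.1/6.2.
#   $1 = INSTALL_MOD_PATH (may be empty)
#   $2 = kernel release, i.e. $(KERNELRELEASE)
#   $3 = INSTALL_MOD_DIR (defaults to "extra" when omitted)
ext_mod_install_dir() {
    printf '%s/lib/modules/%s/%s\n' "$1" "$2" "${3:-extra}"
}

# Default location, no prefix and no alternative directory name.
ext_mod_install_dir "" 2.6.18

# Prefix from section 6.1 combined with the directory name from section 6.2.
ext_mod_install_dir /frodo 2.6.18 gandalf
```

The first call prints the default location under extra/, the second shows both variables applied at once.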
@@ -444,16 +445,16 @@ Module versioning is enabled by the CONFIG_MODVERSIONS tag. | |||
444 | Module versioning is used as a simple ABI consistency check. The Module | 445 | Module versioning is used as a simple ABI consistency check. The Module |
445 | versioning creates a CRC value of the full prototype for an exported symbol and | 446 | versioning creates a CRC value of the full prototype for an exported symbol and |
446 | when a module is loaded/used then the CRC values contained in the kernel are | 447 | when a module is loaded/used then the CRC values contained in the kernel are |
447 | compared with similar values in the module. If they are not equal then the | 448 | compared with similar values in the module. If they are not equal, then the |
448 | kernel refuses to load the module. | 449 | kernel refuses to load the module. |
449 | 450 | ||
450 | Module.symvers contains a list of all exported symbols from a kernel build. | 451 | Module.symvers contains a list of all exported symbols from a kernel build. |
451 | 452 | ||
452 | --- 7.1 Symbols from the kernel (vmlinux + modules) | 453 | --- 7.1 Symbols from the kernel (vmlinux + modules) |
453 | 454 | ||
454 | During a kernel build a file named Module.symvers will be generated. | 455 | During a kernel build, a file named Module.symvers will be generated. |
455 | Module.symvers contains all exported symbols from the kernel and | 456 | Module.symvers contains all exported symbols from the kernel and |
456 | compiled modules. For each symbols the corresponding CRC value | 457 | compiled modules. For each symbol, the corresponding CRC value |
457 | is stored too. | 458 | is stored too. |
458 | 459 | ||
459 | The syntax of the Module.symvers file is: | 460 | The syntax of the Module.symvers file is: |
@@ -461,27 +462,27 @@ Module.symvers contains a list of all exported symbols from a kernel build. | |||
461 | Sample: | 462 | Sample: |
462 | 0x2d036834 scsi_remove_host drivers/scsi/scsi_mod | 463 | 0x2d036834 scsi_remove_host drivers/scsi/scsi_mod |
463 | 464 | ||
464 | For a kernel build without CONFIG_MODVERSIONING enabled the crc | 465 | For a kernel build without CONFIG_MODVERSIONS enabled, the crc |
465 | would read: 0x00000000 | 466 | would read: 0x00000000 |
466 | 467 | ||
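As a small sketch of how the sample line can be read, the shell function below looks up the CRC recorded for one symbol. It assumes the three-column layout shown in the sample (CRC, symbol, module) separated by whitespace; the function name symvers_crc is hypothetical and only used for this illustration.

```shell
# Hypothetical helper: read Module.symvers-style lines on stdin and print
# the CRC (first column) of the symbol named in $1 (second column).
symvers_crc() {
    awk -v sym="$1" '$2 == sym { print $1 }'
}

# Feed the sample line from the text through the helper.
printf '0x2d036834 scsi_remove_host drivers/scsi/scsi_mod\n' |
    symvers_crc scsi_remove_host
```

Against a real build one would redirect the file instead, for example `symvers_crc scsi_remove_host < Module.symvers`. On a kernel built without CONFIG_MODVERSIONS the text says the value read back would be 0x00000000.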
467 | Module.symvers serve two purposes. | 468 | Module.symvers serves two purposes: |
468 | 1) It list all exported symbols both from vmlinux and all modules | 469 | 1) It lists all exported symbols both from vmlinux and all modules |
469 | 2) It list CRC if CONFIG_MODVERSION is enabled | 470 | 2) It lists the CRC if CONFIG_MODVERSIONS is enabled |
470 | 471 | ||
471 | --- 7.2 Symbols and external modules | 472 | --- 7.2 Symbols and external modules |
472 | 473 | ||
473 | When building an external module the build system needs access to | 474 | When building an external module, the build system needs access to |
474 | the symbols from the kernel to check if all external symbols are | 475 | the symbols from the kernel to check if all external symbols are |
475 | defined. This is done in the MODPOST step and to obtain all | 476 | defined. This is done in the MODPOST step and to obtain all |
476 | symbols modpost reads Module.symvers from the kernel. | 477 | symbols, modpost reads Module.symvers from the kernel. |
477 | If a Module.symvers file is present in the directory where | 478 | If a Module.symvers file is present in the directory where |
478 | the external module is being build this file will be read too. | 479 | the external module is being built, this file will be read too. |
479 | During the MODPOST step a new Module.symvers file will be written | 480 | During the MODPOST step, a new Module.symvers file will be written |
480 | containing all exported symbols that was not defined in the kernel. | 481 | containing all exported symbols that were not defined in the kernel. |
481 | 482 | ||
482 | --- 7.3 Symbols from another external module | 483 | --- 7.3 Symbols from another external module |
483 | 484 | ||
484 | Sometimes one external module uses exported symbols from another | 485 | Sometimes, an external module uses exported symbols from another |
485 | external module. Kbuild needs to have full knowledge on all symbols | 486 | external module. Kbuild needs to have full knowledge on all symbols |
486 | to avoid spitting out warnings about undefined symbols. | 487 | to avoid spitting out warnings about undefined symbols. |
487 | Two solutions exist to let kbuild know all symbols of more than | 488 | Two solutions exist to let kbuild know all symbols of more than |
@@ -490,15 +491,15 @@ Module.symvers contains a list of all exported symbols from a kernel build. | |||
490 | impractical in certain situations. | 491 | impractical in certain situations. |
491 | 492 | ||
492 | Use a top-level Kbuild file | 493 | Use a top-level Kbuild file |
493 | If you have two modules: 'foo', 'bar' and 'foo' needs symbols | 494 | If you have two modules: 'foo' and 'bar', and 'foo' needs |
494 | from 'bar' then one can use a common top-level kbuild file so | 495 | symbols from 'bar', then one can use a common top-level kbuild |
495 | both modules are compiled in same build. | 496 | file so both modules are compiled in same build. |
496 | 497 | ||
497 | Consider following directory layout: | 498 | Consider following directory layout: |
498 | ./foo/ <= contains the foo module | 499 | ./foo/ <= contains the foo module |
499 | ./bar/ <= contains the bar module | 500 | ./bar/ <= contains the bar module |
500 | The top-level Kbuild file would then look like: | 501 | The top-level Kbuild file would then look like: |
501 | 502 | ||
502 | #./Kbuild: (this file may also be named Makefile) | 503 | #./Kbuild: (this file may also be named Makefile) |
503 | obj-y := foo/ bar/ | 504 | obj-y := foo/ bar/ |
504 | 505 | ||
@@ -509,23 +510,23 @@ Module.symvers contains a list of all exported symbols from a kernel build. | |||
509 | knowledge on symbols from both modules. | 510 | knowledge on symbols from both modules. |
510 | 511 | ||
511 | Use an extra Module.symvers file | 512 | Use an extra Module.symvers file |
512 | When an external module is build a Module.symvers file is | 513 | When an external module is built, a Module.symvers file is |
513 | generated containing all exported symbols which are not | 514 | generated containing all exported symbols which are not |
514 | defined in the kernel. | 515 | defined in the kernel. |
515 | To get access to symbols from module 'bar' one can copy the | 516 | To get access to symbols from module 'bar', one can copy the |
516 | Module.symvers file from the compilation of the 'bar' module | 517 | Module.symvers file from the compilation of the 'bar' module |
517 | to the directory where the 'foo' module is build. | 518 | to the directory where the 'foo' module is built. |
518 | During the module build kbuild will read the Module.symvers | 519 | During the module build, kbuild will read the Module.symvers |
519 | file in the directory of the external module and when the | 520 | file in the directory of the external module and when the |
520 | build is finished a new Module.symvers file is created | 521 | build is finished, a new Module.symvers file is created |
521 | containing the sum of all symbols defined and not part of the | 522 | containing the sum of all symbols defined and not part of the |
522 | kernel. | 523 | kernel. |
523 | 524 | ||
524 | === 8. Tips & Tricks | 525 | === 8. Tips & Tricks |
525 | 526 | ||
526 | --- 8.1 Testing for CONFIG_FOO_BAR | 527 | --- 8.1 Testing for CONFIG_FOO_BAR |
527 | 528 | ||
528 | Modules often needs to check for certain CONFIG_ options to decide if | 529 | Modules often need to check for certain CONFIG_ options to decide if |
529 | a specific feature shall be included in the module. When kbuild is used | 530 | a specific feature shall be included in the module. When kbuild is used |
530 | this is done by referencing the CONFIG_ variable directly. | 531 | this is done by referencing the CONFIG_ variable directly. |
531 | 532 | ||
@@ -537,7 +538,7 @@ Module.symvers contains a list of all exported symbols from a kernel build. | |||
537 | 538 | ||
538 | External modules have traditionally used grep to check for specific | 539 | External modules have traditionally used grep to check for specific |
539 | CONFIG_ settings directly in .config. This usage is broken. | 540 | CONFIG_ settings directly in .config. This usage is broken. |
540 | As introduced before external modules shall use kbuild when building | 541 | As introduced before, external modules shall use kbuild when building |
541 | and therefore can use the same methods as in-kernel modules when testing | 542 | and therefore can use the same methods as in-kernel modules when |
542 | for CONFIG_ definitions. | 543 | testing for CONFIG_ definitions. |
543 | 544 | ||
diff --git a/Documentation/kdump/gdbmacros.txt b/Documentation/kdump/gdbmacros.txt index dcf5580380ab..9b9b454b048a 100644 --- a/Documentation/kdump/gdbmacros.txt +++ b/Documentation/kdump/gdbmacros.txt | |||
@@ -175,7 +175,7 @@ end | |||
175 | document trapinfo | 175 | document trapinfo |
176 | Run info threads and lookup pid of thread #1 | 176 | Run info threads and lookup pid of thread #1 |
177 | 'trapinfo <pid>' will tell you by which trap & possibly | 177 | 'trapinfo <pid>' will tell you by which trap & possibly |
178 | addresthe kernel paniced. | 178 | address the kernel panicked. |
179 | end | 179 | end |
180 | 180 | ||
181 | 181 | ||
diff --git a/Documentation/kdump/kdump.txt b/Documentation/kdump/kdump.txt index 212cf3c21abf..08bafa8c1caa 100644 --- a/Documentation/kdump/kdump.txt +++ b/Documentation/kdump/kdump.txt | |||
@@ -1,155 +1,325 @@ | |||
1 | Documentation for kdump - the kexec-based crash dumping solution | 1 | ================================================================ |
2 | Documentation for Kdump - The kexec-based Crash Dumping Solution | ||
2 | ================================================================ | 3 | ================================================================ |
3 | 4 | ||
4 | DESIGN | 5 | This document includes overview, setup and installation, and analysis |
5 | ====== | 6 | information. |
6 | 7 | ||
7 | Kdump uses kexec to reboot to a second kernel whenever a dump needs to be | 8 | Overview |
8 | taken. This second kernel is booted with very little memory. The first kernel | 9 | ======== |
9 | reserves the section of memory that the second kernel uses. This ensures that | ||
10 | on-going DMA from the first kernel does not corrupt the second kernel. | ||
11 | 10 | ||
12 | All the necessary information about Core image is encoded in ELF format and | 11 | Kdump uses kexec to quickly boot to a dump-capture kernel whenever a |
13 | stored in reserved area of memory before crash. Physical address of start of | 12 | dump of the system kernel's memory needs to be taken (for example, when |
14 | ELF header is passed to new kernel through command line parameter elfcorehdr=. | 13 | the system panics). The system kernel's memory image is preserved across |
14 | the reboot and is accessible to the dump-capture kernel. | ||
15 | 15 | ||
16 | On i386, the first 640 KB of physical memory is needed to boot, irrespective | 16 | You can use common Linux commands, such as cp and scp, to copy the |
17 | of where the kernel loads. Hence, this region is backed up by kexec just before | 17 | memory image to a dump file on the local disk, or across the network to |
18 | rebooting into the new kernel. | 18 | a remote system. |
19 | 19 | ||
20 | In the second kernel, "old memory" can be accessed in two ways. | 20 | Kdump and kexec are currently supported on the x86, x86_64, and ppc64 |
21 | architectures. | ||
21 | 22 | ||
22 | - The first one is through a /dev/oldmem device interface. A capture utility | 23 | When the system kernel boots, it reserves a small section of memory for |
23 | can read the device file and write out the memory in raw format. This is raw | 24 | the dump-capture kernel. This ensures that ongoing Direct Memory Access |
24 | dump of memory and analysis/capture tool should be intelligent enough to | 25 | (DMA) from the system kernel does not corrupt the dump-capture kernel. |
25 | determine where to look for the right information. ELF headers (elfcorehdr=) | 26 | The kexec -p command loads the dump-capture kernel into this reserved |
26 | can become handy here. | 27 | memory. |
27 | 28 | ||
28 | - The second interface is through /proc/vmcore. This exports the dump as an ELF | 29 | On x86 machines, the first 640 KB of physical memory is needed to boot, |
29 | format file which can be written out using any file copy command | 30 | regardless of where the kernel loads. Therefore, kexec backs up this |
30 | (cp, scp, etc). Further, gdb can be used to perform limited debugging on | 31 | region just before rebooting into the dump-capture kernel. |
31 | the dump file. This method ensures methods ensure that there is correct | ||
32 | ordering of the dump pages (corresponding to the first 640 KB that has been | ||
33 | relocated). | ||
34 | 32 | ||
35 | SETUP | 33 | All of the necessary information about the system kernel's core image is |
36 | ===== | 34 | encoded in the ELF format, and stored in a reserved area of memory |
35 | before a crash. The physical address of the start of the ELF header is | ||
36 | passed to the dump-capture kernel through the elfcorehdr= boot | ||
37 | parameter. | ||
38 | |||
39 | With the dump-capture kernel, you can access the memory image, or "old | ||
40 | memory," in two ways: | ||
41 | |||
42 | - Through a /dev/oldmem device interface. A capture utility can read the | ||
43 | device file and write out the memory in raw format. This is a raw dump | ||
44 | of memory. Analysis and capture tools must be intelligent enough to | ||
45 | determine where to look for the right information. | ||
46 | |||
47 | - Through /proc/vmcore. This exports the dump as an ELF-format file that | ||
48 | you can write out using file copy commands such as cp or scp. Further, | ||
49 | you can use analysis tools such as the GNU Debugger (GDB) and the Crash | ||
50 | tool to debug the dump file. This method ensures that the dump pages are | ||
51 | correctly ordered. | ||
52 | |||
53 | |||
54 | Setup and Installation | ||
55 | ====================== | ||
56 | |||
57 | Install kexec-tools and the Kdump patch | ||
58 | --------------------------------------- | ||
59 | |||
60 | 1) Log in as the root user. | ||
61 | |||
62 | 2) Download the kexec-tools user-space package from the following URL: | ||
63 | |||
64 | http://www.xmission.com/~ebiederm/files/kexec/kexec-tools-1.101.tar.gz | ||
65 | |||
66 | 3) Unpack the tarball with the tar command, as follows: | ||
67 | |||
68 | tar xvpzf kexec-tools-1.101.tar.gz | ||
69 | |||
70 | 4) Download the latest consolidated Kdump patch from the following URL: | ||
71 | |||
72 | http://lse.sourceforge.net/kdump/ | ||
73 | |||
74 | (This location is being used until all the user-space Kdump patches | ||
75 | are integrated with the kexec-tools package.) | ||
76 | |||
77 | 5) Change to the kexec-tools-1.101 directory, as follows: | ||
78 | |||
79 | cd kexec-tools-1.101 | ||
80 | |||
81 | 6) Apply the consolidated patch to the kexec-tools-1.101 source tree | ||
82 | with the patch command, as follows. (Modify the path to the downloaded | ||
83 | patch as necessary.) | ||
84 | |||
85 | patch -p1 < /path-to-kdump-patch/kexec-tools-1.101-kdump.patch | ||
86 | |||
87 | 7) Configure the package, as follows: | ||
88 | |||
89 | ./configure | ||
90 | |||
91 | 8) Compile the package, as follows: | ||
92 | |||
93 | make | ||
94 | |||
95 | 9) Install the package, as follows: | ||
96 | |||
97 | make install | ||
98 | |||
99 | |||
100 | Download and build the system and dump-capture kernels | ||
101 | ------------------------------------------------------ | ||
102 | |||
103 | Download the mainline (vanilla) kernel source code (2.6.13-rc1 or newer) | ||
104 | from http://www.kernel.org. Two kernels must be built: a system kernel | ||
105 | and a dump-capture kernel. Use the following steps to configure these | ||
106 | kernels with the necessary kexec and Kdump features: | ||
107 | |||
108 | System kernel | ||
109 | ------------- | ||
110 | |||
111 | 1) Enable "kexec system call" in "Processor type and features." | ||
112 | |||
113 | CONFIG_KEXEC=y | ||
114 | |||
115 | 2) Enable "sysfs file system support" in "Filesystem" -> "Pseudo | ||
116 | filesystems." This is usually enabled by default. | ||
117 | |||
118 | CONFIG_SYSFS=y | ||
119 | |||
120 | Note that "sysfs file system support" might not appear in the "Pseudo | ||
121 | filesystems" menu if "Configure standard kernel features (for small | ||
122 | systems)" is not enabled in "General Setup." In this case, check the | ||
123 | .config file itself to ensure that sysfs is turned on, as follows: | ||
124 | |||
125 | grep 'CONFIG_SYSFS' .config | ||
126 | |||
127 | 3) Enable "Compile the kernel with debug info" in "Kernel hacking." | ||
128 | |||
129 | CONFIG_DEBUG_INFO=y | ||
130 | |||
131 | This causes the kernel to be built with debug symbols. The dump | ||
132 | analysis tools require a vmlinux with debug symbols in order to read | ||
133 | and analyze a dump file. | ||
134 | |||
135 | 4) Make and install the kernel and its modules. Update the boot loader | ||
136 | (such as grub, yaboot, or lilo) configuration files as necessary. | ||
137 | |||
138 | 5) Boot the system kernel with the boot parameter "crashkernel=Y@X", | ||
139 | where Y specifies how much memory to reserve for the dump-capture kernel | ||
140 | and X specifies the beginning of this reserved memory. For example, | ||
141 | "crashkernel=64M@16M" tells the system kernel to reserve 64 MB of memory | ||
142 | starting at physical address 0x01000000 for the dump-capture kernel. | ||
143 | |||
144 | On x86 and x86_64, use "crashkernel=64M@16M". | ||
145 | |||
146 | On ppc64, use "crashkernel=128M@32M". | ||
147 | |||
148 | |||
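The arithmetic behind "crashkernel=Y@X" in step 5 can be checked with a short shell sketch. The helper names to_bytes and describe_crashkernel are hypothetical, and the K/M/G suffix handling is an assumption about the accepted syntax rather than a description of the kernel's own parser.

```shell
# Hypothetical helper: convert a size such as "64M" or "512K" to bytes.
to_bytes() {
    num=${1%[KMG]}                  # strip one trailing K, M or G, if any
    case $1 in
        *K) echo $((num * 1024)) ;;
        *M) echo $((num * 1024 * 1024)) ;;
        *G) echo $((num * 1024 * 1024 * 1024)) ;;
        *)  echo "$1" ;;            # no suffix: already a byte count
    esac
}

# Split "Y@X" into the reserved size (Y) and the start address (X).
describe_crashkernel() {
    size=${1%@*}
    start=${1#*@}
    printf 'reserve %d bytes at 0x%08x\n' \
        "$(to_bytes "$size")" "$(to_bytes "$start")"
}

describe_crashkernel 64M@16M
```

For 64M@16M this reports a start address of 0x01000000, which matches the address given in step 5 and the default CONFIG_PHYSICAL_START value mentioned for the dump-capture kernel.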
149 | The dump-capture kernel | ||
150 | ----------------------- | ||
37 | 151 | ||
38 | 1) Download the upstream kexec-tools userspace package from | 152 | 1) Under "General setup," append "-kdump" to the current string in |
39 | http://www.xmission.com/~ebiederm/files/kexec/kexec-tools-1.101.tar.gz. | 153 | "Local version." |
40 | 154 | ||
41 | Apply the latest consolidated kdump patch on top of kexec-tools-1.101 | 155 | 2) On x86, enable high memory support under "Processor type and |
42 | from http://lse.sourceforge.net/kdump/. This arrangment has been made | 156 | features": |
43 | till all the userspace patches supporting kdump are integrated with | 157 | |
44 | upstream kexec-tools userspace. | 158 | CONFIG_HIGHMEM64G=y |
45 | 159 | or | |
46 | 2) Download and build the appropriate (2.6.13-rc1 onwards) vanilla kernels. | 160 | CONFIG_HIGHMEM4G |
47 | Two kernels need to be built in order to get this feature working. | 161 | |
48 | Following are the steps to properly configure the two kernels specific | 162 | 3) On x86 and x86_64, disable symmetric multi-processing support |
49 | to kexec and kdump features: | 163 | under "Processor type and features": |
50 | 164 | ||
51 | A) First kernel or regular kernel: | 165 | CONFIG_SMP=n |
52 | ---------------------------------- | 166 | (If CONFIG_SMP=y, then specify maxcpus=1 on the kernel command line |
53 | a) Enable "kexec system call" feature (in Processor type and features). | 167 | when loading the dump-capture kernel, see section "Load the Dump-capture |
54 | CONFIG_KEXEC=y | 168 | Kernel".) |
55 | b) Enable "sysfs file system support" (in Pseudo filesystems). | 169 | |
56 | CONFIG_SYSFS=y | 170 | 4) On ppc64, disable NUMA support and enable EMBEDDED support: |
57 | c) make | 171 | |
58 | d) Boot into first kernel with the command line parameter "crashkernel=Y@X". | 172 | CONFIG_NUMA=n |
59 | Use appropriate values for X and Y. Y denotes how much memory to reserve | 173 | CONFIG_EMBEDDED=y |
60 | for the second kernel, and X denotes at what physical address the | 174 | CONFIG_EEH=n for the dump-capture kernel |
61 | reserved memory section starts. For example: "crashkernel=64M@16M". | 175 | |
62 | 176 | 5) Enable "kernel crash dumps" support under "Processor type and | |
63 | 177 | features": | |
64 | B) Second kernel or dump capture kernel: | 178 | |
65 | --------------------------------------- | 179 | CONFIG_CRASH_DUMP=y |
66 | a) For i386 architecture enable Highmem support | 180 | |
67 | CONFIG_HIGHMEM=y | 181 | 6) Use a suitable value for "Physical address where the kernel is |
68 | b) Enable "kernel crash dumps" feature (under "Processor type and features") | 182 | loaded" (under "Processor type and features"). This only appears when |
69 | CONFIG_CRASH_DUMP=y | 183 | "kernel crash dumps" is enabled. By default this value is 0x1000000 |
70 | c) Make sure a suitable value for "Physical address where the kernel is | 184 | (16MB). It should be the same as X in the "crashkernel=Y@X" boot |
71 | loaded" (under "Processor type and features"). By default this value | 185 | parameter discussed above. |
72 | is 0x1000000 (16MB) and it should be same as X (See option d above), | 186 | |
73 | e.g., 16 MB or 0x1000000. | 187 | On x86 and x86_64, use "CONFIG_PHYSICAL_START=0x1000000". |
74 | CONFIG_PHYSICAL_START=0x1000000 | 188 | |
75 | d) Enable "/proc/vmcore support" (Optional, under "Pseudo filesystems"). | 189 | On ppc64 the value is automatically set at 32MB when |
76 | CONFIG_PROC_VMCORE=y | 190 | CONFIG_CRASH_DUMP is set. |
77 | 191 | ||
78 | 3) After booting to regular kernel or first kernel, load the second kernel | 192 | 7) Optionally enable "/proc/vmcore support" under "Filesystems" -> |
79 | using the following command: | 193 | "Pseudo filesystems". |
80 | 194 | ||
81 | kexec -p <second-kernel> --args-linux --elf32-core-headers | 195 | CONFIG_PROC_VMCORE=y |
82 | --append="root=<root-dev> init 1 irqpoll maxcpus=1" | 196 | (CONFIG_PROC_VMCORE is set by default when CONFIG_CRASH_DUMP is selected.) |
83 | 197 | ||
84 | Notes: | 198 | 8) Make and install the kernel and its modules. DO NOT add this kernel |
85 | ====== | 199 | to the boot loader configuration files. |
86 | i) <second-kernel> has to be a vmlinux image ie uncompressed elf image. | 200 | |
87 | bzImage will not work, as of now. | 201 | |
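As a rough illustration of how X in "crashkernel=Y@X" relates to CONFIG_PHYSICAL_START, the sketch below converts the offset to a hex physical address that can be compared against the configured value. The helper names to_bytes and crash_start_hex are hypothetical and exist only for this example; it assumes only K, M, and G suffixes are used.

```shell
# Convert a size such as 64M or 16M to bytes (K, M, G suffixes only).
# Hypothetical helper, not part of the kernel or kexec-tools.
to_bytes() {
    num=${1%[KMGkmg]}
    case $1 in
        *[Kk]) echo $((num * 1024)) ;;
        *[Mm]) echo $((num * 1024 * 1024)) ;;
        *[Gg]) echo $((num * 1024 * 1024 * 1024)) ;;
        *)     echo $((num)) ;;
    esac
}

# Print the X part of "crashkernel=Y@X" as a hex physical address.
crash_start_hex() {
    value=${1#crashkernel=}      # strip the parameter name
    offset=${value#*@}           # keep what follows the '@'
    printf '0x%x\n' "$(to_bytes "$offset")"
}

crash_start_hex "crashkernel=64M@16M"    # prints 0x1000000
```

With the example value from the text, the result matches the default CONFIG_PHYSICAL_START of 0x1000000 (16MB).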
88 | ii) --args-linux has to be speicfied as if kexec it loading an elf image, | 202 | Load the Dump-capture Kernel |
89 | it needs to know that the arguments supplied are of linux type. | 203 | ============================ |
90 | iii) By default ELF headers are stored in ELF64 format to support systems | 204 | |
91 | with more than 4GB memory. Option --elf32-core-headers forces generation | 205 | After booting to the system kernel, load the dump-capture kernel using |
92 | of ELF32 headers. The reason for this option being, as of now gdb can | 206 | the following command: |
93 | not open vmcore file with ELF64 headers on a 32 bit systems. So ELF32 | 207 | |
94 | headers can be used if one has non-PAE systems and hence memory less | 208 | kexec -p <dump-capture-kernel> \ |
95 | than 4GB. | 209 | --initrd=<initrd-for-dump-capture-kernel> --args-linux \ |
96 | iv) Specify "irqpoll" as command line parameter. This reduces driver | 210 | --append="root=<root-dev> init 1 irqpoll" |
97 | initialization failures in second kernel due to shared interrupts. | 211 | |
98 | v) <root-dev> needs to be specified in a format corresponding to the root | 212 | |
99 | device name in the output of mount command. | 213 | Notes on loading the dump-capture kernel: |
100 | vi) If you have built the drivers required to mount root file system as | 214 | |
101 | modules in <second-kernel>, then, specify | 215 | * <dump-capture-kernel> must be a vmlinux image (that is, an |
102 | --initrd=<initrd-for-second-kernel>. | 216 | uncompressed ELF image). bzImage does not work at this time. |
103 | vii) Specify maxcpus=1 as, if during first kernel run, if panic happens on | 217 | |
104 | non-boot cpus, second kernel doesn't seem to be boot up all the cpus. | 218 | * By default, the ELF headers are stored in ELF64 format to support |
105 | The other option is to always built the second kernel without SMP | 219 | systems with more than 4GB memory. The --elf32-core-headers option can |
106 | support ie CONFIG_SMP=n | 220 | be used to force the generation of ELF32 headers. This is necessary |
107 | 221 | because GDB currently cannot open vmcore files with ELF64 headers on | |
108 | 4) After successfully loading the second kernel as above, if a panic occurs | 222 | 32-bit systems. ELF32 headers can be used on non-PAE systems (that is, |
109 | system reboots into the second kernel. A module can be written to force | 223 | less than 4GB of memory). |
110 | the panic or "ALT-SysRq-c" can be used initiate a crash dump for testing | 224 | |
111 | purposes. | 225 | * The "irqpoll" boot parameter reduces driver initialization failures |
112 | 226 | due to shared interrupts in the dump-capture kernel. | |
113 | 5) Once the second kernel has booted, write out the dump file using | 227 | |
228 | * You must specify <root-dev> in the format corresponding to the root | ||
229 | device name in the output of mount command. | ||
230 | |||
231 | * "init 1" boots the dump-capture kernel into single-user mode without | ||
232 | networking. If you want networking, use "init 3". | ||
233 | |||
234 | |||
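The load command described above can be assembled by a small dry-run helper, sketched here. It only prints the command line so it can be reviewed before being run as root. The function name kdump_load_cmd and the example paths are assumptions made for illustration; the kexec options are the ones named in the text.

```shell
# Print (do not run) the kexec command that loads a dump-capture kernel.
# $1 = uncompressed vmlinux image, $2 = initrd, $3 = root device
# Hypothetical helper for illustration only.
kdump_load_cmd() {
    printf 'kexec -p %s --initrd=%s --args-linux --append="root=%s init 1 irqpoll"\n' \
        "$1" "$2" "$3"
}

# Example paths are placeholders; substitute the ones on your system.
kdump_load_cmd /boot/vmlinux-kdump /boot/initrd-kdump.img /dev/sda1
```

Printing first makes it easy to check that the root device matches the name shown in the output of the mount command.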
235 | Kernel Panic | ||
236 | ============ | ||
237 | |||
238 | After successfully loading the dump-capture kernel as previously | ||
239 | described, the system will reboot into the dump-capture kernel if a | ||
240 | system crash is triggered. Trigger points are located in panic(), | ||
241 | die(), die_nmi() and in the sysrq handler (ALT-SysRq-c). | ||
242 | |||
243 | The following conditions will execute a crash trigger point: | ||
244 | |||
245 | If a hard lockup is detected and "NMI watchdog" is configured, the system | ||
246 | will boot into the dump-capture kernel ( die_nmi() ). | ||
247 | |||
248 | If die() is called, and it happens to be a thread with pid 0 or 1, or die() | ||
249 | is called inside interrupt context or die() is called and panic_on_oops is set, | ||
250 | the system will boot into the dump-capture kernel. | ||
251 | |||
252 | On powerpc systems, when a soft-reset is generated, die() is called by all cpus and the system will boot into the dump-capture kernel. | ||
253 | |||
254 | For testing purposes, you can trigger a crash by using "ALT-SysRq-c" or | ||
255 | "echo c > /proc/sysrq-trigger", or by writing a module to force the panic. | ||
256 | |||
257 | Write Out the Dump File | ||
258 | ======================= | ||
259 | |||
260 | After the dump-capture kernel is booted, write out the dump file with | ||
261 | the following command: | ||
114 | 262 | ||
115 | cp /proc/vmcore <dump-file> | 263 | cp /proc/vmcore <dump-file> |
116 | 264 | ||
117 | Dump memory can also be accessed as a /dev/oldmem device for a linear/raw | 265 | You can also access dumped memory as a /dev/oldmem device for a linear |
118 | view. To create the device, type: | 266 | and raw view. To create the device, use the following command: |
119 | 267 | ||
120 | mknod /dev/oldmem c 1 12 | 268 | mknod /dev/oldmem c 1 12 |
121 | 269 | ||
122 | Use "dd" with suitable options for count, bs and skip to access specific | 270 | Use the dd command with suitable options for count, bs, and skip to |
123 | portions of the dump. | 271 | access specific portions of the dump. |
124 | 272 | ||
125 | Entire memory: dd if=/dev/oldmem of=oldmem.001 | 273 | To see the entire memory, use the following command: |
126 | 274 | ||
275 | dd if=/dev/oldmem of=oldmem.001 | ||
127 | 276 | ||
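To make the count, bs, and skip options concrete, the sketch below computes them for a physical address range and prints the resulting dd command. The helper name oldmem_dd_cmd and the output file name are hypothetical, and the example assumes the start address and length are multiples of the block size.

```shell
# Print a dd command that copies one region of /dev/oldmem.
# $1 = start address, $2 = length in bytes, $3 = block size, $4 = output file
# Hypothetical helper; start and length must be multiples of the block size.
oldmem_dd_cmd() {
    skip=$(($1 / $3))     # blocks to skip before the region
    count=$(($2 / $3))    # blocks to copy
    echo "dd if=/dev/oldmem of=$4 bs=$3 skip=$skip count=$count"
}

# 1MB of memory starting at physical address 16MB, in 4096-byte blocks.
oldmem_dd_cmd 0x1000000 1048576 4096 oldmem.16M
# prints: dd if=/dev/oldmem of=oldmem.16M bs=4096 skip=4096 count=256
```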
128 | ANALYSIS | 277 | |
278 | Analysis | ||
129 | ======== | 279 | ======== |
130 | Limited analysis can be done using gdb on the dump file copied out of | ||
131 | /proc/vmcore. Use vmlinux built with -g and run | ||
132 | 280 | ||
133 | gdb vmlinux <dump-file> | 281 | Before analyzing the dump image, you should reboot into a stable kernel. |
282 | |||
283 | You can do limited analysis using GDB on the dump file copied out of | ||
284 | /proc/vmcore. Use the debug vmlinux built with -g and run the following | ||
285 | command: | ||
286 | |||
287 | gdb vmlinux <dump-file> | ||
134 | 288 | ||
135 | Stack trace for the task on processor 0, register display, memory display | 289 | Stack trace for the task on processor 0, register display, and memory |
136 | work fine. | 290 | display work fine. |
137 | 291 | ||
138 | Note: gdb cannot analyse core files generated in ELF64 format for i386. | 292 | Note: GDB cannot analyze core files generated in ELF64 format for x86. |
293 | On systems with a maximum of 4GB of memory, you can generate | ||
294 | ELF32-format headers using the --elf32-core-headers option of kexec when | ||
295 | loading the dump-capture kernel. | ||
139 | 296 | ||
140 | Latest "crash" (crash-4.0-2.18) as available on Dave Anderson's site | 297 | You can also use the Crash utility to analyze dump files in Kdump |
141 | http://people.redhat.com/~anderson/ works well with kdump format. | 298 | format. Crash is available on Dave Anderson's site at the following URL: |
142 | 299 | ||
300 | http://people.redhat.com/~anderson/ | ||
301 | |||
302 | |||
303 | To Do | ||
304 | ===== | ||
143 | 305 | ||
144 | TODO | 306 | 1) Provide a kernel pages filtering mechanism, so core file size is not |
145 | ==== | 307 | extreme on systems with huge memory banks. |
146 | 1) Provide a kernel pages filtering mechanism so that core file size is not | ||
147 | insane on systems having huge memory banks. | ||
148 | 2) Relocatable kernel can help in maintaining multiple kernels for crashdump | ||
149 | and same kernel as the first kernel can be used to capture the dump. | ||
150 | 308 | ||
309 | 2) Relocatable kernel can help in maintaining multiple kernels for | ||
310 | crash_dump, and the same kernel as the system kernel can be used to | ||
311 | capture the dump. | ||
151 | 312 | ||
152 | CONTACT | 313 | |
314 | Contact | ||
153 | ======= | 315 | ======= |
316 | |||
154 | Vivek Goyal (vgoyal@in.ibm.com) | 317 | Vivek Goyal (vgoyal@in.ibm.com) |
155 | Maneesh Soni (maneesh@in.ibm.com) | 318 | Maneesh Soni (maneesh@in.ibm.com) |
319 | |||
320 | |||
321 | Trademark | ||
322 | ========= | ||
323 | |||
324 | Linux is a trademark of Linus Torvalds in the United States, other | ||
325 | countries, or both. | ||
diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt index bca6f389da66..137e993f4329 100644 --- a/Documentation/kernel-parameters.txt +++ b/Documentation/kernel-parameters.txt | |||
@@ -35,7 +35,6 @@ parameter is applicable: | |||
35 | APM Advanced Power Management support is enabled. | 35 | APM Advanced Power Management support is enabled. |
36 | AX25 Appropriate AX.25 support is enabled. | 36 | AX25 Appropriate AX.25 support is enabled. |
37 | CD Appropriate CD support is enabled. | 37 | CD Appropriate CD support is enabled. |
38 | DEVFS devfs support is enabled. | ||
39 | DRM Direct Rendering Management support is enabled. | 38 | DRM Direct Rendering Management support is enabled. |
40 | EDD BIOS Enhanced Disk Drive Services (EDD) is enabled | 39 | EDD BIOS Enhanced Disk Drive Services (EDD) is enabled |
41 | EFI EFI Partitioning (GPT) is enabled | 40 | EFI EFI Partitioning (GPT) is enabled |
@@ -61,6 +60,7 @@ parameter is applicable: | |||
61 | MTD MTD support is enabled. | 60 | MTD MTD support is enabled. |
62 | NET Appropriate network support is enabled. | 61 | NET Appropriate network support is enabled. |
63 | NUMA NUMA support is enabled. | 62 | NUMA NUMA support is enabled. |
63 | GENERIC_TIME The generic timeofday code is enabled. | ||
64 | NFS Appropriate NFS support is enabled. | 64 | NFS Appropriate NFS support is enabled. |
65 | OSS OSS sound support is enabled. | 65 | OSS OSS sound support is enabled. |
66 | PARIDE The ParIDE subsystem is enabled. | 66 | PARIDE The ParIDE subsystem is enabled. |
@@ -110,6 +110,13 @@ be entered as an environment variable, whereas its absence indicates that | |||
110 | it will appear as a kernel argument readable via /proc/cmdline by programs | 110 | it will appear as a kernel argument readable via /proc/cmdline by programs |
111 | running once the system is up. | 111 | running once the system is up. |
112 | 112 | ||
113 | The number of kernel parameters is not limited, but the length of the | ||
114 | complete command line (parameters including spaces etc.) is limited to | ||
115 | a fixed number of characters. This limit depends on the architecture | ||
116 | and is between 256 and 4096 characters. It is defined in the file | ||
117 | ./include/asm/setup.h as COMMAND_LINE_SIZE. | ||
118 | |||
119 | |||
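A quick way to see how close a command line is to the limit is to compare its length with the value of COMMAND_LINE_SIZE for the architecture. The sketch below is only an illustration: cmdline_fits is a hypothetical helper, the limit of 256 is the lowest value mentioned above rather than one read from the kernel headers, and the strict comparison is a conservative assumption that leaves room for a terminator.

```shell
# Return success if the command line in $1 is shorter than the limit in $2.
# Hypothetical helper for illustration only.
cmdline_fits() {
    [ "${#1}" -lt "$2" ]
}

limit=256    # assumed value; check COMMAND_LINE_SIZE for your architecture
if cmdline_fits "$(cat /proc/cmdline 2>/dev/null)" "$limit"; then
    echo "command line fits within $limit characters"
else
    echo "command line may exceed $limit characters"
fi
```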
113 | 53c7xx= [HW,SCSI] Amiga SCSI controllers | 120 | 53c7xx= [HW,SCSI] Amiga SCSI controllers |
114 | See header of drivers/scsi/53c7xx.c. | 121 | See header of drivers/scsi/53c7xx.c. |
115 | See also Documentation/scsi/ncr53c7xx.txt. | 122 | See also Documentation/scsi/ncr53c7xx.txt. |
@@ -179,6 +186,11 @@ running once the system is up. | |||
179 | override platform specific driver. | 186 | override platform specific driver. |
180 | See also Documentation/acpi-hotkey.txt. | 187 | See also Documentation/acpi-hotkey.txt. |
181 | 188 | ||
189 | acpi_pm_good [IA-32,X86-64] | ||
190 | Override the pmtimer bug detection: force the kernel | ||
191 | to assume that this machine's pmtimer latches its value | ||
192 | and always returns good values. | ||
193 | |||
182 | enable_timer_pin_1 [i386,x86-64] | 194 | enable_timer_pin_1 [i386,x86-64] |
183 | Enable PIN 1 of APIC timer | 195 | Enable PIN 1 of APIC timer |
184 | Can be useful to work around chipset bugs | 196 | Can be useful to work around chipset bugs |
@@ -341,10 +353,11 @@ running once the system is up. | |||
341 | Value can be changed at runtime via | 353 | Value can be changed at runtime via |
342 | /selinux/checkreqprot. | 354 | /selinux/checkreqprot. |
343 | 355 | ||
344 | clock= [BUGS=IA-32,HW] gettimeofday timesource override. | 356 | clock= [BUGS=IA-32, HW] gettimeofday clocksource override. |
345 | Forces specified timesource (if avaliable) to be used | 357 | [Deprecated] |
346 | when calculating gettimeofday(). If specicified | 358 | Forces specified clocksource (if available) to be used |
347 | timesource is not avalible, it defaults to PIT. | 359 | when calculating gettimeofday(). If specified |
360 | clocksource is not available, it defaults to PIT. | ||
348 | Format: { pit | tsc | cyclone | pmtmr } | 361 | Format: { pit | tsc | cyclone | pmtmr } |
349 | 362 | ||
350 | disable_8254_timer | 363 | disable_8254_timer |
@@ -429,13 +442,19 @@ running once the system is up. | |||
429 | 442 | ||
430 | debug [KNL] Enable kernel debugging (events log level). | 443 | debug [KNL] Enable kernel debugging (events log level). |
431 | 444 | ||
445 | debug_locks_verbose= | ||
446 | [KNL] verbose self-tests | ||
447 | Format: <0|1> | ||
448 | Print debugging info while doing the locking API | ||
449 | self-tests. | ||
450 | We default to 0 (no extra messages), setting it to | ||
451 | 1 will print _a lot_ more information - normally | ||
452 | only useful to kernel developers. | ||
453 | |||
432 | decnet= [HW,NET] | 454 | decnet= [HW,NET] |
433 | Format: <area>[,<node>] | 455 | Format: <area>[,<node>] |
434 | See also Documentation/networking/decnet.txt. | 456 | See also Documentation/networking/decnet.txt. |
435 | 457 | ||
436 | devfs= [DEVFS] | ||
437 | See Documentation/filesystems/devfs/boot-options. | ||
438 | |||
439 | dhash_entries= [KNL] | 458 | dhash_entries= [KNL] |
440 | Set number of hash buckets for dentry cache. | 459 | Set number of hash buckets for dentry cache. |
441 | 460 | ||
@@ -561,8 +580,6 @@ running once the system is up. | |||
561 | gscd= [HW,CD] | 580 | gscd= [HW,CD] |
562 | Format: <io> | 581 | Format: <io> |
563 | 582 | ||
564 | gt96100eth= [NET] MIPS GT96100 Advanced Communication Controller | ||
565 | |||
566 | gus= [HW,OSS] | 583 | gus= [HW,OSS] |
567 | Format: <io>,<irq>,<dma>,<dma16> | 584 | Format: <io>,<irq>,<dma>,<dma16> |
568 | 585 | ||
@@ -685,6 +702,12 @@ running once the system is up. | |||
685 | ips= [HW,SCSI] Adaptec / IBM ServeRAID controller | 702 | ips= [HW,SCSI] Adaptec / IBM ServeRAID controller |
686 | See header of drivers/scsi/ips.c. | 703 | See header of drivers/scsi/ips.c. |
687 | 704 | ||
705 | ports= [IP_VS_FTP] IPVS ftp helper module | ||
706 | Default is 21. | ||
707 | Up to 8 (IP_VS_APP_MAX_PORTS) ports | ||
708 | may be specified. | ||
709 | Format: <port>,<port>.... | ||
710 | |||
688 | irqfixup [HW] | 711 | irqfixup [HW] |
689 | When an interrupt is not handled search all handlers | 712 | When an interrupt is not handled search all handlers |
690 | for it. Intended to get systems with badly broken | 713 | for it. Intended to get systems with badly broken |
@@ -1017,6 +1040,8 @@ running once the system is up. | |||
1017 | 1040 | ||
1018 | nocache [ARM] | 1041 | nocache [ARM] |
1019 | 1042 | ||
1043 | nodelayacct [KNL] Disable per-task delay accounting | ||
1044 | |||
1020 | nodisconnect [HW,SCSI,M68K] Disables SCSI disconnects. | 1045 | nodisconnect [HW,SCSI,M68K] Disables SCSI disconnects. |
1021 | 1046 | ||
1022 | noexec [IA-64] | 1047 | noexec [IA-64] |
@@ -1220,7 +1245,11 @@ running once the system is up. | |||
1220 | bootloader. This is currently used on | 1245 | bootloader. This is currently used on |
1221 | IXP2000 systems where the bus has to be | 1246 | IXP2000 systems where the bus has to be |
1222 | configured a certain way for adjunct CPUs. | 1247 | configured a certain way for adjunct CPUs. |
1223 | 1248 | noearly [X86] Don't do any early type 1 scanning. | |
1249 | This might help on some broken boards which | ||
1250 | machine check when some devices' config space | ||
1251 | is read. But various workarounds are disabled | ||
1252 | and some IOMMU drivers will not work. | ||
1224 | pcmv= [HW,PCMCIA] BadgePAD 4 | 1253 | pcmv= [HW,PCMCIA] BadgePAD 4 |
1225 | 1254 | ||
1226 | pd. [PARIDE] | 1255 | pd. [PARIDE] |
@@ -1302,7 +1331,7 @@ running once the system is up. | |||
1302 | pt. [PARIDE] | 1331 | pt. [PARIDE] |
1303 | See Documentation/paride.txt. | 1332 | See Documentation/paride.txt. |
1304 | 1333 | ||
1305 | quiet= [KNL] Disable log messages | 1334 | quiet [KNL] Disable most log messages |
1306 | 1335 | ||
1307 | r128= [HW,DRM] | 1336 | r128= [HW,DRM] |
1308 | 1337 | ||
@@ -1343,6 +1372,14 @@ running once the system is up. | |||
1343 | 1372 | ||
1344 | reserve= [KNL,BUGS] Force the kernel to ignore some iomem area | 1373 | reserve= [KNL,BUGS] Force the kernel to ignore some iomem area |
1345 | 1374 | ||
1375 | reservetop= [IA-32] | ||
1376 | Format: nn[KMG] | ||
1377 | Reserves a hole at the top of the kernel virtual | ||
1378 | address space. | ||
1379 | |||
1380 | reset_devices [KNL] Force drivers to reset the underlying device | ||
1381 | during initialization. | ||
1382 | |||
1346 | resume= [SWSUSP] | 1383 | resume= [SWSUSP] |
1347 | Specify the partition device for software suspend | 1384 | Specify the partition device for software suspend |
1348 | 1385 | ||
@@ -1617,6 +1654,10 @@ running once the system is up. | |||
1617 | 1654 | ||
1618 | time Show timing data prefixed to each printk message line | 1655 | time Show timing data prefixed to each printk message line |
1619 | 1656 | ||
1657 | clocksource= [GENERIC_TIME] Override the default clocksource | ||
1658 | Override the default clocksource and use the clocksource | ||
1659 | with the name specified. | ||
1660 | |||
1620 | tipar.timeout= [HW,PPT] | 1661 | tipar.timeout= [HW,PPT] |
1621 | Set communications timeout in tenths of a second | 1662 | Set communications timeout in tenths of a second |
1622 | (default 15). | 1663 | (default 15). |
@@ -1658,6 +1699,10 @@ running once the system is up. | |||
1658 | usbhid.mousepoll= | 1699 | usbhid.mousepoll= |
1659 | [USBHID] The interval which mice are to be polled at. | 1700 | [USBHID] The interval which mice are to be polled at. |
1660 | 1701 | ||
1702 | vdso= [IA-32] | ||
1703 | vdso=1: enable VDSO (default) | ||
1704 | vdso=0: disable VDSO mapping | ||
1705 | |||
1661 | video= [FB] Frame buffer configuration | 1706 | video= [FB] Frame buffer configuration |
1662 | See Documentation/fb/modedb.txt. | 1707 | See Documentation/fb/modedb.txt. |
1663 | 1708 | ||
@@ -1674,9 +1719,14 @@ running once the system is up. | |||
1674 | decrease the size and leave more room for directly | 1719 | decrease the size and leave more room for directly |
1675 | mapped kernel RAM. | 1720 | mapped kernel RAM. |
1676 | 1721 | ||
1677 | vmhalt= [KNL,S390] | 1722 | vmhalt= [KNL,S390] Perform z/VM CP command after system halt. |
1723 | Format: <command> | ||
1724 | |||
1725 | vmpanic= [KNL,S390] Perform z/VM CP command after kernel panic. | ||
1726 | Format: <command> | ||
1678 | 1727 | ||
1679 | vmpoff= [KNL,S390] | 1728 | vmpoff= [KNL,S390] Perform z/VM CP command after power off. |
1729 | Format: <command> | ||
1680 | 1730 | ||
1681 | waveartist= [HW,OSS] | 1731 | waveartist= [HW,OSS] |
1682 | Format: <io>,<irq>,<dma>,<dma2> | 1732 | Format: <io>,<irq>,<dma>,<dma2> |
diff --git a/Documentation/keys-request-key.txt b/Documentation/keys-request-key.txt index 22488d791168..c1f64fdf84cb 100644 --- a/Documentation/keys-request-key.txt +++ b/Documentation/keys-request-key.txt | |||
@@ -3,16 +3,23 @@ | |||
3 | =================== | 3 | =================== |
4 | 4 | ||
5 | The key request service is part of the key retention service (refer to | 5 | The key request service is part of the key retention service (refer to |
6 | Documentation/keys.txt). This document explains more fully how that the | 6 | Documentation/keys.txt). This document explains more fully how the requesting |
7 | requesting algorithm works. | 7 | algorithm works. |
8 | 8 | ||
9 | The process starts by either the kernel requesting a service by calling | 9 | The process starts by either the kernel requesting a service by calling |
10 | request_key(): | 10 | request_key*(): |
11 | 11 | ||
12 | struct key *request_key(const struct key_type *type, | 12 | struct key *request_key(const struct key_type *type, |
13 | const char *description, | 13 | const char *description, |
14 | const char *callout_string); | 14 | const char *callout_string); |
15 | 15 | ||
16 | or: | ||
17 | |||
18 | struct key *request_key_with_auxdata(const struct key_type *type, | ||
19 | const char *description, | ||
20 | const char *callout_string, | ||
21 | void *aux); | ||
22 | |||
16 | Or by userspace invoking the request_key system call: | 23 | Or by userspace invoking the request_key system call: |
17 | 24 | ||
18 | key_serial_t request_key(const char *type, | 25 | key_serial_t request_key(const char *type, |
@@ -20,16 +27,26 @@ Or by userspace invoking the request_key system call: | |||
20 | const char *callout_info, | 27 | const char *callout_info, |
21 | key_serial_t dest_keyring); | 28 | key_serial_t dest_keyring); |
22 | 29 | ||
23 | The main difference between the two access points is that the in-kernel | 30 | The main difference between the access points is that the in-kernel interface |
24 | interface does not need to link the key to a keyring to prevent it from being | 31 | does not need to link the key to a keyring to prevent it from being immediately |
25 | immediately destroyed. The kernel interface returns a pointer directly to the | 32 | destroyed. The kernel interface returns a pointer directly to the key, and |
26 | key, and it's up to the caller to destroy the key. | 33 | it's up to the caller to destroy the key. |
34 | |||
35 | The request_key_with_auxdata() call is like the in-kernel request_key() call, | ||
36 | except that it permits auxiliary data to be passed to the upcaller (the default | ||
37 | is NULL). This is only useful for those key types that define their own upcall | ||
38 | mechanism rather than using /sbin/request-key. | ||
27 | 39 | ||
28 | The userspace interface links the key to a keyring associated with the process | 40 | The userspace interface links the key to a keyring associated with the process |
29 | to prevent the key from going away, and returns the serial number of the key to | 41 | to prevent the key from going away, and returns the serial number of the key to |
30 | the caller. | 42 | the caller. |
31 | 43 | ||
32 | 44 | ||
45 | The following example assumes that the key types involved don't define their | ||
46 | own upcall mechanisms. If they do, then those should be substituted for the | ||
47 | forking and execution of /sbin/request-key. | ||
48 | |||
49 | |||
33 | =========== | 50 | =========== |
34 | THE PROCESS | 51 | THE PROCESS |
35 | =========== | 52 | =========== |
@@ -40,8 +57,8 @@ A request proceeds in the following manner: | |||
40 | interface]. | 57 | interface]. |
41 | 58 | ||
42 | (2) request_key() searches the process's subscribed keyrings to see if there's | 59 | (2) request_key() searches the process's subscribed keyrings to see if there's |
43 | a suitable key there. If there is, it returns the key. If there isn't, and | 60 | a suitable key there. If there is, it returns the key. If there isn't, |
44 | callout_info is not set, an error is returned. Otherwise the process | 61 | and callout_info is not set, an error is returned. Otherwise the process |
45 | proceeds to the next step. | 62 | proceeds to the next step. |
46 | 63 | ||
47 | (3) request_key() sees that A doesn't have the desired key yet, so it creates | 64 | (3) request_key() sees that A doesn't have the desired key yet, so it creates |
@@ -62,7 +79,7 @@ A request proceeds in the following manner: | |||
62 | instantiation. | 79 | instantiation. |
63 | 80 | ||
64 | (7) The program may want to access another key from A's context (say a | 81 | (7) The program may want to access another key from A's context (say a |
65 | Kerberos TGT key). It just requests the appropriate key, and the keyring | 82 | Kerberos TGT key). It just requests the appropriate key, and the keyring |
66 | search notes that the session keyring has auth key V in its bottom level. | 83 | search notes that the session keyring has auth key V in its bottom level. |
67 | 84 | ||
68 | This will permit it to then search the keyrings of process A with the | 85 | This will permit it to then search the keyrings of process A with the |
@@ -79,10 +96,11 @@ A request proceeds in the following manner: | |||
79 | (10) The program then exits 0 and request_key() deletes key V and returns key | 96 | (10) The program then exits 0 and request_key() deletes key V and returns key |
80 | U to the caller. | 97 | U to the caller. |
81 | 98 | ||
82 | This also extends further. If key W (step 7 above) didn't exist, key W would be | 99 | This also extends further. If key W (step 7 above) didn't exist, key W would |
83 | created uninstantiated, another auth key (X) would be created (as per step 3) | 100 | be created uninstantiated, another auth key (X) would be created (as per step |
84 | and another copy of /sbin/request-key spawned (as per step 4); but the context | 101 | 3) and another copy of /sbin/request-key spawned (as per step 4); but the |
85 | specified by auth key X will still be process A, as it was in auth key V. | 102 | context specified by auth key X will still be process A, as it was in auth key |
103 | V. | ||
86 | 104 | ||
87 | This is because process A's keyrings can't simply be attached to | 105 | This is because process A's keyrings can't simply be attached to |
88 | /sbin/request-key at the appropriate places because (a) execve will discard two | 106 | /sbin/request-key at the appropriate places because (a) execve will discard two |
@@ -118,17 +136,17 @@ A search of any particular keyring proceeds in the following fashion: | |||
118 | 136 | ||
119 | (2) It considers all the non-keyring keys within that keyring and, if any key | 137 | (2) It considers all the non-keyring keys within that keyring and, if any key |
120 | matches the criteria specified, calls key_permission(SEARCH) on it to see | 138 | matches the criteria specified, calls key_permission(SEARCH) on it to see |
121 | if the key is allowed to be found. If it is, that key is returned; if | 139 | if the key is allowed to be found. If it is, that key is returned; if |
122 | not, the search continues, and the error code is retained if of higher | 140 | not, the search continues, and the error code is retained if of higher |
123 | priority than the one currently set. | 141 | priority than the one currently set. |
124 | 142 | ||
125 | (3) It then considers all the keyring-type keys in the keyring it's currently | 143 | (3) It then considers all the keyring-type keys in the keyring it's currently |
126 | searching. It calls key_permission(SEARCH) on each keyring, and if this | 144 | searching. It calls key_permission(SEARCH) on each keyring, and if this |
127 | grants permission, it recurses, executing steps (2) and (3) on that | 145 | grants permission, it recurses, executing steps (2) and (3) on that |
128 | keyring. | 146 | keyring. |
129 | 147 | ||
130 | The process stops immediately a valid key is found with permission granted to | 148 | The process stops immediately a valid key is found with permission granted to |
131 | use it. Any error from a previous match attempt is discarded and the key is | 149 | use it. Any error from a previous match attempt is discarded and the key is |
132 | returned. | 150 | returned. |
133 | 151 | ||
134 | When search_process_keyrings() is invoked, it performs the following searches | 152 | When search_process_keyrings() is invoked, it performs the following searches |
@@ -153,7 +171,7 @@ The moment one succeeds, all pending errors are discarded and the found key is | |||
153 | returned. | 171 | returned. |
154 | 172 | ||
155 | Only if all these fail does the whole thing fail with the highest priority | 173 | Only if all these fail does the whole thing fail with the highest priority |
156 | error. Note that several errors may have come from LSM. | 174 | error. Note that several errors may have come from LSM. |
157 | 175 | ||
158 | The error priority is: | 176 | The error priority is: |
159 | 177 | ||
diff --git a/Documentation/keys.txt b/Documentation/keys.txt index 3bbe157b45e4..e373f0212843 100644 --- a/Documentation/keys.txt +++ b/Documentation/keys.txt | |||
@@ -241,25 +241,30 @@ The security class "key" has been added to SELinux so that mandatory access | |||
241 | controls can be applied to keys created within various contexts. This support | 241 | controls can be applied to keys created within various contexts. This support |
242 | is preliminary, and is likely to change quite significantly in the near future. | 242 | is preliminary, and is likely to change quite significantly in the near future. |
243 | Currently, all of the basic permissions explained above are provided in SELinux | 243 | Currently, all of the basic permissions explained above are provided in SELinux |
244 | as well; SE Linux is simply invoked after all basic permission checks have been | 244 | as well; SELinux is simply invoked after all basic permission checks have been |
245 | performed. | 245 | performed. |
246 | 246 | ||
247 | Each key is labeled with the same context as the task to which it belongs. | 247 | The value of the file /proc/self/attr/keycreate influences the labeling of |
248 | Typically, this is the same task that was running when the key was created. | 248 | newly-created keys. If the contents of that file correspond to an SELinux |
249 | The default keyrings are handled differently, but in a way that is very | 249 | security context, then the key will be assigned that context. Otherwise, the |
250 | intuitive: | 250 | key will be assigned the current context of the task that invoked the key |
251 | creation request. Tasks must be granted explicit permission to assign a | ||
252 | particular context to newly-created keys, using the "create" permission in the | ||
253 | key security class. | ||
251 | 254 | ||
252 | (*) The user and user session keyrings that are created when the user logs in | 255 | The default keyrings associated with users will be labeled with the default |
253 | are currently labeled with the context of the login manager. | 256 | context of the user if and only if the login programs have been instrumented to |
254 | 257 | properly initialize keycreate during the login process. Otherwise, they will | |
255 | (*) The keyrings associated with new threads are each labeled with the context | 258 | be labeled with the context of the login program itself. |
256 | of their associated thread, and both session and process keyrings are | ||
257 | handled similarly. | ||
258 | 259 | ||
259 | Note, however, that the default keyrings associated with the root user are | 260 | Note, however, that the default keyrings associated with the root user are |
260 | labeled with the default kernel context, since they are created early in the | 261 | labeled with the default kernel context, since they are created early in the |
261 | boot process, before root has a chance to log in. | 262 | boot process, before root has a chance to log in. |
262 | 263 | ||
264 | The keyrings associated with new threads are each labeled with the context of | ||
265 | their associated thread, and both session and process keyrings are handled | ||
266 | similarly. | ||
267 | |||
263 | 268 | ||
264 | ================ | 269 | ================ |
265 | NEW PROCFS FILES | 270 | NEW PROCFS FILES |
@@ -270,9 +275,17 @@ about the status of the key service: | |||
270 | 275 | ||
271 | (*) /proc/keys | 276 | (*) /proc/keys |
272 | 277 | ||
273 | This lists all the keys on the system, giving information about their | 278 | This lists the keys that are currently viewable by the task reading the |
274 | type, description and permissions. The payload of the key is not available | 279 | file, giving information about their type, description and permissions. |
275 | this way: | 280 | It is not possible to view the payload of the key this way, though some |
281 | information about it may be given. | ||
282 | |||
283 | The only keys included in the list are those that grant View permission to | ||
284 | the reading process whether or not it possesses them. Note that LSM | ||
285 | security checks are still performed, and may further filter out keys that | ||
286 | the current process is not authorised to view. | ||
287 | |||
288 | The contents of the file look like this: | ||
276 | 289 | ||
277 | SERIAL FLAGS USAGE EXPY PERM UID GID TYPE DESCRIPTION: SUMMARY | 290 | SERIAL FLAGS USAGE EXPY PERM UID GID TYPE DESCRIPTION: SUMMARY |
278 | 00000001 I----- 39 perm 1f3f0000 0 0 keyring _uid_ses.0: 1/4 | 291 | 00000001 I----- 39 perm 1f3f0000 0 0 keyring _uid_ses.0: 1/4 |
@@ -300,7 +313,7 @@ about the status of the key service: | |||
300 | (*) /proc/key-users | 313 | (*) /proc/key-users |
301 | 314 | ||
302 | This file lists the tracking data for each user that has at least one key | 315 | This file lists the tracking data for each user that has at least one key |
303 | on the system. Such data includes quota information and statistics: | 316 | on the system. Such data includes quota information and statistics: |
304 | 317 | ||
305 | [root@andromeda root]# cat /proc/key-users | 318 | [root@andromeda root]# cat /proc/key-users |
306 | 0: 46 45/45 1/100 13/10000 | 319 | 0: 46 45/45 1/100 13/10000 |
@@ -767,6 +780,17 @@ payload contents" for more information. | |||
767 | See also Documentation/keys-request-key.txt. | 780 | See also Documentation/keys-request-key.txt. |
768 | 781 | ||
769 | 782 | ||
783 | (*) To search for a key, passing auxiliary data to the upcaller, call: | ||
784 | |||
785 | struct key *request_key_with_auxdata(const struct key_type *type, | ||
786 | const char *description, | ||
787 | const char *callout_string, | ||
788 | void *aux); | ||
789 | |||
790 | This is identical to request_key(), except that the auxiliary data is | ||
791 | passed to the key_type->request_key() op if it exists. | ||
792 | |||
793 | |||
770 | (*) When it is no longer required, the key should be released using: | 794 | (*) When it is no longer required, the key should be released using: |
771 | 795 | ||
772 | void key_put(struct key *key); | 796 | void key_put(struct key *key); |
@@ -1018,6 +1042,24 @@ The structure has a number of fields, some of which are mandatory: | |||
1018 | as might happen when the userspace buffer is accessed. | 1042 | as might happen when the userspace buffer is accessed. |
1019 | 1043 | ||
1020 | 1044 | ||
1045 | (*) int (*request_key)(struct key *key, struct key *authkey, const char *op, | ||
1046 | void *aux); | ||
1047 | |||
1048 | This method is optional. If provided, request_key() and | ||
1049 | request_key_with_auxdata() will invoke this function rather than | ||
1050 | upcalling to /sbin/request-key to operate upon a key of this type. | ||
1051 | |||
1052 | The aux parameter is as passed to request_key_with_auxdata() or is NULL | ||
1053 | otherwise. Also passed are the key to be operated upon, the | ||
1054 | authorisation key for this operation and the operation type (currently | ||
1055 | only "create"). | ||
1056 | |||
1057 | This function should return only when the upcall is complete. Upon return | ||
1058 | the authorisation key will be revoked, and the target key will be | ||
1059 | negatively instantiated if it is still uninstantiated. The error will be | ||
1060 | returned to the caller of request_key*(). | ||
1061 | |||
1062 | |||
1021 | ============================ | 1063 | ============================ |
1022 | REQUEST-KEY CALLBACK SERVICE | 1064 | REQUEST-KEY CALLBACK SERVICE |
1023 | ============================ | 1065 | ============================ |
diff --git a/Documentation/kobject.txt b/Documentation/kobject.txt index 8d9bffbd192c..949f7b5a2053 100644 --- a/Documentation/kobject.txt +++ b/Documentation/kobject.txt | |||
@@ -247,7 +247,7 @@ the object-specific fields, which include: | |||
247 | - default_attrs: Default attributes to be exported via sysfs when the | 247 | - default_attrs: Default attributes to be exported via sysfs when the |
248 | object is registered.Note that the last attribute has to be | 248 | object is registered.Note that the last attribute has to be |
249 | initialized to NULL ! You can find a complete implementation | 249 | initialized to NULL ! You can find a complete implementation |
250 | in drivers/block/genhd.c | 250 | in block/genhd.c |
251 | 251 | ||
252 | 252 | ||
253 | Instances of struct kobj_type are not registered; only referenced by | 253 | Instances of struct kobj_type are not registered; only referenced by |
diff --git a/Documentation/lockdep-design.txt b/Documentation/lockdep-design.txt new file mode 100644 index 000000000000..00d93605bfd3 --- /dev/null +++ b/Documentation/lockdep-design.txt | |||
@@ -0,0 +1,197 @@ | |||
1 | Runtime locking correctness validator | ||
2 | ===================================== | ||
3 | |||
4 | started by Ingo Molnar <mingo@redhat.com> | ||
5 | additions by Arjan van de Ven <arjan@linux.intel.com> | ||
6 | |||
7 | Lock-class | ||
8 | ---------- | ||
9 | |||
10 | The basic object the validator operates upon is a 'class' of locks. | ||
11 | |||
12 | A class of locks is a group of locks that are logically the same with | ||
13 | respect to locking rules, even if the locks may have multiple (possibly | ||
14 | tens of thousands of) instantiations. For example a lock in the inode | ||
15 | struct is one class, while each inode has its own instantiation of that | ||
16 | lock class. | ||
17 | |||
18 | The validator tracks the 'state' of lock-classes, and it tracks | ||
19 | dependencies between different lock-classes. The validator maintains a | ||
20 | rolling proof that the state and the dependencies are correct. | ||
21 | |||
22 | Unlike a lock instantiation, the lock-class itself never goes away: when | ||
23 | a lock-class is used for the first time after bootup it gets registered, | ||
24 | and all subsequent uses of that lock-class will be attached to this | ||
25 | lock-class. | ||
26 | |||
27 | State | ||
28 | ----- | ||
29 | |||
30 | The validator tracks lock-class usage history into 5 separate state bits: | ||
31 | |||
32 | - 'ever held in hardirq context' [ == hardirq-safe ] | ||
33 | - 'ever held in softirq context' [ == softirq-safe ] | ||
34 | - 'ever held with hardirqs enabled' [ == hardirq-unsafe ] | ||
35 | - 'ever held with softirqs and hardirqs enabled' [ == softirq-unsafe ] | ||
36 | |||
37 | - 'ever used' [ == !unused ] | ||
38 | |||
39 | Single-lock state rules: | ||
40 | ------------------------ | ||
41 | |||
42 | A softirq-unsafe lock-class is automatically hardirq-unsafe as well. The | ||
43 | following states are exclusive, and only one of them is allowed to be | ||
44 | set for any lock-class: | ||
45 | |||
46 | <hardirq-safe> and <hardirq-unsafe> | ||
47 | <softirq-safe> and <softirq-unsafe> | ||
48 | |||
49 | The validator detects and reports lock usage that violates these | ||
50 | single-lock state rules. | ||
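
The single-lock state rules above can be sketched in userspace C as follows. This is only an illustration under stated assumptions: the bit names and the usage_mark() helper are hypothetical and are not the validator's actual data structures.

```c
#include <stdbool.h>

/* Hypothetical names for the five usage-history bits described above. */
enum {
	USAGE_HARDIRQ_SAFE   = 1u << 0, /* ever held in hardirq context */
	USAGE_SOFTIRQ_SAFE   = 1u << 1, /* ever held in softirq context */
	USAGE_HARDIRQ_UNSAFE = 1u << 2, /* ever held with hardirqs enabled */
	USAGE_SOFTIRQ_UNSAFE = 1u << 3, /* ever held with softirqs and hardirqs enabled */
	USAGE_EVER_USED      = 1u << 4,
};

/*
 * Try to record a new usage bit for a lock-class.  Returns false and
 * leaves *state untouched if the result would combine a "safe" bit
 * with the corresponding "unsafe" bit.
 */
static bool usage_mark(unsigned int *state, unsigned int new_bit)
{
	unsigned int next = *state | new_bit | USAGE_EVER_USED;

	/* A softirq-unsafe lock-class is automatically hardirq-unsafe. */
	if (next & USAGE_SOFTIRQ_UNSAFE)
		next |= USAGE_HARDIRQ_UNSAFE;

	if ((next & USAGE_HARDIRQ_SAFE) && (next & USAGE_HARDIRQ_UNSAFE))
		return false;
	if ((next & USAGE_SOFTIRQ_SAFE) && (next & USAGE_SOFTIRQ_UNSAFE))
		return false;

	*state = next;
	return true;
}
```

A caller would keep one state word per lock-class and report a violation whenever usage_mark() returns false.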
51 | |||
52 | Multi-lock dependency rules: | ||
53 | ---------------------------- | ||
54 | |||
55 | The same lock-class must not be acquired twice, because this could lead | ||
56 | to lock recursion deadlocks. | ||
57 | |||
58 | Furthermore, two locks may not be taken in different order: | ||
59 | |||
60 | <L1> -> <L2> | ||
61 | <L2> -> <L1> | ||
62 | |||
63 | because this could lead to lock inversion deadlocks. (The validator | ||
64 | finds such dependencies in arbitrary complexity, i.e. there can be any | ||
65 | other locking sequence between the acquire-lock operations, the | ||
66 | validator will still track all dependencies between locks.) | ||
67 | |||
68 | Furthermore, the following usage based lock dependencies are not allowed | ||
69 | between any two lock-classes: | ||
70 | |||
71 | <hardirq-safe> -> <hardirq-unsafe> | ||
72 | <softirq-safe> -> <softirq-unsafe> | ||
73 | |||
74 | The first rule comes from the fact that a hardirq-safe lock could be | ||
75 | taken by a hardirq context, interrupting a hardirq-unsafe lock - and | ||
76 | thus could result in a lock inversion deadlock. Likewise, a softirq-safe | ||
77 | lock could be taken by a softirq context, interrupting a softirq-unsafe | ||
78 | lock. | ||
79 | |||
80 | The above rules are enforced for any locking sequence that occurs in the | ||
81 | kernel: when acquiring a new lock, the validator checks whether there is | ||
82 | any rule violation between the new lock and any of the held locks. | ||
83 | |||
84 | When a lock-class changes its state, the following aspects of the above | ||
85 | dependency rules are enforced: | ||
86 | |||
87 | - if a new hardirq-safe lock is discovered, we check whether it | ||
88 | took any hardirq-unsafe lock in the past. | ||
89 | |||
90 | - if a new softirq-safe lock is discovered, we check whether it took | ||
91 | any softirq-unsafe lock in the past. | ||
92 | |||
93 | - if a new hardirq-unsafe lock is discovered, we check whether any | ||
94 | hardirq-safe lock took it in the past. | ||
95 | |||
96 | - if a new softirq-unsafe lock is discovered, we check whether any | ||
97 | softirq-safe lock took it in the past. | ||
98 | |||
99 | (Again, we do these checks too on the basis that an interrupt context | ||
100 | could interrupt _any_ of the irq-unsafe or hardirq-unsafe locks, which | ||
101 | could lead to a lock inversion deadlock - even if that lock scenario did | ||
102 | not trigger in practice yet.) | ||
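
The lock-ordering rule above (no <L1> -> <L2> together with <L2> -> <L1>, directly or through intermediate locks) amounts to keeping a dependency graph between lock-classes free of cycles. The following is a minimal userspace sketch of that idea, assuming small integer class ids; struct dep_graph, dep_add() and MAX_CLASSES are hypothetical names, and the real validator is considerably more elaborate.

```c
#include <stdbool.h>
#include <string.h>

#define MAX_CLASSES 16	/* hypothetical limit; class ids must be below this */

struct dep_graph {
	/* edge[a][b] is true if class b was ever taken while a was held */
	bool edge[MAX_CLASSES][MAX_CLASSES];
};

static void dep_graph_init(struct dep_graph *g)
{
	memset(g, 0, sizeof(*g));
}

/* Depth-first search: is 'to' reachable from 'from' via recorded edges? */
static bool dep_reachable(const struct dep_graph *g, int from, int to,
			  bool *visited)
{
	if (from == to)
		return true;
	visited[from] = true;
	for (int next = 0; next < MAX_CLASSES; next++) {
		if (g->edge[from][next] && !visited[next] &&
		    dep_reachable(g, next, to, visited))
			return true;
	}
	return false;
}

/*
 * Record the dependency "held -> acquired".  Returns false, recording
 * nothing, if this is same-class recursion or if 'held' is already
 * reachable from 'acquired' (the new edge would close a cycle).
 */
static bool dep_add(struct dep_graph *g, int held, int acquired)
{
	bool visited[MAX_CLASSES] = { false };

	if (held == acquired)
		return false;
	if (dep_reachable(g, acquired, held, visited))
		return false;
	g->edge[held][acquired] = true;
	return true;
}
```

Because the check walks all recorded edges, an inversion is found even when the conflicting acquisitions happened at different times and in different tasks.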
103 | |||
104 | Exception: Nested data dependencies leading to nested locking | ||
105 | ------------------------------------------------------------- | ||
106 | |||
107 | There are a few cases where the Linux kernel acquires more than one | ||
108 | instance of the same lock-class. Such cases typically happen when there | ||
109 | is some sort of hierarchy within objects of the same type. In these | ||
110 | cases there is an inherent "natural" ordering between the two objects | ||
111 | (defined by the properties of the hierarchy), and the kernel grabs the | ||
112 | locks in this fixed order on each of the objects. | ||
113 | |||
114 | An example of such an object hierarchy that results in "nested locking" | ||
115 | is that of a "whole disk" block-dev object and a "partition" block-dev | ||
116 | object; the partition is "part of" the whole device and as long as one | ||
117 | always takes the whole disk lock as a higher lock than the partition | ||
118 | lock, the lock ordering is fully correct. The validator does not | ||
119 | automatically detect this natural ordering, as the locking rule behind | ||
120 | the ordering is not static. | ||
121 | |||
122 | In order to teach the validator about this correct usage model, new | ||
123 | versions of the various locking primitives were added that allow you to | ||
124 | specify a "nesting level". An example call, for the block device mutex, | ||
125 | looks like this: | ||
126 | |||
127 | enum bdev_bd_mutex_lock_class | ||
128 | { | ||
129 | BD_MUTEX_NORMAL, | ||
130 | BD_MUTEX_WHOLE, | ||
131 | BD_MUTEX_PARTITION | ||
132 | }; | ||
133 | |||
134 | mutex_lock_nested(&bdev->bd_contains->bd_mutex, BD_MUTEX_PARTITION); | ||
135 | |||
136 | In this case the locking is done on a bdev object that is known to be a | ||
137 | partition. | ||
138 | |||
139 | The validator treats a lock that is taken in such a nested fashion as a | ||
140 | separate (sub)class for the purposes of validation. | ||
141 | |||
142 | Note: When changing code to use the _nested() primitives, be careful and | ||
143 | check really thoroughly that the hierarchy is correctly mapped; otherwise | ||
144 | you can get false positives or false negatives. | ||
145 | |||
146 | Proof of 100% correctness: | ||
147 | -------------------------- | ||
148 | |||
149 | The validator achieves perfect, mathematical 'closure' (proof of locking | ||
150 | correctness) in the sense that for every simple, standalone single-task | ||
151 | locking sequence that occurred at least once during the lifetime of the | ||
152 | kernel, the validator proves it with a 100% certainty that no | ||
153 | combination and timing of these locking sequences can cause any class of | ||
154 | lock related deadlock. [*] | ||
155 | |||
156 | I.e. complex multi-CPU and multi-task locking scenarios do not have to | ||
157 | occur in practice to prove a deadlock: only the simple 'component' | ||
158 | locking chains have to occur at least once (anytime, in any | ||
159 | task/context) for the validator to be able to prove correctness. (For | ||
160 | example, complex deadlocks that would normally need more than 3 CPUs and | ||
161 | a very unlikely constellation of tasks, irq-contexts and timings to | ||
162 | occur, can be detected on a plain, lightly loaded single-CPU system as | ||
163 | well!) | ||
164 | |||
165 | This radically decreases the complexity of locking related QA of the | ||
166 | kernel: what has to be done during QA is to trigger as many "simple" | ||
167 | single-task locking dependencies in the kernel as possible, at least | ||
168 | once, to prove locking correctness - instead of having to trigger every | ||
169 | possible combination of locking interaction between CPUs, combined with | ||
170 | every possible hardirq and softirq nesting scenario (which is impossible | ||
171 | to do in practice). | ||
172 | |||
173 | [*] assuming that the validator itself is 100% correct, and no other | ||
174 | part of the system corrupts the state of the validator in any way. | ||
175 | We also assume that all NMI/SMM paths [which could interrupt | ||
176 | even hardirq-disabled codepaths] are correct and do not interfere | ||
177 | with the validator. We also assume that the 64-bit 'chain hash' | ||
178 | value is unique for every lock-chain in the system. Also, lock | ||
179 | recursion must not be higher than 20. | ||
180 | |||
181 | Performance: | ||
182 | ------------ | ||
183 | |||
184 | The above rules require _massive_ amounts of runtime checking. If we did | ||
185 | that for every lock taken and for every irqs-enable event, it would | ||
186 | render the system practically unusably slow. The complexity of checking | ||
187 | is O(N^2), so even with just a few hundred lock-classes we'd have to do | ||
188 | tens of thousands of checks for every event. | ||
189 | |||
190 | This problem is solved by checking any given 'locking scenario' (unique | ||
191 | sequence of locks taken after each other) only once. A simple stack of | ||
192 | held locks is maintained, and a lightweight 64-bit hash value is | ||
193 | calculated, which hash is unique for every lock chain. The hash value, | ||
194 | when the chain is validated for the first time, is then put into a hash | ||
195 | table, which hash-table can be checked in a lockfree manner. If the | ||
196 | locking chain occurs again later on, the hash table tells us that we | ||
197 | don't have to validate the chain again. | ||
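
The "validate each chain only once" scheme described above might be sketched in userspace C as below. This is an assumption-laden illustration: the FNV-1a style mixing, the table size and all function names are hypothetical, and unlike the table described in the text this one is not designed for lockfree access.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CHAIN_TABLE_SIZE 1024	/* hypothetical size, must be a power of two */

struct chain_cache {
	uint64_t keys[CHAIN_TABLE_SIZE];
	bool used[CHAIN_TABLE_SIZE];
	size_t count;
};

/* Fold one lock-class id into the running 64-bit hash (FNV-1a style). */
static uint64_t chain_hash_step(uint64_t hash, uint32_t class_id)
{
	for (int i = 0; i < 4; i++) {
		hash ^= (class_id >> (8 * i)) & 0xff;
		hash *= 0x100000001b3ULL;
	}
	return hash;
}

/* Hash the stack of held lock-classes, in acquisition order. */
static uint64_t chain_hash(const uint32_t *classes, size_t n)
{
	uint64_t hash = 0xcbf29ce484222325ULL;

	for (size_t i = 0; i < n; i++)
		hash = chain_hash_step(hash, classes[i]);
	return hash;
}

/*
 * Returns true if this chain hash was seen before, so validation can be
 * skipped.  Returns false if it is new; the hash is then recorded and
 * the caller is expected to run the full validation once.
 */
static bool chain_cache_lookup_or_insert(struct chain_cache *c, uint64_t key)
{
	size_t slot = (size_t)(key & (CHAIN_TABLE_SIZE - 1));

	for (size_t probes = 0; probes < CHAIN_TABLE_SIZE; probes++) {
		if (!c->used[slot]) {
			c->used[slot] = true;
			c->keys[slot] = key;
			c->count++;
			return false;
		}
		if (c->keys[slot] == key)
			return true;
		slot = (slot + 1) & (CHAIN_TABLE_SIZE - 1);
	}
	/* Table full: report "new" so the caller validates again. */
	return false;
}
```

Only the first occurrence of a given chain pays for the expensive check; later occurrences cost one hash computation and a table probe.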
diff --git a/Documentation/md.txt b/Documentation/md.txt index 03a13c462cf2..0668f9dc9d29 100644 --- a/Documentation/md.txt +++ b/Documentation/md.txt | |||
@@ -200,6 +200,17 @@ All md devices contain: | |||
200 | This can be written only while the array is being assembled, not | 200 | This can be written only while the array is being assembled, not |
201 | after it is started. | 201 | after it is started. |
202 | 202 | ||
203 | layout | ||
204 | The "layout" for the array for the particular level. This is | ||
205 | simply a number that is interpreted differently by different | ||
206 | levels. It can be written while assembling an array. | ||
207 | |||
208 | resync_start | ||
209 | The point at which resync should start. If no resync is needed, | ||
210 | this will be a very large number. At array creation it will | ||
211 | default to 0, though starting the array as 'clean' will | ||
212 | set it much larger. | ||
213 | |||
203 | new_dev | 214 | new_dev |
204 | This file can be written but not read. The value written should | 215 | This file can be written but not read. The value written should |
205 | be a block device number as major:minor. e.g. 8:0 | 216 | be a block device number as major:minor. e.g. 8:0 |
@@ -207,6 +218,54 @@ All md devices contain: | |||
207 | available. It will then appear at md/dev-XXX (depending on the | 218 | available. It will then appear at md/dev-XXX (depending on the |
208 | name of the device) and further configuration is then possible. | 219 | name of the device) and further configuration is then possible. |
209 | 220 | ||
221 | safe_mode_delay | ||
222 | When an md array has seen no write requests for a certain period | ||
223 | of time, it will be marked as 'clean'. When another write | ||
224 | request arrives, the array is marked as 'dirty' before the write | ||
225 | commences. This is known as 'safe_mode'. | ||
226 | The 'certain period' is controlled by this file which stores the | ||
227 | period as a number of seconds. The default is 200msec (0.200). | ||
228 | Writing a value of 0 disables safemode. | ||
229 | |||
230 | array_state | ||
231 | This file contains a single word which describes the current | ||
232 | state of the array. In many cases, the state can be set by | ||
233 | writing the word for the desired state, however some states | ||
234 | cannot be explicitly set, and some transitions are not allowed. | ||
235 | |||
236 | clear | ||
237 | No devices, no size, no level | ||
238 | Writing is equivalent to STOP_ARRAY ioctl | ||
239 | inactive | ||
240 | May have some settings, but array is not active | ||
241 | all IO results in error | ||
242 | When written, doesn't tear down array, but just stops it | ||
243 | suspended (not supported yet) | ||
244 | All IO requests will block. The array can be reconfigured. | ||
245 | Writing this, if accepted, will block until array is quiescent | ||
246 | readonly | ||
247 | no resync can happen. no superblocks get written. | ||
248 | write requests fail | ||
249 | read-auto | ||
250 | like readonly, but behaves like 'clean' on a write request. | ||
251 | |||
252 | clean - no pending writes, but otherwise active. | ||
253 | When written to inactive array, starts without resync | ||
254 | If a write request arrives then | ||
255 | if metadata is known, mark 'dirty' and switch to 'active'. | ||
256 | if not known, block and switch to write-pending | ||
257 | If written to an active array that has pending writes, then fails. | ||
258 | active | ||
259 | fully active: IO and resync can be happening. | ||
260 | When written to inactive array, starts with resync | ||
261 | |||
262 | write-pending | ||
263 | clean, but writes are blocked waiting for 'active' to be written. | ||
264 | |||
265 | active-idle | ||
266 | like active, but no writes have been seen for a while (safe_mode_delay). | ||
267 | |||
268 | |||
210 | sync_speed_min | 269 | sync_speed_min |
211 | sync_speed_max | 270 | sync_speed_max |
212 | These are similar to /proc/sys/dev/raid/speed_limit_{min,max} | 271 | These are similar to /proc/sys/dev/raid/speed_limit_{min,max} |
@@ -250,10 +309,18 @@ Each directory contains: | |||
250 | faulty - device has been kicked from active use due to | 309 | faulty - device has been kicked from active use due to |
251 | a detected fault | 310 | a detected fault |
252 | in_sync - device is a fully in-sync member of the array | 311 | in_sync - device is a fully in-sync member of the array |
312 | writemostly - device will only be subject to read | ||
313 | requests if there are no other options. | ||
314 | This applies only to raid1 arrays. | ||
253 | spare - device is working, but not a full member. | 315 | spare - device is working, but not a full member. |
254 | This includes spares that are in the process | 316 | This includes spares that are in the process |
255 | of being recovered to | 317 | of being recovered to |
256 | This list may grow in future. | 318 | This list may grow in future. |
319 | This can be written to. | ||
320 | Writing "faulty" simulates a failure on the device. | ||
321 | Writing "remove" removes the device from the array. | ||
322 | Writing "writemostly" sets the writemostly flag. | ||
323 | Writing "-writemostly" clears the writemostly flag. | ||
257 | 324 | ||
258 | errors | 325 | errors |
259 | An approximate count of read errors that have been detected on | 326 | An approximate count of read errors that have been detected on |
diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt index 4710845dbac4..46b9b389df35 100644 --- a/Documentation/memory-barriers.txt +++ b/Documentation/memory-barriers.txt | |||
@@ -262,9 +262,14 @@ What is required is some way of intervening to instruct the compiler and the | |||
262 | CPU to restrict the order. | 262 | CPU to restrict the order. |
263 | 263 | ||
264 | Memory barriers are such interventions. They impose a perceived partial | 264 | Memory barriers are such interventions. They impose a perceived partial |
265 | ordering between the memory operations specified on either side of the barrier. | 265 | ordering over the memory operations on either side of the barrier. |
266 | They request that the sequence of memory events generated appears to other | 266 | |
267 | parts of the system as if the barrier is effective on that CPU. | 267 | Such enforcement is important because the CPUs and other devices in a system |
268 | can use a variety of tricks to improve performance - including reordering, | ||
269 | deferral and combination of memory operations; speculative loads; speculative | ||
270 | branch prediction and various types of caching. Memory barriers are used to | ||
271 | override or suppress these tricks, allowing the code to sanely control the | ||
272 | interaction of multiple CPUs and/or devices. | ||
268 | 273 | ||
269 | 274 | ||
270 | VARIETIES OF MEMORY BARRIER | 275 | VARIETIES OF MEMORY BARRIER |
@@ -282,7 +287,7 @@ Memory barriers come in four basic varieties: | |||
282 | A write barrier is a partial ordering on stores only; it is not required | 287 | A write barrier is a partial ordering on stores only; it is not required |
283 | to have any effect on loads. | 288 | to have any effect on loads. |
284 | 289 | ||
285 | A CPU can be viewed as as commiting a sequence of store operations to the | 290 | A CPU can be viewed as committing a sequence of store operations to the |
286 | memory system as time progresses. All stores before a write barrier will | 291 | memory system as time progresses. All stores before a write barrier will |
287 | occur in the sequence _before_ all the stores after the write barrier. | 292 | occur in the sequence _before_ all the stores after the write barrier. |
288 | 293 | ||
@@ -413,7 +418,7 @@ There are certain things that the Linux kernel memory barriers do not guarantee: | |||
413 | indirect effect will be the order in which the second CPU sees the effects | 418 | indirect effect will be the order in which the second CPU sees the effects |
414 | of the first CPU's accesses occur, but see the next point: | 419 | of the first CPU's accesses occur, but see the next point: |
415 | 420 | ||
416 | (*) There is no guarantee that the a CPU will see the correct order of effects | 421 | (*) There is no guarantee that a CPU will see the correct order of effects |
417 | from a second CPU's accesses, even _if_ the second CPU uses a memory | 422 | from a second CPU's accesses, even _if_ the second CPU uses a memory |
418 | barrier, unless the first CPU _also_ uses a matching memory barrier (see | 423 | barrier, unless the first CPU _also_ uses a matching memory barrier (see |
419 | the subsection on "SMP Barrier Pairing"). | 424 | the subsection on "SMP Barrier Pairing"). |
@@ -461,8 +466,8 @@ Whilst this may seem like a failure of coherency or causality maintenance, it | |||
461 | isn't, and this behaviour can be observed on certain real CPUs (such as the DEC | 466 | isn't, and this behaviour can be observed on certain real CPUs (such as the DEC |
462 | Alpha). | 467 | Alpha). |
463 | 468 | ||
464 | To deal with this, a data dependency barrier must be inserted between the | 469 | To deal with this, a data dependency barrier or better must be inserted |
465 | address load and the data load: | 470 | between the address load and the data load: |
466 | 471 | ||
467 | CPU 1 CPU 2 | 472 | CPU 1 CPU 2 |
468 | =============== =============== | 473 | =============== =============== |
@@ -484,7 +489,7 @@ lines. The pointer P might be stored in an odd-numbered cache line, and the | |||
484 | variable B might be stored in an even-numbered cache line. Then, if the | 489 | variable B might be stored in an even-numbered cache line. Then, if the |
485 | even-numbered bank of the reading CPU's cache is extremely busy while the | 490 | even-numbered bank of the reading CPU's cache is extremely busy while the |
486 | odd-numbered bank is idle, one can see the new value of the pointer P (&B), | 491 | odd-numbered bank is idle, one can see the new value of the pointer P (&B), |
487 | but the old value of the variable B (1). | 492 | but the old value of the variable B (2). |
488 | 493 | ||
489 | 494 | ||
490 | Another example of where data dependency barriers might be required is where a | 495 | Another example of where data dependency barriers might be required is where a |
@@ -597,7 +602,7 @@ Consider the following sequence of events: | |||
597 | 602 | ||
598 | This sequence of events is committed to the memory coherence system in an order | 603 | This sequence of events is committed to the memory coherence system in an order |
599 | that the rest of the system might perceive as the unordered set of { STORE A, | 604 | that the rest of the system might perceive as the unordered set of { STORE A, |
600 | STORE B, STORE C } all occuring before the unordered set of { STORE D, STORE E | 605 | STORE B, STORE C } all occurring before the unordered set of { STORE D, STORE E |
601 | }: | 606 | }: |
602 | 607 | ||
603 | +-------+ : : | 608 | +-------+ : : |
@@ -744,7 +749,7 @@ some effectively random order, despite the write barrier issued by CPU 1: | |||
744 | : : | 749 | : : |
745 | 750 | ||
746 | 751 | ||
747 | If, however, a read barrier were to be placed between the load of E and the | 752 | If, however, a read barrier were to be placed between the load of B and the |
748 | load of A on CPU 2: | 753 | load of A on CPU 2: |
749 | 754 | ||
750 | CPU 1 CPU 2 | 755 | CPU 1 CPU 2 |
@@ -1010,10 +1015,9 @@ CPU from reordering them. | |||
1010 | There are some more advanced barrier functions: | 1015 | There are some more advanced barrier functions: |
1011 | 1016 | ||
1012 | (*) set_mb(var, value) | 1017 | (*) set_mb(var, value) |
1013 | (*) set_wmb(var, value) | ||
1014 | 1018 | ||
1015 | These assign the value to the variable and then insert at least a write | 1019 | This assigns the value to the variable and then inserts at least a write |
1016 | barrier after it, depending on the function. They aren't guaranteed to | 1020 | barrier after it, depending on the function. It isn't guaranteed to |
1017 | insert anything more than a compiler barrier in a UP compilation. | 1021 | insert anything more than a compiler barrier in a UP compilation. |
1018 | 1022 | ||
1019 | 1023 | ||
@@ -1461,9 +1465,8 @@ instruction itself is complete. | |||
1461 | 1465 | ||
1462 | On a UP system - where this wouldn't be a problem - the smp_mb() is just a | 1466 | On a UP system - where this wouldn't be a problem - the smp_mb() is just a |
1463 | compiler barrier, thus making sure the compiler emits the instructions in the | 1467 | compiler barrier, thus making sure the compiler emits the instructions in the |
1464 | right order without actually intervening in the CPU. Since there there's only | 1468 | right order without actually intervening in the CPU. Since there's only one |
1465 | one CPU, that CPU's dependency ordering logic will take care of everything | 1469 | CPU, that CPU's dependency ordering logic will take care of everything else. |
1466 | else. | ||
1467 | 1470 | ||
1468 | 1471 | ||
1469 | ATOMIC OPERATIONS | 1472 | ATOMIC OPERATIONS |
@@ -1640,9 +1643,9 @@ functions: | |||
1640 | 1643 | ||
1641 | The PCI bus, amongst others, defines an I/O space concept - which on such | 1644 | The PCI bus, amongst others, defines an I/O space concept - which on such |
1642 | CPUs as i386 and x86_64 cpus readily maps to the CPU's concept of I/O | 1645 | CPUs as i386 and x86_64 cpus readily maps to the CPU's concept of I/O |
1643 | space. However, it may also mapped as a virtual I/O space in the CPU's | 1646 | space. However, it may also be mapped as a virtual I/O space in the CPU's |
1644 | memory map, particularly on those CPUs that don't support alternate | 1647 | memory map, particularly on those CPUs that don't support alternate I/O |
1645 | I/O spaces. | 1648 | spaces. |
1646 | 1649 | ||
1647 | Accesses to this space may be fully synchronous (as on i386), but | 1650 | Accesses to this space may be fully synchronous (as on i386), but |
1648 | intermediary bridges (such as the PCI host bridge) may not fully honour | 1651 | intermediary bridges (such as the PCI host bridge) may not fully honour |
diff --git a/Documentation/mips/time.README b/Documentation/mips/time.README index 70bc0dd43d6d..69ddc5c14b79 100644 --- a/Documentation/mips/time.README +++ b/Documentation/mips/time.README | |||
@@ -65,7 +65,7 @@ the following functions or values: | |||
65 | 1. (optional) set up RTC routines | 65 | 1. (optional) set up RTC routines |
66 | 2. (optional) calibrate and set the mips_counter_frequency | 66 | 2. (optional) calibrate and set the mips_counter_frequency |
67 | 67 | ||
68 | b) board_timer_setup - a function pointer. Invoked at the end of time_init() | 68 | b) plat_timer_setup - a function pointer. Invoked at the end of time_init() |
69 | 1. (optional) over-ride any decisions made in time_init() | 69 | 1. (optional) over-ride any decisions made in time_init() |
70 | 2. set up the irqaction for timer interrupt. | 70 | 2. set up the irqaction for timer interrupt. |
71 | 3. enable the timer interrupt | 71 | 3. enable the timer interrupt |
@@ -116,19 +116,17 @@ Step 2: the machine setup() function | |||
116 | 116 | ||
117 | If you supply board_time_init(), set the function pointer. | 117 | If you supply board_time_init(), set the function pointer. |
118 | 118 | ||
119 | Set the function pointer board_timer_setup() (mandatory) | ||
120 | 119 | ||
121 | 120 | Step 3: implement rtc routines, board_time_init() and plat_timer_setup() | |
122 | Step 3: implement rtc routines, board_time_init() and board_timer_setup() | ||
123 | if needed. | 121 | if needed. |
124 | 122 | ||
125 | board_time_init() - | 123 | board_time_init() - |
126 | a) (optional) set up RTC routines, | 124 | a) (optional) set up RTC routines, |
127 | b) (optional) calibrate and set the mips_counter_frequency | 125 | b) (optional) calibrate and set the mips_counter_frequency |
128 | (only needed if you intended to use fixed_rate_gettimeoffset | 126 | (only needed if you intended to use fixed_rate_gettimeoffset |
129 | or use cpu counter as timer interrupt source) | 127 | or use cpu counter as timer interrupt source) |
130 | 128 | ||
131 | board_timer_setup() - | 129 | plat_timer_setup() - |
132 | a) (optional) over-write any choices made above by time_init(). | 130 | a) (optional) over-write any choices made above by time_init(). |
133 | b) machine specific code should setup the timer irqaction. | 131 | b) machine specific code should setup the timer irqaction. |
134 | c) enable the timer interrupt | 132 | c) enable the timer interrupt |
diff --git a/Documentation/netlabel/00-INDEX b/Documentation/netlabel/00-INDEX new file mode 100644 index 000000000000..837bf35990e2 --- /dev/null +++ b/Documentation/netlabel/00-INDEX | |||
@@ -0,0 +1,10 @@ | |||
1 | 00-INDEX | ||
2 | - this file. | ||
3 | cipso_ipv4.txt | ||
4 | - documentation on the IPv4 CIPSO protocol engine. | ||
5 | draft-ietf-cipso-ipsecurity-01.txt | ||
6 | - IETF draft of the CIPSO protocol, dated 16 July 1992. | ||
7 | introduction.txt | ||
8 | - NetLabel introduction, READ THIS FIRST. | ||
9 | lsm_interface.txt | ||
10 | - documentation on the NetLabel kernel security module API. | ||
diff --git a/Documentation/netlabel/cipso_ipv4.txt b/Documentation/netlabel/cipso_ipv4.txt new file mode 100644 index 000000000000..93dacb132c3c --- /dev/null +++ b/Documentation/netlabel/cipso_ipv4.txt | |||
@@ -0,0 +1,48 @@ | |||
1 | NetLabel CIPSO/IPv4 Protocol Engine | ||
2 | ============================================================================== | ||
3 | Paul Moore, paul.moore@hp.com | ||
4 | |||
5 | May 17, 2006 | ||
6 | |||
7 | * Overview | ||
8 | |||
9 | The NetLabel CIPSO/IPv4 protocol engine is based on the IETF Commercial IP | ||
10 | Security Option (CIPSO) draft from July 16, 1992. A copy of this draft can be | ||
11 | found in this directory, consult '00-INDEX' for the filename. While the IETF | ||
12 | draft never made it to an RFC standard it has become a de-facto standard for | ||
13 | labeled networking and is used in many trusted operating systems. | ||
14 | |||
15 | * Outbound Packet Processing | ||
16 | |||
17 | The CIPSO/IPv4 protocol engine applies the CIPSO IP option to packets by | ||
18 | adding the CIPSO label to the socket. This causes all packets leaving the | ||
19 | system through the socket to have the CIPSO IP option applied. The socket's | ||
20 | CIPSO label can be changed at any point in time, however, it is recommended | ||
21 | that it is set upon the socket's creation. The LSM can set the socket's CIPSO | ||
22 | label by using the NetLabel security module API; if the NetLabel "domain" is | ||
23 | configured to use CIPSO for packet labeling then a CIPSO IP option will be | ||
24 | generated and attached to the socket. | ||
25 | |||
26 | * Inbound Packet Processing | ||
27 | |||
28 | The CIPSO/IPv4 protocol engine validates every CIPSO IP option it finds at the | ||
29 | IP layer without any special handling required by the LSM. However, in order | ||
30 | to decode and translate the CIPSO label on the packet the LSM must use the | ||
31 | NetLabel security module API to extract the security attributes of the packet. | ||
32 | This is typically done at the socket layer using the 'socket_sock_rcv_skb()' | ||
33 | LSM hook. | ||
34 | |||
35 | * Label Translation | ||
36 | |||
37 | The CIPSO/IPv4 protocol engine contains a mechanism to translate CIPSO security | ||
38 | attributes such as sensitivity level and category to values which are | ||
39 | appropriate for the host. These mappings are defined as part of a CIPSO | ||
40 | Domain Of Interpretation (DOI) definition and are configured through the | ||
41 | NetLabel user space communication layer. Each DOI definition can have a | ||
42 | different security attribute mapping table. | ||
43 | |||
44 | * Label Translation Cache | ||
45 | |||
46 | The NetLabel system provides a framework for caching security attribute | ||
47 | mappings from the network labels to the corresponding LSM identifiers. The | ||
48 | CIPSO/IPv4 protocol engine supports this caching mechanism. | ||
diff --git a/Documentation/netlabel/draft-ietf-cipso-ipsecurity-01.txt b/Documentation/netlabel/draft-ietf-cipso-ipsecurity-01.txt new file mode 100644 index 000000000000..256c2c9d4f50 --- /dev/null +++ b/Documentation/netlabel/draft-ietf-cipso-ipsecurity-01.txt | |||
@@ -0,0 +1,791 @@ | |||
1 | IETF CIPSO Working Group | ||
2 | 16 July, 1992 | ||
3 | |||
4 | |||
5 | |||
6 | COMMERCIAL IP SECURITY OPTION (CIPSO 2.2) | ||
7 | |||
8 | |||
9 | |||
10 | 1. Status | ||
11 | |||
12 | This Internet Draft provides the high level specification for a Commercial | ||
13 | IP Security Option (CIPSO). This draft reflects the version as approved by | ||
14 | the CIPSO IETF Working Group. Distribution of this memo is unlimited. | ||
15 | |||
16 | This document is an Internet Draft. Internet Drafts are working documents | ||
17 | of the Internet Engineering Task Force (IETF), its Areas, and its Working | ||
18 | Groups. Note that other groups may also distribute working documents as | ||
19 | Internet Drafts. | ||
20 | |||
21 | Internet Drafts are draft documents valid for a maximum of six months. | ||
22 | Internet Drafts may be updated, replaced, or obsoleted by other documents | ||
23 | at any time. It is not appropriate to use Internet Drafts as reference | ||
24 | material or to cite them other than as a "working draft" or "work in | ||
25 | progress." | ||
26 | |||
27 | Please check the I-D abstract listing contained in each Internet Draft | ||
28 | directory to learn the current status of this or any other Internet Draft. | ||
29 | |||
30 | |||
31 | |||
32 | |||
33 | 2. Background | ||
34 | |||
35 | Currently the Internet Protocol includes two security options. One of | ||
36 | these options is the DoD Basic Security Option (BSO) (Type 130) which allows | ||
37 | IP datagrams to be labeled with security classifications. This option | ||
38 | provides sixteen security classifications and a variable number of handling | ||
39 | restrictions. To handle additional security information, such as security | ||
40 | categories or compartments, another security option (Type 133) exists and | ||
41 | is referred to as the DoD Extended Security Option (ESO). The values for | ||
42 | the fixed fields within these two options are administered by the Defense | ||
43 | Information Systems Agency (DISA). | ||
44 | |||
45 | Computer vendors are now building commercial operating systems with | ||
46 | mandatory access controls and multi-level security. These systems are | ||
47 | no longer built specifically for a particular group in the defense or | ||
48 | intelligence communities. They are generally available commercial systems | ||
49 | for use in a variety of government and civil sector environments. | ||
50 | |||
51 | The small number of ESO format codes can not support all the possible | ||
52 | applications of a commercial security option. The BSO and ESO were | ||
53 | designed to only support the United States DoD. CIPSO has been designed | ||
54 | to support multiple security policies. This Internet Draft provides the | ||
55 | format and procedures required to support a Mandatory Access Control | ||
56 | security policy. Support for additional security policies shall be | ||
57 | defined in future RFCs. | ||
58 | |||
59 | |||
60 | |||
61 | |||
62 | Internet Draft, Expires 15 Jan 93 [PAGE 1] | ||
63 | |||
64 | |||
65 | |||
66 | CIPSO INTERNET DRAFT 16 July, 1992 | ||
67 | |||
68 | |||
69 | |||
70 | |||
71 | 3. CIPSO Format | ||
72 | |||
73 | Option type: 134 (Class 0, Number 6, Copy on Fragmentation) | ||
74 | Option length: Variable | ||
75 | |||
76 | This option permits security related information to be passed between | ||
77 | systems within a single Domain of Interpretation (DOI). A DOI is a | ||
78 | collection of systems which agree on the meaning of particular values | ||
79 | in the security option. An authority that has been assigned a DOI | ||
80 | identifier will define a mapping between appropriate CIPSO field values | ||
81 | and their human readable equivalent. This authority will distribute that | ||
82 | mapping to hosts within the authority's domain. These mappings may be | ||
83 | sensitive, therefore a DOI authority is not required to make these | ||
84 | mappings available to anyone other than the systems that are included in | ||
85 | the DOI. | ||
86 | |||
87 | This option MUST be copied on fragmentation. This option appears at most | ||
88 | once in a datagram. All multi-octet fields in the option are defined to be | ||
89 | transmitted in network byte order. The format of this option is as follows: | ||
90 | |||
91 | +----------+----------+------//------+-----------//---------+ | ||
92 | | 10000110 | LLLLLLLL | DDDDDDDDDDDD | TTTTTTTTTTTTTTTTTTTT | | ||
93 | +----------+----------+------//------+-----------//---------+ | ||
94 | |||
95 | TYPE=134 OPTION DOMAIN OF TAGS | ||
96 | LENGTH INTERPRETATION | ||
97 | |||
98 | |||
99 | Figure 1. CIPSO Format | ||
100 | |||
101 | |||
102 | 3.1 Type | ||
103 | |||
104 | This field is 1 octet in length. Its value is 134. | ||
105 | |||
106 | |||
107 | 3.2 Length | ||
108 | |||
109 | This field is 1 octet in length. It is the total length of the option | ||
110 | including the type and length fields. With the current IP header length | ||
111 | restriction of 40 octets the value of this field MUST not exceed 40. | ||
112 | |||
113 | |||
114 | 3.3 Domain of Interpretation Identifier | ||
115 | |||
116 | This field is an unsigned 32 bit integer. The value 0 is reserved and MUST | ||
117 | not appear as the DOI identifier in any CIPSO option. Implementations | ||
118 | should assume that the DOI identifier field is not aligned on any particular | ||
119 | byte boundary. | ||
120 | |||
121 | To conserve space in the protocol, security levels and categories are | ||
122 | represented by numbers rather than their ASCII equivalent. This requires | ||
123 | a mapping table within CIPSO hosts to map these numbers to their | ||
124 | corresponding ASCII representations. Non-related groups of systems may | ||
125 | |||
126 | |||
127 | |||
128 | Internet Draft, Expires 15 Jan 93 [PAGE 2] | ||
129 | |||
130 | |||
131 | |||
132 | CIPSO INTERNET DRAFT 16 July, 1992 | ||
133 | |||
134 | |||
135 | |||
136 | have their own unique mappings. For example, one group of systems may | ||
137 | use the number 5 to represent Unclassified while another group may use the | ||
138 | number 1 to represent that same security level. The DOI identifier is used | ||
139 | to identify which mapping was used for the values within the option. | ||
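The option layout in sections 3.1 to 3.3 can be sketched as a small parser. This is only an illustration under stated assumptions, not code from the draft or from NetLabel: the function name parse_cipso_header and the choice to raise ValueError are hypothetical. It reads the type, length, and DOI fields from a raw option and returns the DOI together with the remaining tag octets.

```python
import struct

CIPSO_OPTION_TYPE = 134  # option type from section 3.1
CIPSO_HEADER_LEN = 6     # type (1) + length (1) + DOI (4)
IP_OPTIONS_MAX = 40      # current IP limit on the options area


def parse_cipso_header(option: bytes):
    """Split a raw CIPSO option into (doi, tag_bytes).

    Hypothetical helper: the name and the error handling are assumptions.
    """
    if len(option) < CIPSO_HEADER_LEN:
        raise ValueError("CIPSO option shorter than its 6-octet header")
    opt_type, opt_len = option[0], option[1]
    if opt_type != CIPSO_OPTION_TYPE:
        raise ValueError("not a CIPSO option")
    # The length field counts the type and length octets themselves.
    if not CIPSO_HEADER_LEN <= opt_len <= min(IP_OPTIONS_MAX, len(option)):
        raise ValueError("invalid option length")
    # "!I" reads an unsigned 32-bit integer in network byte order. Slicing
    # the bytes first means no alignment of the DOI field is assumed.
    (doi,) = struct.unpack("!I", option[2:6])
    if doi == 0:
        raise ValueError("DOI identifier 0 is reserved")
    return doi, option[CIPSO_HEADER_LEN:opt_len]
```

A caller would look up the returned DOI in its own mapping table before interpreting any level or category numbers found in the tags.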
140 | |||
141 | |||
142 | 3.4 Tag Types | ||
143 | |||
144 | A common format for passing security related information is necessary | ||
145 | for interoperability. CIPSO uses sets of "tags" to contain the security | ||
146 | information relevant to the data in the IP packet. Each tag begins with | ||
147 | a tag type identifier followed by the length of the tag and ends with the | ||
148 | actual security information to be passed. All multi-octet fields in a tag | ||
149 | are defined to be transmitted in network byte order. Like the DOI | ||
150 | identifier field in the CIPSO header, implementations should assume that | ||
151 | all tags, as well as fields within a tag, are not aligned on any particular | ||
152 | octet boundary. The tag types defined in this document contain alignment | ||
153 | bytes to assist alignment of some information, however alignment can not | ||
154 | be guaranteed if CIPSO is not the first IP option. | ||
155 | |||
156 | CIPSO tag types 0 through 127 are reserved for defining standard tag | ||
157 | formats. Their definitions will be published in RFCs. Tag types whose | ||
158 | identifiers are greater than 127 are defined by the DOI authority and may | ||
159 | only be meaningful in certain Domains of Interpretation. For these tag | ||
160 | types, implementations will require the DOI identifier as well as the tag | ||
161 | number to determine the security policy and the format associated with the | ||
162 | tag. Use of tag types above 127 is restricted to closed networks where | | |
163 | interoperability with other networks will not be an issue. Implementations | ||
164 | that support a tag type greater than 127 MUST support at least one DOI that | ||
165 | requires only tag types 1 to 127. | ||
166 | |||
167 | Tag type 0 is reserved. Tag types 1, 2, and 5 are defined in this | ||
168 | Internet Draft. Types 3 and 4 are reserved for work in progress. | ||
169 | The standard format for all current and future CIPSO tags is shown below: | ||
170 | |||
171 | +----------+----------+--------//--------+ | ||
172 | | TTTTTTTT | LLLLLLLL | IIIIIIIIIIIIIIII | | ||
173 | +----------+----------+--------//--------+ | ||
174 | TAG TAG TAG | ||
175 | TYPE LENGTH INFORMATION | ||
176 | |||
177 | Figure 2: Standard Tag Format | ||
178 | |||
179 | In the three tag types described in this document, the length and count | ||
180 | restrictions are based on the current IP limitation of 40 octets for all | ||
181 | IP options. If the IP header is later expanded, then the length and count | ||
182 | restrictions specified in this document may increase to use the full area | ||
183 | provided for IP options. | ||
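The standard tag format in Figure 2 is a type/length/information layout, so the tag area of an option can be walked one tag at a time. The sketch below assumes the tag octets have already been separated from the 6-octet option header; the name iter_tags is hypothetical.

```python
def iter_tags(tag_bytes: bytes):
    """Yield (tag_type, information) for each tag in a CIPSO tag area.

    Hypothetical helper; it checks lengths only and does not interpret
    the information field.
    """
    offset = 0
    while offset < len(tag_bytes):
        if offset + 2 > len(tag_bytes):
            raise ValueError("truncated tag header")
        tag_type = tag_bytes[offset]
        tag_len = tag_bytes[offset + 1]
        # The tag length counts the type and length octets themselves.
        if tag_len < 2 or offset + tag_len > len(tag_bytes):
            raise ValueError("invalid tag length")
        yield tag_type, tag_bytes[offset + 2:offset + tag_len]
        offset += tag_len
```

Because the walk relies only on the length field, a receiver can locate the boundaries of a tag type it does not understand; whether such a tag is then rejected or ignored is a separate policy decision.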
184 | |||
185 | |||
186 | 3.4.1 Tag Type Classes | ||
187 | |||
188 | Tag classes consist of tag types that have common processing requirements | ||
189 | and support the same security policy. The three tags defined in this | ||
190 | Internet Draft belong to the Mandatory Access Control (MAC) Sensitivity | ||
191 | |||
192 | |||
193 | |||
194 | Internet Draft, Expires 15 Jan 93 [PAGE 3] | ||
195 | |||
196 | |||
197 | |||
198 | CIPSO INTERNET DRAFT 16 July, 1992 | ||
199 | |||
200 | |||
201 | |||
202 | class and support the MAC Sensitivity security policy. | ||
203 | |||
204 | |||
205 | 3.4.2 Tag Type 1 | ||
206 | |||
207 | This is referred to as the "bit-mapped" tag type. Tag type 1 is included | ||
208 | in the MAC Sensitivity tag type class. The format of this tag type is as | ||
209 | follows: | ||
210 | |||
211 | +----------+----------+----------+----------+--------//---------+ | ||
212 | | 00000001 | LLLLLLLL | 00000000 | LLLLLLLL | CCCCCCCCCCCCCCCCC | | ||
213 | +----------+----------+----------+----------+--------//---------+ | ||
214 | |||
215 | TAG TAG ALIGNMENT SENSITIVITY BIT MAP OF | ||
216 | TYPE LENGTH OCTET LEVEL CATEGORIES | ||
217 | |||
218 | Figure 3. Tag Type 1 Format | ||
219 | |||
220 | |||
221 | 3.4.2.1 Tag Type | ||
222 | |||
223 | This field is 1 octet in length and has a value of 1. | ||
224 | |||
225 | |||
226 | 3.4.2.2 Tag Length | ||
227 | |||
228 | This field is 1 octet in length. It is the total length of the tag type | ||
229 | including the type and length fields. With the current IP header length | ||
230 | restriction of 40 bytes the value within this field is between 4 and 34. | ||
231 | |||
232 | |||
233 | 3.4.2.3 Alignment Octet | ||
234 | |||
235 | This field is 1 octet in length and always has the value of 0. Its purpose | ||
236 | is to align the category bitmap field on an even octet boundary. This will | ||
237 | speed many implementations including router implementations. | ||
238 | |||
239 | |||
240 | 3.4.2.4 Sensitivity Level | ||
241 | |||
242 | This field is 1 octet in length. Its value is from 0 to 255. The values | ||
243 | are ordered with 0 being the minimum value and 255 representing the maximum | ||
244 | value. | ||
245 | |||
246 | |||
247 | 3.4.2.5 Bit Map of Categories | ||
248 | |||
249 | The length of this field is variable and ranges from 0 to 30 octets. This | ||
250 | provides representation of categories 0 to 239. The ordering of the bits | ||
251 | is left to right or MSB to LSB. For example category 0 is represented by | ||
252 | the most significant bit of the first byte and category 15 is represented | ||
253 | by the least significant bit of the second byte. Figure 4 graphically | ||
254 | shows this ordering. Bit N is binary 1 if category N is part of the label | ||
255 | for the datagram, and bit N is binary 0 if category N is not part of the | ||
256 | label. Except for the optimized tag 1 format described in the next section, | ||
257 | |||
258 | |||
259 | |||
260 | Internet Draft, Expires 15 Jan 93 [PAGE 4] | ||
261 | |||
262 | |||
263 | |||
264 | CIPSO INTERNET DRAFT 16 July, 1992 | ||
265 | |||
266 | |||
267 | |||
268 | minimal encoding SHOULD be used resulting in no trailing zero octets in the | ||
269 | category bitmap. | ||
270 | |||
271 | octet 0 octet 1 octet 2 octet 3 octet 4 octet 5 | ||
272 | XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX . . . | ||
273 | bit 01234567 89111111 11112222 22222233 33333333 44444444 | ||
274 | number 012345 67890123 45678901 23456789 01234567 | ||
275 | |||
276 | Figure 4. Ordering of Bits in Tag 1 Bit Map | ||
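The bit ordering shown in Figure 4 can be illustrated with a pair of conversion helpers. This is a sketch, and the function names are hypothetical. Category N lives in octet N // 8, and within that octet the most significant bit is used first. The encoder emits the minimal form, with no trailing zero octets.

```python
def categories_to_bitmap(categories):
    """Encode category numbers (0 to 239) as a minimal tag type 1 bitmap."""
    cats = sorted(set(categories))
    if not cats:
        return b""
    if cats[0] < 0 or cats[-1] > 239:
        raise ValueError("tag type 1 can only represent categories 0 to 239")
    # Size the bitmap by the highest category so no trailing zero octets
    # are produced.
    bitmap = bytearray(cats[-1] // 8 + 1)
    for cat in cats:
        # 0x80 is the MSB; category 0 maps to it, category 7 to the LSB.
        bitmap[cat // 8] |= 0x80 >> (cat % 8)
    return bytes(bitmap)


def bitmap_to_categories(bitmap: bytes):
    """Decode a tag type 1 bitmap into an ascending list of categories."""
    return [
        index * 8 + bit
        for index, octet in enumerate(bitmap)
        for bit in range(8)
        if octet & (0x80 >> bit)
    ]
```

With these helpers, categories 0 and 15 encode to the two octets 0x80 0x01, which matches the example in the text above.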
277 | |||
278 | |||
279 | 3.4.2.6 Optimized Tag 1 Format | ||
280 | |||
281 | Routers work most efficiently when processing fixed length fields. To | ||
282 | support these routers there is an optimized form of tag type 1. The format | ||
283 | does not change. The only change is to the category bitmap which is set to | ||
284 | a constant length of 10 octets. Trailing octets required to fill out the 10 | ||
285 | octets are zero filled. Ten octets, allowing for 80 categories, was chosen | ||
286 | because it makes the total length of the CIPSO option 20 octets. If CIPSO | ||
287 | is the only option then the option will be full word aligned and additional | ||
288 | filler octets will not be required. | ||
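The 20-octet figure follows from the field sizes: 6 octets of option header (type, length, DOI), 4 fixed octets of tag header (tag type, tag length, alignment octet, sensitivity level), and the 10-octet bitmap. A minimal sketch that builds such an option, with a hypothetical function name and argument order, is:

```python
import struct


def build_optimized_tag1_option(doi: int, level: int, bitmap: bytes = b"") -> bytes:
    """Build a CIPSO option carrying one optimized tag type 1.

    Hypothetical helper; 'bitmap' is a category bitmap of at most 10
    octets in the MSB-first ordering of Figure 4.
    """
    if doi == 0:
        raise ValueError("DOI identifier 0 is reserved")
    if not 0 <= level <= 255:
        raise ValueError("sensitivity level must fit in one octet")
    if len(bitmap) > 10:
        raise ValueError("optimized tag type 1 holds at most 80 categories")
    padded = bitmap.ljust(10, b"\x00")  # zero fill the trailing octets
    # Tag header: type 1, tag length, alignment octet 0, sensitivity level.
    tag = bytes([1, 4 + len(padded), 0, level]) + padded
    # Option header: type 134, total length, DOI in network byte order.
    return bytes([134, 6 + len(tag)]) + struct.pack("!I", doi) + tag
```

The result is always 20 octets, which is five 32-bit words, so no filler is needed when CIPSO is the only IP option.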
289 | |||
290 | |||
291 | 3.4.3 Tag Type 2 | ||
292 | |||
293 | This is referred to as the "enumerated" tag type. It is used to describe | ||
294 | large but sparsely populated sets of categories. Tag type 2 is in the MAC | ||
295 | Sensitivity tag type class. The format of this tag type is as follows: | ||
296 | |||
297 | +----------+----------+----------+----------+-------------//-------------+ | ||
298 | | 00000010 | LLLLLLLL | 00000000 | LLLLLLLL | CCCCCCCCCCCCCCCCCCCCCCCCCC | | ||
299 | +----------+----------+----------+----------+-------------//-------------+ | ||
300 | |||
301 | TAG TAG ALIGNMENT SENSITIVITY ENUMERATED | ||
302 | TYPE LENGTH OCTET LEVEL CATEGORIES | ||
303 | |||
304 | Figure 5. Tag Type 2 Format | ||
305 | |||
306 | |||
307 | 3.4.3.1 Tag Type | ||
308 | |||
309 | This field is one octet in length and has a value of 2. | ||
310 | |||
311 | |||
312 | 3.4.3.2 Tag Length | ||
313 | |||
314 | This field is 1 octet in length. It is the total length of the tag type | ||
315 | including the type and length fields. With the current IP header length | ||
316 | restriction of 40 bytes the value within this field is between 4 and 34. | ||
317 | |||
318 | |||
319 | 3.4.3.3 Alignment Octet | ||
320 | |||
321 | This field is 1 octet in length and always has the value of 0. Its purpose | ||
322 | is to align the category field on an even octet boundary. This will | ||
323 | |||
324 | |||
325 | |||
326 | Internet Draft, Expires 15 Jan 93 [PAGE 5] | ||
327 | |||
328 | |||
329 | |||
330 | CIPSO INTERNET DRAFT 16 July, 1992 | ||
331 | |||
332 | |||
333 | |||
334 | speed many implementations including router implementations. | ||
335 | |||
336 | |||
337 | 3.4.3.4 Sensitivity Level | ||
338 | |||
339 | This field is 1 octet in length. Its value is from 0 to 255. The values | ||
340 | are ordered with 0 being the minimum value and 255 representing the | ||
341 | maximum value. | ||
342 | |||
343 | |||
344 | 3.4.3.5 Enumerated Categories | ||
345 | |||
346 | In this tag, categories are represented by their actual value rather than | ||
347 | by their position within a bit field. The length of each category is 2 | ||
348 | octets. Up to 15 categories may be represented by this tag. Valid values | ||
349 | for categories are 0 to 65534. Category 65535 is not a valid category | ||
350 | value. The categories MUST be listed in ascending order within the tag. | ||
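A sketch of building a tag type 2 from the rules above (2 octets per category, at most 15 categories, values 0 to 65534, ascending order). The function name build_tag2 is hypothetical, and removing duplicates before encoding is an assumption of this example rather than something the draft specifies.

```python
import struct


def build_tag2(level: int, categories) -> bytes:
    """Build an "enumerated" tag (type 2) for the given level and categories."""
    if not 0 <= level <= 255:
        raise ValueError("sensitivity level must fit in one octet")
    cats = sorted(set(categories))  # ascending order is required
    if len(cats) > 15:
        raise ValueError("tag type 2 holds at most 15 categories")
    if any(cat < 0 or cat > 65534 for cat in cats):
        raise ValueError("valid categories are 0 to 65534")
    # Each category is a 2-octet unsigned value in network byte order.
    body = b"".join(struct.pack("!H", cat) for cat in cats)
    # Tag header: type 2, tag length, alignment octet 0, sensitivity level.
    return bytes([2, 4 + len(body), 0, level]) + body
```

With the full 15 categories the tag is 4 + 30 = 34 octets long, which is the upper bound given for the tag length field.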
351 | |||
352 | |||
353 | 3.4.4 Tag Type 5 | ||
354 | |||
355 | This is referred to as the "range" tag type. It is used to represent | ||
356 | labels where all categories in a range, or set of ranges, are included | ||
357 | in the sensitivity label. Tag type 5 is in the MAC Sensitivity tag type | ||
358 | class. The format of this tag type is as follows: | ||
359 | |||
360 | +----------+----------+----------+----------+------------//-------------+ | ||
361 | | 00000101 | LLLLLLLL | 00000000 | LLLLLLLL | Top/Bottom | Top/Bottom | | ||
362 | +----------+----------+----------+----------+------------//-------------+ | ||
363 | |||
364 | TAG TAG ALIGNMENT SENSITIVITY CATEGORY RANGES | ||
365 | TYPE LENGTH OCTET LEVEL | ||
366 | |||
367 | Figure 6. Tag Type 5 Format | ||
368 | |||
369 | |||
370 | 3.4.4.1 Tag Type | ||
371 | |||
372 | This field is one octet in length and has a value of 5. | ||
373 | |||
374 | |||
375 | 3.4.4.2 Tag Length | ||
376 | |||
377 | This field is 1 octet in length. It is the total length of the tag type | ||
378 | including the type and length fields. With the current IP header length | ||
379 | restriction of 40 bytes the value within this field is between 4 and 34. | ||
380 | |||
381 | |||
382 | 3.4.4.3 Alignment Octet | ||
383 | |||
384 | This field is 1 octet in length and always has the value of 0. Its purpose | ||
385 | is to align the category range field on an even octet boundary. This will | ||
386 | speed many implementations including router implementations. | ||
387 | |||
388 | |||
389 | |||
390 | |||
391 | |||
392 | Internet Draft, Expires 15 Jan 93 [PAGE 6] | ||
393 | |||
394 | |||
395 | |||
396 | CIPSO INTERNET DRAFT 16 July, 1992 | ||
397 | |||
398 | |||
399 | |||
400 | 3.4.4.4 Sensitivity Level | ||
401 | |||
402 | This field is 1 octet in length. Its value is from 0 to 255. The values | ||
403 | are ordered with 0 being the minimum value and 255 representing the maximum | ||
404 | value. | ||
405 | |||
406 | |||
407 | 3.4.4.5 Category Ranges | ||
408 | |||
409 | A category range is a 4 octet field comprised of the 2 octet index of the | ||
410 | highest numbered category followed by the 2 octet index of the lowest | ||
411 | numbered category. These range endpoints are inclusive within the range of | ||
412 | categories. All categories within a range are included in the sensitivity | ||
413 | label. This tag may contain a maximum of 7 category pairs. The bottom | ||
414 | category endpoint for the last pair in the tag MAY be omitted and SHOULD be | ||
415 | assumed to be 0. The ranges MUST be non-overlapping and be listed in | ||
416 | descending order. Valid values for categories are 0 to 65534. Category | ||
417 | 65535 is not a valid category value. | ||
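The range rules can be sketched as a decoder for the category ranges field, that is, the octets following the 4-octet tag header. The name parse_tag5_ranges is hypothetical. The limit of 7 pairs is taken from the text of this section, and treating a trailing 2-octet value as a range whose bottom is 0 follows the MAY/SHOULD wording above.

```python
import struct


def parse_tag5_ranges(range_bytes: bytes):
    """Decode the category ranges of a tag type 5 into (top, bottom) pairs."""
    if len(range_bytes) % 4 not in (0, 2):
        raise ValueError("ranges must be 4 octets, or 2 octets for the last")
    ranges = []
    for offset in range(0, len(range_bytes), 4):
        chunk = range_bytes[offset:offset + 4]
        if len(chunk) == 4:
            top, bottom = struct.unpack("!HH", chunk)
        else:
            # Bottom endpoint omitted on the last pair: assume 0.
            (top,) = struct.unpack("!H", chunk)
            bottom = 0
        ranges.append((top, bottom))
    if len(ranges) > 7:
        raise ValueError("tag type 5 holds at most 7 category pairs")
    previous_bottom = None
    for top, bottom in ranges:
        if top > 65534 or bottom > top:
            raise ValueError("invalid category range")
        # Descending, non-overlapping: each range lies below the previous one.
        if previous_bottom is not None and top >= previous_bottom:
            raise ValueError("ranges must be descending and non-overlapping")
        previous_bottom = bottom
    return ranges
```

For example, the six octets 00 14 00 0A 00 05 decode to the ranges 20 down to 10 and 5 down to 0.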
418 | |||
419 | |||
420 | 3.4.5 Minimum Requirements | ||
421 | |||
422 | A CIPSO implementation MUST be capable of generating at least tag type 1 in | ||
423 | the non-optimized form. In addition, a CIPSO implementation MUST be able | ||
424 | to receive any valid tag type 1 even those using the optimized tag type 1 | ||
425 | format. | ||
426 | |||
427 | |||
428 | 4. Configuration Parameters | ||
429 | |||
430 | The configuration parameters defined below are required for all CIPSO hosts, | ||
431 | gateways, and routers that support multiple sensitivity labels. A CIPSO | ||
432 | host is defined to be the origination or destination system for an IP | ||
433 | datagram. A CIPSO gateway provides IP routing services between two or more | ||
434 | IP networks and may be required to perform label translations between | ||
435 | networks. A CIPSO gateway may be an enhanced CIPSO host or it may just | ||
436 | provide gateway services with no end system CIPSO capabilities. A CIPSO | ||
437 | router is a dedicated IP router that routes IP datagrams between two or more | ||
438 | IP networks. | ||
439 | |||
440 | An implementation of CIPSO on a host MUST have the capability to reject a | ||
441 | datagram for reasons that the information contained can not be adequately | ||
442 | protected by the receiving host or if acceptance may result in violation of | ||
443 | the host or network security policy. In addition, a CIPSO gateway or router | ||
444 | MUST be able to reject datagrams going to networks that can not provide | ||
445 | adequate protection or may violate the network's security policy. To | ||
446 | provide this capability the following minimal set of configuration | ||
447 | parameters are required for CIPSO implementations: | ||
448 | |||
449 | HOST_LABEL_MAX - This parameter contains the maximum sensitivity label that | ||
450 | a CIPSO host is authorized to handle. All datagrams that have a label | ||
451 | greater than this maximum MUST be rejected by the CIPSO host. This | ||
452 | parameter does not apply to CIPSO gateways or routers. This parameter need | ||
453 | not be defined explicitly as it can be implicitly derived from the | ||
454 | PORT_LABEL_MAX parameters for the associated interfaces. | ||
455 | |||
456 | |||
457 | |||
458 | Internet Draft, Expires 15 Jan 93 [PAGE 7] | ||
459 | |||
460 | |||
461 | |||
462 | CIPSO INTERNET DRAFT 16 July, 1992 | ||
463 | |||
464 | |||
465 | |||
466 | |||
467 | HOST_LABEL_MIN - This parameter contains the minimum sensitivity label that | ||
468 | a CIPSO host is authorized to handle. All datagrams that have a label less | ||
469 | than this minimum MUST be rejected by the CIPSO host. This parameter does | ||
470 | not apply to CIPSO gateways or routers. This parameter need not be defined | ||
471 | explicitly as it can be implicitly derived from the PORT_LABEL_MIN | ||
472 | parameters for the associated interfaces. | ||
473 | |||
474 | PORT_LABEL_MAX - This parameter contains the maximum sensitivity label for | ||
475 | all datagrams that may exit a particular network interface port. All | ||
476 | outgoing datagrams that have a label greater than this maximum MUST be | ||
477 | rejected by the CIPSO system. The label within this parameter MUST be | ||
478 | less than or equal to the label within the HOST_LABEL_MAX parameter. This | ||
479 | parameter does not apply to CIPSO hosts that support only one network port. | ||
480 | |||
481 | PORT_LABEL_MIN - This parameter contains the minimum sensitivity label for | ||
482 | all datagrams that may exit a particular network interface port. All | ||
483 | outgoing datagrams that have a label less than this minimum MUST be | ||
484 | rejected by the CIPSO system. The label within this parameter MUST be | ||
485 | greater than or equal to the label within the HOST_LABEL_MIN parameter. | ||
486 | This parameter does not apply to CIPSO hosts that support only one network | ||
487 | port. | ||
488 | |||
489 | PORT_DOI - This parameter is used to assign a DOI identifier value to a | ||
490 | particular network interface port. All CIPSO labels within datagrams | ||
491 | going out this port MUST use the specified DOI identifier. All CIPSO | ||
492 | hosts and gateways MUST support either this parameter, the NET_DOI | ||
493 | parameter, or the HOST_DOI parameter. | ||
494 | |||
495 | NET_DOI - This parameter is used to assign a DOI identifier value to a | ||
496 | particular IP network address. All CIPSO labels within datagrams destined | ||
497 | for the particular IP network MUST use the specified DOI identifier. All | ||
498 | CIPSO hosts and gateways MUST support either this parameter, the PORT_DOI | ||
499 | parameter, or the HOST_DOI parameter. | ||
500 | |||
501 | HOST_DOI - This parameter is used to assign a DOI identifier value to a | ||
502 | particular IP host address. All CIPSO labels within datagrams destined for | ||
503 | the particular IP host will use the specified DOI identifier. All CIPSO | ||
504 | hosts and gateways MUST support either this parameter, the PORT_DOI | ||
505 | parameter, or the NET_DOI parameter. | ||
506 | |||
507 | This list represents the minimal set of configuration parameters required | ||
508 | to be compliant. Implementors are encouraged to add to this list to | ||
509 | provide enhanced functionality and control. For example, many security | ||
510 | policies may require both incoming and outgoing datagrams be checked against | ||
511 | the port and host label ranges. | ||
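The MIN/MAX checks above compare whole sensitivity labels, but the draft does not spell out how two labels are ordered. The sketch below assumes the usual MAC "dominance" relation (higher or equal level and a superset of categories). That relation, the Label type, and the function names are all assumptions of this example.

```python
from typing import FrozenSet, NamedTuple


class Label(NamedTuple):
    """Hypothetical local form of a label: a level and a set of categories."""
    level: int
    categories: FrozenSet[int]


def dominates(a: Label, b: Label) -> bool:
    """True if label a is at least as sensitive as label b (assumed rule)."""
    return a.level >= b.level and a.categories >= b.categories


def in_range(label: Label, low: Label, high: Label) -> bool:
    """True if low <= label <= high under the assumed dominance ordering."""
    return dominates(label, low) and dominates(high, label)
```

A host would use in_range() with HOST_LABEL_MIN and HOST_LABEL_MAX, and a multi-port system with the PORT_LABEL_MIN and PORT_LABEL_MAX values of the interface in question; a datagram whose label falls outside the range is rejected.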
512 | |||
513 | |||
514 | 4.1 Port Range Parameters | ||
515 | |||
516 | The labels represented by the PORT_LABEL_MAX and PORT_LABEL_MIN parameters | ||
517 | MAY be in CIPSO or local format. Some CIPSO systems, such as routers, may | ||
518 | want to have the range parameters expressed in CIPSO format so that incoming | ||
519 | labels do not have to be converted to a local format before being compared | ||
520 | against the range. If multiple DOIs are supported by one of these CIPSO | ||
521 | |||
522 | |||
523 | |||
524 | Internet Draft, Expires 15 Jan 93 [PAGE 8] | ||
525 | |||
526 | |||
527 | |||
528 | CIPSO INTERNET DRAFT 16 July, 1992 | ||
529 | |||
530 | |||
531 | |||
532 | systems then multiple port range parameters would be needed, one set for | ||
533 | each DOI supported on a particular port. | ||
534 | |||
535 | The port range will usually represent the total set of labels that may | ||
536 | exist on the logical network accessed through the corresponding network | ||
537 | interface. It may, however, represent a subset of these labels that are | ||
538 | allowed to enter the CIPSO system. | ||
539 | |||
540 | |||
541 | 4.2 Single Label CIPSO Hosts | ||
542 | |||
543 | CIPSO implementations that support only one label are not required to | ||
544 | support the parameters described above. These limited implementations are | ||
545 | only required to support a NET_LABEL parameter. This parameter contains | ||
546 | the CIPSO label that may be inserted in datagrams that exit the host. In | ||
547 | addition, the host MUST reject any incoming datagram that has a label which | ||
548 | is not equivalent to the NET_LABEL parameter. | ||
549 | |||
550 | |||
551 | 5. Handling Procedures | ||
552 | |||
553 | This section describes the processing requirements for incoming and | ||
554 | outgoing IP datagrams. Just providing the correct CIPSO label format | ||
555 | is not enough. Assumptions will be made by one system on how a | ||
556 | receiving system will handle the CIPSO label. Wrong assumptions may | ||
557 | lead to non-interoperability or even a security incident. The | ||
558 | requirements described below represent the minimal set needed for | ||
559 | interoperability and that provide users some level of confidence. | ||
560 | Many other requirements could be added to increase user confidence, | ||
561 | however at the risk of restricting creativity and limiting vendor | ||
562 | participation. | ||
563 | |||
564 | |||
565 | 5.1 Input Procedures | ||
566 | |||
567 | All datagrams received through a network port MUST have a security label | ||
568 | associated with them, either contained in the datagram or assigned to the | ||
569 | receiving port. Without this label the host, gateway, or router will not | ||
570 | have the information it needs to make security decisions. This security | ||
571 | label will be obtained from the CIPSO if the option is present in the | ||
572 | datagram. See section 5.1.2 for handling procedures for unlabeled | | |
573 | datagrams. This label will be compared against the PORT (if appropriate) | | |
574 | and HOST configuration parameters defined in section 4. | | |
575 | |||
576 | If any field within the CIPSO option, such as the DOI identifier, is not | ||
577 | recognized the IP datagram is discarded and an ICMP "parameter problem" | ||
578 | (type 12) is generated and returned. The ICMP code field is set to "bad | ||
579 | parameter" (code 0) and the pointer is set to the start of the CIPSO field | ||
580 | that is unrecognized. | ||
581 | |||
582 | If the contents of the CIPSO are valid but the security label is | ||
583 | outside of the configured host or port label range, the datagram is | ||
584 | discarded and an ICMP "destination unreachable" (type 3) is generated | ||
585 | and returned. The code field of the ICMP is set to "communication with | ||
586 | destination network administratively prohibited" (code 9) or to | ||
598 | "communication with destination host administratively prohibited" | ||
599 | (code 10). The value of the code field used is dependent upon whether | ||
600 | the originator of the ICMP message is acting as a CIPSO host or a CIPSO | ||
601 | gateway. The recipient of the ICMP message MUST be able to handle either | ||
602 | value. The same procedure is performed if a CIPSO can not be added to an | ||
603 | IP packet because it is too large to fit in the IP options area. | ||
604 | |||
605 | If the error is triggered by receipt of an ICMP message, the message | ||
606 | is discarded and no response is permitted (consistent with general ICMP | ||
607 | processing rules). | ||
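
The input rules of section 5.1 can be read as a small decision procedure. The Python sketch below is illustrative only: the function name `input_verdict`, its parameters, and the reduction of a label to a single integer level are assumptions made for this example and are not defined by the draft. The draft also does not say which of codes 9 and 10 a host or a gateway sends; the mapping used here (gateway sends 9, host sends 10) is a guess based on the code names.

```python
ICMP_DEST_UNREACHABLE = 3      # "destination unreachable"
ICMP_PARAMETER_PROBLEM = 12    # "parameter problem"
CODE_BAD_PARAMETER = 0
CODE_NET_PROHIBITED = 9        # assumed here to be sent by a CIPSO gateway
CODE_HOST_PROHIBITED = 10      # assumed here to be sent by a CIPSO host


def input_verdict(recognized, level, range_min, range_max,
                  is_gateway=False, triggered_by_icmp=False):
    """Return (action, icmp) where icmp is None or an (type, code) pair.

    recognized        -- True if every field of the CIPSO option was understood
    level             -- hypothetical integer stand-in for the security label
    range_min/max     -- configured host or port label range
    triggered_by_icmp -- True if the offending datagram was itself an ICMP message
    """
    if recognized and range_min <= level <= range_max:
        return ("accept", None)
    if triggered_by_icmp:
        # Never answer an ICMP message with another ICMP message.
        return ("discard", None)
    if not recognized:
        return ("discard", (ICMP_PARAMETER_PROBLEM, CODE_BAD_PARAMETER))
    code = CODE_NET_PROHIBITED if is_gateway else CODE_HOST_PROHIBITED
    return ("discard", (ICMP_DEST_UNREACHABLE, code))
```

A receiver of these errors has to accept either code 9 or code 10, as the text above requires, so the host/gateway guess only affects the sender side of the sketch.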
608 | |||
609 | |||
610 | 5.1.1 Unrecognized tag types | ||
611 | |||
612 | The default condition for any CIPSO implementation is that an | ||
613 | unrecognized tag type MUST be treated as a "parameter problem" and | ||
614 | handled as described in section 5.1. A CIPSO implementation MAY allow | ||
615 | the system administrator to identify tag types that may safely be | ||
616 | ignored. This capability is an allowable enhancement, not a | ||
617 | requirement. | ||
618 | |||
619 | |||
620 | 5.1.2 Unlabeled Packets | ||
621 | |||
622 | A network port may be configured to not require a CIPSO label for all | ||
623 | incoming datagrams. For this configuration a CIPSO label must be | ||
624 | assigned to that network port and associated with all unlabeled IP | ||
625 | datagrams. This capability might be used for single level networks or | ||
626 | networks that have CIPSO and non-CIPSO hosts and the non-CIPSO hosts | ||
627 | all operate at the same label. | ||
628 | |||
629 | If a CIPSO option is required and none is found, the datagram is | ||
630 | discarded and an ICMP "parameter problem" (type 12) is generated and | ||
631 | returned to the originator of the datagram. The code field of the ICMP | ||
632 | is set to "option missing" (code 1) and the ICMP pointer is set to 134 | ||
633 | (the value of the option type for the missing CIPSO option). | ||
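
The two cases of section 5.1.2 can be sketched as follows. This is a minimal illustration, not an implementation: `label_for_datagram` and its parameters are hypothetical names, and a label is treated as an opaque value. Only the ICMP type, code, and pointer values come from the text above.

```python
CIPSO_OPTION_TYPE = 134          # option type of the CIPSO option
ICMP_PARAMETER_PROBLEM = 12
CODE_OPTION_MISSING = 1


def label_for_datagram(cipso_label, port_requires_cipso, port_default_label):
    """Return (label, icmp) for an incoming datagram.

    cipso_label is None when the datagram carries no CIPSO option.
    icmp is None or a (type, code, pointer) triple to send to the originator.
    """
    if cipso_label is not None:
        return cipso_label, None
    if port_requires_cipso:
        # Required option is missing: discard and report it.
        return None, (ICMP_PARAMETER_PROBLEM, CODE_OPTION_MISSING,
                      CIPSO_OPTION_TYPE)
    # Unlabeled datagrams inherit the label assigned to the port.
    return port_default_label, None
```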
634 | |||
635 | |||
636 | 5.2 Output Procedures | ||
637 | |||
638 | A CIPSO option MUST appear only once in a datagram. Only one tag type | ||
639 | from the MAC Sensitivity class MAY be included in a CIPSO option. Given | ||
640 | the current set of defined tag types, this means that CIPSO labels at | ||
641 | first will contain only one tag. | ||
642 | |||
643 | All datagrams leaving a CIPSO system MUST meet the following condition: | ||
644 | |||
645 | PORT_LABEL_MIN <= CIPSO label <= PORT_LABEL_MAX | ||
646 | |||
647 | If this condition is not satisfied the datagram MUST be discarded. | ||
648 | If the CIPSO system only supports one port, the HOST_LABEL_MIN and the | ||
649 | HOST_LABEL_MAX parameters MAY be substituted for the PORT parameters in | ||
650 | the above condition. | ||
651 | |||
652 | The DOI identifier to be used for all outgoing datagrams is configured by | ||
664 | the administrator. If port level DOI identifier assignment is used, then | ||
665 | the PORT_DOI configuration parameter MUST contain the DOI identifier to | ||
666 | use. If network level DOI assignment is used, then the NET_DOI parameter | ||
667 | MUST contain the DOI identifier to use. And if host level DOI assignment | ||
668 | is employed, then the HOST_DOI parameter MUST contain the DOI identifier | ||
669 | to use. A CIPSO implementation need only support one level of DOI | ||
670 | assignment. | ||
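
The output check and the DOI selection of section 5.2 can be sketched as below. The parameter names (PORT_LABEL_MIN, HOST_DOI, and so on) are taken from the draft; representing the configuration as a dictionary, comparing labels as integers, and the function names are assumptions made for this example.

```python
DOI_PARAMETER = {"port": "PORT_DOI", "net": "NET_DOI", "host": "HOST_DOI"}


def outgoing_doi(config, assignment_level):
    """Pick the DOI identifier for the configured assignment level."""
    return config[DOI_PARAMETER[assignment_level]]


def may_transmit(label, config, single_port=False):
    """True if MIN <= label <= MAX holds; otherwise the datagram is discarded.

    With single_port=True the HOST parameters replace the PORT parameters,
    which the draft allows for systems that support only one port.
    """
    prefix = "HOST" if single_port else "PORT"
    return config[prefix + "_LABEL_MIN"] <= label <= config[prefix + "_LABEL_MAX"]
```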
671 | |||
672 | |||
673 | 5.3 DOI Processing Requirements | ||
674 | |||
675 | A CIPSO implementation MUST support at least one DOI and SHOULD support | ||
676 | multiple DOIs. System and network administrators are cautioned to | ||
677 | ensure that at least one DOI is common within an IP network to allow for | ||
678 | broadcasting of IP datagrams. | ||
679 | |||
680 | CIPSO gateways MUST be capable of translating a CIPSO option from one | ||
681 | DOI to another when forwarding datagrams between networks. For | ||
682 | efficiency purposes this capability is only a desired feature for CIPSO | ||
683 | routers. | ||
684 | |||
685 | |||
686 | 5.4 Label of ICMP Messages | ||
687 | |||
688 | The CIPSO label to be used on all outgoing ICMP messages MUST be equivalent | ||
689 | to the label of the datagram that caused the ICMP message. If the ICMP was | ||
690 | generated due to a problem associated with the original CIPSO label then the | ||
691 | following responses are allowed: | ||
692 | |||
693 | a. Use the CIPSO label of the original IP datagram | ||
694 | b. Drop the original datagram with no return message generated | ||
695 | |||
696 | In most cases these options will have the same effect. If you can not | ||
697 | interpret the label or if it is outside the label range of your host or | ||
698 | interface then an ICMP message with the same label will probably not be | ||
699 | able to exit the system. | ||
700 | |||
701 | |||
702 | 6. Assignment of DOI Identifier Numbers | ||
703 | |||
704 | Requests for assignment of a DOI identifier number should be addressed to | ||
705 | the Internet Assigned Numbers Authority (IANA). | ||
706 | |||
707 | |||
708 | 7. Acknowledgements | ||
709 | |||
710 | Much of the material in this RFC is based on (and copied from) work | ||
711 | done by Gary Winiger of Sun Microsystems and published as Commercial | ||
712 | IP Security Option at the INTEROP 89, Commercial IPSO Workshop. | ||
713 | |||
714 | |||
715 | 8. Author's Address | ||
716 | |||
717 | To submit mail for distribution to members of the IETF CIPSO Working | ||
718 | Group, send mail to: cipso@wdl1.wdl.loral.com. | ||
719 | |||
731 | To be added to or deleted from this distribution, send mail to: | ||
732 | cipso-request@wdl1.wdl.loral.com. | ||
733 | |||
734 | |||
735 | 9. References | ||
736 | |||
737 | RFC 1038, "Draft Revised IP Security Option", M. St. Johns, IETF, January | ||
738 | 1988. | ||
739 | |||
740 | RFC 1108, "U.S. Department of Defense Security Options | ||
741 | for the Internet Protocol", Stephen Kent, IAB, 1 March, 1991. | ||
742 | |||
diff --git a/Documentation/netlabel/introduction.txt b/Documentation/netlabel/introduction.txt new file mode 100644 index 000000000000..a4ffba1694c8 --- /dev/null +++ b/Documentation/netlabel/introduction.txt | |||
@@ -0,0 +1,46 @@ | |||
1 | NetLabel Introduction | ||
2 | ============================================================================== | ||
3 | Paul Moore, paul.moore@hp.com | ||
4 | |||
5 | August 2, 2006 | ||
6 | |||
7 | * Overview | ||
8 | |||
9 | NetLabel is a mechanism which can be used by kernel security modules to attach | ||
10 | security attributes to outgoing network packets generated from user space | ||
11 | applications and read security attributes from incoming network packets. It | ||
12 | is composed of three main components, the protocol engines, the communication | ||
13 | layer, and the kernel security module API. | ||
14 | |||
15 | * Protocol Engines | ||
16 | |||
17 | The protocol engines are responsible for both applying and retrieving the | ||
18 | network packet's security attributes. If any translation between the network | ||
19 | security attributes and those on the host are required then the protocol | ||
20 | engine will handle those tasks as well. Other kernel subsystems should | ||
21 | refrain from calling the protocol engines directly; instead they should use | ||
22 | the NetLabel kernel security module API described below. | ||
23 | |||
24 | Detailed information about each NetLabel protocol engine can be found in this | ||
25 | directory, consult '00-INDEX' for filenames. | ||
26 | |||
27 | * Communication Layer | ||
28 | |||
29 | The communication layer exists to allow NetLabel configuration and monitoring | ||
30 | from user space. The NetLabel communication layer uses a message based | ||
31 | protocol built on top of the Generic NETLINK transport mechanism. The exact | ||
32 | formatting of these NetLabel messages as well as the Generic NETLINK family | ||
33 | names can be found in the 'net/netlabel/' directory as comments in the | ||
34 | header files as well as in 'include/net/netlabel.h'. | ||
35 | |||
36 | * Security Module API | ||
37 | |||
38 | The purpose of the NetLabel security module API is to provide a protocol | ||
39 | independent interface to the underlying NetLabel protocol engines. In addition | ||
40 | to protocol independence, the security module API is designed to be completely | ||
41 | LSM independent which should allow multiple LSMs to leverage the same code | ||
42 | base. | ||
43 | |||
44 | Detailed information about the NetLabel security module API can be found in the | ||
45 | 'include/net/netlabel.h' header file as well as the 'lsm_interface.txt' file | ||
46 | found in this directory. | ||
diff --git a/Documentation/netlabel/lsm_interface.txt b/Documentation/netlabel/lsm_interface.txt new file mode 100644 index 000000000000..98dd9f7430f2 --- /dev/null +++ b/Documentation/netlabel/lsm_interface.txt | |||
@@ -0,0 +1,47 @@ | |||
1 | NetLabel Linux Security Module Interface | ||
2 | ============================================================================== | ||
3 | Paul Moore, paul.moore@hp.com | ||
4 | |||
5 | May 17, 2006 | ||
6 | |||
7 | * Overview | ||
8 | |||
9 | NetLabel is a mechanism which can set and retrieve security attributes from | ||
10 | network packets. It is intended to be used by LSM developers who want to make | ||
11 | use of a common code base for several different packet labeling protocols. | ||
12 | The NetLabel security module API is defined in 'include/net/netlabel.h' but a | ||
13 | brief overview is given below. | ||
14 | |||
15 | * NetLabel Security Attributes | ||
16 | |||
17 | Since NetLabel supports multiple different packet labeling protocols and LSMs | ||
18 | it uses the concept of security attributes to refer to the packet's security | ||
19 | labels. The NetLabel security attributes are defined by the | ||
20 | 'netlbl_lsm_secattr' structure in the NetLabel header file. Internally the | ||
21 | NetLabel subsystem converts the security attributes to and from the correct | ||
22 | low-level packet label depending on the NetLabel build time and run time | ||
23 | configuration. It is up to the LSM developer to translate the NetLabel | ||
24 | security attributes into whatever security identifiers are in use for their | ||
25 | particular LSM. | ||
26 | |||
27 | * NetLabel LSM Protocol Operations | ||
28 | |||
29 | These are the functions which allow the LSM developer to manipulate the labels | ||
30 | on outgoing packets as well as read the labels on incoming packets. Functions | ||
31 | exist to operate both on sockets as well as the sk_buffs directly. These high | ||
32 | level functions are translated into low level protocol operations based on how | ||
33 | the administrator has configured the NetLabel subsystem. | ||
34 | |||
35 | * NetLabel Label Mapping Cache Operations | ||
36 | |||
37 | Depending on the exact configuration, translation between the network packet | ||
38 | label and the internal LSM security identifier can be time consuming. The | ||
39 | NetLabel label mapping cache is a caching mechanism which can be used to | ||
40 | sidestep much of this overhead once a mapping has been established. Once the | ||
41 | LSM has received a packet, used NetLabel to decode its security attributes, | ||
42 | and translated the security attributes into an LSM internal identifier, the LSM | ||
43 | can use the NetLabel caching functions to associate the LSM internal | ||
44 | identifier with the network packet's label. This means that in the future | ||
45 | when an incoming packet matches a cached value not only are the internal | ||
46 | NetLabel translation mechanisms bypassed but the LSM translation mechanisms are | ||
47 | bypassed as well which should result in a significant reduction in overhead. | ||
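
The caching flow described above can be sketched in a few lines. This is a conceptual model only: `LabelCache`, `packet_sid`, and the `decode` and `translate` callables are hypothetical stand-ins and do not correspond to the functions declared in 'include/net/netlabel.h'.

```python
class LabelCache:
    """Maps a raw on-the-wire label to an LSM internal identifier."""

    def __init__(self):
        self._entries = {}

    def lookup(self, raw_label):
        return self._entries.get(raw_label)

    def add(self, raw_label, sid):
        self._entries[raw_label] = sid


def packet_sid(raw_label, cache, decode, translate):
    """Return the LSM identifier for a packet label, using the cache first."""
    sid = cache.lookup(raw_label)
    if sid is not None:
        return sid                      # hit: both translations are skipped
    secattr = decode(raw_label)         # NetLabel: wire label to attributes
    sid = translate(secattr)            # LSM: attributes to internal identifier
    cache.add(raw_label, sid)
    return sid
```

After the first packet with a given label, later packets carrying the same label reach neither `decode` nor `translate`, which is the overhead reduction the paragraph describes.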
diff --git a/Documentation/networking/LICENSE.qla3xxx b/Documentation/networking/LICENSE.qla3xxx new file mode 100644 index 000000000000..2f2077e34d81 --- /dev/null +++ b/Documentation/networking/LICENSE.qla3xxx | |||
@@ -0,0 +1,46 @@ | |||
1 | Copyright (c) 2003-2006 QLogic Corporation | ||
2 | QLogic Linux Networking HBA Driver | ||
3 | |||
4 | This program includes a device driver for Linux 2.6 that may be | ||
5 | distributed with QLogic hardware specific firmware binary file. | ||
6 | You may modify and redistribute the device driver code under the | ||
7 | GNU General Public License as published by the Free Software | ||
8 | Foundation (version 2 or a later version). | ||
9 | |||
10 | You may redistribute the hardware specific firmware binary file | ||
11 | under the following terms: | ||
12 | |||
13 | 1. Redistribution of source code (only if applicable), | ||
14 | must retain the above copyright notice, this list of | ||
15 | conditions and the following disclaimer. | ||
16 | |||
17 | 2. Redistribution in binary form must reproduce the above | ||
18 | copyright notice, this list of conditions and the | ||
19 | following disclaimer in the documentation and/or other | ||
20 | materials provided with the distribution. | ||
21 | |||
22 | 3. The name of QLogic Corporation may not be used to | ||
23 | endorse or promote products derived from this software | ||
24 | without specific prior written permission | ||
25 | |||
26 | REGARDLESS OF WHAT LICENSING MECHANISM IS USED OR APPLICABLE, | ||
27 | THIS PROGRAM IS PROVIDED BY QLOGIC CORPORATION "AS IS'' AND ANY | ||
28 | EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE | ||
29 | IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A | ||
30 | PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR | ||
31 | BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, | ||
32 | EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED | ||
33 | TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, | ||
34 | DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON | ||
35 | ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, | ||
36 | OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY | ||
37 | OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE | ||
38 | POSSIBILITY OF SUCH DAMAGE. | ||
39 | |||
40 | USER ACKNOWLEDGES AND AGREES THAT USE OF THIS PROGRAM WILL NOT | ||
41 | CREATE OR GIVE GROUNDS FOR A LICENSE BY IMPLICATION, ESTOPPEL, OR | ||
42 | OTHERWISE IN ANY INTELLECTUAL PROPERTY RIGHTS (PATENT, COPYRIGHT, | ||
43 | TRADE SECRET, MASK WORK, OR OTHER PROPRIETARY RIGHT) EMBODIED IN | ||
44 | ANY OTHER QLOGIC HARDWARE OR SOFTWARE EITHER SOLELY OR IN | ||
45 | COMBINATION WITH THIS PROGRAM. | ||
46 | |||
diff --git a/Documentation/networking/bonding.txt b/Documentation/networking/bonding.txt index afac780445cd..dc942eaf490f 100644 --- a/Documentation/networking/bonding.txt +++ b/Documentation/networking/bonding.txt | |||
@@ -192,6 +192,17 @@ or, for backwards compatibility, the option value. E.g., | |||
192 | arp_interval | 192 | arp_interval |
193 | 193 | ||
194 | Specifies the ARP link monitoring frequency in milliseconds. | 194 | Specifies the ARP link monitoring frequency in milliseconds. |
195 | |||
196 | The ARP monitor works by periodically checking the slave | ||
197 | devices to determine whether they have sent or received | ||
198 | traffic recently (the precise criteria depend upon the | ||
199 | bonding mode, and the state of the slave). Regular traffic is | ||
200 | generated via ARP probes issued for the addresses specified by | ||
201 | the arp_ip_target option. | ||
202 | |||
203 | This behavior can be modified by the arp_validate option, | ||
204 | below. | ||
205 | |||
195 | If ARP monitoring is used in an etherchannel compatible mode | 206 | If ARP monitoring is used in an etherchannel compatible mode |
196 | (modes 0 and 2), the switch should be configured in a mode | 207 | (modes 0 and 2), the switch should be configured in a mode |
197 | that evenly distributes packets across all links. If the | 208 | that evenly distributes packets across all links. If the |
@@ -213,6 +224,54 @@ arp_ip_target | |||
213 | maximum number of targets that can be specified is 16. The | 224 | maximum number of targets that can be specified is 16. The |
214 | default value is no IP addresses. | 225 | default value is no IP addresses. |
215 | 226 | ||
227 | arp_validate | ||
228 | |||
229 | Specifies whether or not ARP probes and replies should be | ||
230 | validated in the active-backup mode. This causes the ARP | ||
231 | monitor to examine the incoming ARP requests and replies, and | ||
232 | only consider a slave to be up if it is receiving the | ||
233 | appropriate ARP traffic. | ||
234 | |||
235 | Possible values are: | ||
236 | |||
237 | none or 0 | ||
238 | |||
239 | No validation is performed. This is the default. | ||
240 | |||
241 | active or 1 | ||
242 | |||
243 | Validation is performed only for the active slave. | ||
244 | |||
245 | backup or 2 | ||
246 | |||
247 | Validation is performed only for backup slaves. | ||
248 | |||
249 | all or 3 | ||
250 | |||
251 | Validation is performed for all slaves. | ||
252 | |||
253 | For the active slave, the validation checks ARP replies to | ||
254 | confirm that they were generated by an arp_ip_target. Since | ||
255 | backup slaves do not typically receive these replies, the | ||
256 | validation performed for backup slaves is on the ARP request | ||
257 | sent out via the active slave. It is possible that some | ||
258 | switch or network configurations may result in situations | ||
259 | wherein the backup slaves do not receive the ARP requests; in | ||
260 | such a situation, validation of backup slaves must be | ||
261 | disabled. | ||
262 | |||
263 | This option is useful in network configurations in which | ||
264 | multiple bonding hosts are concurrently issuing ARPs to one or | ||
265 | more targets beyond a common switch. Should the link between | ||
266 | the switch and target fail (but not the switch itself), the | ||
267 | probe traffic generated by the multiple bonding instances will | ||
268 | fool the standard ARP monitor into considering the links as | ||
269 | still up. Use of the arp_validate option can resolve this, as | ||
270 | the ARP monitor will only consider ARP requests and replies | ||
271 | associated with its own instance of bonding. | ||
272 | |||
273 | This option was added in bonding version 3.1.0. | ||
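
The numeric values listed for arp_validate combine like a two-bit mask (1 for the active slave, 2 for backup slaves, 3 for both). The sketch below illustrates that reading of the table; the function name and the bitmask interpretation are assumptions of this example, not a description of the driver's code.

```python
ARP_VALIDATE = {"none": 0, "active": 1, "backup": 2, "all": 3}


def validates(setting, slave_is_active):
    """True if ARP validation applies to a slave under the given setting.

    setting may be a name ("none", "active", "backup", "all") or its number.
    """
    value = ARP_VALIDATE[setting] if isinstance(setting, str) else int(setting)
    bit = 1 if slave_is_active else 2
    return bool(value & bit)
```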
274 | |||
216 | downdelay | 275 | downdelay |
217 | 276 | ||
218 | Specifies the time, in milliseconds, to wait before disabling | 277 | Specifies the time, in milliseconds, to wait before disabling |
diff --git a/Documentation/networking/dccp.txt b/Documentation/networking/dccp.txt index c45daabd3bfe..74563b38ffd9 100644 --- a/Documentation/networking/dccp.txt +++ b/Documentation/networking/dccp.txt | |||
@@ -1,7 +1,6 @@ | |||
1 | DCCP protocol | 1 | DCCP protocol |
2 | ============ | 2 | ============ |
3 | 3 | ||
4 | Last updated: 10 November 2005 | ||
5 | 4 | ||
6 | Contents | 5 | Contents |
7 | ======== | 6 | ======== |
@@ -42,8 +41,11 @@ Socket options | |||
42 | DCCP_SOCKOPT_PACKET_SIZE is used for CCID3 to set default packet size for | 41 | DCCP_SOCKOPT_PACKET_SIZE is used for CCID3 to set default packet size for |
43 | calculations. | 42 | calculations. |
44 | 43 | ||
45 | DCCP_SOCKOPT_SERVICE sets the service. This is compulsory as per the | 44 | DCCP_SOCKOPT_SERVICE sets the service. The specification mandates use of |
46 | specification. If you don't set it you will get EPROTO. | 45 | service codes (RFC 4340, sec. 8.1.2); if this socket option is not set, |
46 | the socket will fall back to 0 (which means that no meaningful service code | ||
47 | is present). Connecting sockets set at most one service option; for | ||
48 | listening sockets, multiple service codes can be specified. | ||
47 | 49 | ||
48 | Notes | 50 | Notes |
49 | ===== | 51 | ===== |
diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt index d46338af6002..935e298f674a 100644 --- a/Documentation/networking/ip-sysctl.txt +++ b/Documentation/networking/ip-sysctl.txt | |||
@@ -102,9 +102,15 @@ inet_peer_gc_maxtime - INTEGER | |||
102 | TCP variables: | 102 | TCP variables: |
103 | 103 | ||
104 | tcp_abc - INTEGER | 104 | tcp_abc - INTEGER |
105 | Controls Appropriate Byte Count defined in RFC3465. If set to | 105 | Controls Appropriate Byte Count (ABC) defined in RFC3465. |
106 | 0 then does congestion avoid once per ack. 1 is conservative | 106 | ABC is a way of increasing congestion window (cwnd) more slowly |
107 | value, and 2 is more agressive. | 107 | in response to partial acknowledgments. |
108 | Possible values are: | ||
109 | 0 increase cwnd once per acknowledgment (no ABC) | ||
110 | 1 increase cwnd once per acknowledgment of a full sized segment | ||
111 | 2 allow cwnd to increase by two if the acknowledgment is | ||
112 | for two segments, to compensate for delayed acknowledgments. | ||
113 | Default: 0 (off) | ||
108 | 114 | ||
109 | tcp_syn_retries - INTEGER | 115 | tcp_syn_retries - INTEGER |
110 | Number of times initial SYNs for an active TCP connection attempt | 116 | Number of times initial SYNs for an active TCP connection attempt |
@@ -294,15 +300,15 @@ tcp_rmem - vector of 3 INTEGERs: min, default, max | |||
294 | Default: 87380*2 bytes. | 300 | Default: 87380*2 bytes. |
295 | 301 | ||
296 | tcp_mem - vector of 3 INTEGERs: min, pressure, max | 302 | tcp_mem - vector of 3 INTEGERs: min, pressure, max |
297 | low: below this number of pages TCP is not bothered about its | 303 | min: below this number of pages TCP is not bothered about its |
298 | memory appetite. | 304 | memory appetite. |
299 | 305 | ||
300 | pressure: when amount of memory allocated by TCP exceeds this number | 306 | pressure: when amount of memory allocated by TCP exceeds this number |
301 | of pages, TCP moderates its memory consumption and enters memory | 307 | of pages, TCP moderates its memory consumption and enters memory |
302 | pressure mode, which is exited when memory consumption falls | 308 | pressure mode, which is exited when memory consumption falls |
303 | under "low". | 309 | under "min". |
304 | 310 | ||
305 | high: number of pages allowed for queueing by all TCP sockets. | 311 | max: number of pages allowed for queueing by all TCP sockets. |
306 | 312 | ||
307 | Defaults are calculated at boot time from amount of available | 313 | Defaults are calculated at boot time from amount of available |
308 | memory. | 314 | memory. |
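
The three tcp_mem thresholds describe a hysteresis: pressure mode starts above "pressure" and ends only below "min". A minimal sketch of that behaviour, with hypothetical function names and the values passed in as a plain string:

```python
def parse_tcp_mem(text):
    """Split the three page counts of tcp_mem into (min, pressure, max)."""
    minimum, pressure, maximum = (int(field) for field in text.split())
    return minimum, pressure, maximum


def under_pressure(was_under_pressure, pages_in_use, minimum, pressure):
    """Next memory-pressure state given the current TCP page usage."""
    if pages_in_use > pressure:
        return True                 # enter (or stay in) pressure mode
    if pages_in_use < minimum:
        return False                # leave pressure mode
    return was_under_pressure       # between the thresholds: no change
```

On a running system the input string would typically be the contents of /proc/sys/net/ipv4/tcp_mem.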
@@ -369,6 +375,41 @@ tcp_slow_start_after_idle - BOOLEAN | |||
369 | be timed out after an idle period. | 375 | be timed out after an idle period. |
370 | Default: 1 | 376 | Default: 1 |
371 | 377 | ||
378 | CIPSOv4 Variables: | ||
379 | |||
380 | cipso_cache_enable - BOOLEAN | ||
381 | If set, enable additions to and lookups from the CIPSO label mapping | ||
382 | cache. If unset, additions are ignored and lookups always result in a | ||
383 | miss. However, regardless of the setting the cache is still | ||
384 | invalidated when required, which means you can safely toggle this on and | ||
385 | off and the cache will always be "safe". | ||
386 | Default: 1 | ||
387 | |||
388 | cipso_cache_bucket_size - INTEGER | ||
389 | The CIPSO label cache consists of a fixed size hash table with each | ||
390 | hash bucket containing a number of cache entries. This variable limits | ||
391 | the number of entries in each hash bucket; the larger the value the | ||
392 | more CIPSO label mappings that can be cached. When the number of | ||
393 | entries in a given hash bucket reaches this limit adding new entries | ||
394 | causes the oldest entry in the bucket to be removed to make room. | ||
395 | Default: 10 | ||
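
The interaction of cipso_cache_enable and cipso_cache_bucket_size can be modelled as below. This is a sketch under stated assumptions: the class name, the default of 8 buckets, the use of Python's `hash`, and treating "oldest" as "earliest inserted" are all choices made for the example, not details taken from the kernel.

```python
from collections import OrderedDict


class BucketCache:
    """Fixed number of hash buckets, each holding at most bucket_size entries."""

    def __init__(self, n_buckets=8, bucket_size=10, enabled=True):
        self.buckets = [OrderedDict() for _ in range(n_buckets)]
        self.bucket_size = bucket_size
        self.enabled = enabled

    def _bucket(self, key):
        return self.buckets[hash(key) % len(self.buckets)]

    def add(self, key, value):
        if not self.enabled:
            return                          # additions are ignored when disabled
        bucket = self._bucket(key)
        if key not in bucket and len(bucket) >= self.bucket_size:
            bucket.popitem(last=False)      # evict the earliest-inserted entry
        bucket[key] = value

    def lookup(self, key):
        if not self.enabled:
            return None                     # lookups always miss when disabled
        return self._bucket(key).get(key)

    def invalidate(self):
        # Runs regardless of `enabled`, so toggling never exposes stale entries.
        for bucket in self.buckets:
            bucket.clear()
```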
396 | |||
397 | cipso_rbm_optfmt - BOOLEAN | ||
398 | Enable the "Optimized Tag 1 Format" as defined in section 3.4.2.6 of | ||
399 | the CIPSO draft specification (see Documentation/netlabel for details). | ||
400 | This means that when set the CIPSO tag will be padded with empty | ||
401 | categories in order to make the packet data 32-bit aligned. | ||
402 | Default: 0 | ||
403 | |||
404 | cipso_rbm_structvalid - BOOLEAN | ||
405 | If set, do a very strict check of the CIPSO option when | ||
406 | ip_options_compile() is called. If unset, relax the checks done during | ||
407 | ip_options_compile(). Either way is "safe" as errors are caught | ||
408 | elsewhere in the CIPSO processing code but setting this to 0 (False) should | ||
409 | result in less work (i.e. it should be faster) but could cause problems | ||
410 | with other implementations that require strict checking. | ||
411 | Default: 0 | ||
412 | |||
372 | IP Variables: | 413 | IP Variables: |
373 | 414 | ||
374 | ip_local_port_range - 2 INTEGERS | 415 | ip_local_port_range - 2 INTEGERS |
@@ -724,6 +765,9 @@ conf/all/forwarding - BOOLEAN | |||
724 | 765 | ||
725 | This referred to as global forwarding. | 766 | This referred to as global forwarding. |
726 | 767 | ||
768 | proxy_ndp - BOOLEAN | ||
769 | Do proxy ndp. | ||
770 | |||
727 | conf/interface/*: | 771 | conf/interface/*: |
728 | Change special settings per interface. | 772 | Change special settings per interface. |
729 | 773 | ||
diff --git a/Documentation/networking/ipvs-sysctl.txt b/Documentation/networking/ipvs-sysctl.txt new file mode 100644 index 000000000000..4ccdbca03811 --- /dev/null +++ b/Documentation/networking/ipvs-sysctl.txt | |||
@@ -0,0 +1,143 @@ | |||
1 | /proc/sys/net/ipv4/vs/* Variables: | ||
2 | |||
3 | am_droprate - INTEGER | ||
4 | default 10 | ||
5 | |||
6 | It sets the always mode drop rate, which is used in the mode 3 | ||
7 | of the drop_rate defense. | ||
8 | |||
9 | amemthresh - INTEGER | ||
10 | default 1024 | ||
11 | |||
12 | It sets the available memory threshold (in pages), which is | ||
13 | used in the automatic modes of defense. When there is not | ||
14 | enough available memory, the respective strategy will be | ||
15 | enabled and the variable is automatically set to 2, otherwise | ||
16 | the strategy is disabled and the variable is set to 1. | ||
17 | |||
18 | cache_bypass - BOOLEAN | ||
19 | 0 - disabled (default) | ||
20 | not 0 - enabled | ||
21 | |||
22 | If it is enabled, forward packets to the original destination | ||
23 | directly when no cache server is available and destination | ||
24 | address is not local (iph->daddr is RTN_UNICAST). It is mostly | ||
25 | used in transparent web cache cluster. | ||
26 | |||
27 | debug_level - INTEGER | ||
28 | 0 - transmission error messages (default) | ||
29 | 1 - non-fatal error messages | ||
30 | 2 - configuration | ||
31 | 3 - destination trash | ||
32 | 4 - drop entry | ||
33 | 5 - service lookup | ||
34 | 6 - scheduling | ||
35 | 7 - connection new/expire, lookup and synchronization | ||
36 | 8 - state transition | ||
37 | 9 - binding destination, template checks and applications | ||
38 | 10 - IPVS packet transmission | ||
39 | 11 - IPVS packet handling (ip_vs_in/ip_vs_out) | ||
40 | 12 or more - packet traversal | ||
41 | |||
42 | Only available when IPVS is compiled with CONFIG_IP_VS_DEBUG enabled. | ||
43 | |||
44 | Higher debugging levels include the messages for lower debugging | ||
45 | levels, so setting debug level 2 includes level 0, 1 and 2 | ||
46 | messages. Thus, logging becomes more and more verbose the higher | ||
47 | the level. | ||
48 | |||
49 | drop_entry - INTEGER | ||
50 | 0 - disabled (default) | ||
51 | |||
52 | The drop_entry defense is to randomly drop entries in the | ||
53 | connection hash table, just in order to collect back some | ||
54 | memory for new connections. In the current code, the | ||
55 | drop_entry procedure can be activated every second, then it | ||
56 | randomly scans 1/32 of the whole table and drops entries that are in | ||
57 | the SYN-RECV/SYNACK state, which should be effective against | ||
58 | syn-flooding attack. | ||
59 | |||
60 | The valid values of drop_entry are from 0 to 3, where 0 means | ||
61 | that this strategy is always disabled, 1 and 2 mean automatic | ||
62 | modes (when there is not enough available memory, the strategy | ||
63 | is enabled and the variable is automatically set to 2, | ||
64 | otherwise the strategy is disabled and the variable is set to | ||
65 | 1), and 3 means that the strategy is always enabled. | ||
66 | |||
67 | drop_packet - INTEGER | ||
68 | 0 - disabled (default) | ||
69 | |||
70 | The drop_packet defense is designed to drop 1/rate packets | ||
71 | before forwarding them to real servers. If the rate is 1, then | ||
72 | drop all the incoming packets. | ||
73 | |||
74 | The value definition is the same as that of the drop_entry. In | ||
75 | the automatic mode, the rate is determined by the following | ||
76 | formula: rate = amemthresh / (amemthresh - available_memory) | ||
77 | when available memory is less than the available memory | ||
78 | threshold. When the mode 3 is set, the always mode drop rate | ||
79 | is controlled by the /proc/sys/net/ipv4/vs/am_droprate. | ||
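
A worked example of the automatic modes and of the rate formula may help. The sketch below assumes integer division and uses hypothetical function names; only the formula and the meaning of modes 0 to 3 come from the text above.

```python
def next_mode(mode, available, amemthresh):
    """Mode value after a memory check: 0 and 3 are fixed, 1 and 2 toggle."""
    if mode in (0, 3):
        return mode
    return 2 if available < amemthresh else 1


def auto_drop_rate(available, amemthresh):
    """rate = amemthresh / (amemthresh - available); 1/rate packets are dropped.

    Returns None when memory is not below the threshold (defense inactive).
    """
    if available >= amemthresh:
        return None
    return amemthresh // (amemthresh - available)
```

With amemthresh at its default of 1024 pages, 768 available pages give a rate of 4 (one packet in four is dropped), 512 give a rate of 2, and 0 give a rate of 1, which drops every packet.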
80 | |||
81 | expire_nodest_conn - BOOLEAN | ||
82 | 0 - disabled (default) | ||
83 | not 0 - enabled | ||
84 | |||
85 | The default value is 0: the load balancer silently drops | ||
86 | packets when their destination server is not available. This may | ||
87 | be useful when a user-space monitoring program deletes the | ||
88 | destination server (because of server overload or wrong | ||
89 | detection) and adds the server back later, so that connections | ||
90 | to the server can continue. | ||
91 | |||
92 | If this feature is enabled, the load balancer will expire the | ||
93 | connection immediately when a packet arrives and its | ||
94 | destination server is not available, then the client program | ||
95 | will be notified that the connection is closed. This is | ||
96 | equivalent to the feature some people require to flush | ||
97 | connections when their destination is not available. | ||
98 | |||
99 | expire_quiescent_template - BOOLEAN | ||
100 | 0 - disabled (default) | ||
101 | not 0 - enabled | ||
102 | |||
103 | When set to a non-zero value, the load balancer will expire | ||
104 | persistent templates when the destination server is quiescent. | ||
105 | This may be useful, when a user makes a destination server | ||
106 | quiescent by setting its weight to 0 and it is desired that | ||
107 | subsequent otherwise persistent connections are sent to a | ||
108 | different destination server. By default new persistent | ||
109 | connections are allowed to quiescent destination servers. | ||
110 | |||
111 | If this feature is enabled, the load balancer will expire the | ||
112 | persistence template if it is to be used to schedule a new | ||
113 | connection and the destination server is quiescent. | ||
114 | |||
115 | nat_icmp_send - BOOLEAN | ||
116 | 0 - disabled (default) | ||
117 | not 0 - enabled | ||
118 | |||
119 | It controls sending icmp error messages (ICMP_DEST_UNREACH) | ||
120 | for VS/NAT when the load balancer receives packets from real | ||
121 | servers but the connection entries don't exist. | ||
122 | |||
123 | secure_tcp - INTEGER | ||
124 | 0 - disabled (default) | ||
125 | |||
126 | The secure_tcp defense uses a more complicated state | ||
127 | transition table and possibly shorter timeouts for each | ||
128 | state. In VS/NAT, it delays entering the ESTABLISHED state | ||
129 | until the real server starts to send data and ACK packets | ||
130 | (after the 3-way handshake). | ||
131 | |||
132 | The value definition is the same as that of drop_entry or | ||
133 | drop_packet. | ||
134 | |||
135 | sync_threshold - INTEGER | ||
136 | default 3 | ||
137 | |||
138 | It sets the synchronization threshold, which is the minimum number | ||
139 | of incoming packets that a connection needs to receive before | ||
140 | the connection will be synchronized. A connection will be | ||
141 | synchronized every time the number of its incoming packets | ||
142 | modulo 50 equals the threshold. The range of the threshold is | ||
143 | from 0 to 49. | ||
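The sync_threshold rule above can be modelled in a few lines. This is a sketch of the rule as stated in the text, not of the kernel code; the names SYNC_PERIOD and should_sync are made up for illustration.

```python
SYNC_PERIOD = 50  # the modulus given in the text


def should_sync(in_pkts, sync_threshold=3):
    """Return True when a connection with in_pkts incoming packets is synchronized."""
    if not 0 <= sync_threshold < SYNC_PERIOD:
        raise ValueError("sync_threshold must be in the range 0..49")
    return in_pkts % SYNC_PERIOD == sync_threshold


# With the default threshold of 3, list the packet counts that trigger a sync.
sync_points = [n for n in range(1, 121) if should_sync(n)]
print(sync_points)  # [3, 53, 103]
```

With the default value, a connection is first synchronized on its third incoming packet and then once every 50 packets.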
diff --git a/Documentation/networking/pktgen.txt b/Documentation/networking/pktgen.txt index 278771c9ad99..18d385c068fc 100644 --- a/Documentation/networking/pktgen.txt +++ b/Documentation/networking/pktgen.txt | |||
@@ -74,7 +74,7 @@ Examples: | |||
74 | pgset "pkt_size 9014" sets packet size to 9014 | 74 | pgset "pkt_size 9014" sets packet size to 9014 |
75 | pgset "frags 5" packet will consist of 5 fragments | 75 | pgset "frags 5" packet will consist of 5 fragments |
76 | pgset "count 200000" sets number of packets to send, set to zero | 76 | pgset "count 200000" sets number of packets to send, set to zero |
77 | for continious sends untill explicitl stopped. | 77 | for continuous sends until explicitly stopped. |
78 | 78 | ||
79 | pgset "delay 5000" adds delay to hard_start_xmit(). nanoseconds | 79 | pgset "delay 5000" adds delay to hard_start_xmit(). nanoseconds |
80 | 80 | ||
@@ -100,6 +100,7 @@ Examples: | |||
100 | are: IPSRC_RND #IP Source is random (between min/max), | 100 | are: IPSRC_RND #IP Source is random (between min/max), |
101 | IPDST_RND, UDPSRC_RND, | 101 | IPDST_RND, UDPSRC_RND, |
102 | UDPDST_RND, MACSRC_RND, MACDST_RND | 102 | UDPDST_RND, MACSRC_RND, MACDST_RND |
103 | MPLS_RND, VID_RND, SVID_RND | ||
103 | 104 | ||
104 | pgset "udp_src_min 9" set UDP source port min, If < udp_src_max, then | 105 | pgset "udp_src_min 9" set UDP source port min, If < udp_src_max, then |
105 | cycle through the port range. | 106 | cycle through the port range. |
@@ -125,6 +126,21 @@ Examples: | |||
125 | 126 | ||
126 | pgset "mpls 0" turn off mpls (or any invalid argument works too!) | 127 | pgset "mpls 0" turn off mpls (or any invalid argument works too!) |
127 | 128 | ||
129 | pgset "vlan_id 77" set VLAN ID 0-4095 | ||
130 | pgset "vlan_p 3" set priority bit 0-7 (default 0) | ||
131 | pgset "vlan_cfi 0" set canonical format identifier 0-1 (default 0) | ||
132 | |||
133 | pgset "svlan_id 22" set SVLAN ID 0-4095 | ||
134 | pgset "svlan_p 3" set priority bit 0-7 (default 0) | ||
135 | pgset "svlan_cfi 0" set canonical format identifier 0-1 (default 0) | ||
136 | |||
137 | pgset "vlan_id 9999" > 4095 remove vlan and svlan tags | ||
138 | pgset "svlan_id 9999" > 4095 remove svlan tag | ||
139 | |||
140 | |||
141 | pgset "tos XX" set former IPv4 TOS field (e.g. "tos 28" for AF11 no ECN, default 00) | ||
142 | pgset "traffic_class XX" set former IPv6 TRAFFIC CLASS (e.g. "traffic_class B8" for EF no ECN, default 00) | ||
143 | |||
128 | pgset stop aborts injection. Also, ^C aborts generator. | 144 | pgset stop aborts injection. Also, ^C aborts generator. |
129 | 145 | ||
130 | 146 | ||
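The "tos", "traffic_class" and VLAN values above follow standard header layouts. The sketch below shows how the hexadecimal arguments in the examples are derived. It assumes the usual DSCP/ECN split of the TOS byte and the 802.1Q tag layout (3 priority bits, 1 CFI bit, 12 ID bits); the helper names are hypothetical and are not part of pktgen.

```python
def tos_hex(dscp, ecn=0):
    """Return the TOS / traffic class byte as two hex digits (DSCP in the upper 6 bits)."""
    if not 0 <= dscp <= 63 or not 0 <= ecn <= 3:
        raise ValueError("dscp must be 0..63 and ecn 0..3")
    return format((dscp << 2) | ecn, "02X")


def vlan_tci(vlan_id, priority=0, cfi=0):
    """Pack priority, CFI and VLAN ID into a 16-bit 802.1Q tag control value."""
    if not (0 <= vlan_id <= 4095 and 0 <= priority <= 7 and cfi in (0, 1)):
        raise ValueError("field out of range")
    return (priority << 13) | (cfi << 12) | vlan_id


AF11, EF = 10, 46  # standard DSCP code points

print(f'pgset "tos {tos_hex(AF11)}"')          # pgset "tos 28"
print(f'pgset "traffic_class {tos_hex(EF)}"')  # pgset "traffic_class B8"
print(hex(vlan_tci(77, priority=3, cfi=0)))    # 0x604d
```

The last line shows the tag value that would correspond to "vlan_id 77" with "vlan_p 3" and "vlan_cfi 0", assuming pktgen packs the fields in the standard order.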
diff --git a/Documentation/networking/secid.txt b/Documentation/networking/secid.txt new file mode 100644 index 000000000000..95ea06784333 --- /dev/null +++ b/Documentation/networking/secid.txt | |||
@@ -0,0 +1,14 @@ | |||
1 | flowi structure: | ||
2 | |||
3 | The secid member in the flow structure is used in LSMs (e.g. SELinux) to indicate | ||
4 | the label of the flow. This label of the flow is currently used in selecting | ||
5 | matching labeled xfrm(s). | ||
6 | |||
7 | If this is an outbound flow, the label is derived from the socket, if any, or | ||
8 | the incoming packet this flow is being generated as a response to (e.g. tcp | ||
9 | resets, timewait ack, etc.). It is also conceivable that the label could be | ||
10 | derived from other sources such as process context, device, etc., in special | ||
11 | cases, as may be appropriate. | ||
12 | |||
13 | If this is an inbound flow, the label is derived from the IPSec security | ||
14 | associations, if any, used by the packet. | ||
diff --git a/Documentation/nfsroot.txt b/Documentation/nfsroot.txt index d56dc71d9430..3cc953cb288f 100644 --- a/Documentation/nfsroot.txt +++ b/Documentation/nfsroot.txt | |||
@@ -4,15 +4,16 @@ Mounting the root filesystem via NFS (nfsroot) | |||
4 | Written 1996 by Gero Kuhlmann <gero@gkminix.han.de> | 4 | Written 1996 by Gero Kuhlmann <gero@gkminix.han.de> |
5 | Updated 1997 by Martin Mares <mj@atrey.karlin.mff.cuni.cz> | 5 | Updated 1997 by Martin Mares <mj@atrey.karlin.mff.cuni.cz> |
6 | Updated 2006 by Nico Schottelius <nico-kernel-nfsroot@schottelius.org> | 6 | Updated 2006 by Nico Schottelius <nico-kernel-nfsroot@schottelius.org> |
7 | Updated 2006 by Horms <horms@verge.net.au> | ||
7 | 8 | ||
8 | 9 | ||
9 | 10 | ||
10 | If you want to use a diskless system, as an X-terminal or printer | 11 | In order to use a diskless system, such as an X-terminal or printer server |
11 | server for example, you have to put your root filesystem onto a | 12 | for example, it is necessary for the root filesystem to be present on a |
12 | non-disk device. This can either be a ramdisk (see initrd.txt in | 13 | non-disk device. This may be an initramfs (see Documentation/filesystems/ |
13 | this directory for further information) or a filesystem mounted | 14 | ramfs-rootfs-initramfs.txt), a ramdisk (see Documentation/initrd.txt) or a |
14 | via NFS. The following text describes on how to use NFS for the | 15 | filesystem mounted via NFS. The following text describes how to use NFS |
15 | root filesystem. For the rest of this text 'client' means the | 16 | for the root filesystem. For the rest of this text 'client' means the |
16 | diskless system, and 'server' means the NFS server. | 17 | diskless system, and 'server' means the NFS server. |
17 | 18 | ||
18 | 19 | ||
@@ -21,11 +22,13 @@ diskless system, and 'server' means the NFS server. | |||
21 | 1.) Enabling nfsroot capabilities | 22 | 1.) Enabling nfsroot capabilities |
22 | ----------------------------- | 23 | ----------------------------- |
23 | 24 | ||
24 | In order to use nfsroot you have to select support for NFS during | 25 | In order to use nfsroot, NFS client support needs to be selected as |
25 | kernel configuration. Note that NFS cannot be loaded as a module | 26 | built-in during configuration. Once this has been selected, the nfsroot |
26 | in this case. The configuration script will then ask you whether | 27 | option will become available, which should also be selected. |
27 | you want to use nfsroot, and if yes what kind of auto configuration | 28 | |
28 | system you want to use. Selecting both BOOTP and RARP is safe. | 29 | In the networking options, kernel level autoconfiguration can be selected, |
30 | along with the types of autoconfiguration to support. Selecting all of | ||
31 | DHCP, BOOTP and RARP is safe. | ||
29 | 32 | ||
30 | 33 | ||
31 | 34 | ||
@@ -33,11 +36,10 @@ system you want to use. Selecting both BOOTP and RARP is safe. | |||
33 | 2.) Kernel command line | 36 | 2.) Kernel command line |
34 | ------------------- | 37 | ------------------- |
35 | 38 | ||
36 | When the kernel has been loaded by a boot loader (either by loadlin, | 39 | When the kernel has been loaded by a boot loader (see below) it needs to be |
37 | LILO or a network boot program) it has to be told what root fs device | 40 | told what root fs device to use. And in the case of nfsroot, where to find |
38 | to use, and where to find the server and the name of the directory | 41 | both the server and the name of the directory on the server to mount as root. |
39 | on the server to mount as root. This can be established by a couple | 42 | This can be established using the following kernel command line parameters: |
40 | of kernel command line parameters: | ||
41 | 43 | ||
42 | 44 | ||
43 | root=/dev/nfs | 45 | root=/dev/nfs |
@@ -49,23 +51,21 @@ root=/dev/nfs | |||
49 | 51 | ||
50 | nfsroot=[<server-ip>:]<root-dir>[,<nfs-options>] | 52 | nfsroot=[<server-ip>:]<root-dir>[,<nfs-options>] |
51 | 53 | ||
52 | If the `nfsroot' parameter is NOT given on the command line, the default | 54 | If the `nfsroot' parameter is NOT given on the command line, |
53 | "/tftpboot/%s" will be used. | 55 | the default "/tftpboot/%s" will be used. |
54 | 56 | ||
55 | <server-ip> Specifies the IP address of the NFS server. If this field | 57 | <server-ip> Specifies the IP address of the NFS server. |
56 | is not given, the default address as determined by the | 58 | The default address is determined by the `ip' parameter |
57 | `ip' variable (see below) is used. One use of this | 59 | (see below). This parameter allows the use of different |
58 | parameter is for example to allow using different servers | 60 | servers for IP autoconfiguration and NFS. |
59 | for RARP and NFS. Usually you can leave this blank. | ||
60 | 61 | ||
61 | <root-dir> Name of the directory on the server to mount as root. If | 62 | <root-dir> Name of the directory on the server to mount as root. |
62 | there is a "%s" token in the string, the token will be | 63 | If there is a "%s" token in the string, it will be |
63 | replaced by the ASCII-representation of the client's IP | 64 | replaced by the ASCII-representation of the client's |
64 | address. | 65 | IP address. |
65 | 66 | ||
66 | <nfs-options> Standard NFS options. All options are separated by commas. | 67 | <nfs-options> Standard NFS options. All options are separated by commas. |
67 | If the options field is not given, the following defaults | 68 | The following defaults are used: |
68 | will be used: | ||
69 | port = as given by server portmap daemon | 69 | port = as given by server portmap daemon |
70 | rsize = 1024 | 70 | rsize = 1024 |
71 | wsize = 1024 | 71 | wsize = 1024 |
@@ -81,129 +81,174 @@ nfsroot=[<server-ip>:]<root-dir>[,<nfs-options>] | |||
81 | ip=<client-ip>:<server-ip>:<gw-ip>:<netmask>:<hostname>:<device>:<autoconf> | 81 | ip=<client-ip>:<server-ip>:<gw-ip>:<netmask>:<hostname>:<device>:<autoconf> |
82 | 82 | ||
83 | This parameter tells the kernel how to configure IP addresses of devices | 83 | This parameter tells the kernel how to configure IP addresses of devices |
84 | and also how to set up the IP routing table. It was originally called `nfsaddrs', | 84 | and also how to set up the IP routing table. It was originally called |
85 | but now the boot-time IP configuration works independently of NFS, so it | 85 | `nfsaddrs', but now the boot-time IP configuration works independently of |
86 | was renamed to `ip' and the old name remained as an alias for compatibility | 86 | NFS, so it was renamed to `ip' and the old name remained as an alias for |
87 | reasons. | 87 | compatibility reasons. |
88 | 88 | ||
89 | If this parameter is missing from the kernel command line, all fields are | 89 | If this parameter is missing from the kernel command line, all fields are |
90 | assumed to be empty, and the defaults mentioned below apply. In general | 90 | assumed to be empty, and the defaults mentioned below apply. In general |
91 | this means that the kernel tries to configure everything using both | 91 | this means that the kernel tries to configure everything using |
92 | RARP and BOOTP (depending on what has been enabled during kernel confi- | 92 | autoconfiguration. |
93 | guration, and if both what protocol answer got in first). | 93 | |
94 | The <autoconf> parameter can appear alone as the value to the `ip' | ||
95 | parameter (without all the ':' characters before) in which case auto- | ||
96 | configuration is used. | ||
97 | |||
98 | <client-ip> IP address of the client. | ||
94 | 99 | ||
95 | <client-ip> IP address of the client. If empty, the address will either | 100 | Default: Determined using autoconfiguration. |
96 | be determined by RARP or BOOTP. What protocol is used de- | ||
97 | pends on what has been enabled during kernel configuration | ||
98 | and on the <autoconf> parameter. If this parameter is not | ||
99 | empty, neither RARP nor BOOTP will be used. | ||
100 | 101 | ||
101 | <server-ip> IP address of the NFS server. If RARP is used to determine | 102 | <server-ip> IP address of the NFS server. If RARP is used to determine |
102 | the client address and this parameter is NOT empty only | 103 | the client address and this parameter is NOT empty only |
103 | replies from the specified server are accepted. To use | 104 | replies from the specified server are accepted. |
104 | different RARP and NFS server, specify your RARP server | 105 | |
105 | here (or leave it blank), and specify your NFS server in | 106 | Only required for NFS root. That is, autoconfiguration |
106 | the `nfsroot' parameter (see above). If this entry is blank | 107 | will not be triggered if it is missing and NFS root is not |
107 | the address of the server is used which answered the RARP | 108 | in operation. |
108 | or BOOTP request. | 109 | |
109 | 110 | Default: Determined using autoconfiguration. | |
110 | <gw-ip> IP address of a gateway if the server is on a different | 111 | The address of the autoconfiguration server is used. |
111 | subnet. If this entry is empty no gateway is used and the | 112 | |
112 | server is assumed to be on the local network, unless a | 113 | <gw-ip> IP address of a gateway if the server is on a different subnet. |
113 | value has been received by BOOTP. | 114 | |
114 | 115 | Default: Determined using autoconfiguration. | |
115 | <netmask> Netmask for local network interface. If this is empty, | 116 | |
117 | <netmask> Netmask for local network interface. If unspecified | ||
116 | the netmask is derived from the client IP address assuming | 118 | the netmask is derived from the client IP address assuming |
117 | classful addressing, unless overridden in BOOTP reply. | 119 | classful addressing. |
118 | 120 | ||
119 | <hostname> Name of the client. If empty, the client IP address is | 121 | Default: Determined using autoconfiguration. |
120 | used in ASCII notation, or the value received by BOOTP. | ||
121 | 122 | ||
122 | <device> Name of network device to use. If this is empty, all | 123 | <hostname> Name of the client. May be supplied by autoconfiguration, |
123 | devices are used for RARP and BOOTP requests, and the | 124 | but its absence will not trigger autoconfiguration. |
124 | first one we receive a reply on is configured. If you have | ||
125 | only one device, you can safely leave this blank. | ||
126 | 125 | ||
127 | <autoconf> Method to use for autoconfiguration. If this is either | 126 | Default: Client IP address is used in ASCII notation. |
128 | 'rarp' or 'bootp', the specified protocol is used. | ||
129 | If the value is 'both' or empty, both protocols are used | ||
130 | so far as they have been enabled during kernel configura- | ||
131 | tion. 'off' means no autoconfiguration. | ||
132 | 127 | ||
133 | The <autoconf> parameter can appear alone as the value to the `ip' | 128 | <device> Name of network device to use. |
134 | parameter (without all the ':' characters before) in which case auto- | 129 | |
135 | configuration is used. | 130 | Default: If the host only has one device, it is used. |
131 | Otherwise the device is determined using | ||
132 | autoconfiguration. This is done by sending | ||
133 | autoconfiguration requests out of all devices, | ||
134 | and using the device that received the first reply. | ||
136 | 135 | ||
136 | <autoconf> Method to use for autoconfiguration. In the case of options | ||
137 | which specify multiple autoconfiguration protocols, | ||
138 | requests are sent using all protocols, and the first one | ||
139 | to reply is used. | ||
137 | 140 | ||
141 | Only autoconfiguration protocols that have been compiled | ||
142 | into the kernel will be used, regardless of the value of | ||
143 | this option. | ||
138 | 144 | ||
145 | off or none: don't use autoconfiguration (default) | ||
146 | on or any: use any protocol available in the kernel | ||
147 | dhcp: use DHCP | ||
148 | bootp: use BOOTP | ||
149 | rarp: use RARP | ||
150 | both: use both BOOTP and RARP but not DHCP | ||
151 | (old option kept for backwards compatibility) | ||
139 | 152 | ||
140 | 3.) Kernel loader | 153 | Default: any |
141 | ------------- | ||
142 | 154 | ||
143 | To get the kernel into memory different approaches can be used. They | ||
144 | depend on what facilities are available: | ||
145 | 155 | ||
146 | 156 | ||
147 | 3.1) Writing the kernel onto a floppy using dd: | ||
148 | As always you can just write the kernel onto a floppy using dd, | ||
149 | but then it's not possible to use kernel command lines at all. | ||
150 | To substitute the 'root=' parameter, create a dummy device on any | ||
151 | linux system with major number 0 and minor number 255 using mknod: | ||
152 | 157 | ||
153 | mknod /dev/boot255 c 0 255 | 158 | 3.) Boot Loader |
159 | ---------- | ||
154 | 160 | ||
155 | Then copy the kernel zImage file onto a floppy using dd: | 161 | To get the kernel into memory different approaches can be used. |
162 | They depend on various facilities being available: | ||
156 | 163 | ||
157 | dd if=/usr/src/linux/arch/i386/boot/zImage of=/dev/fd0 | ||
158 | 164 | ||
159 | And finally use rdev to set the root device: | 165 | 3.1) Booting from a floppy using syslinux |
160 | 166 | ||
161 | rdev /dev/fd0 /dev/boot255 | 167 | When building kernels, an easy way to create a boot floppy that uses |
168 | syslinux is to use the zdisk or bzdisk make targets which use | ||
169 | and bzimage images respectively. Both targets accept the | ||
170 | FDARGS parameter which can be used to set the kernel command line. | ||
162 | 171 | ||
163 | You can then remove the dummy device /dev/boot255 again. There | 172 | e.g. |
164 | is no real device available for it. | 173 | make bzdisk FDARGS="root=/dev/nfs" |
165 | The other two kernel command line parameters cannot be substi- | 174 | |
166 | tuted with rdev. Therefore, using this method the kernel will | 175 | Note that the user running this command will need to have |
167 | by default use RARP and/or BOOTP, and if it gets an answer via | 176 | access to the floppy drive device, /dev/fd0 |
168 | RARP will mount the directory /tftpboot/<client-ip>/ as its | 177 | |
169 | root. If it got a BOOTP answer the directory name in that answer | 178 | For more information on syslinux, including how to create bootdisks |
170 | is used. | 179 | for prebuilt kernels, see http://syslinux.zytor.com/ |
180 | |||
181 | N.B: Previously it was possible to write a kernel directly to | ||
182 | a floppy using dd, configure the boot device using rdev, and | ||
183 | boot using the resulting floppy. Linux no longer supports this | ||
184 | method of booting. | ||
185 | |||
186 | 3.2) Booting from a cdrom using isolinux | ||
187 | |||
188 | When building kernels, an easy way to create a bootable cdrom that | ||
189 | uses isolinux is to use the isoimage target which uses a bzimage | ||
190 | image. Like zdisk and bzdisk, this target accepts the FDARGS | ||
191 | parameter which can be used to set the kernel command line. | ||
192 | |||
193 | e.g. | ||
194 | make isoimage FDARGS="root=/dev/nfs" | ||
195 | |||
196 | The resulting iso image will be arch/<ARCH>/boot/image.iso | ||
197 | This can be written to a cdrom using a variety of tools including | ||
198 | cdrecord. | ||
199 | |||
200 | e.g. | ||
201 | cdrecord dev=ATAPI:1,0,0 arch/i386/boot/image.iso | ||
202 | |||
203 | For more information on isolinux, including how to create bootdisks | ||
204 | for prebuilt kernels, see http://syslinux.zytor.com/ | ||
171 | 205 | ||
172 | 3.2) Using LILO | 206 | 3.2) Using LILO |
173 | When using LILO you can specify all necessary command line | 207 | When using LILO all the necessary command line parameters may be |
174 | parameters with the 'append=' command in the LILO configuration | 208 | specified using the 'append=' directive in the LILO configuration |
175 | file. However, to use the 'root=' command you also need to | 209 | file. |
176 | set up a dummy device as described in 3.1 above. For how to use | 210 | |
177 | LILO and its 'append=' command please refer to the LILO | 211 | However, to use the 'root=' directive you also need to create |
178 | documentation. | 212 | a dummy root device, which may be removed after LILO is run. |
213 | |||
214 | mknod /dev/boot255 c 0 255 | ||
215 | |||
216 | For information on configuring LILO, please refer to its documentation. | ||
179 | 217 | ||
180 | 3.3) Using GRUB | 218 | 3.3) Using GRUB |
181 | When you use GRUB, you simply append the parameters after the kernel | 219 | When using GRUB, kernel parameters are simply appended after the kernel |
182 | specification: "kernel <kernel> <parameters>" (without the quotes). | 220 | specification: kernel <kernel> <parameters> |
183 | 221 | ||
184 | 3.4) Using loadlin | 222 | 3.4) Using loadlin |
185 | When you want to boot Linux from a DOS command prompt without | 223 | loadlin may be used to boot Linux from a DOS command prompt without |
186 | having a local hard disk to mount as root, you can use loadlin. | 224 | requiring a local hard disk to mount as root. This has not been |
187 | I was told that it works, but haven't used it myself yet. In | 225 | thoroughly tested by the authors of this document, but in general |
188 | general you should be able to create a kernel command line simi- | 226 | it should be possible to configure the kernel command line similarly |
189 | lar to how LILO is doing it. Please refer to the loadlin docu- | 227 | to the configuration of LILO. |
190 | mentation for further information. | 228 | |
229 | Please refer to the loadlin documentation for further information. | ||
191 | 230 | ||
192 | 3.5) Using a boot ROM | 231 | 3.5) Using a boot ROM |
193 | This is probably the most elegant way of booting a diskless | 232 | This is probably the most elegant way of booting a diskless client. |
194 | client. With a boot ROM the kernel gets loaded using the TFTP | 233 | With a boot ROM the kernel is loaded using the TFTP protocol. The |
195 | protocol. As far as I know, no commercial boot ROMs yet | 234 | authors of this document are not aware of any commercial boot |
196 | support booting Linux over the network, but there are two | 235 | ROMs that support booting Linux over the network. However, there |
197 | free implementations of a boot ROM available on sunsite.unc.edu | 236 | are two free implementations of a boot ROM, netboot-nfs and |
198 | and its mirrors. They are called 'netboot-nfs' and 'etherboot'. | 237 | etherboot, both of which are available on sunsite.unc.edu, and both |
199 | Both contain everything you need to boot a diskless Linux client. | 238 | of which contain everything you need to boot a diskless Linux client. |
200 | 239 | ||
201 | 3.6) Using pxelinux | 240 | 3.6) Using pxelinux |
202 | Using pxelinux you specify the kernel you built with | 241 | Pxelinux may be used to boot linux using the PXE boot loader |
242 | which is present on many modern network cards. | ||
243 | |||
244 | When using pxelinux, the kernel image is specified using | ||
203 | "kernel <relative-path-below /tftpboot>". The nfsroot parameters | 245 | "kernel <relative-path-below /tftpboot>". The nfsroot parameters |
204 | are passed to the kernel by adding them to the "append" line. | 246 | are passed to the kernel by adding them to the "append" line. |
205 | You may perhaps also want to fine tune the console output, | 247 | It is common to use a serial console in conjunction with pxelinux; |
206 | see Documentation/serial-console.txt for serial console help. | 248 | see Documentation/serial-console.txt for more information. |
249 | |||
250 | For more information on pxelinux, including how to create bootdisks | ||
251 | for prebuilt kernels, see http://syslinux.zytor.com/ | ||
207 | 252 | ||
208 | 253 | ||
209 | 254 | ||
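The parameter formats described in section 2 of nfsroot.txt can be assembled mechanically. The following is a minimal sketch of how a boot loader "append" string could be built from them. The function name, the dictionary keys and all addresses and paths are hypothetical examples, not values from the document.

```python
def nfsroot_cmdline(root_dir, nfs_server="", nfs_options=(), ip_fields=None, autoconf=""):
    """Assemble the root=, nfsroot= and ip= parameters described in the text."""
    # nfsroot=[<server-ip>:]<root-dir>[,<nfs-options>]
    nfsroot = f"{nfs_server}:{root_dir}" if nfs_server else root_dir
    if nfs_options:
        nfsroot += "," + ",".join(nfs_options)
    parts = ["root=/dev/nfs", f"nfsroot={nfsroot}"]
    if ip_fields:
        # ip=<client-ip>:<server-ip>:<gw-ip>:<netmask>:<hostname>:<device>:<autoconf>
        order = ("client_ip", "server_ip", "gw_ip", "netmask", "hostname", "device")
        values = [ip_fields.get(name, "") for name in order] + [autoconf]
        parts.append("ip=" + ":".join(values))
    elif autoconf:
        # <autoconf> may appear alone, without the leading ':' characters
        parts.append(f"ip={autoconf}")
    return " ".join(parts)


# Fully automatic configuration via DHCP, default root directory pattern.
print(nfsroot_cmdline("/tftpboot/%s", autoconf="dhcp"))

# Static configuration; empty fields are simply left blank between the colons.
static = {"client_ip": "192.168.1.10", "server_ip": "192.168.1.1",
          "netmask": "255.255.255.0", "device": "eth0"}
print(nfsroot_cmdline("/export/client1", nfs_server="192.168.1.1",
                      nfs_options=["rsize=1024", "wsize=1024"],
                      ip_fields=static, autoconf="off"))
```

The resulting strings are what would be passed via FDARGS, the LILO or pxelinux "append" line, or after the kernel specification in GRUB.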
diff --git a/Documentation/nommu-mmap.txt b/Documentation/nommu-mmap.txt index b88ebe4d808c..7714f57caad5 100644 --- a/Documentation/nommu-mmap.txt +++ b/Documentation/nommu-mmap.txt | |||
@@ -116,6 +116,9 @@ FURTHER NOTES ON NO-MMU MMAP | |||
116 | (*) A list of all the mappings on the system is visible through /proc/maps in | 116 | (*) A list of all the mappings on the system is visible through /proc/maps in |
117 | no-MMU mode. | 117 | no-MMU mode. |
118 | 118 | ||
119 | (*) A list of all the mappings in use by a process is visible through | ||
120 | /proc/<pid>/maps in no-MMU mode. | ||
121 | |||
119 | (*) Supplying MAP_FIXED or a requesting a particular mapping address will | 122 | (*) Supplying MAP_FIXED or a requesting a particular mapping address will |
120 | result in an error. | 123 | result in an error. |
121 | 124 | ||
@@ -125,6 +128,49 @@ FURTHER NOTES ON NO-MMU MMAP | |||
125 | error will result if they don't. This is most likely to be encountered | 128 | error will result if they don't. This is most likely to be encountered |
126 | with character device files, pipes, fifos and sockets. | 129 | with character device files, pipes, fifos and sockets. |
127 | 130 | ||
131 | |||
132 | ========================== | ||
133 | INTERPROCESS SHARED MEMORY | ||
134 | ========================== | ||
135 | |||
136 | Both SYSV IPC SHM shared memory and POSIX shared memory are supported in NOMMU | ||
137 | mode. The former through the usual mechanism, the latter through files created | ||
138 | on ramfs or tmpfs mounts. | ||
139 | |||
140 | |||
141 | ======= | ||
142 | FUTEXES | ||
143 | ======= | ||
144 | |||
145 | Futexes are supported in NOMMU mode if the arch supports them. An error will | ||
146 | be given if an address passed to the futex system call lies outside the | ||
147 | mappings made by a process or if the mapping in which the address lies does not | ||
148 | support futexes (such as an I/O chardev mapping). | ||
149 | |||
150 | |||
151 | ============= | ||
152 | NO-MMU MREMAP | ||
153 | ============= | ||
154 | |||
155 | The mremap() function is partially supported. It may change the size of a | ||
156 | mapping, and may move it[*] if MREMAP_MAYMOVE is specified and if the new size | ||
157 | of the mapping exceeds the size of the slab object currently occupied by the | ||
158 | memory to which the mapping refers, or if a smaller slab object could be used. | ||
159 | |||
160 | MREMAP_FIXED is not supported, though it is ignored if there's no change of | ||
161 | address and the object does not need to be moved. | ||
162 | |||
163 | Shared mappings may not be moved. Shareable mappings may not be moved either, | ||
164 | even if they are not currently shared. | ||
165 | |||
166 | The mremap() function must be given an exact match for base address and size of | ||
167 | a previously mapped object. It may not be used to create holes in existing | ||
168 | mappings, move parts of existing mappings or resize parts of mappings. It must | ||
169 | act on a complete mapping. | ||
170 | |||
171 | [*] Not currently supported. | ||
172 | |||
173 | |||
128 | ============================================ | 174 | ============================================ |
129 | PROVIDING SHAREABLE CHARACTER DEVICE SUPPORT | 175 | PROVIDING SHAREABLE CHARACTER DEVICE SUPPORT |
130 | ============================================ | 176 | ============================================ |
diff --git a/Documentation/pci.txt b/Documentation/pci.txt index 3242e5c1ee9c..2b395e478961 100644 --- a/Documentation/pci.txt +++ b/Documentation/pci.txt | |||
@@ -225,7 +225,7 @@ Generic flavors of pci_request_region() are request_mem_region() | |||
225 | Use these for address resources that are not described by "normal" PCI | 225 | Use these for address resources that are not described by "normal" PCI |
226 | interfaces (e.g. BAR). | 226 | interfaces (e.g. BAR). |
227 | 227 | ||
228 | All interrupt handlers should be registered with SA_SHIRQ and use the devid | 228 | All interrupt handlers should be registered with IRQF_SHARED and use the devid |
229 | to map IRQs to devices (remember that all PCI interrupts are shared). | 229 | to map IRQs to devices (remember that all PCI interrupts are shared). |
230 | 230 | ||
231 | 231 | ||
diff --git a/Documentation/pcieaer-howto.txt b/Documentation/pcieaer-howto.txt new file mode 100644 index 000000000000..16c251230c82 --- /dev/null +++ b/Documentation/pcieaer-howto.txt | |||
@@ -0,0 +1,253 @@ | |||
1 | The PCI Express Advanced Error Reporting Driver Guide HOWTO | ||
2 | T. Long Nguyen <tom.l.nguyen@intel.com> | ||
3 | Yanmin Zhang <yanmin.zhang@intel.com> | ||
4 | 07/29/2006 | ||
5 | |||
6 | |||
7 | 1. Overview | ||
8 | |||
9 | 1.1 About this guide | ||
10 | |||
11 | This guide describes the basics of the PCI Express Advanced Error | ||
12 | Reporting (AER) driver and provides information on how to use it, as | ||
13 | well as how to enable the drivers of endpoint devices to conform with | ||
14 | the PCI Express AER driver. | ||
15 | |||
16 | 1.2 Copyright © Intel Corporation 2006. | ||
17 | |||
18 | 1.3 What is the PCI Express AER Driver? | ||
19 | |||
20 | PCI Express error signaling can occur on the PCI Express link itself | ||
21 | or on behalf of transactions initiated on the link. PCI Express | ||
22 | defines two error reporting paradigms: the baseline capability and | ||
23 | the Advanced Error Reporting capability. The baseline capability is | ||
24 | required of all PCI Express components providing a minimum defined | ||
25 | set of error reporting requirements. Advanced Error Reporting | ||
26 | capability is implemented with a PCI Express advanced error reporting | ||
27 | extended capability structure providing more robust error reporting. | ||
28 | |||
29 | The PCI Express AER driver provides the infrastructure to support PCI | ||
30 | Express Advanced Error Reporting capability. The PCI Express AER | ||
31 | driver provides three basic functions: | ||
32 | |||
33 | - Gathers the comprehensive error information if errors occurred. | ||
34 | - Reports error to the users. | ||
35 | - Performs error recovery actions. | ||
36 | |||
37 | The AER driver only attaches to root ports which support the PCI-Express AER | ||
38 | capability. | ||
39 | |||
40 | |||
41 | 2. User Guide | ||
42 | |||
43 | 2.1 Include the PCI Express AER Root Driver into the Linux Kernel | ||
44 | |||
45 | The PCI Express AER Root driver is a Root Port service driver attached | ||
46 | to the PCI Express Port Bus driver. To use it, the driver must be | ||
47 | compiled into the kernel. Option CONFIG_PCIEAER enables it. It | ||
48 | depends on CONFIG_PCIEPORTBUS, so set CONFIG_PCIEPORTBUS=y and | ||
49 | CONFIG_PCIEAER=y. | ||
50 | |||
51 | 2.2 Load PCI Express AER Root Driver | ||
52 | Some systems have AER support in the BIOS. Enabling the AER Root | ||
53 | driver while the BIOS also handles AER may result in unpredictable | ||
54 | behavior. To avoid this conflict, a successful load of the AER Root driver | ||
55 | requires ACPI _OSC support in the BIOS to allow the AER Root driver to | ||
56 | request native control of AER. See the PCI FW 3.0 Specification for | ||
57 | details regarding _OSC usage. Currently, many firmware implementations | ||
58 | use PCI Express but do not provide _OSC support. To support such firmware, | ||
59 | forceload, a parameter of type bool, allows AER initialization to | ||
60 | proceed even though the firmware has no _OSC support. To enable this | ||
61 | workaround, add aerdriver.forceload=y to the kernel boot parameter line | ||
62 | when booting the kernel. Note that forceload=n by default. | ||
63 | |||
64 | 2.3 AER error output | ||
65 | When a PCI Express AER error is captured, an error message is output to | ||
66 | the console. A correctable error is printed as a warning; any other | ||
67 | error is printed as an error. Users can therefore choose a log level | ||
68 | that filters out correctable error messages. | ||
69 | |||
70 | An example is shown below. | ||
71 | +------ PCI-Express Device Error -----+ | ||
72 | Error Severity : Uncorrected (Fatal) | ||
73 | PCIE Bus Error type : Transaction Layer | ||
74 | Unsupported Request : First | ||
75 | Requester ID : 0500 | ||
76 | VendorID=8086h, DeviceID=0329h, Bus=05h, Device=00h, Function=00h | ||
77 | TLB Header: | ||
78 | 04000001 00200a03 05010000 00050100 | ||
79 | |||
80 | In the example, 'Requester ID' is the ID of the device that sent the | ||
81 | error message to the root port. Refer to the PCI Express specification | ||
82 | for the other fields. | ||
83 | |||
84 | |||
85 | 3. Developer Guide | ||
86 | |||
87 | AER-aware support requires a software driver to configure the AER | ||
88 | capability structure within its device and to provide callbacks. | ||
89 | |||
90 | To support AER well, developers first need to understand how AER | ||
91 | works. | ||
92 | |||
93 | PCI Express errors are classified into two types: correctable errors | ||
94 | and uncorrectable errors. This classification is based on the impacts | ||
95 | of those errors, which may result in degraded performance or function | ||
96 | failure. | ||
97 | |||
98 | Correctable errors have no impact on the functionality of the | ||
99 | interface. The PCI Express protocol can recover without any software | ||
100 | intervention or any loss of data. These errors are detected and | ||
101 | corrected by hardware. Unlike correctable errors, uncorrectable | ||
102 | errors impact functionality of the interface. Uncorrectable errors | ||
103 | can cause a particular transaction or a particular PCI Express link | ||
104 | to be unreliable. Depending on those error conditions, uncorrectable | ||
105 | errors are further classified into non-fatal errors and fatal errors. | ||
106 | Non-fatal errors cause the particular transaction to be unreliable, | ||
107 | but the PCI Express link itself is fully functional. Fatal errors, on | ||
108 | the other hand, cause the link to be unreliable. | ||
109 | |||
110 | When AER is enabled, a PCI Express device will automatically send an | ||
111 | error message to the PCIE root port above it when the device captures | ||
112 | an error. The Root Port, upon receiving an error reporting message, | ||
113 | internally processes and logs the error message in its PCI Express | ||
114 | capability structure. Error information being logged includes storing | ||
115 | the error reporting agent's requestor ID into the Error Source | ||
116 | Identification Registers and setting the error bits of the Root Error | ||
117 | Status Register accordingly. If AER error reporting is enabled in Root | ||
118 | Error Command Register, the Root Port generates an interrupt if an | ||
119 | error is detected. | ||
120 | |||
121 | Note that the errors as described above are related to the PCI Express | ||
122 | hierarchy and links. These errors do not include any device specific | ||
123 | errors because device specific errors will still get sent directly to | ||
124 | the device driver. | ||
125 | |||
126 | 3.1 Configure the AER capability structure | ||
127 | |||
128 | AER-aware drivers of PCI Express components need to change the device | ||
129 | control registers to enable AER. They may also change the AER registers, | ||
130 | including the mask and severity registers. The helper function | ||
131 | pci_enable_pcie_error_reporting can be used to enable AER. See | ||
132 | section 3.3. | ||
133 | |||
134 | 3.2. Provide callbacks | ||
135 | |||
136 | 3.2.1 callback reset_link to reset pci express link | ||
137 | |||
138 | This callback is used to reset the PCI Express physical link when a | ||
139 | fatal error happens. The root port AER service driver provides a | ||
140 | default reset_link function, but different upstream ports might | ||
141 | have different requirements for resetting the PCI Express link, so all | ||
142 | upstream ports should provide their own reset_link functions. | ||
143 | |||
144 | In struct pcie_port_service_driver, a new pointer, reset_link, is | ||
145 | added. | ||
146 | |||
147 | pci_ers_result_t (*reset_link) (struct pci_dev *dev); | ||
148 | |||
149 | Section 3.2.2.2 provides more detailed info on when to call | ||
150 | reset_link. | ||
151 | |||
152 | 3.2.2 PCI error-recovery callbacks | ||
153 | |||
154 | The PCI Express AER Root driver uses error callbacks to coordinate | ||
155 | with downstream device drivers associated with a hierarchy in question | ||
156 | when performing error recovery actions. | ||
157 | |||
158 | The pci_driver structure has a pointer, err_handler, to a | ||
159 | pci_error_handlers structure, which consists of several callback | ||
160 | function pointers. The AER driver follows the rules defined in | ||
161 | pci-error-recovery.txt except for the PCI Express specific parts (e.g. | ||
162 | reset_link). Refer to pci-error-recovery.txt for detailed | ||
163 | definitions of the callbacks. | ||
164 | |||
165 | The sections below specify when the error callback functions are called. | ||
166 | |||
167 | 3.2.2.1 Correctable errors | ||
168 | |||
169 | Correctable errors have no impact on the functionality of | ||
170 | the interface. The PCI Express protocol can recover without any | ||
171 | software intervention or any loss of data. These errors do not | ||
172 | require any recovery actions. The AER driver clears the device's | ||
173 | correctable error status register accordingly and logs these errors. | ||
174 | |||
175 | 3.2.2.2 Non-correctable (non-fatal and fatal) errors | ||
176 | |||
177 | If an error message indicates a non-fatal error, performing a link reset | ||
178 | upstream is not required. The AER driver calls error_detected(dev, | ||
179 | pci_channel_io_normal) for all drivers within the hierarchy in | ||
180 | question. For example: | ||
181 | EndPoint<==>DownstreamPort B<==>UpstreamPort A<==>RootPort. | ||
182 | If Upstream port A captures an AER error, the hierarchy consists of | ||
183 | Downstream port B and EndPoint. | ||
184 | |||
185 | A driver may return PCI_ERS_RESULT_CAN_RECOVER, | ||
186 | PCI_ERS_RESULT_DISCONNECT, or PCI_ERS_RESULT_NEED_RESET, depending on | ||
187 | whether it can recover; the AER driver may then call mmio_enabled next. | ||
188 | |||
189 | If an error message indicates a fatal error, the kernel broadcasts | ||
190 | error_detected(dev, pci_channel_io_frozen) to all drivers within | ||
191 | the hierarchy in question. A link reset upstream is then | ||
192 | necessary. Because different kinds of devices might reset the link in | ||
193 | different ways, the AER port service driver is required to provide the | ||
194 | function that resets the link. First, the kernel checks whether the | ||
195 | upstream component has an AER driver. If it does, the kernel uses the | ||
196 | reset_link callback of that driver. If the upstream component has no AER | ||
197 | driver and the port is a downstream port, the kernel uses the AER driver | ||
198 | of the root port that reported the AER error. Upstream ports | ||
199 | should provide their own AER service drivers with a reset_link | ||
200 | function. If error_detected returns PCI_ERS_RESULT_CAN_RECOVER and | ||
201 | reset_link returns PCI_ERS_RESULT_RECOVERED, error handling proceeds | ||
202 | to mmio_enabled. | ||
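The dispatch rules of sections 3.2.2.1 and 3.2.2.2 can be summarized in a small userspace model. This is only a sketch for reasoning about the flow, not kernel code: every type and function name below (model_severity, model_plan_recovery, and so on) is hypothetical, and the model covers only the transitions that the text above states explicitly.

```c
#include <stdbool.h>

/* Hypothetical model types; these are NOT the kernel's definitions. */
enum model_severity { MODEL_CORRECTABLE, MODEL_NONFATAL, MODEL_FATAL };
enum model_channel { MODEL_IO_NORMAL, MODEL_IO_FROZEN };

struct model_plan {
	bool call_error_detected;   /* notify drivers in the hierarchy? */
	enum model_channel channel; /* state passed to error_detected */
	bool reset_link;            /* is an upstream link reset needed? */
};

/* Map an error severity to the actions described in the text. */
static struct model_plan model_plan_recovery(enum model_severity sev)
{
	struct model_plan plan = { false, MODEL_IO_NORMAL, false };

	switch (sev) {
	case MODEL_CORRECTABLE:
		/* No recovery action: status is only cleared and logged. */
		break;
	case MODEL_NONFATAL:
		plan.call_error_detected = true;
		plan.channel = MODEL_IO_NORMAL;
		break;
	case MODEL_FATAL:
		plan.call_error_detected = true;
		plan.channel = MODEL_IO_FROZEN;
		plan.reset_link = true;
		break;
	}
	return plan;
}

/* Fatal path: mmio_enabled follows only if both steps succeeded. */
static bool model_goes_to_mmio_enabled(bool detected_can_recover,
				       bool link_recovered)
{
	return detected_can_recover && link_recovered;
}
```

The model deliberately leaves out the choice of which reset_link callback is used; that depends on the port hierarchy as described above.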
203 | |||
204 | 3.3 helper functions | ||
205 | |||
206 | 3.3.1 int pci_find_aer_capability(struct pci_dev *dev); | ||
207 | pci_find_aer_capability locates the PCI Express AER capability | ||
208 | in the device configuration space. If the device doesn't support | ||
209 | PCI-Express AER, the function returns 0. | ||
210 | |||
211 | 3.3.2 int pci_enable_pcie_error_reporting(struct pci_dev *dev); | ||
212 | pci_enable_pcie_error_reporting enables the device to send error | ||
213 | messages to the root port when an error is detected. Note that devices | ||
214 | don't enable error reporting by default, so device drivers need to | ||
215 | call this function to enable it. | ||
216 | |||
217 | 3.3.3 int pci_disable_pcie_error_reporting(struct pci_dev *dev); | ||
218 | pci_disable_pcie_error_reporting stops the device from sending error | ||
219 | messages to the root port when an error is detected. | ||
220 | |||
221 | 3.3.4 int pci_cleanup_aer_uncorrect_error_status(struct pci_dev *dev); | ||
222 | pci_cleanup_aer_uncorrect_error_status clears the uncorrectable | ||
223 | error status register. | ||
224 | |||
225 | 3.4 Frequently Asked Questions | ||
226 | |||
227 | Q: What happens if a PCI Express device driver does not provide an | ||
228 | error recovery handler (pci_driver->err_handler is equal to NULL)? | ||
229 | |||
230 | A: The devices bound to that driver won't be recovered. If the | ||
231 | error is fatal, the kernel will print warning messages. Please refer | ||
232 | to section 3 for more information. | ||
233 | |||
234 | Q: What happens if an upstream port service driver does not provide | ||
235 | callback reset_link? | ||
236 | |||
237 | A: Fatal error recovery will fail if the errors are reported by the | ||
238 | upstream ports to which the service driver is attached. | ||
239 | |||
240 | Q: How does this infrastructure deal with a driver that is not PCI | ||
241 | Express aware? | ||
242 | |||
243 | A: This infrastructure calls the error callback functions of the | ||
244 | driver when an error happens. But if the driver is not aware of | ||
245 | PCI Express, the device might not report its own errors to root | ||
246 | port. | ||
247 | |||
248 | Q: What modifications will that driver need to make it compatible | ||
249 | with the PCI Express AER Root driver? | ||
250 | |||
251 | A: It could call the helper functions to enable AER in devices and | ||
252 | clear the uncorrectable status register. Refer to section 3.3. | ||
253 | |||
diff --git a/Documentation/pcmcia/crc32hash.c b/Documentation/pcmcia/crc32hash.c new file mode 100644 index 000000000000..cbc36d299af8 --- /dev/null +++ b/Documentation/pcmcia/crc32hash.c | |||
@@ -0,0 +1,32 @@ | |||
1 | /* crc32hash.c - derived from linux/lib/crc32.c, GNU GPL v2 */ | ||
2 | /* Usage example: | ||
3 | $ ./crc32hash "Dual Speed" | ||
4 | */ | ||
5 | |||
6 | #include <string.h> | ||
7 | #include <stdio.h> | ||
8 | #include <ctype.h> | ||
9 | #include <stdlib.h> | ||
10 | |||
11 | unsigned int crc32(unsigned char const *p, unsigned int len) | ||
12 | { | ||
13 | int i; | ||
14 | unsigned int crc = 0; | ||
15 | while (len--) { | ||
16 | crc ^= *p++; | ||
17 | for (i = 0; i < 8; i++) | ||
18 | crc = (crc >> 1) ^ ((crc & 1) ? 0xedb88320 : 0); | ||
19 | } | ||
20 | return crc; | ||
21 | } | ||
22 | |||
23 | int main(int argc, char **argv) { | ||
24 | unsigned int result; | ||
25 | if (argc != 2) { | ||
26 | printf("no string passed as argument\n"); | ||
27 | return -1; | ||
28 | } | ||
29 | result = crc32((unsigned char const *)argv[1], strlen(argv[1])); | ||
30 | printf("0x%x\n", result); | ||
31 | return 0; | ||
32 | } | ||
diff --git a/Documentation/pcmcia/devicetable.txt b/Documentation/pcmcia/devicetable.txt index 3351c0355143..199afd100cf2 100644 --- a/Documentation/pcmcia/devicetable.txt +++ b/Documentation/pcmcia/devicetable.txt | |||
@@ -27,37 +27,7 @@ pcmcia:m0149cC1ABf06pfn00fn00pa725B842DpbF1EFEE84pc0877B627pd00000000 | |||
27 | The hex value after "pa" is the hash of product ID string 1, after "pb" for | 27 | The hex value after "pa" is the hash of product ID string 1, after "pb" for |
28 | string 2 and so on. | 28 | string 2 and so on. |
29 | 29 | ||
30 | Alternatively, you can use this small tool to determine the crc32 hash. | 30 | Alternatively, you can use crc32hash (see Documentation/pcmcia/crc32hash.c) |
31 | simply pass the string you want to evaluate as argument to this program, | 31 | to determine the crc32 hash. Simply pass the string you want to evaluate |
32 | e.g. | 32 | as argument to this program, e.g.: |
33 | $ ./crc32hash "Dual Speed" | 33 | $ ./crc32hash "Dual Speed" |
34 | |||
35 | ------------------------------------------------------------------------- | ||
36 | /* crc32hash.c - derived from linux/lib/crc32.c, GNU GPL v2 */ | ||
37 | #include <string.h> | ||
38 | #include <stdio.h> | ||
39 | #include <ctype.h> | ||
40 | #include <stdlib.h> | ||
41 | |||
42 | unsigned int crc32(unsigned char const *p, unsigned int len) | ||
43 | { | ||
44 | int i; | ||
45 | unsigned int crc = 0; | ||
46 | while (len--) { | ||
47 | crc ^= *p++; | ||
48 | for (i = 0; i < 8; i++) | ||
49 | crc = (crc >> 1) ^ ((crc & 1) ? 0xedb88320 : 0); | ||
50 | } | ||
51 | return crc; | ||
52 | } | ||
53 | |||
54 | int main(int argc, char **argv) { | ||
55 | unsigned int result; | ||
56 | if (argc != 2) { | ||
57 | printf("no string passed as argument\n"); | ||
58 | return -1; | ||
59 | } | ||
60 | result = crc32(argv[1], strlen(argv[1])); | ||
61 | printf("0x%x\n", result); | ||
62 | return 0; | ||
63 | } | ||
diff --git a/Documentation/pi-futex.txt b/Documentation/pi-futex.txt new file mode 100644 index 000000000000..5d61dacd21f6 --- /dev/null +++ b/Documentation/pi-futex.txt | |||
@@ -0,0 +1,121 @@ | |||
1 | Lightweight PI-futexes | ||
2 | ---------------------- | ||
3 | |||
4 | We are calling them lightweight for 3 reasons: | ||
5 | |||
6 | - in the user-space fastpath a PI-enabled futex involves no kernel work | ||
7 | (or any other PI complexity) at all. No registration, no extra kernel | ||
8 | calls - just pure fast atomic ops in userspace. | ||
9 | |||
10 | - even in the slowpath, the system call and scheduling pattern is very | ||
11 | similar to normal futexes. | ||
12 | |||
13 | - the in-kernel PI implementation is streamlined around the mutex | ||
14 | abstraction, with strict rules that keep the implementation | ||
15 | relatively simple: only a single owner may own a lock (i.e. no | ||
16 | read-write lock support), only the owner may unlock a lock, no | ||
17 | recursive locking, etc. | ||
18 | |||
19 | Priority Inheritance - why? | ||
20 | --------------------------- | ||
21 | |||
22 | The short reply: user-space PI helps achieve or improve determinism for | ||
23 | user-space applications. In the best case, it can help achieve | ||
24 | determinism and well-bounded latencies. Even in the worst case, PI will | ||
25 | improve the statistical distribution of locking-related application | ||
26 | delays. | ||
27 | |||
28 | The longer reply: | ||
29 | ----------------- | ||
30 | |||
31 | Firstly, sharing locks between multiple tasks is a common programming | ||
32 | technique that often cannot be replaced with lockless algorithms. As we | ||
33 | can see in the kernel [which is a quite complex program in itself], | ||
34 | lockless structures are the exception rather than the norm - the current | ||
35 | ratio of lockless vs. locky code for shared data structures is somewhere | ||
36 | between 1:10 and 1:100. Lockless is hard, and the complexity of lockless | ||
37 | algorithms often endangers the ability to do robust reviews of said code. | ||
38 | I.e. critical RT apps often choose lock structures to protect critical | ||
39 | data structures, instead of lockless algorithms. Furthermore, there are | ||
40 | cases (like shared hardware, or other resource limits) where lockless | ||
41 | access is mathematically impossible. | ||
42 | |||
43 | Media players (such as Jack) are an example of reasonable application | ||
44 | design with multiple tasks (with multiple priority levels) sharing | ||
45 | short-held locks: for example, a highprio audio playback thread is | ||
46 | combined with medium-prio construct-audio-data threads and low-prio | ||
47 | display-colory-stuff threads. Add video and decoding to the mix and | ||
48 | we've got even more priority levels. | ||
49 | |||
50 | So once we accept that synchronization objects (locks) are an | ||
51 | unavoidable fact of life, and once we accept that multi-task userspace | ||
52 | apps have a very fair expectation of being able to use locks, we've got | ||
53 | to think about how to offer the option of a deterministic locking | ||
54 | implementation to user-space. | ||
55 | |||
56 | Most of the technical counter-arguments against doing priority | ||
57 | inheritance only apply to kernel-space locks. But user-space locks are | ||
58 | different: there we cannot disable interrupts or make the task | ||
59 | non-preemptible in a critical section, so the 'use spinlocks' argument | ||
60 | does not apply (user-space spinlocks have the same priority inversion | ||
61 | problems as other user-space locking constructs). Fact is, pretty much | ||
62 | the only technique that currently enables good determinism for userspace | ||
63 | locks (such as futex-based pthread mutexes) is priority inheritance: | ||
64 | |||
65 | Currently (without PI), if a high-prio and a low-prio task share a lock | ||
66 | [this is a quite common scenario for most non-trivial RT applications], | ||
67 | even if all critical sections are coded carefully to be deterministic | ||
68 | (i.e. all critical sections are short in duration and only execute a | ||
69 | limited number of instructions), the kernel cannot guarantee any | ||
70 | deterministic execution of the high-prio task: any medium-priority task | ||
71 | could preempt the low-prio task while it holds the shared lock and | ||
72 | executes the critical section, and could delay it indefinitely. | ||
73 | |||
74 | Implementation: | ||
75 | --------------- | ||
76 | |||
77 | As mentioned before, the userspace fastpath of PI-enabled pthread | ||
78 | mutexes involves no kernel work at all - they behave quite similarly to | ||
79 | normal futex-based locks: a 0 value means unlocked, and a value==TID | ||
80 | means locked. (This is the same method as used by list-based robust | ||
81 | futexes.) Userspace uses atomic ops to lock/unlock these mutexes without | ||
82 | entering the kernel. | ||
83 | |||
84 | To handle the slowpath, we have added two new futex ops: | ||
85 | |||
86 | FUTEX_LOCK_PI | ||
87 | FUTEX_UNLOCK_PI | ||
88 | |||
89 | If the lock-acquire fastpath fails, [i.e. an atomic transition from 0 to | ||
90 | TID fails], then FUTEX_LOCK_PI is called. The kernel does all the | ||
91 | remaining work: if there is no futex-queue attached to the futex address | ||
92 | yet then the code looks up the task that owns the futex [it has put its | ||
93 | own TID into the futex value], and attaches a 'PI state' structure to | ||
94 | the futex-queue. The pi_state includes an rt-mutex, which is a PI-aware, | ||
95 | kernel-based synchronization object. The 'other' task is made the owner | ||
96 | of the rt-mutex, and the FUTEX_WAITERS bit is atomically set in the | ||
97 | futex value. Then this task tries to lock the rt-mutex, on which it | ||
98 | blocks. Once it returns, it has the mutex acquired, and it sets the | ||
99 | futex value to its own TID and returns. Userspace has no other work to | ||
100 | perform - it now owns the lock, and futex value contains | ||
101 | FUTEX_WAITERS|TID. | ||
102 | |||
103 | If the unlock side fastpath succeeds, [i.e. userspace manages to do a | ||
104 | TID -> 0 atomic transition of the futex value], then no kernel work is | ||
105 | triggered. | ||
106 | |||
107 | If the unlock fastpath fails (because the FUTEX_WAITERS bit is set), | ||
108 | then FUTEX_UNLOCK_PI is called, and the kernel unlocks the futex on | ||
109 | behalf of userspace - and it also unlocks the attached | ||
110 | pi_state->rt_mutex and thus wakes up any potential waiters. | ||
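The lock and unlock paths described above can be sketched in userspace C as follows. This is a minimal illustration rather than the glibc implementation: the function names pi_lock and pi_unlock are hypothetical, and the sketch assumes a Linux system whose headers provide FUTEX_LOCK_PI and FUTEX_UNLOCK_PI. It ignores robustness, timeouts, and retry on interrupted system calls.

```c
#define _GNU_SOURCE
#include <linux/futex.h>
#include <stdatomic.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: acquire a PI futex; returns 0 on success. */
static int pi_lock(_Atomic uint32_t *futex_word)
{
	uint32_t expected = 0;
	uint32_t tid = (uint32_t)syscall(SYS_gettid);

	/* Fastpath: atomic 0 -> TID transition, no kernel work. */
	if (atomic_compare_exchange_strong(futex_word, &expected, tid))
		return 0;

	/* Slowpath: the kernel queues us and, on return, has written
	 * our TID into the futex value. */
	return (int)syscall(SYS_futex, futex_word, FUTEX_LOCK_PI,
			    0, NULL, NULL, 0);
}

/* Hypothetical helper: release a PI futex; returns 0 on success. */
static int pi_unlock(_Atomic uint32_t *futex_word)
{
	uint32_t expected = (uint32_t)syscall(SYS_gettid);

	/* Fastpath: TID -> 0 succeeds only if FUTEX_WAITERS is clear. */
	if (atomic_compare_exchange_strong(futex_word, &expected, 0))
		return 0;

	/* Slowpath: let the kernel hand the lock to a waiter. */
	return (int)syscall(SYS_futex, futex_word, FUTEX_UNLOCK_PI,
			    0, NULL, NULL, 0);
}
```

In the uncontended case both helpers return from the fastpath, so the futex value goes 0 -> TID -> 0 without a futex system call.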
111 | |||
112 | Note that under this approach, contrary to previous PI-futex approaches, | ||
113 | there is no prior 'registration' of a PI-futex. [which is not quite | ||
114 | possible anyway, due to existing ABI properties of pthread mutexes.] | ||
115 | |||
116 | Also, under this scheme, 'robustness' and 'PI' are two orthogonal | ||
117 | properties of futexes, and all four combinations are possible: futex, | ||
118 | robust-futex, PI-futex, robust+PI-futex. | ||
119 | |||
120 | More details about priority inheritance can be found in | ||
121 | Documentation/rtmutex.txt. | ||
diff --git a/Documentation/power/devices.txt b/Documentation/power/devices.txt index fba1e05c47c7..d0e79d5820a5 100644 --- a/Documentation/power/devices.txt +++ b/Documentation/power/devices.txt | |||
@@ -1,208 +1,553 @@ | |||
1 | Most of the code in Linux is device drivers, so most of the Linux power | ||
2 | management code is also driver-specific. Most drivers will do very little; | ||
3 | others, especially for platforms with small batteries (like cell phones), | ||
4 | will do a lot. | ||
5 | |||
6 | This writeup gives an overview of how drivers interact with system-wide | ||
7 | power management goals, emphasizing the models and interfaces that are | ||
8 | shared by everything that hooks up to the driver model core. Read it as | ||
9 | background for the domain-specific work you'd do with any specific driver. | ||
10 | |||
11 | |||
12 | Two Models for Device Power Management | ||
13 | ====================================== | ||
14 | Drivers will use one or both of these models to put devices into low-power | ||
15 | states: | ||
16 | |||
17 | System Sleep model: | ||
18 | Drivers can enter low power states as part of entering system-wide | ||
19 | low-power states like "suspend-to-ram", or (mostly for systems with | ||
20 | disks) "hibernate" (suspend-to-disk). | ||
21 | |||
22 | This is something that device, bus, and class drivers collaborate on | ||
23 | by implementing various role-specific suspend and resume methods to | ||
24 | cleanly power down hardware and software subsystems, then reactivate | ||
25 | them without loss of data. | ||
26 | |||
27 | Some drivers can manage hardware wakeup events, which make the system | ||
28 | leave that low-power state. This feature may be disabled using the | ||
29 | relevant /sys/devices/.../power/wakeup file; enabling it may cost some | ||
30 | power usage, but let the whole system enter low power states more often. | ||
31 | |||
32 | Runtime Power Management model: | ||
33 | Drivers may also enter low power states while the system is running, | ||
34 | independently of other power management activity. Upstream drivers | ||
35 | will normally not know (or care) if the device is in some low power | ||
36 | state when issuing requests; the driver will auto-resume anything | ||
37 | that's needed when it gets a request. | ||
38 | |||
39 | This doesn't have, or need, much infrastructure; it's just something you | ||
40 | should do when writing your drivers. For example, clk_disable() unused | ||
41 | clocks as part of minimizing power drain for currently-unused hardware. | ||
42 | Of course, sometimes clusters of drivers will collaborate with each | ||
43 | other, which could involve task-specific power management. | ||
44 | |||
45 | There's not a lot to be said about those low power states except that they | ||
46 | are very system-specific, and often device-specific. Also, that if enough | ||
47 | drivers put themselves into low power states (at "runtime"), the effect may be | ||
48 | the same as entering some system-wide low-power state (system sleep) ... and | ||
49 | that synergies exist, so that several drivers using runtime pm might put the | ||
50 | system into a state where even deeper power saving options are available. | ||
51 | |||
52 | Most suspended devices will have quiesced all I/O: no more DMA or irqs, no | ||
53 | more data read or written, and requests from upstream drivers are no longer | ||
54 | accepted. A given bus or platform may have different requirements though. | ||
55 | |||
56 | Examples of hardware wakeup events include an alarm from a real time clock, | ||
57 | network wake-on-LAN packets, keyboard or mouse activity, and media insertion | ||
58 | or removal (for PCMCIA, MMC/SD, USB, and so on). | ||
59 | |||
60 | |||
61 | Interfaces for Entering System Sleep States | ||
62 | =========================================== | ||
63 | Most of the programming interfaces a device driver needs to know about | ||
64 | relate to that first model: entering a system-wide low power state, | ||
65 | rather than just minimizing power consumption by one device. | ||
66 | |||
67 | |||
68 | Bus Driver Methods | ||
69 | ------------------ | ||
70 | The core methods to suspend and resume devices reside in struct bus_type. | ||
71 | These are mostly of interest to people writing infrastructure for busses | ||
72 | like PCI or USB, or because they define the primitives that device drivers | ||
73 | may need to apply in domain-specific ways to their devices: | ||
1 | 74 | ||
2 | Device Power Management | 75 | struct bus_type { |
76 | ... | ||
77 | int (*suspend)(struct device *dev, pm_message_t state); | ||
78 | int (*suspend_late)(struct device *dev, pm_message_t state); | ||
3 | 79 | ||
80 | int (*resume_early)(struct device *dev); | ||
81 | int (*resume)(struct device *dev); | ||
82 | }; | ||
4 | 83 | ||
5 | Device power management encompasses two areas - the ability to save | 84 | Bus drivers implement those methods as appropriate for the hardware and |
6 | state and transition a device to a low-power state when the system is | 85 | the drivers using it; PCI works differently from USB, and so on. Not many |
7 | entering a low-power state; and the ability to transition a device to | 86 | people write bus drivers; most driver code is a "device driver" that |
8 | a low-power state while the system is running (and independently of | 87 | builds on top of bus-specific framework code. |
9 | any other power management activity). | 88 | |
89 | For more information on these driver calls, see the description later; | ||
90 | they are called in phases for every device, respecting the parent-child | ||
91 | sequencing in the driver model tree. Note that as this is being written, | ||
92 | only the suspend() and resume() are widely available; not many bus drivers | ||
93 | leverage all of those phases, or pass them down to lower driver levels. | ||
94 | |||
95 | |||
96 | /sys/devices/.../power/wakeup files | ||
97 | ----------------------------------- | ||
98 | All devices in the driver model have two flags to control handling of | ||
99 | wakeup events, which are hardware signals that can force the device and/or | ||
100 | system out of a low power state. These are initialized by bus or device | ||
101 | driver code using device_init_wakeup(dev,can_wakeup). | ||
102 | |||
103 | The "can_wakeup" flag just records whether the device (and its driver) can | ||
104 | physically support wakeup events. When that flag is clear, the sysfs | ||
105 | "wakeup" file is empty, and device_may_wakeup() returns false. | ||
106 | |||
107 | For devices that can issue wakeup events, a separate flag controls whether | ||
108 | that device should try to use its wakeup mechanism. The initial value of | ||
109 | device_may_wakeup() will be true, so that the device's "wakeup" file holds | ||
110 | the value "enabled". Userspace can change that to "disabled" so that | ||
111 | device_may_wakeup() returns false; or change it back to "enabled" (so that | ||
112 | it returns true again). | ||
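The interaction of the two flags can be modelled in a few lines of userspace C. This is a sketch under assumptions, not the driver core's code: the struct and the model_* functions are hypothetical names, and rejecting writes for devices that cannot wake the system is an assumption of this model rather than something stated above.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical model of the two per-device wakeup flags. */
struct model_wakeup {
	bool can_wakeup;    /* device and driver can signal wakeup events */
	bool should_wakeup; /* policy: use the wakeup mechanism */
};

/* Models device_init_wakeup(): capable devices start out enabled. */
static void model_init_wakeup(struct model_wakeup *w, bool can_wakeup)
{
	w->can_wakeup = can_wakeup;
	w->should_wakeup = can_wakeup;
}

/* Models device_may_wakeup(). */
static bool model_may_wakeup(const struct model_wakeup *w)
{
	return w->can_wakeup && w->should_wakeup;
}

/* Models reading the sysfs "wakeup" file. */
static const char *model_wakeup_show(const struct model_wakeup *w)
{
	if (!w->can_wakeup)
		return ""; /* file is empty when wakeup is unsupported */
	return w->should_wakeup ? "enabled" : "disabled";
}

/* Models writing the sysfs "wakeup" file; returns 0 on success. */
static int model_wakeup_store(struct model_wakeup *w, const char *buf)
{
	if (!w->can_wakeup)
		return -1; /* assumption: nothing to configure */
	if (strcmp(buf, "enabled") == 0)
		w->should_wakeup = true;
	else if (strcmp(buf, "disabled") == 0)
		w->should_wakeup = false;
	else
		return -1;
	return 0;
}
```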
113 | |||
114 | |||
115 | EXAMPLE: PCI Device Driver Methods | ||
116 | ----------------------------------- | ||
117 | PCI framework software calls these methods when the PCI device driver bound | ||
118 | to a device has provided them: | ||
119 | |||
120 | struct pci_driver { | ||
121 | ... | ||
122 | int (*suspend)(struct pci_dev *pdev, pm_message_t state); | ||
123 | int (*suspend_late)(struct pci_dev *pdev, pm_message_t state); | ||
124 | |||
125 | int (*resume_early)(struct pci_dev *pdev); | ||
126 | int (*resume)(struct pci_dev *pdev); | ||
127 | }; | ||
10 | 128 | ||
129 | Drivers will implement those methods, and call PCI-specific procedures | ||
130 | like pci_set_power_state(), pci_enable_wake(), pci_save_state(), and | ||
131 | pci_restore_state() to manage PCI-specific mechanisms. (PCI config space | ||
132 | could be saved during driver probe, if it weren't for the fact that some | ||
133 | systems rely on userspace tweaking using setpci.) Devices are suspended | ||
134 | before their bridges enter low power states, and likewise bridges resume | ||
135 | before their devices. | ||
136 | |||
137 | |||
138 | Upper Layers of Driver Stacks | ||
139 | ----------------------------- | ||
140 | Device drivers generally have at least two interfaces, and the methods | ||
141 | sketched above are the ones which apply to the lower level (nearer PCI, USB, | ||
142 | or other bus hardware). The network and block layers are examples of upper | ||
143 | level interfaces, as is a character device talking to userspace. | ||
144 | |||
145 | Power management requests normally need to flow through those upper levels, | ||
146 | which often use domain-oriented requests like "blank that screen". In | ||
147 | some cases those upper levels will have power management intelligence that | ||
148 | relates to end-user activity, or other devices that work in cooperation. | ||
149 | |||
150 | When those interfaces are structured using class interfaces, there is a | ||
151 | standard way to have the upper layer stop issuing requests to a given | ||
152 | class device (and restart later): | ||
153 | |||
154 | struct class { | ||
155 | ... | ||
156 | int (*suspend)(struct device *dev, pm_message_t state); | ||
157 | int (*resume)(struct device *dev); | ||
158 | }; | ||
11 | 159 | ||
12 | Methods | 160 | Those calls are issued in specific phases of the process by which the |
161 | system enters a low power "suspend" state, or resumes from it. | ||
162 | |||
163 | |||
164 | Calling Drivers to Enter System Sleep States | ||
165 | ============================================ | ||
166 | When the system enters a low power state, each device's driver is asked | ||
167 | to suspend the device by putting it into a state compatible with the target | ||
168 | system state. That's usually some version of "off", but the details are | ||
169 | system-specific. Also, wakeup-enabled devices will usually stay partly | ||
170 | functional in order to wake the system. | ||
171 | |||
172 | When the system leaves that low power state, the device's driver is asked | ||
173 | to resume it. The suspend and resume operations always go together, and | ||
174 | both are multi-phase operations. | ||
175 | |||
176 | For simple drivers, suspend might quiesce the device using the class code | ||
177 | and then turn its hardware as "off" as possible with suspend_late. The | ||
178 | matching resume calls would then completely reinitialize the hardware | ||
179 | before reactivating its class I/O queues. | ||
180 | |||
181 | More power-aware drivers will use more than one device low power | ||
182 | state, either at runtime or during system sleep states, and might trigger | ||
183 | system wakeup events. | ||
184 | |||
185 | |||
186 | Call Sequence Guarantees | ||
187 | ------------------------ | ||
188 | To ensure that bridges and similar links needed to talk to a device are | ||
189 | available when the device is suspended or resumed, the device tree is | ||
190 | walked in a bottom-up order to suspend devices. A top-down order is | ||
191 | used to resume those devices. | ||
192 | |||
193 | The ordering of the device tree is defined by the order in which devices | ||
194 | get registered: a child can never be registered, probed or resumed before | ||
195 | its parent; and can't be removed or suspended after that parent. | ||
196 | |||
197 | The policy is that the device tree should match hardware bus topology. | ||
198 | (Or at least the control bus, for devices which use multiple busses.) | ||
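The ordering rule above can be illustrated with a small userspace sketch. This is not kernel code: the sketch_* names, the fixed-size array and the call log are all hypothetical stand-ins, assuming only that devices are kept in registration order (parents before children), suspended by walking that order backwards, and resumed by walking it forwards.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the kernel's device list; not a kernel API. */
#define SKETCH_MAX_DEVS 8

/* Device names in registration order: a parent always precedes its children. */
static const char *sketch_devs[SKETCH_MAX_DEVS];
static size_t sketch_ndevs;

/* Record of "s:<name> " and "r:<name> " entries, in the order calls were made. */
static char sketch_calls[256];

static void sketch_note(char op, const char *name)
{
	size_t used = strlen(sketch_calls);

	snprintf(sketch_calls + used, sizeof(sketch_calls) - used,
		 "%c:%s ", op, name);
}

/* Register a device; callers must register parents before children. */
void sketch_register(const char *name)
{
	assert(sketch_ndevs < SKETCH_MAX_DEVS);
	sketch_devs[sketch_ndevs++] = name;
}

/* Suspend: walk the registration order backwards, so children go first. */
void sketch_suspend_all(void)
{
	size_t i;

	for (i = sketch_ndevs; i > 0; i--)
		sketch_note('s', sketch_devs[i - 1]);
}

/* Resume: walk the registration order forwards, so parents come back first. */
void sketch_resume_all(void)
{
	size_t i;

	for (i = 0; i < sketch_ndevs; i++)
		sketch_note('r', sketch_devs[i]);
}

const char *sketch_call_log(void)
{
	return sketch_calls;
}
```

Registering a bridge, then a host controller behind it, then a disk behind that, and calling sketch_suspend_all() followed by sketch_resume_all() records the disk being suspended first and resumed last, so the bridge is available whenever a device below it is being handled.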
199 | |||
200 | |||
201 | Suspending Devices | ||
202 | ------------------ | ||
203 | Suspending a given device is done in several phases. Suspending the | ||
204 | system always includes every phase, executing calls for every device | ||
205 | before the next phase begins. Not all busses or classes support all | ||
206 | these callbacks; and not all drivers use all the callbacks. | ||
207 | |||
208 | The phases are seen by driver notifications issued in this order: | ||
209 | |||
210 | 1 class.suspend(dev, message) is called after tasks are frozen, for | ||
211 | devices associated with a class that has such a method. This | ||
212 | method may sleep. | ||
213 | |||
214 | Since I/O activity usually comes from such higher layers, this is | ||
215 | a good place to quiesce all drivers of a given type (and keep such | ||
216 | code out of those drivers). | ||
217 | |||
218 | 2 bus.suspend(dev, message) is called next. This method may sleep, | ||
219 | and is often morphed into a device driver call with bus-specific | ||
220 | parameters and/or rules. | ||
221 | |||
222 | This call should handle parts of device suspend logic that require | ||
223 | sleeping. It probably does work to quiesce the device which hasn't | ||
224 | been abstracted into class.suspend() or bus.suspend_late(). | ||
225 | |||
226 | 3 bus.suspend_late(dev, message) is called with IRQs disabled, and | ||
227 | with only one CPU active. Until the bus.resume_early() phase | ||
228 | completes (see later), IRQs are not enabled again. This method | ||
229 | won't be exposed by all busses; for message based busses like USB, | ||
230 | I2C, or SPI, device interactions normally require IRQs. This bus | ||
231 | call may be morphed into a driver call with bus-specific parameters. | ||
232 | |||
233 | This call might save low level hardware state that might otherwise | ||
234 | be lost in the upcoming low power state, and actually put the | ||
235 | device into a low power state ... so that in some cases the device | ||
236 | may stay partly usable until this late. This "late" call may also | ||
237 | help when coping with hardware that behaves badly. | ||
238 | |||
239 | The pm_message_t parameter is currently used to refine those semantics | ||
240 | (described later). | ||
241 | |||
242 | At the end of those phases, drivers should normally have stopped all I/O | ||
243 | transactions (DMA, IRQs), saved enough state that they can re-initialize | ||
244 | or restore previous state (as needed by the hardware), and placed the | ||
245 | device into a low-power state. On many platforms they will also use | ||
246 | clk_disable() to gate off one or more clock sources; sometimes they will | ||
247 | also switch off power supplies, or reduce voltages. Drivers which have | ||
248 | runtime PM support may already have performed some or all of the steps | ||
249 | needed to prepare for the upcoming system sleep state. | ||
250 | |||
251 | When any driver sees that its device_can_wakeup(dev), it should make sure | ||
252 | to use the relevant hardware signals to trigger a system wakeup event. | ||
253 | For example, enable_irq_wake() might identify GPIO signals hooked up to | ||
254 | a switch or other external hardware, and pci_enable_wake() does something | ||
255 | similar for PCI's PME# signal. | ||
256 | |||
257 | If a driver (or bus, or class) fails its suspend method, the system won't | ||
258 | enter the desired low power state; it will resume all the devices it's | ||
259 | suspended so far. | ||
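A minimal userspace sketch of that failure handling follows. It is only an illustration: struct rb_dev, rb_suspend_all() and the use of -EIO as the error value are assumptions made for the example, and a single flag stands in for all the suspend phases of a device.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical device record; fail_suspend simulates a failing suspend method. */
struct rb_dev {
	int fail_suspend;
	int suspended;
};

/*
 * devs[] is in registration order (parents first).  Suspend from the end
 * of the array towards the start.  If a device fails, resume the ones that
 * were already suspended, parents first, and report the error.
 */
int rb_suspend_all(struct rb_dev *devs, size_t n)
{
	size_t i, j;

	assert(devs != NULL);

	for (i = n; i > 0; i--) {
		if (devs[i - 1].fail_suspend) {
			/* devs[i] .. devs[n-1] were suspended; undo them. */
			for (j = i; j < n; j++)
				devs[j].suspended = 0;
			return -EIO;	/* error code chosen for illustration */
		}
		devs[i - 1].suspended = 1;
	}
	return 0;
}
```

With three devices where the middle one fails, the last device is suspended and then resumed again, the first is never touched, and the caller sees a negative return value instead of a sleeping system.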
260 | |||
261 | Note that drivers may need to perform different actions based on the target | ||
262 | system lowpower/sleep state. At this writing, there are only platform | ||
263 | specific APIs through which drivers could determine those target states. | ||
264 | |||
265 | |||
266 | Device Low Power (suspend) States | ||
267 | --------------------------------- | ||
268 | Device low-power states aren't very standard. One device might only handle | ||
269 | "on" and "off", while another might support a dozen different versions of | ||
270 | "on" (how many engines are active?), plus a state that gets back to "on" | ||
271 | faster than from a full "off". | ||
272 | |||
273 | Some busses define rules about what different suspend states mean. PCI | ||
274 | gives one example: after the suspend sequence completes, a non-legacy | ||
275 | PCI device may not perform DMA or issue IRQs, and any wakeup events it | ||
276 | issues would be issued through the PME# bus signal. Plus, there are | ||
277 | several PCI-standard device states, some of which are optional. | ||
278 | |||
279 | In contrast, integrated system-on-chip processors often use irqs as the | ||
280 | wakeup event sources (so drivers would call enable_irq_wake) and might | ||
281 | be able to treat DMA completion as a wakeup event (sometimes DMA can stay | ||
282 | active too; it'd only be the CPU and some peripherals that sleep). | ||
283 | |||
284 | Some details here may be platform-specific. Systems may have devices that | ||
285 | can be fully active in certain sleep states, such as an LCD display that's | ||
286 | refreshed using DMA while most of the system is sleeping lightly ... and | ||
287 | its frame buffer might even be updated by a DSP or other non-Linux CPU while | ||
288 | the Linux control processor stays idle. | ||
289 | |||
290 | Moreover, the specific actions taken may depend on the target system state. | ||
291 | One target system state might allow a given device to be very operational; | ||
292 | another might require a hard shut down with re-initialization on resume. | ||
293 | And two different target systems might use the same device in different | ||
294 | ways; the aforementioned LCD might be active in one product's "standby", | ||
295 | but a different product using the same SOC might work differently. | ||
296 | |||
297 | |||
298 | Meaning of pm_message_t.event | ||
299 | ----------------------------- | ||
300 | Parameters to suspend calls include the device affected and a message of | ||
301 | type pm_message_t, which has one field: the event. If the driver does not | ||
302 | recognize the event code, suspend calls may abort the request and return | ||
303 | a negative errno. However, most drivers will be fine if they implement | ||
304 | PM_EVENT_SUSPEND semantics for all messages. | ||
305 | |||
306 | The event codes are used to refine the goal of suspending the device, and | ||
307 | mostly matter when creating or resuming system memory image snapshots, as | ||
308 | used with suspend-to-disk: | ||
309 | |||
310 | PM_EVENT_SUSPEND -- quiesce the driver and put hardware into a low-power | ||
311 | state. When used with system sleep states like "suspend-to-RAM" or | ||
312 | "standby", the upcoming resume() call will often be able to rely on | ||
313 | state kept in hardware, or issue system wakeup events. When used | ||
314 | instead with suspend-to-disk, few devices support this capability; | ||
315 | most are completely powered off. | ||
316 | |||
317 | PM_EVENT_FREEZE -- quiesce the driver, but don't necessarily change into | ||
318 | any low power mode. A system snapshot is about to be taken, often | ||
319 | followed by a call to the driver's resume() method. Neither wakeup | ||
320 | events nor DMA are allowed. | ||
321 | |||
322 | PM_EVENT_PRETHAW -- quiesce the driver, knowing that the upcoming resume() | ||
323 | will restore a suspend-to-disk snapshot from a different kernel image. | ||
324 | Drivers that are smart enough to look at their hardware state during | ||
325 | resume() processing need that state to be correct ... a PRETHAW could | ||
326 | be used to invalidate that state (by resetting the device), like a | ||
327 | shutdown() invocation would before a kexec() or system halt. Other | ||
328 | drivers might handle this the same way as PM_EVENT_FREEZE. Neither | ||
329 | wakeup events nor DMA are allowed. | ||
330 | |||
331 | To enter "standby" (ACPI S1) or "Suspend to RAM" (STR, ACPI S3) states, or | ||
332 | the similarly named APM states, only PM_EVENT_SUSPEND is used; for "Suspend | ||
333 | to Disk" (STD, hibernate, ACPI S4), all of those event codes are used. | ||
334 | |||
335 | There's also PM_EVENT_ON, a value which never appears as a suspend event | ||
336 | but is sometimes used to record the "not suspended" device state. | ||
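The event handling described in this section might look roughly like the sketch below. It runs in userspace and makes several assumptions: the SKETCH_EVENT_* enum is a stand-in whose numeric values are not claimed to match the kernel's PM_EVENT_* codes, and struct sketch_hw simply records what a real driver would do to its hardware.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in event codes; the real PM_EVENT_* values may differ. */
enum sketch_event {
	SKETCH_EVENT_ON,
	SKETCH_EVENT_FREEZE,
	SKETCH_EVENT_SUSPEND,
	SKETCH_EVENT_PRETHAW,
};

/* Hypothetical record of what the driver did to its device. */
struct sketch_hw {
	int io_quiesced;	/* I/O, DMA and wakeup activity stopped */
	int low_power;		/* device placed in a low power state */
	int was_reset;		/* hardware state invalidated by a reset */
};

int sketch_suspend(struct sketch_hw *hw, enum sketch_event event)
{
	assert(hw != NULL);

	/* Every suspend event starts by quiescing the driver. */
	hw->io_quiesced = 1;

	switch (event) {
	case SKETCH_EVENT_FREEZE:
		/* A snapshot is about to be taken; low power is not required. */
		break;
	case SKETCH_EVENT_PRETHAW:
		/* Reset, so resume() will not trust stale hardware state. */
		hw->was_reset = 1;
		break;
	case SKETCH_EVENT_SUSPEND:
	default:
		/* Unrecognized events get plain SUSPEND semantics. */
		hw->low_power = 1;
		break;
	}
	return 0;
}
```

Falling through to SUSPEND semantics in the default case follows the remark that most drivers are fine treating every message that way; a stricter driver could instead return a negative errno for codes it does not recognize.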
337 | |||
338 | |||
339 | Resuming Devices | ||
340 | ---------------- | ||
341 | Resuming is done in multiple phases, much like suspending, with all | ||
342 | devices processing each phase's calls before the next phase begins. | ||
343 | |||
344 | The phases are seen by driver notifications issued in this order: | ||
345 | |||
346 | 1 bus.resume_early(dev) is called with IRQs disabled, and with | ||
347 | only one CPU active. As with bus.suspend_late(), this method | ||
348 | won't be supported on busses that require IRQs in order to | ||
349 | interact with devices. | ||
350 | |||
351 | This reverses the effects of bus.suspend_late(). | ||
352 | |||
353 | 2 bus.resume(dev) is called next. This may be morphed into a device | ||
354 | driver call with bus-specific parameters; implementations may sleep. | ||
355 | |||
356 | This reverses the effects of bus.suspend(). | ||
357 | |||
358 | 3 class.resume(dev) is called for devices associated with a class | ||
359 | that has such a method. Implementations may sleep. | ||
360 | |||
361 | This reverses the effects of class.suspend(), and would usually | ||
362 | reactivate the device's I/O queue. | ||
363 | |||
364 | At the end of those phases, drivers should normally be as functional as | ||
365 | they were before suspending: I/O can be performed using DMA and IRQs, and | ||
366 | the relevant clocks are gated on. The device need not be "fully on"; it | ||
367 | might be in a runtime lowpower/suspend state that acts as if it were. | ||
368 | |||
369 | However, the details here may again be platform-specific. For example, | ||
370 | some systems support multiple "run" states, and the mode in effect at | ||
371 | the end of resume() might not be the one which preceded suspension. | ||
372 | That means availability of certain clocks or power supplies changed, | ||
373 | which could easily affect how a driver works. | ||
374 | |||
375 | |||
376 | Drivers need to be able to handle hardware which has been reset since the | ||
377 | suspend methods were called, for example by complete reinitialization. | ||
378 | This may be the hardest part, and the one most protected by NDA'd documents | ||
379 | and chip errata. It's simplest if the hardware state hasn't changed since | ||
380 | the suspend() was called, but that can't always be guaranteed. | ||
381 | |||
382 | Drivers must also be prepared to notice that the device has been removed | ||
383 | while the system was powered off, whenever that's physically possible. | ||
384 | PCMCIA, MMC, USB, Firewire, SCSI, and even IDE are common examples of busses | ||
385 | where common Linux platforms will see such removal. Details of how drivers | ||
386 | will notice and handle such removals are currently bus-specific, and often | ||
387 | involve a separate thread. | ||
13 | 388 | ||
14 | The methods to suspend and resume devices reside in struct bus_type: | ||
15 | 389 | ||
16 | struct bus_type { | 390 | Note that the bus-specific runtime PM wakeup mechanism can exist, and might |
17 | ... | 391 | be defined to share some of the same driver code as for system wakeup. For |
18 | int (*suspend)(struct device * dev, pm_message_t state); | 392 | example, a bus-specific device driver's resume() method might be used there, |
19 | int (*resume)(struct device * dev); | 393 | so it wouldn't only be called from bus.resume() during system-wide wakeup. |
20 | }; | 394 | See bus-specific information about how runtime wakeup events are handled. |
21 | 395 | ||
22 | Each bus driver is responsible implementing these methods, translating | ||
23 | the call into a bus-specific request and forwarding the call to the | ||
24 | bus-specific drivers. For example, PCI drivers implement suspend() and | ||
25 | resume() methods in struct pci_driver. The PCI core is simply | ||
26 | responsible for translating the pointers to PCI-specific ones and | ||
27 | calling the low-level driver. | ||
28 | |||
29 | This is done to a) ease transition to the new power management methods | ||
30 | and leverage the existing PM code in various bus drivers; b) allow | ||
31 | buses to implement generic and default PM routines for devices, and c) | ||
32 | make the flow of execution obvious to the reader. | ||
33 | |||
34 | |||
35 | System Power Management | ||
36 | |||
37 | When the system enters a low-power state, the device tree is walked in | ||
38 | a depth-first fashion to transition each device into a low-power | ||
39 | state. The ordering of the device tree is guaranteed by the order in | ||
40 | which devices get registered - children are never registered before | ||
41 | their ancestors, and devices are placed at the back of the list when | ||
42 | registered. By walking the list in reverse order, we are guaranteed to | ||
43 | suspend devices in the proper order. | ||
44 | |||
45 | Devices are suspended once with interrupts enabled. Drivers are | ||
46 | expected to stop I/O transactions, save device state, and place the | ||
47 | device into a low-power state. Drivers may sleep, allocate memory, | ||
48 | etc. at will. | ||
49 | |||
50 | Some devices are broken and will inevitably have problems powering | ||
51 | down or disabling themselves with interrupts enabled. For these | ||
52 | special cases, they may return -EAGAIN. This will put the device on a | ||
53 | list to be taken care of later. When interrupts are disabled, before | ||
54 | we enter the low-power state, their drivers are called again to put | ||
55 | their device to sleep. | ||
56 | |||
57 | On resume, the devices that returned -EAGAIN will be called to power | ||
58 | themselves back on with interrupts disabled. Once interrupts have been | ||
59 | re-enabled, the rest of the drivers will be called to resume their | ||
60 | devices. On resume, a driver is responsible for powering back on each | ||
61 | device, restoring state, and re-enabling I/O transactions for that | ||
62 | device. | ||
63 | 396 | ||
397 | System Devices | ||
398 | -------------- | ||
64 | System devices follow a slightly different API, which can be found in | 399 | System devices follow a slightly different API, which can be found in |
65 | 400 | ||
66 | include/linux/sysdev.h | 401 | include/linux/sysdev.h |
67 | drivers/base/sys.c | 402 | drivers/base/sys.c |
68 | 403 | ||
69 | System devices will only be suspended with interrupts disabled, and | 404 | System devices will only be suspended with interrupts disabled, and after |
70 | after all other devices have been suspended. On resume, they will be | 405 | all other devices have been suspended. On resume, they will be resumed |
71 | resumed before any other devices, and also with interrupts disabled. | 406 | before any other devices, and also with interrupts disabled. |
72 | 407 | ||
408 | That is, IRQs are disabled, the suspend_late() phase begins, then the | ||
409 | sysdev_driver.suspend() phase, and the system enters a sleep state. Then | ||
410 | the sysdev_driver.resume() phase begins, followed by the resume_early() | ||
411 | phase, after which IRQs are enabled. | ||
73 | 412 | ||
74 | Runtime Power Management | 413 | Code to actually enter and exit the system-wide low power state sometimes |
75 | 414 | involves hardware details that are only known to the boot firmware, and | |
76 | Many devices are able to dynamically power down while the system is | 415 | may leave a CPU running software (from SRAM or flash memory) that monitors |
77 | still running. This feature is useful for devices that are not being | 416 | the system and manages its wakeup sequence. |
78 | used, and can offer significant power savings on a running system. | ||
79 | |||
80 | In each device's directory, there is a 'power' directory, which | ||
81 | contains at least a 'state' file. Reading from this file displays what | ||
82 | power state the device is currently in. Writing to this file initiates | ||
83 | a transition to the specified power state, which must be a decimal in | ||
84 | the range 1-3, inclusive; or 0 for 'On'. | ||
85 | 417 | ||
86 | The PM core will call the ->suspend() method in the bus_type object | ||
87 | that the device belongs to if the specified state is not 0, or | ||
88 | ->resume() if it is. | ||
89 | 418 | ||
90 | Nothing will happen if the specified state is the same state the | 419 | Runtime Power Management |
91 | device is currently in. | 420 | ======================== |
92 | 421 | Many devices are able to dynamically power down while the system is still | |
93 | If the device is already in a low-power state, and the specified state | 422 | running. This feature is useful for devices that are not being used, and |
94 | is another, but different, low-power state, the ->resume() method will | 423 | can offer significant power savings on a running system. These devices |
95 | first be called to power the device back on, then ->suspend() will be | 424 | often support a range of runtime power states, which might use names such |
96 | called again with the new state. | 425 | as "off", "sleep", "idle", "active", and so on. Those states will in some |
97 | 426 | cases (like PCI) be partially constrained by a bus the device uses, and will | |
98 | The driver is responsible for saving the working state of the device | 427 | usually include hardware states that are also used in system sleep states. |
99 | and putting it into the low-power state specified. If this was | 428 | |
100 | successful, it returns 0, and the device's power_state field is | 429 | However, note that if a driver puts a device into a runtime low power state |
101 | updated. | 430 | and the system then goes into a system-wide sleep state, it normally ought |
102 | 431 | to resume into that runtime low power state rather than "full on". Such | |
103 | The driver must take care to know whether or not it is able to | 432 | distinctions would be part of the driver-internal state machine for that |
104 | properly resume the device, including all step of reinitialization | 433 | hardware; the whole point of runtime power management is to be sure that |
105 | necessary. (This is the hardest part, and the one most protected by | 434 | drivers are decoupled in that way from the state machine governing phases |
106 | NDA'd documents). | 435 | of the system-wide power/sleep state transitions. |
107 | 436 | ||
108 | The driver must also take care not to suspend a device that is | 437 | |
109 | currently in use. It is their responsibility to provide their own | 438 | Power Saving Techniques |
110 | exclusion mechanisms. | 439 | ----------------------- |
111 | 440 | Normally runtime power management is handled by the drivers without specific | |
112 | The runtime power transition happens with interrupts enabled. If a | 441 | userspace or kernel intervention, by device-aware use of techniques like: |
113 | device cannot support being powered down with interrupts, it may | 442 | |
114 | return -EAGAIN (as it would during a system power management | 443 | Using information provided by other system layers |
115 | transition), but it will _not_ be called again, and the transaction | 444 | - stay deeply "off" except between open() and close() |
116 | will fail. | 445 | - if transceiver/PHY indicates "nobody connected", stay "off" |
117 | 446 | - application protocols may include power commands or hints | |
118 | There is currently no way to know what states a device or driver | 447 | |
119 | supports a priori. This will change in the future. | 448 | Using fewer CPU cycles |
120 | 449 | - using DMA instead of PIO | |
121 | pm_message_t meaning | 450 | - removing timers, or making them lower frequency |
122 | 451 | - shortening "hot" code paths | |
123 | pm_message_t has two fields. event ("major"), and flags. If driver | 452 | - eliminating cache misses |
124 | does not know event code, it aborts the request, returning error. Some | 453 | - (sometimes) offloading work to device firmware |
125 | drivers may need to deal with special cases based on the actual type | 454 | |
126 | of suspend operation being done at the system level. This is why | 455 | Reducing other resource costs |
127 | there are flags. | 456 | - gating off unused clocks in software (or hardware) |
128 | 457 | - switching off unused power supplies | |
129 | Event codes are: | 458 | - eliminating (or delaying/merging) IRQs |
130 | 459 | - tuning DMA to use word and/or burst modes | |
131 | ON -- no need to do anything except special cases like broken | 460 | |
132 | HW. | 461 | Using device-specific low power states |
133 | 462 | - using lower voltages | |
134 | # NOTIFICATION -- pretty much same as ON? | 463 | - avoiding needless DMA transfers |
135 | 464 | ||
136 | FREEZE -- stop DMA and interrupts, and be prepared to reinit HW from | 465 | Read your hardware documentation carefully to see the opportunities that |
137 | scratch. That probably means stop accepting upstream requests, the | 466 | may be available. If you can, measure the actual power usage and check |
138 | actual policy of what to do with them being specific to a given | 467 | it against the budget established for your project. |
139 | driver. It's acceptable for a network driver to just drop packets | 468 | |
140 | while a block driver is expected to block the queue so no request is | 469 | |
141 | lost. (Use IDE as an example on how to do that). FREEZE requires no | 470 | Examples: USB hosts, system timer, system CPU |
142 | power state change, and it's expected for drivers to be able to | 471 | ---------------------------------------------- |
143 | quickly transition back to operating state. | 472 | USB host controllers make interesting, if complex, examples. In many cases |
144 | 473 | these have no work to do: no USB devices are connected, or all of them are | |
145 | SUSPEND -- like FREEZE, but also put hardware into low-power state. If | 474 | in the USB "suspend" state. Linux host controller drivers can then disable |
146 | there's need to distinguish several levels of sleep, additional flag | 475 | periodic DMA transfers that would otherwise be a constant power drain on the |
147 | is probably best way to do that. | 476 | memory subsystem, and enter a suspend state. In power-aware controllers, |
148 | 477 | entering that suspend state may disable the clock used with USB signaling, | |
149 | Transitions are only from a resumed state to a suspended state, never | 478 | saving a certain amount of power. |
150 | between 2 suspended states. (ON -> FREEZE or ON -> SUSPEND can happen, | 479 | |
151 | FREEZE -> SUSPEND or SUSPEND -> FREEZE can not). | 480 | The controller will be woken from that state (with an IRQ) by changes to the |
152 | 481 | signal state on the data lines of a given port, for example by an existing | |
153 | All events are: | 482 | peripheral requesting "remote wakeup" or by plugging a new peripheral. The |
154 | 483 | same wakeup mechanism usually works from "standby" sleep states, and on some | |
155 | [NOTE NOTE NOTE: If you are driver author, you should not care; you | 484 | systems also from "suspend to RAM" (or even "suspend to disk") states. |
156 | should only look at event, and ignore flags.] | 485 | (Except that ACPI may be involved instead of normal IRQs, on some hardware.) |
157 | 486 | ||
158 | #Prepare for suspend -- userland is still running but we are going to | 487 | System devices like timers and CPUs may have special roles in the platform |
159 | #enter suspend state. This gives drivers chance to load firmware from | 488 | power management scheme. For example, system timers using a "dynamic tick" |
160 | #disk and store it in memory, or do other activities taht require | 489 | approach don't just save CPU cycles (by eliminating needless timer IRQs), |
161 | #operating userland, ability to kmalloc GFP_KERNEL, etc... All of these | 490 | but they may also open the door to using lower power CPU "idle" states that |
162 | #are forbiden once the suspend dance is started.. event = ON, flags = | 491 | cost more than a jiffy to enter and exit. On x86 systems these are states |
163 | #PREPARE_TO_SUSPEND | 492 | like "C3"; note that periodic DMA transfers from a USB host controller will |
164 | 493 | also prevent entry to a C3 state, much like a periodic timer IRQ. | |
165 | Apm standby -- prepare for APM event. Quiesce devices to make life | 494 | |
166 | easier for APM BIOS. event = FREEZE, flags = APM_STANDBY | 495 | That kind of runtime mechanism interaction is common. "System On Chip" (SOC) |
167 | 496 | processors often have low power idle modes that can't be entered unless | |
168 | Apm suspend -- same as APM_STANDBY, but it we should probably avoid | 497 | certain medium-speed clocks (often 12 or 48 MHz) are gated off. When the |
169 | spinning down disks. event = FREEZE, flags = APM_SUSPEND | 498 | drivers gate those clocks effectively, then the system idle task may be able |
170 | 499 | to use the lower power idle modes and thereby increase battery life. | |
171 | System halt, reboot -- quiesce devices to make life easier for BIOS. event | 500 | |
172 | = FREEZE, flags = SYSTEM_HALT or SYSTEM_REBOOT | 501 | If the CPU can have a "cpufreq" driver, there also may be opportunities |
173 | 502 | to shift to lower voltage settings and reduce the power cost of executing | |
174 | System shutdown -- at least disks need to be spun down, or data may be | 503 | a given number of instructions. (Without voltage adjustment, it's rare |
175 | lost. Quiesce devices, just to make life easier for BIOS. event = | 504 | for cpufreq to save much power; the cost-per-instruction must go down.) |
176 | FREEZE, flags = SYSTEM_SHUTDOWN | 505 | |
177 | 506 | ||
178 | Kexec -- turn off DMAs and put hardware into some state where new | 507 | /sys/devices/.../power/state files |
179 | kernel can take over. event = FREEZE, flags = KEXEC | 508 | ================================== |
180 | 509 | For now you can also test some of this functionality using sysfs. | |
181 | Powerdown at end of swsusp -- very similar to SYSTEM_SHUTDOWN, except wake | 510 | |
182 | may need to be enabled on some devices. This actually has at least 3 | 511 | DEPRECATED: USE "power/state" ONLY FOR DRIVER TESTING, AND |
183 | subtypes, system can reboot, enter S4 and enter S5 at the end of | 512 | AVOID USING dev->power.power_state IN DRIVERS. |
184 | swsusp. event = FREEZE, flags = SWSUSP and one of SYSTEM_REBOOT, | 513 | |
185 | SYSTEM_SHUTDOWN, SYSTEM_S4 | 514 | THESE WILL BE REMOVED. IF THE "power/state" FILE GETS REPLACED, |
186 | 515 | IT WILL BECOME SOMETHING COUPLED TO THE BUS OR DRIVER. | |
187 | Suspend to ram -- put devices into low power state. event = SUSPEND, | 516 | |
188 | flags = SUSPEND_TO_RAM | 517 | In each device's directory, there is a 'power' directory, which contains |
189 | 518 | at least a 'state' file. The value of this field is effectively boolean, | |
190 | Freeze for swsusp snapshot -- stop DMA and interrupts. No need to put | 519 | PM_EVENT_ON or PM_EVENT_SUSPEND. |
191 | devices into low power mode, but you must be able to reinitialize | 520 | |
192 | device from scratch in resume method. This has two flavors, its done | 521 | * Reading from this file displays a value corresponding to |
193 | once on suspending kernel, once on resuming kernel. event = FREEZE, | 522 | the power.power_state.event field. All nonzero values are |
194 | flags = DURING_SUSPEND or DURING_RESUME | 523 | displayed as "2", corresponding to a low power state; zero |
195 | 524 | is displayed as "0", corresponding to normal operation. | |
196 | Device detach requested from /sys -- deinitialize device; proably same as | 525 | |
197 | SYSTEM_SHUTDOWN, I do not understand this one too much. probably event | 526 | * Writing to this file initiates a transition using the |
198 | = FREEZE, flags = DEV_DETACH. | 527 | specified event code number; only '0', '2', and '3' are |
199 | 528 | accepted (without a newline); '2' and '3' are both | |
200 | #These are not really events sent: | 529 | mapped to PM_EVENT_SUSPEND. |
201 | # | 530 | |
202 | #System fully on -- device is working normally; this is probably never | 531 | On writes, the PM core relies on that recorded event code and the device/bus |
203 | #passed to suspend() method... event = ON, flags = 0 | 532 | capabilities to determine whether it uses a partial suspend() or resume() |
204 | # | 533 | sequence to change things so that the recorded event corresponds to the |
205 | #Ready after resume -- userland is now running, again. Time to free any | 534 | numeric parameter. |
206 | #memory you ate during prepare to suspend... event = ON, flags = | 535 | |
207 | #READY_AFTER_RESUME | 536 | - If the bus requires the irqs-disabled suspend_late()/resume_early() |
208 | # | 537 | phases, writes fail because those operations are not supported here. |
538 | |||
539 | - If the recorded value is the expected value, nothing is done. | ||
540 | |||
541 | - If the recorded value is nonzero, the device is partially resumed, | ||
542 | using the bus.resume() and/or class.resume() methods. | ||
543 | |||
544 | - If the target value is nonzero, the device is partially suspended, | ||
545 | using the class.suspend() and/or bus.suspend() methods and the | ||
546 | PM_EVENT_SUSPEND message. | ||
547 | |||
548 | Drivers have no way to tell whether their suspend() and resume() calls | ||
549 | have come through the sysfs power/state file or as part of entering a | ||
550 | system sleep state, except that when accessed through sysfs the normal | ||
551 | parent/child sequencing rules are ignored. Drivers (such as bus, bridge, | ||
552 | or hub drivers) which expose child devices may need to enforce those rules | ||
553 | on their own. | ||
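The write rules listed above can be modelled with a short userspace sketch. Nothing here is the kernel's implementation: struct state_dev, state_write() and the choice of -EINVAL for rejected writes are assumptions made for illustration, and the counters only record which partial sequence would have run.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical model of one device's "power/state" file. */
struct state_dev {
	int recorded;			/* 0 (on) or 2 (suspended) */
	int needs_irqs_off_phases;	/* bus has suspend_late()/resume_early() */
	int suspend_calls;		/* partial suspend sequences run */
	int resume_calls;		/* partial resume sequences run */
};

/* Model a one-character write to the state file. */
int state_write(struct state_dev *dev, char code)
{
	int target;

	assert(dev != NULL);

	/* Only '0', '2' and '3' are accepted; '2' and '3' both mean suspend. */
	if (code == '0')
		target = 0;
	else if (code == '2' || code == '3')
		target = 2;
	else
		return -EINVAL;

	/* The irqs-disabled phases cannot be run from here, so fail. */
	if (dev->needs_irqs_off_phases)
		return -EINVAL;

	/* Already in the requested state: nothing to do. */
	if (dev->recorded == target)
		return 0;

	/* A suspended device is partially resumed first. */
	if (dev->recorded != 0) {
		dev->resume_calls++;
		dev->recorded = 0;
	}

	/* A nonzero target then triggers a partial suspend. */
	if (target != 0) {
		dev->suspend_calls++;
		dev->recorded = target;
	}
	return 0;
}
```

Because the recorded value is effectively boolean, writing '3' to a device already recorded as "2" does nothing, and a device whose bus needs the irqs-disabled phases rejects every write.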
diff --git a/Documentation/power/interface.txt b/Documentation/power/interface.txt index 4117802af0f8..a66bec222b16 100644 --- a/Documentation/power/interface.txt +++ b/Documentation/power/interface.txt | |||
@@ -52,3 +52,18 @@ suspend image will be as small as possible. | |||
52 | 52 | ||
53 | Reading from this file will display the current image size limit, which | 53 | Reading from this file will display the current image size limit, which |
54 | is set to 500 MB by default. | 54 | is set to 500 MB by default. |
55 | |||
56 | /sys/power/pm_trace controls the code which saves the last PM event point in | ||
57 | the RTC across reboots, so that you can debug a machine that just hangs | ||
58 | during suspend (or more commonly, during resume). Namely, the RTC is only | ||
59 | used to save the last PM event point if this file contains '1'. Initially it | ||
60 | contains '0' which may be changed to '1' by writing a string representing a | ||
61 | nonzero integer into it. | ||
62 | |||
63 | To use this debugging feature you should attempt to suspend the machine, then | ||
64 | reboot it and run | ||
65 | |||
66 | dmesg -s 1000000 | grep 'hash matches' | ||
67 | |||
68 | CAUTION: Using it will cause your machine's real-time (CMOS) clock to be | ||
69 | set to a random invalid time after a resume. | ||
diff --git a/Documentation/powerpc/booting-without-of.txt b/Documentation/powerpc/booting-without-of.txt index 217e51768b87..5c0ba235f5a5 100644 --- a/Documentation/powerpc/booting-without-of.txt +++ b/Documentation/powerpc/booting-without-of.txt | |||
@@ -1136,10 +1136,10 @@ Sense and level information should be encoded as follows: | |||
1136 | Devices connected to openPIC-compatible controllers should encode | 1136 | Devices connected to openPIC-compatible controllers should encode |
1137 | sense and polarity as follows: | 1137 | sense and polarity as follows: |
1138 | 1138 | ||
1139 | 0 = high to low edge sensitive type enabled | 1139 | 0 = low to high edge sensitive type enabled |
1140 | 1 = active low level sensitive type enabled | 1140 | 1 = active low level sensitive type enabled |
1141 | 2 = low to high edge sensitive type enabled | 1141 | 2 = active high level sensitive type enabled |
1142 | 3 = active high level sensitive type enabled | 1142 | 3 = high to low edge sensitive type enabled |
1143 | 1143 | ||
1144 | ISA PIC interrupt controllers should adhere to the ISA PIC | 1144 | ISA PIC interrupt controllers should adhere to the ISA PIC |
1145 | encodings listed below: | 1145 | encodings listed below: |
@@ -1196,7 +1196,7 @@ platforms are moved over to use the flattened-device-tree model. | |||
1196 | - model : Model of the device. Can be "TSEC", "eTSEC", or "FEC" | 1196 | - model : Model of the device. Can be "TSEC", "eTSEC", or "FEC" |
1197 | - compatible : Should be "gianfar" | 1197 | - compatible : Should be "gianfar" |
1198 | - reg : Offset and length of the register set for the device | 1198 | - reg : Offset and length of the register set for the device |
1199 | - address : List of bytes representing the ethernet address of | 1199 | - mac-address : List of bytes representing the ethernet address of |
1200 | this controller | 1200 | this controller |
1201 | - interrupts : <a b> where a is the interrupt number and b is a | 1201 | - interrupts : <a b> where a is the interrupt number and b is a |
1202 | field that represents an encoding of the sense and level | 1202 | field that represents an encoding of the sense and level |
@@ -1216,7 +1216,7 @@ platforms are moved over to use the flattened-device-tree model. | |||
1216 | model = "TSEC"; | 1216 | model = "TSEC"; |
1217 | compatible = "gianfar"; | 1217 | compatible = "gianfar"; |
1218 | reg = <24000 1000>; | 1218 | reg = <24000 1000>; |
1219 | address = [ 00 E0 0C 00 73 00 ]; | 1219 | mac-address = [ 00 E0 0C 00 73 00 ]; |
1220 | interrupts = <d 3 e 3 12 3>; | 1220 | interrupts = <d 3 e 3 12 3>; |
1221 | interrupt-parent = <40000>; | 1221 | interrupt-parent = <40000>; |
1222 | phy-handle = <2452000> | 1222 | phy-handle = <2452000> |
@@ -1436,9 +1436,9 @@ platforms are moved over to use the flattened-device-tree model. | |||
1436 | interrupts = <1d 3>; | 1436 | interrupts = <1d 3>; |
1437 | interrupt-parent = <40000>; | 1437 | interrupt-parent = <40000>; |
1438 | num-channels = <4>; | 1438 | num-channels = <4>; |
1439 | channel-fifo-len = <24>; | 1439 | channel-fifo-len = <18>; |
1440 | exec-units-mask = <000000fe>; | 1440 | exec-units-mask = <000000fe>; |
1441 | descriptor-types-mask = <073f1127>; | 1441 | descriptor-types-mask = <012b0ebf>; |
1442 | }; | 1442 | }; |
1443 | 1443 | ||
1444 | 1444 | ||
@@ -1498,7 +1498,7 @@ not necessary as they are usually the same as the root node. | |||
1498 | model = "TSEC"; | 1498 | model = "TSEC"; |
1499 | compatible = "gianfar"; | 1499 | compatible = "gianfar"; |
1500 | reg = <24000 1000>; | 1500 | reg = <24000 1000>; |
1501 | address = [ 00 E0 0C 00 73 00 ]; | 1501 | mac-address = [ 00 E0 0C 00 73 00 ]; |
1502 | interrupts = <d 3 e 3 12 3>; | 1502 | interrupts = <d 3 e 3 12 3>; |
1503 | interrupt-parent = <40000>; | 1503 | interrupt-parent = <40000>; |
1504 | phy-handle = <2452000>; | 1504 | phy-handle = <2452000>; |
@@ -1511,7 +1511,7 @@ not necessary as they are usually the same as the root node. | |||
1511 | model = "TSEC"; | 1511 | model = "TSEC"; |
1512 | compatible = "gianfar"; | 1512 | compatible = "gianfar"; |
1513 | reg = <25000 1000>; | 1513 | reg = <25000 1000>; |
1514 | address = [ 00 E0 0C 00 73 01 ]; | 1514 | mac-address = [ 00 E0 0C 00 73 01 ]; |
1515 | interrupts = <13 3 14 3 18 3>; | 1515 | interrupts = <13 3 14 3 18 3>; |
1516 | interrupt-parent = <40000>; | 1516 | interrupt-parent = <40000>; |
1517 | phy-handle = <2452001>; | 1517 | phy-handle = <2452001>; |
@@ -1524,7 +1524,7 @@ not necessary as they are usually the same as the root node. | |||
1524 | model = "FEC"; | 1524 | model = "FEC"; |
1525 | compatible = "gianfar"; | 1525 | compatible = "gianfar"; |
1526 | reg = <26000 1000>; | 1526 | reg = <26000 1000>; |
1527 | address = [ 00 E0 0C 00 73 02 ]; | 1527 | mac-address = [ 00 E0 0C 00 73 02 ]; |
1528 | interrupts = <19 3>; | 1528 | interrupts = <19 3>; |
1529 | interrupt-parent = <40000>; | 1529 | interrupt-parent = <40000>; |
1530 | phy-handle = <2452002>; | 1530 | phy-handle = <2452002>; |
diff --git a/Documentation/ramdisk.txt b/Documentation/ramdisk.txt index 7c25584e082c..52f75b7d51c2 100644 --- a/Documentation/ramdisk.txt +++ b/Documentation/ramdisk.txt | |||
@@ -6,7 +6,7 @@ Contents: | |||
6 | 1) Overview | 6 | 1) Overview |
7 | 2) Kernel Command Line Parameters | 7 | 2) Kernel Command Line Parameters |
8 | 3) Using "rdev -r" | 8 | 3) Using "rdev -r" |
9 | 4) An Example of Creating a Compressed RAM Disk | 9 | 4) An Example of Creating a Compressed RAM Disk |
10 | 10 | ||
11 | 11 | ||
12 | 1) Overview | 12 | 1) Overview |
@@ -34,7 +34,7 @@ make it clearer. The original "ramdisk=<ram_size>" has been kept around for | |||
34 | compatibility reasons, but it may be removed in the future. | 34 | compatibility reasons, but it may be removed in the future. |
35 | 35 | ||
36 | The new RAM disk also has the ability to load compressed RAM disk images, | 36 | The new RAM disk also has the ability to load compressed RAM disk images, |
37 | allowing one to squeeze more programs onto an average installation or | 37 | allowing one to squeeze more programs onto an average installation or |
38 | rescue floppy disk. | 38 | rescue floppy disk. |
39 | 39 | ||
40 | 40 | ||
@@ -51,7 +51,7 @@ default is 4096 (4 MB) (8192 (8 MB) on S390). | |||
51 | =================== | 51 | =================== |
52 | 52 | ||
53 | This parameter tells the RAM disk driver how many bytes to use per block. The | 53 | This parameter tells the RAM disk driver how many bytes to use per block. The |
54 | default is 512. | 54 | default is 1024 (BLOCK_SIZE). |
55 | 55 | ||
56 | 56 | ||
57 | 3) Using "rdev -r" | 57 | 3) Using "rdev -r" |
@@ -70,7 +70,7 @@ These numbers are no magical secrets, as seen below: | |||
70 | ./arch/i386/kernel/setup.c:#define RAMDISK_PROMPT_FLAG 0x8000 | 70 | ./arch/i386/kernel/setup.c:#define RAMDISK_PROMPT_FLAG 0x8000 |
71 | ./arch/i386/kernel/setup.c:#define RAMDISK_LOAD_FLAG 0x4000 | 71 | ./arch/i386/kernel/setup.c:#define RAMDISK_LOAD_FLAG 0x4000 |
72 | 72 | ||
73 | Consider a typical two floppy disk setup, where you will have the | 73 | Consider a typical two floppy disk setup, where you will have the |
74 | kernel on disk one, and have already put a RAM disk image onto disk #2. | 74 | kernel on disk one, and have already put a RAM disk image onto disk #2. |
75 | 75 | ||
76 | Hence you want to set bits 0 to 13 as 0, meaning that your RAM disk | 76 | Hence you want to set bits 0 to 13 as 0, meaning that your RAM disk |
@@ -97,12 +97,12 @@ Since the default start = 0 and the default prompt = 1, you could use: | |||
97 | append = "load_ramdisk=1" | 97 | append = "load_ramdisk=1" |
98 | 98 | ||
99 | 99 | ||
100 | 4) An Example of Creating a Compressed RAM Disk | 100 | 4) An Example of Creating a Compressed RAM Disk |
101 | ---------------------------------------------- | 101 | ---------------------------------------------- |
102 | 102 | ||
103 | To create a RAM disk image, you will need a spare block device to | 103 | To create a RAM disk image, you will need a spare block device to |
104 | construct it on. This can be the RAM disk device itself, or an | 104 | construct it on. This can be the RAM disk device itself, or an |
105 | unused disk partition (such as an unmounted swap partition). For this | 105 | unused disk partition (such as an unmounted swap partition). For this |
106 | example, we will use the RAM disk device, "/dev/ram0". | 106 | example, we will use the RAM disk device, "/dev/ram0". |
107 | 107 | ||
108 | Note: This technique should not be done on a machine with less than 8 MB | 108 | Note: This technique should not be done on a machine with less than 8 MB |
diff --git a/Documentation/robust-futexes.txt b/Documentation/robust-futexes.txt index df82d75245a0..76e8064b8c3a 100644 --- a/Documentation/robust-futexes.txt +++ b/Documentation/robust-futexes.txt | |||
@@ -95,7 +95,7 @@ comparison. If the thread has registered a list, then normally the list | |||
95 | is empty. If the thread/process crashed or terminated in some incorrect | 95 | is empty. If the thread/process crashed or terminated in some incorrect |
96 | way then the list might be non-empty: in this case the kernel carefully | 96 | way then the list might be non-empty: in this case the kernel carefully |
97 | walks the list [not trusting it], and marks all locks that are owned by | 97 | walks the list [not trusting it], and marks all locks that are owned by |
98 | this thread with the FUTEX_OWNER_DEAD bit, and wakes up one waiter (if | 98 | this thread with the FUTEX_OWNER_DIED bit, and wakes up one waiter (if |
99 | any). | 99 | any). |
100 | 100 | ||
101 | The list is guaranteed to be private and per-thread at do_exit() time, | 101 | The list is guaranteed to be private and per-thread at do_exit() time, |
diff --git a/Documentation/rt-mutex-design.txt b/Documentation/rt-mutex-design.txt new file mode 100644 index 000000000000..c472ffacc2f6 --- /dev/null +++ b/Documentation/rt-mutex-design.txt | |||
@@ -0,0 +1,781 @@ | |||
1 | # | ||
2 | # Copyright (c) 2006 Steven Rostedt | ||
3 | # Licensed under the GNU Free Documentation License, Version 1.2 | ||
4 | # | ||
5 | |||
6 | RT-mutex implementation design | ||
7 | ------------------------------ | ||
8 | |||
9 | This document tries to describe the design of the rtmutex.c implementation. | ||
10 | It doesn't describe the reasons why rtmutex.c exists. For that please see | ||
11 | Documentation/rt-mutex.txt. This document does explain problems that | ||
12 | happen without this code, but only as background needed to understand | ||
13 | what the code is actually doing. | ||
14 | |||
15 | The goal of this document is to help others understand the priority | ||
16 | inheritance (PI) algorithm that is used, as well as reasons for the | ||
17 | decisions that were made to implement PI in the manner that was done. | ||
18 | |||
19 | |||
20 | Unbounded Priority Inversion | ||
21 | ---------------------------- | ||
22 | |||
23 | Priority inversion is when a lower priority process executes while a higher | ||
24 | priority process wants to run. This happens for several reasons, and | ||
25 | most of the time it can't be helped. Anytime a high priority process wants | ||
26 | to use a resource that a lower priority process has (a mutex for example), | ||
27 | the high priority process must wait until the lower priority process is done | ||
28 | with the resource. This is a priority inversion. What we want to prevent | ||
29 | is something called unbounded priority inversion. That is when the high | ||
30 | priority process is prevented from running by a lower priority process for | ||
31 | an undetermined amount of time. | ||
32 | |||
33 | The classic example of unbounded priority inversion is where you have three | ||
34 | processes, let's call them processes A, B, and C, where A is the highest | ||
35 | priority process, C is the lowest, and B is in between. A tries to grab a lock | ||
36 | that C owns and must wait and lets C run to release the lock. But in the | ||
37 | meantime, B executes, and since B is of a higher priority than C, it preempts C, | ||
38 | but by doing so, it is in fact preempting A which is a higher priority process. | ||
39 | Now there's no way of knowing how long A will be sleeping waiting for C | ||
40 | to release the lock, because for all we know, B is a CPU hog and will | ||
41 | never give C a chance to release the lock. This is called unbounded priority | ||
42 | inversion. | ||
43 | |||
44 | Here's a little ASCII art to show the problem. | ||
45 | |||
46 | grab lock L1 (owned by C) | ||
47 | | | ||
48 | A ---+ | ||
49 | C preempted by B | ||
50 | | | ||
51 | C +----+ | ||
52 | |||
53 | B +--------> | ||
54 | B now keeps A from running. | ||
55 | |||
56 | |||
57 | Priority Inheritance (PI) | ||
58 | ------------------------- | ||
59 | |||
60 | There are several ways to solve this issue, but other ways are out of scope | ||
61 | for this document. Here we only discuss PI. | ||
62 | |||
63 | PI is where a process inherits the priority of another process if the other | ||
64 | process blocks on a lock owned by the current process. To make this easier | ||
65 | to understand, let's use the previous example, with processes A, B, and C again. | ||
66 | |||
67 | This time, when A blocks on the lock owned by C, C would inherit the priority | ||
68 | of A. So now if B becomes runnable, it would not preempt C, since C now has | ||
69 | the high priority of A. As soon as C releases the lock, it loses its | ||
70 | inherited priority, and A then can continue with the resource that C had. | ||
71 | |||
72 | Terminology | ||
73 | ----------- | ||
74 | |||
75 | Here I explain some terminology that is used in this document to help describe | ||
76 | the design that is used to implement PI. | ||
77 | |||
78 | PI chain - The PI chain is an ordered series of locks and processes that cause | ||
79 | processes to inherit priorities from a previous process that is | ||
80 | blocked on one of its locks. This is described in more detail | ||
81 | later in this document. | ||
82 | |||
83 | mutex - In this document, to differentiate between locks that implement | ||
84 | PI and spin locks that are used in the PI code, from now on | ||
85 | the PI locks will be called mutexes. | ||
86 | |||
87 | lock - In this document from now on, I will use the term lock when | ||
88 | referring to spin locks that are used to protect parts of the PI | ||
89 | algorithm. These locks disable preemption for UP (when | ||
90 | CONFIG_PREEMPT is enabled) and on SMP prevent multiple CPUs from | ||
91 | entering critical sections simultaneously. | ||
92 | |||
93 | spin lock - Same as lock above. | ||
94 | |||
95 | waiter - A waiter is a struct that is stored on the stack of a blocked | ||
96 | process. Since the scope of the waiter is within the code for | ||
97 | a process being blocked on the mutex, it is fine to allocate | ||
98 | the waiter on the process's stack (local variable). This | ||
99 | structure holds a pointer to the task, as well as the mutex that | ||
100 | the task is blocked on. It also has the plist node structures to | ||
101 | place the task in the waiter_list of a mutex as well as the | ||
102 | pi_list of a mutex owner task (described below). | ||
103 | |||
104 | waiter is sometimes used in reference to the task that is waiting | ||
105 | on a mutex. This is the same as waiter->task. | ||
106 | |||
107 | waiters - A list of processes that are blocked on a mutex. | ||
108 | |||
109 | top waiter - The highest priority process waiting on a specific mutex. | ||
110 | |||
111 | top pi waiter - The highest priority process waiting on one of the mutexes | ||
112 | that a specific process owns. | ||
113 | |||
114 | Note: task and process are used interchangeably in this document, mostly to | ||
115 | differentiate between two processes that are being described together. | ||
116 | |||
117 | |||
118 | PI chain | ||
119 | -------- | ||
120 | |||
121 | The PI chain is a list of processes and mutexes that may cause priority | ||
122 | inheritance to take place. Multiple chains may converge, but a chain | ||
123 | would never diverge, since a process can't be blocked on more than one | ||
124 | mutex at a time. | ||
125 | |||
126 | Example: | ||
127 | |||
128 | Process: A, B, C, D, E | ||
129 | Mutexes: L1, L2, L3, L4 | ||
130 | |||
131 | A owns: L1 | ||
132 | B blocked on L1 | ||
133 | B owns L2 | ||
134 | C blocked on L2 | ||
135 | C owns L3 | ||
136 | D blocked on L3 | ||
137 | D owns L4 | ||
138 | E blocked on L4 | ||
139 | |||
140 | The chain would be: | ||
141 | |||
142 | E->L4->D->L3->C->L2->B->L1->A | ||
143 | |||
144 | To show where two chains merge, we could add another process F and | ||
145 | another mutex L5 where B owns L5 and F is blocked on mutex L5. | ||
146 | |||
147 | The chain for F would be: | ||
148 | |||
149 | F->L5->B->L1->A | ||
150 | |||
151 | Since a process may own more than one mutex, but never be blocked on more than | ||
152 | one, the chains merge. | ||
153 | |||
154 | Here we show both chains: | ||
155 | |||
156 | E->L4->D->L3->C->L2-+ | ||
157 | | | ||
158 | +->B->L1->A | ||
159 | | | ||
160 | F->L5-+ | ||
161 | |||
162 | For PI to work, the processes at the right end of these chains (or we may | ||
163 | also call it the Top of the chain) must be equal to or higher in priority | ||
164 | than the processes to the left or below in the chain. | ||
165 | |||
166 | Also since a mutex may have more than one process blocked on it, we can | ||
167 | have multiple chains merge at mutexes. If we add another process G that is | ||
168 | blocked on mutex L2: | ||
169 | |||
170 | G->L2->B->L1->A | ||
171 | |||
172 | And once again, to show how this can grow I will show the merging chains | ||
173 | again. | ||
174 | |||
175 | E->L4->D->L3->C-+ | ||
176 | +->L2-+ | ||
177 | | | | ||
178 | G-+ +->B->L1->A | ||
179 | | | ||
180 | F->L5-+ | ||
181 | |||
182 | |||
183 | Plist | ||
184 | ----- | ||
185 | |||
186 | Before I go further and talk about how the PI chain is stored through lists | ||
187 | on both mutexes and processes, I'll explain the plist. This is similar to | ||
188 | the struct list_head functionality that is already in the kernel. | ||
189 | The implementation of plist is out of scope for this document, but it is | ||
190 | very important to understand what it does. | ||
191 | |||
192 | There are a few differences between plist and list, the most important one | ||
193 | being that plist is a priority sorted linked list. This means that the | ||
194 | priorities of the plist are sorted, such that it takes O(1) to retrieve the | ||
195 | highest priority item in the list. Obviously this is useful to store processes | ||
196 | based on their priorities. | ||
197 | |||
198 | Another difference, which is important for implementation, is that, unlike | ||
199 | list, the head of the list is a different element than the nodes of a list. | ||
200 | So the head of the list is declared as struct plist_head and nodes that will | ||
201 | be added to the list are declared as struct plist_node. | ||
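As an illustration of the two plist properties described above (priority-sorted order with O(1) access to the top entry, and a head type distinct from the node type), here is a minimal sketch in C. It is not the kernel's plist implementation: the names demo_plist_head, demo_plist_node, demo_plist_init, demo_plist_add and demo_plist_first are hypothetical, and a plain singly linked list is assumed for brevity.

```c
#include <stddef.h>

/* Hypothetical node type. A lower prio value means a higher priority,
 * matching the kernel's inverted "prio" convention. */
struct demo_plist_node {
	int prio;
	struct demo_plist_node *next;
};

/* Hypothetical head type: deliberately a different struct from the node. */
struct demo_plist_head {
	struct demo_plist_node *first;
};

static void demo_plist_init(struct demo_plist_head *head)
{
	head->first = NULL;
}

/* Insert in ascending prio order; entries of equal priority stay FIFO.
 * Insertion is O(n) in this simplified sketch. */
static void demo_plist_add(struct demo_plist_node *node,
			   struct demo_plist_head *head)
{
	struct demo_plist_node **pos = &head->first;

	while (*pos && (*pos)->prio <= node->prio)
		pos = &(*pos)->next;
	node->next = *pos;
	*pos = node;
}

/* O(1): the highest priority entry is always at the front. */
static struct demo_plist_node *demo_plist_first(const struct demo_plist_head *head)
{
	return head->first;
}
```

The sorting cost is paid on insertion, so that looking up the top waiter, which the PI code does often, stays cheap.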
202 | |||
203 | |||
204 | Mutex Waiter List | ||
205 | ----------------- | ||
206 | |||
207 | Every mutex keeps track of all the waiters that are blocked on itself. The mutex | ||
208 | has a plist to store these waiters by priority. This list is protected by | ||
209 | a spin lock that is located in the struct of the mutex. This lock is called | ||
210 | wait_lock. Since the modification of the waiter list is never done in | ||
211 | interrupt context, the wait_lock can be taken without disabling interrupts. | ||
212 | |||
213 | |||
214 | Task PI List | ||
215 | ------------ | ||
216 | |||
217 | To keep track of the PI chains, each process has its own PI list. This is | ||
218 | a list of all top waiters of the mutexes that are owned by the process. | ||
219 | Note that this list only holds the top waiters and not all waiters that are | ||
220 | blocked on mutexes owned by the process. | ||
221 | |||
222 | The top of the task's PI list is always the highest priority task that | ||
223 | is waiting on a mutex that is owned by the task. So if the task has | ||
224 | inherited a priority, it will always be the priority of the task that is | ||
225 | at the top of this list. | ||
226 | |||
227 | This list is stored in the task structure of a process as a plist called | ||
228 | pi_list. This list is protected by a spin lock also in the task structure, | ||
229 | called pi_lock. This lock may also be taken in interrupt context, so when | ||
230 | locking the pi_lock, interrupts must be disabled. | ||
231 | |||
232 | |||
233 | Depth of the PI Chain | ||
234 | --------------------- | ||
235 | |||
236 | The maximum depth of the PI chain is not dynamic, and could actually be | ||
237 | defined. But it is very complex to figure out, since it depends on all | ||
238 | the nesting of mutexes. Let's look at the example where we have 3 mutexes, | ||
239 | L1, L2, and L3, and four separate functions func1, func2, func3 and func4. | ||
240 | The following shows a locking order of L1->L2->L3, but may not actually | ||
241 | be directly nested that way. | ||
242 | |||
243 | void func1(void) | ||
244 | { | ||
245 | mutex_lock(L1); | ||
246 | |||
247 | /* do anything */ | ||
248 | |||
249 | mutex_unlock(L1); | ||
250 | } | ||
251 | |||
252 | void func2(void) | ||
253 | { | ||
254 | mutex_lock(L1); | ||
255 | mutex_lock(L2); | ||
256 | |||
257 | /* do something */ | ||
258 | |||
259 | mutex_unlock(L2); | ||
260 | mutex_unlock(L1); | ||
261 | } | ||
262 | |||
263 | void func3(void) | ||
264 | { | ||
265 | mutex_lock(L2); | ||
266 | mutex_lock(L3); | ||
267 | |||
268 | /* do something else */ | ||
269 | |||
270 | mutex_unlock(L3); | ||
271 | mutex_unlock(L2); | ||
272 | } | ||
273 | |||
274 | void func4(void) | ||
275 | { | ||
276 | mutex_lock(L3); | ||
277 | |||
278 | /* do something again */ | ||
279 | |||
280 | mutex_unlock(L3); | ||
281 | } | ||
282 | |||
283 | Now we add 4 processes that run each of these functions separately. | ||
284 | Processes A, B, C, and D which run functions func1, func2, func3 and func4 | ||
285 | respectively, and such that D runs first and A last. With D being preempted | ||
286 | in func4 in the "do something again" area, we have a locking that follows: | ||
287 | |||
288 | D owns L3 | ||
289 | C blocked on L3 | ||
290 | C owns L2 | ||
291 | B blocked on L2 | ||
292 | B owns L1 | ||
293 | A blocked on L1 | ||
294 | |||
295 | And thus we have the chain A->L1->B->L2->C->L3->D. | ||
296 | |||
297 | This gives us a PI depth of 4 (four processes), but looking at any of the | ||
298 | functions individually, it seems as though they only have at most a locking | ||
299 | depth of two. So, although the locking depth is defined at compile time, | ||
300 | it still is very difficult to find the possibilities of that depth. | ||
301 | |||
302 | Now since mutexes can be defined by user-land applications, we don't want a DoS | ||
303 | type of application that nests a large number of mutexes to create a large | ||
304 | PI chain, and have the code holding spin locks while looking at a large | ||
305 | amount of data. So to prevent this, the implementation not only implements | ||
306 | a maximum lock depth, but also only holds at most two different locks at a | ||
307 | time, as it walks the PI chain. More about this below. | ||
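The chain from the example above, together with the idea of a maximum lock depth, can be sketched as follows. This is a simplified model and not the rtmutex.c code: the types demo_task and demo_mutex, the function demo_chain_depth and its max_depth parameter are all hypothetical, and no locking is modelled.

```c
#include <stddef.h>

struct demo_mutex;

/* Hypothetical task: blocked on at most one mutex at a time. */
struct demo_task {
	const char *name;
	struct demo_mutex *blocked_on;	/* NULL if not blocked */
};

/* Hypothetical mutex: owned by at most one task. */
struct demo_mutex {
	struct demo_task *owner;	/* NULL if not owned */
};

/*
 * Count the processes on the PI chain that starts at @task.
 * Returns -1 if following the chain would exceed @max_depth processes,
 * which is how this sketch models the cap on chain length.
 */
static int demo_chain_depth(const struct demo_task *task, int max_depth)
{
	int depth = 1;

	while (task->blocked_on && task->blocked_on->owner) {
		if (depth >= max_depth)
			return -1;
		task = task->blocked_on->owner;
		depth++;
	}
	return depth;
}
```

With A blocked on L1, B owning L1 and blocked on L2, C owning L2 and blocked on L3, and D owning L3, walking from A visits four processes, matching the PI depth of 4 computed above.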
308 | |||
309 | |||
310 | Mutex owner and flags | ||
311 | --------------------- | ||
312 | |||
313 | The mutex structure contains a pointer to the owner of the mutex. If the | ||
314 | mutex is not owned, this owner is set to NULL. Since all architectures | ||
315 | have the task structure on at least a four byte alignment (and if this is | ||
316 | not true, the rtmutex.c code will be broken!), this allows for the two | ||
317 | least significant bits to be used as flags. This part is also described | ||
318 | in Documentation/rt-mutex.txt, but will also be briefly described here. | ||
319 | |||
320 | Bit 0 is used as the "Pending Owner" flag. This is described later. | ||
321 | Bit 1 is used as the "Has Waiters" flag. This is also described later | ||
322 | in more detail, but is set whenever there are waiters on a mutex. | ||
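The way two flag bits can share a word with an aligned pointer can be sketched like this. The macro and function names below (DEMO_PENDING_OWNER, DEMO_HAS_WAITERS, demo_pack_owner and so on) are hypothetical and are not the identifiers used in rtmutex.c; the sketch only assumes, as the text does, that task structures are at least four-byte aligned.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_PENDING_OWNER	((uintptr_t)1)	/* bit 0 */
#define DEMO_HAS_WAITERS	((uintptr_t)2)	/* bit 1 */
#define DEMO_FLAG_MASK		((uintptr_t)3)

/* Hypothetical task structure; the int member gives it the alignment
 * this sketch relies on, which is checked at run time below. */
struct demo_task {
	int prio;
};

/* Combine an owner pointer and flag bits into one word. */
static uintptr_t demo_pack_owner(struct demo_task *task, uintptr_t flags)
{
	/* The two low bits must be free, i.e. four-byte alignment. */
	assert(((uintptr_t)task & DEMO_FLAG_MASK) == 0);
	return (uintptr_t)task | (flags & DEMO_FLAG_MASK);
}

/* Strip the flag bits to recover the owner pointer (NULL if unowned). */
static struct demo_task *demo_owner_task(uintptr_t owner)
{
	return (struct demo_task *)(owner & ~DEMO_FLAG_MASK);
}

static int demo_has_waiters(uintptr_t owner)
{
	return (owner & DEMO_HAS_WAITERS) != 0;
}

static int demo_pending_owner(uintptr_t owner)
{
	return (owner & DEMO_PENDING_OWNER) != 0;
}
```

Keeping the flags in the same word as the owner means that a single atomic operation on that word observes both the owner and the flags together.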
323 | |||
324 | |||
325 | cmpxchg Tricks | ||
326 | -------------- | ||
327 | |||
328 | Some architectures implement an atomic cmpxchg (Compare and Exchange). This | ||
329 | is used (when applicable) to keep the fast path of grabbing and releasing | ||
330 | mutexes short. | ||
331 | |||
332 | cmpxchg is basically the following function performed atomically: | ||
333 | |||
334 | unsigned long _cmpxchg(unsigned long *A, unsigned long *B, unsigned long *C) | ||
335 | { | ||
336 | unsigned long T = *A; | ||
337 | if (*A == *B) { | ||
338 | *A = *C; | ||
339 | } | ||
340 | return T; | ||
341 | } | ||
342 | #define cmpxchg(a,b,c) _cmpxchg(&a,&b,&c) | ||
343 | |||
344 | This is really nice to have, since it allows you to only update a variable | ||
345 | if the variable is what you expect it to be. You know if it succeeded if | ||
346 | the return value (the old value of A) is equal to B. | ||
347 | |||
348 | The macro rt_mutex_cmpxchg is used to try to lock and unlock mutexes. If | ||
349 | the architecture does not support CMPXCHG, then this macro is simply set | ||
350 | to fail every time. But if CMPXCHG is supported, then this will | ||
351 | help out extremely to keep the fast path short. | ||
352 | |||
353 | The use of rt_mutex_cmpxchg with the flags in the owner field helps optimize | ||
354 | the system for architectures that support it. This will also be explained | ||
355 | later in this document. | ||
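To show how a compare-and-exchange on the owner word can serve as a fast path, here is a user-space sketch using C11 atomics. It is not the kernel's rt_mutex_cmpxchg macro: demo_mutex, demo_try_lock_fast and demo_try_unlock_fast are hypothetical names, and the owner is modelled as a plain integer word whose low bits may carry flags.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical mutex: owner word is 0 when unowned; low bits may hold flags. */
struct demo_mutex {
	_Atomic uintptr_t owner;
};

/* Fast-path lock: succeeds only if the owner word is exactly 0
 * (no owner and no flag bits set). */
static bool demo_try_lock_fast(struct demo_mutex *m, uintptr_t me)
{
	uintptr_t expected = 0;

	return atomic_compare_exchange_strong(&m->owner, &expected, me);
}

/* Fast-path unlock: succeeds only if the owner word is exactly @me.
 * If any flag bit has been set in the word, the comparison fails and
 * the caller would have to fall back to a slow path (not shown). */
static bool demo_try_unlock_fast(struct demo_mutex *m, uintptr_t me)
{
	uintptr_t expected = me;

	return atomic_compare_exchange_strong(&m->owner, &expected, 0);
}
```

In this sketch a set flag bit is enough to make the unlock comparison fail, which is one way that flags stored in the owner field can steer contended cases away from the fast path.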
356 | |||
357 | |||
358 | Priority adjustments | ||
359 | -------------------- | ||
360 | |||
361 | The implementation of the PI code in rtmutex.c has several places where a | ||
362 | process must adjust its priority. With the help of the pi_list of a | ||
363 | process, it is rather easy to know what needs to be adjusted. | ||
364 | |||
365 | The functions implementing the task adjustments are rt_mutex_adjust_prio, | ||
366 | __rt_mutex_adjust_prio (same as the former, but expects the task pi_lock | ||
367 | to already be taken), rt_mutex_getprio, and rt_mutex_setprio. | ||
368 | |||
369 | rt_mutex_getprio and rt_mutex_setprio are only used in __rt_mutex_adjust_prio. | ||
370 | |||
371 | rt_mutex_getprio returns the priority that the task should have. Either the | ||
372 | task's own normal priority, or if a process of a higher priority is waiting on | ||
373 | a mutex owned by the task, then that higher priority should be returned. | ||
374 | Since the pi_list of a task holds a priority-ordered list of all the top | ||
375 | waiters of all the mutexes that the task owns, rt_mutex_getprio simply needs | ||
376 | to compare the top pi waiter to its own normal priority, and return the higher | ||
377 | priority back. | ||
378 | |||
379 | (Note: if looking at the code, you will notice that the lower number of | ||
380 | prio is returned. This is because the prio field in the task structure | ||
381 | is an inverse order of the actual priority. So a "prio" of 5 is | ||
382 | of higher priority than a "prio" of 10.) | ||
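The comparison described for rt_mutex_getprio can be sketched as below. This is a simplified model and not the kernel function: the demo_task structure, its fields and demo_getprio are hypothetical, and the top pi waiter is represented by a single stored priority instead of a plist lookup.

```c
/* Hypothetical task model. Lower prio numbers mean higher priority. */
struct demo_task {
	int normal_prio;	/* the task's own priority */
	int has_pi_waiters;	/* nonzero if the task's pi_list is non-empty */
	int top_pi_waiter_prio;	/* prio of the top pi waiter, if any */
};

/* Return the priority the task should run at: its own priority, or the
 * top pi waiter's priority if that is higher (numerically lower). */
static int demo_getprio(const struct demo_task *task)
{
	if (!task->has_pi_waiters)
		return task->normal_prio;
	if (task->top_pi_waiter_prio < task->normal_prio)
		return task->top_pi_waiter_prio;
	return task->normal_prio;
}
```

Because the result depends only on the task's normal priority and the current top pi waiter, the same computation covers both boosting (a higher priority waiter arrives) and unboosting (that waiter leaves).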
383 | |||
384 | __rt_mutex_adjust_prio examines the result of rt_mutex_getprio, and if the | ||
385 | result does not equal the task's current priority, then rt_mutex_setprio | ||
386 | is called to adjust the priority of the task to the new priority. | ||
387 | Note that rt_mutex_setprio is defined in kernel/sched.c to implement the | ||
388 | actual change in priority. | ||
389 | |||
390 | It is interesting to note that __rt_mutex_adjust_prio can either increase | ||
391 | or decrease the priority of the task. In the case that a higher priority | ||
392 | process has just blocked on a mutex owned by the task, __rt_mutex_adjust_prio | ||
393 | would increase/boost the task's priority. But if a higher priority task | ||
394 | were for some reason to leave the mutex (timeout or signal), this same function | ||
395 | would decrease/unboost the priority of the task. That is because the pi_list | ||
396 | always contains the highest priority task that is waiting on a mutex owned | ||
397 | by the task, so we only need to compare the priority of that top pi waiter | ||
398 | to the normal priority of the given task. | ||
399 | |||
400 | |||
401 | High level overview of the PI chain walk | ||
402 | ---------------------------------------- | ||
403 | |||
404 | The PI chain walk is implemented by the function rt_mutex_adjust_prio_chain. | ||
405 | |||
406 | The implementation has gone through several iterations, and has ended up | ||
407 | with what we believe is the best. It walks the PI chain by only grabbing | ||
408 | at most two locks at a time, and is very efficient. | ||
409 | |||
410 | The rt_mutex_adjust_prio_chain can be used either to boost or lower process | ||
411 | priorities. | ||
412 | |||
413 | rt_mutex_adjust_prio_chain is called with a task to be checked for PI | ||
414 | (de)boosting (the owner of a mutex that a process is blocking on), a flag to | ||
415 | check for deadlocking, the mutex that the task owns, and a pointer to a waiter | ||
416 | that is the process's waiter struct that is blocked on the mutex (although this | ||
417 | parameter may be NULL for deboosting). | ||
418 | |||
419 | For this explanation, I will not mention deadlock detection. This explanation | ||
420 | will try to stay at a high level. | ||
421 | |||
422 | When this function is called, there are no locks held. That also means | ||
423 | that the state of the owner and lock can change once this function is entered. | ||
424 | |||
425 | Before this function is called, the task has already had rt_mutex_adjust_prio | ||
426 | performed on it. This means that the task is set to the priority that it | ||
427 | should be at, but the plist nodes of the task's waiter have not been updated | ||
428 | with the new priorities, and that this task may not be in the proper locations | ||
429 | in the pi_lists and wait_lists that the task is blocked on. This function | ||
430 | solves all that. | ||
431 | |||
432 | A loop is entered, where task is the owner to be checked for PI changes that | ||
433 | was passed by parameter (for the first iteration). The pi_lock of this task is | ||
434 | taken to prevent any more changes to the pi_list of the task. This also | ||
435 | prevents new tasks from completing the blocking on a mutex that is owned by this | ||
436 | task. | ||
437 | |||
438 | If the task is not blocked on a mutex then the loop is exited. We are at | ||
439 | the top of the PI chain. | ||
440 | |||
441 | A check is now done to see if the original waiter (the process that is blocked | ||
442 | on the current mutex) is the top pi waiter of the task. That is, is this | ||
443 | waiter on the top of the task's pi_list. If it is not, it either means that | ||
444 | there is another process higher in priority that is blocked on one of the | ||
445 | mutexes that the task owns, or that the waiter has just woken up via a signal | ||
446 | or timeout and has left the PI chain. In either case, the loop is exited, since | ||
447 | we don't need to do any more changes to the priority of the current task, or any | ||
448 | task that owns a mutex that this current task is waiting on. A priority chain | ||
449 | walk is only needed when a new top pi waiter is made to a task. | ||
450 | |||
451 | The next check sees if the task's waiter plist node has the priority equal to | ||
452 | the priority the task is set at. If they are equal, then we are done with | ||
453 | the loop. Remember that the function started with the priority of the | ||
454 | task adjusted, but the plist nodes that hold the task in other processes' | ||
455 | pi_lists have not been adjusted. | ||
456 | |||
457 | Next, we look at the mutex that the task is blocked on. The mutex's wait_lock | ||
458 | is taken. This is done by a spin_trylock, because the locking order of the | ||
459 | pi_lock and wait_lock goes in the opposite direction. If we fail to grab the | ||
460 | lock, the pi_lock is released, and we restart the loop. | ||
461 | |||
462 | Now that we have both the pi_lock of the task as well as the wait_lock of | ||
463 | the mutex the task is blocked on, we update the task's waiter's plist node | ||
464 | that is located on the mutex's wait_list. | ||
465 | |||
466 | Now we release the pi_lock of the task. | ||
467 | |||
468 | Next the owner of the mutex has its pi_lock taken, so we can update the | ||
469 | task's entry in the owner's pi_list. If the task is the highest priority | ||
470 | process on the mutex's wait_list, then we remove the previous top waiter | ||
471 | from the owner's pi_list, and replace it with the task. | ||
472 | |||
473 | Note: It is possible that the task was the current top waiter on the mutex, | ||
474 | in which case the task is not yet on the pi_list of the waiter. This | ||
475 | is OK, since plist_del does nothing if the plist node is not on any | ||
476 | list. | ||
477 | |||
478 | If the task was not the top waiter of the mutex, but it was before we | ||
479 | did the priority updates, that means we are deboosting/lowering the | ||
480 | task. In this case, the task is removed from the pi_list of the owner, | ||
481 | and the new top waiter is added. | ||
482 | |||
483 | Lastly, we unlock both the pi_lock of the task, as well as the mutex's | ||
484 | wait_lock, and continue the loop again. On the next iteration of the | ||
485 | loop, the previous owner of the mutex will be the task that will be | ||
486 | processed. | ||
487 | |||
488 | Note: One might think that the owner of this mutex might have changed | ||
489 | since we just grab the mutex's wait_lock. And one could be right. | ||
490 | The important thing to remember is that the owner could not have | ||
491 | become the task that is being processed in the PI chain, since | ||
492 | we have taken that task's pi_lock at the beginning of the loop. | ||
493 | So as long as there is an owner of this mutex that is not the same | ||
494 | process as the task being worked on, we are OK. | ||
495 | |||
496 | Looking closely at the code, one might be confused. The check for the | ||
497 | end of the PI chain is when the task isn't blocked on anything or the | ||
498 | task's waiter structure "task" element is NULL. This check is | ||
499 | protected only by the task's pi_lock. But the code to unlock the mutex | ||
500 | sets the task's waiter structure "task" element to NULL with only | ||
501 | the protection of the mutex's wait_lock, which was not taken yet. | ||
502 | Isn't this a race condition if the task becomes the new owner? | ||
503 | |||
504 | The answer is No! The trick is the spin_trylock of the mutex's | ||
505 | wait_lock. If we fail that lock, we release the pi_lock of the | ||
506 | task and continue the loop, doing the end of PI chain check again. | ||
507 | |||
508 | In the code to release the lock, the wait_lock of the mutex is held | ||
509 | the entire time, and it is not let go when we grab the pi_lock of the | ||
510 | new owner of the mutex. So if the switch of a new owner were to happen | ||
511 | after the check for end of the PI chain and the grabbing of the | ||
512 | wait_lock, the unlocking code would spin on the new owner's pi_lock | ||
513 | but never give up the wait_lock. So the PI chain loop is guaranteed to | ||
514 | fail the spin_trylock on the wait_lock, release the pi_lock, and | ||
515 | try again. | ||
516 | |||
517 | If you don't quite understand the above, that's OK. You don't have to, | ||
518 | unless you really want to make a proof out of it ;) | ||
519 | |||
520 | |||
521 | Pending Owners and Lock stealing | ||
522 | -------------------------------- | ||
523 | |||
524 | One of the flags in the owner field of the mutex structure is "Pending Owner". | ||
525 | What this means is that an owner was chosen by the process releasing the | ||
526 | mutex, but that owner has yet to wake up and actually take the mutex. | ||
527 | |||
528 | Why is this important? Why can't we just give the mutex to another process | ||
529 | and be done with it? | ||
530 | |||
531 | The PI code is to help with real-time processes, and to let the highest | ||
532 | priority process run as long as possible with little latencies and delays. | ||
533 | If a high priority process owns a mutex that a lower priority process is | ||
534 | blocked on, when the mutex is released it would be given to the lower priority | ||
535 | process. What if the higher priority process wants to take that mutex again? | ||
536 | The high priority process would fail to take that mutex that it just gave up | ||
537 | and it would need to boost the lower priority process to run with full | ||
538 | latency of that critical section (since the low priority process just entered | ||
539 | it). | ||
540 | |||
541 | There's no reason a high priority process that gives up a mutex should be | ||
542 | penalized if it tries to take that mutex again. If the new owner of the | ||
543 | mutex has not woken up yet, there's no reason that the higher priority process | ||
544 | could not take that mutex away. | ||
545 | |||
546 | To solve this, we introduced Pending Ownership and Lock Stealing. When a | ||
547 | new process is given a mutex that it was blocked on, it is only given | ||
548 | pending ownership. This means that it's the new owner, unless a higher | ||
549 | priority process comes in and tries to grab that mutex. If a higher priority | ||
550 | process does come along and wants that mutex, we let the higher priority | ||
551 | process "steal" the mutex from the pending owner (only if it is still pending) | ||
552 | and continue with the mutex. | ||
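The steal rule can be written down as a small predicate. This is a minimal sketch, not the kernel's code: the struct and function names are hypothetical, and it assumes the usual kernel convention that a numerically lower prio value means a higher priority.

```c
#include <stdbool.h>

/* Hypothetical view of the current owner of a mutex. */
struct steal_demo_owner {
	int prio;	/* lower value = higher priority (assumed convention) */
	bool pending;	/* "Pending Owner" flag still set? */
};

/*
 * A task may steal the mutex only while the owner is still pending
 * and only if the task's priority is strictly higher than the owner's.
 */
static bool steal_demo_can_steal(const struct steal_demo_owner *owner,
				 int my_prio)
{
	if (!owner->pending)
		return false;	/* owner already took the mutex */
	return my_prio < owner->prio;
}
```

Note that an equal priority does not steal, which matches the failure case of "a pending owner with equal or higher priority" described in the walk through.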
553 | |||
554 | |||
555 | Taking of a mutex (The walk through) | ||
556 | ------------------------------------ | ||
557 | |||
558 | OK, now let's take a look at the detailed walk through of what happens when | ||
559 | taking a mutex. | ||
560 | |||
561 | The first thing that is tried is the fast taking of the mutex. This is | ||
562 | done when we have CMPXCHG enabled (otherwise the fast taking automatically | ||
563 | fails). Only when the owner field of the mutex is NULL can the lock be | ||
564 | taken with the CMPXCHG and nothing else needs to be done. | ||
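As an illustration only, the fast path can be modelled with C11 atomics. The names lock_demo_mutex and lock_demo_fast_lock are hypothetical, and the owner is represented as a plain integer rather than a task pointer.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the mutex: only the owner field matters here. */
struct lock_demo_mutex {
	_Atomic uintptr_t owner;	/* 0 plays the role of NULL */
};

/*
 * Fast path: succeed only if the owner field is exactly NULL, swapping
 * in the caller in one atomic step. Any owner or flag bit makes the
 * compare fail, and the caller would then fall back to the slow path.
 */
static bool lock_demo_fast_lock(struct lock_demo_mutex *m, uintptr_t self)
{
	uintptr_t expected = 0;

	return atomic_compare_exchange_strong(&m->owner, &expected, self);
}
```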
565 | |||
566 | If there is contention on the lock, whether it is owned or has a pending | ||
567 | owner, we take the slow path (rt_mutex_slowlock). | ||
568 | |||
569 | The slow path function is where the task's waiter structure is created on | ||
570 | the stack. This is because the waiter structure is only needed for the | ||
571 | scope of this function. The waiter structure holds the nodes to store | ||
572 | the task on the wait_list of the mutex, and if need be, the pi_list of | ||
573 | the owner. | ||
574 | |||
575 | The wait_lock of the mutex is taken since the slow path of unlocking the | ||
576 | mutex also takes this lock. | ||
577 | |||
578 | We then call try_to_take_rt_mutex. This is where the architecture that | ||
579 | does not implement CMPXCHG would always grab the lock (if there's no | ||
580 | contention). | ||
581 | |||
582 | try_to_take_rt_mutex is used every time the task tries to grab a mutex in the | ||
583 | slow path. The first thing that is done here is an atomic setting of | ||
584 | the "Has Waiters" flag of the mutex's owner field. Yes, this could really | ||
585 | be false, because if the mutex has no owner, there are no waiters and | ||
586 | the current task also won't have any waiters. But we don't have the lock | ||
587 | yet, so we assume we are going to be a waiter. The reason for this is to | ||
588 | play nice for those architectures that do have CMPXCHG. By setting this flag | ||
589 | now, the owner of the mutex can't release the mutex without going into the | ||
590 | slow unlock path, and it would then need to grab the wait_lock, which this | ||
591 | code currently holds. So setting the "Has Waiters" flag forces the owner | ||
592 | to synchronize with this code. | ||
593 | |||
594 | Now that we know that we can't have any races with the owner releasing the | ||
595 | mutex, we check to see if we can take the ownership. This is done if the | ||
596 | mutex doesn't have an owner, or if we can steal the mutex from a pending | ||
597 | owner. Let's look at the situations we have here. | ||
598 | |||
599 | 1) Has owner that is pending | ||
600 | ---------------------------- | ||
601 | |||
602 | The mutex has an owner, but it hasn't woken up and the mutex flag | ||
603 | "Pending Owner" is set. The first check is to see if the owner isn't the | ||
604 | current task. This is because this function is also used for the pending | ||
605 | owner to grab the mutex. When a pending owner wakes up, it checks to see | ||
606 | if it can take the mutex, and this is done if the owner is already set to | ||
607 | itself. If so, we succeed and leave the function, clearing the "Pending | ||
608 | Owner" bit. | ||
609 | |||
610 | If the pending owner is not current, we check to see if the current priority is | ||
611 | higher than the pending owner. If not, we fail the function and return. | ||
612 | |||
613 | There's also something special about a pending owner. That is a pending owner | ||
614 | is never blocked on a mutex. So there is no PI chain to worry about. It also | ||
615 | means that if the mutex doesn't have any waiters, there's no accounting needed | ||
616 | to update the pending owner's pi_list, since we only worry about processes | ||
617 | blocked on the current mutex. | ||
618 | |||
619 | If there are waiters on this mutex, and we just stole the ownership, we need | ||
620 | to take the top waiter, remove it from the pi_list of the pending owner, and | ||
621 | add it to the current pi_list. Note that at this moment, the pending owner | ||
622 | is no longer on the list of waiters. This is fine, since the pending owner | ||
623 | would add itself back when it realizes that it had the ownership stolen | ||
624 | from itself. When the pending owner tries to grab the mutex, it will fail | ||
625 | in try_to_take_rt_mutex if the owner field points to another process. | ||
626 | |||
627 | 2) No owner | ||
628 | ----------- | ||
629 | |||
630 | If there is no owner (or we successfully stole the lock), we set the owner | ||
631 | of the mutex to current, and set the flag of "Has Waiters" if the current | ||
632 | mutex actually has waiters, or we clear the flag if it doesn't. See, it was | ||
633 | OK that we set that flag early, since now it is cleared. | ||
634 | |||
635 | 3) Failed to grab ownership | ||
636 | --------------------------- | ||
637 | |||
638 | The most interesting case is when we fail to take ownership. This means that | ||
639 | there exists an owner, or there's a pending owner with equal or higher | ||
640 | priority than the current task. | ||
641 | |||
642 | We'll continue on the failed case. | ||
643 | |||
644 | If the mutex has a timeout, we set up a timer to go off to break us out | ||
645 | of this mutex if we failed to get it after a specified amount of time. | ||
646 | |||
647 | Now we enter a loop that will continue to try to take ownership of the mutex, or | ||
648 | fail from a timeout or signal. | ||
649 | |||
650 | Once again we try to take the mutex. This will usually fail the first time | ||
651 | in the loop, since it had just failed to get the mutex. But the second time | ||
652 | in the loop, this would likely succeed, since the task would likely be | ||
653 | the pending owner. | ||
654 | |||
655 | If the mutex is TASK_INTERRUPTIBLE a check for signals and timeout is done | ||
656 | here. | ||
657 | |||
658 | The waiter structure has a "task" field that points to the task that is blocked | ||
659 | on the mutex. This field can be NULL the first time it goes through the loop | ||
660 | or if the task is a pending owner and had its mutex stolen. If the "task" | ||
661 | field is NULL then we need to set up the accounting for it. | ||
662 | |||
663 | Task blocks on mutex | ||
664 | -------------------- | ||
665 | |||
666 | The accounting of a mutex and process is done with the waiter structure of | ||
667 | the process. The "task" field is set to the process, and the "lock" field | ||
668 | to the mutex. The plist nodes are initialized to the process's current | ||
669 | priority. | ||
670 | |||
671 | Since the wait_lock was taken at the entry of the slow lock, we can safely | ||
672 | add the waiter to the wait_list. If the current process is the highest | ||
673 | priority process currently waiting on this mutex, then we remove the | ||
674 | previous top waiter process (if it exists) from the pi_list of the owner, | ||
675 | and add the current process to that list. Since the pi_list of the owner | ||
676 | has changed, we call rt_mutex_adjust_prio on the owner to see if the owner | ||
677 | should adjust its priority accordingly. | ||
678 | |||
679 | If the owner is also blocked on a lock, and had its pi_list changed | ||
680 | (or deadlock checking is on), we unlock the wait_lock of the mutex and go ahead | ||
681 | and run rt_mutex_adjust_prio_chain on the owner, as described earlier. | ||
682 | |||
683 | Now all locks are released, and if the current process is still blocked on a | ||
684 | mutex (waiter "task" field is not NULL), then we go to sleep (call schedule). | ||
685 | |||
686 | Waking up in the loop | ||
687 | --------------------- | ||
688 | |||
689 | The schedule can then wake up for a few reasons. | ||
690 | 1) we were given pending ownership of the mutex. | ||
691 | 2) we received a signal and were TASK_INTERRUPTIBLE | ||
692 | 3) we had a timeout and were TASK_INTERRUPTIBLE | ||
693 | |||
694 | In any of these cases, we continue the loop and once again try to grab the | ||
695 | ownership of the mutex. If we succeed, we exit the loop. Otherwise, on a | ||
696 | signal or timeout we exit the loop, and if we had the mutex stolen we | ||
697 | simply add ourselves back on the lists and go back to sleep. | ||
698 | |||
699 | Note: For various reasons, because of timeout and signals, the steal mutex | ||
700 | algorithm needs to be careful. This is because the current process is | ||
701 | still on the wait_list. And because of dynamic changing of priorities, | ||
702 | especially on SCHED_OTHER tasks, the current process can be the | ||
703 | highest priority task on the wait_list. | ||
704 | |||
705 | Failed to get mutex on Timeout or Signal | ||
706 | ---------------------------------------- | ||
707 | |||
708 | If a timeout or signal occurred, the waiter's "task" field would not be | ||
709 | NULL and the task needs to be taken off the wait_list of the mutex and perhaps | ||
710 | pi_list of the owner. If this process was a high priority process, then | ||
711 | the rt_mutex_adjust_prio_chain needs to be executed again on the owner, | ||
712 | but this time it will be lowering the priorities. | ||
713 | |||
714 | |||
715 | Unlocking the Mutex | ||
716 | ------------------- | ||
717 | |||
718 | The unlocking of a mutex also has a fast path for those architectures with | ||
719 | CMPXCHG. Since the taking of a mutex on contention always sets the | ||
720 | "Has Waiters" flag of the mutex's owner, we use this to know if we need to | ||
721 | take the slow path when unlocking the mutex. If the mutex doesn't have any | ||
722 | waiters, the owner field of the mutex would equal the current process and | ||
723 | the mutex can be unlocked by just replacing the owner field with NULL. | ||
724 | |||
725 | If the owner field has the "Has Waiters" bit set (or CMPXCHG is not available), | ||
726 | the slow unlock path is taken. | ||
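A rough model of why the "Has Waiters" bit forces the slow path, again using C11 atomics. Everything named unlock_demo_* is hypothetical; the only detail taken from the text is that the fast unlock compares the owner field against the bare current task and replaces it with NULL.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define UNLOCK_DEMO_HAS_WAITERS ((uintptr_t)2)	/* flag bit in the owner field */

struct unlock_demo_mutex {
	_Atomic uintptr_t owner;	/* task value plus flag bits; 0 = NULL */
};

/* What a contending locker does first: set the flag atomically. */
static void unlock_demo_mark_waiters(struct unlock_demo_mutex *m)
{
	atomic_fetch_or(&m->owner, UNLOCK_DEMO_HAS_WAITERS);
}

/*
 * Fast unlock: only succeeds if the owner field is exactly the caller
 * with no flag bits set. With "Has Waiters" set the compare fails and
 * the caller would have to take the slow unlock path instead.
 */
static bool unlock_demo_fast_unlock(struct unlock_demo_mutex *m, uintptr_t self)
{
	uintptr_t expected = self;

	return atomic_compare_exchange_strong(&m->owner, &expected,
					      (uintptr_t)0);
}
```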
727 | |||
728 | The first thing done in the slow unlock path is to take the wait_lock of the | ||
729 | mutex. This synchronizes the locking and unlocking of the mutex. | ||
730 | |||
731 | A check is made to see if the mutex has waiters or not. On architectures that | ||
732 | do not have CMPXCHG, this is the location that the owner of the mutex will | ||
733 | determine if a waiter needs to be awoken or not. On architectures that | ||
734 | do have CMPXCHG, that check is done in the fast path, but it is still needed | ||
735 | in the slow path too. If a waiter of a mutex woke up because of a signal | ||
736 | or timeout between the time the owner failed the fast path CMPXCHG check and | ||
737 | the grabbing of the wait_lock, the mutex may not have any waiters, thus the | ||
738 | owner still needs to make this check. If there are no waiters then the mutex | ||
739 | owner field is set to NULL, the wait_lock is released and nothing more is | ||
740 | needed. | ||
741 | |||
742 | If there are waiters, then we need to wake one up and give that waiter | ||
743 | pending ownership. | ||
744 | |||
745 | On the wake up code, the pi_lock of the current owner is taken. The top | ||
746 | waiter of the lock is found and removed from the wait_list of the mutex | ||
747 | as well as the pi_list of the current owner. The task field of the new | ||
748 | pending owner's waiter structure is set to NULL, and the owner field of the | ||
749 | mutex is set to the new owner with the "Pending Owner" bit set, as well | ||
750 | as the "Has Waiters" bit if there still are other processes blocked on the | ||
751 | mutex. | ||
752 | |||
753 | The pi_lock of the previous owner is released, and the new pending owner's | ||
754 | pi_lock is taken. Remember that this is the trick to prevent the race | ||
755 | condition in rt_mutex_adjust_prio_chain from adding itself as a waiter | ||
756 | on the mutex. | ||
757 | |||
758 | We now clear the "pi_blocked_on" field of the new pending owner, and if | ||
759 | the mutex still has waiters pending, we add the new top waiter to the pi_list | ||
760 | of the pending owner. | ||
761 | |||
762 | Finally we unlock the pi_lock of the pending owner and wake it up. | ||
763 | |||
764 | |||
765 | Contact | ||
766 | ------- | ||
767 | |||
768 | For updates on this document, please email Steven Rostedt <rostedt@goodmis.org> | ||
769 | |||
770 | |||
771 | Credits | ||
772 | ------- | ||
773 | |||
774 | Author: Steven Rostedt <rostedt@goodmis.org> | ||
775 | |||
776 | Reviewers: Ingo Molnar, Thomas Gleixner, Thomas Duetsch, and Randy Dunlap | ||
777 | |||
778 | Updates | ||
779 | ------- | ||
780 | |||
781 | This document was originally written for 2.6.17-rc3-mm1 | ||
diff --git a/Documentation/rt-mutex.txt b/Documentation/rt-mutex.txt new file mode 100644 index 000000000000..243393d882ee --- /dev/null +++ b/Documentation/rt-mutex.txt | |||
@@ -0,0 +1,79 @@ | |||
1 | RT-mutex subsystem with PI support | ||
2 | ---------------------------------- | ||
3 | |||
4 | RT-mutexes with priority inheritance are used to support PI-futexes, | ||
5 | which enable pthread_mutex_t priority inheritance attributes | ||
6 | (PTHREAD_PRIO_INHERIT). [See Documentation/pi-futex.txt for more details | ||
7 | about PI-futexes.] | ||
8 | |||
9 | This technology was developed in the -rt tree and streamlined for | ||
10 | pthread_mutex support. | ||
11 | |||
12 | Basic principles: | ||
13 | ----------------- | ||
14 | |||
15 | RT-mutexes extend the semantics of simple mutexes by the priority | ||
16 | inheritance protocol. | ||
17 | |||
18 | A low priority owner of a rt-mutex inherits the priority of a higher | ||
19 | priority waiter until the rt-mutex is released. If the temporarily | ||
20 | boosted owner blocks on a rt-mutex itself it propagates the priority | ||
21 | boosting to the owner of the other rt_mutex it gets blocked on. The | ||
22 | priority boosting is immediately removed once the rt_mutex has been | ||
23 | unlocked. | ||
24 | |||
25 | This approach allows us to shorten the block of high-prio tasks on | ||
26 | mutexes which protect shared resources. Priority inheritance is not a | ||
27 | magic bullet for poorly designed applications, but it allows | ||
28 | well-designed applications to use userspace locks in critical parts of | ||
29 | a high priority thread, without losing determinism. | ||
30 | |||
31 | The enqueueing of the waiters into the rtmutex waiter list is done in | ||
32 | priority order. For same priorities FIFO order is chosen. For each | ||
33 | rtmutex, only the top priority waiter is enqueued into the owner's | ||
34 | priority waiters list. This list too queues in priority order. Whenever | ||
35 | the top priority waiter of a task changes (for example it timed out or | ||
36 | got a signal), the priority of the owner task is readjusted. [The | ||
37 | priority enqueueing is handled by "plists", see include/linux/plist.h | ||
38 | for more details.] | ||
39 | |||
40 | RT-mutexes are optimized for fastpath operations and have no internal | ||
41 | locking overhead when locking an uncontended mutex or unlocking a mutex | ||
42 | without waiters. The optimized fastpath operations require cmpxchg | ||
43 | support. [If that is not available then the rt-mutex internal spinlock | ||
44 | is used] | ||
45 | |||
46 | The state of the rt-mutex is tracked via the owner field of the rt-mutex | ||
47 | structure: | ||
48 | |||
49 | rt_mutex->owner holds the task_struct pointer of the owner. Bit 0 and 1 | ||
50 | are used to keep track of the "owner is pending" and "rtmutex has | ||
51 | waiters" state. | ||
52 | |||
53 | owner bit1 bit0 | ||
54 | NULL 0 0 mutex is free (fast acquire possible) | ||
55 | NULL 0 1 invalid state | ||
56 | NULL 1 0 Transitional state* | ||
57 | NULL 1 1 invalid state | ||
58 | taskpointer 0 0 mutex is held (fast release possible) | ||
59 | taskpointer 0 1 task is pending owner | ||
60 | taskpointer 1 0 mutex is held and has waiters | ||
61 | taskpointer 1 1 task is pending owner and mutex has waiters | ||
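The table can be read as a decoder over the owner word. The sketch below is illustrative only: all demo_* names are hypothetical, and it assumes task pointers are at least 4-byte aligned so that bits 0 and 1 are free to carry the flags.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_OWNER_PENDING	((uintptr_t)1)	/* bit 0: owner is pending */
#define DEMO_HAS_WAITERS	((uintptr_t)2)	/* bit 1: rtmutex has waiters */
#define DEMO_FLAG_MASK		(DEMO_OWNER_PENDING | DEMO_HAS_WAITERS)

enum demo_state {
	DEMO_FREE, DEMO_INVALID, DEMO_TRANSITIONAL,
	DEMO_HELD, DEMO_PENDING, DEMO_HELD_WAITERS, DEMO_PENDING_WAITERS
};

/* Pack a task pointer and the two flags into one word. */
static uintptr_t demo_owner_encode(const void *task, bool pending, bool waiters)
{
	uintptr_t v = (uintptr_t)task;

	assert((v & DEMO_FLAG_MASK) == 0);	/* low bits must be free */
	if (pending)
		v |= DEMO_OWNER_PENDING;
	if (waiters)
		v |= DEMO_HAS_WAITERS;
	return v;
}

/* Strip the flag bits to recover the task pointer. */
static void *demo_owner_task(uintptr_t owner)
{
	return (void *)(owner & ~DEMO_FLAG_MASK);
}

/* Classify the owner word according to the table above. */
static enum demo_state demo_owner_state(uintptr_t owner)
{
	bool pending = owner & DEMO_OWNER_PENDING;
	bool waiters = owner & DEMO_HAS_WAITERS;

	if (demo_owner_task(owner) == NULL) {
		if (pending)
			return DEMO_INVALID;
		return waiters ? DEMO_TRANSITIONAL : DEMO_FREE;
	}
	if (pending)
		return waiters ? DEMO_PENDING_WAITERS : DEMO_PENDING;
	return waiters ? DEMO_HELD_WAITERS : DEMO_HELD;
}
```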
62 | |||
63 | Pending-ownership handling is a performance optimization: | ||
64 | pending-ownership is assigned to the first (highest priority) waiter of | ||
65 | the mutex, when the mutex is released. The thread is woken up and once | ||
66 | it starts executing it can acquire the mutex. Until the mutex is taken | ||
67 | by it (bit 0 is cleared) a competing higher priority thread can "steal" | ||
68 | the mutex which puts the woken up thread back on the waiters list. | ||
69 | |||
70 | The pending-ownership optimization is especially important for the | ||
71 | uninterrupted workflow of high-prio tasks which repeatedly | ||
72 | take/release locks that have lower-prio waiters. Without this | ||
73 | optimization the higher-prio thread would ping-pong to the lower-prio | ||
74 | task [because at unlock time we always assign a new owner]. | ||
75 | |||
76 | (*) The "mutex has waiters" bit gets set to take the lock. If the lock | ||
77 | doesn't already have an owner, this bit is quickly cleared if there are | ||
78 | no waiters. So this is a transitional state to synchronize with looking | ||
79 | at the owner field of the mutex and the mutex owner releasing the lock. | ||
diff --git a/Documentation/rtc.txt b/Documentation/rtc.txt index 95d17b3e2eee..2a58f985795a 100644 --- a/Documentation/rtc.txt +++ b/Documentation/rtc.txt | |||
@@ -44,8 +44,10 @@ normal timer interrupt, which is 100Hz. | |||
44 | Programming and/or enabling interrupt frequencies greater than 64Hz is | 44 | Programming and/or enabling interrupt frequencies greater than 64Hz is |
45 | only allowed by root. This is perhaps a bit conservative, but we don't want | 45 | only allowed by root. This is perhaps a bit conservative, but we don't want |
46 | an evil user generating lots of IRQs on a slow 386sx-16, where it might have | 46 | an evil user generating lots of IRQs on a slow 386sx-16, where it might have |
47 | a negative impact on performance. Note that the interrupt handler is only | 47 | a negative impact on performance. This 64Hz limit can be changed by writing |
48 | a few lines of code to minimize any possibility of this effect. | 48 | a different value to /proc/sys/dev/rtc/max-user-freq. Note that the |
49 | interrupt handler is only a few lines of code to minimize any possibility | ||
50 | of this effect. | ||
49 | 51 | ||
50 | Also, if the kernel time is synchronized with an external source, the | 52 | Also, if the kernel time is synchronized with an external source, the |
51 | kernel will write the time back to the CMOS clock every 11 minutes. In | 53 | kernel will write the time back to the CMOS clock every 11 minutes. In |
@@ -81,6 +83,7 @@ that will be using this driver. | |||
81 | */ | 83 | */ |
82 | 84 | ||
83 | #include <stdio.h> | 85 | #include <stdio.h> |
86 | #include <stdlib.h> | ||
84 | #include <linux/rtc.h> | 87 | #include <linux/rtc.h> |
85 | #include <sys/ioctl.h> | 88 | #include <sys/ioctl.h> |
86 | #include <sys/time.h> | 89 | #include <sys/time.h> |
diff --git a/Documentation/scsi/ChangeLog.arcmsr b/Documentation/scsi/ChangeLog.arcmsr new file mode 100644 index 000000000000..162c47fdf45f --- /dev/null +++ b/Documentation/scsi/ChangeLog.arcmsr | |||
@@ -0,0 +1,56 @@ | |||
1 | ************************************************************************** | ||
2 | ** History | ||
3 | ** | ||
4 | ** REV# DATE NAME DESCRIPTION | ||
5 | ** 1.00.00.00 3/31/2004 Erich Chen First release | ||
6 | ** 1.10.00.04 7/28/2004 Erich Chen modify for ioctl | ||
7 | ** 1.10.00.06 8/28/2004 Erich Chen modify for 2.6.x | ||
8 | ** 1.10.00.08 9/28/2004 Erich Chen modify for x86_64 | ||
9 | ** 1.10.00.10 10/10/2004 Erich Chen bug fix for SMP & ioctl | ||
10 | ** 1.20.00.00 11/29/2004 Erich Chen bug fix with arcmsr_bus_reset when PHY error | ||
11 | ** 1.20.00.02 12/09/2004 Erich Chen bug fix with over 2T bytes RAID Volume | ||
12 | ** 1.20.00.04 1/09/2005 Erich Chen fits for Debian linux kernel version 2.2.xx | ||
13 | ** 1.20.00.05 2/20/2005 Erich Chen cleanly as look like a Linux driver at 2.6.x | ||
14 | ** thanks for peoples kindness comment | ||
15 | ** Kornel Wieliczek | ||
16 | ** Christoph Hellwig | ||
17 | ** Adrian Bunk | ||
18 | ** Andrew Morton | ||
19 | ** Christoph Hellwig | ||
20 | ** James Bottomley | ||
21 | ** Arjan van de Ven | ||
22 | ** 1.20.00.06 3/12/2005 Erich Chen fix with arcmsr_pci_unmap_dma "unsigned long" cast, | ||
23 | ** modify PCCB POOL allocated by "dma_alloc_coherent" | ||
24 | ** (Kornel Wieliczek's comment) | ||
25 | ** 1.20.00.07 3/23/2005 Erich Chen bug fix with arcmsr_scsi_host_template_init | ||
26 | ** occur segmentation fault, | ||
27 | ** if RAID adapter does not on PCI slot | ||
28 | ** and modprobe/rmmod this driver twice. | ||
29 | ** bug fix enormous stack usage (Adrian Bunk's comment) | ||
30 | ** 1.20.00.08 6/23/2005 Erich Chen bug fix with abort command, | ||
31 | ** in case of heavy loading when sata cable | ||
32 | ** working on low quality connection | ||
33 | ** 1.20.00.09 9/12/2005 Erich Chen bug fix with abort command handling, firmware version check | ||
34 | ** and firmware update notify for hardware bug fix | ||
35 | ** 1.20.00.10 9/23/2005 Erich Chen enhance sysfs function for change driver's max tag Q number. | ||
36 | ** add DMA_64BIT_MASK for backward compatible with all 2.6.x | ||
37 | ** add some useful message for abort command | ||
38 | ** add ioctl code 'ARCMSR_IOCTL_FLUSH_ADAPTER_CACHE' | ||
39 | ** customer can send this command for sync raid volume data | ||
40 | ** 1.20.00.11 9/29/2005 Erich Chen by comment of Arjan van de Ven fix incorrect msleep redefine | ||
41 | ** cast off sizeof(dma_addr_t) condition for 64bit pci_set_dma_mask | ||
42 | ** 1.20.00.12 9/30/2005 Erich Chen bug fix with 64bit platform's ccbs using if over 4G system memory | ||
43 | ** change 64bit pci_set_consistent_dma_mask into 32bit | ||
44 | ** incorrect adapter count if adapter initialize fail. | ||
45 | ** miss edit at arcmsr_build_ccb.... | ||
46 | ** psge += sizeof(struct _SG64ENTRY *) => | ||
47 | ** psge += sizeof(struct _SG64ENTRY) | ||
48 | ** 64 bits sg entry would be incorrectly calculated | ||
49 | ** thanks Kornel Wieliczek give me kindly notify | ||
50 | ** and detail description | ||
51 | ** 1.20.00.13 11/15/2005 Erich Chen scheduling pending ccb with FIFO | ||
52 | ** change the architecture of arcmsr command queue list | ||
53 | ** for linux standard list | ||
54 | ** enable usage of pci message signal interrupt | ||
55 | ** follow Randy.Danlup kindness suggestion cleanup this code | ||
56 | ************************************************************************** \ No newline at end of file | ||
diff --git a/Documentation/scsi/ChangeLog.megaraid b/Documentation/scsi/ChangeLog.megaraid index c173806c91fa..a056bbe67c7e 100644 --- a/Documentation/scsi/ChangeLog.megaraid +++ b/Documentation/scsi/ChangeLog.megaraid | |||
@@ -1,3 +1,126 @@ | |||
1 | Release Date : Fri May 19 09:31:45 EST 2006 - Seokmann Ju <sju@lsil.com> | ||
2 | Current Version : 2.20.4.9 (scsi module), 2.20.2.6 (cmm module) | ||
3 | Older Version : 2.20.4.8 (scsi module), 2.20.2.6 (cmm module) | ||
4 | |||
5 | 1. Fixed a bug in megaraid_init_mbox(). | ||
6 | Customer reported "garbage in file on x86_64 platform". | ||
7 | Root Cause: the driver registered controllers as 64-bit DMA capable | ||
8 | even for those which do not support it. | ||
9 | Fix: Changed the function to add an identification mechanism | ||
10 | that identifies 64-bit DMA capable controllers. | ||
11 | |||
12 | > -----Original Message----- | ||
13 | > From: Vasily Averin [mailto:vvs@sw.ru] | ||
14 | > Sent: Thursday, May 04, 2006 2:49 PM | ||
15 | > To: linux-scsi@vger.kernel.org; Kolli, Neela; Mukker, Atul; | ||
16 | > Ju, Seokmann; Bagalkote, Sreenivas; | ||
17 | > James.Bottomley@SteelEye.com; devel@openvz.org | ||
18 | > Subject: megaraid_mbox: garbage in file | ||
19 | > | ||
20 | > Hello all, | ||
21 | > | ||
22 | > I've investigated customers claim on the unstable work of | ||
23 | > their node and found a | ||
24 | > strange effect: reading from some files leads to the | ||
25 | > "attempt to access beyond end of device" messages. | ||
26 | > | ||
27 | > I've checked filesystem, memory on the node, motherboard BIOS | ||
28 | > version, but it | ||
29 | > does not help and issue still has been reproduced by simple | ||
30 | > file reading. | ||
31 | > | ||
32 | > Reproducer is simple: | ||
33 | > | ||
34 | > echo 0xffffffff >/proc/sys/dev/scsi/logging_level ; | ||
35 | > cat /vz/private/101/root/etc/ld.so.cache >/tmp/ttt ; | ||
36 | > echo 0 >/proc/sys/dev/scsi/logging | ||
37 | > | ||
38 | > It leads to the following messages in dmesg | ||
39 | > | ||
40 | > sd_init_command: disk=sda, block=871769260, count=26 | ||
41 | > sda : block=871769260 | ||
42 | > sda : reading 26/26 512 byte blocks. | ||
43 | > scsi_add_timer: scmd: f79ed980, time: 7500, (c02b1420) | ||
44 | > sd 0:1:0:0: send 0xf79ed980 sd 0:1:0:0: | ||
45 | > command: Read (10): 28 00 33 f6 24 ac 00 00 1a 00 | ||
46 | > buffer = 0xf7cfb540, bufflen = 13312, done = 0xc0366b40, | ||
47 | > queuecommand 0xc0344010 | ||
48 | > leaving scsi_dispatch_cmnd() | ||
49 | > scsi_delete_timer: scmd: f79ed980, rtn: 1 | ||
50 | > sd 0:1:0:0: done 0xf79ed980 SUCCESS 0 sd 0:1:0:0: | ||
51 | > command: Read (10): 28 00 33 f6 24 ac 00 00 1a 00 | ||
52 | > scsi host busy 1 failed 0 | ||
53 | > sd 0:1:0:0: Notifying upper driver of completion (result 0) | ||
54 | > sd_rw_intr: sda: res=0x0 | ||
55 | > 26 sectors total, 13312 bytes done. | ||
56 | > use_sg is 4 | ||
57 | > attempt to access beyond end of device | ||
58 | > sda6: rw=0, want=1044134458, limit=951401367 | ||
59 | > Buffer I/O error on device sda6, logical block 522067228 | ||
60 | > attempt to access beyond end of device | ||
61 | |||
62 | 2. When INQUIRY with EVPD bit set issued to the MegaRAID controller, | ||
63 | system memory gets corrupted. | ||
64 | Root Cause: MegaRAID F/W handles the INQUIRY with EVPD bit set | ||
65 | incorrectly. | ||
66 | Fix: The problem has been fixed in MegaRAID F/W, which is in the process | ||
67 | of being released. Meanwhile, the driver will filter out the request. | ||
68 | |||
69 | 3. One of the members in the driver's data structure leads to an | ||
70 | unaligned access issue on 64-bit platforms. | ||
71 | Customer reported a "kernel unaligned access address" issue when an | ||
72 | application communicates with the MegaRAID HBA driver. | ||
73 | Root Cause: in the uioc_t structure, one of the members was misaligned, | ||
74 | which led the system to display the error message. | ||
75 | Fix: A patch was submitted to the community by the following person. | ||
76 | |||
77 | > -----Original Message----- | ||
78 | > From: linux-scsi-owner@vger.kernel.org | ||
79 | > [mailto:linux-scsi-owner@vger.kernel.org] On Behalf Of Sakurai Hiroomi | ||
80 | > Sent: Wednesday, July 12, 2006 4:20 AM | ||
81 | > To: linux-scsi@vger.kernel.org; linux-kernel@vger.kernel.org | ||
82 | > Subject: Re: Help: strange messages from kernel on IA64 platform | ||
83 | > | ||
84 | > Hi, | ||
85 | > | ||
86 | > I saw same message. | ||
87 | > | ||
88 | > When GAM(Global Array Manager) is started, The following | ||
89 | > message output. | ||
90 | > kernel: kernel unaligned access to 0xe0000001fe1080d4, | ||
91 | > ip=0xa000000200053371 | ||
92 | > | ||
93 | > The uioc structure used by ioctl is defined by packed, | ||
94 | > the allignment of each member are disturbed. | ||
95 | > In a 64 bit structure, the allignment of member doesn't fit 64 bit | ||
96 | > boundary. this causes this messages. | ||
97 | > In a 32 bit structure, we don't see the message because the allinment | ||
98 | > of member fit 32 bit boundary even if packed is specified. | ||
99 | > | ||
100 | > patch | ||
101 | > I Add 32 bit dummy member to fit 64 bit boundary. I tested. | ||
102 | > We confirmed this patch fix the problem by IA64 server. | ||
103 | > | ||
104 | > ************************************************************** | ||
105 | > **************** | ||
106 | > --- linux-2.6.9/drivers/scsi/megaraid/megaraid_ioctl.h.orig | ||
107 | > 2006-04-03 17:13:03.000000000 +0900 | ||
108 | > +++ linux-2.6.9/drivers/scsi/megaraid/megaraid_ioctl.h | ||
109 | > 2006-04-03 17:14:09.000000000 +0900 | ||
110 | > @@ -132,6 +132,10 @@ | ||
111 | > /* Driver Data: */ | ||
112 | > void __user * user_data; | ||
113 | > uint32_t user_data_len; | ||
114 | > + | ||
115 | > + /* 64bit alignment */ | ||
116 | > + uint32_t pad_0xBC; | ||
117 | > + | ||
118 | > mraid_passthru_t __user *user_pthru; | ||
119 | > | ||
120 | > mraid_passthru_t *pthru32; | ||
121 | > ************************************************************** | ||
122 | > **************** | ||
123 | |||
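The alignment problem described in item 3 can be reproduced with a cut-down layout. The sketch below is only an illustration: the two structs are hypothetical stand-ins for the three fields visible in the quoted patch (the real uioc_t has many more members), and `pointer_aligned()` is a helper invented here. It assumes a GCC- or Clang-style `__attribute__((packed))`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in: a lone 32-bit member followed by a pointer.
 * With packed layout and 8-byte pointers, user_pthru lands on offset 12. */
struct uioc_unpadded {
	void     *user_data;
	uint32_t  user_data_len;
	void     *user_pthru;
} __attribute__((packed));

/* Same fields plus a 32-bit dummy member, mirroring the idea of the patch. */
struct uioc_padded {
	void     *user_data;
	uint32_t  user_data_len;
	uint32_t  pad;            /* pushes user_pthru to the next boundary */
	void     *user_pthru;
} __attribute__((packed));

/* Returns 1 when the offset is a multiple of the pointer size. */
int pointer_aligned(size_t offset)
{
	return offset % sizeof(void *) == 0;
}

/* Holds for both 4-byte and 8-byte pointers. */
static_assert(offsetof(struct uioc_padded, user_pthru) % sizeof(void *) == 0,
	      "padded layout keeps the pointer naturally aligned");
```

With 4-byte pointers both layouts are already aligned, which matches the report that the message is not seen on 32-bit systems. With 8-byte pointers only the padded layout passes `pointer_aligned()`.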
1 | Release Date : Mon Apr 11 12:27:22 EST 2006 - Seokmann Ju <sju@lsil.com> | 124 | Release Date : Mon Apr 11 12:27:22 EST 2006 - Seokmann Ju <sju@lsil.com> |
2 | Current Version : 2.20.4.8 (scsi module), 2.20.2.6 (cmm module) | 125 | Current Version : 2.20.4.8 (scsi module), 2.20.2.6 (cmm module) |
3 | Older Version : 2.20.4.7 (scsi module), 2.20.2.6 (cmm module) | 126 | Older Version : 2.20.4.7 (scsi module), 2.20.2.6 (cmm module) |
diff --git a/Documentation/scsi/ChangeLog.megaraid_sas b/Documentation/scsi/ChangeLog.megaraid_sas index 0a85a7e8120e..d9e5960dafd5 100644 --- a/Documentation/scsi/ChangeLog.megaraid_sas +++ b/Documentation/scsi/ChangeLog.megaraid_sas | |||
@@ -1,4 +1,20 @@ | |||
1 | 1 | ||
2 | 1 Release Date : Sun May 14 22:49:52 PDT 2006 - Sumant Patro <Sumant.Patro@lsil.com> | ||
3 | 2 Current Version : 00.00.03.01 | ||
4 | 3 Older Version : 00.00.02.04 | ||
5 | |||
6 | i. Added support for ZCR controller. | ||
7 | |||
8 | New device id 0x413 added. | ||
9 | |||
10 | ii. Bug fix : Disable controller interrupt before firing INIT cmd to FW. | ||
11 | |||
12 | Interrupt is enabled after required initialization is over. | ||
13 | This is done to ensure that driver is ready to handle interrupts when | ||
14 | it is generated by the controller. | ||
15 | |||
16 | -Sumant Patro <Sumant.Patro@lsil.com> | ||
17 | |||
2 | 1 Release Date : Wed Feb 03 14:31:44 PST 2006 - Sumant Patro <Sumant.Patro@lsil.com> | 18 | 1 Release Date : Wed Feb 03 14:31:44 PST 2006 - Sumant Patro <Sumant.Patro@lsil.com> |
3 | 2 Current Version : 00.00.02.04 | 19 | 2 Current Version : 00.00.02.04 |
4 | 3 Older Version : 00.00.02.04 | 20 | 3 Older Version : 00.00.02.04 |
diff --git a/Documentation/scsi/aacraid.txt b/Documentation/scsi/aacraid.txt index be55670851a4..ee03678c8029 100644 --- a/Documentation/scsi/aacraid.txt +++ b/Documentation/scsi/aacraid.txt | |||
@@ -11,38 +11,43 @@ the original). | |||
11 | Supported Cards/Chipsets | 11 | Supported Cards/Chipsets |
12 | ------------------------- | 12 | ------------------------- |
13 | PCI ID (pci.ids) OEM Product | 13 | PCI ID (pci.ids) OEM Product |
14 | 9005:0285:9005:028a Adaptec 2020ZCR (Skyhawk) | 14 | 9005:0283:9005:0283 Adaptec Catapult (3210S with arc firmware) |
15 | 9005:0285:9005:028e Adaptec 2020SA (Skyhawk) | 15 | 9005:0284:9005:0284 Adaptec Tomcat (3410S with arc firmware) |
16 | 9005:0285:9005:028b Adaptec 2025ZCR (Terminator) | ||
17 | 9005:0285:9005:028f Adaptec 2025SA (Terminator) | ||
18 | 9005:0285:9005:0286 Adaptec 2120S (Crusader) | ||
19 | 9005:0286:9005:028d Adaptec 2130S (Lancer) | ||
20 | 9005:0285:9005:0285 Adaptec 2200S (Vulcan) | 16 | 9005:0285:9005:0285 Adaptec 2200S (Vulcan) |
17 | 9005:0285:9005:0286 Adaptec 2120S (Crusader) | ||
21 | 9005:0285:9005:0287 Adaptec 2200S (Vulcan-2m) | 18 | 9005:0285:9005:0287 Adaptec 2200S (Vulcan-2m) |
19 | 9005:0285:9005:0288 Adaptec 3230S (Harrier) | ||
20 | 9005:0285:9005:0289 Adaptec 3240S (Tornado) | ||
21 | 9005:0285:9005:028a Adaptec 2020ZCR (Skyhawk) | ||
22 | 9005:0285:9005:028b Adaptec 2025ZCR (Terminator) | ||
22 | 9005:0286:9005:028c Adaptec 2230S (Lancer) | 23 | 9005:0286:9005:028c Adaptec 2230S (Lancer) |
23 | 9005:0286:9005:028c Adaptec 2230SLP (Lancer) | 24 | 9005:0286:9005:028c Adaptec 2230SLP (Lancer) |
24 | 9005:0285:9005:0296 Adaptec 2240S (SabreExpress) | 25 | 9005:0286:9005:028d Adaptec 2130S (Lancer) |
26 | 9005:0285:9005:028e Adaptec 2020SA (Skyhawk) | ||
27 | 9005:0285:9005:028f Adaptec 2025SA (Terminator) | ||
25 | 9005:0285:9005:0290 Adaptec 2410SA (Jaguar) | 28 | 9005:0285:9005:0290 Adaptec 2410SA (Jaguar) |
26 | 9005:0285:9005:0293 Adaptec 21610SA (Corsair-16) | ||
27 | 9005:0285:103c:3227 Adaptec 2610SA (Bearcat HP release) | 29 | 9005:0285:103c:3227 Adaptec 2610SA (Bearcat HP release) |
30 | 9005:0285:9005:0293 Adaptec 21610SA (Corsair-16) | ||
31 | 9005:0285:9005:0296 Adaptec 2240S (SabreExpress) | ||
28 | 9005:0285:9005:0292 Adaptec 2810SA (Corsair-8) | 32 | 9005:0285:9005:0292 Adaptec 2810SA (Corsair-8) |
29 | 9005:0285:9005:0294 Adaptec Prowler | 33 | 9005:0285:9005:0294 Adaptec Prowler |
30 | 9005:0286:9005:029d Adaptec 2420SA (Intruder HP release) | ||
31 | 9005:0286:9005:029c Adaptec 2620SA (Intruder) | ||
32 | 9005:0286:9005:029b Adaptec 2820SA (Intruder) | ||
33 | 9005:0286:9005:02a7 Adaptec 2830SA (Skyray) | ||
34 | 9005:0286:9005:02a8 Adaptec 2430SA (Skyray) | ||
35 | 9005:0285:9005:0288 Adaptec 3230S (Harrier) | ||
36 | 9005:0285:9005:0289 Adaptec 3240S (Tornado) | ||
37 | 9005:0285:9005:0298 Adaptec 4000SAS (BlackBird) | ||
38 | 9005:0285:9005:0297 Adaptec 4005SAS (AvonPark) | 34 | 9005:0285:9005:0297 Adaptec 4005SAS (AvonPark) |
35 | 9005:0285:9005:0298 Adaptec 4000SAS (BlackBird) | ||
39 | 9005:0285:9005:0299 Adaptec 4800SAS (Marauder-X) | 36 | 9005:0285:9005:0299 Adaptec 4800SAS (Marauder-X) |
40 | 9005:0285:9005:029a Adaptec 4805SAS (Marauder-E) | 37 | 9005:0285:9005:029a Adaptec 4805SAS (Marauder-E) |
38 | 9005:0286:9005:029b Adaptec 2820SA (Intruder) | ||
39 | 9005:0286:9005:029c Adaptec 2620SA (Intruder) | ||
40 | 9005:0286:9005:029d Adaptec 2420SA (Intruder HP release) | ||
41 | 9005:0286:9005:02a2 Adaptec 3800SAS (Hurricane44) | 41 | 9005:0286:9005:02a2 Adaptec 3800SAS (Hurricane44) |
42 | 9005:0286:9005:02a7 Adaptec 3805SAS (Hurricane80) | ||
43 | 9005:0286:9005:02a8 Adaptec 3400SAS (Hurricane40) | ||
44 | 9005:0286:9005:02ac Adaptec 1800SAS (Typhoon44) | ||
45 | 9005:0286:9005:02b3 Adaptec 2400SAS (Hurricane40lm) | ||
46 | 9005:0285:9005:02b5 Adaptec ASR5800 (Voodoo44) | ||
47 | 9005:0285:9005:02b6 Adaptec ASR5805 (Voodoo80) | ||
48 | 9005:0285:9005:02b7 Adaptec ASR5808 (Voodoo08) | ||
42 | 1011:0046:9005:0364 Adaptec 5400S (Mustang) | 49 | 1011:0046:9005:0364 Adaptec 5400S (Mustang) |
43 | 1011:0046:9005:0365 Adaptec 5400S (Mustang) | 50 | 1011:0046:9005:0365 Adaptec 5400S (Mustang) |
44 | 9005:0283:9005:0283 Adaptec Catapult (3210S with arc firmware) | ||
45 | 9005:0284:9005:0284 Adaptec Tomcat (3410S with arc firmware) | ||
46 | 9005:0287:9005:0800 Adaptec Themisto (Jupiter) | 51 | 9005:0287:9005:0800 Adaptec Themisto (Jupiter) |
47 | 9005:0200:9005:0200 Adaptec Themisto (Jupiter) | 52 | 9005:0200:9005:0200 Adaptec Themisto (Jupiter) |
48 | 9005:0286:9005:0800 Adaptec Callisto (Jupiter) | 53 | 9005:0286:9005:0800 Adaptec Callisto (Jupiter) |
@@ -64,18 +69,20 @@ Supported Cards/Chipsets | |||
64 | 9005:0285:9005:0290 IBM ServeRAID 7t (Jaguar) | 69 | 9005:0285:9005:0290 IBM ServeRAID 7t (Jaguar) |
65 | 9005:0285:1014:02F2 IBM ServeRAID 8i (AvonPark) | 70 | 9005:0285:1014:02F2 IBM ServeRAID 8i (AvonPark) |
66 | 9005:0285:1014:0312 IBM ServeRAID 8i (AvonParkLite) | 71 | 9005:0285:1014:0312 IBM ServeRAID 8i (AvonParkLite) |
67 | 9005:0286:1014:9580 IBM ServeRAID 8k/8k-l8 (Aurora) | ||
68 | 9005:0286:1014:9540 IBM ServeRAID 8k/8k-l4 (AuroraLite) | 72 | 9005:0286:1014:9540 IBM ServeRAID 8k/8k-l4 (AuroraLite) |
69 | 9005:0286:9005:029f ICP ICP9014R0 (Lancer) | 73 | 9005:0286:1014:9580 IBM ServeRAID 8k/8k-l8 (Aurora) |
74 | 9005:0286:1014:034d IBM ServeRAID 8s (Hurricane) | ||
70 | 9005:0286:9005:029e ICP ICP9024R0 (Lancer) | 75 | 9005:0286:9005:029e ICP ICP9024R0 (Lancer) |
76 | 9005:0286:9005:029f ICP ICP9014R0 (Lancer) | ||
71 | 9005:0286:9005:02a0 ICP ICP9047MA (Lancer) | 77 | 9005:0286:9005:02a0 ICP ICP9047MA (Lancer) |
72 | 9005:0286:9005:02a1 ICP ICP9087MA (Lancer) | 78 | 9005:0286:9005:02a1 ICP ICP9087MA (Lancer) |
79 | 9005:0286:9005:02a3 ICP ICP5445AU (Hurricane44) | ||
73 | 9005:0286:9005:02a4 ICP ICP9085LI (Marauder-X) | 80 | 9005:0286:9005:02a4 ICP ICP9085LI (Marauder-X) |
74 | 9005:0286:9005:02a5 ICP ICP5085BR (Marauder-E) | 81 | 9005:0286:9005:02a5 ICP ICP5085BR (Marauder-E) |
75 | 9005:0286:9005:02a3 ICP ICP5445AU (Hurricane44) | ||
76 | 9005:0286:9005:02a6 ICP ICP9067MA (Intruder-6) | 82 | 9005:0286:9005:02a6 ICP ICP9067MA (Intruder-6) |
77 | 9005:0286:9005:02a9 ICP ICP5087AU (Skyray) | 83 | 9005:0286:9005:02a9 ICP ICP5085AU (Hurricane80) |
78 | 9005:0286:9005:02aa ICP ICP5047AU (Skyray) | 84 | 9005:0286:9005:02aa ICP ICP5045AU (Hurricane40) |
85 | 9005:0286:9005:02b4 ICP ICP5045AL (Hurricane40lm) | ||
79 | 86 | ||
80 | People | 87 | People |
81 | ------------------------- | 88 | ------------------------- |
diff --git a/Documentation/scsi/arcmsr_spec.txt b/Documentation/scsi/arcmsr_spec.txt new file mode 100644 index 000000000000..5e0042340fd3 --- /dev/null +++ b/Documentation/scsi/arcmsr_spec.txt | |||
@@ -0,0 +1,574 @@ | |||
1 | ******************************************************************************* | ||
2 | ** ARECA FIRMWARE SPEC | ||
3 | ******************************************************************************* | ||
4 | ** Usage of IOP331 adapter | ||
5 | ** (All In/Out is in IOP331's view) | ||
6 | ** 1. Message 0 --> InitThread message and return code | ||
7 | ** 2. Doorbell is used for RS-232 emulation | ||
8 | ** inDoorBell : bit0 -- data in ready | ||
9 | ** (DRIVER DATA WRITE OK) | ||
10 | ** bit1 -- data out has been read | ||
11 | ** (DRIVER DATA READ OK) | ||
12 | ** outDoorBell: bit0 -- data out ready | ||
13 | ** (IOP331 DATA WRITE OK) | ||
14 | ** bit1 -- data in has been read | ||
15 | ** (IOP331 DATA READ OK) | ||
16 | ** 3. Index Memory Usage | ||
17 | ** offset 0xf00 : for RS232 out (request buffer) | ||
18 | ** offset 0xe00 : for RS232 in (scratch buffer) | ||
19 | ** offset 0xa00 : for inbound message code message_rwbuffer | ||
20 | ** (driver send to IOP331) | ||
21 | ** offset 0xa00 : for outbound message code message_rwbuffer | ||
22 | ** (IOP331 send to driver) | ||
23 | ** 4. RS-232 emulation | ||
24 | ** Currently 128 byte buffer is used | ||
25 | ** 1st uint32_t : Data length (1--124) | ||
26 | ** Byte 4--127 : Max 124 bytes of data | ||
27 | ** 5. PostQ | ||
28 | ** All SCSI Command must be sent through postQ: | ||
29 | ** (inbound queue port) Request frame must be 32 bytes aligned | ||
30 | ** #bit27--bit31 => flag for post ccb | ||
31 | ** #bit0--bit26 => real address (bit27--bit31) of post arcmsr_cdb | ||
32 | ** bit31 : | ||
33 | ** 0 : 256 bytes frame | ||
34 | ** 1 : 512 bytes frame | ||
35 | ** bit30 : | ||
36 | ** 0 : normal request | ||
37 | ** 1 : BIOS request | ||
38 | ** bit29 : reserved | ||
39 | ** bit28 : reserved | ||
40 | ** bit27 : reserved | ||
41 | ** --------------------------------------------------------------------------- | ||
42 | ** (outbound queue port) Request reply | ||
43 | ** #bit27--bit31 | ||
44 | ** => flag for reply | ||
45 | ** #bit0--bit26 | ||
46 | ** => real address (bit27--bit31) of reply arcmsr_cdb | ||
47 | ** bit31 : must be 0 (for this type of reply) | ||
48 | ** bit30 : reserved for BIOS handshake | ||
49 | ** bit29 : reserved | ||
50 | ** bit28 : | ||
51 | ** 0 : no error, ignore AdapStatus/DevStatus/SenseData | ||
52 | ** 1 : Error, error code in AdapStatus/DevStatus/SenseData | ||
53 | ** bit27 : reserved | ||
54 | ** 6. BIOS request | ||
55 | ** All BIOS requests are the same as requests from PostQ | ||
56 | ** Except : | ||
57 | ** Request frame is sent from configuration space | ||
58 | ** offset: 0x78 : Request Frame (bit30 == 1) | ||
59 | ** offset: 0x18 : writeonly to generate | ||
60 | ** IRQ to IOP331 | ||
61 | ** Completion of request: | ||
62 | ** (bit30 == 0, bit28==err flag) | ||
63 | ** 7. Definition of SGL entry (structure) | ||
64 | ** 8. Message1 Out - Diag Status Code (????) | ||
65 | ** 9. Message0 message code : | ||
66 | ** 0x00 : NOP | ||
67 | ** 0x01 : Get Config | ||
68 | ** ->offset 0xa00 :for outbound message code message_rwbuffer | ||
69 | ** (IOP331 send to driver) | ||
70 | ** Signature 0x87974060(4) | ||
71 | ** Request len 0x00000200(4) | ||
72 | ** numbers of queue 0x00000100(4) | ||
73 | ** SDRAM Size 0x00000100(4)-->256 MB | ||
74 | ** IDE Channels 0x00000008(4) | ||
75 | ** vendor 40 bytes char | ||
76 | ** model 8 bytes char | ||
77 | ** FirmVer 16 bytes char | ||
78 | ** Device Map 16 bytes char | ||
79 | ** FirmwareVersion DWORD <== Added for checking of | ||
80 | ** new firmware capability | ||
81 | ** 0x02 : Set Config | ||
82 | ** ->offset 0xa00 :for inbound message code message_rwbuffer | ||
83 | ** (driver send to IOP331) | ||
84 | ** Signature 0x87974063(4) | ||
85 | ** UPPER32 of Request Frame (4)-->Driver Only | ||
86 | ** 0x03 : Reset (Abort all queued Command) | ||
87 | ** 0x04 : Stop Background Activity | ||
88 | ** 0x05 : Flush Cache | ||
89 | ** 0x06 : Start Background Activity | ||
90 | ** (re-start if background is halted) | ||
91 | ** 0x07 : Check If Host Command Pending | ||
92 | ** (Novell May Need This Function) | ||
93 | ** 0x08 : Set controller time | ||
94 | ** ->offset 0xa00 : for inbound message code message_rwbuffer | ||
95 | ** (driver to IOP331) | ||
96 | ** byte 0 : 0xaa <-- signature | ||
97 | ** byte 1 : 0x55 <-- signature | ||
98 | ** byte 2 : year (04) | ||
99 | ** byte 3 : month (1..12) | ||
100 | ** byte 4 : date (1..31) | ||
101 | ** byte 5 : hour (0..23) | ||
102 | ** byte 6 : minute (0..59) | ||
103 | ** byte 7 : second (0..59) | ||
104 | ******************************************************************************* | ||
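The PostQ word layout in item 5 above can be sketched as follows. This is one possible reading of the spec, not the driver's code: it assumes that, because request frames are 32-byte aligned, bit0--bit26 carry the frame's 32-bit address shifted right by 5, with the flags in bit27--bit31. All macro and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define POSTQ_FLAG_512B_FRAME (1u << 31)  /* bit31: 0 = 256-byte, 1 = 512-byte frame */
#define POSTQ_FLAG_BIOS       (1u << 30)  /* bit30: 0 = normal, 1 = BIOS request */
#define REPLY_FLAG_ERROR      (1u << 28)  /* bit28 of a reply: error fields valid */
#define POSTQ_ADDR_MASK       0x07ffffffu /* bit0--bit26 */

/* Pack a 32-byte-aligned frame address and flag bits into one PostQ word.
 * ASSUMPTION: the address is stored shifted right by 5. */
uint32_t postq_encode(uint32_t frame_addr, uint32_t flags)
{
	assert((frame_addr & 0x1fu) == 0);          /* 32-byte alignment */
	assert((flags & POSTQ_ADDR_MASK) == 0);     /* flags live in bit27--bit31 */
	return (frame_addr >> 5) | flags;
}

/* Recover the frame address from a word read off the outbound queue port. */
uint32_t reply_frame_addr(uint32_t reply)
{
	return (reply & POSTQ_ADDR_MASK) << 5;
}

/* Non-zero when AdapStatus/DevStatus/SenseData should be examined. */
int reply_has_error(uint32_t reply)
{
	return (reply & REPLY_FLAG_ERROR) != 0;
}
```

Under these assumptions, a 512-byte frame at address 0x12345660 would be posted as 0x8091A2B3.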
105 | ******************************************************************************* | ||
106 | ** RS-232 Interface for Areca Raid Controller | ||
107 | ** The low level command interface is exclusive with VT100 terminal | ||
108 | ** -------------------------------------------------------------------- | ||
109 | ** 1. Sequence of command execution | ||
110 | ** -------------------------------------------------------------------- | ||
111 | ** (A) Header : 3 bytes sequence (0x5E, 0x01, 0x61) | ||
112 | ** (B) Command block : variable length of data including length, | ||
113 | ** command code, data and checksum byte | ||
114 | ** (C) Return data : variable length of data | ||
115 | ** -------------------------------------------------------------------- | ||
116 | ** 2. Command block | ||
117 | ** -------------------------------------------------------------------- | ||
118 | ** (A) 1st byte : command block length (low byte) | ||
119 | ** (B) 2nd byte : command block length (high byte) | ||
120 | ** note: command block length must not exceed 2040 bytes; | ||
121 | ** length excludes these two bytes | ||
122 | ** (C) 3rd byte : command code | ||
123 | ** (D) 4th and following bytes : variable length data bytes | ||
124 | ** depends on command code | ||
125 | ** (E) last byte : checksum byte (sum of 1st byte until last data byte) | ||
126 | ** -------------------------------------------------------------------- | ||
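The framing in sections 1 and 2 above can be sketched as a small builder. The spec does not say whether the length field counts the checksum byte, so this sketch ASSUMES it counts only the command code plus data bytes; that assumption should be checked against real firmware before use. The function name `rs232_build_command()` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define RS232_MAX_BLOCK_LEN 2040

/* Writes header + command block into out; returns bytes written, 0 on error. */
size_t rs232_build_command(uint8_t *out, size_t out_size, uint8_t code,
			   const uint8_t *data, size_t data_len)
{
	static const uint8_t header[3] = { 0x5E, 0x01, 0x61 };
	size_t block_len = 1 + data_len;   /* ASSUMPTION: code + data, no checksum */
	size_t total = sizeof(header) + 2 + block_len + 1;
	uint8_t sum = 0;
	size_t i;

	assert(data_len == 0 || data != NULL);
	if (block_len > RS232_MAX_BLOCK_LEN || total > out_size)
		return 0;

	memcpy(out, header, sizeof(header));
	out[3] = (uint8_t)(block_len & 0xff);   /* 1st byte: length, low byte */
	out[4] = (uint8_t)(block_len >> 8);     /* 2nd byte: length, high byte */
	out[5] = code;                          /* 3rd byte: command code */
	if (data_len)
		memcpy(out + 6, data, data_len);

	/* Checksum: 8-bit sum from the 1st length byte to the last data byte. */
	for (i = 3; i < total - 1; i++)
		sum = (uint8_t)(sum + out[i]);
	out[total - 1] = sum;
	return total;
}
```

Under that assumption, a command with code 0x13 and no data would be sent as the seven bytes 5E 01 61 01 00 13 14.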
127 | ** 3. Command code and associated data | ||
128 | ** -------------------------------------------------------------------- | ||
129 | ** The following are the command codes defined in the raid controller. | ||
130 | ** Command codes 0x10--0x1? are used for system level management; | ||
131 | ** no password checking is needed, and they should be implemented in a | ||
132 | ** separate, well controlled utility and not exposed for end user access. | ||
133 | ** Command codes 0x20--0x?? always check the password; | ||
134 | ** the password must be entered to enable these commands. | ||
135 | ** enum | ||
136 | ** { | ||
137 | ** GUI_SET_SERIAL=0x10, | ||
138 | ** GUI_SET_VENDOR, | ||
139 | ** GUI_SET_MODEL, | ||
140 | ** GUI_IDENTIFY, | ||
141 | ** GUI_CHECK_PASSWORD, | ||
142 | ** GUI_LOGOUT, | ||
143 | ** GUI_HTTP, | ||
144 | ** GUI_SET_ETHERNET_ADDR, | ||
145 | ** GUI_SET_LOGO, | ||
146 | ** GUI_POLL_EVENT, | ||
147 | ** GUI_GET_EVENT, | ||
148 | ** GUI_GET_HW_MONITOR, | ||
149 | ** // GUI_QUICK_CREATE=0x20, (function removed) | ||
150 | ** GUI_GET_INFO_R=0x20, | ||
151 | ** GUI_GET_INFO_V, | ||
152 | ** GUI_GET_INFO_P, | ||
153 | ** GUI_GET_INFO_S, | ||
154 | ** GUI_CLEAR_EVENT, | ||
155 | ** GUI_MUTE_BEEPER=0x30, | ||
156 | ** GUI_BEEPER_SETTING, | ||
157 | ** GUI_SET_PASSWORD, | ||
158 | ** GUI_HOST_INTERFACE_MODE, | ||
159 | ** GUI_REBUILD_PRIORITY, | ||
160 | ** GUI_MAX_ATA_MODE, | ||
161 | ** GUI_RESET_CONTROLLER, | ||
162 | ** GUI_COM_PORT_SETTING, | ||
163 | ** GUI_NO_OPERATION, | ||
164 | ** GUI_DHCP_IP, | ||
165 | ** GUI_CREATE_PASS_THROUGH=0x40, | ||
166 | ** GUI_MODIFY_PASS_THROUGH, | ||
167 | ** GUI_DELETE_PASS_THROUGH, | ||
168 | ** GUI_IDENTIFY_DEVICE, | ||
169 | ** GUI_CREATE_RAIDSET=0x50, | ||
170 | ** GUI_DELETE_RAIDSET, | ||
171 | ** GUI_EXPAND_RAIDSET, | ||
172 | ** GUI_ACTIVATE_RAIDSET, | ||
173 | ** GUI_CREATE_HOT_SPARE, | ||
174 | ** GUI_DELETE_HOT_SPARE, | ||
175 | ** GUI_CREATE_VOLUME=0x60, | ||
176 | ** GUI_MODIFY_VOLUME, | ||
177 | ** GUI_DELETE_VOLUME, | ||
178 | ** GUI_START_CHECK_VOLUME, | ||
179 | ** GUI_STOP_CHECK_VOLUME | ||
180 | ** }; | ||
181 | ** Command description : | ||
182 | ** GUI_SET_SERIAL : Set the controller serial# | ||
183 | ** byte 0,1 : length | ||
184 | ** byte 2 : command code 0x10 | ||
185 | ** byte 3 : password length (should be 0x0f) | ||
186 | ** byte 4-0x13 : should be "ArEcATecHnoLogY" | ||
187 | ** byte 0x14--0x23 : Serial number string (must be 16 bytes) | ||
188 | ** GUI_SET_VENDOR : Set vendor string for the controller | ||
189 | ** byte 0,1 : length | ||
190 | ** byte 2 : command code 0x11 | ||
191 | ** byte 3 : password length (should be 0x08) | ||
192 | ** byte 4-0x13 : should be "ArEcAvAr" | ||
193 | ** byte 0x14--0x3B : vendor string (must be 40 bytes) | ||
194 | ** GUI_SET_MODEL : Set the model name of the controller | ||
195 | ** byte 0,1 : length | ||
196 | ** byte 2 : command code 0x12 | ||
197 | ** byte 3 : password length (should be 0x08) | ||
198 | ** byte 4-0x13 : should be "ArEcAvAr" | ||
199 | ** byte 0x14--0x1B : model string (must be 8 bytes) | ||
200 | ** GUI_IDENTIFY : Identify device | ||
201 | ** byte 0,1 : length | ||
202 | ** byte 2 : command code 0x13 | ||
203 | ** return "Areca RAID Subsystem " | ||
204 | ** GUI_CHECK_PASSWORD : Verify password | ||
205 | ** byte 0,1 : length | ||
206 | ** byte 2 : command code 0x14 | ||
207 | ** byte 3 : password length | ||
208 | ** byte 4-0x?? : user password to be checked | ||
209 | ** GUI_LOGOUT : Logout GUI (force password checking on next command) | ||
210 | ** byte 0,1 : length | ||
211 | ** byte 2 : command code 0x15 | ||
212 | ** GUI_HTTP : HTTP interface (reserved for Http proxy service)(0x16) | ||
213 | ** | ||
214 | ** GUI_SET_ETHERNET_ADDR : Set the ethernet MAC address | ||
215 | ** byte 0,1 : length | ||
216 | ** byte 2 : command code 0x17 | ||
217 | ** byte 3 : password length (should be 0x08) | ||
218 | ** byte 4-0x13 : should be "ArEcAvAr" | ||
219 | ** byte 0x14--0x19 : Ethernet MAC address (must be 6 bytes) | ||
220 | ** GUI_SET_LOGO : Set logo in HTTP | ||
221 | ** byte 0,1 : length | ||
222 | ** byte 2 : command code 0x18 | ||
223 | ** byte 3 : Page# (0/1/2/3) (0xff --> clear OEM logo) | ||
224 | ** byte 4/5/6/7 : 0x55/0xaa/0xa5/0x5a | ||
225 | ** byte 8 : TITLE.JPG data (each page must be 2000 bytes) | ||
226 | ** note page0 1st 2 byte must be | ||
227 | ** actual length of the JPG file | ||
228 | ** GUI_POLL_EVENT : Poll If Event Log Changed | ||
229 | ** byte 0,1 : length | ||
230 | ** byte 2 : command code 0x19 | ||
231 | ** GUI_GET_EVENT : Read Event | ||
232 | ** byte 0,1 : length | ||
233 | ** byte 2 : command code 0x1a | ||
234 | ** byte 3 : Event Page (0:1st page/1/2/3:last page) | ||
235 | ** GUI_GET_HW_MONITOR : Get HW monitor data | ||
236 | ** byte 0,1 : length | ||
237 | ** byte 2 : command code 0x1b | ||
238 | ** byte 3 : # of FANs(example 2) | ||
239 | ** byte 4 : # of Voltage sensor(example 3) | ||
240 | ** byte 5 : # of temperature sensor(example 2) | ||
241 | ** byte 6 : # of power | ||
242 | ** byte 7/8 : Fan#0 (RPM) | ||
243 | ** byte 9/10 : Fan#1 | ||
244 | ** byte 11/12 : Voltage#0 original value in *1000 | ||
245 | ** byte 13/14 : Voltage#0 value | ||
246 | ** byte 15/16 : Voltage#1 org | ||
247 | ** byte 17/18 : Voltage#1 | ||
248 | ** byte 19/20 : Voltage#2 org | ||
249 | ** byte 21/22 : Voltage#2 | ||
250 | ** byte 23 : Temp#0 | ||
251 | ** byte 24 : Temp#1 | ||
252 | ** byte 25 : Power indicator (bit0 : power#0, | ||
253 | ** bit1 : power#1) | ||
254 | ** byte 26 : UPS indicator | ||
255 | ** GUI_QUICK_CREATE : Quick create raid/volume set | ||
256 | ** byte 0,1 : length | ||
257 | ** byte 2 : command code 0x20 | ||
258 | ** byte 3/4/5/6 : raw capacity | ||
259 | ** byte 7 : raid level | ||
260 | ** byte 8 : stripe size | ||
261 | ** byte 9 : spare | ||
262 | ** byte 10/11/12/13: device mask (the devices to create raid/volume) | ||
263 | ** This function is removed; applications that want | ||
264 | ** to implement a quick create function | ||
265 | ** need to use the GUI_CREATE_RAIDSET and GUI_CREATE_VOLUMESET functions. | ||
266 | ** GUI_GET_INFO_R : Get Raid Set Information | ||
267 | ** byte 0,1 : length | ||
268 | ** byte 2 : command code 0x20 | ||
269 | ** byte 3 : raidset# | ||
270 | ** typedef struct sGUI_RAIDSET | ||
271 | ** { | ||
272 | ** BYTE grsRaidSetName[16]; | ||
273 | ** DWORD grsCapacity; | ||
274 | ** DWORD grsCapacityX; | ||
275 | ** DWORD grsFailMask; | ||
276 | ** BYTE grsDevArray[32]; | ||
277 | ** BYTE grsMemberDevices; | ||
278 | ** BYTE grsNewMemberDevices; | ||
279 | ** BYTE grsRaidState; | ||
280 | ** BYTE grsVolumes; | ||
281 | ** BYTE grsVolumeList[16]; | ||
282 | ** BYTE grsRes1; | ||
283 | ** BYTE grsRes2; | ||
284 | ** BYTE grsRes3; | ||
285 | ** BYTE grsFreeSegments; | ||
286 | ** DWORD grsRawStripes[8]; | ||
287 | ** DWORD grsRes4; | ||
288 | ** DWORD grsRes5; // Total to 128 bytes | ||
289 | ** DWORD grsRes6; // Total to 128 bytes | ||
290 | ** } sGUI_RAIDSET, *pGUI_RAIDSET; | ||
291 | ** GUI_GET_INFO_V : Get Volume Set Information | ||
292 | ** byte 0,1 : length | ||
293 | ** byte 2 : command code 0x21 | ||
294 | ** byte 3 : volumeset# | ||
295 | ** typedef struct sGUI_VOLUMESET | ||
296 | ** { | ||
297 | ** BYTE gvsVolumeName[16]; // 16 | ||
298 | ** DWORD gvsCapacity; | ||
299 | ** DWORD gvsCapacityX; | ||
300 | ** DWORD gvsFailMask; | ||
301 | ** DWORD gvsStripeSize; | ||
302 | ** DWORD gvsNewFailMask; | ||
303 | ** DWORD gvsNewStripeSize; | ||
304 | ** DWORD gvsVolumeStatus; | ||
305 | ** DWORD gvsProgress; // 32 | ||
306 | ** sSCSI_ATTR gvsScsi; | ||
307 | ** BYTE gvsMemberDisks; | ||
308 | ** BYTE gvsRaidLevel; // 8 | ||
309 | ** BYTE gvsNewMemberDisks; | ||
310 | ** BYTE gvsNewRaidLevel; | ||
311 | ** BYTE gvsRaidSetNumber; | ||
312 | ** BYTE gvsRes0; // 4 | ||
313 | ** BYTE gvsRes1[4]; // 64 bytes | ||
314 | ** } sGUI_VOLUMESET, *pGUI_VOLUMESET; | ||
315 | ** GUI_GET_INFO_P : Get Physical Drive Information | ||
316 | ** byte 0,1 : length | ||
317 | ** byte 2 : command code 0x22 | ||
318 | ** byte 3 : drive # (from 0 to max-channels - 1) | ||
319 | ** typedef struct sGUI_PHY_DRV | ||
320 | ** { | ||
321 | ** BYTE gpdModelName[40]; | ||
322 | ** BYTE gpdSerialNumber[20]; | ||
323 | ** BYTE gpdFirmRev[8]; | ||
324 | ** DWORD gpdCapacity; | ||
325 | ** DWORD gpdCapacityX; // Reserved for expansion | ||
326 | ** BYTE gpdDeviceState; | ||
327 | ** BYTE gpdPioMode; | ||
328 | ** BYTE gpdCurrentUdmaMode; | ||
329 | ** BYTE gpdUdmaMode; | ||
330 | ** BYTE gpdDriveSelect; | ||
331 | ** BYTE gpdRaidNumber; // 0xff if not belongs to a raid set | ||
332 | ** sSCSI_ATTR gpdScsi; | ||
333 | ** BYTE gpdReserved[40]; // Total to 128 bytes | ||
334 | ** } sGUI_PHY_DRV, *pGUI_PHY_DRV; | ||
335 | ** GUI_GET_INFO_S : Get System Information | ||
336 | ** byte 0,1 : length | ||
337 | ** byte 2 : command code 0x23 | ||
338 | ** typedef struct sCOM_ATTR | ||
339 | ** { | ||
340 | ** BYTE comBaudRate; | ||
341 | ** BYTE comDataBits; | ||
342 | ** BYTE comStopBits; | ||
343 | ** BYTE comParity; | ||
344 | ** BYTE comFlowControl; | ||
345 | ** } sCOM_ATTR, *pCOM_ATTR; | ||
346 | ** typedef struct sSYSTEM_INFO | ||
347 | ** { | ||
348 | ** BYTE gsiVendorName[40]; | ||
349 | ** BYTE gsiSerialNumber[16]; | ||
350 | ** BYTE gsiFirmVersion[16]; | ||
351 | ** BYTE gsiBootVersion[16]; | ||
352 | ** BYTE gsiMbVersion[16]; | ||
353 | ** BYTE gsiModelName[8]; | ||
354 | ** BYTE gsiLocalIp[4]; | ||
355 | ** BYTE gsiCurrentIp[4]; | ||
356 | ** DWORD gsiTimeTick; | ||
357 | ** DWORD gsiCpuSpeed; | ||
358 | ** DWORD gsiICache; | ||
359 | ** DWORD gsiDCache; | ||
360 | ** DWORD gsiScache; | ||
361 | ** DWORD gsiMemorySize; | ||
362 | ** DWORD gsiMemorySpeed; | ||
363 | ** DWORD gsiEvents; | ||
364 | ** BYTE gsiMacAddress[6]; | ||
365 | ** BYTE gsiDhcp; | ||
366 | ** BYTE gsiBeeper; | ||
367 | ** BYTE gsiChannelUsage; | ||
368 | ** BYTE gsiMaxAtaMode; | ||
369 | ** BYTE gsiSdramEcc; // 1:if ECC enabled | ||
370 | ** BYTE gsiRebuildPriority; | ||
371 | ** sCOM_ATTR gsiComA; // 5 bytes | ||
372 | ** sCOM_ATTR gsiComB; // 5 bytes | ||
373 | ** BYTE gsiIdeChannels; | ||
374 | ** BYTE gsiScsiHostChannels; | ||
375 | ** BYTE gsiIdeHostChannels; | ||
376 | ** BYTE gsiMaxVolumeSet; | ||
377 | ** BYTE gsiMaxRaidSet; | ||
378 | ** BYTE gsiEtherPort; // 1:if ether net port supported | ||
379 | ** BYTE gsiRaid6Engine; // 1:Raid6 engine supported | ||
380 | ** BYTE gsiRes[75]; | ||
381 | ** } sSYSTEM_INFO, *pSYSTEM_INFO; | ||
382 | ** GUI_CLEAR_EVENT : Clear System Event | ||
383 | ** byte 0,1 : length | ||
384 | ** byte 2 : command code 0x24 | ||
385 | ** GUI_MUTE_BEEPER : Mute current beeper | ||
386 | ** byte 0,1 : length | ||
387 | ** byte 2 : command code 0x30 | ||
388 | ** GUI_BEEPER_SETTING : Disable beeper | ||
389 | ** byte 0,1 : length | ||
390 | ** byte 2 : command code 0x31 | ||
391 | ** byte 3 : 0->disable, 1->enable | ||
392 | ** GUI_SET_PASSWORD : Change password | ||
393 | ** byte 0,1 : length | ||
394 | ** byte 2 : command code 0x32 | ||
395 | ** byte 3 : password length (must be <= 15) | ||
396 | ** byte 4 : password (must be alpha-numerical) | ||
397 | ** GUI_HOST_INTERFACE_MODE : Set host interface mode | ||
398 | ** byte 0,1 : length | ||
399 | ** byte 2 : command code 0x33 | ||
400 | ** byte 3 : 0->Independent, 1->cluster | ||
401 | ** GUI_REBUILD_PRIORITY : Set rebuild priority | ||
402 | ** byte 0,1 : length | ||
403 | ** byte 2 : command code 0x34 | ||
404 | ** byte 3 : 0/1/2/3 (low->high) | ||
405 | ** GUI_MAX_ATA_MODE : Set maximum ATA mode to be used | ||
406 | ** byte 0,1 : length | ||
407 | ** byte 2 : command code 0x35 | ||
408 | ** byte 3 : 0/1/2/3 (133/100/66/33) | ||
409 | ** GUI_RESET_CONTROLLER : Reset Controller | ||
410 | ** byte 0,1 : length | ||
411 | ** byte 2 : command code 0x36 | ||
412 | ** *Response with VT100 screen (discard it) | ||
413 | ** GUI_COM_PORT_SETTING : COM port setting | ||
414 | ** byte 0,1 : length | ||
415 | ** byte 2 : command code 0x37 | ||
416 | ** byte 3 : 0->COMA (term port), | ||
417 | ** 1->COMB (debug port) | ||
418 | ** byte 4 : 0/1/2/3/4/5/6/7 | ||
419 | ** (1200/2400/4800/9600/19200/38400/57600/115200) | ||
420 | ** byte 5 : data bit | ||
421 | ** (0:7 bit, 1:8 bit : must be 8 bit) | ||
422 | ** byte 6 : stop bit (0:1, 1:2 stop bits) | ||
423 | ** byte 7 : parity (0:none, 1:odd, 2:even) | ||
424 | ** byte 8 : flow control | ||
425 | ** (0:none, 1:xon/xoff, 2:hardware => must use none) | ||
426 | ** GUI_NO_OPERATION : No operation | ||
427 | ** byte 0,1 : length | ||
428 | ** byte 2 : command code 0x38 | ||
429 | ** GUI_DHCP_IP : Set DHCP option and local IP address | ||
430 | ** byte 0,1 : length | ||
431 | ** byte 2 : command code 0x39 | ||
432 | ** byte 3 : 0:dhcp disabled, 1:dhcp enabled | ||
433 | ** byte 4/5/6/7 : IP address | ||
434 | ** GUI_CREATE_PASS_THROUGH : Create pass through disk | ||
435 | ** byte 0,1 : length | ||
436 | ** byte 2 : command code 0x40 | ||
437 | ** byte 3 : device # | ||
438 | ** byte 4 : scsi channel (0/1) | ||
439 | ** byte 5 : scsi id (0-->15) | ||
440 | ** byte 6 : scsi lun (0-->7) | ||
441 | ** byte 7 : tagged queue (1 : enabled) | ||
442 | ** byte 8 : cache mode (1 : enabled) | ||
443 | ** byte 9 : max speed (0/1/2/3/4, | ||
444 | ** async/20/40/80/160 for scsi) | ||
445 | ** (0/1/2/3/4, 33/66/100/133/150 for ide ) | ||
446 | ** GUI_MODIFY_PASS_THROUGH : Modify pass through disk | ||
447 | ** byte 0,1 : length | ||
448 | ** byte 2 : command code 0x41 | ||
449 | ** byte 3 : device # | ||
450 | ** byte 4 : scsi channel (0/1) | ||
451 | ** byte 5 : scsi id (0-->15) | ||
452 | ** byte 6 : scsi lun (0-->7) | ||
453 | ** byte 7 : tagged queue (1 : enabled) | ||
454 | ** byte 8 : cache mode (1 : enabled) | ||
455 | ** byte 9 : max speed (0/1/2/3/4, | ||
456 | ** async/20/40/80/160 for scsi) | ||
457 | ** (0/1/2/3/4, 33/66/100/133/150 for ide ) | ||
458 | ** GUI_DELETE_PASS_THROUGH : Delete pass through disk | ||
459 | ** byte 0,1 : length | ||
460 | ** byte 2 : command code 0x42 | ||
461 | ** byte 3 : device# to be deleted | ||
462 | ** GUI_IDENTIFY_DEVICE : Identify Device | ||
463 | ** byte 0,1 : length | ||
464 | ** byte 2 : command code 0x43 | ||
465 | ** byte 3 : Flash Method | ||
466 | ** (0:flash selected, 1:flash not selected) | ||
467 | ** byte 4/5/6/7 : IDE device mask to be flashed | ||
468 | ** note .... no response data available | ||
469 | ** GUI_CREATE_RAIDSET : Create Raid Set | ||
470 | ** byte 0,1 : length | ||
471 | ** byte 2 : command code 0x50 | ||
472 | ** byte 3/4/5/6 : device mask | ||
473 | ** byte 7-22 : raidset name (if byte 7 == 0:use default) | ||
474 | ** GUI_DELETE_RAIDSET : Delete Raid Set | ||
475 | ** byte 0,1 : length | ||
476 | ** byte 2 : command code 0x51 | ||
477 | ** byte 3 : raidset# | ||
478 | ** GUI_EXPAND_RAIDSET : Expand Raid Set | ||
479 | ** byte 0,1 : length | ||
480 | ** byte 2 : command code 0x52 | ||
481 | ** byte 3 : raidset# | ||
482 | ** byte 4/5/6/7 : device mask for expansion | ||
483 | ** byte 8/9/10 : (8:0 no change, 1 change, 0xff:terminate, | ||
484 | ** 9:new raid level, | ||
485 | ** 10:new stripe size | ||
486 | ** 0/1/2/3/4/5->4/8/16/32/64/128K ) | ||
487 | ** byte 11/12/13 : repeat for each volume in the raidset | ||
488 | ** GUI_ACTIVATE_RAIDSET : Activate incomplete raid set | ||
489 | ** byte 0,1 : length | ||
490 | ** byte 2 : command code 0x53 | ||
491 | ** byte 3 : raidset# | ||
492 | ** GUI_CREATE_HOT_SPARE : Create hot spare disk | ||
493 | ** byte 0,1 : length | ||
494 | ** byte 2 : command code 0x54 | ||
495 | ** byte 3/4/5/6 : device mask for hot spare creation | ||
496 | ** GUI_DELETE_HOT_SPARE : Delete hot spare disk | ||
497 | ** byte 0,1 : length | ||
498 | ** byte 2 : command code 0x55 | ||
499 | ** byte 3/4/5/6 : device mask for hot spare deletion | ||
500 | ** GUI_CREATE_VOLUME : Create volume set | ||
501 | ** byte 0,1 : length | ||
502 | ** byte 2 : command code 0x60 | ||
503 | ** byte 3 : raidset# | ||
504 | ** byte 4-19 : volume set name | ||
505 | ** (if byte4 == 0, use default) | ||
506 | ** byte 20-27 : volume capacity (blocks) | ||
507 | ** byte 28 : raid level | ||
508 | ** byte 29 : stripe size | ||
509 | ** (0/1/2/3/4/5->4/8/16/32/64/128K) | ||
510 | ** byte 30 : channel | ||
511 | ** byte 31 : ID | ||
512 | ** byte 32 : LUN | ||
513 | ** byte 33 : 1 enable tag | ||
514 | ** byte 34 : 1 enable cache | ||
515 | ** byte 35 : speed | ||
516 | ** (0/1/2/3/4->async/20/40/80/160 for scsi) | ||
517 | ** (0/1/2/3/4->33/66/100/133/150 for IDE ) | ||
518 | ** byte 36 : 1 to select quick init | ||
519 | ** | ||
520 | ** GUI_MODIFY_VOLUME : Modify volume Set | ||
521 | ** byte 0,1 : length | ||
522 | ** byte 2 : command code 0x61 | ||
523 | ** byte 3 : volumeset# | ||
524 | ** byte 4-19 : new volume set name | ||
525 | ** (if byte4 == 0, no change) | ||
526 | ** byte 20-27 : new volume capacity (reserved) | ||
527 | ** byte 28 : new raid level | ||
528 | ** byte 29 : new stripe size | ||
529 | ** (0/1/2/3/4/5->4/8/16/32/64/128K) | ||
530 | ** byte 30 : new channel | ||
531 | ** byte 31 : new ID | ||
532 | ** byte 32 : new LUN | ||
533 | ** byte 33 : 1 enable tag | ||
534 | ** byte 34 : 1 enable cache | ||
535 | ** byte 35 : speed | ||
536 | ** (0/1/2/3/4->async/20/40/80/160 for scsi) | ||
537 | ** (0/1/2/3/4->33/66/100/133/150 for IDE ) | ||
538 | ** GUI_DELETE_VOLUME : Delete volume set | ||
539 | ** byte 0,1 : length | ||
540 | ** byte 2 : command code 0x62 | ||
541 | ** byte 3 : volumeset# | ||
542 | ** GUI_START_CHECK_VOLUME : Start volume consistency check | ||
543 | ** byte 0,1 : length | ||
544 | ** byte 2 : command code 0x63 | ||
545 | ** byte 3 : volumeset# | ||
546 | ** GUI_STOP_CHECK_VOLUME : Stop volume consistency check | ||
547 | ** byte 0,1 : length | ||
548 | ** byte 2 : command code 0x64 | ||
549 | ** --------------------------------------------------------------------- | ||
550 | ** 4. Returned data | ||
551 | ** --------------------------------------------------------------------- | ||
552 | ** (A) Header : 3 bytes sequence (0x5E, 0x01, 0x61) | ||
553 | ** (B) Length : 2 bytes | ||
554 | ** (low byte 1st, excludes length and checksum byte) | ||
555 | ** (C) status or data : | ||
556 | ** <1> If length == 1 ==> 1 byte status code | ||
557 | ** #define GUI_OK 0x41 | ||
558 | ** #define GUI_RAIDSET_NOT_NORMAL 0x42 | ||
559 | ** #define GUI_VOLUMESET_NOT_NORMAL 0x43 | ||
560 | ** #define GUI_NO_RAIDSET 0x44 | ||
561 | ** #define GUI_NO_VOLUMESET 0x45 | ||
562 | ** #define GUI_NO_PHYSICAL_DRIVE 0x46 | ||
563 | ** #define GUI_PARAMETER_ERROR 0x47 | ||
564 | ** #define GUI_UNSUPPORTED_COMMAND 0x48 | ||
565 | ** #define GUI_DISK_CONFIG_CHANGED 0x49 | ||
566 | ** #define GUI_INVALID_PASSWORD 0x4a | ||
567 | ** #define GUI_NO_DISK_SPACE 0x4b | ||
568 | ** #define GUI_CHECKSUM_ERROR 0x4c | ||
569 | ** #define GUI_PASSWORD_REQUIRED 0x4d | ||
570 | ** <2> If length > 1 ==> | ||
571 | ** data block returned from controller | ||
572 | ** and the content depends on the command code | ||
573 | ** (E) Checksum : checksum of length and status or data byte | ||
574 | ************************************************************************** | ||
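As a rough illustration of the returned-data layout in section 4, the following user-space C sketch checks the three header bytes, reads the low-byte-first length, and verifies the trailing checksum. The helper name gui_parse_reply is hypothetical, and the checksum rule used here (an 8-bit sum of the two length bytes and the status/data bytes) is an assumption: the text above only says which bytes the checksum covers, not how it is computed.

```c
#include <stddef.h>
#include <stdint.h>

#define GUI_OK 0x41

/*
 * Hypothetical helper, not part of the driver.
 * Returns the payload length on success and stores a pointer to the
 * status/data bytes in *payload; returns -1 on a malformed frame.
 * ASSUMPTION: the checksum is the 8-bit sum of the two length bytes
 * and the payload bytes.
 */
static int gui_parse_reply(const uint8_t *buf, size_t len,
			   const uint8_t **payload)
{
	size_t plen, i;
	uint8_t sum;

	if (len < 6)	/* 3 header + 2 length + 1 checksum */
		return -1;
	if (buf[0] != 0x5E || buf[1] != 0x01 || buf[2] != 0x61)
		return -1;

	/* (B) Length: low byte first, excludes length and checksum bytes */
	plen = (size_t)buf[3] | ((size_t)buf[4] << 8);
	if (len < 5 + plen + 1)
		return -1;

	sum = (uint8_t)(buf[3] + buf[4]);
	for (i = 0; i < plen; i++)
		sum = (uint8_t)(sum + buf[5 + i]);
	if (sum != buf[5 + plen])
		return -1;

	*payload = &buf[5];
	return (int)plen;
}
```

A return value of 1 means the single payload byte is a status code such as GUI_OK; a larger value means a command-specific data block.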
diff --git a/Documentation/scsi/libsas.txt b/Documentation/scsi/libsas.txt new file mode 100644 index 000000000000..9e2078b2a615 --- /dev/null +++ b/Documentation/scsi/libsas.txt | |||
@@ -0,0 +1,484 @@ | |||
1 | SAS Layer | ||
2 | --------- | ||
3 | |||
4 | The SAS Layer is a management infrastructure which manages | ||
5 | SAS LLDDs. It sits between SCSI Core and SAS LLDDs. The | ||
6 | layout is as follows: while SCSI Core is concerned with | ||
7 | SAM/SPC issues, and a SAS LLDD+sequencer is concerned with | ||
8 | phy/OOB/link management, the SAS layer is concerned with: | ||
9 | |||
10 | * SAS Phy/Port/HA event management (LLDD generates, | ||
11 | SAS Layer processes), | ||
12 | * SAS Port management (creation/destruction), | ||
13 | * SAS Domain discovery and revalidation, | ||
14 | * SAS Domain device management, | ||
15 | * SCSI Host registration/unregistration, | ||
16 | * Device registration with SCSI Core (SAS) or libata | ||
17 | (SATA), and | ||
18 | * Expander management and exporting expander control | ||
19 | to user space. | ||
20 | |||
21 | A SAS LLDD is a PCI device driver. It is concerned with | ||
22 | phy/OOB management, and vendor specific tasks and generates | ||
23 | events to the SAS layer. | ||
24 | |||
25 | The SAS Layer does most SAS tasks as outlined in the SAS 1.1 | ||
26 | spec. | ||
27 | |||
28 | The sas_ha_struct describes the SAS LLDD to the SAS layer. | ||
29 | Most of it is used by the SAS Layer but a few fields need to | ||
30 | be initialized by the LLDDs. | ||
31 | |||
32 | After initializing your hardware, from the probe() function | ||
33 | you call sas_register_ha(). It will register your LLDD with | ||
34 | the SCSI subsystem, creating a SCSI host and it will | ||
35 | register your SAS driver with the sysfs SAS tree it creates. | ||
36 | It will then return. Then you enable your phys to actually | ||
37 | start OOB (at which point your driver will start calling the | ||
38 | notify_* event callbacks). | ||
39 | |||
40 | Structure descriptions: | ||
41 | |||
42 | struct sas_phy -------------------- | ||
43 | Normally this is statically embedded in your driver's | ||
44 | phy structure: | ||
45 | struct my_phy { | ||
46 | blah; | ||
47 | struct sas_phy sas_phy; | ||
48 | bleh; | ||
49 | }; | ||
50 | And then all the phys are an array of my_phy in your HA | ||
51 | struct (shown below). | ||
52 | |||
53 | Then as you go along and initialize your phys you also | ||
54 | initialize the sas_phy struct, along with your own | ||
55 | phy structure. | ||
56 | |||
57 | In general, the phys are managed by the LLDD and the ports | ||
58 | are managed by the SAS layer. So the phys are initialized | ||
59 | and updated by the LLDD and the ports are initialized and | ||
60 | updated by the SAS layer. | ||
61 | |||
62 | There is a scheme where the LLDD can RW certain fields, | ||
63 | and the SAS layer can only read such ones, and vice versa. | ||
64 | The idea is to avoid unnecessary locking. | ||
65 | |||
66 | enabled -- must be set (0/1) | ||
67 | id -- must be set [0,MAX_PHYS) | ||
68 | class, proto, type, role, oob_mode, linkrate -- must be set | ||
69 | oob_mode -- you set this when OOB has finished and then notify | ||
70 | the SAS Layer. | ||
71 | |||
72 | sas_addr -- this normally points to an array holding the sas | ||
73 | address of the phy, possibly somewhere in your my_phy | ||
74 | struct. | ||
75 | |||
76 | attached_sas_addr -- set this when you (LLDD) receive an | ||
77 | IDENTIFY frame or a FIS frame, _before_ notifying the SAS | ||
78 | layer. The idea is that sometimes the LLDD may want to fake | ||
79 | or provide a different SAS address on that phy/port and this | ||
80 | allows it to do this. At best you should copy the sas | ||
81 | address from the IDENTIFY frame or maybe generate a SAS | ||
82 | address for SATA directly attached devices. The Discover | ||
83 | process may later change this. | ||
84 | |||
85 | frame_rcvd -- this is where you copy the IDENTIFY/FIS frame | ||
86 | when you get it; you lock, copy, set frame_rcvd_size and | ||
87 | unlock the lock, and then call the event. It is a pointer | ||
88 | since there's no way to know your hw frame size _exactly_, | ||
89 | so you define the actual array in your phy struct and let | ||
90 | this pointer point to it. You copy the frame from your | ||
91 | DMAable memory to that area holding the lock. | ||
92 | |||
93 | sas_prim -- this is where primitives go when they're | ||
94 | received. See sas.h. Grab the lock, set the primitive, | ||
95 | release the lock, notify. | ||
96 | |||
97 | port -- this points to the sas_port if the phy belongs | ||
98 | to a port -- the LLDD only reads this. It points to the | ||
99 | sas_port this phy is part of. Set by the SAS Layer. | ||
100 | |||
101 | ha -- may be set; the SAS layer sets it anyway. | ||
102 | |||
103 | lldd_phy -- you should set this to point to your phy so you | ||
104 | can find your way around faster when the SAS layer calls one | ||
105 | of your callbacks and passes you a phy. If the sas_phy is | ||
106 | embedded you can also use container_of -- whatever you | ||
107 | prefer. | ||
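The two ways of getting from a sas_phy back to the driver's own phy structure can be sketched as follows. This is a user-space illustration only: the struct definitions are cut-down stand-ins (the field hw_state and both helper names are hypothetical), and container_of is re-created with offsetof rather than taken from the kernel headers.

```c
#include <stddef.h>

/* Simplified stand-ins for this sketch; the real structs live in the kernel. */
struct sas_phy {
	int id;
	void *lldd_phy;		/* back pointer set by the LLDD */
};

struct my_phy {
	int hw_state;		/* placeholder for driver-private fields */
	struct sas_phy sas_phy;	/* embedded, as described in the text */
};

/* User-space equivalent of the kernel's container_of() macro. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Option 1: follow the back pointer the LLDD stored at init time. */
static struct my_phy *my_phy_from_lldd_phy(struct sas_phy *phy)
{
	return phy->lldd_phy;
}

/* Option 2: compute the enclosing struct from the embedded member. */
static struct my_phy *my_phy_from_container(struct sas_phy *phy)
{
	return container_of(phy, struct my_phy, sas_phy);
}
```

Option 2 only works when sas_phy really is embedded in my_phy; option 1 works for any layout as long as lldd_phy was set during initialization.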
108 | |||
109 | |||
110 | struct sas_port -------------------- | ||
111 | The LLDD doesn't set any fields of this struct -- it only | ||
112 | reads them. They should be self explanatory. | ||
113 | |||
114 | phy_mask is 32 bit, this should be enough for now, as I | ||
115 | haven't heard of a HA having more than 8 phys. | ||
116 | |||
117 | lldd_port -- I haven't found use for that -- maybe other | ||
118 | LLDD who wish to have internal port representation can make | ||
119 | use of this. | ||
120 | |||
121 | |||
122 | struct sas_ha_struct -------------------- | ||
123 | It normally is statically declared in your own LLDD | ||
124 | structure describing your adapter: | ||
125 | struct my_sas_ha { | ||
126 | blah; | ||
127 | struct sas_ha_struct sas_ha; | ||
128 | struct my_phy phys[MAX_PHYS]; | ||
129 | struct sas_port sas_ports[MAX_PHYS]; /* (1) */ | ||
130 | bleh; | ||
131 | }; | ||
132 | |||
133 | (1) If your LLDD doesn't have its own port representation. | ||
134 | |||
135 | What needs to be initialized (sample function given below). | ||
136 | |||
137 | pcidev | ||
138 | sas_addr -- since the SAS layer doesn't want to mess with | ||
139 | memory allocation, etc, this points to statically | ||
140 | allocated array somewhere (say in your host adapter | ||
141 | structure) and holds the SAS address of the host | ||
142 | adapter as given by you or the manufacturer, etc. | ||
143 | sas_port | ||
144 | sas_phy -- an array of pointers to structures. (see | ||
145 | note above on sas_addr). | ||
146 | These must be set. See more notes below. | ||
147 | num_phys -- the number of phys present in the sas_phy array, | ||
148 | and the number of ports present in the sas_port | ||
149 | array. There can be at most num_phys ports (one per | ||
150 | phy), so we drop num_ports and only use | ||
151 | num_phys. | ||
152 | |||
153 | The event interface: | ||
154 | |||
155 | /* LLDD calls these to notify the class of an event. */ | ||
156 | void (*notify_ha_event)(struct sas_ha_struct *, enum ha_event); | ||
157 | void (*notify_port_event)(struct sas_phy *, enum port_event); | ||
158 | void (*notify_phy_event)(struct sas_phy *, enum phy_event); | ||
159 | |||
160 | When sas_register_ha() returns, those are set and can be | ||
161 | called by the LLDD to notify the SAS layer of such | ||
162 | events. | ||
163 | |||
164 | The port notification: | ||
165 | |||
166 | /* The class calls these to notify the LLDD of an event. */ | ||
167 | void (*lldd_port_formed)(struct sas_phy *); | ||
168 | void (*lldd_port_deformed)(struct sas_phy *); | ||
169 | |||
170 | If the LLDD wants notification when a port has been formed | ||
171 | or deformed it sets those to a function satisfying the type. | ||
172 | |||
173 | A SAS LLDD should also implement at least one of the Task | ||
174 | Management Functions (TMFs) described in SAM: | ||
175 | |||
176 | /* Task Management Functions. Must be called from process context. */ | ||
177 | int (*lldd_abort_task)(struct sas_task *); | ||
178 | int (*lldd_abort_task_set)(struct domain_device *, u8 *lun); | ||
179 | int (*lldd_clear_aca)(struct domain_device *, u8 *lun); | ||
180 | int (*lldd_clear_task_set)(struct domain_device *, u8 *lun); | ||
181 | int (*lldd_I_T_nexus_reset)(struct domain_device *); | ||
182 | int (*lldd_lu_reset)(struct domain_device *, u8 *lun); | ||
183 | int (*lldd_query_task)(struct sas_task *); | ||
184 | |||
185 | For more information please read SAM from T10.org. | ||
186 | |||
187 | Port and Adapter management: | ||
188 | |||
189 | /* Port and Adapter management */ | ||
190 | int (*lldd_clear_nexus_port)(struct sas_port *); | ||
191 | int (*lldd_clear_nexus_ha)(struct sas_ha_struct *); | ||
192 | |||
193 | A SAS LLDD should implement at least one of those. | ||
194 | |||
195 | Phy management: | ||
196 | |||
197 | /* Phy management */ | ||
198 | int (*lldd_control_phy)(struct sas_phy *, enum phy_func); | ||
199 | |||
200 | lldd_ha -- set this to point to your HA struct. You can also | ||
201 | use container_of if you embedded it as shown above. | ||
202 | |||
203 | A sample initialization and registration function | ||
204 | can look like this (called last thing from probe()) | ||
205 | *but* before you enable the phys to do OOB: | ||
206 | |||
207 | static int register_sas_ha(struct my_sas_ha *my_ha) | ||
208 | { | ||
209 | int i; | ||
210 | static struct sas_phy *sas_phys[MAX_PHYS]; | ||
211 | static struct sas_port *sas_ports[MAX_PHYS]; | ||
212 | |||
213 | my_ha->sas_ha.sas_addr = &my_ha->sas_addr[0]; | ||
214 | |||
215 | for (i = 0; i < MAX_PHYS; i++) { | ||
216 | sas_phys[i] = &my_ha->phys[i].sas_phy; | ||
217 | sas_ports[i] = &my_ha->sas_ports[i]; | ||
218 | } | ||
219 | |||
220 | my_ha->sas_ha.sas_phy = sas_phys; | ||
221 | my_ha->sas_ha.sas_port = sas_ports; | ||
222 | my_ha->sas_ha.num_phys = MAX_PHYS; | ||
223 | |||
224 | my_ha->sas_ha.lldd_port_formed = my_port_formed; | ||
225 | |||
226 | my_ha->sas_ha.lldd_dev_found = my_dev_found; | ||
227 | my_ha->sas_ha.lldd_dev_gone = my_dev_gone; | ||
228 | |||
229 | my_ha->sas_ha.lldd_max_execute_num = lldd_max_execute_num; /* (1) */ | ||
230 | |||
231 | my_ha->sas_ha.lldd_queue_size = ha_can_queue; | ||
232 | my_ha->sas_ha.lldd_execute_task = my_execute_task; | ||
233 | |||
234 | my_ha->sas_ha.lldd_abort_task = my_abort_task; | ||
235 | my_ha->sas_ha.lldd_abort_task_set = my_abort_task_set; | ||
236 | my_ha->sas_ha.lldd_clear_aca = my_clear_aca; | ||
237 | my_ha->sas_ha.lldd_clear_task_set = my_clear_task_set; | ||
238 | my_ha->sas_ha.lldd_I_T_nexus_reset = NULL; /* (2) */ | ||
239 | my_ha->sas_ha.lldd_lu_reset = my_lu_reset; | ||
240 | my_ha->sas_ha.lldd_query_task = my_query_task; | ||
241 | |||
242 | my_ha->sas_ha.lldd_clear_nexus_port = my_clear_nexus_port; | ||
243 | my_ha->sas_ha.lldd_clear_nexus_ha = my_clear_nexus_ha; | ||
244 | |||
245 | my_ha->sas_ha.lldd_control_phy = my_control_phy; | ||
246 | |||
247 | return sas_register_ha(&my_ha->sas_ha); | ||
248 | } | ||
249 | |||
250 | (1) This is normally a LLDD parameter, something along the | ||
251 | lines of a task collector. What it tells the SAS Layer is | ||
252 | whether the SAS layer should run in Direct Mode (default: | ||
253 | value 0 or 1) or Task Collector Mode (value greater than 1). | ||
254 | |||
255 | In Direct Mode, the SAS Layer calls Execute Task as soon as | ||
256 | it has a command to send to the SDS, _and_ this is a single | ||
257 | command, i.e. not linked. | ||
258 | |||
259 | Some hardware (e.g. aic94xx) has the capability to DMA more | ||
260 | than one task at a time (interrupt) from host memory. Task | ||
261 | Collector Mode is an optional feature for HAs which support | ||
262 | this in their hardware. (Again, it is completely optional | ||
263 | even if your hardware supports it.) | ||
264 | |||
265 | In Task Collector Mode, the SAS Layer would do _natural_ | ||
266 | coalescing of tasks and at the appropriate moment it would | ||
267 | call your driver to DMA more than one task in a single HA | ||
268 | interrupt. DBMS may want to use this by insmod/modprobe | ||
269 | setting the lldd_max_execute_num to something greater than | ||
270 | 1. | ||
271 | |||
272 | (2) SAS 1.1 does not define I_T Nexus Reset TMF. | ||
273 | |||
274 | Events | ||
275 | ------ | ||
276 | |||
277 | Events are _the only way_ a SAS LLDD notifies the SAS layer | ||
278 | of anything. There is no other way for a LLDD to tell | ||
279 | the SAS layer of anything happening internally or in the SAS | ||
280 | domain. | ||
281 | |||
282 | Phy events: | ||
283 | PHYE_LOSS_OF_SIGNAL, (C) | ||
284 | PHYE_OOB_DONE, | ||
285 | PHYE_OOB_ERROR, (C) | ||
286 | PHYE_SPINUP_HOLD. | ||
287 | |||
288 | Port events, passed on a _phy_: | ||
289 | PORTE_BYTES_DMAED, (M) | ||
290 | PORTE_BROADCAST_RCVD, (E) | ||
291 | PORTE_LINK_RESET_ERR, (C) | ||
292 | PORTE_TIMER_EVENT, (C) | ||
293 | PORTE_HARD_RESET. | ||
294 | |||
295 | Host Adapter event: | ||
296 | HAE_RESET | ||
297 | |||
298 | A SAS LLDD should be able to generate | ||
299 | - at least one event from group C (choice), | ||
300 | - events marked M (mandatory) are mandatory (only one), | ||
301 | - events marked E (expander) if it wants the SAS layer | ||
302 | to handle domain revalidation (only one such). | ||
303 | - Unmarked events are optional. | ||
304 | |||
305 | Meaning: | ||
306 | |||
307 | HAE_RESET -- when your HA got internal error and was reset. | ||
308 | |||
309 | PORTE_BYTES_DMAED -- on receiving an IDENTIFY/FIS frame | ||
310 | PORTE_BROADCAST_RCVD -- on receiving a primitive | ||
311 | PORTE_LINK_RESET_ERR -- timer expired, loss of signal, loss | ||
312 | of DWS, etc. (*) | ||
313 | PORTE_TIMER_EVENT -- DWS reset timeout timer expired (*) | ||
314 | PORTE_HARD_RESET -- Hard Reset primitive received. | ||
315 | |||
316 | PHYE_LOSS_OF_SIGNAL -- the device is gone (*) | ||
317 | PHYE_OOB_DONE -- OOB went fine and oob_mode is valid | ||
318 | PHYE_OOB_ERROR -- Error while doing OOB, the device probably | ||
319 | got disconnected. (*) | ||
320 | PHYE_SPINUP_HOLD -- SATA is present, COMWAKE not sent. | ||
321 | |||
322 | (*) should set/clear the appropriate fields in the phy, | ||
323 | or alternatively call the inlined sas_phy_disconnected() | ||
324 | which is just a helper, from their tasklet. | ||
325 | |||
326 | The Execute Command SCSI RPC: | ||
327 | |||
328 | int (*lldd_execute_task)(struct sas_task *, int num, | ||
329 | unsigned long gfp_flags); | ||
330 | |||
331 | Used to queue a task to the SAS LLDD. @task is the task (or | ||
332 | the first of the tasks) to be executed. @num should be the | ||
333 | number of tasks being queued at this function call (they are | ||
334 | linked via task::list). @gfp_flags should be the gfp mask | ||
335 | defining the context of the caller. | ||
336 | |||
337 | This function should implement the Execute Command SCSI RPC, | ||
338 | or if you're sending a SCSI Task as linked commands, you | ||
339 | should also use this function. | ||
340 | |||
341 | That is, when lldd_execute_task() is called, the command(s) | ||
342 | go out on the transport *immediately*. There is *no* | ||
343 | queuing of any sort and at any level in a SAS LLDD. | ||
344 | |||
345 | The use of task::list is two-fold, one for linked commands, | ||
346 | the other discussed below. | ||
347 | |||
348 | It is possible to queue up more than one task at a time, by | ||
349 | initializing the list element of struct sas_task, and | ||
350 | passing the number of tasks enlisted in this manner in num. | ||
351 | |||
352 | Returns: -SAS_QUEUE_FULL, -ENOMEM, nothing was queued; | ||
353 | 0, the task(s) were queued. | ||
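The all-or-nothing return convention above can be illustrated with a small user-space mock. Everything here is hypothetical and simplified for the sketch: mock_task uses a plain next pointer in place of task::list, mock_ha tracks only a queue size and a pending count, and the numeric value of SAS_QUEUE_FULL is a placeholder rather than the kernel's definition.

```c
#include <stddef.h>

#define SAS_QUEUE_FULL 1	/* placeholder value for this sketch only */

struct mock_task {
	struct mock_task *next;	/* stands in for task::list */
	int sent;		/* set once the task went out */
};

struct mock_ha {
	int queue_size;		/* plays the role of lldd_queue_size */
	int pending;		/* commands currently outstanding */
};

/*
 * Mock of the lldd_execute_task() contract: either all num tasks are
 * sent and 0 is returned, or nothing is sent and an error is returned.
 * The caller is assumed to pass a chain of at least num tasks.
 */
static int mock_execute_task(struct mock_ha *ha, struct mock_task *task,
			     int num)
{
	struct mock_task *t;
	int i;

	/* Check capacity first so that a failure leaves no task queued. */
	if (ha->pending + num > ha->queue_size)
		return -SAS_QUEUE_FULL;

	for (t = task, i = 0; i < num && t != NULL; i++, t = t->next)
		t->sent = 1;	/* the command goes out immediately */

	ha->pending += i;
	return 0;
}
```

Checking capacity before touching any task is what lets a caller retry the whole batch later without tracking partial progress.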
354 | |||
355 | If you want to pass num > 1, then either | ||
356 | A) you're the only caller of this function and keep track | ||
357 | of what you've queued to the LLDD, or | ||
358 | B) you know what you're doing and have a strategy of | ||
359 | retrying. | ||
360 | |||
361 | As opposed to queuing one task at a time (function call), | ||
362 | batch queuing of tasks, by having num > 1, greatly | ||
363 | simplifies LLDD code, sequencer code, and _hardware design_, | ||
364 | and has some performance advantages in certain situations | ||
365 | (DBMS). | ||
366 | |||
367 | The LLDD advertises if it can take more than one command at | ||
368 | a time at lldd_execute_task(), by setting the | ||
369 | lldd_max_execute_num parameter (controlled by "collector" | ||
370 | module parameter in aic94xx SAS LLDD). | ||
371 | |||
372 | You should leave this to the default 1, unless you know what | ||
373 | you're doing. | ||
374 | |||
375 | This is a function of the LLDD, to which the SAS layer can | ||
376 | cater. | ||
377 | |||
378 | int lldd_queue_size | ||
379 | The host adapter's queue size. This is the maximum | ||
380 | number of commands the lldd can have pending to domain | ||
381 | devices on behalf of all upper layers submitting through | ||
382 | lldd_execute_task(). | ||
383 | |||
384 | You really want to set this to something (much) larger than | ||
385 | 1. | ||
386 | |||
387 | This _really_ has absolutely nothing to do with queuing. | ||
388 | There is no queuing in SAS LLDDs. | ||
389 | |||
390 | struct sas_task { | ||
391 | dev -- the device this task is destined to | ||
392 | list -- must be initialized (INIT_LIST_HEAD) | ||
393 | task_proto -- _one_ of enum sas_proto | ||
394 | scatter -- pointer to scatter gather list array | ||
395 | num_scatter -- number of elements in scatter | ||
396 | total_xfer_len -- total number of bytes expected to be transferred | ||
397 | data_dir -- PCI_DMA_... | ||
398 | task_done -- callback when the task has finished execution | ||
399 | }; | ||
400 | |||
401 | When an external entity, i.e. one other than the LLDD or the | ||
402 | SAS Layer, wants to work with a struct domain_device, it | ||
403 | _must_ call kobject_get() when getting a handle on the | ||
404 | device and kobject_put() when it is done with the device. | ||
405 | |||
406 | This does two things: | ||
407 | A) implements proper kfree() for the device; | ||
408 | B) increments/decrements the kref for all players: | ||
409 | domain_device | ||
410 | all domain_device's ... (if past an expander) | ||
411 | port | ||
412 | host adapter | ||
413 | pci device | ||
414 | and up the ladder, etc. | ||
415 | |||
416 | DISCOVERY | ||
417 | --------- | ||
418 | |||
419 | The sysfs tree has the following purposes: | ||
420 | a) It shows you the physical layout of the SAS domain at | ||
421 | the current time, i.e. how the domain looks in the | ||
422 | physical world right now. | ||
423 | b) Shows some device parameters _at_discovery_time_. | ||
424 | |||
425 | This is a link to the tree(1) program, very useful in | ||
426 | viewing the SAS domain: | ||
427 | ftp://mama.indstate.edu/linux/tree/ | ||
428 | I expect user space applications to actually create a | ||
429 | graphical interface of this. | ||
430 | |||
431 | That is, the sysfs domain tree doesn't show or keep state if | ||
432 | you e.g., change the meaning of the READY LED MEANING | ||
433 | setting, but it does show you the current connection status | ||
434 | of the domain device. | ||
435 | |||
436 | Keeping internal device state changes is the responsibility of | ||
437 | upper layers (Command set drivers) and user space. | ||
438 | |||
439 | When a device or devices are unplugged from the domain, this | ||
440 | is reflected in the sysfs tree immediately, and the device(s) | ||
441 | are removed from the system. | ||
442 | |||
443 | The structure domain_device describes any device in the SAS | ||
444 | domain. It is completely managed by the SAS layer. A task | ||
445 | points to a domain device, this is how the SAS LLDD knows | ||
446 | where to send the task(s) to. A SAS LLDD only reads the | ||
447 | contents of the domain_device structure, but it never creates | ||
448 | or destroys one. | ||
449 | |||
450 | Expander management from User Space | ||
451 | ----------------------------------- | ||
452 | |||
453 | In each expander directory in sysfs, there is a file called | ||
454 | "smp_portal". It is a binary sysfs attribute file, which | ||
455 | implements an SMP portal (Note: this is *NOT* an SMP port), | ||
456 | to which user space applications can send SMP requests and | ||
457 | receive SMP responses. | ||
458 | |||
459 | Functionality is deceptively simple: | ||
460 | |||
461 | 1. Build the SMP frame you want to send. The format and layout | ||
462 | is described in the SAS spec. Leave the CRC field equal to 0. | ||
463 | open(2) | ||
464 | 2. Open the expander's SMP portal sysfs file in RW mode. | ||
465 | write(2) | ||
466 | 3. Write the frame you built in 1. | ||
467 | read(2) | ||
468 | 4. Read the amount of data you expect to receive for the frame you built. | ||
469 | If you receive a different amount of data than you expected, | ||
470 | then there was some kind of error. | ||
471 | close(2) | ||
472 | All this process is shown in detail in the function do_smp_func() | ||
473 | and its callers, in the file "expander_conf.c". | ||
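The open/write/read/close sequence above can be sketched in user space as follows. This is not the code from expander_conf.c; the helper name smp_portal_xfer and its parameters are hypothetical, and the caller is assumed to have already built the SMP request frame (with the CRC field left at 0) and to know the expected response size.

```c
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Hypothetical helper: send one SMP request through an expander's
 * smp_portal sysfs file and read back the response.
 * Returns 0 on success, -1 on any error or short transfer.
 */
static int smp_portal_xfer(const char *portal_path,
			   const void *req, size_t req_len,
			   void *resp, size_t resp_len)
{
	ssize_t n;
	int fd;

	fd = open(portal_path, O_RDWR);		/* step 2 */
	if (fd < 0)
		return -1;

	n = write(fd, req, req_len);		/* step 3 */
	if (n < 0 || (size_t)n != req_len)
		goto fail;

	n = read(fd, resp, resp_len);		/* step 4 */
	if (n < 0 || (size_t)n != resp_len)
		goto fail;	/* unexpected size is treated as an error */

	close(fd);
	return 0;
fail:
	close(fd);
	return -1;
}
```

The portal path would be the smp_portal file inside the expander's sysfs directory; its exact location depends on the domain layout, so the sketch takes it as an argument.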
474 | |||
475 | The kernel functionality is implemented in the file | ||
476 | "sas_expander.c". | ||
477 | |||
478 | The program "expander_conf.c" implements this. It takes one | ||
479 | argument, the sysfs file name of the SMP portal to the | ||
480 | expander, and gives expander information, including routing | ||
481 | tables. | ||
482 | |||
483 | The SMP portal gives you complete control of the expander, | ||
484 | so please be careful. | ||
diff --git a/Documentation/scsi/ppa.txt b/Documentation/scsi/ppa.txt index 0dac88d86d87..5d9223bc1bd5 100644 --- a/Documentation/scsi/ppa.txt +++ b/Documentation/scsi/ppa.txt | |||
@@ -12,5 +12,3 @@ http://www.torque.net/parport/ | |||
12 | Email list for Linux Parport | 12 | Email list for Linux Parport |
13 | linux-parport@torque.net | 13 | linux-parport@torque.net |
14 | 14 | ||
15 | Email for problems with ZIP or ZIP Plus drivers | ||
16 | campbell@torque.net | ||
diff --git a/Documentation/scsi/tmscsim.txt b/Documentation/scsi/tmscsim.txt index e165229adf50..df7a02bfb5bf 100644 --- a/Documentation/scsi/tmscsim.txt +++ b/Documentation/scsi/tmscsim.txt | |||
@@ -109,7 +109,7 @@ than the 33.33 MHz being in the PCI spec. | |||
109 | 109 | ||
110 | If you want to share the IRQ with another device and the driver refuses to | 110 | If you want to share the IRQ with another device and the driver refuses to |
111 | do so, you might succeed with changing the DC390_IRQ type in tmscsim.c to | 111 | do so, you might succeed with changing the DC390_IRQ type in tmscsim.c to |
112 | SA_SHIRQ | SA_INTERRUPT. | 112 | IRQF_SHARED | IRQF_DISABLED. |
113 | 113 | ||
114 | 114 | ||
115 | 3.Features | 115 | 3.Features |
diff --git a/Documentation/seclvl.txt b/Documentation/seclvl.txt deleted file mode 100644 index 97274d122d0e..000000000000 --- a/Documentation/seclvl.txt +++ /dev/null | |||
@@ -1,97 +0,0 @@ | |||
1 | BSD Secure Levels Linux Security Module | ||
2 | Michael A. Halcrow <mike@halcrow.us> | ||
3 | |||
4 | |||
5 | Introduction | ||
6 | |||
7 | Under the BSD Secure Levels security model, sets of policies are | ||
8 | associated with levels. Levels range from -1 to 2, with -1 being the | ||
9 | weakest and 2 being the strongest. These security policies are | ||
10 | enforced at the kernel level, so not even the superuser is able to | ||
11 | disable or circumvent them. This hardens the machine against attackers | ||
12 | who gain root access to the system. | ||
13 | |||
14 | |||
15 | Levels and Policies | ||
16 | |||
17 | Level -1 (Permanently Insecure): | ||
18 | - Cannot increase the secure level | ||
19 | |||
20 | Level 0 (Insecure): | ||
21 | - Cannot ptrace the init process | ||
22 | |||
23 | Level 1 (Default): | ||
24 | - /dev/mem and /dev/kmem are read-only | ||
25 | - IMMUTABLE and APPEND extended attributes, if set, may not be unset | ||
26 | - Cannot load or unload kernel modules | ||
27 | - Cannot write directly to a mounted block device | ||
28 | - Cannot perform raw I/O operations | ||
29 | - Cannot perform network administrative tasks | ||
30 | - Cannot setuid any file | ||
31 | |||
32 | Level 2 (Secure): | ||
33 | - Cannot decrement the system time | ||
34 | - Cannot write to any block device, whether mounted or not | ||
35 | - Cannot unmount any mounted filesystems | ||
36 | |||
37 | |||
38 | Compilation | ||
39 | |||
40 | To compile the BSD Secure Levels LSM, seclvl.ko, enable the | ||
41 | SECURITY_SECLVL configuration option. This is found under Security | ||
42 | options -> BSD Secure Levels in the kernel configuration menu. | ||
43 | |||
44 | |||
45 | Basic Usage | ||
46 | |||
47 | Once the machine is in a running state, with all the necessary modules | ||
48 | loaded and all the filesystems mounted, you can load the seclvl.ko | ||
49 | module: | ||
50 | |||
51 | # insmod seclvl.ko | ||
52 | |||
53 | The module defaults to secure level 1, except when compiled directly | ||
54 | into the kernel, in which case it defaults to secure level 0. To raise | ||
55 | the secure level to 2, the administrator writes ``2'' to the | ||
56 | seclvl/seclvl file under the sysfs mount point (assumed to be /sys in | ||
57 | these examples): | ||
58 | |||
59 | # echo -n "2" > /sys/seclvl/seclvl | ||
60 | |||
61 | Alternatively, you can initialize the module at secure level 2 with | ||
62 | the initlvl module parameter: | ||
63 | |||
64 | # insmod seclvl.ko initlvl=2 | ||
65 | |||
66 | At this point, it is impossible to remove the module or reduce the | ||
67 | secure level. If the administrator wishes to have the option of doing | ||
68 | so, he must provide a module parameter, sha1_passwd, that specifies | ||
69 | the SHA1 hash of the password that can be used to reduce the secure | ||
70 | level to 0. | ||
71 | |||
72 | To generate this SHA1 hash, the administrator can use OpenSSL: | ||
73 | |||
74 | # echo -n "boogabooga" | openssl sha1 | ||
75 | abeda4e0f33defa51741217592bf595efb8d289c | ||
76 | |||
77 | In order to use password-instigated secure level reduction, the SHA1 | ||
78 | crypto module must be loaded or compiled into the kernel: | ||
79 | |||
80 | # insmod sha1.ko | ||
81 | |||
82 | The administrator can then insmod the seclvl module, including the | ||
83 | SHA1 hash of the password: | ||
84 | |||
85 | # insmod seclvl.ko | ||
86 | sha1_passwd=abeda4e0f33defa51741217592bf595efb8d289c | ||
87 | |||
88 | To reduce the secure level, write the password to seclvl/passwd under | ||
89 | your sysfs mount point: | ||
90 | |||
91 | # echo -n "boogabooga" > /sys/seclvl/passwd | ||
92 | |||
93 | The September 2004 edition of Sys Admin Magazine has an article about | ||
94 | the BSD Secure Levels LSM. I encourage you to refer to that article | ||
95 | for a more in-depth treatment of this security module: | ||
96 | |||
97 | http://www.samag.com/documents/s=9304/sam0409a/0409a.htm | ||
diff --git a/Documentation/sh/new-machine.txt b/Documentation/sh/new-machine.txt index eb2dd2e6993b..73988e0d112b 100644 --- a/Documentation/sh/new-machine.txt +++ b/Documentation/sh/new-machine.txt | |||
@@ -41,11 +41,6 @@ Board-specific code: | |||
41 | | | 41 | | |
42 | .. more boards here ... | 42 | .. more boards here ... |
43 | 43 | ||
44 | It should also be noted that each board is required to have some certain | ||
45 | headers. At the time of this writing, io.h is the only thing that needs | ||
46 | to be provided for each board, and can generally just reference generic | ||
47 | functions (with the exception of isa_port2addr). | ||
48 | |||
49 | Next, for companion chips: | 44 | Next, for companion chips: |
50 | . | 45 | . |
51 | `-- arch | 46 | `-- arch |
@@ -104,12 +99,13 @@ and then populate that with sub-directories for each member of the family. | |||
104 | Both the Solution Engine and the hp6xx boards are an example of this. | 99 | Both the Solution Engine and the hp6xx boards are an example of this. |
105 | 100 | ||
106 | After you have setup your new arch/sh/boards/ directory, remember that you | 101 | After you have setup your new arch/sh/boards/ directory, remember that you |
107 | also must add a directory in include/asm-sh for headers localized to this | 102 | should also add a directory in include/asm-sh for headers localized to this |
108 | board. In order to interoperate seamlessly with the build system, it's best | 103 | board (if there are going to be more than one). In order to interoperate |
109 | to have this directory the same as the arch/sh/boards/ directory name, | 104 | seamlessly with the build system, it's best to have this directory the same |
110 | though if your board is again part of a family, the build system has ways | 105 | as the arch/sh/boards/ directory name, though if your board is again part of |
111 | of dealing with this, and you can feel free to name the directory after | 106 | a family, the build system has ways of dealing with this (via incdir-y |
112 | the family member itself. | 107 | overloading), and you can feel free to name the directory after the family |
108 | member itself. | ||
113 | 109 | ||
114 | There are a few things that each board is required to have, both in the | 110 | There are a few things that each board is required to have, both in the |
115 | arch/sh/boards and the include/asm-sh/ hierarchy. In order to better | 111 | arch/sh/boards and the include/asm-sh/ hierarchy. In order to better |
@@ -122,6 +118,7 @@ might look something like: | |||
122 | * arch/sh/boards/vapor/setup.c - Setup code for imaginary board | 118 | * arch/sh/boards/vapor/setup.c - Setup code for imaginary board |
123 | */ | 119 | */ |
124 | #include <linux/init.h> | 120 | #include <linux/init.h> |
121 | #include <asm/rtc.h> /* for board_time_init() */ | ||
125 | 122 | ||
126 | const char *get_system_type(void) | 123 | const char *get_system_type(void) |
127 | { | 124 | { |
@@ -152,79 +149,57 @@ int __init platform_setup(void) | |||
152 | } | 149 | } |
153 | 150 | ||
154 | Our new imaginary board will also have to tie into the machvec in order for it | 151 | Our new imaginary board will also have to tie into the machvec in order for it |
155 | to be of any use. Currently the machvec is slowly on its way out, but is still | 152 | to be of any use. |
156 | required for the time being. As such, let us take a look at what needs to be | ||
157 | done for the machvec assignment. | ||
158 | 153 | ||
159 | machvec functions fall into a number of categories: | 154 | machvec functions fall into a number of categories: |
160 | 155 | ||
161 | - I/O functions to IO memory (inb etc) and PCI/main memory (readb etc). | 156 | - I/O functions to IO memory (inb etc) and PCI/main memory (readb etc). |
162 | - I/O remapping functions (ioremap etc) | 157 | - I/O mapping functions (ioport_map, ioport_unmap, etc). |
163 | - some initialisation functions | 158 | - a 'heartbeat' function. |
164 | - a 'heartbeat' function | 159 | - PCI and IRQ initialization routines. |
165 | - some miscellaneous flags | 160 | - Consistent allocators (for boards that need special allocators, |
166 | 161 | particularly for allocating out of some board-specific SRAM for DMA | |
167 | The tree can be built in two ways: | 162 | handles). |
168 | - as a fully generic build. All drivers are linked in, and all functions | 163 | |
169 | go through the machvec | 164 | There are machvec functions added and removed over time, so always be sure to |
170 | - as a machine specific build. In this case only the required drivers | 165 | consult include/asm-sh/machvec.h for the current state of the machvec. |
171 | will be linked in, and some macros may be redefined to not go through | 166 | |
172 | the machvec where performance is important (in particular IO functions). | 167 | The kernel will automatically wrap in generic routines for undefined function |
173 | 168 | pointers in the machvec at boot time, as machvec functions are referenced | |
174 | There are three ways in which IO can be performed: | 169 | unconditionally throughout most of the tree. Some boards have incredibly |
175 | - none at all. This is really only useful for the 'unknown' machine type, | 170 | sparse machvecs (such as the dreamcast and sh03), whereas others must define |
176 | which us designed to run on a machine about which we know nothing, and | 171 | virtually everything (rts7751r2d). |
177 | so all all IO instructions do nothing. | 172 | |
178 | - fully custom. In this case all IO functions go to a machine specific | 173 | Adding a new machine is relatively trivial (using vapor as an example): |
179 | set of functions which can do what they like | 174 | |
180 | - a generic set of functions. These will cope with most situations, | 175 | If the board-specific definitions are quite minimalistic, as is the case for |
181 | and rely on a single function, mv_port2addr, which is called through the | 176 | the vast majority of boards, simply having a single board-specific header is |
182 | machine vector, and converts an IO address into a memory address, which | 177 | sufficient. |
183 | can be read from/written to directly. | 178 | |
184 | 179 | - add a new file include/asm-sh/vapor.h which contains prototypes for | |
185 | Thus adding a new machine involves the following steps (I will assume I am | ||
186 | adding a machine called vapor): | ||
187 | |||
188 | - add a new file include/asm-sh/vapor/io.h which contains prototypes for | ||
189 | any machine specific IO functions prefixed with the machine name, for | 180 | any machine specific IO functions prefixed with the machine name, for |
190 | example vapor_inb. These will be needed when filling out the machine | 181 | example vapor_inb. These will be needed when filling out the machine |
191 | vector. | 182 | vector. |
192 | 183 | ||
193 | This is the minimum that is required, however there are ample | 184 | Note that these prototypes are generated automatically by setting |
194 | opportunities to optimise this. In particular, by making the prototypes | 185 | __IO_PREFIX to something sensible. A typical example would be: |
195 | inline function definitions, it is possible to inline the function when | 186 | |
196 | building machine specific versions. Note that the machine vector | 187 | #define __IO_PREFIX vapor |
197 | functions will still be needed, so that a module built for a generic | 188 | #include <asm/io_generic.h> |
198 | setup can be loaded. | 189 | |
199 | 190 | somewhere in the board-specific header. Any boards being ported that still | |
200 | - add a new file arch/sh/boards/vapor/mach.c. This contains the definition | 191 | have a legacy io.h should remove it entirely and switch to the new model. |
201 | of the machine vector. When building the machine specific version, this | 192 | |
202 | will be the real machine vector (via an alias), while in the generic | 193 | - Add machine vector definitions to the board's setup.c. At a bare minimum, |
203 | version is used to initialise the machine vector, and then freed, by | 194 | this must be defined as something like: |
204 | making it initdata. This should be defined as: | 195 | |
205 | 196 | struct sh_machine_vector mv_vapor __initmv = { | |
206 | struct sh_machine_vector mv_vapor __initmv = { | 197 | .mv_name = "vapor", |
207 | .mv_name = "vapor", | 198 | }; |
208 | } | 199 | ALIAS_MV(vapor) |
209 | ALIAS_MV(vapor) | 200 | |
210 | 201 | - finally add a file arch/sh/boards/vapor/io.c, which contains definitions of | |
211 | - finally add a file arch/sh/boards/vapor/io.c, which contains | 202 | the machine specific io functions (if there are enough to warrant it). |
212 | definitions of the machine specific io functions. | ||
213 | |||
214 | A note about initialisation functions. Three initialisation functions are | ||
215 | provided in the machine vector: | ||
216 | - mv_arch_init - called very early on from setup_arch | ||
217 | - mv_init_irq - called from init_IRQ, after the generic SH interrupt | ||
218 | initialisation | ||
219 | - mv_init_pci - currently not used | ||
220 | |||
221 | Any other remaining functions which need to be called at start up can be | ||
222 | added to the list using the __initcalls macro (or module_init if the code | ||
223 | can be built as a module). Many generic drivers probe to see if the device | ||
224 | they are targeting is present, however this may not always be appropriate, | ||
225 | so a flag can be added to the machine vector which will be set on those | ||
226 | machines which have the hardware in question, reducing the probe to a | ||
227 | single conditional. | ||
228 | 203 | ||
229 | 3. Hooking into the Build System | 204 | 3. Hooking into the Build System |
230 | ================================ | 205 | ================================ |
@@ -303,4 +278,3 @@ which will in turn copy the defconfig for this board, run it through | |||
303 | oldconfig (prompting you for any new options since the time of creation), | 278 | oldconfig (prompting you for any new options since the time of creation), |
304 | and start you on your way to having a functional kernel for your new | 279 | and start you on your way to having a functional kernel for your new |
305 | board. | 280 | board. |
306 | |||
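The fallback behaviour described in the new-machine.txt hunk above (generic routines wrapped in for any machvec pointer a board leaves undefined) can be sketched in plain userspace C. This is only an illustration under stated assumptions: struct toy_machine_vector, toy_fill_machvec(), generic_inb() and generic_heartbeat() are hypothetical stand-ins, not the kernel's struct sh_machine_vector or its real boot-time code. Consult include/asm-sh/machvec.h for the actual layout.

```c
#include <stddef.h>

/* Hypothetical, cut-down stand-in for struct sh_machine_vector. */
struct toy_machine_vector {
	const char *mv_name;
	unsigned char (*mv_inb)(unsigned long port);
	void (*mv_heartbeat)(void);
};

/* Generic routines used when a board leaves a pointer unset. */
static unsigned char generic_inb(unsigned long port)
{
	(void)port;
	return 0xff;	/* pretend nothing answers on the bus */
}

static void generic_heartbeat(void)
{
	/* nothing to blink on a generic machine */
}

/* Board-specific routine, named with the board prefix. */
static unsigned char vapor_inb(unsigned long port)
{
	return (unsigned char)(port & 0xff);
}

/* A sparse machvec: only the name and one I/O routine are provided. */
static struct toy_machine_vector mv_vapor = {
	.mv_name = "vapor",
	.mv_inb  = vapor_inb,
	/* mv_heartbeat deliberately left NULL */
};

/* Fill every unset pointer with its generic counterpart. */
static void toy_fill_machvec(struct toy_machine_vector *mv)
{
	if (!mv->mv_inb)
		mv->mv_inb = generic_inb;
	if (!mv->mv_heartbeat)
		mv->mv_heartbeat = generic_heartbeat;
}
```

After toy_fill_machvec(&mv_vapor) runs, callers can invoke every pointer unconditionally, which is the property the text relies on when it says machvec functions are referenced unconditionally throughout the tree.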
diff --git a/Documentation/sh/register-banks.txt b/Documentation/sh/register-banks.txt new file mode 100644 index 000000000000..a6719f2f6594 --- /dev/null +++ b/Documentation/sh/register-banks.txt | |||
@@ -0,0 +1,33 @@ | |||
1 | Notes on register bank usage in the kernel | ||
2 | ========================================== | ||
3 | |||
4 | Introduction | ||
5 | ------------ | ||
6 | |||
7 | The SH-3 and SH-4 CPU families traditionally include a single partial register | ||
8 | bank (selected by SR.RB, only r0 ... r7 are banked), whereas other families | ||
9 | may have more full-featured banking or simply no such capabilities at all. | ||
10 | |||
11 | SR.RB banking | ||
12 | ------------- | ||
13 | |||
14 | In the case of this type of banking, banked registers are mapped directly to | ||
15 | r0 ... r7 if SR.RB is set to the bank we are interested in, otherwise ldc/stc | ||
16 | can still be used to reference the banked registers (as r0_bank ... r7_bank) | ||
17 | when in the context of another bank. The developer must keep the SR.RB value | ||
18 | in mind when writing code that utilizes these banked registers, for obvious | ||
19 | reasons. Userspace is also not able to poke at the bank1 values, so these can | ||
20 | be used rather effectively as scratch registers by the kernel. | ||
21 | |||
22 | Presently the kernel uses several of these registers. | ||
23 | |||
24 | - r0_bank, r1_bank (referenced as k0 and k1, used for scratch | ||
25 | registers when doing exception handling). | ||
26 | - r2_bank (used to track the EXPEVT/INTEVT code) | ||
27 | - Used by do_IRQ() and friends for doing irq mapping based off | ||
28 | of the interrupt exception vector jump table offset | ||
29 | - r6_bank (global interrupt mask) | ||
30 | - The SR.IMASK interrupt handler makes use of this to set the | ||
31 | interrupt priority level (used by local_irq_enable()) | ||
32 | - r7_bank (current) | ||
33 | |||
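The SR.RB banking rules in register-banks.txt can be modelled with a small userspace C sketch. It is a toy model, not kernel code: struct toy_cpu, toy_reg() and toy_reg_bank() are hypothetical names, and the position of SR.RB at bit 29 is an assumption taken from the SH-4 register layout that should be checked against the CPU manual for the part in question.

```c
#include <stdint.h>

#define TOY_SR_RB (1u << 29)	/* assumed position of SR.RB */

struct toy_cpu {
	uint32_t sr;
	uint32_t bank[2][8];	/* r0..r7, one copy per bank */
	uint32_t r8_15[8];	/* r8..r15 are not banked */
};

/* Plain rN access (n < 16): r0..r7 come from the bank SR.RB selects. */
static uint32_t *toy_reg(struct toy_cpu *cpu, unsigned int n)
{
	unsigned int active = (cpu->sr & TOY_SR_RB) ? 1 : 0;

	return n < 8 ? &cpu->bank[active][n] : &cpu->r8_15[n - 8];
}

/* rN_bank access (n < 8), as reached by ldc/stc: the inactive bank. */
static uint32_t *toy_reg_bank(struct toy_cpu *cpu, unsigned int n)
{
	unsigned int active = (cpu->sr & TOY_SR_RB) ? 1 : 0;

	return &cpu->bank[!active][n];
}
```

A value written to r2 while SR.RB is set is no longer visible as r2 once SR.RB is cleared, but it can still be reached as r2_bank. That is why code touching these registers has to keep the current SR.RB value in mind.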
diff --git a/Documentation/sound/alsa/ALSA-Configuration.txt b/Documentation/sound/alsa/ALSA-Configuration.txt index 87d76a5c73d0..e6b57dd46a4f 100644 --- a/Documentation/sound/alsa/ALSA-Configuration.txt +++ b/Documentation/sound/alsa/ALSA-Configuration.txt | |||
@@ -472,6 +472,22 @@ Prior to version 0.9.0rc4 options had a 'snd_' prefix. This was removed. | |||
472 | 472 | ||
473 | The power-management is supported. | 473 | The power-management is supported. |
474 | 474 | ||
475 | Module snd-darla20 | ||
476 | ------------------ | ||
477 | |||
478 | Module for Echoaudio Darla20 | ||
479 | |||
480 | This module supports multiple cards. | ||
481 | The driver requires the firmware loader support on kernel. | ||
482 | |||
483 | Module snd-darla24 | ||
484 | ------------------ | ||
485 | |||
486 | Module for Echoaudio Darla24 | ||
487 | |||
488 | This module supports multiple cards. | ||
489 | The driver requires the firmware loader support on kernel. | ||
490 | |||
475 | Module snd-dt019x | 491 | Module snd-dt019x |
476 | ----------------- | 492 | ----------------- |
477 | 493 | ||
@@ -499,6 +515,14 @@ Prior to version 0.9.0rc4 options had a 'snd_' prefix. This was removed. | |||
499 | 515 | ||
500 | The power-management is supported. | 516 | The power-management is supported. |
501 | 517 | ||
518 | Module snd-echo3g | ||
519 | ----------------- | ||
520 | |||
521 | Module for Echoaudio 3G cards (Gina3G/Layla3G) | ||
522 | |||
523 | This module supports multiple cards. | ||
524 | The driver requires the firmware loader support on kernel. | ||
525 | |||
502 | Module snd-emu10k1 | 526 | Module snd-emu10k1 |
503 | ------------------ | 527 | ------------------ |
504 | 528 | ||
@@ -657,6 +681,22 @@ Prior to version 0.9.0rc4 options had a 'snd_' prefix. This was removed. | |||
657 | 681 | ||
658 | The power-management is supported. | 682 | The power-management is supported. |
659 | 683 | ||
684 | Module snd-gina20 | ||
685 | ----------------- | ||
686 | |||
687 | Module for Echoaudio Gina20 | ||
688 | |||
689 | This module supports multiple cards. | ||
690 | The driver requires the firmware loader support on kernel. | ||
691 | |||
692 | Module snd-gina24 | ||
693 | ----------------- | ||
694 | |||
695 | Module for Echoaudio Gina24 | ||
696 | |||
697 | This module supports multiple cards. | ||
698 | The driver requires the firmware loader support on kernel. | ||
699 | |||
660 | Module snd-gusclassic | 700 | Module snd-gusclassic |
661 | --------------------- | 701 | --------------------- |
662 | 702 | ||
@@ -718,6 +758,7 @@ Prior to version 0.9.0rc4 options had a 'snd_' prefix. This was removed. | |||
718 | position_fix - Fix DMA pointer (0 = auto, 1 = none, 2 = POSBUF, 3 = FIFO size) | 758 | position_fix - Fix DMA pointer (0 = auto, 1 = none, 2 = POSBUF, 3 = FIFO size) |
719 | single_cmd - Use single immediate commands to communicate with | 759 | single_cmd - Use single immediate commands to communicate with |
720 | codecs (for debugging only) | 760 | codecs (for debugging only) |
761 | disable_msi - Disable Message Signaled Interrupt (MSI) | ||
721 | 762 | ||
722 | This module supports one card and autoprobe. | 763 | This module supports one card and autoprobe. |
723 | 764 | ||
@@ -738,11 +779,16 @@ Prior to version 0.9.0rc4 options had a 'snd_' prefix. This was removed. | |||
738 | 6stack-digout 6-jack with a SPDIF out | 779 | 6stack-digout 6-jack with a SPDIF out |
739 | w810 3-jack | 780 | w810 3-jack |
740 | z71v 3-jack (HP shared SPDIF) | 781 | z71v 3-jack (HP shared SPDIF) |
741 | asus 3-jack | 782 | asus 3-jack (ASUS Mobo) |
783 | asus-w1v ASUS W1V | ||
784 | asus-dig ASUS with SPDIF out | ||
785 | asus-dig2 ASUS with SPDIF out (using GPIO2) | ||
742 | uniwill 3-jack | 786 | uniwill 3-jack |
743 | F1734 2-jack | 787 | F1734 2-jack |
744 | lg LG laptop (m1 express dual) | 788 | lg LG laptop (m1 express dual) |
745 | lg-lw LG LW20 laptop | 789 | lg-lw LG LW20/LW25 laptop |
790 | tcl TCL S700 | ||
791 | clevo Clevo laptops (m520G, m665n) | ||
746 | test for testing/debugging purpose, almost all controls can be | 792 | test for testing/debugging purpose, almost all controls can be |
747 | adjusted. Appearing only when compiled with | 793 | adjusted. Appearing only when compiled with |
748 | $CONFIG_SND_DEBUG=y | 794 | $CONFIG_SND_DEBUG=y |
@@ -750,6 +796,7 @@ Prior to version 0.9.0rc4 options had a 'snd_' prefix. This was removed. | |||
750 | 796 | ||
751 | ALC260 | 797 | ALC260 |
752 | hp HP machines | 798 | hp HP machines |
799 | hp-3013 HP machines (3013-variant) | ||
753 | fujitsu Fujitsu S7020 | 800 | fujitsu Fujitsu S7020 |
754 | acer Acer TravelMate | 801 | acer Acer TravelMate |
755 | basic fixed pin assignment (old default model) | 802 | basic fixed pin assignment (old default model) |
@@ -757,18 +804,32 @@ Prior to version 0.9.0rc4 options had a 'snd_' prefix. This was removed. | |||
757 | 804 | ||
758 | ALC262 | 805 | ALC262 |
759 | fujitsu Fujitsu Laptop | 806 | fujitsu Fujitsu Laptop |
807 | hp-bpc HP xw4400/6400/8400/9400 laptops | ||
808 | benq Benq ED8 | ||
760 | basic fixed pin assignment w/o SPDIF | 809 | basic fixed pin assignment w/o SPDIF |
761 | auto auto-config reading BIOS (default) | 810 | auto auto-config reading BIOS (default) |
762 | 811 | ||
763 | ALC882/883/885 | 812 | ALC882/885 |
764 | 3stack-dig 3-jack with SPDIF I/O | 813 | 3stack-dig 3-jack with SPDIF I/O |
765 | 6stack-dig 6-jack digital with SPDIF I/O | 814 | 6stack-dig 6-jack digital with SPDIF I/O |
815 | arima Arima W820Di1 | ||
816 | auto auto-config reading BIOS (default) | ||
817 | |||
818 | ALC883/888 | ||
819 | 3stack-dig 3-jack with SPDIF I/O | ||
820 | 6stack-dig 6-jack digital with SPDIF I/O | ||
821 | 3stack-6ch 3-jack 6-channel | ||
822 | 3stack-6ch-dig 3-jack 6-channel with SPDIF I/O | ||
823 | 6stack-dig-demo 6-jack digital for Intel demo board | ||
824 | acer Acer laptops (Travelmate 3012WTMi, Aspire 5600, etc) | ||
766 | auto auto-config reading BIOS (default) | 825 | auto auto-config reading BIOS (default) |
767 | 826 | ||
768 | ALC861 | 827 | ALC861/660 |
769 | 3stack 3-jack | 828 | 3stack 3-jack |
770 | 3stack-dig 3-jack with SPDIF I/O | 829 | 3stack-dig 3-jack with SPDIF I/O |
771 | 6stack-dig 6-jack with SPDIF I/O | 830 | 6stack-dig 6-jack with SPDIF I/O |
831 | 3stack-660 3-jack (for ALC660) | ||
832 | uniwill-m31 Uniwill M31 laptop | ||
772 | auto auto-config reading BIOS (default) | 833 | auto auto-config reading BIOS (default) |
773 | 834 | ||
774 | CMI9880 | 835 | CMI9880 |
@@ -797,10 +858,21 @@ Prior to version 0.9.0rc4 options had a 'snd_' prefix. This was removed. | |||
797 | 3stack-dig ditto with SPDIF | 858 | 3stack-dig ditto with SPDIF |
798 | laptop 3-jack with hp-jack automute | 859 | laptop 3-jack with hp-jack automute |
799 | laptop-dig ditto with SPDIF | 860 | laptop-dig ditto with SPDIF |
800 | auto auto-confgi reading BIOS (default) | 861 | auto auto-config reading BIOS (default) |
862 | |||
863 | STAC9200/9205/9220/9221/9254 | ||
864 | ref Reference board | ||
865 | 3stack D945 3stack | ||
866 | 5stack D945 5stack + SPDIF | ||
867 | |||
868 | STAC9227/9228/9229/927x | ||
869 | ref Reference board | ||
870 | 3stack D965 3stack | ||
871 | 5stack D965 5stack + SPDIF | ||
801 | 872 | ||
802 | STAC7661(?) | 873 | STAC9872 |
803 | vaio Setup for VAIO FE550G/SZ110 | 874 | vaio Setup for VAIO FE550G/SZ110 |
875 | vaio-ar Setup for VAIO AR | ||
804 | 876 | ||
805 | If the default configuration doesn't work and one of the above | 877 | If the default configuration doesn't work and one of the above |
806 | matches with your device, report it together with the PCI | 878 | matches with your device, report it together with the PCI |
@@ -937,6 +1009,30 @@ Prior to version 0.9.0rc4 options had a 'snd_' prefix. This was removed. | |||
937 | driver isn't configured properly or you want to try another | 1009 | driver isn't configured properly or you want to try another |
938 | type for testing. | 1010 | type for testing. |
939 | 1011 | ||
1012 | Module snd-indigo | ||
1013 | ----------------- | ||
1014 | |||
1015 | Module for Echoaudio Indigo | ||
1016 | |||
1017 | This module supports multiple cards. | ||
1018 | The driver requires the firmware loader support on kernel. | ||
1019 | |||
1020 | Module snd-indigodj | ||
1021 | ------------------- | ||
1022 | |||
1023 | Module for Echoaudio Indigo DJ | ||
1024 | |||
1025 | This module supports multiple cards. | ||
1026 | The driver requires the firmware loader support on kernel. | ||
1027 | |||
1028 | Module snd-indigoio | ||
1029 | ------------------- | ||
1030 | |||
1031 | Module for Echoaudio Indigo IO | ||
1032 | |||
1033 | This module supports multiple cards. | ||
1034 | The driver requires the firmware loader support on kernel. | ||
1035 | |||
940 | Module snd-intel8x0 | 1036 | Module snd-intel8x0 |
941 | ------------------- | 1037 | ------------------- |
942 | 1038 | ||
@@ -1036,6 +1132,22 @@ Prior to version 0.9.0rc4 options had a 'snd_' prefix. This was removed. | |||
1036 | 1132 | ||
1037 | This module supports multiple cards. | 1133 | This module supports multiple cards. |
1038 | 1134 | ||
1135 | Module snd-layla20 | ||
1136 | ------------------ | ||
1137 | |||
1138 | Module for Echoaudio Layla20 | ||
1139 | |||
1140 | This module supports multiple cards. | ||
1141 | The driver requires the firmware loader support on kernel. | ||
1142 | |||
1143 | Module snd-layla24 | ||
1144 | ------------------ | ||
1145 | |||
1146 | Module for Echoaudio Layla24 | ||
1147 | |||
1148 | This module supports multiple cards. | ||
1149 | The driver requires the firmware loader support on kernel. | ||
1150 | |||
1039 | Module snd-maestro3 | 1151 | Module snd-maestro3 |
1040 | ------------------- | 1152 | ------------------- |
1041 | 1153 | ||
@@ -1056,6 +1168,14 @@ Prior to version 0.9.0rc4 options had a 'snd_' prefix. This was removed. | |||
1056 | 1168 | ||
1057 | The power-management is supported. | 1169 | The power-management is supported. |
1058 | 1170 | ||
1171 | Module snd-mia | ||
1172 | --------------- | ||
1173 | |||
1174 | Module for Echoaudio Mia | ||
1175 | |||
1176 | This module supports multiple cards. | ||
1177 | The driver requires the firmware loader support on kernel. | ||
1178 | |||
1059 | Module snd-miro | 1179 | Module snd-miro |
1060 | --------------- | 1180 | --------------- |
1061 | 1181 | ||
@@ -1088,6 +1208,14 @@ Prior to version 0.9.0rc4 options had a 'snd_' prefix. This was removed. | |||
1088 | When no hotplug fw loader is available, you need to load the | 1208 | When no hotplug fw loader is available, you need to load the |
1089 | firmware via mixartloader utility in alsa-tools package. | 1209 | firmware via mixartloader utility in alsa-tools package. |
1090 | 1210 | ||
1211 | Module snd-mona | ||
1212 | --------------- | ||
1213 | |||
1214 | Module for Echoaudio Mona | ||
1215 | |||
1216 | This module supports multiple cards. | ||
1217 | The driver requires the firmware loader support on kernel. | ||
1218 | |||
1091 | Module snd-mpu401 | 1219 | Module snd-mpu401 |
1092 | ----------------- | 1220 | ----------------- |
1093 | 1221 | ||
@@ -1111,6 +1239,14 @@ Prior to version 0.9.0rc4 options had a 'snd_' prefix. This was removed. | |||
1111 | 1239 | ||
1112 | Module supports only 1 card. This module has no enable option. | 1240 | Module supports only 1 card. This module has no enable option. |
1113 | 1241 | ||
1242 | Module snd-mts64 | ||
1243 | ---------------- | ||
1244 | |||
1245 | Module for Ego Systems (ESI) Miditerminal 4140 | ||
1246 | |||
1247 | This module supports multiple devices. | ||
1248 | Requires parport (CONFIG_PARPORT). | ||
1249 | |||
1114 | Module snd-nm256 | 1250 | Module snd-nm256 |
1115 | ---------------- | 1251 | ---------------- |
1116 | 1252 | ||
diff --git a/Documentation/sound/alsa/DocBook/writing-an-alsa-driver.tmpl b/Documentation/sound/alsa/DocBook/writing-an-alsa-driver.tmpl index 635cbb94357c..4807ef79a94d 100644 --- a/Documentation/sound/alsa/DocBook/writing-an-alsa-driver.tmpl +++ b/Documentation/sound/alsa/DocBook/writing-an-alsa-driver.tmpl | |||
@@ -1054,9 +1054,8 @@ | |||
1054 | 1054 | ||
1055 | <para> | 1055 | <para> |
1056 | For a device which allows hotplugging, you can use | 1056 | For a device which allows hotplugging, you can use |
1057 | <function>snd_card_free_in_thread</function>. This one will | 1057 | <function>snd_card_free_when_closed</function>. This one will |
1058 | postpone the destruction and wait in a kernel-thread until all | 1058 | postpone the destruction until all devices are closed. |
1059 | devices are closed. | ||
1060 | </para> | 1059 | </para> |
1061 | 1060 | ||
1062 | </section> | 1061 | </section> |
@@ -1149,7 +1148,7 @@ | |||
1149 | } | 1148 | } |
1150 | chip->port = pci_resource_start(pci, 0); | 1149 | chip->port = pci_resource_start(pci, 0); |
1151 | if (request_irq(pci->irq, snd_mychip_interrupt, | 1150 | if (request_irq(pci->irq, snd_mychip_interrupt, |
1152 | SA_INTERRUPT|SA_SHIRQ, "My Chip", chip)) { | 1151 | IRQF_DISABLED|IRQF_SHARED, "My Chip", chip)) { |
1153 | printk(KERN_ERR "cannot grab irq %d\n", pci->irq); | 1152 | printk(KERN_ERR "cannot grab irq %d\n", pci->irq); |
1154 | snd_mychip_free(chip); | 1153 | snd_mychip_free(chip); |
1155 | return -EBUSY; | 1154 | return -EBUSY; |
@@ -1172,7 +1171,7 @@ | |||
1172 | } | 1171 | } |
1173 | 1172 | ||
1174 | /* PCI IDs */ | 1173 | /* PCI IDs */ |
1175 | static struct pci_device_id snd_mychip_ids[] __devinitdata = { | 1174 | static struct pci_device_id snd_mychip_ids[] = { |
1176 | { PCI_VENDOR_ID_FOO, PCI_DEVICE_ID_BAR, | 1175 | { PCI_VENDOR_ID_FOO, PCI_DEVICE_ID_BAR, |
1177 | PCI_ANY_ID, PCI_ANY_ID, 0, 0, 0, }, | 1176 | PCI_ANY_ID, PCI_ANY_ID, 0, 0, 0, }, |
1178 | .... | 1177 | .... |
@@ -1323,7 +1322,7 @@ | |||
1323 | <programlisting> | 1322 | <programlisting> |
1324 | <![CDATA[ | 1323 | <![CDATA[ |
1325 | if (request_irq(pci->irq, snd_mychip_interrupt, | 1324 | if (request_irq(pci->irq, snd_mychip_interrupt, |
1326 | SA_INTERRUPT|SA_SHIRQ, "My Chip", chip)) { | 1325 | IRQF_DISABLED|IRQF_SHARED, "My Chip", chip)) { |
1327 | printk(KERN_ERR "cannot grab irq %d\n", pci->irq); | 1326 | printk(KERN_ERR "cannot grab irq %d\n", pci->irq); |
1328 | snd_mychip_free(chip); | 1327 | snd_mychip_free(chip); |
1329 | return -EBUSY; | 1328 | return -EBUSY; |
@@ -1342,7 +1341,7 @@ | |||
1342 | 1341 | ||
1343 | <para> | 1342 | <para> |
1344 | On the PCI bus, the interrupts can be shared. Thus, | 1343 | On the PCI bus, the interrupts can be shared. Thus, |
1345 | <constant>SA_SHIRQ</constant> is given as the interrupt flag of | 1344 | <constant>IRQF_SHARED</constant> is given as the interrupt flag of |
1346 | <function>request_irq()</function>. | 1345 | <function>request_irq()</function>. |
1347 | </para> | 1346 | </para> |
1348 | 1347 | ||
@@ -1565,7 +1564,7 @@ | |||
1565 | <informalexample> | 1564 | <informalexample> |
1566 | <programlisting> | 1565 | <programlisting> |
1567 | <![CDATA[ | 1566 | <![CDATA[ |
1568 | static struct pci_device_id snd_mychip_ids[] __devinitdata = { | 1567 | static struct pci_device_id snd_mychip_ids[] = { |
1569 | { PCI_VENDOR_ID_FOO, PCI_DEVICE_ID_BAR, | 1568 | { PCI_VENDOR_ID_FOO, PCI_DEVICE_ID_BAR, |
1570 | PCI_ANY_ID, PCI_ANY_ID, 0, 0, 0, }, | 1569 | PCI_ANY_ID, PCI_ANY_ID, 0, 0, 0, }, |
1571 | .... | 1570 | .... |
@@ -3048,7 +3047,7 @@ struct _snd_pcm_runtime { | |||
3048 | </para> | 3047 | </para> |
3049 | 3048 | ||
3050 | <para> | 3049 | <para> |
3051 | If you aquire a spinlock in the interrupt handler, and the | 3050 | If you acquire a spinlock in the interrupt handler, and the |
3052 | lock is used in other pcm callbacks, too, then you have to | 3051 | lock is used in other pcm callbacks, too, then you have to |
3053 | release the lock before calling | 3052 | release the lock before calling |
3054 | <function>snd_pcm_period_elapsed()</function>, because | 3053 | <function>snd_pcm_period_elapsed()</function>, because |
diff --git a/Documentation/sparc/sbus_drivers.txt b/Documentation/sparc/sbus_drivers.txt index 876195dc2aef..4b9351624f13 100644 --- a/Documentation/sparc/sbus_drivers.txt +++ b/Documentation/sparc/sbus_drivers.txt | |||
@@ -25,42 +25,84 @@ the bits necessary to run your device. The most commonly | |||
25 | used members of this structure, and their typical usage, | 25 | used members of this structure, and their typical usage, |
26 | will be detailed below. | 26 | will be detailed below. |
27 | 27 | ||
28 | Here is how probing is performed by an SBUS driver | 28 | Here is a piece of skeleton code for performing a device |
29 | under Linux: | 29 | probe in an SBUS driver under Linux: |
30 | 30 | ||
31 | static void init_one_mydevice(struct sbus_dev *sdev) | 31 | static int __devinit mydevice_probe_one(struct sbus_dev *sdev) |
32 | { | 32 | { |
33 | struct mydevice *mp = kzalloc(sizeof(*mp), GFP_KERNEL); | ||
34 | |||
35 | if (!mp) | ||
36 | return -ENOMEM; | ||
37 | |||
38 | ... | ||
39 | dev_set_drvdata(&sdev->ofdev.dev, mp); | ||
40 | return 0; | ||
33 | ... | 41 | ... |
34 | } | 42 | } |
35 | 43 | ||
36 | static int mydevice_match(struct sbus_dev *sdev) | 44 | static int __devinit mydevice_probe(struct of_device *dev, |
45 | const struct of_device_id *match) | ||
37 | { | 46 | { |
38 | if (some_criteria(sdev)) | 47 | struct sbus_dev *sdev = to_sbus_device(&dev->dev); |
39 | return 1; | 48 | |
40 | return 0; | 49 | return mydevice_probe_one(sdev); |
41 | } | 50 | } |
42 | 51 | ||
43 | static void mydevice_probe(void) | 52 | static int __devexit mydevice_remove(struct of_device *dev) |
44 | { | 53 | { |
45 | struct sbus_bus *sbus; | 54 | struct sbus_dev *sdev = to_sbus_device(&dev->dev); |
46 | struct sbus_dev *sdev; | 55 | struct mydevice *mp = dev_get_drvdata(&dev->dev); |
47 | 56 | ||
48 | for_each_sbus(sbus) { | 57 | return mydevice_remove_one(sdev, mp); |
49 | for_each_sbusdev(sdev, sbus) { | ||
50 | if (mydevice_match(sdev)) | ||
51 | init_one_mydevice(sdev); | ||
52 | } | ||
53 | } | ||
54 | } | 58 | } |
55 | 59 | ||
56 | All this does is walk through all SBUS devices in the | 60 | static struct of_device_id mydevice_match[] = { |
57 | system, checks each to see if it is of the type which | 61 | { |
58 | your driver is written for, and if so it calls the init | 62 | .name = "mydevice", |
59 | routine to attach the device and prepare to drive it. | 63 | }, |
64 | {}, | ||
65 | }; | ||
66 | |||
67 | MODULE_DEVICE_TABLE(of, mydevice_match); | ||
60 | 68 | ||
61 | "init_one_mydevice" might do things like allocate software | 69 | static struct of_platform_driver mydevice_driver = { |
62 | state structures, map in I/O registers, place the hardware | 70 | .name = "mydevice", |
63 | into an initialized state, etc. | 71 | .match_table = mydevice_match, |
72 | .probe = mydevice_probe, | ||
73 | .remove = __devexit_p(mydevice_remove), | ||
74 | }; | ||
75 | |||
76 | static int __init mydevice_init(void) | ||
77 | { | ||
78 | return of_register_driver(&mydevice_driver, &sbus_bus_type); | ||
79 | } | ||
80 | |||
81 | static void __exit mydevice_exit(void) | ||
82 | { | ||
83 | of_unregister_driver(&mydevice_driver); | ||
84 | } | ||
85 | |||
86 | module_init(mydevice_init); | ||
87 | module_exit(mydevice_exit); | ||
88 | |||
89 | The mydevice_match table is a series of entries which | ||
90 | describes what SBUS devices your driver is meant for. In the | ||
91 | simplest case you specify a string for the 'name' field. Every | ||
92 | SBUS device with a 'name' property matching your string will | ||
93 | be passed one-by-one to your .probe method. | ||
94 | |||
95 | You should store away your device private state structure | ||
96 | pointer in the drvdata area so that you can retrieve it later on | ||
97 | in your .remove method. | ||
98 | |||
99 | Any memory allocated, registers mapped, IRQs registered, | ||
100 | etc. must be undone by your .remove method so that all resources | ||
101 | of your device are released by the time it returns. | ||
102 | |||
103 | You should _NOT_ use the for_each_sbus(), for_each_sbusdev(), | ||
104 | and for_all_sbusdev() interfaces. They are deprecated, will be | ||
105 | removed, and no new driver should reference them ever. | ||
64 | 106 | ||
65 | Mapping and Accessing I/O Registers | 107 | Mapping and Accessing I/O Registers |
66 | 108 | ||
@@ -263,10 +305,3 @@ discussed above and plus it handles both PCI and SBUS boards. | |||
263 | Lance driver abuses consistent mappings for data transfer. | 305 | Lance driver abuses consistent mappings for data transfer. |
264 | It is a nifty trick which we do not particularly recommend... | 306 | It is a nifty trick which we do not particularly recommend... |
265 | Just check it out and know that it's legal. | 307 | Just check it out and know that it's legal. |
266 | |||
267 | Bad examples, do NOT use | ||
268 | |||
269 | drivers/video/cgsix.c | ||
270 | This one uses result of sbus_ioremap as if it is an address. | ||
271 | This does NOT work on sparc64 and therefore is broken. We will | ||
272 | convert it at a later date. | ||
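The match-table flow described in the sbus_drivers.txt hunk above (each device whose 'name' matches a table entry is handed to .probe, which stores its private state in drvdata) can be sketched in userspace C. All toy_* types and functions here are hypothetical stand-ins for illustration. They are not the of_device or of_platform_driver interfaces, and the real bus code does more than compare names.

```c
#include <stddef.h>
#include <string.h>

struct toy_device_id {
	const char *name;	/* NULL marks the end of the table */
};

struct toy_device {
	const char *name;	/* stands in for the 'name' property */
	void *drvdata;		/* driver private state pointer */
};

struct toy_driver {
	const struct toy_device_id *match_table;
	int (*probe)(struct toy_device *dev,
		     const struct toy_device_id *match);
};

/* Return the first table entry whose name equals the device name. */
static const struct toy_device_id *
toy_match(const struct toy_device_id *table, const struct toy_device *dev)
{
	for (; table->name; table++)
		if (strcmp(table->name, dev->name) == 0)
			return table;
	return NULL;
}

/* Offer every device to the driver; return how many were bound. */
static int toy_bind_all(const struct toy_driver *drv,
			struct toy_device *devs, size_t ndevs)
{
	int bound = 0;
	size_t i;

	for (i = 0; i < ndevs; i++) {
		const struct toy_device_id *id;

		id = toy_match(drv->match_table, &devs[i]);
		if (id && drv->probe(&devs[i], id) == 0)
			bound++;
	}
	return bound;
}

static int mydevice_state;	/* stand-in for per-device private state */

static int mydevice_probe(struct toy_device *dev,
			  const struct toy_device_id *match)
{
	(void)match;
	dev->drvdata = &mydevice_state;	/* retrieved later by .remove */
	return 0;
}

static const struct toy_device_id mydevice_match[] = {
	{ .name = "mydevice" },
	{ NULL },
};
```

The empty terminating entry plays the same role as the {} sentinel in the skeleton above: the walker stops there, so the table needs no separate length.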
diff --git a/Documentation/sparse.txt b/Documentation/sparse.txt index 5a311c38dd1a..f9c99c9a54f9 100644 --- a/Documentation/sparse.txt +++ b/Documentation/sparse.txt | |||
@@ -69,10 +69,10 @@ recompiled, or use "make C=2" to run sparse on the files whether they need to | |||
69 | be recompiled or not. The latter is a fast way to check the whole tree if you | 69 | be recompiled or not. The latter is a fast way to check the whole tree if you |
70 | have already built it. | 70 | have already built it. |
71 | 71 | ||
72 | The optional make variable CF can be used to pass arguments to sparse. The | 72 | The optional make variable CHECKFLAGS can be used to pass arguments to sparse. |
73 | build system passes -Wbitwise to sparse automatically. To perform endianness | 73 | The build system passes -Wbitwise to sparse automatically. To perform |
74 | checks, you may define __CHECK_ENDIAN__: | 74 | endianness checks, you may define __CHECK_ENDIAN__: |
75 | 75 | ||
76 | make C=2 CF="-D__CHECK_ENDIAN__" | 76 | make C=2 CHECKFLAGS="-D__CHECK_ENDIAN__" |
77 | 77 | ||
78 | These checks are disabled by default as they generate a host of warnings. | 78 | These checks are disabled by default as they generate a host of warnings. |
diff --git a/Documentation/sysctl/fs.txt b/Documentation/sysctl/fs.txt index 0b62c62142cf..5c3a51905969 100644 --- a/Documentation/sysctl/fs.txt +++ b/Documentation/sysctl/fs.txt | |||
@@ -25,6 +25,7 @@ Currently, these files are in /proc/sys/fs: | |||
25 | - inode-state | 25 | - inode-state |
26 | - overflowuid | 26 | - overflowuid |
27 | - overflowgid | 27 | - overflowgid |
28 | - suid_dumpable | ||
28 | - super-max | 29 | - super-max |
29 | - super-nr | 30 | - super-nr |
30 | 31 | ||
@@ -131,6 +132,25 @@ The default is 65534. | |||
131 | 132 | ||
132 | ============================================================== | 133 | ============================================================== |
133 | 134 | ||
135 | suid_dumpable: | ||
136 | |||
137 | This value can be used to query and set the core dump mode for setuid | ||
138 | or otherwise protected/tainted binaries. The modes are | ||
139 | |||
140 | 0 - (default) - traditional behaviour. Any process which has changed | ||
141 | privilege levels or is execute only will not be dumped | ||
142 | 1 - (debug) - all processes dump core when possible. The core dump is | ||
143 | owned by the current user and no security is applied. This is | ||
144 | intended for system debugging situations only. Ptrace is unchecked. | ||
145 | 2 - (suidsafe) - any binary which normally would not be dumped is dumped | ||
146 | readable by root only. This allows the end user to remove | ||
147 | such a dump but not access it directly. For security reasons | ||
148 | core dumps in this mode will not overwrite one another or | ||
149 | other files. This mode is appropriate when administrators are | ||
150 | attempting to debug problems in a normal environment. | ||
151 | |||
152 | ============================================================== | ||
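As a rough illustration of the three modes listed above, the following sketch maps a suid_dumpable value to its mode name and reads the value from a file. The helper names (suid_dumpable_mode_name, read_suid_dumpable) are made up for this example and are not part of any kernel or libc interface; the path is passed in by the caller.

```c
#include <stdio.h>

/* Map a suid_dumpable value to the mode name used in the text above.
 * Returns NULL for values the text does not describe.
 * (Hypothetical helper, for illustration only.) */
const char *suid_dumpable_mode_name(int mode)
{
    switch (mode) {
    case 0:  return "default";
    case 1:  return "debug";
    case 2:  return "suidsafe";
    default: return NULL;
    }
}

/* Read an integer mode from the given sysctl file.
 * Returns -1 if the file cannot be opened or parsed.
 * (Hypothetical helper; the path is supplied by the caller.) */
int read_suid_dumpable(const char *path)
{
    FILE *f = fopen(path, "r");
    int mode;

    if (!f)
        return -1;
    if (fscanf(f, "%d", &mode) != 1)
        mode = -1;
    fclose(f);
    return mode;
}
```

With this patch applied the file would be expected at /proc/sys/fs/suid_dumpable, so a caller could pass that path to read_suid_dumpable() and feed the result to suid_dumpable_mode_name().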
153 | |||
134 | super-max & super-nr: | 154 | super-max & super-nr: |
135 | 155 | ||
136 | These numbers control the maximum number of superblocks, and | 156 | These numbers control the maximum number of superblocks, and |
diff --git a/Documentation/sysctl/kernel.txt b/Documentation/sysctl/kernel.txt index b0c7ab93dcb9..89bf8c20a586 100644 --- a/Documentation/sysctl/kernel.txt +++ b/Documentation/sysctl/kernel.txt | |||
@@ -50,7 +50,6 @@ show up in /proc/sys/kernel: | |||
50 | - shmmax [ sysv ipc ] | 50 | - shmmax [ sysv ipc ] |
51 | - shmmni | 51 | - shmmni |
52 | - stop-a [ SPARC only ] | 52 | - stop-a [ SPARC only ] |
53 | - suid_dumpable | ||
54 | - sysrq ==> Documentation/sysrq.txt | 53 | - sysrq ==> Documentation/sysrq.txt |
55 | - tainted | 54 | - tainted |
56 | - threads-max | 55 | - threads-max |
@@ -211,9 +210,8 @@ Controls the kernel's behaviour when an oops or BUG is encountered. | |||
211 | 210 | ||
212 | 0: try to continue operation | 211 | 0: try to continue operation |
213 | 212 | ||
214 | 1: delay a few seconds (to give klogd time to record the oops output) and | 213 | 1: panic immediately. If the `panic' sysctl is also non-zero then the |
215 | then panic. If the `panic' sysctl is also non-zero then the machine will | 214 | machine will be rebooted. |
216 | be rebooted. | ||
217 | 215 | ||
218 | ============================================================== | 216 | ============================================================== |
219 | 217 | ||
@@ -311,25 +309,6 @@ kernel. This value defaults to SHMMAX. | |||
311 | 309 | ||
312 | ============================================================== | 310 | ============================================================== |
313 | 311 | ||
314 | suid_dumpable: | ||
315 | |||
316 | This value can be used to query and set the core dump mode for setuid | ||
317 | or otherwise protected/tainted binaries. The modes are | ||
318 | |||
319 | 0 - (default) - traditional behaviour. Any process which has changed | ||
320 | privilege levels or is execute only will not be dumped | ||
321 | 1 - (debug) - all processes dump core when possible. The core dump is | ||
322 | owned by the current user and no security is applied. This is | ||
323 | intended for system debugging situations only. Ptrace is unchecked. | ||
324 | 2 - (suidsafe) - any binary which normally would not be dumped is dumped | ||
325 | readable by root only. This allows the end user to remove | ||
326 | such a dump but not access it directly. For security reasons | ||
327 | core dumps in this mode will not overwrite one another or | ||
328 | other files. This mode is appropriate when administrators are | ||
329 | attempting to debug problems in a normal environment. | ||
330 | |||
331 | ============================================================== | ||
332 | |||
333 | tainted: | 312 | tainted: |
334 | 313 | ||
335 | Non-zero if the kernel has been tainted. Numeric values, which | 314 | Non-zero if the kernel has been tainted. Numeric values, which |
diff --git a/Documentation/sysctl/vm.txt b/Documentation/sysctl/vm.txt index 2dc246af4885..20d0d797f539 100644 --- a/Documentation/sysctl/vm.txt +++ b/Documentation/sysctl/vm.txt | |||
@@ -28,7 +28,8 @@ Currently, these files are in /proc/sys/vm: | |||
28 | - block_dump | 28 | - block_dump |
29 | - drop-caches | 29 | - drop-caches |
30 | - zone_reclaim_mode | 30 | - zone_reclaim_mode |
31 | - zone_reclaim_interval | 31 | - min_unmapped_ratio |
32 | - min_slab_ratio | ||
32 | - panic_on_oom | 33 | - panic_on_oom |
33 | 34 | ||
34 | ============================================================== | 35 | ============================================================== |
@@ -138,7 +139,6 @@ This is value ORed together of | |||
138 | 1 = Zone reclaim on | 139 | 1 = Zone reclaim on |
139 | 2 = Zone reclaim writes dirty pages out | 140 | 2 = Zone reclaim writes dirty pages out |
140 | 4 = Zone reclaim swaps pages | 141 | 4 = Zone reclaim swaps pages |
141 | 8 = Also do a global slab reclaim pass | ||
142 | 142 | ||
143 | zone_reclaim_mode is set during bootup to 1 if it is determined that pages | 143 | zone_reclaim_mode is set during bootup to 1 if it is determined that pages |
144 | from remote zones will cause a measurable performance reduction. The | 144 | from remote zones will cause a measurable performance reduction. The |
@@ -162,22 +162,36 @@ Allowing regular swap effectively restricts allocations to the local | |||
162 | node unless explicitly overridden by memory policies or cpuset | 162 | node unless explicitly overridden by memory policies or cpuset |
163 | configurations. | 163 | configurations. |
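Since zone_reclaim_mode is a set of values ORed together, a value read from the sysctl can be taken apart bit by bit. The sketch below assumes only the three values that remain listed after this change (1, 2 and 4); the macro and function names are invented for the example and are not the kernel's own identifiers.

```c
/* Bit values as listed in the text above; the names are hypothetical. */
#define ZR_MODE_ON     1   /* Zone reclaim on */
#define ZR_MODE_WRITE  2   /* Zone reclaim writes dirty pages out */
#define ZR_MODE_SWAP   4   /* Zone reclaim swaps pages */

/* Each helper returns 1 if the corresponding bit is set in mode, else 0. */
int zr_mode_on(int mode)     { return (mode & ZR_MODE_ON) != 0; }
int zr_mode_writes(int mode) { return (mode & ZR_MODE_WRITE) != 0; }
int zr_mode_swaps(int mode)  { return (mode & ZR_MODE_SWAP) != 0; }

/* Build a mode value from three yes/no choices by ORing the bits. */
int zr_mode_compose(int on, int writes, int swaps)
{
    return (on ? ZR_MODE_ON : 0) |
           (writes ? ZR_MODE_WRITE : 0) |
           (swaps ? ZR_MODE_SWAP : 0);
}
```

For instance, a value of 3 decodes as "zone reclaim on" plus "writes dirty pages out", with swapping left off.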
164 | 164 | ||
165 | It may be advisable to allow slab reclaim if the system makes heavy | 165 | ============================================================= |
166 | use of files and builds up large slab caches. However, the slab | 166 | |
167 | shrink operation is global, may take a long time and free slabs | 167 | min_unmapped_ratio: |
168 | in all nodes of the system. | 168 | |
169 | This is available only on NUMA kernels. | ||
170 | |||
171 | A percentage of the total pages in each zone. Zone reclaim will only | ||
172 | occur if more than this percentage of pages are file backed and unmapped. | ||
173 | This is to ensure that a minimal amount of local pages is still available for | ||
174 | file I/O even if the node is overallocated. | ||
175 | |||
176 | The default is 1 percent. | ||
177 | |||
178 | ============================================================= | ||
169 | 179 | ||
170 | ================================================================ | 180 | min_slab_ratio: |
171 | 181 | ||
172 | zone_reclaim_interval: | 182 | This is available only on NUMA kernels. |
173 | 183 | ||
174 | The time allowed for off node allocations after zone reclaim | 184 | A percentage of the total pages in each zone. On Zone reclaim |
175 | has failed to reclaim enough pages to allow a local allocation. | 185 | (fallback from the local zone occurs) slabs will be reclaimed if more |
186 | than this percentage of pages in a zone are reclaimable slab pages. | ||
187 | This ensures that the slab growth stays under control even in NUMA | ||
188 | systems that rarely perform global reclaim. | ||
176 | 189 | ||
177 | Time is set in seconds and set by default to 30 seconds. | 190 | The default is 5 percent. |
178 | 191 | ||
179 | Reduce the interval if undesired off node allocations occur. However, too | 192 | Note that slab reclaim is triggered in a per zone / node fashion. |
180 | frequent scans will have a negative impact on off node allocation performance. | 193 | The process of reclaiming slab memory is currently not node specific |
194 | and may not be fast. | ||
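Both min_unmapped_ratio and min_slab_ratio are described as a percentage of the total pages in a zone, so each corresponds to a page count per zone. The sketch below works through that arithmetic under the assumption that the threshold is simply zone pages times the ratio divided by 100, rounded down; the function names are hypothetical and the kernel's exact computation may differ.

```c
/* Convert a percentage of a zone's pages into a page count (rounded down).
 * Hypothetical helper; assumes threshold = zone_pages * percent / 100. */
unsigned long ratio_to_pages(unsigned long zone_pages, unsigned int percent)
{
    return zone_pages * percent / 100;
}

/* Per the min_unmapped_ratio text: zone reclaim only occurs if MORE than
 * this percentage of the zone's pages are file backed and unmapped. */
int unmapped_reclaim_allowed(unsigned long zone_pages,
                             unsigned long unmapped_file_pages,
                             unsigned int min_unmapped_ratio)
{
    return unmapped_file_pages >
           ratio_to_pages(zone_pages, min_unmapped_ratio);
}
```

As a worked case, a zone of 262144 pages (1 GB with 4 KB pages) gives a threshold of 2621 pages at the default min_unmapped_ratio of 1 percent, and 13107 pages at the default min_slab_ratio of 5 percent.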
181 | 195 | ||
182 | ============================================================= | 196 | ============================================================= |
183 | 197 | ||
diff --git a/Documentation/sysrq.txt b/Documentation/sysrq.txt index ad0bedf678b3..e0188a23fd5e 100644 --- a/Documentation/sysrq.txt +++ b/Documentation/sysrq.txt | |||
@@ -115,8 +115,9 @@ trojan program is running at console and which could grab your password | |||
115 | when you would try to login. It will kill all programs on given console | 115 | when you would try to login. It will kill all programs on given console |
116 | and thus letting you make sure that the login prompt you see is actually | 116 | and thus letting you make sure that the login prompt you see is actually |
117 | the one from init, not some trojan program. | 117 | the one from init, not some trojan program. |
118 | IMPORTANT:In its true form it is not a true SAK like the one in :IMPORTANT | 118 | IMPORTANT: In its true form it is not a true SAK like the one in a :IMPORTANT |
119 | IMPORTANT:c2 compliant systems, and it should be mistook as such. :IMPORTANT | 119 | IMPORTANT: c2 compliant system, and it should not be mistaken as :IMPORTANT |
120 | IMPORTANT: such. :IMPORTANT | ||
120 | It seems others find it useful as (System Attention Key) which is | 121 | It seems others find it useful as (System Attention Key) which is |
121 | useful when you want to exit a program that will not let you switch consoles. | 122 | useful when you want to exit a program that will not let you switch consoles. |
122 | (For example, X or a svgalib program.) | 123 | (For example, X or a svgalib program.) |
diff --git a/Documentation/tty.txt b/Documentation/tty.txt index 8ff7bc2a0811..dab56604745d 100644 --- a/Documentation/tty.txt +++ b/Documentation/tty.txt | |||
@@ -80,13 +80,6 @@ receive_buf() - Hand buffers of bytes from the driver to the ldisc | |||
80 | for processing. Semantics currently rather | 80 | for processing. Semantics currently rather |
81 | mysterious 8( | 81 | mysterious 8( |
82 | 82 | ||
83 | receive_room() - Can be called by the driver layer at any time when | ||
84 | the ldisc is opened. The ldisc must be able to | ||
85 | handle the reported amount of data at that instant. | ||
86 | Synchronization between active receive_buf and | ||
87 | receive_room calls is down to the driver not the | ||
88 | ldisc. Must not sleep. | ||
89 | |||
90 | write_wakeup() - May be called at any point between open and close. | 83 | write_wakeup() - May be called at any point between open and close. |
91 | The TTY_DO_WRITE_WAKEUP flag indicates if a call | 84 | The TTY_DO_WRITE_WAKEUP flag indicates if a call |
92 | is needed but always races versus calls. Thus the | 85 | is needed but always races versus calls. Thus the |
diff --git a/Documentation/usb/error-codes.txt b/Documentation/usb/error-codes.txt index 867f4c38f356..39c68f8c4e6c 100644 --- a/Documentation/usb/error-codes.txt +++ b/Documentation/usb/error-codes.txt | |||
@@ -98,13 +98,13 @@ one or more packets could finish before an error stops further endpoint I/O. | |||
98 | error, a failure to respond (often caused by | 98 | error, a failure to respond (often caused by |
99 | device disconnect), or some other fault. | 99 | device disconnect), or some other fault. |
100 | 100 | ||
101 | -ETIMEDOUT (**) No response packet received within the prescribed | 101 | -ETIME (**) No response packet received within the prescribed |
102 | bus turn-around time. This error may instead be | 102 | bus turn-around time. This error may instead be |
103 | reported as -EPROTO or -EILSEQ. | 103 | reported as -EPROTO or -EILSEQ. |
104 | 104 | ||
105 | Note that the synchronous USB message functions | 105 | -ETIMEDOUT Synchronous USB message functions use this code |
106 | also use this code to indicate timeout expired | 106 | to indicate timeout expired before the transfer |
107 | before the transfer completed. | 107 | completed, and no other error was reported by HC. |
108 | 108 | ||
109 | -EPIPE (**) Endpoint stalled. For non-control endpoints, | 109 | -EPIPE (**) Endpoint stalled. For non-control endpoints, |
110 | reset this status with usb_clear_halt(). | 110 | reset this status with usb_clear_halt(). |
@@ -163,6 +163,3 @@ usb_get_*/usb_set_*(): | |||
163 | usb_control_msg(): | 163 | usb_control_msg(): |
164 | usb_bulk_msg(): | 164 | usb_bulk_msg(): |
165 | -ETIMEDOUT Timeout expired before the transfer completed. | 165 | -ETIMEDOUT Timeout expired before the transfer completed. |
166 | In the future this code may change to -ETIME, | ||
167 | whose definition is a closer match to this sort | ||
168 | of error. | ||
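The change above separates the two timeout codes: -ETIME for a missing response packet within the bus turn-around time, and -ETIMEDOUT for a synchronous message function whose timeout expired. A caller that wants to tell them apart could classify a status value roughly as follows. This is a userspace sketch using the errno.h constants; usb_timeout_kind is a made-up name, not a USB core function.

```c
#include <errno.h>

/* Classify a negative status code according to the two timeout meanings
 * described above. Hypothetical helper, for illustration only. Note that
 * the text says the -ETIME condition may instead be reported as -EPROTO
 * or -EILSEQ, so this check alone cannot catch every such case. */
const char *usb_timeout_kind(int status)
{
    switch (status) {
    case -ETIME:
        return "no response within bus turn-around time";
    case -ETIMEDOUT:
        return "synchronous call timed out";
    default:
        return "not a timeout";
    }
}
```

A driver-side caller would typically pass in the return value of a call such as usb_bulk_msg() or the status of a completed URB.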
diff --git a/Documentation/usb/proc_usb_info.txt b/Documentation/usb/proc_usb_info.txt index f86550fe38ee..22c5331260ca 100644 --- a/Documentation/usb/proc_usb_info.txt +++ b/Documentation/usb/proc_usb_info.txt | |||
@@ -59,7 +59,7 @@ bind to an interface (or perhaps several) using an ioctl call. You | |||
59 | would issue more ioctls to the device to communicate to it using | 59 | would issue more ioctls to the device to communicate to it using |
60 | control, bulk, or other kinds of USB transfers. The IOCTLs are | 60 | control, bulk, or other kinds of USB transfers. The IOCTLs are |
61 | listed in the <linux/usbdevice_fs.h> file, and at this writing the | 61 | listed in the <linux/usbdevice_fs.h> file, and at this writing the |
62 | source code (linux/drivers/usb/devio.c) is the primary reference | 62 | source code (linux/drivers/usb/core/devio.c) is the primary reference |
63 | for how to access devices through those files. | 63 | for how to access devices through those files. |
64 | 64 | ||
65 | Note that since by default these BBB/DDD files are writable only by | 65 | Note that since by default these BBB/DDD files are writable only by |
diff --git a/Documentation/usb/usb-help.txt b/Documentation/usb/usb-help.txt index b7c324973695..a7408593829f 100644 --- a/Documentation/usb/usb-help.txt +++ b/Documentation/usb/usb-help.txt | |||
@@ -5,8 +5,7 @@ For USB help other than the readme files that are located in | |||
5 | Documentation/usb/*, see the following: | 5 | Documentation/usb/*, see the following: |
6 | 6 | ||
7 | Linux-USB project: http://www.linux-usb.org | 7 | Linux-USB project: http://www.linux-usb.org |
8 | mirrors at http://www.suse.cz/development/linux-usb/ | 8 | mirrors at http://usb.in.tum.de/linux-usb/ |
9 | and http://usb.in.tum.de/linux-usb/ | ||
10 | and http://it.linux-usb.org | 9 | and http://it.linux-usb.org |
11 | Linux USB Guide: http://linux-usb.sourceforge.net | 10 | Linux USB Guide: http://linux-usb.sourceforge.net |
12 | Linux-USB device overview (working devices and drivers): | 11 | Linux-USB device overview (working devices and drivers): |
diff --git a/Documentation/usb/usb-serial.txt b/Documentation/usb/usb-serial.txt index f001cd93b79b..a2dee6e6190d 100644 --- a/Documentation/usb/usb-serial.txt +++ b/Documentation/usb/usb-serial.txt | |||
@@ -399,10 +399,10 @@ REINER SCT cyberJack pinpad/e-com USB chipcard reader | |||
399 | 399 | ||
400 | Prolific PL2303 Driver | 400 | Prolific PL2303 Driver |
401 | 401 | ||
402 | This driver support any device that has the PL2303 chip from Prolific | 402 | This driver supports any device that has the PL2303 chip from Prolific |
403 | in it. This includes a number of single port USB to serial | 403 | in it. This includes a number of single port USB to serial |
404 | converters and USB GPS devices. Devices from Aten (the UC-232) and | 404 | converters and USB GPS devices. Devices from Aten (the UC-232) and |
405 | IO-Data work with this driver. | 405 | IO-Data work with this driver, as does the DCU-11 mobile-phone cable. |
406 | 406 | ||
407 | For any questions or problems with this driver, please contact Greg | 407 | For any questions or problems with this driver, please contact Greg |
408 | Kroah-Hartman at greg@kroah.com | 408 | Kroah-Hartman at greg@kroah.com |
@@ -433,6 +433,11 @@ Options supported: | |||
433 | See http://www.uuhaus.de/linux/palmconnect.html for up-to-date | 433 | See http://www.uuhaus.de/linux/palmconnect.html for up-to-date |
434 | information on this driver. | 434 | information on this driver. |
435 | 435 | ||
436 | AIRcable USB Dongle Bluetooth driver | ||
437 | If the cdc_acm driver is loaded in the system, you will find that | ||
438 | cdc_acm claims the device before AIRcable can. This is simply corrected | ||
439 | by unloading both modules and then loading the aircable module before | ||
440 | the cdc_acm module. | ||
436 | 441 | ||
437 | Generic Serial driver | 442 | Generic Serial driver |
438 | 443 | ||
diff --git a/Documentation/video4linux/CARDLIST.bttv b/Documentation/video4linux/CARDLIST.bttv index b72706c58a44..4efa4645885f 100644 --- a/Documentation/video4linux/CARDLIST.bttv +++ b/Documentation/video4linux/CARDLIST.bttv | |||
@@ -87,7 +87,7 @@ | |||
87 | 86 -> Osprey 101/151 w/ svid | 87 | 86 -> Osprey 101/151 w/ svid |
88 | 87 -> Osprey 200/201/250/251 | 88 | 87 -> Osprey 200/201/250/251 |
89 | 88 -> Osprey 200/250 [0070:ff01] | 89 | 88 -> Osprey 200/250 [0070:ff01] |
90 | 89 -> Osprey 210/220 | 90 | 89 -> Osprey 210/220/230 |
91 | 90 -> Osprey 500 [0070:ff02] | 91 | 90 -> Osprey 500 [0070:ff02] |
92 | 91 -> Osprey 540 [0070:ff04] | 92 | 91 -> Osprey 540 [0070:ff04] |
93 | 92 -> Osprey 2000 [0070:ff03] | 93 | 92 -> Osprey 2000 [0070:ff03] |
@@ -111,7 +111,7 @@ | |||
111 | 110 -> IVC-100 [ff00:a132] | 111 | 110 -> IVC-100 [ff00:a132] |
112 | 111 -> IVC-120G [ff00:a182,ff01:a182,ff02:a182,ff03:a182,ff04:a182,ff05:a182,ff06:a182,ff07:a182,ff08:a182,ff09:a182,ff0a:a182,ff0b:a182,ff0c:a182,ff0d:a182,ff0e:a182,ff0f:a182] | 112 | 111 -> IVC-120G [ff00:a182,ff01:a182,ff02:a182,ff03:a182,ff04:a182,ff05:a182,ff06:a182,ff07:a182,ff08:a182,ff09:a182,ff0a:a182,ff0b:a182,ff0c:a182,ff0d:a182,ff0e:a182,ff0f:a182] |
113 | 112 -> pcHDTV HD-2000 TV [7063:2000] | 113 | 112 -> pcHDTV HD-2000 TV [7063:2000] |
114 | 113 -> Twinhan DST + clones [11bd:0026,1822:0001,270f:fc00] | 114 | 113 -> Twinhan DST + clones [11bd:0026,1822:0001,270f:fc00,1822:0026] |
115 | 114 -> Winfast VC100 [107d:6607] | 115 | 114 -> Winfast VC100 [107d:6607] |
116 | 115 -> Teppro TEV-560/InterVision IV-560 | 116 | 115 -> Teppro TEV-560/InterVision IV-560 |
117 | 116 -> SIMUS GVC1100 [aa6a:82b2] | 117 | 116 -> SIMUS GVC1100 [aa6a:82b2] |
diff --git a/Documentation/video4linux/CARDLIST.cx88 b/Documentation/video4linux/CARDLIST.cx88 index 3b39a91b24bd..00d9a1f2a54c 100644 --- a/Documentation/video4linux/CARDLIST.cx88 +++ b/Documentation/video4linux/CARDLIST.cx88 | |||
@@ -15,7 +15,7 @@ | |||
15 | 14 -> KWorld/VStream XPert DVB-T [17de:08a6] | 15 | 14 -> KWorld/VStream XPert DVB-T [17de:08a6] |
16 | 15 -> DViCO FusionHDTV DVB-T1 [18ac:db00] | 16 | 15 -> DViCO FusionHDTV DVB-T1 [18ac:db00] |
17 | 16 -> KWorld LTV883RF | 17 | 16 -> KWorld LTV883RF |
18 | 17 -> DViCO FusionHDTV 3 Gold-Q [18ac:d810] | 18 | 17 -> DViCO FusionHDTV 3 Gold-Q [18ac:d810,18ac:d800] |
19 | 18 -> Hauppauge Nova-T DVB-T [0070:9002,0070:9001] | 19 | 18 -> Hauppauge Nova-T DVB-T [0070:9002,0070:9001] |
20 | 19 -> Conexant DVB-T reference design [14f1:0187] | 20 | 19 -> Conexant DVB-T reference design [14f1:0187] |
21 | 20 -> Provideo PV259 [1540:2580] | 21 | 20 -> Provideo PV259 [1540:2580] |
@@ -40,8 +40,14 @@ | |||
40 | 39 -> KWorld DVB-S 100 [17de:08b2] | 40 | 39 -> KWorld DVB-S 100 [17de:08b2] |
41 | 40 -> Hauppauge WinTV-HVR1100 DVB-T/Hybrid [0070:9400,0070:9402] | 41 | 40 -> Hauppauge WinTV-HVR1100 DVB-T/Hybrid [0070:9400,0070:9402] |
42 | 41 -> Hauppauge WinTV-HVR1100 DVB-T/Hybrid (Low Profile) [0070:9800,0070:9802] | 42 | 41 -> Hauppauge WinTV-HVR1100 DVB-T/Hybrid (Low Profile) [0070:9800,0070:9802] |
43 | 42 -> digitalnow DNTV Live! DVB-T Pro [1822:0025] | 43 | 42 -> digitalnow DNTV Live! DVB-T Pro [1822:0025,1822:0019] |
44 | 43 -> KWorld/VStream XPert DVB-T with cx22702 [17de:08a1] | 44 | 43 -> KWorld/VStream XPert DVB-T with cx22702 [17de:08a1] |
45 | 44 -> DViCO FusionHDTV DVB-T Dual Digital [18ac:db50,18ac:db54] | 45 | 44 -> DViCO FusionHDTV DVB-T Dual Digital [18ac:db50,18ac:db54] |
46 | 45 -> KWorld HardwareMpegTV XPert [17de:0840] | 46 | 45 -> KWorld HardwareMpegTV XPert [17de:0840] |
47 | 46 -> DViCO FusionHDTV DVB-T Hybrid [18ac:db40,18ac:db44] | 47 | 46 -> DViCO FusionHDTV DVB-T Hybrid [18ac:db40,18ac:db44] |
48 | 47 -> pcHDTV HD5500 HDTV [7063:5500] | ||
49 | 48 -> Kworld MCE 200 Deluxe [17de:0841] | ||
50 | 49 -> PixelView PlayTV P7000 [1554:4813] | ||
51 | 50 -> NPG Tech Real TV FM Top 10 [14f1:0842] | ||
52 | 51 -> WinFast DTV2000 H [107d:665e] | ||
53 | 52 -> Geniatech DVB-S [14f1:0084] | ||
diff --git a/Documentation/video4linux/CARDLIST.saa7134 b/Documentation/video4linux/CARDLIST.saa7134 index bca50903233f..9068b669f5ee 100644 --- a/Documentation/video4linux/CARDLIST.saa7134 +++ b/Documentation/video4linux/CARDLIST.saa7134 | |||
@@ -93,3 +93,4 @@ | |||
93 | 92 -> AVerMedia A169 B1 [1461:6360] | 93 | 92 -> AVerMedia A169 B1 [1461:6360] |
94 | 93 -> Medion 7134 Bridge #2 [16be:0005] | 94 | 93 -> Medion 7134 Bridge #2 [16be:0005] |
95 | 94 -> LifeView FlyDVB-T Hybrid Cardbus [5168:3306,5168:3502] | 95 | 94 -> LifeView FlyDVB-T Hybrid Cardbus [5168:3306,5168:3502] |
96 | 95 -> LifeView FlyVIDEO3000 (NTSC) [5169:0138] | ||
diff --git a/Documentation/video4linux/CARDLIST.tuner b/Documentation/video4linux/CARDLIST.tuner index 1bcdac67dd8c..44134f04b82a 100644 --- a/Documentation/video4linux/CARDLIST.tuner +++ b/Documentation/video4linux/CARDLIST.tuner | |||
@@ -62,7 +62,7 @@ tuner=60 - Thomson DTT 761X (ATSC/NTSC) | |||
62 | tuner=61 - Tena TNF9533-D/IF/TNF9533-B/DF | 62 | tuner=61 - Tena TNF9533-D/IF/TNF9533-B/DF |
63 | tuner=62 - Philips TEA5767HN FM Radio | 63 | tuner=62 - Philips TEA5767HN FM Radio |
64 | tuner=63 - Philips FMD1216ME MK3 Hybrid Tuner | 64 | tuner=63 - Philips FMD1216ME MK3 Hybrid Tuner |
65 | tuner=64 - LG TDVS-H062F/TUA6034 | 65 | tuner=64 - LG TDVS-H06xF |
66 | tuner=65 - Ymec TVF66T5-B/DFF | 66 | tuner=65 - Ymec TVF66T5-B/DFF |
67 | tuner=66 - LG TALN series | 67 | tuner=66 - LG TALN series |
68 | tuner=67 - Philips TD1316 Hybrid Tuner | 68 | tuner=67 - Philips TD1316 Hybrid Tuner |
@@ -71,3 +71,4 @@ tuner=69 - Tena TNF 5335 and similar models | |||
71 | tuner=70 - Samsung TCPN 2121P30A | 71 | tuner=70 - Samsung TCPN 2121P30A |
72 | tuner=71 - Xceive xc3028 | 72 | tuner=71 - Xceive xc3028 |
73 | tuner=72 - Thomson FE6600 | 73 | tuner=72 - Thomson FE6600 |
74 | tuner=73 - Samsung TCPG 6121P30A | ||
diff --git a/Documentation/video4linux/CQcam.txt b/Documentation/video4linux/CQcam.txt index 464e4cec94cb..ade8651e2443 100644 --- a/Documentation/video4linux/CQcam.txt +++ b/Documentation/video4linux/CQcam.txt | |||
@@ -185,207 +185,10 @@ this work is documented at the video4linux2 site listed below. | |||
185 | 185 | ||
186 | 9.0 --- A sample program using v4lgrabber, | 186 | 9.0 --- A sample program using v4lgrabber, |
187 | 187 | ||
188 | This program is a simple image grabber that will copy a frame from the | 188 | v4lgrab is a simple image grabber that will copy a frame from the |
189 | first video device, /dev/video0 to standard output in portable pixmap | 189 | first video device, /dev/video0 to standard output in portable pixmap |
190 | format (.ppm) Using this like: 'v4lgrab | convert - c-qcam.jpg' | 190 | format (.ppm). To produce .jpg output, you can use it like this: |
191 | produced this picture of me at | 191 | 'v4lgrab | convert - c-qcam.jpg' |
192 | http://mug.sys.virginia.edu/~drf5n/extras/c-qcam.jpg | ||
193 | |||
194 | -------------------- 8< ---------------- 8< ----------------------------- | ||
195 | |||
196 | /* Simple Video4Linux image grabber. */ | ||
197 | /* | ||
198 | * Video4Linux Driver Test/Example Framegrabbing Program | ||
199 | * | ||
200 | * Compile with: | ||
201 | * gcc -s -Wall -Wstrict-prototypes v4lgrab.c -o v4lgrab | ||
202 | * Use as: | ||
203 | * v4lgrab >image.ppm | ||
204 | * | ||
205 | * Copyright (C) 1998-05-03, Phil Blundell <philb@gnu.org> | ||
206 | * Copied from http://www.tazenda.demon.co.uk/phil/vgrabber.c | ||
207 | * with minor modifications (Dave Forrest, drf5n@virginia.edu). | ||
208 | * | ||
209 | */ | ||
210 | |||
211 | #include <unistd.h> | ||
212 | #include <sys/types.h> | ||
213 | #include <sys/stat.h> | ||
214 | #include <fcntl.h> | ||
215 | #include <stdio.h> | ||
216 | #include <sys/ioctl.h> | ||
217 | #include <stdlib.h> | ||
218 | |||
219 | #include <linux/types.h> | ||
220 | #include <linux/videodev.h> | ||
221 | |||
222 | #define FILE "/dev/video0" | ||
223 | |||
224 | /* Stole this from tvset.c */ | ||
225 | |||
226 | #define READ_VIDEO_PIXEL(buf, format, depth, r, g, b) \ | ||
227 | { \ | ||
228 | switch (format) \ | ||
229 | { \ | ||
230 | case VIDEO_PALETTE_GREY: \ | ||
231 | switch (depth) \ | ||
232 | { \ | ||
233 | case 4: \ | ||
234 | case 6: \ | ||
235 | case 8: \ | ||
236 | (r) = (g) = (b) = (*buf++ << 8);\ | ||
237 | break; \ | ||
238 | \ | ||
239 | case 16: \ | ||
240 | (r) = (g) = (b) = \ | ||
241 | *((unsigned short *) buf); \ | ||
242 | buf += 2; \ | ||
243 | break; \ | ||
244 | } \ | ||
245 | break; \ | ||
246 | \ | ||
247 | \ | ||
248 | case VIDEO_PALETTE_RGB565: \ | ||
249 | { \ | ||
250 | unsigned short tmp = *(unsigned short *)buf; \ | ||
251 | (r) = tmp&0xF800; \ | ||
252 | (g) = (tmp<<5)&0xFC00; \ | ||
253 | (b) = (tmp<<11)&0xF800; \ | ||
254 | buf += 2; \ | ||
255 | } \ | ||
256 | break; \ | ||
257 | \ | ||
258 | case VIDEO_PALETTE_RGB555: \ | ||
259 | (r) = (buf[0]&0xF8)<<8; \ | ||
260 | (g) = ((buf[0] << 5 | buf[1] >> 3)&0xF8)<<8; \ | ||
261 | (b) = ((buf[1] << 2 ) & 0xF8)<<8; \ | ||
262 | buf += 2; \ | ||
263 | break; \ | ||
264 | \ | ||
265 | case VIDEO_PALETTE_RGB24: \ | ||
266 | (r) = buf[0] << 8; (g) = buf[1] << 8; \ | ||
267 | (b) = buf[2] << 8; \ | ||
268 | buf += 3; \ | ||
269 | break; \ | ||
270 | \ | ||
271 | default: \ | ||
272 | fprintf(stderr, \ | ||
273 | "Format %d not yet supported\n", \ | ||
274 | format); \ | ||
275 | } \ | ||
276 | } | ||
277 | |||
278 | int get_brightness_adj(unsigned char *image, long size, int *brightness) { | ||
279 | long i, tot = 0; | ||
280 | for (i=0;i<size*3;i++) | ||
281 | tot += image[i]; | ||
282 | *brightness = (128 - tot/(size*3))/3; | ||
283 | return !((tot/(size*3)) >= 126 && (tot/(size*3)) <= 130); | ||
284 | } | ||
285 | |||
286 | int main(int argc, char ** argv) | ||
287 | { | ||
288 | int fd = open(FILE, O_RDONLY), f; | ||
289 | struct video_capability cap; | ||
290 | struct video_window win; | ||
291 | struct video_picture vpic; | ||
292 | |||
293 | unsigned char *buffer, *src; | ||
294 | int bpp = 24, r, g, b; | ||
295 | unsigned int i, src_depth; | ||
296 | |||
297 | if (fd < 0) { | ||
298 | perror(FILE); | ||
299 | exit(1); | ||
300 | } | ||
301 | |||
302 | if (ioctl(fd, VIDIOCGCAP, &cap) < 0) { | ||
303 | perror("VIDIOGCAP"); | ||
304 | fprintf(stderr, "(" FILE " not a video4linux device?)\n"); | ||
305 | close(fd); | ||
306 | exit(1); | ||
307 | } | ||
308 | |||
309 | if (ioctl(fd, VIDIOCGWIN, &win) < 0) { | ||
310 | perror("VIDIOCGWIN"); | ||
311 | close(fd); | ||
312 | exit(1); | ||
313 | } | ||
314 | |||
315 | if (ioctl(fd, VIDIOCGPICT, &vpic) < 0) { | ||
316 | perror("VIDIOCGPICT"); | ||
317 | close(fd); | ||
318 | exit(1); | ||
319 | } | ||
320 | |||
321 | if (cap.type & VID_TYPE_MONOCHROME) { | ||
322 | vpic.depth=8; | ||
323 | vpic.palette=VIDEO_PALETTE_GREY; /* 8bit grey */ | ||
324 | if(ioctl(fd, VIDIOCSPICT, &vpic) < 0) { | ||
325 | vpic.depth=6; | ||
326 | if(ioctl(fd, VIDIOCSPICT, &vpic) < 0) { | ||
327 | vpic.depth=4; | ||
328 | if(ioctl(fd, VIDIOCSPICT, &vpic) < 0) { | ||
329 | fprintf(stderr, "Unable to find a supported capture format.\n"); | ||
330 | close(fd); | ||
331 | exit(1); | ||
332 | } | ||
333 | } | ||
334 | } | ||
335 | } else { | ||
336 | vpic.depth=24; | ||
337 | vpic.palette=VIDEO_PALETTE_RGB24; | ||
338 | |||
339 | if(ioctl(fd, VIDIOCSPICT, &vpic) < 0) { | ||
340 | vpic.palette=VIDEO_PALETTE_RGB565; | ||
341 | vpic.depth=16; | ||
342 | |||
343 | if(ioctl(fd, VIDIOCSPICT, &vpic)==-1) { | ||
344 | vpic.palette=VIDEO_PALETTE_RGB555; | ||
345 | vpic.depth=15; | ||
346 | |||
347 | if(ioctl(fd, VIDIOCSPICT, &vpic)==-1) { | ||
348 | fprintf(stderr, "Unable to find a supported capture format.\n"); | ||
349 | return -1; | ||
350 | } | ||
351 | } | ||
352 | } | ||
353 | } | ||
354 | |||
355 | buffer = malloc(win.width * win.height * bpp); | ||
356 | if (!buffer) { | ||
357 | fprintf(stderr, "Out of memory.\n"); | ||
358 | exit(1); | ||
359 | } | ||
360 | |||
361 | do { | ||
362 | int newbright; | ||
363 | read(fd, buffer, win.width * win.height * bpp); | ||
364 | f = get_brightness_adj(buffer, win.width * win.height, &newbright); | ||
365 | if (f) { | ||
366 | vpic.brightness += (newbright << 8); | ||
367 | if(ioctl(fd, VIDIOCSPICT, &vpic)==-1) { | ||
368 | perror("VIDIOSPICT"); | ||
369 | break; | ||
370 | } | ||
371 | } | ||
372 | } while (f); | ||
373 | |||
374 | fprintf(stdout, "P6\n%d %d 255\n", win.width, win.height); | ||
375 | |||
376 | src = buffer; | ||
377 | |||
378 | for (i = 0; i < win.width * win.height; i++) { | ||
379 | READ_VIDEO_PIXEL(src, vpic.palette, src_depth, r, g, b); | ||
380 | fputc(r>>8, stdout); | ||
381 | fputc(g>>8, stdout); | ||
382 | fputc(b>>8, stdout); | ||
383 | } | ||
384 | |||
385 | close(fd); | ||
386 | return 0; | ||
387 | } | ||
388 | -------------------- 8< ---------------- 8< ----------------------------- | ||
389 | 192 | ||
390 | 193 | ||
391 | 10.0 --- Other Information | 194 | 10.0 --- Other Information |
diff --git a/Documentation/video4linux/README.pvrusb2 b/Documentation/video4linux/README.pvrusb2 new file mode 100644 index 000000000000..c73a32c34528 --- /dev/null +++ b/Documentation/video4linux/README.pvrusb2 | |||
@@ -0,0 +1,212 @@ | |||
1 | |||
2 | $Id$ | ||
3 | Mike Isely <isely@pobox.com> | ||
4 | |||
5 | pvrusb2 driver | ||
6 | |||
7 | Background: | ||
8 | |||
9 | This driver is intended for the "Hauppauge WinTV PVR USB 2.0", which | ||
10 | is a USB 2.0 hosted TV Tuner. This driver is a work in progress. | ||
11 | Its history started with the reverse-engineering effort by Björn | ||
12 | Danielsson <pvrusb2@dax.nu> whose web page can be found here: | ||
13 | |||
14 | http://pvrusb2.dax.nu/ | ||
15 | |||
16 | From there Aurelien Alleaume <slts@free.fr> began an effort to | ||
17 | create a video4linux compatible driver. I began with Aurelien's | ||
18 | last known snapshot and evolved the driver to the state it is in | ||
19 | here. | ||
20 | |||
21 | More information on this driver can be found at: | ||
22 | |||
23 | http://www.isely.net/pvrusb2.html | ||
24 | |||
25 | |||
26 | This driver has a strong separation of layers. They are very | ||
27 | roughly: | ||
28 | |||
29 | 1a. Low level wire-protocol implementation with the device. | ||
30 | |||
31 | 1b. I2C adaptor implementation and corresponding I2C client drivers | ||
32 | implemented elsewhere in V4L. | ||
33 | |||
34 | 1c. High level hardware driver implementation which coordinates all | ||
35 | activities that ensure correct operation of the device. | ||
36 | |||
37 | 2. A "context" layer which manages instancing of driver, setup, | ||
38 | tear-down, arbitration, and interaction with high level | ||
39 | interfaces appropriately as devices are hotplugged in the | ||
40 | system. | ||
41 | |||
42 | 3. High level interfaces which glue the driver to various published | ||
43 | Linux APIs (V4L, sysfs, maybe DVB in the future). | ||
44 | |||
45 | The most important shearing layer is between the top 2 layers. A | ||
46 | lot of work went into the driver to ensure that any kind of | ||
47 | conceivable API can be laid on top of the core driver. (Yes, the | ||
48 | driver internally leverages V4L to do its work but that really has | ||
49 | nothing to do with the API published by the driver to the outside | ||
50 | world.) The architecture allows for different APIs to | ||
51 | simultaneously access the driver. I have a strong sense of fairness | ||
52 | about APIs and also feel that it is a good design principle to keep | ||
53 | implementation and interface isolated from each other. Thus while | ||
54 | right now the V4L high level interface is the most complete, the | ||
55 | sysfs high level interface will work equally well for similar | ||
56 | functions, and there's no reason I see right now why it shouldn't be | ||
57 | possible to produce a DVB high level interface that can sit right | ||
58 | alongside V4L. | ||
59 | |||
60 | NOTE: Complete documentation on the pvrusb2 driver is contained in | ||
61 | the html files within the doc directory; these are exactly the same | ||
62 | as what is on the web site at the time. Browse those files | ||
63 | (especially the FAQ) before asking questions. | ||
64 | |||
65 | |||
66 | Building | ||
67 | |||
68 | Building these modules essentially amounts to just running "make", | ||
69 | but you need the kernel source tree nearby and you will likely also | ||
70 | want to set a few controlling environment variables first in order | ||
71 | to link things up with that source tree. Please see the Makefile | ||
72 | here for comments that explain how to do that. | ||
73 | |||
74 | |||
75 | Source file list / functional overview: | ||
76 | |||
77 | (Note: The term "module" used below generally refers to loosely | ||
78 | defined functional units within the pvrusb2 driver and bears no | ||
79 | relation to the Linux kernel's concept of a loadable module.) | ||
80 | |||
81 | pvrusb2-audio.[ch] - This is glue logic that resides between this | ||
82 | driver and the msp3400.ko I2C client driver (which is found | ||
83 | elsewhere in V4L). | ||
84 | |||
85 | pvrusb2-context.[ch] - This module implements the context for an | ||
86 | instance of the driver. Everything else eventually ties back to | ||
87 | or is otherwise instanced within the data structures implemented | ||
88 | here. Hotplugging is ultimately coordinated here. All high level | ||
89 | interfaces tie into the driver through this module. This module | ||
90 | helps arbitrate each interface's access to the actual driver core, | ||
91 | and is designed to allow concurrent access through multiple | ||
92 | instances of multiple interfaces (thus you can for example change | ||
93 | the tuner's frequency through sysfs while simultaneously streaming | ||
94 | video through V4L out to an instance of mplayer). | ||
95 | |||
96 | pvrusb2-debug.h - This header defines a printk() wrapper and a mask | ||
97 | of debugging bit definitions for the various kinds of debug | ||
98 | messages that can be enabled within the driver. | ||
99 | |||
100 | pvrusb2-debugifc.[ch] - This module implements a crude command line | ||
101 | oriented debug interface into the driver. Aside from being part | ||
102 | of the process for implementing manual firmware extraction (see | ||
103 | the pvrusb2 web site mentioned earlier), probably I'm the only one | ||
104 | who has ever used this. It is mainly a debugging aid. | ||
105 | |||
106 | pvrusb2-eeprom.[ch] - This is glue logic that resides between this | ||
107 | driver and the tveeprom.ko module, which is itself implemented | ||
108 | elsewhere in V4L. | ||
109 | |||
110 | pvrusb2-encoder.[ch] - This module implements all protocol needed to | ||
111 | interact with the Conexant mpeg2 encoder chip within the pvrusb2 | ||
112 | device. It is a crude echo of corresponding logic in ivtv, | ||
113 | however the design goals (strict isolation) and physical layer | ||
114 | (proxy through USB instead of PCI) are different enough that this | ||
115 | implementation had to be completely different. | ||
116 | |||
117 | pvrusb2-hdw-internal.h - This header defines the core data structure | ||
118 | in the driver used to track ALL internal state related to control | ||
119 | of the hardware. Nobody outside of the core hardware-handling | ||
120 | modules should have any business using this header. All external | ||
121 | access to the driver should be through one of the high level | ||
122 | interfaces (e.g. V4L, sysfs, etc), and in fact even those high | ||
123 | level interfaces are restricted to the API defined in | ||
124 | pvrusb2-hdw.h and NOT this header. | ||
125 | |||
126 | pvrusb2-hdw.h - This header defines the full internal API for | ||
127 | controlling the hardware. High level interfaces (e.g. V4L, sysfs) | ||
128 | will work through here. | ||
129 | |||
130 | pvrusb2-hdw.c - This module implements all the various bits of logic | ||
131 | that handle overall control of a specific pvrusb2 device. | ||
132 | (Policy, instantiation, and arbitration of pvrusb2 devices fall | ||
133 | within the jurisdiction of pvrusb2-context, not here). | ||
134 | |||
135 | pvrusb2-i2c-chips-*.c - These modules implement the glue logic to | ||
136 | tie together and configure various I2C modules as they attach to | ||
137 | the I2C bus. There are two versions of this file. The "v4l2" | ||
138 | version is intended to be used in-tree alongside V4L, where we | ||
139 | implement just the logic that makes sense for a pure V4L | ||
140 | environment. The "all" version is intended for use outside of | ||
141 | V4L, where we might encounter other possibly "challenging" modules | ||
142 | from ivtv or older kernel snapshots (or even the support modules | ||
143 | in the standalone snapshot). | ||
144 | |||
145 | pvrusb2-i2c-cmd-v4l1.[ch] - This module implements generic V4L1 | ||
146 | compatible commands to the I2C modules. It is here where state | ||
147 | changes inside the pvrusb2 driver are translated into V4L1 | ||
148 | commands that are in turn sent to the various I2C modules. | ||
149 | |||
150 | pvrusb2-i2c-cmd-v4l2.[ch] - This module implements generic V4L2 | ||
151 | compatible commands to the I2C modules. It is here where state | ||
152 | changes inside the pvrusb2 driver are translated into V4L2 | ||
153 | commands that are in turn sent to the various I2C modules. | ||
154 | |||
155 | pvrusb2-i2c-core.[ch] - This module provides an implementation of a | ||
156 | kernel-friendly I2C adaptor driver, through which other external | ||
157 | I2C client drivers (e.g. msp3400, tuner, lirc) may connect and | ||
158 | operate corresponding chips within the pvrusb2 device. It is | ||
159 | through here that other V4L modules can reach into this driver to | ||
160 | operate specific pieces (and those modules are in turn driven by | ||
161 | glue logic which is coordinated by pvrusb2-hdw, doled out by | ||
162 | pvrusb2-context, and then ultimately made available to users | ||
163 | through one of the high level interfaces). | ||
164 | |||
165 | pvrusb2-io.[ch] - This module implements a very low level ring of | ||
166 | transfer buffers, required in order to stream data from the | ||
167 | device. This module is *very* low level. It only operates the | ||
168 | buffers and makes no attempt to define any policy or mechanism for | ||
169 | how such buffers might be used. | ||
170 | |||
171 | pvrusb2-ioread.[ch] - This module layers on top of pvrusb2-io.[ch] | ||
172 | to provide a streaming API usable by a read() system call style of | ||
173 | I/O. Right now this is the only layer on top of pvrusb2-io.[ch], | ||
174 | however the underlying architecture here was intended to allow for | ||
175 | other styles of I/O to be implemented with additional modules, like | ||
176 | mmap()'ed buffers or something even more exotic. | ||
177 | |||
178 | pvrusb2-main.c - This is the top level of the driver. Module level | ||
179 | and USB core entry points are here. This is our "main". | ||
180 | |||
181 | pvrusb2-sysfs.[ch] - This is the high level interface which ties the | ||
182 | pvrusb2 driver into sysfs. Through this interface you can do | ||
183 | everything with the driver except actually stream data. | ||
184 | |||
185 | pvrusb2-tuner.[ch] - This is glue logic that resides between this | ||
186 | driver and the tuner.ko I2C client driver (which is found | ||
187 | elsewhere in V4L). | ||
188 | |||
189 | pvrusb2-util.h - This header defines some common macros used | ||
190 | throughout the driver. These macros are not really specific to | ||
191 | the driver, but they had to go somewhere. | ||
192 | |||
193 | pvrusb2-v4l2.[ch] - This is the high level interface which ties the | ||
194 | pvrusb2 driver into video4linux. It is through here that V4L | ||
195 | applications can open and operate the driver in the usual V4L | ||
196 | ways. Note that **ALL** V4L functionality is published only | ||
197 | through here and nowhere else. | ||
198 | |||
199 | pvrusb2-video-*.[ch] - This is glue logic that resides between this | ||
200 | driver and the saa711x.ko I2C client driver (which is found | ||
201 | elsewhere in V4L). Note that saa711x.ko used to be known as | ||
202 | saa7115.ko in ivtv. There are two versions of this; one is | ||
203 | selected depending on the particular saa711[5x].ko that is found. | ||
204 | |||
205 | pvrusb2.h - This header contains compile time tunable parameters | ||
206 | (and at the moment the driver has very little that needs to be | ||
207 | tuned). | ||
208 | |||
209 | |||
210 | -Mike Isely | ||
211 | isely@pobox.com | ||
212 | |||
diff --git a/Documentation/video4linux/Zoran b/Documentation/video4linux/Zoran index be9f21b84555..040a2c841ae9 100644 --- a/Documentation/video4linux/Zoran +++ b/Documentation/video4linux/Zoran | |||
@@ -33,6 +33,21 @@ Inputs/outputs: Composite and S-video | |||
33 | Norms: PAL, SECAM (720x576 @ 25 fps), NTSC (720x480 @ 29.97 fps) | 33 | Norms: PAL, SECAM (720x576 @ 25 fps), NTSC (720x480 @ 29.97 fps) |
34 | Card number: 7 | 34 | Card number: 7 |
35 | 35 | ||
36 | AverMedia 6 Eyes AVS6EYES: | ||
37 | * Zoran zr36067 PCI controller | ||
38 | * Zoran zr36060 MJPEG codec | ||
39 | * Samsung ks0127 TV decoder | ||
40 | * Conexant bt866 TV encoder | ||
41 | Drivers to use: videodev, i2c-core, i2c-algo-bit, | ||
42 | videocodec, ks0127, bt866, zr36060, zr36067 | ||
43 | Inputs/outputs: Six physical inputs. 1-6 are composite, | ||
44 | 1-2, 3-4, 5-6 double as S-video, | ||
45 | 1-3 triple as component. | ||
46 | One composite output. | ||
47 | Norms: PAL, SECAM (720x576 @ 25 fps), NTSC (720x480 @ 29.97 fps) | ||
48 | Card number: 8 | ||
49 | Not autodetected, card=8 is necessary. | ||
50 | |||
36 | Linux Media Labs LML33: | 51 | Linux Media Labs LML33: |
37 | * Zoran zr36067 PCI controller | 52 | * Zoran zr36067 PCI controller |
38 | * Zoran zr36060 MJPEG codec | 53 | * Zoran zr36060 MJPEG codec |
@@ -192,6 +207,10 @@ Micronas vpx3220a TV decoder | |||
192 | was introduced in 1996, is used in the DC30 and DC30+ and | 207 | was introduced in 1996, is used in the DC30 and DC30+ and |
193 | can handle: PAL B/G/H/I, PAL N, PAL M, NTSC M, NTSC 44, PAL 60, SECAM,NTSC Comb | 208 | can handle: PAL B/G/H/I, PAL N, PAL M, NTSC M, NTSC 44, PAL 60, SECAM,NTSC Comb |
194 | 209 | ||
210 | Samsung ks0127 TV decoder | ||
211 | is used in the AVS6EYES card and | ||
212 | can handle: NTSC-M/N/44, PAL-M/N/B/G/H/I/D/K/L and SECAM | ||
213 | |||
195 | =========================== | 214 | =========================== |
196 | 215 | ||
197 | 1.2 What the TV encoder can do an what not | 216 | 1.2 What the TV encoder can do an what not |
@@ -221,6 +240,10 @@ ITT mse3000 TV encoder | |||
221 | was introduced in 1991, is used in the DC10 old | 240 | was introduced in 1991, is used in the DC10 old |
222 | can generate: PAL , NTSC , SECAM | 241 | can generate: PAL , NTSC , SECAM |
223 | 242 | ||
243 | Conexant bt866 TV encoder | ||
244 | is used in AVS6EYES, and | ||
245 | can generate: NTSC/PAL, PALM, PALN | ||
246 | |||
224 | The adv717x, should be able to produce PAL N. But you find nothing PAL N | 247 | The adv717x, should be able to produce PAL N. But you find nothing PAL N |
225 | specific in the registers. Seem that you have to reuse a other standard | 248 | specific in the registers. Seem that you have to reuse a other standard |
226 | to generate PAL N, maybe it would work if you use the PAL M settings. | 249 | to generate PAL N, maybe it would work if you use the PAL M settings. |
diff --git a/Documentation/video4linux/bttv/CONTRIBUTORS b/Documentation/video4linux/bttv/CONTRIBUTORS index aef49db8847d..8aad6dd93d6b 100644 --- a/Documentation/video4linux/bttv/CONTRIBUTORS +++ b/Documentation/video4linux/bttv/CONTRIBUTORS | |||
@@ -1,4 +1,4 @@ | |||
1 | Contributors to bttv: | 1 | Contributors to bttv: |
2 | 2 | ||
3 | Michael Chu <mmchu@pobox.com> | 3 | Michael Chu <mmchu@pobox.com> |
4 | AverMedia fix and more flexible card recognition | 4 | AverMedia fix and more flexible card recognition |
@@ -8,8 +8,8 @@ Alan Cox <alan@redhat.com> | |||
8 | 8 | ||
9 | Chris Kleitsch | 9 | Chris Kleitsch |
10 | Hardware I2C | 10 | Hardware I2C |
11 | 11 | ||
12 | Gerd Knorr <kraxel@cs.tu-berlin.de> | 12 | Gerd Knorr <kraxel@cs.tu-berlin.de> |
13 | Radio card (ITT sound processor) | 13 | Radio card (ITT sound processor) |
14 | 14 | ||
15 | bigfoot <bigfoot@net-way.net> | 15 | bigfoot <bigfoot@net-way.net> |
@@ -18,7 +18,7 @@ Ragnar Hojland Espinosa <ragnar@macula.net> | |||
18 | 18 | ||
19 | 19 | ||
20 | + many more (please mail me if you are missing in this list and would | 20 | + many more (please mail me if you are missing in this list and would |
21 | like to be mentioned) | 21 | like to be mentioned) |
22 | 22 | ||
23 | 23 | ||
24 | 24 | ||
diff --git a/Documentation/video4linux/cx2341x/fw-calling.txt b/Documentation/video4linux/cx2341x/fw-calling.txt new file mode 100644 index 000000000000..8d21181de537 --- /dev/null +++ b/Documentation/video4linux/cx2341x/fw-calling.txt | |||
@@ -0,0 +1,69 @@ | |||
1 | This page describes how to make calls to the firmware api. | ||
2 | |||
3 | How to call | ||
4 | =========== | ||
5 | |||
6 | The preferred calling convention is known as the firmware mailbox. The | ||
7 | mailboxes are basically a fixed length array that serves as the call-stack. | ||
8 | |||
9 | Firmware mailboxes can be located by searching the encoder and decoder memory | ||
10 | for a 16 byte signature. That signature will be located on a 256-byte boundary. | ||
11 | |||
12 | Signature: | ||
13 | 0x78, 0x56, 0x34, 0x12, 0x12, 0x78, 0x56, 0x34, | ||
14 | 0x34, 0x12, 0x78, 0x56, 0x56, 0x34, 0x12, 0x78 | ||
15 | |||
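The signature search described above can be sketched as follows. This is only an illustration, assuming the encoder or decoder memory has already been copied into an ordinary buffer; the names mbox_signature and find_mbox_signature are hypothetical, and a real driver would read card memory through its own access routines.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* The 16-byte signature quoted above. */
static const uint8_t mbox_signature[16] = {
    0x78, 0x56, 0x34, 0x12, 0x12, 0x78, 0x56, 0x34,
    0x34, 0x12, 0x78, 0x56, 0x56, 0x34, 0x12, 0x78,
};

/*
 * Scan a buffer holding a copy of card memory for the signature.
 * Only 256-byte aligned offsets are checked, since the signature is
 * documented to sit on a 256-byte boundary.
 * Returns the offset of the signature, or -1 if it was not found.
 */
static long find_mbox_signature(const uint8_t *mem, size_t len)
{
    size_t off;

    for (off = 0; off + sizeof(mbox_signature) <= len; off += 256) {
        if (memcmp(mem + off, mbox_signature,
                   sizeof(mbox_signature)) == 0)
            return (long)off;
    }
    return -1;
}
```

Stepping by 256 bytes instead of scanning every offset keeps the search cheap and avoids false matches at unaligned positions.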
16 | The firmware implements 20 mailboxes of 20 32-bit words. The first 10 are | ||
17 | reserved for API calls. The second 10 are used by the firmware for event | ||
18 | notification. | ||
19 | |||
20 | Index Name | ||
21 | ----- ---- | ||
22 | 0 Flags | ||
23 | 1 Command | ||
24 | 2 Return value | ||
25 | 3 Timeout | ||
26 | 4-19 Parameter/Result | ||
27 | |||
28 | |||
29 | The flags are defined in the following table. The direction is from the | ||
30 | perspective of the firmware. | ||
31 | |||
32 | Bit Direction Purpose | ||
33 | --- --------- ------- | ||
34 | 2 O Firmware has processed the command. | ||
35 | 1 I Driver has finished setting the parameters. | ||
36 | 0 I Driver is using this mailbox. | ||
37 | |||
38 | |||
39 | The command is a 32-bit enumerator. The API specifics may be found in the | ||
40 | fw-*-api.txt documents. | ||
41 | |||
42 | The return value is a 32-bit enumerator. Only two values are currently defined: | ||
43 | 0=success and -1=command undefined. | ||
44 | |||
45 | There are 16 32-bit parameter/result fields. The driver populates these fields | ||
46 | with values for all the parameters required by the call. The firmware overwrites | ||
47 | these fields with result values returned by the call. The API specifics may be | ||
48 | found in the fw-*-api.txt documents. | ||
49 | |||
50 | The timeout value protects the card from a hung driver thread. If the driver | ||
51 | doesn't handle the completed call within the timeout specified, the firmware | ||
52 | will reset that mailbox. | ||
53 | |||
54 | To make an API call, the driver iterates over each mailbox looking for the | ||
55 | first one available (bit 0 has been cleared). The driver sets that bit, fills | ||
56 | in the command enumerator, the timeout value and any required parameters. The | ||
57 | driver then sets the parameter ready bit (bit 1). The firmware scans the | ||
58 | mailboxes for pending commands, processes them, sets the result code, populates | ||
59 | the result value array with that call's return values and sets the call | ||
60 | complete bit (bit 2). Once bit 2 is set, the driver should retrieve the results | ||
61 | and clear all the flags. If the driver does not perform this task within the | ||
62 | time set in the timeout register, the firmware will reset that mailbox. | ||
63 | |||
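The calling sequence above can be sketched in C roughly as shown here. This is a simplified model that treats the mailboxes as plain memory; the struct layout follows the index table earlier in this page, but the names fw_mailbox, mbox_post and mbox_collect are hypothetical. A real driver would access the card through its MMIO helpers and would wait (or sleep) for the completion bit rather than just test it once.

```c
#include <stdint.h>

#define MBOX_FLAG_IN_USE  (1u << 0)  /* driver is using this mailbox */
#define MBOX_FLAG_READY   (1u << 1)  /* driver finished setting parameters */
#define MBOX_FLAG_DONE    (1u << 2)  /* firmware has processed the command */
#define MBOX_DATA_WORDS   16

/* One mailbox: 20 32-bit words, laid out as in the index table. */
struct fw_mailbox {
    uint32_t flags;
    uint32_t command;
    uint32_t retval;
    uint32_t timeout;
    uint32_t data[MBOX_DATA_WORDS];  /* parameters in, results out */
};

/*
 * Claim the first free mailbox and post a command to it.
 * Returns the mailbox index used, or -1 if none is free.
 */
static int mbox_post(struct fw_mailbox *mbox, int count, uint32_t cmd,
                     uint32_t timeout, const uint32_t *params, int nparams)
{
    int i, j;

    if (nparams < 0 || nparams > MBOX_DATA_WORDS)
        return -1;
    for (i = 0; i < count; i++) {
        if (mbox[i].flags & MBOX_FLAG_IN_USE)
            continue;                        /* busy, try the next one */
        mbox[i].flags |= MBOX_FLAG_IN_USE;   /* claim it (bit 0) */
        mbox[i].command = cmd;
        mbox[i].timeout = timeout;
        for (j = 0; j < nparams; j++)
            mbox[i].data[j] = params[j];
        mbox[i].flags |= MBOX_FLAG_READY;    /* parameters ready (bit 1) */
        return i;
    }
    return -1;
}

/*
 * Fetch the results of a completed call and release the mailbox.
 * Returns 0 on success, -1 if the firmware has not finished yet.
 */
static int mbox_collect(struct fw_mailbox *m, uint32_t *retval,
                        uint32_t *results)
{
    int j;

    if (!(m->flags & MBOX_FLAG_DONE))
        return -1;
    *retval = m->retval;
    for (j = 0; j < MBOX_DATA_WORDS; j++)
        results[j] = m->data[j];
    m->flags = 0;                            /* clear all the flags */
    return 0;
}
```

Splitting the call into a post step and a collect step mirrors the two flag transitions (bit 1 set by the driver, bit 2 set by the firmware) and leaves the waiting policy to the caller.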
64 | Event notifications are sent from the firmware to the host. The host tells the | ||
65 | firmware which events it is interested in via an API call. That call tells the | ||
66 | firmware which notification mailbox to use. The firmware signals the host via | ||
67 | an interrupt. Only the 16 Results fields are used; the Flags, Command, Return | ||
68 | value and Timeout words are not used. | ||
69 | |||
diff --git a/Documentation/video4linux/cx2341x/fw-decoder-api.txt b/Documentation/video4linux/cx2341x/fw-decoder-api.txt new file mode 100644 index 000000000000..9df4fb3ea0f2 --- /dev/null +++ b/Documentation/video4linux/cx2341x/fw-decoder-api.txt | |||
@@ -0,0 +1,319 @@ | |||
1 | Decoder firmware API description | ||
2 | ================================ | ||
3 | |||
4 | Note: this API is part of the decoder firmware, so it's cx23415 only. | ||
5 | |||
6 | ------------------------------------------------------------------------------- | ||
7 | |||
8 | Name CX2341X_DEC_PING_FW | ||
9 | Enum 0/0x00 | ||
10 | Description | ||
11 | This API call does nothing. It may be used to check if the firmware | ||
12 | is responding. | ||
13 | |||
14 | ------------------------------------------------------------------------------- | ||
15 | |||
16 | Name CX2341X_DEC_START_PLAYBACK | ||
17 | Enum 1/0x01 | ||
18 | Description | ||
19 | Begin or resume playback. | ||
20 | Param[0] | ||
21 | 0 based frame number in GOP to begin playback from. | ||
22 | Param[1] | ||
23 | Specifies the number of muted audio frames to play before normal | ||
24 | audio resumes. | ||
25 | |||
26 | ------------------------------------------------------------------------------- | ||
27 | |||
28 | Name CX2341X_DEC_STOP_PLAYBACK | ||
29 | Enum 2/0x02 | ||
30 | Description | ||
31 | Ends playback and clears all decoder buffers. If PTS is not zero, | ||
32 | playback stops at specified PTS. | ||
33 | Param[0] | ||
34 | Display 0=last frame, 1=black | ||
35 | Param[1] | ||
36 | PTS low | ||
37 | Param[2] | ||
38 | PTS high | ||
39 | |||
40 | ------------------------------------------------------------------------------- | ||
41 | |||
42 | Name CX2341X_DEC_SET_PLAYBACK_SPEED | ||
43 | Enum 3/0x03 | ||
44 | Description | ||
45 | Play back the stream at a speed other than normal. There are two modes of | ||
46 | operation: | ||
47 | Smooth: host transfers entire stream and firmware drops unused | ||
48 | frames. | ||
49 | Coarse: host drops frames based on indexing as required to achieve | ||
50 | desired speed. | ||
51 | Param[0] | ||
52 | Bitmap: | ||
53 | 0:7 0 normal | ||
54 | 1 fast only "1.5 times" | ||
55 | n nX fast, 1/nX slow | ||
56 | 30 Framedrop: | ||
57 | '0' during 1.5 times play, every other B frame is dropped | ||
58 | '1' during 1.5 times play, stream is unchanged (bitrate | ||
59 | must not exceed 8mbps) | ||
60 | 31 Speed: | ||
61 | '0' slow | ||
62 | '1' fast | ||
63 | Param[1] | ||
64 | Direction: 0=forward, 1=reverse | ||
65 | Param[2] | ||
66 | Picture mask: | ||
67 | 1=I frames | ||
68 | 3=I, P frames | ||
69 | 7=I, P, B frames | ||
70 | Param[3] | ||
71 | B frames per GOP (for reverse play only) | ||
72 | Param[4] | ||
73 | Mute audio: 0=disable, 1=enable | ||
74 | Param[5] | ||
75 | Display 0=frame, 1=field | ||
76 | Param[6] | ||
77 | Specifies the number of muted audio frames to play before normal audio | ||
78 | resumes. | ||
79 | |||
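As an illustration of the Param[0] bitmap of CX2341X_DEC_SET_PLAYBACK_SPEED, the helper below packs the three documented fields into one word. It assumes bit 0 is the least significant bit and that "0:7" is an inclusive bit range; the function name playback_speed_param is hypothetical.

```c
#include <stdint.h>

/*
 * Build Param[0] for CX2341X_DEC_SET_PLAYBACK_SPEED.
 *   n           speed factor for bits 0:7 (0 = normal, 1 = "1.5 times")
 *   fast        nonzero selects fast (bit 31 set), zero selects slow
 *   keep_stream nonzero sets bit 30: the stream is left unchanged
 *               during 1.5 times play instead of dropping B frames
 */
static uint32_t playback_speed_param(unsigned int n, int fast,
                                     int keep_stream)
{
    uint32_t param = n & 0xffu;

    if (keep_stream)
        param |= 1u << 30;
    if (fast)
        param |= 1u << 31;
    return param;
}
```

For example, 2X fast play with default frame dropping would be requested with playback_speed_param(2, 1, 0).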
80 | ------------------------------------------------------------------------------- | ||
81 | |||
82 | Name CX2341X_DEC_STEP_VIDEO | ||
83 | Enum 5/0x05 | ||
84 | Description | ||
85 | Each call to this API steps the playback to the next unit defined below | ||
86 | in the current playback direction. | ||
87 | Param[0] | ||
88 | 0=frame, 1=top field, 2=bottom field | ||
89 | |||
90 | ------------------------------------------------------------------------------- | ||
91 | |||
92 | Name CX2341X_DEC_SET_DMA_BLOCK_SIZE | ||
93 | Enum 8/0x08 | ||
94 | Description | ||
95 | Set DMA transfer block size. Counterpart to API 0xC9 | ||
96 | Param[0] | ||
97 | DMA transfer block size in bytes. A different size may be specified | ||
98 | when issuing the DMA transfer command. | ||
99 | |||
100 | ------------------------------------------------------------------------------- | ||
101 | |||
102 | Name CX2341X_DEC_GET_XFER_INFO | ||
103 | Enum 9/0x09 | ||
104 | Description | ||
105 | This API call may be used to detect an end of stream condition. | ||
106 | Result[0] | ||
107 | Stream type | ||
108 | Result[1] | ||
109 | Address offset | ||
110 | Result[2] | ||
111 | Maximum bytes to transfer | ||
112 | Result[3] | ||
113 | Buffer fullness | ||
114 | |||
115 | ------------------------------------------------------------------------------- | ||
116 | |||
117 | Name CX2341X_DEC_GET_DMA_STATUS | ||
118 | Enum 10/0x0A | ||
119 | Description | ||
120 | Status of the last DMA transfer | ||
121 | Result[0] | ||
122 | Bit 1 set means transfer complete | ||
123 | Bit 2 set means DMA error | ||
124 | Bit 3 set means linked list error | ||
125 | Result[1] | ||
126 | DMA type: 0=MPEG, 1=OSD, 2=YUV | ||
127 | |||
128 | ------------------------------------------------------------------------------- | ||
129 | |||
130 | Name CX2341X_DEC_SCHED_DMA_FROM_HOST | ||
131 | Enum 11/0x0B | ||
132 | Description | ||
133 | Setup DMA from host operation. Counterpart to API 0xCC | ||
134 | Param[0] | ||
135 | Memory address of linked list | ||
136 | Param[1] | ||
137 | Total # of bytes to transfer | ||
138 | Param[2] | ||
139 | DMA type (0=MPEG, 1=OSD, 2=YUV) | ||
140 | |||
141 | ------------------------------------------------------------------------------- | ||
142 | |||
143 | Name CX2341X_DEC_PAUSE_PLAYBACK | ||
144 | Enum 13/0x0D | ||
145 | Description | ||
146 | Freeze playback immediately. In this mode, when internal buffers are | ||
147 | full, no more data will be accepted and data request IRQs will be | ||
148 | masked. | ||
149 | Param[0] | ||
150 | Display: 0=last frame, 1=black | ||
151 | |||
152 | ------------------------------------------------------------------------------- | ||
153 | |||
154 | Name CX2341X_DEC_HALT_FW | ||
155 | Enum 14/0x0E | ||
156 | Description | ||
157 | The firmware is halted and no further API calls are serviced until | ||
158 | the firmware is uploaded again. | ||
159 | |||
160 | ------------------------------------------------------------------------------- | ||
161 | |||
162 | Name CX2341X_DEC_SET_STANDARD | ||
163 | Enum 16/0x10 | ||
164 | Description | ||
165 | Selects display standard | ||
166 | Param[0] | ||
167 | 0=NTSC, 1=PAL | ||
168 | |||
169 | ------------------------------------------------------------------------------- | ||
170 | |||
171 | Name CX2341X_DEC_GET_VERSION | ||
172 | Enum 17/0x11 | ||
173 | Description | ||
174 | Returns decoder firmware version information | ||
175 | Result[0] | ||
176 | Version bitmask: | ||
177 | Bits 0:15 build | ||
178 | Bits 16:23 minor | ||
179 | Bits 24:31 major | ||
180 | |||
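A small sketch of unpacking Result[0] of CX2341X_DEC_GET_VERSION, assuming bit 0 is the least significant bit and the ranges are inclusive. The struct and function names are hypothetical.

```c
#include <stdint.h>

struct fw_version {
    unsigned int major;
    unsigned int minor;
    unsigned int build;
};

/* Split the version word into its three documented fields. */
static struct fw_version decode_fw_version(uint32_t result0)
{
    struct fw_version v;

    v.build = result0 & 0xffffu;         /* bits 0:15  */
    v.minor = (result0 >> 16) & 0xffu;   /* bits 16:23 */
    v.major = (result0 >> 24) & 0xffu;   /* bits 24:31 */
    return v;
}
```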
181 | ------------------------------------------------------------------------------- | ||
182 | |||
183 | Name CX2341X_DEC_SET_STREAM_INPUT | ||
184 | Enum 20/0x14 | ||
185 | Description | ||
186 | Select decoder stream input port | ||
187 | Param[0] | ||
188 | 0=memory (default), 1=streaming | ||
189 | |||
190 | ------------------------------------------------------------------------------- | ||
191 | |||
192 | Name CX2341X_DEC_GET_TIMING_INFO | ||
193 | Enum 21/0x15 | ||
194 | Description | ||
195 | Returns timing information from start of playback | ||
196 | Result[0] | ||
197 | Frame count by decode order | ||
198 | Result[1] | ||
199 | Video PTS bits 0:31 by display order | ||
200 | Result[2] | ||
201 | Video PTS bit 32 by display order | ||
202 | Result[3] | ||
203 | SCR bits 0:31 by display order | ||
204 | Result[4] | ||
205 | SCR bit 32 by display order | ||
206 | |||
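The PTS and SCR values are 33 bits wide and arrive split over two result words. The sketch below joins them, assuming that "bit 32" is delivered in the least significant bit of the second word (this page does not say so explicitly). The helper names are hypothetical; the millisecond conversion relies on the standard 90 kHz MPEG system clock.

```c
#include <stdint.h>

/*
 * Join the low 32 bits and bit 32 of a PTS or SCR value.
 * Assumption: bit 32 is found in bit 0 of 'high'.
 */
static uint64_t combine_33bit(uint32_t low, uint32_t high)
{
    return ((uint64_t)(high & 1u) << 32) | low;
}

/* Convert a 90 kHz tick count to whole milliseconds. */
static uint64_t ticks_to_ms(uint64_t ticks)
{
    return ticks / 90;
}
```

With these, the video PTS would be combine_33bit(Result[1], Result[2]) and the SCR would be combine_33bit(Result[3], Result[4]).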
207 | ------------------------------------------------------------------------------- | ||
208 | |||
209 | Name CX2341X_DEC_SET_AUDIO_MODE | ||
210 | Enum 22/0x16 | ||
211 | Description | ||
212 | Select audio mode | ||
213 | Param[0] | ||
214 | Dual mono mode action | ||
215 | Param[1] | ||
216 | Stereo mode action: | ||
217 | 0=Stereo, 1=Left, 2=Right, 3=Mono, 4=Swap, -1=Unchanged | ||
218 | |||
219 | ------------------------------------------------------------------------------- | ||
220 | |||
221 | Name CX2341X_DEC_SET_EVENT_NOTIFICATION | ||
222 | Enum 23/0x17 | ||
223 | Description | ||
224 | Setup firmware to notify the host about a particular event. | ||
225 | Counterpart to API 0xD5 | ||
226 | Param[0] | ||
227 | Event: 0=Audio mode change between stereo and dual channel | ||
228 | Param[1] | ||
229 | Notification 0=disabled, 1=enabled | ||
230 | Param[2] | ||
231 | Interrupt bit | ||
232 | Param[3] | ||
233 | Mailbox slot, -1 if no mailbox required. | ||
234 | |||
235 | ------------------------------------------------------------------------------- | ||
236 | |||
237 | Name CX2341X_DEC_SET_DISPLAY_BUFFERS | ||
238 | Enum 24/0x18 | ||
239 | Description | ||
240 | Number of display buffers. To decode all frames in reverse playback you | ||
241 | must use nine buffers. | ||
242 | Param[0] | ||
243 | 0=six buffers, 1=nine buffers | ||
244 | |||
245 | ------------------------------------------------------------------------------- | ||
246 | |||
247 | Name CX2341X_DEC_EXTRACT_VBI | ||
248 | Enum 25/0x19 | ||
249 | Description | ||
250 | Extracts VBI data | ||
251 | Param[0] | ||
252 | 0=extract from extension & user data, 1=extract from private packets | ||
253 | Result[0] | ||
254 | VBI table location | ||
255 | Result[1] | ||
256 | VBI table size | ||
257 | |||
258 | ------------------------------------------------------------------------------- | ||
259 | |||
260 | Name CX2341X_DEC_SET_DECODER_SOURCE | ||
261 | Enum 26/0x1A | ||
262 | Description | ||
263 | Selects decoder source. Ensure that the parameters passed to this | ||
264 | API match the encoder settings. | ||
265 | Param[0] | ||
266 | Mode: 0=MPEG from host, 1=YUV from encoder, 2=YUV from host | ||
267 | Param[1] | ||
268 | YUV picture width | ||
269 | Param[2] | ||
270 | YUV picture height | ||
271 | Param[3] | ||
272 | Bitmap: see Param[0] of API 0xBD | ||
273 | |||
274 | ------------------------------------------------------------------------------- | ||
275 | |||
276 | Name CX2341X_DEC_SET_AUDIO_OUTPUT | ||
277 | Enum 27/0x1B | ||
278 | Description | ||
279 | Select audio output format | ||
280 | Param[0] | ||
281 | Bitmask: | ||
282 | 0:1 Data size: | ||
283 | '00' 16 bit | ||
284 | '01' 20 bit | ||
285 | '10' 24 bit | ||
286 | 2:7 Unused | ||
287 | 8:9 Mode: | ||
288 | '00' 2 channels | ||
289 | '01' 4 channels | ||
290 | '10' 6 channels | ||
291 | '11' 6 channels with one line data mode | ||
292 | (for left justified MSB first mode, 20 bit only) | ||
293 | 10:11 Unused | ||
294 | 12:13 Channel format: | ||
295 | '00' right justified MSB first mode | ||
296 | '01' left justified MSB first mode | ||
297 | '10' I2S mode | ||
298 | 14:15 Unused | ||
299 | 16:21 Right justify bit count | ||
300 | 22:31 Unused | ||
301 | |||
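The Param[0] bitmask of CX2341X_DEC_SET_AUDIO_OUTPUT can be assembled along these lines. The sketch assumes bit 0 is the least significant bit and the ranges are inclusive; all enum and function names are hypothetical and only mirror the table above.

```c
#include <stdint.h>

enum aud_size   { AUD_16BIT = 0, AUD_20BIT = 1, AUD_24BIT = 2 };
enum aud_mode   { AUD_2CH = 0, AUD_4CH = 1, AUD_6CH = 2,
                  AUD_6CH_ONE_LINE = 3 };
enum aud_format { AUD_RIGHT_JUST = 0, AUD_LEFT_JUST = 1, AUD_I2S = 2 };

/* Pack the documented fields; unused bits are left at zero. */
static uint32_t audio_output_param(enum aud_size size, enum aud_mode mode,
                                   enum aud_format fmt,
                                   unsigned int rj_bits)
{
    return ((uint32_t)size & 0x3u)               /* bits 0:1   */
         | (((uint32_t)mode & 0x3u) << 8)        /* bits 8:9   */
         | (((uint32_t)fmt & 0x3u) << 12)        /* bits 12:13 */
         | (((uint32_t)rj_bits & 0x3fu) << 16);  /* bits 16:21 */
}
```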
302 | ------------------------------------------------------------------------------- | ||
303 | |||
304 | Name CX2341X_DEC_SET_AV_DELAY | ||
305 | Enum 28/0x1C | ||
306 | Description | ||
307 | Set audio/video delay in 90kHz ticks | ||
308 | Param[0] | ||
309 | 0=A/V in sync, negative=audio lags, positive=video lags | ||
310 | |||
311 | ------------------------------------------------------------------------------- | ||
312 | |||
313 | Name CX2341X_DEC_SET_PREBUFFERING | ||
314 | Enum 30/0x1E | ||
315 | Description | ||
316 | Decoder prebuffering. When enabled, up to 128KB are buffered for | ||
317 | streams <8mbps, or 640KB for streams >8mbps | ||
318 | Param[0] | ||
319 | 0=off, 1=on | ||
diff --git a/Documentation/video4linux/cx2341x/fw-dma.txt b/Documentation/video4linux/cx2341x/fw-dma.txt new file mode 100644 index 000000000000..8123e262d5b6 --- /dev/null +++ b/Documentation/video4linux/cx2341x/fw-dma.txt | |||
@@ -0,0 +1,94 @@ | |||
1 | This page describes the structures and procedures used by the cx2341x DMA | ||
2 | engine. | ||
3 | |||
4 | Introduction | ||
5 | ============ | ||
6 | |||
7 | The cx2341x PCI interface is busmaster capable. This means it has a DMA | ||
8 | engine to efficiently transfer large volumes of data between the card and main | ||
9 | memory without requiring help from a CPU. Like most hardware, it must operate | ||
10 | on contiguous physical memory. This is difficult to come by in large quantities | ||
11 | on virtual memory machines. | ||
12 | |||
13 | Therefore, it also supports a technique called "scatter-gather". The card can | ||
14 | transfer multiple buffers in one operation. Instead of allocating one large | ||
15 | contiguous buffer, the driver can allocate several smaller buffers. | ||
16 | |||
17 | In practice, I've seen the average transfer to be roughly 80K, but transfers | ||
18 | above 128K were not uncommon, particularly at startup. The 128K figure is | ||
19 | important, because that is the largest block that the kernel can normally | ||
20 | allocate. Even still, 128K blocks are hard to come by, so the driver writer is | ||
21 | urged to choose a smaller block size and learn the scatter-gather technique. | ||
22 | |||
23 | Mailbox #10 is reserved for DMA transfer information. | ||
24 | |||
25 | Flow | ||
26 | ==== | ||
27 | |||
28 | This section describes, in general, the order of events when handling DMA | ||
29 | transfers. Detailed information follows this section. | ||
30 | |||
31 | - The card raises the Encoder interrupt. | ||
32 | - The driver reads the transfer type, offset and size from Mailbox #10. | ||
33 | - The driver constructs the scatter-gather array from enough free dma buffers | ||
34 | to cover the size. | ||
35 | - The driver schedules the DMA transfer via the ScheduleDMAtoHost API call. | ||
36 | - The card raises the DMA Complete interrupt. | ||
37 | - The driver checks the DMA status register for any errors. | ||
38 | - The driver post-processes the newly transferred buffers. | ||
39 | |||
40 | NOTE! It is possible that the Encoder and DMA Complete interrupts get raised | ||
41 | simultaneously. (End of the last, start of the next, etc.) | ||
42 | |||
43 | Mailbox #10 | ||
44 | =========== | ||
45 | |||
46 | The Flags, Command, Return Value and Timeout fields are ignored. | ||
47 | |||
48 | Name: Mailbox #10 | ||
49 | Results[0]: Type: 0: MPEG. | ||
50 | Results[1]: Offset: The position relative to the card's memory space. | ||
51 | Results[2]: Size: The exact number of bytes to transfer. | ||
52 | |||
53 | My speculation is that, since the StartCapture API has a capture type of "RAW" | ||
54 | available, the type field will have other values that correspond to YUV | ||
55 | and PCM data. | ||
56 | |||
57 | Scatter-Gather Array | ||
58 | ==================== | ||
59 | |||
60 | The scatter-gather array is a contiguously allocated block of memory that | ||
61 | tells the card the source and destination of each data-block to transfer. | ||
62 | Card "addresses" are derived from the offset supplied by Mailbox #10. Host | ||
63 | addresses are the physical memory location of the target DMA buffer. | ||
64 | |||
65 | Each S-G array element is a struct of three 32-bit words. The first word is | ||
66 | the source address, the second is the destination address. Both take up the | ||
67 | entire 32 bits. The lowest 16 bits of the third word are the transfer byte | ||
68 | count. The high-bit of the third word is the "last" flag. The last-flag tells | ||
69 | the card to raise the DMA_DONE interrupt. From hard personal experience, if | ||
70 | you forget to set this bit, the card will still "work" but the stream will | ||
71 | most likely get corrupted. | ||
72 | |||
73 | The transfer count must be a multiple of 256. Therefore, the driver will need | ||
74 | to track how much data in the target buffer is valid and deal with it | ||
75 | accordingly. | ||
76 | |||
77 | Array Element: | ||
78 | |||
79 | - 32-bit Source Address | ||
80 | - 32-bit Destination Address | ||
81 | - 16-bit reserved (high bit is the last flag) | ||
82 | - 16-bit byte count | ||
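The element layout above can be sketched in C as follows. This is only an illustration under stated assumptions: the struct, macro and function names are hypothetical (they are not taken from any driver), the byte count is assumed to sit in bits 0-15 of the third word and the last flag in bit 31, and a card-to-host transfer is assumed (source is the card offset, destination is the host buffer).

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical names; the layout follows the three-word description above. */
struct cx_sg_element {
	uint32_t src;   /* source address (card offset from Mailbox #10) */
	uint32_t dst;   /* destination address (host physical address) */
	uint32_t size;  /* bits 0-15: byte count, bit 31: "last" flag */
};

#define CX_SG_LAST 0x80000000u

/* Round a byte count up to the required multiple of 256. */
static uint32_t cx_sg_round_up(uint32_t bytes)
{
	return (bytes + 255u) & ~255u;
}

/*
 * Fill elements covering `total` bytes using host buffers of `block`
 * bytes each. Returns the number of elements used, or 0 if the block
 * size is unusable or more than `max` elements would be needed.
 */
static size_t cx_sg_build(struct cx_sg_element *sg, size_t max,
			  uint32_t card_off, const uint32_t *host_addr,
			  uint32_t total, uint32_t block)
{
	size_t n = 0;

	/* The count must be a multiple of 256 and fit in 16 bits. */
	if (block == 0 || block % 256 != 0 || block > 0xff00u)
		return 0;

	while (total > 0) {
		uint32_t len = total < block ? total : block;

		if (n == max)
			return 0;
		sg[n].src = card_off;
		sg[n].dst = host_addr[n];
		sg[n].size = cx_sg_round_up(len);
		card_off += len;
		total -= len;
		n++;
	}
	if (n > 0)
		sg[n - 1].size |= CX_SG_LAST;  /* ask for the DMA done interrupt */
	return n;
}
```

Because the final count is rounded up, the caller still has to remember the exact size from Mailbox #10 to know how much of the last buffer is valid.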
83 | |||
84 | DMA Transfer Status | ||
85 | =================== | ||
86 | |||
87 | Register 0x0004 holds the DMA Transfer Status: | ||
88 | |||
89 | Bit | ||
90 | 4 Scatter-Gather array error | ||
91 | 3 DMA write error | ||
92 | 2 DMA read error | ||
93 | 1 write completed | ||
94 | 0 read completed | ||
diff --git a/Documentation/video4linux/cx2341x/fw-encoder-api.txt b/Documentation/video4linux/cx2341x/fw-encoder-api.txt new file mode 100644 index 000000000000..001c68644b08 --- /dev/null +++ b/Documentation/video4linux/cx2341x/fw-encoder-api.txt | |||
@@ -0,0 +1,694 @@ | |||
1 | Encoder firmware API description | ||
2 | ================================ | ||
3 | |||
4 | ------------------------------------------------------------------------------- | ||
5 | |||
6 | Name CX2341X_ENC_PING_FW | ||
7 | Enum 128/0x80 | ||
8 | Description | ||
9 | Does nothing. Can be used to check if the firmware is responding. | ||
10 | |||
11 | ------------------------------------------------------------------------------- | ||
12 | |||
13 | Name CX2341X_ENC_START_CAPTURE | ||
14 | Enum 129/0x81 | ||
15 | Description | ||
16 | Commences the capture of video, audio and/or VBI data. All encoding | ||
17 | parameters must be initialized prior to this API call. Captures frames | ||
18 | continuously or until a predefined number of frames have been captured. | ||
19 | Param[0] | ||
20 | Capture stream type: | ||
21 | 0=MPEG | ||
22 | 1=Raw | ||
23 | 2=Raw passthrough | ||
24 | 3=VBI | ||
25 | |||
26 | Param[1] | ||
27 | Bitmask: | ||
28 | Bit 0 when set, captures YUV | ||
29 | Bit 1 when set, captures PCM audio | ||
30 | Bit 2 when set, captures VBI (same as param[0]=3) | ||
31 | Bit 3 when set, the capture destination is the decoder | ||
32 | (same as param[0]=2) | ||
33 | Bit 4 when set, the capture destination is the host | ||
34 | Note: this parameter is only meaningful for RAW capture type. | ||
35 | |||
36 | ------------------------------------------------------------------------------- | ||
37 | |||
38 | Name CX2341X_ENC_STOP_CAPTURE | ||
39 | Enum 130/0x82 | ||
40 | Description | ||
41 | Ends a capture in progress | ||
42 | Param[0] | ||
43 | 0=stop at end of GOP (generates IRQ) | ||
44 | 1=stop immediate (no IRQ) | ||
45 | Param[1] | ||
46 | Stream type to stop, see param[0] of API 0x81 | ||
47 | Param[2] | ||
48 | Subtype, see param[1] of API 0x81 | ||
49 | |||
50 | ------------------------------------------------------------------------------- | ||
51 | |||
52 | Name CX2341X_ENC_SET_AUDIO_ID | ||
53 | Enum 137/0x89 | ||
54 | Description | ||
55 | Assigns the transport stream ID of the encoded audio stream | ||
56 | Param[0] | ||
57 | Audio Stream ID | ||
58 | |||
59 | ------------------------------------------------------------------------------- | ||
60 | |||
61 | Name CX2341X_ENC_SET_VIDEO_ID | ||
62 | Enum 139/0x8B | ||
63 | Description | ||
64 | Set video transport stream ID | ||
65 | Param[0] | ||
66 | Video stream ID | ||
67 | |||
68 | ------------------------------------------------------------------------------- | ||
69 | |||
70 | Name CX2341X_ENC_SET_PCR_ID | ||
71 | Enum 141/0x8D | ||
72 | Description | ||
73 | Assigns the transport stream ID for PCR packets | ||
74 | Param[0] | ||
75 | PCR Stream ID | ||
76 | |||
77 | ------------------------------------------------------------------------------- | ||
78 | |||
79 | Name CX2341X_ENC_SET_FRAME_RATE | ||
80 | Enum 143/0x8F | ||
81 | Description | ||
82 | Set video frames per second. Change occurs at start of new GOP. | ||
83 | Param[0] | ||
84 | 0=30fps | ||
85 | 1=25fps | ||
86 | |||
87 | ------------------------------------------------------------------------------- | ||
88 | |||
89 | Name CX2341X_ENC_SET_FRAME_SIZE | ||
90 | Enum 145/0x91 | ||
91 | Description | ||
92 | Select video stream encoding resolution. | ||
93 | Param[0] | ||
94 | Height in lines. Default 480 | ||
95 | Param[1] | ||
96 | Width in pixels. Default 720 | ||
97 | |||
98 | ------------------------------------------------------------------------------- | ||
99 | |||
100 | Name CX2341X_ENC_SET_BIT_RATE | ||
101 | Enum 149/0x95 | ||
102 | Description | ||
103 | Assign average video stream bitrate. Note on the last three params: | ||
104 | Param[3] and [4] seem to be always 0, param [5] doesn't seem to be used. | ||
105 | Param[0] | ||
106 | 0=variable bitrate, 1=constant bitrate | ||
107 | Param[1] | ||
108 | bitrate in bits per second | ||
109 | Param[2] | ||
110 | peak bitrate in bits per second, divided by 400 | ||
111 | Param[3] | ||
112 | Mux bitrate in bits per second, divided by 400. May be 0 (default). | ||
113 | Param[4] | ||
114 | Rate Control VBR Padding | ||
115 | Param[5] | ||
116 | VBV Buffer used by encoder | ||
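As an illustration of the units involved, a hypothetical helper could fill the six parameters like this. The function name is an assumption, and the last three values are simply set to 0 as the note above suggests.

```c
#include <stdint.h>

/* Hypothetical helper: fills the six parameters of API 0x95. */
static void cx_enc_bitrate_params(uint32_t p[6], int cbr,
				  uint32_t avg_bps, uint32_t peak_bps)
{
	p[0] = cbr ? 1 : 0;     /* 0=variable bitrate, 1=constant bitrate */
	p[1] = avg_bps;         /* plain bits per second */
	p[2] = peak_bps / 400;  /* units of 400 bits per second */
	p[3] = 0;               /* mux bitrate, 0 selects the default */
	p[4] = 0;               /* observed to be always 0 */
	p[5] = 0;               /* does not seem to be used */
}
```

For example, a VBR stream averaging 6 Mbit/s with an 8 Mbit/s peak gives Param[1] = 6000000 and Param[2] = 20000.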
117 | |||
118 | ------------------------------------------------------------------------------- | ||
119 | |||
120 | Name CX2341X_ENC_SET_GOP_PROPERTIES | ||
121 | Enum 151/0x97 | ||
122 | Description | ||
123 | Set up the GOP structure | ||
124 | Param[0] | ||
125 | GOP size (maximum is 34) | ||
126 | Param[1] | ||
127 | Number of B frames between the I and P frame, plus 1. | ||
128 | For example: IBBPBBPBBPBB --> GOP size: 12, number of B frames: 2+1 = 3 | ||
129 | Note that GOP size must be a multiple of (B-frames + 1). | ||
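The constraints above can be checked before issuing the call. The helper below is a sketch with a made-up name; rejecting a GOP size of 0 is an assumption, not something the firmware documentation states.

```c
#include <stdint.h>

/*
 * Hypothetical helper: validate the GOP settings and fill the two
 * parameters of API 0x97. Returns 0 on success, -1 if invalid.
 */
static int cx_enc_gop_params(uint32_t p[2], uint32_t gop_size,
			     uint32_t b_frames)
{
	uint32_t m = b_frames + 1;  /* Param[1] is B frames plus 1 */

	if (m == 0)                 /* b_frames overflowed */
		return -1;
	if (gop_size == 0 || gop_size > 34)
		return -1;
	if (gop_size % m != 0)      /* must be a multiple of (B frames + 1) */
		return -1;

	p[0] = gop_size;
	p[1] = m;
	return 0;
}
```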
130 | |||
131 | ------------------------------------------------------------------------------- | ||
132 | |||
133 | Name CX2341X_ENC_SET_ASPECT_RATIO | ||
134 | Enum 153/0x99 | ||
135 | Description | ||
136 | Sets the encoding aspect ratio. Changes in the aspect ratio take effect | ||
137 | at the start of the next GOP. | ||
138 | Param[0] | ||
139 | '0000' forbidden | ||
140 | '0001' 1:1 square | ||
141 | '0010' 4:3 | ||
142 | '0011' 16:9 | ||
143 | '0100' 2.21:1 | ||
144 | '0101' reserved | ||
145 | .... | ||
146 | '1111' reserved | ||
147 | |||
148 | ------------------------------------------------------------------------------- | ||
149 | |||
150 | Name CX2341X_ENC_SET_DNR_FILTER_MODE | ||
151 | Enum 155/0x9B | ||
152 | Description | ||
153 | Assign Dynamic Noise Reduction operating mode | ||
154 | Param[0] | ||
155 | Bit0: Spatial filter, set=auto, clear=manual | ||
156 | Bit1: Temporal filter, set=auto, clear=manual | ||
157 | Param[1] | ||
158 | Median filter: | ||
159 | 0=Disabled | ||
160 | 1=Horizontal | ||
161 | 2=Vertical | ||
162 | 3=Horiz/Vert | ||
163 | 4=Diagonal | ||
164 | |||
165 | ------------------------------------------------------------------------------- | ||
166 | |||
167 | Name CX2341X_ENC_SET_DNR_FILTER_PROPS | ||
168 | Enum 157/0x9D | ||
169 | Description | ||
170 | These Dynamic Noise Reduction filter values are only meaningful when | ||
171 | the respective filter is set to "manual" (See API 0x9B) | ||
172 | Param[0] | ||
173 | Spatial filter: default 0, range 0:15 | ||
174 | Param[1] | ||
175 | Temporal filter: default 0, range 0:31 | ||
176 | |||
177 | ------------------------------------------------------------------------------- | ||
178 | |||
179 | Name CX2341X_ENC_SET_CORING_LEVELS | ||
180 | Enum 159/0x9F | ||
181 | Description | ||
182 | Assign Dynamic Noise Reduction median filter properties. | ||
183 | Param[0] | ||
184 | Threshold above which the luminance median filter is enabled. | ||
185 | Default: 0, range 0:255 | ||
186 | Param[1] | ||
187 | Threshold below which the luminance median filter is enabled. | ||
188 | Default: 255, range 0:255 | ||
189 | Param[2] | ||
190 | Threshold above which the chrominance median filter is enabled. | ||
191 | Default: 0, range 0:255 | ||
192 | Param[3] | ||
193 | Threshold below which the chrominance median filter is enabled. | ||
194 | Default: 255, range 0:255 | ||
195 | |||
196 | ------------------------------------------------------------------------------- | ||
197 | |||
198 | Name CX2341X_ENC_SET_SPATIAL_FILTER_TYPE | ||
199 | Enum 161/0xA1 | ||
200 | Description | ||
201 | Assign spatial prefilter parameters | ||
202 | Param[0] | ||
203 | Luminance filter | ||
204 | 0=Off | ||
205 | 1=1D Horizontal | ||
206 | 2=1D Vertical | ||
207 | 3=2D H/V Separable (default) | ||
208 | 4=2D Symmetric non-separable | ||
209 | Param[1] | ||
210 | Chrominance filter | ||
211 | 0=Off | ||
212 | 1=1D Horizontal (default) | ||
213 | |||
214 | ------------------------------------------------------------------------------- | ||
215 | |||
216 | Name CX2341X_ENC_SET_3_2_PULLDOWN | ||
217 | Enum 177/0xB1 | ||
218 | Description | ||
219 | 3:2 pulldown properties | ||
220 | Param[0] | ||
221 | 0=enabled | ||
222 | 1=disabled | ||
223 | |||
224 | ------------------------------------------------------------------------------- | ||
225 | |||
226 | Name CX2341X_ENC_SET_VBI_LINE | ||
227 | Enum 183/0xB7 | ||
228 | Description | ||
229 | Selects VBI line number. | ||
230 | Param[0] | ||
231 | Bits 0:4 line number | ||
232 | Bit 31 0=top_field, 1=bottom_field | ||
233 | Bits 0:31 all set specifies "all lines" | ||
234 | Param[1] | ||
235 | VBI line information features: 0=disabled, 1=enabled | ||
236 | Param[2] | ||
237 | Slicing: 0=None, 1=Closed Caption | ||
238 | Almost certainly not implemented. Set to 0. | ||
239 | Param[3] | ||
240 | Luminance samples in this line. | ||
241 | Almost certainly not implemented. Set to 0. | ||
242 | Param[4] | ||
243 | Chrominance samples in this line | ||
244 | Almost certainly not implemented. Set to 0. | ||
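A sketch of how Param[0] could be composed from the bit layout above. The names are hypothetical.

```c
#include <stdint.h>

/* All 32 bits set selects "all lines". */
#define CX_ENC_VBI_ALL_LINES 0xffffffffu

/* Hypothetical helper: bits 0:4 line number, bit 31 field select. */
static uint32_t cx_enc_vbi_line(unsigned line, int bottom_field)
{
	return (line & 0x1fu) | (bottom_field ? 0x80000000u : 0u);
}
```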
245 | |||
246 | ------------------------------------------------------------------------------- | ||
247 | |||
248 | Name CX2341X_ENC_SET_STREAM_TYPE | ||
249 | Enum 185/0xB9 | ||
250 | Description | ||
251 | Assign stream type | ||
252 | Note: Transport stream is not working in recent firmwares. | ||
253 | And in older firmwares the timestamps in the TS seem to be | ||
254 | unreliable. | ||
255 | Param[0] | ||
256 | 0=Program stream | ||
257 | 1=Transport stream | ||
258 | 2=MPEG1 stream | ||
259 | 3=PES A/V stream | ||
260 | 5=PES Video stream | ||
261 | 7=PES Audio stream | ||
262 | 10=DVD stream | ||
263 | 11=VCD stream | ||
264 | 12=SVCD stream | ||
265 | 13=DVD_S1 stream | ||
266 | 14=DVD_S2 stream | ||
267 | |||
268 | ------------------------------------------------------------------------------- | ||
269 | |||
270 | Name CX2341X_ENC_SET_OUTPUT_PORT | ||
271 | Enum 187/0xBB | ||
272 | Description | ||
273 | Assign stream output port. Normally 0 when the data is copied through | ||
274 | the PCI bus (DMA), and 1 when the data is streamed to another chip | ||
275 | (pvrusb and cx88-blackbird). | ||
276 | Param[0] | ||
277 | 0=Memory (default) | ||
278 | 1=Streaming | ||
279 | 2=Serial | ||
280 | Param[1] | ||
281 | Unknown, but leaving this to 0 seems to work best. Indications are that | ||
282 | this might have to do with USB support, although passing anything but 0 | ||
283 | only breaks things. | ||
284 | |||
285 | ------------------------------------------------------------------------------- | ||
286 | |||
287 | Name CX2341X_ENC_SET_AUDIO_PROPERTIES | ||
288 | Enum 189/0xBD | ||
289 | Description | ||
290 | Set audio stream properties, may be called while encoding is in progress. | ||
291 | Note: all bitfields are consistent with ISO11172 documentation except | ||
292 | bits 2:3 which ISO docs define as: | ||
293 | '11' Layer I | ||
294 | '10' Layer II | ||
295 | '01' Layer III | ||
296 | '00' Undefined | ||
297 | This discrepancy may indicate a possible error in the documentation. | ||
298 | Testing indicated that only Layer II is actually working, and that | ||
299 | the minimum bitrate should be 192 kbps. | ||
300 | Param[0] | ||
301 | Bitmask: | ||
302 | 0:1 '00' 44.1 kHz | ||
303 | '01' 48 kHz | ||
304 | '10' 32 kHz | ||
305 | '11' reserved | ||
306 | |||
307 | 2:3 '01'=Layer I | ||
308 | '10'=Layer II | ||
309 | |||
310 | 4:7 Bitrate: | ||
311 | Index | Layer I | Layer II | ||
312 | ------+-------------+------------ | ||
313 | '0000' | free format | free format | ||
314 | '0001' | 32 kbit/s | 32 kbit/s | ||
315 | '0010' | 64 kbit/s | 48 kbit/s | ||
316 | '0011' | 96 kbit/s | 56 kbit/s | ||
317 | '0100' | 128 kbit/s | 64 kbit/s | ||
318 | '0101' | 160 kbit/s | 80 kbit/s | ||
319 | '0110' | 192 kbit/s | 96 kbit/s | ||
320 | '0111' | 224 kbit/s | 112 kbit/s | ||
321 | '1000' | 256 kbit/s | 128 kbit/s | ||
322 | '1001' | 288 kbit/s | 160 kbit/s | ||
323 | '1010' | 320 kbit/s | 192 kbit/s | ||
324 | '1011' | 352 kbit/s | 224 kbit/s | ||
325 | '1100' | 384 kbit/s | 256 kbit/s | ||
326 | '1101' | 416 kbit/s | 320 kbit/s | ||
327 | '1110' | 448 kbit/s | 384 kbit/s | ||
328 | Note: For Layer II, not all combinations of total bitrate | ||
329 | and mode are allowed. See ISO11172-3 3-Annex B, Table 3-B.2 | ||
330 | |||
331 | 8:9 '00'=Stereo | ||
332 | '01'=JointStereo | ||
333 | '10'=Dual | ||
334 | '11'=Mono | ||
335 | Note: testing seems to indicate that Mono and possibly | ||
336 | JointStereo are not working (default to stereo). | ||
337 | Dual does work, though. | ||
338 | |||
339 | 10:11 Mode Extension used in joint_stereo mode. | ||
340 | In Layer I and II they indicate which subbands are in | ||
341 | intensity_stereo. All other subbands are coded in stereo. | ||
342 | '00' subbands 4-31 in intensity_stereo, bound==4 | ||
343 | '01' subbands 8-31 in intensity_stereo, bound==8 | ||
344 | '10' subbands 12-31 in intensity_stereo, bound==12 | ||
345 | '11' subbands 16-31 in intensity_stereo, bound==16 | ||
346 | |||
347 | 12:13 Emphasis: | ||
348 | '00' None | ||
349 | '01' 50/15uS | ||
350 | '10' reserved | ||
351 | '11' CCITT J.17 | ||
352 | |||
353 | 14 CRC: | ||
354 | '0' off | ||
355 | '1' on | ||
356 | |||
357 | 15 Copyright: | ||
358 | '0' off | ||
359 | '1' on | ||
360 | |||
361 | 16 Generation: | ||
362 | '0' copy | ||
363 | '1' original | ||
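The bitmask above can be assembled field by field. The struct and function below are an illustrative sketch; all identifiers are assumptions, and only the bit positions come from the table.

```c
#include <stdint.h>

/* Field values as listed above; all identifiers are illustrative. */
struct cx_audio_props {
	unsigned rate;        /* bits 0:1   0=44.1 kHz, 1=48 kHz, 2=32 kHz */
	unsigned layer;       /* bits 2:3   1=Layer I, 2=Layer II */
	unsigned bitrate_idx; /* bits 4:7   index into the bitrate table */
	unsigned mode;        /* bits 8:9   0=stereo, 1=joint, 2=dual, 3=mono */
	unsigned mode_ext;    /* bits 10:11 joint stereo bound */
	unsigned emphasis;    /* bits 12:13 */
	unsigned crc;         /* bit 14 */
	unsigned copyright;   /* bit 15 */
	unsigned original;    /* bit 16 */
};

static uint32_t cx_audio_props_pack(const struct cx_audio_props *a)
{
	return (uint32_t)(a->rate & 3u)
	     | (uint32_t)(a->layer & 3u) << 2
	     | (uint32_t)(a->bitrate_idx & 0xfu) << 4
	     | (uint32_t)(a->mode & 3u) << 8
	     | (uint32_t)(a->mode_ext & 3u) << 10
	     | (uint32_t)(a->emphasis & 3u) << 12
	     | (uint32_t)(a->crc & 1u) << 14
	     | (uint32_t)(a->copyright & 1u) << 15
	     | (uint32_t)(a->original & 1u) << 16;
}
```

With 48 kHz, Layer II, bitrate index '1110' (384 kbit/s) and stereo, this packs to 0xe9.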
364 | |||
365 | ------------------------------------------------------------------------------- | ||
366 | |||
367 | Name CX2341X_ENC_HALT_FW | ||
368 | Enum 195/0xC3 | ||
369 | Description | ||
370 | The firmware is halted and no further API calls are serviced until the | ||
371 | firmware is uploaded again. | ||
372 | |||
373 | ------------------------------------------------------------------------------- | ||
374 | |||
375 | Name CX2341X_ENC_GET_VERSION | ||
376 | Enum 196/0xC4 | ||
377 | Description | ||
378 | Returns the version of the encoder firmware. | ||
379 | Result[0] | ||
380 | Version bitmask: | ||
381 | Bits 0:15 build | ||
382 | Bits 16:23 minor | ||
383 | Bits 24:31 major | ||
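Splitting Result[0] into its three fields might look like this in C. The struct and function names are hypothetical.

```c
#include <stdint.h>

struct cx_fw_version {
	unsigned major;  /* bits 24:31 */
	unsigned minor;  /* bits 16:23 */
	unsigned build;  /* bits 0:15 */
};

/* Hypothetical helper: decode Result[0] of API 0xC4. */
static struct cx_fw_version cx_decode_version(uint32_t result0)
{
	struct cx_fw_version v;

	v.major = (result0 >> 24) & 0xffu;
	v.minor = (result0 >> 16) & 0xffu;
	v.build = result0 & 0xffffu;
	return v;
}
```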
384 | |||
385 | ------------------------------------------------------------------------------- | ||
386 | |||
387 | Name CX2341X_ENC_SET_GOP_CLOSURE | ||
388 | Enum 197/0xC5 | ||
389 | Description | ||
390 | Assigns the GOP open/close property. | ||
391 | Param[0] | ||
392 | 0=Open | ||
393 | 1=Closed | ||
394 | |||
395 | ------------------------------------------------------------------------------- | ||
396 | |||
397 | Name CX2341X_ENC_GET_SEQ_END | ||
398 | Enum 198/0xC6 | ||
399 | Description | ||
400 | Obtains the sequence end code of the encoder's buffer. When a capture | ||
401 | is started a number of interrupts are still generated, the last of | ||
402 | which will have Result[0] set to 1 and Result[1] will contain the size | ||
403 | of the buffer. | ||
404 | Result[0] | ||
405 | State of the transfer (1 if last buffer) | ||
406 | Result[1] | ||
407 | If Result[0] is 1, this contains the size of the last buffer, undefined | ||
408 | otherwise. | ||
409 | |||
410 | ------------------------------------------------------------------------------- | ||
411 | |||
412 | Name CX2341X_ENC_SET_PGM_INDEX_INFO | ||
413 | Enum 199/0xC7 | ||
414 | Description | ||
415 | Sets the Program Index Information. | ||
416 | Param[0] | ||
417 | Picture Mask: | ||
418 | 0=No index capture | ||
419 | 1=I frames | ||
420 | 3=I,P frames | ||
421 | 7=I,P,B frames | ||
422 | Param[1] | ||
423 | Elements requested (up to 400) | ||
424 | Result[0] | ||
425 | Offset in SDF memory of the table. | ||
426 | Result[1] | ||
427 | Number of allocated elements up to a maximum of Param[1] | ||
428 | |||
429 | ------------------------------------------------------------------------------- | ||
430 | |||
431 | Name CX2341X_ENC_SET_VBI_CONFIG | ||
432 | Enum 200/0xC8 | ||
433 | Description | ||
434 | Configure VBI settings | ||
435 | Param[0] | ||
436 | Bitmap: | ||
437 | 0 Mode '0' Sliced, '1' Raw | ||
438 | 1:3 Insertion: | ||
439 | '000' insert in extension & user data | ||
440 | '001' insert in private packets | ||
441 | '010' separate stream and user data | ||
442 | '111' separate stream and private data | ||
443 | 8:15 Stream ID (normally 0xBD) | ||
444 | Param[1] | ||
445 | Frames per interrupt (max 8). Only valid in raw mode. | ||
446 | Param[2] | ||
447 | Total raw VBI frames. Only valid in raw mode. | ||
448 | Param[3] | ||
449 | Start codes | ||
450 | Param[4] | ||
451 | Stop codes | ||
452 | Param[5] | ||
453 | Lines per frame | ||
454 | Param[6] | ||
455 | Bytes per line | ||
456 | Result[0] | ||
457 | Observed frames per interrupt in raw mode only. Range 1 to Param[1] | ||
458 | Result[1] | ||
459 | Observed number of frames in raw mode. Range 1 to Param[2] | ||
460 | Result[2] | ||
461 | Memory offset to start of raw VBI data | ||
462 | |||
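For the CX2341X_ENC_SET_VBI_CONFIG call above, the Param[0] bitmap can be packed as sketched below. The function name is an assumption; the bit positions are those listed for Param[0].

```c
#include <stdint.h>

/*
 * Hypothetical helper: bit 0 mode (0=sliced, 1=raw), bits 1:3
 * insertion method, bits 8:15 stream ID.
 */
static uint32_t cx_enc_vbi_config(int raw, unsigned insertion,
				  uint8_t stream_id)
{
	return (raw ? 1u : 0u)
	     | (uint32_t)(insertion & 7u) << 1
	     | (uint32_t)stream_id << 8;
}
```

Sliced mode with insertion '001' (private packets) and stream ID 0xBD gives 0xBD02.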
463 | ------------------------------------------------------------------------------- | ||
464 | |||
465 | Name CX2341X_ENC_SET_DMA_BLOCK_SIZE | ||
466 | Enum 201/0xC9 | ||
467 | Description | ||
468 | Set DMA transfer block size | ||
469 | Param[0] | ||
470 | DMA transfer block size in bytes or frames. When unit is bytes, | ||
471 | supported block sizes are 2^7, 2^8 and 2^9 bytes. | ||
472 | Param[1] | ||
473 | Unit: 0=bytes, 1=frames | ||
474 | |||
475 | ------------------------------------------------------------------------------- | ||
476 | |||
477 | Name CX2341X_ENC_GET_PREV_DMA_INFO_MB_10 | ||
478 | Enum 202/0xCA | ||
479 | Description | ||
480 | Returns information on the previous DMA transfer in conjunction with | ||
481 | bit 27 of the interrupt mask. Uses mailbox 10. | ||
482 | Result[0] | ||
483 | Type of stream | ||
484 | Result[1] | ||
485 | Address Offset | ||
486 | Result[2] | ||
487 | Maximum size of transfer | ||
488 | |||
489 | ------------------------------------------------------------------------------- | ||
490 | |||
491 | Name CX2341X_ENC_GET_PREV_DMA_INFO_MB_9 | ||
492 | Enum 203/0xCB | ||
493 | Description | ||
494 | Returns information on the previous DMA transfer in conjunction with | ||
495 | bit 27 of the interrupt mask. Uses mailbox 9. | ||
496 | Result[0] | ||
497 | Status bits: | ||
498 | Bit 0 set indicates transfer complete | ||
499 | Bit 2 set indicates transfer error | ||
500 | Bit 4 set indicates linked list error | ||
501 | Result[1] | ||
502 | DMA type | ||
503 | Result[2] | ||
504 | Presentation Time Stamp bits 0..31 | ||
505 | Result[3] | ||
506 | Presentation Time Stamp bit 32 | ||
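The two result words together form a 33-bit timestamp. A minimal sketch of combining them, assuming bit 32 is delivered in the lowest bit of Result[3] (the description above does not say which bit of Result[3] carries it):

```c
#include <stdint.h>

/* Hypothetical helper: build the 33-bit PTS from Result[2] and Result[3]. */
static uint64_t cx_enc_pts(uint32_t result2, uint32_t result3)
{
	/* Assumption: PTS bit 32 is bit 0 of Result[3]. */
	return ((uint64_t)(result3 & 1u) << 32) | result2;
}
```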
507 | |||
508 | ------------------------------------------------------------------------------- | ||
509 | |||
510 | Name CX2341X_ENC_SCHED_DMA_TO_HOST | ||
511 | Enum 204/0xCC | ||
512 | Description | ||
513 | Set up DMA to host operation | ||
514 | Param[0] | ||
515 | Memory address of link list | ||
516 | Param[1] | ||
517 | Length of link list (units unknown) | ||
518 | Param[2] | ||
519 | DMA type (0=MPEG) | ||
520 | |||
521 | ------------------------------------------------------------------------------- | ||
522 | |||
523 | Name CX2341X_ENC_INITIALIZE_INPUT | ||
524 | Enum 205/0xCD | ||
525 | Description | ||
526 | Initializes the video input | ||
527 | |||
528 | ------------------------------------------------------------------------------- | ||
529 | |||
530 | Name CX2341X_ENC_SET_FRAME_DROP_RATE | ||
531 | Enum 208/0xD0 | ||
532 | Description | ||
533 | For each frame captured, skip specified number of frames. | ||
534 | Param[0] | ||
535 | Number of frames to skip | ||
536 | |||
537 | ------------------------------------------------------------------------------- | ||
538 | |||
539 | Name CX2341X_ENC_PAUSE_ENCODER | ||
540 | Enum 210/0xD2 | ||
541 | Description | ||
542 | During a pause condition, all frames are dropped instead of being encoded. | ||
543 | Param[0] | ||
544 | 0=Pause encoding | ||
545 | 1=Continue encoding | ||
546 | |||
547 | ------------------------------------------------------------------------------- | ||
548 | |||
549 | Name CX2341X_ENC_REFRESH_INPUT | ||
550 | Enum 211/0xD3 | ||
551 | Description | ||
552 | Refreshes the video input | ||
553 | |||
554 | ------------------------------------------------------------------------------- | ||
555 | |||
556 | Name CX2341X_ENC_SET_COPYRIGHT | ||
557 | Enum 212/0xD4 | ||
558 | Description | ||
559 | Sets stream copyright property | ||
560 | Param[0] | ||
561 | 0=Stream is not copyrighted | ||
562 | 1=Stream is copyrighted | ||
563 | |||
564 | ------------------------------------------------------------------------------- | ||
565 | |||
566 | Name CX2341X_ENC_SET_EVENT_NOTIFICATION | ||
567 | Enum 213/0xD5 | ||
568 | Description | ||
569 | Set up firmware to notify the host about a particular event. Host must | ||
570 | unmask the interrupt bit. | ||
571 | Param[0] | ||
572 | Event (0=refresh encoder input) | ||
573 | Param[1] | ||
574 | Notification 0=disabled 1=enabled | ||
575 | Param[2] | ||
576 | Interrupt bit | ||
577 | Param[3] | ||
578 | Mailbox slot, -1 if no mailbox required. | ||
579 | |||
580 | ------------------------------------------------------------------------------- | ||
581 | |||
582 | Name CX2341X_ENC_SET_NUM_VSYNC_LINES | ||
583 | Enum 214/0xD6 | ||
584 | Description | ||
585 | Depending on the analog video decoder used, this assigns the number | ||
586 | of lines for field 1 and 2. | ||
587 | Param[0] | ||
588 | Field 1 number of lines: | ||
589 | 0x00EF for SAA7114 | ||
590 | 0x00F0 for SAA7115 | ||
591 | 0x0105 for Micronas | ||
592 | Param[1] | ||
593 | Field 2 number of lines: | ||
594 | 0x00EF for SAA7114 | ||
595 | 0x00F0 for SAA7115 | ||
596 | 0x0106 for Micronas | ||
597 | |||
598 | ------------------------------------------------------------------------------- | ||
599 | |||
600 | Name CX2341X_ENC_SET_PLACEHOLDER | ||
601 | Enum 215/0xD7 | ||
602 | Description | ||
603 | Provides a mechanism of inserting custom user data in the MPEG stream. | ||
604 | Param[0] | ||
605 | 0=extension & user data | ||
606 | 1=private packet with stream ID 0xBD | ||
607 | Param[1] | ||
608 | Rate at which to insert data, in units of frames (for private packet) | ||
609 | or GOPs (for ext. & user data) | ||
610 | Param[2] | ||
611 | Number of data DWORDs (below) to insert | ||
612 | Param[3] | ||
613 | Custom data 0 | ||
614 | Param[4] | ||
615 | Custom data 1 | ||
616 | Param[5] | ||
617 | Custom data 2 | ||
618 | Param[6] | ||
619 | Custom data 3 | ||
620 | Param[7] | ||
621 | Custom data 4 | ||
622 | Param[8] | ||
623 | Custom data 5 | ||
624 | Param[9] | ||
625 | Custom data 6 | ||
626 | Param[10] | ||
627 | Custom data 7 | ||
628 | Param[11] | ||
629 | Custom data 8 | ||
630 | |||
631 | ------------------------------------------------------------------------------- | ||
632 | |||
633 | Name CX2341X_ENC_MUTE_VIDEO | ||
634 | Enum 217/0xD9 | ||
635 | Description | ||
636 | Video muting | ||
637 | Param[0] | ||
638 | Bit usage: | ||
639 | 0 '0'=video not muted | ||
640 | '1'=video muted, creates frames with the YUV color defined below | ||
641 | 1:7 Unused | ||
642 | 8:15 V chrominance information | ||
643 | 16:23 U chrominance information | ||
644 | 24:31 Y luminance information | ||
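A sketch of packing Param[0] from the bit usage above. The function name is hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: bit 0 mute flag, then V, U and Y bytes. */
static uint32_t cx_enc_mute_video(int mute, uint8_t y, uint8_t u, uint8_t v)
{
	return (mute ? 1u : 0u)
	     | (uint32_t)v << 8    /* bits 8:15  V chrominance */
	     | (uint32_t)u << 16   /* bits 16:23 U chrominance */
	     | (uint32_t)y << 24;  /* bits 24:31 Y luminance */
}
```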
645 | |||
646 | ------------------------------------------------------------------------------- | ||
647 | |||
648 | Name CX2341X_ENC_MUTE_AUDIO | ||
649 | Enum 218/0xDA | ||
650 | Description | ||
651 | Audio muting | ||
652 | Param[0] | ||
653 | 0=audio not muted | ||
654 | 1=audio muted (produces silent mpeg audio stream) | ||
655 | |||
656 | ------------------------------------------------------------------------------- | ||
657 | |||
658 | Name CX2341X_ENC_UNKNOWN | ||
659 | Enum 219/0xDB | ||
660 | Description | ||
661 | Unknown API, it's used by Hauppauge though. | ||
662 | Param[0] | ||
663 | 0 This is the value Hauppauge uses, Unknown what it means. | ||
664 | |||
665 | ------------------------------------------------------------------------------- | ||
666 | |||
667 | Name CX2341X_ENC_MISC | ||
668 | Enum 220/0xDC | ||
669 | Description | ||
670 | Miscellaneous actions. It is not fully known what it does. It's really a | ||
671 | sort of ioctl call. The first parameter is a command number, the second | ||
672 | the value. | ||
673 | Param[0] | ||
674 | Command number: | ||
675 | 1=set initial SCR value when starting encoding. | ||
676 | 2=set quality mode (apparently some test setting). | ||
677 | 3=setup advanced VIM protection handling (supposedly only for the cx23416 | ||
678 | for raw YUV). | ||
679 | Actually it looks like this should be 0 for saa7114/5 based card and 1 | ||
680 | for cx25840 based cards. | ||
681 | 4=generate artificial PTS timestamps | ||
682 | 5=USB flush mode | ||
683 | 6=something to do with the quantization matrix | ||
684 | 7=set navigation pack insertion for DVD | ||
685 | 8=enable scene change detection (seems to be a failure) | ||
686 | 9=set history parameters of the video input module | ||
687 | 10=set input field order of VIM | ||
688 | 11=set quantization matrix | ||
689 | 12=reset audio interface | ||
690 | 13=set audio volume delay | ||
691 | 14=set audio delay | ||
692 | |||
693 | Param[1] | ||
694 | Command value. | ||
diff --git a/Documentation/video4linux/cx2341x/fw-memory.txt b/Documentation/video4linux/cx2341x/fw-memory.txt new file mode 100644 index 000000000000..ef0aad3f88fc --- /dev/null +++ b/Documentation/video4linux/cx2341x/fw-memory.txt | |||
@@ -0,0 +1,141 @@ | |||
1 | This document describes the cx2341x memory map and documents some of the register | ||
2 | space. | ||
3 | |||
4 | Warning! This information was figured out by searching through the memory and | ||
5 | registers. It may not be correct and is certainly not complete, and it | ||
6 | was not derived from anything more than searching through the memory space with | ||
7 | commands like: | ||
8 | |||
9 | ivtvctl -O min=0x02000000,max=0x020000ff | ||
10 | |||
11 | So take this as is, I'm always searching for more stuff, it's a large | ||
12 | register space :-). | ||
13 | |||
14 | Memory Map | ||
15 | ========== | ||
16 | |||
17 | The cx2341x exposes its entire 64M memory space to the PCI host via the PCI BAR0 | ||
18 | (Base Address Register 0). The addresses here are offsets relative to the | ||
19 | address held in BAR0. | ||
20 | |||
21 | 0x00000000-0x00ffffff Encoder memory space | ||
22 | 0x00000000-0x0003ffff Encode.rom | ||
23 | ???-??? MPEG buffer(s) | ||
24 | ???-??? Raw video capture buffer(s) | ||
25 | ???-??? Raw audio capture buffer(s) | ||
26 | ???-??? Display buffers (6 or 9) | ||
27 | |||
28 | 0x01000000-0x01ffffff Decoder memory space | ||
29 | 0x01000000-0x0103ffff Decode.rom | ||
30 | ???-??? MPEG buffer(s) | ||
31 | 0x0114b000-0x0115afff Audio.rom (deprecated?) | ||
32 | |||
33 | 0x02000000-0x0200ffff Register Space | ||
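The top-level ranges of the map can be expressed as a simple lookup on the BAR0 offset. This is a sketch with made-up identifiers; it only covers the three documented ranges and treats everything else as unknown.

```c
#include <stdint.h>

enum cx_region {
	CX_REGION_ENCODER,    /* 0x00000000-0x00ffffff */
	CX_REGION_DECODER,    /* 0x01000000-0x01ffffff */
	CX_REGION_REGISTERS,  /* 0x02000000-0x0200ffff */
	CX_REGION_UNKNOWN
};

/* Hypothetical helper: classify an offset relative to BAR0. */
static enum cx_region cx_classify_offset(uint32_t off)
{
	if (off <= 0x00ffffffu)
		return CX_REGION_ENCODER;
	if (off <= 0x01ffffffu)
		return CX_REGION_DECODER;
	if (off >= 0x02000000u && off <= 0x0200ffffu)
		return CX_REGION_REGISTERS;
	return CX_REGION_UNKNOWN;
}
```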
34 | |||
35 | Registers | ||
36 | ========= | ||
37 | |||
38 | The registers occupy the 64k space starting at the 0x02000000 offset from BAR0. | ||
39 | All of these registers are 32 bits wide. | ||
40 | |||
41 | DMA Registers 0x000-0xff: | ||
42 | |||
43 | 0x00 - Control: | ||
44 | 0=reset/cancel, 1=read, 2=write, 4=stop | ||
45 | 0x04 - DMA status: | ||
46 | 1=read busy, 2=write busy, 4=read error, 8=write error, 16=link list error | ||
47 | 0x08 - pci DMA pointer for read link list | ||
48 | 0x0c - pci DMA pointer for write link list | ||
49 | 0x10 - read/write DMA enable: | ||
50 | 1=read enable, 2=write enable | ||
51 | 0x14 - always 0xffffffff, if set any lower instability occurs, 0x00 crashes | ||
52 | 0x18 - ?? | ||
53 | 0x1c - always 0x20 or 32, smaller values slow down DMA transactions | ||
54 | 0x20 - always value of 0x780a010a | ||
55 | 0x24-0x3c - usually just random values??? | ||
56 | 0x40 - Interrupt status | ||
57 | 0x44 - Write a bit here and it shows up in Interrupt status 0x40 | ||
58 | 0x48 - Interrupt Mask | ||
59 | 0x4C - always value of 0xfffdffff, | ||
60 | if changed to 0xffffffff DMA write interrupts break. | ||
61 | 0x50 - always 0xffffffff | ||
62 | 0x54 - always 0xffffffff (0x4c, 0x50, 0x54 seem like interrupt masks; there | ||
63 | are 3 processors on chip, Java ones: VPU, SPU, APU. Maybe these are | ||
64 | their interrupt masks???). | ||
65 | 0x60-0x7C - random values | ||
66 | 0x80 - first write linked list reg, for Encoder Memory addr | ||
67 | 0x84 - first write linked list reg, for pci memory addr | ||
68 | 0x88 - first write linked list reg, for length of buffer in memory addr | ||
69 | (OR this with 0x80000000 for the last link) | ||
70 | 0x8c-0xcc - rest of write linked list reg, 8 sets of 3 total, DMA goes here | ||
71 | from linked list addr in reg 0x0c, firmware must push through or | ||
72 | something. | ||
73 | 0xe0 - first (and only) read linked list reg, for pci memory addr | ||
74 | 0xe4 - first (and only) read linked list reg, for Decoder memory addr | ||
75 | 0xe8 - first (and only) read linked list reg, for length of buffer | ||
76 | 0xec-0xff - Nothing seems to be in these registers, 0xec-f4 are 0x00000000. | ||
77 | |||
78 | Memory locations for Encoder Buffers 0x700-0x7ff: | ||
79 | |||
80 | These registers show offsets of memory locations pertaining to each | ||
81 | buffer area used for encoding. The values have to be shifted by <<1 first. | ||
82 | |||
83 | 0x07F8: Encoder SDRAM refresh | ||
84 | 0x07FC: Encoder SDRAM pre-charge | ||
85 | |||
86 | Memory locations for Decoder Buffers 0x800-0x8ff: | ||
87 | |||
88 | These registers show offsets of memory locations pertaining to each | ||
89 | buffer area used for decoding, have to shift them by <<1 first. | ||
90 | |||
91 | 0x08F8: Decoder SDRAM refresh | ||
92 | 0x08FC: Decoder SDRAM pre-charge | ||
93 | |||
94 | Other memory locations: | ||
95 | |||
96 | 0x2800: Video Display Module control | ||
97 | 0x2D00: AO (audio output?) control | ||
98 | 0x2D24: Bytes Flushed | ||
99 | 0x7000: LSB I2C write clock bit (inverted) | ||
100 | 0x7004: LSB I2C write data bit (inverted) | ||
101 | 0x7008: LSB I2C read clock bit | ||
102 | 0x700c: LSB I2C read data bit | ||
103 | 0x9008: GPIO get input state | ||
104 | 0x900c: GPIO set output state | ||
105 | 0x9020: GPIO direction (Bit7 (GPIO 0..7) - 0:input, 1:output) | ||
106 | 0x9050: SPU control | ||
107 | 0x9054: Reset HW blocks | ||
108 | 0x9058: VPU control | ||
109 | 0xA018: Bit6: interrupt pending? | ||
110 | 0xA064: APU command | ||
111 | |||
112 | |||
113 | Interrupt Status Register | ||
114 | ========================= | ||
115 | |||
116 | This section defines the bits in the interrupt status register 0x0040 and | ||
117 | the interrupt mask 0x0048. If a bit is cleared in the mask, then we want our | ||
118 | ISR to execute for that interrupt. | ||
119 | |||
120 | Bit | ||
121 | 31 Encoder Start Capture | ||
122 | 30 Encoder EOS | ||
123 | 29 Encoder VBI capture | ||
124 | 28 Encoder Video Input Module reset event | ||
125 | 27 Encoder DMA complete | ||
126 | 26 | ||
127 | 25 Decoder copy protect detection event | ||
128 | 24 Decoder audio mode change detection event | ||
129 | 23 | ||
130 | 22 Decoder data request | ||
131 | 21 Decoder I-Frame? done | ||
132 | 20 Decoder DMA complete | ||
133 | 19 Decoder VBI re-insertion | ||
134 | 18 Decoder DMA err (linked-list bad) | ||
135 | |||
136 | Missing | ||
137 | Encoder API call completed | ||
138 | Decoder API call completed | ||
139 | Encoder API post(?) | ||
140 | Decoder API post(?) | ||
141 | Decoder VTRACE event | ||
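
The bit assignments above, together with the rule that a cleared mask bit means the interrupt is wanted, could be written down in C roughly like this. The macro names and the helper irq_wanted() are hypothetical and chosen for illustration only; bits 26 and 23 are left out because the table leaves them blank.

```c
#include <stdint.h>

/* Bits of the interrupt status (0x0040) and mask (0x0048) registers. */
#define IRQ_ENC_START_CAP     (1u << 31)
#define IRQ_ENC_EOS           (1u << 30)
#define IRQ_ENC_VBI_CAP       (1u << 29)
#define IRQ_ENC_VIM_RST       (1u << 28)
#define IRQ_ENC_DMA_COMPLETE  (1u << 27)
#define IRQ_DEC_COPY_PROTECT  (1u << 25)
#define IRQ_DEC_AUD_MODE_CHG  (1u << 24)
#define IRQ_DEC_DATA_REQ      (1u << 22)
#define IRQ_DEC_IFRAME_DONE   (1u << 21)
#define IRQ_DEC_DMA_COMPLETE  (1u << 20)
#define IRQ_DEC_VBI_RE_INSERT (1u << 19)
#define IRQ_DEC_DMA_ERR       (1u << 18)

/*
 * Return the status bits the ISR should act on: a bit is of interest
 * only if it is set in the status and cleared in the mask.
 */
static uint32_t irq_wanted(uint32_t status, uint32_t mask)
{
    return status & ~mask;
}
```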
diff --git a/Documentation/video4linux/cx2341x/fw-osd-api.txt b/Documentation/video4linux/cx2341x/fw-osd-api.txt new file mode 100644 index 000000000000..da98ae30a37a --- /dev/null +++ b/Documentation/video4linux/cx2341x/fw-osd-api.txt | |||
@@ -0,0 +1,342 @@ | |||
1 | OSD firmware API description | ||
2 | ============================ | ||
3 | |||
4 | Note: this API is part of the decoder firmware, so it's cx23415 only. | ||
5 | |||
6 | ------------------------------------------------------------------------------- | ||
7 | |||
8 | Name CX2341X_OSD_GET_FRAMEBUFFER | ||
9 | Enum 65/0x41 | ||
10 | Description | ||
11 | Return base and length of contiguous OSD memory. | ||
12 | Result[0] | ||
13 | OSD base address | ||
14 | Result[1] | ||
15 | OSD length | ||
16 | |||
17 | ------------------------------------------------------------------------------- | ||
18 | |||
19 | Name CX2341X_OSD_GET_PIXEL_FORMAT | ||
20 | Enum 66/0x42 | ||
21 | Description | ||
22 | Query OSD format | ||
23 | Result[0] | ||
24 | 0=8bit index, 4=AlphaRGB 8:8:8:8 | ||
25 | |||
26 | ------------------------------------------------------------------------------- | ||
27 | |||
28 | Name CX2341X_OSD_SET_PIXEL_FORMAT | ||
29 | Enum 67/0x43 | ||
30 | Description | ||
31 | Assign pixel format | ||
32 | Param[0] | ||
33 | 0=8bit index, 4=AlphaRGB 8:8:8:8 | ||
34 | |||
35 | ------------------------------------------------------------------------------- | ||
36 | |||
37 | Name CX2341X_OSD_GET_STATE | ||
38 | Enum 68/0x44 | ||
39 | Description | ||
40 | Query OSD state | ||
41 | Result[0] | ||
42 | Bit 0 0=off, 1=on | ||
43 | Bits 1:2 alpha control | ||
44 | Bits 3:5 pixel format | ||
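
As a small illustration of the Result[0] layout above, the three fields can be unpacked with shifts and masks. The struct and function names below are hypothetical; only the bit positions come from the description.

```c
#include <stdint.h>

/* Decoded view of Result[0] from CX2341X_OSD_GET_STATE. */
struct osd_state {
    unsigned on;            /* bit 0: 0=off, 1=on */
    unsigned alpha_control; /* bits 1:2 */
    unsigned pixel_format;  /* bits 3:5 */
};

static struct osd_state osd_decode_state(uint32_t result0)
{
    struct osd_state s;

    s.on = result0 & 0x1u;
    s.alpha_control = (result0 >> 1) & 0x3u;
    s.pixel_format = (result0 >> 3) & 0x7u;
    return s;
}
```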
45 | |||
46 | ------------------------------------------------------------------------------- | ||
47 | |||
48 | Name CX2341X_OSD_SET_STATE | ||
49 | Enum 69/0x45 | ||
50 | Description | ||
51 | OSD switch | ||
52 | Param[0] | ||
53 | 0=off, 1=on | ||
54 | |||
55 | ------------------------------------------------------------------------------- | ||
56 | |||
57 | Name CX2341X_OSD_GET_OSD_COORDS | ||
58 | Enum 70/0x46 | ||
59 | Description | ||
60 | Retrieve coordinates of OSD area blended with video | ||
61 | Result[0] | ||
62 | OSD buffer address | ||
63 | Result[1] | ||
64 | Stride in pixels | ||
65 | Result[2] | ||
66 | Lines in OSD buffer | ||
67 | Result[3] | ||
68 | Horizontal offset in buffer | ||
69 | Result[4] | ||
70 | Vertical offset in buffer | ||
71 | |||
72 | ------------------------------------------------------------------------------- | ||
73 | |||
74 | Name CX2341X_OSD_SET_OSD_COORDS | ||
75 | Enum 71/0x47 | ||
76 | Description | ||
77 | Assign the coordinates of the OSD area to blend with video | ||
78 | Param[0] | ||
79 | buffer address | ||
80 | Param[1] | ||
81 | buffer stride in pixels | ||
82 | Param[2] | ||
83 | lines in buffer | ||
84 | Param[3] | ||
85 | horizontal offset | ||
86 | Param[4] | ||
87 | vertical offset | ||
88 | |||
89 | ------------------------------------------------------------------------------- | ||
90 | |||
91 | Name CX2341X_OSD_GET_SCREEN_COORDS | ||
92 | Enum 72/0x48 | ||
93 | Description | ||
94 | Retrieve OSD screen area coordinates | ||
95 | Result[0] | ||
96 | top left horizontal offset | ||
97 | Result[1] | ||
98 | top left vertical offset | ||
99 | Result[2] | ||
100 | bottom right horizontal offset | ||
101 | Result[3] | ||
102 | bottom right vertical offset | ||
103 | |||
104 | ------------------------------------------------------------------------------- | ||
105 | |||
106 | Name CX2341X_OSD_SET_SCREEN_COORDS | ||
107 | Enum 73/0x49 | ||
108 | Description | ||
109 | Assign the coordinates of the screen area to blend with video | ||
110 | Param[0] | ||
111 | top left horizontal offset | ||
112 | Param[1] | ||
113 | top left vertical offset | ||
114 | Param[2] | ||
115 | bottom right horizontal offset | ||
116 | Param[3] | ||
117 | bottom right vertical offset | ||
118 | |||
119 | ------------------------------------------------------------------------------- | ||
120 | |||
121 | Name CX2341X_OSD_GET_GLOBAL_ALPHA | ||
122 | Enum 74/0x4A | ||
123 | Description | ||
124 | Retrieve OSD global alpha | ||
125 | Result[0] | ||
126 | global alpha: 0=off, 1=on | ||
127 | Result[1] | ||
128 | bits 0:7 global alpha | ||
129 | |||
130 | ------------------------------------------------------------------------------- | ||
131 | |||
132 | Name CX2341X_OSD_SET_GLOBAL_ALPHA | ||
133 | Enum 75/0x4B | ||
134 | Description | ||
135 | Update global alpha | ||
136 | Param[0] | ||
137 | global alpha: 0=off, 1=on | ||
138 | Param[1] | ||
139 | global alpha (8 bits) | ||
140 | Param[2] | ||
141 | local alpha: 0=on, 1=off | ||
142 | |||
143 | ------------------------------------------------------------------------------- | ||
144 | |||
145 | Name CX2341X_OSD_SET_BLEND_COORDS | ||
146 | Enum 76/0x4C | ||
147 | Description | ||
148 | Move start of blending area within display buffer | ||
149 | Param[0] | ||
150 | horizontal offset in buffer | ||
151 | Param[1] | ||
152 | vertical offset in buffer | ||
153 | |||
154 | ------------------------------------------------------------------------------- | ||
155 | |||
156 | Name CX2341X_OSD_GET_FLICKER_STATE | ||
157 | Enum 79/0x4F | ||
158 | Description | ||
159 | Retrieve flicker reduction module state | ||
160 | Result[0] | ||
161 | flicker state: 0=off, 1=on | ||
162 | |||
163 | ------------------------------------------------------------------------------- | ||
164 | |||
165 | Name CX2341X_OSD_SET_FLICKER_STATE | ||
166 | Enum 80/0x50 | ||
167 | Description | ||
168 | Set flicker reduction module state | ||
169 | Param[0] | ||
170 | State: 0=off, 1=on | ||
171 | |||
172 | ------------------------------------------------------------------------------- | ||
173 | |||
174 | Name CX2341X_OSD_BLT_COPY | ||
175 | Enum 82/0x52 | ||
176 | Description | ||
177 | BLT copy | ||
178 | Param[0] | ||
179 | '0000' zero | ||
180 | '0001' ~destination AND ~source | ||
181 | '0010' ~destination AND source | ||
182 | '0011' ~destination | ||
183 | '0100' destination AND ~source | ||
184 | '0101' ~source | ||
185 | '0110' destination XOR source | ||
186 | '0111' ~destination OR ~source | ||
187 | '1000' destination AND source | ||
188 | '1001' destination XNOR source | ||
189 | '1010' source | ||
190 | '1011' ~destination OR source | ||
191 | '1100' destination | ||
192 | '1101' destination OR ~source | ||
193 | '1110' destination OR source | ||
194 | '1111' one | ||
195 | |||
196 | Param[1] | ||
197 | Resulting alpha blending | ||
198 | '01' source_alpha | ||
199 | '10' destination_alpha | ||
200 | '11' source_alpha*destination_alpha+1 | ||
201 | (zero if both source and destination alpha are zero) | ||
202 | Param[2] | ||
203 | '00' output_pixel = source_pixel | ||
204 | |||
205 | '01' if source_alpha=0: | ||
206 | output_pixel = destination_pixel | ||
207 | if 256 > source_alpha > 1: | ||
208 | output_pixel = ((source_alpha + 1)*source_pixel + | ||
209 | (255 - source_alpha)*destination_pixel)/256 | ||
210 | |||
211 | '10' if destination_alpha=0: | ||
212 | output_pixel = source_pixel | ||
213 | if 255 > destination_alpha > 0: | ||
214 | output_pixel = ((255 - destination_alpha)*source_pixel + | ||
215 | (destination_alpha + 1)*destination_pixel)/256 | ||
216 | |||
217 | '11' if source_alpha=0: | ||
218 | source_temp = 0 | ||
219 | if source_alpha=255: | ||
220 | source_temp = source_pixel*256 | ||
221 | if 255 > source_alpha > 0: | ||
222 | source_temp = source_pixel*(source_alpha + 1) | ||
223 | if destination_alpha=0: | ||
224 | destination_temp = 0 | ||
225 | if destination_alpha=255: | ||
226 | destination_temp = destination_pixel*256 | ||
227 | if 255 > destination_alpha > 0: | ||
228 | destination_temp = destination_pixel*(destination_alpha + 1) | ||
229 | output_pixel = (source_temp + destination_temp)/256 | ||
230 | Param[3] | ||
231 | width | ||
232 | Param[4] | ||
233 | height | ||
234 | Param[5] | ||
235 | destination pixel mask | ||
236 | Param[6] | ||
237 | destination rectangle start address | ||
238 | Param[7] | ||
239 | destination stride in dwords | ||
240 | Param[8] | ||
241 | source stride in dwords | ||
242 | Param[9] | ||
243 | source rectangle start address | ||
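
One way to read the Param[0] codes of this call is as a four-entry truth table indexed by a destination bit and a source bit: bit 0 of the code covers "destination=0, source=0", bit 1 "destination=0, source=1", bit 2 "destination=1, source=0" and bit 3 "destination=1, source=1". That reading is an interpretation, not something the firmware description states. The sketch below applies it bitwise, and also shows the Param[2] '01' blend formula for a single 8-bit channel (per-channel blending is likewise an assumption, and source_alpha=1, which the description does not cover, simply uses the formula). Function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Apply a 4-bit raster operation code bitwise to dst and src. */
static uint32_t blt_rop(unsigned code, uint32_t dst, uint32_t src)
{
    uint32_t out = 0;

    assert(code < 16);
    if (code & 1u)
        out |= ~dst & ~src; /* dst bit 0, src bit 0 */
    if (code & 2u)
        out |= ~dst & src;  /* dst bit 0, src bit 1 */
    if (code & 4u)
        out |= dst & ~src;  /* dst bit 1, src bit 0 */
    if (code & 8u)
        out |= dst & src;   /* dst bit 1, src bit 1 */
    return out;
}

/* Param[2] mode '01': blend one 8-bit channel using the source alpha. */
static uint8_t blt_blend_src_alpha(uint8_t src, uint8_t dst, uint8_t src_alpha)
{
    if (src_alpha == 0)
        return dst;
    return (uint8_t)(((src_alpha + 1u) * src + (255u - src_alpha) * dst) / 256u);
}
```

With this reading, code '1010' returns the source unchanged and '0110' is a plain XOR, which matches the table.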
244 | |||
245 | ------------------------------------------------------------------------------- | ||
246 | |||
247 | Name CX2341X_OSD_BLT_FILL | ||
248 | Enum 83/0x53 | ||
249 | Description | ||
250 | BLT fill color | ||
251 | Param[0] | ||
252 | Same as Param[0] on API 0x52 | ||
253 | Param[1] | ||
254 | Same as Param[1] on API 0x52 | ||
255 | Param[2] | ||
256 | Same as Param[2] on API 0x52 | ||
257 | Param[3] | ||
258 | width | ||
259 | Param[4] | ||
260 | height | ||
261 | Param[5] | ||
262 | destination pixel mask | ||
263 | Param[6] | ||
264 | destination rectangle start address | ||
265 | Param[7] | ||
266 | destination stride in dwords | ||
267 | Param[8] | ||
268 | color fill value | ||
269 | |||
270 | ------------------------------------------------------------------------------- | ||
271 | |||
272 | Name CX2341X_OSD_BLT_TEXT | ||
273 | Enum 84/0x54 | ||
274 | Description | ||
275 | BLT for 8 bit alpha text source | ||
276 | Param[0] | ||
277 | Same as Param[0] on API 0x52 | ||
278 | Param[1] | ||
279 | Same as Param[1] on API 0x52 | ||
280 | Param[2] | ||
281 | Same as Param[2] on API 0x52 | ||
282 | Param[3] | ||
283 | width | ||
284 | Param[4] | ||
285 | height | ||
286 | Param[5] | ||
287 | destination pixel mask | ||
288 | Param[6] | ||
289 | destination rectangle start address | ||
290 | Param[7] | ||
291 | destination stride in dwords | ||
292 | Param[8] | ||
293 | source stride in dwords | ||
294 | Param[9] | ||
295 | source rectangle start address | ||
296 | Param[10] | ||
297 | color fill value | ||
298 | |||
299 | ------------------------------------------------------------------------------- | ||
300 | |||
301 | Name CX2341X_OSD_SET_FRAMEBUFFER_WINDOW | ||
302 | Enum 86/0x56 | ||
303 | Description | ||
304 | Positions the main output window on the screen. The coordinates must be | ||
305 | such that the entire window fits on the screen. | ||
306 | Param[0] | ||
307 | window width | ||
308 | Param[1] | ||
309 | window height | ||
310 | Param[2] | ||
311 | top left window corner horizontal offset | ||
312 | Param[3] | ||
313 | top left window corner vertical offset | ||
314 | |||
315 | ------------------------------------------------------------------------------- | ||
316 | |||
317 | Name CX2341X_OSD_SET_CHROMA_KEY | ||
318 | Enum 96/0x60 | ||
319 | Description | ||
320 | Chroma key switch and color | ||
321 | Param[0] | ||
322 | state: 0=off, 1=on | ||
323 | Param[1] | ||
324 | color | ||
325 | |||
326 | ------------------------------------------------------------------------------- | ||
327 | |||
328 | Name CX2341X_OSD_GET_ALPHA_CONTENT_INDEX | ||
329 | Enum 97/0x61 | ||
330 | Description | ||
331 | Retrieve alpha content index | ||
332 | Result[0] | ||
333 | alpha content index, Range 0:15 | ||
334 | |||
335 | ------------------------------------------------------------------------------- | ||
336 | |||
337 | Name CX2341X_OSD_SET_ALPHA_CONTENT_INDEX | ||
338 | Enum 98/0x62 | ||
339 | Description | ||
340 | Assign alpha content index | ||
341 | Param[0] | ||
342 | alpha content index, range 0:15 | ||
diff --git a/Documentation/video4linux/cx2341x/fw-upload.txt b/Documentation/video4linux/cx2341x/fw-upload.txt new file mode 100644 index 000000000000..60c502ce3215 --- /dev/null +++ b/Documentation/video4linux/cx2341x/fw-upload.txt | |||
@@ -0,0 +1,49 @@ | |||
1 | This document describes how to upload the cx2341x firmware to the card. | ||
2 | |||
3 | How to find | ||
4 | =========== | ||
5 | |||
6 | See the web pages of the various projects that use this chip for information | ||
7 | on how to obtain the firmware. | ||
8 | |||
9 | The firmware stored in a Windows driver can be detected as follows: | ||
10 | |||
11 | - Each firmware image is 256k bytes. | ||
12 | - The 1st 32-bit word of the Encoder image is 0x0000da7 | ||
13 | - The 1st 32-bit word of the Decoder image is 0x00003a7 | ||
14 | - The 2nd 32-bit word of both images is 0xaa55bb66 | ||
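
The signature checks listed above could be applied to a candidate offset inside a driver file roughly as follows. This is a sketch under assumptions: "256k bytes" is taken to mean 256 * 1024, the words are assumed to be stored little-endian, and the names fw_identify() and load_le32() are hypothetical. The first-word constants are copied as written in the list above.

```c
#include <stddef.h>
#include <stdint.h>

#define FW_IMAGE_SIZE (256u * 1024u) /* assumed meaning of "256k bytes" */
#define FW_ENC_WORD0  0x0000da7u     /* 1st word of the Encoder image */
#define FW_DEC_WORD0  0x00003a7u     /* 1st word of the Decoder image */
#define FW_WORD1      0xaa55bb66u    /* 2nd word of both images */

enum fw_kind { FW_NONE, FW_ENCODER, FW_DECODER };

/* Read a 32-bit little-endian word (byte order is an assumption). */
static uint32_t load_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Classify the data at buf, given that `remaining` bytes are available. */
static enum fw_kind fw_identify(const uint8_t *buf, size_t remaining)
{
    if (remaining < FW_IMAGE_SIZE)
        return FW_NONE; /* a full image cannot fit here */
    if (load_le32(buf + 4) != FW_WORD1)
        return FW_NONE;

    switch (load_le32(buf)) {
    case FW_ENC_WORD0:
        return FW_ENCODER;
    case FW_DEC_WORD0:
        return FW_DECODER;
    default:
        return FW_NONE;
    }
}
```

A scanner would call fw_identify() at each 4-byte-aligned offset of the driver file and copy out FW_IMAGE_SIZE bytes on a match.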
15 | |||
16 | How to load | ||
17 | =========== | ||
18 | |||
19 | - Issue the FWapi command to stop the encoder if it is running. Wait for the | ||
20 | command to complete. | ||
21 | - Issue the FWapi command to stop the decoder if it is running. Wait for the | ||
22 | command to complete. | ||
23 | - Issue the I2C command to the digitizer to stop emitting VSYNC events. | ||
24 | - Issue the FWapi command to halt the encoder's firmware. | ||
25 | - Sleep for 10ms. | ||
26 | - Issue the FWapi command to halt the decoder's firmware. | ||
27 | - Sleep for 10ms. | ||
28 | - Write 0x00000000 to register 0x2800 to stop the Video Display Module. | ||
29 | - Write 0x00000005 to register 0x2D00 to stop the AO (audio output?). | ||
30 | - Write 0x00000000 to register 0xA064 to ping? the APU. | ||
31 | - Write 0xFFFFFFFE to register 0x9058 to stop the VPU. | ||
32 | - Write 0xFFFFFFFF to register 0x9054 to reset the HW blocks. | ||
33 | - Write 0x00000001 to register 0x9050 to stop the SPU. | ||
34 | - Sleep for 10ms. | ||
35 | - Write 0x0000001A to register 0x07FC to init the Encoder SDRAM's pre-charge. | ||
36 | - Write 0x80000640 to register 0x07F8 to init the Encoder SDRAM's refresh to 1us. | ||
37 | - Write 0x0000001A to register 0x08FC to init the Decoder SDRAM's pre-charge. | ||
38 | - Write 0x80000640 to register 0x08F8 to init the Decoder SDRAM's refresh to 1us. | ||
39 | - Sleep for 512ms. (600ms is recommended) | ||
40 | - Transfer the encoder's firmware image to offset 0 in Encoder memory space. | ||
41 | - Transfer the decoder's firmware image to offset 0 in Decoder memory space. | ||
42 | - Use a read-modify-write operation to clear bit 0 of register 0x9050 to | ||
43 | re-enable the SPU. | ||
44 | - Sleep for 1 second. | ||
45 | - Use a read-modify-write operation to clear bits 3 and 0 of register 0x9058 | ||
46 | to re-enable the VPU. | ||
47 | - Sleep for 1 second. | ||
48 | - Issue status API commands to both firmware images to verify. | ||
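
The register part of the sequence above can be sketched as below. To keep the example self-contained it runs against an in-memory array instead of real hardware; reg_write(), reg_read() and sleep_ms() are hypothetical stand-ins for whatever MMIO accessors and delay primitive a driver would use. The FWapi commands, the I2C command to the digitizer and the actual firmware transfer are not shown.

```c
#include <stdint.h>

#define REG_SPACE_WORDS (0x10000u / 4u)
static uint32_t fake_regs[REG_SPACE_WORDS]; /* stand-in for mapped registers */

static void reg_write(uint32_t off, uint32_t val) { fake_regs[off / 4u] = val; }
static uint32_t reg_read(uint32_t off) { return fake_regs[off / 4u]; }
static void sleep_ms(unsigned ms) { (void)ms; /* a real driver would delay */ }

static void fw_load_register_steps(void)
{
    reg_write(0x2800, 0x00000000); /* stop the Video Display Module */
    reg_write(0x2D00, 0x00000005); /* stop the AO */
    reg_write(0xA064, 0x00000000); /* ping the APU */
    reg_write(0x9058, 0xFFFFFFFE); /* stop the VPU */
    reg_write(0x9054, 0xFFFFFFFF); /* reset the HW blocks */
    reg_write(0x9050, 0x00000001); /* stop the SPU */
    sleep_ms(10);

    reg_write(0x07FC, 0x0000001A); /* Encoder SDRAM pre-charge */
    reg_write(0x07F8, 0x80000640); /* Encoder SDRAM refresh */
    reg_write(0x08FC, 0x0000001A); /* Decoder SDRAM pre-charge */
    reg_write(0x08F8, 0x80000640); /* Decoder SDRAM refresh */
    sleep_ms(600);                 /* 512ms minimum, 600ms recommended */

    /* ... transfer the encoder and decoder firmware images here ... */

    reg_write(0x9050, reg_read(0x9050) & ~0x1u); /* re-enable the SPU */
    sleep_ms(1000);
    reg_write(0x9058, reg_read(0x9058) & ~0x9u); /* clear bits 3 and 0: VPU */
    sleep_ms(1000);
}
```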
49 | |||
diff --git a/Documentation/video4linux/cx88/hauppauge-wintv-cx88-ir.txt b/Documentation/video4linux/cx88/hauppauge-wintv-cx88-ir.txt new file mode 100644 index 000000000000..93fec32a1188 --- /dev/null +++ b/Documentation/video4linux/cx88/hauppauge-wintv-cx88-ir.txt | |||
@@ -0,0 +1,54 @@ | |||
1 | The controls for the mux are GPIO [0,1] for source, and GPIO 2 for muting. | ||
2 | |||
3 | GPIO0 GPIO1 | ||
4 | 0 0 TV Audio | ||
5 | 1 0 FM radio | ||
6 | 0 1 Line-In | ||
7 | 1 1 Mono tuner bypass or CD passthru (tuner specific) | ||
8 | |||
9 | GPIO 16 (I believe) is tied to the IR port (if present). | ||
10 | |||
11 | ------------------------------------------------------------------------------------ | ||
12 | |||
13 | From the data sheet: | ||
14 | Register 24'h20004 PCI Interrupt Status | ||
15 | bit [18] IR_SMP_INT Set when 32 input samples have been collected over | ||
16 | gpio[16] pin into GP_SAMPLE register. | ||
17 | |||
18 | What's missing from the data sheet: | ||
19 | |||
20 | Set up a 4 kHz sampling rate (roughly 2x oversampled; good enough for our | ||
21 | RC5 compatible remote) | ||
22 | set register 0x35C050 to 0xa80a80 | ||
23 | |||
24 | enable sampling | ||
25 | set register 0x35C054 to 0x5 | ||
26 | |||
27 | Of course, enable the IRQ bit 18 in the interrupt mask register (and | ||
28 | provide a handler). | ||
29 | |||
30 | GP_SAMPLE register is at 0x35C058 | ||
31 | |||
32 | Bits are then right shifted into the GP_SAMPLE register at the specified | ||
33 | rate; you get an interrupt when a full DWORD is recieved. | ||
34 | You need to recover the actual RC5 bits out of the (oversampled) IR sensor | ||
35 | bits. (Hint: look for the 0/1and 1/0 crossings of the RC5 bi-phase data) An | ||
36 | actual raw RC5 code will span 2-3 DWORDS, depending on the actual alignment. | ||
37 | |||
38 | I'm pretty sure when no IR signal is present the receiver is always in a | ||
39 | marking state(1); but stray light, etc can cause intermittent noise values | ||
40 | as well. Remember, this is a free running sample of the IR receiver state | ||
41 | over time, so don't assume any sample starts at any particular place. | ||
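
As a first step toward finding the 0/1 and 1/0 crossings mentioned above, the free-running sample stream can be turned into runs of equal level. The sketch below carries the open run across DWORDs, since a run may straddle two interrupts. It assumes that, because samples are right shifted into GP_SAMPLE, bit 0 of a completed DWORD is the oldest sample and bit 31 the newest; that ordering is an assumption, and all names are hypothetical. Mapping run lengths to RC5 half-bits is left out.

```c
#include <stddef.h>
#include <stdint.h>

/* A completed run of identical samples. */
struct ir_run {
    unsigned level;  /* 0 or 1 */
    unsigned length; /* number of consecutive samples at that level */
};

/* The run that is still open at the end of the last DWORD. */
struct ir_state {
    unsigned level;
    unsigned length;
};

/*
 * Feed one GP_SAMPLE value. Every level change closes the current run
 * and stores it in out[]; returns the number of runs stored. Runs that
 * do not fit into out[] are dropped.
 */
static size_t ir_feed(struct ir_state *st, uint32_t sample,
                      struct ir_run *out, size_t max_out)
{
    size_t n = 0;

    for (unsigned i = 0; i < 32; i++) {
        unsigned bit = (sample >> i) & 1u; /* assumed: bit 0 is oldest */

        if (bit == st->level) {
            st->length++;
            continue;
        }
        if (st->length > 0 && n < max_out) {
            out[n].level = st->level;
            out[n].length = st->length;
            n++;
        }
        st->level = bit;
        st->length = 1;
    }
    return n;
}
```

Starting with the state set to level 1 and length 0 matches the idle (marking) condition described above.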
42 | |||
43 | http://www.atmel.com/dyn/resources/prod_documents/doc2817.pdf | ||
44 | This data sheet (google search) seems to have a lovely description of the | ||
45 | RC5 basics | ||
46 | |||
47 | http://users.pandora.be/nenya/electronics/rc5/ and more data | ||
48 | |||
49 | http://www.ee.washington.edu/circuit_archive/text/ir_decode.txt | ||
50 | and even a reference to how to decode a bi-phase data stream. | ||
51 | |||
52 | http://www.xs4all.nl/~sbp/knowledge/ir/rc5.htm | ||
53 | still more info | ||
54 | |||
diff --git a/Documentation/video4linux/et61x251.txt b/Documentation/video4linux/et61x251.txt index 29340282ab5f..cd584f20a997 100644 --- a/Documentation/video4linux/et61x251.txt +++ b/Documentation/video4linux/et61x251.txt | |||
@@ -1,9 +1,9 @@ | |||
1 | 1 | ||
2 | ET61X[12]51 PC Camera Controllers | 2 | ET61X[12]51 PC Camera Controllers |
3 | Driver for Linux | 3 | Driver for Linux |
4 | ================================= | 4 | ================================= |
5 | 5 | ||
6 | - Documentation - | 6 | - Documentation - |
7 | 7 | ||
8 | 8 | ||
9 | Index | 9 | Index |
@@ -156,46 +156,46 @@ Name: video_nr | |||
156 | Type: short array (min = 0, max = 64) | 156 | Type: short array (min = 0, max = 64) |
157 | Syntax: <-1|n[,...]> | 157 | Syntax: <-1|n[,...]> |
158 | Description: Specify V4L2 minor mode number: | 158 | Description: Specify V4L2 minor mode number: |
159 | -1 = use next available | 159 | -1 = use next available |
160 | n = use minor number n | 160 | n = use minor number n |
161 | You can specify up to 64 cameras this way. | 161 | You can specify up to 64 cameras this way. |
162 | For example: | 162 | For example: |
163 | video_nr=-1,2,-1 would assign minor number 2 to the second | 163 | video_nr=-1,2,-1 would assign minor number 2 to the second |
164 | registered camera and use auto for the first one and for every | 164 | registered camera and use auto for the first one and for every |
165 | other camera. | 165 | other camera. |
166 | Default: -1 | 166 | Default: -1 |
167 | ------------------------------------------------------------------------------- | 167 | ------------------------------------------------------------------------------- |
168 | Name: force_munmap | 168 | Name: force_munmap |
169 | Type: bool array (min = 0, max = 64) | 169 | Type: bool array (min = 0, max = 64) |
170 | Syntax: <0|1[,...]> | 170 | Syntax: <0|1[,...]> |
171 | Description: Force the application to unmap previously mapped buffer memory | 171 | Description: Force the application to unmap previously mapped buffer memory |
172 | before calling any VIDIOC_S_CROP or VIDIOC_S_FMT ioctl's. Not | 172 | before calling any VIDIOC_S_CROP or VIDIOC_S_FMT ioctl's. Not |
173 | all the applications support this feature. This parameter is | 173 | all the applications support this feature. This parameter is |
174 | specific for each detected camera. | 174 | specific for each detected camera. |
175 | 0 = do not force memory unmapping | 175 | 0 = do not force memory unmapping |
176 | 1 = force memory unmapping (save memory) | 176 | 1 = force memory unmapping (save memory) |
177 | Default: 0 | 177 | Default: 0 |
178 | ------------------------------------------------------------------------------- | 178 | ------------------------------------------------------------------------------- |
179 | Name: frame_timeout | 179 | Name: frame_timeout |
180 | Type: uint array (min = 0, max = 64) | 180 | Type: uint array (min = 0, max = 64) |
181 | Syntax: <n[,...]> | 181 | Syntax: <n[,...]> |
182 | Description: Timeout for a video frame in seconds. This parameter is | 182 | Description: Timeout for a video frame in seconds. This parameter is |
183 | specific for each detected camera. This parameter can be | 183 | specific for each detected camera. This parameter can be |
184 | changed at runtime thanks to the /sys filesystem interface. | 184 | changed at runtime thanks to the /sys filesystem interface. |
185 | Default: 2 | 185 | Default: 2 |
186 | ------------------------------------------------------------------------------- | 186 | ------------------------------------------------------------------------------- |
187 | Name: debug | 187 | Name: debug |
188 | Type: ushort | 188 | Type: ushort |
189 | Syntax: <n> | 189 | Syntax: <n> |
190 | Description: Debugging information level, from 0 to 3: | 190 | Description: Debugging information level, from 0 to 3: |
191 | 0 = none (use carefully) | 191 | 0 = none (use carefully) |
192 | 1 = critical errors | 192 | 1 = critical errors |
193 | 2 = significant informations | 193 | 2 = significant informations |
194 | 3 = more verbose messages | 194 | 3 = more verbose messages |
195 | Level 3 is useful for testing only, when only one device | 195 | Level 3 is useful for testing only, when only one device |
196 | is used at the same time. It also shows some more informations | 196 | is used at the same time. It also shows some more informations |
197 | about the hardware being detected. This module parameter can be | 197 | about the hardware being detected. This module parameter can be |
198 | changed at runtime thanks to the /sys filesystem interface. | 198 | changed at runtime thanks to the /sys filesystem interface. |
199 | Default: 2 | 199 | Default: 2 |
200 | ------------------------------------------------------------------------------- | 200 | ------------------------------------------------------------------------------- |
201 | 201 | ||
diff --git a/Documentation/video4linux/ibmcam.txt b/Documentation/video4linux/ibmcam.txt index 4a40a2e99451..397a94eb77b8 100644 --- a/Documentation/video4linux/ibmcam.txt +++ b/Documentation/video4linux/ibmcam.txt | |||
@@ -21,7 +21,7 @@ Internal interface: Video For Linux (V4L) | |||
21 | Supported controls: | 21 | Supported controls: |
22 | - by V4L: Contrast, Brightness, Color, Hue | 22 | - by V4L: Contrast, Brightness, Color, Hue |
23 | - by driver options: frame rate, lighting conditions, video format, | 23 | - by driver options: frame rate, lighting conditions, video format, |
24 | default picture settings, sharpness. | 24 | default picture settings, sharpness. |
25 | 25 | ||
26 | SUPPORTED CAMERAS: | 26 | SUPPORTED CAMERAS: |
27 | 27 | ||
@@ -191,66 +191,66 @@ init_model2_sat Integer 0..255 [0x34] init_model2_sat=65 | |||
191 | init_model2_yb Integer 0..255 [0xa0] init_model2_yb=200 | 191 | init_model2_yb Integer 0..255 [0xa0] init_model2_yb=200 |
192 | 192 | ||
193 | debug You don't need this option unless you are a developer. | 193 | debug You don't need this option unless you are a developer. |
194 | If you are a developer then you will see in the code | 194 | If you are a developer then you will see in the code |
195 | what values do what. 0=off. | 195 | what values do what. 0=off. |
196 | 196 | ||
197 | flags This is a bit mask, and you can combine any number of | 197 | flags This is a bit mask, and you can combine any number of |
198 | bits to produce what you want. Usually you don't want | 198 | bits to produce what you want. Usually you don't want |
199 | any of extra features this option provides: | 199 | any of extra features this option provides: |
200 | 200 | ||
201 | FLAGS_RETRY_VIDIOCSYNC 1 This bit allows to retry failed | 201 | FLAGS_RETRY_VIDIOCSYNC 1 This bit allows to retry failed |
202 | VIDIOCSYNC ioctls without failing. | 202 | VIDIOCSYNC ioctls without failing. |
203 | Will work with xawtv, will not | 203 | Will work with xawtv, will not |
204 | with xrealproducer. Default is | 204 | with xrealproducer. Default is |
205 | not set. | 205 | not set. |
206 | FLAGS_MONOCHROME 2 Activates monochrome (b/w) mode. | 206 | FLAGS_MONOCHROME 2 Activates monochrome (b/w) mode. |
207 | FLAGS_DISPLAY_HINTS 4 Shows colored pixels which have | 207 | FLAGS_DISPLAY_HINTS 4 Shows colored pixels which have |
208 | magic meaning to developers. | 208 | magic meaning to developers. |
209 | FLAGS_OVERLAY_STATS 8 Shows tiny numbers on screen, | 209 | FLAGS_OVERLAY_STATS 8 Shows tiny numbers on screen, |
210 | useful only for debugging. | 210 | useful only for debugging. |
211 | FLAGS_FORCE_TESTPATTERN 16 Shows blue screen with numbers. | 211 | FLAGS_FORCE_TESTPATTERN 16 Shows blue screen with numbers. |
212 | FLAGS_SEPARATE_FRAMES 32 Shows each frame separately, as | 212 | FLAGS_SEPARATE_FRAMES 32 Shows each frame separately, as |
213 | it was received from the camera. | 213 | it was received from the camera. |
214 | Default (not set) is to mix the | 214 | Default (not set) is to mix the |
215 | preceding frame in to compensate | 215 | preceding frame in to compensate |
216 | for occasional loss of Isoc data | 216 | for occasional loss of Isoc data |
217 | on high frame rates. | 217 | on high frame rates. |
218 | FLAGS_CLEAN_FRAMES 64 Forces "cleanup" of each frame | 218 | FLAGS_CLEAN_FRAMES 64 Forces "cleanup" of each frame |
219 | prior to use; relevant only if | 219 | prior to use; relevant only if |
220 | FLAGS_SEPARATE_FRAMES is set. | 220 | FLAGS_SEPARATE_FRAMES is set. |
221 | Default is not to clean frames, | 221 | Default is not to clean frames, |
222 | this is a little faster but may | 222 | this is a little faster but may |
223 | produce flicker if frame rate is | 223 | produce flicker if frame rate is |
224 | too high and Isoc data gets lost. | 224 | too high and Isoc data gets lost. |
225 | FLAGS_NO_DECODING 128 This flag turns the video stream | 225 | FLAGS_NO_DECODING 128 This flag turns the video stream |
226 | decoder off, and dumps the raw | 226 | decoder off, and dumps the raw |
227 | Isoc data from the camera into | 227 | Isoc data from the camera into |
228 | the reading process. Useful to | 228 | the reading process. Useful to |
229 | developers, but not to users. | 229 | developers, but not to users. |
230 | 230 | ||
231 | framerate This setting controls frame rate of the camera. This is | 231 | framerate This setting controls frame rate of the camera. This is |
232 | an approximate setting (in terms of "worst" ... "best") | 232 | an approximate setting (in terms of "worst" ... "best") |
233 | because camera changes frame rate depending on amount | 233 | because camera changes frame rate depending on amount |
234 | of light available. Setting 0 is slowest, 6 is fastest. | 234 | of light available. Setting 0 is slowest, 6 is fastest. |
235 | Beware - fast settings are very demanding and may not | 235 | Beware - fast settings are very demanding and may not |
236 | work well with all video sizes. Be conservative. | 236 | work well with all video sizes. Be conservative. |
237 | 237 | ||
238 | hue_correction This highly optional setting allows to adjust the | 238 | hue_correction This highly optional setting allows to adjust the |
239 | hue of the image in a way slightly different from | 239 | hue of the image in a way slightly different from |
240 | what usual "hue" control does. Both controls affect | 240 | what usual "hue" control does. Both controls affect |
241 | YUV colorspace: regular "hue" control adjusts only | 241 | YUV colorspace: regular "hue" control adjusts only |
242 | U component, and this "hue_correction" option similarly | 242 | U component, and this "hue_correction" option similarly |
243 | adjusts only V component. However usually it is enough | 243 | adjusts only V component. However usually it is enough |
244 | to tweak only U or V to compensate for colored light or | 244 | to tweak only U or V to compensate for colored light or |
245 | color temperature; this option simply allows more | 245 | color temperature; this option simply allows more |
246 | complicated correction when and if it is necessary. | 246 | complicated correction when and if it is necessary. |
247 | 247 | ||
248 | init_brightness These settings specify _initial_ values which will be | 248 | init_brightness These settings specify _initial_ values which will be |
249 | init_contrast used to set up the camera. If your V4L application has | 249 | init_contrast used to set up the camera. If your V4L application has |
250 | init_color its own controls to adjust the picture then these | 250 | init_color its own controls to adjust the picture then these |
251 | init_hue controls will be used too. These options allow you to | 251 | init_hue controls will be used too. These options allow you to |
252 | preconfigure the camera when it gets connected, before | 252 | preconfigure the camera when it gets connected, before |
253 | any V4L application connects to it. Good for webcams. | 253 | any V4L application connects to it. Good for webcams. |
254 | 254 | ||
255 | init_model2_rg These initial settings alter color balance of the | 255 | init_model2_rg These initial settings alter color balance of the |
256 | init_model2_rg2 camera on hardware level. All four settings may be used | 256 | init_model2_rg2 camera on hardware level. All four settings may be used |
@@ -258,47 +258,47 @@ init_model2_sat to tune the camera to specific lighting conditions. These | |||
258 | init_model2_yb settings only apply to Model 2 cameras. | 258 | init_model2_yb settings only apply to Model 2 cameras. |
259 | 259 | ||
260 | lighting This option selects one of three hardware-defined | 260 | lighting This option selects one of three hardware-defined |
261 | photosensitivity settings of the camera. 0=bright light, | 261 | photosensitivity settings of the camera. 0=bright light, |
262 | 1=Medium (default), 2=Low light. This setting affects | 262 | 1=Medium (default), 2=Low light. This setting affects |
263 | frame rate: the dimmer the lighting the lower the frame | 263 | frame rate: the dimmer the lighting the lower the frame |
264 | rate (because longer exposition time is needed). The | 264 | rate (because longer exposition time is needed). The |
265 | Model 2 cameras allow values more than 2 for this option, | 265 | Model 2 cameras allow values more than 2 for this option, |
266 | thus enabling extremely high sensitivity at cost of frame | 266 | thus enabling extremely high sensitivity at cost of frame |
267 | rate, color saturation and imaging sensor noise. | 267 | rate, color saturation and imaging sensor noise. |
268 | 268 | ||
269 | sharpness This option controls smoothing (noise reduction) | 269 | sharpness This option controls smoothing (noise reduction) |
270 | made by camera. Setting 0 is most smooth, setting 6 | 270 | made by camera. Setting 0 is most smooth, setting 6 |
271 | is most sharp. Be aware that CMOS sensor used in the | 271 | is most sharp. Be aware that CMOS sensor used in the |
272 | camera is pretty noisy, so if you choose 6 you will | 272 | camera is pretty noisy, so if you choose 6 you will |
273 | be greeted with "snowy" image. Default is 4. Model 2 | 273 | be greeted with "snowy" image. Default is 4. Model 2 |
274 | cameras do not support this feature. | 274 | cameras do not support this feature. |
275 | 275 | ||
276 | size This setting chooses one of several image sizes that are | 276 | size This setting chooses one of several image sizes that are |
277 | supported by this driver. Cameras may support more, but | 277 | supported by this driver. Cameras may support more, but |
278 | it's difficult to reverse-engineer all formats. | 278 | it's difficult to reverse-engineer all formats. |
279 | Following video sizes are supported: | 279 | Following video sizes are supported: |
280 | 280 | ||
281 | size=0 128x96 (Model 1 only) | 281 | size=0 128x96 (Model 1 only) |
282 | size=1 160x120 | 282 | size=1 160x120 |
283 | size=2 176x144 | 283 | size=2 176x144 |
284 | size=3 320x240 (Model 2 only) | 284 | size=3 320x240 (Model 2 only) |
285 | size=4 352x240 (Model 2 only) | 285 | size=4 352x240 (Model 2 only) |
286 | size=5 352x288 | 286 | size=5 352x288 |
287 | size=6 640x480 (Model 3 only) | 287 | size=6 640x480 (Model 3 only) |
288 | 288 | ||
289 | The 352x288 is the native size of the Model 1 sensor | 289 | The 352x288 is the native size of the Model 1 sensor |
290 | array, so it's the best resolution the camera can | 290 | array, so it's the best resolution the camera can |
291 | yield. The best resolution of Model 2 is 176x144, and | 291 | yield. The best resolution of Model 2 is 176x144, and |
292 | larger images are produced by stretching the bitmap. | 292 | larger images are produced by stretching the bitmap. |
293 | Model 3 has sensor with 640x480 grid, and it works too, | 293 | Model 3 has sensor with 640x480 grid, and it works too, |
294 | but the frame rate will be exceptionally low (1-2 FPS); | 294 | but the frame rate will be exceptionally low (1-2 FPS); |
295 | it may be still OK for some applications, like security. | 295 | it may be still OK for some applications, like security. |
296 | Choose the image size you need. The smaller image can | 296 | Choose the image size you need. The smaller image can |
297 | support faster frame rate. Default is 352x288. | 297 | support faster frame rate. Default is 352x288. |
298 | 298 | ||
299 | For more information and the Troubleshooting FAQ visit this URL: | 299 | For more information and the Troubleshooting FAQ visit this URL: |
300 | 300 | ||
301 | http://www.linux-usb.org/ibmcam/ | 301 | http://www.linux-usb.org/ibmcam/ |
302 | 302 | ||
303 | WHAT NEEDS TO BE DONE: | 303 | WHAT NEEDS TO BE DONE: |
304 | 304 | ||
diff --git a/Documentation/video4linux/ov511.txt b/Documentation/video4linux/ov511.txt index 142741e3c578..79af610d4ba5 100644 --- a/Documentation/video4linux/ov511.txt +++ b/Documentation/video4linux/ov511.txt | |||
@@ -81,7 +81,7 @@ MODULE PARAMETERS: | |||
81 | TYPE: integer (Boolean) | 81 | TYPE: integer (Boolean) |
82 | DEFAULT: 1 | 82 | DEFAULT: 1 |
83 | DESC: Brightness is normally under automatic control and can't be set | 83 | DESC: Brightness is normally under automatic control and can't be set |
84 | manually by the video app. Set to 0 for manual control. | 84 | manually by the video app. Set to 0 for manual control. |
85 | 85 | ||
86 | NAME: autogain | 86 | NAME: autogain |
87 | TYPE: integer (Boolean) | 87 | TYPE: integer (Boolean) |
@@ -97,13 +97,13 @@ MODULE PARAMETERS: | |||
97 | TYPE: integer (0-6) | 97 | TYPE: integer (0-6) |
98 | DEFAULT: 3 | 98 | DEFAULT: 3 |
99 | DESC: Sets the threshold for printing debug messages. The higher the value, | 99 | DESC: Sets the threshold for printing debug messages. The higher the value, |
100 | the more is printed. The levels are cumulative, and are as follows: | 100 | the more is printed. The levels are cumulative, and are as follows: |
101 | 0=no debug messages | 101 | 0=no debug messages |
102 | 1=init/detection/unload and other significant messages | 102 | 1=init/detection/unload and other significant messages |
103 | 2=some warning messages | 103 | 2=some warning messages |
104 | 3=config/control function calls | 104 | 3=config/control function calls |
105 | 4=most function calls and data parsing messages | 105 | 4=most function calls and data parsing messages |
106 | 5=highly repetitive messages | 106 | 5=highly repetitive messages |
107 | 107 | ||
108 | NAME: snapshot | 108 | NAME: snapshot |
109 | TYPE: integer (Boolean) | 109 | TYPE: integer (Boolean) |
@@ -116,24 +116,24 @@ MODULE PARAMETERS: | |||
116 | TYPE: integer (1-4 for OV511, 1-31 for OV511+) | 116 | TYPE: integer (1-4 for OV511, 1-31 for OV511+) |
117 | DEFAULT: 1 | 117 | DEFAULT: 1 |
118 | DESC: Number of cameras allowed to stream simultaneously on a single bus. | 118 | DESC: Number of cameras allowed to stream simultaneously on a single bus. |
119 | Values higher than 1 reduce the data rate of each camera, allowing two | 119 | Values higher than 1 reduce the data rate of each camera, allowing two |
120 | or more to be used at once. If you have a complicated setup involving | 120 | or more to be used at once. If you have a complicated setup involving |
121 | both OV511 and OV511+ cameras, trial-and-error may be necessary for | 121 | both OV511 and OV511+ cameras, trial-and-error may be necessary for |
122 | finding the optimum setting. | 122 | finding the optimum setting. |
123 | 123 | ||
124 | NAME: compress | 124 | NAME: compress |
125 | TYPE: integer (Boolean) | 125 | TYPE: integer (Boolean) |
126 | DEFAULT: 0 | 126 | DEFAULT: 0 |
127 | DESC: Set this to 1 to turn on the camera's compression engine. This can | 127 | DESC: Set this to 1 to turn on the camera's compression engine. This can |
128 | potentially increase the frame rate at the expense of quality, if you | 128 | potentially increase the frame rate at the expense of quality, if you |
129 | have a fast CPU. You must load the proper compression module for your | 129 | have a fast CPU. You must load the proper compression module for your |
130 | camera before starting your application (ov511_decomp or ov518_decomp). | 130 | camera before starting your application (ov511_decomp or ov518_decomp). |
131 | 131 | ||
132 | NAME: testpat | 132 | NAME: testpat |
133 | TYPE: integer (Boolean) | 133 | TYPE: integer (Boolean) |
134 | DEFAULT: 0 | 134 | DEFAULT: 0 |
135 | DESC: This configures the camera's sensor to transmit a colored test-pattern | 135 | DESC: This configures the camera's sensor to transmit a colored test-pattern |
136 | instead of an image. This does not work correctly yet. | 136 | instead of an image. This does not work correctly yet. |
137 | 137 | ||
138 | NAME: dumppix | 138 | NAME: dumppix |
139 | TYPE: integer (0-2) | 139 | TYPE: integer (0-2) |
diff --git a/Documentation/video4linux/sn9c102.txt b/Documentation/video4linux/sn9c102.txt index 142920bc011f..1d20895b4354 100644 --- a/Documentation/video4linux/sn9c102.txt +++ b/Documentation/video4linux/sn9c102.txt | |||
@@ -1,9 +1,9 @@ | |||
1 | 1 | ||
2 | SN9C10x PC Camera Controllers | 2 | SN9C10x PC Camera Controllers |
3 | Driver for Linux | 3 | Driver for Linux |
4 | ============================= | 4 | ============================= |
5 | 5 | ||
6 | - Documentation - | 6 | - Documentation - |
7 | 7 | ||
8 | 8 | ||
9 | Index | 9 | Index |
@@ -176,46 +176,46 @@ Name: video_nr | |||
176 | Type: short array (min = 0, max = 64) | 176 | Type: short array (min = 0, max = 64) |
177 | Syntax: <-1|n[,...]> | 177 | Syntax: <-1|n[,...]> |
178 | Description: Specify V4L2 minor mode number: | 178 | Description: Specify V4L2 minor mode number: |
179 | -1 = use next available | 179 | -1 = use next available |
180 | n = use minor number n | 180 | n = use minor number n |
181 | You can specify up to 64 cameras this way. | 181 | You can specify up to 64 cameras this way. |
182 | For example: | 182 | For example: |
183 | video_nr=-1,2,-1 would assign minor number 2 to the second | 183 | video_nr=-1,2,-1 would assign minor number 2 to the second |
184 | recognized camera and use auto for the first one and for every | 184 | recognized camera and use auto for the first one and for every |
185 | other camera. | 185 | other camera. |
186 | Default: -1 | 186 | Default: -1 |
187 | ------------------------------------------------------------------------------- | 187 | ------------------------------------------------------------------------------- |
188 | Name: force_munmap | 188 | Name: force_munmap |
189 | Type: bool array (min = 0, max = 64) | 189 | Type: bool array (min = 0, max = 64) |
190 | Syntax: <0|1[,...]> | 190 | Syntax: <0|1[,...]> |
191 | Description: Force the application to unmap previously mapped buffer memory | 191 | Description: Force the application to unmap previously mapped buffer memory |
192 | before calling any VIDIOC_S_CROP or VIDIOC_S_FMT ioctl's. Not | 192 | before calling any VIDIOC_S_CROP or VIDIOC_S_FMT ioctl's. Not |
193 | all the applications support this feature. This parameter is | 193 | all the applications support this feature. This parameter is |
194 | specific for each detected camera. | 194 | specific for each detected camera. |
195 | 0 = do not force memory unmapping | 195 | 0 = do not force memory unmapping |
196 | 1 = force memory unmapping (save memory) | 196 | 1 = force memory unmapping (save memory) |
197 | Default: 0 | 197 | Default: 0 |
198 | ------------------------------------------------------------------------------- | 198 | ------------------------------------------------------------------------------- |
199 | Name: frame_timeout | 199 | Name: frame_timeout |
200 | Type: uint array (min = 0, max = 64) | 200 | Type: uint array (min = 0, max = 64) |
201 | Syntax: <n[,...]> | 201 | Syntax: <n[,...]> |
202 | Description: Timeout for a video frame in seconds. This parameter is | 202 | Description: Timeout for a video frame in seconds. This parameter is |
203 | specific for each detected camera. This parameter can be | 203 | specific for each detected camera. This parameter can be |
204 | changed at runtime thanks to the /sys filesystem interface. | 204 | changed at runtime thanks to the /sys filesystem interface. |
205 | Default: 2 | 205 | Default: 2 |
206 | ------------------------------------------------------------------------------- | 206 | ------------------------------------------------------------------------------- |
207 | Name: debug | 207 | Name: debug |
208 | Type: ushort | 208 | Type: ushort |
209 | Syntax: <n> | 209 | Syntax: <n> |
210 | Description: Debugging information level, from 0 to 3: | 210 | Description: Debugging information level, from 0 to 3: |
211 | 0 = none (use carefully) | 211 | 0 = none (use carefully) |
212 | 1 = critical errors | 212 | 1 = critical errors |
213 | 2 = significant information | 213 | 2 = significant information |
214 | 3 = more verbose messages | 214 | 3 = more verbose messages |
215 | Level 3 is useful for testing only, when only one device | 215 | Level 3 is useful for testing only, when only one device |
216 | is used. It also shows some more information about the | 216 | is used. It also shows some more information about the |
217 | hardware being detected. This parameter can be changed at | 217 | hardware being detected. This parameter can be changed at |
218 | runtime thanks to the /sys filesystem interface. | 218 | runtime thanks to the /sys filesystem interface. |
219 | Default: 2 | 219 | Default: 2 |
220 | ------------------------------------------------------------------------------- | 220 | ------------------------------------------------------------------------------- |
221 | 221 | ||
@@ -280,24 +280,24 @@ Byte # Value Description | |||
280 | 0x04 0xC4 Frame synchronisation pattern. | 280 | 0x04 0xC4 Frame synchronisation pattern. |
281 | 0x05 0x96 Frame synchronisation pattern. | 281 | 0x05 0x96 Frame synchronisation pattern. |
282 | 0x06 0xXX Unknown meaning. The exact value depends on the chip; | 282 | 0x06 0xXX Unknown meaning. The exact value depends on the chip; |
283 | possible values are 0x00, 0x01 and 0x20. | 283 | possible values are 0x00, 0x01 and 0x20. |
284 | 0x07 0xXX Variable value, whose bits are ff00uzzc, where ff is a | 284 | 0x07 0xXX Variable value, whose bits are ff00uzzc, where ff is a |
285 | frame counter, u is unknown, zz is a size indicator | 285 | frame counter, u is unknown, zz is a size indicator |
286 | (00 = VGA, 01 = SIF, 10 = QSIF) and c stands for | 286 | (00 = VGA, 01 = SIF, 10 = QSIF) and c stands for |
287 | "compression enabled" (1 = yes, 0 = no). | 287 | "compression enabled" (1 = yes, 0 = no). |
288 | 0x08 0xXX Brightness sum inside Auto-Exposure area (low-byte). | 288 | 0x08 0xXX Brightness sum inside Auto-Exposure area (low-byte). |
289 | 0x09 0xXX Brightness sum inside Auto-Exposure area (high-byte). | 289 | 0x09 0xXX Brightness sum inside Auto-Exposure area (high-byte). |
290 | For a pure white image, this number will be equal to 500 | 290 | For a pure white image, this number will be equal to 500 |
291 | times the area of the specified AE area. For images | 291 | times the area of the specified AE area. For images |
292 | that are not pure white, the value scales down according | 292 | that are not pure white, the value scales down according |
293 | to relative whiteness. | 293 | to relative whiteness. |
294 | 0x0A 0xXX Brightness sum outside Auto-Exposure area (low-byte). | 294 | 0x0A 0xXX Brightness sum outside Auto-Exposure area (low-byte). |
295 | 0x0B 0xXX Brightness sum outside Auto-Exposure area (high-byte). | 295 | 0x0B 0xXX Brightness sum outside Auto-Exposure area (high-byte). |
296 | For a pure white image, this number will be equal to 125 | 296 | For a pure white image, this number will be equal to 125 |
297 | times the area outside of the specified AE area. For | 297 | times the area outside of the specified AE area. For |
298 | images that are not pure white, the value scales down | 298 | images that are not pure white, the value scales down |
299 | according to relative whiteness. | 299 | according to relative whiteness. |
300 | according to relative whiteness. | 300 | according to relative whiteness. |
301 | 301 | ||
302 | The following bytes are used by the SN9C103 bridge only: | 302 | The following bytes are used by the SN9C103 bridge only: |
303 | 303 | ||
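The bit layout given above for header byte 0x07 (ff00uzzc) can be unpacked with a few shifts and masks. The following is a minimal sketch, not code from the sn9c102 driver: the struct and function names are hypothetical, and it assumes the pattern is written most-significant bit first, so that "ff" occupies bits 7-6 and "c" is bit 0.

```c
#include <stdint.h>

/* Hypothetical container for the fields of frame header byte 0x07. */
struct sn9c10x_hdr7 {
	unsigned frame_counter;	/* ff: bits 7-6 */
	unsigned unknown;	/* u:  bit 3, meaning not documented */
	unsigned size;		/* zz: bits 2-1; 0 = VGA, 1 = SIF, 2 = QSIF */
	unsigned compressed;	/* c:  bit 0; 1 = compression enabled */
};

/* Split the byte into its fields, assuming MSB-first bit order. */
static struct sn9c10x_hdr7 decode_hdr7(uint8_t v)
{
	struct sn9c10x_hdr7 h;

	h.frame_counter = (v >> 6) & 0x3;
	h.unknown = (v >> 3) & 0x1;
	h.size = (v >> 1) & 0x3;
	h.compressed = v & 0x1;
	return h;
}
```

Under that assumption, the byte 0x85 (binary 10000101) would decode to frame counter 2, size indicator 2 (QSIF) and compression enabled.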
diff --git a/Documentation/video4linux/v4lgrab.c b/Documentation/video4linux/v4lgrab.c new file mode 100644 index 000000000000..079b628481cf --- /dev/null +++ b/Documentation/video4linux/v4lgrab.c | |||
@@ -0,0 +1,192 @@ | |||
1 | /* Simple Video4Linux image grabber. */ | ||
2 | /* | ||
3 | * Video4Linux Driver Test/Example Framegrabbing Program | ||
4 | * | ||
5 | * Compile with: | ||
6 | * gcc -s -Wall -Wstrict-prototypes v4lgrab.c -o v4lgrab | ||
7 | * Use as: | ||
8 | * v4lgrab >image.ppm | ||
9 | * | ||
10 | * Copyright (C) 1998-05-03, Phil Blundell <philb@gnu.org> | ||
11 | * Copied from http://www.tazenda.demon.co.uk/phil/vgrabber.c | ||
12 | * with minor modifications (Dave Forrest, drf5n@virginia.edu). | ||
13 | * | ||
14 | */ | ||
15 | |||
16 | #include <unistd.h> | ||
17 | #include <sys/types.h> | ||
18 | #include <sys/stat.h> | ||
19 | #include <fcntl.h> | ||
20 | #include <stdio.h> | ||
21 | #include <sys/ioctl.h> | ||
22 | #include <stdlib.h> | ||
23 | |||
24 | #include <linux/types.h> | ||
25 | #include <linux/videodev.h> | ||
26 | |||
27 | #define FILE "/dev/video0" | ||
28 | |||
29 | /* Stole this from tvset.c */ | ||
30 | |||
31 | #define READ_VIDEO_PIXEL(buf, format, depth, r, g, b) \ | ||
32 | { \ | ||
33 | switch (format) \ | ||
34 | { \ | ||
35 | case VIDEO_PALETTE_GREY: \ | ||
36 | switch (depth) \ | ||
37 | { \ | ||
38 | case 4: \ | ||
39 | case 6: \ | ||
40 | case 8: \ | ||
41 | (r) = (g) = (b) = (*buf++ << 8);\ | ||
42 | break; \ | ||
43 | \ | ||
44 | case 16: \ | ||
45 | (r) = (g) = (b) = \ | ||
46 | *((unsigned short *) buf); \ | ||
47 | buf += 2; \ | ||
48 | break; \ | ||
49 | } \ | ||
50 | break; \ | ||
51 | \ | ||
52 | \ | ||
53 | case VIDEO_PALETTE_RGB565: \ | ||
54 | { \ | ||
55 | unsigned short tmp = *(unsigned short *)buf; \ | ||
56 | (r) = tmp&0xF800; \ | ||
57 | (g) = (tmp<<5)&0xFC00; \ | ||
58 | (b) = (tmp<<11)&0xF800; \ | ||
59 | buf += 2; \ | ||
60 | } \ | ||
61 | break; \ | ||
62 | \ | ||
63 | case VIDEO_PALETTE_RGB555: \ | ||
64 | (r) = (buf[0]&0xF8)<<8; \ | ||
65 | (g) = ((buf[0] << 5 | buf[1] >> 3)&0xF8)<<8; \ | ||
66 | (b) = ((buf[1] << 2 ) & 0xF8)<<8; \ | ||
67 | buf += 2; \ | ||
68 | break; \ | ||
69 | \ | ||
70 | case VIDEO_PALETTE_RGB24: \ | ||
71 | (r) = buf[0] << 8; (g) = buf[1] << 8; \ | ||
72 | (b) = buf[2] << 8; \ | ||
73 | buf += 3; \ | ||
74 | break; \ | ||
75 | \ | ||
76 | default: \ | ||
77 | fprintf(stderr, \ | ||
78 | "Format %d not yet supported\n", \ | ||
79 | format); \ | ||
80 | } \ | ||
81 | } | ||
82 | |||
83 | int get_brightness_adj(unsigned char *image, long size, int *brightness) { | ||
84 | long i, tot = 0; | ||
85 | for (i=0;i<size*3;i++) | ||
86 | tot += image[i]; | ||
87 | *brightness = (128 - tot/(size*3))/3; | ||
88 | return !((tot/(size*3)) >= 126 && (tot/(size*3)) <= 130); | ||
89 | } | ||
90 | |||
91 | int main(int argc, char ** argv) | ||
92 | { | ||
93 | int fd = open(FILE, O_RDONLY), f; | ||
94 | struct video_capability cap; | ||
95 | struct video_window win; | ||
96 | struct video_picture vpic; | ||
97 | |||
98 | unsigned char *buffer, *src; | ||
99 | int bpp = 24, r = 0, g = 0, b = 0; | ||
100 | unsigned int i, src_depth; | ||
101 | |||
102 | if (fd < 0) { | ||
103 | perror(FILE); | ||
104 | exit(1); | ||
105 | } | ||
106 | |||
107 | if (ioctl(fd, VIDIOCGCAP, &cap) < 0) { | ||
108 | perror("VIDIOCGCAP"); | ||
109 | fprintf(stderr, "(" FILE " not a video4linux device?)\n"); | ||
110 | close(fd); | ||
111 | exit(1); | ||
112 | } | ||
113 | |||
114 | if (ioctl(fd, VIDIOCGWIN, &win) < 0) { | ||
115 | perror("VIDIOCGWIN"); | ||
116 | close(fd); | ||
117 | exit(1); | ||
118 | } | ||
119 | |||
120 | if (ioctl(fd, VIDIOCGPICT, &vpic) < 0) { | ||
121 | perror("VIDIOCGPICT"); | ||
122 | close(fd); | ||
123 | exit(1); | ||
124 | } | ||
125 | |||
126 | if (cap.type & VID_TYPE_MONOCHROME) { | ||
127 | vpic.depth=8; | ||
128 | vpic.palette=VIDEO_PALETTE_GREY; /* 8bit grey */ | ||
129 | if(ioctl(fd, VIDIOCSPICT, &vpic) < 0) { | ||
130 | vpic.depth=6; | ||
131 | if(ioctl(fd, VIDIOCSPICT, &vpic) < 0) { | ||
132 | vpic.depth=4; | ||
133 | if(ioctl(fd, VIDIOCSPICT, &vpic) < 0) { | ||
134 | fprintf(stderr, "Unable to find a supported capture format.\n"); | ||
135 | close(fd); | ||
136 | exit(1); | ||
137 | } | ||
138 | } | ||
139 | } | ||
140 | } else { | ||
141 | vpic.depth=24; | ||
142 | vpic.palette=VIDEO_PALETTE_RGB24; | ||
143 | |||
144 | if(ioctl(fd, VIDIOCSPICT, &vpic) < 0) { | ||
145 | vpic.palette=VIDEO_PALETTE_RGB565; | ||
146 | vpic.depth=16; | ||
147 | |||
148 | if(ioctl(fd, VIDIOCSPICT, &vpic)==-1) { | ||
149 | vpic.palette=VIDEO_PALETTE_RGB555; | ||
150 | vpic.depth=15; | ||
151 | |||
152 | if(ioctl(fd, VIDIOCSPICT, &vpic)==-1) { | ||
153 | fprintf(stderr, "Unable to find a supported capture format.\n"); | ||
154 | return -1; | ||
155 | } | ||
156 | } | ||
157 | } | ||
158 | } | ||
159 | |||
160 | buffer = malloc(win.width * win.height * bpp); | ||
161 | if (!buffer) { | ||
162 | fprintf(stderr, "Out of memory.\n"); | ||
163 | exit(1); | ||
164 | } | ||
165 | |||
166 | do { | ||
167 | int newbright; | ||
168 | read(fd, buffer, win.width * win.height * bpp); | ||
169 | f = get_brightness_adj(buffer, win.width * win.height, &newbright); | ||
170 | if (f) { | ||
171 | vpic.brightness += (newbright << 8); | ||
172 | if(ioctl(fd, VIDIOCSPICT, &vpic)==-1) { | ||
173 | perror("VIDIOCSPICT"); | ||
174 | break; | ||
175 | } | ||
176 | } | ||
177 | } while (f); | ||
178 | |||
179 | fprintf(stdout, "P6\n%d %d 255\n", win.width, win.height); | ||
180 | |||
181 | src = buffer; | ||
182 | src_depth = vpic.depth; /* depth negotiated via VIDIOCSPICT above */ | ||
183 | for (i = 0; i < win.width * win.height; i++) { | ||
184 | READ_VIDEO_PIXEL(src, vpic.palette, src_depth, r, g, b); | ||
185 | fputc(r>>8, stdout); | ||
186 | fputc(g>>8, stdout); | ||
187 | fputc(b>>8, stdout); | ||
188 | } | ||
189 | |||
190 | close(fd); | ||
191 | return 0; | ||
192 | } | ||
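The RGB565 branch of READ_VIDEO_PIXEL above is easier to follow as a standalone function. This is a hedged sketch rather than part of v4lgrab.c: the struct and function names are hypothetical, and it assumes the pixel has already been loaded as a host-order 16-bit word with red in bits 15-11, green in bits 10-5 and blue in bits 4-0. Each channel is shifted to the top of a 16-bit value, which is why the grabber later writes `r>>8`, `g>>8` and `b>>8` into the PPM output.

```c
#include <stdint.h>

/* One pixel expanded to 16 bits per channel (hypothetical helper type). */
struct rgb16 {
	uint16_t r, g, b;
};

/*
 * Unpack one RGB565 word using the same masks and shifts as the
 * VIDEO_PALETTE_RGB565 case of READ_VIDEO_PIXEL.
 */
static struct rgb16 unpack_rgb565(uint16_t px)
{
	struct rgb16 out;

	out.r = (uint16_t)(px & 0xF800);		/* top 5 bits */
	out.g = (uint16_t)((px << 5) & 0xFC00);		/* middle 6 bits */
	out.b = (uint16_t)((px << 11) & 0xF800);	/* low 5 bits */
	return out;
}
```

Because only the top 5 or 6 bits of each channel are populated, a full-intensity channel comes out as 0xF8 or 0xFC after the final shift right by 8, not 0xFF.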
diff --git a/Documentation/video4linux/w9968cf.txt b/Documentation/video4linux/w9968cf.txt index 3b704f2aae6d..0d53ce774b01 100644 --- a/Documentation/video4linux/w9968cf.txt +++ b/Documentation/video4linux/w9968cf.txt | |||
@@ -1,9 +1,9 @@ | |||
1 | 1 | ||
2 | W996[87]CF JPEG USB Dual Mode Camera Chip | 2 | W996[87]CF JPEG USB Dual Mode Camera Chip |
3 | Driver for Linux 2.6 (basic version) | 3 | Driver for Linux 2.6 (basic version) |
4 | ========================================= | 4 | ========================================= |
5 | 5 | ||
6 | - Documentation - | 6 | - Documentation - |
7 | 7 | ||
8 | 8 | ||
9 | Index | 9 | Index |
@@ -188,57 +188,57 @@ Name: ovmod_load | |||
188 | Type: bool | 188 | Type: bool |
189 | Syntax: <0|1> | 189 | Syntax: <0|1> |
190 | Description: Automatic 'ovcamchip' module loading: 0 disabled, 1 enabled. | 190 | Description: Automatic 'ovcamchip' module loading: 0 disabled, 1 enabled. |
191 | If enabled, 'insmod' searches for the required 'ovcamchip' | 191 | If enabled, 'insmod' searches for the required 'ovcamchip' |
192 | module in the system, according to its configuration, and | 192 | module in the system, according to its configuration, and |
193 | loads that module automatically. This action is performed as | 193 | loads that module automatically. This action is performed as |
194 | soon as the 'w9968cf' module is loaded into memory. | 194 | soon as the 'w9968cf' module is loaded into memory. |
195 | Default: 1 | 195 | Default: 1 |
196 | Note: The kernel must be compiled with the CONFIG_KMOD option | 196 | Note: The kernel must be compiled with the CONFIG_KMOD option |
197 | enabled for the 'ovcamchip' module to be loaded and for | 197 | enabled for the 'ovcamchip' module to be loaded and for |
198 | this parameter to be present. | 198 | this parameter to be present. |
199 | ------------------------------------------------------------------------------- | 199 | ------------------------------------------------------------------------------- |
200 | Name: simcams | 200 | Name: simcams |
201 | Type: int | 201 | Type: int |
202 | Syntax: <n> | 202 | Syntax: <n> |
203 | Description: Number of cameras allowed to stream simultaneously. | 203 | Description: Number of cameras allowed to stream simultaneously. |
204 | n may vary from 0 to 32. | 204 | n may vary from 0 to 32. |
205 | Default: 32 | 205 | Default: 32 |
206 | ------------------------------------------------------------------------------- | 206 | ------------------------------------------------------------------------------- |
207 | Name: video_nr | 207 | Name: video_nr |
208 | Type: int array (min = 0, max = 32) | 208 | Type: int array (min = 0, max = 32) |
209 | Syntax: <-1|n[,...]> | 209 | Syntax: <-1|n[,...]> |
210 | Description: Specify V4L minor mode number. | 210 | Description: Specify V4L minor mode number. |
211 | -1 = use next available | 211 | -1 = use next available |
212 | n = use minor number n | 212 | n = use minor number n |
213 | You can specify up to 32 cameras this way. | 213 | You can specify up to 32 cameras this way. |
214 | For example: | 214 | For example: |
215 | video_nr=-1,2,-1 would assign minor number 2 to the second | 215 | video_nr=-1,2,-1 would assign minor number 2 to the second |
216 | recognized camera and use auto for the first one and for every | 216 | recognized camera and use auto for the first one and for every |
217 | other camera. | 217 | other camera. |
218 | Default: -1 | 218 | Default: -1 |
219 | ------------------------------------------------------------------------------- | 219 | ------------------------------------------------------------------------------- |
220 | Name: packet_size | 220 | Name: packet_size |
221 | Type: int array (min = 0, max = 32) | 221 | Type: int array (min = 0, max = 32) |
222 | Syntax: <n[,...]> | 222 | Syntax: <n[,...]> |
223 | Description: Specify the maximum data payload size in bytes for alternate | 223 | Description: Specify the maximum data payload size in bytes for alternate |
224 | settings, for each device. n is scaled between 63 and 1023. | 224 | settings, for each device. n is scaled between 63 and 1023. |
225 | Default: 1023 | 225 | Default: 1023 |
226 | ------------------------------------------------------------------------------- | 226 | ------------------------------------------------------------------------------- |
227 | Name: max_buffers | 227 | Name: max_buffers |
228 | Type: int array (min = 0, max = 32) | 228 | Type: int array (min = 0, max = 32) |
229 | Syntax: <n[,...]> | 229 | Syntax: <n[,...]> |
230 | Description: For advanced users. | 230 | Description: For advanced users. |
231 | Specify the maximum number of video frame buffers to allocate | 231 | Specify the maximum number of video frame buffers to allocate |
232 | for each device, from 2 to 32. | 232 | for each device, from 2 to 32. |
233 | Default: 2 | 233 | Default: 2 |
234 | ------------------------------------------------------------------------------- | 234 | ------------------------------------------------------------------------------- |
235 | Name: double_buffer | 235 | Name: double_buffer |
236 | Type: bool array (min = 0, max = 32) | 236 | Type: bool array (min = 0, max = 32) |
237 | Syntax: <0|1[,...]> | 237 | Syntax: <0|1[,...]> |
238 | Description: Hardware double buffering: 0 disabled, 1 enabled. | 238 | Description: Hardware double buffering: 0 disabled, 1 enabled. |
239 | It should be enabled if you want smooth video output: if you | 239 | It should be enabled if you want smooth video output: if you |
240 | obtain out of sync. video, disable it, or try to | 240 | obtain out of sync. video, disable it, or try to |
241 | decrease the 'clockdiv' module parameter value. | 241 | decrease the 'clockdiv' module parameter value. |
242 | Default: 1 for every device. | 242 | Default: 1 for every device. |
243 | ------------------------------------------------------------------------------- | 243 | ------------------------------------------------------------------------------- |
244 | Name: clamping | 244 | Name: clamping |
@@ -251,9 +251,9 @@ Name: filter_type | |||
251 | Type: int array (min = 0, max = 32) | 251 | Type: int array (min = 0, max = 32) |
252 | Syntax: <0|1|2[,...]> | 252 | Syntax: <0|1|2[,...]> |
253 | Description: Video filter type. | 253 | Description: Video filter type. |
254 | 0 none, 1 (1-2-1) 3-tap filter, 2 (2-3-6-3-2) 5-tap filter. | 254 | 0 none, 1 (1-2-1) 3-tap filter, 2 (2-3-6-3-2) 5-tap filter. |
255 | The filter is used to reduce noise and aliasing artifacts | 255 | The filter is used to reduce noise and aliasing artifacts |
256 | produced by the CCD or CMOS image sensor. | 256 | produced by the CCD or CMOS image sensor. |
257 | Default: 0 for every device. | 257 | Default: 0 for every device. |
258 | ------------------------------------------------------------------------------- | 258 | ------------------------------------------------------------------------------- |
259 | Name: largeview | 259 | Name: largeview |
@@ -266,9 +266,9 @@ Name: upscaling | |||
266 | Type: bool array (min = 0, max = 32) | 266 | Type: bool array (min = 0, max = 32) |
267 | Syntax: <0|1[,...]> | 267 | Syntax: <0|1[,...]> |
268 | Description: Software scaling (for non-compressed video only): | 268 | Description: Software scaling (for non-compressed video only): |
269 | 0 disabled, 1 enabled. | 269 | 0 disabled, 1 enabled. |
270 | Disable it if you have a slow CPU or you don't have enough | 270 | Disable it if you have a slow CPU or you don't have enough |
271 | memory. | 271 | memory. |
272 | Default: 0 for every device. | 272 | Default: 0 for every device. |
273 | Note: If 'w9968cf-vpp' is not present, this parameter is set to 0. | 273 | Note: If 'w9968cf-vpp' is not present, this parameter is set to 0. |
274 | ------------------------------------------------------------------------------- | 274 | ------------------------------------------------------------------------------- |
@@ -276,36 +276,36 @@ Name: decompression | |||
276 | Type: int array (min = 0, max = 32) | 276 | Type: int array (min = 0, max = 32) |
277 | Syntax: <0|1|2[,...]> | 277 | Syntax: <0|1|2[,...]> |
278 | Description: Software video decompression: | 278 | Description: Software video decompression: |
279 | 0 = disables decompression | 279 | 0 = disables decompression |
280 | (doesn't allow formats needing decompression). | 280 | (doesn't allow formats needing decompression). |
281 | 1 = forces decompression | 281 | 1 = forces decompression |
282 | (allows formats needing decompression only). | 282 | (allows formats needing decompression only). |
283 | 2 = allows any permitted formats. | 283 | 2 = allows any permitted formats. |
284 | Formats supporting (de)compressed video are YUV422P and | 284 | Formats supporting (de)compressed video are YUV422P and |
285 | YUV420P/YUV420 in any resolutions where width and height are | 285 | YUV420P/YUV420 in any resolutions where width and height are |
286 | multiples of 16. | 286 | multiples of 16. |
287 | Default: 2 for every device. | 287 | Default: 2 for every device. |
288 | Note: If 'w9968cf-vpp' is not present, forcing decompression is not | 288 | Note: If 'w9968cf-vpp' is not present, forcing decompression is not |
289 | allowed; in this case this parameter is set to 2. | 289 | allowed; in this case this parameter is set to 2. |
290 | ------------------------------------------------------------------------------- | 290 | ------------------------------------------------------------------------------- |
291 | Name: force_palette | 291 | Name: force_palette |
292 | Type: int array (min = 0, max = 32) | 292 | Type: int array (min = 0, max = 32) |
293 | Syntax: <0|9|10|13|15|8|7|1|6|3|4|5[,...]> | 293 | Syntax: <0|9|10|13|15|8|7|1|6|3|4|5[,...]> |
294 | Description: Force picture palette. | 294 | Description: Force picture palette. |
295 | In order: | 295 | In order: |
296 | 0 = Off - allows any of the following formats: | 296 | 0 = Off - allows any of the following formats: |
297 | 9 = UYVY 16 bpp - Original video, compression disabled | 297 | 9 = UYVY 16 bpp - Original video, compression disabled |
298 | 10 = YUV420 12 bpp - Original video, compression enabled | 298 | 10 = YUV420 12 bpp - Original video, compression enabled |
299 | 13 = YUV422P 16 bpp - Original video, compression enabled | 299 | 13 = YUV422P 16 bpp - Original video, compression enabled |
300 | 15 = YUV420P 12 bpp - Original video, compression enabled | 300 | 15 = YUV420P 12 bpp - Original video, compression enabled |
301 | 8 = YUYV 16 bpp - Software conversion from UYVY | 301 | 8 = YUYV 16 bpp - Software conversion from UYVY |
302 | 7 = YUV422 16 bpp - Software conversion from UYVY | 302 | 7 = YUV422 16 bpp - Software conversion from UYVY |
303 | 1 = GREY 8 bpp - Software conversion from UYVY | 303 | 1 = GREY 8 bpp - Software conversion from UYVY |
304 | 6 = RGB555 16 bpp - Software conversion from UYVY | 304 | 6 = RGB555 16 bpp - Software conversion from UYVY |
305 | 3 = RGB565 16 bpp - Software conversion from UYVY | 305 | 3 = RGB565 16 bpp - Software conversion from UYVY |
306 | 4 = RGB24 24 bpp - Software conversion from UYVY | 306 | 4 = RGB24 24 bpp - Software conversion from UYVY |
307 | 5 = RGB32 32 bpp - Software conversion from UYVY | 307 | 5 = RGB32 32 bpp - Software conversion from UYVY |
308 | When not 0, this parameter will override 'decompression'. | 308 | When not 0, this parameter will override 'decompression'. |
309 | Default: 0 for every device. Initial palette is 9 (UYVY). | 309 | Default: 0 for every device. Initial palette is 9 (UYVY). |
310 | Note: If 'w9968cf-vpp' is not present, this parameter is set to 9. | 310 | Note: If 'w9968cf-vpp' is not present, this parameter is set to 9. |
311 | ------------------------------------------------------------------------------- | 311 | ------------------------------------------------------------------------------- |
@@ -313,77 +313,77 @@ Name: force_rgb | |||
313 | Type: bool array (min = 0, max = 32) | 313 | Type: bool array (min = 0, max = 32) |
314 | Syntax: <0|1[,...]> | 314 | Syntax: <0|1[,...]> |
315 | Description: Read RGB video data instead of BGR: | 315 | Description: Read RGB video data instead of BGR: |
316 | 1 = use RGB component ordering. | 316 | 1 = use RGB component ordering. |
317 | 0 = use BGR component ordering. | 317 | 0 = use BGR component ordering. |
318 | This parameter has effect when using RGBX palettes only. | 318 | This parameter has effect when using RGBX palettes only. |
319 | Default: 0 for every device. | 319 | Default: 0 for every device. |
320 | ------------------------------------------------------------------------------- | 320 | ------------------------------------------------------------------------------- |
321 | Name: autobright | 321 | Name: autobright |
322 | Type: bool array (min = 0, max = 32) | 322 | Type: bool array (min = 0, max = 32) |
323 | Syntax: <0|1[,...]> | 323 | Syntax: <0|1[,...]> |
324 | Description: Image sensor automatically changes brightness: | 324 | Description: Image sensor automatically changes brightness: |
325 | 0 = no, 1 = yes | 325 | 0 = no, 1 = yes |
326 | Default: 0 for every device. | 326 | Default: 0 for every device. |
327 | ------------------------------------------------------------------------------- | 327 | ------------------------------------------------------------------------------- |
328 | Name: autoexp | 328 | Name: autoexp |
329 | Type: bool array (min = 0, max = 32) | 329 | Type: bool array (min = 0, max = 32) |
330 | Syntax: <0|1[,...]> | 330 | Syntax: <0|1[,...]> |
331 | Description: Image sensor automatically changes exposure: | 331 | Description: Image sensor automatically changes exposure: |
332 | 0 = no, 1 = yes | 332 | 0 = no, 1 = yes |
333 | Default: 1 for every device. | 333 | Default: 1 for every device. |
334 | ------------------------------------------------------------------------------- | 334 | ------------------------------------------------------------------------------- |
335 | Name: lightfreq | 335 | Name: lightfreq |
336 | Type: int array (min = 0, max = 32) | 336 | Type: int array (min = 0, max = 32) |
337 | Syntax: <50|60[,...]> | 337 | Syntax: <50|60[,...]> |
338 | Description: Light frequency in Hz: | 338 | Description: Light frequency in Hz: |
339 | 50 for European and Asian lighting, 60 for American lighting. | 339 | 50 for European and Asian lighting, 60 for American lighting. |
340 | Default: 50 for every device. | 340 | Default: 50 for every device. |
341 | ------------------------------------------------------------------------------- | 341 | ------------------------------------------------------------------------------- |
342 | Name: bandingfilter | 342 | Name: bandingfilter |
343 | Type: bool array (min = 0, max = 32) | 343 | Type: bool array (min = 0, max = 32) |
344 | Syntax: <0|1[,...]> | 344 | Syntax: <0|1[,...]> |
345 | Description: Banding filter to reduce effects of fluorescent | 345 | Description: Banding filter to reduce effects of fluorescent |
346 | lighting: | 346 | lighting: |
347 | 0 disabled, 1 enabled. | 347 | 0 disabled, 1 enabled. |
348 | This filter tries to reduce the pattern of horizontal | 348 | This filter tries to reduce the pattern of horizontal |
349 | light/dark bands caused by some (usually fluorescent) lighting. | 349 | light/dark bands caused by some (usually fluorescent) lighting. |
350 | Default: 0 for every device. | 350 | Default: 0 for every device. |
351 | ------------------------------------------------------------------------------- | 351 | ------------------------------------------------------------------------------- |
352 | Name: clockdiv | 352 | Name: clockdiv |
353 | Type: int array (min = 0, max = 32) | 353 | Type: int array (min = 0, max = 32) |
354 | Syntax: <-1|n[,...]> | 354 | Syntax: <-1|n[,...]> |
355 | Description: Force pixel clock divisor to a specific value (for experts): | 355 | Description: Force pixel clock divisor to a specific value (for experts): |
356 | n may vary from 0 to 127. | 356 | n may vary from 0 to 127. |
357 | -1 for automatic value. | 357 | -1 for automatic value. |
358 | See also the 'double_buffer' module parameter. | 358 | See also the 'double_buffer' module parameter. |
359 | Default: -1 for every device. | 359 | Default: -1 for every device. |
360 | ------------------------------------------------------------------------------- | 360 | ------------------------------------------------------------------------------- |
361 | Name: backlight | 361 | Name: backlight |
362 | Type: bool array (min = 0, max = 32) | 362 | Type: bool array (min = 0, max = 32) |
363 | Syntax: <0|1[,...]> | 363 | Syntax: <0|1[,...]> |
364 | Description: Objects are lit from behind: | 364 | Description: Objects are lit from behind: |
365 | 0 = no, 1 = yes | 365 | 0 = no, 1 = yes |
366 | Default: 0 for every device. | 366 | Default: 0 for every device. |
367 | ------------------------------------------------------------------------------- | 367 | ------------------------------------------------------------------------------- |
368 | Name: mirror | 368 | Name: mirror |
369 | Type: bool array (min = 0, max = 32) | 369 | Type: bool array (min = 0, max = 32) |
370 | Syntax: <0|1[,...]> | 370 | Syntax: <0|1[,...]> |
371 | Description: Reverse image horizontally: | 371 | Description: Reverse image horizontally: |
372 | 0 = no, 1 = yes | 372 | 0 = no, 1 = yes |
373 | Default: 0 for every device. | 373 | Default: 0 for every device. |
374 | ------------------------------------------------------------------------------- | 374 | ------------------------------------------------------------------------------- |
375 | Name: monochrome | 375 | Name: monochrome |
376 | Type: bool array (min = 0, max = 32) | 376 | Type: bool array (min = 0, max = 32) |
377 | Syntax: <0|1[,...]> | 377 | Syntax: <0|1[,...]> |
378 | Description: The image sensor is monochrome: | 378 | Description: The image sensor is monochrome: |
379 | 0 = no, 1 = yes | 379 | 0 = no, 1 = yes |
380 | Default: 0 for every device. | 380 | Default: 0 for every device. |
381 | ------------------------------------------------------------------------------- | 381 | ------------------------------------------------------------------------------- |
382 | Name: brightness | 382 | Name: brightness |
383 | Type: long array (min = 0, max = 32) | 383 | Type: long array (min = 0, max = 32) |
384 | Syntax: <n[,...]> | 384 | Syntax: <n[,...]> |
385 | Description: Set picture brightness (0-65535). | 385 | Description: Set picture brightness (0-65535). |
386 | This parameter has no effect if 'autobright' is enabled. | 386 | This parameter has no effect if 'autobright' is enabled. |
387 | Default: 31000 for every device. | 387 | Default: 31000 for every device. |
388 | ------------------------------------------------------------------------------- | 388 | ------------------------------------------------------------------------------- |
389 | Name: hue | 389 | Name: hue |
@@ -414,23 +414,23 @@ Name: debug | |||
414 | Type: int | 414 | Type: int |
415 | Syntax: <n> | 415 | Syntax: <n> |
416 | Description: Debugging information level, from 0 to 6: | 416 | Description: Debugging information level, from 0 to 6: |
417 | 0 = none (use carefully) | 417 | 0 = none (use carefully) |
418 | 1 = critical errors | 418 | 1 = critical errors |
419 | 2 = significant information | 419 | 2 = significant information |
420 | 3 = configuration or general messages | 420 | 3 = configuration or general messages |
421 | 4 = warnings | 421 | 4 = warnings |
422 | 5 = called functions | 422 | 5 = called functions |
423 | 6 = function internals | 423 | 6 = function internals |
424 | Level 5 and 6 are useful for testing only, when only one | 424 | Level 5 and 6 are useful for testing only, when only one |
425 | device is used. | 425 | device is used. |
426 | Default: 2 | 426 | Default: 2 |
427 | ------------------------------------------------------------------------------- | 427 | ------------------------------------------------------------------------------- |
428 | Name: specific_debug | 428 | Name: specific_debug |
429 | Type: bool | 429 | Type: bool |
430 | Syntax: <0|1> | 430 | Syntax: <0|1> |
431 | Description: Enable or disable specific debugging messages: | 431 | Description: Enable or disable specific debugging messages: |
432 | 0 = print messages concerning every level <= 'debug' level. | 432 | 0 = print messages concerning every level <= 'debug' level. |
433 | 1 = print messages concerning the level indicated by 'debug'. | 433 | 1 = print messages concerning the level indicated by 'debug'. |
434 | Default: 0 | 434 | Default: 0 |
435 | ------------------------------------------------------------------------------- | 435 | ------------------------------------------------------------------------------- |
436 | 436 | ||
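The interaction between the 'debug' and 'specific_debug' parameters documented in the hunk above can be sketched as a small predicate. This is only an illustration of the documented rule, not code from the w9968cf driver; the function name should_print is hypothetical.

```c
#include <stdbool.h>

/*
 * Hypothetical helper (not taken from the driver): decide whether a message
 * tagged with 'level' (1..6) would be printed, following the documented
 * meaning of the 'debug' and 'specific_debug' module parameters.
 */
static bool should_print(int level, int debug, bool specific_debug)
{
	if (specific_debug)
		return level == debug;	/* only the level named by 'debug' */
	return level <= debug;		/* every level up to 'debug' */
}
```

With the defaults (debug=2, specific_debug=0) this admits levels 1 and 2; with specific_debug=1 it admits level 2 only.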
diff --git a/Documentation/video4linux/zc0301.txt b/Documentation/video4linux/zc0301.txt index f55262c6733b..f406f5e80046 100644 --- a/Documentation/video4linux/zc0301.txt +++ b/Documentation/video4linux/zc0301.txt | |||
@@ -1,9 +1,9 @@ | |||
1 | 1 | ||
2 | ZC0301 Image Processor and Control Chip | 2 | ZC0301 and ZC0301P Image Processor and Control Chip |
3 | Driver for Linux | 3 | Driver for Linux |
4 | ======================================= | 4 | =================================================== |
5 | 5 | ||
6 | - Documentation - | 6 | - Documentation - |
7 | 7 | ||
8 | 8 | ||
9 | Index | 9 | Index |
@@ -51,13 +51,13 @@ Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA. | |||
51 | 51 | ||
52 | 4. Overview and features | 52 | 4. Overview and features |
53 | ======================== | 53 | ======================== |
54 | This driver supports the video interface of the devices mounting the ZC0301 | 54 | This driver supports the video interface of the devices mounting the ZC0301 or |
55 | Image Processor and Control Chip. | 55 | ZC0301P Image Processors and Control Chips. |
56 | 56 | ||
57 | The driver relies on the Video4Linux2 and USB core modules. It has been | 57 | The driver relies on the Video4Linux2 and USB core modules. It has been |
58 | designed to run properly on SMP systems as well. | 58 | designed to run properly on SMP systems as well. |
59 | 59 | ||
60 | The latest version of the ZC0301 driver can be found at the following URL: | 60 | The latest version of the ZC0301[P] driver can be found at the following URL: |
61 | http://www.linux-projects.org/ | 61 | http://www.linux-projects.org/ |
62 | 62 | ||
63 | Some of the features of the driver are: | 63 | Some of the features of the driver are: |
@@ -117,7 +117,7 @@ supported by the USB Audio driver thanks to the ALSA API: | |||
117 | 117 | ||
118 | And finally: | 118 | And finally: |
119 | 119 | ||
120 | # USB Multimedia devices | 120 | # V4L USB devices |
121 | # | 121 | # |
122 | CONFIG_USB_ZC0301=m | 122 | CONFIG_USB_ZC0301=m |
123 | 123 | ||
@@ -146,46 +146,46 @@ Name: video_nr | |||
146 | Type: short array (min = 0, max = 64) | 146 | Type: short array (min = 0, max = 64) |
147 | Syntax: <-1|n[,...]> | 147 | Syntax: <-1|n[,...]> |
148 | Description: Specify V4L2 minor mode number: | 148 | Description: Specify V4L2 minor mode number: |
149 | -1 = use next available | 149 | -1 = use next available |
150 | n = use minor number n | 150 | n = use minor number n |
151 | You can specify up to 64 cameras this way. | 151 | You can specify up to 64 cameras this way. |
152 | For example: | 152 | For example: |
153 | video_nr=-1,2,-1 would assign minor number 2 to the second | 153 | video_nr=-1,2,-1 would assign minor number 2 to the second |
154 | registered camera and use auto for the first one and for every | 154 | registered camera and use auto for the first one and for every |
155 | other camera. | 155 | other camera. |
156 | Default: -1 | 156 | Default: -1 |
157 | ------------------------------------------------------------------------------- | 157 | ------------------------------------------------------------------------------- |
158 | Name: force_munmap | 158 | Name: force_munmap |
159 | Type: bool array (min = 0, max = 64) | 159 | Type: bool array (min = 0, max = 64) |
160 | Syntax: <0|1[,...]> | 160 | Syntax: <0|1[,...]> |
161 | Description: Force the application to unmap previously mapped buffer memory | 161 | Description: Force the application to unmap previously mapped buffer memory |
162 | before calling any VIDIOC_S_CROP or VIDIOC_S_FMT ioctl's. Not | 162 | before calling any VIDIOC_S_CROP or VIDIOC_S_FMT ioctl's. Not |
163 | all the applications support this feature. This parameter is | 163 | all the applications support this feature. This parameter is |
164 | specific for each detected camera. | 164 | specific for each detected camera. |
165 | 0 = do not force memory unmapping | 165 | 0 = do not force memory unmapping |
166 | 1 = force memory unmapping (save memory) | 166 | 1 = force memory unmapping (save memory) |
167 | Default: 0 | 167 | Default: 0 |
168 | ------------------------------------------------------------------------------- | 168 | ------------------------------------------------------------------------------- |
169 | Name: frame_timeout | 169 | Name: frame_timeout |
170 | Type: uint array (min = 0, max = 64) | 170 | Type: uint array (min = 0, max = 64) |
171 | Syntax: <n[,...]> | 171 | Syntax: <n[,...]> |
172 | Description: Timeout for a video frame in seconds. This parameter is | 172 | Description: Timeout for a video frame in seconds. This parameter is |
173 | specific for each detected camera. This parameter can be | 173 | specific for each detected camera. This parameter can be |
174 | changed at runtime thanks to the /sys filesystem interface. | 174 | changed at runtime thanks to the /sys filesystem interface. |
175 | Default: 2 | 175 | Default: 2 |
176 | ------------------------------------------------------------------------------- | 176 | ------------------------------------------------------------------------------- |
177 | Name: debug | 177 | Name: debug |
178 | Type: ushort | 178 | Type: ushort |
179 | Syntax: <n> | 179 | Syntax: <n> |
180 | Description: Debugging information level, from 0 to 3: | 180 | Description: Debugging information level, from 0 to 3: |
181 | 0 = none (use carefully) | 181 | 0 = none (use carefully) |
182 | 1 = critical errors | 182 | 1 = critical errors |
183 | 2 = significant information | 183 | 2 = significant information |
184 | 3 = more verbose messages | 184 | 3 = more verbose messages |
185 | Level 3 is useful for testing only, when only one device | 185 | Level 3 is useful for testing only, when only one device |
186 | is used at the same time. It also shows some more information | 186 | is used at the same time. It also shows some more information |
187 | about the hardware being detected. This module parameter can be | 187 | about the hardware being detected. This module parameter can be |
188 | changed at runtime thanks to the /sys filesystem interface. | 188 | changed at runtime thanks to the /sys filesystem interface. |
189 | Default: 2 | 189 | Default: 2 |
190 | ------------------------------------------------------------------------------- | 190 | ------------------------------------------------------------------------------- |
191 | 191 | ||
@@ -204,11 +204,25 @@ Vendor ID Product ID | |||
204 | 0x041e 0x4017 | 204 | 0x041e 0x4017 |
205 | 0x041e 0x401c | 205 | 0x041e 0x401c |
206 | 0x041e 0x401e | 206 | 0x041e 0x401e |
207 | 0x041e 0x401f | ||
208 | 0x041e 0x4022 | ||
207 | 0x041e 0x4034 | 209 | 0x041e 0x4034 |
208 | 0x041e 0x4035 | 210 | 0x041e 0x4035 |
211 | 0x041e 0x4036 | ||
212 | 0x041e 0x403a | ||
213 | 0x0458 0x7007 | ||
214 | 0x0458 0x700c | ||
215 | 0x0458 0x700f | ||
216 | 0x046d 0x08ae | ||
217 | 0x055f 0xd003 | ||
218 | 0x055f 0xd004 | ||
209 | 0x046d 0x08ae | 219 | 0x046d 0x08ae |
210 | 0x0ac8 0x0301 | 220 | 0x0ac8 0x0301 |
221 | 0x0ac8 0x301b | ||
222 | 0x0ac8 0x303b | ||
223 | 0x10fd 0x0128 | ||
211 | 0x10fd 0x8050 | 224 | 0x10fd 0x8050 |
225 | 0x10fd 0x804e | ||
212 | 226 | ||
213 | The list above does not imply that all those devices work with this driver: up | 227 | The list above does not imply that all those devices work with this driver: up |
214 | until now only the ones that mount the following image sensors are supported; | 228 | until now only the ones that mount the following image sensors are supported; |
@@ -217,6 +231,7 @@ kernel messages will always tell you whether this is the case: | |||
217 | Model Manufacturer | 231 | Model Manufacturer |
218 | ----- ------------ | 232 | ----- ------------ |
219 | PAS202BCB PixArt Imaging, Inc. | 233 | PAS202BCB PixArt Imaging, Inc. |
234 | PB-0330 Photobit Corporation | ||
220 | 235 | ||
221 | 236 | ||
222 | 9. Notes for V4L2 application developers | 237 | 9. Notes for V4L2 application developers |
@@ -250,5 +265,6 @@ the fingerprint is: '88E8 F32F 7244 68BA 3958 5D40 99DA 5D2A FCE6 35A4'. | |||
250 | been taken from the documentation of the ZC030x Video4Linux1 driver written | 265 | been taken from the documentation of the ZC030x Video4Linux1 driver written |
251 | by Andrew Birkett <andy@nobugs.org>; | 266 | by Andrew Birkett <andy@nobugs.org>; |
252 | - The initialization values of the ZC0301 controller connected to the PAS202BCB | 267 | - The initialization values of the ZC0301 controller connected to the PAS202BCB |
253 | image sensor have been taken from the SPCA5XX driver maintained by | 268 | and PB-0330 image sensors have been taken from the SPCA5XX driver maintained |
254 | Michel Xhaard <mxhaard@magic.fr>. | 269 | by Michel Xhaard <mxhaard@magic.fr>; |
270 | - Stanislav Lechev donated one camera. | ||
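The 'video_nr' entry in the zc0301.txt hunk above describes a per-camera list in which missing entries mean "use next available". A minimal sketch of that lookup follows, assuming cameras are numbered from 0 in registration order; requested_minor is a hypothetical name, not a function of the driver.

```c
#include <stddef.h>

/*
 * Hypothetical helper: return the V4L2 minor number requested for the
 * camera registered at position 'index' (0-based), or -1 ("use next
 * available") when the video_nr list has no entry for that camera.
 */
static short requested_minor(const short *video_nr, size_t count, size_t index)
{
	return index < count ? video_nr[index] : -1;
}
```

For video_nr=-1,2,-1 the second registered camera (index 1) gets minor number 2, and every other camera falls back to automatic assignment.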
diff --git a/Documentation/watchdog/pcwd-watchdog.txt b/Documentation/watchdog/pcwd-watchdog.txt index 12187a33e310..d9ee6336c1d4 100644 --- a/Documentation/watchdog/pcwd-watchdog.txt +++ b/Documentation/watchdog/pcwd-watchdog.txt | |||
@@ -22,78 +22,9 @@ | |||
22 | to run the program with an "&" to run it in the background!) | 22 | to run the program with an "&" to run it in the background!) |
23 | 23 | ||
24 | If you want to write a program to be compatible with the PC Watchdog | 24 | If you want to write a program to be compatible with the PC Watchdog |
25 | driver, simply do the following: | 25 | driver, simply use or modify the watchdog test program: |
26 | 26 | Documentation/watchdog/src/watchdog-test.c | |
27 | -- Snippet of code -- | 27 | |
28 | /* | ||
29 | * Watchdog Driver Test Program | ||
30 | */ | ||
31 | |||
32 | #include <stdio.h> | ||
33 | #include <stdlib.h> | ||
34 | #include <string.h> | ||
35 | #include <unistd.h> | ||
36 | #include <fcntl.h> | ||
37 | #include <sys/ioctl.h> | ||
38 | #include <linux/types.h> | ||
39 | #include <linux/watchdog.h> | ||
40 | |||
41 | int fd; | ||
42 | |||
43 | /* | ||
44 | * This function simply sends an IOCTL to the driver, which in turn ticks | ||
45 | * the PC Watchdog card to reset its internal timer so it doesn't trigger | ||
46 | * a computer reset. | ||
47 | */ | ||
48 | void keep_alive(void) | ||
49 | { | ||
50 | int dummy; | ||
51 | |||
52 | ioctl(fd, WDIOC_KEEPALIVE, &dummy); | ||
53 | } | ||
54 | |||
55 | /* | ||
56 | * The main program. Run the program with "-d" to disable the card, | ||
57 | * or "-e" to enable the card. | ||
58 | */ | ||
59 | int main(int argc, char *argv[]) | ||
60 | { | ||
61 | fd = open("/dev/watchdog", O_WRONLY); | ||
62 | |||
63 | if (fd == -1) { | ||
64 | fprintf(stderr, "Watchdog device not enabled.\n"); | ||
65 | fflush(stderr); | ||
66 | exit(-1); | ||
67 | } | ||
68 | |||
69 | if (argc > 1) { | ||
70 | if (!strncasecmp(argv[1], "-d", 2)) { | ||
71 | ioctl(fd, WDIOC_SETOPTIONS, WDIOS_DISABLECARD); | ||
72 | fprintf(stderr, "Watchdog card disabled.\n"); | ||
73 | fflush(stderr); | ||
74 | exit(0); | ||
75 | } else if (!strncasecmp(argv[1], "-e", 2)) { | ||
76 | ioctl(fd, WDIOC_SETOPTIONS, WDIOS_ENABLECARD); | ||
77 | fprintf(stderr, "Watchdog card enabled.\n"); | ||
78 | fflush(stderr); | ||
79 | exit(0); | ||
80 | } else { | ||
81 | fprintf(stderr, "-d to disable, -e to enable.\n"); | ||
82 | fprintf(stderr, "run by itself to tick the card.\n"); | ||
83 | fflush(stderr); | ||
84 | exit(0); | ||
85 | } | ||
86 | } else { | ||
87 | fprintf(stderr, "Watchdog Ticking Away!\n"); | ||
88 | fflush(stderr); | ||
89 | } | ||
90 | |||
91 | while(1) { | ||
92 | keep_alive(); | ||
93 | sleep(1); | ||
94 | } | ||
95 | } | ||
96 | -- End snippet -- | ||
97 | 28 | ||
98 | Other IOCTL functions include: | 29 | Other IOCTL functions include: |
99 | 30 | ||
diff --git a/Documentation/watchdog/src/watchdog-simple.c b/Documentation/watchdog/src/watchdog-simple.c new file mode 100644 index 000000000000..85cf17c48669 --- /dev/null +++ b/Documentation/watchdog/src/watchdog-simple.c | |||
@@ -0,0 +1,17 @@ | |||
1 | #include <stdio.h> | ||
2 | #include <stdlib.h> | ||
3 | #include <unistd.h> | ||
4 | #include <fcntl.h> | ||
5 | |||
6 | int main(int argc, const char *argv[]) { | ||
7 | int fd = open("/dev/watchdog", O_WRONLY); | ||
8 | if (fd == -1) { | ||
9 | perror("watchdog"); | ||
10 | exit(1); | ||
11 | } | ||
12 | while (1) { | ||
13 | write(fd, "\0", 1); | ||
14 | fsync(fd); | ||
15 | sleep(10); | ||
16 | } | ||
17 | } | ||
diff --git a/Documentation/watchdog/src/watchdog-test.c b/Documentation/watchdog/src/watchdog-test.c new file mode 100644 index 000000000000..65f6c19cb865 --- /dev/null +++ b/Documentation/watchdog/src/watchdog-test.c | |||
@@ -0,0 +1,68 @@ | |||
1 | /* | ||
2 | * Watchdog Driver Test Program | ||
3 | */ | ||
4 | |||
5 | #include <stdio.h> | ||
6 | #include <stdlib.h> | ||
7 | #include <string.h> | ||
8 | #include <unistd.h> | ||
9 | #include <fcntl.h> | ||
10 | #include <sys/ioctl.h> | ||
11 | #include <linux/types.h> | ||
12 | #include <linux/watchdog.h> | ||
13 | |||
14 | int fd; | ||
15 | |||
16 | /* | ||
17 | * This function simply sends an IOCTL to the driver, which in turn ticks | ||
18 | * the PC Watchdog card to reset its internal timer so it doesn't trigger | ||
19 | * a computer reset. | ||
20 | */ | ||
21 | void keep_alive(void) | ||
22 | { | ||
23 | int dummy; | ||
24 | |||
25 | ioctl(fd, WDIOC_KEEPALIVE, &dummy); | ||
26 | } | ||
27 | |||
28 | /* | ||
29 | * The main program. Run the program with "-d" to disable the card, | ||
30 | * or "-e" to enable the card. | ||
31 | */ | ||
32 | int main(int argc, char *argv[]) | ||
33 | { | ||
34 | fd = open("/dev/watchdog", O_WRONLY); | ||
35 | |||
36 | if (fd == -1) { | ||
37 | fprintf(stderr, "Watchdog device not enabled.\n"); | ||
38 | fflush(stderr); | ||
39 | exit(-1); | ||
40 | } | ||
41 | |||
42 | if (argc > 1) { | ||
43 | if (!strncasecmp(argv[1], "-d", 2)) { | ||
44 | ioctl(fd, WDIOC_SETOPTIONS, WDIOS_DISABLECARD); | ||
45 | fprintf(stderr, "Watchdog card disabled.\n"); | ||
46 | fflush(stderr); | ||
47 | exit(0); | ||
48 | } else if (!strncasecmp(argv[1], "-e", 2)) { | ||
49 | ioctl(fd, WDIOC_SETOPTIONS, WDIOS_ENABLECARD); | ||
50 | fprintf(stderr, "Watchdog card enabled.\n"); | ||
51 | fflush(stderr); | ||
52 | exit(0); | ||
53 | } else { | ||
54 | fprintf(stderr, "-d to disable, -e to enable.\n"); | ||
55 | fprintf(stderr, "run by itself to tick the card.\n"); | ||
56 | fflush(stderr); | ||
57 | exit(0); | ||
58 | } | ||
59 | } else { | ||
60 | fprintf(stderr, "Watchdog Ticking Away!\n"); | ||
61 | fflush(stderr); | ||
62 | } | ||
63 | |||
64 | while(1) { | ||
65 | keep_alive(); | ||
66 | sleep(1); | ||
67 | } | ||
68 | } | ||
diff --git a/Documentation/watchdog/watchdog-api.txt b/Documentation/watchdog/watchdog-api.txt index 21ed51173662..958ff3d48be3 100644 --- a/Documentation/watchdog/watchdog-api.txt +++ b/Documentation/watchdog/watchdog-api.txt | |||
@@ -34,22 +34,7 @@ activates as soon as /dev/watchdog is opened and will reboot unless | |||
34 | the watchdog is pinged within a certain time, this time is called the | 34 | the watchdog is pinged within a certain time, this time is called the |
35 | timeout or margin. The simplest way to ping the watchdog is to write | 35 | timeout or margin. The simplest way to ping the watchdog is to write |
36 | some data to the device. So a very simple watchdog daemon would look | 36 | some data to the device. So a very simple watchdog daemon would look |
37 | like this: | 37 | like this source file: see Documentation/watchdog/src/watchdog-simple.c |
38 | |||
39 | #include <stdlib.h> | ||
40 | #include <fcntl.h> | ||
41 | |||
42 | int main(int argc, const char *argv[]) { | ||
43 | int fd=open("/dev/watchdog",O_WRONLY); | ||
44 | if (fd==-1) { | ||
45 | perror("watchdog"); | ||
46 | exit(1); | ||
47 | } | ||
48 | while(1) { | ||
49 | write(fd, "\0", 1); | ||
50 | sleep(10); | ||
51 | } | ||
52 | } | ||
53 | 38 | ||
54 | A more advanced driver could for example check that a HTTP server is | 39 | A more advanced driver could for example check that a HTTP server is |
55 | still responding before doing the write call to ping the watchdog. | 40 | still responding before doing the write call to ping the watchdog. |
@@ -110,7 +95,40 @@ current timeout using the GETTIMEOUT ioctl. | |||
110 | ioctl(fd, WDIOC_GETTIMEOUT, &timeout); | 95 | ioctl(fd, WDIOC_GETTIMEOUT, &timeout); |
111 | printf("The timeout is %d seconds\n", timeout); | 96 | printf("The timeout is %d seconds\n", timeout); |
112 | 97 | ||
113 | Envinronmental monitoring: | 98 | Pretimeouts: |
99 | |||
100 | Some watchdog timers can be set to have a trigger go off before the | ||
101 | actual time they will reset the system. This can be done with an NMI, | ||
102 | interrupt, or other mechanism. This allows Linux to record useful | ||
103 | information (like panic information and kernel coredumps) before it | ||
104 | resets. | ||
105 | |||
106 | pretimeout = 10; | ||
107 | ioctl(fd, WDIOC_SETPRETIMEOUT, &pretimeout); | ||
108 | |||
109 | Note that the pretimeout is the number of seconds before the time | ||
110 | when the timeout will go off. It is not the number of seconds until | ||
111 | the pretimeout. So, for instance, if you set the timeout to 60 seconds | ||
112 | and the pretimeout to 10 seconds, the pretimeout will go off in 50 | ||
113 | seconds. Setting a pretimeout to zero disables it. | ||
114 | |||
115 | There is also a get function for getting the pretimeout: | ||
116 | |||
117 | ioctl(fd, WDIOC_GETPRETIMEOUT, &pretimeout); | ||
118 | printf("The pretimeout is %d seconds\n", pretimeout); | ||
119 | |||
120 | Not all watchdog drivers will support a pretimeout. | ||
121 | |||
122 | Get the number of seconds before reboot: | ||
123 | |||
124 | Some watchdog drivers have the ability to report the remaining time | ||
125 | before the system will reboot. The WDIOC_GETTIMELEFT is the ioctl | ||
126 | that returns the number of seconds before reboot. | ||
127 | |||
128 | ioctl(fd, WDIOC_GETTIMELEFT, &timeleft); | ||
129 | printf("The time left is %d seconds\n", timeleft); | ||
130 | |||
131 | Environmental monitoring: | ||
114 | 132 | ||
115 | All watchdog drivers are required to return more information about the system, | 133 | All watchdog drivers are required to return more information about the system, |
116 | some do temperature, fan and power level monitoring, some can tell you | 134 | some do temperature, fan and power level monitoring, some can tell you |
@@ -169,6 +187,10 @@ The watchdog saw a keepalive ping since it was last queried. | |||
169 | 187 | ||
170 | WDIOF_SETTIMEOUT Can set/get the timeout | 188 | WDIOF_SETTIMEOUT Can set/get the timeout |
171 | 189 | ||
190 | The watchdog can do pretimeouts. | ||
191 | |||
192 | WDIOF_PRETIMEOUT Pretimeout (in seconds), get/set | ||
193 | |||
172 | 194 | ||
173 | For those drivers that return any bits set in the option field, the | 195 | For those drivers that return any bits set in the option field, the |
174 | GETSTATUS and GETBOOTSTATUS ioctls can be used to ask for the current | 196 | GETSTATUS and GETBOOTSTATUS ioctls can be used to ask for the current |
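The pretimeout arithmetic described in the watchdog-api.txt hunk above (the pretimeout counts back from the timeout, not forward from the last ping) can be sketched as follows. The function name is hypothetical, and treating a pretimeout that is not smaller than the timeout as invalid is an assumption of this sketch, not something the text states.

```c
/*
 * Hypothetical helper: number of seconds after the last keepalive ping at
 * which the pretimeout event fires, or -1 when the pretimeout is disabled
 * (zero) or, by assumption of this sketch, not smaller than the timeout.
 */
static int pretimeout_fires_after(int timeout, int pretimeout)
{
	if (pretimeout <= 0 || pretimeout >= timeout)
		return -1;
	return timeout - pretimeout;
}
```

With a timeout of 60 seconds and a pretimeout of 10 seconds, this gives 50, matching the example in the text.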
diff --git a/Documentation/watchdog/watchdog.txt b/Documentation/watchdog/watchdog.txt index dffda29c8799..4b1ff69cc19a 100644 --- a/Documentation/watchdog/watchdog.txt +++ b/Documentation/watchdog/watchdog.txt | |||
@@ -65,28 +65,7 @@ The external event interfaces on the WDT boards are not currently supported. | |||
65 | Minor numbers are however allocated for it. | 65 | Minor numbers are however allocated for it. |
66 | 66 | ||
67 | 67 | ||
68 | Example Watchdog Driver | 68 | Example Watchdog Driver: see Documentation/watchdog/src/watchdog-simple.c |
69 | ----------------------- | ||
70 | |||
71 | #include <stdio.h> | ||
72 | #include <unistd.h> | ||
73 | #include <fcntl.h> | ||
74 | |||
75 | int main(int argc, const char *argv[]) | ||
76 | { | ||
77 | int fd=open("/dev/watchdog",O_WRONLY); | ||
78 | if(fd==-1) | ||
79 | { | ||
80 | perror("watchdog"); | ||
81 | exit(1); | ||
82 | } | ||
83 | while(1) | ||
84 | { | ||
85 | write(fd,"\0",1); | ||
86 | fsync(fd); | ||
87 | sleep(10); | ||
88 | } | ||
89 | } | ||
90 | 69 | ||
91 | 70 | ||
92 | Contact Information | 71 | Contact Information |
diff --git a/Documentation/x86_64/boot-options.txt b/Documentation/x86_64/boot-options.txt index f2cd6ef53ff3..74b77f9e91bc 100644 --- a/Documentation/x86_64/boot-options.txt +++ b/Documentation/x86_64/boot-options.txt | |||
@@ -199,12 +199,38 @@ IOMMU | |||
199 | allowed overwrite iommu off workarounds for specific chipsets. | 199 | allowed overwrite iommu off workarounds for specific chipsets. |
200 | soft Use software bounce buffering (default for Intel machines) | 200 | soft Use software bounce buffering (default for Intel machines) |
201 | noaperture Don't touch the aperture for AGP. | 201 | noaperture Don't touch the aperture for AGP. |
202 | allowdac Allow DMA >4GB | ||
203 | When off, all DMA >4GB is forced through an IOMMU or bounce | ||
204 | buffering. | ||
205 | nodac Forbid DMA >4GB | ||
206 | panic Always panic when IOMMU overflows | ||
202 | 207 | ||
203 | swiotlb=pages[,force] | 208 | swiotlb=pages[,force] |
204 | 209 | ||
205 | pages Prereserve that many 128K pages for the software IO bounce buffering. | 210 | pages Prereserve that many 128K pages for the software IO bounce buffering. |
206 | force Force all IO through the software TLB. | 211 | force Force all IO through the software TLB. |
207 | 212 | ||
213 | calgary=[64k,128k,256k,512k,1M,2M,4M,8M] | ||
214 | calgary=[translate_empty_slots] | ||
215 | calgary=[disable=<PCI bus number>] | ||
216 | |||
217 | 64k,...,8M - Set the size of each PCI slot's translation table | ||
218 | when using the Calgary IOMMU. This is the size of the translation | ||
219 | table itself in main memory. The smallest table, 64k, covers an IO | ||
220 | space of 32MB; the largest, 8MB table, can cover an IO space of | ||
221 | 4GB. Normally the kernel will make the right choice by itself. | ||
222 | |||
223 | translate_empty_slots - Enable translation even on slots that have | ||
224 | no devices attached to them, in case a device will be hotplugged | ||
225 | in the future. | ||
226 | |||
227 | disable=<PCI bus number> - Disable translation on a given PHB. For | ||
228 | example, the built-in graphics adapter resides on the first bridge | ||
229 | (PCI bus number 0); if translation (isolation) is enabled on this | ||
230 | bridge, X servers that access the hardware directly from user | ||
231 | space might stop working. Use this option if you have devices that | ||
232 | are accessed from userspace directly on some PCI host bridge. | ||
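
The table sizes quoted for the calgary= option (a 64k table covering 32MB of IO space, an 8M table covering 4GB) are consistent with 8-byte table entries that each map one 4K IO page. Neither figure is stated in the text; both are assumptions made for this sketch, and the function name is hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Assumptions, not taken from the text: 8-byte entries, 4K IO pages. */
#define TABLE_ENTRY_BYTES 8ULL
#define IO_PAGE_BYTES     4096ULL

/* IO space, in bytes, that a translation table of table_bytes can map. */
static uint64_t calgary_io_space(uint64_t table_bytes)
{
	/* A table holds a whole number of entries. */
	assert(table_bytes % TABLE_ENTRY_BYTES == 0);
	return (table_bytes / TABLE_ENTRY_BYTES) * IO_PAGE_BYTES;
}
```

Under these assumptions, calgary_io_space(64 << 10) yields 32MB and calgary_io_space(8 << 20) yields 4GB, matching the two endpoints given above.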
233 | |||
208 | Debugging | 234 | Debugging |
209 | 235 | ||
210 | oops=panic Always panic on oopses. Default is to just kill the process, | 236 | oops=panic Always panic on oopses. Default is to just kill the process, |
@@ -217,6 +243,20 @@ Debugging | |||
217 | pagefaulttrace Dump all page faults. Only useful for extreme debugging | 243 | pagefaulttrace Dump all page faults. Only useful for extreme debugging |
218 | and will create a lot of output. | 244 | and will create a lot of output. |
219 | 245 | ||
246 | call_trace=[old|both|newfallback|new] | ||
247 | old: use old inexact backtracer | ||
248 | new: use new exact dwarf2 unwinder | ||
249 | both: print entries from both | ||
250 | newfallback: use new unwinder but fall back to old if it gets | ||
251 | stuck (default) | ||
252 | |||
220 | Misc | 260 | Misc |
221 | 261 | ||
222 | noreplacement Don't replace instructions with more appropriate ones | 262 | noreplacement Don't replace instructions with more appropriate ones |
diff --git a/Documentation/x86_64/kernel-stacks b/Documentation/x86_64/kernel-stacks new file mode 100644 index 000000000000..bddfddd466ab --- /dev/null +++ b/Documentation/x86_64/kernel-stacks | |||
@@ -0,0 +1,99 @@ | |||
1 | Most of the text from Keith Owens, hacked by AK | ||
2 | |||
3 | x86_64 page size (PAGE_SIZE) is 4K. | ||
4 | |||
5 | Like all other architectures, x86_64 has a kernel stack for every | ||
6 | active thread. These thread stacks are THREAD_SIZE (2*PAGE_SIZE) big. | ||
7 | These stacks contain useful data as long as a thread is alive or a | ||
8 | zombie. While the thread is in user space the kernel stack is empty | ||
9 | except for the thread_info structure at the bottom. | ||
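
Because thread_info sits at the bottom of the thread stack, its address can be derived from any stack pointer inside that stack by masking off the low bits. The sketch below assumes that each thread stack is aligned to THREAD_SIZE, which the text does not state; stack_base() is a hypothetical helper, not a kernel function:

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE   4096UL
#define THREAD_SIZE (2 * PAGE_SIZE)

/*
 * Lowest address of the thread stack that contains sp.
 * Assumption: stacks are THREAD_SIZE-aligned, so the bottom of the
 * stack (where thread_info lives) is sp with the low bits cleared.
 */
static uintptr_t stack_base(uintptr_t sp)
{
	/* The mask trick only works for power-of-two sizes. */
	assert((THREAD_SIZE & (THREAD_SIZE - 1)) == 0);
	return sp & ~(uintptr_t)(THREAD_SIZE - 1);
}
```

Any stack pointer within the same 8K stack maps to the same base, so no separate pointer to thread_info needs to be stored.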
10 | |||
11 | In addition to the per thread stacks, there are specialized stacks | ||
12 | associated with each cpu. These stacks are only used while the kernel | ||
13 | is in control on that cpu; when a cpu returns to user space the | ||
14 | specialized stacks contain no useful data. The main cpu stack is: | ||
15 | |||
16 | * Interrupt stack. IRQSTACKSIZE | ||
17 | |||
18 | Used for external hardware interrupts. If this is the first external | ||
19 | hardware interrupt (i.e. not a nested hardware interrupt) then the | ||
20 | kernel switches from the current task to the interrupt stack. Like | ||
21 | the split thread and interrupt stacks on i386 (with CONFIG_4KSTACKS), | ||
22 | this gives more room for kernel interrupt processing without having | ||
23 | to increase the size of every per thread stack. | ||
24 | |||
25 | The interrupt stack is also used when processing a softirq. | ||
26 | |||
27 | Switching to the kernel interrupt stack is done by software based on a | ||
28 | per CPU interrupt nest counter. This is needed because x86-64 "IST" | ||
29 | hardware stacks cannot nest without races. | ||
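
The software switch driven by a per CPU nest counter can be sketched as follows. This is a simplified user-space model, not the code in entry.S; the structure and function names are hypothetical. Only the outermost interrupt switches stacks, and nested interrupts stay on the interrupt stack they arrived on:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-cpu state; names are illustrative only. */
struct cpu_irq_state {
	int  nest;          /* number of hardware interrupts in progress */
	bool on_irq_stack;  /* true while running on the interrupt stack */
};

/* Called on interrupt entry. Returns true if a stack switch happened. */
static bool irq_stack_enter(struct cpu_irq_state *cpu)
{
	if (cpu->nest++ == 0) {
		cpu->on_irq_stack = true;   /* first level: leave the task stack */
		return true;
	}
	return false;                       /* nested: already on the irq stack */
}

/* Called on interrupt exit. Returns true if we switched back. */
static bool irq_stack_exit(struct cpu_irq_state *cpu)
{
	assert(cpu->nest > 0);              /* exit must pair with an enter */
	if (--cpu->nest == 0) {
		cpu->on_irq_stack = false;  /* outermost exit: back to task stack */
		return true;
	}
	return false;
}
```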
30 | |||
31 | x86_64 also has a feature which is not available on i386, the ability | ||
32 | to automatically switch to a new stack for designated events such as | ||
33 | double fault or NMI, which makes it easier to handle these unusual | ||
34 | events on x86_64. This feature is called the Interrupt Stack Table | ||
35 | (IST). There can be up to 7 IST entries per cpu. The IST code is an | ||
36 | index into the Task State Segment (TSS). The IST entries in the TSS | ||
37 | point to dedicated stacks; each stack can be a different size. | ||
38 | |||
39 | An IST is selected by a non-zero value in the IST field of an | ||
40 | interrupt-gate descriptor. When an interrupt occurs and the hardware | ||
41 | loads such a descriptor, the hardware automatically sets the new stack | ||
42 | pointer based on the IST value, then invokes the interrupt handler. If | ||
43 | software wants to allow nested IST interrupts then the handler must | ||
44 | adjust the IST values on entry to and exit from the interrupt handler. | ||
45 | (This is occasionally done, e.g. for debug exceptions.) | ||
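
The selection rule described above can be modelled in a few lines of C. This is a simplified sketch, not the hardware descriptor layout: the structures hold only the fields needed here, all names are hypothetical, and stack switches caused by a privilege-level change are deliberately ignored:

```c
#include <assert.h>
#include <stdint.h>

#define IST_ENTRIES 7   /* up to 7 IST entries per cpu */

/* Simplified stand-ins for the TSS and an interrupt-gate descriptor. */
struct tss_sketch {
	uint64_t ist[IST_ENTRIES];  /* top-of-stack pointers, IST1..IST7 */
};

struct gate_sketch {
	unsigned ist;               /* 0 = no IST switch, 1..7 = IST index */
};

/* Stack pointer the handler starts on for the given gate. */
static uint64_t stack_for_gate(const struct tss_sketch *tss,
			       const struct gate_sketch *gate,
			       uint64_t current_rsp)
{
	assert(gate->ist <= IST_ENTRIES);
	if (gate->ist == 0)
		return current_rsp;         /* zero: stay on the current stack */
	return tss->ist[gate->ist - 1];     /* non-zero: load the IST stack */
}
```

Note that a second event with the same IST value would receive the same starting stack pointer and overwrite the first frame, which is why a handler that wants same-code nesting has to adjust the IST value on entry and exit.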
46 | |||
47 | Events with different IST codes (i.e. with different stacks) can be | ||
48 | nested. For example, a debug interrupt can safely be interrupted by an | ||
49 | NMI. arch/x86_64/kernel/entry.S::paranoidentry adjusts the stack | ||
50 | pointers on entry to and exit from all IST events, in theory allowing | ||
51 | IST events with the same code to be nested. However in most cases, the | ||
52 | stack size allocated to an IST assumes no nesting for the same code. | ||
53 | If that assumption is ever broken then the stacks will become corrupt. | ||
54 | |||
55 | The currently assigned IST stacks are: | ||
56 | |||
57 | * STACKFAULT_STACK. EXCEPTION_STKSZ (PAGE_SIZE). | ||
58 | |||
59 | Used for interrupt 12 - Stack Fault Exception (#SS). | ||
60 | |||
61 | This allows the kernel to recover from invalid stack segments. This | ||
62 | rarely happens. | ||
63 | |||
64 | * DOUBLEFAULT_STACK. EXCEPTION_STKSZ (PAGE_SIZE). | ||
65 | |||
66 | Used for interrupt 8 - Double Fault Exception (#DF). | ||
67 | |||
68 | Invoked when handling an exception causes another exception. Happens | ||
69 | when the kernel is very confused (e.g. kernel stack pointer corrupt). | ||
70 | Using a separate stack allows the kernel to recover well enough in | ||
71 | many cases to still output an oops. | ||
72 | |||
73 | * NMI_STACK. EXCEPTION_STKSZ (PAGE_SIZE). | ||
74 | |||
75 | Used for non-maskable interrupts (NMI). | ||
76 | |||
77 | NMI can be delivered at any time, including when the kernel is in the | ||
78 | middle of switching stacks. Using IST for NMI events avoids making | ||
79 | assumptions about the previous state of the kernel stack. | ||
80 | |||
81 | * DEBUG_STACK. DEBUG_STKSZ | ||
82 | |||
83 | Used for hardware debug interrupts (interrupt 1) and for software | ||
84 | debug interrupts (INT3). | ||
85 | |||
86 | When debugging a kernel, debug interrupts (both hardware and | ||
87 | software) can occur at any time. Using IST for these interrupts | ||
88 | avoids making assumptions about the previous state of the kernel | ||
89 | stack. | ||
90 | |||
91 | * MCE_STACK. EXCEPTION_STKSZ (PAGE_SIZE). | ||
92 | |||
93 | Used for interrupt 18 - Machine Check Exception (#MC). | ||
94 | |||
95 | MCE can be delivered at any time, including when the kernel is in the | ||
96 | middle of switching stacks. Using IST for MCE events avoids making | ||
97 | assumptions about the previous state of the kernel stack. | ||
98 | |||
99 | For more details see the Intel IA32 or AMD AMD64 architecture manuals. | ||