Diffstat (limited to 'Documentation')
73 files changed, 5432 insertions, 343 deletions
diff --git a/Documentation/00-INDEX b/Documentation/00-INDEX
index d05737aaa84b..06b982affe76 100644
--- a/Documentation/00-INDEX
+++ b/Documentation/00-INDEX
@@ -82,6 +82,8 @@ block/
82 | - info on the Block I/O (BIO) layer. | 82 | - info on the Block I/O (BIO) layer. |
83 | blockdev/ | 83 | blockdev/ |
84 | - info on block devices & drivers | 84 | - info on block devices & drivers |
85 | btmrvl.txt | ||
86 | - info on Marvell Bluetooth driver usage. | ||
85 | cachetlb.txt | 87 | cachetlb.txt |
86 | - describes the cache/TLB flushing interfaces Linux uses. | 88 | - describes the cache/TLB flushing interfaces Linux uses. |
87 | cdrom/ | 89 | cdrom/ |
diff --git a/Documentation/ABI/testing/sysfs-bus-pci b/Documentation/ABI/testing/sysfs-bus-pci
index 6bf68053e4b8..25be3250f7d6 100644
--- a/Documentation/ABI/testing/sysfs-bus-pci
+++ b/Documentation/ABI/testing/sysfs-bus-pci
@@ -84,6 +84,16 @@ Description:
84 | from this part of the device tree. | 84 | from this part of the device tree. |
85 | Depends on CONFIG_HOTPLUG. | 85 | Depends on CONFIG_HOTPLUG. |
86 | 86 | ||
87 | What: /sys/bus/pci/devices/.../reset | ||
88 | Date: July 2009 | ||
89 | Contact: Michael S. Tsirkin <mst@redhat.com> | ||
90 | Description: | ||
91 | Some devices allow an individual function to be reset | ||
92 | without affecting other functions in the same device. | ||
93 | For devices that have this support, a file named reset | ||
94 | will be present in sysfs. Writing 1 to this file | ||
95 | will perform reset. | ||
96 | |||
87 | What: /sys/bus/pci/devices/.../vpd | 97 | What: /sys/bus/pci/devices/.../vpd |
88 | Date: February 2008 | 98 | Date: February 2008 |
89 | Contact: Ben Hutchings <bhutchings@solarflare.com> | 99 | Contact: Ben Hutchings <bhutchings@solarflare.com> |
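The reset attribute above is driven by an ordinary write. The C sketch below assumes the device directory is named by its domain:bus:device.function address (such as 0000:00:19.0, the address used in the uio_pci_generic example in this diff); pci_reset_path() and pci_function_reset() are hypothetical helper names. Writing the file normally requires root, and, as the ABI entry says, the file is only present for devices that support an individual function reset.

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Build "/sys/bus/pci/devices/<bdf>/reset" into buf.
 * Returns 0 on success, -1 if the path would not fit. */
int pci_reset_path(char *buf, size_t len, const char *bdf)
{
	int n = snprintf(buf, len, "/sys/bus/pci/devices/%s/reset", bdf);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Reset one PCI function by writing "1" to its sysfs reset file.
 * Returns 0 on success and -1 on failure; when open() or write()
 * fails, errno says why (e.g. ENOENT if there is no reset file). */
int pci_function_reset(const char *bdf)
{
	char path[256];
	ssize_t n;
	int fd;

	if (pci_reset_path(path, sizeof(path), bdf) < 0)
		return -1;
	fd = open(path, O_WRONLY);
	if (fd < 0)
		return -1;
	n = write(fd, "1", 1);
	close(fd);
	return n == 1 ? 0 : -1;
}
```

For example, pci_function_reset("0000:00:19.0") would reset that function, provided its reset file exists.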
diff --git a/Documentation/DocBook/uio-howto.tmpl b/Documentation/DocBook/uio-howto.tmpl
index 8f6e3b2403c7..4d4ce0e61e42 100644
--- a/Documentation/DocBook/uio-howto.tmpl
+++ b/Documentation/DocBook/uio-howto.tmpl
@@ -25,6 +25,10 @@
25 | <year>2006-2008</year> | 25 | <year>2006-2008</year> |
26 | <holder>Hans-Jürgen Koch.</holder> | 26 | <holder>Hans-Jürgen Koch.</holder> |
27 | </copyright> | 27 | </copyright> |
28 | <copyright> | ||
29 | <year>2009</year> | ||
30 | <holder>Red Hat Inc, Michael S. Tsirkin (mst@redhat.com)</holder> | ||
31 | </copyright> | ||
28 | 32 | ||
29 | <legalnotice> | 33 | <legalnotice> |
30 | <para> | 34 | <para> |
@@ -42,6 +46,13 @@ GPL version 2.
42 | 46 | ||
43 | <revhistory> | 47 | <revhistory> |
44 | <revision> | 48 | <revision> |
49 | <revnumber>0.9</revnumber> | ||
50 | <date>2009-07-16</date> | ||
51 | <authorinitials>mst</authorinitials> | ||
52 | <revremark>Added generic pci driver | ||
53 | </revremark> | ||
54 | </revision> | ||
55 | <revision> | ||
45 | <revnumber>0.8</revnumber> | 56 | <revnumber>0.8</revnumber> |
46 | <date>2008-12-24</date> | 57 | <date>2008-12-24</date> |
47 | <authorinitials>hjk</authorinitials> | 58 | <authorinitials>hjk</authorinitials> |
@@ -809,6 +820,158 @@ framework to set up sysfs files for this region. Simply leave it alone.
809 | 820 | ||
810 | </chapter> | 821 | </chapter> |
811 | 822 | ||
823 | <chapter id="uio_pci_generic" xreflabel="Using Generic driver for PCI cards"> | ||
824 | <?dbhtml filename="uio_pci_generic.html"?> | ||
825 | <title>Generic PCI UIO driver</title> | ||
826 | <para> | ||
827 | The generic driver is a kernel module named uio_pci_generic. | ||
828 | It can work with any device compliant with PCI 2.3 (circa 2002) and | ||
829 | any compliant PCI Express device. Using it, you only need to | ||
830 | write the userspace driver, removing the need to write | ||
831 | a hardware-specific kernel module. | ||
832 | </para> | ||
833 | |||
834 | <sect1 id="uio_pci_generic_binding"> | ||
835 | <title>Making the driver recognize the device</title> | ||
836 | <para> | ||
837 | Since the driver does not declare any device IDs, it will not be loaded | ||
838 | automatically and will not bind to any devices on its own; you must load it | ||
839 | and assign a device ID to the driver yourself. For example: | ||
840 | <programlisting> | ||
841 | modprobe uio_pci_generic | ||
842 | echo "8086 10f5" > /sys/bus/pci/drivers/uio_pci_generic/new_id | ||
843 | </programlisting> | ||
844 | </para> | ||
845 | <para> | ||
846 | If a hardware-specific kernel driver is already bound to your device, the | ||
847 | generic driver will not bind to it. In that case, if you want to use the | ||
848 | generic driver (why would you?), you have to manually unbind the | ||
849 | hardware-specific driver and bind the generic driver, like this: | ||
850 | <programlisting> | ||
851 | echo -n 0000:00:19.0 > /sys/bus/pci/drivers/e1000e/unbind | ||
852 | echo -n 0000:00:19.0 > /sys/bus/pci/drivers/uio_pci_generic/bind | ||
853 | </programlisting> | ||
854 | </para> | ||
855 | <para> | ||
856 | You can verify that the device has been bound to the driver | ||
857 | by looking for it in sysfs, for example like the following: | ||
858 | <programlisting> | ||
859 | ls -l /sys/bus/pci/devices/0000:00:19.0/driver | ||
860 | </programlisting> | ||
861 | If successful, this should print | ||
862 | <programlisting> | ||
863 | .../0000:00:19.0/driver -> ../../../bus/pci/drivers/uio_pci_generic | ||
864 | </programlisting> | ||
865 | Note that the generic driver will not bind to old PCI 2.2 devices. | ||
866 | If binding the device failed, run the following command: | ||
867 | <programlisting> | ||
868 | dmesg | ||
869 | </programlisting> | ||
870 | and look in the output for failure reasons. | ||
871 | </para> | ||
872 | </sect1> | ||
873 | |||
874 | <sect1 id="uio_pci_generic_internals"> | ||
875 | <title>Things to know about uio_pci_generic</title> | ||
876 | <para> | ||
877 | Interrupts are handled using the Interrupt Disable bit in the PCI command | ||
878 | register and Interrupt Status bit in the PCI status register. All devices | ||
879 | compliant with PCI 2.3 (circa 2002) and all compliant PCI Express devices should | ||
880 | support these bits. uio_pci_generic detects this support, and won't bind to | ||
881 | devices which do not support the Interrupt Disable bit in the command register. | ||
882 | </para> | ||
883 | <para> | ||
884 | On each interrupt, uio_pci_generic sets the Interrupt Disable bit. | ||
885 | This prevents the device from generating further interrupts | ||
886 | until the bit is cleared. The userspace driver should clear this | ||
887 | bit before blocking and waiting for more interrupts. | ||
888 | </para> | ||
889 | </sect1> | ||
890 | <sect1 id="uio_pci_generic_userspace"> | ||
891 | <title>Writing a userspace driver using uio_pci_generic</title> | ||
892 | <para> | ||
893 | A userspace driver can use the PCI sysfs interface, or the | ||
894 | libpci library that wraps it, to talk to the device and to | ||
895 | re-enable interrupts by writing to the command register. | ||
896 | </para> | ||
897 | </sect1> | ||
898 | <sect1 id="uio_pci_generic_example"> | ||
899 | <title>Example code using uio_pci_generic</title> | ||
900 | <para> | ||
901 | Here is some sample userspace driver code using uio_pci_generic: | ||
902 | <programlisting> | ||
903 | #include <stdlib.h> | ||
904 | #include <stdio.h> | ||
905 | #include <unistd.h> | ||
906 | #include <sys/types.h> | ||
907 | #include <sys/stat.h> | ||
908 | #include <fcntl.h> | ||
909 | #include <errno.h> | ||
910 | |||
911 | int main() | ||
912 | { | ||
913 | int uiofd; | ||
914 | int configfd; | ||
915 | int err; | ||
916 | int i; | ||
917 | unsigned icount; | ||
918 | unsigned char command_high; | ||
919 | |||
920 | uiofd = open("/dev/uio0", O_RDONLY); | ||
921 | if (uiofd < 0) { | ||
922 | perror("uio open"); | ||
923 | return errno; | ||
924 | } | ||
925 | configfd = open("/sys/class/uio/uio0/device/config", O_RDWR); | ||
926 | if (configfd < 0) { | ||
927 | perror("config open"); | ||
928 | return errno; | ||
929 | } | ||
930 | |||
931 | /* Read and cache command value */ | ||
932 | err = pread(configfd, &command_high, 1, 5); | ||
933 | if (err != 1) { | ||
934 | perror("command config read"); | ||
935 | return errno; | ||
936 | } | ||
937 | command_high &= ~0x4; | ||
938 | |||
939 | for(i = 0;; ++i) { | ||
940 | /* Print out a message, for debugging. */ | ||
941 | if (i == 0) | ||
942 | fprintf(stderr, "Started uio test driver.\n"); | ||
943 | else | ||
944 | fprintf(stderr, "Interrupts: %u\n", icount); | ||
945 | |||
946 | /****************************************/ | ||
947 | /* Here we got an interrupt from the | ||
948 | device. Do something to it. */ | ||
949 | /****************************************/ | ||
950 | |||
951 | /* Re-enable interrupts. */ | ||
952 | err = pwrite(configfd, &command_high, 1, 5); | ||
953 | if (err != 1) { | ||
954 | perror("config write"); | ||
955 | break; | ||
956 | } | ||
957 | |||
958 | /* Wait for next interrupt. */ | ||
959 | err = read(uiofd, &icount, 4); | ||
960 | if (err != 4) { | ||
961 | perror("uio read"); | ||
962 | break; | ||
963 | } | ||
964 | |||
965 | } | ||
966 | return errno; | ||
967 | } | ||
968 | |||
969 | </programlisting> | ||
970 | </para> | ||
971 | </sect1> | ||
972 | |||
973 | </chapter> | ||
974 | |||
812 | <appendix id="app1"> | 975 | <appendix id="app1"> |
813 | <title>Further information</title> | 976 | <title>Further information</title> |
814 | <itemizedlist> | 977 | <itemizedlist> |
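The offsets in the example program (a one-byte access at config offset 5, masked with ~0x4) follow from the standard PCI configuration header: the 16-bit command register sits at offset 0x04 with Interrupt Disable at bit 10, and the 16-bit status register sits at offset 0x06 with Interrupt Status at bit 3. Config space is little-endian, so bit 10 of the command register is bit 2 of the byte at offset 5. The sketch below derives those numbers; cfg_byte_offset(), cfg_byte_mask() and intx_pending() are hypothetical helpers, and intx_pending() is only a diagnostic that reads the Interrupt Status bit through the same sysfs config file the example opens.

```c
#include <stdint.h>
#include <unistd.h>

#define PCI_CFG_COMMAND  0x04	/* 16-bit command register */
#define PCI_CFG_STATUS   0x06	/* 16-bit status register */
#define CMD_INTX_DISABLE 10	/* Interrupt Disable bit in the command register */
#define STS_INTX_STATUS  3	/* Interrupt Status bit in the status register */

/* Config-space byte holding bit 'bit' of the 16-bit register at 'reg'.
 * Config space is little-endian, so bits 8..15 live at reg + 1. */
unsigned cfg_byte_offset(unsigned reg, unsigned bit)
{
	return reg + bit / 8;
}

/* Mask selecting that bit within its byte. */
uint8_t cfg_byte_mask(unsigned bit)
{
	return (uint8_t)(1u << (bit % 8));
}

/* Return 1 if the device reports a pending INTx, 0 if not, -1 on error.
 * configfd is an open fd on /sys/class/uio/uioN/device/config. */
int intx_pending(int configfd)
{
	uint8_t b;

	if (pread(configfd, &b, 1,
		  cfg_byte_offset(PCI_CFG_STATUS, STS_INTX_STATUS)) != 1)
		return -1;
	return (b & cfg_byte_mask(STS_INTX_STATUS)) != 0;
}
```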
diff --git a/Documentation/PCI/pci-error-recovery.txt b/Documentation/PCI/pci-error-recovery.txt
index 6650af432523..e83f2ea76415 100644
--- a/Documentation/PCI/pci-error-recovery.txt
+++ b/Documentation/PCI/pci-error-recovery.txt
@@ -4,15 +4,17 @@
4 | February 2, 2006 | 4 | February 2, 2006 |
5 | 5 | ||
6 | Current document maintainer: | 6 | Current document maintainer: |
7 | Linas Vepstas <linas@austin.ibm.com> | 7 | Linas Vepstas <linasvepstas@gmail.com> |
8 | updated by Richard Lary <rlary@us.ibm.com> | ||
9 | and Mike Mason <mmlnx@us.ibm.com> on 27-Jul-2009 | ||
8 | 10 | ||
9 | 11 | ||
10 | Many PCI bus controllers are able to detect a variety of hardware | 12 | Many PCI bus controllers are able to detect a variety of hardware |
11 | PCI errors on the bus, such as parity errors on the data and address | 13 | PCI errors on the bus, such as parity errors on the data and address |
12 | busses, as well as SERR and PERR errors. Some of the more advanced | 14 | busses, as well as SERR and PERR errors. Some of the more advanced |
13 | chipsets are able to deal with these errors; these include PCI-E chipsets, | 15 | chipsets are able to deal with these errors; these include PCI-E chipsets, |
14 | and the PCI-host bridges found on IBM Power4 and Power5-based pSeries | 16 | and the PCI-host bridges found on IBM Power4, Power5 and Power6-based |
15 | boxes. A typical action taken is to disconnect the affected device, | 17 | pSeries boxes. A typical action taken is to disconnect the affected device, |
16 | halting all I/O to it. The goal of a disconnection is to avoid system | 18 | halting all I/O to it. The goal of a disconnection is to avoid system |
17 | corruption; for example, to halt system memory corruption due to DMA's | 19 | corruption; for example, to halt system memory corruption due to DMA's |
18 | to "wild" addresses. Typically, a reconnection mechanism is also | 20 | to "wild" addresses. Typically, a reconnection mechanism is also |
@@ -37,10 +39,11 @@ is forced by the need to handle multi-function devices, that is,
37 | devices that have multiple device drivers associated with them. | 39 | devices that have multiple device drivers associated with them. |
38 | In the first stage, each driver is allowed to indicate what type | 40 | In the first stage, each driver is allowed to indicate what type |
39 | of reset it desires, the choices being a simple re-enabling of I/O | 41 | of reset it desires, the choices being a simple re-enabling of I/O |
40 | or requesting a hard reset (a full electrical #RST of the PCI card). | 42 | or requesting a slot reset. |
41 | If any driver requests a full reset, that is what will be done. | ||
42 | 43 | ||
43 | After a full reset and/or a re-enabling of I/O, all drivers are | 44 | If any driver requests a slot reset, that is what will be done. |
45 | |||
46 | After a reset and/or a re-enabling of I/O, all drivers are | ||
44 | again notified, so that they may then perform any device setup/config | 47 | again notified, so that they may then perform any device setup/config |
45 | that may be required. After these have all completed, a final | 48 | that may be required. After these have all completed, a final |
46 | "resume normal operations" event is sent out. | 49 | "resume normal operations" event is sent out. |
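The first stage described above reduces to folding every driver's answer into one verdict, with a slot reset request from any driver winning over a plain re-enable of I/O. The C sketch below uses a local enum that mirrors the PCI_ERS_RESULT_* names used later in this document rather than the kernel's own definition; ranking DISCONNECT above NEED_RESET is an assumption made for illustration, and a real platform may order them differently.

```c
/* Local stand-ins for the PCI_ERS_RESULT_* codes, ordered so that a
 * larger value is a more drastic request (assumed ranking). */
enum ers_result {
	ERS_RECOVERED,
	ERS_CAN_RECOVER,
	ERS_NEED_RESET,
	ERS_DISCONNECT,
};

/* Fold one driver's answer into the running verdict: the most drastic
 * request seen so far wins, so a single NEED_RESET forces a slot reset. */
enum ers_result ers_merge(enum ers_result acc, enum ers_result r)
{
	return r > acc ? r : acc;
}

/* Combine the answers of all n drivers on the affected segment. */
enum ers_result ers_merge_all(const enum ers_result *res, int n)
{
	enum ers_result acc = ERS_RECOVERED;

	for (int i = 0; i < n; i++)
		acc = ers_merge(acc, res[i]);
	return acc;
}
```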
@@ -101,7 +104,7 @@ if it implements any, it must implement error_detected(). If a callback
101 | is not implemented, the corresponding feature is considered unsupported. | 104 | is not implemented, the corresponding feature is considered unsupported. |
102 | For example, if mmio_enabled() and resume() aren't there, then it | 105 | For example, if mmio_enabled() and resume() aren't there, then it |
103 | is assumed that the driver is not doing any direct recovery and requires | 106 | is assumed that the driver is not doing any direct recovery and requires |
104 | a reset. If link_reset() is not implemented, the card is assumed as | 107 | a slot reset. If link_reset() is not implemented, the card is assumed to |
105 | not care about link resets. Typically a driver will want to know about | 108 | not care about link resets. Typically a driver will want to know about |
106 | a slot_reset(). | 109 | a slot_reset(). |
107 | 110 | ||
@@ -111,7 +114,7 @@ sequence described below.
111 | 114 | ||
112 | STEP 0: Error Event | 115 | STEP 0: Error Event |
113 | ------------------- | 116 | ------------------- |
114 | PCI bus error is detect by the PCI hardware. On powerpc, the slot | 117 | A PCI bus error is detected by the PCI hardware. On powerpc, the slot |
115 | is isolated, in that all I/O is blocked: all reads return 0xffffffff, | 118 | is isolated, in that all I/O is blocked: all reads return 0xffffffff, |
116 | all writes are ignored. | 119 | all writes are ignored. |
117 | 120 | ||
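Since every read from an isolated slot returns all ones, a driver can treat an impossible all-ones value as a cheap hint that the slot may be frozen. The sketch below shows two such checks: 0xffff is never a valid PCI vendor ID, and an MMIO register with bits that the device always returns as zero cannot legitimately read back with those bits set. The helper names are hypothetical, and confirming the isolation with the platform is left out because that interface is platform-specific.

```c
#include <stdbool.h>
#include <stdint.h>

/* 0xffff is never a valid PCI vendor ID (config offset 0x00), so reading
 * it back means the read did not reach the device. */
#define PCI_VENDOR_ID_INVALID 0xffffu

bool vendor_read_looks_isolated(uint16_t vendor_id)
{
	return vendor_id == PCI_VENDOR_ID_INVALID;
}

/* must_be_zero marks bits the healthy device always returns as zero.
 * An all-ones read from an isolated slot sets them, so any of them
 * reading as one is a hint that the slot may be frozen. */
bool mmio_read_looks_isolated(uint32_t value, uint32_t must_be_zero)
{
	return (value & must_be_zero) != 0;
}
```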
@@ -139,7 +142,7 @@ The driver must return one of the following result codes:
139 | a chance to extract some diagnostic information (see | 142 | a chance to extract some diagnostic information (see |
140 | mmio_enable, below). | 143 | mmio_enable, below). |
141 | - PCI_ERS_RESULT_NEED_RESET: | 144 | - PCI_ERS_RESULT_NEED_RESET: |
142 | Driver returns this if it can't recover without a hard | 145 | Driver returns this if it can't recover without a |
143 | slot reset. | 146 | slot reset. |
144 | - PCI_ERS_RESULT_DISCONNECT: | 147 | - PCI_ERS_RESULT_DISCONNECT: |
145 | Driver returns this if it doesn't want to recover at all. | 148 | Driver returns this if it doesn't want to recover at all. |
@@ -169,11 +172,11 @@ is STEP 6 (Permanent Failure).
169 | 172 | ||
170 | >>> The current powerpc implementation doesn't much care if the device | 173 | >>> The current powerpc implementation doesn't much care if the device |
171 | >>> attempts I/O at this point, or not. I/O's will fail, returning | 174 | >>> attempts I/O at this point, or not. I/O's will fail, returning |
172 | >>> a value of 0xff on read, and writes will be dropped. If the device | 175 | >>> a value of 0xff on read, and writes will be dropped. If more than |
173 | >>> driver attempts more than 10K I/O's to a frozen adapter, it will | 176 | >>> EEH_MAX_FAILS I/O's are attempted to a frozen adapter, EEH |
174 | >>> assume that the device driver has gone into an infinite loop, and | 177 | >>> assumes that the device driver has gone into an infinite loop |
175 | >>> it will panic the kernel. There doesn't seem to be any other | 178 | >>> and prints an error to syslog. A reboot is then required to |
176 | >>> way of stopping a device driver that insists on spinning on I/O. | 179 | >>> get the device working again. |
177 | 180 | ||
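The EEH_MAX_FAILS safeguard amounts to counting I/O attempts made against a frozen adapter and reporting once a limit is crossed. The sketch below models only that bookkeeping: struct frozen_dev, frozen_read8() and the max_fails field are hypothetical stand-ins, the limit is a parameter rather than the real EEH_MAX_FAILS value, and the message goes to stderr instead of syslog.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical per-device bookkeeping for an isolated slot. */
struct frozen_dev {
	bool frozen;			/* slot is isolated by the platform */
	unsigned long fails;		/* I/Os attempted while frozen */
	unsigned long max_fails;	/* stand-in for EEH_MAX_FAILS */
	bool dead;			/* limit crossed; reboot needed */
};

/* Model of an 8-bit read: while frozen it returns 0xff and counts the
 * attempt; the first attempt past the limit marks the device dead and
 * prints one error message. */
uint8_t frozen_read8(struct frozen_dev *d, uint8_t real_value)
{
	if (!d->frozen)
		return real_value;
	if (++d->fails > d->max_fails && !d->dead) {
		d->dead = true;
		fprintf(stderr, "too many I/Os to frozen adapter\n");
	}
	return 0xff;
}
```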
178 | STEP 2: MMIO Enabled | 181 | STEP 2: MMIO Enabled |
179 | ------------------- | 182 | ------------------- |
@@ -182,15 +185,14 @@ DMA), and then calls the mmio_enabled() callback on all affected
182 | device drivers. | 185 | device drivers. |
183 | 186 | ||
184 | This is the "early recovery" call. IOs are allowed again, but DMA is | 187 | This is the "early recovery" call. IOs are allowed again, but DMA is |
185 | not (hrm... to be discussed, I prefer not), with some restrictions. This | 188 | not, with some restrictions. This is NOT a callback for the driver to |
186 | is NOT a callback for the driver to start operations again, only to | 189 | start operations again, only to peek/poke at the device, extract diagnostic |
187 | peek/poke at the device, extract diagnostic information, if any, and | 190 | information, if any, and eventually do things like trigger a device local |
188 | eventually do things like trigger a device local reset or some such, | 191 | reset or some such, but not restart operations. This callback is made if |
189 | but not restart operations. This is callback is made if all drivers on | 192 | all drivers on a segment agree that they can try to recover and if no automatic |
190 | a segment agree that they can try to recover and if no automatic link reset | 193 | link reset was performed by the HW. If the platform can't just re-enable IOs |
191 | was performed by the HW. If the platform can't just re-enable IOs without | 194 | without a slot reset or a link reset, it will not call this callback, and |
192 | a slot reset or a link reset, it wont call this callback, and instead | 195 | instead will have gone directly to STEP 3 (Link Reset) or STEP 4 (Slot Reset) |
193 | will have gone directly to STEP 3 (Link Reset) or STEP 4 (Slot Reset) | ||
194 | 196 | ||
195 | >>> The following is proposed; no platform implements this yet: | 197 | >>> The following is proposed; no platform implements this yet: |
196 | >>> Proposal: All I/O's should be done _synchronously_ from within | 198 | >>> Proposal: All I/O's should be done _synchronously_ from within |
@@ -228,9 +230,6 @@ proceeds to either STEP3 (Link Reset) or to STEP 5 (Resume Operations).
228 | If any driver returned PCI_ERS_RESULT_NEED_RESET, then the platform | 230 | If any driver returned PCI_ERS_RESULT_NEED_RESET, then the platform |
229 | proceeds to STEP 4 (Slot Reset) | 231 | proceeds to STEP 4 (Slot Reset) |
230 | 232 | ||
231 | >>> The current powerpc implementation does not implement this callback. | ||
232 | |||
233 | |||
234 | STEP 3: Link Reset | 233 | STEP 3: Link Reset |
235 | ------------------ | 234 | ------------------ |
236 | The platform resets the link, and then calls the link_reset() callback | 235 | The platform resets the link, and then calls the link_reset() callback |
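The branching after mmio_enabled() can be written as a small transition function. This is a sketch under stated assumptions: the enum is a local stand-in for the PCI_ERS_RESULT_* codes, the step numbers follow this document, link_reset_wanted is a hypothetical platform input, and sending DISCONNECT straight to STEP 6 (Permanent Failure) is an assumed policy.

```c
/* Local stand-in for the PCI_ERS_RESULT_* codes (not the kernel's enum). */
enum ers_result { ERS_RECOVERED, ERS_CAN_RECOVER, ERS_NEED_RESET, ERS_DISCONNECT };

/* Recovery steps, numbered as in this document. */
enum ers_step {
	STEP_LINK_RESET = 3,
	STEP_SLOT_RESET = 4,
	STEP_RESUME = 5,
	STEP_PERMANENT_FAILURE = 6,
};

/* Pick the next step from the merged mmio_enabled() verdict. */
enum ers_step next_step_after_mmio(enum ers_result merged, int link_reset_wanted)
{
	switch (merged) {
	case ERS_NEED_RESET:
		return STEP_SLOT_RESET;
	case ERS_CAN_RECOVER:
		return link_reset_wanted ? STEP_LINK_RESET : STEP_RESUME;
	case ERS_RECOVERED:
		return STEP_RESUME;
	case ERS_DISCONNECT:
	default:
		return STEP_PERMANENT_FAILURE;
	}
}
```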
@@ -253,16 +252,33 @@ The platform then proceeds to either STEP 4 (Slot Reset) or STEP 5
253 | 252 | ||
254 | >>> The current powerpc implementation does not implement this callback. | 253 | >>> The current powerpc implementation does not implement this callback. |
255 | 254 | ||
256 | |||
257 | STEP 4: Slot Reset | 255 | STEP 4: Slot Reset |
258 | ------------------ | 256 | ------------------ |
259 | The platform performs a soft or hard reset of the device, and then | ||
260 | calls the slot_reset() callback. | ||
261 | 257 | ||
262 | A soft reset consists of asserting the adapter #RST line and then | 258 | In response to a return value of PCI_ERS_RESULT_NEED_RESET, the |
259 | the platform will perform a slot reset on the requesting PCI device(s). | ||
260 | The actual steps taken by a platform to perform a slot reset | ||
261 | will be platform-dependent. Upon completion of slot reset, the | ||
262 | platform will call the device slot_reset() callback. | ||
263 | |||
264 | Powerpc platforms implement two levels of slot reset: | ||
265 | soft reset (default) and fundamental (optional) reset. | ||
266 | |||
267 | Powerpc soft reset consists of asserting the adapter #RST line and then | ||
263 | restoring the PCI BAR's and PCI configuration header to a state | 268 | restoring the PCI BAR's and PCI configuration header to a state |
264 | that is equivalent to what it would be after a fresh system | 269 | that is equivalent to what it would be after a fresh system |
265 | power-on followed by power-on BIOS/system firmware initialization. | 270 | power-on followed by power-on BIOS/system firmware initialization. |
271 | Soft reset is also known as hot-reset. | ||
272 | |||
273 | Powerpc fundamental reset is supported by PCI Express cards only | ||
274 | and causes the device's state machines, hardware logic, port states and | ||
275 | configuration registers to be initialized to their default conditions. | ||
276 | |||
277 | For most PCI devices, a soft reset will be sufficient for recovery. | ||
278 | Optional fundamental reset is provided to support a limited number | ||
279 | of PCI Express devices for which a soft reset is not sufficient | ||
280 | for recovery. | ||
281 | |||
266 | If the platform supports PCI hotplug, then the reset might be | 282 | If the platform supports PCI hotplug, then the reset might be |
267 | performed by toggling the slot electrical power off/on. | 283 | performed by toggling the slot electrical power off/on. |
268 | 284 | ||
@@ -274,10 +290,12 @@ may result in hung devices, kernel panics, or silent data corruption.
274 | 290 | ||
275 | This call gives drivers the chance to re-initialize the hardware | 291 | This call gives drivers the chance to re-initialize the hardware |
276 | (re-download firmware, etc.). At this point, the driver may assume | 292 | (re-download firmware, etc.). At this point, the driver may assume |
277 | that he card is in a fresh state and is fully functional. In | 293 | that the card is in a fresh state and is fully functional. The slot |
278 | particular, interrupt generation should work normally. | 294 | is unfrozen and the driver has full access to PCI config space, |
295 | memory mapped I/O space and DMA. Interrupts (Legacy, MSI, or MSI-X) | ||
296 | will also be available. | ||
279 | 297 | ||
280 | Drivers should not yet restart normal I/O processing operations | 298 | Drivers should not restart normal I/O processing operations |
281 | at this point. If all device drivers report success on this | 299 | at this point. If all device drivers report success on this |
282 | callback, the platform will call resume() to complete the sequence, | 300 | callback, the platform will call resume() to complete the sequence, |
283 | and let the driver restart normal I/O processing. | 301 | and let the driver restart normal I/O processing. |
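The division of work between slot_reset() and resume() described above can be modelled from the driver's side: slot_reset() re-initializes the hardware but leaves normal I/O stopped, and only resume() restarts it. In the sketch below, struct my_dev, the reinit_ok flag and the two-value enum are hypothetical stand-ins, not the kernel's struct pci_error_handlers.

```c
#include <stdbool.h>

/* Local stand-ins; the kernel's real types live in <linux/pci.h>. */
enum ers_result { ERS_RECOVERED, ERS_DISCONNECT };

struct my_dev {			/* hypothetical driver-private state */
	bool hw_ready;		/* firmware reloaded, registers programmed */
	bool io_running;	/* normal I/O processing active */
};

/* slot_reset(): the card is fresh; re-initialize it, but keep normal
 * I/O stopped. reinit_ok models whether e.g. a firmware reload worked. */
enum ers_result my_slot_reset(struct my_dev *d, bool reinit_ok)
{
	d->io_running = false;
	d->hw_ready = reinit_ok;
	return reinit_ok ? ERS_RECOVERED : ERS_DISCONNECT;
}

/* resume(): only now restart normal I/O processing. */
void my_resume(struct my_dev *d)
{
	if (d->hw_ready)
		d->io_running = true;
}
```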
@@ -302,11 +320,21 @@ driver performs device init only from PCI function 0:
302 | - PCI_ERS_RESULT_DISCONNECT | 320 | - PCI_ERS_RESULT_DISCONNECT |
303 | Same as above. | 321 | Same as above. |
304 | 322 | ||
323 | Drivers for PCI Express cards that require a fundamental reset must | ||
324 | set the needs_freset bit in the pci_dev structure in their probe function. | ||
325 | For example, the QLogic qla2xxx driver sets the needs_freset bit for certain | ||
326 | PCI card types: | ||
327 | |||
328 | + /* Set EEH reset type to fundamental if required by hba */ | ||
329 | + if (IS_QLA24XX(ha) || IS_QLA25XX(ha) || IS_QLA81XX(ha)) | ||
330 | + pdev->needs_freset = 1; | ||
331 | + | ||
332 | |||
305 | Platform proceeds either to STEP 5 (Resume Operations) or STEP 6 (Permanent | 333 | Platform proceeds either to STEP 5 (Resume Operations) or STEP 6 (Permanent |
306 | Failure). | 334 | Failure). |
307 | 335 | ||
308 | >>> The current powerpc implementation does not currently try a | 336 | >>> The current powerpc implementation does not try a power-cycle |
309 | >>> power-cycle reset if the driver returned PCI_ERS_RESULT_DISCONNECT. | 337 | >>> reset if the driver returned PCI_ERS_RESULT_DISCONNECT. |
310 | >>> However, it probably should. | 338 | >>> However, it probably should. |
311 | 339 | ||
312 | 340 | ||
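Choosing between the two powerpc reset levels comes down to one per-device decision. In the sketch below, struct dev_info is hypothetical: its needs_freset field mirrors the pci_dev bit that drivers set in their probe function, is_pcie stands in for however the platform knows the device is PCI Express, and the decision itself is an illustration, not the EEH implementation.

```c
#include <stdbool.h>

enum slot_reset_kind {
	RESET_SOFT,		/* hot reset: assert #RST, restore BARs/config header */
	RESET_FUNDAMENTAL,	/* PCI Express only: reinitialize all device state */
};

/* Hypothetical view of the fields that matter for this decision. */
struct dev_info {
	bool is_pcie;		/* device sits on a PCI Express link */
	bool needs_freset;	/* set by the driver's probe function */
};

/* Soft reset is the default; fundamental reset is used only when the
 * driver asked for it and the device is PCI Express. */
enum slot_reset_kind choose_slot_reset(const struct dev_info *d)
{
	if (d->needs_freset && d->is_pcie)
		return RESET_FUNDAMENTAL;
	return RESET_SOFT;
}
```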
@@ -348,7 +376,7 @@ software errors.
348 | 376 | ||
349 | Conclusion; General Remarks | 377 | Conclusion; General Remarks |
350 | --------------------------- | 378 | --------------------------- |
351 | The way those callbacks are called is platform policy. A platform with | 379 | The way the callbacks are called is platform policy. A platform with |
352 | no slot reset capability may want to just "ignore" drivers that can't | 380 | no slot reset capability may want to just "ignore" drivers that can't |
353 | recover (disconnect them) and try to let other cards on the same segment | 381 | recover (disconnect them) and try to let other cards on the same segment |
354 | recover. Keep in mind that in most real life cases, though, there will | 382 | recover. Keep in mind that in most real life cases, though, there will |
@@ -361,8 +389,8 @@ That is, the recovery API only requires that:
361 | 389 | ||
362 | - There is no guarantee that interrupt delivery can proceed from any | 390 | - There is no guarantee that interrupt delivery can proceed from any |
363 | device on the segment starting from the error detection and until the | 391 | device on the segment starting from the error detection and until the |
364 | resume callback is sent, at which point interrupts are expected to be | 392 | slot_reset callback is called, at which point interrupts are expected |
365 | fully operational. | 393 | to be fully operational. |
366 | 394 | ||
367 | - There is no guarantee that interrupt delivery is stopped, that is, | 395 | - There is no guarantee that interrupt delivery is stopped, that is, |
368 | a driver that gets an interrupt after detecting an error, or that detects | 396 | a driver that gets an interrupt after detecting an error, or that detects |
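Because interrupt delivery is not guaranteed to stop after an error, an interrupt handler should not assume the device is reachable. One defensive pattern, sketched below with hypothetical names, is to set a flag from the error_detected() path and have the handler return before touching any device register while the flag is set. The C11 atomic flag stands in for whatever synchronization a real driver uses.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical state shared by the error callbacks and the IRQ handler. */
struct my_dev {
	atomic_bool offline;	/* set once an error has been detected */
	unsigned handled;	/* interrupts actually serviced */
};

enum irq_ret { IRQ_SKIPPED, IRQ_SERVICED };

/* Called from the error_detected() path: stop touching the hardware. */
void my_mark_offline(struct my_dev *d)
{
	atomic_store(&d->offline, true);
}

/* Interrupts may keep arriving after an error, so bail out before any
 * register access while the device is offline. */
enum irq_ret my_irq(struct my_dev *d)
{
	if (atomic_load(&d->offline))
		return IRQ_SKIPPED;
	d->handled++;	/* a real driver would read and ack device registers */
	return IRQ_SERVICED;
}
```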
@@ -381,16 +409,23 @@ anyway :)
381 | >>> Implementation details for the powerpc platform are discussed in | 409 | >>> Implementation details for the powerpc platform are discussed in |
382 | >>> the file Documentation/powerpc/eeh-pci-error-recovery.txt | 410 | >>> the file Documentation/powerpc/eeh-pci-error-recovery.txt |
383 | 411 | ||
384 | >>> As of this writing, there are six device drivers with patches | 412 | >>> As of this writing, there is a growing list of device drivers with |
385 | >>> implementing error recovery. Not all of these patches are in | 413 | >>> patches implementing error recovery. Not all of these patches are in |
386 | >>> mainline yet. These may be used as "examples": | 414 | >>> mainline yet. These may be used as "examples": |
387 | >>> | 415 | >>> |
388 | >>> drivers/scsi/ipr.c | 416 | >>> drivers/scsi/ipr |
389 | >>> drivers/scsi/sym53cxx_2 | 417 | >>> drivers/scsi/sym53c8xx_2 |
418 | >>> drivers/scsi/qla2xxx | ||
419 | >>> drivers/scsi/lpfc | ||
420 | >>> drivers/net/bnx2.c | ||
390 | >>> drivers/next/e100.c | 421 | >>> drivers/next/e100.c |
391 | >>> drivers/net/e1000 | 422 | >>> drivers/net/e1000 |
423 | >>> drivers/net/e1000e | ||
392 | >>> drivers/net/ixgb | 424 | >>> drivers/net/ixgb |
425 | >>> drivers/net/ixgbe | ||
426 | >>> drivers/net/cxgb3 | ||
393 | >>> drivers/net/s2io.c | 427 | >>> drivers/net/s2io.c |
428 | >>> drivers/net/qlge | ||
394 | 429 | ||
395 | The End | 430 | The End |
396 | ------- | 431 | ------- |
diff --git a/Documentation/RCU/RTFP.txt b/Documentation/RCU/RTFP.txt
index 9f711d2df91b..d2b85237c76e 100644
--- a/Documentation/RCU/RTFP.txt
+++ b/Documentation/RCU/RTFP.txt
@@ -743,3 +743,80 @@ Revised:
743 | RCU, realtime RCU, sleepable RCU, performance. | 743 | RCU, realtime RCU, sleepable RCU, performance. |
744 | " | 744 | " |
745 | } | 745 | } |
746 | |||
747 | @article{PaulEMcKenney2008RCUOSR | ||
748 | ,author="Paul E. McKenney and Jonathan Walpole" | ||
749 | ,title="Introducing technology into the {Linux} kernel: a case study" | ||
750 | ,Year="2008" | ||
751 | ,journal="SIGOPS Oper. Syst. Rev." | ||
752 | ,volume="42" | ||
753 | ,number="5" | ||
754 | ,pages="4--17" | ||
755 | ,issn="0163-5980" | ||
756 | ,doi={http://doi.acm.org/10.1145/1400097.1400099} | ||
757 | ,publisher="ACM" | ||
758 | ,address="New York, NY, USA" | ||
759 | ,annotation={ | ||
760 | Linux changed RCU to a far greater degree than RCU has changed Linux. | ||
761 | } | ||
762 | } | ||
763 | |||
764 | @unpublished{PaulEMcKenney2008HierarchicalRCU | ||
765 | ,Author="Paul E. McKenney" | ||
766 | ,Title="Hierarchical {RCU}" | ||
767 | ,month="November" | ||
768 | ,day="3" | ||
769 | ,year="2008" | ||
770 | ,note="Available: | ||
771 | \url{http://lwn.net/Articles/305782/} | ||
772 | [Viewed November 6, 2008]" | ||
773 | ,annotation=" | ||
774 | RCU with combining-tree-based grace-period detection, | ||
775 | permitting it to handle thousands of CPUs. | ||
776 | " | ||
777 | } | ||
778 | |||
779 | @conference{PaulEMcKenney2009MaliciousURCU | ||
780 | ,Author="Paul E. McKenney" | ||
781 | ,Title="Using a Malicious User-Level {RCU} to Torture {RCU}-Based Algorithms" | ||
782 | ,Booktitle="linux.conf.au 2009" | ||
783 | ,month="January" | ||
784 | ,year="2009" | ||
785 | ,address="Hobart, Australia" | ||
786 | ,note="Available: | ||
787 | \url{http://www.rdrop.com/users/paulmck/RCU/urcutorture.2009.01.22a.pdf} | ||
788 | [Viewed February 2, 2009]" | ||
789 | ,annotation=" | ||
790 | Realtime RCU and torture-testing RCU uses. | ||
791 | " | ||
792 | } | ||
793 | |||
794 | @unpublished{MathieuDesnoyers2009URCU | ||
795 | ,Author="Mathieu Desnoyers" | ||
796 | ,Title="[{RFC} git tree] Userspace {RCU} (urcu) for {Linux}" | ||
797 | ,month="February" | ||
798 | ,day="5" | ||
799 | ,year="2009" | ||
800 | ,note="Available: | ||
801 | \url{http://lkml.org/lkml/2009/2/5/572} | ||
802 | \url{git://lttng.org/userspace-rcu.git} | ||
803 | [Viewed February 20, 2009]" | ||
804 | ,annotation=" | ||
805 | Mathieu Desnoyers's user-space RCU implementation. | ||
806 | git://lttng.org/userspace-rcu.git | ||
807 | " | ||
808 | } | ||
809 | |||
810 | @unpublished{PaulEMcKenney2009BloatWatchRCU | ||
811 | ,Author="Paul E. McKenney" | ||
812 | ,Title="{RCU}: The {Bloatwatch} Edition" | ||
813 | ,month="March" | ||
814 | ,day="17" | ||
815 | ,year="2009" | ||
816 | ,note="Available: | ||
817 | \url{http://lwn.net/Articles/323929/} | ||
818 | [Viewed March 20, 2009]" | ||
819 | ,annotation=" | ||
820 | Uniprocessor assumptions allow simplified RCU implementation. | ||
821 | " | ||
822 | } | ||
diff --git a/Documentation/RCU/UP.txt b/Documentation/RCU/UP.txt
index aab4a9ec3931..90ec5341ee98 100644
--- a/Documentation/RCU/UP.txt
+++ b/Documentation/RCU/UP.txt
@@ -2,14 +2,13 @@ RCU on Uniprocessor Systems
2 | 2 | ||
3 | 3 | ||
4 | A common misconception is that, on UP systems, the call_rcu() primitive | 4 | A common misconception is that, on UP systems, the call_rcu() primitive |
5 | may immediately invoke its function, and that the synchronize_rcu() | 5 | may immediately invoke its function. The basis of this misconception |
6 | primitive may return immediately. The basis of this misconception | ||
7 | is that since there is only one CPU, it should not be necessary to | 6 | is that since there is only one CPU, it should not be necessary to |
8 | wait for anything else to get done, since there are no other CPUs for | 7 | wait for anything else to get done, since there are no other CPUs for |
9 | anything else to be happening on. Although this approach will -sort- -of- | 8 | anything else to be happening on. Although this approach will -sort- -of- |
10 | work a surprising amount of the time, it is a very bad idea in general. | 9 | work a surprising amount of the time, it is a very bad idea in general. |
11 | This document presents three examples that demonstrate exactly how bad an | 10 | This document presents three examples that demonstrate exactly how bad |
12 | idea this is. | 11 | an idea this is. |
13 | 12 | ||
14 | 13 | ||
15 | Example 1: softirq Suicide | 14 | Example 1: softirq Suicide |
@@ -82,11 +81,18 @@ Quick Quiz #2: What locking restriction must RCU callbacks respect? | |||
82 | 81 | ||
83 | Summary | 82 | Summary |
84 | 83 | ||
85 | Permitting call_rcu() to immediately invoke its arguments or permitting | 84 | Permitting call_rcu() to immediately invoke its arguments breaks RCU, |
86 | synchronize_rcu() to immediately return breaks RCU, even on a UP system. | 85 | even on a UP system. So do not do it! Even on a UP system, the RCU |
87 | So do not do it! Even on a UP system, the RCU infrastructure -must- | 86 | infrastructure -must- respect grace periods, and -must- invoke callbacks |
88 | respect grace periods, and -must- invoke callbacks from a known environment | 87 | from a known environment in which no locks are held. |
89 | in which no locks are held. | 88 | |
89 | It -is- safe for synchronize_sched() and synchronize_rcu_bh() to return | ||
90 | immediately on a UP system. It is also safe for synchronize_rcu() ||
91 | to return immediately on UP systems, except when running preemptable | ||
92 | RCU. | ||
93 | |||
94 | Quick Quiz #3: Why can't synchronize_rcu() return immediately on | ||
95 | UP systems running preemptable RCU? | ||
90 | 96 | ||
91 | 97 | ||
92 | Answer to Quick Quiz #1: | 98 | Answer to Quick Quiz #1: |
@@ -117,3 +123,13 @@ Answer to Quick Quiz #2: | |||
117 | callbacks acquire locks directly. However, a great many RCU | 123 | callbacks acquire locks directly. However, a great many RCU |
118 | callbacks do acquire locks -indirectly-, for example, via | 124 | callbacks do acquire locks -indirectly-, for example, via |
119 | the kfree() primitive. | 125 | the kfree() primitive. |
126 | |||
127 | Answer to Quick Quiz #3: | ||
128 | Why can't synchronize_rcu() return immediately on UP systems | ||
129 | running preemptable RCU? | ||
130 | |||
131 | Because some other task might have been preempted in the middle | ||
132 | of an RCU read-side critical section. If synchronize_rcu() | ||
133 | simply returned immediately, it would prematurely signal the ||
134 | end of the grace period, which would come as a nasty shock to ||
135 | that other task when it started running again. ||
diff --git a/Documentation/RCU/checklist.txt b/Documentation/RCU/checklist.txt index accfe2f5247d..51525a30e8b4 100644 --- a/Documentation/RCU/checklist.txt +++ b/Documentation/RCU/checklist.txt | |||
@@ -11,7 +11,10 @@ over a rather long period of time, but improvements are always welcome! | |||
11 | structure is updated more than about 10% of the time, then | 11 | structure is updated more than about 10% of the time, then |
12 | you should strongly consider some other approach, unless | 12 | you should strongly consider some other approach, unless |
13 | detailed performance measurements show that RCU is nonetheless | 13 | detailed performance measurements show that RCU is nonetheless |
14 | the right tool for the job. | 14 | the right tool for the job. Yes, you might think of RCU |
15 | as simply cutting overhead off of the readers and imposing it | ||
16 | on the writers. That is exactly why normal uses of RCU will | ||
17 | do much more reading than updating. | ||
15 | 18 | ||
16 | Another exception is where performance is not an issue, and RCU | 19 | Another exception is where performance is not an issue, and RCU |
17 | provides a simpler implementation. An example of this situation | 20 | provides a simpler implementation. An example of this situation |
@@ -240,10 +243,11 @@ over a rather long period of time, but improvements are always welcome! | |||
240 | instead need to use synchronize_irq() or synchronize_sched(). | 243 | instead need to use synchronize_irq() or synchronize_sched(). |
241 | 244 | ||
242 | 12. Any lock acquired by an RCU callback must be acquired elsewhere | 245 | 12. Any lock acquired by an RCU callback must be acquired elsewhere |
243 | with irq disabled, e.g., via spin_lock_irqsave(). Failing to | 246 | with softirq disabled, e.g., via spin_lock_irqsave(), |
244 | disable irq on a given acquisition of that lock will result in | 247 | spin_lock_bh(), etc. Failing to disable softirq on a given |
245 | deadlock as soon as the RCU callback happens to interrupt that | 248 | acquisition of that lock will result in deadlock as soon as the |
246 | acquisition's critical section. | 249 | RCU callback happens to interrupt that acquisition's critical |
250 | section. | ||
247 | 251 | ||
248 | 13. RCU callbacks can be and are executed in parallel. In many cases, | 252 | 13. RCU callbacks can be and are executed in parallel. In many cases, |
249 | the callback code simply wrappers around kfree(), so that this | 253 | the callback code simply wrappers around kfree(), so that this |
@@ -310,3 +314,9 @@ over a rather long period of time, but improvements are always welcome! | |||
310 | Because these primitives only wait for pre-existing readers, | 314 | Because these primitives only wait for pre-existing readers, |
311 | it is the caller's responsibility to guarantee safety to | 315 | it is the caller's responsibility to guarantee safety to |
312 | any subsequent readers. | 316 | any subsequent readers. |
317 | |||
318 | 16. The various RCU read-side primitives do -not- contain memory | ||
319 | barriers. The CPU (and in some cases, the compiler) is free | ||
320 | to reorder code into and out of RCU read-side critical sections. | ||
321 | It is the responsibility of the RCU update-side primitives to | ||
322 | deal with this. | ||
diff --git a/Documentation/RCU/rcu.txt b/Documentation/RCU/rcu.txt index 7aa2002ade77..2a23523ce471 100644 --- a/Documentation/RCU/rcu.txt +++ b/Documentation/RCU/rcu.txt | |||
@@ -36,7 +36,7 @@ o How can the updater tell when a grace period has completed | |||
36 | executed in user mode, or executed in the idle loop, we can | 36 | executed in user mode, or executed in the idle loop, we can |
37 | safely free up that item. | 37 | safely free up that item. |
38 | 38 | ||
39 | Preemptible variants of RCU (CONFIG_PREEMPT_RCU) get the | 39 | Preemptible variants of RCU (CONFIG_TREE_PREEMPT_RCU) get the |
40 | same effect, but require that the readers manipulate CPU-local | 40 | same effect, but require that the readers manipulate CPU-local |
41 | counters. These counters allow limited types of blocking | 41 | counters. These counters allow limited types of blocking |
42 | within RCU read-side critical sections. SRCU also uses | 42 | within RCU read-side critical sections. SRCU also uses |
@@ -79,10 +79,10 @@ o I hear that RCU is patented? What is with that? | |||
79 | o I hear that RCU needs work in order to support realtime kernels? | 79 | o I hear that RCU needs work in order to support realtime kernels? |
80 | 80 | ||
81 | This work is largely completed. Realtime-friendly RCU can be | 81 | This work is largely completed. Realtime-friendly RCU can be |
82 | enabled via the CONFIG_PREEMPT_RCU kernel configuration parameter. | 82 | enabled via the CONFIG_TREE_PREEMPT_RCU kernel configuration |
83 | However, work is in progress for enabling priority boosting of | 83 | parameter. However, work is in progress for enabling priority |
84 | preempted RCU read-side critical sections. This is needed if you | 84 | boosting of preempted RCU read-side critical sections. This is |
85 | have CPU-bound realtime threads. | 85 | needed if you have CPU-bound realtime threads. |
86 | 86 | ||
87 | o Where can I find more information on RCU? | 87 | o Where can I find more information on RCU? |
88 | 88 | ||
diff --git a/Documentation/RCU/rcubarrier.txt b/Documentation/RCU/rcubarrier.txt index 909602d409bb..e439a0edee22 100644 --- a/Documentation/RCU/rcubarrier.txt +++ b/Documentation/RCU/rcubarrier.txt | |||
@@ -170,6 +170,13 @@ module invokes call_rcu() from timers, you will need to first cancel all | |||
170 | the timers, and only then invoke rcu_barrier() to wait for any remaining | 170 | the timers, and only then invoke rcu_barrier() to wait for any remaining |
171 | RCU callbacks to complete. | 171 | RCU callbacks to complete. |
172 | 172 | ||
173 | Of course, if your module uses call_rcu_bh(), you will need to invoke ||
174 | rcu_barrier_bh() before unloading. Similarly, if your module uses | ||
175 | call_rcu_sched(), you will need to invoke rcu_barrier_sched() before | ||
176 | unloading. If your module uses call_rcu(), call_rcu_bh(), -and- | ||
177 | call_rcu_sched(), then you will need to invoke each of rcu_barrier(), | ||
178 | rcu_barrier_bh(), and rcu_barrier_sched(). | ||
179 | |||
173 | 180 | ||
174 | Implementing rcu_barrier() | 181 | Implementing rcu_barrier() |
175 | 182 | ||
diff --git a/Documentation/RCU/torture.txt b/Documentation/RCU/torture.txt index a342b6e1cc10..9dba3bb90e60 100644 --- a/Documentation/RCU/torture.txt +++ b/Documentation/RCU/torture.txt | |||
@@ -76,8 +76,10 @@ torture_type The type of RCU to test: "rcu" for the rcu_read_lock() API, | |||
76 | "rcu_sync" for rcu_read_lock() with synchronous reclamation, | 76 | "rcu_sync" for rcu_read_lock() with synchronous reclamation, |
77 | "rcu_bh" for the rcu_read_lock_bh() API, "rcu_bh_sync" for | 77 | "rcu_bh" for the rcu_read_lock_bh() API, "rcu_bh_sync" for |
78 | rcu_read_lock_bh() with synchronous reclamation, "srcu" for | 78 | rcu_read_lock_bh() with synchronous reclamation, "srcu" for |
79 | the "srcu_read_lock()" API, and "sched" for the use of | 79 | the "srcu_read_lock()" API, "sched" for the use of |
80 | preempt_disable() together with synchronize_sched(). | 80 | preempt_disable() together with synchronize_sched(), |
81 | and "sched_expedited" for the use of preempt_disable() | ||
82 | with synchronize_sched_expedited(). | ||
81 | 83 | ||
82 | verbose Enable debug printk()s. Default is disabled. | 84 | verbose Enable debug printk()s. Default is disabled. |
83 | 85 | ||
@@ -162,6 +164,23 @@ of the "old" and "current" counters for the corresponding CPU. The | |||
162 | "idx" value maps the "old" and "current" values to the underlying array, | 164 | "idx" value maps the "old" and "current" values to the underlying array, |
163 | and is useful for debugging. | 165 | and is useful for debugging. |
164 | 166 | ||
167 | Similarly, sched_expedited RCU provides the following: | ||
168 | |||
169 | sched_expedited-torture: rtc: d0000000016c1880 ver: 1090796 tfle: 0 rta: 1090796 rtaf: 0 rtf: 1090787 rtmbe: 0 nt: 27713319 | ||
170 | sched_expedited-torture: Reader Pipe: 12660320201 95875 0 0 0 0 0 0 0 0 0 | ||
171 | sched_expedited-torture: Reader Batch: 12660424885 0 0 0 0 0 0 0 0 0 0 | ||
172 | sched_expedited-torture: Free-Block Circulation: 1090795 1090795 1090794 1090793 1090792 1090791 1090790 1090789 1090788 1090787 0 | ||
173 | state: -1 / 0:0 3:0 4:0 | ||
174 | |||
175 | As before, the first four lines are similar to those for RCU. | ||
176 | The last line shows the task-migration state. The first number is | ||
177 | -1 if synchronize_sched_expedited() is idle, -2 if in the process of | ||
178 | posting wakeups to the migration kthreads, and N when waiting on CPU N. | ||
179 | Each of the colon-separated fields following the "/" is a CPU:state pair. | ||
180 | Valid states are "0" for idle, "1" for waiting for quiescent state, | ||
181 | "2" for passed through quiescent state, and "3" when a race with a | ||
182 | CPU-hotplug event forces use of the synchronize_sched() primitive. | ||
183 | |||
165 | 184 | ||
166 | USAGE | 185 | USAGE |
167 | 186 | ||
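The sched_expedited state line is terse, and a small decoder can help when scanning long torture runs. The following is a minimal C sketch, assuming the line has exactly the layout shown above ("state: <n> / <cpu>:<state> ..."); struct exp_state, MAX_PAIRS, cpu_state_name() and parse_exp_state() are hypothetical names, not part of rcutorture.

```c
#include <assert.h>
#include <stdio.h>

#define MAX_PAIRS 64	/* arbitrary cap for this sketch */

/* Decoded form of a "state: ..." line from sched_expedited-torture. */
struct exp_state {
	int global;		/* -1 idle, -2 posting wakeups, N waiting on CPU N */
	int npairs;		/* number of CPU:state pairs decoded */
	int cpu[MAX_PAIRS];
	int state[MAX_PAIRS];	/* 0..3, see cpu_state_name() */
};

/* Human-readable meaning of a per-CPU state, per the description above. */
static const char *cpu_state_name(int s)
{
	switch (s) {
	case 0: return "idle";
	case 1: return "waiting for quiescent state";
	case 2: return "passed through quiescent state";
	case 3: return "CPU-hotplug race, fell back to synchronize_sched()";
	default: return "unknown";
	}
}

/* Returns 0 on success, -1 if the line does not start "state: <n> /". */
static int parse_exp_state(const char *line, struct exp_state *st)
{
	const char *p;
	int off = 0;

	assert(line != NULL && st != NULL);

	/* Leading space in the format skips any indentation; %n records
	 * how far we got, and stays 0 if the "/" separator is missing. */
	if (sscanf(line, " state: %d /%n", &st->global, &off) != 1 || off == 0)
		return -1;

	st->npairs = 0;
	for (p = line + off; st->npairs < MAX_PAIRS; ) {
		int cpu, state, n = 0;

		/* Each pair is "<cpu>:<state>"; %d skips the separating spaces. */
		if (sscanf(p, "%d:%d%n", &cpu, &state, &n) != 2)
			break;
		st->cpu[st->npairs] = cpu;
		st->state[st->npairs] = state;
		st->npairs++;
		p += n;
	}
	return 0;
}
```

Fed the sample line above, this yields global = -1 (idle) and three pairs for CPUs 0, 3 and 4, all in state 0 (idle).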
diff --git a/Documentation/RCU/trace.txt b/Documentation/RCU/trace.txt index 02cced183b2d..187bbf10c923 100644 --- a/Documentation/RCU/trace.txt +++ b/Documentation/RCU/trace.txt | |||
@@ -191,8 +191,7 @@ rcu/rcuhier (which displays the struct rcu_node hierarchy). | |||
191 | 191 | ||
192 | The output of "cat rcu/rcudata" looks as follows: | 192 | The output of "cat rcu/rcudata" looks as follows: |
193 | 193 | ||
194 | rcu: | 194 | rcu_sched: |
195 | rcu: | ||
196 | 0 c=17829 g=17829 pq=1 pqc=17829 qp=0 dt=10951/1 dn=0 df=1101 of=0 ri=36 ql=0 b=10 | 195 | 0 c=17829 g=17829 pq=1 pqc=17829 qp=0 dt=10951/1 dn=0 df=1101 of=0 ri=36 ql=0 b=10 |
197 | 1 c=17829 g=17829 pq=1 pqc=17829 qp=0 dt=16117/1 dn=0 df=1015 of=0 ri=0 ql=0 b=10 | 196 | 1 c=17829 g=17829 pq=1 pqc=17829 qp=0 dt=16117/1 dn=0 df=1015 of=0 ri=0 ql=0 b=10 |
198 | 2 c=17829 g=17829 pq=1 pqc=17829 qp=0 dt=1445/1 dn=0 df=1839 of=0 ri=0 ql=0 b=10 | 197 | 2 c=17829 g=17829 pq=1 pqc=17829 qp=0 dt=1445/1 dn=0 df=1839 of=0 ri=0 ql=0 b=10 |
@@ -306,7 +305,7 @@ comma-separated-variable spreadsheet format. | |||
306 | 305 | ||
307 | The output of "cat rcu/rcugp" looks as follows: | 306 | The output of "cat rcu/rcugp" looks as follows: |
308 | 307 | ||
309 | rcu: completed=33062 gpnum=33063 | 308 | rcu_sched: completed=33062 gpnum=33063 |
310 | rcu_bh: completed=464 gpnum=464 | 309 | rcu_bh: completed=464 gpnum=464 |
311 | 310 | ||
312 | Again, this output is for both "rcu" and "rcu_bh". The fields are | 311 | Again, this output is for both "rcu" and "rcu_bh". The fields are |
@@ -413,7 +412,7 @@ o Each element of the form "1/1 0:127 ^0" represents one struct | |||
413 | 412 | ||
414 | The output of "cat rcu/rcu_pending" looks as follows: | 413 | The output of "cat rcu/rcu_pending" looks as follows: |
415 | 414 | ||
416 | rcu: | 415 | rcu_sched: |
417 | 0 np=255892 qsp=53936 cbr=0 cng=14417 gpc=10033 gps=24320 nf=6445 nn=146741 | 416 | 0 np=255892 qsp=53936 cbr=0 cng=14417 gpc=10033 gps=24320 nf=6445 nn=146741 |
418 | 1 np=261224 qsp=54638 cbr=0 cng=25723 gpc=16310 gps=2849 nf=5912 nn=155792 | 417 | 1 np=261224 qsp=54638 cbr=0 cng=25723 gpc=16310 gps=2849 nf=5912 nn=155792 |
419 | 2 np=237496 qsp=49664 cbr=0 cng=2762 gpc=45478 gps=1762 nf=1201 nn=136629 | 418 | 2 np=237496 qsp=49664 cbr=0 cng=2762 gpc=45478 gps=1762 nf=1201 nn=136629 |
diff --git a/Documentation/RCU/whatisRCU.txt b/Documentation/RCU/whatisRCU.txt index 96170824a717..e41a7fecf0d3 100644 --- a/Documentation/RCU/whatisRCU.txt +++ b/Documentation/RCU/whatisRCU.txt | |||
@@ -136,10 +136,10 @@ rcu_read_lock() | |||
136 | Used by a reader to inform the reclaimer that the reader is | 136 | Used by a reader to inform the reclaimer that the reader is |
137 | entering an RCU read-side critical section. It is illegal | 137 | entering an RCU read-side critical section. It is illegal |
138 | to block while in an RCU read-side critical section, though | 138 | to block while in an RCU read-side critical section, though |
139 | kernels built with CONFIG_PREEMPT_RCU can preempt RCU read-side | 139 | kernels built with CONFIG_TREE_PREEMPT_RCU can preempt RCU |
140 | critical sections. Any RCU-protected data structure accessed | 140 | read-side critical sections. Any RCU-protected data structure |
141 | during an RCU read-side critical section is guaranteed to remain | 141 | accessed during an RCU read-side critical section is guaranteed to |
142 | unreclaimed for the full duration of that critical section. | 142 | remain unreclaimed for the full duration of that critical section. |
143 | Reference counts may be used in conjunction with RCU to maintain | 143 | Reference counts may be used in conjunction with RCU to maintain |
144 | longer-term references to data structures. | 144 | longer-term references to data structures. |
145 | 145 | ||
@@ -785,6 +785,7 @@ RCU pointer/list traversal: | |||
785 | rcu_dereference | 785 | rcu_dereference |
786 | list_for_each_entry_rcu | 786 | list_for_each_entry_rcu |
787 | hlist_for_each_entry_rcu | 787 | hlist_for_each_entry_rcu |
788 | hlist_nulls_for_each_entry_rcu | ||
788 | 789 | ||
789 | list_for_each_continue_rcu (to be deprecated in favor of new | 790 | list_for_each_continue_rcu (to be deprecated in favor of new |
790 | list_for_each_entry_continue_rcu) | 791 | list_for_each_entry_continue_rcu) |
@@ -807,19 +808,23 @@ RCU: Critical sections Grace period Barrier | |||
807 | 808 | ||
808 | rcu_read_lock synchronize_net rcu_barrier | 809 | rcu_read_lock synchronize_net rcu_barrier |
809 | rcu_read_unlock synchronize_rcu | 810 | rcu_read_unlock synchronize_rcu |
811 | synchronize_rcu_expedited | ||
810 | call_rcu | 812 | call_rcu |
811 | 813 | ||
812 | 814 | ||
813 | bh: Critical sections Grace period Barrier | 815 | bh: Critical sections Grace period Barrier |
814 | 816 | ||
815 | rcu_read_lock_bh call_rcu_bh rcu_barrier_bh | 817 | rcu_read_lock_bh call_rcu_bh rcu_barrier_bh |
816 | rcu_read_unlock_bh | 818 | rcu_read_unlock_bh synchronize_rcu_bh |
819 | synchronize_rcu_bh_expedited | ||
817 | 820 | ||
818 | 821 | ||
819 | sched: Critical sections Grace period Barrier | 822 | sched: Critical sections Grace period Barrier |
820 | 823 | ||
821 | [preempt_disable] synchronize_sched rcu_barrier_sched | 824 | rcu_read_lock_sched synchronize_sched rcu_barrier_sched |
822 | [and friends] call_rcu_sched | 825 | rcu_read_unlock_sched call_rcu_sched |
826 | [preempt_disable] synchronize_sched_expedited | ||
827 | [and friends] | ||
823 | 828 | ||
824 | 829 | ||
825 | SRCU: Critical sections Grace period Barrier | 830 | SRCU: Critical sections Grace period Barrier |
@@ -827,6 +832,9 @@ SRCU: Critical sections Grace period Barrier | |||
827 | srcu_read_lock synchronize_srcu N/A | 832 | srcu_read_lock synchronize_srcu N/A |
828 | srcu_read_unlock | 833 | srcu_read_unlock |
829 | 834 | ||
835 | SRCU: Initialization/cleanup | ||
836 | init_srcu_struct | ||
837 | cleanup_srcu_struct | ||
830 | 838 | ||
831 | See the comment headers in the source code (or the docbook generated | 839 | See the comment headers in the source code (or the docbook generated |
832 | from them) for more information. | 840 | from them) for more information. |
diff --git a/Documentation/arm/OMAP/omap_pm b/Documentation/arm/OMAP/omap_pm new file mode 100644 index 000000000000..5389440aade3 --- /dev/null +++ b/Documentation/arm/OMAP/omap_pm | |||
@@ -0,0 +1,129 @@ | |||
1 | |||
2 | The OMAP PM interface | ||
3 | ===================== | ||
4 | |||
5 | This document describes the temporary OMAP PM interface. Driver | ||
6 | authors use these functions to communicate minimum latency or | ||
7 | throughput constraints to the kernel power management code. | ||
8 | Over time, the intention is to merge features from the OMAP PM | ||
9 | interface into the Linux PM QoS code. | ||
10 | |||
11 | Drivers need to express PM parameters which: | ||
12 | |||
13 | - support the range of power management parameters present in the TI SRF; | ||
14 | |||
15 | - separate the drivers from the underlying PM parameter | ||
16 | implementation, whether it is the TI SRF or Linux PM QoS or Linux | ||
17 | latency framework or something else; | ||
18 | |||
19 | - specify PM parameters in terms of fundamental units, such as | ||
20 | latency and throughput, rather than units which are specific to OMAP | ||
21 | or to particular OMAP variants; | ||
22 | |||
23 | - allow drivers which are shared with other architectures (e.g., | ||
24 | DaVinci) to add these constraints in a way which won't affect non-OMAP | ||
25 | systems, | ||
26 | |||
27 | - can be implemented immediately with minimal disruption of other | ||
28 | architectures. | ||
29 | |||
30 | |||
31 | This document proposes the OMAP PM interface, including the following | ||
32 | five power management functions for driver code: | ||
33 | |||
34 | 1. Set the maximum MPU wakeup latency: | ||
35 | (*pdata->set_max_mpu_wakeup_lat)(struct device *dev, unsigned long t) | ||
36 | |||
37 | 2. Set the maximum device wakeup latency: | ||
38 | (*pdata->set_max_dev_wakeup_lat)(struct device *dev, unsigned long t) | ||
39 | |||
40 | 3. Set the maximum system DMA transfer start latency (CORE pwrdm): | ||
41 | (*pdata->set_max_sdma_lat)(struct device *dev, long t) | ||
42 | |||
43 | 4. Set the minimum bus throughput needed by a device: | ||
44 | (*pdata->set_min_bus_tput)(struct device *dev, u8 agent_id, unsigned long r) | ||
45 | |||
46 | 5. Return the number of times the device has lost context: ||
47 | (*pdata->get_dev_context_loss_count)(struct device *dev) | ||
48 | |||
49 | |||
50 | Further documentation for all OMAP PM interface functions can be | ||
51 | found in arch/arm/plat-omap/include/mach/omap-pm.h. | ||
52 | |||
53 | |||
54 | The OMAP PM layer is intended to be temporary | ||
55 | --------------------------------------------- | ||
56 | |||
57 | The intention is that eventually the Linux PM QoS layer should support | ||
58 | the range of power management features present in OMAP3. As this | ||
59 | happens, existing drivers using the OMAP PM interface can be modified | ||
60 | to use the Linux PM QoS code; and the OMAP PM interface can disappear. | ||
61 | |||
62 | |||
63 | Driver usage of the OMAP PM functions | ||
64 | ------------------------------------- | ||
65 | |||
66 | As the 'pdata' in the above examples indicates, these functions are | ||
67 | exposed to drivers through function pointers in driver .platform_data | ||
68 | structures. The function pointers are initialized by the board-*.c | ||
69 | files to point to the corresponding OMAP PM functions: | ||
70 | .set_max_dev_wakeup_lat will point to | ||
71 | omap_pm_set_max_dev_wakeup_lat(), etc. Other architectures which do | ||
72 | not support these functions should leave these function pointers set | ||
73 | to NULL. Drivers should use the following idiom: | ||
74 | |||
75 | if (pdata->set_max_dev_wakeup_lat) | ||
76 | (*pdata->set_max_dev_wakeup_lat)(dev, t); | ||
77 | |||
78 | The most common usage of these functions will probably be to specify | ||
79 | the maximum time from when an interrupt occurs, to when the device | ||
80 | becomes accessible. To accomplish this, driver writers should use the | ||
81 | set_max_mpu_wakeup_lat() function to constrain the MPU wakeup ||
82 | latency, and the set_max_dev_wakeup_lat() function to constrain the | ||
83 | device wakeup latency (from clk_enable() to accessibility). For | ||
84 | example, | ||
85 | |||
86 | /* Limit MPU wakeup latency */ | ||
87 | if (pdata->set_max_mpu_wakeup_lat) | ||
88 | (*pdata->set_max_mpu_wakeup_lat)(dev, tc); | ||
89 | |||
90 | /* Limit device powerdomain wakeup latency */ | ||
91 | if (pdata->set_max_dev_wakeup_lat) | ||
92 | (*pdata->set_max_dev_wakeup_lat)(dev, td); | ||
93 | |||
94 | /* total wakeup latency in this example: (tc + td) */ | ||
95 | |||
96 | The PM parameters can be overwritten by calling the function again | ||
97 | with the new value. The settings can be removed by calling the | ||
98 | function with a t argument of -1 (except in the case of | ||
99 | set_min_bus_tput(), which should be called with an r argument of 0). ||
100 | |||
101 | The fifth function above, omap_pm_get_dev_context_loss_count(), | ||
102 | is intended as an optimization to allow drivers to determine whether the | ||
103 | device has lost its internal context. If context has been lost, the | ||
104 | driver must restore its internal context before proceeding. | ||
105 | |||
106 | |||
107 | Other specialized interface functions | ||
108 | ------------------------------------- | ||
109 | |||
110 | The five functions listed above are intended to be usable by any | ||
111 | device driver. DSPBridge and CPUFreq have a few special requirements. | ||
112 | DSPBridge expresses target DSP performance levels in terms of OPP IDs. | ||
113 | CPUFreq expresses target MPU performance levels in terms of MPU | ||
114 | frequency. The OMAP PM interface contains functions for these | ||
115 | specialized cases to convert that input information (OPPs/MPU | ||
116 | frequency) into the form that the underlying power management | ||
117 | implementation needs: | ||
118 | |||
119 | 6. (*pdata->dsp_get_opp_table)(void) | ||
120 | |||
121 | 7. (*pdata->dsp_set_min_opp)(u8 opp_id) | ||
122 | |||
123 | 8. (*pdata->dsp_get_opp)(void) | ||
124 | |||
125 | 9. (*pdata->cpu_get_freq_table)(void) | ||
126 | |||
127 | 10. (*pdata->cpu_set_freq)(unsigned long f) | ||
128 | |||
129 | 11. (*pdata->cpu_get_freq)(void) | ||
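The NULL-check idiom for these function pointers can be exercised outside the kernel with stub types. This is a hedged user-space sketch: struct device is reduced to a stand-in, and struct hypothetical_pm_pdata, the fake_* board hooks and constrain_wakeup_latency() are invented for illustration. Only the member names set_max_mpu_wakeup_lat and set_max_dev_wakeup_lat come from the text above; the real prototypes are documented in arch/arm/plat-omap/include/mach/omap-pm.h.

```c
#include <assert.h>
#include <stddef.h>

/* User-space stand-in for the kernel's struct device (hypothetical). */
struct device {
	const char *name;
};

/* Hypothetical platform-data struct; only the member names are taken
 * from the OMAP PM document. */
struct hypothetical_pm_pdata {
	void (*set_max_mpu_wakeup_lat)(struct device *dev, unsigned long t);
	void (*set_max_dev_wakeup_lat)(struct device *dev, unsigned long t);
};

/* Fake board hooks standing in for the omap_pm_set_max_*_wakeup_lat()
 * functions: they just remember the last constraint they were given. */
static unsigned long fake_mpu_lat, fake_dev_lat;

static void fake_set_max_mpu_wakeup_lat(struct device *dev, unsigned long t)
{
	(void)dev;
	fake_mpu_lat = t;
}

static void fake_set_max_dev_wakeup_lat(struct device *dev, unsigned long t)
{
	(void)dev;
	fake_dev_lat = t;
}

/* Driver side: apply the MPU (tc) and device (td) wakeup-latency limits,
 * calling each hook only if the board supplied it. The total wakeup
 * latency requested is tc + td. Returns the number of hooks invoked. */
static int constrain_wakeup_latency(const struct hypothetical_pm_pdata *pdata,
				    struct device *dev,
				    unsigned long tc, unsigned long td)
{
	int calls = 0;

	assert(pdata != NULL);

	/* Limit MPU wakeup latency */
	if (pdata->set_max_mpu_wakeup_lat) {
		(*pdata->set_max_mpu_wakeup_lat)(dev, tc);
		calls++;
	}

	/* Limit device powerdomain wakeup latency */
	if (pdata->set_max_dev_wakeup_lat) {
		(*pdata->set_max_dev_wakeup_lat)(dev, td);
		calls++;
	}

	return calls;
}
```

A board that leaves both pointers NULL, as a non-OMAP architecture would, turns the helper into a no-op, which is the point of the idiom.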
diff --git a/Documentation/arm/SA1100/ADSBitsy b/Documentation/arm/SA1100/ADSBitsy index ab47c3833908..7197a9e958ee 100644 --- a/Documentation/arm/SA1100/ADSBitsy +++ b/Documentation/arm/SA1100/ADSBitsy | |||
@@ -40,4 +40,4 @@ Notes: | |||
40 | mode, the timing is off so the image is corrupted. This will be | 40 | mode, the timing is off so the image is corrupted. This will be |
41 | fixed soon. | 41 | fixed soon. |
42 | 42 | ||
43 | Any contribution can be sent to nico@cam.org and will be greatly welcome! | 43 | Any contribution can be sent to nico@fluxnic.net and will be greatly welcome! |
diff --git a/Documentation/arm/SA1100/Assabet b/Documentation/arm/SA1100/Assabet index 78bc1c1b04e5..91f7ce7ba426 100644 --- a/Documentation/arm/SA1100/Assabet +++ b/Documentation/arm/SA1100/Assabet | |||
@@ -240,7 +240,7 @@ Then, rebooting the Assabet is just a matter of waiting for the login prompt. | |||
240 | 240 | ||
241 | 241 | ||
242 | Nicolas Pitre | 242 | Nicolas Pitre |
243 | nico@cam.org | 243 | nico@fluxnic.net |
244 | June 12, 2001 | 244 | June 12, 2001 |
245 | 245 | ||
246 | 246 | ||
diff --git a/Documentation/arm/SA1100/Brutus b/Documentation/arm/SA1100/Brutus index 2254c8f0b326..b1cfd405dccc 100644 --- a/Documentation/arm/SA1100/Brutus +++ b/Documentation/arm/SA1100/Brutus | |||
@@ -60,7 +60,7 @@ little modifications. | |||
60 | 60 | ||
61 | Any contribution is welcome. | 61 | Any contribution is welcome. |
62 | 62 | ||
63 | Please send patches to nico@cam.org | 63 | Please send patches to nico@fluxnic.net |
64 | 64 | ||
65 | Have Fun ! | 65 | Have Fun ! |
66 | 66 | ||
diff --git a/Documentation/arm/SA1100/GraphicsClient b/Documentation/arm/SA1100/GraphicsClient index 8fa7e8027ff1..6c9c4f5a36e1 100644 --- a/Documentation/arm/SA1100/GraphicsClient +++ b/Documentation/arm/SA1100/GraphicsClient | |||
@@ -4,7 +4,7 @@ For more details, contact Applied Data Systems or see | |||
4 | http://www.applieddata.net/products.html | 4 | http://www.applieddata.net/products.html |
5 | 5 | ||
6 | The original Linux support for this product has been provided by | 6 | The original Linux support for this product has been provided by |
7 | Nicolas Pitre <nico@cam.org>. Continued development work by | 7 | Nicolas Pitre <nico@fluxnic.net>. Continued development work by |
8 | Woojung Huh <whuh@applieddata.net> | 8 | Woojung Huh <whuh@applieddata.net> |
9 | 9 | ||
10 | It's currently possible to mount a root filesystem via NFS providing a | 10 | It's currently possible to mount a root filesystem via NFS providing a |
@@ -94,5 +94,5 @@ Notes: | |||
94 | mode, the timing is off so the image is corrupted. This will be | 94 | mode, the timing is off so the image is corrupted. This will be |
95 | fixed soon. | 95 | fixed soon. |
96 | 96 | ||
97 | Any contribution can be sent to nico@cam.org and will be greatly welcome! | 97 | Any contribution can be sent to nico@fluxnic.net and will be greatly welcome! |
98 | 98 | ||
diff --git a/Documentation/arm/SA1100/GraphicsMaster b/Documentation/arm/SA1100/GraphicsMaster index dd28745ac521..ee7c6595f23f 100644 --- a/Documentation/arm/SA1100/GraphicsMaster +++ b/Documentation/arm/SA1100/GraphicsMaster | |||
@@ -4,7 +4,7 @@ For more details, contact Applied Data Systems or see | |||
4 | http://www.applieddata.net/products.html | 4 | http://www.applieddata.net/products.html |
5 | 5 | ||
6 | The original Linux support for this product has been provided by | 6 | The original Linux support for this product has been provided by |
7 | Nicolas Pitre <nico@cam.org>. Continued development work by | 7 | Nicolas Pitre <nico@fluxnic.net>. Continued development work by |
8 | Woojung Huh <whuh@applieddata.net> | 8 | Woojung Huh <whuh@applieddata.net> |
9 | 9 | ||
10 | Use 'make graphicsmaster_config' before any 'make config'. | 10 | Use 'make graphicsmaster_config' before any 'make config'. |
@@ -50,4 +50,4 @@ Notes: | |||
50 | mode, the timing is off so the image is corrupted. This will be | 50 | mode, the timing is off so the image is corrupted. This will be |
51 | fixed soon. | 51 | fixed soon. |
52 | 52 | ||
53 | Any contribution can be sent to nico@cam.org and will be greatly welcome! | 53 | Any contribution can be sent to nico@fluxnic.net and will be greatly welcome! |
diff --git a/Documentation/arm/SA1100/Victor b/Documentation/arm/SA1100/Victor index 01e81fc49461..f938a29fdc20 100644 --- a/Documentation/arm/SA1100/Victor +++ b/Documentation/arm/SA1100/Victor | |||
@@ -9,7 +9,7 @@ Of course Victor is using Linux as its main operating system. | |||
9 | The Victor implementation for Linux is maintained by Nicolas Pitre: | 9 | The Victor implementation for Linux is maintained by Nicolas Pitre: |
10 | 10 | ||
11 | nico@visuaide.com | 11 | nico@visuaide.com |
12 | nico@cam.org | 12 | nico@fluxnic.net |
13 | 13 | ||
14 | For any comments, please feel free to contact me through the above | 14 | For any comments, please feel free to contact me through the above |
15 | addresses. | 15 | addresses. |
diff --git a/Documentation/arm/Samsung-S3C24XX/CPUfreq.txt b/Documentation/arm/Samsung-S3C24XX/CPUfreq.txt new file mode 100644 index 000000000000..76b3a11e90be --- /dev/null +++ b/Documentation/arm/Samsung-S3C24XX/CPUfreq.txt | |||
@@ -0,0 +1,75 @@ | |||
1 | S3C24XX CPUfreq support | ||
2 | ======================= | ||
3 | |||
4 | Introduction | ||
5 | ------------ | ||
6 | |||
7 | The S3C24XX series support a number of power saving systems, such as | ||
8 | the ability to change the core, memory and peripheral operating | ||
9 | frequencies. The core control is exported via the CPUFreq driver | ||
10 | which has a number of different manual or automatic controls over the | ||
11 | rate the core is running at. | ||
12 | |||
13 | There are two forms of the driver depending on the specific CPU and | ||
14 | how the clocks are arranged. The first implementation uses a single ||
15 | PLL to feed the ARM, memory and peripherals via a series of dividers ||
16 | and muxes; this is the implementation documented here. A newer ||
17 | version, which has a separate PLL and clock divider for the ARM ||
18 | core, is available as a separate driver. ||
19 | |||
20 | |||
21 | Layout | ||
22 | ------ | ||
23 | |||
24 | The core code manages the CPU-specific drivers, any data that they ||
25 | need to register and the interface to the generic drivers/cpufreq | ||
26 | system. Each CPU registers a driver to control the PLL, clock dividers | ||
27 | and anything else associated with it. Any board that wants to use this | ||
28 | framework needs to supply at least basic details of what is required. | ||
29 | |||
30 | The core registers with drivers/cpufreq at init time if all the data | ||
31 | necessary has been supplied. | ||
32 | |||
33 | |||
34 | CPU support | ||
35 | ----------- | ||
36 | |||
37 | The support for each CPU depends on the facilities provided by the | ||
38 | SoC and the driver as each device has different PLL and clock chains | ||
39 | associated with it. | ||
40 | |||
41 | |||
42 | Slow Mode | ||
43 | --------- | ||
44 | |||
45 | The SLOW mode, where the PLL is turned off altogether and the ||
46 | system is fed by the external crystal input, is currently not ||
47 | supported. | ||
48 | |||
49 | |||
50 | sysfs | ||
51 | ----- | ||
52 | |||
53 | The core code exports extra information via sysfs in the directory | ||
54 | devices/system/cpu/cpu0/arch-freq. | ||
55 | |||
56 | |||
57 | Board Support | ||
58 | ------------- | ||
59 | |||
60 | Each board that wants to use the cpufreq code must register some basic | ||
61 | information with the core driver to provide information about what the | ||
62 | board requires and any restrictions being placed on it. | ||
63 | |||
64 | The board needs to supply information about whether its IO bank ||
65 | timings need changing, any maximum frequency limits and information about the ||
66 | SDRAM refresh rate. | ||
67 | |||
68 | |||
69 | |||
70 | |||
71 | Document Author | ||
72 | --------------- | ||
73 | |||
74 | Ben Dooks, Copyright 2009 Simtec Electronics | ||
75 | Licensed under GPLv2 | ||
diff --git a/Documentation/btmrvl.txt b/Documentation/btmrvl.txt new file mode 100644 index 000000000000..34916a46c099 --- /dev/null +++ b/Documentation/btmrvl.txt | |||
@@ -0,0 +1,119 @@ | |||
1 | ======================================================================= | ||
2 | README for btmrvl driver | ||
3 | ======================================================================= | ||
4 | |||
5 | |||
6 | All commands are used via the debugfs interface. | ||
7 | |||
8 | ===================== | ||
9 | Set/get driver configurations: | ||
10 | |||
11 | Path: /debug/btmrvl/config/ | ||
12 | |||
13 | gpiogap=[n] | ||
14 | hscfgcmd | ||
15 | These commands are used to configure the host sleep parameters. | ||
16 | bit 7:0 -- Gap | ||
17 | bit 15:8 -- GPIO | ||
18 | |||
19 | where GPIO is the pin number of the GPIO used to wake up the host. | ||
20 | It can be any valid GPIO pin number (e.g. 0-7), or 0xff to use SDIO | ||
21 | interface wakeup instead. | ||
22 | |||
23 | where Gap is the gap in milliseconds between the wakeup signal and the | ||
24 | wakeup event, or 0xff for a special host sleep setting. | ||
25 | |||
26 | Usage: | ||
27 | # Use SDIO interface to wake up the host and set GAP to 0x80: | ||
28 | echo 0xff80 > /debug/btmrvl/config/gpiogap | ||
29 | echo 1 > /debug/btmrvl/config/hscfgcmd | ||
30 | |||
31 | # Use GPIO pin #3 to wake up the host and set GAP to 0xff: | ||
32 | echo 0x03ff > /debug/btmrvl/config/gpiogap | ||
33 | echo 1 > /debug/btmrvl/config/hscfgcmd | ||
34 | |||
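The gpiogap value packs two one-byte fields into a single number. A minimal C sketch of that packing, assuming the layout implied by the two examples above (GPIO in the high byte, gap in the low byte); the helper names are hypothetical and not part of the btmrvl driver:

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper: combine a GPIO pin number (or 0xff for SDIO
 * wakeup) and a gap (ms, or 0xff) into the value written to gpiogap.
 * Layout assumed from the examples: GPIO in bits 15:8, gap in bits 7:0.
 */
static unsigned int btmrvl_gpiogap(unsigned int gpio, unsigned int gap)
{
	assert(gpio <= 0xff && gap <= 0xff);	/* each field is one byte */
	return (gpio << 8) | gap;
}

/* Format the value the way the examples pass it to echo, e.g. "0x03ff". */
static void btmrvl_gpiogap_str(char *buf, size_t len,
			       unsigned int gpio, unsigned int gap)
{
	snprintf(buf, len, "0x%04x", btmrvl_gpiogap(gpio, gap));
}
```

With this layout, btmrvl_gpiogap(0xff, 0x80) gives 0xff80 and btmrvl_gpiogap(3, 0xff) gives 0x03ff, matching the two echo commands; the value still has to be followed by writing 1 to hscfgcmd.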
35 | psmode=[n] | ||
36 | pscmd | ||
37 | These commands are used to enable/disable auto sleep mode. | ||
38 | |||
39 | where the option is: | ||
40 | 1 -- Enable auto sleep mode | ||
41 | 0 -- Disable auto sleep mode | ||
42 | |||
43 | Usage: | ||
44 | # Enable auto sleep mode | ||
45 | echo 1 > /debug/btmrvl/config/psmode | ||
46 | echo 1 > /debug/btmrvl/config/pscmd | ||
47 | |||
48 | # Disable auto sleep mode | ||
49 | echo 0 > /debug/btmrvl/config/psmode | ||
50 | echo 1 > /debug/btmrvl/config/pscmd | ||
51 | |||
52 | |||
53 | hsmode=[n] | ||
54 | hscmd | ||
55 | These commands are used to enable host sleep or wake up the firmware. | ||
56 | |||
57 | where the option is: | ||
58 | 1 -- Enable host sleep | ||
59 | 0 -- Wake up firmware | ||
60 | |||
61 | Usage: | ||
62 | # Enable host sleep | ||
63 | echo 1 > /debug/btmrvl/config/hsmode | ||
64 | echo 1 > /debug/btmrvl/config/hscmd | ||
65 | |||
66 | # Wake up firmware | ||
67 | echo 0 > /debug/btmrvl/config/hsmode | ||
68 | echo 1 > /debug/btmrvl/config/hscmd | ||
69 | |||
70 | |||
71 | ====================== | ||
72 | Get driver status: | ||
73 | |||
74 | Path: /debug/btmrvl/status/ | ||
75 | |||
76 | Usage: | ||
77 | cat /debug/btmrvl/status/<args> | ||
78 | |||
79 | where the args are: | ||
80 | |||
81 | curpsmode | ||
82 | This command displays the current auto sleep status. | ||
83 | ||
84 | psstate | ||
85 | This command displays the power save state. | ||
86 | ||
87 | hsstate | ||
88 | This command displays the host sleep state. | ||
89 | ||
90 | txdnldrdy | ||
91 | This command displays the value of the Tx download ready flag. | ||
92 | |||
93 | |||
94 | ===================== | ||
95 | |||
96 | Use hcitool to issue raw HCI commands; refer to the hcitool manual. | ||
97 | ||
98 | Usage: hcitool cmd <ogf> <ocf> [Parameters] | ||
99 | |||
100 | Interface Control Command | ||
101 | hcitool cmd 0x3f 0x5b 0xf5 0x01 0x00 --Enable all interfaces | ||
102 | hcitool cmd 0x3f 0x5b 0xf5 0x01 0x01 --Enable WLAN interface | ||
103 | hcitool cmd 0x3f 0x5b 0xf5 0x01 0x02 --Enable BT interface | ||
104 | hcitool cmd 0x3f 0x5b 0xf5 0x00 0x00 --Disable all interfaces | ||
105 | hcitool cmd 0x3f 0x5b 0xf5 0x00 0x01 --Disable WLAN interface | ||
106 | hcitool cmd 0x3f 0x5b 0xf5 0x00 0x02 --Disable BT interface | ||
107 | |||
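hcitool cmd takes the OGF and OCF separately; the Bluetooth core specification combines them into a 16-bit opcode (OGF in the top 6 bits, OCF in the low 10 bits) that is sent little-endian, followed by a one-byte parameter length and the parameters. A sketch of how the "Enable all interfaces" line above would be serialized, assuming only that standard HCI command layout; the function names are illustrative, and the meaning of the 0xf5 parameter byte is vendor-defined and not described here:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* HCI opcode: 6-bit OGF in the high bits, 10-bit OCF in the low bits. */
static uint16_t hci_opcode(uint8_t ogf, uint16_t ocf)
{
	assert(ogf <= 0x3f && ocf <= 0x3ff);
	return (uint16_t)((ogf << 10) | ocf);
}

/*
 * Serialize an HCI command: opcode (little-endian), 1-byte parameter
 * length, then the parameters. Returns the number of bytes written,
 * or 0 if @out is too small.
 */
static size_t hci_build_cmd(uint8_t *out, size_t outlen, uint8_t ogf,
			    uint16_t ocf, const uint8_t *params, uint8_t plen)
{
	uint16_t op = hci_opcode(ogf, ocf);

	if (outlen < 3u + plen)
		return 0;
	out[0] = op & 0xff;	/* opcode, low byte first */
	out[1] = op >> 8;
	out[2] = plen;
	if (plen)
		memcpy(out + 3, params, plen);
	return 3u + plen;
}
```

For OGF 0x3f, OCF 0x5b and parameters 0xf5 0x01 0x00 this yields the six bytes 5b fc 03 f5 01 00, i.e. opcode 0xfc5b.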
108 | ======================================================================= | ||
109 | |||
110 | |||
111 | SD8688 firmware: | ||
112 | |||
113 | /lib/firmware/sd8688_helper.bin | ||
114 | /lib/firmware/sd8688.bin | ||
115 | |||
116 | |||
117 | The images can be downloaded from: | ||
118 | |||
119 | git.infradead.org/users/dwmw2/linux-firmware.git/libertas/ | ||
diff --git a/Documentation/connector/Makefile b/Documentation/connector/Makefile index 8df1a7285a06..d98e4df98e24 100644 --- a/Documentation/connector/Makefile +++ b/Documentation/connector/Makefile | |||
@@ -9,3 +9,8 @@ hostprogs-y := ucon | |||
9 | always := $(hostprogs-y) | 9 | always := $(hostprogs-y) |
10 | 10 | ||
11 | HOSTCFLAGS_ucon.o += -I$(objtree)/usr/include | 11 | HOSTCFLAGS_ucon.o += -I$(objtree)/usr/include |
12 | |||
13 | all: modules | ||
14 | |||
15 | modules clean: | ||
16 | $(MAKE) -C ../.. SUBDIRS=$(PWD) $@ | ||
diff --git a/Documentation/connector/cn_test.c b/Documentation/connector/cn_test.c index 6a5be5d5c8e4..1711adc33373 100644 --- a/Documentation/connector/cn_test.c +++ b/Documentation/connector/cn_test.c | |||
@@ -19,6 +19,8 @@ | |||
19 | * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA | 19 | * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA |
20 | */ | 20 | */ |
21 | 21 | ||
22 | #define pr_fmt(fmt) "cn_test: " fmt | ||
23 | |||
22 | #include <linux/kernel.h> | 24 | #include <linux/kernel.h> |
23 | #include <linux/module.h> | 25 | #include <linux/module.h> |
24 | #include <linux/moduleparam.h> | 26 | #include <linux/moduleparam.h> |
@@ -27,18 +29,17 @@ | |||
27 | 29 | ||
28 | #include <linux/connector.h> | 30 | #include <linux/connector.h> |
29 | 31 | ||
30 | static struct cb_id cn_test_id = { 0x123, 0x456 }; | 32 | static struct cb_id cn_test_id = { CN_NETLINK_USERS + 3, 0x456 }; |
31 | static char cn_test_name[] = "cn_test"; | 33 | static char cn_test_name[] = "cn_test"; |
32 | static struct sock *nls; | 34 | static struct sock *nls; |
33 | static struct timer_list cn_test_timer; | 35 | static struct timer_list cn_test_timer; |
34 | 36 | ||
35 | void cn_test_callback(void *data) | 37 | static void cn_test_callback(struct cn_msg *msg) |
36 | { | 38 | { |
37 | struct cn_msg *msg = (struct cn_msg *)data; | 39 | pr_info("%s: %lu: idx=%x, val=%x, seq=%u, ack=%u, len=%d: %s.\n", |
38 | 40 | __func__, jiffies, msg->id.idx, msg->id.val, | |
39 | printk("%s: %lu: idx=%x, val=%x, seq=%u, ack=%u, len=%d: %s.\n", | 41 | msg->seq, msg->ack, msg->len, |
40 | __func__, jiffies, msg->id.idx, msg->id.val, | 42 | msg->len ? (char *)msg->data : ""); |
41 | msg->seq, msg->ack, msg->len, (char *)msg->data); | ||
42 | } | 43 | } |
43 | 44 | ||
44 | /* | 45 | /* |
@@ -63,9 +64,7 @@ static int cn_test_want_notify(void) | |||
63 | 64 | ||
64 | skb = alloc_skb(size, GFP_ATOMIC); | 65 | skb = alloc_skb(size, GFP_ATOMIC); |
65 | if (!skb) { | 66 | if (!skb) { |
66 | printk(KERN_ERR "Failed to allocate new skb with size=%u.\n", | 67 | pr_err("failed to allocate new skb with size=%u\n", size); |
67 | size); | ||
68 | |||
69 | return -ENOMEM; | 68 | return -ENOMEM; |
70 | } | 69 | } |
71 | 70 | ||
@@ -114,12 +113,12 @@ static int cn_test_want_notify(void) | |||
114 | //netlink_broadcast(nls, skb, 0, ctl->group, GFP_ATOMIC); | 113 | //netlink_broadcast(nls, skb, 0, ctl->group, GFP_ATOMIC); |
115 | netlink_unicast(nls, skb, 0, 0); | 114 | netlink_unicast(nls, skb, 0, 0); |
116 | 115 | ||
117 | printk(KERN_INFO "Request was sent. Group=0x%x.\n", ctl->group); | 116 | pr_info("request was sent: group=0x%x\n", ctl->group); |
118 | 117 | ||
119 | return 0; | 118 | return 0; |
120 | 119 | ||
121 | nlmsg_failure: | 120 | nlmsg_failure: |
122 | printk(KERN_ERR "Failed to send %u.%u\n", msg->seq, msg->ack); | 121 | pr_err("failed to send %u.%u\n", msg->seq, msg->ack); |
123 | kfree_skb(skb); | 122 | kfree_skb(skb); |
124 | return -EINVAL; | 123 | return -EINVAL; |
125 | } | 124 | } |
@@ -131,6 +130,8 @@ static void cn_test_timer_func(unsigned long __data) | |||
131 | struct cn_msg *m; | 130 | struct cn_msg *m; |
132 | char data[32]; | 131 | char data[32]; |
133 | 132 | ||
133 | pr_debug("%s: timer fired with data %lu\n", __func__, __data); | ||
134 | |||
134 | m = kzalloc(sizeof(*m) + sizeof(data), GFP_ATOMIC); | 135 | m = kzalloc(sizeof(*m) + sizeof(data), GFP_ATOMIC); |
135 | if (m) { | 136 | if (m) { |
136 | 137 | ||
@@ -150,7 +151,7 @@ static void cn_test_timer_func(unsigned long __data) | |||
150 | 151 | ||
151 | cn_test_timer_counter++; | 152 | cn_test_timer_counter++; |
152 | 153 | ||
153 | mod_timer(&cn_test_timer, jiffies + HZ); | 154 | mod_timer(&cn_test_timer, jiffies + msecs_to_jiffies(1000)); |
154 | } | 155 | } |
155 | 156 | ||
156 | static int cn_test_init(void) | 157 | static int cn_test_init(void) |
@@ -168,8 +169,10 @@ static int cn_test_init(void) | |||
168 | } | 169 | } |
169 | 170 | ||
170 | setup_timer(&cn_test_timer, cn_test_timer_func, 0); | 171 | setup_timer(&cn_test_timer, cn_test_timer_func, 0); |
171 | cn_test_timer.expires = jiffies + HZ; | 172 | mod_timer(&cn_test_timer, jiffies + msecs_to_jiffies(1000)); |
172 | add_timer(&cn_test_timer); | 173 | |
174 | pr_info("initialized with id={%u.%u}\n", | ||
175 | cn_test_id.idx, cn_test_id.val); | ||
173 | 176 | ||
174 | return 0; | 177 | return 0; |
175 | 178 | ||
diff --git a/Documentation/connector/connector.txt b/Documentation/connector/connector.txt index ad6e0ba7b38c..81e6bf6ead57 100644 --- a/Documentation/connector/connector.txt +++ b/Documentation/connector/connector.txt | |||
@@ -5,10 +5,10 @@ Kernel Connector. | |||
5 | Kernel connector - new netlink based userspace <-> kernel space easy | 5 | Kernel connector - new netlink based userspace <-> kernel space easy |
6 | to use communication module. | 6 | to use communication module. |
7 | 7 | ||
8 | Connector driver adds possibility to connect various agents using | 8 | The Connector driver makes it easy to connect various agents using a |
9 | netlink based network. One must register callback and | 9 | netlink based network. One must register a callback and an identifier. |
10 | identifier. When driver receives special netlink message with | 10 | When the driver receives a special netlink message with the appropriate |
11 | appropriate identifier, appropriate callback will be called. | 11 | identifier, the appropriate callback will be called. |
12 | 12 | ||
13 | From the userspace point of view it's quite straightforward: | 13 | From the userspace point of view it's quite straightforward: |
14 | 14 | ||
@@ -17,10 +17,10 @@ From the userspace point of view it's quite straightforward: | |||
17 | send(); | 17 | send(); |
18 | recv(); | 18 | recv(); |
19 | 19 | ||
20 | But if kernelspace want to use full power of such connections, driver | 20 | But if kernelspace wants to use the full power of such connections, the |
21 | writer must create special sockets, must know about struct sk_buff | 21 | driver writer must create special sockets, must know about struct sk_buff |
22 | handling... Connector allows any kernelspace agents to use netlink | 22 | handling, etc... The Connector driver allows any kernelspace agents to use |
23 | based networking for inter-process communication in a significantly | 23 | netlink based networking for inter-process communication in a significantly |
24 | easier way: | 24 | easier way: |
25 | 25 | ||
26 | int cn_add_callback(struct cb_id *id, char *name, void (*callback) (void *)); | 26 | int cn_add_callback(struct cb_id *id, char *name, void (*callback) (void *)); |
@@ -32,15 +32,15 @@ struct cb_id | |||
32 | __u32 val; | 32 | __u32 val; |
33 | }; | 33 | }; |
34 | 34 | ||
35 | idx and val are unique identifiers which must be registered in | 35 | idx and val are unique identifiers which must be registered in the |
36 | connector.h for in-kernel usage. void (*callback) (void *) - is a | 36 | connector.h header for in-kernel usage. void (*callback) (void *) is a |
37 | callback function which will be called when message with above idx.val | 37 | callback function which will be called when a message with above idx.val |
38 | will be received by connector core. Argument for that function must | 38 | is received by the connector core. The argument for that function must |
39 | be dereferenced to struct cn_msg *. | 39 | be dereferenced to struct cn_msg *. |
40 | 40 | ||
41 | struct cn_msg | 41 | struct cn_msg |
42 | { | 42 | { |
43 | struct cb_id id; | 43 | struct cb_id id; |
44 | 44 | ||
45 | __u32 seq; | 45 | __u32 seq; |
46 | __u32 ack; | 46 | __u32 ack; |
@@ -55,92 +55,95 @@ Connector interfaces. | |||
55 | 55 | ||
56 | int cn_add_callback(struct cb_id *id, char *name, void (*callback) (void *)); | 56 | int cn_add_callback(struct cb_id *id, char *name, void (*callback) (void *)); |
57 | 57 | ||
58 | Registers new callback with connector core. | 58 | Registers new callback with connector core. |
59 | 59 | ||
60 | struct cb_id *id - unique connector's user identifier. | 60 | struct cb_id *id - unique connector's user identifier. |
61 | It must be registered in connector.h for legal in-kernel users. | 61 | It must be registered in connector.h for legal in-kernel users. |
62 | char *name - connector's callback symbolic name. | 62 | char *name - connector's callback symbolic name. |
63 | void (*callback) (void *) - connector's callback. | 63 | void (*callback) (void *) - connector's callback. |
64 | Argument must be dereferenced to struct cn_msg *. | 64 | Argument must be dereferenced to struct cn_msg *. |
65 | 65 | ||
66 | |||
66 | void cn_del_callback(struct cb_id *id); | 67 | void cn_del_callback(struct cb_id *id); |
67 | 68 | ||
68 | Unregisters new callback with connector core. | 69 | Unregisters new callback with connector core. |
70 | |||
71 | struct cb_id *id - unique connector's user identifier. | ||
69 | 72 | ||
70 | struct cb_id *id - unique connector's user identifier. | ||
71 | 73 | ||
72 | int cn_netlink_send(struct cn_msg *msg, u32 __groups, int gfp_mask); | 74 | int cn_netlink_send(struct cn_msg *msg, u32 __groups, int gfp_mask); |
73 | 75 | ||
74 | Sends message to the specified groups. It can be safely called from | 76 | Sends message to the specified groups. It can be safely called from |
75 | softirq context, but may silently fail under strong memory pressure. | 77 | softirq context, but may silently fail under strong memory pressure. |
76 | If there are no listeners for given group -ESRCH can be returned. | 78 | If there are no listeners for given group -ESRCH can be returned. |
77 | 79 | ||
78 | struct cn_msg * - message header(with attached data). | 80 | struct cn_msg * - message header(with attached data). |
79 | u32 __group - destination group. | 81 | u32 __group - destination group. |
80 | If __group is zero, then appropriate group will | 82 | If __group is zero, then appropriate group will |
81 | be searched through all registered connector users, | 83 | be searched through all registered connector users, |
82 | and message will be delivered to the group which was | 84 | and message will be delivered to the group which was |
83 | created for user with the same ID as in msg. | 85 | created for user with the same ID as in msg. |
84 | If __group is not zero, then message will be delivered | 86 | If __group is not zero, then message will be delivered |
85 | to the specified group. | 87 | to the specified group. |
86 | int gfp_mask - GFP mask. | 88 | int gfp_mask - GFP mask. |
87 | 89 | ||
88 | Note: When registering new callback user, connector core assigns | 90 | Note: When registering new callback user, connector core assigns |
89 | netlink group to the user which is equal to it's id.idx. | 91 | netlink group to the user which is equal to it's id.idx. |
90 | 92 | ||
91 | /*****************************************/ | 93 | /*****************************************/ |
92 | Protocol description. | 94 | Protocol description. |
93 | /*****************************************/ | 95 | /*****************************************/ |
94 | 96 | ||
95 | Current offers transport layer with fixed header. Recommended | 97 | The current framework offers a transport layer with fixed headers. The |
96 | protocol which uses such header is following: | 98 | recommended protocol which uses such a header is as follows: |
97 | 99 | ||
98 | msg->seq and msg->ack are used to determine message genealogy. When | 100 | msg->seq and msg->ack are used to determine message genealogy. When |
99 | someone sends message it puts there locally unique sequence and random | 101 | someone sends a message, they use a locally unique sequence and random |
100 | acknowledge numbers. Sequence number may be copied into | 102 | acknowledge number. The sequence number may be copied into |
101 | nlmsghdr->nlmsg_seq too. | 103 | nlmsghdr->nlmsg_seq too. |
102 | 104 | ||
103 | Sequence number is incremented with each message to be sent. | 105 | The sequence number is incremented with each message sent. |
104 | 106 | ||
105 | If we expect reply to our message, then sequence number in received | 107 | If you expect a reply to the message, then the sequence number in the |
106 | message MUST be the same as in original message, and acknowledge | 108 | received message MUST be the same as in the original message, and the |
107 | number MUST be the same + 1. | 109 | acknowledge number MUST be the same + 1. |
108 | 110 | ||
109 | If we receive message and it's sequence number is not equal to one we | 111 | If we receive a message and its sequence number is not equal to one we |
110 | are expecting, then it is new message. If we receive message and it's | 112 | are expecting, then it is a new message. If we receive a message and |
111 | sequence number is the same as one we are expecting, but it's | 113 | its sequence number is the same as one we are expecting, but its |
112 | acknowledge is not equal acknowledge number in original message + 1, | 114 | acknowledge is not equal to the acknowledge number in the original |
113 | then it is new message. | 115 | message + 1, then it is a new message. |
114 | 116 | ||
115 | Obviously, protocol header contains above id. | 117 | Obviously, the protocol header contains the above id. |
116 | 118 | ||
117 | connector allows event notification in the following form: kernel | 119 | The connector allows event notification in the following form: kernel |
118 | driver or userspace process can ask connector to notify it when | 120 | driver or userspace process can ask connector to notify it when |
119 | selected id's will be turned on or off(registered or unregistered it's | 121 | selected ids will be turned on or off (registered or unregistered its |
120 | callback). It is done by sending special command to connector | 122 | callback). It is done by sending a special command to the connector |
121 | driver(it also registers itself with id={-1, -1}). | 123 | driver (it also registers itself with id={-1, -1}). |
122 | 124 | ||
123 | As example of usage Documentation/connector now contains cn_test.c - | 125 | An example of this usage can be found in the cn_test.c module which |
124 | testing module which uses connector to request notification and to | 126 | uses the connector to request notification and to send messages. |
125 | send messages. | ||
126 | 127 | ||
127 | /*****************************************/ | 128 | /*****************************************/ |
128 | Reliability. | 129 | Reliability. |
129 | /*****************************************/ | 130 | /*****************************************/ |
130 | 131 | ||
131 | Netlink itself is not reliable protocol, that means that messages can | 132 | Netlink itself is not a reliable protocol. That means that messages can |
132 | be lost due to memory pressure or process' receiving queue overflowed, | 133 | be lost due to memory pressure or process' receiving queue overflowed, |
133 | so caller is warned must be prepared. That is why struct cn_msg [main | 134 | so the caller must be prepared for lost messages. That is why the struct |
134 | connector's message header] contains u32 seq and u32 ack fields. | 135 | cn_msg [main connector's message header] contains u32 seq and u32 ack |
136 | fields. | ||
135 | 137 | ||
136 | /*****************************************/ | 138 | /*****************************************/ |
137 | Userspace usage. | 139 | Userspace usage. |
138 | /*****************************************/ | 140 | /*****************************************/ |
141 | |||
139 | 2.6.14 has a new netlink socket implementation, which by default does not | 142 | 2.6.14 has a new netlink socket implementation, which by default does not |
140 | allow to send data to netlink groups other than 1. | 143 | allow people to send data to netlink groups other than 1. |
141 | So, if to use netlink socket (for example using connector) | 144 | So, if you wish to use a netlink socket (for example using connector) |
142 | with different group number userspace application must subscribe to | 145 | with a different group number, the userspace application must subscribe to |
143 | that group. It can be achieved by following pseudocode: | 146 | that group first. It can be achieved by the following pseudocode: |
144 | 147 | ||
145 | s = socket(PF_NETLINK, SOCK_DGRAM, NETLINK_CONNECTOR); | 148 | s = socket(PF_NETLINK, SOCK_DGRAM, NETLINK_CONNECTOR); |
146 | 149 | ||
@@ -160,8 +163,8 @@ if (bind(s, (struct sockaddr *)&l_local, sizeof(struct sockaddr_nl)) == -1) { | |||
160 | } | 163 | } |
161 | 164 | ||
162 | Where 270 above is SOL_NETLINK, and 1 is a NETLINK_ADD_MEMBERSHIP socket | 165 | Where 270 above is SOL_NETLINK, and 1 is a NETLINK_ADD_MEMBERSHIP socket |
163 | option. To drop multicast subscription one should call above socket option | 166 | option. To drop a multicast subscription, one should call the above socket |
164 | with NETLINK_DROP_MEMBERSHIP parameter which is defined as 0. | 167 | option with the NETLINK_DROP_MEMBERSHIP parameter which is defined as 0. |
165 | 168 | ||
166 | 2.6.14 netlink code only allows to select a group which is less or equal to | 169 | 2.6.14 netlink code only allows to select a group which is less or equal to |
167 | the maximum group number, which is used at netlink_kernel_create() time. | 170 | the maximum group number, which is used at netlink_kernel_create() time. |
diff --git a/Documentation/connector/ucon.c b/Documentation/connector/ucon.c index c5092ad0ce4b..4848db8c71ff 100644 --- a/Documentation/connector/ucon.c +++ b/Documentation/connector/ucon.c | |||
@@ -30,18 +30,24 @@ | |||
30 | 30 | ||
31 | #include <arpa/inet.h> | 31 | #include <arpa/inet.h> |
32 | 32 | ||
33 | #include <stdbool.h> | ||
33 | #include <stdio.h> | 34 | #include <stdio.h> |
34 | #include <stdlib.h> | 35 | #include <stdlib.h> |
35 | #include <unistd.h> | 36 | #include <unistd.h> |
36 | #include <string.h> | 37 | #include <string.h> |
37 | #include <errno.h> | 38 | #include <errno.h> |
38 | #include <time.h> | 39 | #include <time.h> |
40 | #include <getopt.h> | ||
39 | 41 | ||
40 | #include <linux/connector.h> | 42 | #include <linux/connector.h> |
41 | 43 | ||
42 | #define DEBUG | 44 | #define DEBUG |
43 | #define NETLINK_CONNECTOR 11 | 45 | #define NETLINK_CONNECTOR 11 |
44 | 46 | ||
47 | /* Hopefully your userspace connector.h matches this kernel */ | ||
48 | #define CN_TEST_IDX (CN_NETLINK_USERS + 3) | ||
49 | #define CN_TEST_VAL 0x456 | ||
50 | |||
45 | #ifdef DEBUG | 51 | #ifdef DEBUG |
46 | #define ulog(f, a...) fprintf(stdout, f, ##a) | 52 | #define ulog(f, a...) fprintf(stdout, f, ##a) |
47 | #else | 53 | #else |
@@ -83,6 +89,25 @@ static int netlink_send(int s, struct cn_msg *msg) | |||
83 | return err; | 89 | return err; |
84 | } | 90 | } |
85 | 91 | ||
92 | static void usage(void) | ||
93 | { | ||
94 | printf( | ||
95 | "Usage: ucon [options] [output file]\n" | ||
96 | "\n" | ||
97 | "\t-h\tthis help screen\n" | ||
98 | "\t-s\tsend buffers to the test module\n" | ||
99 | "\n" | ||
100 | "The default behavior of ucon is to subscribe to the test module\n" | ||
101 | "and wait for state messages. Any that are received are dumped to the\n" | ||
102 | "specified output file (or stdout). The test module is assumed to\n" | ||
103 | "have an id of {%u.%u}\n" | ||
104 | "\n" | ||
105 | "If you get no output, then verify the cn_test module id matches\n" | ||
106 | "the expected id above.\n" | ||
107 | , CN_TEST_IDX, CN_TEST_VAL | ||
108 | ); | ||
109 | } | ||
110 | |||
86 | int main(int argc, char *argv[]) | 111 | int main(int argc, char *argv[]) |
87 | { | 112 | { |
88 | int s; | 113 | int s; |
@@ -94,17 +119,34 @@ int main(int argc, char *argv[]) | |||
94 | FILE *out; | 119 | FILE *out; |
95 | time_t tm; | 120 | time_t tm; |
96 | struct pollfd pfd; | 121 | struct pollfd pfd; |
122 | bool send_msgs = false; | ||
97 | 123 | ||
98 | if (argc < 2) | 124 | while ((s = getopt(argc, argv, "hs")) != -1) { |
99 | out = stdout; | 125 | switch (s) { |
100 | else { | 126 | case 's': |
101 | out = fopen(argv[1], "a+"); | 127 | send_msgs = true; |
128 | break; | ||
129 | |||
130 | case 'h': | ||
131 | usage(); | ||
132 | return 0; | ||
133 | |||
134 | default: | ||
135 | /* getopt() outputs an error for us */ | ||
136 | usage(); | ||
137 | return 1; | ||
138 | } | ||
139 | } | ||
140 | |||
141 | if (argc != optind) { | ||
142 | out = fopen(argv[optind], "a+"); | ||
102 | if (!out) { | 143 | if (!out) { |
103 | ulog("Unable to open %s for writing: %s\n", | 144 | ulog("Unable to open %s for writing: %s\n", |
104 | argv[1], strerror(errno)); | 145 | argv[optind], strerror(errno)); |
105 | out = stdout; | 146 | out = stdout; |
106 | } | 147 | } |
107 | } | 148 | } else |
149 | out = stdout; | ||
108 | 150 | ||
109 | memset(buf, 0, sizeof(buf)); | 151 | memset(buf, 0, sizeof(buf)); |
110 | 152 | ||
@@ -115,9 +157,11 @@ int main(int argc, char *argv[]) | |||
115 | } | 157 | } |
116 | 158 | ||
117 | l_local.nl_family = AF_NETLINK; | 159 | l_local.nl_family = AF_NETLINK; |
118 | l_local.nl_groups = 0x123; /* bitmask of requested groups */ | 160 | l_local.nl_groups = -1; /* bitmask of requested groups */ |
119 | l_local.nl_pid = 0; | 161 | l_local.nl_pid = 0; |
120 | 162 | ||
163 | ulog("subscribing to %u.%u\n", CN_TEST_IDX, CN_TEST_VAL); | ||
164 | |||
121 | if (bind(s, (struct sockaddr *)&l_local, sizeof(struct sockaddr_nl)) == -1) { | 165 | if (bind(s, (struct sockaddr *)&l_local, sizeof(struct sockaddr_nl)) == -1) { |
122 | perror("bind"); | 166 | perror("bind"); |
123 | close(s); | 167 | close(s); |
@@ -130,15 +174,15 @@ int main(int argc, char *argv[]) | |||
130 | setsockopt(s, SOL_NETLINK, NETLINK_ADD_MEMBERSHIP, &on, sizeof(on)); | 174 | setsockopt(s, SOL_NETLINK, NETLINK_ADD_MEMBERSHIP, &on, sizeof(on)); |
131 | } | 175 | } |
132 | #endif | 176 | #endif |
133 | if (0) { | 177 | if (send_msgs) { |
134 | int i, j; | 178 | int i, j; |
135 | 179 | ||
136 | memset(buf, 0, sizeof(buf)); | 180 | memset(buf, 0, sizeof(buf)); |
137 | 181 | ||
138 | data = (struct cn_msg *)buf; | 182 | data = (struct cn_msg *)buf; |
139 | 183 | ||
140 | data->id.idx = 0x123; | 184 | data->id.idx = CN_TEST_IDX; |
141 | data->id.val = 0x456; | 185 | data->id.val = CN_TEST_VAL; |
142 | data->seq = seq++; | 186 | data->seq = seq++; |
143 | data->ack = 0; | 187 | data->ack = 0; |
144 | data->len = 0; | 188 | data->len = 0; |
diff --git a/Documentation/cpu-freq/user-guide.txt b/Documentation/cpu-freq/user-guide.txt index 5d5f5fadd1c2..2a5b850847c0 100644 --- a/Documentation/cpu-freq/user-guide.txt +++ b/Documentation/cpu-freq/user-guide.txt | |||
@@ -176,7 +176,9 @@ scaling_governor, and by "echoing" the name of another | |||
176 | work on some specific architectures or | 176 | work on some specific architectures or |
177 | processors. | 177 | processors. |
178 | 178 | ||
179 | cpuinfo_cur_freq : Current speed of the CPU, in KHz. | 179 | cpuinfo_cur_freq : Current frequency of the CPU as obtained from |
180 | the hardware, in KHz. This is the frequency | ||
181 | the CPU actually runs at. | ||
180 | 182 | ||
181 | scaling_available_frequencies : List of available frequencies, in KHz. | 183 | scaling_available_frequencies : List of available frequencies, in KHz. |
182 | 184 | ||
@@ -196,7 +198,10 @@ related_cpus : List of CPUs that need some sort of frequency | |||
196 | 198 | ||
197 | scaling_driver : Hardware driver for cpufreq. | 199 | scaling_driver : Hardware driver for cpufreq. |
198 | 200 | ||
199 | scaling_cur_freq : Current frequency of the CPU, in KHz. | 201 | scaling_cur_freq : Current frequency of the CPU as determined by |
202 | the governor and cpufreq core, in KHz. This is | ||
203 | the frequency the kernel thinks the CPU runs | ||
204 | at. | ||
200 | 205 | ||
201 | If you have selected the "userspace" governor which allows you to | 206 | If you have selected the "userspace" governor which allows you to |
202 | set the CPU operating frequency to a specific value, you can read out | 207 | set the CPU operating frequency to a specific value, you can read out |
diff --git a/Documentation/dontdiff b/Documentation/dontdiff index 88519daab6e9..e1efc400bed6 100644 --- a/Documentation/dontdiff +++ b/Documentation/dontdiff | |||
@@ -152,7 +152,6 @@ piggy.gz | |||
152 | piggyback | 152 | piggyback |
153 | pnmtologo | 153 | pnmtologo |
154 | ppc_defs.h* | 154 | ppc_defs.h* |
155 | promcon_tbl.c | ||
156 | pss_boot.h | 155 | pss_boot.h |
157 | qconf | 156 | qconf |
158 | raid6altivec*.c | 157 | raid6altivec*.c |
diff --git a/Documentation/feature-removal-schedule.txt b/Documentation/feature-removal-schedule.txt index 09e031c55887..fa75220f8d34 100644 --- a/Documentation/feature-removal-schedule.txt +++ b/Documentation/feature-removal-schedule.txt | |||
@@ -6,6 +6,35 @@ be removed from this file. | |||
6 | 6 | ||
7 | --------------------------- | 7 | --------------------------- |
8 | 8 | ||
9 | What: PRISM54 | ||
10 | When: 2.6.34 | ||
11 | |||
12 | Why: prism54 FullMAC PCI / Cardbus devices used to be supported only by the | ||
13 | prism54 wireless driver. After Intersil stopped selling these | ||
14 | devices in favor of the newer, more flexible SoftMAC devices, | ||
15 | a SoftMAC device driver was required, and prism54 did not support | ||
16 | them. The p54pci driver now exists and has been present in the kernel for | ||
17 | a while. This driver supports both SoftMAC devices and FullMAC devices. | ||
18 | The main difference between these devices was the amount of memory which | ||
19 | could be used for the firmware. The SoftMAC devices support a smaller | ||
20 | amount of memory. Because of this, the SoftMAC firmware fits into the | ||
21 | FullMAC devices' memory. p54pci supports not only PCI / Cardbus but also USB | ||
22 | and SPI. Since p54pci supports all devices prism54 supports, | ||
23 | you will have a conflict. I'm not quite sure how distributions are | ||
24 | handling this conflict right now. prism54 was kept around due to | ||
25 | claims that users may experience issues when using the SoftMAC driver. | ||
26 | Time has passed and users have not reported issues. If you use prism54 | ||
27 | and for whatever reason you cannot use p54pci, please let us know! | ||
28 | E-mail us at: linux-wireless@vger.kernel.org | ||
29 | |||
30 | For more information see the p54 wiki page: | ||
31 | |||
32 | http://wireless.kernel.org/en/users/Drivers/p54 | ||
33 | |||
34 | Who: Luis R. Rodriguez <lrodriguez@atheros.com> | ||
35 | |||
36 | --------------------------- | ||
37 | |||
9 | What: IRQF_SAMPLE_RANDOM | 38 | What: IRQF_SAMPLE_RANDOM |
10 | Check: IRQF_SAMPLE_RANDOM | 39 | Check: IRQF_SAMPLE_RANDOM |
11 | When: July 2009 | 40 | When: July 2009 |
@@ -206,24 +235,6 @@ Who: Len Brown <len.brown@intel.com> | |||
206 | 235 | ||
207 | --------------------------- | 236 | --------------------------- |
208 | 237 | ||
209 | What: libata spindown skipping and warning | ||
210 | When: Dec 2008 | ||
211 | Why: Some halt(8) implementations synchronize caches for and spin | ||
212 | down libata disks because libata didn't use to spin down disk on | ||
213 | system halt (only synchronized caches). | ||
214 | Spin down on system halt is now implemented. sysfs node | ||
215 | /sys/class/scsi_disk/h:c:i:l/manage_start_stop is present if | ||
216 | spin down support is available. | ||
217 | Because issuing spin down command to an already spun down disk | ||
218 | makes some disks spin up just to spin down again, libata tracks | ||
219 | device spindown status to skip the extra spindown command and | ||
220 | warn about it. | ||
221 | This is to give userspace tools the time to get updated and will | ||
222 | be removed after userspace is reasonably updated. | ||
223 | Who: Tejun Heo <htejun@gmail.com> | ||
224 | |||
225 | --------------------------- | ||
226 | |||
227 | What: i386/x86_64 bzImage symlinks | 238 | What: i386/x86_64 bzImage symlinks |
228 | When: April 2010 | 239 | When: April 2010 |
229 | 240 | ||
@@ -235,31 +246,6 @@ Who: Thomas Gleixner <tglx@linutronix.de> | |||
235 | --------------------------- | 246 | --------------------------- |
236 | 247 | ||
237 | What (Why): | 248 | What (Why): |
238 | - include/linux/netfilter_ipv4/ipt_TOS.h ipt_tos.h header files | ||
239 | (superseded by xt_TOS/xt_tos target & match) | ||
240 | |||
241 | - "forwarding" header files like ipt_mac.h in | ||
242 | include/linux/netfilter_ipv4/ and include/linux/netfilter_ipv6/ | ||
243 | |||
244 | - xt_CONNMARK match revision 0 | ||
245 | (superseded by xt_CONNMARK match revision 1) | ||
246 | |||
247 | - xt_MARK target revisions 0 and 1 | ||
248 | (superseded by xt_MARK match revision 2) | ||
249 | |||
250 | - xt_connmark match revision 0 | ||
251 | (superseded by xt_connmark match revision 1) | ||
252 | |||
253 | - xt_conntrack match revision 0 | ||
254 | (superseded by xt_conntrack match revision 1) | ||
255 | |||
256 | - xt_iprange match revision 0, | ||
257 | include/linux/netfilter_ipv4/ipt_iprange.h | ||
258 | (superseded by xt_iprange match revision 1) | ||
259 | |||
260 | - xt_mark match revision 0 | ||
261 | (superseded by xt_mark match revision 1) | ||
262 | |||
263 | - xt_recent: the old ipt_recent proc dir | 249 | - xt_recent: the old ipt_recent proc dir |
264 | (superseded by /proc/net/xt_recent) | 250 | (superseded by /proc/net/xt_recent) |
265 | 251 | ||
@@ -394,15 +380,6 @@ Who: Thomas Gleixner <tglx@linutronix.de> | |||
394 | 380 | ||
395 | ----------------------------- | 381 | ----------------------------- |
396 | 382 | ||
397 | What: obsolete generic irq defines and typedefs | ||
398 | When: 2.6.30 | ||
399 | Why: The defines and typedefs (hw_interrupt_type, no_irq_type, irq_desc_t) | ||
400 | have been kept around for migration reasons. After more than two years | ||
401 | it's time to remove them finally | ||
402 | Who: Thomas Gleixner <tglx@linutronix.de> | ||
403 | |||
404 | --------------------------- | ||
405 | |||
406 | What: fakephp and associated sysfs files in /sys/bus/pci/slots/ | 383 | What: fakephp and associated sysfs files in /sys/bus/pci/slots/ |
407 | When: 2011 | 384 | When: 2011 |
408 | Why: In 2.6.27, the semantics of /sys/bus/pci/slots was redefined to | 385 | Why: In 2.6.27, the semantics of /sys/bus/pci/slots was redefined to |
@@ -451,16 +428,6 @@ Who: Johannes Berg <johannes@sipsolutions.net> | |||
451 | 428 | ||
452 | ---------------------------- | 429 | ---------------------------- |
453 | 430 | ||
454 | What: CONFIG_X86_OLD_MCE | ||
455 | When: 2.6.32 | ||
456 | Why: Remove the old legacy 32bit machine check code. This has been | ||
457 | superseded by the newer machine check code from the 64bit port, | ||
458 | but the old version has been kept around for easier testing. Note this | ||
459 | doesn't impact the old P5 and WinChip machine check handlers. | ||
460 | Who: Andi Kleen <andi@firstfloor.org> | ||
461 | |||
462 | ---------------------------- | ||
463 | |||
464 | What: lock_policy_rwsem_* and unlock_policy_rwsem_* will not be | 431 | What: lock_policy_rwsem_* and unlock_policy_rwsem_* will not be |
465 | exported interface anymore. | 432 | exported interface anymore. |
466 | When: 2.6.33 | 433 | When: 2.6.33 |
@@ -468,3 +435,27 @@ Why: cpu_policy_rwsem has a new cleaner definition making it local to | |||
468 | cpufreq core and contained inside cpufreq.c. Other dependent | 435 | cpufreq core and contained inside cpufreq.c. Other dependent |
469 | drivers should not use it in order to safely avoid lockdep issues. | 436 | drivers should not use it in order to safely avoid lockdep issues. |
470 | Who: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> | 437 | Who: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> |
438 | |||
439 | ---------------------------- | ||
440 | |||
441 | What: sound-slot/service-* module aliases and related clutter in | ||
442 | sound/sound_core.c | ||
443 | When: August 2010 | ||
444 | Why: OSS sound_core grabs all legacy minors (0-255) of SOUND_MAJOR | ||
445 | (14) and requests modules using custom sound-slot/service-* | ||
446 | module aliases. The only benefit of doing this is allowing | ||
447 | use of custom module aliases which might as well be considered | ||
448 | a bug at this point. This preemptive claiming prevents | ||
449 | alternative OSS implementations. | ||
450 | |||
451 | Until the feature is removed, the kernel will request | ||
452 | both sound-slot/service-* and the standard char-major-* module | ||
453 | aliases and will allow the pre-claiming to be turned off selectively via | ||
454 | CONFIG_SOUND_OSS_CORE_PRECLAIM and the soundcore.preclaim_oss | ||
455 | kernel parameter. | ||
456 | |||
457 | After the transition phase is complete, both the custom module | ||
458 | aliases and the switches to disable pre-claiming will go away. This removal | ||
459 | will also allow making ALSA OSS emulation independent of | ||
460 | sound_core. The dependency will be broken then too. | ||
461 | Who: Tejun Heo <tj@kernel.org> | ||
diff --git a/Documentation/filesystems/ext4.txt b/Documentation/filesystems/ext4.txt index 7be02ac5fa36..18b5ec8cea45 100644 --- a/Documentation/filesystems/ext4.txt +++ b/Documentation/filesystems/ext4.txt | |||
@@ -134,15 +134,9 @@ ro Mount filesystem read only. Note that ext4 will | |||
134 | mount options "ro,noload" can be used to prevent | 134 | mount options "ro,noload" can be used to prevent |
135 | writes to the filesystem. | 135 | writes to the filesystem. |
136 | 136 | ||
137 | journal_checksum Enable checksumming of the journal transactions. | ||
138 | This will allow the recovery code in e2fsck and the | ||
139 | kernel to detect corruption in the kernel. It is a | ||
140 | compatible change and will be ignored by older kernels. | ||
141 | |||
142 | journal_async_commit Commit block can be written to disk without waiting | 137 | journal_async_commit Commit block can be written to disk without waiting |
143 | for descriptor blocks. If enabled older kernels cannot | 138 | for descriptor blocks. If enabled older kernels cannot |
144 | mount the device. This will enable 'journal_checksum' | 139 | mount the device. |
145 | internally. | ||
146 | 140 | ||
147 | journal=update Update the ext4 file system's journal to the current | 141 | journal=update Update the ext4 file system's journal to the current |
148 | format. | 142 | format. |
@@ -263,10 +257,18 @@ resuid=n The user ID which may use the reserved blocks. | |||
263 | 257 | ||
264 | sb=n Use alternate superblock at this location. | 258 | sb=n Use alternate superblock at this location. |
265 | 259 | ||
266 | quota | 260 | quota These options are ignored by the filesystem. They |
267 | noquota | 261 | noquota are used only by quota tools to recognize volumes |
268 | grpquota | 262 | grpquota where quota should be turned on. See documentation |
269 | usrquota | 263 | usrquota in the quota-tools package for more details |
264 | (http://sourceforge.net/projects/linuxquota). | ||
265 | |||
266 | jqfmt=<quota type> These options tell the filesystem details about quota | ||
267 | usrjquota=<file> so that quota information can be properly updated | ||
268 | grpjquota=<file> during journal replay. They replace the above | ||
269 | quota options. See documentation in the quota-tools | ||
270 | package for more details | ||
271 | (http://sourceforge.net/projects/linuxquota). | ||
270 | 272 | ||
271 | bh (*) ext4 associates buffer heads to data pages to | 273 | bh (*) ext4 associates buffer heads to data pages to |
272 | nobh (a) cache disk block mapping information | 274 | nobh (a) cache disk block mapping information |
diff --git a/Documentation/filesystems/gfs2-uevents.txt b/Documentation/filesystems/gfs2-uevents.txt new file mode 100644 index 000000000000..fd966dc9979a --- /dev/null +++ b/Documentation/filesystems/gfs2-uevents.txt | |||
@@ -0,0 +1,100 @@ | |||
1 | uevents and GFS2 | ||
2 | ================== | ||
3 | |||
4 | During the lifetime of a GFS2 mount, a number of uevents are generated. | ||
5 | This document explains what the events are and what they are used | ||
6 | for (by gfs_controld in gfs2-utils). | ||
7 | |||
8 | A list of GFS2 uevents | ||
9 | ----------------------- | ||
10 | |||
11 | 1. ADD | ||
12 | |||
13 | The ADD event occurs at mount time. It will always be the first | ||
14 | uevent generated by the newly created filesystem. If the mount | ||
15 | is successful, an ONLINE uevent will follow. If it is not successful | ||
16 | then a REMOVE uevent will follow. | ||
17 | |||
18 | The ADD uevent has two environment variables: SPECTATOR=[0|1] | ||
19 | and RDONLY=[0|1] that specify the spectator status (a read-only mount | ||
20 | with no journal assigned), and read-only (with journal assigned) status | ||
21 | of the filesystem respectively. | ||
22 | |||
23 | 2. ONLINE | ||
24 | |||
25 | The ONLINE uevent is generated after a successful mount or remount. It | ||
26 | has the same environment variables as the ADD uevent. The ONLINE | ||
27 | uevent, along with the two environment variables for spectator and | ||
28 | RDONLY are a relatively recent addition (2.6.32-rc+) and will not | ||
29 | be generated by older kernels. | ||
30 | |||
31 | 3. CHANGE | ||
32 | |||
33 | The CHANGE uevent is used in two places. One is when reporting the | ||
34 | successful mount of the filesystem by the first node (FIRSTMOUNT=Done). | ||
35 | This is used as a signal by gfs_controld that it is then ok for other | ||
36 | nodes in the cluster to mount the filesystem. | ||
37 | |||
38 | The other CHANGE uevent is used to inform of the completion | ||
39 | of journal recovery for one of the filesystem's journals. It has | ||
40 | two environment variables, JID= which specifies the journal id which | ||
41 | has just been recovered, and RECOVERY=[Done|Failed] to indicate the | ||
42 | success (or otherwise) of the operation. These uevents are generated | ||
43 | for every journal recovered, whether it is during the initial mount | ||
44 | process or as the result of gfs_controld requesting a specific journal | ||
45 | recovery via the /sys/fs/gfs2/<fsname>/lock_module/recovery file. | ||
46 | |||
47 | Because the CHANGE uevent was used (in early versions of gfs_controld) | ||
48 | without checking the environment variables to discover the state, we | ||
49 | cannot add any more functions to it without running the risk of | ||
50 | someone using an older version of the user tools and breaking their | ||
51 | cluster. For this reason the ONLINE uevent was used when adding a new | ||
52 | uevent for a successful mount or remount. | ||
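Because a CHANGE uevent can mean either "first mount done" or "journal recovery finished", a listener has to look at the environment variables rather than the event name alone. The following is a minimal sketch, assuming the environment arrives as a NULL-terminated array of "KEY=value" strings (as in a udev/netlink helper); the enum and function names are hypothetical and are not part of gfs2-utils or the kernel.

```c
#include <string.h>

/* Hypothetical classification of a GFS2 CHANGE uevent. */
enum gfs2_change_kind {
	GFS2_CHANGE_UNKNOWN,
	GFS2_CHANGE_FIRSTMOUNT_DONE,
	GFS2_CHANGE_RECOVERY_DONE,
	GFS2_CHANGE_RECOVERY_FAILED,
};

/* Return the value of KEY in a NULL-terminated "KEY=value" array, or NULL. */
static const char *uevent_get(const char *const *envp, const char *key)
{
	size_t klen = strlen(key);

	for (; *envp; envp++)
		if (strncmp(*envp, key, klen) == 0 && (*envp)[klen] == '=')
			return *envp + klen + 1;
	return NULL;
}

/*
 * Decide which kind of CHANGE uevent this is by inspecting FIRSTMOUNT,
 * RECOVERY and JID. *jid is set to the JID= value for recovery events.
 */
static enum gfs2_change_kind classify_gfs2_change(const char *const *envp,
						  const char **jid)
{
	const char *v;

	*jid = NULL;
	v = uevent_get(envp, "FIRSTMOUNT");
	if (v && strcmp(v, "Done") == 0)
		return GFS2_CHANGE_FIRSTMOUNT_DONE;

	v = uevent_get(envp, "RECOVERY");
	if (v) {
		*jid = uevent_get(envp, "JID");
		if (strcmp(v, "Done") == 0)
			return GFS2_CHANGE_RECOVERY_DONE;
		if (strcmp(v, "Failed") == 0)
			return GFS2_CHANGE_RECOVERY_FAILED;
	}
	return GFS2_CHANGE_UNKNOWN;
}
```

Anything that matches neither pattern is reported as unknown rather than guessed at, which is the same defensive stance that motivated adding ONLINE as a separate uevent.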
53 | |||
54 | 4. OFFLINE | ||
55 | |||
56 | The OFFLINE uevent is only generated due to filesystem errors and is used | ||
57 | as part of the "withdraw" mechanism. Currently this doesn't give any | ||
58 | information about what the error is, which is something that needs to | ||
59 | be fixed. | ||
60 | |||
61 | 5. REMOVE | ||
62 | |||
63 | The REMOVE uevent is generated at the end of an unsuccessful mount | ||
64 | or at the end of a umount of the filesystem. All REMOVE uevents will | ||
65 | have been preceded by at least an ADD uevent for the same filesystem, | ||
66 | and, unlike the other uevents, is generated automatically by the kernel's | ||
67 | kobject subsystem. | ||
68 | |||
69 | |||
70 | Information common to all GFS2 uevents (uevent environment variables) | ||
71 | ---------------------------------------------------------------------- | ||
72 | |||
73 | 1. LOCKTABLE= | ||
74 | |||
75 | The LOCKTABLE is a string, as supplied on the mount command | ||
76 | line (locktable=) or via fstab. It is used as a filesystem label | ||
77 | as well as providing the information for a lock_dlm mount to be | ||
78 | able to join the cluster. | ||
79 | |||
80 | 2. LOCKPROTO= | ||
81 | |||
82 | The LOCKPROTO is a string, and its value depends on what is set | ||
83 | on the mount command line, or via fstab. It will be either | ||
84 | lock_nolock or lock_dlm. In the future other lock managers | ||
85 | may be supported. | ||
86 | |||
87 | 3. JOURNALID= | ||
88 | |||
89 | If a journal is in use by the filesystem (journals are not | ||
90 | assigned for spectator mounts) then this will give the | ||
91 | numeric journal id in all GFS2 uevents. | ||
92 | |||
93 | 4. UUID= | ||
94 | |||
95 | With recent versions of gfs2-utils, mkfs.gfs2 writes a UUID | ||
96 | into the filesystem superblock. If it exists, this will | ||
97 | be included in every uevent relating to the filesystem. | ||
98 | |||
99 | |||
100 | |||
diff --git a/Documentation/filesystems/nfs.txt b/Documentation/filesystems/nfs.txt new file mode 100644 index 000000000000..f50f26ce6cd0 --- /dev/null +++ b/Documentation/filesystems/nfs.txt | |||
@@ -0,0 +1,98 @@ | |||
1 | |||
2 | The NFS client | ||
3 | ============== | ||
4 | |||
5 | The NFS version 2 protocol was first documented in RFC1094 (March 1989). | ||
6 | Since then two more major releases of NFS have been published, with NFSv3 | ||
7 | being documented in RFC1813 (June 1995), and NFSv4 in RFC3530 (April | ||
8 | 2003). | ||
9 | |||
10 | The Linux NFS client currently supports all the above published versions, | ||
11 | and work is in progress on adding support for minor version 1 of the NFSv4 | ||
12 | protocol. | ||
13 | |||
14 | The purpose of this document is to provide information on some of the | ||
15 | upcall interfaces that are used in order to provide the NFS client with | ||
16 | some of the information that it requires in order to fully comply with | ||
17 | the NFS spec. | ||
18 | |||
19 | The DNS resolver | ||
20 | ================ | ||
21 | |||
22 | NFSv4 allows for one server to refer the NFS client to data that has been | ||
23 | migrated onto another server by means of the special "fs_locations" | ||
24 | attribute. See | ||
25 | http://tools.ietf.org/html/rfc3530#section-6 | ||
26 | and | ||
27 | http://tools.ietf.org/html/draft-ietf-nfsv4-referrals-00 | ||
28 | |||
29 | The fs_locations information can take the form of either an ip address and | ||
30 | a path, or a DNS hostname and a path. The latter requires the NFS client to | ||
31 | do a DNS lookup in order to mount the new volume, and hence the need for an | ||
32 | upcall to allow userland to provide this service. | ||
33 | |||
34 | Assuming that the user has the 'rpc_pipefs' filesystem mounted in the usual | ||
35 | /var/lib/nfs/rpc_pipefs, the upcall consists of the following steps: | ||
36 | |||
37 | (1) The process checks the dns_resolve cache to see if it contains a | ||
38 | valid entry. If so, it returns that entry and exits. | ||
39 | |||
40 | (2) If no valid entry exists, the helper script '/sbin/nfs_cache_getent' | ||
41 | (may be changed using the 'nfs.cache_getent' kernel boot parameter) | ||
42 | is run, with two arguments: | ||
43 | - the cache name, "dns_resolve" | ||
44 | - the hostname to resolve | ||
45 | |||
46 | (3) After looking up the corresponding ip address, the helper script | ||
47 | writes the result into the rpc_pipefs pseudo-file | ||
48 | '/var/lib/nfs/rpc_pipefs/cache/dns_resolve/channel' | ||
49 | in the following (text) format: | ||
50 | |||
51 | "<ip address> <hostname> <ttl>\n" | ||
52 | |||
53 | Where <ip address> is in the usual IPv4 (123.45.67.89) or IPv6 | ||
54 | (ffee:ddcc:bbaa:9988:7766:5544:3322:1100, ffee::1100, ...) format. | ||
55 | <hostname> is identical to the second argument of the helper | ||
56 | script, and <ttl> is the 'time to live' of this cache entry (in | ||
57 | units of seconds). | ||
58 | |||
59 | Note: If <ip address> is invalid, say the string "0", then a negative | ||
60 | entry is created, which will cause the kernel to treat the hostname | ||
61 | as having no valid DNS translation. | ||
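A helper written in C rather than shell would have to produce exactly the same one-line reply before writing it to the channel file. This is a minimal sketch that only builds the line in the documented "<ip address> <hostname> <ttl>\n" layout; the function name is hypothetical, and opening and writing /var/lib/nfs/rpc_pipefs/cache/dns_resolve/channel is left to the caller.

```c
#include <stdio.h>

/*
 * Format one dns_resolve cache reply. Passing "0" as addr yields a
 * negative entry. Returns the number of characters written, or -1 if
 * the buffer is too small to hold the whole line.
 */
static int format_dns_resolve_reply(char *buf, size_t len, const char *addr,
				    const char *hostname, unsigned int ttl)
{
	int n = snprintf(buf, len, "%s %s %u\n", addr, hostname, ttl);

	if (n < 0 || (size_t)n >= len)
		return -1;
	return n;
}
```

For example, format_dns_resolve_reply(buf, sizeof(buf), "0", "nohost", 600) produces the negative entry "0 nohost 600\n".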
62 | |||
63 | |||
64 | |||
65 | |||
66 | A basic sample /sbin/nfs_cache_getent | ||
67 | ===================================== | ||
68 | |||
69 | #!/bin/bash | ||
70 | # | ||
71 | ttl=600 | ||
72 | # | ||
73 | cut=/usr/bin/cut | ||
74 | getent=/usr/bin/getent | ||
75 | rpc_pipefs=/var/lib/nfs/rpc_pipefs | ||
76 | # | ||
77 | die() | ||
78 | { | ||
79 | echo "Usage: $0 cache_name entry_name" | ||
80 | exit 1 | ||
81 | } | ||
82 | |||
83 | [ $# -lt 2 ] && die | ||
84 | cachename="$1" | ||
85 | cache_path=${rpc_pipefs}/cache/${cachename}/channel | ||
86 | |||
87 | case "${cachename}" in | ||
88 | dns_resolve) | ||
89 | name="$2" | ||
90 | result="$(${getent} hosts "${name}" | ${cut} -f1 -d' ' | head -n 1)" | ||
91 | [ -z "${result}" ] && result="0" | ||
92 | ;; | ||
93 | *) | ||
94 | die | ||
95 | ;; | ||
96 | esac | ||
97 | echo "${result} ${name} ${ttl}" >${cache_path} | ||
98 | |||
diff --git a/Documentation/filesystems/seq_file.txt b/Documentation/filesystems/seq_file.txt index b843743aa0b5..0d15ebccf5b0 100644 --- a/Documentation/filesystems/seq_file.txt +++ b/Documentation/filesystems/seq_file.txt | |||
@@ -46,7 +46,7 @@ better to do. The file is seekable, in that one can do something like the | |||
46 | following: | 46 | following: |
47 | 47 | ||
48 | dd if=/proc/sequence of=out1 count=1 | 48 | dd if=/proc/sequence of=out1 count=1 |
49 | dd if=/proc/sequence skip=1 out=out2 count=1 | 49 | dd if=/proc/sequence skip=1 of=out2 count=1 |
50 | 50 | ||
51 | Then concatenate the output files out1 and out2 and get the right | 51 | Then concatenate the output files out1 and out2 and get the right |
52 | result. Yes, it is a thoroughly useless module, but the point is to show | 52 | result. Yes, it is a thoroughly useless module, but the point is to show |
diff --git a/Documentation/flexible-arrays.txt b/Documentation/flexible-arrays.txt new file mode 100644 index 000000000000..84eb26808dee --- /dev/null +++ b/Documentation/flexible-arrays.txt | |||
@@ -0,0 +1,99 @@ | |||
1 | Using flexible arrays in the kernel | ||
2 | Last updated for 2.6.31 | ||
3 | Jonathan Corbet <corbet@lwn.net> | ||
4 | |||
5 | Large contiguous memory allocations can be unreliable in the Linux kernel. | ||
6 | Kernel programmers will sometimes respond to this problem by allocating | ||
7 | pages with vmalloc(). This solution is not ideal, though. On 32-bit systems, | ||
8 | memory from vmalloc() must be mapped into a relatively small address space; | ||
9 | it's easy to run out. On SMP systems, the page table changes required by | ||
10 | vmalloc() allocations can require expensive cross-processor interrupts on | ||
11 | all CPUs. And, on all systems, use of space in the vmalloc() range | ||
12 | increases pressure on the translation lookaside buffer (TLB), reducing the | ||
13 | performance of the system. | ||
14 | |||
15 | In many cases, the need for memory from vmalloc() can be eliminated by | ||
16 | piecing together an array from smaller parts; the flexible array library | ||
17 | exists to make this task easier. | ||
18 | |||
19 | A flexible array holds an arbitrary (within limits) number of fixed-sized | ||
20 | objects, accessed via an integer index. Sparse arrays are handled | ||
21 | reasonably well. Only single-page allocations are made, so memory | ||
22 | allocation failures should be relatively rare. The downsides are that the | ||
23 | arrays cannot be indexed directly, individual object size cannot exceed the | ||
24 | system page size, and putting data into a flexible array requires a copy | ||
25 | operation. It's also worth noting that flexible arrays do no internal | ||
26 | locking at all; if concurrent access to an array is possible, then the | ||
27 | caller must arrange for appropriate mutual exclusion. | ||
28 | |||
29 | The creation of a flexible array is done with: | ||
30 | |||
31 | #include <linux/flex_array.h> | ||
32 | |||
33 | struct flex_array *flex_array_alloc(int element_size, | ||
34 | unsigned int total, | ||
35 | gfp_t flags); | ||
36 | |||
37 | The individual object size is provided by element_size, while total is the | ||
38 | maximum number of objects which can be stored in the array. The flags | ||
39 | argument is passed directly to the internal memory allocation calls. With | ||
40 | the current code, using flags to ask for high memory is likely to lead to | ||
41 | notably unpleasant side effects. | ||
42 | |||
43 | Storing data into a flexible array is accomplished with a call to: | ||
44 | |||
45 | int flex_array_put(struct flex_array *array, unsigned int element_nr, | ||
46 | void *src, gfp_t flags); | ||
47 | |||
48 | This call will copy the data from src into the array, in the position | ||
49 | indicated by element_nr (which must be less than the maximum specified when | ||
50 | the array was created). If any memory allocations must be performed, flags | ||
51 | will be used. The return value is zero on success, a negative error code | ||
52 | otherwise. | ||
53 | |||
54 | There might possibly be a need to store data into a flexible array while | ||
55 | running in some sort of atomic context; in this situation, sleeping in the | ||
56 | memory allocator would be a bad thing. That can be avoided by using | ||
57 | GFP_ATOMIC for the flags value, but, often, there is a better way. The | ||
58 | trick is to ensure that any needed memory allocations are done before | ||
59 | entering atomic context, using: | ||
60 | |||
61 | int flex_array_prealloc(struct flex_array *array, unsigned int start, | ||
62 | unsigned int end, gfp_t flags); | ||
63 | |||
64 | This function will ensure that memory for the elements indexed in the range | ||
65 | defined by start and end has been allocated. Thereafter, a | ||
66 | flex_array_put() call on an element in that range is guaranteed not to | ||
67 | block. | ||
68 | |||
69 | Getting data back out of the array is done with: | ||
70 | |||
71 | void *flex_array_get(struct flex_array *fa, unsigned int element_nr); | ||
72 | |||
73 | The return value is a pointer to the data element, or NULL if that | ||
74 | particular element has never been allocated. | ||
75 | |||
76 | Note that it is possible to get back a valid pointer for an element which | ||
77 | has never been stored in the array. Memory for array elements is allocated | ||
78 | one page at a time; a single allocation could provide memory for several | ||
79 | adjacent elements. The flexible array code does not know if a specific | ||
80 | element has been written; it only knows if the associated memory is | ||
81 | present. So a flex_array_get() call on an element which was never stored | ||
82 | in the array has the potential to return a pointer to random data. If the | ||
83 | caller does not have a separate way to know which elements were actually | ||
84 | stored, it might be wise, at least, to add __GFP_ZERO to the flags argument | ||
85 | to ensure that all elements are zeroed. | ||
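The page-at-a-time behaviour is easiest to see in a small model. The following is a userspace sketch, not the kernel implementation: the toy_flex_* names, the 4096-byte "page", the fixed limit of 64 parts and the `zero` argument (standing in for __GFP_ZERO) are all assumptions made for illustration, and prealloc treats `end` as inclusive here.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

#define TOY_PAGE_SIZE 4096u
#define TOY_MAX_PARTS 64u

/* Hypothetical model: elements live in lazily allocated page-sized parts. */
struct toy_flex_array {
	size_t element_size;
	unsigned int total;
	unsigned int elems_per_part;
	void *parts[TOY_MAX_PARTS];
};

struct toy_flex_array *toy_flex_alloc(size_t element_size, unsigned int total)
{
	struct toy_flex_array *fa;
	unsigned int per_part;

	if (element_size == 0 || element_size > TOY_PAGE_SIZE)
		return NULL;		/* objects may not exceed a page */
	per_part = TOY_PAGE_SIZE / element_size;
	if (total > per_part * TOY_MAX_PARTS)
		return NULL;
	fa = calloc(1, sizeof(*fa));
	if (fa) {
		fa->element_size = element_size;
		fa->total = total;
		fa->elems_per_part = per_part;
	}
	return fa;
}

/* Allocate part p if missing; zero mimics __GFP_ZERO. */
static char *toy_part(struct toy_flex_array *fa, unsigned int p, int zero)
{
	if (!fa->parts[p])
		fa->parts[p] = zero ? calloc(1, TOY_PAGE_SIZE) : malloc(TOY_PAGE_SIZE);
	return fa->parts[p];
}

int toy_flex_put(struct toy_flex_array *fa, unsigned int nr, const void *src, int zero)
{
	char *part;

	if (nr >= fa->total)
		return -EINVAL;
	part = toy_part(fa, nr / fa->elems_per_part, zero);
	if (!part)
		return -ENOMEM;
	memcpy(part + (nr % fa->elems_per_part) * fa->element_size, src, fa->element_size);
	return 0;
}

/* Allocate every part covering elements start..end, so later puts cannot fail. */
int toy_flex_prealloc(struct toy_flex_array *fa, unsigned int start, unsigned int end, int zero)
{
	unsigned int p;

	if (start > end || end >= fa->total)
		return -EINVAL;
	for (p = start / fa->elems_per_part; p <= end / fa->elems_per_part; p++)
		if (!toy_part(fa, p, zero))
			return -ENOMEM;
	return 0;
}

/* Non-NULL whenever the element's part exists, even if nr was never stored. */
void *toy_flex_get(struct toy_flex_array *fa, unsigned int nr)
{
	char *part;

	if (nr >= fa->total)
		return NULL;
	part = fa->parts[nr / fa->elems_per_part];
	return part ? part + (nr % fa->elems_per_part) * fa->element_size : NULL;
}

void toy_flex_free_parts(struct toy_flex_array *fa)
{
	unsigned int p;

	for (p = 0; p < TOY_MAX_PARTS; p++) {
		free(fa->parts[p]);
		fa->parts[p] = NULL;
	}
}

void toy_flex_free(struct toy_flex_array *fa)
{
	toy_flex_free_parts(fa);
	free(fa);
}
```

With 16-byte elements, storing element 0 also makes element 1 addressable (it shares the first part), while element 300 still returns NULL until its part is allocated.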
86 | |||
87 | There is no way to remove a single element from the array. It is possible, | ||
88 | though, to remove all elements with a call to: | ||
89 | |||
90 | void flex_array_free_parts(struct flex_array *array); | ||
91 | |||
92 | This call frees all elements, but leaves the array itself in place. | ||
93 | Freeing the entire array is done with: | ||
94 | |||
95 | void flex_array_free(struct flex_array *array); | ||
96 | |||
97 | As of this writing, there are no users of flexible arrays in the mainline | ||
98 | kernel. The functions described here are also not exported to modules; | ||
99 | that will probably be fixed when somebody comes up with a need for it. | ||
diff --git a/Documentation/hwmon/pcf8591 b/Documentation/hwmon/pcf8591 index 5628fcf4207f..e76a7892f68e 100644 --- a/Documentation/hwmon/pcf8591 +++ b/Documentation/hwmon/pcf8591 | |||
@@ -2,11 +2,11 @@ Kernel driver pcf8591 | |||
2 | ===================== | 2 | ===================== |
3 | 3 | ||
4 | Supported chips: | 4 | Supported chips: |
5 | * Philips PCF8591 | 5 | * Philips/NXP PCF8591 |
6 | Prefix: 'pcf8591' | 6 | Prefix: 'pcf8591' |
7 | Addresses scanned: I2C 0x48 - 0x4f | 7 | Addresses scanned: I2C 0x48 - 0x4f |
8 | Datasheet: Publicly available at the Philips Semiconductor website | 8 | Datasheet: Publicly available at the NXP website |
9 | http://www.semiconductors.philips.com/pip/PCF8591P.html | 9 | http://www.nxp.com/pip/PCF8591_6.html |
10 | 10 | ||
11 | Authors: | 11 | Authors: |
12 | Aurelien Jarno <aurelien@aurel32.net> | 12 | Aurelien Jarno <aurelien@aurel32.net> |
@@ -16,9 +16,10 @@ Authors: | |||
16 | 16 | ||
17 | Description | 17 | Description |
18 | ----------- | 18 | ----------- |
19 | |||
19 | The PCF8591 is an 8-bit A/D and D/A converter (4 analog inputs and one | 20 | The PCF8591 is an 8-bit A/D and D/A converter (4 analog inputs and one |
20 | analog output) for the I2C bus produced by Philips Semiconductors. It | 21 | analog output) for the I2C bus produced by Philips Semiconductors (now NXP). |
21 | is designed to provide a byte I2C interface to up to 4 separate devices. | 22 | It is designed to provide a byte I2C interface to up to 4 separate devices. |
22 | 23 | ||
23 | The PCF8591 has 4 analog inputs programmable as single-ended or | 24 | The PCF8591 has 4 analog inputs programmable as single-ended or |
24 | differential inputs : | 25 | differential inputs : |
@@ -58,8 +59,8 @@ Accessing PCF8591 via /sys interface | |||
58 | ------------------------------------- | 59 | ------------------------------------- |
59 | 60 | ||
60 | ! Be careful ! | 61 | ! Be careful ! |
61 | The PCF8591 is plainly impossible to detect ! Stupid chip. | 62 | The PCF8591 is plainly impossible to detect! Stupid chip. |
62 | So every chip with address in the interval [48..4f] is | 63 | So every chip with address in the interval [0x48..0x4f] is |
63 | detected as PCF8591. If you have other chips in this address | 64 | detected as PCF8591. If you have other chips in this address |
64 | range, the workaround is to load this module after the one | 65 | range, the workaround is to load this module after the one |
65 | for your others chips. | 66 | for your others chips. |
@@ -67,19 +68,20 @@ for your others chips. | |||
67 | On detection (i.e. insmod, modprobe et al.), directories are being | 68 | On detection (i.e. insmod, modprobe et al.), directories are being |
68 | created for each detected PCF8591: | 69 | created for each detected PCF8591: |
69 | 70 | ||
70 | /sys/bus/devices/<0>-<1>/ | 71 | /sys/bus/i2c/devices/<0>-<1>/ |
71 | where <0> is the bus the chip was detected on (e. g. i2c-0) | 72 | where <0> is the bus the chip was detected on (e. g. i2c-0) |
72 | and <1> the chip address ([48..4f]) | 73 | and <1> the chip address ([48..4f]) |
73 | 74 | ||
74 | Inside these directories, there are such files: | 75 | Inside these directories, there are such files: |
75 | in0, in1, in2, in3, out0_enable, out0_output, name | 76 | in0_input, in1_input, in2_input, in3_input, out0_enable, out0_output, name |
76 | 77 | ||
77 | Name contains chip name. | 78 | Name contains chip name. |
78 | 79 | ||
79 | The in0, in1, in2 and in3 files are RO. Reading gives the value of the | 80 | The in0_input, in1_input, in2_input and in3_input files are RO. Reading gives |
80 | corresponding channel. Depending on the current analog inputs configuration, | 81 | the value of the corresponding channel. Depending on the current analog inputs |
81 | files in2 and/or in3 do not exist. Values range are from 0 to 255 for single | 82 | configuration, files in2_input and in3_input may not exist. Values range |
82 | ended inputs and -128 to +127 for differential inputs (8-bit ADC). | 83 | from 0 to 255 for single ended inputs and -128 to +127 for differential inputs |
84 | (8-bit ADC). | ||
83 | 85 | ||
84 | The out0_enable file is RW. Reading gives "1" for analog output enabled and | 86 | The out0_enable file is RW. Reading gives "1" for analog output enabled and |
85 | "0" for analog output disabled. Writing accepts "0" and "1" accordingly. | 87 | "0" for analog output disabled. Writing accepts "0" and "1" accordingly. |
diff --git a/Documentation/hwmon/tmp421 b/Documentation/hwmon/tmp421 new file mode 100644 index 000000000000..0cf07f824741 --- /dev/null +++ b/Documentation/hwmon/tmp421 | |||
@@ -0,0 +1,36 @@ | |||
1 | Kernel driver tmp421 | ||
2 | ==================== | ||
3 | |||
4 | Supported chips: | ||
5 | * Texas Instruments TMP421 | ||
6 | Prefix: 'tmp421' | ||
7 | Addresses scanned: I2C 0x2a, 0x4c, 0x4d, 0x4e and 0x4f | ||
8 | Datasheet: http://focus.ti.com/docs/prod/folders/print/tmp421.html | ||
9 | * Texas Instruments TMP422 | ||
10 | Prefix: 'tmp422' | ||
11 | Addresses scanned: I2C 0x2a, 0x4c, 0x4d, 0x4e and 0x4f | ||
12 | Datasheet: http://focus.ti.com/docs/prod/folders/print/tmp421.html | ||
13 | * Texas Instruments TMP423 | ||
14 | Prefix: 'tmp423' | ||
15 | Addresses scanned: I2C 0x2a, 0x4c, 0x4d, 0x4e and 0x4f | ||
16 | Datasheet: http://focus.ti.com/docs/prod/folders/print/tmp421.html | ||
17 | |||
18 | Authors: | ||
19 | Andre Prendel <andre.prendel@gmx.de> | ||
20 | |||
21 | Description | ||
22 | ----------- | ||
23 | |||
24 | This driver implements support for Texas Instruments TMP421, TMP422 | ||
25 | and TMP423 temperature sensor chips. These chips implement one local | ||
26 | and up to one (TMP421), up to two (TMP422) or up to three (TMP423) | ||
27 | remote sensors. Temperature is measured in degrees Celsius. The chips | ||
28 | are wired over I2C/SMBus and specified over a temperature range of -40 | ||
29 | to +125 degrees Celsius. Resolution for both the local and remote | ||
30 | channels is 0.0625 degree C. | ||
31 | |||
32 | The chips support only temperature measurement. The driver exports | ||
33 | the temperature values via the following sysfs files: | ||
34 | |||
35 | temp[1-4]_input | ||
36 | temp[2-4]_fault | ||
diff --git a/Documentation/hwmon/wm831x b/Documentation/hwmon/wm831x new file mode 100644 index 000000000000..24f47d8f6a42 --- /dev/null +++ b/Documentation/hwmon/wm831x | |||
@@ -0,0 +1,37 @@ | |||
1 | Kernel driver wm831x-hwmon | ||
2 | ========================== | ||
3 | |||
4 | Supported chips: | ||
5 | * Wolfson Microelectronics WM831x PMICs | ||
6 | Prefix: 'wm831x' | ||
7 | Datasheet: | ||
8 | http://www.wolfsonmicro.com/products/WM8310 | ||
9 | http://www.wolfsonmicro.com/products/WM8311 | ||
10 | http://www.wolfsonmicro.com/products/WM8312 | ||
11 | |||
12 | Authors: Mark Brown <broonie@opensource.wolfsonmicro.com> | ||
13 | |||
14 | Description | ||
15 | ----------- | ||
16 | |||
17 | The WM831x series of PMICs include an AUXADC which can be used to | ||
18 | monitor a range of system operating parameters, including the voltages | ||
19 | of the major supplies within the system. Currently the driver provides | ||
20 | reporting of all the input values but does not provide any alarms. | ||
21 | |||
22 | Voltage Monitoring | ||
23 | ------------------ | ||
24 | |||
25 | Voltages are sampled by a 12 bit ADC. Voltages in millivolts are 1.465 | ||
26 | times the ADC value. | ||
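As a worked example of the 1.465 scale factor: a raw reading of 1000 corresponds to 1465 mV, and the full-scale 12-bit reading 4095 to about 5999 mV. A minimal sketch in integer arithmetic (rounding down); the function name is hypothetical and not taken from the wm831x-hwmon driver.

```c
/* Convert a raw 12-bit WM831x AUXADC voltage sample to millivolts. */
static unsigned int wm831x_auxadc_to_mv(unsigned int adc)
{
	adc &= 0xfff;			/* keep only the 12-bit sample */
	return adc * 1465u / 1000u;	/* mV = 1.465 * ADC, rounded down */
}
```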
27 | |||
28 | Temperature Monitoring | ||
29 | ---------------------- | ||
30 | |||
31 | Temperatures are sampled by a 12 bit ADC. Chip and battery temperatures | ||
32 | are available. The chip temperature is calculated as: | ||
33 | |||
34 | Degrees Celsius = (512.18 - data) / 1.0983 | ||
35 | |||
36 | while the battery temperature calculation will depend on the NTC | ||
37 | thermistor component. | ||
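The chip temperature formula can be applied directly to a raw sample. Note that the result falls as the ADC value rises: a reading of 480 gives roughly 29.3 degrees Celsius. A minimal sketch; the function name is hypothetical, and battery temperature is deliberately not covered because it depends on the NTC thermistor fitted.

```c
/* Chip temperature in degrees Celsius from a raw 12-bit AUXADC sample. */
static double wm831x_chip_temp_c(unsigned int data)
{
	return (512.18 - (double)data) / 1.0983;
}
```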
diff --git a/Documentation/hwmon/wm8350 b/Documentation/hwmon/wm8350 new file mode 100644 index 000000000000..98f923bd2e92 --- /dev/null +++ b/Documentation/hwmon/wm8350 | |||
@@ -0,0 +1,26 @@ | |||
1 | Kernel driver wm8350-hwmon | ||
2 | ========================== | ||
3 | |||
4 | Supported chips: | ||
5 | * Wolfson Microelectronics WM835x PMICs | ||
6 | Prefix: 'wm8350' | ||
7 | Datasheet: | ||
8 | http://www.wolfsonmicro.com/products/WM8350 | ||
9 | http://www.wolfsonmicro.com/products/WM8351 | ||
10 | http://www.wolfsonmicro.com/products/WM8352 | ||
11 | |||
12 | Authors: Mark Brown <broonie@opensource.wolfsonmicro.com> | ||
13 | |||
14 | Description | ||
15 | ----------- | ||
16 | |||
17 | The WM835x series of PMICs includes an AUXADC which can be used to | ||
18 | monitor a range of system operating parameters, including the voltages | ||
19 | of the major supplies within the system. Currently the driver provides | ||
20 | simple access to these major supplies. | ||
21 | |||
22 | Voltage Monitoring | ||
23 | ------------------ | ||
24 | |||
25 | Voltages are sampled by a 12-bit ADC. For the internal supplies the ADC | ||
26 | is referenced to the system VRTC. | ||
diff --git a/Documentation/input/sentelic.txt b/Documentation/input/sentelic.txt new file mode 100644 index 000000000000..f7160a2fb6a2 --- /dev/null +++ b/Documentation/input/sentelic.txt | |||
@@ -0,0 +1,475 @@ | |||
1 | Copyright (C) 2002-2008 Sentelic Corporation. | ||
2 | Last update: Oct-31-2008 | ||
3 | |||
4 | ============================================================================== | ||
5 | * Finger Sensing Pad Intellimouse Mode(scrolling wheel, 4th and 5th buttons) | ||
6 | ============================================================================== | ||
7 | A) MSID 4: Scrolling wheel mode plus Forward page (4th button) and Backward | ||
8 | page (5th button) | ||
9 | @1. Set sample rate to 200; | ||
10 | @2. Set sample rate to 200; | ||
11 | @3. Set sample rate to 80; | ||
12 | @4. Issue the "Get device ID" command (0xF2) and wait for the response; | ||
13 | @5. FSP will respond with 0x04. | ||
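The activation steps are ordinary PS/2 "Set sample rate" (0xF3) commands, each followed by the rate byte, and then "Get device ID". The sketch below only builds that byte stream; actually transmitting each byte and collecting the device's acknowledgements is left to a caller that is not shown, and both helper names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define PS2_SET_SAMPLE_RATE	0xf3
#define PS2_GET_DEVICE_ID	0xf2

/*
 * Hypothetical helper: fill buf with the PS/2 bytes that switch an FSP
 * into MSID 4 (sample rates 200, 200, 80, then "Get device ID").
 * Returns the number of bytes written; buf must hold at least 7 bytes.
 */
static size_t fsp_msid4_knock(uint8_t *buf)
{
	static const uint8_t rates[] = { 200, 200, 80 };
	size_t n = 0;

	for (size_t i = 0; i < sizeof(rates); i++) {
		buf[n++] = PS2_SET_SAMPLE_RATE;
		buf[n++] = rates[i];
	}
	buf[n++] = PS2_GET_DEVICE_ID;
	return n;
}

/* The device should answer the final command with ID 0x04. */
static int fsp_is_msid4(uint8_t id)
{
	return id == 0x04;
}
```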
14 | |||
15 | Packet 1 | ||
16 | Bit 7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0 | ||
17 | BYTE |---------------|BYTE |---------------|BYTE|---------------|BYTE|---------------| | ||
18 | 1 |Y|X|y|x|1|M|R|L| 2 |X|X|X|X|X|X|X|X| 3 |Y|Y|Y|Y|Y|Y|Y|Y| 4 | | |B|F|W|W|W|W| | ||
19 | |---------------| |---------------| |---------------| |---------------| | ||
20 | |||
21 | Byte 1: Bit7 => Y overflow | ||
22 | Bit6 => X overflow | ||
23 | Bit5 => Y sign bit | ||
24 | Bit4 => X sign bit | ||
25 | Bit3 => 1 | ||
26 | Bit2 => Middle Button, 1 is pressed, 0 is not pressed. | ||
27 | Bit1 => Right Button, 1 is pressed, 0 is not pressed. | ||
28 | Bit0 => Left Button, 1 is pressed, 0 is not pressed. | ||
29 | Byte 2: X Movement (9-bit 2's complement integer, sign in Byte 1 Bit4) | ||
30 | Byte 3: Y Movement (9-bit 2's complement integer, sign in Byte 1 Bit5) | ||
31 | Byte 4: Bit3~Bit0 => the scrolling wheel's movement since the last data report; | ||
32 | valid values are -8 ~ +7 | ||
33 | Bit4 => 1 = 4th mouse button is pressed, Forward one page. | ||
34 | 0 = 4th mouse button is not pressed. | ||
35 | Bit5 => 1 = 5th mouse button is pressed, Backward one page. | ||
36 | 0 = 5th mouse button is not pressed. | ||
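A minimal decoder for the packet above, assuming the four bytes have already been collected in order. The struct and function names are hypothetical. The overflow bits (Byte 1 Bit7/Bit6) are ignored, and the 9-bit movement values are rebuilt from the sign bits in Byte 1.

```c
#include <stdbool.h>
#include <stdint.h>

/* Decoded MSID 4 report (hypothetical struct, for illustration). */
struct fsp_msid4_report {
	bool left, right, middle, forward, backward;
	int dx, dy;	/* 9-bit two's complement movement */
	int wheel;	/* 4-bit two's complement, -8 .. +7 */
};

static struct fsp_msid4_report fsp_decode_msid4(const uint8_t p[4])
{
	struct fsp_msid4_report r;

	r.left   = p[0] & 0x01;
	r.right  = p[0] & 0x02;
	r.middle = p[0] & 0x04;
	/* Byte 1 Bit4/Bit5 are the sign (9th) bits of X/Y movement. */
	r.dx = (int)p[1] - ((p[0] & 0x10) ? 256 : 0);
	r.dy = (int)p[2] - ((p[0] & 0x20) ? 256 : 0);
	/* Sign-extend the low nibble of Byte 4. */
	r.wheel = p[3] & 0x0f;
	if (r.wheel & 0x08)
		r.wheel -= 16;
	r.forward  = p[3] & 0x10;
	r.backward = p[3] & 0x20;
	return r;
}
```

For example, the packet 0x09 0x05 0x00 0x1f decodes to left button down, dx = 5, dy = 0, wheel = -1, and the 4th (Forward) button pressed.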
37 | |||
38 | B) MSID 6: Horizontal and Vertical scrolling. | ||
39 | @ Set bit 1 in register 0x40 to 1 | ||
40 | |||
41 | # FSP replaces the scrolling wheel's movement with 4 bits that indicate | ||
42 | horizontal and vertical scrolling. | ||
43 | |||
44 | Packet 1 | ||
45 | Bit 7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0 | ||
46 | BYTE |---------------|BYTE |---------------|BYTE|---------------|BYTE|---------------| | ||
47 | 1 |Y|X|y|x|1|M|R|L| 2 |X|X|X|X|X|X|X|X| 3 |Y|Y|Y|Y|Y|Y|Y|Y| 4 | | |B|F|l|r|u|d| | ||
48 | |---------------| |---------------| |---------------| |---------------| | ||
49 | |||
50 | Byte 1: Bit7 => Y overflow | ||
51 | Bit6 => X overflow | ||
52 | Bit5 => Y sign bit | ||
53 | Bit4 => X sign bit | ||
54 | Bit3 => 1 | ||
55 | Bit2 => Middle Button, 1 is pressed, 0 is not pressed. | ||
56 | Bit1 => Right Button, 1 is pressed, 0 is not pressed. | ||
57 | Bit0 => Left Button, 1 is pressed, 0 is not pressed. | ||
58 | Byte 2: X Movement (9-bit 2's complement integer, sign in Byte 1 Bit4) | ||
59 | Byte 3: Y Movement (9-bit 2's complement integer, sign in Byte 1 Bit5) | ||
60 | Byte 4: Bit0 => the Vertical scrolling movement downward. | ||
61 | Bit1 => the Vertical scrolling movement upward. | ||
62 | Bit2 => the Horizontal scrolling movement rightward. | ||
63 | Bit3 => the Horizontal scrolling movement leftward. | ||
64 | Bit4 => 1 = 4th mouse button is pressed, Forward one page. | ||
65 | 0 = 4th mouse button is not pressed. | ||
66 | Bit5 => 1 = 5th mouse button is pressed, Backward one page. | ||
67 | 0 = 5th mouse button is not pressed. | ||
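Bytes 1 to 3 are the same as in MSID 4, so the sketch below decodes only Byte 4, using the bit order shown in the diagram (l, r, u, d in Bit3~Bit0). The struct and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Scroll directions and page buttons in Byte 4 of an MSID 6 packet. */
struct fsp_msid6_scroll {
	bool down, up, right, left, forward, backward;
};

static struct fsp_msid6_scroll fsp_decode_msid6_byte4(uint8_t b4)
{
	struct fsp_msid6_scroll s = {
		.down     = b4 & 0x01,	/* Bit0: vertical, downward */
		.up       = b4 & 0x02,	/* Bit1: vertical, upward */
		.right    = b4 & 0x04,	/* Bit2: horizontal, rightward */
		.left     = b4 & 0x08,	/* Bit3: horizontal, leftward */
		.forward  = b4 & 0x10,	/* Bit4: 4th button */
		.backward = b4 & 0x20,	/* Bit5: 5th button */
	};
	return s;
}
```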
68 | |||
69 | C) MSID 7: | ||
70 | # FSP uses 2 packets (8 bytes) to represent the Absolute Position, | ||
71 | so a PACKET NUMBER is needed to identify the packets. | ||
72 | If PACKET NUMBER is 0, the packet is Packet 1. | ||
73 | If PACKET NUMBER is 1, the packet is Packet 2. | ||
74 | The host software must keep count of this number itself. | ||
75 | |||
76 | # The MSID 6 special packet is enabled at the same time as MSID 7. | ||
77 | |||
78 | ============================================================================== | ||
79 | * Absolute position for STL3886-G0. | ||
80 | ============================================================================== | ||
81 | @ Set bit 2 or 3 in register 0x40 to 1 | ||
82 | @ Set bit 6 in register 0x40 to 1 | ||
83 | |||
84 | Packet 1 (ABSOLUTE POSITION) | ||
85 | Bit 7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0 | ||
86 | BYTE |---------------|BYTE |---------------|BYTE|---------------|BYTE|---------------| | ||
87 | 1 |0|1|V|1|1|M|R|L| 2 |X|X|X|X|X|X|X|X| 3 |Y|Y|Y|Y|Y|Y|Y|Y| 4 |r|l|d|u|X|X|Y|Y| | ||
88 | |---------------| |---------------| |---------------| |---------------| | ||
89 | |||
90 | Byte 1: Bit7~Bit6 => 00, Normal data packet | ||
91 | => 01, Absolute coordinate packet | ||
92 | => 10, Notify packet | ||
93 | Bit5 => valid bit | ||
94 | Bit4 => 1 | ||
95 | Bit3 => 1 | ||
96 | Bit2 => Middle Button, 1 is pressed, 0 is not pressed. | ||
97 | Bit1 => Right Button, 1 is pressed, 0 is not pressed. | ||
98 | Bit0 => Left Button, 1 is pressed, 0 is not pressed. | ||
99 | Byte 2: X coordinate (xpos[9:2]) | ||
100 | Byte 3: Y coordinate (ypos[9:2]) | ||
101 | Byte 4: Bit1~Bit0 => Y coordinate (ypos[1:0]) | ||
102 | Bit3~Bit2 => X coordinate (xpos[1:0]) | ||
103 | Bit4 => scroll up | ||
104 | Bit5 => scroll down | ||
105 | Bit6 => scroll left | ||
106 | Bit7 => scroll right | ||
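The absolute coordinates are 10 bits wide. Following the Byte 4 layout in the packet diagram (X low bits in Bit3~Bit2, Y low bits in Bit1~Bit0), they can be rebuilt as in this sketch; the enum and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Byte 1 Bit7~Bit6 packet types (hypothetical enum). */
enum fsp_pkt_type {
	FSP_PKT_NORMAL = 0,	/* 00: normal data packet */
	FSP_PKT_ABS    = 1,	/* 01: absolute coordinate packet */
	FSP_PKT_NOTIFY = 2,	/* 10: notify packet */
};

static enum fsp_pkt_type fsp_packet_type(const uint8_t p[4])
{
	return (enum fsp_pkt_type)(p[0] >> 6);
}

/* Bytes 2/3 carry bits 9..2; Byte 4 carries the two low bits of each. */
static void fsp_g0_abs_coords(const uint8_t p[4], unsigned *x, unsigned *y,
			      bool *valid)
{
	*x = ((unsigned)p[1] << 2) | ((p[3] >> 2) & 0x03);
	*y = ((unsigned)p[2] << 2) | (p[3] & 0x03);
	*valid = p[0] & 0x20;	/* Byte 1 Bit5: valid bit */
}
```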
107 | |||
108 | Notify Packet for G0 | ||
109 | Bit 7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0 | ||
110 | BYTE |---------------|BYTE |---------------|BYTE|---------------|BYTE|---------------| | ||
111 | 1 |1|0|0|1|1|M|R|L| 2 |C|C|C|C|C|C|C|C| 3 |M|M|M|M|M|M|M|M| 4 |0|0|0|0|0|0|0|0| | ||
112 | |---------------| |---------------| |---------------| |---------------| | ||
113 | |||
114 | Byte 1: Bit7~Bit6 => 00, Normal data packet | ||
115 | => 01, Absolute coordinate packet | ||
116 | => 10, Notify packet | ||
117 | Bit5 => 0 | ||
118 | Bit4 => 1 | ||
119 | Bit3 => 1 | ||
120 | Bit2 => Middle Button, 1 is pressed, 0 is not pressed. | ||
121 | Bit1 => Right Button, 1 is pressed, 0 is not pressed. | ||
122 | Bit0 => Left Button, 1 is pressed, 0 is not pressed. | ||
123 | Byte 2: Message Type => 0x5A (Enable/Disable status packet) | ||
124 | Mode Type => 0xA5 (Normal/Icon mode status) | ||
125 | Byte 3: Message Type => 0x00 (Disabled) | ||
126 | => 0x01 (Enabled) | ||
127 | Mode Type => 0x00 (Normal) | ||
128 | => 0x01 (Icon) | ||
129 | Byte 4: Bit7~Bit0 => Don't Care | ||
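A sketch of how a host might classify a G0 notify packet. The enum and function are hypothetical, and Byte 3 values other than 0x00/0x01 are reported as unknown because the table above does not define them.

```c
#include <stdint.h>

/* Hypothetical result codes for a G0 notify packet. */
enum fsp_g0_notify {
	FSP_G0_UNKNOWN,
	FSP_G0_DISABLED,
	FSP_G0_ENABLED,
	FSP_G0_NORMAL,
	FSP_G0_ICON,
};

static enum fsp_g0_notify fsp_decode_g0_notify(const uint8_t p[4])
{
	if ((p[0] >> 6) != 2)		/* Bit7~Bit6 == 10: notify packet */
		return FSP_G0_UNKNOWN;
	if (p[1] == 0x5a && p[2] <= 0x01)	/* Enable/Disable status */
		return p[2] ? FSP_G0_ENABLED : FSP_G0_DISABLED;
	if (p[1] == 0xa5 && p[2] <= 0x01)	/* Normal/Icon mode status */
		return p[2] ? FSP_G0_ICON : FSP_G0_NORMAL;
	return FSP_G0_UNKNOWN;
}
```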
130 | |||
131 | ============================================================================== | ||
132 | * Absolute position for STL3888-A0. | ||
133 | ============================================================================== | ||
134 | Packet 1 (ABSOLUTE POSITION) | ||
135 | Bit 7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0 | ||
136 | BYTE |---------------|BYTE |---------------|BYTE|---------------|BYTE|---------------| | ||
137 | 1 |0|1|V|A|1|L|0|1| 2 |X|X|X|X|X|X|X|X| 3 |Y|Y|Y|Y|Y|Y|Y|Y| 4 |x|x|y|y|X|X|Y|Y| | ||
138 | |---------------| |---------------| |---------------| |---------------| | ||
139 | |||
140 | Byte 1: Bit7~Bit6 => 00, Normal data packet | ||
141 | => 01, Absolute coordinate packet | ||
142 | => 10, Notify packet | ||
143 | Bit5 => Valid bit, 0 means that the coordinate is invalid or finger up. | ||
144 | When both fingers are up, the last two reports have the valid | ||
145 | bit cleared. | ||
146 | Bit4 => arc | ||
147 | Bit3 => 1 | ||
148 | Bit2 => Left Button, 1 is pressed, 0 is released. | ||
149 | Bit1 => 0 | ||
150 | Bit0 => 1 | ||
151 | Byte 2: X coordinate (xpos[9:2]) | ||
152 | Byte 3: Y coordinate (ypos[9:2]) | ||
153 | Byte 4: Bit1~Bit0 => Y coordinate (ypos[1:0]) | ||
154 | Bit3~Bit2 => X coordinate (xpos[1:0]) | ||
155 | Bit5~Bit4 => y1_g | ||
156 | Bit7~Bit6 => x1_g | ||
157 | |||
158 | Packet 2 (ABSOLUTE POSITION) | ||
159 | Bit 7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0 | ||
160 | BYTE |---------------|BYTE |---------------|BYTE|---------------|BYTE|---------------| | ||
161 | 1 |0|1|V|A|1|R|1|0| 2 |X|X|X|X|X|X|X|X| 3 |Y|Y|Y|Y|Y|Y|Y|Y| 4 |x|x|y|y|X|X|Y|Y| | ||
162 | |---------------| |---------------| |---------------| |---------------| | ||
163 | |||
164 | Byte 1: Bit7~Bit6 => 00, Normal data packet | ||
165 | => 01, Absolute coordinates packet | ||
166 | => 10, Notify packet | ||
167 | Bit5 => Valid bit, 0 means that the coordinate is invalid or finger up. | ||
168 | When both fingers are up, the last two reports have the valid | ||
169 | bit cleared. | ||
170 | Bit4 => arc | ||
171 | Bit3 => 1 | ||
172 | Bit2 => Right Button, 1 is pressed, 0 is released. | ||
173 | Bit1 => 1 | ||
174 | Bit0 => 0 | ||
175 | Byte 2: X coordinate (xpos[9:2]) | ||
176 | Byte 3: Y coordinate (ypos[9:2]) | ||
177 | Byte 4: Bit1~Bit0 => Y coordinate (ypos[1:0]) | ||
178 | Bit3~Bit2 => X coordinate (xpos[1:0]) | ||
179 | Bit5~Bit4 => y2_g | ||
180 | Bit7~Bit6 => x2_g | ||
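Packet 1 and Packet 2 differ only in Byte 1 Bit1~Bit0 (01 versus 10) and in which button Bit2 reports (Left versus Right). This sketch decodes either packet into one hypothetical struct, reading the Byte 4 low coordinate bits as shown in the diagrams (X in Bit3~Bit2, Y in Bit1~Bit0); the x_g/y_g fields are passed through unchanged because their meaning is not described here.

```c
#include <stdbool.h>
#include <stdint.h>

/* One decoded STL3888-A0 absolute packet (hypothetical struct). */
struct fsp_a0_abs {
	int finger;		/* 1 or 2 from Byte 1 Bit1~Bit0, 0 if neither */
	bool valid;		/* Byte 1 Bit5 */
	bool button;		/* Byte 1 Bit2: Left (packet 1) or Right (packet 2) */
	unsigned x, y;		/* 10-bit coordinates */
	unsigned xg, yg;	/* Byte 4 Bit7~Bit6 and Bit5~Bit4, raw */
};

static struct fsp_a0_abs fsp_decode_a0_abs(const uint8_t p[4])
{
	struct fsp_a0_abs a;

	switch (p[0] & 0x03) {
	case 0x01: a.finger = 1; break;
	case 0x02: a.finger = 2; break;
	default:   a.finger = 0; break;
	}
	a.valid  = p[0] & 0x20;
	a.button = p[0] & 0x04;
	a.x  = ((unsigned)p[1] << 2) | ((p[3] >> 2) & 0x03);
	a.y  = ((unsigned)p[2] << 2) | (p[3] & 0x03);
	a.xg = (p[3] >> 6) & 0x03;
	a.yg = (p[3] >> 4) & 0x03;
	return a;
}
```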
181 | |||
182 | Notify Packet for STL3888-A0 | ||
183 | Bit 7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0 | ||
184 | BYTE |---------------|BYTE |---------------|BYTE|---------------|BYTE|---------------| | ||
185 | 1 |1|0|1|P|1|M|R|L| 2 |C|C|C|C|C|C|C|C| 3 |0|0|F|F|0|0|0|i| 4 |r|l|d|u|0|0|0|0| | ||
186 | |---------------| |---------------| |---------------| |---------------| | ||
187 | |||
188 | Byte 1: Bit7~Bit6 => 00, Normal data packet | ||
189 | => 01, Absolute coordinate packet | ||
190 | => 10, Notify packet | ||
191 | Bit5 => 1 | ||
192 | Bit4 => when in absolute coordinates mode (valid when EN_PKT_GO is 1): | ||
193 | 0: left button is generated by the on-pad command | ||
194 | 1: left button is generated by the external button | ||
195 | Bit3 => 1 | ||
196 | Bit2 => Middle Button, 1 is pressed, 0 is not pressed. | ||
197 | Bit1 => Right Button, 1 is pressed, 0 is not pressed. | ||
198 | Bit0 => Left Button, 1 is pressed, 0 is not pressed. | ||
199 | Byte 2: Message Type => 0xB7 (Multi Finger, Multi Coordinate mode) | ||
200 | Byte 3: Bit7~Bit6 => Don't care | ||
201 | Bit5~Bit4 => Number of fingers | ||
202 | Bit3~Bit1 => Reserved | ||
203 | Bit0 => 1: entering gesture mode; 0: leaving gesture mode | ||
204 | Byte 4: Bit7 => scroll right button | ||
205 | Bit6 => scroll left button | ||
206 | Bit5 => scroll down button | ||
207 | Bit4 => scroll up button | ||
208 | * Note that if gesture and additional button (Bit4~Bit7) | ||
209 | happen at the same time, the button information will not | ||
210 | be sent. | ||
211 | Bit3~Bit0 => Reserved | ||
212 | |||
213 | Sample sequence of Multi-finger, Multi-coordinate mode: | ||
214 | |||
215 | notify packet (valid bit == 1), abs pkt 1, abs pkt 2, abs pkt 1, | ||
216 | abs pkt 2, ..., notify packet(valid bit == 0) | ||
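A sketch of decoding the Multi Finger, Multi Coordinate notify packet (Message Type 0xB7) that brackets the sequence above. It returns false for any other packet; the struct and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Decoded STL3888-A0 notify packet (hypothetical struct). */
struct fsp_a0_notify {
	unsigned fingers;	/* Byte 3 Bit5~Bit4 */
	bool gesture;		/* Byte 3 Bit0: 1 = entering gesture mode */
	uint8_t scroll;		/* Byte 4 Bit7~Bit4: right/left/down/up buttons */
};

static bool fsp_decode_a0_notify(const uint8_t p[4], struct fsp_a0_notify *n)
{
	/* Must be a notify packet (Bit7~Bit6 == 10) with Message Type 0xB7. */
	if ((p[0] >> 6) != 2 || p[1] != 0xb7)
		return false;
	n->fingers = (p[2] >> 4) & 0x03;
	n->gesture = p[2] & 0x01;
	n->scroll  = p[3] >> 4;
	return true;
}
```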
217 | |||
218 | ============================================================================== | ||
219 | * FSP Enable/Disable packet | ||
220 | ============================================================================== | ||
221 | Bit 7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0 | ||
222 | BYTE |---------------|BYTE |---------------|BYTE|---------------|BYTE|---------------| | ||
223 | 1 |Y|X|0|0|1|M|R|L| 2 |0|1|0|1|1|0|1|E| 3 | | | | | | | | | 4 | | | | | | | | | | ||
224 | |---------------| |---------------| |---------------| |---------------| | ||
225 | |||
226 | FSP sends an enable/disable packet when it receives a PS/2 enable/disable | ||
227 | command. In this packet the Middle, Right and Left button bits are all | ||
228 | set. Only Byte 1 and Byte 2 are used, as a fixed pattern; ignore the | ||
229 | other bytes of the packet. | ||
230 | |||
231 | Byte 1: Bit7 => 0, Y overflow | ||
232 | Bit6 => 0, X overflow | ||
233 | Bit5 => 0, Y sign bit | ||
234 | Bit4 => 0, X sign bit | ||
235 | Bit3 => 1 | ||
236 | Bit2 => 1, Middle Button | ||
237 | Bit1 => 1, Right Button | ||
238 | Bit0 => 1, Left Button | ||
239 | Byte 2: Bit7~1 => (0101101b) | ||
240 | Bit0 => 1 = Enable | ||
241 | 0 = Disable | ||
242 | Byte 3: Don't care | ||
243 | Byte 4: Don't care (MOUSE ID 3, 4) | ||
244 | Byte 5~8: Don't care (Absolute packet) | ||
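Because Byte 1 and Byte 2 form a fixed pattern (0x0f, then 0101101b followed by the E bit), a host can recognise the enable/disable packet with a simple comparison, as sketched below with a hypothetical helper. A real driver would also need to consider its current state, since an ordinary movement packet with all three buttons pressed can produce the same two bytes.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: returns true if the first two bytes match the
 * FSP enable/disable pattern and stores the E bit in *enabled.
 * Byte 1 must be 0x0f (overflow/sign bits clear, Bit3 set, all three
 * buttons set); Byte 2 Bit7~Bit1 must be 0101101b, i.e. 0x5a/0x5b.
 */
static bool fsp_is_status_packet(const uint8_t p[2], bool *enabled)
{
	if (p[0] != 0x0f || (p[1] & 0xfe) != 0x5a)
		return false;
	*enabled = p[1] & 0x01;
	return true;
}
```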
245 | |||
246 | ============================================================================== | ||
247 | * PS/2 Command Set | ||
248 | ============================================================================== | ||
249 | |||
250 | FSP supports the basic PS/2 command set and modes; refer to the following | ||
251 | URL for details about PS/2 commands: | ||
252 | |||
253 | http://www.computer-engineering.org/index.php?title=PS/2_Mouse_Interface | ||
254 | |||
255 | ============================================================================== | ||
256 | * Programming Sequence for Determining Packet Parsing Flow | ||
257 | ============================================================================== | ||
258 | 1. Identify FSP by reading the device ID (0x00) and version (0x01) registers | ||
259 | |||
260 | 2. Determine the number of buttons by reading the status2 (0x0b) register | ||
261 | |||
262 | buttons = reg[0x0b] & 0x30 | ||
263 | |||
264 | if buttons == 0x30 or buttons == 0x20: | ||
265 | # two/four buttons | ||
266 | Refer to 'Finger Sensing Pad Intellimouse Mode' | ||
267 | section A for packet parsing details (ignore byte 4, bits 4~7) | ||
268 | elif buttons == 0x10: | ||
269 | # 6 buttons | ||
270 | Refer to 'Finger Sensing Pad Intellimouse Mode' | ||
271 | section B for packet parsing details | ||
272 | elif buttons == 0x00: | ||
273 | # 6 buttons | ||
274 | Refer to 'Finger Sensing Pad Intellimouse Mode' | ||
275 | section A for packet parsing detail | ||
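The same decision can be expressed in C. This is a sketch with hypothetical names: the button count comes from bit5~bit4 of register 0x0b, and only the 0x10 case uses the section B (horizontal/vertical scrolling) packet layout.

```c
#include <stdint.h>

enum fsp_layout { FSP_LAYOUT_A, FSP_LAYOUT_B };

/* Number of buttons encoded in register 0x0b bit5~bit4. */
static int fsp_button_count(uint8_t reg0b)
{
	switch (reg0b & 0x30) {
	case 0x30: return 2;
	case 0x20: return 4;
	default:   return 6;	/* 0x10 and 0x00 both mean 6 buttons */
	}
}

/* 0x10 selects section B parsing; every other value uses section A. */
static enum fsp_layout fsp_packet_layout(uint8_t reg0b)
{
	return ((reg0b & 0x30) == 0x10) ? FSP_LAYOUT_B : FSP_LAYOUT_A;
}
```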
276 | |||
277 | ============================================================================== | ||
278 | * Programming Sequence for Register Reading/Writing | ||
279 | ============================================================================== | ||
280 | |||
281 | Register inversion requirement: | ||
282 | |||
283 | The following values need to be inverted (the '~' operator in C) before | ||
284 | being sent to FSP: | ||
285 | |||
286 | 0xe9, 0xee, 0xf2 and 0xff. | ||
287 | |||
288 | Register swapping requirement: | ||
289 | |||
290 | The following values need to have their higher 4 bits and lower 4 bits | ||
291 | swapped before being sent to FSP: | ||
292 | |||
293 | 10, 20, 40, 60, 80, 100 and 200 (decimal; these match the PS/2 sample rates). | ||
294 | |||
295 | Register reading sequence: | ||
296 | |||
297 | 1. send 0xf3 PS/2 command to FSP; | ||
298 | |||
299 | 2. send 0x66 PS/2 command to FSP; | ||
300 | |||
301 | 3. send 0x88 PS/2 command to FSP; | ||
302 | |||
303 | 4. send 0xf3 PS/2 command to FSP; | ||
304 | |||
305 | 5. If the register address to be read does not need to be | ||
306 | inverted (refer to the 'Register inversion requirement' section), | ||
307 | goto step 6 | ||
308 | |||
309 | 5a. send 0x68 PS/2 command to FSP; | ||
310 | |||
311 | 5b. send the inverted register address to FSP and goto step 8; | ||
312 | |||
313 | 6. If the register address to be read does not need to be | ||
314 | swapped (refer to the 'Register swapping requirement' section), | ||
315 | goto step 7 | ||
316 | |||
317 | 6a. send 0xcc PS/2 command to FSP; | ||
318 | |||
319 | 6b. send the swapped register address to FSP and goto step 8; | ||
320 | |||
321 | 7. send 0x66 PS/2 command to FSP; | ||
322 | |||
323 | 7a. send the original register address to FSP and goto step 8; | ||
324 | |||
325 | 8. send 0xe9(status request) PS/2 command to FSP; | ||
326 | |||
327 | 9. the response read from FSP should be the requested register value. | ||
328 | |||
329 | Register writing sequence: | ||
330 | |||
331 | 1. send 0xf3 PS/2 command to FSP; | ||
332 | |||
333 | 2. If the register address to be written does not need to be | ||
334 | inverted (refer to the 'Register inversion requirement' section), | ||
335 | goto step 3 | ||
336 | |||
337 | 2a. send 0x74 PS/2 command to FSP; | ||
338 | |||
339 | 2b. send the inverted register address to FSP and goto step 5; | ||
340 | |||
341 | 3. If the register address to be written does not need to be | ||
342 | swapped (refer to the 'Register swapping requirement' section), | ||
343 | goto step 4 | ||
344 | |||
345 | 3a. send 0x77 PS/2 command to FSP; | ||
346 | |||
347 | 3b. send the swapped register address to FSP and goto step 5; | ||
348 | |||
349 | 4. send 0x55 PS/2 command to FSP; | ||
350 | |||
351 | 4a. send the register address to FSP and goto step 5; | ||
352 | |||
353 | 5. send 0xf3 PS/2 command to FSP; | ||
354 | |||
355 | 6. If the register value to be written does not need to be | ||
356 | inverted (refer to the 'Register inversion requirement' section), | ||
357 | goto step 7 | ||
358 | |||
359 | 6a. send 0x47 PS/2 command to FSP; | ||
360 | |||
361 | 6b. send the inverted register value to FSP and goto step 9; | ||
362 | |||
363 | 7. If the register value to be written does not need to be | ||
364 | swapped (refer to the 'Register swapping requirement' section), | ||
365 | goto step 8 | ||
366 | |||
367 | 7a. send 0x44 PS/2 command to FSP; | ||
368 | |||
369 | 7b. send the swapped register value to FSP and goto step 9; | ||
370 | |||
371 | 8. send 0x33 PS/2 command to FSP; | ||
372 | |||
373 | 8a. send the register value to FSP; | ||
374 | |||
375 | 9. the register writing sequence is completed. | ||
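The read and write sequences above can be sketched as functions that build the PS/2 byte stream for a register access. Transmitting the bytes, handling acknowledgements and reading the reply to 0xe9 are left to a caller that is not shown, and all names are hypothetical. The inversion and swapping rules follow the lists in this section; the swap values are decimal.

```c
#include <stddef.h>
#include <stdint.h>

/* Values that collide with PS/2 commands must be inverted. */
static int fsp_needs_invert(uint8_t v)
{
	return v == 0xe9 || v == 0xee || v == 0xf2 || v == 0xff;
}

/* Values that collide with PS/2 sample rates must be nibble-swapped. */
static int fsp_needs_swap(uint8_t v)
{
	switch (v) {
	case 10: case 20: case 40: case 60: case 80: case 100: case 200:
		return 1;
	default:
		return 0;
	}
}

static uint8_t fsp_swap(uint8_t v)
{
	return (uint8_t)((v << 4) | (v >> 4));
}

/* Append a prefix command plus the (possibly encoded) byte v. */
static size_t fsp_emit(uint8_t *buf, size_t n, uint8_t v,
		       uint8_t inv_cmd, uint8_t swap_cmd, uint8_t plain_cmd)
{
	if (fsp_needs_invert(v)) {
		buf[n++] = inv_cmd;
		buf[n++] = (uint8_t)~v;
	} else if (fsp_needs_swap(v)) {
		buf[n++] = swap_cmd;
		buf[n++] = fsp_swap(v);
	} else {
		buf[n++] = plain_cmd;
		buf[n++] = v;
	}
	return n;
}

/* Register read: 7 bytes; the register value is the reply to 0xe9. */
static size_t fsp_reg_read_seq(uint8_t addr, uint8_t *buf)
{
	size_t n = 0;

	buf[n++] = 0xf3;
	buf[n++] = 0x66;
	buf[n++] = 0x88;
	buf[n++] = 0xf3;
	n = fsp_emit(buf, n, addr, 0x68, 0xcc, 0x66);
	buf[n++] = 0xe9;	/* status request */
	return n;
}

/* Register write: 6 bytes. */
static size_t fsp_reg_write_seq(uint8_t addr, uint8_t val, uint8_t *buf)
{
	size_t n = 0;

	buf[n++] = 0xf3;
	n = fsp_emit(buf, n, addr, 0x74, 0x77, 0x55);
	buf[n++] = 0xf3;
	n = fsp_emit(buf, n, val, 0x47, 0x44, 0x33);
	return n;
}
```

For example, writing 0x02 to register 0x40 (only bit 1, Intellimouse mode, set) needs no inversion or swapping and yields f3 55 40 f3 33 02.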
376 | |||
377 | ============================================================================== | ||
378 | * Register Listing | ||
379 | ============================================================================== | ||
380 | |||
381 | offset width default r/w name | ||
382 | 0x00 bit7~bit0 0x01 RO device ID | ||
383 | |||
384 | 0x01 bit7~bit0 0xc0 RW version ID | ||
385 | |||
386 | 0x02 bit7~bit0 0x01 RO vendor ID | ||
387 | |||
388 | 0x03 bit7~bit0 0x01 RO product ID | ||
389 | |||
390 | 0x04 bit3~bit0 0x01 RW revision ID | ||
391 | |||
392 | 0x0b RO test mode status 1 | ||
393 | bit3 1 RO 0: rotate 180 degrees, 1: no rotation | ||
394 | |||
395 | bit5~bit4 RO number of buttons | ||
396 | 11 => 2, lbtn/rbtn | ||
397 | 10 => 4, lbtn/rbtn/scru/scrd | ||
398 | 01 => 6, lbtn/rbtn/scru/scrd/scrl/scrr | ||
399 | 00 => 6, lbtn/rbtn/scru/scrd/fbtn/bbtn | ||
400 | |||
401 | 0x0f RW register file page control | ||
402 | bit0 0 RW 1 to enable page 1 register files | ||
403 | |||
404 | 0x10 RW system control 1 | ||
405 | bit0 1 RW Reserved, must be 1 | ||
406 | bit1 0 RW Reserved, must be 0 | ||
407 | bit4 1 RW Reserved, must be 0 | ||
408 | bit5 0 RW register clock gating enable | ||
409 | 0: read only, 1: read/write enable | ||
410 | (Note that the following registers do not require clock gating to be | ||
411 | enabled before being written: 05 06 07 08 09 0c 0f 10 11 12 16 17 18 23 2e | ||
412 | 40 41 42 43.) | ||
413 | |||
414 | 0x31 RW on-pad command detection | ||
415 | bit7 0 RW on-pad command left button down tag | ||
416 | enable | ||
417 | 0: disable, 1: enable | ||
418 | |||
419 | 0x34 RW on-pad command control 5 | ||
420 | bit4~bit0 0x05 RW XLO in 0s/4/1, so 05h = 0010.1b = 2.5 | ||
421 | (Note that position unit is in 0.5 scanline) | ||
422 | |||
423 | bit7 0 RW on-pad tap zone enable | ||
424 | 0: disable, 1: enable | ||
425 | |||
426 | 0x35 RW on-pad command control 6 | ||
427 | bit4~bit0 0x1d RW XHI in 0s/4/1, so 19h = 1100.1b = 12.5 | ||
428 | (Note that position unit is in 0.5 scanline) | ||
429 | |||
430 | 0x36 RW on-pad command control 7 | ||
431 | bit4~bit0 0x04 RW YLO in 0s/4/1, so 05h = 0010.1b = 2.5 | ||
432 | (Note that position unit is in 0.5 scanline) | ||
433 | |||
434 | 0x37 RW on-pad command control 8 | ||
435 | bit4~bit0 0x13 RW YHI in 0s/4/1, so 11h = 1000.1b = 8.5 | ||
436 | (Note that position unit is in 0.5 scanline) | ||
437 | |||
438 | 0x40 RW system control 5 | ||
439 | bit1 0 RW FSP Intellimouse mode enable | ||
440 | 0: disable, 1: enable | ||
441 | |||
442 | bit2 0 RW movement + abs. coordinate mode enable | ||
443 | 0: disable, 1: enable | ||
444 | (Note that this function has the functionality of bit 1 even when | ||
445 | bit 1 is not set. However, the format is different from that of bit 1. | ||
446 | In addition, when bit 1 and bit 2 are set at the same time, bit 2 will | ||
447 | override bit 1.) | ||
448 | |||
449 | bit3 0 RW abs. coordinate only mode enable | ||
450 | 0: disable, 1: enable | ||
451 | (Note that this function has the functionality of bit 1 even when | ||
452 | bit 1 is not set. However, the format is different from that of bit 1. | ||
453 | In addition, when bit 1, bit 2 and bit 3 are set at the same time, | ||
454 | bit 3 will override bit 1 and 2.) | ||
455 | |||
456 | bit5 0 RW auto switch enable | ||
457 | 0: disable, 1: enable | ||
458 | |||
459 | bit6 0 RW G0 abs. + notify packet format enable | ||
460 | 0: disable, 1: enable | ||
461 | (Note that the absolute/relative coordinate output still depends on | ||
462 | bit 2 and 3. That is, if either of those bits is 1, the host will receive | ||
463 | absolute coordinates; otherwise, host only receives packets with | ||
464 | relative coordinate.) | ||
465 | |||
466 | 0x43 RW on-pad control | ||
467 | bit0 0 RW on-pad control enable | ||
468 | 0: disable, 1: enable | ||
469 | (Note that if this bit is cleared, bit 3/5 will be ineffective) | ||
470 | |||
471 | bit3 0 RW on-pad fix vertical scrolling enable | ||
472 | 0: disable, 1: enable | ||
473 | |||
474 | bit5 0 RW on-pad fix horizontal scrolling enable | ||
475 | 0: disable, 1: enable | ||
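The on-pad zone registers 0x34~0x37 hold a 5-bit field in what appears to be an unsigned fixed-point format with 4 integer bits and 1 fractional bit; that reading of "0s/4/1" is an assumption based on the worked examples (e.g. 19h = 1100.1b = 12.5). The hypothetical helpers below convert between the field and its value while preserving the other bits, such as bit7 (on-pad tap zone enable) of register 0x34.

```c
#include <stdint.h>

/* Hypothetical: value of bit4~bit0 (0.0 .. 15.5 in 0.5 steps). */
static double fsp_onpad_value(uint8_t reg)
{
	return (reg & 0x1f) / 2.0;
}

/*
 * Hypothetical: store v (must be 0.0 .. 15.5; rounded to the nearest
 * 0.5) into bit4~bit0, leaving bit7~bit5 untouched.
 */
static uint8_t fsp_onpad_encode(uint8_t reg, double v)
{
	unsigned field = (unsigned)(v * 2.0 + 0.5) & 0x1f;

	return (uint8_t)((reg & 0xe0) | field);
}
```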
diff --git a/Documentation/intel_txt.txt b/Documentation/intel_txt.txt new file mode 100644 index 000000000000..f40a1f030019 --- /dev/null +++ b/Documentation/intel_txt.txt | |||
@@ -0,0 +1,210 @@ | |||
1 | Intel(R) TXT Overview: | ||
2 | ===================== | ||
3 | |||
4 | Intel's technology for safer computing, Intel(R) Trusted Execution | ||
5 | Technology (Intel(R) TXT), defines platform-level enhancements that | ||
6 | provide the building blocks for creating trusted platforms. | ||
7 | |||
8 | Intel TXT was formerly known by the code name LaGrande Technology (LT). | ||
9 | |||
10 | Intel TXT in Brief: | ||
11 | o Provides dynamic root of trust for measurement (DRTM) | ||
12 | o Data protection in case of improper shutdown | ||
13 | o Measurement and verification of launched environment | ||
14 | |||
15 | Intel TXT is part of the vPro(TM) brand and is also available on some | ||
16 | non-vPro systems. It is currently available on desktop systems | ||
17 | based on the Q35, X38, Q45, and Q43 Express chipsets (e.g. Dell | ||
18 | Optiplex 755, HP dc7800, etc.) and mobile systems based on the GM45, | ||
19 | PM45, and GS45 Express chipsets. | ||
20 | |||
21 | For more information, see http://www.intel.com/technology/security/. | ||
22 | This site also has a link to the Intel TXT MLE Developers Manual, | ||
23 | which has been updated for the newly released platforms. | ||
24 | |||
25 | Intel TXT has been presented at various events over the past few | ||
26 | years, some of which are: | ||
27 | LinuxTAG 2008: | ||
28 | http://www.linuxtag.org/2008/en/conf/events/vp-donnerstag/ | ||
29 | details.html?talkid=110 | ||
30 | TRUST2008: | ||
31 | http://www.trust2008.eu/downloads/Keynote-Speakers/ | ||
32 | 3_David-Grawrock_The-Front-Door-of-Trusted-Computing.pdf | ||
33 | IDF 2008, Shanghai: | ||
34 | http://inteldeveloperforum.com.edgesuite.net/shanghai_2008/ | ||
35 | aep/PROS003/index.html | ||
36 | IDFs 2006, 2007 (I'm not sure if/where they are online) | ||
37 | |||
38 | Trusted Boot Project Overview: | ||
39 | ============================= | ||
40 | |||
41 | Trusted Boot (tboot) is an open source, pre-kernel/VMM module that | ||
42 | uses Intel TXT to perform a measured and verified launch of an OS | ||
43 | kernel/VMM. | ||
44 | |||
45 | It is hosted on SourceForge at http://sourceforge.net/projects/tboot. | ||
46 | The Mercurial source repo is available at http://www.bughost.org/ | ||
47 | repos.hg/tboot.hg. | ||
48 | |||
49 | Tboot currently supports launching Xen (open source VMM/hypervisor | ||
50 | with TXT support since v3.2) and now Linux kernels. | ||
51 | |||
52 | |||
53 | Value Proposition for Linux or "Why should you care?" | ||
54 | ===================================================== | ||
55 | |||
56 | While there are many products and technologies that attempt to | ||
57 | measure or protect the integrity of a running kernel, they all | ||
58 | assume the kernel is "good" to begin with. The Integrity | ||
59 | Measurement Architecture (IMA) and Linux Integrity Module interface | ||
60 | are examples of such solutions. | ||
61 | |||
62 | To get trust in the initial kernel without using Intel TXT, a | ||
63 | static root of trust must be used. This bases trust in BIOS | ||
64 | starting at system reset and requires measurement of all code | ||
65 | executed from system reset through the completion of the kernel | ||
66 | boot as well as data objects used by that code. In the case of a | ||
67 | Linux kernel, this means all of BIOS, any option ROMs, the | ||
68 | bootloader and the boot config. In practice, this is a lot of | ||
69 | code/data, much of which is subject to change from boot to boot | ||
70 | (e.g. changing NICs may change option ROMs). Without reference | ||
71 | hashes, these measurement changes are difficult to assess or | ||
72 | confirm as benign. This process also does not provide DMA | ||
73 | protection, memory configuration/alias checks and locks, crash | ||
74 | protection, or policy support. | ||
75 | |||
76 | By using the hardware-based root of trust that Intel TXT provides, | ||
77 | many of these issues can be mitigated. Specifically: many | ||
78 | pre-launch components can be removed from the trust chain, DMA | ||
79 | protection is provided to all launched components, a large number | ||
80 | of platform configuration checks are performed and values locked, | ||
81 | protection is provided for any data in the event of an improper | ||
82 | shutdown, and there is support for policy-based execution/verification. | ||
83 | This provides a more stable measurement and a higher assurance of | ||
84 | system configuration and initial state than would be otherwise | ||
85 | possible. Since the tboot project is open source, source code for | ||
86 | almost all parts of the trust chain is available (except for SMM and | ||
87 | Intel-provided firmware). | ||
88 | |||
89 | How Does it Work? | ||
90 | ================= | ||
91 | |||
92 | o Tboot is an executable that is launched by the bootloader as | ||
93 | the "kernel" (the binary the bootloader executes). | ||
94 | o It performs all of the work necessary to determine if the | ||
95 | platform supports Intel TXT and, if so, executes the GETSEC[SENTER] | ||
96 | processor instruction that initiates the dynamic root of trust. | ||
97 | - If tboot determines that the system does not support Intel TXT | ||
98 | or is not configured correctly (e.g. the SINIT AC Module was | ||
99 | incorrect), it will directly launch the kernel with no changes | ||
100 | to any state. | ||
101 | - Tboot will output various information about its progress to the | ||
102 | terminal, serial port, and/or an in-memory log; the output | ||
103 | locations can be configured with a command line switch. | ||
104 | o The GETSEC[SENTER] instruction will return control to tboot and | ||
105 | tboot then verifies certain aspects of the environment (e.g. TPM NV | ||
106 | lock, e820 table does not have invalid entries, etc.). | ||
107 | o It will wake the APs from the special sleep state the GETSEC[SENTER] | ||
108 | instruction had put them in and place them into a wait-for-SIPI | ||
109 | state. | ||
110 | - Because the processors will not respond to an INIT or SIPI when | ||
111 | in the TXT environment, it is necessary to create a small VT-x | ||
112 | guest for the APs. When they run in this guest, they will | ||
113 | simply wait for the INIT-SIPI-SIPI sequence, which will cause | ||
114 | VMEXITs, and then disable VT and jump to the SIPI vector. This | ||
115 | approach seemed like a better choice than having to insert | ||
116 | special code into the kernel's MP wakeup sequence. | ||
117 | o Tboot then applies an (optional) user-defined launch policy to | ||
118 | verify the kernel and initrd. | ||
119 | - This policy is rooted in TPM NV and is described in the tboot | ||
120 | project. The tboot project also contains code for tools to | ||
121 | create and provision the policy. | ||
122 | - Policies are completely under user control and if not present | ||
123 | then any kernel will be launched. | ||
124 | - Policy action is flexible and can include halting on failures | ||
125 | or simply logging them and continuing. | ||
126 | o Tboot adjusts the e820 table provided by the bootloader to reserve | ||
127 | its own location in memory as well as to reserve certain other | ||
128 | TXT-related regions. | ||
129 | o As part of its launch, tboot DMA protects all of RAM (using the | ||
130 | VT-d PMRs). Thus, the kernel must be booted with 'intel_iommu=on' | ||
131 | in order to remove this blanket protection and use VT-d's | ||
132 | page-level protection. | ||
133 | o Tboot will populate a shared page with some data about itself and | ||
134 | pass this to the Linux kernel as it transfers control. | ||
135 | - The location of the shared page is passed via the boot_params | ||
136 | struct as a physical address. | ||
137 | o The kernel will look for the tboot shared page address and, if it | ||
138 | exists, map it. | ||
139 | o As one of its checks/protections, TXT makes a copy | ||
140 | of the VT-d DMARs in a DMA-protected region of memory and verifies | ||
141 | them for correctness. The VT-d code will detect if the kernel was | ||
142 | launched with tboot and use this copy instead of the one in the | ||
143 | ACPI table. | ||
144 | o At this point, tboot and TXT are out of the picture until a | ||
145 | shutdown (S<n>). | ||
146 | o In order to put a system into any of the sleep states after a TXT | ||
147 | launch, TXT must first be exited. This is to prevent attacks that | ||
148 | attempt to crash the system to gain control on reboot and steal | ||
149 | data left in memory. | ||
150 | - The kernel will perform all of its sleep preparation and | ||
151 | populate the shared page with the ACPI data needed to put the | ||
152 | platform in the desired sleep state. | ||
153 | - Then the kernel jumps into tboot via the vector specified in the | ||
154 | shared page. | ||
155 | - Tboot will clean up the environment and disable TXT, then use the | ||
156 | kernel-provided ACPI information to actually place the platform | ||
157 | into the desired sleep state. | ||
158 | - In the case of S3, tboot will also register itself as the resume | ||
159 | vector. This is necessary because it must re-establish the | ||
160 | measured environment upon resume. Once the TXT environment | ||
161 | has been restored, it will restore the TPM PCRs and then | ||
162 | transfer control back to the kernel's S3 resume vector. | ||
163 | In order to preserve system integrity across S3, the kernel | ||
164 | provides tboot with a set of memory ranges (kernel | ||
165 | code/data/bss, S3 resume code, and AP trampoline) that tboot | ||
166 | will calculate a MAC (message authentication code) over and then | ||
167 | seal with the TPM. On resume and once the measured environment | ||
168 | has been re-established, tboot will re-calculate the MAC and | ||
169 | verify it against the sealed value. Tboot's policy determines | ||
170 | what happens if the verification fails. | ||
171 | |||
172 | That's pretty much it for TXT support. | ||
173 | |||
174 | |||
175 | Configuring the System: | ||
176 | ====================== | ||
177 | |||
178 | This code works with 32bit, 32bit PAE, and 64bit (x86_64) kernels. | ||
179 | |||
180 | In BIOS, the user must enable: TPM, TXT, VT-x, VT-d. Not all BIOSes | ||
181 | allow these to be individually enabled/disabled and the screens in | ||
182 | which to find them are BIOS-specific. | ||
183 | |||
184 | grub.conf needs to be modified as follows: | ||
185 | title Linux 2.6.29-tip w/ tboot | ||
186 | root (hd0,0) | ||
187 | kernel /tboot.gz logging=serial,vga,memory | ||
188 | module /vmlinuz-2.6.29-tip intel_iommu=on ro | ||
189 | root=LABEL=/ rhgb console=ttyS0,115200 3 | ||
190 | module /initrd-2.6.29-tip.img | ||
191 | module /Q35_SINIT_17.BIN | ||
192 | |||
193 | The kernel option for enabling Intel TXT support is found under the | ||
194 | Security top-level menu and is called "Enable Intel(R) Trusted | ||
195 | Execution Technology (TXT)". It is marked as EXPERIMENTAL and | ||
196 | depends on the generic x86 support (to allow maximum flexibility in | ||
197 | kernel build options), since the tboot code will detect whether the | ||
198 | platform actually supports Intel TXT and thus whether any of the | ||
199 | kernel code is executed. | ||
200 | |||
201 | The Q35_SINIT_17.BIN file is what Intel TXT refers to as an | ||
202 | Authenticated Code Module. It is specific to the chipset in the | ||
203 | system and can also be found on the Trusted Boot site. It is an | ||
204 | (unencrypted) module signed by Intel that is used as part of the | ||
205 | DRTM process to verify and configure the system. It is signed | ||
206 | because it operates at a higher privilege level in the system than | ||
207 | any other macrocode and its correct operation is critical to the | ||
208 | establishment of the DRTM. The process for determining the correct | ||
209 | SINIT ACM for a system is documented in the SINIT-guide.txt file | ||
210 | that is on the tboot SourceForge site under the SINIT ACM downloads. | ||
diff --git a/Documentation/ioctl/ioctl-number.txt b/Documentation/ioctl/ioctl-number.txt index dbea4f95fc85..aafca0a8f66a 100644 --- a/Documentation/ioctl/ioctl-number.txt +++ b/Documentation/ioctl/ioctl-number.txt | |||
@@ -121,6 +121,7 @@ Code Seq# Include File Comments | |||
121 | 'c' 00-7F linux/comstats.h conflict! | 121 | 'c' 00-7F linux/comstats.h conflict! |
122 | 'c' 00-7F linux/coda.h conflict! | 122 | 'c' 00-7F linux/coda.h conflict! |
123 | 'c' 80-9F arch/s390/include/asm/chsc.h | 123 | 'c' 80-9F arch/s390/include/asm/chsc.h |
124 | 'c' A0-AF arch/x86/include/asm/msr.h | ||
124 | 'd' 00-FF linux/char/drm/drm/h conflict! | 125 | 'd' 00-FF linux/char/drm/drm/h conflict! |
125 | 'd' F0-FF linux/digi1.h | 126 | 'd' F0-FF linux/digi1.h |
126 | 'e' all linux/digi1.h conflict! | 127 | 'e' all linux/digi1.h conflict! |
@@ -192,7 +193,7 @@ Code Seq# Include File Comments | |||
192 | 0xAD 00 Netfilter device in development: | 193 | 0xAD 00 Netfilter device in development: |
193 | <mailto:rusty@rustcorp.com.au> | 194 | <mailto:rusty@rustcorp.com.au> |
194 | 0xAE all linux/kvm.h Kernel-based Virtual Machine | 195 | 0xAE all linux/kvm.h Kernel-based Virtual Machine |
195 | <mailto:kvm-devel@lists.sourceforge.net> | 196 | <mailto:kvm@vger.kernel.org> |
196 | 0xB0 all RATIO devices in development: | 197 | 0xB0 all RATIO devices in development: |
197 | <mailto:vgo@ratio.de> | 198 | <mailto:vgo@ratio.de> |
198 | 0xB1 00-1F PPPoX <mailto:mostrows@styx.uwaterloo.ca> | 199 | 0xB1 00-1F PPPoX <mailto:mostrows@styx.uwaterloo.ca> |
diff --git a/Documentation/kernel-doc-nano-HOWTO.txt b/Documentation/kernel-doc-nano-HOWTO.txt index 4d04572b6549..348b9e5e28fc 100644 --- a/Documentation/kernel-doc-nano-HOWTO.txt +++ b/Documentation/kernel-doc-nano-HOWTO.txt | |||
@@ -66,7 +66,9 @@ Example kernel-doc function comment: | |||
66 | * The longer description can have multiple paragraphs. | 66 | * The longer description can have multiple paragraphs. |
67 | */ | 67 | */ |
68 | 68 | ||
69 | The first line, with the short description, must be on a single line. | 69 | The short description following the subject can span multiple lines |
70 | and ends with an @argument description, an empty line or the end of | ||
71 | the comment block. | ||
70 | 72 | ||
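The rule above can be illustrated with a minimal sketch. The function clamp_value() is hypothetical and exists only to carry the comment: its short description runs over two lines and is ended by the first @argument line, which follows it with no blank comment line in between.

```c
/**
 * clamp_value() - limit an integer to the closed range [lo, hi],
 * returning lo or hi when value falls outside that range
 * @value: the integer to limit
 * @lo: lower bound of the range
 * @hi: upper bound of the range; must not be less than @lo
 *
 * The longer description can have multiple paragraphs.
 */
static int clamp_value(int value, int lo, int hi)
{
	if (value < lo)
		return lo;
	if (value > hi)
		return hi;
	return value;
}
```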
71 | The @argument descriptions must begin on the very next line following | 73 | The @argument descriptions must begin on the very next line following |
72 | this opening short function description line, with no intervening | 74 | this opening short function description line, with no intervening |
diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt index 7936b801fe6a..0f17d16dc101 100644 --- a/Documentation/kernel-parameters.txt +++ b/Documentation/kernel-parameters.txt | |||
@@ -57,6 +57,7 @@ parameter is applicable: | |||
57 | ISAPNP ISA PnP code is enabled. | 57 | ISAPNP ISA PnP code is enabled. |
58 | ISDN Appropriate ISDN support is enabled. | 58 | ISDN Appropriate ISDN support is enabled. |
59 | JOY Appropriate joystick support is enabled. | 59 | JOY Appropriate joystick support is enabled. |
60 | KVM Kernel Virtual Machine support is enabled. | ||
60 | LIBATA Libata driver is enabled | 61 | LIBATA Libata driver is enabled |
61 | LP Printer support is enabled. | 62 | LP Printer support is enabled. |
62 | LOOP Loopback device support is enabled. | 63 | LOOP Loopback device support is enabled. |
@@ -1098,6 +1099,44 @@ and is between 256 and 4096 characters. It is defined in the file | |||
1098 | kstack=N [X86] Print N words from the kernel stack | 1099 | kstack=N [X86] Print N words from the kernel stack |
1099 | in oops dumps. | 1100 | in oops dumps. |
1100 | 1101 | ||
1102 | kvm.ignore_msrs=[KVM] Ignore guest accesses to unhandled MSRs. | ||
1103 | Default is 0 (don't ignore, but inject #GP) | ||
1104 | |||
1105 | kvm.oos_shadow= [KVM] Disable out-of-sync shadow paging. | ||
1106 | Default is 1 (enabled) | ||
1107 | |||
1108 | kvm-amd.nested= [KVM,AMD] Allow nested virtualization in KVM/SVM. | ||
1109 | Default is 0 (off) | ||
1110 | |||
1111 | kvm-amd.npt= [KVM,AMD] Disable nested paging (virtualized MMU) | ||
1112 | for all guests. | ||
1113 | Default is 1 (enabled) if in 64bit or 32bit-PAE mode | ||
1114 | |||
1115 | kvm-intel.bypass_guest_pf= | ||
1116 | [KVM,Intel] Disables bypassing of guest page faults | ||
1117 | on Intel chips. Default is 1 (enabled) | ||
1118 | |||
1119 | kvm-intel.ept= [KVM,Intel] Disable extended page tables | ||
1120 | (virtualized MMU) support on capable Intel chips. | ||
1121 | Default is 1 (enabled) | ||
1122 | |||
1123 | kvm-intel.emulate_invalid_guest_state= | ||
1124 | [KVM,Intel] Enable emulation of invalid guest states | ||
1125 | Default is 0 (disabled) | ||
1126 | |||
1127 | kvm-intel.flexpriority= | ||
1128 | [KVM,Intel] Disable FlexPriority feature (TPR shadow). | ||
1129 | Default is 1 (enabled) | ||
1130 | |||
1131 | kvm-intel.unrestricted_guest= | ||
1132 | [KVM,Intel] Disable unrestricted guest feature | ||
1133 | (virtualized real and unpaged mode) on capable | ||
1134 | Intel chips. Default is 1 (enabled) | ||
1135 | |||
1136 | kvm-intel.vpid= [KVM,Intel] Disable Virtual Processor Identification | ||
1137 | feature (tagged TLBs) on capable Intel chips. | ||
1138 | Default is 1 (enabled) | ||
1139 | |||
1101 | l2cr= [PPC] | 1140 | l2cr= [PPC] |
1102 | 1141 | ||
1103 | l3cr= [PPC] | 1142 | l3cr= [PPC] |
@@ -1247,6 +1286,10 @@ and is between 256 and 4096 characters. It is defined in the file | |||
1247 | (machvec) in a generic kernel. | 1286 | (machvec) in a generic kernel. |
1248 | Example: machvec=hpzx1_swiotlb | 1287 | Example: machvec=hpzx1_swiotlb |
1249 | 1288 | ||
1289 | machtype= [Loongson] Share the same kernel image file between different | ||
1290 | Yeeloong laptops. | ||
1291 | Example: machtype=lemote-yeeloong-2f-7inch | ||
1292 | |||
1250 | max_addr=nn[KMG] [KNL,BOOT,ia64] All physical memory greater | 1293 | max_addr=nn[KMG] [KNL,BOOT,ia64] All physical memory greater |
1251 | than or equal to this physical address is ignored. | 1294 | than or equal to this physical address is ignored. |
1252 | 1295 | ||
@@ -1503,6 +1546,14 @@ and is between 256 and 4096 characters. It is defined in the file | |||
1503 | [NFS] set the TCP port on which the NFSv4 callback | 1546 | [NFS] set the TCP port on which the NFSv4 callback |
1504 | channel should listen. | 1547 | channel should listen. |
1505 | 1548 | ||
1549 | nfs.cache_getent= | ||
1550 | [NFS] sets the pathname to the program which is used | ||
1551 | to update the NFS client cache entries. | ||
1552 | |||
1553 | nfs.cache_getent_timeout= | ||
1554 | [NFS] sets the timeout after which an attempt to | ||
1555 | update a cache entry is deemed to have failed. | ||
1556 | |||
1506 | nfs.idmap_cache_timeout= | 1557 | nfs.idmap_cache_timeout= |
1507 | [NFS] set the maximum lifetime for idmapper cache | 1558 | [NFS] set the maximum lifetime for idmapper cache |
1508 | entries. | 1559 | entries. |
@@ -1514,7 +1565,7 @@ and is between 256 and 4096 characters. It is defined in the file | |||
1514 | of returning the full 64-bit number. | 1565 | of returning the full 64-bit number. |
1515 | The default is to return 64-bit inode numbers. | 1566 | The default is to return 64-bit inode numbers. |
1516 | 1567 | ||
1517 | nmi_debug= [KNL,AVR32] Specify one or more actions to take | 1568 | nmi_debug= [KNL,AVR32,SH] Specify one or more actions to take |
1518 | when a NMI is triggered. | 1569 | when a NMI is triggered. |
1519 | Format: [state][,regs][,debounce][,die] | 1570 | Format: [state][,regs][,debounce][,die] |
1520 | 1571 | ||
@@ -1535,6 +1586,11 @@ and is between 256 and 4096 characters. It is defined in the file | |||
1535 | symbolic names: lapic and ioapic | 1586 | symbolic names: lapic and ioapic |
1536 | Example: nmi_watchdog=2 or nmi_watchdog=panic,lapic | 1587 | Example: nmi_watchdog=2 or nmi_watchdog=panic,lapic |
1537 | 1588 | ||
1589 | netpoll.carrier_timeout= | ||
1590 | [NET] Specifies the amount of time (in seconds) that | ||
1591 | netpoll should wait for a carrier. By default netpoll | ||
1592 | waits 4 seconds. | ||
1593 | |||
1538 | no387 [BUGS=X86-32] Tells the kernel to use the 387 maths | 1594 | no387 [BUGS=X86-32] Tells the kernel to use the 387 maths |
1539 | emulation library even if a 387 maths coprocessor | 1595 | emulation library even if a 387 maths coprocessor |
1540 | is present. | 1596 | is present. |
@@ -1919,11 +1975,12 @@ and is between 256 and 4096 characters. It is defined in the file | |||
1919 | Format: { 0 | 1 } | 1975 | Format: { 0 | 1 } |
1920 | See arch/parisc/kernel/pdc_chassis.c | 1976 | See arch/parisc/kernel/pdc_chassis.c |
1921 | 1977 | ||
1922 | percpu_alloc= [X86] Select which percpu first chunk allocator to use. | 1978 | percpu_alloc= Select which percpu first chunk allocator to use. |
1923 | Allowed values are one of "lpage", "embed" and "4k". | 1979 | Currently supported values are "embed" and "page". |
1924 | See comments in arch/x86/kernel/setup_percpu.c for | 1980 | Architectures may support a subset or none of the selections. |
1925 | details on each allocator. This parameter is primarily | 1981 | See comments in mm/percpu.c for details on each |
1926 | for debugging and performance comparison. | 1982 | allocator. This parameter is primarily for debugging |
1983 | and performance comparison. | ||
1927 | 1984 | ||
1928 | pf. [PARIDE] | 1985 | pf. [PARIDE] |
1929 | See Documentation/blockdev/paride.txt. | 1986 | See Documentation/blockdev/paride.txt. |
@@ -2395,6 +2452,18 @@ and is between 256 and 4096 characters. It is defined in the file | |||
2395 | stifb= [HW] | 2452 | stifb= [HW] |
2396 | Format: bpp:<bpp1>[:<bpp2>[:<bpp3>...]] | 2453 | Format: bpp:<bpp1>[:<bpp2>[:<bpp3>...]] |
2397 | 2454 | ||
2455 | sunrpc.min_resvport= | ||
2456 | sunrpc.max_resvport= | ||
2457 | [NFS,SUNRPC] | ||
2458 | SunRPC servers often require that client requests | ||
2459 | originate from a privileged port (i.e. a port in the | ||
2460 | range 0 < portnr < 1024). | ||
2461 | An administrator who wishes to reserve some of these | ||
2462 | ports for other uses may adjust the range that the | ||
2463 | kernel's sunrpc client considers to be privileged | ||
2464 | using these two parameters to set the minimum and | ||
2465 | maximum port values. | ||
2466 | |||
2398 | sunrpc.pool_mode= | 2467 | sunrpc.pool_mode= |
2399 | [NFS] | 2468 | [NFS] |
2400 | Control how the NFS server code allocates CPUs to | 2469 | Control how the NFS server code allocates CPUs to |
@@ -2411,6 +2480,15 @@ and is between 256 and 4096 characters. It is defined in the file | |||
2411 | pernode one pool for each NUMA node (equivalent | 2480 | pernode one pool for each NUMA node (equivalent |
2412 | to global on non-NUMA machines) | 2481 | to global on non-NUMA machines) |
2413 | 2482 | ||
2483 | sunrpc.tcp_slot_table_entries= | ||
2484 | sunrpc.udp_slot_table_entries= | ||
2485 | [NFS,SUNRPC] | ||
2486 | Sets the upper limit on the number of simultaneous | ||
2487 | RPC calls that can be sent from the client to a | ||
2488 | server. Increasing these values may allow you to | ||
2489 | improve throughput, but will also increase the | ||
2490 | amount of memory reserved for use by the client. | ||
2491 | |||
2414 | swiotlb= [IA-64] Number of I/O TLB slabs | 2492 | swiotlb= [IA-64] Number of I/O TLB slabs |
2415 | 2493 | ||
2416 | switches= [HW,M68k] | 2494 | switches= [HW,M68k] |
@@ -2480,6 +2558,11 @@ and is between 256 and 4096 characters. It is defined in the file | |||
2480 | trace_buf_size=nn[KMG] | 2558 | trace_buf_size=nn[KMG] |
2481 | [FTRACE] will set tracing buffer size. | 2559 | [FTRACE] will set tracing buffer size. |
2482 | 2560 | ||
2561 | trace_event=[event-list] | ||
2562 | [FTRACE] Set and start specified trace events in order | ||
2563 | to facilitate early boot debugging. | ||
2564 | See also Documentation/trace/events.txt | ||
2565 | |||
2483 | trix= [HW,OSS] MediaTrix AudioTrix Pro | 2566 | trix= [HW,OSS] MediaTrix AudioTrix Pro |
2484 | Format: | 2567 | Format: |
2485 | <io>,<irq>,<dma>,<dma2>,<sb_io>,<sb_irq>,<sb_dma>,<mpu_io>,<mpu_irq> | 2568 | <io>,<irq>,<dma>,<dma2>,<sb_io>,<sb_irq>,<sb_dma>,<mpu_io>,<mpu_irq> |
diff --git a/Documentation/keys.txt b/Documentation/keys.txt index b56aacc1fff8..e4dbbdb1bd96 100644 --- a/Documentation/keys.txt +++ b/Documentation/keys.txt | |||
@@ -26,7 +26,7 @@ This document has the following sections: | |||
26 | - Notes on accessing payload contents | 26 | - Notes on accessing payload contents |
27 | - Defining a key type | 27 | - Defining a key type |
28 | - Request-key callback service | 28 | - Request-key callback service |
29 | - Key access filesystem | 29 | - Garbage collection |
30 | 30 | ||
31 | 31 | ||
32 | ============ | 32 | ============ |
@@ -113,6 +113,9 @@ Each key has a number of attributes: | |||
113 | 113 | ||
114 | (*) Dead. The key's type was unregistered, and so the key is now useless. | 114 | (*) Dead. The key's type was unregistered, and so the key is now useless. |
115 | 115 | ||
116 | Keys in the last three states are subject to garbage collection. See the | ||
117 | section on "Garbage collection". | ||
118 | |||
116 | 119 | ||
117 | ==================== | 120 | ==================== |
118 | KEY SERVICE OVERVIEW | 121 | KEY SERVICE OVERVIEW |
@@ -754,6 +757,26 @@ The keyctl syscall functions are: | |||
754 | successful. | 757 | successful. |
755 | 758 | ||
756 | 759 | ||
760 | (*) Install the calling process's session keyring on its parent. | ||
761 | |||
762 | long keyctl(KEYCTL_SESSION_TO_PARENT); | ||
763 | |||
764 | This function attempts to install the calling process's session keyring | ||
765 | on to the calling process's parent, replacing the parent's current session | ||
766 | keyring. | ||
767 | |||
768 | The calling process must have the same ownership as its parent, the | ||
769 | keyring must have the same ownership as the calling process, the calling | ||
770 | process must have LINK permission on the keyring and the active LSM module | ||
771 | mustn't deny permission, otherwise error EPERM will be returned. | ||
772 | |||
773 | Error ENOMEM will be returned if there was insufficient memory to complete | ||
774 | the operation, otherwise 0 will be returned to indicate success. | ||
775 | |||
776 | The keyring will be replaced next time the parent process leaves the | ||
777 | kernel and resumes executing userspace. | ||
778 | |||
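A minimal sketch of issuing this call from C through the raw syscall, assuming the kernel headers provide KEYCTL_SESSION_TO_PARENT in <linux/keyctl.h>. The helper names session_to_parent() and session_to_parent_error() are hypothetical; the second one only restates the error meanings listed above.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <linux/keyctl.h>

/* Ask the kernel to install this process's session keyring on its
 * parent. Returns 0 on success or a negative errno value. */
static long session_to_parent(void)
{
	if (syscall(SYS_keyctl, KEYCTL_SESSION_TO_PARENT) == 0)
		return 0;
	return -errno;
}

/* Map a return value onto the meanings documented above. */
static const char *session_to_parent_error(long err)
{
	switch (err) {
	case 0:
		return "ok: takes effect when the parent next returns to userspace";
	case -EPERM:
		return "ownership mismatch, no LINK permission, or LSM denial";
	case -ENOMEM:
		return "insufficient memory";
	default:
		return "unexpected error";
	}
}
```

Because the replacement is deferred, a 0 return means the request was accepted, not that the parent's keyring has already changed.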
779 | |||
757 | =============== | 780 | =============== |
758 | KERNEL SERVICES | 781 | KERNEL SERVICES |
759 | =============== | 782 | =============== |
@@ -1231,3 +1254,17 @@ by executing: | |||
1231 | 1254 | ||
1232 | In this case, the program isn't required to actually attach the key to a ring; | 1255 | In this case, the program isn't required to actually attach the key to a ring; |
1233 | the rings are provided for reference. | 1256 | the rings are provided for reference. |
1257 | |||
1258 | |||
1259 | ================== | ||
1260 | GARBAGE COLLECTION | ||
1261 | ================== | ||
1262 | |||
1263 | Dead keys (for which the type has been removed) will be automatically unlinked | ||
1264 | from those keyrings that point to them and deleted as soon as possible by a | ||
1265 | background garbage collector. | ||
1266 | |||
1267 | Similarly, revoked and expired keys will be garbage collected, but only after a | ||
1268 | certain amount of time has passed. This time is set as a number of seconds in: | ||
1269 | |||
1270 | /proc/sys/kernel/keys/gc_delay | ||
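A minimal sketch of reading that delay from C. The helper read_gc_delay() is hypothetical; it takes the path as a parameter rather than hardcoding /proc/sys/kernel/keys/gc_delay, which also makes it easy to exercise against an ordinary file.

```c
#include <stdio.h>

/* Read the key garbage-collection delay, in seconds, from path
 * (normally "/proc/sys/kernel/keys/gc_delay"). Returns -1 if the file
 * cannot be opened or does not start with a non-negative integer. */
static long read_gc_delay(const char *path)
{
	FILE *f = fopen(path, "r");
	long secs = -1;

	if (!f)
		return -1;
	if (fscanf(f, "%ld", &secs) != 1 || secs < 0)
		secs = -1;
	fclose(f);
	return secs;
}
```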
diff --git a/Documentation/kmemleak.txt b/Documentation/kmemleak.txt index 89068030b01b..34f6638aa5ac 100644 --- a/Documentation/kmemleak.txt +++ b/Documentation/kmemleak.txt | |||
@@ -27,6 +27,13 @@ To trigger an intermediate memory scan: | |||
27 | 27 | ||
28 | # echo scan > /sys/kernel/debug/kmemleak | 28 | # echo scan > /sys/kernel/debug/kmemleak |
29 | 29 | ||
30 | To clear the list of all current possible memory leaks: | ||
31 | |||
32 | # echo clear > /sys/kernel/debug/kmemleak | ||
33 | |||
34 | New leaks will then come up upon reading /sys/kernel/debug/kmemleak | ||
35 | again. | ||
36 | |||
30 | Note that the orphan objects are listed in the order they were allocated | 37 | Note that the orphan objects are listed in the order they were allocated |
31 | and one object at the beginning of the list may cause other subsequent | 38 | and one object at the beginning of the list may cause other subsequent |
32 | objects to be reported as orphan. | 39 | objects to be reported as orphan. |
@@ -42,6 +49,9 @@ Memory scanning parameters can be modified at run-time by writing to the | |||
42 | scan=<secs> - set the automatic memory scanning period in seconds | 49 | scan=<secs> - set the automatic memory scanning period in seconds |
43 | (default 600, 0 to stop the automatic scanning) | 50 | (default 600, 0 to stop the automatic scanning) |
44 | scan - trigger a memory scan | 51 | scan - trigger a memory scan |
52 | clear - clear list of current memory leak suspects, done by | ||
53 | marking all currently reported unreferenced objects grey | ||
54 | dump=<addr> - dump information about the object found at <addr> | ||
45 | 55 | ||
46 | Kmemleak can also be disabled at boot-time by passing "kmemleak=off" on | 56 | Kmemleak can also be disabled at boot-time by passing "kmemleak=off" on |
47 | the kernel command line. | 57 | the kernel command line. |
@@ -86,6 +96,27 @@ avoid this, kmemleak can also store the number of values pointing to an | |||
86 | address inside the block address range that need to be found so that the | 96 | address inside the block address range that need to be found so that the |
87 | block is not considered a leak. One example is __vmalloc(). | 97 | block is not considered a leak. One example is __vmalloc(). |
88 | 98 | ||
99 | Testing specific sections with kmemleak | ||
100 | --------------------------------------- | ||
101 | |||
102 | Upon initial bootup your /sys/kernel/debug/kmemleak output page may be | ||
103 | quite extensive. This can also be the case if you have very buggy code | ||
104 | when doing development. To work around these situations you can use the | ||
105 | 'clear' command to clear all reported unreferenced objects from the | ||
106 | /sys/kernel/debug/kmemleak output. By issuing a 'scan' after a 'clear' | ||
107 | you can find new unreferenced objects; this should help with testing | ||
108 | specific sections of code. | ||
109 | |||
110 | To test a critical section on demand with a clean kmemleak do: | ||
111 | |||
112 | # echo clear > /sys/kernel/debug/kmemleak | ||
113 | ... test your kernel or modules ... | ||
114 | # echo scan > /sys/kernel/debug/kmemleak | ||
115 | |||
116 | Then as usual to get your report with: | ||
117 | |||
118 | # cat /sys/kernel/debug/kmemleak | ||
119 | |||
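The clear, test, scan sequence above can also be driven from a C test harness. This is a sketch under assumptions: kmemleak_cmd() and kmemleak_check_section() are hypothetical helpers, and section is a hypothetical callback that exercises the kernel code of interest (for example by driving a device). The report is still read from the same file afterwards.

```c
#include <stdio.h>

/* Write one kmemleak command ("clear", "scan", "scan=<secs>",
 * "dump=<addr>", ...) to the control file at path, normally
 * /sys/kernel/debug/kmemleak. Returns 0 on success, -1 on failure. */
static int kmemleak_cmd(const char *path, const char *cmd)
{
	FILE *f = fopen(path, "w");
	int ok;

	if (!f)
		return -1;
	ok = fputs(cmd, f) >= 0;
	/* The buffered write reaches the file at fclose(); for debugfs this
	 * is where a rejected command would show up as an error. */
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}

/* clear -> run the code under test -> scan. section may be NULL. */
static int kmemleak_check_section(const char *path, void (*section)(void))
{
	if (kmemleak_cmd(path, "clear") != 0)
		return -1;
	if (section)
		section();
	return kmemleak_cmd(path, "scan");
}
```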
89 | Kmemleak API | 120 | Kmemleak API |
90 | ------------ | 121 | ------------ |
91 | 122 | ||
diff --git a/Documentation/kref.txt b/Documentation/kref.txt index 130b6e87aa7e..ae203f91ee9b 100644 --- a/Documentation/kref.txt +++ b/Documentation/kref.txt | |||
@@ -84,7 +84,6 @@ int my_data_handler(void) | |||
84 | task = kthread_run(more_data_handling, data, "more_data_handling"); | 84 | task = kthread_run(more_data_handling, data, "more_data_handling"); |
85 | if (task == ERR_PTR(-ENOMEM)) { | 85 | if (task == ERR_PTR(-ENOMEM)) { |
86 | rv = -ENOMEM; | 86 | rv = -ENOMEM; |
87 | kref_put(&data->refcount, data_release); | ||
88 | goto out; | 87 | goto out; |
89 | } | 88 | } |
90 | 89 | ||
diff --git a/Documentation/kvm/api.txt b/Documentation/kvm/api.txt new file mode 100644 index 000000000000..5a4bc8cf6d04 --- /dev/null +++ b/Documentation/kvm/api.txt | |||
@@ -0,0 +1,759 @@ | |||
1 | The Definitive KVM (Kernel-based Virtual Machine) API Documentation | ||
2 | =================================================================== | ||
3 | |||
4 | 1. General description | ||
5 | |||
6 | The kvm API is a set of ioctls that are issued to control various aspects | ||
7 | of a virtual machine. The ioctls belong to three classes | ||
8 | |||
9 | - System ioctls: These query and set global attributes which affect the | ||
10 | whole kvm subsystem. In addition a system ioctl is used to create | ||
11 | virtual machines | ||
12 | |||
13 | - VM ioctls: These query and set attributes that affect an entire virtual | ||
14 | machine, for example memory layout. In addition a VM ioctl is used to | ||
15 | create virtual cpus (vcpus). | ||
16 | |||
17 | Only run VM ioctls from the same process (address space) that was used | ||
18 | to create the VM. | ||
19 | |||
20 | - vcpu ioctls: These query and set attributes that control the operation | ||
21 | of a single virtual cpu. | ||
22 | |||
23 | Only run vcpu ioctls from the same thread that was used to create the | ||
24 | vcpu. | ||
25 | |||
26 | 2. File descriptors | ||
27 | |||
28 | The kvm API is centered around file descriptors. An initial | ||
29 | open("/dev/kvm") obtains a handle to the kvm subsystem; this handle | ||
30 | can be used to issue system ioctls. A KVM_CREATE_VM ioctl on this | ||
31 | handle will create a VM file descriptor which can be used to issue VM | ||
32 | ioctls. A KVM_CREATE_VCPU ioctl on a VM fd will create a virtual cpu | ||
33 | and return a file descriptor pointing to it. Finally, ioctls on a vcpu | ||
34 | fd can be used to control the vcpu, including the important task of | ||
35 | actually running guest code. | ||
36 | |||
37 | In general file descriptors can be migrated among processes by means | ||
38 | of fork() and the SCM_RIGHTS facility of unix domain socket. These | ||
39 | kinds of tricks are explicitly not supported by kvm. While they will | ||
40 | not cause harm to the host, their actual behavior is not guaranteed by | ||
41 | the API. The only supported use is one virtual machine per process, | ||
42 | and one vcpu per thread. | ||
43 | |||
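The descriptor hierarchy described above can be sketched in C as follows, assuming an x86 or other kvm-capable host with <linux/kvm.h> installed. The names struct kvm_fds, kvm_setup() and kvm_teardown() are hypothetical. The version check follows section 4.1, and the setup should run on the thread that will later issue the vcpu ioctls.

```c
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/kvm.h>

/* The three levels of kvm file descriptors. */
struct kvm_fds {
	int sys_fd;	/* open("/dev/kvm"): system ioctls */
	int vm_fd;	/* KVM_CREATE_VM on sys_fd: VM ioctls */
	int vcpu_fd;	/* KVM_CREATE_VCPU on vm_fd: vcpu ioctls */
};

/* Close whatever was opened, innermost first, and mark it closed. */
static void kvm_teardown(struct kvm_fds *f)
{
	if (f->vcpu_fd >= 0)
		close(f->vcpu_fd);
	if (f->vm_fd >= 0)
		close(f->vm_fd);
	if (f->sys_fd >= 0)
		close(f->sys_fd);
	f->sys_fd = f->vm_fd = f->vcpu_fd = -1;
}

/* Open /dev/kvm, insist on the stable API version, then create one VM
 * with a single vcpu (id 0). Returns 0 on success; on failure returns
 * -1 and leaves every field at -1. */
static int kvm_setup(struct kvm_fds *f)
{
	f->sys_fd = f->vm_fd = f->vcpu_fd = -1;

	f->sys_fd = open("/dev/kvm", O_RDWR);
	if (f->sys_fd < 0)
		return -1;

	if (ioctl(f->sys_fd, KVM_GET_API_VERSION, 0UL) != KVM_API_VERSION)
		goto fail;

	f->vm_fd = ioctl(f->sys_fd, KVM_CREATE_VM, 0UL);
	if (f->vm_fd < 0)
		goto fail;

	f->vcpu_fd = ioctl(f->vm_fd, KVM_CREATE_VCPU, 0UL);
	if (f->vcpu_fd < 0)
		goto fail;
	return 0;

fail:
	kvm_teardown(f);
	return -1;
}
```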
44 | 3. Extensions | ||
45 | |||
46 | As of Linux 2.6.22, the KVM ABI has been stabilized: no backward | ||
47 | incompatible changes are allowed. However, there is an extension | ||
48 | facility that allows backward-compatible extensions to the API to be | ||
49 | queried and used. | ||
50 | |||
51 | The extension mechanism is not based on the Linux version number. | ||
52 | Instead, kvm defines extension identifiers and a facility to query | ||
53 | whether a particular extension identifier is available. If it is, a | ||
54 | set of ioctls is available for application use. | ||
55 | |||
56 | 4. API description | ||
57 | |||
58 | This section describes ioctls that can be used to control kvm guests. | ||
59 | For each ioctl, the following information is provided along with a | ||
60 | description: | ||
61 | |||
62 | Capability: which KVM extension provides this ioctl. Can be 'basic', | ||
63 | which means that is will be provided by any kernel that supports | ||
64 | API version 12 (see section 4.1), or a KVM_CAP_xyz constant, which | ||
65 | means availability needs to be checked with KVM_CHECK_EXTENSION | ||
66 | (see section 4.4). | ||
67 | |||
68 | Architectures: which instruction set architectures provide this ioctl. | ||
69 | x86 includes both i386 and x86_64. | ||
70 | |||
71 | Type: system, vm, or vcpu. | ||
72 | |||
73 | Parameters: what parameters are accepted by the ioctl. | ||
74 | |||
75 | Returns: the return value. General error numbers (EBADF, ENOMEM, EINVAL) | ||
76 | are not detailed, but errors with specific meanings are. | ||
77 | |||
78 | 4.1 KVM_GET_API_VERSION | ||
79 | |||
80 | Capability: basic | ||
81 | Architectures: all | ||
82 | Type: system ioctl | ||
83 | Parameters: none | ||
84 | Returns: the constant KVM_API_VERSION (=12) | ||
85 | |||
86 | This identifies the API version as the stable kvm API. It is not | ||
87 | expected that this number will change. However, Linux 2.6.20 and | ||
88 | 2.6.21 report earlier versions; these are not documented and not | ||
89 | supported. Applications should refuse to run if KVM_GET_API_VERSION | ||
90 | returns a value other than 12. If this check passes, all ioctls | ||
91 | described as 'basic' will be available. | ||
92 | |||
93 | 4.2 KVM_CREATE_VM | ||
94 | |||
95 | Capability: basic | ||
96 | Architectures: all | ||
97 | Type: system ioctl | ||
98 | Parameters: none | ||
99 | Returns: a VM fd that can be used to control the new virtual machine. | ||
100 | |||
101 | The new VM has no virtual cpus and no memory. An mmap() of a VM fd | ||
102 | will access the virtual machine's physical address space; offset zero | ||
103 | corresponds to guest physical address zero. Use of mmap() on a VM fd | ||
104 | is discouraged if userspace memory allocation (KVM_CAP_USER_MEMORY) is | ||
105 | available. | ||
106 | |||
107 | 4.3 KVM_GET_MSR_INDEX_LIST | ||
108 | |||
109 | Capability: basic | ||
110 | Architectures: x86 | ||
111 | Type: system | ||
112 | Parameters: struct kvm_msr_list (in/out) | ||
113 | Returns: 0 on success; -1 on error | ||
114 | Errors: | ||
115 | E2BIG: the msr index list is too big to fit in the array specified by | ||
116 | the user. | ||
117 | |||
118 | struct kvm_msr_list { | ||
119 | __u32 nmsrs; /* number of msrs in entries */ | ||
120 | __u32 indices[0]; | ||
121 | }; | ||
122 | |||
123 | This ioctl returns the guest msrs that are supported. The list varies | ||
124 | by kvm version and host processor, but does not change otherwise. The | ||
125 | user fills in the size of the indices array in nmsrs, and in return | ||
126 | kvm adjusts nmsrs to reflect the actual number of msrs and fills in | ||
127 | the indices array with their numbers. | ||
128 | |||
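A sketch of the usual two-call pattern for this x86-only ioctl, assuming <linux/kvm.h> declares struct kvm_msr_list. The first call passes nmsrs = 0; kvm writes the real count back into nmsrs and fails with E2BIG, so the second call can supply an array of exactly that size. get_msr_index_list() is a hypothetical name.

```c
#include <errno.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Returns a malloc()ed list the caller must free(), or NULL on error. */
static struct kvm_msr_list *get_msr_index_list(int sys_fd)
{
	struct kvm_msr_list probe = { .nmsrs = 0 };
	struct kvm_msr_list *list;

	/* E2BIG is expected here; it still updates probe.nmsrs. */
	if (ioctl(sys_fd, KVM_GET_MSR_INDEX_LIST, &probe) < 0 && errno != E2BIG)
		return NULL;

	list = malloc(sizeof(*list) + probe.nmsrs * sizeof(list->indices[0]));
	if (!list)
		return NULL;
	list->nmsrs = probe.nmsrs;
	if (ioctl(sys_fd, KVM_GET_MSR_INDEX_LIST, list) < 0) {
		free(list);
		return NULL;
	}
	return list;
}
```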
129 | 4.4 KVM_CHECK_EXTENSION | ||
130 | |||
131 | Capability: basic | ||
132 | Architectures: all | ||
133 | Type: system ioctl | ||
134 | Parameters: extension identifier (KVM_CAP_*) | ||
135 | Returns: 0 if unsupported; 1 (or some other positive integer) if supported | ||
136 | |||
137 | The API allows the application to query about extensions to the core | ||
138 | kvm API. Userspace passes an extension identifier (an integer) and | ||
139 | receives an integer that describes the extension availability. | ||
140 | Generally 0 means no and 1 means yes, but some extensions may report | ||
141 | additional information in the integer return value. | ||
142 | |||
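A minimal sketch of querying an extension before relying on it, assuming <linux/kvm.h> defines KVM_CAP_USER_MEMORY. kvm_check_cap() and use_user_memory() are hypothetical wrappers. The second one encodes the advice in the KVM_SET_MEMORY_REGION section to prefer the user-memory API when it is available.

```c
#include <sys/ioctl.h>
#include <linux/kvm.h>

/* 0 if unsupported, a positive value if supported (some capabilities
 * encode extra information in it), or -1 if the ioctl itself failed. */
static int kvm_check_cap(int sys_fd, long cap)
{
	return ioctl(sys_fd, KVM_CHECK_EXTENSION, cap);
}

/* Nonzero if KVM_SET_USER_MEMORY_REGION can be used. */
static int use_user_memory(int sys_fd)
{
	return kvm_check_cap(sys_fd, KVM_CAP_USER_MEMORY) > 0;
}
```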
143 | 4.5 KVM_GET_VCPU_MMAP_SIZE | ||
144 | |||
145 | Capability: basic | ||
146 | Architectures: all | ||
147 | Type: system ioctl | ||
148 | Parameters: none | ||
149 | Returns: size of vcpu mmap area, in bytes | ||
150 | |||
151 | The KVM_RUN ioctl (cf.) communicates with userspace via a shared | ||
152 | memory region. This ioctl returns the size of that region. See the | ||
153 | KVM_RUN documentation for details. | ||
154 | |||
155 | 4.6 KVM_SET_MEMORY_REGION | ||
156 | |||
157 | Capability: basic | ||
158 | Architectures: all | ||
159 | Type: vm ioctl | ||
160 | Parameters: struct kvm_memory_region (in) | ||
161 | Returns: 0 on success, -1 on error | ||
162 | |||
163 | struct kvm_memory_region { | ||
164 | __u32 slot; | ||
165 | __u32 flags; | ||
166 | __u64 guest_phys_addr; | ||
167 | __u64 memory_size; /* bytes */ | ||
168 | }; | ||
169 | |||
170 | /* for kvm_memory_region::flags */ | ||
171 | #define KVM_MEM_LOG_DIRTY_PAGES 1UL | ||
172 | |||
173 | This ioctl allows the user to create or modify a guest physical memory | ||
174 | slot. When changing an existing slot, it may be moved in the guest | ||
175 | physical memory space, or its flags may be modified. It may not be | ||
176 | resized. Slots may not overlap. | ||
177 | |||
178 | The flags field supports just one flag, KVM_MEM_LOG_DIRTY_PAGES, which | ||
179 | instructs kvm to keep track of writes to memory within the slot. See | ||
180 | the KVM_GET_DIRTY_LOG ioctl. | ||
181 | |||
182 | It is recommended to use the KVM_SET_USER_MEMORY_REGION ioctl instead | ||
183 | of this API, if available. This newer API allows placing guest memory | ||
184 | at specified locations in the host address space, yielding better | ||
185 | control and easier access. | ||
186 | |||
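Since slots may not overlap, userspace can check a new slot against its existing layout before issuing the ioctl. The kernel performs its own check, so this only surfaces the mistake earlier. This is a sketch: struct slot_range is a hypothetical mirror of the two kvm_memory_region fields that matter here, and slots_overlap() assumes that guest_phys_addr + memory_size does not wrap.

```c
#include <stdbool.h>
#include <stdint.h>

struct slot_range {
	uint64_t guest_phys_addr;
	uint64_t memory_size;	/* bytes */
};

/* True if the half-open ranges [addr, addr + size) intersect.
 * Empty ranges never overlap. */
static bool slots_overlap(const struct slot_range *a,
			  const struct slot_range *b)
{
	if (a->memory_size == 0 || b->memory_size == 0)
		return false;
	return a->guest_phys_addr < b->guest_phys_addr + b->memory_size &&
	       b->guest_phys_addr < a->guest_phys_addr + a->memory_size;
}
```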
187 | 4.6 KVM_CREATE_VCPU | ||
188 | |||
189 | Capability: basic | ||
190 | Architectures: all | ||
191 | Type: vm ioctl | ||
192 | Parameters: vcpu id (apic id on x86) | ||
193 | Returns: vcpu fd on success, -1 on error | ||
194 | |||
195 | This API adds a vcpu to a virtual machine. The vcpu id is a small integer | ||
196 | in the range [0, max_vcpus). | ||
197 | |||
198 | 4.7 KVM_GET_DIRTY_LOG (vm ioctl) | ||
199 | |||
200 | Capability: basic | ||
201 | Architectures: x86 | ||
202 | Type: vm ioctl | ||
203 | Parameters: struct kvm_dirty_log (in/out) | ||
204 | Returns: 0 on success, -1 on error | ||
205 | |||
206 | /* for KVM_GET_DIRTY_LOG */ | ||
207 | struct kvm_dirty_log { | ||
208 | __u32 slot; | ||
209 | __u32 padding; | ||
210 | union { | ||
211 | void __user *dirty_bitmap; /* one bit per page */ | ||
212 | __u64 padding; | ||
213 | }; | ||
214 | }; | ||
215 | |||
216 | Given a memory slot, return a bitmap containing any pages dirtied | ||
217 | since the last call to this ioctl. Bit 0 is the first page in the | ||
218 | memory slot. Ensure the entire structure is cleared to avoid padding | ||
219 | issues. | ||
220 | |||
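A sketch of sizing, fetching and decoding the dirty bitmap. The following are assumptions matching x86_64 rather than guarantees of this text: 4 KiB guest pages, a bitmap allocated in whole 64-bit words, and bit i living in word i / 64 at position i % 64. The helper names are hypothetical. The struct is cleared with memset() before use, as advised above.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

#define GUEST_PAGE_SIZE 4096ULL	/* assumption: 4 KiB guest pages */

/* Bytes to allocate for a slot's bitmap: one bit per page, rounded up
 * to whole 64-bit words. */
static size_t dirty_bitmap_bytes(uint64_t memory_size)
{
	uint64_t pages = (memory_size + GUEST_PAGE_SIZE - 1) / GUEST_PAGE_SIZE;

	return (size_t)((pages + 63) / 64) * 8;
}

/* Fetch the pages dirtied in slot since the previous call. */
static int get_dirty_log(int vm_fd, uint32_t slot, uint64_t *bitmap)
{
	struct kvm_dirty_log log;

	memset(&log, 0, sizeof(log));	/* clear padding */
	log.slot = slot;
	log.dirty_bitmap = bitmap;
	return ioctl(vm_fd, KVM_GET_DIRTY_LOG, &log);
}

/* Store the guest physical address of each dirty page in out[] (at
 * most max_out entries); returns the total number of dirty pages. */
static size_t collect_dirty_gpas(const uint64_t *bitmap, uint64_t npages,
				 uint64_t slot_base, uint64_t *out,
				 size_t max_out)
{
	size_t found = 0;

	for (uint64_t i = 0; i < npages; i++) {
		if (bitmap[i / 64] & (1ULL << (i % 64))) {
			if (found < max_out)
				out[found] = slot_base + i * GUEST_PAGE_SIZE;
			found++;
		}
	}
	return found;
}
```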
221 | 4.8 KVM_SET_MEMORY_ALIAS | ||
222 | |||
223 | Capability: basic | ||
224 | Architectures: x86 | ||
225 | Type: vm ioctl | ||
226 | Parameters: struct kvm_memory_alias (in) | ||
227 | Returns: 0 (success), -1 (error) | ||
228 | |||
229 | struct kvm_memory_alias { | ||
230 | __u32 slot; /* this has a different namespace than memory slots */ | ||
231 | __u32 flags; | ||
232 | __u64 guest_phys_addr; | ||
233 | __u64 memory_size; | ||
234 | __u64 target_phys_addr; | ||
235 | }; | ||
236 | |||
237 | Defines a guest physical address space region as an alias to another | ||
238 | region. Useful for aliased addresses, for example the VGA low memory | ||
239 | window. Should not be used with userspace memory. | ||
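The effect of an alias can be modelled as an address rewrite: a guest physical address inside [guest_phys_addr, guest_phys_addr + memory_size) is treated as the same offset into the region that starts at target_phys_addr. This sketch illustrates those semantics only, not kvm's implementation; the struct is a local copy using <stdint.h> types, and the target address in the usage is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local copy of the layout documented above. */
struct kvm_memory_alias {
    uint32_t slot;
    uint32_t flags;
    uint64_t guest_phys_addr;
    uint64_t memory_size;
    uint64_t target_phys_addr;
};

/* If gpa falls inside the alias window, store the corresponding address
 * in the target region in *resolved and return true. */
static bool resolve_alias(const struct kvm_memory_alias *a, uint64_t gpa,
                          uint64_t *resolved)
{
    if (gpa < a->guest_phys_addr || gpa - a->guest_phys_addr >= a->memory_size)
        return false;
    *resolved = a->target_phys_addr + (gpa - a->guest_phys_addr);
    return true;
}
```

For the VGA case the window would be guest_phys_addr 0xa0000 with memory_size 0x20000.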
240 | |||
241 | 4.9 KVM_RUN | ||
242 | |||
243 | Capability: basic | ||
244 | Architectures: all | ||
245 | Type: vcpu ioctl | ||
246 | Parameters: none | ||
247 | Returns: 0 on success, -1 on error | ||
248 | Errors: | ||
249 | EINTR: an unmasked signal is pending | ||
250 | |||
251 | This ioctl is used to run a guest virtual cpu. While there are no | ||
252 | explicit parameters, there is an implicit parameter block that can be | ||
253 | obtained by mmap()ing the vcpu fd at offset 0, with the size given by | ||
254 | KVM_GET_VCPU_MMAP_SIZE. The parameter block is formatted as a 'struct | ||
255 | kvm_run' (see below). | ||
256 | |||
257 | 4.10 KVM_GET_REGS | ||
258 | |||
259 | Capability: basic | ||
260 | Architectures: all | ||
261 | Type: vcpu ioctl | ||
262 | Parameters: struct kvm_regs (out) | ||
263 | Returns: 0 on success, -1 on error | ||
264 | |||
265 | Reads the general purpose registers from the vcpu. | ||
266 | |||
267 | /* x86 */ | ||
268 | struct kvm_regs { | ||
269 | /* out (KVM_GET_REGS) / in (KVM_SET_REGS) */ | ||
270 | __u64 rax, rbx, rcx, rdx; | ||
271 | __u64 rsi, rdi, rsp, rbp; | ||
272 | __u64 r8, r9, r10, r11; | ||
273 | __u64 r12, r13, r14, r15; | ||
274 | __u64 rip, rflags; | ||
275 | }; | ||
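KVM_SET_REGS writes every field, so callers either read the current values with KVM_GET_REGS and modify the ones of interest, or build the structure from scratch as below. This sketch uses a local copy of struct kvm_regs (real code would include <linux/kvm.h>), and the entry point and stack values in the usage are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Local copy of the x86 layout documented above. */
struct kvm_regs {
    uint64_t rax, rbx, rcx, rdx;
    uint64_t rsi, rdi, rsp, rbp;
    uint64_t r8, r9, r10, r11;
    uint64_t r12, r13, r14, r15;
    uint64_t rip, rflags;
};

/* Prepare registers for KVM_SET_REGS: start executing at 'entry' with the
 * given stack pointer. Bit 1 of RFLAGS is reserved and always reads as 1
 * on x86, so 0x2 is the minimal valid value. */
static void prepare_entry_regs(struct kvm_regs *regs, uint64_t entry, uint64_t stack)
{
    memset(regs, 0, sizeof(*regs));
    regs->rip = entry;
    regs->rsp = stack;
    regs->rflags = 0x2;
}
```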
276 | |||
277 | 4.11 KVM_SET_REGS | ||
278 | |||
279 | Capability: basic | ||
280 | Architectures: all | ||
281 | Type: vcpu ioctl | ||
282 | Parameters: struct kvm_regs (in) | ||
283 | Returns: 0 on success, -1 on error | ||
284 | |||
285 | Writes the general purpose registers into the vcpu. | ||
286 | |||
287 | See KVM_GET_REGS for the data structure. | ||
288 | |||
289 | 4.12 KVM_GET_SREGS | ||
290 | |||
291 | Capability: basic | ||
292 | Architectures: x86 | ||
293 | Type: vcpu ioctl | ||
294 | Parameters: struct kvm_sregs (out) | ||
295 | Returns: 0 on success, -1 on error | ||
296 | |||
297 | Reads special registers from the vcpu. | ||
298 | |||
299 | /* x86 */ | ||
300 | struct kvm_sregs { | ||
301 | struct kvm_segment cs, ds, es, fs, gs, ss; | ||
302 | struct kvm_segment tr, ldt; | ||
303 | struct kvm_dtable gdt, idt; | ||
304 | __u64 cr0, cr2, cr3, cr4, cr8; | ||
305 | __u64 efer; | ||
306 | __u64 apic_base; | ||
307 | __u64 interrupt_bitmap[(KVM_NR_INTERRUPTS + 63) / 64]; | ||
308 | }; | ||
309 | |||
310 | interrupt_bitmap is a bitmap of pending external interrupts. At most | ||
311 | one bit may be set. This interrupt has been acknowledged by the APIC | ||
312 | but not yet injected into the cpu core. | ||
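Userspace that saves and restores vcpu state may need to know which vector the bitmap holds. A sketch, assuming KVM_NR_INTERRUPTS is 256 (its x86 value) and reading the bitmap as 64-bit words:

```c
#include <assert.h>
#include <stdint.h>

#define KVM_NR_INTERRUPTS 256   /* x86 value from <linux/kvm.h> */
#define IRQ_WORDS ((KVM_NR_INTERRUPTS + 63) / 64)

/* Return the vector of the pending external interrupt recorded in
 * interrupt_bitmap, or -1 if none is pending. At most one bit may be set,
 * so the first set bit found is the answer. */
static int pending_vector(const uint64_t bitmap[IRQ_WORDS])
{
    for (int w = 0; w < IRQ_WORDS; w++) {
        if (!bitmap[w])
            continue;
        for (int b = 0; b < 64; b++)
            if (bitmap[w] & (UINT64_C(1) << b))
                return w * 64 + b;
    }
    return -1;
}
```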
313 | |||
314 | 4.13 KVM_SET_SREGS | ||
315 | |||
316 | Capability: basic | ||
317 | Architectures: x86 | ||
318 | Type: vcpu ioctl | ||
319 | Parameters: struct kvm_sregs (in) | ||
320 | Returns: 0 on success, -1 on error | ||
321 | |||
322 | Writes special registers into the vcpu. See KVM_GET_SREGS for the | ||
323 | data structures. | ||
324 | |||
325 | 4.14 KVM_TRANSLATE | ||
326 | |||
327 | Capability: basic | ||
328 | Architectures: x86 | ||
329 | Type: vcpu ioctl | ||
330 | Parameters: struct kvm_translation (in/out) | ||
331 | Returns: 0 on success, -1 on error | ||
332 | |||
333 | Translates a virtual address according to the vcpu's current address | ||
334 | translation mode. | ||
335 | |||
336 | struct kvm_translation { | ||
337 | /* in */ | ||
338 | __u64 linear_address; | ||
339 | |||
340 | /* out */ | ||
341 | __u64 physical_address; | ||
342 | __u8 valid; | ||
343 | __u8 writeable; | ||
344 | __u8 usermode; | ||
345 | __u8 pad[5]; | ||
346 | }; | ||
347 | |||
348 | 4.15 KVM_INTERRUPT | ||
349 | |||
350 | Capability: basic | ||
351 | Architectures: x86 | ||
352 | Type: vcpu ioctl | ||
353 | Parameters: struct kvm_interrupt (in) | ||
354 | Returns: 0 on success, -1 on error | ||
355 | |||
356 | Queues a hardware interrupt vector to be injected. This is only | ||
357 | useful if the in-kernel local APIC is not used. | ||
358 | |||
359 | /* for KVM_INTERRUPT */ | ||
360 | struct kvm_interrupt { | ||
361 | /* in */ | ||
362 | __u32 irq; | ||
363 | }; | ||
364 | |||
365 | Note 'irq' is an interrupt vector, not an interrupt pin or line. | ||
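When the in-kernel local APIC is not used, userspace decides when KVM_INTERRUPT may be called from the request_interrupt_window, ready_for_interrupt_injection and if_flag fields of struct kvm_run (section 5). A simplified sketch of that decision, assuming a single pending interrupt and a hypothetical struct holding just those three fields:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical subset of the struct kvm_run fields described in section 5. */
struct run_irq_fields {
    uint8_t request_interrupt_window;      /* in */
    uint8_t ready_for_interrupt_injection; /* out */
    uint8_t if_flag;                       /* out */
};

/* Called before each KVM_RUN. Returns true if the pending vector should be
 * injected now with KVM_INTERRUPT; otherwise, if an interrupt is pending,
 * asks KVM_RUN to return as soon as the guest can accept one. */
static bool should_inject_now(struct run_irq_fields *run, bool irq_pending)
{
    if (irq_pending && run->ready_for_interrupt_injection && run->if_flag) {
        run->request_interrupt_window = 0;
        return true;
    }
    run->request_interrupt_window = irq_pending;
    return false;
}
```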
366 | |||
367 | 4.16 KVM_DEBUG_GUEST | ||
368 | |||
369 | Capability: basic | ||
370 | Architectures: none | ||
371 | Type: vcpu ioctl | ||
372 | Parameters: none | ||
373 | Returns: -1 on error | ||
374 | |||
375 | Support for this has been removed. Use KVM_SET_GUEST_DEBUG instead. | ||
376 | |||
377 | 4.17 KVM_GET_MSRS | ||
378 | |||
379 | Capability: basic | ||
380 | Architectures: x86 | ||
381 | Type: vcpu ioctl | ||
382 | Parameters: struct kvm_msrs (in/out) | ||
383 | Returns: 0 on success, -1 on error | ||
384 | |||
385 | Reads model-specific registers from the vcpu. Supported msr indices can | ||
386 | be obtained using KVM_GET_MSR_INDEX_LIST. | ||
387 | |||
388 | struct kvm_msrs { | ||
389 | __u32 nmsrs; /* number of msrs in entries */ | ||
390 | __u32 pad; | ||
391 | |||
392 | struct kvm_msr_entry entries[0]; | ||
393 | }; | ||
394 | |||
395 | struct kvm_msr_entry { | ||
396 | __u32 index; | ||
397 | __u32 reserved; | ||
398 | __u64 data; | ||
399 | }; | ||
400 | |||
401 | Application code should set the 'nmsrs' member (which indicates the | ||
402 | size of the entries array) and the 'index' member of each array entry. | ||
403 | kvm will fill in the 'data' member. | ||
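Because entries is variable-length, the header and the entries are allocated as one buffer. A sketch using local copies of the structures (entries[0] written as a C99 flexible array member); 0x10 (time-stamp counter) and 0xc0000080 (EFER) in the usage are example indices, and real code should confirm they appear in the KVM_GET_MSR_INDEX_LIST output. The same layout, with 'data' also filled in, serves KVM_SET_MSRS.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Local copies of the layouts documented above. */
struct kvm_msr_entry {
    uint32_t index;
    uint32_t reserved;
    uint64_t data;
};

struct kvm_msrs {
    uint32_t nmsrs;   /* number of msrs in entries */
    uint32_t pad;
    struct kvm_msr_entry entries[];
};

/* Allocate a zeroed KVM_GET_MSRS buffer with one entry per requested
 * index; kvm fills in each entry's 'data'. Release with free(). */
static struct kvm_msrs *alloc_msr_request(const uint32_t *indices, uint32_t n)
{
    struct kvm_msrs *msrs = calloc(1, sizeof(*msrs) + n * sizeof(msrs->entries[0]));
    if (!msrs)
        return NULL;
    msrs->nmsrs = n;
    for (uint32_t i = 0; i < n; i++)
        msrs->entries[i].index = indices[i];
    return msrs;
}
```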
404 | |||
405 | 4.18 KVM_SET_MSRS | ||
406 | |||
407 | Capability: basic | ||
408 | Architectures: x86 | ||
409 | Type: vcpu ioctl | ||
410 | Parameters: struct kvm_msrs (in) | ||
411 | Returns: 0 on success, -1 on error | ||
412 | |||
413 | Writes model-specific registers to the vcpu. See KVM_GET_MSRS for the | ||
414 | data structures. | ||
415 | |||
416 | Application code should set the 'nmsrs' member (which indicates the | ||
417 | size of the entries array), and the 'index' and 'data' members of each | ||
418 | array entry. | ||
419 | |||
420 | 4.19 KVM_SET_CPUID | ||
421 | |||
422 | Capability: basic | ||
423 | Architectures: x86 | ||
424 | Type: vcpu ioctl | ||
425 | Parameters: struct kvm_cpuid (in) | ||
426 | Returns: 0 on success, -1 on error | ||
427 | |||
428 | Defines the vcpu responses to the cpuid instruction. Applications | ||
429 | should use the KVM_SET_CPUID2 ioctl if available. | ||
430 | |||
431 | |||
432 | struct kvm_cpuid_entry { | ||
433 | __u32 function; | ||
434 | __u32 eax; | ||
435 | __u32 ebx; | ||
436 | __u32 ecx; | ||
437 | __u32 edx; | ||
438 | __u32 padding; | ||
439 | }; | ||
440 | |||
441 | /* for KVM_SET_CPUID */ | ||
442 | struct kvm_cpuid { | ||
443 | __u32 nent; | ||
444 | __u32 padding; | ||
445 | struct kvm_cpuid_entry entries[0]; | ||
446 | }; | ||
447 | |||
448 | 4.20 KVM_SET_SIGNAL_MASK | ||
449 | |||
450 | Capability: basic | ||
451 | Architectures: x86 | ||
452 | Type: vcpu ioctl | ||
453 | Parameters: struct kvm_signal_mask (in) | ||
454 | Returns: 0 on success, -1 on error | ||
455 | |||
456 | Defines which signals are blocked during execution of KVM_RUN. This | ||
457 | signal mask temporarily overrides the thread's signal mask. Any | ||
458 | unblocked signal received (except SIGKILL and SIGSTOP, which retain | ||
459 | their traditional behaviour) will cause KVM_RUN to return with -EINTR. | ||
460 | |||
461 | Note the signal will only be delivered if not blocked by the original | ||
462 | signal mask. | ||
463 | |||
464 | /* for KVM_SET_SIGNAL_MASK */ | ||
465 | struct kvm_signal_mask { | ||
466 | __u32 len; | ||
467 | __u8 sigset[0]; | ||
468 | }; | ||
469 | |||
470 | 4.21 KVM_GET_FPU | ||
471 | |||
472 | Capability: basic | ||
473 | Architectures: x86 | ||
474 | Type: vcpu ioctl | ||
475 | Parameters: struct kvm_fpu (out) | ||
476 | Returns: 0 on success, -1 on error | ||
477 | |||
478 | Reads the floating point state from the vcpu. | ||
479 | |||
480 | /* for KVM_GET_FPU and KVM_SET_FPU */ | ||
481 | struct kvm_fpu { | ||
482 | __u8 fpr[8][16]; | ||
483 | __u16 fcw; | ||
484 | __u16 fsw; | ||
485 | __u8 ftwx; /* in fxsave format */ | ||
486 | __u8 pad1; | ||
487 | __u16 last_opcode; | ||
488 | __u64 last_ip; | ||
489 | __u64 last_dp; | ||
490 | __u8 xmm[16][16]; | ||
491 | __u32 mxcsr; | ||
492 | __u32 pad2; | ||
493 | }; | ||
494 | |||
495 | 4.22 KVM_SET_FPU | ||
496 | |||
497 | Capability: basic | ||
498 | Architectures: x86 | ||
499 | Type: vcpu ioctl | ||
500 | Parameters: struct kvm_fpu (in) | ||
501 | Returns: 0 on success, -1 on error | ||
502 | |||
503 | Writes the floating point state to the vcpu. | ||
504 | |||
505 | /* for KVM_GET_FPU and KVM_SET_FPU */ | ||
506 | struct kvm_fpu { | ||
507 | __u8 fpr[8][16]; | ||
508 | __u16 fcw; | ||
509 | __u16 fsw; | ||
510 | __u8 ftwx; /* in fxsave format */ | ||
511 | __u8 pad1; | ||
512 | __u16 last_opcode; | ||
513 | __u64 last_ip; | ||
514 | __u64 last_dp; | ||
515 | __u8 xmm[16][16]; | ||
516 | __u32 mxcsr; | ||
517 | __u32 pad2; | ||
518 | }; | ||
519 | |||
520 | 4.23 KVM_CREATE_IRQCHIP | ||
521 | |||
522 | Capability: KVM_CAP_IRQCHIP | ||
523 | Architectures: x86, ia64 | ||
524 | Type: vm ioctl | ||
525 | Parameters: none | ||
526 | Returns: 0 on success, -1 on error | ||
527 | |||
528 | Creates an interrupt controller model in the kernel. On x86, creates a virtual | ||
529 | ioapic, a virtual PIC (two PICs, nested), and sets up future vcpus to have a | ||
530 | local APIC. IRQ routing for GSIs 0-15 is set to both PIC and IOAPIC; GSIs 16-23 | ||
531 | only go to the IOAPIC. On ia64, an IOSAPIC is created. | ||
532 | |||
533 | 4.24 KVM_IRQ_LINE | ||
534 | |||
535 | Capability: KVM_CAP_IRQCHIP | ||
536 | Architectures: x86, ia64 | ||
537 | Type: vm ioctl | ||
538 | Parameters: struct kvm_irq_level | ||
539 | Returns: 0 on success, -1 on error | ||
540 | |||
541 | Sets the level of a GSI input to the interrupt controller model in the kernel. | ||
542 | Requires that an interrupt controller model has been previously created with | ||
543 | KVM_CREATE_IRQCHIP. Note that edge-triggered interrupts require the level | ||
544 | to be set to 1 and then back to 0. | ||
545 | |||
546 | struct kvm_irq_level { | ||
547 | union { | ||
548 | __u32 irq; /* GSI */ | ||
549 | __s32 status; /* not used for KVM_IRQ_LEVEL */ | ||
550 | }; | ||
551 | __u32 level; /* 0 or 1 */ | ||
552 | }; | ||
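A sketch of the argument sequence for one edge-triggered interrupt, using a local copy of the structure (C11 anonymous union). The ioctl calls appear only in a comment because they need a vm fd with an in-kernel irqchip; the GSI in the usage is an arbitrary example.

```c
#include <assert.h>
#include <stdint.h>

/* Local copy of the layout documented above. */
struct kvm_irq_level {
    union {
        uint32_t irq;     /* GSI */
        int32_t  status;
    };
    uint32_t level;       /* 0 or 1 */
};

/* Fill the two KVM_IRQ_LINE arguments that deliver one edge on 'gsi':
 * raise the line, then lower it. The caller issues them in order, e.g.
 * ioctl(vm_fd, KVM_IRQ_LINE, &seq[0]) then the same with &seq[1]. */
static void edge_irq_sequence(uint32_t gsi, struct kvm_irq_level seq[2])
{
    seq[0].irq = gsi;
    seq[0].level = 1;
    seq[1].irq = gsi;
    seq[1].level = 0;
}
```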
553 | |||
554 | 4.25 KVM_GET_IRQCHIP | ||
555 | |||
556 | Capability: KVM_CAP_IRQCHIP | ||
557 | Architectures: x86, ia64 | ||
558 | Type: vm ioctl | ||
559 | Parameters: struct kvm_irqchip (in/out) | ||
560 | Returns: 0 on success, -1 on error | ||
561 | |||
562 | Reads the state of a kernel interrupt controller created with | ||
563 | KVM_CREATE_IRQCHIP into a buffer provided by the caller. | ||
564 | |||
565 | struct kvm_irqchip { | ||
566 | __u32 chip_id; /* 0 = PIC1, 1 = PIC2, 2 = IOAPIC */ | ||
567 | __u32 pad; | ||
568 | union { | ||
569 | char dummy[512]; /* reserving space */ | ||
570 | struct kvm_pic_state pic; | ||
571 | struct kvm_ioapic_state ioapic; | ||
572 | } chip; | ||
573 | }; | ||
574 | |||
575 | 4.26 KVM_SET_IRQCHIP | ||
576 | |||
577 | Capability: KVM_CAP_IRQCHIP | ||
578 | Architectures: x86, ia64 | ||
579 | Type: vm ioctl | ||
580 | Parameters: struct kvm_irqchip (in) | ||
581 | Returns: 0 on success, -1 on error | ||
582 | |||
583 | Sets the state of a kernel interrupt controller created with | ||
584 | KVM_CREATE_IRQCHIP from a buffer provided by the caller. | ||
585 | |||
586 | struct kvm_irqchip { | ||
587 | __u32 chip_id; /* 0 = PIC1, 1 = PIC2, 2 = IOAPIC */ | ||
588 | __u32 pad; | ||
589 | union { | ||
590 | char dummy[512]; /* reserving space */ | ||
591 | struct kvm_pic_state pic; | ||
592 | struct kvm_ioapic_state ioapic; | ||
593 | } chip; | ||
594 | }; | ||
595 | |||
596 | 5. The kvm_run structure | ||
597 | |||
598 | Application code obtains a pointer to the kvm_run structure by | ||
599 | mmap()ing a vcpu fd. From that point, application code can control | ||
600 | execution by changing fields in kvm_run prior to calling the KVM_RUN | ||
601 | ioctl, and obtain information about the reason KVM_RUN returned by | ||
602 | looking up structure members. | ||
603 | |||
604 | struct kvm_run { | ||
605 | /* in */ | ||
606 | __u8 request_interrupt_window; | ||
607 | |||
608 | Request that KVM_RUN return when it becomes possible to inject external | ||
609 | interrupts into the guest. Useful in conjunction with KVM_INTERRUPT. | ||
610 | |||
611 | __u8 padding1[7]; | ||
612 | |||
613 | /* out */ | ||
614 | __u32 exit_reason; | ||
615 | |||
616 | When KVM_RUN has returned successfully (return value 0), this informs | ||
617 | application code why KVM_RUN has returned. Allowable values for this | ||
618 | field are detailed below. | ||
619 | |||
620 | __u8 ready_for_interrupt_injection; | ||
621 | |||
622 | If request_interrupt_window has been specified, this field indicates | ||
623 | that an interrupt can be injected now with KVM_INTERRUPT. | ||
624 | |||
625 | __u8 if_flag; | ||
626 | |||
627 | The value of the current interrupt flag. Only valid if the in-kernel | ||
628 | local APIC is not used. | ||
629 | |||
630 | __u8 padding2[2]; | ||
631 | |||
632 | /* in (pre_kvm_run), out (post_kvm_run) */ | ||
633 | __u64 cr8; | ||
634 | |||
635 | The value of the cr8 register. Only valid if the in-kernel local APIC is | ||
636 | not used. Both input and output. | ||
637 | |||
638 | __u64 apic_base; | ||
639 | |||
640 | The value of the APIC BASE msr. Only valid if the in-kernel local | ||
641 | APIC is not used. Both input and output. | ||
642 | |||
643 | union { | ||
644 | /* KVM_EXIT_UNKNOWN */ | ||
645 | struct { | ||
646 | __u64 hardware_exit_reason; | ||
647 | } hw; | ||
648 | |||
649 | If exit_reason is KVM_EXIT_UNKNOWN, the vcpu has exited due to unknown | ||
650 | reasons. Further architecture-specific information is available in | ||
651 | hardware_exit_reason. | ||
652 | |||
653 | /* KVM_EXIT_FAIL_ENTRY */ | ||
654 | struct { | ||
655 | __u64 hardware_entry_failure_reason; | ||
656 | } fail_entry; | ||
657 | |||
658 | If exit_reason is KVM_EXIT_FAIL_ENTRY, the vcpu could not be run due | ||
659 | to unknown reasons. Further architecture-specific information is | ||
660 | available in hardware_entry_failure_reason. | ||
661 | |||
662 | /* KVM_EXIT_EXCEPTION */ | ||
663 | struct { | ||
664 | __u32 exception; | ||
665 | __u32 error_code; | ||
666 | } ex; | ||
667 | |||
668 | Unused. | ||
669 | |||
670 | /* KVM_EXIT_IO */ | ||
671 | struct { | ||
672 | #define KVM_EXIT_IO_IN 0 | ||
673 | #define KVM_EXIT_IO_OUT 1 | ||
674 | __u8 direction; | ||
675 | __u8 size; /* bytes */ | ||
676 | __u16 port; | ||
677 | __u32 count; | ||
678 | __u64 data_offset; /* relative to kvm_run start */ | ||
679 | } io; | ||
680 | |||
681 | If exit_reason is KVM_EXIT_IO, the vcpu has executed a port I/O | ||
682 | instruction which could not be satisfied by kvm; 'direction' is | ||
683 | KVM_EXIT_IO_IN or KVM_EXIT_IO_OUT. data_offset describes where the data | ||
684 | is located (KVM_EXIT_IO_OUT) or where kvm expects application code to | ||
685 | place it for the next KVM_RUN (KVM_EXIT_IO_IN): a packed array of 'count' elements of 'size' bytes each. | ||
686 | |||
687 | struct { | ||
688 | struct kvm_debug_exit_arch arch; | ||
689 | } debug; | ||
690 | |||
691 | Unused. | ||
692 | |||
693 | /* KVM_EXIT_MMIO */ | ||
694 | struct { | ||
695 | __u64 phys_addr; | ||
696 | __u8 data[8]; | ||
697 | __u32 len; | ||
698 | __u8 is_write; | ||
699 | } mmio; | ||
700 | |||
701 | If exit_reason is KVM_EXIT_MMIO, then the vcpu has | ||
702 | executed a memory-mapped I/O instruction which could not be satisfied | ||
703 | by kvm. The 'data' member contains the written data if 'is_write' is | ||
704 | true, and should be filled by application code otherwise. | ||
705 | |||
706 | /* KVM_EXIT_HYPERCALL */ | ||
707 | struct { | ||
708 | __u64 nr; | ||
709 | __u64 args[6]; | ||
710 | __u64 ret; | ||
711 | __u32 longmode; | ||
712 | __u32 pad; | ||
713 | } hypercall; | ||
714 | |||
715 | Unused. | ||
716 | |||
717 | /* KVM_EXIT_TPR_ACCESS */ | ||
718 | struct { | ||
719 | __u64 rip; | ||
720 | __u32 is_write; | ||
721 | __u32 pad; | ||
722 | } tpr_access; | ||
723 | |||
724 | To be documented (KVM_TPR_ACCESS_REPORTING). | ||
725 | |||
726 | /* KVM_EXIT_S390_SIEIC */ | ||
727 | struct { | ||
728 | __u8 icptcode; | ||
729 | __u64 mask; /* psw upper half */ | ||
730 | __u64 addr; /* psw lower half */ | ||
731 | __u16 ipa; | ||
732 | __u32 ipb; | ||
733 | } s390_sieic; | ||
734 | |||
735 | s390 specific. | ||
736 | |||
737 | /* KVM_EXIT_S390_RESET */ | ||
738 | #define KVM_S390_RESET_POR 1 | ||
739 | #define KVM_S390_RESET_CLEAR 2 | ||
740 | #define KVM_S390_RESET_SUBSYSTEM 4 | ||
741 | #define KVM_S390_RESET_CPU_INIT 8 | ||
742 | #define KVM_S390_RESET_IPL 16 | ||
743 | __u64 s390_reset_flags; | ||
744 | |||
745 | s390 specific. | ||
746 | |||
747 | /* KVM_EXIT_DCR */ | ||
748 | struct { | ||
749 | __u32 dcrn; | ||
750 | __u32 data; | ||
751 | __u8 is_write; | ||
752 | } dcr; | ||
753 | |||
754 | powerpc specific. | ||
755 | |||
756 | /* Fix the size of the union. */ | ||
757 | char padding[256]; | ||
758 | }; | ||
759 | }; | ||
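A sketch of handling the packed array behind a port I/O exit, treating the mmap()ed kvm_run area as raw bytes ('run' below) and using a local copy of the 'io' member. It assumes a little-endian host; the buffer, port and offset in the usage stand in for a real mapping.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define KVM_EXIT_IO_IN  0
#define KVM_EXIT_IO_OUT 1

/* Local copy of the 'io' member of struct kvm_run. */
struct kvm_run_io {
    uint8_t  direction;
    uint8_t  size;        /* bytes per element: 1, 2 or 4 */
    uint16_t port;
    uint32_t count;
    uint64_t data_offset; /* relative to kvm_run start */
};

/* KVM_EXIT_IO_OUT: copy up to 'max' of the values the guest wrote into
 * 'vals'. memcpy into the low bytes of v relies on little-endian order. */
static size_t io_out_values(const uint8_t *run, const struct kvm_run_io *io,
                            uint32_t *vals, size_t max)
{
    const uint8_t *data = run + io->data_offset;
    size_t i;
    for (i = 0; i < io->count && i < max; i++) {
        uint32_t v = 0;
        memcpy(&v, data + i * io->size, io->size);
        vals[i] = v;
    }
    return i;
}

/* KVM_EXIT_IO_IN: place 'count' values for the guest before the next KVM_RUN. */
static void io_in_fill(uint8_t *run, const struct kvm_run_io *io,
                       const uint32_t *vals)
{
    uint8_t *data = run + io->data_offset;
    for (uint32_t i = 0; i < io->count; i++)
        memcpy(data + (size_t)i * io->size, &vals[i], io->size);
}
```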
diff --git a/Documentation/laptops/thinkpad-acpi.txt b/Documentation/laptops/thinkpad-acpi.txt index e2ddcdeb61b6..6d03487ef1c7 100644 --- a/Documentation/laptops/thinkpad-acpi.txt +++ b/Documentation/laptops/thinkpad-acpi.txt | |||
@@ -219,7 +219,7 @@ The following commands can be written to the /proc/acpi/ibm/hotkey file: | |||
219 | echo 0xffffffff > /proc/acpi/ibm/hotkey -- enable all hot keys | 219 | echo 0xffffffff > /proc/acpi/ibm/hotkey -- enable all hot keys |
220 | echo 0 > /proc/acpi/ibm/hotkey -- disable all possible hot keys | 220 | echo 0 > /proc/acpi/ibm/hotkey -- disable all possible hot keys |
221 | ... any other 8-hex-digit mask ... | 221 | ... any other 8-hex-digit mask ... |
222 | echo reset > /proc/acpi/ibm/hotkey -- restore the original mask | 222 | echo reset > /proc/acpi/ibm/hotkey -- restore the recommended mask |
223 | 223 | ||
224 | The following commands have been deprecated and will cause the kernel | 224 | The following commands have been deprecated and will cause the kernel |
225 | to log a warning: | 225 | to log a warning: |
@@ -240,9 +240,13 @@ sysfs notes: | |||
240 | Returns 0. | 240 | Returns 0. |
241 | 241 | ||
242 | hotkey_bios_mask: | 242 | hotkey_bios_mask: |
243 | DEPRECATED, DON'T USE, WILL BE REMOVED IN THE FUTURE. | ||
244 | |||
243 | Returns the hot keys mask when thinkpad-acpi was loaded. | 245 | Returns the hot keys mask when thinkpad-acpi was loaded. |
244 | Upon module unload, the hot keys mask will be restored | 246 | Upon module unload, the hot keys mask will be restored |
245 | to this value. | 247 | to this value. This is always 0x80c, because those are |
248 | the hotkeys that were supported by ancient firmware | ||
249 | without mask support. | ||
246 | 250 | ||
247 | hotkey_enable: | 251 | hotkey_enable: |
248 | DEPRECATED, WILL BE REMOVED SOON. | 252 | DEPRECATED, WILL BE REMOVED SOON. |
diff --git a/Documentation/networking/00-INDEX b/Documentation/networking/00-INDEX index 1634c6dcecae..50189bf07d53 100644 --- a/Documentation/networking/00-INDEX +++ b/Documentation/networking/00-INDEX | |||
@@ -60,6 +60,8 @@ framerelay.txt | |||
60 | - info on using Frame Relay/Data Link Connection Identifier (DLCI). | 60 | - info on using Frame Relay/Data Link Connection Identifier (DLCI). |
61 | generic_netlink.txt | 61 | generic_netlink.txt |
62 | - info on Generic Netlink | 62 | - info on Generic Netlink |
63 | ieee802154.txt | ||
64 | - Linux IEEE 802.15.4 implementation, API and drivers | ||
63 | ip-sysctl.txt | 65 | ip-sysctl.txt |
64 | - /proc/sys/net/ipv4/* variables | 66 | - /proc/sys/net/ipv4/* variables |
65 | ip_dynaddr.txt | 67 | ip_dynaddr.txt |
diff --git a/Documentation/networking/ieee802154.txt b/Documentation/networking/ieee802154.txt index a0280ad2edc9..23c995e64032 100644 --- a/Documentation/networking/ieee802154.txt +++ b/Documentation/networking/ieee802154.txt | |||
@@ -22,7 +22,7 @@ int sd = socket(PF_IEEE802154, SOCK_DGRAM, 0); | |||
22 | ..... | 22 | ..... |
23 | 23 | ||
24 | The address family, socket addresses etc. are defined in the | 24 | The address family, socket addresses etc. are defined in the |
25 | include/net/ieee802154/af_ieee802154.h header or in the special header | 25 | include/net/af_ieee802154.h header or in the special header |
26 | in our userspace package (see either linux-zigbee sourceforge download page | 26 | in our userspace package (see either linux-zigbee sourceforge download page |
27 | or git tree at git://linux-zigbee.git.sourceforge.net/gitroot/linux-zigbee). | 27 | or git tree at git://linux-zigbee.git.sourceforge.net/gitroot/linux-zigbee). |
28 | 28 | ||
@@ -33,7 +33,7 @@ MLME - MAC Level Management | |||
33 | ============================ | 33 | ============================ |
34 | 34 | ||
35 | Most of IEEE 802.15.4 MLME interfaces are directly mapped on netlink commands. | 35 | Most of IEEE 802.15.4 MLME interfaces are directly mapped on netlink commands. |
36 | See the include/net/ieee802154/nl802154.h header. Our userspace tools package | 36 | See the include/net/nl802154.h header. Our userspace tools package |
37 | (see above) provides CLI configuration utility for radio interfaces and simple | 37 | (see above) provides CLI configuration utility for radio interfaces and simple |
38 | coordinator for IEEE 802.15.4 networks as an example users of MLME protocol. | 38 | coordinator for IEEE 802.15.4 networks as an example users of MLME protocol. |
39 | 39 | ||
@@ -54,10 +54,14 @@ Those types of devices require different approach to be hooked into Linux kernel | |||
54 | HardMAC | 54 | HardMAC |
55 | ======= | 55 | ======= |
56 | 56 | ||
57 | See the header include/net/ieee802154/netdevice.h. You have to implement Linux | 57 | See the header include/net/ieee802154_netdev.h. You have to implement Linux |
58 | net_device, with .type = ARPHRD_IEEE802154. Data is exchanged with socket family | 58 | net_device, with .type = ARPHRD_IEEE802154. Data is exchanged with socket family |
59 | code via plain sk_buffs. The control block of sk_buffs will contain additional | 59 | code via plain sk_buffs. On skb reception skb->cb must contain additional |
60 | info as described in the struct ieee802154_mac_cb. | 60 | info as described in the struct ieee802154_mac_cb. During packet transmission |
61 | the skb->cb is used to provide additional data to the device's header_ops->create | ||
62 | function. Be aware that this data can be overridden later (when socket code | ||
63 | submits skb to qdisc), so if you need something from that cb later, you should | ||
64 | store info in the skb->data on your own. | ||
61 | 65 | ||
62 | To hook the MLME interface you have to populate the ml_priv field of your | 66 | To hook the MLME interface you have to populate the ml_priv field of your |
63 | net_device with a pointer to struct ieee802154_mlme_ops instance. All fields are | 67 | net_device with a pointer to struct ieee802154_mlme_ops instance. All fields are |
@@ -69,8 +73,8 @@ We provide an example of simple HardMAC driver at drivers/ieee802154/fakehard.c | |||
69 | SoftMAC | 73 | SoftMAC |
70 | ======= | 74 | ======= |
71 | 75 | ||
72 | We are going to provide intermediate layer impelementing IEEE 802.15.4 MAC | 76 | We are going to provide an intermediate layer implementing IEEE 802.15.4 MAC
73 | in software. This is currently WIP. | 77 | in software. This is currently WIP. |
74 | 78 | ||
75 | See header include/net/ieee802154/mac802154.h and several drivers in | 79 | See header include/net/mac802154.h and several drivers in drivers/ieee802154/. |
76 | drivers/ieee802154/ | 80 | |
diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt index 8be76235fe67..fbe427a6580c 100644 --- a/Documentation/networking/ip-sysctl.txt +++ b/Documentation/networking/ip-sysctl.txt | |||
@@ -311,9 +311,12 @@ tcp_no_metrics_save - BOOLEAN | |||
311 | connections. | 311 | connections. |
312 | 312 | ||
313 | tcp_orphan_retries - INTEGER | 313 | tcp_orphan_retries - INTEGER |
314 | How may times to retry before killing TCP connection, closed | 314 | This value influences the timeout of a locally closed TCP connection, |
315 | by our side. Default value 7 corresponds to ~50sec-16min | 315 | when RTO retransmissions remain unacknowledged. |
316 | depending on RTO. If you machine is loaded WEB server, | 316 | See tcp_retries2 for more details. |
317 | |||
318 | The default value is 7. | ||
319 | If your machine is a loaded WEB server, | ||
317 | you should think about lowering this value, such sockets | 320 | you should think about lowering this value, such sockets |
318 | may consume significant resources. Cf. tcp_max_orphans. | 321 | may consume significant resources. Cf. tcp_max_orphans. |
319 | 322 | ||
@@ -327,16 +330,28 @@ tcp_retrans_collapse - BOOLEAN | |||
327 | certain TCP stacks. | 330 | certain TCP stacks. |
328 | 331 | ||
329 | tcp_retries1 - INTEGER | 332 | tcp_retries1 - INTEGER |
330 | How many times to retry before deciding that something is wrong | 333 | This value influences the time after which TCP decides that
331 | and it is necessary to report this suspicion to network layer. | 334 | something is wrong due to unacknowledged RTO retransmissions, |
332 | Minimal RFC value is 3, it is default, which corresponds | 335 | and reports this suspicion to the network layer. |
333 | to ~3sec-8min depending on RTO. | 336 | See tcp_retries2 for more details. |
337 | |||
338 | RFC 1122 recommends at least 3 retransmissions, which is the | ||
339 | default. | ||
334 | 340 | ||
335 | tcp_retries2 - INTEGER | 341 | tcp_retries2 - INTEGER |
336 | How may times to retry before killing alive TCP connection. | 342 | This value influences the timeout of an alive TCP connection, |
337 | RFC1122 says that the limit should be longer than 100 sec. | 343 | when RTO retransmissions remain unacknowledged. |
338 | It is too small number. Default value 15 corresponds to ~13-30min | 344 | Given a value of N, a hypothetical TCP connection following |
339 | depending on RTO. | 345 | exponential backoff with an initial RTO of TCP_RTO_MIN would |
346 | retransmit N times before killing the connection at the (N+1)th RTO. | ||
347 | |||
348 | The default value of 15 yields a hypothetical timeout of 924.6 | ||
349 | seconds and is a lower bound for the effective timeout. | ||
350 | TCP will effectively time out at the first RTO which exceeds the | ||
351 | hypothetical timeout. | ||
352 | |||
353 | RFC 1122 recommends at least 100 seconds for the timeout, | ||
354 | which corresponds to a value of at least 8. | ||
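The figures above can be reproduced with a short calculation, assuming TCP_RTO_MIN is 200 ms and TCP_RTO_MAX is 120 s (their kernel values). It gives 924.6 s for 15 and 102.2 s for 8, while 7 yields only 51 s.

```c
#include <assert.h>
#include <stdint.h>

#define RTO_MIN_MS 200      /* TCP_RTO_MIN (HZ/5) */
#define RTO_MAX_MS 120000   /* TCP_RTO_MAX (120*HZ) */

/* Hypothetical timeout in ms for a retries value of n: n retransmissions
 * with exponential backoff from TCP_RTO_MIN, capped at TCP_RTO_MAX, and
 * the connection is killed when the (n+1)th RTO expires. The timeout is
 * therefore the sum of the first n+1 RTOs. */
static uint64_t hypothetical_timeout_ms(unsigned n)
{
    uint64_t total = 0, rto = RTO_MIN_MS;
    for (unsigned i = 0; i <= n; i++) {
        total += rto;
        rto = rto * 2 > RTO_MAX_MS ? RTO_MAX_MS : rto * 2;
    }
    return total;
}
```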
340 | 355 | ||
341 | tcp_rfc1337 - BOOLEAN | 356 | tcp_rfc1337 - BOOLEAN |
342 | If set, the TCP stack behaves conforming to RFC1337. If unset, | 357 | If set, the TCP stack behaves conforming to RFC1337. If unset, |
@@ -1282,6 +1297,16 @@ sctp_rmem - vector of 3 INTEGERs: min, default, max | |||
1282 | sctp_wmem - vector of 3 INTEGERs: min, default, max | 1297 | sctp_wmem - vector of 3 INTEGERs: min, default, max |
1283 | See tcp_wmem for a description. | 1298 | See tcp_wmem for a description. |
1284 | 1299 | ||
1300 | addr_scope_policy - INTEGER | ||
1301 | Control IPv4 address scoping - draft-stewart-tsvwg-sctp-ipv4-00 | ||
1302 | |||
1303 | 0 - Disable IPv4 address scoping | ||
1304 | 1 - Enable IPv4 address scoping | ||
1305 | 2 - Follow draft but allow IPv4 private addresses | ||
1306 | 3 - Follow draft but allow IPv4 link local addresses | ||
1307 | |||
1308 | Default: 1 | ||
1309 | |||
1285 | 1310 | ||
1286 | /proc/sys/net/core/* | 1311 | /proc/sys/net/core/* |
1287 | dev_weight - INTEGER | 1312 | dev_weight - INTEGER |
diff --git a/Documentation/power/runtime_pm.txt b/Documentation/power/runtime_pm.txt new file mode 100644 index 000000000000..f49a33b704d2 --- /dev/null +++ b/Documentation/power/runtime_pm.txt | |||
@@ -0,0 +1,378 @@ | |||
1 | Run-time Power Management Framework for I/O Devices | ||
2 | |||
3 | (C) 2009 Rafael J. Wysocki <rjw@sisk.pl>, Novell Inc. | ||
4 | |||
5 | 1. Introduction | ||
6 | |||
7 | Support for run-time power management (run-time PM) of I/O devices is provided | ||
8 | at the power management core (PM core) level by means of: | ||
9 | |||
10 | * The power management workqueue pm_wq in which bus types and device drivers can | ||
11 | put their PM-related work items. It is strongly recommended that pm_wq be | ||
12 | used for queuing all work items related to run-time PM, because this allows | ||
13 | them to be synchronized with system-wide power transitions (suspend to RAM, | ||
14 | hibernation and resume from system sleep states). pm_wq is declared in | ||
15 | include/linux/pm_runtime.h and defined in kernel/power/main.c. | ||
16 | |||
17 | * A number of run-time PM fields in the 'power' member of 'struct device' (which | ||
18 | is of the type 'struct dev_pm_info', defined in include/linux/pm.h) that can | ||
19 | be used for synchronizing run-time PM operations with one another. | ||
20 | |||
21 | * Three device run-time PM callbacks in 'struct dev_pm_ops' (defined in | ||
22 | include/linux/pm.h). | ||
23 | |||
24 | * A set of helper functions defined in drivers/base/power/runtime.c that can be | ||
25 | used for carrying out run-time PM operations in such a way that the | ||
26 | synchronization between them is taken care of by the PM core. Bus types and | ||
27 | device drivers are encouraged to use these functions. | ||
28 | |||
29 | The run-time PM callbacks present in 'struct dev_pm_ops', the device run-time PM | ||
30 | fields of 'struct dev_pm_info' and the core helper functions provided for | ||
31 | run-time PM are described below. | ||
32 | |||
33 | 2. Device Run-time PM Callbacks | ||
34 | |||
35 | There are three device run-time PM callbacks defined in 'struct dev_pm_ops': | ||
36 | |||
37 | struct dev_pm_ops { | ||
38 | ... | ||
39 | int (*runtime_suspend)(struct device *dev); | ||
40 | int (*runtime_resume)(struct device *dev); | ||
41 | void (*runtime_idle)(struct device *dev); | ||
42 | ... | ||
43 | }; | ||
44 | |||
45 | The ->runtime_suspend() callback is executed by the PM core for the bus type of | ||
46 | the device being suspended. The bus type's callback is then _entirely_ | ||
47 | _responsible_ for handling the device as appropriate, which may, but need not, | ||
48 | include executing the device driver's own ->runtime_suspend() callback (from the | ||
49 | PM core's point of view it is not necessary to implement a ->runtime_suspend() | ||
50 | callback in a device driver as long as the bus type's ->runtime_suspend() knows | ||
51 | what to do to handle the device). | ||
52 | |||
53 | * Once the bus type's ->runtime_suspend() callback has completed successfully | ||
54 | for a given device, the PM core regards the device as suspended, which need | ||
55 | not mean that the device has been put into a low power state. It is | ||
56 | supposed to mean, however, that the device will not process data and will | ||
57 | not communicate with the CPU(s) and RAM until its bus type's | ||
58 | ->runtime_resume() callback is executed for it. The run-time PM status of | ||
59 | a device after successful execution of its bus type's ->runtime_suspend() | ||
60 | callback is 'suspended'. | ||
61 | |||
62 | * If the bus type's ->runtime_suspend() callback returns -EBUSY or -EAGAIN, | ||
63 | the device's run-time PM status is supposed to be 'active', which means that | ||
64 | the device _must_ be fully operational afterwards. | ||
65 | |||
66 | * If the bus type's ->runtime_suspend() callback returns an error code | ||
67 | different from -EBUSY or -EAGAIN, the PM core regards this as a fatal | ||
68 | error and will refuse to run the helper functions described in Section 4 | ||
69 | for the device until its status is directly set either to 'active' | ||
70 | or to 'suspended' (the PM core provides special helper functions for this | ||
71 | purpose). | ||
72 | |||
73 | In particular, if the driver requires remote wakeup capability for proper | ||
74 | functioning and device_may_wakeup() returns 'false' for the device, then | ||
75 | ->runtime_suspend() should return -EBUSY. On the other hand, if | ||
76 | device_may_wakeup() returns 'true' for the device and the device is put | ||
77 | into a low power state during the execution of its bus type's | ||
78 | ->runtime_suspend(), it is expected that remote wake-up (i.e. the hardware | ||
79 | mechanism allowing the device to request a change of its power state, such as PCI PME) | ||
80 | will be enabled for the device. Generally, remote wake-up should be enabled | ||
81 | for all input devices put into a low power state at run time. | ||
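The return-value rules for ->runtime_suspend() listed above can be summarised in a small user-space model. This is only an illustrative sketch. The model_* names are hypothetical and do not exist in the kernel. The real PM core also handles locking, parent bookkeeping and deferred resume, and none of that is modelled here.

```c
#include <errno.h>

/* Hypothetical user-space model; these names are not kernel APIs. */
enum model_rpm_status { MODEL_RPM_ACTIVE, MODEL_RPM_SUSPENDED };

struct model_dev {
	enum model_rpm_status runtime_status;
	int runtime_error;	/* nonzero: helper functions refuse to run */
};

/*
 * Apply the rules of Section 2 to 'ret', the value returned by the bus
 * type's ->runtime_suspend() callback for a device that was 'active'.
 */
static void model_handle_suspend_result(struct model_dev *dev, int ret)
{
	if (ret == 0) {
		/* Success: regarded as suspended (not necessarily low power). */
		dev->runtime_status = MODEL_RPM_SUSPENDED;
	} else if (ret == -EBUSY || ret == -EAGAIN) {
		/* Transient failure: the device stays 'active' and usable. */
		dev->runtime_status = MODEL_RPM_ACTIVE;
	} else {
		/* Fatal error: kept until the status is set directly. */
		dev->runtime_error = ret;
	}
}
```

A driver that needs remote wakeup but finds device_may_wakeup() false would return -EBUSY. In this model, that takes the second branch and leaves the device 'active'.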
82 | |||
83 | The ->runtime_resume() callback is executed by the PM core for the bus type of | ||
84 | the device being woken up. The bus type's callback is then _entirely_ | ||
85 | _responsible_ for handling the device as appropriate, which may, but need not, | ||
86 | include executing the device driver's own ->runtime_resume() callback (from the | ||
87 | PM core's point of view it is not necessary to implement a ->runtime_resume() | ||
88 | callback in a device driver as long as the bus type's ->runtime_resume() knows | ||
89 | what to do to handle the device). | ||
90 | |||
91 | * Once the bus type's ->runtime_resume() callback has completed successfully, | ||
92 | the PM core regards the device as fully operational, which means that the | ||
93 | device _must_ be able to complete I/O operations as needed. The run-time | ||
94 | PM status of the device is then 'active'. | ||
95 | |||
96 | * If the bus type's ->runtime_resume() callback returns an error code, the PM | ||
97 | core regards this as a fatal error and will refuse to run the helper | ||
98 | functions described in Section 4 for the device, until its status is | ||
99 | directly set either to 'active' or to 'suspended' (the PM core provides | ||
100 | special helper functions for this purpose). | ||
101 | |||
102 | The ->runtime_idle() callback is executed by the PM core for the bus type of | ||
103 | a given device whenever the device appears to be idle, which is indicated to the | ||
104 | PM core by two counters, the device's usage counter and the counter of 'active' | ||
105 | children of the device. | ||
106 | |||
107 | * If either of these counters is decremented using a helper function provided by | ||
108 | the PM core and it turns out to be equal to zero, the other counter is | ||
109 | checked. If that counter also is equal to zero, the PM core executes the | ||
110 | device bus type's ->runtime_idle() callback (with the device as an | ||
111 | argument). | ||
112 | |||
113 | The action performed by a bus type's ->runtime_idle() callback is totally | ||
114 | dependent on the bus type in question, but the expected and recommended action | ||
115 | is to check if the device can be suspended (i.e. if all of the conditions | ||
116 | necessary for suspending the device are satisfied) and to queue up a suspend | ||
117 | request for the device in that case. | ||
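The idle-notification rule can be illustrated with a minimal user-space sketch. When either counter drops to zero, the other counter is checked, and ->runtime_idle() runs only if both are zero. The idle_model_* names are hypothetical. In the kernel the counters are atomic_t fields of 'struct dev_pm_info', and the PM core invokes the callback itself. Here the model only counts how many times the callback would have been invoked.

```c
/* Hypothetical model of the two counters consulted by the PM core. */
struct idle_model {
	int usage_count;	/* the device's usage counter */
	int child_count;	/* number of 'active' children */
	int idle_calls;		/* times ->runtime_idle() would have run */
};

/* Both counters are zero: the PM core would run ->runtime_idle(). */
static void idle_model_check(struct idle_model *m)
{
	if (m->usage_count == 0 && m->child_count == 0)
		m->idle_calls++;
}

/* Drop a usage reference, as a pm_runtime_put*() helper would. */
static void idle_model_put(struct idle_model *m)
{
	if (--m->usage_count == 0)
		idle_model_check(m);
}

/* One of the device's 'active' children has been suspended. */
static void idle_model_child_suspended(struct idle_model *m)
{
	if (--m->child_count == 0)
		idle_model_check(m);
}
```

Suppose the device starts with one usage reference and one active child. Dropping the reference alone does not trigger the callback. It runs only after the child has also been suspended.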
118 | |||
119 | The helper functions provided by the PM core, described in Section 4, guarantee | ||
120 | that the following constraints are met with respect to the bus type's run-time | ||
121 | PM callbacks: | ||
122 | |||
123 | (1) The callbacks are mutually exclusive (e.g. it is forbidden to execute | ||
124 | ->runtime_suspend() in parallel with ->runtime_resume() or with another | ||
125 | instance of ->runtime_suspend() for the same device) with the exception that | ||
126 | ->runtime_suspend() or ->runtime_resume() can be executed in parallel with | ||
127 | ->runtime_idle() (although ->runtime_idle() will not be started while any | ||
128 | of the other callbacks is being executed for the same device). | ||
129 | |||
130 | (2) ->runtime_idle() and ->runtime_suspend() can only be executed for 'active' | ||
131 | devices (i.e. the PM core will only execute ->runtime_idle() or | ||
132 | ->runtime_suspend() for devices whose run-time PM status is | ||
133 | 'active'). | ||
134 | |||
135 | (3) ->runtime_idle() and ->runtime_suspend() can only be executed for a device | ||
136 | whose usage counter is equal to zero _and_ either whose counter of | ||
137 | 'active' children is equal to zero, or whose 'power.ignore_children' | ||
138 | flag is set. | ||
139 | |||
140 | (4) ->runtime_resume() can only be executed for 'suspended' devices (i.e. the | ||
141 | PM core will only execute ->runtime_resume() for devices whose run-time | ||
142 | PM status is 'suspended'). | ||
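Constraints (2) to (4) above can be written as two predicates. The following is a hypothetical user-space sketch: model_pm_info is not the kernel's 'struct dev_pm_info'. Constraint (1), mutual exclusion, requires the PM core's locking and is not modelled.

```c
#include <stdbool.h>

/* Hypothetical model; these names do not exist in the kernel. */
enum model_rpm_status { MODEL_RPM_ACTIVE, MODEL_RPM_SUSPENDED };

struct model_pm_info {
	enum model_rpm_status runtime_status;
	int usage_count;
	int child_count;	/* 'active' children */
	bool ignore_children;
};

/* Constraints (2) and (3): may ->runtime_idle()/->runtime_suspend() run? */
static bool model_may_idle_or_suspend(const struct model_pm_info *p)
{
	if (p->runtime_status != MODEL_RPM_ACTIVE)
		return false;
	if (p->usage_count != 0)
		return false;
	return p->child_count == 0 || p->ignore_children;
}

/* Constraint (4): may ->runtime_resume() run? */
static bool model_may_resume(const struct model_pm_info *p)
{
	return p->runtime_status == MODEL_RPM_SUSPENDED;
}
```

Setting ignore_children lets an 'active' parent with active children pass constraint (3). A nonzero usage counter still blocks it.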
143 | |||
144 | Additionally, the helper functions provided by the PM core obey the following | ||
145 | rules: | ||
146 | |||
147 | * If ->runtime_suspend() is about to be executed or there's a pending request | ||
148 | to execute it, ->runtime_idle() will not be executed for the same device. | ||
149 | |||
150 | * A request to execute or to schedule the execution of ->runtime_suspend() | ||
151 | will cancel any pending requests to execute ->runtime_idle() for the same | ||
152 | device. | ||
153 | |||
154 | * If ->runtime_resume() is about to be executed or there's a pending request | ||
155 | to execute it, the other callbacks will not be executed for the same device. | ||
156 | |||
157 | * A request to execute ->runtime_resume() will cancel any pending or | ||
158 | scheduled requests to execute the other callbacks for the same device. | ||
159 | |||
160 | 3. Run-time PM Device Fields | ||
161 | |||
162 | The following device run-time PM fields are present in 'struct dev_pm_info', as | ||
163 | defined in include/linux/pm.h: | ||
164 | |||
165 | struct timer_list suspend_timer; | ||
166 | - timer used for scheduling (delayed) suspend request | ||
167 | |||
168 | unsigned long timer_expires; | ||
169 | - timer expiration time, in jiffies (if this is different from zero, the | ||
170 | timer is running and will expire at that time, otherwise the timer is not | ||
171 | running) | ||
172 | |||
173 | struct work_struct work; | ||
174 | - work structure used for queuing up requests (i.e. work items in pm_wq) | ||
175 | |||
176 | wait_queue_head_t wait_queue; | ||
177 | - wait queue used if any of the helper functions needs to wait for another | ||
178 | one to complete | ||
179 | |||
180 | spinlock_t lock; | ||
181 | - lock used for synchronisation | ||
182 | |||
183 | atomic_t usage_count; | ||
184 | - the usage counter of the device | ||
185 | |||
186 | atomic_t child_count; | ||
187 | - the count of 'active' children of the device | ||
188 | |||
189 | unsigned int ignore_children; | ||
190 | - if set, the value of child_count is ignored (but still updated) | ||
191 | |||
192 | unsigned int disable_depth; | ||
193 | - used for disabling the helper functions (they work normally if this is | ||
194 | equal to zero); the initial value of it is 1 (i.e. run-time PM is | ||
195 | initially disabled for all devices) | ||
196 | |||
197 | unsigned int runtime_error; | ||
198 | - if set, there was a fatal error (one of the callbacks returned an error code | ||
199 | as described in Section 2), so the helper functions will not work until | ||
200 | this flag is cleared; this is the error code returned by the failing | ||
201 | callback | ||
202 | |||
203 | unsigned int idle_notification; | ||
204 | - if set, ->runtime_idle() is being executed | ||
205 | |||
206 | unsigned int request_pending; | ||
207 | - if set, there's a pending request (i.e. a work item queued up into pm_wq) | ||
208 | |||
209 | enum rpm_request request; | ||
210 | - type of request that's pending (valid if request_pending is set) | ||
211 | |||
212 | unsigned int deferred_resume; | ||
213 | - set if ->runtime_resume() is about to be run while ->runtime_suspend() is | ||
214 | being executed for that device and it is not practical to wait for the | ||
215 | suspend to complete; means "start a resume as soon as you've suspended" | ||
216 | |||
217 | enum rpm_status runtime_status; | ||
218 | - the run-time PM status of the device; this field's initial value is | ||
219 | RPM_SUSPENDED, which means that each device is initially regarded by the | ||
220 | PM core as 'suspended', regardless of its real hardware status | ||
221 | |||
222 | All of the above fields are members of the 'power' member of 'struct device'. | ||
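The 'disable_depth' field behaves like a nesting counter. The following user-space sketch shows the counting. It assumes, consistent with the description above, that run-time PM starts disabled with a depth of 1 and that the helpers work only at depth zero. It also assumes that each pm_runtime_disable() adds one level and each pm_runtime_enable() removes one. The depth_model_* names are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical model of 'power.disable_depth'; not kernel code. */
struct depth_model {
	unsigned int disable_depth;
};

/* Initial state: run-time PM disabled for the device. */
static void depth_model_init(struct depth_model *d)
{
	d->disable_depth = 1;
}

/* Assumed behaviour of pm_runtime_disable(): one level deeper. */
static void depth_model_disable(struct depth_model *d)
{
	d->disable_depth++;
}

/* Assumed behaviour of pm_runtime_enable(): one level back, not below 0. */
static void depth_model_enable(struct depth_model *d)
{
	if (d->disable_depth > 0)
		d->disable_depth--;
}

/* The helper functions work normally only when the depth is zero. */
static bool depth_model_helpers_enabled(const struct depth_model *d)
{
	return d->disable_depth == 0;
}
```

With this counting scheme, every disable call must be balanced by an enable call before the helpers work again.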
223 | |||
224 | 4. Run-time PM Device Helper Functions | ||
225 | |||
226 | The following run-time PM helper functions are defined in | ||
227 | drivers/base/power/runtime.c and include/linux/pm_runtime.h: | ||
228 | |||
229 | void pm_runtime_init(struct device *dev); | ||
230 | - initialize the device run-time PM fields in 'struct dev_pm_info' | ||
231 | |||
232 | void pm_runtime_remove(struct device *dev); | ||
233 | - make sure that the run-time PM of the device will be disabled after | ||
234 | removing the device from the device hierarchy | ||
235 | |||
236 | int pm_runtime_idle(struct device *dev); | ||
237 | - execute ->runtime_idle() for the device's bus type; returns 0 on success | ||
238 | or error code on failure, where -EINPROGRESS means that ->runtime_idle() | ||
239 | is already being executed | ||
240 | |||
241 | int pm_runtime_suspend(struct device *dev); | ||
242 | - execute ->runtime_suspend() for the device's bus type; returns 0 on | ||
243 | success, 1 if the device's run-time PM status was already 'suspended', or | ||
244 | error code on failure, where -EAGAIN or -EBUSY means it is safe to attempt | ||
245 | to suspend the device again in future | ||
246 | |||
247 | int pm_runtime_resume(struct device *dev); | ||
248 | - execute ->runtime_resume() for the device's bus type; returns 0 on | ||
249 | success, 1 if the device's run-time PM status was already 'active' or | ||
250 | error code on failure, where -EAGAIN means it may be safe to attempt to | ||
251 | resume the device again in future, but 'power.runtime_error' should be | ||
252 | checked additionally | ||
253 | |||
254 | int pm_request_idle(struct device *dev); | ||
255 | - submit a request to execute ->runtime_idle() for the device's bus type | ||
256 | (the request is represented by a work item in pm_wq); returns 0 on success | ||
257 | or error code if the request has not been queued up | ||
258 | |||
259 | int pm_schedule_suspend(struct device *dev, unsigned int delay); | ||
260 | - schedule the execution of ->runtime_suspend() for the device's bus type | ||
261 | in future, where 'delay' is the time to wait before queuing up a suspend | ||
262 | work item in pm_wq, in milliseconds (if 'delay' is zero, the work item is | ||
263 | queued up immediately); returns 0 on success, 1 if the device's run-time | ||
264 | PM status was already 'suspended', or error code if the request | ||
265 | hasn't been scheduled (or queued up if 'delay' is 0); if the execution of | ||
266 | ->runtime_suspend() is already scheduled and not yet expired, the new | ||
267 | value of 'delay' will be used as the time to wait | ||
268 | |||
269 | int pm_request_resume(struct device *dev); | ||
270 | - submit a request to execute ->runtime_resume() for the device's bus type | ||
271 | (the request is represented by a work item in pm_wq); returns 0 on | ||
272 | success, 1 if the device's run-time PM status was already 'active', or | ||
273 | error code if the request hasn't been queued up | ||
274 | |||
275 | void pm_runtime_get_noresume(struct device *dev); | ||
276 | - increment the device's usage counter | ||
277 | |||
278 | int pm_runtime_get(struct device *dev); | ||
279 | - increment the device's usage counter, run pm_request_resume(dev) and | ||
280 | return its result | ||
281 | |||
282 | int pm_runtime_get_sync(struct device *dev); | ||
283 | - increment the device's usage counter, run pm_runtime_resume(dev) and | ||
284 | return its result | ||
285 | |||
286 | void pm_runtime_put_noidle(struct device *dev); | ||
287 | - decrement the device's usage counter | ||
288 | |||
289 | int pm_runtime_put(struct device *dev); | ||
290 | - decrement the device's usage counter, run pm_request_idle(dev) and return | ||
291 | its result | ||
292 | |||
293 | int pm_runtime_put_sync(struct device *dev); | ||
294 | - decrement the device's usage counter, run pm_runtime_idle(dev) and return | ||
295 | its result | ||
296 | |||
297 | void pm_runtime_enable(struct device *dev); | ||
298 | - enable the run-time PM helper functions to run the device bus type's | ||
299 | run-time PM callbacks described in Section 2 | ||
300 | |||
301 | int pm_runtime_disable(struct device *dev); | ||
302 | - prevent the run-time PM helper functions from running the device bus | ||
303 | type's run-time PM callbacks, make sure that all of the pending run-time | ||
304 | PM operations on the device are either completed or canceled; returns | ||
305 | 1 if there was a resume request pending and it was necessary to execute | ||
306 | ->runtime_resume() for the device's bus type to satisfy that request, | ||
307 | otherwise 0 is returned | ||
308 | |||
309 | void pm_suspend_ignore_children(struct device *dev, bool enable); | ||
310 | - set/unset the power.ignore_children flag of the device | ||
311 | |||
312 | int pm_runtime_set_active(struct device *dev); | ||
313 | - clear the device's 'power.runtime_error' flag, set the device's run-time | ||
314 | PM status to 'active' and update its parent's counter of 'active' | ||
315 | children as appropriate (it is only valid to use this function if | ||
316 | 'power.runtime_error' is set or 'power.disable_depth' is greater than | ||
317 | zero); it will fail and return error code if the device has a parent | ||
318 | that is not active and whose 'power.ignore_children' flag is unset | ||
319 | |||
320 | void pm_runtime_set_suspended(struct device *dev); | ||
321 | - clear the device's 'power.runtime_error' flag, set the device's run-time | ||
322 | PM status to 'suspended' and update its parent's counter of 'active' | ||
323 | children as appropriate (it is only valid to use this function if | ||
324 | 'power.runtime_error' is set or 'power.disable_depth' is greater than | ||
325 | zero) | ||
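The get/put helpers are meant to bracket device access. The following user-space sketch models the pm_runtime_get_sync()/pm_runtime_put_sync() pairing around an I/O path. The model_* names are hypothetical. For simplicity, the model's idle step suspends the device at once when the usage counter reaches zero. A real bus type would typically queue a suspend request instead, and children are ignored here.

```c
/* Hypothetical user-space model of the get/put helpers; not kernel code. */
enum model_rpm_status { MODEL_RPM_ACTIVE, MODEL_RPM_SUSPENDED };

struct model_dev {
	enum model_rpm_status status;
	int usage_count;
	int resumes;	/* times ->runtime_resume() ran */
	int suspends;	/* times ->runtime_suspend() ran */
};

/* Like pm_runtime_get_sync(): take a reference, then resume if needed.
 * Returns 0 after a resume, 1 if the device was already active. */
static int model_get_sync(struct model_dev *dev)
{
	dev->usage_count++;
	if (dev->status == MODEL_RPM_ACTIVE)
		return 1;
	dev->resumes++;
	dev->status = MODEL_RPM_ACTIVE;
	return 0;
}

/* Like pm_runtime_put_sync(): drop the reference; at zero the model's
 * idle step suspends the device immediately (a simplification). */
static void model_put_sync(struct model_dev *dev)
{
	if (--dev->usage_count == 0 && dev->status == MODEL_RPM_ACTIVE) {
		dev->suspends++;
		dev->status = MODEL_RPM_SUSPENDED;
	}
}

/* A driver I/O path brackets hardware access with get/put. */
static int model_do_io(struct model_dev *dev)
{
	int ret = model_get_sync(dev);

	/* ... hardware access would happen here ... */
	model_put_sync(dev);
	return ret;
}
```

Nested references keep the device active until the last put. In driver code, the same bracket would use the pm_runtime_get_sync() and pm_runtime_put() or pm_runtime_put_sync() helpers described above.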
326 | |||
327 | It is safe to execute the following helper functions from interrupt context: | ||
328 | |||
329 | pm_request_idle() | ||
330 | pm_schedule_suspend() | ||
331 | pm_request_resume() | ||
332 | pm_runtime_get_noresume() | ||
333 | pm_runtime_get() | ||
334 | pm_runtime_put_noidle() | ||
335 | pm_runtime_put() | ||
336 | pm_suspend_ignore_children() | ||
337 | pm_runtime_set_active() | ||
338 | pm_runtime_set_suspended() | ||
339 | pm_runtime_enable() | ||
340 | |||
341 | 5. Run-time PM Initialization, Device Probing and Removal | ||
342 | |||
343 | Initially, the run-time PM is disabled for all devices, which means that the | ||
344 | majority of the run-time PM helper functions described in Section 4 will return | ||
345 | -EAGAIN until pm_runtime_enable() is called for the device. | ||
346 | |||
347 | In addition to that, the initial run-time PM status of all devices is | ||
348 | 'suspended', but it need not reflect the actual physical state of the device. | ||
349 | Thus, if the device is initially active (i.e. it is able to process I/O), its | ||
350 | run-time PM status must be changed to 'active', with the help of | ||
351 | pm_runtime_set_active(), before pm_runtime_enable() is called for the device. | ||
352 | |||
353 | However, if the device has a parent and the parent's run-time PM is enabled, | ||
354 | calling pm_runtime_set_active() for the device will affect the parent, unless | ||
355 | the parent's 'power.ignore_children' flag is set. Namely, in that case the | ||
356 | parent won't be able to suspend at run time, using the PM core's helper | ||
357 | functions, as long as the child's status is 'active', even if the child's | ||
358 | run-time PM is still disabled (i.e. pm_runtime_enable() hasn't been called for | ||
359 | the child yet or pm_runtime_disable() has been called for it). For this reason, | ||
360 | once pm_runtime_set_active() has been called for the device, pm_runtime_enable() | ||
361 | should be called for it too as soon as reasonably possible or its run-time PM | ||
362 | status should be changed back to 'suspended' with the help of | ||
363 | pm_runtime_set_suspended(). | ||
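The effect of pm_runtime_set_active() on the parent can be seen in a small parent/child model. This is a hypothetical user-space sketch (pm_node and node_* are invented names). The -EBUSY value is an assumption of the model; the text above only says "error code". The model omits the rule that the helper is valid only while the child's own run-time PM is disabled or in error. The child counter is updated even when 'ignore_children' is set, matching the field description in Section 3.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of parent/child bookkeeping; not kernel code. */
struct pm_node {
	struct pm_node *parent;
	bool active;			/* run-time PM status is 'active' */
	int child_count;		/* 'active' children */
	bool ignore_children;
	unsigned int disable_depth;	/* 0 means run-time PM enabled */
};

/* Sketch of pm_runtime_set_active(): refuse if the parent's run-time PM
 * is enabled, the parent is not active and it does not ignore children. */
static int node_set_active(struct pm_node *n)
{
	struct pm_node *p = n->parent;

	if (p && p->disable_depth == 0 && !p->active && !p->ignore_children)
		return -EBUSY;	/* error value assumed for this model */
	if (!n->active && p)
		p->child_count++;	/* updated even if ignore_children */
	n->active = true;
	return 0;
}

/* Sketch of pm_runtime_set_suspended(). */
static void node_set_suspended(struct pm_node *n)
{
	if (n->active && n->parent)
		n->parent->child_count--;
	n->active = false;
}

/* The child-related part of constraint (3), seen from the parent. */
static bool node_parent_may_suspend(const struct pm_node *p)
{
	return p->child_count == 0 || p->ignore_children;
}
```

While the child is 'active', the parent cannot pass the check. This holds even though the child's run-time PM (disable_depth 1) is still disabled, which is why pm_runtime_enable() or pm_runtime_set_suspended() should follow promptly.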
364 | |||
365 | If the default initial run-time PM status of the device (i.e. 'suspended') | ||
366 | reflects the actual state of the device, its bus type's or its driver's | ||
367 | ->probe() callback will likely need to wake it up using one of the PM core's | ||
368 | helper functions described in Section 4. In that case, pm_runtime_resume() | ||
369 | should be used. Of course, for this purpose the device's run-time PM has to be | ||
370 | enabled earlier by calling pm_runtime_enable(). | ||
371 | |||
372 | If the device bus type's or driver's ->probe() or ->remove() callback runs | ||
373 | pm_runtime_suspend() or pm_runtime_idle() or their asynchronous counterparts, | ||
374 | they will fail, returning -EAGAIN, because the device's usage counter is | ||
375 | incremented by the core before executing ->probe() and ->remove(). Still, it | ||
376 | may be desirable to suspend the device as soon as ->probe() or ->remove() has | ||
377 | finished, so the PM core uses pm_runtime_put_sync() to invoke the device bus | ||
378 | type's ->runtime_idle() callback at that time. | ||
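The probe-time behaviour described above is sketched below as a hypothetical user-space model (probe_model_* are invented names). The core holds a usage reference across ->probe(), so a suspend attempt inside the driver's probe fails with -EAGAIN. Once the reference is dropped, the device can be suspended. As a simplification, the model suspends immediately at that point instead of going through ->runtime_idle().

```c
#include <errno.h>

/* Hypothetical model; not kernel code. */
struct probe_model {
	int usage_count;
	int active;		/* nonzero: run-time PM status 'active' */
	int probe_suspend_ret;	/* what a suspend attempt in ->probe() got */
};

/* A suspend attempt fails with -EAGAIN while a usage reference is held. */
static int probe_model_suspend(struct probe_model *m)
{
	if (m->usage_count > 0)
		return -EAGAIN;
	m->active = 0;
	return 0;
}

/* Driver ->probe() that tries to suspend the device. */
static void probe_model_driver_probe(struct probe_model *m)
{
	m->probe_suspend_ret = probe_model_suspend(m);
}

/* Core side: hold a reference across ->probe(), then drop it; at zero the
 * model suspends the device (standing in for the idle path). */
static void probe_model_core_probe(struct probe_model *m)
{
	m->usage_count++;
	probe_model_driver_probe(m);
	if (--m->usage_count == 0)
		(void)probe_model_suspend(m);
}
```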
diff --git a/Documentation/s390/s390dbf.txt b/Documentation/s390/s390dbf.txt index 2d10053dd97e..ae66f9b90a25 100644 --- a/Documentation/s390/s390dbf.txt +++ b/Documentation/s390/s390dbf.txt | |||
@@ -495,6 +495,13 @@ and for each vararg a long value. So e.g. for a debug entry with a format | |||
495 | string plus two varargs one would need to allocate a (3 * sizeof(long)) | 495 | string plus two varargs one would need to allocate a (3 * sizeof(long)) |
496 | byte data area in the debug_register() function. | 496 | byte data area in the debug_register() function. |
497 | 497 | ||
498 | IMPORTANT: Using "%s" in sprintf event functions is dangerous. You can only | ||
499 | use "%s" in the sprintf event functions if the memory for the passed string | ||
500 | remains valid for as long as the debug feature exists. The reason is that, | ||
501 | for performance reasons, only a pointer to the string is stored in | ||
502 | the debug feature. If you log a string that is freed afterwards, you will get | ||
503 | an OOPS when inspecting the debug feature, because the debug feature will then | ||
504 | access the already freed memory. | ||
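The pointer-only storage described above can be demonstrated with a user-space model. This is a hypothetical sketch, not the s390 debug feature API (model_dbf_* are invented names). The entry keeps the "%s" argument as a pointer and formats it only when viewed. A change to the buffer after logging therefore shows up in the view. If the buffer had been freed instead, the view would read freed memory.

```c
#include <stdio.h>

/* Hypothetical model of an sprintf-view entry: pointers only, formatted
 * at view time. Not the s390 debug feature API. */
struct model_dbf_entry {
	const char *fmt;
	const char *str;	/* the "%s" argument: pointer only, no copy */
};

static void model_dbf_log(struct model_dbf_entry *e, const char *fmt,
			  const char *str)
{
	e->fmt = fmt;
	e->str = str;
}

/* Formatting happens now, so it reads whatever 'str' currently points to. */
static int model_dbf_view(const struct model_dbf_entry *e, char *out,
			  size_t len)
{
	return snprintf(out, len, e->fmt, e->str);
}
```

Logging a string literal, or any string whose storage outlives the debug feature, avoids the problem.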
498 | 505 | ||
499 | NOTE: If using the sprintf view do NOT use other event/exception functions | 506 | NOTE: If using the sprintf view do NOT use other event/exception functions |
500 | than the sprintf-event and -exception functions. | 507 | than the sprintf-event and -exception functions. |
diff --git a/Documentation/sound/alsa/ALSA-Configuration.txt b/Documentation/sound/alsa/ALSA-Configuration.txt index 4252697a95d6..1c8eb4518ce0 100644 --- a/Documentation/sound/alsa/ALSA-Configuration.txt +++ b/Documentation/sound/alsa/ALSA-Configuration.txt | |||
@@ -60,6 +60,12 @@ Prior to version 0.9.0rc4 options had a 'snd_' prefix. This was removed. | |||
60 | slots - Reserve the slot index for the given driver. | 60 | slots - Reserve the slot index for the given driver. |
61 | This option takes multiple strings. | 61 | This option takes multiple strings. |
62 | See "Module Autoloading Support" section for details. | 62 | See "Module Autoloading Support" section for details. |
63 | debug - Specifies the debug message level | ||
64 | (0 = disable debug prints, 1 = normal debug messages, | ||
65 | 2 = verbose debug messages) | ||
66 | This option appears only when CONFIG_SND_DEBUG=y. | ||
67 | This option can be dynamically changed via sysfs | ||
68 | /sys/module/snd/parameters/debug file. | ||
63 | 69 | ||
64 | Module snd-pcm-oss | 70 | Module snd-pcm-oss |
65 | ------------------ | 71 | ------------------ |
@@ -513,6 +519,26 @@ Prior to version 0.9.0rc4 options had a 'snd_' prefix. This was removed. | |||
513 | or input, but you may use this module for any application which | 519 | or input, but you may use this module for any application which |
514 | requires a sound card (like RealPlayer). | 520 | requires a sound card (like RealPlayer). |
515 | 521 | ||
522 | pcm_devs - Number of PCM devices assigned to each card | ||
523 | (default = 1, up to 4) | ||
524 | pcm_substreams - Number of PCM substreams assigned to each PCM | ||
525 | (default = 8, up to 16) | ||
526 | hrtimer - Use hrtimer (=1, default) or system timer (=0) | ||
527 | fake_buffer - Fake buffer allocations (default = 1) | ||
528 | |||
529 | When multiple PCM devices are created, snd-dummy gives different | ||
530 | behavior to each PCM device: | ||
531 | 0 = interleaved with mmap support | ||
532 | 1 = non-interleaved with mmap support | ||
533 | 2 = interleaved without mmap | ||
534 | 3 = non-interleaved without mmap | ||
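The per-device behaviour table follows a regular pattern: even indices are interleaved, and indices 0 and 1 support mmap. The following hypothetical helper (not part of snd-dummy) decodes an index in the range 0 to 3.

```c
#include <stdbool.h>

/* Hypothetical helper; snd-dummy does not export this. Valid for 0..3. */
struct dummy_pcm_mode {
	bool interleaved;
	bool mmap;
};

static struct dummy_pcm_mode dummy_pcm_mode_for(int pcm_index)
{
	struct dummy_pcm_mode m;

	m.interleaved = (pcm_index % 2) == 0;	/* 0 and 2 are interleaved */
	m.mmap = pcm_index < 2;			/* 0 and 1 support mmap */
	return m;
}
```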
535 | |||
536 | By default, the snd-dummy driver doesn't allocate real buffers | ||
537 | but either ignores read/write or mmaps a single dummy page to all | ||
538 | buffer pages, in order to save resources. If your applications need | ||
539 | the read/written buffer data to be consistent, pass the fake_buffer=0 | ||
540 | option. | ||
541 | |||
516 | The power-management is supported. | 542 | The power-management is supported. |
517 | 543 | ||
518 | Module snd-echo3g | 544 | Module snd-echo3g |
@@ -768,6 +794,10 @@ Prior to version 0.9.0rc4 options had a 'snd_' prefix. This was removed. | |||
768 | bdl_pos_adj - Specifies the DMA IRQ timing delay in samples. | 794 | bdl_pos_adj - Specifies the DMA IRQ timing delay in samples. |
769 | Passing -1 will make the driver to choose the appropriate | 795 | Passing -1 will make the driver to choose the appropriate |
770 | value based on the controller chip. | 796 | value based on the controller chip. |
797 | patch - Specifies the early "patch" files to modify the HD-audio | ||
798 | setup before initializing the codecs. This option is | ||
799 | available only when CONFIG_SND_HDA_PATCH_LOADER=y is set. | ||
800 | See HD-Audio.txt for details. | ||
771 | 801 | ||
772 | [Single (global) options] | 802 | [Single (global) options] |
773 | single_cmd - Use single immediate commands to communicate with | 803 | single_cmd - Use single immediate commands to communicate with |
diff --git a/Documentation/sound/alsa/HD-Audio-Models.txt b/Documentation/sound/alsa/HD-Audio-Models.txt index 939a3dd58148..97eebd63bedc 100644 --- a/Documentation/sound/alsa/HD-Audio-Models.txt +++ b/Documentation/sound/alsa/HD-Audio-Models.txt | |||
@@ -114,8 +114,8 @@ ALC662/663/272 | |||
114 | samsung-nc10 Samsung NC10 mini notebook | 114 | samsung-nc10 Samsung NC10 mini notebook |
115 | auto auto-config reading BIOS (default) | 115 | auto auto-config reading BIOS (default) |
116 | 116 | ||
117 | ALC882/885 | 117 | ALC882/883/885/888/889 |
118 | ========== | 118 | ====================== |
119 | 3stack-dig 3-jack with SPDIF I/O | 119 | 3stack-dig 3-jack with SPDIF I/O |
120 | 6stack-dig 6-jack digital with SPDIF I/O | 120 | 6stack-dig 6-jack digital with SPDIF I/O |
121 | arima Arima W820Di1 | 121 | arima Arima W820Di1 |
@@ -127,12 +127,8 @@ ALC882/885 | |||
127 | mbp3 Macbook Pro rev3 | 127 | mbp3 Macbook Pro rev3 |
128 | imac24 iMac 24'' with jack detection | 128 | imac24 iMac 24'' with jack detection |
129 | w2jc ASUS W2JC | 129 | w2jc ASUS W2JC |
130 | auto auto-config reading BIOS (default) | 130 | 3stack-2ch-dig 3-jack with SPDIF I/O (ALC883) |
131 | 131 | alc883-6stack-dig 6-jack digital with SPDIF I/O (ALC883) | |
132 | ALC883/888 | ||
133 | ========== | ||
134 | 3stack-dig 3-jack with SPDIF I/O | ||
135 | 6stack-dig 6-jack digital with SPDIF I/O | ||
136 | 3stack-6ch 3-jack 6-channel | 132 | 3stack-6ch 3-jack 6-channel |
137 | 3stack-6ch-dig 3-jack 6-channel with SPDIF I/O | 133 | 3stack-6ch-dig 3-jack 6-channel with SPDIF I/O |
138 | 6stack-dig-demo 6-jack digital for Intel demo board | 134 | 6stack-dig-demo 6-jack digital for Intel demo board |
@@ -140,6 +136,7 @@ ALC883/888 | |||
140 | acer-aspire Acer Aspire 9810 | 136 | acer-aspire Acer Aspire 9810 |
141 | acer-aspire-4930g Acer Aspire 4930G | 137 | acer-aspire-4930g Acer Aspire 4930G |
142 | acer-aspire-6530g Acer Aspire 6530G | 138 | acer-aspire-6530g Acer Aspire 6530G |
139 | acer-aspire-7730g Acer Aspire 7730G | ||
143 | acer-aspire-8930g Acer Aspire 8930G | 140 | acer-aspire-8930g Acer Aspire 8930G |
144 | medion Medion Laptops | 141 | medion Medion Laptops |
145 | medion-md2 Medion MD2 | 142 | medion-md2 Medion MD2 |
@@ -155,10 +152,13 @@ ALC883/888 | |||
155 | 3stack-hp HP machines with 3stack (Lucknow, Samba boards) | 152 | 3stack-hp HP machines with 3stack (Lucknow, Samba boards) |
156 | 6stack-dell Dell machines with 6stack (Inspiron 530) | 153 | 6stack-dell Dell machines with 6stack (Inspiron 530) |
157 | mitac Mitac 8252D | 154 | mitac Mitac 8252D |
155 | clevo-m540r Clevo M540R (6ch + digital) | ||
158 | clevo-m720 Clevo M720 laptop series | 156 | clevo-m720 Clevo M720 laptop series |
159 | fujitsu-pi2515 Fujitsu AMILO Pi2515 | 157 | fujitsu-pi2515 Fujitsu AMILO Pi2515 |
160 | fujitsu-xa3530 Fujitsu AMILO XA3530 | 158 | fujitsu-xa3530 Fujitsu AMILO XA3530 |
161 | 3stack-6ch-intel Intel DG33* boards | 159 | 3stack-6ch-intel Intel DG33* boards |
160 | intel-alc889a Intel IbexPeak with ALC889A | ||
161 | intel-x58 Intel DX58 with ALC889 | ||
162 | asus-p5q ASUS P5Q-EM boards | 162 | asus-p5q ASUS P5Q-EM boards |
163 | mb31 MacBook 3,1 | 163 | mb31 MacBook 3,1 |
164 | sony-vaio-tt Sony VAIO TT | 164 | sony-vaio-tt Sony VAIO TT |
@@ -229,7 +229,7 @@ AD1984 | |||
229 | ====== | 229 | ====== |
230 | basic default configuration | 230 | basic default configuration |
231 | thinkpad Lenovo Thinkpad T61/X61 | 231 | thinkpad Lenovo Thinkpad T61/X61 |
232 | dell Dell T3400 | 232 | dell_desktop Dell T3400 |
233 | 233 | ||
234 | AD1986A | 234 | AD1986A |
235 | ======= | 235 | ======= |
@@ -258,6 +258,7 @@ Conexant 5045 | |||
258 | laptop-micsense Laptop with Mic sense (old model fujitsu) | 258 | laptop-micsense Laptop with Mic sense (old model fujitsu) |
259 | laptop-hpmicsense Laptop with HP and Mic senses | 259 | laptop-hpmicsense Laptop with HP and Mic senses |
260 | benq Benq R55E | 260 | benq Benq R55E |
261 | laptop-hp530 HP 530 laptop | ||
261 | test for testing/debugging purpose, almost all controls | 262 | test for testing/debugging purpose, almost all controls |
262 | can be adjusted. Appearing only when compiled with | 263 | can be adjusted. Appearing only when compiled with |
263 | $CONFIG_SND_DEBUG=y | 264 | $CONFIG_SND_DEBUG=y |
@@ -278,9 +279,16 @@ Conexant 5051 | |||
278 | hp-dv6736 HP dv6736 | 279 | hp-dv6736 HP dv6736 |
279 | lenovo-x200 Lenovo X200 laptop | 280 | lenovo-x200 Lenovo X200 laptop |
280 | 281 | ||
282 | Conexant 5066 | ||
283 | ============= | ||
284 | laptop Basic Laptop config (default) | ||
285 | dell-laptop Dell laptops | ||
286 | olpc-xo-1_5 OLPC XO 1.5 | ||
287 | |||
281 | STAC9200 | 288 | STAC9200 |
282 | ======== | 289 | ======== |
283 | ref Reference board | 290 | ref Reference board |
291 | oqo OQO Model 2 | ||
284 | dell-d21 Dell (unknown) | 292 | dell-d21 Dell (unknown) |
285 | dell-d22 Dell (unknown) | 293 | dell-d22 Dell (unknown) |
286 | dell-d23 Dell (unknown) | 294 | dell-d23 Dell (unknown) |
@@ -368,10 +376,12 @@ STAC92HD73* | |||
368 | =========== | 376 | =========== |
369 | ref Reference board | 377 | ref Reference board |
370 | no-jd BIOS setup but without jack-detection | 378 | no-jd BIOS setup but without jack-detection |
379 | intel Intel DG45* mobos | ||
371 | dell-m6-amic Dell desktops/laptops with analog mics | 380 | dell-m6-amic Dell desktops/laptops with analog mics |
372 | dell-m6-dmic Dell desktops/laptops with digital mics | 381 | dell-m6-dmic Dell desktops/laptops with digital mics |
373 | dell-m6 Dell desktops/laptops with both type of mics | 382 | dell-m6 Dell desktops/laptops with both type of mics |
374 | dell-eq Dell desktops/laptops | 383 | dell-eq Dell desktops/laptops |
384 | alienware Alienware M17x | ||
375 | auto BIOS setup (default) | 385 | auto BIOS setup (default) |
376 | 386 | ||
377 | STAC92HD83* | 387 | STAC92HD83* |
@@ -385,3 +395,8 @@ STAC9872 | |||
385 | ======== | 395 | ======== |
386 | vaio VAIO laptop without SPDIF | 396 | vaio VAIO laptop without SPDIF |
387 | auto BIOS setup (default) | 397 | auto BIOS setup (default) |
398 | |||
399 | Cirrus Logic CS4206/4207 | ||
400 | ======================== | ||
401 | mbp55 MacBook Pro 5,5 | ||
402 | auto BIOS setup (default) | ||
diff --git a/Documentation/sound/alsa/HD-Audio.txt b/Documentation/sound/alsa/HD-Audio.txt index 71ac995b1915..7b8a5f947d1d 100644 --- a/Documentation/sound/alsa/HD-Audio.txt +++ b/Documentation/sound/alsa/HD-Audio.txt | |||
@@ -139,6 +139,10 @@ The driver checks PCI SSID and looks through the static configuration | |||
139 | table until any matching entry is found. If you have a new machine, | 139 | table until any matching entry is found. If you have a new machine, |
140 | you may see a message like below: | 140 | you may see a message like below: |
141 | ------------------------------------------------------------------------ | 141 | ------------------------------------------------------------------------ |
142 | hda_codec: ALC880: BIOS auto-probing. | ||
143 | ------------------------------------------------------------------------ | ||
144 | In earlier versions, you would see a message like this instead: | ||
145 | ------------------------------------------------------------------------ | ||
142 | hda_codec: Unknown model for ALC880, trying auto-probe from BIOS... | 146 | hda_codec: Unknown model for ALC880, trying auto-probe from BIOS... |
143 | ------------------------------------------------------------------------ | 147 | ------------------------------------------------------------------------ |
144 | Even if you see such a message, DON'T PANIC. Take a deep breath and | 148 | Even if you see such a message, DON'T PANIC. Take a deep breath and |
@@ -403,6 +407,66 @@ re-configure based on that state, run like below: | |||
403 | ------------------------------------------------------------------------ | 407 | ------------------------------------------------------------------------ |
404 | 408 | ||
405 | 409 | ||
410 | Early Patching | ||
411 | ~~~~~~~~~~~~~~ | ||
412 | When CONFIG_SND_HDA_PATCH_LOADER=y is set, you can pass a "patch" as a | ||
413 | firmware file for modifying the HD-audio setup before initializing the | ||
414 | codec. This works basically like the reconfiguration via sysfs described | ||
415 | above, but it is applied before the first codec configuration. | ||
416 | |||
417 | A patch file is a plain text file which looks like below: | ||
418 | |||
419 | ------------------------------------------------------------------------ | ||
420 | [codec] | ||
421 | 0x12345678 0xabcd1234 2 | ||
422 | |||
423 | [model] | ||
424 | auto | ||
425 | |||
426 | [pincfg] | ||
427 | 0x12 0x411111f0 | ||
428 | |||
429 | [verb] | ||
430 | 0x20 0x500 0x03 | ||
431 | 0x20 0x400 0xff | ||
432 | |||
433 | [hint] | ||
434 | hp_detect = yes | ||
435 | ------------------------------------------------------------------------ | ||
436 | |||
437 | The file needs to have a line `[codec]`. The next line should contain | ||
438 | three numbers indicating the codec vendor-id (0x12345678 in the | ||
439 | example), the codec subsystem-id (0xabcd1234) and the address (2) of | ||
440 | the codec. The remaining patch entries are applied to this codec | ||
441 | until another codec entry is given. | ||
442 | |||
443 | The `[model]` line allows you to change the model name of each codec. | ||
444 | In the example above, it will be changed to model=auto. | ||
445 | Note that this overrides the module option. | ||
446 | |||
447 | After the `[pincfg]` line, the contents are parsed as the initial | ||
448 | default pin-configurations just like the `user_pin_configs` sysfs file above. | ||
449 | The values are also shown in the user_pin_configs sysfs file. | ||
450 | |||
451 | Similarly, the lines after `[verb]` are parsed as `init_verbs` | ||
452 | sysfs entries, and the lines after `[hint]` are parsed as `hints` | ||
453 | sysfs entries, respectively. | ||
454 | |||
455 | The hd-audio driver reads the file via request_firmware(). Thus, | ||
456 | a patch file has to be located on the appropriate firmware path, | ||
457 | typically /lib/firmware. For example, when you pass the option | ||
458 | `patch=hda-init.fw`, the file /lib/firmware/hda-init.fw must be | ||
459 | present. | ||
460 | |||
461 | The patch module option is specific to each card instance, and you | ||
462 | need to give one file name for each instance, separated by commas. | ||
463 | For example, if you have two cards, one for an on-board analog and one | ||
464 | for an HDMI video board, you may pass the patch option like below: | ||
465 | ------------------------------------------------------------------------ | ||
466 | options snd-hda-intel patch=on-board-patch,hdmi-patch | ||
467 | ------------------------------------------------------------------------ | ||
468 | |||
469 | |||
406 | Power-Saving | 470 | Power-Saving |
407 | ~~~~~~~~~~~~ | 471 | ~~~~~~~~~~~~ |
408 | The power-saving is a kind of auto-suspend of the device. When the | 472 | The power-saving is a kind of auto-suspend of the device. When the |
diff --git a/Documentation/sysctl/kernel.txt b/Documentation/sysctl/kernel.txt index 322a00bb99d9..2dbff53369d0 100644 --- a/Documentation/sysctl/kernel.txt +++ b/Documentation/sysctl/kernel.txt | |||
@@ -19,6 +19,7 @@ Currently, these files might (depending on your configuration) | |||
19 | show up in /proc/sys/kernel: | 19 | show up in /proc/sys/kernel: |
20 | - acpi_video_flags | 20 | - acpi_video_flags |
21 | - acct | 21 | - acct |
22 | - callhome [ S390 only ] | ||
22 | - auto_msgmni | 23 | - auto_msgmni |
23 | - core_pattern | 24 | - core_pattern |
24 | - core_uses_pid | 25 | - core_uses_pid |
@@ -91,6 +92,21 @@ valid for 30 seconds. | |||
91 | 92 | ||
92 | ============================================================== | 93 | ============================================================== |
93 | 94 | ||
95 | callhome: | ||
96 | |||
97 | Controls the kernel's callhome behavior in case of a kernel panic. | ||
98 | |||
99 | The s390 hardware allows an operating system to send a notification | ||
100 | to a service organization (callhome) in case of an operating system panic. | ||
101 | |||
102 | When the value in this file is 0 (the default), nothing happens | ||
103 | in case of a kernel panic. If this value is set to "1", | ||
104 | the complete kernel oops message is sent to the IBM customer service | ||
105 | organization, provided that the mainframe on which the Linux operating | ||
106 | system is running has a service contract with IBM. | ||
107 | |||
108 | ============================================================== | ||
109 | |||
94 | core_pattern: | 110 | core_pattern: |
95 | 111 | ||
96 | core_pattern is used to specify a core dumpfile pattern name. | 112 | core_pattern is used to specify a core dumpfile pattern name. |
diff --git a/Documentation/trace/events.txt b/Documentation/trace/events.txt index f157d7594ea7..78c45a87be57 100644 --- a/Documentation/trace/events.txt +++ b/Documentation/trace/events.txt | |||
@@ -1,7 +1,7 @@ | |||
1 | Event Tracing | 1 | Event Tracing |
2 | 2 | ||
3 | Documentation written by Theodore Ts'o | 3 | Documentation written by Theodore Ts'o |
4 | Updated by Li Zefan | 4 | Updated by Li Zefan and Tom Zanussi |
5 | 5 | ||
6 | 1. Introduction | 6 | 1. Introduction |
7 | =============== | 7 | =============== |
@@ -22,12 +22,12 @@ tracing information should be printed. | |||
22 | --------------------------------- | 22 | --------------------------------- |
23 | 23 | ||
24 | The events which are available for tracing can be found in the file | 24 | The events which are available for tracing can be found in the file |
25 | /debug/tracing/available_events. | 25 | /sys/kernel/debug/tracing/available_events. |
26 | 26 | ||
27 | To enable a particular event, such as 'sched_wakeup', simply echo it | 27 | To enable a particular event, such as 'sched_wakeup', simply echo it |
28 | to /debug/tracing/set_event. For example: | 28 | to /sys/kernel/debug/tracing/set_event. For example: |
29 | 29 | ||
30 | # echo sched_wakeup >> /debug/tracing/set_event | 30 | # echo sched_wakeup >> /sys/kernel/debug/tracing/set_event |
31 | 31 | ||
32 | [ Note: '>>' is necessary, otherwise it will firstly disable | 32 | [ Note: '>>' is necessary, otherwise it will firstly disable |
33 | all the events. ] | 33 | all the events. ] |
@@ -35,15 +35,15 @@ to /debug/tracing/set_event. For example: | |||
35 | To disable an event, echo the event name to the set_event file prefixed | 35 | To disable an event, echo the event name to the set_event file prefixed |
36 | with an exclamation point: | 36 | with an exclamation point: |
37 | 37 | ||
38 | # echo '!sched_wakeup' >> /debug/tracing/set_event | 38 | # echo '!sched_wakeup' >> /sys/kernel/debug/tracing/set_event |
39 | 39 | ||
40 | To disable all events, echo an empty line to the set_event file: | 40 | To disable all events, echo an empty line to the set_event file: |
41 | 41 | ||
42 | # echo > /debug/tracing/set_event | 42 | # echo > /sys/kernel/debug/tracing/set_event |
43 | 43 | ||
44 | To enable all events, echo '*:*' or '*:' to the set_event file: | 44 | To enable all events, echo '*:*' or '*:' to the set_event file: |
45 | 45 | ||
46 | # echo *:* > /debug/tracing/set_event | 46 | # echo *:* > /sys/kernel/debug/tracing/set_event |
47 | 47 | ||
48 | The events are organized into subsystems, such as ext4, irq, sched, | 48 | The events are organized into subsystems, such as ext4, irq, sched, |
49 | etc., and a full event name looks like this: <subsystem>:<event>. The | 49 | etc., and a full event name looks like this: <subsystem>:<event>. The |
@@ -52,29 +52,29 @@ file. All of the events in a subsystem can be specified via the syntax | |||
52 | "<subsystem>:*"; for example, to enable all irq events, you can use the | 52 | "<subsystem>:*"; for example, to enable all irq events, you can use the |
53 | command: | 53 | command: |
54 | 54 | ||
55 | # echo 'irq:*' > /debug/tracing/set_event | 55 | # echo 'irq:*' > /sys/kernel/debug/tracing/set_event |
56 | 56 | ||
57 | 2.2 Via the 'enable' toggle | 57 | 2.2 Via the 'enable' toggle |
58 | --------------------------- | 58 | --------------------------- |
59 | 59 | ||
60 | The events available are also listed in /debug/tracing/events/ hierarchy | 60 | The events available are also listed in /sys/kernel/debug/tracing/events/ hierarchy |
61 | of directories. | 61 | of directories. |
62 | 62 | ||
63 | To enable event 'sched_wakeup': | 63 | To enable event 'sched_wakeup': |
64 | 64 | ||
65 | # echo 1 > /debug/tracing/events/sched/sched_wakeup/enable | 65 | # echo 1 > /sys/kernel/debug/tracing/events/sched/sched_wakeup/enable |
66 | 66 | ||
67 | To disable it: | 67 | To disable it: |
68 | 68 | ||
69 | # echo 0 > /debug/tracing/events/sched/sched_wakeup/enable | 69 | # echo 0 > /sys/kernel/debug/tracing/events/sched/sched_wakeup/enable |
70 | 70 | ||
71 | To enable all events in sched subsystem: | 71 | To enable all events in sched subsystem: |
72 | 72 | ||
73 | # echo 1 > /debug/tracing/events/sched/enable | 73 | # echo 1 > /sys/kernel/debug/tracing/events/sched/enable |
74 | 74 | ||
75 | To enable all events: | 75 | To enable all events: |
76 | 76 | ||
77 | # echo 1 > /debug/tracing/events/enable | 77 | # echo 1 > /sys/kernel/debug/tracing/events/enable |
78 | 78 | ||
79 | When reading one of these enable files, there are four results: | 79 | When reading one of these enable files, there are four results: |
80 | 80 | ||
@@ -83,8 +83,199 @@ When reading one of these enable files, there are four results: | |||
83 | X - there is a mixture of events enabled and disabled | 83 | X - there is a mixture of events enabled and disabled |
84 | ? - this file does not affect any event | 84 | ? - this file does not affect any event |
85 | 85 | ||
86 | 2.3 Boot option | ||
87 | --------------- | ||
88 | |||
89 | In order to facilitate early boot debugging, use the boot option: | ||
90 | |||
91 | trace_event=[event-list] | ||
92 | |||
93 | The format of this boot option is the same as described in section 2.1. | ||
94 | |||
86 | 3. Defining an event-enabled tracepoint | 95 | 3. Defining an event-enabled tracepoint |
87 | ======================================= | 96 | ======================================= |
88 | 97 | ||
89 | See the example provided in samples/trace_events | 98 | See the example provided in samples/trace_events |
90 | 99 | ||
100 | 4. Event formats | ||
101 | ================ | ||
102 | |||
103 | Each trace event has a 'format' file associated with it that contains | ||
104 | a description of each field in a logged event. This information can | ||
105 | be used to parse the binary trace stream, and is also the place to | ||
106 | find the field names that can be used in event filters (see section 5). | ||
107 | |||
108 | It also displays the format string that will be used to print the | ||
109 | event in text mode, along with the event name and ID used for | ||
110 | profiling. | ||
111 | |||
112 | Every event has a set of 'common' fields associated with it; these are | ||
113 | the fields prefixed with 'common_'. The other fields vary between | ||
114 | events and correspond to the fields defined in the TRACE_EVENT | ||
115 | definition for that event. | ||
116 | |||
117 | Each field in the format has the form: | ||
118 | |||
119 | field:field-type field-name; offset:N; size:N; | ||
120 | |||
121 | where offset is the offset of the field in the trace record and size | ||
122 | is the size of the data item, in bytes. | ||
123 | |||
124 | For example, here's the information displayed for the 'sched_wakeup' | ||
125 | event: | ||
126 | |||
127 | # cat /sys/kernel/debug/tracing/events/sched/sched_wakeup/format | ||
128 | |||
129 | name: sched_wakeup | ||
130 | ID: 60 | ||
131 | format: | ||
132 | field:unsigned short common_type; offset:0; size:2; | ||
133 | field:unsigned char common_flags; offset:2; size:1; | ||
134 | field:unsigned char common_preempt_count; offset:3; size:1; | ||
135 | field:int common_pid; offset:4; size:4; | ||
136 | field:int common_tgid; offset:8; size:4; | ||
137 | |||
138 | field:char comm[TASK_COMM_LEN]; offset:12; size:16; | ||
139 | field:pid_t pid; offset:28; size:4; | ||
140 | field:int prio; offset:32; size:4; | ||
141 | field:int success; offset:36; size:4; | ||
142 | field:int cpu; offset:40; size:4; | ||
143 | |||
144 | print fmt: "task %s:%d [%d] success=%d [%03d]", REC->comm, REC->pid, | ||
145 | REC->prio, REC->success, REC->cpu | ||
146 | |||
147 | This event contains 10 fields, the first 5 common and the remaining 5 | ||
148 | event-specific. All the fields for this event are numeric, except for | ||
149 | 'comm' which is a string, a distinction important for event filtering. | ||
150 | |||
151 | 5. Event filtering | ||
152 | ================== | ||
153 | |||
154 | Trace events can be filtered in the kernel by associating boolean | ||
155 | 'filter expressions' with them. As soon as an event is logged into | ||
156 | the trace buffer, its fields are checked against the filter expression | ||
157 | associated with that event type. An event with field values that | ||
158 | 'match' the filter will appear in the trace output, and an event whose | ||
159 | values don't match will be discarded. An event with no filter | ||
160 | associated with it matches everything, and is the default when no | ||
161 | filter has been set for an event. | ||
162 | |||
163 | 5.1 Expression syntax | ||
164 | --------------------- | ||
165 | |||
166 | A filter expression consists of one or more 'predicates' that can be | ||
167 | combined using the logical operators '&&' and '||'. A predicate is | ||
168 | simply a clause that compares the value of a field contained within a | ||
169 | logged event with a constant value and returns either 0 or 1 depending | ||
170 | on whether the field value matched (1) or didn't match (0): | ||
171 | |||
172 | field-name relational-operator value | ||
173 | |||
174 | Parentheses can be used to provide arbitrary logical groupings and | ||
175 | double-quotes can be used to prevent the shell from interpreting | ||
176 | operators as shell metacharacters. | ||
177 | |||
178 | The field-names available for use in filters can be found in the | ||
179 | 'format' files for trace events (see section 4). | ||
180 | |||
181 | The relational-operators depend on the type of the field being tested: | ||
182 | |||
183 | The operators available for numeric fields are: | ||
184 | |||
185 | ==, !=, <, <=, >, >= | ||
186 | |||
187 | And for string fields they are: | ||
188 | |||
189 | ==, != | ||
190 | |||
191 | Currently, only exact string matches are supported. | ||
192 | |||
193 | Currently, the maximum number of predicates in a filter is 16. | ||
194 | |||
195 | 5.2 Setting filters | ||
196 | ------------------- | ||
197 | |||
198 | A filter for an individual event is set by writing a filter expression | ||
199 | to the 'filter' file for the given event. | ||
200 | |||
201 | For example: | ||
202 | |||
203 | # cd /sys/kernel/debug/tracing/events/sched/sched_wakeup | ||
204 | # echo "common_preempt_count > 4" > filter | ||
205 | |||
206 | A slightly more involved example: | ||
207 | |||
208 | # cd /sys/kernel/debug/tracing/events/sched/sched_signal_send | ||
209 | # echo "((sig >= 10 && sig < 15) || sig == 17) && comm != bash" > filter | ||
210 | |||
211 | If there is an error in the expression, you'll get an 'Invalid | ||
212 | argument' error when setting it, and the erroneous string along with | ||
213 | an error message can be seen by reading the filter file, e.g.: | ||
214 | |||
215 | # cd /sys/kernel/debug/tracing/events/sched/sched_signal_send | ||
216 | # echo "((sig >= 10 && sig < 15) || dsig == 17) && comm != bash" > filter | ||
217 | -bash: echo: write error: Invalid argument | ||
218 | # cat filter | ||
219 | ((sig >= 10 && sig < 15) || dsig == 17) && comm != bash | ||
220 | ^ | ||
221 | parse_error: Field not found | ||
222 | |||
223 | Currently, the caret ('^') marking an error always appears at the | ||
224 | beginning of the filter string; the error message should still be | ||
225 | useful even without more accurate position information. | ||
226 | |||
227 | 5.3 Clearing filters | ||
228 | -------------------- | ||
229 | |||
230 | To clear the filter for an event, write a '0' to the event's filter | ||
231 | file. | ||
232 | |||
233 | To clear the filters for all events in a subsystem, write a '0' to the | ||
234 | subsystem's filter file. | ||
235 | |||
236 | 5.4 Subsystem filters | ||
237 | --------------------- | ||
238 | |||
239 | For convenience, filters for every event in a subsystem can be set or | ||
240 | cleared as a group by writing a filter expression into the filter file | ||
241 | at the root of the subsystem. Note, however, that if any event within | ||
242 | the subsystem lacks a field specified in the subsystem filter, or if | ||
243 | the filter can't be applied for any other reason, the filter for that | ||
244 | event will retain its previous setting. This can result in an | ||
245 | unintended mixture of filters, which could lead to confusing trace | ||
246 | output (the user might think different filters are in effect). | ||
247 | Only filters that reference just the common fields can be guaranteed | ||
248 | to propagate successfully to all events. | ||
249 | |||
250 | Here are a few subsystem filter examples that also illustrate the | ||
251 | above points: | ||
252 | |||
253 | Clear the filters on all events in the sched subsystem: | ||
254 | |||
255 | # cd /sys/kernel/debug/tracing/events/sched | ||
256 | # echo 0 > filter | ||
257 | # cat sched_switch/filter | ||
258 | none | ||
259 | # cat sched_wakeup/filter | ||
260 | none | ||
261 | |||
262 | Set a filter using only common fields for all events in the sched | ||
263 | subsystem (all events end up with the same filter): | ||
264 | |||
265 | # cd /sys/kernel/debug/tracing/events/sched | ||
266 | # echo common_pid == 0 > filter | ||
267 | # cat sched_switch/filter | ||
268 | common_pid == 0 | ||
269 | # cat sched_wakeup/filter | ||
270 | common_pid == 0 | ||
271 | |||
272 | Attempt to set a filter using a non-common field for all events in the | ||
273 | sched subsystem (all events but those that have a prev_pid field retain | ||
274 | their old filters): | ||
275 | |||
276 | # cd /sys/kernel/debug/tracing/events/sched | ||
277 | # echo prev_pid == 0 > filter | ||
278 | # cat sched_switch/filter | ||
279 | prev_pid == 0 | ||
280 | # cat sched_wakeup/filter | ||
281 | common_pid == 0 | ||
diff --git a/Documentation/trace/ftrace-design.txt b/Documentation/trace/ftrace-design.txt new file mode 100644 index 000000000000..7003e10f10f5 --- /dev/null +++ b/Documentation/trace/ftrace-design.txt | |||
@@ -0,0 +1,233 @@ | |||
1 | function tracer guts | ||
2 | ==================== | ||
3 | |||
4 | Introduction | ||
5 | ------------ | ||
6 | |||
7 | Here we will cover the architecture pieces that the common function tracing | ||
8 | code relies on for proper functioning. Things are broken down into increasing | ||
9 | complexity so that you can start simple and at least get basic functionality. | ||
10 | |||
11 | Note that this focuses on architecture implementation details only. If you | ||
12 | want more explanation of a feature in terms of common code, review the common | ||
13 | ftrace.txt file. | ||
14 | |||
15 | |||
16 | Prerequisites | ||
17 | ------------- | ||
18 | |||
19 | Ftrace relies on these features being implemented: | ||
20 | STACKTRACE_SUPPORT - implement save_stack_trace() | ||
21 | TRACE_IRQFLAGS_SUPPORT - implement include/asm/irqflags.h | ||
22 | |||
23 | |||
24 | HAVE_FUNCTION_TRACER | ||
25 | -------------------- | ||
26 | |||
27 | You will need to implement the mcount and the ftrace_stub functions. | ||
28 | |||
29 | The exact mcount symbol name will depend on your toolchain. Some call it | ||
30 | "mcount", "_mcount", or even "__mcount". You can probably figure it out by | ||
31 | running something like: | ||
32 | $ echo 'main(){}' | gcc -x c -S -o - - -pg | grep mcount | ||
33 | call mcount | ||
34 | We'll make the assumption below that the symbol is "mcount" just to keep things | ||
35 | nice and simple in the examples. | ||
36 | |||
37 | Keep in mind that the ABI that is in effect inside of the mcount function is | ||
38 | *highly* architecture/toolchain specific. We cannot help you in this regard, | ||
39 | sorry. Dig up some old documentation and/or find someone more familiar than | ||
40 | you to bang ideas off of. Typically, register usage (argument/scratch/etc...) | ||
41 | is a major issue at this point, especially in relation to the location of the | ||
42 | mcount call (before/after function prologue). You might also want to look at | ||
43 | how glibc has implemented the mcount function for your architecture. It might | ||
44 | be (semi-)relevant. | ||
45 | |||
46 | The mcount function should check the function pointer ftrace_trace_function | ||
47 | to see if it is set to ftrace_stub. If it is, there is nothing for you to do, | ||
48 | so return immediately. If it isn't, then call that function in the same way | ||
49 | the mcount function normally calls __mcount_internal -- the first argument is | ||
50 | the "frompc" while the second argument is the "selfpc" (adjusted to remove the | ||
51 | size of the mcount call that is embedded in the function). | ||
52 | |||
53 | For example, if the function foo() calls bar(), when the bar() function calls | ||
54 | mcount(), the arguments mcount() will pass to the tracer are: | ||
55 | "frompc" - the address bar() will use to return to foo() | ||
56 | "selfpc" - the address of bar() (with mcount() size adjustment) | ||
57 | |||
58 | Also keep in mind that this mcount function will be called *a lot*, so | ||
59 | optimizing for the default case of no tracer will help the smooth running of | ||
60 | your system when tracing is disabled. So the start of the mcount function | ||
61 | typically does the bare minimum of checking before returning. That also means | ||
62 | the code flow should usually be kept linear (i.e. no branching in the nop case). | ||
63 | This is of course an optimization and not a hard requirement. | ||
64 | |||
65 | Here is some pseudo code that should help (these functions should actually be | ||
66 | implemented in assembly): | ||
67 | |||
68 | void ftrace_stub(void) | ||
69 | { | ||
70 | return; | ||
71 | } | ||
72 | |||
73 | void mcount(void) | ||
74 | { | ||
75 | /* save any bare state needed in order to do initial checking */ | ||
76 | |||
77 | extern void (*ftrace_trace_function)(unsigned long, unsigned long); | ||
78 | if (ftrace_trace_function != ftrace_stub) | ||
79 | goto do_trace; | ||
80 | |||
81 | /* restore any bare state */ | ||
82 | |||
83 | return; | ||
84 | |||
85 | do_trace: | ||
86 | |||
87 | /* save all state needed by the ABI (see paragraph above) */ | ||
88 | |||
89 | unsigned long frompc = ...; | ||
90 | unsigned long selfpc = <return address> - MCOUNT_INSN_SIZE; | ||
91 | ftrace_trace_function(frompc, selfpc); | ||
92 | |||
93 | /* restore all state needed by the ABI */ | ||
94 | } | ||
95 | |||
96 | Don't forget to export mcount for modules! | ||
97 | extern void mcount(void); | ||
98 | EXPORT_SYMBOL(mcount); | ||
99 | |||
100 | |||
101 | HAVE_FUNCTION_TRACE_MCOUNT_TEST | ||
102 | ------------------------------- | ||
103 | |||
104 | This is an optional optimization for the normal case when tracing is turned off | ||
105 | in the system. If you do not enable this Kconfig option, the common ftrace | ||
106 | code will take care of doing the checking for you. | ||
107 | |||
108 | To support this feature, you only need to check the function_trace_stop | ||
109 | variable in the mcount function. If it is non-zero, there is no tracing to be | ||
110 | done at all, so you can return. | ||
111 | |||
112 | This additional pseudo code would simply be: | ||
113 | void mcount(void) | ||
114 | { | ||
115 | /* save any bare state needed in order to do initial checking */ | ||
116 | |||
117 | + if (function_trace_stop) | ||
118 | + return; | ||
119 | |||
120 | extern void (*ftrace_trace_function)(unsigned long, unsigned long); | ||
121 | if (ftrace_trace_function != ftrace_stub) | ||
122 | ... | ||
123 | |||
124 | |||
125 | HAVE_FUNCTION_GRAPH_TRACER | ||
126 | -------------------------- | ||
127 | |||
128 | Deep breath ... time to do some real work. Here you will need to update the | ||
129 | mcount function to check ftrace graph function pointers, as well as implement | ||
130 | some functions to save (hijack) and restore the return address. | ||
131 | |||
132 | The mcount function should check the function pointers ftrace_graph_return | ||
133 | (compare to ftrace_stub) and ftrace_graph_entry (compare to | ||
134 | ftrace_graph_entry_stub). If either of those is not set to the relevant stub | ||
135 | function, call the arch-specific function ftrace_graph_caller which in turn | ||
136 | calls the arch-specific function prepare_ftrace_return. Neither of these | ||
137 | function names is strictly required, but you should use them anyway to stay | ||
138 | consistent across the architecture ports -- easier to compare & contrast | ||
139 | things. | ||
140 | |||
141 | The arguments to prepare_ftrace_return are slightly different from what is | ||
142 | passed to ftrace_trace_function. The second argument "selfpc" is the same, | ||
143 | but the first argument should be a pointer to the "frompc". Typically this is | ||
144 | located on the stack. This allows the function to hijack the return address | ||
145 | temporarily to have it point to the arch-specific function return_to_handler. | ||
146 | That function will simply call the common ftrace_return_to_handler function, | ||
147 | which returns the original return address so that you can return to the | ||
148 | original call site. | ||
149 | |||
150 | Here is the updated mcount pseudo code: | ||
151 | void mcount(void) | ||
152 | { | ||
153 | ... | ||
154 | if (ftrace_trace_function != ftrace_stub) | ||
155 | goto do_trace; | ||
156 | |||
157 | +#ifdef CONFIG_FUNCTION_GRAPH_TRACER | ||
158 | + extern void (*ftrace_graph_return)(...); | ||
159 | + extern void (*ftrace_graph_entry)(...); | ||
160 | + if (ftrace_graph_return != ftrace_stub || | ||
161 | + ftrace_graph_entry != ftrace_graph_entry_stub) | ||
162 | + ftrace_graph_caller(); | ||
163 | +#endif | ||
164 | |||
165 | /* restore any bare state */ | ||
166 | ... | ||
167 | |||
168 | Here is the pseudo code for the new ftrace_graph_caller assembly function: | ||
169 | #ifdef CONFIG_FUNCTION_GRAPH_TRACER | ||
170 | void ftrace_graph_caller(void) | ||
171 | { | ||
172 | /* save all state needed by the ABI */ | ||
173 | |||
174 | unsigned long *frompc = &...; | ||
175 | unsigned long selfpc = <return address> - MCOUNT_INSN_SIZE; | ||
176 | prepare_ftrace_return(frompc, selfpc); | ||
177 | |||
178 | /* restore all state needed by the ABI */ | ||
179 | } | ||
180 | #endif | ||
181 | |||
182 | For information on how to implement prepare_ftrace_return(), simply look at | ||
183 | the x86 version. The only architecture-specific piece in it is the setup of | ||
184 | the fault recovery table (the asm(...) code). The rest should be the same | ||
185 | across architectures. | ||
186 | |||
187 | Here is the pseudo code for the new return_to_handler assembly function. Note | ||
188 | that the ABI that applies here is different from what applies to the mcount | ||
189 | code. Since you are returning from a function (after the epilogue), you might | ||
190 | be able to skimp on things saved/restored (usually just registers used to pass | ||
191 | return values). | ||
192 | |||
193 | #ifdef CONFIG_FUNCTION_GRAPH_TRACER | ||
194 | void return_to_handler(void) | ||
195 | { | ||
196 | /* save all state needed by the ABI (see paragraph above) */ | ||
197 | |||
198 | void (*original_return_point)(void) = ftrace_return_to_handler(); | ||
199 | |||
200 | /* restore all state needed by the ABI */ | ||
201 | |||
202 | /* this is usually either a return or a jump */ | ||
203 | original_return_point(); | ||
204 | } | ||
205 | #endif | ||
206 | |||
207 | |||
208 | HAVE_FTRACE_NMI_ENTER | ||
209 | --------------------- | ||
210 | |||
211 | If you can't trace NMI functions, then skip this option. | ||
212 | |||
213 | <details to be filled> | ||
214 | |||
215 | |||
216 | HAVE_FTRACE_SYSCALLS | ||
217 | --------------------- | ||
218 | |||
219 | <details to be filled> | ||
220 | |||
221 | |||
222 | HAVE_FTRACE_MCOUNT_RECORD | ||
223 | ------------------------- | ||
224 | |||
225 | See scripts/recordmcount.pl for more info. | ||
226 | |||
227 | <details to be filled> | ||
228 | |||
229 | |||
230 | HAVE_DYNAMIC_FTRACE | ||
231 | --------------------- | ||
232 | |||
233 | <details to be filled> | ||
diff --git a/Documentation/trace/ftrace.txt b/Documentation/trace/ftrace.txt index a39b3c749de5..1b6292bbdd6d 100644 --- a/Documentation/trace/ftrace.txt +++ b/Documentation/trace/ftrace.txt | |||
@@ -26,6 +26,12 @@ disabled, and more (ftrace allows for tracer plugins, which | |||
26 | means that the list of tracers can always grow). | 26 | means that the list of tracers can always grow). |
27 | 27 | ||
28 | 28 | ||
29 | Implementation Details | ||
30 | ---------------------- | ||
31 | |||
32 | See ftrace-design.txt for details for arch porters and such. | ||
33 | |||
34 | |||
29 | The File System | 35 | The File System |
30 | --------------- | 36 | --------------- |
31 | 37 | ||
@@ -85,26 +91,19 @@ of ftrace. Here is a list of some of the key files: | |||
85 | This file holds the output of the trace in a human | 91 | This file holds the output of the trace in a human |
86 | readable format (described below). | 92 | readable format (described below). |
87 | 93 | ||
88 | latency_trace: | ||
89 | |||
90 | This file shows the same trace but the information | ||
91 | is organized more to display possible latencies | ||
92 | in the system (described below). | ||
93 | |||
94 | trace_pipe: | 94 | trace_pipe: |
95 | 95 | ||
96 | The output is the same as the "trace" file but this | 96 | The output is the same as the "trace" file but this |
97 | file is meant to be streamed with live tracing. | 97 | file is meant to be streamed with live tracing. |
98 | Reads from this file will block until new data | 98 | Reads from this file will block until new data is |
99 | is retrieved. Unlike the "trace" and "latency_trace" | 99 | retrieved. Unlike the "trace" file, this file is a |
100 | files, this file is a consumer. This means reading | 100 | consumer. This means reading from this file causes |
101 | from this file causes sequential reads to display | 101 | sequential reads to display more current data. Once |
102 | more current data. Once data is read from this | 102 | data is read from this file, it is consumed, and |
103 | file, it is consumed, and will not be read | 103 | will not be read again with a sequential read. The |
104 | again with a sequential read. The "trace" and | 104 | "trace" file is static, and if the tracer is not |
105 | "latency_trace" files are static, and if the | 105 | adding more data, it will display the same |
106 | tracer is not adding more data, they will display | 106 | information every time it is read. |
107 | the same information every time they are read. | ||
108 | 107 | ||
109 | trace_options: | 108 | trace_options: |
110 | 109 | ||
@@ -117,10 +116,10 @@ of ftrace. Here is a list of some of the key files: | |||
117 | Some of the tracers record the max latency. | 116 | Some of the tracers record the max latency. |
118 | For example, the time interrupts are disabled. | 117 | For example, the time interrupts are disabled. |
119 | This time is saved in this file. The max trace | 118 | This time is saved in this file. The max trace |
120 | will also be stored, and displayed by either | 119 | will also be stored, and displayed by "trace". |
121 | "trace" or "latency_trace". A new max trace will | 120 | A new max trace will only be recorded if the |
122 | only be recorded if the latency is greater than | 121 | latency is greater than the value in this |
123 | the value in this file. (in microseconds) | 122 | file. (in microseconds) |
124 | 123 | ||
125 | buffer_size_kb: | 124 | buffer_size_kb: |
126 | 125 | ||
@@ -210,7 +209,7 @@ Here is the list of current tracers that may be configured. | |||
210 | the trace with the longest max latency. | 209 | the trace with the longest max latency. |
211 | See tracing_max_latency. When a new max is recorded, | 210 | See tracing_max_latency. When a new max is recorded, |
212 | it replaces the old trace. It is best to view this | 211 | it replaces the old trace. It is best to view this |
213 | trace via the latency_trace file. | 212 | trace with the latency-format option enabled. |
214 | 213 | ||
215 | "preemptoff" | 214 | "preemptoff" |
216 | 215 | ||
@@ -307,8 +306,8 @@ the lowest priority thread (pid 0). | |||
307 | Latency trace format | 306 | Latency trace format |
308 | -------------------- | 307 | -------------------- |
309 | 308 | ||
310 | For traces that display latency times, the latency_trace file | 309 | When the latency-format option is enabled, the trace file gives |
311 | gives somewhat more information to see why a latency happened. | 310 | somewhat more information to see why a latency happened. |
312 | Here is a typical trace. | 311 | Here is a typical trace. |
313 | 312 | ||
314 | # tracer: irqsoff | 313 | # tracer: irqsoff |
@@ -380,9 +379,10 @@ explains which is which. | |||
380 | 379 | ||
381 | The above is mostly meaningful for kernel developers. | 380 | The above is mostly meaningful for kernel developers. |
382 | 381 | ||
383 | time: This differs from the trace file output. The trace file output | 382 | time: When the latency-format option is enabled, the trace file |
384 | includes an absolute timestamp. The timestamp used by the | 383 | output includes a timestamp relative to the start of the |
385 | latency_trace file is relative to the start of the trace. | 384 | trace. This differs from the output when latency-format |
385 | is disabled, which includes an absolute timestamp. | ||
386 | 386 | ||
387 | delay: This is just to help catch your eye a bit better. And | 387 | delay: This is just to help catch your eye a bit better. And |
388 | needs to be fixed to be only relative to the same CPU. | 388 | needs to be fixed to be only relative to the same CPU. |
@@ -440,7 +440,8 @@ Here are the available options: | |||
440 | sym-addr: | 440 | sym-addr: |
441 | bash-4000 [01] 1477.606694: simple_strtoul <c0339346> | 441 | bash-4000 [01] 1477.606694: simple_strtoul <c0339346> |
442 | 442 | ||
443 | verbose - This deals with the latency_trace file. | 443 | verbose - This deals with the trace file when the |
444 | latency-format option is enabled. | ||
444 | 445 | ||
445 | bash 4000 1 0 00000000 00010a95 [58127d26] 1720.415ms \ | 446 | bash 4000 1 0 00000000 00010a95 [58127d26] 1720.415ms \ |
446 | (+0.000ms): simple_strtoul (strict_strtoul) | 447 | (+0.000ms): simple_strtoul (strict_strtoul) |
@@ -472,7 +473,7 @@ Here are the available options: | |||
472 | the app is no longer running | 473 | the app is no longer running |
473 | 474 | ||
474 | The lookup is performed when you read | 475 | The lookup is performed when you read |
475 | trace,trace_pipe,latency_trace. Example: | 476 | trace,trace_pipe. Example: |
476 | 477 | ||
477 | a.out-1623 [000] 40874.465068: /root/a.out[+0x480] <-/root/a.out[+0 | 478 | a.out-1623 [000] 40874.465068: /root/a.out[+0x480] <-/root/a.out[+0 |
478 | x494] <- /root/a.out[+0x4a8] <- /lib/libc-2.7.so[+0x1e1a6] | 479 | x494] <- /root/a.out[+0x4a8] <- /lib/libc-2.7.so[+0x1e1a6] |
@@ -481,6 +482,11 @@ x494] <- /root/a.out[+0x4a8] <- /lib/libc-2.7.so[+0x1e1a6] | |||
481 | every scheduling event. Will add overhead if | 482 | every scheduling event. Will add overhead if |
482 | there's a lot of tasks running at once. | 483 | there's a lot of tasks running at once. |
483 | 484 | ||
485 | latency-format - This option changes the trace. When | ||
486 | it is enabled, the trace displays | ||
487 | additional information about the | ||
488 | latencies, as described in "Latency | ||
489 | trace format". | ||
484 | 490 | ||
485 | sched_switch | 491 | sched_switch |
486 | ------------ | 492 | ------------ |
@@ -596,12 +602,13 @@ To reset the maximum, echo 0 into tracing_max_latency. Here is | |||
596 | an example: | 602 | an example: |
597 | 603 | ||
598 | # echo irqsoff > current_tracer | 604 | # echo irqsoff > current_tracer |
605 | # echo latency-format > trace_options | ||
599 | # echo 0 > tracing_max_latency | 606 | # echo 0 > tracing_max_latency |
600 | # echo 1 > tracing_enabled | 607 | # echo 1 > tracing_enabled |
601 | # ls -ltr | 608 | # ls -ltr |
602 | [...] | 609 | [...] |
603 | # echo 0 > tracing_enabled | 610 | # echo 0 > tracing_enabled |
604 | # cat latency_trace | 611 | # cat trace |
605 | # tracer: irqsoff | 612 | # tracer: irqsoff |
606 | # | 613 | # |
607 | irqsoff latency trace v1.1.5 on 2.6.26 | 614 | irqsoff latency trace v1.1.5 on 2.6.26 |
@@ -703,12 +710,13 @@ which preemption was disabled. The control of preemptoff tracer | |||
703 | is much like the irqsoff tracer. | 710 | is much like the irqsoff tracer. |
704 | 711 | ||
705 | # echo preemptoff > current_tracer | 712 | # echo preemptoff > current_tracer |
713 | # echo latency-format > trace_options | ||
706 | # echo 0 > tracing_max_latency | 714 | # echo 0 > tracing_max_latency |
707 | # echo 1 > tracing_enabled | 715 | # echo 1 > tracing_enabled |
708 | # ls -ltr | 716 | # ls -ltr |
709 | [...] | 717 | [...] |
710 | # echo 0 > tracing_enabled | 718 | # echo 0 > tracing_enabled |
711 | # cat latency_trace | 719 | # cat trace |
712 | # tracer: preemptoff | 720 | # tracer: preemptoff |
713 | # | 721 | # |
714 | preemptoff latency trace v1.1.5 on 2.6.26-rc8 | 722 | preemptoff latency trace v1.1.5 on 2.6.26-rc8 |
@@ -850,12 +858,13 @@ Again, using this trace is much like the irqsoff and preemptoff | |||
850 | tracers. | 858 | tracers. |
851 | 859 | ||
852 | # echo preemptirqsoff > current_tracer | 860 | # echo preemptirqsoff > current_tracer |
861 | # echo latency-format > trace_options | ||
853 | # echo 0 > tracing_max_latency | 862 | # echo 0 > tracing_max_latency |
854 | # echo 1 > tracing_enabled | 863 | # echo 1 > tracing_enabled |
855 | # ls -ltr | 864 | # ls -ltr |
856 | [...] | 865 | [...] |
857 | # echo 0 > tracing_enabled | 866 | # echo 0 > tracing_enabled |
858 | # cat latency_trace | 867 | # cat trace |
859 | # tracer: preemptirqsoff | 868 | # tracer: preemptirqsoff |
860 | # | 869 | # |
861 | preemptirqsoff latency trace v1.1.5 on 2.6.26-rc8 | 870 | preemptirqsoff latency trace v1.1.5 on 2.6.26-rc8 |
@@ -1012,11 +1021,12 @@ Instead of performing an 'ls', we will run 'sleep 1' under | |||
1012 | 'chrt' which changes the priority of the task. | 1021 | 'chrt' which changes the priority of the task. |
1013 | 1022 | ||
1014 | # echo wakeup > current_tracer | 1023 | # echo wakeup > current_tracer |
1024 | # echo latency-format > trace_options | ||
1015 | # echo 0 > tracing_max_latency | 1025 | # echo 0 > tracing_max_latency |
1016 | # echo 1 > tracing_enabled | 1026 | # echo 1 > tracing_enabled |
1017 | # chrt -f 5 sleep 1 | 1027 | # chrt -f 5 sleep 1 |
1018 | # echo 0 > tracing_enabled | 1028 | # echo 0 > tracing_enabled |
1019 | # cat latency_trace | 1029 | # cat trace |
1020 | # tracer: wakeup | 1030 | # tracer: wakeup |
1021 | # | 1031 | # |
1022 | wakeup latency trace v1.1.5 on 2.6.26-rc8 | 1032 | wakeup latency trace v1.1.5 on 2.6.26-rc8 |
diff --git a/Documentation/trace/function-graph-fold.vim b/Documentation/trace/function-graph-fold.vim new file mode 100644 index 000000000000..0544b504c8b0 --- /dev/null +++ b/Documentation/trace/function-graph-fold.vim | |||
@@ -0,0 +1,42 @@ | |||
1 | " Enable folding for ftrace function_graph traces. | ||
2 | " | ||
3 | " To use, :source this file while viewing a function_graph trace, or use vim's | ||
4 | " -S option to load from the command-line together with a trace. You can then | ||
5 | " use the usual vim fold commands, such as "za", to open and close nested | ||
6 | " functions. While closed, a fold will show the total time taken for a call, | ||
7 | " as would normally appear on the line with the closing brace. Folded | ||
8 | " functions will not include finish_task_switch(), so folding should remain | ||
9 | " relatively sane even through a context switch. | ||
10 | " | ||
11 | " Note that this will almost certainly only work well with a | ||
12 | " single-CPU trace (e.g. trace-cmd report --cpu 1). | ||
13 | |||
14 | function! FunctionGraphFoldExpr(lnum) | ||
15 | let line = getline(a:lnum) | ||
16 | if line[-1:] == '{' | ||
17 | if line =~ 'finish_task_switch() {$' | ||
18 | return '>1' | ||
19 | endif | ||
20 | return 'a1' | ||
21 | elseif line[-1:] == '}' | ||
22 | return 's1' | ||
23 | else | ||
24 | return '=' | ||
25 | endif | ||
26 | endfunction | ||
27 | |||
28 | function! FunctionGraphFoldText() | ||
29 | let s = split(getline(v:foldstart), '|', 1) | ||
30 | if getline(v:foldend+1) =~ 'finish_task_switch() {$' | ||
31 | let s[2] = ' task switch ' | ||
32 | else | ||
33 | let e = split(getline(v:foldend), '|', 1) | ||
34 | let s[2] = e[2] | ||
35 | endif | ||
36 | return join(s, '|') | ||
37 | endfunction | ||
38 | |||
39 | setlocal foldexpr=FunctionGraphFoldExpr(v:lnum) | ||
40 | setlocal foldtext=FunctionGraphFoldText() | ||
41 | setlocal foldcolumn=12 | ||
42 | setlocal foldmethod=expr | ||
diff --git a/Documentation/trace/ring-buffer-design.txt b/Documentation/trace/ring-buffer-design.txt new file mode 100644 index 000000000000..5b1d23d604c5 --- /dev/null +++ b/Documentation/trace/ring-buffer-design.txt | |||
@@ -0,0 +1,955 @@ | |||
1 | Lockless Ring Buffer Design | ||
2 | =========================== | ||
3 | |||
4 | Copyright 2009 Red Hat Inc. | ||
5 | Author: Steven Rostedt <srostedt@redhat.com> | ||
6 | License: The GNU Free Documentation License, Version 1.2 | ||
7 | (dual licensed under the GPL v2) | ||
8 | Reviewers: Mathieu Desnoyers, Huang Ying, Hidetoshi Seto, | ||
9 | and Frederic Weisbecker. | ||
10 | |||
11 | |||
12 | Written for: 2.6.31 | ||
13 | |||
14 | Terminology used in this Document | ||
15 | --------------------------------- | ||
16 | |||
17 | tail - where new writes happen in the ring buffer. | ||
18 | |||
19 | head - where new reads happen in the ring buffer. | ||
20 | |||
21 | producer - the task that writes into the ring buffer (same as writer) | ||
22 | |||
23 | writer - same as producer | ||
24 | |||
25 | consumer - the task that reads from the buffer (same as reader) | ||
26 | |||
27 | reader - same as consumer. | ||
28 | |||
29 | reader_page - A page outside the ring buffer used solely (for the most part) | ||
30 | by the reader. | ||
31 | |||
32 | head_page - a pointer to the page that the reader will use next | ||
33 | |||
34 | tail_page - a pointer to the page that will be written to next | ||
35 | |||
36 | commit_page - a pointer to the page with the last finished non-nested write. | ||
37 | |||
38 | cmpxchg - hardware assisted atomic transaction that performs the following: | ||
39 | |||
40 | A = B iff previous A == C | ||
41 | |||
42 | R = cmpxchg(A, C, B) is saying that we replace A with B if and only if | ||
43 | current A is equal to C, and we put the old (current) A into R | ||
44 | |||
45 | R gets the previous A regardless of whether A is updated with B or not. | ||
46 | |||
47 | To see if the update was successful a compare of R == C may be used. | ||
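The cmpxchg semantics above can be modeled in userspace C. This sketch uses C11 `atomic_compare_exchange_strong()` rather than the kernel's own cmpxchg primitive, and the helper name `cmpxchg_long()` is hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>

/*
 * R = cmpxchg(A, C, B): store B into *a only if *a == C, and return
 * the value *a held before the call in either case.
 * Hypothetical helper built on C11 atomics, not the kernel API.
 */
static long cmpxchg_long(_Atomic long *a, long c, long b)
{
	long old = c;

	/*
	 * On success *a was equal to c, so old still holds c.
	 * On failure the current value of *a is written into old.
	 */
	atomic_compare_exchange_strong(a, &old, b);
	return old;
}
```

As the text says, the caller compares the returned value with C to learn whether the update took place.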
48 | |||
49 | The Generic Ring Buffer | ||
50 | ----------------------- | ||
51 | |||
52 | The ring buffer can be used in either an overwrite mode or in | ||
53 | producer/consumer mode. | ||
54 | |||
55 | In producer/consumer mode, if the producer were to fill up the | ||
56 | buffer before the consumer could free up anything, the producer | ||
57 | will stop writing to the buffer. This will lose the most recent events. | ||
58 | |||
59 | In overwrite mode, if the producer were to fill up the buffer | ||
60 | before the consumer could free up anything, the producer will | ||
61 | overwrite the older data. This will lose the oldest events. | ||
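A minimal toy contrasting the two modes, with one slot per event rather than the page-based design described in this document. When the buffer is full, producer/consumer mode rejects the newest event while overwrite mode discards the oldest. `struct toy_rb`, `toy_rb_write()` and `toy_rb_read()` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define RB_SIZE 4	/* hypothetical capacity, for illustration only */

struct toy_rb {
	int data[RB_SIZE];
	size_t head;		/* index of the oldest entry (next read) */
	size_t count;		/* number of valid entries */
	bool overwrite;		/* true: overwrite, false: producer/consumer */
};

/* Returns true if the event was stored. */
static bool toy_rb_write(struct toy_rb *rb, int ev)
{
	if (rb->count == RB_SIZE) {
		if (!rb->overwrite)
			return false;			/* drop the newest */
		rb->head = (rb->head + 1) % RB_SIZE;	/* drop the oldest */
		rb->count--;
	}
	rb->data[(rb->head + rb->count) % RB_SIZE] = ev;
	rb->count++;
	return true;
}

static bool toy_rb_read(struct toy_rb *rb, int *ev)
{
	if (rb->count == 0)
		return false;
	*ev = rb->data[rb->head];
	rb->head = (rb->head + 1) % RB_SIZE;
	rb->count--;
	return true;
}
```

Writing events 1 through 6 into each mode leaves the producer/consumer buffer holding 1..4 and the overwrite buffer holding 3..6.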
62 | |||
63 | No two writers can write at the same time (on the same per-cpu buffer), | ||
64 | but a writer may interrupt another writer; the interrupting writer must | ||
65 | finish writing before the previous writer may continue. This is very | ||
66 | important to the algorithm. The writers act like a "stack". The way | ||
67 | interrupts work enforces this behavior. | ||
68 | |||
69 | |||
70 | writer1 start | ||
71 | <preempted> writer2 start | ||
72 | <preempted> writer3 start | ||
73 | writer3 finishes | ||
74 | writer2 finishes | ||
75 | writer1 finishes | ||
76 | |||
77 | This is very much like a writer being preempted by an interrupt and | ||
78 | the interrupt doing a write as well. | ||
79 | |||
80 | Readers can happen at any time. But no two readers may run at the | ||
81 | same time, nor can a reader preempt/interrupt another reader. A reader | ||
82 | cannot preempt/interrupt a writer, but it may read/consume from the | ||
83 | buffer at the same time as a writer is writing, as long as the reader | ||
84 | is on another processor. A reader may read on its own processor | ||
85 | and can be preempted by a writer. | ||
86 | |||
87 | A writer can preempt a reader, but a reader can not preempt a writer. | ||
88 | But a reader can read the buffer at the same time (on another processor) | ||
89 | as a writer. | ||
90 | |||
91 | The ring buffer is made up of a list of pages held together by a linked list. | ||
92 | |||
93 | At initialization, a reader page that is not part of the ring buffer | ||
94 | is allocated for the reader. | ||
95 | |||
96 | The head_page, tail_page and commit_page are all initialized to point | ||
97 | to the same page. | ||
98 | |||
99 | The reader page is initialized to have its next pointer pointing to | ||
100 | the head page, and its previous pointer pointing to a page before | ||
101 | the head page. | ||
102 | |||
103 | The reader has its own page to use. At start up time, this page is | ||
104 | allocated but is not attached to the list. When the reader wants | ||
105 | to read from the buffer, if its page is empty (like it is on start up) | ||
106 | it will swap its page with the head_page. The old reader page will | ||
107 | become part of the ring buffer and the head_page will be removed. | ||
108 | The page after the inserted page (old reader_page) will become the | ||
109 | new head page. | ||
110 | |||
111 | Once the new page is given to the reader, the reader could do what | ||
112 | it wants with it, as long as a writer has left that page. | ||
113 | |||
114 | A sample of how the reader page is swapped. Note that this does not | ||
115 | show the head page in the buffer; it is for demonstrating a swap | ||
116 | only. | ||
117 | |||
118 | +------+ | ||
119 | |reader| RING BUFFER | ||
120 | |page | | ||
121 | +------+ | ||
122 | +---+ +---+ +---+ | ||
123 | | |-->| |-->| | | ||
124 | | |<--| |<--| | | ||
125 | +---+ +---+ +---+ | ||
126 | ^ | ^ | | ||
127 | | +-------------+ | | ||
128 | +-----------------+ | ||
129 | |||
130 | |||
131 | +------+ | ||
132 | |reader| RING BUFFER | ||
133 | |page |-------------------+ | ||
134 | +------+ v | ||
135 | | +---+ +---+ +---+ | ||
136 | | | |-->| |-->| | | ||
137 | | | |<--| |<--| |<-+ | ||
138 | | +---+ +---+ +---+ | | ||
139 | | ^ | ^ | | | ||
140 | | | +-------------+ | | | ||
141 | | +-----------------+ | | ||
142 | +------------------------------------+ | ||
143 | |||
144 | +------+ | ||
145 | |reader| RING BUFFER | ||
146 | |page |-------------------+ | ||
147 | +------+ <---------------+ v | ||
148 | | ^ +---+ +---+ +---+ | ||
149 | | | | |-->| |-->| | | ||
150 | | | | | | |<--| |<-+ | ||
151 | | | +---+ +---+ +---+ | | ||
152 | | | | ^ | | | ||
153 | | | +-------------+ | | | ||
154 | | +-----------------------------+ | | ||
155 | +------------------------------------+ | ||
156 | |||
157 | +------+ | ||
158 | |buffer| RING BUFFER | ||
159 | |page |-------------------+ | ||
160 | +------+ <---------------+ v | ||
161 | | ^ +---+ +---+ +---+ | ||
162 | | | | | | |-->| | | ||
163 | | | New | | | |<--| |<-+ | ||
164 | | | Reader +---+ +---+ +---+ | | ||
165 | | | page ----^ | | | ||
166 | | | | | | ||
167 | | +-----------------------------+ | | ||
168 | +------------------------------------+ | ||
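Ignoring the atomicity concerns covered later in this document, the swap pictured above comes down to four pointer updates on a doubly linked list. This is a non-atomic sketch; `struct toy_page` and `swap_reader_page()` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

struct toy_page {
	struct toy_page *next, *prev;
};

/*
 * Non-atomic sketch: splice the (empty) reader page into the ring in
 * place of the head page. *reader is updated to the page that was
 * removed from the ring, and the new head page is returned.
 */
static struct toy_page *swap_reader_page(struct toy_page **reader,
					 struct toy_page *head)
{
	struct toy_page *r = *reader;

	assert(r != NULL && head != NULL);
	r->next = head->next;
	r->prev = head->prev;
	head->prev->next = r;
	head->next->prev = r;

	*reader = head;		/* the old head page now belongs to the reader */
	return r->next;		/* the page after the inserted one is the new head */
}
```

The old head page keeps its own next and prev pointers, so, like the reader page in the diagrams, it still points into the ring without being part of it.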
169 | |||
170 | |||
171 | |||
172 | It is possible that the page swapped is the commit page and the tail page, | ||
173 | if what is in the ring buffer is less than what is held in a buffer page. | ||
174 | |||
175 | |||
176 | reader page commit page tail page | ||
177 | | | | | ||
178 | v | | | ||
179 | +---+ | | | ||
180 | | |<----------+ | | ||
181 | | |<------------------------+ | ||
182 | | |------+ | ||
183 | +---+ | | ||
184 | | | ||
185 | v | ||
186 | +---+ +---+ +---+ +---+ | ||
187 | <---| |--->| |--->| |--->| |---> | ||
188 | --->| |<---| |<---| |<---| |<--- | ||
189 | +---+ +---+ +---+ +---+ | ||
190 | |||
191 | This case is still valid for this algorithm. | ||
192 | When the writer leaves the page, it simply goes into the ring buffer | ||
193 | since the reader page still points to the next location in the ring | ||
194 | buffer. | ||
195 | |||
196 | |||
197 | The main pointers: | ||
198 | |||
199 | reader page - The page used solely by the reader and is not part | ||
200 | of the ring buffer (may be swapped in) | ||
201 | |||
202 | head page - the next page in the ring buffer that will be swapped | ||
203 | with the reader page. | ||
204 | |||
205 | tail page - the page where the next write will take place. | ||
206 | |||
207 | commit page - the page that last finished a write. | ||
208 | |||
209 | The commit page is only updated by the outermost writer in the | ||
210 | writer stack. A writer that preempts another writer will not move the | ||
211 | commit page. | ||
212 | |||
213 | When data is written into the ring buffer, a position is reserved | ||
214 | in the ring buffer and passed back to the writer. When the writer | ||
215 | is finished writing data into that position, it commits the write. | ||
216 | |||
217 | Another write (or a read) may take place at any time during this | ||
218 | transaction. If another write happens it must finish before continuing | ||
219 | with the previous write. | ||
220 | |||
221 | |||
222 | Write reserve: | ||
223 | |||
224 | Buffer page | ||
225 | +---------+ | ||
226 | |written | | ||
227 | +---------+ <--- given back to writer (current commit) | ||
228 | |reserved | | ||
229 | +---------+ <--- tail pointer | ||
230 | | empty | | ||
231 | +---------+ | ||
232 | |||
233 | Write commit: | ||
234 | |||
235 | Buffer page | ||
236 | +---------+ | ||
237 | |written | | ||
238 | +---------+ | ||
239 | |written | | ||
240 | +---------+ <--- next position for write (current commit) | ||
241 | | empty | | ||
242 | +---------+ | ||
243 | |||
244 | |||
245 | If a write happens after the first reserve: | ||
246 | |||
247 | Buffer page | ||
248 | +---------+ | ||
249 | |written | | ||
250 | +---------+ <-- current commit | ||
251 | |reserved | | ||
252 | +---------+ <--- given back to second writer | ||
253 | |reserved | | ||
254 | +---------+ <--- tail pointer | ||
255 | |||
256 | After second writer commits: | ||
257 | |||
258 | |||
259 | Buffer page | ||
260 | +---------+ | ||
261 | |written | | ||
262 | +---------+ <--(last full commit) | ||
263 | |reserved | | ||
264 | +---------+ | ||
265 | |pending | | ||
266 | |commit | | ||
267 | +---------+ <--- tail pointer | ||
268 | |||
269 | When the first writer commits: | ||
270 | |||
271 | Buffer page | ||
272 | +---------+ | ||
273 | |written | | ||
274 | +---------+ | ||
275 | |written | | ||
276 | +---------+ | ||
277 | |written | | ||
278 | +---------+ <--(last full commit and tail pointer) | ||
279 | |||
280 | |||
281 | The commit pointer points to the last write location that was | ||
282 | committed without preempting another write. When a write that | ||
283 | preempted another write is committed, it only becomes a pending commit | ||
284 | and will not be a full commit until all writes have been committed. | ||
285 | |||
286 | The commit page points to the page that has the last full commit. | ||
287 | The tail page points to the page with the last write (before | ||
288 | committing). | ||
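The reserve/commit behavior in the diagrams above can be modeled with a single page and a nesting counter. This is a hypothetical, non-atomic sketch (`struct toy_buf`, `toy_reserve()` and `toy_commit()` are invented names). It tracks byte offsets within one page and ignores moving to the next page.

```c
#include <assert.h>
#include <stddef.h>

struct toy_buf {
	size_t tail;	/* next free offset: where the next reserve starts */
	size_t commit;	/* offset of the last full commit */
	int nesting;	/* writers currently between reserve and commit */
};

static size_t toy_reserve(struct toy_buf *b, size_t len)
{
	size_t pos;

	b->nesting++;
	/*
	 * A real implementation must make this read-and-advance atomic
	 * with respect to interrupts; this sketch does not.
	 */
	pos = b->tail;
	b->tail += len;
	return pos;
}

static void toy_commit(struct toy_buf *b)
{
	assert(b->nesting > 0);
	/*
	 * Only the outermost writer publishes the commit; a nested
	 * writer's commit stays pending until then.
	 */
	if (--b->nesting == 0)
		b->commit = b->tail;
}
```

Reserving 8 bytes, then reserving 4 more from a nested writer and committing it, leaves the commit at 0 (the nested commit is pending). Only when the first writer commits does the commit jump to 12, matching the final diagram.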
289 | |||
290 | The tail page is always equal to or after the commit page. It may | ||
291 | be several pages ahead. If the tail page catches up to the commit | ||
292 | page then no more writes may take place (regardless of the mode | ||
293 | of the ring buffer: overwrite and producer/consumer). | ||
294 | |||
295 | The order of pages is: | ||
296 | |||
297 | head page | ||
298 | commit page | ||
299 | tail page | ||
300 | |||
301 | Possible scenario: | ||
302 | tail page | ||
303 | head page commit page | | ||
304 | | | | | ||
305 | v v v | ||
306 | +---+ +---+ +---+ +---+ | ||
307 | <---| |--->| |--->| |--->| |---> | ||
308 | --->| |<---| |<---| |<---| |<--- | ||
309 | +---+ +---+ +---+ +---+ | ||
310 | |||
311 | There is a special case in which the head page is after the commit page | ||
312 | and possibly the tail page. That is when the commit (and tail) page has been | ||
313 | swapped with the reader page. This is because the head page is always | ||
314 | part of the ring buffer, but the reader page is not. Whenever less | ||
315 | than a full page has been committed inside the ring buffer | ||
316 | and a reader swaps out a page, it will be swapping out the commit page. | ||
317 | |||
318 | |||
319 | reader page commit page tail page | ||
320 | | | | | ||
321 | v | | | ||
322 | +---+ | | | ||
323 | | |<----------+ | | ||
324 | | |<------------------------+ | ||
325 | | |------+ | ||
326 | +---+ | | ||
327 | | | ||
328 | v | ||
329 | +---+ +---+ +---+ +---+ | ||
330 | <---| |--->| |--->| |--->| |---> | ||
331 | --->| |<---| |<---| |<---| |<--- | ||
332 | +---+ +---+ +---+ +---+ | ||
333 | ^ | ||
334 | | | ||
335 | head page | ||
336 | |||
337 | |||
338 | In this case, the head page will not move when the tail and commit | ||
339 | move back into the ring buffer. | ||
340 | |||
341 | The reader can not swap a page into the ring buffer if the commit page | ||
342 | is still on that page. If the read meets the last commit (real commit | ||
343 | not pending or reserved), then there is nothing more to read. | ||
344 | The buffer is considered empty until another full commit finishes. | ||
345 | |||
346 | When the tail meets the head page, if the buffer is in overwrite mode, | ||
347 | the head page will be pushed ahead one. If the buffer is in producer/consumer | ||
348 | mode, the write will fail. | ||
349 | |||
350 | Overwrite mode: | ||
351 | |||
352 | tail page | ||
353 | | | ||
354 | v | ||
355 | +---+ +---+ +---+ +---+ | ||
356 | <---| |--->| |--->| |--->| |---> | ||
357 | --->| |<---| |<---| |<---| |<--- | ||
358 | +---+ +---+ +---+ +---+ | ||
359 | ^ | ||
360 | | | ||
361 | head page | ||
362 | |||
363 | |||
364 | tail page | ||
365 | | | ||
366 | v | ||
367 | +---+ +---+ +---+ +---+ | ||
368 | <---| |--->| |--->| |--->| |---> | ||
369 | --->| |<---| |<---| |<---| |<--- | ||
370 | +---+ +---+ +---+ +---+ | ||
371 | ^ | ||
372 | | | ||
373 | head page | ||
374 | |||
375 | |||
376 | tail page | ||
377 | | | ||
378 | v | ||
379 | +---+ +---+ +---+ +---+ | ||
380 | <---| |--->| |--->| |--->| |---> | ||
381 | --->| |<---| |<---| |<---| |<--- | ||
382 | +---+ +---+ +---+ +---+ | ||
383 | ^ | ||
384 | | | ||
385 | head page | ||
386 | |||
387 | Note, the reader page will still point to the previous head page. | ||
388 | But when a swap takes place, it will use the most recent head page. | ||
389 | |||
390 | |||
391 | Making the Ring Buffer Lockless: | ||
392 | -------------------------------- | ||
393 | |||
394 | The main idea behind the lockless algorithm is to combine the moving | ||
395 | of the head_page pointer with the swapping of pages with the reader. | ||
396 | State flags are placed inside the pointer to the page. To do this, | ||
397 | each page must be aligned in memory to 4 bytes. This allows the 2 | ||
398 | least significant bits of the address to be used as flags, since | ||
399 | they will always be zero for the address. To get the address, | ||
400 | simply mask out the flags. | ||
401 | |||
402 | MASK = ~3 | ||
403 | |||
404 | address & MASK | ||
405 | |||
406 | Two flags will be kept by these two bits: | ||
407 | |||
408 | HEADER - the page being pointed to is a head page | ||
409 | |||
410 | UPDATE - the page being pointed to is being updated by a writer | ||
411 | and was or is about to be a head page. | ||
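The pointer-tagging scheme above can be sketched in C with `uintptr_t`. The macro and helper names and the numeric flag values are assumptions for illustration; only the two-bit layout and `address & MASK` come from the text.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative values for the two flag bits described above. */
#define FLAG_NORMAL ((uintptr_t)0)
#define FLAG_HEADER ((uintptr_t)1)	/* pointed-to page is the head page */
#define FLAG_UPDATE ((uintptr_t)2)	/* a writer is moving the head page */
#define FLAG_MASK   ((uintptr_t)3)	/* MASK = ~3 clears these bits */

/* Combine a 4-byte aligned page address with a flag. */
static uintptr_t tag_ptr(const void *page, uintptr_t flag)
{
	assert(((uintptr_t)page & FLAG_MASK) == 0);
	return (uintptr_t)page | flag;
}

/* address & MASK */
static void *ptr_addr(uintptr_t val)
{
	return (void *)(val & ~FLAG_MASK);
}

static uintptr_t ptr_flag(uintptr_t val)
{
	return val & FLAG_MASK;
}
```

The assertion in `tag_ptr()` encodes the alignment requirement. Without it, a flag bit could silently corrupt the address.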
412 | |||
413 | |||
414 | reader page | ||
415 | | | ||
416 | v | ||
417 | +---+ | ||
418 | | |------+ | ||
419 | +---+ | | ||
420 | | | ||
421 | v | ||
422 | +---+ +---+ +---+ +---+ | ||
423 | <---| |--->| |-H->| |--->| |---> | ||
424 | --->| |<---| |<---| |<---| |<--- | ||
425 | +---+ +---+ +---+ +---+ | ||
426 | |||
427 | |||
428 | The above pointer "-H->" would have the HEADER flag set. That is, | ||
429 | the next page is the next page to be swapped out by the reader. | ||
430 | This pointer means the next page is the head page. | ||
431 | |||
432 | When the tail page meets the head pointer, it will use cmpxchg to | ||
433 | change the pointer to the UPDATE state: | ||
434 | |||
435 | |||
436 | tail page | ||
437 | | | ||
438 | v | ||
439 | +---+ +---+ +---+ +---+ | ||
440 | <---| |--->| |-H->| |--->| |---> | ||
441 | --->| |<---| |<---| |<---| |<--- | ||
442 | +---+ +---+ +---+ +---+ | ||
443 | |||
444 | tail page | ||
445 | | | ||
446 | v | ||
447 | +---+ +---+ +---+ +---+ | ||
448 | <---| |--->| |-U->| |--->| |---> | ||
449 | --->| |<---| |<---| |<---| |<--- | ||
450 | +---+ +---+ +---+ +---+ | ||
451 | |||
452 | "-U->" represents a pointer in the UPDATE state. | ||
453 | |||
454 | Any access to the reader will need to take some sort of lock to serialize | ||
455 | the readers. But the writers will never take a lock to write to the | ||
456 | ring buffer. This means we only need to worry about a single reader, | ||
457 | and writes only preempt in "stack" formation. | ||
458 | |||
459 | When the reader tries to swap the page with the ring buffer, it | ||
460 | will also use cmpxchg. If the flag bit in the pointer to the | ||
461 | head page does not have the HEADER flag set, the compare will fail | ||
462 | and the reader will need to look for the new head page and try again. | ||
463 | Note that the UPDATE and HEADER flags are never set at the same time. | ||
464 | |||
465 | The reader swaps the reader page as follows: | ||
466 | |||
467 | +------+ | ||
468 | |reader| RING BUFFER | ||
469 | |page | | ||
470 | +------+ | ||
471 | +---+ +---+ +---+ | ||
472 | | |--->| |--->| | | ||
473 | | |<---| |<---| | | ||
474 | +---+ +---+ +---+ | ||
475 | ^ | ^ | | ||
476 | | +---------------+ | | ||
477 | +-----H-------------+ | ||
478 | |||
479 | The reader sets the reader page next pointer as HEADER to the page after | ||
480 | the head page. | ||
481 | |||
482 | |||
483 | +------+ | ||
484 | |reader| RING BUFFER | ||
485 | |page |-------H-----------+ | ||
486 | +------+ v | ||
487 | | +---+ +---+ +---+ | ||
488 | | | |--->| |--->| | | ||
489 | | | |<---| |<---| |<-+ | ||
490 | | +---+ +---+ +---+ | | ||
491 | | ^ | ^ | | | ||
492 | | | +---------------+ | | | ||
493 | | +-----H-------------+ | | ||
494 | +--------------------------------------+ | ||
495 | |||
496 | It does a cmpxchg with the pointer to the previous head page to make it | ||
497 | point to the reader page. Note that the new pointer does not have the HEADER | ||
498 | flag set. This action atomically moves the head page forward. | ||
499 | |||
500 | +------+ | ||
501 | |reader| RING BUFFER | ||
502 | |page |-------H-----------+ | ||
503 | +------+ v | ||
504 | | ^ +---+ +---+ +---+ | ||
505 | | | | |-->| |-->| | | ||
506 | | | | |<--| |<--| |<-+ | ||
507 | | | +---+ +---+ +---+ | | ||
508 | | | | ^ | | | ||
509 | | | +-------------+ | | | ||
510 | | +-----------------------------+ | | ||
511 | +------------------------------------+ | ||
512 | |||
513 | After the new head page is set, the previous pointer of the head page is | ||
514 | updated to the reader page. | ||
515 | |||
516 | +------+ | ||
517 | |reader| RING BUFFER | ||
518 | |page |-------H-----------+ | ||
519 | +------+ <---------------+ v | ||
520 | | ^ +---+ +---+ +---+ | ||
521 | | | | |-->| |-->| | | ||
522 | | | | | | |<--| |<-+ | ||
523 | | | +---+ +---+ +---+ | | ||
524 | | | | ^ | | | ||
525 | | | +-------------+ | | | ||
526 | | +-----------------------------+ | | ||
527 | +------------------------------------+ | ||
528 | |||
529 | +------+ | ||
530 | |buffer| RING BUFFER | ||
531 | |page |-------H-----------+ <--- New head page | ||
532 | +------+ <---------------+ v | ||
533 | | ^ +---+ +---+ +---+ | ||
534 | | | | | | |-->| | | ||
535 | | | New | | | |<--| |<-+ | ||
536 | | | Reader +---+ +---+ +---+ | | ||
537 | | | page ----^ | | | ||
538 | | | | | | ||
539 | | +-----------------------------+ | | ||
540 | +------------------------------------+ | ||
541 | |||
542 | Another important point: the page that the reader page points back to | ||
543 | by its previous pointer (the one that now points to the new head page) | ||
544 | never points back to the reader page. That is because the reader page is | ||
545 | not part of the ring buffer. Traversing the ring buffer via the next pointers | ||
546 | will always stay in the ring buffer. Traversing the ring buffer via the | ||
547 | prev pointers may not. | ||
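The reader-side swap described above can be sketched with C11 atomics. This assumes readers are already serialized by a lock, as stated earlier, and ignores the retry loop. `struct toy_page`, `untag()` and `reader_swap()` are hypothetical names, and the flag values are illustrative.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define FLAG_HEADER ((uintptr_t)1)
#define FLAG_UPDATE ((uintptr_t)2)
#define FLAG_MASK   ((uintptr_t)3)

/* Hypothetical page: next carries flag bits, prev does not. */
struct toy_page {
	_Atomic uintptr_t next;
	struct toy_page *prev;
};

static struct toy_page *untag(uintptr_t val)
{
	return (struct toy_page *)(val & ~FLAG_MASK);
}

/*
 * Returns the old head page, now owned by the reader, or NULL if the
 * pointer to head no longer carries HEADER (e.g. a writer set UPDATE).
 * On NULL the caller must find the new head page and retry.
 */
static struct toy_page *reader_swap(struct toy_page *reader,
				    struct toy_page *head)
{
	struct toy_page *after = untag(atomic_load(&head->next));
	uintptr_t expect = (uintptr_t)head | FLAG_HEADER;

	assert(((uintptr_t)reader & FLAG_MASK) == 0);
	/* The reader page points at the page after head, marked HEADER. */
	atomic_store(&reader->next, (uintptr_t)after | FLAG_HEADER);
	reader->prev = head->prev;

	/* Atomically replace "head|HEADER" with a plain pointer to reader. */
	if (!atomic_compare_exchange_strong(&head->prev->next, &expect,
					    (uintptr_t)reader))
		return NULL;

	/* Finally fix up the back pointer of the new head page. */
	after->prev = reader;
	return head;
}
```

The single cmpxchg is what moves the head page forward. If a writer has already switched that pointer to UPDATE, the compare fails and the ring is left untouched.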
548 | |||
549 | Note, the way to determine a reader page is simply by examining the previous | ||
550 | pointer of the page. If the next pointer of the previous page does not | ||
551 | point back to the original page, then the original page is a reader page: | ||
552 | |||
553 | |||
554 | +--------+ | ||
555 | | reader | next +----+ | ||
556 | | page |-------->| |<====== (buffer page) | ||
557 | +--------+ +----+ | ||
558 | | | ^ | ||
559 | | v | next | ||
560 | prev | +----+ | ||
561 | +------------->| | | ||
562 | +----+ | ||
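The reader-page test described above can be sketched in a few lines of C. Flag bits must be masked off before the comparison, because the prev page's next pointer may carry HEADER or UPDATE. `struct toy_page` and `is_reader_page()` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define FLAG_MASK ((uintptr_t)3)

/* Hypothetical page: next may carry flag bits in its low two bits. */
struct toy_page {
	uintptr_t next;
	struct toy_page *prev;
};

/*
 * A page is the reader page when the next pointer of its prev page,
 * with the flags masked off, does not point back to it.
 */
static bool is_reader_page(const struct toy_page *page)
{
	assert(page->prev != NULL);
	return (const struct toy_page *)(page->prev->next & ~FLAG_MASK) != page;
}
```
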
563 | |||
564 | The way the head page moves forward: | ||
565 | |||
566 | When the tail page meets the head page and the buffer is in overwrite mode | ||
567 | and more writes take place, the head page must be moved forward before the | ||
568 | writer may move the tail page. The way this is done is that the writer | ||
569 | performs a cmpxchg to convert the pointer to the head page from the HEADER | ||
570 | flag to have the UPDATE flag set. Once this is done, the reader will | ||
571 | not be able to swap the head page from the buffer, nor will it be able to | ||
572 | move the head page, until the writer is finished with the move. | ||
573 | |||
574 | This eliminates any races that the reader can have on the writer. The reader | ||
575 | must spin, and this is why the reader can not preempt the writer. | ||
576 | |||
577 | tail page | ||
578 | | | ||
579 | v | ||
580 | +---+ +---+ +---+ +---+ | ||
581 | <---| |--->| |-H->| |--->| |---> | ||
582 | --->| |<---| |<---| |<---| |<--- | ||
583 | +---+ +---+ +---+ +---+ | ||
584 | |||
585 | tail page | ||
586 | | | ||
587 | v | ||
588 | +---+ +---+ +---+ +---+ | ||
589 | <---| |--->| |-U->| |--->| |---> | ||
590 | --->| |<---| |<---| |<---| |<--- | ||
591 | +---+ +---+ +---+ +---+ | ||
592 | |||
593 | The following page will be made into the new head page. | ||
594 | |||
595 | tail page | ||
596 | | | ||
597 | v | ||
598 | +---+ +---+ +---+ +---+ | ||
599 | <---| |--->| |-U->| |-H->| |---> | ||
600 | --->| |<---| |<---| |<---| |<--- | ||
601 | +---+ +---+ +---+ +---+ | ||
602 | |||
603 | After the new head page has been set, we can set the old head page | ||
604 | pointer back to NORMAL. | ||
605 | |||
606 | tail page | ||
607 | | | ||
608 | v | ||
609 | +---+ +---+ +---+ +---+ | ||
610 | <---| |--->| |--->| |-H->| |---> | ||
611 | --->| |<---| |<---| |<---| |<--- | ||
612 | +---+ +---+ +---+ +---+ | ||
613 | |||
614 | After the head page has been moved, the tail page may now move forward. | ||
615 | |||
616 | tail page | ||
617 | | | ||
618 | v | ||
619 | +---+ +---+ +---+ +---+ | ||
620 | <---| |--->| |--->| |-H->| |---> | ||
621 | --->| |<---| |<---| |<---| |<--- | ||
622 | +---+ +---+ +---+ +---+ | ||
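The writer-side steps in the diagrams above (HEADER to UPDATE, mark the next page HEADER, then UPDATE back to NORMAL) can be sketched with C11 atomics. This is a simplified model that ignores the nested-writer races discussed later. `struct toy_page` and `push_head()` are hypothetical names, and the flag values are illustrative.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define FLAG_HEADER ((uintptr_t)1)
#define FLAG_UPDATE ((uintptr_t)2)
#define FLAG_MASK   ((uintptr_t)3)

struct toy_page {
	_Atomic uintptr_t next;		/* low two bits hold HEADER/UPDATE */
};

/*
 * prev is the page whose next pointer carries HEADER (the tail page in
 * the diagrams). Returns false if the HEADER -> UPDATE transition could
 * not be made, so the caller must re-examine the state.
 */
static bool push_head(struct toy_page *prev)
{
	uintptr_t head = atomic_load(&prev->next) & ~FLAG_MASK;
	struct toy_page *old_head = (struct toy_page *)head;
	uintptr_t expect = head | FLAG_HEADER;
	uintptr_t next;

	/* 1. HEADER -> UPDATE: from now on the reader cannot swap. */
	if (!atomic_compare_exchange_strong(&prev->next, &expect,
					    head | FLAG_UPDATE))
		return false;

	/* 2. Mark the page after the old head as the new head page. */
	next = atomic_load(&old_head->next) & ~FLAG_MASK;
	expect = next;
	atomic_compare_exchange_strong(&old_head->next, &expect,
				       next | FLAG_HEADER);

	/* 3. UPDATE -> NORMAL on the old head pointer. */
	expect = head | FLAG_UPDATE;
	atomic_compare_exchange_strong(&prev->next, &expect, head);
	return true;
}
```

After `push_head()` succeeds, the tail page may be moved forward as in the last diagram.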
623 | |||
624 | |||
625 | The above are the trivial updates. Now for the more complex scenarios. | ||
626 | |||
627 | |||
628 | As stated before, if enough writes preempt the first write, the | ||
629 | tail page may make it all the way around the buffer and meet the commit | ||
630 | page. At this time, we must start dropping writes (usually with some kind | ||
631 | of warning to the user). But what happens if the commit was still on the | ||
632 | reader page? In that case the commit page is not part of the ring buffer, | ||
633 | and the tail page must account for this. | ||
634 | |||
635 | |||
636 | reader page commit page | ||
637 | | | | ||
638 | v | | ||
639 | +---+ | | ||
640 | | |<----------+ | ||
641 | | | | ||
642 | | |------+ | ||
643 | +---+ | | ||
644 | | | ||
645 | v | ||
646 | +---+ +---+ +---+ +---+ | ||
647 | <---| |--->| |-H->| |--->| |---> | ||
648 | --->| |<---| |<---| |<---| |<--- | ||
649 | +---+ +---+ +---+ +---+ | ||
650 | ^ | ||
651 | | | ||
652 | tail page | ||
653 | |||
654 | If the tail page were to simply push the head page forward, the commit when | ||
655 | leaving the reader page would not be pointing to the correct page. | ||
656 | |||
657 | The solution to this is to test if the commit page is on the reader page | ||
658 | before pushing the head page. If it is, then it can be assumed that the | ||
659 | tail page wrapped the buffer, and we must drop new writes. | ||
660 | |||
661 | This is not a race condition, because the commit page can only be moved | ||
662 | by the outermost writer (the writer that was preempted). | ||
663 | This means that the commit will not move while a writer is moving the | ||
664 | tail page. The reader cannot swap the reader page if it is also being | ||
665 | used as the commit page. The reader can simply check that the commit | ||
666 | is off the reader page. Once the commit page leaves the reader page | ||
667 | it will never go back on it unless a reader does another swap with the | ||
668 | buffer page that is also the commit page. | ||
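The rule above can be shown as a minimal single-threaded sketch. `struct rb_page`, `struct rb_sketch` and `advance_tail()` are hypothetical names, not the kernel's ring_buffer.c structures, and all atomic operations are left out. Before the tail pushes the head page forward, it checks whether the commit page is still the reader page. If it is, the write is dropped.

```c
/* Hypothetical page descriptor: only the forward link matters here. */
struct rb_page {
	struct rb_page *next;
};

/* Hypothetical buffer state; not the kernel's per-cpu ring buffer struct. */
struct rb_sketch {
	struct rb_page *reader_page; /* page swapped out of the ring by the reader */
	struct rb_page *commit_page; /* page holding the last committed event */
	struct rb_page *tail_page;   /* page writers are currently filling */
	struct rb_page *head_page;   /* oldest page, next to be read */
};

enum tail_result { TAIL_MOVED, TAIL_PUSHED_HEAD, TAIL_DROPPED };

/* Move the tail to the next page, pushing the head page if needed. */
static enum tail_result advance_tail(struct rb_sketch *rb)
{
	struct rb_page *next = rb->tail_page->next;

	if (next == rb->head_page) {
		/*
		 * Commit still on the reader page: the tail has wrapped the
		 * whole buffer, and pushing the head would leave the commit
		 * pointing at the wrong page. Drop the write instead.
		 */
		if (rb->commit_page == rb->reader_page)
			return TAIL_DROPPED;
		rb->head_page = next->next; /* overwrite: push the head forward */
		rb->tail_page = next;
		return TAIL_PUSHED_HEAD;
	}
	rb->tail_page = next;
	return TAIL_MOVED;
}
```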
669 | |||
670 | |||
671 | Nested writes | ||
672 | ------------- | ||
673 | |||
674 | In the pushing forward of the tail page we must first push forward | ||
675 | the head page if the head page is the next page. If the head page | ||
676 | is not the next page, the tail page is simply updated with a cmpxchg. | ||
677 | |||
678 | Only writers move the tail page. This must be done atomically to protect | ||
679 | against nested writers. | ||
680 | |||
681 | temp_page = tail_page | ||
682 | next_page = temp_page->next | ||
683 | cmpxchg(tail_page, temp_page, next_page) | ||
684 | |||
685 | The above will update the tail page if it is still pointing to the expected | ||
686 | page. If this fails, a nested write pushed it forward, and the current write | ||
687 | does not need to push it. | ||
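The cmpxchg step can be sketched in userspace C11, with `atomic_compare_exchange_strong()` standing in for the kernel's cmpxchg. `struct page_sketch` and `try_advance_tail()` are hypothetical names. The writer passes in the `temp_page` it observed earlier, so a failed exchange means a nested writer has already moved the tail.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical page type for illustration only. */
struct page_sketch {
	struct page_sketch *next;
};

/*
 * Move the shared tail from temp_page (the page this writer saw) to the
 * page after it. Returns true if this writer moved it; false means a
 * nested writer already pushed the tail forward.
 */
static bool try_advance_tail(_Atomic(struct page_sketch *) *tail_page,
			     struct page_sketch *temp_page)
{
	struct page_sketch *next_page = temp_page->next;

	/* On failure temp_page is overwritten with the current tail. */
	return atomic_compare_exchange_strong(tail_page, &temp_page, next_page);
}
```

A failed exchange only means another writer already did the work. The caller therefore does not retry the exchange. It retries its reservation on whatever page is now the tail.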
688 | |||
689 | |||
690 | temp page | ||
691 | | | ||
692 | v | ||
693 | tail page | ||
694 | | | ||
695 | v | ||
696 | +---+ +---+ +---+ +---+ | ||
697 | <---| |--->| |--->| |--->| |---> | ||
698 | --->| |<---| |<---| |<---| |<--- | ||
699 | +---+ +---+ +---+ +---+ | ||
700 | |||
701 | Nested write comes in and moves the tail page forward: | ||
702 | |||
703 | tail page (moved by nested writer) | ||
704 | temp page | | ||
705 | | | | ||
706 | v v | ||
707 | +---+ +---+ +---+ +---+ | ||
708 | <---| |--->| |--->| |--->| |---> | ||
709 | --->| |<---| |<---| |<---| |<--- | ||
710 | +---+ +---+ +---+ +---+ | ||
711 | |||
712 | The above would fail the cmpxchg, but since the tail page has already | ||
713 | been moved forward, the writer will just try again to reserve storage | ||
714 | on the new tail page. | ||
715 | |||
716 | But the moving of the head page is a bit more complex. | ||
717 | |||
718 | tail page | ||
719 | | | ||
720 | v | ||
721 | +---+ +---+ +---+ +---+ | ||
722 | <---| |--->| |-H->| |--->| |---> | ||
723 | --->| |<---| |<---| |<---| |<--- | ||
724 | +---+ +---+ +---+ +---+ | ||
725 | |||
726 | The write converts the head page pointer to UPDATE. | ||
727 | |||
728 | tail page | ||
729 | | | ||
730 | v | ||
731 | +---+ +---+ +---+ +---+ | ||
732 | <---| |--->| |-U->| |--->| |---> | ||
733 | --->| |<---| |<---| |<---| |<--- | ||
734 | +---+ +---+ +---+ +---+ | ||
735 | |||
736 | But if a nested writer preempts here, it will see that the next | ||
737 | page is a head page, but that it is also nested. It will detect that | ||
738 | it is nested and will save that information. The detection is the | ||
739 | fact that it sees the UPDATE flag instead of a HEAD or NORMAL | ||
740 | pointer. | ||
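Nesting detection can be sketched by assuming that the HEAD/UPDATE/NORMAL state lives in the two low bits of the link to the page, with pages at least 4-byte aligned. The flag values and the `claim_head()` helper are illustrative, not the kernel's definitions. The outcome of the cmpxchg tells the writer one of two things: it converted HEAD to UPDATE itself, or it found UPDATE already set by the writer it preempted.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Illustrative flag values kept in the low bits of a page link. */
#define PTR_HEAD   1UL
#define PTR_UPDATE 2UL
#define PTR_MASK   3UL

enum head_claim {
	CLAIMED_HEAD, /* we turned HEAD into UPDATE: we are the outermost writer */
	NESTED,       /* UPDATE was already set: we preempted that writer */
	NOT_HEAD      /* link was NORMAL: the head is elsewhere */
};

static enum head_claim claim_head(_Atomic uintptr_t *link)
{
	uintptr_t page = atomic_load(link) & ~PTR_MASK;
	uintptr_t expected = page | PTR_HEAD;

	if (atomic_compare_exchange_strong(link, &expected, page | PTR_UPDATE))
		return CLAIMED_HEAD;
	/* On failure, expected holds the link's current value. */
	if ((expected & PTR_MASK) == PTR_UPDATE)
		return NESTED;
	return NOT_HEAD;
}
```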
741 | |||
742 | The nested writer will set the new head page pointer. | ||
743 | |||
744 | tail page | ||
745 | | | ||
746 | v | ||
747 | +---+ +---+ +---+ +---+ | ||
748 | <---| |--->| |-U->| |-H->| |---> | ||
749 | --->| |<---| |<---| |<---| |<--- | ||
750 | +---+ +---+ +---+ +---+ | ||
751 | |||
752 | But it will not reset the update back to normal. Only the writer | ||
753 | that converted a pointer from HEAD to UPDATE will convert it back | ||
754 | to NORMAL. | ||
755 | |||
756 | tail page | ||
757 | | | ||
758 | v | ||
759 | +---+ +---+ +---+ +---+ | ||
760 | <---| |--->| |-U->| |-H->| |---> | ||
761 | --->| |<---| |<---| |<---| |<--- | ||
762 | +---+ +---+ +---+ +---+ | ||
763 | |||
764 | After the nested writer finishes, the outermost writer will convert | ||
765 | the UPDATE pointer to NORMAL. | ||
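The final step can be sketched under the same assumed low-bit encoding; the values and names are hypothetical. Every writer that reaches this point tries to mark the next page HEAD. Only the writer that converted HEAD to UPDATE resets that link to NORMAL.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative flag values kept in the low bits of a page link. */
#define PTR_HEAD   1UL
#define PTR_UPDATE 2UL
#define PTR_MASK   3UL

/*
 * old_link: the link marked UPDATE (points to the old head page).
 * new_link: the link to the page after it (the new head page).
 * is_outermost: true only for the writer that turned HEAD into UPDATE.
 */
static void finish_head_move(_Atomic uintptr_t *old_link,
			     _Atomic uintptr_t *new_link, bool is_outermost)
{
	uintptr_t page = atomic_load(new_link) & ~PTR_MASK;
	uintptr_t expected = page; /* expect a NORMAL link */

	/* NORMAL -> HEAD; fails harmlessly if a nested writer already did it. */
	atomic_compare_exchange_strong(new_link, &expected, page | PTR_HEAD);

	/* Only the outermost writer converts its UPDATE link back to NORMAL. */
	if (is_outermost) {
		uintptr_t old_page = atomic_load(old_link) & ~PTR_MASK;
		uintptr_t upd = old_page | PTR_UPDATE;

		atomic_compare_exchange_strong(old_link, &upd, old_page);
	}
}
```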
766 | |||
767 | |||
768 | tail page | ||
769 | | | ||
770 | v | ||
771 | +---+ +---+ +---+ +---+ | ||
772 | <---| |--->| |--->| |-H->| |---> | ||
773 | --->| |<---| |<---| |<---| |<--- | ||
774 | +---+ +---+ +---+ +---+ | ||
775 | |||
776 | |||
777 | It can be even more complex if several nested writes came in and moved | ||
778 | the tail page ahead several pages: | ||
779 | |||
780 | |||
781 | (first writer) | ||
782 | |||
783 | tail page | ||
784 | | | ||
785 | v | ||
786 | +---+ +---+ +---+ +---+ | ||
787 | <---| |--->| |-H->| |--->| |---> | ||
788 | --->| |<---| |<---| |<---| |<--- | ||
789 | +---+ +---+ +---+ +---+ | ||
790 | |||
791 | The write converts the head page pointer to UPDATE. | ||
792 | |||
793 | tail page | ||
794 | | | ||
795 | v | ||
796 | +---+ +---+ +---+ +---+ | ||
797 | <---| |--->| |-U->| |--->| |---> | ||
798 | --->| |<---| |<---| |<---| |<--- | ||
799 | +---+ +---+ +---+ +---+ | ||
800 | |||
801 | Next writer comes in, and sees the update and sets up the new | ||
802 | head page. | ||
803 | |||
804 | (second writer) | ||
805 | |||
806 | tail page | ||
807 | | | ||
808 | v | ||
809 | +---+ +---+ +---+ +---+ | ||
810 | <---| |--->| |-U->| |-H->| |---> | ||
811 | --->| |<---| |<---| |<---| |<--- | ||
812 | +---+ +---+ +---+ +---+ | ||
813 | |||
814 | The nested writer moves the tail page forward, but does not set the old | ||
815 | update page to NORMAL because it is not the outermost writer. | ||
816 | |||
817 | tail page | ||
818 | | | ||
819 | v | ||
820 | +---+ +---+ +---+ +---+ | ||
821 | <---| |--->| |-U->| |-H->| |---> | ||
822 | --->| |<---| |<---| |<---| |<--- | ||
823 | +---+ +---+ +---+ +---+ | ||
824 | |||
825 | Another writer preempts and sees that the page after the tail page is a head page. | ||
826 | It changes it from HEAD to UPDATE. | ||
827 | |||
828 | (third writer) | ||
829 | |||
830 | tail page | ||
831 | | | ||
832 | v | ||
833 | +---+ +---+ +---+ +---+ | ||
834 | <---| |--->| |-U->| |-U->| |---> | ||
835 | --->| |<---| |<---| |<---| |<--- | ||
836 | +---+ +---+ +---+ +---+ | ||
837 | |||
838 | The writer will move the head page forward: | ||
839 | |||
840 | |||
841 | (third writer) | ||
842 | |||
843 | tail page | ||
844 | | | ||
845 | v | ||
846 | +---+ +---+ +---+ +---+ | ||
847 | <---| |--->| |-U->| |-U->| |-H-> | ||
848 | --->| |<---| |<---| |<---| |<--- | ||
849 | +---+ +---+ +---+ +---+ | ||
850 | |||
851 | Because the third writer changed the HEAD flag to UPDATE, it | ||
852 | will convert it back to NORMAL: | ||
853 | |||
854 | |||
855 | (third writer) | ||
856 | |||
857 | tail page | ||
858 | | | ||
859 | v | ||
860 | +---+ +---+ +---+ +---+ | ||
861 | <---| |--->| |-U->| |--->| |-H-> | ||
862 | --->| |<---| |<---| |<---| |<--- | ||
863 | +---+ +---+ +---+ +---+ | ||
864 | |||
865 | |||
866 | Then it will move the tail page and return to the second writer. | ||
867 | |||
868 | |||
869 | (second writer) | ||
870 | |||
871 | tail page | ||
872 | | | ||
873 | v | ||
874 | +---+ +---+ +---+ +---+ | ||
875 | <---| |--->| |-U->| |--->| |-H-> | ||
876 | --->| |<---| |<---| |<---| |<--- | ||
877 | +---+ +---+ +---+ +---+ | ||
878 | |||
879 | |||
880 | The second writer will fail to move the tail page because it was already | ||
881 | moved, so it will try again and add its data to the new tail page. | ||
882 | It will return to the first writer. | ||
883 | |||
884 | |||
885 | (first writer) | ||
886 | |||
887 | tail page | ||
888 | | | ||
889 | v | ||
890 | +---+ +---+ +---+ +---+ | ||
891 | <---| |--->| |-U->| |--->| |-H-> | ||
892 | --->| |<---| |<---| |<---| |<--- | ||
893 | +---+ +---+ +---+ +---+ | ||
894 | |||
895 | The first writer cannot atomically test whether the tail page moved | ||
896 | while it updates the HEAD page. It will then update the head page to | ||
897 | what it thinks is the new head page. | ||
898 | |||
899 | |||
900 | (first writer) | ||
901 | |||
902 | tail page | ||
903 | | | ||
904 | v | ||
905 | +---+ +---+ +---+ +---+ | ||
906 | <---| |--->| |-U->| |-H->| |-H-> | ||
907 | --->| |<---| |<---| |<---| |<--- | ||
908 | +---+ +---+ +---+ +---+ | ||
909 | |||
910 | Since the cmpxchg returns the old value of the pointer, the first writer | ||
911 | will see it succeeded in updating the pointer from NORMAL to HEAD. | ||
912 | But as we can see, this is not good enough. It must also check to see | ||
913 | if the tail page is either where it used to be or on the next page: | ||
914 | |||
915 | |||
916 | (first writer) | ||
917 | |||
918 | A B tail page | ||
919 | | | | | ||
920 | v v v | ||
921 | +---+ +---+ +---+ +---+ | ||
922 | <---| |--->| |-U->| |-H->| |-H-> | ||
923 | --->| |<---| |<---| |<---| |<--- | ||
924 | +---+ +---+ +---+ +---+ | ||
925 | |||
926 | If tail page != A and tail page != B, then it must reset the | ||
927 | pointer back to NORMAL. Because it only needs to worry about | ||
928 | nested writers, it only needs to check this after setting the HEAD page. | ||
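The A/B test reduces to a simple comparison. This sketch uses a hypothetical `struct pg` and leaves out the cmpxchg operations that set and, if needed, clear the HEAD flag. It only decides whether the flag just set is stale.

```c
#include <stdbool.h>

/* Hypothetical page type for illustration only. */
struct pg {
	struct pg *next;
};

/*
 * a: the page the tail was on when this writer started (A).
 * Its successor is B. If the tail is now on neither, nested writers
 * have moved the head further on, and the HEAD flag this writer just
 * set must be reset to NORMAL.
 */
static bool new_head_is_stale(const struct pg *tail, const struct pg *a)
{
	const struct pg *b = a->next;

	return tail != a && tail != b;
}
```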
929 | |||
930 | |||
931 | (first writer) | ||
932 | |||
933 | A B tail page | ||
934 | | | | | ||
935 | v v v | ||
936 | +---+ +---+ +---+ +---+ | ||
937 | <---| |--->| |-U->| |--->| |-H-> | ||
938 | --->| |<---| |<---| |<---| |<--- | ||
939 | +---+ +---+ +---+ +---+ | ||
940 | |||
941 | Now the writer can update the head page. This is also why the head page must | ||
942 | remain in UPDATE and only be reset by the outermost writer. This prevents | ||
943 | the reader from seeing the incorrect head page. | ||
944 | |||
945 | |||
946 | (first writer) | ||
947 | |||
948 | A B tail page | ||
949 | | | | | ||
950 | v v v | ||
951 | +---+ +---+ +---+ +---+ | ||
952 | <---| |--->| |--->| |--->| |-H-> | ||
953 | --->| |<---| |<---| |<---| |<--- | ||
954 | +---+ +---+ +---+ +---+ | ||
955 | |||
diff --git a/Documentation/vgaarbiter.txt b/Documentation/vgaarbiter.txt new file mode 100644 index 000000000000..987f9b0a5ece --- /dev/null +++ b/Documentation/vgaarbiter.txt | |||
@@ -0,0 +1,194 @@ | |||
1 | |||
2 | VGA Arbiter | ||
3 | =========== | ||
4 | |||
5 | Graphic devices are accessed through ranges in I/O or memory space. While most | ||
6 | modern devices allow relocation of such ranges, some "legacy" VGA devices | ||
7 | implemented on PCI will typically have the same "hard-decoded" addresses as | ||
8 | they did on ISA. For more details see "PCI Bus Binding to IEEE Std 1275-1994 | ||
9 | Standard for Boot (Initialization Configuration) Firmware Revision 2.1" | ||
10 | Section 7, Legacy Devices. | ||
11 | |||
12 | The Resource Access Control (RAC) module inside the X server [0] existed for | ||
13 | the legacy VGA arbitration task (besides other bus management tasks) when more | ||
14 | than one legacy device co-exists on the same machine. But problems arise | ||
15 | when these devices are accessed by different userspace clients | ||
16 | (e.g. two servers in parallel): their address assignments conflict. Moreover, | ||
17 | as a userspace application, the X server ideally should not be the one | ||
18 | controlling bus resources. Therefore an arbitration scheme outside of | ||
19 | the X server is needed to control the sharing of these resources. This | ||
20 | document introduces the operation of the VGA arbiter implemented for the Linux | ||
21 | kernel. | ||
22 | |||
23 | ---------------------------------------------------------------------------- | ||
24 | |||
25 | I. Details and Theory of Operation | ||
26 | I.1 vgaarb | ||
27 | I.2 libpciaccess | ||
28 | I.3 xf86VGAArbiter (X server implementation) | ||
29 | II. Credits | ||
30 | III. References | ||
31 | |||
32 | |||
33 | I. Details and Theory of Operation | ||
34 | ================================== | ||
35 | |||
36 | I.1 vgaarb | ||
37 | ---------- | ||
38 | |||
39 | The vgaarb is a module of the Linux kernel. When it is initially loaded, it | ||
40 | scans all PCI devices and adds the VGA ones to the arbitration. The | ||
41 | arbiter then enables/disables decoding of the legacy VGA instructions on the | ||
42 | different devices. Devices which do not want/need to use the arbiter may | ||
43 | explicitly tell it by calling vga_set_legacy_decoding(). | ||
44 | |||
45 | The kernel exports a char device interface (/dev/vga_arbiter) to the clients, | ||
46 | which has the following semantics: | ||
47 | |||
48 | open : open user instance of the arbiter. By default, it's attached to | ||
49 | the default VGA device of the system. | ||
50 | |||
51 | close : close the user instance and release locks made by the user. | ||
52 | |||
53 | read : return a string indicating the status of the target like: | ||
54 | |||
55 | "<card_ID>,decodes=<io_state>,owns=<io_state>,locks=<io_state> (ic,mc)" | ||
56 | |||
57 | An IO state string is of the form {io,mem,io+mem,none}; mc and | ||
58 | ic are the mem and io lock counts, respectively (for debugging/ | ||
59 | diagnostics only). "decodes" indicates what the card currently | ||
60 | decodes, "owns" indicates what is currently enabled on it, and | ||
61 | "locks" indicates what is locked by this card. If the card is | ||
62 | unplugged, card_ID is reported as "invalid" and an -ENODEV | ||
63 | error is returned for any command until a new card is targeted. | ||
64 | |||
65 | |||
66 | write : write a command to the arbiter. List of commands: | ||
67 | |||
68 | target <card_ID> : switch target to card <card_ID> (see below) | ||
69 | lock <io_state> : acquires locks on target ("none" is an invalid io_state) | ||
70 | trylock <io_state> : non-blocking acquire locks on target (returns EBUSY if | ||
71 | unsuccessful) | ||
72 | unlock <io_state> : release locks on target | ||
73 | unlock all : release all locks on target held by this user (not | ||
74 | implemented yet) | ||
75 | decodes <io_state> : set the legacy decoding attributes for the card | ||
76 | |||
77 | poll : signals an event if something changes on any card (not just the | ||
78 | target) | ||
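This small userspace sketch picks apart the status string returned by read. It assumes the string follows the format shown above: the card ID first, then comma-separated key=value fields, then the lock counts after a space. `vga_status_field()` and `vga_status_card()` are hypothetical helpers. The parser is deliberately lenient about trailing text such as a newline.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Copy the value of "key=" into out; values end at ',', ' ' or newline. */
static bool vga_status_field(const char *status, const char *key,
			     char *out, size_t outlen)
{
	char pattern[32];
	const char *p;
	size_t n;

	if (snprintf(pattern, sizeof(pattern), "%s=", key) >= (int)sizeof(pattern))
		return false;
	p = strstr(status, pattern);
	if (!p)
		return false;
	p += strlen(pattern);
	n = strcspn(p, ", \n");
	if (n >= outlen)
		return false;
	memcpy(out, p, n);
	out[n] = '\0';
	return true;
}

/* Copy the leading card ID (everything before the first comma) into out. */
static bool vga_status_card(const char *status, char *out, size_t outlen)
{
	size_t n = strcspn(status, ",");

	if (status[n] != ',' || n >= outlen)
		return false;
	memcpy(out, status, n);
	out[n] = '\0';
	return true;
}
```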
79 | |||
80 | card_ID is of the form "PCI:domain:bus:dev.fn". It can be set to "default" | ||
81 | to go back to the system default card (TODO: not implemented yet). Currently, | ||
82 | only PCI is supported as a prefix, but the userland API may support other bus | ||
83 | types in the future, even if the current kernel implementation doesn't. | ||
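Commands are plain strings written to the device. This sketch formats a card ID in the "PCI:domain:bus:dev.fn" form and sends one command per write() call. Two details are assumptions: the fields are hexadecimal and padded the way PCI slot names usually are (0000:01:00.0), and commands are sent without a trailing newline. The helper names are hypothetical. In real use, `fd` would come from `open("/dev/vga_arbiter", O_RDWR)`.

```c
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Format "PCI:domain:bus:dev.fn" into out; returns 0 on success. */
static int vga_card_id(char *out, size_t outlen, unsigned int domain,
		       unsigned int bus, unsigned int dev, unsigned int fn)
{
	int len = snprintf(out, outlen, "PCI:%04x:%02x:%02x.%x",
			   domain, bus, dev, fn);

	return (len < 0 || (size_t)len >= outlen) ? -1 : 0;
}

/* Send e.g. "target PCI:0000:01:00.0" or "lock io+mem"; 0 on success. */
static int vga_arb_send(int fd, const char *cmd, const char *arg)
{
	char buf[64];
	int len = snprintf(buf, sizeof(buf), "%s %s", cmd, arg);

	if (len < 0 || (size_t)len >= sizeof(buf))
		return -1;
	return write(fd, buf, (size_t)len) == (ssize_t)len ? 0 : -1;
}
```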
84 | |||
85 | Note about locks: | ||
86 | |||
87 | The driver keeps track of which user has which locks on which card. It | ||
88 | supports stacking, like the in-kernel API. This complicates the implementation | ||
89 | a bit, but makes the arbiter more tolerant to user space problems and able | ||
90 | to properly clean up in all cases when a process dies. | ||
91 | Currently, at most 16 cards can have locks simultaneously issued from | ||
92 | user space for a given user (file descriptor instance) of the arbiter. | ||
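The per-user bookkeeping described above can be sketched as a fixed table of 16 slots per file descriptor. Each slot holds stacked io and mem lock counts. All names are hypothetical, and card ID 0 is assumed to mean an unused slot. A real arbiter would also release the card-level locks when a slot is dropped.

```c
#include <stdbool.h>
#include <string.h>

#define MAX_USER_CARDS 16 /* per-user limit stated in the text */

/* Hypothetical per-card entry; card 0 marks an unused slot. */
struct user_card {
	unsigned int card;
	unsigned int io_cnt;  /* stacked io locks held by this user */
	unsigned int mem_cnt; /* stacked mem locks held by this user */
};

/* Hypothetical per-user (per file descriptor) state. */
struct arb_user {
	struct user_card cards[MAX_USER_CARDS];
};

/* Find the slot for card (card != 0), optionally claiming a free one. */
static struct user_card *user_slot(struct arb_user *u, unsigned int card,
				   bool create)
{
	struct user_card *free_slot = NULL;

	for (int i = 0; i < MAX_USER_CARDS; i++) {
		if (u->cards[i].card == card)
			return &u->cards[i];
		if (!free_slot && u->cards[i].card == 0)
			free_slot = &u->cards[i];
	}
	if (create && free_slot) {
		free_slot->card = card;
		return free_slot;
	}
	return NULL;
}

/* Record one more (stacked) lock; fails once 16 cards are tracked. */
static bool user_lock(struct arb_user *u, unsigned int card, bool io, bool mem)
{
	struct user_card *c = user_slot(u, card, true);

	if (!c)
		return false;
	if (io)
		c->io_cnt++;
	if (mem)
		c->mem_cnt++;
	return true;
}

/* Drop one level of stacking; the slot is freed when both counts hit 0. */
static bool user_unlock(struct arb_user *u, unsigned int card, bool io, bool mem)
{
	struct user_card *c = user_slot(u, card, false);

	if (!c || (io && !c->io_cnt) || (mem && !c->mem_cnt))
		return false;
	if (io)
		c->io_cnt--;
	if (mem)
		c->mem_cnt--;
	if (!c->io_cnt && !c->mem_cnt)
		c->card = 0;
	return true;
}

/* On close (or process death): forget every lock; returns cards released. */
static int user_release_all(struct arb_user *u)
{
	int released = 0;

	for (int i = 0; i < MAX_USER_CARDS; i++) {
		if (u->cards[i].card)
			released++; /* a real arbiter would unlock the card here */
	}
	memset(u, 0, sizeof(*u));
	return released;
}
```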
93 | |||
94 | For hot-plugged and hot-unplugged devices, there is a hook, pci_notify(), | ||
95 | that is called when devices are added to or removed from the system, so | ||
96 | that they are automatically added to or removed from the arbiter. | ||
97 | |||
98 | There is also an in-kernel API of the arbiter for DRM, vgacon and | ||
99 | other users of the arbiter. | ||
100 | |||
101 | |||
102 | I.2 libpciaccess | ||
103 | ---------------- | ||
104 | |||
105 | To use the VGA arbiter char device, an API was implemented inside the | ||
106 | libpciaccess library. One field was added to struct pci_device (one per device | ||
107 | on the system): | ||
108 | |||
109 | /* the type of resource decoded by the device */ | ||
110 | int vgaarb_rsrc; | ||
111 | |||
112 | In addition, the following fields were added to pci_system: | ||
113 | |||
114 | int vgaarb_fd; | ||
115 | int vga_count; | ||
116 | struct pci_device *vga_target; | ||
117 | struct pci_device *vga_default_dev; | ||
118 | |||
119 | |||
120 | vga_count keeps track of how many cards are being arbitrated; for | ||
121 | instance, if there is only one, the client can skip the arbitration | ||
122 | scheme entirely. | ||
123 | |||
124 | |||
125 | The functions below acquire VGA resources for the given card and mark those | ||
126 | resources as locked. If the resources requested are "normal" (and not legacy) | ||
127 | resources, the arbiter will first check whether the card is doing legacy | ||
128 | decoding for that type of resource. If yes, the lock is "converted" into a | ||
129 | legacy resource lock. The arbiter will first look for all VGA cards that | ||
130 | might conflict and disable their IOs and/or Memory access, including VGA | ||
131 | forwarding on P2P bridges if necessary, so that the requested resources can | ||
132 | be used. Then, the card is marked as locking these resources and the IO and/or | ||
133 | Memory access is enabled on the card (including VGA forwarding on parent | ||
134 | P2P bridges if any). In the case of vga_arb_lock(), the function will block | ||
135 | if some conflicting card is already locking one of the required resources (or | ||
136 | any resource on a different bus segment, since P2P bridges don't differentiate | ||
137 | VGA memory and IO, as far as we know). If the card already owns the resources, the function | ||
138 | succeeds. vga_arb_trylock() will return (-EBUSY) instead of blocking. Nested | ||
139 | calls are supported (a per-resource counter is maintained). | ||
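The normal-to-legacy conversion can be sketched as a bit-mask operation. The resource bit values below are assumptions chosen for illustration. The sketch models "converted" as adding the legacy bit alongside the normal one, so the card is also arbitrated for the legacy range it still decodes.

```c
/* Illustrative resource bits (assumed values). */
#define RSRC_NONE       0x00u
#define RSRC_LEGACY_IO  0x01u
#define RSRC_LEGACY_MEM 0x02u
#define RSRC_NORMAL_IO  0x04u
#define RSRC_NORMAL_MEM 0x08u

/*
 * requested: resources the caller asks to lock.
 * decodes:   legacy ranges the card still decodes.
 * Returns the set that actually has to be locked.
 */
static unsigned int effective_lock(unsigned int requested, unsigned int decodes)
{
	unsigned int rsrc = requested;

	if ((rsrc & RSRC_NORMAL_IO) && (decodes & RSRC_LEGACY_IO))
		rsrc |= RSRC_LEGACY_IO;
	if ((rsrc & RSRC_NORMAL_MEM) && (decodes & RSRC_LEGACY_MEM))
		rsrc |= RSRC_LEGACY_MEM;
	return rsrc;
}
```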
140 | |||
141 | |||
142 | Set the target device of this client. | ||
143 | int pci_device_vgaarb_set_target (struct pci_device *dev); | ||
144 | |||
145 | |||
146 | For instance, on x86, if two devices on the same bus want to lock different | ||
147 | resources, both will succeed. If the devices are on different buses and | ||
148 | try to lock different resources, only the first to try succeeds. | ||
149 | int pci_device_vgaarb_lock (void); | ||
150 | int pci_device_vgaarb_trylock (void); | ||
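The x86 example above can be modelled as a conflict check between an existing lock and a new request. `struct vga_lock_state` and `vga_lock_conflicts()` are hypothetical, and the model ignores the case where the requesting card already owns the resources. A conflict means lock would block and trylock would return -EBUSY.

```c
#include <stdbool.h>

/* Hypothetical, simplified model of a lock held by another card. */
struct vga_lock_state {
	int bus;           /* bus number the holding card sits on */
	unsigned int rsrc; /* resources it holds (bit mask) */
};

/*
 * Same bus: only overlapping resources conflict. Different bus: any held
 * resource conflicts, since a P2P bridge does not forward VGA IO and
 * memory separately.
 */
static bool vga_lock_conflicts(const struct vga_lock_state *held,
			       int req_bus, unsigned int req_rsrc)
{
	if (held->rsrc == 0)
		return false; /* nothing held, nothing to wait for */
	if (held->bus == req_bus)
		return (held->rsrc & req_rsrc) != 0;
	return true;
}
```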
151 | |||
152 | Unlock the resources of the target device. | ||
153 | int pci_device_vgaarb_unlock (void); | ||
154 | |||
155 | Indicates to the arbiter whether the card decodes legacy VGA IOs, legacy VGA | ||
156 | Memory, both, or none. All cards default to both; the card driver (fbdev, for | ||
157 | example) should tell the arbiter if it has disabled legacy decoding, so the | ||
158 | card can be left out of the arbitration process (and can safely take | ||
159 | interrupts at any time). | ||
160 | int pci_device_vgaarb_decodes (int new_vgaarb_rsrc); | ||
161 | |||
162 | Connects to the arbiter device, allocates the struct | ||
163 | int pci_device_vgaarb_init (void); | ||
164 | |||
165 | Close the connection | ||
166 | void pci_device_vgaarb_fini (void); | ||
167 | |||
168 | |||
169 | I.3 xf86VGAArbiter (X server implementation) | ||
170 | -------------------------------------------- | ||
171 | |||
172 | (TODO) | ||
173 | |||
174 | The X server basically wraps all the functions that touch VGA registers. | ||
175 | |||
176 | |||
177 | II. Credits | ||
178 | =========== | ||
179 | |||
180 | Benjamin Herrenschmidt (IBM?) started this work when he discussed such a design | ||
181 | with the Xorg community in 2005 [1, 2]. At the end of 2007, Paulo Zanoni and | ||
182 | Tiago Vignatti (both of C3SL/Federal University of Paraná) continued his work, | ||
183 | adapting the kernel code into a kernel module, and also | ||
184 | implemented the user space side [3]. Now (2009) Tiago Vignatti and Dave | ||
185 | Airlie have finally put this work into shape and queued it for Jesse Barnes' PCI tree. | ||
186 | |||
187 | |||
188 | III. References | ||
189 | =============== | ||
190 | |||
191 | [0] http://cgit.freedesktop.org/xorg/xserver/commit/?id=4b42448a2388d40f257774fbffdccaea87bd0347 | ||
192 | [1] http://lists.freedesktop.org/archives/xorg/2005-March/006663.html | ||
193 | [2] http://lists.freedesktop.org/archives/xorg/2005-March/006745.html | ||
194 | [3] http://lists.freedesktop.org/archives/xorg/2007-October/029507.html | ||
diff --git a/Documentation/video4linux/CARDLIST.cx23885 b/Documentation/video4linux/CARDLIST.cx23885 index 450b8f8c389b..525edb37c758 100644 --- a/Documentation/video4linux/CARDLIST.cx23885 +++ b/Documentation/video4linux/CARDLIST.cx23885 | |||
@@ -21,3 +21,5 @@ | |||
21 | 20 -> Hauppauge WinTV-HVR1255 [0070:2251] | 21 | 20 -> Hauppauge WinTV-HVR1255 [0070:2251] |
22 | 21 -> Hauppauge WinTV-HVR1210 [0070:2291,0070:2295] | 22 | 21 -> Hauppauge WinTV-HVR1210 [0070:2291,0070:2295] |
23 | 22 -> Mygica X8506 DMB-TH [14f1:8651] | 23 | 22 -> Mygica X8506 DMB-TH [14f1:8651] |
24 | 23 -> Magic-Pro ProHDTV Extreme 2 [14f1:8657] | ||
25 | 24 -> Hauppauge WinTV-HVR1850 [0070:8541] | ||
diff --git a/Documentation/video4linux/CARDLIST.cx88 b/Documentation/video4linux/CARDLIST.cx88 index 0736518b2f88..3385f8b094a5 100644 --- a/Documentation/video4linux/CARDLIST.cx88 +++ b/Documentation/video4linux/CARDLIST.cx88 | |||
@@ -80,3 +80,4 @@ | |||
80 | 79 -> Terratec Cinergy HT PCI MKII [153b:1177] | 80 | 79 -> Terratec Cinergy HT PCI MKII [153b:1177] |
81 | 80 -> Hauppauge WinTV-IR Only [0070:9290] | 81 | 80 -> Hauppauge WinTV-IR Only [0070:9290] |
82 | 81 -> Leadtek WinFast DTV1800 Hybrid [107d:6654] | 82 | 81 -> Leadtek WinFast DTV1800 Hybrid [107d:6654] |
83 | 82 -> WinFast DTV2000 H rev. J [107d:6f2b] | ||
diff --git a/Documentation/video4linux/CARDLIST.em28xx b/Documentation/video4linux/CARDLIST.em28xx index e352d754875c..b13fcbd5d94b 100644 --- a/Documentation/video4linux/CARDLIST.em28xx +++ b/Documentation/video4linux/CARDLIST.em28xx | |||
@@ -7,7 +7,7 @@ | |||
7 | 6 -> Terratec Cinergy 200 USB (em2800) | 7 | 6 -> Terratec Cinergy 200 USB (em2800) |
8 | 7 -> Leadtek Winfast USB II (em2800) [0413:6023] | 8 | 7 -> Leadtek Winfast USB II (em2800) [0413:6023] |
9 | 8 -> Kworld USB2800 (em2800) | 9 | 8 -> Kworld USB2800 (em2800) |
10 | 9 -> Pinnacle Dazzle DVC 90/100/101/107 / Kaiser Baas Video to DVD maker (em2820/em2840) [1b80:e302,2304:0207,2304:021a] | 10 | 9 -> Pinnacle Dazzle DVC 90/100/101/107 / Kaiser Baas Video to DVD maker (em2820/em2840) [1b80:e302,1b80:e304,2304:0207,2304:021a] |
11 | 10 -> Hauppauge WinTV HVR 900 (em2880) [2040:6500] | 11 | 10 -> Hauppauge WinTV HVR 900 (em2880) [2040:6500] |
12 | 11 -> Terratec Hybrid XS (em2880) [0ccd:0042] | 12 | 11 -> Terratec Hybrid XS (em2880) [0ccd:0042] |
13 | 12 -> Kworld PVR TV 2800 RF (em2820/em2840) | 13 | 12 -> Kworld PVR TV 2800 RF (em2820/em2840) |
@@ -33,7 +33,7 @@ | |||
33 | 34 -> Terratec Cinergy A Hybrid XS (em2860) [0ccd:004f] | 33 | 34 -> Terratec Cinergy A Hybrid XS (em2860) [0ccd:004f] |
34 | 35 -> Typhoon DVD Maker (em2860) | 34 | 35 -> Typhoon DVD Maker (em2860) |
35 | 36 -> NetGMBH Cam (em2860) | 35 | 36 -> NetGMBH Cam (em2860) |
36 | 37 -> Gadmei UTV330 (em2860) | 36 | 37 -> Gadmei UTV330 (em2860) [eb1a:50a6] |
37 | 38 -> Yakumo MovieMixer (em2861) | 37 | 38 -> Yakumo MovieMixer (em2861) |
38 | 39 -> KWorld PVRTV 300U (em2861) [eb1a:e300] | 38 | 39 -> KWorld PVRTV 300U (em2861) [eb1a:e300] |
39 | 40 -> Plextor ConvertX PX-TV100U (em2861) [093b:a005] | 39 | 40 -> Plextor ConvertX PX-TV100U (em2861) [093b:a005] |
@@ -67,3 +67,4 @@ | |||
67 | 69 -> KWorld ATSC 315U HDTV TV Box (em2882) [eb1a:a313] | 67 | 69 -> KWorld ATSC 315U HDTV TV Box (em2882) [eb1a:a313] |
68 | 70 -> Evga inDtube (em2882) | 68 | 70 -> Evga inDtube (em2882) |
69 | 71 -> Silvercrest Webcam 1.3mpix (em2820/em2840) | 69 | 71 -> Silvercrest Webcam 1.3mpix (em2820/em2840) |
70 | 72 -> Gadmei UTV330+ (em2861) | ||
diff --git a/Documentation/video4linux/CARDLIST.saa7134 b/Documentation/video4linux/CARDLIST.saa7134 index c913e5614195..0ac4d2544778 100644 --- a/Documentation/video4linux/CARDLIST.saa7134 +++ b/Documentation/video4linux/CARDLIST.saa7134 | |||
@@ -167,3 +167,7 @@ | |||
167 | 166 -> Beholder BeholdTV 607 RDS [5ace:6073] | 167 | 166 -> Beholder BeholdTV 607 RDS [5ace:6073] |
168 | 167 -> Beholder BeholdTV 609 RDS [5ace:6092] | 168 | 167 -> Beholder BeholdTV 609 RDS [5ace:6092] |
169 | 168 -> Beholder BeholdTV 609 RDS [5ace:6093] | 169 | 168 -> Beholder BeholdTV 609 RDS [5ace:6093] |
170 | 169 -> Compro VideoMate S350/S300 [185b:c900] | ||
171 | 170 -> AverMedia AverTV Studio 505 [1461:a115] | ||
172 | 171 -> Beholder BeholdTV X7 [5ace:7595] | ||
173 | 172 -> RoverMedia TV Link Pro FM [19d1:0138] | ||
diff --git a/Documentation/video4linux/CARDLIST.tuner b/Documentation/video4linux/CARDLIST.tuner index be67844074dd..ba9fa679e2d3 100644 --- a/Documentation/video4linux/CARDLIST.tuner +++ b/Documentation/video4linux/CARDLIST.tuner | |||
@@ -78,3 +78,4 @@ tuner=77 - TCL tuner MF02GIP-5N-E | |||
78 | tuner=78 - Philips FMD1216MEX MK3 Hybrid Tuner | 78 | tuner=78 - Philips FMD1216MEX MK3 Hybrid Tuner |
79 | tuner=79 - Philips PAL/SECAM multi (FM1216 MK5) | 79 | tuner=79 - Philips PAL/SECAM multi (FM1216 MK5) |
80 | tuner=80 - Philips FQ1216LME MK3 PAL/SECAM w/active loopthrough | 80 | tuner=80 - Philips FQ1216LME MK3 PAL/SECAM w/active loopthrough |
81 | tuner=81 - Partsnic (Daewoo) PTI-5NF05 | ||
diff --git a/Documentation/video4linux/CQcam.txt b/Documentation/video4linux/CQcam.txt index 04986efb731c..d230878e473e 100644 --- a/Documentation/video4linux/CQcam.txt +++ b/Documentation/video4linux/CQcam.txt | |||
@@ -18,8 +18,8 @@ Table of Contents | |||
18 | 18 | ||
19 | 1.0 Introduction | 19 | 1.0 Introduction |
20 | 20 | ||
21 | The file ../drivers/char/c-qcam.c is a device driver for the | 21 | The file ../../drivers/media/video/c-qcam.c is a device driver for |
22 | Logitech (nee Connectix) parallel port interface color CCD camera. | 22 | the Logitech (nee Connectix) parallel port interface color CCD camera. |
23 | This is a fairly inexpensive device for capturing images. Logitech | 23 | This is a fairly inexpensive device for capturing images. Logitech |
24 | does not currently provide information for developers, but many people | 24 | does not currently provide information for developers, but many people |
25 | have engineered several solutions for non-Microsoft use of the Color | 25 | have engineered several solutions for non-Microsoft use of the Color |
diff --git a/Documentation/video4linux/gspca.txt b/Documentation/video4linux/gspca.txt index 573f95b58807..4686e84dd800 100644 --- a/Documentation/video4linux/gspca.txt +++ b/Documentation/video4linux/gspca.txt | |||
@@ -140,6 +140,7 @@ spca500 04fc:7333 PalmPixDC85 | |||
140 | sunplus 04fc:ffff Pure DigitalDakota | 140 | sunplus 04fc:ffff Pure DigitalDakota |
141 | spca501 0506:00df 3Com HomeConnect Lite | 141 | spca501 0506:00df 3Com HomeConnect Lite |
142 | sunplus 052b:1513 Megapix V4 | 142 | sunplus 052b:1513 Megapix V4 |
143 | sunplus 052b:1803 MegaImage VI | ||
143 | tv8532 0545:808b Veo Stingray | 144 | tv8532 0545:808b Veo Stingray |
144 | tv8532 0545:8333 Veo Stingray | 145 | tv8532 0545:8333 Veo Stingray |
145 | sunplus 0546:3155 Polaroid PDC3070 | 146 | sunplus 0546:3155 Polaroid PDC3070 |
@@ -182,6 +183,7 @@ ov534 06f8:3002 Hercules Blog Webcam | |||
182 | ov534 06f8:3003 Hercules Dualpix HD Weblog | 183 | ov534 06f8:3003 Hercules Dualpix HD Weblog |
183 | sonixj 06f8:3004 Hercules Classic Silver | 184 | sonixj 06f8:3004 Hercules Classic Silver |
184 | sonixj 06f8:3008 Hercules Deluxe Optical Glass | 185 | sonixj 06f8:3008 Hercules Deluxe Optical Glass |
186 | pac7311 06f8:3009 Hercules Classic Link | ||
185 | spca508 0733:0110 ViewQuest VQ110 | 187 | spca508 0733:0110 ViewQuest VQ110 |
186 | spca508 0130:0130 Clone Digital Webcam 11043 | 188 | spca508 0130:0130 Clone Digital Webcam 11043 |
187 | spca501 0733:0401 Intel Create and Share | 189 | spca501 0733:0401 Intel Create and Share |
@@ -235,8 +237,10 @@ pac7311 093a:2621 PAC731x | |||
235 | pac7311 093a:2622 Genius Eye 312 | 237 | pac7311 093a:2622 Genius Eye 312 |
236 | pac7311 093a:2624 PAC7302 | 238 | pac7311 093a:2624 PAC7302 |
237 | pac7311 093a:2626 Labtec 2200 | 239 | pac7311 093a:2626 Labtec 2200 |
240 | pac7311 093a:2629 Genious iSlim 300 | ||
238 | pac7311 093a:262a Webcam 300k | 241 | pac7311 093a:262a Webcam 300k |
239 | pac7311 093a:262c Philips SPC 230 NC | 242 | pac7311 093a:262c Philips SPC 230 NC |
243 | jeilinj 0979:0280 Sakar 57379 | ||
240 | zc3xx 0ac8:0302 Z-star Vimicro zc0302 | 244 | zc3xx 0ac8:0302 Z-star Vimicro zc0302 |
241 | vc032x 0ac8:0321 Vimicro generic vc0321 | 245 | vc032x 0ac8:0321 Vimicro generic vc0321 |
242 | vc032x 0ac8:0323 Vimicro Vc0323 | 246 | vc032x 0ac8:0323 Vimicro Vc0323 |
@@ -247,6 +251,7 @@ zc3xx 0ac8:305b Z-star Vimicro zc0305b | |||
247 | zc3xx 0ac8:307b Ldlc VC302+Ov7620 | 251 | zc3xx 0ac8:307b Ldlc VC302+Ov7620 |
248 | vc032x 0ac8:c001 Sony embedded vimicro | 252 | vc032x 0ac8:c001 Sony embedded vimicro |
249 | vc032x 0ac8:c002 Sony embedded vimicro | 253 | vc032x 0ac8:c002 Sony embedded vimicro |
254 | vc032x 0ac8:c301 Samsung Q1 Ultra Premium | ||
250 | spca508 0af9:0010 Hama USB Sightcam 100 | 255 | spca508 0af9:0010 Hama USB Sightcam 100 |
251 | spca508 0af9:0011 Hama USB Sightcam 100 | 256 | spca508 0af9:0011 Hama USB Sightcam 100 |
252 | sonixb 0c45:6001 Genius VideoCAM NB | 257 | sonixb 0c45:6001 Genius VideoCAM NB |
@@ -284,6 +289,7 @@ sonixj 0c45:613a Microdia Sonix PC Camera | |||
284 | sonixj 0c45:613b Surfer SN-206 | 289 | sonixj 0c45:613b Surfer SN-206 |
285 | sonixj 0c45:613c Sonix Pccam168 | 290 | sonixj 0c45:613c Sonix Pccam168 |
286 | sonixj 0c45:6143 Sonix Pccam168 | 291 | sonixj 0c45:6143 Sonix Pccam168 |
292 | sonixj 0c45:6148 Digitus DA-70811/ZSMC USB PC Camera ZS211/Microdia | ||
287 | sn9c20x 0c45:6240 PC Camera (SN9C201 + MT9M001) | 293 | sn9c20x 0c45:6240 PC Camera (SN9C201 + MT9M001) |
288 | sn9c20x 0c45:6242 PC Camera (SN9C201 + MT9M111) | 294 | sn9c20x 0c45:6242 PC Camera (SN9C201 + MT9M111) |
289 | sn9c20x 0c45:6248 PC Camera (SN9C201 + OV9655) | 295 | sn9c20x 0c45:6248 PC Camera (SN9C201 + OV9655) |
diff --git a/Documentation/video4linux/si4713.txt b/Documentation/video4linux/si4713.txt new file mode 100644 index 000000000000..25abdb78209d --- /dev/null +++ b/Documentation/video4linux/si4713.txt | |||
@@ -0,0 +1,176 @@ | |||
1 | Driver for I2C radios for the Silicon Labs Si4713 FM Radio Transmitters | ||
2 | |||
3 | Copyright (c) 2009 Nokia Corporation | ||
4 | Contact: Eduardo Valentin <eduardo.valentin@nokia.com> | ||
5 | |||
6 | |||
7 | Information about the Device | ||
8 | ============================ | ||
9 | This chip is a Silicon Labs product. It is an I2C device, currently at address 0x63. | ||
10 | Basically, it has transmission and signal noise level measurement features. | ||
11 | |||
12 | The Si4713 integrates transmit functions for FM broadcast stereo transmission. | ||
13 | The chip also allows integrated receive power scanning to identify low signal | ||
14 | power FM channels. | ||
15 | |||
16 | The chip is programmed using commands and responses. There are also several | ||
17 | properties which can change the behavior of this chip. | ||
18 | |||
19 | Users must comply with local regulations on radio frequency (RF) transmission. | ||
20 | |||
21 | Device driver description | ||
22 | ========================= | ||
23 | There are two modules to handle this device. One is an I2C device driver | ||
24 | and the other is a platform driver. | ||
25 | |||
26 | The I2C device driver exports a v4l2-subdev interface to the kernel. | ||
27 | All properties can also be accessed through the v4l2 extended controls interface, | ||
28 | using the v4l2-subdev calls (g_ext_ctrls, s_ext_ctrls). | ||
29 | |||
30 | The platform device driver exports a v4l2 radio device interface to user land. | ||
31 | It uses the I2C device driver as a sub-device in order to send the user | ||
32 | commands to the actual device; basically, it is a wrapper around the I2C device driver. | ||
33 | |||
34 | Applications can use the v4l2 radio API to set the operating frequency, mute state, ||
35 | etc. Most of the device properties, however, are exposed as extended controls. ||
36 | |||
37 | When the v4l2 mute property is set to 1 (true), the driver will turn the chip off. | ||
38 | |||
39 | Properties description | ||
40 | ====================== | ||
41 | |||
42 | The properties can be accessed using v4l2 extended controls. | ||
43 | Here is the output of the v4l2-ctl utility: ||
44 | / # v4l2-ctl -d /dev/radio0 --all -L | ||
45 | Driver Info: | ||
46 | Driver name : radio-si4713 | ||
47 | Card type : Silicon Labs Si4713 Modulator | ||
48 | Bus info : | ||
49 | Driver version: 0 | ||
50 | Capabilities : 0x00080800 | ||
51 | RDS Output | ||
52 | Modulator | ||
53 | Audio output: 0 (FM Modulator Audio Out) | ||
54 | Frequency: 1408000 (88.000000 MHz) | ||
55 | Video Standard = 0x00000000 | ||
56 | Modulator: | ||
57 | Name : FM Modulator | ||
58 | Capabilities : 62.5 Hz stereo rds | ||
59 | Frequency range : 76.0 MHz - 108.0 MHz | ||
60 | Subchannel modulation: stereo+rds | ||
61 | |||
62 | User Controls | ||
63 | |||
64 | mute (bool) : default=1 value=0 | ||
65 | |||
66 | FM Radio Modulator Controls | ||
67 | |||
68 | rds_signal_deviation (int) : min=0 max=90000 step=10 default=200 value=200 flags=slider | ||
69 | rds_program_id (int) : min=0 max=65535 step=1 default=0 value=0 | ||
70 | rds_program_type (int) : min=0 max=31 step=1 default=0 value=0 | ||
71 | rds_ps_name (str) : min=0 max=96 step=8 value='si4713 ' | ||
72 | rds_radio_text (str) : min=0 max=384 step=32 value='' | ||
73 | audio_limiter_feature_enabled (bool) : default=1 value=1 | ||
74 | audio_limiter_release_time (int) : min=250 max=102390 step=50 default=5010 value=5010 flags=slider | ||
75 | audio_limiter_deviation (int) : min=0 max=90000 step=10 default=66250 value=66250 flags=slider | ||
76 | audio_compression_feature_enabl (bool) : default=1 value=1 | ||
77 | audio_compression_gain (int) : min=0 max=20 step=1 default=15 value=15 flags=slider | ||
78 | audio_compression_threshold (int) : min=-40 max=0 step=1 default=-40 value=-40 flags=slider | ||
79 | audio_compression_attack_time (int) : min=0 max=5000 step=500 default=0 value=0 flags=slider | ||
80 | audio_compression_release_time (int) : min=100000 max=1000000 step=100000 default=1000000 value=1000000 flags=slider | ||
81 | pilot_tone_feature_enabled (bool) : default=1 value=1 | ||
82 | pilot_tone_deviation (int) : min=0 max=90000 step=10 default=6750 value=6750 flags=slider | ||
83 | pilot_tone_frequency (int) : min=0 max=19000 step=1 default=19000 value=19000 flags=slider | ||
84 | pre_emphasis_settings (menu) : min=0 max=2 default=1 value=1 | ||
85 | tune_power_level (int) : min=0 max=120 step=1 default=88 value=88 flags=slider | ||
86 | tune_antenna_capacitor (int) : min=0 max=191 step=1 default=0 value=110 flags=slider | ||
87 | / # | ||
88 | |||
89 | Here is a summary of them: | ||
90 | |||
91 | * Pilot is the stereo pilot tone transmitted by the device; receivers use it to decode stereo. ||
92 | |||
93 | pilot_frequency - Configures the frequency of the stereo pilot tone. | ||
94 | pilot_deviation - Configures pilot tone frequency deviation level. | ||
95 | pilot_enabled - Enables or disables the pilot tone feature. | ||
96 | |||
97 | * The si4713 device is capable of applying audio compression to the transmitted signal. | ||
98 | |||
99 | acomp_enabled - Enables or disables the audio dynamic range control feature. | ||
100 | acomp_gain - Sets the gain for audio dynamic range control. | ||
101 | acomp_threshold - Sets the threshold level for audio dynamic range control. | ||
102 | acomp_attack_time - Sets the attack time for audio dynamic range control. | ||
103 | acomp_release_time - Sets the release time for audio dynamic range control. | ||
104 | |||
105 | * Limiter configures the audio deviation limiter feature. When over-deviation occurs, ||
106 | the front-end gain of the audio input can be adjusted so that over-deviation ||
107 | is always prevented. ||
108 | |||
109 | limiter_enabled - Enables or disables the limiter feature. | ||
110 | limiter_deviation - Configures audio frequency deviation level. | ||
111 | limiter_release_time - Sets the limiter release time. | ||
112 | |||
113 | * Tuning power | ||
114 | |||
115 | power_level - Sets the output power level for signal transmission. | ||
116 | antenna_capacitor - Selects the antenna tuning capacitor value manually, ||
117 | or automatically if set to zero. ||
118 | |||
119 | * RDS related | ||
120 | |||
121 | rds_ps_name - Sets the RDS PS (Program Service) name for transmission. ||
122 | rds_radio_text - Sets the RDS radio text for transmission. ||
123 | rds_pi - Sets the RDS PI (Program Identification) code for transmission. ||
124 | rds_pty - Sets the RDS PTY (Program Type) field for transmission. ||
125 | |||
126 | * Region related | ||
127 | |||
128 | preemphasis - Sets the pre-emphasis applied for transmission: disabled, 50 us or 75 us. ||
129 | |||
130 | RNL | ||
131 | === | ||
132 | |||
133 | The device also provides an interface to measure the received noise level (RNL) at a ||
134 | given frequency. To do so, issue an ioctl on the device node, as in this example: ||
135 | |||
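136 | #include <fcntl.h> ||
137 | #include <stdio.h> ||
138 | #include <sys/ioctl.h> ||
139 | #include <unistd.h> ||
140 | #include <linux/videodev2.h>	/* BASE_VIDIOC_PRIVATE, used by si4713.h */ ||
141 | #include <media/si4713.h>	/* struct si4713_rnl, SI4713_IOC_MEASURE_RNL */ ||
142 | ||
143 | int main(int argc, char *argv[]) ||
144 | { ||
145 | 	struct si4713_rnl rnl = { 0 };	/* reserved fields must be zero */ ||
146 | 	int fd, rval; ||
147 | ||
148 | 	if (argc < 2 || sscanf(argv[1], "%u", &rnl.frequency) != 1) ||
149 | 		return 1;	/* argv[1]: frequency in 62.5 Hz units */ ||
150 | 	if ((fd = open("/dev/radio0", O_RDWR)) < 0) ||
151 | 		return 1; ||
152 | 	rval = ioctl(fd, SI4713_IOC_MEASURE_RNL, &rnl); ||
153 | 	if (rval == 0) ||
154 | 		printf("received noise level: %d\n", rnl.rnl); ||
155 | 	close(fd); ||
156 | 	return rval < 0 ? 1 : 0; ||
157 | } ||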
158 | |||
159 | The struct si4713_rnl and SI4713_IOC_MEASURE_RNL are defined under | ||
160 | include/media/si4713.h. | ||
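The frequency passed in rnl.frequency appears to use the same 62.5 Hz unit the driver reports (the v4l2-ctl output above shows "Capabilities : 62.5 Hz" and "Frequency: 1408000 (88.000000 MHz)"). Assuming that unit, 1 MHz corresponds to 1000000 / 62.5 = 16000 units, and a hypothetical conversion helper could look like this:

```c
#include <stdint.h>

/*
 * Hypothetical helper: convert MHz to 62.5 Hz units
 * (1 MHz = 16000 units), rounding to the nearest unit.
 */
uint32_t mhz_to_v4l2_low(double mhz)
{
	return (uint32_t)(mhz * 16000.0 + 0.5);
}
```

So measuring at 88.0 MHz means passing 1408000 on the command line.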
161 | |||
162 | Stereo/Mono and RDS subchannels | ||
163 | =============================== | ||
164 | |||
165 | The subchannels used for transmission (mono/stereo and RDS) can also be configured. ||
166 | To do so, use the VIDIOC_G_MODULATOR/VIDIOC_S_MODULATOR ioctls and set txsubchans accordingly. ||
167 | Refer to the V4L2 specification for the proper use of these ioctls. ||
168 | |||
169 | Testing | ||
170 | ======= | ||
171 | Testing is usually done with the v4l2-ctl utility for managing FM tuner cards. ||
172 | The tool can be found in the v4l-dvb repository, in the v4l2-apps/util directory. ||
173 | |||
174 | Example of setting the RDS PS name: ||
175 | # v4l2-ctl -d /dev/radio0 --set-ctrl=rds_ps_name="Dummy" | ||
176 | |||
diff --git a/Documentation/vm/slub.txt b/Documentation/vm/slub.txt index bb1f5c6e28b3..510917ff59ed 100644 --- a/Documentation/vm/slub.txt +++ b/Documentation/vm/slub.txt | |||
@@ -41,6 +41,8 @@ Possible debug options are | |||
41 | P Poisoning (object and padding) | 41 | P Poisoning (object and padding) |
42 | U User tracking (free and alloc) | 42 | U User tracking (free and alloc) |
43 | T Trace (please only use on single slabs) | 43 | T Trace (please only use on single slabs) |
44 | O Switch debugging off for caches that would have | ||
45 | caused higher minimum slab orders | ||
44 | - Switch all debugging off (useful if the kernel is | 46 | - Switch all debugging off (useful if the kernel is |
45 | configured with CONFIG_SLUB_DEBUG_ON) | 47 | configured with CONFIG_SLUB_DEBUG_ON) |
46 | 48 | ||
@@ -59,6 +61,14 @@ to the dentry cache with | |||
59 | 61 | ||
60 | slub_debug=F,dentry | 62 | slub_debug=F,dentry |
61 | 63 | ||
64 | Debugging options may require the minimum possible slab order to increase as | ||
65 | a result of storing the metadata (for example, caches with PAGE_SIZE object | ||
66 | sizes). This has a higher likelihood of resulting in slab allocation errors ||
67 | in low memory situations or if there's high fragmentation of memory. To | ||
68 | switch off debugging for such caches by default, use | ||
69 | |||
70 | slub_debug=O | ||
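To illustrate why PAGE_SIZE-sized objects are affected, here is a simplified model (not SLUB's actual order calculation, which also weighs how many objects it wants per slab). It assumes a 4096-byte page; min_slab_order() and SKETCH_PAGE_SIZE are hypothetical names. With a hypothetical 64 bytes of debug metadata, a 4096-byte object no longer fits in one page, so the minimum order rises from 0 to 1, i.e. every slab needs two contiguous pages.

```c
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096UL	/* assumed page size for this model */

/*
 * Hypothetical helper: smallest order such that one object of `size`
 * bytes fits in (SKETCH_PAGE_SIZE << order) bytes.
 */
unsigned int min_slab_order(size_t size)
{
	unsigned int order = 0;

	while ((SKETCH_PAGE_SIZE << order) < size)
		order++;
	return order;
}
```

Here min_slab_order(4096) is 0 while min_slab_order(4096 + 64) is 1; higher-order allocations are harder to satisfy when memory is fragmented.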
71 | |||
62 | In case you forgot to enable debugging on the kernel command line: It is | 72 | In case you forgot to enable debugging on the kernel command line: It is |
63 | possible to enable debugging manually when the kernel is up. Look at the | 73 | possible to enable debugging manually when the kernel is up. Look at the |
64 | contents of: | 74 | contents of: |
diff --git a/Documentation/x86/boot.txt b/Documentation/x86/boot.txt index 8da3a795083f..30b43e1b2697 100644 --- a/Documentation/x86/boot.txt +++ b/Documentation/x86/boot.txt | |||
@@ -599,6 +599,7 @@ Protocol: 2.07+ | |||
599 | 0x00000000 The default x86/PC environment | 599 | 0x00000000 The default x86/PC environment |
600 | 0x00000001 lguest | 600 | 0x00000001 lguest |
601 | 0x00000002 Xen | 601 | 0x00000002 Xen |
602 | 0x00000003 Moorestown MID | ||
602 | 603 | ||
603 | Field name: hardware_subarch_data | 604 | Field name: hardware_subarch_data |
604 | Type: write (subarch-dependent) | 605 | Type: write (subarch-dependent) |
diff --git a/Documentation/x86/zero-page.txt b/Documentation/x86/zero-page.txt index 4f913857b8a2..feb37e177010 100644 --- a/Documentation/x86/zero-page.txt +++ b/Documentation/x86/zero-page.txt | |||
@@ -12,6 +12,7 @@ Offset Proto Name Meaning | |||
12 | 000/040 ALL screen_info Text mode or frame buffer information | 12 | 000/040 ALL screen_info Text mode or frame buffer information |
13 | (struct screen_info) | 13 | (struct screen_info) |
14 | 040/014 ALL apm_bios_info APM BIOS information (struct apm_bios_info) | 14 | 040/014 ALL apm_bios_info APM BIOS information (struct apm_bios_info) |
15 | 058/008 ALL tboot_addr Physical address of tboot shared page | ||
15 | 060/010 ALL ist_info Intel SpeedStep (IST) BIOS support information | 16 | 060/010 ALL ist_info Intel SpeedStep (IST) BIOS support information |
16 | (struct ist_info) | 17 | (struct ist_info) |
17 | 080/010 ALL hd0_info hd0 disk parameter, OBSOLETE!! | 18 | 080/010 ALL hd0_info hd0 disk parameter, OBSOLETE!! |