aboutsummaryrefslogtreecommitdiffstats
path: root/Documentation
diff options
context:
space:
mode:
Diffstat (limited to 'Documentation')
-rw-r--r--Documentation/00-INDEX2
-rw-r--r--Documentation/ABI/testing/sysfs-bus-pci10
-rw-r--r--Documentation/DocBook/uio-howto.tmpl163
-rw-r--r--Documentation/PCI/pci-error-recovery.txt119
-rw-r--r--Documentation/RCU/RTFP.txt77
-rw-r--r--Documentation/RCU/UP.txt34
-rw-r--r--Documentation/RCU/checklist.txt20
-rw-r--r--Documentation/RCU/rcu.txt10
-rw-r--r--Documentation/RCU/rcubarrier.txt7
-rw-r--r--Documentation/RCU/torture.txt23
-rw-r--r--Documentation/RCU/trace.txt7
-rw-r--r--Documentation/RCU/whatisRCU.txt22
-rw-r--r--Documentation/arm/OMAP/omap_pm129
-rw-r--r--Documentation/arm/SA1100/ADSBitsy2
-rw-r--r--Documentation/arm/SA1100/Assabet2
-rw-r--r--Documentation/arm/SA1100/Brutus2
-rw-r--r--Documentation/arm/SA1100/GraphicsClient4
-rw-r--r--Documentation/arm/SA1100/GraphicsMaster4
-rw-r--r--Documentation/arm/SA1100/Victor2
-rw-r--r--Documentation/arm/Samsung-S3C24XX/CPUfreq.txt75
-rw-r--r--Documentation/btmrvl.txt119
-rw-r--r--Documentation/connector/Makefile5
-rw-r--r--Documentation/connector/cn_test.c33
-rw-r--r--Documentation/connector/connector.txt119
-rw-r--r--Documentation/connector/ucon.c62
-rw-r--r--Documentation/cpu-freq/user-guide.txt9
-rw-r--r--Documentation/dontdiff1
-rw-r--r--Documentation/feature-removal-schedule.txt115
-rw-r--r--Documentation/filesystems/9p.txt3
-rw-r--r--Documentation/filesystems/ext4.txt24
-rw-r--r--Documentation/filesystems/gfs2-uevents.txt100
-rw-r--r--Documentation/filesystems/nfs.txt98
-rw-r--r--Documentation/filesystems/seq_file.txt2
-rw-r--r--Documentation/flexible-arrays.txt99
-rw-r--r--Documentation/hwmon/pcf859128
-rw-r--r--Documentation/hwmon/tmp42136
-rw-r--r--Documentation/hwmon/wm831x37
-rw-r--r--Documentation/hwmon/wm835026
-rw-r--r--Documentation/input/sentelic.txt475
-rw-r--r--Documentation/intel_txt.txt210
-rw-r--r--Documentation/ioctl/ioctl-number.txt3
-rw-r--r--Documentation/kernel-doc-nano-HOWTO.txt4
-rw-r--r--Documentation/kernel-parameters.txt95
-rw-r--r--Documentation/keys.txt39
-rw-r--r--Documentation/kmemleak.txt31
-rw-r--r--Documentation/kref.txt1
-rw-r--r--Documentation/kvm/api.txt759
-rw-r--r--Documentation/networking/00-INDEX2
-rw-r--r--Documentation/networking/ieee802154.txt20
-rw-r--r--Documentation/networking/ip-sysctl.txt47
-rw-r--r--Documentation/power/runtime_pm.txt378
-rw-r--r--Documentation/s390/s390dbf.txt7
-rw-r--r--Documentation/sound/alsa/ALSA-Configuration.txt30
-rw-r--r--Documentation/sound/alsa/HD-Audio-Models.txt33
-rw-r--r--Documentation/sound/alsa/HD-Audio.txt64
-rw-r--r--Documentation/sysctl/kernel.txt16
-rw-r--r--Documentation/trace/events.txt217
-rw-r--r--Documentation/trace/ftrace-design.txt233
-rw-r--r--Documentation/trace/ftrace.txt74
-rw-r--r--Documentation/trace/function-graph-fold.vim42
-rw-r--r--Documentation/trace/ring-buffer-design.txt955
-rw-r--r--Documentation/vgaarbiter.txt194
-rw-r--r--Documentation/video4linux/CARDLIST.cx238852
-rw-r--r--Documentation/video4linux/CARDLIST.cx881
-rw-r--r--Documentation/video4linux/CARDLIST.em28xx5
-rw-r--r--Documentation/video4linux/CARDLIST.saa71344
-rw-r--r--Documentation/video4linux/CARDLIST.tuner1
-rw-r--r--Documentation/video4linux/CQcam.txt4
-rw-r--r--Documentation/video4linux/gspca.txt6
-rw-r--r--Documentation/video4linux/si4713.txt176
-rw-r--r--Documentation/vm/slub.txt10
-rw-r--r--Documentation/x86/zero-page.txt1
72 files changed, 5428 insertions, 341 deletions
diff --git a/Documentation/00-INDEX b/Documentation/00-INDEX
index d05737aaa84b..06b982affe76 100644
--- a/Documentation/00-INDEX
+++ b/Documentation/00-INDEX
@@ -82,6 +82,8 @@ block/
82 - info on the Block I/O (BIO) layer. 82 - info on the Block I/O (BIO) layer.
83blockdev/ 83blockdev/
84 - info on block devices & drivers 84 - info on block devices & drivers
85btmrvl.txt
86 - info on Marvell Bluetooth driver usage.
85cachetlb.txt 87cachetlb.txt
86 - describes the cache/TLB flushing interfaces Linux uses. 88 - describes the cache/TLB flushing interfaces Linux uses.
87cdrom/ 89cdrom/
diff --git a/Documentation/ABI/testing/sysfs-bus-pci b/Documentation/ABI/testing/sysfs-bus-pci
index 6bf68053e4b8..25be3250f7d6 100644
--- a/Documentation/ABI/testing/sysfs-bus-pci
+++ b/Documentation/ABI/testing/sysfs-bus-pci
@@ -84,6 +84,16 @@ Description:
84 from this part of the device tree. 84 from this part of the device tree.
85 Depends on CONFIG_HOTPLUG. 85 Depends on CONFIG_HOTPLUG.
86 86
87What: /sys/bus/pci/devices/.../reset
88Date: July 2009
89Contact: Michael S. Tsirkin <mst@redhat.com>
90Description:
91 Some devices allow an individual function to be reset
92 without affecting other functions in the same device.
93 For devices that have this support, a file named reset
94 will be present in sysfs. Writing 1 to this file
95 will perform reset.
96
87What: /sys/bus/pci/devices/.../vpd 97What: /sys/bus/pci/devices/.../vpd
88Date: February 2008 98Date: February 2008
89Contact: Ben Hutchings <bhutchings@solarflare.com> 99Contact: Ben Hutchings <bhutchings@solarflare.com>
diff --git a/Documentation/DocBook/uio-howto.tmpl b/Documentation/DocBook/uio-howto.tmpl
index 8f6e3b2403c7..4d4ce0e61e42 100644
--- a/Documentation/DocBook/uio-howto.tmpl
+++ b/Documentation/DocBook/uio-howto.tmpl
@@ -25,6 +25,10 @@
25 <year>2006-2008</year> 25 <year>2006-2008</year>
26 <holder>Hans-Jürgen Koch.</holder> 26 <holder>Hans-Jürgen Koch.</holder>
27</copyright> 27</copyright>
28<copyright>
29 <year>2009</year>
30 <holder>Red Hat Inc, Michael S. Tsirkin (mst@redhat.com)</holder>
31</copyright>
28 32
29<legalnotice> 33<legalnotice>
30<para> 34<para>
@@ -42,6 +46,13 @@ GPL version 2.
42 46
43<revhistory> 47<revhistory>
44 <revision> 48 <revision>
49 <revnumber>0.9</revnumber>
50 <date>2009-07-16</date>
51 <authorinitials>mst</authorinitials>
52 <revremark>Added generic pci driver
53 </revremark>
54 </revision>
55 <revision>
45 <revnumber>0.8</revnumber> 56 <revnumber>0.8</revnumber>
46 <date>2008-12-24</date> 57 <date>2008-12-24</date>
47 <authorinitials>hjk</authorinitials> 58 <authorinitials>hjk</authorinitials>
@@ -809,6 +820,158 @@ framework to set up sysfs files for this region. Simply leave it alone.
809 820
810</chapter> 821</chapter>
811 822
823<chapter id="uio_pci_generic" xreflabel="Using Generic driver for PCI cards">
824<?dbhtml filename="uio_pci_generic.html"?>
825<title>Generic PCI UIO driver</title>
826 <para>
827 The generic driver is a kernel module named uio_pci_generic.
828 It can work with any device compliant to PCI 2.3 (circa 2002) and
829 any compliant PCI Express device. Using this, you only need to
830 write the userspace driver, removing the need to write
831 a hardware-specific kernel module.
832 </para>
833
834<sect1 id="uio_pci_generic_binding">
835<title>Making the driver recognize the device</title>
836 <para>
837Since the driver does not declare any device ids, it will not get loaded
838automatically and will not automatically bind to any devices, you must load it
839and allocate id to the driver yourself. For example:
840 <programlisting>
841 modprobe uio_pci_generic
842 echo &quot;8086 10f5&quot; &gt; /sys/bus/pci/drivers/uio_pci_generic/new_id
843 </programlisting>
844 </para>
845 <para>
846If there already is a hardware specific kernel driver for your device, the
847generic driver still won't bind to it, in this case if you want to use the
848generic driver (why would you?) you'll have to manually unbind the hardware
849specific driver and bind the generic driver, like this:
850 <programlisting>
851 echo -n 0000:00:19.0 &gt; /sys/bus/pci/drivers/e1000e/unbind
852 echo -n 0000:00:19.0 &gt; /sys/bus/pci/drivers/uio_pci_generic/bind
853 </programlisting>
854 </para>
855 <para>
856You can verify that the device has been bound to the driver
857by looking for it in sysfs, for example like the following:
858 <programlisting>
859 ls -l /sys/bus/pci/devices/0000:00:19.0/driver
860 </programlisting>
861Which if successful should print
862 <programlisting>
863 .../0000:00:19.0/driver -&gt; ../../../bus/pci/drivers/uio_pci_generic
864 </programlisting>
865Note that the generic driver will not bind to old PCI 2.2 devices.
866If binding the device failed, run the following command:
867 <programlisting>
868 dmesg
869 </programlisting>
870and look in the output for failure reasons
871 </para>
872</sect1>
873
874<sect1 id="uio_pci_generic_internals">
875<title>Things to know about uio_pci_generic</title>
876 <para>
877Interrupts are handled using the Interrupt Disable bit in the PCI command
878register and Interrupt Status bit in the PCI status register. All devices
879compliant to PCI 2.3 (circa 2002) and all compliant PCI Express devices should
880support these bits. uio_pci_generic detects this support, and won't bind to
881devices which do not support the Interrupt Disable Bit in the command register.
882 </para>
883 <para>
884On each interrupt, uio_pci_generic sets the Interrupt Disable bit.
885This prevents the device from generating further interrupts
886until the bit is cleared. The userspace driver should clear this
887bit before blocking and waiting for more interrupts.
888 </para>
889</sect1>
890<sect1 id="uio_pci_generic_userspace">
891<title>Writing userspace driver using uio_pci_generic</title>
892 <para>
893Userspace driver can use pci sysfs interface, or the
894libpci libray that wraps it, to talk to the device and to
895re-enable interrupts by writing to the command register.
896 </para>
897</sect1>
898<sect1 id="uio_pci_generic_example">
899<title>Example code using uio_pci_generic</title>
900 <para>
901Here is some sample userspace driver code using uio_pci_generic:
902<programlisting>
903#include &lt;stdlib.h&gt;
904#include &lt;stdio.h&gt;
905#include &lt;unistd.h&gt;
906#include &lt;sys/types.h&gt;
907#include &lt;sys/stat.h&gt;
908#include &lt;fcntl.h&gt;
909#include &lt;errno.h&gt;
910
911int main()
912{
913 int uiofd;
914 int configfd;
915 int err;
916 int i;
917 unsigned icount;
918 unsigned char command_high;
919
920 uiofd = open(&quot;/dev/uio0&quot;, O_RDONLY);
921 if (uiofd &lt; 0) {
922 perror(&quot;uio open:&quot;);
923 return errno;
924 }
925 configfd = open(&quot;/sys/class/uio/uio0/device/config&quot;, O_RDWR);
926 if (uiofd &lt; 0) {
927 perror(&quot;config open:&quot;);
928 return errno;
929 }
930
931 /* Read and cache command value */
932 err = pread(configfd, &amp;command_high, 1, 5);
933 if (err != 1) {
934 perror(&quot;command config read:&quot;);
935 return errno;
936 }
937 command_high &amp;= ~0x4;
938
939 for(i = 0;; ++i) {
940 /* Print out a message, for debugging. */
941 if (i == 0)
942 fprintf(stderr, &quot;Started uio test driver.\n&quot;);
943 else
944 fprintf(stderr, &quot;Interrupts: %d\n&quot;, icount);
945
946 /****************************************/
947 /* Here we got an interrupt from the
948 device. Do something to it. */
949 /****************************************/
950
951 /* Re-enable interrupts. */
952 err = pwrite(configfd, &amp;command_high, 1, 5);
953 if (err != 1) {
954 perror(&quot;config write:&quot;);
955 break;
956 }
957
958 /* Wait for next interrupt. */
959 err = read(uiofd, &amp;icount, 4);
960 if (err != 4) {
961 perror(&quot;uio read:&quot;);
962 break;
963 }
964
965 }
966 return errno;
967}
968
969</programlisting>
970 </para>
971</sect1>
972
973</chapter>
974
812<appendix id="app1"> 975<appendix id="app1">
813<title>Further information</title> 976<title>Further information</title>
814<itemizedlist> 977<itemizedlist>
diff --git a/Documentation/PCI/pci-error-recovery.txt b/Documentation/PCI/pci-error-recovery.txt
index 6650af432523..e83f2ea76415 100644
--- a/Documentation/PCI/pci-error-recovery.txt
+++ b/Documentation/PCI/pci-error-recovery.txt
@@ -4,15 +4,17 @@
4 February 2, 2006 4 February 2, 2006
5 5
6 Current document maintainer: 6 Current document maintainer:
7 Linas Vepstas <linas@austin.ibm.com> 7 Linas Vepstas <linasvepstas@gmail.com>
8 updated by Richard Lary <rlary@us.ibm.com>
9 and Mike Mason <mmlnx@us.ibm.com> on 27-Jul-2009
8 10
9 11
10Many PCI bus controllers are able to detect a variety of hardware 12Many PCI bus controllers are able to detect a variety of hardware
11PCI errors on the bus, such as parity errors on the data and address 13PCI errors on the bus, such as parity errors on the data and address
12busses, as well as SERR and PERR errors. Some of the more advanced 14busses, as well as SERR and PERR errors. Some of the more advanced
13chipsets are able to deal with these errors; these include PCI-E chipsets, 15chipsets are able to deal with these errors; these include PCI-E chipsets,
14and the PCI-host bridges found on IBM Power4 and Power5-based pSeries 16and the PCI-host bridges found on IBM Power4, Power5 and Power6-based
15boxes. A typical action taken is to disconnect the affected device, 17pSeries boxes. A typical action taken is to disconnect the affected device,
16halting all I/O to it. The goal of a disconnection is to avoid system 18halting all I/O to it. The goal of a disconnection is to avoid system
17corruption; for example, to halt system memory corruption due to DMA's 19corruption; for example, to halt system memory corruption due to DMA's
18to "wild" addresses. Typically, a reconnection mechanism is also 20to "wild" addresses. Typically, a reconnection mechanism is also
@@ -37,10 +39,11 @@ is forced by the need to handle multi-function devices, that is,
37devices that have multiple device drivers associated with them. 39devices that have multiple device drivers associated with them.
38In the first stage, each driver is allowed to indicate what type 40In the first stage, each driver is allowed to indicate what type
39of reset it desires, the choices being a simple re-enabling of I/O 41of reset it desires, the choices being a simple re-enabling of I/O
40or requesting a hard reset (a full electrical #RST of the PCI card). 42or requesting a slot reset.
41If any driver requests a full reset, that is what will be done.
42 43
43After a full reset and/or a re-enabling of I/O, all drivers are 44If any driver requests a slot reset, that is what will be done.
45
46After a reset and/or a re-enabling of I/O, all drivers are
44again notified, so that they may then perform any device setup/config 47again notified, so that they may then perform any device setup/config
45that may be required. After these have all completed, a final 48that may be required. After these have all completed, a final
46"resume normal operations" event is sent out. 49"resume normal operations" event is sent out.
@@ -101,7 +104,7 @@ if it implements any, it must implement error_detected(). If a callback
101is not implemented, the corresponding feature is considered unsupported. 104is not implemented, the corresponding feature is considered unsupported.
102For example, if mmio_enabled() and resume() aren't there, then it 105For example, if mmio_enabled() and resume() aren't there, then it
103is assumed that the driver is not doing any direct recovery and requires 106is assumed that the driver is not doing any direct recovery and requires
104a reset. If link_reset() is not implemented, the card is assumed as 107a slot reset. If link_reset() is not implemented, the card is assumed to
105not care about link resets. Typically a driver will want to know about 108not care about link resets. Typically a driver will want to know about
106a slot_reset(). 109a slot_reset().
107 110
@@ -111,7 +114,7 @@ sequence described below.
111 114
112STEP 0: Error Event 115STEP 0: Error Event
113------------------- 116-------------------
114PCI bus error is detect by the PCI hardware. On powerpc, the slot 117A PCI bus error is detected by the PCI hardware. On powerpc, the slot
115is isolated, in that all I/O is blocked: all reads return 0xffffffff, 118is isolated, in that all I/O is blocked: all reads return 0xffffffff,
116all writes are ignored. 119all writes are ignored.
117 120
@@ -139,7 +142,7 @@ The driver must return one of the following result codes:
139 a chance to extract some diagnostic information (see 142 a chance to extract some diagnostic information (see
140 mmio_enable, below). 143 mmio_enable, below).
141 - PCI_ERS_RESULT_NEED_RESET: 144 - PCI_ERS_RESULT_NEED_RESET:
142 Driver returns this if it can't recover without a hard 145 Driver returns this if it can't recover without a
143 slot reset. 146 slot reset.
144 - PCI_ERS_RESULT_DISCONNECT: 147 - PCI_ERS_RESULT_DISCONNECT:
145 Driver returns this if it doesn't want to recover at all. 148 Driver returns this if it doesn't want to recover at all.
@@ -169,11 +172,11 @@ is STEP 6 (Permanent Failure).
169 172
170>>> The current powerpc implementation doesn't much care if the device 173>>> The current powerpc implementation doesn't much care if the device
171>>> attempts I/O at this point, or not. I/O's will fail, returning 174>>> attempts I/O at this point, or not. I/O's will fail, returning
172>>> a value of 0xff on read, and writes will be dropped. If the device 175>>> a value of 0xff on read, and writes will be dropped. If more than
173>>> driver attempts more than 10K I/O's to a frozen adapter, it will 176>>> EEH_MAX_FAILS I/O's are attempted to a frozen adapter, EEH
174>>> assume that the device driver has gone into an infinite loop, and 177>>> assumes that the device driver has gone into an infinite loop
175>>> it will panic the kernel. There doesn't seem to be any other 178>>> and prints an error to syslog. A reboot is then required to
176>>> way of stopping a device driver that insists on spinning on I/O. 179>>> get the device working again.
177 180
178STEP 2: MMIO Enabled 181STEP 2: MMIO Enabled
179------------------- 182-------------------
@@ -182,15 +185,14 @@ DMA), and then calls the mmio_enabled() callback on all affected
182device drivers. 185device drivers.
183 186
184This is the "early recovery" call. IOs are allowed again, but DMA is 187This is the "early recovery" call. IOs are allowed again, but DMA is
185not (hrm... to be discussed, I prefer not), with some restrictions. This 188not, with some restrictions. This is NOT a callback for the driver to
186is NOT a callback for the driver to start operations again, only to 189start operations again, only to peek/poke at the device, extract diagnostic
187peek/poke at the device, extract diagnostic information, if any, and 190information, if any, and eventually do things like trigger a device local
188eventually do things like trigger a device local reset or some such, 191reset or some such, but not restart operations. This callback is made if
189but not restart operations. This is callback is made if all drivers on 192all drivers on a segment agree that they can try to recover and if no automatic
190a segment agree that they can try to recover and if no automatic link reset 193link reset was performed by the HW. If the platform can't just re-enable IOs
191was performed by the HW. If the platform can't just re-enable IOs without 194without a slot reset or a link reset, it will not call this callback, and
192a slot reset or a link reset, it wont call this callback, and instead 195instead will have gone directly to STEP 3 (Link Reset) or STEP 4 (Slot Reset)
193will have gone directly to STEP 3 (Link Reset) or STEP 4 (Slot Reset)
194 196
195>>> The following is proposed; no platform implements this yet: 197>>> The following is proposed; no platform implements this yet:
196>>> Proposal: All I/O's should be done _synchronously_ from within 198>>> Proposal: All I/O's should be done _synchronously_ from within
@@ -228,9 +230,6 @@ proceeds to either STEP3 (Link Reset) or to STEP 5 (Resume Operations).
228If any driver returned PCI_ERS_RESULT_NEED_RESET, then the platform 230If any driver returned PCI_ERS_RESULT_NEED_RESET, then the platform
229proceeds to STEP 4 (Slot Reset) 231proceeds to STEP 4 (Slot Reset)
230 232
231>>> The current powerpc implementation does not implement this callback.
232
233
234STEP 3: Link Reset 233STEP 3: Link Reset
235------------------ 234------------------
236The platform resets the link, and then calls the link_reset() callback 235The platform resets the link, and then calls the link_reset() callback
@@ -253,16 +252,33 @@ The platform then proceeds to either STEP 4 (Slot Reset) or STEP 5
253 252
254>>> The current powerpc implementation does not implement this callback. 253>>> The current powerpc implementation does not implement this callback.
255 254
256
257STEP 4: Slot Reset 255STEP 4: Slot Reset
258------------------ 256------------------
259The platform performs a soft or hard reset of the device, and then
260calls the slot_reset() callback.
261 257
262A soft reset consists of asserting the adapter #RST line and then 258In response to a return value of PCI_ERS_RESULT_NEED_RESET, the
259the platform will peform a slot reset on the requesting PCI device(s).
260The actual steps taken by a platform to perform a slot reset
261will be platform-dependent. Upon completion of slot reset, the
262platform will call the device slot_reset() callback.
263
264Powerpc platforms implement two levels of slot reset:
265soft reset(default) and fundamental(optional) reset.
266
267Powerpc soft reset consists of asserting the adapter #RST line and then
263restoring the PCI BAR's and PCI configuration header to a state 268restoring the PCI BAR's and PCI configuration header to a state
264that is equivalent to what it would be after a fresh system 269that is equivalent to what it would be after a fresh system
265power-on followed by power-on BIOS/system firmware initialization. 270power-on followed by power-on BIOS/system firmware initialization.
271Soft reset is also known as hot-reset.
272
273Powerpc fundamental reset is supported by PCI Express cards only
274and results in device's state machines, hardware logic, port states and
275configuration registers to initialize to their default conditions.
276
277For most PCI devices, a soft reset will be sufficient for recovery.
278Optional fundamental reset is provided to support a limited number
279of PCI Express PCI devices for which a soft reset is not sufficient
280for recovery.
281
266If the platform supports PCI hotplug, then the reset might be 282If the platform supports PCI hotplug, then the reset might be
267performed by toggling the slot electrical power off/on. 283performed by toggling the slot electrical power off/on.
268 284
@@ -274,10 +290,12 @@ may result in hung devices, kernel panics, or silent data corruption.
274 290
275This call gives drivers the chance to re-initialize the hardware 291This call gives drivers the chance to re-initialize the hardware
276(re-download firmware, etc.). At this point, the driver may assume 292(re-download firmware, etc.). At this point, the driver may assume
277that he card is in a fresh state and is fully functional. In 293that the card is in a fresh state and is fully functional. The slot
278particular, interrupt generation should work normally. 294is unfrozen and the driver has full access to PCI config space,
295memory mapped I/O space and DMA. Interrupts (Legacy, MSI, or MSI-X)
296will also be available.
279 297
280Drivers should not yet restart normal I/O processing operations 298Drivers should not restart normal I/O processing operations
281at this point. If all device drivers report success on this 299at this point. If all device drivers report success on this
282callback, the platform will call resume() to complete the sequence, 300callback, the platform will call resume() to complete the sequence,
283and let the driver restart normal I/O processing. 301and let the driver restart normal I/O processing.
@@ -302,11 +320,21 @@ driver performs device init only from PCI function 0:
302 - PCI_ERS_RESULT_DISCONNECT 320 - PCI_ERS_RESULT_DISCONNECT
303 Same as above. 321 Same as above.
304 322
323Drivers for PCI Express cards that require a fundamental reset must
324set the needs_freset bit in the pci_dev structure in their probe function.
325For example, the QLogic qla2xxx driver sets the needs_freset bit for certain
326PCI card types:
327
328+ /* Set EEH reset type to fundamental if required by hba */
329+ if (IS_QLA24XX(ha) || IS_QLA25XX(ha) || IS_QLA81XX(ha))
330+ pdev->needs_freset = 1;
331+
332
305Platform proceeds either to STEP 5 (Resume Operations) or STEP 6 (Permanent 333Platform proceeds either to STEP 5 (Resume Operations) or STEP 6 (Permanent
306Failure). 334Failure).
307 335
308>>> The current powerpc implementation does not currently try a 336>>> The current powerpc implementation does not try a power-cycle
309>>> power-cycle reset if the driver returned PCI_ERS_RESULT_DISCONNECT. 337>>> reset if the driver returned PCI_ERS_RESULT_DISCONNECT.
310>>> However, it probably should. 338>>> However, it probably should.
311 339
312 340
@@ -348,7 +376,7 @@ software errors.
348 376
349Conclusion; General Remarks 377Conclusion; General Remarks
350--------------------------- 378---------------------------
351The way those callbacks are called is platform policy. A platform with 379The way the callbacks are called is platform policy. A platform with
352no slot reset capability may want to just "ignore" drivers that can't 380no slot reset capability may want to just "ignore" drivers that can't
353recover (disconnect them) and try to let other cards on the same segment 381recover (disconnect them) and try to let other cards on the same segment
354recover. Keep in mind that in most real life cases, though, there will 382recover. Keep in mind that in most real life cases, though, there will
@@ -361,8 +389,8 @@ That is, the recovery API only requires that:
361 389
362 - There is no guarantee that interrupt delivery can proceed from any 390 - There is no guarantee that interrupt delivery can proceed from any
363device on the segment starting from the error detection and until the 391device on the segment starting from the error detection and until the
364resume callback is sent, at which point interrupts are expected to be 392slot_reset callback is called, at which point interrupts are expected
365fully operational. 393to be fully operational.
366 394
367 - There is no guarantee that interrupt delivery is stopped, that is, 395 - There is no guarantee that interrupt delivery is stopped, that is,
368a driver that gets an interrupt after detecting an error, or that detects 396a driver that gets an interrupt after detecting an error, or that detects
@@ -381,16 +409,23 @@ anyway :)
381>>> Implementation details for the powerpc platform are discussed in 409>>> Implementation details for the powerpc platform are discussed in
382>>> the file Documentation/powerpc/eeh-pci-error-recovery.txt 410>>> the file Documentation/powerpc/eeh-pci-error-recovery.txt
383 411
384>>> As of this writing, there are six device drivers with patches 412>>> As of this writing, there is a growing list of device drivers with
385>>> implementing error recovery. Not all of these patches are in 413>>> patches implementing error recovery. Not all of these patches are in
386>>> mainline yet. These may be used as "examples": 414>>> mainline yet. These may be used as "examples":
387>>> 415>>>
388>>> drivers/scsi/ipr.c 416>>> drivers/scsi/ipr
389>>> drivers/scsi/sym53cxx_2 417>>> drivers/scsi/sym53c8xx_2
418>>> drivers/scsi/qla2xxx
419>>> drivers/scsi/lpfc
420>>> drivers/next/bnx2.c
390>>> drivers/next/e100.c 421>>> drivers/next/e100.c
391>>> drivers/net/e1000 422>>> drivers/net/e1000
423>>> drivers/net/e1000e
392>>> drivers/net/ixgb 424>>> drivers/net/ixgb
425>>> drivers/net/ixgbe
426>>> drivers/net/cxgb3
393>>> drivers/net/s2io.c 427>>> drivers/net/s2io.c
428>>> drivers/net/qlge
394 429
395The End 430The End
396------- 431-------
diff --git a/Documentation/RCU/RTFP.txt b/Documentation/RCU/RTFP.txt
index 9f711d2df91b..d2b85237c76e 100644
--- a/Documentation/RCU/RTFP.txt
+++ b/Documentation/RCU/RTFP.txt
@@ -743,3 +743,80 @@ Revised:
743 RCU, realtime RCU, sleepable RCU, performance. 743 RCU, realtime RCU, sleepable RCU, performance.
744" 744"
745} 745}
746
747@article{PaulEMcKenney2008RCUOSR
748,author="Paul E. McKenney and Jonathan Walpole"
749,title="Introducing technology into the {Linux} kernel: a case study"
750,Year="2008"
751,journal="SIGOPS Oper. Syst. Rev."
752,volume="42"
753,number="5"
754,pages="4--17"
755,issn="0163-5980"
756,doi={http://doi.acm.org/10.1145/1400097.1400099}
757,publisher="ACM"
758,address="New York, NY, USA"
759,annotation={
760 Linux changed RCU to a far greater degree than RCU has changed Linux.
761}
762}
763
764@unpublished{PaulEMcKenney2008HierarchicalRCU
765,Author="Paul E. McKenney"
766,Title="Hierarchical {RCU}"
767,month="November"
768,day="3"
769,year="2008"
770,note="Available:
771\url{http://lwn.net/Articles/305782/}
772[Viewed November 6, 2008]"
773,annotation="
774 RCU with combining-tree-based grace-period detection,
775 permitting it to handle thousands of CPUs.
776"
777}
778
779@conference{PaulEMcKenney2009MaliciousURCU
780,Author="Paul E. McKenney"
781,Title="Using a Malicious User-Level {RCU} to Torture {RCU}-Based Algorithms"
782,Booktitle="linux.conf.au 2009"
783,month="January"
784,year="2009"
785,address="Hobart, Australia"
786,note="Available:
787\url{http://www.rdrop.com/users/paulmck/RCU/urcutorture.2009.01.22a.pdf}
788[Viewed February 2, 2009]"
789,annotation="
790 Realtime RCU and torture-testing RCU uses.
791"
792}
793
794@unpublished{MathieuDesnoyers2009URCU
795,Author="Mathieu Desnoyers"
796,Title="[{RFC} git tree] Userspace {RCU} (urcu) for {Linux}"
797,month="February"
798,day="5"
799,year="2009"
800,note="Available:
801\url{http://lkml.org/lkml/2009/2/5/572}
802\url{git://lttng.org/userspace-rcu.git}
803[Viewed February 20, 2009]"
804,annotation="
805 Mathieu Desnoyers's user-space RCU implementation.
806 git://lttng.org/userspace-rcu.git
807"
808}
809
810@unpublished{PaulEMcKenney2009BloatWatchRCU
811,Author="Paul E. McKenney"
812,Title="{RCU}: The {Bloatwatch} Edition"
813,month="March"
814,day="17"
815,year="2009"
816,note="Available:
817\url{http://lwn.net/Articles/323929/}
818[Viewed March 20, 2009]"
819,annotation="
820 Uniprocessor assumptions allow simplified RCU implementation.
821"
822}
diff --git a/Documentation/RCU/UP.txt b/Documentation/RCU/UP.txt
index aab4a9ec3931..90ec5341ee98 100644
--- a/Documentation/RCU/UP.txt
+++ b/Documentation/RCU/UP.txt
@@ -2,14 +2,13 @@ RCU on Uniprocessor Systems
2 2
3 3
4A common misconception is that, on UP systems, the call_rcu() primitive 4A common misconception is that, on UP systems, the call_rcu() primitive
5may immediately invoke its function, and that the synchronize_rcu() 5may immediately invoke its function. The basis of this misconception
6primitive may return immediately. The basis of this misconception
7is that since there is only one CPU, it should not be necessary to 6is that since there is only one CPU, it should not be necessary to
8wait for anything else to get done, since there are no other CPUs for 7wait for anything else to get done, since there are no other CPUs for
9anything else to be happening on. Although this approach will -sort- -of- 8anything else to be happening on. Although this approach will -sort- -of-
10work a surprising amount of the time, it is a very bad idea in general. 9work a surprising amount of the time, it is a very bad idea in general.
11This document presents three examples that demonstrate exactly how bad an 10This document presents three examples that demonstrate exactly how bad
12idea this is. 11an idea this is.
13 12
14 13
15Example 1: softirq Suicide 14Example 1: softirq Suicide
@@ -82,11 +81,18 @@ Quick Quiz #2: What locking restriction must RCU callbacks respect?
82 81
83Summary 82Summary
84 83
85Permitting call_rcu() to immediately invoke its arguments or permitting 84Permitting call_rcu() to immediately invoke its arguments breaks RCU,
86synchronize_rcu() to immediately return breaks RCU, even on a UP system. 85even on a UP system. So do not do it! Even on a UP system, the RCU
87So do not do it! Even on a UP system, the RCU infrastructure -must- 86infrastructure -must- respect grace periods, and -must- invoke callbacks
88respect grace periods, and -must- invoke callbacks from a known environment 87from a known environment in which no locks are held.
89in which no locks are held. 88
89It -is- safe for synchronize_sched() and synchronize_rcu_bh() to return
90immediately on an UP system. It is also safe for synchronize_rcu()
91to return immediately on UP systems, except when running preemptable
92RCU.
93
94Quick Quiz #3: Why can't synchronize_rcu() return immediately on
95 UP systems running preemptable RCU?
90 96
91 97
92Answer to Quick Quiz #1: 98Answer to Quick Quiz #1:
@@ -117,3 +123,13 @@ Answer to Quick Quiz #2:
117 callbacks acquire locks directly. However, a great many RCU 123 callbacks acquire locks directly. However, a great many RCU
118 callbacks do acquire locks -indirectly-, for example, via 124 callbacks do acquire locks -indirectly-, for example, via
119 the kfree() primitive. 125 the kfree() primitive.
126
127Answer to Quick Quiz #3:
128 Why can't synchronize_rcu() return immediately on UP systems
129 running preemptable RCU?
130
131 Because some other task might have been preempted in the middle
132 of an RCU read-side critical section. If synchronize_rcu()
133 simply immediately returned, it would prematurely signal the
134 end of the grace period, which would come as a nasty shock to
135 that other thread when it started running again.
diff --git a/Documentation/RCU/checklist.txt b/Documentation/RCU/checklist.txt
index accfe2f5247d..51525a30e8b4 100644
--- a/Documentation/RCU/checklist.txt
+++ b/Documentation/RCU/checklist.txt
@@ -11,7 +11,10 @@ over a rather long period of time, but improvements are always welcome!
11 structure is updated more than about 10% of the time, then 11 structure is updated more than about 10% of the time, then
12 you should strongly consider some other approach, unless 12 you should strongly consider some other approach, unless
13 detailed performance measurements show that RCU is nonetheless 13 detailed performance measurements show that RCU is nonetheless
14 the right tool for the job. 14 the right tool for the job. Yes, you might think of RCU
15 as simply cutting overhead off of the readers and imposing it
16 on the writers. That is exactly why normal uses of RCU will
17 do much more reading than updating.
15 18
16 Another exception is where performance is not an issue, and RCU 19 Another exception is where performance is not an issue, and RCU
17 provides a simpler implementation. An example of this situation 20 provides a simpler implementation. An example of this situation
@@ -240,10 +243,11 @@ over a rather long period of time, but improvements are always welcome!
240 instead need to use synchronize_irq() or synchronize_sched(). 243 instead need to use synchronize_irq() or synchronize_sched().
241 244
24212. Any lock acquired by an RCU callback must be acquired elsewhere 24512. Any lock acquired by an RCU callback must be acquired elsewhere
243 with irq disabled, e.g., via spin_lock_irqsave(). Failing to 246 with softirq disabled, e.g., via spin_lock_irqsave(),
244 disable irq on a given acquisition of that lock will result in 247 spin_lock_bh(), etc. Failing to disable irq on a given
245 deadlock as soon as the RCU callback happens to interrupt that 248 acquisition of that lock will result in deadlock as soon as the
246 acquisition's critical section. 249 RCU callback happens to interrupt that acquisition's critical
250 section.
247 251
24813. RCU callbacks can be and are executed in parallel. In many cases, 25213. RCU callbacks can be and are executed in parallel. In many cases,
249 the callback code simply wrappers around kfree(), so that this 253 the callback code simply wrappers around kfree(), so that this
@@ -310,3 +314,9 @@ over a rather long period of time, but improvements are always welcome!
310 Because these primitives only wait for pre-existing readers, 314 Because these primitives only wait for pre-existing readers,
311 it is the caller's responsibility to guarantee safety to 315 it is the caller's responsibility to guarantee safety to
312 any subsequent readers. 316 any subsequent readers.
317
31816. The various RCU read-side primitives do -not- contain memory
319 barriers. The CPU (and in some cases, the compiler) is free
320 to reorder code into and out of RCU read-side critical sections.
321 It is the responsibility of the RCU update-side primitives to
322 deal with this.
diff --git a/Documentation/RCU/rcu.txt b/Documentation/RCU/rcu.txt
index 7aa2002ade77..2a23523ce471 100644
--- a/Documentation/RCU/rcu.txt
+++ b/Documentation/RCU/rcu.txt
@@ -36,7 +36,7 @@ o How can the updater tell when a grace period has completed
36 executed in user mode, or executed in the idle loop, we can 36 executed in user mode, or executed in the idle loop, we can
37 safely free up that item. 37 safely free up that item.
38 38
39 Preemptible variants of RCU (CONFIG_PREEMPT_RCU) get the 39 Preemptible variants of RCU (CONFIG_TREE_PREEMPT_RCU) get the
40 same effect, but require that the readers manipulate CPU-local 40 same effect, but require that the readers manipulate CPU-local
41 counters. These counters allow limited types of blocking 41 counters. These counters allow limited types of blocking
42 within RCU read-side critical sections. SRCU also uses 42 within RCU read-side critical sections. SRCU also uses
@@ -79,10 +79,10 @@ o I hear that RCU is patented? What is with that?
79o I hear that RCU needs work in order to support realtime kernels? 79o I hear that RCU needs work in order to support realtime kernels?
80 80
81 This work is largely completed. Realtime-friendly RCU can be 81 This work is largely completed. Realtime-friendly RCU can be
82 enabled via the CONFIG_PREEMPT_RCU kernel configuration parameter. 82 enabled via the CONFIG_TREE_PREEMPT_RCU kernel configuration
83 However, work is in progress for enabling priority boosting of 83 parameter. However, work is in progress for enabling priority
84 preempted RCU read-side critical sections. This is needed if you 84 boosting of preempted RCU read-side critical sections. This is
85 have CPU-bound realtime threads. 85 needed if you have CPU-bound realtime threads.
86 86
87o Where can I find more information on RCU? 87o Where can I find more information on RCU?
88 88
diff --git a/Documentation/RCU/rcubarrier.txt b/Documentation/RCU/rcubarrier.txt
index 909602d409bb..e439a0edee22 100644
--- a/Documentation/RCU/rcubarrier.txt
+++ b/Documentation/RCU/rcubarrier.txt
@@ -170,6 +170,13 @@ module invokes call_rcu() from timers, you will need to first cancel all
170the timers, and only then invoke rcu_barrier() to wait for any remaining 170the timers, and only then invoke rcu_barrier() to wait for any remaining
171RCU callbacks to complete. 171RCU callbacks to complete.
172 172
173Of course, if you module uses call_rcu_bh(), you will need to invoke
174rcu_barrier_bh() before unloading. Similarly, if your module uses
175call_rcu_sched(), you will need to invoke rcu_barrier_sched() before
176unloading. If your module uses call_rcu(), call_rcu_bh(), -and-
177call_rcu_sched(), then you will need to invoke each of rcu_barrier(),
178rcu_barrier_bh(), and rcu_barrier_sched().
179
173 180
174Implementing rcu_barrier() 181Implementing rcu_barrier()
175 182
diff --git a/Documentation/RCU/torture.txt b/Documentation/RCU/torture.txt
index a342b6e1cc10..9dba3bb90e60 100644
--- a/Documentation/RCU/torture.txt
+++ b/Documentation/RCU/torture.txt
@@ -76,8 +76,10 @@ torture_type The type of RCU to test: "rcu" for the rcu_read_lock() API,
76 "rcu_sync" for rcu_read_lock() with synchronous reclamation, 76 "rcu_sync" for rcu_read_lock() with synchronous reclamation,
77 "rcu_bh" for the rcu_read_lock_bh() API, "rcu_bh_sync" for 77 "rcu_bh" for the rcu_read_lock_bh() API, "rcu_bh_sync" for
78 rcu_read_lock_bh() with synchronous reclamation, "srcu" for 78 rcu_read_lock_bh() with synchronous reclamation, "srcu" for
79 the "srcu_read_lock()" API, and "sched" for the use of 79 the "srcu_read_lock()" API, "sched" for the use of
80 preempt_disable() together with synchronize_sched(). 80 preempt_disable() together with synchronize_sched(),
81 and "sched_expedited" for the use of preempt_disable()
82 with synchronize_sched_expedited().
81 83
82verbose Enable debug printk()s. Default is disabled. 84verbose Enable debug printk()s. Default is disabled.
83 85
@@ -162,6 +164,23 @@ of the "old" and "current" counters for the corresponding CPU. The
162"idx" value maps the "old" and "current" values to the underlying array, 164"idx" value maps the "old" and "current" values to the underlying array,
163and is useful for debugging. 165and is useful for debugging.
164 166
167Similarly, sched_expedited RCU provides the following:
168
169 sched_expedited-torture: rtc: d0000000016c1880 ver: 1090796 tfle: 0 rta: 1090796 rtaf: 0 rtf: 1090787 rtmbe: 0 nt: 27713319
170 sched_expedited-torture: Reader Pipe: 12660320201 95875 0 0 0 0 0 0 0 0 0
171 sched_expedited-torture: Reader Batch: 12660424885 0 0 0 0 0 0 0 0 0 0
172 sched_expedited-torture: Free-Block Circulation: 1090795 1090795 1090794 1090793 1090792 1090791 1090790 1090789 1090788 1090787 0
173 state: -1 / 0:0 3:0 4:0
174
175As before, the first four lines are similar to those for RCU.
176The last line shows the task-migration state. The first number is
177-1 if synchronize_sched_expedited() is idle, -2 if in the process of
178posting wakeups to the migration kthreads, and N when waiting on CPU N.
179Each of the colon-separated fields following the "/" is a CPU:state pair.
180Valid states are "0" for idle, "1" for waiting for quiescent state,
181"2" for passed through quiescent state, and "3" when a race with a
182CPU-hotplug event forces use of the synchronize_sched() primitive.
183
165 184
166USAGE 185USAGE
167 186
diff --git a/Documentation/RCU/trace.txt b/Documentation/RCU/trace.txt
index 02cced183b2d..187bbf10c923 100644
--- a/Documentation/RCU/trace.txt
+++ b/Documentation/RCU/trace.txt
@@ -191,8 +191,7 @@ rcu/rcuhier (which displays the struct rcu_node hierarchy).
191 191
192The output of "cat rcu/rcudata" looks as follows: 192The output of "cat rcu/rcudata" looks as follows:
193 193
194rcu: 194rcu_sched:
195rcu:
196 0 c=17829 g=17829 pq=1 pqc=17829 qp=0 dt=10951/1 dn=0 df=1101 of=0 ri=36 ql=0 b=10 195 0 c=17829 g=17829 pq=1 pqc=17829 qp=0 dt=10951/1 dn=0 df=1101 of=0 ri=36 ql=0 b=10
197 1 c=17829 g=17829 pq=1 pqc=17829 qp=0 dt=16117/1 dn=0 df=1015 of=0 ri=0 ql=0 b=10 196 1 c=17829 g=17829 pq=1 pqc=17829 qp=0 dt=16117/1 dn=0 df=1015 of=0 ri=0 ql=0 b=10
198 2 c=17829 g=17829 pq=1 pqc=17829 qp=0 dt=1445/1 dn=0 df=1839 of=0 ri=0 ql=0 b=10 197 2 c=17829 g=17829 pq=1 pqc=17829 qp=0 dt=1445/1 dn=0 df=1839 of=0 ri=0 ql=0 b=10
@@ -306,7 +305,7 @@ comma-separated-variable spreadsheet format.
306 305
307The output of "cat rcu/rcugp" looks as follows: 306The output of "cat rcu/rcugp" looks as follows:
308 307
309rcu: completed=33062 gpnum=33063 308rcu_sched: completed=33062 gpnum=33063
310rcu_bh: completed=464 gpnum=464 309rcu_bh: completed=464 gpnum=464
311 310
312Again, this output is for both "rcu" and "rcu_bh". The fields are 311Again, this output is for both "rcu" and "rcu_bh". The fields are
@@ -413,7 +412,7 @@ o Each element of the form "1/1 0:127 ^0" represents one struct
413 412
414The output of "cat rcu/rcu_pending" looks as follows: 413The output of "cat rcu/rcu_pending" looks as follows:
415 414
416rcu: 415rcu_sched:
417 0 np=255892 qsp=53936 cbr=0 cng=14417 gpc=10033 gps=24320 nf=6445 nn=146741 416 0 np=255892 qsp=53936 cbr=0 cng=14417 gpc=10033 gps=24320 nf=6445 nn=146741
418 1 np=261224 qsp=54638 cbr=0 cng=25723 gpc=16310 gps=2849 nf=5912 nn=155792 417 1 np=261224 qsp=54638 cbr=0 cng=25723 gpc=16310 gps=2849 nf=5912 nn=155792
419 2 np=237496 qsp=49664 cbr=0 cng=2762 gpc=45478 gps=1762 nf=1201 nn=136629 418 2 np=237496 qsp=49664 cbr=0 cng=2762 gpc=45478 gps=1762 nf=1201 nn=136629
diff --git a/Documentation/RCU/whatisRCU.txt b/Documentation/RCU/whatisRCU.txt
index 96170824a717..e41a7fecf0d3 100644
--- a/Documentation/RCU/whatisRCU.txt
+++ b/Documentation/RCU/whatisRCU.txt
@@ -136,10 +136,10 @@ rcu_read_lock()
136 Used by a reader to inform the reclaimer that the reader is 136 Used by a reader to inform the reclaimer that the reader is
137 entering an RCU read-side critical section. It is illegal 137 entering an RCU read-side critical section. It is illegal
138 to block while in an RCU read-side critical section, though 138 to block while in an RCU read-side critical section, though
139 kernels built with CONFIG_PREEMPT_RCU can preempt RCU read-side 139 kernels built with CONFIG_TREE_PREEMPT_RCU can preempt RCU
140 critical sections. Any RCU-protected data structure accessed 140 read-side critical sections. Any RCU-protected data structure
141 during an RCU read-side critical section is guaranteed to remain 141 accessed during an RCU read-side critical section is guaranteed to
142 unreclaimed for the full duration of that critical section. 142 remain unreclaimed for the full duration of that critical section.
143 Reference counts may be used in conjunction with RCU to maintain 143 Reference counts may be used in conjunction with RCU to maintain
144 longer-term references to data structures. 144 longer-term references to data structures.
145 145
@@ -785,6 +785,7 @@ RCU pointer/list traversal:
785 rcu_dereference 785 rcu_dereference
786 list_for_each_entry_rcu 786 list_for_each_entry_rcu
787 hlist_for_each_entry_rcu 787 hlist_for_each_entry_rcu
788 hlist_nulls_for_each_entry_rcu
788 789
789 list_for_each_continue_rcu (to be deprecated in favor of new 790 list_for_each_continue_rcu (to be deprecated in favor of new
790 list_for_each_entry_continue_rcu) 791 list_for_each_entry_continue_rcu)
@@ -807,19 +808,23 @@ RCU: Critical sections Grace period Barrier
807 808
808 rcu_read_lock synchronize_net rcu_barrier 809 rcu_read_lock synchronize_net rcu_barrier
809 rcu_read_unlock synchronize_rcu 810 rcu_read_unlock synchronize_rcu
811 synchronize_rcu_expedited
810 call_rcu 812 call_rcu
811 813
812 814
813bh: Critical sections Grace period Barrier 815bh: Critical sections Grace period Barrier
814 816
815 rcu_read_lock_bh call_rcu_bh rcu_barrier_bh 817 rcu_read_lock_bh call_rcu_bh rcu_barrier_bh
816 rcu_read_unlock_bh 818 rcu_read_unlock_bh synchronize_rcu_bh
819 synchronize_rcu_bh_expedited
817 820
818 821
819sched: Critical sections Grace period Barrier 822sched: Critical sections Grace period Barrier
820 823
821 [preempt_disable] synchronize_sched rcu_barrier_sched 824 rcu_read_lock_sched synchronize_sched rcu_barrier_sched
822 [and friends] call_rcu_sched 825 rcu_read_unlock_sched call_rcu_sched
826 [preempt_disable] synchronize_sched_expedited
827 [and friends]
823 828
824 829
825SRCU: Critical sections Grace period Barrier 830SRCU: Critical sections Grace period Barrier
@@ -827,6 +832,9 @@ SRCU: Critical sections Grace period Barrier
827 srcu_read_lock synchronize_srcu N/A 832 srcu_read_lock synchronize_srcu N/A
828 srcu_read_unlock 833 srcu_read_unlock
829 834
835SRCU: Initialization/cleanup
836 init_srcu_struct
837 cleanup_srcu_struct
830 838
831See the comment headers in the source code (or the docbook generated 839See the comment headers in the source code (or the docbook generated
832from them) for more information. 840from them) for more information.
diff --git a/Documentation/arm/OMAP/omap_pm b/Documentation/arm/OMAP/omap_pm
new file mode 100644
index 000000000000..5389440aade3
--- /dev/null
+++ b/Documentation/arm/OMAP/omap_pm
@@ -0,0 +1,129 @@
1
2The OMAP PM interface
3=====================
4
5This document describes the temporary OMAP PM interface. Driver
6authors use these functions to communicate minimum latency or
7throughput constraints to the kernel power management code.
8Over time, the intention is to merge features from the OMAP PM
9interface into the Linux PM QoS code.
10
11Drivers need to express PM parameters which:
12
13- support the range of power management parameters present in the TI SRF;
14
15- separate the drivers from the underlying PM parameter
16 implementation, whether it is the TI SRF or Linux PM QoS or Linux
17 latency framework or something else;
18
19- specify PM parameters in terms of fundamental units, such as
20 latency and throughput, rather than units which are specific to OMAP
21 or to particular OMAP variants;
22
23- allow drivers which are shared with other architectures (e.g.,
24 DaVinci) to add these constraints in a way which won't affect non-OMAP
25 systems,
26
27- can be implemented immediately with minimal disruption of other
28 architectures.
29
30
31This document proposes the OMAP PM interface, including the following
32five power management functions for driver code:
33
341. Set the maximum MPU wakeup latency:
35 (*pdata->set_max_mpu_wakeup_lat)(struct device *dev, unsigned long t)
36
372. Set the maximum device wakeup latency:
38 (*pdata->set_max_dev_wakeup_lat)(struct device *dev, unsigned long t)
39
403. Set the maximum system DMA transfer start latency (CORE pwrdm):
41 (*pdata->set_max_sdma_lat)(struct device *dev, long t)
42
434. Set the minimum bus throughput needed by a device:
44 (*pdata->set_min_bus_tput)(struct device *dev, u8 agent_id, unsigned long r)
45
465. Return the number of times the device has lost context
47 (*pdata->get_dev_context_loss_count)(struct device *dev)
48
49
50Further documentation for all OMAP PM interface functions can be
51found in arch/arm/plat-omap/include/mach/omap-pm.h.
52
53
54The OMAP PM layer is intended to be temporary
55---------------------------------------------
56
57The intention is that eventually the Linux PM QoS layer should support
58the range of power management features present in OMAP3. As this
59happens, existing drivers using the OMAP PM interface can be modified
60to use the Linux PM QoS code; and the OMAP PM interface can disappear.
61
62
63Driver usage of the OMAP PM functions
64-------------------------------------
65
66As the 'pdata' in the above examples indicates, these functions are
67exposed to drivers through function pointers in driver .platform_data
68structures. The function pointers are initialized by the board-*.c
69files to point to the corresponding OMAP PM functions:
70.set_max_dev_wakeup_lat will point to
71omap_pm_set_max_dev_wakeup_lat(), etc. Other architectures which do
72not support these functions should leave these function pointers set
73to NULL. Drivers should use the following idiom:
74
75 if (pdata->set_max_dev_wakeup_lat)
76 (*pdata->set_max_dev_wakeup_lat)(dev, t);
77
78The most common usage of these functions will probably be to specify
79the maximum time from when an interrupt occurs, to when the device
80becomes accessible. To accomplish this, driver writers should use the
81set_max_mpu_wakeup_lat() function to to constrain the MPU wakeup
82latency, and the set_max_dev_wakeup_lat() function to constrain the
83device wakeup latency (from clk_enable() to accessibility). For
84example,
85
86 /* Limit MPU wakeup latency */
87 if (pdata->set_max_mpu_wakeup_lat)
88 (*pdata->set_max_mpu_wakeup_lat)(dev, tc);
89
90 /* Limit device powerdomain wakeup latency */
91 if (pdata->set_max_dev_wakeup_lat)
92 (*pdata->set_max_dev_wakeup_lat)(dev, td);
93
94 /* total wakeup latency in this example: (tc + td) */
95
96The PM parameters can be overwritten by calling the function again
97with the new value. The settings can be removed by calling the
98function with a t argument of -1 (except in the case of
99set_max_bus_tput(), which should be called with an r argument of 0).
100
101The fifth function above, omap_pm_get_dev_context_loss_count(),
102is intended as an optimization to allow drivers to determine whether the
103device has lost its internal context. If context has been lost, the
104driver must restore its internal context before proceeding.
105
106
107Other specialized interface functions
108-------------------------------------
109
110The five functions listed above are intended to be usable by any
111device driver. DSPBridge and CPUFreq have a few special requirements.
112DSPBridge expresses target DSP performance levels in terms of OPP IDs.
113CPUFreq expresses target MPU performance levels in terms of MPU
114frequency. The OMAP PM interface contains functions for these
115specialized cases to convert that input information (OPPs/MPU
116frequency) into the form that the underlying power management
117implementation needs:
118
1196. (*pdata->dsp_get_opp_table)(void)
120
1217. (*pdata->dsp_set_min_opp)(u8 opp_id)
122
1238. (*pdata->dsp_get_opp)(void)
124
1259. (*pdata->cpu_get_freq_table)(void)
126
12710. (*pdata->cpu_set_freq)(unsigned long f)
128
12911. (*pdata->cpu_get_freq)(void)
diff --git a/Documentation/arm/SA1100/ADSBitsy b/Documentation/arm/SA1100/ADSBitsy
index ab47c3833908..7197a9e958ee 100644
--- a/Documentation/arm/SA1100/ADSBitsy
+++ b/Documentation/arm/SA1100/ADSBitsy
@@ -40,4 +40,4 @@ Notes:
40 mode, the timing is off so the image is corrupted. This will be 40 mode, the timing is off so the image is corrupted. This will be
41 fixed soon. 41 fixed soon.
42 42
43Any contribution can be sent to nico@cam.org and will be greatly welcome! 43Any contribution can be sent to nico@fluxnic.net and will be greatly welcome!
diff --git a/Documentation/arm/SA1100/Assabet b/Documentation/arm/SA1100/Assabet
index 78bc1c1b04e5..91f7ce7ba426 100644
--- a/Documentation/arm/SA1100/Assabet
+++ b/Documentation/arm/SA1100/Assabet
@@ -240,7 +240,7 @@ Then, rebooting the Assabet is just a matter of waiting for the login prompt.
240 240
241 241
242Nicolas Pitre 242Nicolas Pitre
243nico@cam.org 243nico@fluxnic.net
244June 12, 2001 244June 12, 2001
245 245
246 246
diff --git a/Documentation/arm/SA1100/Brutus b/Documentation/arm/SA1100/Brutus
index 2254c8f0b326..b1cfd405dccc 100644
--- a/Documentation/arm/SA1100/Brutus
+++ b/Documentation/arm/SA1100/Brutus
@@ -60,7 +60,7 @@ little modifications.
60 60
61Any contribution is welcome. 61Any contribution is welcome.
62 62
63Please send patches to nico@cam.org 63Please send patches to nico@fluxnic.net
64 64
65Have Fun ! 65Have Fun !
66 66
diff --git a/Documentation/arm/SA1100/GraphicsClient b/Documentation/arm/SA1100/GraphicsClient
index 8fa7e8027ff1..6c9c4f5a36e1 100644
--- a/Documentation/arm/SA1100/GraphicsClient
+++ b/Documentation/arm/SA1100/GraphicsClient
@@ -4,7 +4,7 @@ For more details, contact Applied Data Systems or see
4http://www.applieddata.net/products.html 4http://www.applieddata.net/products.html
5 5
6The original Linux support for this product has been provided by 6The original Linux support for this product has been provided by
7Nicolas Pitre <nico@cam.org>. Continued development work by 7Nicolas Pitre <nico@fluxnic.net>. Continued development work by
8Woojung Huh <whuh@applieddata.net> 8Woojung Huh <whuh@applieddata.net>
9 9
10It's currently possible to mount a root filesystem via NFS providing a 10It's currently possible to mount a root filesystem via NFS providing a
@@ -94,5 +94,5 @@ Notes:
94 mode, the timing is off so the image is corrupted. This will be 94 mode, the timing is off so the image is corrupted. This will be
95 fixed soon. 95 fixed soon.
96 96
97Any contribution can be sent to nico@cam.org and will be greatly welcome! 97Any contribution can be sent to nico@fluxnic.net and will be greatly welcome!
98 98
diff --git a/Documentation/arm/SA1100/GraphicsMaster b/Documentation/arm/SA1100/GraphicsMaster
index dd28745ac521..ee7c6595f23f 100644
--- a/Documentation/arm/SA1100/GraphicsMaster
+++ b/Documentation/arm/SA1100/GraphicsMaster
@@ -4,7 +4,7 @@ For more details, contact Applied Data Systems or see
4http://www.applieddata.net/products.html 4http://www.applieddata.net/products.html
5 5
6The original Linux support for this product has been provided by 6The original Linux support for this product has been provided by
7Nicolas Pitre <nico@cam.org>. Continued development work by 7Nicolas Pitre <nico@fluxnic.net>. Continued development work by
8Woojung Huh <whuh@applieddata.net> 8Woojung Huh <whuh@applieddata.net>
9 9
10Use 'make graphicsmaster_config' before any 'make config'. 10Use 'make graphicsmaster_config' before any 'make config'.
@@ -50,4 +50,4 @@ Notes:
50 mode, the timing is off so the image is corrupted. This will be 50 mode, the timing is off so the image is corrupted. This will be
51 fixed soon. 51 fixed soon.
52 52
53Any contribution can be sent to nico@cam.org and will be greatly welcome! 53Any contribution can be sent to nico@fluxnic.net and will be greatly welcome!
diff --git a/Documentation/arm/SA1100/Victor b/Documentation/arm/SA1100/Victor
index 01e81fc49461..f938a29fdc20 100644
--- a/Documentation/arm/SA1100/Victor
+++ b/Documentation/arm/SA1100/Victor
@@ -9,7 +9,7 @@ Of course Victor is using Linux as its main operating system.
9The Victor implementation for Linux is maintained by Nicolas Pitre: 9The Victor implementation for Linux is maintained by Nicolas Pitre:
10 10
11 nico@visuaide.com 11 nico@visuaide.com
12 nico@cam.org 12 nico@fluxnic.net
13 13
14For any comments, please feel free to contact me through the above 14For any comments, please feel free to contact me through the above
15addresses. 15addresses.
diff --git a/Documentation/arm/Samsung-S3C24XX/CPUfreq.txt b/Documentation/arm/Samsung-S3C24XX/CPUfreq.txt
new file mode 100644
index 000000000000..76b3a11e90be
--- /dev/null
+++ b/Documentation/arm/Samsung-S3C24XX/CPUfreq.txt
@@ -0,0 +1,75 @@
1 S3C24XX CPUfreq support
2 =======================
3
4Introduction
5------------
6
7 The S3C24XX series support a number of power saving systems, such as
8 the ability to change the core, memory and peripheral operating
9 frequencies. The core control is exported via the CPUFreq driver
10 which has a number of different manual or automatic controls over the
11 rate the core is running at.
12
13 There are two forms of the driver depending on the specific CPU and
14 how the clocks are arranged. The first implementation used as single
15 PLL to feed the ARM, memory and peripherals via a series of dividers
16 and muxes and this is the implementation that is documented here. A
17 newer version where there is a seperate PLL and clock divider for the
18 ARM core is available as a seperate driver.
19
20
21Layout
22------
23
24 The code core manages the CPU specific drivers, any data that they
25 need to register and the interface to the generic drivers/cpufreq
26 system. Each CPU registers a driver to control the PLL, clock dividers
27 and anything else associated with it. Any board that wants to use this
28 framework needs to supply at least basic details of what is required.
29
30 The core registers with drivers/cpufreq at init time if all the data
31 necessary has been supplied.
32
33
34CPU support
35-----------
36
37 The support for each CPU depends on the facilities provided by the
38 SoC and the driver as each device has different PLL and clock chains
39 associated with it.
40
41
42Slow Mode
43---------
44
45 The SLOW mode where the PLL is turned off altogether and the
46 system is fed by the external crystal input is currently not
47 supported.
48
49
50sysfs
51-----
52
53 The core code exports extra information via sysfs in the directory
54 devices/system/cpu/cpu0/arch-freq.
55
56
57Board Support
58-------------
59
60 Each board that wants to use the cpufreq code must register some basic
61 information with the core driver to provide information about what the
62 board requires and any restrictions being placed on it.
63
64 The board needs to supply information about whether it needs the IO bank
65 timings changing, any maximum frequency limits and information about the
66 SDRAM refresh rate.
67
68
69
70
71Document Author
72---------------
73
74Ben Dooks, Copyright 2009 Simtec Electronics
75Licensed under GPLv2
diff --git a/Documentation/btmrvl.txt b/Documentation/btmrvl.txt
new file mode 100644
index 000000000000..34916a46c099
--- /dev/null
+++ b/Documentation/btmrvl.txt
@@ -0,0 +1,119 @@
1=======================================================================
2 README for btmrvl driver
3=======================================================================
4
5
6All commands are used via debugfs interface.
7
8=====================
9Set/get driver configurations:
10
11Path: /debug/btmrvl/config/
12
13gpiogap=[n]
14hscfgcmd
15 These commands are used to configure the host sleep parameters.
16 bit 8:0 -- Gap
17 bit 16:8 -- GPIO
18
19 where GPIO is the pin number of GPIO used to wake up the host.
20 It could be any valid GPIO pin# (e.g. 0-7) or 0xff (SDIO interface
21 wakeup will be used instead).
22
23 where Gap is the gap in milli seconds between wakeup signal and
24 wakeup event, or 0xff for special host sleep setting.
25
26 Usage:
27 # Use SDIO interface to wake up the host and set GAP to 0x80:
28 echo 0xff80 > /debug/btmrvl/config/gpiogap
29 echo 1 > /debug/btmrvl/config/hscfgcmd
30
31 # Use GPIO pin #3 to wake up the host and set GAP to 0xff:
32 echo 0x03ff > /debug/btmrvl/config/gpiogap
33 echo 1 > /debug/btmrvl/config/hscfgcmd
34
35psmode=[n]
36pscmd
37 These commands are used to enable/disable auto sleep mode
38
39 where the option is:
40 1 -- Enable auto sleep mode
41 0 -- Disable auto sleep mode
42
43 Usage:
44 # Enable auto sleep mode
45 echo 1 > /debug/btmrvl/config/psmode
46 echo 1 > /debug/btmrvl/config/pscmd
47
48 # Disable auto sleep mode
49 echo 0 > /debug/btmrvl/config/psmode
50 echo 1 > /debug/btmrvl/config/pscmd
51
52
53hsmode=[n]
54hscmd
55 These commands are used to enable host sleep or wake up firmware
56
57 where the option is:
58 1 -- Enable host sleep
59 0 -- Wake up firmware
60
61 Usage:
62 # Enable host sleep
63 echo 1 > /debug/btmrvl/config/hsmode
64 echo 1 > /debug/btmrvl/config/hscmd
65
66 # Wake up firmware
67 echo 0 > /debug/btmrvl/config/hsmode
68 echo 1 > /debug/btmrvl/config/hscmd
69
70
71======================
72Get driver status:
73
74Path: /debug/btmrvl/status/
75
76Usage:
77 cat /debug/btmrvl/status/<args>
78
79where the args are:
80
81curpsmode
82 This command displays current auto sleep status.
83
84psstate
85 This command display the power save state.
86
87hsstate
88 This command display the host sleep state.
89
90txdnldrdy
91 This command displays the value of Tx download ready flag.
92
93
94=====================
95
96Use hcitool to issue raw hci command, refer to hcitool manual
97
98 Usage: Hcitool cmd <ogf> <ocf> [Parameters]
99
100 Interface Control Command
101 hcitool cmd 0x3f 0x5b 0xf5 0x01 0x00 --Enable All interface
102 hcitool cmd 0x3f 0x5b 0xf5 0x01 0x01 --Enable Wlan interface
103 hcitool cmd 0x3f 0x5b 0xf5 0x01 0x02 --Enable BT interface
104 hcitool cmd 0x3f 0x5b 0xf5 0x00 0x00 --Disable All interface
105 hcitool cmd 0x3f 0x5b 0xf5 0x00 0x01 --Disable Wlan interface
106 hcitool cmd 0x3f 0x5b 0xf5 0x00 0x02 --Disable BT interface
107
108=======================================================================
109
110
111SD8688 firmware:
112
113/lib/firmware/sd8688_helper.bin
114/lib/firmware/sd8688.bin
115
116
117The images can be downloaded from:
118
119git.infradead.org/users/dwmw2/linux-firmware.git/libertas/
diff --git a/Documentation/connector/Makefile b/Documentation/connector/Makefile
index 8df1a7285a06..d98e4df98e24 100644
--- a/Documentation/connector/Makefile
+++ b/Documentation/connector/Makefile
@@ -9,3 +9,8 @@ hostprogs-y := ucon
9always := $(hostprogs-y) 9always := $(hostprogs-y)
10 10
11HOSTCFLAGS_ucon.o += -I$(objtree)/usr/include 11HOSTCFLAGS_ucon.o += -I$(objtree)/usr/include
12
13all: modules
14
15modules clean:
16 $(MAKE) -C ../.. SUBDIRS=$(PWD) $@
diff --git a/Documentation/connector/cn_test.c b/Documentation/connector/cn_test.c
index 6a5be5d5c8e4..1711adc33373 100644
--- a/Documentation/connector/cn_test.c
+++ b/Documentation/connector/cn_test.c
@@ -19,6 +19,8 @@
19 * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA 19 * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
20 */ 20 */
21 21
22#define pr_fmt(fmt) "cn_test: " fmt
23
22#include <linux/kernel.h> 24#include <linux/kernel.h>
23#include <linux/module.h> 25#include <linux/module.h>
24#include <linux/moduleparam.h> 26#include <linux/moduleparam.h>
@@ -27,18 +29,17 @@
27 29
28#include <linux/connector.h> 30#include <linux/connector.h>
29 31
30static struct cb_id cn_test_id = { 0x123, 0x456 }; 32static struct cb_id cn_test_id = { CN_NETLINK_USERS + 3, 0x456 };
31static char cn_test_name[] = "cn_test"; 33static char cn_test_name[] = "cn_test";
32static struct sock *nls; 34static struct sock *nls;
33static struct timer_list cn_test_timer; 35static struct timer_list cn_test_timer;
34 36
35void cn_test_callback(void *data) 37static void cn_test_callback(struct cn_msg *msg)
36{ 38{
37 struct cn_msg *msg = (struct cn_msg *)data; 39 pr_info("%s: %lu: idx=%x, val=%x, seq=%u, ack=%u, len=%d: %s.\n",
38 40 __func__, jiffies, msg->id.idx, msg->id.val,
39 printk("%s: %lu: idx=%x, val=%x, seq=%u, ack=%u, len=%d: %s.\n", 41 msg->seq, msg->ack, msg->len,
40 __func__, jiffies, msg->id.idx, msg->id.val, 42 msg->len ? (char *)msg->data : "");
41 msg->seq, msg->ack, msg->len, (char *)msg->data);
42} 43}
43 44
44/* 45/*
@@ -63,9 +64,7 @@ static int cn_test_want_notify(void)
63 64
64 skb = alloc_skb(size, GFP_ATOMIC); 65 skb = alloc_skb(size, GFP_ATOMIC);
65 if (!skb) { 66 if (!skb) {
66 printk(KERN_ERR "Failed to allocate new skb with size=%u.\n", 67 pr_err("failed to allocate new skb with size=%u\n", size);
67 size);
68
69 return -ENOMEM; 68 return -ENOMEM;
70 } 69 }
71 70
@@ -114,12 +113,12 @@ static int cn_test_want_notify(void)
114 //netlink_broadcast(nls, skb, 0, ctl->group, GFP_ATOMIC); 113 //netlink_broadcast(nls, skb, 0, ctl->group, GFP_ATOMIC);
115 netlink_unicast(nls, skb, 0, 0); 114 netlink_unicast(nls, skb, 0, 0);
116 115
117 printk(KERN_INFO "Request was sent. Group=0x%x.\n", ctl->group); 116 pr_info("request was sent: group=0x%x\n", ctl->group);
118 117
119 return 0; 118 return 0;
120 119
121nlmsg_failure: 120nlmsg_failure:
122 printk(KERN_ERR "Failed to send %u.%u\n", msg->seq, msg->ack); 121 pr_err("failed to send %u.%u\n", msg->seq, msg->ack);
123 kfree_skb(skb); 122 kfree_skb(skb);
124 return -EINVAL; 123 return -EINVAL;
125} 124}
@@ -131,6 +130,8 @@ static void cn_test_timer_func(unsigned long __data)
131 struct cn_msg *m; 130 struct cn_msg *m;
132 char data[32]; 131 char data[32];
133 132
133 pr_debug("%s: timer fired with data %lu\n", __func__, __data);
134
134 m = kzalloc(sizeof(*m) + sizeof(data), GFP_ATOMIC); 135 m = kzalloc(sizeof(*m) + sizeof(data), GFP_ATOMIC);
135 if (m) { 136 if (m) {
136 137
@@ -150,7 +151,7 @@ static void cn_test_timer_func(unsigned long __data)
150 151
151 cn_test_timer_counter++; 152 cn_test_timer_counter++;
152 153
153 mod_timer(&cn_test_timer, jiffies + HZ); 154 mod_timer(&cn_test_timer, jiffies + msecs_to_jiffies(1000));
154} 155}
155 156
156static int cn_test_init(void) 157static int cn_test_init(void)
@@ -168,8 +169,10 @@ static int cn_test_init(void)
168 } 169 }
169 170
170 setup_timer(&cn_test_timer, cn_test_timer_func, 0); 171 setup_timer(&cn_test_timer, cn_test_timer_func, 0);
171 cn_test_timer.expires = jiffies + HZ; 172 mod_timer(&cn_test_timer, jiffies + msecs_to_jiffies(1000));
172 add_timer(&cn_test_timer); 173
174 pr_info("initialized with id={%u.%u}\n",
175 cn_test_id.idx, cn_test_id.val);
173 176
174 return 0; 177 return 0;
175 178
diff --git a/Documentation/connector/connector.txt b/Documentation/connector/connector.txt
index ad6e0ba7b38c..81e6bf6ead57 100644
--- a/Documentation/connector/connector.txt
+++ b/Documentation/connector/connector.txt
@@ -5,10 +5,10 @@ Kernel Connector.
5Kernel connector - new netlink based userspace <-> kernel space easy 5Kernel connector - new netlink based userspace <-> kernel space easy
6to use communication module. 6to use communication module.
7 7
8Connector driver adds possibility to connect various agents using 8The Connector driver makes it easy to connect various agents using a
9netlink based network. One must register callback and 9netlink based network. One must register a callback and an identifier.
10identifier. When driver receives special netlink message with 10When the driver receives a special netlink message with the appropriate
11appropriate identifier, appropriate callback will be called. 11identifier, the appropriate callback will be called.
12 12
13From the userspace point of view it's quite straightforward: 13From the userspace point of view it's quite straightforward:
14 14
@@ -17,10 +17,10 @@ From the userspace point of view it's quite straightforward:
17 send(); 17 send();
18 recv(); 18 recv();
19 19
20But if kernelspace want to use full power of such connections, driver 20But if kernelspace wants to use the full power of such connections, the
21writer must create special sockets, must know about struct sk_buff 21driver writer must create special sockets, must know about struct sk_buff
22handling... Connector allows any kernelspace agents to use netlink 22handling, etc... The Connector driver allows any kernelspace agents to use
23based networking for inter-process communication in a significantly 23netlink based networking for inter-process communication in a significantly
24easier way: 24easier way:
25 25
26int cn_add_callback(struct cb_id *id, char *name, void (*callback) (void *)); 26int cn_add_callback(struct cb_id *id, char *name, void (*callback) (void *));
@@ -32,15 +32,15 @@ struct cb_id
32 __u32 val; 32 __u32 val;
33}; 33};
34 34
35idx and val are unique identifiers which must be registered in 35idx and val are unique identifiers which must be registered in the
36connector.h for in-kernel usage. void (*callback) (void *) - is a 36connector.h header for in-kernel usage. void (*callback) (void *) is a
37callback function which will be called when message with above idx.val 37callback function which will be called when a message with above idx.val
38will be received by connector core. Argument for that function must 38is received by the connector core. The argument for that function must
39be dereferenced to struct cn_msg *. 39be dereferenced to struct cn_msg *.
40 40
41struct cn_msg 41struct cn_msg
42{ 42{
43 struct cb_id id; 43 struct cb_id id;
44 44
45 __u32 seq; 45 __u32 seq;
46 __u32 ack; 46 __u32 ack;
@@ -55,92 +55,95 @@ Connector interfaces.
55 55
56int cn_add_callback(struct cb_id *id, char *name, void (*callback) (void *)); 56int cn_add_callback(struct cb_id *id, char *name, void (*callback) (void *));
57 57
58Registers new callback with connector core. 58 Registers new callback with connector core.
59 59
60struct cb_id *id - unique connector's user identifier. 60 struct cb_id *id - unique connector's user identifier.
61 It must be registered in connector.h for legal in-kernel users. 61 It must be registered in connector.h for legal in-kernel users.
62char *name - connector's callback symbolic name. 62 char *name - connector's callback symbolic name.
63void (*callback) (void *) - connector's callback. 63 void (*callback) (void *) - connector's callback.
64 Argument must be dereferenced to struct cn_msg *. 64 Argument must be dereferenced to struct cn_msg *.
65 65
66
66void cn_del_callback(struct cb_id *id); 67void cn_del_callback(struct cb_id *id);
67 68
68Unregisters new callback with connector core. 69 Unregisters new callback with connector core.
70
71 struct cb_id *id - unique connector's user identifier.
69 72
70struct cb_id *id - unique connector's user identifier.
71 73
72int cn_netlink_send(struct cn_msg *msg, u32 __groups, int gfp_mask); 74int cn_netlink_send(struct cn_msg *msg, u32 __groups, int gfp_mask);
73 75
74Sends message to the specified groups. It can be safely called from 76 Sends message to the specified groups. It can be safely called from
75softirq context, but may silently fail under strong memory pressure. 77 softirq context, but may silently fail under strong memory pressure.
76If there are no listeners for given group -ESRCH can be returned. 78 If there are no listeners for given group -ESRCH can be returned.
77 79
78struct cn_msg * - message header(with attached data). 80 struct cn_msg * - message header(with attached data).
79u32 __group - destination group. 81 u32 __group - destination group.
80 If __group is zero, then appropriate group will 82 If __group is zero, then appropriate group will
81 be searched through all registered connector users, 83 be searched through all registered connector users,
82 and message will be delivered to the group which was 84 and message will be delivered to the group which was
83 created for user with the same ID as in msg. 85 created for user with the same ID as in msg.
84 If __group is not zero, then message will be delivered 86 If __group is not zero, then message will be delivered
85 to the specified group. 87 to the specified group.
86int gfp_mask - GFP mask. 88 int gfp_mask - GFP mask.
87 89
88Note: When registering new callback user, connector core assigns 90 Note: When registering new callback user, connector core assigns
89netlink group to the user which is equal to it's id.idx. 91 netlink group to the user which is equal to it's id.idx.
90 92
91/*****************************************/ 93/*****************************************/
92Protocol description. 94Protocol description.
93/*****************************************/ 95/*****************************************/
94 96
95Current offers transport layer with fixed header. Recommended 97The current framework offers a transport layer with fixed headers. The
96protocol which uses such header is following: 98recommended protocol which uses such a header is as following:
97 99
98msg->seq and msg->ack are used to determine message genealogy. When 100msg->seq and msg->ack are used to determine message genealogy. When
99someone sends message it puts there locally unique sequence and random 101someone sends a message, they use a locally unique sequence and random
100acknowledge numbers. Sequence number may be copied into 102acknowledge number. The sequence number may be copied into
101nlmsghdr->nlmsg_seq too. 103nlmsghdr->nlmsg_seq too.
102 104
103Sequence number is incremented with each message to be sent. 105The sequence number is incremented with each message sent.
104 106
105If we expect reply to our message, then sequence number in received 107If you expect a reply to the message, then the sequence number in the
106message MUST be the same as in original message, and acknowledge 108received message MUST be the same as in the original message, and the
107number MUST be the same + 1. 109acknowledge number MUST be the same + 1.
108 110
109If we receive message and it's sequence number is not equal to one we 111If we receive a message and its sequence number is not equal to one we
110are expecting, then it is new message. If we receive message and it's 112are expecting, then it is a new message. If we receive a message and
111sequence number is the same as one we are expecting, but it's 113its sequence number is the same as one we are expecting, but its
112acknowledge is not equal acknowledge number in original message + 1, 114acknowledge is not equal to the acknowledge number in the original
113then it is new message. 115message + 1, then it is a new message.
114 116
115Obviously, protocol header contains above id. 117Obviously, the protocol header contains the above id.
116 118
117connector allows event notification in the following form: kernel 119The connector allows event notification in the following form: kernel
118driver or userspace process can ask connector to notify it when 120driver or userspace process can ask connector to notify it when
119selected id's will be turned on or off(registered or unregistered it's 121selected ids will be turned on or off (registered or unregistered its
120callback). It is done by sending special command to connector 122callback). It is done by sending a special command to the connector
121driver(it also registers itself with id={-1, -1}). 123driver (it also registers itself with id={-1, -1}).
122 124
123As example of usage Documentation/connector now contains cn_test.c - 125As example of this usage can be found in the cn_test.c module which
124testing module which uses connector to request notification and to 126uses the connector to request notification and to send messages.
125send messages.
126 127
127/*****************************************/ 128/*****************************************/
128Reliability. 129Reliability.
129/*****************************************/ 130/*****************************************/
130 131
131Netlink itself is not reliable protocol, that means that messages can 132Netlink itself is not a reliable protocol. That means that messages can
132be lost due to memory pressure or process' receiving queue overflowed, 133be lost due to memory pressure or process' receiving queue overflowed,
133so caller is warned must be prepared. That is why struct cn_msg [main 134so caller is warned that it must be prepared. That is why the struct
134connector's message header] contains u32 seq and u32 ack fields. 135cn_msg [main connector's message header] contains u32 seq and u32 ack
136fields.
135 137
136/*****************************************/ 138/*****************************************/
137Userspace usage. 139Userspace usage.
138/*****************************************/ 140/*****************************************/
141
1392.6.14 has a new netlink socket implementation, which by default does not 1422.6.14 has a new netlink socket implementation, which by default does not
140allow to send data to netlink groups other than 1. 143allow people to send data to netlink groups other than 1.
141So, if to use netlink socket (for example using connector) 144So, if you wish to use a netlink socket (for example using connector)
142with different group number userspace application must subscribe to 145with a different group number, the userspace application must subscribe to
143that group. It can be achieved by following pseudocode: 146that group first. It can be achieved by the following pseudocode:
144 147
145s = socket(PF_NETLINK, SOCK_DGRAM, NETLINK_CONNECTOR); 148s = socket(PF_NETLINK, SOCK_DGRAM, NETLINK_CONNECTOR);
146 149
@@ -160,8 +163,8 @@ if (bind(s, (struct sockaddr *)&l_local, sizeof(struct sockaddr_nl)) == -1) {
160} 163}
161 164
162Where 270 above is SOL_NETLINK, and 1 is a NETLINK_ADD_MEMBERSHIP socket 165Where 270 above is SOL_NETLINK, and 1 is a NETLINK_ADD_MEMBERSHIP socket
163option. To drop multicast subscription one should call above socket option 166option. To drop a multicast subscription, one should call the above socket
164with NETLINK_DROP_MEMBERSHIP parameter which is defined as 0. 167option with the NETLINK_DROP_MEMBERSHIP parameter which is defined as 0.
165 168
1662.6.14 netlink code only allows to select a group which is less or equal to 1692.6.14 netlink code only allows to select a group which is less or equal to
167the maximum group number, which is used at netlink_kernel_create() time. 170the maximum group number, which is used at netlink_kernel_create() time.
diff --git a/Documentation/connector/ucon.c b/Documentation/connector/ucon.c
index c5092ad0ce4b..4848db8c71ff 100644
--- a/Documentation/connector/ucon.c
+++ b/Documentation/connector/ucon.c
@@ -30,18 +30,24 @@
30 30
31#include <arpa/inet.h> 31#include <arpa/inet.h>
32 32
33#include <stdbool.h>
33#include <stdio.h> 34#include <stdio.h>
34#include <stdlib.h> 35#include <stdlib.h>
35#include <unistd.h> 36#include <unistd.h>
36#include <string.h> 37#include <string.h>
37#include <errno.h> 38#include <errno.h>
38#include <time.h> 39#include <time.h>
40#include <getopt.h>
39 41
40#include <linux/connector.h> 42#include <linux/connector.h>
41 43
42#define DEBUG 44#define DEBUG
43#define NETLINK_CONNECTOR 11 45#define NETLINK_CONNECTOR 11
44 46
47/* Hopefully your userspace connector.h matches this kernel */
48#define CN_TEST_IDX CN_NETLINK_USERS + 3
49#define CN_TEST_VAL 0x456
50
45#ifdef DEBUG 51#ifdef DEBUG
46#define ulog(f, a...) fprintf(stdout, f, ##a) 52#define ulog(f, a...) fprintf(stdout, f, ##a)
47#else 53#else
@@ -83,6 +89,25 @@ static int netlink_send(int s, struct cn_msg *msg)
83 return err; 89 return err;
84} 90}
85 91
92static void usage(void)
93{
94 printf(
95 "Usage: ucon [options] [output file]\n"
96 "\n"
97 "\t-h\tthis help screen\n"
98 "\t-s\tsend buffers to the test module\n"
99 "\n"
100 "The default behavior of ucon is to subscribe to the test module\n"
101 "and wait for state messages. Any ones received are dumped to the\n"
102 "specified output file (or stdout). The test module is assumed to\n"
103 "have an id of {%u.%u}\n"
104 "\n"
105 "If you get no output, then verify the cn_test module id matches\n"
106 "the expected id above.\n"
107 , CN_TEST_IDX, CN_TEST_VAL
108 );
109}
110
86int main(int argc, char *argv[]) 111int main(int argc, char *argv[])
87{ 112{
88 int s; 113 int s;
@@ -94,17 +119,34 @@ int main(int argc, char *argv[])
94 FILE *out; 119 FILE *out;
95 time_t tm; 120 time_t tm;
96 struct pollfd pfd; 121 struct pollfd pfd;
122 bool send_msgs = false;
97 123
98 if (argc < 2) 124 while ((s = getopt(argc, argv, "hs")) != -1) {
99 out = stdout; 125 switch (s) {
100 else { 126 case 's':
101 out = fopen(argv[1], "a+"); 127 send_msgs = true;
128 break;
129
130 case 'h':
131 usage();
132 return 0;
133
134 default:
135 /* getopt() outputs an error for us */
136 usage();
137 return 1;
138 }
139 }
140
141 if (argc != optind) {
142 out = fopen(argv[optind], "a+");
102 if (!out) { 143 if (!out) {
103 ulog("Unable to open %s for writing: %s\n", 144 ulog("Unable to open %s for writing: %s\n",
104 argv[1], strerror(errno)); 145 argv[1], strerror(errno));
105 out = stdout; 146 out = stdout;
106 } 147 }
107 } 148 } else
149 out = stdout;
108 150
109 memset(buf, 0, sizeof(buf)); 151 memset(buf, 0, sizeof(buf));
110 152
@@ -115,9 +157,11 @@ int main(int argc, char *argv[])
115 } 157 }
116 158
117 l_local.nl_family = AF_NETLINK; 159 l_local.nl_family = AF_NETLINK;
118 l_local.nl_groups = 0x123; /* bitmask of requested groups */ 160 l_local.nl_groups = -1; /* bitmask of requested groups */
119 l_local.nl_pid = 0; 161 l_local.nl_pid = 0;
120 162
163 ulog("subscribing to %u.%u\n", CN_TEST_IDX, CN_TEST_VAL);
164
121 if (bind(s, (struct sockaddr *)&l_local, sizeof(struct sockaddr_nl)) == -1) { 165 if (bind(s, (struct sockaddr *)&l_local, sizeof(struct sockaddr_nl)) == -1) {
122 perror("bind"); 166 perror("bind");
123 close(s); 167 close(s);
@@ -130,15 +174,15 @@ int main(int argc, char *argv[])
130 setsockopt(s, SOL_NETLINK, NETLINK_ADD_MEMBERSHIP, &on, sizeof(on)); 174 setsockopt(s, SOL_NETLINK, NETLINK_ADD_MEMBERSHIP, &on, sizeof(on));
131 } 175 }
132#endif 176#endif
133 if (0) { 177 if (send_msgs) {
134 int i, j; 178 int i, j;
135 179
136 memset(buf, 0, sizeof(buf)); 180 memset(buf, 0, sizeof(buf));
137 181
138 data = (struct cn_msg *)buf; 182 data = (struct cn_msg *)buf;
139 183
140 data->id.idx = 0x123; 184 data->id.idx = CN_TEST_IDX;
141 data->id.val = 0x456; 185 data->id.val = CN_TEST_VAL;
142 data->seq = seq++; 186 data->seq = seq++;
143 data->ack = 0; 187 data->ack = 0;
144 data->len = 0; 188 data->len = 0;
diff --git a/Documentation/cpu-freq/user-guide.txt b/Documentation/cpu-freq/user-guide.txt
index 5d5f5fadd1c2..2a5b850847c0 100644
--- a/Documentation/cpu-freq/user-guide.txt
+++ b/Documentation/cpu-freq/user-guide.txt
@@ -176,7 +176,9 @@ scaling_governor, and by "echoing" the name of another
176 work on some specific architectures or 176 work on some specific architectures or
177 processors. 177 processors.
178 178
179cpuinfo_cur_freq : Current speed of the CPU, in KHz. 179cpuinfo_cur_freq : Current frequency of the CPU as obtained from
180 the hardware, in KHz. This is the frequency
181 the CPU actually runs at.
180 182
181scaling_available_frequencies : List of available frequencies, in KHz. 183scaling_available_frequencies : List of available frequencies, in KHz.
182 184
@@ -196,7 +198,10 @@ related_cpus : List of CPUs that need some sort of frequency
196 198
197scaling_driver : Hardware driver for cpufreq. 199scaling_driver : Hardware driver for cpufreq.
198 200
199scaling_cur_freq : Current frequency of the CPU, in KHz. 201scaling_cur_freq : Current frequency of the CPU as determined by
202 the governor and cpufreq core, in KHz. This is
203 the frequency the kernel thinks the CPU runs
204 at.
200 205
201If you have selected the "userspace" governor which allows you to 206If you have selected the "userspace" governor which allows you to
202set the CPU operating frequency to a specific value, you can read out 207set the CPU operating frequency to a specific value, you can read out
diff --git a/Documentation/dontdiff b/Documentation/dontdiff
index 88519daab6e9..e1efc400bed6 100644
--- a/Documentation/dontdiff
+++ b/Documentation/dontdiff
@@ -152,7 +152,6 @@ piggy.gz
152piggyback 152piggyback
153pnmtologo 153pnmtologo
154ppc_defs.h* 154ppc_defs.h*
155promcon_tbl.c
156pss_boot.h 155pss_boot.h
157qconf 156qconf
158raid6altivec*.c 157raid6altivec*.c
diff --git a/Documentation/feature-removal-schedule.txt b/Documentation/feature-removal-schedule.txt
index 09e031c55887..fa75220f8d34 100644
--- a/Documentation/feature-removal-schedule.txt
+++ b/Documentation/feature-removal-schedule.txt
@@ -6,6 +6,35 @@ be removed from this file.
6 6
7--------------------------- 7---------------------------
8 8
9What: PRISM54
10When: 2.6.34
11
12Why: prism54 FullMAC PCI / Cardbus devices used to be supported only by the
13 prism54 wireless driver. After Intersil stopped selling these
14 devices in preference for the newer more flexible SoftMAC devices
15 a SoftMAC device driver was required and prism54 did not support
16 them. The p54pci driver now exists and has been present in the kernel for
17 a while. This driver supports both SoftMAC devices and FullMAC devices.
18 The main difference between these devices was the amount of memory which
19 could be used for the firmware. The SoftMAC devices support a smaller
20 amount of memory. Because of this the SoftMAC firmware fits into FullMAC
21 devices's memory. p54pci supports not only PCI / Cardbus but also USB
22 and SPI. Since p54pci supports all devices prism54 supports
23 you will have a conflict. I'm not quite sure how distributions are
24 handling this conflict right now. prism54 was kept around due to
25 claims users may experience issues when using the SoftMAC driver.
26 Time has passed users have not reported issues. If you use prism54
27 and for whatever reason you cannot use p54pci please let us know!
28 E-mail us at: linux-wireless@vger.kernel.org
29
30 For more information see the p54 wiki page:
31
32 http://wireless.kernel.org/en/users/Drivers/p54
33
34Who: Luis R. Rodriguez <lrodriguez@atheros.com>
35
36---------------------------
37
9What: IRQF_SAMPLE_RANDOM 38What: IRQF_SAMPLE_RANDOM
10Check: IRQF_SAMPLE_RANDOM 39Check: IRQF_SAMPLE_RANDOM
11When: July 2009 40When: July 2009
@@ -206,24 +235,6 @@ Who: Len Brown <len.brown@intel.com>
206 235
207--------------------------- 236---------------------------
208 237
209What: libata spindown skipping and warning
210When: Dec 2008
211Why: Some halt(8) implementations synchronize caches for and spin
212 down libata disks because libata didn't use to spin down disk on
213 system halt (only synchronized caches).
214 Spin down on system halt is now implemented. sysfs node
215 /sys/class/scsi_disk/h:c:i:l/manage_start_stop is present if
216 spin down support is available.
217 Because issuing spin down command to an already spun down disk
218 makes some disks spin up just to spin down again, libata tracks
219 device spindown status to skip the extra spindown command and
220 warn about it.
221 This is to give userspace tools the time to get updated and will
222 be removed after userspace is reasonably updated.
223Who: Tejun Heo <htejun@gmail.com>
224
225---------------------------
226
227What: i386/x86_64 bzImage symlinks 238What: i386/x86_64 bzImage symlinks
228When: April 2010 239When: April 2010
229 240
@@ -235,31 +246,6 @@ Who: Thomas Gleixner <tglx@linutronix.de>
235--------------------------- 246---------------------------
236 247
237What (Why): 248What (Why):
238 - include/linux/netfilter_ipv4/ipt_TOS.h ipt_tos.h header files
239 (superseded by xt_TOS/xt_tos target & match)
240
241 - "forwarding" header files like ipt_mac.h in
242 include/linux/netfilter_ipv4/ and include/linux/netfilter_ipv6/
243
244 - xt_CONNMARK match revision 0
245 (superseded by xt_CONNMARK match revision 1)
246
247 - xt_MARK target revisions 0 and 1
248 (superseded by xt_MARK match revision 2)
249
250 - xt_connmark match revision 0
251 (superseded by xt_connmark match revision 1)
252
253 - xt_conntrack match revision 0
254 (superseded by xt_conntrack match revision 1)
255
256 - xt_iprange match revision 0,
257 include/linux/netfilter_ipv4/ipt_iprange.h
258 (superseded by xt_iprange match revision 1)
259
260 - xt_mark match revision 0
261 (superseded by xt_mark match revision 1)
262
263 - xt_recent: the old ipt_recent proc dir 249 - xt_recent: the old ipt_recent proc dir
264 (superseded by /proc/net/xt_recent) 250 (superseded by /proc/net/xt_recent)
265 251
@@ -394,15 +380,6 @@ Who: Thomas Gleixner <tglx@linutronix.de>
394 380
395----------------------------- 381-----------------------------
396 382
397What: obsolete generic irq defines and typedefs
398When: 2.6.30
399Why: The defines and typedefs (hw_interrupt_type, no_irq_type, irq_desc_t)
400 have been kept around for migration reasons. After more than two years
401 it's time to remove them finally
402Who: Thomas Gleixner <tglx@linutronix.de>
403
404---------------------------
405
406What: fakephp and associated sysfs files in /sys/bus/pci/slots/ 383What: fakephp and associated sysfs files in /sys/bus/pci/slots/
407When: 2011 384When: 2011
408Why: In 2.6.27, the semantics of /sys/bus/pci/slots was redefined to 385Why: In 2.6.27, the semantics of /sys/bus/pci/slots was redefined to
@@ -451,16 +428,6 @@ Who: Johannes Berg <johannes@sipsolutions.net>
451 428
452---------------------------- 429----------------------------
453 430
454What: CONFIG_X86_OLD_MCE
455When: 2.6.32
456Why: Remove the old legacy 32bit machine check code. This has been
457 superseded by the newer machine check code from the 64bit port,
458 but the old version has been kept around for easier testing. Note this
459 doesn't impact the old P5 and WinChip machine check handlers.
460Who: Andi Kleen <andi@firstfloor.org>
461
462----------------------------
463
464What: lock_policy_rwsem_* and unlock_policy_rwsem_* will not be 431What: lock_policy_rwsem_* and unlock_policy_rwsem_* will not be
465 exported interface anymore. 432 exported interface anymore.
466When: 2.6.33 433When: 2.6.33
@@ -468,3 +435,27 @@ Why: cpu_policy_rwsem has a new cleaner definition making it local to
468 cpufreq core and contained inside cpufreq.c. Other dependent 435 cpufreq core and contained inside cpufreq.c. Other dependent
469 drivers should not use it in order to safely avoid lockdep issues. 436 drivers should not use it in order to safely avoid lockdep issues.
470Who: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> 437Who: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
438
439----------------------------
440
441What: sound-slot/service-* module aliases and related clutters in
442 sound/sound_core.c
443When: August 2010
444Why: OSS sound_core grabs all legacy minors (0-255) of SOUND_MAJOR
445 (14) and requests modules using custom sound-slot/service-*
446 module aliases. The only benefit of doing this is allowing
447 use of custom module aliases which might as well be considered
448 a bug at this point. This preemptive claiming prevents
449 alternative OSS implementations.
450
451 Till the feature is removed, the kernel will be requesting
452 both sound-slot/service-* and the standard char-major-* module
453 aliases and allow turning off the pre-claiming selectively via
454 CONFIG_SOUND_OSS_CORE_PRECLAIM and soundcore.preclaim_oss
455 kernel parameter.
456
457 After the transition phase is complete, both the custom module
458 aliases and switches to disable it will go away. This removal
459 will also allow making ALSA OSS emulation independent of
460 sound_core. The dependency will be broken then too.
461Who: Tejun Heo <tj@kernel.org>
diff --git a/Documentation/filesystems/9p.txt b/Documentation/filesystems/9p.txt
index bf8080640eba..6208f55c44c3 100644
--- a/Documentation/filesystems/9p.txt
+++ b/Documentation/filesystems/9p.txt
@@ -123,6 +123,9 @@ available from the same CVS repository.
123There are user and developer mailing lists available through the v9fs project 123There are user and developer mailing lists available through the v9fs project
124on sourceforge (http://sourceforge.net/projects/v9fs). 124on sourceforge (http://sourceforge.net/projects/v9fs).
125 125
126A stand-alone version of the module (which should build for any 2.6 kernel)
127is available via (http://github.com/ericvh/9p-sac/tree/master)
128
126News and other information is maintained on SWiK (http://swik.net/v9fs). 129News and other information is maintained on SWiK (http://swik.net/v9fs).
127 130
128Bug reports may be issued through the kernel.org bugzilla 131Bug reports may be issued through the kernel.org bugzilla
diff --git a/Documentation/filesystems/ext4.txt b/Documentation/filesystems/ext4.txt
index 7be02ac5fa36..18b5ec8cea45 100644
--- a/Documentation/filesystems/ext4.txt
+++ b/Documentation/filesystems/ext4.txt
@@ -134,15 +134,9 @@ ro Mount filesystem read only. Note that ext4 will
134 mount options "ro,noload" can be used to prevent 134 mount options "ro,noload" can be used to prevent
135 writes to the filesystem. 135 writes to the filesystem.
136 136
137journal_checksum Enable checksumming of the journal transactions.
138 This will allow the recovery code in e2fsck and the
139 kernel to detect corruption in the kernel. It is a
140 compatible change and will be ignored by older kernels.
141
142journal_async_commit Commit block can be written to disk without waiting 137journal_async_commit Commit block can be written to disk without waiting
143 for descriptor blocks. If enabled older kernels cannot 138 for descriptor blocks. If enabled older kernels cannot
144 mount the device. This will enable 'journal_checksum' 139 mount the device.
145 internally.
146 140
147journal=update Update the ext4 file system's journal to the current 141journal=update Update the ext4 file system's journal to the current
148 format. 142 format.
@@ -263,10 +257,18 @@ resuid=n The user ID which may use the reserved blocks.
263 257
264sb=n Use alternate superblock at this location. 258sb=n Use alternate superblock at this location.
265 259
266quota 260quota These options are ignored by the filesystem. They
267noquota 261noquota are used only by quota tools to recognize volumes
268grpquota 262grpquota where quota should be turned on. See documentation
269usrquota 263usrquota in the quota-tools package for more details
264 (http://sourceforge.net/projects/linuxquota).
265
266jqfmt=<quota type> These options tell filesystem details about quota
267usrjquota=<file> so that quota information can be properly updated
268grpjquota=<file> during journal replay. They replace the above
269 quota options. See documentation in the quota-tools
270 package for more details
271 (http://sourceforge.net/projects/linuxquota).
270 272
271bh (*) ext4 associates buffer heads to data pages to 273bh (*) ext4 associates buffer heads to data pages to
272nobh (a) cache disk block mapping information 274nobh (a) cache disk block mapping information
diff --git a/Documentation/filesystems/gfs2-uevents.txt b/Documentation/filesystems/gfs2-uevents.txt
new file mode 100644
index 000000000000..fd966dc9979a
--- /dev/null
+++ b/Documentation/filesystems/gfs2-uevents.txt
@@ -0,0 +1,100 @@
1 uevents and GFS2
2 ==================
3
4During the lifetime of a GFS2 mount, a number of uevents are generated.
5This document explains what the events are and what they are used
6for (by gfs_controld in gfs2-utils).
7
8A list of GFS2 uevents
9-----------------------
10
111. ADD
12
13The ADD event occurs at mount time. It will always be the first
14uevent generated by the newly created filesystem. If the mount
15is successful, an ONLINE uevent will follow. If it is not successful
16then a REMOVE uevent will follow.
17
18The ADD uevent has two environment variables: SPECTATOR=[0|1]
19and RDONLY=[0|1] that specify the spectator status (a read-only mount
20with no journal assigned), and read-only (with journal assigned) status
21of the filesystem respectively.
22
232. ONLINE
24
25The ONLINE uevent is generated after a successful mount or remount. It
26has the same environment variables as the ADD uevent. The ONLINE
27uevent, along with the two environment variables for spectator and
28RDONLY are a relatively recent addition (2.6.32-rc+) and will not
29be generated by older kernels.
30
313. CHANGE
32
33The CHANGE uevent is used in two places. One is when reporting the
34successful mount of the filesystem by the first node (FIRSTMOUNT=Done).
35This is used as a signal by gfs_controld that it is then ok for other
36nodes in the cluster to mount the filesystem.
37
38The other CHANGE uevent is used to inform of the completion
39of journal recovery for one of the filesystems journals. It has
40two environment variables, JID= which specifies the journal id which
41has just been recovered, and RECOVERY=[Done|Failed] to indicate the
42success (or otherwise) of the operation. These uevents are generated
43for every journal recovered, whether it is during the initial mount
44process or as the result of gfs_controld requesting a specific journal
45recovery via the /sys/fs/gfs2/<fsname>/lock_module/recovery file.
46
47Because the CHANGE uevent was used (in early versions of gfs_controld)
48without checking the environment variables to discover the state, we
49cannot add any more functions to it without running the risk of
50someone using an older version of the user tools and breaking their
51cluster. For this reason the ONLINE uevent was used when adding a new
52uevent for a successful mount or remount.
53
544. OFFLINE
55
56The OFFLINE uevent is only generated due to filesystem errors and is used
57as part of the "withdraw" mechanism. Currently this doesn't give any
58information about what the error is, which is something that needs to
59be fixed.
60
615. REMOVE
62
63The REMOVE uevent is generated at the end of an unsuccessful mount
64or at the end of a umount of the filesystem. All REMOVE uevents will
65have been preceeded by at least an ADD uevent for the same fileystem,
66and unlike the other uevents is generated automatically by the kernel's
67kobject subsystem.
68
69
70Information common to all GFS2 uevents (uevent environment variables)
71----------------------------------------------------------------------
72
731. LOCKTABLE=
74
75The LOCKTABLE is a string, as supplied on the mount command
76line (locktable=) or via fstab. It is used as a filesystem label
77as well as providing the information for a lock_dlm mount to be
78able to join the cluster.
79
802. LOCKPROTO=
81
82The LOCKPROTO is a string, and its value depends on what is set
83on the mount command line, or via fstab. It will be either
84lock_nolock or lock_dlm. In the future other lock managers
85may be supported.
86
873. JOURNALID=
88
89If a journal is in use by the filesystem (journals are not
90assigned for spectator mounts) then this will give the
91numeric journal id in all GFS2 uevents.
92
934. UUID=
94
95With recent versions of gfs2-utils, mkfs.gfs2 writes a UUID
96into the filesystem superblock. If it exists, this will
97be included in every uevent relating to the filesystem.
98
99
100
diff --git a/Documentation/filesystems/nfs.txt b/Documentation/filesystems/nfs.txt
new file mode 100644
index 000000000000..f50f26ce6cd0
--- /dev/null
+++ b/Documentation/filesystems/nfs.txt
@@ -0,0 +1,98 @@
1
2The NFS client
3==============
4
5The NFS version 2 protocol was first documented in RFC1094 (March 1989).
6Since then two more major releases of NFS have been published, with NFSv3
7being documented in RFC1813 (June 1995), and NFSv4 in RFC3530 (April
82003).
9
10The Linux NFS client currently supports all the above published versions,
11and work is in progress on adding support for minor version 1 of the NFSv4
12protocol.
13
14The purpose of this document is to provide information on some of the
15upcall interfaces that are used in order to provide the NFS client with
16some of the information that it requires in order to fully comply with
17the NFS spec.
18
19The DNS resolver
20================
21
22NFSv4 allows for one server to refer the NFS client to data that has been
23migrated onto another server by means of the special "fs_locations"
24attribute. See
25 http://tools.ietf.org/html/rfc3530#section-6
26and
27 http://tools.ietf.org/html/draft-ietf-nfsv4-referrals-00
28
29The fs_locations information can take the form of either an ip address and
30a path, or a DNS hostname and a path. The latter requires the NFS client to
31do a DNS lookup in order to mount the new volume, and hence the need for an
32upcall to allow userland to provide this service.
33
34Assuming that the user has the 'rpc_pipefs' filesystem mounted in the usual
35/var/lib/nfs/rpc_pipefs, the upcall consists of the following steps:
36
37 (1) The process checks the dns_resolve cache to see if it contains a
38 valid entry. If so, it returns that entry and exits.
39
40 (2) If no valid entry exists, the helper script '/sbin/nfs_cache_getent'
41 (may be changed using the 'nfs.cache_getent' kernel boot parameter)
42 is run, with two arguments:
43 - the cache name, "dns_resolve"
44 - the hostname to resolve
45
46 (3) After looking up the corresponding ip address, the helper script
47 writes the result into the rpc_pipefs pseudo-file
48 '/var/lib/nfs/rpc_pipefs/cache/dns_resolve/channel'
49 in the following (text) format:
50
51 "<ip address> <hostname> <ttl>\n"
52
53 Where <ip address> is in the usual IPv4 (123.456.78.90) or IPv6
54 (ffee:ddcc:bbaa:9988:7766:5544:3322:1100, ffee::1100, ...) format.
55 <hostname> is identical to the second argument of the helper
56 script, and <ttl> is the 'time to live' of this cache entry (in
57 units of seconds).
58
59 Note: If <ip address> is invalid, say the string "0", then a negative
60 entry is created, which will cause the kernel to treat the hostname
61 as having no valid DNS translation.
62
63
64
65
66A basic sample /sbin/nfs_cache_getent
67=====================================
68
69#!/bin/bash
70#
71ttl=600
72#
73cut=/usr/bin/cut
74getent=/usr/bin/getent
75rpc_pipefs=/var/lib/nfs/rpc_pipefs
76#
77die()
78{
79 echo "Usage: $0 cache_name entry_name"
80 exit 1
81}
82
83[ $# -lt 2 ] && die
84cachename="$1"
85cache_path=${rpc_pipefs}/cache/${cachename}/channel
86
87case "${cachename}" in
88 dns_resolve)
89 name="$2"
90 result="$(${getent} hosts ${name} | ${cut} -f1 -d\ )"
91 [ -z "${result}" ] && result="0"
92 ;;
93 *)
94 die
95 ;;
96esac
97echo "${result} ${name} ${ttl}" >${cache_path}
98
diff --git a/Documentation/filesystems/seq_file.txt b/Documentation/filesystems/seq_file.txt
index b843743aa0b5..0d15ebccf5b0 100644
--- a/Documentation/filesystems/seq_file.txt
+++ b/Documentation/filesystems/seq_file.txt
@@ -46,7 +46,7 @@ better to do. The file is seekable, in that one can do something like the
46following: 46following:
47 47
48 dd if=/proc/sequence of=out1 count=1 48 dd if=/proc/sequence of=out1 count=1
49 dd if=/proc/sequence skip=1 out=out2 count=1 49 dd if=/proc/sequence skip=1 of=out2 count=1
50 50
51Then concatenate the output files out1 and out2 and get the right 51Then concatenate the output files out1 and out2 and get the right
52result. Yes, it is a thoroughly useless module, but the point is to show 52result. Yes, it is a thoroughly useless module, but the point is to show
diff --git a/Documentation/flexible-arrays.txt b/Documentation/flexible-arrays.txt
new file mode 100644
index 000000000000..84eb26808dee
--- /dev/null
+++ b/Documentation/flexible-arrays.txt
@@ -0,0 +1,99 @@
1Using flexible arrays in the kernel
2Last updated for 2.6.31
3Jonathan Corbet <corbet@lwn.net>
4
5Large contiguous memory allocations can be unreliable in the Linux kernel.
6Kernel programmers will sometimes respond to this problem by allocating
7pages with vmalloc(). This solution not ideal, though. On 32-bit systems,
8memory from vmalloc() must be mapped into a relatively small address space;
9it's easy to run out. On SMP systems, the page table changes required by
10vmalloc() allocations can require expensive cross-processor interrupts on
11all CPUs. And, on all systems, use of space in the vmalloc() range
12increases pressure on the translation lookaside buffer (TLB), reducing the
13performance of the system.
14
15In many cases, the need for memory from vmalloc() can be eliminated by
16piecing together an array from smaller parts; the flexible array library
17exists to make this task easier.
18
19A flexible array holds an arbitrary (within limits) number of fixed-sized
20objects, accessed via an integer index. Sparse arrays are handled
21reasonably well. Only single-page allocations are made, so memory
22allocation failures should be relatively rare. The down sides are that the
23arrays cannot be indexed directly, individual object size cannot exceed the
24system page size, and putting data into a flexible array requires a copy
25operation. It's also worth noting that flexible arrays do no internal
26locking at all; if concurrent access to an array is possible, then the
27caller must arrange for appropriate mutual exclusion.
28
29The creation of a flexible array is done with:
30
31 #include <linux/flex_array.h>
32
33 struct flex_array *flex_array_alloc(int element_size,
34 unsigned int total,
35 gfp_t flags);
36
37The individual object size is provided by element_size, while total is the
38maximum number of objects which can be stored in the array. The flags
39argument is passed directly to the internal memory allocation calls. With
40the current code, using flags to ask for high memory is likely to lead to
41notably unpleasant side effects.
42
43Storing data into a flexible array is accomplished with a call to:
44
45 int flex_array_put(struct flex_array *array, unsigned int element_nr,
46 void *src, gfp_t flags);
47
48This call will copy the data from src into the array, in the position
49indicated by element_nr (which must be less than the maximum specified when
50the array was created). If any memory allocations must be performed, flags
51will be used. The return value is zero on success, a negative error code
52otherwise.
53
54There might possibly be a need to store data into a flexible array while
55running in some sort of atomic context; in this situation, sleeping in the
56memory allocator would be a bad thing. That can be avoided by using
57GFP_ATOMIC for the flags value, but, often, there is a better way. The
58trick is to ensure that any needed memory allocations are done before
59entering atomic context, using:
60
61 int flex_array_prealloc(struct flex_array *array, unsigned int start,
62 unsigned int end, gfp_t flags);
63
64This function will ensure that memory for the elements indexed in the range
65defined by start and end has been allocated. Thereafter, a
66flex_array_put() call on an element in that range is guaranteed not to
67block.
68
69Getting data back out of the array is done with:
70
71 void *flex_array_get(struct flex_array *fa, unsigned int element_nr);
72
73The return value is a pointer to the data element, or NULL if that
74particular element has never been allocated.
75
76Note that it is possible to get back a valid pointer for an element which
77has never been stored in the array. Memory for array elements is allocated
78one page at a time; a single allocation could provide memory for several
79adjacent elements. The flexible array code does not know if a specific
80element has been written; it only knows if the associated memory is
81present. So a flex_array_get() call on an element which was never stored
82in the array has the potential to return a pointer to random data. If the
83caller does not have a separate way to know which elements were actually
84stored, it might be wise, at least, to add GFP_ZERO to the flags argument
85to ensure that all elements are zeroed.
86
87There is no way to remove a single element from the array. It is possible,
88though, to remove all elements with a call to:
89
90 void flex_array_free_parts(struct flex_array *array);
91
92This call frees all elements, but leaves the array itself in place.
93Freeing the entire array is done with:
94
95 void flex_array_free(struct flex_array *array);
96
97As of this writing, there are no users of flexible arrays in the mainline
98kernel. The functions described here are also not exported to modules;
99that will probably be fixed when somebody comes up with a need for it.
diff --git a/Documentation/hwmon/pcf8591 b/Documentation/hwmon/pcf8591
index 5628fcf4207f..e76a7892f68e 100644
--- a/Documentation/hwmon/pcf8591
+++ b/Documentation/hwmon/pcf8591
@@ -2,11 +2,11 @@ Kernel driver pcf8591
2===================== 2=====================
3 3
4Supported chips: 4Supported chips:
5 * Philips PCF8591 5 * Philips/NXP PCF8591
6 Prefix: 'pcf8591' 6 Prefix: 'pcf8591'
7 Addresses scanned: I2C 0x48 - 0x4f 7 Addresses scanned: I2C 0x48 - 0x4f
8 Datasheet: Publicly available at the Philips Semiconductor website 8 Datasheet: Publicly available at the NXP website
9 http://www.semiconductors.philips.com/pip/PCF8591P.html 9 http://www.nxp.com/pip/PCF8591_6.html
10 10
11Authors: 11Authors:
12 Aurelien Jarno <aurelien@aurel32.net> 12 Aurelien Jarno <aurelien@aurel32.net>
@@ -16,9 +16,10 @@ Authors:
16 16
17Description 17Description
18----------- 18-----------
19
19The PCF8591 is an 8-bit A/D and D/A converter (4 analog inputs and one 20The PCF8591 is an 8-bit A/D and D/A converter (4 analog inputs and one
20analog output) for the I2C bus produced by Philips Semiconductors. It 21analog output) for the I2C bus produced by Philips Semiconductors (now NXP).
21is designed to provide a byte I2C interface to up to 4 separate devices. 22It is designed to provide a byte I2C interface to up to 4 separate devices.
22 23
23The PCF8591 has 4 analog inputs programmable as single-ended or 24The PCF8591 has 4 analog inputs programmable as single-ended or
24differential inputs : 25differential inputs :
@@ -58,8 +59,8 @@ Accessing PCF8591 via /sys interface
58------------------------------------- 59-------------------------------------
59 60
60! Be careful ! 61! Be careful !
61The PCF8591 is plainly impossible to detect ! Stupid chip. 62The PCF8591 is plainly impossible to detect! Stupid chip.
62So every chip with address in the interval [48..4f] is 63So every chip with address in the interval [0x48..0x4f] is
63detected as PCF8591. If you have other chips in this address 64detected as PCF8591. If you have other chips in this address
64range, the workaround is to load this module after the one 65range, the workaround is to load this module after the one
65for your others chips. 66for your others chips.
@@ -67,19 +68,20 @@ for your others chips.
67On detection (i.e. insmod, modprobe et al.), directories are being 68On detection (i.e. insmod, modprobe et al.), directories are being
68created for each detected PCF8591: 69created for each detected PCF8591:
69 70
70/sys/bus/devices/<0>-<1>/ 71/sys/bus/i2c/devices/<0>-<1>/
71where <0> is the bus the chip was detected on (e. g. i2c-0) 72where <0> is the bus the chip was detected on (e. g. i2c-0)
72and <1> the chip address ([48..4f]) 73and <1> the chip address ([48..4f])
73 74
74Inside these directories, there are such files: 75Inside these directories, there are such files:
75in0, in1, in2, in3, out0_enable, out0_output, name 76in0_input, in1_input, in2_input, in3_input, out0_enable, out0_output, name
76 77
77Name contains chip name. 78Name contains chip name.
78 79
79The in0, in1, in2 and in3 files are RO. Reading gives the value of the 80The in0_input, in1_input, in2_input and in3_input files are RO. Reading gives
80corresponding channel. Depending on the current analog inputs configuration, 81the value of the corresponding channel. Depending on the current analog inputs
81files in2 and/or in3 do not exist. Values range are from 0 to 255 for single 82configuration, files in2_input and in3_input may not exist. Values range
82ended inputs and -128 to +127 for differential inputs (8-bit ADC). 83from 0 to 255 for single ended inputs and -128 to +127 for differential inputs
84(8-bit ADC).
83 85
84The out0_enable file is RW. Reading gives "1" for analog output enabled and 86The out0_enable file is RW. Reading gives "1" for analog output enabled and
85"0" for analog output disabled. Writing accepts "0" and "1" accordingly. 87"0" for analog output disabled. Writing accepts "0" and "1" accordingly.
diff --git a/Documentation/hwmon/tmp421 b/Documentation/hwmon/tmp421
new file mode 100644
index 000000000000..0cf07f824741
--- /dev/null
+++ b/Documentation/hwmon/tmp421
@@ -0,0 +1,36 @@
1Kernel driver tmp421
2====================
3
4Supported chips:
5 * Texas Instruments TMP421
6 Prefix: 'tmp421'
7 Addresses scanned: I2C 0x2a, 0x4c, 0x4d, 0x4e and 0x4f
8 Datasheet: http://focus.ti.com/docs/prod/folders/print/tmp421.html
9 * Texas Instruments TMP422
10 Prefix: 'tmp422'
11 Addresses scanned: I2C 0x2a, 0x4c, 0x4d, 0x4e and 0x4f
12 Datasheet: http://focus.ti.com/docs/prod/folders/print/tmp421.html
13 * Texas Instruments TMP423
14 Prefix: 'tmp423'
15 Addresses scanned: I2C 0x2a, 0x4c, 0x4d, 0x4e and 0x4f
16 Datasheet: http://focus.ti.com/docs/prod/folders/print/tmp421.html
17
18Authors:
19 Andre Prendel <andre.prendel@gmx.de>
20
21Description
22-----------
23
24This driver implements support for Texas Instruments TMP421, TMP422
25and TMP423 temperature sensor chips. These chips implement one local
26and up to one (TMP421), up to two (TMP422) or up to three (TMP423)
27remote sensors. Temperature is measured in degrees Celsius. The chips
28are wired over I2C/SMBus and specified over a temperature range of -40
29to +125 degrees Celsius. Resolution for both the local and remote
30channels is 0.0625 degree C.
31
32The chips support only temperature measurement. The driver exports
33the temperature values via the following sysfs files:
34
35temp[1-4]_input
36temp[2-4]_fault
diff --git a/Documentation/hwmon/wm831x b/Documentation/hwmon/wm831x
new file mode 100644
index 000000000000..24f47d8f6a42
--- /dev/null
+++ b/Documentation/hwmon/wm831x
@@ -0,0 +1,37 @@
1Kernel driver wm831x-hwmon
2==========================
3
4Supported chips:
5 * Wolfson Microelectronics WM831x PMICs
6 Prefix: 'wm831x'
7 Datasheet:
8 http://www.wolfsonmicro.com/products/WM8310
9 http://www.wolfsonmicro.com/products/WM8311
10 http://www.wolfsonmicro.com/products/WM8312
11
12Authors: Mark Brown <broonie@opensource.wolfsonmicro.com>
13
14Description
15-----------
16
17The WM831x series of PMICs include an AUXADC which can be used to
18monitor a range of system operating parameters, including the voltages
19of the major supplies within the system. Currently the driver provides
20reporting of all the input values but does not provide any alarms.
21
22Voltage Monitoring
23------------------
24
25Voltages are sampled by a 12 bit ADC. Voltages in milivolts are 1.465
26times the ADC value.
27
28Temperature Monitoring
29----------------------
30
31Temperatures are sampled by a 12 bit ADC. Chip and battery temperatures
32are available. The chip temperature is calculated as:
33
34 Degrees celsius = (512.18 - data) / 1.0983
35
36while the battery temperature calculation will depend on the NTC
37thermistor component.
diff --git a/Documentation/hwmon/wm8350 b/Documentation/hwmon/wm8350
new file mode 100644
index 000000000000..98f923bd2e92
--- /dev/null
+++ b/Documentation/hwmon/wm8350
@@ -0,0 +1,26 @@
1Kernel driver wm8350-hwmon
2==========================
3
4Supported chips:
5 * Wolfson Microelectronics WM835x PMICs
6 Prefix: 'wm8350'
7 Datasheet:
8 http://www.wolfsonmicro.com/products/WM8350
9 http://www.wolfsonmicro.com/products/WM8351
10 http://www.wolfsonmicro.com/products/WM8352
11
12Authors: Mark Brown <broonie@opensource.wolfsonmicro.com>
13
14Description
15-----------
16
17The WM835x series of PMICs include an AUXADC which can be used to
18monitor a range of system operating parameters, including the voltages
19of the major supplies within the system. Currently the driver provides
20simple access to these major supplies.
21
22Voltage Monitoring
23------------------
24
25Voltages are sampled by a 12 bit ADC. For the internal supplies the ADC
26is referenced to the system VRTC.
diff --git a/Documentation/input/sentelic.txt b/Documentation/input/sentelic.txt
new file mode 100644
index 000000000000..f7160a2fb6a2
--- /dev/null
+++ b/Documentation/input/sentelic.txt
@@ -0,0 +1,475 @@
1Copyright (C) 2002-2008 Sentelic Corporation.
2Last update: Oct-31-2008
3
4==============================================================================
5* Finger Sensing Pad Intellimouse Mode(scrolling wheel, 4th and 5th buttons)
6==============================================================================
7A) MSID 4: Scrolling wheel mode plus Forward page(4th button) and Backward
8 page (5th button)
9@1. Set sample rate to 200;
10@2. Set sample rate to 200;
11@3. Set sample rate to 80;
12@4. Issuing the "Get device ID" command (0xF2) and waits for the response;
13@5. FSP will respond 0x04.
14
15Packet 1
16 Bit 7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0
17BYTE |---------------|BYTE |---------------|BYTE|---------------|BYTE|---------------|
18 1 |Y|X|y|x|1|M|R|L| 2 |X|X|X|X|X|X|X|X| 3 |Y|Y|Y|Y|Y|Y|Y|Y| 4 | | |B|F|W|W|W|W|
19 |---------------| |---------------| |---------------| |---------------|
20
21Byte 1: Bit7 => Y overflow
22 Bit6 => X overflow
23 Bit5 => Y sign bit
24 Bit4 => X sign bit
25 Bit3 => 1
26 Bit2 => Middle Button, 1 is pressed, 0 is not pressed.
27 Bit1 => Right Button, 1 is pressed, 0 is not pressed.
28 Bit0 => Left Button, 1 is pressed, 0 is not pressed.
29Byte 2: X Movement(9-bit 2's complement integers)
30Byte 3: Y Movement(9-bit 2's complement integers)
31Byte 4: Bit3~Bit0 => the scrolling wheel's movement since the last data report.
32 valid values, -8 ~ +7
33 Bit4 => 1 = 4th mouse button is pressed, Forward one page.
34 0 = 4th mouse button is not pressed.
35 Bit5 => 1 = 5th mouse button is pressed, Backward one page.
36 0 = 5th mouse button is not pressed.
37
38B) MSID 6: Horizontal and Vertical scrolling.
39@ Set bit 1 in register 0x40 to 1
40
41# FSP replaces scrolling wheel's movement as 4 bits to show horizontal and
42 vertical scrolling.
43
44Packet 1
45 Bit 7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0
46BYTE |---------------|BYTE |---------------|BYTE|---------------|BYTE|---------------|
47 1 |Y|X|y|x|1|M|R|L| 2 |X|X|X|X|X|X|X|X| 3 |Y|Y|Y|Y|Y|Y|Y|Y| 4 | | |B|F|l|r|u|d|
48 |---------------| |---------------| |---------------| |---------------|
49
50Byte 1: Bit7 => Y overflow
51 Bit6 => X overflow
52 Bit5 => Y sign bit
53 Bit4 => X sign bit
54 Bit3 => 1
55 Bit2 => Middle Button, 1 is pressed, 0 is not pressed.
56 Bit1 => Right Button, 1 is pressed, 0 is not pressed.
57 Bit0 => Left Button, 1 is pressed, 0 is not pressed.
58Byte 2: X Movement(9-bit 2's complement integers)
59Byte 3: Y Movement(9-bit 2's complement integers)
60Byte 4: Bit0 => the Vertical scrolling movement downward.
61 Bit1 => the Vertical scrolling movement upward.
62 Bit2 => the Vertical scrolling movement rightward.
63 Bit3 => the Vertical scrolling movement leftward.
64 Bit4 => 1 = 4th mouse button is pressed, Forward one page.
65 0 = 4th mouse button is not pressed.
66 Bit5 => 1 = 5th mouse button is pressed, Backward one page.
67 0 = 5th mouse button is not pressed.
68
69C) MSID 7:
70# FSP uses 2 packets(8 Bytes) data to represent Absolute Position
71 so we have PACKET NUMBER to identify packets.
72 If PACKET NUMBER is 0, the packet is Packet 1.
73 If PACKET NUMBER is 1, the packet is Packet 2.
74 Please count this number in program.
75
76# MSID6 special packet will be enable at the same time when enable MSID 7.
77
78==============================================================================
79* Absolute position for STL3886-G0.
80==============================================================================
81@ Set bit 2 or 3 in register 0x40 to 1
82@ Set bit 6 in register 0x40 to 1
83
84Packet 1 (ABSOLUTE POSITION)
85 Bit 7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0
86BYTE |---------------|BYTE |---------------|BYTE|---------------|BYTE|---------------|
87 1 |0|1|V|1|1|M|R|L| 2 |X|X|X|X|X|X|X|X| 3 |Y|Y|Y|Y|Y|Y|Y|Y| 4 |r|l|d|u|X|X|Y|Y|
88 |---------------| |---------------| |---------------| |---------------|
89
90Byte 1: Bit7~Bit6 => 00, Normal data packet
91 => 01, Absolute coordination packet
92 => 10, Notify packet
93 Bit5 => valid bit
94 Bit4 => 1
95 Bit3 => 1
96 Bit2 => Middle Button, 1 is pressed, 0 is not pressed.
97 Bit1 => Right Button, 1 is pressed, 0 is not pressed.
98 Bit0 => Left Button, 1 is pressed, 0 is not pressed.
99Byte 2: X coordinate (xpos[9:2])
100Byte 3: Y coordinate (ypos[9:2])
101Byte 4: Bit1~Bit0 => Y coordinate (xpos[1:0])
102 Bit3~Bit2 => X coordinate (ypos[1:0])
103 Bit4 => scroll up
104 Bit5 => scroll down
105 Bit6 => scroll left
106 Bit7 => scroll right
107
108Notify Packet for G0
109 Bit 7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0
110BYTE |---------------|BYTE |---------------|BYTE|---------------|BYTE|---------------|
111 1 |1|0|0|1|1|M|R|L| 2 |C|C|C|C|C|C|C|C| 3 |M|M|M|M|M|M|M|M| 4 |0|0|0|0|0|0|0|0|
112 |---------------| |---------------| |---------------| |---------------|
113
114Byte 1: Bit7~Bit6 => 00, Normal data packet
115 => 01, Absolute coordination packet
116 => 10, Notify packet
117 Bit5 => 0
118 Bit4 => 1
119 Bit3 => 1
120 Bit2 => Middle Button, 1 is pressed, 0 is not pressed.
121 Bit1 => Right Button, 1 is pressed, 0 is not pressed.
122 Bit0 => Left Button, 1 is pressed, 0 is not pressed.
123Byte 2: Message Type => 0x5A (Enable/Disable status packet)
124 Mode Type => 0xA5 (Normal/Icon mode status)
125Byte 3: Message Type => 0x00 (Disabled)
126 => 0x01 (Enabled)
127 Mode Type => 0x00 (Normal)
128 => 0x01 (Icon)
129Byte 4: Bit7~Bit0 => Don't Care
130
131==============================================================================
132* Absolute position for STL3888-A0.
133==============================================================================
134Packet 1 (ABSOLUTE POSITION)
135 Bit 7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0
136BYTE |---------------|BYTE |---------------|BYTE|---------------|BYTE|---------------|
137 1 |0|1|V|A|1|L|0|1| 2 |X|X|X|X|X|X|X|X| 3 |Y|Y|Y|Y|Y|Y|Y|Y| 4 |x|x|y|y|X|X|Y|Y|
138 |---------------| |---------------| |---------------| |---------------|
139
140Byte 1: Bit7~Bit6 => 00, Normal data packet
141 => 01, Absolute coordination packet
142 => 10, Notify packet
143 Bit5 => Valid bit, 0 means that the coordinate is invalid or finger up.
144 When both fingers are up, the last two reports have zero valid
145 bit.
146 Bit4 => arc
147 Bit3 => 1
148 Bit2 => Left Button, 1 is pressed, 0 is released.
149 Bit1 => 0
150 Bit0 => 1
151Byte 2: X coordinate (xpos[9:2])
152Byte 3: Y coordinate (ypos[9:2])
153Byte 4: Bit1~Bit0 => Y coordinate (xpos[1:0])
154 Bit3~Bit2 => X coordinate (ypos[1:0])
155 Bit5~Bit4 => y1_g
156 Bit7~Bit6 => x1_g
157
158Packet 2 (ABSOLUTE POSITION)
159 Bit 7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0
160BYTE |---------------|BYTE |---------------|BYTE|---------------|BYTE|---------------|
161 1 |0|1|V|A|1|R|1|0| 2 |X|X|X|X|X|X|X|X| 3 |Y|Y|Y|Y|Y|Y|Y|Y| 4 |x|x|y|y|X|X|Y|Y|
162 |---------------| |---------------| |---------------| |---------------|
163
164Byte 1: Bit7~Bit6 => 00, Normal data packet
165 => 01, Absolute coordinates packet
166 => 10, Notify packet
167 Bit5 => Valid bit, 0 means that the coordinate is invalid or finger up.
168 When both fingers are up, the last two reports have zero valid
169 bit.
170 Bit4 => arc
171 Bit3 => 1
172 Bit2 => Right Button, 1 is pressed, 0 is released.
173 Bit1 => 1
174 Bit0 => 0
175Byte 2: X coordinate (xpos[9:2])
176Byte 3: Y coordinate (ypos[9:2])
177Byte 4: Bit1~Bit0 => Y coordinate (xpos[1:0])
178 Bit3~Bit2 => X coordinate (ypos[1:0])
179 Bit5~Bit4 => y2_g
180 Bit7~Bit6 => x2_g
181
182Notify Packet for STL3888-A0
183 Bit 7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0
184BYTE |---------------|BYTE |---------------|BYTE|---------------|BYTE|---------------|
185 1 |1|0|1|P|1|M|R|L| 2 |C|C|C|C|C|C|C|C| 3 |0|0|F|F|0|0|0|i| 4 |r|l|d|u|0|0|0|0|
186 |---------------| |---------------| |---------------| |---------------|
187
188Byte 1: Bit7~Bit6 => 00, Normal data packet
189 => 01, Absolute coordination packet
190 => 10, Notify packet
191 Bit5 => 1
192 Bit4 => when in absolute coordinates mode (valid when EN_PKT_GO is 1):
193 0: left button is generated by the on-pad command
194 1: left button is generated by the external button
195 Bit3 => 1
196 Bit2 => Middle Button, 1 is pressed, 0 is not pressed.
197 Bit1 => Right Button, 1 is pressed, 0 is not pressed.
198 Bit0 => Left Button, 1 is pressed, 0 is not pressed.
199Byte 2: Message Type => 0xB7 (Multi Finger, Multi Coordinate mode)
200Byte 3: Bit7~Bit6 => Don't care
201 Bit5~Bit4 => Number of fingers
202 Bit3~Bit1 => Reserved
203 Bit0 => 1: enter gesture mode; 0: leaving gesture mode
204Byte 4: Bit7 => scroll right button
205 Bit6 => scroll left button
206 Bit5 => scroll down button
207 Bit4 => scroll up button
208 * Note that if gesture and additional button (Bit4~Bit7)
209 happen at the same time, the button information will not
210 be sent.
211 Bit3~Bit0 => Reserved
212
213Sample sequence of Multi-finger, Multi-coordinate mode:
214
215 notify packet (valid bit == 1), abs pkt 1, abs pkt 2, abs pkt 1,
216 abs pkt 2, ..., notify packet(valid bit == 0)
217
218==============================================================================
219* FSP Enable/Disable packet
220==============================================================================
221 Bit 7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0
222BYTE |---------------|BYTE |---------------|BYTE|---------------|BYTE|---------------|
223 1 |Y|X|0|0|1|M|R|L| 2 |0|1|0|1|1|0|1|E| 3 | | | | | | | | | 4 | | | | | | | | |
224 |---------------| |---------------| |---------------| |---------------|
225
226FSP will send out enable/disable packet when FSP receive PS/2 enable/disable
227command. Host will receive the packet which Middle, Right, Left button will
228be set. The packet only use byte 0 and byte 1 as a pattern of original packet.
229Ignore the other bytes of the packet.
230
231Byte 1: Bit7 => 0, Y overflow
232 Bit6 => 0, X overflow
233 Bit5 => 0, Y sign bit
234 Bit4 => 0, X sign bit
235 Bit3 => 1
236 Bit2 => 1, Middle Button
237 Bit1 => 1, Right Button
238 Bit0 => 1, Left Button
239Byte 2: Bit7~1 => (0101101b)
240 Bit0 => 1 = Enable
241 0 = Disable
242Byte 3: Don't care
243Byte 4: Don't care (MOUSE ID 3, 4)
244Byte 5~8: Don't care (Absolute packet)
245
246==============================================================================
247* PS/2 Command Set
248==============================================================================
249
250FSP supports basic PS/2 commanding set and modes, refer to following URL for
251details about PS/2 commands:
252
253http://www.computer-engineering.org/index.php?title=PS/2_Mouse_Interface
254
255==============================================================================
256* Programming Sequence for Determining Packet Parsing Flow
257==============================================================================
2581. Identify FSP by reading device ID(0x00) and version(0x01) register
259
2602. Determine number of buttons by reading status2 (0x0b) register
261
262 buttons = reg[0x0b] & 0x30
263
264 if buttons == 0x30 or buttons == 0x20:
265 # two/four buttons
266 Refer to 'Finger Sensing Pad PS/2 Mouse Intellimouse'
267 section A for packet parsing detail(ignore byte 4, bit ~ 7)
268 elif buttons == 0x10:
269 # 6 buttons
270 Refer to 'Finger Sensing Pad PS/2 Mouse Intellimouse'
271 section B for packet parsing detail
272 elif buttons == 0x00:
273 # 6 buttons
274 Refer to 'Finger Sensing Pad PS/2 Mouse Intellimouse'
275 section A for packet parsing detail
276
277==============================================================================
278* Programming Sequence for Register Reading/Writing
279==============================================================================
280
281Register inversion requirement:
282
283 Following values needed to be inverted(the '~' operator in C) before being
284sent to FSP:
285
286 0xe9, 0xee, 0xf2 and 0xff.
287
288Register swapping requirement:
289
290 Following values needed to have their higher 4 bits and lower 4 bits being
291swapped before being sent to FSP:
292
293 10, 20, 40, 60, 80, 100 and 200.
294
295Register reading sequence:
296
297 1. send 0xf3 PS/2 command to FSP;
298
299 2. send 0x66 PS/2 command to FSP;
300
301 3. send 0x88 PS/2 command to FSP;
302
303 4. send 0xf3 PS/2 command to FSP;
304
305 5. if the register address being to read is not required to be
306 inverted(refer to the 'Register inversion requirement' section),
307 goto step 6
308
309 5a. send 0x68 PS/2 command to FSP;
310
311 5b. send the inverted register address to FSP and goto step 8;
312
313 6. if the register address being to read is not required to be
314 swapped(refer to the 'Register swapping requirement' section),
315 goto step 7
316
317 6a. send 0xcc PS/2 command to FSP;
318
319 6b. send the swapped register address to FSP and goto step 8;
320
321 7. send 0x66 PS/2 command to FSP;
322
323 7a. send the original register address to FSP and goto step 8;
324
325 8. send 0xe9(status request) PS/2 command to FSP;
326
327 9. the response read from FSP should be the requested register value.
328
329Register writing sequence:
330
331 1. send 0xf3 PS/2 command to FSP;
332
333 2. if the register address being to write is not required to be
334 inverted(refer to the 'Register inversion requirement' section),
335 goto step 3
336
337 2a. send 0x74 PS/2 command to FSP;
338
339 2b. send the inverted register address to FSP and goto step 5;
340
341 3. if the register address being to write is not required to be
342 swapped(refer to the 'Register swapping requirement' section),
343 goto step 4
344
345 3a. send 0x77 PS/2 command to FSP;
346
347 3b. send the swapped register address to FSP and goto step 5;
348
349 4. send 0x55 PS/2 command to FSP;
350
351 4a. send the register address to FSP and goto step 5;
352
353 5. send 0xf3 PS/2 command to FSP;
354
355 6. if the register value being to write is not required to be
356 inverted(refer to the 'Register inversion requirement' section),
357 goto step 7
358
359 6a. send 0x47 PS/2 command to FSP;
360
361 6b. send the inverted register value to FSP and goto step 9;
362
363 7. if the register value being to write is not required to be
364 swapped(refer to the 'Register swapping requirement' section),
365 goto step 8
366
367 7a. send 0x44 PS/2 command to FSP;
368
369 7b. send the swapped register value to FSP and goto step 9;
370
371 8. send 0x33 PS/2 command to FSP;
372
373 8a. send the register value to FSP;
374
375 9. the register writing sequence is completed.
376
377==============================================================================
378* Register Listing
379==============================================================================
380
381offset width default r/w name
3820x00 bit7~bit0 0x01 RO device ID
383
3840x01 bit7~bit0 0xc0 RW version ID
385
3860x02 bit7~bit0 0x01 RO vendor ID
387
3880x03 bit7~bit0 0x01 RO product ID
389
3900x04 bit3~bit0 0x01 RW revision ID
391
3920x0b RO test mode status 1
393 bit3 1 RO 0: rotate 180 degree, 1: no rotation
394
395 bit5~bit4 RO number of buttons
396 11 => 2, lbtn/rbtn
397 10 => 4, lbtn/rbtn/scru/scrd
398 01 => 6, lbtn/rbtn/scru/scrd/scrl/scrr
399 00 => 6, lbtn/rbtn/scru/scrd/fbtn/bbtn
400
4010x0f RW register file page control
402 bit0 0 RW 1 to enable page 1 register files
403
4040x10 RW system control 1
405 bit0 1 RW Reserved, must be 1
406 bit1 0 RW Reserved, must be 0
407 bit4 1 RW Reserved, must be 0
408 bit5 0 RW register clock gating enable
409 0: read only, 1: read/write enable
410 (Note that following registers does not require clock gating being
411 enabled prior to write: 05 06 07 08 09 0c 0f 10 11 12 16 17 18 23 2e
412 40 41 42 43.)
413
4140x31 RW on-pad command detection
415 bit7 0 RW on-pad command left button down tag
416 enable
417 0: disable, 1: enable
418
4190x34 RW on-pad command control 5
420 bit4~bit0 0x05 RW XLO in 0s/4/1, so 03h = 0010.1b = 2.5
421 (Note that position unit is in 0.5 scanline)
422
423 bit7 0 RW on-pad tap zone enable
424 0: disable, 1: enable
425
4260x35 RW on-pad command control 6
427 bit4~bit0 0x1d RW XHI in 0s/4/1, so 19h = 1100.1b = 12.5
428 (Note that position unit is in 0.5 scanline)
429
4300x36 RW on-pad command control 7
431 bit4~bit0 0x04 RW YLO in 0s/4/1, so 03h = 0010.1b = 2.5
432 (Note that position unit is in 0.5 scanline)
433
4340x37 RW on-pad command control 8
435 bit4~bit0 0x13 RW YHI in 0s/4/1, so 11h = 1000.1b = 8.5
436 (Note that position unit is in 0.5 scanline)
437
4380x40 RW system control 5
439 bit1 0 RW FSP Intellimouse mode enable
440 0: disable, 1: enable
441
442 bit2 0 RW movement + abs. coordinate mode enable
443 0: disable, 1: enable
444 (Note that this function has the functionality of bit 1 even when
445 bit 1 is not set. However, the format is different from that of bit 1.
446 In addition, when bit 1 and bit 2 are set at the same time, bit 2 will
447 override bit 1.)
448
449 bit3 0 RW abs. coordinate only mode enable
450 0: disable, 1: enable
451 (Note that this function has the functionality of bit 1 even when
452 bit 1 is not set. However, the format is different from that of bit 1.
453 In addition, when bit 1, bit 2 and bit 3 are set at the same time,
454 bit 3 will override bit 1 and 2.)
455
456 bit5 0 RW auto switch enable
457 0: disable, 1: enable
458
459 bit6 0 RW G0 abs. + notify packet format enable
460 0: disable, 1: enable
461 (Note that the absolute/relative coordinate output still depends on
462 bit 2 and 3. That is, if any of those bit is 1, host will receive
463 absolute coordinates; otherwise, host only receives packets with
464 relative coordinate.)
465
4660x43 RW on-pad control
467 bit0 0 RW on-pad control enable
468 0: disable, 1: enable
469 (Note that if this bit is cleared, bit 3/5 will be ineffective)
470
471 bit3 0 RW on-pad fix vertical scrolling enable
472 0: disable, 1: enable
473
474 bit5 0 RW on-pad fix horizontal scrolling enable
475 0: disable, 1: enable
diff --git a/Documentation/intel_txt.txt b/Documentation/intel_txt.txt
new file mode 100644
index 000000000000..f40a1f030019
--- /dev/null
+++ b/Documentation/intel_txt.txt
@@ -0,0 +1,210 @@
1Intel(R) TXT Overview:
2=====================
3
4Intel's technology for safer computing, Intel(R) Trusted Execution
5Technology (Intel(R) TXT), defines platform-level enhancements that
6provide the building blocks for creating trusted platforms.
7
8Intel TXT was formerly known by the code name LaGrande Technology (LT).
9
10Intel TXT in Brief:
11o Provides dynamic root of trust for measurement (DRTM)
12o Data protection in case of improper shutdown
13o Measurement and verification of launched environment
14
15Intel TXT is part of the vPro(TM) brand and is also available some
16non-vPro systems. It is currently available on desktop systems
17based on the Q35, X38, Q45, and Q43 Express chipsets (e.g. Dell
18Optiplex 755, HP dc7800, etc.) and mobile systems based on the GM45,
19PM45, and GS45 Express chipsets.
20
21For more information, see http://www.intel.com/technology/security/.
22This site also has a link to the Intel TXT MLE Developers Manual,
23which has been updated for the new released platforms.
24
25Intel TXT has been presented at various events over the past few
26years, some of which are:
27 LinuxTAG 2008:
28 http://www.linuxtag.org/2008/en/conf/events/vp-donnerstag/
29 details.html?talkid=110
30 TRUST2008:
31 http://www.trust2008.eu/downloads/Keynote-Speakers/
32 3_David-Grawrock_The-Front-Door-of-Trusted-Computing.pdf
33 IDF 2008, Shanghai:
34 http://inteldeveloperforum.com.edgesuite.net/shanghai_2008/
35 aep/PROS003/index.html
36 IDFs 2006, 2007 (I'm not sure if/where they are online)
37
38Trusted Boot Project Overview:
39=============================
40
41Trusted Boot (tboot) is an open source, pre- kernel/VMM module that
42uses Intel TXT to perform a measured and verified launch of an OS
43kernel/VMM.
44
45It is hosted on SourceForge at http://sourceforge.net/projects/tboot.
46The mercurial source repo is available at http://www.bughost.org/
47repos.hg/tboot.hg.
48
49Tboot currently supports launching Xen (open source VMM/hypervisor
50w/ TXT support since v3.2), and now Linux kernels.
51
52
53Value Proposition for Linux or "Why should you care?"
54=====================================================
55
56While there are many products and technologies that attempt to
57measure or protect the integrity of a running kernel, they all
58assume the kernel is "good" to begin with. The Integrity
59Measurement Architecture (IMA) and Linux Integrity Module interface
60are examples of such solutions.
61
62To get trust in the initial kernel without using Intel TXT, a
63static root of trust must be used. This bases trust in BIOS
64starting at system reset and requires measurement of all code
65executed between system reset through the completion of the kernel
66boot as well as data objects used by that code. In the case of a
67Linux kernel, this means all of BIOS, any option ROMs, the
68bootloader and the boot config. In practice, this is a lot of
69code/data, much of which is subject to change from boot to boot
70(e.g. changing NICs may change option ROMs). Without reference
71hashes, these measurement changes are difficult to assess or
72confirm as benign. This process also does not provide DMA
73protection, memory configuration/alias checks and locks, crash
74protection, or policy support.
75
76By using the hardware-based root of trust that Intel TXT provides,
77many of these issues can be mitigated. Specifically: many
78pre-launch components can be removed from the trust chain, DMA
79protection is provided to all launched components, a large number
80of platform configuration checks are performed and values locked,
81protection is provided for any data in the event of an improper
82shutdown, and there is support for policy-based execution/verification.
83This provides a more stable measurement and a higher assurance of
84system configuration and initial state than would be otherwise
85possible. Since the tboot project is open source, source code for
86almost all parts of the trust chain is available (excepting SMM and
87Intel-provided firmware).
88
89How Does it Work?
90=================
91
92o Tboot is an executable that is launched by the bootloader as
93 the "kernel" (the binary the bootloader executes).
94o It performs all of the work necessary to determine if the
95 platform supports Intel TXT and, if so, executes the GETSEC[SENTER]
96 processor instruction that initiates the dynamic root of trust.
97 - If tboot determines that the system does not support Intel TXT
98 or is not configured correctly (e.g. the SINIT AC Module was
99 incorrect), it will directly launch the kernel with no changes
100 to any state.
101 - Tboot will output various information about its progress to the
102 terminal, serial port, and/or an in-memory log; the output
103 locations can be configured with a command line switch.
104o The GETSEC[SENTER] instruction will return control to tboot and
105 tboot then verifies certain aspects of the environment (e.g. TPM NV
106 lock, e820 table does not have invalid entries, etc.).
107o It will wake the APs from the special sleep state the GETSEC[SENTER]
108 instruction had put them in and place them into a wait-for-SIPI
109 state.
110 - Because the processors will not respond to an INIT or SIPI when
111 in the TXT environment, it is necessary to create a small VT-x
112 guest for the APs. When they run in this guest, they will
113 simply wait for the INIT-SIPI-SIPI sequence, which will cause
114 VMEXITs, and then disable VT and jump to the SIPI vector. This
115 approach seemed like a better choice than having to insert
116 special code into the kernel's MP wakeup sequence.
117o Tboot then applies an (optional) user-defined launch policy to
118 verify the kernel and initrd.
119 - This policy is rooted in TPM NV and is described in the tboot
120 project. The tboot project also contains code for tools to
121 create and provision the policy.
122 - Policies are completely under user control and if not present
123 then any kernel will be launched.
124 - Policy action is flexible and can include halting on failures
125 or simply logging them and continuing.
126o Tboot adjusts the e820 table provided by the bootloader to reserve
127 its own location in memory as well as to reserve certain other
128 TXT-related regions.
129o As part of it's launch, tboot DMA protects all of RAM (using the
130 VT-d PMRs). Thus, the kernel must be booted with 'intel_iommu=on'
131 in order to remove this blanket protection and use VT-d's
132 page-level protection.
133o Tboot will populate a shared page with some data about itself and
134 pass this to the Linux kernel as it transfers control.
135 - The location of the shared page is passed via the boot_params
136 struct as a physical address.
137o The kernel will look for the tboot shared page address and, if it
138 exists, map it.
139o As one of the checks/protections provided by TXT, it makes a copy
140 of the VT-d DMARs in a DMA-protected region of memory and verifies
141 them for correctness. The VT-d code will detect if the kernel was
142 launched with tboot and use this copy instead of the one in the
143 ACPI table.
144o At this point, tboot and TXT are out of the picture until a
145 shutdown (S<n>)
146o In order to put a system into any of the sleep states after a TXT
147 launch, TXT must first be exited. This is to prevent attacks that
148 attempt to crash the system to gain control on reboot and steal
149 data left in memory.
150 - The kernel will perform all of its sleep preparation and
151 populate the shared page with the ACPI data needed to put the
152 platform in the desired sleep state.
153 - Then the kernel jumps into tboot via the vector specified in the
154 shared page.
155 - Tboot will clean up the environment and disable TXT, then use the
156 kernel-provided ACPI information to actually place the platform
157 into the desired sleep state.
158 - In the case of S3, tboot will also register itself as the resume
159 vector. This is necessary because it must re-establish the
160 measured environment upon resume. Once the TXT environment
161 has been restored, it will restore the TPM PCRs and then
162 transfer control back to the kernel's S3 resume vector.
163 In order to preserve system integrity across S3, the kernel
164 provides tboot with a set of memory ranges (kernel
165 code/data/bss, S3 resume code, and AP trampoline) that tboot
166 will calculate a MAC (message authentication code) over and then
167 seal with the TPM. On resume and once the measured environment
168 has been re-established, tboot will re-calculate the MAC and
169 verify it against the sealed value. Tboot's policy determines
170 what happens if the verification fails.
171
172That's pretty much it for TXT support.
173
174
175Configuring the System:
176======================
177
178This code works with 32bit, 32bit PAE, and 64bit (x86_64) kernels.
179
180In BIOS, the user must enable: TPM, TXT, VT-x, VT-d. Not all BIOSes
181allow these to be individually enabled/disabled and the screens in
182which to find them are BIOS-specific.
183
184grub.conf needs to be modified as follows:
185 title Linux 2.6.29-tip w/ tboot
186 root (hd0,0)
187 kernel /tboot.gz logging=serial,vga,memory
188 module /vmlinuz-2.6.29-tip intel_iommu=on ro
189 root=LABEL=/ rhgb console=ttyS0,115200 3
190 module /initrd-2.6.29-tip.img
191 module /Q35_SINIT_17.BIN
192
193The kernel option for enabling Intel TXT support is found under the
194Security top-level menu and is called "Enable Intel(R) Trusted
195Execution Technology (TXT)". It is marked as EXPERIMENTAL and
196depends on the generic x86 support (to allow maximum flexibility in
197kernel build options), since the tboot code will detect whether the
198platform actually supports Intel TXT and thus whether any of the
199kernel code is executed.
200
201The Q35_SINIT_17.BIN file is what Intel TXT refers to as an
202Authenticated Code Module. It is specific to the chipset in the
203system and can also be found on the Trusted Boot site. It is an
204(unencrypted) module signed by Intel that is used as part of the
205DRTM process to verify and configure the system. It is signed
206because it operates at a higher privilege level in the system than
207any other macrocode and its correct operation is critical to the
208establishment of the DRTM. The process for determining the correct
209SINIT ACM for a system is documented in the SINIT-guide.txt file
210that is on the tboot SourceForge site under the SINIT ACM downloads.
diff --git a/Documentation/ioctl/ioctl-number.txt b/Documentation/ioctl/ioctl-number.txt
index dbea4f95fc85..aafca0a8f66a 100644
--- a/Documentation/ioctl/ioctl-number.txt
+++ b/Documentation/ioctl/ioctl-number.txt
@@ -121,6 +121,7 @@ Code Seq# Include File Comments
121'c' 00-7F linux/comstats.h conflict! 121'c' 00-7F linux/comstats.h conflict!
122'c' 00-7F linux/coda.h conflict! 122'c' 00-7F linux/coda.h conflict!
123'c' 80-9F arch/s390/include/asm/chsc.h 123'c' 80-9F arch/s390/include/asm/chsc.h
124'c' A0-AF arch/x86/include/asm/msr.h
124'd' 00-FF linux/char/drm/drm/h conflict! 125'd' 00-FF linux/char/drm/drm/h conflict!
125'd' F0-FF linux/digi1.h 126'd' F0-FF linux/digi1.h
126'e' all linux/digi1.h conflict! 127'e' all linux/digi1.h conflict!
@@ -192,7 +193,7 @@ Code Seq# Include File Comments
1920xAD 00 Netfilter device in development: 1930xAD 00 Netfilter device in development:
193 <mailto:rusty@rustcorp.com.au> 194 <mailto:rusty@rustcorp.com.au>
1940xAE all linux/kvm.h Kernel-based Virtual Machine 1950xAE all linux/kvm.h Kernel-based Virtual Machine
195 <mailto:kvm-devel@lists.sourceforge.net> 196 <mailto:kvm@vger.kernel.org>
1960xB0 all RATIO devices in development: 1970xB0 all RATIO devices in development:
197 <mailto:vgo@ratio.de> 198 <mailto:vgo@ratio.de>
1980xB1 00-1F PPPoX <mailto:mostrows@styx.uwaterloo.ca> 1990xB1 00-1F PPPoX <mailto:mostrows@styx.uwaterloo.ca>
diff --git a/Documentation/kernel-doc-nano-HOWTO.txt b/Documentation/kernel-doc-nano-HOWTO.txt
index 4d04572b6549..348b9e5e28fc 100644
--- a/Documentation/kernel-doc-nano-HOWTO.txt
+++ b/Documentation/kernel-doc-nano-HOWTO.txt
@@ -66,7 +66,9 @@ Example kernel-doc function comment:
66 * The longer description can have multiple paragraphs. 66 * The longer description can have multiple paragraphs.
67 */ 67 */
68 68
69The first line, with the short description, must be on a single line. 69The short description following the subject can span multiple lines
70and ends with an @argument description, an empty line or the end of
71the comment block.
70 72
71The @argument descriptions must begin on the very next line following 73The @argument descriptions must begin on the very next line following
72this opening short function description line, with no intervening 74this opening short function description line, with no intervening
diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index 7936b801fe6a..0f17d16dc101 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -57,6 +57,7 @@ parameter is applicable:
57 ISAPNP ISA PnP code is enabled. 57 ISAPNP ISA PnP code is enabled.
58 ISDN Appropriate ISDN support is enabled. 58 ISDN Appropriate ISDN support is enabled.
59 JOY Appropriate joystick support is enabled. 59 JOY Appropriate joystick support is enabled.
60 KVM Kernel Virtual Machine support is enabled.
60 LIBATA Libata driver is enabled 61 LIBATA Libata driver is enabled
61 LP Printer support is enabled. 62 LP Printer support is enabled.
62 LOOP Loopback device support is enabled. 63 LOOP Loopback device support is enabled.
@@ -1098,6 +1099,44 @@ and is between 256 and 4096 characters. It is defined in the file
1098 kstack=N [X86] Print N words from the kernel stack 1099 kstack=N [X86] Print N words from the kernel stack
1099 in oops dumps. 1100 in oops dumps.
1100 1101
1102 kvm.ignore_msrs=[KVM] Ignore guest accesses to unhandled MSRs.
1103 Default is 0 (don't ignore, but inject #GP)
1104
1105 kvm.oos_shadow= [KVM] Disable out-of-sync shadow paging.
1106 Default is 1 (enabled)
1107
1108 kvm-amd.nested= [KVM,AMD] Allow nested virtualization in KVM/SVM.
1109 Default is 0 (off)
1110
1111 kvm-amd.npt= [KVM,AMD] Disable nested paging (virtualized MMU)
1112 for all guests.
1113 Default is 1 (enabled) if in 64bit or 32bit-PAE mode
1114
1115 kvm-intel.bypass_guest_pf=
1116 [KVM,Intel] Disables bypassing of guest page faults
1117 on Intel chips. Default is 1 (enabled)
1118
1119 kvm-intel.ept= [KVM,Intel] Disable extended page tables
1120 (virtualized MMU) support on capable Intel chips.
1121 Default is 1 (enabled)
1122
1123 kvm-intel.emulate_invalid_guest_state=
1124 [KVM,Intel] Enable emulation of invalid guest states
1125 Default is 0 (disabled)
1126
1127 kvm-intel.flexpriority=
1128 [KVM,Intel] Disable FlexPriority feature (TPR shadow).
1129 Default is 1 (enabled)
1130
1131 kvm-intel.unrestricted_guest=
1132 [KVM,Intel] Disable unrestricted guest feature
1133 (virtualized real and unpaged mode) on capable
1134 Intel chips. Default is 1 (enabled)
1135
1136 kvm-intel.vpid= [KVM,Intel] Disable Virtual Processor Identification
1137 feature (tagged TLBs) on capable Intel chips.
1138 Default is 1 (enabled)
1139
1101 l2cr= [PPC] 1140 l2cr= [PPC]
1102 1141
1103 l3cr= [PPC] 1142 l3cr= [PPC]
@@ -1247,6 +1286,10 @@ and is between 256 and 4096 characters. It is defined in the file
1247 (machvec) in a generic kernel. 1286 (machvec) in a generic kernel.
1248 Example: machvec=hpzx1_swiotlb 1287 Example: machvec=hpzx1_swiotlb
1249 1288
1289 machtype= [Loongson] Share the same kernel image file between different
1290 yeeloong laptop.
1291 Example: machtype=lemote-yeeloong-2f-7inch
1292
1250 max_addr=nn[KMG] [KNL,BOOT,ia64] All physical memory greater 1293 max_addr=nn[KMG] [KNL,BOOT,ia64] All physical memory greater
1251 than or equal to this physical address is ignored. 1294 than or equal to this physical address is ignored.
1252 1295
@@ -1503,6 +1546,14 @@ and is between 256 and 4096 characters. It is defined in the file
1503 [NFS] set the TCP port on which the NFSv4 callback 1546 [NFS] set the TCP port on which the NFSv4 callback
1504 channel should listen. 1547 channel should listen.
1505 1548
1549 nfs.cache_getent=
1550 [NFS] sets the pathname to the program which is used
1551 to update the NFS client cache entries.
1552
1553 nfs.cache_getent_timeout=
1554 [NFS] sets the timeout after which an attempt to
1555 update a cache entry is deemed to have failed.
1556
1506 nfs.idmap_cache_timeout= 1557 nfs.idmap_cache_timeout=
1507 [NFS] set the maximum lifetime for idmapper cache 1558 [NFS] set the maximum lifetime for idmapper cache
1508 entries. 1559 entries.
@@ -1514,7 +1565,7 @@ and is between 256 and 4096 characters. It is defined in the file
1514 of returning the full 64-bit number. 1565 of returning the full 64-bit number.
1515 The default is to return 64-bit inode numbers. 1566 The default is to return 64-bit inode numbers.
1516 1567
1517 nmi_debug= [KNL,AVR32] Specify one or more actions to take 1568 nmi_debug= [KNL,AVR32,SH] Specify one or more actions to take
1518 when a NMI is triggered. 1569 when a NMI is triggered.
1519 Format: [state][,regs][,debounce][,die] 1570 Format: [state][,regs][,debounce][,die]
1520 1571
@@ -1535,6 +1586,11 @@ and is between 256 and 4096 characters. It is defined in the file
1535 symbolic names: lapic and ioapic 1586 symbolic names: lapic and ioapic
1536 Example: nmi_watchdog=2 or nmi_watchdog=panic,lapic 1587 Example: nmi_watchdog=2 or nmi_watchdog=panic,lapic
1537 1588
1589 netpoll.carrier_timeout=
1590 [NET] Specifies amount of time (in seconds) that
1591 netpoll should wait for a carrier. By default netpoll
1592 waits 4 seconds.
1593
1538 no387 [BUGS=X86-32] Tells the kernel to use the 387 maths 1594 no387 [BUGS=X86-32] Tells the kernel to use the 387 maths
1539 emulation library even if a 387 maths coprocessor 1595 emulation library even if a 387 maths coprocessor
1540 is present. 1596 is present.
@@ -1919,11 +1975,12 @@ and is between 256 and 4096 characters. It is defined in the file
1919 Format: { 0 | 1 } 1975 Format: { 0 | 1 }
1920 See arch/parisc/kernel/pdc_chassis.c 1976 See arch/parisc/kernel/pdc_chassis.c
1921 1977
1922 percpu_alloc= [X86] Select which percpu first chunk allocator to use. 1978 percpu_alloc= Select which percpu first chunk allocator to use.
1923 Allowed values are one of "lpage", "embed" and "4k". 1979 Currently supported values are "embed" and "page".
1924 See comments in arch/x86/kernel/setup_percpu.c for 1980 Archs may support subset or none of the selections.
1925 details on each allocator. This parameter is primarily 1981 See comments in mm/percpu.c for details on each
1926 for debugging and performance comparison. 1982 allocator. This parameter is primarily for debugging
1983 and performance comparison.
1927 1984
1928 pf. [PARIDE] 1985 pf. [PARIDE]
1929 See Documentation/blockdev/paride.txt. 1986 See Documentation/blockdev/paride.txt.
@@ -2395,6 +2452,18 @@ and is between 256 and 4096 characters. It is defined in the file
2395 stifb= [HW] 2452 stifb= [HW]
2396 Format: bpp:<bpp1>[:<bpp2>[:<bpp3>...]] 2453 Format: bpp:<bpp1>[:<bpp2>[:<bpp3>...]]
2397 2454
2455 sunrpc.min_resvport=
2456 sunrpc.max_resvport=
2457 [NFS,SUNRPC]
2458 SunRPC servers often require that client requests
2459 originate from a privileged port (i.e. a port in the
2460 range 0 < portnr < 1024).
2461 An administrator who wishes to reserve some of these
2462 ports for other uses may adjust the range that the
2463 kernel's sunrpc client considers to be privileged
2464 using these two parameters to set the minimum and
2465 maximum port values.
2466
2398 sunrpc.pool_mode= 2467 sunrpc.pool_mode=
2399 [NFS] 2468 [NFS]
2400 Control how the NFS server code allocates CPUs to 2469 Control how the NFS server code allocates CPUs to
@@ -2411,6 +2480,15 @@ and is between 256 and 4096 characters. It is defined in the file
2411 pernode one pool for each NUMA node (equivalent 2480 pernode one pool for each NUMA node (equivalent
2412 to global on non-NUMA machines) 2481 to global on non-NUMA machines)
2413 2482
2483 sunrpc.tcp_slot_table_entries=
2484 sunrpc.udp_slot_table_entries=
2485 [NFS,SUNRPC]
2486 Sets the upper limit on the number of simultaneous
2487 RPC calls that can be sent from the client to a
2488 server. Increasing these values may allow you to
2489 improve throughput, but will also increase the
2490 amount of memory reserved for use by the client.
2491
2414 swiotlb= [IA-64] Number of I/O TLB slabs 2492 swiotlb= [IA-64] Number of I/O TLB slabs
2415 2493
2416 switches= [HW,M68k] 2494 switches= [HW,M68k]
@@ -2480,6 +2558,11 @@ and is between 256 and 4096 characters. It is defined in the file
2480 trace_buf_size=nn[KMG] 2558 trace_buf_size=nn[KMG]
2481 [FTRACE] will set tracing buffer size. 2559 [FTRACE] will set tracing buffer size.
2482 2560
2561 trace_event=[event-list]
2562 [FTRACE] Set and start specified trace events in order
2563 to facilitate early boot debugging.
2564 See also Documentation/trace/events.txt
2565
2483 trix= [HW,OSS] MediaTrix AudioTrix Pro 2566 trix= [HW,OSS] MediaTrix AudioTrix Pro
2484 Format: 2567 Format:
2485 <io>,<irq>,<dma>,<dma2>,<sb_io>,<sb_irq>,<sb_dma>,<mpu_io>,<mpu_irq> 2568 <io>,<irq>,<dma>,<dma2>,<sb_io>,<sb_irq>,<sb_dma>,<mpu_io>,<mpu_irq>
diff --git a/Documentation/keys.txt b/Documentation/keys.txt
index b56aacc1fff8..e4dbbdb1bd96 100644
--- a/Documentation/keys.txt
+++ b/Documentation/keys.txt
@@ -26,7 +26,7 @@ This document has the following sections:
26 - Notes on accessing payload contents 26 - Notes on accessing payload contents
27 - Defining a key type 27 - Defining a key type
28 - Request-key callback service 28 - Request-key callback service
29 - Key access filesystem 29 - Garbage collection
30 30
31 31
32============ 32============
@@ -113,6 +113,9 @@ Each key has a number of attributes:
113 113
114 (*) Dead. The key's type was unregistered, and so the key is now useless. 114 (*) Dead. The key's type was unregistered, and so the key is now useless.
115 115
116Keys in the last three states are subject to garbage collection. See the
117section on "Garbage collection".
118
116 119
117==================== 120====================
118KEY SERVICE OVERVIEW 121KEY SERVICE OVERVIEW
@@ -754,6 +757,26 @@ The keyctl syscall functions are:
754 successful. 757 successful.
755 758
756 759
760 (*) Install the calling process's session keyring on its parent.
761
762 long keyctl(KEYCTL_SESSION_TO_PARENT);
763
764 This functions attempts to install the calling process's session keyring
765 on to the calling process's parent, replacing the parent's current session
766 keyring.
767
768 The calling process must have the same ownership as its parent, the
769 keyring must have the same ownership as the calling process, the calling
770 process must have LINK permission on the keyring and the active LSM module
771 mustn't deny permission, otherwise error EPERM will be returned.
772
773 Error ENOMEM will be returned if there was insufficient memory to complete
774 the operation, otherwise 0 will be returned to indicate success.
775
776 The keyring will be replaced next time the parent process leaves the
777 kernel and resumes executing userspace.
778
779
757=============== 780===============
758KERNEL SERVICES 781KERNEL SERVICES
759=============== 782===============
@@ -1231,3 +1254,17 @@ by executing:
1231 1254
1232In this case, the program isn't required to actually attach the key to a ring; 1255In this case, the program isn't required to actually attach the key to a ring;
1233the rings are provided for reference. 1256the rings are provided for reference.
1257
1258
1259==================
1260GARBAGE COLLECTION
1261==================
1262
1263Dead keys (for which the type has been removed) will be automatically unlinked
1264from those keyrings that point to them and deleted as soon as possible by a
1265background garbage collector.
1266
1267Similarly, revoked and expired keys will be garbage collected, but only after a
1268certain amount of time has passed. This time is set as a number of seconds in:
1269
1270 /proc/sys/kernel/keys/gc_delay
diff --git a/Documentation/kmemleak.txt b/Documentation/kmemleak.txt
index 89068030b01b..34f6638aa5ac 100644
--- a/Documentation/kmemleak.txt
+++ b/Documentation/kmemleak.txt
@@ -27,6 +27,13 @@ To trigger an intermediate memory scan:
27 27
28 # echo scan > /sys/kernel/debug/kmemleak 28 # echo scan > /sys/kernel/debug/kmemleak
29 29
30To clear the list of all current possible memory leaks:
31
32 # echo clear > /sys/kernel/debug/kmemleak
33
34New leaks will then come up upon reading /sys/kernel/debug/kmemleak
35again.
36
30Note that the orphan objects are listed in the order they were allocated 37Note that the orphan objects are listed in the order they were allocated
31and one object at the beginning of the list may cause other subsequent 38and one object at the beginning of the list may cause other subsequent
32objects to be reported as orphan. 39objects to be reported as orphan.
@@ -42,6 +49,9 @@ Memory scanning parameters can be modified at run-time by writing to the
42 scan=<secs> - set the automatic memory scanning period in seconds 49 scan=<secs> - set the automatic memory scanning period in seconds
43 (default 600, 0 to stop the automatic scanning) 50 (default 600, 0 to stop the automatic scanning)
44 scan - trigger a memory scan 51 scan - trigger a memory scan
52 clear - clear list of current memory leak suspects, done by
53 marking all current reported unreferenced objects grey
54 dump=<addr> - dump information about the object found at <addr>
45 55
46Kmemleak can also be disabled at boot-time by passing "kmemleak=off" on 56Kmemleak can also be disabled at boot-time by passing "kmemleak=off" on
47the kernel command line. 57the kernel command line.
@@ -86,6 +96,27 @@ avoid this, kmemleak can also store the number of values pointing to an
86address inside the block address range that need to be found so that the 96address inside the block address range that need to be found so that the
87block is not considered a leak. One example is __vmalloc(). 97block is not considered a leak. One example is __vmalloc().
88 98
99Testing specific sections with kmemleak
100---------------------------------------
101
102Upon initial bootup your /sys/kernel/debug/kmemleak output page may be
103quite extensive. This can also be the case if you have very buggy code
104when doing development. To work around these situations you can use the
105'clear' command to clear all reported unreferenced objects from the
106/sys/kernel/debug/kmemleak output. By issuing a 'scan' after a 'clear'
107you can find new unreferenced objects; this should help with testing
108specific sections of code.
109
110To test a critical section on demand with a clean kmemleak do:
111
112 # echo clear > /sys/kernel/debug/kmemleak
113 ... test your kernel or modules ...
114 # echo scan > /sys/kernel/debug/kmemleak
115
116Then as usual to get your report with:
117
118 # cat /sys/kernel/debug/kmemleak
119
89Kmemleak API 120Kmemleak API
90------------ 121------------
91 122
diff --git a/Documentation/kref.txt b/Documentation/kref.txt
index 130b6e87aa7e..ae203f91ee9b 100644
--- a/Documentation/kref.txt
+++ b/Documentation/kref.txt
@@ -84,7 +84,6 @@ int my_data_handler(void)
84 task = kthread_run(more_data_handling, data, "more_data_handling"); 84 task = kthread_run(more_data_handling, data, "more_data_handling");
85 if (task == ERR_PTR(-ENOMEM)) { 85 if (task == ERR_PTR(-ENOMEM)) {
86 rv = -ENOMEM; 86 rv = -ENOMEM;
87 kref_put(&data->refcount, data_release);
88 goto out; 87 goto out;
89 } 88 }
90 89
diff --git a/Documentation/kvm/api.txt b/Documentation/kvm/api.txt
new file mode 100644
index 000000000000..5a4bc8cf6d04
--- /dev/null
+++ b/Documentation/kvm/api.txt
@@ -0,0 +1,759 @@
1The Definitive KVM (Kernel-based Virtual Machine) API Documentation
2===================================================================
3
41. General description
5
6The kvm API is a set of ioctls that are issued to control various aspects
7of a virtual machine. The ioctls belong to three classes
8
9 - System ioctls: These query and set global attributes which affect the
10 whole kvm subsystem. In addition a system ioctl is used to create
11 virtual machines
12
13 - VM ioctls: These query and set attributes that affect an entire virtual
14 machine, for example memory layout. In addition a VM ioctl is used to
15 create virtual cpus (vcpus).
16
17 Only run VM ioctls from the same process (address space) that was used
18 to create the VM.
19
20 - vcpu ioctls: These query and set attributes that control the operation
21 of a single virtual cpu.
22
23 Only run vcpu ioctls from the same thread that was used to create the
24 vcpu.
25
262. File descritpors
27
28The kvm API is centered around file descriptors. An initial
29open("/dev/kvm") obtains a handle to the kvm subsystem; this handle
30can be used to issue system ioctls. A KVM_CREATE_VM ioctl on this
31handle will create a VM file descripror which can be used to issue VM
32ioctls. A KVM_CREATE_VCPU ioctl on a VM fd will create a virtual cpu
33and return a file descriptor pointing to it. Finally, ioctls on a vcpu
34fd can be used to control the vcpu, including the important task of
35actually running guest code.
36
37In general file descriptors can be migrated among processes by means
38of fork() and the SCM_RIGHTS facility of unix domain socket. These
39kinds of tricks are explicitly not supported by kvm. While they will
40not cause harm to the host, their actual behavior is not guaranteed by
41the API. The only supported use is one virtual machine per process,
42and one vcpu per thread.
43
443. Extensions
45
46As of Linux 2.6.22, the KVM ABI has been stabilized: no backward
47incompatible change are allowed. However, there is an extension
48facility that allows backward-compatible extensions to the API to be
49queried and used.
50
51The extension mechanism is not based on on the Linux version number.
52Instead, kvm defines extension identifiers and a facility to query
53whether a particular extension identifier is available. If it is, a
54set of ioctls is available for application use.
55
564. API description
57
58This section describes ioctls that can be used to control kvm guests.
59For each ioctl, the following information is provided along with a
60description:
61
62 Capability: which KVM extension provides this ioctl. Can be 'basic',
63 which means that is will be provided by any kernel that supports
64 API version 12 (see section 4.1), or a KVM_CAP_xyz constant, which
65 means availability needs to be checked with KVM_CHECK_EXTENSION
66 (see section 4.4).
67
68 Architectures: which instruction set architectures provide this ioctl.
69 x86 includes both i386 and x86_64.
70
71 Type: system, vm, or vcpu.
72
73 Parameters: what parameters are accepted by the ioctl.
74
75 Returns: the return value. General error numbers (EBADF, ENOMEM, EINVAL)
76 are not detailed, but errors with specific meanings are.
77
784.1 KVM_GET_API_VERSION
79
80Capability: basic
81Architectures: all
82Type: system ioctl
83Parameters: none
84Returns: the constant KVM_API_VERSION (=12)
85
86This identifies the API version as the stable kvm API. It is not
87expected that this number will change. However, Linux 2.6.20 and
882.6.21 report earlier versions; these are not documented and not
89supported. Applications should refuse to run if KVM_GET_API_VERSION
90returns a value other than 12. If this check passes, all ioctls
91described as 'basic' will be available.
92
934.2 KVM_CREATE_VM
94
95Capability: basic
96Architectures: all
97Type: system ioctl
98Parameters: none
99Returns: a VM fd that can be used to control the new virtual machine.
100
101The new VM has no virtual cpus and no memory. An mmap() of a VM fd
102will access the virtual machine's physical address space; offset zero
103corresponds to guest physical address zero. Use of mmap() on a VM fd
104is discouraged if userspace memory allocation (KVM_CAP_USER_MEMORY) is
105available.
106
1074.3 KVM_GET_MSR_INDEX_LIST
108
109Capability: basic
110Architectures: x86
111Type: system
112Parameters: struct kvm_msr_list (in/out)
113Returns: 0 on success; -1 on error
114Errors:
115 E2BIG: the msr index list is to be to fit in the array specified by
116 the user.
117
118struct kvm_msr_list {
119 __u32 nmsrs; /* number of msrs in entries */
120 __u32 indices[0];
121};
122
123This ioctl returns the guest msrs that are supported. The list varies
124by kvm version and host processor, but does not change otherwise. The
125user fills in the size of the indices array in nmsrs, and in return
126kvm adjusts nmsrs to reflect the actual number of msrs and fills in
127the indices array with their numbers.
128
1294.4 KVM_CHECK_EXTENSION
130
131Capability: basic
132Architectures: all
133Type: system ioctl
134Parameters: extension identifier (KVM_CAP_*)
135Returns: 0 if unsupported; 1 (or some other positive integer) if supported
136
137The API allows the application to query about extensions to the core
138kvm API. Userspace passes an extension identifier (an integer) and
139receives an integer that describes the extension availability.
140Generally 0 means no and 1 means yes, but some extensions may report
141additional information in the integer return value.
142
1434.5 KVM_GET_VCPU_MMAP_SIZE
144
145Capability: basic
146Architectures: all
147Type: system ioctl
148Parameters: none
149Returns: size of vcpu mmap area, in bytes
150
151The KVM_RUN ioctl (cf.) communicates with userspace via a shared
152memory region. This ioctl returns the size of that region. See the
153KVM_RUN documentation for details.
154
1554.6 KVM_SET_MEMORY_REGION
156
157Capability: basic
158Architectures: all
159Type: vm ioctl
160Parameters: struct kvm_memory_region (in)
161Returns: 0 on success, -1 on error
162
163struct kvm_memory_region {
164 __u32 slot;
165 __u32 flags;
166 __u64 guest_phys_addr;
167 __u64 memory_size; /* bytes */
168};
169
170/* for kvm_memory_region::flags */
171#define KVM_MEM_LOG_DIRTY_PAGES 1UL
172
173This ioctl allows the user to create or modify a guest physical memory
174slot. When changing an existing slot, it may be moved in the guest
175physical memory space, or its flags may be modified. It may not be
176resized. Slots may not overlap.
177
178The flags field supports just one flag, KVM_MEM_LOG_DIRTY_PAGES, which
179instructs kvm to keep track of writes to memory within the slot. See
180the KVM_GET_DIRTY_LOG ioctl.
181
182It is recommended to use the KVM_SET_USER_MEMORY_REGION ioctl instead
183of this API, if available. This newer API allows placing guest memory
184at specified locations in the host address space, yielding better
185control and easy access.
186
1874.6 KVM_CREATE_VCPU
188
189Capability: basic
190Architectures: all
191Type: vm ioctl
192Parameters: vcpu id (apic id on x86)
193Returns: vcpu fd on success, -1 on error
194
195This API adds a vcpu to a virtual machine. The vcpu id is a small integer
196in the range [0, max_vcpus).
197
1984.7 KVM_GET_DIRTY_LOG (vm ioctl)
199
200Capability: basic
201Architectures: x86
202Type: vm ioctl
203Parameters: struct kvm_dirty_log (in/out)
204Returns: 0 on success, -1 on error
205
206/* for KVM_GET_DIRTY_LOG */
207struct kvm_dirty_log {
208 __u32 slot;
209 __u32 padding;
210 union {
211 void __user *dirty_bitmap; /* one bit per page */
212 __u64 padding;
213 };
214};
215
216Given a memory slot, return a bitmap containing any pages dirtied
217since the last call to this ioctl. Bit 0 is the first page in the
218memory slot. Ensure the entire structure is cleared to avoid padding
219issues.
220
2214.8 KVM_SET_MEMORY_ALIAS
222
223Capability: basic
224Architectures: x86
225Type: vm ioctl
226Parameters: struct kvm_memory_alias (in)
227Returns: 0 (success), -1 (error)
228
229struct kvm_memory_alias {
230 __u32 slot; /* this has a different namespace than memory slots */
231 __u32 flags;
232 __u64 guest_phys_addr;
233 __u64 memory_size;
234 __u64 target_phys_addr;
235};
236
237Defines a guest physical address space region as an alias to another
238region. Useful for aliased address, for example the VGA low memory
239window. Should not be used with userspace memory.
240
2414.9 KVM_RUN
242
243Capability: basic
244Architectures: all
245Type: vcpu ioctl
246Parameters: none
247Returns: 0 on success, -1 on error
248Errors:
249 EINTR: an unmasked signal is pending
250
251This ioctl is used to run a guest virtual cpu. While there are no
252explicit parameters, there is an implicit parameter block that can be
253obtained by mmap()ing the vcpu fd at offset 0, with the size given by
254KVM_GET_VCPU_MMAP_SIZE. The parameter block is formatted as a 'struct
255kvm_run' (see below).
256
2574.10 KVM_GET_REGS
258
259Capability: basic
260Architectures: all
261Type: vcpu ioctl
262Parameters: struct kvm_regs (out)
263Returns: 0 on success, -1 on error
264
265Reads the general purpose registers from the vcpu.
266
267/* x86 */
268struct kvm_regs {
269 /* out (KVM_GET_REGS) / in (KVM_SET_REGS) */
270 __u64 rax, rbx, rcx, rdx;
271 __u64 rsi, rdi, rsp, rbp;
272 __u64 r8, r9, r10, r11;
273 __u64 r12, r13, r14, r15;
274 __u64 rip, rflags;
275};
276
2774.11 KVM_SET_REGS
278
279Capability: basic
280Architectures: all
281Type: vcpu ioctl
282Parameters: struct kvm_regs (in)
283Returns: 0 on success, -1 on error
284
285Writes the general purpose registers into the vcpu.
286
287See KVM_GET_REGS for the data structure.
288
2894.12 KVM_GET_SREGS
290
291Capability: basic
292Architectures: x86
293Type: vcpu ioctl
294Parameters: struct kvm_sregs (out)
295Returns: 0 on success, -1 on error
296
297Reads special registers from the vcpu.
298
299/* x86 */
300struct kvm_sregs {
301 struct kvm_segment cs, ds, es, fs, gs, ss;
302 struct kvm_segment tr, ldt;
303 struct kvm_dtable gdt, idt;
304 __u64 cr0, cr2, cr3, cr4, cr8;
305 __u64 efer;
306 __u64 apic_base;
307 __u64 interrupt_bitmap[(KVM_NR_INTERRUPTS + 63) / 64];
308};
309
310interrupt_bitmap is a bitmap of pending external interrupts. At most
311one bit may be set. This interrupt has been acknowledged by the APIC
312but not yet injected into the cpu core.
313
3144.13 KVM_SET_SREGS
315
316Capability: basic
317Architectures: x86
318Type: vcpu ioctl
319Parameters: struct kvm_sregs (in)
320Returns: 0 on success, -1 on error
321
322Writes special registers into the vcpu. See KVM_GET_SREGS for the
323data structures.
324
3254.14 KVM_TRANSLATE
326
327Capability: basic
328Architectures: x86
329Type: vcpu ioctl
330Parameters: struct kvm_translation (in/out)
331Returns: 0 on success, -1 on error
332
333Translates a virtual address according to the vcpu's current address
334translation mode.
335
336struct kvm_translation {
337 /* in */
338 __u64 linear_address;
339
340 /* out */
341 __u64 physical_address;
342 __u8 valid;
343 __u8 writeable;
344 __u8 usermode;
345 __u8 pad[5];
346};
347
3484.15 KVM_INTERRUPT
349
350Capability: basic
351Architectures: x86
352Type: vcpu ioctl
353Parameters: struct kvm_interrupt (in)
354Returns: 0 on success, -1 on error
355
356Queues a hardware interrupt vector to be injected. This is only
357useful if in-kernel local APIC is not used.
358
359/* for KVM_INTERRUPT */
360struct kvm_interrupt {
361 /* in */
362 __u32 irq;
363};
364
365Note 'irq' is an interrupt vector, not an interrupt pin or line.
366
3674.16 KVM_DEBUG_GUEST
368
369Capability: basic
370Architectures: none
371Type: vcpu ioctl
372Parameters: none)
373Returns: -1 on error
374
375Support for this has been removed. Use KVM_SET_GUEST_DEBUG instead.
376
3774.17 KVM_GET_MSRS
378
379Capability: basic
380Architectures: x86
381Type: vcpu ioctl
382Parameters: struct kvm_msrs (in/out)
383Returns: 0 on success, -1 on error
384
385Reads model-specific registers from the vcpu. Supported msr indices can
386be obtained using KVM_GET_MSR_INDEX_LIST.
387
388struct kvm_msrs {
389 __u32 nmsrs; /* number of msrs in entries */
390 __u32 pad;
391
392 struct kvm_msr_entry entries[0];
393};
394
395struct kvm_msr_entry {
396 __u32 index;
397 __u32 reserved;
398 __u64 data;
399};
400
401Application code should set the 'nmsrs' member (which indicates the
402size of the entries array) and the 'index' member of each array entry.
403kvm will fill in the 'data' member.
404
4054.18 KVM_SET_MSRS
406
407Capability: basic
408Architectures: x86
409Type: vcpu ioctl
410Parameters: struct kvm_msrs (in)
411Returns: 0 on success, -1 on error
412
413Writes model-specific registers to the vcpu. See KVM_GET_MSRS for the
414data structures.
415
416Application code should set the 'nmsrs' member (which indicates the
417size of the entries array), and the 'index' and 'data' members of each
418array entry.
419
4204.19 KVM_SET_CPUID
421
422Capability: basic
423Architectures: x86
424Type: vcpu ioctl
425Parameters: struct kvm_cpuid (in)
426Returns: 0 on success, -1 on error
427
428Defines the vcpu responses to the cpuid instruction. Applications
429should use the KVM_SET_CPUID2 ioctl if available.
430
431
432struct kvm_cpuid_entry {
433 __u32 function;
434 __u32 eax;
435 __u32 ebx;
436 __u32 ecx;
437 __u32 edx;
438 __u32 padding;
439};
440
441/* for KVM_SET_CPUID */
442struct kvm_cpuid {
443 __u32 nent;
444 __u32 padding;
445 struct kvm_cpuid_entry entries[0];
446};
447
4484.20 KVM_SET_SIGNAL_MASK
449
450Capability: basic
451Architectures: x86
452Type: vcpu ioctl
453Parameters: struct kvm_signal_mask (in)
454Returns: 0 on success, -1 on error
455
456Defines which signals are blocked during execution of KVM_RUN. This
457signal mask temporarily overrides the threads signal mask. Any
458unblocked signal received (except SIGKILL and SIGSTOP, which retain
459their traditional behaviour) will cause KVM_RUN to return with -EINTR.
460
461Note the signal will only be delivered if not blocked by the original
462signal mask.
463
464/* for KVM_SET_SIGNAL_MASK */
465struct kvm_signal_mask {
466 __u32 len;
467 __u8 sigset[0];
468};
469
4704.21 KVM_GET_FPU
471
472Capability: basic
473Architectures: x86
474Type: vcpu ioctl
475Parameters: struct kvm_fpu (out)
476Returns: 0 on success, -1 on error
477
478Reads the floating point state from the vcpu.
479
480/* for KVM_GET_FPU and KVM_SET_FPU */
481struct kvm_fpu {
482 __u8 fpr[8][16];
483 __u16 fcw;
484 __u16 fsw;
485 __u8 ftwx; /* in fxsave format */
486 __u8 pad1;
487 __u16 last_opcode;
488 __u64 last_ip;
489 __u64 last_dp;
490 __u8 xmm[16][16];
491 __u32 mxcsr;
492 __u32 pad2;
493};
494
4954.22 KVM_SET_FPU
496
497Capability: basic
498Architectures: x86
499Type: vcpu ioctl
500Parameters: struct kvm_fpu (in)
501Returns: 0 on success, -1 on error
502
503Writes the floating point state to the vcpu.
504
505/* for KVM_GET_FPU and KVM_SET_FPU */
506struct kvm_fpu {
507 __u8 fpr[8][16];
508 __u16 fcw;
509 __u16 fsw;
510 __u8 ftwx; /* in fxsave format */
511 __u8 pad1;
512 __u16 last_opcode;
513 __u64 last_ip;
514 __u64 last_dp;
515 __u8 xmm[16][16];
516 __u32 mxcsr;
517 __u32 pad2;
518};
519
5204.23 KVM_CREATE_IRQCHIP
521
522Capability: KVM_CAP_IRQCHIP
523Architectures: x86, ia64
524Type: vm ioctl
525Parameters: none
526Returns: 0 on success, -1 on error
527
528Creates an interrupt controller model in the kernel. On x86, creates a virtual
529ioapic, a virtual PIC (two PICs, nested), and sets up future vcpus to have a
530local APIC. IRQ routing for GSIs 0-15 is set to both PIC and IOAPIC; GSI 16-23
531only go to the IOAPIC. On ia64, a IOSAPIC is created.
532
5334.24 KVM_IRQ_LINE
534
535Capability: KVM_CAP_IRQCHIP
536Architectures: x86, ia64
537Type: vm ioctl
538Parameters: struct kvm_irq_level
539Returns: 0 on success, -1 on error
540
541Sets the level of a GSI input to the interrupt controller model in the kernel.
542Requires that an interrupt controller model has been previously created with
543KVM_CREATE_IRQCHIP. Note that edge-triggered interrupts require the level
544to be set to 1 and then back to 0.
545
546struct kvm_irq_level {
547 union {
548 __u32 irq; /* GSI */
549 __s32 status; /* not used for KVM_IRQ_LEVEL */
550 };
551 __u32 level; /* 0 or 1 */
552};
553
5544.25 KVM_GET_IRQCHIP
555
556Capability: KVM_CAP_IRQCHIP
557Architectures: x86, ia64
558Type: vm ioctl
559Parameters: struct kvm_irqchip (in/out)
560Returns: 0 on success, -1 on error
561
562Reads the state of a kernel interrupt controller created with
563KVM_CREATE_IRQCHIP into a buffer provided by the caller.
564
565struct kvm_irqchip {
566 __u32 chip_id; /* 0 = PIC1, 1 = PIC2, 2 = IOAPIC */
567 __u32 pad;
568 union {
569 char dummy[512]; /* reserving space */
570 struct kvm_pic_state pic;
571 struct kvm_ioapic_state ioapic;
572 } chip;
573};
574
5754.26 KVM_SET_IRQCHIP
576
577Capability: KVM_CAP_IRQCHIP
578Architectures: x86, ia64
579Type: vm ioctl
580Parameters: struct kvm_irqchip (in)
581Returns: 0 on success, -1 on error
582
583Sets the state of a kernel interrupt controller created with
584KVM_CREATE_IRQCHIP from a buffer provided by the caller.
585
586struct kvm_irqchip {
587 __u32 chip_id; /* 0 = PIC1, 1 = PIC2, 2 = IOAPIC */
588 __u32 pad;
589 union {
590 char dummy[512]; /* reserving space */
591 struct kvm_pic_state pic;
592 struct kvm_ioapic_state ioapic;
593 } chip;
594};
595
5965. The kvm_run structure
597
598Application code obtains a pointer to the kvm_run structure by
599mmap()ing a vcpu fd. From that point, application code can control
600execution by changing fields in kvm_run prior to calling the KVM_RUN
601ioctl, and obtain information about the reason KVM_RUN returned by
602looking up structure members.
603
604struct kvm_run {
605 /* in */
606 __u8 request_interrupt_window;
607
608Request that KVM_RUN return when it becomes possible to inject external
609interrupts into the guest. Useful in conjunction with KVM_INTERRUPT.
610
611 __u8 padding1[7];
612
613 /* out */
614 __u32 exit_reason;
615
616When KVM_RUN has returned successfully (return value 0), this informs
617application code why KVM_RUN has returned. Allowable values for this
618field are detailed below.
619
620 __u8 ready_for_interrupt_injection;
621
622If request_interrupt_window has been specified, this field indicates
623an interrupt can be injected now with KVM_INTERRUPT.
624
625 __u8 if_flag;
626
627The value of the current interrupt flag. Only valid if in-kernel
628local APIC is not used.
629
630 __u8 padding2[2];
631
632 /* in (pre_kvm_run), out (post_kvm_run) */
633 __u64 cr8;
634
635The value of the cr8 register. Only valid if in-kernel local APIC is
636not used. Both input and output.
637
638 __u64 apic_base;
639
640The value of the APIC BASE msr. Only valid if in-kernel local
641APIC is not used. Both input and output.
642
643 union {
644 /* KVM_EXIT_UNKNOWN */
645 struct {
646 __u64 hardware_exit_reason;
647 } hw;
648
649If exit_reason is KVM_EXIT_UNKNOWN, the vcpu has exited due to unknown
650reasons. Further architecture-specific information is available in
651hardware_exit_reason.
652
653 /* KVM_EXIT_FAIL_ENTRY */
654 struct {
655 __u64 hardware_entry_failure_reason;
656 } fail_entry;
657
658If exit_reason is KVM_EXIT_FAIL_ENTRY, the vcpu could not be run due
659to unknown reasons. Further architecture-specific information is
660available in hardware_entry_failure_reason.
661
662 /* KVM_EXIT_EXCEPTION */
663 struct {
664 __u32 exception;
665 __u32 error_code;
666 } ex;
667
668Unused.
669
670 /* KVM_EXIT_IO */
671 struct {
672#define KVM_EXIT_IO_IN 0
673#define KVM_EXIT_IO_OUT 1
674 __u8 direction;
675 __u8 size; /* bytes */
676 __u16 port;
677 __u32 count;
678 __u64 data_offset; /* relative to kvm_run start */
679 } io;
680
681If exit_reason is KVM_EXIT_IO_IN or KVM_EXIT_IO_OUT, then the vcpu has
682executed a port I/O instruction which could not be satisfied by kvm.
683data_offset describes where the data is located (KVM_EXIT_IO_OUT) or
684where kvm expects application code to place the data for the next
685KVM_RUN invocation (KVM_EXIT_IO_IN). Data format is a patcked array.
686
687 struct {
688 struct kvm_debug_exit_arch arch;
689 } debug;
690
691Unused.
692
693 /* KVM_EXIT_MMIO */
694 struct {
695 __u64 phys_addr;
696 __u8 data[8];
697 __u32 len;
698 __u8 is_write;
699 } mmio;
700
701If exit_reason is KVM_EXIT_MMIO or KVM_EXIT_IO_OUT, then the vcpu has
702executed a memory-mapped I/O instruction which could not be satisfied
703by kvm. The 'data' member contains the written data if 'is_write' is
704true, and should be filled by application code otherwise.
705
706 /* KVM_EXIT_HYPERCALL */
707 struct {
708 __u64 nr;
709 __u64 args[6];
710 __u64 ret;
711 __u32 longmode;
712 __u32 pad;
713 } hypercall;
714
715Unused.
716
717 /* KVM_EXIT_TPR_ACCESS */
718 struct {
719 __u64 rip;
720 __u32 is_write;
721 __u32 pad;
722 } tpr_access;
723
724To be documented (KVM_TPR_ACCESS_REPORTING).
725
726 /* KVM_EXIT_S390_SIEIC */
727 struct {
728 __u8 icptcode;
729 __u64 mask; /* psw upper half */
730 __u64 addr; /* psw lower half */
731 __u16 ipa;
732 __u32 ipb;
733 } s390_sieic;
734
735s390 specific.
736
737 /* KVM_EXIT_S390_RESET */
738#define KVM_S390_RESET_POR 1
739#define KVM_S390_RESET_CLEAR 2
740#define KVM_S390_RESET_SUBSYSTEM 4
741#define KVM_S390_RESET_CPU_INIT 8
742#define KVM_S390_RESET_IPL 16
743 __u64 s390_reset_flags;
744
745s390 specific.
746
747 /* KVM_EXIT_DCR */
748 struct {
749 __u32 dcrn;
750 __u32 data;
751 __u8 is_write;
752 } dcr;
753
754powerpc specific.
755
756 /* Fix the size of the union. */
757 char padding[256];
758 };
759};
diff --git a/Documentation/networking/00-INDEX b/Documentation/networking/00-INDEX
index 1634c6dcecae..50189bf07d53 100644
--- a/Documentation/networking/00-INDEX
+++ b/Documentation/networking/00-INDEX
@@ -60,6 +60,8 @@ framerelay.txt
60 - info on using Frame Relay/Data Link Connection Identifier (DLCI). 60 - info on using Frame Relay/Data Link Connection Identifier (DLCI).
61generic_netlink.txt 61generic_netlink.txt
62 - info on Generic Netlink 62 - info on Generic Netlink
63ieee802154.txt
64 - Linux IEEE 802.15.4 implementation, API and drivers
63ip-sysctl.txt 65ip-sysctl.txt
64 - /proc/sys/net/ipv4/* variables 66 - /proc/sys/net/ipv4/* variables
65ip_dynaddr.txt 67ip_dynaddr.txt
diff --git a/Documentation/networking/ieee802154.txt b/Documentation/networking/ieee802154.txt
index a0280ad2edc9..23c995e64032 100644
--- a/Documentation/networking/ieee802154.txt
+++ b/Documentation/networking/ieee802154.txt
@@ -22,7 +22,7 @@ int sd = socket(PF_IEEE802154, SOCK_DGRAM, 0);
22..... 22.....
23 23
24The address family, socket addresses etc. are defined in the 24The address family, socket addresses etc. are defined in the
25include/net/ieee802154/af_ieee802154.h header or in the special header 25include/net/af_ieee802154.h header or in the special header
26in our userspace package (see either linux-zigbee sourceforge download page 26in our userspace package (see either linux-zigbee sourceforge download page
27or git tree at git://linux-zigbee.git.sourceforge.net/gitroot/linux-zigbee). 27or git tree at git://linux-zigbee.git.sourceforge.net/gitroot/linux-zigbee).
28 28
@@ -33,7 +33,7 @@ MLME - MAC Level Management
33============================ 33============================
34 34
35Most of IEEE 802.15.4 MLME interfaces are directly mapped on netlink commands. 35Most of IEEE 802.15.4 MLME interfaces are directly mapped on netlink commands.
36See the include/net/ieee802154/nl802154.h header. Our userspace tools package 36See the include/net/nl802154.h header. Our userspace tools package
37(see above) provides CLI configuration utility for radio interfaces and simple 37(see above) provides CLI configuration utility for radio interfaces and simple
38coordinator for IEEE 802.15.4 networks as an example users of MLME protocol. 38coordinator for IEEE 802.15.4 networks as an example users of MLME protocol.
39 39
@@ -54,10 +54,14 @@ Those types of devices require different approach to be hooked into Linux kernel
54HardMAC 54HardMAC
55======= 55=======
56 56
57See the header include/net/ieee802154/netdevice.h. You have to implement Linux 57See the header include/net/ieee802154_netdev.h. You have to implement Linux
58net_device, with .type = ARPHRD_IEEE802154. Data is exchanged with socket family 58net_device, with .type = ARPHRD_IEEE802154. Data is exchanged with socket family
59code via plain sk_buffs. The control block of sk_buffs will contain additional 59code via plain sk_buffs. On skb reception skb->cb must contain additional
60info as described in the struct ieee802154_mac_cb. 60info as described in the struct ieee802154_mac_cb. During packet transmission
61the skb->cb is used to provide additional data to device's header_ops->create
62function. Be aware, that this data can be overriden later (when socket code
63submits skb to qdisc), so if you need something from that cb later, you should
64store info in the skb->data on your own.
61 65
62To hook the MLME interface you have to populate the ml_priv field of your 66To hook the MLME interface you have to populate the ml_priv field of your
63net_device with a pointer to struct ieee802154_mlme_ops instance. All fields are 67net_device with a pointer to struct ieee802154_mlme_ops instance. All fields are
@@ -69,8 +73,8 @@ We provide an example of simple HardMAC driver at drivers/ieee802154/fakehard.c
69SoftMAC 73SoftMAC
70======= 74=======
71 75
72We are going to provide intermediate layer impelementing IEEE 802.15.4 MAC 76We are going to provide intermediate layer implementing IEEE 802.15.4 MAC
73in software. This is currently WIP. 77in software. This is currently WIP.
74 78
75See header include/net/ieee802154/mac802154.h and several drivers in 79See header include/net/mac802154.h and several drivers in drivers/ieee802154/.
76drivers/ieee802154/ 80
diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt
index 8be76235fe67..fbe427a6580c 100644
--- a/Documentation/networking/ip-sysctl.txt
+++ b/Documentation/networking/ip-sysctl.txt
@@ -311,9 +311,12 @@ tcp_no_metrics_save - BOOLEAN
311 connections. 311 connections.
312 312
313tcp_orphan_retries - INTEGER 313tcp_orphan_retries - INTEGER
314 How may times to retry before killing TCP connection, closed 314 This value influences the timeout of a locally closed TCP connection,
315 by our side. Default value 7 corresponds to ~50sec-16min 315 when RTO retransmissions remain unacknowledged.
316 depending on RTO. If you machine is loaded WEB server, 316 See tcp_retries2 for more details.
317
318 The default value is 7.
319 If your machine is a loaded WEB server,
317 you should think about lowering this value, such sockets 320 you should think about lowering this value, such sockets
318 may consume significant resources. Cf. tcp_max_orphans. 321 may consume significant resources. Cf. tcp_max_orphans.
319 322
@@ -327,16 +330,28 @@ tcp_retrans_collapse - BOOLEAN
327 certain TCP stacks. 330 certain TCP stacks.
328 331
329tcp_retries1 - INTEGER 332tcp_retries1 - INTEGER
330 How many times to retry before deciding that something is wrong 333 This value influences the time, after which TCP decides, that
331 and it is necessary to report this suspicion to network layer. 334 something is wrong due to unacknowledged RTO retransmissions,
332 Minimal RFC value is 3, it is default, which corresponds 335 and reports this suspicion to the network layer.
333 to ~3sec-8min depending on RTO. 336 See tcp_retries2 for more details.
337
338 RFC 1122 recommends at least 3 retransmissions, which is the
339 default.
334 340
335tcp_retries2 - INTEGER 341tcp_retries2 - INTEGER
336 How may times to retry before killing alive TCP connection. 342 This value influences the timeout of an alive TCP connection,
337 RFC1122 says that the limit should be longer than 100 sec. 343 when RTO retransmissions remain unacknowledged.
338 It is too small number. Default value 15 corresponds to ~13-30min 344 Given a value of N, a hypothetical TCP connection following
339 depending on RTO. 345 exponential backoff with an initial RTO of TCP_RTO_MIN would
346 retransmit N times before killing the connection at the (N+1)th RTO.
347
348 The default value of 15 yields a hypothetical timeout of 924.6
349 seconds and is a lower bound for the effective timeout.
350 TCP will effectively time out at the first RTO which exceeds the
351 hypothetical timeout.
352
353 RFC 1122 recommends at least 100 seconds for the timeout,
354 which corresponds to a value of at least 8.
340 355
341tcp_rfc1337 - BOOLEAN 356tcp_rfc1337 - BOOLEAN
342 If set, the TCP stack behaves conforming to RFC1337. If unset, 357 If set, the TCP stack behaves conforming to RFC1337. If unset,
@@ -1282,6 +1297,16 @@ sctp_rmem - vector of 3 INTEGERs: min, default, max
1282sctp_wmem - vector of 3 INTEGERs: min, default, max 1297sctp_wmem - vector of 3 INTEGERs: min, default, max
1283 See tcp_wmem for a description. 1298 See tcp_wmem for a description.
1284 1299
1300addr_scope_policy - INTEGER
1301 Control IPv4 address scoping - draft-stewart-tsvwg-sctp-ipv4-00
1302
1303 0 - Disable IPv4 address scoping
1304 1 - Enable IPv4 address scoping
1305 2 - Follow draft but allow IPv4 private addresses
1306 3 - Follow draft but allow IPv4 link local addresses
1307
1308 Default: 1
1309
1285 1310
1286/proc/sys/net/core/* 1311/proc/sys/net/core/*
1287dev_weight - INTEGER 1312dev_weight - INTEGER
diff --git a/Documentation/power/runtime_pm.txt b/Documentation/power/runtime_pm.txt
new file mode 100644
index 000000000000..f49a33b704d2
--- /dev/null
+++ b/Documentation/power/runtime_pm.txt
@@ -0,0 +1,378 @@
1Run-time Power Management Framework for I/O Devices
2
3(C) 2009 Rafael J. Wysocki <rjw@sisk.pl>, Novell Inc.
4
51. Introduction
6
7Support for run-time power management (run-time PM) of I/O devices is provided
8at the power management core (PM core) level by means of:
9
10* The power management workqueue pm_wq in which bus types and device drivers can
11 put their PM-related work items. It is strongly recommended that pm_wq be
12 used for queuing all work items related to run-time PM, because this allows
13 them to be synchronized with system-wide power transitions (suspend to RAM,
14 hibernation and resume from system sleep states). pm_wq is declared in
15 include/linux/pm_runtime.h and defined in kernel/power/main.c.
16
17* A number of run-time PM fields in the 'power' member of 'struct device' (which
18 is of the type 'struct dev_pm_info', defined in include/linux/pm.h) that can
19 be used for synchronizing run-time PM operations with one another.
20
21* Three device run-time PM callbacks in 'struct dev_pm_ops' (defined in
22 include/linux/pm.h).
23
24* A set of helper functions defined in drivers/base/power/runtime.c that can be
25 used for carrying out run-time PM operations in such a way that the
26 synchronization between them is taken care of by the PM core. Bus types and
27 device drivers are encouraged to use these functions.
28
29The run-time PM callbacks present in 'struct dev_pm_ops', the device run-time PM
30fields of 'struct dev_pm_info' and the core helper functions provided for
31run-time PM are described below.
32
332. Device Run-time PM Callbacks
34
35There are three device run-time PM callbacks defined in 'struct dev_pm_ops':
36
37struct dev_pm_ops {
38 ...
39 int (*runtime_suspend)(struct device *dev);
40 int (*runtime_resume)(struct device *dev);
41 void (*runtime_idle)(struct device *dev);
42 ...
43};
44
45The ->runtime_suspend() callback is executed by the PM core for the bus type of
46the device being suspended. The bus type's callback is then _entirely_
47_responsible_ for handling the device as appropriate, which may, but need not
48include executing the device driver's own ->runtime_suspend() callback (from the
49PM core's point of view it is not necessary to implement a ->runtime_suspend()
50callback in a device driver as long as the bus type's ->runtime_suspend() knows
51what to do to handle the device).
52
53 * Once the bus type's ->runtime_suspend() callback has completed successfully
54 for given device, the PM core regards the device as suspended, which need
55 not mean that the device has been put into a low power state. It is
56 supposed to mean, however, that the device will not process data and will
57 not communicate with the CPU(s) and RAM until its bus type's
58 ->runtime_resume() callback is executed for it. The run-time PM status of
59 a device after successful execution of its bus type's ->runtime_suspend()
60 callback is 'suspended'.
61
62 * If the bus type's ->runtime_suspend() callback returns -EBUSY or -EAGAIN,
63 the device's run-time PM status is supposed to be 'active', which means that
64 the device _must_ be fully operational afterwards.
65
66 * If the bus type's ->runtime_suspend() callback returns an error code
67 different from -EBUSY or -EAGAIN, the PM core regards this as a fatal
68 error and will refuse to run the helper functions described in Section 4
69 for the device, until the status of it is directly set either to 'active'
70 or to 'suspended' (the PM core provides special helper functions for this
71 purpose).
72
73In particular, if the driver requires remote wakeup capability for proper
74functioning and device_may_wakeup() returns 'false' for the device, then
75->runtime_suspend() should return -EBUSY. On the other hand, if
76device_may_wakeup() returns 'true' for the device and the device is put
77into a low power state during the execution of its bus type's
78->runtime_suspend(), it is expected that remote wake-up (i.e. hardware mechanism
79allowing the device to request a change of its power state, such as PCI PME)
80will be enabled for the device. Generally, remote wake-up should be enabled
81for all input devices put into a low power state at run time.
82
83The ->runtime_resume() callback is executed by the PM core for the bus type of
84the device being woken up. The bus type's callback is then _entirely_
85_responsible_ for handling the device as appropriate, which may, but need not
86include executing the device driver's own ->runtime_resume() callback (from the
87PM core's point of view it is not necessary to implement a ->runtime_resume()
88callback in a device driver as long as the bus type's ->runtime_resume() knows
89what to do to handle the device).
90
91 * Once the bus type's ->runtime_resume() callback has completed successfully,
92 the PM core regards the device as fully operational, which means that the
93 device _must_ be able to complete I/O operations as needed. The run-time
94 PM status of the device is then 'active'.
95
96 * If the bus type's ->runtime_resume() callback returns an error code, the PM
97 core regards this as a fatal error and will refuse to run the helper
98 functions described in Section 4 for the device, until its status is
99 directly set either to 'active' or to 'suspended' (the PM core provides
100 special helper functions for this purpose).
101
102The ->runtime_idle() callback is executed by the PM core for the bus type of
103given device whenever the device appears to be idle, which is indicated to the
104PM core by two counters, the device's usage counter and the counter of 'active'
105children of the device.
106
107 * If any of these counters is decreased using a helper function provided by
108 the PM core and it turns out to be equal to zero, the other counter is
109 checked. If that counter also is equal to zero, the PM core executes the
110 device bus type's ->runtime_idle() callback (with the device as an
111 argument).
112
113The action performed by a bus type's ->runtime_idle() callback is totally
114dependent on the bus type in question, but the expected and recommended action
115is to check if the device can be suspended (i.e. if all of the conditions
116necessary for suspending the device are satisfied) and to queue up a suspend
117request for the device in that case.
118
119The helper functions provided by the PM core, described in Section 4, guarantee
120that the following constraints are met with respect to the bus type's run-time
121PM callbacks:
122
123(1) The callbacks are mutually exclusive (e.g. it is forbidden to execute
124 ->runtime_suspend() in parallel with ->runtime_resume() or with another
125 instance of ->runtime_suspend() for the same device) with the exception that
126 ->runtime_suspend() or ->runtime_resume() can be executed in parallel with
127 ->runtime_idle() (although ->runtime_idle() will not be started while any
128 of the other callbacks is being executed for the same device).
129
130(2) ->runtime_idle() and ->runtime_suspend() can only be executed for 'active'
131 devices (i.e. the PM core will only execute ->runtime_idle() or
132 ->runtime_suspend() for the devices the run-time PM status of which is
133 'active').
134
135(3) ->runtime_idle() and ->runtime_suspend() can only be executed for a device
136 the usage counter of which is equal to zero _and_ either the counter of
137 'active' children of which is equal to zero, or the 'power.ignore_children'
138 flag of which is set.
139
140(4) ->runtime_resume() can only be executed for 'suspended' devices (i.e. the
141 PM core will only execute ->runtime_resume() for the devices the run-time
142 PM status of which is 'suspended').
143
144Additionally, the helper functions provided by the PM core obey the following
145rules:
146
147 * If ->runtime_suspend() is about to be executed or there's a pending request
148 to execute it, ->runtime_idle() will not be executed for the same device.
149
150 * A request to execute or to schedule the execution of ->runtime_suspend()
151 will cancel any pending requests to execute ->runtime_idle() for the same
152 device.
153
154 * If ->runtime_resume() is about to be executed or there's a pending request
155 to execute it, the other callbacks will not be executed for the same device.
156
157 * A request to execute ->runtime_resume() will cancel any pending or
158 scheduled requests to execute the other callbacks for the same device.
159
1603. Run-time PM Device Fields
161
162The following device run-time PM fields are present in 'struct dev_pm_info', as
163defined in include/linux/pm.h:
164
165 struct timer_list suspend_timer;
166 - timer used for scheduling (delayed) suspend request
167
168 unsigned long timer_expires;
169 - timer expiration time, in jiffies (if this is different from zero, the
170 timer is running and will expire at that time, otherwise the timer is not
171 running)
172
173 struct work_struct work;
174 - work structure used for queuing up requests (i.e. work items in pm_wq)
175
176 wait_queue_head_t wait_queue;
177 - wait queue used if any of the helper functions needs to wait for another
178 one to complete
179
180 spinlock_t lock;
181 - lock used for synchronisation
182
183 atomic_t usage_count;
184 - the usage counter of the device
185
186 atomic_t child_count;
187 - the count of 'active' children of the device
188
189 unsigned int ignore_children;
190 - if set, the value of child_count is ignored (but still updated)
191
192 unsigned int disable_depth;
193 - used for disabling the helper funcions (they work normally if this is
194 equal to zero); the initial value of it is 1 (i.e. run-time PM is
195 initially disabled for all devices)
196
197 unsigned int runtime_error;
198 - if set, there was a fatal error (one of the callbacks returned error code
199 as described in Section 2), so the helper funtions will not work until
200 this flag is cleared; this is the error code returned by the failing
201 callback
202
203 unsigned int idle_notification;
204 - if set, ->runtime_idle() is being executed
205
206 unsigned int request_pending;
207 - if set, there's a pending request (i.e. a work item queued up into pm_wq)
208
209 enum rpm_request request;
210 - type of request that's pending (valid if request_pending is set)
211
212 unsigned int deferred_resume;
213 - set if ->runtime_resume() is about to be run while ->runtime_suspend() is
214 being executed for that device and it is not practical to wait for the
215 suspend to complete; means "start a resume as soon as you've suspended"
216
217 enum rpm_status runtime_status;
218 - the run-time PM status of the device; this field's initial value is
219 RPM_SUSPENDED, which means that each device is initially regarded by the
220 PM core as 'suspended', regardless of its real hardware status
221
222All of the above fields are members of the 'power' member of 'struct device'.
223
2244. Run-time PM Device Helper Functions
225
226The following run-time PM helper functions are defined in
227drivers/base/power/runtime.c and include/linux/pm_runtime.h:
228
229 void pm_runtime_init(struct device *dev);
230 - initialize the device run-time PM fields in 'struct dev_pm_info'
231
232 void pm_runtime_remove(struct device *dev);
233 - make sure that the run-time PM of the device will be disabled after
234 removing the device from device hierarchy
235
236 int pm_runtime_idle(struct device *dev);
237 - execute ->runtime_idle() for the device's bus type; returns 0 on success
238 or error code on failure, where -EINPROGRESS means that ->runtime_idle()
239 is already being executed
240
241 int pm_runtime_suspend(struct device *dev);
242 - execute ->runtime_suspend() for the device's bus type; returns 0 on
243 success, 1 if the device's run-time PM status was already 'suspended', or
244 error code on failure, where -EAGAIN or -EBUSY means it is safe to attempt
245 to suspend the device again in future
246
247 int pm_runtime_resume(struct device *dev);
248 - execute ->runtime_resume() for the device's bus type; returns 0 on
249 success, 1 if the device's run-time PM status was already 'active' or
250 error code on failure, where -EAGAIN means it may be safe to attempt to
251 resume the device again in future, but 'power.runtime_error' should be
252 checked additionally
253
254 int pm_request_idle(struct device *dev);
255 - submit a request to execute ->runtime_idle() for the device's bus type
256 (the request is represented by a work item in pm_wq); returns 0 on success
257 or error code if the request has not been queued up
258
259 int pm_schedule_suspend(struct device *dev, unsigned int delay);
260 - schedule the execution of ->runtime_suspend() for the device's bus type
261 in future, where 'delay' is the time to wait before queuing up a suspend
262 work item in pm_wq, in milliseconds (if 'delay' is zero, the work item is
263 queued up immediately); returns 0 on success, 1 if the device's PM
264 run-time status was already 'suspended', or error code if the request
265 hasn't been scheduled (or queued up if 'delay' is 0); if the execution of
266 ->runtime_suspend() is already scheduled and not yet expired, the new
267 value of 'delay' will be used as the time to wait
268
269 int pm_request_resume(struct device *dev);
270 - submit a request to execute ->runtime_resume() for the device's bus type
271 (the request is represented by a work item in pm_wq); returns 0 on
272 success, 1 if the device's run-time PM status was already 'active', or
273 error code if the request hasn't been queued up
274
275 void pm_runtime_get_noresume(struct device *dev);
276 - increment the device's usage counter
277
278 int pm_runtime_get(struct device *dev);
279 - increment the device's usage counter, run pm_request_resume(dev) and
280 return its result
281
282 int pm_runtime_get_sync(struct device *dev);
283 - increment the device's usage counter, run pm_runtime_resume(dev) and
284 return its result
285
286 void pm_runtime_put_noidle(struct device *dev);
287 - decrement the device's usage counter
288
289 int pm_runtime_put(struct device *dev);
290 - decrement the device's usage counter, run pm_request_idle(dev) and return
291 its result
292
293 int pm_runtime_put_sync(struct device *dev);
294 - decrement the device's usage counter, run pm_runtime_idle(dev) and return
295 its result
296
297 void pm_runtime_enable(struct device *dev);
298 - enable the run-time PM helper functions to run the device bus type's
299 run-time PM callbacks described in Section 2
300
301 int pm_runtime_disable(struct device *dev);
302 - prevent the run-time PM helper functions from running the device bus
303 type's run-time PM callbacks, make sure that all of the pending run-time
304 PM operations on the device are either completed or canceled; returns
305 1 if there was a resume request pending and it was necessary to execute
306 ->runtime_resume() for the device's bus type to satisfy that request,
307 otherwise 0 is returned
308
309 void pm_suspend_ignore_children(struct device *dev, bool enable);
310 - set/unset the power.ignore_children flag of the device
311
312 int pm_runtime_set_active(struct device *dev);
313 - clear the device's 'power.runtime_error' flag, set the device's run-time
314 PM status to 'active' and update its parent's counter of 'active'
315 children as appropriate (it is only valid to use this function if
316 'power.runtime_error' is set or 'power.disable_depth' is greater than
317 zero); it will fail and return error code if the device has a parent
318 which is not active and the 'power.ignore_children' flag of which is unset
319
320 void pm_runtime_set_suspended(struct device *dev);
321 - clear the device's 'power.runtime_error' flag, set the device's run-time
322 PM status to 'suspended' and update its parent's counter of 'active'
323 children as appropriate (it is only valid to use this function if
324 'power.runtime_error' is set or 'power.disable_depth' is greater than
325 zero)
326
327It is safe to execute the following helper functions from interrupt context:
328
329pm_request_idle()
330pm_schedule_suspend()
331pm_request_resume()
332pm_runtime_get_noresume()
333pm_runtime_get()
334pm_runtime_put_noidle()
335pm_runtime_put()
336pm_suspend_ignore_children()
337pm_runtime_set_active()
338pm_runtime_set_suspended()
339pm_runtime_enable()
340
3415. Run-time PM Initialization, Device Probing and Removal
342
343Initially, the run-time PM is disabled for all devices, which means that the
344majority of the run-time PM helper funtions described in Section 4 will return
345-EAGAIN until pm_runtime_enable() is called for the device.
346
347In addition to that, the initial run-time PM status of all devices is
348'suspended', but it need not reflect the actual physical state of the device.
349Thus, if the device is initially active (i.e. it is able to process I/O), its
350run-time PM status must be changed to 'active', with the help of
351pm_runtime_set_active(), before pm_runtime_enable() is called for the device.
352
353However, if the device has a parent and the parent's run-time PM is enabled,
354calling pm_runtime_set_active() for the device will affect the parent, unless
355the parent's 'power.ignore_children' flag is set. Namely, in that case the
356parent won't be able to suspend at run time, using the PM core's helper
357functions, as long as the child's status is 'active', even if the child's
358run-time PM is still disabled (i.e. pm_runtime_enable() hasn't been called for
359the child yet or pm_runtime_disable() has been called for it). For this reason,
360once pm_runtime_set_active() has been called for the device, pm_runtime_enable()
361should be called for it too as soon as reasonably possible or its run-time PM
362status should be changed back to 'suspended' with the help of
363pm_runtime_set_suspended().
364
365If the default initial run-time PM status of the device (i.e. 'suspended')
366reflects the actual state of the device, its bus type's or its driver's
367->probe() callback will likely need to wake it up using one of the PM core's
368helper functions described in Section 4. In that case, pm_runtime_resume()
369should be used. Of course, for this purpose the device's run-time PM has to be
370enabled earlier by calling pm_runtime_enable().
371
372If the device bus type's or driver's ->probe() or ->remove() callback runs
373pm_runtime_suspend() or pm_runtime_idle() or their asynchronous counterparts,
374they will fail returning -EAGAIN, because the device's usage counter is
375incremented by the core before executing ->probe() and ->remove(). Still, it
376may be desirable to suspend the device as soon as ->probe() or ->remove() has
377finished, so the PM core uses pm_runtime_idle_sync() to invoke the device bus
378type's ->runtime_idle() callback at that time.
diff --git a/Documentation/s390/s390dbf.txt b/Documentation/s390/s390dbf.txt
index 2d10053dd97e..ae66f9b90a25 100644
--- a/Documentation/s390/s390dbf.txt
+++ b/Documentation/s390/s390dbf.txt
@@ -495,6 +495,13 @@ and for each vararg a long value. So e.g. for a debug entry with a format
495string plus two varargs one would need to allocate a (3 * sizeof(long)) 495string plus two varargs one would need to allocate a (3 * sizeof(long))
496byte data area in the debug_register() function. 496byte data area in the debug_register() function.
497 497
498IMPORTANT: Using "%s" in sprintf event functions is dangerous. You can only
499use "%s" in the sprintf event functions, if the memory for the passed string is
500available as long as the debug feature exists. The reason behind this is that
501due to performance considerations only a pointer to the string is stored in
502the debug feature. If you log a string that is freed afterwards, you will get
503an OOPS when inspecting the debug feature, because then the debug feature will
504access the already freed memory.
498 505
499NOTE: If using the sprintf view do NOT use other event/exception functions 506NOTE: If using the sprintf view do NOT use other event/exception functions
500than the sprintf-event and -exception functions. 507than the sprintf-event and -exception functions.
diff --git a/Documentation/sound/alsa/ALSA-Configuration.txt b/Documentation/sound/alsa/ALSA-Configuration.txt
index 4252697a95d6..1c8eb4518ce0 100644
--- a/Documentation/sound/alsa/ALSA-Configuration.txt
+++ b/Documentation/sound/alsa/ALSA-Configuration.txt
@@ -60,6 +60,12 @@ Prior to version 0.9.0rc4 options had a 'snd_' prefix. This was removed.
60 slots - Reserve the slot index for the given driver. 60 slots - Reserve the slot index for the given driver.
61 This option takes multiple strings. 61 This option takes multiple strings.
62 See "Module Autoloading Support" section for details. 62 See "Module Autoloading Support" section for details.
63 debug - Specifies the debug message level
64 (0 = disable debug prints, 1 = normal debug messages,
65 2 = verbose debug messages)
66 This option appears only when CONFIG_SND_DEBUG=y.
67 This option can be dynamically changed via sysfs
68 /sys/modules/snd/parameters/debug file.
63 69
64 Module snd-pcm-oss 70 Module snd-pcm-oss
65 ------------------ 71 ------------------
@@ -513,6 +519,26 @@ Prior to version 0.9.0rc4 options had a 'snd_' prefix. This was removed.
513 or input, but you may use this module for any application which 519 or input, but you may use this module for any application which
514 requires a sound card (like RealPlayer). 520 requires a sound card (like RealPlayer).
515 521
522 pcm_devs - Number of PCM devices assigned to each card
523 (default = 1, up to 4)
524 pcm_substreams - Number of PCM substreams assigned to each PCM
525 (default = 8, up to 16)
526 hrtimer - Use hrtimer (=1, default) or system timer (=0)
527 fake_buffer - Fake buffer allocations (default = 1)
528
529 When multiple PCM devices are created, snd-dummy gives different
530 behavior to each PCM device:
531 0 = interleaved with mmap support
532 1 = non-interleaved with mmap support
533 2 = interleaved without mmap
534 3 = non-interleaved without mmap
535
536 As default, snd-dummy drivers doesn't allocate the real buffers
537 but either ignores read/write or mmap a single dummy page to all
538 buffer pages, in order to save the resouces. If your apps need
539 the read/ written buffer data to be consistent, pass fake_buffer=0
540 option.
541
516 The power-management is supported. 542 The power-management is supported.
517 543
518 Module snd-echo3g 544 Module snd-echo3g
@@ -768,6 +794,10 @@ Prior to version 0.9.0rc4 options had a 'snd_' prefix. This was removed.
768 bdl_pos_adj - Specifies the DMA IRQ timing delay in samples. 794 bdl_pos_adj - Specifies the DMA IRQ timing delay in samples.
769 Passing -1 will make the driver to choose the appropriate 795 Passing -1 will make the driver to choose the appropriate
770 value based on the controller chip. 796 value based on the controller chip.
797 patch - Specifies the early "patch" files to modify the HD-audio
798 setup before initializing the codecs. This option is
799 available only when CONFIG_SND_HDA_PATCH_LOADER=y is set.
800 See HD-Audio.txt for details.
771 801
772 [Single (global) options] 802 [Single (global) options]
773 single_cmd - Use single immediate commands to communicate with 803 single_cmd - Use single immediate commands to communicate with
diff --git a/Documentation/sound/alsa/HD-Audio-Models.txt b/Documentation/sound/alsa/HD-Audio-Models.txt
index 939a3dd58148..97eebd63bedc 100644
--- a/Documentation/sound/alsa/HD-Audio-Models.txt
+++ b/Documentation/sound/alsa/HD-Audio-Models.txt
@@ -114,8 +114,8 @@ ALC662/663/272
114 samsung-nc10 Samsung NC10 mini notebook 114 samsung-nc10 Samsung NC10 mini notebook
115 auto auto-config reading BIOS (default) 115 auto auto-config reading BIOS (default)
116 116
117ALC882/885 117ALC882/883/885/888/889
118========== 118======================
119 3stack-dig 3-jack with SPDIF I/O 119 3stack-dig 3-jack with SPDIF I/O
120 6stack-dig 6-jack digital with SPDIF I/O 120 6stack-dig 6-jack digital with SPDIF I/O
121 arima Arima W820Di1 121 arima Arima W820Di1
@@ -127,12 +127,8 @@ ALC882/885
127 mbp3 Macbook Pro rev3 127 mbp3 Macbook Pro rev3
128 imac24 iMac 24'' with jack detection 128 imac24 iMac 24'' with jack detection
129 w2jc ASUS W2JC 129 w2jc ASUS W2JC
130 auto auto-config reading BIOS (default) 130 3stack-2ch-dig 3-jack with SPDIF I/O (ALC883)
131 131 alc883-6stack-dig 6-jack digital with SPDIF I/O (ALC883)
132ALC883/888
133==========
134 3stack-dig 3-jack with SPDIF I/O
135 6stack-dig 6-jack digital with SPDIF I/O
136 3stack-6ch 3-jack 6-channel 132 3stack-6ch 3-jack 6-channel
137 3stack-6ch-dig 3-jack 6-channel with SPDIF I/O 133 3stack-6ch-dig 3-jack 6-channel with SPDIF I/O
138 6stack-dig-demo 6-jack digital for Intel demo board 134 6stack-dig-demo 6-jack digital for Intel demo board
@@ -140,6 +136,7 @@ ALC883/888
140 acer-aspire Acer Aspire 9810 136 acer-aspire Acer Aspire 9810
141 acer-aspire-4930g Acer Aspire 4930G 137 acer-aspire-4930g Acer Aspire 4930G
142 acer-aspire-6530g Acer Aspire 6530G 138 acer-aspire-6530g Acer Aspire 6530G
139 acer-aspire-7730g Acer Aspire 7730G
143 acer-aspire-8930g Acer Aspire 8930G 140 acer-aspire-8930g Acer Aspire 8930G
144 medion Medion Laptops 141 medion Medion Laptops
145 medion-md2 Medion MD2 142 medion-md2 Medion MD2
@@ -155,10 +152,13 @@ ALC883/888
155 3stack-hp HP machines with 3stack (Lucknow, Samba boards) 152 3stack-hp HP machines with 3stack (Lucknow, Samba boards)
156 6stack-dell Dell machines with 6stack (Inspiron 530) 153 6stack-dell Dell machines with 6stack (Inspiron 530)
157 mitac Mitac 8252D 154 mitac Mitac 8252D
155 clevo-m540r Clevo M540R (6ch + digital)
158 clevo-m720 Clevo M720 laptop series 156 clevo-m720 Clevo M720 laptop series
159 fujitsu-pi2515 Fujitsu AMILO Pi2515 157 fujitsu-pi2515 Fujitsu AMILO Pi2515
160 fujitsu-xa3530 Fujitsu AMILO XA3530 158 fujitsu-xa3530 Fujitsu AMILO XA3530
161 3stack-6ch-intel Intel DG33* boards 159 3stack-6ch-intel Intel DG33* boards
160 intel-alc889a Intel IbexPeak with ALC889A
161 intel-x58 Intel DX58 with ALC889
162 asus-p5q ASUS P5Q-EM boards 162 asus-p5q ASUS P5Q-EM boards
163 mb31 MacBook 3,1 163 mb31 MacBook 3,1
164 sony-vaio-tt Sony VAIO TT 164 sony-vaio-tt Sony VAIO TT
@@ -229,7 +229,7 @@ AD1984
229====== 229======
230 basic default configuration 230 basic default configuration
231 thinkpad Lenovo Thinkpad T61/X61 231 thinkpad Lenovo Thinkpad T61/X61
232 dell Dell T3400 232 dell_desktop Dell T3400
233 233
234AD1986A 234AD1986A
235======= 235=======
@@ -258,6 +258,7 @@ Conexant 5045
258 laptop-micsense Laptop with Mic sense (old model fujitsu) 258 laptop-micsense Laptop with Mic sense (old model fujitsu)
259 laptop-hpmicsense Laptop with HP and Mic senses 259 laptop-hpmicsense Laptop with HP and Mic senses
260 benq Benq R55E 260 benq Benq R55E
261 laptop-hp530 HP 530 laptop
261 test for testing/debugging purpose, almost all controls 262 test for testing/debugging purpose, almost all controls
262 can be adjusted. Appearing only when compiled with 263 can be adjusted. Appearing only when compiled with
263 $CONFIG_SND_DEBUG=y 264 $CONFIG_SND_DEBUG=y
@@ -278,9 +279,16 @@ Conexant 5051
278 hp-dv6736 HP dv6736 279 hp-dv6736 HP dv6736
279 lenovo-x200 Lenovo X200 laptop 280 lenovo-x200 Lenovo X200 laptop
280 281
282Conexant 5066
283=============
284 laptop Basic Laptop config (default)
285 dell-laptop Dell laptops
286 olpc-xo-1_5 OLPC XO 1.5
287
281STAC9200 288STAC9200
282======== 289========
283 ref Reference board 290 ref Reference board
291 oqo OQO Model 2
284 dell-d21 Dell (unknown) 292 dell-d21 Dell (unknown)
285 dell-d22 Dell (unknown) 293 dell-d22 Dell (unknown)
286 dell-d23 Dell (unknown) 294 dell-d23 Dell (unknown)
@@ -368,10 +376,12 @@ STAC92HD73*
368=========== 376===========
369 ref Reference board 377 ref Reference board
370 no-jd BIOS setup but without jack-detection 378 no-jd BIOS setup but without jack-detection
379 intel Intel DG45* mobos
371 dell-m6-amic Dell desktops/laptops with analog mics 380 dell-m6-amic Dell desktops/laptops with analog mics
372 dell-m6-dmic Dell desktops/laptops with digital mics 381 dell-m6-dmic Dell desktops/laptops with digital mics
373 dell-m6 Dell desktops/laptops with both type of mics 382 dell-m6 Dell desktops/laptops with both type of mics
374 dell-eq Dell desktops/laptops 383 dell-eq Dell desktops/laptops
384 alienware Alienware M17x
375 auto BIOS setup (default) 385 auto BIOS setup (default)
376 386
377STAC92HD83* 387STAC92HD83*
@@ -385,3 +395,8 @@ STAC9872
385======== 395========
386 vaio VAIO laptop without SPDIF 396 vaio VAIO laptop without SPDIF
387 auto BIOS setup (default) 397 auto BIOS setup (default)
398
399Cirrus Logic CS4206/4207
400========================
401 mbp55 MacBook Pro 5,5
402 auto BIOS setup (default)
diff --git a/Documentation/sound/alsa/HD-Audio.txt b/Documentation/sound/alsa/HD-Audio.txt
index 71ac995b1915..7b8a5f947d1d 100644
--- a/Documentation/sound/alsa/HD-Audio.txt
+++ b/Documentation/sound/alsa/HD-Audio.txt
@@ -139,6 +139,10 @@ The driver checks PCI SSID and looks through the static configuration
139table until any matching entry is found. If you have a new machine, 139table until any matching entry is found. If you have a new machine,
140you may see a message like below: 140you may see a message like below:
141------------------------------------------------------------------------ 141------------------------------------------------------------------------
142 hda_codec: ALC880: BIOS auto-probing.
143------------------------------------------------------------------------
144Meanwhile, in the earlier versions, you would see a message like:
145------------------------------------------------------------------------
142 hda_codec: Unknown model for ALC880, trying auto-probe from BIOS... 146 hda_codec: Unknown model for ALC880, trying auto-probe from BIOS...
143------------------------------------------------------------------------ 147------------------------------------------------------------------------
144Even if you see such a message, DON'T PANIC. Take a deep breath and 148Even if you see such a message, DON'T PANIC. Take a deep breath and
@@ -403,6 +407,66 @@ re-configure based on that state, run like below:
403------------------------------------------------------------------------ 407------------------------------------------------------------------------
404 408
405 409
410Early Patching
411~~~~~~~~~~~~~~
412When CONFIG_SND_HDA_PATCH_LOADER=y is set, you can pass a "patch" as a
413firmware file for modifying the HD-audio setup before initializing the
414codec. This can work basically like the reconfiguration via sysfs in
415the above, but it does it before the first codec configuration.
416
417A patch file is a plain text file which looks like below:
418
419------------------------------------------------------------------------
420 [codec]
421 0x12345678 0xabcd1234 2
422
423 [model]
424 auto
425
426 [pincfg]
427 0x12 0x411111f0
428
429 [verb]
430 0x20 0x500 0x03
431 0x20 0x400 0xff
432
433 [hint]
434 hp_detect = yes
435------------------------------------------------------------------------
436
437The file needs to have a line `[codec]`. The next line should contain
438three numbers indicating the codec vendor-id (0x12345678 in the
439example), the codec subsystem-id (0xabcd1234) and the address (2) of
440the codec. The rest patch entries are applied to this specified codec
441until another codec entry is given.
442
443The `[model]` line allows to change the model name of the each codec.
444In the example above, it will be changed to model=auto.
445Note that this overrides the module option.
446
447After the `[pincfg]` line, the contents are parsed as the initial
448default pin-configurations just like `user_pin_configs` sysfs above.
449The values can be shown in user_pin_configs sysfs file, too.
450
451Similarly, the lines after `[verb]` are parsed as `init_verbs`
452sysfs entries, and the lines after `[hint]` are parsed as `hints`
453sysfs entries, respectively.
454
455The hd-audio driver reads the file via request_firmware(). Thus,
456a patch file has to be located on the appropriate firmware path,
457typically, /lib/firmware. For example, when you pass the option
458`patch=hda-init.fw`, the file /lib/firmware/hda-init-fw must be
459present.
460
461The patch module option is specific to each card instance, and you
462need to give one file name for each instance, separated by commas.
463For example, if you have two cards, one for an on-board analog and one
464for an HDMI video board, you may pass patch option like below:
465------------------------------------------------------------------------
466 options snd-hda-intel patch=on-board-patch,hdmi-patch
467------------------------------------------------------------------------
468
469
406Power-Saving 470Power-Saving
407~~~~~~~~~~~~ 471~~~~~~~~~~~~
408The power-saving is a kind of auto-suspend of the device. When the 472The power-saving is a kind of auto-suspend of the device. When the
diff --git a/Documentation/sysctl/kernel.txt b/Documentation/sysctl/kernel.txt
index 322a00bb99d9..2dbff53369d0 100644
--- a/Documentation/sysctl/kernel.txt
+++ b/Documentation/sysctl/kernel.txt
@@ -19,6 +19,7 @@ Currently, these files might (depending on your configuration)
19show up in /proc/sys/kernel: 19show up in /proc/sys/kernel:
20- acpi_video_flags 20- acpi_video_flags
21- acct 21- acct
22- callhome [ S390 only ]
22- auto_msgmni 23- auto_msgmni
23- core_pattern 24- core_pattern
24- core_uses_pid 25- core_uses_pid
@@ -91,6 +92,21 @@ valid for 30 seconds.
91 92
92============================================================== 93==============================================================
93 94
95callhome:
96
97Controls the kernel's callhome behavior in case of a kernel panic.
98
99The s390 hardware allows an operating system to send a notification
100to a service organization (callhome) in case of an operating system panic.
101
102When the value in this file is 0 (which is the default behavior)
103nothing happens in case of a kernel panic. If this value is set to "1"
104the complete kernel oops message is send to the IBM customer service
105organization in case the mainframe the Linux operating system is running
106on has a service contract with IBM.
107
108==============================================================
109
94core_pattern: 110core_pattern:
95 111
96core_pattern is used to specify a core dumpfile pattern name. 112core_pattern is used to specify a core dumpfile pattern name.
diff --git a/Documentation/trace/events.txt b/Documentation/trace/events.txt
index f157d7594ea7..78c45a87be57 100644
--- a/Documentation/trace/events.txt
+++ b/Documentation/trace/events.txt
@@ -1,7 +1,7 @@
1 Event Tracing 1 Event Tracing
2 2
3 Documentation written by Theodore Ts'o 3 Documentation written by Theodore Ts'o
4 Updated by Li Zefan 4 Updated by Li Zefan and Tom Zanussi
5 5
61. Introduction 61. Introduction
7=============== 7===============
@@ -22,12 +22,12 @@ tracing information should be printed.
22--------------------------------- 22---------------------------------
23 23
24The events which are available for tracing can be found in the file 24The events which are available for tracing can be found in the file
25/debug/tracing/available_events. 25/sys/kernel/debug/tracing/available_events.
26 26
27To enable a particular event, such as 'sched_wakeup', simply echo it 27To enable a particular event, such as 'sched_wakeup', simply echo it
28to /debug/tracing/set_event. For example: 28to /sys/kernel/debug/tracing/set_event. For example:
29 29
30 # echo sched_wakeup >> /debug/tracing/set_event 30 # echo sched_wakeup >> /sys/kernel/debug/tracing/set_event
31 31
32[ Note: '>>' is necessary, otherwise it will firstly disable 32[ Note: '>>' is necessary, otherwise it will firstly disable
33 all the events. ] 33 all the events. ]
@@ -35,15 +35,15 @@ to /debug/tracing/set_event. For example:
35To disable an event, echo the event name to the set_event file prefixed 35To disable an event, echo the event name to the set_event file prefixed
36with an exclamation point: 36with an exclamation point:
37 37
38 # echo '!sched_wakeup' >> /debug/tracing/set_event 38 # echo '!sched_wakeup' >> /sys/kernel/debug/tracing/set_event
39 39
40To disable all events, echo an empty line to the set_event file: 40To disable all events, echo an empty line to the set_event file:
41 41
42 # echo > /debug/tracing/set_event 42 # echo > /sys/kernel/debug/tracing/set_event
43 43
44To enable all events, echo '*:*' or '*:' to the set_event file: 44To enable all events, echo '*:*' or '*:' to the set_event file:
45 45
46 # echo *:* > /debug/tracing/set_event 46 # echo *:* > /sys/kernel/debug/tracing/set_event
47 47
48The events are organized into subsystems, such as ext4, irq, sched, 48The events are organized into subsystems, such as ext4, irq, sched,
49etc., and a full event name looks like this: <subsystem>:<event>. The 49etc., and a full event name looks like this: <subsystem>:<event>. The
@@ -52,29 +52,29 @@ file. All of the events in a subsystem can be specified via the syntax
52"<subsystem>:*"; for example, to enable all irq events, you can use the 52"<subsystem>:*"; for example, to enable all irq events, you can use the
53command: 53command:
54 54
55 # echo 'irq:*' > /debug/tracing/set_event 55 # echo 'irq:*' > /sys/kernel/debug/tracing/set_event
56 56
572.2 Via the 'enable' toggle 572.2 Via the 'enable' toggle
58--------------------------- 58---------------------------
59 59
60The events available are also listed in /debug/tracing/events/ hierarchy 60The events available are also listed in /sys/kernel/debug/tracing/events/ hierarchy
61of directories. 61of directories.
62 62
63To enable event 'sched_wakeup': 63To enable event 'sched_wakeup':
64 64
65 # echo 1 > /debug/tracing/events/sched/sched_wakeup/enable 65 # echo 1 > /sys/kernel/debug/tracing/events/sched/sched_wakeup/enable
66 66
67To disable it: 67To disable it:
68 68
69 # echo 0 > /debug/tracing/events/sched/sched_wakeup/enable 69 # echo 0 > /sys/kernel/debug/tracing/events/sched/sched_wakeup/enable
70 70
71To enable all events in sched subsystem: 71To enable all events in sched subsystem:
72 72
73 # echo 1 > /debug/tracing/events/sched/enable 73 # echo 1 > /sys/kernel/debug/tracing/events/sched/enable
74 74
75To eanble all events: 75To eanble all events:
76 76
77 # echo 1 > /debug/tracing/events/enable 77 # echo 1 > /sys/kernel/debug/tracing/events/enable
78 78
79When reading one of these enable files, there are four results: 79When reading one of these enable files, there are four results:
80 80
@@ -83,8 +83,199 @@ When reading one of these enable files, there are four results:
83 X - there is a mixture of events enabled and disabled 83 X - there is a mixture of events enabled and disabled
84 ? - this file does not affect any event 84 ? - this file does not affect any event
85 85
862.3 Boot option
87---------------
88
89In order to facilitate early boot debugging, use boot option:
90
91 trace_event=[event-list]
92
93The format of this boot option is the same as described in section 2.1.
94
863. Defining an event-enabled tracepoint 953. Defining an event-enabled tracepoint
87======================================= 96=======================================
88 97
89See The example provided in samples/trace_events 98See The example provided in samples/trace_events
90 99
1004. Event formats
101================
102
103Each trace event has a 'format' file associated with it that contains
104a description of each field in a logged event. This information can
105be used to parse the binary trace stream, and is also the place to
106find the field names that can be used in event filters (see section 5).
107
108It also displays the format string that will be used to print the
109event in text mode, along with the event name and ID used for
110profiling.
111
112Every event has a set of 'common' fields associated with it; these are
113the fields prefixed with 'common_'. The other fields vary between
114events and correspond to the fields defined in the TRACE_EVENT
115definition for that event.
116
117Each field in the format has the form:
118
119 field:field-type field-name; offset:N; size:N;
120
121where offset is the offset of the field in the trace record and size
122is the size of the data item, in bytes.
123
124For example, here's the information displayed for the 'sched_wakeup'
125event:
126
127# cat /debug/tracing/events/sched/sched_wakeup/format
128
129name: sched_wakeup
130ID: 60
131format:
132 field:unsigned short common_type; offset:0; size:2;
133 field:unsigned char common_flags; offset:2; size:1;
134 field:unsigned char common_preempt_count; offset:3; size:1;
135 field:int common_pid; offset:4; size:4;
136 field:int common_tgid; offset:8; size:4;
137
138 field:char comm[TASK_COMM_LEN]; offset:12; size:16;
139 field:pid_t pid; offset:28; size:4;
140 field:int prio; offset:32; size:4;
141 field:int success; offset:36; size:4;
142 field:int cpu; offset:40; size:4;
143
144print fmt: "task %s:%d [%d] success=%d [%03d]", REC->comm, REC->pid,
145 REC->prio, REC->success, REC->cpu
146
147This event contains 10 fields, the first 5 common and the remaining 5
148event-specific. All the fields for this event are numeric, except for
149'comm' which is a string, a distinction important for event filtering.
150
1515. Event filtering
152==================
153
154Trace events can be filtered in the kernel by associating boolean
155'filter expressions' with them. As soon as an event is logged into
156the trace buffer, its fields are checked against the filter expression
157associated with that event type. An event with field values that
158'match' the filter will appear in the trace output, and an event whose
159values don't match will be discarded. An event with no filter
160associated with it matches everything, and is the default when no
161filter has been set for an event.
162
1635.1 Expression syntax
164---------------------
165
166A filter expression consists of one or more 'predicates' that can be
167combined using the logical operators '&&' and '||'. A predicate is
168simply a clause that compares the value of a field contained within a
169logged event with a constant value and returns either 0 or 1 depending
170on whether the field value matched (1) or didn't match (0):
171
172 field-name relational-operator value
173
174Parentheses can be used to provide arbitrary logical groupings and
175double-quotes can be used to prevent the shell from interpreting
176operators as shell metacharacters.
177
178The field-names available for use in filters can be found in the
179'format' files for trace events (see section 4).
180
181The relational-operators depend on the type of the field being tested:
182
183The operators available for numeric fields are:
184
185==, !=, <, <=, >, >=
186
187And for string fields they are:
188
189==, !=
190
191Currently, only exact string matches are supported.
192
193Currently, the maximum number of predicates in a filter is 16.
194
1955.2 Setting filters
196-------------------
197
198A filter for an individual event is set by writing a filter expression
199to the 'filter' file for the given event.
200
201For example:
202
203# cd /debug/tracing/events/sched/sched_wakeup
204# echo "common_preempt_count > 4" > filter
205
206A slightly more involved example:
207
208# cd /debug/tracing/events/sched/sched_signal_send
209# echo "((sig >= 10 && sig < 15) || sig == 17) && comm != bash" > filter
210
211If there is an error in the expression, you'll get an 'Invalid
212argument' error when setting it, and the erroneous string along with
213an error message can be seen by looking at the filter e.g.:
214
215# cd /debug/tracing/events/sched/sched_signal_send
216# echo "((sig >= 10 && sig < 15) || dsig == 17) && comm != bash" > filter
217-bash: echo: write error: Invalid argument
218# cat filter
219((sig >= 10 && sig < 15) || dsig == 17) && comm != bash
220^
221parse_error: Field not found
222
223Currently the caret ('^') for an error always appears at the beginning of
224the filter string; the error message should still be useful though
225even without more accurate position info.
226
2275.3 Clearing filters
228--------------------
229
230To clear the filter for an event, write a '0' to the event's filter
231file.
232
233To clear the filters for all events in a subsystem, write a '0' to the
234subsystem's filter file.
235
2365.3 Subsystem filters
237---------------------
238
239For convenience, filters for every event in a subsystem can be set or
240cleared as a group by writing a filter expression into the filter file
241at the root of the subsytem. Note however, that if a filter for any
242event within the subsystem lacks a field specified in the subsystem
243filter, or if the filter can't be applied for any other reason, the
244filter for that event will retain its previous setting. This can
245result in an unintended mixture of filters which could lead to
246confusing (to the user who might think different filters are in
247effect) trace output. Only filters that reference just the common
248fields can be guaranteed to propagate successfully to all events.
249
250Here are a few subsystem filter examples that also illustrate the
251above points:
252
253Clear the filters on all events in the sched subsytem:
254
255# cd /sys/kernel/debug/tracing/events/sched
256# echo 0 > filter
257# cat sched_switch/filter
258none
259# cat sched_wakeup/filter
260none
261
262Set a filter using only common fields for all events in the sched
263subsytem (all events end up with the same filter):
264
265# cd /sys/kernel/debug/tracing/events/sched
266# echo common_pid == 0 > filter
267# cat sched_switch/filter
268common_pid == 0
269# cat sched_wakeup/filter
270common_pid == 0
271
272Attempt to set a filter using a non-common field for all events in the
273sched subsytem (all events but those that have a prev_pid field retain
274their old filters):
275
276# cd /sys/kernel/debug/tracing/events/sched
277# echo prev_pid == 0 > filter
278# cat sched_switch/filter
279prev_pid == 0
280# cat sched_wakeup/filter
281common_pid == 0
diff --git a/Documentation/trace/ftrace-design.txt b/Documentation/trace/ftrace-design.txt
new file mode 100644
index 000000000000..7003e10f10f5
--- /dev/null
+++ b/Documentation/trace/ftrace-design.txt
@@ -0,0 +1,233 @@
1 function tracer guts
2 ====================
3
4Introduction
5------------
6
7Here we will cover the architecture pieces that the common function tracing
8code relies on for proper functioning. Things are broken down into increasing
9complexity so that you can start simple and at least get basic functionality.
10
11Note that this focuses on architecture implementation details only. If you
12want more explanation of a feature in terms of common code, review the common
13ftrace.txt file.
14
15
16Prerequisites
17-------------
18
19Ftrace relies on these features being implemented:
20 STACKTRACE_SUPPORT - implement save_stack_trace()
21 TRACE_IRQFLAGS_SUPPORT - implement include/asm/irqflags.h
22
23
24HAVE_FUNCTION_TRACER
25--------------------
26
27You will need to implement the mcount and the ftrace_stub functions.
28
29The exact mcount symbol name will depend on your toolchain. Some call it
30"mcount", "_mcount", or even "__mcount". You can probably figure it out by
31running something like:
32 $ echo 'main(){}' | gcc -x c -S -o - - -pg | grep mcount
33 call mcount
34We'll make the assumption below that the symbol is "mcount" just to keep things
35nice and simple in the examples.
36
37Keep in mind that the ABI that is in effect inside of the mcount function is
38*highly* architecture/toolchain specific. We cannot help you in this regard,
39sorry. Dig up some old documentation and/or find someone more familiar than
40you to bang ideas off of. Typically, register usage (argument/scratch/etc...)
41is a major issue at this point, especially in relation to the location of the
42mcount call (before/after function prologue). You might also want to look at
43how glibc has implemented the mcount function for your architecture. It might
44be (semi-)relevant.
45
46The mcount function should check the function pointer ftrace_trace_function
47to see if it is set to ftrace_stub. If it is, there is nothing for you to do,
48so return immediately. If it isn't, then call that function in the same way
49the mcount function normally calls __mcount_internal -- the first argument is
50the "frompc" while the second argument is the "selfpc" (adjusted to remove the
51size of the mcount call that is embedded in the function).
52
53For example, if the function foo() calls bar(), when the bar() function calls
54mcount(), the arguments mcount() will pass to the tracer are:
55 "frompc" - the address bar() will use to return to foo()
56 "selfpc" - the address bar() (with _mcount() size adjustment)
57
58Also keep in mind that this mcount function will be called *a lot*, so
59optimizing for the default case of no tracer will help the smooth running of
60your system when tracing is disabled. So the start of the mcount function is
61typically the bare min with checking things before returning. That also means
62the code flow should usually kept linear (i.e. no branching in the nop case).
63This is of course an optimization and not a hard requirement.
64
65Here is some pseudo code that should help (these functions should actually be
66implemented in assembly):
67
68void ftrace_stub(void)
69{
70 return;
71}
72
73void mcount(void)
74{
75 /* save any bare state needed in order to do initial checking */
76
77 extern void (*ftrace_trace_function)(unsigned long, unsigned long);
78 if (ftrace_trace_function != ftrace_stub)
79 goto do_trace;
80
81 /* restore any bare state */
82
83 return;
84
85do_trace:
86
87 /* save all state needed by the ABI (see paragraph above) */
88
89 unsigned long frompc = ...;
90 unsigned long selfpc = <return address> - MCOUNT_INSN_SIZE;
91 ftrace_trace_function(frompc, selfpc);
92
93 /* restore all state needed by the ABI */
94}
95
96Don't forget to export mcount for modules !
97extern void mcount(void);
98EXPORT_SYMBOL(mcount);
99
100
101HAVE_FUNCTION_TRACE_MCOUNT_TEST
102-------------------------------
103
104This is an optional optimization for the normal case when tracing is turned off
105in the system. If you do not enable this Kconfig option, the common ftrace
106code will take care of doing the checking for you.
107
108To support this feature, you only need to check the function_trace_stop
109variable in the mcount function. If it is non-zero, there is no tracing to be
110done at all, so you can return.
111
112This additional pseudo code would simply be:
113void mcount(void)
114{
115 /* save any bare state needed in order to do initial checking */
116
117+ if (function_trace_stop)
118+ return;
119
120 extern void (*ftrace_trace_function)(unsigned long, unsigned long);
121 if (ftrace_trace_function != ftrace_stub)
122...
123
124
125HAVE_FUNCTION_GRAPH_TRACER
126--------------------------
127
128Deep breath ... time to do some real work. Here you will need to update the
129mcount function to check ftrace graph function pointers, as well as implement
130some functions to save (hijack) and restore the return address.
131
132The mcount function should check the function pointers ftrace_graph_return
133(compare to ftrace_stub) and ftrace_graph_entry (compare to
134ftrace_graph_entry_stub). If either of those are not set to the relevant stub
135function, call the arch-specific function ftrace_graph_caller which in turn
136calls the arch-specific function prepare_ftrace_return. Neither of these
137function names are strictly required, but you should use them anyways to stay
138consistent across the architecture ports -- easier to compare & contrast
139things.
140
141The arguments to prepare_ftrace_return are slightly different than what are
142passed to ftrace_trace_function. The second argument "selfpc" is the same,
143but the first argument should be a pointer to the "frompc". Typically this is
144located on the stack. This allows the function to hijack the return address
145temporarily to have it point to the arch-specific function return_to_handler.
146That function will simply call the common ftrace_return_to_handler function and
147that will return the original return address with which, you can return to the
148original call site.
149
150Here is the updated mcount pseudo code:
151void mcount(void)
152{
153...
154 if (ftrace_trace_function != ftrace_stub)
155 goto do_trace;
156
157+#ifdef CONFIG_FUNCTION_GRAPH_TRACER
158+ extern void (*ftrace_graph_return)(...);
159+ extern void (*ftrace_graph_entry)(...);
160+ if (ftrace_graph_return != ftrace_stub ||
161+ ftrace_graph_entry != ftrace_graph_entry_stub)
162+ ftrace_graph_caller();
163+#endif
164
165 /* restore any bare state */
166...
167
168Here is the pseudo code for the new ftrace_graph_caller assembly function:
169#ifdef CONFIG_FUNCTION_GRAPH_TRACER
170void ftrace_graph_caller(void)
171{
172 /* save all state needed by the ABI */
173
174 unsigned long *frompc = &...;
175 unsigned long selfpc = <return address> - MCOUNT_INSN_SIZE;
176 prepare_ftrace_return(frompc, selfpc);
177
178 /* restore all state needed by the ABI */
179}
180#endif
181
182For information on how to implement prepare_ftrace_return(), simply look at
183the x86 version. The only architecture-specific piece in it is the setup of
184the fault recovery table (the asm(...) code). The rest should be the same
185across architectures.
186
187Here is the pseudo code for the new return_to_handler assembly function. Note
188that the ABI that applies here is different from what applies to the mcount
189code. Since you are returning from a function (after the epilogue), you might
190be able to skimp on things saved/restored (usually just registers used to pass
191return values).
192
193#ifdef CONFIG_FUNCTION_GRAPH_TRACER
194void return_to_handler(void)
195{
196 /* save all state needed by the ABI (see paragraph above) */
197
198 void (*original_return_point)(void) = ftrace_return_to_handler();
199
200 /* restore all state needed by the ABI */
201
202 /* this is usually either a return or a jump */
203 original_return_point();
204}
205#endif
206
207
208HAVE_FTRACE_NMI_ENTER
209---------------------
210
211If you can't trace NMI functions, then skip this option.
212
213<details to be filled>
214
215
216HAVE_FTRACE_SYSCALLS
217---------------------
218
219<details to be filled>
220
221
222HAVE_FTRACE_MCOUNT_RECORD
223-------------------------
224
225See scripts/recordmcount.pl for more info.
226
227<details to be filled>
228
229
230HAVE_DYNAMIC_FTRACE
231---------------------
232
233<details to be filled>
diff --git a/Documentation/trace/ftrace.txt b/Documentation/trace/ftrace.txt
index a39b3c749de5..1b6292bbdd6d 100644
--- a/Documentation/trace/ftrace.txt
+++ b/Documentation/trace/ftrace.txt
@@ -26,6 +26,12 @@ disabled, and more (ftrace allows for tracer plugins, which
26means that the list of tracers can always grow). 26means that the list of tracers can always grow).
27 27
28 28
29Implementation Details
30----------------------
31
32See ftrace-design.txt for details for arch porters and such.
33
34
29The File System 35The File System
30--------------- 36---------------
31 37
@@ -85,26 +91,19 @@ of ftrace. Here is a list of some of the key files:
85 This file holds the output of the trace in a human 91 This file holds the output of the trace in a human
86 readable format (described below). 92 readable format (described below).
87 93
88 latency_trace:
89
90 This file shows the same trace but the information
91 is organized more to display possible latencies
92 in the system (described below).
93
94 trace_pipe: 94 trace_pipe:
95 95
96 The output is the same as the "trace" file but this 96 The output is the same as the "trace" file but this
97 file is meant to be streamed with live tracing. 97 file is meant to be streamed with live tracing.
98 Reads from this file will block until new data 98 Reads from this file will block until new data is
99 is retrieved. Unlike the "trace" and "latency_trace" 99 retrieved. Unlike the "trace" file, this file is a
100 files, this file is a consumer. This means reading 100 consumer. This means reading from this file causes
101 from this file causes sequential reads to display 101 sequential reads to display more current data. Once
102 more current data. Once data is read from this 102 data is read from this file, it is consumed, and
103 file, it is consumed, and will not be read 103 will not be read again with a sequential read. The
104 again with a sequential read. The "trace" and 104 "trace" file is static, and if the tracer is not
105 "latency_trace" files are static, and if the 105 adding more data,they will display the same
106 tracer is not adding more data, they will display 106 information every time they are read.
107 the same information every time they are read.
108 107
109 trace_options: 108 trace_options:
110 109
@@ -117,10 +116,10 @@ of ftrace. Here is a list of some of the key files:
117 Some of the tracers record the max latency. 116 Some of the tracers record the max latency.
118 For example, the time interrupts are disabled. 117 For example, the time interrupts are disabled.
119 This time is saved in this file. The max trace 118 This time is saved in this file. The max trace
120 will also be stored, and displayed by either 119 will also be stored, and displayed by "trace".
121 "trace" or "latency_trace". A new max trace will 120 A new max trace will only be recorded if the
122 only be recorded if the latency is greater than 121 latency is greater than the value in this
123 the value in this file. (in microseconds) 122 file. (in microseconds)
124 123
125 buffer_size_kb: 124 buffer_size_kb:
126 125
@@ -210,7 +209,7 @@ Here is the list of current tracers that may be configured.
210 the trace with the longest max latency. 209 the trace with the longest max latency.
211 See tracing_max_latency. When a new max is recorded, 210 See tracing_max_latency. When a new max is recorded,
212 it replaces the old trace. It is best to view this 211 it replaces the old trace. It is best to view this
213 trace via the latency_trace file. 212 trace with the latency-format option enabled.
214 213
215 "preemptoff" 214 "preemptoff"
216 215
@@ -307,8 +306,8 @@ the lowest priority thread (pid 0).
307Latency trace format 306Latency trace format
308-------------------- 307--------------------
309 308
310For traces that display latency times, the latency_trace file 309When the latency-format option is enabled, the trace file gives
311gives somewhat more information to see why a latency happened. 310somewhat more information to see why a latency happened.
312Here is a typical trace. 311Here is a typical trace.
313 312
314# tracer: irqsoff 313# tracer: irqsoff
@@ -380,9 +379,10 @@ explains which is which.
380 379
381The above is mostly meaningful for kernel developers. 380The above is mostly meaningful for kernel developers.
382 381
383 time: This differs from the trace file output. The trace file output 382 time: When the latency-format option is enabled, the trace file
384 includes an absolute timestamp. The timestamp used by the 383 output includes a timestamp relative to the start of the
385 latency_trace file is relative to the start of the trace. 384 trace. This differs from the output when latency-format
385 is disabled, which includes an absolute timestamp.
386 386
387 delay: This is just to help catch your eye a bit better. And 387 delay: This is just to help catch your eye a bit better. And
388 needs to be fixed to be only relative to the same CPU. 388 needs to be fixed to be only relative to the same CPU.
@@ -440,7 +440,8 @@ Here are the available options:
440 sym-addr: 440 sym-addr:
441 bash-4000 [01] 1477.606694: simple_strtoul <c0339346> 441 bash-4000 [01] 1477.606694: simple_strtoul <c0339346>
442 442
443 verbose - This deals with the latency_trace file. 443 verbose - This deals with the trace file when the
444 latency-format option is enabled.
444 445
445 bash 4000 1 0 00000000 00010a95 [58127d26] 1720.415ms \ 446 bash 4000 1 0 00000000 00010a95 [58127d26] 1720.415ms \
446 (+0.000ms): simple_strtoul (strict_strtoul) 447 (+0.000ms): simple_strtoul (strict_strtoul)
@@ -472,7 +473,7 @@ Here are the available options:
472 the app is no longer running 473 the app is no longer running
473 474
474 The lookup is performed when you read 475 The lookup is performed when you read
475 trace,trace_pipe,latency_trace. Example: 476 trace,trace_pipe. Example:
476 477
477 a.out-1623 [000] 40874.465068: /root/a.out[+0x480] <-/root/a.out[+0 478 a.out-1623 [000] 40874.465068: /root/a.out[+0x480] <-/root/a.out[+0
478x494] <- /root/a.out[+0x4a8] <- /lib/libc-2.7.so[+0x1e1a6] 479x494] <- /root/a.out[+0x4a8] <- /lib/libc-2.7.so[+0x1e1a6]
@@ -481,6 +482,11 @@ x494] <- /root/a.out[+0x4a8] <- /lib/libc-2.7.so[+0x1e1a6]
481 every scheduling event. Will add overhead if 482 every scheduling event. Will add overhead if
482 there's a lot of tasks running at once. 483 there's a lot of tasks running at once.
483 484
485 latency-format - This option changes the trace. When
486 it is enabled, the trace displays
487 additional information about the
488 latencies, as described in "Latency
489 trace format".
484 490
485sched_switch 491sched_switch
486------------ 492------------
@@ -596,12 +602,13 @@ To reset the maximum, echo 0 into tracing_max_latency. Here is
596an example: 602an example:
597 603
598 # echo irqsoff > current_tracer 604 # echo irqsoff > current_tracer
605 # echo latency-format > trace_options
599 # echo 0 > tracing_max_latency 606 # echo 0 > tracing_max_latency
600 # echo 1 > tracing_enabled 607 # echo 1 > tracing_enabled
601 # ls -ltr 608 # ls -ltr
602 [...] 609 [...]
603 # echo 0 > tracing_enabled 610 # echo 0 > tracing_enabled
604 # cat latency_trace 611 # cat trace
605# tracer: irqsoff 612# tracer: irqsoff
606# 613#
607irqsoff latency trace v1.1.5 on 2.6.26 614irqsoff latency trace v1.1.5 on 2.6.26
@@ -703,12 +710,13 @@ which preemption was disabled. The control of preemptoff tracer
703is much like the irqsoff tracer. 710is much like the irqsoff tracer.
704 711
705 # echo preemptoff > current_tracer 712 # echo preemptoff > current_tracer
713 # echo latency-format > trace_options
706 # echo 0 > tracing_max_latency 714 # echo 0 > tracing_max_latency
707 # echo 1 > tracing_enabled 715 # echo 1 > tracing_enabled
708 # ls -ltr 716 # ls -ltr
709 [...] 717 [...]
710 # echo 0 > tracing_enabled 718 # echo 0 > tracing_enabled
711 # cat latency_trace 719 # cat trace
712# tracer: preemptoff 720# tracer: preemptoff
713# 721#
714preemptoff latency trace v1.1.5 on 2.6.26-rc8 722preemptoff latency trace v1.1.5 on 2.6.26-rc8
@@ -850,12 +858,13 @@ Again, using this trace is much like the irqsoff and preemptoff
850tracers. 858tracers.
851 859
852 # echo preemptirqsoff > current_tracer 860 # echo preemptirqsoff > current_tracer
861 # echo latency-format > trace_options
853 # echo 0 > tracing_max_latency 862 # echo 0 > tracing_max_latency
854 # echo 1 > tracing_enabled 863 # echo 1 > tracing_enabled
855 # ls -ltr 864 # ls -ltr
856 [...] 865 [...]
857 # echo 0 > tracing_enabled 866 # echo 0 > tracing_enabled
858 # cat latency_trace 867 # cat trace
859# tracer: preemptirqsoff 868# tracer: preemptirqsoff
860# 869#
861preemptirqsoff latency trace v1.1.5 on 2.6.26-rc8 870preemptirqsoff latency trace v1.1.5 on 2.6.26-rc8
@@ -1012,11 +1021,12 @@ Instead of performing an 'ls', we will run 'sleep 1' under
1012'chrt' which changes the priority of the task. 1021'chrt' which changes the priority of the task.
1013 1022
1014 # echo wakeup > current_tracer 1023 # echo wakeup > current_tracer
1024 # echo latency-format > trace_options
1015 # echo 0 > tracing_max_latency 1025 # echo 0 > tracing_max_latency
1016 # echo 1 > tracing_enabled 1026 # echo 1 > tracing_enabled
1017 # chrt -f 5 sleep 1 1027 # chrt -f 5 sleep 1
1018 # echo 0 > tracing_enabled 1028 # echo 0 > tracing_enabled
1019 # cat latency_trace 1029 # cat trace
1020# tracer: wakeup 1030# tracer: wakeup
1021# 1031#
1022wakeup latency trace v1.1.5 on 2.6.26-rc8 1032wakeup latency trace v1.1.5 on 2.6.26-rc8
diff --git a/Documentation/trace/function-graph-fold.vim b/Documentation/trace/function-graph-fold.vim
new file mode 100644
index 000000000000..0544b504c8b0
--- /dev/null
+++ b/Documentation/trace/function-graph-fold.vim
@@ -0,0 +1,42 @@
1" Enable folding for ftrace function_graph traces.
2"
3" To use, :source this file while viewing a function_graph trace, or use vim's
4" -S option to load from the command-line together with a trace. You can then
5" use the usual vim fold commands, such as "za", to open and close nested
6" functions. While closed, a fold will show the total time taken for a call,
7" as would normally appear on the line with the closing brace. Folded
8" functions will not include finish_task_switch(), so folding should remain
9" relatively sane even through a context switch.
10"
11" Note that this will almost certainly only work well with a
12" single-CPU trace (e.g. trace-cmd report --cpu 1).
13
14function! FunctionGraphFoldExpr(lnum)
15 let line = getline(a:lnum)
16 if line[-1:] == '{'
17 if line =~ 'finish_task_switch() {$'
18 return '>1'
19 endif
20 return 'a1'
21 elseif line[-1:] == '}'
22 return 's1'
23 else
24 return '='
25 endif
26endfunction
27
28function! FunctionGraphFoldText()
29 let s = split(getline(v:foldstart), '|', 1)
30 if getline(v:foldend+1) =~ 'finish_task_switch() {$'
31 let s[2] = ' task switch '
32 else
33 let e = split(getline(v:foldend), '|', 1)
34 let s[2] = e[2]
35 endif
36 return join(s, '|')
37endfunction
38
39setlocal foldexpr=FunctionGraphFoldExpr(v:lnum)
40setlocal foldtext=FunctionGraphFoldText()
41setlocal foldcolumn=12
42setlocal foldmethod=expr
diff --git a/Documentation/trace/ring-buffer-design.txt b/Documentation/trace/ring-buffer-design.txt
new file mode 100644
index 000000000000..5b1d23d604c5
--- /dev/null
+++ b/Documentation/trace/ring-buffer-design.txt
@@ -0,0 +1,955 @@
1 Lockless Ring Buffer Design
2 ===========================
3
4Copyright 2009 Red Hat Inc.
5 Author: Steven Rostedt <srostedt@redhat.com>
6 License: The GNU Free Documentation License, Version 1.2
7 (dual licensed under the GPL v2)
8Reviewers: Mathieu Desnoyers, Huang Ying, Hidetoshi Seto,
9 and Frederic Weisbecker.
10
11
12Written for: 2.6.31
13
14Terminology used in this Document
15---------------------------------
16
17tail - where new writes happen in the ring buffer.
18
19head - where new reads happen in the ring buffer.
20
21producer - the task that writes into the ring buffer (same as writer)
22
23writer - same as producer
24
25consumer - the task that reads from the buffer (same as reader)
26
27reader - same as consumer.
28
29reader_page - A page outside the ring buffer used solely (for the most part)
30 by the reader.
31
32head_page - a pointer to the page that the reader will use next
33
34tail_page - a pointer to the page that will be written to next
35
36commit_page - a pointer to the page with the last finished non nested write.
37
38cmpxchg - hardware assisted atomic transaction that performs the following:
39
40 A = B iff previous A == C
41
42 R = cmpxchg(A, C, B) is saying that we replace A with B if and only if
43 current A is equal to C, and we put the old (current) A into R
44
45 R gets the previous A regardless if A is updated with B or not.
46
47 To see if the update was successful a compare of R == C may be used.
48
49The Generic Ring Buffer
50-----------------------
51
52The ring buffer can be used in either an overwrite mode or in
53producer/consumer mode.
54
55Producer/consumer mode is where the producer were to fill up the
56buffer before the consumer could free up anything, the producer
57will stop writing to the buffer. This will lose most recent events.
58
59Overwrite mode is where the produce were to fill up the buffer
60before the consumer could free up anything, the producer will
61overwrite the older data. This will lose the oldest events.
62
63No two writers can write at the same time (on the same per cpu buffer),
64but a writer may interrupt another writer, but it must finish writing
65before the previous writer may continue. This is very important to the
66algorithm. The writers act like a "stack". The way interrupts works
67enforces this behavior.
68
69
70 writer1 start
71 <preempted> writer2 start
72 <preempted> writer3 start
73 writer3 finishes
74 writer2 finishes
75 writer1 finishes
76
77This is very much like a writer being preempted by an interrupt and
78the interrupt doing a write as well.
79
80Readers can happen at any time. But no two readers may run at the
81same time, nor can a reader preempt/interrupt another reader. A reader
82can not preempt/interrupt a writer, but it may read/consume from the
83buffer at the same time as a writer is writing, but the reader must be
84on another processor to do so. A reader may read on its own processor
85and can be preempted by a writer.
86
87A writer can preempt a reader, but a reader can not preempt a writer.
88But a reader can read the buffer at the same time (on another processor)
89as a writer.
90
91The ring buffer is made up of a list of pages held together by a link list.
92
93At initialization a reader page is allocated for the reader that is not
94part of the ring buffer.
95
96The head_page, tail_page and commit_page are all initialized to point
97to the same page.
98
99The reader page is initialized to have its next pointer pointing to
100the head page, and its previous pointer pointing to a page before
101the head page.
102
103The reader has its own page to use. At start up time, this page is
104allocated but is not attached to the list. When the reader wants
105to read from the buffer, if its page is empty (like it is on start up)
106it will swap its page with the head_page. The old reader page will
107become part of the ring buffer and the head_page will be removed.
108The page after the inserted page (old reader_page) will become the
109new head page.
110
111Once the new page is given to the reader, the reader could do what
112it wants with it, as long as a writer has left that page.
113
114A sample of how the reader page is swapped: Note this does not
115show the head page in the buffer, it is for demonstrating a swap
116only.
117
118 +------+
119 |reader| RING BUFFER
120 |page |
121 +------+
122 +---+ +---+ +---+
123 | |-->| |-->| |
124 | |<--| |<--| |
125 +---+ +---+ +---+
126 ^ | ^ |
127 | +-------------+ |
128 +-----------------+
129
130
131 +------+
132 |reader| RING BUFFER
133 |page |-------------------+
134 +------+ v
135 | +---+ +---+ +---+
136 | | |-->| |-->| |
137 | | |<--| |<--| |<-+
138 | +---+ +---+ +---+ |
139 | ^ | ^ | |
140 | | +-------------+ | |
141 | +-----------------+ |
142 +------------------------------------+
143
144 +------+
145 |reader| RING BUFFER
146 |page |-------------------+
147 +------+ <---------------+ v
148 | ^ +---+ +---+ +---+
149 | | | |-->| |-->| |
150 | | | | | |<--| |<-+
151 | | +---+ +---+ +---+ |
152 | | | ^ | |
153 | | +-------------+ | |
154 | +-----------------------------+ |
155 +------------------------------------+
156
157 +------+
158 |buffer| RING BUFFER
159 |page |-------------------+
160 +------+ <---------------+ v
161 | ^ +---+ +---+ +---+
162 | | | | | |-->| |
163 | | New | | | |<--| |<-+
164 | | Reader +---+ +---+ +---+ |
165 | | page ----^ | |
166 | | | |
167 | +-----------------------------+ |
168 +------------------------------------+
169
170
171
172It is possible that the page swapped is the commit page and the tail page,
173if what is in the ring buffer is less than what is held in a buffer page.
174
175
176 reader page commit page tail page
177 | | |
178 v | |
179 +---+ | |
180 | |<----------+ |
181 | |<------------------------+
182 | |------+
183 +---+ |
184 |
185 v
186 +---+ +---+ +---+ +---+
187<---| |--->| |--->| |--->| |--->
188--->| |<---| |<---| |<---| |<---
189 +---+ +---+ +---+ +---+
190
191This case is still valid for this algorithm.
192When the writer leaves the page, it simply goes into the ring buffer
193since the reader page still points to the next location in the ring
194buffer.
195
196
197The main pointers:
198
199 reader page - The page used solely by the reader and is not part
200 of the ring buffer (may be swapped in)
201
202 head page - the next page in the ring buffer that will be swapped
203 with the reader page.
204
205 tail page - the page where the next write will take place.
206
207 commit page - the page that last finished a write.
208
209The commit page only is updated by the outer most writer in the
210writer stack. A writer that preempts another writer will not move the
211commit page.
212
213When data is written into the ring buffer, a position is reserved
214in the ring buffer and passed back to the writer. When the writer
215is finished writing data into that position, it commits the write.
216
217Another write (or a read) may take place at anytime during this
218transaction. If another write happens it must finish before continuing
219with the previous write.
220
221
222 Write reserve:
223
224 Buffer page
225 +---------+
226 |written |
227 +---------+ <--- given back to writer (current commit)
228 |reserved |
229 +---------+ <--- tail pointer
230 | empty |
231 +---------+
232
233 Write commit:
234
235 Buffer page
236 +---------+
237 |written |
238 +---------+
239 |written |
240 +---------+ <--- next positon for write (current commit)
241 | empty |
242 +---------+
243
244
245 If a write happens after the first reserve:
246
247 Buffer page
248 +---------+
249 |written |
250 +---------+ <-- current commit
251 |reserved |
252 +---------+ <--- given back to second writer
253 |reserved |
254 +---------+ <--- tail pointer
255
256 After second writer commits:
257
258
259 Buffer page
260 +---------+
261 |written |
262 +---------+ <--(last full commit)
263 |reserved |
264 +---------+
265 |pending |
266 |commit |
267 +---------+ <--- tail pointer
268
269 When the first writer commits:
270
271 Buffer page
272 +---------+
273 |written |
274 +---------+
275 |written |
276 +---------+
277 |written |
278 +---------+ <--(last full commit and tail pointer)
279
280
281The commit pointer points to the last write location that was
282committed without preempting another write. When a write that
283preempted another write is committed, it only becomes a pending commit
284and will not be a full commit till all writes have been committed.
285
286The commit page points to the page that has the last full commit.
287The tail page points to the page with the last write (before
288committing).
289
290The tail page is always equal to or after the commit page. It may
291be several pages ahead. If the tail page catches up to the commit
292page then no more writes may take place (regardless of the mode
293of the ring buffer: overwrite and produce/consumer).
294
295The order of pages are:
296
297 head page
298 commit page
299 tail page
300
301Possible scenario:
302 tail page
303 head page commit page |
304 | | |
305 v v v
306 +---+ +---+ +---+ +---+
307<---| |--->| |--->| |--->| |--->
308--->| |<---| |<---| |<---| |<---
309 +---+ +---+ +---+ +---+
310
311There is a special case that the head page is after either the commit page
312and possibly the tail page. That is when the commit (and tail) page has been
313swapped with the reader page. This is because the head page is always
314part of the ring buffer, but the reader page is not. When ever there
315has been less than a full page that has been committed inside the ring buffer,
316and a reader swaps out a page, it will be swapping out the commit page.
317
318
319 reader page commit page tail page
320 | | |
321 v | |
322 +---+ | |
323 | |<----------+ |
324 | |<------------------------+
325 | |------+
326 +---+ |
327 |
328 v
329 +---+ +---+ +---+ +---+
330<---| |--->| |--->| |--->| |--->
331--->| |<---| |<---| |<---| |<---
332 +---+ +---+ +---+ +---+
333 ^
334 |
335 head page
336
337
338In this case, the head page will not move when the tail and commit
339move back into the ring buffer.
340
341The reader can not swap a page into the ring buffer if the commit page
342is still on that page. If the read meets the last commit (real commit
343not pending or reserved), then there is nothing more to read.
344The buffer is considered empty until another full commit finishes.
345
346When the tail meets the head page, if the buffer is in overwrite mode,
347the head page will be pushed ahead one. If the buffer is in producer/consumer
348mode, the write will fail.
349
350Overwrite mode:
351
352 tail page
353 |
354 v
355 +---+ +---+ +---+ +---+
356<---| |--->| |--->| |--->| |--->
357--->| |<---| |<---| |<---| |<---
358 +---+ +---+ +---+ +---+
359 ^
360 |
361 head page
362
363
364 tail page
365 |
366 v
367 +---+ +---+ +---+ +---+
368<---| |--->| |--->| |--->| |--->
369--->| |<---| |<---| |<---| |<---
370 +---+ +---+ +---+ +---+
371 ^
372 |
373 head page
374
375
376 tail page
377 |
378 v
379 +---+ +---+ +---+ +---+
380<---| |--->| |--->| |--->| |--->
381--->| |<---| |<---| |<---| |<---
382 +---+ +---+ +---+ +---+
383 ^
384 |
385 head page
386
387Note, the reader page will still point to the previous head page.
388But when a swap takes place, it will use the most recent head page.
389
390
391Making the Ring Buffer Lockless:
392--------------------------------
393
394The main idea behind the lockless algorithm is to combine the moving
395of the head_page pointer with the swapping of pages with the reader.
396State flags are placed inside the pointer to the page. To do this,
397each page must be aligned in memory by 4 bytes. This will allow the 2
398least significant bits of the address to be used as flags. Since
399they will always be zero for the address. To get the address,
400simply mask out the flags.
401
402 MASK = ~3
403
404 address & MASK
405
406Two flags will be kept by these two bits:
407
408 HEADER - the page being pointed to is a head page
409
410 UPDATE - the page being pointed to is being updated by a writer
411 and was or is about to be a head page.
412
413
414 reader page
415 |
416 v
417 +---+
418 | |------+
419 +---+ |
420 |
421 v
422 +---+ +---+ +---+ +---+
423<---| |--->| |-H->| |--->| |--->
424--->| |<---| |<---| |<---| |<---
425 +---+ +---+ +---+ +---+
426
427
428The above pointer "-H->" would have the HEADER flag set. That is
429the next page is the next page to be swapped out by the reader.
430This pointer means the next page is the head page.
431
432When the tail page meets the head pointer, it will use cmpxchg to
433change the pointer to the UPDATE state:
434
435
436 tail page
437 |
438 v
439 +---+ +---+ +---+ +---+
440<---| |--->| |-H->| |--->| |--->
441--->| |<---| |<---| |<---| |<---
442 +---+ +---+ +---+ +---+
443
444 tail page
445 |
446 v
447 +---+ +---+ +---+ +---+
448<---| |--->| |-U->| |--->| |--->
449--->| |<---| |<---| |<---| |<---
450 +---+ +---+ +---+ +---+
451
452"-U->" represents a pointer in the UPDATE state.
453
454Any access to the reader will need to take some sort of lock to serialize
455the readers. But the writers will never take a lock to write to the
456ring buffer. This means we only need to worry about a single reader,
457and writes only preempt in "stack" formation.
458
459When the reader tries to swap the page with the ring buffer, it
460will also use cmpxchg. If the flag bit in the pointer to the
461head page does not have the HEADER flag set, the compare will fail
462and the reader will need to look for the new head page and try again.
463Note, the flag UPDATE and HEADER are never set at the same time.
464
465The reader swaps the reader page as follows:
466
467 +------+
468 |reader| RING BUFFER
469 |page |
470 +------+
471 +---+ +---+ +---+
472 | |--->| |--->| |
473 | |<---| |<---| |
474 +---+ +---+ +---+
475 ^ | ^ |
476 | +---------------+ |
477 +-----H-------------+
478
479The reader sets the reader page next pointer as HEADER to the page after
480the head page.
481
482
483 +------+
484 |reader| RING BUFFER
485 |page |-------H-----------+
486 +------+ v
487 | +---+ +---+ +---+
488 | | |--->| |--->| |
489 | | |<---| |<---| |<-+
490 | +---+ +---+ +---+ |
491 | ^ | ^ | |
492 | | +---------------+ | |
493 | +-----H-------------+ |
494 +--------------------------------------+
495
496It does a cmpxchg with the pointer to the previous head page to make it
497point to the reader page. Note that the new pointer does not have the HEADER
498flag set. This action atomically moves the head page forward.
499
500 +------+
501 |reader| RING BUFFER
502 |page |-------H-----------+
503 +------+ v
504 | ^ +---+ +---+ +---+
505 | | | |-->| |-->| |
506 | | | |<--| |<--| |<-+
507 | | +---+ +---+ +---+ |
508 | | | ^ | |
509 | | +-------------+ | |
510 | +-----------------------------+ |
511 +------------------------------------+
512
513After the new head page is set, the previous pointer of the head page is
514updated to the reader page.
515
516 +------+
517 |reader| RING BUFFER
518 |page |-------H-----------+
519 +------+ <---------------+ v
520 | ^ +---+ +---+ +---+
521 | | | |-->| |-->| |
522 | | | | | |<--| |<-+
523 | | +---+ +---+ +---+ |
524 | | | ^ | |
525 | | +-------------+ | |
526 | +-----------------------------+ |
527 +------------------------------------+
528
529 +------+
530 |buffer| RING BUFFER
531 |page |-------H-----------+ <--- New head page
532 +------+ <---------------+ v
533 | ^ +---+ +---+ +---+
534 | | | | | |-->| |
535 | | New | | | |<--| |<-+
536 | | Reader +---+ +---+ +---+ |
537 | | page ----^ | |
538 | | | |
539 | +-----------------------------+ |
540 +------------------------------------+
541
542Another important point. The page that the reader page points back to
543by its previous pointer (the one that now points to the new head page)
544never points back to the reader page. That is because the reader page is
545not part of the ring buffer. Traversing the ring buffer via the next pointers
546will always stay in the ring buffer. Traversing the ring buffer via the
547prev pointers may not.
548
549Note, the way to determine a reader page is simply by examining the previous
550pointer of the page. If the next pointer of the previous page does not
551point back to the original page, then the original page is a reader page:
552
553
554 +--------+
555 | reader | next +----+
556 | page |-------->| |<====== (buffer page)
557 +--------+ +----+
558 | | ^
559 | v | next
560 prev | +----+
561 +------------->| |
562 +----+
563
564The way the head page moves forward:
565
566When the tail page meets the head page and the buffer is in overwrite mode
567and more writes take place, the head page must be moved forward before the
568writer may move the tail page. The way this is done is that the writer
569performs a cmpxchg to convert the pointer to the head page from the HEADER
570flag to have the UPDATE flag set. Once this is done, the reader will
571not be able to swap the head page from the buffer, nor will it be able to
572move the head page, until the writer is finished with the move.
573
574This eliminates any races that the reader can have on the writer. The reader
575must spin, and this is why the reader can not preempt the writer.
576
577 tail page
578 |
579 v
580 +---+ +---+ +---+ +---+
581<---| |--->| |-H->| |--->| |--->
582--->| |<---| |<---| |<---| |<---
583 +---+ +---+ +---+ +---+
584
585 tail page
586 |
587 v
588 +---+ +---+ +---+ +---+
589<---| |--->| |-U->| |--->| |--->
590--->| |<---| |<---| |<---| |<---
591 +---+ +---+ +---+ +---+
592
593The following page will be made into the new head page.
594
595 tail page
596 |
597 v
598 +---+ +---+ +---+ +---+
599<---| |--->| |-U->| |-H->| |--->
600--->| |<---| |<---| |<---| |<---
601 +---+ +---+ +---+ +---+
602
603After the new head page has been set, we can set the old head page
604pointer back to NORMAL.
605
606 tail page
607 |
608 v
609 +---+ +---+ +---+ +---+
610<---| |--->| |--->| |-H->| |--->
611--->| |<---| |<---| |<---| |<---
612 +---+ +---+ +---+ +---+
613
614After the head page has been moved, the tail page may now move forward.
615
616 tail page
617 |
618 v
619 +---+ +---+ +---+ +---+
620<---| |--->| |--->| |-H->| |--->
621--->| |<---| |<---| |<---| |<---
622 +---+ +---+ +---+ +---+
623
624
625The above are the trivial updates. Now for the more complex scenarios.
626
627
628As stated before, if enough writes preempt the first write, the
629tail page may make it all the way around the buffer and meet the commit
630page. At this time, we must start dropping writes (usually with some kind
631of warning to the user). But what happens if the commit was still on the
632reader page? The commit page is not part of the ring buffer. The tail page
633must account for this.
634
635
636 reader page commit page
637 | |
638 v |
639 +---+ |
640 | |<----------+
641 | |
642 | |------+
643 +---+ |
644 |
645 v
646 +---+ +---+ +---+ +---+
647<---| |--->| |-H->| |--->| |--->
648--->| |<---| |<---| |<---| |<---
649 +---+ +---+ +---+ +---+
650 ^
651 |
652 tail page
653
654If the tail page were to simply push the head page forward, the commit when
655leaving the reader page would not be pointing to the correct page.
656
657The solution to this is to test if the commit page is on the reader page
658before pushing the head page. If it is, then it can be assumed that the
659tail page wrapped the buffer, and we must drop new writes.
660
661This is not a race condition, because the commit page can only be moved
662by the outter most writer (the writer that was preempted).
663This means that the commit will not move while a writer is moving the
664tail page. The reader can not swap the reader page if it is also being
665used as the commit page. The reader can simply check that the commit
666is off the reader page. Once the commit page leaves the reader page
667it will never go back on it unless a reader does another swap with the
668buffer page that is also the commit page.
669
670
671Nested writes
672-------------
673
674In the pushing forward of the tail page we must first push forward
675the head page if the head page is the next page. If the head page
676is not the next page, the tail page is simply updated with a cmpxchg.
677
678Only writers move the tail page. This must be done atomically to protect
679against nested writers.
680
681 temp_page = tail_page
682 next_page = temp_page->next
683 cmpxchg(tail_page, temp_page, next_page)
684
685The above will update the tail page if it is still pointing to the expected
686page. If this fails, a nested write pushed it forward, the the current write
687does not need to push it.
688
689
690 temp page
691 |
692 v
693 tail page
694 |
695 v
696 +---+ +---+ +---+ +---+
697<---| |--->| |--->| |--->| |--->
698--->| |<---| |<---| |<---| |<---
699 +---+ +---+ +---+ +---+
700
701Nested write comes in and moves the tail page forward:
702
703 tail page (moved by nested writer)
704 temp page |
705 | |
706 v v
707 +---+ +---+ +---+ +---+
708<---| |--->| |--->| |--->| |--->
709--->| |<---| |<---| |<---| |<---
710 +---+ +---+ +---+ +---+
711
712The above would fail the cmpxchg, but since the tail page has already
713been moved forward, the writer will just try again to reserve storage
714on the new tail page.
715
716But the moving of the head page is a bit more complex.
717
718 tail page
719 |
720 v
721 +---+ +---+ +---+ +---+
722<---| |--->| |-H->| |--->| |--->
723--->| |<---| |<---| |<---| |<---
724 +---+ +---+ +---+ +---+
725
726The write converts the head page pointer to UPDATE.
727
728 tail page
729 |
730 v
731 +---+ +---+ +---+ +---+
732<---| |--->| |-U->| |--->| |--->
733--->| |<---| |<---| |<---| |<---
734 +---+ +---+ +---+ +---+
735
736But if a nested writer preempts here. It will see that the next
737page is a head page, but it is also nested. It will detect that
738it is nested and will save that information. The detection is the
739fact that it sees the UPDATE flag instead of a HEADER or NORMAL
740pointer.
741
742The nested writer will set the new head page pointer.
743
744 tail page
745 |
746 v
747 +---+ +---+ +---+ +---+
748<---| |--->| |-U->| |-H->| |--->
749--->| |<---| |<---| |<---| |<---
750 +---+ +---+ +---+ +---+
751
752But it will not reset the update back to normal. Only the writer
753that converted a pointer from HEAD to UPDATE will convert it back
754to NORMAL.
755
756 tail page
757 |
758 v
759 +---+ +---+ +---+ +---+
760<---| |--->| |-U->| |-H->| |--->
761--->| |<---| |<---| |<---| |<---
762 +---+ +---+ +---+ +---+
763
764After the nested writer finishes, the outer most writer will convert
765the UPDATE pointer to NORMAL.
766
767
768 tail page
769 |
770 v
771 +---+ +---+ +---+ +---+
772<---| |--->| |--->| |-H->| |--->
773--->| |<---| |<---| |<---| |<---
774 +---+ +---+ +---+ +---+
775
776
777It can be even more complex if several nested writes came in and moved
778the tail page ahead several pages:
779
780
781(first writer)
782
783 tail page
784 |
785 v
786 +---+ +---+ +---+ +---+
787<---| |--->| |-H->| |--->| |--->
788--->| |<---| |<---| |<---| |<---
789 +---+ +---+ +---+ +---+
790
791The write converts the head page pointer to UPDATE.
792
793 tail page
794 |
795 v
796 +---+ +---+ +---+ +---+
797<---| |--->| |-U->| |--->| |--->
798--->| |<---| |<---| |<---| |<---
799 +---+ +---+ +---+ +---+
800
801Next writer comes in, and sees the update and sets up the new
802head page.
803
804(second writer)
805
806 tail page
807 |
808 v
809 +---+ +---+ +---+ +---+
810<---| |--->| |-U->| |-H->| |--->
811--->| |<---| |<---| |<---| |<---
812 +---+ +---+ +---+ +---+
813
814The nested writer moves the tail page forward. But does not set the old
815update page to NORMAL because it is not the outer most writer.
816
817 tail page
818 |
819 v
820 +---+ +---+ +---+ +---+
821<---| |--->| |-U->| |-H->| |--->
822--->| |<---| |<---| |<---| |<---
823 +---+ +---+ +---+ +---+
824
825Another writer preempts and sees the page after the tail page is a head page.
826It changes it from HEAD to UPDATE.
827
828(third writer)
829
830 tail page
831 |
832 v
833 +---+ +---+ +---+ +---+
834<---| |--->| |-U->| |-U->| |--->
835--->| |<---| |<---| |<---| |<---
836 +---+ +---+ +---+ +---+
837
838The writer will move the head page forward:
839
840
841(third writer)
842
843 tail page
844 |
845 v
846 +---+ +---+ +---+ +---+
847<---| |--->| |-U->| |-U->| |-H->
848--->| |<---| |<---| |<---| |<---
849 +---+ +---+ +---+ +---+
850
851But now that the third writer did change the HEAD flag to UPDATE it
852will convert it to normal:
853
854
855(third writer)
856
857 tail page
858 |
859 v
860 +---+ +---+ +---+ +---+
861<---| |--->| |-U->| |--->| |-H->
862--->| |<---| |<---| |<---| |<---
863 +---+ +---+ +---+ +---+
864
865
866Then it will move the tail page, and return back to the second writer.
867
868
869(second writer)
870
871 tail page
872 |
873 v
874 +---+ +---+ +---+ +---+
875<---| |--->| |-U->| |--->| |-H->
876--->| |<---| |<---| |<---| |<---
877 +---+ +---+ +---+ +---+
878
879
880The second writer will fail to move the tail page because it was already
881moved, so it will try again and add its data to the new tail page.
882It will return to the first writer.
883
884
885(first writer)
886
887 tail page
888 |
889 v
890 +---+ +---+ +---+ +---+
891<---| |--->| |-U->| |--->| |-H->
892--->| |<---| |<---| |<---| |<---
893 +---+ +---+ +---+ +---+
894
895The first writer can not know atomically test if the tail page moved
896while it updates the HEAD page. It will then update the head page to
897what it thinks is the new head page.
898
899
900(first writer)
901
902 tail page
903 |
904 v
905 +---+ +---+ +---+ +---+
906<---| |--->| |-U->| |-H->| |-H->
907--->| |<---| |<---| |<---| |<---
908 +---+ +---+ +---+ +---+
909
910Since the cmpxchg returns the old value of the pointer the first writer
911will see it succeeded in updating the pointer from NORMAL to HEAD.
912But as we can see, this is not good enough. It must also check to see
913if the tail page is either where it use to be or on the next page:
914
915
916(first writer)
917
918 A B tail page
919 | | |
920 v v v
921 +---+ +---+ +---+ +---+
922<---| |--->| |-U->| |-H->| |-H->
923--->| |<---| |<---| |<---| |<---
924 +---+ +---+ +---+ +---+
925
926If tail page != A and tail page does not equal B, then it must reset the
927pointer back to NORMAL. The fact that it only needs to worry about
928nested writers, it only needs to check this after setting the HEAD page.
929
930
931(first writer)
932
933 A B tail page
934 | | |
935 v v v
936 +---+ +---+ +---+ +---+
937<---| |--->| |-U->| |--->| |-H->
938--->| |<---| |<---| |<---| |<---
939 +---+ +---+ +---+ +---+
940
941Now the writer can update the head page. This is also why the head page must
942remain in UPDATE and only reset by the outer most writer. This prevents
943the reader from seeing the incorrect head page.
944
945
946(first writer)
947
948 A B tail page
949 | | |
950 v v v
951 +---+ +---+ +---+ +---+
952<---| |--->| |--->| |--->| |-H->
953--->| |<---| |<---| |<---| |<---
954 +---+ +---+ +---+ +---+
955
diff --git a/Documentation/vgaarbiter.txt b/Documentation/vgaarbiter.txt
new file mode 100644
index 000000000000..987f9b0a5ece
--- /dev/null
+++ b/Documentation/vgaarbiter.txt
@@ -0,0 +1,194 @@
1
2VGA Arbiter
3===========
4
5Graphic devices are accessed through ranges in I/O or memory space. While most
6modern devices allow relocation of such ranges, some "Legacy" VGA devices
7implemented on PCI will typically have the same "hard-decoded" addresses as
8they did on ISA. For more details see "PCI Bus Binding to IEEE Std 1275-1994
9Standard for Boot (Initialization Configuration) Firmware Revision 2.1"
10Section 7, Legacy Devices.
11
12The Resource Access Control (RAC) module inside the X server [0] existed for
13the legacy VGA arbitration task (besides other bus management tasks) when more
14than one legacy device co-exists on the same machine. But the problem happens
15when these devices are trying to be accessed by different userspace clients
16(e.g. two server in parallel). Their address assignments conflict. Moreover,
17ideally, being an userspace application, it is not the role of the the X
18server to control bus resources. Therefore an arbitration scheme outside of
19the X server is needed to control the sharing of these resources. This
20document introduces the operation of the VGA arbiter implemented for Linux
21kernel.
22
23----------------------------------------------------------------------------
24
25I. Details and Theory of Operation
26 I.1 vgaarb
27 I.2 libpciaccess
28 I.3 xf86VGAArbiter (X server implementation)
29II. Credits
30III.References
31
32
33I. Details and Theory of Operation
34==================================
35
36I.1 vgaarb
37----------
38
39The vgaarb is a module of the Linux Kernel. When it is initially loaded, it
40scans all PCI devices and adds the VGA ones inside the arbitration. The
41arbiter then enables/disables the decoding on different devices of the VGA
42legacy instructions. Device which do not want/need to use the arbiter may
43explicitly tell it by calling vga_set_legacy_decoding().
44
45The kernel exports a char device interface (/dev/vga_arbiter) to the clients,
46which has the following semantics:
47
48 open : open user instance of the arbiter. By default, it's attached to
49 the default VGA device of the system.
50
51 close : close user instance. Release locks made by the user
52
53 read : return a string indicating the status of the target like:
54
55 "<card_ID>,decodes=<io_state>,owns=<io_state>,locks=<io_state> (ic,mc)"
56
57 An IO state string is of the form {io,mem,io+mem,none}, mc and
58 ic are respectively mem and io lock counts (for debugging/
59 diagnostic only). "decodes" indicate what the card currently
60 decodes, "owns" indicates what is currently enabled on it, and
61 "locks" indicates what is locked by this card. If the card is
62 unplugged, we get "invalid" then for card_ID and an -ENODEV
63 error is returned for any command until a new card is targeted.
64
65
66 write : write a command to the arbiter. List of commands:
67
68 target <card_ID> : switch target to card <card_ID> (see below)
69 lock <io_state> : acquires locks on target ("none" is an invalid io_state)
70 trylock <io_state> : non-blocking acquire locks on target (returns EBUSY if
71 unsuccessful)
72 unlock <io_state> : release locks on target
73 unlock all : release all locks on target held by this user (not
74 implemented yet)
75 decodes <io_state> : set the legacy decoding attributes for the card
76
77 poll : event if something changes on any card (not just the
78 target)
79
80 card_ID is of the form "PCI:domain:bus:dev.fn". It can be set to "default"
81 to go back to the system default card (TODO: not implemented yet). Currently,
82 only PCI is supported as a prefix, but the userland API may support other bus
83 types in the future, even if the current kernel implementation doesn't.
84
85Note about locks:
86
87The driver keeps track of which user has which locks on which card. It
88supports stacking, like the kernel one. This complexifies the implementation
89a bit, but makes the arbiter more tolerant to user space problems and able
90to properly cleanup in all cases when a process dies.
91Currently, a max of 16 cards can have locks simultaneously issued from
92user space for a given user (file descriptor instance) of the arbiter.
93
94In the case of devices hot-{un,}plugged, there is a hook - pci_notify() - to
95notify them being added/removed in the system and automatically added/removed
96in the arbiter.
97
98There's also a in-kernel API of the arbiter in the case of DRM, vgacon and
99others which may use the arbiter.
100
101
102I.2 libpciaccess
103----------------
104
105To use the vga arbiter char device it was implemented an API inside the
106libpciaccess library. One fieldd was added to struct pci_device (each device
107on the system):
108
109 /* the type of resource decoded by the device */
110 int vgaarb_rsrc;
111
112Besides it, in pci_system were added:
113
114 int vgaarb_fd;
115 int vga_count;
116 struct pci_device *vga_target;
117 struct pci_device *vga_default_dev;
118
119
120The vga_count is usually need to keep informed how many cards are being
121arbitrated, so for instance if there's only one then it can totally escape the
122scheme.
123
124
125These functions below acquire VGA resources for the given card and mark those
126resources as locked. If the resources requested are "normal" (and not legacy)
127resources, the arbiter will first check whether the card is doing legacy
128decoding for that type of resource. If yes, the lock is "converted" into a
129legacy resource lock. The arbiter will first look for all VGA cards that
130might conflict and disable their IOs and/or Memory access, including VGA
131forwarding on P2P bridges if necessary, so that the requested resources can
132be used. Then, the card is marked as locking these resources and the IO and/or
133Memory access is enabled on the card (including VGA forwarding on parent
134P2P bridges if any). In the case of vga_arb_lock(), the function will block
135if some conflicting card is already locking one of the required resources (or
136any resource on a different bus segment, since P2P bridges don't differentiate
137VGA memory and IO afaik). If the card already owns the resources, the function
138succeeds. vga_arb_trylock() will return (-EBUSY) instead of blocking. Nested
139calls are supported (a per-resource counter is maintained).
140
141
142Set the target device of this client.
143 int pci_device_vgaarb_set_target (struct pci_device *dev);
144
145
146For instance, in x86 if two devices on the same bus want to lock different
147resources, both will succeed (lock). If devices are in different buses and
148trying to lock different resources, only the first who tried succeeds.
149 int pci_device_vgaarb_lock (void);
150 int pci_device_vgaarb_trylock (void);
151
152Unlock resources of device.
153 int pci_device_vgaarb_unlock (void);
154
155Indicates to the arbiter if the card decodes legacy VGA IOs, legacy VGA
156Memory, both, or none. All cards default to both, the card driver (fbdev for
157example) should tell the arbiter if it has disabled legacy decoding, so the
158card can be left out of the arbitration process (and can be safe to take
159interrupts at any time.
160 int pci_device_vgaarb_decodes (int new_vgaarb_rsrc);
161
162Connects to the arbiter device, allocates the struct
163 int pci_device_vgaarb_init (void);
164
165Close the connection
166 void pci_device_vgaarb_fini (void);
167
168
169I.3 xf86VGAArbiter (X server implementation)
170--------------------------------------------
171
172(TODO)
173
174X server basically wraps all the functions that touch VGA registers somehow.
175
176
177II. Credits
178===========
179
180Benjamin Herrenschmidt (IBM?) started this work when he discussed such design
181with the Xorg community in 2005 [1, 2]. In the end of 2007, Paulo Zanoni and
182Tiago Vignatti (both of C3SL/Federal University of Paraná) proceeded his work
183enhancing the kernel code to adapt as a kernel module and also did the
184implementation of the user space side [3]. Now (2009) Tiago Vignatti and Dave
185Airlie finally put this work in shape and queued to Jesse Barnes' PCI tree.
186
187
188III. References
189==============
190
191[0] http://cgit.freedesktop.org/xorg/xserver/commit/?id=4b42448a2388d40f257774fbffdccaea87bd0347
192[1] http://lists.freedesktop.org/archives/xorg/2005-March/006663.html
193[2] http://lists.freedesktop.org/archives/xorg/2005-March/006745.html
194[3] http://lists.freedesktop.org/archives/xorg/2007-October/029507.html
diff --git a/Documentation/video4linux/CARDLIST.cx23885 b/Documentation/video4linux/CARDLIST.cx23885
index 450b8f8c389b..525edb37c758 100644
--- a/Documentation/video4linux/CARDLIST.cx23885
+++ b/Documentation/video4linux/CARDLIST.cx23885
@@ -21,3 +21,5 @@
21 20 -> Hauppauge WinTV-HVR1255 [0070:2251] 21 20 -> Hauppauge WinTV-HVR1255 [0070:2251]
22 21 -> Hauppauge WinTV-HVR1210 [0070:2291,0070:2295] 22 21 -> Hauppauge WinTV-HVR1210 [0070:2291,0070:2295]
23 22 -> Mygica X8506 DMB-TH [14f1:8651] 23 22 -> Mygica X8506 DMB-TH [14f1:8651]
24 23 -> Magic-Pro ProHDTV Extreme 2 [14f1:8657]
25 24 -> Hauppauge WinTV-HVR1850 [0070:8541]
diff --git a/Documentation/video4linux/CARDLIST.cx88 b/Documentation/video4linux/CARDLIST.cx88
index 0736518b2f88..3385f8b094a5 100644
--- a/Documentation/video4linux/CARDLIST.cx88
+++ b/Documentation/video4linux/CARDLIST.cx88
@@ -80,3 +80,4 @@
80 79 -> Terratec Cinergy HT PCI MKII [153b:1177] 80 79 -> Terratec Cinergy HT PCI MKII [153b:1177]
81 80 -> Hauppauge WinTV-IR Only [0070:9290] 81 80 -> Hauppauge WinTV-IR Only [0070:9290]
82 81 -> Leadtek WinFast DTV1800 Hybrid [107d:6654] 82 81 -> Leadtek WinFast DTV1800 Hybrid [107d:6654]
83 82 -> WinFast DTV2000 H rev. J [107d:6f2b]
diff --git a/Documentation/video4linux/CARDLIST.em28xx b/Documentation/video4linux/CARDLIST.em28xx
index e352d754875c..b13fcbd5d94b 100644
--- a/Documentation/video4linux/CARDLIST.em28xx
+++ b/Documentation/video4linux/CARDLIST.em28xx
@@ -7,7 +7,7 @@
7 6 -> Terratec Cinergy 200 USB (em2800) 7 6 -> Terratec Cinergy 200 USB (em2800)
8 7 -> Leadtek Winfast USB II (em2800) [0413:6023] 8 7 -> Leadtek Winfast USB II (em2800) [0413:6023]
9 8 -> Kworld USB2800 (em2800) 9 8 -> Kworld USB2800 (em2800)
10 9 -> Pinnacle Dazzle DVC 90/100/101/107 / Kaiser Baas Video to DVD maker (em2820/em2840) [1b80:e302,2304:0207,2304:021a] 10 9 -> Pinnacle Dazzle DVC 90/100/101/107 / Kaiser Baas Video to DVD maker (em2820/em2840) [1b80:e302,1b80:e304,2304:0207,2304:021a]
11 10 -> Hauppauge WinTV HVR 900 (em2880) [2040:6500] 11 10 -> Hauppauge WinTV HVR 900 (em2880) [2040:6500]
12 11 -> Terratec Hybrid XS (em2880) [0ccd:0042] 12 11 -> Terratec Hybrid XS (em2880) [0ccd:0042]
13 12 -> Kworld PVR TV 2800 RF (em2820/em2840) 13 12 -> Kworld PVR TV 2800 RF (em2820/em2840)
@@ -33,7 +33,7 @@
33 34 -> Terratec Cinergy A Hybrid XS (em2860) [0ccd:004f] 33 34 -> Terratec Cinergy A Hybrid XS (em2860) [0ccd:004f]
34 35 -> Typhoon DVD Maker (em2860) 34 35 -> Typhoon DVD Maker (em2860)
35 36 -> NetGMBH Cam (em2860) 35 36 -> NetGMBH Cam (em2860)
36 37 -> Gadmei UTV330 (em2860) 36 37 -> Gadmei UTV330 (em2860) [eb1a:50a6]
37 38 -> Yakumo MovieMixer (em2861) 37 38 -> Yakumo MovieMixer (em2861)
38 39 -> KWorld PVRTV 300U (em2861) [eb1a:e300] 38 39 -> KWorld PVRTV 300U (em2861) [eb1a:e300]
39 40 -> Plextor ConvertX PX-TV100U (em2861) [093b:a005] 39 40 -> Plextor ConvertX PX-TV100U (em2861) [093b:a005]
@@ -67,3 +67,4 @@
67 69 -> KWorld ATSC 315U HDTV TV Box (em2882) [eb1a:a313] 67 69 -> KWorld ATSC 315U HDTV TV Box (em2882) [eb1a:a313]
68 70 -> Evga inDtube (em2882) 68 70 -> Evga inDtube (em2882)
69 71 -> Silvercrest Webcam 1.3mpix (em2820/em2840) 69 71 -> Silvercrest Webcam 1.3mpix (em2820/em2840)
70 72 -> Gadmei UTV330+ (em2861)
diff --git a/Documentation/video4linux/CARDLIST.saa7134 b/Documentation/video4linux/CARDLIST.saa7134
index c913e5614195..0ac4d2544778 100644
--- a/Documentation/video4linux/CARDLIST.saa7134
+++ b/Documentation/video4linux/CARDLIST.saa7134
@@ -167,3 +167,7 @@
167166 -> Beholder BeholdTV 607 RDS [5ace:6073] 167166 -> Beholder BeholdTV 607 RDS [5ace:6073]
168167 -> Beholder BeholdTV 609 RDS [5ace:6092] 168167 -> Beholder BeholdTV 609 RDS [5ace:6092]
169168 -> Beholder BeholdTV 609 RDS [5ace:6093] 169168 -> Beholder BeholdTV 609 RDS [5ace:6093]
170169 -> Compro VideoMate S350/S300 [185b:c900]
171170 -> AverMedia AverTV Studio 505 [1461:a115]
172171 -> Beholder BeholdTV X7 [5ace:7595]
173172 -> RoverMedia TV Link Pro FM [19d1:0138]
diff --git a/Documentation/video4linux/CARDLIST.tuner b/Documentation/video4linux/CARDLIST.tuner
index be67844074dd..ba9fa679e2d3 100644
--- a/Documentation/video4linux/CARDLIST.tuner
+++ b/Documentation/video4linux/CARDLIST.tuner
@@ -78,3 +78,4 @@ tuner=77 - TCL tuner MF02GIP-5N-E
78tuner=78 - Philips FMD1216MEX MK3 Hybrid Tuner 78tuner=78 - Philips FMD1216MEX MK3 Hybrid Tuner
79tuner=79 - Philips PAL/SECAM multi (FM1216 MK5) 79tuner=79 - Philips PAL/SECAM multi (FM1216 MK5)
80tuner=80 - Philips FQ1216LME MK3 PAL/SECAM w/active loopthrough 80tuner=80 - Philips FQ1216LME MK3 PAL/SECAM w/active loopthrough
81tuner=81 - Partsnic (Daewoo) PTI-5NF05
diff --git a/Documentation/video4linux/CQcam.txt b/Documentation/video4linux/CQcam.txt
index 04986efb731c..d230878e473e 100644
--- a/Documentation/video4linux/CQcam.txt
+++ b/Documentation/video4linux/CQcam.txt
@@ -18,8 +18,8 @@ Table of Contents
18 18
191.0 Introduction 191.0 Introduction
20 20
21 The file ../drivers/char/c-qcam.c is a device driver for the 21 The file ../../drivers/media/video/c-qcam.c is a device driver for
22Logitech (nee Connectix) parallel port interface color CCD camera. 22the Logitech (nee Connectix) parallel port interface color CCD camera.
23This is a fairly inexpensive device for capturing images. Logitech 23This is a fairly inexpensive device for capturing images. Logitech
24does not currently provide information for developers, but many people 24does not currently provide information for developers, but many people
25have engineered several solutions for non-Microsoft use of the Color 25have engineered several solutions for non-Microsoft use of the Color
diff --git a/Documentation/video4linux/gspca.txt b/Documentation/video4linux/gspca.txt
index 573f95b58807..4686e84dd800 100644
--- a/Documentation/video4linux/gspca.txt
+++ b/Documentation/video4linux/gspca.txt
@@ -140,6 +140,7 @@ spca500 04fc:7333 PalmPixDC85
140sunplus 04fc:ffff Pure DigitalDakota 140sunplus 04fc:ffff Pure DigitalDakota
141spca501 0506:00df 3Com HomeConnect Lite 141spca501 0506:00df 3Com HomeConnect Lite
142sunplus 052b:1513 Megapix V4 142sunplus 052b:1513 Megapix V4
143sunplus 052b:1803 MegaImage VI
143tv8532 0545:808b Veo Stingray 144tv8532 0545:808b Veo Stingray
144tv8532 0545:8333 Veo Stingray 145tv8532 0545:8333 Veo Stingray
145sunplus 0546:3155 Polaroid PDC3070 146sunplus 0546:3155 Polaroid PDC3070
@@ -182,6 +183,7 @@ ov534 06f8:3002 Hercules Blog Webcam
182ov534 06f8:3003 Hercules Dualpix HD Weblog 183ov534 06f8:3003 Hercules Dualpix HD Weblog
183sonixj 06f8:3004 Hercules Classic Silver 184sonixj 06f8:3004 Hercules Classic Silver
184sonixj 06f8:3008 Hercules Deluxe Optical Glass 185sonixj 06f8:3008 Hercules Deluxe Optical Glass
186pac7311 06f8:3009 Hercules Classic Link
185spca508 0733:0110 ViewQuest VQ110 187spca508 0733:0110 ViewQuest VQ110
186spca508 0130:0130 Clone Digital Webcam 11043 188spca508 0130:0130 Clone Digital Webcam 11043
187spca501 0733:0401 Intel Create and Share 189spca501 0733:0401 Intel Create and Share
@@ -235,8 +237,10 @@ pac7311 093a:2621 PAC731x
235pac7311 093a:2622 Genius Eye 312 237pac7311 093a:2622 Genius Eye 312
236pac7311 093a:2624 PAC7302 238pac7311 093a:2624 PAC7302
237pac7311 093a:2626 Labtec 2200 239pac7311 093a:2626 Labtec 2200
240pac7311 093a:2629 Genious iSlim 300
238pac7311 093a:262a Webcam 300k 241pac7311 093a:262a Webcam 300k
239pac7311 093a:262c Philips SPC 230 NC 242pac7311 093a:262c Philips SPC 230 NC
243jeilinj 0979:0280 Sakar 57379
240zc3xx 0ac8:0302 Z-star Vimicro zc0302 244zc3xx 0ac8:0302 Z-star Vimicro zc0302
241vc032x 0ac8:0321 Vimicro generic vc0321 245vc032x 0ac8:0321 Vimicro generic vc0321
242vc032x 0ac8:0323 Vimicro Vc0323 246vc032x 0ac8:0323 Vimicro Vc0323
@@ -247,6 +251,7 @@ zc3xx 0ac8:305b Z-star Vimicro zc0305b
247zc3xx 0ac8:307b Ldlc VC302+Ov7620 251zc3xx 0ac8:307b Ldlc VC302+Ov7620
248vc032x 0ac8:c001 Sony embedded vimicro 252vc032x 0ac8:c001 Sony embedded vimicro
249vc032x 0ac8:c002 Sony embedded vimicro 253vc032x 0ac8:c002 Sony embedded vimicro
254vc032x 0ac8:c301 Samsung Q1 Ultra Premium
250spca508 0af9:0010 Hama USB Sightcam 100 255spca508 0af9:0010 Hama USB Sightcam 100
251spca508 0af9:0011 Hama USB Sightcam 100 256spca508 0af9:0011 Hama USB Sightcam 100
252sonixb 0c45:6001 Genius VideoCAM NB 257sonixb 0c45:6001 Genius VideoCAM NB
@@ -284,6 +289,7 @@ sonixj 0c45:613a Microdia Sonix PC Camera
284sonixj 0c45:613b Surfer SN-206 289sonixj 0c45:613b Surfer SN-206
285sonixj 0c45:613c Sonix Pccam168 290sonixj 0c45:613c Sonix Pccam168
286sonixj 0c45:6143 Sonix Pccam168 291sonixj 0c45:6143 Sonix Pccam168
292sonixj 0c45:6148 Digitus DA-70811/ZSMC USB PC Camera ZS211/Microdia
287sn9c20x 0c45:6240 PC Camera (SN9C201 + MT9M001) 293sn9c20x 0c45:6240 PC Camera (SN9C201 + MT9M001)
288sn9c20x 0c45:6242 PC Camera (SN9C201 + MT9M111) 294sn9c20x 0c45:6242 PC Camera (SN9C201 + MT9M111)
289sn9c20x 0c45:6248 PC Camera (SN9C201 + OV9655) 295sn9c20x 0c45:6248 PC Camera (SN9C201 + OV9655)
diff --git a/Documentation/video4linux/si4713.txt b/Documentation/video4linux/si4713.txt
new file mode 100644
index 000000000000..25abdb78209d
--- /dev/null
+++ b/Documentation/video4linux/si4713.txt
@@ -0,0 +1,176 @@
1Driver for I2C radios for the Silicon Labs Si4713 FM Radio Transmitters
2
3Copyright (c) 2009 Nokia Corporation
4Contact: Eduardo Valentin <eduardo.valentin@nokia.com>
5
6
7Information about the Device
8============================
9This chip is a Silicon Labs product. It is a I2C device, currently on 0x63 address.
10Basically, it has transmission and signal noise level measurement features.
11
12The Si4713 integrates transmit functions for FM broadcast stereo transmission.
13The chip also allows integrated receive power scanning to identify low signal
14power FM channels.
15
16The chip is programmed using commands and responses. There are also several
17properties which can change the behavior of this chip.
18
19Users must comply with local regulations on radio frequency (RF) transmission.
20
21Device driver description
22=========================
23There are two modules to handle this device. One is a I2C device driver
24and the other is a platform driver.
25
26The I2C device driver exports a v4l2-subdev interface to the kernel.
27All properties can also be accessed by v4l2 extended controls interface, by
28using the v4l2-subdev calls (g_ext_ctrls, s_ext_ctrls).
29
30The platform device driver exports a v4l2 radio device interface to user land.
31So, it uses the I2C device driver as a sub device in order to send the user
32commands to the actual device. Basically it is a wrapper to the I2C device driver.
33
34Applications can use v4l2 radio API to specify frequency of operation, mute state,
35etc. But mostly of its properties will be present in the extended controls.
36
37When the v4l2 mute property is set to 1 (true), the driver will turn the chip off.
38
39Properties description
40======================
41
42The properties can be accessed using v4l2 extended controls.
43Here is an output from v4l2-ctl util:
44/ # v4l2-ctl -d /dev/radio0 --all -L
45Driver Info:
46 Driver name : radio-si4713
47 Card type : Silicon Labs Si4713 Modulator
48 Bus info :
49 Driver version: 0
50 Capabilities : 0x00080800
51 RDS Output
52 Modulator
53Audio output: 0 (FM Modulator Audio Out)
54Frequency: 1408000 (88.000000 MHz)
55Video Standard = 0x00000000
56Modulator:
57 Name : FM Modulator
58 Capabilities : 62.5 Hz stereo rds
59 Frequency range : 76.0 MHz - 108.0 MHz
60 Subchannel modulation: stereo+rds
61
62User Controls
63
64 mute (bool) : default=1 value=0
65
66FM Radio Modulator Controls
67
68 rds_signal_deviation (int) : min=0 max=90000 step=10 default=200 value=200 flags=slider
69 rds_program_id (int) : min=0 max=65535 step=1 default=0 value=0
70 rds_program_type (int) : min=0 max=31 step=1 default=0 value=0
71 rds_ps_name (str) : min=0 max=96 step=8 value='si4713 '
72 rds_radio_text (str) : min=0 max=384 step=32 value=''
73 audio_limiter_feature_enabled (bool) : default=1 value=1
74 audio_limiter_release_time (int) : min=250 max=102390 step=50 default=5010 value=5010 flags=slider
75 audio_limiter_deviation (int) : min=0 max=90000 step=10 default=66250 value=66250 flags=slider
76audio_compression_feature_enabl (bool) : default=1 value=1
77 audio_compression_gain (int) : min=0 max=20 step=1 default=15 value=15 flags=slider
78 audio_compression_threshold (int) : min=-40 max=0 step=1 default=-40 value=-40 flags=slider
79 audio_compression_attack_time (int) : min=0 max=5000 step=500 default=0 value=0 flags=slider
80 audio_compression_release_time (int) : min=100000 max=1000000 step=100000 default=1000000 value=1000000 flags=slider
81 pilot_tone_feature_enabled (bool) : default=1 value=1
82 pilot_tone_deviation (int) : min=0 max=90000 step=10 default=6750 value=6750 flags=slider
83 pilot_tone_frequency (int) : min=0 max=19000 step=1 default=19000 value=19000 flags=slider
84 pre_emphasis_settings (menu) : min=0 max=2 default=1 value=1
85 tune_power_level (int) : min=0 max=120 step=1 default=88 value=88 flags=slider
86 tune_antenna_capacitor (int) : min=0 max=191 step=1 default=0 value=110 flags=slider
87/ #
88
89Here is a summary of them:
90
91* Pilot is an audible tone sent by the device.
92
93pilot_frequency - Configures the frequency of the stereo pilot tone.
94pilot_deviation - Configures pilot tone frequency deviation level.
95pilot_enabled - Enables or disables the pilot tone feature.
96
97* The si4713 device is capable of applying audio compression to the transmitted signal.
98
99acomp_enabled - Enables or disables the audio dynamic range control feature.
100acomp_gain - Sets the gain for audio dynamic range control.
101acomp_threshold - Sets the threshold level for audio dynamic range control.
102acomp_attack_time - Sets the attack time for audio dynamic range control.
103acomp_release_time - Sets the release time for audio dynamic range control.
104
105* Limiter setups audio deviation limiter feature. Once a over deviation occurs,
106it is possible to adjust the front-end gain of the audio input and always
107prevent over deviation.
108
109limiter_enabled - Enables or disables the limiter feature.
110limiter_deviation - Configures audio frequency deviation level.
111limiter_release_time - Sets the limiter release time.
112
113* Tuning power
114
115power_level - Sets the output power level for signal transmission.
116antenna_capacitor - This selects the value of antenna tuning capacitor manually
117or automatically if set to zero.
118
119* RDS related
120
121rds_ps_name - Sets the RDS ps name field for transmission.
122rds_radio_text - Sets the RDS radio text for transmission.
123rds_pi - Sets the RDS PI field for transmission.
124rds_pty - Sets the RDS PTY field for transmission.
125
126* Region related
127
128preemphasis - sets the preemphasis to be applied for transmission.
129
130RNL
131===
132
133This device also has an interface to measure received noise level. To do that, you should
134ioctl the device node. Here is an code of example:
135
136int main (int argc, char *argv[])
137{
138 struct si4713_rnl rnl;
139 int fd = open("/dev/radio0", O_RDWR);
140 int rval;
141
142 if (argc < 2)
143 return -EINVAL;
144
145 if (fd < 0)
146 return fd;
147
148 sscanf(argv[1], "%d", &rnl.frequency);
149
150 rval = ioctl(fd, SI4713_IOC_MEASURE_RNL, &rnl);
151 if (rval < 0)
152 return rval;
153
154 printf("received noise level: %d\n", rnl.rnl);
155
156 close(fd);
157}
158
159The struct si4713_rnl and SI4713_IOC_MEASURE_RNL are defined under
160include/media/si4713.h.
161
162Stereo/Mono and RDS subchannels
163===============================
164
165The device can also be configured using the available sub channels for
166transmission. To do that use S/G_MODULATOR ioctl and configure txsubchans properly.
167Refer to v4l2-spec for proper use of this ioctl.
168
169Testing
170=======
171Testing is usually done with v4l2-ctl utility for managing FM tuner cards.
172The tool can be found in v4l-dvb repository under v4l2-apps/util directory.
173
174Example for setting rds ps name:
175# v4l2-ctl -d /dev/radio0 --set-ctrl=rds_ps_name="Dummy"
176
diff --git a/Documentation/vm/slub.txt b/Documentation/vm/slub.txt
index bb1f5c6e28b3..510917ff59ed 100644
--- a/Documentation/vm/slub.txt
+++ b/Documentation/vm/slub.txt
@@ -41,6 +41,8 @@ Possible debug options are
41 P Poisoning (object and padding) 41 P Poisoning (object and padding)
42 U User tracking (free and alloc) 42 U User tracking (free and alloc)
43 T Trace (please only use on single slabs) 43 T Trace (please only use on single slabs)
44 O Switch debugging off for caches that would have
45 caused higher minimum slab orders
44 - Switch all debugging off (useful if the kernel is 46 - Switch all debugging off (useful if the kernel is
45 configured with CONFIG_SLUB_DEBUG_ON) 47 configured with CONFIG_SLUB_DEBUG_ON)
46 48
@@ -59,6 +61,14 @@ to the dentry cache with
59 61
60 slub_debug=F,dentry 62 slub_debug=F,dentry
61 63
64Debugging options may require the minimum possible slab order to increase as
65a result of storing the metadata (for example, caches with PAGE_SIZE object
66sizes). This has a higher liklihood of resulting in slab allocation errors
67in low memory situations or if there's high fragmentation of memory. To
68switch off debugging for such caches by default, use
69
70 slub_debug=O
71
62In case you forgot to enable debugging on the kernel command line: It is 72In case you forgot to enable debugging on the kernel command line: It is
63possible to enable debugging manually when the kernel is up. Look at the 73possible to enable debugging manually when the kernel is up. Look at the
64contents of: 74contents of:
diff --git a/Documentation/x86/zero-page.txt b/Documentation/x86/zero-page.txt
index 4f913857b8a2..feb37e177010 100644
--- a/Documentation/x86/zero-page.txt
+++ b/Documentation/x86/zero-page.txt
@@ -12,6 +12,7 @@ Offset Proto Name Meaning
12000/040 ALL screen_info Text mode or frame buffer information 12000/040 ALL screen_info Text mode or frame buffer information
13 (struct screen_info) 13 (struct screen_info)
14040/014 ALL apm_bios_info APM BIOS information (struct apm_bios_info) 14040/014 ALL apm_bios_info APM BIOS information (struct apm_bios_info)
15058/008 ALL tboot_addr Physical address of tboot shared page
15060/010 ALL ist_info Intel SpeedStep (IST) BIOS support information 16060/010 ALL ist_info Intel SpeedStep (IST) BIOS support information
16 (struct ist_info) 17 (struct ist_info)
17080/010 ALL hd0_info hd0 disk parameter, OBSOLETE!! 18080/010 ALL hd0_info hd0 disk parameter, OBSOLETE!!