Diffstat (limited to 'Documentation')
-rw-r--r-- Documentation/ABI/removed/raw1394_legacy_isochronous 16
-rw-r--r-- Documentation/ABI/testing/sysfs-bus-usb 13
-rw-r--r-- Documentation/BUG-HUNTING 24
-rw-r--r-- Documentation/DMA-mapping.txt 103
-rw-r--r-- Documentation/DocBook/kernel-api.tmpl 66
-rw-r--r-- Documentation/HOWTO 28
-rw-r--r-- Documentation/SM501.txt 66
-rw-r--r-- Documentation/SubmitChecklist 6
-rw-r--r-- Documentation/SubmittingPatches 63
-rw-r--r-- Documentation/atomic_ops.txt 2
-rw-r--r-- Documentation/blackfin/kgdb.txt 155
-rw-r--r-- Documentation/block/barrier.txt 16
-rw-r--r-- Documentation/driver-model/platform.txt 40
-rw-r--r-- Documentation/feature-removal-schedule.txt 76
-rw-r--r-- Documentation/filesystems/tmpfs.txt 10
-rw-r--r-- Documentation/firmware_class/README 2
-rw-r--r-- Documentation/firmware_class/firmware_sample_driver.c 2
-rw-r--r-- Documentation/firmware_class/firmware_sample_firmware_class.c 6
-rw-r--r-- Documentation/hrtimer/timer_stats.txt 7
-rw-r--r-- Documentation/i2c/busses/i2c-i801 4
-rw-r--r-- Documentation/i2c/busses/i2c-piix4 2
-rw-r--r-- Documentation/i2c/busses/i2c-taos-evm 46
-rw-r--r-- Documentation/i2c/chips/max6875 2
-rw-r--r-- Documentation/i2c/chips/x1205 38
-rw-r--r-- Documentation/i2c/summary 2
-rw-r--r-- Documentation/i2c/writing-clients 2
-rw-r--r-- Documentation/i386/zero-page.txt 1
-rw-r--r-- Documentation/ia64/aliasing-test.c 28
-rw-r--r-- Documentation/ia64/aliasing.txt 12
-rw-r--r-- Documentation/kernel-parameters.txt 90
-rw-r--r-- Documentation/networking/00-INDEX 5
-rw-r--r-- Documentation/networking/cxacru.txt 84
-rw-r--r-- Documentation/networking/ip-sysctl.txt 9
-rw-r--r-- Documentation/networking/l2tp.txt 169
-rw-r--r-- Documentation/networking/mac80211-injection.txt 59
-rw-r--r-- Documentation/networking/multiqueue.txt 111
-rw-r--r-- Documentation/networking/netdevices.txt 38
-rw-r--r-- Documentation/networking/radiotap-headers.txt 152
-rw-r--r-- Documentation/networking/sk98lin.txt 568
-rw-r--r-- Documentation/networking/spider_net.txt 204
-rw-r--r-- Documentation/networking/xfrm_sysctl.txt 4
-rw-r--r-- Documentation/pci.txt 8
-rw-r--r-- Documentation/power/pci.txt 37
-rw-r--r-- Documentation/power/swsusp.txt 3
-rw-r--r-- Documentation/power_supply_class.txt 167
-rw-r--r-- Documentation/powerpc/booting-without-of.txt 59
-rw-r--r-- Documentation/sched-design-CFS.txt 119
-rw-r--r-- Documentation/sound/alsa/ALSA-Configuration.txt 1
-rw-r--r-- Documentation/sysctl/vm.txt 15
-rw-r--r-- Documentation/sysfs-rules.txt 166
-rw-r--r-- Documentation/thinkpad-acpi.txt 25
-rw-r--r-- Documentation/usb/dma.txt 52
-rw-r--r-- Documentation/usb/persist.txt 156
-rw-r--r-- Documentation/vm/slub.txt 135
-rw-r--r-- Documentation/volatile-considered-harmful.txt 119
-rw-r--r-- Documentation/watchdog/pcwd-watchdog.txt 8
-rw-r--r-- Documentation/watchdog/watchdog-api.txt 236
-rw-r--r-- Documentation/watchdog/watchdog.txt 94
-rw-r--r-- Documentation/watchdog/wdt.txt 43
59 files changed, 2472 insertions, 1302 deletions
diff --git a/Documentation/ABI/removed/raw1394_legacy_isochronous b/Documentation/ABI/removed/raw1394_legacy_isochronous
new file mode 100644
index 000000000000..1b629622d883
--- /dev/null
+++ b/Documentation/ABI/removed/raw1394_legacy_isochronous
@@ -0,0 +1,16 @@
1What: legacy isochronous ABI of raw1394 (1st generation iso ABI)
2Date: June 2007 (scheduled), removed in kernel v2.6.23
3Contact: linux1394-devel@lists.sourceforge.net
4Description:
5 The two request types RAW1394_REQ_ISO_SEND, RAW1394_REQ_ISO_LISTEN have
6 been deprecated for quite some time. They are very inefficient as they
7 come with high interrupt load and several layers of callbacks for each
8 packet. Because of these deficiencies, the video1394 and dv1394 drivers
9 and the 3rd-generation isochronous ABI in raw1394 (rawiso) were created.
10
11Users:
12 libraw1394 users via the long deprecated API raw1394_iso_write,
13 raw1394_start_iso_write, raw1394_start_iso_rcv, raw1394_stop_iso_rcv
14
15 libdc1394, which optionally uses these old libraw1394 calls
16 as an alternative to the more efficient video1394 ABI
diff --git a/Documentation/ABI/testing/sysfs-bus-usb b/Documentation/ABI/testing/sysfs-bus-usb
index f9937add033d..9734577d1711 100644
--- a/Documentation/ABI/testing/sysfs-bus-usb
+++ b/Documentation/ABI/testing/sysfs-bus-usb
@@ -39,3 +39,16 @@ Description:
39 If you want to suspend a device immediately but leave it 39 If you want to suspend a device immediately but leave it
40 free to wake up in response to I/O requests, you should 40 free to wake up in response to I/O requests, you should
41 write "0" to power/autosuspend. 41 write "0" to power/autosuspend.
42
43What: /sys/bus/usb/devices/.../power/persist
44Date: May 2007
45KernelVersion: 2.6.23
46Contact: Alan Stern <stern@rowland.harvard.edu>
47Description:
48 If CONFIG_USB_PERSIST is set, then each USB device directory
49 will contain a file named power/persist. The file holds a
50 boolean value (0 or 1) indicating whether or not the
51 "USB-Persist" facility is enabled for the device. Since the
52 facility is inherently dangerous, it is disabled by default
53 for all devices except hubs. For more information, see
54 Documentation/usb/persist.txt.
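As a rough illustration of the power/persist description above, the following userspace C sketch writes a boolean value to an attribute file. The helper name write_bool_attr is hypothetical and not part of any kernel or library API. The caller supplies the full path to the device's power/persist file, and the sketch assumes the attribute accepts the text "0" or "1" followed by a newline.

```c
#include <stdio.h>

/*
 * Write a boolean (0 or 1) to a sysfs-style attribute file.
 * Hypothetical helper for illustration only.
 * Returns 0 on success, -1 on failure.
 */
int write_bool_attr(const char *path, int enabled)
{
	FILE *f = fopen(path, "w");

	if (!f)
		return -1;

	/* Assumption: the attribute takes the text "0" or "1". */
	if (fputs(enabled ? "1\n" : "0\n", f) == EOF) {
		fclose(f);
		return -1;
	}

	/* fclose() flushes the buffer; a failure here means the write was lost. */
	return fclose(f) == 0 ? 0 : -1;
}
```

Writing to a real power/persist file normally requires root privileges, and the file only exists when CONFIG_USB_PERSIST is set.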
diff --git a/Documentation/BUG-HUNTING b/Documentation/BUG-HUNTING
index 65b97e1dbf70..35f5bd243336 100644
--- a/Documentation/BUG-HUNTING
+++ b/Documentation/BUG-HUNTING
@@ -191,6 +191,30 @@ e.g. crash dump output as shown by Dave Miller.
191> mov 0x8(%ebp), %ebx ! %ebx = skb->sk 191> mov 0x8(%ebp), %ebx ! %ebx = skb->sk
192> mov 0x13c(%ebx), %eax ! %eax = inet_sk(sk)->opt 192> mov 0x13c(%ebx), %eax ! %eax = inet_sk(sk)->opt
193 193
194In addition, you can use GDB to figure out the exact file and line
195number of the OOPS from the vmlinux file. If you have
196CONFIG_DEBUG_INFO enabled, you can simply copy the EIP value from the
197OOPS:
198
199 EIP: 0060:[<c021e50e>] Not tainted VLI
200
201And use GDB to translate that to human-readable form:
202
203 gdb vmlinux
204 (gdb) l *0xc021e50e
205
206If you don't have CONFIG_DEBUG_INFO enabled, you use the function
207offset from the OOPS:
208
209 EIP is at vt_ioctl+0xda8/0x1482
210
211And recompile the kernel with CONFIG_DEBUG_INFO enabled:
212
213 make vmlinux
214 gdb vmlinux
215 (gdb) p vt_ioctl
216 (gdb) l *(0x<address of vt_ioctl> + 0xda8)
217
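The symbol-plus-offset arithmetic in the GDB steps above can be sketched in userspace C. This is only an illustration: the helper names parse_oops_location and format_gdb_list_cmd are hypothetical, and the base address passed to the second helper stands for whatever "p vt_ioctl" prints on your own vmlinux.

```c
#include <stdio.h>

/*
 * Split an OOPS location such as "vt_ioctl+0xda8/0x1482" into the
 * symbol name, the offset into the function, and the function size.
 * sym must have room for at least 64 bytes.
 * Returns 0 on success, -1 if the string does not match.
 */
int parse_oops_location(const char *loc, char *sym,
			unsigned long *offset, unsigned long *size)
{
	/* %lx accepts an optional "0x" prefix, like strtoul() with base 16. */
	if (sscanf(loc, "%63[^+]+%lx/%lx", sym, offset, size) != 3)
		return -1;
	return 0;
}

/*
 * Build the GDB "list" command for a function base address plus offset.
 * Returns 0 on success, -1 if the buffer is too small.
 */
int format_gdb_list_cmd(char *buf, size_t len,
			unsigned long base, unsigned long offset)
{
	int n = snprintf(buf, len, "l *(0x%lx + 0x%lx)", base, offset);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

The resulting string is what you would type at the (gdb) prompt after looking up the function address.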
194Another very useful option of the Kernel Hacking section in menuconfig is 218Another very useful option of the Kernel Hacking section in menuconfig is
195Debug memory allocations. This will help you see whether data has been 219Debug memory allocations. This will help you see whether data has been
196initialised and not set before use etc. To see the values that get assigned 220initialised and not set before use etc. To see the values that get assigned
diff --git a/Documentation/DMA-mapping.txt b/Documentation/DMA-mapping.txt
index 028614cdd062..e07f2530326b 100644
--- a/Documentation/DMA-mapping.txt
+++ b/Documentation/DMA-mapping.txt
@@ -664,109 +664,6 @@ It is that simple.
664Well, not for some odd devices. See the next section for information 664Well, not for some odd devices. See the next section for information
665about that. 665about that.
666 666
667 DAC Addressing for Address Space Hungry Devices
668
669There exists a class of devices which do not mesh well with the PCI
670DMA mapping API. By definition these "mappings" are a finite
671resource. The number of total available mappings per bus is platform
672specific, but there will always be a reasonable amount.
673
674What is "reasonable"? Reasonable means that networking and block I/O
675devices need not worry about using too many mappings.
676
677As an example of a problematic device, consider compute cluster cards.
678They can potentially need to access gigabytes of memory at once via
679DMA. Dynamic mappings are unsuitable for this kind of access pattern.
680
681To this end we've provided a small API by which a device driver
682may use DAC cycles to directly address all of physical memory.
683Not all platforms support this, but most do. It is easy to determine
684whether the platform will work properly at probe time.
685
686First, understand that there may be a SEVERE performance penalty for
687using these interfaces on some platforms. Therefore, you MUST only
688use these interfaces if it is absolutely required. %99 of devices can
689use the normal APIs without any problems.
690
691Note that for streaming type mappings you must either use these
692interfaces, or the dynamic mapping interfaces above. You may not mix
693usage of both for the same device. Such an act is illegal and is
694guaranteed to put a banana in your tailpipe.
695
696However, consistent mappings may in fact be used in conjunction with
697these interfaces. Remember that, as defined, consistent mappings are
698always going to be SAC addressable.
699
700The first thing your driver needs to do is query the PCI platform
701layer if it is capable of handling your devices DAC addressing
702capabilities:
703
704 int pci_dac_dma_supported(struct pci_dev *hwdev, u64 mask);
705
706You may not use the following interfaces if this routine fails.
707
708Next, DMA addresses using this API are kept track of using the
709dma64_addr_t type. It is guaranteed to be big enough to hold any
710DAC address the platform layer will give to you from the following
711routines. If you have consistent mappings as well, you still
712use plain dma_addr_t to keep track of those.
713
714All mappings obtained here will be direct. The mappings are not
715translated, and this is the purpose of this dialect of the DMA API.
716
717All routines work with page/offset pairs. This is the _ONLY_ way to
718portably refer to any piece of memory. If you have a cpu pointer
719(which may be validly DMA'd too) you may easily obtain the page
720and offset using something like this:
721
722 struct page *page = virt_to_page(ptr);
723 unsigned long offset = offset_in_page(ptr);
724
725Here are the interfaces:
726
727 dma64_addr_t pci_dac_page_to_dma(struct pci_dev *pdev,
728 struct page *page,
729 unsigned long offset,
730 int direction);
731
732The DAC address for the tuple PAGE/OFFSET are returned. The direction
733argument is the same as for pci_{map,unmap}_single(). The same rules
734for cpu/device access apply here as for the streaming mapping
735interfaces. To reiterate:
736
737 The cpu may touch the buffer before pci_dac_page_to_dma.
738 The device may touch the buffer after pci_dac_page_to_dma
739 is made, but the cpu may NOT.
740
741When the DMA transfer is complete, invoke:
742
743 void pci_dac_dma_sync_single_for_cpu(struct pci_dev *pdev,
744 dma64_addr_t dma_addr,
745 size_t len, int direction);
746
747This must be done before the CPU looks at the buffer again.
748This interface behaves identically to pci_dma_sync_{single,sg}_for_cpu().
749
750And likewise, if you wish to let the device get back at the buffer after
751the cpu has read/written it, invoke:
752
753 void pci_dac_dma_sync_single_for_device(struct pci_dev *pdev,
754 dma64_addr_t dma_addr,
755 size_t len, int direction);
756
757before letting the device access the DMA area again.
758
759If you need to get back to the PAGE/OFFSET tuple from a dma64_addr_t
760the following interfaces are provided:
761
762 struct page *pci_dac_dma_to_page(struct pci_dev *pdev,
763 dma64_addr_t dma_addr);
764 unsigned long pci_dac_dma_to_offset(struct pci_dev *pdev,
765 dma64_addr_t dma_addr);
766
767This is possible with the DAC interfaces purely because they are
768not translated in any way.
769
770 Optimizing Unmap State Space Consumption 667 Optimizing Unmap State Space Consumption
771 668
772On many platforms, pci_unmap_{single,page}() is simply a nop. 669On many platforms, pci_unmap_{single,page}() is simply a nop.
diff --git a/Documentation/DocBook/kernel-api.tmpl b/Documentation/DocBook/kernel-api.tmpl
index 38f88b6ae405..46bcff2849bd 100644
--- a/Documentation/DocBook/kernel-api.tmpl
+++ b/Documentation/DocBook/kernel-api.tmpl
@@ -643,4 +643,70 @@ X!Idrivers/video/console/fonts.c
643!Edrivers/spi/spi.c 643!Edrivers/spi/spi.c
644 </chapter> 644 </chapter>
645 645
646 <chapter id="i2c">
647 <title>I<superscript>2</superscript>C and SMBus Subsystem</title>
648
649 <para>
650 I<superscript>2</superscript>C (or without fancy typography, "I2C")
651 is an acronym for the "Inter-IC" bus, a simple bus protocol which is
652 widely used where low data rate communications suffice.
653 Since it's also a licensed trademark, some vendors use another
654 name (such as "Two-Wire Interface", TWI) for the same bus.
655 I2C only needs two signals (SCL for clock, SDA for data), conserving
656 board real estate and minimizing signal quality issues.
657 Most I2C devices use seven bit addresses, and bus speeds of up
658 to 400 kHz; there's a high speed extension (3.4 MHz) that's not yet
659 found wide use.
660 I2C is a multi-master bus; open drain signaling is used to
661 arbitrate between masters, as well as to handshake and to
662 synchronize clocks from slower clients.
663 </para>
664
665 <para>
666 The Linux I2C programming interfaces support only the master
667 side of bus interactions, not the slave side.
668 The programming interface is structured around two kinds of driver,
669 and two kinds of device.
670 An I2C "Adapter Driver" abstracts the controller hardware; it binds
671 to a physical device (perhaps a PCI device or platform_device) and
672 exposes a <structname>struct i2c_adapter</structname> representing
673 each I2C bus segment it manages.
674 On each I2C bus segment will be I2C devices represented by a
675 <structname>struct i2c_client</structname>. Those devices will
676 be bound to a <structname>struct i2c_driver</structname>,
677 which should follow the standard Linux driver model.
678 (At this writing, a legacy model is more widely used.)
679 There are functions to perform various I2C protocol operations; at
680 this writing all such functions are usable only from task context.
681 </para>
682
683 <para>
684 The System Management Bus (SMBus) is a sibling protocol. Most SMBus
685 systems are also I2C conformant. The electrical constraints are
686 tighter for SMBus, and it standardizes particular protocol messages
687 and idioms. Controllers that support I2C can also support most
688 SMBus operations, but SMBus controllers don't support all the protocol
689 options that an I2C controller will.
690 There are functions to perform various SMBus protocol operations,
691 either using I2C primitives or by issuing SMBus commands to
692 i2c_adapter devices which don't support those I2C operations.
693 </para>
694
695!Iinclude/linux/i2c.h
696!Fdrivers/i2c/i2c-boardinfo.c i2c_register_board_info
697!Edrivers/i2c/i2c-core.c
698 </chapter>
699
700 <chapter id="splice">
701 <title>splice API</title>
702 <para>
703 splice is a method for moving blocks of data around inside the
704 kernel, without continually transferring it between the kernel
705 and user space.
706 </para>
707!Iinclude/linux/splice.h
708!Ffs/splice.c
709 </chapter>
710
711
646</book> 712</book>
diff --git a/Documentation/HOWTO b/Documentation/HOWTO
index ced9207bedcf..98e2701c746f 100644
--- a/Documentation/HOWTO
+++ b/Documentation/HOWTO
@@ -322,39 +322,34 @@ kernel releases as described above.
322Here is a list of some of the different kernel trees available: 322Here is a list of some of the different kernel trees available:
323 git trees: 323 git trees:
324 - Kbuild development tree, Sam Ravnborg <sam@ravnborg.org> 324 - Kbuild development tree, Sam Ravnborg <sam@ravnborg.org>
325 kernel.org:/pub/scm/linux/kernel/git/sam/kbuild.git 325 git.kernel.org:/pub/scm/linux/kernel/git/sam/kbuild.git
326 326
327 - ACPI development tree, Len Brown <len.brown@intel.com> 327 - ACPI development tree, Len Brown <len.brown@intel.com>
328 kernel.org:/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6.git 328 git.kernel.org:/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6.git
329 329
330 - Block development tree, Jens Axboe <axboe@suse.de> 330 - Block development tree, Jens Axboe <axboe@suse.de>
331 kernel.org:/pub/scm/linux/kernel/git/axboe/linux-2.6-block.git 331 git.kernel.org:/pub/scm/linux/kernel/git/axboe/linux-2.6-block.git
332 332
333 - DRM development tree, Dave Airlie <airlied@linux.ie> 333 - DRM development tree, Dave Airlie <airlied@linux.ie>
334 kernel.org:/pub/scm/linux/kernel/git/airlied/drm-2.6.git 334 git.kernel.org:/pub/scm/linux/kernel/git/airlied/drm-2.6.git
335 335
336 - ia64 development tree, Tony Luck <tony.luck@intel.com> 336 - ia64 development tree, Tony Luck <tony.luck@intel.com>
337 kernel.org:/pub/scm/linux/kernel/git/aegl/linux-2.6.git 337 git.kernel.org:/pub/scm/linux/kernel/git/aegl/linux-2.6.git
338
339 - ieee1394 development tree, Jody McIntyre <scjody@modernduck.com>
340 kernel.org:/pub/scm/linux/kernel/git/scjody/ieee1394.git
341 338
342 - infiniband, Roland Dreier <rolandd@cisco.com> 339 - infiniband, Roland Dreier <rolandd@cisco.com>
343 kernel.org:/pub/scm/linux/kernel/git/roland/infiniband.git 340 git.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband.git
344 341
345 - libata, Jeff Garzik <jgarzik@pobox.com> 342 - libata, Jeff Garzik <jgarzik@pobox.com>
346 kernel.org:/pub/scm/linux/kernel/git/jgarzik/libata-dev.git 343 git.kernel.org:/pub/scm/linux/kernel/git/jgarzik/libata-dev.git
347 344
348 - network drivers, Jeff Garzik <jgarzik@pobox.com> 345 - network drivers, Jeff Garzik <jgarzik@pobox.com>
349 kernel.org:/pub/scm/linux/kernel/git/jgarzik/netdev-2.6.git 346 git.kernel.org:/pub/scm/linux/kernel/git/jgarzik/netdev-2.6.git
350 347
351 - pcmcia, Dominik Brodowski <linux@dominikbrodowski.net> 348 - pcmcia, Dominik Brodowski <linux@dominikbrodowski.net>
352 kernel.org:/pub/scm/linux/kernel/git/brodo/pcmcia-2.6.git 349 git.kernel.org:/pub/scm/linux/kernel/git/brodo/pcmcia-2.6.git
353 350
354 - SCSI, James Bottomley <James.Bottomley@SteelEye.com> 351 - SCSI, James Bottomley <James.Bottomley@SteelEye.com>
355 kernel.org:/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6.git 352 git.kernel.org:/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6.git
356
357 Other git kernel trees can be found listed at http://kernel.org/git
358 353
359 quilt trees: 354 quilt trees:
360 - USB, PCI, Driver Core, and I2C, Greg Kroah-Hartman <gregkh@suse.de> 355 - USB, PCI, Driver Core, and I2C, Greg Kroah-Hartman <gregkh@suse.de>
@@ -362,6 +357,9 @@ Here is a list of some of the different kernel trees available:
362 - x86-64, partly i386, Andi Kleen <ak@suse.de> 357 - x86-64, partly i386, Andi Kleen <ak@suse.de>
363 ftp.firstfloor.org:/pub/ak/x86_64/quilt/ 358 ftp.firstfloor.org:/pub/ak/x86_64/quilt/
364 359
360 Other kernel trees can be found listed at http://git.kernel.org/ and in
361 the MAINTAINERS file.
362
365Bug Reporting 363Bug Reporting
366------------- 364-------------
367 365
diff --git a/Documentation/SM501.txt b/Documentation/SM501.txt
new file mode 100644
index 000000000000..3a1bd95d3767
--- /dev/null
+++ b/Documentation/SM501.txt
@@ -0,0 +1,66 @@
1 SM501 Driver
2 ============
3
4Copyright 2006, 2007 Simtec Electronics
5
6Core
7----
8
9The core driver in drivers/mfd provides common services for the
10drivers which manage the specific hardware blocks. These services
11include locking for common registers, clock control and resource
12management.
13
14The core registers drivers for both PCI and generic bus based
15chips via the platform device and driver system.
16
17On detection of a device, the core initialises the chip (which may
18be specified by the platform data) and then exports the selected
19peripheral set as platform devices for the specific drivers.
20
21The core re-uses the platform device system as the platform device
22system provides enough features to support the drivers without the
23need to create a new bus-type and the associated code to go with it.
24
25
26Resources
27---------
28
29Each peripheral has a view of the device which is implicitly narrowed to
30the specific set of resources that peripheral requires in order to
31function correctly.
32
33The centralised memory allocation allows the driver to ensure that the
34maximum possible resource allocation can be made to the video subsystem
35as this is by-far the most resource-sensitive of the on-chip functions.
36
37The primary issue with memory allocation is that of moving the video
38buffers once a display mode is chosen. Indeed when a video mode change
39occurs the memory footprint of the video subsystem changes.
40
41Since video memory is difficult to move without changing the display
42(unless sufficient contiguous memory can be provided for the old and new
43modes simultaneously) the video driver fully utilises the memory area
44given to it by aligning fb0 to the start of the area and fb1 to the end
45of it. Any memory left over in the middle is used for the acceleration
46functions, which are transient and thus their location is less critical
47as it can be moved.
48
49
50Configuration
51-------------
52
53The platform device driver uses a set of platform data to pass
54configurations through to the core and the subsidiary drivers
55so that there can be support for more than one system carrying
56an SM501 built into a single kernel image.
57
58The PCI driver assumes that the PCI card behaves as per the Silicon
59Motion reference design.
60
61There is an erratum (AB-5) affecting the selection of the
62M1XCLK and M1CLK frequencies. These two clocks
63must be sourced from the same PLL, although they can then
64be divided down individually. If this is not set, then SM501 may
65lock and hang the whole system. The driver will refuse to
66attach if the PLL selection is different.
diff --git a/Documentation/SubmitChecklist b/Documentation/SubmitChecklist
index 3af3e65cf43b..6ebffb57e3db 100644
--- a/Documentation/SubmitChecklist
+++ b/Documentation/SubmitChecklist
@@ -84,3 +84,9 @@ kernel patches.
8424: Avoid whitespace damage such as indenting with spaces or whitespace 8424: Avoid whitespace damage such as indenting with spaces or whitespace
85 at the end of lines. You can test this by feeding the patch to 85 at the end of lines. You can test this by feeding the patch to
86 "git apply --check --whitespace=error-all" 86 "git apply --check --whitespace=error-all"
87
8825: Check your patch for general style as detailed in
89 Documentation/CodingStyle. Check for trivial violations with the
90 patch style checker prior to submission (scripts/checkpatch.pl).
91 You should be able to justify all violations that remain in
92 your patch.
diff --git a/Documentation/SubmittingPatches b/Documentation/SubmittingPatches
index a417b25fb1aa..0958e97d4bf4 100644
--- a/Documentation/SubmittingPatches
+++ b/Documentation/SubmittingPatches
@@ -118,7 +118,20 @@ then only post say 15 or so at a time and wait for review and integration.
118 118
119 119
120 120
1214) Select e-mail destination. 1214) Style check your changes.
122
123Check your patch for basic style violations, details of which can be
124found in Documentation/CodingStyle. Failure to do so simply wastes
125the reviewer's time and will get your patch rejected, probably
126without even being read.
127
128At a minimum you should check your patches with the patch style
129checker prior to submission (scripts/checkpatch.pl). You should
130be able to justify all violations that remain in your patch.
131
132
133
1345) Select e-mail destination.
122 135
123Look through the MAINTAINERS file and the source code, and determine 136Look through the MAINTAINERS file and the source code, and determine
124if your change applies to a specific subsystem of the kernel, with 137if your change applies to a specific subsystem of the kernel, with
@@ -146,7 +159,7 @@ discussed should the patch then be submitted to Linus.
146 159
147 160
148 161
1495) Select your CC (e-mail carbon copy) list. 1626) Select your CC (e-mail carbon copy) list.
150 163
151Unless you have a reason NOT to do so, CC linux-kernel@vger.kernel.org. 164Unless you have a reason NOT to do so, CC linux-kernel@vger.kernel.org.
152 165
@@ -187,8 +200,7 @@ URL: <http://www.kernel.org/pub/linux/kernel/people/bunk/trivial/>
187 200
188 201
189 202
190 2037) No MIME, no links, no compression, no attachments. Just plain text.
1916) No MIME, no links, no compression, no attachments. Just plain text.
192 204
193Linus and other kernel developers need to be able to read and comment 205Linus and other kernel developers need to be able to read and comment
194on the changes you are submitting. It is important for a kernel 206on the changes you are submitting. It is important for a kernel
@@ -223,9 +235,9 @@ pref("mailnews.display.disable_format_flowed_support", true);
223 235
224 236
225 237
2267) E-mail size. 2388) E-mail size.
227 239
228When sending patches to Linus, always follow step #6. 240When sending patches to Linus, always follow step #7.
229 241
230Large changes are not appropriate for mailing lists, and some 242Large changes are not appropriate for mailing lists, and some
231maintainers. If your patch, uncompressed, exceeds 40 kB in size, 243maintainers. If your patch, uncompressed, exceeds 40 kB in size,
@@ -234,7 +246,7 @@ server, and provide instead a URL (link) pointing to your patch.
234 246
235 247
236 248
2378) Name your kernel version. 2499) Name your kernel version.
238 250
239It is important to note, either in the subject line or in the patch 251It is important to note, either in the subject line or in the patch
240description, the kernel version to which this patch applies. 252description, the kernel version to which this patch applies.
@@ -244,7 +256,7 @@ Linus will not apply it.
244 256
245 257
246 258
2479) Don't get discouraged. Re-submit. 25910) Don't get discouraged. Re-submit.
248 260
249After you have submitted your change, be patient and wait. If Linus 261After you have submitted your change, be patient and wait. If Linus
250likes your change and applies it, it will appear in the next version 262likes your change and applies it, it will appear in the next version
@@ -270,7 +282,7 @@ When in doubt, solicit comments on linux-kernel mailing list.
270 282
271 283
272 284
27310) Include PATCH in the subject 28511) Include PATCH in the subject
274 286
275Due to high e-mail traffic to Linus, and to linux-kernel, it is common 287Due to high e-mail traffic to Linus, and to linux-kernel, it is common
276convention to prefix your subject line with [PATCH]. This lets Linus 288convention to prefix your subject line with [PATCH]. This lets Linus
@@ -279,7 +291,7 @@ e-mail discussions.
279 291
280 292
281 293
28211) Sign your work 29412) Sign your work
283 295
284To improve tracking of who did what, especially with patches that can 296To improve tracking of who did what, especially with patches that can
285percolate to their final resting place in the kernel through several 297percolate to their final resting place in the kernel through several
@@ -328,7 +340,32 @@ now, but you can do this to mark internal company procedures or just
328point out some special detail about the sign-off. 340point out some special detail about the sign-off.
329 341
330 342
33112) The canonical patch format 34313) When to use Acked-by:
344
345The Signed-off-by: tag indicates that the signer was involved in the
346development of the patch, or that he/she was in the patch's delivery path.
347
348If a person was not directly involved in the preparation or handling of a
349patch but wishes to signify and record their approval of it then they can
350arrange to have an Acked-by: line added to the patch's changelog.
351
352Acked-by: is often used by the maintainer of the affected code when that
353maintainer neither contributed to nor forwarded the patch.
354
355Acked-by: is not as formal as Signed-off-by:. It is a record that the acker
356has at least reviewed the patch and has indicated acceptance. Hence patch
357mergers will sometimes manually convert an acker's "yep, looks good to me"
358into an Acked-by:.
359
360Acked-by: does not necessarily indicate acknowledgement of the entire patch.
361For example, if a patch affects multiple subsystems and has an Acked-by: from
362one subsystem maintainer then this usually indicates acknowledgement of just
363the part which affects that maintainer's code. Judgement should be used here.
364 When in doubt people should refer to the original discussion in the mailing
365list archives.
366
367
36814) The canonical patch format
332 369
333The canonical patch subject line is: 370The canonical patch subject line is:
334 371
@@ -427,6 +464,10 @@ section Linus Computer Science 101.
427Nuff said. If your code deviates too much from this, it is likely 464Nuff said. If your code deviates too much from this, it is likely
428to be rejected without further review, and without comment. 465to be rejected without further review, and without comment.
429 466
467Check your patches with the patch style checker prior to submission
468(scripts/checkpatch.pl). You should be able to justify all
469violations that remain in your patch.
470
430 471
431 472
4322) #ifdefs are ugly 4732) #ifdefs are ugly
diff --git a/Documentation/atomic_ops.txt b/Documentation/atomic_ops.txt
index 2a63d5662a93..05851e9982ed 100644
--- a/Documentation/atomic_ops.txt
+++ b/Documentation/atomic_ops.txt
@@ -149,7 +149,7 @@ defined which accomplish this:
149 void smp_mb__before_atomic_dec(void); 149 void smp_mb__before_atomic_dec(void);
150 void smp_mb__after_atomic_dec(void); 150 void smp_mb__after_atomic_dec(void);
151 void smp_mb__before_atomic_inc(void); 151 void smp_mb__before_atomic_inc(void);
152 void smp_mb__after_atomic_dec(void); 152 void smp_mb__after_atomic_inc(void);
153 153
154For example, smp_mb__before_atomic_dec() can be used like so: 154For example, smp_mb__before_atomic_dec() can be used like so:
155 155
diff --git a/Documentation/blackfin/kgdb.txt b/Documentation/blackfin/kgdb.txt
new file mode 100644
index 000000000000..84f6a484ae9a
--- /dev/null
+++ b/Documentation/blackfin/kgdb.txt
@@ -0,0 +1,155 @@
1 A Simple Guide to Configure KGDB
2
3 Sonic Zhang <sonic.zhang@analog.com>
4 Aug. 24th 2006
5
6
7This KGDB patch enables the kernel developer to do source level debugging on
8the kernel for the Blackfin architecture. The debugging works over either the
9ethernet interface or one of the uarts. Both software breakpoints and
10hardware breakpoints are supported in this version.
11http://docs.blackfin.uclinux.org/doku.php?id=kgdb
12
13
142 known issues:
151. This bug:
16 http://blackfin.uclinux.org/tracker/index.php?func=detail&aid=544&group_id=18&atid=145
17 The GDB client for Blackfin uClinux causes incorrect values of local
18 variables to be displayed when the user breaks the running of kernel in GDB.
192. Because of a hardware bug in Blackfin 533 v1.0.3:
20 05000067 - Watchpoints (Hardware Breakpoints) are not supported
21 Hardware breakpoints cannot be set properly.
22
23
24Debug over Ethernet:
25
261. Compile and install the cross platform version of gdb for blackfin, which
27 can be found at $(BINROOT)/bfin-elf-gdb.
28
292. Apply this patch to the 2.6.x kernel. Select the menuconfig option under
30 "Kernel hacking" -> "Kernel debugging" -> "KGDB: kernel debug with remote gdb".
31 With this selected, option "Full Symbolic/Source Debugging support" and
32 "Compile the kernel with frame pointers" are also selected.
33
343. Select option "KGDB: connect over (Ethernet)". Add "kgdboe=@target-IP/,@host-IP/" to
35 the option "Compiled-in Kernel Boot Parameter" under "Kernel hacking".
36
374. Connect minicom to the serial port and boot the kernel image.
38
395. Configure the IP "/> ifconfig eth0 target-IP"
40
416. Start GDB client "bfin-elf-gdb vmlinux".
42
437. Connect to the target "(gdb) target remote udp:target-IP:6443".
44
458. Set software breakpoint "(gdb) break sys_open".
46
479. Continue "(gdb) c".
48
4910. Run ls in the target console "/> ls".
50
5111. Breakpoint hits. "Breakpoint 1: sys_open(..."
52
5312. Display local variables and function parameters.
54 (*) This operation gives wrong results, see known issue 1.
55
5613. Single stepping "(gdb) si".
57
5814. Remove breakpoint 1. "(gdb) del 1"
59
6015. Set hardware breakpoint "(gdb) hbreak sys_open".
61
6216. Continue "(gdb) c".
63
6417. Run ls in the target console "/> ls".
65
6618. Hardware breakpoint hits. "Breakpoint 1: sys_open(...".
67 (*) This hardware breakpoint will not be hit, see known issue 2.
68
6919. Continue "(gdb) c".
70
7120. Interrupt the target in GDB "Ctrl+C".
72
7321. Detach from the target "(gdb) detach".
74
7522. Exit GDB "(gdb) quit".
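
The two strings that carry the addresses in the Ethernet procedure (the boot
parameter from step 3 and the GDB command from step 7) can be sketched as
follows. This is only an illustration: the helper names kgdboe_boot_param()
and gdb_remote_cmd() are hypothetical and not part of the patch; the string
formats and port 6443 are copied from the steps above.

```c
#include <stdio.h>

/*
 * Hypothetical helper (not part of the patch): builds the boot parameter
 * from step 3, "kgdboe=@target-IP/,@host-IP/".
 * Returns the value of snprintf().
 */
int kgdboe_boot_param(char *buf, size_t len,
		      const char *target_ip, const char *host_ip)
{
	return snprintf(buf, len, "kgdboe=@%s/,@%s/", target_ip, host_ip);
}

/*
 * Hypothetical helper: builds the GDB command from step 7.
 * Port 6443 is the one given in this guide.
 */
int gdb_remote_cmd(char *buf, size_t len, const char *target_ip)
{
	return snprintf(buf, len, "target remote udp:%s:6443", target_ip);
}
```

With a target at 10.0.0.2 and a host at 10.0.0.1 (example addresses), the
first helper yields "kgdboe=@10.0.0.2/,@10.0.0.1/" and the second yields
"target remote udp:10.0.0.2:6443".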
76
77
78Debug over the UART:
79
801. Compile and install the cross platform version of gdb for blackfin, which
81 can be found at $(BINROOT)/bfin-elf-gdb.
82
832. Apply this patch to the 2.6.x kernel. Select the menuconfig option under
84 "Kernel hacking" -> "Kernel debugging" -> "KGDB: kernel debug with remote gdb".
85 With this selected, option "Full Symbolic/Source Debugging support" and
86 "Compile the kernel with frame pointers" are also selected.
87
883. Select option "KGDB: connect over (UART)". Set "KGDB: UART port number" to be
89 a different one from the console. Don't forget to change the mode of the
90 Blackfin serial driver to PIO; otherwise kgdb works incorrectly on the UART.
91
924. If you want to connect to kgdb when the kernel boots, enable
93 "KGDB: Wait for gdb connection early"
94
955. Compile kernel.
96
976. Connect minicom to the serial port of the console and boot the kernel image.
98
997. Start GDB client "bfin-elf-gdb vmlinux".
100
1018. Set the baud rate in GDB "(gdb) set remotebaud 57600".
102
1039. Connect to the target on the second serial port "(gdb) target remote /dev/ttyS1".
104
10510. Set software breakpoint "(gdb) break sys_open".
106
10711. Continue "(gdb) c".
108
10912. Run ls in the target console "/> ls".
110
11113. A breakpoint is hit. "Breakpoint 1: sys_open(..."
112
11314. All other operations are the same as those in KGDB over Ethernet.
114
115
116Debug over the same UART as console:
117
1181. Compile and install the cross platform version of gdb for blackfin, which
119 can be found at $(BINROOT)/bfin-elf-gdb.
120
1212. Apply this patch to the 2.6.x kernel. Select the menuconfig option under
122 "Kernel hacking" -> "Kernel debugging" -> "KGDB: kernel debug with remote gdb".
123 With this selected, option "Full Symbolic/Source Debugging support" and
124 "Compile the kernel with frame pointers" are also selected.
125
1263. Select option "KGDB: connect over UART". Set "KGDB: UART port number" to console.
127 Don't forget to change the mode of the Blackfin serial driver to PIO;
128 otherwise kgdb works incorrectly on the UART.
129
1304. If you want to connect to kgdb when the kernel boots, enable
131 "KGDB: Wait for gdb connection early"
132
1335. Connect minicom to the serial port and boot the kernel image.
134
1356. (Optional) Ask the target to wait for a gdb connection by entering Ctrl+A. In minicom, you should enter Ctrl+A+A.
136
1377. Start GDB client "bfin-elf-gdb vmlinux".
138
1398. Set the baud rate in GDB "(gdb) set remotebaud 57600".
140
1419. Connect to the target "(gdb) target remote /dev/ttyS0".
142
14310. Set software breakpoint "(gdb) break sys_open".
144
14511. Continue "(gdb) c". Then enter Ctrl+C twice to stop the GDB connection.
146
14712. Run ls in the target console "/> ls". A dummy string can be seen on the console.
148
14913. Then connect GDB to the target again. "(gdb) target remote /dev/ttyS0".
150 Now you will find that a breakpoint is hit. "Breakpoint 1: sys_open(..."
151
15214. All other operations are the same as those in KGDB over Ethernet. The only
153 difference is that after the continue command in GDB, you must stop the GDB
154 connection by entering Ctrl+C twice and connect again after a breakpoint is
155 hit or Ctrl+A is entered.
diff --git a/Documentation/block/barrier.txt b/Documentation/block/barrier.txt
index a272c3db8094..7d279f2f5bb2 100644
--- a/Documentation/block/barrier.txt
+++ b/Documentation/block/barrier.txt
@@ -82,23 +82,12 @@ including draining and flushing.
82typedef void (prepare_flush_fn)(request_queue_t *q, struct request *rq); 82typedef void (prepare_flush_fn)(request_queue_t *q, struct request *rq);
83 83
84int blk_queue_ordered(request_queue_t *q, unsigned ordered, 84int blk_queue_ordered(request_queue_t *q, unsigned ordered,
85 prepare_flush_fn *prepare_flush_fn, 85 prepare_flush_fn *prepare_flush_fn);
86 unsigned gfp_mask);
87
88int blk_queue_ordered_locked(request_queue_t *q, unsigned ordered,
89 prepare_flush_fn *prepare_flush_fn,
90 unsigned gfp_mask);
91
92The only difference between the two functions is whether or not the
93caller is holding q->queue_lock on entry. The latter expects the
94caller is holding the lock.
95 86
96@q : the queue in question 87@q : the queue in question
97@ordered : the ordered mode the driver/device supports 88@ordered : the ordered mode the driver/device supports
98@prepare_flush_fn : this function should prepare @rq such that it 89@prepare_flush_fn : this function should prepare @rq such that it
99 flushes cache to physical medium when executed 90 flushes cache to physical medium when executed
100@gfp_mask : gfp_mask used when allocating data structures
101 for ordered processing
102 91
103For example, SCSI disk driver's prepare_flush_fn looks like the 92For example, SCSI disk driver's prepare_flush_fn looks like the
104following. 93following.
@@ -106,9 +95,10 @@ following.
106static void sd_prepare_flush(request_queue_t *q, struct request *rq) 95static void sd_prepare_flush(request_queue_t *q, struct request *rq)
107{ 96{
108 memset(rq->cmd, 0, sizeof(rq->cmd)); 97 memset(rq->cmd, 0, sizeof(rq->cmd));
109 rq->flags |= REQ_BLOCK_PC; 98 rq->cmd_type = REQ_TYPE_BLOCK_PC;
110 rq->timeout = SD_TIMEOUT; 99 rq->timeout = SD_TIMEOUT;
111 rq->cmd[0] = SYNCHRONIZE_CACHE; 100 rq->cmd[0] = SYNCHRONIZE_CACHE;
101 rq->cmd_len = 10;
112} 102}
113 103
114The following seven ordered modes are supported. The following table 104The following seven ordered modes are supported. The following table
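
To make the effect of the added "rq->cmd_len = 10" line easier to see outside
the kernel, here is a minimal userspace sketch. It is not kernel code:
struct mock_request and mock_prepare_flush() are hypothetical stand-ins that
model only the command bytes and their length, and MOCK_MAX_CDB is an assumed
buffer size. The opcode 0x35 is SYNCHRONIZE CACHE(10), which is why the
command length is 10 bytes.

```c
#include <string.h>

#define SYNCHRONIZE_CACHE 0x35	/* SCSI SYNCHRONIZE CACHE(10) opcode */
#define MOCK_MAX_CDB 16		/* assumption: buffer size for this mock */

/* Hypothetical stand-in for the few request fields used here. */
struct mock_request {
	unsigned char cmd[MOCK_MAX_CDB];
	unsigned int cmd_len;
};

/* Mirrors the CDB setup of sd_prepare_flush() on the mock structure. */
void mock_prepare_flush(struct mock_request *rq)
{
	/* Zero the whole CDB: LBA and block count are left at 0. */
	memset(rq->cmd, 0, sizeof(rq->cmd));
	rq->cmd[0] = SYNCHRONIZE_CACHE;
	/* A 10-byte CDB; without this the length would stay unset. */
	rq->cmd_len = 10;
}
```

After the call, byte 0 of the command holds 0x35, the remaining bytes are
zero, and cmd_len is 10.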
diff --git a/Documentation/driver-model/platform.txt b/Documentation/driver-model/platform.txt
index 19c4a6e13676..2a97320ee17f 100644
--- a/Documentation/driver-model/platform.txt
+++ b/Documentation/driver-model/platform.txt
@@ -96,6 +96,46 @@ System setup also associates those clocks with the device, so that that
96calls to clk_get(&pdev->dev, clock_name) return them as needed. 96calls to clk_get(&pdev->dev, clock_name) return them as needed.
97 97
98 98
99Legacy Drivers: Device Probing
100~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
101Some drivers are not fully converted to the driver model, because they take
102on a non-driver role: the driver registers its platform device, rather than
103leaving that for system infrastructure. Such drivers can't be hotplugged
104or coldplugged, since those mechanisms require device creation to be in a
105different system component than the driver.
106
107The only "good" reason for this is to handle older system designs which, like
108original IBM PCs, rely on error-prone "probe-the-hardware" models for hardware
109configuration. Newer systems have largely abandoned that model, in favor of
110bus-level support for dynamic configuration (PCI, USB), or device tables
111provided by the boot firmware (e.g. PNPACPI on x86). There are too many
112conflicting options about what might be where, and even educated guesses by
113an operating system will be wrong often enough to make trouble.
114
115This style of driver is discouraged. If you're updating such a driver,
116please try to move the device enumeration to a more appropriate location,
117outside the driver. This will usually be cleanup, since such drivers
118tend to already have "normal" modes, such as ones using device nodes that
119were created by PNP or by platform device setup.
120
121None the less, there are some APIs to support such legacy drivers. Avoid
122using these calls except with such hotplug-deficient drivers.
123
124 struct platform_device *platform_device_alloc(
125 char *name, unsigned id);
126
127You can use platform_device_alloc() to dynamically allocate a device, which
128you will then initialize with resources and platform_device_register().
129A better solution is usually:
130
131 struct platform_device *platform_device_register_simple(
132 char *name, unsigned id,
133 struct resource *res, unsigned nres);
134
135You can use platform_device_register_simple() as a one-step call to allocate
136and register a device.
137
138
99Device Naming and Driver Binding 139Device Naming and Driver Binding
100~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 140~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
101The platform_device.dev.bus_id is the canonical name for the devices. 141The platform_device.dev.bus_id is the canonical name for the devices.
diff --git a/Documentation/feature-removal-schedule.txt b/Documentation/feature-removal-schedule.txt
index 2d7ea85075ba..092c65dd35c2 100644
--- a/Documentation/feature-removal-schedule.txt
+++ b/Documentation/feature-removal-schedule.txt
@@ -49,16 +49,6 @@ Who: Adrian Bunk <bunk@stusta.de>
49 49
50--------------------------- 50---------------------------
51 51
52What: raw1394: requests of type RAW1394_REQ_ISO_SEND, RAW1394_REQ_ISO_LISTEN
53When: June 2007
54Why: Deprecated in favour of the more efficient and robust rawiso interface.
55 Affected are applications which use the deprecated part of libraw1394
56 (raw1394_iso_write, raw1394_start_iso_write, raw1394_start_iso_rcv,
57 raw1394_stop_iso_rcv) or bypass libraw1394.
58Who: Dan Dennedy <dan@dennedy.org>, Stefan Richter <stefanr@s5r6.in-berlin.de>
59
60---------------------------
61
62What: old NCR53C9x driver 52What: old NCR53C9x driver
63When: October 2007 53When: October 2007
64Why: Replaced by the much better esp_scsi driver. Actual low-level 54Why: Replaced by the much better esp_scsi driver. Actual low-level
@@ -70,6 +60,7 @@ Who: David Miller <davem@davemloft.net>
70 60
71What: Video4Linux API 1 ioctls and video_decoder.h from Video devices. 61What: Video4Linux API 1 ioctls and video_decoder.h from Video devices.
72When: December 2006 62When: December 2006
63Files: include/linux/video_decoder.h
73Why: V4L1 AP1 was replaced by V4L2 API. during migration from 2.4 to 2.6 64Why: V4L1 AP1 was replaced by V4L2 API. during migration from 2.4 to 2.6
74 series. The old API have lots of drawbacks and don't provide enough 65 series. The old API have lots of drawbacks and don't provide enough
75 means to work with all video and audio standards. The newer API is 66 means to work with all video and audio standards. The newer API is
@@ -103,6 +94,7 @@ Who: Dominik Brodowski <linux@brodo.de>
103What: remove EXPORT_SYMBOL(kernel_thread) 94What: remove EXPORT_SYMBOL(kernel_thread)
104When: August 2006 95When: August 2006
105Files: arch/*/kernel/*_ksyms.c 96Files: arch/*/kernel/*_ksyms.c
97Funcs: kernel_thread
106Why: kernel_thread is a low-level implementation detail. Drivers should 98Why: kernel_thread is a low-level implementation detail. Drivers should
107 use the <linux/kthread.h> API instead which shields them from 99 use the <linux/kthread.h> API instead which shields them from
108 implementation details and provides a higherlevel interface that 100 implementation details and provides a higherlevel interface that
@@ -204,28 +196,6 @@ Who: Adrian Bunk <bunk@stusta.de>
204 196
205--------------------------- 197---------------------------
206 198
207What: ACPI hooks (X86_SPEEDSTEP_CENTRINO_ACPI) in speedstep-centrino driver
208When: December 2006
209Why: Speedstep-centrino driver with ACPI hooks and acpi-cpufreq driver are
210 functionally very much similar. They talk to ACPI in same way. Only
211 difference between them is the way they do frequency transitions.
212 One uses MSRs and the other one uses IO ports. Functionaliy of
213 speedstep_centrino with ACPI hooks is now merged into acpi-cpufreq.
214 That means one common driver will support all Intel Enhanced Speedstep
215 capable CPUs. That means less confusion over name of
216 speedstep-centrino driver (with that driver supposed to be used on
217 non-centrino platforms). That means less duplication of code and
218 less maintenance effort and no possibility of these two drivers
219 going out of sync.
220 Current users of speedstep_centrino with ACPI hooks are requested to
221 switch over to acpi-cpufreq driver. speedstep-centrino will continue
222 to work using older non-ACPI static table based scheme even after this
223 date.
224
225Who: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
226
227---------------------------
228
229What: /sys/firmware/acpi/namespace 199What: /sys/firmware/acpi/namespace
230When: 2.6.21 200When: 2.6.21
231Why: The ACPI namespace is effectively the symbol list for 201Why: The ACPI namespace is effectively the symbol list for
@@ -256,14 +226,6 @@ Who: Len Brown <len.brown@intel.com>
256 226
257--------------------------- 227---------------------------
258 228
259What: sk98lin network driver
260When: July 2007
261Why: In kernel tree version of driver is unmaintained. Sk98lin driver
262 replaced by the skge driver.
263Who: Stephen Hemminger <shemminger@osdl.org>
264
265---------------------------
266
267What: Compaq touchscreen device emulation 229What: Compaq touchscreen device emulation
268When: Oct 2007 230When: Oct 2007
269Files: drivers/input/tsdev.c 231Files: drivers/input/tsdev.c
@@ -278,25 +240,6 @@ Who: Richard Purdie <rpurdie@rpsys.net>
278 240
279--------------------------- 241---------------------------
280 242
281What: Multipath cached routing support in ipv4
282When: in 2.6.23
283Why: Code was merged, then submitter immediately disappeared leaving
284 us with no maintainer and lots of bugs. The code should not have
285 been merged in the first place, and many aspects of it's
286 implementation are blocking more critical core networking
287 development. It's marked EXPERIMENTAL and no distribution
288 enables it because it cause obscure crashes due to unfixable bugs
289 (interfaces don't return errors so memory allocation can't be
290 handled, calling contexts of these interfaces make handling
291 errors impossible too because they get called after we've
292 totally commited to creating a route object, for example).
293 This problem has existed for years and no forward progress
294 has ever been made, and nobody steps up to try and salvage
295 this code, so we're going to finally just get rid of it.
296Who: David S. Miller <davem@davemloft.net>
297
298---------------------------
299
300What: read_dev_chars(), read_conf_data{,_lpm}() (s390 common I/O layer) 243What: read_dev_chars(), read_conf_data{,_lpm}() (s390 common I/O layer)
301When: December 2007 244When: December 2007
302Why: These functions are a leftover from 2.4 times. They have several 245Why: These functions are a leftover from 2.4 times. They have several
@@ -346,3 +289,18 @@ Who: Tejun Heo <htejun@gmail.com>
346 289
347--------------------------- 290---------------------------
348 291
292What: Legacy RTC drivers (under drivers/i2c/chips)
293When: November 2007
294Why: Obsolete. We have a RTC subsystem with better drivers.
295Who: Jean Delvare <khali@linux-fr.org>
296
297---------------------------
298
299What: iptables SAME target
300When: 1.1. 2008
301Files: net/ipv4/netfilter/ipt_SAME.c, include/linux/netfilter_ipv4/ipt_SAME.h
302Why: Obsolete for multiple years now, NAT core provides the same behaviour.
303 Unfixable broken wrt. 32/64 bit cleanness.
304Who: Patrick McHardy <kaber@trash.net>
305
306---------------------------
diff --git a/Documentation/filesystems/tmpfs.txt b/Documentation/filesystems/tmpfs.txt
index 6dd050878a20..145e44086358 100644
--- a/Documentation/filesystems/tmpfs.txt
+++ b/Documentation/filesystems/tmpfs.txt
@@ -94,10 +94,10 @@ largest node numbers in the range. For example, mpol=bind:0-3,5,7,9-15
94 94
95Note that trying to mount a tmpfs with an mpol option will fail if the 95Note that trying to mount a tmpfs with an mpol option will fail if the
96running kernel does not support NUMA; and will fail if its nodelist 96running kernel does not support NUMA; and will fail if its nodelist
97specifies a node >= MAX_NUMNODES. If your system relies on that tmpfs 97specifies a node which is not online. If your system relies on that
98being mounted, but from time to time runs a kernel built without NUMA 98tmpfs being mounted, but from time to time runs a kernel built without
99capability (perhaps a safe recovery kernel), or configured to support 99NUMA capability (perhaps a safe recovery kernel), or with fewer nodes
100fewer nodes, then it is advisable to omit the mpol option from automatic 100online, then it is advisable to omit the mpol option from automatic
101mount options. It can be added later, when the tmpfs is already mounted 101mount options. It can be added later, when the tmpfs is already mounted
102on MountPoint, by 'mount -o remount,mpol=Policy:NodeList MountPoint'. 102on MountPoint, by 'mount -o remount,mpol=Policy:NodeList MountPoint'.
103 103
@@ -121,4 +121,4 @@ RAM/SWAP in 10240 inodes and it is only accessible by root.
121Author: 121Author:
122 Christoph Rohland <cr@sap.com>, 1.12.01 122 Christoph Rohland <cr@sap.com>, 1.12.01
123Updated: 123Updated:
124 Hugh Dickins <hugh@veritas.com>, 19 February 2006 124 Hugh Dickins <hugh@veritas.com>, 4 June 2007
diff --git a/Documentation/firmware_class/README b/Documentation/firmware_class/README
index e9cc8bb26f7d..c3480aa66ba8 100644
--- a/Documentation/firmware_class/README
+++ b/Documentation/firmware_class/README
@@ -1,7 +1,7 @@
1 1
2 request_firmware() hotplug interface: 2 request_firmware() hotplug interface:
3 ------------------------------------ 3 ------------------------------------
4 Copyright (C) 2003 Manuel Estrada Sainz <ranty@debian.org> 4 Copyright (C) 2003 Manuel Estrada Sainz
5 5
6 Why: 6 Why:
7 --- 7 ---
diff --git a/Documentation/firmware_class/firmware_sample_driver.c b/Documentation/firmware_class/firmware_sample_driver.c
index 87feccdb5c9f..6865cbe075ec 100644
--- a/Documentation/firmware_class/firmware_sample_driver.c
+++ b/Documentation/firmware_class/firmware_sample_driver.c
@@ -1,7 +1,7 @@
1/* 1/*
2 * firmware_sample_driver.c - 2 * firmware_sample_driver.c -
3 * 3 *
4 * Copyright (c) 2003 Manuel Estrada Sainz <ranty@debian.org> 4 * Copyright (c) 2003 Manuel Estrada Sainz
5 * 5 *
6 * Sample code on how to use request_firmware() from drivers. 6 * Sample code on how to use request_firmware() from drivers.
7 * 7 *
diff --git a/Documentation/firmware_class/firmware_sample_firmware_class.c b/Documentation/firmware_class/firmware_sample_firmware_class.c
index 9e1b0e4051cd..fba943aacf93 100644
--- a/Documentation/firmware_class/firmware_sample_firmware_class.c
+++ b/Documentation/firmware_class/firmware_sample_firmware_class.c
@@ -1,7 +1,7 @@
1/* 1/*
2 * firmware_sample_firmware_class.c - 2 * firmware_sample_firmware_class.c -
3 * 3 *
4 * Copyright (c) 2003 Manuel Estrada Sainz <ranty@debian.org> 4 * Copyright (c) 2003 Manuel Estrada Sainz
5 * 5 *
6 * NOTE: This is just a probe of concept, if you think that your driver would 6 * NOTE: This is just a probe of concept, if you think that your driver would
7 * be well served by this mechanism please contact me first. 7 * be well served by this mechanism please contact me first.
@@ -19,7 +19,7 @@
19#include <linux/firmware.h> 19#include <linux/firmware.h>
20 20
21 21
22MODULE_AUTHOR("Manuel Estrada Sainz <ranty@debian.org>"); 22MODULE_AUTHOR("Manuel Estrada Sainz");
23MODULE_DESCRIPTION("Hackish sample for using firmware class directly"); 23MODULE_DESCRIPTION("Hackish sample for using firmware class directly");
24MODULE_LICENSE("GPL"); 24MODULE_LICENSE("GPL");
25 25
@@ -78,6 +78,7 @@ static CLASS_DEVICE_ATTR(loading, 0644,
78 firmware_loading_show, firmware_loading_store); 78 firmware_loading_show, firmware_loading_store);
79 79
80static ssize_t firmware_data_read(struct kobject *kobj, 80static ssize_t firmware_data_read(struct kobject *kobj,
81 struct bin_attribute *bin_attr,
81 char *buffer, loff_t offset, size_t count) 82 char *buffer, loff_t offset, size_t count)
82{ 83{
83 struct class_device *class_dev = to_class_dev(kobj); 84 struct class_device *class_dev = to_class_dev(kobj);
@@ -88,6 +89,7 @@ static ssize_t firmware_data_read(struct kobject *kobj,
88 return count; 89 return count;
89} 90}
90static ssize_t firmware_data_write(struct kobject *kobj, 91static ssize_t firmware_data_write(struct kobject *kobj,
92 struct bin_attribute *bin_attr,
91 char *buffer, loff_t offset, size_t count) 93 char *buffer, loff_t offset, size_t count)
92{ 94{
93 struct class_device *class_dev = to_class_dev(kobj); 95 struct class_device *class_dev = to_class_dev(kobj);
diff --git a/Documentation/hrtimer/timer_stats.txt b/Documentation/hrtimer/timer_stats.txt
index 27f782e3593f..22b0814d0ad0 100644
--- a/Documentation/hrtimer/timer_stats.txt
+++ b/Documentation/hrtimer/timer_stats.txt
@@ -2,9 +2,10 @@ timer_stats - timer usage statistics
2------------------------------------ 2------------------------------------
3 3
4timer_stats is a debugging facility to make the timer (ab)usage in a Linux 4timer_stats is a debugging facility to make the timer (ab)usage in a Linux
5system visible to kernel and userspace developers. It is not intended for 5system visible to kernel and userspace developers. If enabled in the config
6production usage as it adds significant overhead to the (hr)timer code and the 6but not used it has almost zero runtime overhead, and a relatively small
7(hr)timer data structures. 7data structure overhead. Even if collection is enabled at runtime, all the
8locking is per-CPU and lookup is hashed.
8 9
9timer_stats should be used by kernel and userspace developers to verify that 10timer_stats should be used by kernel and userspace developers to verify that
10their code does not make unduly use of timers. This helps to avoid unnecessary 11their code does not make unduly use of timers. This helps to avoid unnecessary
diff --git a/Documentation/i2c/busses/i2c-i801 b/Documentation/i2c/busses/i2c-i801
index c34f0db78a30..fe6406f2f9a6 100644
--- a/Documentation/i2c/busses/i2c-i801
+++ b/Documentation/i2c/busses/i2c-i801
@@ -5,8 +5,8 @@ Supported adapters:
5 '810' and '810E' chipsets) 5 '810' and '810E' chipsets)
6 * Intel 82801BA (ICH2 - part of the '815E' chipset) 6 * Intel 82801BA (ICH2 - part of the '815E' chipset)
7 * Intel 82801CA/CAM (ICH3) 7 * Intel 82801CA/CAM (ICH3)
8 * Intel 82801DB (ICH4) (HW PEC supported, 32 byte buffer not supported) 8 * Intel 82801DB (ICH4) (HW PEC supported)
9 * Intel 82801EB/ER (ICH5) (HW PEC supported, 32 byte buffer not supported) 9 * Intel 82801EB/ER (ICH5) (HW PEC supported)
10 * Intel 6300ESB 10 * Intel 6300ESB
11 * Intel 82801FB/FR/FW/FRW (ICH6) 11 * Intel 82801FB/FR/FW/FRW (ICH6)
12 * Intel 82801G (ICH7) 12 * Intel 82801G (ICH7)
diff --git a/Documentation/i2c/busses/i2c-piix4 b/Documentation/i2c/busses/i2c-piix4
index 7cbe43fa2701..fa0c786a8bf5 100644
--- a/Documentation/i2c/busses/i2c-piix4
+++ b/Documentation/i2c/busses/i2c-piix4
@@ -6,7 +6,7 @@ Supported adapters:
6 Datasheet: Publicly available at the Intel website 6 Datasheet: Publicly available at the Intel website
7 * ServerWorks OSB4, CSB5, CSB6 and HT-1000 southbridges 7 * ServerWorks OSB4, CSB5, CSB6 and HT-1000 southbridges
8 Datasheet: Only available via NDA from ServerWorks 8 Datasheet: Only available via NDA from ServerWorks
9 * ATI IXP200, IXP300, IXP400 and SB600 southbridges 9 * ATI IXP200, IXP300, IXP400, SB600 and SB700 southbridges
10 Datasheet: Not publicly available 10 Datasheet: Not publicly available
11 * Standard Microsystems (SMSC) SLC90E66 (Victory66) southbridge 11 * Standard Microsystems (SMSC) SLC90E66 (Victory66) southbridge
12 Datasheet: Publicly available at the SMSC website http://www.smsc.com 12 Datasheet: Publicly available at the SMSC website http://www.smsc.com
diff --git a/Documentation/i2c/busses/i2c-taos-evm b/Documentation/i2c/busses/i2c-taos-evm
new file mode 100644
index 000000000000..9146e33be6dd
--- /dev/null
+++ b/Documentation/i2c/busses/i2c-taos-evm
@@ -0,0 +1,46 @@
1Kernel driver i2c-taos-evm
2
3Author: Jean Delvare <khali@linux-fr.org>
4
5This is a driver for the evaluation modules for TAOS I2C/SMBus chips.
6The modules include an SMBus master with limited capabilities, which can
7be controlled over the serial port. Virtually all evaluation modules
8are supported, but a few lines of code need to be added for each new
9module to instantiate the right I2C chip on the bus. Obviously, a driver
10for the chip in question is also needed.
11
12Currently supported devices are:
13
14* TAOS TSL2550 EVM
15
16For additional information on TAOS products, please see
17 http://www.taosinc.com/
18
19
20Using this driver
21-----------------
22
23In order to use this driver, you'll need the serport driver, and the
24inputattach tool, which is part of the input-utils package. The following
25commands will tell the kernel that you have a TAOS EVM on the first
26serial port:
27
28# modprobe serport
29# inputattach --taos-evm /dev/ttyS0
30
31
32Technical details
33-----------------
34
35Only 4 SMBus transaction types are supported by the TAOS evaluation
36modules:
37* Receive Byte
38* Send Byte
39* Read Byte
40* Write Byte
41
42The communication protocol is text-based and pretty simple. It is
43described in a PDF document on the CD which comes with the evaluation
44module. The communication is rather slow, because the serial port has
45to operate at 1200 bps. However, I don't think this is a big concern in
46practice, as these modules are meant for evaluation and testing only.
diff --git a/Documentation/i2c/chips/max6875 b/Documentation/i2c/chips/max6875
index 96fec562a8e9..a0cd8af2f408 100644
--- a/Documentation/i2c/chips/max6875
+++ b/Documentation/i2c/chips/max6875
@@ -99,7 +99,7 @@ And then read the data
99 99
100 or 100 or
101 101
102 count = i2c_smbus_read_i2c_block_data(fd, 0x84, buffer); 102 count = i2c_smbus_read_i2c_block_data(fd, 0x84, 16, buffer);
103 103
104The block read should read 16 bytes. 104The block read should read 16 bytes.
1050x84 is the block read command. 1050x84 is the block read command.
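
What the extra length argument does can be sketched with the raw I2C_SMBUS
ioctl from <linux/i2c-dev.h>. This is an illustration, not the library
implementation: read_i2c_block() is a hypothetical name, and the caller is
assumed to have opened the /dev/i2c-N node and selected the chip address with
the I2C_SLAVE ioctl beforehand.

```c
#include <linux/i2c.h>
#include <linux/i2c-dev.h>
#include <sys/ioctl.h>

/*
 * Hypothetical helper: reads up to 'length' bytes with an I2C block read.
 * Returns the number of bytes read, or -1 on error.
 */
int read_i2c_block(int fd, unsigned char command, unsigned char length,
		   unsigned char *values)
{
	union i2c_smbus_data data;
	struct i2c_smbus_ioctl_data args;
	int i;

	if (length > I2C_SMBUS_BLOCK_MAX)
		length = I2C_SMBUS_BLOCK_MAX;
	/* The requested byte count travels in the first block byte. */
	data.block[0] = length;

	args.read_write = I2C_SMBUS_READ;
	args.command = command;
	args.size = I2C_SMBUS_I2C_BLOCK_DATA;
	args.data = &data;
	if (ioctl(fd, I2C_SMBUS, &args) < 0)
		return -1;

	/* Payload starts at block[1]; block[0] holds the count. */
	for (i = 1; i <= data.block[0]; i++)
		values[i - 1] = data.block[i];
	return data.block[0];
}
```

For the max6875 case above, the call would be
read_i2c_block(fd, 0x84, 16, buffer), and a return value of 16 indicates a
complete block.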
diff --git a/Documentation/i2c/chips/x1205 b/Documentation/i2c/chips/x1205
deleted file mode 100644
index 09407c991fe5..000000000000
--- a/Documentation/i2c/chips/x1205
+++ /dev/null
@@ -1,38 +0,0 @@
1Kernel driver x1205
2===================
3
4Supported chips:
5 * Xicor X1205 RTC
6 Prefix: 'x1205'
7 Addresses scanned: none
8 Datasheet: http://www.intersil.com/cda/deviceinfo/0,1477,X1205,00.html
9
10Authors:
11 Karen Spearel <kas11@tampabay.rr.com>,
12 Alessandro Zummo <a.zummo@towertech.it>
13
14Description
15-----------
16
17This module aims to provide complete access to the Xicor X1205 RTC.
18Recently Xicor has merged with Intersil, but the chip is
19still sold under the Xicor brand.
20
21This chip is located at address 0x6f and uses a 2-byte register addressing.
22Two bytes need to be written to read a single register, while most
23other chips just require one and take the second one as the data
24to be written. To prevent corrupting unknown chips, the user must
25explicitely set the probe parameter.
26
27example:
28
29modprobe x1205 probe=0,0x6f
30
31The module supports one more option, hctosys, which is used to set the
32software clock from the x1205. On systems where the x1205 is the
33only hardware rtc, this parameter could be used to achieve a correct
34date/time earlier in the system boot sequence.
35
36example:
37
38modprobe x1205 probe=0,0x6f hctosys=1
diff --git a/Documentation/i2c/summary b/Documentation/i2c/summary
index aea60bf7e8f0..003c7319b8c7 100644
--- a/Documentation/i2c/summary
+++ b/Documentation/i2c/summary
@@ -67,7 +67,6 @@ i2c-proc: The /proc/sys/dev/sensors interface for device (client) drivers
67Algorithm drivers 67Algorithm drivers
68----------------- 68-----------------
69 69
70i2c-algo-8xx: An algorithm for CPM's I2C device in Motorola 8xx processors (NOT BUILT BY DEFAULT)
71i2c-algo-bit: A bit-banging algorithm 70i2c-algo-bit: A bit-banging algorithm
72i2c-algo-pcf: A PCF 8584 style algorithm 71i2c-algo-pcf: A PCF 8584 style algorithm
73i2c-algo-ibm_ocp: An algorithm for the I2C device in IBM 4xx processors (NOT BUILT BY DEFAULT) 72i2c-algo-ibm_ocp: An algorithm for the I2C device in IBM 4xx processors (NOT BUILT BY DEFAULT)
@@ -81,6 +80,5 @@ i2c-pcf-epp: PCF8584 on a EPP parallel port (uses i2c-algo-pcf) (NOT mkpatch
81i2c-philips-par: Philips style parallel port adapter (uses i2c-algo-bit) 80i2c-philips-par: Philips style parallel port adapter (uses i2c-algo-bit)
82i2c-adap-ibm_ocp: IBM 4xx processor I2C device (uses i2c-algo-ibm_ocp) (NOT BUILT BY DEFAULT) 81i2c-adap-ibm_ocp: IBM 4xx processor I2C device (uses i2c-algo-ibm_ocp) (NOT BUILT BY DEFAULT)
83i2c-pport: Primitive parallel port adapter (uses i2c-algo-bit) 82i2c-pport: Primitive parallel port adapter (uses i2c-algo-bit)
84i2c-rpx: RPX board Motorola 8xx I2C device (uses i2c-algo-8xx) (NOT BUILT BY DEFAULT)
85i2c-velleman: Velleman K8000 parallel port adapter (uses i2c-algo-bit) 83i2c-velleman: Velleman K8000 parallel port adapter (uses i2c-algo-bit)
86 84
diff --git a/Documentation/i2c/writing-clients b/Documentation/i2c/writing-clients
index 3d8d36b0ad12..2c170032bf37 100644
--- a/Documentation/i2c/writing-clients
+++ b/Documentation/i2c/writing-clients
@@ -571,7 +571,7 @@ SMBus communication
571 u8 command, u8 length, 571 u8 command, u8 length,
572 u8 *values); 572 u8 *values);
573 extern s32 i2c_smbus_read_i2c_block_data(struct i2c_client * client, 573 extern s32 i2c_smbus_read_i2c_block_data(struct i2c_client * client,
574 u8 command, u8 *values); 574 u8 command, u8 length, u8 *values);
575 575
576These ones were removed in Linux 2.6.10 because they had no users, but could 576These ones were removed in Linux 2.6.10 because they had no users, but could
577be added back later if needed: 577be added back later if needed:
diff --git a/Documentation/i386/zero-page.txt b/Documentation/i386/zero-page.txt
index c04a421f4a7c..75b3680c41eb 100644
--- a/Documentation/i386/zero-page.txt
+++ b/Documentation/i386/zero-page.txt
@@ -37,6 +37,7 @@ Offset Type Description
370x1d0 unsigned long EFI memory descriptor map pointer 370x1d0 unsigned long EFI memory descriptor map pointer
380x1d4 unsigned long EFI memory descriptor map size 380x1d4 unsigned long EFI memory descriptor map size
390x1e0 unsigned long ALT_MEM_K, alternative mem check, in Kb 390x1e0 unsigned long ALT_MEM_K, alternative mem check, in Kb
400x1e4 unsigned long Scratch field for the kernel setup code
400x1e8 char number of entries in E820MAP (below) 410x1e8 char number of entries in E820MAP (below)
410x1e9 unsigned char number of entries in EDDBUF (below) 420x1e9 unsigned char number of entries in EDDBUF (below)
420x1ea unsigned char number of entries in EDD_MBR_SIG_BUFFER (below) 430x1ea unsigned char number of entries in EDD_MBR_SIG_BUFFER (below)
diff --git a/Documentation/ia64/aliasing-test.c b/Documentation/ia64/aliasing-test.c
index 3153167b41c3..773a814d4093 100644
--- a/Documentation/ia64/aliasing-test.c
+++ b/Documentation/ia64/aliasing-test.c
@@ -19,6 +19,7 @@
19#include <sys/mman.h> 19#include <sys/mman.h>
20#include <sys/stat.h> 20#include <sys/stat.h>
21#include <unistd.h> 21#include <unistd.h>
22#include <linux/pci.h>
22 23
23int sum; 24int sum;
24 25
@@ -34,13 +35,19 @@ int map_mem(char *path, off_t offset, size_t length, int touch)
34 return -1; 35 return -1;
35 } 36 }
36 37
38 if (fnmatch("/proc/bus/pci/*", path, 0) == 0) {
39 rc = ioctl(fd, PCIIOC_MMAP_IS_MEM);
40 if (rc == -1)
41 perror("PCIIOC_MMAP_IS_MEM ioctl");
42 }
43
37 addr = mmap(NULL, length, PROT_READ|PROT_WRITE, MAP_SHARED, fd, offset); 44 addr = mmap(NULL, length, PROT_READ|PROT_WRITE, MAP_SHARED, fd, offset);
38 if (addr == MAP_FAILED) 45 if (addr == MAP_FAILED)
39 return 1; 46 return 1;
40 47
41 if (touch) { 48 if (touch) {
42 c = (int *) addr; 49 c = (int *) addr;
43 while (c < (int *) (offset + length)) 50 while (c < (int *) (addr + length))
44 sum += *c++; 51 sum += *c++;
45 } 52 }
46 53
@@ -54,7 +61,7 @@ int map_mem(char *path, off_t offset, size_t length, int touch)
54 return 0; 61 return 0;
55} 62}
56 63
57int scan_sysfs(char *path, char *file, off_t offset, size_t length, int touch) 64int scan_tree(char *path, char *file, off_t offset, size_t length, int touch)
58{ 65{
59 struct dirent **namelist; 66 struct dirent **namelist;
60 char *name, *path2; 67 char *name, *path2;
@@ -93,7 +100,7 @@ int scan_sysfs(char *path, char *file, off_t offset, size_t length, int touch)
93 } else { 100 } else {
94 r = lstat(path2, &buf); 101 r = lstat(path2, &buf);
95 if (r == 0 && S_ISDIR(buf.st_mode)) { 102 if (r == 0 && S_ISDIR(buf.st_mode)) {
96 rc = scan_sysfs(path2, file, offset, length, touch); 103 rc = scan_tree(path2, file, offset, length, touch);
97 if (rc < 0) 104 if (rc < 0)
98 return rc; 105 return rc;
99 } 106 }
@@ -197,7 +204,7 @@ skip:
197 return rc; 204 return rc;
198} 205}
199 206
200main() 207int main()
201{ 208{
202 int rc; 209 int rc;
203 210
@@ -238,10 +245,15 @@ main()
238 else 245 else
239 fprintf(stderr, "FAIL: /dev/mem 0x0-0x100000 not accessible\n"); 246 fprintf(stderr, "FAIL: /dev/mem 0x0-0x100000 not accessible\n");
240 247
241 scan_sysfs("/sys/class/pci_bus", "legacy_mem", 0, 0xA0000, 1); 248 scan_tree("/sys/class/pci_bus", "legacy_mem", 0, 0xA0000, 1);
242 scan_sysfs("/sys/class/pci_bus", "legacy_mem", 0xA0000, 0x20000, 0); 249 scan_tree("/sys/class/pci_bus", "legacy_mem", 0xA0000, 0x20000, 0);
243 scan_sysfs("/sys/class/pci_bus", "legacy_mem", 0xC0000, 0x40000, 1); 250 scan_tree("/sys/class/pci_bus", "legacy_mem", 0xC0000, 0x40000, 1);
244 scan_sysfs("/sys/class/pci_bus", "legacy_mem", 0, 1024*1024, 0); 251 scan_tree("/sys/class/pci_bus", "legacy_mem", 0, 1024*1024, 0);
245 252
246 scan_rom("/sys/devices", "rom"); 253 scan_rom("/sys/devices", "rom");
254
255 scan_tree("/proc/bus/pci", "??.?", 0, 0xA0000, 1);
256 scan_tree("/proc/bus/pci", "??.?", 0xA0000, 0x20000, 0);
257 scan_tree("/proc/bus/pci", "??.?", 0xC0000, 0x40000, 1);
258 scan_tree("/proc/bus/pci", "??.?", 0, 1024*1024, 0);
247} 259}
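The loop-bound fix in map_mem() above replaces (offset + length) with (addr + length): the end of the region must be computed from the address returned by mmap(), not from the file offset. A minimal sketch of the corrected walk, isolated from the test program, might look like the following. The name sum_region is hypothetical and not part of aliasing-test.c, and the sketch assumes length is a multiple of sizeof(int).

```c
#include <stddef.h>

/*
 * Hypothetical helper (not in aliasing-test.c): sum every int in the
 * mapped region [addr, addr + length). The end pointer is derived from
 * the mapping address, which is the point of the fix in the hunk above.
 * The cast to char * avoids relying on void * arithmetic, a GNU extension.
 */
long sum_region(const void *addr, size_t length)
{
	const int *c = addr;
	const int *end = (const int *)((const char *)addr + length);
	long sum = 0;

	while (c < end)
		sum += *c++;

	return sum;
}
```

With the old bound, the loop compared a pointer into the mapping against a value built from the file offset, so it would typically touch nothing at all or run past the end of the mapping.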
diff --git a/Documentation/ia64/aliasing.txt b/Documentation/ia64/aliasing.txt
index 9a431a7d0f5d..aa3e953f0f7b 100644
--- a/Documentation/ia64/aliasing.txt
+++ b/Documentation/ia64/aliasing.txt
@@ -112,6 +112,18 @@ POTENTIAL ATTRIBUTE ALIASING CASES
112 112
113 The /dev/mem mmap constraints apply. 113 The /dev/mem mmap constraints apply.
114 114
115 mmap of /proc/bus/pci/.../??.?
116
117 This is an MMIO mmap of PCI functions, which additionally may or
118 may not be requested as using the WC attribute.
119
120 If WC is requested, and the region in kern_memmap is either WC
121 or UC, and the EFI memory map designates the region as WC, then
122 the WC mapping is allowed.
123
124 Otherwise, the user mapping must use the same attribute as the
125 kernel mapping.
126
115 read/write of /dev/mem 127 read/write of /dev/mem
116 128
117 This uses copy_from_user(), which implicitly uses a kernel 129 This uses copy_from_user(), which implicitly uses a kernel
diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index aae2282600ca..4d880b3d1f35 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -170,7 +170,10 @@ and is between 256 and 4096 characters. It is defined in the file
170 acpi_os_name= [HW,ACPI] Tell ACPI BIOS the name of the OS 170 acpi_os_name= [HW,ACPI] Tell ACPI BIOS the name of the OS
171 Format: To spoof as Windows 98: ="Microsoft Windows" 171 Format: To spoof as Windows 98: ="Microsoft Windows"
172 172
173 acpi_osi= [HW,ACPI] empty param disables _OSI 173 acpi_osi= [HW,ACPI] Modify list of supported OS interface strings
174 acpi_osi="string1" # add string1 -- only one string
175 acpi_osi="!string2" # remove built-in string2
176 acpi_osi= # disable all strings
174 177
175 acpi_serialize [HW,ACPI] force serialization of AML methods 178 acpi_serialize [HW,ACPI] force serialization of AML methods
176 179
@@ -220,11 +223,6 @@ and is between 256 and 4096 characters. It is defined in the file
220 223
221 acpi_fake_ecdt [HW,ACPI] Workaround failure due to BIOS lacking ECDT 224 acpi_fake_ecdt [HW,ACPI] Workaround failure due to BIOS lacking ECDT
222 225
223 acpi_generic_hotkey [HW,ACPI]
224 Allow consolidated generic hotkey driver to
225 override platform specific driver.
226 See also Documentation/acpi-hotkey.txt.
227
228 acpi_pm_good [IA-32,X86-64] 226 acpi_pm_good [IA-32,X86-64]
229 Override the pmtimer bug detection: force the kernel 227 Override the pmtimer bug detection: force the kernel
230 to assume that this machine's pmtimer latches its value 228 to assume that this machine's pmtimer latches its value
@@ -1016,49 +1014,6 @@ and is between 256 and 4096 characters. It is defined in the file
1016 1014
1017 mga= [HW,DRM] 1015 mga= [HW,DRM]
1018 1016
1019 migration_cost=
1020 [KNL,SMP] debug: override scheduler migration costs
1021 Format: <level-1-usecs>,<level-2-usecs>,...
1022 This debugging option can be used to override the
1023 default scheduler migration cost matrix. The numbers
1024 are indexed by 'CPU domain distance'.
1025 E.g. migration_cost=1000,2000,3000 on an SMT NUMA
1026 box will set up an intra-core migration cost of
1027 1 msec, an inter-core migration cost of 2 msecs,
1028 and an inter-node migration cost of 3 msecs.
1029
1030 WARNING: using the wrong values here can break
1031 scheduler performance, so it's only for scheduler
1032 development purposes, not production environments.
1033
1034 migration_debug=
1035 [KNL,SMP] migration cost auto-detect verbosity
1036 Format=<0|1|2>
1037 If a system's migration matrix reported at bootup
1038 seems erroneous then this option can be used to
1039 increase verbosity of the detection process.
1040 We default to 0 (no extra messages), 1 will print
1041 some more information, and 2 will be really
1042 verbose (probably only useful if you also have a
1043 serial console attached to the system).
1044
1045 migration_factor=
1046 [KNL,SMP] multiply/divide migration costs by a factor
1047 Format=<percent>
1048 This debug option can be used to proportionally
1049 increase or decrease the auto-detected migration
1050 costs for all entries of the migration matrix.
1051 E.g. migration_factor=150 will increase migration
1052 costs by 50%. (and thus the scheduler will be less
1053 eager migrating cache-hot tasks)
1054 migration_factor=80 will decrease migration costs
1055 by 20%. (thus the scheduler will be more eager to
1056 migrate tasks)
1057
1058 WARNING: using the wrong values here can break
1059 scheduler performance, so it's only for scheduler
1060 development purposes, not production environments.
1061
1062 mousedev.tap_time= 1017 mousedev.tap_time=
1063 [MOUSE] Maximum time between finger touching and 1018 [MOUSE] Maximum time between finger touching and
1064 leaving touchpad surface for touch to be considered 1019 leaving touchpad surface for touch to be considered
@@ -1132,9 +1087,9 @@ and is between 256 and 4096 characters. It is defined in the file
1132 when set. 1087 when set.
1133 Format: <int> 1088 Format: <int>
1134 1089
1135 noaliencache [MM, NUMA] Disables the allcoation of alien caches in 1090 noaliencache [MM, NUMA, SLAB] Disables the allocation of alien
1136 the slab allocator. Saves per-node memory, but will 1091 caches in the slab allocator. Saves per-node memory,
1137 impact performance on real NUMA hardware. 1092 but will impact performance.
1138 1093
1139 noalign [KNL,ARM] 1094 noalign [KNL,ARM]
1140 1095
@@ -1613,6 +1568,37 @@ and is between 256 and 4096 characters. It is defined in the file
1613 1568
1614 slram= [HW,MTD] 1569 slram= [HW,MTD]
1615 1570
1571 slub_debug [MM, SLUB]
1572 Enabling slub_debug allows one to determine the culprit
1573 if slab objects become corrupted. Enabling slub_debug
1574 creates guard zones around objects and poisons objects
1575 when not in use. Also tracks the last alloc / free.
1576 For more information see Documentation/vm/slub.txt.
1577
1578 slub_max_order= [MM, SLUB]
1579 Determines the maximum allowed order for slabs. Setting
1580 this too high may cause fragmentation.
1581 For more information see Documentation/vm/slub.txt.
1582
1583 slub_min_objects= [MM, SLUB]
1584 The minimum objects per slab. SLUB will increase the
1585 slab order up to slub_max_order to generate a
1586 sufficiently big slab to satisfy the number of objects.
1587 The higher the number of objects the smaller the overhead
1588 of tracking slabs.
1589 For more information see Documentation/vm/slub.txt.
1590
1591 slub_min_order= [MM, SLUB]
1592 Determines the minimum page order for slabs. Must be
1593 lower than slub_max_order.
1594 For more information see Documentation/vm/slub.txt.
1595
1596 slub_nomerge [MM, SLUB]
1597 Disable merging of slabs of similar size. May be
1598 necessary if there is some reason to distinguish
1599 allocs to different slabs.
1600 For more information see Documentation/vm/slub.txt.
1601
1616 smart2= [HW] 1602 smart2= [HW]
1617 Format: <io1>[,<io2>[,...,<io8>]] 1603 Format: <io1>[,<io2>[,...,<io8>]]
1618 1604
diff --git a/Documentation/networking/00-INDEX b/Documentation/networking/00-INDEX
index e06b6e3c1db5..d63f480afb74 100644
--- a/Documentation/networking/00-INDEX
+++ b/Documentation/networking/00-INDEX
@@ -32,6 +32,8 @@ cops.txt
32 - info on the COPS LocalTalk Linux driver 32 - info on the COPS LocalTalk Linux driver
33cs89x0.txt 33cs89x0.txt
34 - the Crystal LAN (CS8900/20-based) Ethernet ISA adapter driver 34 - the Crystal LAN (CS8900/20-based) Ethernet ISA adapter driver
35cxacru.txt
36 - Conexant AccessRunner USB ADSL Modem
35de4x5.txt 37de4x5.txt
36 - the Digital EtherWORKS DE4?? and DE5?? PCI Ethernet driver 38 - the Digital EtherWORKS DE4?? and DE5?? PCI Ethernet driver
37decnet.txt 39decnet.txt
@@ -94,9 +96,6 @@ routing.txt
94 - the new routing mechanism 96 - the new routing mechanism
95shaper.txt 97shaper.txt
96 - info on the module that can shape/limit transmitted traffic. 98 - info on the module that can shape/limit transmitted traffic.
97sk98lin.txt
98 - Marvell Yukon Chipset / SysKonnect SK-98xx compliant Gigabit
99 Ethernet Adapter family driver info
100skfp.txt 99skfp.txt
101 - SysKonnect FDDI (SK-5xxx, Compaq Netelligent) driver info. 100 - SysKonnect FDDI (SK-5xxx, Compaq Netelligent) driver info.
102smc9.txt 101smc9.txt
diff --git a/Documentation/networking/cxacru.txt b/Documentation/networking/cxacru.txt
new file mode 100644
index 000000000000..b074681a963e
--- /dev/null
+++ b/Documentation/networking/cxacru.txt
@@ -0,0 +1,84 @@
1Firmware is required for this device: http://accessrunner.sourceforge.net/
2
3While it is capable of managing/maintaining the ADSL connection without the
4module loaded, the device will sometimes stop responding after unloading the
5driver and it is necessary to unplug/remove power to the device to fix this.
6
7Detected devices will appear as ATM devices named "cxacru". In /sys/class/atm/
8these are directories named cxacruN where N is the device number. A symlink
9named device points to the USB interface device's directory which contains
10several sysfs attribute files for retrieving device statistics:
11
12* adsl_controller_version
13
14* adsl_headend
15* adsl_headend_environment
16 Information about the remote headend.
17
18* downstream_attenuation (dB)
19* downstream_bits_per_frame
20* downstream_rate (kbps)
21* downstream_snr_margin (dB)
22 Downstream stats.
23
24* upstream_attenuation (dB)
25* upstream_bits_per_frame
26* upstream_rate (kbps)
27* upstream_snr_margin (dB)
28* transmitter_power (dBm/Hz)
29 Upstream stats.
30
31* downstream_crc_errors
32* downstream_fec_errors
33* downstream_hec_errors
34* upstream_crc_errors
35* upstream_fec_errors
36* upstream_hec_errors
37 Error counts.
38
39* line_startable
40 Indicates that ADSL support on the device
41 is/can be enabled, see adsl_start.
42
43* line_status
44 "initialising"
45 "down"
46 "attempting to activate"
47 "training"
48 "channel analysis"
49 "exchange"
50 "waiting"
51 "up"
52
53 Changes between "down" and "attempting to activate"
54 if there is no signal.
55
56* link_status
57 "not connected"
58 "connected"
59 "lost"
60
61* mac_address
62
63* modulation
64 "ANSI T1.413"
65 "ITU-T G.992.1 (G.DMT)"
66 "ITU-T G.992.2 (G.LITE)"
67
68* startup_attempts
69 Count of total attempts to initialise ADSL.
70
71To enable/disable ADSL, the following can be written to the adsl_state file:
72 "start"
73 "stop"
74 "restart" (stops, waits 1.5s, then starts)
75 "poll" (used to resume status polling if it was disabled due to failure)
76
77Changes in adsl/line state are reported via kernel log messages:
78 [4942145.150704] ATM dev 0: ADSL state: running
79 [4942243.663766] ATM dev 0: ADSL line: down
80 [4942249.665075] ATM dev 0: ADSL line: attempting to activate
81 [4942253.654954] ATM dev 0: ADSL line: training
82 [4942255.666387] ATM dev 0: ADSL line: channel analysis
83 [4942259.656262] ATM dev 0: ADSL line: exchange
84 [2635357.696901] ATM dev 0: ADSL line: up (8128 kb/s down | 832 kb/s up)
diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt
index af6a63ab9026..32c2e9da5f3a 100644
--- a/Documentation/networking/ip-sysctl.txt
+++ b/Documentation/networking/ip-sysctl.txt
@@ -433,6 +433,12 @@ tcp_workaround_signed_windows - BOOLEAN
433 not receive a window scaling option from them. 433 not receive a window scaling option from them.
434 Default: 0 434 Default: 0
435 435
436tcp_dma_copybreak - INTEGER
437 Lower limit, in bytes, of the size of socket reads that will be
438 offloaded to a DMA copy engine, if one is present in the system
439 and CONFIG_NET_DMA is enabled.
440 Default: 4096
441
436CIPSOv4 Variables: 442CIPSOv4 Variables:
437 443
438cipso_cache_enable - BOOLEAN 444cipso_cache_enable - BOOLEAN
@@ -874,8 +880,7 @@ accept_redirects - BOOLEAN
874accept_source_route - INTEGER 880accept_source_route - INTEGER
875 Accept source routing (routing extension header). 881 Accept source routing (routing extension header).
876 882
877 > 0: Accept routing header. 883 >= 0: Accept only routing header type 2.
878 = 0: Accept only routing header type 2.
879 < 0: Do not accept routing header. 884 < 0: Do not accept routing header.
880 885
881 Default: 0 886 Default: 0
diff --git a/Documentation/networking/l2tp.txt b/Documentation/networking/l2tp.txt
new file mode 100644
index 000000000000..2451f551c505
--- /dev/null
+++ b/Documentation/networking/l2tp.txt
@@ -0,0 +1,169 @@
1This brief document describes how to use the kernel's PPPoL2TP driver
2to provide L2TP functionality. L2TP is a protocol that tunnels one or
3more PPP sessions over a UDP tunnel. It is commonly used for VPNs
4(L2TP/IPSec) and by ISPs to tunnel subscriber PPP sessions over an IP
5network infrastructure.
6
7Design
8======
9
10The PPPoL2TP driver, drivers/net/pppol2tp.c, provides a mechanism by
11which PPP frames carried through an L2TP session are passed through
12the kernel's PPP subsystem. The standard PPP daemon, pppd, handles all
13PPP interaction with the peer. PPP network interfaces are created for
14each local PPP endpoint.
15
16The L2TP protocol http://www.faqs.org/rfcs/rfc2661.html defines L2TP
17control and data frames. L2TP control frames carry messages between
18L2TP clients/servers and are used to set up / tear down tunnels and
19sessions. An L2TP client or server is implemented in userspace and
20will use a regular UDP socket per tunnel. L2TP data frames carry PPP
21frames, which may be PPP control or PPP data. The kernel's PPP
22subsystem arranges for PPP control frames to be delivered to pppd,
23while data frames are forwarded as usual.
24
25Each tunnel and session within a tunnel is assigned a unique tunnel_id
26and session_id. These ids are carried in the L2TP header of every
27control and data packet. The pppol2tp driver uses them to look up
28internal tunnel and/or session contexts. Zero tunnel / session ids are
29treated specially - zero ids are never assigned to tunnels or sessions
30in the network. In the driver, the tunnel context keeps a pointer to
31the tunnel UDP socket. The session context keeps a pointer to the
32PPPoL2TP socket, as well as other data that lets the driver interface
33to the kernel PPP subsystem.
34
35Note that the pppol2tp kernel driver handles only L2TP data frames;
36L2TP control frames are simply passed up to userspace in the UDP
37tunnel socket. The kernel handles all datapath aspects of the
38protocol, including data packet resequencing (if enabled).
39
40There are a number of requirements on the userspace L2TP daemon in
41order to use the pppol2tp driver.
42
431. Use a UDP socket per tunnel.
44
452. Create a single PPPoL2TP socket per tunnel bound to a special null
46 session id. This is used only for communicating with the driver but
47 must remain open while the tunnel is active. Opening this tunnel
48 management socket causes the driver to mark the tunnel socket as an
49 L2TP UDP encapsulation socket and flags it for use by the
50 referenced tunnel id. This hooks up the UDP receive path via
51 udp_encap_rcv() in net/ipv4/udp.c. PPP data frames are never passed
52 in this special PPPoX socket.
53
543. Create a PPPoL2TP socket per L2TP session. This is typically done
55 by starting pppd with the pppol2tp plugin and appropriate
56 arguments. A PPPoL2TP tunnel management socket (Step 2) must be
57 created before the first PPPoL2TP session socket is created.
58
59When creating PPPoL2TP sockets, the application provides information
60to the driver about the socket in a socket connect() call. Source and
61destination tunnel and session ids are provided, as well as the file
62descriptor of a UDP socket. See struct pppol2tp_addr in
63include/linux/if_ppp.h. Note that zero tunnel / session ids are
64treated specially. When creating the per-tunnel PPPoL2TP management
65socket in Step 2 above, zero source and destination session ids are
66specified, which tells the driver to prepare the supplied UDP file
67descriptor for use as an L2TP tunnel socket.
68
69Userspace may control behavior of the tunnel or session using
70setsockopt and ioctl on the PPPoX socket. The following socket
71options are supported:-
72
73DEBUG - bitmask of debug message categories. See below.
74SENDSEQ - 0 => don't send packets with sequence numbers
75 1 => send packets with sequence numbers
76RECVSEQ - 0 => receive packet sequence numbers are optional
77 1 => drop receive packets without sequence numbers
78LNSMODE - 0 => act as LAC.
79 1 => act as LNS.
80REORDERTO - reorder timeout (in millisecs). If 0, don't try to reorder.
81
82Only the DEBUG option is supported by the special tunnel management
83PPPoX socket.
84
85In addition to the standard PPP ioctls, a PPPIOCGL2TPSTATS is provided
86to retrieve tunnel and session statistics from the kernel using the
87PPPoX socket of the appropriate tunnel or session.
88
89Debugging
90=========
91
92The driver supports a flexible debug scheme where kernel trace
93messages may be optionally enabled per tunnel and per session. Care is
94needed when debugging a live system since the messages are not
95rate-limited and a busy system could be swamped. Userspace uses
96setsockopt on the PPPoX socket to set a debug mask.
97
98The following debug mask bits are available:
99
100PPPOL2TP_MSG_DEBUG verbose debug (if compiled in)
101PPPOL2TP_MSG_CONTROL userspace - kernel interface
102PPPOL2TP_MSG_SEQ sequence numbers handling
103PPPOL2TP_MSG_DATA data packets
104
105Sample Userspace Code
106=====================
107
1081. Create tunnel management PPPoX socket
109
110 kernel_fd = socket(AF_PPPOX, SOCK_DGRAM, PX_PROTO_OL2TP);
111 if (kernel_fd >= 0) {
112 struct sockaddr_pppol2tp sax;
113 struct sockaddr_in const *peer_addr;
114
115 peer_addr = l2tp_tunnel_get_peer_addr(tunnel);
116 memset(&sax, 0, sizeof(sax));
117 sax.sa_family = AF_PPPOX;
118 sax.sa_protocol = PX_PROTO_OL2TP;
119 sax.pppol2tp.fd = udp_fd; /* fd of tunnel UDP socket */
120 sax.pppol2tp.addr.sin_addr.s_addr = peer_addr->sin_addr.s_addr;
121 sax.pppol2tp.addr.sin_port = peer_addr->sin_port;
122 sax.pppol2tp.addr.sin_family = AF_INET;
123 sax.pppol2tp.s_tunnel = tunnel_id;
124 sax.pppol2tp.s_session = 0; /* special case: mgmt socket */
125 sax.pppol2tp.d_tunnel = 0;
126 sax.pppol2tp.d_session = 0; /* special case: mgmt socket */
127
128 if(connect(kernel_fd, (struct sockaddr *)&sax, sizeof(sax) ) < 0 ) {
129 perror("connect failed");
130 result = -errno;
131 goto err;
132 }
133 }
134
1352. Create session PPPoX data socket
136
137 struct sockaddr_pppol2tp sax;
138 int fd;
139
140 /* Note, the target socket must be bound already, else it will not be ready */
141 sax.sa_family = AF_PPPOX;
142 sax.sa_protocol = PX_PROTO_OL2TP;
143 sax.pppol2tp.fd = tunnel_fd;
144 sax.pppol2tp.addr.sin_addr.s_addr = addr->sin_addr.s_addr;
145 sax.pppol2tp.addr.sin_port = addr->sin_port;
146 sax.pppol2tp.addr.sin_family = AF_INET;
147 sax.pppol2tp.s_tunnel = tunnel_id;
148 sax.pppol2tp.s_session = session_id;
149 sax.pppol2tp.d_tunnel = peer_tunnel_id;
150 sax.pppol2tp.d_session = peer_session_id;
151
152 /* session_fd is the fd of the session's PPPoL2TP socket.
153 * tunnel_fd is the fd of the tunnel UDP socket.
154 */
155 fd = connect(session_fd, (struct sockaddr *)&sax, sizeof(sax));
156 if (fd < 0 ) {
157 return -errno;
158 }
159 return 0;
160
161Miscellaneous
162=============
163
164The PPPoL2TP driver was developed as part of the OpenL2TP project by
165Katalix Systems Ltd. OpenL2TP is a full-featured L2TP client / server,
166designed from the ground up to have the L2TP datapath in the
167kernel. The project also implemented the pppol2tp plugin for pppd
168which allows pppd to use the kernel driver. Details can be found at
169http://openl2tp.sourceforge.net.
diff --git a/Documentation/networking/mac80211-injection.txt b/Documentation/networking/mac80211-injection.txt
new file mode 100644
index 000000000000..53ef7a06f49c
--- /dev/null
+++ b/Documentation/networking/mac80211-injection.txt
@@ -0,0 +1,59 @@
1How to use packet injection with mac80211
2=========================================
3
4mac80211 now allows arbitrary packets to be injected down any Monitor Mode
5interface from userland. The packet you inject needs to be composed in the
6following format:
7
8 [ radiotap header ]
9 [ ieee80211 header ]
10 [ payload ]
11
12The radiotap format is discussed in
13./Documentation/networking/radiotap-headers.txt.
14
15Although 13 radiotap argument types are currently defined, most only make sense
16to appear on received packets. Currently three kinds of argument are used by
17the injection code, although it knows to skip any other arguments that are
18present (facilitating replay of captured radiotap headers directly):
19
20 - IEEE80211_RADIOTAP_RATE - u8 arg in 500kbps units (0x02 --> 1Mbps)
21
22 - IEEE80211_RADIOTAP_ANTENNA - u8 arg, 0x00 = ant1, 0x01 = ant2
23
24 - IEEE80211_RADIOTAP_DBM_TX_POWER - u8 arg, dBm
25
26Here is an example valid radiotap header defining these three parameters
27
28 0x00, 0x00, // <-- radiotap version
29 0x0b, 0x00, // <- radiotap header length
30 0x04, 0x0c, 0x00, 0x00, // <-- bitmap
31 0x6c, // <-- rate
32 0x0c, //<-- tx power
33 0x01 //<-- antenna
34
35The ieee80211 header follows immediately afterwards, looking for example like
36this:
37
38 0x08, 0x01, 0x00, 0x00,
39 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF,
40 0x13, 0x22, 0x33, 0x44, 0x55, 0x66,
41 0x13, 0x22, 0x33, 0x44, 0x55, 0x66,
42 0x10, 0x86
43
44Then lastly there is the payload.
45
46After composing the packet contents, it is sent by send()-ing it to a logical
47mac80211 interface that is in Monitor mode. Libpcap can also be used,
48(which is easier than doing the work to bind the socket to the right
49interface), along the following lines:
50
51 ppcap = pcap_open_live(szInterfaceName, 800, 1, 20, szErrbuf);
52...
53 r = pcap_inject(ppcap, u8aSendBuffer, nLength);
54
55You can also find sources for a complete inject test applet here:
56
57http://penumbra.warmcat.com/_twk/tiki-index.php?page=packetspammer
58
59Andy Green <andy@warmcat.com>
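The 11-byte example header in mac80211-injection.txt above can be reproduced programmatically. The following is a minimal sketch, not code from the patch: the helper name build_inject_rtap and the RTAP_* names are hypothetical, and the bit numbers (2, 10, 11) are assumed to match IEEE80211_RADIOTAP_RATE, IEEE80211_RADIOTAP_DBM_TX_POWER and IEEE80211_RADIOTAP_ANTENNA in include/net/ieee80211_radiotap.h. Arguments are emitted in ascending bit order, and the multi-byte header fields are little-endian.

```c
#include <stddef.h>
#include <stdint.h>

/* Radiotap "present" bit numbers (assumed to match the kernel header). */
enum {
	RTAP_RATE         = 2,	/* u8, units of 500 kbps */
	RTAP_DBM_TX_POWER = 10,	/* one byte, dBm */
	RTAP_ANTENNA      = 11,	/* u8, antenna index */
};

#define RTAP_INJECT_HDR_LEN 11	/* 8 fixed bytes + 3 one-byte arguments */

/*
 * Hypothetical helper: write a radiotap header carrying rate, tx power
 * and antenna into buf. Returns the header length, or 0 if buf is too
 * small.
 */
size_t build_inject_rtap(uint8_t *buf, size_t buflen, uint8_t rate_500kbps,
			 uint8_t tx_power_dbm, uint8_t antenna)
{
	uint32_t present = (1u << RTAP_RATE) |
			   (1u << RTAP_DBM_TX_POWER) |
			   (1u << RTAP_ANTENNA);

	if (buflen < RTAP_INJECT_HDR_LEN)
		return 0;

	buf[0] = 0x00;				/* radiotap version */
	buf[1] = 0x00;				/* padding */
	buf[2] = RTAP_INJECT_HDR_LEN & 0xff;	/* header length, low byte */
	buf[3] = RTAP_INJECT_HDR_LEN >> 8;	/* header length, high byte */
	buf[4] = present & 0xff;		/* bitmap, little-endian */
	buf[5] = (present >> 8) & 0xff;
	buf[6] = (present >> 16) & 0xff;
	buf[7] = (present >> 24) & 0xff;
	buf[8] = rate_500kbps;			/* bit 2 argument */
	buf[9] = tx_power_dbm;			/* bit 10 argument */
	buf[10] = antenna;			/* bit 11 argument */

	return RTAP_INJECT_HDR_LEN;
}
```

Calling build_inject_rtap(buf, sizeof(buf), 0x6c, 0x0c, 0x01) yields the byte sequence shown in the document: a rate of 0x6c is 108 x 500 kbps = 54 Mbps, with 12 dBm tx power on the second antenna.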
diff --git a/Documentation/networking/multiqueue.txt b/Documentation/networking/multiqueue.txt
new file mode 100644
index 000000000000..00b60cce2224
--- /dev/null
+++ b/Documentation/networking/multiqueue.txt
@@ -0,0 +1,111 @@
1
2 HOWTO for multiqueue network device support
3 ===========================================
4
5Section 1: Base driver requirements for implementing multiqueue support
6Section 2: Qdisc support for multiqueue devices
7Section 3: Brief howto using PRIO or RR for multiqueue devices
8
9
10Intro: Kernel support for multiqueue devices
11---------------------------------------------------------
12
13Kernel support for multiqueue devices is only an API that is presented to the
14netdevice layer for base drivers to implement. This feature is part of the
15core networking stack, and all network devices will be running on the
16multiqueue-aware stack. If a base driver only has one queue, then these
17changes are transparent to that driver.
18
19
20Section 1: Base driver requirements for implementing multiqueue support
21-----------------------------------------------------------------------
22
23Base drivers are required to use the new alloc_etherdev_mq() or
24alloc_netdev_mq() functions to allocate the subqueues for the device. The
25underlying kernel API will take care of the allocation and deallocation of
26the subqueue memory, as well as netdev configuration of where the queues
27exist in memory.
28
29The base driver will also need to manage the queues as it does the global
30netdev->queue_lock today. Therefore base drivers should use the
31netif_{start|stop|wake}_subqueue() functions to manage each queue while the
32device is still operational. netdev->queue_lock is still used when the device
33comes online or when it's completely shut down (unregister_netdev(), etc.).
34
35Finally, the base driver should indicate that it is a multiqueue device. The
36feature flag NETIF_F_MULTI_QUEUE should be added to the netdev->features
37bitmap on device initialization. Below is an example from e1000:
38
39#ifdef CONFIG_E1000_MQ
40 if ( (adapter->hw.mac.type == e1000_82571) ||
41 (adapter->hw.mac.type == e1000_82572) ||
42 (adapter->hw.mac.type == e1000_80003es2lan))
43 netdev->features |= NETIF_F_MULTI_QUEUE;
44#endif
45
46
47Section 2: Qdisc support for multiqueue devices
48-----------------------------------------------
49
50Currently two qdiscs support multiqueue devices: a new round-robin qdisc,
51sch_rr, and sch_prio. The qdisc is responsible for classifying the skb's to
52bands and queues, and will store the queue mapping into skb->queue_mapping.
53Use this field in the base driver to determine which queue to send the skb
54to.
55
56sch_rr has been added for hardware that doesn't want scheduling policies from
57software, so it's a straight round-robin qdisc. It uses the same syntax and
58classification priomap that sch_prio uses, so it should be intuitive to
59configure for people who've used sch_prio.
60
61The PRIO qdisc naturally plugs into a multiqueue device. If PRIO has been
62built with NET_SCH_PRIO_MQ, then upon load, it will make sure the number of
63bands requested is equal to the number of queues on the hardware. If they
64are equal, it sets a one-to-one mapping up between the queues and bands. If
65they're not equal, it will not load the qdisc. This is the same behavior
66for RR. Once the association is made, any skb that is classified will have
67skb->queue_mapping set, which will allow the driver to properly queue skb's
68to multiple queues.
69
70
71Section 3: Brief howto using PRIO and RR for multiqueue devices
72---------------------------------------------------------------
73
74The userspace command 'tc,' part of the iproute2 package, is used to configure
75qdiscs. To add the PRIO qdisc to your network device, assuming the device is
76called eth0, run the following command:
77
78# tc qdisc add dev eth0 root handle 1: prio bands 4 multiqueue
79
80This will create 4 bands, 0 being highest priority, and associate those bands
81to the queues on your NIC. Assuming eth0 has 4 Tx queues, the band mapping
82would look like:
83
84band 0 => queue 0
85band 1 => queue 1
86band 2 => queue 2
87band 3 => queue 3
88
89Traffic will begin flowing through each queue if your TOS values are assigning
90traffic across the various bands. For example, ssh traffic will always try to
91go out band 0 based on TOS -> Linux priority conversion (realtime traffic),
92so it will be sent out queue 0. ICMP traffic (pings) falls into the "normal"
93traffic classification, which is band 1. Therefore pings will be sent out
94queue 1 on the NIC.
95
96Note the use of the multiqueue keyword. This is only in versions of iproute2
97that support multiqueue networking devices; if this is omitted when loading
98a qdisc onto a multiqueue device, the qdisc will load and operate the same
99as if it were loaded onto a single-queue device (i.e. - sends all traffic to
100queue 0).
101
102An alternative to explicit band allocation is to use the multiqueue option
103and specify 0 bands. If this is the case, the qdisc will
104allocate the number of bands to equal the number of queues that the device
105reports, and bring the qdisc online.
106
107The behavior of tc filters remains the same, where it will override TOS priority
108classification.
109
110
111Author: Peter P. Waskiewicz Jr. <peter.p.waskiewicz.jr@intel.com>
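The ssh and ping example in Section 3 of multiqueue.txt above follows from the PRIO priomap. The sketch below is an illustration under stated assumptions, not code from the patch: the priomap values are the defaults as documented for the prio qdisc in tc-prio(8), the priority constants are assumed to match TC_PRIO_BESTEFFORT (0) and TC_PRIO_INTERACTIVE (6), and prio_to_queue is a hypothetical name. It models only the priority to band to queue step, with the one-to-one band/queue mapping the document describes.

```c
#include <stddef.h>

/* Assumed to match TC_PRIO_BESTEFFORT and TC_PRIO_INTERACTIVE. */
enum {
	PRIO_BESTEFFORT  = 0,	/* e.g. ICMP echo with TOS 0 */
	PRIO_INTERACTIVE = 6,	/* e.g. interactive ssh (low-delay TOS) */
};

/* Default PRIO priomap (assumption, per tc-prio(8)): priority -> band. */
static const unsigned char default_priomap[16] = {
	1, 2, 2, 2, 1, 2, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1
};

/*
 * Hypothetical helper: return the Tx queue an skb with the given Linux
 * priority would be mapped to, assuming band N maps to queue N.
 * Returns -1 if the band has no matching queue.
 */
int prio_to_queue(unsigned int priority, unsigned int nqueues)
{
	unsigned int band = default_priomap[priority & 0xf];

	if (band >= nqueues)
		return -1;

	return (int)band;
}
```

On a 4-queue device, prio_to_queue(PRIO_INTERACTIVE, 4) gives queue 0 and prio_to_queue(PRIO_BESTEFFORT, 4) gives queue 1, matching the text. Under this default priomap band 3 receives no traffic unless the priomap or a tc filter directs some there.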
diff --git a/Documentation/networking/netdevices.txt b/Documentation/networking/netdevices.txt
index ce1361f95243..37869295fc70 100644
--- a/Documentation/networking/netdevices.txt
+++ b/Documentation/networking/netdevices.txt
@@ -20,6 +20,30 @@ private data which gets freed when the network device is freed. If
20separately allocated data is attached to the network device 20separately allocated data is attached to the network device
21(dev->priv) then it is up to the module exit handler to free that. 21(dev->priv) then it is up to the module exit handler to free that.
22 22
23MTU
24===
25Each network device has a Maximum Transfer Unit. The MTU does not
26include any link layer protocol overhead. Upper layer protocols must
27not pass a socket buffer (skb) to a device to transmit with more data
28than the mtu. The MTU does not include link layer header overhead, so
29for example on Ethernet, if the standard MTU of 1500 bytes is used, the
30actual skb will contain up to 1514 bytes because of the Ethernet
31header. Devices should allow for the 4 byte VLAN header as well.
32
33Segmentation Offload (GSO, TSO) is an exception to this rule. The
34upper layer protocol may pass a large socket buffer to the device
35transmit routine, and the device will break that up into separate
36packets based on the current MTU.
37
38MTU is symmetrical and applies both to receive and transmit. A device
39must be able to receive at least the maximum size packet allowed by
40the MTU. A network device may use the MTU as a mechanism to size receive
41buffers, but the device should allow packets with VLAN header. With
42standard Ethernet mtu of 1500 bytes, the device should allow up to
431518 byte packets (1500 + 14 header + 4 tag). The device may either:
44drop, truncate, or pass up oversize packets, but dropping oversize
45packets is preferred.
46
23 47
24struct net_device synchronization rules 48struct net_device synchronization rules
25======================================= 49=======================================
@@ -43,16 +67,17 @@ dev->get_stats:
43 67
 dev->hard_start_xmit:
 	Synchronization: netif_tx_lock spinlock.
+
 	When the driver sets NETIF_F_LLTX in dev->features this will be
 	called without holding netif_tx_lock. In this case the driver
 	has to lock by itself when needed. It is recommended to use a try lock
-	for this and return -1 when the spin lock fails.
+	for this and return NETDEV_TX_LOCKED when the spin lock fails.
 	The locking there should also properly protect against
-	set_multicast_list
-	Context: Process with BHs disabled or BH (timer).
-	Notes: netif_queue_stopped() is guaranteed false
-	       Interrupts must be enabled when calling hard_start_xmit.
-	       (Interrupts must also be enabled when enabling the BH handler.)
+	set_multicast_list.
+
+	Context: Process with BHs disabled or BH (timer),
+	         will be called with interrupts disabled by netconsole.
+
 	Return codes:
 	o NETDEV_TX_OK everything ok.
 	o NETDEV_TX_BUSY Cannot transmit packet, try later
@@ -74,4 +99,5 @@ dev->poll:
 	Synchronization: __LINK_STATE_RX_SCHED bit in dev->state. See
 	dev_close code and comments in net/core/dev.c for more info.
 	Context: softirq
+	         will be called with interrupts disabled by netconsole.
77 103
diff --git a/Documentation/networking/radiotap-headers.txt b/Documentation/networking/radiotap-headers.txt
new file mode 100644
index 000000000000..953331c7984f
--- /dev/null
+++ b/Documentation/networking/radiotap-headers.txt
@@ -0,0 +1,152 @@
How to use radiotap headers
===========================

Pointer to the radiotap include file
------------------------------------

Radiotap headers are variable-length and extensible. You can get most of the
information you need to know about them from:

./include/net/ieee80211_radiotap.h

This document gives an overview and warns about some corner cases.


Structure of the header
-----------------------

There is a fixed portion at the start which contains a u32 bitmap that defines
whether the possible argument associated with each bit is present or not. So
if b0 of the it_present member of ieee80211_radiotap_header is set, it means
that the argument for index 0 (IEEE80211_RADIOTAP_TSFT) is present in the
argument area.

   < 8-byte ieee80211_radiotap_header >
   [ <possible argument bitmap extensions ... > ]
   [ <argument> ... ]

At the moment there are only 13 possible argument indexes defined, but in case
we run out of space in the u32 it_present member, it is defined that b31 set
indicates that there is another u32 bitmap following (shown as "possible
argument bitmap extensions..." above), and the start of the arguments is moved
forward 4 bytes each time.

Note also that the it_len member __le16 is set to the total number of bytes
covered by the ieee80211_radiotap_header and any arguments following.
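
The layout described above can be walked by hand to find where the arguments
begin. The following is a minimal userspace sketch under stated assumptions:
the function names are hypothetical, the byte offsets (version at 0, pad at 1,
it_len at 2, it_present at 4) are inferred from the 8-byte fixed header shown
above, and all fields are read bytewise so that no alignment is required.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bytewise little-endian reads; safe at any address. */
static uint16_t ex_rt_le16(const uint8_t *p)
{
	return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t ex_rt_le32(const uint8_t *p)
{
	return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
	       ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/*
 * Return the offset of the first argument, counted from the start of the
 * radiotap header, or -1 if the header is truncated or inconsistent.
 */
static int ex_rt_args_offset(const uint8_t *buf, size_t buflen)
{
	size_t off = 4;		/* version (1) + pad (1) + it_len (2) */
	size_t it_len;
	uint32_t present;

	assert(buf);
	if (buflen < 8)
		return -1;

	it_len = ex_rt_le16(buf + 2);
	if (it_len < 8 || it_len > buflen)
		return -1;

	/* Each bitmap with b31 set announces another u32 bitmap after it. */
	do {
		if (off + 4 > it_len)
			return -1;
		present = ex_rt_le32(buf + off);
		off += 4;
	} while (present & (1u << 31));

	return (int)off;
}
```

With a single bitmap the arguments start at offset 8; each extension bitmap
moves that start forward by another 4 bytes.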


Requirements for arguments
--------------------------

After the fixed part of the header, the arguments follow for each argument
index whose matching bit is set in the it_present member of
ieee80211_radiotap_header.

 - the arguments are all stored little-endian!

 - the argument payload for a given argument index has a fixed size. So
   IEEE80211_RADIOTAP_TSFT being present always indicates an 8-byte argument is
   present. See the comments in ./include/net/ieee80211_radiotap.h for a nice
   breakdown of all the argument sizes.

 - the arguments must be aligned to a boundary of the argument size using
   padding. So a u16 argument must start on the next u16 boundary if it isn't
   already on one, a u32 must start on the next u32 boundary and so on.

 - "alignment" is relative to the start of the ieee80211_radiotap_header, ie,
   the first byte of the radiotap header. The absolute alignment of that first
   byte isn't defined. So even if the whole radiotap header is starting at, eg,
   address 0x00000003, still the first byte of the radiotap header is treated as
   0 for alignment purposes.

 - the above point that there may be no absolute alignment for multibyte
   entities in the fixed radiotap header or the argument region means that you
   have to take special evasive action when trying to access these multibyte
   entities. Some arches like Blackfin cannot deal with an attempt to
   dereference, eg, a u16 pointer that is pointing to an odd address. Instead
   you have to use a kernel API get_unaligned() to dereference the pointer,
   which will do it bytewise on the arches that require that.

 - The arguments for a given argument index can be a compound of multiple types
   together. For example IEEE80211_RADIOTAP_CHANNEL has an argument payload
   consisting of two u16s of total length 4. When this happens, the padding
   rule is applied dealing with a u16, NOT dealing with a 4-byte single entity.
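
The padding and access rules in the list above can be sketched as follows.
This is a userspace illustration only: ex_rt_align() and ex_rt_read_le16() are
hypothetical helpers, and the bytewise read merely shows what an
alignment-safe access to a little-endian u16 amounts to. Kernel code should
use get_unaligned() as described above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Round an offset, counted from the first byte of the radiotap header,
 * up to the next multiple of align. align must be a power of two.
 */
static size_t ex_rt_align(size_t off, size_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (off + align - 1) & ~(align - 1);
}

/* Read a little-endian u16 one byte at a time; works at odd addresses. */
static uint16_t ex_rt_read_le16(const uint8_t *p)
{
	return (uint16_t)(p[0] | (p[1] << 8));
}
```

As a worked case, suppose a channel argument would otherwise start at offset
18. It is a compound of two u16s, so it is padded as a u16:
ex_rt_align(18, 2) leaves it at 18, whereas treating it as a single 4-byte
entity, ex_rt_align(18, 4), would wrongly move it to 20.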


Example valid radiotap header
-----------------------------

	0x00, 0x00, // <-- radiotap version + pad byte
	0x0b, 0x00, // <-- radiotap header length
	0x04, 0x0c, 0x00, 0x00, // <-- bitmap
	0x6c, // <-- rate (in 500kbps units)
	0x0c, // <-- tx power
	0x01 // <-- antenna
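
The example header can be decoded by hand as a check. In the sketch below the
names are hypothetical, and the index values (TSFT 0, rate 2, dBm TX power 10,
antenna 11) are assumed to match the argument numbering in
./include/net/ieee80211_radiotap.h. All three arguments are single bytes, so
no padding is needed and they follow the bitmap directly.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names; values assumed to match the radiotap include file. */
enum {
	EX_IDX_TSFT		= 0,
	EX_IDX_RATE		= 2,
	EX_IDX_DBM_TX_POWER	= 10,
	EX_IDX_ANTENNA		= 11,
};

/* The example header from the text, byte for byte. */
static const uint8_t ex_example_hdr[] = {
	0x00, 0x00,		/* radiotap version + pad byte */
	0x0b, 0x00,		/* header length, little-endian: 11 */
	0x04, 0x0c, 0x00, 0x00,	/* bitmap, little-endian: 0x00000c04 */
	0x6c,			/* rate */
	0x0c,			/* tx power */
	0x01			/* antenna */
};

/* Assemble the little-endian it_present bitmap at offset 4. */
static uint32_t ex_example_bitmap(const uint8_t *hdr)
{
	return (uint32_t)hdr[4] | ((uint32_t)hdr[5] << 8) |
	       ((uint32_t)hdr[6] << 16) | ((uint32_t)hdr[7] << 24);
}

/* Test whether the argument with the given index is present. */
static int ex_arg_is_present(uint32_t bitmap, unsigned int index)
{
	assert(index < 32);
	return (int)((bitmap >> index) & 1u);
}

/* The rate argument counts in 500kbps units. */
static unsigned int ex_rate_kbps(uint8_t rate)
{
	return rate * 500u;
}
```

Bits 2, 10 and 11 are set in the bitmap, which is why exactly three argument
bytes follow it, and the rate byte 0x6c (108) corresponds to 54Mbps.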


Using the Radiotap Parser
-------------------------

If you need to parse a radiotap struct, you can radically simplify the
job by using the radiotap parser that lives in net/wireless/radiotap.c and has
its prototypes available in include/net/cfg80211.h. You use it like this:

#include <net/cfg80211.h>

/* buf points to the start of the radiotap header part */

int MyFunction(u8 *buf, int buflen)
{
	int pkt_rate_100kHz = 0, antenna = 0, pwr = 0;
	struct ieee80211_radiotap_iterator iterator;
	int ret = ieee80211_radiotap_iterator_init(&iterator, buf, buflen);

	while (!ret) {

		ret = ieee80211_radiotap_iterator_next(&iterator);

		if (ret)
			continue;

		/* see if this argument is something we can use */

		switch (iterator.this_arg_index) {
		/*
		 * You must take care when dereferencing iterator.this_arg
		 * for multibyte types... the pointer is not aligned. Use
		 * get_unaligned((type *)iterator.this_arg) to dereference
		 * iterator.this_arg for type "type" safely on all arches.
		 */
		case IEEE80211_RADIOTAP_RATE:
			/* radiotap "rate" u8 is in
			 * 500kbps units, eg, 0x02=1Mbps
			 */
			pkt_rate_100kHz = (*iterator.this_arg) * 5;
			break;

		case IEEE80211_RADIOTAP_ANTENNA:
			/* radiotap uses 0 for 1st ant */
			antenna = *iterator.this_arg;
			break;

		case IEEE80211_RADIOTAP_DBM_TX_POWER:
			pwr = *iterator.this_arg;
			break;

		default:
			break;
		}
	}  /* while more rt headers */

	/* the iterator reports -ENOENT once all arguments have been seen */
	if (ret != -ENOENT)
		return TXRX_DROP;

	/* discard the radiotap header part */
	buf += iterator.max_length;
	buflen -= iterator.max_length;

	...

}

Andy Green <andy@warmcat.com>
diff --git a/Documentation/networking/sk98lin.txt b/Documentation/networking/sk98lin.txt
deleted file mode 100644
index 8590a954df1d..000000000000
--- a/Documentation/networking/sk98lin.txt
+++ /dev/null
@@ -1,568 +0,0 @@
1(C)Copyright 1999-2004 Marvell(R).
2All rights reserved
3===========================================================================
4
5sk98lin.txt created 13-Feb-2004
6
7Readme File for sk98lin v6.23
8Marvell Yukon/SysKonnect SK-98xx Gigabit Ethernet Adapter family driver for LINUX
9
10This file contains
11 1 Overview
12 2 Required Files
13 3 Installation
14 3.1 Driver Installation
15 3.2 Inclusion of adapter at system start
16 4 Driver Parameters
17 4.1 Per-Port Parameters
18 4.2 Adapter Parameters
19 5 Large Frame Support
20 6 VLAN and Link Aggregation Support (IEEE 802.1, 802.1q, 802.3ad)
21 7 Troubleshooting
22
23===========================================================================
24
25
261 Overview
27===========
28
29The sk98lin driver supports the Marvell Yukon and SysKonnect
30SK-98xx/SK-95xx compliant Gigabit Ethernet Adapter on Linux. It has
31been tested with Linux on Intel/x86 machines.
32***
33
34
352 Required Files
36=================
37
38The linux kernel source.
39No additional files required.
40***
41
42
433 Installation
44===============
45
46It is recommended to download the latest version of the driver from the
47SysKonnect web site www.syskonnect.com. If you have downloaded the latest
48driver, the Linux kernel has to be patched before the driver can be
49installed. For details on how to patch a Linux kernel, refer to the
50patch.txt file.
51
523.1 Driver Installation
53------------------------
54
55The following steps describe the actions that are required to install
56the driver and to start it manually. These steps should be carried
57out for the initial driver setup. Once confirmed to be ok, they can
58be included in the system start.
59
60NOTE 1: To perform the following tasks you need 'root' access.
61
62NOTE 2: In case of problems, please read the section "Troubleshooting"
63 below.
64
65The driver can either be integrated into the kernel or it can be compiled
66as a module. Select the appropriate option during the kernel
67configuration.
68
Compile/use the driver as a module
----------------------------------
To compile the driver, go to the directory /usr/src/linux and
execute the command "make menuconfig" or "make xconfig". Then follow
one of the two procedures below.

To integrate the driver permanently into the kernel, proceed as follows:
76
771. Select the menu "Network device support" and then "Ethernet(1000Mbit)"
782. Mark "Marvell Yukon Chipset / SysKonnect SK-98xx family support"
79 with (*)
803. Build a new kernel when the configuration of the above options is
81 finished.
824. Install the new kernel.
835. Reboot your system.
84
85To use the driver as a module, proceed as follows:
86
871. Enable 'loadable module support' in the kernel.
882. For automatic driver start, enable the 'Kernel module loader'.
893. Select the menu "Network device support" and then "Ethernet(1000Mbit)"
904. Mark "Marvell Yukon Chipset / SysKonnect SK-98xx family support"
91 with (M)
925. Execute the command "make modules".
936. Execute the command "make modules_install".
94 The appropriate modules will be installed.
957. Reboot your system.
96
97
98Load the module manually
99------------------------
100To load the module manually, proceed as follows:
101
1021. Enter "modprobe sk98lin".
1032. If a Marvell Yukon or SysKonnect SK-98xx adapter is installed in
104 your computer and you have a /proc file system, execute the command:
105 "ls /proc/net/sk98lin/"
106 This should produce an output containing a line with the following
107 format:
108 eth0 eth1 ...
109 which indicates that your adapter has been found and initialized.
110
111 NOTE 1: If you have more than one Marvell Yukon or SysKonnect SK-98xx
112 adapter installed, the adapters will be listed as 'eth0',
113 'eth1', 'eth2', etc.
114 For each adapter, repeat steps 3 and 4 below.
115
116 NOTE 2: If you have other Ethernet adapters installed, your Marvell
117 Yukon or SysKonnect SK-98xx adapter will be mapped to the
118 next available number, e.g. 'eth1'. The mapping is executed
119 automatically.
120 The module installation message (displayed either in a system
121 log file or on the console) prints a line for each adapter
122 found containing the corresponding 'ethX'.
123
1243. Select an IP address and assign it to the respective adapter by
125 entering:
126 ifconfig eth0 <ip-address>
127 With this command, the adapter is connected to the Ethernet.
128
129 SK-98xx Gigabit Ethernet Server Adapters: The yellow LED on the adapter
130 is now active, the link status LED of the primary port is active and
131 the link status LED of the secondary port (on dual port adapters) is
132 blinking (if the ports are connected to a switch or hub).
133 SK-98xx V2.0 Gigabit Ethernet Adapters: The link status LED is active.
134 In addition, you will receive a status message on the console stating
135 "ethX: network connection up using port Y" and showing the selected
136 connection parameters (x stands for the ethernet device number
137 (0,1,2, etc), y stands for the port name (A or B)).
138
139 NOTE: If you are in doubt about IP addresses, ask your network
140 administrator for assistance.
141
1424. Your adapter should now be fully operational.
143 Use 'ping <otherstation>' to verify the connection to other computers
144 on your network.
1455. To check the adapter configuration view /proc/net/sk98lin/[devicename].
146 For example by executing:
147 "cat /proc/net/sk98lin/eth0"
148
149Unload the module
150-----------------
151To stop and unload the driver modules, proceed as follows:
152
1531. Execute the command "ifconfig eth0 down".
1542. Execute the command "rmmod sk98lin".
155
1563.2 Inclusion of adapter at system start
157-----------------------------------------
158
159Since a large number of different Linux distributions are
160available, we are unable to describe a general installation procedure
161for the driver module.
162Because the driver is now integrated in the kernel, installation should
163be easy, using the standard mechanism of your distribution.
164Refer to the distribution's manual for installation of ethernet adapters.
165
166***
167
1684 Driver Parameters
169====================
170
171Parameters can be set at the command line after the module has been
172loaded with the command 'modprobe'.
173In some distributions, the configuration tools are able to pass parameters
174to the driver module.
175
176If you use the kernel module loader, you can set driver parameters
177in the file /etc/modprobe.conf (or /etc/modules.conf in 2.4 or earlier).
178To set the driver parameters in this file, proceed as follows:
179
1801. Insert a line of the form :
181 options sk98lin ...
182 For "...", the same syntax is required as described for the command
183 line parameters of modprobe below.
1842. To activate the new parameters, either reboot your computer
185 or
186 unload and reload the driver.
187 The syntax of the driver parameters is:
188
189 modprobe sk98lin parameter=value1[,value2[,value3...]]
190
191 where value1 refers to the first adapter, value2 to the second etc.
192
193NOTE: All parameters are case sensitive. Write them exactly as shown
194 below.
195
196Example:
197Suppose you have two adapters. You want to set auto-negotiation
198on the first adapter to ON and on the second adapter to OFF.
199You also want to set DuplexCapabilities on the first adapter
200to FULL, and on the second adapter to HALF.
201Then, you must enter:
202
203 modprobe sk98lin AutoNeg_A=On,Off DupCap_A=Full,Half
204
205NOTE: The number of adapters that can be configured this way is
206 limited in the driver (file skge.c, constant SK_MAX_CARD_PARAM).
207 The current limit is 16. If you happen to install
208 more adapters, adjust this and recompile.
209
210
2114.1 Per-Port Parameters
212------------------------
213
214These settings are available for each port on the adapter.
215In the following description, '?' stands for the port for
216which you set the parameter (A or B).
217
218Speed
219-----
220Parameter: Speed_?
221Values: 10, 100, 1000, Auto
222Default: Auto
223
224This parameter is used to set the speed capabilities. It is only valid
225for the SK-98xx V2.0 copper adapters.
226Usually, the speed is negotiated between the two ports during link
227establishment. If this fails, a port can be forced to a specific setting
228with this parameter.
229
230Auto-Negotiation
231----------------
232Parameter: AutoNeg_?
233Values: On, Off, Sense
234Default: On
235
236The "Sense"-mode automatically detects whether the link partner supports
237auto-negotiation or not.
238
239Duplex Capabilities
240-------------------
241Parameter: DupCap_?
242Values: Half, Full, Both
243Default: Both
244
This parameter is only relevant if auto-negotiation for this port is
not set to "Sense". If auto-negotiation is set to "On", all three values
are possible. If it is set to "Off", only "Full" and "Half" are allowed.
This parameter is useful if your link partner does not support all
possible combinations.
250
251Flow Control
252------------
253Parameter: FlowCtrl_?
254Values: Sym, SymOrRem, LocSend, None
255Default: SymOrRem
256
257This parameter can be used to set the flow control capabilities the
258port reports during auto-negotiation. It can be set for each port
259individually.
260Possible modes:
261 -- Sym = Symmetric: both link partners are allowed to send
262 PAUSE frames
263 -- SymOrRem = SymmetricOrRemote: both or only remote partner
264 are allowed to send PAUSE frames
265 -- LocSend = LocalSend: only local link partner is allowed
266 to send PAUSE frames
267 -- None = no link partner is allowed to send PAUSE frames
268
269NOTE: This parameter is ignored if auto-negotiation is set to "Off".
270
271Role in Master-Slave-Negotiation (1000Base-T only)
272--------------------------------------------------
273Parameter: Role_?
274Values: Auto, Master, Slave
275Default: Auto
276
277This parameter is only valid for the SK-9821 and SK-9822 adapters.
278For two 1000Base-T ports to communicate, one must take the role of the
279master (providing timing information), while the other must be the
280slave. Usually, this is negotiated between the two ports during link
281establishment. If this fails, a port can be forced to a specific setting
282with this parameter.
283
284
2854.2 Adapter Parameters
286-----------------------
287
288Connection Type (SK-98xx V2.0 copper adapters only)
289---------------
290Parameter: ConType
291Values: Auto, 100FD, 100HD, 10FD, 10HD
292Default: Auto
293
294The parameter 'ConType' is a combination of all five per-port parameters
295within one single parameter. This simplifies the configuration of both ports
296of an adapter card! The different values of this variable reflect the most
297meaningful combinations of port parameters.
298
299The following table shows the values of 'ConType' and the corresponding
300combinations of the per-port parameters:
301
302 ConType | DupCap AutoNeg FlowCtrl Role Speed
303 ----------+------------------------------------------------------
304 Auto | Both On SymOrRem Auto Auto
305 100FD | Full Off None Auto (ignored) 100
306 100HD | Half Off None Auto (ignored) 100
307 10FD | Full Off None Auto (ignored) 10
308 10HD | Half Off None Auto (ignored) 10
309
Stating any other port parameter together with this 'ConType' variable
will result in a merged configuration of those settings. This is because
the per-port parameters (e.g. Speed_?) have a higher priority than the
combined variable 'ConType'.
314
315NOTE: This parameter is always used on both ports of the adapter card.
316
317Interrupt Moderation
318--------------------
319Parameter: Moderation
320Values: None, Static, Dynamic
321Default: None
322
Interrupt moderation is employed to limit the maximum number of interrupts
the driver has to serve. That is, one or more interrupts (which indicate any
transmit or receive packet to be processed) are queued until the driver
processes them. When the queued interrupts are served is determined by the
'IntsPerSec' parameter, which is explained below.
328
329Possible modes:
330
331 -- None - No interrupt moderation is applied on the adapter card.
332 Therefore, each transmit or receive interrupt is served immediately
333 as soon as it appears on the interrupt line of the adapter card.
334
335 -- Static - Interrupt moderation is applied on the adapter card.
336 All transmit and receive interrupts are queued until a complete
337 moderation interval ends. If such a moderation interval ends, all
338 queued interrupts are processed in one big bunch without any delay.
339 The term 'static' reflects the fact, that interrupt moderation is
340 always enabled, regardless how much network load is currently
341 passing via a particular interface. In addition, the duration of
342 the moderation interval has a fixed length that never changes while
343 the driver is operational.
344
345 -- Dynamic - Interrupt moderation might be applied on the adapter card,
346 depending on the load of the system. If the driver detects that the
347 system load is too high, the driver tries to shield the system against
348 too much network load by enabling interrupt moderation. If - at a later
349 time - the CPU utilization decreases again (or if the network load is
350 negligible) the interrupt moderation will automatically be disabled.
351
352Interrupt moderation should be used when the driver has to handle one or more
353interfaces with a high network load, which - as a consequence - leads also to a
354high CPU utilization. When moderation is applied in such high network load
355situations, CPU load might be reduced by 20-30%.
356
357NOTE: The drawback of using interrupt moderation is an increase of the round-
358trip-time (RTT), due to the queueing and serving of interrupts at dedicated
359moderation times.
360
361Interrupts per second
362---------------------
363Parameter: IntsPerSec
364Values: 30...40000 (interrupts per second)
365Default: 2000
366
367This parameter is only used if either static or dynamic interrupt moderation
368is used on a network adapter card. Using this parameter if no moderation is
369applied will lead to no action performed.
370
371This parameter determines the length of any interrupt moderation interval.
372Assuming that static interrupt moderation is to be used, an 'IntsPerSec'
373parameter value of 2000 will lead to an interrupt moderation interval of
374500 microseconds.
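
The relation between 'IntsPerSec' and the moderation interval stated above is
a simple reciprocal. The helper below is a hypothetical sketch of that
arithmetic only; it is not taken from the driver, and the range check merely
mirrors the documented 30...40000 limits.

```c
#include <assert.h>

/* Length of one moderation interval in microseconds (illustrative only). */
static unsigned int ex_moderation_interval_us(unsigned int ints_per_sec)
{
	assert(ints_per_sec >= 30 && ints_per_sec <= 40000);
	return 1000000u / ints_per_sec;
}
```

With the default of 2000 interrupts per second this gives the 500 microsecond
interval mentioned above; 100 interrupts per second would stretch it to 10
milliseconds.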
375
NOTE: The duration of the moderation interval is to be chosen with care.
At first glance, selecting a very long duration (e.g. only 100 interrupts per
second) seems to be meaningful, but the increase of packet-processing delay
is tremendous. On the other hand, selecting a very short moderation time might
cancel out the benefit of applying moderation at all.
381
382
383Preferred Port
384--------------
385Parameter: PrefPort
386Values: A, B
387Default: A
388
389This is used to force the preferred port to A or B (on dual-port network
390adapters). The preferred port is the one that is used if both are detected
391as fully functional.
392
393RLMT Mode (Redundant Link Management Technology)
394------------------------------------------------
395Parameter: RlmtMode
396Values: CheckLinkState,CheckLocalPort, CheckSeg, DualNet
397Default: CheckLinkState
398
399RLMT monitors the status of the port. If the link of the active port
400fails, RLMT switches immediately to the standby link. The virtual link is
401maintained as long as at least one 'physical' link is up.
402
403Possible modes:
404
405 -- CheckLinkState - Check link state only: RLMT uses the link state
406 reported by the adapter hardware for each individual port to
407 determine whether a port can be used for all network traffic or
408 not.
409
410 -- CheckLocalPort - In this mode, RLMT monitors the network path
411 between the two ports of an adapter by regularly exchanging packets
412 between them. This mode requires a network configuration in which
413 the two ports are able to "see" each other (i.e. there must not be
414 any router between the ports).
415
416 -- CheckSeg - Check local port and segmentation: This mode supports the
417 same functions as the CheckLocalPort mode and additionally checks
418 network segmentation between the ports. Therefore, this mode is only
419 to be used if Gigabit Ethernet switches are installed on the network
420 that have been configured to use the Spanning Tree protocol.
421
422 -- DualNet - In this mode, ports A and B are used as separate devices.
423 If you have a dual port adapter, port A will be configured as eth0
424 and port B as eth1. Both ports can be used independently with
425 distinct IP addresses. The preferred port setting is not used.
426 RLMT is turned off.
427
NOTE: RLMT modes CheckLocalPort and CheckSeg (referred to as CLP and CLPSS)
      are designed to operate in configurations where a network path
      between the ports on one adapter exists. Moreover, they are not
      designed to work where adapters are connected back-to-back.
432***
433
434
4355 Large Frame Support
436======================
437
438The driver supports large frames (also called jumbo frames). Using large
439frames can result in an improved throughput if transferring large amounts
440of data.
To enable large frames, set the MTU (maximum transfer unit) of the
interface to the desired value (up to 9000) by executing the following
command:
      ifconfig eth0 mtu 9000
445This will only work if you have two adapters connected back-to-back
446or if you use a switch that supports large frames. When using a switch,
447it should be configured to allow large frames and auto-negotiation should
448be set to OFF. The setting must be configured on all adapters that can be
449reached by the large frames. If one adapter is not set to receive large
450frames, it will simply drop them.
451
452You can switch back to the standard ethernet frame size by executing the
453following command:
454 ifconfig eth0 mtu 1500
455
456To permanently configure this setting, add a script with the 'ifconfig'
457line to the system startup sequence (named something like "S99sk98lin"
458in /etc/rc.d/rc2.d).
459***
460
461
4626 VLAN and Link Aggregation Support (IEEE 802.1, 802.1q, 802.3ad)
463==================================================================
464
465The Marvell Yukon/SysKonnect Linux drivers are able to support VLAN and
466Link Aggregation according to IEEE standards 802.1, 802.1q, and 802.3ad.
467These features are only available after installation of open source
468modules available on the Internet:
469For VLAN go to: http://www.candelatech.com/~greear/vlan.html
470For Link Aggregation go to: http://www.st.rim.or.jp/~yumo
471
472NOTE: SysKonnect GmbH does not offer any support for these open source
473 modules and does not take the responsibility for any kind of
474 failures or problems arising in connection with these modules.
475
476NOTE: Configuring Link Aggregation on a SysKonnect dual link adapter may
477 cause problems when unloading the driver.
478
479
4807 Troubleshooting
481==================
482
483If any problems occur during the installation process, check the
484following list:
485
486
487Problem: The SK-98xx adapter cannot be found by the driver.
488Solution: In /proc/pci search for the following entry:
489 'Ethernet controller: SysKonnect SK-98xx ...'
490 If this entry exists, the SK-98xx or SK-98xx V2.0 adapter has
491 been found by the system and should be operational.
492 If this entry does not exist or if the file '/proc/pci' is not
493 found, there may be a hardware problem or the PCI support may
494 not be enabled in your kernel.
495 The adapter can be checked using the diagnostics program which
496 is available on the SysKonnect web site:
497 www.syskonnect.com
498
499 Some COMPAQ machines have problems dealing with PCI under Linux.
500 This problem is described in the 'PCI howto' document
501 (included in some distributions or available from the
502 web, e.g. at 'www.linux.org').
503
504
505Problem: Programs such as 'ifconfig' or 'route' cannot be found or the
506 error message 'Operation not permitted' is displayed.
507Reason: You are not logged in as user 'root'.
508Solution: Logout and login as 'root' or change to 'root' via 'su'.
509
510
511Problem: Upon use of the command 'ping <address>' the message
512 "ping: sendto: Network is unreachable" is displayed.
513Reason: Your route is not set correctly.
514Solution: If you are using RedHat, you probably forgot to set up the
515 route in the 'network configuration'.
516 Check the existing routes with the 'route' command and check
517 if an entry for 'eth0' exists, and if so, if it is set correctly.
518
519
520Problem: The driver can be started, the adapter is connected to the
521 network, but you cannot receive or transmit any packets;
522 e.g. 'ping' does not work.
523Reason: There is an incorrect route in your routing table.
524Solution: Check the routing table with the command 'route' and read the
525 manual help pages dealing with routes (enter 'man route').
526
NOTE: Although the 2.2.x kernel versions generate the routing entry
      automatically, problems of this kind may occur here as well. We've
      come across a situation in which the driver started correctly at
      system start, but after the driver had been removed and reloaded,
      the route of the adapter's network pointed to the 'dummy0' device
      and had to be corrected manually.
533
534
535Problem: Your computer should act as a router between multiple
536 IP subnetworks (using multiple adapters), but computers in
537 other subnetworks cannot be reached.
538Reason: Either the router's kernel is not configured for IP forwarding
539 or the routing table and gateway configuration of at least one
540 computer is not working.
541
542Problem: Upon driver start, the following error message is displayed:
543 "eth0: -- ERROR --
544 Class: internal Software error
545 Nr: 0xcc
546 Msg: SkGeInitPort() cannot init running ports"
547Reason: You are using a driver compiled for single processor machines
548 on a multiprocessor machine with SMP (Symmetric MultiProcessor)
549 kernel.
550Solution: Configure your kernel appropriately and recompile the kernel or
551 the modules.
552
553
554
If your problem is not listed here, please contact SysKonnect's technical
support for help (linux@syskonnect.de).
When contacting our technical support, please ensure that the following
information is available:
- System manufacturer and hardware information (CPU, memory, ...)
- PCI boards in your system
- Distribution
- Kernel version
- Driver version
564***
565
566
567
568***End of Readme File***
diff --git a/Documentation/networking/spider_net.txt b/Documentation/networking/spider_net.txt
new file mode 100644
index 000000000000..4b4adb8eb14f
--- /dev/null
+++ b/Documentation/networking/spider_net.txt
@@ -0,0 +1,204 @@
1
2 The Spidernet Device Driver
3 ===========================
4
5Written by Linas Vepstas <linas@austin.ibm.com>
6
7Version of 7 June 2007
8
9Abstract
10========
11This document sketches the structure of portions of the spidernet
12device driver in the Linux kernel tree. The spidernet is a gigabit
13ethernet device built into the Toshiba southbridge commonly used
14in the SONY Playstation 3 and the IBM QS20 Cell blade.
15
16The Structure of the RX Ring.
17=============================
18The receive (RX) ring is a circular linked list of RX descriptors,
19together with three pointers into the ring that are used to manage its
20contents.
21
22The elements of the ring are called "descriptors" or "descrs"; they
23describe the received data. This includes a pointer to a buffer
24containing the received data, the buffer size, and various status bits.
25
26There are three primary states that a descriptor can be in: "empty",
27"full" and "not-in-use". An "empty" or "ready" descriptor is ready
28to receive data from the hardware. A "full" descriptor has data in it,
29and is waiting to be emptied and processed by the OS. A "not-in-use"
30descriptor is neither empty or full; it is simply not ready. It may
31not even have a data buffer in it, or is otherwise unusable.
32
33During normal operation, on device startup, the OS (specifically, the
34spidernet device driver) allocates a set of RX descriptors and RX
35buffers. These are all marked "empty", ready to receive data. This
36ring is handed off to the hardware, which sequentially fills in the
37buffers, and marks them "full". The OS follows up, taking the full
38buffers, processing them, and re-marking them empty.
39
40This filling and emptying is managed by three pointers, the "head"
41and "tail" pointers, managed by the OS, and a hardware current
42descriptor pointer (GDACTDPA). The GDACTDPA points at the descr
43currently being filled. When this descr is filled, the hardware
44marks it full, and advances the GDACTDPA by one. Thus, when there is
45flowing RX traffic, every descr behind it should be marked "full",
46and everything in front of it should be "empty". If the hardware
47discovers that the current descr is not empty, it will signal an
48interrupt, and halt processing.
49
50The tail pointer tails or trails the hardware pointer. When the
51hardware is ahead, the tail pointer will be pointing at a "full"
52descr. The OS will process this descr, and then mark it "not-in-use",
53and advance the tail pointer. Thus, when there is flowing RX traffic,
54all of the descrs in front of the tail pointer should be "full", and
55all of those behind it should be "not-in-use". When RX traffic is not
56flowing, then the tail pointer can catch up to the hardware pointer.
57The OS will then note that the current tail is "empty", and halt
58processing.
59
60The head pointer (somewhat mis-named) follows after the tail pointer.
61When traffic is flowing, then the head pointer will be pointing at
62a "not-in-use" descr. The OS will perform various housekeeping duties
63on this descr. This includes allocating a new data buffer and
64dma-mapping it so as to make it visible to the hardware. The OS will
65then mark the descr as "empty", ready to receive data. Thus, when there
66is flowing RX traffic, everything in front of the head pointer should
67be "not-in-use", and everything behind it should be "empty". If no
68RX traffic is flowing, then the head pointer can catch up to the tail
69pointer, at which point the OS will notice that the head descr is
70"empty", and it will halt processing.
71
72Thus, in an idle system, the GDACTDPA, tail and head pointers will
73all be pointing at the same descr, which should be "empty". All of the
74other descrs in the ring should be "empty" as well.
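
The interplay of the three pointers described above can be sketched as a
small user-space model. This is only an illustration under stated
assumptions: the type and function names (rx_ring, hw_receive, os_process,
os_refill) and the ring size of 8 are hypothetical and are not taken from
the driver, and buffer allocation and DMA mapping are reduced to a state
change.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of the RX ring; names are illustrative, not the driver's. */
enum descr_state { DESCR_EMPTY, DESCR_FULL, DESCR_NOT_IN_USE };

#define RING_SIZE 8	/* assumed size, chosen only to keep the model small */

struct rx_ring {
	enum descr_state state[RING_SIZE];
	size_t hw;	/* models GDACTDPA: descr the hardware fills next */
	size_t tail;	/* next descr the OS will process */
	size_t head;	/* next descr the OS will refill */
};

static void ring_init(struct rx_ring *r)
{
	for (size_t i = 0; i < RING_SIZE; i++)
		r->state[i] = DESCR_EMPTY;
	r->hw = r->tail = r->head = 0;
}

/* Hardware side: fill the current descr. Returns 0 if it had to halt. */
static int hw_receive(struct rx_ring *r)
{
	assert(r->hw < RING_SIZE);
	if (r->state[r->hw] != DESCR_EMPTY)
		return 0;	/* not empty: the real chip interrupts and halts */
	r->state[r->hw] = DESCR_FULL;
	r->hw = (r->hw + 1) % RING_SIZE;
	return 1;
}

/* OS side, tail: consume full descrs. Returns how many were processed. */
static int os_process(struct rx_ring *r)
{
	int n = 0;

	while (r->state[r->tail] == DESCR_FULL) {
		r->state[r->tail] = DESCR_NOT_IN_USE;
		r->tail = (r->tail + 1) % RING_SIZE;
		n++;
	}
	return n;
}

/* OS side, head: stand-in for "allocate and map a buffer, mark it empty". */
static int os_refill(struct rx_ring *r)
{
	int n = 0;

	while (r->state[r->head] == DESCR_NOT_IN_USE) {
		r->state[r->head] = DESCR_EMPTY;
		r->head = (r->head + 1) % RING_SIZE;
		n++;
	}
	return n;
}
```

Calling hw_receive() a few times, then os_process(), then os_refill()
walks the hw, tail and head indices around the ring in that order, and
leaves all three equal once the ring is idle again.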
75
76The show_rx_chain() routine will print out the locations of the
77GDACTDPA, tail and head pointers. It will also summarize the contents
78of the ring, starting at the tail pointer, and listing the status
79of the descrs that follow.
80
81A typical example of the output, for a nearly idle system, might be
82
83net eth1: Total number of descrs=256
84net eth1: Chain tail located at descr=20
85net eth1: Chain head is at 20
86net eth1: HW curr desc (GDACTDPA) is at 21
87net eth1: Have 1 descrs with stat=x40800101
88net eth1: HW next desc (GDACNEXTDA) is at 22
89net eth1: Last 255 descrs with stat=xa0800000
90
91In the above, the hardware has filled in one descr, number 20. Both
92head and tail are pointing at 20, because it has not yet been emptied.
93Meanwhile, hw is pointing at 21, which is free.
94
95The "Have nnn descrs" refers to the descr starting at the tail: in this
96case, nnn=1 descr, starting at descr 20. The "Last nnn descrs" refers
97to all of the rest of the descrs, from the last status change. The "nnn"
98is a count of how many descrs have exactly the same status.
99
100The status x4... corresponds to "full" and status xa... corresponds
101to "empty". The actual value printed is RXCOMST_A.
102
103In the device driver source code, a different set of names are
104used for these same concepts, so that
105
106"empty" == SPIDER_NET_DESCR_CARDOWNED == 0xa
107"full" == SPIDER_NET_DESCR_FRAME_END == 0x4
108"not in use" == SPIDER_NET_DESCR_NOT_IN_USE == 0xf
109
110
111The RX RAM full bug/feature
112===========================
113
114As long as the OS can empty out the RX buffers at a rate faster than
115the hardware can fill them, there is no problem. If, for some reason,
116the OS fails to empty the RX ring fast enough, the hardware GDACTDPA
117pointer will catch up to the head, notice the not-empty condition,
118and stop. However, RX packets may still continue arriving on the wire.
119The spidernet chip can save some limited number of these in local RAM.
120When this local ram fills up, the spider chip will issue an interrupt
121indicating this (GHIINT0STS will show ERRINT, and the GRMFLLINT bit
122will be set in GHIINT1STS). When the RX ram full condition occurs,
123a certain bug/feature is triggered that has to be specially handled.
124This section describes the special handling for this condition.
125
126When the OS finally has a chance to run, it will empty out the RX ring.
127In particular, it will clear the descriptor on which the hardware had
128stopped. However, once the hardware has decided that a certain
129descriptor is invalid, it will not restart at that descriptor; instead
130it will restart at the next descr. This potentially will lead to a
131deadlock condition, as the tail pointer will be pointing at this descr,
132which, from the OS point of view, is empty; the OS will be waiting for
133this descr to be filled. However, the hardware has skipped this descr,
134and is filling the next descrs. Since the OS doesn't see this, there
135is a potential deadlock, with the OS waiting for one descr to fill,
136while the hardware is waiting for a different set of descrs to become
137empty.
138
139A call to show_rx_chain() at this point indicates the nature of the
140problem. A typical print when the network is hung shows the following:
141
142net eth1: Spider RX RAM full, incoming packets might be discarded!
143net eth1: Total number of descrs=256
144net eth1: Chain tail located at descr=255
145net eth1: Chain head is at 255
146net eth1: HW curr desc (GDACTDPA) is at 0
147net eth1: Have 1 descrs with stat=xa0800000
148net eth1: HW next desc (GDACNEXTDA) is at 1
149net eth1: Have 127 descrs with stat=x40800101
150net eth1: Have 1 descrs with stat=x40800001
151net eth1: Have 126 descrs with stat=x40800101
152net eth1: Last 1 descrs with stat=xa0800000
153
154Both the tail and head pointers are pointing at descr 255, which is
155marked xa... which is "empty". Thus, from the OS point of view, there
156is nothing to be done. In particular, there is the implicit assumption
157that everything in front of the "empty" descr must surely also be empty,
158as explained in the last section. The OS is waiting for descr 255 to
159become non-empty, which, in this case, will never happen.
160
161The HW pointer is at descr 0. This descr is marked 0x4.. or "full".
162Since it's already full, the hardware can do nothing more, and thus has
163halted processing. Notice that descrs 0 through 253 are all marked
164"full", while descr 254 and 255 are empty. (The "Last 1 descrs" is
165descr 254, since tail was at 255.) Thus, the system is deadlocked,
166and there can be no forward progress; the OS thinks there's nothing
167to do, and the hardware has nowhere to put incoming data.
168
169This bug/feature is worked around with the spider_net_resync_head_ptr()
170routine. When the driver receives RX interrupts, but an examination
171of the RX chain seems to show it is empty, then it is probable that
172the hardware has skipped a descr or two (sometimes dozens under heavy
173network conditions). The spider_net_resync_head_ptr() subroutine will
174search the ring for the next full descr, and the driver will resume
175operations there. Since this will leave "holes" in the ring, there
176is also a spider_net_resync_tail_ptr() that will skip over such holes.
177
178As of this writing, the spider_net_resync() strategy seems to work very
179well, even under heavy network loads.
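
The search step of the workaround can be sketched roughly as follows.
This is not the code of spider_net_resync_head_ptr(); the helper name
find_next_full, the state enum and the array representation of the ring
are assumptions made for illustration only.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical descr states, mirroring the three states described earlier. */
enum descr_state { DESCR_EMPTY, DESCR_FULL, DESCR_NOT_IN_USE };

/*
 * Hypothetical helper: starting at 'start', scan forward with wrap-around
 * for the first full descr. Returns its index, or 'start' if no descr in
 * the ring is full.
 */
static size_t find_next_full(const enum descr_state *ring, size_t size,
			     size_t start)
{
	assert(size > 0 && start < size);
	for (size_t i = 0; i < size; i++) {
		size_t idx = (start + i) % size;

		if (ring[idx] == DESCR_FULL)
			return idx;
	}
	return start;
}
```

Applied to a ring shaped like the hung example above (tail on an empty
descr, full descrs starting just after it), the helper returns the index
of the first full descr, which is where processing would resume.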
180
181
182The TX ring
183===========
184The TX ring uses a low-watermark interrupt scheme to make sure that
185the TX queue is appropriately serviced for large packet sizes.
186
187For packet sizes greater than about 1KBytes, the kernel can fill
188the TX ring quicker than the device can drain it. Once the ring
189is full, the netdev is stopped. When there is room in the ring,
190the netdev needs to be reawakened, so that more TX packets are placed
191in the ring. The hardware can empty the ring about four times per jiffy,
192so it's not appropriate to wait for the poll routine to refill, since
193the poll routine runs only once per jiffy. The low-watermark mechanism
194marks a descr about 1/4th of the way from the bottom of the queue, so
195that an interrupt is generated when the descr is processed. This
196interrupt wakes up the netdev, which can then refill the queue.
197For large packets, this mechanism generates a relatively small number
198of interrupts, about 1K/sec. For smaller packets, this will drop to zero
199interrupts, as the hardware can empty the queue faster than the kernel
200can fill it.
201
202
203 ======= END OF DOCUMENT ========
204
diff --git a/Documentation/networking/xfrm_sysctl.txt b/Documentation/networking/xfrm_sysctl.txt
new file mode 100644
index 000000000000..5bbd16792fe1
--- /dev/null
+++ b/Documentation/networking/xfrm_sysctl.txt
@@ -0,0 +1,4 @@
1/proc/sys/net/core/xfrm_* Variables:
2
3xfrm_acq_expires - INTEGER
4 default 30 - hard timeout in seconds for acquire requests
diff --git a/Documentation/pci.txt b/Documentation/pci.txt
index d38261b67905..7754f5aea4e9 100644
--- a/Documentation/pci.txt
+++ b/Documentation/pci.txt
@@ -113,9 +113,6 @@ initialization with a pointer to a structure describing the driver
113 (Please see Documentation/power/pci.txt for descriptions 113 (Please see Documentation/power/pci.txt for descriptions
114 of PCI Power Management and the related functions.) 114 of PCI Power Management and the related functions.)
115 115
116 enable_wake Enable device to generate wake events from a low power
117 state.
118
119 shutdown Hook into reboot_notifier_list (kernel/sys.c). 116 shutdown Hook into reboot_notifier_list (kernel/sys.c).
120 Intended to stop any idling DMA operations. 117 Intended to stop any idling DMA operations.
121 Useful for enabling wake-on-lan (NIC) or changing 118 Useful for enabling wake-on-lan (NIC) or changing
@@ -299,7 +296,10 @@ If the PCI device can use the PCI Memory-Write-Invalidate transaction,
299call pci_set_mwi(). This enables the PCI_COMMAND bit for Mem-Wr-Inval 296call pci_set_mwi(). This enables the PCI_COMMAND bit for Mem-Wr-Inval
300and also ensures that the cache line size register is set correctly. 297and also ensures that the cache line size register is set correctly.
301Check the return value of pci_set_mwi() as not all architectures 298Check the return value of pci_set_mwi() as not all architectures
302or chip-sets may support Memory-Write-Invalidate. 299or chip-sets may support Memory-Write-Invalidate. Alternatively,
300if Mem-Wr-Inval would be nice to have but is not required, call
301pci_try_set_mwi() to have the system do its best effort at enabling
302Mem-Wr-Inval.
303 303
304 304
3053.2 Request MMIO/IOP resources 3053.2 Request MMIO/IOP resources
diff --git a/Documentation/power/pci.txt b/Documentation/power/pci.txt
index e00b099a4b86..dd8fe43888d3 100644
--- a/Documentation/power/pci.txt
+++ b/Documentation/power/pci.txt
@@ -164,7 +164,6 @@ struct pci_driver:
164 164
165 int (*suspend) (struct pci_dev *dev, pm_message_t state); 165 int (*suspend) (struct pci_dev *dev, pm_message_t state);
166 int (*resume) (struct pci_dev *dev); 166 int (*resume) (struct pci_dev *dev);
167 int (*enable_wake) (struct pci_dev *dev, pci_power_t state, int enable);
168 167
169 168
170suspend 169suspend
@@ -251,42 +250,6 @@ The driver should update the current_state field in its pci_dev structure in
251this function, except for PM-capable devices when pci_set_power_state is used. 250this function, except for PM-capable devices when pci_set_power_state is used.
252 251
253 252
254enable_wake
255-----------
256
257Usage:
258
259if (dev->driver && dev->driver->enable_wake)
260 dev->driver->enable_wake(dev,state,enable);
261
262This callback is generally only relevant for devices that support the PCI PM
263spec and have the ability to generate a PME# (Power Management Event Signal)
264to wake the system up. (However, it is possible that a device may support
265some non-standard way of generating a wake event on sleep.)
266
267Bits 15:11 of the PMC (Power Mgmt Capabilities) Register in a device's
268PM Capabilities describe what power states the device supports generating a
269wake event from:
270
271+------------------+
272| Bit | State |
273+------------------+
274| 11 | D0 |
275| 12 | D1 |
276| 13 | D2 |
277| 14 | D3hot |
278| 15 | D3cold |
279+------------------+
280
281A device can use this to enable wake events:
282
283 pci_enable_wake(dev,state,enable);
284
285Note that to enable PME# from D3cold, a value of 4 should be passed to
286pci_enable_wake (since it uses an index into a bitmask). If a driver gets
287a request to enable wake events from D3, two calls should be made to
288pci_enable_wake (one for both D3hot and D3cold).
289
290 253
291A reference implementation 254A reference implementation
292------------------------- 255-------------------------
diff --git a/Documentation/power/swsusp.txt b/Documentation/power/swsusp.txt
index 5b8d6953f05e..152b510d1bbb 100644
--- a/Documentation/power/swsusp.txt
+++ b/Documentation/power/swsusp.txt
@@ -393,6 +393,9 @@ safest thing is to unmount all filesystems on removable media (such USB,
393Firewire, CompactFlash, MMC, external SATA, or even IDE hotplug bays) 393Firewire, CompactFlash, MMC, external SATA, or even IDE hotplug bays)
394before suspending; then remount them after resuming. 394before suspending; then remount them after resuming.
395 395
396There is a work-around for this problem. For more information, see
397Documentation/usb/persist.txt.
398
396Q: I upgraded the kernel from 2.6.15 to 2.6.16. Both kernels were 399Q: I upgraded the kernel from 2.6.15 to 2.6.16. Both kernels were
397compiled with the similar configuration files. Anyway I found that 400compiled with the similar configuration files. Anyway I found that
398suspend to disk (and resume) is much slower on 2.6.16 compared to 401suspend to disk (and resume) is much slower on 2.6.16 compared to
diff --git a/Documentation/power_supply_class.txt b/Documentation/power_supply_class.txt
new file mode 100644
index 000000000000..9758cf433c06
--- /dev/null
+++ b/Documentation/power_supply_class.txt
@@ -0,0 +1,167 @@
1Linux power supply class
2========================
3
4Synopsis
5~~~~~~~~
6Power supply class used to represent battery, UPS, AC or DC power supply
7properties to user-space.
8
9It defines core set of attributes, which should be applicable to (almost)
10every power supply out there. Attributes are available via sysfs and uevent
11interfaces.
12
13Each attribute has well defined meaning, up to unit of measure used. While
14the attributes provided are believed to be universally applicable to any
15power supply, specific monitoring hardware may not be able to provide them
16all, so any of them may be skipped.
17
18Power supply class is extensible, and allows drivers to define their own attributes.
19The core attribute set is subject to the standard Linux evolution (i.e.
20if it will be found that some attribute is applicable to many power supply
21types or their drivers, it can be added to the core set).
22
23It also integrates with LED framework, for the purpose of providing
24typically expected feedback of battery charging/fully charged status and
25AC/USB power supply online status. (Note that specific details of the
26indication (including whether to use it at all) are fully controllable by
27user and/or specific machine defaults, per design principles of LED
28framework).
29
30
31Attributes/properties
32~~~~~~~~~~~~~~~~~~~~~
33Power supply class has a predefined set of attributes; this eliminates code
34duplication across drivers. Power supply class insists on reusing its
35predefined attributes *and* their units.
36
37So, userspace gets predictable set of attributes and their units for any
38kind of power supply, and can process/present them to a user in consistent
39manner. Results for different power supplies and machines are also directly
40comparable.
41
42See drivers/power/ds2760_battery.c and drivers/power/pda_power.c for the
43example how to declare and handle attributes.
44
45
46Units
47~~~~~
48Quoting include/linux/power_supply.h:
49
50 All voltages, currents, charges, energies, time and temperatures in µV,
51 µA, µAh, µWh, seconds and tenths of degree Celsius unless otherwise
52 stated. It's driver's job to convert its raw values to units in which
53 this class operates.
54
55
56Attributes/properties detailed
57~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
58
59~ ~ ~ ~ ~ ~ ~ Charge/Energy/Capacity - how to not confuse ~ ~ ~ ~ ~ ~ ~
60~ ~
61~ Because both "charge" (µAh) and "energy" (µWh) represents "capacity" ~
62~ of battery, this class distinguish these terms. Don't mix them! ~
63~ ~
64~ CHARGE_* attributes represents capacity in µAh only. ~
65~ ENERGY_* attributes represents capacity in µWh only. ~
66~ CAPACITY attribute represents capacity in *percents*, from 0 to 100. ~
67~ ~
68~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~
69
70Postfixes:
71_AVG - *hardware* averaged value, use it if your hardware is really able to
72report averaged values.
73_NOW - momentary/instantaneous values.
74
75STATUS - this attribute represents operating status (charging, full,
76discharging (i.e. powering a load), etc.). This corresponds to
77BATTERY_STATUS_* values, as defined in battery.h.
78
79HEALTH - represents health of the battery, values correspond to
80POWER_SUPPLY_HEALTH_*, defined in battery.h.
81
82VOLTAGE_MAX_DESIGN, VOLTAGE_MIN_DESIGN - design values for maximal and
83minimal power supply voltages. Maximal/minimal means values of voltages
84when battery considered "full"/"empty" at normal conditions. Yes, there is
85no direct relation between voltage and battery capacity, but some dumb
86batteries use voltage for very approximated calculation of capacity.
87Battery driver also can use this attribute just to inform userspace
88about maximal and minimal voltage thresholds of a given battery.
89
90CHARGE_FULL_DESIGN, CHARGE_EMPTY_DESIGN - design charge values, when
91battery considered full/empty.
92
93ENERGY_FULL_DESIGN, ENERGY_EMPTY_DESIGN - same as above but for energy.
94
95CHARGE_FULL, CHARGE_EMPTY - These attributes mean "last remembered value
96of charge when battery became full/empty". It also could mean "value of
97charge when battery considered full/empty at given conditions (temperature,
98age)". I.e. these attributes represent real thresholds, not design values.
99
100ENERGY_FULL, ENERGY_EMPTY - same as above but for energy.
101
102CAPACITY - capacity in percents.
103CAPACITY_LEVEL - capacity level. This corresponds to
104POWER_SUPPLY_CAPACITY_LEVEL_*.
105
106TEMP - temperature of the power supply.
107TEMP_AMBIENT - ambient temperature.
108
109TIME_TO_EMPTY - seconds left for battery to be considered empty (i.e.
110while battery powers a load)
111TIME_TO_FULL - seconds left for battery to be considered full (i.e.
112while battery is charging)
113
114
115Battery <-> external power supply interaction
116~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
117Often power supplies are acting as supplies and supplicants at the same
118time. Batteries are good example. So, batteries usually care if they're
119externally powered or not.
120
121For that case, power supply class implements notification mechanism for
122batteries.
123
124External power supply (AC) lists supplicants (batteries) names in
125"supplied_to" struct member, and each power_supply_changed() call
126issued by external power supply will notify supplicants via
127external_power_changed callback.
128
129
130QA
131~~
132Q: Where is POWER_SUPPLY_PROP_XYZ attribute?
133A: If you cannot find attribute suitable for your driver needs, feel free
134 to add it and send patch along with your driver.
135
136 The attributes available currently are the ones currently provided by the
137 drivers written.
138
139 Good candidates to add in future: model/part#, cycle_time, manufacturer,
140 etc.
141
142
143Q: I have some very specific attribute (e.g. battery color), should I add
144 this attribute to standard ones?
145A: Most likely, no. Such attribute can be placed in the driver itself, if
146 it is useful. Of course, if the attribute in question applicable to
147 large set of batteries, provided by many drivers, and/or comes from
148 some general battery specification/standard, it may be a candidate to
149 be added to the core attribute set.
150
151
152Q: Suppose, my battery monitoring chip/firmware does not provide capacity
153 in percents, but provides charge_{now,full,empty}. Should I calculate
154 percentage capacity manually, inside the driver, and register CAPACITY
155 attribute? The same question about time_to_empty/time_to_full.
156A: Most likely, no. This class is designed to export properties which are
157 directly measurable by the specific hardware available.
158
159 Inferring not available properties using some heuristics or mathematical
160 model is not subject of work for a battery driver. Such functionality
161 should be factored out, and in fact, apm_power, the driver to serve
162 legacy APM API on top of power supply class, uses a simple heuristic of
163 approximating remaining battery capacity based on its charge, current,
164 voltage and so on. But full-fledged battery model is likely not subject
165 for kernel at all, as it would require floating point calculation to deal
166 with things like differential equations and Kalman filters. This is
167 better handled by batteryd/libbattery, yet to be written.
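
As an illustration of the last answer, a user-space consumer could derive
a percentage from the CHARGE_* values itself. The sketch below is an
assumption about how such a consumer might do it, not part of the class:
the function name charge_to_percent is hypothetical, inputs are taken to
be in µAh as the Units section states, and the result is clamped to the
0..100 range used by CAPACITY.

```c
#include <assert.h>

/*
 * Hypothetical user-space helper: convert charge readings (in µAh) into a
 * percentage. Returns -1 if the full/empty thresholds are unusable.
 */
static int charge_to_percent(long long now_uah, long long full_uah,
			     long long empty_uah)
{
	long long span = full_uah - empty_uah;
	long long pct;

	if (span <= 0)
		return -1;	/* full must be above empty */

	pct = (now_uah - empty_uah) * 100 / span;

	/* Clamp: a reading may sit slightly outside the thresholds. */
	if (pct < 0)
		pct = 0;
	if (pct > 100)
		pct = 100;

	assert(pct >= 0 && pct <= 100);
	return (int)pct;
}
```

Keeping this arithmetic outside the driver matches the answer above: the
driver exports what the hardware measures, and the inference is left to
the consumer.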
diff --git a/Documentation/powerpc/booting-without-of.txt b/Documentation/powerpc/booting-without-of.txt
index b49ce169a63a..d42d98107d49 100644
--- a/Documentation/powerpc/booting-without-of.txt
+++ b/Documentation/powerpc/booting-without-of.txt
@@ -1,7 +1,6 @@
1 Booting the Linux/ppc kernel without Open Firmware 1 Booting the Linux/ppc kernel without Open Firmware
2 -------------------------------------------------- 2 --------------------------------------------------
3 3
4
5(c) 2005 Benjamin Herrenschmidt <benh at kernel.crashing.org>, 4(c) 2005 Benjamin Herrenschmidt <benh at kernel.crashing.org>,
6 IBM Corp. 5 IBM Corp.
7(c) 2005 Becky Bruce <becky.bruce at freescale.com>, 6(c) 2005 Becky Bruce <becky.bruce at freescale.com>,
@@ -9,6 +8,62 @@
9(c) 2006 MontaVista Software, Inc. 8(c) 2006 MontaVista Software, Inc.
10 Flash chip node definition 9 Flash chip node definition
11 10
11Table of Contents
12=================
13
14 I - Introduction
15 1) Entry point for arch/powerpc
16 2) Board support
17
18 II - The DT block format
19 1) Header
20 2) Device tree generalities
21 3) Device tree "structure" block
22 4) Device tree "strings" block
23
24 III - Required content of the device tree
25 1) Note about cells and address representation
26 2) Note about "compatible" properties
27 3) Note about "name" properties
28 4) Note about node and property names and character set
29 5) Required nodes and properties
30 a) The root node
31 b) The /cpus node
32 c) The /cpus/* nodes
33 d) the /memory node(s)
34 e) The /chosen node
35 f) the /soc<SOCname> node
36
37 IV - "dtc", the device tree compiler
38
39 V - Recommendations for a bootloader
40
41 VI - System-on-a-chip devices and nodes
42 1) Defining child nodes of an SOC
43 2) Representing devices without a current OF specification
44 a) MDIO IO device
45 c) PHY nodes
46 b) Gianfar-compatible ethernet nodes
47 d) Interrupt controllers
48 e) I2C
49 f) Freescale SOC USB controllers
50 g) Freescale SOC SEC Security Engines
51 h) Board Control and Status (BCSR)
52 i) Freescale QUICC Engine module (QE)
53 j) Flash chip nodes
54
55 VII - Specifying interrupt information for devices
56 1) interrupts property
57 2) interrupt-parent property
58 3) OpenPIC Interrupt Controllers
59 4) ISA Interrupt Controllers
60
61 Appendix A - Sample SOC node for MPC8540
62
63
64Revision Information
65====================
66
12 May 18, 2005: Rev 0.1 - Initial draft, no chapter III yet. 67 May 18, 2005: Rev 0.1 - Initial draft, no chapter III yet.
13 68
14 May 19, 2005: Rev 0.2 - Add chapter III and bits & pieces here or 69 May 19, 2005: Rev 0.2 - Add chapter III and bits & pieces here or
@@ -1687,7 +1742,7 @@ platforms are moved over to use the flattened-device-tree model.
1687 }; 1742 };
1688 }; 1743 };
1689 1744
1690 g) Flash chip nodes 1745 j) Flash chip nodes
1691 1746
1692 Flash chips (Memory Technology Devices) are often used for solid state 1747 Flash chips (Memory Technology Devices) are often used for solid state
1693 file systems on embedded devices. 1748 file systems on embedded devices.
diff --git a/Documentation/sched-design-CFS.txt b/Documentation/sched-design-CFS.txt
new file mode 100644
index 000000000000..16feebb7bdc0
--- /dev/null
+++ b/Documentation/sched-design-CFS.txt
@@ -0,0 +1,119 @@
1
2This is the CFS scheduler.
3
480% of CFS's design can be summed up in a single sentence: CFS basically
5models an "ideal, precise multi-tasking CPU" on real hardware.
6
7"Ideal multi-tasking CPU" is a (non-existent :-)) CPU that has 100%
8physical power and which can run each task at precise equal speed, in
9parallel, each at 1/nr_running speed. For example: if there are 2 tasks
10running then it runs each at 50% physical power - totally in parallel.
11
12On real hardware, we can run only a single task at once, so while that
13one task runs, the other tasks that are waiting for the CPU are at a
14disadvantage - the current task gets an unfair amount of CPU time. In
15CFS this fairness imbalance is expressed and tracked via the per-task
16p->wait_runtime (nanosec-unit) value. "wait_runtime" is the amount of
17time the task should now run on the CPU for it to become completely fair
18and balanced.
19
20( small detail: on 'ideal' hardware, the p->wait_runtime value would
21 always be zero - no task would ever get 'out of balance' from the
22 'ideal' share of CPU time. )
23
24CFS's task picking logic is based on this p->wait_runtime value and it
25is thus very simple: it always tries to run the task with the largest
26p->wait_runtime value. In other words, CFS tries to run the task with
27the 'gravest need' for more CPU time. So CFS always tries to split up
28CPU time between runnable tasks as close to 'ideal multitasking
29hardware' as possible.
30
31Most of the rest of CFS's design just falls out of this really simple
32concept, with a few add-on embellishments like nice levels,
33multiprocessing and various algorithm variants to recognize sleepers.
34
35In practice it works like this: the system runs a task a bit, and when
36the task schedules (or a scheduler tick happens) the task's CPU usage is
37'accounted for': the (small) time it just spent using the physical CPU
38is deducted from p->wait_runtime. [minus the 'fair share' it would have
39gotten anyway]. Once p->wait_runtime gets low enough so that another
40task becomes the 'leftmost task' of the time-ordered rbtree it maintains
41(plus a small amount of 'granularity' distance relative to the leftmost
42task so that we do not over-schedule tasks and trash the cache) then the
43new leftmost task is picked and the current task is preempted.
44
45The rq->fair_clock value tracks the 'CPU time a runnable task would have
46fairly gotten, had it been runnable during that time'. So by using
47rq->fair_clock values we can accurately timestamp and measure the
48'expected CPU time' a task should have gotten. All runnable tasks are
49sorted in the rbtree by the "rq->fair_clock - p->wait_runtime" key, and
50CFS picks the 'leftmost' task and sticks to it. As the system progresses
51forwards, newly woken tasks are put into the tree more and more to the
52right - slowly but surely giving a chance for every task to become the
53'leftmost task' and thus get on the CPU within a deterministic amount of
54time.
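
The picking rule can be sketched in a few lines of user-space C. This is
a simplified model under stated assumptions, not the kernel code: struct
task and pick_leftmost are hypothetical names, a linear scan stands in
for the rbtree, and a single snapshot of the fair clock is used for all
tasks, so the smallest key simply corresponds to the largest
wait_runtime.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a runnable task; not the kernel's task_struct. */
struct task {
	long long wait_runtime;	/* ns of CPU time the task is still owed */
};

/* The sort key named in the text: rq->fair_clock - p->wait_runtime. */
static long long task_key(long long fair_clock, const struct task *p)
{
	return fair_clock - p->wait_runtime;
}

/* Linear scan standing in for "take the leftmost node of the rbtree". */
static size_t pick_leftmost(long long fair_clock, const struct task *tasks,
			    size_t n)
{
	size_t best = 0;

	assert(n > 0);
	for (size_t i = 1; i < n; i++) {
		if (task_key(fair_clock, &tasks[i]) <
		    task_key(fair_clock, &tasks[best]))
			best = i;
	}
	return best;
}
```

With wait_runtime values of 1000, 5000 and -2000 ns, the second task has
the smallest key and is picked, which is the task with the 'gravest need'
in the wording above.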
55
56Some implementation details:
57
58 - the introduction of Scheduling Classes: an extensible hierarchy of
59 scheduler modules. These modules encapsulate scheduling policy
60 details and are handled by the scheduler core without the core
61 code assuming too much about them.
62
63 - sched_fair.c implements the 'CFS desktop scheduler': it is a
64 replacement for the vanilla scheduler's SCHED_OTHER interactivity
65 code.
66
67 I'd like to give credit to Con Kolivas for the general approach here:
68 he has proven via RSDL/SD that 'fair scheduling' is possible and that
69 it results in better desktop scheduling. Kudos Con!
70
71 The CFS patch uses a completely different approach and implementation
72 from RSDL/SD. My goal was to make CFS's interactivity quality exceed
73 that of RSDL/SD, which is a high standard to meet :-) Testing
74 feedback is welcome to decide this one way or another. [ and, in any
75 case, all of SD's logic could be added via a kernel/sched_sd.c module
76 as well, if Con is interested in such an approach. ]
77
78 CFS's design is quite radical: it does not use runqueues, it uses a
79 time-ordered rbtree to build a 'timeline' of future task execution,
80 and thus has no 'array switch' artifacts (by which both the vanilla
81 scheduler and RSDL/SD are affected).
82
83 CFS uses nanosecond granularity accounting and does not rely on any
84 jiffies or other HZ detail. Thus the CFS scheduler has no notion of
85 'timeslices' and has no heuristics whatsoever. There is only one
86 central tunable:
87
88 /proc/sys/kernel/sched_granularity_ns
89
90 which can be used to tune the scheduler from 'desktop' (low
91 latencies) to 'server' (good batching) workloads. It defaults to a
92 setting suitable for desktop workloads. SCHED_BATCH is handled by the
93 CFS scheduler module too.
94
95 Due to its design, the CFS scheduler is not prone to any of the
96 'attacks' that exist today against the heuristics of the stock
97 scheduler: fiftyp.c, thud.c, chew.c, ring-test.c, massive_intr.c all
98 work fine and do not impact interactivity and produce the expected
99 behavior.
100
101 The CFS scheduler has a much stronger handling of nice levels and
102 SCHED_BATCH: both types of workloads should be isolated much more
103 aggressively than under the vanilla scheduler.
104
105 ( another detail: due to nanosec accounting and timeline sorting,
106 sched_yield() support is very simple under CFS, and in fact under
107 CFS sched_yield() behaves much better than under any other
108 scheduler I have tested so far. )
109
110 - sched_rt.c implements SCHED_FIFO and SCHED_RR semantics, in a simpler
111 way than the vanilla scheduler does. It uses 100 runqueues (for all
112 100 RT priority levels, instead of 140 in the vanilla scheduler)
113 and it needs no expired array.
114
115 - reworked/sanitized SMP load-balancing: the runqueue-walking
116 assumptions are gone from the load-balancing code now, and
117 iterators of the scheduling modules are used. The balancing code got
118 quite a bit simpler as a result.
119
diff --git a/Documentation/sound/alsa/ALSA-Configuration.txt b/Documentation/sound/alsa/ALSA-Configuration.txt
index 57b878cc393c..355ff0a2bb7c 100644
--- a/Documentation/sound/alsa/ALSA-Configuration.txt
+++ b/Documentation/sound/alsa/ALSA-Configuration.txt
@@ -917,6 +917,7 @@ Prior to version 0.9.0rc4 options had a 'snd_' prefix. This was removed.
917 ref Reference board, base config 917 ref Reference board, base config
918 m2-2 Some Gateway MX series laptops 918 m2-2 Some Gateway MX series laptops
919 m6 Some Gateway NX series laptops 919 m6 Some Gateway NX series laptops
920 pa6 Gateway NX860 series
920 921
921 STAC9227/9228/9229/927x 922 STAC9227/9228/9229/927x
922 ref Reference board 923 ref Reference board
diff --git a/Documentation/sysctl/vm.txt b/Documentation/sysctl/vm.txt
index 1d192565e182..8cfca173d4bc 100644
--- a/Documentation/sysctl/vm.txt
+++ b/Documentation/sysctl/vm.txt
@@ -31,6 +31,7 @@ Currently, these files are in /proc/sys/vm:
31- min_unmapped_ratio 31- min_unmapped_ratio
32- min_slab_ratio 32- min_slab_ratio
33- panic_on_oom 33- panic_on_oom
34- mmap_min_addr
34 35
35============================================================== 36==============================================================
36 37
@@ -216,3 +217,17 @@ above-mentioned.
 The default value is 0.
 1 and 2 are for failover of clustering. Please select either
 according to your policy of failover.
+
+==============================================================
+
+mmap_min_addr
+
+This file indicates the amount of address space which a user process will
+be restricted from mmapping. Since kernel null dereference bugs could
+accidentally operate based on the information in the first couple of pages
+of memory, userspace processes should not be allowed to write to them. By
+default this value is set to 0 and no protections will be enforced by the
+security module. Setting this value to something like 64k will allow the
+vast majority of applications to work correctly and provide defense in depth
+against future potential kernel bugs.
+
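The mmap_min_addr text above can be illustrated from userspace with a minimal Python sketch. It assumes the file lives at /proc/sys/vm/mmap_min_addr (the directory is named in the hunk above) and that it holds a single decimal byte count; the helper names are hypothetical and exist only for this example.

```python
def parse_mmap_min_addr(text):
    """Parse the file contents (assumed to be one decimal byte count)."""
    return int(text.strip())


def is_restricted(addr, min_addr):
    """Return True if an mmap at 'addr' falls below the protected limit."""
    return 0 <= addr < min_addr


def read_mmap_min_addr(path="/proc/sys/vm/mmap_min_addr"):
    """Read the current limit; the default path is an assumption."""
    with open(path) as f:
        return parse_mmap_min_addr(f.read())
```

With the "something like 64k" setting suggested above, the limit is 65536 bytes, so address 0 would be restricted while address 65536 would not. With the default of 0, nothing is restricted.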
diff --git a/Documentation/sysfs-rules.txt b/Documentation/sysfs-rules.txt
new file mode 100644
index 000000000000..42861bb0bc9b
--- /dev/null
+++ b/Documentation/sysfs-rules.txt
@@ -0,0 +1,166 @@
1Rules on how to access information in the Linux kernel sysfs
2
3The kernel-exported sysfs exposes internal kernel implementation details
4and depends on internal kernel structures and layout. It is agreed upon
5by the kernel developers that the Linux kernel does not provide a stable
6internal API. As sysfs is a direct export of kernel internal
7structures, the sysfs interface cannot provide a stable interface either;
8it may always change along with internal kernel changes.
9
10To minimize the risk of breaking users of sysfs, which are in most cases
11low-level userspace applications, with a new kernel release, the users
12of sysfs must follow some rules to use an as-abstract-as-possible way to
13access this filesystem. The current udev and HAL programs already
14implement this and users are encouraged to plug, if possible, into the
15abstractions these programs provide instead of accessing sysfs
16directly.
17
18But if you really do want or need to access sysfs directly, please follow
19the rules below; your programs should then work with future
20versions of the sysfs interface.
21
22- Do not use libsysfs
23 It makes assumptions about sysfs which are not true. Its API does not
24 offer any abstraction; it exposes all the kernel driver-core
25 implementation details in its own API. Therefore it is not better than
26 reading directories and opening the files yourself.
27 Also, it is not actively maintained, in the sense of reflecting the
28 current kernel development. The goal of providing a stable interface
29 to sysfs has failed; it causes more problems than it solves. It
30 violates many of the rules in this document.
31
32- sysfs is always at /sys
33 Parsing /proc/mounts is a waste of time. Other mount points are a
34 system configuration bug you should not try to solve. For test cases,
35 possibly support a SYSFS_PATH environment variable to override the
36 application's behavior, but never try to search for sysfs. Never try
37 to mount it unless you are an early boot script.
38
39- devices are only "devices"
40 There is no such thing as class-, bus-, or physical devices,
41 interfaces, and such that you can rely on in userspace. Everything is
42 just simply a "device". Class-, bus-, physical, ... types are just
43 kernel implementation details, which should not be expected by
44 applications that look for devices in sysfs.
45
46 The properties of a device are:
47 o devpath (/devices/pci0000:00/0000:00:1d.1/usb2/2-2/2-2:1.0)
48 - identical to the DEVPATH value in the event sent from the kernel
49 at device creation and removal
50 - the unique key to the device at that point in time
51 - the kernel's path to the device directory without the leading
52 /sys, and always starting with a slash
53 - all elements of a devpath must be real directories. Symlinks
54 pointing to /sys/devices must always be resolved to their real
55 target, and the target path must be used to access the device.
56 That way the devpath to the device matches the devpath of the
57 kernel used at event time.
58 - using or exposing symlink values as elements in a devpath string
59 is a bug in the application
60
61 o kernel name (sda, tty, 0000:00:1f.2, ...)
62 - a directory name, identical to the last element of the devpath
63 - applications need to handle spaces and characters like '!' in
64 the name
65
66 o subsystem (block, tty, pci, ...)
67 - simple string, never a path or a link
68 - retrieved by reading the "subsystem"-link and using only the
69 last element of the target path
70
71 o driver (tg3, ata_piix, uhci_hcd)
72 - a simple string, which may contain spaces, never a path or a
73 link
74 - it is retrieved by reading the "driver"-link and using only the
75 last element of the target path
76 - devices which do not have a "driver"-link just do not have a
77 driver; copying the driver value in a child device context is a
78 bug in the application
79
80 o attributes
81 - the files in the device directory or files below subdirectories
82 of the same device directory
83 - accessing attributes reached by a symlink pointing to another device,
84 like the "device"-link, is a bug in the application
85
86 Everything else is just a kernel driver-core implementation detail,
87 that should not be assumed to be stable across kernel releases.
88
89- Properties of parent devices never belong in a child device.
90 Always look at the parent devices themselves for determining device
91 context properties. If the device 'eth0' or 'sda' does not have a
92 "driver"-link, then this device does not have a driver. Its value is empty.
93 Never copy any property of the parent-device into a child-device. Parent
94 device-properties may change dynamically without any notice to the
95 child device.
96
97- Hierarchy in a single device-tree
98 There is only one valid place in sysfs where hierarchy can be examined
99 and this is below: /sys/devices.
100 It is planned that all device directories will end up in the tree
101 below this directory.
102
103- Classification by subsystem
104 There are currently three places for classification of devices:
105 /sys/block, /sys/class and /sys/bus. It is planned that these will
106 not contain any device-directories themselves, but only flat lists of
107 symlinks pointing to the unified /sys/devices tree.
108 All three places have completely different rules on how to access
109 device information. It is planned to merge all three
110 classification-directories into one place at /sys/subsystem,
111 following the layout of the bus-directories. All buses and
112 classes, including the converted block-subsystem, will show up
113 there.
114 The devices belonging to a subsystem will create a symlink in the
115 "devices" directory at /sys/subsystem/<name>/devices.
116
117 If /sys/subsystem exists, /sys/bus, /sys/class and /sys/block can be
118 ignored. If it does not exist, you always have to scan all three
119 places, as the kernel is free to move a subsystem from one place to
120 the other, as long as the devices are still reachable by the same
121 subsystem name.
122
123 Assuming that /sys/class/<subsystem> and /sys/bus/<subsystem>, or
124 /sys/block and /sys/class/block, are not interchangeable is a bug in
125 the application.
126
127- Block
128 The converted block-subsystem at /sys/class/block, or
129 /sys/subsystem/block will contain the links for disks and partitions
130 at the same level, never in a hierarchy. Assuming the block-subsystem to
131 contain only disks and not partition-devices in the same flat list is
132 a bug in the application.
133
134- "device"-link and <subsystem>:<kernel name>-links
135 Never depend on the "device"-link. The "device"-link is a workaround
136 for the old layout, where class-devices are not created in
137 /sys/devices/ like the bus-devices. If the link-resolving of a
138 device-directory does not end in /sys/devices/, you can use the
139 "device"-link to find the parent devices in /sys/devices/. That is the
140 single valid use of the "device"-link, it must never appear in any
141 path as an element. Assuming the existence of the "device"-link for
142 a device in /sys/devices/ is a bug in the application.
143 Accessing /sys/class/net/eth0/device is a bug in the application.
144
145 Never depend on the class-specific links back to the /sys/class
146 directory. These links are also a workaround for the design mistake
147 that class-devices are not created in /sys/devices. If a device
148 directory does not contain directories for child devices, these links
149 may be used to find the child devices in /sys/class. That is the single
150 valid use of these links, they must never appear in any path as an
151 element. Assuming the existence of these links for devices which are
152 real child device directories in the /sys/devices tree, is a bug in
153 the application.
154
155 It is planned to remove all these links when all class-device
156 directories live in /sys/devices.
157
158- Position of devices along device chain can change.
159 Never depend on a specific parent device position in the devpath,
160 or the chain of parent devices. The kernel is free to insert devices into
161 the chain. You must always request the parent device you are looking for
162 by its subsystem value. You need to walk up the chain until you find
163 the device that matches the expected subsystem. Depending on a specific
164 position of a parent device, or exposing relative paths, using "../" to
165 access the chain of parents, is a bug in the application.
166
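The rules above on devpath, the "subsystem"-link, and walking the parent chain can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the sysfs root defaults to /sys with the SYSFS_PATH override that the rules suggest for test cases, and error handling is minimal.

```python
import os

# /sys by default; SYSFS_PATH is the test-only override the rules mention.
SYSFS_ROOT = os.environ.get("SYSFS_PATH", "/sys")


def devpath(sys_root, path):
    """Resolve all symlinks and return the devpath (no sysfs root prefix)."""
    root = os.path.realpath(sys_root)
    real = os.path.realpath(path)
    if not real.startswith(os.path.join(root, "devices") + os.sep):
        raise ValueError("not a device directory below the devices tree")
    return real[len(root):]


def kernel_name(dp):
    """The kernel name is the last element of the devpath."""
    return os.path.basename(dp)


def subsystem(sys_root, dp):
    """Last element of the "subsystem"-link target, or None if no link."""
    link = os.path.join(sys_root, dp.lstrip("/"), "subsystem")
    try:
        return os.path.basename(os.readlink(link))
    except OSError:
        return None


def find_parent(sys_root, dp, wanted):
    """Walk up the chain until a parent has the wanted subsystem value."""
    current = os.path.dirname(dp)
    while current.startswith("/devices/"):
        if subsystem(sys_root, current) == wanted:
            return current
        current = os.path.dirname(current)
    return None
```

The parent lookup matches on the subsystem value only and never on a fixed position in the chain, so it keeps working if the kernel inserts additional devices between a child and the parent being looked for.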
diff --git a/Documentation/thinkpad-acpi.txt b/Documentation/thinkpad-acpi.txt
index 2d4803359a04..9e6b94face4b 100644
--- a/Documentation/thinkpad-acpi.txt
+++ b/Documentation/thinkpad-acpi.txt
@@ -138,7 +138,7 @@ Hot keys
 --------
 
 procfs: /proc/acpi/ibm/hotkey
-sysfs device attribute: hotkey/*
+sysfs device attribute: hotkey_*
 
 Without this driver, only the Fn-F4 key (sleep button) generates an
 ACPI event. With the driver loaded, the hotkey feature enabled and the
@@ -196,10 +196,7 @@ The following commands can be written to the /proc/acpi/ibm/hotkey file:
 
 sysfs notes:
 
-    The hot keys attributes are in a hotkey/ subdirectory off the
-    thinkpad device.
-
-    bios_enabled:
+    hotkey_bios_enabled:
         Returns the status of the hot keys feature when
         thinkpad-acpi was loaded. Upon module unload, the hot
         key feature status will be restored to this value.
@@ -207,19 +204,19 @@ sysfs notes:
             0: hot keys were disabled
             1: hot keys were enabled
 
-    bios_mask:
+    hotkey_bios_mask:
         Returns the hot keys mask when thinkpad-acpi was loaded.
         Upon module unload, the hot keys mask will be restored
         to this value.
 
-    enable:
+    hotkey_enable:
         Enables/disables the hot keys feature, and reports
         current status of the hot keys feature.
 
             0: disables the hot keys feature / feature disabled
             1: enables the hot keys feature / feature enabled
 
-    mask:
+    hotkey_mask:
         bit mask to enable ACPI event generation for each hot
         key (see above). Returns the current status of the hot
         keys mask, and allows one to modify it.
@@ -229,7 +226,7 @@ Bluetooth
 ---------
 
 procfs: /proc/acpi/ibm/bluetooth
-sysfs device attribute: bluetooth/enable
+sysfs device attribute: bluetooth_enable
 
 This feature shows the presence and current state of a ThinkPad
 Bluetooth device in the internal ThinkPad CDC slot.
@@ -244,7 +241,7 @@ If Bluetooth is installed, the following commands can be used:
 Sysfs notes:
 
     If the Bluetooth CDC card is installed, it can be enabled /
-    disabled through the "bluetooth/enable" thinkpad-acpi device
+    disabled through the "bluetooth_enable" thinkpad-acpi device
     attribute, and its current status can also be queried.
 
     enable:
@@ -252,7 +249,7 @@ Sysfs notes:
         1: enables Bluetooth / Bluetooth is enabled.
 
     Note: this interface will be probably be superseeded by the
-    generic rfkill class.
+    generic rfkill class, so it is NOT to be considered stable yet.
 
 Video output control -- /proc/acpi/ibm/video
 --------------------------------------------
@@ -898,7 +895,7 @@ EXPERIMENTAL: WAN
 -----------------
 
 procfs: /proc/acpi/ibm/wan
-sysfs device attribute: wwan/enable
+sysfs device attribute: wwan_enable
 
 This feature is marked EXPERIMENTAL because the implementation
 directly accesses hardware registers and may not work as expected. USE
@@ -921,7 +918,7 @@ If the W-WAN card is installed, the following commands can be used:
 Sysfs notes:
 
     If the W-WAN card is installed, it can be enabled /
-    disabled through the "wwan/enable" thinkpad-acpi device
+    disabled through the "wwan_enable" thinkpad-acpi device
     attribute, and its current status can also be queried.
 
     enable:
@@ -929,7 +926,7 @@ Sysfs notes:
         1: enables WWAN card / WWAN card is enabled.
 
     Note: this interface will be probably be superseeded by the
-    generic rfkill class.
+    generic rfkill class, so it is NOT to be considered stable yet.
 
 Multiple Commands, Module Parameters
 ------------------------------------
diff --git a/Documentation/usb/dma.txt b/Documentation/usb/dma.txt
index 62844aeba69c..e8b50b7de9d9 100644
--- a/Documentation/usb/dma.txt
+++ b/Documentation/usb/dma.txt
@@ -32,12 +32,15 @@ ELIMINATING COPIES
 It's good to avoid making CPUs copy data needlessly. The costs can add up,
 and effects like cache-trashing can impose subtle penalties.
 
-- When you're allocating a buffer for DMA purposes anyway, use the buffer
-  primitives. Think of them as kmalloc and kfree that give you the right
-  kind of addresses to store in urb->transfer_buffer and urb->transfer_dma,
-  while guaranteeing that no hidden copies through DMA "bounce" buffers will
-  slow things down. You'd also set URB_NO_TRANSFER_DMA_MAP in
-  urb->transfer_flags:
+- If you're doing lots of small data transfers from the same buffer all
+  the time, that can really burn up resources on systems which use an
+  IOMMU to manage the DMA mappings. It can cost MUCH more to set up and
+  tear down the IOMMU mappings with each request than perform the I/O!
+
+  For those specific cases, USB has primitives to allocate less expensive
+  memory. They work like kmalloc and kfree versions that give you the right
+  kind of addresses to store in urb->transfer_buffer and urb->transfer_dma.
+  You'd also set URB_NO_TRANSFER_DMA_MAP in urb->transfer_flags:
 
     void *usb_buffer_alloc (struct usb_device *dev, size_t size,
         int mem_flags, dma_addr_t *dma);
@@ -45,6 +48,10 @@ and effects like cache-trashing can impose subtle penalties.
     void usb_buffer_free (struct usb_device *dev, size_t size,
         void *addr, dma_addr_t dma);
 
+  Most drivers should *NOT* be using these primitives; they don't need
+  to use this type of memory ("dma-coherent"), and memory returned from
+  kmalloc() will work just fine.
+
   For control transfers you can use the buffer primitives or not for each
   of the transfer buffer and setup buffer independently. Set the flag bits
   URB_NO_TRANSFER_DMA_MAP and URB_NO_SETUP_DMA_MAP to indicate which
@@ -54,29 +61,39 @@ and effects like cache-trashing can impose subtle penalties.
   The memory buffer returned is "dma-coherent"; sometimes you might need to
   force a consistent memory access ordering by using memory barriers. It's
   not using a streaming DMA mapping, so it's good for small transfers on
-  systems where the I/O would otherwise tie up an IOMMU mapping. (See
+  systems where the I/O would otherwise thrash an IOMMU mapping. (See
   Documentation/DMA-mapping.txt for definitions of "coherent" and "streaming"
   DMA mappings.)
 
   Asking for 1/Nth of a page (as well as asking for N pages) is reasonably
   space-efficient.
 
+  On most systems the memory returned will be uncached, because the
+  semantics of dma-coherent memory require either bypassing CPU caches
+  or using cache hardware with bus-snooping support. While x86 hardware
+  has such bus-snooping, many other systems use software to flush cache
+  lines to prevent DMA conflicts.
+
 - Devices on some EHCI controllers could handle DMA to/from high memory.
-  Driver probe() routines can notice this using a generic DMA call, then
-  tell higher level code (network, scsi, etc) about it like this:
 
-    if (dma_supported (&intf->dev, 0xffffffffffffffffULL))
-        net->features |= NETIF_F_HIGHDMA;
+  Unfortunately, the current Linux DMA infrastructure doesn't have a sane
+  way to expose these capabilities ... and in any case, HIGHMEM is mostly a
+  design wart specific to x86_32. So your best bet is to ensure you never
+  pass a highmem buffer into a USB driver. That's easy; it's the default
+  behavior. Just don't override it; e.g. with NETIF_F_HIGHDMA.
 
-  That can eliminate dma bounce buffering of requests that originate (or
-  terminate) in high memory, in cases where the buffers aren't allocated
-  with usb_buffer_alloc() but instead are dma-mapped.
+  This may force your callers to do some bounce buffering, copying from
+  high memory to "normal" DMA memory. If you can come up with a good way
+  to fix this issue (for x86_32 machines with over 1 GByte of memory),
+  feel free to submit patches.
 
 
 WORKING WITH EXISTING BUFFERS
 
 Existing buffers aren't usable for DMA without first being mapped into the
-DMA address space of the device.
+DMA address space of the device. However, most buffers passed to your
+driver can safely be used with such DMA mapping. (See the first section
+of DMA-mapping.txt, titled "What memory is DMA-able?")
 
 - When you're using scatterlists, you can map everything at once. On some
   systems, this kicks in an IOMMU and turns the scatterlists into single
@@ -114,3 +131,8 @@ DMA address space of the device.
   The calls manage urb->transfer_dma for you, and set URB_NO_TRANSFER_DMA_MAP
   so that usbcore won't map or unmap the buffer. The same goes for
   urb->setup_dma and URB_NO_SETUP_DMA_MAP for control requests.
+
+Note that several of those interfaces are currently commented out, since
+they don't have current users. See the source code. Other than the dmasync
+calls (where the underlying DMA primitives have changed), most of them can
+easily be commented back in if you want to use them.
diff --git a/Documentation/usb/persist.txt b/Documentation/usb/persist.txt
new file mode 100644
index 000000000000..df54d645cbb5
--- /dev/null
+++ b/Documentation/usb/persist.txt
@@ -0,0 +1,156 @@
1 USB device persistence during system suspend
2
3 Alan Stern <stern@rowland.harvard.edu>
4
5 September 2, 2006 (Updated May 29, 2007)
6
7
8 What is the problem?
9
10According to the USB specification, when a USB bus is suspended the
11bus must continue to supply suspend current (around 1-5 mA). This
12is so that devices can maintain their internal state and hubs can
13detect connect-change events (devices being plugged in or unplugged).
14The technical term is "power session".
15
16If a USB device's power session is interrupted then the system is
17required to behave as though the device has been unplugged. It's a
18conservative approach; in the absence of suspend current the computer
19has no way to know what has actually happened. Perhaps the same
20device is still attached or perhaps it was removed and a different
21device plugged into the port. The system must assume the worst.
22
23By default, Linux behaves according to the spec. If a USB host
24controller loses power during a system suspend, then when the system
25wakes up all the devices attached to that controller are treated as
26though they had disconnected. This is always safe and it is the
27"officially correct" thing to do.
28
29For many sorts of devices this behavior doesn't matter in the least.
30If the kernel wants to believe that your USB keyboard was unplugged
31while the system was asleep and a new keyboard was plugged in when the
32system woke up, who cares? It'll still work the same when you type on
33it.
34
35Unfortunately problems _can_ arise, particularly with mass-storage
36devices. The effect is exactly the same as if the device really had
37been unplugged while the system was suspended. If you had a mounted
38filesystem on the device, you're out of luck -- everything in that
39filesystem is now inaccessible. This is especially annoying if your
40root filesystem was located on the device, since your system will
41instantly crash.
42
43Loss of power isn't the only mechanism to worry about. Anything that
44interrupts a power session will have the same effect. For example,
45even though suspend current may have been maintained while the system
46was asleep, on many systems during the initial stages of wakeup the
47firmware (i.e., the BIOS) resets the motherboard's USB host
48controllers. Result: all the power sessions are destroyed and again
49it's as though you had unplugged all the USB devices. Yes, it's
50entirely the BIOS's fault, but that doesn't do _you_ any good unless
51you can convince the BIOS supplier to fix the problem (lots of luck!).
52
53On many systems the USB host controllers will get reset after a
54suspend-to-RAM. On almost all systems, no suspend current is
55available during hibernation (also known as swsusp or suspend-to-disk).
56You can check the kernel log after resuming to see if either of these
57has happened; look for lines saying "root hub lost power or was reset".
58
59In practice, people are forced to unmount any filesystems on a USB
60device before suspending. If the root filesystem is on a USB device,
61the system can't be suspended at all. (All right, it _can_ be
62suspended -- but it will crash as soon as it wakes up, which isn't
63much better.)
64
65
66 What is the solution?
67
68Setting CONFIG_USB_PERSIST will cause the kernel to work around these
69issues. It enables a mode in which the core USB device data
70structures are allowed to persist across a power-session disruption.
71It works like this. If the kernel sees that a USB host controller is
72not in the expected state during resume (i.e., if the controller was
73reset or otherwise had lost power) then it applies a persistence check
74to each of the USB devices below that controller for which the
75"persist" attribute is set. It doesn't try to resume the device; that
76can't work once the power session is gone. Instead it issues a USB
77port reset and then re-enumerates the device. (This is exactly the
78same thing that happens whenever a USB device is reset.) If the
79re-enumeration shows that the device now attached to that port has the
80same descriptors as before, including the Vendor and Product IDs, then
81the kernel continues to use the same device structure. In effect, the
82kernel treats the device as though it had merely been reset instead of
83unplugged.
84
85If no device is now attached to the port, or if the descriptors are
86different from what the kernel remembers, then the treatment is what
87you would expect. The kernel destroys the old device structure and
88behaves as though the old device had been unplugged and a new device
89plugged in, just as it would without the CONFIG_USB_PERSIST option.
90
91The end result is that the USB device remains available and usable.
92Filesystem mounts and memory mappings are unaffected, and the world is
93now a good and happy place.
94
95Note that even when CONFIG_USB_PERSIST is set, the "persist" feature
96will be applied only to those devices for which it is enabled. You
97can enable the feature by doing (as root):
98
99 echo 1 >/sys/bus/usb/devices/.../power/persist
100
101where the "..." should be filled in with the device's ID. Disable
102the feature by writing 0 instead of 1. For hubs the feature is
103automatically and permanently enabled, so you only have to worry about
104setting it for devices where it really matters.
105
106
107 Is this the best solution?
108
109Perhaps not. Arguably, keeping track of mounted filesystems and
110memory mappings across device disconnects should be handled by a
111centralized Logical Volume Manager. Such a solution would allow you
112to plug in a USB flash device, create a persistent volume associated
113with it, unplug the flash device, plug it back in later, and still
114have the same persistent volume associated with the device. As such
115it would be more far-reaching than CONFIG_USB_PERSIST.
116
117On the other hand, writing a persistent volume manager would be a big
118job and using it would require significant input from the user. This
119solution is much quicker and easier -- and it exists now, a giant
120point in its favor!
121
122Furthermore, the USB_PERSIST option applies to _all_ USB devices, not
123just mass-storage devices. It might turn out to be equally useful for
124other device types, such as network interfaces.
125
126
127 WARNING: Using CONFIG_USB_PERSIST can be dangerous!!
128
129When recovering an interrupted power session the kernel does its best
130to make sure the USB device hasn't been changed; that is, the same
131device is still plugged into the port as before. But the checks
132aren't guaranteed to be 100% accurate.
133
134If you replace one USB device with another of the same type (same
135manufacturer, same IDs, and so on) there's an excellent chance the
136kernel won't detect the change. Serial numbers and other strings are
137not compared. In many cases it wouldn't help if they were, because
138manufacturers frequently omit serial numbers entirely in their
139devices.
140
141Furthermore it's quite possible to leave a USB device exactly the same
142while changing its media. If you replace the flash memory card in a
143USB card reader while the system is asleep, the kernel will have no
144way to know you did it. The kernel will assume that nothing has
145happened and will continue to use the partition tables, inodes, and
146memory mappings for the old card.
147
148If the kernel gets fooled in this way, it's almost certain to cause
149data corruption and to crash your system. You'll have no one to blame
150but yourself.
151
152YOU HAVE BEEN WARNED! USE AT YOUR OWN RISK!
153
154That having been said, most of the time there shouldn't be any trouble
155at all. The "persist" feature can be extremely useful. Make the most
156of it.
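The two userspace steps described above (checking the kernel log after resume, and toggling the per-device "persist" attribute) might be scripted roughly as follows. This is a hedged sketch, not part of the patch: the function names are hypothetical, the device directory is passed in by the caller because the device ID is left as "..." in the text, and writing the attribute needs root just like the echo command.

```python
import os

# Log text quoted in the document above.
RESET_MARKER = "root hub lost power or was reset"


def power_session_lost(kernel_log):
    """Return the log lines reporting a reset or power loss on a root hub."""
    return [line for line in kernel_log.splitlines() if RESET_MARKER in line]


def set_persist(device_dir, enabled):
    """Write 1 or 0 to <device_dir>/power/persist, like 'echo 1 >...'."""
    path = os.path.join(device_dir, "power", "persist")
    with open(path, "w") as f:
        f.write("1\n" if enabled else "0\n")


def get_persist(device_dir):
    """Read the attribute back; True means the feature is enabled."""
    path = os.path.join(device_dir, "power", "persist")
    with open(path) as f:
        return f.read().strip() == "1"
```

A caller would pass the device's directory below /sys/bus/usb/devices/ as device_dir. Hubs need no such call, since the text states the feature is always enabled for them.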
diff --git a/Documentation/vm/slub.txt b/Documentation/vm/slub.txt
index 727c8d81aeaf..1523320abd87 100644
--- a/Documentation/vm/slub.txt
+++ b/Documentation/vm/slub.txt
@@ -1,13 +1,9 @@
1Short users guide for SLUB 1Short users guide for SLUB
2-------------------------- 2--------------------------
3 3
4First of all slub should transparently replace SLAB. If you enable
5SLUB then everything should work the same (Note the word "should".
6There is likely not much value in that word at this point).
7
8The basic philosophy of SLUB is very different from SLAB. SLAB 4The basic philosophy of SLUB is very different from SLAB. SLAB
9requires rebuilding the kernel to activate debug options for all 5requires rebuilding the kernel to activate debug options for all
10SLABS. SLUB always includes full debugging but its off by default. 6slab caches. SLUB always includes full debugging but it is off by default.
11SLUB can enable debugging only for selected slabs in order to avoid 7SLUB can enable debugging only for selected slabs in order to avoid
12an impact on overall system performance which may make a bug more 8an impact on overall system performance which may make a bug more
13difficult to find. 9difficult to find.
@@ -76,13 +72,28 @@ of objects.
76Careful with tracing: It may spew out lots of information and never stop if 72Careful with tracing: It may spew out lots of information and never stop if
77used on the wrong slab. 73used on the wrong slab.
78 74
79SLAB Merging 75Slab merging
80------------ 76------------
81 77
82If no debugging is specified then SLUB may merge similar slabs together 78If no debug options are specified then SLUB may merge similar slabs together
83in order to reduce overhead and increase cache hotness of objects. 79in order to reduce overhead and increase cache hotness of objects.
84slabinfo -a displays which slabs were merged together. 80slabinfo -a displays which slabs were merged together.
85 81
82Slab validation
83---------------
84
85SLUB can validate all object if the kernel was booted with slub_debug. In
86order to do so you must have the slabinfo tool. Then you can do
87
88slabinfo -v
89
90which will test all objects. Output will be generated to the syslog.
91
92This also works in a more limited way if the kernel was booted without slub_debug.
93In that case slabinfo -v simply tests all reachable objects. Usually
94these are in the cpu slabs and the partial slabs. Full slabs are not
95tracked by SLUB in a non debug situation.
96
86Getting more performance 97Getting more performance
87------------------------ 98------------------------
88 99
@@ -91,9 +102,9 @@ list_lock once in a while to deal with partial slabs. That overhead is
91governed by the order of the allocation for each slab. The allocations 102governed by the order of the allocation for each slab. The allocations
92can be influenced by kernel parameters: 103can be influenced by kernel parameters:
93 104
94slub_min_objects=x (default 8) 105slub_min_objects=x (default 4)
95slub_min_order=x (default 0) 106slub_min_order=x (default 0)
96slub_max_order=x (default 4) 107slub_max_order=x (default 1)
97 108
98slub_min_objects allows to specify how many objects must at least fit 109slub_min_objects allows to specify how many objects must at least fit
99into one slab in order for the allocation order to be acceptable. 110into one slab in order for the allocation order to be acceptable.
@@ -109,5 +120,107 @@ longer be checked. This is useful to avoid SLUB trying to generate
109super large order pages to fit slub_min_objects of a slab cache with 120super large order pages to fit slub_min_objects of a slab cache with
110large object sizes into one high order page. 121large object sizes into one high order page.
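As a rough illustration of how the three parameters interact, the following
user-space sketch picks the smallest order that satisfies slub_min_objects and
stops checking that limit once slub_max_order is reached. The function name
slab_order_sketch(), the 4096 byte page size and the fallback loop are
assumptions made for this example. The real calculation in the kernel also
weighs wasted space and is not reproduced here.

```c
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096UL	/* assumed page size, not taken from the text */

/*
 * Hypothetical helper: return the page order chosen for a slab holding
 * objects of object_size bytes (object_size must be greater than 0).
 */
static unsigned int slab_order_sketch(size_t object_size,
				      unsigned int min_objects,
				      unsigned int min_order,
				      unsigned int max_order)
{
	unsigned int order;

	/* Below max_order, insist that min_objects fit into one slab. */
	for (order = min_order; order < max_order; order++) {
		size_t slab_size = SKETCH_PAGE_SIZE << order;

		if (slab_size / object_size >= min_objects)
			return order;
	}

	/* From max_order on, only require that a single object fits. */
	while ((SKETCH_PAGE_SIZE << order) < object_size)
		order++;

	return order;
}
```

With the new defaults (slub_min_objects=4, slub_max_order=1) a 2048 byte object
ends up in an order 1 slab in this sketch, while a 256 byte object stays at
order 0.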
111 122
112 123SLUB Debug output
113Christoph Lameter, <clameter@sgi.com>, April 10, 2007 124-----------------
125
126Here is a sample of slub debug output:
127
128*** SLUB kmalloc-8: Redzone Active@0xc90f6d20 slab 0xc528c530 offset=3360 flags=0x400000c3 inuse=61 freelist=0xc90f6d58
129 Bytes b4 0xc90f6d10: 00 00 00 00 00 00 00 00 5a 5a 5a 5a 5a 5a 5a 5a ........ZZZZZZZZ
130 Object 0xc90f6d20: 31 30 31 39 2e 30 30 35 1019.005
131 Redzone 0xc90f6d28: 00 cc cc cc .
132FreePointer 0xc90f6d2c -> 0xc90f6d58
133Last alloc: get_modalias+0x61/0xf5 jiffies_ago=53 cpu=1 pid=554
134Filler 0xc90f6d50: 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZ
135 [<c010523d>] dump_trace+0x63/0x1eb
136 [<c01053df>] show_trace_log_lvl+0x1a/0x2f
137 [<c010601d>] show_trace+0x12/0x14
138 [<c0106035>] dump_stack+0x16/0x18
139 [<c017e0fa>] object_err+0x143/0x14b
140 [<c017e2cc>] check_object+0x66/0x234
141 [<c017eb43>] __slab_free+0x239/0x384
142 [<c017f446>] kfree+0xa6/0xc6
143 [<c02e2335>] get_modalias+0xb9/0xf5
144 [<c02e23b7>] dmi_dev_uevent+0x27/0x3c
145 [<c027866a>] dev_uevent+0x1ad/0x1da
146 [<c0205024>] kobject_uevent_env+0x20a/0x45b
147 [<c020527f>] kobject_uevent+0xa/0xf
148 [<c02779f1>] store_uevent+0x4f/0x58
149 [<c027758e>] dev_attr_store+0x29/0x2f
150 [<c01bec4f>] sysfs_write_file+0x16e/0x19c
151 [<c0183ba7>] vfs_write+0xd1/0x15a
152 [<c01841d7>] sys_write+0x3d/0x72
153 [<c0104112>] sysenter_past_esp+0x5f/0x99
154 [<b7f7b410>] 0xb7f7b410
155 =======================
156@@@ SLUB kmalloc-8: Restoring redzone (0xcc) from 0xc90f6d28-0xc90f6d2b
157
158
159
160If SLUB encounters a corrupted object then it will perform the following
161actions:
162
1631. Isolation and report of the issue
164
165This will be a message in the system log starting with
166
167*** SLUB <slab cache affected>: <What went wrong>@<object address>
168offset=<offset of object into slab> flags=<slabflags>
169inuse=<objects in use in this slab> freelist=<first free object in slab>
170
1712. Report on how the problem was dealt with in order to ensure the continued
172operation of the system.
173
174These are messages in the system log beginning with
175
176@@@ SLUB <slab cache affected>: <corrective action taken>
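The report header described under step 1 has a regular shape, so a log
post-processing tool could pick it apart with sscanf(). The sketch below is
only an illustration: parse_slub_header() and the 64 byte field limit are
names and sizes invented for this example, and only the first three fields
are extracted.

```c
#include <stdio.h>

#define SLUB_FIELD_LEN 64	/* assumed upper bound for each text field */

/*
 * Hypothetical helper: split "*** SLUB <cache>: <what>@<address> ..."
 * into its first three fields. Returns 0 on success, -1 otherwise.
 */
static int parse_slub_header(const char *line,
			     char cache[SLUB_FIELD_LEN],
			     char what[SLUB_FIELD_LEN],
			     unsigned long *addr)
{
	/* %63[^:] and %63[^@] stop at the ':' and '@' separators. */
	int n = sscanf(line, "*** SLUB %63[^:]: %63[^@]@%lx",
		       cache, what, addr);

	return n == 3 ? 0 : -1;
}
```

Applied to the first line of the sample output, this yields the cache name
kmalloc-8, the description "Redzone Active" and the object address.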
177
178
179In the above sample SLUB found that the Redzone of an active object has
180been overwritten. Here a string of 8 characters was written into an object
181that is 8 bytes long. However, an 8 character string needs a
182terminating 0. That zero has overwritten the first byte of the Redzone field.
183After reporting the details of the issue encountered the @@@ SLUB message
184tells us that SLUB has restored the redzone to its proper value and then
185system operations continue.
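The off-by-one described above can be reproduced in user space. The sketch
below imitates the layout from the sample (an 8 byte object followed by a
4 byte redzone filled with 0xcc) in an ordinary array. The helper names are
invented for this example and no kernel allocator is involved.

```c
#include <stddef.h>
#include <string.h>

#define REDZONE_BYTE 0xcc	/* redzone fill value seen in the sample output */

/* Bytes needed to store s including its terminating 0. */
static size_t string_alloc_size(const char *s)
{
	return strlen(s) + 1;
}

/*
 * Copy s into an 8 byte "object" that is followed by a 4 byte redzone.
 * Returns the offset of the first damaged redzone byte, or -1 if the
 * redzone is intact.
 */
static int redzone_overrun_demo(const char *s)
{
	unsigned char slot[8 + 4];
	size_t n = string_alloc_size(s);
	size_t i;

	memset(slot, 0, 8);
	memset(slot + 8, REDZONE_BYTE, 4);

	/* Clamp the copy so the demo itself never leaves the array. */
	if (n > sizeof(slot))
		n = sizeof(slot);
	memcpy(slot, s, n);

	for (i = 8; i < sizeof(slot); i++)
		if (slot[i] != REDZONE_BYTE)
			return (int)(i - 8);

	return -1;
}
```

For the string "1019.005" from the sample, string_alloc_size() returns 9 and
the terminating 0 lands on the first redzone byte, which matches the
"00 cc cc cc" pattern in the Redzone line of the report.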
186
187Various types of lines can follow the *** SLUB line:
188
189Bytes b4 <address> : <bytes>
190 Show a few bytes before the object where the problem was detected.
191 Can be useful if the corruption does not stop with the start of the
192 object.
193
194Object <address> : <bytes>
195 The bytes of the object. If the object is inactive then the bytes
196 typically contain poisoning values. Any non-poison value shows a
197 corruption by a write after free.
198
199Redzone <address> : <bytes>
200 The redzone following the object. The redzone is used to detect
201 writes after the object. All bytes should always have the same
202 value. If there is any deviation then it is due to a write after
203 the object boundary.
204
205FreePointer <address> -> <address>
206 The pointer to the next free object in the slab. May become
207 corrupted if overwriting continues after the red zone.
208
209Last alloc:
210Last free:
211 Shows the address from which the object was allocated/freed last.
212 We note the pid, the time and the CPU that did so. This is usually
213 the most useful information to figure out where things went wrong.
214 Here get_modalias() did a kmalloc(8) instead of a kmalloc(9).
215
216Filler <address> : <bytes>
217 Unused data to fill up the space in order to get the next object
218 properly aligned. In the debug case we make sure that there are
219 at least 4 bytes of filler. This allows for the detection of writes
220 before the object.
221
222Following the filler will be a stackdump. That stackdump describes the
223location where the error was detected. The cause of the corruption is more
224likely to be found by looking at the information about the last alloc / free.
225
226Christoph Lameter, <clameter@sgi.com>, May 23, 2007
diff --git a/Documentation/volatile-considered-harmful.txt b/Documentation/volatile-considered-harmful.txt
new file mode 100644
index 000000000000..10c2e411cca8
--- /dev/null
+++ b/Documentation/volatile-considered-harmful.txt
@@ -0,0 +1,119 @@
1Why the "volatile" type class should not be used
2------------------------------------------------
3
4C programmers have often taken volatile to mean that the variable could be
5changed outside of the current thread of execution; as a result, they are
6sometimes tempted to use it in kernel code when shared data structures are
7being used. In other words, they have been known to treat volatile types
8as a sort of easy atomic variable, which they are not. The use of volatile in
9kernel code is almost never correct; this document describes why.
10
11The key point to understand with regard to volatile is that its purpose is
12to suppress optimization, which is almost never what one really wants to
13do. In the kernel, one must protect shared data structures against
14unwanted concurrent access, which is very much a different task. The
15process of protecting against unwanted concurrency will also avoid almost
16all optimization-related problems in a more efficient way.
17
18Like volatile, the kernel primitives which make concurrent access to data
19safe (spinlocks, mutexes, memory barriers, etc.) are designed to prevent
20unwanted optimization. If they are being used properly, there will be no
21need to use volatile as well. If volatile is still necessary, there is
22almost certainly a bug in the code somewhere. In properly-written kernel
23code, volatile can only serve to slow things down.
24
25Consider a typical block of kernel code:
26
27 spin_lock(&the_lock);
28 do_something_on(&shared_data);
29 do_something_else_with(&shared_data);
30 spin_unlock(&the_lock);
31
32If all the code follows the locking rules, the value of shared_data cannot
33change unexpectedly while the_lock is held. Any other code which might
34want to play with that data will be waiting on the lock. The spinlock
35primitives act as memory barriers - they are explicitly written to do so -
36meaning that data accesses will not be optimized across them. So the
37compiler might think it knows what will be in shared_data, but the
38spin_lock() call, since it acts as a memory barrier, will force it to
39forget anything it knows. There will be no optimization problems with
40accesses to that data.
41
42If shared_data were declared volatile, the locking would still be
43necessary. But the compiler would also be prevented from optimizing access
44to shared_data _within_ the critical section, when we know that nobody else
45can be working with it. While the lock is held, shared_data is not
46volatile. When dealing with shared data, proper locking makes volatile
47unnecessary - and potentially harmful.
48
49The volatile storage class was originally meant for memory-mapped I/O
50registers. Within the kernel, register accesses, too, should be protected
51by locks, but one also does not want the compiler "optimizing" register
52accesses within a critical section. But, within the kernel, I/O memory
53accesses are always done through accessor functions; accessing I/O memory
54directly through pointers is frowned upon and does not work on all
55architectures. Those accessors are written to prevent unwanted
56optimization, so, once again, volatile is unnecessary.
57
58Another situation where one might be tempted to use volatile is
59when the processor is busy-waiting on the value of a variable. The right
60way to perform a busy wait is:
61
62 while (my_variable != what_i_want)
63 cpu_relax();
64
65The cpu_relax() call can lower CPU power consumption or yield to a
66hyperthreaded twin processor; it also happens to serve as a memory barrier,
67so, once again, volatile is unnecessary. Of course, busy-waiting is
68generally an anti-social act to begin with.
69
70There are still a few rare situations where volatile makes sense in the
71kernel:
72
73 - The above-mentioned accessor functions might use volatile on
74 architectures where direct I/O memory access does work. Essentially,
75 each accessor call becomes a little critical section on its own and
76 ensures that the access happens as expected by the programmer.
77
78 - Inline assembly code which changes memory, but which has no other
79 visible side effects, risks being deleted by GCC. Adding the volatile
80 keyword to asm statements will prevent this removal.
81
82 - The jiffies variable is special in that it can have a different value
83 every time it is referenced, but it can be read without any special
84 locking. So jiffies can be volatile, but the addition of other
85 variables of this type is strongly frowned upon. Jiffies is considered
86 to be a "stupid legacy" issue (Linus's words) in this regard; fixing it
87 would be more trouble than it is worth.
88
89 - Pointers to data structures in coherent memory which might be modified
90 by I/O devices can, sometimes, legitimately be volatile. A ring buffer
91 used by a network adapter, where that adapter changes pointers to
92 indicate which descriptors have been processed, is an example of this
93 type of situation.
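The first item in the list above can be sketched in user space. Here the
volatile qualifier is confined to the cast inside two small accessors, so
callers never declare anything volatile themselves. The names sketch_readl(),
sketch_writel() and fake_regs are invented for this example and are only
loosely modelled on the kernel's I/O accessors. An ordinary array stands in
for a device register block.

```c
#include <stdint.h>

/* Hypothetical read accessor: the volatile cast forces a real load. */
static inline uint32_t sketch_readl(const volatile void *addr)
{
	return *(const volatile uint32_t *)addr;
}

/* Hypothetical write accessor: the volatile cast forces a real store. */
static inline void sketch_writel(uint32_t value, volatile void *addr)
{
	*(volatile uint32_t *)addr = value;
}

/* Ordinary memory standing in for a device register block. */
static uint32_t fake_regs[4];

/* Read-modify-write of one "register" using only the accessors. */
static uint32_t sketch_set_bits(unsigned int index, uint32_t bits)
{
	uint32_t v = sketch_readl(&fake_regs[index]);

	sketch_writel(v | bits, &fake_regs[index]);
	return sketch_readl(&fake_regs[index]);
}
```

Real drivers would still take a lock around a read-modify-write sequence like
sketch_set_bits(). The accessors only guarantee that each individual access
is performed.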
94
95For most code, none of the above justifications for volatile apply. As a
96result, the use of volatile is likely to be seen as a bug and will bring
97additional scrutiny to the code. Developers who are tempted to use
98volatile should take a step back and think about what they are truly trying
99to accomplish.
100
101Patches to remove volatile variables are generally welcome - as long as
102they come with a justification which shows that the concurrency issues have
103been properly thought through.
104
105
106NOTES
107-----
108
109[1] http://lwn.net/Articles/233481/
110[2] http://lwn.net/Articles/233482/
111
112CREDITS
113-------
114
115Original impetus and research by Randy Dunlap
116Written by Jonathan Corbet
117Improvements via comments from Satyam Sharma, Johannes Stezenbach, Jesper
118 Juhl, Heikki Orsila, H. Peter Anvin, Philipp Hahn, and Stefan
119 Richter.
diff --git a/Documentation/watchdog/pcwd-watchdog.txt b/Documentation/watchdog/pcwd-watchdog.txt
index d9ee6336c1d4..4f68052395c0 100644
--- a/Documentation/watchdog/pcwd-watchdog.txt
+++ b/Documentation/watchdog/pcwd-watchdog.txt
@@ -1,3 +1,5 @@
1Last reviewed: 10/05/2007
2
1 Berkshire Products PC Watchdog Card 3 Berkshire Products PC Watchdog Card
2 Support for ISA Cards Revision A and C 4 Support for ISA Cards Revision A and C
3 Documentation and Driver by Ken Hollis <kenji@bitgate.com> 5 Documentation and Driver by Ken Hollis <kenji@bitgate.com>
@@ -14,8 +16,8 @@
14 16
15 The Watchdog Driver will automatically find your watchdog card, and will 17 The Watchdog Driver will automatically find your watchdog card, and will
16 attach a running driver for use with that card. After the watchdog 18 attach a running driver for use with that card. After the watchdog
17 drivers have initialized, you can then talk to the card using the PC 19 drivers have initialized, you can then talk to the card using a PC
18 Watchdog program, available from http://ftp.bitgate.com/pcwd/. 20 Watchdog program.
19 21
20 I suggest putting a "watchdog -d" before the beginning of an fsck, and 22 I suggest putting a "watchdog -d" before the beginning of an fsck, and
21 a "watchdog -e -t 1" immediately after the end of an fsck. (Remember 23 a "watchdog -e -t 1" immediately after the end of an fsck. (Remember
@@ -62,5 +64,3 @@
62 -- Ken Hollis 64 -- Ken Hollis
63 (kenji@bitgate.com) 65 (kenji@bitgate.com)
64 66
65(This documentation may be out of date. Check
66 http://ftp.bitgate.com/pcwd/ for the absolute latest additions.)
diff --git a/Documentation/watchdog/watchdog-api.txt b/Documentation/watchdog/watchdog-api.txt
index 8d16f6f3c4ec..bb7cb1d31ec7 100644
--- a/Documentation/watchdog/watchdog-api.txt
+++ b/Documentation/watchdog/watchdog-api.txt
@@ -1,3 +1,6 @@
1Last reviewed: 10/05/2007
2
3
1The Linux Watchdog driver API. 4The Linux Watchdog driver API.
2 5
3Copyright 2002 Christer Weingel <wingel@nano-system.com> 6Copyright 2002 Christer Weingel <wingel@nano-system.com>
@@ -22,7 +25,7 @@ the system. If userspace fails (RAM error, kernel bug, whatever), the
22notifications cease to occur, and the hardware watchdog will reset the 25notifications cease to occur, and the hardware watchdog will reset the
23system (causing a reboot) after the timeout occurs. 26system (causing a reboot) after the timeout occurs.
24 27
25The Linux watchdog API is a rather AD hoc construction and different 28The Linux watchdog API is a rather ad-hoc construction and different
26drivers implement different, and sometimes incompatible, parts of it. 29drivers implement different, and sometimes incompatible, parts of it.
27This file is an attempt to document the existing usage and allow 30This file is an attempt to document the existing usage and allow
28future driver writers to use it as a reference. 31future driver writers to use it as a reference.
@@ -46,14 +49,16 @@ some of the drivers support the configuration option "Disable watchdog
46shutdown on close", CONFIG_WATCHDOG_NOWAYOUT. If it is set to Y when 49shutdown on close", CONFIG_WATCHDOG_NOWAYOUT. If it is set to Y when
47compiling the kernel, there is no way of disabling the watchdog once 50compiling the kernel, there is no way of disabling the watchdog once
48it has been started. So, if the watchdog daemon crashes, the system 51it has been started. So, if the watchdog daemon crashes, the system
49will reboot after the timeout has passed. 52will reboot after the timeout has passed. Watchdog devices also usually
53support the nowayout module parameter so that this option can be controlled
54at runtime.
50 55
51Some other drivers will not disable the watchdog, unless a specific 56Drivers will not disable the watchdog, unless a specific magic character 'V'
52magic character 'V' has been sent /dev/watchdog just before closing 57has been sent to /dev/watchdog just before closing the file. If the userspace
53the file. If the userspace daemon closes the file without sending 58daemon closes the file without sending this special character, the driver
54this special character, the driver will assume that the daemon (and 59will assume that the daemon (and userspace in general) died, and will stop
55userspace in general) died, and will stop pinging the watchdog without 60pinging the watchdog without disabling it first. This will then cause a
56disabling it first. This will then cause a reboot. 61reboot if the watchdog is not re-opened in sufficient time.
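From the daemon's side, the magic close described above amounts to one extra
write before close(). The following is a minimal sketch, assuming the daemon
already holds a file descriptor for /dev/watchdog. The helper names wd_ping()
and wd_magic_close() are invented for this example.

```c
#include <unistd.h>

/* Ping the watchdog: a write to the device postpones the reboot. */
static int wd_ping(int fd)
{
	return write(fd, "\0", 1) == 1 ? 0 : -1;
}

/*
 * Orderly shutdown: send the magic character 'V' just before closing,
 * so the driver knows the close is intentional.
 * Returns 0 if both the write and the close succeeded, -1 otherwise.
 */
static int wd_magic_close(int fd)
{
	int ok = write(fd, "V", 1) == 1;

	if (close(fd) != 0)
		ok = 0;

	return ok ? 0 : -1;
}
```

A daemon would typically call wd_ping() in its main loop and
wd_magic_close() only on a deliberate exit. If nowayout is in effect, the
magic character does not disable the watchdog.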
57 62
58The ioctl API: 63The ioctl API:
59 64
@@ -227,218 +232,3 @@ The following options are available:
227 232
228[FIXME -- better explanations] 233[FIXME -- better explanations]
229 234
230Implementations in the current drivers in the kernel tree:
231
232Here I have tried to summarize what the different drivers support and
233where they do strange things compared to the other drivers.
234
235acquirewdt.c -- Acquire Single Board Computer
236
237 This driver has a hardcoded timeout of 1 minute
238
239 Supports CONFIG_WATCHDOG_NOWAYOUT
240
241 GETSUPPORT returns KEEPALIVEPING. GETSTATUS will return 1 if
242 the device is open, 0 if not. [FIXME -- isn't this rather
243 silly? To be able to use the ioctl, the device must be open
244 and so GETSTATUS will always return 1].
245
246advantechwdt.c -- Advantech Single Board Computer
247
248 Timeout that defaults to 60 seconds, supports SETTIMEOUT.
249
250 Supports CONFIG_WATCHDOG_NOWAYOUT
251
252 GETSUPPORT returns WDIOF_KEEPALIVEPING and WDIOF_SETTIMEOUT.
253 The GETSTATUS call returns if the device is open or not.
254 [FIXME -- silliness again?]
255
256booke_wdt.c -- PowerPC BookE Watchdog Timer
257
258 Timeout default varies according to frequency, supports
259 SETTIMEOUT
260
261 Watchdog cannot be turned off, CONFIG_WATCHDOG_NOWAYOUT
262 does not make sense
263
264 GETSUPPORT returns the watchdog_info struct, and
265 GETSTATUS returns the supported options. GETBOOTSTATUS
266 returns a 1 if the last reset was caused by the
267 watchdog and a 0 otherwise. This watchdog cannot be
268 disabled once it has been started. The wdt_period kernel
269 parameter selects which bit of the time base changing
270 from 0->1 will trigger the watchdog exception. Changing
271 the timeout from the ioctl calls will change the
272 wdt_period as defined above. Finally if you would like to
273 replace the default Watchdog Handler you can implement the
274 WatchdogHandler() function in your own code.
275
276eurotechwdt.c -- Eurotech CPU-1220/1410
277
278 The timeout can be set using the SETTIMEOUT ioctl and defaults
279 to 60 seconds.
280
281 Also has a module parameter "ev", event type which controls
282 what should happen on a timeout, the string "int" or anything
283 else that causes a reboot. [FIXME -- better description]
284
285 Supports CONFIG_WATCHDOG_NOWAYOUT
286
287 GETSUPPORT returns CARDRESET and WDIOF_SETTIMEOUT but
288 GETSTATUS is not supported and GETBOOTSTATUS just returns 0.
289
290i810-tco.c -- Intel 810 chipset
291
292 Also has support for a lot of other i8x0 stuff, but the
293 watchdog is one of the things.
294
295 The timeout is set using the module parameter "i810_margin",
296 which is in steps of 0.6 seconds where 2<i810_margin<64. The
297 driver supports the SETTIMEOUT ioctl.
298
299 Supports CONFIG_WATCHDOG_NOWAYOUT.
300
301 GETSUPPORT returns WDIOF_SETTIMEOUT. The GETSTATUS call
302 returns some kind of timer value which ist not compatible with
303 the other drivers. GETBOOT status returns some kind of
304 hardware specific boot status. [FIXME -- describe this]
305
306ib700wdt.c -- IB700 Single Board Computer
307
308 Default timeout of 30 seconds and the timeout is settable
309 using the SETTIMEOUT ioctl. Note that only a few timeout
310 values are supported.
311
312 Supports CONFIG_WATCHDOG_NOWAYOUT
313
314 GETSUPPORT returns WDIOF_KEEPALIVEPING and WDIOF_SETTIMEOUT.
315 The GETSTATUS call returns if the device is open or not.
316 [FIXME -- silliness again?]
317
318machzwd.c -- MachZ ZF-Logic
319
320 Hardcoded timeout of 10 seconds
321
322 Has a module parameter "action" that controls what happens
323 when the timeout runs out which can be 0 = RESET (default),
324 1 = SMI, 2 = NMI, 3 = SCI.
325
326 Supports CONFIG_WATCHDOG_NOWAYOUT and the magic character
327 'V' close handling.
328
329 GETSUPPORT returns WDIOF_KEEPALIVEPING, and the GETSTATUS call
330 returns if the device is open or not. [FIXME -- silliness
331 again?]
332
333mixcomwd.c -- MixCom Watchdog
334
335 [FIXME -- I'm unable to tell what the timeout is]
336
337 Supports CONFIG_WATCHDOG_NOWAYOUT
338
339 GETSUPPORT returns WDIOF_KEEPALIVEPING, GETSTATUS returns if
340 the device is opened or not [FIXME -- I'm not really sure how
341 this works, there seems to be some magic connected to
342 CONFIG_WATCHDOG_NOWAYOUT]
343
344pcwd.c -- Berkshire PC Watchdog
345
346 Hardcoded timeout of 1.5 seconds
347
348 Supports CONFIG_WATCHDOG_NOWAYOUT
349
350 GETSUPPORT returns WDIOF_OVERHEAT|WDIOF_CARDRESET and both
351 GETSTATUS and GETBOOTSTATUS return something useful.
352
353 The SETOPTIONS call can be used to enable and disable the card
354 and to ask the driver to call panic if the system overheats.
355
356sbc60xxwdt.c -- 60xx Single Board Computer
357
358 Hardcoded timeout of 10 seconds
359
360 Does not support CONFIG_WATCHDOG_NOWAYOUT, but has the magic
361 character 'V' close handling.
362
363 No bits set in GETSUPPORT
364
365scx200.c -- National SCx200 CPUs
366
367 Not in the kernel yet.
368
369 The timeout is set using a module parameter "margin" which
370 defaults to 60 seconds. The timeout can also be set using
371 SETTIMEOUT and read using GETTIMEOUT.
372
373 Supports a module parameter "nowayout" that is initialized
374 with the value of CONFIG_WATCHDOG_NOWAYOUT. Also supports the
375 magic character 'V' handling.
376
377shwdt.c -- SuperH 3/4 processors
378
379 [FIXME -- I'm unable to tell what the timeout is]
380
381 Supports CONFIG_WATCHDOG_NOWAYOUT
382
383 GETSUPPORT returns WDIOF_KEEPALIVEPING, and the GETSTATUS call
384 returns if the device is open or not. [FIXME -- silliness
385 again?]
386
387softdog.c -- Software watchdog
388
389 The timeout is set with the module parameter "soft_margin"
390 which defaults to 60 seconds, the timeout is also settable
391 using the SETTIMEOUT ioctl.
392
393 Supports CONFIG_WATCHDOG_NOWAYOUT
394
395 WDIOF_SETTIMEOUT bit set in GETSUPPORT
396
397w83877f_wdt.c -- W83877F Computer
398
399 Hardcoded timeout of 30 seconds
400
401 Does not support CONFIG_WATCHDOG_NOWAYOUT, but has the magic
402 character 'V' close handling.
403
404 No bits set in GETSUPPORT
405
406w83627hf_wdt.c -- w83627hf watchdog
407
408 Timeout that defaults to 60 seconds, supports SETTIMEOUT.
409
410 Supports CONFIG_WATCHDOG_NOWAYOUT
411
412 GETSUPPORT returns WDIOF_KEEPALIVEPING and WDIOF_SETTIMEOUT.
413 The GETSTATUS call returns if the device is open or not.
414
415wdt.c -- ICS WDT500/501 ISA and
416wdt_pci.c -- ICS WDT500/501 PCI
417
418 Default timeout of 60 seconds. The timeout is also settable
419 using the SETTIMEOUT ioctl.
420
421 Supports CONFIG_WATCHDOG_NOWAYOUT
422
423 GETSUPPORT returns with bits set depending on the actual
424 card. The WDT501 supports a lot of external monitoring, the
425 WDT500 much less.
426
427wdt285.c -- Footbridge watchdog
428
429 The timeout is set with the module parameter "soft_margin"
430 which defaults to 60 seconds. The timeout is also settable
431 using the SETTIMEOUT ioctl.
432
433 Does not support CONFIG_WATCHDOG_NOWAYOUT
434
435 WDIOF_SETTIMEOUT bit set in GETSUPPORT
436
437wdt977.c -- Netwinder W83977AF chip
438
439 Hardcoded timeout of 3 minutes
440
441 Supports CONFIG_WATCHDOG_NOWAYOUT
442
443 Does not support any ioctls at all.
444
diff --git a/Documentation/watchdog/watchdog.txt b/Documentation/watchdog/watchdog.txt
deleted file mode 100644
index 4b1ff69cc19a..000000000000
--- a/Documentation/watchdog/watchdog.txt
+++ /dev/null
@@ -1,94 +0,0 @@
1 Watchdog Timer Interfaces For The Linux Operating System
2
3 Alan Cox <alan@lxorguk.ukuu.org.uk>
4
5 Custom Linux Driver And Program Development
6
7
8The following watchdog drivers are currently implemented:
9
10 ICS WDT501-P
11 ICS WDT501-P (no fan tachometer)
12 ICS WDT500-P
13 Software Only
14 SA1100 Internal Watchdog
15 Berkshire Products PC Watchdog Revision A & C (by Ken Hollis)
16
17
18All six interfaces provide /dev/watchdog, which when open must be written
19to within a timeout or the machine will reboot. Each write delays the reboot
20time another timeout. In the case of the software watchdog the ability to
21reboot will depend on the state of the machines and interrupts. The hardware
22boards physically pull the machine down off their own onboard timers and
23will reboot from almost anything.
24
25A second temperature monitoring interface is available on the WDT501P cards
26and some Berkshire cards. This provides /dev/temperature. This is the machine
27internal temperature in degrees Fahrenheit. Each read returns a single byte
28giving the temperature.
29
30The third interface logs kernel messages on additional alert events.
31
32Both software and hardware watchdog drivers are available in the standard
33kernel. If you are using the software watchdog, you probably also want
34to use "panic=60" as a boot argument as well.
35
36The wdt card cannot be safely probed for. Instead you need to pass
37wdt=ioaddr,irq as a boot parameter - eg "wdt=0x240,11".
38
39The SA1100 watchdog module can be configured with the "sa1100_margin"
40commandline argument which specifies timeout value in seconds.
41
42The i810 TCO watchdog modules can be configured with the "i810_margin"
43commandline argument which specifies the counter initial value. The counter
44is decremented every 0.6 seconds and default to 50 (30 seconds). Values can
45range between 3 and 63.
46
47The i810 TCO watchdog driver also implements the WDIOC_GETSTATUS and
48WDIOC_GETBOOTSTATUS ioctl()s. WDIOC_GETSTATUS returns the actual counter value
49and WDIOC_GETBOOTSTATUS returns the value of TCO2 Status Register (see Intel's
50documentation for the 82801AA and 82801AB datasheet).
51
52Features
53--------
54 WDT501P WDT500P Software Berkshire i810 TCO SA1100WD
55Reboot Timer X X X X X X
56External Reboot X X o o o X
57I/O Port Monitor o o o X o o
58Temperature X o o X o o
59Fan Speed X o o o o o
60Power Under X o o o o o
61Power Over X o o o o o
62Overheat X o o o o o
63
64The external event interfaces on the WDT boards are not currently supported.
65Minor numbers are however allocated for it.
66
67
68Example Watchdog Driver: see Documentation/watchdog/src/watchdog-simple.c
69
70
71Contact Information
72
73People keep asking about the WDT watchdog timer hardware: The phone contacts
74for Industrial Computer Source are:
75
76Industrial Computer Source
77http://www.indcompsrc.com
78ICS Advent, San Diego
796260 Sequence Dr.
80San Diego, CA 92121-4371
81Phone (858) 677-0877
82FAX: (858) 677-0895
83>
84ICS Advent Europe, UK
85Oving Road
86Chichester,
87West Sussex,
88PO19 4ET, UK
89Phone: 00.44.1243.533900
90
91
92and please mention Linux when enquiring.
93
94For full information about the PCWD cards see the pcwd-watchdog.txt document.
diff --git a/Documentation/watchdog/wdt.txt b/Documentation/watchdog/wdt.txt
new file mode 100644
index 000000000000..03fd756d976d
--- /dev/null
+++ b/Documentation/watchdog/wdt.txt
@@ -0,0 +1,43 @@
1Last Reviewed: 10/05/2007
2
3 WDT Watchdog Timer Interfaces For The Linux Operating System
4 Alan Cox <alan@lxorguk.ukuu.org.uk>
5
6 ICS WDT501-P
7 ICS WDT501-P (no fan tachometer)
8 ICS WDT500-P
9
10All the interfaces provide /dev/watchdog, which when open must be written
11to within a timeout or the machine will reboot. Each write delays the reboot
12time another timeout. In the case of the software watchdog the ability to
13reboot will depend on the state of the machines and interrupts. The hardware
14boards physically pull the machine down off their own onboard timers and
15will reboot from almost anything.
16
17A second temperature monitoring interface is available on the WDT501P cards.
18This provides /dev/temperature. This is the machine internal temperature in
19degrees Fahrenheit. Each read returns a single byte giving the temperature.
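Reading the temperature therefore takes a one byte read followed by an
optional unit conversion. The sketch below assumes a file descriptor that was
opened on /dev/temperature elsewhere. Both helper names are invented for this
example.

```c
#include <unistd.h>

/* Read one temperature byte (degrees Fahrenheit). Returns 0 on success. */
static int wdt_read_temp_f(int fd, unsigned char *out)
{
	return read(fd, out, 1) == 1 ? 0 : -1;
}

/* Convert a temperature byte in degrees Fahrenheit to degrees Celsius. */
static double wdt_temp_to_celsius(unsigned char fahrenheit)
{
	return (fahrenheit - 32) * 5.0 / 9.0;
}
```

Because the value is a single unsigned byte, the interface can only report
temperatures from 0 to 255 degrees Fahrenheit.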
20
21The third interface logs kernel messages on additional alert events.
22
23The wdt card cannot be safely probed for. Instead you need to pass
24wdt=ioaddr,irq as a boot parameter, e.g. "wdt=0x240,11".
25
26Features
27--------
28 WDT501P WDT500P
29Reboot Timer X X
30External Reboot X X
31I/O Port Monitor o o
32Temperature X o
33Fan Speed X o
34Power Under X o
35Power Over X o
36Overheat X o
37
38The external event interfaces on the WDT boards are not currently supported.
39Minor numbers are however allocated for it.
40
41
42Example Watchdog Driver: see Documentation/watchdog/src/watchdog-simple.c
43