Diffstat (limited to 'Documentation')
59 files changed, 2472 insertions, 1302 deletions
diff --git a/Documentation/ABI/removed/raw1394_legacy_isochronous b/Documentation/ABI/removed/raw1394_legacy_isochronous
new file mode 100644
index 000000000000..1b629622d883
--- /dev/null
+++ b/Documentation/ABI/removed/raw1394_legacy_isochronous
@@ -0,0 +1,16 @@ | |||
1 | What: legacy isochronous ABI of raw1394 (1st generation iso ABI) | ||
2 | Date: June 2007 (scheduled), removed in kernel v2.6.23 | ||
3 | Contact: linux1394-devel@lists.sourceforge.net | ||
4 | Description: | ||
5 | The two request types RAW1394_REQ_ISO_SEND, RAW1394_REQ_ISO_LISTEN have | ||
6 | been deprecated for quite some time. They are very inefficient as they | ||
7 | come with high interrupt load and several layers of callbacks for each | ||
8 | packet. Because of these deficiencies, the video1394 and dv1394 drivers | ||
9 | and the 3rd-generation isochronous ABI in raw1394 (rawiso) were created. | ||
10 | |||
11 | Users: | ||
12 | libraw1394 users via the long deprecated API raw1394_iso_write, | ||
13 | raw1394_start_iso_write, raw1394_start_iso_rcv, raw1394_stop_iso_rcv | ||
14 | |||
15 | libdc1394, which optionally uses these old libraw1394 calls | ||
16 | as an alternative to the more efficient video1394 ABI | ||
diff --git a/Documentation/ABI/testing/sysfs-bus-usb b/Documentation/ABI/testing/sysfs-bus-usb
index f9937add033d..9734577d1711 100644
--- a/Documentation/ABI/testing/sysfs-bus-usb
+++ b/Documentation/ABI/testing/sysfs-bus-usb
@@ -39,3 +39,16 @@ Description: | |||
39 | If you want to suspend a device immediately but leave it | 39 | If you want to suspend a device immediately but leave it |
40 | free to wake up in response to I/O requests, you should | 40 | free to wake up in response to I/O requests, you should |
41 | write "0" to power/autosuspend. | 41 | write "0" to power/autosuspend. |
42 | |||
43 | What: /sys/bus/usb/devices/.../power/persist | ||
44 | Date: May 2007 | ||
45 | KernelVersion: 2.6.23 | ||
46 | Contact: Alan Stern <stern@rowland.harvard.edu> | ||
47 | Description: | ||
48 | If CONFIG_USB_PERSIST is set, then each USB device directory | ||
49 | will contain a file named power/persist. The file holds a | ||
50 | boolean value (0 or 1) indicating whether or not the | ||
51 | "USB-Persist" facility is enabled for the device. Since the | ||
52 | facility is inherently dangerous, it is disabled by default | ||
53 | for all devices except hubs. For more information, see | ||
54 | Documentation/usb/persist.txt. | ||
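The power/persist entry above holds a single boolean, so user space can read and write it like any other small text attribute. The following is a minimal sketch in user-space C, not code from the kernel tree: read_bool_attr() and write_bool_attr() are hypothetical helper names, and the caller supplies the full path to the attribute file.

```c
#include <stdio.h>

/*
 * Hypothetical helper: read a sysfs-style boolean attribute such as
 * power/persist. Returns 0 or 1 on success, -1 if the file cannot be
 * opened (for example when CONFIG_USB_PERSIST is not set) or if it
 * does not start with '0' or '1'.
 */
int read_bool_attr(const char *path)
{
	FILE *f = fopen(path, "r");
	int c;

	if (!f)
		return -1;
	c = fgetc(f);
	fclose(f);
	if (c == '0')
		return 0;
	if (c == '1')
		return 1;
	return -1;
}

/*
 * Hypothetical helper: write "0" or "1" to the attribute.
 * Returns 0 on success, -1 on any I/O error.
 */
int write_bool_attr(const char *path, int enable)
{
	FILE *f = fopen(path, "w");
	int ok;

	if (!f)
		return -1;
	ok = fputs(enable ? "1\n" : "0\n", f) >= 0;
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}
```

Because the facility is described as inherently dangerous, a caller would normally read the current value first and only write 1 for a device it has deliberately chosen.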
diff --git a/Documentation/BUG-HUNTING b/Documentation/BUG-HUNTING
index 65b97e1dbf70..35f5bd243336 100644
--- a/Documentation/BUG-HUNTING
+++ b/Documentation/BUG-HUNTING
@@ -191,6 +191,30 @@ e.g. crash dump output as shown by Dave Miller. | |||
191 | > mov 0x8(%ebp), %ebx ! %ebx = skb->sk | 191 | > mov 0x8(%ebp), %ebx ! %ebx = skb->sk |
192 | > mov 0x13c(%ebx), %eax ! %eax = inet_sk(sk)->opt | 192 | > mov 0x13c(%ebx), %eax ! %eax = inet_sk(sk)->opt |
193 | 193 | ||
194 | In addition, you can use GDB to figure out the exact file and line | ||
195 | number of the OOPS from the vmlinux file. If you have | ||
196 | CONFIG_DEBUG_INFO enabled, you can simply copy the EIP value from the | ||
197 | OOPS: | ||
198 | |||
199 | EIP: 0060:[<c021e50e>] Not tainted VLI | ||
200 | |||
201 | And use GDB to translate that to human-readable form: | ||
202 | |||
203 | gdb vmlinux | ||
204 | (gdb) l *0xc021e50e | ||
205 | |||
206 | If you don't have CONFIG_DEBUG_INFO enabled, you use the function | ||
207 | offset from the OOPS: | ||
208 | |||
209 | EIP is at vt_ioctl+0xda8/0x1482 | ||
210 | |||
211 | And recompile the kernel with CONFIG_DEBUG_INFO enabled: | ||
212 | |||
213 | make vmlinux | ||
214 | gdb vmlinux | ||
215 | (gdb) p vt_ioctl | ||
216 | (gdb) l *(0x<address of vt_ioctl> + 0xda8) | ||
217 | |||
194 | Another very useful option of the Kernel Hacking section in menuconfig is | 218 | Another very useful option of the Kernel Hacking section in menuconfig is |
195 | Debug memory allocations. This will help you see whether data has been | 219 | Debug memory allocations. This will help you see whether data has been |
196 | initialised and not set before use etc. To see the values that get assigned | 220 | initialised and not set before use etc. To see the values that get assigned |
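The GDB steps added in this hunk rely on splitting an OOPS location such as "vt_ioctl+0xda8/0x1482" into a symbol and an offset, then adding the offset to the address GDB prints for the symbol. A minimal user-space sketch of that arithmetic follows; parse_oops_location(), oops_address() and OOPS_SYM_MAX are hypothetical names chosen for illustration and are not part of any kernel or GDB interface.

```c
#include <stdio.h>

#define OOPS_SYM_MAX 64	/* hypothetical buffer size, including the NUL */

/*
 * Split text of the form "symbol+0xOFFSET/0xSIZE" into its parts.
 * sym must point to a buffer of at least OOPS_SYM_MAX bytes.
 * Returns 0 on success, -1 if the text does not match the pattern.
 */
int parse_oops_location(const char *text, char *sym,
			unsigned long *offset, unsigned long *size)
{
	/* %lx accepts an optional "0x" prefix, as strtoul() does. */
	if (sscanf(text, "%63[^+]+%lx/%lx", sym, offset, size) != 3)
		return -1;
	return 0;
}

/*
 * Address to hand to "l *ADDR" in GDB: the address of the function
 * (as printed by "p vt_ioctl") plus the offset from the OOPS.
 */
unsigned long oops_address(unsigned long func_addr, unsigned long offset)
{
	return func_addr + offset;
}
```

The function address itself still has to come from GDB, since it depends on the particular vmlinux that was built.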
diff --git a/Documentation/DMA-mapping.txt b/Documentation/DMA-mapping.txt
index 028614cdd062..e07f2530326b 100644
--- a/Documentation/DMA-mapping.txt
+++ b/Documentation/DMA-mapping.txt
@@ -664,109 +664,6 @@ It is that simple. | |||
664 | Well, not for some odd devices. See the next section for information | 664 | Well, not for some odd devices. See the next section for information |
665 | about that. | 665 | about that. |
666 | 666 | ||
667 | DAC Addressing for Address Space Hungry Devices | ||
668 | |||
669 | There exists a class of devices which do not mesh well with the PCI | ||
670 | DMA mapping API. By definition these "mappings" are a finite | ||
671 | resource. The number of total available mappings per bus is platform | ||
672 | specific, but there will always be a reasonable amount. | ||
673 | |||
674 | What is "reasonable"? Reasonable means that networking and block I/O | ||
675 | devices need not worry about using too many mappings. | ||
676 | |||
677 | As an example of a problematic device, consider compute cluster cards. | ||
678 | They can potentially need to access gigabytes of memory at once via | ||
679 | DMA. Dynamic mappings are unsuitable for this kind of access pattern. | ||
680 | |||
681 | To this end we've provided a small API by which a device driver | ||
682 | may use DAC cycles to directly address all of physical memory. | ||
683 | Not all platforms support this, but most do. It is easy to determine | ||
684 | whether the platform will work properly at probe time. | ||
685 | |||
686 | First, understand that there may be a SEVERE performance penalty for | ||
687 | using these interfaces on some platforms. Therefore, you MUST only | ||
688 | use these interfaces if it is absolutely required. %99 of devices can | ||
689 | use the normal APIs without any problems. | ||
690 | |||
691 | Note that for streaming type mappings you must either use these | ||
692 | interfaces, or the dynamic mapping interfaces above. You may not mix | ||
693 | usage of both for the same device. Such an act is illegal and is | ||
694 | guaranteed to put a banana in your tailpipe. | ||
695 | |||
696 | However, consistent mappings may in fact be used in conjunction with | ||
697 | these interfaces. Remember that, as defined, consistent mappings are | ||
698 | always going to be SAC addressable. | ||
699 | |||
700 | The first thing your driver needs to do is query the PCI platform | ||
701 | layer if it is capable of handling your devices DAC addressing | ||
702 | capabilities: | ||
703 | |||
704 | int pci_dac_dma_supported(struct pci_dev *hwdev, u64 mask); | ||
705 | |||
706 | You may not use the following interfaces if this routine fails. | ||
707 | |||
708 | Next, DMA addresses using this API are kept track of using the | ||
709 | dma64_addr_t type. It is guaranteed to be big enough to hold any | ||
710 | DAC address the platform layer will give to you from the following | ||
711 | routines. If you have consistent mappings as well, you still | ||
712 | use plain dma_addr_t to keep track of those. | ||
713 | |||
714 | All mappings obtained here will be direct. The mappings are not | ||
715 | translated, and this is the purpose of this dialect of the DMA API. | ||
716 | |||
717 | All routines work with page/offset pairs. This is the _ONLY_ way to | ||
718 | portably refer to any piece of memory. If you have a cpu pointer | ||
719 | (which may be validly DMA'd too) you may easily obtain the page | ||
720 | and offset using something like this: | ||
721 | |||
722 | struct page *page = virt_to_page(ptr); | ||
723 | unsigned long offset = offset_in_page(ptr); | ||
724 | |||
725 | Here are the interfaces: | ||
726 | |||
727 | dma64_addr_t pci_dac_page_to_dma(struct pci_dev *pdev, | ||
728 | struct page *page, | ||
729 | unsigned long offset, | ||
730 | int direction); | ||
731 | |||
732 | The DAC address for the tuple PAGE/OFFSET are returned. The direction | ||
733 | argument is the same as for pci_{map,unmap}_single(). The same rules | ||
734 | for cpu/device access apply here as for the streaming mapping | ||
735 | interfaces. To reiterate: | ||
736 | |||
737 | The cpu may touch the buffer before pci_dac_page_to_dma. | ||
738 | The device may touch the buffer after pci_dac_page_to_dma | ||
739 | is made, but the cpu may NOT. | ||
740 | |||
741 | When the DMA transfer is complete, invoke: | ||
742 | |||
743 | void pci_dac_dma_sync_single_for_cpu(struct pci_dev *pdev, | ||
744 | dma64_addr_t dma_addr, | ||
745 | size_t len, int direction); | ||
746 | |||
747 | This must be done before the CPU looks at the buffer again. | ||
748 | This interface behaves identically to pci_dma_sync_{single,sg}_for_cpu(). | ||
749 | |||
750 | And likewise, if you wish to let the device get back at the buffer after | ||
751 | the cpu has read/written it, invoke: | ||
752 | |||
753 | void pci_dac_dma_sync_single_for_device(struct pci_dev *pdev, | ||
754 | dma64_addr_t dma_addr, | ||
755 | size_t len, int direction); | ||
756 | |||
757 | before letting the device access the DMA area again. | ||
758 | |||
759 | If you need to get back to the PAGE/OFFSET tuple from a dma64_addr_t | ||
760 | the following interfaces are provided: | ||
761 | |||
762 | struct page *pci_dac_dma_to_page(struct pci_dev *pdev, | ||
763 | dma64_addr_t dma_addr); | ||
764 | unsigned long pci_dac_dma_to_offset(struct pci_dev *pdev, | ||
765 | dma64_addr_t dma_addr); | ||
766 | |||
767 | This is possible with the DAC interfaces purely because they are | ||
768 | not translated in any way. | ||
769 | |||
770 | Optimizing Unmap State Space Consumption | 667 | Optimizing Unmap State Space Consumption |
771 | 668 | ||
772 | On many platforms, pci_unmap_{single,page}() is simply a nop. | 669 | On many platforms, pci_unmap_{single,page}() is simply a nop. |
diff --git a/Documentation/DocBook/kernel-api.tmpl b/Documentation/DocBook/kernel-api.tmpl
index 38f88b6ae405..46bcff2849bd 100644
--- a/Documentation/DocBook/kernel-api.tmpl
+++ b/Documentation/DocBook/kernel-api.tmpl
@@ -643,4 +643,70 @@ X!Idrivers/video/console/fonts.c | |||
643 | !Edrivers/spi/spi.c | 643 | !Edrivers/spi/spi.c |
644 | </chapter> | 644 | </chapter> |
645 | 645 | ||
646 | <chapter id="i2c"> | ||
647 | <title>I<superscript>2</superscript>C and SMBus Subsystem</title> | ||
648 | |||
649 | <para> | ||
650 | I<superscript>2</superscript>C (or without fancy typography, "I2C") | ||
651 | is an acronym for the "Inter-IC" bus, a simple bus protocol which is | ||
652 | widely used where low data rate communications suffice. | ||
653 | Since it's also a licensed trademark, some vendors use another | ||
654 | name (such as "Two-Wire Interface", TWI) for the same bus. | ||
655 | I2C only needs two signals (SCL for clock, SDA for data), conserving | ||
656 | board real estate and minimizing signal quality issues. | ||
657 | Most I2C devices use seven bit addresses, and bus speeds of up | ||
658 | to 400 kHz; there's a high speed extension (3.4 MHz) that's not yet | ||
659 | found wide use. | ||
660 | I2C is a multi-master bus; open drain signaling is used to | ||
661 | arbitrate between masters, as well as to handshake and to | ||
662 | synchronize clocks from slower clients. | ||
663 | </para> | ||
664 | |||
665 | <para> | ||
666 | The Linux I2C programming interfaces support only the master | ||
667 | side of bus interactions, not the slave side. | ||
668 | The programming interface is structured around two kinds of driver, | ||
669 | and two kinds of device. | ||
670 | An I2C "Adapter Driver" abstracts the controller hardware; it binds | ||
671 | to a physical device (perhaps a PCI device or platform_device) and | ||
672 | exposes a <structname>struct i2c_adapter</structname> representing | ||
673 | each I2C bus segment it manages. | ||
674 | On each I2C bus segment will be I2C devices represented by a | ||
675 | <structname>struct i2c_client</structname>. Those devices will | ||
676 | be bound to a <structname>struct i2c_driver</structname>, | ||
677 | which should follow the standard Linux driver model. | ||
678 | (At this writing, a legacy model is more widely used.) | ||
679 | There are functions to perform various I2C protocol operations; at | ||
680 | this writing all such functions are usable only from task context. | ||
681 | </para> | ||
682 | |||
683 | <para> | ||
684 | The System Management Bus (SMBus) is a sibling protocol. Most SMBus | ||
685 | systems are also I2C conformant. The electrical constraints are | ||
686 | tighter for SMBus, and it standardizes particular protocol messages | ||
687 | and idioms. Controllers that support I2C can also support most | ||
688 | SMBus operations, but SMBus controllers don't support all the protocol | ||
689 | options that an I2C controller will. | ||
690 | There are functions to perform various SMBus protocol operations, | ||
691 | either using I2C primitives or by issuing SMBus commands to | ||
692 | i2c_adapter devices which don't support those I2C operations. | ||
693 | </para> | ||
694 | |||
695 | !Iinclude/linux/i2c.h | ||
696 | !Fdrivers/i2c/i2c-boardinfo.c i2c_register_board_info | ||
697 | !Edrivers/i2c/i2c-core.c | ||
698 | </chapter> | ||
699 | |||
700 | <chapter id="splice"> | ||
701 | <title>splice API</title> | ||
702 | <para> | ||
703 | splice is a method for moving blocks of data around inside the | ||
704 | kernel, without continually transferring it between the kernel | ||
705 | and user space. | ||
706 | </para> | ||
707 | !Iinclude/linux/splice.h | ||
708 | !Ffs/splice.c | ||
709 | </chapter> | ||
710 | |||
711 | |||
646 | </book> | 712 | </book> |
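The I2C chapter added above notes that most devices use seven-bit addresses. As background that the chapter itself does not spell out, the first byte a master sends carries that address in its upper seven bits and the transfer direction in the lowest bit. The sketch below illustrates the packing; i2c_address_byte() is a hypothetical helper written for this note and is not a function from include/linux/i2c.h.

```c
/*
 * Hypothetical helper: build the first byte sent on the wire for a
 * 7-bit I2C address. The address occupies bits 7..1 and bit 0 is the
 * direction (0 = write, 1 = read).
 * Returns the byte value, or -1 if addr does not fit in seven bits.
 */
int i2c_address_byte(unsigned int addr, int is_read)
{
	if (addr > 0x7f)
		return -1;
	return (int)((addr << 1) | (is_read ? 1u : 0u));
}
```

Drivers written against struct i2c_client normally deal only with the seven-bit value; the shift is typically handled by the adapter driver or the controller hardware.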
diff --git a/Documentation/HOWTO b/Documentation/HOWTO
index ced9207bedcf..98e2701c746f 100644
--- a/Documentation/HOWTO
+++ b/Documentation/HOWTO
@@ -322,39 +322,34 @@ kernel releases as described above. | |||
322 | Here is a list of some of the different kernel trees available: | 322 | Here is a list of some of the different kernel trees available: |
323 | git trees: | 323 | git trees: |
324 | - Kbuild development tree, Sam Ravnborg <sam@ravnborg.org> | 324 | - Kbuild development tree, Sam Ravnborg <sam@ravnborg.org> |
325 | kernel.org:/pub/scm/linux/kernel/git/sam/kbuild.git | 325 | git.kernel.org:/pub/scm/linux/kernel/git/sam/kbuild.git |
326 | 326 | ||
327 | - ACPI development tree, Len Brown <len.brown@intel.com> | 327 | - ACPI development tree, Len Brown <len.brown@intel.com> |
328 | kernel.org:/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6.git | 328 | git.kernel.org:/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6.git |
329 | 329 | ||
330 | - Block development tree, Jens Axboe <axboe@suse.de> | 330 | - Block development tree, Jens Axboe <axboe@suse.de> |
331 | kernel.org:/pub/scm/linux/kernel/git/axboe/linux-2.6-block.git | 331 | git.kernel.org:/pub/scm/linux/kernel/git/axboe/linux-2.6-block.git |
332 | 332 | ||
333 | - DRM development tree, Dave Airlie <airlied@linux.ie> | 333 | - DRM development tree, Dave Airlie <airlied@linux.ie> |
334 | kernel.org:/pub/scm/linux/kernel/git/airlied/drm-2.6.git | 334 | git.kernel.org:/pub/scm/linux/kernel/git/airlied/drm-2.6.git |
335 | 335 | ||
336 | - ia64 development tree, Tony Luck <tony.luck@intel.com> | 336 | - ia64 development tree, Tony Luck <tony.luck@intel.com> |
337 | kernel.org:/pub/scm/linux/kernel/git/aegl/linux-2.6.git | 337 | git.kernel.org:/pub/scm/linux/kernel/git/aegl/linux-2.6.git |
338 | |||
339 | - ieee1394 development tree, Jody McIntyre <scjody@modernduck.com> | ||
340 | kernel.org:/pub/scm/linux/kernel/git/scjody/ieee1394.git | ||
341 | 338 | ||
342 | - infiniband, Roland Dreier <rolandd@cisco.com> | 339 | - infiniband, Roland Dreier <rolandd@cisco.com> |
343 | kernel.org:/pub/scm/linux/kernel/git/roland/infiniband.git | 340 | git.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband.git |
344 | 341 | ||
345 | - libata, Jeff Garzik <jgarzik@pobox.com> | 342 | - libata, Jeff Garzik <jgarzik@pobox.com> |
346 | kernel.org:/pub/scm/linux/kernel/git/jgarzik/libata-dev.git | 343 | git.kernel.org:/pub/scm/linux/kernel/git/jgarzik/libata-dev.git |
347 | 344 | ||
348 | - network drivers, Jeff Garzik <jgarzik@pobox.com> | 345 | - network drivers, Jeff Garzik <jgarzik@pobox.com> |
349 | kernel.org:/pub/scm/linux/kernel/git/jgarzik/netdev-2.6.git | 346 | git.kernel.org:/pub/scm/linux/kernel/git/jgarzik/netdev-2.6.git |
350 | 347 | ||
351 | - pcmcia, Dominik Brodowski <linux@dominikbrodowski.net> | 348 | - pcmcia, Dominik Brodowski <linux@dominikbrodowski.net> |
352 | kernel.org:/pub/scm/linux/kernel/git/brodo/pcmcia-2.6.git | 349 | git.kernel.org:/pub/scm/linux/kernel/git/brodo/pcmcia-2.6.git |
353 | 350 | ||
354 | - SCSI, James Bottomley <James.Bottomley@SteelEye.com> | 351 | - SCSI, James Bottomley <James.Bottomley@SteelEye.com> |
355 | kernel.org:/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6.git | 352 | git.kernel.org:/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6.git |
356 | |||
357 | Other git kernel trees can be found listed at http://kernel.org/git | ||
358 | 353 | ||
359 | quilt trees: | 354 | quilt trees: |
360 | - USB, PCI, Driver Core, and I2C, Greg Kroah-Hartman <gregkh@suse.de> | 355 | - USB, PCI, Driver Core, and I2C, Greg Kroah-Hartman <gregkh@suse.de> |
@@ -362,6 +357,9 @@ Here is a list of some of the different kernel trees available: | |||
362 | - x86-64, partly i386, Andi Kleen <ak@suse.de> | 357 | - x86-64, partly i386, Andi Kleen <ak@suse.de> |
363 | ftp.firstfloor.org:/pub/ak/x86_64/quilt/ | 358 | ftp.firstfloor.org:/pub/ak/x86_64/quilt/ |
364 | 359 | ||
360 | Other kernel trees can be found listed at http://git.kernel.org/ and in | ||
361 | the MAINTAINERS file. | ||
362 | |||
365 | Bug Reporting | 363 | Bug Reporting |
366 | ------------- | 364 | ------------- |
367 | 365 | ||
diff --git a/Documentation/SM501.txt b/Documentation/SM501.txt
new file mode 100644
index 000000000000..3a1bd95d3767
--- /dev/null
+++ b/Documentation/SM501.txt
@@ -0,0 +1,66 @@ | |||
1 | SM501 Driver | ||
2 | ============ | ||
3 | |||
4 | Copyright 2006, 2007 Simtec Electronics | ||
5 | |||
6 | Core | ||
7 | ---- | ||
8 | |||
9 | The core driver in drivers/mfd provides common services for the | ||
10 | drivers which manage the specific hardware blocks. These services | ||
11 | include locking for common registers, clock control and resource | ||
12 | management. | ||
13 | |||
14 | The core registers drivers for both PCI and generic bus based | ||
15 | chips via the platform device and driver system. | ||
16 | |||
17 | On detection of a device, the core initialises the chip (which may | ||
18 | be specified by the platform data) and then exports the selected | ||
19 | peripheral set as platform devices for the specific drivers. | ||
20 | |||
21 | The core re-uses the platform device system as the platform device | ||
22 | system provides enough features to support the drivers without the | ||
23 | need to create a new bus-type and the associated code to go with it. | ||
24 | |||
25 | |||
26 | Resources | ||
27 | --------- | ||
28 | |||
29 | Each peripheral has a view of the device which is implicitly narrowed to | ||
30 | the specific set of resources that peripheral requires in order to | ||
31 | function correctly. | ||
32 | |||
33 | The centralised memory allocation allows the driver to ensure that the | ||
34 | maximum possible resource allocation can be made to the video subsystem | ||
35 | as this is by far the most resource-sensitive of the on-chip functions. | ||
36 | |||
37 | The primary issue with memory allocation is that of moving the video | ||
38 | buffers once a display mode is chosen. Indeed when a video mode change | ||
39 | occurs the memory footprint of the video subsystem changes. | ||
40 | |||
41 | Since video memory is difficult to move without changing the display | ||
42 | (unless sufficient contiguous memory can be provided for the old and new | ||
43 | modes simultaneously) the video driver fully utilises the memory area | ||
44 | given to it by aligning fb0 to the start of the area and fb1 to the end | ||
45 | of it. Any memory left over in the middle is used for the acceleration | ||
46 | functions, which are transient and thus their location is less critical | ||
47 | as it can be moved. | ||
48 | |||
49 | |||
50 | Configuration | ||
51 | ------------- | ||
52 | |||
53 | The platform device driver uses a set of platform data to pass | ||
54 | configurations through to the core and the subsidiary drivers | ||
55 | so that there can be support for more than one system carrying | ||
56 | an SM501 built into a single kernel image. | ||
57 | |||
58 | The PCI driver assumes that the PCI card behaves as per the Silicon | ||
59 | Motion reference design. | ||
60 | |||
61 | There is an erratum (AB-5) affecting the selection of the | ||
62 | M1XCLK and M1CLK frequencies. These two clocks | ||
63 | must be sourced from the same PLL, although they can then | ||
64 | be divided down individually. If this is not set, then SM501 may | ||
65 | lock and hang the whole system. The driver will refuse to | ||
66 | attach if the PLL selection is different. | ||
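The Resources section of SM501.txt describes placing fb0 at the start of the video memory area, fb1 at the end, and using whatever remains in the middle for acceleration. The arithmetic can be sketched as below. This is an illustration under assumed names only: struct sm501_mem_layout, struct mem_region and sm501_layout_sketch() are hypothetical and are not taken from the driver.

```c
#include <stddef.h>

/* Hypothetical description of one region inside the memory area. */
struct mem_region {
	size_t start;	/* offset from the start of the area */
	size_t len;	/* length in bytes */
};

/* Hypothetical result of splitting the area as the text describes. */
struct sm501_mem_layout {
	struct mem_region fb0;		/* aligned to the start */
	struct mem_region fb1;		/* aligned to the end */
	struct mem_region accel;	/* left over in the middle */
};

/*
 * Split an area of 'total' bytes between two framebuffers.
 * Returns 0 on success, -1 if the framebuffers do not both fit.
 */
int sm501_layout_sketch(size_t total, size_t fb0_len, size_t fb1_len,
			struct sm501_mem_layout *out)
{
	if (fb0_len > total || fb1_len > total - fb0_len)
		return -1;

	out->fb0.start = 0;
	out->fb0.len = fb0_len;

	out->fb1.start = total - fb1_len;
	out->fb1.len = fb1_len;

	out->accel.start = fb0_len;
	out->accel.len = total - fb0_len - fb1_len;
	return 0;
}
```

With this arrangement a mode change on either framebuffer only grows or shrinks it toward the middle, so the other framebuffer does not have to move; only the transient acceleration region is affected.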
diff --git a/Documentation/SubmitChecklist b/Documentation/SubmitChecklist
index 3af3e65cf43b..6ebffb57e3db 100644
--- a/Documentation/SubmitChecklist
+++ b/Documentation/SubmitChecklist
@@ -84,3 +84,9 @@ kernel patches. | |||
84 | 24: Avoid whitespace damage such as indenting with spaces or whitespace | 84 | 24: Avoid whitespace damage such as indenting with spaces or whitespace |
85 | at the end of lines. You can test this by feeding the patch to | 85 | at the end of lines. You can test this by feeding the patch to |
86 | "git apply --check --whitespace=error-all" | 86 | "git apply --check --whitespace=error-all" |
87 | |||
88 | 25: Check your patch for general style as detailed in | ||
89 | Documentation/CodingStyle. Check for trivial violations with the | ||
90 | patch style checker prior to submission (scripts/checkpatch.pl). | ||
91 | You should be able to justify all violations that remain in | ||
92 | your patch. | ||
diff --git a/Documentation/SubmittingPatches b/Documentation/SubmittingPatches
index a417b25fb1aa..0958e97d4bf4 100644
--- a/Documentation/SubmittingPatches
+++ b/Documentation/SubmittingPatches
@@ -118,7 +118,20 @@ then only post say 15 or so at a time and wait for review and integration. | |||
118 | 118 | ||
119 | 119 | ||
120 | 120 | ||
121 | 4) Select e-mail destination. | 121 | 4) Style check your changes. |
122 | |||
123 | Check your patch for basic style violations, details of which can be | ||
124 | found in Documentation/CodingStyle. Failure to do so simply wastes | ||
125 | the reviewers' time and will get your patch rejected, probably | ||
126 | without even being read. | ||
127 | |||
128 | At a minimum you should check your patches with the patch style | ||
129 | checker prior to submission (scripts/checkpatch.pl). You should | ||
130 | be able to justify all violations that remain in your patch. | ||
131 | |||
132 | |||
133 | |||
134 | 5) Select e-mail destination. | ||
122 | 135 | ||
123 | Look through the MAINTAINERS file and the source code, and determine | 136 | Look through the MAINTAINERS file and the source code, and determine |
124 | if your change applies to a specific subsystem of the kernel, with | 137 | if your change applies to a specific subsystem of the kernel, with |
@@ -146,7 +159,7 @@ discussed should the patch then be submitted to Linus. | |||
146 | 159 | ||
147 | 160 | ||
148 | 161 | ||
149 | 5) Select your CC (e-mail carbon copy) list. | 162 | 6) Select your CC (e-mail carbon copy) list. |
150 | 163 | ||
151 | Unless you have a reason NOT to do so, CC linux-kernel@vger.kernel.org. | 164 | Unless you have a reason NOT to do so, CC linux-kernel@vger.kernel.org. |
152 | 165 | ||
@@ -187,8 +200,7 @@ URL: <http://www.kernel.org/pub/linux/kernel/people/bunk/trivial/> | |||
187 | 200 | ||
188 | 201 | ||
189 | 202 | ||
190 | 203 | 7) No MIME, no links, no compression, no attachments. Just plain text. | |
191 | 6) No MIME, no links, no compression, no attachments. Just plain text. | ||
192 | 204 | ||
193 | Linus and other kernel developers need to be able to read and comment | 205 | Linus and other kernel developers need to be able to read and comment |
194 | on the changes you are submitting. It is important for a kernel | 206 | on the changes you are submitting. It is important for a kernel |
@@ -223,9 +235,9 @@ pref("mailnews.display.disable_format_flowed_support", true); | |||
223 | 235 | ||
224 | 236 | ||
225 | 237 | ||
226 | 7) E-mail size. | 238 | 8) E-mail size. |
227 | 239 | ||
228 | When sending patches to Linus, always follow step #6. | 240 | When sending patches to Linus, always follow step #7. |
229 | 241 | ||
230 | Large changes are not appropriate for mailing lists, and some | 242 | Large changes are not appropriate for mailing lists, and some |
231 | maintainers. If your patch, uncompressed, exceeds 40 kB in size, | 243 | maintainers. If your patch, uncompressed, exceeds 40 kB in size, |
@@ -234,7 +246,7 @@ server, and provide instead a URL (link) pointing to your patch. | |||
234 | 246 | ||
235 | 247 | ||
236 | 248 | ||
237 | 8) Name your kernel version. | 249 | 9) Name your kernel version. |
238 | 250 | ||
239 | It is important to note, either in the subject line or in the patch | 251 | It is important to note, either in the subject line or in the patch |
240 | description, the kernel version to which this patch applies. | 252 | description, the kernel version to which this patch applies. |
@@ -244,7 +256,7 @@ Linus will not apply it. | |||
244 | 256 | ||
245 | 257 | ||
246 | 258 | ||
247 | 9) Don't get discouraged. Re-submit. | 259 | 10) Don't get discouraged. Re-submit. |
248 | 260 | ||
249 | After you have submitted your change, be patient and wait. If Linus | 261 | After you have submitted your change, be patient and wait. If Linus |
250 | likes your change and applies it, it will appear in the next version | 262 | likes your change and applies it, it will appear in the next version |
@@ -270,7 +282,7 @@ When in doubt, solicit comments on linux-kernel mailing list. | |||
270 | 282 | ||
271 | 283 | ||
272 | 284 | ||
273 | 10) Include PATCH in the subject | 285 | 11) Include PATCH in the subject |
274 | 286 | ||
275 | Due to high e-mail traffic to Linus, and to linux-kernel, it is common | 287 | Due to high e-mail traffic to Linus, and to linux-kernel, it is common |
276 | convention to prefix your subject line with [PATCH]. This lets Linus | 288 | convention to prefix your subject line with [PATCH]. This lets Linus |
@@ -279,7 +291,7 @@ e-mail discussions. | |||
279 | 291 | ||
280 | 292 | ||
281 | 293 | ||
282 | 11) Sign your work | 294 | 12) Sign your work |
283 | 295 | ||
284 | To improve tracking of who did what, especially with patches that can | 296 | To improve tracking of who did what, especially with patches that can |
285 | percolate to their final resting place in the kernel through several | 297 | percolate to their final resting place in the kernel through several |
@@ -328,7 +340,32 @@ now, but you can do this to mark internal company procedures or just | |||
328 | point out some special detail about the sign-off. | 340 | point out some special detail about the sign-off. |
329 | 341 | ||
330 | 342 | ||
331 | 12) The canonical patch format | 343 | 13) When to use Acked-by: |
344 | |||
345 | The Signed-off-by: tag indicates that the signer was involved in the | ||
346 | development of the patch, or that he/she was in the patch's delivery path. | ||
347 | |||
348 | If a person was not directly involved in the preparation or handling of a | ||
349 | patch but wishes to signify and record their approval of it then they can | ||
350 | arrange to have an Acked-by: line added to the patch's changelog. | ||
351 | |||
352 | Acked-by: is often used by the maintainer of the affected code when that | ||
353 | maintainer neither contributed to nor forwarded the patch. | ||
354 | |||
355 | Acked-by: is not as formal as Signed-off-by:. It is a record that the acker | ||
356 | has at least reviewed the patch and has indicated acceptance. Hence patch | ||
357 | mergers will sometimes manually convert an acker's "yep, looks good to me" | ||
358 | into an Acked-by:. | ||
359 | |||
360 | Acked-by: does not necessarily indicate acknowledgement of the entire patch. | ||
361 | For example, if a patch affects multiple subsystems and has an Acked-by: from | ||
362 | one subsystem maintainer then this usually indicates acknowledgement of just | ||
363 | the part which affects that maintainer's code. Judgement should be used here. | ||
364 | When in doubt people should refer to the original discussion in the mailing | ||
365 | list archives. | ||
366 | |||
367 | |||
368 | 14) The canonical patch format | ||
332 | 369 | ||
333 | The canonical patch subject line is: | 370 | The canonical patch subject line is: |
334 | 371 | ||
@@ -427,6 +464,10 @@ section Linus Computer Science 101. | |||
427 | Nuff said. If your code deviates too much from this, it is likely | 464 | Nuff said. If your code deviates too much from this, it is likely |
428 | to be rejected without further review, and without comment. | 465 | to be rejected without further review, and without comment. |
429 | 466 | ||
467 | Check your patches with the patch style checker prior to submission | ||
468 | (scripts/checkpatch.pl). You should be able to justify all | ||
469 | violations that remain in your patch. | ||
470 | |||
430 | 471 | ||
431 | 472 | ||
432 | 2) #ifdefs are ugly | 473 | 2) #ifdefs are ugly |
diff --git a/Documentation/atomic_ops.txt b/Documentation/atomic_ops.txt
index 2a63d5662a93..05851e9982ed 100644
--- a/Documentation/atomic_ops.txt
+++ b/Documentation/atomic_ops.txt
@@ -149,7 +149,7 @@ defined which accomplish this: | |||
149 | void smp_mb__before_atomic_dec(void); | 149 | void smp_mb__before_atomic_dec(void); |
150 | void smp_mb__after_atomic_dec(void); | 150 | void smp_mb__after_atomic_dec(void); |
151 | void smp_mb__before_atomic_inc(void); | 151 | void smp_mb__before_atomic_inc(void); |
152 | void smp_mb__after_atomic_dec(void); | 152 | void smp_mb__after_atomic_inc(void); |
153 | 153 | ||
154 | For example, smp_mb__before_atomic_dec() can be used like so: | 154 | For example, smp_mb__before_atomic_dec() can be used like so: |
155 | 155 | ||
diff --git a/Documentation/blackfin/kgdb.txt b/Documentation/blackfin/kgdb.txt
new file mode 100644
index 000000000000..84f6a484ae9a
--- /dev/null
+++ b/Documentation/blackfin/kgdb.txt
@@ -0,0 +1,155 @@ | |||
1 | A Simple Guide to Configure KGDB | ||
2 | |||
3 | Sonic Zhang <sonic.zhang@analog.com> | ||
4 | Aug. 24th 2006 | ||
5 | |||
6 | |||
7 | This KGDB patch enables the kernel developer to do source level debugging on | ||
8 | the kernel for the Blackfin architecture. The debugging works over either the | ||
9 | ethernet interface or one of the uarts. Both software breakpoints and | ||
10 | hardware breakpoints are supported in this version. | ||
11 | http://docs.blackfin.uclinux.org/doku.php?id=kgdb | ||
12 | |||
13 | |||
14 | 2 known issues: | ||
15 | 1. This bug: | ||
16 | http://blackfin.uclinux.org/tracker/index.php?func=detail&aid=544&group_id=18&atid=145 | ||
17 | The GDB client for Blackfin uClinux causes incorrect values of local | ||
18 | variables to be displayed when the user breaks the running of kernel in GDB. | ||
19 | 2. Because of a hardware bug in Blackfin 533 v1.0.3: | ||
20 | 05000067 - Watchpoints (Hardware Breakpoints) are not supported | ||
21 | Hardware breakpoints cannot be set properly. | ||
22 | |||
23 | |||
24 | Debug over Ethernet: | ||
25 | |||
26 | 1. Compile and install the cross platform version of gdb for blackfin, which | ||
27 | can be found at $(BINROOT)/bfin-elf-gdb. | ||
28 | |||
29 | 2. Apply this patch to the 2.6.x kernel. Select the menuconfig option under | ||
30 | "Kernel hacking" -> "Kernel debugging" -> "KGDB: kernel debug with remote gdb". | ||
31 | With this selected, the options "Full Symbolic/Source Debugging support" and | ||
32 | "Compile the kernel with frame pointers" are also selected. | ||
33 | |||
34 | 3. Select option "KGDB: connect over (Ethernet)". Add "kgdboe=@target-IP/,@host-IP/" to | ||
35 | the option "Compiled-in Kernel Boot Parameter" under "Kernel hacking". | ||
36 | |||
37 | 4. Connect minicom to the serial port and boot the kernel image. | ||
38 | |||
39 | 5. Configure the IP "/> ifconfig eth0 target-IP" | ||
40 | |||
41 | 6. Start GDB client "bfin-elf-gdb vmlinux". | ||
42 | |||
43 | 7. Connect to the target "(gdb) target remote udp:target-IP:6443". | ||
44 | |||
45 | 8. Set software breakpoint "(gdb) break sys_open". | ||
46 | |||
47 | 9. Continue "(gdb) c". | ||
48 | |||
49 | 10. Run ls in the target console "/> ls". | ||
50 | |||
51 | 11. Breakpoint hits. "Breakpoint 1: sys_open(..." | ||
52 | |||
53 | 12. Display local variables and function parameters. | ||
54 | (*) This operation gives wrong results, see known issue 1. | ||
55 | |||
56 | 13. Single stepping "(gdb) si". | ||
57 | |||
58 | 14. Remove breakpoint 1. "(gdb) del 1" | ||
59 | |||
60 | 15. Set hardware breakpoint "(gdb) hbreak sys_open". | ||
61 | |||
62 | 16. Continue "(gdb) c". | ||
63 | |||
64 | 17. Run ls in the target console "/> ls". | ||
65 | |||
66 | 18. Hardware breakpoint hits. "Breakpoint 1: sys_open(...". | ||
67 | (*) This hardware breakpoint will not be hit, see known issue 2. | ||
68 | |||
69 | 19. Continue "(gdb) c". | ||
70 | |||
71 | 20. Interrupt the target in GDB "Ctrl+C". | ||
72 | |||
73 | 21. Detach from the target "(gdb) detach". | ||
74 | |||
75 | 22. Exit GDB "(gdb) quit". | ||
76 | |||
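Step 3 of the Ethernet procedure above builds a boot parameter from two addresses. As a minimal sketch, assuming only the "kgdboe=@target-IP/,@host-IP/" layout quoted in that step, the string could be assembled as follows. The helper name build_kgdboe_param() and the addresses used with it are hypothetical and are not part of the KGDB patch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: format the boot parameter from step 3 as
 * "kgdboe=@<target-ip>/,@<host-ip>/".
 * Returns 0 on success, -1 on an empty address or a too-small buffer.
 */
static int build_kgdboe_param(char *buf, size_t len,
			      const char *target_ip, const char *host_ip)
{
	int n;

	assert(buf != NULL && target_ip != NULL && host_ip != NULL);
	if (strlen(target_ip) == 0 || strlen(host_ip) == 0)
		return -1;

	n = snprintf(buf, len, "kgdboe=@%s/,@%s/", target_ip, host_ip);
	/* snprintf reports the length it needed; reject truncated output. */
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

The resulting string is what would be pasted into the "Compiled-in Kernel Boot Parameter" option; the target address must match the one configured with ifconfig in step 5.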
77 | |||
78 | Debug over the UART: | ||
79 | |||
80 | 1. Compile and install the cross platform version of gdb for blackfin, which | ||
81 | can be found at $(BINROOT)/bfin-elf-gdb. | ||
82 | |||
83 | 2. Apply this patch to the 2.6.x kernel. Select the menuconfig option under | ||
84 | "Kernel hacking" -> "Kernel debugging" -> "KGDB: kernel debug with remote gdb". | ||
85 | With this selected, the options "Full Symbolic/Source Debugging support" and | ||
86 | "Compile the kernel with frame pointers" are also selected. | ||
87 | |||
88 | 3. Select option "KGDB: connect over (UART)". Set "KGDB: UART port number" to be | ||
89 | a different one from the console. Don't forget to change the mode of | ||
90 | blackfin serial driver to PIO. Otherwise kgdb works incorrectly on UART. | ||
91 | |||
92 | 4. If you want to connect to kgdb when the kernel boots, enable | ||
93 | "KGDB: Wait for gdb connection early" | ||
94 | |||
95 | 5. Compile kernel. | ||
96 | |||
97 | 6. Connect minicom to the serial port of the console and boot the kernel image. | ||
98 | |||
99 | 7. Start GDB client "bfin-elf-gdb vmlinux". | ||
100 | |||
101 | 8. Set the baud rate in GDB "(gdb) set remotebaud 57600". | ||
102 | |||
103 | 9. Connect to the target on the second serial port "(gdb) target remote /dev/ttyS1". | ||
104 | |||
105 | 10. Set software breakpoint "(gdb) break sys_open". | ||
106 | |||
107 | 11. Continue "(gdb) c". | ||
108 | |||
109 | 12. Run ls in the target console "/> ls". | ||
110 | |||
111 | 13. A breakpoint is hit. "Breakpoint 1: sys_open(..." | ||
112 | |||
113 | 14. All other operations are the same as in KGDB over Ethernet. | ||
114 | |||
115 | |||
116 | Debug over the same UART as console: | ||
117 | |||
118 | 1. Compile and install the cross platform version of gdb for blackfin, which | ||
119 | can be found at $(BINROOT)/bfin-elf-gdb. | ||
120 | |||
121 | 2. Apply this patch to the 2.6.x kernel. Select the menuconfig option under | ||
122 | "Kernel hacking" -> "Kernel debugging" -> "KGDB: kernel debug with remote gdb". | ||
123 | With this selected, the options "Full Symbolic/Source Debugging support" and | ||
124 | "Compile the kernel with frame pointers" are also selected. | ||
125 | |||
126 | 3. Select option "KGDB: connect over UART". Set "KGDB: UART port number" to console. | ||
127 | Don't forget to change the mode of blackfin serial driver to PIO. | ||
128 | Otherwise kgdb works incorrectly on UART. | ||
129 | |||
130 | 4. If you want to connect to kgdb when the kernel boots, enable | ||
131 | "KGDB: Wait for gdb connection early" | ||
132 | |||
133 | 5. Connect minicom to the serial port and boot the kernel image. | ||
134 | |||
135 | 6. (Optional) Ask target to wait for gdb connection by entering Ctrl+A. In minicom, you should enter Ctrl+A+A. | ||
136 | |||
137 | 7. Start GDB client "bfin-elf-gdb vmlinux". | ||
138 | |||
139 | 8. Set the baud rate in GDB "(gdb) set remotebaud 57600". | ||
140 | |||
141 | 9. Connect to the target "(gdb) target remote /dev/ttyS0". | ||
142 | |||
143 | 10. Set software breakpoint "(gdb) break sys_open". | ||
144 | |||
145 | 11. Continue "(gdb) c". Then enter Ctrl+C twice to stop the GDB connection. | ||
146 | |||
147 | 12. Run ls in the target console "/> ls". A dummy string can be seen on the console. | ||
148 | |||
149 | 13. Then connect gdb to the target again. "(gdb) target remote /dev/ttyS0". | ||
150 | Now you will find that a breakpoint is hit. "Breakpoint 1: sys_open(..." | ||
151 | |||
152 | 14. All other operations are the same as in KGDB over Ethernet. The only | ||
153 | difference is that after a continue command in GDB, you must stop the GDB | ||
154 | connection with two "Ctrl+C"s and connect again after a breakpoint is hit or | ||
155 | Ctrl+A is entered. | ||
diff --git a/Documentation/block/barrier.txt b/Documentation/block/barrier.txt index a272c3db8094..7d279f2f5bb2 100644 --- a/Documentation/block/barrier.txt +++ b/Documentation/block/barrier.txt | |||
@@ -82,23 +82,12 @@ including draining and flushing. | |||
82 | typedef void (prepare_flush_fn)(request_queue_t *q, struct request *rq); | 82 | typedef void (prepare_flush_fn)(request_queue_t *q, struct request *rq); |
83 | 83 | ||
84 | int blk_queue_ordered(request_queue_t *q, unsigned ordered, | 84 | int blk_queue_ordered(request_queue_t *q, unsigned ordered, |
85 | prepare_flush_fn *prepare_flush_fn, | 85 | prepare_flush_fn *prepare_flush_fn); |
86 | unsigned gfp_mask); | ||
87 | |||
88 | int blk_queue_ordered_locked(request_queue_t *q, unsigned ordered, | ||
89 | prepare_flush_fn *prepare_flush_fn, | ||
90 | unsigned gfp_mask); | ||
91 | |||
92 | The only difference between the two functions is whether or not the | ||
93 | caller is holding q->queue_lock on entry. The latter expects the | ||
94 | caller is holding the lock. | ||
95 | 86 | ||
96 | @q : the queue in question | 87 | @q : the queue in question |
97 | @ordered : the ordered mode the driver/device supports | 88 | @ordered : the ordered mode the driver/device supports |
98 | @prepare_flush_fn : this function should prepare @rq such that it | 89 | @prepare_flush_fn : this function should prepare @rq such that it |
99 | flushes cache to physical medium when executed | 90 | flushes cache to physical medium when executed |
100 | @gfp_mask : gfp_mask used when allocating data structures | ||
101 | for ordered processing | ||
102 | 91 | ||
103 | For example, SCSI disk driver's prepare_flush_fn looks like the | 92 | For example, SCSI disk driver's prepare_flush_fn looks like the |
104 | following. | 93 | following. |
@@ -106,9 +95,10 @@ following. | |||
106 | static void sd_prepare_flush(request_queue_t *q, struct request *rq) | 95 | static void sd_prepare_flush(request_queue_t *q, struct request *rq) |
107 | { | 96 | { |
108 | memset(rq->cmd, 0, sizeof(rq->cmd)); | 97 | memset(rq->cmd, 0, sizeof(rq->cmd)); |
109 | rq->flags |= REQ_BLOCK_PC; | 98 | rq->cmd_type = REQ_TYPE_BLOCK_PC; |
110 | rq->timeout = SD_TIMEOUT; | 99 | rq->timeout = SD_TIMEOUT; |
111 | rq->cmd[0] = SYNCHRONIZE_CACHE; | 100 | rq->cmd[0] = SYNCHRONIZE_CACHE; |
101 | rq->cmd_len = 10; | ||
112 | } | 102 | } |
113 | 103 | ||
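The updated sd_prepare_flush() can be tried outside the kernel with a stand-in request structure. The following is only a sketch: struct sketch_request, the enum values, SKETCH_MAX_CDB and SKETCH_TIMEOUT are assumptions made for illustration and are not kernel definitions. Only the SYNCHRONIZE CACHE (10) opcode 0x35 and the 10-byte command length come from the SCSI command set.

```c
#include <assert.h>
#include <string.h>

#define SKETCH_MAX_CDB			16	/* assumption: size of the CDB buffer */
#define SKETCH_SYNCHRONIZE_CACHE	0x35	/* SCSI SYNCHRONIZE CACHE (10) opcode */
#define SKETCH_TIMEOUT			30	/* assumption: placeholder timeout value */

/* Stand-in for the request type field; values are illustrative only. */
enum sketch_cmd_type { SKETCH_TYPE_FS = 1, SKETCH_TYPE_BLOCK_PC };

/* Stand-in for the fields of struct request that the flush hook touches. */
struct sketch_request {
	enum sketch_cmd_type cmd_type;
	unsigned char cmd[SKETCH_MAX_CDB];
	unsigned int cmd_len;
	unsigned int timeout;
};

/* Mirrors the new-side body of sd_prepare_flush() shown above. */
static void sketch_prepare_flush(struct sketch_request *rq)
{
	assert(rq != NULL);
	memset(rq->cmd, 0, sizeof(rq->cmd));	/* clear any stale CDB bytes */
	rq->cmd_type = SKETCH_TYPE_BLOCK_PC;	/* type is assigned, not OR-ed into flags */
	rq->timeout = SKETCH_TIMEOUT;
	rq->cmd[0] = SKETCH_SYNCHRONIZE_CACHE;
	rq->cmd_len = 10;			/* length of the 10-byte CDB, set explicitly */
}
```

Compared with the old side of the hunk, the request type is now a plain assignment to cmd_type rather than a bit OR-ed into rq->flags, and the command length is set explicitly.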
114 | The following seven ordered modes are supported. The following table | 104 | The following seven ordered modes are supported. The following table |
diff --git a/Documentation/driver-model/platform.txt b/Documentation/driver-model/platform.txt index 19c4a6e13676..2a97320ee17f 100644 --- a/Documentation/driver-model/platform.txt +++ b/Documentation/driver-model/platform.txt | |||
@@ -96,6 +96,46 @@ System setup also associates those clocks with the device, so that that | |||
96 | calls to clk_get(&pdev->dev, clock_name) return them as needed. | 96 | calls to clk_get(&pdev->dev, clock_name) return them as needed. |
97 | 97 | ||
98 | 98 | ||
99 | Legacy Drivers: Device Probing | ||
100 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||
101 | Some drivers are not fully converted to the driver model, because they take | ||
102 | on a non-driver role: the driver registers its platform device, rather than | ||
103 | leaving that for system infrastructure. Such drivers can't be hotplugged | ||
104 | or coldplugged, since those mechanisms require device creation to be in a | ||
105 | different system component than the driver. | ||
106 | |||
107 | The only "good" reason for this is to handle older system designs which, like | ||
108 | original IBM PCs, rely on error-prone "probe-the-hardware" models for hardware | ||
109 | configuration. Newer systems have largely abandoned that model, in favor of | ||
110 | bus-level support for dynamic configuration (PCI, USB), or device tables | ||
111 | provided by the boot firmware (e.g. PNPACPI on x86). There are too many | ||
112 | conflicting options about what might be where, and even educated guesses by | ||
113 | an operating system will be wrong often enough to make trouble. | ||
114 | |||
115 | This style of driver is discouraged. If you're updating such a driver, | ||
116 | please try to move the device enumeration to a more appropriate location, | ||
117 | outside the driver. This will usually be cleanup, since such drivers | ||
118 | tend to already have "normal" modes, such as ones using device nodes that | ||
119 | were created by PNP or by platform device setup. | ||
120 | |||
121 | Nonetheless, there are some APIs to support such legacy drivers. Avoid | ||
122 | using these calls except with such hotplug-deficient drivers. | ||
123 | |||
124 | struct platform_device *platform_device_alloc( | ||
125 | char *name, unsigned id); | ||
126 | |||
127 | You can use platform_device_alloc() to dynamically allocate a device, which | ||
128 | you will then initialize with resources and platform_device_register(). | ||
129 | A better solution is usually: | ||
130 | |||
131 | struct platform_device *platform_device_register_simple( | ||
132 | char *name, unsigned id, | ||
133 | struct resource *res, unsigned nres); | ||
134 | |||
135 | You can use platform_device_register_simple() as a one-step call to allocate | ||
136 | and register a device. | ||
137 | |||
138 | |||
99 | Device Naming and Driver Binding | 139 | Device Naming and Driver Binding |
100 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | 140 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
101 | The platform_device.dev.bus_id is the canonical name for the devices. | 141 | The platform_device.dev.bus_id is the canonical name for the devices. |
diff --git a/Documentation/feature-removal-schedule.txt b/Documentation/feature-removal-schedule.txt index 2d7ea85075ba..092c65dd35c2 100644 --- a/Documentation/feature-removal-schedule.txt +++ b/Documentation/feature-removal-schedule.txt | |||
@@ -49,16 +49,6 @@ Who: Adrian Bunk <bunk@stusta.de> | |||
49 | 49 | ||
50 | --------------------------- | 50 | --------------------------- |
51 | 51 | ||
52 | What: raw1394: requests of type RAW1394_REQ_ISO_SEND, RAW1394_REQ_ISO_LISTEN | ||
53 | When: June 2007 | ||
54 | Why: Deprecated in favour of the more efficient and robust rawiso interface. | ||
55 | Affected are applications which use the deprecated part of libraw1394 | ||
56 | (raw1394_iso_write, raw1394_start_iso_write, raw1394_start_iso_rcv, | ||
57 | raw1394_stop_iso_rcv) or bypass libraw1394. | ||
58 | Who: Dan Dennedy <dan@dennedy.org>, Stefan Richter <stefanr@s5r6.in-berlin.de> | ||
59 | |||
60 | --------------------------- | ||
61 | |||
62 | What: old NCR53C9x driver | 52 | What: old NCR53C9x driver |
63 | When: October 2007 | 53 | When: October 2007 |
64 | Why: Replaced by the much better esp_scsi driver. Actual low-level | 54 | Why: Replaced by the much better esp_scsi driver. Actual low-level |
@@ -70,6 +60,7 @@ Who: David Miller <davem@davemloft.net> | |||
70 | 60 | ||
71 | What: Video4Linux API 1 ioctls and video_decoder.h from Video devices. | 61 | What: Video4Linux API 1 ioctls and video_decoder.h from Video devices. |
72 | When: December 2006 | 62 | When: December 2006 |
63 | Files: include/linux/video_decoder.h | ||
73 | Why: V4L1 API was replaced by V4L2 API during migration from 2.4 to 2.6 | 64 | Why: V4L1 API was replaced by V4L2 API during migration from 2.4 to 2.6 |
74 | series. The old API has lots of drawbacks and doesn't provide enough | 65 | series. The old API has lots of drawbacks and doesn't provide enough |
75 | means to work with all video and audio standards. The newer API is | 66 | means to work with all video and audio standards. The newer API is |
@@ -103,6 +94,7 @@ Who: Dominik Brodowski <linux@brodo.de> | |||
103 | What: remove EXPORT_SYMBOL(kernel_thread) | 94 | What: remove EXPORT_SYMBOL(kernel_thread) |
104 | When: August 2006 | 95 | When: August 2006 |
105 | Files: arch/*/kernel/*_ksyms.c | 96 | Files: arch/*/kernel/*_ksyms.c |
97 | Funcs: kernel_thread | ||
106 | Why: kernel_thread is a low-level implementation detail. Drivers should | 98 | Why: kernel_thread is a low-level implementation detail. Drivers should |
107 | use the <linux/kthread.h> API instead which shields them from | 99 | use the <linux/kthread.h> API instead which shields them from |
108 | implementation details and provides a higher-level interface that | 100 | implementation details and provides a higher-level interface that |
@@ -204,28 +196,6 @@ Who: Adrian Bunk <bunk@stusta.de> | |||
204 | 196 | ||
205 | --------------------------- | 197 | --------------------------- |
206 | 198 | ||
207 | What: ACPI hooks (X86_SPEEDSTEP_CENTRINO_ACPI) in speedstep-centrino driver | ||
208 | When: December 2006 | ||
209 | Why: Speedstep-centrino driver with ACPI hooks and acpi-cpufreq driver are | ||
210 | functionally very much similar. They talk to ACPI in same way. Only | ||
211 | difference between them is the way they do frequency transitions. | ||
212 | One uses MSRs and the other one uses IO ports. Functionaliy of | ||
213 | speedstep_centrino with ACPI hooks is now merged into acpi-cpufreq. | ||
214 | That means one common driver will support all Intel Enhanced Speedstep | ||
215 | capable CPUs. That means less confusion over name of | ||
216 | speedstep-centrino driver (with that driver supposed to be used on | ||
217 | non-centrino platforms). That means less duplication of code and | ||
218 | less maintenance effort and no possibility of these two drivers | ||
219 | going out of sync. | ||
220 | Current users of speedstep_centrino with ACPI hooks are requested to | ||
221 | switch over to acpi-cpufreq driver. speedstep-centrino will continue | ||
222 | to work using older non-ACPI static table based scheme even after this | ||
223 | date. | ||
224 | |||
225 | Who: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> | ||
226 | |||
227 | --------------------------- | ||
228 | |||
229 | What: /sys/firmware/acpi/namespace | 199 | What: /sys/firmware/acpi/namespace |
230 | When: 2.6.21 | 200 | When: 2.6.21 |
231 | Why: The ACPI namespace is effectively the symbol list for | 201 | Why: The ACPI namespace is effectively the symbol list for |
@@ -256,14 +226,6 @@ Who: Len Brown <len.brown@intel.com> | |||
256 | 226 | ||
257 | --------------------------- | 227 | --------------------------- |
258 | 228 | ||
259 | What: sk98lin network driver | ||
260 | When: July 2007 | ||
261 | Why: In kernel tree version of driver is unmaintained. Sk98lin driver | ||
262 | replaced by the skge driver. | ||
263 | Who: Stephen Hemminger <shemminger@osdl.org> | ||
264 | |||
265 | --------------------------- | ||
266 | |||
267 | What: Compaq touchscreen device emulation | 229 | What: Compaq touchscreen device emulation |
268 | When: Oct 2007 | 230 | When: Oct 2007 |
269 | Files: drivers/input/tsdev.c | 231 | Files: drivers/input/tsdev.c |
@@ -278,25 +240,6 @@ Who: Richard Purdie <rpurdie@rpsys.net> | |||
278 | 240 | ||
279 | --------------------------- | 241 | --------------------------- |
280 | 242 | ||
281 | What: Multipath cached routing support in ipv4 | ||
282 | When: in 2.6.23 | ||
283 | Why: Code was merged, then submitter immediately disappeared leaving | ||
284 | us with no maintainer and lots of bugs. The code should not have | ||
285 | been merged in the first place, and many aspects of it's | ||
286 | implementation are blocking more critical core networking | ||
287 | development. It's marked EXPERIMENTAL and no distribution | ||
288 | enables it because it cause obscure crashes due to unfixable bugs | ||
289 | (interfaces don't return errors so memory allocation can't be | ||
290 | handled, calling contexts of these interfaces make handling | ||
291 | errors impossible too because they get called after we've | ||
292 | totally commited to creating a route object, for example). | ||
293 | This problem has existed for years and no forward progress | ||
294 | has ever been made, and nobody steps up to try and salvage | ||
295 | this code, so we're going to finally just get rid of it. | ||
296 | Who: David S. Miller <davem@davemloft.net> | ||
297 | |||
298 | --------------------------- | ||
299 | |||
300 | What: read_dev_chars(), read_conf_data{,_lpm}() (s390 common I/O layer) | 243 | What: read_dev_chars(), read_conf_data{,_lpm}() (s390 common I/O layer) |
301 | When: December 2007 | 244 | When: December 2007 |
302 | Why: These functions are a leftover from 2.4 times. They have several | 245 | Why: These functions are a leftover from 2.4 times. They have several |
@@ -346,3 +289,18 @@ Who: Tejun Heo <htejun@gmail.com> | |||
346 | 289 | ||
347 | --------------------------- | 290 | --------------------------- |
348 | 291 | ||
292 | What: Legacy RTC drivers (under drivers/i2c/chips) | ||
293 | When: November 2007 | ||
294 | Why: Obsolete. We have an RTC subsystem with better drivers. | ||
295 | Who: Jean Delvare <khali@linux-fr.org> | ||
296 | |||
297 | --------------------------- | ||
298 | |||
299 | What: iptables SAME target | ||
300 | When: 1.1. 2008 | ||
301 | Files: net/ipv4/netfilter/ipt_SAME.c, include/linux/netfilter_ipv4/ipt_SAME.h | ||
302 | Why: Obsolete for multiple years now, NAT core provides the same behaviour. | ||
303 | Unfixably broken wrt. 32/64 bit cleanness. | ||
304 | Who: Patrick McHardy <kaber@trash.net> | ||
305 | |||
306 | --------------------------- | ||
diff --git a/Documentation/filesystems/tmpfs.txt b/Documentation/filesystems/tmpfs.txt index 6dd050878a20..145e44086358 100644 --- a/Documentation/filesystems/tmpfs.txt +++ b/Documentation/filesystems/tmpfs.txt | |||
@@ -94,10 +94,10 @@ largest node numbers in the range. For example, mpol=bind:0-3,5,7,9-15 | |||
94 | 94 | ||
95 | Note that trying to mount a tmpfs with an mpol option will fail if the | 95 | Note that trying to mount a tmpfs with an mpol option will fail if the |
96 | running kernel does not support NUMA; and will fail if its nodelist | 96 | running kernel does not support NUMA; and will fail if its nodelist |
97 | specifies a node >= MAX_NUMNODES. If your system relies on that tmpfs | 97 | specifies a node which is not online. If your system relies on that |
98 | being mounted, but from time to time runs a kernel built without NUMA | 98 | tmpfs being mounted, but from time to time runs a kernel built without |
99 | capability (perhaps a safe recovery kernel), or configured to support | 99 | NUMA capability (perhaps a safe recovery kernel), or with fewer nodes |
100 | fewer nodes, then it is advisable to omit the mpol option from automatic | 100 | online, then it is advisable to omit the mpol option from automatic |
101 | mount options. It can be added later, when the tmpfs is already mounted | 101 | mount options. It can be added later, when the tmpfs is already mounted |
102 | on MountPoint, by 'mount -o remount,mpol=Policy:NodeList MountPoint'. | 102 | on MountPoint, by 'mount -o remount,mpol=Policy:NodeList MountPoint'. |
103 | 103 | ||
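The mount-time check described in the tmpfs hunk above can be pictured with a small userspace sketch. It parses a nodelist such as 0-3,5,7,9-15 into a bitmask and accepts it only if every listed node is present in an "online" mask. The names parse_nodelist() and nodelist_is_online() and the 64-node limit are assumptions for illustration; the kernel uses its own nodemask helpers and this is not its implementation.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define SKETCH_MAX_NODES 64	/* assumption: small fixed limit for the sketch */

/*
 * Parse a nodelist such as "0-3,5,7,9-15" into a bitmask.
 * Returns 0 on success, -1 on malformed input or an out-of-range node.
 */
static int parse_nodelist(const char *s, uint64_t *mask)
{
	assert(s != NULL && mask != NULL);
	*mask = 0;

	for (;;) {
		char *end;
		unsigned long lo, hi, n;

		if (*s < '0' || *s > '9')	/* each item must start with a digit */
			return -1;
		lo = strtoul(s, &end, 10);
		hi = lo;
		if (*end == '-') {		/* optional "lo-hi" range */
			s = end + 1;
			if (*s < '0' || *s > '9')
				return -1;
			hi = strtoul(s, &end, 10);
		}
		if (lo > hi || hi >= SKETCH_MAX_NODES)
			return -1;
		for (n = lo; n <= hi; n++)
			*mask |= (uint64_t)1 << n;

		if (*end == '\0')
			return 0;
		if (*end != ',')
			return -1;
		s = end + 1;
	}
}

/* A nodelist is acceptable only if every node in it is online. */
static int nodelist_is_online(uint64_t nodes, uint64_t online)
{
	return (nodes & ~online) == 0;
}
```

With 16 nodes online the example list from the text passes; with only nodes 0-7 online it is rejected, which is the situation in which the text advises leaving mpol out of the automatic mount options and adding it later with a remount.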
@@ -121,4 +121,4 @@ RAM/SWAP in 10240 inodes and it is only accessible by root. | |||
121 | Author: | 121 | Author: |
122 | Christoph Rohland <cr@sap.com>, 1.12.01 | 122 | Christoph Rohland <cr@sap.com>, 1.12.01 |
123 | Updated: | 123 | Updated: |
124 | Hugh Dickins <hugh@veritas.com>, 19 February 2006 | 124 | Hugh Dickins <hugh@veritas.com>, 4 June 2007 |
diff --git a/Documentation/firmware_class/README b/Documentation/firmware_class/README index e9cc8bb26f7d..c3480aa66ba8 100644 --- a/Documentation/firmware_class/README +++ b/Documentation/firmware_class/README | |||
@@ -1,7 +1,7 @@ | |||
1 | 1 | ||
2 | request_firmware() hotplug interface: | 2 | request_firmware() hotplug interface: |
3 | ------------------------------------ | 3 | ------------------------------------ |
4 | Copyright (C) 2003 Manuel Estrada Sainz <ranty@debian.org> | 4 | Copyright (C) 2003 Manuel Estrada Sainz |
5 | 5 | ||
6 | Why: | 6 | Why: |
7 | --- | 7 | --- |
diff --git a/Documentation/firmware_class/firmware_sample_driver.c b/Documentation/firmware_class/firmware_sample_driver.c index 87feccdb5c9f..6865cbe075ec 100644 --- a/Documentation/firmware_class/firmware_sample_driver.c +++ b/Documentation/firmware_class/firmware_sample_driver.c | |||
@@ -1,7 +1,7 @@ | |||
1 | /* | 1 | /* |
2 | * firmware_sample_driver.c - | 2 | * firmware_sample_driver.c - |
3 | * | 3 | * |
4 | * Copyright (c) 2003 Manuel Estrada Sainz <ranty@debian.org> | 4 | * Copyright (c) 2003 Manuel Estrada Sainz |
5 | * | 5 | * |
6 | * Sample code on how to use request_firmware() from drivers. | 6 | * Sample code on how to use request_firmware() from drivers. |
7 | * | 7 | * |
diff --git a/Documentation/firmware_class/firmware_sample_firmware_class.c b/Documentation/firmware_class/firmware_sample_firmware_class.c index 9e1b0e4051cd..fba943aacf93 100644 --- a/Documentation/firmware_class/firmware_sample_firmware_class.c +++ b/Documentation/firmware_class/firmware_sample_firmware_class.c | |||
@@ -1,7 +1,7 @@ | |||
1 | /* | 1 | /* |
2 | * firmware_sample_firmware_class.c - | 2 | * firmware_sample_firmware_class.c - |
3 | * | 3 | * |
4 | * Copyright (c) 2003 Manuel Estrada Sainz <ranty@debian.org> | 4 | * Copyright (c) 2003 Manuel Estrada Sainz |
5 | * | 5 | * |
6 | * NOTE: This is just a proof of concept, if you think that your driver would | 6 | * NOTE: This is just a proof of concept, if you think that your driver would |
7 | * be well served by this mechanism please contact me first. | 7 | * be well served by this mechanism please contact me first. |
@@ -19,7 +19,7 @@ | |||
19 | #include <linux/firmware.h> | 19 | #include <linux/firmware.h> |
20 | 20 | ||
21 | 21 | ||
22 | MODULE_AUTHOR("Manuel Estrada Sainz <ranty@debian.org>"); | 22 | MODULE_AUTHOR("Manuel Estrada Sainz"); |
23 | MODULE_DESCRIPTION("Hackish sample for using firmware class directly"); | 23 | MODULE_DESCRIPTION("Hackish sample for using firmware class directly"); |
24 | MODULE_LICENSE("GPL"); | 24 | MODULE_LICENSE("GPL"); |
25 | 25 | ||
@@ -78,6 +78,7 @@ static CLASS_DEVICE_ATTR(loading, 0644, | |||
78 | firmware_loading_show, firmware_loading_store); | 78 | firmware_loading_show, firmware_loading_store); |
79 | 79 | ||
80 | static ssize_t firmware_data_read(struct kobject *kobj, | 80 | static ssize_t firmware_data_read(struct kobject *kobj, |
81 | struct bin_attribute *bin_attr, | ||
81 | char *buffer, loff_t offset, size_t count) | 82 | char *buffer, loff_t offset, size_t count) |
82 | { | 83 | { |
83 | struct class_device *class_dev = to_class_dev(kobj); | 84 | struct class_device *class_dev = to_class_dev(kobj); |
@@ -88,6 +89,7 @@ static ssize_t firmware_data_read(struct kobject *kobj, | |||
88 | return count; | 89 | return count; |
89 | } | 90 | } |
90 | static ssize_t firmware_data_write(struct kobject *kobj, | 91 | static ssize_t firmware_data_write(struct kobject *kobj, |
92 | struct bin_attribute *bin_attr, | ||
91 | char *buffer, loff_t offset, size_t count) | 93 | char *buffer, loff_t offset, size_t count) |
92 | { | 94 | { |
93 | struct class_device *class_dev = to_class_dev(kobj); | 95 | struct class_device *class_dev = to_class_dev(kobj); |
diff --git a/Documentation/hrtimer/timer_stats.txt b/Documentation/hrtimer/timer_stats.txt index 27f782e3593f..22b0814d0ad0 100644 --- a/Documentation/hrtimer/timer_stats.txt +++ b/Documentation/hrtimer/timer_stats.txt | |||
@@ -2,9 +2,10 @@ timer_stats - timer usage statistics | |||
2 | ------------------------------------ | 2 | ------------------------------------ |
3 | 3 | ||
4 | timer_stats is a debugging facility to make the timer (ab)usage in a Linux | 4 | timer_stats is a debugging facility to make the timer (ab)usage in a Linux |
5 | system visible to kernel and userspace developers. It is not intended for | 5 | system visible to kernel and userspace developers. If enabled in the config |
6 | production usage as it adds significant overhead to the (hr)timer code and the | 6 | but not used it has almost zero runtime overhead, and a relatively small |
7 | (hr)timer data structures. | 7 | data structure overhead. Even if collection is enabled at runtime, all the |
8 | locking is per-CPU and lookup is hashed. | ||
8 | 9 | ||
9 | timer_stats should be used by kernel and userspace developers to verify that | 10 | timer_stats should be used by kernel and userspace developers to verify that |
10 | their code does not make undue use of timers. This helps to avoid unnecessary | 11 | their code does not make undue use of timers. This helps to avoid unnecessary |
diff --git a/Documentation/i2c/busses/i2c-i801 b/Documentation/i2c/busses/i2c-i801 index c34f0db78a30..fe6406f2f9a6 100644 --- a/Documentation/i2c/busses/i2c-i801 +++ b/Documentation/i2c/busses/i2c-i801 | |||
@@ -5,8 +5,8 @@ Supported adapters: | |||
5 | '810' and '810E' chipsets) | 5 | '810' and '810E' chipsets) |
6 | * Intel 82801BA (ICH2 - part of the '815E' chipset) | 6 | * Intel 82801BA (ICH2 - part of the '815E' chipset) |
7 | * Intel 82801CA/CAM (ICH3) | 7 | * Intel 82801CA/CAM (ICH3) |
8 | * Intel 82801DB (ICH4) (HW PEC supported, 32 byte buffer not supported) | 8 | * Intel 82801DB (ICH4) (HW PEC supported) |
9 | * Intel 82801EB/ER (ICH5) (HW PEC supported, 32 byte buffer not supported) | 9 | * Intel 82801EB/ER (ICH5) (HW PEC supported) |
10 | * Intel 6300ESB | 10 | * Intel 6300ESB |
11 | * Intel 82801FB/FR/FW/FRW (ICH6) | 11 | * Intel 82801FB/FR/FW/FRW (ICH6) |
12 | * Intel 82801G (ICH7) | 12 | * Intel 82801G (ICH7) |
diff --git a/Documentation/i2c/busses/i2c-piix4 b/Documentation/i2c/busses/i2c-piix4 index 7cbe43fa2701..fa0c786a8bf5 100644 --- a/Documentation/i2c/busses/i2c-piix4 +++ b/Documentation/i2c/busses/i2c-piix4 | |||
@@ -6,7 +6,7 @@ Supported adapters: | |||
6 | Datasheet: Publicly available at the Intel website | 6 | Datasheet: Publicly available at the Intel website |
7 | * ServerWorks OSB4, CSB5, CSB6 and HT-1000 southbridges | 7 | * ServerWorks OSB4, CSB5, CSB6 and HT-1000 southbridges |
8 | Datasheet: Only available via NDA from ServerWorks | 8 | Datasheet: Only available via NDA from ServerWorks |
9 | * ATI IXP200, IXP300, IXP400 and SB600 southbridges | 9 | * ATI IXP200, IXP300, IXP400, SB600 and SB700 southbridges |
10 | Datasheet: Not publicly available | 10 | Datasheet: Not publicly available |
11 | * Standard Microsystems (SMSC) SLC90E66 (Victory66) southbridge | 11 | * Standard Microsystems (SMSC) SLC90E66 (Victory66) southbridge |
12 | Datasheet: Publicly available at the SMSC website http://www.smsc.com | 12 | Datasheet: Publicly available at the SMSC website http://www.smsc.com |
diff --git a/Documentation/i2c/busses/i2c-taos-evm b/Documentation/i2c/busses/i2c-taos-evm new file mode 100644 index 000000000000..9146e33be6dd --- /dev/null +++ b/Documentation/i2c/busses/i2c-taos-evm | |||
@@ -0,0 +1,46 @@ | |||
1 | Kernel driver i2c-taos-evm | ||
2 | |||
3 | Author: Jean Delvare <khali@linux-fr.org> | ||
4 | |||
5 | This is a driver for the evaluation modules for TAOS I2C/SMBus chips. | ||
6 | The modules include an SMBus master with limited capabilities, which can | ||
7 | be controlled over the serial port. Virtually all evaluation modules | ||
8 | are supported, but a few lines of code need to be added for each new | ||
9 | module to instantiate the right I2C chip on the bus. Obviously, a driver | ||
10 | for the chip in question is also needed. | ||
11 | |||
12 | Currently supported devices are: | ||
13 | |||
14 | * TAOS TSL2550 EVM | ||
15 | |||
16 | For additional information on TAOS products, please see | ||
17 | http://www.taosinc.com/ | ||
18 | |||
19 | |||
20 | Using this driver | ||
21 | ----------------- | ||
22 | |||
23 | In order to use this driver, you'll need the serport driver, and the | ||
24 | inputattach tool, which is part of the input-utils package. The following | ||
25 | commands will tell the kernel that you have a TAOS EVM on the first | ||
26 | serial port: | ||
27 | |||
28 | # modprobe serport | ||
29 | # inputattach --taos-evm /dev/ttyS0 | ||
30 | |||
31 | |||
32 | Technical details | ||
33 | ----------------- | ||
34 | |||
35 | Only 4 SMBus transaction types are supported by the TAOS evaluation | ||
36 | modules: | ||
37 | * Receive Byte | ||
38 | * Send Byte | ||
39 | * Read Byte | ||
40 | * Write Byte | ||
41 | |||
42 | The communication protocol is text-based and pretty simple. It is | ||
43 | described in a PDF document on the CD which comes with the evaluation | ||
44 | module. The communication is rather slow, because the serial port has | ||
45 | to operate at 1200 bps. However, I don't think this is a big concern in | ||
46 | practice, as these modules are meant for evaluation and testing only. | ||
diff --git a/Documentation/i2c/chips/max6875 b/Documentation/i2c/chips/max6875 index 96fec562a8e9..a0cd8af2f408 100644 --- a/Documentation/i2c/chips/max6875 +++ b/Documentation/i2c/chips/max6875 | |||
@@ -99,7 +99,7 @@ And then read the data | |||
99 | 99 | ||
100 | or | 100 | or |
101 | 101 | ||
102 | count = i2c_smbus_read_i2c_block_data(fd, 0x84, buffer); | 102 | count = i2c_smbus_read_i2c_block_data(fd, 0x84, 16, buffer); |
103 | 103 | ||
104 | The block read should read 16 bytes. | 104 | The block read should read 16 bytes. |
105 | 0x84 is the block read command. | 105 | 0x84 is the block read command. |
diff --git a/Documentation/i2c/chips/x1205 b/Documentation/i2c/chips/x1205 deleted file mode 100644 index 09407c991fe5..000000000000 --- a/Documentation/i2c/chips/x1205 +++ /dev/null | |||
@@ -1,38 +0,0 @@ | |||
1 | Kernel driver x1205 | ||
2 | =================== | ||
3 | |||
4 | Supported chips: | ||
5 | * Xicor X1205 RTC | ||
6 | Prefix: 'x1205' | ||
7 | Addresses scanned: none | ||
8 | Datasheet: http://www.intersil.com/cda/deviceinfo/0,1477,X1205,00.html | ||
9 | |||
10 | Authors: | ||
11 | Karen Spearel <kas11@tampabay.rr.com>, | ||
12 | Alessandro Zummo <a.zummo@towertech.it> | ||
13 | |||
14 | Description | ||
15 | ----------- | ||
16 | |||
17 | This module aims to provide complete access to the Xicor X1205 RTC. | ||
18 | Recently Xicor has merged with Intersil, but the chip is | ||
19 | still sold under the Xicor brand. | ||
20 | |||
21 | This chip is located at address 0x6f and uses a 2-byte register addressing. | ||
22 | Two bytes need to be written to read a single register, while most | ||
23 | other chips just require one and take the second one as the data | ||
24 | to be written. To prevent corrupting unknown chips, the user must | ||
25 | explicitly set the probe parameter. | ||
26 | |||
27 | example: | ||
28 | |||
29 | modprobe x1205 probe=0,0x6f | ||
30 | |||
31 | The module supports one more option, hctosys, which is used to set the | ||
32 | software clock from the x1205. On systems where the x1205 is the | ||
33 | only hardware rtc, this parameter could be used to achieve a correct | ||
34 | date/time earlier in the system boot sequence. | ||
35 | |||
36 | example: | ||
37 | |||
38 | modprobe x1205 probe=0,0x6f hctosys=1 | ||
diff --git a/Documentation/i2c/summary b/Documentation/i2c/summary index aea60bf7e8f0..003c7319b8c7 100644 --- a/Documentation/i2c/summary +++ b/Documentation/i2c/summary | |||
@@ -67,7 +67,6 @@ i2c-proc: The /proc/sys/dev/sensors interface for device (client) drivers | |||
67 | Algorithm drivers | 67 | Algorithm drivers |
68 | ----------------- | 68 | ----------------- |
69 | 69 | ||
70 | i2c-algo-8xx: An algorithm for CPM's I2C device in Motorola 8xx processors (NOT BUILT BY DEFAULT) | ||
71 | i2c-algo-bit: A bit-banging algorithm | 70 | i2c-algo-bit: A bit-banging algorithm |
72 | i2c-algo-pcf: A PCF 8584 style algorithm | 71 | i2c-algo-pcf: A PCF 8584 style algorithm |
73 | i2c-algo-ibm_ocp: An algorithm for the I2C device in IBM 4xx processors (NOT BUILT BY DEFAULT) | 72 | i2c-algo-ibm_ocp: An algorithm for the I2C device in IBM 4xx processors (NOT BUILT BY DEFAULT) |
@@ -81,6 +80,5 @@ i2c-pcf-epp: PCF8584 on a EPP parallel port (uses i2c-algo-pcf) (NOT mkpatch | |||
81 | i2c-philips-par: Philips style parallel port adapter (uses i2c-algo-bit) | 80 | i2c-philips-par: Philips style parallel port adapter (uses i2c-algo-bit) |
82 | i2c-adap-ibm_ocp: IBM 4xx processor I2C device (uses i2c-algo-ibm_ocp) (NOT BUILT BY DEFAULT) | 81 | i2c-adap-ibm_ocp: IBM 4xx processor I2C device (uses i2c-algo-ibm_ocp) (NOT BUILT BY DEFAULT) |
83 | i2c-pport: Primitive parallel port adapter (uses i2c-algo-bit) | 82 | i2c-pport: Primitive parallel port adapter (uses i2c-algo-bit) |
84 | i2c-rpx: RPX board Motorola 8xx I2C device (uses i2c-algo-8xx) (NOT BUILT BY DEFAULT) | ||
85 | i2c-velleman: Velleman K8000 parallel port adapter (uses i2c-algo-bit) | 83 | i2c-velleman: Velleman K8000 parallel port adapter (uses i2c-algo-bit) |
86 | 84 | ||
diff --git a/Documentation/i2c/writing-clients b/Documentation/i2c/writing-clients index 3d8d36b0ad12..2c170032bf37 100644 --- a/Documentation/i2c/writing-clients +++ b/Documentation/i2c/writing-clients | |||
@@ -571,7 +571,7 @@ SMBus communication | |||
571 | u8 command, u8 length, | 571 | u8 command, u8 length, |
572 | u8 *values); | 572 | u8 *values); |
573 | extern s32 i2c_smbus_read_i2c_block_data(struct i2c_client * client, | 573 | extern s32 i2c_smbus_read_i2c_block_data(struct i2c_client * client, |
574 | u8 command, u8 *values); | 574 | u8 command, u8 length, u8 *values); |
575 | 575 | ||
576 | These ones were removed in Linux 2.6.10 because they had no users, but could | 576 | These ones were removed in Linux 2.6.10 because they had no users, but could |
577 | be added back later if needed: | 577 | be added back later if needed: |
diff --git a/Documentation/i386/zero-page.txt b/Documentation/i386/zero-page.txt index c04a421f4a7c..75b3680c41eb 100644 --- a/Documentation/i386/zero-page.txt +++ b/Documentation/i386/zero-page.txt | |||
@@ -37,6 +37,7 @@ Offset Type Description | |||
37 | 0x1d0 unsigned long EFI memory descriptor map pointer | 37 | 0x1d0 unsigned long EFI memory descriptor map pointer |
38 | 0x1d4 unsigned long EFI memory descriptor map size | 38 | 0x1d4 unsigned long EFI memory descriptor map size |
39 | 0x1e0 unsigned long ALT_MEM_K, alternative mem check, in Kb | 39 | 0x1e0 unsigned long ALT_MEM_K, alternative mem check, in Kb |
40 | 0x1e4 unsigned long Scratch field for the kernel setup code | ||
40 | 0x1e8 char number of entries in E820MAP (below) | 41 | 0x1e8 char number of entries in E820MAP (below) |
41 | 0x1e9 unsigned char number of entries in EDDBUF (below) | 42 | 0x1e9 unsigned char number of entries in EDDBUF (below) |
42 | 0x1ea unsigned char number of entries in EDD_MBR_SIG_BUFFER (below) | 43 | 0x1ea unsigned char number of entries in EDD_MBR_SIG_BUFFER (below) |
diff --git a/Documentation/ia64/aliasing-test.c b/Documentation/ia64/aliasing-test.c index 3153167b41c3..773a814d4093 100644 --- a/Documentation/ia64/aliasing-test.c +++ b/Documentation/ia64/aliasing-test.c | |||
@@ -19,6 +19,7 @@ | |||
19 | #include <sys/mman.h> | 19 | #include <sys/mman.h> |
20 | #include <sys/stat.h> | 20 | #include <sys/stat.h> |
21 | #include <unistd.h> | 21 | #include <unistd.h> |
22 | #include <linux/pci.h> | ||
22 | 23 | ||
23 | int sum; | 24 | int sum; |
24 | 25 | ||
@@ -34,13 +35,19 @@ int map_mem(char *path, off_t offset, size_t length, int touch) | |||
34 | return -1; | 35 | return -1; |
35 | } | 36 | } |
36 | 37 | ||
38 | if (fnmatch("/proc/bus/pci/*", path, 0) == 0) { | ||
39 | rc = ioctl(fd, PCIIOC_MMAP_IS_MEM); | ||
40 | if (rc == -1) | ||
41 | perror("PCIIOC_MMAP_IS_MEM ioctl"); | ||
42 | } | ||
43 | |||
37 | addr = mmap(NULL, length, PROT_READ|PROT_WRITE, MAP_SHARED, fd, offset); | 44 | addr = mmap(NULL, length, PROT_READ|PROT_WRITE, MAP_SHARED, fd, offset); |
38 | if (addr == MAP_FAILED) | 45 | if (addr == MAP_FAILED) |
39 | return 1; | 46 | return 1; |
40 | 47 | ||
41 | if (touch) { | 48 | if (touch) { |
42 | c = (int *) addr; | 49 | c = (int *) addr; |
43 | while (c < (int *) (offset + length)) | 50 | while (c < (int *) (addr + length)) |
44 | sum += *c++; | 51 | sum += *c++; |
45 | } | 52 | } |
46 | 53 | ||
@@ -54,7 +61,7 @@ int map_mem(char *path, off_t offset, size_t length, int touch) | |||
54 | return 0; | 61 | return 0; |
55 | } | 62 | } |
56 | 63 | ||
57 | int scan_sysfs(char *path, char *file, off_t offset, size_t length, int touch) | 64 | int scan_tree(char *path, char *file, off_t offset, size_t length, int touch) |
58 | { | 65 | { |
59 | struct dirent **namelist; | 66 | struct dirent **namelist; |
60 | char *name, *path2; | 67 | char *name, *path2; |
@@ -93,7 +100,7 @@ int scan_sysfs(char *path, char *file, off_t offset, size_t length, int touch) | |||
93 | } else { | 100 | } else { |
94 | r = lstat(path2, &buf); | 101 | r = lstat(path2, &buf); |
95 | if (r == 0 && S_ISDIR(buf.st_mode)) { | 102 | if (r == 0 && S_ISDIR(buf.st_mode)) { |
96 | rc = scan_sysfs(path2, file, offset, length, touch); | 103 | rc = scan_tree(path2, file, offset, length, touch); |
97 | if (rc < 0) | 104 | if (rc < 0) |
98 | return rc; | 105 | return rc; |
99 | } | 106 | } |
@@ -197,7 +204,7 @@ skip: | |||
197 | return rc; | 204 | return rc; |
198 | } | 205 | } |
199 | 206 | ||
200 | main() | 207 | int main() |
201 | { | 208 | { |
202 | int rc; | 209 | int rc; |
203 | 210 | ||
@@ -238,10 +245,15 @@ main() | |||
238 | else | 245 | else |
239 | fprintf(stderr, "FAIL: /dev/mem 0x0-0x100000 not accessible\n"); | 246 | fprintf(stderr, "FAIL: /dev/mem 0x0-0x100000 not accessible\n"); |
240 | 247 | ||
241 | scan_sysfs("/sys/class/pci_bus", "legacy_mem", 0, 0xA0000, 1); | 248 | scan_tree("/sys/class/pci_bus", "legacy_mem", 0, 0xA0000, 1); |
242 | scan_sysfs("/sys/class/pci_bus", "legacy_mem", 0xA0000, 0x20000, 0); | 249 | scan_tree("/sys/class/pci_bus", "legacy_mem", 0xA0000, 0x20000, 0); |
243 | scan_sysfs("/sys/class/pci_bus", "legacy_mem", 0xC0000, 0x40000, 1); | 250 | scan_tree("/sys/class/pci_bus", "legacy_mem", 0xC0000, 0x40000, 1); |
244 | scan_sysfs("/sys/class/pci_bus", "legacy_mem", 0, 1024*1024, 0); | 251 | scan_tree("/sys/class/pci_bus", "legacy_mem", 0, 1024*1024, 0); |
245 | 252 | ||
246 | scan_rom("/sys/devices", "rom"); | 253 | scan_rom("/sys/devices", "rom"); |
254 | |||
255 | scan_tree("/proc/bus/pci", "??.?", 0, 0xA0000, 1); | ||
256 | scan_tree("/proc/bus/pci", "??.?", 0xA0000, 0x20000, 0); | ||
257 | scan_tree("/proc/bus/pci", "??.?", 0xC0000, 0x40000, 1); | ||
258 | scan_tree("/proc/bus/pci", "??.?", 0, 1024*1024, 0); | ||
247 | } | 259 | } |
diff --git a/Documentation/ia64/aliasing.txt b/Documentation/ia64/aliasing.txt index 9a431a7d0f5d..aa3e953f0f7b 100644 --- a/Documentation/ia64/aliasing.txt +++ b/Documentation/ia64/aliasing.txt | |||
@@ -112,6 +112,18 @@ POTENTIAL ATTRIBUTE ALIASING CASES | |||
112 | 112 | ||
113 | The /dev/mem mmap constraints apply. | 113 | The /dev/mem mmap constraints apply. |
114 | 114 | ||
115 | mmap of /proc/bus/pci/.../??.? | ||
116 | |||
117 | This is an MMIO mmap of PCI functions, which additionally may or | ||
118 | may not be requested as using the WC attribute. | ||
119 | |||
120 | If WC is requested, and the region in kern_memmap is either WC | ||
121 | or UC, and the EFI memory map designates the region as WC, then | ||
122 | the WC mapping is allowed. | ||
123 | |||
124 | Otherwise, the user mapping must use the same attribute as the | ||
125 | kernel mapping. | ||
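The rule above for /proc/bus/pci mmaps can be restated as a small decision function. This is only a sketch of the rule as worded in the text: the enum, the function name, and the boolean inputs are hypothetical and do not correspond to kernel symbols.

```c
#include <stdbool.h>

/* Hypothetical attribute names, used only for this illustration. */
enum mem_attr { ATTR_WB, ATTR_UC, ATTR_WC };

/*
 * Attribute a user mapping of a PCI function would receive under the
 * rule described above:
 *  - WC is granted only if it was requested, the kernel mapping
 *    (kern_memmap) is WC or UC, and the EFI memory map allows WC;
 *  - in every other case the user mapping follows the kernel mapping.
 */
enum mem_attr pci_mmap_attr(bool wc_requested, enum mem_attr kern_attr,
                            bool efi_allows_wc)
{
    if (wc_requested &&
        (kern_attr == ATTR_WC || kern_attr == ATTR_UC) &&
        efi_allows_wc)
        return ATTR_WC;

    return kern_attr;
}
```

For example, a WC request against a UC kernel mapping succeeds only when the EFI memory map also marks the region as WC; otherwise the mapping stays UC.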
126 | |||
115 | read/write of /dev/mem | 127 | read/write of /dev/mem |
116 | 128 | ||
117 | This uses copy_from_user(), which implicitly uses a kernel | 129 | This uses copy_from_user(), which implicitly uses a kernel |
diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt index aae2282600ca..4d880b3d1f35 100644 --- a/Documentation/kernel-parameters.txt +++ b/Documentation/kernel-parameters.txt | |||
@@ -170,7 +170,10 @@ and is between 256 and 4096 characters. It is defined in the file | |||
170 | acpi_os_name= [HW,ACPI] Tell ACPI BIOS the name of the OS | 170 | acpi_os_name= [HW,ACPI] Tell ACPI BIOS the name of the OS |
171 | Format: To spoof as Windows 98: ="Microsoft Windows" | 171 | Format: To spoof as Windows 98: ="Microsoft Windows" |
172 | 172 | ||
173 | acpi_osi= [HW,ACPI] empty param disables _OSI | 173 | acpi_osi= [HW,ACPI] Modify list of supported OS interface strings |
174 | acpi_osi="string1" # add string1 -- only one string | ||
175 | acpi_osi="!string2" # remove built-in string2 | ||
176 | acpi_osi= # disable all strings | ||
174 | 177 | ||
175 | acpi_serialize [HW,ACPI] force serialization of AML methods | 178 | acpi_serialize [HW,ACPI] force serialization of AML methods |
176 | 179 | ||
@@ -220,11 +223,6 @@ and is between 256 and 4096 characters. It is defined in the file | |||
220 | 223 | ||
221 | acpi_fake_ecdt [HW,ACPI] Workaround failure due to BIOS lacking ECDT | 224 | acpi_fake_ecdt [HW,ACPI] Workaround failure due to BIOS lacking ECDT |
222 | 225 | ||
223 | acpi_generic_hotkey [HW,ACPI] | ||
224 | Allow consolidated generic hotkey driver to | ||
225 | override platform specific driver. | ||
226 | See also Documentation/acpi-hotkey.txt. | ||
227 | |||
228 | acpi_pm_good [IA-32,X86-64] | 226 | acpi_pm_good [IA-32,X86-64] |
229 | Override the pmtimer bug detection: force the kernel | 227 | Override the pmtimer bug detection: force the kernel |
230 | to assume that this machine's pmtimer latches its value | 228 | to assume that this machine's pmtimer latches its value |
@@ -1016,49 +1014,6 @@ and is between 256 and 4096 characters. It is defined in the file | |||
1016 | 1014 | ||
1017 | mga= [HW,DRM] | 1015 | mga= [HW,DRM] |
1018 | 1016 | ||
1019 | migration_cost= | ||
1020 | [KNL,SMP] debug: override scheduler migration costs | ||
1021 | Format: <level-1-usecs>,<level-2-usecs>,... | ||
1022 | This debugging option can be used to override the | ||
1023 | default scheduler migration cost matrix. The numbers | ||
1024 | are indexed by 'CPU domain distance'. | ||
1025 | E.g. migration_cost=1000,2000,3000 on an SMT NUMA | ||
1026 | box will set up an intra-core migration cost of | ||
1027 | 1 msec, an inter-core migration cost of 2 msecs, | ||
1028 | and an inter-node migration cost of 3 msecs. | ||
1029 | |||
1030 | WARNING: using the wrong values here can break | ||
1031 | scheduler performance, so it's only for scheduler | ||
1032 | development purposes, not production environments. | ||
1033 | |||
1034 | migration_debug= | ||
1035 | [KNL,SMP] migration cost auto-detect verbosity | ||
1036 | Format=<0|1|2> | ||
1037 | If a system's migration matrix reported at bootup | ||
1038 | seems erroneous then this option can be used to | ||
1039 | increase verbosity of the detection process. | ||
1040 | We default to 0 (no extra messages), 1 will print | ||
1041 | some more information, and 2 will be really | ||
1042 | verbose (probably only useful if you also have a | ||
1043 | serial console attached to the system). | ||
1044 | |||
1045 | migration_factor= | ||
1046 | [KNL,SMP] multiply/divide migration costs by a factor | ||
1047 | Format=<percent> | ||
1048 | This debug option can be used to proportionally | ||
1049 | increase or decrease the auto-detected migration | ||
1050 | costs for all entries of the migration matrix. | ||
1051 | E.g. migration_factor=150 will increase migration | ||
1052 | costs by 50%. (and thus the scheduler will be less | ||
1053 | eager migrating cache-hot tasks) | ||
1054 | migration_factor=80 will decrease migration costs | ||
1055 | by 20%. (thus the scheduler will be more eager to | ||
1056 | migrate tasks) | ||
1057 | |||
1058 | WARNING: using the wrong values here can break | ||
1059 | scheduler performance, so it's only for scheduler | ||
1060 | development purposes, not production environments. | ||
1061 | |||
1062 | mousedev.tap_time= | 1017 | mousedev.tap_time= |
1063 | [MOUSE] Maximum time between finger touching and | 1018 | [MOUSE] Maximum time between finger touching and |
1064 | leaving touchpad surface for touch to be considered | 1019 | leaving touchpad surface for touch to be considered |
@@ -1132,9 +1087,9 @@ and is between 256 and 4096 characters. It is defined in the file | |||
1132 | when set. | 1087 | when set. |
1133 | Format: <int> | 1088 | Format: <int> |
1134 | 1089 | ||
1135 | noaliencache [MM, NUMA] Disables the allcoation of alien caches in | 1090 | noaliencache [MM, NUMA, SLAB] Disables the allocation of alien |
1136 | the slab allocator. Saves per-node memory, but will | 1091 | caches in the slab allocator. Saves per-node memory, |
1137 | impact performance on real NUMA hardware. | 1092 | but will impact performance. |
1138 | 1093 | ||
1139 | noalign [KNL,ARM] | 1094 | noalign [KNL,ARM] |
1140 | 1095 | ||
@@ -1613,6 +1568,37 @@ and is between 256 and 4096 characters. It is defined in the file | |||
1613 | 1568 | ||
1614 | slram= [HW,MTD] | 1569 | slram= [HW,MTD] |
1615 | 1570 | ||
1571 | slub_debug [MM, SLUB] | ||
1572 | Enabling slub_debug allows one to determine the culprit | ||
1573 | if slab objects become corrupted. Enabling slub_debug | ||
1574 | creates guard zones around objects and poisons objects | ||
1575 | when not in use. Also tracks the last alloc / free. | ||
1576 | For more information see Documentation/vm/slub.txt. | ||
1577 | |||
1578 | slub_max_order= [MM, SLUB] | ||
1579 | Determines the maximum allowed order for slabs. Setting | ||
1580 | this too high may cause fragmentation. | ||
1581 | For more information see Documentation/vm/slub.txt. | ||
1582 | |||
1583 | slub_min_objects= [MM, SLUB] | ||
1584 | The minimum objects per slab. SLUB will increase the | ||
1585 | slab order up to slub_max_order to generate a | ||
1586 | sufficiently big slab to satisfy the number of objects. | ||
1587 | The higher the number of objects the smaller the overhead | ||
1588 | of tracking slabs. | ||
1589 | For more information see Documentation/vm/slub.txt. | ||
1590 | |||
1591 | slub_min_order= [MM, SLUB] | ||
1592 | Determines the minimum page order for slabs. Must be | ||
1593 | lower than slub_max_order. | ||
1594 | For more information see Documentation/vm/slub.txt. | ||
1595 | |||
1596 | slub_nomerge [MM, SLUB] | ||
1597 | Disable merging of slabs of similar size. May be | ||
1598 | necessary if there is some reason to distinguish | ||
1599 | allocs to different slabs. | ||
1600 | For more information see Documentation/vm/slub.txt. | ||
1601 | |||
1616 | smart2= [HW] | 1602 | smart2= [HW] |
1617 | Format: <io1>[,<io2>[,...,<io8>]] | 1603 | Format: <io1>[,<io2>[,...,<io8>]] |
1618 | 1604 | ||
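The interaction between slub_min_objects, slub_min_order and slub_max_order described above can be illustrated with a simplified calculation. This is a sketch under stated assumptions, not the allocator's actual algorithm: the function name and parameters are hypothetical, and the sketch ignores per-slab metadata and wasted space, which the real allocator may also take into account.

```c
#include <stddef.h>

/*
 * Simplified illustration: pick the smallest page order in
 * [min_order, max_order] whose slab (page_size << order bytes) holds at
 * least min_objects objects of obj_size bytes. If no order below
 * max_order is large enough, max_order is returned, so the slab may
 * then hold fewer than min_objects objects.
 * Returns -1 for an invalid object size.
 */
int sketch_slab_order(size_t obj_size, unsigned int min_objects,
                      int min_order, int max_order, size_t page_size)
{
    int order;

    if (obj_size == 0)
        return -1;

    for (order = min_order; order < max_order; order++) {
        size_t slab_size = page_size << order;

        if (slab_size / obj_size >= min_objects)
            return order;
    }

    return max_order;
}
```

With 4096-byte pages and a minimum of 8 objects, 256-byte objects fit in an order-0 slab, while 1024-byte objects need order 1. Raising the minimum object count pushes the order up until the maximum order caps it.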
diff --git a/Documentation/networking/00-INDEX b/Documentation/networking/00-INDEX index e06b6e3c1db5..d63f480afb74 100644 --- a/Documentation/networking/00-INDEX +++ b/Documentation/networking/00-INDEX | |||
@@ -32,6 +32,8 @@ cops.txt | |||
32 | - info on the COPS LocalTalk Linux driver | 32 | - info on the COPS LocalTalk Linux driver |
33 | cs89x0.txt | 33 | cs89x0.txt |
34 | - the Crystal LAN (CS8900/20-based) Ethernet ISA adapter driver | 34 | - the Crystal LAN (CS8900/20-based) Ethernet ISA adapter driver |
35 | cxacru.txt | ||
36 | - Conexant AccessRunner USB ADSL Modem | ||
35 | de4x5.txt | 37 | de4x5.txt |
36 | - the Digital EtherWORKS DE4?? and DE5?? PCI Ethernet driver | 38 | - the Digital EtherWORKS DE4?? and DE5?? PCI Ethernet driver |
37 | decnet.txt | 39 | decnet.txt |
@@ -94,9 +96,6 @@ routing.txt | |||
94 | - the new routing mechanism | 96 | - the new routing mechanism |
95 | shaper.txt | 97 | shaper.txt |
96 | - info on the module that can shape/limit transmitted traffic. | 98 | - info on the module that can shape/limit transmitted traffic. |
97 | sk98lin.txt | ||
98 | - Marvell Yukon Chipset / SysKonnect SK-98xx compliant Gigabit | ||
99 | Ethernet Adapter family driver info | ||
100 | skfp.txt | 99 | skfp.txt |
101 | - SysKonnect FDDI (SK-5xxx, Compaq Netelligent) driver info. | 100 | - SysKonnect FDDI (SK-5xxx, Compaq Netelligent) driver info. |
102 | smc9.txt | 101 | smc9.txt |
diff --git a/Documentation/networking/cxacru.txt b/Documentation/networking/cxacru.txt new file mode 100644 index 000000000000..b074681a963e --- /dev/null +++ b/Documentation/networking/cxacru.txt | |||
@@ -0,0 +1,84 @@ | |||
1 | Firmware is required for this device: http://accessrunner.sourceforge.net/ | ||
2 | |||
3 | While it is capable of managing/maintaining the ADSL connection without the | ||
4 | module loaded, the device will sometimes stop responding after unloading the | ||
5 | driver and it is necessary to unplug/remove power to the device to fix this. | ||
6 | |||
7 | Detected devices will appear as ATM devices named "cxacru". In /sys/class/atm/ | ||
8 | these are directories named cxacruN where N is the device number. A symlink | ||
9 | named device points to the USB interface device's directory which contains | ||
10 | several sysfs attribute files for retrieving device statistics: | ||
11 | |||
12 | * adsl_controller_version | ||
13 | |||
14 | * adsl_headend | ||
15 | * adsl_headend_environment | ||
16 | Information about the remote headend. | ||
17 | |||
18 | * downstream_attenuation (dB) | ||
19 | * downstream_bits_per_frame | ||
20 | * downstream_rate (kbps) | ||
21 | * downstream_snr_margin (dB) | ||
22 | Downstream stats. | ||
23 | |||
24 | * upstream_attenuation (dB) | ||
25 | * upstream_bits_per_frame | ||
26 | * upstream_rate (kbps) | ||
27 | * upstream_snr_margin (dB) | ||
28 | * transmitter_power (dBm/Hz) | ||
29 | Upstream stats. | ||
30 | |||
31 | * downstream_crc_errors | ||
32 | * downstream_fec_errors | ||
33 | * downstream_hec_errors | ||
34 | * upstream_crc_errors | ||
35 | * upstream_fec_errors | ||
36 | * upstream_hec_errors | ||
37 | Error counts. | ||
38 | |||
39 | * line_startable | ||
40 | Indicates that ADSL support on the device | ||
41 | is/can be enabled, see adsl_state. | ||
42 | |||
43 | * line_status | ||
44 | "initialising" | ||
45 | "down" | ||
46 | "attempting to activate" | ||
47 | "training" | ||
48 | "channel analysis" | ||
49 | "exchange" | ||
50 | "waiting" | ||
51 | "up" | ||
52 | |||
53 | Changes between "down" and "attempting to activate" | ||
54 | if there is no signal. | ||
55 | |||
56 | * link_status | ||
57 | "not connected" | ||
58 | "connected" | ||
59 | "lost" | ||
60 | |||
61 | * mac_address | ||
62 | |||
63 | * modulation | ||
64 | "ANSI T1.413" | ||
65 | "ITU-T G.992.1 (G.DMT)" | ||
66 | "ITU-T G.992.2 (G.LITE)" | ||
67 | |||
68 | * startup_attempts | ||
69 | Count of total attempts to initialise ADSL. | ||
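The attribute files listed above are ordinary single-line text files, so they can be read with standard C I/O. The helper below is a minimal sketch. The function name is hypothetical, and any concrete path passed to it (for example one containing cxacru0) assumes that device number 0 exists on the system.

```c
#include <stdio.h>
#include <string.h>

/*
 * Read the first line of a sysfs-style attribute file into buf and strip
 * the trailing newline. Returns the length of the resulting string, or
 * -1 if the file cannot be opened or read.
 */
int read_sysfs_attr(const char *path, char *buf, size_t buflen)
{
    FILE *f;
    size_t len;

    if (path == NULL || buf == NULL || buflen == 0)
        return -1;

    f = fopen(path, "r");
    if (f == NULL)
        return -1;

    if (fgets(buf, (int)buflen, f) == NULL) {
        fclose(f);
        return -1;
    }
    fclose(f);

    len = strcspn(buf, "\n");   /* index of newline, or end of string */
    buf[len] = '\0';
    return (int)len;
}
```

A caller could poll the line state with something like read_sysfs_attr("/sys/class/atm/cxacru0/device/line_status", buf, sizeof(buf)) and compare the result against the strings listed for line_status.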
70 | |||
71 | To enable/disable ADSL, the following can be written to the adsl_state file: | ||
72 | "start" | ||
73 | "stop" | ||
74 | "restart" (stops, waits 1.5s, then starts) | ||
75 | "poll" (used to resume status polling if it was disabled due to failure) | ||
76 | |||
77 | Changes in adsl/line state are reported via kernel log messages: | ||
78 | [4942145.150704] ATM dev 0: ADSL state: running | ||
79 | [4942243.663766] ATM dev 0: ADSL line: down | ||
80 | [4942249.665075] ATM dev 0: ADSL line: attempting to activate | ||
81 | [4942253.654954] ATM dev 0: ADSL line: training | ||
82 | [4942255.666387] ATM dev 0: ADSL line: channel analysis | ||
83 | [4942259.656262] ATM dev 0: ADSL line: exchange | ||
84 | [2635357.696901] ATM dev 0: ADSL line: up (8128 kb/s down | 832 kb/s up) | ||
diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt index af6a63ab9026..32c2e9da5f3a 100644 --- a/Documentation/networking/ip-sysctl.txt +++ b/Documentation/networking/ip-sysctl.txt | |||
@@ -433,6 +433,12 @@ tcp_workaround_signed_windows - BOOLEAN | |||
433 | not receive a window scaling option from them. | 433 | not receive a window scaling option from them. |
434 | Default: 0 | 434 | Default: 0 |
435 | 435 | ||
436 | tcp_dma_copybreak - INTEGER | ||
437 | Lower limit, in bytes, of the size of socket reads that will be | ||
438 | offloaded to a DMA copy engine, if one is present in the system | ||
439 | and CONFIG_NET_DMA is enabled. | ||
440 | Default: 4096 | ||
441 | |||
436 | CIPSOv4 Variables: | 442 | CIPSOv4 Variables: |
437 | 443 | ||
438 | cipso_cache_enable - BOOLEAN | 444 | cipso_cache_enable - BOOLEAN |
@@ -874,8 +880,7 @@ accept_redirects - BOOLEAN | |||
874 | accept_source_route - INTEGER | 880 | accept_source_route - INTEGER |
875 | Accept source routing (routing extension header). | 881 | Accept source routing (routing extension header). |
876 | 882 | ||
877 | > 0: Accept routing header. | 883 | >= 0: Accept only routing header type 2. |
878 | = 0: Accept only routing header type 2. | ||
879 | < 0: Do not accept routing header. | 884 | < 0: Do not accept routing header. |
880 | 885 | ||
881 | Default: 0 | 886 | Default: 0 |
diff --git a/Documentation/networking/l2tp.txt b/Documentation/networking/l2tp.txt new file mode 100644 index 000000000000..2451f551c505 --- /dev/null +++ b/Documentation/networking/l2tp.txt | |||
@@ -0,0 +1,169 @@ | |||
1 | This brief document describes how to use the kernel's PPPoL2TP driver | ||
2 | to provide L2TP functionality. L2TP is a protocol that tunnels one or | ||
3 | more PPP sessions over a UDP tunnel. It is commonly used for VPNs | ||
4 | (L2TP/IPSec) and by ISPs to tunnel subscriber PPP sessions over an IP | ||
5 | network infrastructure. | ||
6 | |||
7 | Design | ||
8 | ====== | ||
9 | |||
10 | The PPPoL2TP driver, drivers/net/pppol2tp.c, provides a mechanism by | ||
11 | which PPP frames carried through an L2TP session are passed through | ||
12 | the kernel's PPP subsystem. The standard PPP daemon, pppd, handles all | ||
13 | PPP interaction with the peer. PPP network interfaces are created for | ||
14 | each local PPP endpoint. | ||
15 | |||
16 | The L2TP protocol http://www.faqs.org/rfcs/rfc2661.html defines L2TP | ||
17 | control and data frames. L2TP control frames carry messages between | ||
18 | L2TP clients/servers and are used to set up / tear down tunnels and | ||
19 | sessions. An L2TP client or server is implemented in userspace and | ||
20 | will use a regular UDP socket per tunnel. L2TP data frames carry PPP | ||
21 | frames, which may be PPP control or PPP data. The kernel's PPP | ||
22 | subsystem arranges for PPP control frames to be delivered to pppd, | ||
23 | while data frames are forwarded as usual. | ||
24 | |||
25 | Each tunnel and session within a tunnel is assigned a unique tunnel_id | ||
26 | and session_id. These ids are carried in the L2TP header of every | ||
27 | control and data packet. The pppol2tp driver uses them to look up | ||
28 | internal tunnel and/or session contexts. Zero tunnel / session ids are | ||
29 | treated specially - zero ids are never assigned to tunnels or sessions | ||
30 | in the network. In the driver, the tunnel context keeps a pointer to | ||
31 | the tunnel UDP socket. The session context keeps a pointer to the | ||
32 | PPPoL2TP socket, as well as other data that lets the driver interface | ||
33 | to the kernel PPP subsystem. | ||
34 | |||
35 | Note that the pppol2tp kernel driver handles only L2TP data frames; | ||
36 | L2TP control frames are simply passed up to userspace in the UDP | ||
37 | tunnel socket. The kernel handles all datapath aspects of the | ||
38 | protocol, including data packet resequencing (if enabled). | ||
39 | |||
40 | There are a number of requirements on the userspace L2TP daemon in | ||
41 | order to use the pppol2tp driver. | ||
42 | |||
43 | 1. Use a UDP socket per tunnel. | ||
44 | |||
45 | 2. Create a single PPPoL2TP socket per tunnel bound to a special null | ||
46 | session id. This is used only for communicating with the driver but | ||
47 | must remain open while the tunnel is active. Opening this tunnel | ||
48 | management socket causes the driver to mark the tunnel socket as an | ||
49 | L2TP UDP encapsulation socket and flags it for use by the | ||
50 | referenced tunnel id. This hooks up the UDP receive path via | ||
51 | udp_encap_rcv() in net/ipv4/udp.c. PPP data frames are never passed | ||
52 | in this special PPPoX socket. | ||
53 | |||
54 | 3. Create a PPPoL2TP socket per L2TP session. This is typically done | ||
55 | by starting pppd with the pppol2tp plugin and appropriate | ||
56 | arguments. A PPPoL2TP tunnel management socket (Step 2) must be | ||
57 | created before the first PPPoL2TP session socket is created. | ||
58 | |||
59 | When creating PPPoL2TP sockets, the application provides information | ||
60 | to the driver about the socket in a socket connect() call. Source and | ||
61 | destination tunnel and session ids are provided, as well as the file | ||
62 | descriptor of a UDP socket. See struct pppol2tp_addr in | ||
63 | include/linux/if_ppp.h. Note that zero tunnel / session ids are | ||
64 | treated specially. When creating the per-tunnel PPPoL2TP management | ||
65 | socket in Step 2 above, zero source and destination session ids are | ||
66 | specified, which tells the driver to prepare the supplied UDP file | ||
67 | descriptor for use as an L2TP tunnel socket. | ||
68 | |||
69 | Userspace may control behavior of the tunnel or session using | ||
70 | setsockopt and ioctl on the PPPoX socket. The following socket | ||
71 | options are supported:- | ||
72 | |||
73 | DEBUG - bitmask of debug message categories. See below. | ||
74 | SENDSEQ - 0 => don't send packets with sequence numbers | ||
75 | 1 => send packets with sequence numbers | ||
76 | RECVSEQ - 0 => receive packet sequence numbers are optional | ||
77 | 1 => drop receive packets without sequence numbers | ||
78 | LNSMODE - 0 => act as LAC. | ||
79 | 1 => act as LNS. | ||
80 | REORDERTO - reorder timeout (in millisecs). If 0, don't try to reorder. | ||
81 | |||
82 | Only the DEBUG option is supported by the special tunnel management | ||
83 | PPPoX socket. | ||
84 | |||
85 | In addition to the standard PPP ioctls, a PPPIOCGL2TPSTATS is provided | ||
86 | to retrieve tunnel and session statistics from the kernel using the | ||
87 | PPPoX socket of the appropriate tunnel or session. | ||
88 | |||
89 | Debugging | ||
90 | ========= | ||
91 | |||
92 | The driver supports a flexible debug scheme where kernel trace | ||
93 | messages may be optionally enabled per tunnel and per session. Care is | ||
94 | needed when debugging a live system since the messages are not | ||
95 | rate-limited and a busy system could be swamped. Userspace uses | ||
96 | setsockopt on the PPPoX socket to set a debug mask. | ||
97 | |||
98 | The following debug mask bits are available: | ||
99 | |||
100 | PPPOL2TP_MSG_DEBUG verbose debug (if compiled in) | ||
101 | PPPOL2TP_MSG_CONTROL userspace - kernel interface | ||
102 | PPPOL2TP_MSG_SEQ sequence numbers handling | ||
103 | PPPOL2TP_MSG_DATA data packets | ||
104 | |||
105 | Sample Userspace Code | ||
106 | ===================== | ||
107 | |||
108 | 1. Create tunnel management PPPoX socket | ||
109 | |||
110 | kernel_fd = socket(AF_PPPOX, SOCK_DGRAM, PX_PROTO_OL2TP); | ||
111 | if (kernel_fd >= 0) { | ||
112 | struct sockaddr_pppol2tp sax; | ||
113 | struct sockaddr_in const *peer_addr; | ||
114 | |||
115 | peer_addr = l2tp_tunnel_get_peer_addr(tunnel); | ||
116 | memset(&sax, 0, sizeof(sax)); | ||
117 | sax.sa_family = AF_PPPOX; | ||
118 | sax.sa_protocol = PX_PROTO_OL2TP; | ||
119 | sax.pppol2tp.fd = udp_fd; /* fd of tunnel UDP socket */ | ||
120 | sax.pppol2tp.addr.sin_addr.s_addr = peer_addr->sin_addr.s_addr; | ||
121 | sax.pppol2tp.addr.sin_port = peer_addr->sin_port; | ||
122 | sax.pppol2tp.addr.sin_family = AF_INET; | ||
123 | sax.pppol2tp.s_tunnel = tunnel_id; | ||
124 | sax.pppol2tp.s_session = 0; /* special case: mgmt socket */ | ||
125 | sax.pppol2tp.d_tunnel = 0; | ||
126 | sax.pppol2tp.d_session = 0; /* special case: mgmt socket */ | ||
127 | |||
128 | if(connect(kernel_fd, (struct sockaddr *)&sax, sizeof(sax) ) < 0 ) { | ||
129 | perror("connect failed"); | ||
130 | result = -errno; | ||
131 | goto err; | ||
132 | } | ||
133 | } | ||
134 | |||
135 | 2. Create session PPPoX data socket | ||
136 | |||
137 | struct sockaddr_pppol2tp sax; | ||
138 | int rc; | ||
139 | memset(&sax, 0, sizeof(sax)); /* zero unused fields, as in step 1 */ | ||
140 | /* Note, the target socket must be bound already, else it will not be ready */ | ||
141 | sax.sa_family = AF_PPPOX; | ||
142 | sax.sa_protocol = PX_PROTO_OL2TP; | ||
143 | sax.pppol2tp.fd = tunnel_fd; | ||
144 | sax.pppol2tp.addr.sin_addr.s_addr = addr->sin_addr.s_addr; | ||
145 | sax.pppol2tp.addr.sin_port = addr->sin_port; | ||
146 | sax.pppol2tp.addr.sin_family = AF_INET; | ||
147 | sax.pppol2tp.s_tunnel = tunnel_id; | ||
148 | sax.pppol2tp.s_session = session_id; | ||
149 | sax.pppol2tp.d_tunnel = peer_tunnel_id; | ||
150 | sax.pppol2tp.d_session = peer_session_id; | ||
151 | |||
152 | /* session_fd is the fd of the session's PPPoL2TP socket. | ||
153 | * tunnel_fd is the fd of the tunnel UDP socket. | ||
154 | */ | ||
155 | rc = connect(session_fd, (struct sockaddr *)&sax, sizeof(sax)); | ||
156 | if (rc < 0) { | ||
157 | return -errno; | ||
158 | } | ||
159 | return 0; | ||
160 | |||
161 | Miscellaneous | ||
162 | ============= | ||
163 | |||
164 | The PPPoL2TP driver was developed as part of the OpenL2TP project by | ||
165 | Katalix Systems Ltd. OpenL2TP is a full-featured L2TP client / server, | ||
166 | designed from the ground up to have the L2TP datapath in the | ||
167 | kernel. The project also implemented the pppol2tp plugin for pppd | ||
168 | which allows pppd to use the kernel driver. Details can be found at | ||
169 | http://openl2tp.sourceforge.net. | ||
diff --git a/Documentation/networking/mac80211-injection.txt b/Documentation/networking/mac80211-injection.txt new file mode 100644 index 000000000000..53ef7a06f49c --- /dev/null +++ b/Documentation/networking/mac80211-injection.txt | |||
@@ -0,0 +1,59 @@ | |||
1 | How to use packet injection with mac80211 | ||
2 | ========================================= | ||
3 | |||
4 | mac80211 now allows arbitrary packets to be injected down any Monitor Mode | ||
5 | interface from userland. The packet you inject needs to be composed in the | ||
6 | following format: | ||
7 | |||
8 | [ radiotap header ] | ||
9 | [ ieee80211 header ] | ||
10 | [ payload ] | ||
11 | |||
12 | The radiotap format is discussed in | ||
13 | ./Documentation/networking/radiotap-headers.txt. | ||
14 | |||
15 | Although 13 radiotap argument types are currently defined, most only make sense | ||
16 | to appear on received packets. Currently three kinds of argument are used by | ||
17 | the injection code, although it knows to skip any other arguments that are | ||
18 | present (facilitating replay of captured radiotap headers directly): | ||
19 | |||
20 | - IEEE80211_RADIOTAP_RATE - u8 arg in 500kbps units (0x02 --> 1Mbps) | ||
21 | |||
22 | - IEEE80211_RADIOTAP_ANTENNA - u8 arg, 0x00 = ant1, 0x01 = ant2 | ||
23 | |||
24 | - IEEE80211_RADIOTAP_DBM_TX_POWER - u8 arg, dBm | ||
25 | |||
26 | Here is an example valid radiotap header defining these three parameters: | ||
27 | |||
28 | 0x00, 0x00, // <-- radiotap version | ||
29 | 0x0b, 0x00, // <- radiotap header length | ||
30 | 0x04, 0x0c, 0x00, 0x00, // <-- bitmap | ||
31 | 0x6c, // <-- rate | ||
32 | 0x0c, //<-- tx power | ||
33 | 0x01 //<-- antenna | ||
34 | |||
35 | The ieee80211 header follows immediately afterwards, looking for example like | ||
36 | this: | ||
37 | |||
38 | 0x08, 0x01, 0x00, 0x00, | ||
39 | 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, | ||
40 | 0x13, 0x22, 0x33, 0x44, 0x55, 0x66, | ||
41 | 0x13, 0x22, 0x33, 0x44, 0x55, 0x66, | ||
42 | 0x10, 0x86 | ||
43 | |||
44 | Then lastly there is the payload. | ||
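
The three-part layout described above can be sketched in userland C as a plain
concatenation. This is only an illustration under stated assumptions: the
helper name build_inject_frame() and the two array names are hypothetical and
not part of mac80211; the byte values are copied from the examples above.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Radiotap header from the example above: rate, tx power, antenna. */
static const uint8_t radiotap_hdr[] = {
	0x00, 0x00,             /* radiotap version + pad byte */
	0x0b, 0x00,             /* header length, little-endian (11 bytes) */
	0x04, 0x0c, 0x00, 0x00, /* it_present bitmap: bits 2, 10 and 11 */
	0x6c,                   /* rate in 500kbps units (0x6c = 54Mbps) */
	0x0c,                   /* tx power, dBm */
	0x01                    /* antenna */
};

/* ieee80211 header from the example above (24 bytes). */
static const uint8_t ieee80211_hdr[] = {
	0x08, 0x01, 0x00, 0x00,
	0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF,
	0x13, 0x22, 0x33, 0x44, 0x55, 0x66,
	0x13, 0x22, 0x33, 0x44, 0x55, 0x66,
	0x10, 0x86
};

/*
 * Hypothetical helper: write radiotap header, ieee80211 header and payload
 * back to back into buf. Returns the total frame length, or 0 if buf is
 * too small to hold everything.
 */
static size_t build_inject_frame(uint8_t *buf, size_t buflen,
				 const uint8_t *payload, size_t payload_len)
{
	size_t hdrs = sizeof(radiotap_hdr) + sizeof(ieee80211_hdr);

	if (payload_len > buflen || hdrs > buflen - payload_len)
		return 0;

	memcpy(buf, radiotap_hdr, sizeof(radiotap_hdr));
	memcpy(buf + sizeof(radiotap_hdr), ieee80211_hdr,
	       sizeof(ieee80211_hdr));
	if (payload_len)
		memcpy(buf + hdrs, payload, payload_len);

	return hdrs + payload_len;
}
```

The returned length is what would be handed to send() or pcap_inject()
together with the buffer.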
45 | |||
46 | Once the packet contents are composed, the packet is sent by send()-ing it to | ||
47 | a logical mac80211 interface that is in Monitor mode. Libpcap can also be used | ||
48 | (which is easier than doing the work to bind the socket to the right | ||
49 | interface), along the following lines: | ||
50 | |||
51 | ppcap = pcap_open_live(szInterfaceName, 800, 1, 20, szErrbuf); | ||
52 | ... | ||
53 | r = pcap_inject(ppcap, u8aSendBuffer, nLength); | ||
54 | |||
55 | You can also find sources for a complete inject test applet here: | ||
56 | |||
57 | http://penumbra.warmcat.com/_twk/tiki-index.php?page=packetspammer | ||
58 | |||
59 | Andy Green <andy@warmcat.com> | ||
diff --git a/Documentation/networking/multiqueue.txt b/Documentation/networking/multiqueue.txt new file mode 100644 index 000000000000..00b60cce2224 --- /dev/null +++ b/Documentation/networking/multiqueue.txt | |||
@@ -0,0 +1,111 @@ | |||
1 | |||
2 | HOWTO for multiqueue network device support | ||
3 | =========================================== | ||
4 | |||
5 | Section 1: Base driver requirements for implementing multiqueue support | ||
6 | Section 2: Qdisc support for multiqueue devices | ||
7 | Section 3: Brief howto using PRIO and RR for multiqueue devices | ||
8 | |||
9 | |||
10 | Intro: Kernel support for multiqueue devices | ||
11 | --------------------------------------------------------- | ||
12 | |||
13 | Kernel support for multiqueue devices is only an API that is presented to the | ||
14 | netdevice layer for base drivers to implement. This feature is part of the | ||
15 | core networking stack, and all network devices will be running on the | ||
16 | multiqueue-aware stack. If a base driver only has one queue, then these | ||
17 | changes are transparent to that driver. | ||
18 | |||
19 | |||
20 | Section 1: Base driver requirements for implementing multiqueue support | ||
21 | ----------------------------------------------------------------------- | ||
22 | |||
23 | Base drivers are required to use the new alloc_etherdev_mq() or | ||
24 | alloc_netdev_mq() functions to allocate the subqueues for the device. The | ||
25 | underlying kernel API will take care of the allocation and deallocation of | ||
26 | the subqueue memory, as well as netdev configuration of where the queues | ||
27 | exist in memory. | ||
28 | |||
29 | The base driver will also need to manage the queues as it does the global | ||
30 | netdev->queue_lock today. Therefore base drivers should use the | ||
31 | netif_{start|stop|wake}_subqueue() functions to manage each queue while the | ||
32 | device is still operational. netdev->queue_lock is still used when the device | ||
33 | comes online or when it's completely shut down (unregister_netdev(), etc.). | ||
34 | |||
35 | Finally, the base driver should indicate that it is a multiqueue device. The | ||
36 | feature flag NETIF_F_MULTI_QUEUE should be added to the netdev->features | ||
37 | bitmap on device initialization. Below is an example from e1000: | ||
38 | |||
39 | #ifdef CONFIG_E1000_MQ | ||
40 | if ( (adapter->hw.mac.type == e1000_82571) || | ||
41 | (adapter->hw.mac.type == e1000_82572) || | ||
42 | (adapter->hw.mac.type == e1000_80003es2lan)) | ||
43 | netdev->features |= NETIF_F_MULTI_QUEUE; | ||
44 | #endif | ||
45 | |||
46 | |||
47 | Section 2: Qdisc support for multiqueue devices | ||
48 | ----------------------------------------------- | ||
49 | |||
50 | Currently two qdiscs support multiqueue devices: a new round-robin qdisc, | ||
51 | sch_rr, and sch_prio. The qdisc is responsible for classifying the skb's to | ||
52 | bands and queues, and will store the queue mapping into skb->queue_mapping. | ||
53 | Use this field in the base driver to determine which queue to send the skb | ||
54 | to. | ||
55 | |||
56 | sch_rr has been added for hardware that doesn't want scheduling policies from | ||
57 | software, so it's a straight round-robin qdisc. It uses the same syntax and | ||
58 | classification priomap that sch_prio uses, so it should be intuitive to | ||
59 | configure for people who've used sch_prio. | ||
60 | |||
61 | The PRIO qdisc naturally plugs into a multiqueue device. If PRIO has been | ||
62 | built with NET_SCH_PRIO_MQ, then upon load, it will make sure the number of | ||
63 | bands requested is equal to the number of queues on the hardware. If they | ||
64 | are equal, it sets a one-to-one mapping up between the queues and bands. If | ||
65 | they're not equal, it will not load the qdisc. RR behaves the same | ||
66 | way. Once the association is made, any skb that is classified will have | ||
67 | skb->queue_mapping set, which will allow the driver to properly queue skb's | ||
68 | to multiple queues. | ||
69 | |||
70 | |||
71 | Section 3: Brief howto using PRIO and RR for multiqueue devices | ||
72 | --------------------------------------------------------------- | ||
73 | |||
74 | The userspace command 'tc,' part of the iproute2 package, is used to configure | ||
75 | qdiscs. To add the PRIO qdisc to your network device, assuming the device is | ||
76 | called eth0, run the following command: | ||
77 | |||
78 | # tc qdisc add dev eth0 root handle 1: prio bands 4 multiqueue | ||
79 | |||
80 | This will create 4 bands, 0 being highest priority, and associate those bands | ||
81 | to the queues on your NIC. Assuming eth0 has 4 Tx queues, the band mapping | ||
82 | would look like: | ||
83 | |||
84 | band 0 => queue 0 | ||
85 | band 1 => queue 1 | ||
86 | band 2 => queue 2 | ||
87 | band 3 => queue 3 | ||
88 | |||
89 | Traffic will begin flowing through each queue if your TOS values are assigning | ||
90 | traffic across the various bands. For example, ssh traffic will always try to | ||
91 | go out band 0 based on TOS -> Linux priority conversion (realtime traffic), | ||
92 | so it will be sent out queue 0. ICMP traffic (pings) falls into the "normal" | ||
93 | traffic classification, which is band 1. Therefore pings will be sent out | ||
94 | queue 1 on the NIC. | ||
95 | |||
96 | Note the use of the multiqueue keyword. This is only in versions of iproute2 | ||
97 | that support multiqueue networking devices; if this is omitted when loading | ||
98 | a qdisc onto a multiqueue device, the qdisc will load and operate the same | ||
99 | as if it were loaded onto a single-queue device (i.e. it sends all traffic | ||
100 | to queue 0). | ||
101 | |||
102 | An alternative way to allocate bands is to use the multiqueue option and | ||
103 | specify 0 bands. In that case, the qdisc will allocate a number of bands | ||
104 | equal to the number of queues that the device reports, and bring the | ||
105 | qdisc online. | ||
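
The band allocation rules described in Sections 2 and 3 can be modelled in a
few lines of userland C. This is a sketch of the documented behaviour only,
not the sch_prio or sch_rr source; the function name map_bands_to_queues()
and its return convention are assumptions made for illustration.

```c
#include <stddef.h>

/*
 * Hypothetical model of the multiqueue band check:
 *  - 0 bands requested: use one band per hardware queue
 *  - bands == queues:   one-to-one mapping, band i -> queue i
 *  - anything else:     refuse to load (-1)
 * Returns the number of bands configured, or -1 on refusal.
 */
static int map_bands_to_queues(int bands, int num_queues, int *band_to_queue)
{
	int i;

	if (num_queues <= 0)
		return -1;

	if (bands == 0)
		bands = num_queues;

	if (bands != num_queues)
		return -1;

	for (i = 0; i < bands; i++)
		band_to_queue[i] = i;

	return bands;
}
```

With a 4-queue device, requesting 4 bands or 0 bands both yield the mapping
shown in the table above, while requesting 3 bands is rejected.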
106 | |||
107 | The behavior of tc filters remains the same: they will override TOS priority | ||
108 | classification. | ||
109 | |||
110 | |||
111 | Author: Peter P. Waskiewicz Jr. <peter.p.waskiewicz.jr@intel.com> | ||
diff --git a/Documentation/networking/netdevices.txt b/Documentation/networking/netdevices.txt index ce1361f95243..37869295fc70 100644 --- a/Documentation/networking/netdevices.txt +++ b/Documentation/networking/netdevices.txt | |||
@@ -20,6 +20,30 @@ private data which gets freed when the network device is freed. If | |||
20 | separately allocated data is attached to the network device | 20 | separately allocated data is attached to the network device |
21 | (dev->priv) then it is up to the module exit handler to free that. | 21 | (dev->priv) then it is up to the module exit handler to free that. |
22 | 22 | ||
23 | MTU | ||
24 | === | ||
25 | Each network device has a Maximum Transfer Unit. The MTU does not | ||
26 | include any link layer protocol overhead. Upper layer protocols must | ||
27 | not pass a socket buffer (skb) to a device to transmit with more data | ||
28 | than the MTU. So, for example, on Ethernet, if the standard MTU of | ||
29 | 1500 bytes is used, the actual skb will contain up to 1514 bytes | ||
30 | because of the Ethernet header. Devices should allow for the 4 byte | ||
31 | VLAN header as well. | ||
32 | |||
33 | Segmentation Offload (GSO, TSO) is an exception to this rule. The | ||
34 | upper layer protocol may pass a large socket buffer to the device | ||
35 | transmit routine, and the device will break that up into separate | ||
36 | packets based on the current MTU. | ||
37 | |||
38 | MTU is symmetrical and applies both to receive and transmit. A device | ||
39 | must be able to receive at least the maximum size packet allowed by | ||
40 | the MTU. A network device may use the MTU as a mechanism to size receive | ||
41 | buffers, but the device should allow packets with VLAN header. With | ||
42 | standard Ethernet MTU of 1500 bytes, the device should allow up to | ||
43 | 1518 byte packets (1500 + 14 header + 4 tag). The device may | ||
44 | drop, truncate, or pass up oversize packets, but dropping oversize | ||
45 | packets is preferred. | ||
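
The size arithmetic in this section can be written out as a small sketch. The
macro and function names below are illustrative only and are not taken from
the kernel headers; the constants are the 14 byte Ethernet header and 4 byte
VLAN tag mentioned in the text.

```c
#include <stddef.h>

#define ETH_HEADER_LEN 14 /* destination MAC, source MAC, ethertype */
#define VLAN_TAG_LEN    4 /* VLAN header */

/* Largest untagged frame handed to the device for transmit. */
static size_t max_tx_frame_len(size_t mtu)
{
	return mtu + ETH_HEADER_LEN;
}

/* Largest frame the device should accept on receive (VLAN tagged). */
static size_t max_rx_frame_len(size_t mtu)
{
	return mtu + ETH_HEADER_LEN + VLAN_TAG_LEN;
}
```

For the standard Ethernet MTU of 1500 this gives 1514 bytes on transmit and
1518 bytes as the receive size the device should allow.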
46 | |||
23 | 47 | ||
24 | struct net_device synchronization rules | 48 | struct net_device synchronization rules |
25 | ======================================= | 49 | ======================================= |
@@ -43,16 +67,17 @@ dev->get_stats: | |||
43 | 67 | ||
44 | dev->hard_start_xmit: | 68 | dev->hard_start_xmit: |
45 | Synchronization: netif_tx_lock spinlock. | 69 | Synchronization: netif_tx_lock spinlock. |
70 | |||
46 | When the driver sets NETIF_F_LLTX in dev->features this will be | 71 | When the driver sets NETIF_F_LLTX in dev->features this will be |
47 | called without holding netif_tx_lock. In this case the driver | 72 | called without holding netif_tx_lock. In this case the driver |
48 | has to lock by itself when needed. It is recommended to use a try lock | 73 | has to lock by itself when needed. It is recommended to use a try lock |
49 | for this and return -1 when the spin lock fails. | 74 | for this and return NETDEV_TX_LOCKED when the spin lock fails. |
50 | The locking there should also properly protect against | 75 | The locking there should also properly protect against |
51 | set_multicast_list | 76 | set_multicast_list. |
52 | Context: Process with BHs disabled or BH (timer). | 77 | |
53 | Notes: netif_queue_stopped() is guaranteed false | 78 | Context: Process with BHs disabled or BH (timer), |
54 | Interrupts must be enabled when calling hard_start_xmit. | 79 | will be called with interrupts disabled by netconsole. |
55 | (Interrupts must also be enabled when enabling the BH handler.) | 80 | |
56 | Return codes: | 81 | Return codes: |
57 | o NETDEV_TX_OK everything ok. | 82 | o NETDEV_TX_OK everything ok. |
58 | o NETDEV_TX_BUSY Cannot transmit packet, try later | 83 | o NETDEV_TX_BUSY Cannot transmit packet, try later |
@@ -74,4 +99,5 @@ dev->poll: | |||
74 | Synchronization: __LINK_STATE_RX_SCHED bit in dev->state. See | 99 | Synchronization: __LINK_STATE_RX_SCHED bit in dev->state. See |
75 | dev_close code and comments in net/core/dev.c for more info. | 100 | dev_close code and comments in net/core/dev.c for more info. |
76 | Context: softirq | 101 | Context: softirq |
102 | will be called with interrupts disabled by netconsole. | ||
77 | 103 | ||
diff --git a/Documentation/networking/radiotap-headers.txt b/Documentation/networking/radiotap-headers.txt new file mode 100644 index 000000000000..953331c7984f --- /dev/null +++ b/Documentation/networking/radiotap-headers.txt | |||
@@ -0,0 +1,152 @@ | |||
1 | How to use radiotap headers | ||
2 | =========================== | ||
3 | |||
4 | Pointer to the radiotap include file | ||
5 | ------------------------------------ | ||
6 | |||
7 | Radiotap headers are variable-length and extensible. You can get most of the | ||
8 | information you need to know about them from: | ||
9 | |||
10 | ./include/net/ieee80211_radiotap.h | ||
11 | |||
12 | This document gives an overview and warns about some corner cases. | ||
13 | |||
14 | |||
15 | Structure of the header | ||
16 | ----------------------- | ||
17 | |||
18 | There is a fixed portion at the start which contains a u32 bitmap that defines | ||
19 | if the possible argument associated with that bit is present or not. So if b0 | ||
20 | of the it_present member of ieee80211_radiotap_header is set, it means that | ||
21 | the header for argument index 0 (IEEE80211_RADIOTAP_TSFT) is present in the | ||
22 | argument area. | ||
23 | |||
24 | < 8-byte ieee80211_radiotap_header > | ||
25 | [ <possible argument bitmap extensions ... > ] | ||
26 | [ <argument> ... ] | ||
27 | |||
28 | At the moment there are only 13 possible argument indexes defined, but in case | ||
29 | we run out of space in the u32 it_present member, it is defined that b31 set | ||
30 | indicates that there is another u32 bitmap following (shown as "possible | ||
31 | argument bitmap extensions..." above), and the start of the arguments is moved | ||
32 | forward 4 bytes each time. | ||
33 | |||
34 | Note also that the it_len member __le16 is set to the total number of bytes | ||
35 | covered by the ieee80211_radiotap_header and any arguments following. | ||
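
The layout rules above (8-byte fixed part, b31 chaining further u32 bitmaps,
it_len covering everything) can be sketched as a userland walk over the
header. This is an illustrative sketch, not the kernel parser; the names
radiotap_args_offset() and radiotap_arg_present() are hypothetical, and all
multibyte fields are read bytewise so alignment and host endianness do not
matter.

```c
#include <stddef.h>
#include <stdint.h>

#define RADIOTAP_FIXED_LEN 8  /* version, pad, it_len, first it_present */
#define RADIOTAP_EXT_BIT   31 /* b31 set: another u32 bitmap follows */

/* Read a little-endian u16 one byte at a time. */
static uint16_t read_le16(const uint8_t *p)
{
	return (uint16_t)(p[0] | (p[1] << 8));
}

/* Read a little-endian u32 one byte at a time. */
static uint32_t read_le32(const uint8_t *p)
{
	return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
	       ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Nonzero if the argument with this index is flagged in the bitmap. */
static int radiotap_arg_present(uint32_t it_present, unsigned int index)
{
	return index < RADIOTAP_EXT_BIT && ((it_present >> index) & 1u);
}

/*
 * Return the offset of the first argument from the start of the radiotap
 * header, or 0 if the header is malformed or does not fit in buflen.
 */
static size_t radiotap_args_offset(const uint8_t *hdr, size_t buflen)
{
	size_t it_len, off = 4; /* offset of the first it_present word */

	if (buflen < RADIOTAP_FIXED_LEN)
		return 0;

	it_len = read_le16(hdr + 2);
	if (it_len < RADIOTAP_FIXED_LEN || it_len > buflen)
		return 0;

	for (;;) {
		uint32_t bitmap;

		if (off + 4 > it_len)
			return 0; /* chained bitmap runs past it_len */
		bitmap = read_le32(hdr + off);
		off += 4;
		if (!(bitmap & (1u << RADIOTAP_EXT_BIT)))
			break;
	}
	return off;
}
```

Each extension bitmap moves the returned offset forward by 4 bytes, which is
the behaviour the paragraph above describes.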
36 | |||
37 | |||
38 | Requirements for arguments | ||
39 | -------------------------- | ||
40 | |||
41 | After the fixed part of the header, the arguments follow for each argument | ||
42 | index whose matching bit is set in the it_present member of | ||
43 | ieee80211_radiotap_header. | ||
44 | |||
45 | - the arguments are all stored little-endian! | ||
46 | |||
47 | - the argument payload for a given argument index has a fixed size. So | ||
48 | IEEE80211_RADIOTAP_TSFT being present always indicates an 8-byte argument is | ||
49 | present. See the comments in ./include/net/ieee80211_radiotap.h for a nice | ||
50 | breakdown of all the argument sizes | ||
51 | |||
52 | - the arguments must be aligned to a boundary of the argument size using | ||
53 | padding. So a u16 argument must start on the next u16 boundary if it isn't | ||
54 | already on one, a u32 must start on the next u32 boundary and so on. | ||
55 | |||
56 | - "alignment" is relative to the start of the ieee80211_radiotap_header, ie, | ||
57 | the first byte of the radiotap header. The absolute alignment of that first | ||
58 | byte isn't defined. So even if the whole radiotap header is starting at, eg, | ||
59 | address 0x00000003, still the first byte of the radiotap header is treated as | ||
60 | 0 for alignment purposes. | ||
61 | |||
62 | - the above point that there may be no absolute alignment for multibyte | ||
63 | entities in the fixed radiotap header or the argument region means that you | ||
64 | have to take special evasive action when trying to access these multibyte | ||
65 | entities. Some arches like Blackfin cannot deal with an attempt to | ||
66 | dereference, eg, a u16 pointer that is pointing to an odd address. Instead | ||
67 | you have to use a kernel API get_unaligned() to dereference the pointer, | ||
68 | which will do it bytewise on the arches that require that. | ||
69 | |||
70 | - The arguments for a given argument index can be a compound of multiple types | ||
71 | together. For example IEEE80211_RADIOTAP_CHANNEL has an argument payload | ||
72 | consisting of two u16s of total length 4. When this happens, the padding | ||
73 | rule is applied dealing with a u16, NOT dealing with a 4-byte single entity. | ||
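
The padding and unaligned-access rules in the list above can be sketched as
follows. This is a userland illustration under stated assumptions: the helper
names are hypothetical, offsets are counted from the first byte of the
radiotap header, and the bytewise read stands in for what get_unaligned()
provides inside the kernel.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Round an offset (relative to the start of the radiotap header) up to
 * the next multiple of align. align must be a power of two.
 */
static size_t radiotap_align(size_t offset, size_t align)
{
	return (offset + align - 1) & ~(align - 1);
}

/*
 * Fetch a little-endian u16 from a possibly odd address by reading it
 * one byte at a time, so no unaligned u16 dereference ever happens.
 */
static uint16_t get_le16(const uint8_t *p)
{
	return (uint16_t)(p[0] | (p[1] << 8));
}
```

For example, if a u8 rate argument occupies offset 8, a following
IEEE80211_RADIOTAP_CHANNEL argument is padded as a u16 and starts at
radiotap_align(9, 2), which is offset 10, rather than at the 4-byte boundary
at offset 12.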
74 | |||
75 | |||
76 | Example valid radiotap header | ||
77 | ----------------------------- | ||
78 | |||
79 | 0x00, 0x00, // <-- radiotap version + pad byte | ||
80 | 0x0b, 0x00, // <- radiotap header length | ||
81 | 0x04, 0x0c, 0x00, 0x00, // <-- bitmap | ||
82 | 0x6c, // <-- rate (in 500kbps units) | ||
83 | 0x0c, //<-- tx power | ||
84 | 0x01 //<-- antenna | ||
85 | |||
86 | |||
87 | Using the Radiotap Parser | ||
88 | ------------------------- | ||
89 | |||
90 | If you are having to parse a radiotap struct, you can radically simplify the | ||
91 | job by using the radiotap parser that lives in net/wireless/radiotap.c and has | ||
92 | its prototypes available in include/net/cfg80211.h. You use it like this: | ||
93 | |||
94 | #include <net/cfg80211.h> | ||
95 | |||
96 | /* buf points to the start of the radiotap header part */ | ||
97 | |||
98 | int MyFunction(u8 * buf, int buflen) | ||
99 | { | ||
100 | int pkt_rate_100kHz = 0, antenna = 0, pwr = 0; | ||
101 | struct ieee80211_radiotap_iterator iterator; | ||
102 | int ret = ieee80211_radiotap_iterator_init(&iterator, buf, buflen); | ||
103 | |||
104 | while (!ret) { | ||
105 | |||
106 | ret = ieee80211_radiotap_iterator_next(&iterator); | ||
107 | |||
108 | if (ret) | ||
109 | continue; | ||
110 | |||
111 | /* see if this argument is something we can use */ | ||
112 | |||
113 | switch (iterator.this_arg_index) { | ||
114 | /* | ||
115 | * You must take care when dereferencing iterator.this_arg | ||
116 | * for multibyte types... the pointer is not aligned. Use | ||
117 | * get_unaligned((type *)iterator.this_arg) to dereference | ||
118 | * iterator.this_arg for type "type" safely on all arches. | ||
119 | */ | ||
120 | case IEEE80211_RADIOTAP_RATE: | ||
121 | /* radiotap "rate" u8 is in | ||
122 | * 500kbps units, eg, 0x02=1Mbps | ||
123 | */ | ||
124 | pkt_rate_100kHz = (*iterator.this_arg) * 5; | ||
125 | break; | ||
126 | |||
127 | case IEEE80211_RADIOTAP_ANTENNA: | ||
128 | /* radiotap uses 0 for 1st ant */ | ||
129 | antenna = *iterator.this_arg; | ||
130 | break; | ||
131 | |||
132 | case IEEE80211_RADIOTAP_DBM_TX_POWER: | ||
133 | pwr = *iterator.this_arg; | ||
134 | break; | ||
135 | |||
136 | default: | ||
137 | break; | ||
138 | } | ||
139 | } /* while more rt headers */ | ||
140 | |||
141 | if (ret != -ENOENT) | ||
142 | return TXRX_DROP; | ||
143 | |||
144 | /* discard the radiotap header part */ | ||
145 | buf += iterator.max_length; | ||
146 | buflen -= iterator.max_length; | ||
147 | |||
148 | ... | ||
149 | |||
150 | } | ||
151 | |||
152 | Andy Green <andy@warmcat.com> | ||
diff --git a/Documentation/networking/sk98lin.txt b/Documentation/networking/sk98lin.txt deleted file mode 100644 index 8590a954df1d..000000000000 --- a/Documentation/networking/sk98lin.txt +++ /dev/null | |||
@@ -1,568 +0,0 @@ | |||
1 | (C)Copyright 1999-2004 Marvell(R). | ||
2 | All rights reserved | ||
3 | =========================================================================== | ||
4 | |||
5 | sk98lin.txt created 13-Feb-2004 | ||
6 | |||
7 | Readme File for sk98lin v6.23 | ||
8 | Marvell Yukon/SysKonnect SK-98xx Gigabit Ethernet Adapter family driver for LINUX | ||
9 | |||
10 | This file contains | ||
11 | 1 Overview | ||
12 | 2 Required Files | ||
13 | 3 Installation | ||
14 | 3.1 Driver Installation | ||
15 | 3.2 Inclusion of adapter at system start | ||
16 | 4 Driver Parameters | ||
17 | 4.1 Per-Port Parameters | ||
18 | 4.2 Adapter Parameters | ||
19 | 5 Large Frame Support | ||
20 | 6 VLAN and Link Aggregation Support (IEEE 802.1, 802.1q, 802.3ad) | ||
21 | 7 Troubleshooting | ||
22 | |||
23 | =========================================================================== | ||
24 | |||
25 | |||
26 | 1 Overview | ||
27 | =========== | ||
28 | |||
29 | The sk98lin driver supports the Marvell Yukon and SysKonnect | ||
30 | SK-98xx/SK-95xx compliant Gigabit Ethernet Adapter on Linux. It has | ||
31 | been tested with Linux on Intel/x86 machines. | ||
32 | *** | ||
33 | |||
34 | |||
35 | 2 Required Files | ||
36 | ================= | ||
37 | |||
38 | The linux kernel source. | ||
39 | No additional files required. | ||
40 | *** | ||
41 | |||
42 | |||
43 | 3 Installation | ||
44 | =============== | ||
45 | |||
46 | It is recommended to download the latest version of the driver from the | ||
47 | SysKonnect web site www.syskonnect.com. If you have downloaded the latest | ||
48 | driver, the Linux kernel has to be patched before the driver can be | ||
49 | installed. For details on how to patch a Linux kernel, refer to the | ||
50 | patch.txt file. | ||
51 | |||
52 | 3.1 Driver Installation | ||
53 | ------------------------ | ||
54 | |||
55 | The following steps describe the actions that are required to install | ||
56 | the driver and to start it manually. These steps should be carried | ||
57 | out for the initial driver setup. Once confirmed to be ok, they can | ||
58 | be included in the system start. | ||
59 | |||
60 | NOTE 1: To perform the following tasks you need 'root' access. | ||
61 | |||
62 | NOTE 2: In case of problems, please read the section "Troubleshooting" | ||
63 | below. | ||
64 | |||
65 | The driver can either be integrated into the kernel or it can be compiled | ||
66 | as a module. Select the appropriate option during the kernel | ||
67 | configuration. | ||
68 | |||
69 | Compile/use the driver as a module | ||
70 | ---------------------------------- | ||
71 | To compile the driver, go to the directory /usr/src/linux and | ||
72 | execute the command "make menuconfig" or "make xconfig" and proceed as | ||
73 | follows: | ||
74 | |||
75 | To integrate the driver permanently into the kernel, proceed as follows: | ||
76 | |||
77 | 1. Select the menu "Network device support" and then "Ethernet(1000Mbit)" | ||
78 | 2. Mark "Marvell Yukon Chipset / SysKonnect SK-98xx family support" | ||
79 | with (*) | ||
80 | 3. Build a new kernel when the configuration of the above options is | ||
81 | finished. | ||
82 | 4. Install the new kernel. | ||
83 | 5. Reboot your system. | ||
84 | |||
85 | To use the driver as a module, proceed as follows: | ||
86 | |||
87 | 1. Enable 'loadable module support' in the kernel. | ||
88 | 2. For automatic driver start, enable the 'Kernel module loader'. | ||
89 | 3. Select the menu "Network device support" and then "Ethernet(1000Mbit)" | ||
90 | 4. Mark "Marvell Yukon Chipset / SysKonnect SK-98xx family support" | ||
91 | with (M) | ||
92 | 5. Execute the command "make modules". | ||
93 | 6. Execute the command "make modules_install". | ||
94 | The appropriate modules will be installed. | ||
95 | 7. Reboot your system. | ||
96 | |||
97 | |||
98 | Load the module manually | ||
99 | ------------------------ | ||
100 | To load the module manually, proceed as follows: | ||
101 | |||
102 | 1. Enter "modprobe sk98lin". | ||
103 | 2. If a Marvell Yukon or SysKonnect SK-98xx adapter is installed in | ||
104 | your computer and you have a /proc file system, execute the command: | ||
105 | "ls /proc/net/sk98lin/" | ||
106 | This should produce an output containing a line with the following | ||
107 | format: | ||
108 | eth0 eth1 ... | ||
109 | which indicates that your adapter has been found and initialized. | ||
110 | |||
111 | NOTE 1: If you have more than one Marvell Yukon or SysKonnect SK-98xx | ||
112 | adapter installed, the adapters will be listed as 'eth0', | ||
113 | 'eth1', 'eth2', etc. | ||
114 | For each adapter, repeat steps 3 and 4 below. | ||
115 | |||
116 | NOTE 2: If you have other Ethernet adapters installed, your Marvell | ||
117 | Yukon or SysKonnect SK-98xx adapter will be mapped to the | ||
118 | next available number, e.g. 'eth1'. The mapping is executed | ||
119 | automatically. | ||
120 | The module installation message (displayed either in a system | ||
121 | log file or on the console) prints a line for each adapter | ||
122 | found containing the corresponding 'ethX'. | ||
123 | |||
124 | 3. Select an IP address and assign it to the respective adapter by | ||
125 | entering: | ||
126 | ifconfig eth0 <ip-address> | ||
127 | With this command, the adapter is connected to the Ethernet. | ||
128 | |||
129 | SK-98xx Gigabit Ethernet Server Adapters: The yellow LED on the adapter | ||
130 | is now active, the link status LED of the primary port is active and | ||
131 | the link status LED of the secondary port (on dual port adapters) is | ||
132 | blinking (if the ports are connected to a switch or hub). | ||
133 | SK-98xx V2.0 Gigabit Ethernet Adapters: The link status LED is active. | ||
134 | In addition, you will receive a status message on the console stating | ||
135 | "ethX: network connection up using port Y" and showing the selected | ||
136 | connection parameters (x stands for the ethernet device number | ||
137 | (0,1,2, etc), y stands for the port name (A or B)). | ||
138 | |||
139 | NOTE: If you are in doubt about IP addresses, ask your network | ||
140 | administrator for assistance. | ||
141 | |||
142 | 4. Your adapter should now be fully operational. | ||
143 | Use 'ping <otherstation>' to verify the connection to other computers | ||
144 | on your network. | ||
145 | 5. To check the adapter configuration view /proc/net/sk98lin/[devicename]. | ||
146 | For example by executing: | ||
147 | "cat /proc/net/sk98lin/eth0" | ||
148 | |||
149 | Unload the module | ||
150 | ----------------- | ||
151 | To stop and unload the driver modules, proceed as follows: | ||
152 | |||
153 | 1. Execute the command "ifconfig eth0 down". | ||
154 | 2. Execute the command "rmmod sk98lin". | ||
155 | |||
156 | 3.2 Inclusion of adapter at system start | ||
157 | ----------------------------------------- | ||
158 | |||
159 | Since a large number of different Linux distributions are | ||
160 | available, we are unable to describe a general installation procedure | ||
161 | for the driver module. | ||
162 | Because the driver is now integrated in the kernel, installation should | ||
163 | be easy, using the standard mechanism of your distribution. | ||
164 | Refer to the distribution's manual for installation of ethernet adapters. | ||
165 | |||
166 | *** | ||
167 | |||
168 | 4 Driver Parameters | ||
169 | ==================== | ||
170 | |||
171 | Parameters can be set at the command line after the module has been | ||
172 | loaded with the command 'modprobe'. | ||
173 | In some distributions, the configuration tools are able to pass parameters | ||
174 | to the driver module. | ||
175 | |||
176 | If you use the kernel module loader, you can set driver parameters | ||
177 | in the file /etc/modprobe.conf (or /etc/modules.conf in 2.4 or earlier). | ||
178 | To set the driver parameters in this file, proceed as follows: | ||
179 | |||
180 | 1. Insert a line of the form : | ||
181 | options sk98lin ... | ||
182 | For "...", the same syntax is required as described for the command | ||
183 | line parameters of modprobe below. | ||
184 | 2. To activate the new parameters, either reboot your computer | ||
185 | or | ||
186 | unload and reload the driver. | ||
187 | The syntax of the driver parameters is: | ||
188 | |||
189 | modprobe sk98lin parameter=value1[,value2[,value3...]] | ||
190 | |||
191 | where value1 refers to the first adapter, value2 to the second etc. | ||
192 | |||
193 | NOTE: All parameters are case sensitive. Write them exactly as shown | ||
194 | below. | ||
195 | |||
196 | Example: | ||
197 | Suppose you have two adapters. You want to set auto-negotiation | ||
198 | on the first adapter to ON and on the second adapter to OFF. | ||
199 | You also want to set DuplexCapabilities on the first adapter | ||
200 | to FULL, and on the second adapter to HALF. | ||
201 | Then, you must enter: | ||
202 | |||
203 | modprobe sk98lin AutoNeg_A=On,Off DupCap_A=Full,Half | ||
204 | |||
205 | NOTE: The number of adapters that can be configured this way is | ||
206 | limited in the driver (file skge.c, constant SK_MAX_CARD_PARAM). | ||
207 | The current limit is 16. If you happen to install | ||
208 | more adapters, adjust this and recompile. | ||
209 | |||
210 | |||
211 | 4.1 Per-Port Parameters | ||
212 | ------------------------ | ||
213 | |||
214 | These settings are available for each port on the adapter. | ||
215 | In the following description, '?' stands for the port for | ||
216 | which you set the parameter (A or B). | ||
217 | |||
218 | Speed | ||
219 | ----- | ||
220 | Parameter: Speed_? | ||
221 | Values: 10, 100, 1000, Auto | ||
222 | Default: Auto | ||
223 | |||
224 | This parameter is used to set the speed capabilities. It is only valid | ||
225 | for the SK-98xx V2.0 copper adapters. | ||
226 | Usually, the speed is negotiated between the two ports during link | ||
227 | establishment. If this fails, a port can be forced to a specific setting | ||
228 | with this parameter. | ||
229 | |||
230 | Auto-Negotiation | ||
231 | ---------------- | ||
232 | Parameter: AutoNeg_? | ||
233 | Values: On, Off, Sense | ||
234 | Default: On | ||
235 | |||
236 | The "Sense"-mode automatically detects whether the link partner supports | ||
237 | auto-negotiation or not. | ||
238 | |||
239 | Duplex Capabilities | ||
240 | ------------------- | ||
241 | Parameter: DupCap_? | ||
242 | Values: Half, Full, Both | ||
243 | Default: Both | ||
244 | |||
245 | This parameter is only relevant if auto-negotiation for this port is | ||
246 | not set to "Sense". If auto-negotiation is set to "On", all three values | ||
247 | are possible. If it is set to "Off", only "Full" and "Half" are allowed. | ||
248 | This parameter is useful if your link partner does not support all | ||
249 | possible combinations. | ||
250 | |||
251 | Flow Control | ||
252 | ------------ | ||
253 | Parameter: FlowCtrl_? | ||
254 | Values: Sym, SymOrRem, LocSend, None | ||
255 | Default: SymOrRem | ||
256 | |||
257 | This parameter can be used to set the flow control capabilities the | ||
258 | port reports during auto-negotiation. It can be set for each port | ||
259 | individually. | ||
260 | Possible modes: | ||
261 | -- Sym = Symmetric: both link partners are allowed to send | ||
262 | PAUSE frames | ||
263 | -- SymOrRem = SymmetricOrRemote: both or only remote partner | ||
264 | are allowed to send PAUSE frames | ||
265 | -- LocSend = LocalSend: only local link partner is allowed | ||
266 | to send PAUSE frames | ||
267 | -- None = no link partner is allowed to send PAUSE frames | ||
268 | |||
269 | NOTE: This parameter is ignored if auto-negotiation is set to "Off". | ||
270 | |||
271 | Role in Master-Slave-Negotiation (1000Base-T only) | ||
272 | -------------------------------------------------- | ||
273 | Parameter: Role_? | ||
274 | Values: Auto, Master, Slave | ||
275 | Default: Auto | ||
276 | |||
277 | This parameter is only valid for the SK-9821 and SK-9822 adapters. | ||
278 | For two 1000Base-T ports to communicate, one must take the role of the | ||
279 | master (providing timing information), while the other must be the | ||
280 | slave. Usually, this is negotiated between the two ports during link | ||
281 | establishment. If this fails, a port can be forced to a specific setting | ||
282 | with this parameter. | ||
283 | |||
284 | |||
285 | 4.2 Adapter Parameters | ||
286 | ----------------------- | ||
287 | |||
288 | Connection Type (SK-98xx V2.0 copper adapters only) | ||
289 | --------------- | ||
290 | Parameter: ConType | ||
291 | Values: Auto, 100FD, 100HD, 10FD, 10HD | ||
292 | Default: Auto | ||
293 | |||
294 | The parameter 'ConType' is a combination of all five per-port parameters | ||
295 | within one single parameter. This simplifies the configuration of both ports | ||
296 | of an adapter card! The different values of this variable reflect the most | ||
297 | meaningful combinations of port parameters. | ||
298 | |||
299 | The following table shows the values of 'ConType' and the corresponding | ||
300 | combinations of the per-port parameters: | ||
301 | |||
302 | ConType | DupCap AutoNeg FlowCtrl Role Speed | ||
303 | ----------+------------------------------------------------------ | ||
304 | Auto | Both On SymOrRem Auto Auto | ||
305 | 100FD | Full Off None Auto (ignored) 100 | ||
306 | 100HD | Half Off None Auto (ignored) 100 | ||
307 | 10FD | Full Off None Auto (ignored) 10 | ||
308 | 10HD | Half Off None Auto (ignored) 10 | ||
309 | |||
310 | Stating any other port parameter together with this 'ConType' variable | ||
311 | will result in a merged configuration of those settings. This is due to | ||
312 | the fact that the per-port parameters (e.g. Speed_?) have a higher | ||
313 | priority than the combined variable 'ConType'. | ||
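
The merge rule above can be sketched as follows. This is only an illustrative model of the table, not code from the sk98lin driver: the dictionary layout and the function name effective_port_config are hypothetical, and the keys simply reuse the column names of the table.

```python
# Illustrative model of the ConType table; not taken from the sk98lin driver.
# Keys reuse the table's column names; the function name is hypothetical.
CONTYPE_PRESETS = {
    "Auto":  {"DupCap": "Both", "AutoNeg": "On",  "FlowCtrl": "SymOrRem",
              "Role": "Auto", "Speed": "Auto"},
    "100FD": {"DupCap": "Full", "AutoNeg": "Off", "FlowCtrl": "None",
              "Role": "Auto", "Speed": "100"},
    "100HD": {"DupCap": "Half", "AutoNeg": "Off", "FlowCtrl": "None",
              "Role": "Auto", "Speed": "100"},
    "10FD":  {"DupCap": "Full", "AutoNeg": "Off", "FlowCtrl": "None",
              "Role": "Auto", "Speed": "10"},
    "10HD":  {"DupCap": "Half", "AutoNeg": "Off", "FlowCtrl": "None",
              "Role": "Auto", "Speed": "10"},
}


def effective_port_config(con_type, per_port=None):
    """Start from the ConType preset, then let per-port parameters win."""
    config = dict(CONTYPE_PRESETS[con_type])
    # Per-port parameters have higher priority than the combined variable.
    config.update(per_port or {})
    return config
```

For instance, combining ConType=100FD with a per-port Speed of 10 would, under this model, keep the full-duplex and auto-negotiation settings of the preset while the speed comes from the per-port parameter.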
314 | |||
315 | NOTE: This parameter is always used on both ports of the adapter card. | ||
316 | |||
317 | Interrupt Moderation | ||
318 | -------------------- | ||
319 | Parameter: Moderation | ||
320 | Values: None, Static, Dynamic | ||
321 | Default: None | ||
322 | |||
323 | Interrupt moderation is employed to limit the maximum number of interrupts | ||
324 | the driver has to serve. That is, one or more interrupts (which indicate any | ||
325 | transmit or receive packet to be processed) are queued until the driver | ||
326 | processes them. When the queued interrupts are served is determined by the | ||
327 | 'IntsPerSec' parameter, which is explained below. | ||
328 | |||
329 | Possible modes: | ||
330 | |||
331 | -- None - No interrupt moderation is applied on the adapter card. | ||
332 | Therefore, each transmit or receive interrupt is served immediately | ||
333 | as soon as it appears on the interrupt line of the adapter card. | ||
334 | |||
335 | -- Static - Interrupt moderation is applied on the adapter card. | ||
336 | All transmit and receive interrupts are queued until a complete | ||
337 | moderation interval ends. If such a moderation interval ends, all | ||
338 | queued interrupts are processed in one big bunch without any delay. | ||
339 | The term 'static' reflects the fact that interrupt moderation is | ||
340 | always enabled, regardless of how much network load is currently | ||
341 | passing via a particular interface. In addition, the duration of | ||
342 | the moderation interval has a fixed length that never changes while | ||
343 | the driver is operational. | ||
344 | |||
345 | -- Dynamic - Interrupt moderation might be applied on the adapter card, | ||
346 | depending on the load of the system. If the driver detects that the | ||
347 | system load is too high, the driver tries to shield the system against | ||
348 | too much network load by enabling interrupt moderation. If - at a later | ||
349 | time - the CPU utilization decreases again (or if the network load is | ||
350 | negligible) the interrupt moderation will automatically be disabled. | ||
351 | |||
352 | Interrupt moderation should be used when the driver has to handle one or more | ||
353 | interfaces with a high network load, which - as a consequence - leads also to a | ||
354 | high CPU utilization. When moderation is applied in such high network load | ||
355 | situations, CPU load might be reduced by 20-30%. | ||
356 | |||
357 | NOTE: The drawback of using interrupt moderation is an increase of the round- | ||
358 | trip-time (RTT), due to the queueing and serving of interrupts at dedicated | ||
359 | moderation times. | ||
360 | |||
361 | Interrupts per second | ||
362 | --------------------- | ||
363 | Parameter: IntsPerSec | ||
364 | Values: 30...40000 (interrupts per second) | ||
365 | Default: 2000 | ||
366 | |||
367 | This parameter is only used if either static or dynamic interrupt moderation | ||
368 | is used on a network adapter card. If no moderation is applied, setting | ||
369 | this parameter has no effect. | ||
370 | |||
371 | This parameter determines the length of any interrupt moderation interval. | ||
372 | Assuming that static interrupt moderation is to be used, an 'IntsPerSec' | ||
373 | parameter value of 2000 will lead to an interrupt moderation interval of | ||
374 | 500 microseconds. | ||
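
The relation between 'IntsPerSec' and the moderation interval is a simple reciprocal. A minimal sketch of the arithmetic, assuming the interval is exactly one second divided by the parameter value (the function name is hypothetical and the range check mirrors the documented 30...40000 limits):

```python
def moderation_interval_us(ints_per_sec):
    """Return the moderation interval in microseconds for a given IntsPerSec."""
    if not 30 <= ints_per_sec <= 40000:
        raise ValueError("IntsPerSec must be in the range 30...40000")
    # One second is 1,000,000 microseconds.
    return 1_000_000 / ints_per_sec
```

With the default of 2000 this gives the 500 microseconds quoted above; a value of 100 would stretch the interval to 10 milliseconds, which illustrates why very low values add noticeable delay.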
375 | |||
376 | NOTE: The duration of the moderation interval is to be chosen with care. | ||
377 | At first glance, selecting a very long duration (e.g. only 100 interrupts per | ||
378 | second) seems to be meaningful, but the increase of packet-processing delay | ||
379 | is tremendous. On the other hand, selecting a very short moderation time might | ||
380 | cancel out the benefit of applying moderation at all. | ||
381 | |||
382 | |||
383 | Preferred Port | ||
384 | -------------- | ||
385 | Parameter: PrefPort | ||
386 | Values: A, B | ||
387 | Default: A | ||
388 | |||
389 | This is used to force the preferred port to A or B (on dual-port network | ||
390 | adapters). The preferred port is the one that is used if both are detected | ||
391 | as fully functional. | ||
392 | |||
393 | RLMT Mode (Redundant Link Management Technology) | ||
394 | ------------------------------------------------ | ||
395 | Parameter: RlmtMode | ||
396 | Values: CheckLinkState, CheckLocalPort, CheckSeg, DualNet | ||
397 | Default: CheckLinkState | ||
398 | |||
399 | RLMT monitors the status of the port. If the link of the active port | ||
400 | fails, RLMT switches immediately to the standby link. The virtual link is | ||
401 | maintained as long as at least one 'physical' link is up. | ||
402 | |||
403 | Possible modes: | ||
404 | |||
405 | -- CheckLinkState - Check link state only: RLMT uses the link state | ||
406 | reported by the adapter hardware for each individual port to | ||
407 | determine whether a port can be used for all network traffic or | ||
408 | not. | ||
409 | |||
410 | -- CheckLocalPort - In this mode, RLMT monitors the network path | ||
411 | between the two ports of an adapter by regularly exchanging packets | ||
412 | between them. This mode requires a network configuration in which | ||
413 | the two ports are able to "see" each other (i.e. there must not be | ||
414 | any router between the ports). | ||
415 | |||
416 | -- CheckSeg - Check local port and segmentation: This mode supports the | ||
417 | same functions as the CheckLocalPort mode and additionally checks | ||
418 | network segmentation between the ports. Therefore, this mode is only | ||
419 | to be used if Gigabit Ethernet switches are installed on the network | ||
420 | that have been configured to use the Spanning Tree protocol. | ||
421 | |||
422 | -- DualNet - In this mode, ports A and B are used as separate devices. | ||
423 | If you have a dual port adapter, port A will be configured as eth0 | ||
424 | and port B as eth1. Both ports can be used independently with | ||
425 | distinct IP addresses. The preferred port setting is not used. | ||
426 | RLMT is turned off. | ||
427 | |||
428 | NOTE: RLMT modes CheckLocalPort (CLP) and CheckSeg (CLPSS) are designed to operate in configurations | ||
429 | where a network path between the ports on one adapter exists. | ||
430 | Moreover, they are not designed to work where adapters are connected | ||
431 | back-to-back. | ||
432 | *** | ||
433 | |||
434 | |||
435 | 5 Large Frame Support | ||
436 | ====================== | ||
437 | |||
438 | The driver supports large frames (also called jumbo frames). Using large | ||
439 | frames can result in an improved throughput if transferring large amounts | ||
440 | of data. | ||
441 | To enable large frames, set the MTU (maximum transmission unit) of the | ||
442 | interface to the desired value (up to 9000) by executing the following | ||
443 | command: | ||
444 | ifconfig eth0 mtu 9000 | ||
445 | This will only work if you have two adapters connected back-to-back | ||
446 | or if you use a switch that supports large frames. When using a switch, | ||
447 | it should be configured to allow large frames and auto-negotiation should | ||
448 | be set to OFF. The setting must be configured on all adapters that can be | ||
449 | reached by the large frames. If one adapter is not set to receive large | ||
450 | frames, it will simply drop them. | ||
451 | |||
452 | You can switch back to the standard ethernet frame size by executing the | ||
453 | following command: | ||
454 | ifconfig eth0 mtu 1500 | ||
455 | |||
456 | To permanently configure this setting, add a script with the 'ifconfig' | ||
457 | line to the system startup sequence (named something like "S99sk98lin" | ||
458 | in /etc/rc.d/rc2.d). | ||
459 | *** | ||
460 | |||
461 | |||
462 | 6 VLAN and Link Aggregation Support (IEEE 802.1, 802.1q, 802.3ad) | ||
463 | ================================================================== | ||
464 | |||
465 | The Marvell Yukon/SysKonnect Linux drivers are able to support VLAN and | ||
466 | Link Aggregation according to IEEE standards 802.1, 802.1q, and 802.3ad. | ||
467 | These features are only available after installation of open source | ||
468 | modules available on the Internet: | ||
469 | For VLAN go to: http://www.candelatech.com/~greear/vlan.html | ||
470 | For Link Aggregation go to: http://www.st.rim.or.jp/~yumo | ||
471 | |||
472 | NOTE: SysKonnect GmbH does not offer any support for these open source | ||
473 | modules and does not take the responsibility for any kind of | ||
474 | failures or problems arising in connection with these modules. | ||
475 | |||
476 | NOTE: Configuring Link Aggregation on a SysKonnect dual link adapter may | ||
477 | cause problems when unloading the driver. | ||
478 | |||
479 | |||
480 | 7 Troubleshooting | ||
481 | ================== | ||
482 | |||
483 | If any problems occur during the installation process, check the | ||
484 | following list: | ||
485 | |||
486 | |||
487 | Problem: The SK-98xx adapter cannot be found by the driver. | ||
488 | Solution: In /proc/pci search for the following entry: | ||
489 | 'Ethernet controller: SysKonnect SK-98xx ...' | ||
490 | If this entry exists, the SK-98xx or SK-98xx V2.0 adapter has | ||
491 | been found by the system and should be operational. | ||
492 | If this entry does not exist or if the file '/proc/pci' is not | ||
493 | found, there may be a hardware problem or the PCI support may | ||
494 | not be enabled in your kernel. | ||
495 | The adapter can be checked using the diagnostics program which | ||
496 | is available on the SysKonnect web site: | ||
497 | www.syskonnect.com | ||
498 | |||
499 | Some COMPAQ machines have problems dealing with PCI under Linux. | ||
500 | This problem is described in the 'PCI howto' document | ||
501 | (included in some distributions or available from the | ||
502 | web, e.g. at 'www.linux.org'). | ||
503 | |||
504 | |||
505 | Problem: Programs such as 'ifconfig' or 'route' cannot be found or the | ||
506 | error message 'Operation not permitted' is displayed. | ||
507 | Reason: You are not logged in as user 'root'. | ||
508 | Solution: Logout and login as 'root' or change to 'root' via 'su'. | ||
509 | |||
510 | |||
511 | Problem: Upon use of the command 'ping <address>' the message | ||
512 | "ping: sendto: Network is unreachable" is displayed. | ||
513 | Reason: Your route is not set correctly. | ||
514 | Solution: If you are using RedHat, you probably forgot to set up the | ||
515 | route in the 'network configuration'. | ||
516 | Check the existing routes with the 'route' command and check | ||
517 | if an entry for 'eth0' exists, and if so, if it is set correctly. | ||
518 | |||
519 | |||
520 | Problem: The driver can be started, the adapter is connected to the | ||
521 | network, but you cannot receive or transmit any packets; | ||
522 | e.g. 'ping' does not work. | ||
523 | Reason: There is an incorrect route in your routing table. | ||
524 | Solution: Check the routing table with the command 'route' and read the | ||
525 | manual help pages dealing with routes (enter 'man route'). | ||
526 | |||
527 | NOTE: Although the 2.2.x kernel versions generate the routing entry | ||
528 | automatically, problems of this kind may occur here as well. We've | ||
529 | come across a situation in which the driver started correctly at | ||
530 | system start, but after the driver has been removed and reloaded, | ||
531 | the route of the adapter's network pointed to the 'dummy0' device | ||
532 | and had to be corrected manually. | ||
533 | |||
534 | |||
535 | Problem: Your computer should act as a router between multiple | ||
536 | IP subnetworks (using multiple adapters), but computers in | ||
537 | other subnetworks cannot be reached. | ||
538 | Reason: Either the router's kernel is not configured for IP forwarding | ||
539 | or the routing table and gateway configuration of at least one | ||
540 | computer is not working. | ||
541 | |||
542 | Problem: Upon driver start, the following error message is displayed: | ||
543 | "eth0: -- ERROR -- | ||
544 | Class: internal Software error | ||
545 | Nr: 0xcc | ||
546 | Msg: SkGeInitPort() cannot init running ports" | ||
547 | Reason: You are using a driver compiled for single processor machines | ||
548 | on a multiprocessor machine with SMP (Symmetric MultiProcessor) | ||
549 | kernel. | ||
550 | Solution: Configure your kernel appropriately and recompile the kernel or | ||
551 | the modules. | ||
552 | |||
553 | |||
554 | |||
555 | If your problem is not listed here, please contact SysKonnect's technical | ||
556 | support for help (linux@syskonnect.de). | ||
557 | When contacting our technical support, please ensure that the following | ||
558 | information is available: | ||
559 | - System Manufacturer and HW Information (CPU, Memory... ) | ||
560 | - PCI-Boards in your system | ||
561 | - Distribution | ||
562 | - Kernel version | ||
563 | - Driver version | ||
564 | *** | ||
565 | |||
566 | |||
567 | |||
568 | ***End of Readme File*** | ||
diff --git a/Documentation/networking/spider_net.txt b/Documentation/networking/spider_net.txt new file mode 100644 index 000000000000..4b4adb8eb14f --- /dev/null +++ b/Documentation/networking/spider_net.txt | |||
@@ -0,0 +1,204 @@ | |||
1 | |||
2 | The Spidernet Device Driver | ||
3 | =========================== | ||
4 | |||
5 | Written by Linas Vepstas <linas@austin.ibm.com> | ||
6 | |||
7 | Version of 7 June 2007 | ||
8 | |||
9 | Abstract | ||
10 | ======== | ||
11 | This document sketches the structure of portions of the spidernet | ||
12 | device driver in the Linux kernel tree. The spidernet is a gigabit | ||
13 | ethernet device built into the Toshiba southbridge commonly used | ||
14 | in the SONY Playstation 3 and the IBM QS20 Cell blade. | ||
15 | |||
16 | The Structure of the RX Ring. | ||
17 | ============================= | ||
18 | The receive (RX) ring is a circular linked list of RX descriptors, | ||
19 | together with three pointers into the ring that are used to manage its | ||
20 | contents. | ||
21 | |||
22 | The elements of the ring are called "descriptors" or "descrs"; they | ||
23 | describe the received data. This includes a pointer to a buffer | ||
24 | containing the received data, the buffer size, and various status bits. | ||
25 | |||
26 | There are three primary states that a descriptor can be in: "empty", | ||
27 | "full" and "not-in-use". An "empty" or "ready" descriptor is ready | ||
28 | to receive data from the hardware. A "full" descriptor has data in it, | ||
29 | and is waiting to be emptied and processed by the OS. A "not-in-use" | ||
30 | descriptor is neither empty nor full; it is simply not ready. It may | ||
31 | not even have a data buffer in it, or is otherwise unusable. | ||
32 | |||
33 | During normal operation, on device startup, the OS (specifically, the | ||
34 | spidernet device driver) allocates a set of RX descriptors and RX | ||
35 | buffers. These are all marked "empty", ready to receive data. This | ||
36 | ring is handed off to the hardware, which sequentially fills in the | ||
37 | buffers, and marks them "full". The OS follows up, taking the full | ||
38 | buffers, processing them, and re-marking them empty. | ||
39 | |||
40 | This filling and emptying is managed by three pointers, the "head" | ||
41 | and "tail" pointers, managed by the OS, and a hardware current | ||
42 | descriptor pointer (GDACTDPA). The GDACTDPA points at the descr | ||
43 | currently being filled. When this descr is filled, the hardware | ||
44 | marks it full, and advances the GDACTDPA by one. Thus, when there is | ||
45 | flowing RX traffic, every descr behind it should be marked "full", | ||
46 | and everything in front of it should be "empty". If the hardware | ||
47 | discovers that the current descr is not empty, it will signal an | ||
48 | interrupt, and halt processing. | ||
49 | |||
50 | The tail pointer tails or trails the hardware pointer. When the | ||
51 | hardware is ahead, the tail pointer will be pointing at a "full" | ||
52 | descr. The OS will process this descr, and then mark it "not-in-use", | ||
53 | and advance the tail pointer. Thus, when there is flowing RX traffic, | ||
54 | all of the descrs in front of the tail pointer should be "full", and | ||
55 | all of those behind it should be "not-in-use". When RX traffic is not | ||
56 | flowing, then the tail pointer can catch up to the hardware pointer. | ||
57 | The OS will then note that the current tail is "empty", and halt | ||
58 | processing. | ||
59 | |||
60 | The head pointer (somewhat mis-named) follows after the tail pointer. | ||
61 | When traffic is flowing, then the head pointer will be pointing at | ||
62 | a "not-in-use" descr. The OS will perform various housekeeping duties | ||
63 | on this descr. This includes allocating a new data buffer and | ||
64 | dma-mapping it so as to make it visible to the hardware. The OS will | ||
65 | then mark the descr as "empty", ready to receive data. Thus, when there | ||
66 | is flowing RX traffic, everything in front of the head pointer should | ||
67 | be "not-in-use", and everything behind it should be "empty". If no | ||
68 | RX traffic is flowing, then the head pointer can catch up to the tail | ||
69 | pointer, at which point the OS will notice that the head descr is | ||
70 | "empty", and it will halt processing. | ||
71 | |||
72 | Thus, in an idle system, the GDACTDPA, tail and head pointers will | ||
73 | all be pointing at the same descr, which should be "empty". All of the | ||
74 | other descrs in the ring should be "empty" as well. | ||
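
The interplay of the three pointers described above can be modelled in a few lines. The following is a toy simulation under simplifying assumptions, not the driver's implementation: the class and method names are hypothetical, descriptors carry only a state string, and buffer allocation and DMA mapping are left out.

```python
EMPTY, FULL, NOT_IN_USE = "empty", "full", "not-in-use"


class RxRing:
    """Toy model of the RX ring: hardware pointer, tail and head."""

    def __init__(self, size):
        self.descrs = [EMPTY] * size
        self.hw = self.tail = self.head = 0

    def _next(self, idx):
        return (idx + 1) % len(self.descrs)

    def hw_receive(self):
        """Hardware fills the current descr; returns False if it must halt."""
        if self.descrs[self.hw] != EMPTY:
            return False
        self.descrs[self.hw] = FULL
        self.hw = self._next(self.hw)
        return True

    def os_process_tail(self):
        """OS consumes full descrs at the tail, marking them not-in-use."""
        count = 0
        while self.descrs[self.tail] == FULL:
            self.descrs[self.tail] = NOT_IN_USE
            self.tail = self._next(self.tail)
            count += 1
        return count

    def os_refill_head(self):
        """OS re-arms not-in-use descrs at the head, marking them empty."""
        count = 0
        while self.descrs[self.head] == NOT_IN_USE:
            self.descrs[self.head] = EMPTY
            self.head = self._next(self.head)
            count += 1
        return count
```

After a burst of hw_receive() calls followed by os_process_tail() and os_refill_head(), all three indices in this model coincide again and every descr is back to "empty", which matches the idle state described above.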
75 | |||
76 | The show_rx_chain() routine will print out the locations of the | ||
77 | GDACTDPA, tail and head pointers. It will also summarize the contents | ||
78 | of the ring, starting at the tail pointer, and listing the status | ||
79 | of the descrs that follow. | ||
80 | |||
81 | A typical example of the output, for a nearly idle system, might be | ||
82 | |||
83 | net eth1: Total number of descrs=256 | ||
84 | net eth1: Chain tail located at descr=20 | ||
85 | net eth1: Chain head is at 20 | ||
86 | net eth1: HW curr desc (GDACTDPA) is at 21 | ||
87 | net eth1: Have 1 descrs with stat=x40800101 | ||
88 | net eth1: HW next desc (GDACNEXTDA) is at 22 | ||
89 | net eth1: Last 255 descrs with stat=xa0800000 | ||
90 | |||
91 | In the above, the hardware has filled in one descr, number 20. Both | ||
92 | head and tail are pointing at 20, because it has not yet been emptied. | ||
93 | Meanwhile, hw is pointing at 21, which is free. | ||
94 | |||
95 | The "Have nnn descrs" refers to the descr starting at the tail: in this | ||
96 | case, nnn=1 descr, starting at descr 20. The "Last nnn descrs" refers | ||
97 | to all of the rest of the descrs, from the last status change. The "nnn" | ||
98 | is a count of how many descrs have exactly the same status. | ||
99 | |||
100 | The status x4... corresponds to "full" and status xa... corresponds | ||
101 | to "empty". The actual value printed is RXCOMST_A. | ||
102 | |||
103 | In the device driver source code, a different set of names are | ||
104 | used for these same concepts, so that | ||
105 | |||
106 | "empty" == SPIDER_NET_DESCR_CARDOWNED == 0xa | ||
107 | "full" == SPIDER_NET_DESCR_FRAME_END == 0x4 | ||
108 | "not in use" == SPIDER_NET_DESCR_NOT_IN_USE == 0xf | ||
109 | |||
110 | |||
111 | The RX RAM full bug/feature | ||
112 | =========================== | ||
113 | |||
114 | As long as the OS can empty out the RX buffers at a rate faster than | ||
115 | the hardware can fill them, there is no problem. If, for some reason, | ||
116 | the OS fails to empty the RX ring fast enough, the hardware GDACTDPA | ||
117 | pointer will catch up to the head, notice the not-empty condition, | ||
118 | and stop. However, RX packets may still continue arriving on the wire. | ||
119 | The spidernet chip can save some limited number of these in local RAM. | ||
120 | When this local ram fills up, the spider chip will issue an interrupt | ||
121 | indicating this (GHIINT0STS will show ERRINT, and the GRMFLLINT bit | ||
122 | will be set in GHIINT1STS). When the RX ram full condition occurs, | ||
123 | a certain bug/feature is triggered that has to be specially handled. | ||
124 | This section describes the special handling for this condition. | ||
125 | |||
126 | When the OS finally has a chance to run, it will empty out the RX ring. | ||
127 | In particular, it will clear the descriptor on which the hardware had | ||
128 | stopped. However, once the hardware has decided that a certain | ||
129 | descriptor is invalid, it will not restart at that descriptor; instead | ||
130 | it will restart at the next descr. This potentially will lead to a | ||
131 | deadlock condition, as the tail pointer will be pointing at this descr, | ||
132 | which, from the OS point of view, is empty; the OS will be waiting for | ||
133 | this descr to be filled. However, the hardware has skipped this descr, | ||
134 | and is filling the next descrs. Since the OS doesn't see this, there | ||
135 | is a potential deadlock, with the OS waiting for one descr to fill, | ||
136 | while the hardware is waiting for a different set of descrs to become | ||
137 | empty. | ||
138 | |||
139 | A call to show_rx_chain() at this point indicates the nature of the | ||
140 | problem. A typical print when the network is hung shows the following: | ||
141 | |||
142 | net eth1: Spider RX RAM full, incoming packets might be discarded! | ||
143 | net eth1: Total number of descrs=256 | ||
144 | net eth1: Chain tail located at descr=255 | ||
145 | net eth1: Chain head is at 255 | ||
146 | net eth1: HW curr desc (GDACTDPA) is at 0 | ||
147 | net eth1: Have 1 descrs with stat=xa0800000 | ||
148 | net eth1: HW next desc (GDACNEXTDA) is at 1 | ||
149 | net eth1: Have 127 descrs with stat=x40800101 | ||
150 | net eth1: Have 1 descrs with stat=x40800001 | ||
151 | net eth1: Have 126 descrs with stat=x40800101 | ||
152 | net eth1: Last 1 descrs with stat=xa0800000 | ||
153 | |||
154 | Both the tail and head pointers are pointing at descr 255, which is | ||
155 | marked xa... which is "empty". Thus, from the OS point of view, there | ||
156 | is nothing to be done. In particular, there is the implicit assumption | ||
157 | that everything in front of the "empty" descr must surely also be empty, | ||
158 | as explained in the last section. The OS is waiting for descr 255 to | ||
159 | become non-empty, which, in this case, will never happen. | ||
160 | |||
161 | The HW pointer is at descr 0. This descr is marked 0x4.. or "full". | ||
162 | Since it's already full, the hardware can do nothing more, and thus has | ||
163 | halted processing. Notice that descrs 0 through 253 are all marked | ||
164 | "full", while descrs 254 and 255 are empty. (The "Last 1 descrs" is | ||
165 | descr 254, since tail was at 255.) Thus, the system is deadlocked, | ||
166 | and there can be no forward progress; the OS thinks there's nothing | ||
167 | to do, and the hardware has nowhere to put incoming data. | ||
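
To see which descr indices the run-length summary in the hung-network print refers to, the counts can be expanded back into a full ring. This is a reading aid only: the helper expand_summary is hypothetical, and the state is again assumed to sit in the top four bits of the status value.

```python
def expand_summary(total, tail, runs):
    """Expand (count, stat) runs, listed from the tail, into a full ring."""
    ring = [None] * total
    idx = tail
    for count, stat in runs:
        for _ in range(count):
            ring[idx] = stat
            idx = (idx + 1) % total  # the ring wraps around
    return ring


# Runs copied from the example print, starting at tail = 255.
runs = [
    (1, 0xa0800000),
    (127, 0x40800101),
    (1, 0x40800001),
    (126, 0x40800101),
    (1, 0xa0800000),
]
ring = expand_summary(256, 255, runs)

# Top nibble 0x4 means "full", 0xa means "empty".
full_idx = [i for i, stat in enumerate(ring) if stat >> 28 == 0x4]
empty_idx = [i for i, stat in enumerate(ring) if stat >> 28 == 0xA]
```

Here full_idx covers one contiguous block beginning at descr 0, where the hardware pointer sits, and empty_idx holds the two descrs at the end of the ring, including the one the tail and head are waiting on.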
168 | |||
169 | This bug/feature is worked around with the spider_net_resync_head_ptr() | ||
170 | routine. When the driver receives RX interrupts, but an examination | ||
171 | of the RX chain seems to show it is empty, then it is probable that | ||
172 | the hardware has skipped a descr or two (sometimes dozens under heavy | ||
173 | network conditions). The spider_net_resync_head_ptr() subroutine will | ||
174 | search the ring for the next full descr, and the driver will resume | ||
175 | operations there. Since this will leave "holes" in the ring, there | ||
176 | is also a spider_net_resync_tail_ptr() that will skip over such holes. | ||
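
The idea behind the head resync can be sketched as a forward search around the ring. This is a simplification for illustration, not the actual spider_net_resync_head_ptr() code: the function name find_next_full is hypothetical, and descr states are reduced to the top four bits of a status value.

```python
def find_next_full(ring, start):
    """Return the index of the first "full" descr at or after start.

    Searches forward with wrap-around; returns None if no descr is full.
    """
    total = len(ring)
    for step in range(total):
        idx = (start + step) % total
        if ring[idx] >> 28 == 0x4:  # top nibble 0x4 means "full"
            return idx
    return None
```

In the deadlocked example above, a search starting at descr 255 would wrap around and stop at descr 0, which is where the OS would need to resume processing.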
177 | |||
178 | As of this writing, the spider_net_resync() strategy seems to work very | ||
179 | well, even under heavy network loads. | ||
180 | |||
181 | |||
182 | The TX ring | ||
183 | =========== | ||
184 | The TX ring uses a low-watermark interrupt scheme to make sure that | ||
185 | the TX queue is appropriately serviced for large packet sizes. | ||
186 | |||
187 | For packet sizes greater than about 1KBytes, the kernel can fill | ||
188 | the TX ring quicker than the device can drain it. Once the ring | ||
189 | is full, the netdev is stopped. When there is room in the ring, | ||
190 | the netdev needs to be reawakened, so that more TX packets are placed | ||
191 | in the ring. The hardware can empty the ring about four times per jiffy, | ||
192 | so it's not appropriate to wait for the poll routine to refill, since | ||
193 | the poll routine runs only once per jiffy. The low-watermark mechanism | ||
194 | marks a descr about 1/4th of the way from the bottom of the queue, so | ||
195 | that an interrupt is generated when the descr is processed. This | ||
196 | interrupt wakes up the netdev, which can then refill the queue. | ||
197 | For large packets, this mechanism generates a relatively small number | ||
198 | of interrupts, about 1K/sec. For smaller packets, this will drop to zero | ||
199 | interrupts, as the hardware can empty the queue faster than the kernel | ||
200 | can fill it. | ||
201 | |||
202 | |||
203 | ======= END OF DOCUMENT ======== | ||
204 | |||
diff --git a/Documentation/networking/xfrm_sysctl.txt b/Documentation/networking/xfrm_sysctl.txt new file mode 100644 index 000000000000..5bbd16792fe1 --- /dev/null +++ b/Documentation/networking/xfrm_sysctl.txt | |||
@@ -0,0 +1,4 @@ | |||
1 | /proc/sys/net/core/xfrm_* Variables: | ||
2 | |||
3 | xfrm_acq_expires - INTEGER | ||
4 | default 30 - hard timeout in seconds for acquire requests | ||
diff --git a/Documentation/pci.txt b/Documentation/pci.txt index d38261b67905..7754f5aea4e9 100644 --- a/Documentation/pci.txt +++ b/Documentation/pci.txt | |||
@@ -113,9 +113,6 @@ initialization with a pointer to a structure describing the driver | |||
113 | (Please see Documentation/power/pci.txt for descriptions | 113 | (Please see Documentation/power/pci.txt for descriptions |
114 | of PCI Power Management and the related functions.) | 114 | of PCI Power Management and the related functions.) |
115 | 115 | ||
116 | enable_wake Enable device to generate wake events from a low power | ||
117 | state. | ||
118 | |||
119 | shutdown Hook into reboot_notifier_list (kernel/sys.c). | 116 | shutdown Hook into reboot_notifier_list (kernel/sys.c). |
120 | Intended to stop any idling DMA operations. | 117 | Intended to stop any idling DMA operations. |
121 | Useful for enabling wake-on-lan (NIC) or changing | 118 | Useful for enabling wake-on-lan (NIC) or changing |
@@ -299,7 +296,10 @@ If the PCI device can use the PCI Memory-Write-Invalidate transaction, | |||
299 | call pci_set_mwi(). This enables the PCI_COMMAND bit for Mem-Wr-Inval | 296 | call pci_set_mwi(). This enables the PCI_COMMAND bit for Mem-Wr-Inval |
300 | and also ensures that the cache line size register is set correctly. | 297 | and also ensures that the cache line size register is set correctly. |
301 | Check the return value of pci_set_mwi() as not all architectures | 298 | Check the return value of pci_set_mwi() as not all architectures |
302 | or chip-sets may support Memory-Write-Invalidate. | 299 | or chip-sets may support Memory-Write-Invalidate. Alternatively, |
300 | if Mem-Wr-Inval would be nice to have but is not required, call | ||
301 | pci_try_set_mwi() to have the system do its best effort at enabling | ||
302 | Mem-Wr-Inval. | ||
303 | 303 | ||
304 | 304 | ||
305 | 3.2 Request MMIO/IOP resources | 305 | 3.2 Request MMIO/IOP resources |
diff --git a/Documentation/power/pci.txt b/Documentation/power/pci.txt index e00b099a4b86..dd8fe43888d3 100644 --- a/Documentation/power/pci.txt +++ b/Documentation/power/pci.txt | |||
@@ -164,7 +164,6 @@ struct pci_driver: | |||
164 | 164 | ||
165 | int (*suspend) (struct pci_dev *dev, pm_message_t state); | 165 | int (*suspend) (struct pci_dev *dev, pm_message_t state); |
166 | int (*resume) (struct pci_dev *dev); | 166 | int (*resume) (struct pci_dev *dev); |
167 | int (*enable_wake) (struct pci_dev *dev, pci_power_t state, int enable); | ||
168 | 167 | ||
169 | 168 | ||
170 | suspend | 169 | suspend |
@@ -251,42 +250,6 @@ The driver should update the current_state field in its pci_dev structure in | |||
251 | this function, except for PM-capable devices when pci_set_power_state is used. | 250 | this function, except for PM-capable devices when pci_set_power_state is used. |
252 | 251 | ||
253 | 252 | ||
254 | enable_wake | ||
255 | ----------- | ||
256 | |||
257 | Usage: | ||
258 | |||
259 | if (dev->driver && dev->driver->enable_wake) | ||
260 | dev->driver->enable_wake(dev,state,enable); | ||
261 | |||
262 | This callback is generally only relevant for devices that support the PCI PM | ||
263 | spec and have the ability to generate a PME# (Power Management Event Signal) | ||
264 | to wake the system up. (However, it is possible that a device may support | ||
265 | some non-standard way of generating a wake event on sleep.) | ||
266 | |||
267 | Bits 15:11 of the PMC (Power Mgmt Capabilities) Register in a device's | ||
268 | PM Capabilities describe what power states the device supports generating a | ||
269 | wake event from: | ||
270 | |||
271 | +------------------+ | ||
272 | | Bit | State | | ||
273 | +------------------+ | ||
274 | | 11 | D0 | | ||
275 | | 12 | D1 | | ||
276 | | 13 | D2 | | ||
277 | | 14 | D3hot | | ||
278 | | 15 | D3cold | | ||
279 | +------------------+ | ||
280 | |||
281 | A device can use this to enable wake events: | ||
282 | |||
283 | pci_enable_wake(dev,state,enable); | ||
284 | |||
285 | Note that to enable PME# from D3cold, a value of 4 should be passed to | ||
286 | pci_enable_wake (since it uses an index into a bitmask). If a driver gets | ||
287 | a request to enable wake events from D3, two calls should be made to | ||
288 | pci_enable_wake (one for both D3hot and D3cold). | ||
289 | |||
290 | 253 | ||
291 | A reference implementation | 254 | A reference implementation |
292 | ------------------------- | 255 | ------------------------- |
diff --git a/Documentation/power/swsusp.txt b/Documentation/power/swsusp.txt index 5b8d6953f05e..152b510d1bbb 100644 --- a/Documentation/power/swsusp.txt +++ b/Documentation/power/swsusp.txt | |||
@@ -393,6 +393,9 @@ safest thing is to unmount all filesystems on removable media (such USB, | |||
393 | Firewire, CompactFlash, MMC, external SATA, or even IDE hotplug bays) | 393 | Firewire, CompactFlash, MMC, external SATA, or even IDE hotplug bays) |
394 | before suspending; then remount them after resuming. | 394 | before suspending; then remount them after resuming. |
395 | 395 | ||
396 | There is a work-around for this problem. For more information, see | ||
397 | Documentation/usb/persist.txt. | ||
398 | |||
396 | Q: I upgraded the kernel from 2.6.15 to 2.6.16. Both kernels were | 399 | Q: I upgraded the kernel from 2.6.15 to 2.6.16. Both kernels were |
397 | compiled with the similar configuration files. Anyway I found that | 400 | compiled with the similar configuration files. Anyway I found that |
398 | suspend to disk (and resume) is much slower on 2.6.16 compared to | 401 | suspend to disk (and resume) is much slower on 2.6.16 compared to |
diff --git a/Documentation/power_supply_class.txt b/Documentation/power_supply_class.txt new file mode 100644 index 000000000000..9758cf433c06 --- /dev/null +++ b/Documentation/power_supply_class.txt | |||
@@ -0,0 +1,167 @@ | |||
1 | Linux power supply class | ||
2 | ======================== | ||
3 | |||
4 | Synopsis | ||
5 | ~~~~~~~~ | ||
6 | Power supply class used to represent battery, UPS, AC or DC power supply | ||
7 | properties to user-space. | ||
8 | |||
9 | It defines a core set of attributes, which should be applicable to (almost) | ||
10 | every power supply out there. Attributes are available via sysfs and uevent | ||
11 | interfaces. | ||
12 | |||
13 | Each attribute has a well-defined meaning, down to the unit of measure used. While | ||
14 | the attributes provided are believed to be universally applicable to any | ||
15 | power supply, specific monitoring hardware may not be able to provide them | ||
16 | all, so any of them may be skipped. | ||
17 | |||
18 | Power supply class is extensible, and allows drivers to define their own attributes. | ||
19 | The core attribute set is subject to the standard Linux evolution (i.e. | ||
20 | if some attribute is found to be applicable to many power supply | ||
21 | types or their drivers, it can be added to the core set). | ||
22 | |||
23 | It also integrates with LED framework, for the purpose of providing | ||
24 | typically expected feedback of battery charging/fully charged status and | ||
25 | AC/USB power supply online status. (Note that specific details of the | ||
26 | indication (including whether to use it at all) are fully controllable by | ||
27 | user and/or specific machine defaults, per design principles of LED | ||
28 | framework). | ||
29 | |||
30 | |||
31 | Attributes/properties | ||
32 | ~~~~~~~~~~~~~~~~~~~~~ | ||
33 | Power supply class has a predefined set of attributes; this eliminates code | ||
34 | duplication across drivers. Power supply class insists on reusing its | ||
35 | predefined attributes *and* their units. | ||
36 | |||
37 | So, userspace gets predictable set of attributes and their units for any | ||
38 | kind of power supply, and can process/present them to a user in consistent | ||
39 | manner. Results for different power supplies and machines are also directly | ||
40 | comparable. | ||
41 | |||
42 | See drivers/power/ds2760_battery.c and drivers/power/pda_power.c for an | ||
43 | example of how to declare and handle attributes. | ||
44 | |||
45 | |||
46 | Units | ||
47 | ~~~~~ | ||
48 | Quoting include/linux/power_supply.h: | ||
49 | |||
50 | All voltages, currents, charges, energies, time and temperatures in µV, | ||
51 | µA, µAh, µWh, seconds and tenths of degree Celsius unless otherwise | ||
52 | stated. It's driver's job to convert its raw values to units in which | ||
53 | this class operates. | ||
54 | |||
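As a minimal sketch of what these default units mean for a userspace consumer, the helpers below convert raw class values into more conventional units. The function names are illustrative assumptions and are not part of any existing tool; the conversion factors follow directly from the units quoted above.

```python
# Convert raw power supply class values to conventional units.
# Assumes the default units quoted above (µV, µAh, tenths of degree Celsius).
# Helper names are hypothetical.

def microvolts_to_volts(uv):
    """µV -> V"""
    return uv / 1_000_000


def microamp_hours_to_milliamp_hours(uah):
    """µAh -> mAh"""
    return uah / 1000


def tenths_celsius_to_celsius(tenths):
    """Tenths of a degree Celsius -> degrees Celsius"""
    return tenths / 10
```

For instance, a reported voltage of 4200000 corresponds to 4.2 V and a reported temperature of 253 corresponds to 25.3 degrees Celsius.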
55 | |||
56 | Attributes/properties detailed | ||
57 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||
58 | |||
59 | ~ ~ ~ ~ ~ ~ ~ Charge/Energy/Capacity - how to not confuse ~ ~ ~ ~ ~ ~ ~ | ||
60 | ~ ~ | ||
61 | ~ Because both "charge" (µAh) and "energy" (µWh) represent "capacity" ~ | ||
62 | ~ of battery, this class distinguishes these terms. Don't mix them! ~ | ||
63 | ~ ~ | ||
64 | ~ CHARGE_* attributes represent capacity in µAh only. ~ | ||
65 | ~ ENERGY_* attributes represent capacity in µWh only. ~ | ||
66 | ~ CAPACITY attribute represents capacity in *percent*, from 0 to 100. ~ | ||
67 | ~ ~ | ||
68 | ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ | ||
69 | |||
70 | Postfixes: | ||
71 | _AVG - *hardware* averaged value, use it if your hardware is really able to | ||
72 | report averaged values. | ||
73 | _NOW - momentary/instantaneous values. | ||
74 | |||
75 | STATUS - this attribute represents operating status (charging, full, | ||
76 | discharging (i.e. powering a load), etc.). This corresponds to | ||
77 | POWER_SUPPLY_STATUS_* values, as defined in power_supply.h. | ||
78 | |||
79 | HEALTH - represents health of the battery, values correspond to | ||
80 | POWER_SUPPLY_HEALTH_*, defined in power_supply.h. | ||
81 | |||
82 | VOLTAGE_MAX_DESIGN, VOLTAGE_MIN_DESIGN - design values for maximal and | ||
83 | minimal power supply voltages. Maximal/minimal means values of voltages | ||
84 | when battery is considered "full"/"empty" at normal conditions. Yes, there is | ||
85 | no direct relation between voltage and battery capacity, but some dumb | ||
86 | batteries use voltage for very approximated calculation of capacity. | ||
87 | Battery driver also can use this attribute just to inform userspace | ||
88 | about maximal and minimal voltage thresholds of a given battery. | ||
89 | |||
90 | CHARGE_FULL_DESIGN, CHARGE_EMPTY_DESIGN - design charge values, when | ||
91 | battery is considered full/empty. | ||
92 | |||
93 | ENERGY_FULL_DESIGN, ENERGY_EMPTY_DESIGN - same as above but for energy. | ||
94 | |||
95 | CHARGE_FULL, CHARGE_EMPTY - These attributes mean "last remembered value | ||
96 | of charge when battery became full/empty". It also could mean "value of | ||
97 | charge when battery is considered full/empty at given conditions (temperature, | ||
98 | age)". I.e. these attributes represent real thresholds, not design values. | ||
99 | |||
100 | ENERGY_FULL, ENERGY_EMPTY - same as above but for energy. | ||
101 | |||
102 | CAPACITY - capacity in percent. | ||
103 | CAPACITY_LEVEL - capacity level. This corresponds to | ||
104 | POWER_SUPPLY_CAPACITY_LEVEL_*. | ||
105 | |||
106 | TEMP - temperature of the power supply. | ||
107 | TEMP_AMBIENT - ambient temperature. | ||
108 | |||
109 | TIME_TO_EMPTY - seconds left for battery to be considered empty (i.e. | ||
110 | while battery powers a load) | ||
111 | TIME_TO_FULL - seconds left for battery to be considered full (i.e. | ||
112 | while battery is charging) | ||
113 | |||
114 | |||
115 | Battery <-> external power supply interaction | ||
116 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||
117 | Often power supplies are acting as supplies and supplicants at the same | ||
118 | time. Batteries are good example. So, batteries usually care if they're | ||
119 | externally powered or not. | ||
120 | |||
121 | For that case, power supply class implements notification mechanism for | ||
122 | batteries. | ||
123 | |||
124 | External power supply (AC) lists supplicants (batteries) names in | ||
125 | "supplied_to" struct member, and each power_supply_changed() call | ||
126 | issued by external power supply will notify supplicants via | ||
127 | external_power_changed callback. | ||
128 | |||
129 | |||
130 | QA | ||
131 | ~~ | ||
132 | Q: Where is POWER_SUPPLY_PROP_XYZ attribute? | ||
133 | A: If you cannot find attribute suitable for your driver needs, feel free | ||
134 | to add it and send patch along with your driver. | ||
135 | |||
136 | The attributes available currently are the ones currently provided by the | ||
137 | drivers written. | ||
138 | |||
139 | Good candidates to add in future: model/part#, cycle_time, manufacturer, | ||
140 | etc. | ||
141 | |||
142 | |||
143 | Q: I have some very specific attribute (e.g. battery color), should I add | ||
144 | this attribute to standard ones? | ||
145 | A: Most likely, no. Such attribute can be placed in the driver itself, if | ||
146 | it is useful. Of course, if the attribute in question applicable to | ||
147 | large set of batteries, provided by many drivers, and/or comes from | ||
148 | some general battery specification/standard, it may be a candidate to | ||
149 | be added to the core attribute set. | ||
150 | |||
151 | |||
152 | Q: Suppose my battery monitoring chip/firmware does not provide capacity | ||
153 | in percent, but provides charge_{now,full,empty}. Should I calculate | ||
154 | percentage capacity manually, inside the driver, and register CAPACITY | ||
155 | attribute? The same question about time_to_empty/time_to_full. | ||
156 | A: Most likely, no. This class is designed to export properties which are | ||
157 | directly measurable by the specific hardware available. | ||
158 | |||
159 | Inferring unavailable properties using some heuristics or mathematical | ||
160 | model is not subject of work for a battery driver. Such functionality | ||
161 | should be factored out, and in fact, apm_power, the driver to serve | ||
162 | legacy APM API on top of power supply class, uses a simple heuristic of | ||
163 | approximating remaining battery capacity based on its charge, current, | ||
164 | voltage and so on. But full-fledged battery model is likely not subject | ||
165 | for kernel at all, as it would require floating point calculation to deal | ||
166 | with things like differential equations and Kalman filters. This is | ||
167 | better handled by batteryd/libbattery, yet to be written. | ||
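Since the last answer places derived values outside the driver, a userspace consumer could compute a percentage itself. The following is a minimal sketch under stated assumptions: it uses a plain linear estimate between the empty and full thresholds, which is an illustration only and not the heuristic used by apm_power or any existing library. The function name `capacity_percent` is hypothetical.

```python
def capacity_percent(charge_now, charge_full, charge_empty=0):
    """Estimate remaining capacity in percent from CHARGE_* values (all in µAh).

    Linear interpolation between the empty and full thresholds; this is an
    assumed, simplistic model, not a real battery model.
    """
    span = charge_full - charge_empty
    if span <= 0:
        raise ValueError("charge_full must be greater than charge_empty")
    percent = 100 * (charge_now - charge_empty) / span
    # Clamp, since measured charge can drift slightly past either threshold.
    return max(0.0, min(100.0, percent))
```

The same shape works for ENERGY_* values, as long as charge and energy values are never mixed in one calculation.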
diff --git a/Documentation/powerpc/booting-without-of.txt b/Documentation/powerpc/booting-without-of.txt index b49ce169a63a..d42d98107d49 100644 --- a/Documentation/powerpc/booting-without-of.txt +++ b/Documentation/powerpc/booting-without-of.txt | |||
@@ -1,7 +1,6 @@ | |||
1 | Booting the Linux/ppc kernel without Open Firmware | 1 | Booting the Linux/ppc kernel without Open Firmware |
2 | -------------------------------------------------- | 2 | -------------------------------------------------- |
3 | 3 | ||
4 | |||
5 | (c) 2005 Benjamin Herrenschmidt <benh at kernel.crashing.org>, | 4 | (c) 2005 Benjamin Herrenschmidt <benh at kernel.crashing.org>, |
6 | IBM Corp. | 5 | IBM Corp. |
7 | (c) 2005 Becky Bruce <becky.bruce at freescale.com>, | 6 | (c) 2005 Becky Bruce <becky.bruce at freescale.com>, |
@@ -9,6 +8,62 @@ | |||
9 | (c) 2006 MontaVista Software, Inc. | 8 | (c) 2006 MontaVista Software, Inc. |
10 | Flash chip node definition | 9 | Flash chip node definition |
11 | 10 | ||
11 | Table of Contents | ||
12 | ================= | ||
13 | |||
14 | I - Introduction | ||
15 | 1) Entry point for arch/powerpc | ||
16 | 2) Board support | ||
17 | |||
18 | II - The DT block format | ||
19 | 1) Header | ||
20 | 2) Device tree generalities | ||
21 | 3) Device tree "structure" block | ||
22 | 4) Device tree "strings" block | ||
23 | |||
24 | III - Required content of the device tree | ||
25 | 1) Note about cells and address representation | ||
26 | 2) Note about "compatible" properties | ||
27 | 3) Note about "name" properties | ||
28 | 4) Note about node and property names and character set | ||
29 | 5) Required nodes and properties | ||
30 | a) The root node | ||
31 | b) The /cpus node | ||
32 | c) The /cpus/* nodes | ||
33 | d) the /memory node(s) | ||
34 | e) The /chosen node | ||
35 | f) the /soc<SOCname> node | ||
36 | |||
37 | IV - "dtc", the device tree compiler | ||
38 | |||
39 | V - Recommendations for a bootloader | ||
40 | |||
41 | VI - System-on-a-chip devices and nodes | ||
42 | 1) Defining child nodes of an SOC | ||
43 | 2) Representing devices without a current OF specification | ||
44 | a) MDIO IO device | ||
45 | c) PHY nodes | ||
46 | b) Gianfar-compatible ethernet nodes | ||
47 | d) Interrupt controllers | ||
48 | e) I2C | ||
49 | f) Freescale SOC USB controllers | ||
50 | g) Freescale SOC SEC Security Engines | ||
51 | h) Board Control and Status (BCSR) | ||
52 | i) Freescale QUICC Engine module (QE) | ||
53 | j) Flash chip nodes | ||
54 | |||
55 | VII - Specifying interrupt information for devices | ||
56 | 1) interrupts property | ||
57 | 2) interrupt-parent property | ||
58 | 3) OpenPIC Interrupt Controllers | ||
59 | 4) ISA Interrupt Controllers | ||
60 | |||
61 | Appendix A - Sample SOC node for MPC8540 | ||
62 | |||
63 | |||
64 | Revision Information | ||
65 | ==================== | ||
66 | |||
12 | May 18, 2005: Rev 0.1 - Initial draft, no chapter III yet. | 67 | May 18, 2005: Rev 0.1 - Initial draft, no chapter III yet. |
13 | 68 | ||
14 | May 19, 2005: Rev 0.2 - Add chapter III and bits & pieces here or | 69 | May 19, 2005: Rev 0.2 - Add chapter III and bits & pieces here or |
@@ -1687,7 +1742,7 @@ platforms are moved over to use the flattened-device-tree model. | |||
1687 | }; | 1742 | }; |
1688 | }; | 1743 | }; |
1689 | 1744 | ||
1690 | g) Flash chip nodes | 1745 | j) Flash chip nodes |
1691 | 1746 | ||
1692 | Flash chips (Memory Technology Devices) are often used for solid state | 1747 | Flash chips (Memory Technology Devices) are often used for solid state |
1693 | file systems on embedded devices. | 1748 | file systems on embedded devices. |
diff --git a/Documentation/sched-design-CFS.txt b/Documentation/sched-design-CFS.txt new file mode 100644 index 000000000000..16feebb7bdc0 --- /dev/null +++ b/Documentation/sched-design-CFS.txt | |||
@@ -0,0 +1,119 @@ | |||
1 | |||
2 | This is the CFS scheduler. | ||
3 | |||
4 | 80% of CFS's design can be summed up in a single sentence: CFS basically | ||
5 | models an "ideal, precise multi-tasking CPU" on real hardware. | ||
6 | |||
7 | "Ideal multi-tasking CPU" is a (non-existent :-)) CPU that has 100% | ||
8 | physical power and which can run each task at precise equal speed, in | ||
9 | parallel, each at 1/nr_running speed. For example: if there are 2 tasks | ||
10 | running then it runs each at 50% physical power - totally in parallel. | ||
11 | |||
12 | On real hardware, we can run only a single task at once, so while that | ||
13 | one task runs, the other tasks that are waiting for the CPU are at a | ||
14 | disadvantage - the current task gets an unfair amount of CPU time. In | ||
15 | CFS this fairness imbalance is expressed and tracked via the per-task | ||
16 | p->wait_runtime (nanosec-unit) value. "wait_runtime" is the amount of | ||
17 | time the task should now run on the CPU for it to become completely fair | ||
18 | and balanced. | ||
19 | |||
20 | ( small detail: on 'ideal' hardware, the p->wait_runtime value would | ||
21 | always be zero - no task would ever get 'out of balance' from the | ||
22 | 'ideal' share of CPU time. ) | ||
23 | |||
24 | CFS's task picking logic is based on this p->wait_runtime value and it | ||
25 | is thus very simple: it always tries to run the task with the largest | ||
26 | p->wait_runtime value. In other words, CFS tries to run the task with | ||
27 | the 'gravest need' for more CPU time. So CFS always tries to split up | ||
28 | CPU time between runnable tasks as close to 'ideal multitasking | ||
29 | hardware' as possible. | ||
30 | |||
31 | Most of the rest of CFS's design just falls out of this really simple | ||
32 | concept, with a few add-on embellishments like nice levels, | ||
33 | multiprocessing and various algorithm variants to recognize sleepers. | ||
34 | |||
35 | In practice it works like this: the system runs a task a bit, and when | ||
36 | the task schedules (or a scheduler tick happens) the task's CPU usage is | ||
37 | 'accounted for': the (small) time it just spent using the physical CPU | ||
38 | is deducted from p->wait_runtime. [minus the 'fair share' it would have | ||
39 | gotten anyway]. Once p->wait_runtime gets low enough so that another | ||
40 | task becomes the 'leftmost task' of the time-ordered rbtree it maintains | ||
41 | (plus a small amount of 'granularity' distance relative to the leftmost | ||
42 | task so that we do not over-schedule tasks and trash the cache) then the | ||
43 | new leftmost task is picked and the current task is preempted. | ||
44 | |||
45 | The rq->fair_clock value tracks the 'CPU time a runnable task would have | ||
46 | fairly gotten, had it been runnable during that time'. So by using | ||
47 | rq->fair_clock values we can accurately timestamp and measure the | ||
48 | 'expected CPU time' a task should have gotten. All runnable tasks are | ||
49 | sorted in the rbtree by the "rq->fair_clock - p->wait_runtime" key, and | ||
50 | CFS picks the 'leftmost' task and sticks to it. As the system progresses | ||
51 | forwards, newly woken tasks are put into the tree more and more to the | ||
52 | right - slowly but surely giving a chance for every task to become the | ||
53 | 'leftmost task' and thus get on the CPU within a deterministic amount of | ||
54 | time. | ||
55 | |||
56 | Some implementation details: | ||
57 | |||
58 | - the introduction of Scheduling Classes: an extensible hierarchy of | ||
59 | scheduler modules. These modules encapsulate scheduling policy | ||
60 | details and are handled by the scheduler core without the core | ||
61 | code assuming too much about them. | ||
62 | |||
63 | - sched_fair.c implements the 'CFS desktop scheduler': it is a | ||
64 | replacement for the vanilla scheduler's SCHED_OTHER interactivity | ||
65 | code. | ||
66 | |||
67 | I'd like to give credit to Con Kolivas for the general approach here: | ||
68 | he has proven via RSDL/SD that 'fair scheduling' is possible and that | ||
69 | it results in better desktop scheduling. Kudos Con! | ||
70 | |||
71 | The CFS patch uses a completely different approach and implementation | ||
72 | from RSDL/SD. My goal was to make CFS's interactivity quality exceed | ||
73 | that of RSDL/SD, which is a high standard to meet :-) Testing | ||
74 | feedback is welcome to decide this one way or another. [ and, in any | ||
75 | case, all of SD's logic could be added via a kernel/sched_sd.c module | ||
76 | as well, if Con is interested in such an approach. ] | ||
77 | |||
78 | CFS's design is quite radical: it does not use runqueues, it uses a | ||
79 | time-ordered rbtree to build a 'timeline' of future task execution, | ||
80 | and thus has no 'array switch' artifacts (by which both the vanilla | ||
81 | scheduler and RSDL/SD are affected). | ||
82 | |||
83 | CFS uses nanosecond granularity accounting and does not rely on any | ||
84 | jiffies or other HZ detail. Thus the CFS scheduler has no notion of | ||
85 | 'timeslices' and has no heuristics whatsoever. There is only one | ||
86 | central tunable: | ||
87 | |||
88 | /proc/sys/kernel/sched_granularity_ns | ||
89 | |||
90 | which can be used to tune the scheduler from 'desktop' (low | ||
91 | latencies) to 'server' (good batching) workloads. It defaults to a | ||
92 | setting suitable for desktop workloads. SCHED_BATCH is handled by the | ||
93 | CFS scheduler module too. | ||
94 | |||
95 | Due to its design, the CFS scheduler is not prone to any of the | ||
96 | 'attacks' that exist today against the heuristics of the stock | ||
97 | scheduler: fiftyp.c, thud.c, chew.c, ring-test.c, massive_intr.c all | ||
98 | work fine and do not impact interactivity and produce the expected | ||
99 | behavior. | ||
100 | |||
101 | The CFS scheduler has a much stronger handling of nice levels and | ||
102 | SCHED_BATCH: both types of workloads should be isolated much more | ||
103 | aggressively than under the vanilla scheduler. | ||
104 | |||
105 | ( another detail: due to nanosec accounting and timeline sorting, | ||
106 | sched_yield() support is very simple under CFS, and in fact under | ||
107 | CFS sched_yield() behaves much better than under any other | ||
108 | scheduler I have tested so far. ) | ||
109 | |||
110 | - sched_rt.c implements SCHED_FIFO and SCHED_RR semantics, in a simpler | ||
111 | way than the vanilla scheduler does. It uses 100 runqueues (for all | ||
112 | 100 RT priority levels, instead of 140 in the vanilla scheduler) | ||
113 | and it needs no expired array. | ||
114 | |||
115 | - reworked/sanitized SMP load-balancing: the runqueue-walking | ||
116 | assumptions are gone from the load-balancing code now, and | ||
117 | iterators of the scheduling modules are used. The balancing code got | ||
118 | quite a bit simpler as a result. | ||
119 | |||
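The picking rule described above can be illustrated with a toy userspace model. This is a sketch only: it uses a plain list instead of an rbtree, ignores nice levels, granularity and sleepers, and assumes the fair share of a time slice is simply the slice divided by the number of runnable tasks. The names `Task`, `rbtree_key`, `pick_next` and `account` are made up for the example and are not kernel symbols.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    wait_runtime: int = 0  # nanoseconds the task is owed (negative: it ran too much)


def rbtree_key(fair_clock, task):
    """Sort key described in the text: rq->fair_clock - p->wait_runtime."""
    return fair_clock - task.wait_runtime


def pick_next(tasks, fair_clock):
    """Return the 'leftmost' task: smallest key, i.e. largest wait_runtime."""
    return min(tasks, key=lambda t: rbtree_key(fair_clock, t))


def account(tasks, current, delta_ns):
    """Charge `current` for delta_ns of CPU time (assumed simplified model).

    The running task loses the time it used minus its fair share; every
    waiting task gains the fair share it did not receive.
    """
    fair_share = delta_ns // len(tasks)
    for task in tasks:
        if task is current:
            task.wait_runtime -= delta_ns - fair_share
        else:
            task.wait_runtime += fair_share
```

In this model, after one of two equal tasks runs for 1000 ns, it owes 500 ns and the other is owed 500 ns, so the other task becomes the leftmost one.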
diff --git a/Documentation/sound/alsa/ALSA-Configuration.txt b/Documentation/sound/alsa/ALSA-Configuration.txt index 57b878cc393c..355ff0a2bb7c 100644 --- a/Documentation/sound/alsa/ALSA-Configuration.txt +++ b/Documentation/sound/alsa/ALSA-Configuration.txt | |||
@@ -917,6 +917,7 @@ Prior to version 0.9.0rc4 options had a 'snd_' prefix. This was removed. | |||
917 | ref Reference board, base config | 917 | ref Reference board, base config |
918 | m2-2 Some Gateway MX series laptops | 918 | m2-2 Some Gateway MX series laptops |
919 | m6 Some Gateway NX series laptops | 919 | m6 Some Gateway NX series laptops |
920 | pa6 Gateway NX860 series | ||
920 | 921 | ||
921 | STAC9227/9228/9229/927x | 922 | STAC9227/9228/9229/927x |
922 | ref Reference board | 923 | ref Reference board |
diff --git a/Documentation/sysctl/vm.txt b/Documentation/sysctl/vm.txt index 1d192565e182..8cfca173d4bc 100644 --- a/Documentation/sysctl/vm.txt +++ b/Documentation/sysctl/vm.txt | |||
@@ -31,6 +31,7 @@ Currently, these files are in /proc/sys/vm: | |||
31 | - min_unmapped_ratio | 31 | - min_unmapped_ratio |
32 | - min_slab_ratio | 32 | - min_slab_ratio |
33 | - panic_on_oom | 33 | - panic_on_oom |
34 | - mmap_min_addr | ||
34 | 35 | ||
35 | ============================================================== | 36 | ============================================================== |
36 | 37 | ||
@@ -216,3 +217,17 @@ above-mentioned. | |||
216 | The default value is 0. | 217 | The default value is 0. |
217 | 1 and 2 are for failover of clustering. Please select either | 218 | 1 and 2 are for failover of clustering. Please select either |
218 | according to your policy of failover. | 219 | according to your policy of failover. |
220 | |||
221 | ============================================================== | ||
222 | |||
223 | mmap_min_addr | ||
224 | |||
225 | This file indicates the amount of address space which a user process will | ||
226 | be restricted from mmapping. Since kernel null dereference bugs could | ||
227 | accidentally operate based on the information in the first couple of pages | ||
228 | of memory, userspace processes should not be allowed to write to them. By | ||
229 | default this value is set to 0 and no protections will be enforced by the | ||
230 | security module. Setting this value to something like 64k will allow the | ||
231 | vast majority of applications to work correctly and provide defense in depth | ||
232 | against future potential kernel bugs. | ||
233 | |||
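As a rough illustration of how the tunable is meant to be read, the sketch below parses the value from procfs and models the check as "addresses below the value are refused". This is a simplified model of the policy described above, not the kernel's implementation; the function names are hypothetical, and the default path assumes the file lives under /proc/sys/vm as listed at the top of this document.

```python
def read_mmap_min_addr(path="/proc/sys/vm/mmap_min_addr"):
    """Return the current mmap_min_addr value in bytes."""
    with open(path) as f:
        return int(f.read().strip())


def mapping_allowed(addr, min_addr):
    """Simplified model: a mapping at `addr` is allowed if it is not below min_addr."""
    return addr >= min_addr
```

With the value set to 64k (65536), a mapping at address 0 is refused in this model while one at 0x10000 is allowed; with the default of 0 nothing is refused.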
diff --git a/Documentation/sysfs-rules.txt b/Documentation/sysfs-rules.txt new file mode 100644 index 000000000000..42861bb0bc9b --- /dev/null +++ b/Documentation/sysfs-rules.txt | |||
@@ -0,0 +1,166 @@ | |||
1 | Rules on how to access information in the Linux kernel sysfs | ||
2 | |||
3 | The kernel-exported sysfs exposes internal kernel implementation details | ||
4 | and depends on internal kernel structures and layout. It is agreed upon | ||
5 | by the kernel developers that the Linux kernel does not provide a stable | ||
6 | internal API. As sysfs is a direct export of kernel internal | ||
7 | structures, the sysfs interface cannot provide a stable interface either; | ||
8 | it may always change along with internal kernel changes. | ||
9 | |||
10 | To minimize the risk of breaking users of sysfs, which are in most cases | ||
11 | low-level userspace applications, with a new kernel release, the users | ||
12 | of sysfs must follow some rules to use an as abstract-as-possible way to | ||
13 | access this filesystem. The current udev and HAL programs already | ||
14 | implement this and users are encouraged to plug, if possible, into the | ||
15 | abstractions these programs provide instead of accessing sysfs | ||
16 | directly. | ||
17 | |||
18 | But if you really do want or need to access sysfs directly, please follow | ||
19 | the following rules and then your programs should work with future | ||
20 | versions of the sysfs interface. | ||
21 | |||
22 | - Do not use libsysfs | ||
23 | It makes assumptions about sysfs which are not true. Its API does not | ||
24 | offer any abstraction, it exposes all the kernel driver-core | ||
25 | implementation details in its own API. Therefore it is not better than | ||
26 | reading directories and opening the files yourself. | ||
27 | Also, it is not actively maintained, in the sense of reflecting the | ||
28 | current kernel-development. The goal of providing a stable interface | ||
29 | to sysfs has failed; it causes more problems than it solves. It | ||
30 | violates many of the rules in this document. | ||
31 | |||
32 | - sysfs is always at /sys | ||
33 | Parsing /proc/mounts is a waste of time. Other mount points are a | ||
34 | system configuration bug you should not try to solve. For test cases, | ||
35 | possibly support a SYSFS_PATH environment variable to override the | ||
36 | application's behavior, but never try to search for sysfs. Never try | ||
37 | to mount it unless you are an early boot script. | ||
38 | |||
39 | - devices are only "devices" | ||
40 | There is no such thing as class-, bus-, physical devices, | ||
41 | interfaces, and such that you can rely on in userspace. Everything is | ||
42 | just simply a "device". Class-, bus-, physical, ... types are just | ||
43 | kernel implementation details, which should not be expected by | ||
44 | applications that look for devices in sysfs. | ||
45 | |||
46 | The properties of a device are: | ||
47 | o devpath (/devices/pci0000:00/0000:00:1d.1/usb2/2-2/2-2:1.0) | ||
48 | - identical to the DEVPATH value in the event sent from the kernel | ||
49 | at device creation and removal | ||
50 | - the unique key to the device at that point in time | ||
51 | - the kernel's path to the device-directory without the leading | ||
52 | /sys, and always starting with a slash | ||
53 | - all elements of a devpath must be real directories. Symlinks | ||
54 | pointing to /sys/devices must always be resolved to their real | ||
55 | target, and the target path must be used to access the device. | ||
56 | That way the devpath to the device matches the devpath of the | ||
57 | kernel used at event time. | ||
58 | - using or exposing symlink values as elements in a devpath string | ||
59 | is a bug in the application | ||
60 | |||
61 | o kernel name (sda, tty, 0000:00:1f.2, ...) | ||
62 | - a directory name, identical to the last element of the devpath | ||
63 | - applications need to handle spaces and characters like '!' in | ||
64 | the name | ||
65 | |||
66 | o subsystem (block, tty, pci, ...) | ||
67 | - simple string, never a path or a link | ||
68 | - retrieved by reading the "subsystem"-link and using only the | ||
69 | last element of the target path | ||
70 | |||
71 | o driver (tg3, ata_piix, uhci_hcd) | ||
72 | - a simple string, which may contain spaces, never a path or a | ||
73 | link | ||
74 | - it is retrieved by reading the "driver"-link and using only the | ||
75 | last element of the target path | ||
76 | - devices which do not have a "driver"-link simply do not have a | ||
77 | driver; copying the driver value into a child device context is a | ||
78 | bug in the application | ||
79 | |||
80 | o attributes | ||
81 | - the files in the device directory or files below subdirectories | ||
82 | of the same device directory | ||
83 | - accessing attributes reached by a symlink pointing to another device, | ||
84 | like the "device"-link, is a bug in the application | ||
85 | |||
86 | Everything else is just a kernel driver-core implementation detail, | ||
87 | that should not be assumed to be stable across kernel releases. | ||
88 | |||
89 | - Properties of parent devices never belong in a child device. | ||
90 | Always look at the parent devices themselves for determining device | ||
91 | context properties. If the device 'eth0' or 'sda' does not have a | ||
92 | "driver"-link, then this device does not have a driver. Its value is empty. | ||
93 | Never copy any property of the parent-device into a child-device. Parent | ||
94 | device-properties may change dynamically without any notice to the | ||
95 | child device. | ||
96 | |||
97 | - Hierarchy in a single device-tree | ||
98 | There is only one valid place in sysfs where hierarchy can be examined | ||
99 | and this is below: /sys/devices. | ||
100 | It is planned, that all device directories will end up in the tree | ||
101 | below this directory. | ||
102 | |||
103 | - Classification by subsystem | ||
104 | There are currently three places for classification of devices: | ||
105 | /sys/block, /sys/class and /sys/bus. It is planned that these will | ||
106 | not contain any device-directories themselves, but only flat lists of | ||
107 | symlinks pointing to the unified /sys/devices tree. | ||
108 | All three places have completely different rules on how to access | ||
109 | device information. It is planned to merge all three | ||
110 | classification-directories into one place at /sys/subsystem, | ||
111 | following the layout of the bus-directories. All buses and | ||
112 | classes, including the converted block-subsystem, will show up | ||
113 | there. | ||
114 | The devices belonging to a subsystem will create a symlink in the | ||
115 | "devices" directory at /sys/subsystem/<name>/devices. | ||
116 | |||
117 | If /sys/subsystem exists, /sys/bus, /sys/class and /sys/block can be | ||
118 | ignored. If it does not exist, you always have to scan all three | ||
119 | places, as the kernel is free to move a subsystem from one place to | ||
120 | the other, as long as the devices are still reachable by the same | ||
121 | subsystem name. | ||
122 | |||
123 | Assuming /sys/class/<subsystem> and /sys/bus/<subsystem>, or | ||
124 | /sys/block and /sys/class/block are not interchangeable, is a bug in | ||
125 | the application. | ||
126 | |||
127 | - Block | ||
128 | The converted block-subsystem at /sys/class/block, or | ||
129 | /sys/subsystem/block will contain the links for disks and partitions | ||
130 | at the same level, never in a hierarchy. Assuming the block-subsystem to | ||
131 | contain only disks and not partition-devices in the same flat list is | ||
132 | a bug in the application. | ||
133 | |||
134 | - "device"-link and <subsystem>:<kernel name>-links | ||
135 | Never depend on the "device"-link. The "device"-link is a workaround | ||
136 | for the old layout, where class-devices are not created in | ||
137 | /sys/devices/ like the bus-devices. If the link-resolving of a | ||
138 | device-directory does not end in /sys/devices/, you can use the | ||
139 | "device"-link to find the parent devices in /sys/devices/. That is the | ||
140 | single valid use of the "device"-link, it must never appear in any | ||
141 | path as an element. Assuming the existence of the "device"-link for | ||
142 | a device in /sys/devices/ is a bug in the application. | ||
143 | Accessing /sys/class/net/eth0/device is a bug in the application. | ||
144 | |||
145 | Never depend on the class-specific links back to the /sys/class | ||
146 | directory. These links are also a workaround for the design mistake | ||
147 | that class-devices are not created in /sys/devices. If a device | ||
148 | directory does not contain directories for child devices, these links | ||
149 | may be used to find the child devices in /sys/class. That is the single | ||
150 | valid use of these links, they must never appear in any path as an | ||
151 | element. Assuming the existence of these links for devices which are | ||
152 | real child device directories in the /sys/devices tree, is a bug in | ||
153 | the application. | ||
154 | |||
155 | It is planned to remove all these links when all class-device | ||
156 | directories live in /sys/devices. | ||
157 | |||
158 | - Position of devices along device chain can change. | ||
159 | Never depend on a specific parent device position in the devpath, | ||
160 | or the chain of parent devices. The kernel is free to insert devices into | ||
161 | the chain. You must always request the parent device you are looking for | ||
162 | by its subsystem value. You need to walk up the chain until you find | ||
163 | the device that matches the expected subsystem. Depending on a specific | ||
164 | position of a parent device, or exposing relative paths, using "../" to | ||
165 | access the chain of parents, is a bug in the application. | ||
166 | |||
diff --git a/Documentation/thinkpad-acpi.txt b/Documentation/thinkpad-acpi.txt index 2d4803359a04..9e6b94face4b 100644 --- a/Documentation/thinkpad-acpi.txt +++ b/Documentation/thinkpad-acpi.txt | |||
@@ -138,7 +138,7 @@ Hot keys | |||
138 | -------- | 138 | -------- |
139 | 139 | ||
140 | procfs: /proc/acpi/ibm/hotkey | 140 | procfs: /proc/acpi/ibm/hotkey |
141 | sysfs device attribute: hotkey/* | 141 | sysfs device attribute: hotkey_* |
142 | 142 | ||
143 | Without this driver, only the Fn-F4 key (sleep button) generates an | 143 | Without this driver, only the Fn-F4 key (sleep button) generates an |
144 | ACPI event. With the driver loaded, the hotkey feature enabled and the | 144 | ACPI event. With the driver loaded, the hotkey feature enabled and the |
@@ -196,10 +196,7 @@ The following commands can be written to the /proc/acpi/ibm/hotkey file: | |||
196 | 196 | ||
197 | sysfs notes: | 197 | sysfs notes: |
198 | 198 | ||
199 | The hot keys attributes are in a hotkey/ subdirectory off the | 199 | hotkey_bios_enabled: |
200 | thinkpad device. | ||
201 | |||
202 | bios_enabled: | ||
203 | Returns the status of the hot keys feature when | 200 | Returns the status of the hot keys feature when |
204 | thinkpad-acpi was loaded. Upon module unload, the hot | 201 | thinkpad-acpi was loaded. Upon module unload, the hot |
205 | key feature status will be restored to this value. | 202 | key feature status will be restored to this value. |
@@ -207,19 +204,19 @@ sysfs notes: | |||
207 | 0: hot keys were disabled | 204 | 0: hot keys were disabled |
208 | 1: hot keys were enabled | 205 | 1: hot keys were enabled |
209 | 206 | ||
210 | bios_mask: | 207 | hotkey_bios_mask: |
211 | Returns the hot keys mask when thinkpad-acpi was loaded. | 208 | Returns the hot keys mask when thinkpad-acpi was loaded. |
212 | Upon module unload, the hot keys mask will be restored | 209 | Upon module unload, the hot keys mask will be restored |
213 | to this value. | 210 | to this value. |
214 | 211 | ||
215 | enable: | 212 | hotkey_enable: |
216 | Enables/disables the hot keys feature, and reports | 213 | Enables/disables the hot keys feature, and reports |
217 | current status of the hot keys feature. | 214 | current status of the hot keys feature. |
218 | 215 | ||
219 | 0: disables the hot keys feature / feature disabled | 216 | 0: disables the hot keys feature / feature disabled |
220 | 1: enables the hot keys feature / feature enabled | 217 | 1: enables the hot keys feature / feature enabled |
221 | 218 | ||
222 | mask: | 219 | hotkey_mask: |
223 | bit mask to enable ACPI event generation for each hot | 220 | bit mask to enable ACPI event generation for each hot |
224 | key (see above). Returns the current status of the hot | 221 | key (see above). Returns the current status of the hot |
225 | keys mask, and allows one to modify it. | 222 | keys mask, and allows one to modify it. |
@@ -229,7 +226,7 @@ Bluetooth | |||
229 | --------- | 226 | --------- |
230 | 227 | ||
231 | procfs: /proc/acpi/ibm/bluetooth | 228 | procfs: /proc/acpi/ibm/bluetooth |
232 | sysfs device attribute: bluetooth/enable | 229 | sysfs device attribute: bluetooth_enable |
233 | 230 | ||
234 | This feature shows the presence and current state of a ThinkPad | 231 | This feature shows the presence and current state of a ThinkPad |
235 | Bluetooth device in the internal ThinkPad CDC slot. | 232 | Bluetooth device in the internal ThinkPad CDC slot. |
@@ -244,7 +241,7 @@ If Bluetooth is installed, the following commands can be used: | |||
244 | Sysfs notes: | 241 | Sysfs notes: |
245 | 242 | ||
246 | If the Bluetooth CDC card is installed, it can be enabled / | 243 | If the Bluetooth CDC card is installed, it can be enabled / |
247 | disabled through the "bluetooth/enable" thinkpad-acpi device | 244 | disabled through the "bluetooth_enable" thinkpad-acpi device |
248 | attribute, and its current status can also be queried. | 245 | attribute, and its current status can also be queried. |
249 | 246 | ||
250 | enable: | 247 | enable: |
@@ -252,7 +249,7 @@ Sysfs notes: | |||
252 | 1: enables Bluetooth / Bluetooth is enabled. | 249 | 1: enables Bluetooth / Bluetooth is enabled. |
253 | 250 | ||
254 | Note: this interface will probably be superseded by the | 251 | Note: this interface will probably be superseded by the |
255 | generic rfkill class. | 252 | generic rfkill class, so it is NOT to be considered stable yet. |
256 | 253 | ||
257 | Video output control -- /proc/acpi/ibm/video | 254 | Video output control -- /proc/acpi/ibm/video |
258 | -------------------------------------------- | 255 | -------------------------------------------- |
@@ -898,7 +895,7 @@ EXPERIMENTAL: WAN | |||
898 | ----------------- | 895 | ----------------- |
899 | 896 | ||
900 | procfs: /proc/acpi/ibm/wan | 897 | procfs: /proc/acpi/ibm/wan |
901 | sysfs device attribute: wwan/enable | 898 | sysfs device attribute: wwan_enable |
902 | 899 | ||
903 | This feature is marked EXPERIMENTAL because the implementation | 900 | This feature is marked EXPERIMENTAL because the implementation |
904 | directly accesses hardware registers and may not work as expected. USE | 901 | directly accesses hardware registers and may not work as expected. USE |
@@ -921,7 +918,7 @@ If the W-WAN card is installed, the following commands can be used: | |||
921 | Sysfs notes: | 918 | Sysfs notes: |
922 | 919 | ||
923 | If the W-WAN card is installed, it can be enabled / | 920 | If the W-WAN card is installed, it can be enabled / |
924 | disabled through the "wwan/enable" thinkpad-acpi device | 921 | disabled through the "wwan_enable" thinkpad-acpi device |
925 | attribute, and its current status can also be queried. | 922 | attribute, and its current status can also be queried. |
926 | 923 | ||
927 | enable: | 924 | enable: |
@@ -929,7 +926,7 @@ Sysfs notes: | |||
929 | 1: enables WWAN card / WWAN card is enabled. | 926 | 1: enables WWAN card / WWAN card is enabled. |
930 | 927 | ||
931 | Note: this interface will probably be superseded by the | 928 | Note: this interface will probably be superseded by the |
932 | generic rfkill class. | 929 | generic rfkill class, so it is NOT to be considered stable yet. |
933 | 930 | ||
934 | Multiple Commands, Module Parameters | 931 | Multiple Commands, Module Parameters |
935 | ------------------------------------ | 932 | ------------------------------------ |
diff --git a/Documentation/usb/dma.txt b/Documentation/usb/dma.txt index 62844aeba69c..e8b50b7de9d9 100644 --- a/Documentation/usb/dma.txt +++ b/Documentation/usb/dma.txt | |||
@@ -32,12 +32,15 @@ ELIMINATING COPIES | |||
32 | It's good to avoid making CPUs copy data needlessly. The costs can add up, | 32 | It's good to avoid making CPUs copy data needlessly. The costs can add up, |
33 | and effects like cache-trashing can impose subtle penalties. | 33 | and effects like cache-trashing can impose subtle penalties. |
34 | 34 | ||
35 | - When you're allocating a buffer for DMA purposes anyway, use the buffer | 35 | - If you're doing lots of small data transfers from the same buffer all |
36 | primitives. Think of them as kmalloc and kfree that give you the right | 36 | the time, that can really burn up resources on systems which use an |
37 | kind of addresses to store in urb->transfer_buffer and urb->transfer_dma, | 37 | IOMMU to manage the DMA mappings. It can cost MUCH more to set up and |
38 | while guaranteeing that no hidden copies through DMA "bounce" buffers will | 38 | tear down the IOMMU mappings with each request than perform the I/O! |
39 | slow things down. You'd also set URB_NO_TRANSFER_DMA_MAP in | 39 | |
40 | urb->transfer_flags: | 40 | For those specific cases, USB has primitives to allocate less expensive |
41 | memory. They work like kmalloc and kfree versions that give you the right | ||
42 | kind of addresses to store in urb->transfer_buffer and urb->transfer_dma. | ||
43 | You'd also set URB_NO_TRANSFER_DMA_MAP in urb->transfer_flags: | ||
41 | 44 | ||
42 | void *usb_buffer_alloc (struct usb_device *dev, size_t size, | 45 | void *usb_buffer_alloc (struct usb_device *dev, size_t size, |
43 | int mem_flags, dma_addr_t *dma); | 46 | int mem_flags, dma_addr_t *dma); |
@@ -45,6 +48,10 @@ and effects like cache-trashing can impose subtle penalties. | |||
45 | void usb_buffer_free (struct usb_device *dev, size_t size, | 48 | void usb_buffer_free (struct usb_device *dev, size_t size, |
46 | void *addr, dma_addr_t dma); | 49 | void *addr, dma_addr_t dma); |
47 | 50 | ||
51 | Most drivers should *NOT* be using these primitives; they don't need | ||
52 | to use this type of memory ("dma-coherent"), and memory returned from | ||
53 | kmalloc() will work just fine. | ||
54 | |||
48 | For control transfers you can use the buffer primitives or not for each | 55 | For control transfers you can use the buffer primitives or not for each |
49 | of the transfer buffer and setup buffer independently. Set the flag bits | 56 | of the transfer buffer and setup buffer independently. Set the flag bits |
50 | URB_NO_TRANSFER_DMA_MAP and URB_NO_SETUP_DMA_MAP to indicate which | 57 | URB_NO_TRANSFER_DMA_MAP and URB_NO_SETUP_DMA_MAP to indicate which |
@@ -54,29 +61,39 @@ and effects like cache-trashing can impose subtle penalties. | |||
54 | The memory buffer returned is "dma-coherent"; sometimes you might need to | 61 | The memory buffer returned is "dma-coherent"; sometimes you might need to |
55 | force a consistent memory access ordering by using memory barriers. It's | 62 | force a consistent memory access ordering by using memory barriers. It's |
56 | not using a streaming DMA mapping, so it's good for small transfers on | 63 | not using a streaming DMA mapping, so it's good for small transfers on |
57 | systems where the I/O would otherwise tie up an IOMMU mapping. (See | 64 | systems where the I/O would otherwise thrash an IOMMU mapping. (See |
58 | Documentation/DMA-mapping.txt for definitions of "coherent" and "streaming" | 65 | Documentation/DMA-mapping.txt for definitions of "coherent" and "streaming" |
59 | DMA mappings.) | 66 | DMA mappings.) |
60 | 67 | ||
61 | Asking for 1/Nth of a page (as well as asking for N pages) is reasonably | 68 | Asking for 1/Nth of a page (as well as asking for N pages) is reasonably |
62 | space-efficient. | 69 | space-efficient. |
63 | 70 | ||
71 | On most systems the memory returned will be uncached, because the | ||
72 | semantics of dma-coherent memory require either bypassing CPU caches | ||
73 | or using cache hardware with bus-snooping support. While x86 hardware | ||
74 | has such bus-snooping, many other systems use software to flush cache | ||
75 | lines to prevent DMA conflicts. | ||
76 | |||
64 | - Devices on some EHCI controllers could handle DMA to/from high memory. | 77 | - Devices on some EHCI controllers could handle DMA to/from high memory. |
65 | Driver probe() routines can notice this using a generic DMA call, then | ||
66 | tell higher level code (network, scsi, etc) about it like this: | ||
67 | 78 | ||
68 | if (dma_supported (&intf->dev, 0xffffffffffffffffULL)) | 79 | Unfortunately, the current Linux DMA infrastructure doesn't have a sane |
69 | net->features |= NETIF_F_HIGHDMA; | 80 | way to expose these capabilities ... and in any case, HIGHMEM is mostly a |
81 | design wart specific to x86_32. So your best bet is to ensure you never | ||
82 | pass a highmem buffer into a USB driver. That's easy; it's the default | ||
83 | behavior. Just don't override it; e.g. with NETIF_F_HIGHDMA. | ||
70 | 84 | ||
71 | That can eliminate dma bounce buffering of requests that originate (or | 85 | This may force your callers to do some bounce buffering, copying from |
72 | terminate) in high memory, in cases where the buffers aren't allocated | 86 | high memory to "normal" DMA memory. If you can come up with a good way |
73 | with usb_buffer_alloc() but instead are dma-mapped. | 87 | to fix this issue (for x86_32 machines with over 1 GByte of memory), |
88 | feel free to submit patches. | ||
74 | 89 | ||
75 | 90 | ||
76 | WORKING WITH EXISTING BUFFERS | 91 | WORKING WITH EXISTING BUFFERS |
77 | 92 | ||
78 | Existing buffers aren't usable for DMA without first being mapped into the | 93 | Existing buffers aren't usable for DMA without first being mapped into the |
79 | DMA address space of the device. | 94 | DMA address space of the device. However, most buffers passed to your |
95 | driver can safely be used with such DMA mapping. (See the first section | ||
96 | of DMA-mapping.txt, titled "What memory is DMA-able?") | ||
80 | 97 | ||
81 | - When you're using scatterlists, you can map everything at once. On some | 98 | - When you're using scatterlists, you can map everything at once. On some |
82 | systems, this kicks in an IOMMU and turns the scatterlists into single | 99 | systems, this kicks in an IOMMU and turns the scatterlists into single |
@@ -114,3 +131,8 @@ DMA address space of the device. | |||
114 | The calls manage urb->transfer_dma for you, and set URB_NO_TRANSFER_DMA_MAP | 131 | The calls manage urb->transfer_dma for you, and set URB_NO_TRANSFER_DMA_MAP |
115 | so that usbcore won't map or unmap the buffer. The same goes for | 132 | so that usbcore won't map or unmap the buffer. The same goes for |
116 | urb->setup_dma and URB_NO_SETUP_DMA_MAP for control requests. | 133 | urb->setup_dma and URB_NO_SETUP_DMA_MAP for control requests. |
134 | |||
135 | Note that several of those interfaces are currently commented out, since | ||
136 | they don't have current users. See the source code. Other than the dmasync | ||
137 | calls (where the underlying DMA primitives have changed), most of them can | ||
138 | easily be commented back in if you want to use them. | ||
diff --git a/Documentation/usb/persist.txt b/Documentation/usb/persist.txt new file mode 100644 index 000000000000..df54d645cbb5 --- /dev/null +++ b/Documentation/usb/persist.txt | |||
@@ -0,0 +1,156 @@ | |||
1 | USB device persistence during system suspend | ||
2 | |||
3 | Alan Stern <stern@rowland.harvard.edu> | ||
4 | |||
5 | September 2, 2006 (Updated May 29, 2007) | ||
6 | |||
7 | |||
8 | What is the problem? | ||
9 | |||
10 | According to the USB specification, when a USB bus is suspended the | ||
11 | bus must continue to supply suspend current (around 1-5 mA). This | ||
12 | is so that devices can maintain their internal state and hubs can | ||
13 | detect connect-change events (devices being plugged in or unplugged). | ||
14 | The technical term is "power session". | ||
15 | |||
16 | If a USB device's power session is interrupted then the system is | ||
17 | required to behave as though the device has been unplugged. It's a | ||
18 | conservative approach; in the absence of suspend current the computer | ||
19 | has no way to know what has actually happened. Perhaps the same | ||
20 | device is still attached or perhaps it was removed and a different | ||
21 | device plugged into the port. The system must assume the worst. | ||
22 | |||
23 | By default, Linux behaves according to the spec. If a USB host | ||
24 | controller loses power during a system suspend, then when the system | ||
25 | wakes up all the devices attached to that controller are treated as | ||
26 | though they had disconnected. This is always safe and it is the | ||
27 | "officially correct" thing to do. | ||
28 | |||
29 | For many sorts of devices this behavior doesn't matter in the least. | ||
30 | If the kernel wants to believe that your USB keyboard was unplugged | ||
31 | while the system was asleep and a new keyboard was plugged in when the | ||
32 | system woke up, who cares? It'll still work the same when you type on | ||
33 | it. | ||
34 | |||
35 | Unfortunately problems _can_ arise, particularly with mass-storage | ||
36 | devices. The effect is exactly the same as if the device really had | ||
37 | been unplugged while the system was suspended. If you had a mounted | ||
38 | filesystem on the device, you're out of luck -- everything in that | ||
39 | filesystem is now inaccessible. This is especially annoying if your | ||
40 | root filesystem was located on the device, since your system will | ||
41 | instantly crash. | ||
42 | |||
43 | Loss of power isn't the only mechanism to worry about. Anything that | ||
44 | interrupts a power session will have the same effect. For example, | ||
45 | even though suspend current may have been maintained while the system | ||
46 | was asleep, on many systems during the initial stages of wakeup the | ||
47 | firmware (i.e., the BIOS) resets the motherboard's USB host | ||
48 | controllers. Result: all the power sessions are destroyed and again | ||
49 | it's as though you had unplugged all the USB devices. Yes, it's | ||
50 | entirely the BIOS's fault, but that doesn't do _you_ any good unless | ||
51 | you can convince the BIOS supplier to fix the problem (lots of luck!). | ||
52 | |||
53 | On many systems the USB host controllers will get reset after a | ||
54 | suspend-to-RAM. On almost all systems, no suspend current is | ||
55 | available during hibernation (also known as swsusp or suspend-to-disk). | ||
56 | You can check the kernel log after resuming to see if either of these | ||
57 | has happened; look for lines saying "root hub lost power or was reset". | ||
58 | |||
59 | In practice, people are forced to unmount any filesystems on a USB | ||
60 | device before suspending. If the root filesystem is on a USB device, | ||
61 | the system can't be suspended at all. (All right, it _can_ be | ||
62 | suspended -- but it will crash as soon as it wakes up, which isn't | ||
63 | much better.) | ||
64 | |||
65 | |||
66 | What is the solution? | ||
67 | |||
68 | Setting CONFIG_USB_PERSIST will cause the kernel to work around these | ||
69 | issues. It enables a mode in which the core USB device data | ||
70 | structures are allowed to persist across a power-session disruption. | ||
71 | It works like this. If the kernel sees that a USB host controller is | ||
72 | not in the expected state during resume (i.e., if the controller was | ||
73 | reset or otherwise had lost power) then it applies a persistence check | ||
74 | to each of the USB devices below that controller for which the | ||
75 | "persist" attribute is set. It doesn't try to resume the device; that | ||
76 | can't work once the power session is gone. Instead it issues a USB | ||
77 | port reset and then re-enumerates the device. (This is exactly the | ||
78 | same thing that happens whenever a USB device is reset.) If the | ||
79 | re-enumeration shows that the device now attached to that port has the | ||
80 | same descriptors as before, including the Vendor and Product IDs, then | ||
81 | the kernel continues to use the same device structure. In effect, the | ||
82 | kernel treats the device as though it had merely been reset instead of | ||
83 | unplugged. | ||
84 | |||
85 | If no device is now attached to the port, or if the descriptors are | ||
86 | different from what the kernel remembers, then the treatment is what | ||
87 | you would expect. The kernel destroys the old device structure and | ||
88 | behaves as though the old device had been unplugged and a new device | ||
89 | plugged in, just as it would without the CONFIG_USB_PERSIST option. | ||
90 | |||
91 | The end result is that the USB device remains available and usable. | ||
92 | Filesystem mounts and memory mappings are unaffected, and the world is | ||
93 | now a good and happy place. | ||
94 | |||
95 | Note that even when CONFIG_USB_PERSIST is set, the "persist" feature | ||
96 | will be applied only to those devices for which it is enabled. You | ||
97 | can enable the feature by doing (as root): | ||
98 | |||
99 | echo 1 >/sys/bus/usb/devices/.../power/persist | ||
100 | |||
101 | where the "..." should be filled in with the device's ID. Disable | ||
102 | the feature by writing 0 instead of 1. For hubs the feature is | ||
103 | automatically and permanently enabled, so you only have to worry about | ||
104 | setting it for devices where it really matters. | ||
105 | |||
106 | |||
107 | Is this the best solution? | ||
108 | |||
109 | Perhaps not. Arguably, keeping track of mounted filesystems and | ||
110 | memory mappings across device disconnects should be handled by a | ||
111 | centralized Logical Volume Manager. Such a solution would allow you | ||
112 | to plug in a USB flash device, create a persistent volume associated | ||
113 | with it, unplug the flash device, plug it back in later, and still | ||
114 | have the same persistent volume associated with the device. As such | ||
115 | it would be more far-reaching than CONFIG_USB_PERSIST. | ||
116 | |||
117 | On the other hand, writing a persistent volume manager would be a big | ||
118 | job and using it would require significant input from the user. This | ||
119 | solution is much quicker and easier -- and it exists now, a giant | ||
120 | point in its favor! | ||
121 | |||
122 | Furthermore, the USB_PERSIST option applies to _all_ USB devices, not | ||
123 | just mass-storage devices. It might turn out to be equally useful for | ||
124 | other device types, such as network interfaces. | ||
125 | |||
126 | |||
127 | WARNING: Using CONFIG_USB_PERSIST can be dangerous!! | ||
128 | |||
129 | When recovering an interrupted power session the kernel does its best | ||
130 | to make sure the USB device hasn't been changed; that is, the same | ||
131 | device is still plugged into the port as before. But the checks | ||
132 | aren't guaranteed to be 100% accurate. | ||
133 | |||
134 | If you replace one USB device with another of the same type (same | ||
135 | manufacturer, same IDs, and so on) there's an excellent chance the | ||
136 | kernel won't detect the change. Serial numbers and other strings are | ||
137 | not compared. In many cases it wouldn't help if they were, because | ||
138 | manufacturers frequently omit serial numbers entirely in their | ||
139 | devices. | ||
140 | |||
141 | Furthermore it's quite possible to leave a USB device exactly the same | ||
142 | while changing its media. If you replace the flash memory card in a | ||
143 | USB card reader while the system is asleep, the kernel will have no | ||
144 | way to know you did it. The kernel will assume that nothing has | ||
145 | happened and will continue to use the partition tables, inodes, and | ||
146 | memory mappings for the old card. | ||
147 | |||
148 | If the kernel gets fooled in this way, it's almost certain to cause | ||
149 | data corruption and to crash your system. You'll have no one to blame | ||
150 | but yourself. | ||
151 | |||
152 | YOU HAVE BEEN WARNED! USE AT YOUR OWN RISK! | ||
153 | |||
154 | That having been said, most of the time there shouldn't be any trouble | ||
155 | at all. The "persist" feature can be extremely useful. Make the most | ||
156 | of it. | ||
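The persistence check described in persist.txt above can be illustrated with a small user-space sketch. It is not the kernel's implementation: struct dev_snapshot, persist_check() and the choice of compared fields are hypothetical simplifications of "same descriptors as before, including the Vendor and Product IDs". As in the text, no serial number or other string is part of the comparison.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a cached device descriptor. */
struct dev_snapshot {
    uint16_t id_vendor;
    uint16_t id_product;
    uint16_t bcd_device;
    uint8_t  device_class;
    uint8_t  num_configurations;
};

enum persist_result {
    PERSIST_KEEP,     /* keep using the old device structure */
    PERSIST_REPLACE   /* treat as unplugged, then newly plugged in */
};

/*
 * Compare what was remembered before suspend with what re-enumeration
 * found after the port reset. now == NULL means nothing is attached
 * to the port any more.
 */
enum persist_result persist_check(const struct dev_snapshot *before,
                                  const struct dev_snapshot *now)
{
    assert(before != NULL);

    if (now == NULL)
        return PERSIST_REPLACE;

    /* Field-by-field comparison avoids comparing struct padding. */
    if (before->id_vendor != now->id_vendor ||
        before->id_product != now->id_product ||
        before->bcd_device != now->bcd_device ||
        before->device_class != now->device_class ||
        before->num_configurations != now->num_configurations)
        return PERSIST_REPLACE;

    return PERSIST_KEEP;
}
```

Two different physical devices of the same model produce identical snapshots here, so the check returns PERSIST_KEEP for them. That is the failure mode the warning section describes.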
diff --git a/Documentation/vm/slub.txt b/Documentation/vm/slub.txt index 727c8d81aeaf..1523320abd87 100644 --- a/Documentation/vm/slub.txt +++ b/Documentation/vm/slub.txt | |||
@@ -1,13 +1,9 @@ | |||
1 | Short users guide for SLUB | 1 | Short users guide for SLUB |
2 | -------------------------- | 2 | -------------------------- |
3 | 3 | ||
4 | First of all slub should transparently replace SLAB. If you enable | ||
5 | SLUB then everything should work the same (Note the word "should". | ||
6 | There is likely not much value in that word at this point). | ||
7 | |||
8 | The basic philosophy of SLUB is very different from SLAB. SLAB | 4 | The basic philosophy of SLUB is very different from SLAB. SLAB |
9 | requires rebuilding the kernel to activate debug options for all | 5 | requires rebuilding the kernel to activate debug options for all |
10 | SLABS. SLUB always includes full debugging but its off by default. | 6 | slab caches. SLUB always includes full debugging but it is off by default. |
11 | SLUB can enable debugging only for selected slabs in order to avoid | 7 | SLUB can enable debugging only for selected slabs in order to avoid |
12 | an impact on overall system performance which may make a bug more | 8 | an impact on overall system performance which may make a bug more |
13 | difficult to find. | 9 | difficult to find. |
@@ -76,13 +72,28 @@ of objects. | |||
76 | Careful with tracing: It may spew out lots of information and never stop if | 72 | Careful with tracing: It may spew out lots of information and never stop if |
77 | used on the wrong slab. | 73 | used on the wrong slab. |
78 | 74 | ||
79 | SLAB Merging | 75 | Slab merging |
80 | ------------ | 76 | ------------ |
81 | 77 | ||
82 | If no debugging is specified then SLUB may merge similar slabs together | 78 | If no debug options are specified then SLUB may merge similar slabs together |
83 | in order to reduce overhead and increase cache hotness of objects. | 79 | in order to reduce overhead and increase cache hotness of objects. |
84 | slabinfo -a displays which slabs were merged together. | 80 | slabinfo -a displays which slabs were merged together. |
85 | 81 | ||
82 | Slab validation | ||
83 | --------------- | ||
84 | |||
85 | SLUB can validate all objects if the kernel was booted with slub_debug. In | ||
86 | order to do so you must have the slabinfo tool. Then you can do | ||
87 | |||
88 | slabinfo -v | ||
89 | |||
90 | which will test all objects. Output will be generated to the syslog. | ||
91 | |||
92 | This also works in a more limited way if boot was without slab debug. | ||
93 | In that case slabinfo -v simply tests all reachable objects. Usually | ||
94 | these are in the cpu slabs and the partial slabs. Full slabs are not | ||
95 | tracked by SLUB in a non debug situation. | ||
96 | |||
86 | Getting more performance | 97 | Getting more performance |
87 | ------------------------ | 98 | ------------------------ |
88 | 99 | ||
@@ -91,9 +102,9 @@ list_lock once in a while to deal with partial slabs. That overhead is | |||
91 | governed by the order of the allocation for each slab. The allocations | 102 | governed by the order of the allocation for each slab. The allocations |
92 | can be influenced by kernel parameters: | 103 | can be influenced by kernel parameters: |
93 | 104 | ||
94 | slub_min_objects=x (default 8) | 105 | slub_min_objects=x (default 4) |
95 | slub_min_order=x (default 0) | 106 | slub_min_order=x (default 0) |
96 | slub_max_order=x (default 4) | 107 | slub_max_order=x (default 1) |
97 | 108 | ||
98 | slub_min_objects specifies how many objects must at least fit | 109 | slub_min_objects specifies how many objects must at least fit |
99 | into one slab in order for the allocation order to be acceptable. | 110 | into one slab in order for the allocation order to be acceptable. |
@@ -109,5 +120,107 @@ longer be checked. This is useful to avoid SLUB trying to generate | |||
109 | super large order pages to fit slub_min_objects of a slab cache with | 120 | super large order pages to fit slub_min_objects of a slab cache with |
110 | large object sizes into one high order page. | 121 | large object sizes into one high order page. |
111 | 122 | ||
112 | 123 | SLUB Debug output | |
113 | Christoph Lameter, <clameter@sgi.com>, April 10, 2007 | 124 | ----------------- |
125 | |||
126 | Here is a sample of slub debug output: | ||
127 | |||
128 | *** SLUB kmalloc-8: Redzone Active@0xc90f6d20 slab 0xc528c530 offset=3360 flags=0x400000c3 inuse=61 freelist=0xc90f6d58 | ||
129 | Bytes b4 0xc90f6d10: 00 00 00 00 00 00 00 00 5a 5a 5a 5a 5a 5a 5a 5a ........ZZZZZZZZ | ||
130 | Object 0xc90f6d20: 31 30 31 39 2e 30 30 35 1019.005 | ||
131 | Redzone 0xc90f6d28: 00 cc cc cc . | ||
132 | FreePointer 0xc90f6d2c -> 0xc90f6d58 | ||
133 | Last alloc: get_modalias+0x61/0xf5 jiffies_ago=53 cpu=1 pid=554 | ||
134 | Filler 0xc90f6d50: 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZ | ||
135 | [<c010523d>] dump_trace+0x63/0x1eb | ||
136 | [<c01053df>] show_trace_log_lvl+0x1a/0x2f | ||
137 | [<c010601d>] show_trace+0x12/0x14 | ||
138 | [<c0106035>] dump_stack+0x16/0x18 | ||
139 | [<c017e0fa>] object_err+0x143/0x14b | ||
140 | [<c017e2cc>] check_object+0x66/0x234 | ||
141 | [<c017eb43>] __slab_free+0x239/0x384 | ||
142 | [<c017f446>] kfree+0xa6/0xc6 | ||
143 | [<c02e2335>] get_modalias+0xb9/0xf5 | ||
144 | [<c02e23b7>] dmi_dev_uevent+0x27/0x3c | ||
145 | [<c027866a>] dev_uevent+0x1ad/0x1da | ||
146 | [<c0205024>] kobject_uevent_env+0x20a/0x45b | ||
147 | [<c020527f>] kobject_uevent+0xa/0xf | ||
148 | [<c02779f1>] store_uevent+0x4f/0x58 | ||
149 | [<c027758e>] dev_attr_store+0x29/0x2f | ||
150 | [<c01bec4f>] sysfs_write_file+0x16e/0x19c | ||
151 | [<c0183ba7>] vfs_write+0xd1/0x15a | ||
152 | [<c01841d7>] sys_write+0x3d/0x72 | ||
153 | [<c0104112>] sysenter_past_esp+0x5f/0x99 | ||
154 | [<b7f7b410>] 0xb7f7b410 | ||
155 | ======================= | ||
156 | @@@ SLUB kmalloc-8: Restoring redzone (0xcc) from 0xc90f6d28-0xc90f6d2b | ||
157 | |||
158 | |||
159 | |||
160 | If SLUB encounters a corrupted object then it will perform the following | ||
161 | actions: | ||
162 | |||
163 | 1. Isolation and report of the issue | ||
164 | |||
165 | This will be a message in the system log starting with | ||
166 | |||
167 | *** SLUB <slab cache affected>: <What went wrong>@<object address> | ||
168 | offset=<offset of object into slab> flags=<slabflags> | ||
169 | inuse=<objects in use in this slab> freelist=<first free object in slab> | ||
170 | |||
171 | 2. Report on how the problem was dealt with in order to ensure the continued | ||
172 | operation of the system. | ||
173 | |||
174 | These are messages in the system log beginning with | ||
175 | |||
176 | @@@ SLUB <slab cache affected>: <corrective action taken> | ||
177 | |||
178 | |||
179 | In the above sample SLUB found that the Redzone of an active object has | ||
180 | been overwritten. Here a string of 8 characters was written into a slab that | ||
181 | has the length of 8 characters. However, an 8 character string needs a | ||
182 | terminating 0. That zero has overwritten the first byte of the Redzone field. | ||
183 | After reporting the details of the issue encountered the @@@ SLUB message | ||
184 | tells us that SLUB has restored the redzone to its proper value and then | ||
185 | system operations continue. | ||
186 | |||
187 | Various types of lines can follow the *** SLUB line: | ||
188 | |||
189 | Bytes b4 <address> : <bytes> | ||
190 | Show a few bytes before the object where the problem was detected. | ||
191 | Can be useful if the corruption does not stop with the start of the | ||
192 | object. | ||
193 | |||
194 | Object <address> : <bytes> | ||
195 | The bytes of the object. If the object is inactive then the bytes | ||
196 | typically contain poisoning values. Any non-poison value shows a | ||
197 | corruption by a write after free. | ||
198 | |||
199 | Redzone <address> : <bytes> | ||
200 | The redzone following the object. The redzone is used to detect | ||
201 | writes after the object. All bytes should always have the same | ||
202 | value. If there is any deviation then it is due to a write after | ||
203 | the object boundary. | ||
204 | |||
205 | Freepointer | ||
206 | The pointer to the next free object in the slab. May become | ||
207 | corrupted if overwriting continues after the red zone. | ||
208 | |||
209 | Last alloc: | ||
210 | Last free: | ||
211 | Shows the address from which the object was allocated/freed last. | ||
212 | We note the pid, the time and the CPU that did so. This is usually | ||
213 | the most useful information to figure out where things went wrong. | ||
214 | Here get_modalias() did a kmalloc(8) instead of a kmalloc(9). | ||
215 | |||
216 | Filler <address> : <bytes> | ||
217 | Unused data to fill up the space in order to get the next object | ||
218 | properly aligned. In the debug case we make sure that there are | ||
219 | at least 4 bytes of filler. This allows for the detection of writes | ||
220 | before the object. | ||
221 | |||
222 | Following the filler will be a stackdump. That stackdump describes the | ||
223 | location where the error was detected. The cause of the corruption is more | ||
224 | likely to be found by looking at the information about the last alloc / free. | ||
225 | |||
226 | Christoph Lameter, <clameter@sgi.com>, May 23, 2007 | ||
diff --git a/Documentation/volatile-considered-harmful.txt b/Documentation/volatile-considered-harmful.txt new file mode 100644 index 000000000000..10c2e411cca8 --- /dev/null +++ b/Documentation/volatile-considered-harmful.txt | |||
@@ -0,0 +1,119 @@ | |||
1 | Why the "volatile" type class should not be used | ||
2 | ------------------------------------------------ | ||
3 | |||
4 | C programmers have often taken volatile to mean that the variable could be | ||
5 | changed outside of the current thread of execution; as a result, they are | ||
6 | sometimes tempted to use it in kernel code when shared data structures are | ||
7 | being used. In other words, they have been known to treat volatile types | ||
8 | as a sort of easy atomic variable, which they are not. The use of volatile in | ||
9 | kernel code is almost never correct; this document describes why. | ||
10 | |||
11 | The key point to understand with regard to volatile is that its purpose is | ||
12 | to suppress optimization, which is almost never what one really wants to | ||
13 | do. In the kernel, one must protect shared data structures against | ||
14 | unwanted concurrent access, which is very much a different task. The | ||
15 | process of protecting against unwanted concurrency will also avoid almost | ||
16 | all optimization-related problems in a more efficient way. | ||
17 | |||
18 | Like volatile, the kernel primitives which make concurrent access to data | ||
19 | safe (spinlocks, mutexes, memory barriers, etc.) are designed to prevent | ||
20 | unwanted optimization. If they are being used properly, there will be no | ||
21 | need to use volatile as well. If volatile is still necessary, there is | ||
22 | almost certainly a bug in the code somewhere. In properly-written kernel | ||
23 | code, volatile can only serve to slow things down. | ||
24 | |||
25 | Consider a typical block of kernel code: | ||
26 | |||
27 | spin_lock(&the_lock); | ||
28 | do_something_on(&shared_data); | ||
29 | do_something_else_with(&shared_data); | ||
30 | spin_unlock(&the_lock); | ||
31 | |||
32 | If all the code follows the locking rules, the value of shared_data cannot | ||
33 | change unexpectedly while the_lock is held. Any other code which might | ||
34 | want to play with that data will be waiting on the lock. The spinlock | ||
35 | primitives act as memory barriers - they are explicitly written to do so - | ||
36 | meaning that data accesses will not be optimized across them. So the | ||
37 | compiler might think it knows what will be in shared_data, but the | ||
38 | spin_lock() call, since it acts as a memory barrier, will force it to | ||
39 | forget anything it knows. There will be no optimization problems with | ||
40 | accesses to that data. | ||
41 | |||
42 | If shared_data were declared volatile, the locking would still be | ||
43 | necessary. But the compiler would also be prevented from optimizing access | ||
44 | to shared_data _within_ the critical section, when we know that nobody else | ||
45 | can be working with it. While the lock is held, shared_data is not | ||
46 | volatile. When dealing with shared data, proper locking makes volatile | ||
47 | unnecessary - and potentially harmful. | ||
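The same reasoning can be tried out in userspace. The sketch below is an analogy, not kernel code: toy_spin_lock(), toy_spin_unlock() and update_shared() are hypothetical names, and a C11 atomic_flag stands in for a kernel spinlock. The acquire and release orderings play the role of the memory barriers, so shared_data is left non-volatile on purpose.

```c
#include <assert.h>
#include <stdatomic.h>

/* Toy userspace spinlock; a stand-in for the kernel's spinlock. */
static atomic_flag the_lock = ATOMIC_FLAG_INIT;

/* Deliberately not volatile: the lock is what protects it. */
static struct { long a; long b; } shared_data;

static void toy_spin_lock(atomic_flag *lock)
{
	/* Acquire ordering: later accesses cannot be moved before this. */
	while (atomic_flag_test_and_set_explicit(lock, memory_order_acquire))
		;
}

static void toy_spin_unlock(atomic_flag *lock)
{
	/* Release ordering: earlier accesses cannot be moved after this. */
	atomic_flag_clear_explicit(lock, memory_order_release);
}

/* Update both fields under the lock and return their sum. */
static long update_shared(long delta)
{
	long sum;

	toy_spin_lock(&the_lock);
	shared_data.a += delta;
	shared_data.b -= delta;
	sum = shared_data.a + shared_data.b;
	toy_spin_unlock(&the_lock);
	return sum;
}
```

Inside update_shared() the compiler remains free to keep shared_data in registers between the lock and the unlock, which is exactly the optimization that a volatile declaration would forbid.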
48 | |||
49 | The volatile storage class was originally meant for memory-mapped I/O | ||
50 | registers. Within the kernel, register accesses, too, should be protected | ||
51 | by locks, but one also does not want the compiler "optimizing" register | ||
52 | accesses within a critical section. But, within the kernel, I/O memory | ||
53 | accesses are always done through accessor functions; accessing I/O memory | ||
54 | directly through pointers is frowned upon and does not work on all | ||
55 | architectures. Those accessors are written to prevent unwanted | ||
56 | optimization, so, once again, volatile is unnecessary. | ||
57 | |||
58 | Another situation where one might be tempted to use volatile is | ||
59 | when the processor is busy-waiting on the value of a variable. The right | ||
60 | way to perform a busy wait is: | ||
61 | |||
62 | while (my_variable != what_i_want) | ||
63 | cpu_relax(); | ||
64 | |||
65 | The cpu_relax() call can lower CPU power consumption or yield to a | ||
66 | hyperthreaded twin processor; it also happens to serve as a memory barrier, | ||
67 | so, once again, volatile is unnecessary. Of course, busy-waiting is | ||
68 | generally an anti-social act to begin with. | ||
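A userspace approximation of that loop might look like the following. It is a sketch under stated assumptions: cpu_relax_sketch() is a hypothetical stand-in that provides only a compiler barrier through GCC/Clang inline assembly (the real cpu_relax() is architecture-specific), and the max_spins bound is added here only so that the function always returns.

```c
#include <assert.h>

/*
 * Stand-in for the kernel's cpu_relax(). This version is only a compiler
 * barrier: it forces the compiler to reload memory on each iteration.
 */
#define cpu_relax_sketch() __asm__ __volatile__("" ::: "memory")

static int my_variable;	/* deliberately not volatile */

/*
 * Spin until my_variable equals what_i_want or max_spins iterations have
 * passed. Returns the number of iterations spent waiting.
 */
static unsigned long wait_for_value(int what_i_want, unsigned long max_spins)
{
	unsigned long spins = 0;

	while (my_variable != what_i_want && spins < max_spins) {
		cpu_relax_sketch();
		spins++;
	}
	return spins;
}
```

Without the barrier, the compiler would be entitled to read my_variable once and spin on the cached value; with it, each pass through the loop rereads the variable from memory.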
69 | |||
70 | There are still a few rare situations where volatile makes sense in the | ||
71 | kernel: | ||
72 | |||
73 | - The above-mentioned accessor functions might use volatile on | ||
74 | architectures where direct I/O memory access does work. Essentially, | ||
75 | each accessor call becomes a little critical section on its own and | ||
76 | ensures that the access happens as expected by the programmer. | ||
77 | |||
78 | - Inline assembly code which changes memory, but which has no other | ||
79 | visible side effects, risks being deleted by GCC. Adding the volatile | ||
80 | keyword to asm statements will prevent this removal. | ||
81 | |||
82 | - The jiffies variable is special in that it can have a different value | ||
83 | every time it is referenced, but it can be read without any special | ||
84 | locking. So jiffies can be volatile, but the addition of other | ||
85 | variables of this type is strongly frowned upon. Jiffies is considered | ||
86 | to be a "stupid legacy" issue (Linus's words) in this regard; fixing it | ||
87 | would be more trouble than it is worth. | ||
88 | |||
89 | - Pointers to data structures in coherent memory which might be modified | ||
90 | by I/O devices can, sometimes, legitimately be volatile. A ring buffer | ||
91 | used by a network adapter, where that adapter changes pointers to | ||
92 | indicate which descriptors have been processed, is an example of this | ||
93 | type of situation. | ||
94 | |||
95 | For most code, none of the above justifications for volatile apply. As a | ||
96 | result, the use of volatile is likely to be seen as a bug and will bring | ||
97 | additional scrutiny to the code. Developers who are tempted to use | ||
98 | volatile should take a step back and think about what they are truly trying | ||
99 | to accomplish. | ||
100 | |||
101 | Patches to remove volatile variables are generally welcome - as long as | ||
102 | they come with a justification which shows that the concurrency issues have | ||
103 | been properly thought through. | ||
104 | |||
105 | |||
106 | NOTES | ||
107 | ----- | ||
108 | |||
109 | [1] http://lwn.net/Articles/233481/ | ||
110 | [2] http://lwn.net/Articles/233482/ | ||
111 | |||
112 | CREDITS | ||
113 | ------- | ||
114 | |||
115 | Original impetus and research by Randy Dunlap | ||
116 | Written by Jonathan Corbet | ||
117 | Improvements via comments from Satyam Sharma, Johannes Stezenbach, Jesper | ||
118 | Juhl, Heikki Orsila, H. Peter Anvin, Philipp Hahn, and Stefan | ||
119 | Richter. | ||
diff --git a/Documentation/watchdog/pcwd-watchdog.txt b/Documentation/watchdog/pcwd-watchdog.txt index d9ee6336c1d4..4f68052395c0 100644 --- a/Documentation/watchdog/pcwd-watchdog.txt +++ b/Documentation/watchdog/pcwd-watchdog.txt | |||
@@ -1,3 +1,5 @@ | |||
1 | Last reviewed: 10/05/2007 | ||
2 | |||
1 | Berkshire Products PC Watchdog Card | 3 | Berkshire Products PC Watchdog Card |
2 | Support for ISA Cards Revision A and C | 4 | Support for ISA Cards Revision A and C |
3 | Documentation and Driver by Ken Hollis <kenji@bitgate.com> | 5 | Documentation and Driver by Ken Hollis <kenji@bitgate.com> |
@@ -14,8 +16,8 @@ | |||
14 | 16 | ||
15 | The Watchdog Driver will automatically find your watchdog card, and will | 17 | The Watchdog Driver will automatically find your watchdog card, and will |
16 | attach a running driver for use with that card. After the watchdog | 18 | attach a running driver for use with that card. After the watchdog |
17 | drivers have initialized, you can then talk to the card using the PC | 19 | drivers have initialized, you can then talk to the card using a PC |
18 | Watchdog program, available from http://ftp.bitgate.com/pcwd/. | 20 | Watchdog program. |
19 | 21 | ||
20 | I suggest putting a "watchdog -d" before the beginning of an fsck, and | 22 | I suggest putting a "watchdog -d" before the beginning of an fsck, and |
21 | a "watchdog -e -t 1" immediately after the end of an fsck. (Remember | 23 | a "watchdog -e -t 1" immediately after the end of an fsck. (Remember |
@@ -62,5 +64,3 @@ | |||
62 | -- Ken Hollis | 64 | -- Ken Hollis |
63 | (kenji@bitgate.com) | 65 | (kenji@bitgate.com) |
64 | 66 | ||
65 | (This documentation may be out of date. Check | ||
66 | http://ftp.bitgate.com/pcwd/ for the absolute latest additions.) | ||
diff --git a/Documentation/watchdog/watchdog-api.txt b/Documentation/watchdog/watchdog-api.txt index 8d16f6f3c4ec..bb7cb1d31ec7 100644 --- a/Documentation/watchdog/watchdog-api.txt +++ b/Documentation/watchdog/watchdog-api.txt | |||
@@ -1,3 +1,6 @@ | |||
1 | Last reviewed: 10/05/2007 | ||
2 | |||
3 | |||
1 | The Linux Watchdog driver API. | 4 | The Linux Watchdog driver API. |
2 | 5 | ||
3 | Copyright 2002 Christer Weingel <wingel@nano-system.com> | 6 | Copyright 2002 Christer Weingel <wingel@nano-system.com> |
@@ -22,7 +25,7 @@ the system. If userspace fails (RAM error, kernel bug, whatever), the | |||
22 | notifications cease to occur, and the hardware watchdog will reset the | 25 | notifications cease to occur, and the hardware watchdog will reset the |
23 | system (causing a reboot) after the timeout occurs. | 26 | system (causing a reboot) after the timeout occurs. |
24 | 27 | ||
25 | The Linux watchdog API is a rather AD hoc construction and different | 28 | The Linux watchdog API is a rather ad-hoc construction and different |
26 | drivers implement different, and sometimes incompatible, parts of it. | 29 | drivers implement different, and sometimes incompatible, parts of it. |
27 | This file is an attempt to document the existing usage and allow | 30 | This file is an attempt to document the existing usage and allow |
28 | future driver writers to use it as a reference. | 31 | future driver writers to use it as a reference. |
@@ -46,14 +49,16 @@ some of the drivers support the configuration option "Disable watchdog | |||
46 | shutdown on close", CONFIG_WATCHDOG_NOWAYOUT. If it is set to Y when | 49 | shutdown on close", CONFIG_WATCHDOG_NOWAYOUT. If it is set to Y when |
47 | compiling the kernel, there is no way of disabling the watchdog once | 50 | compiling the kernel, there is no way of disabling the watchdog once |
48 | it has been started. So, if the watchdog daemon crashes, the system | 51 | it has been started. So, if the watchdog daemon crashes, the system |
49 | will reboot after the timeout has passed. | 52 | will reboot after the timeout has passed. Watchdog devices also usually |
53 | support the nowayout module parameter so that this option can be controlled | ||
54 | at runtime. | ||
50 | 55 | ||
51 | Some other drivers will not disable the watchdog, unless a specific | 56 | Drivers will not disable the watchdog, unless a specific magic character 'V' |
52 | magic character 'V' has been sent /dev/watchdog just before closing | 57 | has been sent to /dev/watchdog just before closing the file. If the userspace |
53 | the file. If the userspace daemon closes the file without sending | 58 | daemon closes the file without sending this special character, the driver |
54 | this special character, the driver will assume that the daemon (and | 59 | will assume that the daemon (and userspace in general) died, and will stop |
55 | userspace in general) died, and will stop pinging the watchdog without | 60 | pinging the watchdog without disabling it first. This will then cause a |
56 | disabling it first. This will then cause a reboot. | 61 | reboot if the watchdog is not re-opened in sufficient time. |
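From the daemon's side, the magic close handling described above could be sketched as follows. The helper names watchdog_ping() and watchdog_magic_close() are hypothetical, and the functions take an already opened file descriptor so that they can be tried against a pipe or a regular file rather than a real /dev/watchdog.

```c
#include <assert.h>
#include <unistd.h>

/* Ping the watchdog by writing one byte; returns 0 on success, -1 on error. */
static int watchdog_ping(int fd)
{
	return write(fd, "\0", 1) == 1 ? 0 : -1;
}

/*
 * Send the magic character 'V' and then close the file, which asks a
 * driver with magic close support to disable the watchdog.
 * Returns 0 on success, -1 if the write or the close failed.
 */
static int watchdog_magic_close(int fd)
{
	int ret = write(fd, "V", 1) == 1 ? 0 : -1;

	if (close(fd) != 0)
		ret = -1;
	return ret;
}
```

A daemon would open /dev/watchdog once, call watchdog_ping() at intervals shorter than the timeout, and call watchdog_magic_close() only on an orderly shutdown. If nowayout is in effect, the watchdog stays armed regardless.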
57 | 62 | ||
58 | The ioctl API: | 63 | The ioctl API: |
59 | 64 | ||
@@ -227,218 +232,3 @@ The following options are available: | |||
227 | 232 | ||
228 | [FIXME -- better explanations] | 233 | [FIXME -- better explanations] |
229 | 234 | ||
230 | Implementations in the current drivers in the kernel tree: | ||
231 | |||
232 | Here I have tried to summarize what the different drivers support and | ||
233 | where they do strange things compared to the other drivers. | ||
234 | |||
235 | acquirewdt.c -- Acquire Single Board Computer | ||
236 | |||
237 | This driver has a hardcoded timeout of 1 minute | ||
238 | |||
239 | Supports CONFIG_WATCHDOG_NOWAYOUT | ||
240 | |||
241 | GETSUPPORT returns KEEPALIVEPING. GETSTATUS will return 1 if | ||
242 | the device is open, 0 if not. [FIXME -- isn't this rather | ||
243 | silly? To be able to use the ioctl, the device must be open | ||
244 | and so GETSTATUS will always return 1]. | ||
245 | |||
246 | advantechwdt.c -- Advantech Single Board Computer | ||
247 | |||
248 | Timeout that defaults to 60 seconds, supports SETTIMEOUT. | ||
249 | |||
250 | Supports CONFIG_WATCHDOG_NOWAYOUT | ||
251 | |||
252 | GETSUPPORT returns WDIOF_KEEPALIVEPING and WDIOF_SETTIMEOUT. | ||
253 | The GETSTATUS call returns if the device is open or not. | ||
254 | [FIXME -- silliness again?] | ||
255 | |||
256 | booke_wdt.c -- PowerPC BookE Watchdog Timer | ||
257 | |||
258 | Timeout default varies according to frequency, supports | ||
259 | SETTIMEOUT | ||
260 | |||
261 | Watchdog cannot be turned off, CONFIG_WATCHDOG_NOWAYOUT | ||
262 | does not make sense | ||
263 | |||
264 | GETSUPPORT returns the watchdog_info struct, and | ||
265 | GETSTATUS returns the supported options. GETBOOTSTATUS | ||
266 | returns a 1 if the last reset was caused by the | ||
267 | watchdog and a 0 otherwise. This watchdog cannot be | ||
268 | disabled once it has been started. The wdt_period kernel | ||
269 | parameter selects which bit of the time base changing | ||
270 | from 0->1 will trigger the watchdog exception. Changing | ||
271 | the timeout from the ioctl calls will change the | ||
272 | wdt_period as defined above. Finally if you would like to | ||
273 | replace the default Watchdog Handler you can implement the | ||
274 | WatchdogHandler() function in your own code. | ||
275 | |||
276 | eurotechwdt.c -- Eurotech CPU-1220/1410 | ||
277 | |||
278 | The timeout can be set using the SETTIMEOUT ioctl and defaults | ||
279 | to 60 seconds. | ||
280 | |||
281 | Also has a module parameter "ev", event type which controls | ||
282 | what should happen on a timeout, the string "int" or anything | ||
283 | else that causes a reboot. [FIXME -- better description] | ||
284 | |||
285 | Supports CONFIG_WATCHDOG_NOWAYOUT | ||
286 | |||
287 | GETSUPPORT returns CARDRESET and WDIOF_SETTIMEOUT but | ||
288 | GETSTATUS is not supported and GETBOOTSTATUS just returns 0. | ||
289 | |||
290 | i810-tco.c -- Intel 810 chipset | ||
291 | |||
292 | Also has support for a lot of other i8x0 stuff, but the | ||
293 | watchdog is one of the things. | ||
294 | |||
295 | The timeout is set using the module parameter "i810_margin", | ||
296 | which is in steps of 0.6 seconds where 2<i810_margin<64. The | ||
297 | driver supports the SETTIMEOUT ioctl. | ||
298 | |||
299 | Supports CONFIG_WATCHDOG_NOWAYOUT. | ||
300 | |||
301 | GETSUPPORT returns WDIOF_SETTIMEOUT. The GETSTATUS call | ||
302 | returns some kind of timer value which ist not compatible with | ||
303 | the other drivers. GETBOOT status returns some kind of | ||
304 | hardware specific boot status. [FIXME -- describe this] | ||
305 | |||
306 | ib700wdt.c -- IB700 Single Board Computer | ||
307 | |||
308 | Default timeout of 30 seconds and the timeout is settable | ||
309 | using the SETTIMEOUT ioctl. Note that only a few timeout | ||
310 | values are supported. | ||
311 | |||
312 | Supports CONFIG_WATCHDOG_NOWAYOUT | ||
313 | |||
314 | GETSUPPORT returns WDIOF_KEEPALIVEPING and WDIOF_SETTIMEOUT. | ||
315 | The GETSTATUS call returns if the device is open or not. | ||
316 | [FIXME -- silliness again?] | ||
317 | |||
318 | machzwd.c -- MachZ ZF-Logic | ||
319 | |||
320 | Hardcoded timeout of 10 seconds | ||
321 | |||
322 | Has a module parameter "action" that controls what happens | ||
323 | when the timeout runs out which can be 0 = RESET (default), | ||
324 | 1 = SMI, 2 = NMI, 3 = SCI. | ||
325 | |||
326 | Supports CONFIG_WATCHDOG_NOWAYOUT and the magic character | ||
327 | 'V' close handling. | ||
328 | |||
329 | GETSUPPORT returns WDIOF_KEEPALIVEPING, and the GETSTATUS call | ||
330 | returns if the device is open or not. [FIXME -- silliness | ||
331 | again?] | ||
332 | |||
333 | mixcomwd.c -- MixCom Watchdog | ||
334 | |||
335 | [FIXME -- I'm unable to tell what the timeout is] | ||
336 | |||
337 | Supports CONFIG_WATCHDOG_NOWAYOUT | ||
338 | |||
339 | GETSUPPORT returns WDIOF_KEEPALIVEPING, GETSTATUS returns if | ||
340 | the device is opened or not [FIXME -- I'm not really sure how | ||
341 | this works, there seems to be some magic connected to | ||
342 | CONFIG_WATCHDOG_NOWAYOUT] | ||
343 | |||
344 | pcwd.c -- Berkshire PC Watchdog | ||
345 | |||
346 | Hardcoded timeout of 1.5 seconds | ||
347 | |||
348 | Supports CONFIG_WATCHDOG_NOWAYOUT | ||
349 | |||
350 | GETSUPPORT returns WDIOF_OVERHEAT|WDIOF_CARDRESET and both | ||
351 | GETSTATUS and GETBOOTSTATUS return something useful. | ||
352 | |||
353 | The SETOPTIONS call can be used to enable and disable the card | ||
354 | and to ask the driver to call panic if the system overheats. | ||
355 | |||
356 | sbc60xxwdt.c -- 60xx Single Board Computer | ||
357 | |||
358 | Hardcoded timeout of 10 seconds | ||
359 | |||
360 | Does not support CONFIG_WATCHDOG_NOWAYOUT, but has the magic | ||
361 | character 'V' close handling. | ||
362 | |||
363 | No bits set in GETSUPPORT | ||
364 | |||
365 | scx200.c -- National SCx200 CPUs | ||
366 | |||
367 | Not in the kernel yet. | ||
368 | |||
369 | The timeout is set using a module parameter "margin" which | ||
370 | defaults to 60 seconds. The timeout can also be set using | ||
371 | SETTIMEOUT and read using GETTIMEOUT. | ||
372 | |||
373 | Supports a module parameter "nowayout" that is initialized | ||
374 | with the value of CONFIG_WATCHDOG_NOWAYOUT. Also supports the | ||
375 | magic character 'V' handling. | ||
376 | |||
377 | shwdt.c -- SuperH 3/4 processors | ||
378 | |||
379 | [FIXME -- I'm unable to tell what the timeout is] | ||
380 | |||
381 | Supports CONFIG_WATCHDOG_NOWAYOUT | ||
382 | |||
383 | GETSUPPORT returns WDIOF_KEEPALIVEPING, and the GETSTATUS call | ||
384 | returns if the device is open or not. [FIXME -- silliness | ||
385 | again?] | ||
386 | |||
387 | softdog.c -- Software watchdog | ||
388 | |||
389 | The timeout is set with the module parameter "soft_margin" | ||
390 | which defaults to 60 seconds, the timeout is also settable | ||
391 | using the SETTIMEOUT ioctl. | ||
392 | |||
393 | Supports CONFIG_WATCHDOG_NOWAYOUT | ||
394 | |||
395 | WDIOF_SETTIMEOUT bit set in GETSUPPORT | ||
396 | |||
397 | w83877f_wdt.c -- W83877F Computer | ||
398 | |||
399 | Hardcoded timeout of 30 seconds | ||
400 | |||
401 | Does not support CONFIG_WATCHDOG_NOWAYOUT, but has the magic | ||
402 | character 'V' close handling. | ||
403 | |||
404 | No bits set in GETSUPPORT | ||
405 | |||
406 | w83627hf_wdt.c -- w83627hf watchdog | ||
407 | |||
408 | Timeout that defaults to 60 seconds, supports SETTIMEOUT. | ||
409 | |||
410 | Supports CONFIG_WATCHDOG_NOWAYOUT | ||
411 | |||
412 | GETSUPPORT returns WDIOF_KEEPALIVEPING and WDIOF_SETTIMEOUT. | ||
413 | The GETSTATUS call returns if the device is open or not. | ||
414 | |||
415 | wdt.c -- ICS WDT500/501 ISA and | ||
416 | wdt_pci.c -- ICS WDT500/501 PCI | ||
417 | |||
418 | Default timeout of 60 seconds. The timeout is also settable | ||
419 | using the SETTIMEOUT ioctl. | ||
420 | |||
421 | Supports CONFIG_WATCHDOG_NOWAYOUT | ||
422 | |||
423 | GETSUPPORT returns with bits set depending on the actual | ||
424 | card. The WDT501 supports a lot of external monitoring, the | ||
425 | WDT500 much less. | ||
426 | |||
427 | wdt285.c -- Footbridge watchdog | ||
428 | |||
429 | The timeout is set with the module parameter "soft_margin" | ||
430 | which defaults to 60 seconds. The timeout is also settable | ||
431 | using the SETTIMEOUT ioctl. | ||
432 | |||
433 | Does not support CONFIG_WATCHDOG_NOWAYOUT | ||
434 | |||
435 | WDIOF_SETTIMEOUT bit set in GETSUPPORT | ||
436 | |||
437 | wdt977.c -- Netwinder W83977AF chip | ||
438 | |||
439 | Hardcoded timeout of 3 minutes | ||
440 | |||
441 | Supports CONFIG_WATCHDOG_NOWAYOUT | ||
442 | |||
443 | Does not support any ioctls at all. | ||
444 | |||
diff --git a/Documentation/watchdog/watchdog.txt b/Documentation/watchdog/watchdog.txt deleted file mode 100644 index 4b1ff69cc19a..000000000000 --- a/Documentation/watchdog/watchdog.txt +++ /dev/null | |||
@@ -1,94 +0,0 @@ | |||
1 | Watchdog Timer Interfaces For The Linux Operating System | ||
2 | |||
3 | Alan Cox <alan@lxorguk.ukuu.org.uk> | ||
4 | |||
5 | Custom Linux Driver And Program Development | ||
6 | |||
7 | |||
8 | The following watchdog drivers are currently implemented: | ||
9 | |||
10 | ICS WDT501-P | ||
11 | ICS WDT501-P (no fan tachometer) | ||
12 | ICS WDT500-P | ||
13 | Software Only | ||
14 | SA1100 Internal Watchdog | ||
15 | Berkshire Products PC Watchdog Revision A & C (by Ken Hollis) | ||
16 | |||
17 | |||
18 | All six interfaces provide /dev/watchdog, which when open must be written | ||
19 | to within a timeout or the machine will reboot. Each write delays the reboot | ||
20 | time another timeout. In the case of the software watchdog the ability to | ||
21 | reboot will depend on the state of the machines and interrupts. The hardware | ||
22 | boards physically pull the machine down off their own onboard timers and | ||
23 | will reboot from almost anything. | ||
24 | |||
25 | A second temperature monitoring interface is available on the WDT501P cards | ||
26 | and some Berkshire cards. This provides /dev/temperature. This is the machine | ||
27 | internal temperature in degrees Fahrenheit. Each read returns a single byte | ||
28 | giving the temperature. | ||
29 | |||
30 | The third interface logs kernel messages on additional alert events. | ||
31 | |||
32 | Both software and hardware watchdog drivers are available in the standard | ||
33 | kernel. If you are using the software watchdog, you probably also want | ||
34 | to use "panic=60" as a boot argument as well. | ||
35 | |||
36 | The wdt card cannot be safely probed for. Instead you need to pass | ||
37 | wdt=ioaddr,irq as a boot parameter - eg "wdt=0x240,11". | ||
38 | |||
39 | The SA1100 watchdog module can be configured with the "sa1100_margin" | ||
40 | commandline argument which specifies timeout value in seconds. | ||
41 | |||
42 | The i810 TCO watchdog modules can be configured with the "i810_margin" | ||
43 | commandline argument which specifies the counter initial value. The counter | ||
44 | is decremented every 0.6 seconds and default to 50 (30 seconds). Values can | ||
45 | range between 3 and 63. | ||
46 | |||
47 | The i810 TCO watchdog driver also implements the WDIOC_GETSTATUS and | ||
48 | WDIOC_GETBOOTSTATUS ioctl()s. WDIOC_GETSTATUS returns the actual counter value | ||
49 | and WDIOC_GETBOOTSTATUS returns the value of TCO2 Status Register (see Intel's | ||
50 | documentation for the 82801AA and 82801AB datasheet). | ||
51 | |||
52 | Features | ||
53 | -------- | ||
54 | WDT501P WDT500P Software Berkshire i810 TCO SA1100WD | ||
55 | Reboot Timer X X X X X X | ||
56 | External Reboot X X o o o X | ||
57 | I/O Port Monitor o o o X o o | ||
58 | Temperature X o o X o o | ||
59 | Fan Speed X o o o o o | ||
60 | Power Under X o o o o o | ||
61 | Power Over X o o o o o | ||
62 | Overheat X o o o o o | ||
63 | |||
64 | The external event interfaces on the WDT boards are not currently supported. | ||
65 | Minor numbers are however allocated for it. | ||
66 | |||
67 | |||
68 | Example Watchdog Driver: see Documentation/watchdog/src/watchdog-simple.c | ||
69 | |||
70 | |||
71 | Contact Information | ||
72 | |||
73 | People keep asking about the WDT watchdog timer hardware: The phone contacts | ||
74 | for Industrial Computer Source are: | ||
75 | |||
76 | Industrial Computer Source | ||
77 | http://www.indcompsrc.com | ||
78 | ICS Advent, San Diego | ||
79 | 6260 Sequence Dr. | ||
80 | San Diego, CA 92121-4371 | ||
81 | Phone (858) 677-0877 | ||
82 | FAX: (858) 677-0895 | ||
83 | > | ||
84 | ICS Advent Europe, UK | ||
85 | Oving Road | ||
86 | Chichester, | ||
87 | West Sussex, | ||
88 | PO19 4ET, UK | ||
89 | Phone: 00.44.1243.533900 | ||
90 | |||
91 | |||
92 | and please mention Linux when enquiring. | ||
93 | |||
94 | For full information about the PCWD cards see the pcwd-watchdog.txt document. | ||
diff --git a/Documentation/watchdog/wdt.txt b/Documentation/watchdog/wdt.txt new file mode 100644 index 000000000000..03fd756d976d --- /dev/null +++ b/Documentation/watchdog/wdt.txt | |||
@@ -0,0 +1,43 @@ | |||
1 | Last Reviewed: 10/05/2007 | ||
2 | |||
3 | WDT Watchdog Timer Interfaces For The Linux Operating System | ||
4 | Alan Cox <alan@lxorguk.ukuu.org.uk> | ||
5 | |||
6 | ICS WDT501-P | ||
7 | ICS WDT501-P (no fan tachometer) | ||
8 | ICS WDT500-P | ||
9 | |||
10 | All the interfaces provide /dev/watchdog, which when open must be written | ||
11 | to within a timeout or the machine will reboot. Each write delays the reboot | ||
12 | time another timeout. In the case of the software watchdog the ability to | ||
13 | reboot will depend on the state of the machines and interrupts. The hardware | ||
14 | boards physically pull the machine down off their own onboard timers and | ||
15 | will reboot from almost anything. | ||
16 | |||
17 | A second temperature monitoring interface is available on the WDT501P cards. | ||
18 | This provides /dev/temperature. This is the machine internal temperature in | ||
19 | degrees Fahrenheit. Each read returns a single byte giving the temperature. | ||
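Reading that interface from userspace could be sketched as below. The function names are hypothetical, and the Celsius conversion is an addition for illustration only; the helper takes an open file descriptor so that it can be exercised without the hardware.

```c
#include <assert.h>
#include <unistd.h>

/*
 * Read one temperature sample from fd. Each read returns a single byte
 * giving the temperature in degrees Fahrenheit.
 * Returns the value (0..255), or -1 on error or end of file.
 */
static int read_temperature_f(int fd)
{
	unsigned char t;

	if (read(fd, &t, 1) != 1)
		return -1;
	return t;
}

/* Convert degrees Fahrenheit to degrees Celsius, truncating toward zero. */
static int fahrenheit_to_celsius(int f)
{
	return (f - 32) * 5 / 9;
}
```

With the card present, fd would come from opening /dev/temperature read-only; a byte value of 98, for example, converts to 36 degrees Celsius after truncation.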
20 | |||
21 | The third interface logs kernel messages on additional alert events. | ||
22 | |||
23 | The wdt card cannot be safely probed for. Instead you need to pass | ||
24 | wdt=ioaddr,irq as a boot parameter - eg "wdt=0x240,11". | ||
25 | |||
26 | Features | ||
27 | -------- | ||
28 | WDT501P WDT500P | ||
29 | Reboot Timer X X | ||
30 | External Reboot X X | ||
31 | I/O Port Monitor o o | ||
32 | Temperature X o | ||
33 | Fan Speed X o | ||
34 | Power Under X o | ||
35 | Power Over X o | ||
36 | Overheat X o | ||
37 | |||
38 | The external event interfaces on the WDT boards are not currently supported. | ||
39 | Minor numbers are however allocated for it. | ||
40 | |||
41 | |||
42 | Example Watchdog Driver: see Documentation/watchdog/src/watchdog-simple.c | ||
43 | |||