diff options
Diffstat (limited to 'Documentation')
28 files changed, 1282 insertions, 861 deletions
diff --git a/Documentation/ABI/removed/raw1394_legacy_isochronous b/Documentation/ABI/removed/raw1394_legacy_isochronous new file mode 100644 index 000000000000..1b629622d883 --- /dev/null +++ b/Documentation/ABI/removed/raw1394_legacy_isochronous | |||
@@ -0,0 +1,16 @@ | |||
1 | What: legacy isochronous ABI of raw1394 (1st generation iso ABI) | ||
2 | Date: June 2007 (scheduled), removed in kernel v2.6.23 | ||
3 | Contact: linux1394-devel@lists.sourceforge.net | ||
4 | Description: | ||
5 | The two request types RAW1394_REQ_ISO_SEND, RAW1394_REQ_ISO_LISTEN have | ||
6 | been deprecated for quite some time. They are very inefficient as they | ||
7 | come with high interrupt load and several layers of callbacks for each | ||
8 | packet. Because of these deficiencies, the video1394 and dv1394 drivers | ||
9 | and the 3rd-generation isochronous ABI in raw1394 (rawiso) were created. | ||
10 | |||
11 | Users: | ||
12 | libraw1394 users via the long deprecated API raw1394_iso_write, | ||
13 | raw1394_start_iso_write, raw1394_start_iso_rcv, raw1394_stop_iso_rcv | ||
14 | |||
15 | libdc1394, which optionally uses these old libraw1394 calls | ||
16 | alternatively to the more efficient video1394 ABI | ||
diff --git a/Documentation/DMA-mapping.txt b/Documentation/DMA-mapping.txt index 028614cdd062..e07f2530326b 100644 --- a/Documentation/DMA-mapping.txt +++ b/Documentation/DMA-mapping.txt | |||
@@ -664,109 +664,6 @@ It is that simple. | |||
664 | Well, not for some odd devices. See the next section for information | 664 | Well, not for some odd devices. See the next section for information |
665 | about that. | 665 | about that. |
666 | 666 | ||
667 | DAC Addressing for Address Space Hungry Devices | ||
668 | |||
669 | There exists a class of devices which do not mesh well with the PCI | ||
670 | DMA mapping API. By definition these "mappings" are a finite | ||
671 | resource. The number of total available mappings per bus is platform | ||
672 | specific, but there will always be a reasonable amount. | ||
673 | |||
674 | What is "reasonable"? Reasonable means that networking and block I/O | ||
675 | devices need not worry about using too many mappings. | ||
676 | |||
677 | As an example of a problematic device, consider compute cluster cards. | ||
678 | They can potentially need to access gigabytes of memory at once via | ||
679 | DMA. Dynamic mappings are unsuitable for this kind of access pattern. | ||
680 | |||
681 | To this end we've provided a small API by which a device driver | ||
682 | may use DAC cycles to directly address all of physical memory. | ||
683 | Not all platforms support this, but most do. It is easy to determine | ||
684 | whether the platform will work properly at probe time. | ||
685 | |||
686 | First, understand that there may be a SEVERE performance penalty for | ||
687 | using these interfaces on some platforms. Therefore, you MUST only | ||
688 | use these interfaces if it is absolutely required. %99 of devices can | ||
689 | use the normal APIs without any problems. | ||
690 | |||
691 | Note that for streaming type mappings you must either use these | ||
692 | interfaces, or the dynamic mapping interfaces above. You may not mix | ||
693 | usage of both for the same device. Such an act is illegal and is | ||
694 | guaranteed to put a banana in your tailpipe. | ||
695 | |||
696 | However, consistent mappings may in fact be used in conjunction with | ||
697 | these interfaces. Remember that, as defined, consistent mappings are | ||
698 | always going to be SAC addressable. | ||
699 | |||
700 | The first thing your driver needs to do is query the PCI platform | ||
701 | layer if it is capable of handling your devices DAC addressing | ||
702 | capabilities: | ||
703 | |||
704 | int pci_dac_dma_supported(struct pci_dev *hwdev, u64 mask); | ||
705 | |||
706 | You may not use the following interfaces if this routine fails. | ||
707 | |||
708 | Next, DMA addresses using this API are kept track of using the | ||
709 | dma64_addr_t type. It is guaranteed to be big enough to hold any | ||
710 | DAC address the platform layer will give to you from the following | ||
711 | routines. If you have consistent mappings as well, you still | ||
712 | use plain dma_addr_t to keep track of those. | ||
713 | |||
714 | All mappings obtained here will be direct. The mappings are not | ||
715 | translated, and this is the purpose of this dialect of the DMA API. | ||
716 | |||
717 | All routines work with page/offset pairs. This is the _ONLY_ way to | ||
718 | portably refer to any piece of memory. If you have a cpu pointer | ||
719 | (which may be validly DMA'd too) you may easily obtain the page | ||
720 | and offset using something like this: | ||
721 | |||
722 | struct page *page = virt_to_page(ptr); | ||
723 | unsigned long offset = offset_in_page(ptr); | ||
724 | |||
725 | Here are the interfaces: | ||
726 | |||
727 | dma64_addr_t pci_dac_page_to_dma(struct pci_dev *pdev, | ||
728 | struct page *page, | ||
729 | unsigned long offset, | ||
730 | int direction); | ||
731 | |||
732 | The DAC address for the tuple PAGE/OFFSET are returned. The direction | ||
733 | argument is the same as for pci_{map,unmap}_single(). The same rules | ||
734 | for cpu/device access apply here as for the streaming mapping | ||
735 | interfaces. To reiterate: | ||
736 | |||
737 | The cpu may touch the buffer before pci_dac_page_to_dma. | ||
738 | The device may touch the buffer after pci_dac_page_to_dma | ||
739 | is made, but the cpu may NOT. | ||
740 | |||
741 | When the DMA transfer is complete, invoke: | ||
742 | |||
743 | void pci_dac_dma_sync_single_for_cpu(struct pci_dev *pdev, | ||
744 | dma64_addr_t dma_addr, | ||
745 | size_t len, int direction); | ||
746 | |||
747 | This must be done before the CPU looks at the buffer again. | ||
748 | This interface behaves identically to pci_dma_sync_{single,sg}_for_cpu(). | ||
749 | |||
750 | And likewise, if you wish to let the device get back at the buffer after | ||
751 | the cpu has read/written it, invoke: | ||
752 | |||
753 | void pci_dac_dma_sync_single_for_device(struct pci_dev *pdev, | ||
754 | dma64_addr_t dma_addr, | ||
755 | size_t len, int direction); | ||
756 | |||
757 | before letting the device access the DMA area again. | ||
758 | |||
759 | If you need to get back to the PAGE/OFFSET tuple from a dma64_addr_t | ||
760 | the following interfaces are provided: | ||
761 | |||
762 | struct page *pci_dac_dma_to_page(struct pci_dev *pdev, | ||
763 | dma64_addr_t dma_addr); | ||
764 | unsigned long pci_dac_dma_to_offset(struct pci_dev *pdev, | ||
765 | dma64_addr_t dma_addr); | ||
766 | |||
767 | This is possible with the DAC interfaces purely because they are | ||
768 | not translated in any way. | ||
769 | |||
770 | Optimizing Unmap State Space Consumption | 667 | Optimizing Unmap State Space Consumption |
771 | 668 | ||
772 | On many platforms, pci_unmap_{single,page}() is simply a nop. | 669 | On many platforms, pci_unmap_{single,page}() is simply a nop. |
diff --git a/Documentation/DocBook/kernel-api.tmpl b/Documentation/DocBook/kernel-api.tmpl index 38f88b6ae405..46bcff2849bd 100644 --- a/Documentation/DocBook/kernel-api.tmpl +++ b/Documentation/DocBook/kernel-api.tmpl | |||
@@ -643,4 +643,70 @@ X!Idrivers/video/console/fonts.c | |||
643 | !Edrivers/spi/spi.c | 643 | !Edrivers/spi/spi.c |
644 | </chapter> | 644 | </chapter> |
645 | 645 | ||
646 | <chapter id="i2c"> | ||
647 | <title>I<superscript>2</superscript>C and SMBus Subsystem</title> | ||
648 | |||
649 | <para> | ||
650 | I<superscript>2</superscript>C (or without fancy typography, "I2C") | ||
651 | is an acronym for the "Inter-IC" bus, a simple bus protocol which is | ||
652 | widely used where low data rate communications suffice. | ||
653 | Since it's also a licensed trademark, some vendors use another | ||
654 | name (such as "Two-Wire Interface", TWI) for the same bus. | ||
655 | I2C only needs two signals (SCL for clock, SDA for data), conserving | ||
656 | board real estate and minimizing signal quality issues. | ||
657 | Most I2C devices use seven bit addresses, and bus speeds of up | ||
658 | to 400 kHz; there's a high speed extension (3.4 MHz) that's not yet | ||
659 | found wide use. | ||
660 | I2C is a multi-master bus; open drain signaling is used to | ||
661 | arbitrate between masters, as well as to handshake and to | ||
662 | synchronize clocks from slower clients. | ||
663 | </para> | ||
664 | |||
665 | <para> | ||
666 | The Linux I2C programming interfaces support only the master | ||
667 | side of bus interactions, not the slave side. | ||
668 | The programming interface is structured around two kinds of driver, | ||
669 | and two kinds of device. | ||
670 | An I2C "Adapter Driver" abstracts the controller hardware; it binds | ||
671 | to a physical device (perhaps a PCI device or platform_device) and | ||
672 | exposes a <structname>struct i2c_adapter</structname> representing | ||
673 | each I2C bus segment it manages. | ||
674 | On each I2C bus segment will be I2C devices represented by a | ||
675 | <structname>struct i2c_client</structname>. Those devices will | ||
676 | be bound to a <structname>struct i2c_driver</structname>, | ||
677 | which should follow the standard Linux driver model. | ||
678 | (At this writing, a legacy model is more widely used.) | ||
679 | There are functions to perform various I2C protocol operations; at | ||
680 | this writing all such functions are usable only from task context. | ||
681 | </para> | ||
682 | |||
683 | <para> | ||
684 | The System Management Bus (SMBus) is a sibling protocol. Most SMBus | ||
685 | systems are also I2C conformant. The electrical constraints are | ||
686 | tighter for SMBus, and it standardizes particular protocol messages | ||
687 | and idioms. Controllers that support I2C can also support most | ||
688 | SMBus operations, but SMBus controllers don't support all the protocol | ||
689 | options that an I2C controller will. | ||
690 | There are functions to perform various SMBus protocol operations, | ||
691 | either using I2C primitives or by issuing SMBus commands to | ||
692 | i2c_adapter devices which don't support those I2C operations. | ||
693 | </para> | ||
694 | |||
695 | !Iinclude/linux/i2c.h | ||
696 | !Fdrivers/i2c/i2c-boardinfo.c i2c_register_board_info | ||
697 | !Edrivers/i2c/i2c-core.c | ||
698 | </chapter> | ||
699 | |||
700 | <chapter id="splice"> | ||
701 | <title>splice API</title> | ||
702 | <para>) | ||
703 | splice is a method for moving blocks of data around inside the | ||
704 | kernel, without continually transferring it between the kernel | ||
705 | and user space. | ||
706 | </para> | ||
707 | !Iinclude/linux/splice.h | ||
708 | !Ffs/splice.c | ||
709 | </chapter> | ||
710 | |||
711 | |||
646 | </book> | 712 | </book> |
diff --git a/Documentation/blackfin/kgdb.txt b/Documentation/blackfin/kgdb.txt new file mode 100644 index 000000000000..84f6a484ae9a --- /dev/null +++ b/Documentation/blackfin/kgdb.txt | |||
@@ -0,0 +1,155 @@ | |||
1 | A Simple Guide to Configure KGDB | ||
2 | |||
3 | Sonic Zhang <sonic.zhang@analog.com> | ||
4 | Aug. 24th 2006 | ||
5 | |||
6 | |||
7 | This KGDB patch enables the kernel developer to do source level debugging on | ||
8 | the kernel for the Blackfin architecture. The debugging works over either the | ||
9 | ethernet interface or one of the uarts. Both software breakpoints and | ||
10 | hardware breakpoints are supported in this version. | ||
11 | http://docs.blackfin.uclinux.org/doku.php?id=kgdb | ||
12 | |||
13 | |||
14 | 2 known issues: | ||
15 | 1. This bug: | ||
16 | http://blackfin.uclinux.org/tracker/index.php?func=detail&aid=544&group_id=18&atid=145 | ||
17 | The GDB client for Blackfin uClinux causes incorrect values of local | ||
18 | variables to be displayed when the user breaks the running of kernel in GDB. | ||
19 | 2. Because of a hardware bug in Blackfin 533 v1.0.3: | ||
20 | 05000067 - Watchpoints (Hardware Breakpoints) are not supported | ||
21 | Hardware breakpoints cannot be set properly. | ||
22 | |||
23 | |||
24 | Debug over Ethernet: | ||
25 | |||
26 | 1. Compile and install the cross platform version of gdb for blackfin, which | ||
27 | can be found at $(BINROOT)/bfin-elf-gdb. | ||
28 | |||
29 | 2. Apply this patch to the 2.6.x kernel. Select the menuconfig option under | ||
30 | "Kernel hacking" -> "Kernel debugging" -> "KGDB: kernel debug with remote gdb". | ||
31 | With this selected, option "Full Symbolic/Source Debugging support" and | ||
32 | "Compile the kernel with frame pointers" are also selected. | ||
33 | |||
34 | 3. Select option "KGDB: connect over (Ethernet)". Add "kgdboe=@target-IP/,@host-IP/" to | ||
35 | the option "Compiled-in Kernel Boot Parameter" under "Kernel hacking". | ||
36 | |||
37 | 4. Connect minicom to the serial port and boot the kernel image. | ||
38 | |||
39 | 5. Configure the IP "/> ifconfig eth0 target-IP" | ||
40 | |||
41 | 6. Start GDB client "bfin-elf-gdb vmlinux". | ||
42 | |||
43 | 7. Connect to the target "(gdb) target remote udp:target-IP:6443". | ||
44 | |||
45 | 8. Set software breakpoint "(gdb) break sys_open". | ||
46 | |||
47 | 9. Continue "(gdb) c". | ||
48 | |||
49 | 10. Run ls in the target console "/> ls". | ||
50 | |||
51 | 11. Breakpoint hits. "Breakpoint 1: sys_open(..." | ||
52 | |||
53 | 12. Display local variables and function paramters. | ||
54 | (*) This operation gives wrong results, see known issue 1. | ||
55 | |||
56 | 13. Single stepping "(gdb) si". | ||
57 | |||
58 | 14. Remove breakpoint 1. "(gdb) del 1" | ||
59 | |||
60 | 15. Set hardware breakpoint "(gdb) hbreak sys_open". | ||
61 | |||
62 | 16. Continue "(gdb) c". | ||
63 | |||
64 | 17. Run ls in the target console "/> ls". | ||
65 | |||
66 | 18. Hardware breakpoint hits. "Breakpoint 1: sys_open(...". | ||
67 | (*) This hardware breakpoint will not be hit, see known issue 2. | ||
68 | |||
69 | 19. Continue "(gdb) c". | ||
70 | |||
71 | 20. Interrupt the target in GDB "Ctrl+C". | ||
72 | |||
73 | 21. Detach from the target "(gdb) detach". | ||
74 | |||
75 | 22. Exit GDB "(gdb) quit". | ||
76 | |||
77 | |||
78 | Debug over the UART: | ||
79 | |||
80 | 1. Compile and install the cross platform version of gdb for blackfin, which | ||
81 | can be found at $(BINROOT)/bfin-elf-gdb. | ||
82 | |||
83 | 2. Apply this patch to the 2.6.x kernel. Select the menuconfig option under | ||
84 | "Kernel hacking" -> "Kernel debugging" -> "KGDB: kernel debug with remote gdb". | ||
85 | With this selected, option "Full Symbolic/Source Debugging support" and | ||
86 | "Compile the kernel with frame pointers" are also selected. | ||
87 | |||
88 | 3. Select option "KGDB: connect over (UART)". Set "KGDB: UART port number" to be | ||
89 | a different one from the console. Don't forget to change the mode of | ||
90 | blackfin serial driver to PIO. Otherwise kgdb works incorrectly on UART. | ||
91 | |||
92 | 4. If you want connect to kgdb when the kernel boots, enable | ||
93 | "KGDB: Wait for gdb connection early" | ||
94 | |||
95 | 5. Compile kernel. | ||
96 | |||
97 | 6. Connect minicom to the serial port of the console and boot the kernel image. | ||
98 | |||
99 | 7. Start GDB client "bfin-elf-gdb vmlinux". | ||
100 | |||
101 | 8. Set the baud rate in GDB "(gdb) set remotebaud 57600". | ||
102 | |||
103 | 9. Connect to the target on the second serial port "(gdb) target remote /dev/ttyS1". | ||
104 | |||
105 | 10. Set software breakpoint "(gdb) break sys_open". | ||
106 | |||
107 | 11. Continue "(gdb) c". | ||
108 | |||
109 | 12. Run ls in the target console "/> ls". | ||
110 | |||
111 | 13. A breakpoint is hit. "Breakpoint 1: sys_open(..." | ||
112 | |||
113 | 14. All other operations are the same as that in KGDB over Ethernet. | ||
114 | |||
115 | |||
116 | Debug over the same UART as console: | ||
117 | |||
118 | 1. Compile and install the cross platform version of gdb for blackfin, which | ||
119 | can be found at $(BINROOT)/bfin-elf-gdb. | ||
120 | |||
121 | 2. Apply this patch to the 2.6.x kernel. Select the menuconfig option under | ||
122 | "Kernel hacking" -> "Kernel debugging" -> "KGDB: kernel debug with remote gdb". | ||
123 | With this selected, option "Full Symbolic/Source Debugging support" and | ||
124 | "Compile the kernel with frame pointers" are also selected. | ||
125 | |||
126 | 3. Select option "KGDB: connect over UART". Set "KGDB: UART port number" to console. | ||
127 | Don't forget to change the mode of blackfin serial driver to PIO. | ||
128 | Otherwise kgdb works incorrectly on UART. | ||
129 | |||
130 | 4. If you want connect to kgdb when the kernel boots, enable | ||
131 | "KGDB: Wait for gdb connection early" | ||
132 | |||
133 | 5. Connect minicom to the serial port and boot the kernel image. | ||
134 | |||
135 | 6. (Optional) Ask target to wait for gdb connection by entering Ctrl+A. In minicom, you should enter Ctrl+A+A. | ||
136 | |||
137 | 7. Start GDB client "bfin-elf-gdb vmlinux". | ||
138 | |||
139 | 8. Set the baud rate in GDB "(gdb) set remotebaud 57600". | ||
140 | |||
141 | 9. Connect to the target "(gdb) target remote /dev/ttyS0". | ||
142 | |||
143 | 10. Set software breakpoint "(gdb) break sys_open". | ||
144 | |||
145 | 11. Continue "(gdb) c". Then enter Ctrl+C twice to stop GDB connection. | ||
146 | |||
147 | 12. Run ls in the target console "/> ls". Dummy string can be seen on the console. | ||
148 | |||
149 | 13. Then connect the gdb to target again. "(gdb) target remote /dev/ttyS0". | ||
150 | Now you will find a breakpoint is hit. "Breakpoint 1: sys_open(..." | ||
151 | |||
152 | 14. All other operations are the same as that in KGDB over Ethernet. The only | ||
153 | difference is that after continue command in GDB, please stop GDB | ||
154 | connection by 2 "Ctrl+C"s and connect again after breakpoints are hit or | ||
155 | Ctrl+A is entered. | ||
diff --git a/Documentation/block/barrier.txt b/Documentation/block/barrier.txt index a272c3db8094..7d279f2f5bb2 100644 --- a/Documentation/block/barrier.txt +++ b/Documentation/block/barrier.txt | |||
@@ -82,23 +82,12 @@ including draining and flushing. | |||
82 | typedef void (prepare_flush_fn)(request_queue_t *q, struct request *rq); | 82 | typedef void (prepare_flush_fn)(request_queue_t *q, struct request *rq); |
83 | 83 | ||
84 | int blk_queue_ordered(request_queue_t *q, unsigned ordered, | 84 | int blk_queue_ordered(request_queue_t *q, unsigned ordered, |
85 | prepare_flush_fn *prepare_flush_fn, | 85 | prepare_flush_fn *prepare_flush_fn); |
86 | unsigned gfp_mask); | ||
87 | |||
88 | int blk_queue_ordered_locked(request_queue_t *q, unsigned ordered, | ||
89 | prepare_flush_fn *prepare_flush_fn, | ||
90 | unsigned gfp_mask); | ||
91 | |||
92 | The only difference between the two functions is whether or not the | ||
93 | caller is holding q->queue_lock on entry. The latter expects the | ||
94 | caller is holding the lock. | ||
95 | 86 | ||
96 | @q : the queue in question | 87 | @q : the queue in question |
97 | @ordered : the ordered mode the driver/device supports | 88 | @ordered : the ordered mode the driver/device supports |
98 | @prepare_flush_fn : this function should prepare @rq such that it | 89 | @prepare_flush_fn : this function should prepare @rq such that it |
99 | flushes cache to physical medium when executed | 90 | flushes cache to physical medium when executed |
100 | @gfp_mask : gfp_mask used when allocating data structures | ||
101 | for ordered processing | ||
102 | 91 | ||
103 | For example, SCSI disk driver's prepare_flush_fn looks like the | 92 | For example, SCSI disk driver's prepare_flush_fn looks like the |
104 | following. | 93 | following. |
@@ -106,9 +95,10 @@ following. | |||
106 | static void sd_prepare_flush(request_queue_t *q, struct request *rq) | 95 | static void sd_prepare_flush(request_queue_t *q, struct request *rq) |
107 | { | 96 | { |
108 | memset(rq->cmd, 0, sizeof(rq->cmd)); | 97 | memset(rq->cmd, 0, sizeof(rq->cmd)); |
109 | rq->flags |= REQ_BLOCK_PC; | 98 | rq->cmd_type = REQ_TYPE_BLOCK_PC; |
110 | rq->timeout = SD_TIMEOUT; | 99 | rq->timeout = SD_TIMEOUT; |
111 | rq->cmd[0] = SYNCHRONIZE_CACHE; | 100 | rq->cmd[0] = SYNCHRONIZE_CACHE; |
101 | rq->cmd_len = 10; | ||
112 | } | 102 | } |
113 | 103 | ||
114 | The following seven ordered modes are supported. The following table | 104 | The following seven ordered modes are supported. The following table |
diff --git a/Documentation/feature-removal-schedule.txt b/Documentation/feature-removal-schedule.txt index 7d3f205b0ba5..0599a0c7c026 100644 --- a/Documentation/feature-removal-schedule.txt +++ b/Documentation/feature-removal-schedule.txt | |||
@@ -49,16 +49,6 @@ Who: Adrian Bunk <bunk@stusta.de> | |||
49 | 49 | ||
50 | --------------------------- | 50 | --------------------------- |
51 | 51 | ||
52 | What: raw1394: requests of type RAW1394_REQ_ISO_SEND, RAW1394_REQ_ISO_LISTEN | ||
53 | When: June 2007 | ||
54 | Why: Deprecated in favour of the more efficient and robust rawiso interface. | ||
55 | Affected are applications which use the deprecated part of libraw1394 | ||
56 | (raw1394_iso_write, raw1394_start_iso_write, raw1394_start_iso_rcv, | ||
57 | raw1394_stop_iso_rcv) or bypass libraw1394. | ||
58 | Who: Dan Dennedy <dan@dennedy.org>, Stefan Richter <stefanr@s5r6.in-berlin.de> | ||
59 | |||
60 | --------------------------- | ||
61 | |||
62 | What: old NCR53C9x driver | 52 | What: old NCR53C9x driver |
63 | When: October 2007 | 53 | When: October 2007 |
64 | Why: Replaced by the much better esp_scsi driver. Actual low-level | 54 | Why: Replaced by the much better esp_scsi driver. Actual low-level |
@@ -258,14 +248,6 @@ Who: Len Brown <len.brown@intel.com> | |||
258 | 248 | ||
259 | --------------------------- | 249 | --------------------------- |
260 | 250 | ||
261 | What: sk98lin network driver | ||
262 | When: July 2007 | ||
263 | Why: In kernel tree version of driver is unmaintained. Sk98lin driver | ||
264 | replaced by the skge driver. | ||
265 | Who: Stephen Hemminger <shemminger@osdl.org> | ||
266 | |||
267 | --------------------------- | ||
268 | |||
269 | What: Compaq touchscreen device emulation | 251 | What: Compaq touchscreen device emulation |
270 | When: Oct 2007 | 252 | When: Oct 2007 |
271 | Files: drivers/input/tsdev.c | 253 | Files: drivers/input/tsdev.c |
@@ -280,25 +262,6 @@ Who: Richard Purdie <rpurdie@rpsys.net> | |||
280 | 262 | ||
281 | --------------------------- | 263 | --------------------------- |
282 | 264 | ||
283 | What: Multipath cached routing support in ipv4 | ||
284 | When: in 2.6.23 | ||
285 | Why: Code was merged, then submitter immediately disappeared leaving | ||
286 | us with no maintainer and lots of bugs. The code should not have | ||
287 | been merged in the first place, and many aspects of it's | ||
288 | implementation are blocking more critical core networking | ||
289 | development. It's marked EXPERIMENTAL and no distribution | ||
290 | enables it because it cause obscure crashes due to unfixable bugs | ||
291 | (interfaces don't return errors so memory allocation can't be | ||
292 | handled, calling contexts of these interfaces make handling | ||
293 | errors impossible too because they get called after we've | ||
294 | totally commited to creating a route object, for example). | ||
295 | This problem has existed for years and no forward progress | ||
296 | has ever been made, and nobody steps up to try and salvage | ||
297 | this code, so we're going to finally just get rid of it. | ||
298 | Who: David S. Miller <davem@davemloft.net> | ||
299 | |||
300 | --------------------------- | ||
301 | |||
302 | What: read_dev_chars(), read_conf_data{,_lpm}() (s390 common I/O layer) | 265 | What: read_dev_chars(), read_conf_data{,_lpm}() (s390 common I/O layer) |
303 | When: December 2007 | 266 | When: December 2007 |
304 | Why: These functions are a leftover from 2.4 times. They have several | 267 | Why: These functions are a leftover from 2.4 times. They have several |
@@ -348,3 +311,18 @@ Who: Tejun Heo <htejun@gmail.com> | |||
348 | 311 | ||
349 | --------------------------- | 312 | --------------------------- |
350 | 313 | ||
314 | What: Legacy RTC drivers (under drivers/i2c/chips) | ||
315 | When: November 2007 | ||
316 | Why: Obsolete. We have a RTC subsystem with better drivers. | ||
317 | Who: Jean Delvare <khali@linux-fr.org> | ||
318 | |||
319 | --------------------------- | ||
320 | |||
321 | What: iptables SAME target | ||
322 | When: 1.1. 2008 | ||
323 | Files: net/ipv4/netfilter/ipt_SAME.c, include/linux/netfilter_ipv4/ipt_SAME.h | ||
324 | Why: Obsolete for multiple years now, NAT core provides the same behaviour. | ||
325 | Unfixable broken wrt. 32/64 bit cleanness. | ||
326 | Who: Patrick McHardy <kaber@trash.net> | ||
327 | |||
328 | --------------------------- | ||
diff --git a/Documentation/firmware_class/firmware_sample_firmware_class.c b/Documentation/firmware_class/firmware_sample_firmware_class.c index 4994f1f28f8c..fba943aacf93 100644 --- a/Documentation/firmware_class/firmware_sample_firmware_class.c +++ b/Documentation/firmware_class/firmware_sample_firmware_class.c | |||
@@ -78,6 +78,7 @@ static CLASS_DEVICE_ATTR(loading, 0644, | |||
78 | firmware_loading_show, firmware_loading_store); | 78 | firmware_loading_show, firmware_loading_store); |
79 | 79 | ||
80 | static ssize_t firmware_data_read(struct kobject *kobj, | 80 | static ssize_t firmware_data_read(struct kobject *kobj, |
81 | struct bin_attribute *bin_attr, | ||
81 | char *buffer, loff_t offset, size_t count) | 82 | char *buffer, loff_t offset, size_t count) |
82 | { | 83 | { |
83 | struct class_device *class_dev = to_class_dev(kobj); | 84 | struct class_device *class_dev = to_class_dev(kobj); |
@@ -88,6 +89,7 @@ static ssize_t firmware_data_read(struct kobject *kobj, | |||
88 | return count; | 89 | return count; |
89 | } | 90 | } |
90 | static ssize_t firmware_data_write(struct kobject *kobj, | 91 | static ssize_t firmware_data_write(struct kobject *kobj, |
92 | struct bin_attribute *bin_attr, | ||
91 | char *buffer, loff_t offset, size_t count) | 93 | char *buffer, loff_t offset, size_t count) |
92 | { | 94 | { |
93 | struct class_device *class_dev = to_class_dev(kobj); | 95 | struct class_device *class_dev = to_class_dev(kobj); |
diff --git a/Documentation/i2c/busses/i2c-i801 b/Documentation/i2c/busses/i2c-i801 index c34f0db78a30..fe6406f2f9a6 100644 --- a/Documentation/i2c/busses/i2c-i801 +++ b/Documentation/i2c/busses/i2c-i801 | |||
@@ -5,8 +5,8 @@ Supported adapters: | |||
5 | '810' and '810E' chipsets) | 5 | '810' and '810E' chipsets) |
6 | * Intel 82801BA (ICH2 - part of the '815E' chipset) | 6 | * Intel 82801BA (ICH2 - part of the '815E' chipset) |
7 | * Intel 82801CA/CAM (ICH3) | 7 | * Intel 82801CA/CAM (ICH3) |
8 | * Intel 82801DB (ICH4) (HW PEC supported, 32 byte buffer not supported) | 8 | * Intel 82801DB (ICH4) (HW PEC supported) |
9 | * Intel 82801EB/ER (ICH5) (HW PEC supported, 32 byte buffer not supported) | 9 | * Intel 82801EB/ER (ICH5) (HW PEC supported) |
10 | * Intel 6300ESB | 10 | * Intel 6300ESB |
11 | * Intel 82801FB/FR/FW/FRW (ICH6) | 11 | * Intel 82801FB/FR/FW/FRW (ICH6) |
12 | * Intel 82801G (ICH7) | 12 | * Intel 82801G (ICH7) |
diff --git a/Documentation/i2c/busses/i2c-piix4 b/Documentation/i2c/busses/i2c-piix4 index 7cbe43fa2701..fa0c786a8bf5 100644 --- a/Documentation/i2c/busses/i2c-piix4 +++ b/Documentation/i2c/busses/i2c-piix4 | |||
@@ -6,7 +6,7 @@ Supported adapters: | |||
6 | Datasheet: Publicly available at the Intel website | 6 | Datasheet: Publicly available at the Intel website |
7 | * ServerWorks OSB4, CSB5, CSB6 and HT-1000 southbridges | 7 | * ServerWorks OSB4, CSB5, CSB6 and HT-1000 southbridges |
8 | Datasheet: Only available via NDA from ServerWorks | 8 | Datasheet: Only available via NDA from ServerWorks |
9 | * ATI IXP200, IXP300, IXP400 and SB600 southbridges | 9 | * ATI IXP200, IXP300, IXP400, SB600 and SB700 southbridges |
10 | Datasheet: Not publicly available | 10 | Datasheet: Not publicly available |
11 | * Standard Microsystems (SMSC) SLC90E66 (Victory66) southbridge | 11 | * Standard Microsystems (SMSC) SLC90E66 (Victory66) southbridge |
12 | Datasheet: Publicly available at the SMSC website http://www.smsc.com | 12 | Datasheet: Publicly available at the SMSC website http://www.smsc.com |
diff --git a/Documentation/i2c/busses/i2c-taos-evm b/Documentation/i2c/busses/i2c-taos-evm new file mode 100644 index 000000000000..9146e33be6dd --- /dev/null +++ b/Documentation/i2c/busses/i2c-taos-evm | |||
@@ -0,0 +1,46 @@ | |||
1 | Kernel driver i2c-taos-evm | ||
2 | |||
3 | Author: Jean Delvare <khali@linux-fr.org> | ||
4 | |||
5 | This is a driver for the evaluation modules for TAOS I2C/SMBus chips. | ||
6 | The modules include an SMBus master with limited capabilities, which can | ||
7 | be controlled over the serial port. Virtually all evaluation modules | ||
8 | are supported, but a few lines of code need to be added for each new | ||
9 | module to instantiate the right I2C chip on the bus. Obviously, a driver | ||
10 | for the chip in question is also needed. | ||
11 | |||
12 | Currently supported devices are: | ||
13 | |||
14 | * TAOS TSL2550 EVM | ||
15 | |||
16 | For addtional information on TAOS products, please see | ||
17 | http://www.taosinc.com/ | ||
18 | |||
19 | |||
20 | Using this driver | ||
21 | ----------------- | ||
22 | |||
23 | In order to use this driver, you'll need the serport driver, and the | ||
24 | inputattach tool, which is part of the input-utils package. The following | ||
25 | commands will tell the kernel that you have a TAOS EVM on the first | ||
26 | serial port: | ||
27 | |||
28 | # modprobe serport | ||
29 | # inputattach --taos-evm /dev/ttyS0 | ||
30 | |||
31 | |||
32 | Technical details | ||
33 | ----------------- | ||
34 | |||
35 | Only 4 SMBus transaction types are supported by the TAOS evaluation | ||
36 | modules: | ||
37 | * Receive Byte | ||
38 | * Send Byte | ||
39 | * Read Byte | ||
40 | * Write Byte | ||
41 | |||
42 | The communication protocol is text-based and pretty simple. It is | ||
43 | described in a PDF document on the CD which comes with the evaluation | ||
44 | module. The communication is rather slow, because the serial port has | ||
45 | to operate at 1200 bps. However, I don't think this is a big concern in | ||
46 | practice, as these modules are meant for evaluation and testing only. | ||
diff --git a/Documentation/i2c/chips/max6875 b/Documentation/i2c/chips/max6875 index 96fec562a8e9..a0cd8af2f408 100644 --- a/Documentation/i2c/chips/max6875 +++ b/Documentation/i2c/chips/max6875 | |||
@@ -99,7 +99,7 @@ And then read the data | |||
99 | 99 | ||
100 | or | 100 | or |
101 | 101 | ||
102 | count = i2c_smbus_read_i2c_block_data(fd, 0x84, buffer); | 102 | count = i2c_smbus_read_i2c_block_data(fd, 0x84, 16, buffer); |
103 | 103 | ||
104 | The block read should read 16 bytes. | 104 | The block read should read 16 bytes. |
105 | 0x84 is the block read command. | 105 | 0x84 is the block read command. |
diff --git a/Documentation/i2c/chips/x1205 b/Documentation/i2c/chips/x1205 deleted file mode 100644 index 09407c991fe5..000000000000 --- a/Documentation/i2c/chips/x1205 +++ /dev/null | |||
@@ -1,38 +0,0 @@ | |||
1 | Kernel driver x1205 | ||
2 | =================== | ||
3 | |||
4 | Supported chips: | ||
5 | * Xicor X1205 RTC | ||
6 | Prefix: 'x1205' | ||
7 | Addresses scanned: none | ||
8 | Datasheet: http://www.intersil.com/cda/deviceinfo/0,1477,X1205,00.html | ||
9 | |||
10 | Authors: | ||
11 | Karen Spearel <kas11@tampabay.rr.com>, | ||
12 | Alessandro Zummo <a.zummo@towertech.it> | ||
13 | |||
14 | Description | ||
15 | ----------- | ||
16 | |||
17 | This module aims to provide complete access to the Xicor X1205 RTC. | ||
18 | Recently Xicor has merged with Intersil, but the chip is | ||
19 | still sold under the Xicor brand. | ||
20 | |||
21 | This chip is located at address 0x6f and uses a 2-byte register addressing. | ||
22 | Two bytes need to be written to read a single register, while most | ||
23 | other chips just require one and take the second one as the data | ||
24 | to be written. To prevent corrupting unknown chips, the user must | ||
25 | explicitely set the probe parameter. | ||
26 | |||
27 | example: | ||
28 | |||
29 | modprobe x1205 probe=0,0x6f | ||
30 | |||
31 | The module supports one more option, hctosys, which is used to set the | ||
32 | software clock from the x1205. On systems where the x1205 is the | ||
33 | only hardware rtc, this parameter could be used to achieve a correct | ||
34 | date/time earlier in the system boot sequence. | ||
35 | |||
36 | example: | ||
37 | |||
38 | modprobe x1205 probe=0,0x6f hctosys=1 | ||
diff --git a/Documentation/i2c/summary b/Documentation/i2c/summary index aea60bf7e8f0..003c7319b8c7 100644 --- a/Documentation/i2c/summary +++ b/Documentation/i2c/summary | |||
@@ -67,7 +67,6 @@ i2c-proc: The /proc/sys/dev/sensors interface for device (client) drivers | |||
67 | Algorithm drivers | 67 | Algorithm drivers |
68 | ----------------- | 68 | ----------------- |
69 | 69 | ||
70 | i2c-algo-8xx: An algorithm for CPM's I2C device in Motorola 8xx processors (NOT BUILT BY DEFAULT) | ||
71 | i2c-algo-bit: A bit-banging algorithm | 70 | i2c-algo-bit: A bit-banging algorithm |
72 | i2c-algo-pcf: A PCF 8584 style algorithm | 71 | i2c-algo-pcf: A PCF 8584 style algorithm |
73 | i2c-algo-ibm_ocp: An algorithm for the I2C device in IBM 4xx processors (NOT BUILT BY DEFAULT) | 72 | i2c-algo-ibm_ocp: An algorithm for the I2C device in IBM 4xx processors (NOT BUILT BY DEFAULT) |
@@ -81,6 +80,5 @@ i2c-pcf-epp: PCF8584 on a EPP parallel port (uses i2c-algo-pcf) (NOT mkpatch | |||
81 | i2c-philips-par: Philips style parallel port adapter (uses i2c-algo-bit) | 80 | i2c-philips-par: Philips style parallel port adapter (uses i2c-algo-bit) |
82 | i2c-adap-ibm_ocp: IBM 4xx processor I2C device (uses i2c-algo-ibm_ocp) (NOT BUILT BY DEFAULT) | 81 | i2c-adap-ibm_ocp: IBM 4xx processor I2C device (uses i2c-algo-ibm_ocp) (NOT BUILT BY DEFAULT) |
83 | i2c-pport: Primitive parallel port adapter (uses i2c-algo-bit) | 82 | i2c-pport: Primitive parallel port adapter (uses i2c-algo-bit) |
84 | i2c-rpx: RPX board Motorola 8xx I2C device (uses i2c-algo-8xx) (NOT BUILT BY DEFAULT) | ||
85 | i2c-velleman: Velleman K8000 parallel port adapter (uses i2c-algo-bit) | 83 | i2c-velleman: Velleman K8000 parallel port adapter (uses i2c-algo-bit) |
86 | 84 | ||
diff --git a/Documentation/i2c/writing-clients b/Documentation/i2c/writing-clients index 3d8d36b0ad12..2c170032bf37 100644 --- a/Documentation/i2c/writing-clients +++ b/Documentation/i2c/writing-clients | |||
@@ -571,7 +571,7 @@ SMBus communication | |||
571 | u8 command, u8 length, | 571 | u8 command, u8 length, |
572 | u8 *values); | 572 | u8 *values); |
573 | extern s32 i2c_smbus_read_i2c_block_data(struct i2c_client * client, | 573 | extern s32 i2c_smbus_read_i2c_block_data(struct i2c_client * client, |
574 | u8 command, u8 *values); | 574 | u8 command, u8 length, u8 *values); |
575 | 575 | ||
576 | These ones were removed in Linux 2.6.10 because they had no users, but could | 576 | These ones were removed in Linux 2.6.10 because they had no users, but could |
577 | be added back later if needed: | 577 | be added back later if needed: |
diff --git a/Documentation/i386/zero-page.txt b/Documentation/i386/zero-page.txt index c04a421f4a7c..75b3680c41eb 100644 --- a/Documentation/i386/zero-page.txt +++ b/Documentation/i386/zero-page.txt | |||
@@ -37,6 +37,7 @@ Offset Type Description | |||
37 | 0x1d0 unsigned long EFI memory descriptor map pointer | 37 | 0x1d0 unsigned long EFI memory descriptor map pointer |
38 | 0x1d4 unsigned long EFI memory descriptor map size | 38 | 0x1d4 unsigned long EFI memory descriptor map size |
39 | 0x1e0 unsigned long ALT_MEM_K, alternative mem check, in Kb | 39 | 0x1e0 unsigned long ALT_MEM_K, alternative mem check, in Kb |
40 | 0x1e4 unsigned long Scratch field for the kernel setup code | ||
40 | 0x1e8 char number of entries in E820MAP (below) | 41 | 0x1e8 char number of entries in E820MAP (below) |
41 | 0x1e9 unsigned char number of entries in EDDBUF (below) | 42 | 0x1e9 unsigned char number of entries in EDDBUF (below) |
42 | 0x1ea unsigned char number of entries in EDD_MBR_SIG_BUFFER (below) | 43 | 0x1ea unsigned char number of entries in EDD_MBR_SIG_BUFFER (below) |
diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt index af50f9bbe68e..4d880b3d1f35 100644 --- a/Documentation/kernel-parameters.txt +++ b/Documentation/kernel-parameters.txt | |||
@@ -1014,49 +1014,6 @@ and is between 256 and 4096 characters. It is defined in the file | |||
1014 | 1014 | ||
1015 | mga= [HW,DRM] | 1015 | mga= [HW,DRM] |
1016 | 1016 | ||
1017 | migration_cost= | ||
1018 | [KNL,SMP] debug: override scheduler migration costs | ||
1019 | Format: <level-1-usecs>,<level-2-usecs>,... | ||
1020 | This debugging option can be used to override the | ||
1021 | default scheduler migration cost matrix. The numbers | ||
1022 | are indexed by 'CPU domain distance'. | ||
1023 | E.g. migration_cost=1000,2000,3000 on an SMT NUMA | ||
1024 | box will set up an intra-core migration cost of | ||
1025 | 1 msec, an inter-core migration cost of 2 msecs, | ||
1026 | and an inter-node migration cost of 3 msecs. | ||
1027 | |||
1028 | WARNING: using the wrong values here can break | ||
1029 | scheduler performance, so it's only for scheduler | ||
1030 | development purposes, not production environments. | ||
1031 | |||
1032 | migration_debug= | ||
1033 | [KNL,SMP] migration cost auto-detect verbosity | ||
1034 | Format=<0|1|2> | ||
1035 | If a system's migration matrix reported at bootup | ||
1036 | seems erroneous then this option can be used to | ||
1037 | increase verbosity of the detection process. | ||
1038 | We default to 0 (no extra messages), 1 will print | ||
1039 | some more information, and 2 will be really | ||
1040 | verbose (probably only useful if you also have a | ||
1041 | serial console attached to the system). | ||
1042 | |||
1043 | migration_factor= | ||
1044 | [KNL,SMP] multiply/divide migration costs by a factor | ||
1045 | Format=<percent> | ||
1046 | This debug option can be used to proportionally | ||
1047 | increase or decrease the auto-detected migration | ||
1048 | costs for all entries of the migration matrix. | ||
1049 | E.g. migration_factor=150 will increase migration | ||
1050 | costs by 50%. (and thus the scheduler will be less | ||
1051 | eager migrating cache-hot tasks) | ||
1052 | migration_factor=80 will decrease migration costs | ||
1053 | by 20%. (thus the scheduler will be more eager to | ||
1054 | migrate tasks) | ||
1055 | |||
1056 | WARNING: using the wrong values here can break | ||
1057 | scheduler performance, so it's only for scheduler | ||
1058 | development purposes, not production environments. | ||
1059 | |||
1060 | mousedev.tap_time= | 1017 | mousedev.tap_time= |
1061 | [MOUSE] Maximum time between finger touching and | 1018 | [MOUSE] Maximum time between finger touching and |
1062 | leaving touchpad surface for touch to be considered | 1019 | leaving touchpad surface for touch to be considered |
diff --git a/Documentation/networking/00-INDEX b/Documentation/networking/00-INDEX index 153d84d281e6..d63f480afb74 100644 --- a/Documentation/networking/00-INDEX +++ b/Documentation/networking/00-INDEX | |||
@@ -96,9 +96,6 @@ routing.txt | |||
96 | - the new routing mechanism | 96 | - the new routing mechanism |
97 | shaper.txt | 97 | shaper.txt |
98 | - info on the module that can shape/limit transmitted traffic. | 98 | - info on the module that can shape/limit transmitted traffic. |
99 | sk98lin.txt | ||
100 | - Marvell Yukon Chipset / SysKonnect SK-98xx compliant Gigabit | ||
101 | Ethernet Adapter family driver info | ||
102 | skfp.txt | 99 | skfp.txt |
103 | - SysKonnect FDDI (SK-5xxx, Compaq Netelligent) driver info. | 100 | - SysKonnect FDDI (SK-5xxx, Compaq Netelligent) driver info. |
104 | smc9.txt | 101 | smc9.txt |
diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt index af6a63ab9026..09c184e41cf8 100644 --- a/Documentation/networking/ip-sysctl.txt +++ b/Documentation/networking/ip-sysctl.txt | |||
@@ -874,8 +874,7 @@ accept_redirects - BOOLEAN | |||
874 | accept_source_route - INTEGER | 874 | accept_source_route - INTEGER |
875 | Accept source routing (routing extension header). | 875 | Accept source routing (routing extension header). |
876 | 876 | ||
877 | > 0: Accept routing header. | 877 | >= 0: Accept only routing header type 2. |
878 | = 0: Accept only routing header type 2. | ||
879 | < 0: Do not accept routing header. | 878 | < 0: Do not accept routing header. |
880 | 879 | ||
881 | Default: 0 | 880 | Default: 0 |
diff --git a/Documentation/networking/l2tp.txt b/Documentation/networking/l2tp.txt new file mode 100644 index 000000000000..2451f551c505 --- /dev/null +++ b/Documentation/networking/l2tp.txt | |||
@@ -0,0 +1,169 @@ | |||
1 | This brief document describes how to use the kernel's PPPoL2TP driver | ||
2 | to provide L2TP functionality. L2TP is a protocol that tunnels one or | ||
3 | more PPP sessions over a UDP tunnel. It is commonly used for VPNs | ||
4 | (L2TP/IPSec) and by ISPs to tunnel subscriber PPP sessions over an IP | ||
5 | network infrastructure. | ||
6 | |||
7 | Design | ||
8 | ====== | ||
9 | |||
10 | The PPPoL2TP driver, drivers/net/pppol2tp.c, provides a mechanism by | ||
11 | which PPP frames carried through an L2TP session are passed through | ||
12 | the kernel's PPP subsystem. The standard PPP daemon, pppd, handles all | ||
13 | PPP interaction with the peer. PPP network interfaces are created for | ||
14 | each local PPP endpoint. | ||
15 | |||
16 | The L2TP protocol http://www.faqs.org/rfcs/rfc2661.html defines L2TP | ||
17 | control and data frames. L2TP control frames carry messages between | ||
18 | L2TP clients/servers and are used to setup / teardown tunnels and | ||
19 | sessions. An L2TP client or server is implemented in userspace and | ||
20 | will use a regular UDP socket per tunnel. L2TP data frames carry PPP | ||
21 | frames, which may be PPP control or PPP data. The kernel's PPP | ||
22 | subsystem arranges for PPP control frames to be delivered to pppd, | ||
23 | while data frames are forwarded as usual. | ||
24 | |||
25 | Each tunnel and session within a tunnel is assigned a unique tunnel_id | ||
26 | and session_id. These ids are carried in the L2TP header of every | ||
27 | control and data packet. The pppol2tp driver uses them to lookup | ||
28 | internal tunnel and/or session contexts. Zero tunnel / session ids are | ||
29 | treated specially - zero ids are never assigned to tunnels or sessions | ||
30 | in the network. In the driver, the tunnel context keeps a pointer to | ||
31 | the tunnel UDP socket. The session context keeps a pointer to the | ||
32 | PPPoL2TP socket, as well as other data that lets the driver interface | ||
33 | to the kernel PPP subsystem. | ||
34 | |||
35 | Note that the pppol2tp kernel driver handles only L2TP data frames; | ||
36 | L2TP control frames are simply passed up to userspace in the UDP | ||
37 | tunnel socket. The kernel handles all datapath aspects of the | ||
38 | protocol, including data packet resequencing (if enabled). | ||
39 | |||
40 | There are a number of requirements on the userspace L2TP daemon in | ||
41 | order to use the pppol2tp driver. | ||
42 | |||
43 | 1. Use a UDP socket per tunnel. | ||
44 | |||
45 | 2. Create a single PPPoL2TP socket per tunnel bound to a special null | ||
46 | session id. This is used only for communicating with the driver but | ||
47 | must remain open while the tunnel is active. Opening this tunnel | ||
48 | management socket causes the driver to mark the tunnel socket as an | ||
49 | L2TP UDP encapsulation socket and flags it for use by the | ||
50 | referenced tunnel id. This hooks up the UDP receive path via | ||
51 | udp_encap_rcv() in net/ipv4/udp.c. PPP data frames are never passed | ||
52 | in this special PPPoX socket. | ||
53 | |||
54 | 3. Create a PPPoL2TP socket per L2TP session. This is typically done | ||
55 | by starting pppd with the pppol2tp plugin and appropriate | ||
56 | arguments. A PPPoL2TP tunnel management socket (Step 2) must be | ||
57 | created before the first PPPoL2TP session socket is created. | ||
58 | |||
59 | When creating PPPoL2TP sockets, the application provides information | ||
60 | to the driver about the socket in a socket connect() call. Source and | ||
61 | destination tunnel and session ids are provided, as well as the file | ||
62 | descriptor of a UDP socket. See struct pppol2tp_addr in | ||
63 | include/linux/if_ppp.h. Note that zero tunnel / session ids are | ||
64 | treated specially. When creating the per-tunnel PPPoL2TP management | ||
65 | socket in Step 2 above, zero source and destination session ids are | ||
66 | specified, which tells the driver to prepare the supplied UDP file | ||
67 | descriptor for use as an L2TP tunnel socket. | ||
68 | |||
69 | Userspace may control behavior of the tunnel or session using | ||
70 | setsockopt and ioctl on the PPPoX socket. The following socket | ||
71 | options are supported:- | ||
72 | |||
73 | DEBUG - bitmask of debug message categories. See below. | ||
74 | SENDSEQ - 0 => don't send packets with sequence numbers | ||
75 | 1 => send packets with sequence numbers | ||
76 | RECVSEQ - 0 => receive packet sequence numbers are optional | ||
77 | 1 => drop receive packets without sequence numbers | ||
78 | LNSMODE - 0 => act as LAC. | ||
79 | 1 => act as LNS. | ||
80 | REORDERTO - reorder timeout (in millisecs). If 0, don't try to reorder. | ||
81 | |||
82 | Only the DEBUG option is supported by the special tunnel management | ||
83 | PPPoX socket. | ||
84 | |||
85 | In addition to the standard PPP ioctls, a PPPIOCGL2TPSTATS is provided | ||
86 | to retrieve tunnel and session statistics from the kernel using the | ||
87 | PPPoX socket of the appropriate tunnel or session. | ||
88 | |||
89 | Debugging | ||
90 | ========= | ||
91 | |||
92 | The driver supports a flexible debug scheme where kernel trace | ||
93 | messages may be optionally enabled per tunnel and per session. Care is | ||
94 | needed when debugging a live system since the messages are not | ||
95 | rate-limited and a busy system could be swamped. Userspace uses | ||
96 | setsockopt on the PPPoX socket to set a debug mask. | ||
97 | |||
98 | The following debug mask bits are available: | ||
99 | |||
100 | PPPOL2TP_MSG_DEBUG verbose debug (if compiled in) | ||
101 | PPPOL2TP_MSG_CONTROL userspace - kernel interface | ||
102 | PPPOL2TP_MSG_SEQ sequence numbers handling | ||
103 | PPPOL2TP_MSG_DATA data packets | ||
104 | |||
105 | Sample Userspace Code | ||
106 | ===================== | ||
107 | |||
108 | 1. Create tunnel management PPPoX socket | ||
109 | |||
110 | kernel_fd = socket(AF_PPPOX, SOCK_DGRAM, PX_PROTO_OL2TP); | ||
111 | if (kernel_fd >= 0) { | ||
112 | struct sockaddr_pppol2tp sax; | ||
113 | struct sockaddr_in const *peer_addr; | ||
114 | |||
115 | peer_addr = l2tp_tunnel_get_peer_addr(tunnel); | ||
116 | memset(&sax, 0, sizeof(sax)); | ||
117 | sax.sa_family = AF_PPPOX; | ||
118 | sax.sa_protocol = PX_PROTO_OL2TP; | ||
119 | sax.pppol2tp.fd = udp_fd; /* fd of tunnel UDP socket */ | ||
120 | sax.pppol2tp.addr.sin_addr.s_addr = peer_addr->sin_addr.s_addr; | ||
121 | sax.pppol2tp.addr.sin_port = peer_addr->sin_port; | ||
122 | sax.pppol2tp.addr.sin_family = AF_INET; | ||
123 | sax.pppol2tp.s_tunnel = tunnel_id; | ||
124 | sax.pppol2tp.s_session = 0; /* special case: mgmt socket */ | ||
125 | sax.pppol2tp.d_tunnel = 0; | ||
126 | sax.pppol2tp.d_session = 0; /* special case: mgmt socket */ | ||
127 | |||
128 | if(connect(kernel_fd, (struct sockaddr *)&sax, sizeof(sax) ) < 0 ) { | ||
129 | perror("connect failed"); | ||
130 | result = -errno; | ||
131 | goto err; | ||
132 | } | ||
133 | } | ||
134 | |||
135 | 2. Create session PPPoX data socket | ||
136 | |||
137 | struct sockaddr_pppol2tp sax; | ||
138 | int fd; | ||
139 | |||
140 | /* Note, the target socket must be bound already, else it will not be ready */ | ||
141 | sax.sa_family = AF_PPPOX; | ||
142 | sax.sa_protocol = PX_PROTO_OL2TP; | ||
143 | sax.pppol2tp.fd = tunnel_fd; | ||
144 | sax.pppol2tp.addr.sin_addr.s_addr = addr->sin_addr.s_addr; | ||
145 | sax.pppol2tp.addr.sin_port = addr->sin_port; | ||
146 | sax.pppol2tp.addr.sin_family = AF_INET; | ||
147 | sax.pppol2tp.s_tunnel = tunnel_id; | ||
148 | sax.pppol2tp.s_session = session_id; | ||
149 | sax.pppol2tp.d_tunnel = peer_tunnel_id; | ||
150 | sax.pppol2tp.d_session = peer_session_id; | ||
151 | |||
152 | /* session_fd is the fd of the session's PPPoL2TP socket. | ||
153 | * tunnel_fd is the fd of the tunnel UDP socket. | ||
154 | */ | ||
155 | fd = connect(session_fd, (struct sockaddr *)&sax, sizeof(sax)); | ||
156 | if (fd < 0 ) { | ||
157 | return -errno; | ||
158 | } | ||
159 | return 0; | ||
160 | |||
161 | Miscellanous | ||
162 | ============ | ||
163 | |||
164 | The PPPoL2TP driver was developed as part of the OpenL2TP project by | ||
165 | Katalix Systems Ltd. OpenL2TP is a full-featured L2TP client / server, | ||
166 | designed from the ground up to have the L2TP datapath in the | ||
167 | kernel. The project also implemented the pppol2tp plugin for pppd | ||
168 | which allows pppd to use the kernel driver. Details can be found at | ||
169 | http://openl2tp.sourceforge.net. | ||
diff --git a/Documentation/networking/multiqueue.txt b/Documentation/networking/multiqueue.txt new file mode 100644 index 000000000000..00b60cce2224 --- /dev/null +++ b/Documentation/networking/multiqueue.txt | |||
@@ -0,0 +1,111 @@ | |||
1 | |||
2 | HOWTO for multiqueue network device support | ||
3 | =========================================== | ||
4 | |||
5 | Section 1: Base driver requirements for implementing multiqueue support | ||
6 | Section 2: Qdisc support for multiqueue devices | ||
7 | Section 3: Brief howto using PRIO or RR for multiqueue devices | ||
8 | |||
9 | |||
10 | Intro: Kernel support for multiqueue devices | ||
11 | --------------------------------------------------------- | ||
12 | |||
13 | Kernel support for multiqueue devices is only an API that is presented to the | ||
14 | netdevice layer for base drivers to implement. This feature is part of the | ||
15 | core networking stack, and all network devices will be running on the | ||
16 | multiqueue-aware stack. If a base driver only has one queue, then these | ||
17 | changes are transparent to that driver. | ||
18 | |||
19 | |||
20 | Section 1: Base driver requirements for implementing multiqueue support | ||
21 | ----------------------------------------------------------------------- | ||
22 | |||
23 | Base drivers are required to use the new alloc_etherdev_mq() or | ||
24 | alloc_netdev_mq() functions to allocate the subqueues for the device. The | ||
25 | underlying kernel API will take care of the allocation and deallocation of | ||
26 | the subqueue memory, as well as netdev configuration of where the queues | ||
27 | exist in memory. | ||
28 | |||
29 | The base driver will also need to manage the queues as it does the global | ||
30 | netdev->queue_lock today. Therefore base drivers should use the | ||
31 | netif_{start|stop|wake}_subqueue() functions to manage each queue while the | ||
32 | device is still operational. netdev->queue_lock is still used when the device | ||
33 | comes online or when it's completely shut down (unregister_netdev(), etc.). | ||
34 | |||
35 | Finally, the base driver should indicate that it is a multiqueue device. The | ||
36 | feature flag NETIF_F_MULTI_QUEUE should be added to the netdev->features | ||
37 | bitmap on device initialization. Below is an example from e1000: | ||
38 | |||
39 | #ifdef CONFIG_E1000_MQ | ||
40 | if ( (adapter->hw.mac.type == e1000_82571) || | ||
41 | (adapter->hw.mac.type == e1000_82572) || | ||
42 | (adapter->hw.mac.type == e1000_80003es2lan)) | ||
43 | netdev->features |= NETIF_F_MULTI_QUEUE; | ||
44 | #endif | ||
45 | |||
46 | |||
47 | Section 2: Qdisc support for multiqueue devices | ||
48 | ----------------------------------------------- | ||
49 | |||
50 | Currently two qdiscs support multiqueue devices. A new round-robin qdisc, | ||
51 | sch_rr, and sch_prio. The qdisc is responsible for classifying the skb's to | ||
52 | bands and queues, and will store the queue mapping into skb->queue_mapping. | ||
53 | Use this field in the base driver to determine which queue to send the skb | ||
54 | to. | ||
55 | |||
56 | sch_rr has been added for hardware that doesn't want scheduling policies from | ||
57 | software, so it's a straight round-robin qdisc. It uses the same syntax and | ||
58 | classification priomap that sch_prio uses, so it should be intuitive to | ||
59 | configure for people who've used sch_prio. | ||
60 | |||
61 | The PRIO qdisc naturally plugs into a multiqueue device. If PRIO has been | ||
62 | built with NET_SCH_PRIO_MQ, then upon load, it will make sure the number of | ||
63 | bands requested is equal to the number of queues on the hardware. If they | ||
64 | are equal, it sets a one-to-one mapping up between the queues and bands. If | ||
65 | they're not equal, it will not load the qdisc. This is the same behavior | ||
66 | for RR. Once the association is made, any skb that is classified will have | ||
67 | skb->queue_mapping set, which will allow the driver to properly queue skb's | ||
68 | to multiple queues. | ||
69 | |||
70 | |||
71 | Section 3: Brief howto using PRIO and RR for multiqueue devices | ||
72 | --------------------------------------------------------------- | ||
73 | |||
74 | The userspace command 'tc,' part of the iproute2 package, is used to configure | ||
75 | qdiscs. To add the PRIO qdisc to your network device, assuming the device is | ||
76 | called eth0, run the following command: | ||
77 | |||
78 | # tc qdisc add dev eth0 root handle 1: prio bands 4 multiqueue | ||
79 | |||
80 | This will create 4 bands, 0 being highest priority, and associate those bands | ||
81 | to the queues on your NIC. Assuming eth0 has 4 Tx queues, the band mapping | ||
82 | would look like: | ||
83 | |||
84 | band 0 => queue 0 | ||
85 | band 1 => queue 1 | ||
86 | band 2 => queue 2 | ||
87 | band 3 => queue 3 | ||
88 | |||
89 | Traffic will begin flowing through each queue if your TOS values are assigning | ||
90 | traffic across the various bands. For example, ssh traffic will always try to | ||
91 | go out band 0 based on TOS -> Linux priority conversion (realtime traffic), | ||
92 | so it will be sent out queue 0. ICMP traffic (pings) fall into the "normal" | ||
93 | traffic classification, which is band 1. Therefore pings will be send out | ||
94 | queue 1 on the NIC. | ||
95 | |||
96 | Note the use of the multiqueue keyword. This is only in versions of iproute2 | ||
97 | that support multiqueue networking devices; if this is omitted when loading | ||
98 | a qdisc onto a multiqueue device, the qdisc will load and operate the same | ||
99 | if it were loaded onto a single-queue device (i.e. - sends all traffic to | ||
100 | queue 0). | ||
101 | |||
102 | Another alternative to multiqueue band allocation can be done by using the | ||
103 | multiqueue option and specify 0 bands. If this is the case, the qdisc will | ||
104 | allocate the number of bands to equal the number of queues that the device | ||
105 | reports, and bring the qdisc online. | ||
106 | |||
107 | The behavior of tc filters remains the same, where it will override TOS priority | ||
108 | classification. | ||
109 | |||
110 | |||
111 | Author: Peter P. Waskiewicz Jr. <peter.p.waskiewicz.jr@intel.com> | ||
diff --git a/Documentation/networking/netdevices.txt b/Documentation/networking/netdevices.txt index ce1361f95243..37869295fc70 100644 --- a/Documentation/networking/netdevices.txt +++ b/Documentation/networking/netdevices.txt | |||
@@ -20,6 +20,30 @@ private data which gets freed when the network device is freed. If | |||
20 | separately allocated data is attached to the network device | 20 | separately allocated data is attached to the network device |
21 | (dev->priv) then it is up to the module exit handler to free that. | 21 | (dev->priv) then it is up to the module exit handler to free that. |
22 | 22 | ||
23 | MTU | ||
24 | === | ||
25 | Each network device has a Maximum Transfer Unit. The MTU does not | ||
26 | include any link layer protocol overhead. Upper layer protocols must | ||
27 | not pass a socket buffer (skb) to a device to transmit with more data | ||
28 | than the mtu. The MTU does not include link layer header overhead, so | ||
29 | for example on Ethernet if the standard MTU is 1500 bytes used, the | ||
30 | actual skb will contain up to 1514 bytes because of the Ethernet | ||
31 | header. Devices should allow for the 4 byte VLAN header as well. | ||
32 | |||
33 | Segmentation Offload (GSO, TSO) is an exception to this rule. The | ||
34 | upper layer protocol may pass a large socket buffer to the device | ||
35 | transmit routine, and the device will break that up into separate | ||
36 | packets based on the current MTU. | ||
37 | |||
38 | MTU is symmetrical and applies both to receive and transmit. A device | ||
39 | must be able to receive at least the maximum size packet allowed by | ||
40 | the MTU. A network device may use the MTU as mechanism to size receive | ||
41 | buffers, but the device should allow packets with VLAN header. With | ||
42 | standard Ethernet mtu of 1500 bytes, the device should allow up to | ||
43 | 1518 byte packets (1500 + 14 header + 4 tag). The device may either: | ||
44 | drop, truncate, or pass up oversize packets, but dropping oversize | ||
45 | packets is preferred. | ||
46 | |||
23 | 47 | ||
24 | struct net_device synchronization rules | 48 | struct net_device synchronization rules |
25 | ======================================= | 49 | ======================================= |
@@ -43,16 +67,17 @@ dev->get_stats: | |||
43 | 67 | ||
44 | dev->hard_start_xmit: | 68 | dev->hard_start_xmit: |
45 | Synchronization: netif_tx_lock spinlock. | 69 | Synchronization: netif_tx_lock spinlock. |
70 | |||
46 | When the driver sets NETIF_F_LLTX in dev->features this will be | 71 | When the driver sets NETIF_F_LLTX in dev->features this will be |
47 | called without holding netif_tx_lock. In this case the driver | 72 | called without holding netif_tx_lock. In this case the driver |
48 | has to lock by itself when needed. It is recommended to use a try lock | 73 | has to lock by itself when needed. It is recommended to use a try lock |
49 | for this and return -1 when the spin lock fails. | 74 | for this and return NETDEV_TX_LOCKED when the spin lock fails. |
50 | The locking there should also properly protect against | 75 | The locking there should also properly protect against |
51 | set_multicast_list | 76 | set_multicast_list. |
52 | Context: Process with BHs disabled or BH (timer). | 77 | |
53 | Notes: netif_queue_stopped() is guaranteed false | 78 | Context: Process with BHs disabled or BH (timer), |
54 | Interrupts must be enabled when calling hard_start_xmit. | 79 | will be called with interrupts disabled by netconsole. |
55 | (Interrupts must also be enabled when enabling the BH handler.) | 80 | |
56 | Return codes: | 81 | Return codes: |
57 | o NETDEV_TX_OK everything ok. | 82 | o NETDEV_TX_OK everything ok. |
58 | o NETDEV_TX_BUSY Cannot transmit packet, try later | 83 | o NETDEV_TX_BUSY Cannot transmit packet, try later |
@@ -74,4 +99,5 @@ dev->poll: | |||
74 | Synchronization: __LINK_STATE_RX_SCHED bit in dev->state. See | 99 | Synchronization: __LINK_STATE_RX_SCHED bit in dev->state. See |
75 | dev_close code and comments in net/core/dev.c for more info. | 100 | dev_close code and comments in net/core/dev.c for more info. |
76 | Context: softirq | 101 | Context: softirq |
102 | will be called with interrupts disabled by netconsole. | ||
77 | 103 | ||
diff --git a/Documentation/networking/sk98lin.txt b/Documentation/networking/sk98lin.txt deleted file mode 100644 index 8590a954df1d..000000000000 --- a/Documentation/networking/sk98lin.txt +++ /dev/null | |||
@@ -1,568 +0,0 @@ | |||
1 | (C)Copyright 1999-2004 Marvell(R). | ||
2 | All rights reserved | ||
3 | =========================================================================== | ||
4 | |||
5 | sk98lin.txt created 13-Feb-2004 | ||
6 | |||
7 | Readme File for sk98lin v6.23 | ||
8 | Marvell Yukon/SysKonnect SK-98xx Gigabit Ethernet Adapter family driver for LINUX | ||
9 | |||
10 | This file contains | ||
11 | 1 Overview | ||
12 | 2 Required Files | ||
13 | 3 Installation | ||
14 | 3.1 Driver Installation | ||
15 | 3.2 Inclusion of adapter at system start | ||
16 | 4 Driver Parameters | ||
17 | 4.1 Per-Port Parameters | ||
18 | 4.2 Adapter Parameters | ||
19 | 5 Large Frame Support | ||
20 | 6 VLAN and Link Aggregation Support (IEEE 802.1, 802.1q, 802.3ad) | ||
21 | 7 Troubleshooting | ||
22 | |||
23 | =========================================================================== | ||
24 | |||
25 | |||
26 | 1 Overview | ||
27 | =========== | ||
28 | |||
29 | The sk98lin driver supports the Marvell Yukon and SysKonnect | ||
30 | SK-98xx/SK-95xx compliant Gigabit Ethernet Adapter on Linux. It has | ||
31 | been tested with Linux on Intel/x86 machines. | ||
32 | *** | ||
33 | |||
34 | |||
35 | 2 Required Files | ||
36 | ================= | ||
37 | |||
38 | The linux kernel source. | ||
39 | No additional files required. | ||
40 | *** | ||
41 | |||
42 | |||
43 | 3 Installation | ||
44 | =============== | ||
45 | |||
46 | It is recommended to download the latest version of the driver from the | ||
47 | SysKonnect web site www.syskonnect.com. If you have downloaded the latest | ||
48 | driver, the Linux kernel has to be patched before the driver can be | ||
49 | installed. For details on how to patch a Linux kernel, refer to the | ||
50 | patch.txt file. | ||
51 | |||
52 | 3.1 Driver Installation | ||
53 | ------------------------ | ||
54 | |||
55 | The following steps describe the actions that are required to install | ||
56 | the driver and to start it manually. These steps should be carried | ||
57 | out for the initial driver setup. Once confirmed to be ok, they can | ||
58 | be included in the system start. | ||
59 | |||
60 | NOTE 1: To perform the following tasks you need 'root' access. | ||
61 | |||
62 | NOTE 2: In case of problems, please read the section "Troubleshooting" | ||
63 | below. | ||
64 | |||
65 | The driver can either be integrated into the kernel or it can be compiled | ||
66 | as a module. Select the appropriate option during the kernel | ||
67 | configuration. | ||
68 | |||
69 | Compile/use the driver as a module | ||
70 | ---------------------------------- | ||
71 | To compile the driver, go to the directory /usr/src/linux and | ||
72 | execute the command "make menuconfig" or "make xconfig" and proceed as | ||
73 | follows: | ||
74 | |||
75 | To integrate the driver permanently into the kernel, proceed as follows: | ||
76 | |||
77 | 1. Select the menu "Network device support" and then "Ethernet(1000Mbit)" | ||
78 | 2. Mark "Marvell Yukon Chipset / SysKonnect SK-98xx family support" | ||
79 | with (*) | ||
80 | 3. Build a new kernel when the configuration of the above options is | ||
81 | finished. | ||
82 | 4. Install the new kernel. | ||
83 | 5. Reboot your system. | ||
84 | |||
85 | To use the driver as a module, proceed as follows: | ||
86 | |||
87 | 1. Enable 'loadable module support' in the kernel. | ||
88 | 2. For automatic driver start, enable the 'Kernel module loader'. | ||
89 | 3. Select the menu "Network device support" and then "Ethernet(1000Mbit)" | ||
90 | 4. Mark "Marvell Yukon Chipset / SysKonnect SK-98xx family support" | ||
91 | with (M) | ||
92 | 5. Execute the command "make modules". | ||
93 | 6. Execute the command "make modules_install". | ||
94 | The appropriate modules will be installed. | ||
95 | 7. Reboot your system. | ||
96 | |||
97 | |||
98 | Load the module manually | ||
99 | ------------------------ | ||
100 | To load the module manually, proceed as follows: | ||
101 | |||
102 | 1. Enter "modprobe sk98lin". | ||
103 | 2. If a Marvell Yukon or SysKonnect SK-98xx adapter is installed in | ||
104 | your computer and you have a /proc file system, execute the command: | ||
105 | "ls /proc/net/sk98lin/" | ||
106 | This should produce an output containing a line with the following | ||
107 | format: | ||
108 | eth0 eth1 ... | ||
109 | which indicates that your adapter has been found and initialized. | ||
110 | |||
111 | NOTE 1: If you have more than one Marvell Yukon or SysKonnect SK-98xx | ||
112 | adapter installed, the adapters will be listed as 'eth0', | ||
113 | 'eth1', 'eth2', etc. | ||
114 | For each adapter, repeat steps 3 and 4 below. | ||
115 | |||
116 | NOTE 2: If you have other Ethernet adapters installed, your Marvell | ||
117 | Yukon or SysKonnect SK-98xx adapter will be mapped to the | ||
118 | next available number, e.g. 'eth1'. The mapping is executed | ||
119 | automatically. | ||
120 | The module installation message (displayed either in a system | ||
121 | log file or on the console) prints a line for each adapter | ||
122 | found containing the corresponding 'ethX'. | ||
123 | |||
124 | 3. Select an IP address and assign it to the respective adapter by | ||
125 | entering: | ||
126 | ifconfig eth0 <ip-address> | ||
127 | With this command, the adapter is connected to the Ethernet. | ||
128 | |||
129 | SK-98xx Gigabit Ethernet Server Adapters: The yellow LED on the adapter | ||
130 | is now active, the link status LED of the primary port is active and | ||
131 | the link status LED of the secondary port (on dual port adapters) is | ||
132 | blinking (if the ports are connected to a switch or hub). | ||
133 | SK-98xx V2.0 Gigabit Ethernet Adapters: The link status LED is active. | ||
134 | In addition, you will receive a status message on the console stating | ||
135 | "ethX: network connection up using port Y" and showing the selected | ||
136 | connection parameters (x stands for the ethernet device number | ||
137 | (0,1,2, etc), y stands for the port name (A or B)). | ||
138 | |||
139 | NOTE: If you are in doubt about IP addresses, ask your network | ||
140 | administrator for assistance. | ||
141 | |||
142 | 4. Your adapter should now be fully operational. | ||
143 | Use 'ping <otherstation>' to verify the connection to other computers | ||
144 | on your network. | ||
145 | 5. To check the adapter configuration view /proc/net/sk98lin/[devicename]. | ||
146 | For example by executing: | ||
147 | "cat /proc/net/sk98lin/eth0" | ||
148 | |||
149 | Unload the module | ||
150 | ----------------- | ||
151 | To stop and unload the driver modules, proceed as follows: | ||
152 | |||
153 | 1. Execute the command "ifconfig eth0 down". | ||
154 | 2. Execute the command "rmmod sk98lin". | ||
155 | |||
156 | 3.2 Inclusion of adapter at system start | ||
157 | ----------------------------------------- | ||
158 | |||
159 | Since a large number of different Linux distributions are | ||
160 | available, we are unable to describe a general installation procedure | ||
161 | for the driver module. | ||
162 | Because the driver is now integrated in the kernel, installation should | ||
163 | be easy, using the standard mechanism of your distribution. | ||
164 | Refer to the distribution's manual for installation of ethernet adapters. | ||
165 | |||
166 | *** | ||
167 | |||
168 | 4 Driver Parameters | ||
169 | ==================== | ||
170 | |||
171 | Parameters can be set at the command line after the module has been | ||
172 | loaded with the command 'modprobe'. | ||
173 | In some distributions, the configuration tools are able to pass parameters | ||
174 | to the driver module. | ||
175 | |||
176 | If you use the kernel module loader, you can set driver parameters | ||
177 | in the file /etc/modprobe.conf (or /etc/modules.conf in 2.4 or earlier). | ||
178 | To set the driver parameters in this file, proceed as follows: | ||
179 | |||
180 | 1. Insert a line of the form : | ||
181 | options sk98lin ... | ||
182 | For "...", the same syntax is required as described for the command | ||
183 | line parameters of modprobe below. | ||
184 | 2. To activate the new parameters, either reboot your computer | ||
185 | or | ||
186 | unload and reload the driver. | ||
187 | The syntax of the driver parameters is: | ||
188 | |||
189 | modprobe sk98lin parameter=value1[,value2[,value3...]] | ||
190 | |||
191 | where value1 refers to the first adapter, value2 to the second etc. | ||
192 | |||
193 | NOTE: All parameters are case sensitive. Write them exactly as shown | ||
194 | below. | ||
195 | |||
196 | Example: | ||
197 | Suppose you have two adapters. You want to set auto-negotiation | ||
198 | on the first adapter to ON and on the second adapter to OFF. | ||
199 | You also want to set DuplexCapabilities on the first adapter | ||
200 | to FULL, and on the second adapter to HALF. | ||
201 | Then, you must enter: | ||
202 | |||
203 | modprobe sk98lin AutoNeg_A=On,Off DupCap_A=Full,Half | ||
204 | |||
205 | NOTE: The number of adapters that can be configured this way is | ||
206 | limited in the driver (file skge.c, constant SK_MAX_CARD_PARAM). | ||
207 | The current limit is 16. If you happen to install | ||
208 | more adapters, adjust this and recompile. | ||
209 | |||
210 | |||
211 | 4.1 Per-Port Parameters | ||
212 | ------------------------ | ||
213 | |||
214 | These settings are available for each port on the adapter. | ||
215 | In the following description, '?' stands for the port for | ||
216 | which you set the parameter (A or B). | ||
217 | |||
218 | Speed | ||
219 | ----- | ||
220 | Parameter: Speed_? | ||
221 | Values: 10, 100, 1000, Auto | ||
222 | Default: Auto | ||
223 | |||
224 | This parameter is used to set the speed capabilities. It is only valid | ||
225 | for the SK-98xx V2.0 copper adapters. | ||
226 | Usually, the speed is negotiated between the two ports during link | ||
227 | establishment. If this fails, a port can be forced to a specific setting | ||
228 | with this parameter. | ||
229 | |||
230 | Auto-Negotiation | ||
231 | ---------------- | ||
232 | Parameter: AutoNeg_? | ||
233 | Values: On, Off, Sense | ||
234 | Default: On | ||
235 | |||
236 | The "Sense"-mode automatically detects whether the link partner supports | ||
237 | auto-negotiation or not. | ||
238 | |||
239 | Duplex Capabilities | ||
240 | ------------------- | ||
241 | Parameter: DupCap_? | ||
242 | Values: Half, Full, Both | ||
243 | Default: Both | ||
244 | |||
245 | This parameters is only relevant if auto-negotiation for this port is | ||
246 | not set to "Sense". If auto-negotiation is set to "On", all three values | ||
247 | are possible. If it is set to "Off", only "Full" and "Half" are allowed. | ||
248 | This parameter is useful if your link partner does not support all | ||
249 | possible combinations. | ||
250 | |||
251 | Flow Control | ||
252 | ------------ | ||
253 | Parameter: FlowCtrl_? | ||
254 | Values: Sym, SymOrRem, LocSend, None | ||
255 | Default: SymOrRem | ||
256 | |||
257 | This parameter can be used to set the flow control capabilities the | ||
258 | port reports during auto-negotiation. It can be set for each port | ||
259 | individually. | ||
260 | Possible modes: | ||
261 | -- Sym = Symmetric: both link partners are allowed to send | ||
262 | PAUSE frames | ||
263 | -- SymOrRem = SymmetricOrRemote: both or only remote partner | ||
264 | are allowed to send PAUSE frames | ||
265 | -- LocSend = LocalSend: only local link partner is allowed | ||
266 | to send PAUSE frames | ||
267 | -- None = no link partner is allowed to send PAUSE frames | ||
268 | |||
269 | NOTE: This parameter is ignored if auto-negotiation is set to "Off". | ||
270 | |||
271 | Role in Master-Slave-Negotiation (1000Base-T only) | ||
272 | -------------------------------------------------- | ||
273 | Parameter: Role_? | ||
274 | Values: Auto, Master, Slave | ||
275 | Default: Auto | ||
276 | |||
277 | This parameter is only valid for the SK-9821 and SK-9822 adapters. | ||
278 | For two 1000Base-T ports to communicate, one must take the role of the | ||
279 | master (providing timing information), while the other must be the | ||
280 | slave. Usually, this is negotiated between the two ports during link | ||
281 | establishment. If this fails, a port can be forced to a specific setting | ||
282 | with this parameter. | ||
283 | |||
284 | |||
285 | 4.2 Adapter Parameters | ||
286 | ----------------------- | ||
287 | |||
288 | Connection Type (SK-98xx V2.0 copper adapters only) | ||
289 | --------------- | ||
290 | Parameter: ConType | ||
291 | Values: Auto, 100FD, 100HD, 10FD, 10HD | ||
292 | Default: Auto | ||
293 | |||
294 | The parameter 'ConType' is a combination of all five per-port parameters | ||
295 | within one single parameter. This simplifies the configuration of both ports | ||
296 | of an adapter card! The different values of this variable reflect the most | ||
297 | meaningful combinations of port parameters. | ||
298 | |||
299 | The following table shows the values of 'ConType' and the corresponding | ||
300 | combinations of the per-port parameters: | ||
301 | |||
302 | ConType | DupCap AutoNeg FlowCtrl Role Speed | ||
303 | ----------+------------------------------------------------------ | ||
304 | Auto | Both On SymOrRem Auto Auto | ||
305 | 100FD | Full Off None Auto (ignored) 100 | ||
306 | 100HD | Half Off None Auto (ignored) 100 | ||
307 | 10FD | Full Off None Auto (ignored) 10 | ||
308 | 10HD | Half Off None Auto (ignored) 10 | ||
309 | |||
310 | Stating any other port parameter together with this 'ConType' variable | ||
311 | will result in a merged configuration of those settings. This due to | ||
312 | the fact, that the per-port parameters (e.g. Speed_? ) have a higher | ||
313 | priority than the combined variable 'ConType'. | ||
314 | |||
315 | NOTE: This parameter is always used on both ports of the adapter card. | ||
316 | |||
317 | Interrupt Moderation | ||
318 | -------------------- | ||
319 | Parameter: Moderation | ||
320 | Values: None, Static, Dynamic | ||
321 | Default: None | ||
322 | |||
323 | Interrupt moderation is employed to limit the maximum number of interrupts | ||
324 | the driver has to serve. That is, one or more interrupts (which indicate any | ||
325 | transmit or receive packet to be processed) are queued until the driver | ||
326 | processes them. When queued interrupts are to be served, is determined by the | ||
327 | 'IntsPerSec' parameter, which is explained later below. | ||
328 | |||
329 | Possible modes: | ||
330 | |||
331 | -- None - No interrupt moderation is applied on the adapter card. | ||
332 | Therefore, each transmit or receive interrupt is served immediately | ||
333 | as soon as it appears on the interrupt line of the adapter card. | ||
334 | |||
335 | -- Static - Interrupt moderation is applied on the adapter card. | ||
336 | All transmit and receive interrupts are queued until a complete | ||
337 | moderation interval ends. If such a moderation interval ends, all | ||
338 | queued interrupts are processed in one big bunch without any delay. | ||
339 | The term 'static' reflects the fact, that interrupt moderation is | ||
340 | always enabled, regardless how much network load is currently | ||
341 | passing via a particular interface. In addition, the duration of | ||
342 | the moderation interval has a fixed length that never changes while | ||
343 | the driver is operational. | ||
344 | |||
345 | -- Dynamic - Interrupt moderation might be applied on the adapter card, | ||
346 | depending on the load of the system. If the driver detects that the | ||
347 | system load is too high, the driver tries to shield the system against | ||
348 | too much network load by enabling interrupt moderation. If - at a later | ||
349 | time - the CPU utilization decreases again (or if the network load is | ||
350 | negligible) the interrupt moderation will automatically be disabled. | ||
351 | |||
352 | Interrupt moderation should be used when the driver has to handle one or more | ||
353 | interfaces with a high network load, which - as a consequence - leads also to a | ||
354 | high CPU utilization. When moderation is applied in such high network load | ||
355 | situations, CPU load might be reduced by 20-30%. | ||
356 | |||
357 | NOTE: The drawback of using interrupt moderation is an increase of the round- | ||
358 | trip-time (RTT), due to the queueing and serving of interrupts at dedicated | ||
359 | moderation times. | ||
360 | |||
361 | Interrupts per second | ||
362 | --------------------- | ||
363 | Parameter: IntsPerSec | ||
364 | Values: 30...40000 (interrupts per second) | ||
365 | Default: 2000 | ||
366 | |||
367 | This parameter is only used if either static or dynamic interrupt moderation | ||
368 | is used on a network adapter card. Using this parameter if no moderation is | ||
369 | applied will lead to no action performed. | ||
370 | |||
371 | This parameter determines the length of any interrupt moderation interval. | ||
372 | Assuming that static interrupt moderation is to be used, an 'IntsPerSec' | ||
373 | parameter value of 2000 will lead to an interrupt moderation interval of | ||
374 | 500 microseconds. | ||
375 | |||
376 | NOTE: The duration of the moderation interval is to be chosen with care. | ||
377 | At first glance, selecting a very long duration (e.g. only 100 interrupts per | ||
378 | second) seems to be meaningful, but the increase of packet-processing delay | ||
379 | is tremendous. On the other hand, selecting a very short moderation time might | ||
380 | compensate the use of any moderation being applied. | ||
381 | |||
382 | |||
383 | Preferred Port | ||
384 | -------------- | ||
385 | Parameter: PrefPort | ||
386 | Values: A, B | ||
387 | Default: A | ||
388 | |||
389 | This is used to force the preferred port to A or B (on dual-port network | ||
390 | adapters). The preferred port is the one that is used if both are detected | ||
391 | as fully functional. | ||
392 | |||
393 | RLMT Mode (Redundant Link Management Technology) | ||
394 | ------------------------------------------------ | ||
395 | Parameter: RlmtMode | ||
396 | Values: CheckLinkState,CheckLocalPort, CheckSeg, DualNet | ||
397 | Default: CheckLinkState | ||
398 | |||
399 | RLMT monitors the status of the port. If the link of the active port | ||
400 | fails, RLMT switches immediately to the standby link. The virtual link is | ||
401 | maintained as long as at least one 'physical' link is up. | ||
402 | |||
403 | Possible modes: | ||
404 | |||
405 | -- CheckLinkState - Check link state only: RLMT uses the link state | ||
406 | reported by the adapter hardware for each individual port to | ||
407 | determine whether a port can be used for all network traffic or | ||
408 | not. | ||
409 | |||
410 | -- CheckLocalPort - In this mode, RLMT monitors the network path | ||
411 | between the two ports of an adapter by regularly exchanging packets | ||
412 | between them. This mode requires a network configuration in which | ||
413 | the two ports are able to "see" each other (i.e. there must not be | ||
414 | any router between the ports). | ||
415 | |||
416 | -- CheckSeg - Check local port and segmentation: This mode supports the | ||
417 | same functions as the CheckLocalPort mode and additionally checks | ||
418 | network segmentation between the ports. Therefore, this mode is only | ||
419 | to be used if Gigabit Ethernet switches are installed on the network | ||
420 | that have been configured to use the Spanning Tree protocol. | ||
421 | |||
422 | -- DualNet - In this mode, ports A and B are used as separate devices. | ||
423 | If you have a dual port adapter, port A will be configured as eth0 | ||
424 | and port B as eth1. Both ports can be used independently with | ||
425 | distinct IP addresses. The preferred port setting is not used. | ||
426 | RLMT is turned off. | ||
427 | |||
428 | NOTE: RLMT modes CLP and CLPSS are designed to operate in configurations | ||
429 | where a network path between the ports on one adapter exists. | ||
430 | Moreover, they are not designed to work where adapters are connected | ||
431 | back-to-back. | ||
432 | *** | ||
433 | |||
434 | |||
435 | 5 Large Frame Support | ||
436 | ====================== | ||
437 | |||
438 | The driver supports large frames (also called jumbo frames). Using large | ||
439 | frames can result in an improved throughput if transferring large amounts | ||
440 | of data. | ||
441 | To enable large frames, set the MTU (maximum transfer unit) of the | ||
442 | interface to the desired value (up to 9000), execute the following | ||
443 | command: | ||
444 | ifconfig eth0 mtu 9000 | ||
445 | This will only work if you have two adapters connected back-to-back | ||
446 | or if you use a switch that supports large frames. When using a switch, | ||
447 | it should be configured to allow large frames and auto-negotiation should | ||
448 | be set to OFF. The setting must be configured on all adapters that can be | ||
449 | reached by the large frames. If one adapter is not set to receive large | ||
450 | frames, it will simply drop them. | ||
451 | |||
452 | You can switch back to the standard ethernet frame size by executing the | ||
453 | following command: | ||
454 | ifconfig eth0 mtu 1500 | ||
455 | |||
456 | To permanently configure this setting, add a script with the 'ifconfig' | ||
457 | line to the system startup sequence (named something like "S99sk98lin" | ||
458 | in /etc/rc.d/rc2.d). | ||
459 | *** | ||
460 | |||
461 | |||
462 | 6 VLAN and Link Aggregation Support (IEEE 802.1, 802.1q, 802.3ad) | ||
463 | ================================================================== | ||
464 | |||
465 | The Marvell Yukon/SysKonnect Linux drivers are able to support VLAN and | ||
466 | Link Aggregation according to IEEE standards 802.1, 802.1q, and 802.3ad. | ||
467 | These features are only available after installation of open source | ||
468 | modules available on the Internet: | ||
469 | For VLAN go to: http://www.candelatech.com/~greear/vlan.html | ||
470 | For Link Aggregation go to: http://www.st.rim.or.jp/~yumo | ||
471 | |||
472 | NOTE: SysKonnect GmbH does not offer any support for these open source | ||
473 | modules and does not take the responsibility for any kind of | ||
474 | failures or problems arising in connection with these modules. | ||
475 | |||
476 | NOTE: Configuring Link Aggregation on a SysKonnect dual link adapter may | ||
477 | cause problems when unloading the driver. | ||
478 | |||
479 | |||
480 | 7 Troubleshooting | ||
481 | ================== | ||
482 | |||
483 | If any problems occur during the installation process, check the | ||
484 | following list: | ||
485 | |||
486 | |||
487 | Problem: The SK-98xx adapter cannot be found by the driver. | ||
488 | Solution: In /proc/pci search for the following entry: | ||
489 | 'Ethernet controller: SysKonnect SK-98xx ...' | ||
490 | If this entry exists, the SK-98xx or SK-98xx V2.0 adapter has | ||
491 | been found by the system and should be operational. | ||
492 | If this entry does not exist or if the file '/proc/pci' is not | ||
493 | found, there may be a hardware problem or the PCI support may | ||
494 | not be enabled in your kernel. | ||
495 | The adapter can be checked using the diagnostics program which | ||
496 | is available on the SysKonnect web site: | ||
497 | www.syskonnect.com | ||
498 | |||
499 | Some COMPAQ machines have problems dealing with PCI under Linux. | ||
500 | This problem is described in the 'PCI howto' document | ||
501 | (included in some distributions or available from the | ||
502 | web, e.g. at 'www.linux.org'). | ||
503 | |||
504 | |||
505 | Problem: Programs such as 'ifconfig' or 'route' cannot be found or the | ||
506 | error message 'Operation not permitted' is displayed. | ||
507 | Reason: You are not logged in as user 'root'. | ||
508 | Solution: Logout and login as 'root' or change to 'root' via 'su'. | ||
509 | |||
510 | |||
511 | Problem: Upon use of the command 'ping <address>' the message | ||
512 | "ping: sendto: Network is unreachable" is displayed. | ||
513 | Reason: Your route is not set correctly. | ||
514 | Solution: If you are using RedHat, you probably forgot to set up the | ||
515 | route in the 'network configuration'. | ||
516 | Check the existing routes with the 'route' command and check | ||
517 | if an entry for 'eth0' exists, and if so, if it is set correctly. | ||
518 | |||
519 | |||
520 | Problem: The driver can be started, the adapter is connected to the | ||
521 | network, but you cannot receive or transmit any packets; | ||
522 | e.g. 'ping' does not work. | ||
523 | Reason: There is an incorrect route in your routing table. | ||
524 | Solution: Check the routing table with the command 'route' and read the | ||
525 | manual help pages dealing with routes (enter 'man route'). | ||
526 | |||
527 | NOTE: Although the 2.2.x kernel versions generate the routing entry | ||
528 | automatically, problems of this kind may occur here as well. We've | ||
529 | come across a situation in which the driver started correctly at | ||
530 | system start, but after the driver has been removed and reloaded, | ||
531 | the route of the adapter's network pointed to the 'dummy0'device | ||
532 | and had to be corrected manually. | ||
533 | |||
534 | |||
535 | Problem: Your computer should act as a router between multiple | ||
536 | IP subnetworks (using multiple adapters), but computers in | ||
537 | other subnetworks cannot be reached. | ||
538 | Reason: Either the router's kernel is not configured for IP forwarding | ||
539 | or the routing table and gateway configuration of at least one | ||
540 | computer is not working. | ||
541 | |||
542 | Problem: Upon driver start, the following error message is displayed: | ||
543 | "eth0: -- ERROR -- | ||
544 | Class: internal Software error | ||
545 | Nr: 0xcc | ||
546 | Msg: SkGeInitPort() cannot init running ports" | ||
547 | Reason: You are using a driver compiled for single processor machines | ||
548 | on a multiprocessor machine with SMP (Symmetric MultiProcessor) | ||
549 | kernel. | ||
550 | Solution: Configure your kernel appropriately and recompile the kernel or | ||
551 | the modules. | ||
552 | |||
553 | |||
554 | |||
555 | If your problem is not listed here, please contact SysKonnect's technical | ||
556 | support for help (linux@syskonnect.de). | ||
557 | When contacting our technical support, please ensure that the following | ||
558 | information is available: | ||
559 | - System Manufacturer and HW Informations (CPU, Memory... ) | ||
560 | - PCI-Boards in your system | ||
561 | - Distribution | ||
562 | - Kernel version | ||
563 | - Driver version | ||
564 | *** | ||
565 | |||
566 | |||
567 | |||
568 | ***End of Readme File*** | ||
diff --git a/Documentation/networking/spider_net.txt b/Documentation/networking/spider_net.txt new file mode 100644 index 000000000000..4b4adb8eb14f --- /dev/null +++ b/Documentation/networking/spider_net.txt | |||
@@ -0,0 +1,204 @@ | |||
1 | |||
2 | The Spidernet Device Driver | ||
3 | =========================== | ||
4 | |||
5 | Written by Linas Vepstas <linas@austin.ibm.com> | ||
6 | |||
7 | Version of 7 June 2007 | ||
8 | |||
9 | Abstract | ||
10 | ======== | ||
11 | This document sketches the structure of portions of the spidernet | ||
12 | device driver in the Linux kernel tree. The spidernet is a gigabit | ||
13 | ethernet device built into the Toshiba southbridge commonly used | ||
14 | in the SONY Playstation 3 and the IBM QS20 Cell blade. | ||
15 | |||
16 | The Structure of the RX Ring. | ||
17 | ============================= | ||
18 | The receive (RX) ring is a circular linked list of RX descriptors, | ||
19 | together with three pointers into the ring that are used to manage its | ||
20 | contents. | ||
21 | |||
22 | The elements of the ring are called "descriptors" or "descrs"; they | ||
23 | describe the received data. This includes a pointer to a buffer | ||
24 | containing the received data, the buffer size, and various status bits. | ||
25 | |||
26 | There are three primary states that a descriptor can be in: "empty", | ||
27 | "full" and "not-in-use". An "empty" or "ready" descriptor is ready | ||
28 | to receive data from the hardware. A "full" descriptor has data in it, | ||
29 | and is waiting to be emptied and processed by the OS. A "not-in-use" | ||
30 | descriptor is neither empty or full; it is simply not ready. It may | ||
31 | not even have a data buffer in it, or is otherwise unusable. | ||
32 | |||
33 | During normal operation, on device startup, the OS (specifically, the | ||
34 | spidernet device driver) allocates a set of RX descriptors and RX | ||
35 | buffers. These are all marked "empty", ready to receive data. This | ||
36 | ring is handed off to the hardware, which sequentially fills in the | ||
37 | buffers, and marks them "full". The OS follows up, taking the full | ||
38 | buffers, processing them, and re-marking them empty. | ||
39 | |||
40 | This filling and emptying is managed by three pointers, the "head" | ||
41 | and "tail" pointers, managed by the OS, and a hardware current | ||
42 | descriptor pointer (GDACTDPA). The GDACTDPA points at the descr | ||
43 | currently being filled. When this descr is filled, the hardware | ||
44 | marks it full, and advances the GDACTDPA by one. Thus, when there is | ||
45 | flowing RX traffic, every descr behind it should be marked "full", | ||
46 | and everything in front of it should be "empty". If the hardware | ||
47 | discovers that the current descr is not empty, it will signal an | ||
48 | interrupt, and halt processing. | ||
49 | |||
50 | The tail pointer tails or trails the hardware pointer. When the | ||
51 | hardware is ahead, the tail pointer will be pointing at a "full" | ||
52 | descr. The OS will process this descr, and then mark it "not-in-use", | ||
53 | and advance the tail pointer. Thus, when there is flowing RX traffic, | ||
54 | all of the descrs in front of the tail pointer should be "full", and | ||
55 | all of those behind it should be "not-in-use". When RX traffic is not | ||
56 | flowing, then the tail pointer can catch up to the hardware pointer. | ||
57 | The OS will then note that the current tail is "empty", and halt | ||
58 | processing. | ||
59 | |||
60 | The head pointer (somewhat mis-named) follows after the tail pointer. | ||
61 | When traffic is flowing, then the head pointer will be pointing at | ||
62 | a "not-in-use" descr. The OS will perform various housekeeping duties | ||
63 | on this descr. This includes allocating a new data buffer and | ||
64 | dma-mapping it so as to make it visible to the hardware. The OS will | ||
65 | then mark the descr as "empty", ready to receive data. Thus, when there | ||
66 | is flowing RX traffic, everything in front of the head pointer should | ||
67 | be "not-in-use", and everything behind it should be "empty". If no | ||
68 | RX traffic is flowing, then the head pointer can catch up to the tail | ||
69 | pointer, at which point the OS will notice that the head descr is | ||
70 | "empty", and it will halt processing. | ||
71 | |||
72 | Thus, in an idle system, the GDACTDPA, tail and head pointers will | ||
73 | all be pointing at the same descr, which should be "empty". All of the | ||
74 | other descrs in the ring should be "empty" as well. | ||
75 | |||
76 | The show_rx_chain() routine will print out the the locations of the | ||
77 | GDACTDPA, tail and head pointers. It will also summarize the contents | ||
78 | of the ring, starting at the tail pointer, and listing the status | ||
79 | of the descrs that follow. | ||
80 | |||
81 | A typical example of the output, for a nearly idle system, might be | ||
82 | |||
83 | net eth1: Total number of descrs=256 | ||
84 | net eth1: Chain tail located at descr=20 | ||
85 | net eth1: Chain head is at 20 | ||
86 | net eth1: HW curr desc (GDACTDPA) is at 21 | ||
87 | net eth1: Have 1 descrs with stat=x40800101 | ||
88 | net eth1: HW next desc (GDACNEXTDA) is at 22 | ||
89 | net eth1: Last 255 descrs with stat=xa0800000 | ||
90 | |||
91 | In the above, the hardware has filled in one descr, number 20. Both | ||
92 | head and tail are pointing at 20, because it has not yet been emptied. | ||
93 | Meanwhile, hw is pointing at 21, which is free. | ||
94 | |||
95 | The "Have nnn decrs" refers to the descr starting at the tail: in this | ||
96 | case, nnn=1 descr, starting at descr 20. The "Last nnn descrs" refers | ||
97 | to all of the rest of the descrs, from the last status change. The "nnn" | ||
98 | is a count of how many descrs have exactly the same status. | ||
99 | |||
100 | The status x4... corresponds to "full" and status xa... corresponds | ||
101 | to "empty". The actual value printed is RXCOMST_A. | ||
102 | |||
103 | In the device driver source code, a different set of names are | ||
104 | used for these same concepts, so that | ||
105 | |||
106 | "empty" == SPIDER_NET_DESCR_CARDOWNED == 0xa | ||
107 | "full" == SPIDER_NET_DESCR_FRAME_END == 0x4 | ||
108 | "not in use" == SPIDER_NET_DESCR_NOT_IN_USE == 0xf | ||
109 | |||
110 | |||
111 | The RX RAM full bug/feature | ||
112 | =========================== | ||
113 | |||
114 | As long as the OS can empty out the RX buffers at a rate faster than | ||
115 | the hardware can fill them, there is no problem. If, for some reason, | ||
116 | the OS fails to empty the RX ring fast enough, the hardware GDACTDPA | ||
117 | pointer will catch up to the head, notice the not-empty condition, | ||
118 | ad stop. However, RX packets may still continue arriving on the wire. | ||
119 | The spidernet chip can save some limited number of these in local RAM. | ||
120 | When this local ram fills up, the spider chip will issue an interrupt | ||
121 | indicating this (GHIINT0STS will show ERRINT, and the GRMFLLINT bit | ||
122 | will be set in GHIINT1STS). When the RX ram full condition occurs, | ||
123 | a certain bug/feature is triggered that has to be specially handled. | ||
124 | This section describes the special handling for this condition. | ||
125 | |||
126 | When the OS finally has a chance to run, it will empty out the RX ring. | ||
127 | In particular, it will clear the descriptor on which the hardware had | ||
128 | stopped. However, once the hardware has decided that a certain | ||
129 | descriptor is invalid, it will not restart at that descriptor; instead | ||
130 | it will restart at the next descr. This potentially will lead to a | ||
131 | deadlock condition, as the tail pointer will be pointing at this descr, | ||
132 | which, from the OS point of view, is empty; the OS will be waiting for | ||
133 | this descr to be filled. However, the hardware has skipped this descr, | ||
134 | and is filling the next descrs. Since the OS doesn't see this, there | ||
135 | is a potential deadlock, with the OS waiting for one descr to fill, | ||
136 | while the hardware is waiting for a different set of descrs to become | ||
137 | empty. | ||
138 | |||
139 | A call to show_rx_chain() at this point indicates the nature of the | ||
140 | problem. A typical print when the network is hung shows the following: | ||
141 | |||
142 | net eth1: Spider RX RAM full, incoming packets might be discarded! | ||
143 | net eth1: Total number of descrs=256 | ||
144 | net eth1: Chain tail located at descr=255 | ||
145 | net eth1: Chain head is at 255 | ||
146 | net eth1: HW curr desc (GDACTDPA) is at 0 | ||
147 | net eth1: Have 1 descrs with stat=xa0800000 | ||
148 | net eth1: HW next desc (GDACNEXTDA) is at 1 | ||
149 | net eth1: Have 127 descrs with stat=x40800101 | ||
150 | net eth1: Have 1 descrs with stat=x40800001 | ||
151 | net eth1: Have 126 descrs with stat=x40800101 | ||
152 | net eth1: Last 1 descrs with stat=xa0800000 | ||
153 | |||
154 | Both the tail and head pointers are pointing at descr 255, which is | ||
155 | marked xa... which is "empty". Thus, from the OS point of view, there | ||
156 | is nothing to be done. In particular, there is the implicit assumption | ||
157 | that everything in front of the "empty" descr must surely also be empty, | ||
158 | as explained in the last section. The OS is waiting for descr 255 to | ||
159 | become non-empty, which, in this case, will never happen. | ||
160 | |||
161 | The HW pointer is at descr 0. This descr is marked 0x4.. or "full". | ||
162 | Since its already full, the hardware can do nothing more, and thus has | ||
163 | halted processing. Notice that descrs 0 through 254 are all marked | ||
164 | "full", while descr 254 and 255 are empty. (The "Last 1 descrs" is | ||
165 | descr 254, since tail was at 255.) Thus, the system is deadlocked, | ||
166 | and there can be no forward progress; the OS thinks there's nothing | ||
167 | to do, and the hardware has nowhere to put incoming data. | ||
168 | |||
169 | This bug/feature is worked around with the spider_net_resync_head_ptr() | ||
170 | routine. When the driver receives RX interrupts, but an examination | ||
171 | of the RX chain seems to show it is empty, then it is probable that | ||
172 | the hardware has skipped a descr or two (sometimes dozens under heavy | ||
173 | network conditions). The spider_net_resync_head_ptr() subroutine will | ||
174 | search the ring for the next full descr, and the driver will resume | ||
175 | operations there. Since this will leave "holes" in the ring, there | ||
176 | is also a spider_net_resync_tail_ptr() that will skip over such holes. | ||
177 | |||
178 | As of this writing, the spider_net_resync() strategy seems to work very | ||
179 | well, even under heavy network loads. | ||
180 | |||
181 | |||
182 | The TX ring | ||
183 | =========== | ||
184 | The TX ring uses a low-watermark interrupt scheme to make sure that | ||
185 | the TX queue is appropriately serviced for large packet sizes. | ||
186 | |||
187 | For packet sizes greater than about 1KBytes, the kernel can fill | ||
188 | the TX ring quicker than the device can drain it. Once the ring | ||
189 | is full, the netdev is stopped. When there is room in the ring, | ||
190 | the netdev needs to be reawakened, so that more TX packets are placed | ||
191 | in the ring. The hardware can empty the ring about four times per jiffy, | ||
192 | so its not appropriate to wait for the poll routine to refill, since | ||
193 | the poll routine runs only once per jiffy. The low-watermark mechanism | ||
194 | marks a descr about 1/4th of the way from the bottom of the queue, so | ||
195 | that an interrupt is generated when the descr is processed. This | ||
196 | interrupt wakes up the netdev, which can then refill the queue. | ||
197 | For large packets, this mechanism generates a relatively small number | ||
198 | of interrupts, about 1K/sec. For smaller packets, this will drop to zero | ||
199 | interrupts, as the hardware can empty the queue faster than the kernel | ||
200 | can fill it. | ||
201 | |||
202 | |||
203 | ======= END OF DOCUMENT ======== | ||
204 | |||
diff --git a/Documentation/pci.txt b/Documentation/pci.txt index d38261b67905..7754f5aea4e9 100644 --- a/Documentation/pci.txt +++ b/Documentation/pci.txt | |||
@@ -113,9 +113,6 @@ initialization with a pointer to a structure describing the driver | |||
113 | (Please see Documentation/power/pci.txt for descriptions | 113 | (Please see Documentation/power/pci.txt for descriptions |
114 | of PCI Power Management and the related functions.) | 114 | of PCI Power Management and the related functions.) |
115 | 115 | ||
116 | enable_wake Enable device to generate wake events from a low power | ||
117 | state. | ||
118 | |||
119 | shutdown Hook into reboot_notifier_list (kernel/sys.c). | 116 | shutdown Hook into reboot_notifier_list (kernel/sys.c). |
120 | Intended to stop any idling DMA operations. | 117 | Intended to stop any idling DMA operations. |
121 | Useful for enabling wake-on-lan (NIC) or changing | 118 | Useful for enabling wake-on-lan (NIC) or changing |
@@ -299,7 +296,10 @@ If the PCI device can use the PCI Memory-Write-Invalidate transaction, | |||
299 | call pci_set_mwi(). This enables the PCI_COMMAND bit for Mem-Wr-Inval | 296 | call pci_set_mwi(). This enables the PCI_COMMAND bit for Mem-Wr-Inval |
300 | and also ensures that the cache line size register is set correctly. | 297 | and also ensures that the cache line size register is set correctly. |
301 | Check the return value of pci_set_mwi() as not all architectures | 298 | Check the return value of pci_set_mwi() as not all architectures |
302 | or chip-sets may support Memory-Write-Invalidate. | 299 | or chip-sets may support Memory-Write-Invalidate. Alternatively, |
300 | if Mem-Wr-Inval would be nice to have but is not required, call | ||
301 | pci_try_set_mwi() to have the system do its best effort at enabling | ||
302 | Mem-Wr-Inval. | ||
303 | 303 | ||
304 | 304 | ||
305 | 3.2 Request MMIO/IOP resources | 305 | 3.2 Request MMIO/IOP resources |
diff --git a/Documentation/power/pci.txt b/Documentation/power/pci.txt index e00b099a4b86..dd8fe43888d3 100644 --- a/Documentation/power/pci.txt +++ b/Documentation/power/pci.txt | |||
@@ -164,7 +164,6 @@ struct pci_driver: | |||
164 | 164 | ||
165 | int (*suspend) (struct pci_dev *dev, pm_message_t state); | 165 | int (*suspend) (struct pci_dev *dev, pm_message_t state); |
166 | int (*resume) (struct pci_dev *dev); | 166 | int (*resume) (struct pci_dev *dev); |
167 | int (*enable_wake) (struct pci_dev *dev, pci_power_t state, int enable); | ||
168 | 167 | ||
169 | 168 | ||
170 | suspend | 169 | suspend |
@@ -251,42 +250,6 @@ The driver should update the current_state field in its pci_dev structure in | |||
251 | this function, except for PM-capable devices when pci_set_power_state is used. | 250 | this function, except for PM-capable devices when pci_set_power_state is used. |
252 | 251 | ||
253 | 252 | ||
254 | enable_wake | ||
255 | ----------- | ||
256 | |||
257 | Usage: | ||
258 | |||
259 | if (dev->driver && dev->driver->enable_wake) | ||
260 | dev->driver->enable_wake(dev,state,enable); | ||
261 | |||
262 | This callback is generally only relevant for devices that support the PCI PM | ||
263 | spec and have the ability to generate a PME# (Power Management Event Signal) | ||
264 | to wake the system up. (However, it is possible that a device may support | ||
265 | some non-standard way of generating a wake event on sleep.) | ||
266 | |||
267 | Bits 15:11 of the PMC (Power Mgmt Capabilities) Register in a device's | ||
268 | PM Capabilities describe what power states the device supports generating a | ||
269 | wake event from: | ||
270 | |||
271 | +------------------+ | ||
272 | | Bit | State | | ||
273 | +------------------+ | ||
274 | | 11 | D0 | | ||
275 | | 12 | D1 | | ||
276 | | 13 | D2 | | ||
277 | | 14 | D3hot | | ||
278 | | 15 | D3cold | | ||
279 | +------------------+ | ||
280 | |||
281 | A device can use this to enable wake events: | ||
282 | |||
283 | pci_enable_wake(dev,state,enable); | ||
284 | |||
285 | Note that to enable PME# from D3cold, a value of 4 should be passed to | ||
286 | pci_enable_wake (since it uses an index into a bitmask). If a driver gets | ||
287 | a request to enable wake events from D3, two calls should be made to | ||
288 | pci_enable_wake (one for both D3hot and D3cold). | ||
289 | |||
290 | 253 | ||
291 | A reference implementation | 254 | A reference implementation |
292 | ------------------------- | 255 | ------------------------- |
diff --git a/Documentation/power_supply_class.txt b/Documentation/power_supply_class.txt new file mode 100644 index 000000000000..9758cf433c06 --- /dev/null +++ b/Documentation/power_supply_class.txt | |||
@@ -0,0 +1,167 @@ | |||
1 | Linux power supply class | ||
2 | ======================== | ||
3 | |||
4 | Synopsis | ||
5 | ~~~~~~~~ | ||
6 | Power supply class used to represent battery, UPS, AC or DC power supply | ||
7 | properties to user-space. | ||
8 | |||
9 | It defines core set of attributes, which should be applicable to (almost) | ||
10 | every power supply out there. Attributes are available via sysfs and uevent | ||
11 | interfaces. | ||
12 | |||
13 | Each attribute has well defined meaning, up to unit of measure used. While | ||
14 | the attributes provided are believed to be universally applicable to any | ||
15 | power supply, specific monitoring hardware may not be able to provide them | ||
16 | all, so any of them may be skipped. | ||
17 | |||
18 | Power supply class is extensible, and allows to define drivers own attributes. | ||
19 | The core attribute set is subject to the standard Linux evolution (i.e. | ||
20 | if it will be found that some attribute is applicable to many power supply | ||
21 | types or their drivers, it can be added to the core set). | ||
22 | |||
23 | It also integrates with LED framework, for the purpose of providing | ||
24 | typically expected feedback of battery charging/fully charged status and | ||
25 | AC/USB power supply online status. (Note that specific details of the | ||
26 | indication (including whether to use it at all) are fully controllable by | ||
27 | user and/or specific machine defaults, per design principles of LED | ||
28 | framework). | ||
29 | |||
30 | |||
31 | Attributes/properties | ||
32 | ~~~~~~~~~~~~~~~~~~~~~ | ||
33 | Power supply class has predefined set of attributes, this eliminates code | ||
34 | duplication across drivers. Power supply class insist on reusing its | ||
35 | predefined attributes *and* their units. | ||
36 | |||
37 | So, userspace gets predictable set of attributes and their units for any | ||
38 | kind of power supply, and can process/present them to a user in consistent | ||
39 | manner. Results for different power supplies and machines are also directly | ||
40 | comparable. | ||
41 | |||
42 | See drivers/power/ds2760_battery.c and drivers/power/pda_power.c for the | ||
43 | example how to declare and handle attributes. | ||
44 | |||
45 | |||
46 | Units | ||
47 | ~~~~~ | ||
48 | Quoting include/linux/power_supply.h: | ||
49 | |||
50 | All voltages, currents, charges, energies, time and temperatures in µV, | ||
51 | µA, µAh, µWh, seconds and tenths of degree Celsius unless otherwise | ||
52 | stated. It's driver's job to convert its raw values to units in which | ||
53 | this class operates. | ||
54 | |||
55 | |||
56 | Attributes/properties detailed | ||
57 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||
58 | |||
59 | ~ ~ ~ ~ ~ ~ ~ Charge/Energy/Capacity - how to not confuse ~ ~ ~ ~ ~ ~ ~ | ||
60 | ~ ~ | ||
61 | ~ Because both "charge" (µAh) and "energy" (µWh) represents "capacity" ~ | ||
62 | ~ of battery, this class distinguish these terms. Don't mix them! ~ | ||
63 | ~ ~ | ||
64 | ~ CHARGE_* attributes represents capacity in µAh only. ~ | ||
65 | ~ ENERGY_* attributes represents capacity in µWh only. ~ | ||
66 | ~ CAPACITY attribute represents capacity in *percents*, from 0 to 100. ~ | ||
67 | ~ ~ | ||
68 | ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ | ||
69 | |||
70 | Postfixes: | ||
71 | _AVG - *hardware* averaged value, use it if your hardware is really able to | ||
72 | report averaged values. | ||
73 | _NOW - momentary/instantaneous values. | ||
74 | |||
75 | STATUS - this attribute represents operating status (charging, full, | ||
76 | discharging (i.e. powering a load), etc.). This corresponds to | ||
77 | BATTERY_STATUS_* values, as defined in battery.h. | ||
78 | |||
79 | HEALTH - represents health of the battery, values corresponds to | ||
80 | POWER_SUPPLY_HEALTH_*, defined in battery.h. | ||
81 | |||
82 | VOLTAGE_MAX_DESIGN, VOLTAGE_MIN_DESIGN - design values for maximal and | ||
83 | minimal power supply voltages. Maximal/minimal means values of voltages | ||
84 | when battery considered "full"/"empty" at normal conditions. Yes, there is | ||
85 | no direct relation between voltage and battery capacity, but some dumb | ||
86 | batteries use voltage for very approximated calculation of capacity. | ||
87 | Battery driver also can use this attribute just to inform userspace | ||
88 | about maximal and minimal voltage thresholds of a given battery. | ||
89 | |||
90 | CHARGE_FULL_DESIGN, CHARGE_EMPTY_DESIGN - design charge values, when | ||
91 | battery considered full/empty. | ||
92 | |||
93 | ENERGY_FULL_DESIGN, ENERGY_EMPTY_DESIGN - same as above but for energy. | ||
94 | |||
95 | CHARGE_FULL, CHARGE_EMPTY - These attributes means "last remembered value | ||
96 | of charge when battery became full/empty". It also could mean "value of | ||
97 | charge when battery considered full/empty at given conditions (temperature, | ||
98 | age)". I.e. these attributes represents real thresholds, not design values. | ||
99 | |||
100 | ENERGY_FULL, ENERGY_EMPTY - same as above but for energy. | ||
101 | |||
102 | CAPACITY - capacity in percents. | ||
103 | CAPACITY_LEVEL - capacity level. This corresponds to | ||
104 | POWER_SUPPLY_CAPACITY_LEVEL_*. | ||
105 | |||
106 | TEMP - temperature of the power supply. | ||
107 | TEMP_AMBIENT - ambient temperature. | ||
108 | |||
109 | TIME_TO_EMPTY - seconds left for battery to be considered empty (i.e. | ||
110 | while battery powers a load) | ||
111 | TIME_TO_FULL - seconds left for battery to be considered full (i.e. | ||
112 | while battery is charging) | ||
113 | |||
114 | |||
115 | Battery <-> external power supply interaction | ||
116 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||
117 | Often power supplies are acting as supplies and supplicants at the same | ||
118 | time. Batteries are good example. So, batteries usually care if they're | ||
119 | externally powered or not. | ||
120 | |||
121 | For that case, power supply class implements notification mechanism for | ||
122 | batteries. | ||
123 | |||
124 | External power supply (AC) lists supplicants (batteries) names in | ||
125 | "supplied_to" struct member, and each power_supply_changed() call | ||
126 | issued by external power supply will notify supplicants via | ||
127 | external_power_changed callback. | ||
128 | |||
129 | |||
130 | QA | ||
131 | ~~ | ||
132 | Q: Where is POWER_SUPPLY_PROP_XYZ attribute? | ||
133 | A: If you cannot find attribute suitable for your driver needs, feel free | ||
134 | to add it and send patch along with your driver. | ||
135 | |||
136 | The attributes available currently are the ones currently provided by the | ||
137 | drivers written. | ||
138 | |||
139 | Good candidates to add in future: model/part#, cycle_time, manufacturer, | ||
140 | etc. | ||
141 | |||
142 | |||
143 | Q: I have some very specific attribute (e.g. battery color), should I add | ||
144 | this attribute to standard ones? | ||
145 | A: Most likely, no. Such attribute can be placed in the driver itself, if | ||
146 | it is useful. Of course, if the attribute in question applicable to | ||
147 | large set of batteries, provided by many drivers, and/or comes from | ||
148 | some general battery specification/standard, it may be a candidate to | ||
149 | be added to the core attribute set. | ||
150 | |||
151 | |||
152 | Q: Suppose, my battery monitoring chip/firmware does not provides capacity | ||
153 | in percents, but provides charge_{now,full,empty}. Should I calculate | ||
154 | percentage capacity manually, inside the driver, and register CAPACITY | ||
155 | attribute? The same question about time_to_empty/time_to_full. | ||
156 | A: Most likely, no. This class is designed to export properties which are | ||
157 | directly measurable by the specific hardware available. | ||
158 | |||
159 | Inferring not available properties using some heuristics or mathematical | ||
160 | model is not subject of work for a battery driver. Such functionality | ||
161 | should be factored out, and in fact, apm_power, the driver to serve | ||
162 | legacy APM API on top of power supply class, uses a simple heuristic of | ||
163 | approximating remaining battery capacity based on its charge, current, | ||
164 | voltage and so on. But full-fledged battery model is likely not subject | ||
165 | for kernel at all, as it would require floating point calculation to deal | ||
166 | with things like differential equations and Kalman filters. This is | ||
167 | better be handled by batteryd/libbattery, yet to be written. | ||
diff --git a/Documentation/sched-design-CFS.txt b/Documentation/sched-design-CFS.txt new file mode 100644 index 000000000000..16feebb7bdc0 --- /dev/null +++ b/Documentation/sched-design-CFS.txt | |||
@@ -0,0 +1,119 @@ | |||
1 | |||
2 | This is the CFS scheduler. | ||
3 | |||
4 | 80% of CFS's design can be summed up in a single sentence: CFS basically | ||
5 | models an "ideal, precise multi-tasking CPU" on real hardware. | ||
6 | |||
7 | "Ideal multi-tasking CPU" is a (non-existent :-)) CPU that has 100% | ||
8 | physical power and which can run each task at precise equal speed, in | ||
9 | parallel, each at 1/nr_running speed. For example: if there are 2 tasks | ||
10 | running then it runs each at 50% physical power - totally in parallel. | ||
11 | |||
12 | On real hardware, we can run only a single task at once, so while that | ||
13 | one task runs, the other tasks that are waiting for the CPU are at a | ||
14 | disadvantage - the current task gets an unfair amount of CPU time. In | ||
15 | CFS this fairness imbalance is expressed and tracked via the per-task | ||
16 | p->wait_runtime (nanosec-unit) value. "wait_runtime" is the amount of | ||
17 | time the task should now run on the CPU for it to become completely fair | ||
18 | and balanced. | ||
19 | |||
20 | ( small detail: on 'ideal' hardware, the p->wait_runtime value would | ||
21 | always be zero - no task would ever get 'out of balance' from the | ||
22 | 'ideal' share of CPU time. ) | ||
23 | |||
24 | CFS's task picking logic is based on this p->wait_runtime value and it | ||
25 | is thus very simple: it always tries to run the task with the largest | ||
26 | p->wait_runtime value. In other words, CFS tries to run the task with | ||
27 | the 'gravest need' for more CPU time. So CFS always tries to split up | ||
28 | CPU time between runnable tasks as close to 'ideal multitasking | ||
29 | hardware' as possible. | ||
30 | |||
31 | Most of the rest of CFS's design just falls out of this really simple | ||
32 | concept, with a few add-on embellishments like nice levels, | ||
33 | multiprocessing and various algorithm variants to recognize sleepers. | ||
34 | |||
35 | In practice it works like this: the system runs a task a bit, and when | ||
36 | the task schedules (or a scheduler tick happens) the task's CPU usage is | ||
37 | 'accounted for': the (small) time it just spent using the physical CPU | ||
38 | is deducted from p->wait_runtime. [minus the 'fair share' it would have | ||
39 | gotten anyway]. Once p->wait_runtime gets low enough so that another | ||
40 | task becomes the 'leftmost task' of the time-ordered rbtree it maintains | ||
41 | (plus a small amount of 'granularity' distance relative to the leftmost | ||
42 | task so that we do not over-schedule tasks and trash the cache) then the | ||
43 | new leftmost task is picked and the current task is preempted. | ||
44 | |||
45 | The rq->fair_clock value tracks the 'CPU time a runnable task would have | ||
46 | fairly gotten, had it been runnable during that time'. So by using | ||
47 | rq->fair_clock values we can accurately timestamp and measure the | ||
48 | 'expected CPU time' a task should have gotten. All runnable tasks are | ||
49 | sorted in the rbtree by the "rq->fair_clock - p->wait_runtime" key, and | ||
50 | CFS picks the 'leftmost' task and sticks to it. As the system progresses | ||
51 | forwards, newly woken tasks are put into the tree more and more to the | ||
52 | right - slowly but surely giving a chance for every task to become the | ||
53 | 'leftmost task' and thus get on the CPU within a deterministic amount of | ||
54 | time. | ||
55 | |||
56 | Some implementation details: | ||
57 | |||
58 | - the introduction of Scheduling Classes: an extensible hierarchy of | ||
59 | scheduler modules. These modules encapsulate scheduling policy | ||
60 | details and are handled by the scheduler core without the core | ||
61 | code assuming about them too much. | ||
62 | |||
63 | - sched_fair.c implements the 'CFS desktop scheduler': it is a | ||
64 | replacement for the vanilla scheduler's SCHED_OTHER interactivity | ||
65 | code. | ||
66 | |||
67 | I'd like to give credit to Con Kolivas for the general approach here: | ||
68 | he has proven via RSDL/SD that 'fair scheduling' is possible and that | ||
69 | it results in better desktop scheduling. Kudos Con! | ||
70 | |||
71 | The CFS patch uses a completely different approach and implementation | ||
72 | from RSDL/SD. My goal was to make CFS's interactivity quality exceed | ||
73 | that of RSDL/SD, which is a high standard to meet :-) Testing | ||
74 | feedback is welcome to decide this one way or another. [ and, in any | ||
75 | case, all of SD's logic could be added via a kernel/sched_sd.c module | ||
76 | as well, if Con is interested in such an approach. ] | ||
77 | |||
78 | CFS's design is quite radical: it does not use runqueues, it uses a | ||
79 | time-ordered rbtree to build a 'timeline' of future task execution, | ||
80 | and thus has no 'array switch' artifacts (by which both the vanilla | ||
81 | scheduler and RSDL/SD are affected). | ||
82 | |||
83 | CFS uses nanosecond granularity accounting and does not rely on any | ||
84 | jiffies or other HZ detail. Thus the CFS scheduler has no notion of | ||
85 | 'timeslices' and has no heuristics whatsoever. There is only one | ||
86 | central tunable: | ||
87 | |||
88 | /proc/sys/kernel/sched_granularity_ns | ||
89 | |||
90 | which can be used to tune the scheduler from 'desktop' (low | ||
91 | latencies) to 'server' (good batching) workloads. It defaults to a | ||
92 | setting suitable for desktop workloads. SCHED_BATCH is handled by the | ||
93 | CFS scheduler module too. | ||
94 | |||
95 | Due to its design, the CFS scheduler is not prone to any of the | ||
96 | 'attacks' that exist today against the heuristics of the stock | ||
97 | scheduler: fiftyp.c, thud.c, chew.c, ring-test.c, massive_intr.c all | ||
98 | work fine and do not impact interactivity and produce the expected | ||
99 | behavior. | ||
100 | |||
101 | the CFS scheduler has a much stronger handling of nice levels and | ||
102 | SCHED_BATCH: both types of workloads should be isolated much more | ||
103 | agressively than under the vanilla scheduler. | ||
104 | |||
105 | ( another detail: due to nanosec accounting and timeline sorting, | ||
106 | sched_yield() support is very simple under CFS, and in fact under | ||
107 | CFS sched_yield() behaves much better than under any other | ||
108 | scheduler i have tested so far. ) | ||
109 | |||
110 | - sched_rt.c implements SCHED_FIFO and SCHED_RR semantics, in a simpler | ||
111 | way than the vanilla scheduler does. It uses 100 runqueues (for all | ||
112 | 100 RT priority levels, instead of 140 in the vanilla scheduler) | ||
113 | and it needs no expired array. | ||
114 | |||
115 | - reworked/sanitized SMP load-balancing: the runqueue-walking | ||
116 | assumptions are gone from the load-balancing code now, and | ||
117 | iterators of the scheduling modules are used. The balancing code got | ||
118 | quite a bit simpler as a result. | ||
119 | |||
diff --git a/Documentation/sysfs-rules.txt b/Documentation/sysfs-rules.txt new file mode 100644 index 000000000000..42861bb0bc9b --- /dev/null +++ b/Documentation/sysfs-rules.txt | |||
@@ -0,0 +1,166 @@ | |||
1 | Rules on how to access information in the Linux kernel sysfs | ||
2 | |||
3 | The kernel exported sysfs exports internal kernel implementation-details | ||
4 | and depends on internal kernel structures and layout. It is agreed upon | ||
5 | by the kernel developers that the Linux kernel does not provide a stable | ||
6 | internal API. As sysfs is a direct export of kernel internal | ||
7 | structures, the sysfs interface can not provide a stable interface eighter, | ||
8 | it may always change along with internal kernel changes. | ||
9 | |||
10 | To minimize the risk of breaking users of sysfs, which are in most cases | ||
11 | low-level userspace applications, with a new kernel release, the users | ||
12 | of sysfs must follow some rules to use an as abstract-as-possible way to | ||
13 | access this filesystem. The current udev and HAL programs already | ||
14 | implement this and users are encouraged to plug, if possible, into the | ||
15 | abstractions these programs provide instead of accessing sysfs | ||
16 | directly. | ||
17 | |||
18 | But if you really do want or need to access sysfs directly, please follow | ||
19 | the following rules and then your programs should work with future | ||
20 | versions of the sysfs interface. | ||
21 | |||
22 | - Do not use libsysfs | ||
23 | It makes assumptions about sysfs which are not true. Its API does not | ||
24 | offer any abstraction, it exposes all the kernel driver-core | ||
25 | implementation details in its own API. Therefore it is not better than | ||
26 | reading directories and opening the files yourself. | ||
27 | Also, it is not actively maintained, in the sense of reflecting the | ||
28 | current kernel-development. The goal of providing a stable interface | ||
29 | to sysfs has failed, it causes more problems, than it solves. It | ||
30 | violates many of the rules in this document. | ||
31 | |||
32 | - sysfs is always at /sys | ||
33 | Parsing /proc/mounts is a waste of time. Other mount points are a | ||
34 | system configuration bug you should not try to solve. For test cases, | ||
35 | possibly support a SYSFS_PATH environment variable to overwrite the | ||
36 | applications behavior, but never try to search for sysfs. Never try | ||
37 | to mount it, if you are not an early boot script. | ||
38 | |||
39 | - devices are only "devices" | ||
40 | There is no such thing like class-, bus-, physical devices, | ||
41 | interfaces, and such that you can rely on in userspace. Everything is | ||
42 | just simply a "device". Class-, bus-, physical, ... types are just | ||
43 | kernel implementation details, which should not be expected by | ||
44 | applications that look for devices in sysfs. | ||
45 | |||
46 | The properties of a device are: | ||
47 | o devpath (/devices/pci0000:00/0000:00:1d.1/usb2/2-2/2-2:1.0) | ||
48 | - identical to the DEVPATH value in the event sent from the kernel | ||
49 | at device creation and removal | ||
50 | - the unique key to the device at that point in time | ||
51 | - the kernels path to the device-directory without the leading | ||
52 | /sys, and always starting with with a slash | ||
53 | - all elements of a devpath must be real directories. Symlinks | ||
54 | pointing to /sys/devices must always be resolved to their real | ||
55 | target, and the target path must be used to access the device. | ||
56 | That way the devpath to the device matches the devpath of the | ||
57 | kernel used at event time. | ||
58 | - using or exposing symlink values as elements in a devpath string | ||
59 | is a bug in the application | ||
60 | |||
61 | o kernel name (sda, tty, 0000:00:1f.2, ...) | ||
62 | - a directory name, identical to the last element of the devpath | ||
63 | - applications need to handle spaces and characters like '!' in | ||
64 | the name | ||
65 | |||
66 | o subsystem (block, tty, pci, ...) | ||
67 | - simple string, never a path or a link | ||
68 | - retrieved by reading the "subsystem"-link and using only the | ||
69 | last element of the target path | ||
70 | |||
71 | o driver (tg3, ata_piix, uhci_hcd) | ||
72 | - a simple string, which may contain spaces, never a path or a | ||
73 | link | ||
74 | - it is retrieved by reading the "driver"-link and using only the | ||
75 | last element of the target path | ||
76 | - devices which do not have "driver"-link, just do not have a | ||
77 | driver; copying the driver value in a child device context, is a | ||
78 | bug in the application | ||
79 | |||
80 | o attributes | ||
81 | - the files in the device directory or files below a subdirectories | ||
82 | of the same device directory | ||
83 | - accessing attributes reached by a symlink pointing to another device, | ||
84 | like the "device"-link, is a bug in the application | ||
85 | |||
86 | Everything else is just a kernel driver-core implementation detail, | ||
87 | that should not be assumed to be stable across kernel releases. | ||
88 | |||
89 | - Properties of parent devices never belong into a child device. | ||
90 | Always look at the parent devices themselves for determining device | ||
91 | context properties. If the device 'eth0' or 'sda' does not have a | ||
92 | "driver"-link, then this device does not have a driver. Its value is empty. | ||
93 | Never copy any property of the parent-device into a child-device. Parent | ||
94 | device-properties may change dynamically without any notice to the | ||
95 | child device. | ||
96 | |||
97 | - Hierarchy in a single device-tree | ||
98 | There is only one valid place in sysfs where hierarchy can be examined | ||
99 | and this is below: /sys/devices. | ||
100 | It is planned, that all device directories will end up in the tree | ||
101 | below this directory. | ||
102 | |||
103 | - Classification by subsystem | ||
104 | There are currently three places for classification of devices: | ||
105 | /sys/block, /sys/class and /sys/bus. It is planned that these will | ||
106 | not contain any device-directories themselves, but only flat lists of | ||
107 | symlinks pointing to the unified /sys/devices tree. | ||
108 | All three places have completely different rules on how to access | ||
109 | device information. It is planned to merge all three | ||
110 | classification-directories into one place at /sys/subsystem, | ||
111 | following the layout of the bus-directories. All buses and | ||
112 | classes, including the converted block-subsystem, will show up | ||
113 | there. | ||
114 | The devices belonging to a subsystem will create a symlink in the | ||
115 | "devices" directory at /sys/subsystem/<name>/devices. | ||
116 | |||
117 | If /sys/subsystem exists, /sys/bus, /sys/class and /sys/block can be | ||
118 | ignored. If it does not exist, you have always to scan all three | ||
119 | places, as the kernel is free to move a subsystem from one place to | ||
120 | the other, as long as the devices are still reachable by the same | ||
121 | subsystem name. | ||
122 | |||
123 | Assuming /sys/class/<subsystem> and /sys/bus/<subsystem>, or | ||
124 | /sys/block and /sys/class/block are not interchangeable, is a bug in | ||
125 | the application. | ||
126 | |||
127 | - Block | ||
128 | The converted block-subsystem at /sys/class/block, or | ||
129 | /sys/subsystem/block will contain the links for disks and partitions | ||
130 | at the same level, never in a hierarchy. Assuming the block-subsytem to | ||
131 | contain only disks and not partition-devices in the same flat list is | ||
132 | a bug in the application. | ||
133 | |||
134 | - "device"-link and <subsystem>:<kernel name>-links | ||
135 | Never depend on the "device"-link. The "device"-link is a workaround | ||
136 | for the old layout, where class-devices are not created in | ||
137 | /sys/devices/ like the bus-devices. If the link-resolving of a | ||
138 | device-directory does not end in /sys/devices/, you can use the | ||
139 | "device"-link to find the parent devices in /sys/devices/. That is the | ||
140 | single valid use of the "device"-link, it must never appear in any | ||
141 | path as an element. Assuming the existence of the "device"-link for | ||
142 | a device in /sys/devices/ is a bug in the application. | ||
143 | Accessing /sys/class/net/eth0/device is a bug in the application. | ||
144 | |||
145 | Never depend on the class-specific links back to the /sys/class | ||
146 | directory. These links are also a workaround for the design mistake | ||
147 | that class-devices are not created in /sys/devices. If a device | ||
148 | directory does not contain directories for child devices, these links | ||
149 | may be used to find the child devices in /sys/class. That is the single | ||
150 | valid use of these links, they must never appear in any path as an | ||
151 | element. Assuming the existence of these links for devices which are | ||
152 | real child device directories in the /sys/devices tree, is a bug in | ||
153 | the application. | ||
154 | |||
155 | It is planned to remove all these links when when all class-device | ||
156 | directories live in /sys/devices. | ||
157 | |||
158 | - Position of devices along device chain can change. | ||
159 | Never depend on a specific parent device position in the devpath, | ||
160 | or the chain of parent devices. The kernel is free to insert devices into | ||
161 | the chain. You must always request the parent device you are looking for | ||
162 | by its subsystem value. You need to walk up the chain until you find | ||
163 | the device that matches the expected subsystem. Depending on a specific | ||
164 | position of a parent device, or exposing relative paths, using "../" to | ||
165 | access the chain of parents, is a bug in the application. | ||
166 | |||