Diffstat (limited to 'Documentation')
-rw-r--r-- Documentation/ABI/removed/raw1394_legacy_isochronous | 16
-rw-r--r-- Documentation/DMA-mapping.txt | 103
-rw-r--r-- Documentation/DocBook/kernel-api.tmpl | 66
-rw-r--r-- Documentation/blackfin/kgdb.txt | 155
-rw-r--r-- Documentation/block/barrier.txt | 16
-rw-r--r-- Documentation/feature-removal-schedule.txt | 52
-rw-r--r-- Documentation/firmware_class/firmware_sample_firmware_class.c | 2
-rw-r--r-- Documentation/i2c/busses/i2c-i801 | 4
-rw-r--r-- Documentation/i2c/busses/i2c-piix4 | 2
-rw-r--r-- Documentation/i2c/busses/i2c-taos-evm | 46
-rw-r--r-- Documentation/i2c/chips/max6875 | 2
-rw-r--r-- Documentation/i2c/chips/x1205 | 38
-rw-r--r-- Documentation/i2c/summary | 2
-rw-r--r-- Documentation/i2c/writing-clients | 2
-rw-r--r-- Documentation/i386/zero-page.txt | 1
-rw-r--r-- Documentation/kernel-parameters.txt | 43
-rw-r--r-- Documentation/networking/00-INDEX | 3
-rw-r--r-- Documentation/networking/ip-sysctl.txt | 3
-rw-r--r-- Documentation/networking/l2tp.txt | 169
-rw-r--r-- Documentation/networking/multiqueue.txt | 111
-rw-r--r-- Documentation/networking/netdevices.txt | 38
-rw-r--r-- Documentation/networking/sk98lin.txt | 568
-rw-r--r-- Documentation/networking/spider_net.txt | 204
-rw-r--r-- Documentation/pci.txt | 8
-rw-r--r-- Documentation/power/pci.txt | 37
-rw-r--r-- Documentation/power_supply_class.txt | 167
-rw-r--r-- Documentation/sched-design-CFS.txt | 119
-rw-r--r-- Documentation/sysfs-rules.txt | 166
28 files changed, 1282 insertions, 861 deletions
diff --git a/Documentation/ABI/removed/raw1394_legacy_isochronous b/Documentation/ABI/removed/raw1394_legacy_isochronous
new file mode 100644
index 000000000000..1b629622d883
--- /dev/null
+++ b/Documentation/ABI/removed/raw1394_legacy_isochronous
@@ -0,0 +1,16 @@
+What:	legacy isochronous ABI of raw1394 (1st generation iso ABI)
+Date:	June 2007 (scheduled), removed in kernel v2.6.23
+Contact:	linux1394-devel@lists.sourceforge.net
+Description:
+	The two request types RAW1394_REQ_ISO_SEND, RAW1394_REQ_ISO_LISTEN have
+	been deprecated for quite some time. They are very inefficient as they
+	come with high interrupt load and several layers of callbacks for each
+	packet. Because of these deficiencies, the video1394 and dv1394 drivers
+	and the 3rd-generation isochronous ABI in raw1394 (rawiso) were created.
+
+Users:
+	libraw1394 users via the long deprecated API raw1394_iso_write,
+	raw1394_start_iso_write, raw1394_start_iso_rcv, raw1394_stop_iso_rcv
+
+	libdc1394, which optionally uses these old libraw1394 calls
+	alternatively to the more efficient video1394 ABI
diff --git a/Documentation/DMA-mapping.txt b/Documentation/DMA-mapping.txt
index 028614cdd062..e07f2530326b 100644
--- a/Documentation/DMA-mapping.txt
+++ b/Documentation/DMA-mapping.txt
@@ -664,109 +664,6 @@ It is that simple.
 Well, not for some odd devices. See the next section for information
 about that.
 
-	DAC Addressing for Address Space Hungry Devices
-
-There exists a class of devices which do not mesh well with the PCI
-DMA mapping API. By definition these "mappings" are a finite
-resource. The number of total available mappings per bus is platform
-specific, but there will always be a reasonable amount.
-
-What is "reasonable"? Reasonable means that networking and block I/O
-devices need not worry about using too many mappings.
-
-As an example of a problematic device, consider compute cluster cards.
-They can potentially need to access gigabytes of memory at once via
-DMA. Dynamic mappings are unsuitable for this kind of access pattern.
-
-To this end we've provided a small API by which a device driver
-may use DAC cycles to directly address all of physical memory.
-Not all platforms support this, but most do. It is easy to determine
-whether the platform will work properly at probe time.
-
-First, understand that there may be a SEVERE performance penalty for
-using these interfaces on some platforms. Therefore, you MUST only
-use these interfaces if it is absolutely required. %99 of devices can
-use the normal APIs without any problems.
-
-Note that for streaming type mappings you must either use these
-interfaces, or the dynamic mapping interfaces above. You may not mix
-usage of both for the same device. Such an act is illegal and is
-guaranteed to put a banana in your tailpipe.
-
-However, consistent mappings may in fact be used in conjunction with
-these interfaces. Remember that, as defined, consistent mappings are
-always going to be SAC addressable.
-
-The first thing your driver needs to do is query the PCI platform
-layer if it is capable of handling your devices DAC addressing
-capabilities:
-
-	int pci_dac_dma_supported(struct pci_dev *hwdev, u64 mask);
-
-You may not use the following interfaces if this routine fails.
-
-Next, DMA addresses using this API are kept track of using the
-dma64_addr_t type. It is guaranteed to be big enough to hold any
-DAC address the platform layer will give to you from the following
-routines. If you have consistent mappings as well, you still
-use plain dma_addr_t to keep track of those.
-
-All mappings obtained here will be direct. The mappings are not
-translated, and this is the purpose of this dialect of the DMA API.
-
-All routines work with page/offset pairs. This is the _ONLY_ way to
-portably refer to any piece of memory. If you have a cpu pointer
-(which may be validly DMA'd too) you may easily obtain the page
-and offset using something like this:
-
-	struct page *page = virt_to_page(ptr);
-	unsigned long offset = offset_in_page(ptr);
-
-Here are the interfaces:
-
-	dma64_addr_t pci_dac_page_to_dma(struct pci_dev *pdev,
-					 struct page *page,
-					 unsigned long offset,
-					 int direction);
-
-The DAC address for the tuple PAGE/OFFSET are returned. The direction
-argument is the same as for pci_{map,unmap}_single(). The same rules
-for cpu/device access apply here as for the streaming mapping
-interfaces. To reiterate:
-
-	The cpu may touch the buffer before pci_dac_page_to_dma.
-	The device may touch the buffer after pci_dac_page_to_dma
-	is made, but the cpu may NOT.
-
-When the DMA transfer is complete, invoke:
-
-	void pci_dac_dma_sync_single_for_cpu(struct pci_dev *pdev,
-					     dma64_addr_t dma_addr,
-					     size_t len, int direction);
-
-This must be done before the CPU looks at the buffer again.
-This interface behaves identically to pci_dma_sync_{single,sg}_for_cpu().
-
-And likewise, if you wish to let the device get back at the buffer after
-the cpu has read/written it, invoke:
-
-	void pci_dac_dma_sync_single_for_device(struct pci_dev *pdev,
-						dma64_addr_t dma_addr,
-						size_t len, int direction);
-
-before letting the device access the DMA area again.
-
-If you need to get back to the PAGE/OFFSET tuple from a dma64_addr_t
-the following interfaces are provided:
-
-	struct page *pci_dac_dma_to_page(struct pci_dev *pdev,
-					 dma64_addr_t dma_addr);
-	unsigned long pci_dac_dma_to_offset(struct pci_dev *pdev,
-					    dma64_addr_t dma_addr);
-
-This is possible with the DAC interfaces purely because they are
-not translated in any way.
-
 	Optimizing Unmap State Space Consumption
 
 On many platforms, pci_unmap_{single,page}() is simply a nop.
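The page/offset pairs described in the removed DMA-mapping.txt text can be illustrated with a minimal user-space sketch. It assumes a 4 KiB page size, and every name in it (SKETCH_PAGE_SHIFT, sketch_split, sketch_join) is hypothetical; it models the arithmetic only and is not the kernel's virt_to_page() or offset_in_page().

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 4 KiB pages. Real kernels take this from the architecture. */
#define SKETCH_PAGE_SHIFT 12u
#define SKETCH_PAGE_SIZE  (1ul << SKETCH_PAGE_SHIFT)

/* Hypothetical page/offset tuple: a page frame number plus a byte offset. */
struct sketch_page_ref {
	uintptr_t pfn;
	unsigned long offset;
};

/* Split an address into its page frame number and offset within the page. */
struct sketch_page_ref sketch_split(uintptr_t addr)
{
	struct sketch_page_ref ref;

	ref.pfn = addr >> SKETCH_PAGE_SHIFT;
	ref.offset = (unsigned long)(addr & (SKETCH_PAGE_SIZE - 1));
	return ref;
}

/* Rebuild the address; lossless because no translation is involved. */
uintptr_t sketch_join(struct sketch_page_ref ref)
{
	assert(ref.offset < SKETCH_PAGE_SIZE);
	return (ref.pfn << SKETCH_PAGE_SHIFT) | ref.offset;
}
```

Calling sketch_split() and then sketch_join() on the result gives back the original address, which mirrors the removed text's remark that going back from an address to the PAGE/OFFSET tuple is possible only because the mapping is untranslated.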
diff --git a/Documentation/DocBook/kernel-api.tmpl b/Documentation/DocBook/kernel-api.tmpl
index 38f88b6ae405..46bcff2849bd 100644
--- a/Documentation/DocBook/kernel-api.tmpl
+++ b/Documentation/DocBook/kernel-api.tmpl
@@ -643,4 +643,70 @@ X!Idrivers/video/console/fonts.c
 !Edrivers/spi/spi.c
   </chapter>
 
+  <chapter id="i2c">
+     <title>I<superscript>2</superscript>C and SMBus Subsystem</title>
+
+     <para>
+	I<superscript>2</superscript>C (or without fancy typography, "I2C")
+	is an acronym for the "Inter-IC" bus, a simple bus protocol which is
+	widely used where low data rate communications suffice.
+	Since it's also a licensed trademark, some vendors use another
+	name (such as "Two-Wire Interface", TWI) for the same bus.
+	I2C only needs two signals (SCL for clock, SDA for data), conserving
+	board real estate and minimizing signal quality issues.
+	Most I2C devices use seven bit addresses, and bus speeds of up
+	to 400 kHz; there's a high speed extension (3.4 MHz) that's not yet
+	found wide use.
+	I2C is a multi-master bus; open drain signaling is used to
+	arbitrate between masters, as well as to handshake and to
+	synchronize clocks from slower clients.
+     </para>
+
+     <para>
+	The Linux I2C programming interfaces support only the master
+	side of bus interactions, not the slave side.
+	The programming interface is structured around two kinds of driver,
+	and two kinds of device.
+	An I2C "Adapter Driver" abstracts the controller hardware; it binds
+	to a physical device (perhaps a PCI device or platform_device) and
+	exposes a <structname>struct i2c_adapter</structname> representing
+	each I2C bus segment it manages.
+	On each I2C bus segment will be I2C devices represented by a
+	<structname>struct i2c_client</structname>. Those devices will
+	be bound to a <structname>struct i2c_driver</structname>,
+	which should follow the standard Linux driver model.
+	(At this writing, a legacy model is more widely used.)
+	There are functions to perform various I2C protocol operations; at
+	this writing all such functions are usable only from task context.
+     </para>
+
+     <para>
+	The System Management Bus (SMBus) is a sibling protocol. Most SMBus
+	systems are also I2C conformant. The electrical constraints are
+	tighter for SMBus, and it standardizes particular protocol messages
+	and idioms. Controllers that support I2C can also support most
+	SMBus operations, but SMBus controllers don't support all the protocol
+	options that an I2C controller will.
+	There are functions to perform various SMBus protocol operations,
+	either using I2C primitives or by issuing SMBus commands to
+	i2c_adapter devices which don't support those I2C operations.
+     </para>
+
+!Iinclude/linux/i2c.h
+!Fdrivers/i2c/i2c-boardinfo.c i2c_register_board_info
+!Edrivers/i2c/i2c-core.c
+  </chapter>
+
+  <chapter id="splice">
+      <title>splice API</title>
+  <para>
+	splice is a method for moving blocks of data around inside the
+	kernel, without continually transferring it between the kernel
+	and user space.
+	</para>
+!Iinclude/linux/splice.h
+!Ffs/splice.c
+  </chapter>
+
+
 </book>
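The seven bit addressing mentioned in the added I2C chapter can be shown with a small user-space sketch: on the wire, the address occupies the upper seven bits of the first byte of a transfer and the lowest bit selects read or write. The helper name i2c_addr_byte is hypothetical and is not part of the kernel I2C API.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper, not a kernel API: build the first byte of an
 * I2C transfer from a 7-bit device address and a direction flag.
 */
uint8_t i2c_addr_byte(uint8_t addr7, int is_read)
{
	/* A 7-bit address must fit in bits 6..0. */
	assert(addr7 <= 0x7f);

	/* Bits 7..1 carry the address; bit 0 is 1 for read, 0 for write. */
	return (uint8_t)((addr7 << 1) | (is_read ? 1 : 0));
}
```

For a device at address 0x6f, a write transfer starts with the byte 0xde and a read transfer with 0xdf.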
diff --git a/Documentation/blackfin/kgdb.txt b/Documentation/blackfin/kgdb.txt
new file mode 100644
index 000000000000..84f6a484ae9a
--- /dev/null
+++ b/Documentation/blackfin/kgdb.txt
@@ -0,0 +1,155 @@
+			A Simple Guide to Configure KGDB
+
+			Sonic Zhang <sonic.zhang@analog.com>
+				Aug. 24th 2006
+
+
+This KGDB patch enables the kernel developer to do source level debugging on
+the kernel for the Blackfin architecture. The debugging works over either the
+ethernet interface or one of the uarts. Both software breakpoints and
+hardware breakpoints are supported in this version.
+http://docs.blackfin.uclinux.org/doku.php?id=kgdb
+
+
+2 known issues:
+1. This bug:
+   http://blackfin.uclinux.org/tracker/index.php?func=detail&aid=544&group_id=18&atid=145
+   The GDB client for Blackfin uClinux causes incorrect values of local
+   variables to be displayed when the user breaks the running of kernel in GDB.
+2. Because of a hardware bug in Blackfin 533 v1.0.3:
+   05000067 - Watchpoints (Hardware Breakpoints) are not supported
+   Hardware breakpoints cannot be set properly.
+
+
+Debug over Ethernet:
+
+1. Compile and install the cross platform version of gdb for blackfin, which
+   can be found at $(BINROOT)/bfin-elf-gdb.
+
+2. Apply this patch to the 2.6.x kernel. Select the menuconfig option under
+   "Kernel hacking" -> "Kernel debugging" -> "KGDB: kernel debug with remote gdb".
+   With this selected, option "Full Symbolic/Source Debugging support" and
+   "Compile the kernel with frame pointers" are also selected.
+
+3. Select option "KGDB: connect over (Ethernet)". Add "kgdboe=@target-IP/,@host-IP/" to
+   the option "Compiled-in Kernel Boot Parameter" under "Kernel hacking".
+
+4. Connect minicom to the serial port and boot the kernel image.
+
+5. Configure the IP "/> ifconfig eth0 target-IP"
+
+6. Start GDB client "bfin-elf-gdb vmlinux".
+
+7. Connect to the target "(gdb) target remote udp:target-IP:6443".
+
+8. Set software breakpoint "(gdb) break sys_open".
+
+9. Continue "(gdb) c".
+
+10. Run ls in the target console "/> ls".
+
+11. Breakpoint hits. "Breakpoint 1: sys_open(..."
+
+12. Display local variables and function parameters.
+    (*) This operation gives wrong results, see known issue 1.
+
+13. Single stepping "(gdb) si".
+
+14. Remove breakpoint 1. "(gdb) del 1"
+
+15. Set hardware breakpoint "(gdb) hbreak sys_open".
+
+16. Continue "(gdb) c".
+
+17. Run ls in the target console "/> ls".
+
+18. Hardware breakpoint hits. "Breakpoint 1: sys_open(...".
+    (*) This hardware breakpoint will not be hit, see known issue 2.
+
+19. Continue "(gdb) c".
+
+20. Interrupt the target in GDB "Ctrl+C".
+
+21. Detach from the target "(gdb) detach".
+
+22. Exit GDB "(gdb) quit".
+
+
+Debug over the UART:
+
+1. Compile and install the cross platform version of gdb for blackfin, which
+   can be found at $(BINROOT)/bfin-elf-gdb.
+
+2. Apply this patch to the 2.6.x kernel. Select the menuconfig option under
+   "Kernel hacking" -> "Kernel debugging" -> "KGDB: kernel debug with remote gdb".
+   With this selected, option "Full Symbolic/Source Debugging support" and
+   "Compile the kernel with frame pointers" are also selected.
+
+3. Select option "KGDB: connect over (UART)". Set "KGDB: UART port number" to be
+   a different one from the console. Don't forget to change the mode of
+   blackfin serial driver to PIO. Otherwise kgdb works incorrectly on UART.
+
+4. If you want to connect to kgdb when the kernel boots, enable
+   "KGDB: Wait for gdb connection early"
+
+5. Compile kernel.
+
+6. Connect minicom to the serial port of the console and boot the kernel image.
+
+7. Start GDB client "bfin-elf-gdb vmlinux".
+
+8. Set the baud rate in GDB "(gdb) set remotebaud 57600".
+
+9. Connect to the target on the second serial port "(gdb) target remote /dev/ttyS1".
+
+10. Set software breakpoint "(gdb) break sys_open".
+
+11. Continue "(gdb) c".
+
+12. Run ls in the target console "/> ls".
+
+13. A breakpoint is hit. "Breakpoint 1: sys_open(..."
+
+14. All other operations are the same as that in KGDB over Ethernet.
+
+
+Debug over the same UART as console:
+
+1. Compile and install the cross platform version of gdb for blackfin, which
+   can be found at $(BINROOT)/bfin-elf-gdb.
+
+2. Apply this patch to the 2.6.x kernel. Select the menuconfig option under
+   "Kernel hacking" -> "Kernel debugging" -> "KGDB: kernel debug with remote gdb".
+   With this selected, option "Full Symbolic/Source Debugging support" and
+   "Compile the kernel with frame pointers" are also selected.
+
+3. Select option "KGDB: connect over UART". Set "KGDB: UART port number" to console.
+   Don't forget to change the mode of blackfin serial driver to PIO.
+   Otherwise kgdb works incorrectly on UART.
+
+4. If you want to connect to kgdb when the kernel boots, enable
+   "KGDB: Wait for gdb connection early"
+
+5. Connect minicom to the serial port and boot the kernel image.
+
+6. (Optional) Ask target to wait for gdb connection by entering Ctrl+A. In minicom, you should enter Ctrl+A+A.
+
+7. Start GDB client "bfin-elf-gdb vmlinux".
+
+8. Set the baud rate in GDB "(gdb) set remotebaud 57600".
+
+9. Connect to the target "(gdb) target remote /dev/ttyS0".
+
+10. Set software breakpoint "(gdb) break sys_open".
+
+11. Continue "(gdb) c". Then enter Ctrl+C twice to stop GDB connection.
+
+12. Run ls in the target console "/> ls". Dummy string can be seen on the console.
+
+13. Then connect the gdb to target again. "(gdb) target remote /dev/ttyS0".
+    Now you will find a breakpoint is hit. "Breakpoint 1: sys_open(..."
+
+14. All other operations are the same as that in KGDB over Ethernet. The only
+    difference is that after continue command in GDB, please stop GDB
+    connection by 2 "Ctrl+C"s and connect again after breakpoints are hit or
+    Ctrl+A is entered.
diff --git a/Documentation/block/barrier.txt b/Documentation/block/barrier.txt
index a272c3db8094..7d279f2f5bb2 100644
--- a/Documentation/block/barrier.txt
+++ b/Documentation/block/barrier.txt
@@ -82,23 +82,12 @@ including draining and flushing.
 typedef void (prepare_flush_fn)(request_queue_t *q, struct request *rq);
 
 int blk_queue_ordered(request_queue_t *q, unsigned ordered,
-		      prepare_flush_fn *prepare_flush_fn,
-		      unsigned gfp_mask);
-
-int blk_queue_ordered_locked(request_queue_t *q, unsigned ordered,
-			     prepare_flush_fn *prepare_flush_fn,
-			     unsigned gfp_mask);
-
-The only difference between the two functions is whether or not the
-caller is holding q->queue_lock on entry. The latter expects the
-caller is holding the lock.
+		      prepare_flush_fn *prepare_flush_fn);
 
 @q : the queue in question
 @ordered : the ordered mode the driver/device supports
 @prepare_flush_fn : this function should prepare @rq such that it
 		    flushes cache to physical medium when executed
-@gfp_mask : gfp_mask used when allocating data structures
-	    for ordered processing
 
 For example, SCSI disk driver's prepare_flush_fn looks like the
 following.
@@ -106,9 +95,10 @@ following.
 static void sd_prepare_flush(request_queue_t *q, struct request *rq)
 {
 	memset(rq->cmd, 0, sizeof(rq->cmd));
-	rq->flags |= REQ_BLOCK_PC;
+	rq->cmd_type = REQ_TYPE_BLOCK_PC;
 	rq->timeout = SD_TIMEOUT;
 	rq->cmd[0] = SYNCHRONIZE_CACHE;
+	rq->cmd_len = 10;
 }
 
 The following seven ordered modes are supported. The following table
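The updated sd_prepare_flush() example fills in a 10-byte SCSI command whose first byte is the SYNCHRONIZE CACHE opcode. Below is a minimal user-space sketch of that command setup, assuming a simplified request structure; struct sketch_request, SKETCH_MAX_CDB and sketch_prepare_flush are hypothetical names for illustration and are not the kernel's struct request or the sd driver code.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_MAX_CDB           16    /* assumed command buffer size */
#define SKETCH_SYNCHRONIZE_CACHE 0x35  /* SCSI SYNCHRONIZE CACHE(10) opcode */

/* Hypothetical, simplified stand-in for the command part of a request. */
struct sketch_request {
	uint8_t cmd[SKETCH_MAX_CDB];
	unsigned int cmd_len;
};

/* Prepare a flush: zero the command bytes, set the opcode and the length. */
void sketch_prepare_flush(struct sketch_request *rq)
{
	assert(rq != NULL);

	memset(rq->cmd, 0, sizeof(rq->cmd));
	rq->cmd[0] = SKETCH_SYNCHRONIZE_CACHE;
	rq->cmd_len = 10;	/* SYNCHRONIZE CACHE(10) is a 10-byte command */
}
```

Leaving every byte after the opcode at zero matches the memset() in the documented example; the explicit length tells the lower layers how many of the buffer's bytes form the command.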
diff --git a/Documentation/feature-removal-schedule.txt b/Documentation/feature-removal-schedule.txt
index 7d3f205b0ba5..0599a0c7c026 100644
--- a/Documentation/feature-removal-schedule.txt
+++ b/Documentation/feature-removal-schedule.txt
@@ -49,16 +49,6 @@ Who: Adrian Bunk <bunk@stusta.de>
 
 ---------------------------
 
-What:	raw1394: requests of type RAW1394_REQ_ISO_SEND, RAW1394_REQ_ISO_LISTEN
-When:	June 2007
-Why:	Deprecated in favour of the more efficient and robust rawiso interface.
-	Affected are applications which use the deprecated part of libraw1394
-	(raw1394_iso_write, raw1394_start_iso_write, raw1394_start_iso_rcv,
-	raw1394_stop_iso_rcv) or bypass libraw1394.
-Who:	Dan Dennedy <dan@dennedy.org>, Stefan Richter <stefanr@s5r6.in-berlin.de>
-
----------------------------
-
 What:	old NCR53C9x driver
 When:	October 2007
 Why:	Replaced by the much better esp_scsi driver. Actual low-level
@@ -258,14 +248,6 @@ Who: Len Brown <len.brown@intel.com>
 
 ---------------------------
 
-What:	sk98lin network driver
-When:	July 2007
-Why:	In kernel tree version of driver is unmaintained. Sk98lin driver
-	replaced by the skge driver.
-Who:	Stephen Hemminger <shemminger@osdl.org>
-
----------------------------
-
 What:	Compaq touchscreen device emulation
 When:	Oct 2007
 Files:	drivers/input/tsdev.c
@@ -280,25 +262,6 @@ Who: Richard Purdie <rpurdie@rpsys.net>
 
 ---------------------------
 
-What:	Multipath cached routing support in ipv4
-When:	in 2.6.23
-Why:	Code was merged, then submitter immediately disappeared leaving
-	us with no maintainer and lots of bugs. The code should not have
-	been merged in the first place, and many aspects of it's
-	implementation are blocking more critical core networking
-	development. It's marked EXPERIMENTAL and no distribution
-	enables it because it cause obscure crashes due to unfixable bugs
-	(interfaces don't return errors so memory allocation can't be
-	handled, calling contexts of these interfaces make handling
-	errors impossible too because they get called after we've
-	totally commited to creating a route object, for example).
-	This problem has existed for years and no forward progress
-	has ever been made, and nobody steps up to try and salvage
-	this code, so we're going to finally just get rid of it.
-Who:	David S. Miller <davem@davemloft.net>
-
----------------------------
-
 What:	read_dev_chars(), read_conf_data{,_lpm}() (s390 common I/O layer)
 When:	December 2007
 Why:	These functions are a leftover from 2.4 times. They have several
@@ -348,3 +311,18 @@ Who: Tejun Heo <htejun@gmail.com>
 
 ---------------------------
 
+What:	Legacy RTC drivers (under drivers/i2c/chips)
+When:	November 2007
+Why:	Obsolete. We have a RTC subsystem with better drivers.
+Who:	Jean Delvare <khali@linux-fr.org>
+
+---------------------------
+
+What:	iptables SAME target
+When:	1.1. 2008
+Files:	net/ipv4/netfilter/ipt_SAME.c, include/linux/netfilter_ipv4/ipt_SAME.h
+Why:	Obsolete for multiple years now, NAT core provides the same behaviour.
+	Unfixable broken wrt. 32/64 bit cleanness.
+Who:	Patrick McHardy <kaber@trash.net>
+
+---------------------------
diff --git a/Documentation/firmware_class/firmware_sample_firmware_class.c b/Documentation/firmware_class/firmware_sample_firmware_class.c
index 4994f1f28f8c..fba943aacf93 100644
--- a/Documentation/firmware_class/firmware_sample_firmware_class.c
+++ b/Documentation/firmware_class/firmware_sample_firmware_class.c
@@ -78,6 +78,7 @@ static CLASS_DEVICE_ATTR(loading, 0644,
 		firmware_loading_show, firmware_loading_store);
 
 static ssize_t firmware_data_read(struct kobject *kobj,
+				  struct bin_attribute *bin_attr,
 				  char *buffer, loff_t offset, size_t count)
 {
 	struct class_device *class_dev = to_class_dev(kobj);
@@ -88,6 +89,7 @@ static ssize_t firmware_data_read(struct kobject *kobj,
 	return count;
 }
 static ssize_t firmware_data_write(struct kobject *kobj,
+				   struct bin_attribute *bin_attr,
 				   char *buffer, loff_t offset, size_t count)
 {
 	struct class_device *class_dev = to_class_dev(kobj);
diff --git a/Documentation/i2c/busses/i2c-i801 b/Documentation/i2c/busses/i2c-i801
index c34f0db78a30..fe6406f2f9a6 100644
--- a/Documentation/i2c/busses/i2c-i801
+++ b/Documentation/i2c/busses/i2c-i801
@@ -5,8 +5,8 @@ Supported adapters:
     '810' and '810E' chipsets)
   * Intel 82801BA (ICH2 - part of the '815E' chipset)
   * Intel 82801CA/CAM (ICH3)
-  * Intel 82801DB (ICH4) (HW PEC supported, 32 byte buffer not supported)
-  * Intel 82801EB/ER (ICH5) (HW PEC supported, 32 byte buffer not supported)
+  * Intel 82801DB (ICH4) (HW PEC supported)
+  * Intel 82801EB/ER (ICH5) (HW PEC supported)
   * Intel 6300ESB
   * Intel 82801FB/FR/FW/FRW (ICH6)
   * Intel 82801G (ICH7)
diff --git a/Documentation/i2c/busses/i2c-piix4 b/Documentation/i2c/busses/i2c-piix4
index 7cbe43fa2701..fa0c786a8bf5 100644
--- a/Documentation/i2c/busses/i2c-piix4
+++ b/Documentation/i2c/busses/i2c-piix4
@@ -6,7 +6,7 @@ Supported adapters:
     Datasheet: Publicly available at the Intel website
   * ServerWorks OSB4, CSB5, CSB6 and HT-1000 southbridges
     Datasheet: Only available via NDA from ServerWorks
-  * ATI IXP200, IXP300, IXP400 and SB600 southbridges
+  * ATI IXP200, IXP300, IXP400, SB600 and SB700 southbridges
     Datasheet: Not publicly available
   * Standard Microsystems (SMSC) SLC90E66 (Victory66) southbridge
     Datasheet: Publicly available at the SMSC website http://www.smsc.com
diff --git a/Documentation/i2c/busses/i2c-taos-evm b/Documentation/i2c/busses/i2c-taos-evm
new file mode 100644
index 000000000000..9146e33be6dd
--- /dev/null
+++ b/Documentation/i2c/busses/i2c-taos-evm
@@ -0,0 +1,46 @@
+Kernel driver i2c-taos-evm
+
+Author: Jean Delvare <khali@linux-fr.org>
+
+This is a driver for the evaluation modules for TAOS I2C/SMBus chips.
+The modules include an SMBus master with limited capabilities, which can
+be controlled over the serial port. Virtually all evaluation modules
+are supported, but a few lines of code need to be added for each new
+module to instantiate the right I2C chip on the bus. Obviously, a driver
+for the chip in question is also needed.
+
+Currently supported devices are:
+
+* TAOS TSL2550 EVM
+
+For additional information on TAOS products, please see
+  http://www.taosinc.com/
+
+
+Using this driver
+-----------------
+
+In order to use this driver, you'll need the serport driver, and the
+inputattach tool, which is part of the input-utils package. The following
+commands will tell the kernel that you have a TAOS EVM on the first
+serial port:
+
+# modprobe serport
+# inputattach --taos-evm /dev/ttyS0
+
+
+Technical details
+-----------------
+
+Only 4 SMBus transaction types are supported by the TAOS evaluation
+modules:
+* Receive Byte
+* Send Byte
+* Read Byte
+* Write Byte
+
+The communication protocol is text-based and pretty simple. It is
+described in a PDF document on the CD which comes with the evaluation
+module. The communication is rather slow, because the serial port has
+to operate at 1200 bps. However, I don't think this is a big concern in
+practice, as these modules are meant for evaluation and testing only.
diff --git a/Documentation/i2c/chips/max6875 b/Documentation/i2c/chips/max6875
index 96fec562a8e9..a0cd8af2f408 100644
--- a/Documentation/i2c/chips/max6875
+++ b/Documentation/i2c/chips/max6875
@@ -99,7 +99,7 @@ And then read the data
 
   or
 
-  count = i2c_smbus_read_i2c_block_data(fd, 0x84, buffer);
+  count = i2c_smbus_read_i2c_block_data(fd, 0x84, 16, buffer);
 
 The block read should read 16 bytes.
 0x84 is the block read command.
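The max6875 change adds an explicit length argument to the block read call. What a caller-supplied length implies can be sketched in user space, under the assumption that the number of bytes copied is capped by both the requested length and the 32-byte SMBus block limit. sketch_read_block is a hypothetical stand-in that copies from an in-memory array; it is not i2c_smbus_read_i2c_block_data() and performs no bus access.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_BLOCK_MAX 32	/* SMBus block transfers carry at most 32 data bytes */

/*
 * Hypothetical stand-in for a block read with an explicit length.
 * src/src_len model the bytes the device would return; values receives
 * at most 'length' bytes. Returns the number of bytes copied.
 */
int sketch_read_block(const uint8_t *src, size_t src_len,
		      uint8_t length, uint8_t *values)
{
	size_t n = length;

	assert(src != NULL && values != NULL);

	if (n > SKETCH_BLOCK_MAX)	/* never exceed the block limit */
		n = SKETCH_BLOCK_MAX;
	if (n > src_len)		/* never read past the modelled device data */
		n = src_len;

	memcpy(values, src, n);
	return (int)n;
}
```

With a length of 16, as in the updated max6875 example, the caller's buffer only needs to hold 16 bytes instead of a full 32-byte block.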
diff --git a/Documentation/i2c/chips/x1205 b/Documentation/i2c/chips/x1205
deleted file mode 100644
index 09407c991fe5..000000000000
--- a/Documentation/i2c/chips/x1205
+++ /dev/null
@@ -1,38 +0,0 @@
-Kernel driver x1205
-===================
-
-Supported chips:
-  * Xicor X1205 RTC
-    Prefix: 'x1205'
-    Addresses scanned: none
-    Datasheet: http://www.intersil.com/cda/deviceinfo/0,1477,X1205,00.html
-
-Authors:
-	Karen Spearel <kas11@tampabay.rr.com>,
-	Alessandro Zummo <a.zummo@towertech.it>
-
-Description
------------
-
-This module aims to provide complete access to the Xicor X1205 RTC.
-Recently Xicor has merged with Intersil, but the chip is
-still sold under the Xicor brand.
-
-This chip is located at address 0x6f and uses a 2-byte register addressing.
-Two bytes need to be written to read a single register, while most
-other chips just require one and take the second one as the data
-to be written. To prevent corrupting unknown chips, the user must
-explicitely set the probe parameter.
-
-example:
-
-modprobe x1205 probe=0,0x6f
-
-The module supports one more option, hctosys, which is used to set the
-software clock from the x1205. On systems where the x1205 is the
-only hardware rtc, this parameter could be used to achieve a correct
-date/time earlier in the system boot sequence.
-
-example:
-
-modprobe x1205 probe=0,0x6f hctosys=1
diff --git a/Documentation/i2c/summary b/Documentation/i2c/summary
index aea60bf7e8f0..003c7319b8c7 100644
--- a/Documentation/i2c/summary
+++ b/Documentation/i2c/summary
@@ -67,7 +67,6 @@ i2c-proc: The /proc/sys/dev/sensors interface for device (client) drivers
 Algorithm drivers
 -----------------
 
-i2c-algo-8xx: An algorithm for CPM's I2C device in Motorola 8xx processors (NOT BUILT BY DEFAULT)
 i2c-algo-bit: A bit-banging algorithm
 i2c-algo-pcf: A PCF 8584 style algorithm
 i2c-algo-ibm_ocp: An algorithm for the I2C device in IBM 4xx processors (NOT BUILT BY DEFAULT)
@@ -81,6 +80,5 @@ i2c-pcf-epp: PCF8584 on a EPP parallel port (uses i2c-algo-pcf) (NOT mkpatch
 i2c-philips-par: Philips style parallel port adapter (uses i2c-algo-bit)
 i2c-adap-ibm_ocp: IBM 4xx processor I2C device (uses i2c-algo-ibm_ocp) (NOT BUILT BY DEFAULT)
 i2c-pport: Primitive parallel port adapter (uses i2c-algo-bit)
-i2c-rpx: RPX board Motorola 8xx I2C device (uses i2c-algo-8xx) (NOT BUILT BY DEFAULT)
 i2c-velleman: Velleman K8000 parallel port adapter (uses i2c-algo-bit)
 
diff --git a/Documentation/i2c/writing-clients b/Documentation/i2c/writing-clients
index 3d8d36b0ad12..2c170032bf37 100644
--- a/Documentation/i2c/writing-clients
+++ b/Documentation/i2c/writing-clients
@@ -571,7 +571,7 @@ SMBus communication
                                          u8 command, u8 length,
                                          u8 *values);
   extern s32 i2c_smbus_read_i2c_block_data(struct i2c_client * client,
-                                           u8 command, u8 *values);
+                                           u8 command, u8 length, u8 *values);
 
 These ones were removed in Linux 2.6.10 because they had no users, but could
 be added back later if needed:
diff --git a/Documentation/i386/zero-page.txt b/Documentation/i386/zero-page.txt
index c04a421f4a7c..75b3680c41eb 100644
--- a/Documentation/i386/zero-page.txt
+++ b/Documentation/i386/zero-page.txt
@@ -37,6 +37,7 @@ Offset Type Description
370x1d0 unsigned long EFI memory descriptor map pointer 370x1d0 unsigned long EFI memory descriptor map pointer
380x1d4 unsigned long EFI memory descriptor map size 380x1d4 unsigned long EFI memory descriptor map size
390x1e0 unsigned long ALT_MEM_K, alternative mem check, in Kb 390x1e0 unsigned long ALT_MEM_K, alternative mem check, in Kb
400x1e4 unsigned long Scratch field for the kernel setup code
400x1e8 char number of entries in E820MAP (below) 410x1e8 char number of entries in E820MAP (below)
410x1e9 unsigned char number of entries in EDDBUF (below) 420x1e9 unsigned char number of entries in EDDBUF (below)
420x1ea unsigned char number of entries in EDD_MBR_SIG_BUFFER (below) 430x1ea unsigned char number of entries in EDD_MBR_SIG_BUFFER (below)
diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index af50f9bbe68e..4d880b3d1f35 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -1014,49 +1014,6 @@ and is between 256 and 4096 characters. It is defined in the file
1014 1014
1015 mga= [HW,DRM] 1015 mga= [HW,DRM]
1016 1016
1017 migration_cost=
1018 [KNL,SMP] debug: override scheduler migration costs
1019 Format: <level-1-usecs>,<level-2-usecs>,...
1020 This debugging option can be used to override the
1021 default scheduler migration cost matrix. The numbers
1022 are indexed by 'CPU domain distance'.
1023 E.g. migration_cost=1000,2000,3000 on an SMT NUMA
1024 box will set up an intra-core migration cost of
1025 1 msec, an inter-core migration cost of 2 msecs,
1026 and an inter-node migration cost of 3 msecs.
1027
1028 WARNING: using the wrong values here can break
1029 scheduler performance, so it's only for scheduler
1030 development purposes, not production environments.
1031
1032 migration_debug=
1033 [KNL,SMP] migration cost auto-detect verbosity
1034 Format=<0|1|2>
1035 If a system's migration matrix reported at bootup
1036 seems erroneous then this option can be used to
1037 increase verbosity of the detection process.
1038 We default to 0 (no extra messages), 1 will print
1039 some more information, and 2 will be really
1040 verbose (probably only useful if you also have a
1041 serial console attached to the system).
1042
1043 migration_factor=
1044 [KNL,SMP] multiply/divide migration costs by a factor
1045 Format=<percent>
1046 This debug option can be used to proportionally
1047 increase or decrease the auto-detected migration
1048 costs for all entries of the migration matrix.
1049 E.g. migration_factor=150 will increase migration
1050 costs by 50%. (and thus the scheduler will be less
1051 eager migrating cache-hot tasks)
1052 migration_factor=80 will decrease migration costs
1053 by 20%. (thus the scheduler will be more eager to
1054 migrate tasks)
1055
1056 WARNING: using the wrong values here can break
1057 scheduler performance, so it's only for scheduler
1058 development purposes, not production environments.
1059
1060 mousedev.tap_time= 1017 mousedev.tap_time=
1061 [MOUSE] Maximum time between finger touching and 1018 [MOUSE] Maximum time between finger touching and
1062 leaving touchpad surface for touch to be considered 1019 leaving touchpad surface for touch to be considered
diff --git a/Documentation/networking/00-INDEX b/Documentation/networking/00-INDEX
index 153d84d281e6..d63f480afb74 100644
--- a/Documentation/networking/00-INDEX
+++ b/Documentation/networking/00-INDEX
@@ -96,9 +96,6 @@ routing.txt
96 - the new routing mechanism 96 - the new routing mechanism
97shaper.txt 97shaper.txt
98 - info on the module that can shape/limit transmitted traffic. 98 - info on the module that can shape/limit transmitted traffic.
99sk98lin.txt
100 - Marvell Yukon Chipset / SysKonnect SK-98xx compliant Gigabit
101 Ethernet Adapter family driver info
102skfp.txt 99skfp.txt
103 - SysKonnect FDDI (SK-5xxx, Compaq Netelligent) driver info. 100 - SysKonnect FDDI (SK-5xxx, Compaq Netelligent) driver info.
104smc9.txt 101smc9.txt
diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt
index af6a63ab9026..09c184e41cf8 100644
--- a/Documentation/networking/ip-sysctl.txt
+++ b/Documentation/networking/ip-sysctl.txt
@@ -874,8 +874,7 @@ accept_redirects - BOOLEAN
874accept_source_route - INTEGER 874accept_source_route - INTEGER
875 Accept source routing (routing extension header). 875 Accept source routing (routing extension header).
876 876
877 > 0: Accept routing header. 877 >= 0: Accept only routing header type 2.
878 = 0: Accept only routing header type 2.
879 < 0: Do not accept routing header. 878 < 0: Do not accept routing header.
880 879
881 Default: 0 880 Default: 0
diff --git a/Documentation/networking/l2tp.txt b/Documentation/networking/l2tp.txt
new file mode 100644
index 000000000000..2451f551c505
--- /dev/null
+++ b/Documentation/networking/l2tp.txt
@@ -0,0 +1,169 @@
1This brief document describes how to use the kernel's PPPoL2TP driver
2to provide L2TP functionality. L2TP is a protocol that tunnels one or
3more PPP sessions over a UDP tunnel. It is commonly used for VPNs
4(L2TP/IPSec) and by ISPs to tunnel subscriber PPP sessions over an IP
5network infrastructure.
6
7Design
8======
9
10The PPPoL2TP driver, drivers/net/pppol2tp.c, provides a mechanism by
11which PPP frames carried through an L2TP session are passed through
12the kernel's PPP subsystem. The standard PPP daemon, pppd, handles all
13PPP interaction with the peer. PPP network interfaces are created for
14each local PPP endpoint.
15
16The L2TP protocol http://www.faqs.org/rfcs/rfc2661.html defines L2TP
17control and data frames. L2TP control frames carry messages between
18L2TP clients/servers and are used to set up / tear down tunnels and
19sessions. An L2TP client or server is implemented in userspace and
20will use a regular UDP socket per tunnel. L2TP data frames carry PPP
21frames, which may be PPP control or PPP data. The kernel's PPP
22subsystem arranges for PPP control frames to be delivered to pppd,
23while data frames are forwarded as usual.
24
25Each tunnel and session within a tunnel is assigned a unique tunnel_id
26and session_id. These ids are carried in the L2TP header of every
27control and data packet. The pppol2tp driver uses them to look up
28internal tunnel and/or session contexts. Zero tunnel / session ids are
29treated specially - zero ids are never assigned to tunnels or sessions
30in the network. In the driver, the tunnel context keeps a pointer to
31the tunnel UDP socket. The session context keeps a pointer to the
32PPPoL2TP socket, as well as other data that lets the driver interface
33to the kernel PPP subsystem.
34
35Note that the pppol2tp kernel driver handles only L2TP data frames;
36L2TP control frames are simply passed up to userspace in the UDP
37tunnel socket. The kernel handles all datapath aspects of the
38protocol, including data packet resequencing (if enabled).
39
40There are a number of requirements on the userspace L2TP daemon in
41order to use the pppol2tp driver.
42
431. Use a UDP socket per tunnel.
44
452. Create a single PPPoL2TP socket per tunnel bound to a special null
46 session id. This is used only for communicating with the driver but
47 must remain open while the tunnel is active. Opening this tunnel
48 management socket causes the driver to mark the tunnel socket as an
49 L2TP UDP encapsulation socket and flags it for use by the
50 referenced tunnel id. This hooks up the UDP receive path via
51 udp_encap_rcv() in net/ipv4/udp.c. PPP data frames are never passed
52 in this special PPPoX socket.
53
543. Create a PPPoL2TP socket per L2TP session. This is typically done
55 by starting pppd with the pppol2tp plugin and appropriate
56 arguments. A PPPoL2TP tunnel management socket (Step 2) must be
57 created before the first PPPoL2TP session socket is created.
58
59When creating PPPoL2TP sockets, the application provides information
60to the driver about the socket in a socket connect() call. Source and
61destination tunnel and session ids are provided, as well as the file
62descriptor of a UDP socket. See struct pppol2tp_addr in
63include/linux/if_ppp.h. Note that zero tunnel / session ids are
64treated specially. When creating the per-tunnel PPPoL2TP management
65socket in Step 2 above, zero source and destination session ids are
66specified, which tells the driver to prepare the supplied UDP file
67descriptor for use as an L2TP tunnel socket.
68
69Userspace may control behavior of the tunnel or session using
70setsockopt and ioctl on the PPPoX socket. The following socket
71options are supported:-
72
73DEBUG - bitmask of debug message categories. See below.
74SENDSEQ - 0 => don't send packets with sequence numbers
75 1 => send packets with sequence numbers
76RECVSEQ - 0 => receive packet sequence numbers are optional
77 1 => drop receive packets without sequence numbers
78LNSMODE - 0 => act as LAC.
79 1 => act as LNS.
80REORDERTO - reorder timeout (in millisecs). If 0, don't try to reorder.
81
82Only the DEBUG option is supported by the special tunnel management
83PPPoX socket.
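
A minimal sketch of setting one of these options from userspace, assuming
each option takes an int value. ex_set_int_sockopt() is a hypothetical
helper. The socket level and option constants must come from the driver's
header, since this document does not give their names:

```c
#include <errno.h>
#include <sys/socket.h>

/* Set an integer-valued socket option.
 * Returns 0 on success or -errno on failure. */
static int ex_set_int_sockopt(int fd, int level, int optname, int value)
{
	if (setsockopt(fd, level, optname, &value, sizeof(value)) < 0)
		return -errno;
	return 0;
}
```

For the DEBUG option, the value would be a mask such as
PPPOL2TP_MSG_CONTROL | PPPOL2TP_MSG_SEQ (see Debugging below).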
84
85In addition to the standard PPP ioctls, a PPPIOCGL2TPSTATS is provided
86to retrieve tunnel and session statistics from the kernel using the
87PPPoX socket of the appropriate tunnel or session.
88
89Debugging
90=========
91
92The driver supports a flexible debug scheme where kernel trace
93messages may be optionally enabled per tunnel and per session. Care is
94needed when debugging a live system since the messages are not
95rate-limited and a busy system could be swamped. Userspace uses
96setsockopt on the PPPoX socket to set a debug mask.
97
98The following debug mask bits are available:
99
100PPPOL2TP_MSG_DEBUG verbose debug (if compiled in)
101PPPOL2TP_MSG_CONTROL userspace - kernel interface
102PPPOL2TP_MSG_SEQ sequence numbers handling
103PPPOL2TP_MSG_DATA data packets
104
105Sample Userspace Code
106=====================
107
1081. Create tunnel management PPPoX socket
109
110 kernel_fd = socket(AF_PPPOX, SOCK_DGRAM, PX_PROTO_OL2TP);
111 if (kernel_fd >= 0) {
112 struct sockaddr_pppol2tp sax;
113 struct sockaddr_in const *peer_addr;
114
115 peer_addr = l2tp_tunnel_get_peer_addr(tunnel);
116 memset(&sax, 0, sizeof(sax));
117 sax.sa_family = AF_PPPOX;
118 sax.sa_protocol = PX_PROTO_OL2TP;
119 sax.pppol2tp.fd = udp_fd; /* fd of tunnel UDP socket */
120 sax.pppol2tp.addr.sin_addr.s_addr = peer_addr->sin_addr.s_addr;
121 sax.pppol2tp.addr.sin_port = peer_addr->sin_port;
122 sax.pppol2tp.addr.sin_family = AF_INET;
123 sax.pppol2tp.s_tunnel = tunnel_id;
124 sax.pppol2tp.s_session = 0; /* special case: mgmt socket */
125 sax.pppol2tp.d_tunnel = 0;
126 sax.pppol2tp.d_session = 0; /* special case: mgmt socket */
127
128 if(connect(kernel_fd, (struct sockaddr *)&sax, sizeof(sax) ) < 0 ) {
129 perror("connect failed");
130 result = -errno;
131 goto err;
132 }
133 }
134
1352. Create session PPPoX data socket
136
137 struct sockaddr_pppol2tp sax;
138 int fd;
139
140 /* Note, the target socket must be bound already, else it will not be ready */
141 sax.sa_family = AF_PPPOX;
142 sax.sa_protocol = PX_PROTO_OL2TP;
143 sax.pppol2tp.fd = tunnel_fd;
144 sax.pppol2tp.addr.sin_addr.s_addr = addr->sin_addr.s_addr;
145 sax.pppol2tp.addr.sin_port = addr->sin_port;
146 sax.pppol2tp.addr.sin_family = AF_INET;
147 sax.pppol2tp.s_tunnel = tunnel_id;
148 sax.pppol2tp.s_session = session_id;
149 sax.pppol2tp.d_tunnel = peer_tunnel_id;
150 sax.pppol2tp.d_session = peer_session_id;
151
152 /* session_fd is the fd of the session's PPPoL2TP socket.
153 * tunnel_fd is the fd of the tunnel UDP socket.
154 */
155 fd = connect(session_fd, (struct sockaddr *)&sax, sizeof(sax));
156 if (fd < 0 ) {
157 return -errno;
158 }
159 return 0;
160
161Miscellaneous
162=============
163
164The PPPoL2TP driver was developed as part of the OpenL2TP project by
165Katalix Systems Ltd. OpenL2TP is a full-featured L2TP client / server,
166designed from the ground up to have the L2TP datapath in the
167kernel. The project also implemented the pppol2tp plugin for pppd
168which allows pppd to use the kernel driver. Details can be found at
169http://openl2tp.sourceforge.net.
diff --git a/Documentation/networking/multiqueue.txt b/Documentation/networking/multiqueue.txt
new file mode 100644
index 000000000000..00b60cce2224
--- /dev/null
+++ b/Documentation/networking/multiqueue.txt
@@ -0,0 +1,111 @@
1
2 HOWTO for multiqueue network device support
3 ===========================================
4
5Section 1: Base driver requirements for implementing multiqueue support
6Section 2: Qdisc support for multiqueue devices
7Section 3: Brief howto using PRIO and RR for multiqueue devices
8
9
10Intro: Kernel support for multiqueue devices
11---------------------------------------------------------
12
13Kernel support for multiqueue devices is only an API that is presented to the
14netdevice layer for base drivers to implement. This feature is part of the
15core networking stack, and all network devices will be running on the
16multiqueue-aware stack. If a base driver only has one queue, then these
17changes are transparent to that driver.
18
19
20Section 1: Base driver requirements for implementing multiqueue support
21-----------------------------------------------------------------------
22
23Base drivers are required to use the new alloc_etherdev_mq() or
24alloc_netdev_mq() functions to allocate the subqueues for the device. The
25underlying kernel API will take care of the allocation and deallocation of
26the subqueue memory, as well as netdev configuration of where the queues
27exist in memory.
28
29The base driver will also need to manage the queues as it does the global
30netdev->queue_lock today. Therefore base drivers should use the
31netif_{start|stop|wake}_subqueue() functions to manage each queue while the
32device is still operational. netdev->queue_lock is still used when the device
33comes online or when it's completely shut down (unregister_netdev(), etc.).
34
35Finally, the base driver should indicate that it is a multiqueue device. The
36feature flag NETIF_F_MULTI_QUEUE should be added to the netdev->features
37bitmap on device initialization. Below is an example from e1000:
38
39#ifdef CONFIG_E1000_MQ
40 if ( (adapter->hw.mac.type == e1000_82571) ||
41 (adapter->hw.mac.type == e1000_82572) ||
42 (adapter->hw.mac.type == e1000_80003es2lan))
43 netdev->features |= NETIF_F_MULTI_QUEUE;
44#endif
45
46
47Section 2: Qdisc support for multiqueue devices
48-----------------------------------------------
49
50Currently two qdiscs support multiqueue devices: a new round-robin qdisc,
51sch_rr, and sch_prio. The qdisc is responsible for classifying the skb's to
52bands and queues, and will store the queue mapping into skb->queue_mapping.
53Use this field in the base driver to determine which queue to send the skb
54to.
55
56sch_rr has been added for hardware that doesn't want scheduling policies from
57software, so it's a straight round-robin qdisc. It uses the same syntax and
58classification priomap that sch_prio uses, so it should be intuitive to
59configure for people who've used sch_prio.
60
61The PRIO qdisc naturally plugs into a multiqueue device. If PRIO has been
62built with NET_SCH_PRIO_MQ, then upon load, it will make sure the number of
63bands requested is equal to the number of queues on the hardware. If they
64are equal, it sets a one-to-one mapping up between the queues and bands. If
65they're not equal, it will not load the qdisc. This is the same behavior
66for RR. Once the association is made, any skb that is classified will have
67skb->queue_mapping set, which will allow the driver to properly queue skb's
68to multiple queues.
69
70
71Section 3: Brief howto using PRIO and RR for multiqueue devices
72---------------------------------------------------------------
73
74The userspace command 'tc,' part of the iproute2 package, is used to configure
75qdiscs. To add the PRIO qdisc to your network device, assuming the device is
76called eth0, run the following command:
77
78# tc qdisc add dev eth0 root handle 1: prio bands 4 multiqueue
79
80This will create 4 bands, 0 being highest priority, and associate those bands
81to the queues on your NIC. Assuming eth0 has 4 Tx queues, the band mapping
82would look like:
83
84band 0 => queue 0
85band 1 => queue 1
86band 2 => queue 2
87band 3 => queue 3
88
89Traffic will begin flowing through each queue if your TOS values are assigning
90traffic across the various bands. For example, ssh traffic will always try to
91go out band 0 based on TOS -> Linux priority conversion (realtime traffic),
92so it will be sent out queue 0. ICMP traffic (pings) falls into the "normal"
93traffic classification, which is band 1. Therefore pings will be sent out
94queue 1 on the NIC.
95
96Note the use of the multiqueue keyword. This is only in versions of iproute2
97that support multiqueue networking devices; if this is omitted when loading
98a qdisc onto a multiqueue device, the qdisc will load and operate the same
99as if it were loaded onto a single-queue device (i.e. it sends all traffic to
100queue 0).
101
102An alternative way to allocate bands on a multiqueue device is to use the
103multiqueue option and specify 0 bands. If this is the case, the qdisc will
104allocate the number of bands to equal the number of queues that the device
105reports, and bring the qdisc online.
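
The band-count rules from Section 2 and the paragraph above can be modeled
in a few lines. This is an illustrative userspace sketch, not the qdisc
code. ex_mq_resolve_bands() and ex_band_to_queue() are hypothetical names,
and the -1 return value is only a placeholder for "refuse to load":

```c
/* Decide how many bands to set up on a multiqueue device.
 * Returns the band count, or -1 if the qdisc should refuse to load. */
static int ex_mq_resolve_bands(unsigned int requested_bands,
			       unsigned int hw_queues)
{
	/* 0 bands requested: use one band per hardware queue. */
	if (requested_bands == 0)
		return (int)hw_queues;

	/* Otherwise the band count must equal the queue count. */
	if (requested_bands != hw_queues)
		return -1;

	return (int)requested_bands;
}

/* With the one-to-one mapping, band N is sent out queue N. */
static unsigned int ex_band_to_queue(unsigned int band)
{
	return band;
}
```

On a device with 4 Tx queues, requesting 4 bands or 0 bands both yield 4
bands, while requesting 3 bands is rejected.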
106
107The behavior of tc filters remains the same, where it will override TOS priority
108classification.
109
110
111Author: Peter P. Waskiewicz Jr. <peter.p.waskiewicz.jr@intel.com>
diff --git a/Documentation/networking/netdevices.txt b/Documentation/networking/netdevices.txt
index ce1361f95243..37869295fc70 100644
--- a/Documentation/networking/netdevices.txt
+++ b/Documentation/networking/netdevices.txt
@@ -20,6 +20,30 @@ private data which gets freed when the network device is freed. If
20separately allocated data is attached to the network device 20separately allocated data is attached to the network device
21(dev->priv) then it is up to the module exit handler to free that. 21(dev->priv) then it is up to the module exit handler to free that.
22 22
23MTU
24===
25Each network device has a Maximum Transfer Unit. The MTU does not
26include any link layer protocol overhead. Upper layer protocols must
27not pass a socket buffer (skb) to a device to transmit with more data
28than the MTU. The MTU does not include link layer header overhead, so
29for example on Ethernet, if the standard MTU of 1500 bytes is used, the
30actual skb will contain up to 1514 bytes because of the Ethernet
31header. Devices should allow for the 4 byte VLAN header as well.
32
33Segmentation Offload (GSO, TSO) is an exception to this rule. The
34upper layer protocol may pass a large socket buffer to the device
35transmit routine, and the device will break that up into separate
36packets based on the current MTU.
37
38MTU is symmetrical and applies both to receive and transmit. A device
39must be able to receive at least the maximum size packet allowed by
40the MTU. A network device may use the MTU as a mechanism to size receive
41buffers, but the device should allow packets with a VLAN header. With the
42standard Ethernet MTU of 1500 bytes, the device should allow up to
431518 byte packets (1500 + 14 header + 4 tag). The device may drop,
44truncate, or pass up oversize packets, but dropping oversize
45packets is preferred.
46
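
The frame sizes quoted above follow from simple addition. A minimal
userspace sketch, using hypothetical names (EX_ETH_HLEN, EX_VLAN_HLEN and
ex_max_frame_len() are illustrative and are not taken from the kernel
headers):

```c
#include <stdbool.h>
#include <stddef.h>

/* Illustrative constants, taken from the figures in the text above:
 * a 14 byte Ethernet header and a 4 byte VLAN tag. */
#define EX_ETH_HLEN  14u
#define EX_VLAN_HLEN 4u

/* Largest frame a device should be prepared to handle for a given MTU,
 * optionally allowing for one VLAN tag. */
static size_t ex_max_frame_len(size_t mtu, bool allow_vlan)
{
	size_t len = mtu + EX_ETH_HLEN;

	if (allow_vlan)
		len += EX_VLAN_HLEN;
	return len;
}
```

With an MTU of 1500, ex_max_frame_len(1500, false) gives 1514 and
ex_max_frame_len(1500, true) gives 1518, matching the figures above.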
23 47
24struct net_device synchronization rules 48struct net_device synchronization rules
25======================================= 49=======================================
@@ -43,16 +67,17 @@ dev->get_stats:
43 67
44dev->hard_start_xmit: 68dev->hard_start_xmit:
45 Synchronization: netif_tx_lock spinlock. 69 Synchronization: netif_tx_lock spinlock.
70
46 When the driver sets NETIF_F_LLTX in dev->features this will be 71 When the driver sets NETIF_F_LLTX in dev->features this will be
47 called without holding netif_tx_lock. In this case the driver 72 called without holding netif_tx_lock. In this case the driver
48 has to lock by itself when needed. It is recommended to use a try lock 73 has to lock by itself when needed. It is recommended to use a try lock
49 for this and return -1 when the spin lock fails. 74 for this and return NETDEV_TX_LOCKED when the spin lock fails.
50 The locking there should also properly protect against 75 The locking there should also properly protect against
51 set_multicast_list 76 set_multicast_list.
52 Context: Process with BHs disabled or BH (timer). 77
53 Notes: netif_queue_stopped() is guaranteed false 78 Context: Process with BHs disabled or BH (timer),
54 Interrupts must be enabled when calling hard_start_xmit. 79 will be called with interrupts disabled by netconsole.
55 (Interrupts must also be enabled when enabling the BH handler.) 80
56 Return codes: 81 Return codes:
57 o NETDEV_TX_OK everything ok. 82 o NETDEV_TX_OK everything ok.
58 o NETDEV_TX_BUSY Cannot transmit packet, try later 83 o NETDEV_TX_BUSY Cannot transmit packet, try later
@@ -74,4 +99,5 @@ dev->poll:
74 Synchronization: __LINK_STATE_RX_SCHED bit in dev->state. See 99 Synchronization: __LINK_STATE_RX_SCHED bit in dev->state. See
75 dev_close code and comments in net/core/dev.c for more info. 100 dev_close code and comments in net/core/dev.c for more info.
76 Context: softirq 101 Context: softirq
102 will be called with interrupts disabled by netconsole.
77 103
diff --git a/Documentation/networking/sk98lin.txt b/Documentation/networking/sk98lin.txt
deleted file mode 100644
index 8590a954df1d..000000000000
--- a/Documentation/networking/sk98lin.txt
+++ /dev/null
@@ -1,568 +0,0 @@
1(C)Copyright 1999-2004 Marvell(R).
2All rights reserved
3===========================================================================
4
5sk98lin.txt created 13-Feb-2004
6
7Readme File for sk98lin v6.23
8Marvell Yukon/SysKonnect SK-98xx Gigabit Ethernet Adapter family driver for LINUX
9
10This file contains
11 1 Overview
12 2 Required Files
13 3 Installation
14 3.1 Driver Installation
15 3.2 Inclusion of adapter at system start
16 4 Driver Parameters
17 4.1 Per-Port Parameters
18 4.2 Adapter Parameters
19 5 Large Frame Support
20 6 VLAN and Link Aggregation Support (IEEE 802.1, 802.1q, 802.3ad)
21 7 Troubleshooting
22
23===========================================================================
24
25
261 Overview
27===========
28
29The sk98lin driver supports the Marvell Yukon and SysKonnect
30SK-98xx/SK-95xx compliant Gigabit Ethernet Adapter on Linux. It has
31been tested with Linux on Intel/x86 machines.
32***
33
34
352 Required Files
36=================
37
38The linux kernel source.
39No additional files required.
40***
41
42
433 Installation
44===============
45
46It is recommended to download the latest version of the driver from the
47SysKonnect web site www.syskonnect.com. If you have downloaded the latest
48driver, the Linux kernel has to be patched before the driver can be
49installed. For details on how to patch a Linux kernel, refer to the
50patch.txt file.
51
523.1 Driver Installation
53------------------------
54
55The following steps describe the actions that are required to install
56the driver and to start it manually. These steps should be carried
57out for the initial driver setup. Once confirmed to be ok, they can
58be included in the system start.
59
60NOTE 1: To perform the following tasks you need 'root' access.
61
62NOTE 2: In case of problems, please read the section "Troubleshooting"
63 below.
64
65The driver can either be integrated into the kernel or it can be compiled
66as a module. Select the appropriate option during the kernel
67configuration.
68
69Compile/use the driver as a module
70----------------------------------
71To compile the driver, go to the directory /usr/src/linux and
72execute the command "make menuconfig" or "make xconfig" and proceed as
73follows:
74
75To integrate the driver permanently into the kernel, proceed as follows:
76
771. Select the menu "Network device support" and then "Ethernet(1000Mbit)"
782. Mark "Marvell Yukon Chipset / SysKonnect SK-98xx family support"
79 with (*)
803. Build a new kernel when the configuration of the above options is
81 finished.
824. Install the new kernel.
835. Reboot your system.
84
85To use the driver as a module, proceed as follows:
86
871. Enable 'loadable module support' in the kernel.
882. For automatic driver start, enable the 'Kernel module loader'.
893. Select the menu "Network device support" and then "Ethernet(1000Mbit)"
904. Mark "Marvell Yukon Chipset / SysKonnect SK-98xx family support"
91 with (M)
925. Execute the command "make modules".
936. Execute the command "make modules_install".
94 The appropriate modules will be installed.
957. Reboot your system.
96
97
98Load the module manually
99------------------------
100To load the module manually, proceed as follows:
101
1021. Enter "modprobe sk98lin".
1032. If a Marvell Yukon or SysKonnect SK-98xx adapter is installed in
104 your computer and you have a /proc file system, execute the command:
105 "ls /proc/net/sk98lin/"
106 This should produce an output containing a line with the following
107 format:
108 eth0 eth1 ...
109 which indicates that your adapter has been found and initialized.
110
111 NOTE 1: If you have more than one Marvell Yukon or SysKonnect SK-98xx
112 adapter installed, the adapters will be listed as 'eth0',
113 'eth1', 'eth2', etc.
114 For each adapter, repeat steps 3 and 4 below.
115
116 NOTE 2: If you have other Ethernet adapters installed, your Marvell
117 Yukon or SysKonnect SK-98xx adapter will be mapped to the
118 next available number, e.g. 'eth1'. The mapping is executed
119 automatically.
120 The module installation message (displayed either in a system
121 log file or on the console) prints a line for each adapter
122 found containing the corresponding 'ethX'.
123
1243. Select an IP address and assign it to the respective adapter by
125 entering:
126 ifconfig eth0 <ip-address>
127 With this command, the adapter is connected to the Ethernet.
128
129 SK-98xx Gigabit Ethernet Server Adapters: The yellow LED on the adapter
130 is now active, the link status LED of the primary port is active and
131 the link status LED of the secondary port (on dual port adapters) is
132 blinking (if the ports are connected to a switch or hub).
133 SK-98xx V2.0 Gigabit Ethernet Adapters: The link status LED is active.
134 In addition, you will receive a status message on the console stating
135 "ethX: network connection up using port Y" and showing the selected
136 connection parameters (x stands for the ethernet device number
137 (0,1,2, etc), y stands for the port name (A or B)).
138
139 NOTE: If you are in doubt about IP addresses, ask your network
140 administrator for assistance.
141
1424. Your adapter should now be fully operational.
143 Use 'ping <otherstation>' to verify the connection to other computers
144 on your network.
1455. To check the adapter configuration view /proc/net/sk98lin/[devicename].
146 For example by executing:
147 "cat /proc/net/sk98lin/eth0"
148
149Unload the module
150-----------------
151To stop and unload the driver modules, proceed as follows:
152
1531. Execute the command "ifconfig eth0 down".
1542. Execute the command "rmmod sk98lin".
155
1563.2 Inclusion of adapter at system start
157-----------------------------------------
158
159Since a large number of different Linux distributions are
160available, we are unable to describe a general installation procedure
161for the driver module.
162Because the driver is now integrated in the kernel, installation should
163be easy, using the standard mechanism of your distribution.
164Refer to the distribution's manual for installation of ethernet adapters.
165
166***
167
1684 Driver Parameters
169====================
170
171Parameters can be set at the command line after the module has been
172loaded with the command 'modprobe'.
173In some distributions, the configuration tools are able to pass parameters
174to the driver module.
175
176If you use the kernel module loader, you can set driver parameters
177in the file /etc/modprobe.conf (or /etc/modules.conf in 2.4 or earlier).
178To set the driver parameters in this file, proceed as follows:
179
1801. Insert a line of the form :
181 options sk98lin ...
182 For "...", the same syntax is required as described for the command
183 line parameters of modprobe below.
1842. To activate the new parameters, either reboot your computer
185 or
186 unload and reload the driver.
187 The syntax of the driver parameters is:
188
189 modprobe sk98lin parameter=value1[,value2[,value3...]]
190
191 where value1 refers to the first adapter, value2 to the second etc.
192
193NOTE: All parameters are case sensitive. Write them exactly as shown
194 below.
195
196Example:
197Suppose you have two adapters. You want to set auto-negotiation
198on the first adapter to ON and on the second adapter to OFF.
199You also want to set DuplexCapabilities on the first adapter
200to FULL, and on the second adapter to HALF.
201Then, you must enter:
202
203 modprobe sk98lin AutoNeg_A=On,Off DupCap_A=Full,Half
204
205NOTE: The number of adapters that can be configured this way is
206 limited in the driver (file skge.c, constant SK_MAX_CARD_PARAM).
207 The current limit is 16. If you happen to install
208 more adapters, adjust this and recompile.
209
210
2114.1 Per-Port Parameters
212------------------------
213
214These settings are available for each port on the adapter.
215In the following description, '?' stands for the port for
216which you set the parameter (A or B).
217
218Speed
219-----
220Parameter: Speed_?
221Values: 10, 100, 1000, Auto
222Default: Auto
223
224This parameter is used to set the speed capabilities. It is only valid
225for the SK-98xx V2.0 copper adapters.
226Usually, the speed is negotiated between the two ports during link
227establishment. If this fails, a port can be forced to a specific setting
228with this parameter.
229
230Auto-Negotiation
231----------------
232Parameter: AutoNeg_?
233Values: On, Off, Sense
234Default: On
235
236The "Sense"-mode automatically detects whether the link partner supports
237auto-negotiation or not.
238
239Duplex Capabilities
240-------------------
241Parameter: DupCap_?
242Values: Half, Full, Both
243Default: Both
244
245This parameter is only relevant if auto-negotiation for this port is
246not set to "Sense". If auto-negotiation is set to "On", all three values
247are possible. If it is set to "Off", only "Full" and "Half" are allowed.
248This parameter is useful if your link partner does not support all
249possible combinations.
250
251Flow Control
252------------
253Parameter: FlowCtrl_?
254Values: Sym, SymOrRem, LocSend, None
255Default: SymOrRem
256
257This parameter can be used to set the flow control capabilities the
258port reports during auto-negotiation. It can be set for each port
259individually.
260Possible modes:
261 -- Sym = Symmetric: both link partners are allowed to send
262 PAUSE frames
263 -- SymOrRem = SymmetricOrRemote: both or only remote partner
264 are allowed to send PAUSE frames
265 -- LocSend = LocalSend: only local link partner is allowed
266 to send PAUSE frames
267 -- None = no link partner is allowed to send PAUSE frames
268
269NOTE: This parameter is ignored if auto-negotiation is set to "Off".
270
271Role in Master-Slave-Negotiation (1000Base-T only)
272--------------------------------------------------
273Parameter: Role_?
274Values: Auto, Master, Slave
275Default: Auto
276
277This parameter is only valid for the SK-9821 and SK-9822 adapters.
278For two 1000Base-T ports to communicate, one must take the role of the
279master (providing timing information), while the other must be the
280slave. Usually, this is negotiated between the two ports during link
281establishment. If this fails, a port can be forced to a specific setting
282with this parameter.
283
284
2854.2 Adapter Parameters
286-----------------------
287
288Connection Type (SK-98xx V2.0 copper adapters only)
289---------------
290Parameter: ConType
291Values: Auto, 100FD, 100HD, 10FD, 10HD
292Default: Auto
293
294The parameter 'ConType' is a combination of all five per-port parameters
295within one single parameter. This simplifies the configuration of both ports
296of an adapter card! The different values of this variable reflect the most
297meaningful combinations of port parameters.
298
299The following table shows the values of 'ConType' and the corresponding
300combinations of the per-port parameters:
301
302 ConType | DupCap AutoNeg FlowCtrl Role Speed
303 ----------+------------------------------------------------------
304 Auto | Both On SymOrRem Auto Auto
305 100FD | Full Off None Auto (ignored) 100
306 100HD | Half Off None Auto (ignored) 100
307 10FD | Full Off None Auto (ignored) 10
308 10HD | Half Off None Auto (ignored) 10
309
310Stating any other port parameter together with this 'ConType' variable
311will result in a merged configuration of those settings. This is due to
312the fact that the per-port parameters (e.g. Speed_?) have a higher
313priority than the combined variable 'ConType'.
314
315NOTE: This parameter is always used on both ports of the adapter card.
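
The merge rule described above can be sketched in C as follows. This is only
an illustration, not code from the driver: the names 'struct port_cfg',
'contype_presets' and 'merge_cfg' are hypothetical, and a NULL field is
assumed to mean "per-port parameter not given". The preset values are copied
from the table above.

```c
#include <stddef.h>

enum contype { CT_AUTO, CT_100FD, CT_100HD, CT_10FD, CT_10HD };

/* One field per per-port parameter; NULL means "not specified". */
struct port_cfg {
	const char *dupcap;
	const char *autoneg;
	const char *flowctrl;
	const char *role;
	const char *speed;
};

/* Presets taken from the ConType table in this document. */
static const struct port_cfg contype_presets[] = {
	[CT_AUTO]  = { "Both", "On",  "SymOrRem", "Auto", "Auto" },
	[CT_100FD] = { "Full", "Off", "None",     "Auto", "100"  },
	[CT_100HD] = { "Half", "Off", "None",     "Auto", "100"  },
	[CT_10FD]  = { "Full", "Off", "None",     "Auto", "10"   },
	[CT_10HD]  = { "Half", "Off", "None",     "Auto", "10"   },
};

/* Start from the ConType preset, then let every per-port parameter
 * that was explicitly given take priority over the preset value. */
static struct port_cfg merge_cfg(enum contype ct,
				 const struct port_cfg *override)
{
	struct port_cfg cfg = contype_presets[ct];

	if (override->dupcap)
		cfg.dupcap = override->dupcap;
	if (override->autoneg)
		cfg.autoneg = override->autoneg;
	if (override->flowctrl)
		cfg.flowctrl = override->flowctrl;
	if (override->role)
		cfg.role = override->role;
	if (override->speed)
		cfg.speed = override->speed;
	return cfg;
}
```

With ConType=100FD and only Speed set to 10, this sketch yields DupCap=Full,
AutoNeg=Off, FlowCtrl=None and Speed=10.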
316
317Interrupt Moderation
318--------------------
319Parameter: Moderation
320Values: None, Static, Dynamic
321Default: None
322
323Interrupt moderation is employed to limit the maximum number of interrupts
324the driver has to serve. That is, one or more interrupts (which indicate any
325transmit or receive packet to be processed) are queued until the driver
326processes them. When the queued interrupts are served is determined by the
327'IntsPerSec' parameter, which is explained below.
328
329Possible modes:
330
331 -- None - No interrupt moderation is applied on the adapter card.
332 Therefore, each transmit or receive interrupt is served immediately
333 as soon as it appears on the interrupt line of the adapter card.
334
335 -- Static - Interrupt moderation is applied on the adapter card.
336 All transmit and receive interrupts are queued until a complete
337 moderation interval ends. If such a moderation interval ends, all
338 queued interrupts are processed in one big bunch without any delay.
339 The term 'static' reflects the fact that interrupt moderation is
340 always enabled, regardless of how much network load is currently
341 passing via a particular interface. In addition, the duration of
342 the moderation interval has a fixed length that never changes while
343 the driver is operational.
344
345 -- Dynamic - Interrupt moderation might be applied on the adapter card,
346 depending on the load of the system. If the driver detects that the
347 system load is too high, the driver tries to shield the system against
348 too much network load by enabling interrupt moderation. If - at a later
349 time - the CPU utilization decreases again (or if the network load is
350 negligible) the interrupt moderation will automatically be disabled.
351
352Interrupt moderation should be used when the driver has to handle one or more
353interfaces with a high network load, which - as a consequence - leads also to a
354high CPU utilization. When moderation is applied in such high network load
355situations, CPU load might be reduced by 20-30%.
356
357NOTE: The drawback of using interrupt moderation is an increase of the round-
358trip-time (RTT), due to the queueing and serving of interrupts at dedicated
359moderation times.
360
361Interrupts per second
362---------------------
363Parameter: IntsPerSec
364Values: 30...40000 (interrupts per second)
365Default: 2000
366
367This parameter is only used if either static or dynamic interrupt moderation
368is used on a network adapter card. Using this parameter if no moderation is
369applied will lead to no action performed.
370
371This parameter determines the length of any interrupt moderation interval.
372Assuming that static interrupt moderation is to be used, an 'IntsPerSec'
373parameter value of 2000 will lead to an interrupt moderation interval of
374500 microseconds.
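
The arithmetic behind this example can be written as a small C helper. It is
a sketch only: 'moderation_interval_us' is a hypothetical name, not a driver
function, and rejecting values outside the documented range 30...40000 is an
assumption made for illustration.

```c
/* Length of one moderation interval in microseconds for a given
 * 'IntsPerSec' value. Returns 0 for values outside 30...40000. */
static unsigned long moderation_interval_us(unsigned long ints_per_sec)
{
	if (ints_per_sec < 30 || ints_per_sec > 40000)
		return 0;
	/* One second is 1000000 microseconds. */
	return 1000000UL / ints_per_sec;
}
```

For the default of 2000 interrupts per second this gives 500 microseconds;
the upper limit of 40000 corresponds to 25 microseconds.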
375
376NOTE: The duration of the moderation interval is to be chosen with care.
377At first glance, selecting a very long duration (e.g. only 100 interrupts per
378second) seems to be meaningful, but the increase of packet-processing delay
379is tremendous. On the other hand, selecting a very short moderation time might
380cancel out the benefit of applying any moderation at all.
381
382
383Preferred Port
384--------------
385Parameter: PrefPort
386Values: A, B
387Default: A
388
389This is used to force the preferred port to A or B (on dual-port network
390adapters). The preferred port is the one that is used if both are detected
391as fully functional.
392
393RLMT Mode (Redundant Link Management Technology)
394------------------------------------------------
395Parameter: RlmtMode
396Values: CheckLinkState, CheckLocalPort, CheckSeg, DualNet
397Default: CheckLinkState
398
399RLMT monitors the status of the port. If the link of the active port
400fails, RLMT switches immediately to the standby link. The virtual link is
401maintained as long as at least one 'physical' link is up.
402
403Possible modes:
404
405 -- CheckLinkState - Check link state only: RLMT uses the link state
406 reported by the adapter hardware for each individual port to
407 determine whether a port can be used for all network traffic or
408 not.
409
410 -- CheckLocalPort - In this mode, RLMT monitors the network path
411 between the two ports of an adapter by regularly exchanging packets
412 between them. This mode requires a network configuration in which
413 the two ports are able to "see" each other (i.e. there must not be
414 any router between the ports).
415
416 -- CheckSeg - Check local port and segmentation: This mode supports the
417 same functions as the CheckLocalPort mode and additionally checks
418 network segmentation between the ports. Therefore, this mode is only
419 to be used if Gigabit Ethernet switches are installed on the network
420 that have been configured to use the Spanning Tree protocol.
421
422 -- DualNet - In this mode, ports A and B are used as separate devices.
423 If you have a dual port adapter, port A will be configured as eth0
424 and port B as eth1. Both ports can be used independently with
425 distinct IP addresses. The preferred port setting is not used.
426 RLMT is turned off.
427
428NOTE: RLMT modes CheckLocalPort (CLP) and CheckSeg (CLPSS) are designed
429 to operate in configurations where a network path between the ports
430 on one adapter exists. Moreover, they are not designed to work where
431 adapters are connected back-to-back.
432***
433
434
4355 Large Frame Support
436======================
437
438The driver supports large frames (also called jumbo frames). Using large
439frames can result in an improved throughput if transferring large amounts
440of data.
441To enable large frames, set the MTU (maximum transmission unit) of the
442interface to the desired value (up to 9000) by executing the following
443command:
444 ifconfig eth0 mtu 9000
445This will only work if you have two adapters connected back-to-back
446or if you use a switch that supports large frames. When using a switch,
447it should be configured to allow large frames and auto-negotiation should
448be set to OFF. The setting must be configured on all adapters that can be
449reached by the large frames. If one adapter is not set to receive large
450frames, it will simply drop them.
451
452You can switch back to the standard ethernet frame size by executing the
453following command:
454 ifconfig eth0 mtu 1500
455
456To permanently configure this setting, add a script with the 'ifconfig'
457line to the system startup sequence (named something like "S99sk98lin"
458in /etc/rc.d/rc2.d).
459***
460
461
4626 VLAN and Link Aggregation Support (IEEE 802.1, 802.1q, 802.3ad)
463==================================================================
464
465The Marvell Yukon/SysKonnect Linux drivers are able to support VLAN and
466Link Aggregation according to IEEE standards 802.1, 802.1q, and 802.3ad.
467These features are only available after installation of open source
468modules available on the Internet:
469For VLAN go to: http://www.candelatech.com/~greear/vlan.html
470For Link Aggregation go to: http://www.st.rim.or.jp/~yumo
471
472NOTE: SysKonnect GmbH does not offer any support for these open source
473 modules and does not take the responsibility for any kind of
474 failures or problems arising in connection with these modules.
475
476NOTE: Configuring Link Aggregation on a SysKonnect dual link adapter may
477 cause problems when unloading the driver.
478
479
4807 Troubleshooting
481==================
482
483If any problems occur during the installation process, check the
484following list:
485
486
487Problem: The SK-98xx adapter cannot be found by the driver.
488Solution: In /proc/pci search for the following entry:
489 'Ethernet controller: SysKonnect SK-98xx ...'
490 If this entry exists, the SK-98xx or SK-98xx V2.0 adapter has
491 been found by the system and should be operational.
492 If this entry does not exist or if the file '/proc/pci' is not
493 found, there may be a hardware problem or the PCI support may
494 not be enabled in your kernel.
495 The adapter can be checked using the diagnostics program which
496 is available on the SysKonnect web site:
497 www.syskonnect.com
498
499 Some COMPAQ machines have problems dealing with PCI under Linux.
500 This problem is described in the 'PCI howto' document
501 (included in some distributions or available from the
502 web, e.g. at 'www.linux.org').
503
504
505Problem: Programs such as 'ifconfig' or 'route' cannot be found or the
506 error message 'Operation not permitted' is displayed.
507Reason: You are not logged in as user 'root'.
508Solution: Logout and login as 'root' or change to 'root' via 'su'.
509
510
511Problem: Upon use of the command 'ping <address>' the message
512 "ping: sendto: Network is unreachable" is displayed.
513Reason: Your route is not set correctly.
514Solution: If you are using RedHat, you probably forgot to set up the
515 route in the 'network configuration'.
516 Check the existing routes with the 'route' command and check
517 if an entry for 'eth0' exists, and if so, if it is set correctly.
518
519
520Problem: The driver can be started, the adapter is connected to the
521 network, but you cannot receive or transmit any packets;
522 e.g. 'ping' does not work.
523Reason: There is an incorrect route in your routing table.
524Solution: Check the routing table with the command 'route' and read the
525 manual help pages dealing with routes (enter 'man route').
526
527NOTE: Although the 2.2.x kernel versions generate the routing entry
528 automatically, problems of this kind may occur here as well. We've
529 come across a situation in which the driver started correctly at
530 system start, but after the driver has been removed and reloaded,
531 the route of the adapter's network pointed to the 'dummy0' device
532 and had to be corrected manually.
533
534
535Problem: Your computer should act as a router between multiple
536 IP subnetworks (using multiple adapters), but computers in
537 other subnetworks cannot be reached.
538Reason: Either the router's kernel is not configured for IP forwarding
539 or the routing table and gateway configuration of at least one
540 computer is not working.
541
542Problem: Upon driver start, the following error message is displayed:
543 "eth0: -- ERROR --
544 Class: internal Software error
545 Nr: 0xcc
546 Msg: SkGeInitPort() cannot init running ports"
547Reason: You are using a driver compiled for single processor machines
548 on a multiprocessor machine with SMP (Symmetric MultiProcessor)
549 kernel.
550Solution: Configure your kernel appropriately and recompile the kernel or
551 the modules.
552
553
554
555If your problem is not listed here, please contact SysKonnect's technical
556support for help (linux@syskonnect.de).
557When contacting our technical support, please ensure that the following
558information is available:
559- System manufacturer and hardware information (CPU, memory, ...)
560- PCI-Boards in your system
561- Distribution
562- Kernel version
563- Driver version
564***
565
566
567
568***End of Readme File***
diff --git a/Documentation/networking/spider_net.txt b/Documentation/networking/spider_net.txt
new file mode 100644
index 000000000000..4b4adb8eb14f
--- /dev/null
+++ b/Documentation/networking/spider_net.txt
@@ -0,0 +1,204 @@
1
2 The Spidernet Device Driver
3 ===========================
4
5Written by Linas Vepstas <linas@austin.ibm.com>
6
7Version of 7 June 2007
8
9Abstract
10========
11This document sketches the structure of portions of the spidernet
12device driver in the Linux kernel tree. The spidernet is a gigabit
13ethernet device built into the Toshiba southbridge commonly used
14in the SONY Playstation 3 and the IBM QS20 Cell blade.
15
16The Structure of the RX Ring.
17=============================
18The receive (RX) ring is a circular linked list of RX descriptors,
19together with three pointers into the ring that are used to manage its
20contents.
21
22The elements of the ring are called "descriptors" or "descrs"; they
23describe the received data. This includes a pointer to a buffer
24containing the received data, the buffer size, and various status bits.
25
26There are three primary states that a descriptor can be in: "empty",
27"full" and "not-in-use". An "empty" or "ready" descriptor is ready
28to receive data from the hardware. A "full" descriptor has data in it,
29and is waiting to be emptied and processed by the OS. A "not-in-use"
30descriptor is neither empty nor full; it is simply not ready. It may
31not even have a data buffer in it, or is otherwise unusable.
32
33During normal operation, on device startup, the OS (specifically, the
34spidernet device driver) allocates a set of RX descriptors and RX
35buffers. These are all marked "empty", ready to receive data. This
36ring is handed off to the hardware, which sequentially fills in the
37buffers, and marks them "full". The OS follows up, taking the full
38buffers, processing them, and re-marking them empty.
39
40This filling and emptying is managed by three pointers: the "head"
41and "tail" pointers, managed by the OS, and a hardware current
42descriptor pointer (GDACTDPA). The GDACTDPA points at the descr
43currently being filled. When this descr is filled, the hardware
44marks it full, and advances the GDACTDPA by one. Thus, when there is
45flowing RX traffic, every descr behind it should be marked "full",
46and everything in front of it should be "empty". If the hardware
47discovers that the current descr is not empty, it will signal an
48interrupt, and halt processing.
49
50The tail pointer tails or trails the hardware pointer. When the
51hardware is ahead, the tail pointer will be pointing at a "full"
52descr. The OS will process this descr, and then mark it "not-in-use",
53and advance the tail pointer. Thus, when there is flowing RX traffic,
54all of the descrs in front of the tail pointer should be "full", and
55all of those behind it should be "not-in-use". When RX traffic is not
56flowing, then the tail pointer can catch up to the hardware pointer.
57The OS will then note that the current tail is "empty", and halt
58processing.
59
60The head pointer (somewhat mis-named) follows after the tail pointer.
61When traffic is flowing, then the head pointer will be pointing at
62a "not-in-use" descr. The OS will perform various housekeeping duties
63on this descr. This includes allocating a new data buffer and
64dma-mapping it so as to make it visible to the hardware. The OS will
65then mark the descr as "empty", ready to receive data. Thus, when there
66is flowing RX traffic, everything in front of the head pointer should
67be "not-in-use", and everything behind it should be "empty". If no
68RX traffic is flowing, then the head pointer can catch up to the tail
69pointer, at which point the OS will notice that the head descr is
70"empty", and it will halt processing.
71
72Thus, in an idle system, the GDACTDPA, tail and head pointers will
73all be pointing at the same descr, which should be "empty". All of the
74other descrs in the ring should be "empty" as well.
75
76The show_rx_chain() routine will print out the locations of the
77GDACTDPA, tail and head pointers. It will also summarize the contents
78of the ring, starting at the tail pointer, and listing the status
79of the descrs that follow.
80
81A typical example of the output, for a nearly idle system, might be
82
83net eth1: Total number of descrs=256
84net eth1: Chain tail located at descr=20
85net eth1: Chain head is at 20
86net eth1: HW curr desc (GDACTDPA) is at 21
87net eth1: Have 1 descrs with stat=x40800101
88net eth1: HW next desc (GDACNEXTDA) is at 22
89net eth1: Last 255 descrs with stat=xa0800000
90
91In the above, the hardware has filled in one descr, number 20. Both
92head and tail are pointing at 20, because it has not yet been emptied.
93Meanwhile, hw is pointing at 21, which is free.
94
95The "Have nnn descrs" refers to the descr starting at the tail: in this
96case, nnn=1 descr, starting at descr 20. The "Last nnn descrs" refers
97to all of the rest of the descrs, from the last status change. The "nnn"
98is a count of how many descrs have exactly the same status.
99
100The status x4... corresponds to "full" and status xa... corresponds
101to "empty". The actual value printed is RXCOMST_A.
102
103In the device driver source code, a different set of names is
104used for these same concepts, so that
105
106"empty" == SPIDER_NET_DESCR_CARDOWNED == 0xa
107"full" == SPIDER_NET_DESCR_FRAME_END == 0x4
108"not in use" == SPIDER_NET_DESCR_NOT_IN_USE == 0xf
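
To relate the printed stat= values to these names, the following C sketch
decodes a status word. It assumes, as the printed values x4... and xa...
suggest, that the state is held in the top four bits of the 32-bit status
word. The enum and function names are illustrative and are not taken from the
driver source.

```c
#include <stdint.h>

/* State values as listed above; the names are illustrative only. */
enum descr_state {
	DESCR_EMPTY      = 0xa,	/* SPIDER_NET_DESCR_CARDOWNED */
	DESCR_FULL       = 0x4,	/* SPIDER_NET_DESCR_FRAME_END */
	DESCR_NOT_IN_USE = 0xf	/* SPIDER_NET_DESCR_NOT_IN_USE */
};

/* Assumption: the state lives in bits 31..28 of the status word. */
static unsigned int descr_state_nibble(uint32_t stat)
{
	return stat >> 28;
}

static const char *descr_state_name(uint32_t stat)
{
	switch (descr_state_nibble(stat)) {
	case DESCR_EMPTY:
		return "empty";
	case DESCR_FULL:
		return "full";
	case DESCR_NOT_IN_USE:
		return "not-in-use";
	default:
		return "other";
	}
}
```

Under this assumption, the value xa0800000 from the example output decodes to
"empty" and x40800101 decodes to "full".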
109
110
111The RX RAM full bug/feature
112===========================
113
114As long as the OS can empty out the RX buffers at a rate faster than
115the hardware can fill them, there is no problem. If, for some reason,
116the OS fails to empty the RX ring fast enough, the hardware GDACTDPA
117pointer will catch up to the head, notice the not-empty condition,
118and stop. However, RX packets may still continue arriving on the wire.
119The spidernet chip can save some limited number of these in local RAM.
120When this local RAM fills up, the spider chip will issue an interrupt
121indicating this (GHIINT0STS will show ERRINT, and the GRMFLLINT bit
122will be set in GHIINT1STS). When the RX RAM full condition occurs,
123a certain bug/feature is triggered that has to be specially handled.
124This section describes the special handling for this condition.
125
126When the OS finally has a chance to run, it will empty out the RX ring.
127In particular, it will clear the descriptor on which the hardware had
128stopped. However, once the hardware has decided that a certain
129descriptor is invalid, it will not restart at that descriptor; instead
130it will restart at the next descr. This potentially will lead to a
131deadlock condition, as the tail pointer will be pointing at this descr,
132which, from the OS point of view, is empty; the OS will be waiting for
133this descr to be filled. However, the hardware has skipped this descr,
134and is filling the next descrs. Since the OS doesn't see this, there
135is a potential deadlock, with the OS waiting for one descr to fill,
136while the hardware is waiting for a different set of descrs to become
137empty.
138
139A call to show_rx_chain() at this point indicates the nature of the
140problem. A typical print when the network is hung shows the following:
141
142net eth1: Spider RX RAM full, incoming packets might be discarded!
143net eth1: Total number of descrs=256
144net eth1: Chain tail located at descr=255
145net eth1: Chain head is at 255
146net eth1: HW curr desc (GDACTDPA) is at 0
147net eth1: Have 1 descrs with stat=xa0800000
148net eth1: HW next desc (GDACNEXTDA) is at 1
149net eth1: Have 127 descrs with stat=x40800101
150net eth1: Have 1 descrs with stat=x40800001
151net eth1: Have 126 descrs with stat=x40800101
152net eth1: Last 1 descrs with stat=xa0800000
153
154Both the tail and head pointers are pointing at descr 255, which is
155marked xa... which is "empty". Thus, from the OS point of view, there
156is nothing to be done. In particular, there is the implicit assumption
157that everything in front of the "empty" descr must surely also be empty,
158as explained in the last section. The OS is waiting for descr 255 to
159become non-empty, which, in this case, will never happen.
160
161The HW pointer is at descr 0. This descr is marked 0x4.. or "full".
162Since it's already full, the hardware can do nothing more, and thus has
163halted processing. Notice that descrs 0 through 253 are all marked
164"full", while descrs 254 and 255 are empty. (The "Last 1 descrs" is
165descr 254, since tail was at 255.) Thus, the system is deadlocked,
166and there can be no forward progress; the OS thinks there's nothing
167to do, and the hardware has nowhere to put incoming data.
168
169This bug/feature is worked around with the spider_net_resync_head_ptr()
170routine. When the driver receives RX interrupts, but an examination
171of the RX chain seems to show it is empty, then it is probable that
172the hardware has skipped a descr or two (sometimes dozens under heavy
173network conditions). The spider_net_resync_head_ptr() subroutine will
174search the ring for the next full descr, and the driver will resume
175operations there. Since this will leave "holes" in the ring, there
176is also a spider_net_resync_tail_ptr() that will skip over such holes.
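
The idea of searching the ring for the next full descr can be modelled with a
few lines of C. This is a simplified model, not the driver's actual
spider_net_resync_head_ptr() code: the ring is represented as a plain array
of status words, 'find_next_full' is a hypothetical name, and the state is
assumed to sit in the top four bits of each status word.

```c
#include <stddef.h>
#include <stdint.h>

#define STATE_FULL 0x4u	/* "full", per the table of names above */

/* Walk the ring once, starting at 'start' and wrapping around, and
 * return the index of the first "full" descr, or -1 if none is full. */
static int find_next_full(const uint32_t *stat, size_t n, size_t start)
{
	for (size_t i = 0; i < n; i++) {
		size_t idx = (start + i) % n;

		if ((stat[idx] >> 28) == STATE_FULL)
			return (int)idx;
	}
	return -1;
}
```

Applied to the hung ring shown earlier (descrs 254 and 255 empty, all others
full) with the search starting at the tail, descr 255, this model returns
descr 0, which is where processing would resume.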
177
178As of this writing, the spider_net_resync() strategy seems to work very
179well, even under heavy network loads.
180
181
182The TX ring
183===========
184The TX ring uses a low-watermark interrupt scheme to make sure that
185the TX queue is appropriately serviced for large packet sizes.
186
187For packet sizes greater than about 1KBytes, the kernel can fill
188the TX ring quicker than the device can drain it. Once the ring
189is full, the netdev is stopped. When there is room in the ring,
190the netdev needs to be reawakened, so that more TX packets are placed
191in the ring. The hardware can empty the ring about four times per jiffy,
192so it's not appropriate to wait for the poll routine to refill, since
193the poll routine runs only once per jiffy. The low-watermark mechanism
194marks a descr about 1/4th of the way from the bottom of the queue, so
195that an interrupt is generated when the descr is processed. This
196interrupt wakes up the netdev, which can then refill the queue.
197For large packets, this mechanism generates a relatively small number
198of interrupts, about 1K/sec. For smaller packets, this will drop to zero
199interrupts, as the hardware can empty the queue faster than the kernel
200can fill it.
201
202
203 ======= END OF DOCUMENT ========
204
diff --git a/Documentation/pci.txt b/Documentation/pci.txt
index d38261b67905..7754f5aea4e9 100644
--- a/Documentation/pci.txt
+++ b/Documentation/pci.txt
@@ -113,9 +113,6 @@ initialization with a pointer to a structure describing the driver
113 (Please see Documentation/power/pci.txt for descriptions 113 (Please see Documentation/power/pci.txt for descriptions
114 of PCI Power Management and the related functions.) 114 of PCI Power Management and the related functions.)
115 115
116 enable_wake Enable device to generate wake events from a low power
117 state.
118
119 shutdown Hook into reboot_notifier_list (kernel/sys.c). 116 shutdown Hook into reboot_notifier_list (kernel/sys.c).
120 Intended to stop any idling DMA operations. 117 Intended to stop any idling DMA operations.
121 Useful for enabling wake-on-lan (NIC) or changing 118 Useful for enabling wake-on-lan (NIC) or changing
@@ -299,7 +296,10 @@ If the PCI device can use the PCI Memory-Write-Invalidate transaction,
299call pci_set_mwi(). This enables the PCI_COMMAND bit for Mem-Wr-Inval 296call pci_set_mwi(). This enables the PCI_COMMAND bit for Mem-Wr-Inval
300and also ensures that the cache line size register is set correctly. 297and also ensures that the cache line size register is set correctly.
301Check the return value of pci_set_mwi() as not all architectures 298Check the return value of pci_set_mwi() as not all architectures
302or chip-sets may support Memory-Write-Invalidate. 299or chip-sets may support Memory-Write-Invalidate. Alternatively,
300if Mem-Wr-Inval would be nice to have but is not required, call
301pci_try_set_mwi() to have the system do its best effort at enabling
302Mem-Wr-Inval.
303 303
304 304
3053.2 Request MMIO/IOP resources 3053.2 Request MMIO/IOP resources
diff --git a/Documentation/power/pci.txt b/Documentation/power/pci.txt
index e00b099a4b86..dd8fe43888d3 100644
--- a/Documentation/power/pci.txt
+++ b/Documentation/power/pci.txt
@@ -164,7 +164,6 @@ struct pci_driver:
164 164
165 int (*suspend) (struct pci_dev *dev, pm_message_t state); 165 int (*suspend) (struct pci_dev *dev, pm_message_t state);
166 int (*resume) (struct pci_dev *dev); 166 int (*resume) (struct pci_dev *dev);
167 int (*enable_wake) (struct pci_dev *dev, pci_power_t state, int enable);
168 167
169 168
170suspend 169suspend
@@ -251,42 +250,6 @@ The driver should update the current_state field in its pci_dev structure in
251this function, except for PM-capable devices when pci_set_power_state is used. 250this function, except for PM-capable devices when pci_set_power_state is used.
252 251
253 252
254enable_wake
255-----------
256
257Usage:
258
259if (dev->driver && dev->driver->enable_wake)
260 dev->driver->enable_wake(dev,state,enable);
261
262This callback is generally only relevant for devices that support the PCI PM
263spec and have the ability to generate a PME# (Power Management Event Signal)
264to wake the system up. (However, it is possible that a device may support
265some non-standard way of generating a wake event on sleep.)
266
267Bits 15:11 of the PMC (Power Mgmt Capabilities) Register in a device's
268PM Capabilities describe what power states the device supports generating a
269wake event from:
270
271+------------------+
272| Bit | State |
273+------------------+
274| 11 | D0 |
275| 12 | D1 |
276| 13 | D2 |
277| 14 | D3hot |
278| 15 | D3cold |
279+------------------+
280
281A device can use this to enable wake events:
282
283 pci_enable_wake(dev,state,enable);
284
285Note that to enable PME# from D3cold, a value of 4 should be passed to
286pci_enable_wake (since it uses an index into a bitmask). If a driver gets
287a request to enable wake events from D3, two calls should be made to
288pci_enable_wake (one for both D3hot and D3cold).
289
290 253
291A reference implementation 254A reference implementation
292------------------------- 255-------------------------
diff --git a/Documentation/power_supply_class.txt b/Documentation/power_supply_class.txt
new file mode 100644
index 000000000000..9758cf433c06
--- /dev/null
+++ b/Documentation/power_supply_class.txt
@@ -0,0 +1,167 @@
1Linux power supply class
2========================
3
4Synopsis
5~~~~~~~~
6The power supply class is used to represent battery, UPS, AC or DC power
7supply properties to user-space.
8
9It defines a core set of attributes, which should be applicable to (almost)
10every power supply out there. Attributes are available via sysfs and uevent
11interfaces.
12
13Each attribute has a well-defined meaning, including the unit of measure. While
14the attributes provided are believed to be universally applicable to any
15power supply, specific monitoring hardware may not be able to provide them
16all, so any of them may be skipped.
17
18The power supply class is extensible, and allows drivers to define their own
19attributes. The core attribute set is subject to the standard Linux evolution
20(i.e. if it is found that some attribute is applicable to many power supply
21types or their drivers, it can be added to the core set).
22
23It also integrates with the LED framework, for the purpose of providing
24typically expected feedback of battery charging/fully charged status and
25AC/USB power supply online status. (Note that specific details of the
26indication (including whether to use it at all) are fully controllable by
27the user and/or specific machine defaults, per the design principles of the
28LED framework).
29
30
31Attributes/properties
32~~~~~~~~~~~~~~~~~~~~~
33The power supply class has a predefined set of attributes; this eliminates
34code duplication across drivers. The power supply class insists on reusing
35its predefined attributes *and* their units.
36
37So, userspace gets a predictable set of attributes and their units for any
38kind of power supply, and can process/present them to a user in a consistent
39manner. Results for different power supplies and machines are also directly
40comparable.
41
42See drivers/power/ds2760_battery.c and drivers/power/pda_power.c for
43examples of how to declare and handle attributes.
44
45
46Units
47~~~~~
48Quoting include/linux/power_supply.h:
49
50 All voltages, currents, charges, energies, time and temperatures in µV,
51 µA, µAh, µWh, seconds and tenths of degree Celsius unless otherwise
52 stated. It's driver's job to convert its raw values to units in which
53 this class operates.
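
As an illustration of that conversion job, the helpers below turn raw
readings into the units quoted above. They are a sketch under assumptions:
the function names are hypothetical, and the raw units (millivolts,
milliamps, whole degrees Celsius) merely stand in for whatever a particular
monitoring chip reports.

```c
/* Hypothetical helpers: convert raw readings to the class units. */

/* Millivolts to microvolts (µV). */
static long long mv_to_uv(long long mv)
{
	return mv * 1000;
}

/* Milliamps to microamps (µA). */
static long long ma_to_ua(long long ma)
{
	return ma * 1000;
}

/* Whole degrees Celsius to tenths of a degree Celsius. */
static int celsius_to_tenths(int deg_c)
{
	return deg_c * 10;
}
```

A reading of 4200 mV would thus be reported as 4200000, and 25 degrees
Celsius as 250.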
54
55
56Attributes/properties detailed
57~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
58
59~ ~ ~ ~ ~ ~ ~ Charge/Energy/Capacity - how to not confuse ~ ~ ~ ~ ~ ~ ~
60~ ~
61~ Because both "charge" (µAh) and "energy" (µWh) represent the "capacity" ~
62~ of a battery, this class distinguishes these terms. Don't mix them! ~
63~ ~
64~ CHARGE_* attributes represent capacity in µAh only. ~
65~ ENERGY_* attributes represent capacity in µWh only. ~
66~ The CAPACITY attribute represents capacity in *percent*, from 0 to 100. ~
67~ ~
68~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~
69
70Postfixes:
71_AVG - *hardware* averaged value, use it if your hardware is really able to
72report averaged values.
73_NOW - momentary/instantaneous values.
74
75STATUS - this attribute represents operating status (charging, full,
76discharging (i.e. powering a load), etc.). This corresponds to
77BATTERY_STATUS_* values, as defined in battery.h.
78
79HEALTH - represents the health of the battery; values correspond to
80POWER_SUPPLY_HEALTH_*, defined in battery.h.
81
82VOLTAGE_MAX_DESIGN, VOLTAGE_MIN_DESIGN - design values for maximal and
83minimal power supply voltages. Maximal/minimal means the voltage values
84when the battery is considered "full"/"empty" under normal conditions. Yes,
85there is no direct relation between voltage and battery capacity, but some
86dumb batteries use voltage for a very approximate calculation of capacity.
87A battery driver can also use this attribute just to inform userspace
88about the maximal and minimal voltage thresholds of a given battery.
89
90CHARGE_FULL_DESIGN, CHARGE_EMPTY_DESIGN - design charge values, when
91the battery is considered full/empty.
92
93ENERGY_FULL_DESIGN, ENERGY_EMPTY_DESIGN - same as above but for energy.
94
95CHARGE_FULL, CHARGE_EMPTY - These attributes mean "last remembered value
96of charge when the battery became full/empty". They could also mean "value
97of charge when the battery is considered full/empty at given conditions
98(temperature, age)". I.e. these attributes represent real thresholds, not design values.
99
100ENERGY_FULL, ENERGY_EMPTY - same as above but for energy.
101
102CAPACITY - capacity in percent.
103CAPACITY_LEVEL - capacity level. This corresponds to
104POWER_SUPPLY_CAPACITY_LEVEL_*.
105
106TEMP - temperature of the power supply.
107TEMP_AMBIENT - ambient temperature.
108
109TIME_TO_EMPTY - seconds left for battery to be considered empty (i.e.
110while battery powers a load)
111TIME_TO_FULL - seconds left for battery to be considered full (i.e.
112while battery is charging)
113
114
115Battery <-> external power supply interaction
116~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
117Power supplies often act as supplies and supplicants at the same
118time. Batteries are a good example. So, batteries usually care if they're
119externally powered or not.
120
121For that case, the power supply class implements a notification mechanism for
122batteries.
123
124An external power supply (AC) lists the names of its supplicants (batteries) in
125the "supplied_to" struct member, and each power_supply_changed() call
126issued by the external power supply will notify the supplicants via the
127external_power_changed callback.
128
129
130QA
131~~
132Q: Where is POWER_SUPPLY_PROP_XYZ attribute?
133A: If you cannot find an attribute suitable for your driver's needs, feel free
134 to add it and send a patch along with your driver.
135
136 The attributes available currently are the ones currently provided by the
137 drivers written.
138
139 Good candidates to add in future: model/part#, cycle_time, manufacturer,
140 etc.
141
142
143Q: I have some very specific attribute (e.g. battery color), should I add
144 this attribute to standard ones?
145A: Most likely, no. Such an attribute can be placed in the driver itself, if
146 it is useful. Of course, if the attribute in question is applicable to
147 a large set of batteries, provided by many drivers, and/or comes from
148 some general battery specification/standard, it may be a candidate to
149 be added to the core attribute set.
150
151
152Q: Suppose my battery monitoring chip/firmware does not provide capacity
153 in percent, but provides charge_{now,full,empty}. Should I calculate
154 percentage capacity manually, inside the driver, and register CAPACITY
155 attribute? The same question about time_to_empty/time_to_full.
156A: Most likely, no. This class is designed to export properties which are
157 directly measurable by the specific hardware available.
158
159 Inferring unavailable properties using heuristics or a mathematical
160 model is not the job of a battery driver. Such functionality
161 should be factored out, and in fact, apm_power, the driver to serve
162 legacy APM API on top of power supply class, uses a simple heuristic of
163 approximating remaining battery capacity based on its charge, current,
164 voltage and so on. But a full-fledged battery model is likely not a subject
165 for the kernel at all, as it would require floating point calculation to deal
166 with things like differential equations and Kalman filters. This is
167 better handled by batteryd/libbattery, yet to be written.
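
As an illustration of keeping such inference in userspace, here is a minimal
Python sketch of the naive linear estimate a tool could derive from
charge_{now,full,empty}. The function names are hypothetical, the current is
assumed to be given in µA, and a constant discharge current is assumed; this
is not what apm_power or any existing library does.

```python
def capacity_percent(charge_now_uah, charge_full_uah, charge_empty_uah=0):
    """Return remaining capacity in percent, clamped to the range 0..100."""
    span = charge_full_uah - charge_empty_uah
    if span <= 0:
        raise ValueError("charge_full must be greater than charge_empty")
    percent = 100.0 * (charge_now_uah - charge_empty_uah) / span
    return max(0.0, min(100.0, percent))


def time_to_empty_seconds(charge_now_uah, charge_empty_uah, current_ua):
    """Estimate seconds until empty, assuming a constant discharge current."""
    if current_ua <= 0:
        raise ValueError("discharge current must be positive")
    # µAh divided by µA gives hours; convert to seconds.
    hours = (charge_now_uah - charge_empty_uah) / current_ua
    return max(0.0, hours * 3600.0)
```

A real battery model would have to account for temperature, age and load,
which is exactly why the document keeps it out of the driver.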
diff --git a/Documentation/sched-design-CFS.txt b/Documentation/sched-design-CFS.txt
new file mode 100644
index 000000000000..16feebb7bdc0
--- /dev/null
+++ b/Documentation/sched-design-CFS.txt
@@ -0,0 +1,119 @@
1
2This is the CFS scheduler.
3
480% of CFS's design can be summed up in a single sentence: CFS basically
5models an "ideal, precise multi-tasking CPU" on real hardware.
6
7"Ideal multi-tasking CPU" is a (non-existent :-)) CPU that has 100%
8physical power and which can run each task at precise equal speed, in
9parallel, each at 1/nr_running speed. For example: if there are 2 tasks
10running then it runs each at 50% physical power - totally in parallel.
11
12On real hardware, we can run only a single task at once, so while that
13one task runs, the other tasks that are waiting for the CPU are at a
14disadvantage - the current task gets an unfair amount of CPU time. In
15CFS this fairness imbalance is expressed and tracked via the per-task
16p->wait_runtime (nanosec-unit) value. "wait_runtime" is the amount of
17time the task should now run on the CPU for it to become completely fair
18and balanced.
19
20( small detail: on 'ideal' hardware, the p->wait_runtime value would
21 always be zero - no task would ever get 'out of balance' from the
22 'ideal' share of CPU time. )
23
24CFS's task picking logic is based on this p->wait_runtime value and it
25is thus very simple: it always tries to run the task with the largest
26p->wait_runtime value. In other words, CFS tries to run the task with
27the 'gravest need' for more CPU time. So CFS always tries to split up
28CPU time between runnable tasks as close to 'ideal multitasking
29hardware' as possible.
30
31Most of the rest of CFS's design just falls out of this really simple
32concept, with a few add-on embellishments like nice levels,
33multiprocessing and various algorithm variants to recognize sleepers.
34
35In practice it works like this: the system runs a task a bit, and when
36the task schedules (or a scheduler tick happens) the task's CPU usage is
37'accounted for': the (small) time it just spent using the physical CPU
38is deducted from p->wait_runtime. [minus the 'fair share' it would have
39gotten anyway]. Once p->wait_runtime gets low enough so that another
40task becomes the 'leftmost task' of the time-ordered rbtree it maintains
41(plus a small amount of 'granularity' distance relative to the leftmost
42task so that we do not over-schedule tasks and trash the cache) then the
43new leftmost task is picked and the current task is preempted.
44
45The rq->fair_clock value tracks the 'CPU time a runnable task would have
46fairly gotten, had it been runnable during that time'. So by using
47rq->fair_clock values we can accurately timestamp and measure the
48'expected CPU time' a task should have gotten. All runnable tasks are
49sorted in the rbtree by the "rq->fair_clock - p->wait_runtime" key, and
50CFS picks the 'leftmost' task and sticks to it. As the system progresses
51forwards, newly woken tasks are put into the tree more and more to the
52right - slowly but surely giving a chance for every task to become the
53'leftmost task' and thus get on the CPU within a deterministic amount of
54time.
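
The accounting described above can be illustrated with a toy model in Python.
This is only a sketch under simplifying assumptions: equal-weight tasks, no
nice levels, no granularity threshold, and a plain dictionary instead of the
rbtree. The names pick_next and run_slice are hypothetical. With a common
rq->fair_clock, the smallest "rq->fair_clock - p->wait_runtime" key belongs
to the task with the largest wait_runtime, so the sketch simply picks that.

```python
def pick_next(wait_runtime):
    """Return the task with the largest wait_runtime (ties: lowest name)."""
    return max(sorted(wait_runtime), key=lambda name: wait_runtime[name])


def run_slice(wait_runtime, delta_ns):
    """Run the neediest task for delta_ns and update every wait_runtime."""
    current = pick_next(wait_runtime)
    fair_share = delta_ns / len(wait_runtime)
    for name in wait_runtime:
        # Every runnable task was owed an equal share of this slice.
        wait_runtime[name] += fair_share
    # The task that actually ran consumed the whole slice.
    wait_runtime[current] -= delta_ns
    return current
```

In this model the wait_runtime values always sum to zero, and three equal
tasks take turns until each is back in balance.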
55
56Some implementation details:
57
58 - the introduction of Scheduling Classes: an extensible hierarchy of
59 scheduler modules. These modules encapsulate scheduling policy
60 details and are handled by the scheduler core without the core
61 code assuming about them too much.
62
63 - sched_fair.c implements the 'CFS desktop scheduler': it is a
64 replacement for the vanilla scheduler's SCHED_OTHER interactivity
65 code.
66
67 I'd like to give credit to Con Kolivas for the general approach here:
68 he has proven via RSDL/SD that 'fair scheduling' is possible and that
69 it results in better desktop scheduling. Kudos Con!
70
71 The CFS patch uses a completely different approach and implementation
72 from RSDL/SD. My goal was to make CFS's interactivity quality exceed
73 that of RSDL/SD, which is a high standard to meet :-) Testing
74 feedback is welcome to decide this one way or another. [ and, in any
75 case, all of SD's logic could be added via a kernel/sched_sd.c module
76 as well, if Con is interested in such an approach. ]
77
78 CFS's design is quite radical: it does not use runqueues, it uses a
79 time-ordered rbtree to build a 'timeline' of future task execution,
80 and thus has no 'array switch' artifacts (by which both the vanilla
81 scheduler and RSDL/SD are affected).
82
83 CFS uses nanosecond granularity accounting and does not rely on any
84 jiffies or other HZ detail. Thus the CFS scheduler has no notion of
85 'timeslices' and has no heuristics whatsoever. There is only one
86 central tunable:
87
88 /proc/sys/kernel/sched_granularity_ns
89
90 which can be used to tune the scheduler from 'desktop' (low
91 latencies) to 'server' (good batching) workloads. It defaults to a
92 setting suitable for desktop workloads. SCHED_BATCH is handled by the
93 CFS scheduler module too.
94
95 Due to its design, the CFS scheduler is not prone to any of the
96 'attacks' that exist today against the heuristics of the stock
97 scheduler: fiftyp.c, thud.c, chew.c, ring-test.c, massive_intr.c all
98 work fine and do not impact interactivity and produce the expected
99 behavior.
100
101 The CFS scheduler has a much stronger handling of nice levels and
102 SCHED_BATCH: both types of workloads should be isolated much more
103 aggressively than under the vanilla scheduler.
105 ( another detail: due to nanosec accounting and timeline sorting,
106 sched_yield() support is very simple under CFS, and in fact under
107 CFS sched_yield() behaves much better than under any other
108 scheduler I have tested so far. )
109
110 - sched_rt.c implements SCHED_FIFO and SCHED_RR semantics, in a simpler
111 way than the vanilla scheduler does. It uses 100 runqueues (for all
112 100 RT priority levels, instead of 140 in the vanilla scheduler)
113 and it needs no expired array.
114
115 - reworked/sanitized SMP load-balancing: the runqueue-walking
116 assumptions are gone from the load-balancing code now, and
117 iterators of the scheduling modules are used. The balancing code got
118 quite a bit simpler as a result.
119
diff --git a/Documentation/sysfs-rules.txt b/Documentation/sysfs-rules.txt
new file mode 100644
index 000000000000..42861bb0bc9b
--- /dev/null
+++ b/Documentation/sysfs-rules.txt
@@ -0,0 +1,166 @@
1Rules on how to access information in the Linux kernel sysfs
2
3The kernel-exported sysfs exports internal kernel implementation details
4and depends on internal kernel structures and layout. It is agreed upon
5by the kernel developers that the Linux kernel does not provide a stable
6internal API. As sysfs is a direct export of kernel internal
7structures, the sysfs interface cannot provide a stable interface either;
8it may always change along with internal kernel changes.
9
10To minimize the risk of breaking users of sysfs, which are in most cases
11low-level userspace applications, with a new kernel release, the users
12of sysfs must follow some rules to use an as-abstract-as-possible way to
13access this filesystem. The current udev and HAL programs already
14implement this and users are encouraged to plug, if possible, into the
15abstractions these programs provide instead of accessing sysfs
16directly.
17
18But if you really do want or need to access sysfs directly, please follow
19the following rules and then your programs should work with future
20versions of the sysfs interface.
21
22- Do not use libsysfs
23 It makes assumptions about sysfs which are not true. Its API does not
24 offer any abstraction; it exposes all the kernel driver-core
25 implementation details in its own API. Therefore it is no better than
26 reading directories and opening the files yourself.
27 Also, it is not actively maintained, in the sense of reflecting the
28 current kernel development. The goal of providing a stable interface
29 to sysfs has failed; it causes more problems than it solves. It
30 violates many of the rules in this document.
31
32- sysfs is always at /sys
33 Parsing /proc/mounts is a waste of time. Other mount points are a
34 system configuration bug you should not try to solve. For test cases,
35 possibly support a SYSFS_PATH environment variable to override the
36 application's behavior, but never try to search for sysfs. Never try
37 to mount it, if you are not an early boot script.
38
39- devices are only "devices"
40 There is no such thing as class-, bus-, physical devices,
41 interfaces, and such that you can rely on in userspace. Everything is
42 just simply a "device". Class-, bus-, physical, ... types are just
43 kernel implementation details, which should not be expected by
44 applications that look for devices in sysfs.
45
46 The properties of a device are:
47 o devpath (/devices/pci0000:00/0000:00:1d.1/usb2/2-2/2-2:1.0)
48 - identical to the DEVPATH value in the event sent from the kernel
49 at device creation and removal
50 - the unique key to the device at that point in time
51 - the kernel's path to the device-directory without the leading
52 /sys, and always starting with a slash
53 - all elements of a devpath must be real directories. Symlinks
54 pointing to /sys/devices must always be resolved to their real
55 target, and the target path must be used to access the device.
56 That way the devpath to the device matches the devpath of the
57 kernel used at event time.
58 - using or exposing symlink values as elements in a devpath string
59 is a bug in the application
60
61 o kernel name (sda, tty, 0000:00:1f.2, ...)
62 - a directory name, identical to the last element of the devpath
63 - applications need to handle spaces and characters like '!' in
64 the name
65
66 o subsystem (block, tty, pci, ...)
67 - simple string, never a path or a link
68 - retrieved by reading the "subsystem"-link and using only the
69 last element of the target path
70
71 o driver (tg3, ata_piix, uhci_hcd)
72 - a simple string, which may contain spaces, never a path or a
73 link
74 - it is retrieved by reading the "driver"-link and using only the
75 last element of the target path
76 - devices which do not have "driver"-link, just do not have a
77 driver; copying the driver value in a child device context, is a
78 bug in the application
79
80 o attributes
81 - the files in the device directory or files below subdirectories
82 of the same device directory
83 - accessing attributes reached by a symlink pointing to another device,
84 like the "device"-link, is a bug in the application
85
86 Everything else is just a kernel driver-core implementation detail,
87 that should not be assumed to be stable across kernel releases.
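
The property rules above can be sketched in Python as follows. This is a
minimal illustration, not an existing library API: the function names and
the returned dictionary layout are assumptions, and SYSFS_PATH is honored
only as the test override mentioned earlier.

```python
import os


def sysfs_root():
    """Return /sys, or the SYSFS_PATH override intended for test cases."""
    return os.environ.get("SYSFS_PATH", "/sys")


def _link_name(devdir, link):
    """Return the last element of a symlink target, or None if absent."""
    path = os.path.join(devdir, link)
    if not os.path.islink(path):
        return None
    return os.path.basename(os.readlink(path))


def device_properties(path, root=None):
    """Collect devpath, kernel name, subsystem and driver of one device."""
    root = os.path.realpath(root or sysfs_root())
    # Resolve all symlinks so that the devpath consists of real directories.
    real = os.path.realpath(path)
    if not real.startswith(root + os.sep):
        raise ValueError("device is not below the sysfs root")
    return {
        "devpath": real[len(root):],
        "kernel_name": os.path.basename(real),
        "subsystem": _link_name(real, "subsystem"),
        # None means the device has no driver; never copy it from a parent.
        "driver": _link_name(real, "driver"),
    }
```

Only the last element of the "subsystem" and "driver" link targets is used,
so the result does not depend on where the kernel places those directories.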
88
89- Properties of parent devices never belong in a child device.
90 Always look at the parent devices themselves for determining device
91 context properties. If the device 'eth0' or 'sda' does not have a
92 "driver"-link, then this device does not have a driver. Its value is empty.
93 Never copy any property of the parent-device into a child-device. Parent
94 device-properties may change dynamically without any notice to the
95 child device.
96
97- Hierarchy in a single device-tree
98 There is only one valid place in sysfs where hierarchy can be examined
99 and this is below: /sys/devices.
100 It is planned that all device directories will end up in the tree
101 below this directory.
102
103- Classification by subsystem
104 There are currently three places for classification of devices:
105 /sys/block, /sys/class and /sys/bus. It is planned that these will
106 not contain any device-directories themselves, but only flat lists of
107 symlinks pointing to the unified /sys/devices tree.
108 All three places have completely different rules on how to access
109 device information. It is planned to merge all three
110 classification-directories into one place at /sys/subsystem,
111 following the layout of the bus-directories. All buses and
112 classes, including the converted block-subsystem, will show up
113 there.
114 The devices belonging to a subsystem will create a symlink in the
115 "devices" directory at /sys/subsystem/<name>/devices.
116
117 If /sys/subsystem exists, /sys/bus, /sys/class and /sys/block can be
118 ignored. If it does not exist, you always have to scan all three
119 places, as the kernel is free to move a subsystem from one place to
120 the other, as long as the devices are still reachable by the same
121 subsystem name.
122
123 Assuming /sys/class/<subsystem> and /sys/bus/<subsystem>, or
124 /sys/block and /sys/class/block are not interchangeable, is a bug in
125 the application.
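
A possible way to follow this rule is sketched below in Python. The function
name is hypothetical, and the sketch makes one simplification that should be
kept in mind: it lists only flat entries, so it does not descend into the
old nested /sys/block layout where partitions live below their disk.

```python
import os


def subsystem_devices(name, root="/sys"):
    """Return the sorted real device directories of subsystem `name`."""
    unified = os.path.join(root, "subsystem")
    if os.path.isdir(unified):
        # The unified layout exists, so the three old places can be ignored.
        candidates = [os.path.join(unified, name, "devices")]
    else:
        # The kernel may move a subsystem, so scan every possible place.
        candidates = [
            os.path.join(root, "bus", name, "devices"),
            os.path.join(root, "class", name),
        ]
        if name == "block":
            candidates.append(os.path.join(root, "block"))
    found = set()
    for directory in candidates:
        if not os.path.isdir(directory):
            continue
        for entry in os.listdir(directory):
            path = os.path.join(directory, entry)
            # Devices are directories or links to them; skip plain files.
            if os.path.isdir(path):
                found.add(os.path.realpath(path))
    return sorted(found)
```

Resolving each entry with realpath means a device reachable from more than
one place is reported once, under its directory in the device tree.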
126
127- Block
128 The converted block-subsystem at /sys/class/block, or
129 /sys/subsystem/block will contain the links for disks and partitions
130 at the same level, never in a hierarchy. Assuming the block-subsystem to
131 contain only disks and not partition-devices in the same flat list is
132 a bug in the application.
133
134- "device"-link and <subsystem>:<kernel name>-links
135 Never depend on the "device"-link. The "device"-link is a workaround
136 for the old layout, where class-devices are not created in
137 /sys/devices/ like the bus-devices. If the link-resolving of a
138 device-directory does not end in /sys/devices/, you can use the
139 "device"-link to find the parent devices in /sys/devices/. That is the
140 single valid use of the "device"-link, it must never appear in any
141 path as an element. Assuming the existence of the "device"-link for
142 a device in /sys/devices/ is a bug in the application.
143 Accessing /sys/class/net/eth0/device is a bug in the application.
144
145 Never depend on the class-specific links back to the /sys/class
146 directory. These links are also a workaround for the design mistake
147 that class-devices are not created in /sys/devices. If a device
148 directory does not contain directories for child devices, these links
149 may be used to find the child devices in /sys/class. That is the single
150 valid use of these links, they must never appear in any path as an
151 element. Assuming the existence of these links for devices which are
152 real child device directories in the /sys/devices tree, is a bug in
153 the application.
154
155 It is planned to remove all these links when all class-device
156 directories live in /sys/devices.
157
158- Position of devices along device chain can change.
159 Never depend on a specific parent device position in the devpath,
160 or the chain of parent devices. The kernel is free to insert devices into
161 the chain. You must always request the parent device you are looking for
162 by its subsystem value. You need to walk up the chain until you find
163 the device that matches the expected subsystem. Depending on a specific
164 position of a parent device, or exposing relative paths, using "../" to
165 access the chain of parents, is a bug in the application.
166
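
Walking up the chain by subsystem value, as required above, could look like
the following Python sketch. The function name is hypothetical; the search
starts at the direct parent of the given device and stops at the top of the
devices tree.

```python
import os


def find_parent_by_subsystem(devdir, subsystem, root="/sys"):
    """Return the nearest parent device of the given subsystem, or None."""
    devices = os.path.join(os.path.realpath(root), "devices")
    # Start at the parent of the resolved device directory.
    current = os.path.dirname(os.path.realpath(devdir))
    while current.startswith(devices + os.sep):
        link = os.path.join(current, "subsystem")
        if os.path.islink(link):
            # Only the last element of the link target names the subsystem.
            if os.path.basename(os.readlink(link)) == subsystem:
                return current
        current = os.path.dirname(current)
    return None
```

Because the match is made on the subsystem value and not on a position such
as "../..", the lookup keeps working if the kernel inserts devices into the
chain.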