7 files changed, 517 insertions, 579 deletions
diff --git a/Documentation/networking/00-INDEX b/Documentation/networking/00-INDEX
index 153d84d281e6..d63f480afb74 100644
--- a/Documentation/networking/00-INDEX
+++ b/Documentation/networking/00-INDEX
@@ -96,9 +96,6 @@ routing.txt
        - the new routing mechanism
 shaper.txt
        - info on the module that can shape/limit transmitted traffic.
-sk98lin.txt
-        - Marvell Yukon Chipset / SysKonnect SK-98xx compliant Gigabit
-          Ethernet Adapter family driver info
 skfp.txt
        - SysKonnect FDDI (SK-5xxx, Compaq Netelligent) driver info.
 smc9.txt
diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt
index 8f6067ea5e3e..32c2e9da5f3a 100644
--- a/Documentation/networking/ip-sysctl.txt
+++ b/Documentation/networking/ip-sysctl.txt
@@ -880,8 +880,7 @@ accept_redirects - BOOLEAN
 accept_source_route - INTEGER
        Accept source routing (routing extension header).
-        > 0: Accept routing header.
-        = 0: Accept only routing header type 2.
+        >= 0: Accept only routing header type 2.
        < 0: Do not accept routing header.
        Default: 0
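The two remaining accept_source_route cases can be read as a small decision function. This is a sketch of the documented semantics only, not the kernel's IPv6 code; the helper name rthdr_accepted() is hypothetical.

```c
#include <stdbool.h>

/* Hypothetical helper: would an IPv6 routing header of the given type be
 * accepted under the documented accept_source_route semantics? */
static bool rthdr_accepted(int accept_source_route, int rthdr_type)
{
    if (accept_source_route < 0)
        return false;          /* < 0: do not accept routing headers */
    return rthdr_type == 2;    /* >= 0: accept only routing header type 2 */
}
```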
diff --git a/Documentation/networking/l2tp.txt b/Documentation/networking/l2tp.txt
new file mode 100644
index 000000000000..2451f551c505
--- /dev/null
+++ b/Documentation/networking/l2tp.txt
@@ -0,0 +1,169 @@
+This brief document describes how to use the kernel's PPPoL2TP driver
+to provide L2TP functionality. L2TP is a protocol that tunnels one or
+more PPP sessions over a UDP tunnel. It is commonly used for VPNs
+(L2TP/IPSec) and by ISPs to tunnel subscriber PPP sessions over an IP
+network infrastructure.
+Design
+======
+The PPPoL2TP driver, drivers/net/pppol2tp.c, provides a mechanism by
+which PPP frames carried through an L2TP session are passed through
+the kernel's PPP subsystem. The standard PPP daemon, pppd, handles all
+PPP interaction with the peer. PPP network interfaces are created for
+each local PPP endpoint.
+The L2TP protocol (RFC 2661, http://www.faqs.org/rfcs/rfc2661.html)
+defines L2TP control and data frames. L2TP control frames carry
+messages between L2TP clients/servers and are used to set up and tear
+down tunnels and
+sessions. An L2TP client or server is implemented in userspace and
+will use a regular UDP socket per tunnel. L2TP data frames carry PPP
+frames, which may be PPP control or PPP data. The kernel's PPP
+subsystem arranges for PPP control frames to be delivered to pppd,
+while data frames are forwarded as usual.
+Each tunnel and session within a tunnel is assigned a unique tunnel_id
+and session_id. These ids are carried in the L2TP header of every
+control and data packet. The pppol2tp driver uses them to look up
+internal tunnel and/or session contexts. Zero tunnel / session ids are
+treated specially - zero ids are never assigned to tunnels or sessions
+in the network. In the driver, the tunnel context keeps a pointer to
+the tunnel UDP socket. The session context keeps a pointer to the
+PPPoL2TP socket, as well as other data that lets the driver interface
+to the kernel PPP subsystem.
+Note that the pppol2tp kernel driver handles only L2TP data frames;
+L2TP control frames are simply passed up to userspace in the UDP
+tunnel socket. The kernel handles all datapath aspects of the
+protocol, including data packet resequencing (if enabled).
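To make the tunnel_id/session_id lookup and the control/data split concrete, here is a minimal sketch that extracts both ids and the T (type) bit from an L2TPv2 header as laid out in RFC 2661, section 3.1. It is not the pppol2tp driver's parser and the names are hypothetical; it ignores the optional Ns/Nr and offset fields, which follow the session id and are not needed to reach the ids.

```c
#include <stddef.h>
#include <stdint.h>

/* Bits in the first 16-bit word of an L2TPv2 header (RFC 2661, 3.1). */
#define L2TP_FLAG_T   0x8000   /* set: control message, clear: data */
#define L2TP_FLAG_L   0x4000   /* Length field present */
#define L2TP_VER_MASK 0x000F   /* version, must be 2 */

struct l2tp_ids {
    int is_control;
    uint16_t tunnel_id;
    uint16_t session_id;
};

/* Read a 16-bit big-endian (network order) value. */
static uint16_t get_be16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

/* Fill *out from buf; returns 0 on success, -1 if the buffer is too
 * short or is not an L2TPv2 header. */
static int l2tp_parse_ids(const uint8_t *buf, size_t len, struct l2tp_ids *out)
{
    size_t off = 2;            /* skip the flags/version word */
    uint16_t flags;

    if (len < 2)
        return -1;
    flags = get_be16(buf);
    if ((flags & L2TP_VER_MASK) != 2)
        return -1;
    if (flags & L2TP_FLAG_L)
        off += 2;              /* skip the optional Length field */
    if (len < off + 4)
        return -1;
    out->is_control = (flags & L2TP_FLAG_T) != 0;
    out->tunnel_id = get_be16(buf + off);
    out->session_id = get_be16(buf + off + 2);
    return 0;
}
```

A frame with is_control set would be left for the userspace daemon on the UDP tunnel socket; a data frame's ids select the session context.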
+There are a number of requirements on the userspace L2TP daemon in
+order to use the pppol2tp driver.
+1. Use a UDP socket per tunnel.
+2. Create a single PPPoL2TP socket per tunnel bound to a special null
+   session id. This is used only for communicating with the driver but
+   must remain open while the tunnel is active. Opening this tunnel
+   management socket causes the driver to mark the tunnel socket as an
+   L2TP UDP encapsulation socket and flags it for use by the
+   referenced tunnel id. This hooks up the UDP receive path via
+   udp_encap_rcv() in net/ipv4/udp.c. PPP data frames are never passed
+   in this special PPPoX socket.
+3. Create a PPPoL2TP socket per L2TP session. This is typically done
+   by starting pppd with the pppol2tp plugin and appropriate
+   arguments. A PPPoL2TP tunnel management socket (Step 2) must be
+   created before the first PPPoL2TP session socket is created.
+When creating PPPoL2TP sockets, the application provides information
+to the driver about the socket in a socket connect() call. Source and
+destination tunnel and session ids are provided, as well as the file
+descriptor of a UDP socket. See struct pppol2tp_addr in
+include/linux/if_ppp.h. Note that zero tunnel / session ids are
+treated specially. When creating the per-tunnel PPPoL2TP management
+socket in Step 2 above, zero source and destination session ids are
+specified, which tells the driver to prepare the supplied UDP file
+descriptor for use as an L2TP tunnel socket.
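The zero-session-id convention can be sketched as a classifier over the ids passed in connect(). The struct l2tp_connect_ids type and classify_connect() are hypothetical, mirroring the s_tunnel/s_session/d_tunnel/d_session fields used in the sample code; treating a mix of zero and non-zero session ids as invalid is this sketch's own assumption, not documented driver behavior.

```c
#include <stdint.h>

/* Hypothetical mirror of the id fields of struct pppol2tp_addr. */
struct l2tp_connect_ids {
    uint16_t s_tunnel, s_session;
    uint16_t d_tunnel, d_session;
};

enum l2tp_sock_kind { L2TP_SOCK_INVALID, L2TP_SOCK_TUNNEL_MGMT, L2TP_SOCK_SESSION };

static enum l2tp_sock_kind classify_connect(const struct l2tp_connect_ids *ids)
{
    if (ids->s_tunnel == 0)
        return L2TP_SOCK_INVALID;      /* zero tunnel ids are never assigned */
    if (ids->s_session == 0 && ids->d_session == 0)
        return L2TP_SOCK_TUNNEL_MGMT;  /* special case: tunnel management socket */
    if (ids->s_session != 0 && ids->d_session != 0)
        return L2TP_SOCK_SESSION;      /* ordinary per-session socket */
    return L2TP_SOCK_INVALID;          /* assumption: mixed zero/non-zero rejected */
}
```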
+Userspace may control behavior of the tunnel or session using
+setsockopt and ioctl on the PPPoX socket. The following socket
+options are supported:-
+DEBUG     - bitmask of debug message categories. See below.
+SENDSEQ   - 0 => don't send packets with sequence numbers
+            1 => send packets with sequence numbers
+RECVSEQ   - 0 => receive packet sequence numbers are optional
+            1 => drop receive packets without sequence numbers
+LNSMODE   - 0 => act as LAC.
+            1 => act as LNS.
+REORDERTO - reorder timeout (in millisecs). If 0, don't try to reorder.
+Only the DEBUG option is supported by the special tunnel management
+PPPoX socket.
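The RECVSEQ and REORDERTO options can be sketched as a per-packet receive decision. Only the RECVSEQ drop rule and "a REORDERTO of 0 means no reordering" come from the text; holding early packets, dropping late ones, and the 16-bit wraparound comparison (ordinary serial-number arithmetic on the L2TP Ns field) are assumptions of this sketch, not a description of the driver's resequencing code.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if sequence number a comes before b, allowing for wraparound
 * from 65535 back to 0. */
static bool seq16_before(uint16_t a, uint16_t b)
{
    uint16_t diff = (uint16_t)(b - a);
    return diff != 0 && diff < 0x8000;
}

enum rx_action { RX_DROP, RX_DELIVER, RX_HOLD };

/* recvseq and reorder_ms mirror the RECVSEQ and REORDERTO options. */
static enum rx_action rx_decide(bool recvseq, bool has_seq, uint16_t ns,
                                uint16_t expected, unsigned reorder_ms)
{
    if (!has_seq)
        return recvseq ? RX_DROP : RX_DELIVER; /* RECVSEQ=1 drops unsequenced */
    if (ns == expected || reorder_ms == 0)
        return RX_DELIVER;                     /* in order, or reordering off */
    if (seq16_before(expected, ns))
        return RX_HOLD;                        /* early: wait up to reorder_ms */
    return RX_DROP;                            /* assumption: late packets dropped */
}
```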
+In addition to the standard PPP ioctls, a PPPIOCGL2TPSTATS ioctl is provided
+to retrieve tunnel and session statistics from the kernel using the
+PPPoX socket of the appropriate tunnel or session.
+Debugging
+=========
+The driver supports a flexible debug scheme where kernel trace
+messages may be optionally enabled per tunnel and per session. Care is
+needed when debugging a live system since the messages are not
+rate-limited and a busy system could be swamped. Userspace uses
+setsockopt on the PPPoX socket to set a debug mask.
+The following debug mask bits are available:
+PPPOL2TP_MSG_DEBUG    verbose debug (if compiled in)
+PPPOL2TP_MSG_CONTROL  userspace - kernel interface
+PPPOL2TP_MSG_SEQ      sequence numbers handling
+PPPOL2TP_MSG_DATA     data packets
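A debug mask is simply an OR of the category bits. The numeric values below are assumed to match the kernel's include/linux/if_pppol2tp.h; check them against your headers before relying on them. debug_enabled() is a hypothetical helper showing how a category is tested against the mask.

```c
#include <stdbool.h>

/* Assumed values of the debug category bits. */
enum {
    PPPOL2TP_MSG_DEBUG   = 1 << 0,  /* verbose debug (if compiled in) */
    PPPOL2TP_MSG_CONTROL = 1 << 1,  /* userspace - kernel interface */
    PPPOL2TP_MSG_SEQ     = 1 << 2,  /* sequence numbers handling */
    PPPOL2TP_MSG_DATA    = 1 << 3,  /* data packets */
};

/* Would a message of the given category be emitted under this mask? */
static bool debug_enabled(unsigned int mask, unsigned int category)
{
    return (mask & category) != 0;
}
```

The resulting mask would then be passed with setsockopt() on the tunnel or session PPPoX socket; the kernel headers name the level and option SOL_PPPOL2TP and PPPOL2TP_SO_DEBUG.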
+Sample Userspace Code
+=====================
+1. Create tunnel management PPPoX socket
+        kernel_fd = socket(AF_PPPOX, SOCK_DGRAM, PX_PROTO_OL2TP);
+        if (kernel_fd >= 0) {
+                struct sockaddr_pppol2tp sax;
+                struct sockaddr_in const *peer_addr;
+                peer_addr = l2tp_tunnel_get_peer_addr(tunnel);
+                memset(&sax, 0, sizeof(sax));
+                sax.sa_family = AF_PPPOX;
+                sax.sa_protocol = PX_PROTO_OL2TP;
+                sax.pppol2tp.fd = udp_fd;       /* fd of tunnel UDP socket */
+                sax.pppol2tp.addr.sin_addr.s_addr = peer_addr->sin_addr.s_addr;
+                sax.pppol2tp.addr.sin_port = peer_addr->sin_port;
+                sax.pppol2tp.addr.sin_family = AF_INET;
+                sax.pppol2tp.s_tunnel = tunnel_id;
+                sax.pppol2tp.s_session = 0;     /* special case: mgmt socket */
+                sax.pppol2tp.d_tunnel = 0;
+                sax.pppol2tp.d_session = 0;     /* special case: mgmt socket */
+                if (connect(kernel_fd, (struct sockaddr *)&sax, sizeof(sax)) < 0) {
+                        perror("connect failed");
+                        result = -errno;
+                        goto err;
+                }
+        }
+2. Create session PPPoX data socket
+        struct sockaddr_pppol2tp sax;
+        int ret;
+        /* Note, the target socket must be bound already, else it will not be ready */
+        memset(&sax, 0, sizeof(sax));
+        sax.sa_family = AF_PPPOX;
+        sax.sa_protocol = PX_PROTO_OL2TP;
+        sax.pppol2tp.fd = tunnel_fd;
+        sax.pppol2tp.addr.sin_addr.s_addr = addr->sin_addr.s_addr;
+        sax.pppol2tp.addr.sin_port = addr->sin_port;
+        sax.pppol2tp.addr.sin_family = AF_INET;
+        sax.pppol2tp.s_tunnel  = tunnel_id;
+        sax.pppol2tp.s_session = session_id;
+        sax.pppol2tp.d_tunnel  = peer_tunnel_id;
+        sax.pppol2tp.d_session = peer_session_id;
+        /* session_fd is the fd of the session's PPPoL2TP socket.
+         * tunnel_fd is the fd of the tunnel UDP socket.
+         */
+        ret = connect(session_fd, (struct sockaddr *)&sax, sizeof(sax));
+        if (ret < 0) {
+                return -errno;
+        }
+        return 0;
+Miscellaneous
+=============
+The PPPoL2TP driver was developed as part of the OpenL2TP project by
+Katalix Systems Ltd. OpenL2TP is a full-featured L2TP client / server,
+designed from the ground up to have the L2TP datapath in the
+kernel. The project also implemented the pppol2tp plugin for pppd
+which allows pppd to use the kernel driver. Details can be found at
+http://openl2tp.sourceforge.net.
diff --git a/Documentation/networking/multiqueue.txt b/Documentation/networking/multiqueue.txt
new file mode 100644
index 000000000000..00b60cce2224
--- /dev/null
+++ b/Documentation/networking/multiqueue.txt
@@ -0,0 +1,111 @@
+                HOWTO for multiqueue network device support
+                ===========================================
+Section 1: Base driver requirements for implementing multiqueue support
+Section 2: Qdisc support for multiqueue devices
+Section 3: Brief howto using PRIO and RR for multiqueue devices
+Intro: Kernel support for multiqueue devices
+--------------------------------------------
+Kernel support for multiqueue devices is only an API that is presented to the
+netdevice layer for base drivers to implement.  This feature is part of the
+core networking stack, and all network devices will be running on the
+multiqueue-aware stack.  If a base driver only has one queue, then these
+changes are transparent to that driver.
+Section 1: Base driver requirements for implementing multiqueue support
+-----------------------------------------------------------------------
+Base drivers are required to use the new alloc_etherdev_mq() or
+alloc_netdev_mq() functions to allocate the subqueues for the device.  The
+underlying kernel API will take care of the allocation and deallocation of
+the subqueue memory, as well as netdev configuration of where the queues
+exist in memory.
+The base driver will also need to manage its subqueues, much as it manages
+the global queue under netdev->queue_lock today.  Therefore base drivers
+should use the
+netif_{start|stop|wake}_subqueue() functions to manage each queue while the
+device is still operational.  netdev->queue_lock is still used when the device
+comes online or when it's completely shut down (unregister_netdev(), etc.).
+Finally, the base driver should indicate that it is a multiqueue device.  The
+feature flag NETIF_F_MULTI_QUEUE should be added to the netdev->features
+bitmap on device initialization.  Below is an example from e1000:
+#ifdef CONFIG_E1000_MQ
+        if ( (adapter->hw.mac.type == e1000_82571) ||
+             (adapter->hw.mac.type == e1000_82572) ||
+             (adapter->hw.mac.type == e1000_80003es2lan))
+                netdev->features |= NETIF_F_MULTI_QUEUE;
+#endif
+Section 2: Qdisc support for multiqueue devices
+-----------------------------------------------
+Currently two qdiscs support multiqueue devices: sch_prio and a new
+round-robin qdisc, sch_rr.  The qdisc is responsible for classifying skbs to
+bands and queues, and stores the queue mapping in skb->queue_mapping.
+Use this field in the base driver to determine which queue to send the skb
+to.
+sch_rr has been added for hardware that doesn't want scheduling policies from
+software, so it's a straight round-robin qdisc.  It uses the same syntax and
+classification priomap that sch_prio uses, so it should be intuitive to
+configure for people who've used sch_prio.
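The text describes sch_rr only as a straight round-robin qdisc. The following sketch models that dequeue order using a packet count per band; it is not the sch_rr source, and struct rr_state and rr_dequeue() are hypothetical. Unlike strict priority, each dequeue resumes after the band served last, so a busy band 0 cannot starve the others.

```c
#define RR_BANDS 4

struct rr_state {
    int backlog[RR_BANDS];   /* packets waiting in each band */
    int next;                /* band to try first on the next dequeue */
};

/* Returns the band a packet was taken from, or -1 if all bands are empty. */
static int rr_dequeue(struct rr_state *s)
{
    for (int i = 0; i < RR_BANDS; i++) {
        int band = (s->next + i) % RR_BANDS;
        if (s->backlog[band] > 0) {
            s->backlog[band]--;
            s->next = (band + 1) % RR_BANDS;  /* rotate past the served band */
            return band;
        }
    }
    return -1;
}
```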
+The PRIO qdisc naturally plugs into a multiqueue device.  If PRIO has been
+built with NET_SCH_PRIO_MQ, then upon load, it will make sure the number of
+bands requested is equal to the number of queues on the hardware.  If they
+are equal, it sets up a one-to-one mapping between the queues and bands.  If
+they're not equal, it will not load the qdisc.  RR behaves the same way.
+Once the association is made, any skb that is classified will have
+skb->queue_mapping set, which allows the driver to properly queue skbs to
+multiple queues.
+Section 3: Brief howto using PRIO and RR for multiqueue devices
+---------------------------------------------------------------
+The userspace command 'tc', part of the iproute2 package, is used to configure
+qdiscs.  To add the PRIO qdisc to your network device, assuming the device is
+called eth0, run the following command:
+# tc qdisc add dev eth0 root handle 1: prio bands 4 multiqueue
+This will create 4 bands, 0 being highest priority, and associate those bands
+to the queues on your NIC.  Assuming eth0 has 4 Tx queues, the band mapping
+would look like:
+band 0 => queue 0
+band 1 => queue 1
+band 2 => queue 2
+band 3 => queue 3
+Traffic will begin flowing through each queue if your TOS values are assigning
+traffic across the various bands.  For example, ssh traffic will always try to
+go out band 0 based on TOS -> Linux priority conversion (realtime traffic),
+so it will be sent out queue 0.  ICMP traffic (pings) falls into the "normal"
+traffic classification, which is band 1.  Therefore pings will be sent out
+queue 1 on the NIC.
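The ssh and ping examples follow from the PRIO priomap. The sketch below assumes the default priomap that tc prints ("1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1"), that ssh's low-delay TOS maps to Linux priority 6 (TC_PRIO_INTERACTIVE), and that pings carry priority 0; with the one-to-one multiqueue mapping, the band is also the Tx queue. prio_queue_for() is a hypothetical helper.

```c
#include <stdint.h>

/* Assumed default PRIO priomap: index = skb->priority (0..15), value = band. */
static const uint8_t prio2band[16] = {
    1, 2, 2, 2, 1, 2, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1
};

/* Band for a priority; with one-to-one mapping this is skb->queue_mapping. */
static int prio_queue_for(unsigned int priority)
{
    return prio2band[priority & 15];   /* priorities above 15 are masked */
}
```

With this default priomap no priority maps to band 3, so queue 3 receives traffic only if the priomap or tc filters direct it there.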
+Note the use of the multiqueue keyword.  It is only available in versions of
+iproute2 that support multiqueue networking devices; if it is omitted when
+loading a qdisc onto a multiqueue device, the qdisc will load and operate the
+same as if it were loaded onto a single-queue device (i.e., it sends all
+traffic to queue 0).
+Alternatively, the multiqueue option can be given with 0 bands.  In that
+case, the qdisc allocates as many bands as the device reports queues, and
+brings the qdisc online.
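The band-count rules from the PRIO description in Section 2 and the paragraph above can be summarised as a small function. mq_resolve_bands() is hypothetical and encodes only the stated behavior: 0 bands means one band per hardware queue, and any other count must equal the number of queues or the qdisc refuses to load.

```c
/* Returns the number of bands the qdisc would use, or -1 if it would
 * refuse to load. */
static int mq_resolve_bands(int requested, int num_queues)
{
    if (requested == 0)
        return num_queues;   /* one band per hardware queue */
    if (requested != num_queues)
        return -1;           /* mismatch: the qdisc is not loaded */
    return requested;
}
```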
+The behavior of tc filters remains the same: they override TOS priority
+classification.
+Author: Peter P. Waskiewicz Jr. <peter.p.waskiewicz.jr@intel.com>
diff --git a/Documentation/networking/netdevices.txt b/Documentation/networking/netdevices.txt
index ce1361f95243..37869295fc70 100644
--- a/Documentation/networking/netdevices.txt
+++ b/Documentation/networking/netdevices.txt
@@ -20,6 +20,30 @@ private data which gets freed when the network device is freed. If
 separately allocated data is attached to the network device
 (dev->priv) then it is up to the module exit handler to free that.
+MTU
+===
+Each network device has a Maximum Transmission Unit (MTU). Upper layer
+protocols must not pass a socket buffer (skb) to a device to transmit
+with more data than the MTU. The MTU does not include link layer header
+overhead, so, for example, on Ethernet with the standard 1500-byte MTU,
+the actual skb will contain up to 1514 bytes because of the 14-byte
+Ethernet header. Devices should allow for the 4-byte VLAN header as well.
+Segmentation Offload (GSO, TSO) is an exception to this rule.  The
+upper layer protocol may pass a large socket buffer to the device
+transmit routine, and the device will break that up into separate
+packets based on the current MTU.
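As a rough worked example of segmentation offload, the number of wire packets produced from one large buffer is a ceiling division. The text only says the device splits the buffer based on the current MTU; this sketch assumes TCP over IPv4 with 20-byte IP and 20-byte TCP headers and no options, so each segment carries MTU - 40 payload bytes. gso_segments() is hypothetical and requires an MTU above 40.

```c
/* Segments produced from tcp_payload bytes, assuming MSS = mtu - 40. */
static unsigned int gso_segments(unsigned int tcp_payload, unsigned int mtu)
{
    unsigned int mss = mtu - 40;            /* IPv4 (20) + TCP (20), no options */
    return (tcp_payload + mss - 1) / mss;   /* ceiling division */
}
```

For a 64000-byte payload and a 1500-byte MTU, this gives 44 segments of at most 1460 payload bytes.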
+MTU is symmetrical and applies both to receive and transmit. A device
+must be able to receive at least the maximum size packet allowed by
+the MTU. A network device may use the MTU as a mechanism to size receive
+buffers, but the device should allow packets with a VLAN header. With
+the standard Ethernet MTU of 1500 bytes, the device should allow up to
+1518-byte packets (1500 + 14 header + 4 tag). The device may drop,
+truncate, or pass up oversize packets, but dropping oversize packets is
+preferred.
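The receive-size arithmetic above, combined with the preferred drop policy, as a sketch. The helper names are hypothetical; lengths exclude the Ethernet FCS, matching the 1518-byte figure in the text.

```c
#include <stdbool.h>

#define ETH_HLEN_BYTES 14   /* destination MAC + source MAC + EtherType */
#define VLAN_TAG_BYTES  4   /* 802.1Q tag */

/* Largest frame the device should accept for a given MTU. */
static unsigned int max_rx_frame(unsigned int mtu)
{
    return mtu + ETH_HLEN_BYTES + VLAN_TAG_BYTES;
}

/* Preferred policy: accept frames up to the limit, drop anything larger. */
static bool rx_accept_frame(unsigned int frame_len, unsigned int mtu)
{
    return frame_len <= max_rx_frame(mtu);
}
```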
 struct net_device synchronization rules
 =======================================
@@ -43,16 +67,17 @@ dev->get_stats:
 dev->hard_start_xmit:
        Synchronization: netif_tx_lock spinlock.
        When the driver sets NETIF_F_LLTX in dev->features this will be
        called without holding netif_tx_lock. In this case the driver
        has to lock by itself when needed. It is recommended to use a try lock
-        for this and return -1 when the spin lock fails. 
+        for this and return NETDEV_TX_LOCKED when the spin lock fails.
        The locking there should also properly protect against 
-        set_multicast_list
+        set_multicast_list.
-        Context: Process with BHs disabled or BH (timer).
-        Notes: netif_queue_stopped() is guaranteed false
-               Interrupts must be enabled when calling hard_start_xmit.
-                (Interrupts must also be enabled when enabling the BH handler.)
+        Context: Process with BHs disabled or BH (timer),
+                 will be called with interrupts disabled by netconsole.
        Return codes: 
        o NETDEV_TX_OK everything ok. 
        o NETDEV_TX_BUSY Cannot transmit packet, try later 
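The try-lock pattern recommended for NETIF_F_LLTX drivers can be modelled in userspace with a C11 atomic_flag standing in for the driver's spinlock. This is a sketch, not kernel code: struct model_tx_ring and model_hard_start_xmit() are hypothetical, and the NETDEV_TX_* values are defined locally on the assumption that they match the 2.6-era values (OK = 0, BUSY = 1, LOCKED = -1).

```c
#include <stdatomic.h>

/* Local stand-ins for the transmit return codes (values assumed). */
enum { NETDEV_TX_OK = 0, NETDEV_TX_BUSY = 1, NETDEV_TX_LOCKED = -1 };

struct model_tx_ring {
    atomic_flag lock;   /* stands in for the driver's private spinlock */
    int queued;         /* packets placed on the ring */
};

static int model_hard_start_xmit(struct model_tx_ring *r)
{
    if (atomic_flag_test_and_set(&r->lock))
        return NETDEV_TX_LOCKED;   /* lock held elsewhere: core retries quickly */
    r->queued++;                   /* ... place the packet on the ring ... */
    atomic_flag_clear(&r->lock);
    return NETDEV_TX_OK;
}
```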
@@ -74,4 +99,5 @@ dev->poll:
        Synchronization: __LINK_STATE_RX_SCHED bit in dev->state.  See
                dev_close code and comments in net/core/dev.c for more info.
        Context: softirq
+                 will be called with interrupts disabled by netconsole.
diff --git a/Documentation/networking/sk98lin.txt b/Documentation/networking/sk98lin.txt
deleted file mode 100644
index 8590a954df1d..000000000000
--- a/Documentation/networking/sk98lin.txt
+++ /dev/null
@@ -1,568 +0,0 @@
-(C)Copyright 1999-2004 Marvell(R).
-All rights reserved
-===========================================================================
-sk98lin.txt created 13-Feb-2004
-Readme File for sk98lin v6.23
-Marvell Yukon/SysKonnect SK-98xx Gigabit Ethernet Adapter family driver for LINUX
-This file contains
- 1  Overview
- 2  Required Files
- 3  Installation
-    3.1  Driver Installation
-    3.2  Inclusion of adapter at system start
- 4  Driver Parameters
-    4.1  Per-Port Parameters
-    4.2  Adapter Parameters
- 5  Large Frame Support
- 6  VLAN and Link Aggregation Support (IEEE 802.1, 802.1q, 802.3ad)
- 7  Troubleshooting
-===========================================================================
-1  Overview
-===========
-The sk98lin driver supports the Marvell Yukon and SysKonnect 
-SK-98xx/SK-95xx compliant Gigabit Ethernet Adapter on Linux. It has 
-been tested with Linux on Intel/x86 machines.
-***
-2  Required Files
-=================
-The linux kernel source.
-No additional files required.
-***
-3  Installation
-===============
-It is recommended to download the latest version of the driver from the 
-SysKonnect web site www.syskonnect.com. If you have downloaded the latest
-driver, the Linux kernel has to be patched before the driver can be 
-installed. For details on how to patch a Linux kernel, refer to the 
-patch.txt file.
-3.1  Driver Installation
------------------------
-The following steps describe the actions that are required to install
-the driver and to start it manually. These steps should be carried
-out for the initial driver setup. Once confirmed to be ok, they can
-be included in the system start.
-NOTE 1: To perform the following tasks you need 'root' access.
-NOTE 2: In case of problems, please read the section "Troubleshooting" 
-        below.
-The driver can either be integrated into the kernel or it can be compiled 
-as a module. Select the appropriate option during the kernel 
-configuration.
-Compile/use the driver as a module
----------------------------------
-To compile the driver, go to the directory /usr/src/linux and
-execute the command "make menuconfig" or "make xconfig" and proceed as 
-follows:
-To integrate the driver permanently into the kernel, proceed as follows:
-1. Select the menu "Network device support" and then "Ethernet(1000Mbit)"
-2. Mark "Marvell Yukon Chipset / SysKonnect SK-98xx family support" 
-   with (*) 
-3. Build a new kernel when the configuration of the above options is 
-   finished.
-4. Install the new kernel.
-5. Reboot your system.
-To use the driver as a module, proceed as follows:
-1. Enable 'loadable module support' in the kernel.
-2. For automatic driver start, enable the 'Kernel module loader'.
-3. Select the menu "Network device support" and then "Ethernet(1000Mbit)"
-4. Mark "Marvell Yukon Chipset / SysKonnect SK-98xx family support" 
-   with (M)
-5. Execute the command "make modules".
-6. Execute the command "make modules_install".
-   The appropriate modules will be installed.
-7. Reboot your system.
-Load the module manually
------------------------
-To load the module manually, proceed as follows:
-1. Enter "modprobe sk98lin".
-2. If a Marvell Yukon or SysKonnect SK-98xx adapter is installed in 
-   your computer and you have a /proc file system, execute the command:
-   "ls /proc/net/sk98lin/" 
-   This should produce an output containing a line with the following 
-   format:
-   eth0   eth1  ...
-   which indicates that your adapter has been found and initialized.
-   
-   NOTE 1: If you have more than one Marvell Yukon or SysKonnect SK-98xx 
-           adapter installed, the adapters will be listed as 'eth0', 
-                   'eth1', 'eth2', etc.
-                   For each adapter, repeat steps 3 and 4 below.
-   NOTE 2: If you have other Ethernet adapters installed, your Marvell
-           Yukon or SysKonnect SK-98xx adapter will be mapped to the 
-                   next available number, e.g. 'eth1'. The mapping is executed 
-                   automatically.
-           The module installation message (displayed either in a system
-           log file or on the console) prints a line for each adapter 
-           found containing the corresponding 'ethX'.
-3. Select an IP address and assign it to the respective adapter by 
-   entering:
-   ifconfig eth0 <ip-address>
-   With this command, the adapter is connected to the Ethernet. 
-   
-   SK-98xx Gigabit Ethernet Server Adapters: The yellow LED on the adapter 
-   is now active, the link status LED of the primary port is active and 
-   the link status LED of the secondary port (on dual port adapters) is 
-   blinking (if the ports are connected to a switch or hub).
-   SK-98xx V2.0 Gigabit Ethernet Adapters: The link status LED is active.
-   In addition, you will receive a status message on the console stating
-   "ethX: network connection up using port Y" and showing the selected 
-   connection parameters (x stands for the ethernet device number 
-   (0,1,2, etc), y stands for the port name (A or B)).
-   NOTE: If you are in doubt about IP addresses, ask your network
-         administrator for assistance.
-  
-4. Your adapter should now be fully operational.
-   Use 'ping <otherstation>' to verify the connection to other computers 
-   on your network.
-5. To check the adapter configuration view /proc/net/sk98lin/[devicename].
-   For example by executing:    
-   "cat /proc/net/sk98lin/eth0" 
-Unload the module
-----------------
-To stop and unload the driver modules, proceed as follows:
-1. Execute the command "ifconfig eth0 down".
-2. Execute the command "rmmod sk98lin".
-3.2  Inclusion of adapter at system start
-----------------------------------------
-Since a large number of different Linux distributions are 
-available, we are unable to describe a general installation procedure
-for the driver module.
-Because the driver is now integrated in the kernel, installation should
-be easy, using the standard mechanism of your distribution.
-Refer to the distribution's manual for installation of ethernet adapters.
-***
-4  Driver Parameters
-====================
-Parameters can be set at the command line after the module has been 
-loaded with the command 'modprobe'.
-In some distributions, the configuration tools are able to pass parameters
-to the driver module.
-If you use the kernel module loader, you can set driver parameters
-in the file /etc/modprobe.conf (or /etc/modules.conf in 2.4 or earlier).
-To set the driver parameters in this file, proceed as follows:
-1. Insert a line of the form :
-   options sk98lin ...
-   For "...", the same syntax is required as described for the command
-   line parameters of modprobe below.
-2. To activate the new parameters, either reboot your computer
-   or 
-   unload and reload the driver.
-   The syntax of the driver parameters is:
-        modprobe sk98lin parameter=value1[,value2[,value3...]]
-   where value1 refers to the first adapter, value2 to the second etc.
-NOTE: All parameters are case sensitive. Write them exactly as shown 
-      below.
-Example:
-Suppose you have two adapters. You want to set auto-negotiation
-on the first adapter to ON and on the second adapter to OFF.
-You also want to set DuplexCapabilities on the first adapter
-to FULL, and on the second adapter to HALF.
-Then, you must enter:
-        modprobe sk98lin AutoNeg_A=On,Off DupCap_A=Full,Half
-NOTE: The number of adapters that can be configured this way is
-      limited in the driver (file skge.c, constant SK_MAX_CARD_PARAM).
-      The current limit is 16. If you happen to install
-      more adapters, adjust this and recompile.
-4.1  Per-Port Parameters
------------------------
-These settings are available for each port on the adapter.
-In the following description, '?' stands for the port for
-which you set the parameter (A or B).
-Speed
-----
-Parameter:    Speed_?
-Values:       10, 100, 1000, Auto
-Default:      Auto
-This parameter is used to set the speed capabilities. It is only valid 
-for the SK-98xx V2.0 copper adapters.
-Usually, the speed is negotiated between the two ports during link 
-establishment. If this fails, a port can be forced to a specific setting
-with this parameter.
-Auto-Negotiation
----------------
-Parameter:    AutoNeg_?
-Values:       On, Off, Sense
-Default:      On
-  
-The "Sense"-mode automatically detects whether the link partner supports
-auto-negotiation or not.
-Duplex Capabilities
-------------------
-Parameter:    DupCap_?
-Values:       Half, Full, Both
-Default:      Both
-This parameters is only relevant if auto-negotiation for this port is 
-not set to "Sense". If auto-negotiation is set to "On", all three values
-are possible. If it is set to "Off", only "Full" and "Half" are allowed.
-This parameter is useful if your link partner does not support all
-possible combinations.
-Flow Control
------------
-Parameter:    FlowCtrl_?
-Values:       Sym, SymOrRem, LocSend, None
-Default:      SymOrRem
-This parameter can be used to set the flow control capabilities the 
-port reports during auto-negotiation. It can be set for each port 
-individually.
-Possible modes:
-   -- Sym      = Symmetric: both link partners are allowed to send 
-                  PAUSE frames
-   -- SymOrRem = SymmetricOrRemote: both or only remote partner 
-                  are allowed to send PAUSE frames
-   -- LocSend  = LocalSend: only local link partner is allowed 
-                  to send PAUSE frames
-   -- None     = no link partner is allowed to send PAUSE frames
-  
-NOTE: This parameter is ignored if auto-negotiation is set to "Off".
-Role in Master-Slave-Negotiation (1000Base-T only)
--------------------------------------------------
-Parameter:    Role_?
-Values:       Auto, Master, Slave
-Default:      Auto
-This parameter is only valid for the SK-9821 and SK-9822 adapters.
-For two 1000Base-T ports to communicate, one must take the role of the
-master (providing timing information), while the other must be the 
-slave. Usually, this is negotiated between the two ports during link 
-establishment. If this fails, a port can be forced to a specific setting
-with this parameter.
-4.2  Adapter Parameters
-----------------------
-Connection Type (SK-98xx V2.0 copper adapters only)
---------------
-Parameter:    ConType
-Values:       Auto, 100FD, 100HD, 10FD, 10HD
-Default:      Auto
-The parameter 'ConType' is a combination of all five per-port parameters
-within one single parameter. This simplifies the configuration of both ports
-of an adapter card! The different values of this variable reflect the most 
-meaningful combinations of port parameters.
-The following table shows the values of 'ConType' and the corresponding
-combinations of the per-port parameters:
-    ConType   |  DupCap   AutoNeg   FlowCtrl   Role             Speed
-    ----------+------------------------------------------------------
-    Auto      |  Both     On        SymOrRem   Auto             Auto
-    100FD     |  Full     Off       None       Auto (ignored)   100
-    100HD     |  Half     Off       None       Auto (ignored)   100
-    10FD      |  Full     Off       None       Auto (ignored)   10
-    10HD      |  Half     Off       None       Auto (ignored)   10
-Stating any other port parameter together with this 'ConType' variable
-will result in a merged configuration of those settings. This due to 
-the fact, that the per-port parameters (e.g. Speed_? ) have a higher
-priority than the combined variable 'ConType'.
-NOTE: This parameter is always used on both ports of the adapter card.
-Interrupt Moderation
--------------------
-Parameter:    Moderation
-Values:       None, Static, Dynamic
-Default:      None
-Interrupt moderation is employed to limit the maximum number of interrupts
-the driver has to serve. That is, one or more interrupts (which indicate any
-transmit or receive packet to be processed) are queued until the driver 
-processes them. When queued interrupts are to be served, is determined by the
-'IntsPerSec' parameter, which is explained later below.
-Possible modes:
-   -- None - No interrupt moderation is applied on the adapter card. 
-      Therefore, each transmit or receive interrupt is served immediately
-      as soon as it appears on the interrupt line of the adapter card.
-   -- Static - Interrupt moderation is applied on the adapter card. 
-      All transmit and receive interrupts are queued until a complete
-      moderation interval ends. If such a moderation interval ends, all
-      queued interrupts are processed in one big bunch without any delay.
-      The term 'static' reflects the fact, that interrupt moderation is
-      always enabled, regardless how much network load is currently 
-      passing via a particular interface. In addition, the duration of
-      the moderation interval has a fixed length that never changes while
-      the driver is operational.
-   -- Dynamic - Interrupt moderation might be applied on the adapter card,
-      depending on the load of the system. If the driver detects that the
-      system load is too high, the driver tries to shield the system against 
-      too much network load by enabling interrupt moderation. If - at a later
-      time - the CPU utilization decreases again (or if the network load is 
-      negligible) the interrupt moderation will automatically be disabled.
-Interrupt moderation should be used when the driver has to handle one or more
-interfaces with a high network load, which - as a consequence - leads also to a
-high CPU utilization. When moderation is applied in such high network load 
-situations, CPU load might be reduced by 20-30%.
-NOTE: The drawback of using interrupt moderation is an increase of the round-
-trip-time (RTT), due to the queueing and serving of interrupts at dedicated
-moderation times.
-Interrupts per second
---------------------
-Parameter:    IntsPerSec
-Values:       30...40000 (interrupts per second)
-Default:      2000
-This parameter is only used if either static or dynamic interrupt moderation
-is used on a network adapter card. Using this parameter if no moderation is
-applied will lead to no action performed.
-This parameter determines the length of any interrupt moderation interval. 
-Assuming that static interrupt moderation is to be used, an 'IntsPerSec' 
-parameter value of 2000 will lead to an interrupt moderation interval of
-500 microseconds. 
-NOTE: The duration of the moderation interval is to be chosen with care.
-At first glance, selecting a very long duration (e.g. only 100 interrupts per 
-second) seems to be meaningful, but the increase of packet-processing delay 
-is tremendous. On the other hand, selecting a very short moderation time might
-compensate the use of any moderation being applied.
-Preferred Port
--------------
-Parameter:    PrefPort
-Values:       A, B
-Default:      A
-This is used to force the preferred port to A or B (on dual-port network 
-adapters). The preferred port is the one that is used if both are detected
-as fully functional.
-RLMT Mode (Redundant Link Management Technology)
------------------------------------------------
-Parameter:    RlmtMode
-Values:       CheckLinkState,CheckLocalPort, CheckSeg, DualNet
-Default:      CheckLinkState
-RLMT monitors the status of the port. If the link of the active port 
-fails, RLMT switches immediately to the standby link. The virtual link is 
-maintained as long as at least one 'physical' link is up. 
-Possible modes:
-   -- CheckLinkState - Check link state only: RLMT uses the link state
-      reported by the adapter hardware for each individual port to 
-      determine whether a port can be used for all network traffic or 
-      not.
-   -- CheckLocalPort - In this mode, RLMT monitors the network path 
-      between the two ports of an adapter by regularly exchanging packets
-      between them. This mode requires a network configuration in which 
-      the two ports are able to "see" each other (i.e. there must not be 
-      any router between the ports).
-   -- CheckSeg - Check local port and segmentation: This mode supports the
-      same functions as the CheckLocalPort mode and additionally checks 
-      network segmentation between the ports. Therefore, this mode is only
-      to be used if Gigabit Ethernet switches are installed on the network
-      that have been configured to use the Spanning Tree protocol. 
-   -- DualNet - In this mode, ports A and B are used as separate devices. 
-      If you have a dual port adapter, port A will be configured as eth0 
-      and port B as eth1. Both ports can be used independently with 
-      distinct IP addresses. The preferred port setting is not used. 
-      RLMT is turned off.
-   
-NOTE: RLMT modes CLP and CLPSS are designed to operate in configurations 
-      where a network path between the ports on one adapter exists. 
-      Moreover, they are not designed to work where adapters are connected
-      back-to-back.
-***
-5  Large Frame Support
-======================
-The driver supports large frames (also called jumbo frames). Using large 
-frames can result in an improved throughput if transferring large amounts 
-of data.
-To enable large frames, set the MTU (maximum transfer unit) of the 
-interface to the desired value (up to 9000), execute the following 
-command:
-      ifconfig eth0 mtu 9000
-This will only work if you have two adapters connected back-to-back
-or if you use a switch that supports large frames. When using a switch, 
-it should be configured to allow large frames and auto-negotiation should  
-be set to OFF. The setting must be configured on all adapters that can be 
-reached by the large frames. If one adapter is not set to receive large 
-frames, it will simply drop them.
-You can switch back to the standard ethernet frame size by executing the 
-following command:
-      ifconfig eth0 mtu 1500
-To permanently configure this setting, add a script with the 'ifconfig' 
-line to the system startup sequence (named something like "S99sk98lin" 
-in /etc/rc.d/rc2.d).
-***
-6  VLAN and Link Aggregation Support (IEEE 802.1, 802.1q, 802.3ad)
-==================================================================
-The Marvell Yukon/SysKonnect Linux drivers are able to support VLAN and 
-Link Aggregation according to IEEE standards 802.1, 802.1q, and 802.3ad. 
-These features are only available after installation of open source 
-modules available on the Internet:
-For VLAN go to: http://www.candelatech.com/~greear/vlan.html
-For Link Aggregation go to: http://www.st.rim.or.jp/~yumo
-NOTE: SysKonnect GmbH does not offer any support for these open source 
-      modules and does not take the responsibility for any kind of 
-      failures or problems arising in connection with these modules.
-NOTE: Configuring Link Aggregation on a SysKonnect dual link adapter may 
-      cause problems when unloading the driver.
-7  Troubleshooting
-==================
-If any problems occur during the installation process, check the 
-following list:
-Problem:  The SK-98xx adapter cannot be found by the driver.
-Solution: In /proc/pci search for the following entry:
-             'Ethernet controller: SysKonnect SK-98xx ...'
-          If this entry exists, the SK-98xx or SK-98xx V2.0 adapter has 
-          been found by the system and should be operational.
-          If this entry does not exist or if the file '/proc/pci' is not 
-          found, there may be a hardware problem or the PCI support may 
-          not be enabled in your kernel.
-          The adapter can be checked using the diagnostics program which 
-          is available on the SysKonnect web site:
-          www.syskonnect.com
-          
-          Some COMPAQ machines have problems dealing with PCI under Linux.
-          This problem is described in the 'PCI howto' document
-          (included in some distributions or available from the
-          web, e.g. at 'www.linux.org'). 
-Problem:  Programs such as 'ifconfig' or 'route' cannot be found or the 
-          error message 'Operation not permitted' is displayed.
-Reason:   You are not logged in as user 'root'.
-Solution: Logout and login as 'root' or change to 'root' via 'su'.
-Problem:  Upon use of the command 'ping <address>' the message
-          "ping: sendto: Network is unreachable" is displayed.
-Reason:   Your route is not set correctly.
-Solution: If you are using RedHat, you probably forgot to set up the 
-          route in the 'network configuration'.
-          Check the existing routes with the 'route' command and check 
-          if an entry for 'eth0' exists, and if so, if it is set correctly.
-Problem:  The driver can be started, the adapter is connected to the 
-          network, but you cannot receive or transmit any packets; 
-          e.g. 'ping' does not work.
-Reason:   There is an incorrect route in your routing table.
-Solution: Check the routing table with the command 'route' and read the 
-          manual help pages dealing with routes (enter 'man route').
-NOTE: Although the 2.2.x kernel versions generate the routing entry 
-      automatically, problems of this kind may occur here as well. We've 
-      come across a situation in which the driver started correctly at 
-      system start, but after the driver has been removed and reloaded,
-      the route of the adapter's network pointed to the 'dummy0'device 
-      and had to be corrected manually.
-Problem:  Your computer should act as a router between multiple 
-          IP subnetworks (using multiple adapters), but computers in 
-          other subnetworks cannot be reached.
-Reason:   Either the router's kernel is not configured for IP forwarding 
-          or the routing table and gateway configuration of at least one 
-          computer is not working.
-Problem:  Upon driver start, the following error message is displayed:
-          "eth0: -- ERROR --
-          Class: internal Software error
-          Nr:    0xcc
-          Msg:   SkGeInitPort() cannot init running ports"
-Reason:   You are using a driver compiled for single processor machines 
-          on a multiprocessor machine with SMP (Symmetric MultiProcessor) 
-          kernel.
-Solution: Configure your kernel appropriately and recompile the kernel or
-          the modules.
-If your problem is not listed here, please contact SysKonnect's technical
-support for help (linux@syskonnect.de).
-When contacting our technical support, please ensure that the following 
-information is available:
- System Manufacturer and HW Informations (CPU, Memory... )
- PCI-Boards in your system
- Distribution
- Kernel version
- Driver version
-***
-***End of Readme File***
diff --git a/Documentation/networking/spider_net.txt b/Documentation/networking/spider_net.txt
new file mode 100644
index 000000000000..4b4adb8eb14f
--- /dev/null
+++ b/Documentation/networking/spider_net.txt
@@ -0,0 +1,204 @@
+            The Spidernet Device Driver
+            ===========================
+Written by Linas Vepstas <linas@austin.ibm.com>
+Version of 7 June 2007
+Abstract
+========
+This document sketches the structure of portions of the spidernet
+device driver in the Linux kernel tree. The spidernet is a Gigabit
+Ethernet device built into the Toshiba southbridge commonly used
+in the Sony PlayStation 3 and the IBM QS20 Cell blade.
+The Structure of the RX Ring
+============================
+The receive (RX) ring is a circular linked list of RX descriptors,
+together with three pointers into the ring that are used to manage its
+contents.
+The elements of the ring are called "descriptors" or "descrs"; they
+describe the received data. This includes a pointer to a buffer
+containing the received data, the buffer size, and various status bits.
+There are three primary states that a descriptor can be in: "empty",
+"full" and "not-in-use". An "empty" or "ready" descriptor is ready
+to receive data from the hardware. A "full" descriptor has data in it,
+and is waiting to be emptied and processed by the OS. A "not-in-use"
+descriptor is neither empty nor full; it is simply not ready. It may
+not even have a data buffer attached, or may be otherwise unusable.
+During normal operation, on device startup, the OS (specifically, the
+spidernet device driver) allocates a set of RX descriptors and RX
+buffers. These are all marked "empty", ready to receive data. This
+ring is handed off to the hardware, which sequentially fills in the
+buffers, and marks them "full". The OS follows up, taking the full
+buffers, processing them, and re-marking them empty.
+This filling and emptying is managed by three pointers: the "head"
+and "tail" pointers, maintained by the OS, and the hardware current
+descriptor pointer (GDACTDPA). The GDACTDPA points at the descr
+currently being filled. When this descr is filled, the hardware
+marks it full and advances the GDACTDPA by one. Thus, when RX traffic
+is flowing, every descr behind it should be marked "full", and
+everything in front of it should be "empty". If the hardware
+discovers that the current descr is not empty, it will signal an
+interrupt and halt processing.
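A minimal C model of the hardware side, assuming a simplified ring that stores only a state per descr (the enum, struct, function names and ring size are hypothetical, not taken from the driver). The index standing in for GDACTDPA fills one descr per frame and halts on the first descr that is not empty.

```c
#include <stddef.h>

/* Hypothetical model of the three descriptor states described above. */
enum descr_state { DESCR_EMPTY, DESCR_FULL, DESCR_NOT_IN_USE };

#define RING_SIZE 8	/* illustrative size only */

struct rx_ring {
	enum descr_state descr[RING_SIZE];
	size_t hw;	/* stands in for GDACTDPA: descr being filled */
	int halted;	/* set once the hardware finds a non-empty descr */
};

/* Model the hardware receiving one frame: fill the current descr,
 * mark it full and advance by one. Returns 0 if the frame was stored,
 * or -1 if the hardware halted because the current descr was not empty
 * (the real chip also signals an interrupt at this point). */
int hw_receive_frame(struct rx_ring *r)
{
	if (r->descr[r->hw] != DESCR_EMPTY) {
		r->halted = 1;
		return -1;
	}
	r->descr[r->hw] = DESCR_FULL;
	r->hw = (r->hw + 1) % RING_SIZE;
	return 0;
}
```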
+The tail pointer trails the hardware pointer. When the hardware is
+ahead, the tail pointer will be pointing at a "full" descr. The OS
+will process this descr, mark it "not-in-use", and advance the tail
+pointer. Thus, when RX traffic is flowing, all of the descrs in front
+of the tail pointer (up to the hardware pointer) should be "full",
+and all of those behind it should be "not-in-use". When RX traffic is
+not flowing, the tail pointer can catch up to the hardware pointer.
+The OS will then note that the current tail is "empty", and halt
+processing.
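The tail-pointer step can be sketched the same way. This is a hypothetical model, not the driver's code: a real driver would hand each frame to the network stack where the comment indicates, and the names and ring size are illustrative.

```c
#include <stddef.h>

enum descr_state { DESCR_EMPTY, DESCR_FULL, DESCR_NOT_IN_USE };

#define RING_SIZE 8	/* illustrative size only */

/* Walk the tail pointer forward: each "full" descr is processed,
 * marked "not-in-use", and the tail advances. Processing halts at the
 * first descr that is not full (in normal operation, the "empty" descr
 * under the hardware pointer). Returns the number of descrs processed. */
size_t process_tail(enum descr_state ring[RING_SIZE], size_t *tail)
{
	size_t done = 0;

	while (ring[*tail] == DESCR_FULL && done < RING_SIZE) {
		/* a real driver would pass the frame up the stack here */
		ring[*tail] = DESCR_NOT_IN_USE;
		*tail = (*tail + 1) % RING_SIZE;
		done++;
	}
	return done;
}
```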
+The head pointer (somewhat misnamed) follows the tail pointer.
+When traffic is flowing, the head pointer will be pointing at
+a "not-in-use" descr. The OS will perform various housekeeping duties
+on this descr. This includes allocating a new data buffer and
+DMA-mapping it so as to make it visible to the hardware. The OS will
+then mark the descr as "empty", ready to receive data. Thus, when
+RX traffic is flowing, everything in front of the head pointer should
+be "not-in-use", and everything behind it should be "empty". If no
+RX traffic is flowing, the head pointer can catch up to the tail
+pointer, at which point the OS will notice that the head descr is
+"empty", and it will halt processing.
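A matching sketch of the head-pointer housekeeping. Here malloc() merely stands in for allocating and DMA-mapping a receive buffer, the struct layout and names are hypothetical, and the previous buffer is assumed to have already been passed up the stack by the tail step.

```c
#include <stddef.h>
#include <stdlib.h>

enum descr_state { DESCR_EMPTY, DESCR_FULL, DESCR_NOT_IN_USE };

#define RING_SIZE 8		/* illustrative size only */
#define RX_BUF_SIZE 2048	/* hypothetical buffer size */

struct descr {
	enum descr_state state;
	void *buf;	/* stands in for the DMA-mapped data buffer */
};

/* Walk the head pointer forward over "not-in-use" descrs: give each a
 * fresh buffer and mark it "empty" so the hardware can fill it again.
 * Stops at the first descr that is not "not-in-use" (in the idle case,
 * the empty descr under the tail) or when allocation fails, leaving
 * that descr for a later retry. Returns the number of descrs refilled. */
size_t refill_head(struct descr ring[RING_SIZE], size_t *head)
{
	size_t done = 0;

	while (ring[*head].state == DESCR_NOT_IN_USE && done < RING_SIZE) {
		/* a real driver would allocate and DMA-map a buffer here */
		void *buf = malloc(RX_BUF_SIZE);

		if (!buf)
			break;
		ring[*head].buf = buf;
		ring[*head].state = DESCR_EMPTY;
		*head = (*head + 1) % RING_SIZE;
		done++;
	}
	return done;
}
```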
+Thus, in an idle system, the GDACTDPA, tail and head pointers will
+all be pointing at the same descr, which should be "empty". All of the
+other descrs in the ring should be "empty" as well.
+The show_rx_chain() routine will print out the locations of the
+GDACTDPA, tail and head pointers. It will also summarize the contents
+of the ring, starting at the tail pointer, and listing the status
+of the descrs that follow.
+A typical example of the output, for a nearly idle system, might be:
+net eth1: Total number of descrs=256
+net eth1: Chain tail located at descr=20
+net eth1: Chain head is at 20
+net eth1: HW curr desc (GDACTDPA) is at 21
+net eth1: Have 1 descrs with stat=x40800101
+net eth1: HW next desc (GDACNEXTDA) is at 22
+net eth1: Last 255 descrs with stat=xa0800000
+In the above, the hardware has filled in one descr, number 20. Both
+head and tail are pointing at 20, because it has not yet been emptied.
+Meanwhile, the hardware pointer is at 21, which is free.
+The "Have nnn descrs" line refers to the run of descrs starting at the
+tail: in this case, nnn=1 descr, starting at descr 20. The "Last nnn
+descrs" line refers to all of the remaining descrs, counted from the
+last status change. Each "nnn" is a count of how many consecutive
+descrs have exactly the same status.
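The run-length summary that show_rx_chain() prints can be modelled as below. This is a hypothetical sketch that reproduces only the "Have/Last nnn descrs" counting from the raw status words; it is not the driver routine and ignores the GDACTDPA and GDACNEXTDA lines.

```c
#include <stddef.h>
#include <stdint.h>

struct run {
	uint32_t stat;	/* status value shared by every descr in the run */
	size_t count;	/* number of consecutive descrs with that value */
};

/* Start at the tail and walk once around the ring, collapsing
 * consecutive descrs with identical status into one run. Returns the
 * number of runs written to out; stops early if max_runs is reached. */
size_t summarize_ring(const uint32_t *stat, size_t n, size_t tail,
		      struct run *out, size_t max_runs)
{
	size_t nruns = 0;

	for (size_t i = 0; i < n; i++) {
		uint32_t s = stat[(tail + i) % n];

		if (nruns > 0 && out[nruns - 1].stat == s) {
			out[nruns - 1].count++;
		} else if (nruns < max_runs) {
			out[nruns].stat = s;
			out[nruns].count = 1;
			nruns++;
		} else {
			break;	/* out of space: summary truncated */
		}
	}
	return nruns;
}
```

Fed the idle example (256 descrs, tail at 20, only descr 20 full), this yields two runs: 1 descr with 0x40800101 followed by 255 descrs with 0xa0800000, matching the counts in the log.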
+The status x4... corresponds to "full" and status xa... corresponds
+to "empty". The actual value printed is RXCOMST_A.
+In the device driver source code, a different set of names is
+used for these same concepts:
+"empty" == SPIDER_NET_DESCR_CARDOWNED == 0xa
+"full"  == SPIDER_NET_DESCR_FRAME_END == 0x4
+"not in use" == SPIDER_NET_DESCR_NOT_IN_USE == 0xf
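A small decoder for the printed status values. It assumes, based on the x4.../xa... prefixes in the logs, that the state is the top four bits of the 32-bit RXCOMST_A value. The macro values are the ones listed above; the driver's own headers may define them in shifted form, and the helper names are hypothetical.

```c
#include <stdint.h>

/* Values as listed above; assumed to occupy the top 4 bits of the
 * status word. */
#define SPIDER_NET_DESCR_CARDOWNED	0xa	/* "empty" */
#define SPIDER_NET_DESCR_FRAME_END	0x4	/* "full" */
#define SPIDER_NET_DESCR_NOT_IN_USE	0xf	/* "not in use" */

/* Extract the state nibble from a status word, e.g. 0xa0800000 -> 0xa. */
static inline uint32_t descr_state(uint32_t stat)
{
	return stat >> 28;
}

/* Map a status word to the state names used in this document. */
const char *descr_state_name(uint32_t stat)
{
	switch (descr_state(stat)) {
	case SPIDER_NET_DESCR_CARDOWNED:	return "empty";
	case SPIDER_NET_DESCR_FRAME_END:	return "full";
	case SPIDER_NET_DESCR_NOT_IN_USE:	return "not in use";
	default:				return "other";
	}
}
```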
+The RX RAM full bug/feature
+===========================
+As long as the OS can empty out the RX buffers at a rate faster than
+the hardware can fill them, there is no problem. If, for some reason,
+the OS fails to empty the RX ring fast enough, the hardware GDACTDPA
+pointer will catch up to the head, notice the not-empty condition,
+and stop. However, RX packets may still continue arriving on the wire.
+The spidernet chip can save some limited number of these in local RAM.
+When this local RAM fills up, the spider chip will issue an interrupt
+indicating this (GHIINT0STS will show ERRINT, and the GRMFLLINT bit
+will be set in GHIINT1STS). When the RX RAM full condition occurs,
+a certain bug/feature is triggered that has to be specially handled.
+This section describes the special handling for this condition.
+When the OS finally has a chance to run, it will empty out the RX ring.
+In particular, it will clear the descriptor on which the hardware had
+stopped. However, once the hardware has decided that a certain
+descriptor is invalid, it will not restart at that descriptor; instead
+it will restart at the next descr. This can lead to a deadlock: the
+tail pointer will be pointing at this descr, which, from the OS point
+of view, is empty, so the OS waits for it to be filled. However, the
+hardware has skipped this descr and is filling the descrs after it.
+Since the OS doesn't see this, the OS ends up waiting for one descr
+to fill, while the hardware waits for a different set of descrs to
+become empty.
+A call to show_rx_chain() at this point indicates the nature of the
+problem. A typical print when the network is hung shows the following:
+net eth1: Spider RX RAM full, incoming packets might be discarded!
+net eth1: Total number of descrs=256
+net eth1: Chain tail located at descr=255
+net eth1: Chain head is at 255
+net eth1: HW curr desc (GDACTDPA) is at 0
+net eth1: Have 1 descrs with stat=xa0800000
+net eth1: HW next desc (GDACNEXTDA) is at 1
+net eth1: Have 127 descrs with stat=x40800101
+net eth1: Have 1 descrs with stat=x40800001
+net eth1: Have 126 descrs with stat=x40800101
+net eth1: Last 1 descrs with stat=xa0800000
+Both the tail and head pointers are pointing at descr 255, which is
+marked xa... which is "empty". Thus, from the OS point of view, there
+is nothing to be done. In particular, there is the implicit assumption
+that everything in front of the "empty" descr must surely also be empty,
+as explained in the last section. The OS is waiting for descr 255 to
+become non-empty, which, in this case, will never happen.
+The HW pointer is at descr 0. This descr is marked x4..., or "full".
+Since it's already full, the hardware can do nothing more, and thus has
+halted processing. Notice that descrs 0 through 253 are all marked
+"full", while descrs 254 and 255 are empty. (The "Last 1 descrs" is
+descr 254, since tail was at 255.) Thus, the system is deadlocked,
+and there can be no forward progress; the OS thinks there's nothing
+to do, and the hardware has nowhere to put incoming data.
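The condition just described can be written as a predicate over the status words and the two pointers. This is a hypothetical diagnostic sketch that assumes the state is the top four bits of each status word; the driver itself detects the problem differently, from RX interrupts that find the chain apparently empty.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: the descr state is the top 4 bits of the status word. */
#define STATE(stat)	((stat) >> 28)
#define ST_EMPTY	0xa
#define ST_FULL		0x4

/* Return nonzero if the ring is in the deadlock described above: the
 * OS sees nothing to do (the descr at the tail is empty), yet the
 * hardware is stuck because the descr at GDACTDPA is already full. */
int rx_ring_deadlocked(const uint32_t *stat, size_t tail, size_t hw)
{
	return STATE(stat[tail]) == ST_EMPTY && STATE(stat[hw]) == ST_FULL;
}
```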
+This bug/feature is worked around with the spider_net_resync_head_ptr()
+routine. When the driver receives RX interrupts, but an examination
+of the RX chain seems to show it is empty, then it is probable that
+the hardware has skipped a descr or two (sometimes dozens under heavy
+network conditions). The spider_net_resync_head_ptr() subroutine will
+search the ring for the next full descr, and the driver will resume
+operations there. Since this will leave "holes" in the ring, there
+is also a spider_net_resync_tail_ptr() that will skip over such holes.
+As of this writing, this resync strategy seems to work very
+well, even under heavy network loads.
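The core of the head-pointer resync, searching forward for the next full descr, might look like this. It is a sketch of the idea only, not the body of spider_net_resync_head_ptr(); the function name and the top-four-bits state encoding are assumptions.

```c
#include <stddef.h>
#include <stdint.h>

#define STATE(stat)	((stat) >> 28)	/* assumption: state in top nibble */
#define ST_FULL		0x4

/* Starting at the stalled position, search once around the ring for
 * the next "full" descr and return its index, or -1 if none is full.
 * Processing would then resume there, skipping the hole(s) left by
 * the hardware. */
long find_next_full(const uint32_t *stat, size_t n, size_t start)
{
	for (size_t i = 0; i < n; i++) {
		size_t idx = (start + i) % n;

		if (STATE(stat[idx]) == ST_FULL)
			return (long)idx;
	}
	return -1;
}
```

Applied to the hung ring above with the search starting at the tail (descr 255), this returns 0, the descr where the hardware is waiting.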
+The TX ring
+===========
+The TX ring uses a low-watermark interrupt scheme to make sure that
+the TX queue is appropriately serviced for large packet sizes.
+For packet sizes greater than about 1 KB, the kernel can fill
+the TX ring quicker than the device can drain it. Once the ring
+is full, the netdev is stopped. When there is room in the ring,
+the netdev needs to be reawakened, so that more TX packets are placed
+in the ring. The hardware can empty the ring about four times per jiffy,
+so it's not appropriate to wait for the poll routine to refill, since
+the poll routine runs only once per jiffy. The low-watermark mechanism
+marks a descr about one quarter of the way from the bottom of the
+queue, so that an interrupt is generated when the descr is processed.
+This interrupt wakes up the netdev, which can then refill the queue.
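Placing the low-watermark descr reduces to simple index arithmetic. The helper below is hypothetical, and reading "about one quarter of the way from the bottom" as three quarters of the way from the tail (the oldest queued descr) is an assumption about the ring's orientation.

```c
#include <stddef.h>

/* Return the index of the TX descr to flag for a completion interrupt,
 * given the tail index (oldest queued descr), the number of queued
 * descrs and the ring size. The flagged descr sits three quarters of
 * the way into the queued work, so about a quarter remains when the
 * interrupt fires. */
size_t tx_low_watermark(size_t tail, size_t queued, size_t ring_size)
{
	size_t offset = (queued * 3) / 4;

	return (tail + offset) % ring_size;
}
```

With a full 256-descr queue starting at index 0, this marks descr 192, so the netdev is woken while roughly 64 descrs are still waiting to be sent, giving the kernel time to refill before the hardware runs dry.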
+For large packets, this mechanism generates a relatively small number
+of interrupts, about 1000 per second. For smaller packets, this drops
+to zero interrupts, as the hardware can empty the queue faster than
+the kernel can fill it.
+ ======= END OF DOCUMENT ========