author Linus Torvalds <torvalds@woody.linux-foundation.org> 2007-07-12 16:31:22 -0400
committer Linus Torvalds <torvalds@woody.linux-foundation.org> 2007-07-12 16:31:22 -0400
commit e1bd2ac5a6b7a8b625e40c9e9f8b6dea4cf22f85
tree 9366e9fb481da2c7195ca3f2bafeffebbf001363 /Documentation
parent 0b9062f6b57a87f22309c6b920a51aaa66ce2a13
parent 15028aad00ddf241581fbe74a02ec89cbb28d35d
Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6

* 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6: (183 commits)
  [TG3]: Update version to 3.78.
  [TG3]: Add missing NVRAM strapping.
  [TG3]: Enable auto MDI.
  [TG3]: Fix the polarity bit.
  [TG3]: Fix irq_sync race condition.
  [NET_SCHED]: ematch: module autoloading
  [TCP]: tcp probe wraparound handling and other changes
  [RTNETLINK]: rtnl_link: allow specifying initial device address
  [RTNETLINK]: rtnl_link API simplification
  [VLAN]: Fix MAC address handling
  [ETH]: Validate address in eth_mac_addr
  [NET]: Fix races in net_rx_action vs netpoll.
  [AF_UNIX]: Rewrite garbage collector, fixes race.
  [NETFILTER]: {ip, nf}_conntrack_sctp: fix remotely triggerable NULL ptr dereference (CVE-2007-2876)
  [NET]: Make all initialized struct seq_operations const.
  [UDP]: Fix length check.
  [IPV6]: Remove unneeded pointer idev from addrconf_cleanup().
  [DECNET]: Another unnecessary net/tcp.h inclusion in net/dn.h
  [IPV6]: Make IPV6_{RECV,2292}RTHDR boolean options.
  [IPV6]: Do not send RH0 anymore.
  ...

Fixed up trivial conflict in Documentation/feature-removal-schedule.txt
manually.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Diffstat (limited to 'Documentation')
-rw-r--r--  Documentation/feature-removal-schedule.txt |  27
-rw-r--r--  Documentation/networking/ip-sysctl.txt     |   3
-rw-r--r--  Documentation/networking/l2tp.txt          | 169
-rw-r--r--  Documentation/networking/multiqueue.txt    | 111
-rw-r--r--  Documentation/networking/netdevices.txt    |  38
5 files changed, 321 insertions, 27 deletions
diff --git a/Documentation/feature-removal-schedule.txt b/Documentation/feature-removal-schedule.txt
index 281458b47d75..0599a0c7c026 100644
--- a/Documentation/feature-removal-schedule.txt
+++ b/Documentation/feature-removal-schedule.txt
@@ -262,25 +262,6 @@ Who: Richard Purdie <rpurdie@rpsys.net>
 
 ---------------------------
 
-What:	Multipath cached routing support in ipv4
-When:	in 2.6.23
-Why:	Code was merged, then submitter immediately disappeared leaving
-	us with no maintainer and lots of bugs. The code should not have
-	been merged in the first place, and many aspects of it's
-	implementation are blocking more critical core networking
-	development. It's marked EXPERIMENTAL and no distribution
-	enables it because it cause obscure crashes due to unfixable bugs
-	(interfaces don't return errors so memory allocation can't be
-	handled, calling contexts of these interfaces make handling
-	errors impossible too because they get called after we've
-	totally commited to creating a route object, for example).
-	This problem has existed for years and no forward progress
-	has ever been made, and nobody steps up to try and salvage
-	this code, so we're going to finally just get rid of it.
-Who:	David S. Miller <davem@davemloft.net>
-
----------------------------
-
 What:	read_dev_chars(), read_conf_data{,_lpm}() (s390 common I/O layer)
 When:	December 2007
 Why:	These functions are a leftover from 2.4 times. They have several
@@ -337,3 +318,11 @@ Who: Jean Delvare <khali@linux-fr.org>
 
 ---------------------------
 
+What:	iptables SAME target
+When:	1.1. 2008
+Files:	net/ipv4/netfilter/ipt_SAME.c, include/linux/netfilter_ipv4/ipt_SAME.h
+Why:	Obsolete for multiple years now, NAT core provides the same behaviour.
+	Unfixably broken wrt. 32/64 bit cleanness.
+Who:	Patrick McHardy <kaber@trash.net>
+
+---------------------------
diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt
index af6a63ab9026..09c184e41cf8 100644
--- a/Documentation/networking/ip-sysctl.txt
+++ b/Documentation/networking/ip-sysctl.txt
@@ -874,8 +874,7 @@ accept_redirects - BOOLEAN
 accept_source_route - INTEGER
 	Accept source routing (routing extension header).
 
-	> 0: Accept routing header.
-	= 0: Accept only routing header type 2.
+	>= 0: Accept only routing header type 2.
 	< 0: Do not accept routing header.
 
 	Default: 0
diff --git a/Documentation/networking/l2tp.txt b/Documentation/networking/l2tp.txt
new file mode 100644
index 000000000000..2451f551c505
--- /dev/null
+++ b/Documentation/networking/l2tp.txt
@@ -0,0 +1,169 @@
+This brief document describes how to use the kernel's PPPoL2TP driver
+to provide L2TP functionality. L2TP is a protocol that tunnels one or
+more PPP sessions over a UDP tunnel. It is commonly used for VPNs
+(L2TP/IPSec) and by ISPs to tunnel subscriber PPP sessions over an IP
+network infrastructure.
+
+Design
+======
+
+The PPPoL2TP driver, drivers/net/pppol2tp.c, provides a mechanism by
+which PPP frames carried through an L2TP session are passed through
+the kernel's PPP subsystem. The standard PPP daemon, pppd, handles all
+PPP interaction with the peer. PPP network interfaces are created for
+each local PPP endpoint.
+
+The L2TP protocol (RFC 2661, http://www.faqs.org/rfcs/rfc2661.html)
+defines L2TP control and data frames. L2TP control frames carry
+messages between L2TP clients/servers and are used to set up and tear
+down tunnels and sessions. An L2TP client or server is implemented in
+userspace and will use a regular UDP socket per tunnel. L2TP data
+frames carry PPP frames, which may be PPP control or PPP data. The
+kernel's PPP subsystem arranges for PPP control frames to be delivered
+to pppd, while data frames are forwarded as usual.
+
+Each tunnel and session within a tunnel is assigned a unique tunnel_id
+and session_id. These ids are carried in the L2TP header of every
+control and data packet. The pppol2tp driver uses them to look up
+internal tunnel and/or session contexts. Zero tunnel / session ids are
+treated specially - zero ids are never assigned to tunnels or sessions
+in the network. In the driver, the tunnel context keeps a pointer to
+the tunnel UDP socket. The session context keeps a pointer to the
+PPPoL2TP socket, as well as other data that lets the driver interface
+to the kernel PPP subsystem.
+
+Note that the pppol2tp kernel driver handles only L2TP data frames;
+L2TP control frames are simply passed up to userspace in the UDP
+tunnel socket. The kernel handles all datapath aspects of the
+protocol, including data packet resequencing (if enabled).
+
+There are a number of requirements on the userspace L2TP daemon in
+order to use the pppol2tp driver.
+
+1. Use a UDP socket per tunnel.
+
+2. Create a single PPPoL2TP socket per tunnel bound to a special null
+   session id. This is used only for communicating with the driver but
+   must remain open while the tunnel is active. Opening this tunnel
+   management socket causes the driver to mark the tunnel socket as an
+   L2TP UDP encapsulation socket and flags it for use by the
+   referenced tunnel id. This hooks up the UDP receive path via
+   udp_encap_rcv() in net/ipv4/udp.c. PPP data frames are never passed
+   in this special PPPoX socket.
+
+3. Create a PPPoL2TP socket per L2TP session. This is typically done
+   by starting pppd with the pppol2tp plugin and appropriate
+   arguments. A PPPoL2TP tunnel management socket (Step 2) must be
+   created before the first PPPoL2TP session socket is created.
+
+When creating PPPoL2TP sockets, the application provides information
+to the driver about the socket in a socket connect() call. Source and
+destination tunnel and session ids are provided, as well as the file
+descriptor of a UDP socket. See struct pppol2tp_addr in
+include/linux/if_pppol2tp.h. Note that zero tunnel / session ids are
+treated specially. When creating the per-tunnel PPPoL2TP management
+socket in Step 2 above, zero source and destination session ids are
+specified, which tells the driver to prepare the supplied UDP file
+descriptor for use as an L2TP tunnel socket.
+
+Userspace may control behavior of the tunnel or session using
+setsockopt and ioctl on the PPPoX socket. The following socket
+options are supported:
+
+DEBUG     - bitmask of debug message categories. See below.
+SENDSEQ   - 0 => don't send packets with sequence numbers
+            1 => send packets with sequence numbers
+RECVSEQ   - 0 => receive packet sequence numbers are optional
+            1 => drop receive packets without sequence numbers
+LNSMODE   - 0 => act as LAC.
+            1 => act as LNS.
+REORDERTO - reorder timeout (in milliseconds). If 0, don't try to reorder.
+
+Only the DEBUG option is supported by the special tunnel management
+PPPoX socket.
+
+In addition to the standard PPP ioctls, a PPPIOCGL2TPSTATS is provided
+to retrieve tunnel and session statistics from the kernel using the
+PPPoX socket of the appropriate tunnel or session.
+
+Debugging
+=========
+
+The driver supports a flexible debug scheme where kernel trace
+messages may be optionally enabled per tunnel and per session. Care is
+needed when debugging a live system since the messages are not
+rate-limited and a busy system could be swamped. Userspace uses
+setsockopt on the PPPoX socket to set a debug mask.
+
+The following debug mask bits are available:
+
+PPPOL2TP_MSG_DEBUG    verbose debug (if compiled in)
+PPPOL2TP_MSG_CONTROL  userspace - kernel interface
+PPPOL2TP_MSG_SEQ      sequence numbers handling
+PPPOL2TP_MSG_DATA     data packets
+
+Sample Userspace Code
+=====================
+
+1. Create tunnel management PPPoX socket
+
+	kernel_fd = socket(AF_PPPOX, SOCK_DGRAM, PX_PROTO_OL2TP);
+	if (kernel_fd >= 0) {
+		struct sockaddr_pppol2tp sax;
+		struct sockaddr_in const *peer_addr;
+
+		peer_addr = l2tp_tunnel_get_peer_addr(tunnel);
+		memset(&sax, 0, sizeof(sax));
+		sax.sa_family = AF_PPPOX;
+		sax.sa_protocol = PX_PROTO_OL2TP;
+		sax.pppol2tp.fd = udp_fd;	/* fd of tunnel UDP socket */
+		sax.pppol2tp.addr.sin_addr.s_addr = peer_addr->sin_addr.s_addr;
+		sax.pppol2tp.addr.sin_port = peer_addr->sin_port;
+		sax.pppol2tp.addr.sin_family = AF_INET;
+		sax.pppol2tp.s_tunnel = tunnel_id;
+		sax.pppol2tp.s_session = 0;	/* special case: mgmt socket */
+		sax.pppol2tp.d_tunnel = 0;
+		sax.pppol2tp.d_session = 0;	/* special case: mgmt socket */
+
+		if (connect(kernel_fd, (struct sockaddr *)&sax, sizeof(sax)) < 0) {
+			perror("connect failed");
+			result = -errno;
+			goto err;
+		}
+	}
+
+2. Create session PPPoX data socket
+
+	struct sockaddr_pppol2tp sax = { 0 };	/* start from a zeroed address */
+	int ret;
+
+	/* Note, the target socket must be bound already, else it will not be ready */
+	sax.sa_family = AF_PPPOX;
+	sax.sa_protocol = PX_PROTO_OL2TP;
+	sax.pppol2tp.fd = tunnel_fd;
+	sax.pppol2tp.addr.sin_addr.s_addr = addr->sin_addr.s_addr;
+	sax.pppol2tp.addr.sin_port = addr->sin_port;
+	sax.pppol2tp.addr.sin_family = AF_INET;
+	sax.pppol2tp.s_tunnel = tunnel_id;
+	sax.pppol2tp.s_session = session_id;
+	sax.pppol2tp.d_tunnel = peer_tunnel_id;
+	sax.pppol2tp.d_session = peer_session_id;
+
+	/* session_fd is the fd of the session's PPPoL2TP socket.
+	 * tunnel_fd is the fd of the tunnel UDP socket.
+	 */
+	ret = connect(session_fd, (struct sockaddr *)&sax, sizeof(sax));
+	if (ret < 0) {
+		return -errno;
+	}
+	return 0;
+
+Miscellaneous
+=============
+
+The PPPoL2TP driver was developed as part of the OpenL2TP project by
+Katalix Systems Ltd. OpenL2TP is a full-featured L2TP client / server,
+designed from the ground up to have the L2TP datapath in the
+kernel. The project also implemented the pppol2tp plugin for pppd
+which allows pppd to use the kernel driver. Details can be found at
+http://openl2tp.sourceforge.net.
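
Note on l2tp.txt above: the text says the tunnel and session ids are carried in the L2TP header of every packet and are used to look up the tunnel or session context. The following is a minimal userspace sketch of extracting those ids, assuming the RFC 2661 header layout (16-bit flags/version word, optional Length, Tunnel ID, Session ID, optional Ns/Nr). The names l2tp_ids, l2tp_parse_ids and the L2TP_HDRFLAG_* macros are hypothetical and are not taken from drivers/net/pppol2tp.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bits of the first 16-bit header word (RFC 2661, section 3.1). */
#define L2TP_HDRFLAG_T    0x8000u /* 1 = control message, 0 = data message */
#define L2TP_HDRFLAG_L    0x4000u /* Length field present */
#define L2TP_HDRFLAG_S    0x0800u /* Ns and Nr fields present */
#define L2TP_HDR_VER_MASK 0x000Fu /* protocol version, must be 2 */

/* Hypothetical result type for this sketch. */
struct l2tp_ids {
	int is_control;      /* non-zero for control frames */
	int has_seq;         /* non-zero if Ns/Nr are present */
	uint16_t tunnel_id;
	uint16_t session_id;
	uint16_t ns;         /* sequence number, valid only if has_seq */
};

/* Read a big-endian (network order) 16-bit value. */
static uint16_t get_be16(const uint8_t *p)
{
	return (uint16_t)((p[0] << 8) | p[1]);
}

/* Returns 0 on success, -1 if the buffer is truncated or not L2TPv2. */
int l2tp_parse_ids(const uint8_t *buf, size_t len, struct l2tp_ids *out)
{
	size_t off = 2; /* skip the flags/version word */
	uint16_t flags;

	assert(buf != NULL && out != NULL);

	if (len < 6) /* flags + tunnel id + session id */
		return -1;

	flags = get_be16(buf);
	if ((flags & L2TP_HDR_VER_MASK) != 2)
		return -1;

	if (flags & L2TP_HDRFLAG_L)
		off += 2; /* skip the optional Length field */
	if (len < off + 4)
		return -1;

	out->is_control = (flags & L2TP_HDRFLAG_T) != 0;
	out->tunnel_id = get_be16(buf + off);
	out->session_id = get_be16(buf + off + 2);
	off += 4;

	out->has_seq = (flags & L2TP_HDRFLAG_S) != 0;
	out->ns = 0;
	if (out->has_seq) {
		if (len < off + 4) /* Ns + Nr */
			return -1;
		out->ns = get_be16(buf + off);
	}
	return 0;
}
```

A receiver built along these lines would hand frames with is_control set to the userspace daemon and use (tunnel_id, session_id) as the lookup key for data frames, which matches the split of responsibilities the document describes.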
diff --git a/Documentation/networking/multiqueue.txt b/Documentation/networking/multiqueue.txt
new file mode 100644
index 000000000000..00b60cce2224
--- /dev/null
+++ b/Documentation/networking/multiqueue.txt
@@ -0,0 +1,111 @@
+
+		HOWTO for multiqueue network device support
+		===========================================
+
+Section 1: Base driver requirements for implementing multiqueue support
+Section 2: Qdisc support for multiqueue devices
+Section 3: Brief howto using PRIO and RR for multiqueue devices
+
+
+Intro: Kernel support for multiqueue devices
+---------------------------------------------------------
+
+Kernel support for multiqueue devices is only an API that is presented to the
+netdevice layer for base drivers to implement. This feature is part of the
+core networking stack, and all network devices will be running on the
+multiqueue-aware stack. If a base driver only has one queue, then these
+changes are transparent to that driver.
+
+
+Section 1: Base driver requirements for implementing multiqueue support
+-----------------------------------------------------------------------
+
+Base drivers are required to use the new alloc_etherdev_mq() or
+alloc_netdev_mq() functions to allocate the subqueues for the device. The
+underlying kernel API will take care of the allocation and deallocation of
+the subqueue memory, as well as netdev configuration of where the queues
+exist in memory.
+
+The base driver will also need to manage the queues as it does the global
+netdev->queue_lock today. Therefore base drivers should use the
+netif_{start|stop|wake}_subqueue() functions to manage each queue while the
+device is still operational. netdev->queue_lock is still used when the device
+comes online or when it's completely shut down (unregister_netdev(), etc.).
+
+Finally, the base driver should indicate that it is a multiqueue device. The
+feature flag NETIF_F_MULTI_QUEUE should be added to the netdev->features
+bitmap on device initialization. Below is an example from e1000:
+
+#ifdef CONFIG_E1000_MQ
+	if ( (adapter->hw.mac.type == e1000_82571) ||
+	     (adapter->hw.mac.type == e1000_82572) ||
+	     (adapter->hw.mac.type == e1000_80003es2lan))
+		netdev->features |= NETIF_F_MULTI_QUEUE;
+#endif
+
+
+Section 2: Qdisc support for multiqueue devices
+-----------------------------------------------
+
+Currently two qdiscs support multiqueue devices: a new round-robin qdisc,
+sch_rr, and sch_prio. The qdisc is responsible for classifying the skbs to
+bands and queues, and will store the queue mapping into skb->queue_mapping.
+Use this field in the base driver to determine which queue to send the skb
+to.
+
+sch_rr has been added for hardware that doesn't want scheduling policies from
+software, so it's a straight round-robin qdisc. It uses the same syntax and
+classification priomap that sch_prio uses, so it should be intuitive to
+configure for people who've used sch_prio.
+
+The PRIO qdisc naturally plugs into a multiqueue device. If PRIO has been
+built with NET_SCH_PRIO_MQ, then upon load, it will make sure the number of
+bands requested is equal to the number of queues on the hardware. If they
+are equal, it sets a one-to-one mapping up between the queues and bands. If
+they're not equal, it will not load the qdisc. This is the same behavior
+for RR. Once the association is made, any skb that is classified will have
+skb->queue_mapping set, which will allow the driver to properly queue skbs
+to multiple queues.
+
+
+Section 3: Brief howto using PRIO and RR for multiqueue devices
+---------------------------------------------------------------
+
+The userspace command 'tc,' part of the iproute2 package, is used to configure
+qdiscs. To add the PRIO qdisc to your network device, assuming the device is
+called eth0, run the following command:
+
+# tc qdisc add dev eth0 root handle 1: prio bands 4 multiqueue
+
+This will create 4 bands, 0 being highest priority, and associate those bands
+to the queues on your NIC. Assuming eth0 has 4 Tx queues, the band mapping
+would look like:
+
+band 0 => queue 0
+band 1 => queue 1
+band 2 => queue 2
+band 3 => queue 3
+
+Traffic will begin flowing through each queue if your TOS values are assigning
+traffic across the various bands. For example, ssh traffic will always try to
+go out band 0 based on TOS -> Linux priority conversion (realtime traffic),
+so it will be sent out queue 0. ICMP traffic (pings) falls into the "normal"
+traffic classification, which is band 1. Therefore pings will be sent out
+queue 1 on the NIC.
+
+Note the use of the multiqueue keyword. This is only in versions of iproute2
+that support multiqueue networking devices; if this is omitted when loading
+a qdisc onto a multiqueue device, the qdisc will load and operate the same
+as if it were loaded onto a single-queue device (i.e. it sends all traffic
+to queue 0).
+
+An alternative way to allocate multiqueue bands is to use the multiqueue
+option and specify 0 bands. If this is the case, the qdisc will allocate
+the number of bands to equal the number of queues that the device reports,
+and bring the qdisc online.
+
+The behavior of tc filters remains the same: they will override TOS priority
+classification.
+
+
+Author: Peter P. Waskiewicz Jr. <peter.p.waskiewicz.jr@intel.com>
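
Note on multiqueue.txt above: the band-to-queue rules in Sections 2 and 3 (bands must equal hardware queues, a request for 0 bands adopts the queue count, otherwise the qdisc refuses to load) can be sketched in plain C. This is an illustration of the described behavior only, not the sch_prio or sch_rr code; mq_band_map, mq_map_bands and MQ_MAX_BANDS are hypothetical names and the limit of 16 is an assumption made for the sketch.

```c
#include <assert.h>
#include <stddef.h>

#define MQ_MAX_BANDS 16 /* arbitrary limit chosen for this sketch */

/* Hypothetical mapping table: band index -> hardware queue index. */
struct mq_band_map {
	unsigned int bands;
	unsigned int band_to_queue[MQ_MAX_BANDS];
};

/*
 * Build a one-to-one band/queue mapping.
 * Returns 0 on success, -1 if the qdisc should refuse to load.
 */
int mq_map_bands(unsigned int requested_bands, unsigned int hw_queues,
		 struct mq_band_map *map)
{
	unsigned int i;

	assert(map != NULL);

	if (hw_queues == 0 || hw_queues > MQ_MAX_BANDS)
		return -1;

	/* "multiqueue" with 0 bands: take the queue count from the device. */
	if (requested_bands == 0)
		requested_bands = hw_queues;

	/* Bands and queues must match, otherwise do not load. */
	if (requested_bands != hw_queues)
		return -1;

	map->bands = requested_bands;
	for (i = 0; i < requested_bands; i++)
		map->band_to_queue[i] = i; /* band i => queue i */

	return 0;
}
```

With four hardware queues, a request for 4 bands or for 0 bands yields the mapping shown in Section 3, while a request for 3 bands is rejected.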
diff --git a/Documentation/networking/netdevices.txt b/Documentation/networking/netdevices.txt
index ce1361f95243..37869295fc70 100644
--- a/Documentation/networking/netdevices.txt
+++ b/Documentation/networking/netdevices.txt
@@ -20,6 +20,30 @@ private data which gets freed when the network device is freed. If
 separately allocated data is attached to the network device
 (dev->priv) then it is up to the module exit handler to free that.
 
+MTU
+===
+Each network device has a Maximum Transfer Unit. The MTU does not
+include any link layer protocol overhead. Upper layer protocols must
+not pass a socket buffer (skb) to a device to transmit with more data
+than the MTU. For example, on Ethernet, if the standard MTU of 1500
+bytes is used, the actual skb will contain up to 1514 bytes because
+of the 14 byte Ethernet header. Devices should allow for the 4 byte
+VLAN header as well.
+
+Segmentation Offload (GSO, TSO) is an exception to this rule. The
+upper layer protocol may pass a large socket buffer to the device
+transmit routine, and the device will break that up into separate
+packets based on the current MTU.
+
+MTU is symmetrical and applies both to receive and transmit. A device
+must be able to receive at least the maximum size packet allowed by
+the MTU. A network device may use the MTU as a mechanism to size
+receive buffers, but the device should allow packets with a VLAN
+header. With the standard Ethernet MTU of 1500 bytes, the device
+should allow up to 1518 byte packets (1500 + 14 header + 4 tag). The
+device may drop, truncate, or pass up oversize packets, but dropping
+oversize packets is preferred.
+
 
 struct net_device synchronization rules
 =======================================
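
Note on the MTU hunk above: the frame-size arithmetic it states (1500 + 14 = 1514 on transmit, 1500 + 14 + 4 = 1518 on receive with a VLAN tag) can be written out as a small sketch. The constant and function names below are hypothetical and chosen to avoid clashing with the kernel's own definitions; the upper bound in the assertion is only a sanity limit for the sketch.

```c
#include <assert.h>

/* Header sizes named in the text; identifiers are local to this sketch. */
#define SKETCH_ETH_HDR_LEN  14u /* Ethernet header */
#define SKETCH_VLAN_TAG_LEN 4u  /* VLAN tag */

/* Largest skb an upper layer may hand to an Ethernet device for a given MTU. */
unsigned int sketch_eth_max_tx_len(unsigned int mtu)
{
	assert(mtu <= 65535u); /* sanity bound for this sketch */
	return mtu + SKETCH_ETH_HDR_LEN;
}

/* Largest frame the device should accept on receive, VLAN tag included. */
unsigned int sketch_eth_max_rx_len(unsigned int mtu)
{
	assert(mtu <= 65535u); /* sanity bound for this sketch */
	return mtu + SKETCH_ETH_HDR_LEN + SKETCH_VLAN_TAG_LEN;
}
```

A driver sizing receive buffers from the MTU would use the second value, so that tagged frames at the full MTU are not dropped as oversize.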
@@ -43,16 +67,17 @@ dev->get_stats:
 
 dev->hard_start_xmit:
 	Synchronization: netif_tx_lock spinlock.
+
 	When the driver sets NETIF_F_LLTX in dev->features this will be
 	called without holding netif_tx_lock. In this case the driver
 	has to lock by itself when needed. It is recommended to use a try lock
-	for this and return -1 when the spin lock fails.
+	for this and return NETDEV_TX_LOCKED when the spin lock fails.
 	The locking there should also properly protect against
-	set_multicast_list
-	Context: Process with BHs disabled or BH (timer).
-	Notes: netif_queue_stopped() is guaranteed false
-	Interrupts must be enabled when calling hard_start_xmit.
-	(Interrupts must also be enabled when enabling the BH handler.)
+	set_multicast_list.
+
+	Context: Process with BHs disabled or BH (timer),
+	         will be called with interrupts disabled by netconsole.
+
 	Return codes:
 	o NETDEV_TX_OK everything ok.
 	o NETDEV_TX_BUSY Cannot transmit packet, try later
@@ -74,4 +99,5 @@ dev->poll:
 	Synchronization: __LINK_STATE_RX_SCHED bit in dev->state. See
 		dev_close code and comments in net/core/dev.c for more info.
 	Context: softirq
+	         will be called with interrupts disabled by netconsole.
 
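
Note on the hard_start_xmit hunk above: the recommendation to take the driver's own lock with a try lock and report a "locked" status instead of spinning can be illustrated with a userspace analogy. This sketch uses C11 atomics in place of a kernel spinlock; every identifier (sketch_dev, sketch_start_xmit, the SKETCH_TX_* codes and their values) is hypothetical and only mirrors the return codes named in the text.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-ins for the return codes named in the text; values are arbitrary. */
enum sketch_tx_status {
	SKETCH_TX_OK = 0,      /* packet queued */
	SKETCH_TX_BUSY = 1,    /* no room, try later */
	SKETCH_TX_LOCKED = -1  /* driver lock was contended */
};

/* Hypothetical device state protected by a driver-private lock. */
struct sketch_dev {
	atomic_bool tx_lock;     /* true while a transmitter holds the lock */
	unsigned int ring_free;  /* free TX descriptors */
	unsigned int tx_packets; /* packets accepted so far */
};

void sketch_dev_init(struct sketch_dev *dev, unsigned int ring_size)
{
	assert(dev != NULL);
	atomic_init(&dev->tx_lock, false);
	dev->ring_free = ring_size;
	dev->tx_packets = 0;
}

enum sketch_tx_status sketch_start_xmit(struct sketch_dev *dev)
{
	enum sketch_tx_status status;

	assert(dev != NULL);

	/* Try lock: if someone else holds it, report that and do not spin. */
	if (atomic_exchange_explicit(&dev->tx_lock, true, memory_order_acquire))
		return SKETCH_TX_LOCKED;

	if (dev->ring_free == 0) {
		status = SKETCH_TX_BUSY;
	} else {
		dev->ring_free--;
		dev->tx_packets++;
		status = SKETCH_TX_OK;
	}

	/* Release the lock on every path out of the critical section. */
	atomic_store_explicit(&dev->tx_lock, false, memory_order_release);
	return status;
}
```

The point of the pattern is that the transmit path never blocks on its own lock; the caller sees the "locked" status and can retry the packet later.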