Diffstat (limited to 'Documentation/networking')
-rw-r--r--  Documentation/networking/altera_tse.txt    | 263
-rw-r--r--  Documentation/networking/bonding.txt       |  96
-rw-r--r--  Documentation/networking/can.txt           |   2
-rw-r--r--  Documentation/networking/filter.txt        | 125
-rw-r--r--  Documentation/networking/gianfar.txt       |  30
-rw-r--r--  Documentation/networking/igb.txt           |  48
-rw-r--r--  Documentation/networking/phy.txt           |  11
-rw-r--r--  Documentation/networking/pktgen.txt        |  24
-rw-r--r--  Documentation/networking/rxrpc.txt         |  81
-rw-r--r--  Documentation/networking/spider_net.txt    |   2
-rw-r--r--  Documentation/networking/tcp.txt           |   2
-rw-r--r--  Documentation/networking/timestamping.txt  |   6
12 files changed, 570 insertions, 120 deletions
diff --git a/Documentation/networking/altera_tse.txt b/Documentation/networking/altera_tse.txt
new file mode 100644
index 000000000000..3f24df8c6e65
--- /dev/null
+++ b/Documentation/networking/altera_tse.txt
@@ -0,0 +1,263 @@
+Altera Triple-Speed Ethernet MAC driver
+
+Copyright (C) 2008-2014 Altera Corporation
+
+This is the driver for the Altera Triple-Speed Ethernet (TSE) controllers
+using the SGDMA and MSGDMA soft DMA IP components. The driver uses the
+platform bus to obtain component resources. The designs used to test this
+driver were built for a Cyclone(R) V SOC FPGA board, a Cyclone(R) V FPGA board,
+and tested separately with ARM and NIOS processor hosts. The anticipated use
+cases are simple communications between an embedded system and an external peer
+for status and simple configuration of the embedded system.
+
+For more information visit www.altera.com and www.rocketboards.org. Support
+forums for the driver may be found on www.rocketboards.org, and a design used
+to test this driver may be found there as well. Support is also available from
+the maintainer of this driver, found in MAINTAINERS.
+
+The Triple-Speed Ethernet, SGDMA, and MSGDMA components are all soft IP
+components that can be assembled and built into an FPGA using the Altera
+Quartus toolchain. Quartus 13.1 and 14.0 were used to build the design that
+this driver was tested against. The sopc2dts tool is used to create the
+device tree for the driver, and may be found at rocketboards.org.
+
+The driver probe function examines the device tree and determines if the
+Triple-Speed Ethernet instance is using an SGDMA or MSGDMA component. The
+probe function then installs the appropriate set of DMA routines for
+initialization, transmit setup, receive, and interrupt handling for
+the respective configuration.
+
+The SGDMA component is to be deprecated in the near future (over the next 1-2
+years as of this writing in early 2014) in favor of the MSGDMA component.
+SGDMA support is included for existing designs and as a reference in case a
+developer wishes to support their own soft DMA logic and driver. New
+designs should not use the SGDMA.
+
+The SGDMA supports only a single transmit or receive operation at a time, and
+therefore will not perform as well as the MSGDMA soft IP. Please
+visit www.altera.com for known, documented SGDMA errata.
+
+Scatter-gather DMA is not supported by the SGDMA or MSGDMA at this time.
+Scatter-gather DMA will be added in a future maintenance update to this
+driver.
+
+Jumbo frames are not supported at this time.
+
+The driver limits PHY operations to 10/100Mbps, and has not yet been fully
+tested for 1Gbps. This support will be added in a future maintenance update.
+
+1) Kernel Configuration
+The kernel configuration option is ALTERA_TSE:
+ Device Drivers ---> Network device support ---> Ethernet driver support --->
+ Altera Triple-Speed Ethernet MAC support (ALTERA_TSE)
+
+2) Driver parameters list:
+	debug: message level (0: no output, 16: all);
+	dma_rx_num: Number of descriptors in the RX list (default is 64);
+	dma_tx_num: Number of descriptors in the TX list (default is 64).
+
+3) Command line options
+Driver parameters can also be passed on the command line by using:
+	altera_tse=dma_rx_num:128,dma_tx_num:512
+
+4) Driver information and notes
+
+4.1) Transmit process
+When the driver's transmit routine is called by the kernel, it sets up a
+transmit descriptor by calling the underlying DMA transmit routine (SGDMA or
+MSGDMA), and initiates a transmit operation. Once the transmit is complete, an
+interrupt is driven by the transmit DMA logic. The driver handles the transmit
+completion in the context of the interrupt handling chain by recycling the
+resources required to send and track the requested transmit operation.
+
+4.2) Receive process
+The driver posts receive buffers to the receive DMA logic during driver
+initialization. Receive buffers may or may not be queued depending upon the
+underlying DMA logic (the MSGDMA is able to queue receive buffers; the SGDMA
+is not able to queue receive buffers to its receive logic). When a packet is
+received, the DMA logic generates an interrupt. The driver handles a receive
+interrupt by obtaining the DMA receive logic status and reaping receive
+completions until no more receive completions are available.
+
+4.3) Interrupt Mitigation
+The driver uses NAPI to mitigate the number of its DMA interrupts
+for receive operations. Interrupt mitigation is not yet supported
+for transmit operations, but will be added in a future maintenance release.
+
+4.4) Ethtool support
+Ethtool is supported. Driver statistics and internal errors can be read with
+the "ethtool -S ethX" command. It is also possible to dump registers, etc.
+
+4.5) PHY Support
+The driver uses the PHY Abstraction Layer (PAL) with PHY and GPHY devices.
+
+4.6) List of source files:
+ o Kconfig
+ o Makefile
+ o altera_tse_main.c: main network device driver
+ o altera_tse_ethtool.c: ethtool support
+ o altera_tse.h: private driver structure and common definitions
+ o altera_msgdma.h: MSGDMA implementation function definitions
+ o altera_sgdma.h: SGDMA implementation function definitions
+ o altera_msgdma.c: MSGDMA implementation
+ o altera_sgdma.c: SGDMA implementation
+ o altera_sgdmahw.h: SGDMA register and descriptor definitions
+ o altera_msgdmahw.h: MSGDMA register and descriptor definitions
+ o altera_utils.c: Driver utility functions
+ o altera_utils.h: Driver utility function definitions
+
+5) Debug Information
+
+The driver exports debug information such as internal statistics
+and the MAC and DMA registers.
+
+A user may use the ethtool support to get statistics, e.g.
+"ethtool -S ethX" shows the statistics counters and
+"ethtool -d ethX" shows the MAC registers.
+
+The developer can also use the "debug" module parameter to get
+further debug information.
+
+6) Statistics Support
+
+The controller and driver support a mix of IEEE standard defined statistics,
+RFC defined statistics, and driver or Altera defined statistics. The four
+specifications containing the standard definitions for these statistics are
+as follows:
+
+ o IEEE 802.3-2012 - IEEE Standard for Ethernet.
+ o RFC 2863 found at http://www.rfc-editor.org/rfc/rfc2863.txt.
+ o RFC 2819 found at http://www.rfc-editor.org/rfc/rfc2819.txt.
+ o Altera Triple Speed Ethernet User Guide, found at http://www.altera.com
+
+The statistics supported by the TSE and the device driver are as follows:
+
+"tx_packets" is equivalent to aFramesTransmittedOK defined in IEEE 802.3-2012,
+Section 5.2.2.1.2. This statistic is the count of frames that are successfully
+transmitted.
+
+"rx_packets" is equivalent to aFramesReceivedOK defined in IEEE 802.3-2012,
+Section 5.2.2.1.5. This statistic is the count of frames that are successfully
+received. This count does not include any error packets such as CRC errors,
+length errors, or alignment errors.
+
+"rx_crc_errors" is equivalent to aFrameCheckSequenceErrors defined in IEEE
+802.3-2012, Section 5.2.2.1.6. This statistic is the count of frames that are
+an integral number of bytes in length and do not pass the CRC test as the frame
+is received.
+
+"rx_align_errors" is equivalent to aAlignmentErrors defined in IEEE 802.3-2012,
+Section 5.2.2.1.7. This statistic is the count of frames that are not an
+integral number of bytes in length and do not pass the CRC test as the frame is
+received.
+
+"tx_bytes" is equivalent to aOctetsTransmittedOK defined in IEEE 802.3-2012,
+Section 5.2.2.1.8. This statistic is the count of data and pad bytes
+successfully transmitted from the interface.
+
+"rx_bytes" is equivalent to aOctetsReceivedOK defined in IEEE 802.3-2012,
+Section 5.2.2.1.14. This statistic is the count of data and pad bytes
+successfully received by the controller.
+
+"tx_pause" is equivalent to aPAUSEMACCtrlFramesTransmitted defined in IEEE
+802.3-2012, Section 30.3.4.2. This statistic is a count of PAUSE frames
+transmitted from the network controller.
+
+"rx_pause" is equivalent to aPAUSEMACCtrlFramesReceived defined in IEEE
+802.3-2012, Section 30.3.4.3. This statistic is a count of PAUSE frames
+received by the network controller.
+
+"rx_errors" is equivalent to ifInErrors defined in RFC 2863. This statistic is
+a count of the number of packets received containing errors that prevented the
+packet from being delivered to a higher level protocol.
+
+"tx_errors" is equivalent to ifOutErrors defined in RFC 2863. This statistic
+is a count of the number of packets that could not be transmitted due to errors.
+
+"rx_unicast" is equivalent to ifInUcastPkts defined in RFC 2863. This
+statistic is a count of the number of packets received that were not addressed
+to the broadcast address or a multicast group.
+
+"rx_multicast" is equivalent to ifInMulticastPkts defined in RFC 2863. This
+statistic is a count of the number of packets received that were addressed to
+a multicast address group.
+
+"rx_broadcast" is equivalent to ifInBroadcastPkts defined in RFC 2863. This
+statistic is a count of the number of packets received that were addressed to
+the broadcast address.
+
+"tx_discards" is equivalent to ifOutDiscards defined in RFC 2863. This
+statistic is the number of outbound packets not transmitted even though an
+error was not detected. One reason this might occur is to free up
+internal buffer space.
+
+"tx_unicast" is equivalent to ifOutUcastPkts defined in RFC 2863. This
+statistic counts the number of packets transmitted that were not addressed to
+a multicast group or broadcast address.
+
+"tx_multicast" is equivalent to ifOutMulticastPkts defined in RFC 2863. This
+statistic counts the number of packets transmitted that were addressed to a
+multicast group.
+
+"tx_broadcast" is equivalent to ifOutBroadcastPkts defined in RFC 2863. This
+statistic counts the number of packets transmitted that were addressed to a
+broadcast address.
+
+"ether_drops" is equivalent to etherStatsDropEvents defined in RFC 2819.
+This statistic counts the number of packets dropped due to lack of internal
+controller resources.
+
+"rx_total_bytes" is equivalent to etherStatsOctets defined in RFC 2819.
+This statistic counts the total number of bytes received by the controller,
+including error and discarded packets.
+
+"rx_total_packets" is equivalent to etherStatsPkts defined in RFC 2819.
+This statistic counts the total number of packets received by the controller,
+including error, discarded, unicast, multicast, and broadcast packets.
+
+"rx_undersize" is equivalent to etherStatsUndersizePkts defined in RFC 2819.
+This statistic counts the number of correctly formed packets received that
+were less than 64 bytes long.
+
+"rx_oversize" is equivalent to etherStatsOversizePkts defined in RFC 2819.
+This statistic counts the number of correctly formed packets received that
+were greater than 1518 bytes long.
+
+"rx_64_bytes" is equivalent to etherStatsPkts64Octets defined in RFC 2819.
+This statistic counts the total number of packets received that were 64 octets
+in length.
+
+"rx_65_127_bytes" is equivalent to etherStatsPkts65to127Octets defined in RFC
+2819. This statistic counts the total number of packets received that were
+between 65 and 127 octets in length inclusive.
+
+"rx_128_255_bytes" is equivalent to etherStatsPkts128to255Octets defined in
+RFC 2819. This statistic is the total number of packets received that were
+between 128 and 255 octets in length inclusive.
+
+"rx_256_511_bytes" is equivalent to etherStatsPkts256to511Octets defined in
+RFC 2819. This statistic is the total number of packets received that were
+between 256 and 511 octets in length inclusive.
+
+"rx_512_1023_bytes" is equivalent to etherStatsPkts512to1023Octets defined in
+RFC 2819. This statistic is the total number of packets received that were
+between 512 and 1023 octets in length inclusive.
+
+"rx_1024_1518_bytes" is equivalent to etherStatsPkts1024to1518Octets defined
+in RFC 2819. This statistic is the total number of packets received that were
+between 1024 and 1518 octets in length inclusive.
+
+"rx_gte_1519_bytes" is a statistic specific to the behavior of the
+Altera TSE. This statistic counts the number of received good and errored
+frames with a length between 1519 and the maximum frame length configured
+in the frm_length register. See the Altera TSE User Guide for more details.
+
+"rx_jabbers" is equivalent to etherStatsJabbers defined in RFC 2819. This
+statistic is the total number of packets received that were longer than 1518
+octets, and had either a bad CRC with an integral number of octets (CRC Error)
+or a bad CRC with a non-integral number of octets (Alignment Error).
+
+"rx_runts" is equivalent to etherStatsFragments defined in RFC 2819. This
+statistic is the total number of packets received that were less than 64 octets
+in length and had either a bad CRC with an integral number of octets (CRC
+Error) or a bad CRC with a non-integral number of octets (Alignment Error).
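The length-based receive counters above partition frames by size, with the TSE-specific rx_gte_1519_bytes bucket extending up to the frm_length limit. The C sketch below restates those boundaries as two classification functions. The enum and function names are hypothetical (they are not taken from the driver source), and frm_length is assumed to hold the maximum frame length in bytes.

```c
#include <stdbool.h>

/* Hypothetical names for the length-histogram counters described above. */
enum tse_len_bucket {
	TSE_RX_LT_64,         /* shorter than 64 octets: see size errors */
	TSE_RX_64,            /* rx_64_bytes */
	TSE_RX_65_127,        /* rx_65_127_bytes */
	TSE_RX_128_255,       /* rx_128_255_bytes */
	TSE_RX_256_511,       /* rx_256_511_bytes */
	TSE_RX_512_1023,      /* rx_512_1023_bytes */
	TSE_RX_1024_1518,     /* rx_1024_1518_bytes */
	TSE_RX_GTE_1519,      /* rx_gte_1519_bytes, up to frm_length */
	TSE_RX_ABOVE_FRM_LEN  /* longer than frm_length: no bucket in this sketch */
};

/* Size-error counters, selected by frame length and CRC result. */
enum tse_size_error {
	TSE_SIZE_OK,          /* 64..1518 octets: no size-error counter */
	TSE_RX_UNDERSIZE,     /* < 64 octets, good CRC */
	TSE_RX_RUNTS,         /* < 64 octets, bad CRC */
	TSE_RX_OVERSIZE,      /* > 1518 octets, good CRC */
	TSE_RX_JABBERS        /* > 1518 octets, bad CRC */
};

/* Pick the histogram bucket for a frame of 'len' octets. */
static enum tse_len_bucket tse_classify_len(unsigned int len,
					    unsigned int frm_length)
{
	if (len < 64)
		return TSE_RX_LT_64;
	if (len == 64)
		return TSE_RX_64;
	if (len <= 127)
		return TSE_RX_65_127;
	if (len <= 255)
		return TSE_RX_128_255;
	if (len <= 511)
		return TSE_RX_256_511;
	if (len <= 1023)
		return TSE_RX_512_1023;
	if (len <= 1518)
		return TSE_RX_1024_1518;
	if (len <= frm_length)
		return TSE_RX_GTE_1519;
	return TSE_RX_ABOVE_FRM_LEN;
}

/* Pick the size-error counter (if any) for a frame. */
static enum tse_size_error tse_classify_size_error(unsigned int len,
						   bool crc_ok)
{
	if (len < 64)
		return crc_ok ? TSE_RX_UNDERSIZE : TSE_RX_RUNTS;
	if (len > 1518)
		return crc_ok ? TSE_RX_OVERSIZE : TSE_RX_JABBERS;
	return TSE_SIZE_OK;
}
```

For example, with frm_length set to 1518 a 1519-byte frame falls outside every histogram bucket in this sketch, while tse_classify_size_error() places it under rx_oversize or rx_jabbers depending on the CRC result.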
diff --git a/Documentation/networking/bonding.txt b/Documentation/networking/bonding.txt
index 5cdb22971d19..a383c00392d0 100644
--- a/Documentation/networking/bonding.txt
+++ b/Documentation/networking/bonding.txt
@@ -270,16 +270,15 @@ arp_ip_target
 arp_validate
 
 	Specifies whether or not ARP probes and replies should be
-	validated in the active-backup mode.  This causes the ARP
-	monitor to examine the incoming ARP requests and replies, and
-	only consider a slave to be up if it is receiving the
-	appropriate ARP traffic.
+	validated in any mode that supports ARP monitoring, or whether
+	non-ARP traffic should be filtered (disregarded) for link
+	monitoring purposes.
 
 	Possible values are:
 
 	none or 0
 
-		No validation is performed.  This is the default.
+		No validation or filtering is performed.
 
 	active or 1
 
@@ -293,31 +292,68 @@ arp_validate
 
 		Validation is performed for all slaves.
 
-		For the active slave, the validation checks ARP replies to
-		confirm that they were generated by an arp_ip_target.  Since
-		backup slaves do not typically receive these replies, the
-		validation performed for backup slaves is on the ARP request
-		sent out via the active slave.  It is possible that some
-		switch or network configurations may result in situations
-		wherein the backup slaves do not receive the ARP requests; in
-		such a situation, validation of backup slaves must be
-		disabled.
-
-		The validation of ARP requests on backup slaves is mainly
-		helping bonding to decide which slaves are more likely to
-		work in case of the active slave failure, it doesn't really
-		guarantee that the backup slave will work if it's selected
-		as the next active slave.
-
-		This option is useful in network configurations in which
-		multiple bonding hosts are concurrently issuing ARPs to one or
-		more targets beyond a common switch.  Should the link between
-		the switch and target fail (but not the switch itself), the
-		probe traffic generated by the multiple bonding instances will
-		fool the standard ARP monitor into considering the links as
-		still up.  Use of the arp_validate option can resolve this, as
-		the ARP monitor will only consider ARP requests and replies
-		associated with its own instance of bonding.
+	filter or 4
+
+		Filtering is applied to all slaves. No validation is
+		performed.
+
+	filter_active or 5
+
+		Filtering is applied to all slaves; validation is performed
+		only for the active slave.
+
+	filter_backup or 6
+
+		Filtering is applied to all slaves; validation is performed
+		only for backup slaves.
+
+	Validation:
+
+	Enabling validation causes the ARP monitor to examine the incoming
+	ARP requests and replies, and only consider a slave to be up if it
+	is receiving the appropriate ARP traffic.
+
+	For an active slave, the validation checks ARP replies to confirm
+	that they were generated by an arp_ip_target.  Since backup slaves
+	do not typically receive these replies, the validation performed
+	for backup slaves is on the broadcast ARP request sent out via the
+	active slave.  It is possible that some switch or network
+	configurations may result in situations wherein the backup slaves
+	do not receive the ARP requests; in such a situation, validation
+	of backup slaves must be disabled.
+
+	The validation of ARP requests on backup slaves mainly helps
+	bonding decide which slaves are more likely to work in case of
+	an active slave failure; it doesn't guarantee that the
+	backup slave will work if it's selected as the next active slave.
+
+	Validation is useful in network configurations in which multiple
+	bonding hosts are concurrently issuing ARPs to one or more targets
+	beyond a common switch.  Should the link between the switch and
+	target fail (but not the switch itself), the probe traffic
+	generated by the multiple bonding instances will fool the standard
+	ARP monitor into considering the links as still up.  Use of
+	validation can resolve this, as the ARP monitor will only consider
+	ARP requests and replies associated with its own instance of
+	bonding.
+
+	Filtering:
+
+	Enabling filtering causes the ARP monitor to only use incoming ARP
+	packets for link availability purposes.  Arriving packets that are
+	not ARPs are delivered normally, but do not count when determining
+	if a slave is available.
+
+	Filtering operates by only considering the reception of ARP
+	packets (any ARP packet, regardless of source or destination) when
+	determining if a slave has received traffic for link availability
+	purposes.
+
+	Filtering is useful in network configurations in which significant
+	levels of third party broadcast traffic would fool the standard
+	ARP monitor into considering the links as still up.  Use of
+	filtering can resolve this, as only ARP traffic is considered for
+	link availability purposes.
 
 	This option was added in bonding version 3.1.0.
 
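The numeric arp_validate values combine like bit flags: filter_active (5) is 4 | 1, filter_backup (6) is 4 | 2, and all (3) is 1 | 2. The C sketch below uses that reading to decide whether a received frame counts toward link availability under the validation and filtering rules described above. The bit names, helpers, and the arp_ok input are hypothetical and are modelled on this text, not on the bonding driver's source.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit layout, inferred from the numeric values listed above. */
enum arp_validate_bits {
	AV_VALIDATE_ACTIVE = 1u << 0,  /* 1: validate the active slave */
	AV_VALIDATE_BACKUP = 1u << 1,  /* 2: validate backup slaves */
	AV_FILTER          = 1u << 2,  /* 4: only ARP counts as traffic */
};

#define SKETCH_ETH_P_ARP 0x0806        /* EtherType value for ARP */

/* Is validation enabled for this slave's role under 'mode'? */
static bool av_validates(unsigned int mode, bool slave_is_active)
{
	return (mode & (slave_is_active ? AV_VALIDATE_ACTIVE
					: AV_VALIDATE_BACKUP)) != 0;
}

/* Is filtering enabled under 'mode'? */
static bool av_filters(unsigned int mode)
{
	return (mode & AV_FILTER) != 0;
}

/*
 * Does a received frame count as "traffic" when the ARP monitor decides
 * whether a slave is up?  'arp_ok' means the frame passed the validation
 * checks described above (a reply from an arp_ip_target on the active
 * slave, or the broadcast request seen on a backup slave).
 */
static bool counts_for_link(unsigned int mode, bool slave_is_active,
			    uint16_t ethertype, bool arp_ok)
{
	bool is_arp = ethertype == SKETCH_ETH_P_ARP;

	if (av_validates(mode, slave_is_active))
		return is_arp && arp_ok;  /* validation: only matching ARP */
	if (av_filters(mode))
		return is_arp;            /* filtering: any ARP packet */
	return true;                      /* none: any received traffic */
}
```

With filter_active (5), for instance, an unvalidated ARP packet keeps a backup slave up in this sketch but does not count for the active slave.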
diff --git a/Documentation/networking/can.txt b/Documentation/networking/can.txt
index 0cbe6ec22d6f..2fa44cbe81b7 100644
--- a/Documentation/networking/can.txt
+++ b/Documentation/networking/can.txt
@@ -1017,7 +1017,7 @@ solution for a couple of reasons:
       in case of a bus-off condition after the specified delay time
       in milliseconds. By default it's off.
 
-  "bitrate 125000 sample_point 0.875"
+  "bitrate 125000 sample-point 0.875"
       Shows the real bit-rate in bits/sec and the sample-point in the
       range 0.000..0.999. If the calculation of bit-timing parameters
       is enabled in the kernel (CONFIG_CAN_CALC_BITTIMING=y), the
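The bit-rate and sample-point shown above follow from the controller clock and the bit-timing segments. The C sketch below applies the usual CAN arithmetic: one bit is SYNC_SEG (always one time quantum) plus PROP_SEG, PHASE_SEG1 and PHASE_SEG2, and the sample point falls at the end of PHASE_SEG1. The structure and function names are hypothetical, and the 8 MHz clock with prescaler 4 and segments 6/7/2 is just one assumed configuration that yields "bitrate 125000 sample-point 0.875". The sketch only computes forward from given segment values; it does not search for suitable values.

```c
#include <stdint.h>

/* Hypothetical description of one bit-timing configuration. */
struct can_timing_sketch {
	uint32_t clock_hz;    /* CAN controller clock */
	uint32_t brp;         /* bit-rate prescaler: clock ticks per time quantum */
	uint32_t prop_seg;    /* propagation segment, in time quanta */
	uint32_t phase_seg1;  /* phase buffer segment 1, in time quanta */
	uint32_t phase_seg2;  /* phase buffer segment 2, in time quanta */
};

/* Time quanta per bit; SYNC_SEG is always exactly one quantum. */
static uint32_t can_tq_per_bit(const struct can_timing_sketch *t)
{
	return 1 + t->prop_seg + t->phase_seg1 + t->phase_seg2;
}

/* Bits per second: clock / (ticks per quantum * quanta per bit). */
static uint32_t can_bitrate(const struct can_timing_sketch *t)
{
	return t->clock_hz / (t->brp * can_tq_per_bit(t));
}

/* Sample point in thousandths (875 means 0.875), integer arithmetic only. */
static uint32_t can_sample_point_permille(const struct can_timing_sketch *t)
{
	return 1000 * (1 + t->prop_seg + t->phase_seg1) / can_tq_per_bit(t);
}
```

With clock_hz = 8000000, brp = 4 and segments 6/7/2, a bit is 16 quanta, so the bit-rate is 8000000 / 64 = 125000 and the sample point is 14/16 = 0.875.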
diff --git a/Documentation/networking/filter.txt b/Documentation/networking/filter.txt
index a06b48d2f5cc..81f940f4e884 100644
--- a/Documentation/networking/filter.txt
+++ b/Documentation/networking/filter.txt
@@ -546,6 +546,130 @@ ffffffffa0069c8f + <x>:
 For BPF JIT developers, bpf_jit_disasm, bpf_asm and bpf_dbg provide a useful
 toolchain for developing and testing the kernel's JIT compiler.
 
+BPF kernel internals
+--------------------
+Internally, the kernel interpreter uses a different BPF instruction set
+format, with underlying principles similar to the BPF described in previous
+paragraphs. However, this instruction set format is modelled
+closer to the underlying architecture to mimic native instruction sets, so
+that better performance can be achieved (more details later).
+
+It is designed to be JITed with a one-to-one mapping, which can also open up
+the possibility for GCC/LLVM compilers to generate optimized BPF code through
+a BPF backend that performs almost as fast as natively compiled code.
+
+The new instruction set was originally designed with the possible goal in
+mind to write programs in "restricted C" and compile into BPF with an optional
+GCC/LLVM backend, so that it can just-in-time map to modern 64-bit CPUs with
+minimal performance overhead over two steps, that is, C -> BPF -> native code.
+
+Currently, the new format is being used for running user BPF programs, which
+include seccomp BPF, classic socket filters, cls_bpf traffic classifier,
+team driver's classifier for its load-balancing mode, netfilter's xt_bpf
+extension, PTP dissector/classifier, and much more. They are all internally
+converted by the kernel into the new instruction set representation and run
+in the extended interpreter. For in-kernel handlers, this all works
+transparently by using sk_unattached_filter_create() for setting up the
+filter and sk_unattached_filter_destroy() for destroying it. The macro
+SK_RUN_FILTER(filter, ctx) transparently invokes the right BPF function to
+run the filter. 'filter' is a pointer to struct sk_filter that we got from
+sk_unattached_filter_create(), and 'ctx' is the given context (e.g. skb pointer).
+All constraints and restrictions from sk_chk_filter() apply before the
+conversion to the new layout is done behind the scenes!
+
+Currently, for JITing, the user BPF format is being used and current BPF JIT
+compilers are reused whenever possible. In other words, we do not (yet!) perform
+a JIT compilation in the new layout; however, future work will successively
+migrate traditional JIT compilers into the new instruction format as well, so
+that they will profit from the very same benefits. Thus, when speaking about
+JIT in the following, a JIT compiler (TBD) for the new instruction format is
+meant in this context.
+
+Some core changes of the new internal format:
+
+- Number of registers increases from 2 to 10:
+
+  The old format had two registers A and X, and a hidden frame pointer. The
+  new layout extends this to be 10 internal registers and a read-only frame
+  pointer. Since 64-bit CPUs pass arguments to functions via registers,
+  the number of args from a BPF program to an in-kernel function is restricted
+  to 5 and one register is used to accept the return value from an in-kernel
+  function. Natively, x86_64 passes the first 6 arguments in registers, aarch64/
+  sparcv9/mips64 have 7 - 8 registers for arguments; x86_64 has 6 callee saved
+  registers, and aarch64/sparcv9/mips64 have 11 or more callee saved registers.
+
+  Therefore, BPF calling convention is defined as:
+
+    * R0       - return value from in-kernel function
+    * R1 - R5  - arguments from BPF program to in-kernel function
+    * R6 - R9  - callee saved registers that in-kernel function will preserve
+    * R10      - read-only frame pointer to access stack
+
+  Thus, all BPF registers map one to one to HW registers on x86_64, aarch64,
+  etc., and BPF calling convention maps directly to ABIs used by the kernel on
+  64-bit architectures.
+
+  On 32-bit architectures, the JIT may map programs that use only 32-bit
+  arithmetic and may let more complex programs be interpreted.
+
+  R0 - R5 are scratch registers and a BPF program needs to spill/fill them if
+  necessary across calls. Note that there is only one BPF program (== one BPF
+  main routine) and it cannot call other BPF functions; it can only call
+  predefined in-kernel functions.
+
+- Register width increases from 32-bit to 64-bit:
+
+  Still, the semantics of the original 32-bit ALU operations are preserved
+  via 32-bit subregisters. All BPF registers are 64-bit with 32-bit lower
+  subregisters that zero-extend into 64-bit if they are being written to.
+  That behavior maps directly to x86_64 and arm64 subregister definitions, but
+  makes other JITs more difficult.
+
+  32-bit architectures run 64-bit internal BPF programs via the interpreter.
+  Their JITs may convert BPF programs that only use 32-bit subregisters into
+  the native instruction set and let the rest be interpreted.
+
+  Operation is 64-bit, because on 64-bit architectures, pointers are also
+  64-bit wide, and we want to pass 64-bit values in/out of kernel functions,
+  so 32-bit BPF registers would otherwise require defining a register-pair
+  ABI. Thus, a direct BPF register to HW register mapping would not be possible,
+  and the JIT would need to do combine/split/move operations for every
+  register in and out of the function, which is complex, bug prone and slow.
+  Another reason is the use of atomic 64-bit counters.
+
+- Conditional jt/jf targets replaced with jt/fall-through:
+
+  While the original design has constructs such as "if (cond) jump_true;
+  else jump_false;", they are being replaced with alternative constructs like
+  "if (cond) jump_true; /* else fall-through */".
+
+- Introduces bpf_call insn and register passing convention for zero overhead
+  calls from/to other kernel functions:
+
+  After a kernel function call, R1 - R5 are reset to unreadable and R0 holds
+  the return value of the function. Since R6 - R9 are callee saved, their
+  state is preserved across the call.
+
+Also in the new design, BPF is limited to 4096 insns, which means that any
+program will terminate quickly and will only call a fixed number of kernel
+functions. Original BPF and the new format are two-operand instruction sets,
+which helps to do one-to-one mapping between BPF insn and x86 insn during JIT.
+
+The input context pointer for invoking the interpreter function is generic;
+its content is defined by the specific use case. For seccomp, register R1 points
+to seccomp_data; for converted BPF filters, R1 points to an skb.
+
+A program that is translated internally consists of the following elements:
+
+  op:16, jt:8, jf:8, k:32    ==>    op:8, a_reg:4, x_reg:4, off:16, imm:32
+
+Just like the original BPF, the new format runs within a controlled environment,
+is deterministic and the kernel can easily prove that. The safety of the program
+can be determined in two steps: the first step does a depth-first search to
+disallow loops and other CFG validation; the second step starts from the first
+insn and descends all possible paths. It simulates execution of every insn and
+observes the state change of registers and stack.
+
 Misc
 ----
 
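The two instruction layouts and the 32-bit subregister rule described in the BPF kernel internals text can be made concrete in C. In this sketch the struct names are hypothetical, the field widths follow the "op:16, jt:8, jf:8, k:32 ==> op:8, a_reg:4, x_reg:4, off:16, imm:32" line, and treating off and imm as signed is an assumption. The uint8_t bit-fields rely on a common GCC/Clang extension. alu32_add() shows a 32-bit ALU write zero-extending into the full 64-bit register, compared with the plain 64-bit alu64_add().

```c
#include <stdint.h>

/* Classic BPF instruction: op:16, jt:8, jf:8, k:32. */
struct classic_insn_sketch {
	uint16_t code;   /* opcode */
	uint8_t  jt;     /* jump offset if condition is true */
	uint8_t  jf;     /* jump offset if condition is false */
	uint32_t k;      /* generic constant */
};

/* Internal format: op:8, a_reg:4, x_reg:4, off:16, imm:32. */
struct internal_insn_sketch {
	uint8_t code;     /* opcode */
	uint8_t a_reg:4;  /* first operand / destination register, R0..R10 */
	uint8_t x_reg:4;  /* second operand / source register, R0..R10 */
	int16_t off;      /* offset (assumed signed) */
	int32_t imm;      /* immediate (assumed signed) */
};

/* Both layouts pack into 8 bytes on common ABIs. */
_Static_assert(sizeof(struct classic_insn_sketch) == 8, "classic insn size");
_Static_assert(sizeof(struct internal_insn_sketch) == 8, "internal insn size");

/* 32-bit ALU add: operate on the low halves, zero-extend the result. */
static uint64_t alu32_add(uint64_t dst, uint64_t src)
{
	return (uint64_t)(uint32_t)((uint32_t)dst + (uint32_t)src);
}

/* 64-bit ALU add on the full registers. */
static uint64_t alu64_add(uint64_t dst, uint64_t src)
{
	return dst + src;
}
```

For instance, alu32_add(0xFFFFFFFF, 1) wraps to 0 and leaves the upper half cleared, whereas alu64_add() carries into bit 32; any stale upper bits in dst are likewise discarded by the 32-bit form.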
@@ -561,3 +685,4 @@ the underlying architecture.
 
 Jay Schulist <jschlst@samba.org>
 Daniel Borkmann <dborkman@redhat.com>
+Alexei Starovoitov <ast@plumgrid.com>
diff --git a/Documentation/networking/gianfar.txt b/Documentation/networking/gianfar.txt index ad474ea07d07..ba1daea7f2e4 100644 --- a/Documentation/networking/gianfar.txt +++ b/Documentation/networking/gianfar.txt | |||
@@ -1,38 +1,8 @@ | |||
1 | The Gianfar Ethernet Driver | 1 | The Gianfar Ethernet Driver |
2 | Sysfs File description | ||
3 | 2 | ||
4 | Author: Andy Fleming <afleming@freescale.com> | 3 | Author: Andy Fleming <afleming@freescale.com> |
5 | Updated: 2005-07-28 | 4 | Updated: 2005-07-28 |
6 | 5 | ||
7 | SYSFS | ||
8 | |||
9 | Several of the features of the gianfar driver are controlled | ||
10 | through sysfs files. These are: | ||
11 | |||
12 | bd_stash: | ||
13 | To stash RX Buffer Descriptors in the L2, echo 'on' or '1' to | ||
14 | bd_stash, echo 'off' or '0' to disable | ||
15 | |||
16 | rx_stash_len: | ||
17 | To stash the first n bytes of the packet in L2, echo the number | ||
18 | of bytes to buf_stash_len. echo 0 to disable. | ||
19 | |||
20 | WARNING: You could really screw these up if you set them too low or high! | ||
21 | fifo_threshold: | ||
22 | To change the number of bytes the controller needs in the | ||
23 | fifo before it starts transmission, echo the number of bytes to | ||
24 | fifo_thresh. Range should be 0-511. | ||
25 | |||
26 | fifo_starve: | ||
27 | When the FIFO has less than this many bytes during a transmit, it | ||
28 | enters starve mode, and increases the priority of TX memory | ||
29 | transactions. To change, echo the number of bytes to | ||
30 | fifo_starve. Range should be 0-511. | ||
31 | |||
32 | fifo_starve_off: | ||
33 | Once in starve mode, the FIFO remains there until it has this | ||
34 | many bytes. To change, echo the number of bytes to | ||
35 | fifo_starve_off. Range should be 0-511. | ||
36 | 6 | ||
37 | CHECKSUM OFFLOADING | 7 | CHECKSUM OFFLOADING |
38 | 8 | ||
diff --git a/Documentation/networking/igb.txt b/Documentation/networking/igb.txt index 4ebbd659256f..43d3549366a0 100644 --- a/Documentation/networking/igb.txt +++ b/Documentation/networking/igb.txt | |||
@@ -36,54 +36,6 @@ Default Value: 0 | |||
36 | This parameter adds support for SR-IOV. It causes the driver to spawn up to | 36 | This parameter adds support for SR-IOV. It causes the driver to spawn up to |
37 | max_vfs worth of virtual function. | 37 | max_vfs worth of virtual function. |
38 | 38 | ||
39 | QueuePairs | ||
40 | ---------- | ||
41 | Valid Range: 0-1 | ||
42 | Default Value: 1 (TX and RX will be paired onto one interrupt vector) | ||
43 | |||
44 | If set to 0, when MSI-X is enabled, the TX and RX will attempt to occupy | ||
45 | separate vectors. | ||
46 | |||
47 | This option can be overridden to 1 if there are not sufficient interrupts | ||
48 | available. This can occur if any combination of RSS, VMDQ, and max_vfs | ||
49 | results in more than 4 queues being used. | ||
50 | |||
51 | Node | ||
52 | ---- | ||
53 | Valid Range: 0-n | ||
54 | Default Value: -1 (off) | ||
55 | |||
56 | 0 - n: where n is the number of the NUMA node that should be used to | ||
57 | allocate memory for this adapter port. | ||
58 | -1: uses the driver default of allocating memory on whichever processor is | ||
59 | running insmod/modprobe. | ||
60 | |||
61 | The Node parameter will allow you to pick which NUMA node you want to have | ||
62 | the adapter allocate memory from. All driver structures, in-memory queues, | ||
63 | and receive buffers will be allocated on the node specified. This parameter | ||
64 | is only useful when interrupt affinity is specified, otherwise some portion | ||
65 | of the time the interrupt could run on a different core than the memory is | ||
66 | allocated on, causing slower memory access and impacting throughput, CPU, or | ||
67 | both. | ||
68 | |||
69 | EEE | ||
70 | --- | ||
71 | Valid Range: 0-1 | ||
72 | Default Value: 1 (enabled) | ||
73 | |||
74 | A link between two EEE-compliant devices will result in periodic bursts of | ||
75 | data followed by long periods where in the link is in an idle state. This Low | ||
76 | Power Idle (LPI) state is supported in both 1Gbps and 100Mbps link speeds. | ||
77 | NOTE: EEE support requires autonegotiation. | ||
78 | |||
79 | DMAC | ||
80 | ---- | ||
81 | Valid Range: 0-1 | ||
82 | Default Value: 1 (enabled) | ||
83 | Enables or disables DMA Coalescing feature. | ||
84 | |||
85 | |||
86 | |||
87 | Additional Configurations | 39 | Additional Configurations |
88 | ========================= | 40 | ========================= |
89 | 41 | ||
diff --git a/Documentation/networking/phy.txt b/Documentation/networking/phy.txt index ebf270719402..3544c98401fd 100644 --- a/Documentation/networking/phy.txt +++ b/Documentation/networking/phy.txt | |||
@@ -48,7 +48,7 @@ The MDIO bus | |||
48 | time, so it is safe for them to block, waiting for an interrupt to signal | 48 | time, so it is safe for them to block, waiting for an interrupt to signal |
49 | the operation is complete | 49 | the operation is complete |
50 | 50 | ||
51 | 2) A reset function is necessary. This is used to return the bus to an | 51 | 2) A reset function is optional. This is used to return the bus to an |
52 | initialized state. | 52 | initialized state. |
53 | 53 | ||
54 | 3) A probe function is needed. This function should set up anything the bus | 54 | 3) A probe function is needed. This function should set up anything the bus |
@@ -253,16 +253,25 @@ Writing a PHY driver | |||
253 | 253 | ||
254 | Each driver consists of a number of function pointers: | 254 | Each driver consists of a number of function pointers: |
255 | 255 | ||
256 | soft_reset: perform a PHY software reset | ||
256 | config_init: configures PHY into a sane state after a reset. | 257 | config_init: configures PHY into a sane state after a reset. |
257 | For instance, a Davicom PHY requires descrambling disabled. | 258 | For instance, a Davicom PHY requires descrambling disabled. |
258 | probe: Allocate phy->priv, optionally refuse to bind. | 259 | probe: Allocate phy->priv, optionally refuse to bind. |
259 | PHY may not have been reset or had fixups run yet. | 260 | PHY may not have been reset or had fixups run yet. |
260 | suspend/resume: power management | 261 | suspend/resume: power management |
261 | config_aneg: Changes the speed/duplex/negotiation settings | 262 | config_aneg: Changes the speed/duplex/negotiation settings |
263 | aneg_done: Determines the auto-negotiation result | ||
262 | read_status: Reads the current speed/duplex/negotiation settings | 264 | read_status: Reads the current speed/duplex/negotiation settings |
263 | ack_interrupt: Clear a pending interrupt | 265 | ack_interrupt: Clear a pending interrupt |
266 | did_interrupt: Checks if the PHY generated an interrupt | ||
264 | config_intr: Enable or disable interrupts | 267 | config_intr: Enable or disable interrupts |
265 | remove: Does any driver take-down | 268 | remove: Does any driver take-down |
269 | ts_info: Queries about the HW timestamping status | ||
270 | hwtstamp: Set the PHY HW timestamping configuration | ||
271 | rxtstamp: Requests a receive timestamp at the PHY level for a 'skb' | ||
272 | txtstamp: Requests a transmit timestamp at the PHY level for a 'skb' | ||
273 | set_wol: Enable Wake-on-LAN at the PHY level | ||
274 | get_wol: Get the Wake-on-LAN status at the PHY level | ||
266 | 275 | ||
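The function-pointer list above can be sketched as a trimmed driver descriptor in which only config_aneg and read_status are mandatory. Everything here is hypothetical: the demo_* names are invented, only a few callbacks are modelled, and the real callbacks take a struct phy_device * rather than this simplified device type.

```c
#include <stddef.h>

/* Simplified stand-in for the PHY device state a driver reports. */
struct demo_phy_device {
	int speed;      /* Mb/s */
	int duplex;     /* 1 = full, 0 = half */
	int link;       /* 1 = link up */
};

/* Hypothetical, heavily trimmed driver descriptor. */
struct demo_phy_driver {
	const char *name;
	int (*soft_reset)(struct demo_phy_device *phydev);    /* optional */
	int (*config_init)(struct demo_phy_device *phydev);   /* optional */
	int (*config_aneg)(struct demo_phy_device *phydev);   /* required */
	int (*read_status)(struct demo_phy_device *phydev);   /* required */
};

static int demo_config_aneg(struct demo_phy_device *phydev)
{
	/* A real driver would program the advertisement registers here. */
	(void)phydev;
	return 0;
}

static int demo_read_status(struct demo_phy_device *phydev)
{
	/* A real driver would read PHY status registers; this sketch just
	 * reports a fixed 100 Mb/s full-duplex link. */
	phydev->link = 1;
	phydev->speed = 100;
	phydev->duplex = 1;
	return 0;
}

/* Mirrors the rule that only config_aneg and read_status are required. */
static int demo_phy_driver_valid(const struct demo_phy_driver *drv)
{
	return drv->config_aneg != NULL && drv->read_status != NULL;
}

static const struct demo_phy_driver demo_driver = {
	.name        = "demo-phy",
	.config_aneg = demo_config_aneg,
	.read_status = demo_read_status,
	/* soft_reset, config_init, ... left NULL because they are optional */
};
```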
267 | Of these, only config_aneg and read_status are required to be | 276 | Of these, only config_aneg and read_status are required to be |
268 | assigned by the driver code. The rest are optional. Also, it is | 277 | assigned by the driver code. The rest are optional. Also, it is |
diff --git a/Documentation/networking/pktgen.txt b/Documentation/networking/pktgen.txt index 5a61a240a652..0e30c7845b2b 100644 --- a/Documentation/networking/pktgen.txt +++ b/Documentation/networking/pktgen.txt | |||
@@ -102,13 +102,18 @@ Examples: | |||
102 | The 'minimum' MAC is what you set with dstmac. | 102 | The 'minimum' MAC is what you set with dstmac. |
103 | 103 | ||
104 | pgset "flag [name]" Set a flag to determine behaviour. Current flags | 104 | pgset "flag [name]" Set a flag to determine behaviour. Current flags |
105 | are: IPSRC_RND #IP Source is random (between min/max), | 105 | are: IPSRC_RND # IP source is random (between min/max) |
106 | IPDST_RND, UDPSRC_RND, | 106 | IPDST_RND # IP destination is random |
107 | UDPDST_RND, MACSRC_RND, MACDST_RND | 107 | UDPSRC_RND, UDPDST_RND, |
108 | MACSRC_RND, MACDST_RND | ||
109 | TXSIZE_RND, IPV6, | ||
108 | MPLS_RND, VID_RND, SVID_RND | 110 | MPLS_RND, VID_RND, SVID_RND |
111 | FLOW_SEQ, | ||
109 | QUEUE_MAP_RND # queue map random | 112 | QUEUE_MAP_RND # queue map random |
110 | QUEUE_MAP_CPU # queue map mirrors smp_processor_id() | 113 | QUEUE_MAP_CPU # queue map mirrors smp_processor_id() |
111 | IPSEC # Make IPsec encapsulation for packet | 114 | UDPCSUM, |
115 | IPSEC # IPsec encapsulation (needs CONFIG_XFRM) | ||
116 | NODE_ALLOC # node specific memory allocation | ||
112 | 117 | ||
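Each flag above is set by writing a "flag NAME" command to the per-device control file under /proc/net/pktgen/, which is what the shell pgset helper does with echo. A minimal C equivalent is sketched below; the helper names are hypothetical and the test uses an ordinary file in /tmp as a stand-in for the proc file.

```c
#include <stdio.h>

/* Write one pktgen command (plus newline, like echo) to a control file.
 * For short commands stdio flushes everything in a single write on
 * fclose(). Returns 0 on success, -1 on any I/O error. */
static int pg_write(const char *path, const char *cmd)
{
	FILE *f = fopen(path, "w");
	int rc = 0;

	if (!f)
		return -1;
	if (fputs(cmd, f) == EOF || fputc('\n', f) == EOF)
		rc = -1;
	if (fclose(f) == EOF)
		rc = -1;
	return rc;
}

/* Build a "flag NAME" command; returns -1 if it does not fit in buf. */
static int pg_flag_cmd(char *buf, size_t len, const char *flag)
{
	int n = snprintf(buf, len, "flag %s", flag);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

Flags are written one command at a time, for example "flag IPSRC_RND" followed by "flag UDPSRC_RND".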
113 | pgset spi SPI_VALUE Set specific SA used to transform packet. | 118 | pgset spi SPI_VALUE Set specific SA used to transform packet. |
114 | 119 | ||
@@ -233,13 +238,22 @@ udp_dst_max | |||
233 | 238 | ||
234 | flag | 239 | flag |
235 | IPSRC_RND | 240 | IPSRC_RND |
236 | TXSIZE_RND | ||
237 | IPDST_RND | 241 | IPDST_RND |
238 | UDPSRC_RND | 242 | UDPSRC_RND |
239 | UDPDST_RND | 243 | UDPDST_RND |
240 | MACSRC_RND | 244 | MACSRC_RND |
241 | MACDST_RND | 245 | MACDST_RND |
246 | TXSIZE_RND | ||
247 | IPV6 | ||
248 | MPLS_RND | ||
249 | VID_RND | ||
250 | SVID_RND | ||
251 | FLOW_SEQ | ||
252 | QUEUE_MAP_RND | ||
253 | QUEUE_MAP_CPU | ||
254 | UDPCSUM | ||
242 | IPSEC | 255 | IPSEC |
256 | NODE_ALLOC | ||
243 | 257 | ||
244 | dst_min | 258 | dst_min |
245 | dst_max | 259 | dst_max |
diff --git a/Documentation/networking/rxrpc.txt b/Documentation/networking/rxrpc.txt index b89bc82eed46..16a924c486bf 100644 --- a/Documentation/networking/rxrpc.txt +++ b/Documentation/networking/rxrpc.txt | |||
@@ -27,6 +27,8 @@ Contents of this document: | |||
27 | 27 | ||
28 | (*) AF_RXRPC kernel interface. | 28 | (*) AF_RXRPC kernel interface. |
29 | 29 | ||
30 | (*) Configurable parameters. | ||
31 | |||
30 | 32 | ||
31 | ======== | 33 | ======== |
32 | OVERVIEW | 34 | OVERVIEW |
@@ -864,3 +866,82 @@ The kernel interface functions are as follows: | |||
864 | 866 | ||
865 | This is used to allocate a null RxRPC key that can be used to indicate | 867 | This is used to allocate a null RxRPC key that can be used to indicate |
866 | anonymous security for a particular domain. | 868 | anonymous security for a particular domain. |
869 | |||
870 | |||
871 | ======================= | ||
872 | CONFIGURABLE PARAMETERS | ||
873 | ======================= | ||
874 | |||
875 | The RxRPC protocol driver has a number of configurable parameters that can be | ||
876 | adjusted through sysctls in /proc/sys/net/rxrpc/: | ||
877 | |||
878 | (*) req_ack_delay | ||
879 | |||
880 | The amount of time in milliseconds after receiving a packet with the | ||
881 | request-ack flag set before we honour the flag and actually send the | ||
882 | requested ACK. | ||
883 | |||
884 | Usually the other side won't stop sending packets until the advertised | ||
885 | reception window is full (to a maximum of 255 packets), so delaying the | ||
886 | ACK permits several packets to be ACK'd in one go. | ||
887 | |||
888 | (*) soft_ack_delay | ||
889 | |||
890 | The amount of time in milliseconds after receiving a new packet before we | ||
891 | generate a soft-ACK to tell the sender that it doesn't need to resend. | ||
892 | |||
893 | (*) idle_ack_delay | ||
894 | |||
895 | The amount of time in milliseconds after all the packets currently in the | ||
896 | received queue have been consumed before we generate a hard-ACK to tell | ||
897 | the sender it can free its buffers, assuming no other reason arises for us | ||
898 | to send an ACK. | ||
899 | |||
900 | (*) resend_timeout | ||
901 | |||
902 | The amount of time in milliseconds after transmitting a packet before we | ||
903 | transmit it again, assuming no ACK is received from the receiver telling | ||
904 | us they got it. | ||
905 | |||
906 | (*) max_call_lifetime | ||
907 | |||
908 | The maximum amount of time in seconds that a call may be in progress | ||
909 | before we preemptively kill it. | ||
910 | |||
911 | (*) dead_call_expiry | ||
912 | |||
913 | The amount of time in seconds before we remove a dead call from the call | ||
914 | list. Dead calls are kept around for a little while for the purpose of | ||
915 | repeating ACK and ABORT packets. | ||
916 | |||
917 | (*) connection_expiry | ||
918 | |||
919 | The amount of time in seconds after a connection was last used before we | ||
920 | remove it from the connection list. Whilst a connection is in existence, | ||
921 | it serves as a placeholder for negotiated security; when it is deleted, | ||
922 | the security must be renegotiated. | ||
923 | |||
924 | (*) transport_expiry | ||
925 | |||
926 | The amount of time in seconds after a transport was last used before we | ||
927 | remove it from the transport list. Whilst a transport is in existence, it | ||
928 | serves to anchor the peer data and keeps the connection ID counter. | ||
929 | |||
930 | (*) rxrpc_rx_window_size | ||
931 | |||
932 | The size of the receive window in packets. This is the maximum number of | ||
933 | unconsumed received packets we're willing to hold in memory for any | ||
934 | particular call. | ||
935 | |||
936 | (*) rxrpc_rx_mtu | ||
937 | |||
938 | The maximum packet size (MTU), in bytes, that we're willing to receive. This | ||
939 | indicates to the peer whether we're willing to accept jumbo packets. | ||
940 | |||
941 | (*) rxrpc_rx_jumbo_max | ||
942 | |||
943 | The maximum number of packets that we're willing to accept in a jumbo | ||
944 | packet. Non-terminal packets in a jumbo packet must contain a four-byte | ||
945 | header plus exactly 1412 bytes of data. The terminal packet must contain | ||
946 | a four-byte header plus any amount of data. In any event, a jumbo packet | ||
947 | may not exceed rxrpc_rx_mtu in size. | ||
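The jumbo-packet rule above reduces to simple arithmetic: each non-terminal subpacket costs 4 + 1412 = 1416 bytes, and the terminal one costs 4 bytes plus its data. The sketch below follows only that description; the function names are hypothetical, and UDP/IP headers and the main RxRPC wire header are deliberately ignored.

```c
/* Sizes taken from the description above. */
#define JUMBO_HDR_LEN   4u
#define JUMBO_DATA_LEN  1412u

/* Bytes used by a jumbo packet of nr_packets subpackets whose terminal
 * subpacket carries last_len bytes of data. */
static unsigned long jumbo_size(unsigned nr_packets, unsigned last_len)
{
	if (nr_packets == 0)
		return 0;
	return (unsigned long)(nr_packets - 1) * (JUMBO_HDR_LEN + JUMBO_DATA_LEN)
	       + JUMBO_HDR_LEN + last_len;
}

/* Largest subpacket count allowed by both rx_jumbo_max and rx_mtu,
 * or 0 if not even one subpacket fits. */
static unsigned jumbo_max_packets(unsigned mtu, unsigned jumbo_max,
				  unsigned last_len)
{
	unsigned n = 0;

	while (n < jumbo_max && jumbo_size(n + 1, last_len) <= mtu)
		n++;
	return n;
}
```

For example, with an empty terminal subpacket and a 5000-byte limit, four subpackets take 3 * 1416 + 4 = 4252 bytes, while a fifth would need 5668.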
diff --git a/Documentation/networking/spider_net.txt b/Documentation/networking/spider_net.txt index 4b4adb8eb14f..b0b75f8463b3 100644 --- a/Documentation/networking/spider_net.txt +++ b/Documentation/networking/spider_net.txt | |||
@@ -73,7 +73,7 @@ Thus, in an idle system, the GDACTDPA, tail and head pointers will | |||
73 | all be pointing at the same descr, which should be "empty". All of the | 73 | all be pointing at the same descr, which should be "empty". All of the |
74 | other descrs in the ring should be "empty" as well. | 74 | other descrs in the ring should be "empty" as well. |
75 | 75 | ||
76 | The show_rx_chain() routine will print out the the locations of the | 76 | The show_rx_chain() routine will print out the locations of the |
77 | GDACTDPA, tail and head pointers. It will also summarize the contents | 77 | GDACTDPA, tail and head pointers. It will also summarize the contents |
78 | of the ring, starting at the tail pointer, and listing the status | 78 | of the ring, starting at the tail pointer, and listing the status |
79 | of the descrs that follow. | 79 | of the descrs that follow. |
diff --git a/Documentation/networking/tcp.txt b/Documentation/networking/tcp.txt index 7d11bb5dc30a..bdc4c0db51e1 100644 --- a/Documentation/networking/tcp.txt +++ b/Documentation/networking/tcp.txt | |||
@@ -30,7 +30,7 @@ A congestion control mechanism can be registered through functions in | |||
30 | tcp_cong.c. The functions used by the congestion control mechanism are | 30 | tcp_cong.c. The functions used by the congestion control mechanism are |
31 | registered via passing a tcp_congestion_ops struct to | 31 | registered via passing a tcp_congestion_ops struct to |
32 | tcp_register_congestion_control. As a minimum name, ssthresh, | 32 | tcp_register_congestion_control. At a minimum, name, ssthresh and |
33 | cong_avoid, min_cwnd must be valid. | 33 | cong_avoid must be valid. |
34 | 34 | ||
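The minimum-requirements rule can be sketched as a registration check that refuses an ops table lacking a name, ssthresh or cong_avoid. This is a hypothetical, trimmed stand-in for struct tcp_congestion_ops: the demo_* names, the callback signatures and the toy window arithmetic are assumptions for illustration only.

```c
#include <errno.h>
#include <stddef.h>

#define DEMO_CA_NAME_MAX 16   /* assumed name length limit */

/* Hypothetical, trimmed congestion-control ops table. */
struct demo_cong_ops {
	char name[DEMO_CA_NAME_MAX];
	unsigned (*ssthresh)(unsigned cwnd);                 /* required */
	void (*cong_avoid)(unsigned *cwnd, unsigned acked);  /* required */
};

/* Refuse registration unless the mandatory members are valid. */
static int demo_register(const struct demo_cong_ops *ca)
{
	if (ca->name[0] == '\0' || ca->ssthresh == NULL ||
	    ca->cong_avoid == NULL)
		return -EINVAL;
	return 0;
}

/* Toy callbacks: halve the window on loss (floor of 2) and grow it by
 * the number of newly acked segments. Not a real algorithm. */
static unsigned demo_ssthresh(unsigned cwnd)
{
	return cwnd / 2 > 2 ? cwnd / 2 : 2;
}

static void demo_cong_avoid(unsigned *cwnd, unsigned acked)
{
	*cwnd += acked;
}
```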
35 | Private data for a congestion control mechanism is stored in tp->ca_priv. | 35 | Private data for a congestion control mechanism is stored in tp->ca_priv. |
36 | tcp_ca(tp) returns a pointer to this space. This is preallocated space - it | 36 | tcp_ca(tp) returns a pointer to this space. This is preallocated space - it |
diff --git a/Documentation/networking/timestamping.txt b/Documentation/networking/timestamping.txt index 048c92b487f6..bc3554124903 100644 --- a/Documentation/networking/timestamping.txt +++ b/Documentation/networking/timestamping.txt | |||
@@ -202,6 +202,9 @@ Time stamps for outgoing packets are to be generated as follows: | |||
202 | and not free the skb. A driver not supporting hardware time stamping doesn't | 202 | and not free the skb. A driver not supporting hardware time stamping doesn't |
203 | do that. A driver must never touch sk_buff::tstamp! It is used to store | 203 | do that. A driver must never touch sk_buff::tstamp! It is used to store |
204 | software generated time stamps by the network subsystem. | 204 | software generated time stamps by the network subsystem. |
205 | - The driver should call skb_tx_timestamp() as close as possible to passing | ||
206 | the sk_buff to the hardware. skb_tx_timestamp() provides a software time stamp | ||
207 | if requested and hardware timestamping is not possible (SKBTX_IN_PROGRESS not set). | ||
205 | - As soon as the driver has sent the packet and/or obtained a | 208 | - As soon as the driver has sent the packet and/or obtained a |
206 | hardware time stamp for it, it passes the time stamp back by | 209 | hardware time stamp for it, it passes the time stamp back by |
207 | calling skb_hwtstamp_tx() with the original skb, the raw | 210 | calling skb_hwtstamp_tx() with the original skb, the raw |
@@ -212,6 +215,3 @@ Time stamps for outgoing packets are to be generated as follows: | |||
212 | this would occur at a later time in the processing pipeline than other | 215 | this would occur at a later time in the processing pipeline than other |
213 | software time stamping and therefore could lead to unexpected deltas | 216 | software time stamping and therefore could lead to unexpected deltas |
214 | between time stamps. | 217 | between time stamps. |
215 | - If the driver did not set the SKBTX_IN_PROGRESS flag (see above), then | ||
216 | dev_hard_start_xmit() checks whether software time stamping | ||
217 | is wanted as fallback and potentially generates the time stamp. | ||
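The transmit-side rule above (a software stamp only when one was requested and no hardware stamp is in progress) can be modelled in a few lines. This is a sketch of the decision logic only: the DEMO_* flag values, struct demo_skb and the demo_* functions are hypothetical stand-ins, not the kernel's sk_buff, tx flags or skb_tx_timestamp().

```c
#include <stdbool.h>

/* Hypothetical flag bits; the values are assumptions for this sketch. */
#define DEMO_SKBTX_HW_TSTAMP    (1u << 0)  /* HW stamp requested */
#define DEMO_SKBTX_SW_TSTAMP    (1u << 1)  /* SW stamp requested */
#define DEMO_SKBTX_IN_PROGRESS  (1u << 2)  /* driver will deliver a HW stamp */

struct demo_skb {
	unsigned tx_flags;
	bool sw_stamped;
};

/* Driver side: claim the HW stamp only if the device can deliver one. */
static void demo_driver_prepare(struct demo_skb *skb, bool hw_capable)
{
	if ((skb->tx_flags & DEMO_SKBTX_HW_TSTAMP) && hw_capable)
		skb->tx_flags |= DEMO_SKBTX_IN_PROGRESS;
}

/* Software fallback: stamp only if requested and no HW stamp is coming. */
static void demo_skb_tx_timestamp(struct demo_skb *skb)
{
	if ((skb->tx_flags & DEMO_SKBTX_SW_TSTAMP) &&
	    !(skb->tx_flags & DEMO_SKBTX_IN_PROGRESS))
		skb->sw_stamped = true;
}

/* Order in a hypothetical xmit path: decide on HW stamping first, then
 * take the SW stamp just before handing the buffer to the hardware. */
static void demo_xmit(struct demo_skb *skb, bool hw_capable)
{
	demo_driver_prepare(skb, hw_capable);
	demo_skb_tx_timestamp(skb);
	/* ... post descriptors to the NIC here ... */
}
```

Calling the stamp helper last keeps the software time stamp as close as possible to the moment the hardware takes ownership of the packet.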