author | Michał Mirosław <mirq-linux@rere.qmqm.pl> | 2011-07-13 01:27:00 -0400 |
---|---|---|
committer | David S. Miller <davem@davemloft.net> | 2011-07-13 01:27:00 -0400 |
commit | e5b1de1f5ebe0200e988e195fefb6c7396de6e20 (patch) | |
tree | 565ad61ba330f4ad592e0bdb9113d4d55ae6749b /Documentation | |
parent | 4c5102f94c175d81790a3a288e85efd4a8a1649a (diff) |
net: Add documentation for netdev features handling
v2: incorporated suggestions from Randy Dunlap
Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
Diffstat (limited to 'Documentation')
-rw-r--r-- | Documentation/networking/netdev-features.txt | 154 |
1 files changed, 154 insertions, 0 deletions
diff --git a/Documentation/networking/netdev-features.txt b/Documentation/networking/netdev-features.txt
new file mode 100644
index 000000000000..4b1c0dcef84c
--- /dev/null
+++ b/Documentation/networking/netdev-features.txt
@@ -0,0 +1,154 @@
1 | Netdev features mess and how to get out from it alive | ||
2 | ===================================================== | ||
3 | |||
4 | Author: | ||
5 | Michał Mirosław <mirq-linux@rere.qmqm.pl> | ||
6 | |||
7 | |||
8 | |||
9 | Part I: Feature sets | ||
10 | ====================== | ||
11 | |||
12 | Long gone are the days when a network card would just take and give packets | ||
13 | verbatim. Today's devices add multiple features and bugs (read: offloads) | ||
14 | that relieve an OS of various tasks like generating and checking checksums, | ||
15 | splitting packets, and classifying them. These capabilities and their state | ||
16 | are commonly referred to as netdev features in the Linux kernel world. | ||
17 | |||
18 | There are currently three sets of features relevant to the driver, and | ||
19 | one used internally by the network core: | ||
20 | |||
21 | 1. netdev->hw_features set contains features whose state may be changed | ||
22 | (enabled or disabled) for a particular device at the user's request. | ||
23 | This set should be initialized in the ndo_init callback and not | ||
24 | changed later. | ||
25 | |||
26 | 2. netdev->features set contains features which are currently enabled | ||
27 | for a device. This should be changed only by the network core or in | ||
28 | error paths of the ndo_set_features callback. | ||
29 | |||
30 | 3. netdev->vlan_features set contains features whose state is inherited | ||
31 | by child VLAN devices (it limits the child's netdev->features set). | ||
32 | This is currently used for all VLAN devices, whether tags are stripped | ||
33 | or inserted in hardware or software. | ||
34 | |||
35 | 4. netdev->wanted_features set contains the feature set requested by the | ||
36 | user. This set is filtered by the ndo_fix_features callback whenever it | ||
37 | or some device-specific conditions change. This set is internal to the | ||
38 | networking core and should not be referenced in drivers. | ||
39 | |||
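The four sets above are bitmasks, and their relationship can be sketched in a small user-space model. This is an illustration only, not kernel code: the F_* bit values, struct model_netdev, and both helper functions are hypothetical names invented for this sketch, and the rule that a child VLAN device gets at most features & vlan_features is an interpretation of item 3 rather than a copy of the kernel's logic.

```c
#include <stdint.h>

/* Hypothetical feature bits, for illustration only. The real NETIF_F_*
 * values are defined by the kernel headers and differ from these. */
#define F_SG      (1u << 0)
#define F_TSO     (1u << 1)
#define F_HIGHDMA (1u << 2)

/* User-space stand-in for the feature fields of struct net_device. */
struct model_netdev {
	uint32_t hw_features;     /* the user may toggle these */
	uint32_t features;        /* currently enabled */
	uint32_t vlan_features;   /* inheritable by child VLAN devices */
	uint32_t wanted_features; /* requested by the user; core-internal */
};

/*
 * Record a user request to set the bits in 'mask' to 'values'.
 * Only bits present in hw_features are accepted. The refused bits
 * are returned so that the caller can report them.
 */
static uint32_t model_user_request(struct model_netdev *dev,
				   uint32_t mask, uint32_t values)
{
	uint32_t changeable = mask & dev->hw_features;

	dev->wanted_features = (dev->wanted_features & ~changeable) |
			       (values & changeable);
	return mask & ~dev->hw_features;
}

/* Upper bound on what a child VLAN device could inherit from 'dev'. */
static uint32_t model_vlan_child_features(const struct model_netdev *dev)
{
	return dev->features & dev->vlan_features;
}
```

In this model a request only updates wanted_features; the enabled set (features) changes later, when the recalculation described in Part II runs.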
40 | |||
41 | |||
42 | Part II: Controlling enabled features | ||
43 | ======================================= | ||
44 | |||
45 | When the current feature set (netdev->features) is to be changed, a new set | ||
46 | is calculated and filtered by calling the ndo_fix_features callback | ||
47 | and netdev_fix_features(). If the resulting set differs from the current | ||
48 | set, it is passed to the ndo_set_features callback and (if the callback | ||
49 | returns success) replaces the value stored in netdev->features. | ||
50 | A NETDEV_FEAT_CHANGE notification is issued after that whenever the current | ||
51 | set might have changed. | ||
52 | |||
53 | The following events trigger recalculation: | ||
54 | 1. device registration, after ndo_init has returned success | ||
55 | 2. a user request to change the state of features | ||
56 | 3. a call to netdev_update_features() | ||
57 | |||
58 | ndo_*_features callbacks are called with rtnl_lock held. Missing callbacks | ||
59 | are treated as always returning success. | ||
60 | |||
61 | A driver that wants to trigger recalculation must do so by calling | ||
62 | netdev_update_features() while holding rtnl_lock. This should not be done | ||
63 | from ndo_*_features callbacks. A driver should not modify netdev->features | ||
64 | except through ndo_fix_features or in error paths of ndo_set_features. | ||
65 | |||
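The recalculation flow in Part II can be sketched as a user-space model. Everything here is a hypothetical stand-in: struct model_ops, struct model_netdev, model_core_fix(), and model_update_features() are names made up for illustration, locking and notifications are left out, and the single "TSO needs SG" rule in model_core_fix() is only an assumed example of a core-imposed limitation.

```c
#include <stddef.h>
#include <stdint.h>

#define F_SG  (1u << 0) /* hypothetical bit values */
#define F_TSO (1u << 1)

struct model_netdev;

/* Stand-ins for the two driver callbacks; either may be NULL. */
struct model_ops {
	uint32_t (*fix_features)(struct model_netdev *dev, uint32_t features);
	int (*set_features)(struct model_netdev *dev, uint32_t features);
};

struct model_netdev {
	uint32_t features;        /* currently enabled */
	uint32_t wanted_features; /* requested by the user */
	const struct model_ops *ops;
};

/* Stand-in for netdev_fix_features(): one example core-imposed rule. */
static uint32_t model_core_fix(uint32_t features)
{
	if ((features & F_TSO) && !(features & F_SG))
		features &= ~F_TSO; /* assumed rule: TSO needs SG */
	return features;
}

/*
 * Returns 1 if the current set might have changed (a notification
 * would follow), 0 if nothing changed, -1 if set_features failed.
 */
static int model_update_features(struct model_netdev *dev)
{
	uint32_t features = dev->wanted_features;
	int err = 0;

	/* A missing callback is treated as always returning success. */
	if (dev->ops && dev->ops->fix_features)
		features = dev->ops->fix_features(dev, features);
	features = model_core_fix(features);

	if (features == dev->features)
		return 0;

	if (dev->ops && dev->ops->set_features)
		err = dev->ops->set_features(dev, features);
	if (err < 0)
		return -1;
	if (err == 0)
		dev->features = features;
	return 1;
}
```

The model keeps the order described above: filter first, compare with the current set, and only then ask the driver to reconfigure the hardware.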
66 | |||
67 | |||
68 | Part III: Implementation hints | ||
69 | ================================ | ||
70 | |||
71 | * ndo_fix_features: | ||
72 | |||
73 | All dependencies between features should be resolved here. The resulting | ||
74 | set can be reduced further by limitations imposed by the networking core | ||
75 | (as coded in netdev_fix_features()). For this reason it is safer to disable | ||
76 | a feature when its dependencies are not met than to force the dependency on. | ||
77 | |||
78 | This callback should modify neither hardware nor driver state (it should | ||
79 | be stateless). It can be called multiple times between successive | ||
80 | ndo_set_features calls. | ||
81 | |||
82 | The callback must not alter features contained in the NETIF_F_SOFT_FEATURES | ||
83 | or NETIF_F_NEVER_CHANGE sets. The exception is NETIF_F_VLAN_CHALLENGED, but | ||
84 | care must be taken as the change won't affect already configured VLANs. | ||
85 | |||
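A stateless ndo_fix_features-style filter might look like the following user-space sketch. The bit values, struct model_netdev, model_fix_features(), and both hardware rules (TSO needing SG plus TX checksumming, and no TSO above a 1500-byte MTU) are hypothetical and chosen only to illustrate the advice above: drop the dependent feature instead of forcing its dependencies on, and read state without modifying it.

```c
#include <stdint.h>

/* Hypothetical feature bits, for illustration only. */
#define F_SG      (1u << 0)
#define F_TX_CSUM (1u << 1)
#define F_TSO     (1u << 2)

/* User-space stand-in for the device state the filter may read. */
struct model_netdev {
	unsigned int mtu;
};

/* Pure function of its inputs: touches neither hardware nor 'dev'. */
static uint32_t model_fix_features(const struct model_netdev *dev,
				   uint32_t features)
{
	/*
	 * Hypothetical hardware rule: TSO needs both SG and TX
	 * checksumming. Drop TSO rather than switching them on.
	 */
	if ((features & F_TSO) &&
	    (features & (F_SG | F_TX_CSUM)) != (F_SG | F_TX_CSUM))
		features &= ~F_TSO;

	/* Hypothetical device-specific condition: no TSO with jumbo frames. */
	if (dev->mtu > 1500)
		features &= ~F_TSO;

	return features;
}
```

Because the function has no side effects, calling it several times with the same inputs gives the same result. In a real driver, a change to a condition such as the MTU rule here would be the kind of event that calls for netdev_update_features().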
86 | * ndo_set_features: | ||
87 | |||
88 | Hardware should be reconfigured to match the passed feature set. The set | ||
89 | should not be altered unless some error condition happens that can't | ||
90 | be reliably detected in ndo_fix_features. In this case, the callback | ||
91 | should update netdev->features to match the resulting hardware state. | ||
92 | Errors returned are not (and cannot be) propagated anywhere except dmesg. | ||
93 | (Note: a successful return is zero; >0 means a silent error.) | ||
94 | |||
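The error path of ndo_set_features can be illustrated with a user-space sketch. All names are hypothetical: struct model_netdev, its hw_state and tso_engine_broken fields, and model_set_features() exist only for this example, and the "broken TSO engine" stands in for any failure that can only be noticed while programming the hardware.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical feature bits, for illustration only. */
#define F_RXCSUM (1u << 0)
#define F_TSO    (1u << 1)

/* User-space stand-in for a device and its hardware registers. */
struct model_netdev {
	uint32_t features;     /* currently enabled, as seen by the core */
	uint32_t hw_state;     /* what the "hardware" is programmed to do */
	int tso_engine_broken; /* fault visible only when programming hw */
};

/* Returns 0 on success or a negative errno value on failure. */
static int model_set_features(struct model_netdev *dev, uint32_t features)
{
	uint32_t applied = features;

	/* Try to program the hardware; TSO may fail at this point. */
	if ((features & F_TSO) && dev->tso_engine_broken)
		applied &= ~F_TSO;
	dev->hw_state = applied;

	if (applied != features) {
		/* Error path: make the recorded set match the hardware. */
		dev->features = applied;
		return -EIO;
	}

	/* On success the caller, not the callback, stores the new set. */
	return 0;
}
```

On success the sketch leaves features alone, since the core is the one that replaces the stored value. Only the failure branch writes it.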
95 | |||
96 | |||
97 | Part IV: Features | ||
98 | =================== | ||
99 | |||
100 | For the current list of features, see include/linux/netdev_features.h. | ||
101 | This section describes the semantics of some of them. | ||
102 | |||
103 | * Transmit checksumming | ||
104 | |||
105 | For a complete description, see comments near the top of include/linux/skbuff.h. | ||
106 | |||
107 | Note: NETIF_F_HW_CSUM is a superset of NETIF_F_IP_CSUM + NETIF_F_IPV6_CSUM. | ||
108 | It means that the device can fill in a TCP/UDP-like checksum anywhere in the | ||
109 | packet, whatever headers there might be. | ||
110 | |||
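The superset relation between the transmit checksum features can be written out as a small decision function. This is a user-space sketch under stated assumptions: the bit values, enum model_l3, and model_can_offload_csum() are hypothetical, and the reading that NETIF_F_IP_CSUM covers plain TCP/UDP over IPv4 while NETIF_F_IPV6_CSUM covers plain TCP/UDP over IPv6 should be checked against the comments in include/linux/skbuff.h.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical feature bits, for illustration only. */
#define F_IP_CSUM   (1u << 0)
#define F_IPV6_CSUM (1u << 1)
#define F_HW_CSUM   (1u << 2)

enum model_l3 { MODEL_IPV4, MODEL_IPV6, MODEL_OTHER };

/*
 * Can the device fill in the checksum for this packet?
 * 'plain_tcp_udp' is true for ordinary TCP or UDP directly over the
 * given network protocol, with no unusual headers in between.
 */
static bool model_can_offload_csum(uint32_t features, enum model_l3 l3,
				   bool plain_tcp_udp)
{
	/* The generic feature covers every case the specific ones do. */
	if (features & F_HW_CSUM)
		return true;
	if (!plain_tcp_udp)
		return false;
	if (l3 == MODEL_IPV4)
		return (features & F_IP_CSUM) != 0;
	if (l3 == MODEL_IPV6)
		return (features & F_IPV6_CSUM) != 0;
	return false;
}
```

Any packet accepted with the two protocol-specific bits is also accepted with the generic bit alone, which is the sense in which the generic feature is a superset.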
111 | * Transmit TCP segmentation offload | ||
112 | |||
113 | NETIF_F_TSO_ECN means that hardware can properly split packets with the CWR | ||
114 | bit set, be it TCPv4 (when NETIF_F_TSO is enabled) or TCPv6 (NETIF_F_TSO6). | ||
115 | |||
116 | * Transmit DMA from high memory | ||
117 | |||
118 | On platforms where this is relevant, NETIF_F_HIGHDMA signals that | ||
119 | ndo_start_xmit can handle skbs with frags in high memory. | ||
120 | |||
121 | * Transmit scatter-gather | ||
122 | |||
123 | These features say that ndo_start_xmit can handle fragmented skbs: | ||
124 | NETIF_F_SG --- paged skbs (skb_shinfo()->frags), NETIF_F_FRAGLIST --- | ||
125 | chained skbs (skb->next/prev list). | ||
126 | |||
127 | * Software features | ||
128 | |||
129 | Features contained in NETIF_F_SOFT_FEATURES are features of the networking | ||
130 | stack. Drivers should not change behaviour based on them. | ||
131 | |||
132 | * LLTX driver (deprecated for hardware drivers) | ||
133 | |||
134 | NETIF_F_LLTX should be set in drivers that implement their own locking in | ||
135 | the transmit path or don't need locking at all (e.g. software tunnels). | ||
136 | In ndo_start_xmit, it is recommended to use a try_lock and return | ||
137 | NETDEV_TX_LOCKED when taking the spin lock fails. The locking should also | ||
138 | properly protect against other callbacks (you need to work out the rules). | ||
139 | |||
140 | Don't use it for new drivers. | ||
141 | |||
142 | * netns-local device | ||
143 | |||
144 | NETIF_F_NETNS_LOCAL is set for devices that are not allowed to move between | ||
145 | network namespaces (e.g. loopback). | ||
146 | |||
147 | Don't use it in drivers. | ||
148 | |||
149 | * VLAN challenged | ||
150 | |||
151 | NETIF_F_VLAN_CHALLENGED should be set for devices which can't cope with VLAN | ||
152 | headers. Some drivers set this because the cards can't handle the bigger MTU. | ||
153 | [FIXME: Those cases could be fixed in VLAN code by allowing only reduced-MTU | ||
154 | VLANs. This may not be useful, though.] | ||