Diffstat (limited to 'Documentation/networking')
-rw-r--r-- | Documentation/networking/multiqueue.txt | 111 |
1 files changed, 111 insertions, 0 deletions
diff --git a/Documentation/networking/multiqueue.txt b/Documentation/networking/multiqueue.txt
new file mode 100644
index 000000000000..00b60cce2224
--- /dev/null
+++ b/Documentation/networking/multiqueue.txt
@@ -0,0 +1,111 @@ | |||
1 | |||
2 | HOWTO for multiqueue network device support | ||
3 | =========================================== | ||
4 | |||
5 | Section 1: Base driver requirements for implementing multiqueue support | ||
6 | Section 2: Qdisc support for multiqueue devices | ||
7 | Section 3: Brief howto using PRIO and RR for multiqueue devices | ||
8 | |||
9 | |||
10 | Intro: Kernel support for multiqueue devices | ||
11 | -------------------------------------------- | ||
12 | |||
13 | Kernel support for multiqueue devices is only an API that is presented to the | ||
14 | netdevice layer for base drivers to implement. This feature is part of the | ||
15 | core networking stack, and all network devices will be running on the | ||
16 | multiqueue-aware stack. If a base driver only has one queue, then these | ||
17 | changes are transparent to that driver. | ||
18 | |||
19 | |||
20 | Section 1: Base driver requirements for implementing multiqueue support | ||
21 | ----------------------------------------------------------------------- | ||
22 | |||
23 | Base drivers are required to use the new alloc_etherdev_mq() or | ||
24 | alloc_netdev_mq() functions to allocate the subqueues for the device. The | ||
25 | underlying kernel API will take care of the allocation and deallocation of | ||
26 | the subqueue memory, as well as netdev configuration of where the queues | ||
27 | exist in memory. | ||
28 | |||
29 | The base driver will also need to manage the individual queues, as it does | ||
30 | the global netdev->queue_lock today. Therefore base drivers should use the | ||
31 | netif_{start|stop|wake}_subqueue() functions to manage each queue while the | ||
32 | device is still operational. netdev->queue_lock is still used when the device | ||
33 | comes online or when it's completely shut down (unregister_netdev(), etc.). | ||
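The per-queue flow control described above can be pictured with a small
userspace model. This is only a sketch, not kernel code: the struct and the
model_* helpers are hypothetical names invented for illustration, and the
model assumes at most 32 subqueues so that one bit per queue fits in a
32-bit word. It shows the key idea that stopping one subqueue leaves the
others running.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace model; these are NOT the kernel's structures. */
struct model_netdev {
	unsigned int num_queues; /* number of Tx subqueues, assumed <= 32 */
	uint32_t stopped;        /* bit i set => subqueue i is stopped */
};

/* Models the effect of netif_stop_subqueue(): stop only queue q. */
static void model_stop_subqueue(struct model_netdev *dev, unsigned int q)
{
	if (q < dev->num_queues)
		dev->stopped |= (UINT32_C(1) << q);
}

/* Models the effect of netif_wake_subqueue(): let queue q transmit again. */
static void model_wake_subqueue(struct model_netdev *dev, unsigned int q)
{
	if (q < dev->num_queues)
		dev->stopped &= ~(UINT32_C(1) << q);
}

/* Reports whether queue q is currently stopped; out-of-range is "no". */
static bool model_subqueue_stopped(const struct model_netdev *dev,
				   unsigned int q)
{
	return q < dev->num_queues &&
	       (dev->stopped & (UINT32_C(1) << q)) != 0;
}
```

In a real driver the stop call would typically be made when one Tx ring
fills, and the wake call when its descriptors are reclaimed; the model only
tracks the resulting per-queue state.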
34 | |||
35 | Finally, the base driver should indicate that it is a multiqueue device. The | ||
36 | feature flag NETIF_F_MULTI_QUEUE should be added to the netdev->features | ||
37 | bitmap on device initialization. Below is an example from e1000: | ||
38 | |||
39 | #ifdef CONFIG_E1000_MQ | ||
40 | if ( (adapter->hw.mac.type == e1000_82571) || | ||
41 | (adapter->hw.mac.type == e1000_82572) || | ||
42 | (adapter->hw.mac.type == e1000_80003es2lan)) | ||
43 | netdev->features |= NETIF_F_MULTI_QUEUE; | ||
44 | #endif | ||
45 | |||
46 | |||
47 | Section 2: Qdisc support for multiqueue devices | ||
48 | ----------------------------------------------- | ||
49 | |||
50 | Currently two qdiscs support multiqueue devices: a new round-robin qdisc, | ||
51 | sch_rr, and sch_prio. The qdisc is responsible for classifying the skb's into | ||
52 | bands and queues, and will store the queue mapping into skb->queue_mapping. | ||
53 | Use this field in the base driver to determine which queue to send the skb | ||
54 | to. | ||
55 | |||
56 | sch_rr has been added for hardware that doesn't want scheduling policies from | ||
57 | software, so it's a straight round-robin qdisc. It uses the same syntax and | ||
58 | classification priomap that sch_prio uses, so it should be intuitive to | ||
59 | configure for people who've used sch_prio. | ||
60 | |||
61 | The PRIO qdisc naturally plugs into a multiqueue device. If PRIO has been | ||
62 | built with NET_SCH_PRIO_MQ, then upon load, it will make sure the number of | ||
63 | bands requested is equal to the number of queues on the hardware. If they | ||
64 | are equal, it sets up a one-to-one mapping between the queues and bands. If | ||
65 | they're not equal, it will not load the qdisc. RR behaves the same way. | ||
66 | Once the association is made, any skb that is classified will have | ||
67 | skb->queue_mapping set, which will allow the driver to properly queue skb's | ||
68 | to multiple queues. | ||
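The load-time rule above (band count must equal queue count, then map one to
one) can be sketched in a few lines of C. This is a userspace illustration
under stated assumptions, not the sch_prio source: model_map_bands() and its
parameters are hypothetical names, and the sketch does not model the
"0 bands" shortcut described in Section 3.

```c
/*
 * Hypothetical model of the qdisc load check.
 * Returns 0 and fills band_to_queue[0..bands-1] when the requested band
 * count matches the hardware queue count; returns -1 (qdisc would not
 * load) otherwise and leaves band_to_queue untouched.
 */
static int model_map_bands(unsigned int bands, unsigned int hw_queues,
			   unsigned int *band_to_queue)
{
	unsigned int i;

	if (bands != hw_queues)
		return -1;

	for (i = 0; i < bands; i++)
		band_to_queue[i] = i; /* band N => queue N */

	return 0;
}
```

The value stored for a band is what a classified skb would carry in
skb->queue_mapping in this model.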
69 | |||
70 | |||
71 | Section 3: Brief howto using PRIO and RR for multiqueue devices | ||
72 | --------------------------------------------------------------- | ||
73 | |||
74 | The userspace command 'tc', part of the iproute2 package, is used to configure | ||
75 | qdiscs. To add the PRIO qdisc to your network device, assuming the device is | ||
76 | called eth0, run the following command: | ||
77 | |||
78 | # tc qdisc add dev eth0 root handle 1: prio bands 4 multiqueue | ||
79 | |||
80 | This will create 4 bands, 0 being highest priority, and associate those bands | ||
81 | with the queues on your NIC. Assuming eth0 has 4 Tx queues, the band mapping | ||
82 | would look like: | ||
83 | |||
84 | band 0 => queue 0 | ||
85 | band 1 => queue 1 | ||
86 | band 2 => queue 2 | ||
87 | band 3 => queue 3 | ||
88 | |||
89 | Traffic will begin flowing through each queue if your TOS values are assigning | ||
90 | traffic across the various bands. For example, ssh traffic will always try to | ||
91 | go out band 0 based on TOS -> Linux priority conversion (realtime traffic), | ||
92 | so it will be sent out queue 0. ICMP traffic (pings) falls into the "normal" | ||
93 | traffic classification, which is band 1. Therefore pings will be sent out | ||
94 | queue 1 on the NIC. | ||
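The ssh and ping examples above follow from the priority-to-band lookup. The
sketch below is a userspace illustration, not the qdisc code. It assumes the
default prio priomap (1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1) and the Linux
priority values 6 for interactive traffic and 0 for best effort; check both
against your kernel and iproute2 before relying on them. The model_* names
are hypothetical.

```c
/* Highest Linux packet priority index used by the priomap (assumed 15). */
#define MODEL_PRIO_MAX 15

/* Assumed default prio priomap: index is Linux priority, value is band. */
static const unsigned char model_priomap[MODEL_PRIO_MAX + 1] = {
	1, 2, 2, 2, 1, 2, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1
};

/*
 * Returns the Tx queue for a given Linux priority, assuming the
 * one-to-one band => queue mapping set up by the multiqueue option.
 */
static unsigned int model_queue_for_priority(unsigned int prio)
{
	unsigned int band = model_priomap[prio & MODEL_PRIO_MAX];

	return band; /* band N => queue N */
}
```

With these assumptions, priority 6 (interactive, e.g. ssh) lands on queue 0
and priority 0 (best effort, e.g. ping) lands on queue 1, matching the text.
Note that the default priomap never selects band 3, so a fourth queue only
sees traffic once the priomap or tc filters direct packets to it.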
95 | |||
96 | Note the use of the multiqueue keyword. This keyword is only available in | ||
97 | versions of iproute2 that support multiqueue networking devices; if it is | ||
98 | omitted when loading a qdisc onto a multiqueue device, the qdisc will load | ||
99 | and operate the same as if it were loaded onto a single-queue device (i.e., | ||
100 | it sends all traffic to queue 0). | ||
101 | |||
102 | An alternative way to allocate bands is to use the multiqueue option and | ||
103 | specify 0 bands. In that case, the qdisc will set the number of bands | ||
104 | equal to the number of queues that the device reports, and bring the qdisc | ||
105 | online. | ||
106 | |||
107 | The behavior of tc filters remains the same: they override TOS priority | ||
108 | classification. | ||
109 | |||
110 | |||
111 | Author: Peter P. Waskiewicz Jr. <peter.p.waskiewicz.jr@intel.com> | ||