Diffstat (limited to 'Documentation/networking/scaling.txt')
-rw-r--r-- | Documentation/networking/scaling.txt | 23 |
1 files changed, 15 insertions, 8 deletions
diff --git a/Documentation/networking/scaling.txt b/Documentation/networking/scaling.txt
index 7254b4b5910e..58fd7414e6c0 100644
--- a/Documentation/networking/scaling.txt
+++ b/Documentation/networking/scaling.txt
@@ -52,7 +52,8 @@ module parameter for specifying the number of hardware queues to
52 | configure. In the bnx2x driver, for instance, this parameter is called | 52 | configure. In the bnx2x driver, for instance, this parameter is called |
53 | num_queues. A typical RSS configuration would be to have one receive queue | 53 | num_queues. A typical RSS configuration would be to have one receive queue |
54 | for each CPU if the device supports enough queues, or otherwise at least | 54 | for each CPU if the device supports enough queues, or otherwise at least |
55 | one for each cache domain at a particular cache level (L1, L2, etc.). | 55 | one for each memory domain, where a memory domain is a set of CPUs that |
56 | share a particular memory level (L1, L2, NUMA node, etc.). | ||
56 | 57 | ||
57 | The indirection table of an RSS device, which resolves a queue by masked | 58 | The indirection table of an RSS device, which resolves a queue by masked |
58 | hash, is usually programmed by the driver at initialization. The | 59 | hash, is usually programmed by the driver at initialization. The |
@@ -82,11 +83,17 @@ RSS should be enabled when latency is a concern or whenever receive
82 | interrupt processing forms a bottleneck. Spreading load between CPUs | 83 | interrupt processing forms a bottleneck. Spreading load between CPUs |
83 | decreases queue length. For low latency networking, the optimal setting | 84 | decreases queue length. For low latency networking, the optimal setting |
84 | is to allocate as many queues as there are CPUs in the system (or the | 85 | is to allocate as many queues as there are CPUs in the system (or the |
85 | NIC maximum, if lower). Because the aggregate number of interrupts grows | 86 | NIC maximum, if lower). The most efficient high-rate configuration |
86 | with each additional queue, the most efficient high-rate configuration | ||
87 | is likely the one with the smallest number of receive queues where no | 87 | is likely the one with the smallest number of receive queues where no |
88 | CPU that processes receive interrupts reaches 100% utilization. Per-cpu | 88 | receive queue overflows due to a saturated CPU, because in default |
89 | load can be observed using the mpstat utility. | 89 | mode with interrupt coalescing enabled, the aggregate number of |
90 | interrupts (and thus work) grows with each additional queue. | ||
91 | |||
92 | Per-cpu load can be observed using the mpstat utility, but note that on | ||
93 | processors with hyperthreading (HT), each hyperthread is represented as | ||
94 | a separate CPU. For interrupt handling, HT has shown no benefit in | ||
95 | initial tests, so limit the number of queues to the number of CPU cores | ||
96 | in the system. | ||
90 | 97 | ||
91 | 98 | ||
92 | RPS: Receive Packet Steering | 99 | RPS: Receive Packet Steering |
@@ -145,7 +152,7 @@ the bitmap.
145 | == Suggested Configuration | 152 | == Suggested Configuration |
146 | 153 | ||
147 | For a single queue device, a typical RPS configuration would be to set | 154 | For a single queue device, a typical RPS configuration would be to set |
148 | the rps_cpus to the CPUs in the same cache domain of the interrupting | 155 | the rps_cpus to the CPUs in the same memory domain of the interrupting |
149 | CPU. If NUMA locality is not an issue, this could also be all CPUs in | 156 | CPU. If NUMA locality is not an issue, this could also be all CPUs in |
150 | the system. At high interrupt rate, it might be wise to exclude the | 157 | the system. At high interrupt rate, it might be wise to exclude the |
151 | interrupting CPU from the map since that already performs much work. | 158 | interrupting CPU from the map since that already performs much work. |
@@ -154,7 +161,7 @@ For a multi-queue system, if RSS is configured so that a hardware
154 | receive queue is mapped to each CPU, then RPS is probably redundant | 161 | receive queue is mapped to each CPU, then RPS is probably redundant |
155 | and unnecessary. If there are fewer hardware queues than CPUs, then | 162 | and unnecessary. If there are fewer hardware queues than CPUs, then |
156 | RPS might be beneficial if the rps_cpus for each queue are the ones that | 163 | RPS might be beneficial if the rps_cpus for each queue are the ones that |
157 | share the same cache domain as the interrupting CPU for that queue. | 164 | share the same memory domain as the interrupting CPU for that queue. |
158 | 165 | ||
159 | 166 | ||
160 | RFS: Receive Flow Steering | 167 | RFS: Receive Flow Steering |
@@ -326,7 +333,7 @@ The queue chosen for transmitting a particular flow is saved in the
326 | corresponding socket structure for the flow (e.g. a TCP connection). | 333 | corresponding socket structure for the flow (e.g. a TCP connection). |
327 | This transmit queue is used for subsequent packets sent on the flow to | 334 | This transmit queue is used for subsequent packets sent on the flow to |
328 | prevent out of order (ooo) packets. The choice also amortizes the cost | 335 | prevent out of order (ooo) packets. The choice also amortizes the cost |
329 | of calling get_xps_queues() over all packets in the connection. To avoid | 336 | of calling get_xps_queues() over all packets in the flow. To avoid |
330 | ooo packets, the queue for a flow can subsequently only be changed if | 337 | ooo packets, the queue for a flow can subsequently only be changed if |
331 | skb->ooo_okay is set for a packet in the flow. This flag indicates that | 338 | skb->ooo_okay is set for a packet in the flow. This flag indicates that |
332 | there are no outstanding packets in the flow, so the transmit queue can | 339 | there are no outstanding packets in the flow, so the transmit queue can |