author Willem de Bruijn <willemb@google.com> 2011-08-11 10:41:48 -0400
committer David S. Miller <davem@davemloft.net> 2011-08-13 21:00:33 -0400
commit 320f24e482e6b390c608c6afec253405f9ab7436
tree cfb6f94a5bb381ab6794270a9db032d8c4a82c0b
parent b88cf73d9278a5838e3ac2b670ab3b4ff533ea17
net: minor update to Documentation/networking/scaling.txt
Incorporate last comments about hyperthreading, interrupt coalescing and
the definition of cache domains into the network scaling document
scaling.txt

Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Diffstat (limited to 'Documentation/networking/scaling.txt')
-rw-r--r-- Documentation/networking/scaling.txt | 23
1 files changed, 15 insertions, 8 deletions
diff --git a/Documentation/networking/scaling.txt b/Documentation/networking/scaling.txt
index 7254b4b5910e..58fd7414e6c0 100644
--- a/Documentation/networking/scaling.txt
+++ b/Documentation/networking/scaling.txt
@@ -52,7 +52,8 @@ module parameter for specifying the number of hardware queues to
 configure. In the bnx2x driver, for instance, this parameter is called
 num_queues. A typical RSS configuration would be to have one receive queue
 for each CPU if the device supports enough queues, or otherwise at least
-one for each cache domain at a particular cache level (L1, L2, etc.).
+one for each memory domain, where a memory domain is a set of CPUs that
+share a particular memory level (L1, L2, NUMA node, etc.).
 
 The indirection table of an RSS device, which resolves a queue by masked
 hash, is usually programmed by the driver at initialization. The
@@ -82,11 +83,17 @@ RSS should be enabled when latency is a concern or whenever receive
 interrupt processing forms a bottleneck. Spreading load between CPUs
 decreases queue length. For low latency networking, the optimal setting
 is to allocate as many queues as there are CPUs in the system (or the
-NIC maximum, if lower). Because the aggregate number of interrupts grows
-with each additional queue, the most efficient high-rate configuration
+NIC maximum, if lower). The most efficient high-rate configuration
 is likely the one with the smallest number of receive queues where no
-CPU that processes receive interrupts reaches 100% utilization. Per-cpu
-load can be observed using the mpstat utility.
+receive queue overflows due to a saturated CPU, because in default
+mode with interrupt coalescing enabled, the aggregate number of
+interrupts (and thus work) grows with each additional queue.
+
+Per-cpu load can be observed using the mpstat utility, but note that on
+processors with hyperthreading (HT), each hyperthread is represented as
+a separate CPU. For interrupt handling, HT has shown no benefit in
+initial tests, so limit the number of queues to the number of CPU cores
+in the system.
 
 
 RPS: Receive Packet Steering
@@ -145,7 +152,7 @@ the bitmap.
 == Suggested Configuration
 
 For a single queue device, a typical RPS configuration would be to set
-the rps_cpus to the CPUs in the same cache domain of the interrupting
+the rps_cpus to the CPUs in the same memory domain of the interrupting
 CPU. If NUMA locality is not an issue, this could also be all CPUs in
 the system. At high interrupt rate, it might be wise to exclude the
 interrupting CPU from the map since that already performs much work.
@@ -154,7 +161,7 @@ For a multi-queue system, if RSS is configured so that a hardware
 receive queue is mapped to each CPU, then RPS is probably redundant
 and unnecessary. If there are fewer hardware queues than CPUs, then
 RPS might be beneficial if the rps_cpus for each queue are the ones that
-share the same cache domain as the interrupting CPU for that queue.
+share the same memory domain as the interrupting CPU for that queue.
 
 
 RFS: Receive Flow Steering
@@ -326,7 +333,7 @@ The queue chosen for transmitting a particular flow is saved in the
 corresponding socket structure for the flow (e.g. a TCP connection).
 This transmit queue is used for subsequent packets sent on the flow to
 prevent out of order (ooo) packets. The choice also amortizes the cost
-of calling get_xps_queues() over all packets in the connection. To avoid
+of calling get_xps_queues() over all packets in the flow. To avoid
 ooo packets, the queue for a flow can subsequently only be changed if
 skb->ooo_okay is set for a packet in the flow. This flag indicates that
 there are no outstanding packets in the flow, so the transmit queue can