author    Willem de Bruijn <willemb@google.com>    2013-05-22 03:54:40 -0400
committer David S. Miller <davem@davemloft.net>    2013-05-23 21:54:44 -0400
commit    191cb1f21afd9a7fbaa085ad9b86cb307e9a3891 (patch)
tree      b5a7079ee9c55bb065301cdfe895b497d50002ee /Documentation/networking/scaling.txt
parent    161f65ba3583b84b4714f21dbee263f99824c516 (diff)
rps: document flow limit in scaling.txt
Explain the mechanism and API of the recently merged rps flow limit patch.

Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Diffstat (limited to 'Documentation/networking/scaling.txt')
-rw-r--r--    Documentation/networking/scaling.txt    58
1 file changed, 58 insertions, 0 deletions
diff --git a/Documentation/networking/scaling.txt b/Documentation/networking/scaling.txt
index 579994afbe06..ca6977f5b2ed 100644
--- a/Documentation/networking/scaling.txt
+++ b/Documentation/networking/scaling.txt
@@ -163,6 +163,64 @@ and unnecessary. If there are fewer hardware queues than CPUs, then
RPS might be beneficial if the rps_cpus for each queue are the ones that
share the same memory domain as the interrupting CPU for that queue.

==== RPS Flow Limit

RPS scales kernel receive processing across CPUs without introducing
reordering. The trade-off of sending all packets from the same flow
to the same CPU is CPU load imbalance if flows vary in packet rate.
In the extreme case a single flow dominates traffic. Especially on
common server workloads with many concurrent connections, such
behavior indicates a problem such as a misconfiguration or a spoofed
source Denial of Service attack.

Flow Limit is an optional RPS feature that prioritizes small flows
during CPU contention by dropping packets from large flows slightly
ahead of those from small flows. It is active only when an RPS or RFS
destination CPU approaches saturation. Once a CPU's input packet
queue exceeds half the maximum queue length (as set by sysctl
net.core.netdev_max_backlog), the kernel starts a per-flow packet
count over the last 256 packets. If a flow exceeds a set ratio (by
default, half) of these packets when a new packet arrives, then the
new packet is dropped. Packets from other flows are still only
dropped once the input packet queue reaches netdev_max_backlog.
No packets are dropped when the input packet queue length is below
the threshold, so flow limit does not sever connections outright:
even large flows maintain connectivity.

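As a worked example (assuming the default netdev_max_backlog of 1000):
flow limit on a given CPU becomes active once that CPU's input packet
queue holds more than 500 packets. From then on, a newly arriving
packet is dropped if its flow accounts for more than half of the last
256 packets seen by that CPU, i.e. more than 128 of them; packets from
all other flows are enqueued normally until the queue reaches the full
1000-packet limit.
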
== Interface

Flow limit is compiled in by default (CONFIG_NET_FLOW_LIMIT), but not
turned on. It is implemented for each CPU independently (to avoid lock
and cache contention) and toggled per CPU by setting the relevant bit
in sysctl net.core.flow_limit_cpu_bitmap. It exposes the same CPU
bitmap interface as rps_cpus (see above) when called from procfs:

    /proc/sys/net/core/flow_limit_cpu_bitmap

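For example, to turn the feature on for CPUs 0-3 only (a hypothetical
assignment; the value is the same hexadecimal CPU mask format as
rps_cpus), one might write:

    echo f > /proc/sys/net/core/flow_limit_cpu_bitmap
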
Per-flow rate is calculated by hashing each packet into a hashtable
bucket and incrementing a per-bucket counter. The hash function is
the same as the one that selects a CPU in RPS, but as the number of
buckets can be much larger than the number of CPUs, flow limit has
finer-grained identification of large flows and fewer false positives.
The default table has 4096 buckets. This value can be modified through
sysctl

    net.core.flow_limit_table_len

The value is only consulted when a new table is allocated. Modifying
it does not update active tables.

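For example, to use a larger table on a system with very many flows
(8192 here is a hypothetical value), set the length before enabling
flow limit on any CPU, so that the tables allocated at that point pick
it up:

    sysctl -w net.core.flow_limit_table_len=8192
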
== Suggested Configuration

Flow limit is useful on systems with many concurrent connections,
where a single connection taking up 50% of a CPU indicates a problem.
In such environments, enable the feature on all CPUs that handle
network rx interrupts (as set in /proc/irq/N/smp_affinity).

For the feature to work, the input packet queue length must be able to
exceed the flow limit threshold (50% of netdev_max_backlog) plus the
flow history length (256). Setting net.core.netdev_max_backlog to
either 1000 or 10000 performed well in experiments.

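As a sketch of that configuration (the interrupt number 30 and the CPU
mask f are hypothetical; substitute the values for your NIC's rx
queues):

    cat /proc/irq/30/smp_affinity                       # say this reports 0000000f
    echo f > /proc/sys/net/core/flow_limit_cpu_bitmap   # mirror the rx interrupt CPUs
    sysctl -w net.core.netdev_max_backlog=1000          # the default; 10000 also worked well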

RFS: Receive Flow Steering
==========================