author     Willem de Bruijn <willemb@google.com>   2013-05-22 03:54:40 -0400
committer  David S. Miller <davem@davemloft.net>   2013-05-23 21:54:44 -0400
commit     191cb1f21afd9a7fbaa085ad9b86cb307e9a3891
tree       b5a7079ee9c55bb065301cdfe895b497d50002ee /Documentation/networking/scaling.txt
parent     161f65ba3583b84b4714f21dbee263f99824c516
rps: document flow limit in scaling.txt

Explain the mechanism and API of the recently merged
rps flow limit patch.

Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Diffstat (limited to 'Documentation/networking/scaling.txt')
-rw-r--r--  Documentation/networking/scaling.txt | 58
1 files changed, 58 insertions, 0 deletions
diff --git a/Documentation/networking/scaling.txt b/Documentation/networking/scaling.txt
index 579994afbe06..ca6977f5b2ed 100644
--- a/Documentation/networking/scaling.txt
+++ b/Documentation/networking/scaling.txt
@@ -163,6 +163,64 @@ and unnecessary. If there are fewer hardware queues than CPUs, then
 RPS might be beneficial if the rps_cpus for each queue are the ones that
 share the same memory domain as the interrupting CPU for that queue.
 
+==== RPS Flow Limit
+
+RPS scales kernel receive processing across CPUs without introducing
+reordering. The trade-off of sending all packets from the same flow
+to the same CPU is CPU load imbalance if flows vary in packet rate.
+In the extreme case, a single flow dominates traffic. Especially on
+common server workloads with many concurrent connections, such
+behavior indicates a problem such as a misconfiguration or a
+spoofed-source denial-of-service attack.
+
+Flow Limit is an optional RPS feature that prioritizes small flows
+during CPU contention by dropping packets from large flows slightly
+ahead of those from small flows. It is active only when an RPS or RFS
+destination CPU approaches saturation. Once a CPU's input packet
+queue exceeds half the maximum queue length (as set by sysctl
+net.core.netdev_max_backlog), the kernel starts a per-flow packet
+count over the last 256 packets. If a flow exceeds a set ratio (by
+default, half) of these packets when a new packet arrives, then the
+new packet is dropped. Packets from other flows are still only
+dropped once the input packet queue reaches netdev_max_backlog.
+No packets are dropped when the input packet queue length is below
+the threshold, so flow limit does not sever connections outright:
+even large flows maintain connectivity.
+
+== Interface
+
+Flow limit is compiled in by default (CONFIG_NET_FLOW_LIMIT), but not
+turned on. It is implemented for each CPU independently (to avoid lock
+and cache contention) and toggled per CPU by setting the relevant bit
+in sysctl net.core.flow_limit_cpu_bitmap. It exposes the same CPU
+bitmap interface as rps_cpus (see above) when accessed through procfs:
+
+/proc/sys/net/core/flow_limit_cpu_bitmap
+
+Per-flow rate is calculated by hashing each packet into a hashtable
+bucket and incrementing a per-bucket counter. The hash function is
+the same one that selects a CPU in RPS, but as the number of buckets
+can be much larger than the number of CPUs, flow limit has finer-grained
+identification of large flows and fewer false positives. The default
+table has 4096 buckets. This value can be modified through sysctl:
+
+net.core.flow_limit_table_len
+
+The value is only consulted when a new table is allocated. Modifying
+it does not update active tables.
+
+== Suggested Configuration
+
+Flow limit is useful on systems with many concurrent connections,
+where a single connection taking up 50% of a CPU indicates a problem.
+In such environments, enable the feature on all CPUs that handle
+network rx interrupts (as set in /proc/irq/N/smp_affinity).
+
+The feature depends on the input packet queue length exceeding the
+flow limit threshold (50%) plus the flow history length (256).
+Setting net.core.netdev_max_backlog to either 1000 or 10000
+performed well in experiments.
+
 
 RFS: Receive Flow Steering
 ==========================
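The drop decision described under "RPS Flow Limit" can be sketched in C as follows. This is a minimal, hypothetical model, not the kernel's implementation: the struct flow_limit_sketch type and the flow_limit_should_drop() function are invented for illustration. The constants simply mirror the 256-packet history and the 4096-bucket default table described in the text. The sketch also assumes a power-of-two table size, so the bucket index can be taken from the low bits of the flow hash.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Values mirror the text above; the struct layout is hypothetical. */
#define FL_HISTORY_LEN 256u   /* packets of per-flow history */
#define FL_TABLE_LEN   4096u  /* default net.core.flow_limit_table_len */

struct flow_limit_sketch {
	unsigned int head;                   /* next history slot to overwrite */
	uint16_t history[FL_HISTORY_LEN];    /* bucket of each recent packet */
	unsigned int buckets[FL_TABLE_LEN];  /* packets per bucket in history */
};

/*
 * Returns true if a packet with flow hash `hash` should be dropped.
 * `qlen` is the current input queue length; `max_backlog` stands in
 * for net.core.netdev_max_backlog.
 */
static bool flow_limit_should_drop(struct flow_limit_sketch *fl,
				   uint32_t hash, unsigned int qlen,
				   unsigned int max_backlog)
{
	assert(fl != NULL);

	/* Below half the backlog, flow limit is inactive. */
	if (qlen < max_backlog / 2)
		return false;

	unsigned int new_flow = hash & (FL_TABLE_LEN - 1);
	unsigned int old_flow = fl->history[fl->head];

	/* Replace the oldest history entry with this packet's bucket. */
	fl->history[fl->head] = (uint16_t)new_flow;
	fl->head = (fl->head + 1) % FL_HISTORY_LEN;

	/* The evicted packet no longer counts toward its bucket. */
	if (fl->buckets[old_flow] > 0)
		fl->buckets[old_flow]--;

	/* Drop if this flow now holds more than half of the history. */
	return ++fl->buckets[new_flow] > FL_HISTORY_LEN / 2;
}
```

With a 1000-packet backlog and a queue length of 600, a single flow sending 200 consecutive packets is dropped from its 129th packet on, while a packet from a different flow that arrives next is still accepted.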
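The per-CPU toggle described under "Interface" takes a hex CPU bitmap. The sketch below builds such a string from a list of CPU ids. The format_cpu_bitmap() helper is hypothetical and limited to 64 CPUs, and it assumes the comma-separated, most-significant-word-first 32-bit hex format that rps_cpus also uses.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: format a set of CPU ids (each < 64) as a hex
 * CPU bitmap, most significant 32-bit word first, words separated by
 * commas. Returns what snprintf returns.
 */
static int format_cpu_bitmap(const unsigned int *cpus, size_t ncpus,
			     char *buf, size_t buflen)
{
	uint64_t mask = 0;

	for (size_t i = 0; i < ncpus; i++) {
		assert(cpus[i] < 64);  /* this sketch handles at most 64 CPUs */
		mask |= UINT64_C(1) << cpus[i];
	}

	uint32_t hi = (uint32_t)(mask >> 32);
	uint32_t lo = (uint32_t)mask;

	if (hi == 0)
		return snprintf(buf, buflen, "%" PRIx32, lo);
	/* Zero-pad the low word so each word keeps its bit positions. */
	return snprintf(buf, buflen, "%" PRIx32 ",%08" PRIx32, hi, lo);
}
```

For CPUs 0 to 3 it produces "f"; for CPUs 0 and 33 it produces "2,00000001". Writing such a string to /proc/sys/net/core/flow_limit_cpu_bitmap (as root) would enable flow limit on exactly those CPUs.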
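The "Interface" text says a bucket table much larger than the CPU count identifies large flows at a finer grain. Two hash-to-index mappings illustrate this. Both helpers are hypothetical. The multiply-and-shift CPU selection and the mask-based bucket index are assumptions modelled on how RPS and flow limit reduce a 32-bit flow hash, not copies of kernel code.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical RPS-style CPU selection: scale a 32-bit hash into
 * [0, ncpus) with a 64-bit multiply and a 32-bit right shift.
 */
static uint32_t rps_cpu_index(uint32_t hash, uint32_t ncpus)
{
	return (uint32_t)(((uint64_t)hash * ncpus) >> 32);
}

/*
 * Hypothetical flow limit bucket: the low bits of the same hash.
 * Assumes table_len is a nonzero power of two.
 */
static uint32_t flow_bucket(uint32_t hash, uint32_t table_len)
{
	assert(table_len != 0 && (table_len & (table_len - 1)) == 0);
	return hash & (table_len - 1);
}
```

With 8 CPUs, hashes 1 and 2 map to the same RPS CPU (index 0). In a 4096-bucket table, however, they fall into buckets 1 and 2, so flow limit does not count them as one large flow.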
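The sizing guidance under "Suggested Configuration" reduces to simple arithmetic. This sketch reads the text as requiring room in the queue for a further 256 packets beyond half of netdev_max_backlog before the hard limit is reached. That reading is an interpretation, not a statement from the source. Both helper functions are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define FL_HISTORY_LEN 256u  /* flow history length from the text */

/* Queue length at which a full flow history has accumulated (assumed). */
static unsigned int flow_limit_effective_qlen(unsigned int max_backlog)
{
	return max_backlog / 2 + FL_HISTORY_LEN;
}

/* True if that queue length is still below the hard drop limit. */
static bool flow_limit_has_headroom(unsigned int max_backlog)
{
	return flow_limit_effective_qlen(max_backlog) < max_backlog;
}
```

Both suggested values leave headroom. A backlog of 1000 gives 500 + 256 = 756, and 10000 gives 5000 + 256 = 5256, each below its backlog. A backlog of 512 or less leaves none.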