author     Jesper Dangaard Brouer <brouer@redhat.com>   2018-05-24 10:46:12 -0400
committer  Alexei Starovoitov <ast@kernel.org>          2018-05-24 21:36:15 -0400
commit     735fc4054b3a25034445c6713d259da0f96f8131 (patch)
tree       355f7a0672e6239fa4227d562f7d5b65fac9c011 /kernel/bpf/devmap.c
parent     389ab7f01af988c2a1ec5617eb0c7e220df1ef1c (diff)
xdp: change ndo_xdp_xmit API to support bulking
This patch changes the API for ndo_xdp_xmit to support bulking
xdp_frames.
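For reference, a sketch of the before/after prototype in struct
net_device_ops as this change describes it (parameter names here are
illustrative; the authoritative definition lives in
include/linux/netdevice.h):

	/* Before: one indirect call per frame */
	int (*ndo_xdp_xmit)(struct net_device *dev, struct xdp_frame *xdp);

	/* After: one indirect call per bulk of n frames; returns the
	 * number of frames transmitted, or a negative errno if none
	 * were sent
	 */
	int (*ndo_xdp_xmit)(struct net_device *dev, int n,
			    struct xdp_frame **frames);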
When the kernel is compiled with CONFIG_RETPOLINE, XDP sees a huge
slowdown. Most of the slowdown is caused by the DMA API's indirect
function calls, but the net_device->ndo_xdp_xmit() call contributes
as well.
Benchmarking this patch with CONFIG_RETPOLINE, using xdp_redirect_map
with a single flow/core test (CPU E5-1650 v4 @ 3.60GHz), showed
improved performance:
for driver ixgbe: 6,042,682 pps -> 6,853,768 pps = +811,086 pps
for driver i40e : 6,187,169 pps -> 6,724,519 pps = +537,350 pps
With frames available as a bulk inside the driver's ndo_xdp_xmit call,
further optimizations become possible, such as bulk DMA-mapping for TX.
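To illustrate the driver side, a minimal hypothetical implementation of
the bulked ndo (mydrv_xdp_xmit and mydrv_tx_one are made-up names, not
part of this patch):

	/* Hypothetical driver-side sketch: the whole batch arrives in
	 * one call, so per-frame fixed costs (indirect call, locking,
	 * and potentially DMA mapping) can be amortized over n frames.
	 */
	static int mydrv_xdp_xmit(struct net_device *dev, int n,
				  struct xdp_frame **frames)
	{
		int i, drops = 0;

		for (i = 0; i < n; i++) {
			/* mydrv_tx_one() is an assumed helper that
			 * queues one frame on the TX ring
			 */
			if (mydrv_tx_one(dev, frames[i])) {
				/* driver frees frames it drops */
				xdp_return_frame_rx_napi(frames[i]);
				drops++;
			}
		}
		/* non-negative return: number of frames transmitted */
		return n - drops;
	}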
Testing without CONFIG_RETPOLINE shows the same performance for
physical NIC drivers.
The virtual NIC driver tun sees a huge performance boost, as it can
avoid per-frame producer locking and instead amortize the locking
cost over the bulk.
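A simplified sketch of why the bulk helps here (not tun's actual code;
q and enqueue() stand in for its ptr_ring producer path):

	/* Per-frame: one producer-lock round trip per frame */
	for (i = 0; i < n; i++) {
		spin_lock(&q->producer_lock);
		enqueue(q, frames[i]);	/* assumed helper */
		spin_unlock(&q->producer_lock);
	}

	/* Bulked: one round trip amortized over the whole batch */
	spin_lock(&q->producer_lock);
	for (i = 0; i < n; i++)
		enqueue(q, frames[i]);
	spin_unlock(&q->producer_lock);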
V2: Fix compile errors reported by kbuild test robot <lkp@intel.com>
V4: Isolated ndo, driver changes and callers.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Diffstat (limited to 'kernel/bpf/devmap.c')
-rw-r--r--	kernel/bpf/devmap.c | 29 ++++++++++++++++++-----------
1 file changed, 18 insertions(+), 11 deletions(-)
diff --git a/kernel/bpf/devmap.c b/kernel/bpf/devmap.c
index a9cd5c93dd2b..77908311ec98 100644
--- a/kernel/bpf/devmap.c
+++ b/kernel/bpf/devmap.c
@@ -232,24 +232,31 @@ static int bq_xmit_all(struct bpf_dtab_netdev *obj,
 		prefetch(xdpf);
 	}
 
-	for (i = 0; i < bq->count; i++) {
-		struct xdp_frame *xdpf = bq->q[i];
-		int err;
-
-		err = dev->netdev_ops->ndo_xdp_xmit(dev, xdpf);
-		if (err) {
-			drops++;
-			xdp_return_frame_rx_napi(xdpf);
-		} else {
-			sent++;
-		}
+	sent = dev->netdev_ops->ndo_xdp_xmit(dev, bq->count, bq->q);
+	if (sent < 0) {
+		sent = 0;
+		goto error;
 	}
+	drops = bq->count - sent;
+out:
 	bq->count = 0;
 
 	trace_xdp_devmap_xmit(&obj->dtab->map, obj->bit,
 			      sent, drops, bq->dev_rx, dev);
 	bq->dev_rx = NULL;
 	return 0;
+error:
+	/* If ndo_xdp_xmit fails with an errno, no frames have been
+	 * xmit'ed and it's our responsibility to free them all.
+	 */
+	for (i = 0; i < bq->count; i++) {
+		struct xdp_frame *xdpf = bq->q[i];
+
+		/* RX path under NAPI protection, can return frames faster */
+		xdp_return_frame_rx_napi(xdpf);
+		drops++;
+	}
+	goto out;
 }
 
 /* __dev_map_flush is called from xdp_do_flush_map() which _must_ be signaled