author David S. Miller <davem@davemloft.net> 2019-08-17 15:40:09 -0400
committer David S. Miller <davem@davemloft.net> 2019-08-17 15:40:09 -0400
commit 83beee5a3aff0fb159b2fb4d0cac8f18a193417e
tree ce77ccefee1384488408d9b9e49e2148359f30d9 /drivers/net/netdevsim/dev.c
parent f77508308fa76d0efc60ebf3c906f467feb062cb
parent 95766451bfb82f972bf3fea93fc6e91a904cf624
Merge branch 'drop_monitor-for-offloaded-paths'
Ido Schimmel says:

====================
Add drop monitor for offloaded data paths

Users have several ways to debug the kernel and understand why a packet
was dropped. For example, using drop monitor and perf. Both utilities
trace kfree_skb(), which is the function called when a packet is freed
as part of a failure. The information provided by these tools is
invaluable when trying to understand the cause of a packet loss.

In recent years, large portions of the kernel data path were offloaded
to capable devices. Today, it is possible to perform L2 and L3
forwarding in hardware, as well as tunneling (IP-in-IP and VXLAN).
Different TC classifiers and actions are also offloaded to capable
devices, at both ingress and egress.

However, when the data path is offloaded it is not possible to achieve
the same level of introspection since packets are dropped by the
underlying device and never reach the kernel.

This patchset aims to solve this by allowing users to monitor packets
that the underlying device decided to drop along with relevant metadata
such as the drop reason and ingress port.

The above is achieved by exposing a fundamental capability of devices
capable of data path offloading - packet trapping. In much the same way
as drop monitor registers its probe function with the kfree_skb()
tracepoint, the device is instructed to pass to the CPU (trap) packets
that it decided to drop in various places in the pipeline.

The configuration of the device to pass such packets to the CPU is
performed using devlink, as it is not specific to a port, but rather to
a device. In the future, we plan to control the policing of such packets
using devlink, in order not to overwhelm the CPU.

While devlink is used as the control path, the dropped packets are
passed along with metadata to drop monitor, which reports them to
userspace as netlink events. This allows users to use the same interface
for the monitoring of both software and hardware drops.

Logically, the solution looks as follows:

                                    Netlink event: Packet w/ metadata
                                                   Or a summary of recent drops
                                  ^
                                  |
         Userspace                |
        +---------------------------------------------------+
         Kernel                   |
                                  |
                          +-------+--------+
                          |                |
                          |  drop_monitor  |
                          |                |
                          +-------^--------+
                                  |
                                  |
                                  |
                             +----+----+
                             |         |      Kernel's Rx path
                             | devlink |      (non-drop traps)
                             |         |
                             +----^----+      ^
                                  |           |
                                  +-----------+
                                  |
                          +-------+-------+
                          |               |
                          | Device driver |
                          |               |
                          +-------^-------+
         Kernel                   |
        +---------------------------------------------------+
         Hardware                 |
                                  | Trapped packet
                                  |
                               +--+---+
                               |      |
                               | ASIC |
                               |      |
                               +------+

In order to reduce the patch count, this patchset only includes
integration with netdevsim. A follow-up patchset will add devlink-trap
support in mlxsw.

Patches #1-#7 extend drop monitor to also monitor hardware originated
drops.

Patches #8-#10 add the devlink-trap infrastructure.

Patches #11-#12 add devlink-trap support in netdevsim.

Patches #13-#16 add tests for the generic infrastructure over netdevsim.

Example
=======

Instantiate netdevsim
---------------------

List supported traps
--------------------

netdevsim/netdevsim10:
  name source_mac_is_multicast type drop generic true action drop group l2_drops
  name vlan_tag_mismatch type drop generic true action drop group l2_drops
  name ingress_vlan_filter type drop generic true action drop group l2_drops
  name ingress_spanning_tree_filter type drop generic true action drop group l2_drops
  name port_list_is_empty type drop generic true action drop group l2_drops
  name port_loopback_filter type drop generic true action drop group l2_drops
  name fid_miss type exception generic false action trap group l2_drops
  name blackhole_route type drop generic true action drop group l3_drops
  name ttl_value_is_too_small type exception generic true action trap group l3_drops
  name tail_drop type drop generic true action drop group buffer_drops

Enable a trap
-------------

Query statistics
----------------

netdevsim/netdevsim10:
  name blackhole_route type drop generic true action trap group l3_drops
    stats:
        rx:
          bytes 7384 packets 52

Monitor dropped packets
-----------------------

dropwatch> set alertmode packet
Setting alert mode
Alert mode successfully set
dropwatch> set sw true
setting software drops monitoring to 1
dropwatch> set hw true
setting hardware drops monitoring to 1
dropwatch> start
Enabling monitoring...
Kernel monitoring activated.
Issue Ctrl-C to stop monitoring
drop at: ttl_value_is_too_small (l3_drops)
origin: hardware
input port ifindex: 55
input port name: eth0
timestamp: Mon Aug 12 10:52:20 2019 445911505 nsec
protocol: 0x800
length: 142
original length: 142

drop at: ip6_mc_input+0x8b8/0xef8 (0xffffffff9e2bb0e8)
origin: software
input port ifindex: 4
timestamp: Mon Aug 12 10:53:37 2019 024444587 nsec
protocol: 0x86dd
length: 110
original length: 110

Future plans
============

* Provide more drop reasons as well as more metadata
* Add dropmon support to libpcap, so that tcpdump/tshark could
  specifically listen on dropmon traffic, instead of capturing all
  netlink packets via nlmon interface

Changes in v3:
* Place test with the rest of the netdevsim tests
* Fix test to load netdevsim module
* Move devlink helpers from the test to devlink_lib.sh. Will be used
  by mlxsw tests
* Re-order netdevsim includes in alphabetical order
* Fix reverse xmas tree in netdevsim
* Remove double include in netdevsim

Changes in v2:
* Use drop monitor to report dropped packets instead of devlink
* Add drop monitor patches
* Add test cases
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
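
The trap listing in the commit message can be read as a static table of traps plus a mutable per-trap action, where only traps whose action is not "drop" have their packets passed up for reporting. The following is a minimal userspace sketch of that idea, not the kernel implementation: every `sketch_*` name is hypothetical, the table simply copies the names, groups and initial actions from the listing, and locking is left out entirely.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the two actions shown in the listing. */
enum sketch_action { SKETCH_ACTION_DROP, SKETCH_ACTION_TRAP };

struct sketch_trap {
	const char *name;
	const char *group;
	enum sketch_action init_action;
};

/* Static description of the traps, copied from the listing. */
static const struct sketch_trap sketch_traps[] = {
	{ "source_mac_is_multicast",      "l2_drops",     SKETCH_ACTION_DROP },
	{ "vlan_tag_mismatch",            "l2_drops",     SKETCH_ACTION_DROP },
	{ "ingress_vlan_filter",          "l2_drops",     SKETCH_ACTION_DROP },
	{ "ingress_spanning_tree_filter", "l2_drops",     SKETCH_ACTION_DROP },
	{ "port_list_is_empty",           "l2_drops",     SKETCH_ACTION_DROP },
	{ "port_loopback_filter",         "l2_drops",     SKETCH_ACTION_DROP },
	{ "fid_miss",                     "l2_drops",     SKETCH_ACTION_TRAP },
	{ "blackhole_route",              "l3_drops",     SKETCH_ACTION_DROP },
	{ "ttl_value_is_too_small",       "l3_drops",     SKETCH_ACTION_TRAP },
	{ "tail_drop",                    "buffer_drops", SKETCH_ACTION_DROP },
};

#define SKETCH_NUM_TRAPS (sizeof(sketch_traps) / sizeof(sketch_traps[0]))

/* Mutable state, kept in a parallel array that shares the table's index. */
static enum sketch_action sketch_actions[SKETCH_NUM_TRAPS];

/* Reset every trap to its initial action. */
void sketch_traps_init(void)
{
	size_t i;

	for (i = 0; i < SKETCH_NUM_TRAPS; i++)
		sketch_actions[i] = sketch_traps[i].init_action;
}

/* Change the action of one trap; returns 0 on success, -1 if unknown. */
int sketch_trap_action_set(const char *name, enum sketch_action action)
{
	size_t i;

	for (i = 0; i < SKETCH_NUM_TRAPS; i++) {
		if (strcmp(sketch_traps[i].name, name) == 0) {
			sketch_actions[i] = action;
			return 0;
		}
	}
	return -1;
}

/* One reporting round: count the traps that would produce a report. */
size_t sketch_report_round(void)
{
	size_t i, reported = 0;

	for (i = 0; i < SKETCH_NUM_TRAPS; i++) {
		if (sketch_actions[i] == SKETCH_ACTION_DROP)
			continue;
		reported++;
	}
	return reported;
}
```

With the initial actions from the listing, a round reports two traps (fid_miss and ttl_value_is_too_small); after calling `sketch_trap_action_set("blackhole_route", SKETCH_ACTION_TRAP)` a round reports three.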
Diffstat (limited to 'drivers/net/netdevsim/dev.c')
-rw-r--r-- drivers/net/netdevsim/dev.c | 282
1 files changed, 281 insertions, 1 deletions
diff --git a/drivers/net/netdevsim/dev.c b/drivers/net/netdevsim/dev.c
index a570da406d1d..c217049552f7 100644
--- a/drivers/net/netdevsim/dev.c
+++ b/drivers/net/netdevsim/dev.c
@@ -17,11 +17,20 @@
 
 #include <linux/debugfs.h>
 #include <linux/device.h>
+#include <linux/etherdevice.h>
+#include <linux/inet.h>
+#include <linux/jiffies.h>
+#include <linux/kernel.h>
 #include <linux/list.h>
 #include <linux/mutex.h>
 #include <linux/random.h>
 #include <linux/rtnetlink.h>
+#include <linux/workqueue.h>
 #include <net/devlink.h>
+#include <net/ip.h>
+#include <uapi/linux/devlink.h>
+#include <uapi/linux/ip.h>
+#include <uapi/linux/udp.h>
 
 #include "netdevsim.h"
 
@@ -302,6 +311,218 @@ static void nsim_dev_dummy_region_exit(struct nsim_dev *nsim_dev)
 	devlink_region_destroy(nsim_dev->dummy_region);
 }
 
+struct nsim_trap_item {
+	void *trap_ctx;
+	enum devlink_trap_action action;
+};
+
+struct nsim_trap_data {
+	struct delayed_work trap_report_dw;
+	struct nsim_trap_item *trap_items_arr;
+	struct nsim_dev *nsim_dev;
+	spinlock_t trap_lock;	/* Protects trap_items_arr */
+};
+
+/* All driver-specific traps must be documented in
+ * Documentation/networking/devlink-trap-netdevsim.rst
+ */
+enum {
+	NSIM_TRAP_ID_BASE = DEVLINK_TRAP_GENERIC_ID_MAX,
+	NSIM_TRAP_ID_FID_MISS,
+};
+
+#define NSIM_TRAP_NAME_FID_MISS "fid_miss"
+
+#define NSIM_TRAP_METADATA DEVLINK_TRAP_METADATA_TYPE_F_IN_PORT
+
+#define NSIM_TRAP_DROP(_id, _group_id)					\
+	DEVLINK_TRAP_GENERIC(DROP, DROP, _id,				\
+			     DEVLINK_TRAP_GROUP_GENERIC(_group_id),	\
+			     NSIM_TRAP_METADATA)
+#define NSIM_TRAP_EXCEPTION(_id, _group_id)				\
+	DEVLINK_TRAP_GENERIC(EXCEPTION, TRAP, _id,			\
+			     DEVLINK_TRAP_GROUP_GENERIC(_group_id),	\
+			     NSIM_TRAP_METADATA)
+#define NSIM_TRAP_DRIVER_EXCEPTION(_id, _group_id)			\
+	DEVLINK_TRAP_DRIVER(EXCEPTION, TRAP, NSIM_TRAP_ID_##_id,	\
+			    NSIM_TRAP_NAME_##_id,			\
+			    DEVLINK_TRAP_GROUP_GENERIC(_group_id),	\
+			    NSIM_TRAP_METADATA)
+
+static const struct devlink_trap nsim_traps_arr[] = {
+	NSIM_TRAP_DROP(SMAC_MC, L2_DROPS),
+	NSIM_TRAP_DROP(VLAN_TAG_MISMATCH, L2_DROPS),
+	NSIM_TRAP_DROP(INGRESS_VLAN_FILTER, L2_DROPS),
+	NSIM_TRAP_DROP(INGRESS_STP_FILTER, L2_DROPS),
+	NSIM_TRAP_DROP(EMPTY_TX_LIST, L2_DROPS),
+	NSIM_TRAP_DROP(PORT_LOOPBACK_FILTER, L2_DROPS),
+	NSIM_TRAP_DRIVER_EXCEPTION(FID_MISS, L2_DROPS),
+	NSIM_TRAP_DROP(BLACKHOLE_ROUTE, L3_DROPS),
+	NSIM_TRAP_EXCEPTION(TTL_ERROR, L3_DROPS),
+	NSIM_TRAP_DROP(TAIL_DROP, BUFFER_DROPS),
+};
+
+#define NSIM_TRAP_L4_DATA_LEN 100
+
+static struct sk_buff *nsim_dev_trap_skb_build(void)
+{
+	int tot_len, data_len = NSIM_TRAP_L4_DATA_LEN;
+	struct sk_buff *skb;
+	struct udphdr *udph;
+	struct ethhdr *eth;
+	struct iphdr *iph;
+
+	skb = alloc_skb(NLMSG_GOODSIZE, GFP_ATOMIC);
+	if (!skb)
+		return NULL;
+	tot_len = sizeof(struct iphdr) + sizeof(struct udphdr) + data_len;
+
+	eth = skb_put(skb, sizeof(struct ethhdr));
+	eth_random_addr(eth->h_dest);
+	eth_random_addr(eth->h_source);
+	eth->h_proto = htons(ETH_P_IP);
+	skb->protocol = htons(ETH_P_IP);
+
+	iph = skb_put(skb, sizeof(struct iphdr));
+	iph->protocol = IPPROTO_UDP;
+	iph->saddr = in_aton("192.0.2.1");
+	iph->daddr = in_aton("198.51.100.1");
+	iph->version = 0x4;
+	iph->frag_off = 0;
+	iph->ihl = 0x5;
+	iph->tot_len = htons(tot_len);
+	iph->ttl = 100;
+	ip_send_check(iph);
+
+	udph = skb_put_zero(skb, sizeof(struct udphdr) + data_len);
+	get_random_bytes(&udph->source, sizeof(u16));
+	get_random_bytes(&udph->dest, sizeof(u16));
+	udph->len = htons(sizeof(struct udphdr) + data_len);
+
+	return skb;
+}
+
+static void nsim_dev_trap_report(struct nsim_dev_port *nsim_dev_port)
+{
+	struct nsim_dev *nsim_dev = nsim_dev_port->ns->nsim_dev;
+	struct devlink *devlink = priv_to_devlink(nsim_dev);
+	struct nsim_trap_data *nsim_trap_data;
+	int i;
+
+	nsim_trap_data = nsim_dev->trap_data;
+
+	spin_lock(&nsim_trap_data->trap_lock);
+	for (i = 0; i < ARRAY_SIZE(nsim_traps_arr); i++) {
+		struct nsim_trap_item *nsim_trap_item;
+		struct sk_buff *skb;
+
+		nsim_trap_item = &nsim_trap_data->trap_items_arr[i];
+		if (nsim_trap_item->action == DEVLINK_TRAP_ACTION_DROP)
+			continue;
+
+		skb = nsim_dev_trap_skb_build();
+		if (!skb)
+			continue;
+		skb->dev = nsim_dev_port->ns->netdev;
+
+		/* Trapped packets are usually passed to devlink in softIRQ,
+		 * but in this case they are generated in a workqueue. Disable
+		 * softIRQs to prevent lockdep from complaining about
+		 * "incosistent lock state".
+		 */
+		local_bh_disable();
+		devlink_trap_report(devlink, skb, nsim_trap_item->trap_ctx,
+				    &nsim_dev_port->devlink_port);
+		local_bh_enable();
+		consume_skb(skb);
+	}
+	spin_unlock(&nsim_trap_data->trap_lock);
+}
+
+#define NSIM_TRAP_REPORT_INTERVAL_MS	100
+
+static void nsim_dev_trap_report_work(struct work_struct *work)
+{
+	struct nsim_trap_data *nsim_trap_data;
+	struct nsim_dev_port *nsim_dev_port;
+	struct nsim_dev *nsim_dev;
+
+	nsim_trap_data = container_of(work, struct nsim_trap_data,
+				      trap_report_dw.work);
+	nsim_dev = nsim_trap_data->nsim_dev;
+
+	/* For each running port and enabled packet trap, generate a UDP
+	 * packet with a random 5-tuple and report it.
+	 */
+	mutex_lock(&nsim_dev->port_list_lock);
+	list_for_each_entry(nsim_dev_port, &nsim_dev->port_list, list) {
+		if (!netif_running(nsim_dev_port->ns->netdev))
+			continue;
+
+		nsim_dev_trap_report(nsim_dev_port);
+	}
+	mutex_unlock(&nsim_dev->port_list_lock);
+
+	schedule_delayed_work(&nsim_dev->trap_data->trap_report_dw,
+			      msecs_to_jiffies(NSIM_TRAP_REPORT_INTERVAL_MS));
+}
+
+static int nsim_dev_traps_init(struct devlink *devlink)
+{
+	struct nsim_dev *nsim_dev = devlink_priv(devlink);
+	struct nsim_trap_data *nsim_trap_data;
+	int err;
+
+	nsim_trap_data = kzalloc(sizeof(*nsim_trap_data), GFP_KERNEL);
+	if (!nsim_trap_data)
+		return -ENOMEM;
+
+	nsim_trap_data->trap_items_arr = kcalloc(ARRAY_SIZE(nsim_traps_arr),
+						 sizeof(struct nsim_trap_item),
+						 GFP_KERNEL);
+	if (!nsim_trap_data->trap_items_arr) {
+		err = -ENOMEM;
+		goto err_trap_data_free;
+	}
+
+	/* The lock is used to protect the action state of the registered
+	 * traps. The value is written by user and read in delayed work when
+	 * iterating over all the traps.
+	 */
+	spin_lock_init(&nsim_trap_data->trap_lock);
+	nsim_trap_data->nsim_dev = nsim_dev;
+	nsim_dev->trap_data = nsim_trap_data;
+
+	err = devlink_traps_register(devlink, nsim_traps_arr,
+				     ARRAY_SIZE(nsim_traps_arr), NULL);
+	if (err)
+		goto err_trap_items_free;
+
+	INIT_DELAYED_WORK(&nsim_dev->trap_data->trap_report_dw,
+			  nsim_dev_trap_report_work);
+	schedule_delayed_work(&nsim_dev->trap_data->trap_report_dw,
+			      msecs_to_jiffies(NSIM_TRAP_REPORT_INTERVAL_MS));
+
+	return 0;
+
+err_trap_items_free:
+	kfree(nsim_trap_data->trap_items_arr);
+err_trap_data_free:
+	kfree(nsim_trap_data);
+	return err;
+}
+
+static void nsim_dev_traps_exit(struct devlink *devlink)
+{
+	struct nsim_dev *nsim_dev = devlink_priv(devlink);
+
+	cancel_delayed_work_sync(&nsim_dev->trap_data->trap_report_dw);
+	devlink_traps_unregister(devlink, nsim_traps_arr,
+				 ARRAY_SIZE(nsim_traps_arr));
+	kfree(nsim_dev->trap_data->trap_items_arr);
+	kfree(nsim_dev->trap_data);
+}
+
 static int nsim_dev_reload(struct devlink *devlink,
 			   struct netlink_ext_ack *extack)
 {
@@ -369,9 +590,61 @@ static int nsim_dev_flash_update(struct devlink *devlink, const char *file_name,
 	return 0;
 }
 
+static struct nsim_trap_item *
+nsim_dev_trap_item_lookup(struct nsim_dev *nsim_dev, u16 trap_id)
+{
+	struct nsim_trap_data *nsim_trap_data = nsim_dev->trap_data;
+	int i;
+
+	for (i = 0; i < ARRAY_SIZE(nsim_traps_arr); i++) {
+		if (nsim_traps_arr[i].id == trap_id)
+			return &nsim_trap_data->trap_items_arr[i];
+	}
+
+	return NULL;
+}
+
+static int nsim_dev_devlink_trap_init(struct devlink *devlink,
+				      const struct devlink_trap *trap,
+				      void *trap_ctx)
+{
+	struct nsim_dev *nsim_dev = devlink_priv(devlink);
+	struct nsim_trap_item *nsim_trap_item;
+
+	nsim_trap_item = nsim_dev_trap_item_lookup(nsim_dev, trap->id);
+	if (WARN_ON(!nsim_trap_item))
+		return -ENOENT;
+
+	nsim_trap_item->trap_ctx = trap_ctx;
+	nsim_trap_item->action = trap->init_action;
+
+	return 0;
+}
+
+static int
+nsim_dev_devlink_trap_action_set(struct devlink *devlink,
+				 const struct devlink_trap *trap,
+				 enum devlink_trap_action action)
+{
+	struct nsim_dev *nsim_dev = devlink_priv(devlink);
+	struct nsim_trap_item *nsim_trap_item;
+
+	nsim_trap_item = nsim_dev_trap_item_lookup(nsim_dev, trap->id);
+	if (WARN_ON(!nsim_trap_item))
+		return -ENOENT;
+
+	spin_lock(&nsim_dev->trap_data->trap_lock);
+	nsim_trap_item->action = action;
+	spin_unlock(&nsim_dev->trap_data->trap_lock);
+
+	return 0;
+}
+
 static const struct devlink_ops nsim_dev_devlink_ops = {
 	.reload = nsim_dev_reload,
 	.flash_update = nsim_dev_flash_update,
+	.trap_init = nsim_dev_devlink_trap_init,
+	.trap_action_set = nsim_dev_devlink_trap_action_set,
 };
 
 #define NSIM_DEV_MAX_MACS_DEFAULT 32
@@ -421,10 +694,14 @@ nsim_dev_create(struct nsim_bus_dev *nsim_bus_dev, unsigned int port_count)
 	if (err)
 		goto err_params_unregister;
 
-	err = nsim_dev_debugfs_init(nsim_dev);
+	err = nsim_dev_traps_init(devlink);
 	if (err)
 		goto err_dummy_region_exit;
 
+	err = nsim_dev_debugfs_init(nsim_dev);
+	if (err)
+		goto err_traps_exit;
+
 	err = nsim_bpf_dev_init(nsim_dev);
 	if (err)
 		goto err_debugfs_exit;
@@ -434,6 +711,8 @@ nsim_dev_create(struct nsim_bus_dev *nsim_bus_dev, unsigned int port_count)
 
 err_debugfs_exit:
 	nsim_dev_debugfs_exit(nsim_dev);
+err_traps_exit:
+	nsim_dev_traps_exit(devlink);
 err_dummy_region_exit:
 	nsim_dev_dummy_region_exit(nsim_dev);
 err_params_unregister:
@@ -456,6 +735,7 @@ static void nsim_dev_destroy(struct nsim_dev *nsim_dev)
 
 	nsim_bpf_dev_exit(nsim_dev);
 	nsim_dev_debugfs_exit(nsim_dev);
+	nsim_dev_traps_exit(devlink);
 	nsim_dev_dummy_region_exit(nsim_dev);
 	devlink_params_unregister(devlink, nsim_devlink_params,
 				  ARRAY_SIZE(nsim_devlink_params));
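
For readers who want to check the packet layout that nsim_dev_trap_skb_build() produces, the following is a rough userspace sketch, not kernel code. It fills an IPv4 header with the same version, header length, total length, TTL, protocol and addresses, and computes the header checksum with the RFC 1071 one's-complement sum, which is the value ip_send_check() stores in the header. All `sketch_*` names are hypothetical, and the sketch zeroes the header fields that the patch does not set explicitly, which is an assumption made here for reproducibility.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_ETH_HLEN    14	/* Ethernet header without a VLAN tag */
#define SKETCH_IP_HLEN     20	/* IPv4 header with ihl = 5 */
#define SKETCH_UDP_HLEN    8
#define SKETCH_L4_DATA_LEN 100	/* mirrors NSIM_TRAP_L4_DATA_LEN */

/* RFC 1071 checksum over a buffer whose length is even. */
uint16_t sketch_csum(const uint8_t *buf, size_t len)
{
	uint32_t sum = 0;
	size_t i;

	/* Add the buffer up as big-endian 16-bit words. */
	for (i = 0; i + 1 < len; i += 2)
		sum += ((uint32_t)buf[i] << 8) | buf[i + 1];

	/* Fold the carries back into the low 16 bits. */
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);

	return (uint16_t)~sum;
}

/* Fill a 20-byte IPv4 header using the field values from the patch. */
void sketch_ip_header(uint8_t hdr[SKETCH_IP_HLEN])
{
	static const uint8_t saddr[4] = { 192, 0, 2, 1 };
	static const uint8_t daddr[4] = { 198, 51, 100, 1 };
	uint16_t tot_len = SKETCH_IP_HLEN + SKETCH_UDP_HLEN +
			   SKETCH_L4_DATA_LEN;
	uint16_t csum;

	/* Assumption: unset fields (tos, id, check) start out as zero. */
	memset(hdr, 0, SKETCH_IP_HLEN);
	hdr[0] = 0x45;			/* version 4, ihl 5 */
	hdr[2] = (uint8_t)(tot_len >> 8);
	hdr[3] = (uint8_t)(tot_len & 0xff);
	hdr[8] = 100;			/* ttl */
	hdr[9] = 17;			/* IPPROTO_UDP */
	memcpy(&hdr[12], saddr, sizeof(saddr));
	memcpy(&hdr[16], daddr, sizeof(daddr));

	/* The checksum is computed with the checksum field set to zero. */
	csum = sketch_csum(hdr, SKETCH_IP_HLEN);
	hdr[10] = (uint8_t)(csum >> 8);
	hdr[11] = (uint8_t)(csum & 0xff);
}

/* Length of the whole generated frame, Ethernet header included. */
size_t sketch_frame_len(void)
{
	return SKETCH_ETH_HLEN + SKETCH_IP_HLEN + SKETCH_UDP_HLEN +
	       SKETCH_L4_DATA_LEN;
}
```

Running `sketch_csum()` over the finished header returns 0, which is the usual way to validate an IPv4 header checksum. `sketch_frame_len()` returns 142, which appears to match the `length: 142` reported for the hardware-originated drop in the dropwatch output of the commit message.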