path: root/net/core
author    Linus Torvalds <torvalds@linux-foundation.org>    2013-11-13 03:40:34 -0500
committer Linus Torvalds <torvalds@linux-foundation.org>    2013-11-13 03:40:34 -0500
commit    42a2d923cc349583ebf6fdd52a7d35e1c2f7e6bd (patch)
tree      2b2b0c03b5389c1301800119333967efafd994ca /net/core
parent    5cbb3d216e2041700231bcfc383ee5f8b7fc8b74 (diff)
parent    75ecab1df14d90e86cebef9ec5c76befde46e65f (diff)
Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
Pull networking updates from David Miller:

 1) The addition of nftables.  No longer will we need protocol aware
    firewall filtering modules, it can all live in userspace.

    At the core of nftables is a, for lack of a better term, virtual
    machine that executes byte codes to inspect packet or metadata
    (arriving interface index, etc.) and make verdict decisions.

    Besides support for loading packet contents and comparing them, the
    interpreter supports lookups in various datastructures as
    fundamental operations.  For example, sets are supported, and
    therefore one could create a set of whitelist IP address entries
    which have ACCEPT verdicts attached to them, and use the
    appropriate byte codes to do such lookups.

    Since the interpreted code is composed in userspace, userspace can
    do things like optimize it before giving it to the kernel.

    Another major improvement is the capability of atomically updating
    portions of the ruleset.  In the existing netfilter implementation,
    one has to update the entire rule set in order to make a change,
    and this is very expensive.

    Userspace tools exist to create nftables rules from existing
    netfilter rule sets, but both kernel implementations will need to
    co-exist for quite some time as we transition from the old to the
    new stuff.

    Kudos to Patrick McHardy, Pablo Neira Ayuso, and others who have
    worked so hard on this.

 2) Daniel Borkmann and Hannes Frederic Sowa made several improvements
    to our pseudo-random number generator, mostly used for things like
    UDP port randomization and netfilter, amongst other things.

    In particular the taus88 generator is upgraded to taus113, and test
    cases are added.

 3) Support 64-bit rates in HTB and TBF schedulers, from Eric Dumazet
    and Yang Yingliang.

 4) Add support for new 577xx tigon3 chips to the tg3 driver, from
    Nithin Sujir.

 5) Fix two fatal flaws in TCP dynamic right sizing, from Eric Dumazet,
    Neal Cardwell, and Yuchung Cheng.

 6) Allow IP_TOS and IP_TTL to be specified in sendmsg() ancillary
    control message data, much like other socket option attributes.
    From Francesco Fusco.  (See the usage sketch after the shortlog
    below.)

 7) Allow applications to specify a cap on the rate computed
    automatically by the kernel for pacing flows, via a new
    SO_MAX_PACING_RATE socket option.  From Eric Dumazet.

 8) Make the initial autotuned send buffer sizing in TCP more closely
    reflect actual needs, from Eric Dumazet.

 9) Currently early socket demux only happens for TCP sockets, but we
    can do it for connected UDP sockets too.  Implementation from Shawn
    Bohrer.

10) Refactor inet socket demux with the goal of improving hash demux
    performance for listening sockets.  The main goals are being able
    to use RCU lookups even on request sockets, and eliminating the
    listening lock contention.  From Eric Dumazet.

11) The bonding layer has many demuxes in its fast path, and an RCU
    conversion was started back in 3.11; several changes here extend
    the RCU usage to even more locations.  From Ding Tianhong and Wang
    Yufen, based upon suggestions by Nikolay Aleksandrov and Veaceslav
    Falico.

12) Allow stacking of segmentation offloads to, in particular, allow
    segmentation offloading over tunnels.  From Eric Dumazet.

13) Significantly improve the handling of the secret keys we feed into
    the various hash functions in the inet hashtables, TCP fast open,
    as well as syncookies.  From Hannes Frederic Sowa.

    The key fundamental operation is "net_get_random_once()" which uses
    static keys.  Hannes even extended this to ipv4/ipv6 fragmentation
    handling and our generic flow dissector.

14) The generic driver layer now takes care to set the driver data to
    NULL on device removal, so it is no longer necessary for drivers to
    explicitly set it to NULL.  Many drivers have been cleaned up in
    this way, from Jingoo Han.

15) Add a BPF based packet scheduler classifier, from Daniel Borkmann.

16) Improve CRC32 interfaces and generic SKB checksum iterators so that
    SCTP's checksumming can be handled more cleanly.  Also from Daniel
    Borkmann.

17) Add a new PMTU discovery mode, IP_PMTUDISC_INTERFACE, which forces
    using the interface MTU value.  This helps avoid PMTU attacks,
    particularly on DNS servers.  From Hannes Frederic Sowa.

18) Use generic XPS for transmit queue steering rather than an internal
    (re-)implementation in virtio-net.  From Jason Wang.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1622 commits)
  random32: add test cases for taus113 implementation
  random32: upgrade taus88 generator to taus113 from errata paper
  random32: move rnd_state to linux/random.h
  random32: add prandom_reseed_late() and call when nonblocking pool becomes initialized
  random32: add periodic reseeding
  random32: fix off-by-one in seeding requirement
  PHY: Add RTL8201CP phy_driver to realtek
  xtsonic: add missing platform_set_drvdata() in xtsonic_probe()
  macmace: add missing platform_set_drvdata() in mace_probe()
  ethernet/arc/arc_emac: add missing platform_set_drvdata() in arc_emac_probe()
  ipv6: protect for_each_sk_fl_rcu in mem_check with rcu_read_lock_bh
  vlan: Implement vlan_dev_get_egress_qos_mask as an inline.
  ixgbe: add warning when max_vfs is out of range.
  igb: Update link modes display in ethtool
  netfilter: push reasm skb through instead of original frag skbs
  ip6_output: fragment outgoing reassembled skb properly
  MAINTAINERS: mv643xx_eth: take over maintainership from Lennart
  net_sched: tbf: support of 64bit rates
  ixgbe: deleting dfwd stations out of order can cause null ptr deref
  ixgbe: fix build err, num_rx_queues is only available with CONFIG_RPS
  ...
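A short userspace-side sketch of items 6, 7 and 17 (per-packet IP_TOS/IP_TTL via sendmsg() ancillary data, the SO_MAX_PACING_RATE cap, and the interface-MTU-only PMTU mode). This is not part of the pull request itself; the numeric values, the helper name and the fallback defines are illustrative assumptions, and error handling is omitted:

/* Hedged sketch: combine the three new socket-level knobs on one socket.
 * Assumes reasonably recent kernel headers; the fallback defines mirror
 * the values added by these patches and are labeled as assumptions.
 */
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

#ifndef SO_MAX_PACING_RATE
#define SO_MAX_PACING_RATE	47	/* assumption: asm-generic value */
#endif
#ifndef IP_PMTUDISC_INTERFACE
#define IP_PMTUDISC_INTERFACE	4	/* assumption: value of the new mode */
#endif

static ssize_t send_with_tos_ttl(int fd, const struct sockaddr_in *dst,
				 const void *buf, size_t len)
{
	unsigned int max_pacing = 1000000;	/* bytes/sec cap, example value */
	int pmtu = IP_PMTUDISC_INTERFACE;	/* never exceed the interface MTU */
	int tos = 0x10, ttl = 16;		/* per-call overrides via cmsg */
	char cbuf[2 * CMSG_SPACE(sizeof(int))];
	struct iovec iov = { .iov_base = (void *)buf, .iov_len = len };
	struct msghdr msg = {
		.msg_name = (void *)dst, .msg_namelen = sizeof(*dst),
		.msg_iov = &iov, .msg_iovlen = 1,
		.msg_control = cbuf, .msg_controllen = sizeof(cbuf),
	};
	struct cmsghdr *cm;

	setsockopt(fd, SOL_SOCKET, SO_MAX_PACING_RATE,
		   &max_pacing, sizeof(max_pacing));
	setsockopt(fd, IPPROTO_IP, IP_MTU_DISCOVER, &pmtu, sizeof(pmtu));

	cm = CMSG_FIRSTHDR(&msg);
	cm->cmsg_level = IPPROTO_IP;
	cm->cmsg_type = IP_TOS;
	cm->cmsg_len = CMSG_LEN(sizeof(tos));
	memcpy(CMSG_DATA(cm), &tos, sizeof(tos));

	cm = CMSG_NXTHDR(&msg, cm);
	cm->cmsg_level = IPPROTO_IP;
	cm->cmsg_type = IP_TTL;
	cm->cmsg_len = CMSG_LEN(sizeof(ttl));
	memcpy(CMSG_DATA(cm), &ttl, sizeof(ttl));

	return sendmsg(fd, &msg, 0);
}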
Diffstat (limited to 'net/core')
-rw-r--r--  net/core/datagram.c        |   2
-rw-r--r--  net/core/dev.c             | 557
-rw-r--r--  net/core/dev_addr_lists.c  |   4
-rw-r--r--  net/core/ethtool.c         |   3
-rw-r--r--  net/core/fib_rules.c       |   3
-rw-r--r--  net/core/flow_dissector.c  |  79
-rw-r--r--  net/core/iovec.c           |   2
-rw-r--r--  net/core/neighbour.c       |   2
-rw-r--r--  net/core/net-sysfs.c       |   2
-rw-r--r--  net/core/netprio_cgroup.c  |   3
-rw-r--r--  net/core/rtnetlink.c       |  12
-rw-r--r--  net/core/secure_seq.c      |  16
-rw-r--r--  net/core/skbuff.c          | 144
-rw-r--r--  net/core/sock.c            |  45
-rw-r--r--  net/core/utils.c           |  49
15 files changed, 632 insertions, 291 deletions
diff --git a/net/core/datagram.c b/net/core/datagram.c
index af814e764206..a16ed7bbe376 100644
--- a/net/core/datagram.c
+++ b/net/core/datagram.c
@@ -577,7 +577,7 @@ EXPORT_SYMBOL(skb_copy_datagram_from_iovec);
 /**
  * zerocopy_sg_from_iovec - Build a zerocopy datagram from an iovec
  * @skb: buffer to copy
- * @from: io vector to copy to
+ * @from: io vector to copy from
  * @offset: offset in the io vector to start copying from
  * @count: amount of vectors to copy to buffer from
  *
diff --git a/net/core/dev.c b/net/core/dev.c
index 3430b1ed12e5..8ffc52e01ece 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1203,7 +1203,7 @@ void netdev_state_change(struct net_device *dev)
1203{ 1203{
1204 if (dev->flags & IFF_UP) { 1204 if (dev->flags & IFF_UP) {
1205 call_netdevice_notifiers(NETDEV_CHANGE, dev); 1205 call_netdevice_notifiers(NETDEV_CHANGE, dev);
1206 rtmsg_ifinfo(RTM_NEWLINK, dev, 0); 1206 rtmsg_ifinfo(RTM_NEWLINK, dev, 0, GFP_KERNEL);
1207 } 1207 }
1208} 1208}
1209EXPORT_SYMBOL(netdev_state_change); 1209EXPORT_SYMBOL(netdev_state_change);
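One change that repeats throughout this file: rtmsg_ifinfo() now takes a gfp_t, so each caller states whether the notification skb allocation may sleep. A hedged illustration, not quoted verbatim from the patch:

	/* process context: a sleeping allocation is fine */
	rtmsg_ifinfo(RTM_NEWLINK, dev, IFF_UP | IFF_RUNNING, GFP_KERNEL);
	/* atomic context (e.g. under a spinlock): must not sleep */
	rtmsg_ifinfo(RTM_NEWLINK, dev, 0, GFP_ATOMIC);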
@@ -1293,7 +1293,7 @@ int dev_open(struct net_device *dev)
1293 if (ret < 0) 1293 if (ret < 0)
1294 return ret; 1294 return ret;
1295 1295
1296 rtmsg_ifinfo(RTM_NEWLINK, dev, IFF_UP|IFF_RUNNING); 1296 rtmsg_ifinfo(RTM_NEWLINK, dev, IFF_UP|IFF_RUNNING, GFP_KERNEL);
1297 call_netdevice_notifiers(NETDEV_UP, dev); 1297 call_netdevice_notifiers(NETDEV_UP, dev);
1298 1298
1299 return ret; 1299 return ret;
@@ -1307,7 +1307,7 @@ static int __dev_close_many(struct list_head *head)
1307 ASSERT_RTNL(); 1307 ASSERT_RTNL();
1308 might_sleep(); 1308 might_sleep();
1309 1309
1310 list_for_each_entry(dev, head, unreg_list) { 1310 list_for_each_entry(dev, head, close_list) {
1311 call_netdevice_notifiers(NETDEV_GOING_DOWN, dev); 1311 call_netdevice_notifiers(NETDEV_GOING_DOWN, dev);
1312 1312
1313 clear_bit(__LINK_STATE_START, &dev->state); 1313 clear_bit(__LINK_STATE_START, &dev->state);
@@ -1323,7 +1323,7 @@ static int __dev_close_many(struct list_head *head)
1323 1323
1324 dev_deactivate_many(head); 1324 dev_deactivate_many(head);
1325 1325
1326 list_for_each_entry(dev, head, unreg_list) { 1326 list_for_each_entry(dev, head, close_list) {
1327 const struct net_device_ops *ops = dev->netdev_ops; 1327 const struct net_device_ops *ops = dev->netdev_ops;
1328 1328
1329 /* 1329 /*
@@ -1351,7 +1351,7 @@ static int __dev_close(struct net_device *dev)
1351 /* Temporarily disable netpoll until the interface is down */ 1351 /* Temporarily disable netpoll until the interface is down */
1352 netpoll_rx_disable(dev); 1352 netpoll_rx_disable(dev);
1353 1353
1354 list_add(&dev->unreg_list, &single); 1354 list_add(&dev->close_list, &single);
1355 retval = __dev_close_many(&single); 1355 retval = __dev_close_many(&single);
1356 list_del(&single); 1356 list_del(&single);
1357 1357
@@ -1362,21 +1362,20 @@ static int __dev_close(struct net_device *dev)
1362static int dev_close_many(struct list_head *head) 1362static int dev_close_many(struct list_head *head)
1363{ 1363{
1364 struct net_device *dev, *tmp; 1364 struct net_device *dev, *tmp;
1365 LIST_HEAD(tmp_list);
1366 1365
1367 list_for_each_entry_safe(dev, tmp, head, unreg_list) 1366 /* Remove the devices that don't need to be closed */
1367 list_for_each_entry_safe(dev, tmp, head, close_list)
1368 if (!(dev->flags & IFF_UP)) 1368 if (!(dev->flags & IFF_UP))
1369 list_move(&dev->unreg_list, &tmp_list); 1369 list_del_init(&dev->close_list);
1370 1370
1371 __dev_close_many(head); 1371 __dev_close_many(head);
1372 1372
1373 list_for_each_entry(dev, head, unreg_list) { 1373 list_for_each_entry_safe(dev, tmp, head, close_list) {
1374 rtmsg_ifinfo(RTM_NEWLINK, dev, IFF_UP|IFF_RUNNING); 1374 rtmsg_ifinfo(RTM_NEWLINK, dev, IFF_UP|IFF_RUNNING, GFP_KERNEL);
1375 call_netdevice_notifiers(NETDEV_DOWN, dev); 1375 call_netdevice_notifiers(NETDEV_DOWN, dev);
1376 list_del_init(&dev->close_list);
1376 } 1377 }
1377 1378
1378 /* rollback_registered_many needs the complete original list */
1379 list_splice(&tmp_list, head);
1380 return 0; 1379 return 0;
1381} 1380}
1382 1381
@@ -1397,7 +1396,7 @@ int dev_close(struct net_device *dev)
1397 /* Block netpoll rx while the interface is going down */ 1396 /* Block netpoll rx while the interface is going down */
1398 netpoll_rx_disable(dev); 1397 netpoll_rx_disable(dev);
1399 1398
1400 list_add(&dev->unreg_list, &single); 1399 list_add(&dev->close_list, &single);
1401 dev_close_many(&single); 1400 dev_close_many(&single);
1402 list_del(&single); 1401 list_del(&single);
1403 1402
@@ -2378,6 +2377,8 @@ struct sk_buff *__skb_gso_segment(struct sk_buff *skb,
2378 } 2377 }
2379 2378
2380 SKB_GSO_CB(skb)->mac_offset = skb_headroom(skb); 2379 SKB_GSO_CB(skb)->mac_offset = skb_headroom(skb);
2380 SKB_GSO_CB(skb)->encap_level = 0;
2381
2381 skb_reset_mac_header(skb); 2382 skb_reset_mac_header(skb);
2382 skb_reset_mac_len(skb); 2383 skb_reset_mac_len(skb);
2383 2384
@@ -2537,7 +2538,7 @@ static inline int skb_needs_linearize(struct sk_buff *skb,
2537} 2538}
2538 2539
2539int dev_hard_start_xmit(struct sk_buff *skb, struct net_device *dev, 2540int dev_hard_start_xmit(struct sk_buff *skb, struct net_device *dev,
2540 struct netdev_queue *txq) 2541 struct netdev_queue *txq, void *accel_priv)
2541{ 2542{
2542 const struct net_device_ops *ops = dev->netdev_ops; 2543 const struct net_device_ops *ops = dev->netdev_ops;
2543 int rc = NETDEV_TX_OK; 2544 int rc = NETDEV_TX_OK;
@@ -2603,9 +2604,13 @@ int dev_hard_start_xmit(struct sk_buff *skb, struct net_device *dev,
2603 dev_queue_xmit_nit(skb, dev); 2604 dev_queue_xmit_nit(skb, dev);
2604 2605
2605 skb_len = skb->len; 2606 skb_len = skb->len;
2606 rc = ops->ndo_start_xmit(skb, dev); 2607 if (accel_priv)
2608 rc = ops->ndo_dfwd_start_xmit(skb, dev, accel_priv);
2609 else
2610 rc = ops->ndo_start_xmit(skb, dev);
2611
2607 trace_net_dev_xmit(skb, rc, dev, skb_len); 2612 trace_net_dev_xmit(skb, rc, dev, skb_len);
2608 if (rc == NETDEV_TX_OK) 2613 if (rc == NETDEV_TX_OK && txq)
2609 txq_trans_update(txq); 2614 txq_trans_update(txq);
2610 return rc; 2615 return rc;
2611 } 2616 }
@@ -2621,7 +2626,10 @@ gso:
2621 dev_queue_xmit_nit(nskb, dev); 2626 dev_queue_xmit_nit(nskb, dev);
2622 2627
2623 skb_len = nskb->len; 2628 skb_len = nskb->len;
2624 rc = ops->ndo_start_xmit(nskb, dev); 2629 if (accel_priv)
2630 rc = ops->ndo_dfwd_start_xmit(nskb, dev, accel_priv);
2631 else
2632 rc = ops->ndo_start_xmit(nskb, dev);
2625 trace_net_dev_xmit(nskb, rc, dev, skb_len); 2633 trace_net_dev_xmit(nskb, rc, dev, skb_len);
2626 if (unlikely(rc != NETDEV_TX_OK)) { 2634 if (unlikely(rc != NETDEV_TX_OK)) {
2627 if (rc & ~NETDEV_TX_MASK) 2635 if (rc & ~NETDEV_TX_MASK)
@@ -2646,6 +2654,7 @@ out_kfree_skb:
2646out: 2654out:
2647 return rc; 2655 return rc;
2648} 2656}
2657EXPORT_SYMBOL_GPL(dev_hard_start_xmit);
2649 2658
2650static void qdisc_pkt_len_init(struct sk_buff *skb) 2659static void qdisc_pkt_len_init(struct sk_buff *skb)
2651{ 2660{
@@ -2853,7 +2862,7 @@ int dev_queue_xmit(struct sk_buff *skb)
2853 2862
2854 if (!netif_xmit_stopped(txq)) { 2863 if (!netif_xmit_stopped(txq)) {
2855 __this_cpu_inc(xmit_recursion); 2864 __this_cpu_inc(xmit_recursion);
2856 rc = dev_hard_start_xmit(skb, dev, txq); 2865 rc = dev_hard_start_xmit(skb, dev, txq, NULL);
2857 __this_cpu_dec(xmit_recursion); 2866 __this_cpu_dec(xmit_recursion);
2858 if (dev_xmit_complete(rc)) { 2867 if (dev_xmit_complete(rc)) {
2859 HARD_TX_UNLOCK(dev, txq); 2868 HARD_TX_UNLOCK(dev, txq);
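The new accel_priv argument lets a transmit path that belongs to an L2 forwarding offload ("l2-fwd-offload" in the ethtool strings further down) hand its driver-private context to ndo_dfwd_start_xmit() instead of ndo_start_xmit(). A hedged caller-side sketch; the upper device, its fwd_priv field and lowerdev are illustrative names, not code from this patch:

	/* Sketch: an offloaded upper device (e.g. a macvlan bound to a real
	 * NIC) passes back the private pointer it was given at setup time so
	 * the driver can pick the matching hardware station/queue; txq may be
	 * NULL in that case, which the hunk above now tolerates.
	 */
	if (upper->fwd_priv)
		rc = dev_hard_start_xmit(skb, lowerdev, NULL, upper->fwd_priv);
	else
		rc = dev_hard_start_xmit(skb, dev, txq, NULL);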
@@ -4374,42 +4383,40 @@ struct netdev_adjacent {
4374 /* upper master flag, there can only be one master device per list */ 4383 /* upper master flag, there can only be one master device per list */
4375 bool master; 4384 bool master;
4376 4385
4377 /* indicates that this dev is our first-level lower/upper device */
4378 bool neighbour;
4379
4380 /* counter for the number of times this device was added to us */ 4386 /* counter for the number of times this device was added to us */
4381 u16 ref_nr; 4387 u16 ref_nr;
4382 4388
4389 /* private field for the users */
4390 void *private;
4391
4383 struct list_head list; 4392 struct list_head list;
4384 struct rcu_head rcu; 4393 struct rcu_head rcu;
4385}; 4394};
4386 4395
4387static struct netdev_adjacent *__netdev_find_adj(struct net_device *dev, 4396static struct netdev_adjacent *__netdev_find_adj_rcu(struct net_device *dev,
4388 struct net_device *adj_dev, 4397 struct net_device *adj_dev,
4389 bool upper) 4398 struct list_head *adj_list)
4390{ 4399{
4391 struct netdev_adjacent *adj; 4400 struct netdev_adjacent *adj;
4392 struct list_head *dev_list;
4393
4394 dev_list = upper ? &dev->upper_dev_list : &dev->lower_dev_list;
4395 4401
4396 list_for_each_entry(adj, dev_list, list) { 4402 list_for_each_entry_rcu(adj, adj_list, list) {
4397 if (adj->dev == adj_dev) 4403 if (adj->dev == adj_dev)
4398 return adj; 4404 return adj;
4399 } 4405 }
4400 return NULL; 4406 return NULL;
4401} 4407}
4402 4408
4403static inline struct netdev_adjacent *__netdev_find_upper(struct net_device *dev, 4409static struct netdev_adjacent *__netdev_find_adj(struct net_device *dev,
4404 struct net_device *udev) 4410 struct net_device *adj_dev,
4411 struct list_head *adj_list)
4405{ 4412{
4406 return __netdev_find_adj(dev, udev, true); 4413 struct netdev_adjacent *adj;
4407}
4408 4414
4409static inline struct netdev_adjacent *__netdev_find_lower(struct net_device *dev, 4415 list_for_each_entry(adj, adj_list, list) {
4410 struct net_device *ldev) 4416 if (adj->dev == adj_dev)
4411{ 4417 return adj;
4412 return __netdev_find_adj(dev, ldev, false); 4418 }
4419 return NULL;
4413} 4420}
4414 4421
4415/** 4422/**
@@ -4426,7 +4433,7 @@ bool netdev_has_upper_dev(struct net_device *dev,
4426{ 4433{
4427 ASSERT_RTNL(); 4434 ASSERT_RTNL();
4428 4435
4429 return __netdev_find_upper(dev, upper_dev); 4436 return __netdev_find_adj(dev, upper_dev, &dev->all_adj_list.upper);
4430} 4437}
4431EXPORT_SYMBOL(netdev_has_upper_dev); 4438EXPORT_SYMBOL(netdev_has_upper_dev);
4432 4439
@@ -4441,7 +4448,7 @@ bool netdev_has_any_upper_dev(struct net_device *dev)
4441{ 4448{
4442 ASSERT_RTNL(); 4449 ASSERT_RTNL();
4443 4450
4444 return !list_empty(&dev->upper_dev_list); 4451 return !list_empty(&dev->all_adj_list.upper);
4445} 4452}
4446EXPORT_SYMBOL(netdev_has_any_upper_dev); 4453EXPORT_SYMBOL(netdev_has_any_upper_dev);
4447 4454
@@ -4458,10 +4465,10 @@ struct net_device *netdev_master_upper_dev_get(struct net_device *dev)
4458 4465
4459 ASSERT_RTNL(); 4466 ASSERT_RTNL();
4460 4467
4461 if (list_empty(&dev->upper_dev_list)) 4468 if (list_empty(&dev->adj_list.upper))
4462 return NULL; 4469 return NULL;
4463 4470
4464 upper = list_first_entry(&dev->upper_dev_list, 4471 upper = list_first_entry(&dev->adj_list.upper,
4465 struct netdev_adjacent, list); 4472 struct netdev_adjacent, list);
4466 if (likely(upper->master)) 4473 if (likely(upper->master))
4467 return upper->dev; 4474 return upper->dev;
@@ -4469,15 +4476,26 @@ struct net_device *netdev_master_upper_dev_get(struct net_device *dev)
4469} 4476}
4470EXPORT_SYMBOL(netdev_master_upper_dev_get); 4477EXPORT_SYMBOL(netdev_master_upper_dev_get);
4471 4478
4472/* netdev_upper_get_next_dev_rcu - Get the next dev from upper list 4479void *netdev_adjacent_get_private(struct list_head *adj_list)
4480{
4481 struct netdev_adjacent *adj;
4482
4483 adj = list_entry(adj_list, struct netdev_adjacent, list);
4484
4485 return adj->private;
4486}
4487EXPORT_SYMBOL(netdev_adjacent_get_private);
4488
4489/**
4490 * netdev_all_upper_get_next_dev_rcu - Get the next dev from upper list
4473 * @dev: device 4491 * @dev: device
4474 * @iter: list_head ** of the current position 4492 * @iter: list_head ** of the current position
4475 * 4493 *
4476 * Gets the next device from the dev's upper list, starting from iter 4494 * Gets the next device from the dev's upper list, starting from iter
4477 * position. The caller must hold RCU read lock. 4495 * position. The caller must hold RCU read lock.
4478 */ 4496 */
4479struct net_device *netdev_upper_get_next_dev_rcu(struct net_device *dev, 4497struct net_device *netdev_all_upper_get_next_dev_rcu(struct net_device *dev,
4480 struct list_head **iter) 4498 struct list_head **iter)
4481{ 4499{
4482 struct netdev_adjacent *upper; 4500 struct netdev_adjacent *upper;
4483 4501
@@ -4485,14 +4503,71 @@ struct net_device *netdev_upper_get_next_dev_rcu(struct net_device *dev,
4485 4503
4486 upper = list_entry_rcu((*iter)->next, struct netdev_adjacent, list); 4504 upper = list_entry_rcu((*iter)->next, struct netdev_adjacent, list);
4487 4505
4488 if (&upper->list == &dev->upper_dev_list) 4506 if (&upper->list == &dev->all_adj_list.upper)
4489 return NULL; 4507 return NULL;
4490 4508
4491 *iter = &upper->list; 4509 *iter = &upper->list;
4492 4510
4493 return upper->dev; 4511 return upper->dev;
4494} 4512}
4495EXPORT_SYMBOL(netdev_upper_get_next_dev_rcu); 4513EXPORT_SYMBOL(netdev_all_upper_get_next_dev_rcu);
4514
4515/**
4516 * netdev_lower_get_next_private - Get the next ->private from the
4517 * lower neighbour list
4518 * @dev: device
4519 * @iter: list_head ** of the current position
4520 *
4521 * Gets the next netdev_adjacent->private from the dev's lower neighbour
4522 * list, starting from iter position. The caller must hold either hold the
4523 * RTNL lock or its own locking that guarantees that the neighbour lower
4524 * list will remain unchainged.
4525 */
4526void *netdev_lower_get_next_private(struct net_device *dev,
4527 struct list_head **iter)
4528{
4529 struct netdev_adjacent *lower;
4530
4531 lower = list_entry(*iter, struct netdev_adjacent, list);
4532
4533 if (&lower->list == &dev->adj_list.lower)
4534 return NULL;
4535
4536 if (iter)
4537 *iter = lower->list.next;
4538
4539 return lower->private;
4540}
4541EXPORT_SYMBOL(netdev_lower_get_next_private);
4542
4543/**
4544 * netdev_lower_get_next_private_rcu - Get the next ->private from the
4545 * lower neighbour list, RCU
4546 * variant
4547 * @dev: device
4548 * @iter: list_head ** of the current position
4549 *
4550 * Gets the next netdev_adjacent->private from the dev's lower neighbour
4551 * list, starting from iter position. The caller must hold RCU read lock.
4552 */
4553void *netdev_lower_get_next_private_rcu(struct net_device *dev,
4554 struct list_head **iter)
4555{
4556 struct netdev_adjacent *lower;
4557
4558 WARN_ON_ONCE(!rcu_read_lock_held());
4559
4560 lower = list_entry_rcu((*iter)->next, struct netdev_adjacent, list);
4561
4562 if (&lower->list == &dev->adj_list.lower)
4563 return NULL;
4564
4565 if (iter)
4566 *iter = &lower->list;
4567
4568 return lower->private;
4569}
4570EXPORT_SYMBOL(netdev_lower_get_next_private_rcu);
4496 4571
4497/** 4572/**
4498 * netdev_master_upper_dev_get_rcu - Get master upper device 4573 * netdev_master_upper_dev_get_rcu - Get master upper device
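A hedged sketch of walking the ->private pointers of a device's first-level lower devices with the new helper; the iterator's start position is an assumption based on the list layout introduced above:

	/* Must run under RTNL (or caller-provided locking), per the comment
	 * in netdev_lower_get_next_private().
	 */
	struct list_head *iter = dev->adj_list.lower.next;	/* assumed start */
	void *priv;

	ASSERT_RTNL();
	while ((priv = netdev_lower_get_next_private(dev, &iter)) != NULL) {
		/* priv is whatever the link stored, see the accessors below */
	}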
@@ -4505,7 +4580,7 @@ struct net_device *netdev_master_upper_dev_get_rcu(struct net_device *dev)
4505{ 4580{
4506 struct netdev_adjacent *upper; 4581 struct netdev_adjacent *upper;
4507 4582
4508 upper = list_first_or_null_rcu(&dev->upper_dev_list, 4583 upper = list_first_or_null_rcu(&dev->adj_list.upper,
4509 struct netdev_adjacent, list); 4584 struct netdev_adjacent, list);
4510 if (upper && likely(upper->master)) 4585 if (upper && likely(upper->master))
4511 return upper->dev; 4586 return upper->dev;
@@ -4515,15 +4590,16 @@ EXPORT_SYMBOL(netdev_master_upper_dev_get_rcu);
4515 4590
4516static int __netdev_adjacent_dev_insert(struct net_device *dev, 4591static int __netdev_adjacent_dev_insert(struct net_device *dev,
4517 struct net_device *adj_dev, 4592 struct net_device *adj_dev,
4518 bool neighbour, bool master, 4593 struct list_head *dev_list,
4519 bool upper) 4594 void *private, bool master)
4520{ 4595{
4521 struct netdev_adjacent *adj; 4596 struct netdev_adjacent *adj;
4597 char linkname[IFNAMSIZ+7];
4598 int ret;
4522 4599
4523 adj = __netdev_find_adj(dev, adj_dev, upper); 4600 adj = __netdev_find_adj(dev, adj_dev, dev_list);
4524 4601
4525 if (adj) { 4602 if (adj) {
4526 BUG_ON(neighbour);
4527 adj->ref_nr++; 4603 adj->ref_nr++;
4528 return 0; 4604 return 0;
4529 } 4605 }
@@ -4534,124 +4610,179 @@ static int __netdev_adjacent_dev_insert(struct net_device *dev,
4534 4610
4535 adj->dev = adj_dev; 4611 adj->dev = adj_dev;
4536 adj->master = master; 4612 adj->master = master;
4537 adj->neighbour = neighbour;
4538 adj->ref_nr = 1; 4613 adj->ref_nr = 1;
4539 4614 adj->private = private;
4540 dev_hold(adj_dev); 4615 dev_hold(adj_dev);
4541 pr_debug("dev_hold for %s, because of %s link added from %s to %s\n",
4542 adj_dev->name, upper ? "upper" : "lower", dev->name,
4543 adj_dev->name);
4544 4616
4545 if (!upper) { 4617 pr_debug("dev_hold for %s, because of link added from %s to %s\n",
4546 list_add_tail_rcu(&adj->list, &dev->lower_dev_list); 4618 adj_dev->name, dev->name, adj_dev->name);
4547 return 0; 4619
4620 if (dev_list == &dev->adj_list.lower) {
4621 sprintf(linkname, "lower_%s", adj_dev->name);
4622 ret = sysfs_create_link(&(dev->dev.kobj),
4623 &(adj_dev->dev.kobj), linkname);
4624 if (ret)
4625 goto free_adj;
4626 } else if (dev_list == &dev->adj_list.upper) {
4627 sprintf(linkname, "upper_%s", adj_dev->name);
4628 ret = sysfs_create_link(&(dev->dev.kobj),
4629 &(adj_dev->dev.kobj), linkname);
4630 if (ret)
4631 goto free_adj;
4548 } 4632 }
4549 4633
4550 /* Ensure that master upper link is always the first item in list. */ 4634 /* Ensure that master link is always the first item in list. */
4551 if (master) 4635 if (master) {
4552 list_add_rcu(&adj->list, &dev->upper_dev_list); 4636 ret = sysfs_create_link(&(dev->dev.kobj),
4553 else 4637 &(adj_dev->dev.kobj), "master");
4554 list_add_tail_rcu(&adj->list, &dev->upper_dev_list); 4638 if (ret)
4639 goto remove_symlinks;
4640
4641 list_add_rcu(&adj->list, dev_list);
4642 } else {
4643 list_add_tail_rcu(&adj->list, dev_list);
4644 }
4555 4645
4556 return 0; 4646 return 0;
4557}
4558 4647
4559static inline int __netdev_upper_dev_insert(struct net_device *dev, 4648remove_symlinks:
4560 struct net_device *udev, 4649 if (dev_list == &dev->adj_list.lower) {
4561 bool master, bool neighbour) 4650 sprintf(linkname, "lower_%s", adj_dev->name);
4562{ 4651 sysfs_remove_link(&(dev->dev.kobj), linkname);
4563 return __netdev_adjacent_dev_insert(dev, udev, neighbour, master, 4652 } else if (dev_list == &dev->adj_list.upper) {
4564 true); 4653 sprintf(linkname, "upper_%s", adj_dev->name);
4565} 4654 sysfs_remove_link(&(dev->dev.kobj), linkname);
4655 }
4566 4656
4567static inline int __netdev_lower_dev_insert(struct net_device *dev, 4657free_adj:
4568 struct net_device *ldev, 4658 kfree(adj);
4569 bool neighbour) 4659 dev_put(adj_dev);
4570{ 4660
4571 return __netdev_adjacent_dev_insert(dev, ldev, neighbour, false, 4661 return ret;
4572 false);
4573} 4662}
4574 4663
4575void __netdev_adjacent_dev_remove(struct net_device *dev, 4664void __netdev_adjacent_dev_remove(struct net_device *dev,
4576 struct net_device *adj_dev, bool upper) 4665 struct net_device *adj_dev,
4666 struct list_head *dev_list)
4577{ 4667{
4578 struct netdev_adjacent *adj; 4668 struct netdev_adjacent *adj;
4669 char linkname[IFNAMSIZ+7];
4579 4670
4580 if (upper) 4671 adj = __netdev_find_adj(dev, adj_dev, dev_list);
4581 adj = __netdev_find_upper(dev, adj_dev);
4582 else
4583 adj = __netdev_find_lower(dev, adj_dev);
4584 4672
4585 if (!adj) 4673 if (!adj) {
4674 pr_err("tried to remove device %s from %s\n",
4675 dev->name, adj_dev->name);
4586 BUG(); 4676 BUG();
4677 }
4587 4678
4588 if (adj->ref_nr > 1) { 4679 if (adj->ref_nr > 1) {
4680 pr_debug("%s to %s ref_nr-- = %d\n", dev->name, adj_dev->name,
4681 adj->ref_nr-1);
4589 adj->ref_nr--; 4682 adj->ref_nr--;
4590 return; 4683 return;
4591 } 4684 }
4592 4685
4686 if (adj->master)
4687 sysfs_remove_link(&(dev->dev.kobj), "master");
4688
4689 if (dev_list == &dev->adj_list.lower) {
4690 sprintf(linkname, "lower_%s", adj_dev->name);
4691 sysfs_remove_link(&(dev->dev.kobj), linkname);
4692 } else if (dev_list == &dev->adj_list.upper) {
4693 sprintf(linkname, "upper_%s", adj_dev->name);
4694 sysfs_remove_link(&(dev->dev.kobj), linkname);
4695 }
4696
4593 list_del_rcu(&adj->list); 4697 list_del_rcu(&adj->list);
4594 pr_debug("dev_put for %s, because of %s link removed from %s to %s\n", 4698 pr_debug("dev_put for %s, because link removed from %s to %s\n",
4595 adj_dev->name, upper ? "upper" : "lower", dev->name, 4699 adj_dev->name, dev->name, adj_dev->name);
4596 adj_dev->name);
4597 dev_put(adj_dev); 4700 dev_put(adj_dev);
4598 kfree_rcu(adj, rcu); 4701 kfree_rcu(adj, rcu);
4599} 4702}
4600 4703
4601static inline void __netdev_upper_dev_remove(struct net_device *dev, 4704int __netdev_adjacent_dev_link_lists(struct net_device *dev,
4602 struct net_device *udev) 4705 struct net_device *upper_dev,
4603{ 4706 struct list_head *up_list,
4604 return __netdev_adjacent_dev_remove(dev, udev, true); 4707 struct list_head *down_list,
4605} 4708 void *private, bool master)
4606
4607static inline void __netdev_lower_dev_remove(struct net_device *dev,
4608 struct net_device *ldev)
4609{
4610 return __netdev_adjacent_dev_remove(dev, ldev, false);
4611}
4612
4613int __netdev_adjacent_dev_insert_link(struct net_device *dev,
4614 struct net_device *upper_dev,
4615 bool master, bool neighbour)
4616{ 4709{
4617 int ret; 4710 int ret;
4618 4711
4619 ret = __netdev_upper_dev_insert(dev, upper_dev, master, neighbour); 4712 ret = __netdev_adjacent_dev_insert(dev, upper_dev, up_list, private,
4713 master);
4620 if (ret) 4714 if (ret)
4621 return ret; 4715 return ret;
4622 4716
4623 ret = __netdev_lower_dev_insert(upper_dev, dev, neighbour); 4717 ret = __netdev_adjacent_dev_insert(upper_dev, dev, down_list, private,
4718 false);
4624 if (ret) { 4719 if (ret) {
4625 __netdev_upper_dev_remove(dev, upper_dev); 4720 __netdev_adjacent_dev_remove(dev, upper_dev, up_list);
4626 return ret; 4721 return ret;
4627 } 4722 }
4628 4723
4629 return 0; 4724 return 0;
4630} 4725}
4631 4726
4632static inline int __netdev_adjacent_dev_link(struct net_device *dev, 4727int __netdev_adjacent_dev_link(struct net_device *dev,
4633 struct net_device *udev) 4728 struct net_device *upper_dev)
4634{ 4729{
4635 return __netdev_adjacent_dev_insert_link(dev, udev, false, false); 4730 return __netdev_adjacent_dev_link_lists(dev, upper_dev,
4731 &dev->all_adj_list.upper,
4732 &upper_dev->all_adj_list.lower,
4733 NULL, false);
4636} 4734}
4637 4735
4638static inline int __netdev_adjacent_dev_link_neighbour(struct net_device *dev, 4736void __netdev_adjacent_dev_unlink_lists(struct net_device *dev,
4639 struct net_device *udev, 4737 struct net_device *upper_dev,
4640 bool master) 4738 struct list_head *up_list,
4739 struct list_head *down_list)
4641{ 4740{
4642 return __netdev_adjacent_dev_insert_link(dev, udev, master, true); 4741 __netdev_adjacent_dev_remove(dev, upper_dev, up_list);
4742 __netdev_adjacent_dev_remove(upper_dev, dev, down_list);
4643} 4743}
4644 4744
4645void __netdev_adjacent_dev_unlink(struct net_device *dev, 4745void __netdev_adjacent_dev_unlink(struct net_device *dev,
4646 struct net_device *upper_dev) 4746 struct net_device *upper_dev)
4647{ 4747{
4648 __netdev_upper_dev_remove(dev, upper_dev); 4748 __netdev_adjacent_dev_unlink_lists(dev, upper_dev,
4649 __netdev_lower_dev_remove(upper_dev, dev); 4749 &dev->all_adj_list.upper,
4750 &upper_dev->all_adj_list.lower);
4650} 4751}
4651 4752
4753int __netdev_adjacent_dev_link_neighbour(struct net_device *dev,
4754 struct net_device *upper_dev,
4755 void *private, bool master)
4756{
4757 int ret = __netdev_adjacent_dev_link(dev, upper_dev);
4758
4759 if (ret)
4760 return ret;
4761
4762 ret = __netdev_adjacent_dev_link_lists(dev, upper_dev,
4763 &dev->adj_list.upper,
4764 &upper_dev->adj_list.lower,
4765 private, master);
4766 if (ret) {
4767 __netdev_adjacent_dev_unlink(dev, upper_dev);
4768 return ret;
4769 }
4770
4771 return 0;
4772}
4773
4774void __netdev_adjacent_dev_unlink_neighbour(struct net_device *dev,
4775 struct net_device *upper_dev)
4776{
4777 __netdev_adjacent_dev_unlink(dev, upper_dev);
4778 __netdev_adjacent_dev_unlink_lists(dev, upper_dev,
4779 &dev->adj_list.upper,
4780 &upper_dev->adj_list.lower);
4781}
4652 4782
4653static int __netdev_upper_dev_link(struct net_device *dev, 4783static int __netdev_upper_dev_link(struct net_device *dev,
4654 struct net_device *upper_dev, bool master) 4784 struct net_device *upper_dev, bool master,
4785 void *private)
4655{ 4786{
4656 struct netdev_adjacent *i, *j, *to_i, *to_j; 4787 struct netdev_adjacent *i, *j, *to_i, *to_j;
4657 int ret = 0; 4788 int ret = 0;
@@ -4662,26 +4793,29 @@ static int __netdev_upper_dev_link(struct net_device *dev,
4662 return -EBUSY; 4793 return -EBUSY;
4663 4794
4664 /* To prevent loops, check if dev is not upper device to upper_dev. */ 4795 /* To prevent loops, check if dev is not upper device to upper_dev. */
4665 if (__netdev_find_upper(upper_dev, dev)) 4796 if (__netdev_find_adj(upper_dev, dev, &upper_dev->all_adj_list.upper))
4666 return -EBUSY; 4797 return -EBUSY;
4667 4798
4668 if (__netdev_find_upper(dev, upper_dev)) 4799 if (__netdev_find_adj(dev, upper_dev, &dev->all_adj_list.upper))
4669 return -EEXIST; 4800 return -EEXIST;
4670 4801
4671 if (master && netdev_master_upper_dev_get(dev)) 4802 if (master && netdev_master_upper_dev_get(dev))
4672 return -EBUSY; 4803 return -EBUSY;
4673 4804
4674 ret = __netdev_adjacent_dev_link_neighbour(dev, upper_dev, master); 4805 ret = __netdev_adjacent_dev_link_neighbour(dev, upper_dev, private,
4806 master);
4675 if (ret) 4807 if (ret)
4676 return ret; 4808 return ret;
4677 4809
4678 /* Now that we linked these devs, make all the upper_dev's 4810 /* Now that we linked these devs, make all the upper_dev's
4679 * upper_dev_list visible to every dev's lower_dev_list and vice 4811 * all_adj_list.upper visible to every dev's all_adj_list.lower an
4680 * versa, and don't forget the devices itself. All of these 4812 * versa, and don't forget the devices itself. All of these
4681 * links are non-neighbours. 4813 * links are non-neighbours.
4682 */ 4814 */
4683 list_for_each_entry(i, &dev->lower_dev_list, list) { 4815 list_for_each_entry(i, &dev->all_adj_list.lower, list) {
4684 list_for_each_entry(j, &upper_dev->upper_dev_list, list) { 4816 list_for_each_entry(j, &upper_dev->all_adj_list.upper, list) {
4817 pr_debug("Interlinking %s with %s, non-neighbour\n",
4818 i->dev->name, j->dev->name);
4685 ret = __netdev_adjacent_dev_link(i->dev, j->dev); 4819 ret = __netdev_adjacent_dev_link(i->dev, j->dev);
4686 if (ret) 4820 if (ret)
4687 goto rollback_mesh; 4821 goto rollback_mesh;
@@ -4689,14 +4823,18 @@ static int __netdev_upper_dev_link(struct net_device *dev,
4689 } 4823 }
4690 4824
4691 /* add dev to every upper_dev's upper device */ 4825 /* add dev to every upper_dev's upper device */
4692 list_for_each_entry(i, &upper_dev->upper_dev_list, list) { 4826 list_for_each_entry(i, &upper_dev->all_adj_list.upper, list) {
4827 pr_debug("linking %s's upper device %s with %s\n",
4828 upper_dev->name, i->dev->name, dev->name);
4693 ret = __netdev_adjacent_dev_link(dev, i->dev); 4829 ret = __netdev_adjacent_dev_link(dev, i->dev);
4694 if (ret) 4830 if (ret)
4695 goto rollback_upper_mesh; 4831 goto rollback_upper_mesh;
4696 } 4832 }
4697 4833
4698 /* add upper_dev to every dev's lower device */ 4834 /* add upper_dev to every dev's lower device */
4699 list_for_each_entry(i, &dev->lower_dev_list, list) { 4835 list_for_each_entry(i, &dev->all_adj_list.lower, list) {
4836 pr_debug("linking %s's lower device %s with %s\n", dev->name,
4837 i->dev->name, upper_dev->name);
4700 ret = __netdev_adjacent_dev_link(i->dev, upper_dev); 4838 ret = __netdev_adjacent_dev_link(i->dev, upper_dev);
4701 if (ret) 4839 if (ret)
4702 goto rollback_lower_mesh; 4840 goto rollback_lower_mesh;
@@ -4707,7 +4845,7 @@ static int __netdev_upper_dev_link(struct net_device *dev,
4707 4845
4708rollback_lower_mesh: 4846rollback_lower_mesh:
4709 to_i = i; 4847 to_i = i;
4710 list_for_each_entry(i, &dev->lower_dev_list, list) { 4848 list_for_each_entry(i, &dev->all_adj_list.lower, list) {
4711 if (i == to_i) 4849 if (i == to_i)
4712 break; 4850 break;
4713 __netdev_adjacent_dev_unlink(i->dev, upper_dev); 4851 __netdev_adjacent_dev_unlink(i->dev, upper_dev);
@@ -4717,7 +4855,7 @@ rollback_lower_mesh:
4717 4855
4718rollback_upper_mesh: 4856rollback_upper_mesh:
4719 to_i = i; 4857 to_i = i;
4720 list_for_each_entry(i, &upper_dev->upper_dev_list, list) { 4858 list_for_each_entry(i, &upper_dev->all_adj_list.upper, list) {
4721 if (i == to_i) 4859 if (i == to_i)
4722 break; 4860 break;
4723 __netdev_adjacent_dev_unlink(dev, i->dev); 4861 __netdev_adjacent_dev_unlink(dev, i->dev);
@@ -4728,8 +4866,8 @@ rollback_upper_mesh:
4728rollback_mesh: 4866rollback_mesh:
4729 to_i = i; 4867 to_i = i;
4730 to_j = j; 4868 to_j = j;
4731 list_for_each_entry(i, &dev->lower_dev_list, list) { 4869 list_for_each_entry(i, &dev->all_adj_list.lower, list) {
4732 list_for_each_entry(j, &upper_dev->upper_dev_list, list) { 4870 list_for_each_entry(j, &upper_dev->all_adj_list.upper, list) {
4733 if (i == to_i && j == to_j) 4871 if (i == to_i && j == to_j)
4734 break; 4872 break;
4735 __netdev_adjacent_dev_unlink(i->dev, j->dev); 4873 __netdev_adjacent_dev_unlink(i->dev, j->dev);
@@ -4738,7 +4876,7 @@ rollback_mesh:
4738 break; 4876 break;
4739 } 4877 }
4740 4878
4741 __netdev_adjacent_dev_unlink(dev, upper_dev); 4879 __netdev_adjacent_dev_unlink_neighbour(dev, upper_dev);
4742 4880
4743 return ret; 4881 return ret;
4744} 4882}
@@ -4756,7 +4894,7 @@ rollback_mesh:
4756int netdev_upper_dev_link(struct net_device *dev, 4894int netdev_upper_dev_link(struct net_device *dev,
4757 struct net_device *upper_dev) 4895 struct net_device *upper_dev)
4758{ 4896{
4759 return __netdev_upper_dev_link(dev, upper_dev, false); 4897 return __netdev_upper_dev_link(dev, upper_dev, false, NULL);
4760} 4898}
4761EXPORT_SYMBOL(netdev_upper_dev_link); 4899EXPORT_SYMBOL(netdev_upper_dev_link);
4762 4900
@@ -4774,10 +4912,18 @@ EXPORT_SYMBOL(netdev_upper_dev_link);
4774int netdev_master_upper_dev_link(struct net_device *dev, 4912int netdev_master_upper_dev_link(struct net_device *dev,
4775 struct net_device *upper_dev) 4913 struct net_device *upper_dev)
4776{ 4914{
4777 return __netdev_upper_dev_link(dev, upper_dev, true); 4915 return __netdev_upper_dev_link(dev, upper_dev, true, NULL);
4778} 4916}
4779EXPORT_SYMBOL(netdev_master_upper_dev_link); 4917EXPORT_SYMBOL(netdev_master_upper_dev_link);
4780 4918
4919int netdev_master_upper_dev_link_private(struct net_device *dev,
4920 struct net_device *upper_dev,
4921 void *private)
4922{
4923 return __netdev_upper_dev_link(dev, upper_dev, true, private);
4924}
4925EXPORT_SYMBOL(netdev_master_upper_dev_link_private);
4926
4781/** 4927/**
4782 * netdev_upper_dev_unlink - Removes a link to upper device 4928 * netdev_upper_dev_unlink - Removes a link to upper device
4783 * @dev: device 4929 * @dev: device
@@ -4792,29 +4938,59 @@ void netdev_upper_dev_unlink(struct net_device *dev,
4792 struct netdev_adjacent *i, *j; 4938 struct netdev_adjacent *i, *j;
4793 ASSERT_RTNL(); 4939 ASSERT_RTNL();
4794 4940
4795 __netdev_adjacent_dev_unlink(dev, upper_dev); 4941 __netdev_adjacent_dev_unlink_neighbour(dev, upper_dev);
4796 4942
4797 /* Here is the tricky part. We must remove all dev's lower 4943 /* Here is the tricky part. We must remove all dev's lower
4798 * devices from all upper_dev's upper devices and vice 4944 * devices from all upper_dev's upper devices and vice
4799 * versa, to maintain the graph relationship. 4945 * versa, to maintain the graph relationship.
4800 */ 4946 */
4801 list_for_each_entry(i, &dev->lower_dev_list, list) 4947 list_for_each_entry(i, &dev->all_adj_list.lower, list)
4802 list_for_each_entry(j, &upper_dev->upper_dev_list, list) 4948 list_for_each_entry(j, &upper_dev->all_adj_list.upper, list)
4803 __netdev_adjacent_dev_unlink(i->dev, j->dev); 4949 __netdev_adjacent_dev_unlink(i->dev, j->dev);
4804 4950
4805 /* remove also the devices itself from lower/upper device 4951 /* remove also the devices itself from lower/upper device
4806 * list 4952 * list
4807 */ 4953 */
4808 list_for_each_entry(i, &dev->lower_dev_list, list) 4954 list_for_each_entry(i, &dev->all_adj_list.lower, list)
4809 __netdev_adjacent_dev_unlink(i->dev, upper_dev); 4955 __netdev_adjacent_dev_unlink(i->dev, upper_dev);
4810 4956
4811 list_for_each_entry(i, &upper_dev->upper_dev_list, list) 4957 list_for_each_entry(i, &upper_dev->all_adj_list.upper, list)
4812 __netdev_adjacent_dev_unlink(dev, i->dev); 4958 __netdev_adjacent_dev_unlink(dev, i->dev);
4813 4959
4814 call_netdevice_notifiers(NETDEV_CHANGEUPPER, dev); 4960 call_netdevice_notifiers(NETDEV_CHANGEUPPER, dev);
4815} 4961}
4816EXPORT_SYMBOL(netdev_upper_dev_unlink); 4962EXPORT_SYMBOL(netdev_upper_dev_unlink);
4817 4963
4964void *netdev_lower_dev_get_private_rcu(struct net_device *dev,
4965 struct net_device *lower_dev)
4966{
4967 struct netdev_adjacent *lower;
4968
4969 if (!lower_dev)
4970 return NULL;
4971 lower = __netdev_find_adj_rcu(dev, lower_dev, &dev->adj_list.lower);
4972 if (!lower)
4973 return NULL;
4974
4975 return lower->private;
4976}
4977EXPORT_SYMBOL(netdev_lower_dev_get_private_rcu);
4978
4979void *netdev_lower_dev_get_private(struct net_device *dev,
4980 struct net_device *lower_dev)
4981{
4982 struct netdev_adjacent *lower;
4983
4984 if (!lower_dev)
4985 return NULL;
4986 lower = __netdev_find_adj(dev, lower_dev, &dev->adj_list.lower);
4987 if (!lower)
4988 return NULL;
4989
4990 return lower->private;
4991}
4992EXPORT_SYMBOL(netdev_lower_dev_get_private);
4993
4818static void dev_change_rx_flags(struct net_device *dev, int flags) 4994static void dev_change_rx_flags(struct net_device *dev, int flags)
4819{ 4995{
4820 const struct net_device_ops *ops = dev->netdev_ops; 4996 const struct net_device_ops *ops = dev->netdev_ops;
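The private pointer goes in through netdev_master_upper_dev_link_private() (added just above) and comes back out through these accessors. A hedged sketch of the pairing; struct my_slave_info and the surrounding error handling are illustrative only:

	struct my_slave_info *info = kzalloc(sizeof(*info), GFP_KERNEL);
	int err;

	if (!info)
		return -ENOMEM;

	err = netdev_master_upper_dev_link_private(slave_dev, master_dev, info);
	if (err) {
		kfree(info);
		return err;
	}

	/* ... later, e.g. in the master's fast path ... */
	info = netdev_lower_dev_get_private(master_dev, slave_dev);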
@@ -4823,7 +4999,7 @@ static void dev_change_rx_flags(struct net_device *dev, int flags)
4823 ops->ndo_change_rx_flags(dev, flags); 4999 ops->ndo_change_rx_flags(dev, flags);
4824} 5000}
4825 5001
4826static int __dev_set_promiscuity(struct net_device *dev, int inc) 5002static int __dev_set_promiscuity(struct net_device *dev, int inc, bool notify)
4827{ 5003{
4828 unsigned int old_flags = dev->flags; 5004 unsigned int old_flags = dev->flags;
4829 kuid_t uid; 5005 kuid_t uid;
@@ -4866,6 +5042,8 @@ static int __dev_set_promiscuity(struct net_device *dev, int inc)
4866 5042
4867 dev_change_rx_flags(dev, IFF_PROMISC); 5043 dev_change_rx_flags(dev, IFF_PROMISC);
4868 } 5044 }
5045 if (notify)
5046 __dev_notify_flags(dev, old_flags, IFF_PROMISC);
4869 return 0; 5047 return 0;
4870} 5048}
4871 5049
@@ -4885,7 +5063,7 @@ int dev_set_promiscuity(struct net_device *dev, int inc)
4885 unsigned int old_flags = dev->flags; 5063 unsigned int old_flags = dev->flags;
4886 int err; 5064 int err;
4887 5065
4888 err = __dev_set_promiscuity(dev, inc); 5066 err = __dev_set_promiscuity(dev, inc, true);
4889 if (err < 0) 5067 if (err < 0)
4890 return err; 5068 return err;
4891 if (dev->flags != old_flags) 5069 if (dev->flags != old_flags)
@@ -4894,22 +5072,9 @@ int dev_set_promiscuity(struct net_device *dev, int inc)
4894} 5072}
4895EXPORT_SYMBOL(dev_set_promiscuity); 5073EXPORT_SYMBOL(dev_set_promiscuity);
4896 5074
4897/** 5075static int __dev_set_allmulti(struct net_device *dev, int inc, bool notify)
4898 * dev_set_allmulti - update allmulti count on a device
4899 * @dev: device
4900 * @inc: modifier
4901 *
4902 * Add or remove reception of all multicast frames to a device. While the
4903 * count in the device remains above zero the interface remains listening
4904 * to all interfaces. Once it hits zero the device reverts back to normal
4905 * filtering operation. A negative @inc value is used to drop the counter
4906 * when releasing a resource needing all multicasts.
4907 * Return 0 if successful or a negative errno code on error.
4908 */
4909
4910int dev_set_allmulti(struct net_device *dev, int inc)
4911{ 5076{
4912 unsigned int old_flags = dev->flags; 5077 unsigned int old_flags = dev->flags, old_gflags = dev->gflags;
4913 5078
4914 ASSERT_RTNL(); 5079 ASSERT_RTNL();
4915 5080
@@ -4932,9 +5097,30 @@ int dev_set_allmulti(struct net_device *dev, int inc)
4932 if (dev->flags ^ old_flags) { 5097 if (dev->flags ^ old_flags) {
4933 dev_change_rx_flags(dev, IFF_ALLMULTI); 5098 dev_change_rx_flags(dev, IFF_ALLMULTI);
4934 dev_set_rx_mode(dev); 5099 dev_set_rx_mode(dev);
5100 if (notify)
5101 __dev_notify_flags(dev, old_flags,
5102 dev->gflags ^ old_gflags);
4935 } 5103 }
4936 return 0; 5104 return 0;
4937} 5105}
5106
5107/**
5108 * dev_set_allmulti - update allmulti count on a device
5109 * @dev: device
5110 * @inc: modifier
5111 *
5112 * Add or remove reception of all multicast frames to a device. While the
5113 * count in the device remains above zero the interface remains listening
5114 * to all interfaces. Once it hits zero the device reverts back to normal
5115 * filtering operation. A negative @inc value is used to drop the counter
5116 * when releasing a resource needing all multicasts.
5117 * Return 0 if successful or a negative errno code on error.
5118 */
5119
5120int dev_set_allmulti(struct net_device *dev, int inc)
5121{
5122 return __dev_set_allmulti(dev, inc, true);
5123}
4938EXPORT_SYMBOL(dev_set_allmulti); 5124EXPORT_SYMBOL(dev_set_allmulti);
4939 5125
4940/* 5126/*
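Per the relocated docstring, the allmulti count is a reference count, so callers must balance their increments. A hedged sketch of the usual pairing in a stacked driver (function names are illustrative):

	static int example_open_lower(struct net_device *dev, struct net_device *lower)
	{
		if (dev->flags & IFF_ALLMULTI)
			return dev_set_allmulti(lower, 1);
		return 0;
	}

	static void example_close_lower(struct net_device *dev, struct net_device *lower)
	{
		if (dev->flags & IFF_ALLMULTI)
			dev_set_allmulti(lower, -1);
	}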
@@ -4959,10 +5145,10 @@ void __dev_set_rx_mode(struct net_device *dev)
4959 * therefore calling __dev_set_promiscuity here is safe. 5145 * therefore calling __dev_set_promiscuity here is safe.
4960 */ 5146 */
4961 if (!netdev_uc_empty(dev) && !dev->uc_promisc) { 5147 if (!netdev_uc_empty(dev) && !dev->uc_promisc) {
4962 __dev_set_promiscuity(dev, 1); 5148 __dev_set_promiscuity(dev, 1, false);
4963 dev->uc_promisc = true; 5149 dev->uc_promisc = true;
4964 } else if (netdev_uc_empty(dev) && dev->uc_promisc) { 5150 } else if (netdev_uc_empty(dev) && dev->uc_promisc) {
4965 __dev_set_promiscuity(dev, -1); 5151 __dev_set_promiscuity(dev, -1, false);
4966 dev->uc_promisc = false; 5152 dev->uc_promisc = false;
4967 } 5153 }
4968 } 5154 }
@@ -5051,9 +5237,13 @@ int __dev_change_flags(struct net_device *dev, unsigned int flags)
5051 5237
5052 if ((flags ^ dev->gflags) & IFF_PROMISC) { 5238 if ((flags ^ dev->gflags) & IFF_PROMISC) {
5053 int inc = (flags & IFF_PROMISC) ? 1 : -1; 5239 int inc = (flags & IFF_PROMISC) ? 1 : -1;
5240 unsigned int old_flags = dev->flags;
5054 5241
5055 dev->gflags ^= IFF_PROMISC; 5242 dev->gflags ^= IFF_PROMISC;
5056 dev_set_promiscuity(dev, inc); 5243
5244 if (__dev_set_promiscuity(dev, inc, false) >= 0)
5245 if (dev->flags != old_flags)
5246 dev_set_rx_mode(dev);
5057 } 5247 }
5058 5248
5059 /* NOTE: order of synchronization of IFF_PROMISC and IFF_ALLMULTI 5249 /* NOTE: order of synchronization of IFF_PROMISC and IFF_ALLMULTI
@@ -5064,16 +5254,20 @@ int __dev_change_flags(struct net_device *dev, unsigned int flags)
5064 int inc = (flags & IFF_ALLMULTI) ? 1 : -1; 5254 int inc = (flags & IFF_ALLMULTI) ? 1 : -1;
5065 5255
5066 dev->gflags ^= IFF_ALLMULTI; 5256 dev->gflags ^= IFF_ALLMULTI;
5067 dev_set_allmulti(dev, inc); 5257 __dev_set_allmulti(dev, inc, false);
5068 } 5258 }
5069 5259
5070 return ret; 5260 return ret;
5071} 5261}
5072 5262
5073void __dev_notify_flags(struct net_device *dev, unsigned int old_flags) 5263void __dev_notify_flags(struct net_device *dev, unsigned int old_flags,
5264 unsigned int gchanges)
5074{ 5265{
5075 unsigned int changes = dev->flags ^ old_flags; 5266 unsigned int changes = dev->flags ^ old_flags;
5076 5267
5268 if (gchanges)
5269 rtmsg_ifinfo(RTM_NEWLINK, dev, gchanges, GFP_ATOMIC);
5270
5077 if (changes & IFF_UP) { 5271 if (changes & IFF_UP) {
5078 if (dev->flags & IFF_UP) 5272 if (dev->flags & IFF_UP)
5079 call_netdevice_notifiers(NETDEV_UP, dev); 5273 call_netdevice_notifiers(NETDEV_UP, dev);
@@ -5102,17 +5296,14 @@ void __dev_notify_flags(struct net_device *dev, unsigned int old_flags)
5102int dev_change_flags(struct net_device *dev, unsigned int flags) 5296int dev_change_flags(struct net_device *dev, unsigned int flags)
5103{ 5297{
5104 int ret; 5298 int ret;
5105 unsigned int changes, old_flags = dev->flags; 5299 unsigned int changes, old_flags = dev->flags, old_gflags = dev->gflags;
5106 5300
5107 ret = __dev_change_flags(dev, flags); 5301 ret = __dev_change_flags(dev, flags);
5108 if (ret < 0) 5302 if (ret < 0)
5109 return ret; 5303 return ret;
5110 5304
5111 changes = old_flags ^ dev->flags; 5305 changes = (old_flags ^ dev->flags) | (old_gflags ^ dev->gflags);
5112 if (changes) 5306 __dev_notify_flags(dev, old_flags, changes);
5113 rtmsg_ifinfo(RTM_NEWLINK, dev, changes);
5114
5115 __dev_notify_flags(dev, old_flags);
5116 return ret; 5307 return ret;
5117} 5308}
5118EXPORT_SYMBOL(dev_change_flags); 5309EXPORT_SYMBOL(dev_change_flags);
@@ -5259,6 +5450,7 @@ static void net_set_todo(struct net_device *dev)
5259static void rollback_registered_many(struct list_head *head) 5450static void rollback_registered_many(struct list_head *head)
5260{ 5451{
5261 struct net_device *dev, *tmp; 5452 struct net_device *dev, *tmp;
5453 LIST_HEAD(close_head);
5262 5454
5263 BUG_ON(dev_boot_phase); 5455 BUG_ON(dev_boot_phase);
5264 ASSERT_RTNL(); 5456 ASSERT_RTNL();
@@ -5281,7 +5473,9 @@ static void rollback_registered_many(struct list_head *head)
5281 } 5473 }
5282 5474
5283 /* If device is running, close it first. */ 5475 /* If device is running, close it first. */
5284 dev_close_many(head); 5476 list_for_each_entry(dev, head, unreg_list)
5477 list_add_tail(&dev->close_list, &close_head);
5478 dev_close_many(&close_head);
5285 5479
5286 list_for_each_entry(dev, head, unreg_list) { 5480 list_for_each_entry(dev, head, unreg_list) {
5287 /* And unlink it from device chain. */ 5481 /* And unlink it from device chain. */
@@ -5304,7 +5498,7 @@ static void rollback_registered_many(struct list_head *head)
5304 5498
5305 if (!dev->rtnl_link_ops || 5499 if (!dev->rtnl_link_ops ||
5306 dev->rtnl_link_state == RTNL_LINK_INITIALIZED) 5500 dev->rtnl_link_state == RTNL_LINK_INITIALIZED)
5307 rtmsg_ifinfo(RTM_DELLINK, dev, ~0U); 5501 rtmsg_ifinfo(RTM_DELLINK, dev, ~0U, GFP_KERNEL);
5308 5502
5309 /* 5503 /*
5310 * Flush the unicast and multicast chains 5504 * Flush the unicast and multicast chains
@@ -5703,7 +5897,7 @@ int register_netdevice(struct net_device *dev)
5703 */ 5897 */
5704 if (!dev->rtnl_link_ops || 5898 if (!dev->rtnl_link_ops ||
5705 dev->rtnl_link_state == RTNL_LINK_INITIALIZED) 5899 dev->rtnl_link_state == RTNL_LINK_INITIALIZED)
5706 rtmsg_ifinfo(RTM_NEWLINK, dev, ~0U); 5900 rtmsg_ifinfo(RTM_NEWLINK, dev, ~0U, GFP_KERNEL);
5707 5901
5708out: 5902out:
5709 return ret; 5903 return ret;
@@ -6010,6 +6204,16 @@ void netdev_set_default_ethtool_ops(struct net_device *dev,
6010} 6204}
6011EXPORT_SYMBOL_GPL(netdev_set_default_ethtool_ops); 6205EXPORT_SYMBOL_GPL(netdev_set_default_ethtool_ops);
6012 6206
6207void netdev_freemem(struct net_device *dev)
6208{
6209 char *addr = (char *)dev - dev->padded;
6210
6211 if (is_vmalloc_addr(addr))
6212 vfree(addr);
6213 else
6214 kfree(addr);
6215}
6216
6013/** 6217/**
6014 * alloc_netdev_mqs - allocate network device 6218 * alloc_netdev_mqs - allocate network device
6015 * @sizeof_priv: size of private data to allocate space for 6219 * @sizeof_priv: size of private data to allocate space for
@@ -6053,7 +6257,9 @@ struct net_device *alloc_netdev_mqs(int sizeof_priv, const char *name,
6053 /* ensure 32-byte alignment of whole construct */ 6257 /* ensure 32-byte alignment of whole construct */
6054 alloc_size += NETDEV_ALIGN - 1; 6258 alloc_size += NETDEV_ALIGN - 1;
6055 6259
6056 p = kzalloc(alloc_size, GFP_KERNEL); 6260 p = kzalloc(alloc_size, GFP_KERNEL | __GFP_NOWARN | __GFP_REPEAT);
6261 if (!p)
6262 p = vzalloc(alloc_size);
6057 if (!p) 6263 if (!p)
6058 return NULL; 6264 return NULL;
6059 6265
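alloc_netdev_mqs() now tries a physically contiguous allocation first and quietly falls back to vmalloc(); netdev_freemem() above frees either case via is_vmalloc_addr(). The same pattern in isolation, as a hedged sketch:

	static void *example_alloc(size_t size)
	{
		/* __GFP_NOWARN: no failure splat if the contiguous try fails */
		void *buf = kzalloc(size, GFP_KERNEL | __GFP_NOWARN | __GFP_REPEAT);

		if (!buf)
			buf = vzalloc(size);
		return buf;
	}

	static void example_free(void *buf)
	{
		/* mirror of netdev_freemem(): pick the matching free routine */
		if (is_vmalloc_addr(buf))
			vfree(buf);
		else
			kfree(buf);
	}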
@@ -6062,7 +6268,7 @@ struct net_device *alloc_netdev_mqs(int sizeof_priv, const char *name,
6062 6268
6063 dev->pcpu_refcnt = alloc_percpu(int); 6269 dev->pcpu_refcnt = alloc_percpu(int);
6064 if (!dev->pcpu_refcnt) 6270 if (!dev->pcpu_refcnt)
6065 goto free_p; 6271 goto free_dev;
6066 6272
6067 if (dev_addr_init(dev)) 6273 if (dev_addr_init(dev))
6068 goto free_pcpu; 6274 goto free_pcpu;
@@ -6077,9 +6283,12 @@ struct net_device *alloc_netdev_mqs(int sizeof_priv, const char *name,
6077 6283
6078 INIT_LIST_HEAD(&dev->napi_list); 6284 INIT_LIST_HEAD(&dev->napi_list);
6079 INIT_LIST_HEAD(&dev->unreg_list); 6285 INIT_LIST_HEAD(&dev->unreg_list);
6286 INIT_LIST_HEAD(&dev->close_list);
6080 INIT_LIST_HEAD(&dev->link_watch_list); 6287 INIT_LIST_HEAD(&dev->link_watch_list);
6081 INIT_LIST_HEAD(&dev->upper_dev_list); 6288 INIT_LIST_HEAD(&dev->adj_list.upper);
6082 INIT_LIST_HEAD(&dev->lower_dev_list); 6289 INIT_LIST_HEAD(&dev->adj_list.lower);
6290 INIT_LIST_HEAD(&dev->all_adj_list.upper);
6291 INIT_LIST_HEAD(&dev->all_adj_list.lower);
6083 dev->priv_flags = IFF_XMIT_DST_RELEASE; 6292 dev->priv_flags = IFF_XMIT_DST_RELEASE;
6084 setup(dev); 6293 setup(dev);
6085 6294
@@ -6112,8 +6321,8 @@ free_pcpu:
6112 kfree(dev->_rx); 6321 kfree(dev->_rx);
6113#endif 6322#endif
6114 6323
6115free_p: 6324free_dev:
6116 kfree(p); 6325 netdev_freemem(dev);
6117 return NULL; 6326 return NULL;
6118} 6327}
6119EXPORT_SYMBOL(alloc_netdev_mqs); 6328EXPORT_SYMBOL(alloc_netdev_mqs);
@@ -6150,7 +6359,7 @@ void free_netdev(struct net_device *dev)
6150 6359
6151 /* Compatibility with error handling in drivers */ 6360 /* Compatibility with error handling in drivers */
6152 if (dev->reg_state == NETREG_UNINITIALIZED) { 6361 if (dev->reg_state == NETREG_UNINITIALIZED) {
6153 kfree((char *)dev - dev->padded); 6362 netdev_freemem(dev);
6154 return; 6363 return;
6155 } 6364 }
6156 6365
@@ -6312,7 +6521,7 @@ int dev_change_net_namespace(struct net_device *dev, struct net *net, const char
6312 call_netdevice_notifiers(NETDEV_UNREGISTER, dev); 6521 call_netdevice_notifiers(NETDEV_UNREGISTER, dev);
6313 rcu_barrier(); 6522 rcu_barrier();
6314 call_netdevice_notifiers(NETDEV_UNREGISTER_FINAL, dev); 6523 call_netdevice_notifiers(NETDEV_UNREGISTER_FINAL, dev);
6315 rtmsg_ifinfo(RTM_DELLINK, dev, ~0U); 6524 rtmsg_ifinfo(RTM_DELLINK, dev, ~0U, GFP_KERNEL);
6316 6525
6317 /* 6526 /*
6318 * Flush the unicast and multicast chains 6527 * Flush the unicast and multicast chains
@@ -6351,7 +6560,7 @@ int dev_change_net_namespace(struct net_device *dev, struct net *net, const char
6351 * Prevent userspace races by waiting until the network 6560 * Prevent userspace races by waiting until the network
6352 * device is fully setup before sending notifications. 6561 * device is fully setup before sending notifications.
6353 */ 6562 */
6354 rtmsg_ifinfo(RTM_NEWLINK, dev, ~0U); 6563 rtmsg_ifinfo(RTM_NEWLINK, dev, ~0U, GFP_KERNEL);
6355 6564
6356 synchronize_net(); 6565 synchronize_net();
6357 err = 0; 6566 err = 0;
diff --git a/net/core/dev_addr_lists.c b/net/core/dev_addr_lists.c
index 6cda4e2c2132..ec40a849fc42 100644
--- a/net/core/dev_addr_lists.c
+++ b/net/core/dev_addr_lists.c
@@ -752,7 +752,7 @@ int dev_mc_del_global(struct net_device *dev, const unsigned char *addr)
 EXPORT_SYMBOL(dev_mc_del_global);
 
 /**
- * dev_mc_sync - Synchronize device's unicast list to another device
+ * dev_mc_sync - Synchronize device's multicast list to another device
  * @to: destination device
  * @from: source device
  *
@@ -780,7 +780,7 @@ int dev_mc_sync(struct net_device *to, struct net_device *from)
 EXPORT_SYMBOL(dev_mc_sync);
 
 /**
- * dev_mc_sync_multiple - Synchronize device's unicast list to another
+ * dev_mc_sync_multiple - Synchronize device's multicast list to another
  * device, but allow for multiple calls to sync to multiple devices.
  * @to: destination device
  * @from: source device
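dev_mc_sync()/dev_uc_sync() are typically called from an upper device's ndo_set_rx_mode handler to mirror its address lists onto the underlying device. A hedged sketch; example_get_lower() is a hypothetical helper:

	static void example_set_rx_mode(struct net_device *dev)
	{
		struct net_device *lower = example_get_lower(dev);

		/* the caller (dev_set_rx_mode) typically holds netif_addr_lock */
		dev_uc_sync(lower, dev);
		dev_mc_sync(lower, dev);
	}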
diff --git a/net/core/ethtool.c b/net/core/ethtool.c
index 78e9d9223e40..30071dec287a 100644
--- a/net/core/ethtool.c
+++ b/net/core/ethtool.c
@@ -81,6 +81,8 @@ static const char netdev_features_strings[NETDEV_FEATURE_COUNT][ETH_GSTRING_LEN]
 	[NETIF_F_TSO6_BIT] = "tx-tcp6-segmentation",
 	[NETIF_F_FSO_BIT] = "tx-fcoe-segmentation",
 	[NETIF_F_GSO_GRE_BIT] = "tx-gre-segmentation",
+	[NETIF_F_GSO_IPIP_BIT] = "tx-ipip-segmentation",
+	[NETIF_F_GSO_SIT_BIT] = "tx-sit-segmentation",
 	[NETIF_F_GSO_UDP_TUNNEL_BIT] = "tx-udp_tnl-segmentation",
 	[NETIF_F_GSO_MPLS_BIT] = "tx-mpls-segmentation",
 
@@ -94,6 +96,7 @@ static const char netdev_features_strings[NETDEV_FEATURE_COUNT][ETH_GSTRING_LEN]
 	[NETIF_F_LOOPBACK_BIT] = "loopback",
 	[NETIF_F_RXFCS_BIT] = "rx-fcs",
 	[NETIF_F_RXALL_BIT] = "rx-all",
+	[NETIF_F_HW_L2FW_DOFFLOAD_BIT] = "l2-fwd-offload",
 };
 
 static int ethtool_get_features(struct net_device *dev, void __user *useraddr)
diff --git a/net/core/fib_rules.c b/net/core/fib_rules.c
index 2e654138433c..f409e0bd35c0 100644
--- a/net/core/fib_rules.c
+++ b/net/core/fib_rules.c
@@ -460,7 +460,8 @@ static int fib_nl_delrule(struct sk_buff *skb, struct nlmsghdr* nlh)
 		if (frh->action && (frh->action != rule->action))
 			continue;
 
-		if (frh->table && (frh_get_table(frh, tb) != rule->table))
+		if (frh_get_table(frh, tb) &&
+		    (frh_get_table(frh, tb) != rule->table))
 			continue;
 
 		if (tb[FRA_PRIORITY] &&
diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c
index 143b6fdb9647..d6ef17322500 100644
--- a/net/core/flow_dissector.c
+++ b/net/core/flow_dissector.c
@@ -25,9 +25,35 @@ static void iph_to_flow_copy_addrs(struct flow_keys *flow, const struct iphdr *i
 	memcpy(&flow->src, &iph->saddr, sizeof(flow->src) + sizeof(flow->dst));
 }
 
+/**
+ * skb_flow_get_ports - extract the upper layer ports and return them
+ * @skb: buffer to extract the ports from
+ * @thoff: transport header offset
+ * @ip_proto: protocol for which to get port offset
+ *
+ * The function will try to retrieve the ports at offset thoff + poff where poff
+ * is the protocol port offset returned from proto_ports_offset
+ */
+__be32 skb_flow_get_ports(const struct sk_buff *skb, int thoff, u8 ip_proto)
+{
+	int poff = proto_ports_offset(ip_proto);
+
+	if (poff >= 0) {
+		__be32 *ports, _ports;
+
+		ports = skb_header_pointer(skb, thoff + poff,
+					   sizeof(_ports), &_ports);
+		if (ports)
+			return *ports;
+	}
+
+	return 0;
+}
+EXPORT_SYMBOL(skb_flow_get_ports);
+
 bool skb_flow_dissect(const struct sk_buff *skb, struct flow_keys *flow)
 {
-	int poff, nhoff = skb_network_offset(skb);
+	int nhoff = skb_network_offset(skb);
 	u8 ip_proto;
 	__be16 proto = skb->protocol;
 
@@ -42,13 +68,13 @@ ip:
 		iph = skb_header_pointer(skb, nhoff, sizeof(_iph), &_iph);
 		if (!iph || iph->ihl < 5)
 			return false;
+		nhoff += iph->ihl * 4;
 
+		ip_proto = iph->protocol;
 		if (ip_is_fragment(iph))
 			ip_proto = 0;
-		else
-			ip_proto = iph->protocol;
+
 		iph_to_flow_copy_addrs(flow, iph);
-		nhoff += iph->ihl * 4;
 		break;
 	}
 	case __constant_htons(ETH_P_IPV6): {
@@ -150,16 +176,7 @@ ipv6:
 	}
 
 	flow->ip_proto = ip_proto;
-	poff = proto_ports_offset(ip_proto);
-	if (poff >= 0) {
-		__be32 *ports, _ports;
-
-		ports = skb_header_pointer(skb, nhoff + poff,
-					   sizeof(_ports), &_ports);
-		if (ports)
-			flow->ports = *ports;
-	}
-
+	flow->ports = skb_flow_get_ports(skb, nhoff, ip_proto);
 	flow->thoff = (u16) nhoff;
 
 	return true;
@@ -167,6 +184,22 @@ ipv6:
 EXPORT_SYMBOL(skb_flow_dissect);
 
 static u32 hashrnd __read_mostly;
+static __always_inline void __flow_hash_secret_init(void)
+{
+	net_get_random_once(&hashrnd, sizeof(hashrnd));
+}
+
+static __always_inline u32 __flow_hash_3words(u32 a, u32 b, u32 c)
+{
+	__flow_hash_secret_init();
+	return jhash_3words(a, b, c, hashrnd);
+}
+
+static __always_inline u32 __flow_hash_1word(u32 a)
+{
+	__flow_hash_secret_init();
+	return jhash_1word(a, hashrnd);
+}
 
 /*
  * __skb_get_rxhash: calculate a flow hash based on src/dst addresses
@@ -193,9 +226,9 @@ void __skb_get_rxhash(struct sk_buff *skb)
 		swap(keys.port16[0], keys.port16[1]);
 	}
 
-	hash = jhash_3words((__force u32)keys.dst,
-			    (__force u32)keys.src,
-			    (__force u32)keys.ports, hashrnd);
+	hash = __flow_hash_3words((__force u32)keys.dst,
+				  (__force u32)keys.src,
+				  (__force u32)keys.ports);
 	if (!hash)
 		hash = 1;
 
@@ -231,7 +264,7 @@ u16 __skb_tx_hash(const struct net_device *dev, const struct sk_buff *skb,
 		hash = skb->sk->sk_hash;
 	else
 		hash = (__force u16) skb->protocol;
-	hash = jhash_1word(hash, hashrnd);
+	hash = __flow_hash_1word(hash);
 
 	return (u16) (((u64) hash * qcount) >> 32) + qoffset;
 }
@@ -323,7 +356,7 @@ static inline int get_xps_queue(struct net_device *dev, struct sk_buff *skb)
 			else
 				hash = (__force u16) skb->protocol ^
 				    skb->rxhash;
-			hash = jhash_1word(hash, hashrnd);
+			hash = __flow_hash_1word(hash);
 			queue_index = map->queues[
 			    ((u64)hash * map->len) >> 32];
 		}
@@ -378,11 +411,3 @@ struct netdev_queue *netdev_pick_tx(struct net_device *dev,
 	skb_set_queue_mapping(skb, queue_index);
 	return netdev_get_tx_queue(dev, queue_index);
 }
-
-static int __init initialize_hashrnd(void)
-{
-	get_random_bytes(&hashrnd, sizeof(hashrnd));
-	return 0;
-}
-
-late_initcall_sync(initialize_hashrnd);
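skb_flow_get_ports() factors the port extraction out of skb_flow_dissect() so other callers can reuse it once they know the transport offset and L4 protocol. A minimal sketch of such a caller; example_dest_port is hypothetical, and the prototype is assumed to live alongside struct flow_keys:

#include <linux/skbuff.h>
#include <net/flow_keys.h>	/* assumed location of the new prototype */

static u16 example_dest_port(const struct sk_buff *skb, int thoff, u8 ip_proto)
{
	__be32 ports = skb_flow_get_ports(skb, thoff, ip_proto);

	/* The two 16-bit ports are packed source-first, so the destination
	 * port is the low half after byte-order conversion; 0 means the
	 * protocol carries no ports or the header was too short.
	 */
	return (u16)(ntohl(ports) & 0xffff);
}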
diff --git a/net/core/iovec.c b/net/core/iovec.c
index b77eeecc0011..4cdb7c48dad6 100644
--- a/net/core/iovec.c
+++ b/net/core/iovec.c
@@ -100,7 +100,7 @@ int memcpy_toiovecend(const struct iovec *iov, unsigned char *kdata,
 EXPORT_SYMBOL(memcpy_toiovecend);
 
 /*
- *	Copy iovec from kernel. Returns -EFAULT on error.
+ *	Copy iovec to kernel. Returns -EFAULT on error.
  */
 
 int memcpy_fromiovecend(unsigned char *kdata, const struct iovec *iov,
diff --git a/net/core/neighbour.c b/net/core/neighbour.c
index 6072610a8672..ca15f32821fb 100644
--- a/net/core/neighbour.c
+++ b/net/core/neighbour.c
@@ -867,7 +867,7 @@ static void neigh_invalidate(struct neighbour *neigh)
 static void neigh_probe(struct neighbour *neigh)
 	__releases(neigh->lock)
 {
-	struct sk_buff *skb = skb_peek(&neigh->arp_queue);
+	struct sk_buff *skb = skb_peek_tail(&neigh->arp_queue);
 	/* keep skb alive even if arp_queue overflows */
 	if (skb)
 		skb = skb_copy(skb, GFP_ATOMIC);
diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
index 325dee863e46..f3edf9635e02 100644
--- a/net/core/net-sysfs.c
+++ b/net/core/net-sysfs.c
@@ -1263,7 +1263,7 @@ static void netdev_release(struct device *d)
 	BUG_ON(dev->reg_state != NETREG_RELEASED);
 
 	kfree(dev->ifalias);
-	kfree((char *)dev - dev->padded);
+	netdev_freemem(dev);
 }
 
 static const void *net_namespace(struct device *d)
diff --git a/net/core/netprio_cgroup.c b/net/core/netprio_cgroup.c
index d9cd627e6a16..9b7cf6c85f82 100644
--- a/net/core/netprio_cgroup.c
+++ b/net/core/netprio_cgroup.c
@@ -222,11 +222,10 @@ static void net_prio_attach(struct cgroup_subsys_state *css,
 			    struct cgroup_taskset *tset)
 {
 	struct task_struct *p;
-	void *v;
+	void *v = (void *)(unsigned long)css->cgroup->id;
 
 	cgroup_taskset_for_each(p, css, tset) {
 		task_lock(p);
-		v = (void *)(unsigned long)task_netprioidx(p);
 		iterate_fd(p->files, 0, update_netprio, v);
 		task_unlock(p);
 	}
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 2a0e21de3060..cf67144d3e3c 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -1647,9 +1647,8 @@ int rtnl_configure_link(struct net_device *dev, const struct ifinfomsg *ifm)
 	}
 
 	dev->rtnl_link_state = RTNL_LINK_INITIALIZED;
-	rtmsg_ifinfo(RTM_NEWLINK, dev, ~0U);
 
-	__dev_notify_flags(dev, old_flags);
+	__dev_notify_flags(dev, old_flags, ~0U);
 	return 0;
 }
 EXPORT_SYMBOL(rtnl_configure_link);
@@ -1985,14 +1984,15 @@ static int rtnl_dump_all(struct sk_buff *skb, struct netlink_callback *cb)
 	return skb->len;
 }
 
-void rtmsg_ifinfo(int type, struct net_device *dev, unsigned int change)
+void rtmsg_ifinfo(int type, struct net_device *dev, unsigned int change,
+		  gfp_t flags)
 {
 	struct net *net = dev_net(dev);
 	struct sk_buff *skb;
 	int err = -ENOBUFS;
 	size_t if_info_size;
 
-	skb = nlmsg_new((if_info_size = if_nlmsg_size(dev, 0)), GFP_KERNEL);
+	skb = nlmsg_new((if_info_size = if_nlmsg_size(dev, 0)), flags);
 	if (skb == NULL)
 		goto errout;
 
@@ -2003,7 +2003,7 @@ void rtmsg_ifinfo(int type, struct net_device *dev, unsigned int change)
 		kfree_skb(skb);
 		goto errout;
 	}
-	rtnl_notify(skb, net, 0, RTNLGRP_LINK, NULL, GFP_KERNEL);
+	rtnl_notify(skb, net, 0, RTNLGRP_LINK, NULL, flags);
 	return;
 errout:
 	if (err < 0)
@@ -2717,7 +2717,7 @@ static int rtnetlink_event(struct notifier_block *this, unsigned long event, voi
 	case NETDEV_JOIN:
 		break;
 	default:
-		rtmsg_ifinfo(RTM_NEWLINK, dev, 0);
+		rtmsg_ifinfo(RTM_NEWLINK, dev, 0, GFP_KERNEL);
 		break;
 	}
 	return NOTIFY_DONE;
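The new gfp_t argument lets rtmsg_ifinfo() be called from contexts where GFP_KERNEL allocations are not allowed. A hedged sketch of such a caller; example_notify_from_atomic is a hypothetical name:

#include <linux/netdevice.h>
#include <linux/rtnetlink.h>

static void example_notify_from_atomic(struct net_device *dev)
{
	/* Caller may hold a spinlock, so request an atomic allocation for
	 * the RTM_NEWLINK notification rather than the previously
	 * hard-coded GFP_KERNEL.
	 */
	rtmsg_ifinfo(RTM_NEWLINK, dev, ~0U, GFP_ATOMIC);
}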
diff --git a/net/core/secure_seq.c b/net/core/secure_seq.c
index 8d9d05edd2eb..897da56f3aff 100644
--- a/net/core/secure_seq.c
+++ b/net/core/secure_seq.c
@@ -7,6 +7,7 @@
 #include <linux/hrtimer.h>
 #include <linux/ktime.h>
 #include <linux/string.h>
+#include <linux/net.h>
 
 #include <net/secure_seq.h>
 
@@ -15,20 +16,9 @@
 
 static u32 net_secret[NET_SECRET_SIZE] ____cacheline_aligned;
 
-static void net_secret_init(void)
+static __always_inline void net_secret_init(void)
 {
-	u32 tmp;
-	int i;
-
-	if (likely(net_secret[0]))
-		return;
-
-	for (i = NET_SECRET_SIZE; i > 0;) {
-		do {
-			get_random_bytes(&tmp, sizeof(tmp));
-		} while (!tmp);
-		cmpxchg(&net_secret[--i], 0, tmp);
-	}
+	net_get_random_once(net_secret, sizeof(net_secret));
 }
 #endif
 
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index d81cff119f73..8cec1e6b844d 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -476,6 +476,18 @@ void skb_add_rx_frag(struct sk_buff *skb, int i, struct page *page, int off,
 }
 EXPORT_SYMBOL(skb_add_rx_frag);
 
+void skb_coalesce_rx_frag(struct sk_buff *skb, int i, int size,
+			  unsigned int truesize)
+{
+	skb_frag_t *frag = &skb_shinfo(skb)->frags[i];
+
+	skb_frag_size_add(frag, size);
+	skb->len += size;
+	skb->data_len += size;
+	skb->truesize += truesize;
+}
+EXPORT_SYMBOL(skb_coalesce_rx_frag);
+
 static void skb_drop_list(struct sk_buff **listp)
 {
 	kfree_skb_list(*listp);
@@ -580,9 +592,6 @@ static void skb_release_head_state(struct sk_buff *skb)
 #if IS_ENABLED(CONFIG_NF_CONNTRACK)
 	nf_conntrack_put(skb->nfct);
 #endif
-#ifdef NET_SKBUFF_NF_DEFRAG_NEEDED
-	nf_conntrack_put_reasm(skb->nfct_reasm);
-#endif
 #ifdef CONFIG_BRIDGE_NETFILTER
 	nf_bridge_put(skb->nf_bridge);
 #endif
@@ -903,6 +912,9 @@ EXPORT_SYMBOL(skb_clone);
 
 static void skb_headers_offset_update(struct sk_buff *skb, int off)
 {
+	/* Only adjust this if it actually is csum_start rather than csum */
+	if (skb->ip_summed == CHECKSUM_PARTIAL)
+		skb->csum_start += off;
 	/* {transport,network,mac}_header and tail are relative to skb->head */
 	skb->transport_header += off;
 	skb->network_header += off;
@@ -1036,8 +1048,8 @@ EXPORT_SYMBOL(__pskb_copy);
  * @ntail: room to add at tail
  * @gfp_mask: allocation priority
  *
- * Expands (or creates identical copy, if &nhead and &ntail are zero)
- * header of skb. &sk_buff itself is not changed. &sk_buff MUST have
+ * Expands (or creates identical copy, if @nhead and @ntail are zero)
+ * header of @skb. &sk_buff itself is not changed. &sk_buff MUST have
  * reference count of 1. Returns zero in the case of success or error,
  * if expansion failed. In the last case, &sk_buff is not changed.
  *
@@ -1109,9 +1121,6 @@ int pskb_expand_head(struct sk_buff *skb, int nhead, int ntail,
 #endif
 	skb->tail += off;
 	skb_headers_offset_update(skb, nhead);
-	/* Only adjust this if it actually is csum_start rather than csum */
-	if (skb->ip_summed == CHECKSUM_PARTIAL)
-		skb->csum_start += nhead;
 	skb->cloned = 0;
 	skb->hdr_len = 0;
 	skb->nohdr = 0;
@@ -1176,7 +1185,6 @@ struct sk_buff *skb_copy_expand(const struct sk_buff *skb,
 				    NUMA_NO_NODE);
 	int oldheadroom = skb_headroom(skb);
 	int head_copy_len, head_copy_off;
-	int off;
 
 	if (!n)
 		return NULL;
@@ -1200,11 +1208,7 @@ struct sk_buff *skb_copy_expand(const struct sk_buff *skb,
 
 	copy_skb_header(n, skb);
 
-	off = newheadroom - oldheadroom;
-	if (n->ip_summed == CHECKSUM_PARTIAL)
-		n->csum_start += off;
-
-	skb_headers_offset_update(n, off);
+	skb_headers_offset_update(n, newheadroom - oldheadroom);
 
 	return n;
 }
@@ -1257,6 +1261,29 @@ free_skb:
 EXPORT_SYMBOL(skb_pad);
 
 /**
+ * pskb_put - add data to the tail of a potentially fragmented buffer
+ * @skb: start of the buffer to use
+ * @tail: tail fragment of the buffer to use
+ * @len: amount of data to add
+ *
+ * This function extends the used data area of the potentially
+ * fragmented buffer. @tail must be the last fragment of @skb -- or
+ * @skb itself. If this would exceed the total buffer size the kernel
+ * will panic. A pointer to the first byte of the extra data is
+ * returned.
+ */
+
+unsigned char *pskb_put(struct sk_buff *skb, struct sk_buff *tail, int len)
+{
+	if (tail != skb) {
+		skb->data_len += len;
+		skb->len += len;
+	}
+	return skb_put(tail, len);
+}
+EXPORT_SYMBOL_GPL(pskb_put);
+
+/**
  * skb_put - add data to a buffer
  * @skb: buffer to use
  * @len: amount of data to add
@@ -1933,9 +1960,8 @@ fault:
 EXPORT_SYMBOL(skb_store_bits);
 
 /* Checksum skb data. */
-
-__wsum skb_checksum(const struct sk_buff *skb, int offset,
-		    int len, __wsum csum)
+__wsum __skb_checksum(const struct sk_buff *skb, int offset, int len,
+		      __wsum csum, const struct skb_checksum_ops *ops)
 {
 	int start = skb_headlen(skb);
 	int i, copy = start - offset;
@@ -1946,7 +1972,7 @@ __wsum skb_checksum(const struct sk_buff *skb, int offset,
 	if (copy > 0) {
 		if (copy > len)
 			copy = len;
-		csum = csum_partial(skb->data + offset, copy, csum);
+		csum = ops->update(skb->data + offset, copy, csum);
 		if ((len -= copy) == 0)
 			return csum;
 		offset += copy;
@@ -1967,10 +1993,10 @@ __wsum skb_checksum(const struct sk_buff *skb, int offset,
 			if (copy > len)
 				copy = len;
 			vaddr = kmap_atomic(skb_frag_page(frag));
-			csum2 = csum_partial(vaddr + frag->page_offset +
-					     offset - start, copy, 0);
+			csum2 = ops->update(vaddr + frag->page_offset +
+					    offset - start, copy, 0);
 			kunmap_atomic(vaddr);
-			csum = csum_block_add(csum, csum2, pos);
+			csum = ops->combine(csum, csum2, pos, copy);
 			if (!(len -= copy))
 				return csum;
 			offset += copy;
@@ -1989,9 +2015,9 @@ __wsum skb_checksum(const struct sk_buff *skb, int offset,
 			__wsum csum2;
 			if (copy > len)
 				copy = len;
-			csum2 = skb_checksum(frag_iter, offset - start,
-					     copy, 0);
-			csum = csum_block_add(csum, csum2, pos);
+			csum2 = __skb_checksum(frag_iter, offset - start,
+					       copy, 0, ops);
+			csum = ops->combine(csum, csum2, pos, copy);
 			if ((len -= copy) == 0)
 				return csum;
 			offset += copy;
@@ -2003,6 +2029,18 @@ __wsum skb_checksum(const struct sk_buff *skb, int offset,
 
 	return csum;
 }
+EXPORT_SYMBOL(__skb_checksum);
+
+__wsum skb_checksum(const struct sk_buff *skb, int offset,
+		    int len, __wsum csum)
+{
+	const struct skb_checksum_ops ops = {
+		.update  = csum_partial_ext,
+		.combine = csum_block_add_ext,
+	};
+
+	return __skb_checksum(skb, offset, len, csum, &ops);
+}
 EXPORT_SYMBOL(skb_checksum);
 
 /* Both of above in one bottle. */
@@ -2522,14 +2560,14 @@ EXPORT_SYMBOL(skb_prepare_seq_read);
  * @data: destination pointer for data to be returned
  * @st: state variable
  *
- * Reads a block of skb data at &consumed relative to the
+ * Reads a block of skb data at @consumed relative to the
  * lower offset specified to skb_prepare_seq_read(). Assigns
- * the head of the data block to &data and returns the length
+ * the head of the data block to @data and returns the length
  * of the block or 0 if the end of the skb data or the upper
  * offset has been reached.
  *
  * The caller is not required to consume all of the data
- * returned, i.e. &consumed is typically set to the number
+ * returned, i.e. @consumed is typically set to the number
  * of bytes already consumed and the next call to
  * skb_seq_read() will return the remaining part of the block.
  *
@@ -2837,14 +2875,7 @@ struct sk_buff *skb_segment(struct sk_buff *skb, netdev_features_t features)
 		__copy_skb_header(nskb, skb);
 		nskb->mac_len = skb->mac_len;
 
-		/* nskb and skb might have different headroom */
-		if (nskb->ip_summed == CHECKSUM_PARTIAL)
-			nskb->csum_start += skb_headroom(nskb) - headroom;
-
-		skb_reset_mac_header(nskb);
-		skb_set_network_header(nskb, skb->mac_len);
-		nskb->transport_header = (nskb->network_header +
-					  skb_network_header_len(skb));
+		skb_headers_offset_update(nskb, skb_headroom(nskb) - headroom);
 
 		skb_copy_from_linear_data_offset(skb, -tnl_hlen,
 						 nskb->data - tnl_hlen,
@@ -2936,32 +2967,30 @@ EXPORT_SYMBOL_GPL(skb_segment);
 
 int skb_gro_receive(struct sk_buff **head, struct sk_buff *skb)
 {
-	struct sk_buff *p = *head;
-	struct sk_buff *nskb;
-	struct skb_shared_info *skbinfo = skb_shinfo(skb);
-	struct skb_shared_info *pinfo = skb_shinfo(p);
-	unsigned int headroom;
-	unsigned int len = skb_gro_len(skb);
+	struct skb_shared_info *pinfo, *skbinfo = skb_shinfo(skb);
 	unsigned int offset = skb_gro_offset(skb);
 	unsigned int headlen = skb_headlen(skb);
+	struct sk_buff *nskb, *lp, *p = *head;
+	unsigned int len = skb_gro_len(skb);
 	unsigned int delta_truesize;
+	unsigned int headroom;
 
-	if (p->len + len >= 65536)
+	if (unlikely(p->len + len >= 65536))
 		return -E2BIG;
 
-	if (pinfo->frag_list)
-		goto merge;
-	else if (headlen <= offset) {
+	lp = NAPI_GRO_CB(p)->last ?: p;
+	pinfo = skb_shinfo(lp);
+
+	if (headlen <= offset) {
 		skb_frag_t *frag;
 		skb_frag_t *frag2;
 		int i = skbinfo->nr_frags;
 		int nr_frags = pinfo->nr_frags + i;
 
-		offset -= headlen;
-
 		if (nr_frags > MAX_SKB_FRAGS)
-			return -E2BIG;
+			goto merge;
 
+		offset -= headlen;
 		pinfo->nr_frags = nr_frags;
 		skbinfo->nr_frags = 0;
 
@@ -2992,7 +3021,7 @@ int skb_gro_receive(struct sk_buff **head, struct sk_buff *skb)
 		unsigned int first_offset;
 
 		if (nr_frags + 1 + skbinfo->nr_frags > MAX_SKB_FRAGS)
-			return -E2BIG;
+			goto merge;
 
 		first_offset = skb->data -
 			       (unsigned char *)page_address(page) +
@@ -3010,7 +3039,10 @@ int skb_gro_receive(struct sk_buff **head, struct sk_buff *skb)
 		delta_truesize = skb->truesize - SKB_DATA_ALIGN(sizeof(struct sk_buff));
 		NAPI_GRO_CB(skb)->free = NAPI_GRO_FREE_STOLEN_HEAD;
 		goto done;
-	} else if (skb_gro_len(p) != pinfo->gso_size)
+	}
+	if (pinfo->frag_list)
+		goto merge;
+	if (skb_gro_len(p) != pinfo->gso_size)
 		return -E2BIG;
 
 	headroom = skb_headroom(p);
@@ -3062,16 +3094,24 @@ merge:
 
 	__skb_pull(skb, offset);
 
-	NAPI_GRO_CB(p)->last->next = skb;
+	if (!NAPI_GRO_CB(p)->last)
+		skb_shinfo(p)->frag_list = skb;
+	else
+		NAPI_GRO_CB(p)->last->next = skb;
 	NAPI_GRO_CB(p)->last = skb;
 	skb_header_release(skb);
+	lp = p;
 
 done:
 	NAPI_GRO_CB(p)->count++;
 	p->data_len += len;
 	p->truesize += delta_truesize;
 	p->len += len;
-
+	if (lp != p) {
+		lp->data_len += len;
+		lp->truesize += delta_truesize;
+		lp->len += len;
+	}
 	NAPI_GRO_CB(skb)->same_flow = 1;
 	return 0;
 }
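__skb_checksum() above parameterizes the per-chunk update and combine steps through struct skb_checksum_ops, so alternative checksums (for example CRC32c for SCTP) can reuse the same walk over linear data, page frags and the frag list. A minimal sketch that simply wires in the default helpers, which makes it behave like skb_checksum(); example_ops and example_checksum are hypothetical names:

#include <linux/skbuff.h>
#include <net/checksum.h>

static const struct skb_checksum_ops example_ops = {
	.update  = csum_partial_ext,
	.combine = csum_block_add_ext,
};

/* Checksum len bytes of skb starting at offset via the ops table. */
static __wsum example_checksum(const struct sk_buff *skb, int offset, int len)
{
	return __skb_checksum(skb, offset, len, 0, &example_ops);
}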
diff --git a/net/core/sock.c b/net/core/sock.c
index 0b39e7ae4383..ab20ed9b0f31 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -475,12 +475,6 @@ discard_and_relse:
 }
 EXPORT_SYMBOL(sk_receive_skb);
 
-void sk_reset_txq(struct sock *sk)
-{
-	sk_tx_queue_clear(sk);
-}
-EXPORT_SYMBOL(sk_reset_txq);
-
 struct dst_entry *__sk_dst_check(struct sock *sk, u32 cookie)
 {
 	struct dst_entry *dst = __sk_dst_get(sk);
@@ -914,6 +908,13 @@ set_rcvbuf:
 		}
 		break;
 #endif
+
+	case SO_MAX_PACING_RATE:
+		sk->sk_max_pacing_rate = val;
+		sk->sk_pacing_rate = min(sk->sk_pacing_rate,
+					 sk->sk_max_pacing_rate);
+		break;
+
 	default:
 		ret = -ENOPROTOOPT;
 		break;
@@ -1177,6 +1178,10 @@ int sock_getsockopt(struct socket *sock, int level, int optname,
 		break;
 #endif
 
+	case SO_MAX_PACING_RATE:
+		v.val = sk->sk_max_pacing_rate;
+		break;
+
 	default:
 		return -ENOPROTOOPT;
 	}
@@ -1836,7 +1841,17 @@ EXPORT_SYMBOL(sock_alloc_send_skb);
 /* On 32bit arches, an skb frag is limited to 2^15 */
 #define SKB_FRAG_PAGE_ORDER get_order(32768)
 
-bool sk_page_frag_refill(struct sock *sk, struct page_frag *pfrag)
+/**
+ * skb_page_frag_refill - check that a page_frag contains enough room
+ * @sz: minimum size of the fragment we want to get
+ * @pfrag: pointer to page_frag
+ * @prio: priority for memory allocation
+ *
+ * Note: While this allocator tries to use high order pages, there is
+ * no guarantee that allocations succeed. Therefore, @sz MUST be
+ * less or equal than PAGE_SIZE.
+ */
+bool skb_page_frag_refill(unsigned int sz, struct page_frag *pfrag, gfp_t prio)
 {
 	int order;
 
@@ -1845,16 +1860,16 @@ bool sk_page_frag_refill(struct sock *sk, struct page_frag *pfrag)
 			pfrag->offset = 0;
 			return true;
 		}
-		if (pfrag->offset < pfrag->size)
+		if (pfrag->offset + sz <= pfrag->size)
 			return true;
 		put_page(pfrag->page);
 	}
 
 	/* We restrict high order allocations to users that can afford to wait */
-	order = (sk->sk_allocation & __GFP_WAIT) ? SKB_FRAG_PAGE_ORDER : 0;
+	order = (prio & __GFP_WAIT) ? SKB_FRAG_PAGE_ORDER : 0;
 
 	do {
-		gfp_t gfp = sk->sk_allocation;
+		gfp_t gfp = prio;
 
 		if (order)
 			gfp |= __GFP_COMP | __GFP_NOWARN;
@@ -1866,6 +1881,15 @@ bool sk_page_frag_refill(struct sock *sk, struct page_frag *pfrag)
 		}
 	} while (--order >= 0);
 
+	return false;
+}
+EXPORT_SYMBOL(skb_page_frag_refill);
+
+bool sk_page_frag_refill(struct sock *sk, struct page_frag *pfrag)
+{
+	if (likely(skb_page_frag_refill(32U, pfrag, sk->sk_allocation)))
+		return true;
+
 	sk_enter_memory_pressure(sk);
 	sk_stream_moderate_sndbuf(sk);
 	return false;
@@ -2319,6 +2343,7 @@ void sock_init_data(struct socket *sock, struct sock *sk)
 	sk->sk_ll_usec = sysctl_net_busy_read;
 #endif
 
+	sk->sk_max_pacing_rate = ~0U;
 	sk->sk_pacing_rate = ~0U;
 	/*
 	 * Before updating sk_refcnt, we must commit prior changes to memory
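From userspace, the new SO_MAX_PACING_RATE option caps the pacing rate the kernel computes for a flow, in bytes per second. A small, hedged example; the fallback value 47 is the asm-generic definition assumed by this series, and older libc headers may not define the constant at all:

#include <stdio.h>
#include <sys/socket.h>

#ifndef SO_MAX_PACING_RATE
#define SO_MAX_PACING_RATE 47	/* assumed asm-generic value */
#endif

int main(void)
{
	int fd = socket(AF_INET, SOCK_STREAM, 0);
	unsigned int rate = 1000000;	/* cap pacing at ~1 MB/s */

	if (fd < 0 || setsockopt(fd, SOL_SOCKET, SO_MAX_PACING_RATE,
				 &rate, sizeof(rate)) < 0)
		perror("SO_MAX_PACING_RATE");
	return 0;
}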
diff --git a/net/core/utils.c b/net/core/utils.c
index aa88e23fc87a..2f737bf90b3f 100644
--- a/net/core/utils.c
+++ b/net/core/utils.c
@@ -338,3 +338,52 @@ void inet_proto_csum_replace16(__sum16 *sum, struct sk_buff *skb,
 						  csum_unfold(*sum)));
 }
 EXPORT_SYMBOL(inet_proto_csum_replace16);
+
+struct __net_random_once_work {
+	struct work_struct work;
+	struct static_key *key;
+};
+
+static void __net_random_once_deferred(struct work_struct *w)
+{
+	struct __net_random_once_work *work =
+		container_of(w, struct __net_random_once_work, work);
+	if (!static_key_enabled(work->key))
+		static_key_slow_inc(work->key);
+	kfree(work);
+}
+
+static void __net_random_once_disable_jump(struct static_key *key)
+{
+	struct __net_random_once_work *w;
+
+	w = kmalloc(sizeof(*w), GFP_ATOMIC);
+	if (!w)
+		return;
+
+	INIT_WORK(&w->work, __net_random_once_deferred);
+	w->key = key;
+	schedule_work(&w->work);
+}
+
+bool __net_get_random_once(void *buf, int nbytes, bool *done,
+			   struct static_key *done_key)
+{
+	static DEFINE_SPINLOCK(lock);
+	unsigned long flags;
+
+	spin_lock_irqsave(&lock, flags);
+	if (*done) {
+		spin_unlock_irqrestore(&lock, flags);
+		return false;
+	}
+
+	get_random_bytes(buf, nbytes);
+	*done = true;
+	spin_unlock_irqrestore(&lock, flags);
+
+	__net_random_once_disable_jump(done_key);
+
+	return true;
+}
+EXPORT_SYMBOL(__net_get_random_once);
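__net_get_random_once() is the slow path behind the net_get_random_once() macro used in the flow dissector and secure_seq hunks above: the first caller seeds the buffer, then a static key is flipped from a workqueue so later calls skip the seeding branch entirely. A minimal usage sketch, assuming the macro lives in linux/net.h as in this series; example_secret and example_hash are hypothetical:

#include <linux/jhash.h>
#include <linux/net.h>		/* assumed home of net_get_random_once() */

static u32 example_secret __read_mostly;

static u32 example_hash(u32 val)
{
	/* Seeds example_secret exactly once per boot, lazily, instead of
	 * from an initcall that may run before enough entropy is available.
	 */
	net_get_random_once(&example_secret, sizeof(example_secret));
	return jhash_1word(val, example_secret);
}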