path: root/net
author    Linus Torvalds <torvalds@linux-foundation.org>  2014-10-08 21:40:54 -0400
committer Linus Torvalds <torvalds@linux-foundation.org>  2014-10-08 21:40:54 -0400
commit    35a9ad8af0bb0fa3525e6d0d20e32551d226f38e (patch)
tree      15b4b33206818886d9cff371fd2163e073b70568 /net
parent    d5935b07da53f74726e2a65dd4281d0f2c70e5d4 (diff)
parent    64b1f00a0830e1c53874067273a096b228d83d36 (diff)
Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
Pull networking updates from David Miller:
 "Most notable changes in here:

  1) By far the biggest accomplishment, thanks to a large range of
     contributors, is the addition of multi-send for transmit. This is
     the result of discussions back in Chicago, and the hard work of
     several individuals.

     Now, when the ->ndo_start_xmit() method of a driver sees
     skb->xmit_more as true, it can choose to defer the doorbell
     telling the driver to start processing the new TX queue entries.

     skb->xmit_more means that the generic networking is guaranteed to
     call the driver immediately with another SKB to send.

     There is logic added to the qdisc layer to dequeue multiple
     packets at a time, and the handling of mis-predicted offloads in
     software is now done with no locks held.

     Finally, pktgen is extended to have a "burst" parameter that can
     be used to test a multi-send implementation.

     Several drivers have xmit_more support: i40e, igb, ixgbe, mlx4,
     virtio_net. Adding support is almost trivial, so expect more
     drivers to support this optimization soon. (A short, hypothetical
     driver-side sketch of this pattern follows the shortlog below.)

     I want to thank, in no particular or implied order, Jesper
     Dangaard Brouer, Eric Dumazet, Alexander Duyck, Tom Herbert,
     Jamal Hadi Salim, John Fastabend, Florian Westphal, Daniel
     Borkmann, David Tat, Hannes Frederic Sowa, and Rusty Russell.

  2) PTP and timestamping support in bnx2x, from Michal Kalderon.

  3) Allow adjusting the rx_copybreak threshold for a driver via
     ethtool, and add rx_copybreak support to the enic driver. From
     Govindarajulu Varadarajan.

  4) Significant enhancements to the generic PHY layer and the bcm7xxx
     driver in particular (EEE support, auto power down, etc.) from
     Florian Fainelli.

  5) Allow raw buffers to be used for flow dissection, allowing
     drivers to determine the optimal "linear pull" size for devices
     that DMA into pools of pages. The objective is to get exactly the
     necessary amount of headers into the linear SKB area pre-pulled,
     but no more. The new interface drivers use is eth_get_headlen().
     From WANG Cong, with driver conversions (several had their own
     by-hand duplicated implementations) by Alexander Duyck and Eric
     Dumazet.

  6) Support checksumming more smoothly and efficiently for
     encapsulations, and add a "foo over UDP" facility. From Tom
     Herbert.

  7) Add Broadcom SF2 switch driver to the DSA layer, from Florian
     Fainelli.

  8) eBPF can now load programs via a system call and has an extensive
     testsuite. From Alexei Starovoitov and Daniel Borkmann.

  9) Major overhaul of the packet scheduler to use RCU in several
     major areas such as the classifiers and rate estimators. From
     John Fastabend.

 10) Add driver for Intel FM10000 Ethernet Switch, from Alexander
     Duyck.

 11) Rearrange TCP_SKB_CB() to reduce cache line misses, from Eric
     Dumazet.

 12) Add Datacenter TCP congestion control algorithm support, from
     Florian Westphal.

 13) Reorganize sk_buff so that __copy_skb_header() is significantly
     faster. From Eric Dumazet"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1558 commits)
  netlabel: directly return netlbl_unlabel_genl_init()
  net: add netdev_txq_bql_{enqueue, complete}_prefetchw() helpers
  net: description of dma_cookie cause make xmldocs warning
  cxgb4: clean up a type issue
  cxgb4: potential shift wrapping bug
  i40e: skb->xmit_more support
  net: fs_enet: Add NAPI TX
  net: fs_enet: Remove non NAPI RX
  r8169: add support for RTL8168EP
  net_sched: copy exts->type in tcf_exts_change()
  wimax: convert printk to pr_foo()
  af_unix: remove 0 assignment on static
  ipv6: Do not warn for informational ICMP messages, regardless of type.
  Update Intel Ethernet Driver maintainers list
  bridge: Save frag_max_size between PRE_ROUTING and POST_ROUTING
  tipc: fix bug in multicast congestion handling
  net: better IFF_XMIT_DST_RELEASE support
  net/mlx4_en: remove NETDEV_TX_BUSY
  3c59x: fix bad split of cpu_to_le32(pci_map_single())
  net: bcmgenet: fix Tx ring priority programming
  ...
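The xmit_more mechanism described in item 1 above comes down to a simple pattern in a driver's ->ndo_start_xmit() routine: post the descriptor, but skip the doorbell write while the stack promises another packet is coming. The sketch below is hypothetical and not taken from this merge; struct foo_tx_ring, its fields and the foo_* helpers are invented placeholders, while skb->xmit_more, netif_xmit_stopped(), netdev_get_tx_queue() and writel() are the real kernel interfaces involved.

	/* Hypothetical driver sketch of the xmit_more doorbell-deferral pattern. */
	static netdev_tx_t foo_start_xmit(struct sk_buff *skb, struct net_device *dev)
	{
		struct foo_tx_ring *ring = foo_select_tx_ring(dev, skb);	/* placeholder helper */

		foo_post_tx_descriptor(ring, skb);	/* queue the frame in the TX ring */

		/* Defer the (slow, uncached MMIO) doorbell write while the core
		 * guarantees another skb follows immediately; always flush when
		 * the queue has been stopped.
		 */
		if (!skb->xmit_more ||
		    netif_xmit_stopped(netdev_get_tx_queue(dev, ring->queue_index)))
			writel(ring->next_to_use, ring->doorbell_reg);

		return NETDEV_TX_OK;
	}

The drivers listed in item 1 gate their existing tail/doorbell update on a check of essentially this form.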
Diffstat (limited to 'net')
-rw-r--r--net/8021q/vlan_dev.c3
-rw-r--r--net/Kconfig7
-rw-r--r--net/atm/clip.c6
-rw-r--r--net/atm/common.c2
-rw-r--r--net/atm/mpc.c2
-rw-r--r--net/bluetooth/6lowpan.c229
-rw-r--r--net/bluetooth/af_bluetooth.c3
-rw-r--r--net/bluetooth/amp.c13
-rw-r--r--net/bluetooth/hci_conn.c92
-rw-r--r--net/bluetooth/hci_core.c122
-rw-r--r--net/bluetooth/hci_event.c44
-rw-r--r--net/bluetooth/hidp/core.c10
-rw-r--r--net/bluetooth/l2cap_core.c406
-rw-r--r--net/bluetooth/l2cap_sock.c23
-rw-r--r--net/bluetooth/lib.c14
-rw-r--r--net/bluetooth/mgmt.c161
-rw-r--r--net/bluetooth/smp.c903
-rw-r--r--net/bluetooth/smp.h20
-rw-r--r--net/bridge/Makefile4
-rw-r--r--net/bridge/br.c14
-rw-r--r--net/bridge/br_device.c12
-rw-r--r--net/bridge/br_forward.c2
-rw-r--r--net/bridge/br_if.c20
-rw-r--r--net/bridge/br_input.c1
-rw-r--r--net/bridge/br_multicast.c2
-rw-r--r--net/bridge/br_netfilter.c132
-rw-r--r--net/bridge/br_netlink.c116
-rw-r--r--net/bridge/br_nf_core.c96
-rw-r--r--net/bridge/br_private.h40
-rw-r--r--net/bridge/br_stp.c15
-rw-r--r--net/bridge/br_stp_if.c4
-rw-r--r--net/bridge/br_stp_timer.c4
-rw-r--r--net/bridge/br_sysfs_br.c21
-rw-r--r--net/bridge/br_vlan.c147
-rw-r--r--net/bridge/netfilter/ebtables.c15
-rw-r--r--net/bridge/netfilter/nf_tables_bridge.c2
-rw-r--r--net/bridge/netfilter/nft_reject_bridge.c95
-rw-r--r--net/core/dev.c459
-rw-r--r--net/core/dev_ioctl.c7
-rw-r--r--net/core/ethtool.c82
-rw-r--r--net/core/filter.c45
-rw-r--r--net/core/flow_dissector.c115
-rw-r--r--net/core/gen_estimator.c29
-rw-r--r--net/core/gen_stats.c112
-rw-r--r--net/core/net_namespace.c2
-rw-r--r--net/core/netpoll.c7
-rw-r--r--net/core/pktgen.c76
-rw-r--r--net/core/rtnetlink.c66
-rw-r--r--net/core/secure_seq.c6
-rw-r--r--net/core/skbuff.c395
-rw-r--r--net/core/sock.c110
-rw-r--r--net/core/timestamping.c43
-rw-r--r--net/core/utils.c12
-rw-r--r--net/dccp/ccid.c2
-rw-r--r--net/dccp/ipv6.c2
-rw-r--r--net/dccp/proto.c2
-rw-r--r--net/decnet/af_decnet.c3
-rw-r--r--net/decnet/dn_dev.c3
-rw-r--r--net/decnet/dn_timer.c3
-rw-r--r--net/dsa/Kconfig3
-rw-r--r--net/dsa/Makefile1
-rw-r--r--net/dsa/dsa.c186
-rw-r--r--net/dsa/dsa_priv.h29
-rw-r--r--net/dsa/slave.c304
-rw-r--r--net/dsa/tag_brcm.c171
-rw-r--r--net/dsa/tag_dsa.c9
-rw-r--r--net/dsa/tag_edsa.c9
-rw-r--r--net/dsa/tag_trailer.c9
-rw-r--r--net/ethernet/eth.c34
-rw-r--r--net/ieee802154/6lowpan_rtnl.c127
-rw-r--r--net/ieee802154/reassembly.c4
-rw-r--r--net/ipv4/Kconfig51
-rw-r--r--net/ipv4/Makefile3
-rw-r--r--net/ipv4/af_inet.c47
-rw-r--r--net/ipv4/ah4.c2
-rw-r--r--net/ipv4/arp.c6
-rw-r--r--net/ipv4/cipso_ipv4.c2
-rw-r--r--net/ipv4/fib_frontend.c14
-rw-r--r--net/ipv4/fib_semantics.c8
-rw-r--r--net/ipv4/fou.c514
-rw-r--r--net/ipv4/geneve.c373
-rw-r--r--net/ipv4/gre_demux.c9
-rw-r--r--net/ipv4/gre_offload.c55
-rw-r--r--net/ipv4/icmp.c64
-rw-r--r--net/ipv4/igmp.c35
-rw-r--r--net/ipv4/inet_hashtables.c2
-rw-r--r--net/ipv4/inetpeer.c21
-rw-r--r--net/ipv4/ip_fragment.c4
-rw-r--r--net/ipv4/ip_gre.c94
-rw-r--r--net/ipv4/ip_options.c6
-rw-r--r--net/ipv4/ip_output.c10
-rw-r--r--net/ipv4/ip_sockglue.c19
-rw-r--r--net/ipv4/ip_tunnel.c106
-rw-r--r--net/ipv4/ip_vti.c2
-rw-r--r--net/ipv4/ipconfig.c3
-rw-r--r--net/ipv4/ipip.c82
-rw-r--r--net/ipv4/netfilter/Kconfig39
-rw-r--r--net/ipv4/netfilter/Makefile5
-rw-r--r--net/ipv4/netfilter/ipt_CLUSTERIP.c2
-rw-r--r--net/ipv4/netfilter/ipt_MASQUERADE.c108
-rw-r--r--net/ipv4/netfilter/ipt_REJECT.c2
-rw-r--r--net/ipv4/netfilter/iptable_nat.c233
-rw-r--r--net/ipv4/netfilter/nf_defrag_ipv4.c2
-rw-r--r--net/ipv4/netfilter/nf_nat_l3proto_ipv4.c199
-rw-r--r--net/ipv4/netfilter/nf_nat_masquerade_ipv4.c153
-rw-r--r--net/ipv4/netfilter/nf_reject_ipv4.c127
-rw-r--r--net/ipv4/netfilter/nft_chain_nat_ipv4.c157
-rw-r--r--net/ipv4/netfilter/nft_masq_ipv4.c77
-rw-r--r--net/ipv4/netfilter/nft_reject_ipv4.c1
-rw-r--r--net/ipv4/ping.c2
-rw-r--r--net/ipv4/protocol.c1
-rw-r--r--net/ipv4/route.c14
-rw-r--r--net/ipv4/syncookies.c2
-rw-r--r--net/ipv4/sysctl_net_ipv4.c40
-rw-r--r--net/ipv4/tcp.c36
-rw-r--r--net/ipv4/tcp_bic.c11
-rw-r--r--net/ipv4/tcp_cong.c55
-rw-r--r--net/ipv4/tcp_cubic.c18
-rw-r--r--net/ipv4/tcp_dctcp.c344
-rw-r--r--net/ipv4/tcp_diag.c5
-rw-r--r--net/ipv4/tcp_fastopen.c2
-rw-r--r--net/ipv4/tcp_highspeed.c145
-rw-r--r--net/ipv4/tcp_htcp.c6
-rw-r--r--net/ipv4/tcp_hybla.c1
-rw-r--r--net/ipv4/tcp_illinois.c3
-rw-r--r--net/ipv4/tcp_input.c285
-rw-r--r--net/ipv4/tcp_ipv4.c68
-rw-r--r--net/ipv4/tcp_minisocks.c13
-rw-r--r--net/ipv4/tcp_offload.c72
-rw-r--r--net/ipv4/tcp_output.c124
-rw-r--r--net/ipv4/tcp_probe.c6
-rw-r--r--net/ipv4/tcp_scalable.c2
-rw-r--r--net/ipv4/tcp_timer.c52
-rw-r--r--net/ipv4/tcp_vegas.c3
-rw-r--r--net/ipv4/tcp_veno.c1
-rw-r--r--net/ipv4/tcp_westwood.c35
-rw-r--r--net/ipv4/tcp_yeah.c9
-rw-r--r--net/ipv4/udp.c13
-rw-r--r--net/ipv4/udp_offload.c171
-rw-r--r--net/ipv4/udp_tunnel.c138
-rw-r--r--net/ipv6/Makefile4
-rw-r--r--net/ipv6/addrconf.c17
-rw-r--r--net/ipv6/af_inet6.c20
-rw-r--r--net/ipv6/ah6.c23
-rw-r--r--net/ipv6/anycast.c108
-rw-r--r--net/ipv6/datagram.c23
-rw-r--r--net/ipv6/esp6.c15
-rw-r--r--net/ipv6/exthdrs.c2
-rw-r--r--net/ipv6/icmp.c34
-rw-r--r--net/ipv6/inet6_connection_sock.c6
-rw-r--r--net/ipv6/inet6_hashtables.c7
-rw-r--r--net/ipv6/ip6_fib.c142
-rw-r--r--net/ipv6/ip6_flowlabel.c19
-rw-r--r--net/ipv6/ip6_gre.c14
-rw-r--r--net/ipv6/ip6_icmp.c2
-rw-r--r--net/ipv6/ip6_input.c6
-rw-r--r--net/ipv6/ip6_offload.c34
-rw-r--r--net/ipv6/ip6_output.c27
-rw-r--r--net/ipv6/ip6_tunnel.c34
-rw-r--r--net/ipv6/ip6_udp_tunnel.c107
-rw-r--r--net/ipv6/ip6_vti.c2
-rw-r--r--net/ipv6/ip6mr.c4
-rw-r--r--net/ipv6/ipcomp6.c6
-rw-r--r--net/ipv6/ipv6_sockglue.c26
-rw-r--r--net/ipv6/mcast.c302
-rw-r--r--net/ipv6/mip6.c10
-rw-r--r--net/ipv6/ndisc.c17
-rw-r--r--net/ipv6/netfilter/Kconfig43
-rw-r--r--net/ipv6/netfilter/Makefile5
-rw-r--r--net/ipv6/netfilter/ip6t_MASQUERADE.c76
-rw-r--r--net/ipv6/netfilter/ip6table_nat.c233
-rw-r--r--net/ipv6/netfilter/nf_defrag_ipv6_hooks.c2
-rw-r--r--net/ipv6/netfilter/nf_nat_l3proto_ipv6.c199
-rw-r--r--net/ipv6/netfilter/nf_nat_masquerade_ipv6.c120
-rw-r--r--net/ipv6/netfilter/nf_reject_ipv6.c163
-rw-r--r--net/ipv6/netfilter/nft_chain_nat_ipv6.c165
-rw-r--r--net/ipv6/netfilter/nft_masq_ipv6.c77
-rw-r--r--net/ipv6/output_core.c2
-rw-r--r--net/ipv6/proc.c2
-rw-r--r--net/ipv6/protocol.c1
-rw-r--r--net/ipv6/raw.c8
-rw-r--r--net/ipv6/reassembly.c12
-rw-r--r--net/ipv6/route.c22
-rw-r--r--net/ipv6/sit.c123
-rw-r--r--net/ipv6/syncookies.c4
-rw-r--r--net/ipv6/sysctl_net_ipv6.c10
-rw-r--r--net/ipv6/tcp_ipv6.c32
-rw-r--r--net/ipv6/tcpv6_offload.c69
-rw-r--r--net/ipv6/tunnel6.c4
-rw-r--r--net/ipv6/udp.c26
-rw-r--r--net/ipv6/udp_offload.c92
-rw-r--r--net/ipv6/xfrm6_input.c6
-rw-r--r--net/ipv6/xfrm6_output.c1
-rw-r--r--net/ipv6/xfrm6_policy.c22
-rw-r--r--net/ipv6/xfrm6_state.c14
-rw-r--r--net/ipv6/xfrm6_tunnel.c4
-rw-r--r--net/irda/irlan/irlan_common.c4
-rw-r--r--net/iucv/iucv.c9
-rw-r--r--net/l2tp/l2tp_core.c24
-rw-r--r--net/mac80211/agg-rx.c5
-rw-r--r--net/mac80211/cfg.c114
-rw-r--r--net/mac80211/chan.c191
-rw-r--r--net/mac80211/debugfs.c6
-rw-r--r--net/mac80211/debugfs_netdev.c4
-rw-r--r--net/mac80211/debugfs_sta.c4
-rw-r--r--net/mac80211/driver-ops.h2
-rw-r--r--net/mac80211/ibss.c3
-rw-r--r--net/mac80211/ieee80211_i.h9
-rw-r--r--net/mac80211/iface.c15
-rw-r--r--net/mac80211/key.c15
-rw-r--r--net/mac80211/main.c1
-rw-r--r--net/mac80211/mesh_pathtbl.c4
-rw-r--r--net/mac80211/mesh_plink.c14
-rw-r--r--net/mac80211/mlme.c162
-rw-r--r--net/mac80211/rc80211_minstrel.c98
-rw-r--r--net/mac80211/rc80211_minstrel.h43
-rw-r--r--net/mac80211/rc80211_minstrel_debugfs.c19
-rw-r--r--net/mac80211/rc80211_minstrel_ht.c303
-rw-r--r--net/mac80211/rc80211_minstrel_ht.h41
-rw-r--r--net/mac80211/rc80211_minstrel_ht_debugfs.c10
-rw-r--r--net/mac80211/rx.c13
-rw-r--r--net/mac80211/scan.c3
-rw-r--r--net/mac80211/sta_info.c5
-rw-r--r--net/mac80211/sta_info.h9
-rw-r--r--net/mac80211/status.c22
-rw-r--r--net/mac80211/tdls.c7
-rw-r--r--net/mac80211/trace.h4
-rw-r--r--net/mac80211/tx.c32
-rw-r--r--net/mac80211/util.c26
-rw-r--r--net/mac80211/wme.c5
-rw-r--r--net/mac80211/wpa.c7
-rw-r--r--net/mac802154/rx.c5
-rw-r--r--net/mac802154/tx.c15
-rw-r--r--net/mac802154/wpan.c10
-rw-r--r--net/mpls/mpls_gso.c7
-rw-r--r--net/netfilter/Kconfig9
-rw-r--r--net/netfilter/Makefile1
-rw-r--r--net/netfilter/ipset/Kconfig9
-rw-r--r--net/netfilter/ipset/Makefile1
-rw-r--r--net/netfilter/ipset/ip_set_bitmap_gen.h4
-rw-r--r--net/netfilter/ipset/ip_set_bitmap_ip.c15
-rw-r--r--net/netfilter/ipset/ip_set_bitmap_ipmac.c15
-rw-r--r--net/netfilter/ipset/ip_set_bitmap_port.c15
-rw-r--r--net/netfilter/ipset/ip_set_core.c53
-rw-r--r--net/netfilter/ipset/ip_set_hash_gen.h30
-rw-r--r--net/netfilter/ipset/ip_set_hash_ip.c22
-rw-r--r--net/netfilter/ipset/ip_set_hash_ipmark.c14
-rw-r--r--net/netfilter/ipset/ip_set_hash_ipport.c22
-rw-r--r--net/netfilter/ipset/ip_set_hash_ipportip.c22
-rw-r--r--net/netfilter/ipset/ip_set_hash_ipportnet.c14
-rw-r--r--net/netfilter/ipset/ip_set_hash_mac.c173
-rw-r--r--net/netfilter/ipset/ip_set_hash_net.c16
-rw-r--r--net/netfilter/ipset/ip_set_hash_netiface.c20
-rw-r--r--net/netfilter/ipset/ip_set_hash_netnet.c29
-rw-r--r--net/netfilter/ipset/ip_set_hash_netport.c16
-rw-r--r--net/netfilter/ipset/ip_set_hash_netportnet.c22
-rw-r--r--net/netfilter/ipset/ip_set_list_set.c23
-rw-r--r--net/netfilter/ipvs/Kconfig10
-rw-r--r--net/netfilter/ipvs/Makefile1
-rw-r--r--net/netfilter/ipvs/ip_vs_conn.c74
-rw-r--r--net/netfilter/ipvs/ip_vs_core.c15
-rw-r--r--net/netfilter/ipvs/ip_vs_ctl.c223
-rw-r--r--net/netfilter/ipvs/ip_vs_dh.c2
-rw-r--r--net/netfilter/ipvs/ip_vs_fo.c79
-rw-r--r--net/netfilter/ipvs/ip_vs_ftp.c6
-rw-r--r--net/netfilter/ipvs/ip_vs_lblc.c12
-rw-r--r--net/netfilter/ipvs/ip_vs_lblcr.c12
-rw-r--r--net/netfilter/ipvs/ip_vs_lc.c2
-rw-r--r--net/netfilter/ipvs/ip_vs_nq.c3
-rw-r--r--net/netfilter/ipvs/ip_vs_proto_sctp.c2
-rw-r--r--net/netfilter/ipvs/ip_vs_proto_tcp.c2
-rw-r--r--net/netfilter/ipvs/ip_vs_rr.c2
-rw-r--r--net/netfilter/ipvs/ip_vs_sed.c3
-rw-r--r--net/netfilter/ipvs/ip_vs_sh.c8
-rw-r--r--net/netfilter/ipvs/ip_vs_sync.c13
-rw-r--r--net/netfilter/ipvs/ip_vs_wlc.c3
-rw-r--r--net/netfilter/ipvs/ip_vs_wrr.c2
-rw-r--r--net/netfilter/ipvs/ip_vs_xmit.c388
-rw-r--r--net/netfilter/nf_conntrack_core.c4
-rw-r--r--net/netfilter/nf_conntrack_expect.c3
-rw-r--r--net/netfilter/nf_conntrack_netlink.c2
-rw-r--r--net/netfilter/nf_conntrack_proto_generic.c26
-rw-r--r--net/netfilter/nf_conntrack_standalone.c2
-rw-r--r--net/netfilter/nf_log_common.c2
-rw-r--r--net/netfilter/nf_nat_core.c5
-rw-r--r--net/netfilter/nf_queue.c4
-rw-r--r--net/netfilter/nf_tables_api.c601
-rw-r--r--net/netfilter/nfnetlink.c6
-rw-r--r--net/netfilter/nfnetlink_acct.c54
-rw-r--r--net/netfilter/nfnetlink_log.c8
-rw-r--r--net/netfilter/nfnetlink_queue_core.c12
-rw-r--r--net/netfilter/nft_compat.c116
-rw-r--r--net/netfilter/nft_masq.c59
-rw-r--r--net/netfilter/nft_meta.c45
-rw-r--r--net/netfilter/nft_nat.c16
-rw-r--r--net/netfilter/nft_reject.c37
-rw-r--r--net/netfilter/nft_reject_inet.c94
-rw-r--r--net/netfilter/x_tables.c30
-rw-r--r--net/netfilter/xt_HMARK.c2
-rw-r--r--net/netfilter/xt_RATEEST.c2
-rw-r--r--net/netfilter/xt_cluster.c3
-rw-r--r--net/netfilter/xt_connbytes.c2
-rw-r--r--net/netfilter/xt_hashlimit.c4
-rw-r--r--net/netfilter/xt_physdev.c3
-rw-r--r--net/netfilter/xt_set.c191
-rw-r--r--net/netfilter/xt_string.c1
-rw-r--r--net/netlabel/netlabel_user.c6
-rw-r--r--net/nfc/digital_dep.c101
-rw-r--r--net/nfc/nci/core.c21
-rw-r--r--net/nfc/nci/data.c7
-rw-r--r--net/nfc/nci/ntf.c40
-rw-r--r--net/openvswitch/Kconfig11
-rw-r--r--net/openvswitch/Makefile4
-rw-r--r--net/openvswitch/actions.c261
-rw-r--r--net/openvswitch/datapath.c96
-rw-r--r--net/openvswitch/datapath.h23
-rw-r--r--net/openvswitch/flow.c123
-rw-r--r--net/openvswitch/flow.h54
-rw-r--r--net/openvswitch/flow_netlink.c292
-rw-r--r--net/openvswitch/flow_netlink.h4
-rw-r--r--net/openvswitch/vport-geneve.c235
-rw-r--r--net/openvswitch/vport-gre.c33
-rw-r--r--net/openvswitch/vport-vxlan.c27
-rw-r--r--net/openvswitch/vport.c45
-rw-r--r--net/openvswitch/vport.h14
-rw-r--r--net/packet/af_packet.c12
-rw-r--r--net/phonet/pn_dev.c6
-rw-r--r--net/rds/send.c11
-rw-r--r--net/rds/tcp_connect.c5
-rw-r--r--net/rds/threads.c3
-rw-r--r--net/rose/rose_link.c3
-rw-r--r--net/rxrpc/ar-error.c14
-rw-r--r--net/rxrpc/ar-input.c9
-rw-r--r--net/sched/act_api.c9
-rw-r--r--net/sched/act_police.c6
-rw-r--r--net/sched/cls_api.c33
-rw-r--r--net/sched/cls_basic.c89
-rw-r--r--net/sched/cls_bpf.c102
-rw-r--r--net/sched/cls_cgroup.c79
-rw-r--r--net/sched/cls_flow.c151
-rw-r--r--net/sched/cls_fw.c120
-rw-r--r--net/sched/cls_route.c241
-rw-r--r--net/sched/cls_rsvp.h208
-rw-r--r--net/sched/cls_tcindex.c273
-rw-r--r--net/sched/cls_u32.c407
-rw-r--r--net/sched/em_canid.c4
-rw-r--r--net/sched/em_ipset.c7
-rw-r--r--net/sched/em_meta.c4
-rw-r--r--net/sched/em_nbyte.c2
-rw-r--r--net/sched/em_text.c4
-rw-r--r--net/sched/ematch.c15
-rw-r--r--net/sched/sch_api.c65
-rw-r--r--net/sched/sch_atm.c28
-rw-r--r--net/sched/sch_cbq.c35
-rw-r--r--net/sched/sch_choke.c29
-rw-r--r--net/sched/sch_codel.c2
-rw-r--r--net/sched/sch_drr.c27
-rw-r--r--net/sched/sch_dsmark.c11
-rw-r--r--net/sched/sch_fifo.c2
-rw-r--r--net/sched/sch_fq.c14
-rw-r--r--net/sched/sch_fq_codel.c24
-rw-r--r--net/sched/sch_generic.c82
-rw-r--r--net/sched/sch_gred.c4
-rw-r--r--net/sched/sch_hfsc.c32
-rw-r--r--net/sched/sch_hhf.c8
-rw-r--r--net/sched/sch_htb.c48
-rw-r--r--net/sched/sch_ingress.c10
-rw-r--r--net/sched/sch_mq.c6
-rw-r--r--net/sched/sch_mqprio.c20
-rw-r--r--net/sched/sch_multiq.c17
-rw-r--r--net/sched/sch_netem.c15
-rw-r--r--net/sched/sch_pie.c2
-rw-r--r--net/sched/sch_prio.c20
-rw-r--r--net/sched/sch_qfq.c25
-rw-r--r--net/sched/sch_red.c8
-rw-r--r--net/sched/sch_sfb.c25
-rw-r--r--net/sched/sch_sfq.c35
-rw-r--r--net/sched/sch_tbf.c17
-rw-r--r--net/sched/sch_teql.c20
-rw-r--r--net/sctp/input.c8
-rw-r--r--net/sctp/protocol.c2
-rw-r--r--net/sctp/sm_statefuns.c19
-rw-r--r--net/socket.c7
-rw-r--r--net/tipc/Makefile2
-rw-r--r--net/tipc/bcast.c20
-rw-r--r--net/tipc/bcast.h2
-rw-r--r--net/tipc/config.c4
-rw-r--r--net/tipc/core.c9
-rw-r--r--net/tipc/core.h6
-rw-r--r--net/tipc/link.c120
-rw-r--r--net/tipc/link.h7
-rw-r--r--net/tipc/msg.c38
-rw-r--r--net/tipc/msg.h5
-rw-r--r--net/tipc/name_distr.c140
-rw-r--r--net/tipc/name_distr.h1
-rw-r--r--net/tipc/name_table.c9
-rw-r--r--net/tipc/net.c3
-rw-r--r--net/tipc/node.c95
-rw-r--r--net/tipc/node.h8
-rw-r--r--net/tipc/port.c514
-rw-r--r--net/tipc/port.h190
-rw-r--r--net/tipc/ref.c266
-rw-r--r--net/tipc/ref.h48
-rw-r--r--net/tipc/socket.c884
-rw-r--r--net/tipc/socket.h55
-rw-r--r--net/tipc/subscr.c1
-rw-r--r--net/tipc/sysctl.c7
-rw-r--r--net/unix/garbage.c2
-rw-r--r--net/wimax/id-table.c2
-rw-r--r--net/wimax/op-msg.c9
-rw-r--r--net/wimax/op-reset.c3
-rw-r--r--net/wimax/op-rfkill.c3
-rw-r--r--net/wimax/op-state-get.c3
-rw-r--r--net/wimax/stack.c7
-rw-r--r--net/wimax/wimax-internal.h6
-rw-r--r--net/wireless/chan.c1
-rw-r--r--net/wireless/core.c16
-rw-r--r--net/wireless/ibss.c4
-rw-r--r--net/wireless/mlme.c8
-rw-r--r--net/wireless/nl80211.c249
-rw-r--r--net/wireless/nl80211.h3
-rw-r--r--net/wireless/rdev-ops.h31
-rw-r--r--net/wireless/reg.c82
-rw-r--r--net/wireless/scan.c22
-rw-r--r--net/wireless/sme.c6
-rw-r--r--net/wireless/trace.h45
-rw-r--r--net/wireless/util.c3
-rw-r--r--net/wireless/wext-compat.c2
-rw-r--r--net/wireless/wext-sme.c2
-rw-r--r--net/xfrm/xfrm_hash.h76
-rw-r--r--net/xfrm/xfrm_output.c6
-rw-r--r--net/xfrm/xfrm_policy.c144
-rw-r--r--net/xfrm/xfrm_state.c13
-rw-r--r--net/xfrm/xfrm_user.c83
434 files changed, 15378 insertions, 7910 deletions
diff --git a/net/8021q/vlan_dev.c b/net/8021q/vlan_dev.c
index 35a6b6b15e8a..0d441ec8763e 100644
--- a/net/8021q/vlan_dev.c
+++ b/net/8021q/vlan_dev.c
@@ -799,7 +799,8 @@ void vlan_setup(struct net_device *dev)
 	ether_setup(dev);
 
 	dev->priv_flags |= IFF_802_1Q_VLAN;
-	dev->priv_flags &= ~(IFF_XMIT_DST_RELEASE | IFF_TX_SKB_SHARING);
+	dev->priv_flags &= ~IFF_TX_SKB_SHARING;
+	netif_keep_dst(dev);
 	dev->tx_queue_len = 0;
 
 	dev->netdev_ops = &vlan_netdev_ops;
diff --git a/net/Kconfig b/net/Kconfig
index 4051fdfa4367..d6b138e2c263 100644
--- a/net/Kconfig
+++ b/net/Kconfig
@@ -176,10 +176,11 @@ config NETFILTER_ADVANCED
 	  If unsure, say Y.
 
 config BRIDGE_NETFILTER
-	bool "Bridged IP/ARP packets filtering"
-	depends on BRIDGE && NETFILTER && INET
+	tristate "Bridged IP/ARP packets filtering"
+	depends on BRIDGE
+	depends on NETFILTER && INET
 	depends on NETFILTER_ADVANCED
-	default y
+	default m
 	---help---
 	  Enabling this option will let arptables resp. iptables see bridged
 	  ARP resp. IP traffic. If you want a bridging firewall, you probably
diff --git a/net/atm/clip.c b/net/atm/clip.c
index 46339040fef0..17e55dfecbe2 100644
--- a/net/atm/clip.c
+++ b/net/atm/clip.c
@@ -384,7 +384,7 @@ static netdev_tx_t clip_start_xmit(struct sk_buff *skb,
 	pr_debug("atm_skb(%p)->vcc(%p)->dev(%p)\n", skb, vcc, vcc->dev);
 	old = xchg(&entry->vccs->xoff, 1);	/* assume XOFF ... */
 	if (old) {
-		pr_warning("XOFF->XOFF transition\n");
+		pr_warn("XOFF->XOFF transition\n");
 		goto out_release_neigh;
 	}
 	dev->stats.tx_packets++;
@@ -447,7 +447,7 @@ static int clip_setentry(struct atm_vcc *vcc, __be32 ip)
 	struct rtable *rt;
 
 	if (vcc->push != clip_push) {
-		pr_warning("non-CLIP VCC\n");
+		pr_warn("non-CLIP VCC\n");
 		return -EBADF;
 	}
 	clip_vcc = CLIP_VCC(vcc);
@@ -501,7 +501,7 @@ static void clip_setup(struct net_device *dev)
 	/* without any more elaborate queuing. 100 is a reasonable */
 	/* compromise between decent burst-tolerance and protection */
 	/* against memory hogs. */
-	dev->priv_flags &= ~IFF_XMIT_DST_RELEASE;
+	netif_keep_dst(dev);
 }
 
 static int clip_create(int number)
diff --git a/net/atm/common.c b/net/atm/common.c
index 7b491006eaf4..6a765156a3f6 100644
--- a/net/atm/common.c
+++ b/net/atm/common.c
@@ -300,7 +300,7 @@ static int adjust_tp(struct atm_trafprm *tp, unsigned char aal)
 		max_sdu = ATM_MAX_AAL34_PDU;
 		break;
 	default:
-		pr_warning("AAL problems ... (%d)\n", aal);
+		pr_warn("AAL problems ... (%d)\n", aal);
 		/* fall through */
 	case ATM_AAL5:
 		max_sdu = ATM_MAX_AAL5_PDU;
diff --git a/net/atm/mpc.c b/net/atm/mpc.c
index e8e0e7a8a23d..0e982222d425 100644
--- a/net/atm/mpc.c
+++ b/net/atm/mpc.c
@@ -599,7 +599,7 @@ static netdev_tx_t mpc_send_packet(struct sk_buff *skb,
 	}
 
 non_ip:
-	return mpc->old_ops->ndo_start_xmit(skb, dev);
+	return __netdev_start_xmit(mpc->old_ops, skb, dev, false);
 }
 
 static int atm_mpoa_vcc_attach(struct atm_vcc *vcc, void __user *arg)
diff --git a/net/bluetooth/6lowpan.c b/net/bluetooth/6lowpan.c
index 206b65ccd5b8..c2e0d14433df 100644
--- a/net/bluetooth/6lowpan.c
+++ b/net/bluetooth/6lowpan.c
@@ -39,6 +39,7 @@ static struct dentry *lowpan_control_debugfs;
39 39
40struct skb_cb { 40struct skb_cb {
41 struct in6_addr addr; 41 struct in6_addr addr;
42 struct in6_addr gw;
42 struct l2cap_chan *chan; 43 struct l2cap_chan *chan;
43 int status; 44 int status;
44}; 45};
@@ -158,6 +159,54 @@ static inline struct lowpan_peer *peer_lookup_conn(struct lowpan_dev *dev,
158 return NULL; 159 return NULL;
159} 160}
160 161
162static inline struct lowpan_peer *peer_lookup_dst(struct lowpan_dev *dev,
163 struct in6_addr *daddr,
164 struct sk_buff *skb)
165{
166 struct lowpan_peer *peer, *tmp;
167 struct in6_addr *nexthop;
168 struct rt6_info *rt = (struct rt6_info *)skb_dst(skb);
169 int count = atomic_read(&dev->peer_count);
170
171 BT_DBG("peers %d addr %pI6c rt %p", count, daddr, rt);
172
173 /* If we have multiple 6lowpan peers, then check where we should
174 * send the packet. If only one peer exists, then we can send the
175 * packet right away.
176 */
177 if (count == 1)
178 return list_first_entry(&dev->peers, struct lowpan_peer,
179 list);
180
181 if (!rt) {
182 nexthop = &lowpan_cb(skb)->gw;
183
184 if (ipv6_addr_any(nexthop))
185 return NULL;
186 } else {
187 nexthop = rt6_nexthop(rt);
188
189 /* We need to remember the address because it is needed
190 * by bt_xmit() when sending the packet. In bt_xmit(), the
191 * destination routing info is not set.
192 */
193 memcpy(&lowpan_cb(skb)->gw, nexthop, sizeof(struct in6_addr));
194 }
195
196 BT_DBG("gw %pI6c", nexthop);
197
198 list_for_each_entry_safe(peer, tmp, &dev->peers, list) {
199 BT_DBG("dst addr %pMR dst type %d ip %pI6c",
200 &peer->chan->dst, peer->chan->dst_type,
201 &peer->peer_addr);
202
203 if (!ipv6_addr_cmp(&peer->peer_addr, nexthop))
204 return peer;
205 }
206
207 return NULL;
208}
209
161static struct lowpan_peer *lookup_peer(struct l2cap_conn *conn) 210static struct lowpan_peer *lookup_peer(struct l2cap_conn *conn)
162{ 211{
163 struct lowpan_dev *entry, *tmp; 212 struct lowpan_dev *entry, *tmp;
@@ -377,58 +426,85 @@ static void convert_dest_bdaddr(struct in6_addr *ip6_daddr,
377 *addr_type = get_addr_type_from_eui64(addr->b[5]); 426 *addr_type = get_addr_type_from_eui64(addr->b[5]);
378} 427}
379 428
380static int header_create(struct sk_buff *skb, struct net_device *netdev, 429static int setup_header(struct sk_buff *skb, struct net_device *netdev,
381 unsigned short type, const void *_daddr, 430 bdaddr_t *peer_addr, u8 *peer_addr_type)
382 const void *_saddr, unsigned int len)
383{ 431{
384 struct ipv6hdr *hdr; 432 struct in6_addr ipv6_daddr;
385 struct lowpan_dev *dev; 433 struct lowpan_dev *dev;
386 struct lowpan_peer *peer; 434 struct lowpan_peer *peer;
387 bdaddr_t addr, *any = BDADDR_ANY; 435 bdaddr_t addr, *any = BDADDR_ANY;
388 u8 *saddr, *daddr = any->b; 436 u8 *daddr = any->b;
389 u8 addr_type; 437 int err, status = 0;
390
391 if (type != ETH_P_IPV6)
392 return -EINVAL;
393
394 hdr = ipv6_hdr(skb);
395 438
396 dev = lowpan_dev(netdev); 439 dev = lowpan_dev(netdev);
397 440
398 if (ipv6_addr_is_multicast(&hdr->daddr)) { 441 memcpy(&ipv6_daddr, &lowpan_cb(skb)->addr, sizeof(ipv6_daddr));
399 memcpy(&lowpan_cb(skb)->addr, &hdr->daddr, 442
400 sizeof(struct in6_addr)); 443 if (ipv6_addr_is_multicast(&ipv6_daddr)) {
401 lowpan_cb(skb)->chan = NULL; 444 lowpan_cb(skb)->chan = NULL;
402 } else { 445 } else {
403 unsigned long flags; 446 unsigned long flags;
447 u8 addr_type;
404 448
405 /* Get destination BT device from skb. 449 /* Get destination BT device from skb.
406 * If there is no such peer then discard the packet. 450 * If there is no such peer then discard the packet.
407 */ 451 */
408 convert_dest_bdaddr(&hdr->daddr, &addr, &addr_type); 452 convert_dest_bdaddr(&ipv6_daddr, &addr, &addr_type);
409 453
410 BT_DBG("dest addr %pMR type %d IP %pI6c", &addr, 454 BT_DBG("dest addr %pMR type %d IP %pI6c", &addr,
411 addr_type, &hdr->daddr); 455 addr_type, &ipv6_daddr);
412 456
413 read_lock_irqsave(&devices_lock, flags); 457 read_lock_irqsave(&devices_lock, flags);
414 peer = peer_lookup_ba(dev, &addr, addr_type); 458 peer = peer_lookup_ba(dev, &addr, addr_type);
415 read_unlock_irqrestore(&devices_lock, flags); 459 read_unlock_irqrestore(&devices_lock, flags);
416 460
417 if (!peer) { 461 if (!peer) {
418 BT_DBG("no such peer %pMR found", &addr); 462 /* The packet might be sent to 6lowpan interface
419 return -ENOENT; 463 * because of routing (either via default route
464 * or user set route) so get peer according to
465 * the destination address.
466 */
467 read_lock_irqsave(&devices_lock, flags);
468 peer = peer_lookup_dst(dev, &ipv6_daddr, skb);
469 read_unlock_irqrestore(&devices_lock, flags);
470 if (!peer) {
471 BT_DBG("no such peer %pMR found", &addr);
472 return -ENOENT;
473 }
420 } 474 }
421 475
422 daddr = peer->eui64_addr; 476 daddr = peer->eui64_addr;
423 477 *peer_addr = addr;
424 memcpy(&lowpan_cb(skb)->addr, &hdr->daddr, 478 *peer_addr_type = addr_type;
425 sizeof(struct in6_addr));
426 lowpan_cb(skb)->chan = peer->chan; 479 lowpan_cb(skb)->chan = peer->chan;
480
481 status = 1;
427 } 482 }
428 483
429 saddr = dev->netdev->dev_addr; 484 lowpan_header_compress(skb, netdev, ETH_P_IPV6, daddr,
485 dev->netdev->dev_addr, skb->len);
486
487 err = dev_hard_header(skb, netdev, ETH_P_IPV6, NULL, NULL, 0);
488 if (err < 0)
489 return err;
490
491 return status;
492}
493
494static int header_create(struct sk_buff *skb, struct net_device *netdev,
495 unsigned short type, const void *_daddr,
496 const void *_saddr, unsigned int len)
497{
498 struct ipv6hdr *hdr;
499
500 if (type != ETH_P_IPV6)
501 return -EINVAL;
502
503 hdr = ipv6_hdr(skb);
504
505 memcpy(&lowpan_cb(skb)->addr, &hdr->daddr, sizeof(struct in6_addr));
430 506
431 return lowpan_header_compress(skb, netdev, type, daddr, saddr, len); 507 return 0;
432} 508}
433 509
434/* Packet to BT LE device */ 510/* Packet to BT LE device */
@@ -470,11 +546,12 @@ static int send_pkt(struct l2cap_chan *chan, struct sk_buff *skb,
470 return err; 546 return err;
471} 547}
472 548
473static void send_mcast_pkt(struct sk_buff *skb, struct net_device *netdev) 549static int send_mcast_pkt(struct sk_buff *skb, struct net_device *netdev)
474{ 550{
475 struct sk_buff *local_skb; 551 struct sk_buff *local_skb;
476 struct lowpan_dev *entry, *tmp; 552 struct lowpan_dev *entry, *tmp;
477 unsigned long flags; 553 unsigned long flags;
554 int err = 0;
478 555
479 read_lock_irqsave(&devices_lock, flags); 556 read_lock_irqsave(&devices_lock, flags);
480 557
@@ -488,55 +565,77 @@ static void send_mcast_pkt(struct sk_buff *skb, struct net_device *netdev)
488 dev = lowpan_dev(entry->netdev); 565 dev = lowpan_dev(entry->netdev);
489 566
490 list_for_each_entry_safe(pentry, ptmp, &dev->peers, list) { 567 list_for_each_entry_safe(pentry, ptmp, &dev->peers, list) {
568 int ret;
569
491 local_skb = skb_clone(skb, GFP_ATOMIC); 570 local_skb = skb_clone(skb, GFP_ATOMIC);
492 571
493 send_pkt(pentry->chan, local_skb, netdev); 572 BT_DBG("xmit %s to %pMR type %d IP %pI6c chan %p",
573 netdev->name,
574 &pentry->chan->dst, pentry->chan->dst_type,
575 &pentry->peer_addr, pentry->chan);
576 ret = send_pkt(pentry->chan, local_skb, netdev);
577 if (ret < 0)
578 err = ret;
494 579
495 kfree_skb(local_skb); 580 kfree_skb(local_skb);
496 } 581 }
497 } 582 }
498 583
499 read_unlock_irqrestore(&devices_lock, flags); 584 read_unlock_irqrestore(&devices_lock, flags);
585
586 return err;
500} 587}
501 588
502static netdev_tx_t bt_xmit(struct sk_buff *skb, struct net_device *netdev) 589static netdev_tx_t bt_xmit(struct sk_buff *skb, struct net_device *netdev)
503{ 590{
504 int err = 0; 591 int err = 0;
505 struct lowpan_dev *dev;
506 struct lowpan_peer *peer;
507 bdaddr_t addr; 592 bdaddr_t addr;
508 u8 addr_type; 593 u8 addr_type;
594 struct sk_buff *tmpskb;
509 595
510 if (ipv6_addr_is_multicast(&lowpan_cb(skb)->addr)) { 596 /* We must take a copy of the skb before we modify/replace the ipv6
511 /* We need to send the packet to every device 597 * header as the header could be used elsewhere
512 * behind this interface. 598 */
513 */ 599 tmpskb = skb_unshare(skb, GFP_ATOMIC);
514 send_mcast_pkt(skb, netdev); 600 if (!tmpskb) {
515 } else { 601 kfree_skb(skb);
516 unsigned long flags; 602 return NET_XMIT_DROP;
517 603 }
518 convert_dest_bdaddr(&lowpan_cb(skb)->addr, &addr, &addr_type); 604 skb = tmpskb;
519 dev = lowpan_dev(netdev);
520
521 read_lock_irqsave(&devices_lock, flags);
522 peer = peer_lookup_ba(dev, &addr, addr_type);
523 read_unlock_irqrestore(&devices_lock, flags);
524 605
525 BT_DBG("xmit %s to %pMR type %d IP %pI6c peer %p", 606 /* Return values from setup_header()
526 netdev->name, &addr, addr_type, 607 * <0 - error, packet is dropped
527 &lowpan_cb(skb)->addr, peer); 608 * 0 - this is a multicast packet
609 * 1 - this is unicast packet
610 */
611 err = setup_header(skb, netdev, &addr, &addr_type);
612 if (err < 0) {
613 kfree_skb(skb);
614 return NET_XMIT_DROP;
615 }
528 616
529 if (peer && peer->chan) 617 if (err) {
530 err = send_pkt(peer->chan, skb, netdev); 618 if (lowpan_cb(skb)->chan) {
531 else 619 BT_DBG("xmit %s to %pMR type %d IP %pI6c chan %p",
620 netdev->name, &addr, addr_type,
621 &lowpan_cb(skb)->addr, lowpan_cb(skb)->chan);
622 err = send_pkt(lowpan_cb(skb)->chan, skb, netdev);
623 } else {
532 err = -ENOENT; 624 err = -ENOENT;
625 }
626 } else {
627 /* We need to send the packet to every device behind this
628 * interface.
629 */
630 err = send_mcast_pkt(skb, netdev);
533 } 631 }
632
534 dev_kfree_skb(skb); 633 dev_kfree_skb(skb);
535 634
536 if (err) 635 if (err)
537 BT_DBG("ERROR: xmit failed (%d)", err); 636 BT_DBG("ERROR: xmit failed (%d)", err);
538 637
539 return (err < 0) ? NET_XMIT_DROP : err; 638 return err < 0 ? NET_XMIT_DROP : err;
540} 639}
541 640
542static const struct net_device_ops netdev_ops = { 641static const struct net_device_ops netdev_ops = {
@@ -556,7 +655,8 @@ static void netdev_setup(struct net_device *dev)
556 dev->needed_tailroom = 0; 655 dev->needed_tailroom = 0;
557 dev->mtu = IPV6_MIN_MTU; 656 dev->mtu = IPV6_MIN_MTU;
558 dev->tx_queue_len = 0; 657 dev->tx_queue_len = 0;
559 dev->flags = IFF_RUNNING | IFF_POINTOPOINT; 658 dev->flags = IFF_RUNNING | IFF_POINTOPOINT |
659 IFF_MULTICAST;
560 dev->watchdog_timeo = 0; 660 dev->watchdog_timeo = 0;
561 661
562 dev->netdev_ops = &netdev_ops; 662 dev->netdev_ops = &netdev_ops;
@@ -671,6 +771,14 @@ static struct l2cap_chan *chan_open(struct l2cap_chan *pchan)
671 return chan; 771 return chan;
672} 772}
673 773
774static void set_ip_addr_bits(u8 addr_type, u8 *addr)
775{
776 if (addr_type == BDADDR_LE_PUBLIC)
777 *addr |= 0x02;
778 else
779 *addr &= ~0x02;
780}
781
674static struct l2cap_chan *add_peer_chan(struct l2cap_chan *chan, 782static struct l2cap_chan *add_peer_chan(struct l2cap_chan *chan,
675 struct lowpan_dev *dev) 783 struct lowpan_dev *dev)
676{ 784{
@@ -693,6 +801,11 @@ static struct l2cap_chan *add_peer_chan(struct l2cap_chan *chan,
693 memcpy(&peer->eui64_addr, (u8 *)&peer->peer_addr.s6_addr + 8, 801 memcpy(&peer->eui64_addr, (u8 *)&peer->peer_addr.s6_addr + 8,
694 EUI64_ADDR_LEN); 802 EUI64_ADDR_LEN);
695 803
804 /* IPv6 address needs to have the U/L bit set properly so toggle
805 * it back here.
806 */
807 set_ip_addr_bits(chan->dst_type, (u8 *)&peer->peer_addr.s6_addr + 8);
808
696 write_lock_irqsave(&devices_lock, flags); 809 write_lock_irqsave(&devices_lock, flags);
697 INIT_LIST_HEAD(&peer->list); 810 INIT_LIST_HEAD(&peer->list);
698 peer_add(dev, peer); 811 peer_add(dev, peer);
@@ -772,16 +885,16 @@ static inline void chan_ready_cb(struct l2cap_chan *chan)
772 ifup(dev->netdev); 885 ifup(dev->netdev);
773} 886}
774 887
775static inline struct l2cap_chan *chan_new_conn_cb(struct l2cap_chan *chan) 888static inline struct l2cap_chan *chan_new_conn_cb(struct l2cap_chan *pchan)
776{ 889{
777 struct l2cap_chan *pchan; 890 struct l2cap_chan *chan;
778 891
779 pchan = chan_open(chan); 892 chan = chan_open(pchan);
780 pchan->ops = chan->ops; 893 chan->ops = pchan->ops;
781 894
782 BT_DBG("chan %p pchan %p", chan, pchan); 895 BT_DBG("chan %p pchan %p", chan, pchan);
783 896
784 return pchan; 897 return chan;
785} 898}
786 899
787static void delete_netdev(struct work_struct *work) 900static void delete_netdev(struct work_struct *work)
@@ -876,6 +989,9 @@ static void chan_suspend_cb(struct l2cap_chan *chan)
876 989
877 BT_DBG("chan %p conn %p skb %p", chan, chan->conn, skb); 990 BT_DBG("chan %p conn %p skb %p", chan, chan->conn, skb);
878 991
992 if (!skb)
993 return;
994
879 lowpan_cb(skb)->status = -EAGAIN; 995 lowpan_cb(skb)->status = -EAGAIN;
880} 996}
881 997
@@ -885,12 +1001,15 @@ static void chan_resume_cb(struct l2cap_chan *chan)
885 1001
886 BT_DBG("chan %p conn %p skb %p", chan, chan->conn, skb); 1002 BT_DBG("chan %p conn %p skb %p", chan, chan->conn, skb);
887 1003
1004 if (!skb)
1005 return;
1006
888 lowpan_cb(skb)->status = 0; 1007 lowpan_cb(skb)->status = 0;
889} 1008}
890 1009
891static long chan_get_sndtimeo_cb(struct l2cap_chan *chan) 1010static long chan_get_sndtimeo_cb(struct l2cap_chan *chan)
892{ 1011{
893 return msecs_to_jiffies(1000); 1012 return L2CAP_CONN_TIMEOUT;
894} 1013}
895 1014
896static const struct l2cap_ops bt_6lowpan_chan_ops = { 1015static const struct l2cap_ops bt_6lowpan_chan_ops = {
diff --git a/net/bluetooth/af_bluetooth.c b/net/bluetooth/af_bluetooth.c
index 4dca0299ed96..339c74ad4553 100644
--- a/net/bluetooth/af_bluetooth.c
+++ b/net/bluetooth/af_bluetooth.c
@@ -709,8 +709,11 @@ EXPORT_SYMBOL_GPL(bt_debugfs);
 
 static int __init bt_init(void)
 {
+	struct sk_buff *skb;
 	int err;
 
+	BUILD_BUG_ON(sizeof(struct bt_skb_cb) > sizeof(skb->cb));
+
 	BT_INFO("Core ver %s", VERSION);
 
 	bt_debugfs = debugfs_create_dir("bluetooth", NULL);
diff --git a/net/bluetooth/amp.c b/net/bluetooth/amp.c
index 016cdb66df6c..2640d78f30b8 100644
--- a/net/bluetooth/amp.c
+++ b/net/bluetooth/amp.c
@@ -149,15 +149,14 @@ static int hmac_sha256(u8 *key, u8 ksize, char *plaintext, u8 psize, u8 *output)
 	if (ret) {
 		BT_DBG("crypto_ahash_setkey failed: err %d", ret);
 	} else {
-		struct {
-			struct shash_desc shash;
-			char ctx[crypto_shash_descsize(tfm)];
-		} desc;
+		char desc[sizeof(struct shash_desc) +
+			  crypto_shash_descsize(tfm)] CRYPTO_MINALIGN_ATTR;
+		struct shash_desc *shash = (struct shash_desc *)desc;
 
-		desc.shash.tfm = tfm;
-		desc.shash.flags = CRYPTO_TFM_REQ_MAY_SLEEP;
+		shash->tfm = tfm;
+		shash->flags = CRYPTO_TFM_REQ_MAY_SLEEP;
 
-		ret = crypto_shash_digest(&desc.shash, plaintext, psize,
+		ret = crypto_shash_digest(shash, plaintext, psize,
 					  output);
 	}
 
diff --git a/net/bluetooth/hci_conn.c b/net/bluetooth/hci_conn.c
index faff6247ac8f..b9517bd17190 100644
--- a/net/bluetooth/hci_conn.c
+++ b/net/bluetooth/hci_conn.c
@@ -36,19 +36,25 @@
36struct sco_param { 36struct sco_param {
37 u16 pkt_type; 37 u16 pkt_type;
38 u16 max_latency; 38 u16 max_latency;
39 u8 retrans_effort;
40};
41
42static const struct sco_param esco_param_cvsd[] = {
43 { EDR_ESCO_MASK & ~ESCO_2EV3, 0x000a, 0x01 }, /* S3 */
44 { EDR_ESCO_MASK & ~ESCO_2EV3, 0x0007, 0x01 }, /* S2 */
45 { EDR_ESCO_MASK | ESCO_EV3, 0x0007, 0x01 }, /* S1 */
46 { EDR_ESCO_MASK | ESCO_HV3, 0xffff, 0x01 }, /* D1 */
47 { EDR_ESCO_MASK | ESCO_HV1, 0xffff, 0x01 }, /* D0 */
39}; 48};
40 49
41static const struct sco_param sco_param_cvsd[] = { 50static const struct sco_param sco_param_cvsd[] = {
42 { EDR_ESCO_MASK & ~ESCO_2EV3, 0x000a }, /* S3 */ 51 { EDR_ESCO_MASK | ESCO_HV3, 0xffff, 0xff }, /* D1 */
43 { EDR_ESCO_MASK & ~ESCO_2EV3, 0x0007 }, /* S2 */ 52 { EDR_ESCO_MASK | ESCO_HV1, 0xffff, 0xff }, /* D0 */
44 { EDR_ESCO_MASK | ESCO_EV3, 0x0007 }, /* S1 */
45 { EDR_ESCO_MASK | ESCO_HV3, 0xffff }, /* D1 */
46 { EDR_ESCO_MASK | ESCO_HV1, 0xffff }, /* D0 */
47}; 53};
48 54
49static const struct sco_param sco_param_wideband[] = { 55static const struct sco_param esco_param_msbc[] = {
50 { EDR_ESCO_MASK & ~ESCO_2EV3, 0x000d }, /* T2 */ 56 { EDR_ESCO_MASK & ~ESCO_2EV3, 0x000d, 0x02 }, /* T2 */
51 { EDR_ESCO_MASK | ESCO_EV3, 0x0008 }, /* T1 */ 57 { EDR_ESCO_MASK | ESCO_EV3, 0x0008, 0x02 }, /* T1 */
52}; 58};
53 59
54static void hci_le_create_connection_cancel(struct hci_conn *conn) 60static void hci_le_create_connection_cancel(struct hci_conn *conn)
@@ -116,23 +122,36 @@ static void hci_reject_sco(struct hci_conn *conn)
116{ 122{
117 struct hci_cp_reject_sync_conn_req cp; 123 struct hci_cp_reject_sync_conn_req cp;
118 124
119 cp.reason = HCI_ERROR_REMOTE_USER_TERM; 125 cp.reason = HCI_ERROR_REJ_LIMITED_RESOURCES;
120 bacpy(&cp.bdaddr, &conn->dst); 126 bacpy(&cp.bdaddr, &conn->dst);
121 127
122 hci_send_cmd(conn->hdev, HCI_OP_REJECT_SYNC_CONN_REQ, sizeof(cp), &cp); 128 hci_send_cmd(conn->hdev, HCI_OP_REJECT_SYNC_CONN_REQ, sizeof(cp), &cp);
123} 129}
124 130
125void hci_disconnect(struct hci_conn *conn, __u8 reason) 131int hci_disconnect(struct hci_conn *conn, __u8 reason)
126{ 132{
127 struct hci_cp_disconnect cp; 133 struct hci_cp_disconnect cp;
128 134
129 BT_DBG("hcon %p", conn); 135 BT_DBG("hcon %p", conn);
130 136
137 /* When we are master of an established connection and it enters
138 * the disconnect timeout, then go ahead and try to read the
139 * current clock offset. Processing of the result is done
140 * within the event handling and hci_clock_offset_evt function.
141 */
142 if (conn->type == ACL_LINK && conn->role == HCI_ROLE_MASTER) {
143 struct hci_dev *hdev = conn->hdev;
144 struct hci_cp_read_clock_offset cp;
145
146 cp.handle = cpu_to_le16(conn->handle);
147 hci_send_cmd(hdev, HCI_OP_READ_CLOCK_OFFSET, sizeof(cp), &cp);
148 }
149
131 conn->state = BT_DISCONN; 150 conn->state = BT_DISCONN;
132 151
133 cp.handle = cpu_to_le16(conn->handle); 152 cp.handle = cpu_to_le16(conn->handle);
134 cp.reason = reason; 153 cp.reason = reason;
135 hci_send_cmd(conn->hdev, HCI_OP_DISCONNECT, sizeof(cp), &cp); 154 return hci_send_cmd(conn->hdev, HCI_OP_DISCONNECT, sizeof(cp), &cp);
136} 155}
137 156
138static void hci_amp_disconn(struct hci_conn *conn) 157static void hci_amp_disconn(struct hci_conn *conn)
@@ -188,21 +207,26 @@ bool hci_setup_sync(struct hci_conn *conn, __u16 handle)
188 207
189 switch (conn->setting & SCO_AIRMODE_MASK) { 208 switch (conn->setting & SCO_AIRMODE_MASK) {
190 case SCO_AIRMODE_TRANSP: 209 case SCO_AIRMODE_TRANSP:
191 if (conn->attempt > ARRAY_SIZE(sco_param_wideband)) 210 if (conn->attempt > ARRAY_SIZE(esco_param_msbc))
192 return false; 211 return false;
193 cp.retrans_effort = 0x02; 212 param = &esco_param_msbc[conn->attempt - 1];
194 param = &sco_param_wideband[conn->attempt - 1];
195 break; 213 break;
196 case SCO_AIRMODE_CVSD: 214 case SCO_AIRMODE_CVSD:
197 if (conn->attempt > ARRAY_SIZE(sco_param_cvsd)) 215 if (lmp_esco_capable(conn->link)) {
198 return false; 216 if (conn->attempt > ARRAY_SIZE(esco_param_cvsd))
199 cp.retrans_effort = 0x01; 217 return false;
200 param = &sco_param_cvsd[conn->attempt - 1]; 218 param = &esco_param_cvsd[conn->attempt - 1];
219 } else {
220 if (conn->attempt > ARRAY_SIZE(sco_param_cvsd))
221 return false;
222 param = &sco_param_cvsd[conn->attempt - 1];
223 }
201 break; 224 break;
202 default: 225 default:
203 return false; 226 return false;
204 } 227 }
205 228
229 cp.retrans_effort = param->retrans_effort;
206 cp.pkt_type = __cpu_to_le16(param->pkt_type); 230 cp.pkt_type = __cpu_to_le16(param->pkt_type);
207 cp.max_latency = __cpu_to_le16(param->max_latency); 231 cp.max_latency = __cpu_to_le16(param->max_latency);
208 232
@@ -325,25 +349,6 @@ static void hci_conn_timeout(struct work_struct *work)
325 hci_amp_disconn(conn); 349 hci_amp_disconn(conn);
326 } else { 350 } else {
327 __u8 reason = hci_proto_disconn_ind(conn); 351 __u8 reason = hci_proto_disconn_ind(conn);
328
329 /* When we are master of an established connection
330 * and it enters the disconnect timeout, then go
331 * ahead and try to read the current clock offset.
332 *
333 * Processing of the result is done within the
334 * event handling and hci_clock_offset_evt function.
335 */
336 if (conn->type == ACL_LINK &&
337 conn->role == HCI_ROLE_MASTER) {
338 struct hci_dev *hdev = conn->hdev;
339 struct hci_cp_read_clock_offset cp;
340
341 cp.handle = cpu_to_le16(conn->handle);
342
343 hci_send_cmd(hdev, HCI_OP_READ_CLOCK_OFFSET,
344 sizeof(cp), &cp);
345 }
346
347 hci_disconnect(conn, reason); 352 hci_disconnect(conn, reason);
348 } 353 }
349 break; 354 break;
@@ -595,6 +600,7 @@ void hci_le_conn_failed(struct hci_conn *conn, u8 status)
595 conn->dst_type); 600 conn->dst_type);
596 if (params && params->conn) { 601 if (params && params->conn) {
597 hci_conn_drop(params->conn); 602 hci_conn_drop(params->conn);
603 hci_conn_put(params->conn);
598 params->conn = NULL; 604 params->conn = NULL;
599 } 605 }
600 606
@@ -1290,11 +1296,16 @@ struct hci_chan *hci_chan_create(struct hci_conn *conn)
1290 1296
1291 BT_DBG("%s hcon %p", hdev->name, conn); 1297 BT_DBG("%s hcon %p", hdev->name, conn);
1292 1298
1299 if (test_bit(HCI_CONN_DROP, &conn->flags)) {
1300 BT_DBG("Refusing to create new hci_chan");
1301 return NULL;
1302 }
1303
1293 chan = kzalloc(sizeof(*chan), GFP_KERNEL); 1304 chan = kzalloc(sizeof(*chan), GFP_KERNEL);
1294 if (!chan) 1305 if (!chan)
1295 return NULL; 1306 return NULL;
1296 1307
1297 chan->conn = conn; 1308 chan->conn = hci_conn_get(conn);
1298 skb_queue_head_init(&chan->data_q); 1309 skb_queue_head_init(&chan->data_q);
1299 chan->state = BT_CONNECTED; 1310 chan->state = BT_CONNECTED;
1300 1311
@@ -1314,7 +1325,10 @@ void hci_chan_del(struct hci_chan *chan)
1314 1325
1315 synchronize_rcu(); 1326 synchronize_rcu();
1316 1327
1317 hci_conn_drop(conn); 1328 /* Prevent new hci_chan's to be created for this hci_conn */
1329 set_bit(HCI_CONN_DROP, &conn->flags);
1330
1331 hci_conn_put(conn);
1318 1332
1319 skb_queue_purge(&chan->data_q); 1333 skb_queue_purge(&chan->data_q);
1320 kfree(chan); 1334 kfree(chan);
diff --git a/net/bluetooth/hci_core.c b/net/bluetooth/hci_core.c
index 1d9c29a00568..cb05d7f16a34 100644
--- a/net/bluetooth/hci_core.c
+++ b/net/bluetooth/hci_core.c
@@ -1898,6 +1898,8 @@ static int __hci_init(struct hci_dev *hdev)
1898 debugfs_create_u16("discov_interleaved_timeout", 0644, 1898 debugfs_create_u16("discov_interleaved_timeout", 0644,
1899 hdev->debugfs, 1899 hdev->debugfs,
1900 &hdev->discov_interleaved_timeout); 1900 &hdev->discov_interleaved_timeout);
1901
1902 smp_register(hdev);
1901 } 1903 }
1902 1904
1903 return 0; 1905 return 0;
@@ -2539,6 +2541,7 @@ static void hci_pend_le_actions_clear(struct hci_dev *hdev)
2539 list_for_each_entry(p, &hdev->le_conn_params, list) { 2541 list_for_each_entry(p, &hdev->le_conn_params, list) {
2540 if (p->conn) { 2542 if (p->conn) {
2541 hci_conn_drop(p->conn); 2543 hci_conn_drop(p->conn);
2544 hci_conn_put(p->conn);
2542 p->conn = NULL; 2545 p->conn = NULL;
2543 } 2546 }
2544 list_del_init(&p->action); 2547 list_del_init(&p->action);
@@ -3238,7 +3241,7 @@ struct smp_irk *hci_find_irk_by_rpa(struct hci_dev *hdev, bdaddr_t *rpa)
3238 } 3241 }
3239 3242
3240 list_for_each_entry(irk, &hdev->identity_resolving_keys, list) { 3243 list_for_each_entry(irk, &hdev->identity_resolving_keys, list) {
3241 if (smp_irk_matches(hdev->tfm_aes, irk->val, rpa)) { 3244 if (smp_irk_matches(hdev, irk->val, rpa)) {
3242 bacpy(&irk->rpa, rpa); 3245 bacpy(&irk->rpa, rpa);
3243 return irk; 3246 return irk;
3244 } 3247 }
@@ -3723,6 +3726,18 @@ int hci_conn_params_set(struct hci_dev *hdev, bdaddr_t *addr, u8 addr_type,
3723 return 0; 3726 return 0;
3724} 3727}
3725 3728
3729static void hci_conn_params_free(struct hci_conn_params *params)
3730{
3731 if (params->conn) {
3732 hci_conn_drop(params->conn);
3733 hci_conn_put(params->conn);
3734 }
3735
3736 list_del(&params->action);
3737 list_del(&params->list);
3738 kfree(params);
3739}
3740
3726/* This function requires the caller holds hdev->lock */ 3741/* This function requires the caller holds hdev->lock */
3727void hci_conn_params_del(struct hci_dev *hdev, bdaddr_t *addr, u8 addr_type) 3742void hci_conn_params_del(struct hci_dev *hdev, bdaddr_t *addr, u8 addr_type)
3728{ 3743{
@@ -3732,12 +3747,7 @@ void hci_conn_params_del(struct hci_dev *hdev, bdaddr_t *addr, u8 addr_type)
3732 if (!params) 3747 if (!params)
3733 return; 3748 return;
3734 3749
3735 if (params->conn) 3750 hci_conn_params_free(params);
3736 hci_conn_drop(params->conn);
3737
3738 list_del(&params->action);
3739 list_del(&params->list);
3740 kfree(params);
3741 3751
3742 hci_update_background_scan(hdev); 3752 hci_update_background_scan(hdev);
3743 3753
@@ -3764,13 +3774,8 @@ void hci_conn_params_clear_all(struct hci_dev *hdev)
3764{ 3774{
3765 struct hci_conn_params *params, *tmp; 3775 struct hci_conn_params *params, *tmp;
3766 3776
3767 list_for_each_entry_safe(params, tmp, &hdev->le_conn_params, list) { 3777 list_for_each_entry_safe(params, tmp, &hdev->le_conn_params, list)
3768 if (params->conn) 3778 hci_conn_params_free(params);
3769 hci_conn_drop(params->conn);
3770 list_del(&params->action);
3771 list_del(&params->list);
3772 kfree(params);
3773 }
3774 3779
3775 hci_update_background_scan(hdev); 3780 hci_update_background_scan(hdev);
3776 3781
@@ -3867,6 +3872,7 @@ static void set_random_addr(struct hci_request *req, bdaddr_t *rpa)
3867 if (test_bit(HCI_LE_ADV, &hdev->dev_flags) || 3872 if (test_bit(HCI_LE_ADV, &hdev->dev_flags) ||
3868 hci_conn_hash_lookup_state(hdev, LE_LINK, BT_CONNECT)) { 3873 hci_conn_hash_lookup_state(hdev, LE_LINK, BT_CONNECT)) {
3869 BT_DBG("Deferring random address update"); 3874 BT_DBG("Deferring random address update");
3875 set_bit(HCI_RPA_EXPIRED, &hdev->dev_flags);
3870 return; 3876 return;
3871 } 3877 }
3872 3878
@@ -3892,7 +3898,7 @@ int hci_update_random_address(struct hci_request *req, bool require_privacy,
3892 !bacmp(&hdev->random_addr, &hdev->rpa)) 3898 !bacmp(&hdev->random_addr, &hdev->rpa))
3893 return 0; 3899 return 0;
3894 3900
3895 err = smp_generate_rpa(hdev->tfm_aes, hdev->irk, &hdev->rpa); 3901 err = smp_generate_rpa(hdev, hdev->irk, &hdev->rpa);
3896 if (err < 0) { 3902 if (err < 0) {
3897 BT_ERR("%s failed to generate new RPA", hdev->name); 3903 BT_ERR("%s failed to generate new RPA", hdev->name);
3898 return err; 3904 return err;
@@ -4100,18 +4106,9 @@ int hci_register_dev(struct hci_dev *hdev)
4100 4106
4101 dev_set_name(&hdev->dev, "%s", hdev->name); 4107 dev_set_name(&hdev->dev, "%s", hdev->name);
4102 4108
4103 hdev->tfm_aes = crypto_alloc_blkcipher("ecb(aes)", 0,
4104 CRYPTO_ALG_ASYNC);
4105 if (IS_ERR(hdev->tfm_aes)) {
4106 BT_ERR("Unable to create crypto context");
4107 error = PTR_ERR(hdev->tfm_aes);
4108 hdev->tfm_aes = NULL;
4109 goto err_wqueue;
4110 }
4111
4112 error = device_add(&hdev->dev); 4109 error = device_add(&hdev->dev);
4113 if (error < 0) 4110 if (error < 0)
4114 goto err_tfm; 4111 goto err_wqueue;
4115 4112
4116 hdev->rfkill = rfkill_alloc(hdev->name, &hdev->dev, 4113 hdev->rfkill = rfkill_alloc(hdev->name, &hdev->dev,
4117 RFKILL_TYPE_BLUETOOTH, &hci_rfkill_ops, 4114 RFKILL_TYPE_BLUETOOTH, &hci_rfkill_ops,
@@ -4153,8 +4150,6 @@ int hci_register_dev(struct hci_dev *hdev)
4153 4150
4154 return id; 4151 return id;
4155 4152
4156err_tfm:
4157 crypto_free_blkcipher(hdev->tfm_aes);
4158err_wqueue: 4153err_wqueue:
4159 destroy_workqueue(hdev->workqueue); 4154 destroy_workqueue(hdev->workqueue);
4160 destroy_workqueue(hdev->req_workqueue); 4155 destroy_workqueue(hdev->req_workqueue);
@@ -4206,8 +4201,7 @@ void hci_unregister_dev(struct hci_dev *hdev)
4206 rfkill_destroy(hdev->rfkill); 4201 rfkill_destroy(hdev->rfkill);
4207 } 4202 }
4208 4203
4209 if (hdev->tfm_aes) 4204 smp_unregister(hdev);
4210 crypto_free_blkcipher(hdev->tfm_aes);
4211 4205
4212 device_del(&hdev->dev); 4206 device_del(&hdev->dev);
4213 4207
@@ -4380,26 +4374,6 @@ static int hci_reassembly(struct hci_dev *hdev, int type, void *data,
4380 return remain; 4374 return remain;
4381} 4375}
4382 4376
4383int hci_recv_fragment(struct hci_dev *hdev, int type, void *data, int count)
4384{
4385 int rem = 0;
4386
4387 if (type < HCI_ACLDATA_PKT || type > HCI_EVENT_PKT)
4388 return -EILSEQ;
4389
4390 while (count) {
4391 rem = hci_reassembly(hdev, type, data, count, type - 1);
4392 if (rem < 0)
4393 return rem;
4394
4395 data += (count - rem);
4396 count = rem;
4397 }
4398
4399 return rem;
4400}
4401EXPORT_SYMBOL(hci_recv_fragment);
4402
4403#define STREAM_REASSEMBLY 0 4377#define STREAM_REASSEMBLY 0
4404 4378
4405int hci_recv_stream_fragment(struct hci_dev *hdev, void *data, int count) 4379int hci_recv_stream_fragment(struct hci_dev *hdev, void *data, int count)
@@ -4553,6 +4527,7 @@ static struct sk_buff *hci_prepare_cmd(struct hci_dev *hdev, u16 opcode,
4553 BT_DBG("skb len %d", skb->len); 4527 BT_DBG("skb len %d", skb->len);
4554 4528
4555 bt_cb(skb)->pkt_type = HCI_COMMAND_PKT; 4529 bt_cb(skb)->pkt_type = HCI_COMMAND_PKT;
4530 bt_cb(skb)->opcode = opcode;
4556 4531
4557 return skb; 4532 return skb;
4558} 4533}
@@ -5690,3 +5665,52 @@ void hci_update_background_scan(struct hci_dev *hdev)
5690 if (err) 5665 if (err)
5691 BT_ERR("Failed to run HCI request: err %d", err); 5666 BT_ERR("Failed to run HCI request: err %d", err);
5692} 5667}
5668
5669static bool disconnected_whitelist_entries(struct hci_dev *hdev)
5670{
5671 struct bdaddr_list *b;
5672
5673 list_for_each_entry(b, &hdev->whitelist, list) {
5674 struct hci_conn *conn;
5675
5676 conn = hci_conn_hash_lookup_ba(hdev, ACL_LINK, &b->bdaddr);
5677 if (!conn)
5678 return true;
5679
5680 if (conn->state != BT_CONNECTED && conn->state != BT_CONFIG)
5681 return true;
5682 }
5683
5684 return false;
5685}
5686
5687void hci_update_page_scan(struct hci_dev *hdev, struct hci_request *req)
5688{
5689 u8 scan;
5690
5691 if (!test_bit(HCI_BREDR_ENABLED, &hdev->dev_flags))
5692 return;
5693
5694 if (!hdev_is_powered(hdev))
5695 return;
5696
5697 if (mgmt_powering_down(hdev))
5698 return;
5699
5700 if (test_bit(HCI_CONNECTABLE, &hdev->dev_flags) ||
5701 disconnected_whitelist_entries(hdev))
5702 scan = SCAN_PAGE;
5703 else
5704 scan = SCAN_DISABLED;
5705
5706 if (test_bit(HCI_PSCAN, &hdev->flags) == !!(scan & SCAN_PAGE))
5707 return;
5708
5709 if (test_bit(HCI_DISCOVERABLE, &hdev->dev_flags))
5710 scan |= SCAN_INQUIRY;
5711
5712 if (req)
5713 hci_req_add(req, HCI_OP_WRITE_SCAN_ENABLE, 1, &scan);
5714 else
5715 hci_send_cmd(hdev, HCI_OP_WRITE_SCAN_ENABLE, 1, &scan);
5716}
diff --git a/net/bluetooth/hci_event.c b/net/bluetooth/hci_event.c
index a6000823f0ff..8b0a2a6de419 100644
--- a/net/bluetooth/hci_event.c
+++ b/net/bluetooth/hci_event.c
@@ -2071,6 +2071,8 @@ static void hci_conn_complete_evt(struct hci_dev *hdev, struct sk_buff *skb)
2071 cp.handle = ev->handle; 2071 cp.handle = ev->handle;
2072 hci_send_cmd(hdev, HCI_OP_READ_REMOTE_FEATURES, 2072 hci_send_cmd(hdev, HCI_OP_READ_REMOTE_FEATURES,
2073 sizeof(cp), &cp); 2073 sizeof(cp), &cp);
2074
2075 hci_update_page_scan(hdev, NULL);
2074 } 2076 }
2075 2077
2076 /* Set packet type for incoming connection */ 2078 /* Set packet type for incoming connection */
@@ -2247,9 +2249,12 @@ static void hci_disconn_complete_evt(struct hci_dev *hdev, struct sk_buff *skb)
2247 mgmt_device_disconnected(hdev, &conn->dst, conn->type, conn->dst_type, 2249 mgmt_device_disconnected(hdev, &conn->dst, conn->type, conn->dst_type,
2248 reason, mgmt_connected); 2250 reason, mgmt_connected);
2249 2251
2250 if (conn->type == ACL_LINK && 2252 if (conn->type == ACL_LINK) {
2251 test_bit(HCI_CONN_FLUSH_KEY, &conn->flags)) 2253 if (test_bit(HCI_CONN_FLUSH_KEY, &conn->flags))
2252 hci_remove_link_key(hdev, &conn->dst); 2254 hci_remove_link_key(hdev, &conn->dst);
2255
2256 hci_update_page_scan(hdev, NULL);
2257 }
2253 2258
2254 params = hci_conn_params_lookup(hdev, &conn->dst, conn->dst_type); 2259 params = hci_conn_params_lookup(hdev, &conn->dst, conn->dst_type);
2255 if (params) { 2260 if (params) {
@@ -2315,8 +2320,7 @@ static void hci_auth_complete_evt(struct hci_dev *hdev, struct sk_buff *skb)
2315 conn->sec_level = conn->pending_sec_level; 2320 conn->sec_level = conn->pending_sec_level;
2316 } 2321 }
2317 } else { 2322 } else {
2318 mgmt_auth_failed(hdev, &conn->dst, conn->type, conn->dst_type, 2323 mgmt_auth_failed(conn, ev->status);
2319 ev->status);
2320 } 2324 }
2321 2325
2322 clear_bit(HCI_CONN_AUTH_PEND, &conn->flags); 2326 clear_bit(HCI_CONN_AUTH_PEND, &conn->flags);
@@ -2434,6 +2438,12 @@ static void hci_encrypt_change_evt(struct hci_dev *hdev, struct sk_buff *skb)
2434 } 2438 }
2435 } 2439 }
2436 2440
2441 /* We should disregard the current RPA and generate a new one
2442 * whenever the encryption procedure fails.
2443 */
2444 if (ev->status && conn->type == LE_LINK)
2445 set_bit(HCI_RPA_EXPIRED, &hdev->dev_flags);
2446
2437 clear_bit(HCI_CONN_ENCRYPT_PEND, &conn->flags); 2447 clear_bit(HCI_CONN_ENCRYPT_PEND, &conn->flags);
2438 2448
2439 if (ev->status && conn->state == BT_CONNECTED) { 2449 if (ev->status && conn->state == BT_CONNECTED) {
@@ -3895,8 +3905,7 @@ static void hci_simple_pair_complete_evt(struct hci_dev *hdev,
3895 * event gets always produced as initiator and is also mapped to 3905 * event gets always produced as initiator and is also mapped to
3896 * the mgmt_auth_failed event */ 3906 * the mgmt_auth_failed event */
3897 if (!test_bit(HCI_CONN_AUTH_PEND, &conn->flags) && ev->status) 3907 if (!test_bit(HCI_CONN_AUTH_PEND, &conn->flags) && ev->status)
3898 mgmt_auth_failed(hdev, &conn->dst, conn->type, conn->dst_type, 3908 mgmt_auth_failed(conn, ev->status);
3899 ev->status);
3900 3909
3901 hci_conn_drop(conn); 3910 hci_conn_drop(conn);
3902 3911
@@ -4188,16 +4197,16 @@ static void hci_le_conn_complete_evt(struct hci_dev *hdev, struct sk_buff *skb)
4188 conn->dst_type = irk->addr_type; 4197 conn->dst_type = irk->addr_type;
4189 } 4198 }
4190 4199
4191 if (conn->dst_type == ADDR_LE_DEV_PUBLIC)
4192 addr_type = BDADDR_LE_PUBLIC;
4193 else
4194 addr_type = BDADDR_LE_RANDOM;
4195
4196 if (ev->status) { 4200 if (ev->status) {
4197 hci_le_conn_failed(conn, ev->status); 4201 hci_le_conn_failed(conn, ev->status);
4198 goto unlock; 4202 goto unlock;
4199 } 4203 }
4200 4204
4205 if (conn->dst_type == ADDR_LE_DEV_PUBLIC)
4206 addr_type = BDADDR_LE_PUBLIC;
4207 else
4208 addr_type = BDADDR_LE_RANDOM;
4209
4201 /* Drop the connection if the device is blocked */ 4210 /* Drop the connection if the device is blocked */
4202 if (hci_bdaddr_list_lookup(&hdev->blacklist, &conn->dst, addr_type)) { 4211 if (hci_bdaddr_list_lookup(&hdev->blacklist, &conn->dst, addr_type)) {
4203 hci_conn_drop(conn); 4212 hci_conn_drop(conn);
@@ -4220,11 +4229,13 @@ static void hci_le_conn_complete_evt(struct hci_dev *hdev, struct sk_buff *skb)
4220 4229
4221 hci_proto_connect_cfm(conn, ev->status); 4230 hci_proto_connect_cfm(conn, ev->status);
4222 4231
4223 params = hci_conn_params_lookup(hdev, &conn->dst, conn->dst_type); 4232 params = hci_pend_le_action_lookup(&hdev->pend_le_conns, &conn->dst,
4233 conn->dst_type);
4224 if (params) { 4234 if (params) {
4225 list_del_init(&params->action); 4235 list_del_init(&params->action);
4226 if (params->conn) { 4236 if (params->conn) {
4227 hci_conn_drop(params->conn); 4237 hci_conn_drop(params->conn);
4238 hci_conn_put(params->conn);
4228 params->conn = NULL; 4239 params->conn = NULL;
4229 } 4240 }
4230 } 4241 }
@@ -4316,7 +4327,7 @@ static void check_pending_le_conn(struct hci_dev *hdev, bdaddr_t *addr,
4316 * the parameters get removed and keep the reference 4327 * the parameters get removed and keep the reference
4317 * count consistent once the connection is established. 4328 * count consistent once the connection is established.
4318 */ 4329 */
4319 params->conn = conn; 4330 params->conn = hci_conn_get(conn);
4320 return; 4331 return;
4321 } 4332 }
4322 4333
@@ -4501,10 +4512,7 @@ static void hci_le_ltk_request_evt(struct hci_dev *hdev, struct sk_buff *skb)
4501 memcpy(cp.ltk, ltk->val, sizeof(ltk->val)); 4512 memcpy(cp.ltk, ltk->val, sizeof(ltk->val));
4502 cp.handle = cpu_to_le16(conn->handle); 4513 cp.handle = cpu_to_le16(conn->handle);
4503 4514
4504 if (ltk->authenticated) 4515 conn->pending_sec_level = smp_ltk_sec_level(ltk);
4505 conn->pending_sec_level = BT_SECURITY_HIGH;
4506 else
4507 conn->pending_sec_level = BT_SECURITY_MEDIUM;
4508 4516
4509 conn->enc_key_size = ltk->enc_size; 4517 conn->enc_key_size = ltk->enc_size;
4510 4518
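The LTK request handler above replaces the open-coded authenticated-bit check with smp_ltk_sec_level(), so the key-to-security-level mapping lives in one place. As a hedged sketch of what such a mapping amounts to (enum names and values here are simplified stand-ins, not the kernel's definitions, and the real helper may take more key properties into account):

enum key_auth { KEY_UNAUTHENTICATED, KEY_AUTHENTICATED };
enum sec_level { SEC_MEDIUM = 2, SEC_HIGH = 3 };

/* Keys produced by MITM-protected pairing grant high security; keys from
 * "just works" pairing only medium, mirroring the branch that was removed.
 */
static enum sec_level ltk_to_sec_level(enum key_auth auth)
{
        return auth == KEY_AUTHENTICATED ? SEC_HIGH : SEC_MEDIUM;
}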
diff --git a/net/bluetooth/hidp/core.c b/net/bluetooth/hidp/core.c
index 6c7ecf116e74..1b7d605706aa 100644
--- a/net/bluetooth/hidp/core.c
+++ b/net/bluetooth/hidp/core.c
@@ -915,7 +915,7 @@ static int hidp_session_new(struct hidp_session **out, const bdaddr_t *bdaddr,
915 915
916 /* connection management */ 916 /* connection management */
917 bacpy(&session->bdaddr, bdaddr); 917 bacpy(&session->bdaddr, bdaddr);
918 session->conn = conn; 918 session->conn = l2cap_conn_get(conn);
919 session->user.probe = hidp_session_probe; 919 session->user.probe = hidp_session_probe;
920 session->user.remove = hidp_session_remove; 920 session->user.remove = hidp_session_remove;
921 session->ctrl_sock = ctrl_sock; 921 session->ctrl_sock = ctrl_sock;
@@ -941,13 +941,13 @@ static int hidp_session_new(struct hidp_session **out, const bdaddr_t *bdaddr,
941 if (ret) 941 if (ret)
942 goto err_free; 942 goto err_free;
943 943
944 l2cap_conn_get(session->conn);
945 get_file(session->intr_sock->file); 944 get_file(session->intr_sock->file);
946 get_file(session->ctrl_sock->file); 945 get_file(session->ctrl_sock->file);
947 *out = session; 946 *out = session;
948 return 0; 947 return 0;
949 948
950err_free: 949err_free:
950 l2cap_conn_put(session->conn);
951 kfree(session); 951 kfree(session);
952 return ret; 952 return ret;
953} 953}
@@ -1327,10 +1327,8 @@ int hidp_connection_add(struct hidp_connadd_req *req,
1327 1327
1328 conn = NULL; 1328 conn = NULL;
1329 l2cap_chan_lock(chan); 1329 l2cap_chan_lock(chan);
1330 if (chan->conn) { 1330 if (chan->conn)
1331 l2cap_conn_get(chan->conn); 1331 conn = l2cap_conn_get(chan->conn);
1332 conn = chan->conn;
1333 }
1334 l2cap_chan_unlock(chan); 1332 l2cap_chan_unlock(chan);
1335 1333
1336 if (!conn) 1334 if (!conn)
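Both hidp hunks lean on l2cap_conn_get() handing back the connection it just referenced (the l2cap side of that change appears further down), so taking a reference and storing the pointer collapses into one expression, and the error path gains a matching l2cap_conn_put(). A minimal userspace illustration of the get-returns-object pattern, with a plain integer standing in for the kref:

#include <assert.h>

struct conn {
        int refcnt;     /* stand-in for the kref embedded in the real struct */
};

static struct conn *conn_get(struct conn *c)
{
        c->refcnt++;
        return c;       /* returning the object enables "x = conn_get(y);" */
}

static void conn_put(struct conn *c)
{
        assert(c->refcnt > 0);
        c->refcnt--;
}

int main(void)
{
        struct conn c = { .refcnt = 1 };
        struct conn *session_ref = conn_get(&c);        /* session setup */

        conn_put(session_ref);                          /* session teardown */
        conn_put(&c);                                   /* original owner */
        return 0;
}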
diff --git a/net/bluetooth/l2cap_core.c b/net/bluetooth/l2cap_core.c
index 46547b920f88..b6f9777e057d 100644
--- a/net/bluetooth/l2cap_core.c
+++ b/net/bluetooth/l2cap_core.c
@@ -210,6 +210,10 @@ int l2cap_add_scid(struct l2cap_chan *chan, __u16 scid)
210{ 210{
211 write_lock(&chan_list_lock); 211 write_lock(&chan_list_lock);
212 212
213 /* Override the defaults (which are for conn-oriented) */
214 chan->omtu = L2CAP_DEFAULT_MTU;
215 chan->chan_type = L2CAP_CHAN_FIXED;
216
213 chan->scid = scid; 217 chan->scid = scid;
214 218
215 write_unlock(&chan_list_lock); 219 write_unlock(&chan_list_lock);
@@ -542,7 +546,10 @@ void __l2cap_chan_add(struct l2cap_conn *conn, struct l2cap_chan *chan)
542 546
543 l2cap_chan_hold(chan); 547 l2cap_chan_hold(chan);
544 548
545 hci_conn_hold(conn->hcon); 549 /* Only keep a reference for fixed channels if they requested it */
550 if (chan->chan_type != L2CAP_CHAN_FIXED ||
551 test_bit(FLAG_HOLD_HCI_CONN, &chan->flags))
552 hci_conn_hold(conn->hcon);
546 553
547 list_add(&chan->list, &conn->chan_l); 554 list_add(&chan->list, &conn->chan_l);
548} 555}
@@ -562,6 +569,8 @@ void l2cap_chan_del(struct l2cap_chan *chan, int err)
562 569
563 BT_DBG("chan %p, conn %p, err %d", chan, conn, err); 570 BT_DBG("chan %p, conn %p, err %d", chan, conn, err);
564 571
572 chan->ops->teardown(chan, err);
573
565 if (conn) { 574 if (conn) {
566 struct amp_mgr *mgr = conn->hcon->amp_mgr; 575 struct amp_mgr *mgr = conn->hcon->amp_mgr;
567 /* Delete from channel list */ 576 /* Delete from channel list */
@@ -571,7 +580,12 @@ void l2cap_chan_del(struct l2cap_chan *chan, int err)
571 580
572 chan->conn = NULL; 581 chan->conn = NULL;
573 582
574 if (chan->scid != L2CAP_CID_A2MP) 583 /* Reference was only held for non-fixed channels or
584 * fixed channels that explicitly requested it using the
585 * FLAG_HOLD_HCI_CONN flag.
586 */
587 if (chan->chan_type != L2CAP_CHAN_FIXED ||
588 test_bit(FLAG_HOLD_HCI_CONN, &chan->flags))
575 hci_conn_drop(conn->hcon); 589 hci_conn_drop(conn->hcon);
576 590
577 if (mgr && mgr->bredr_chan == chan) 591 if (mgr && mgr->bredr_chan == chan)
@@ -585,8 +599,6 @@ void l2cap_chan_del(struct l2cap_chan *chan, int err)
585 amp_disconnect_logical_link(hs_hchan); 599 amp_disconnect_logical_link(hs_hchan);
586 } 600 }
587 601
588 chan->ops->teardown(chan, err);
589
590 if (test_bit(CONF_NOT_COMPLETE, &chan->conf_state)) 602 if (test_bit(CONF_NOT_COMPLETE, &chan->conf_state))
591 return; 603 return;
592 604
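The hold in __l2cap_chan_add() and the drop in l2cap_chan_del() above now use the same condition, which is what keeps the hci_conn reference balanced: fixed channels pin the baseband link only if they opted in through FLAG_HOLD_HCI_CONN. A small sketch of keeping that predicate in a single place (simplified types, not the kernel structures):

#include <stdbool.h>

enum chan_type { CHAN_CONN_ORIENTED, CHAN_FIXED, CHAN_RAW };

struct chan {
        enum chan_type type;
        bool hold_hci_conn;     /* stand-in for FLAG_HOLD_HCI_CONN */
};

/* Must be evaluated identically on the add and on the delete path;
 * diverging conditions would leak or underflow the hci_conn refcount.
 */
static bool chan_holds_hci_conn(const struct chan *c)
{
        return c->type != CHAN_FIXED || c->hold_hci_conn;
}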
@@ -619,9 +631,11 @@ void l2cap_chan_del(struct l2cap_chan *chan, int err)
619} 631}
620EXPORT_SYMBOL_GPL(l2cap_chan_del); 632EXPORT_SYMBOL_GPL(l2cap_chan_del);
621 633
622void l2cap_conn_update_id_addr(struct hci_conn *hcon) 634static void l2cap_conn_update_id_addr(struct work_struct *work)
623{ 635{
624 struct l2cap_conn *conn = hcon->l2cap_data; 636 struct l2cap_conn *conn = container_of(work, struct l2cap_conn,
637 id_addr_update_work);
638 struct hci_conn *hcon = conn->hcon;
625 struct l2cap_chan *chan; 639 struct l2cap_chan *chan;
626 640
627 mutex_lock(&conn->chan_lock); 641 mutex_lock(&conn->chan_lock);
@@ -1082,6 +1096,9 @@ static void l2cap_send_rr_or_rnr(struct l2cap_chan *chan, bool poll)
1082 1096
1083static inline int __l2cap_no_conn_pending(struct l2cap_chan *chan) 1097static inline int __l2cap_no_conn_pending(struct l2cap_chan *chan)
1084{ 1098{
1099 if (chan->chan_type != L2CAP_CHAN_CONN_ORIENTED)
1100 return true;
1101
1085 return !test_bit(CONF_CONNECT_PEND, &chan->conf_state); 1102 return !test_bit(CONF_CONNECT_PEND, &chan->conf_state);
1086} 1103}
1087 1104
@@ -1266,6 +1283,24 @@ static void l2cap_start_connection(struct l2cap_chan *chan)
1266 } 1283 }
1267} 1284}
1268 1285
1286static void l2cap_request_info(struct l2cap_conn *conn)
1287{
1288 struct l2cap_info_req req;
1289
1290 if (conn->info_state & L2CAP_INFO_FEAT_MASK_REQ_SENT)
1291 return;
1292
1293 req.type = cpu_to_le16(L2CAP_IT_FEAT_MASK);
1294
1295 conn->info_state |= L2CAP_INFO_FEAT_MASK_REQ_SENT;
1296 conn->info_ident = l2cap_get_ident(conn);
1297
1298 schedule_delayed_work(&conn->info_timer, L2CAP_INFO_TIMEOUT);
1299
1300 l2cap_send_cmd(conn, conn->info_ident, L2CAP_INFO_REQ,
1301 sizeof(req), &req);
1302}
1303
1269static void l2cap_do_start(struct l2cap_chan *chan) 1304static void l2cap_do_start(struct l2cap_chan *chan)
1270{ 1305{
1271 struct l2cap_conn *conn = chan->conn; 1306 struct l2cap_conn *conn = chan->conn;
@@ -1275,26 +1310,17 @@ static void l2cap_do_start(struct l2cap_chan *chan)
1275 return; 1310 return;
1276 } 1311 }
1277 1312
1278 if (conn->info_state & L2CAP_INFO_FEAT_MASK_REQ_SENT) { 1313 if (!(conn->info_state & L2CAP_INFO_FEAT_MASK_REQ_SENT)) {
1279 if (!(conn->info_state & L2CAP_INFO_FEAT_MASK_REQ_DONE)) 1314 l2cap_request_info(conn);
1280 return; 1315 return;
1281 1316 }
1282 if (l2cap_chan_check_security(chan, true) &&
1283 __l2cap_no_conn_pending(chan)) {
1284 l2cap_start_connection(chan);
1285 }
1286 } else {
1287 struct l2cap_info_req req;
1288 req.type = cpu_to_le16(L2CAP_IT_FEAT_MASK);
1289
1290 conn->info_state |= L2CAP_INFO_FEAT_MASK_REQ_SENT;
1291 conn->info_ident = l2cap_get_ident(conn);
1292 1317
1293 schedule_delayed_work(&conn->info_timer, L2CAP_INFO_TIMEOUT); 1318 if (!(conn->info_state & L2CAP_INFO_FEAT_MASK_REQ_DONE))
1319 return;
1294 1320
1295 l2cap_send_cmd(conn, conn->info_ident, L2CAP_INFO_REQ, 1321 if (l2cap_chan_check_security(chan, true) &&
1296 sizeof(req), &req); 1322 __l2cap_no_conn_pending(chan))
1297 } 1323 l2cap_start_connection(chan);
1298} 1324}
1299 1325
1300static inline int l2cap_mode_supported(__u8 mode, __u32 feat_mask) 1326static inline int l2cap_mode_supported(__u8 mode, __u32 feat_mask)
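With the information request factored out into l2cap_request_info(), l2cap_do_start() boils down to a short chain of early returns: send the feature-mask request once, wait for it to complete, then run the security check and issue the connect request. A condensed model of that control flow, with booleans standing in for the info_state bits:

#include <stdbool.h>

struct conn_info_state {
        bool req_sent;          /* L2CAP_INFO_FEAT_MASK_REQ_SENT */
        bool req_done;          /* L2CAP_INFO_FEAT_MASK_REQ_DONE */
};

static void request_info(struct conn_info_state *s)
{
        if (s->req_sent)
                return;         /* at most one outstanding request per connection */
        s->req_sent = true;
        /* ...send L2CAP_INFO_REQ and arm the info timer... */
}

/* Returns true once the caller may proceed to the connect request. */
static bool may_start_connection(struct conn_info_state *s)
{
        if (!s->req_sent) {
                request_info(s);
                return false;
        }

        if (!s->req_done)
                return false;

        return true;            /* security check and connect request follow */
}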
@@ -1353,6 +1379,7 @@ static void l2cap_conn_start(struct l2cap_conn *conn)
1353 l2cap_chan_lock(chan); 1379 l2cap_chan_lock(chan);
1354 1380
1355 if (chan->chan_type != L2CAP_CHAN_CONN_ORIENTED) { 1381 if (chan->chan_type != L2CAP_CHAN_CONN_ORIENTED) {
1382 l2cap_chan_ready(chan);
1356 l2cap_chan_unlock(chan); 1383 l2cap_chan_unlock(chan);
1357 continue; 1384 continue;
1358 } 1385 }
@@ -1417,71 +1444,18 @@ static void l2cap_conn_start(struct l2cap_conn *conn)
1417 mutex_unlock(&conn->chan_lock); 1444 mutex_unlock(&conn->chan_lock);
1418} 1445}
1419 1446
1420/* Find socket with cid and source/destination bdaddr.
1421 * Returns closest match, locked.
1422 */
1423static struct l2cap_chan *l2cap_global_chan_by_scid(int state, u16 cid,
1424 bdaddr_t *src,
1425 bdaddr_t *dst)
1426{
1427 struct l2cap_chan *c, *c1 = NULL;
1428
1429 read_lock(&chan_list_lock);
1430
1431 list_for_each_entry(c, &chan_list, global_l) {
1432 if (state && c->state != state)
1433 continue;
1434
1435 if (c->scid == cid) {
1436 int src_match, dst_match;
1437 int src_any, dst_any;
1438
1439 /* Exact match. */
1440 src_match = !bacmp(&c->src, src);
1441 dst_match = !bacmp(&c->dst, dst);
1442 if (src_match && dst_match) {
1443 read_unlock(&chan_list_lock);
1444 return c;
1445 }
1446
1447 /* Closest match */
1448 src_any = !bacmp(&c->src, BDADDR_ANY);
1449 dst_any = !bacmp(&c->dst, BDADDR_ANY);
1450 if ((src_match && dst_any) || (src_any && dst_match) ||
1451 (src_any && dst_any))
1452 c1 = c;
1453 }
1454 }
1455
1456 read_unlock(&chan_list_lock);
1457
1458 return c1;
1459}
1460
1461static void l2cap_le_conn_ready(struct l2cap_conn *conn) 1447static void l2cap_le_conn_ready(struct l2cap_conn *conn)
1462{ 1448{
1463 struct hci_conn *hcon = conn->hcon; 1449 struct hci_conn *hcon = conn->hcon;
1464 struct hci_dev *hdev = hcon->hdev; 1450 struct hci_dev *hdev = hcon->hdev;
1465 struct l2cap_chan *chan, *pchan;
1466 u8 dst_type;
1467 1451
1468 BT_DBG(""); 1452 BT_DBG("%s conn %p", hdev->name, conn);
1469
1470 /* Check if we have socket listening on cid */
1471 pchan = l2cap_global_chan_by_scid(BT_LISTEN, L2CAP_CID_ATT,
1472 &hcon->src, &hcon->dst);
1473 if (!pchan)
1474 return;
1475
1476 /* Client ATT sockets should override the server one */
1477 if (__l2cap_get_chan_by_dcid(conn, L2CAP_CID_ATT))
1478 return;
1479
1480 dst_type = bdaddr_type(hcon, hcon->dst_type);
1481 1453
1482 /* If device is blocked, do not create a channel for it */ 1454 /* For outgoing pairing which doesn't necessarily have an
1483 if (hci_bdaddr_list_lookup(&hdev->blacklist, &hcon->dst, dst_type)) 1455 * associated socket (e.g. mgmt_pair_device).
1484 return; 1456 */
1457 if (hcon->out)
1458 smp_conn_security(hcon, hcon->pending_sec_level);
1485 1459
1486 /* For LE slave connections, make sure the connection interval 1460 /* For LE slave connections, make sure the connection interval
1487 * is in the range of the minium and maximum interval that has 1461 * is in the range of the minium and maximum interval that has
@@ -1501,22 +1475,6 @@ static void l2cap_le_conn_ready(struct l2cap_conn *conn)
1501 l2cap_send_cmd(conn, l2cap_get_ident(conn), 1475 l2cap_send_cmd(conn, l2cap_get_ident(conn),
1502 L2CAP_CONN_PARAM_UPDATE_REQ, sizeof(req), &req); 1476 L2CAP_CONN_PARAM_UPDATE_REQ, sizeof(req), &req);
1503 } 1477 }
1504
1505 l2cap_chan_lock(pchan);
1506
1507 chan = pchan->ops->new_connection(pchan);
1508 if (!chan)
1509 goto clean;
1510
1511 bacpy(&chan->src, &hcon->src);
1512 bacpy(&chan->dst, &hcon->dst);
1513 chan->src_type = bdaddr_type(hcon, hcon->src_type);
1514 chan->dst_type = dst_type;
1515
1516 __l2cap_chan_add(conn, chan);
1517
1518clean:
1519 l2cap_chan_unlock(pchan);
1520} 1478}
1521 1479
1522static void l2cap_conn_ready(struct l2cap_conn *conn) 1480static void l2cap_conn_ready(struct l2cap_conn *conn)
@@ -1526,17 +1484,11 @@ static void l2cap_conn_ready(struct l2cap_conn *conn)
1526 1484
1527 BT_DBG("conn %p", conn); 1485 BT_DBG("conn %p", conn);
1528 1486
1529 /* For outgoing pairing which doesn't necessarily have an 1487 if (hcon->type == ACL_LINK)
1530 * associated socket (e.g. mgmt_pair_device). 1488 l2cap_request_info(conn);
1531 */
1532 if (hcon->out && hcon->type == LE_LINK)
1533 smp_conn_security(hcon, hcon->pending_sec_level);
1534 1489
1535 mutex_lock(&conn->chan_lock); 1490 mutex_lock(&conn->chan_lock);
1536 1491
1537 if (hcon->type == LE_LINK)
1538 l2cap_le_conn_ready(conn);
1539
1540 list_for_each_entry(chan, &conn->chan_l, list) { 1492 list_for_each_entry(chan, &conn->chan_l, list) {
1541 1493
1542 l2cap_chan_lock(chan); 1494 l2cap_chan_lock(chan);
@@ -1549,8 +1501,8 @@ static void l2cap_conn_ready(struct l2cap_conn *conn)
1549 if (hcon->type == LE_LINK) { 1501 if (hcon->type == LE_LINK) {
1550 l2cap_le_start(chan); 1502 l2cap_le_start(chan);
1551 } else if (chan->chan_type != L2CAP_CHAN_CONN_ORIENTED) { 1503 } else if (chan->chan_type != L2CAP_CHAN_CONN_ORIENTED) {
1552 l2cap_chan_ready(chan); 1504 if (conn->info_state & L2CAP_INFO_FEAT_MASK_REQ_DONE)
1553 1505 l2cap_chan_ready(chan);
1554 } else if (chan->state == BT_CONNECT) { 1506 } else if (chan->state == BT_CONNECT) {
1555 l2cap_do_start(chan); 1507 l2cap_do_start(chan);
1556 } 1508 }
@@ -1560,6 +1512,9 @@ static void l2cap_conn_ready(struct l2cap_conn *conn)
1560 1512
1561 mutex_unlock(&conn->chan_lock); 1513 mutex_unlock(&conn->chan_lock);
1562 1514
1515 if (hcon->type == LE_LINK)
1516 l2cap_le_conn_ready(conn);
1517
1563 queue_work(hcon->hdev->workqueue, &conn->pending_rx_work); 1518 queue_work(hcon->hdev->workqueue, &conn->pending_rx_work);
1564} 1519}
1565 1520
@@ -1695,8 +1650,14 @@ static void l2cap_conn_del(struct hci_conn *hcon, int err)
1695 if (work_pending(&conn->pending_rx_work)) 1650 if (work_pending(&conn->pending_rx_work))
1696 cancel_work_sync(&conn->pending_rx_work); 1651 cancel_work_sync(&conn->pending_rx_work);
1697 1652
1653 if (work_pending(&conn->id_addr_update_work))
1654 cancel_work_sync(&conn->id_addr_update_work);
1655
1698 l2cap_unregister_all_users(conn); 1656 l2cap_unregister_all_users(conn);
1699 1657
1658 /* Force the connection to be immediately dropped */
1659 hcon->disc_timeout = 0;
1660
1700 mutex_lock(&conn->chan_lock); 1661 mutex_lock(&conn->chan_lock);
1701 1662
1702 /* Kill channels */ 1663 /* Kill channels */
@@ -1719,29 +1680,11 @@ static void l2cap_conn_del(struct hci_conn *hcon, int err)
1719 if (conn->info_state & L2CAP_INFO_FEAT_MASK_REQ_SENT) 1680 if (conn->info_state & L2CAP_INFO_FEAT_MASK_REQ_SENT)
1720 cancel_delayed_work_sync(&conn->info_timer); 1681 cancel_delayed_work_sync(&conn->info_timer);
1721 1682
1722 if (test_and_clear_bit(HCI_CONN_LE_SMP_PEND, &hcon->flags)) {
1723 cancel_delayed_work_sync(&conn->security_timer);
1724 smp_chan_destroy(conn);
1725 }
1726
1727 hcon->l2cap_data = NULL; 1683 hcon->l2cap_data = NULL;
1728 conn->hchan = NULL; 1684 conn->hchan = NULL;
1729 l2cap_conn_put(conn); 1685 l2cap_conn_put(conn);
1730} 1686}
1731 1687
1732static void security_timeout(struct work_struct *work)
1733{
1734 struct l2cap_conn *conn = container_of(work, struct l2cap_conn,
1735 security_timer.work);
1736
1737 BT_DBG("conn %p", conn);
1738
1739 if (test_and_clear_bit(HCI_CONN_LE_SMP_PEND, &conn->hcon->flags)) {
1740 smp_chan_destroy(conn);
1741 l2cap_conn_del(conn->hcon, ETIMEDOUT);
1742 }
1743}
1744
1745static void l2cap_conn_free(struct kref *ref) 1688static void l2cap_conn_free(struct kref *ref)
1746{ 1689{
1747 struct l2cap_conn *conn = container_of(ref, struct l2cap_conn, ref); 1690 struct l2cap_conn *conn = container_of(ref, struct l2cap_conn, ref);
@@ -1750,9 +1693,10 @@ static void l2cap_conn_free(struct kref *ref)
1750 kfree(conn); 1693 kfree(conn);
1751} 1694}
1752 1695
1753void l2cap_conn_get(struct l2cap_conn *conn) 1696struct l2cap_conn *l2cap_conn_get(struct l2cap_conn *conn)
1754{ 1697{
1755 kref_get(&conn->ref); 1698 kref_get(&conn->ref);
1699 return conn;
1756} 1700}
1757EXPORT_SYMBOL(l2cap_conn_get); 1701EXPORT_SYMBOL(l2cap_conn_get);
1758 1702
@@ -1794,6 +1738,7 @@ static struct l2cap_chan *l2cap_global_chan_by_psm(int state, __le16 psm,
1794 src_match = !bacmp(&c->src, src); 1738 src_match = !bacmp(&c->src, src);
1795 dst_match = !bacmp(&c->dst, dst); 1739 dst_match = !bacmp(&c->dst, dst);
1796 if (src_match && dst_match) { 1740 if (src_match && dst_match) {
1741 l2cap_chan_hold(c);
1797 read_unlock(&chan_list_lock); 1742 read_unlock(&chan_list_lock);
1798 return c; 1743 return c;
1799 } 1744 }
@@ -1807,6 +1752,9 @@ static struct l2cap_chan *l2cap_global_chan_by_psm(int state, __le16 psm,
1807 } 1752 }
1808 } 1753 }
1809 1754
1755 if (c1)
1756 l2cap_chan_hold(c1);
1757
1810 read_unlock(&chan_list_lock); 1758 read_unlock(&chan_list_lock);
1811 1759
1812 return c1; 1760 return c1;
@@ -2027,10 +1975,12 @@ static void l2cap_ertm_resend(struct l2cap_chan *chan)
2027 tx_skb->data + L2CAP_HDR_SIZE); 1975 tx_skb->data + L2CAP_HDR_SIZE);
2028 } 1976 }
2029 1977
1978 /* Update FCS */
2030 if (chan->fcs == L2CAP_FCS_CRC16) { 1979 if (chan->fcs == L2CAP_FCS_CRC16) {
2031 u16 fcs = crc16(0, (u8 *) tx_skb->data, tx_skb->len); 1980 u16 fcs = crc16(0, (u8 *) tx_skb->data,
2032 put_unaligned_le16(fcs, skb_put(tx_skb, 1981 tx_skb->len - L2CAP_FCS_SIZE);
2033 L2CAP_FCS_SIZE)); 1982 put_unaligned_le16(fcs, skb_tail_pointer(tx_skb) -
1983 L2CAP_FCS_SIZE);
2034 } 1984 }
2035 1985
2036 l2cap_do_send(chan, tx_skb); 1986 l2cap_do_send(chan, tx_skb);
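The retransmission fix above matters because a resent frame already ends with an FCS field: the old code appended a fresh checksum with skb_put(), growing the frame on every retry, while the new code recomputes the CRC over everything except the trailing two bytes and overwrites them in place. A standalone illustration of the in-place variant; crc16() is only a placeholder prototype for the kernel's lib/crc16 routine.

#include <stddef.h>
#include <stdint.h>

#define FCS_SIZE 2

uint16_t crc16(uint16_t seed, const uint8_t *buf, size_t len);  /* placeholder */

static void refresh_fcs(uint8_t *frame, size_t frame_len)
{
        /* Checksum covers the frame minus its existing FCS field... */
        uint16_t fcs = crc16(0, frame, frame_len - FCS_SIZE);

        /* ...and is written back over that field (little endian) instead
         * of being appended a second time.
         */
        frame[frame_len - FCS_SIZE]     = fcs & 0xff;
        frame[frame_len - FCS_SIZE + 1] = fcs >> 8;
}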
@@ -2334,7 +2284,6 @@ static int l2cap_segment_sdu(struct l2cap_chan *chan,
2334 } else { 2284 } else {
2335 sar = L2CAP_SAR_START; 2285 sar = L2CAP_SAR_START;
2336 sdu_len = len; 2286 sdu_len = len;
2337 pdu_len -= L2CAP_SDULEN_SIZE;
2338 } 2287 }
2339 2288
2340 while (len > 0) { 2289 while (len > 0) {
@@ -2349,10 +2298,8 @@ static int l2cap_segment_sdu(struct l2cap_chan *chan,
2349 __skb_queue_tail(seg_queue, skb); 2298 __skb_queue_tail(seg_queue, skb);
2350 2299
2351 len -= pdu_len; 2300 len -= pdu_len;
2352 if (sdu_len) { 2301 if (sdu_len)
2353 sdu_len = 0; 2302 sdu_len = 0;
2354 pdu_len += L2CAP_SDULEN_SIZE;
2355 }
2356 2303
2357 if (len <= pdu_len) { 2304 if (len <= pdu_len) {
2358 sar = L2CAP_SAR_END; 2305 sar = L2CAP_SAR_END;
@@ -2418,12 +2365,8 @@ static int l2cap_segment_le_sdu(struct l2cap_chan *chan,
2418 2365
2419 BT_DBG("chan %p, msg %p, len %zu", chan, msg, len); 2366 BT_DBG("chan %p, msg %p, len %zu", chan, msg, len);
2420 2367
2421 pdu_len = chan->conn->mtu - L2CAP_HDR_SIZE;
2422
2423 pdu_len = min_t(size_t, pdu_len, chan->remote_mps);
2424
2425 sdu_len = len; 2368 sdu_len = len;
2426 pdu_len -= L2CAP_SDULEN_SIZE; 2369 pdu_len = chan->remote_mps - L2CAP_SDULEN_SIZE;
2427 2370
2428 while (len > 0) { 2371 while (len > 0) {
2429 if (len <= pdu_len) 2372 if (len <= pdu_len)
@@ -3884,6 +3827,7 @@ static struct l2cap_chan *l2cap_connect(struct l2cap_conn *conn,
3884response: 3827response:
3885 l2cap_chan_unlock(pchan); 3828 l2cap_chan_unlock(pchan);
3886 mutex_unlock(&conn->chan_lock); 3829 mutex_unlock(&conn->chan_lock);
3830 l2cap_chan_put(pchan);
3887 3831
3888sendresp: 3832sendresp:
3889 rsp.scid = cpu_to_le16(scid); 3833 rsp.scid = cpu_to_le16(scid);
@@ -5487,6 +5431,11 @@ static int l2cap_le_connect_req(struct l2cap_conn *conn,
5487 5431
5488 if (test_bit(FLAG_DEFER_SETUP, &chan->flags)) { 5432 if (test_bit(FLAG_DEFER_SETUP, &chan->flags)) {
5489 l2cap_state_change(chan, BT_CONNECT2); 5433 l2cap_state_change(chan, BT_CONNECT2);
5434 /* The following result value is actually not defined
5435 * for LE CoC but we use it to let the function know
5436 * that it should bail out after doing its cleanup
5437 * instead of sending a response.
5438 */
5490 result = L2CAP_CR_PEND; 5439 result = L2CAP_CR_PEND;
5491 chan->ops->defer(chan); 5440 chan->ops->defer(chan);
5492 } else { 5441 } else {
@@ -5497,6 +5446,7 @@ static int l2cap_le_connect_req(struct l2cap_conn *conn,
5497response_unlock: 5446response_unlock:
5498 l2cap_chan_unlock(pchan); 5447 l2cap_chan_unlock(pchan);
5499 mutex_unlock(&conn->chan_lock); 5448 mutex_unlock(&conn->chan_lock);
5449 l2cap_chan_put(pchan);
5500 5450
5501 if (result == L2CAP_CR_PEND) 5451 if (result == L2CAP_CR_PEND)
5502 return 0; 5452 return 0;
@@ -6845,12 +6795,12 @@ static void l2cap_conless_channel(struct l2cap_conn *conn, __le16 psm,
6845 struct l2cap_chan *chan; 6795 struct l2cap_chan *chan;
6846 6796
6847 if (hcon->type != ACL_LINK) 6797 if (hcon->type != ACL_LINK)
6848 goto drop; 6798 goto free_skb;
6849 6799
6850 chan = l2cap_global_chan_by_psm(0, psm, &hcon->src, &hcon->dst, 6800 chan = l2cap_global_chan_by_psm(0, psm, &hcon->src, &hcon->dst,
6851 ACL_LINK); 6801 ACL_LINK);
6852 if (!chan) 6802 if (!chan)
6853 goto drop; 6803 goto free_skb;
6854 6804
6855 BT_DBG("chan %p, len %d", chan, skb->len); 6805 BT_DBG("chan %p, len %d", chan, skb->len);
6856 6806
@@ -6864,36 +6814,14 @@ static void l2cap_conless_channel(struct l2cap_conn *conn, __le16 psm,
6864 bacpy(&bt_cb(skb)->bdaddr, &hcon->dst); 6814 bacpy(&bt_cb(skb)->bdaddr, &hcon->dst);
6865 bt_cb(skb)->psm = psm; 6815 bt_cb(skb)->psm = psm;
6866 6816
6867 if (!chan->ops->recv(chan, skb)) 6817 if (!chan->ops->recv(chan, skb)) {
6868 return; 6818 l2cap_chan_put(chan);
6869
6870drop:
6871 kfree_skb(skb);
6872}
6873
6874static void l2cap_att_channel(struct l2cap_conn *conn,
6875 struct sk_buff *skb)
6876{
6877 struct hci_conn *hcon = conn->hcon;
6878 struct l2cap_chan *chan;
6879
6880 if (hcon->type != LE_LINK)
6881 goto drop;
6882
6883 chan = l2cap_global_chan_by_scid(BT_CONNECTED, L2CAP_CID_ATT,
6884 &hcon->src, &hcon->dst);
6885 if (!chan)
6886 goto drop;
6887
6888 BT_DBG("chan %p, len %d", chan, skb->len);
6889
6890 if (chan->imtu < skb->len)
6891 goto drop;
6892
6893 if (!chan->ops->recv(chan, skb))
6894 return; 6819 return;
6820 }
6895 6821
6896drop: 6822drop:
6823 l2cap_chan_put(chan);
6824free_skb:
6897 kfree_skb(skb); 6825 kfree_skb(skb);
6898} 6826}
6899 6827
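Because l2cap_global_chan_by_psm() now returns its result with a reference already taken, the connectionless receive path has to drop that reference on every exit, which is why the single drop label above splits into a put-then-free sequence. A compact sketch of the lookup-returns-held-reference contract; the helper names are hypothetical and the types simplified:

#include <stdlib.h>

struct chan { int refcnt; };

static void chan_put(struct chan *c) { c->refcnt--; }

/* Hypothetical lookup: NULL, or a channel whose refcount is already bumped. */
struct chan *chan_lookup_get(void);
/* Hypothetical receive: nonzero means the channel refused the frame. */
int chan_recv(struct chan *c, void *frame);

static void handle_conless_frame(void *frame)
{
        struct chan *c = chan_lookup_get();

        if (!c) {
                free(frame);            /* no listener, drop the frame */
                return;
        }

        if (chan_recv(c, frame) != 0)
                free(frame);            /* recv refused it, we still own it */

        chan_put(c);                    /* balance the lookup's reference */
}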
@@ -6942,19 +6870,10 @@ static void l2cap_recv_frame(struct l2cap_conn *conn, struct sk_buff *skb)
6942 l2cap_conless_channel(conn, psm, skb); 6870 l2cap_conless_channel(conn, psm, skb);
6943 break; 6871 break;
6944 6872
6945 case L2CAP_CID_ATT:
6946 l2cap_att_channel(conn, skb);
6947 break;
6948
6949 case L2CAP_CID_LE_SIGNALING: 6873 case L2CAP_CID_LE_SIGNALING:
6950 l2cap_le_sig_channel(conn, skb); 6874 l2cap_le_sig_channel(conn, skb);
6951 break; 6875 break;
6952 6876
6953 case L2CAP_CID_SMP:
6954 if (smp_sig_channel(conn, skb))
6955 l2cap_conn_del(conn->hcon, EACCES);
6956 break;
6957
6958 default: 6877 default:
6959 l2cap_data_channel(conn, cid, skb); 6878 l2cap_data_channel(conn, cid, skb);
6960 break; 6879 break;
@@ -6993,8 +6912,7 @@ static struct l2cap_conn *l2cap_conn_add(struct hci_conn *hcon)
6993 6912
6994 kref_init(&conn->ref); 6913 kref_init(&conn->ref);
6995 hcon->l2cap_data = conn; 6914 hcon->l2cap_data = conn;
6996 conn->hcon = hcon; 6915 conn->hcon = hci_conn_get(hcon);
6997 hci_conn_get(conn->hcon);
6998 conn->hchan = hchan; 6916 conn->hchan = hchan;
6999 6917
7000 BT_DBG("hcon %p conn %p hchan %p", hcon, conn, hchan); 6918 BT_DBG("hcon %p conn %p hchan %p", hcon, conn, hchan);
@@ -7023,13 +6941,11 @@ static struct l2cap_conn *l2cap_conn_add(struct hci_conn *hcon)
7023 INIT_LIST_HEAD(&conn->chan_l); 6941 INIT_LIST_HEAD(&conn->chan_l);
7024 INIT_LIST_HEAD(&conn->users); 6942 INIT_LIST_HEAD(&conn->users);
7025 6943
7026 if (hcon->type == LE_LINK) 6944 INIT_DELAYED_WORK(&conn->info_timer, l2cap_info_timeout);
7027 INIT_DELAYED_WORK(&conn->security_timer, security_timeout);
7028 else
7029 INIT_DELAYED_WORK(&conn->info_timer, l2cap_info_timeout);
7030 6945
7031 skb_queue_head_init(&conn->pending_rx); 6946 skb_queue_head_init(&conn->pending_rx);
7032 INIT_WORK(&conn->pending_rx_work, process_pending_rx); 6947 INIT_WORK(&conn->pending_rx_work, process_pending_rx);
6948 INIT_WORK(&conn->id_addr_update_work, l2cap_conn_update_id_addr);
7033 6949
7034 conn->disc_reason = HCI_ERROR_REMOTE_USER_TERM; 6950 conn->disc_reason = HCI_ERROR_REMOTE_USER_TERM;
7035 6951
@@ -7064,8 +6980,6 @@ int l2cap_chan_connect(struct l2cap_chan *chan, __le16 psm, u16 cid,
7064 6980
7065 hci_dev_lock(hdev); 6981 hci_dev_lock(hdev);
7066 6982
7067 l2cap_chan_lock(chan);
7068
7069 if (!is_valid_psm(__le16_to_cpu(psm), dst_type) && !cid && 6983 if (!is_valid_psm(__le16_to_cpu(psm), dst_type) && !cid &&
7070 chan->chan_type != L2CAP_CHAN_RAW) { 6984 chan->chan_type != L2CAP_CHAN_RAW) {
7071 err = -EINVAL; 6985 err = -EINVAL;
@@ -7162,19 +7076,20 @@ int l2cap_chan_connect(struct l2cap_chan *chan, __le16 psm, u16 cid,
7162 goto done; 7076 goto done;
7163 } 7077 }
7164 7078
7079 mutex_lock(&conn->chan_lock);
7080 l2cap_chan_lock(chan);
7081
7165 if (cid && __l2cap_get_chan_by_dcid(conn, cid)) { 7082 if (cid && __l2cap_get_chan_by_dcid(conn, cid)) {
7166 hci_conn_drop(hcon); 7083 hci_conn_drop(hcon);
7167 err = -EBUSY; 7084 err = -EBUSY;
7168 goto done; 7085 goto chan_unlock;
7169 } 7086 }
7170 7087
7171 /* Update source addr of the socket */ 7088 /* Update source addr of the socket */
7172 bacpy(&chan->src, &hcon->src); 7089 bacpy(&chan->src, &hcon->src);
7173 chan->src_type = bdaddr_type(hcon, hcon->src_type); 7090 chan->src_type = bdaddr_type(hcon, hcon->src_type);
7174 7091
7175 l2cap_chan_unlock(chan); 7092 __l2cap_chan_add(conn, chan);
7176 l2cap_chan_add(conn, chan);
7177 l2cap_chan_lock(chan);
7178 7093
7179 /* l2cap_chan_add takes its own ref so we can drop this one */ 7094 /* l2cap_chan_add takes its own ref so we can drop this one */
7180 hci_conn_drop(hcon); 7095 hci_conn_drop(hcon);
@@ -7200,8 +7115,10 @@ int l2cap_chan_connect(struct l2cap_chan *chan, __le16 psm, u16 cid,
7200 7115
7201 err = 0; 7116 err = 0;
7202 7117
7203done: 7118chan_unlock:
7204 l2cap_chan_unlock(chan); 7119 l2cap_chan_unlock(chan);
7120 mutex_unlock(&conn->chan_lock);
7121done:
7205 hci_dev_unlock(hdev); 7122 hci_dev_unlock(hdev);
7206 hci_dev_put(hdev); 7123 hci_dev_put(hdev);
7207 return err; 7124 return err;
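The reworked connect path above now takes conn->chan_lock before the individual channel lock and only then links the channel into the connection, presumably so that the dcid check and the list insertion happen under the same lock and in the same order used by the other channel-list walkers. A generic illustration of why a fixed acquisition order matters, using userspace pthreads rather than the kernel's locking primitives:

#include <pthread.h>

static pthread_mutex_t chan_list_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t chan_lock = PTHREAD_MUTEX_INITIALIZER;

/* Every path needing both locks must take them in the same order
 * (list first, then channel); mixing the order across call sites is
 * the classic ABBA deadlock.
 */
static void add_channel_locked(void (*link_channel)(void))
{
        pthread_mutex_lock(&chan_list_lock);
        pthread_mutex_lock(&chan_lock);

        link_channel();

        pthread_mutex_unlock(&chan_lock);
        pthread_mutex_unlock(&chan_list_lock);
}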
@@ -7239,19 +7156,99 @@ int l2cap_connect_ind(struct hci_dev *hdev, bdaddr_t *bdaddr)
7239 return exact ? lm1 : lm2; 7156 return exact ? lm1 : lm2;
7240} 7157}
7241 7158
7159/* Find the next fixed channel in BT_LISTEN state, continue iteration
7160 * from an existing channel in the list or from the beginning of the
7161 * global list (by passing NULL as first parameter).
7162 */
7163static struct l2cap_chan *l2cap_global_fixed_chan(struct l2cap_chan *c,
7164 bdaddr_t *src, u8 link_type)
7165{
7166 read_lock(&chan_list_lock);
7167
7168 if (c)
7169 c = list_next_entry(c, global_l);
7170 else
7171 c = list_entry(chan_list.next, typeof(*c), global_l);
7172
7173 list_for_each_entry_from(c, &chan_list, global_l) {
7174 if (c->chan_type != L2CAP_CHAN_FIXED)
7175 continue;
7176 if (c->state != BT_LISTEN)
7177 continue;
7178 if (bacmp(&c->src, src) && bacmp(&c->src, BDADDR_ANY))
7179 continue;
7180 if (link_type == ACL_LINK && c->src_type != BDADDR_BREDR)
7181 continue;
7182 if (link_type == LE_LINK && c->src_type == BDADDR_BREDR)
7183 continue;
7184
7185 l2cap_chan_hold(c);
7186 read_unlock(&chan_list_lock);
7187 return c;
7188 }
7189
7190 read_unlock(&chan_list_lock);
7191
7192 return NULL;
7193}
7194
7242void l2cap_connect_cfm(struct hci_conn *hcon, u8 status) 7195void l2cap_connect_cfm(struct hci_conn *hcon, u8 status)
7243{ 7196{
7197 struct hci_dev *hdev = hcon->hdev;
7244 struct l2cap_conn *conn; 7198 struct l2cap_conn *conn;
7199 struct l2cap_chan *pchan;
7200 u8 dst_type;
7245 7201
7246 BT_DBG("hcon %p bdaddr %pMR status %d", hcon, &hcon->dst, status); 7202 BT_DBG("hcon %p bdaddr %pMR status %d", hcon, &hcon->dst, status);
7247 7203
7248 if (!status) { 7204 if (status) {
7249 conn = l2cap_conn_add(hcon);
7250 if (conn)
7251 l2cap_conn_ready(conn);
7252 } else {
7253 l2cap_conn_del(hcon, bt_to_errno(status)); 7205 l2cap_conn_del(hcon, bt_to_errno(status));
7206 return;
7254 } 7207 }
7208
7209 conn = l2cap_conn_add(hcon);
7210 if (!conn)
7211 return;
7212
7213 dst_type = bdaddr_type(hcon, hcon->dst_type);
7214
7215 /* If device is blocked, do not create channels for it */
7216 if (hci_bdaddr_list_lookup(&hdev->blacklist, &hcon->dst, dst_type))
7217 return;
7218
7219 /* Find fixed channels and notify them of the new connection. We
7220 * use multiple individual lookups, continuing each time where
7221 * we left off, because the list lock would prevent calling the
7222 * potentially sleeping l2cap_chan_lock() function.
7223 */
7224 pchan = l2cap_global_fixed_chan(NULL, &hdev->bdaddr, hcon->type);
7225 while (pchan) {
7226 struct l2cap_chan *chan, *next;
7227
7228 /* Client fixed channels should override server ones */
7229 if (__l2cap_get_chan_by_dcid(conn, pchan->scid))
7230 goto next;
7231
7232 l2cap_chan_lock(pchan);
7233 chan = pchan->ops->new_connection(pchan);
7234 if (chan) {
7235 bacpy(&chan->src, &hcon->src);
7236 bacpy(&chan->dst, &hcon->dst);
7237 chan->src_type = bdaddr_type(hcon, hcon->src_type);
7238 chan->dst_type = dst_type;
7239
7240 __l2cap_chan_add(conn, chan);
7241 }
7242
7243 l2cap_chan_unlock(pchan);
7244next:
7245 next = l2cap_global_fixed_chan(pchan, &hdev->bdaddr,
7246 hcon->type);
7247 l2cap_chan_put(pchan);
7248 pchan = next;
7249 }
7250
7251 l2cap_conn_ready(conn);
7255} 7252}
7256 7253
7257int l2cap_disconn_ind(struct hci_conn *hcon) 7254int l2cap_disconn_ind(struct hci_conn *hcon)
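The rewritten connect confirmation is the core of this series: instead of an ATT-specific hook, every fixed channel listening on a matching source address and link type is offered the new connection, and l2cap_global_fixed_chan() hands back each candidate with a reference held so the caller can sleep (take channel locks, allocate) between lookups. A simplified standalone model of that resumable-iteration pattern, with a singly linked list and plain counters standing in for the kernel structures:

#include <stddef.h>

struct fixed_chan {
        struct fixed_chan *next;
        int refcnt;
        int listening;
};

static void chan_hold(struct fixed_chan *c) { c->refcnt++; }
static void chan_put(struct fixed_chan *c)  { c->refcnt--; }

/* Continue from "cur" (or the head when cur is NULL) and return the next
 * listening channel with a reference held, or NULL when the list is done.
 * In the kernel the scan itself runs under the non-sleeping list lock.
 */
static struct fixed_chan *next_listening(struct fixed_chan *head,
                                         struct fixed_chan *cur)
{
        struct fixed_chan *c = cur ? cur->next : head;

        for (; c; c = c->next) {
                if (!c->listening)
                        continue;
                chan_hold(c);
                return c;
        }

        return NULL;
}

static void offer_connection_to_all(struct fixed_chan *head,
                                    void (*offer)(struct fixed_chan *))
{
        struct fixed_chan *c = next_listening(head, NULL);

        while (c) {
                struct fixed_chan *next;

                offer(c);                       /* may sleep, take locks */
                next = next_listening(head, c); /* resume after current */
                chan_put(c);
                c = next;
        }
}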
@@ -7299,12 +7296,6 @@ int l2cap_security_cfm(struct hci_conn *hcon, u8 status, u8 encrypt)
7299 7296
7300 BT_DBG("conn %p status 0x%2.2x encrypt %u", conn, status, encrypt); 7297 BT_DBG("conn %p status 0x%2.2x encrypt %u", conn, status, encrypt);
7301 7298
7302 if (hcon->type == LE_LINK) {
7303 if (!status && encrypt)
7304 smp_distribute_keys(conn);
7305 cancel_delayed_work(&conn->security_timer);
7306 }
7307
7308 mutex_lock(&conn->chan_lock); 7299 mutex_lock(&conn->chan_lock);
7309 7300
7310 list_for_each_entry(chan, &conn->chan_l, list) { 7301 list_for_each_entry(chan, &conn->chan_l, list) {
@@ -7318,15 +7309,8 @@ int l2cap_security_cfm(struct hci_conn *hcon, u8 status, u8 encrypt)
7318 continue; 7309 continue;
7319 } 7310 }
7320 7311
7321 if (chan->scid == L2CAP_CID_ATT) { 7312 if (!status && encrypt)
7322 if (!status && encrypt) { 7313 chan->sec_level = hcon->sec_level;
7323 chan->sec_level = hcon->sec_level;
7324 l2cap_chan_ready(chan);
7325 }
7326
7327 l2cap_chan_unlock(chan);
7328 continue;
7329 }
7330 7314
7331 if (!__l2cap_no_conn_pending(chan)) { 7315 if (!__l2cap_no_conn_pending(chan)) {
7332 l2cap_chan_unlock(chan); 7316 l2cap_chan_unlock(chan);
diff --git a/net/bluetooth/l2cap_sock.c b/net/bluetooth/l2cap_sock.c
index 1884f72083c2..31f106e61ca2 100644
--- a/net/bluetooth/l2cap_sock.c
+++ b/net/bluetooth/l2cap_sock.c
@@ -99,15 +99,6 @@ static int l2cap_sock_bind(struct socket *sock, struct sockaddr *addr, int alen)
99 if (!bdaddr_type_is_valid(la.l2_bdaddr_type)) 99 if (!bdaddr_type_is_valid(la.l2_bdaddr_type))
100 return -EINVAL; 100 return -EINVAL;
101 101
102 if (la.l2_cid) {
103 /* When the socket gets created it defaults to
104 * CHAN_CONN_ORIENTED, so we need to overwrite the
105 * default here.
106 */
107 chan->chan_type = L2CAP_CHAN_FIXED;
108 chan->omtu = L2CAP_DEFAULT_MTU;
109 }
110
111 if (bdaddr_type_is_le(la.l2_bdaddr_type)) { 102 if (bdaddr_type_is_le(la.l2_bdaddr_type)) {
112 /* We only allow ATT user space socket */ 103 /* We only allow ATT user space socket */
113 if (la.l2_cid && 104 if (la.l2_cid &&
@@ -155,6 +146,14 @@ static int l2cap_sock_bind(struct socket *sock, struct sockaddr *addr, int alen)
155 case L2CAP_CHAN_RAW: 146 case L2CAP_CHAN_RAW:
156 chan->sec_level = BT_SECURITY_SDP; 147 chan->sec_level = BT_SECURITY_SDP;
157 break; 148 break;
149 case L2CAP_CHAN_FIXED:
150 /* Fixed channels default to the L2CAP core not holding a
151 * hci_conn reference for them. For fixed channels mapping to
152 * L2CAP sockets we do want to hold a reference so set the
153 * appropriate flag to request it.
154 */
155 set_bit(FLAG_HOLD_HCI_CONN, &chan->flags);
156 break;
158 } 157 }
159 158
160 bacpy(&chan->src, &la.l2_bdaddr); 159 bacpy(&chan->src, &la.l2_bdaddr);
@@ -790,6 +789,7 @@ static int l2cap_sock_setsockopt(struct socket *sock, int level, int optname,
790 if (chan->scid == L2CAP_CID_ATT) { 789 if (chan->scid == L2CAP_CID_ATT) {
791 if (smp_conn_security(conn->hcon, sec.level)) 790 if (smp_conn_security(conn->hcon, sec.level))
792 break; 791 break;
792 set_bit(FLAG_PENDING_SECURITY, &chan->flags);
793 sk->sk_state = BT_CONFIG; 793 sk->sk_state = BT_CONFIG;
794 chan->state = BT_CONFIG; 794 chan->state = BT_CONFIG;
795 795
@@ -1359,6 +1359,11 @@ static void l2cap_sock_resume_cb(struct l2cap_chan *chan)
1359{ 1359{
1360 struct sock *sk = chan->data; 1360 struct sock *sk = chan->data;
1361 1361
1362 if (test_and_clear_bit(FLAG_PENDING_SECURITY, &chan->flags)) {
1363 sk->sk_state = BT_CONNECTED;
1364 chan->state = BT_CONNECTED;
1365 }
1366
1362 clear_bit(BT_SK_SUSPEND, &bt_sk(sk)->flags); 1367 clear_bit(BT_SK_SUSPEND, &bt_sk(sk)->flags);
1363 sk->sk_state_change(sk); 1368 sk->sk_state_change(sk);
1364} 1369}
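The FLAG_PENDING_SECURITY handling added above gives ATT sockets a small state machine: raising the security level parks the socket in BT_CONFIG, and the resume callback moves it back to BT_CONNECTED once the security procedure has finished. A compact model of that handshake, with a boolean and a two-value enum standing in for the channel flag and socket states:

#include <stdbool.h>

enum sock_state { ST_CONNECTED, ST_CONFIG };

struct att_sock {
        enum sock_state state;
        bool pending_security;  /* stand-in for FLAG_PENDING_SECURITY */
};

static void elevate_security(struct att_sock *s)
{
        s->pending_security = true;
        s->state = ST_CONFIG;   /* parked until security completes */
}

static void resume(struct att_sock *s)
{
        /* Only a socket parked for security returns to connected here;
         * everything else is merely unsuspended by the caller.
         */
        if (s->pending_security) {
                s->pending_security = false;
                s->state = ST_CONNECTED;
        }
}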
diff --git a/net/bluetooth/lib.c b/net/bluetooth/lib.c
index 941ad7530eda..b36bc0415854 100644
--- a/net/bluetooth/lib.c
+++ b/net/bluetooth/lib.c
@@ -135,40 +135,34 @@ int bt_to_errno(__u16 code)
135} 135}
136EXPORT_SYMBOL(bt_to_errno); 136EXPORT_SYMBOL(bt_to_errno);
137 137
138int bt_info(const char *format, ...) 138void bt_info(const char *format, ...)
139{ 139{
140 struct va_format vaf; 140 struct va_format vaf;
141 va_list args; 141 va_list args;
142 int r;
143 142
144 va_start(args, format); 143 va_start(args, format);
145 144
146 vaf.fmt = format; 145 vaf.fmt = format;
147 vaf.va = &args; 146 vaf.va = &args;
148 147
149 r = pr_info("%pV", &vaf); 148 pr_info("%pV", &vaf);
150 149
151 va_end(args); 150 va_end(args);
152
153 return r;
154} 151}
155EXPORT_SYMBOL(bt_info); 152EXPORT_SYMBOL(bt_info);
156 153
157int bt_err(const char *format, ...) 154void bt_err(const char *format, ...)
158{ 155{
159 struct va_format vaf; 156 struct va_format vaf;
160 va_list args; 157 va_list args;
161 int r;
162 158
163 va_start(args, format); 159 va_start(args, format);
164 160
165 vaf.fmt = format; 161 vaf.fmt = format;
166 vaf.va = &args; 162 vaf.va = &args;
167 163
168 r = pr_err("%pV", &vaf); 164 pr_err("%pV", &vaf);
169 165
170 va_end(args); 166 va_end(args);
171
172 return r;
173} 167}
174EXPORT_SYMBOL(bt_err); 168EXPORT_SYMBOL(bt_err);
diff --git a/net/bluetooth/mgmt.c b/net/bluetooth/mgmt.c
index b8554d429d88..efb71b022ab6 100644
--- a/net/bluetooth/mgmt.c
+++ b/net/bluetooth/mgmt.c
@@ -129,9 +129,6 @@ static const u16 mgmt_events[] = {
129 129
130#define CACHE_TIMEOUT msecs_to_jiffies(2 * 1000) 130#define CACHE_TIMEOUT msecs_to_jiffies(2 * 1000)
131 131
132#define hdev_is_powered(hdev) (test_bit(HCI_UP, &hdev->flags) && \
133 !test_bit(HCI_AUTO_OFF, &hdev->dev_flags))
134
135struct pending_cmd { 132struct pending_cmd {
136 struct list_head list; 133 struct list_head list;
137 u16 opcode; 134 u16 opcode;
@@ -1536,9 +1533,11 @@ static void set_discoverable_complete(struct hci_dev *hdev, u8 status)
1536 1533
1537 /* When the discoverable mode gets changed, make sure 1534 /* When the discoverable mode gets changed, make sure
1538 * that class of device has the limited discoverable 1535 * that class of device has the limited discoverable
1539 * bit correctly set. 1536 * bit correctly set. Also update page scan based on whitelist
1537 * entries.
1540 */ 1538 */
1541 hci_req_init(&req, hdev); 1539 hci_req_init(&req, hdev);
1540 hci_update_page_scan(hdev, &req);
1542 update_class(&req); 1541 update_class(&req);
1543 hci_req_run(&req, NULL); 1542 hci_req_run(&req, NULL);
1544 1543
@@ -1785,6 +1784,7 @@ static void set_connectable_complete(struct hci_dev *hdev, u8 status)
1785 1784
1786 if (conn_changed || discov_changed) { 1785 if (conn_changed || discov_changed) {
1787 new_settings(hdev, cmd->sk); 1786 new_settings(hdev, cmd->sk);
1787 hci_update_page_scan(hdev, NULL);
1788 if (discov_changed) 1788 if (discov_changed)
1789 mgmt_update_adv_data(hdev); 1789 mgmt_update_adv_data(hdev);
1790 hci_update_background_scan(hdev); 1790 hci_update_background_scan(hdev);
@@ -1818,6 +1818,7 @@ static int set_connectable_update_settings(struct hci_dev *hdev,
1818 return err; 1818 return err;
1819 1819
1820 if (changed) { 1820 if (changed) {
1821 hci_update_page_scan(hdev, NULL);
1821 hci_update_background_scan(hdev); 1822 hci_update_background_scan(hdev);
1822 return new_settings(hdev, sk); 1823 return new_settings(hdev, sk);
1823 } 1824 }
@@ -2787,7 +2788,6 @@ static int disconnect(struct sock *sk, struct hci_dev *hdev, void *data,
2787{ 2788{
2788 struct mgmt_cp_disconnect *cp = data; 2789 struct mgmt_cp_disconnect *cp = data;
2789 struct mgmt_rp_disconnect rp; 2790 struct mgmt_rp_disconnect rp;
2790 struct hci_cp_disconnect dc;
2791 struct pending_cmd *cmd; 2791 struct pending_cmd *cmd;
2792 struct hci_conn *conn; 2792 struct hci_conn *conn;
2793 int err; 2793 int err;
@@ -2835,10 +2835,7 @@ static int disconnect(struct sock *sk, struct hci_dev *hdev, void *data,
2835 goto failed; 2835 goto failed;
2836 } 2836 }
2837 2837
2838 dc.handle = cpu_to_le16(conn->handle); 2838 err = hci_disconnect(conn, HCI_ERROR_REMOTE_USER_TERM);
2839 dc.reason = HCI_ERROR_REMOTE_USER_TERM;
2840
2841 err = hci_send_cmd(hdev, HCI_OP_DISCONNECT, sizeof(dc), &dc);
2842 if (err < 0) 2839 if (err < 0)
2843 mgmt_pending_remove(cmd); 2840 mgmt_pending_remove(cmd);
2844 2841
@@ -3062,6 +3059,7 @@ static void pairing_complete(struct pending_cmd *cmd, u8 status)
3062 conn->disconn_cfm_cb = NULL; 3059 conn->disconn_cfm_cb = NULL;
3063 3060
3064 hci_conn_drop(conn); 3061 hci_conn_drop(conn);
3062 hci_conn_put(conn);
3065 3063
3066 mgmt_pending_remove(cmd); 3064 mgmt_pending_remove(cmd);
3067} 3065}
@@ -3211,7 +3209,7 @@ static int pair_device(struct sock *sk, struct hci_dev *hdev, void *data,
3211 } 3209 }
3212 3210
3213 conn->io_capability = cp->io_cap; 3211 conn->io_capability = cp->io_cap;
3214 cmd->user_data = conn; 3212 cmd->user_data = hci_conn_get(conn);
3215 3213
3216 if ((conn->state == BT_CONNECTED || conn->state == BT_CONFIG) && 3214 if ((conn->state == BT_CONNECTED || conn->state == BT_CONFIG) &&
3217 hci_conn_security(conn, sec_level, auth_type, true)) 3215 hci_conn_security(conn, sec_level, auth_type, true))
@@ -4381,27 +4379,6 @@ unlock:
4381 return err; 4379 return err;
4382} 4380}
4383 4381
4384static void set_bredr_scan(struct hci_request *req)
4385{
4386 struct hci_dev *hdev = req->hdev;
4387 u8 scan = 0;
4388
4389 /* Ensure that fast connectable is disabled. This function will
4390 * not do anything if the page scan parameters are already what
4391 * they should be.
4392 */
4393 write_fast_connectable(req, false);
4394
4395 if (test_bit(HCI_CONNECTABLE, &hdev->dev_flags) ||
4396 !list_empty(&hdev->whitelist))
4397 scan |= SCAN_PAGE;
4398 if (test_bit(HCI_DISCOVERABLE, &hdev->dev_flags))
4399 scan |= SCAN_INQUIRY;
4400
4401 if (scan)
4402 hci_req_add(req, HCI_OP_WRITE_SCAN_ENABLE, 1, &scan);
4403}
4404
4405static void set_bredr_complete(struct hci_dev *hdev, u8 status) 4382static void set_bredr_complete(struct hci_dev *hdev, u8 status)
4406{ 4383{
4407 struct pending_cmd *cmd; 4384 struct pending_cmd *cmd;
@@ -4507,9 +4484,8 @@ static int set_bredr(struct sock *sk, struct hci_dev *hdev, void *data, u16 len)
4507 4484
4508 hci_req_init(&req, hdev); 4485 hci_req_init(&req, hdev);
4509 4486
4510 if (test_bit(HCI_CONNECTABLE, &hdev->dev_flags) || 4487 write_fast_connectable(&req, false);
4511 !list_empty(&hdev->whitelist)) 4488 hci_update_page_scan(hdev, &req);
4512 set_bredr_scan(&req);
4513 4489
4514 /* Since only the advertising data flags will change, there 4490 /* Since only the advertising data flags will change, there
4515 * is no need to update the scan response data. 4491 * is no need to update the scan response data.
@@ -4935,6 +4911,7 @@ static void get_conn_info_complete(struct pending_cmd *cmd, void *data)
4935 match->mgmt_status, &rp, sizeof(rp)); 4911 match->mgmt_status, &rp, sizeof(rp));
4936 4912
4937 hci_conn_drop(conn); 4913 hci_conn_drop(conn);
4914 hci_conn_put(conn);
4938 4915
4939 mgmt_pending_remove(cmd); 4916 mgmt_pending_remove(cmd);
4940} 4917}
@@ -5091,7 +5068,7 @@ static int get_conn_info(struct sock *sk, struct hci_dev *hdev, void *data,
5091 } 5068 }
5092 5069
5093 hci_conn_hold(conn); 5070 hci_conn_hold(conn);
5094 cmd->user_data = conn; 5071 cmd->user_data = hci_conn_get(conn);
5095 5072
5096 conn->conn_info_timestamp = jiffies; 5073 conn->conn_info_timestamp = jiffies;
5097 } else { 5074 } else {
@@ -5155,8 +5132,10 @@ send_rsp:
5155 cmd_complete(cmd->sk, cmd->index, cmd->opcode, mgmt_status(status), 5132 cmd_complete(cmd->sk, cmd->index, cmd->opcode, mgmt_status(status),
5156 &rp, sizeof(rp)); 5133 &rp, sizeof(rp));
5157 mgmt_pending_remove(cmd); 5134 mgmt_pending_remove(cmd);
5158 if (conn) 5135 if (conn) {
5159 hci_conn_drop(conn); 5136 hci_conn_drop(conn);
5137 hci_conn_put(conn);
5138 }
5160 5139
5161unlock: 5140unlock:
5162 hci_dev_unlock(hdev); 5141 hci_dev_unlock(hdev);
@@ -5219,7 +5198,7 @@ static int get_clock_info(struct sock *sk, struct hci_dev *hdev, void *data,
5219 5198
5220 if (conn) { 5199 if (conn) {
5221 hci_conn_hold(conn); 5200 hci_conn_hold(conn);
5222 cmd->user_data = conn; 5201 cmd->user_data = hci_conn_get(conn);
5223 5202
5224 hci_cp.handle = cpu_to_le16(conn->handle); 5203 hci_cp.handle = cpu_to_le16(conn->handle);
5225 hci_cp.which = 0x01; /* Piconet clock */ 5204 hci_cp.which = 0x01; /* Piconet clock */
@@ -5235,27 +5214,6 @@ unlock:
5235 return err; 5214 return err;
5236} 5215}
5237 5216
5238/* Helper for Add/Remove Device commands */
5239static void update_page_scan(struct hci_dev *hdev, u8 scan)
5240{
5241 if (!test_bit(HCI_BREDR_ENABLED, &hdev->dev_flags))
5242 return;
5243
5244 if (!hdev_is_powered(hdev))
5245 return;
5246
5247 /* If HCI_CONNECTABLE is set then Add/Remove Device should not
5248 * make any changes to page scanning.
5249 */
5250 if (test_bit(HCI_CONNECTABLE, &hdev->dev_flags))
5251 return;
5252
5253 if (test_bit(HCI_DISCOVERABLE, &hdev->dev_flags))
5254 scan |= SCAN_INQUIRY;
5255
5256 hci_send_cmd(hdev, HCI_OP_WRITE_SCAN_ENABLE, 1, &scan);
5257}
5258
5259static void device_added(struct sock *sk, struct hci_dev *hdev, 5217static void device_added(struct sock *sk, struct hci_dev *hdev,
5260 bdaddr_t *bdaddr, u8 type, u8 action) 5218 bdaddr_t *bdaddr, u8 type, u8 action)
5261{ 5219{
@@ -5291,8 +5249,6 @@ static int add_device(struct sock *sk, struct hci_dev *hdev,
5291 hci_dev_lock(hdev); 5249 hci_dev_lock(hdev);
5292 5250
5293 if (cp->addr.type == BDADDR_BREDR) { 5251 if (cp->addr.type == BDADDR_BREDR) {
5294 bool update_scan;
5295
5296 /* Only incoming connections action is supported for now */ 5252 /* Only incoming connections action is supported for now */
5297 if (cp->action != 0x01) { 5253 if (cp->action != 0x01) {
5298 err = cmd_complete(sk, hdev->id, MGMT_OP_ADD_DEVICE, 5254 err = cmd_complete(sk, hdev->id, MGMT_OP_ADD_DEVICE,
@@ -5301,15 +5257,12 @@ static int add_device(struct sock *sk, struct hci_dev *hdev,
5301 goto unlock; 5257 goto unlock;
5302 } 5258 }
5303 5259
5304 update_scan = list_empty(&hdev->whitelist);
5305
5306 err = hci_bdaddr_list_add(&hdev->whitelist, &cp->addr.bdaddr, 5260 err = hci_bdaddr_list_add(&hdev->whitelist, &cp->addr.bdaddr,
5307 cp->addr.type); 5261 cp->addr.type);
5308 if (err) 5262 if (err)
5309 goto unlock; 5263 goto unlock;
5310 5264
5311 if (update_scan) 5265 hci_update_page_scan(hdev, NULL);
5312 update_page_scan(hdev, SCAN_PAGE);
5313 5266
5314 goto added; 5267 goto added;
5315 } 5268 }
@@ -5392,8 +5345,7 @@ static int remove_device(struct sock *sk, struct hci_dev *hdev,
5392 goto unlock; 5345 goto unlock;
5393 } 5346 }
5394 5347
5395 if (list_empty(&hdev->whitelist)) 5348 hci_update_page_scan(hdev, NULL);
5396 update_page_scan(hdev, SCAN_DISABLED);
5397 5349
5398 device_removed(sk, hdev, &cp->addr.bdaddr, 5350 device_removed(sk, hdev, &cp->addr.bdaddr,
5399 cp->addr.type); 5351 cp->addr.type);
@@ -5444,7 +5396,7 @@ static int remove_device(struct sock *sk, struct hci_dev *hdev,
5444 kfree(b); 5396 kfree(b);
5445 } 5397 }
5446 5398
5447 update_page_scan(hdev, SCAN_DISABLED); 5399 hci_update_page_scan(hdev, NULL);
5448 5400
5449 list_for_each_entry_safe(p, tmp, &hdev->le_conn_params, list) { 5401 list_for_each_entry_safe(p, tmp, &hdev->le_conn_params, list) {
5450 if (p->auto_connect == HCI_AUTO_CONN_DISABLED) 5402 if (p->auto_connect == HCI_AUTO_CONN_DISABLED)
@@ -5969,8 +5921,8 @@ static int powered_update_hci(struct hci_dev *hdev)
5969 sizeof(link_sec), &link_sec); 5921 sizeof(link_sec), &link_sec);
5970 5922
5971 if (lmp_bredr_capable(hdev)) { 5923 if (lmp_bredr_capable(hdev)) {
5972 if (test_bit(HCI_BREDR_ENABLED, &hdev->dev_flags)) 5924 write_fast_connectable(&req, false);
5973 set_bredr_scan(&req); 5925 hci_update_page_scan(hdev, &req);
5974 update_class(&req); 5926 update_class(&req);
5975 update_name(&req); 5927 update_name(&req);
5976 update_eir(&req); 5928 update_eir(&req);
@@ -6281,25 +6233,35 @@ static void unpair_device_rsp(struct pending_cmd *cmd, void *data)
6281 mgmt_pending_remove(cmd); 6233 mgmt_pending_remove(cmd);
6282} 6234}
6283 6235
6236bool mgmt_powering_down(struct hci_dev *hdev)
6237{
6238 struct pending_cmd *cmd;
6239 struct mgmt_mode *cp;
6240
6241 cmd = mgmt_pending_find(MGMT_OP_SET_POWERED, hdev);
6242 if (!cmd)
6243 return false;
6244
6245 cp = cmd->param;
6246 if (!cp->val)
6247 return true;
6248
6249 return false;
6250}
6251
6284void mgmt_device_disconnected(struct hci_dev *hdev, bdaddr_t *bdaddr, 6252void mgmt_device_disconnected(struct hci_dev *hdev, bdaddr_t *bdaddr,
6285 u8 link_type, u8 addr_type, u8 reason, 6253 u8 link_type, u8 addr_type, u8 reason,
6286 bool mgmt_connected) 6254 bool mgmt_connected)
6287{ 6255{
6288 struct mgmt_ev_device_disconnected ev; 6256 struct mgmt_ev_device_disconnected ev;
6289 struct pending_cmd *power_off;
6290 struct sock *sk = NULL; 6257 struct sock *sk = NULL;
6291 6258
6292 power_off = mgmt_pending_find(MGMT_OP_SET_POWERED, hdev); 6259 /* The connection is still in hci_conn_hash so test for 1
6293 if (power_off) { 6260 * instead of 0 to know if this is the last one.
6294 struct mgmt_mode *cp = power_off->param; 6261 */
6295 6262 if (mgmt_powering_down(hdev) && hci_conn_count(hdev) == 1) {
6296 /* The connection is still in hci_conn_hash so test for 1 6263 cancel_delayed_work(&hdev->power_off);
6297 * instead of 0 to know if this is the last one. 6264 queue_work(hdev->req_workqueue, &hdev->power_off.work);
6298 */
6299 if (!cp->val && hci_conn_count(hdev) == 1) {
6300 cancel_delayed_work(&hdev->power_off);
6301 queue_work(hdev->req_workqueue, &hdev->power_off.work);
6302 }
6303 } 6265 }
6304 6266
6305 if (!mgmt_connected) 6267 if (!mgmt_connected)
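mgmt_powering_down() gives the disconnect paths a single answer to "is a power-off in progress?", and the conn_count == 1 test works because the connection being reported is still present in the connection hash. A condensed model of the resulting check, with a simplified stand-in for the pending-command lookup:

#include <stdbool.h>
#include <stddef.h>

struct pending_set_powered {
        bool value;     /* the mode carried by the pending Set Powered command */
};

/* Powering down only when a Set Powered command is pending and asks for off. */
static bool powering_down(const struct pending_set_powered *pending)
{
        return pending && !pending->value;
}

static bool should_flush_power_off(const struct pending_set_powered *pending,
                                   int conn_count)
{
        /* The connection being torn down is still counted, so the last
         * one shows up as a count of 1 rather than 0.
         */
        return powering_down(pending) && conn_count == 1;
}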
@@ -6359,19 +6321,13 @@ void mgmt_connect_failed(struct hci_dev *hdev, bdaddr_t *bdaddr, u8 link_type,
6359 u8 addr_type, u8 status) 6321 u8 addr_type, u8 status)
6360{ 6322{
6361 struct mgmt_ev_connect_failed ev; 6323 struct mgmt_ev_connect_failed ev;
6362 struct pending_cmd *power_off;
6363
6364 power_off = mgmt_pending_find(MGMT_OP_SET_POWERED, hdev);
6365 if (power_off) {
6366 struct mgmt_mode *cp = power_off->param;
6367 6324
6368 /* The connection is still in hci_conn_hash so test for 1 6325 /* The connection is still in hci_conn_hash so test for 1
6369 * instead of 0 to know if this is the last one. 6326 * instead of 0 to know if this is the last one.
6370 */ 6327 */
6371 if (!cp->val && hci_conn_count(hdev) == 1) { 6328 if (mgmt_powering_down(hdev) && hci_conn_count(hdev) == 1) {
6372 cancel_delayed_work(&hdev->power_off); 6329 cancel_delayed_work(&hdev->power_off);
6373 queue_work(hdev->req_workqueue, &hdev->power_off.work); 6330 queue_work(hdev->req_workqueue, &hdev->power_off.work);
6374 }
6375 } 6331 }
6376 6332
6377 bacpy(&ev.addr.bdaddr, bdaddr); 6333 bacpy(&ev.addr.bdaddr, bdaddr);
@@ -6529,16 +6485,23 @@ int mgmt_user_passkey_notify(struct hci_dev *hdev, bdaddr_t *bdaddr,
6529 return mgmt_event(MGMT_EV_PASSKEY_NOTIFY, hdev, &ev, sizeof(ev), NULL); 6485 return mgmt_event(MGMT_EV_PASSKEY_NOTIFY, hdev, &ev, sizeof(ev), NULL);
6530} 6486}
6531 6487
6532void mgmt_auth_failed(struct hci_dev *hdev, bdaddr_t *bdaddr, u8 link_type, 6488void mgmt_auth_failed(struct hci_conn *conn, u8 hci_status)
6533 u8 addr_type, u8 status)
6534{ 6489{
6535 struct mgmt_ev_auth_failed ev; 6490 struct mgmt_ev_auth_failed ev;
6491 struct pending_cmd *cmd;
6492 u8 status = mgmt_status(hci_status);
6536 6493
6537 bacpy(&ev.addr.bdaddr, bdaddr); 6494 bacpy(&ev.addr.bdaddr, &conn->dst);
6538 ev.addr.type = link_to_bdaddr(link_type, addr_type); 6495 ev.addr.type = link_to_bdaddr(conn->type, conn->dst_type);
6539 ev.status = mgmt_status(status); 6496 ev.status = status;
6540 6497
6541 mgmt_event(MGMT_EV_AUTH_FAILED, hdev, &ev, sizeof(ev), NULL); 6498 cmd = find_pairing(conn);
6499
6500 mgmt_event(MGMT_EV_AUTH_FAILED, conn->hdev, &ev, sizeof(ev),
6501 cmd ? cmd->sk : NULL);
6502
6503 if (cmd)
6504 pairing_complete(cmd, status);
6542} 6505}
6543 6506
6544void mgmt_auth_enable_complete(struct hci_dev *hdev, u8 status) 6507void mgmt_auth_enable_complete(struct hci_dev *hdev, u8 status)
diff --git a/net/bluetooth/smp.c b/net/bluetooth/smp.c
index fd3294300803..f09b6b65cf6b 100644
--- a/net/bluetooth/smp.c
+++ b/net/bluetooth/smp.c
@@ -31,9 +31,12 @@
31 31
32#include "smp.h" 32#include "smp.h"
33 33
34#define SMP_ALLOW_CMD(smp, code) set_bit(code, &smp->allow_cmd)
35
34#define SMP_TIMEOUT msecs_to_jiffies(30000) 36#define SMP_TIMEOUT msecs_to_jiffies(30000)
35 37
36#define AUTH_REQ_MASK 0x07 38#define AUTH_REQ_MASK 0x07
39#define KEY_DIST_MASK 0x07
37 40
38enum { 41enum {
39 SMP_FLAG_TK_VALID, 42 SMP_FLAG_TK_VALID,
@@ -44,7 +47,10 @@ enum {
44}; 47};
45 48
46struct smp_chan { 49struct smp_chan {
47 struct l2cap_conn *conn; 50 struct l2cap_conn *conn;
51 struct delayed_work security_timer;
52 unsigned long allow_cmd; /* Bitmask of allowed commands */
53
48 u8 preq[7]; /* SMP Pairing Request */ 54 u8 preq[7]; /* SMP Pairing Request */
49 u8 prsp[7]; /* SMP Pairing Response */ 55 u8 prsp[7]; /* SMP Pairing Response */
50 u8 prnd[16]; /* SMP Pairing Random (local) */ 56 u8 prnd[16]; /* SMP Pairing Random (local) */
@@ -139,12 +145,18 @@ static int smp_ah(struct crypto_blkcipher *tfm, u8 irk[16], u8 r[3], u8 res[3])
139 return 0; 145 return 0;
140} 146}
141 147
142bool smp_irk_matches(struct crypto_blkcipher *tfm, u8 irk[16], 148bool smp_irk_matches(struct hci_dev *hdev, u8 irk[16], bdaddr_t *bdaddr)
143 bdaddr_t *bdaddr)
144{ 149{
150 struct l2cap_chan *chan = hdev->smp_data;
151 struct crypto_blkcipher *tfm;
145 u8 hash[3]; 152 u8 hash[3];
146 int err; 153 int err;
147 154
155 if (!chan || !chan->data)
156 return false;
157
158 tfm = chan->data;
159
148 BT_DBG("RPA %pMR IRK %*phN", bdaddr, 16, irk); 160 BT_DBG("RPA %pMR IRK %*phN", bdaddr, 16, irk);
149 161
150 err = smp_ah(tfm, irk, &bdaddr->b[3], hash); 162 err = smp_ah(tfm, irk, &bdaddr->b[3], hash);
@@ -154,10 +166,17 @@ bool smp_irk_matches(struct crypto_blkcipher *tfm, u8 irk[16],
154 return !memcmp(bdaddr->b, hash, 3); 166 return !memcmp(bdaddr->b, hash, 3);
155} 167}
156 168
157int smp_generate_rpa(struct crypto_blkcipher *tfm, u8 irk[16], bdaddr_t *rpa) 169int smp_generate_rpa(struct hci_dev *hdev, u8 irk[16], bdaddr_t *rpa)
158{ 170{
171 struct l2cap_chan *chan = hdev->smp_data;
172 struct crypto_blkcipher *tfm;
159 int err; 173 int err;
160 174
175 if (!chan || !chan->data)
176 return -EOPNOTSUPP;
177
178 tfm = chan->data;
179
161 get_random_bytes(&rpa->b[3], 3); 180 get_random_bytes(&rpa->b[3], 3);
162 181
163 rpa->b[5] &= 0x3f; /* Clear two most significant bits */ 182 rpa->b[5] &= 0x3f; /* Clear two most significant bits */
@@ -235,47 +254,38 @@ static int smp_s1(struct smp_chan *smp, u8 k[16], u8 r1[16], u8 r2[16],
235 return err; 254 return err;
236} 255}
237 256
238static struct sk_buff *smp_build_cmd(struct l2cap_conn *conn, u8 code, 257static void smp_send_cmd(struct l2cap_conn *conn, u8 code, u16 len, void *data)
239 u16 dlen, void *data)
240{ 258{
241 struct sk_buff *skb; 259 struct l2cap_chan *chan = conn->smp;
242 struct l2cap_hdr *lh; 260 struct smp_chan *smp;
243 int len; 261 struct kvec iv[2];
244 262 struct msghdr msg;
245 len = L2CAP_HDR_SIZE + sizeof(code) + dlen;
246
247 if (len > conn->mtu)
248 return NULL;
249 263
250 skb = bt_skb_alloc(len, GFP_ATOMIC); 264 if (!chan)
251 if (!skb) 265 return;
252 return NULL;
253 266
254 lh = (struct l2cap_hdr *) skb_put(skb, L2CAP_HDR_SIZE); 267 BT_DBG("code 0x%2.2x", code);
255 lh->len = cpu_to_le16(sizeof(code) + dlen);
256 lh->cid = cpu_to_le16(L2CAP_CID_SMP);
257 268
258 memcpy(skb_put(skb, sizeof(code)), &code, sizeof(code)); 269 iv[0].iov_base = &code;
270 iv[0].iov_len = 1;
259 271
260 memcpy(skb_put(skb, dlen), data, dlen); 272 iv[1].iov_base = data;
273 iv[1].iov_len = len;
261 274
262 return skb; 275 memset(&msg, 0, sizeof(msg));
263}
264 276
265static void smp_send_cmd(struct l2cap_conn *conn, u8 code, u16 len, void *data) 277 msg.msg_iov = (struct iovec *) &iv;
266{ 278 msg.msg_iovlen = 2;
267 struct sk_buff *skb = smp_build_cmd(conn, code, len, data);
268 279
269 BT_DBG("code 0x%2.2x", code); 280 l2cap_chan_send(chan, &msg, 1 + len);
270 281
271 if (!skb) 282 if (!chan->data)
272 return; 283 return;
273 284
274 skb->priority = HCI_PRIO_MAX; 285 smp = chan->data;
275 hci_send_acl(conn->hchan, skb, 0);
276 286
277 cancel_delayed_work_sync(&conn->security_timer); 287 cancel_delayed_work_sync(&smp->security_timer);
278 schedule_delayed_work(&conn->security_timer, SMP_TIMEOUT); 288 schedule_delayed_work(&smp->security_timer, SMP_TIMEOUT);
279} 289}
280 290
281static __u8 authreq_to_seclevel(__u8 authreq) 291static __u8 authreq_to_seclevel(__u8 authreq)
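The rewritten smp_send_cmd() above no longer builds an skb by hand: the PDU is described as a two-element kvec, one byte of opcode followed by the payload, and handed to l2cap_chan_send() for assembly. A rough userspace analogue of that scatter-gather layout, using writev(2) purely for illustration (the opcode value and destination are arbitrary), would be:

#include <stdint.h>
#include <sys/uio.h>
#include <unistd.h>

/* Emit [opcode][payload] as one message without first copying the payload
 * into a temporary buffer, mirroring the two-element kvec in smp_send_cmd(). */
static ssize_t send_cmd(int fd, uint8_t code, const void *data, size_t len)
{
    struct iovec iv[2];

    iv[0].iov_base = &code;
    iv[0].iov_len  = 1;
    iv[1].iov_base = (void *)data;
    iv[1].iov_len  = len;

    return writev(fd, iv, 2);
}

int main(void)
{
    uint8_t reason = 0x05;    /* one-byte payload, e.g. a failure reason */

    return send_cmd(STDOUT_FILENO, 0x05, &reason, sizeof(reason)) < 0;
}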
@@ -302,7 +312,8 @@ static void build_pairing_cmd(struct l2cap_conn *conn,
302 struct smp_cmd_pairing *req, 312 struct smp_cmd_pairing *req,
303 struct smp_cmd_pairing *rsp, __u8 authreq) 313 struct smp_cmd_pairing *rsp, __u8 authreq)
304{ 314{
305 struct smp_chan *smp = conn->smp_chan; 315 struct l2cap_chan *chan = conn->smp;
316 struct smp_chan *smp = chan->data;
306 struct hci_conn *hcon = conn->hcon; 317 struct hci_conn *hcon = conn->hcon;
307 struct hci_dev *hdev = hcon->hdev; 318 struct hci_dev *hdev = hcon->hdev;
308 u8 local_dist = 0, remote_dist = 0; 319 u8 local_dist = 0, remote_dist = 0;
@@ -345,7 +356,8 @@ static void build_pairing_cmd(struct l2cap_conn *conn,
345 356
346static u8 check_enc_key_size(struct l2cap_conn *conn, __u8 max_key_size) 357static u8 check_enc_key_size(struct l2cap_conn *conn, __u8 max_key_size)
347{ 358{
348 struct smp_chan *smp = conn->smp_chan; 359 struct l2cap_chan *chan = conn->smp;
360 struct smp_chan *smp = chan->data;
349 361
350 if ((max_key_size > SMP_MAX_ENC_KEY_SIZE) || 362 if ((max_key_size > SMP_MAX_ENC_KEY_SIZE) ||
351 (max_key_size < SMP_MIN_ENC_KEY_SIZE)) 363 (max_key_size < SMP_MIN_ENC_KEY_SIZE))
@@ -356,21 +368,60 @@ static u8 check_enc_key_size(struct l2cap_conn *conn, __u8 max_key_size)
356 return 0; 368 return 0;
357} 369}
358 370
371static void smp_chan_destroy(struct l2cap_conn *conn)
372{
373 struct l2cap_chan *chan = conn->smp;
374 struct smp_chan *smp = chan->data;
375 bool complete;
376
377 BUG_ON(!smp);
378
379 cancel_delayed_work_sync(&smp->security_timer);
380
381 complete = test_bit(SMP_FLAG_COMPLETE, &smp->flags);
382 mgmt_smp_complete(conn->hcon, complete);
383
384 kfree(smp->csrk);
385 kfree(smp->slave_csrk);
386
387 crypto_free_blkcipher(smp->tfm_aes);
388
389 /* If pairing failed clean up any keys we might have */
390 if (!complete) {
391 if (smp->ltk) {
392 list_del(&smp->ltk->list);
393 kfree(smp->ltk);
394 }
395
396 if (smp->slave_ltk) {
397 list_del(&smp->slave_ltk->list);
398 kfree(smp->slave_ltk);
399 }
400
401 if (smp->remote_irk) {
402 list_del(&smp->remote_irk->list);
403 kfree(smp->remote_irk);
404 }
405 }
406
407 chan->data = NULL;
408 kfree(smp);
409 hci_conn_drop(conn->hcon);
410}
411
359static void smp_failure(struct l2cap_conn *conn, u8 reason) 412static void smp_failure(struct l2cap_conn *conn, u8 reason)
360{ 413{
361 struct hci_conn *hcon = conn->hcon; 414 struct hci_conn *hcon = conn->hcon;
415 struct l2cap_chan *chan = conn->smp;
362 416
363 if (reason) 417 if (reason)
364 smp_send_cmd(conn, SMP_CMD_PAIRING_FAIL, sizeof(reason), 418 smp_send_cmd(conn, SMP_CMD_PAIRING_FAIL, sizeof(reason),
365 &reason); 419 &reason);
366 420
367 clear_bit(HCI_CONN_ENCRYPT_PEND, &hcon->flags); 421 clear_bit(HCI_CONN_ENCRYPT_PEND, &hcon->flags);
368 mgmt_auth_failed(hcon->hdev, &hcon->dst, hcon->type, hcon->dst_type, 422 mgmt_auth_failed(hcon, HCI_ERROR_AUTH_FAILURE);
369 HCI_ERROR_AUTH_FAILURE);
370
371 cancel_delayed_work_sync(&conn->security_timer);
372 423
373 if (test_and_clear_bit(HCI_CONN_LE_SMP_PEND, &hcon->flags)) 424 if (chan->data)
374 smp_chan_destroy(conn); 425 smp_chan_destroy(conn);
375} 426}
376 427
@@ -405,7 +456,8 @@ static int tk_request(struct l2cap_conn *conn, u8 remote_oob, u8 auth,
405 u8 local_io, u8 remote_io) 456 u8 local_io, u8 remote_io)
406{ 457{
407 struct hci_conn *hcon = conn->hcon; 458 struct hci_conn *hcon = conn->hcon;
408 struct smp_chan *smp = conn->smp_chan; 459 struct l2cap_chan *chan = conn->smp;
460 struct smp_chan *smp = chan->data;
409 u8 method; 461 u8 method;
410 u32 passkey = 0; 462 u32 passkey = 0;
411 int ret = 0; 463 int ret = 0;
@@ -442,8 +494,11 @@ static int tk_request(struct l2cap_conn *conn, u8 remote_oob, u8 auth,
442 } 494 }
443 495
444 /* Not Just Works/Confirm results in MITM Authentication */ 496 /* Not Just Works/Confirm results in MITM Authentication */
445 if (method != JUST_CFM) 497 if (method != JUST_CFM) {
446 set_bit(SMP_FLAG_MITM_AUTH, &smp->flags); 498 set_bit(SMP_FLAG_MITM_AUTH, &smp->flags);
499 if (hcon->pending_sec_level < BT_SECURITY_HIGH)
500 hcon->pending_sec_level = BT_SECURITY_HIGH;
501 }
447 502
448 /* If both devices have Keyoard-Display I/O, the master 503 /* If both devices have Keyoard-Display I/O, the master
449 * Confirms and the slave Enters the passkey. 504 * Confirms and the slave Enters the passkey.
@@ -503,6 +558,11 @@ static u8 smp_confirm(struct smp_chan *smp)
503 558
504 smp_send_cmd(smp->conn, SMP_CMD_PAIRING_CONFIRM, sizeof(cp), &cp); 559 smp_send_cmd(smp->conn, SMP_CMD_PAIRING_CONFIRM, sizeof(cp), &cp);
505 560
561 if (conn->hcon->out)
562 SMP_ALLOW_CMD(smp, SMP_CMD_PAIRING_CONFIRM);
563 else
564 SMP_ALLOW_CMD(smp, SMP_CMD_PAIRING_RANDOM);
565
506 return 0; 566 return 0;
507} 567}
508 568
@@ -574,82 +634,262 @@ static u8 smp_random(struct smp_chan *smp)
574 return 0; 634 return 0;
575} 635}
576 636
577static struct smp_chan *smp_chan_create(struct l2cap_conn *conn) 637static void smp_notify_keys(struct l2cap_conn *conn)
578{ 638{
579 struct smp_chan *smp; 639 struct l2cap_chan *chan = conn->smp;
640 struct smp_chan *smp = chan->data;
641 struct hci_conn *hcon = conn->hcon;
642 struct hci_dev *hdev = hcon->hdev;
643 struct smp_cmd_pairing *req = (void *) &smp->preq[1];
644 struct smp_cmd_pairing *rsp = (void *) &smp->prsp[1];
645 bool persistent;
580 646
581 smp = kzalloc(sizeof(*smp), GFP_ATOMIC); 647 if (smp->remote_irk) {
582 if (!smp) { 648 mgmt_new_irk(hdev, smp->remote_irk);
583 clear_bit(HCI_CONN_LE_SMP_PEND, &conn->hcon->flags); 649 /* Now that user space can be considered to know the
584 return NULL; 650 * identity address, track the connection based on it
651 * from now on.
652 */
653 bacpy(&hcon->dst, &smp->remote_irk->bdaddr);
654 hcon->dst_type = smp->remote_irk->addr_type;
655 queue_work(hdev->workqueue, &conn->id_addr_update_work);
656
657 /* When receiving an identity resolving key for
658 * a remote device that does not use a resolvable
659 * private address, just remove the key so that
660 * it is possible to use the controller white
661 * list for scanning.
662 *
663 * Userspace will have been told to not store
664 * this key at this point. So it is safe to
665 * just remove it.
666 */
667 if (!bacmp(&smp->remote_irk->rpa, BDADDR_ANY)) {
668 list_del(&smp->remote_irk->list);
669 kfree(smp->remote_irk);
670 smp->remote_irk = NULL;
671 }
585 } 672 }
586 673
587 smp->tfm_aes = crypto_alloc_blkcipher("ecb(aes)", 0, CRYPTO_ALG_ASYNC); 674 /* The LTKs and CSRKs should be persistent only if both sides
588 if (IS_ERR(smp->tfm_aes)) { 675 * had the bonding bit set in their authentication requests.
589 BT_ERR("Unable to create ECB crypto context"); 676 */
590 kfree(smp); 677 persistent = !!((req->auth_req & rsp->auth_req) & SMP_AUTH_BONDING);
591 clear_bit(HCI_CONN_LE_SMP_PEND, &conn->hcon->flags); 678
592 return NULL; 679 if (smp->csrk) {
680 smp->csrk->bdaddr_type = hcon->dst_type;
681 bacpy(&smp->csrk->bdaddr, &hcon->dst);
682 mgmt_new_csrk(hdev, smp->csrk, persistent);
593 } 683 }
594 684
595 smp->conn = conn; 685 if (smp->slave_csrk) {
596 conn->smp_chan = smp; 686 smp->slave_csrk->bdaddr_type = hcon->dst_type;
687 bacpy(&smp->slave_csrk->bdaddr, &hcon->dst);
688 mgmt_new_csrk(hdev, smp->slave_csrk, persistent);
689 }
597 690
598 hci_conn_hold(conn->hcon); 691 if (smp->ltk) {
692 smp->ltk->bdaddr_type = hcon->dst_type;
693 bacpy(&smp->ltk->bdaddr, &hcon->dst);
694 mgmt_new_ltk(hdev, smp->ltk, persistent);
695 }
599 696
600 return smp; 697 if (smp->slave_ltk) {
698 smp->slave_ltk->bdaddr_type = hcon->dst_type;
699 bacpy(&smp->slave_ltk->bdaddr, &hcon->dst);
700 mgmt_new_ltk(hdev, smp->slave_ltk, persistent);
701 }
601} 702}
602 703
603void smp_chan_destroy(struct l2cap_conn *conn) 704static void smp_allow_key_dist(struct smp_chan *smp)
604{ 705{
605 struct smp_chan *smp = conn->smp_chan; 706 /* Allow the first expected phase 3 PDU. The rest of the PDUs
606 bool complete; 707 * will be allowed in each PDU handler to ensure we receive
708 * them in the correct order.
709 */
710 if (smp->remote_key_dist & SMP_DIST_ENC_KEY)
711 SMP_ALLOW_CMD(smp, SMP_CMD_ENCRYPT_INFO);
712 else if (smp->remote_key_dist & SMP_DIST_ID_KEY)
713 SMP_ALLOW_CMD(smp, SMP_CMD_IDENT_INFO);
714 else if (smp->remote_key_dist & SMP_DIST_SIGN)
715 SMP_ALLOW_CMD(smp, SMP_CMD_SIGN_INFO);
716}
607 717
608 BUG_ON(!smp); 718static void smp_distribute_keys(struct smp_chan *smp)
719{
720 struct smp_cmd_pairing *req, *rsp;
721 struct l2cap_conn *conn = smp->conn;
722 struct hci_conn *hcon = conn->hcon;
723 struct hci_dev *hdev = hcon->hdev;
724 __u8 *keydist;
609 725
610 complete = test_bit(SMP_FLAG_COMPLETE, &smp->flags); 726 BT_DBG("conn %p", conn);
611 mgmt_smp_complete(conn->hcon, complete);
612 727
613 kfree(smp->csrk); 728 rsp = (void *) &smp->prsp[1];
614 kfree(smp->slave_csrk);
615 729
616 crypto_free_blkcipher(smp->tfm_aes); 730 /* The responder sends its keys first */
731 if (hcon->out && (smp->remote_key_dist & KEY_DIST_MASK)) {
732 smp_allow_key_dist(smp);
733 return;
734 }
617 735
618 /* If pairing failed clean up any keys we might have */ 736 req = (void *) &smp->preq[1];
619 if (!complete) {
620 if (smp->ltk) {
621 list_del(&smp->ltk->list);
622 kfree(smp->ltk);
623 }
624 737
625 if (smp->slave_ltk) { 738 if (hcon->out) {
626 list_del(&smp->slave_ltk->list); 739 keydist = &rsp->init_key_dist;
627 kfree(smp->slave_ltk); 740 *keydist &= req->init_key_dist;
628 } 741 } else {
742 keydist = &rsp->resp_key_dist;
743 *keydist &= req->resp_key_dist;
744 }
629 745
630 if (smp->remote_irk) { 746 BT_DBG("keydist 0x%x", *keydist);
631 list_del(&smp->remote_irk->list); 747
632 kfree(smp->remote_irk); 748 if (*keydist & SMP_DIST_ENC_KEY) {
749 struct smp_cmd_encrypt_info enc;
750 struct smp_cmd_master_ident ident;
751 struct smp_ltk *ltk;
752 u8 authenticated;
753 __le16 ediv;
754 __le64 rand;
755
756 get_random_bytes(enc.ltk, sizeof(enc.ltk));
757 get_random_bytes(&ediv, sizeof(ediv));
758 get_random_bytes(&rand, sizeof(rand));
759
760 smp_send_cmd(conn, SMP_CMD_ENCRYPT_INFO, sizeof(enc), &enc);
761
762 authenticated = hcon->sec_level == BT_SECURITY_HIGH;
763 ltk = hci_add_ltk(hdev, &hcon->dst, hcon->dst_type,
764 SMP_LTK_SLAVE, authenticated, enc.ltk,
765 smp->enc_key_size, ediv, rand);
766 smp->slave_ltk = ltk;
767
768 ident.ediv = ediv;
769 ident.rand = rand;
770
771 smp_send_cmd(conn, SMP_CMD_MASTER_IDENT, sizeof(ident), &ident);
772
773 *keydist &= ~SMP_DIST_ENC_KEY;
774 }
775
776 if (*keydist & SMP_DIST_ID_KEY) {
777 struct smp_cmd_ident_addr_info addrinfo;
778 struct smp_cmd_ident_info idinfo;
779
780 memcpy(idinfo.irk, hdev->irk, sizeof(idinfo.irk));
781
782 smp_send_cmd(conn, SMP_CMD_IDENT_INFO, sizeof(idinfo), &idinfo);
783
784 /* The hci_conn contains the local identity address
785 * after the connection has been established.
786 *
787 * This is true even when the connection has been
788 * established using a resolvable random address.
789 */
790 bacpy(&addrinfo.bdaddr, &hcon->src);
791 addrinfo.addr_type = hcon->src_type;
792
793 smp_send_cmd(conn, SMP_CMD_IDENT_ADDR_INFO, sizeof(addrinfo),
794 &addrinfo);
795
796 *keydist &= ~SMP_DIST_ID_KEY;
797 }
798
799 if (*keydist & SMP_DIST_SIGN) {
800 struct smp_cmd_sign_info sign;
801 struct smp_csrk *csrk;
802
803 /* Generate a new random key */
804 get_random_bytes(sign.csrk, sizeof(sign.csrk));
805
806 csrk = kzalloc(sizeof(*csrk), GFP_KERNEL);
807 if (csrk) {
808 csrk->master = 0x00;
809 memcpy(csrk->val, sign.csrk, sizeof(csrk->val));
633 } 810 }
811 smp->slave_csrk = csrk;
812
813 smp_send_cmd(conn, SMP_CMD_SIGN_INFO, sizeof(sign), &sign);
814
815 *keydist &= ~SMP_DIST_SIGN;
634 } 816 }
635 817
636 kfree(smp); 818 /* If there are still keys to be received wait for them */
637 conn->smp_chan = NULL; 819 if (smp->remote_key_dist & KEY_DIST_MASK) {
638 hci_conn_drop(conn->hcon); 820 smp_allow_key_dist(smp);
821 return;
822 }
823
824 set_bit(SMP_FLAG_COMPLETE, &smp->flags);
825 smp_notify_keys(conn);
826
827 smp_chan_destroy(conn);
828}
829
830static void smp_timeout(struct work_struct *work)
831{
832 struct smp_chan *smp = container_of(work, struct smp_chan,
833 security_timer.work);
834 struct l2cap_conn *conn = smp->conn;
835
836 BT_DBG("conn %p", conn);
837
838 hci_disconnect(conn->hcon, HCI_ERROR_REMOTE_USER_TERM);
839}
840
841static struct smp_chan *smp_chan_create(struct l2cap_conn *conn)
842{
843 struct l2cap_chan *chan = conn->smp;
844 struct smp_chan *smp;
845
846 smp = kzalloc(sizeof(*smp), GFP_ATOMIC);
847 if (!smp)
848 return NULL;
849
850 smp->tfm_aes = crypto_alloc_blkcipher("ecb(aes)", 0, CRYPTO_ALG_ASYNC);
851 if (IS_ERR(smp->tfm_aes)) {
852 BT_ERR("Unable to create ECB crypto context");
853 kfree(smp);
854 return NULL;
855 }
856
857 smp->conn = conn;
858 chan->data = smp;
859
860 SMP_ALLOW_CMD(smp, SMP_CMD_PAIRING_FAIL);
861
862 INIT_DELAYED_WORK(&smp->security_timer, smp_timeout);
863
864 hci_conn_hold(conn->hcon);
865
866 return smp;
639} 867}
640 868
641int smp_user_confirm_reply(struct hci_conn *hcon, u16 mgmt_op, __le32 passkey) 869int smp_user_confirm_reply(struct hci_conn *hcon, u16 mgmt_op, __le32 passkey)
642{ 870{
643 struct l2cap_conn *conn = hcon->l2cap_data; 871 struct l2cap_conn *conn = hcon->l2cap_data;
872 struct l2cap_chan *chan;
644 struct smp_chan *smp; 873 struct smp_chan *smp;
645 u32 value; 874 u32 value;
875 int err;
646 876
647 BT_DBG(""); 877 BT_DBG("");
648 878
649 if (!conn || !test_bit(HCI_CONN_LE_SMP_PEND, &hcon->flags)) 879 if (!conn)
880 return -ENOTCONN;
881
882 chan = conn->smp;
883 if (!chan)
650 return -ENOTCONN; 884 return -ENOTCONN;
651 885
652 smp = conn->smp_chan; 886 l2cap_chan_lock(chan);
887 if (!chan->data) {
888 err = -ENOTCONN;
889 goto unlock;
890 }
891
892 smp = chan->data;
653 893
654 switch (mgmt_op) { 894 switch (mgmt_op) {
655 case MGMT_OP_USER_PASSKEY_REPLY: 895 case MGMT_OP_USER_PASSKEY_REPLY:
@@ -664,12 +904,16 @@ int smp_user_confirm_reply(struct hci_conn *hcon, u16 mgmt_op, __le32 passkey)
664 case MGMT_OP_USER_PASSKEY_NEG_REPLY: 904 case MGMT_OP_USER_PASSKEY_NEG_REPLY:
665 case MGMT_OP_USER_CONFIRM_NEG_REPLY: 905 case MGMT_OP_USER_CONFIRM_NEG_REPLY:
666 smp_failure(conn, SMP_PASSKEY_ENTRY_FAILED); 906 smp_failure(conn, SMP_PASSKEY_ENTRY_FAILED);
667 return 0; 907 err = 0;
908 goto unlock;
668 default: 909 default:
669 smp_failure(conn, SMP_PASSKEY_ENTRY_FAILED); 910 smp_failure(conn, SMP_PASSKEY_ENTRY_FAILED);
670 return -EOPNOTSUPP; 911 err = -EOPNOTSUPP;
912 goto unlock;
671 } 913 }
672 914
915 err = 0;
916
673 /* If it is our turn to send Pairing Confirm, do so now */ 917 /* If it is our turn to send Pairing Confirm, do so now */
674 if (test_bit(SMP_FLAG_CFM_PENDING, &smp->flags)) { 918 if (test_bit(SMP_FLAG_CFM_PENDING, &smp->flags)) {
675 u8 rsp = smp_confirm(smp); 919 u8 rsp = smp_confirm(smp);
@@ -677,12 +921,15 @@ int smp_user_confirm_reply(struct hci_conn *hcon, u16 mgmt_op, __le32 passkey)
677 smp_failure(conn, rsp); 921 smp_failure(conn, rsp);
678 } 922 }
679 923
680 return 0; 924unlock:
925 l2cap_chan_unlock(chan);
926 return err;
681} 927}
682 928
683static u8 smp_cmd_pairing_req(struct l2cap_conn *conn, struct sk_buff *skb) 929static u8 smp_cmd_pairing_req(struct l2cap_conn *conn, struct sk_buff *skb)
684{ 930{
685 struct smp_cmd_pairing rsp, *req = (void *) skb->data; 931 struct smp_cmd_pairing rsp, *req = (void *) skb->data;
932 struct l2cap_chan *chan = conn->smp;
686 struct hci_dev *hdev = conn->hcon->hdev; 933 struct hci_dev *hdev = conn->hcon->hdev;
687 struct smp_chan *smp; 934 struct smp_chan *smp;
688 u8 key_size, auth, sec_level; 935 u8 key_size, auth, sec_level;
@@ -696,26 +943,30 @@ static u8 smp_cmd_pairing_req(struct l2cap_conn *conn, struct sk_buff *skb)
696 if (conn->hcon->role != HCI_ROLE_SLAVE) 943 if (conn->hcon->role != HCI_ROLE_SLAVE)
697 return SMP_CMD_NOTSUPP; 944 return SMP_CMD_NOTSUPP;
698 945
699 if (!test_and_set_bit(HCI_CONN_LE_SMP_PEND, &conn->hcon->flags)) 946 if (!chan->data)
700 smp = smp_chan_create(conn); 947 smp = smp_chan_create(conn);
701 else 948 else
702 smp = conn->smp_chan; 949 smp = chan->data;
703 950
704 if (!smp) 951 if (!smp)
705 return SMP_UNSPECIFIED; 952 return SMP_UNSPECIFIED;
706 953
954 /* We didn't start the pairing, so match remote */
955 auth = req->auth_req & AUTH_REQ_MASK;
956
707 if (!test_bit(HCI_BONDABLE, &hdev->dev_flags) && 957 if (!test_bit(HCI_BONDABLE, &hdev->dev_flags) &&
708 (req->auth_req & SMP_AUTH_BONDING)) 958 (auth & SMP_AUTH_BONDING))
709 return SMP_PAIRING_NOTSUPP; 959 return SMP_PAIRING_NOTSUPP;
710 960
711 smp->preq[0] = SMP_CMD_PAIRING_REQ; 961 smp->preq[0] = SMP_CMD_PAIRING_REQ;
712 memcpy(&smp->preq[1], req, sizeof(*req)); 962 memcpy(&smp->preq[1], req, sizeof(*req));
713 skb_pull(skb, sizeof(*req)); 963 skb_pull(skb, sizeof(*req));
714 964
715 /* We didn't start the pairing, so match remote */ 965 if (conn->hcon->io_capability == HCI_IO_NO_INPUT_OUTPUT)
716 auth = req->auth_req; 966 sec_level = BT_SECURITY_MEDIUM;
967 else
968 sec_level = authreq_to_seclevel(auth);
717 969
718 sec_level = authreq_to_seclevel(auth);
719 if (sec_level > conn->hcon->pending_sec_level) 970 if (sec_level > conn->hcon->pending_sec_level)
720 conn->hcon->pending_sec_level = sec_level; 971 conn->hcon->pending_sec_level = sec_level;
721 972
@@ -741,6 +992,7 @@ static u8 smp_cmd_pairing_req(struct l2cap_conn *conn, struct sk_buff *skb)
741 memcpy(&smp->prsp[1], &rsp, sizeof(rsp)); 992 memcpy(&smp->prsp[1], &rsp, sizeof(rsp));
742 993
743 smp_send_cmd(conn, SMP_CMD_PAIRING_RSP, sizeof(rsp), &rsp); 994 smp_send_cmd(conn, SMP_CMD_PAIRING_RSP, sizeof(rsp), &rsp);
995 SMP_ALLOW_CMD(smp, SMP_CMD_PAIRING_CONFIRM);
744 996
745 /* Request setup of TK */ 997 /* Request setup of TK */
746 ret = tk_request(conn, 0, auth, rsp.io_capability, req->io_capability); 998 ret = tk_request(conn, 0, auth, rsp.io_capability, req->io_capability);
@@ -753,8 +1005,9 @@ static u8 smp_cmd_pairing_req(struct l2cap_conn *conn, struct sk_buff *skb)
753static u8 smp_cmd_pairing_rsp(struct l2cap_conn *conn, struct sk_buff *skb) 1005static u8 smp_cmd_pairing_rsp(struct l2cap_conn *conn, struct sk_buff *skb)
754{ 1006{
755 struct smp_cmd_pairing *req, *rsp = (void *) skb->data; 1007 struct smp_cmd_pairing *req, *rsp = (void *) skb->data;
756 struct smp_chan *smp = conn->smp_chan; 1008 struct l2cap_chan *chan = conn->smp;
757 u8 key_size, auth = SMP_AUTH_NONE; 1009 struct smp_chan *smp = chan->data;
1010 u8 key_size, auth;
758 int ret; 1011 int ret;
759 1012
760 BT_DBG("conn %p", conn); 1013 BT_DBG("conn %p", conn);
@@ -773,6 +1026,8 @@ static u8 smp_cmd_pairing_rsp(struct l2cap_conn *conn, struct sk_buff *skb)
773 if (check_enc_key_size(conn, key_size)) 1026 if (check_enc_key_size(conn, key_size))
774 return SMP_ENC_KEY_SIZE; 1027 return SMP_ENC_KEY_SIZE;
775 1028
1029 auth = rsp->auth_req & AUTH_REQ_MASK;
1030
776 /* If we need MITM check that it can be acheived */ 1031 /* If we need MITM check that it can be acheived */
777 if (conn->hcon->pending_sec_level >= BT_SECURITY_HIGH) { 1032 if (conn->hcon->pending_sec_level >= BT_SECURITY_HIGH) {
778 u8 method; 1033 u8 method;
@@ -793,11 +1048,7 @@ static u8 smp_cmd_pairing_rsp(struct l2cap_conn *conn, struct sk_buff *skb)
793 */ 1048 */
794 smp->remote_key_dist &= rsp->resp_key_dist; 1049 smp->remote_key_dist &= rsp->resp_key_dist;
795 1050
796 if ((req->auth_req & SMP_AUTH_BONDING) && 1051 auth |= req->auth_req;
797 (rsp->auth_req & SMP_AUTH_BONDING))
798 auth = SMP_AUTH_BONDING;
799
800 auth |= (req->auth_req | rsp->auth_req) & SMP_AUTH_MITM;
801 1052
802 ret = tk_request(conn, 0, auth, req->io_capability, rsp->io_capability); 1053 ret = tk_request(conn, 0, auth, req->io_capability, rsp->io_capability);
803 if (ret) 1054 if (ret)
@@ -814,7 +1065,8 @@ static u8 smp_cmd_pairing_rsp(struct l2cap_conn *conn, struct sk_buff *skb)
814 1065
815static u8 smp_cmd_pairing_confirm(struct l2cap_conn *conn, struct sk_buff *skb) 1066static u8 smp_cmd_pairing_confirm(struct l2cap_conn *conn, struct sk_buff *skb)
816{ 1067{
817 struct smp_chan *smp = conn->smp_chan; 1068 struct l2cap_chan *chan = conn->smp;
1069 struct smp_chan *smp = chan->data;
818 1070
819 BT_DBG("conn %p %s", conn, conn->hcon->out ? "master" : "slave"); 1071 BT_DBG("conn %p %s", conn, conn->hcon->out ? "master" : "slave");
820 1072
@@ -824,10 +1076,14 @@ static u8 smp_cmd_pairing_confirm(struct l2cap_conn *conn, struct sk_buff *skb)
824 memcpy(smp->pcnf, skb->data, sizeof(smp->pcnf)); 1076 memcpy(smp->pcnf, skb->data, sizeof(smp->pcnf));
825 skb_pull(skb, sizeof(smp->pcnf)); 1077 skb_pull(skb, sizeof(smp->pcnf));
826 1078
827 if (conn->hcon->out) 1079 if (conn->hcon->out) {
828 smp_send_cmd(conn, SMP_CMD_PAIRING_RANDOM, sizeof(smp->prnd), 1080 smp_send_cmd(conn, SMP_CMD_PAIRING_RANDOM, sizeof(smp->prnd),
829 smp->prnd); 1081 smp->prnd);
830 else if (test_bit(SMP_FLAG_TK_VALID, &smp->flags)) 1082 SMP_ALLOW_CMD(smp, SMP_CMD_PAIRING_RANDOM);
1083 return 0;
1084 }
1085
1086 if (test_bit(SMP_FLAG_TK_VALID, &smp->flags))
831 return smp_confirm(smp); 1087 return smp_confirm(smp);
832 else 1088 else
833 set_bit(SMP_FLAG_CFM_PENDING, &smp->flags); 1089 set_bit(SMP_FLAG_CFM_PENDING, &smp->flags);
@@ -837,7 +1093,8 @@ static u8 smp_cmd_pairing_confirm(struct l2cap_conn *conn, struct sk_buff *skb)
837 1093
838static u8 smp_cmd_pairing_random(struct l2cap_conn *conn, struct sk_buff *skb) 1094static u8 smp_cmd_pairing_random(struct l2cap_conn *conn, struct sk_buff *skb)
839{ 1095{
840 struct smp_chan *smp = conn->smp_chan; 1096 struct l2cap_chan *chan = conn->smp;
1097 struct smp_chan *smp = chan->data;
841 1098
842 BT_DBG("conn %p", conn); 1099 BT_DBG("conn %p", conn);
843 1100
@@ -860,7 +1117,7 @@ static bool smp_ltk_encrypt(struct l2cap_conn *conn, u8 sec_level)
860 if (!key) 1117 if (!key)
861 return false; 1118 return false;
862 1119
863 if (sec_level > BT_SECURITY_MEDIUM && !key->authenticated) 1120 if (smp_ltk_sec_level(key) < sec_level)
864 return false; 1121 return false;
865 1122
866 if (test_and_set_bit(HCI_CONN_ENCRYPT_PEND, &hcon->flags)) 1123 if (test_and_set_bit(HCI_CONN_ENCRYPT_PEND, &hcon->flags))
@@ -903,7 +1160,7 @@ static u8 smp_cmd_security_req(struct l2cap_conn *conn, struct sk_buff *skb)
903 struct smp_cmd_pairing cp; 1160 struct smp_cmd_pairing cp;
904 struct hci_conn *hcon = conn->hcon; 1161 struct hci_conn *hcon = conn->hcon;
905 struct smp_chan *smp; 1162 struct smp_chan *smp;
906 u8 sec_level; 1163 u8 sec_level, auth;
907 1164
908 BT_DBG("conn %p", conn); 1165 BT_DBG("conn %p", conn);
909 1166
@@ -913,7 +1170,13 @@ static u8 smp_cmd_security_req(struct l2cap_conn *conn, struct sk_buff *skb)
913 if (hcon->role != HCI_ROLE_MASTER) 1170 if (hcon->role != HCI_ROLE_MASTER)
914 return SMP_CMD_NOTSUPP; 1171 return SMP_CMD_NOTSUPP;
915 1172
916 sec_level = authreq_to_seclevel(rp->auth_req); 1173 auth = rp->auth_req & AUTH_REQ_MASK;
1174
1175 if (hcon->io_capability == HCI_IO_NO_INPUT_OUTPUT)
1176 sec_level = BT_SECURITY_MEDIUM;
1177 else
1178 sec_level = authreq_to_seclevel(auth);
1179
917 if (smp_sufficient_security(hcon, sec_level)) 1180 if (smp_sufficient_security(hcon, sec_level))
918 return 0; 1181 return 0;
919 1182
@@ -923,26 +1186,24 @@ static u8 smp_cmd_security_req(struct l2cap_conn *conn, struct sk_buff *skb)
923 if (smp_ltk_encrypt(conn, hcon->pending_sec_level)) 1186 if (smp_ltk_encrypt(conn, hcon->pending_sec_level))
924 return 0; 1187 return 0;
925 1188
926 if (test_and_set_bit(HCI_CONN_LE_SMP_PEND, &hcon->flags))
927 return 0;
928
929 smp = smp_chan_create(conn); 1189 smp = smp_chan_create(conn);
930 if (!smp) 1190 if (!smp)
931 return SMP_UNSPECIFIED; 1191 return SMP_UNSPECIFIED;
932 1192
933 if (!test_bit(HCI_BONDABLE, &hcon->hdev->dev_flags) && 1193 if (!test_bit(HCI_BONDABLE, &hcon->hdev->dev_flags) &&
934 (rp->auth_req & SMP_AUTH_BONDING)) 1194 (auth & SMP_AUTH_BONDING))
935 return SMP_PAIRING_NOTSUPP; 1195 return SMP_PAIRING_NOTSUPP;
936 1196
937 skb_pull(skb, sizeof(*rp)); 1197 skb_pull(skb, sizeof(*rp));
938 1198
939 memset(&cp, 0, sizeof(cp)); 1199 memset(&cp, 0, sizeof(cp));
940 build_pairing_cmd(conn, &cp, NULL, rp->auth_req); 1200 build_pairing_cmd(conn, &cp, NULL, auth);
941 1201
942 smp->preq[0] = SMP_CMD_PAIRING_REQ; 1202 smp->preq[0] = SMP_CMD_PAIRING_REQ;
943 memcpy(&smp->preq[1], &cp, sizeof(cp)); 1203 memcpy(&smp->preq[1], &cp, sizeof(cp));
944 1204
945 smp_send_cmd(conn, SMP_CMD_PAIRING_REQ, sizeof(cp), &cp); 1205 smp_send_cmd(conn, SMP_CMD_PAIRING_REQ, sizeof(cp), &cp);
1206 SMP_ALLOW_CMD(smp, SMP_CMD_PAIRING_RSP);
946 1207
947 return 0; 1208 return 0;
948} 1209}
@@ -950,8 +1211,10 @@ static u8 smp_cmd_security_req(struct l2cap_conn *conn, struct sk_buff *skb)
950int smp_conn_security(struct hci_conn *hcon, __u8 sec_level) 1211int smp_conn_security(struct hci_conn *hcon, __u8 sec_level)
951{ 1212{
952 struct l2cap_conn *conn = hcon->l2cap_data; 1213 struct l2cap_conn *conn = hcon->l2cap_data;
1214 struct l2cap_chan *chan;
953 struct smp_chan *smp; 1215 struct smp_chan *smp;
954 __u8 authreq; 1216 __u8 authreq;
1217 int ret;
955 1218
956 BT_DBG("conn %p hcon %p level 0x%2.2x", conn, hcon, sec_level); 1219 BT_DBG("conn %p hcon %p level 0x%2.2x", conn, hcon, sec_level);
957 1220
@@ -959,6 +1222,8 @@ int smp_conn_security(struct hci_conn *hcon, __u8 sec_level)
959 if (!conn) 1222 if (!conn)
960 return 1; 1223 return 1;
961 1224
1225 chan = conn->smp;
1226
962 if (!test_bit(HCI_LE_ENABLED, &hcon->hdev->dev_flags)) 1227 if (!test_bit(HCI_LE_ENABLED, &hcon->hdev->dev_flags))
963 return 1; 1228 return 1;
964 1229
@@ -972,12 +1237,19 @@ int smp_conn_security(struct hci_conn *hcon, __u8 sec_level)
972 if (smp_ltk_encrypt(conn, hcon->pending_sec_level)) 1237 if (smp_ltk_encrypt(conn, hcon->pending_sec_level))
973 return 0; 1238 return 0;
974 1239
975 if (test_and_set_bit(HCI_CONN_LE_SMP_PEND, &hcon->flags)) 1240 l2cap_chan_lock(chan);
976 return 0; 1241
1242 /* If SMP is already in progress ignore this request */
1243 if (chan->data) {
1244 ret = 0;
1245 goto unlock;
1246 }
977 1247
978 smp = smp_chan_create(conn); 1248 smp = smp_chan_create(conn);
979 if (!smp) 1249 if (!smp) {
980 return 1; 1250 ret = 1;
1251 goto unlock;
1252 }
981 1253
982 authreq = seclevel_to_authreq(sec_level); 1254 authreq = seclevel_to_authreq(sec_level);
983 1255
@@ -996,30 +1268,34 @@ int smp_conn_security(struct hci_conn *hcon, __u8 sec_level)
996 memcpy(&smp->preq[1], &cp, sizeof(cp)); 1268 memcpy(&smp->preq[1], &cp, sizeof(cp));
997 1269
998 smp_send_cmd(conn, SMP_CMD_PAIRING_REQ, sizeof(cp), &cp); 1270 smp_send_cmd(conn, SMP_CMD_PAIRING_REQ, sizeof(cp), &cp);
1271 SMP_ALLOW_CMD(smp, SMP_CMD_PAIRING_RSP);
999 } else { 1272 } else {
1000 struct smp_cmd_security_req cp; 1273 struct smp_cmd_security_req cp;
1001 cp.auth_req = authreq; 1274 cp.auth_req = authreq;
1002 smp_send_cmd(conn, SMP_CMD_SECURITY_REQ, sizeof(cp), &cp); 1275 smp_send_cmd(conn, SMP_CMD_SECURITY_REQ, sizeof(cp), &cp);
1276 SMP_ALLOW_CMD(smp, SMP_CMD_PAIRING_REQ);
1003 } 1277 }
1004 1278
1005 set_bit(SMP_FLAG_INITIATOR, &smp->flags); 1279 set_bit(SMP_FLAG_INITIATOR, &smp->flags);
1280 ret = 0;
1006 1281
1007 return 0; 1282unlock:
1283 l2cap_chan_unlock(chan);
1284 return ret;
1008} 1285}
1009 1286
1010static int smp_cmd_encrypt_info(struct l2cap_conn *conn, struct sk_buff *skb) 1287static int smp_cmd_encrypt_info(struct l2cap_conn *conn, struct sk_buff *skb)
1011{ 1288{
1012 struct smp_cmd_encrypt_info *rp = (void *) skb->data; 1289 struct smp_cmd_encrypt_info *rp = (void *) skb->data;
1013 struct smp_chan *smp = conn->smp_chan; 1290 struct l2cap_chan *chan = conn->smp;
1291 struct smp_chan *smp = chan->data;
1014 1292
1015 BT_DBG("conn %p", conn); 1293 BT_DBG("conn %p", conn);
1016 1294
1017 if (skb->len < sizeof(*rp)) 1295 if (skb->len < sizeof(*rp))
1018 return SMP_INVALID_PARAMS; 1296 return SMP_INVALID_PARAMS;
1019 1297
1020 /* Ignore this PDU if it wasn't requested */ 1298 SMP_ALLOW_CMD(smp, SMP_CMD_MASTER_IDENT);
1021 if (!(smp->remote_key_dist & SMP_DIST_ENC_KEY))
1022 return 0;
1023 1299
1024 skb_pull(skb, sizeof(*rp)); 1300 skb_pull(skb, sizeof(*rp));
1025 1301
@@ -1031,7 +1307,8 @@ static int smp_cmd_encrypt_info(struct l2cap_conn *conn, struct sk_buff *skb)
1031static int smp_cmd_master_ident(struct l2cap_conn *conn, struct sk_buff *skb) 1307static int smp_cmd_master_ident(struct l2cap_conn *conn, struct sk_buff *skb)
1032{ 1308{
1033 struct smp_cmd_master_ident *rp = (void *) skb->data; 1309 struct smp_cmd_master_ident *rp = (void *) skb->data;
1034 struct smp_chan *smp = conn->smp_chan; 1310 struct l2cap_chan *chan = conn->smp;
1311 struct smp_chan *smp = chan->data;
1035 struct hci_dev *hdev = conn->hcon->hdev; 1312 struct hci_dev *hdev = conn->hcon->hdev;
1036 struct hci_conn *hcon = conn->hcon; 1313 struct hci_conn *hcon = conn->hcon;
1037 struct smp_ltk *ltk; 1314 struct smp_ltk *ltk;
@@ -1042,13 +1319,14 @@ static int smp_cmd_master_ident(struct l2cap_conn *conn, struct sk_buff *skb)
1042 if (skb->len < sizeof(*rp)) 1319 if (skb->len < sizeof(*rp))
1043 return SMP_INVALID_PARAMS; 1320 return SMP_INVALID_PARAMS;
1044 1321
1045 /* Ignore this PDU if it wasn't requested */
1046 if (!(smp->remote_key_dist & SMP_DIST_ENC_KEY))
1047 return 0;
1048
1049 /* Mark the information as received */ 1322 /* Mark the information as received */
1050 smp->remote_key_dist &= ~SMP_DIST_ENC_KEY; 1323 smp->remote_key_dist &= ~SMP_DIST_ENC_KEY;
1051 1324
1325 if (smp->remote_key_dist & SMP_DIST_ID_KEY)
1326 SMP_ALLOW_CMD(smp, SMP_CMD_IDENT_INFO);
1327 else if (smp->remote_key_dist & SMP_DIST_SIGN)
1328 SMP_ALLOW_CMD(smp, SMP_CMD_SIGN_INFO);
1329
1052 skb_pull(skb, sizeof(*rp)); 1330 skb_pull(skb, sizeof(*rp));
1053 1331
1054 hci_dev_lock(hdev); 1332 hci_dev_lock(hdev);
@@ -1057,8 +1335,8 @@ static int smp_cmd_master_ident(struct l2cap_conn *conn, struct sk_buff *skb)
1057 authenticated, smp->tk, smp->enc_key_size, 1335 authenticated, smp->tk, smp->enc_key_size,
1058 rp->ediv, rp->rand); 1336 rp->ediv, rp->rand);
1059 smp->ltk = ltk; 1337 smp->ltk = ltk;
1060 if (!(smp->remote_key_dist & SMP_DIST_ID_KEY)) 1338 if (!(smp->remote_key_dist & KEY_DIST_MASK))
1061 smp_distribute_keys(conn); 1339 smp_distribute_keys(smp);
1062 hci_dev_unlock(hdev); 1340 hci_dev_unlock(hdev);
1063 1341
1064 return 0; 1342 return 0;
@@ -1067,16 +1345,15 @@ static int smp_cmd_master_ident(struct l2cap_conn *conn, struct sk_buff *skb)
1067static int smp_cmd_ident_info(struct l2cap_conn *conn, struct sk_buff *skb) 1345static int smp_cmd_ident_info(struct l2cap_conn *conn, struct sk_buff *skb)
1068{ 1346{
1069 struct smp_cmd_ident_info *info = (void *) skb->data; 1347 struct smp_cmd_ident_info *info = (void *) skb->data;
1070 struct smp_chan *smp = conn->smp_chan; 1348 struct l2cap_chan *chan = conn->smp;
1349 struct smp_chan *smp = chan->data;
1071 1350
1072 BT_DBG(""); 1351 BT_DBG("");
1073 1352
1074 if (skb->len < sizeof(*info)) 1353 if (skb->len < sizeof(*info))
1075 return SMP_INVALID_PARAMS; 1354 return SMP_INVALID_PARAMS;
1076 1355
1077 /* Ignore this PDU if it wasn't requested */ 1356 SMP_ALLOW_CMD(smp, SMP_CMD_IDENT_ADDR_INFO);
1078 if (!(smp->remote_key_dist & SMP_DIST_ID_KEY))
1079 return 0;
1080 1357
1081 skb_pull(skb, sizeof(*info)); 1358 skb_pull(skb, sizeof(*info));
1082 1359
@@ -1089,7 +1366,8 @@ static int smp_cmd_ident_addr_info(struct l2cap_conn *conn,
1089 struct sk_buff *skb) 1366 struct sk_buff *skb)
1090{ 1367{
1091 struct smp_cmd_ident_addr_info *info = (void *) skb->data; 1368 struct smp_cmd_ident_addr_info *info = (void *) skb->data;
1092 struct smp_chan *smp = conn->smp_chan; 1369 struct l2cap_chan *chan = conn->smp;
1370 struct smp_chan *smp = chan->data;
1093 struct hci_conn *hcon = conn->hcon; 1371 struct hci_conn *hcon = conn->hcon;
1094 bdaddr_t rpa; 1372 bdaddr_t rpa;
1095 1373
@@ -1098,13 +1376,12 @@ static int smp_cmd_ident_addr_info(struct l2cap_conn *conn,
1098 if (skb->len < sizeof(*info)) 1376 if (skb->len < sizeof(*info))
1099 return SMP_INVALID_PARAMS; 1377 return SMP_INVALID_PARAMS;
1100 1378
1101 /* Ignore this PDU if it wasn't requested */
1102 if (!(smp->remote_key_dist & SMP_DIST_ID_KEY))
1103 return 0;
1104
1105 /* Mark the information as received */ 1379 /* Mark the information as received */
1106 smp->remote_key_dist &= ~SMP_DIST_ID_KEY; 1380 smp->remote_key_dist &= ~SMP_DIST_ID_KEY;
1107 1381
1382 if (smp->remote_key_dist & SMP_DIST_SIGN)
1383 SMP_ALLOW_CMD(smp, SMP_CMD_SIGN_INFO);
1384
1108 skb_pull(skb, sizeof(*info)); 1385 skb_pull(skb, sizeof(*info));
1109 1386
1110 hci_dev_lock(hcon->hdev); 1387 hci_dev_lock(hcon->hdev);
@@ -1133,7 +1410,8 @@ static int smp_cmd_ident_addr_info(struct l2cap_conn *conn,
1133 smp->id_addr_type, smp->irk, &rpa); 1410 smp->id_addr_type, smp->irk, &rpa);
1134 1411
1135distribute: 1412distribute:
1136 smp_distribute_keys(conn); 1413 if (!(smp->remote_key_dist & KEY_DIST_MASK))
1414 smp_distribute_keys(smp);
1137 1415
1138 hci_dev_unlock(hcon->hdev); 1416 hci_dev_unlock(hcon->hdev);
1139 1417
@@ -1143,7 +1421,8 @@ distribute:
1143static int smp_cmd_sign_info(struct l2cap_conn *conn, struct sk_buff *skb) 1421static int smp_cmd_sign_info(struct l2cap_conn *conn, struct sk_buff *skb)
1144{ 1422{
1145 struct smp_cmd_sign_info *rp = (void *) skb->data; 1423 struct smp_cmd_sign_info *rp = (void *) skb->data;
1146 struct smp_chan *smp = conn->smp_chan; 1424 struct l2cap_chan *chan = conn->smp;
1425 struct smp_chan *smp = chan->data;
1147 struct hci_dev *hdev = conn->hcon->hdev; 1426 struct hci_dev *hdev = conn->hcon->hdev;
1148 struct smp_csrk *csrk; 1427 struct smp_csrk *csrk;
1149 1428
@@ -1152,10 +1431,6 @@ static int smp_cmd_sign_info(struct l2cap_conn *conn, struct sk_buff *skb)
1152 if (skb->len < sizeof(*rp)) 1431 if (skb->len < sizeof(*rp))
1153 return SMP_INVALID_PARAMS; 1432 return SMP_INVALID_PARAMS;
1154 1433
1155 /* Ignore this PDU if it wasn't requested */
1156 if (!(smp->remote_key_dist & SMP_DIST_SIGN))
1157 return 0;
1158
1159 /* Mark the information as received */ 1434 /* Mark the information as received */
1160 smp->remote_key_dist &= ~SMP_DIST_SIGN; 1435 smp->remote_key_dist &= ~SMP_DIST_SIGN;
1161 1436
@@ -1168,16 +1443,17 @@ static int smp_cmd_sign_info(struct l2cap_conn *conn, struct sk_buff *skb)
1168 memcpy(csrk->val, rp->csrk, sizeof(csrk->val)); 1443 memcpy(csrk->val, rp->csrk, sizeof(csrk->val));
1169 } 1444 }
1170 smp->csrk = csrk; 1445 smp->csrk = csrk;
1171 if (!(smp->remote_key_dist & SMP_DIST_SIGN)) 1446 smp_distribute_keys(smp);
1172 smp_distribute_keys(conn);
1173 hci_dev_unlock(hdev); 1447 hci_dev_unlock(hdev);
1174 1448
1175 return 0; 1449 return 0;
1176} 1450}
1177 1451
1178int smp_sig_channel(struct l2cap_conn *conn, struct sk_buff *skb) 1452static int smp_sig_channel(struct l2cap_chan *chan, struct sk_buff *skb)
1179{ 1453{
1454 struct l2cap_conn *conn = chan->conn;
1180 struct hci_conn *hcon = conn->hcon; 1455 struct hci_conn *hcon = conn->hcon;
1456 struct smp_chan *smp;
1181 __u8 code, reason; 1457 __u8 code, reason;
1182 int err = 0; 1458 int err = 0;
1183 1459
@@ -1186,13 +1462,10 @@ int smp_sig_channel(struct l2cap_conn *conn, struct sk_buff *skb)
1186 return 0; 1462 return 0;
1187 } 1463 }
1188 1464
1189 if (skb->len < 1) { 1465 if (skb->len < 1)
1190 kfree_skb(skb);
1191 return -EILSEQ; 1466 return -EILSEQ;
1192 }
1193 1467
1194 if (!test_bit(HCI_LE_ENABLED, &hcon->hdev->dev_flags)) { 1468 if (!test_bit(HCI_LE_ENABLED, &hcon->hdev->dev_flags)) {
1195 err = -EOPNOTSUPP;
1196 reason = SMP_PAIRING_NOTSUPP; 1469 reason = SMP_PAIRING_NOTSUPP;
1197 goto done; 1470 goto done;
1198 } 1471 }
@@ -1200,18 +1473,19 @@ int smp_sig_channel(struct l2cap_conn *conn, struct sk_buff *skb)
1200 code = skb->data[0]; 1473 code = skb->data[0];
1201 skb_pull(skb, sizeof(code)); 1474 skb_pull(skb, sizeof(code));
1202 1475
1203 /* 1476 smp = chan->data;
1204 * The SMP context must be initialized for all other PDUs except 1477
1205 * pairing and security requests. If we get any other PDU when 1478 if (code > SMP_CMD_MAX)
1206 * not initialized simply disconnect (done if this function 1479 goto drop;
1207 * returns an error). 1480
1481 if (smp && !test_and_clear_bit(code, &smp->allow_cmd))
1482 goto drop;
1483
1484 /* If we don't have a context the only allowed commands are
1485 * pairing request and security request.
1208 */ 1486 */
1209 if (code != SMP_CMD_PAIRING_REQ && code != SMP_CMD_SECURITY_REQ && 1487 if (!smp && code != SMP_CMD_PAIRING_REQ && code != SMP_CMD_SECURITY_REQ)
1210 !conn->smp_chan) { 1488 goto drop;
1211 BT_ERR("Unexpected SMP command 0x%02x. Disconnecting.", code);
1212 kfree_skb(skb);
1213 return -EOPNOTSUPP;
1214 }
1215 1489
1216 switch (code) { 1490 switch (code) {
1217 case SMP_CMD_PAIRING_REQ: 1491 case SMP_CMD_PAIRING_REQ:
@@ -1220,7 +1494,6 @@ int smp_sig_channel(struct l2cap_conn *conn, struct sk_buff *skb)
1220 1494
1221 case SMP_CMD_PAIRING_FAIL: 1495 case SMP_CMD_PAIRING_FAIL:
1222 smp_failure(conn, 0); 1496 smp_failure(conn, 0);
1223 reason = 0;
1224 err = -EPERM; 1497 err = -EPERM;
1225 break; 1498 break;
1226 1499
@@ -1262,197 +1535,217 @@ int smp_sig_channel(struct l2cap_conn *conn, struct sk_buff *skb)
1262 1535
1263 default: 1536 default:
1264 BT_DBG("Unknown command code 0x%2.2x", code); 1537 BT_DBG("Unknown command code 0x%2.2x", code);
1265
1266 reason = SMP_CMD_NOTSUPP; 1538 reason = SMP_CMD_NOTSUPP;
1267 err = -EOPNOTSUPP;
1268 goto done; 1539 goto done;
1269 } 1540 }
1270 1541
1271done: 1542done:
1272 if (reason) 1543 if (!err) {
1273 smp_failure(conn, reason); 1544 if (reason)
1545 smp_failure(conn, reason);
1546 kfree_skb(skb);
1547 }
1274 1548
1275 kfree_skb(skb);
1276 return err; 1549 return err;
1550
1551drop:
1552 BT_ERR("%s unexpected SMP command 0x%02x from %pMR", hcon->hdev->name,
1553 code, &hcon->dst);
1554 kfree_skb(skb);
1555 return 0;
1277} 1556}
1278 1557
1279static void smp_notify_keys(struct l2cap_conn *conn) 1558static void smp_teardown_cb(struct l2cap_chan *chan, int err)
1280{ 1559{
1281 struct smp_chan *smp = conn->smp_chan; 1560 struct l2cap_conn *conn = chan->conn;
1282 struct hci_conn *hcon = conn->hcon;
1283 struct hci_dev *hdev = hcon->hdev;
1284 struct smp_cmd_pairing *req = (void *) &smp->preq[1];
1285 struct smp_cmd_pairing *rsp = (void *) &smp->prsp[1];
1286 bool persistent;
1287 1561
1288 if (smp->remote_irk) { 1562 BT_DBG("chan %p", chan);
1289 mgmt_new_irk(hdev, smp->remote_irk);
1290 /* Now that user space can be considered to know the
1291 * identity address track the connection based on it
1292 * from now on.
1293 */
1294 bacpy(&hcon->dst, &smp->remote_irk->bdaddr);
1295 hcon->dst_type = smp->remote_irk->addr_type;
1296 l2cap_conn_update_id_addr(hcon);
1297 1563
1298 /* When receiving an indentity resolving key for 1564 if (chan->data)
1299 * a remote device that does not use a resolvable 1565 smp_chan_destroy(conn);
1300 * private address, just remove the key so that
1301 * it is possible to use the controller white
1302 * list for scanning.
1303 *
1304 * Userspace will have been told to not store
1305 * this key at this point. So it is safe to
1306 * just remove it.
1307 */
1308 if (!bacmp(&smp->remote_irk->rpa, BDADDR_ANY)) {
1309 list_del(&smp->remote_irk->list);
1310 kfree(smp->remote_irk);
1311 smp->remote_irk = NULL;
1312 }
1313 }
1314 1566
1315 /* The LTKs and CSRKs should be persistent only if both sides 1567 conn->smp = NULL;
1316 * had the bonding bit set in their authentication requests. 1568 l2cap_chan_put(chan);
1317 */ 1569}
1318 persistent = !!((req->auth_req & rsp->auth_req) & SMP_AUTH_BONDING);
1319 1570
1320 if (smp->csrk) { 1571static void smp_resume_cb(struct l2cap_chan *chan)
1321 smp->csrk->bdaddr_type = hcon->dst_type; 1572{
1322 bacpy(&smp->csrk->bdaddr, &hcon->dst); 1573 struct smp_chan *smp = chan->data;
1323 mgmt_new_csrk(hdev, smp->csrk, persistent); 1574 struct l2cap_conn *conn = chan->conn;
1324 } 1575 struct hci_conn *hcon = conn->hcon;
1325 1576
1326 if (smp->slave_csrk) { 1577 BT_DBG("chan %p", chan);
1327 smp->slave_csrk->bdaddr_type = hcon->dst_type;
1328 bacpy(&smp->slave_csrk->bdaddr, &hcon->dst);
1329 mgmt_new_csrk(hdev, smp->slave_csrk, persistent);
1330 }
1331 1578
1332 if (smp->ltk) { 1579 if (!smp)
1333 smp->ltk->bdaddr_type = hcon->dst_type; 1580 return;
1334 bacpy(&smp->ltk->bdaddr, &hcon->dst);
1335 mgmt_new_ltk(hdev, smp->ltk, persistent);
1336 }
1337 1581
1338 if (smp->slave_ltk) { 1582 if (!test_bit(HCI_CONN_ENCRYPT, &hcon->flags))
1339 smp->slave_ltk->bdaddr_type = hcon->dst_type; 1583 return;
1340 bacpy(&smp->slave_ltk->bdaddr, &hcon->dst); 1584
1341 mgmt_new_ltk(hdev, smp->slave_ltk, persistent); 1585 cancel_delayed_work(&smp->security_timer);
1342 } 1586
1587 smp_distribute_keys(smp);
1343} 1588}
1344 1589
1345int smp_distribute_keys(struct l2cap_conn *conn) 1590static void smp_ready_cb(struct l2cap_chan *chan)
1346{ 1591{
1347 struct smp_cmd_pairing *req, *rsp; 1592 struct l2cap_conn *conn = chan->conn;
1348 struct smp_chan *smp = conn->smp_chan;
1349 struct hci_conn *hcon = conn->hcon;
1350 struct hci_dev *hdev = hcon->hdev;
1351 __u8 *keydist;
1352 1593
1353 BT_DBG("conn %p", conn); 1594 BT_DBG("chan %p", chan);
1354 1595
1355 if (!test_bit(HCI_CONN_LE_SMP_PEND, &hcon->flags)) 1596 conn->smp = chan;
1356 return 0; 1597 l2cap_chan_hold(chan);
1598}
1357 1599
1358 rsp = (void *) &smp->prsp[1]; 1600static int smp_recv_cb(struct l2cap_chan *chan, struct sk_buff *skb)
1601{
1602 int err;
1359 1603
1360 /* The responder sends its keys first */ 1604 BT_DBG("chan %p", chan);
1361 if (hcon->out && (smp->remote_key_dist & 0x07))
1362 return 0;
1363 1605
1364 req = (void *) &smp->preq[1]; 1606 err = smp_sig_channel(chan, skb);
1607 if (err) {
1608 struct smp_chan *smp = chan->data;
1365 1609
1366 if (hcon->out) { 1610 if (smp)
1367 keydist = &rsp->init_key_dist; 1611 cancel_delayed_work_sync(&smp->security_timer);
1368 *keydist &= req->init_key_dist; 1612
1369 } else { 1613 hci_disconnect(chan->conn->hcon, HCI_ERROR_AUTH_FAILURE);
1370 keydist = &rsp->resp_key_dist;
1371 *keydist &= req->resp_key_dist;
1372 } 1614 }
1373 1615
1374 BT_DBG("keydist 0x%x", *keydist); 1616 return err;
1617}
1375 1618
1376 if (*keydist & SMP_DIST_ENC_KEY) { 1619static struct sk_buff *smp_alloc_skb_cb(struct l2cap_chan *chan,
1377 struct smp_cmd_encrypt_info enc; 1620 unsigned long hdr_len,
1378 struct smp_cmd_master_ident ident; 1621 unsigned long len, int nb)
1379 struct smp_ltk *ltk; 1622{
1380 u8 authenticated; 1623 struct sk_buff *skb;
1381 __le16 ediv;
1382 __le64 rand;
1383 1624
1384 get_random_bytes(enc.ltk, sizeof(enc.ltk)); 1625 skb = bt_skb_alloc(hdr_len + len, GFP_KERNEL);
1385 get_random_bytes(&ediv, sizeof(ediv)); 1626 if (!skb)
1386 get_random_bytes(&rand, sizeof(rand)); 1627 return ERR_PTR(-ENOMEM);
1387 1628
1388 smp_send_cmd(conn, SMP_CMD_ENCRYPT_INFO, sizeof(enc), &enc); 1629 skb->priority = HCI_PRIO_MAX;
1630 bt_cb(skb)->chan = chan;
1389 1631
1390 authenticated = hcon->sec_level == BT_SECURITY_HIGH; 1632 return skb;
1391 ltk = hci_add_ltk(hdev, &hcon->dst, hcon->dst_type, 1633}
1392 SMP_LTK_SLAVE, authenticated, enc.ltk,
1393 smp->enc_key_size, ediv, rand);
1394 smp->slave_ltk = ltk;
1395 1634
1396 ident.ediv = ediv; 1635static const struct l2cap_ops smp_chan_ops = {
1397 ident.rand = rand; 1636 .name = "Security Manager",
1637 .ready = smp_ready_cb,
1638 .recv = smp_recv_cb,
1639 .alloc_skb = smp_alloc_skb_cb,
1640 .teardown = smp_teardown_cb,
1641 .resume = smp_resume_cb,
1642
1643 .new_connection = l2cap_chan_no_new_connection,
1644 .state_change = l2cap_chan_no_state_change,
1645 .close = l2cap_chan_no_close,
1646 .defer = l2cap_chan_no_defer,
1647 .suspend = l2cap_chan_no_suspend,
1648 .set_shutdown = l2cap_chan_no_set_shutdown,
1649 .get_sndtimeo = l2cap_chan_no_get_sndtimeo,
1650 .memcpy_fromiovec = l2cap_chan_no_memcpy_fromiovec,
1651};
1398 1652
1399 smp_send_cmd(conn, SMP_CMD_MASTER_IDENT, sizeof(ident), &ident); 1653static inline struct l2cap_chan *smp_new_conn_cb(struct l2cap_chan *pchan)
1654{
1655 struct l2cap_chan *chan;
1400 1656
1401 *keydist &= ~SMP_DIST_ENC_KEY; 1657 BT_DBG("pchan %p", pchan);
1402 }
1403 1658
1404 if (*keydist & SMP_DIST_ID_KEY) { 1659 chan = l2cap_chan_create();
1405 struct smp_cmd_ident_addr_info addrinfo; 1660 if (!chan)
1406 struct smp_cmd_ident_info idinfo; 1661 return NULL;
1407 1662
1408 memcpy(idinfo.irk, hdev->irk, sizeof(idinfo.irk)); 1663 chan->chan_type = pchan->chan_type;
1664 chan->ops = &smp_chan_ops;
1665 chan->scid = pchan->scid;
1666 chan->dcid = chan->scid;
1667 chan->imtu = pchan->imtu;
1668 chan->omtu = pchan->omtu;
1669 chan->mode = pchan->mode;
1409 1670
1410 smp_send_cmd(conn, SMP_CMD_IDENT_INFO, sizeof(idinfo), &idinfo); 1671 BT_DBG("created chan %p", chan);
1411 1672
1412 /* The hci_conn contains the local identity address 1673 return chan;
1413 * after the connection has been established. 1674}
1414 *
1415 * This is true even when the connection has been
1416 * established using a resolvable random address.
1417 */
1418 bacpy(&addrinfo.bdaddr, &hcon->src);
1419 addrinfo.addr_type = hcon->src_type;
1420 1675
1421 smp_send_cmd(conn, SMP_CMD_IDENT_ADDR_INFO, sizeof(addrinfo), 1676static const struct l2cap_ops smp_root_chan_ops = {
1422 &addrinfo); 1677 .name = "Security Manager Root",
1678 .new_connection = smp_new_conn_cb,
1679
1680 /* None of these are implemented for the root channel */
1681 .close = l2cap_chan_no_close,
1682 .alloc_skb = l2cap_chan_no_alloc_skb,
1683 .recv = l2cap_chan_no_recv,
1684 .state_change = l2cap_chan_no_state_change,
1685 .teardown = l2cap_chan_no_teardown,
1686 .ready = l2cap_chan_no_ready,
1687 .defer = l2cap_chan_no_defer,
1688 .suspend = l2cap_chan_no_suspend,
1689 .resume = l2cap_chan_no_resume,
1690 .set_shutdown = l2cap_chan_no_set_shutdown,
1691 .get_sndtimeo = l2cap_chan_no_get_sndtimeo,
1692 .memcpy_fromiovec = l2cap_chan_no_memcpy_fromiovec,
1693};
1423 1694
1424 *keydist &= ~SMP_DIST_ID_KEY; 1695int smp_register(struct hci_dev *hdev)
1425 } 1696{
1697 struct l2cap_chan *chan;
1698 struct crypto_blkcipher *tfm_aes;
1426 1699
1427 if (*keydist & SMP_DIST_SIGN) { 1700 BT_DBG("%s", hdev->name);
1428 struct smp_cmd_sign_info sign;
1429 struct smp_csrk *csrk;
1430 1701
1431 /* Generate a new random key */ 1702 tfm_aes = crypto_alloc_blkcipher("ecb(aes)", 0, CRYPTO_ALG_ASYNC);
1432 get_random_bytes(sign.csrk, sizeof(sign.csrk)); 1703 if (IS_ERR(tfm_aes)) {
1704 int err = PTR_ERR(tfm_aes);
1705 BT_ERR("Unable to create crypto context");
1706 return err;
1707 }
1433 1708
1434 csrk = kzalloc(sizeof(*csrk), GFP_KERNEL); 1709 chan = l2cap_chan_create();
1435 if (csrk) { 1710 if (!chan) {
1436 csrk->master = 0x00; 1711 crypto_free_blkcipher(tfm_aes);
1437 memcpy(csrk->val, sign.csrk, sizeof(csrk->val)); 1712 return -ENOMEM;
1438 } 1713 }
1439 smp->slave_csrk = csrk;
1440 1714
1441 smp_send_cmd(conn, SMP_CMD_SIGN_INFO, sizeof(sign), &sign); 1715 chan->data = tfm_aes;
1442 1716
1443 *keydist &= ~SMP_DIST_SIGN; 1717 l2cap_add_scid(chan, L2CAP_CID_SMP);
1444 }
1445 1718
1446 /* If there are still keys to be received wait for them */ 1719 l2cap_chan_set_defaults(chan);
1447 if ((smp->remote_key_dist & 0x07))
1448 return 0;
1449 1720
1450 clear_bit(HCI_CONN_LE_SMP_PEND, &hcon->flags); 1721 bacpy(&chan->src, &hdev->bdaddr);
1451 cancel_delayed_work_sync(&conn->security_timer); 1722 chan->src_type = BDADDR_LE_PUBLIC;
1452 set_bit(SMP_FLAG_COMPLETE, &smp->flags); 1723 chan->state = BT_LISTEN;
1453 smp_notify_keys(conn); 1724 chan->mode = L2CAP_MODE_BASIC;
1725 chan->imtu = L2CAP_DEFAULT_MTU;
1726 chan->ops = &smp_root_chan_ops;
1454 1727
1455 smp_chan_destroy(conn); 1728 hdev->smp_data = chan;
1456 1729
1457 return 0; 1730 return 0;
1458} 1731}
1732
1733void smp_unregister(struct hci_dev *hdev)
1734{
1735 struct l2cap_chan *chan = hdev->smp_data;
1736 struct crypto_blkcipher *tfm_aes;
1737
1738 if (!chan)
1739 return;
1740
1741 BT_DBG("%s chan %p", hdev->name, chan);
1742
1743 tfm_aes = chan->data;
1744 if (tfm_aes) {
1745 chan->data = NULL;
1746 crypto_free_blkcipher(tfm_aes);
1747 }
1748
1749 hdev->smp_data = NULL;
1750 l2cap_chan_put(chan);
1751}
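Throughout the new smp.c the handlers arm the next acceptable opcode with SMP_ALLOW_CMD(), and smp_sig_channel() consumes it with test_and_clear_bit(code, &smp->allow_cmd). The macro body is not part of these hunks; presumably it just sets the opcode's bit in smp->allow_cmd. A small userspace sketch of that allow-then-consume ordering check, using plain non-atomic bit operations and opcode values chosen only for illustration, could be:

#include <stdbool.h>
#include <stdio.h>

#define CMD_PAIRING_CONFIRM 0x03    /* opcode values only for illustration */
#define CMD_PAIRING_RANDOM  0x04

struct chan {
    unsigned long allow_cmd;    /* one bit per opcode we are willing to accept next */
};

/* Assumed shape of SMP_ALLOW_CMD(): mark an opcode as acceptable. */
static void allow_cmd(struct chan *c, unsigned int code)
{
    c->allow_cmd |= 1UL << code;
}

/* Non-atomic stand-in for test_and_clear_bit(): each armed opcode is
 * accepted exactly once; anything else would be dropped. */
static bool consume_cmd(struct chan *c, unsigned int code)
{
    unsigned long bit = 1UL << code;
    bool allowed = c->allow_cmd & bit;

    c->allow_cmd &= ~bit;
    return allowed;
}

int main(void)
{
    struct chan c = { 0 };

    allow_cmd(&c, CMD_PAIRING_CONFIRM);
    printf("random first:  %d\n", consume_cmd(&c, CMD_PAIRING_RANDOM));   /* 0: dropped */
    printf("confirm:       %d\n", consume_cmd(&c, CMD_PAIRING_CONFIRM));  /* 1: accepted */
    printf("confirm again: %d\n", consume_cmd(&c, CMD_PAIRING_CONFIRM));  /* 0: already consumed */
    return 0;
}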
diff --git a/net/bluetooth/smp.h b/net/bluetooth/smp.h
index 796f4f45f92f..86a683a8b491 100644
--- a/net/bluetooth/smp.h
+++ b/net/bluetooth/smp.h
@@ -102,6 +102,8 @@ struct smp_cmd_security_req {
102 __u8 auth_req; 102 __u8 auth_req;
103} __packed; 103} __packed;
104 104
105#define SMP_CMD_MAX 0x0b
106
105#define SMP_PASSKEY_ENTRY_FAILED 0x01 107#define SMP_PASSKEY_ENTRY_FAILED 0x01
106#define SMP_OOB_NOT_AVAIL 0x02 108#define SMP_OOB_NOT_AVAIL 0x02
107#define SMP_AUTH_REQUIREMENTS 0x03 109#define SMP_AUTH_REQUIREMENTS 0x03
@@ -123,17 +125,23 @@ enum {
123 SMP_LTK_SLAVE, 125 SMP_LTK_SLAVE,
124}; 126};
125 127
128static inline u8 smp_ltk_sec_level(struct smp_ltk *key)
129{
130 if (key->authenticated)
131 return BT_SECURITY_HIGH;
132
133 return BT_SECURITY_MEDIUM;
134}
135
126/* SMP Commands */ 136/* SMP Commands */
127bool smp_sufficient_security(struct hci_conn *hcon, u8 sec_level); 137bool smp_sufficient_security(struct hci_conn *hcon, u8 sec_level);
128int smp_conn_security(struct hci_conn *hcon, __u8 sec_level); 138int smp_conn_security(struct hci_conn *hcon, __u8 sec_level);
129int smp_sig_channel(struct l2cap_conn *conn, struct sk_buff *skb);
130int smp_distribute_keys(struct l2cap_conn *conn);
131int smp_user_confirm_reply(struct hci_conn *conn, u16 mgmt_op, __le32 passkey); 139int smp_user_confirm_reply(struct hci_conn *conn, u16 mgmt_op, __le32 passkey);
132 140
133void smp_chan_destroy(struct l2cap_conn *conn); 141bool smp_irk_matches(struct hci_dev *hdev, u8 irk[16], bdaddr_t *bdaddr);
142int smp_generate_rpa(struct hci_dev *hdev, u8 irk[16], bdaddr_t *rpa);
134 143
135bool smp_irk_matches(struct crypto_blkcipher *tfm, u8 irk[16], 144int smp_register(struct hci_dev *hdev);
136 bdaddr_t *bdaddr); 145void smp_unregister(struct hci_dev *hdev);
137int smp_generate_rpa(struct crypto_blkcipher *tfm, u8 irk[16], bdaddr_t *rpa);
138 146
139#endif /* __SMP_H */ 147#endif /* __SMP_H */
diff --git a/net/bridge/Makefile b/net/bridge/Makefile
index 8590b942bffa..fd7ee03c59b3 100644
--- a/net/bridge/Makefile
+++ b/net/bridge/Makefile
@@ -10,7 +10,9 @@ bridge-y := br.o br_device.o br_fdb.o br_forward.o br_if.o br_input.o \
10 10
11bridge-$(CONFIG_SYSFS) += br_sysfs_if.o br_sysfs_br.o 11bridge-$(CONFIG_SYSFS) += br_sysfs_if.o br_sysfs_br.o
12 12
13bridge-$(CONFIG_BRIDGE_NETFILTER) += br_netfilter.o 13bridge-$(subst m,y,$(CONFIG_BRIDGE_NETFILTER)) += br_nf_core.o
14
15obj-$(CONFIG_BRIDGE_NETFILTER) += br_netfilter.o
14 16
15bridge-$(CONFIG_BRIDGE_IGMP_SNOOPING) += br_multicast.o br_mdb.o 17bridge-$(CONFIG_BRIDGE_IGMP_SNOOPING) += br_multicast.o br_mdb.o
16 18
diff --git a/net/bridge/br.c b/net/bridge/br.c
index 1a755a1e5410..44425aff7cba 100644
--- a/net/bridge/br.c
+++ b/net/bridge/br.c
@@ -161,7 +161,7 @@ static int __init br_init(void)
161 if (err) 161 if (err)
162 goto err_out1; 162 goto err_out1;
163 163
164 err = br_netfilter_init(); 164 err = br_nf_core_init();
165 if (err) 165 if (err)
166 goto err_out2; 166 goto err_out2;
167 167
@@ -179,11 +179,16 @@ static int __init br_init(void)
179 br_fdb_test_addr_hook = br_fdb_test_addr; 179 br_fdb_test_addr_hook = br_fdb_test_addr;
180#endif 180#endif
181 181
182 pr_info("bridge: automatic filtering via arp/ip/ip6tables has been "
183 "deprecated. Update your scripts to load br_netfilter if you "
184 "need this.\n");
185
182 return 0; 186 return 0;
187
183err_out4: 188err_out4:
184 unregister_netdevice_notifier(&br_device_notifier); 189 unregister_netdevice_notifier(&br_device_notifier);
185err_out3: 190err_out3:
186 br_netfilter_fini(); 191 br_nf_core_fini();
187err_out2: 192err_out2:
188 unregister_pernet_subsys(&br_net_ops); 193 unregister_pernet_subsys(&br_net_ops);
189err_out1: 194err_out1:
@@ -196,20 +201,17 @@ err_out:
196static void __exit br_deinit(void) 201static void __exit br_deinit(void)
197{ 202{
198 stp_proto_unregister(&br_stp_proto); 203 stp_proto_unregister(&br_stp_proto);
199
200 br_netlink_fini(); 204 br_netlink_fini();
201 unregister_netdevice_notifier(&br_device_notifier); 205 unregister_netdevice_notifier(&br_device_notifier);
202 brioctl_set(NULL); 206 brioctl_set(NULL);
203
204 unregister_pernet_subsys(&br_net_ops); 207 unregister_pernet_subsys(&br_net_ops);
205 208
206 rcu_barrier(); /* Wait for completion of call_rcu()'s */ 209 rcu_barrier(); /* Wait for completion of call_rcu()'s */
207 210
208 br_netfilter_fini(); 211 br_nf_core_fini();
209#if IS_ENABLED(CONFIG_ATM_LANE) 212#if IS_ENABLED(CONFIG_ATM_LANE)
210 br_fdb_test_addr_hook = NULL; 213 br_fdb_test_addr_hook = NULL;
211#endif 214#endif
212
213 br_fdb_fini(); 215 br_fdb_fini();
214} 216}
215 217
diff --git a/net/bridge/br_device.c b/net/bridge/br_device.c
index 568cccd39a3d..ffd379db5938 100644
--- a/net/bridge/br_device.c
+++ b/net/bridge/br_device.c
@@ -36,7 +36,7 @@ netdev_tx_t br_dev_xmit(struct sk_buff *skb, struct net_device *dev)
36 u16 vid = 0; 36 u16 vid = 0;
37 37
38 rcu_read_lock(); 38 rcu_read_lock();
39#ifdef CONFIG_BRIDGE_NETFILTER 39#if IS_ENABLED(CONFIG_BRIDGE_NETFILTER)
40 if (skb->nf_bridge && (skb->nf_bridge->mask & BRNF_BRIDGED_DNAT)) { 40 if (skb->nf_bridge && (skb->nf_bridge->mask & BRNF_BRIDGED_DNAT)) {
41 br_nf_pre_routing_finish_bridge_slow(skb); 41 br_nf_pre_routing_finish_bridge_slow(skb);
42 rcu_read_unlock(); 42 rcu_read_unlock();
@@ -88,12 +88,17 @@ out:
88static int br_dev_init(struct net_device *dev) 88static int br_dev_init(struct net_device *dev)
89{ 89{
90 struct net_bridge *br = netdev_priv(dev); 90 struct net_bridge *br = netdev_priv(dev);
91 int err;
91 92
92 br->stats = netdev_alloc_pcpu_stats(struct pcpu_sw_netstats); 93 br->stats = netdev_alloc_pcpu_stats(struct pcpu_sw_netstats);
93 if (!br->stats) 94 if (!br->stats)
94 return -ENOMEM; 95 return -ENOMEM;
95 96
96 return 0; 97 err = br_vlan_init(br);
98 if (err)
99 free_percpu(br->stats);
100
101 return err;
97} 102}
98 103
99static int br_dev_open(struct net_device *dev) 104static int br_dev_open(struct net_device *dev)
@@ -167,7 +172,7 @@ static int br_change_mtu(struct net_device *dev, int new_mtu)
167 172
168 dev->mtu = new_mtu; 173 dev->mtu = new_mtu;
169 174
170#ifdef CONFIG_BRIDGE_NETFILTER 175#if IS_ENABLED(CONFIG_BRIDGE_NETFILTER)
171 /* remember the MTU in the rtable for PMTU */ 176 /* remember the MTU in the rtable for PMTU */
172 dst_metric_set(&br->fake_rtable.dst, RTAX_MTU, new_mtu); 177 dst_metric_set(&br->fake_rtable.dst, RTAX_MTU, new_mtu);
173#endif 178#endif
@@ -389,5 +394,4 @@ void br_dev_setup(struct net_device *dev)
389 br_netfilter_rtable_init(br); 394 br_netfilter_rtable_init(br);
390 br_stp_timer_init(br); 395 br_stp_timer_init(br);
391 br_multicast_init(br); 396 br_multicast_init(br);
392 br_vlan_init(br);
393} 397}
diff --git a/net/bridge/br_forward.c b/net/bridge/br_forward.c
index 056b67b0e277..992ec49a96aa 100644
--- a/net/bridge/br_forward.c
+++ b/net/bridge/br_forward.c
@@ -49,6 +49,7 @@ int br_dev_queue_push_xmit(struct sk_buff *skb)
49 49
50 return 0; 50 return 0;
51} 51}
52EXPORT_SYMBOL_GPL(br_dev_queue_push_xmit);
52 53
53int br_forward_finish(struct sk_buff *skb) 54int br_forward_finish(struct sk_buff *skb)
54{ 55{
@@ -56,6 +57,7 @@ int br_forward_finish(struct sk_buff *skb)
56 br_dev_queue_push_xmit); 57 br_dev_queue_push_xmit);
57 58
58} 59}
60EXPORT_SYMBOL_GPL(br_forward_finish);
59 61
60static void __br_deliver(const struct net_bridge_port *to, struct sk_buff *skb) 62static void __br_deliver(const struct net_bridge_port *to, struct sk_buff *skb)
61{ 63{
diff --git a/net/bridge/br_if.c b/net/bridge/br_if.c
index 078d336a1f37..ed307db7a12b 100644
--- a/net/bridge/br_if.c
+++ b/net/bridge/br_if.c
@@ -252,12 +252,12 @@ static void del_nbp(struct net_bridge_port *p)
252 br_fdb_delete_by_port(br, p, 1); 252 br_fdb_delete_by_port(br, p, 1);
253 nbp_update_port_count(br); 253 nbp_update_port_count(br);
254 254
255 netdev_upper_dev_unlink(dev, br->dev);
256
255 dev->priv_flags &= ~IFF_BRIDGE_PORT; 257 dev->priv_flags &= ~IFF_BRIDGE_PORT;
256 258
257 netdev_rx_handler_unregister(dev); 259 netdev_rx_handler_unregister(dev);
258 260
259 netdev_upper_dev_unlink(dev, br->dev);
260
261 br_multicast_del_port(p); 261 br_multicast_del_port(p);
262 262
263 kobject_uevent(&p->kobj, KOBJ_REMOVE); 263 kobject_uevent(&p->kobj, KOBJ_REMOVE);
@@ -332,7 +332,7 @@ static struct net_bridge_port *new_nbp(struct net_bridge *br,
332 p->port_no = index; 332 p->port_no = index;
333 p->flags = BR_LEARNING | BR_FLOOD; 333 p->flags = BR_LEARNING | BR_FLOOD;
334 br_init_port(p); 334 br_init_port(p);
335 p->state = BR_STATE_DISABLED; 335 br_set_state(p, BR_STATE_DISABLED);
336 br_stp_port_timer_init(p); 336 br_stp_port_timer_init(p);
337 br_multicast_add_port(p); 337 br_multicast_add_port(p);
338 338
@@ -476,16 +476,16 @@ int br_add_if(struct net_bridge *br, struct net_device *dev)
476 if (err) 476 if (err)
477 goto err3; 477 goto err3;
478 478
479 err = netdev_master_upper_dev_link(dev, br->dev); 479 err = netdev_rx_handler_register(dev, br_handle_frame, p);
480 if (err) 480 if (err)
481 goto err4; 481 goto err4;
482 482
483 err = netdev_rx_handler_register(dev, br_handle_frame, p); 483 dev->priv_flags |= IFF_BRIDGE_PORT;
484
485 err = netdev_master_upper_dev_link(dev, br->dev);
484 if (err) 486 if (err)
485 goto err5; 487 goto err5;
486 488
487 dev->priv_flags |= IFF_BRIDGE_PORT;
488
489 dev_disable_lro(dev); 489 dev_disable_lro(dev);
490 490
491 list_add_rcu(&p->list, &br->port_list); 491 list_add_rcu(&p->list, &br->port_list);
@@ -500,6 +500,9 @@ int br_add_if(struct net_bridge *br, struct net_device *dev)
500 if (br_fdb_insert(br, p, dev->dev_addr, 0)) 500 if (br_fdb_insert(br, p, dev->dev_addr, 0))
501 netdev_err(dev, "failed insert local address bridge forwarding table\n"); 501 netdev_err(dev, "failed insert local address bridge forwarding table\n");
502 502
503 if (nbp_vlan_init(p))
504 netdev_err(dev, "failed to initialize vlan filtering on this port\n");
505
503 spin_lock_bh(&br->lock); 506 spin_lock_bh(&br->lock);
504 changed_addr = br_stp_recalculate_bridge_id(br); 507 changed_addr = br_stp_recalculate_bridge_id(br);
505 508
@@ -520,7 +523,8 @@ int br_add_if(struct net_bridge *br, struct net_device *dev)
520 return 0; 523 return 0;
521 524
522err5: 525err5:
523 netdev_upper_dev_unlink(dev, br->dev); 526 dev->priv_flags &= ~IFF_BRIDGE_PORT;
527 netdev_rx_handler_unregister(dev);
524err4: 528err4:
525 br_netpoll_disable(p); 529 br_netpoll_disable(p);
526err3: 530err3:
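
In br_if.c the enslave path now registers the rx_handler and marks the device with IFF_BRIDGE_PORT before linking it as a bridge upper device, and both the new err5 label and del_nbp() unwind in the opposite order. A sketch of that setup/teardown symmetry with generic names; my_handle_frame, my_enslave and my_release are placeholders, and as in br_add_if() the caller is assumed to hold RTNL:

    #include <linux/netdevice.h>

    static rx_handler_result_t my_handle_frame(struct sk_buff **pskb)
    {
            return RX_HANDLER_PASS;         /* placeholder */
    }

    static int my_enslave(struct net_device *master, struct net_device *port,
                          void *port_priv)
    {
            int err;

            /* Claim the device's rx path first ... */
            err = netdev_rx_handler_register(port, my_handle_frame, port_priv);
            if (err)
                    return err;

            port->priv_flags |= IFF_BRIDGE_PORT;

            /* ... and only then advertise the master/slave relationship. */
            err = netdev_master_upper_dev_link(port, master);
            if (err)
                    goto unwind;

            return 0;

    unwind:
            port->priv_flags &= ~IFF_BRIDGE_PORT;
            netdev_rx_handler_unregister(port);
            return err;
    }

    static void my_release(struct net_device *master, struct net_device *port)
    {
            /* Strict reverse of my_enslave(), mirroring del_nbp() above. */
            netdev_upper_dev_unlink(port, master);
            port->priv_flags &= ~IFF_BRIDGE_PORT;
            netdev_rx_handler_unregister(port);
    }
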
diff --git a/net/bridge/br_input.c b/net/bridge/br_input.c
index 366c43649079..6fd5522df696 100644
--- a/net/bridge/br_input.c
+++ b/net/bridge/br_input.c
@@ -140,6 +140,7 @@ drop:
140 kfree_skb(skb); 140 kfree_skb(skb);
141 goto out; 141 goto out;
142} 142}
143EXPORT_SYMBOL_GPL(br_handle_frame_finish);
143 144
144/* note: already called with rcu_read_lock */ 145/* note: already called with rcu_read_lock */
145static int br_handle_local_finish(struct sk_buff *skb) 146static int br_handle_local_finish(struct sk_buff *skb)
diff --git a/net/bridge/br_multicast.c b/net/bridge/br_multicast.c
index 7751c92c8c57..648d79ccf462 100644
--- a/net/bridge/br_multicast.c
+++ b/net/bridge/br_multicast.c
@@ -1822,7 +1822,7 @@ static void br_multicast_query_expired(struct net_bridge *br,
1822 if (query->startup_sent < br->multicast_startup_query_count) 1822 if (query->startup_sent < br->multicast_startup_query_count)
1823 query->startup_sent++; 1823 query->startup_sent++;
1824 1824
1825 rcu_assign_pointer(querier, NULL); 1825 RCU_INIT_POINTER(querier, NULL);
1826 br_multicast_send_query(br, NULL, query); 1826 br_multicast_send_query(br, NULL, query);
1827 spin_unlock(&br->multicast_lock); 1827 spin_unlock(&br->multicast_lock);
1828} 1828}
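
The single br_multicast.c change swaps rcu_assign_pointer() for RCU_INIT_POINTER() when the new value is NULL: there is no newly published structure whose initialisation has to be ordered before the pointer update, so the write barrier implied by rcu_assign_pointer() buys nothing. In miniature (active_cfg is a made-up example, not bridge code):

    #include <linux/rcupdate.h>

    struct cfg { int value; };
    static struct cfg __rcu *active_cfg;

    static void publish_cfg(struct cfg *newcfg)
    {
            newcfg->value = 42;
            /* Readers may follow the pointer at once, so the stores that
             * initialise *newcfg must be ordered before the publication.
             */
            rcu_assign_pointer(active_cfg, newcfg);
    }

    static void retract_cfg(void)
    {
            /* NULL exposes no new memory to readers; no barrier needed. */
            RCU_INIT_POINTER(active_cfg, NULL);
    }
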
diff --git a/net/bridge/br_netfilter.c b/net/bridge/br_netfilter.c
index a615264cf01a..1bada53bb195 100644
--- a/net/bridge/br_netfilter.c
+++ b/net/bridge/br_netfilter.c
@@ -111,66 +111,6 @@ static inline __be16 pppoe_proto(const struct sk_buff *skb)
111 pppoe_proto(skb) == htons(PPP_IPV6) && \ 111 pppoe_proto(skb) == htons(PPP_IPV6) && \
112 brnf_filter_pppoe_tagged) 112 brnf_filter_pppoe_tagged)
113 113
114static void fake_update_pmtu(struct dst_entry *dst, struct sock *sk,
115 struct sk_buff *skb, u32 mtu)
116{
117}
118
119static void fake_redirect(struct dst_entry *dst, struct sock *sk,
120 struct sk_buff *skb)
121{
122}
123
124static u32 *fake_cow_metrics(struct dst_entry *dst, unsigned long old)
125{
126 return NULL;
127}
128
129static struct neighbour *fake_neigh_lookup(const struct dst_entry *dst,
130 struct sk_buff *skb,
131 const void *daddr)
132{
133 return NULL;
134}
135
136static unsigned int fake_mtu(const struct dst_entry *dst)
137{
138 return dst->dev->mtu;
139}
140
141static struct dst_ops fake_dst_ops = {
142 .family = AF_INET,
143 .protocol = cpu_to_be16(ETH_P_IP),
144 .update_pmtu = fake_update_pmtu,
145 .redirect = fake_redirect,
146 .cow_metrics = fake_cow_metrics,
147 .neigh_lookup = fake_neigh_lookup,
148 .mtu = fake_mtu,
149};
150
151/*
152 * Initialize bogus route table used to keep netfilter happy.
153 * Currently, we fill in the PMTU entry because netfilter
154 * refragmentation needs it, and the rt_flags entry because
155 * ipt_REJECT needs it. Future netfilter modules might
156 * require us to fill additional fields.
157 */
158static const u32 br_dst_default_metrics[RTAX_MAX] = {
159 [RTAX_MTU - 1] = 1500,
160};
161
162void br_netfilter_rtable_init(struct net_bridge *br)
163{
164 struct rtable *rt = &br->fake_rtable;
165
166 atomic_set(&rt->dst.__refcnt, 1);
167 rt->dst.dev = br->dev;
168 rt->dst.path = &rt->dst;
169 dst_init_metrics(&rt->dst, br_dst_default_metrics, true);
170 rt->dst.flags = DST_NOXFRM | DST_FAKE_RTABLE;
171 rt->dst.ops = &fake_dst_ops;
172}
173
174static inline struct rtable *bridge_parent_rtable(const struct net_device *dev) 114static inline struct rtable *bridge_parent_rtable(const struct net_device *dev)
175{ 115{
176 struct net_bridge_port *port; 116 struct net_bridge_port *port;
@@ -245,14 +185,6 @@ static inline void nf_bridge_save_header(struct sk_buff *skb)
245 skb->nf_bridge->data, header_size); 185 skb->nf_bridge->data, header_size);
246} 186}
247 187
248static inline void nf_bridge_update_protocol(struct sk_buff *skb)
249{
250 if (skb->nf_bridge->mask & BRNF_8021Q)
251 skb->protocol = htons(ETH_P_8021Q);
252 else if (skb->nf_bridge->mask & BRNF_PPPoE)
253 skb->protocol = htons(ETH_P_PPP_SES);
254}
255
256/* When handing a packet over to the IP layer 188/* When handing a packet over to the IP layer
257 * check whether we have a skb that is in the 189 * check whether we have a skb that is in the
258 * expected format 190 * expected format
@@ -320,26 +252,6 @@ drop:
320 return -1; 252 return -1;
321} 253}
322 254
323/* Fill in the header for fragmented IP packets handled by
324 * the IPv4 connection tracking code.
325 */
326int nf_bridge_copy_header(struct sk_buff *skb)
327{
328 int err;
329 unsigned int header_size;
330
331 nf_bridge_update_protocol(skb);
332 header_size = ETH_HLEN + nf_bridge_encap_header_len(skb);
333 err = skb_cow_head(skb, header_size);
334 if (err)
335 return err;
336
337 skb_copy_to_linear_data_offset(skb, -header_size,
338 skb->nf_bridge->data, header_size);
339 __skb_push(skb, nf_bridge_encap_header_len(skb));
340 return 0;
341}
342
343/* PF_BRIDGE/PRE_ROUTING *********************************************/ 255/* PF_BRIDGE/PRE_ROUTING *********************************************/
344/* Undo the changes made for ip6tables PREROUTING and continue the 256/* Undo the changes made for ip6tables PREROUTING and continue the
345 * bridge PRE_ROUTING hook. */ 257 * bridge PRE_ROUTING hook. */
@@ -404,6 +316,7 @@ static int br_nf_pre_routing_finish_bridge(struct sk_buff *skb)
404 ETH_HLEN-ETH_ALEN); 316 ETH_HLEN-ETH_ALEN);
405 /* tell br_dev_xmit to continue with forwarding */ 317 /* tell br_dev_xmit to continue with forwarding */
406 nf_bridge->mask |= BRNF_BRIDGED_DNAT; 318 nf_bridge->mask |= BRNF_BRIDGED_DNAT;
319 /* FIXME Need to refragment */
407 ret = neigh->output(neigh, skb); 320 ret = neigh->output(neigh, skb);
408 } 321 }
409 neigh_release(neigh); 322 neigh_release(neigh);
@@ -459,6 +372,10 @@ static int br_nf_pre_routing_finish(struct sk_buff *skb)
459 struct nf_bridge_info *nf_bridge = skb->nf_bridge; 372 struct nf_bridge_info *nf_bridge = skb->nf_bridge;
460 struct rtable *rt; 373 struct rtable *rt;
461 int err; 374 int err;
375 int frag_max_size;
376
377 frag_max_size = IPCB(skb)->frag_max_size;
378 BR_INPUT_SKB_CB(skb)->frag_max_size = frag_max_size;
462 379
463 if (nf_bridge->mask & BRNF_PKT_TYPE) { 380 if (nf_bridge->mask & BRNF_PKT_TYPE) {
464 skb->pkt_type = PACKET_OTHERHOST; 381 skb->pkt_type = PACKET_OTHERHOST;
@@ -863,13 +780,19 @@ static unsigned int br_nf_forward_arp(const struct nf_hook_ops *ops,
863static int br_nf_dev_queue_xmit(struct sk_buff *skb) 780static int br_nf_dev_queue_xmit(struct sk_buff *skb)
864{ 781{
865 int ret; 782 int ret;
783 int frag_max_size;
866 784
785 /* This is wrong! We should preserve the original fragment
786 * boundaries by preserving frag_list rather than refragmenting.
787 */
867 if (skb->protocol == htons(ETH_P_IP) && 788 if (skb->protocol == htons(ETH_P_IP) &&
868 skb->len + nf_bridge_mtu_reduction(skb) > skb->dev->mtu && 789 skb->len + nf_bridge_mtu_reduction(skb) > skb->dev->mtu &&
869 !skb_is_gso(skb)) { 790 !skb_is_gso(skb)) {
791 frag_max_size = BR_INPUT_SKB_CB(skb)->frag_max_size;
870 if (br_parse_ip_options(skb)) 792 if (br_parse_ip_options(skb))
871 /* Drop invalid packet */ 793 /* Drop invalid packet */
872 return NF_DROP; 794 return NF_DROP;
795 IPCB(skb)->frag_max_size = frag_max_size;
873 ret = ip_fragment(skb, br_dev_queue_push_xmit); 796 ret = ip_fragment(skb, br_dev_queue_push_xmit);
874 } else 797 } else
875 ret = br_dev_queue_push_xmit(skb); 798 ret = br_dev_queue_push_xmit(skb);
@@ -944,6 +867,11 @@ static unsigned int ip_sabotage_in(const struct nf_hook_ops *ops,
944 return NF_ACCEPT; 867 return NF_ACCEPT;
945} 868}
946 869
870void br_netfilter_enable(void)
871{
872}
873EXPORT_SYMBOL_GPL(br_netfilter_enable);
874
947/* For br_nf_post_routing, we need (prio = NF_BR_PRI_LAST), because 875/* For br_nf_post_routing, we need (prio = NF_BR_PRI_LAST), because
948 * br_dev_queue_push_xmit is called afterwards */ 876 * br_dev_queue_push_xmit is called afterwards */
949static struct nf_hook_ops br_nf_ops[] __read_mostly = { 877static struct nf_hook_ops br_nf_ops[] __read_mostly = {
@@ -1059,38 +987,42 @@ static struct ctl_table brnf_table[] = {
1059}; 987};
1060#endif 988#endif
1061 989
1062int __init br_netfilter_init(void) 990static int __init br_netfilter_init(void)
1063{ 991{
1064 int ret; 992 int ret;
1065 993
1066 ret = dst_entries_init(&fake_dst_ops); 994 ret = nf_register_hooks(br_nf_ops, ARRAY_SIZE(br_nf_ops));
1067 if (ret < 0) 995 if (ret < 0)
1068 return ret; 996 return ret;
1069 997
1070 ret = nf_register_hooks(br_nf_ops, ARRAY_SIZE(br_nf_ops));
1071 if (ret < 0) {
1072 dst_entries_destroy(&fake_dst_ops);
1073 return ret;
1074 }
1075#ifdef CONFIG_SYSCTL 998#ifdef CONFIG_SYSCTL
1076 brnf_sysctl_header = register_net_sysctl(&init_net, "net/bridge", brnf_table); 999 brnf_sysctl_header = register_net_sysctl(&init_net, "net/bridge", brnf_table);
1077 if (brnf_sysctl_header == NULL) { 1000 if (brnf_sysctl_header == NULL) {
1078 printk(KERN_WARNING 1001 printk(KERN_WARNING
1079 "br_netfilter: can't register to sysctl.\n"); 1002 "br_netfilter: can't register to sysctl.\n");
1080 nf_unregister_hooks(br_nf_ops, ARRAY_SIZE(br_nf_ops)); 1003 ret = -ENOMEM;
1081 dst_entries_destroy(&fake_dst_ops); 1004 goto err1;
1082 return -ENOMEM;
1083 } 1005 }
1084#endif 1006#endif
1085 printk(KERN_NOTICE "Bridge firewalling registered\n"); 1007 printk(KERN_NOTICE "Bridge firewalling registered\n");
1086 return 0; 1008 return 0;
1009err1:
1010 nf_unregister_hooks(br_nf_ops, ARRAY_SIZE(br_nf_ops));
1011 return ret;
1087} 1012}
1088 1013
1089void br_netfilter_fini(void) 1014static void __exit br_netfilter_fini(void)
1090{ 1015{
1091 nf_unregister_hooks(br_nf_ops, ARRAY_SIZE(br_nf_ops)); 1016 nf_unregister_hooks(br_nf_ops, ARRAY_SIZE(br_nf_ops));
1092#ifdef CONFIG_SYSCTL 1017#ifdef CONFIG_SYSCTL
1093 unregister_net_sysctl_table(brnf_sysctl_header); 1018 unregister_net_sysctl_table(brnf_sysctl_header);
1094#endif 1019#endif
1095 dst_entries_destroy(&fake_dst_ops);
1096} 1020}
1021
1022module_init(br_netfilter_init);
1023module_exit(br_netfilter_fini);
1024
1025MODULE_LICENSE("GPL");
1026MODULE_AUTHOR("Lennert Buytenhek <buytenh@gnu.org>");
1027MODULE_AUTHOR("Bart De Schuymer <bdschuym@pandora.be>");
1028MODULE_DESCRIPTION("Linux ethernet netfilter firewall bridge");
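
Two things happen to br_netfilter.c above. First, it becomes a self-contained module: br_netfilter_init()/br_netfilter_fini() turn static and are wired up through module_init()/module_exit(), the fake-rtable plumbing moves to the new br_nf_core.c below, and the earlier EXPORT_SYMBOL_GPL additions in br_forward.c and br_input.c give the module its entry points back into the bridge core. Second, the IPv4 frag_max_size observed at PRE_ROUTING is parked in the bridge's skb control block and written back just before ip_fragment(), because IPCB() and BR_INPUT_SKB_CB() are both views of the same skb->cb scratch area. A rough sketch of that hand-over; br_cb() and example_bridge_cb stand in for BR_INPUT_SKB_CB(), and it is an assumption that frag_max_size was filled in when netfilter defragmented the packet:

    #include <linux/skbuff.h>
    #include <linux/types.h>
    #include <net/ip.h>

    struct example_bridge_cb {
            u16 frag_max_size;
    };

    static inline struct example_bridge_cb *br_cb(struct sk_buff *skb)
    {
            return (struct example_bridge_cb *)skb->cb;
    }

    /* After the IP stack has seen the packet, stash the value before other
     * users of skb->cb can overwrite it. Go through a local, since both
     * structs alias the same bytes.
     */
    static void park_frag_max_size(struct sk_buff *skb)
    {
            u16 frag_max_size = IPCB(skb)->frag_max_size;

            br_cb(skb)->frag_max_size = frag_max_size;
    }

    /* Just before handing the skb to ip_fragment(), restore it so the
     * refragmented pieces do not exceed the original fragment size.
     */
    static void restore_frag_max_size(struct sk_buff *skb)
    {
            u16 frag_max_size = br_cb(skb)->frag_max_size;

            IPCB(skb)->frag_max_size = frag_max_size;
    }
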
diff --git a/net/bridge/br_netlink.c b/net/bridge/br_netlink.c
index cb5fcf62f663..2ff9706647f2 100644
--- a/net/bridge/br_netlink.c
+++ b/net/bridge/br_netlink.c
@@ -257,9 +257,6 @@ static int br_afspec(struct net_bridge *br,
257 } else 257 } else
258 err = br_vlan_add(br, vinfo->vid, vinfo->flags); 258 err = br_vlan_add(br, vinfo->vid, vinfo->flags);
259 259
260 if (err)
261 break;
262
263 break; 260 break;
264 261
265 case RTM_DELLINK: 262 case RTM_DELLINK:
@@ -276,7 +273,7 @@ static int br_afspec(struct net_bridge *br,
276 return err; 273 return err;
277} 274}
278 275
279static const struct nla_policy ifla_brport_policy[IFLA_BRPORT_MAX + 1] = { 276static const struct nla_policy br_port_policy[IFLA_BRPORT_MAX + 1] = {
280 [IFLA_BRPORT_STATE] = { .type = NLA_U8 }, 277 [IFLA_BRPORT_STATE] = { .type = NLA_U8 },
281 [IFLA_BRPORT_COST] = { .type = NLA_U32 }, 278 [IFLA_BRPORT_COST] = { .type = NLA_U32 },
282 [IFLA_BRPORT_PRIORITY] = { .type = NLA_U16 }, 279 [IFLA_BRPORT_PRIORITY] = { .type = NLA_U16 },
@@ -304,7 +301,7 @@ static int br_set_port_state(struct net_bridge_port *p, u8 state)
304 (!netif_oper_up(p->dev) && state != BR_STATE_DISABLED)) 301 (!netif_oper_up(p->dev) && state != BR_STATE_DISABLED))
305 return -ENETDOWN; 302 return -ENETDOWN;
306 303
307 p->state = state; 304 br_set_state(p, state);
308 br_log_state(p); 305 br_log_state(p);
309 br_port_state_selection(p->br); 306 br_port_state_selection(p->br);
310 return 0; 307 return 0;
@@ -382,7 +379,7 @@ int br_setlink(struct net_device *dev, struct nlmsghdr *nlh)
382 if (p && protinfo) { 379 if (p && protinfo) {
383 if (protinfo->nla_type & NLA_F_NESTED) { 380 if (protinfo->nla_type & NLA_F_NESTED) {
384 err = nla_parse_nested(tb, IFLA_BRPORT_MAX, 381 err = nla_parse_nested(tb, IFLA_BRPORT_MAX,
385 protinfo, ifla_brport_policy); 382 protinfo, br_port_policy);
386 if (err) 383 if (err)
387 return err; 384 return err;
388 385
@@ -461,6 +458,88 @@ static int br_dev_newlink(struct net *src_net, struct net_device *dev,
461 return register_netdevice(dev); 458 return register_netdevice(dev);
462} 459}
463 460
461static int br_port_slave_changelink(struct net_device *brdev,
462 struct net_device *dev,
463 struct nlattr *tb[],
464 struct nlattr *data[])
465{
466 if (!data)
467 return 0;
468 return br_setport(br_port_get_rtnl(dev), data);
469}
470
471static int br_port_fill_slave_info(struct sk_buff *skb,
472 const struct net_device *brdev,
473 const struct net_device *dev)
474{
475 return br_port_fill_attrs(skb, br_port_get_rtnl(dev));
476}
477
478static size_t br_port_get_slave_size(const struct net_device *brdev,
479 const struct net_device *dev)
480{
481 return br_port_info_size();
482}
483
484static const struct nla_policy br_policy[IFLA_BR_MAX + 1] = {
485 [IFLA_BR_FORWARD_DELAY] = { .type = NLA_U32 },
486 [IFLA_BR_HELLO_TIME] = { .type = NLA_U32 },
487 [IFLA_BR_MAX_AGE] = { .type = NLA_U32 },
488};
489
490static int br_changelink(struct net_device *brdev, struct nlattr *tb[],
491 struct nlattr *data[])
492{
493 struct net_bridge *br = netdev_priv(brdev);
494 int err;
495
496 if (!data)
497 return 0;
498
499 if (data[IFLA_BR_FORWARD_DELAY]) {
500 err = br_set_forward_delay(br, nla_get_u32(data[IFLA_BR_FORWARD_DELAY]));
501 if (err)
502 return err;
503 }
504
505 if (data[IFLA_BR_HELLO_TIME]) {
506 err = br_set_hello_time(br, nla_get_u32(data[IFLA_BR_HELLO_TIME]));
507 if (err)
508 return err;
509 }
510
511 if (data[IFLA_BR_MAX_AGE]) {
512 err = br_set_max_age(br, nla_get_u32(data[IFLA_BR_MAX_AGE]));
513 if (err)
514 return err;
515 }
516
517 return 0;
518}
519
520static size_t br_get_size(const struct net_device *brdev)
521{
522 return nla_total_size(sizeof(u32)) + /* IFLA_BR_FORWARD_DELAY */
523 nla_total_size(sizeof(u32)) + /* IFLA_BR_HELLO_TIME */
524 nla_total_size(sizeof(u32)) + /* IFLA_BR_MAX_AGE */
525 0;
526}
527
528static int br_fill_info(struct sk_buff *skb, const struct net_device *brdev)
529{
530 struct net_bridge *br = netdev_priv(brdev);
531 u32 forward_delay = jiffies_to_clock_t(br->forward_delay);
532 u32 hello_time = jiffies_to_clock_t(br->hello_time);
533 u32 age_time = jiffies_to_clock_t(br->max_age);
534
535 if (nla_put_u32(skb, IFLA_BR_FORWARD_DELAY, forward_delay) ||
536 nla_put_u32(skb, IFLA_BR_HELLO_TIME, hello_time) ||
537 nla_put_u32(skb, IFLA_BR_MAX_AGE, age_time))
538 return -EMSGSIZE;
539
540 return 0;
541}
542
464static size_t br_get_link_af_size(const struct net_device *dev) 543static size_t br_get_link_af_size(const struct net_device *dev)
465{ 544{
466 struct net_port_vlans *pv; 545 struct net_port_vlans *pv;
@@ -485,12 +564,23 @@ static struct rtnl_af_ops br_af_ops = {
485}; 564};
486 565
487struct rtnl_link_ops br_link_ops __read_mostly = { 566struct rtnl_link_ops br_link_ops __read_mostly = {
488 .kind = "bridge", 567 .kind = "bridge",
489 .priv_size = sizeof(struct net_bridge), 568 .priv_size = sizeof(struct net_bridge),
490 .setup = br_dev_setup, 569 .setup = br_dev_setup,
491 .validate = br_validate, 570 .maxtype = IFLA_BRPORT_MAX,
492 .newlink = br_dev_newlink, 571 .policy = br_policy,
493 .dellink = br_dev_delete, 572 .validate = br_validate,
573 .newlink = br_dev_newlink,
574 .changelink = br_changelink,
575 .dellink = br_dev_delete,
576 .get_size = br_get_size,
577 .fill_info = br_fill_info,
578
579 .slave_maxtype = IFLA_BRPORT_MAX,
580 .slave_policy = br_port_policy,
581 .slave_changelink = br_port_slave_changelink,
582 .get_slave_size = br_port_get_slave_size,
583 .fill_slave_info = br_port_fill_slave_info,
494}; 584};
495 585
496int __init br_netlink_init(void) 586int __init br_netlink_init(void)
@@ -512,7 +602,7 @@ out_af:
512 return err; 602 return err;
513} 603}
514 604
515void __exit br_netlink_fini(void) 605void br_netlink_fini(void)
516{ 606{
517 br_mdb_uninit(); 607 br_mdb_uninit();
518 rtnl_af_unregister(&br_af_ops); 608 rtnl_af_unregister(&br_af_ops);
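
With .changelink, .get_size and .fill_info (plus the slave_* hooks) filled in above, forward_delay, hello_time and max_age become settable and dumpable over rtnetlink, and per-port attributes can be changed through the slave path; in iproute2 terms something like `ip link set dev br0 type bridge forward_delay 200`, assuming a matching iproute2 version. Threading one more u32 attribute through the same spots would look roughly as follows; IFLA_BR_EXAMPLE, br_set_example() and br->example are placeholders, not real symbols:

    /* Placeholder sketch only; pair these helpers with a policy entry and
     * with .changelink, .get_size and .fill_info exactly as the patch does
     * for IFLA_BR_FORWARD_DELAY and friends.
     */
    static int example_changelink(struct net_bridge *br, struct nlattr *data[])
    {
            if (data[IFLA_BR_EXAMPLE])
                    return br_set_example(br, nla_get_u32(data[IFLA_BR_EXAMPLE]));
            return 0;
    }

    static size_t example_get_size(void)
    {
            return nla_total_size(sizeof(u32));     /* IFLA_BR_EXAMPLE */
    }

    static int example_fill_info(struct sk_buff *skb, const struct net_bridge *br)
    {
            if (nla_put_u32(skb, IFLA_BR_EXAMPLE, br->example))
                    return -EMSGSIZE;
            return 0;
    }
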
diff --git a/net/bridge/br_nf_core.c b/net/bridge/br_nf_core.c
new file mode 100644
index 000000000000..387cb3bd017c
--- /dev/null
+++ b/net/bridge/br_nf_core.c
@@ -0,0 +1,96 @@
1/*
2 * Handle firewalling core
3 * Linux ethernet bridge
4 *
5 * Authors:
6 * Lennert Buytenhek <buytenh@gnu.org>
7 * Bart De Schuymer <bdschuym@pandora.be>
8 *
9 * This program is free software; you can redistribute it and/or
10 * modify it under the terms of the GNU General Public License
11 * as published by the Free Software Foundation; either version
12 * 2 of the License, or (at your option) any later version.
13 *
14 * Lennert dedicates this file to Kerstin Wurdinger.
15 */
16
17#include <linux/module.h>
18#include <linux/kernel.h>
19#include <linux/in_route.h>
20#include <linux/inetdevice.h>
21#include <net/route.h>
22
23#include "br_private.h"
24#ifdef CONFIG_SYSCTL
25#include <linux/sysctl.h>
26#endif
27
28static void fake_update_pmtu(struct dst_entry *dst, struct sock *sk,
29 struct sk_buff *skb, u32 mtu)
30{
31}
32
33static void fake_redirect(struct dst_entry *dst, struct sock *sk,
34 struct sk_buff *skb)
35{
36}
37
38static u32 *fake_cow_metrics(struct dst_entry *dst, unsigned long old)
39{
40 return NULL;
41}
42
43static struct neighbour *fake_neigh_lookup(const struct dst_entry *dst,
44 struct sk_buff *skb,
45 const void *daddr)
46{
47 return NULL;
48}
49
50static unsigned int fake_mtu(const struct dst_entry *dst)
51{
52 return dst->dev->mtu;
53}
54
55static struct dst_ops fake_dst_ops = {
56 .family = AF_INET,
57 .protocol = cpu_to_be16(ETH_P_IP),
58 .update_pmtu = fake_update_pmtu,
59 .redirect = fake_redirect,
60 .cow_metrics = fake_cow_metrics,
61 .neigh_lookup = fake_neigh_lookup,
62 .mtu = fake_mtu,
63};
64
65/*
66 * Initialize bogus route table used to keep netfilter happy.
67 * Currently, we fill in the PMTU entry because netfilter
68 * refragmentation needs it, and the rt_flags entry because
69 * ipt_REJECT needs it. Future netfilter modules might
70 * require us to fill additional fields.
71 */
72static const u32 br_dst_default_metrics[RTAX_MAX] = {
73 [RTAX_MTU - 1] = 1500,
74};
75
76void br_netfilter_rtable_init(struct net_bridge *br)
77{
78 struct rtable *rt = &br->fake_rtable;
79
80 atomic_set(&rt->dst.__refcnt, 1);
81 rt->dst.dev = br->dev;
82 rt->dst.path = &rt->dst;
83 dst_init_metrics(&rt->dst, br_dst_default_metrics, true);
84 rt->dst.flags = DST_NOXFRM | DST_FAKE_RTABLE;
85 rt->dst.ops = &fake_dst_ops;
86}
87
88int __init br_nf_core_init(void)
89{
90 return dst_entries_init(&fake_dst_ops);
91}
92
93void br_nf_core_fini(void)
94{
95 dst_entries_destroy(&fake_dst_ops);
96}
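
br_nf_core.c is the fake-rtable code lifted out of br_netfilter.c so it stays built into the bridge core while the rest of the netfilter glue can be a module; br_nf_core_init()/br_nf_core_fini() now own the dst_entries_init()/dst_entries_destroy() calls that br_netfilter_init() used to make. The fake dst exists so that IP-layer code the bridge feeds packets into can use dst-based helpers (the PMTU metric for refragmentation, rt_flags for ipt_REJECT) without a real route lookup. The attach site is not in the hunks shown here; conceptually it amounts to something like the following, which is a guess at the shape rather than the patch's code:

    #include <linux/skbuff.h>
    #include <net/dst.h>
    #include "br_private.h"

    /* Give a bridged skb the bridge's per-device fake route before it is
     * handed to IP-layer netfilter code. The dst is shared and never
     * refcounted per packet, hence the _noref variant; DST_FAKE_RTABLE on
     * the dst lets the output path recognise and strip it again.
     */
    static void example_attach_fake_dst(struct net_bridge *br, struct sk_buff *skb)
    {
            skb_dst_drop(skb);
            skb_dst_set_noref(skb, &br->fake_rtable.dst);
    }
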
diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h
index b6c04cbcfdc5..4d783d071305 100644
--- a/net/bridge/br_private.h
+++ b/net/bridge/br_private.h
@@ -221,7 +221,7 @@ struct net_bridge
221 struct pcpu_sw_netstats __percpu *stats; 221 struct pcpu_sw_netstats __percpu *stats;
222 spinlock_t hash_lock; 222 spinlock_t hash_lock;
223 struct hlist_head hash[BR_HASH_SIZE]; 223 struct hlist_head hash[BR_HASH_SIZE];
224#ifdef CONFIG_BRIDGE_NETFILTER 224#if IS_ENABLED(CONFIG_BRIDGE_NETFILTER)
225 struct rtable fake_rtable; 225 struct rtable fake_rtable;
226 bool nf_call_iptables; 226 bool nf_call_iptables;
227 bool nf_call_ip6tables; 227 bool nf_call_ip6tables;
@@ -299,16 +299,21 @@ struct net_bridge
299#ifdef CONFIG_BRIDGE_VLAN_FILTERING 299#ifdef CONFIG_BRIDGE_VLAN_FILTERING
300 u8 vlan_enabled; 300 u8 vlan_enabled;
301 __be16 vlan_proto; 301 __be16 vlan_proto;
302 u16 default_pvid;
302 struct net_port_vlans __rcu *vlan_info; 303 struct net_port_vlans __rcu *vlan_info;
303#endif 304#endif
304}; 305};
305 306
306struct br_input_skb_cb { 307struct br_input_skb_cb {
307 struct net_device *brdev; 308 struct net_device *brdev;
309
308#ifdef CONFIG_BRIDGE_IGMP_SNOOPING 310#ifdef CONFIG_BRIDGE_IGMP_SNOOPING
309 int igmp; 311 int igmp;
310 int mrouters_only; 312 int mrouters_only;
311#endif 313#endif
314
315 u16 frag_max_size;
316
312#ifdef CONFIG_BRIDGE_VLAN_FILTERING 317#ifdef CONFIG_BRIDGE_VLAN_FILTERING
313 bool vlan_filtered; 318 bool vlan_filtered;
314#endif 319#endif
@@ -604,11 +609,13 @@ bool br_vlan_find(struct net_bridge *br, u16 vid);
604void br_recalculate_fwd_mask(struct net_bridge *br); 609void br_recalculate_fwd_mask(struct net_bridge *br);
605int br_vlan_filter_toggle(struct net_bridge *br, unsigned long val); 610int br_vlan_filter_toggle(struct net_bridge *br, unsigned long val);
606int br_vlan_set_proto(struct net_bridge *br, unsigned long val); 611int br_vlan_set_proto(struct net_bridge *br, unsigned long val);
607void br_vlan_init(struct net_bridge *br); 612int br_vlan_init(struct net_bridge *br);
613int br_vlan_set_default_pvid(struct net_bridge *br, unsigned long val);
608int nbp_vlan_add(struct net_bridge_port *port, u16 vid, u16 flags); 614int nbp_vlan_add(struct net_bridge_port *port, u16 vid, u16 flags);
609int nbp_vlan_delete(struct net_bridge_port *port, u16 vid); 615int nbp_vlan_delete(struct net_bridge_port *port, u16 vid);
610void nbp_vlan_flush(struct net_bridge_port *port); 616void nbp_vlan_flush(struct net_bridge_port *port);
611bool nbp_vlan_find(struct net_bridge_port *port, u16 vid); 617bool nbp_vlan_find(struct net_bridge_port *port, u16 vid);
618int nbp_vlan_init(struct net_bridge_port *port);
612 619
613static inline struct net_port_vlans *br_get_vlan_info( 620static inline struct net_port_vlans *br_get_vlan_info(
614 const struct net_bridge *br) 621 const struct net_bridge *br)
@@ -641,11 +648,11 @@ static inline int br_vlan_get_tag(const struct sk_buff *skb, u16 *vid)
641 648
642static inline u16 br_get_pvid(const struct net_port_vlans *v) 649static inline u16 br_get_pvid(const struct net_port_vlans *v)
643{ 650{
644 /* Return just the VID if it is set, or VLAN_N_VID (invalid vid) if 651 if (!v)
645 * vid wasn't set 652 return 0;
646 */ 653
647 smp_rmb(); 654 smp_rmb();
648 return v->pvid ?: VLAN_N_VID; 655 return v->pvid;
649} 656}
650 657
651static inline int br_vlan_enabled(struct net_bridge *br) 658static inline int br_vlan_enabled(struct net_bridge *br)
@@ -704,8 +711,9 @@ static inline void br_recalculate_fwd_mask(struct net_bridge *br)
704{ 711{
705} 712}
706 713
707static inline void br_vlan_init(struct net_bridge *br) 714static inline int br_vlan_init(struct net_bridge *br)
708{ 715{
716 return 0;
709} 717}
710 718
711static inline int nbp_vlan_add(struct net_bridge_port *port, u16 vid, u16 flags) 719static inline int nbp_vlan_add(struct net_bridge_port *port, u16 vid, u16 flags)
@@ -738,13 +746,18 @@ static inline bool nbp_vlan_find(struct net_bridge_port *port, u16 vid)
738 return false; 746 return false;
739} 747}
740 748
749static inline int nbp_vlan_init(struct net_bridge_port *port)
750{
751 return 0;
752}
753
741static inline u16 br_vlan_get_tag(const struct sk_buff *skb, u16 *tag) 754static inline u16 br_vlan_get_tag(const struct sk_buff *skb, u16 *tag)
742{ 755{
743 return 0; 756 return 0;
744} 757}
745static inline u16 br_get_pvid(const struct net_port_vlans *v) 758static inline u16 br_get_pvid(const struct net_port_vlans *v)
746{ 759{
747 return VLAN_N_VID; /* Returns invalid vid */ 760 return 0;
748} 761}
749 762
750static inline int br_vlan_enabled(struct net_bridge *br) 763static inline int br_vlan_enabled(struct net_bridge *br)
@@ -754,18 +767,19 @@ static inline int br_vlan_enabled(struct net_bridge *br)
754#endif 767#endif
755 768
756/* br_netfilter.c */ 769/* br_netfilter.c */
757#ifdef CONFIG_BRIDGE_NETFILTER 770#if IS_ENABLED(CONFIG_BRIDGE_NETFILTER)
758int br_netfilter_init(void); 771int br_nf_core_init(void);
759void br_netfilter_fini(void); 772void br_nf_core_fini(void);
760void br_netfilter_rtable_init(struct net_bridge *); 773void br_netfilter_rtable_init(struct net_bridge *);
761#else 774#else
762#define br_netfilter_init() (0) 775static inline int br_nf_core_init(void) { return 0; }
763#define br_netfilter_fini() do { } while (0) 776static inline void br_nf_core_fini(void) {}
764#define br_netfilter_rtable_init(x) 777#define br_netfilter_rtable_init(x)
765#endif 778#endif
766 779
767/* br_stp.c */ 780/* br_stp.c */
768void br_log_state(const struct net_bridge_port *p); 781void br_log_state(const struct net_bridge_port *p);
782void br_set_state(struct net_bridge_port *p, unsigned int state);
769struct net_bridge_port *br_get_port(struct net_bridge *br, u16 port_no); 783struct net_bridge_port *br_get_port(struct net_bridge *br, u16 port_no);
770void br_init_port(struct net_bridge_port *p); 784void br_init_port(struct net_bridge_port *p);
771void br_become_designated_port(struct net_bridge_port *p); 785void br_become_designated_port(struct net_bridge_port *p);
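
Two header changes in br_private.h are worth calling out. The CONFIG_BRIDGE_NETFILTER tests become IS_ENABLED() because, with br_netfilter now buildable as a module, the option is presumably tristate and a plain #ifdef would miss the =m case. And br_get_pvid() now uses 0 rather than VLAN_N_VID as its "no PVID" value, so callers reduce to a simple truth test, matching the !CONFIG_BRIDGE_VLAN_FILTERING stubs. A compressed illustration of both points; the example_* names are made up:

    #include <linux/kconfig.h>
    #include <linux/types.h>
    #include "br_private.h"

    /* True for CONFIG_BRIDGE_NETFILTER=y and =m alike. */
    #if IS_ENABLED(CONFIG_BRIDGE_NETFILTER)
    #define EXAMPLE_HAS_BRNF 1
    #else
    #define EXAMPLE_HAS_BRNF 0
    #endif

    /* Caller-side shape once "no PVID" is encoded as 0. */
    static bool example_pick_ingress_vid(const struct net_port_vlans *v, u16 *vid)
    {
            u16 pvid = br_get_pvid(v);

            if (!pvid)              /* untagged traffic has no VLAN to join */
                    return false;

            *vid = pvid;
            return true;
    }
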
diff --git a/net/bridge/br_stp.c b/net/bridge/br_stp.c
index 3c86f0538cbb..2b047bcf42a4 100644
--- a/net/bridge/br_stp.c
+++ b/net/bridge/br_stp.c
@@ -36,6 +36,11 @@ void br_log_state(const struct net_bridge_port *p)
36 br_port_state_names[p->state]); 36 br_port_state_names[p->state]);
37} 37}
38 38
39void br_set_state(struct net_bridge_port *p, unsigned int state)
40{
41 p->state = state;
42}
43
39/* called under bridge lock */ 44/* called under bridge lock */
40struct net_bridge_port *br_get_port(struct net_bridge *br, u16 port_no) 45struct net_bridge_port *br_get_port(struct net_bridge *br, u16 port_no)
41{ 46{
@@ -107,7 +112,7 @@ static void br_root_port_block(const struct net_bridge *br,
107 br_notice(br, "port %u(%s) tried to become root port (blocked)", 112 br_notice(br, "port %u(%s) tried to become root port (blocked)",
108 (unsigned int) p->port_no, p->dev->name); 113 (unsigned int) p->port_no, p->dev->name);
109 114
110 p->state = BR_STATE_LISTENING; 115 br_set_state(p, BR_STATE_LISTENING);
111 br_log_state(p); 116 br_log_state(p);
112 br_ifinfo_notify(RTM_NEWLINK, p); 117 br_ifinfo_notify(RTM_NEWLINK, p);
113 118
@@ -387,7 +392,7 @@ static void br_make_blocking(struct net_bridge_port *p)
387 p->state == BR_STATE_LEARNING) 392 p->state == BR_STATE_LEARNING)
388 br_topology_change_detection(p->br); 393 br_topology_change_detection(p->br);
389 394
390 p->state = BR_STATE_BLOCKING; 395 br_set_state(p, BR_STATE_BLOCKING);
391 br_log_state(p); 396 br_log_state(p);
392 br_ifinfo_notify(RTM_NEWLINK, p); 397 br_ifinfo_notify(RTM_NEWLINK, p);
393 398
@@ -404,13 +409,13 @@ static void br_make_forwarding(struct net_bridge_port *p)
404 return; 409 return;
405 410
406 if (br->stp_enabled == BR_NO_STP || br->forward_delay == 0) { 411 if (br->stp_enabled == BR_NO_STP || br->forward_delay == 0) {
407 p->state = BR_STATE_FORWARDING; 412 br_set_state(p, BR_STATE_FORWARDING);
408 br_topology_change_detection(br); 413 br_topology_change_detection(br);
409 del_timer(&p->forward_delay_timer); 414 del_timer(&p->forward_delay_timer);
410 } else if (br->stp_enabled == BR_KERNEL_STP) 415 } else if (br->stp_enabled == BR_KERNEL_STP)
411 p->state = BR_STATE_LISTENING; 416 br_set_state(p, BR_STATE_LISTENING);
412 else 417 else
413 p->state = BR_STATE_LEARNING; 418 br_set_state(p, BR_STATE_LEARNING);
414 419
415 br_multicast_enable_port(p); 420 br_multicast_enable_port(p);
416 br_log_state(p); 421 br_log_state(p);
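
Every direct `p->state = ...` write in br_stp.c, br_stp_if.c, br_stp_timer.c, br_if.c and br_netlink.c is funnelled through the new br_set_state() helper. In this patch the helper is just the assignment, but a single choke point is the natural place to hang extra work on STP state transitions later, for example telling an offloading switch driver about the change; that extension is speculation, not something this series does. Roughly:

    /* br_set_state() as merged is only the assignment; the commented line
     * shows the kind of hook a later change could add in one place instead
     * of at every call site. Hypothetical, not part of this patch.
     */
    void br_set_state(struct net_bridge_port *p, unsigned int state)
    {
            p->state = state;

            /* example_notify_hw_of_stp_state(p->dev, state); */
    }
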
diff --git a/net/bridge/br_stp_if.c b/net/bridge/br_stp_if.c
index 189ba1e7d851..41146872c1b4 100644
--- a/net/bridge/br_stp_if.c
+++ b/net/bridge/br_stp_if.c
@@ -37,7 +37,7 @@ void br_init_port(struct net_bridge_port *p)
37{ 37{
38 p->port_id = br_make_port_id(p->priority, p->port_no); 38 p->port_id = br_make_port_id(p->priority, p->port_no);
39 br_become_designated_port(p); 39 br_become_designated_port(p);
40 p->state = BR_STATE_BLOCKING; 40 br_set_state(p, BR_STATE_BLOCKING);
41 p->topology_change_ack = 0; 41 p->topology_change_ack = 0;
42 p->config_pending = 0; 42 p->config_pending = 0;
43} 43}
@@ -100,7 +100,7 @@ void br_stp_disable_port(struct net_bridge_port *p)
100 100
101 wasroot = br_is_root_bridge(br); 101 wasroot = br_is_root_bridge(br);
102 br_become_designated_port(p); 102 br_become_designated_port(p);
103 p->state = BR_STATE_DISABLED; 103 br_set_state(p, BR_STATE_DISABLED);
104 p->topology_change_ack = 0; 104 p->topology_change_ack = 0;
105 p->config_pending = 0; 105 p->config_pending = 0;
106 106
diff --git a/net/bridge/br_stp_timer.c b/net/bridge/br_stp_timer.c
index 558c46d19e05..4fcaa67750fd 100644
--- a/net/bridge/br_stp_timer.c
+++ b/net/bridge/br_stp_timer.c
@@ -87,11 +87,11 @@ static void br_forward_delay_timer_expired(unsigned long arg)
87 (unsigned int) p->port_no, p->dev->name); 87 (unsigned int) p->port_no, p->dev->name);
88 spin_lock(&br->lock); 88 spin_lock(&br->lock);
89 if (p->state == BR_STATE_LISTENING) { 89 if (p->state == BR_STATE_LISTENING) {
90 p->state = BR_STATE_LEARNING; 90 br_set_state(p, BR_STATE_LEARNING);
91 mod_timer(&p->forward_delay_timer, 91 mod_timer(&p->forward_delay_timer,
92 jiffies + br->forward_delay); 92 jiffies + br->forward_delay);
93 } else if (p->state == BR_STATE_LEARNING) { 93 } else if (p->state == BR_STATE_LEARNING) {
94 p->state = BR_STATE_FORWARDING; 94 br_set_state(p, BR_STATE_FORWARDING);
95 if (br_is_designated_for_some_port(br)) 95 if (br_is_designated_for_some_port(br))
96 br_topology_change_detection(br); 96 br_topology_change_detection(br);
97 netif_carrier_on(br->dev); 97 netif_carrier_on(br->dev);
diff --git a/net/bridge/br_sysfs_br.c b/net/bridge/br_sysfs_br.c
index c9e2572b15f4..4c97fc50fb70 100644
--- a/net/bridge/br_sysfs_br.c
+++ b/net/bridge/br_sysfs_br.c
@@ -629,7 +629,7 @@ static ssize_t multicast_startup_query_interval_store(
629} 629}
630static DEVICE_ATTR_RW(multicast_startup_query_interval); 630static DEVICE_ATTR_RW(multicast_startup_query_interval);
631#endif 631#endif
632#ifdef CONFIG_BRIDGE_NETFILTER 632#if IS_ENABLED(CONFIG_BRIDGE_NETFILTER)
633static ssize_t nf_call_iptables_show( 633static ssize_t nf_call_iptables_show(
634 struct device *d, struct device_attribute *attr, char *buf) 634 struct device *d, struct device_attribute *attr, char *buf)
635{ 635{
@@ -725,6 +725,22 @@ static ssize_t vlan_protocol_store(struct device *d,
725 return store_bridge_parm(d, buf, len, br_vlan_set_proto); 725 return store_bridge_parm(d, buf, len, br_vlan_set_proto);
726} 726}
727static DEVICE_ATTR_RW(vlan_protocol); 727static DEVICE_ATTR_RW(vlan_protocol);
728
729static ssize_t default_pvid_show(struct device *d,
730 struct device_attribute *attr,
731 char *buf)
732{
733 struct net_bridge *br = to_bridge(d);
734 return sprintf(buf, "%d\n", br->default_pvid);
735}
736
737static ssize_t default_pvid_store(struct device *d,
738 struct device_attribute *attr,
739 const char *buf, size_t len)
740{
741 return store_bridge_parm(d, buf, len, br_vlan_set_default_pvid);
742}
743static DEVICE_ATTR_RW(default_pvid);
728#endif 744#endif
729 745
730static struct attribute *bridge_attrs[] = { 746static struct attribute *bridge_attrs[] = {
@@ -763,7 +779,7 @@ static struct attribute *bridge_attrs[] = {
763 &dev_attr_multicast_query_response_interval.attr, 779 &dev_attr_multicast_query_response_interval.attr,
764 &dev_attr_multicast_startup_query_interval.attr, 780 &dev_attr_multicast_startup_query_interval.attr,
765#endif 781#endif
766#ifdef CONFIG_BRIDGE_NETFILTER 782#if IS_ENABLED(CONFIG_BRIDGE_NETFILTER)
767 &dev_attr_nf_call_iptables.attr, 783 &dev_attr_nf_call_iptables.attr,
768 &dev_attr_nf_call_ip6tables.attr, 784 &dev_attr_nf_call_ip6tables.attr,
769 &dev_attr_nf_call_arptables.attr, 785 &dev_attr_nf_call_arptables.attr,
@@ -771,6 +787,7 @@ static struct attribute *bridge_attrs[] = {
771#ifdef CONFIG_BRIDGE_VLAN_FILTERING 787#ifdef CONFIG_BRIDGE_VLAN_FILTERING
772 &dev_attr_vlan_filtering.attr, 788 &dev_attr_vlan_filtering.attr,
773 &dev_attr_vlan_protocol.attr, 789 &dev_attr_vlan_protocol.attr,
790 &dev_attr_default_pvid.attr,
774#endif 791#endif
775 NULL 792 NULL
776}; 793};
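
The new default_pvid attribute reuses this file's show/store_bridge_parm pattern, so it surfaces as /sys/class/net/<bridge>/bridge/default_pvid next to vlan_filtering and vlan_protocol; store_bridge_parm() (earlier in this file) handles the capability check and numeric parsing, while the setter itself, br_vlan_set_default_pvid(), takes rtnl (see br_vlan.c below). Adding another numeric knob the same way would look like this; example, br->example and br_set_example() are placeholders:

    static ssize_t example_show(struct device *d, struct device_attribute *attr,
                                char *buf)
    {
            struct net_bridge *br = to_bridge(d);

            return sprintf(buf, "%u\n", br->example);
    }

    static ssize_t example_store(struct device *d, struct device_attribute *attr,
                                 const char *buf, size_t len)
    {
            /* br_set_example(struct net_bridge *, unsigned long) does the
             * locking and validation, like br_vlan_set_default_pvid().
             */
            return store_bridge_parm(d, buf, len, br_set_example);
    }
    static DEVICE_ATTR_RW(example);

    /* ... and add &dev_attr_example.attr to bridge_attrs[]. */
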
diff --git a/net/bridge/br_vlan.c b/net/bridge/br_vlan.c
index 3ba57fcdcd13..150048fb99b0 100644
--- a/net/bridge/br_vlan.c
+++ b/net/bridge/br_vlan.c
@@ -223,7 +223,7 @@ bool br_allowed_ingress(struct net_bridge *br, struct net_port_vlans *v,
223 * See if pvid is set on this port. That tells us which 223 * See if pvid is set on this port. That tells us which
224 * vlan untagged or priority-tagged traffic belongs to. 224 * vlan untagged or priority-tagged traffic belongs to.
225 */ 225 */
226 if (pvid == VLAN_N_VID) 226 if (!pvid)
227 goto drop; 227 goto drop;
228 228
229 /* PVID is set on this port. Any untagged or priority-tagged 229 /* PVID is set on this port. Any untagged or priority-tagged
@@ -292,7 +292,7 @@ bool br_should_learn(struct net_bridge_port *p, struct sk_buff *skb, u16 *vid)
292 292
293 if (!*vid) { 293 if (!*vid) {
294 *vid = br_get_pvid(v); 294 *vid = br_get_pvid(v);
295 if (*vid == VLAN_N_VID) 295 if (!*vid)
296 return false; 296 return false;
297 297
298 return true; 298 return true;
@@ -499,9 +499,141 @@ err_filt:
499 goto unlock; 499 goto unlock;
500} 500}
501 501
502void br_vlan_init(struct net_bridge *br) 502static bool vlan_default_pvid(struct net_port_vlans *pv, u16 vid)
503{
504 return pv && vid == pv->pvid && test_bit(vid, pv->untagged_bitmap);
505}
506
507static void br_vlan_disable_default_pvid(struct net_bridge *br)
508{
509 struct net_bridge_port *p;
510 u16 pvid = br->default_pvid;
511
512 /* Disable default_pvid on all ports where it is still
513 * configured.
514 */
515 if (vlan_default_pvid(br_get_vlan_info(br), pvid))
516 br_vlan_delete(br, pvid);
517
518 list_for_each_entry(p, &br->port_list, list) {
519 if (vlan_default_pvid(nbp_get_vlan_info(p), pvid))
520 nbp_vlan_delete(p, pvid);
521 }
522
523 br->default_pvid = 0;
524}
525
526static int __br_vlan_set_default_pvid(struct net_bridge *br, u16 pvid)
527{
528 struct net_bridge_port *p;
529 u16 old_pvid;
530 int err = 0;
531 unsigned long *changed;
532
533 changed = kcalloc(BITS_TO_LONGS(BR_MAX_PORTS), sizeof(unsigned long),
534 GFP_KERNEL);
535 if (!changed)
536 return -ENOMEM;
537
538 old_pvid = br->default_pvid;
539
540 /* Update default_pvid config only if we do not conflict with
541 * user configuration.
542 */
543 if ((!old_pvid || vlan_default_pvid(br_get_vlan_info(br), old_pvid)) &&
544 !br_vlan_find(br, pvid)) {
545 err = br_vlan_add(br, pvid,
546 BRIDGE_VLAN_INFO_PVID |
547 BRIDGE_VLAN_INFO_UNTAGGED);
548 if (err)
549 goto out;
550 br_vlan_delete(br, old_pvid);
551 set_bit(0, changed);
552 }
553
554 list_for_each_entry(p, &br->port_list, list) {
555 /* Update default_pvid config only if we do not conflict with
556 * user configuration.
557 */
558 if ((old_pvid &&
559 !vlan_default_pvid(nbp_get_vlan_info(p), old_pvid)) ||
560 nbp_vlan_find(p, pvid))
561 continue;
562
563 err = nbp_vlan_add(p, pvid,
564 BRIDGE_VLAN_INFO_PVID |
565 BRIDGE_VLAN_INFO_UNTAGGED);
566 if (err)
567 goto err_port;
568 nbp_vlan_delete(p, old_pvid);
569 set_bit(p->port_no, changed);
570 }
571
572 br->default_pvid = pvid;
573
574out:
575 kfree(changed);
576 return err;
577
578err_port:
579 list_for_each_entry_continue_reverse(p, &br->port_list, list) {
580 if (!test_bit(p->port_no, changed))
581 continue;
582
583 if (old_pvid)
584 nbp_vlan_add(p, old_pvid,
585 BRIDGE_VLAN_INFO_PVID |
586 BRIDGE_VLAN_INFO_UNTAGGED);
587 nbp_vlan_delete(p, pvid);
588 }
589
590 if (test_bit(0, changed)) {
591 if (old_pvid)
592 br_vlan_add(br, old_pvid,
593 BRIDGE_VLAN_INFO_PVID |
594 BRIDGE_VLAN_INFO_UNTAGGED);
595 br_vlan_delete(br, pvid);
596 }
597 goto out;
598}
599
600int br_vlan_set_default_pvid(struct net_bridge *br, unsigned long val)
601{
602 u16 pvid = val;
603 int err = 0;
604
605 if (val >= VLAN_VID_MASK)
606 return -EINVAL;
607
608 if (!rtnl_trylock())
609 return restart_syscall();
610
611 if (pvid == br->default_pvid)
612 goto unlock;
613
614 /* Only allow default pvid change when filtering is disabled */
615 if (br->vlan_enabled) {
616 pr_info_once("Please disable vlan filtering to change default_pvid\n");
617 err = -EPERM;
618 goto unlock;
619 }
620
621 if (!pvid)
622 br_vlan_disable_default_pvid(br);
623 else
624 err = __br_vlan_set_default_pvid(br, pvid);
625
626unlock:
627 rtnl_unlock();
628 return err;
629}
630
631int br_vlan_init(struct net_bridge *br)
503{ 632{
504 br->vlan_proto = htons(ETH_P_8021Q); 633 br->vlan_proto = htons(ETH_P_8021Q);
634 br->default_pvid = 1;
635 return br_vlan_add(br, 1,
636 BRIDGE_VLAN_INFO_PVID | BRIDGE_VLAN_INFO_UNTAGGED);
505} 637}
506 638
507/* Must be protected by RTNL. 639/* Must be protected by RTNL.
@@ -593,3 +725,12 @@ out:
593 rcu_read_unlock(); 725 rcu_read_unlock();
594 return found; 726 return found;
595} 727}
728
729int nbp_vlan_init(struct net_bridge_port *p)
730{
731 return p->br->default_pvid ?
732 nbp_vlan_add(p, p->br->default_pvid,
733 BRIDGE_VLAN_INFO_PVID |
734 BRIDGE_VLAN_INFO_UNTAGGED) :
735 0;
736}
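
The bulk of the br_vlan.c addition is __br_vlan_set_default_pvid(): it installs the new PVID on the bridge and on every port the user has not configured, records what it touched in a bitmap, and on the first failure walks back only those entries; br_vlan_init()/nbp_vlan_init() then make VLAN 1 the out-of-the-box default_pvid. The record-and-roll-back idiom in generic form; struct item and the skip/apply/revert callbacks are placeholders:

    #include <linux/bitops.h>
    #include <linux/slab.h>
    #include <linux/types.h>

    struct item;    /* placeholder element type */

    static int apply_all(struct item **items, unsigned int n,
                         bool (*skip)(struct item *),
                         int (*apply)(struct item *),
                         void (*revert)(struct item *))
    {
            unsigned long *changed;
            unsigned int i;
            int err = 0;

            changed = kcalloc(BITS_TO_LONGS(n), sizeof(unsigned long), GFP_KERNEL);
            if (!changed)
                    return -ENOMEM;

            for (i = 0; i < n; i++) {
                    if (skip(items[i]))     /* e.g. port already has user VLAN config */
                            continue;
                    err = apply(items[i]);
                    if (err)
                            goto rollback;
                    set_bit(i, changed);    /* remember exactly what we touched */
            }
            goto out;

    rollback:
            /* Undo only the entries marked above, newest first. */
            while (i--)
                    if (test_bit(i, changed))
                            revert(items[i]);
    out:
            kfree(changed);
            return err;
    }
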
diff --git a/net/bridge/netfilter/ebtables.c b/net/bridge/netfilter/ebtables.c
index 6d69631b9f4d..d9a8c05d995d 100644
--- a/net/bridge/netfilter/ebtables.c
+++ b/net/bridge/netfilter/ebtables.c
@@ -26,6 +26,7 @@
26#include <asm/uaccess.h> 26#include <asm/uaccess.h>
27#include <linux/smp.h> 27#include <linux/smp.h>
28#include <linux/cpumask.h> 28#include <linux/cpumask.h>
29#include <linux/audit.h>
29#include <net/sock.h> 30#include <net/sock.h>
30/* needed for logical [in,out]-dev filtering */ 31/* needed for logical [in,out]-dev filtering */
31#include "../br_private.h" 32#include "../br_private.h"
@@ -1058,6 +1059,20 @@ static int do_replace_finish(struct net *net, struct ebt_replace *repl,
1058 vfree(table); 1059 vfree(table);
1059 1060
1060 vfree(counterstmp); 1061 vfree(counterstmp);
1062
1063#ifdef CONFIG_AUDIT
1064 if (audit_enabled) {
1065 struct audit_buffer *ab;
1066
1067 ab = audit_log_start(current->audit_context, GFP_KERNEL,
1068 AUDIT_NETFILTER_CFG);
1069 if (ab) {
1070 audit_log_format(ab, "table=%s family=%u entries=%u",
1071 repl->name, AF_BRIDGE, repl->nentries);
1072 audit_log_end(ab);
1073 }
1074 }
1075#endif
1061 return ret; 1076 return ret;
1062 1077
1063free_unlock: 1078free_unlock:
diff --git a/net/bridge/netfilter/nf_tables_bridge.c b/net/bridge/netfilter/nf_tables_bridge.c
index 5bcc0d8b31f2..da17a5eab8b4 100644
--- a/net/bridge/netfilter/nf_tables_bridge.c
+++ b/net/bridge/netfilter/nf_tables_bridge.c
@@ -34,9 +34,11 @@ static struct nft_af_info nft_af_bridge __read_mostly = {
34 .owner = THIS_MODULE, 34 .owner = THIS_MODULE,
35 .nops = 1, 35 .nops = 1,
36 .hooks = { 36 .hooks = {
37 [NF_BR_PRE_ROUTING] = nft_do_chain_bridge,
37 [NF_BR_LOCAL_IN] = nft_do_chain_bridge, 38 [NF_BR_LOCAL_IN] = nft_do_chain_bridge,
38 [NF_BR_FORWARD] = nft_do_chain_bridge, 39 [NF_BR_FORWARD] = nft_do_chain_bridge,
39 [NF_BR_LOCAL_OUT] = nft_do_chain_bridge, 40 [NF_BR_LOCAL_OUT] = nft_do_chain_bridge,
41 [NF_BR_POST_ROUTING] = nft_do_chain_bridge,
40 }, 42 },
41}; 43};
42 44
diff --git a/net/bridge/netfilter/nft_reject_bridge.c b/net/bridge/netfilter/nft_reject_bridge.c
index ee3ffe93e14e..a76479535df2 100644
--- a/net/bridge/netfilter/nft_reject_bridge.c
+++ b/net/bridge/netfilter/nft_reject_bridge.c
@@ -14,21 +14,106 @@
14#include <linux/netfilter/nf_tables.h> 14#include <linux/netfilter/nf_tables.h>
15#include <net/netfilter/nf_tables.h> 15#include <net/netfilter/nf_tables.h>
16#include <net/netfilter/nft_reject.h> 16#include <net/netfilter/nft_reject.h>
17#include <net/netfilter/ipv4/nf_reject.h>
18#include <net/netfilter/ipv6/nf_reject.h>
17 19
18static void nft_reject_bridge_eval(const struct nft_expr *expr, 20static void nft_reject_bridge_eval(const struct nft_expr *expr,
19 struct nft_data data[NFT_REG_MAX + 1], 21 struct nft_data data[NFT_REG_MAX + 1],
20 const struct nft_pktinfo *pkt) 22 const struct nft_pktinfo *pkt)
21{ 23{
24 struct nft_reject *priv = nft_expr_priv(expr);
25 struct net *net = dev_net((pkt->in != NULL) ? pkt->in : pkt->out);
26
22 switch (eth_hdr(pkt->skb)->h_proto) { 27 switch (eth_hdr(pkt->skb)->h_proto) {
23 case htons(ETH_P_IP): 28 case htons(ETH_P_IP):
24 return nft_reject_ipv4_eval(expr, data, pkt); 29 switch (priv->type) {
30 case NFT_REJECT_ICMP_UNREACH:
31 nf_send_unreach(pkt->skb, priv->icmp_code);
32 break;
33 case NFT_REJECT_TCP_RST:
34 nf_send_reset(pkt->skb, pkt->ops->hooknum);
35 break;
36 case NFT_REJECT_ICMPX_UNREACH:
37 nf_send_unreach(pkt->skb,
38 nft_reject_icmp_code(priv->icmp_code));
39 break;
40 }
41 break;
25 case htons(ETH_P_IPV6): 42 case htons(ETH_P_IPV6):
26 return nft_reject_ipv6_eval(expr, data, pkt); 43 switch (priv->type) {
44 case NFT_REJECT_ICMP_UNREACH:
45 nf_send_unreach6(net, pkt->skb, priv->icmp_code,
46 pkt->ops->hooknum);
47 break;
48 case NFT_REJECT_TCP_RST:
49 nf_send_reset6(net, pkt->skb, pkt->ops->hooknum);
50 break;
51 case NFT_REJECT_ICMPX_UNREACH:
52 nf_send_unreach6(net, pkt->skb,
53 nft_reject_icmpv6_code(priv->icmp_code),
54 pkt->ops->hooknum);
55 break;
56 }
57 break;
27 default: 58 default:
28 /* No explicit way to reject this protocol, drop it. */ 59 /* No explicit way to reject this protocol, drop it. */
29 data[NFT_REG_VERDICT].verdict = NF_DROP;
30 break; 60 break;
31 } 61 }
62 data[NFT_REG_VERDICT].verdict = NF_DROP;
63}
64
65static int nft_reject_bridge_init(const struct nft_ctx *ctx,
66 const struct nft_expr *expr,
67 const struct nlattr * const tb[])
68{
69 struct nft_reject *priv = nft_expr_priv(expr);
70 int icmp_code;
71
72 if (tb[NFTA_REJECT_TYPE] == NULL)
73 return -EINVAL;
74
75 priv->type = ntohl(nla_get_be32(tb[NFTA_REJECT_TYPE]));
76 switch (priv->type) {
77 case NFT_REJECT_ICMP_UNREACH:
78 case NFT_REJECT_ICMPX_UNREACH:
79 if (tb[NFTA_REJECT_ICMP_CODE] == NULL)
80 return -EINVAL;
81
82 icmp_code = nla_get_u8(tb[NFTA_REJECT_ICMP_CODE]);
83 if (priv->type == NFT_REJECT_ICMPX_UNREACH &&
84 icmp_code > NFT_REJECT_ICMPX_MAX)
85 return -EINVAL;
86
87 priv->icmp_code = icmp_code;
88 break;
89 case NFT_REJECT_TCP_RST:
90 break;
91 default:
92 return -EINVAL;
93 }
94 return 0;
95}
96
97static int nft_reject_bridge_dump(struct sk_buff *skb,
98 const struct nft_expr *expr)
99{
100 const struct nft_reject *priv = nft_expr_priv(expr);
101
102 if (nla_put_be32(skb, NFTA_REJECT_TYPE, htonl(priv->type)))
103 goto nla_put_failure;
104
105 switch (priv->type) {
106 case NFT_REJECT_ICMP_UNREACH:
107 case NFT_REJECT_ICMPX_UNREACH:
108 if (nla_put_u8(skb, NFTA_REJECT_ICMP_CODE, priv->icmp_code))
109 goto nla_put_failure;
110 break;
111 }
112
113 return 0;
114
115nla_put_failure:
116 return -1;
32} 117}
33 118
34static struct nft_expr_type nft_reject_bridge_type; 119static struct nft_expr_type nft_reject_bridge_type;
@@ -36,8 +121,8 @@ static const struct nft_expr_ops nft_reject_bridge_ops = {
36 .type = &nft_reject_bridge_type, 121 .type = &nft_reject_bridge_type,
37 .size = NFT_EXPR_SIZE(sizeof(struct nft_reject)), 122 .size = NFT_EXPR_SIZE(sizeof(struct nft_reject)),
38 .eval = nft_reject_bridge_eval, 123 .eval = nft_reject_bridge_eval,
39 .init = nft_reject_init, 124 .init = nft_reject_bridge_init,
40 .dump = nft_reject_dump, 125 .dump = nft_reject_bridge_dump,
41}; 126};
42 127
43static struct nft_expr_type nft_reject_bridge_type __read_mostly = { 128static struct nft_expr_type nft_reject_bridge_type __read_mostly = {
diff --git a/net/core/dev.c b/net/core/dev.c
index 130d64220229..4699dcfdc4ab 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -897,23 +897,25 @@ struct net_device *dev_getfirstbyhwtype(struct net *net, unsigned short type)
897EXPORT_SYMBOL(dev_getfirstbyhwtype); 897EXPORT_SYMBOL(dev_getfirstbyhwtype);
898 898
899/** 899/**
900 * dev_get_by_flags_rcu - find any device with given flags 900 * __dev_get_by_flags - find any device with given flags
901 * @net: the applicable net namespace 901 * @net: the applicable net namespace
902 * @if_flags: IFF_* values 902 * @if_flags: IFF_* values
903 * @mask: bitmask of bits in if_flags to check 903 * @mask: bitmask of bits in if_flags to check
904 * 904 *
905 * Search for any interface with the given flags. Returns NULL if a device 905 * Search for any interface with the given flags. Returns NULL if a device
906 * is not found or a pointer to the device. Must be called inside 906 * is not found or a pointer to the device. Must be called inside
907 * rcu_read_lock(), and result refcount is unchanged. 907 * rtnl_lock(), and result refcount is unchanged.
908 */ 908 */
909 909
910struct net_device *dev_get_by_flags_rcu(struct net *net, unsigned short if_flags, 910struct net_device *__dev_get_by_flags(struct net *net, unsigned short if_flags,
911 unsigned short mask) 911 unsigned short mask)
912{ 912{
913 struct net_device *dev, *ret; 913 struct net_device *dev, *ret;
914 914
915 ASSERT_RTNL();
916
915 ret = NULL; 917 ret = NULL;
916 for_each_netdev_rcu(net, dev) { 918 for_each_netdev(net, dev) {
917 if (((dev->flags ^ if_flags) & mask) == 0) { 919 if (((dev->flags ^ if_flags) & mask) == 0) {
918 ret = dev; 920 ret = dev;
919 break; 921 break;
@@ -921,7 +923,7 @@ struct net_device *dev_get_by_flags_rcu(struct net *net, unsigned short if_flags
921 } 923 }
922 return ret; 924 return ret;
923} 925}
924EXPORT_SYMBOL(dev_get_by_flags_rcu); 926EXPORT_SYMBOL(__dev_get_by_flags);
925 927
926/** 928/**
927 * dev_valid_name - check if name is okay for network device 929 * dev_valid_name - check if name is okay for network device
@@ -2175,6 +2177,53 @@ static struct dev_kfree_skb_cb *get_kfree_skb_cb(const struct sk_buff *skb)
2175 return (struct dev_kfree_skb_cb *)skb->cb; 2177 return (struct dev_kfree_skb_cb *)skb->cb;
2176} 2178}
2177 2179
2180void netif_schedule_queue(struct netdev_queue *txq)
2181{
2182 rcu_read_lock();
2183 if (!(txq->state & QUEUE_STATE_ANY_XOFF)) {
2184 struct Qdisc *q = rcu_dereference(txq->qdisc);
2185
2186 __netif_schedule(q);
2187 }
2188 rcu_read_unlock();
2189}
2190EXPORT_SYMBOL(netif_schedule_queue);
2191
2192/**
2193 * netif_wake_subqueue - allow sending packets on subqueue
2194 * @dev: network device
2195 * @queue_index: sub queue index
2196 *
2197 * Resume individual transmit queue of a device with multiple transmit queues.
2198 */
2199void netif_wake_subqueue(struct net_device *dev, u16 queue_index)
2200{
2201 struct netdev_queue *txq = netdev_get_tx_queue(dev, queue_index);
2202
2203 if (test_and_clear_bit(__QUEUE_STATE_DRV_XOFF, &txq->state)) {
2204 struct Qdisc *q;
2205
2206 rcu_read_lock();
2207 q = rcu_dereference(txq->qdisc);
2208 __netif_schedule(q);
2209 rcu_read_unlock();
2210 }
2211}
2212EXPORT_SYMBOL(netif_wake_subqueue);
2213
2214void netif_tx_wake_queue(struct netdev_queue *dev_queue)
2215{
2216 if (test_and_clear_bit(__QUEUE_STATE_DRV_XOFF, &dev_queue->state)) {
2217 struct Qdisc *q;
2218
2219 rcu_read_lock();
2220 q = rcu_dereference(dev_queue->qdisc);
2221 __netif_schedule(q);
2222 rcu_read_unlock();
2223 }
2224}
2225EXPORT_SYMBOL(netif_tx_wake_queue);
2226
2178void __dev_kfree_skb_irq(struct sk_buff *skb, enum skb_free_reason reason) 2227void __dev_kfree_skb_irq(struct sk_buff *skb, enum skb_free_reason reason)
2179{ 2228{
2180 unsigned long flags; 2229 unsigned long flags;
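
netif_schedule_queue(), netif_wake_subqueue() and netif_tx_wake_queue() show up here as out-of-line, exported functions because the qdisc attached to a TX queue is now an RCU-managed pointer: each helper takes rcu_read_lock() around rcu_dereference(txq->qdisc) before handing the qdisc to __netif_schedule() (previously these appear to have been header inlines that read txq->qdisc directly). The matching writer side, sketched generically rather than quoted from net/sched/, looks something like:

    #include <linux/netdevice.h>
    #include <linux/rtnetlink.h>
    #include <net/sch_generic.h>

    /* Sketch: swap the qdisc on a TX queue once the pointer is RCU-managed.
     * Qdisc updates run under RTNL, so rtnl_dereference() is enough to read
     * the old value; the caller must not destroy the old qdisc until after
     * a grace period (or from an RCU callback).
     */
    static struct Qdisc *example_swap_qdisc(struct netdev_queue *txq,
                                            struct Qdisc *new)
    {
            struct Qdisc *old;

            ASSERT_RTNL();

            old = rtnl_dereference(txq->qdisc);
            rcu_assign_pointer(txq->qdisc, new);

            return old;
    }
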
@@ -2371,16 +2420,6 @@ struct sk_buff *skb_mac_gso_segment(struct sk_buff *skb,
2371 rcu_read_lock(); 2420 rcu_read_lock();
2372 list_for_each_entry_rcu(ptype, &offload_base, list) { 2421 list_for_each_entry_rcu(ptype, &offload_base, list) {
2373 if (ptype->type == type && ptype->callbacks.gso_segment) { 2422 if (ptype->type == type && ptype->callbacks.gso_segment) {
2374 if (unlikely(skb->ip_summed != CHECKSUM_PARTIAL)) {
2375 int err;
2376
2377 err = ptype->callbacks.gso_send_check(skb);
2378 segs = ERR_PTR(err);
2379 if (err || skb_gso_ok(skb, features))
2380 break;
2381 __skb_push(skb, (skb->data -
2382 skb_network_header(skb)));
2383 }
2384 segs = ptype->callbacks.gso_segment(skb, features); 2423 segs = ptype->callbacks.gso_segment(skb, features);
2385 break; 2424 break;
2386 } 2425 }
@@ -2483,52 +2522,6 @@ static int illegal_highdma(struct net_device *dev, struct sk_buff *skb)
2483 return 0; 2522 return 0;
2484} 2523}
2485 2524
2486struct dev_gso_cb {
2487 void (*destructor)(struct sk_buff *skb);
2488};
2489
2490#define DEV_GSO_CB(skb) ((struct dev_gso_cb *)(skb)->cb)
2491
2492static void dev_gso_skb_destructor(struct sk_buff *skb)
2493{
2494 struct dev_gso_cb *cb;
2495
2496 kfree_skb_list(skb->next);
2497 skb->next = NULL;
2498
2499 cb = DEV_GSO_CB(skb);
2500 if (cb->destructor)
2501 cb->destructor(skb);
2502}
2503
2504/**
2505 * dev_gso_segment - Perform emulated hardware segmentation on skb.
2506 * @skb: buffer to segment
2507 * @features: device features as applicable to this skb
2508 *
2509 * This function segments the given skb and stores the list of segments
2510 * in skb->next.
2511 */
2512static int dev_gso_segment(struct sk_buff *skb, netdev_features_t features)
2513{
2514 struct sk_buff *segs;
2515
2516 segs = skb_gso_segment(skb, features);
2517
2518 /* Verifying header integrity only. */
2519 if (!segs)
2520 return 0;
2521
2522 if (IS_ERR(segs))
2523 return PTR_ERR(segs);
2524
2525 skb->next = segs;
2526 DEV_GSO_CB(skb)->destructor = skb->destructor;
2527 skb->destructor = dev_gso_skb_destructor;
2528
2529 return 0;
2530}
2531
2532/* If MPLS offload request, verify we are testing hardware MPLS features 2525/* If MPLS offload request, verify we are testing hardware MPLS features
2533 * instead of standard features for the netdev. 2526 * instead of standard features for the netdev.
2534 */ 2527 */
@@ -2572,10 +2565,12 @@ static netdev_features_t harmonize_features(struct sk_buff *skb,
2572 2565
2573netdev_features_t netif_skb_features(struct sk_buff *skb) 2566netdev_features_t netif_skb_features(struct sk_buff *skb)
2574{ 2567{
2568 const struct net_device *dev = skb->dev;
2569 netdev_features_t features = dev->features;
2570 u16 gso_segs = skb_shinfo(skb)->gso_segs;
2575 __be16 protocol = skb->protocol; 2571 __be16 protocol = skb->protocol;
2576 netdev_features_t features = skb->dev->features;
2577 2572
2578 if (skb_shinfo(skb)->gso_segs > skb->dev->gso_max_segs) 2573 if (gso_segs > dev->gso_max_segs || gso_segs < dev->gso_min_segs)
2579 features &= ~NETIF_F_GSO_MASK; 2574 features &= ~NETIF_F_GSO_MASK;
2580 2575
2581 if (protocol == htons(ETH_P_8021Q) || protocol == htons(ETH_P_8021AD)) { 2576 if (protocol == htons(ETH_P_8021Q) || protocol == htons(ETH_P_8021AD)) {
@@ -2586,7 +2581,7 @@ netdev_features_t netif_skb_features(struct sk_buff *skb)
2586 } 2581 }
2587 2582
2588 features = netdev_intersect_features(features, 2583 features = netdev_intersect_features(features,
2589 skb->dev->vlan_features | 2584 dev->vlan_features |
2590 NETIF_F_HW_VLAN_CTAG_TX | 2585 NETIF_F_HW_VLAN_CTAG_TX |
2591 NETIF_F_HW_VLAN_STAG_TX); 2586 NETIF_F_HW_VLAN_STAG_TX);
2592 2587
@@ -2603,119 +2598,149 @@ netdev_features_t netif_skb_features(struct sk_buff *skb)
2603} 2598}
2604EXPORT_SYMBOL(netif_skb_features); 2599EXPORT_SYMBOL(netif_skb_features);
2605 2600
2606int dev_hard_start_xmit(struct sk_buff *skb, struct net_device *dev, 2601static int xmit_one(struct sk_buff *skb, struct net_device *dev,
2607 struct netdev_queue *txq) 2602 struct netdev_queue *txq, bool more)
2608{ 2603{
2609 const struct net_device_ops *ops = dev->netdev_ops; 2604 unsigned int len;
2610 int rc = NETDEV_TX_OK; 2605 int rc;
2611 unsigned int skb_len;
2612 2606
2613 if (likely(!skb->next)) { 2607 if (!list_empty(&ptype_all))
2614 netdev_features_t features; 2608 dev_queue_xmit_nit(skb, dev);
2615 2609
2616 /* 2610 len = skb->len;
2617 * If device doesn't need skb->dst, release it right now while 2611 trace_net_dev_start_xmit(skb, dev);
2618 * its hot in this cpu cache 2612 rc = netdev_start_xmit(skb, dev, txq, more);
2619 */ 2613 trace_net_dev_xmit(skb, rc, dev, len);
2620 if (dev->priv_flags & IFF_XMIT_DST_RELEASE)
2621 skb_dst_drop(skb);
2622 2614
2623 features = netif_skb_features(skb); 2615 return rc;
2624 2616}
2625 if (vlan_tx_tag_present(skb) &&
2626 !vlan_hw_offload_capable(features, skb->vlan_proto)) {
2627 skb = __vlan_put_tag(skb, skb->vlan_proto,
2628 vlan_tx_tag_get(skb));
2629 if (unlikely(!skb))
2630 goto out;
2631 2617
2632                skb->vlan_tci = 0;
2633        }
2634
2635        /* If encapsulation offload request, verify we are testing
2636         * hardware encapsulation features instead of standard
2637         * features for the netdev
2638         */
2639        if (skb->encapsulation)
2640                features &= dev->hw_enc_features;
2641
2642        if (netif_needs_gso(skb, features)) {
2643                if (unlikely(dev_gso_segment(skb, features)))
2644                        goto out_kfree_skb;
2645                if (skb->next)
2646                        goto gso;
2647        } else {
2648                if (skb_needs_linearize(skb, features) &&
2649                    __skb_linearize(skb))
2650                        goto out_kfree_skb;
2651
2652                /* If packet is not checksummed and device does not
2653                 * support checksumming for this protocol, complete
2654                 * checksumming here.
2655                 */
2656                if (skb->ip_summed == CHECKSUM_PARTIAL) {
2657                        if (skb->encapsulation)
2658                                skb_set_inner_transport_header(skb,
2659                                        skb_checksum_start_offset(skb));
2660                        else
2661                                skb_set_transport_header(skb,
2662                                        skb_checksum_start_offset(skb));
2663                        if (!(features & NETIF_F_ALL_CSUM) &&
2664                            skb_checksum_help(skb))
2665                                goto out_kfree_skb;
2666                }
2667        }
2668
2669        if (!list_empty(&ptype_all))
2670                dev_queue_xmit_nit(skb, dev);
2671
2672        skb_len = skb->len;
2673        trace_net_dev_start_xmit(skb, dev);
2674        rc = ops->ndo_start_xmit(skb, dev);
2675        trace_net_dev_xmit(skb, rc, dev, skb_len);
2676        if (rc == NETDEV_TX_OK)
2677                txq_trans_update(txq);
2678        return rc;
2679    }
2680
2681gso:
2682    do {
2683        struct sk_buff *nskb = skb->next;
2684
2685        skb->next = nskb->next;
2686        nskb->next = NULL;
2687
2688        if (!list_empty(&ptype_all))
2689                dev_queue_xmit_nit(nskb, dev);
2690
2691        skb_len = nskb->len;
2692        trace_net_dev_start_xmit(nskb, dev);
2693        rc = ops->ndo_start_xmit(nskb, dev);
2694        trace_net_dev_xmit(nskb, rc, dev, skb_len);
2695        if (unlikely(rc != NETDEV_TX_OK)) {
2696                if (rc & ~NETDEV_TX_MASK)
2697                        goto out_kfree_gso_skb;
2698                nskb->next = skb->next;
2699                skb->next = nskb;
2700                return rc;
2701        }
2702        txq_trans_update(txq);
2703        if (unlikely(netif_xmit_stopped(txq) && skb->next))
2704                return NETDEV_TX_BUSY;
2705    } while (skb->next);
2706
2707out_kfree_gso_skb:
2708    if (likely(skb->next == NULL)) {
2709        skb->destructor = DEV_GSO_CB(skb)->destructor;
2710        consume_skb(skb);
2711        return rc;
2712    }
2713out_kfree_skb:
2714    kfree_skb(skb);
2715out:
2716    return rc;
2717}
2718EXPORT_SYMBOL_GPL(dev_hard_start_xmit);
2719
2618struct sk_buff *dev_hard_start_xmit(struct sk_buff *first, struct net_device *dev,
2619                                    struct netdev_queue *txq, int *ret)
2620{
2621    struct sk_buff *skb = first;
2622    int rc = NETDEV_TX_OK;
2623
2624    while (skb) {
2625        struct sk_buff *next = skb->next;
2626
2627        skb->next = NULL;
2628        rc = xmit_one(skb, dev, txq, next != NULL);
2629        if (unlikely(!dev_xmit_complete(rc))) {
2630                skb->next = next;
2631                goto out;
2632        }
2633
2634        skb = next;
2635        if (netif_xmit_stopped(txq) && skb) {
2636                rc = NETDEV_TX_BUSY;
2637                break;
2638        }
2639    }
2640
2641out:
2642    *ret = rc;
2643    return skb;
2644}
2645
2646static struct sk_buff *validate_xmit_vlan(struct sk_buff *skb,
2647                                          netdev_features_t features)
2648{
2649    if (vlan_tx_tag_present(skb) &&
2650        !vlan_hw_offload_capable(features, skb->vlan_proto)) {
2651        skb = __vlan_put_tag(skb, skb->vlan_proto,
2652                             vlan_tx_tag_get(skb));
2653        if (skb)
2654                skb->vlan_tci = 0;
2655    }
2656    return skb;
2657}
2658
2659static struct sk_buff *validate_xmit_skb(struct sk_buff *skb, struct net_device *dev)
2660{
2661    netdev_features_t features;
2662
2663    if (skb->next)
2664        return skb;
2665
2666    features = netif_skb_features(skb);
2667    skb = validate_xmit_vlan(skb, features);
2668    if (unlikely(!skb))
2669        goto out_null;
2670
2671    /* If encapsulation offload request, verify we are testing
2672     * hardware encapsulation features instead of standard
2673     * features for the netdev
2674     */
2675    if (skb->encapsulation)
2676        features &= dev->hw_enc_features;
2677
2678    if (netif_needs_gso(skb, features)) {
2679        struct sk_buff *segs;
2680
2681        segs = skb_gso_segment(skb, features);
2682        if (IS_ERR(segs)) {
2683                segs = NULL;
2684        } else if (segs) {
2685                consume_skb(skb);
2686                skb = segs;
2687        }
2688    } else {
2689        if (skb_needs_linearize(skb, features) &&
2690            __skb_linearize(skb))
2691                goto out_kfree_skb;
2692
2693        /* If packet is not checksummed and device does not
2694         * support checksumming for this protocol, complete
2695         * checksumming here.
2696         */
2697        if (skb->ip_summed == CHECKSUM_PARTIAL) {
2698                if (skb->encapsulation)
2699                        skb_set_inner_transport_header(skb,
2700                                skb_checksum_start_offset(skb));
2701                else
2702                        skb_set_transport_header(skb,
2703                                skb_checksum_start_offset(skb));
2704                if (!(features & NETIF_F_ALL_CSUM) &&
2705                    skb_checksum_help(skb))
2706                        goto out_kfree_skb;
2707        }
2708    }
2709
2710    return skb;
2711
2712out_kfree_skb:
2713    kfree_skb(skb);
2714out_null:
2715    return NULL;
2716}
2717
2718struct sk_buff *validate_xmit_skb_list(struct sk_buff *skb, struct net_device *dev)
2719{
2720    struct sk_buff *next, *head = NULL, *tail;
2721
2722    for (; skb != NULL; skb = next) {
2723        next = skb->next;
2724        skb->next = NULL;
2725
2726        /* in case skb wont be segmented, point to itself */
2727        skb->prev = skb;
2728
2729        skb = validate_xmit_skb(skb, dev);
2730        if (!skb)
2731                continue;
2732
2733        if (!head)
2734                head = skb;
2735        else
2736                tail->next = skb;
2737        /* If skb was segmented, skb->prev points to
2738         * the last segment. If not, it still contains skb.
2739         */
2740        tail = skb->prev;
2741    }
2742    return head;
2743}
2744
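A note on how the per-packet "more" hint threaded through xmit_one() above is meant to be consumed on the driver side: the core sets the hint when it already holds another packet for the same queue, so the driver can batch its doorbell writes. The fragment below is a purely illustrative sketch, not taken from any in-tree driver; struct mydrv_ring, its doorbell register and mydrv_xmit() are invented names, and only skb->xmit_more, netif_xmit_stopped() and the surrounding netdev helpers are assumed to be real kernel interfaces.

#include <linux/netdevice.h>
#include <linux/skbuff.h>
#include <linux/io.h>

/* Hypothetical per-device TX ring state kept in netdev_priv(). */
struct mydrv_ring {
	void __iomem *doorbell;		/* invented tail/doorbell register */
	u16 next_to_use;
};

static netdev_tx_t mydrv_xmit(struct sk_buff *skb, struct net_device *dev)
{
	struct mydrv_ring *ring = netdev_priv(dev);
	struct netdev_queue *txq;

	txq = netdev_get_tx_queue(dev, skb_get_queue_mapping(skb));

	/* ...post skb to the descriptor ring (omitted)... */
	ring->next_to_use++;

	/* Defer the MMIO doorbell while the stack promises more packets
	 * (skb->xmit_more set) and the queue has not just filled up. */
	if (!skb->xmit_more || netif_xmit_stopped(txq))
		writel(ring->next_to_use, ring->doorbell);

	return NETDEV_TX_OK;
}

With that pattern, a burst dequeued together by the qdisc layer can cost a single doorbell write instead of one per packet.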
2720static void qdisc_pkt_len_init(struct sk_buff *skb) 2745static void qdisc_pkt_len_init(struct sk_buff *skb)
2721{ 2746{
@@ -2778,12 +2803,10 @@ static inline int __dev_xmit_skb(struct sk_buff *skb, struct Qdisc *q,
2778 * waiting to be sent out; and the qdisc is not running - 2803 * waiting to be sent out; and the qdisc is not running -
2779 * xmit the skb directly. 2804 * xmit the skb directly.
2780 */ 2805 */
2781 if (!(dev->priv_flags & IFF_XMIT_DST_RELEASE))
2782 skb_dst_force(skb);
2783 2806
2784 qdisc_bstats_update(q, skb); 2807 qdisc_bstats_update(q, skb);
2785 2808
2786 if (sch_direct_xmit(skb, q, dev, txq, root_lock)) { 2809 if (sch_direct_xmit(skb, q, dev, txq, root_lock, true)) {
2787 if (unlikely(contended)) { 2810 if (unlikely(contended)) {
2788 spin_unlock(&q->busylock); 2811 spin_unlock(&q->busylock);
2789 contended = false; 2812 contended = false;
@@ -2794,7 +2817,6 @@ static inline int __dev_xmit_skb(struct sk_buff *skb, struct Qdisc *q,
2794 2817
2795 rc = NET_XMIT_SUCCESS; 2818 rc = NET_XMIT_SUCCESS;
2796 } else { 2819 } else {
2797 skb_dst_force(skb);
2798 rc = q->enqueue(skb, q) & NET_XMIT_MASK; 2820 rc = q->enqueue(skb, q) & NET_XMIT_MASK;
2799 if (qdisc_run_begin(q)) { 2821 if (qdisc_run_begin(q)) {
2800 if (unlikely(contended)) { 2822 if (unlikely(contended)) {
@@ -2891,6 +2913,14 @@ static int __dev_queue_xmit(struct sk_buff *skb, void *accel_priv)
2891 2913
2892 skb_update_prio(skb); 2914 skb_update_prio(skb);
2893 2915
2916 /* If device/qdisc don't need skb->dst, release it right now while
2917 * its hot in this cpu cache.
2918 */
2919 if (dev->priv_flags & IFF_XMIT_DST_RELEASE)
2920 skb_dst_drop(skb);
2921 else
2922 skb_dst_force(skb);
2923
2894 txq = netdev_pick_tx(dev, skb, accel_priv); 2924 txq = netdev_pick_tx(dev, skb, accel_priv);
2895 q = rcu_dereference_bh(txq->qdisc); 2925 q = rcu_dereference_bh(txq->qdisc);
2896 2926
@@ -2923,11 +2953,15 @@ static int __dev_queue_xmit(struct sk_buff *skb, void *accel_priv)
2923 if (__this_cpu_read(xmit_recursion) > RECURSION_LIMIT) 2953 if (__this_cpu_read(xmit_recursion) > RECURSION_LIMIT)
2924 goto recursion_alert; 2954 goto recursion_alert;
2925 2955
2956 skb = validate_xmit_skb(skb, dev);
2957 if (!skb)
2958 goto drop;
2959
2926 HARD_TX_LOCK(dev, txq, cpu); 2960 HARD_TX_LOCK(dev, txq, cpu);
2927 2961
2928 if (!netif_xmit_stopped(txq)) { 2962 if (!netif_xmit_stopped(txq)) {
2929 __this_cpu_inc(xmit_recursion); 2963 __this_cpu_inc(xmit_recursion);
2930 rc = dev_hard_start_xmit(skb, dev, txq); 2964 skb = dev_hard_start_xmit(skb, dev, txq, &rc);
2931 __this_cpu_dec(xmit_recursion); 2965 __this_cpu_dec(xmit_recursion);
2932 if (dev_xmit_complete(rc)) { 2966 if (dev_xmit_complete(rc)) {
2933 HARD_TX_UNLOCK(dev, txq); 2967 HARD_TX_UNLOCK(dev, txq);
@@ -2948,10 +2982,11 @@ recursion_alert:
2948 } 2982 }
2949 2983
2950 rc = -ENETDOWN; 2984 rc = -ENETDOWN;
2985drop:
2951 rcu_read_unlock_bh(); 2986 rcu_read_unlock_bh();
2952 2987
2953 atomic_long_inc(&dev->tx_dropped); 2988 atomic_long_inc(&dev->tx_dropped);
2954 kfree_skb(skb); 2989 kfree_skb_list(skb);
2955 return rc; 2990 return rc;
2956out: 2991out:
2957 rcu_read_unlock_bh(); 2992 rcu_read_unlock_bh();
@@ -3128,8 +3163,7 @@ static int get_rps_cpu(struct net_device *dev, struct sk_buff *skb,
3128 } 3163 }
3129 3164
3130 if (map) { 3165 if (map) {
3131 tcpu = map->cpus[((u64) hash * map->len) >> 32]; 3166 tcpu = map->cpus[reciprocal_scale(hash, map->len)];
3132
3133 if (cpu_online(tcpu)) { 3167 if (cpu_online(tcpu)) {
3134 cpu = tcpu; 3168 cpu = tcpu;
3135 goto done; 3169 goto done;
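Both this RPS bucket choice and the XPS queue pick in flow_dissector.c further down now go through reciprocal_scale(), which is exactly the multiply-and-shift the old code spelled out by hand: it maps a uniformly distributed 32-bit value onto [0, len) without a division. A self-contained userspace rendition (the helper body mirrors the kernel's; the sample hash value is arbitrary):

#include <stdint.h>
#include <stdio.h>

/* Userspace copy of the kernel's reciprocal_scale(): scale a 32-bit value
 * uniformly into [0, ep_ro) using a multiply and a shift, no division. */
static uint32_t reciprocal_scale(uint32_t val, uint32_t ep_ro)
{
	return (uint32_t)(((uint64_t)val * ep_ro) >> 32);
}

int main(void)
{
	uint32_t hash = 0x9e3779b9;	/* arbitrary example flow hash */
	uint32_t map_len = 8;		/* e.g. number of CPUs in an RPS map */

	/* Always prints a bucket in 0..map_len-1 (here: 4). */
	printf("bucket = %u\n", reciprocal_scale(hash, map_len));
	return 0;
}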
@@ -3465,7 +3499,7 @@ static int ing_filter(struct sk_buff *skb, struct netdev_queue *rxq)
3465 skb->tc_verd = SET_TC_RTTL(skb->tc_verd, ttl); 3499 skb->tc_verd = SET_TC_RTTL(skb->tc_verd, ttl);
3466 skb->tc_verd = SET_TC_AT(skb->tc_verd, AT_INGRESS); 3500 skb->tc_verd = SET_TC_AT(skb->tc_verd, AT_INGRESS);
3467 3501
3468 q = rxq->qdisc; 3502 q = rcu_dereference(rxq->qdisc);
3469 if (q != &noop_qdisc) { 3503 if (q != &noop_qdisc) {
3470 spin_lock(qdisc_lock(q)); 3504 spin_lock(qdisc_lock(q));
3471 if (likely(!test_bit(__QDISC_STATE_DEACTIVATED, &q->state))) 3505 if (likely(!test_bit(__QDISC_STATE_DEACTIVATED, &q->state)))
@@ -3482,7 +3516,7 @@ static inline struct sk_buff *handle_ing(struct sk_buff *skb,
3482{ 3516{
3483 struct netdev_queue *rxq = rcu_dereference(skb->dev->ingress_queue); 3517 struct netdev_queue *rxq = rcu_dereference(skb->dev->ingress_queue);
3484 3518
3485 if (!rxq || rxq->qdisc == &noop_qdisc) 3519 if (!rxq || rcu_access_pointer(rxq->qdisc) == &noop_qdisc)
3486 goto out; 3520 goto out;
3487 3521
3488 if (*pt_prev) { 3522 if (*pt_prev) {
@@ -3963,11 +3997,10 @@ static enum gro_result dev_gro_receive(struct napi_struct *napi, struct sk_buff
3963 if (!(skb->dev->features & NETIF_F_GRO)) 3997 if (!(skb->dev->features & NETIF_F_GRO))
3964 goto normal; 3998 goto normal;
3965 3999
3966 if (skb_is_gso(skb) || skb_has_frag_list(skb)) 4000 if (skb_is_gso(skb) || skb_has_frag_list(skb) || skb->csum_bad)
3967 goto normal; 4001 goto normal;
3968 4002
3969 gro_list_prepare(napi, skb); 4003 gro_list_prepare(napi, skb);
3970 NAPI_GRO_CB(skb)->csum = skb->csum; /* Needed for CHECKSUM_COMPLETE */
3971 4004
3972 rcu_read_lock(); 4005 rcu_read_lock();
3973 list_for_each_entry_rcu(ptype, head, list) { 4006 list_for_each_entry_rcu(ptype, head, list) {
@@ -3981,6 +4014,22 @@ static enum gro_result dev_gro_receive(struct napi_struct *napi, struct sk_buff
3981 NAPI_GRO_CB(skb)->free = 0; 4014 NAPI_GRO_CB(skb)->free = 0;
3982 NAPI_GRO_CB(skb)->udp_mark = 0; 4015 NAPI_GRO_CB(skb)->udp_mark = 0;
3983 4016
4017 /* Setup for GRO checksum validation */
4018 switch (skb->ip_summed) {
4019 case CHECKSUM_COMPLETE:
4020 NAPI_GRO_CB(skb)->csum = skb->csum;
4021 NAPI_GRO_CB(skb)->csum_valid = 1;
4022 NAPI_GRO_CB(skb)->csum_cnt = 0;
4023 break;
4024 case CHECKSUM_UNNECESSARY:
4025 NAPI_GRO_CB(skb)->csum_cnt = skb->csum_level + 1;
4026 NAPI_GRO_CB(skb)->csum_valid = 0;
4027 break;
4028 default:
4029 NAPI_GRO_CB(skb)->csum_cnt = 0;
4030 NAPI_GRO_CB(skb)->csum_valid = 0;
4031 }
4032
3984 pp = ptype->callbacks.gro_receive(&napi->gro_list, skb); 4033 pp = ptype->callbacks.gro_receive(&napi->gro_list, skb);
3985 break; 4034 break;
3986 } 4035 }
@@ -4210,6 +4259,31 @@ gro_result_t napi_gro_frags(struct napi_struct *napi)
4210} 4259}
4211EXPORT_SYMBOL(napi_gro_frags); 4260EXPORT_SYMBOL(napi_gro_frags);
4212 4261
4262/* Compute the checksum from gro_offset and return the folded value
4263 * after adding in any pseudo checksum.
4264 */
4265__sum16 __skb_gro_checksum_complete(struct sk_buff *skb)
4266{
4267 __wsum wsum;
4268 __sum16 sum;
4269
4270 wsum = skb_checksum(skb, skb_gro_offset(skb), skb_gro_len(skb), 0);
4271
4272 /* NAPI_GRO_CB(skb)->csum holds pseudo checksum */
4273 sum = csum_fold(csum_add(NAPI_GRO_CB(skb)->csum, wsum));
4274 if (likely(!sum)) {
4275 if (unlikely(skb->ip_summed == CHECKSUM_COMPLETE) &&
4276 !skb->csum_complete_sw)
4277 netdev_rx_csum_fault(skb->dev);
4278 }
4279
4280 NAPI_GRO_CB(skb)->csum = wsum;
4281 NAPI_GRO_CB(skb)->csum_valid = 1;
4282
4283 return sum;
4284}
4285EXPORT_SYMBOL(__skb_gro_checksum_complete);
4286
4213/* 4287/*
4214 * net_rps_action_and_irq_enable sends any pending IPI's for rps. 4288 * net_rps_action_and_irq_enable sends any pending IPI's for rps.
4215 * Note: called with local irq disabled, but exits with local irq enabled. 4289 * Note: called with local irq disabled, but exits with local irq enabled.
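For reference, the folding done by __skb_gro_checksum_complete() above is ordinary Internet-checksum arithmetic: one's-complement-add the checksum of the GRO region to the pseudo-header checksum the protocol layer left in NAPI_GRO_CB(skb)->csum, fold to 16 bits, and a zero result means the checksum verifies. A standalone userspace sketch of that arithmetic, with equivalents of the kernel's generic csum_add()/csum_fold(); the two input values are made up and chosen so the fold comes out zero:

#include <stdint.h>
#include <stdio.h>

/* One's-complement add with end-around carry, like the kernel's generic
 * csum_add(). */
static uint32_t csum_add(uint32_t csum, uint32_t addend)
{
	csum += addend;
	return csum + (csum < addend);
}

/* Fold a 32-bit partial checksum to 16 bits and complement it, like the
 * kernel's generic csum_fold().  A return value of 0 means "valid". */
static uint16_t csum_fold(uint32_t csum)
{
	csum = (csum & 0xffff) + (csum >> 16);
	csum = (csum & 0xffff) + (csum >> 16);
	return (uint16_t)~csum;
}

int main(void)
{
	uint32_t pseudo  = 0x0000beef;	/* pretend pseudo-header checksum */
	uint32_t payload = 0xffff4110;	/* pretend sum over the GRO region */

	/* 0xbeef + 0xffff4110 = 0xffffffff, which folds to 0xffff and
	 * complements to 0, i.e. this packet would be accepted. */
	printf("folded = 0x%04x\n", csum_fold(csum_add(pseudo, payload)));
	return 0;
}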
@@ -6579,6 +6653,7 @@ struct net_device *alloc_netdev_mqs(int sizeof_priv, const char *name,
6579 6653
6580 dev->gso_max_size = GSO_MAX_SIZE; 6654 dev->gso_max_size = GSO_MAX_SIZE;
6581 dev->gso_max_segs = GSO_MAX_SEGS; 6655 dev->gso_max_segs = GSO_MAX_SEGS;
6656 dev->gso_min_segs = 0;
6582 6657
6583 INIT_LIST_HEAD(&dev->napi_list); 6658 INIT_LIST_HEAD(&dev->napi_list);
6584 INIT_LIST_HEAD(&dev->unreg_list); 6659 INIT_LIST_HEAD(&dev->unreg_list);
@@ -6588,7 +6663,7 @@ struct net_device *alloc_netdev_mqs(int sizeof_priv, const char *name,
6588 INIT_LIST_HEAD(&dev->adj_list.lower); 6663 INIT_LIST_HEAD(&dev->adj_list.lower);
6589 INIT_LIST_HEAD(&dev->all_adj_list.upper); 6664 INIT_LIST_HEAD(&dev->all_adj_list.upper);
6590 INIT_LIST_HEAD(&dev->all_adj_list.lower); 6665 INIT_LIST_HEAD(&dev->all_adj_list.lower);
6591 dev->priv_flags = IFF_XMIT_DST_RELEASE; 6666 dev->priv_flags = IFF_XMIT_DST_RELEASE | IFF_XMIT_DST_RELEASE_PERM;
6592 setup(dev); 6667 setup(dev);
6593 6668
6594 dev->num_tx_queues = txqs; 6669 dev->num_tx_queues = txqs;
@@ -7010,53 +7085,45 @@ const char *netdev_drivername(const struct net_device *dev)
7010 return empty; 7085 return empty;
7011} 7086}
7012 7087
7013static int __netdev_printk(const char *level, const struct net_device *dev, 7088static void __netdev_printk(const char *level, const struct net_device *dev,
7014 struct va_format *vaf) 7089 struct va_format *vaf)
7015{ 7090{
7016 int r;
7017
7018 if (dev && dev->dev.parent) { 7091 if (dev && dev->dev.parent) {
7019 r = dev_printk_emit(level[1] - '0', 7092 dev_printk_emit(level[1] - '0',
7020 dev->dev.parent, 7093 dev->dev.parent,
7021 "%s %s %s%s: %pV", 7094 "%s %s %s%s: %pV",
7022 dev_driver_string(dev->dev.parent), 7095 dev_driver_string(dev->dev.parent),
7023 dev_name(dev->dev.parent), 7096 dev_name(dev->dev.parent),
7024 netdev_name(dev), netdev_reg_state(dev), 7097 netdev_name(dev), netdev_reg_state(dev),
7025 vaf); 7098 vaf);
7026 } else if (dev) { 7099 } else if (dev) {
7027 r = printk("%s%s%s: %pV", level, netdev_name(dev), 7100 printk("%s%s%s: %pV",
7028 netdev_reg_state(dev), vaf); 7101 level, netdev_name(dev), netdev_reg_state(dev), vaf);
7029 } else { 7102 } else {
7030 r = printk("%s(NULL net_device): %pV", level, vaf); 7103 printk("%s(NULL net_device): %pV", level, vaf);
7031 } 7104 }
7032
7033 return r;
7034} 7105}
7035 7106
7036int netdev_printk(const char *level, const struct net_device *dev, 7107void netdev_printk(const char *level, const struct net_device *dev,
7037 const char *format, ...) 7108 const char *format, ...)
7038{ 7109{
7039 struct va_format vaf; 7110 struct va_format vaf;
7040 va_list args; 7111 va_list args;
7041 int r;
7042 7112
7043 va_start(args, format); 7113 va_start(args, format);
7044 7114
7045 vaf.fmt = format; 7115 vaf.fmt = format;
7046 vaf.va = &args; 7116 vaf.va = &args;
7047 7117
7048 r = __netdev_printk(level, dev, &vaf); 7118 __netdev_printk(level, dev, &vaf);
7049 7119
7050 va_end(args); 7120 va_end(args);
7051
7052 return r;
7053} 7121}
7054EXPORT_SYMBOL(netdev_printk); 7122EXPORT_SYMBOL(netdev_printk);
7055 7123
7056#define define_netdev_printk_level(func, level) \ 7124#define define_netdev_printk_level(func, level) \
7057int func(const struct net_device *dev, const char *fmt, ...) \ 7125void func(const struct net_device *dev, const char *fmt, ...) \
7058{ \ 7126{ \
7059 int r; \
7060 struct va_format vaf; \ 7127 struct va_format vaf; \
7061 va_list args; \ 7128 va_list args; \
7062 \ 7129 \
@@ -7065,11 +7132,9 @@ int func(const struct net_device *dev, const char *fmt, ...) \
7065 vaf.fmt = fmt; \ 7132 vaf.fmt = fmt; \
7066 vaf.va = &args; \ 7133 vaf.va = &args; \
7067 \ 7134 \
7068 r = __netdev_printk(level, dev, &vaf); \ 7135 __netdev_printk(level, dev, &vaf); \
7069 \ 7136 \
7070 va_end(args); \ 7137 va_end(args); \
7071 \
7072 return r; \
7073} \ 7138} \
7074EXPORT_SYMBOL(func); 7139EXPORT_SYMBOL(func);
7075 7140
diff --git a/net/core/dev_ioctl.c b/net/core/dev_ioctl.c
index cf999e09bcd2..72e899a3efda 100644
--- a/net/core/dev_ioctl.c
+++ b/net/core/dev_ioctl.c
@@ -365,11 +365,8 @@ void dev_load(struct net *net, const char *name)
365 no_module = !dev; 365 no_module = !dev;
366 if (no_module && capable(CAP_NET_ADMIN)) 366 if (no_module && capable(CAP_NET_ADMIN))
367 no_module = request_module("netdev-%s", name); 367 no_module = request_module("netdev-%s", name);
368 if (no_module && capable(CAP_SYS_MODULE)) { 368 if (no_module && capable(CAP_SYS_MODULE))
369 if (!request_module("%s", name)) 369 request_module("%s", name);
370 pr_warn("Loading kernel module for a network device with CAP_SYS_MODULE (deprecated). Use CAP_NET_ADMIN and alias netdev-%s instead.\n",
371 name);
372 }
373} 370}
374EXPORT_SYMBOL(dev_load); 371EXPORT_SYMBOL(dev_load);
375 372
diff --git a/net/core/ethtool.c b/net/core/ethtool.c
index 17cb912793fa..1600aa24d36b 100644
--- a/net/core/ethtool.c
+++ b/net/core/ethtool.c
@@ -1621,6 +1621,81 @@ static int ethtool_get_module_eeprom(struct net_device *dev,
1621 modinfo.eeprom_len); 1621 modinfo.eeprom_len);
1622} 1622}
1623 1623
1624static int ethtool_tunable_valid(const struct ethtool_tunable *tuna)
1625{
1626 switch (tuna->id) {
1627 case ETHTOOL_RX_COPYBREAK:
1628 case ETHTOOL_TX_COPYBREAK:
1629 if (tuna->len != sizeof(u32) ||
1630 tuna->type_id != ETHTOOL_TUNABLE_U32)
1631 return -EINVAL;
1632 break;
1633 default:
1634 return -EINVAL;
1635 }
1636
1637 return 0;
1638}
1639
1640static int ethtool_get_tunable(struct net_device *dev, void __user *useraddr)
1641{
1642 int ret;
1643 struct ethtool_tunable tuna;
1644 const struct ethtool_ops *ops = dev->ethtool_ops;
1645 void *data;
1646
1647 if (!ops->get_tunable)
1648 return -EOPNOTSUPP;
1649 if (copy_from_user(&tuna, useraddr, sizeof(tuna)))
1650 return -EFAULT;
1651 ret = ethtool_tunable_valid(&tuna);
1652 if (ret)
1653 return ret;
1654 data = kmalloc(tuna.len, GFP_USER);
1655 if (!data)
1656 return -ENOMEM;
1657 ret = ops->get_tunable(dev, &tuna, data);
1658 if (ret)
1659 goto out;
1660 useraddr += sizeof(tuna);
1661 ret = -EFAULT;
1662 if (copy_to_user(useraddr, data, tuna.len))
1663 goto out;
1664 ret = 0;
1665
1666out:
1667 kfree(data);
1668 return ret;
1669}
1670
1671static int ethtool_set_tunable(struct net_device *dev, void __user *useraddr)
1672{
1673 int ret;
1674 struct ethtool_tunable tuna;
1675 const struct ethtool_ops *ops = dev->ethtool_ops;
1676 void *data;
1677
1678 if (!ops->set_tunable)
1679 return -EOPNOTSUPP;
1680 if (copy_from_user(&tuna, useraddr, sizeof(tuna)))
1681 return -EFAULT;
1682 ret = ethtool_tunable_valid(&tuna);
1683 if (ret)
1684 return ret;
1685 data = kmalloc(tuna.len, GFP_USER);
1686 if (!data)
1687 return -ENOMEM;
1688 useraddr += sizeof(tuna);
1689 ret = -EFAULT;
1690 if (copy_from_user(data, useraddr, tuna.len))
1691 goto out;
1692 ret = ops->set_tunable(dev, &tuna, data);
1693
1694out:
1695 kfree(data);
1696 return ret;
1697}
1698
1624/* The main entry point in this file. Called from net/core/dev_ioctl.c */ 1699/* The main entry point in this file. Called from net/core/dev_ioctl.c */
1625 1700
1626int dev_ethtool(struct net *net, struct ifreq *ifr) 1701int dev_ethtool(struct net *net, struct ifreq *ifr)
@@ -1670,6 +1745,7 @@ int dev_ethtool(struct net *net, struct ifreq *ifr)
1670 case ETHTOOL_GCHANNELS: 1745 case ETHTOOL_GCHANNELS:
1671 case ETHTOOL_GET_TS_INFO: 1746 case ETHTOOL_GET_TS_INFO:
1672 case ETHTOOL_GEEE: 1747 case ETHTOOL_GEEE:
1748 case ETHTOOL_GTUNABLE:
1673 break; 1749 break;
1674 default: 1750 default:
1675 if (!ns_capable(net->user_ns, CAP_NET_ADMIN)) 1751 if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
@@ -1857,6 +1933,12 @@ int dev_ethtool(struct net *net, struct ifreq *ifr)
1857 case ETHTOOL_GMODULEEEPROM: 1933 case ETHTOOL_GMODULEEEPROM:
1858 rc = ethtool_get_module_eeprom(dev, useraddr); 1934 rc = ethtool_get_module_eeprom(dev, useraddr);
1859 break; 1935 break;
1936 case ETHTOOL_GTUNABLE:
1937 rc = ethtool_get_tunable(dev, useraddr);
1938 break;
1939 case ETHTOOL_STUNABLE:
1940 rc = ethtool_set_tunable(dev, useraddr);
1941 break;
1860 default: 1942 default:
1861 rc = -EOPNOTSUPP; 1943 rc = -EOPNOTSUPP;
1862 } 1944 }
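On the userspace side, the new ETHTOOL_GTUNABLE/ETHTOOL_STUNABLE commands are driven through the usual SIOCETHTOOL ioctl, with the tunable payload placed directly after struct ethtool_tunable in memory (that is what the copy_to_user(useraddr + sizeof(tuna), ...) logic above implies). A minimal sketch that reads rx_copybreak, assuming kernel headers that already define these constants and an interface named eth0:

#include <stdio.h>
#include <string.h>
#include <stdint.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <net/if.h>
#include <linux/ethtool.h>
#include <linux/sockios.h>

int main(void)
{
	struct {
		struct ethtool_tunable hdr;
		uint32_t value;		/* payload lives right after the header */
	} req = {
		.hdr = {
			.cmd     = ETHTOOL_GTUNABLE,
			.id      = ETHTOOL_RX_COPYBREAK,
			.type_id = ETHTOOL_TUNABLE_U32,
			.len     = sizeof(uint32_t),
		},
	};
	struct ifreq ifr = { 0 };
	int fd = socket(AF_INET, SOCK_DGRAM, 0);

	strncpy(ifr.ifr_name, "eth0", IFNAMSIZ - 1);	/* device name is an example */
	ifr.ifr_data = (void *)&req;

	if (fd < 0 || ioctl(fd, SIOCETHTOOL, &ifr) < 0)
		perror("ETHTOOL_GTUNABLE");
	else
		printf("rx_copybreak = %u bytes\n", req.value);

	close(fd);
	return 0;
}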
diff --git a/net/core/filter.c b/net/core/filter.c
index d814b8a89d0f..fcd3f6742a6a 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -87,33 +87,9 @@ int sk_filter(struct sock *sk, struct sk_buff *skb)
87} 87}
88EXPORT_SYMBOL(sk_filter); 88EXPORT_SYMBOL(sk_filter);
89 89
90/* Helper to find the offset of pkt_type in sk_buff structure. We want
91 * to make sure its still a 3bit field starting at a byte boundary;
92 * taken from arch/x86/net/bpf_jit_comp.c.
93 */
94#ifdef __BIG_ENDIAN_BITFIELD
95#define PKT_TYPE_MAX (7 << 5)
96#else
97#define PKT_TYPE_MAX 7
98#endif
99static unsigned int pkt_type_offset(void)
100{
101 struct sk_buff skb_probe = { .pkt_type = ~0, };
102 u8 *ct = (u8 *) &skb_probe;
103 unsigned int off;
104
105 for (off = 0; off < sizeof(struct sk_buff); off++) {
106 if (ct[off] == PKT_TYPE_MAX)
107 return off;
108 }
109
110 pr_err_once("Please fix %s, as pkt_type couldn't be found!\n", __func__);
111 return -1;
112}
113
114static u64 __skb_get_pay_offset(u64 ctx, u64 a, u64 x, u64 r4, u64 r5) 90static u64 __skb_get_pay_offset(u64 ctx, u64 a, u64 x, u64 r4, u64 r5)
115{ 91{
116 return __skb_get_poff((struct sk_buff *)(unsigned long) ctx); 92 return skb_get_poff((struct sk_buff *)(unsigned long) ctx);
117} 93}
118 94
119static u64 __skb_get_nlattr(u64 ctx, u64 a, u64 x, u64 r4, u64 r5) 95static u64 __skb_get_nlattr(u64 ctx, u64 a, u64 x, u64 r4, u64 r5)
@@ -190,11 +166,8 @@ static bool convert_bpf_extensions(struct sock_filter *fp,
190 break; 166 break;
191 167
192 case SKF_AD_OFF + SKF_AD_PKTTYPE: 168 case SKF_AD_OFF + SKF_AD_PKTTYPE:
193 *insn = BPF_LDX_MEM(BPF_B, BPF_REG_A, BPF_REG_CTX, 169 *insn++ = BPF_LDX_MEM(BPF_B, BPF_REG_A, BPF_REG_CTX,
194 pkt_type_offset()); 170 PKT_TYPE_OFFSET());
195 if (insn->off < 0)
196 return false;
197 insn++;
198 *insn = BPF_ALU32_IMM(BPF_AND, BPF_REG_A, PKT_TYPE_MAX); 171 *insn = BPF_ALU32_IMM(BPF_AND, BPF_REG_A, PKT_TYPE_MAX);
199#ifdef __BIG_ENDIAN_BITFIELD 172#ifdef __BIG_ENDIAN_BITFIELD
200 insn++; 173 insn++;
@@ -933,7 +906,7 @@ static struct bpf_prog *bpf_migrate_filter(struct bpf_prog *fp)
933 906
934 /* Expand fp for appending the new filter representation. */ 907 /* Expand fp for appending the new filter representation. */
935 old_fp = fp; 908 old_fp = fp;
936 fp = krealloc(old_fp, bpf_prog_size(new_len), GFP_KERNEL); 909 fp = bpf_prog_realloc(old_fp, bpf_prog_size(new_len), 0);
937 if (!fp) { 910 if (!fp) {
938 /* The old_fp is still around in case we couldn't 911 /* The old_fp is still around in case we couldn't
939 * allocate new memory, so uncharge on that one. 912 * allocate new memory, so uncharge on that one.
@@ -972,7 +945,7 @@ static struct bpf_prog *bpf_prepare_filter(struct bpf_prog *fp)
972 int err; 945 int err;
973 946
974 fp->bpf_func = NULL; 947 fp->bpf_func = NULL;
975 fp->jited = 0; 948 fp->jited = false;
976 949
977 err = bpf_check_classic(fp->insns, fp->len); 950 err = bpf_check_classic(fp->insns, fp->len);
978 if (err) { 951 if (err) {
@@ -1013,7 +986,7 @@ int bpf_prog_create(struct bpf_prog **pfp, struct sock_fprog_kern *fprog)
1013 if (fprog->filter == NULL) 986 if (fprog->filter == NULL)
1014 return -EINVAL; 987 return -EINVAL;
1015 988
1016 fp = kmalloc(bpf_prog_size(fprog->len), GFP_KERNEL); 989 fp = bpf_prog_alloc(bpf_prog_size(fprog->len), 0);
1017 if (!fp) 990 if (!fp)
1018 return -ENOMEM; 991 return -ENOMEM;
1019 992
@@ -1069,12 +1042,12 @@ int sk_attach_filter(struct sock_fprog *fprog, struct sock *sk)
1069 if (fprog->filter == NULL) 1042 if (fprog->filter == NULL)
1070 return -EINVAL; 1043 return -EINVAL;
1071 1044
1072 prog = kmalloc(bpf_fsize, GFP_KERNEL); 1045 prog = bpf_prog_alloc(bpf_fsize, 0);
1073 if (!prog) 1046 if (!prog)
1074 return -ENOMEM; 1047 return -ENOMEM;
1075 1048
1076 if (copy_from_user(prog->insns, fprog->filter, fsize)) { 1049 if (copy_from_user(prog->insns, fprog->filter, fsize)) {
1077 kfree(prog); 1050 __bpf_prog_free(prog);
1078 return -EFAULT; 1051 return -EFAULT;
1079 } 1052 }
1080 1053
@@ -1082,7 +1055,7 @@ int sk_attach_filter(struct sock_fprog *fprog, struct sock *sk)
1082 1055
1083 err = bpf_prog_store_orig_filter(prog, fprog); 1056 err = bpf_prog_store_orig_filter(prog, fprog);
1084 if (err) { 1057 if (err) {
1085 kfree(prog); 1058 __bpf_prog_free(prog);
1086 return -ENOMEM; 1059 return -ENOMEM;
1087 } 1060 }
1088 1061
diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c
index 5f362c1d0332..8560dea58803 100644
--- a/net/core/flow_dissector.c
+++ b/net/core/flow_dissector.c
@@ -13,6 +13,7 @@
13#include <linux/if_pppox.h> 13#include <linux/if_pppox.h>
14#include <linux/ppp_defs.h> 14#include <linux/ppp_defs.h>
15#include <net/flow_keys.h> 15#include <net/flow_keys.h>
16#include <scsi/fc/fc_fcoe.h>
16 17
17/* copy saddr & daddr, possibly using 64bit load/store 18/* copy saddr & daddr, possibly using 64bit load/store
18 * Equivalent to : flow->src = iph->saddr; 19 * Equivalent to : flow->src = iph->saddr;
@@ -26,36 +27,61 @@ static void iph_to_flow_copy_addrs(struct flow_keys *flow, const struct iphdr *i
26} 27}
27 28
28/** 29/**
29 * skb_flow_get_ports - extract the upper layer ports and return them 30 * __skb_flow_get_ports - extract the upper layer ports and return them
30 * @skb: buffer to extract the ports from 31 * @skb: sk_buff to extract the ports from
31 * @thoff: transport header offset 32 * @thoff: transport header offset
32 * @ip_proto: protocol for which to get port offset 33 * @ip_proto: protocol for which to get port offset
34 * @data: raw buffer pointer to the packet, if NULL use skb->data
35 * @hlen: packet header length, if @data is NULL use skb_headlen(skb)
33 * 36 *
34 * The function will try to retrieve the ports at offset thoff + poff where poff 37 * The function will try to retrieve the ports at offset thoff + poff where poff
35 * is the protocol port offset returned from proto_ports_offset 38 * is the protocol port offset returned from proto_ports_offset
36 */ 39 */
37__be32 skb_flow_get_ports(const struct sk_buff *skb, int thoff, u8 ip_proto) 40__be32 __skb_flow_get_ports(const struct sk_buff *skb, int thoff, u8 ip_proto,
41 void *data, int hlen)
38{ 42{
39 int poff = proto_ports_offset(ip_proto); 43 int poff = proto_ports_offset(ip_proto);
40 44
45 if (!data) {
46 data = skb->data;
47 hlen = skb_headlen(skb);
48 }
49
41 if (poff >= 0) { 50 if (poff >= 0) {
42 __be32 *ports, _ports; 51 __be32 *ports, _ports;
43 52
44 ports = skb_header_pointer(skb, thoff + poff, 53 ports = __skb_header_pointer(skb, thoff + poff,
45 sizeof(_ports), &_ports); 54 sizeof(_ports), data, hlen, &_ports);
46 if (ports) 55 if (ports)
47 return *ports; 56 return *ports;
48 } 57 }
49 58
50 return 0; 59 return 0;
51} 60}
52EXPORT_SYMBOL(skb_flow_get_ports); 61EXPORT_SYMBOL(__skb_flow_get_ports);
53 62
54bool skb_flow_dissect(const struct sk_buff *skb, struct flow_keys *flow)
63/**
64 * __skb_flow_dissect - extract the flow_keys struct and return it
65 * @skb: sk_buff to extract the flow from, can be NULL if the rest are specified
66 * @data: raw buffer pointer to the packet, if NULL use skb->data
67 * @proto: protocol for which to get the flow, if @data is NULL use skb->protocol
68 * @nhoff: network header offset, if @data is NULL use skb_network_offset(skb)
69 * @hlen: packet header length, if @data is NULL use skb_headlen(skb)
70 *
71 * The function will try to retrieve the struct flow_keys from either the skbuff
72 * or a raw buffer specified by the rest parameters
73 */
74bool __skb_flow_dissect(const struct sk_buff *skb, struct flow_keys *flow,
75 void *data, __be16 proto, int nhoff, int hlen)
55{ 76{
56 int nhoff = skb_network_offset(skb);
57 u8 ip_proto; 77 u8 ip_proto;
58	__be16 proto = skb->protocol;
78
79 if (!data) {
80 data = skb->data;
81 proto = skb->protocol;
82 nhoff = skb_network_offset(skb);
83 hlen = skb_headlen(skb);
84 }
59 85
60 memset(flow, 0, sizeof(*flow)); 86 memset(flow, 0, sizeof(*flow));
61 87
@@ -65,7 +91,7 @@ again:
65 const struct iphdr *iph; 91 const struct iphdr *iph;
66 struct iphdr _iph; 92 struct iphdr _iph;
67ip: 93ip:
68 iph = skb_header_pointer(skb, nhoff, sizeof(_iph), &_iph); 94 iph = __skb_header_pointer(skb, nhoff, sizeof(_iph), data, hlen, &_iph);
69 if (!iph || iph->ihl < 5) 95 if (!iph || iph->ihl < 5)
70 return false; 96 return false;
71 nhoff += iph->ihl * 4; 97 nhoff += iph->ihl * 4;
@@ -83,7 +109,7 @@ ip:
83 __be32 flow_label; 109 __be32 flow_label;
84 110
85ipv6: 111ipv6:
86 iph = skb_header_pointer(skb, nhoff, sizeof(_iph), &_iph); 112 iph = __skb_header_pointer(skb, nhoff, sizeof(_iph), data, hlen, &_iph);
87 if (!iph) 113 if (!iph)
88 return false; 114 return false;
89 115
@@ -92,6 +118,13 @@ ipv6:
92 flow->dst = (__force __be32)ipv6_addr_hash(&iph->daddr); 118 flow->dst = (__force __be32)ipv6_addr_hash(&iph->daddr);
93 nhoff += sizeof(struct ipv6hdr); 119 nhoff += sizeof(struct ipv6hdr);
94 120
121 /* skip the flow label processing if skb is NULL. The
122 * assumption here is that if there is no skb we are not
123 * looking for flow info as much as we are length.
124 */
125 if (!skb)
126 break;
127
95 flow_label = ip6_flowlabel(iph); 128 flow_label = ip6_flowlabel(iph);
96 if (flow_label) { 129 if (flow_label) {
97 /* Awesome, IPv6 packet has a flow label so we can 130 /* Awesome, IPv6 packet has a flow label so we can
@@ -113,7 +146,7 @@ ipv6:
113 const struct vlan_hdr *vlan; 146 const struct vlan_hdr *vlan;
114 struct vlan_hdr _vlan; 147 struct vlan_hdr _vlan;
115 148
116 vlan = skb_header_pointer(skb, nhoff, sizeof(_vlan), &_vlan); 149 vlan = __skb_header_pointer(skb, nhoff, sizeof(_vlan), data, hlen, &_vlan);
117 if (!vlan) 150 if (!vlan)
118 return false; 151 return false;
119 152
@@ -126,7 +159,7 @@ ipv6:
126 struct pppoe_hdr hdr; 159 struct pppoe_hdr hdr;
127 __be16 proto; 160 __be16 proto;
128 } *hdr, _hdr; 161 } *hdr, _hdr;
129 hdr = skb_header_pointer(skb, nhoff, sizeof(_hdr), &_hdr); 162 hdr = __skb_header_pointer(skb, nhoff, sizeof(_hdr), data, hlen, &_hdr);
130 if (!hdr) 163 if (!hdr)
131 return false; 164 return false;
132 proto = hdr->proto; 165 proto = hdr->proto;
@@ -140,6 +173,9 @@ ipv6:
140 return false; 173 return false;
141 } 174 }
142 } 175 }
176 case htons(ETH_P_FCOE):
177 flow->thoff = (u16)(nhoff + FCOE_HEADER_LEN);
178 /* fall through */
143 default: 179 default:
144 return false; 180 return false;
145 } 181 }
@@ -151,7 +187,7 @@ ipv6:
151 __be16 proto; 187 __be16 proto;
152 } *hdr, _hdr; 188 } *hdr, _hdr;
153 189
154 hdr = skb_header_pointer(skb, nhoff, sizeof(_hdr), &_hdr); 190 hdr = __skb_header_pointer(skb, nhoff, sizeof(_hdr), data, hlen, &_hdr);
155 if (!hdr) 191 if (!hdr)
156 return false; 192 return false;
157 /* 193 /*
@@ -171,8 +207,9 @@ ipv6:
171 const struct ethhdr *eth; 207 const struct ethhdr *eth;
172 struct ethhdr _eth; 208 struct ethhdr _eth;
173 209
174 eth = skb_header_pointer(skb, nhoff, 210 eth = __skb_header_pointer(skb, nhoff,
175 sizeof(_eth), &_eth); 211 sizeof(_eth),
212 data, hlen, &_eth);
176 if (!eth) 213 if (!eth)
177 return false; 214 return false;
178 proto = eth->h_proto; 215 proto = eth->h_proto;
@@ -194,12 +231,12 @@ ipv6:
194 231
195 flow->n_proto = proto; 232 flow->n_proto = proto;
196 flow->ip_proto = ip_proto; 233 flow->ip_proto = ip_proto;
197 flow->ports = skb_flow_get_ports(skb, nhoff, ip_proto); 234 flow->ports = __skb_flow_get_ports(skb, nhoff, ip_proto, data, hlen);
198 flow->thoff = (u16) nhoff; 235 flow->thoff = (u16) nhoff;
199 236
200 return true; 237 return true;
201} 238}
202EXPORT_SYMBOL(skb_flow_dissect); 239EXPORT_SYMBOL(__skb_flow_dissect);
203 240
204static u32 hashrnd __read_mostly; 241static u32 hashrnd __read_mostly;
205static __always_inline void __flow_hash_secret_init(void) 242static __always_inline void __flow_hash_secret_init(void)
@@ -286,30 +323,22 @@ u16 __skb_tx_hash(const struct net_device *dev, struct sk_buff *skb,
286 qcount = dev->tc_to_txq[tc].count; 323 qcount = dev->tc_to_txq[tc].count;
287 } 324 }
288 325
289 return (u16) (((u64)skb_get_hash(skb) * qcount) >> 32) + qoffset; 326 return (u16) reciprocal_scale(skb_get_hash(skb), qcount) + qoffset;
290} 327}
291EXPORT_SYMBOL(__skb_tx_hash); 328EXPORT_SYMBOL(__skb_tx_hash);
292 329
293/* __skb_get_poff() returns the offset to the payload as far as it could
294 * be dissected. The main user is currently BPF, so that we can dynamically
295 * truncate packets without needing to push actual payload to the user
296 * space and can analyze headers only, instead.
297 */
298u32 __skb_get_poff(const struct sk_buff *skb)
299{
300	struct flow_keys keys;
301	u32 poff = 0;
302
303	if (!skb_flow_dissect(skb, &keys))
304		return 0;
305
306	poff += keys.thoff;
307	switch (keys.ip_proto) {
330u32 __skb_get_poff(const struct sk_buff *skb, void *data,
331		   const struct flow_keys *keys, int hlen)
332{
333	u32 poff = keys->thoff;
334
335	switch (keys->ip_proto) {
308 case IPPROTO_TCP: { 336 case IPPROTO_TCP: {
309 const struct tcphdr *tcph; 337 const struct tcphdr *tcph;
310 struct tcphdr _tcph; 338 struct tcphdr _tcph;
311 339
312 tcph = skb_header_pointer(skb, poff, sizeof(_tcph), &_tcph); 340 tcph = __skb_header_pointer(skb, poff, sizeof(_tcph),
341 data, hlen, &_tcph);
313 if (!tcph) 342 if (!tcph)
314 return poff; 343 return poff;
315 344
@@ -343,6 +372,21 @@ u32 __skb_get_poff(const struct sk_buff *skb)
343 return poff; 372 return poff;
344} 373}
345 374
375/* skb_get_poff() returns the offset to the payload as far as it could
376 * be dissected. The main user is currently BPF, so that we can dynamically
377 * truncate packets without needing to push actual payload to the user
378 * space and can analyze headers only, instead.
379 */
380u32 skb_get_poff(const struct sk_buff *skb)
381{
382 struct flow_keys keys;
383
384 if (!skb_flow_dissect(skb, &keys))
385 return 0;
386
387 return __skb_get_poff(skb, skb->data, &keys, skb_headlen(skb));
388}
389
346static inline int get_xps_queue(struct net_device *dev, struct sk_buff *skb) 390static inline int get_xps_queue(struct net_device *dev, struct sk_buff *skb)
347{ 391{
348#ifdef CONFIG_XPS 392#ifdef CONFIG_XPS
@@ -359,9 +403,8 @@ static inline int get_xps_queue(struct net_device *dev, struct sk_buff *skb)
359 if (map->len == 1) 403 if (map->len == 1)
360 queue_index = map->queues[0]; 404 queue_index = map->queues[0];
361 else 405 else
362 queue_index = map->queues[ 406 queue_index = map->queues[reciprocal_scale(skb_get_hash(skb),
363 ((u64)skb_get_hash(skb) * map->len) >> 32]; 407 map->len)];
364
365 if (unlikely(queue_index >= dev->real_num_tx_queues)) 408 if (unlikely(queue_index >= dev->real_num_tx_queues))
366 queue_index = -1; 409 queue_index = -1;
367 } 410 }
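Because __skb_flow_dissect() and __skb_get_poff() now take an explicit data/proto/nhoff/hlen tuple and tolerate a NULL skb (per the new kerneldoc above), a driver can dissect a freshly DMA'd receive buffer before any skb exists, for instance to size the header pull into the linear area. A rough kernel-side sketch under exactly those assumptions; rxbuf_headlen() is an invented name and the error handling is deliberately minimal:

#include <linux/kernel.h>
#include <linux/skbuff.h>
#include <linux/if_ether.h>
#include <net/flow_keys.h>

/* Hypothetical driver helper: how many header bytes of a raw RX buffer are
 * worth copying into the skb's linear area. */
static unsigned int rxbuf_headlen(void *data, unsigned int len)
{
	const struct ethhdr *eth = data;
	struct flow_keys keys;

	if (len < sizeof(*eth))
		return len;

	/* Dissect straight out of the DMA buffer: no skb is involved yet. */
	if (!__skb_flow_dissect(NULL, &keys, data, eth->h_proto,
				sizeof(*eth), len))
		/* Undissectable: pull whatever L2/L3 headers were found. */
		return max_t(unsigned int, keys.thoff, sizeof(*eth));

	/* Dissected down to L4: pull up to the payload offset. */
	return min_t(unsigned int, __skb_get_poff(NULL, data, &keys, len), len);
}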
diff --git a/net/core/gen_estimator.c b/net/core/gen_estimator.c
index 9d33dfffca19..9dfb88a933e7 100644
--- a/net/core/gen_estimator.c
+++ b/net/core/gen_estimator.c
@@ -91,6 +91,8 @@ struct gen_estimator
91 u32 avpps; 91 u32 avpps;
92 struct rcu_head e_rcu; 92 struct rcu_head e_rcu;
93 struct rb_node node; 93 struct rb_node node;
94 struct gnet_stats_basic_cpu __percpu *cpu_bstats;
95 struct rcu_head head;
94}; 96};
95 97
96struct gen_estimator_head 98struct gen_estimator_head
@@ -115,9 +117,8 @@ static void est_timer(unsigned long arg)
115 117
116 rcu_read_lock(); 118 rcu_read_lock();
117 list_for_each_entry_rcu(e, &elist[idx].list, list) { 119 list_for_each_entry_rcu(e, &elist[idx].list, list) {
118 u64 nbytes; 120 struct gnet_stats_basic_packed b = {0};
119 u64 brate; 121 u64 brate;
120 u32 npackets;
121 u32 rate; 122 u32 rate;
122 123
123 spin_lock(e->stats_lock); 124 spin_lock(e->stats_lock);
@@ -125,15 +126,15 @@ static void est_timer(unsigned long arg)
125 if (e->bstats == NULL) 126 if (e->bstats == NULL)
126 goto skip; 127 goto skip;
127 128
128 nbytes = e->bstats->bytes; 129 __gnet_stats_copy_basic(&b, e->cpu_bstats, e->bstats);
129 npackets = e->bstats->packets; 130
130 brate = (nbytes - e->last_bytes)<<(7 - idx); 131 brate = (b.bytes - e->last_bytes)<<(7 - idx);
131 e->last_bytes = nbytes; 132 e->last_bytes = b.bytes;
132 e->avbps += (brate >> e->ewma_log) - (e->avbps >> e->ewma_log); 133 e->avbps += (brate >> e->ewma_log) - (e->avbps >> e->ewma_log);
133 e->rate_est->bps = (e->avbps+0xF)>>5; 134 e->rate_est->bps = (e->avbps+0xF)>>5;
134 135
135 rate = (npackets - e->last_packets)<<(12 - idx); 136 rate = (b.packets - e->last_packets)<<(12 - idx);
136 e->last_packets = npackets; 137 e->last_packets = b.packets;
137 e->avpps += (rate >> e->ewma_log) - (e->avpps >> e->ewma_log); 138 e->avpps += (rate >> e->ewma_log) - (e->avpps >> e->ewma_log);
138 e->rate_est->pps = (e->avpps+0x1FF)>>10; 139 e->rate_est->pps = (e->avpps+0x1FF)>>10;
139skip: 140skip:
@@ -203,12 +204,14 @@ struct gen_estimator *gen_find_node(const struct gnet_stats_basic_packed *bstats
203 * 204 *
204 */ 205 */
205int gen_new_estimator(struct gnet_stats_basic_packed *bstats, 206int gen_new_estimator(struct gnet_stats_basic_packed *bstats,
207 struct gnet_stats_basic_cpu __percpu *cpu_bstats,
206 struct gnet_stats_rate_est64 *rate_est, 208 struct gnet_stats_rate_est64 *rate_est,
207 spinlock_t *stats_lock, 209 spinlock_t *stats_lock,
208 struct nlattr *opt) 210 struct nlattr *opt)
209{ 211{
210 struct gen_estimator *est; 212 struct gen_estimator *est;
211 struct gnet_estimator *parm = nla_data(opt); 213 struct gnet_estimator *parm = nla_data(opt);
214 struct gnet_stats_basic_packed b = {0};
212 int idx; 215 int idx;
213 216
214 if (nla_len(opt) < sizeof(*parm)) 217 if (nla_len(opt) < sizeof(*parm))
@@ -221,15 +224,18 @@ int gen_new_estimator(struct gnet_stats_basic_packed *bstats,
221 if (est == NULL) 224 if (est == NULL)
222 return -ENOBUFS; 225 return -ENOBUFS;
223 226
227 __gnet_stats_copy_basic(&b, cpu_bstats, bstats);
228
224 idx = parm->interval + 2; 229 idx = parm->interval + 2;
225 est->bstats = bstats; 230 est->bstats = bstats;
226 est->rate_est = rate_est; 231 est->rate_est = rate_est;
227 est->stats_lock = stats_lock; 232 est->stats_lock = stats_lock;
228 est->ewma_log = parm->ewma_log; 233 est->ewma_log = parm->ewma_log;
229 est->last_bytes = bstats->bytes; 234 est->last_bytes = b.bytes;
230 est->avbps = rate_est->bps<<5; 235 est->avbps = rate_est->bps<<5;
231 est->last_packets = bstats->packets; 236 est->last_packets = b.packets;
232 est->avpps = rate_est->pps<<10; 237 est->avpps = rate_est->pps<<10;
238 est->cpu_bstats = cpu_bstats;
233 239
234 spin_lock_bh(&est_tree_lock); 240 spin_lock_bh(&est_tree_lock);
235 if (!elist[idx].timer.function) { 241 if (!elist[idx].timer.function) {
@@ -290,11 +296,12 @@ EXPORT_SYMBOL(gen_kill_estimator);
290 * Returns 0 on success or a negative error code. 296 * Returns 0 on success or a negative error code.
291 */ 297 */
292int gen_replace_estimator(struct gnet_stats_basic_packed *bstats, 298int gen_replace_estimator(struct gnet_stats_basic_packed *bstats,
299 struct gnet_stats_basic_cpu __percpu *cpu_bstats,
293 struct gnet_stats_rate_est64 *rate_est, 300 struct gnet_stats_rate_est64 *rate_est,
294 spinlock_t *stats_lock, struct nlattr *opt) 301 spinlock_t *stats_lock, struct nlattr *opt)
295{ 302{
296 gen_kill_estimator(bstats, rate_est); 303 gen_kill_estimator(bstats, rate_est);
297 return gen_new_estimator(bstats, rate_est, stats_lock, opt); 304 return gen_new_estimator(bstats, cpu_bstats, rate_est, stats_lock, opt);
298} 305}
299EXPORT_SYMBOL(gen_replace_estimator); 306EXPORT_SYMBOL(gen_replace_estimator);
300 307
diff --git a/net/core/gen_stats.c b/net/core/gen_stats.c
index 2ddbce4cce14..0c08062d1796 100644
--- a/net/core/gen_stats.c
+++ b/net/core/gen_stats.c
@@ -97,6 +97,43 @@ gnet_stats_start_copy(struct sk_buff *skb, int type, spinlock_t *lock,
97} 97}
98EXPORT_SYMBOL(gnet_stats_start_copy); 98EXPORT_SYMBOL(gnet_stats_start_copy);
99 99
100static void
101__gnet_stats_copy_basic_cpu(struct gnet_stats_basic_packed *bstats,
102 struct gnet_stats_basic_cpu __percpu *cpu)
103{
104 int i;
105
106 for_each_possible_cpu(i) {
107 struct gnet_stats_basic_cpu *bcpu = per_cpu_ptr(cpu, i);
108 unsigned int start;
109 u64 bytes;
110 u32 packets;
111
112 do {
113 start = u64_stats_fetch_begin_irq(&bcpu->syncp);
114 bytes = bcpu->bstats.bytes;
115 packets = bcpu->bstats.packets;
116 } while (u64_stats_fetch_retry_irq(&bcpu->syncp, start));
117
118 bstats->bytes += bytes;
119 bstats->packets += packets;
120 }
121}
122
123void
124__gnet_stats_copy_basic(struct gnet_stats_basic_packed *bstats,
125 struct gnet_stats_basic_cpu __percpu *cpu,
126 struct gnet_stats_basic_packed *b)
127{
128 if (cpu) {
129 __gnet_stats_copy_basic_cpu(bstats, cpu);
130 } else {
131 bstats->bytes = b->bytes;
132 bstats->packets = b->packets;
133 }
134}
135EXPORT_SYMBOL(__gnet_stats_copy_basic);
136
100/** 137/**
101 * gnet_stats_copy_basic - copy basic statistics into statistic TLV 138 * gnet_stats_copy_basic - copy basic statistics into statistic TLV
102 * @d: dumping handle 139 * @d: dumping handle
@@ -109,19 +146,25 @@ EXPORT_SYMBOL(gnet_stats_start_copy);
109 * if the room in the socket buffer was not sufficient. 146 * if the room in the socket buffer was not sufficient.
110 */ 147 */
111int 148int
112gnet_stats_copy_basic(struct gnet_dump *d, struct gnet_stats_basic_packed *b) 149gnet_stats_copy_basic(struct gnet_dump *d,
150 struct gnet_stats_basic_cpu __percpu *cpu,
151 struct gnet_stats_basic_packed *b)
113{ 152{
153 struct gnet_stats_basic_packed bstats = {0};
154
155 __gnet_stats_copy_basic(&bstats, cpu, b);
156
114 if (d->compat_tc_stats) { 157 if (d->compat_tc_stats) {
115 d->tc_stats.bytes = b->bytes; 158 d->tc_stats.bytes = bstats.bytes;
116 d->tc_stats.packets = b->packets; 159 d->tc_stats.packets = bstats.packets;
117 } 160 }
118 161
119 if (d->tail) { 162 if (d->tail) {
120 struct gnet_stats_basic sb; 163 struct gnet_stats_basic sb;
121 164
122 memset(&sb, 0, sizeof(sb)); 165 memset(&sb, 0, sizeof(sb));
123 sb.bytes = b->bytes; 166 sb.bytes = bstats.bytes;
124 sb.packets = b->packets; 167 sb.packets = bstats.packets;
125 return gnet_stats_copy(d, TCA_STATS_BASIC, &sb, sizeof(sb)); 168 return gnet_stats_copy(d, TCA_STATS_BASIC, &sb, sizeof(sb));
126 } 169 }
127 return 0; 170 return 0;
@@ -172,29 +215,74 @@ gnet_stats_copy_rate_est(struct gnet_dump *d,
172} 215}
173EXPORT_SYMBOL(gnet_stats_copy_rate_est); 216EXPORT_SYMBOL(gnet_stats_copy_rate_est);
174 217
218static void
219__gnet_stats_copy_queue_cpu(struct gnet_stats_queue *qstats,
220 const struct gnet_stats_queue __percpu *q)
221{
222 int i;
223
224 for_each_possible_cpu(i) {
225 const struct gnet_stats_queue *qcpu = per_cpu_ptr(q, i);
226
227 qstats->qlen = 0;
228 qstats->backlog += qcpu->backlog;
229 qstats->drops += qcpu->drops;
230 qstats->requeues += qcpu->requeues;
231 qstats->overlimits += qcpu->overlimits;
232 }
233}
234
235static void __gnet_stats_copy_queue(struct gnet_stats_queue *qstats,
236 const struct gnet_stats_queue __percpu *cpu,
237 const struct gnet_stats_queue *q,
238 __u32 qlen)
239{
240 if (cpu) {
241 __gnet_stats_copy_queue_cpu(qstats, cpu);
242 } else {
243 qstats->qlen = q->qlen;
244 qstats->backlog = q->backlog;
245 qstats->drops = q->drops;
246 qstats->requeues = q->requeues;
247 qstats->overlimits = q->overlimits;
248 }
249
250 qstats->qlen = qlen;
251}
252
175/** 253/**
176 * gnet_stats_copy_queue - copy queue statistics into statistics TLV 254 * gnet_stats_copy_queue - copy queue statistics into statistics TLV
177 * @d: dumping handle 255 * @d: dumping handle
256 * @cpu_q: per cpu queue statistics
178 * @q: queue statistics 257 * @q: queue statistics
258 * @qlen: queue length statistics
179 * 259 *
180 * Appends the queue statistics to the top level TLV created by 260 * Appends the queue statistics to the top level TLV created by
181 * gnet_stats_start_copy(). 261 * gnet_stats_start_copy(). Using per cpu queue statistics if
262 * they are available.
182 * 263 *
183 * Returns 0 on success or -1 with the statistic lock released 264 * Returns 0 on success or -1 with the statistic lock released
184 * if the room in the socket buffer was not sufficient. 265 * if the room in the socket buffer was not sufficient.
185 */ 266 */
186int 267int
187gnet_stats_copy_queue(struct gnet_dump *d, struct gnet_stats_queue *q) 268gnet_stats_copy_queue(struct gnet_dump *d,
269 struct gnet_stats_queue __percpu *cpu_q,
270 struct gnet_stats_queue *q, __u32 qlen)
188{ 271{
272 struct gnet_stats_queue qstats = {0};
273
274 __gnet_stats_copy_queue(&qstats, cpu_q, q, qlen);
275
189 if (d->compat_tc_stats) { 276 if (d->compat_tc_stats) {
190 d->tc_stats.drops = q->drops; 277 d->tc_stats.drops = qstats.drops;
191 d->tc_stats.qlen = q->qlen; 278 d->tc_stats.qlen = qstats.qlen;
192 d->tc_stats.backlog = q->backlog; 279 d->tc_stats.backlog = qstats.backlog;
193 d->tc_stats.overlimits = q->overlimits; 280 d->tc_stats.overlimits = qstats.overlimits;
194 } 281 }
195 282
196 if (d->tail) 283 if (d->tail)
197 return gnet_stats_copy(d, TCA_STATS_QUEUE, q, sizeof(*q)); 284 return gnet_stats_copy(d, TCA_STATS_QUEUE,
285 &qstats, sizeof(qstats));
198 286
199 return 0; 287 return 0;
200} 288}
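The new __gnet_stats_copy_basic() above simply prefers summing per-CPU byte/packet counters when a qdisc provides them and otherwise falls back to the single shared struct. A plain userspace C illustration of that aggregation; the kernel version additionally brackets each per-CPU read in a u64_stats sequence-counter retry loop, which is omitted here:

#include <stdint.h>
#include <stdio.h>

struct basic_stats {
	uint64_t bytes;
	uint32_t packets;
};

/* Sum per-CPU counters when they exist, else copy the shared counters. */
static void copy_basic(struct basic_stats *out,
		       const struct basic_stats *percpu, int ncpus,
		       const struct basic_stats *shared)
{
	if (percpu) {
		for (int i = 0; i < ncpus; i++) {
			out->bytes   += percpu[i].bytes;
			out->packets += percpu[i].packets;
		}
	} else {
		*out = *shared;
	}
}

int main(void)
{
	struct basic_stats percpu[2] = { { 1500, 1 }, { 4500, 3 } };
	struct basic_stats sum = { 0, 0 };

	copy_basic(&sum, percpu, 2, NULL);
	printf("%llu bytes / %u packets\n",
	       (unsigned long long)sum.bytes, sum.packets);	/* 6000 / 4 */
	return 0;
}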
diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
index 7c6b51a58968..7f155175bba8 100644
--- a/net/core/net_namespace.c
+++ b/net/core/net_namespace.c
@@ -224,7 +224,7 @@ static void net_free(struct net *net)
224 return; 224 return;
225 } 225 }
226#endif 226#endif
227 kfree(net->gen); 227 kfree(rcu_access_pointer(net->gen));
228 kmem_cache_free(net_cachep, net); 228 kmem_cache_free(net_cachep, net);
229} 229}
230 230
diff --git a/net/core/netpoll.c b/net/core/netpoll.c
index 907fb5e36c02..e6645b4f330a 100644
--- a/net/core/netpoll.c
+++ b/net/core/netpoll.c
@@ -72,7 +72,6 @@ module_param(carrier_timeout, uint, 0644);
72static int netpoll_start_xmit(struct sk_buff *skb, struct net_device *dev, 72static int netpoll_start_xmit(struct sk_buff *skb, struct net_device *dev,
73 struct netdev_queue *txq) 73 struct netdev_queue *txq)
74{ 74{
75 const struct net_device_ops *ops = dev->netdev_ops;
76 int status = NETDEV_TX_OK; 75 int status = NETDEV_TX_OK;
77 netdev_features_t features; 76 netdev_features_t features;
78 77
@@ -92,9 +91,7 @@ static int netpoll_start_xmit(struct sk_buff *skb, struct net_device *dev,
92 skb->vlan_tci = 0; 91 skb->vlan_tci = 0;
93 } 92 }
94 93
95 status = ops->ndo_start_xmit(skb, dev); 94 status = netdev_start_xmit(skb, dev, txq, false);
96 if (status == NETDEV_TX_OK)
97 txq_trans_update(txq);
98 95
99out: 96out:
100 return status; 97 return status;
@@ -116,7 +113,7 @@ static void queue_process(struct work_struct *work)
116 continue; 113 continue;
117 } 114 }
118 115
119 txq = netdev_get_tx_queue(dev, skb_get_queue_mapping(skb)); 116 txq = skb_get_tx_queue(dev, skb);
120 117
121 local_irq_save(flags); 118 local_irq_save(flags);
122 HARD_TX_LOCK(dev, txq, smp_processor_id()); 119 HARD_TX_LOCK(dev, txq, smp_processor_id());
diff --git a/net/core/pktgen.c b/net/core/pktgen.c
index 8b849ddfef2e..443256bdcddc 100644
--- a/net/core/pktgen.c
+++ b/net/core/pktgen.c
@@ -202,6 +202,7 @@
202#define F_QUEUE_MAP_CPU (1<<14) /* queue map mirrors smp_processor_id() */ 202#define F_QUEUE_MAP_CPU (1<<14) /* queue map mirrors smp_processor_id() */
203#define F_NODE (1<<15) /* Node memory alloc*/ 203#define F_NODE (1<<15) /* Node memory alloc*/
204#define F_UDPCSUM (1<<16) /* Include UDP checksum */ 204#define F_UDPCSUM (1<<16) /* Include UDP checksum */
205#define F_NO_TIMESTAMP (1<<17) /* Don't timestamp packets (default TS) */
205 206
206/* Thread control flag bits */ 207/* Thread control flag bits */
207#define T_STOP (1<<0) /* Stop run */ 208#define T_STOP (1<<0) /* Stop run */
@@ -386,6 +387,7 @@ struct pktgen_dev {
386 u16 queue_map_min; 387 u16 queue_map_min;
387 u16 queue_map_max; 388 u16 queue_map_max;
388 __u32 skb_priority; /* skb priority field */ 389 __u32 skb_priority; /* skb priority field */
390 unsigned int burst; /* number of duplicated packets to burst */
389 int node; /* Memory node */ 391 int node; /* Memory node */
390 392
391#ifdef CONFIG_XFRM 393#ifdef CONFIG_XFRM
@@ -505,7 +507,7 @@ static ssize_t pgctrl_write(struct file *file, const char __user *buf,
505 pktgen_reset_all_threads(pn); 507 pktgen_reset_all_threads(pn);
506 508
507 else 509 else
508 pr_warning("Unknown command: %s\n", data); 510 pr_warn("Unknown command: %s\n", data);
509 511
510 return count; 512 return count;
511} 513}
@@ -612,6 +614,9 @@ static int pktgen_if_show(struct seq_file *seq, void *v)
612 if (pkt_dev->traffic_class) 614 if (pkt_dev->traffic_class)
613 seq_printf(seq, " traffic_class: 0x%02x\n", pkt_dev->traffic_class); 615 seq_printf(seq, " traffic_class: 0x%02x\n", pkt_dev->traffic_class);
614 616
617 if (pkt_dev->burst > 1)
618 seq_printf(seq, " burst: %d\n", pkt_dev->burst);
619
615 if (pkt_dev->node >= 0) 620 if (pkt_dev->node >= 0)
616 seq_printf(seq, " node: %d\n", pkt_dev->node); 621 seq_printf(seq, " node: %d\n", pkt_dev->node);
617 622
@@ -638,6 +643,9 @@ static int pktgen_if_show(struct seq_file *seq, void *v)
638 if (pkt_dev->flags & F_UDPCSUM) 643 if (pkt_dev->flags & F_UDPCSUM)
639 seq_puts(seq, "UDPCSUM "); 644 seq_puts(seq, "UDPCSUM ");
640 645
646 if (pkt_dev->flags & F_NO_TIMESTAMP)
647 seq_puts(seq, "NO_TIMESTAMP ");
648
641 if (pkt_dev->flags & F_MPLS_RND) 649 if (pkt_dev->flags & F_MPLS_RND)
642 seq_puts(seq, "MPLS_RND "); 650 seq_puts(seq, "MPLS_RND ");
643 651
@@ -857,14 +865,14 @@ static ssize_t pktgen_if_write(struct file *file,
857 pg_result = &(pkt_dev->result[0]); 865 pg_result = &(pkt_dev->result[0]);
858 866
859 if (count < 1) { 867 if (count < 1) {
860 pr_warning("wrong command format\n"); 868 pr_warn("wrong command format\n");
861 return -EINVAL; 869 return -EINVAL;
862 } 870 }
863 871
864 max = count; 872 max = count;
865 tmp = count_trail_chars(user_buffer, max); 873 tmp = count_trail_chars(user_buffer, max);
866 if (tmp < 0) { 874 if (tmp < 0) {
867 pr_warning("illegal format\n"); 875 pr_warn("illegal format\n");
868 return tmp; 876 return tmp;
869 } 877 }
870 i = tmp; 878 i = tmp;
@@ -1120,6 +1128,16 @@ static ssize_t pktgen_if_write(struct file *file,
1120 pkt_dev->dst_mac_count); 1128 pkt_dev->dst_mac_count);
1121 return count; 1129 return count;
1122 } 1130 }
1131 if (!strcmp(name, "burst")) {
1132 len = num_arg(&user_buffer[i], 10, &value);
1133 if (len < 0)
1134 return len;
1135
1136 i += len;
1137 pkt_dev->burst = value < 1 ? 1 : value;
1138 sprintf(pg_result, "OK: burst=%d", pkt_dev->burst);
1139 return count;
1140 }
1123 if (!strcmp(name, "node")) { 1141 if (!strcmp(name, "node")) {
1124 len = num_arg(&user_buffer[i], 10, &value); 1142 len = num_arg(&user_buffer[i], 10, &value);
1125 if (len < 0) 1143 if (len < 0)
@@ -1243,6 +1261,9 @@ static ssize_t pktgen_if_write(struct file *file,
1243 else if (strcmp(f, "!UDPCSUM") == 0) 1261 else if (strcmp(f, "!UDPCSUM") == 0)
1244 pkt_dev->flags &= ~F_UDPCSUM; 1262 pkt_dev->flags &= ~F_UDPCSUM;
1245 1263
1264 else if (strcmp(f, "NO_TIMESTAMP") == 0)
1265 pkt_dev->flags |= F_NO_TIMESTAMP;
1266
1246 else { 1267 else {
1247 sprintf(pg_result, 1268 sprintf(pg_result,
1248 "Flag -:%s:- unknown\nAvailable flags, (prepend ! to un-set flag):\n%s", 1269 "Flag -:%s:- unknown\nAvailable flags, (prepend ! to un-set flag):\n%s",
@@ -1251,6 +1272,7 @@ static ssize_t pktgen_if_write(struct file *file,
1251 "MACSRC_RND, MACDST_RND, TXSIZE_RND, IPV6, " 1272 "MACSRC_RND, MACDST_RND, TXSIZE_RND, IPV6, "
1252 "MPLS_RND, VID_RND, SVID_RND, FLOW_SEQ, " 1273 "MPLS_RND, VID_RND, SVID_RND, FLOW_SEQ, "
1253 "QUEUE_MAP_RND, QUEUE_MAP_CPU, UDPCSUM, " 1274 "QUEUE_MAP_RND, QUEUE_MAP_CPU, UDPCSUM, "
1275 "NO_TIMESTAMP, "
1254#ifdef CONFIG_XFRM 1276#ifdef CONFIG_XFRM
1255 "IPSEC, " 1277 "IPSEC, "
1256#endif 1278#endif
@@ -2048,15 +2070,15 @@ static void pktgen_setup_inject(struct pktgen_dev *pkt_dev)
2048 ntxq = pkt_dev->odev->real_num_tx_queues; 2070 ntxq = pkt_dev->odev->real_num_tx_queues;
2049 2071
2050 if (ntxq <= pkt_dev->queue_map_min) { 2072 if (ntxq <= pkt_dev->queue_map_min) {
2051 pr_warning("WARNING: Requested queue_map_min (zero-based) (%d) exceeds valid range [0 - %d] for (%d) queues on %s, resetting\n", 2073 pr_warn("WARNING: Requested queue_map_min (zero-based) (%d) exceeds valid range [0 - %d] for (%d) queues on %s, resetting\n",
2052 pkt_dev->queue_map_min, (ntxq ?: 1) - 1, ntxq, 2074 pkt_dev->queue_map_min, (ntxq ?: 1) - 1, ntxq,
2053 pkt_dev->odevname); 2075 pkt_dev->odevname);
2054 pkt_dev->queue_map_min = (ntxq ?: 1) - 1; 2076 pkt_dev->queue_map_min = (ntxq ?: 1) - 1;
2055 } 2077 }
2056 if (pkt_dev->queue_map_max >= ntxq) { 2078 if (pkt_dev->queue_map_max >= ntxq) {
2057 pr_warning("WARNING: Requested queue_map_max (zero-based) (%d) exceeds valid range [0 - %d] for (%d) queues on %s, resetting\n", 2079 pr_warn("WARNING: Requested queue_map_max (zero-based) (%d) exceeds valid range [0 - %d] for (%d) queues on %s, resetting\n",
2058 pkt_dev->queue_map_max, (ntxq ?: 1) - 1, ntxq, 2080 pkt_dev->queue_map_max, (ntxq ?: 1) - 1, ntxq,
2059 pkt_dev->odevname); 2081 pkt_dev->odevname);
2060 pkt_dev->queue_map_max = (ntxq ?: 1) - 1; 2082 pkt_dev->queue_map_max = (ntxq ?: 1) - 1;
2061 } 2083 }
2062 2084
@@ -2685,9 +2707,14 @@ static void pktgen_finalize_skb(struct pktgen_dev *pkt_dev, struct sk_buff *skb,
2685 pgh->pgh_magic = htonl(PKTGEN_MAGIC); 2707 pgh->pgh_magic = htonl(PKTGEN_MAGIC);
2686 pgh->seq_num = htonl(pkt_dev->seq_num); 2708 pgh->seq_num = htonl(pkt_dev->seq_num);
2687 2709
2688 do_gettimeofday(&timestamp); 2710 if (pkt_dev->flags & F_NO_TIMESTAMP) {
2689 pgh->tv_sec = htonl(timestamp.tv_sec); 2711 pgh->tv_sec = 0;
2690 pgh->tv_usec = htonl(timestamp.tv_usec); 2712 pgh->tv_usec = 0;
2713 } else {
2714 do_gettimeofday(&timestamp);
2715 pgh->tv_sec = htonl(timestamp.tv_sec);
2716 pgh->tv_usec = htonl(timestamp.tv_usec);
2717 }
2691} 2718}
2692 2719
2693static struct sk_buff *pktgen_alloc_skb(struct net_device *dev, 2720static struct sk_buff *pktgen_alloc_skb(struct net_device *dev,
@@ -3160,8 +3187,8 @@ static int pktgen_stop_device(struct pktgen_dev *pkt_dev)
3160 int nr_frags = pkt_dev->skb ? skb_shinfo(pkt_dev->skb)->nr_frags : -1; 3187 int nr_frags = pkt_dev->skb ? skb_shinfo(pkt_dev->skb)->nr_frags : -1;
3161 3188
3162 if (!pkt_dev->running) { 3189 if (!pkt_dev->running) {
3163 pr_warning("interface: %s is already stopped\n", 3190 pr_warn("interface: %s is already stopped\n",
3164 pkt_dev->odevname); 3191 pkt_dev->odevname);
3165 return -EINVAL; 3192 return -EINVAL;
3166 } 3193 }
3167 3194
@@ -3284,11 +3311,9 @@ static void pktgen_wait_for_skb(struct pktgen_dev *pkt_dev)
3284 3311
3285static void pktgen_xmit(struct pktgen_dev *pkt_dev) 3312static void pktgen_xmit(struct pktgen_dev *pkt_dev)
3286{ 3313{
3314 unsigned int burst = ACCESS_ONCE(pkt_dev->burst);
3287 struct net_device *odev = pkt_dev->odev; 3315 struct net_device *odev = pkt_dev->odev;
3288 netdev_tx_t (*xmit)(struct sk_buff *, struct net_device *)
3289 = odev->netdev_ops->ndo_start_xmit;
3290 struct netdev_queue *txq; 3316 struct netdev_queue *txq;
3291 u16 queue_map;
3292 int ret; 3317 int ret;
3293 3318
3294 /* If device is offline, then don't send */ 3319 /* If device is offline, then don't send */
@@ -3326,8 +3351,7 @@ static void pktgen_xmit(struct pktgen_dev *pkt_dev)
3326 if (pkt_dev->delay && pkt_dev->last_ok) 3351 if (pkt_dev->delay && pkt_dev->last_ok)
3327 spin(pkt_dev, pkt_dev->next_tx); 3352 spin(pkt_dev, pkt_dev->next_tx);
3328 3353
3329 queue_map = skb_get_queue_mapping(pkt_dev->skb); 3354 txq = skb_get_tx_queue(odev, pkt_dev->skb);
3330 txq = netdev_get_tx_queue(odev, queue_map);
3331 3355
3332 local_bh_disable(); 3356 local_bh_disable();
3333 3357
@@ -3338,16 +3362,19 @@ static void pktgen_xmit(struct pktgen_dev *pkt_dev)
3338 pkt_dev->last_ok = 0; 3362 pkt_dev->last_ok = 0;
3339 goto unlock; 3363 goto unlock;
3340 } 3364 }
3341 atomic_inc(&(pkt_dev->skb->users)); 3365 atomic_add(burst, &pkt_dev->skb->users);
3342 ret = (*xmit)(pkt_dev->skb, odev); 3366
3367xmit_more:
3368 ret = netdev_start_xmit(pkt_dev->skb, odev, txq, --burst > 0);
3343 3369
3344 switch (ret) { 3370 switch (ret) {
3345 case NETDEV_TX_OK: 3371 case NETDEV_TX_OK:
3346 txq_trans_update(txq);
3347 pkt_dev->last_ok = 1; 3372 pkt_dev->last_ok = 1;
3348 pkt_dev->sofar++; 3373 pkt_dev->sofar++;
3349 pkt_dev->seq_num++; 3374 pkt_dev->seq_num++;
3350 pkt_dev->tx_bytes += pkt_dev->last_pkt_size; 3375 pkt_dev->tx_bytes += pkt_dev->last_pkt_size;
3376 if (burst > 0 && !netif_xmit_frozen_or_drv_stopped(txq))
3377 goto xmit_more;
3351 break; 3378 break;
3352 case NET_XMIT_DROP: 3379 case NET_XMIT_DROP:
3353 case NET_XMIT_CN: 3380 case NET_XMIT_CN:
@@ -3366,6 +3393,8 @@ static void pktgen_xmit(struct pktgen_dev *pkt_dev)
3366 atomic_dec(&(pkt_dev->skb->users)); 3393 atomic_dec(&(pkt_dev->skb->users));
3367 pkt_dev->last_ok = 0; 3394 pkt_dev->last_ok = 0;
3368 } 3395 }
3396 if (unlikely(burst))
3397 atomic_sub(burst, &pkt_dev->skb->users);
3369unlock: 3398unlock:
3370 HARD_TX_UNLOCK(odev, txq); 3399 HARD_TX_UNLOCK(odev, txq);
3371 3400
@@ -3564,6 +3593,7 @@ static int pktgen_add_device(struct pktgen_thread *t, const char *ifname)
3564 pkt_dev->svlan_p = 0; 3593 pkt_dev->svlan_p = 0;
3565 pkt_dev->svlan_cfi = 0; 3594 pkt_dev->svlan_cfi = 0;
3566 pkt_dev->svlan_id = 0xffff; 3595 pkt_dev->svlan_id = 0xffff;
3596 pkt_dev->burst = 1;
3567 pkt_dev->node = -1; 3597 pkt_dev->node = -1;
3568 3598
3569 err = pktgen_setup_dev(t->net, pkt_dev, ifname); 3599 err = pktgen_setup_dev(t->net, pkt_dev, ifname);
@@ -3684,7 +3714,7 @@ static int pktgen_remove_device(struct pktgen_thread *t,
3684 pr_debug("remove_device pkt_dev=%p\n", pkt_dev); 3714 pr_debug("remove_device pkt_dev=%p\n", pkt_dev);
3685 3715
3686 if (pkt_dev->running) { 3716 if (pkt_dev->running) {
3687 pr_warning("WARNING: trying to remove a running interface, stopping it now\n"); 3717 pr_warn("WARNING: trying to remove a running interface, stopping it now\n");
3688 pktgen_stop_device(pkt_dev); 3718 pktgen_stop_device(pkt_dev);
3689 } 3719 }
3690 3720
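The new burst parameter is set through pktgen's existing procfs interface, one keyword per write, exactly like the other per-device knobs parsed above. A minimal userspace sketch; it assumes the pktgen module is loaded and that eth0 has already been added to one of the kpktgend_* threads:

#include <stdio.h>

int main(void)
{
	/* Per-device pktgen control file; the device name is an example. */
	FILE *f = fopen("/proc/net/pktgen/eth0", "w");

	if (!f) {
		perror("open pktgen device file");
		return 1;
	}

	/* Queue 8 packets per doorbell; pktgen passes the "more" hint to
	 * netdev_start_xmit() for all but the last one. */
	fprintf(f, "burst 8\n");

	fclose(f);
	return 0;
}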
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index f0493e3b7471..a6882686ca3a 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -1481,9 +1481,12 @@ static int do_set_master(struct net_device *dev, int ifindex)
1481 return 0; 1481 return 0;
1482} 1482}
1483 1483
1484#define DO_SETLINK_MODIFIED 0x01
1485/* notify flag means notify + modified. */
1486#define DO_SETLINK_NOTIFY 0x03
1484static int do_setlink(const struct sk_buff *skb, 1487static int do_setlink(const struct sk_buff *skb,
1485 struct net_device *dev, struct ifinfomsg *ifm, 1488 struct net_device *dev, struct ifinfomsg *ifm,
1486 struct nlattr **tb, char *ifname, int modified) 1489 struct nlattr **tb, char *ifname, int status)
1487{ 1490{
1488 const struct net_device_ops *ops = dev->netdev_ops; 1491 const struct net_device_ops *ops = dev->netdev_ops;
1489 int err; 1492 int err;
@@ -1502,7 +1505,7 @@ static int do_setlink(const struct sk_buff *skb,
1502 put_net(net); 1505 put_net(net);
1503 if (err) 1506 if (err)
1504 goto errout; 1507 goto errout;
1505 modified = 1; 1508 status |= DO_SETLINK_MODIFIED;
1506 } 1509 }
1507 1510
1508 if (tb[IFLA_MAP]) { 1511 if (tb[IFLA_MAP]) {
@@ -1531,7 +1534,7 @@ static int do_setlink(const struct sk_buff *skb,
1531 if (err < 0) 1534 if (err < 0)
1532 goto errout; 1535 goto errout;
1533 1536
1534 modified = 1; 1537 status |= DO_SETLINK_NOTIFY;
1535 } 1538 }
1536 1539
1537 if (tb[IFLA_ADDRESS]) { 1540 if (tb[IFLA_ADDRESS]) {
@@ -1551,19 +1554,19 @@ static int do_setlink(const struct sk_buff *skb,
1551 kfree(sa); 1554 kfree(sa);
1552 if (err) 1555 if (err)
1553 goto errout; 1556 goto errout;
1554 modified = 1; 1557 status |= DO_SETLINK_MODIFIED;
1555 } 1558 }
1556 1559
1557 if (tb[IFLA_MTU]) { 1560 if (tb[IFLA_MTU]) {
1558 err = dev_set_mtu(dev, nla_get_u32(tb[IFLA_MTU])); 1561 err = dev_set_mtu(dev, nla_get_u32(tb[IFLA_MTU]));
1559 if (err < 0) 1562 if (err < 0)
1560 goto errout; 1563 goto errout;
1561 modified = 1; 1564 status |= DO_SETLINK_MODIFIED;
1562 } 1565 }
1563 1566
1564 if (tb[IFLA_GROUP]) { 1567 if (tb[IFLA_GROUP]) {
1565 dev_set_group(dev, nla_get_u32(tb[IFLA_GROUP])); 1568 dev_set_group(dev, nla_get_u32(tb[IFLA_GROUP]));
1566 modified = 1; 1569 status |= DO_SETLINK_NOTIFY;
1567 } 1570 }
1568 1571
1569 /* 1572 /*
@@ -1575,7 +1578,7 @@ static int do_setlink(const struct sk_buff *skb,
1575 err = dev_change_name(dev, ifname); 1578 err = dev_change_name(dev, ifname);
1576 if (err < 0) 1579 if (err < 0)
1577 goto errout; 1580 goto errout;
1578 modified = 1; 1581 status |= DO_SETLINK_MODIFIED;
1579 } 1582 }
1580 1583
1581 if (tb[IFLA_IFALIAS]) { 1584 if (tb[IFLA_IFALIAS]) {
@@ -1583,7 +1586,7 @@ static int do_setlink(const struct sk_buff *skb,
1583 nla_len(tb[IFLA_IFALIAS])); 1586 nla_len(tb[IFLA_IFALIAS]));
1584 if (err < 0) 1587 if (err < 0)
1585 goto errout; 1588 goto errout;
1586 modified = 1; 1589 status |= DO_SETLINK_NOTIFY;
1587 } 1590 }
1588 1591
1589 if (tb[IFLA_BROADCAST]) { 1592 if (tb[IFLA_BROADCAST]) {
@@ -1601,25 +1604,35 @@ static int do_setlink(const struct sk_buff *skb,
1601 err = do_set_master(dev, nla_get_u32(tb[IFLA_MASTER])); 1604 err = do_set_master(dev, nla_get_u32(tb[IFLA_MASTER]));
1602 if (err) 1605 if (err)
1603 goto errout; 1606 goto errout;
1604 modified = 1; 1607 status |= DO_SETLINK_MODIFIED;
1605 } 1608 }
1606 1609
1607 if (tb[IFLA_CARRIER]) { 1610 if (tb[IFLA_CARRIER]) {
1608 err = dev_change_carrier(dev, nla_get_u8(tb[IFLA_CARRIER])); 1611 err = dev_change_carrier(dev, nla_get_u8(tb[IFLA_CARRIER]));
1609 if (err) 1612 if (err)
1610 goto errout; 1613 goto errout;
1611 modified = 1; 1614 status |= DO_SETLINK_MODIFIED;
1612 } 1615 }
1613 1616
1614 if (tb[IFLA_TXQLEN]) 1617 if (tb[IFLA_TXQLEN]) {
1615 dev->tx_queue_len = nla_get_u32(tb[IFLA_TXQLEN]); 1618 unsigned long value = nla_get_u32(tb[IFLA_TXQLEN]);
1619
1620 if (dev->tx_queue_len ^ value)
1621 status |= DO_SETLINK_NOTIFY;
1622
1623 dev->tx_queue_len = value;
1624 }
1616 1625
1617 if (tb[IFLA_OPERSTATE]) 1626 if (tb[IFLA_OPERSTATE])
1618 set_operstate(dev, nla_get_u8(tb[IFLA_OPERSTATE])); 1627 set_operstate(dev, nla_get_u8(tb[IFLA_OPERSTATE]));
1619 1628
1620 if (tb[IFLA_LINKMODE]) { 1629 if (tb[IFLA_LINKMODE]) {
1630 unsigned char value = nla_get_u8(tb[IFLA_LINKMODE]);
1631
1621 write_lock_bh(&dev_base_lock); 1632 write_lock_bh(&dev_base_lock);
1622 dev->link_mode = nla_get_u8(tb[IFLA_LINKMODE]); 1633 if (dev->link_mode ^ value)
1634 status |= DO_SETLINK_NOTIFY;
1635 dev->link_mode = value;
1623 write_unlock_bh(&dev_base_lock); 1636 write_unlock_bh(&dev_base_lock);
1624 } 1637 }
1625 1638
@@ -1634,7 +1647,7 @@ static int do_setlink(const struct sk_buff *skb,
1634 err = do_setvfinfo(dev, attr); 1647 err = do_setvfinfo(dev, attr);
1635 if (err < 0) 1648 if (err < 0)
1636 goto errout; 1649 goto errout;
1637 modified = 1; 1650 status |= DO_SETLINK_NOTIFY;
1638 } 1651 }
1639 } 1652 }
1640 err = 0; 1653 err = 0;
@@ -1664,7 +1677,7 @@ static int do_setlink(const struct sk_buff *skb,
1664 err = ops->ndo_set_vf_port(dev, vf, port); 1677 err = ops->ndo_set_vf_port(dev, vf, port);
1665 if (err < 0) 1678 if (err < 0)
1666 goto errout; 1679 goto errout;
1667 modified = 1; 1680 status |= DO_SETLINK_NOTIFY;
1668 } 1681 }
1669 } 1682 }
1670 err = 0; 1683 err = 0;
@@ -1682,7 +1695,7 @@ static int do_setlink(const struct sk_buff *skb,
1682 err = ops->ndo_set_vf_port(dev, PORT_SELF_VF, port); 1695 err = ops->ndo_set_vf_port(dev, PORT_SELF_VF, port);
1683 if (err < 0) 1696 if (err < 0)
1684 goto errout; 1697 goto errout;
1685 modified = 1; 1698 status |= DO_SETLINK_NOTIFY;
1686 } 1699 }
1687 1700
1688 if (tb[IFLA_AF_SPEC]) { 1701 if (tb[IFLA_AF_SPEC]) {
@@ -1699,15 +1712,20 @@ static int do_setlink(const struct sk_buff *skb,
1699 if (err < 0) 1712 if (err < 0)
1700 goto errout; 1713 goto errout;
1701 1714
1702 modified = 1; 1715 status |= DO_SETLINK_NOTIFY;
1703 } 1716 }
1704 } 1717 }
1705 err = 0; 1718 err = 0;
1706 1719
1707errout: 1720errout:
1708 if (err < 0 && modified) 1721 if (status & DO_SETLINK_MODIFIED) {
1709 net_warn_ratelimited("A link change request failed with some changes committed already. Interface %s may have been left with an inconsistent configuration, please check.\n", 1722 if (status & DO_SETLINK_NOTIFY)
1710 dev->name); 1723 netdev_state_change(dev);
1724
1725 if (err < 0)
1726 net_warn_ratelimited("A link change request failed with some changes committed already. Interface %s may have been left with an inconsistent configuration, please check.\n",
1727 dev->name);
1728 }
1711 1729
1712 return err; 1730 return err;
1713} 1731}
@@ -1989,7 +2007,7 @@ replay:
1989 } 2007 }
1990 2008
1991 if (dev) { 2009 if (dev) {
1992 int modified = 0; 2010 int status = 0;
1993 2011
1994 if (nlh->nlmsg_flags & NLM_F_EXCL) 2012 if (nlh->nlmsg_flags & NLM_F_EXCL)
1995 return -EEXIST; 2013 return -EEXIST;
@@ -2004,7 +2022,7 @@ replay:
2004 err = ops->changelink(dev, tb, data); 2022 err = ops->changelink(dev, tb, data);
2005 if (err < 0) 2023 if (err < 0)
2006 return err; 2024 return err;
2007 modified = 1; 2025 status |= DO_SETLINK_NOTIFY;
2008 } 2026 }
2009 2027
2010 if (linkinfo[IFLA_INFO_SLAVE_DATA]) { 2028 if (linkinfo[IFLA_INFO_SLAVE_DATA]) {
@@ -2015,10 +2033,10 @@ replay:
2015 tb, slave_data); 2033 tb, slave_data);
2016 if (err < 0) 2034 if (err < 0)
2017 return err; 2035 return err;
2018 modified = 1; 2036 status |= DO_SETLINK_NOTIFY;
2019 } 2037 }
2020 2038
2021 return do_setlink(skb, dev, ifm, tb, ifname, modified); 2039 return do_setlink(skb, dev, ifm, tb, ifname, status);
2022 } 2040 }
2023 2041
2024 if (!(nlh->nlmsg_flags & NLM_F_CREATE)) { 2042 if (!(nlh->nlmsg_flags & NLM_F_CREATE)) {
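
The rtnetlink change above splits the old `modified` flag into a bitmask: DO_SETLINK_MODIFIED records that something was committed, while DO_SETLINK_NOTIFY (0x03, i.e. the modified bit plus a notify bit) additionally requests a netdev_state_change() notification. A small standalone sketch of how such an inclusive flag pair behaves; the macro names below are illustrative, not the kernel's:

/*
 * NOTIFY is MODIFIED plus an extra bit, so one mask tells whether anything
 * was committed and a stricter test tells whether userspace should also be
 * notified.
 */
#include <stdio.h>

#define SETLINK_MODIFIED	0x01
#define SETLINK_NOTIFY		0x03	/* notify implies modified */

int main(void)
{
	int status = 0;

	status |= SETLINK_MODIFIED;	/* e.g. the MTU changed */
	status |= SETLINK_NOTIFY;	/* e.g. the ifalias changed */

	if (status & SETLINK_MODIFIED)
		puts("some change was committed");
	if ((status & SETLINK_NOTIFY) == SETLINK_NOTIFY)
		puts("a state-change notification is wanted");
	return 0;
}
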
diff --git a/net/core/secure_seq.c b/net/core/secure_seq.c
index ba71212f0251..51dd3193a33e 100644
--- a/net/core/secure_seq.c
+++ b/net/core/secure_seq.c
@@ -35,7 +35,7 @@ static u32 seq_scale(u32 seq)
35 * overlaps less than one time per MSL (2 minutes). 35 * overlaps less than one time per MSL (2 minutes).
36 * Choosing a clock of 64 ns period is OK. (period of 274 s) 36 * Choosing a clock of 64 ns period is OK. (period of 274 s)
37 */ 37 */
38 return seq + (ktime_to_ns(ktime_get_real()) >> 6); 38 return seq + (ktime_get_real_ns() >> 6);
39} 39}
40#endif 40#endif
41 41
@@ -135,7 +135,7 @@ u64 secure_dccp_sequence_number(__be32 saddr, __be32 daddr,
135 md5_transform(hash, net_secret); 135 md5_transform(hash, net_secret);
136 136
137 seq = hash[0] | (((u64)hash[1]) << 32); 137 seq = hash[0] | (((u64)hash[1]) << 32);
138 seq += ktime_to_ns(ktime_get_real()); 138 seq += ktime_get_real_ns();
139 seq &= (1ull << 48) - 1; 139 seq &= (1ull << 48) - 1;
140 140
141 return seq; 141 return seq;
@@ -163,7 +163,7 @@ u64 secure_dccpv6_sequence_number(__be32 *saddr, __be32 *daddr,
163 md5_transform(hash, secret); 163 md5_transform(hash, secret);
164 164
165 seq = hash[0] | (((u64)hash[1]) << 32); 165 seq = hash[0] | (((u64)hash[1]) << 32);
166 seq += ktime_to_ns(ktime_get_real()); 166 seq += ktime_get_real_ns();
167 seq &= (1ull << 48) - 1; 167 seq &= (1ull << 48) - 1;
168 168
169 return seq; 169 return seq;
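
The secure_seq hunks simply switch to ktime_get_real_ns(), which returns nanoseconds directly instead of converting an intermediate ktime value. The arithmetic behind the 64 ns clock comment can be checked in userspace; clock_gettime() stands in for the kernel helper in this sketch:

/*
 * A nanosecond timestamp shifted right by 6 ticks every 64 ns, so its low
 * 32 bits wrap roughly every 275 seconds, well above the 2 minute MSL.
 */
#define _POSIX_C_SOURCE 199309L
#include <stdint.h>
#include <stdio.h>
#include <time.h>

static uint64_t real_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_REALTIME, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ull + ts.tv_nsec;
}

int main(void)
{
	uint32_t offset = (uint32_t)(real_ns() >> 6);

	printf("ISN offset now: %u\n", offset);
	printf("wrap period: %.1f s\n", (double)(1ull << 32) * 64e-9);
	return 0;
}
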
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 8d289697cc7a..7b3df0d518ab 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -257,16 +257,16 @@ struct sk_buff *__alloc_skb(unsigned int size, gfp_t gfp_mask,
257 kmemcheck_annotate_variable(shinfo->destructor_arg); 257 kmemcheck_annotate_variable(shinfo->destructor_arg);
258 258
259 if (flags & SKB_ALLOC_FCLONE) { 259 if (flags & SKB_ALLOC_FCLONE) {
260 struct sk_buff *child = skb + 1; 260 struct sk_buff_fclones *fclones;
261 atomic_t *fclone_ref = (atomic_t *) (child + 1);
262 261
263 kmemcheck_annotate_bitfield(child, flags1); 262 fclones = container_of(skb, struct sk_buff_fclones, skb1);
264 kmemcheck_annotate_bitfield(child, flags2); 263
264 kmemcheck_annotate_bitfield(&fclones->skb2, flags1);
265 skb->fclone = SKB_FCLONE_ORIG; 265 skb->fclone = SKB_FCLONE_ORIG;
266 atomic_set(fclone_ref, 1); 266 atomic_set(&fclones->fclone_ref, 1);
267 267
268 child->fclone = SKB_FCLONE_UNAVAILABLE; 268 fclones->skb2.fclone = SKB_FCLONE_FREE;
269 child->pfmemalloc = pfmemalloc; 269 fclones->skb2.pfmemalloc = pfmemalloc;
270 } 270 }
271out: 271out:
272 return skb; 272 return skb;
@@ -491,32 +491,33 @@ static void skb_free_head(struct sk_buff *skb)
491 491
492static void skb_release_data(struct sk_buff *skb) 492static void skb_release_data(struct sk_buff *skb)
493{ 493{
494 if (!skb->cloned || 494 struct skb_shared_info *shinfo = skb_shinfo(skb);
495 !atomic_sub_return(skb->nohdr ? (1 << SKB_DATAREF_SHIFT) + 1 : 1, 495 int i;
496 &skb_shinfo(skb)->dataref)) {
497 if (skb_shinfo(skb)->nr_frags) {
498 int i;
499 for (i = 0; i < skb_shinfo(skb)->nr_frags; i++)
500 skb_frag_unref(skb, i);
501 }
502 496
503 /* 497 if (skb->cloned &&
504 * If skb buf is from userspace, we need to notify the caller 498 atomic_sub_return(skb->nohdr ? (1 << SKB_DATAREF_SHIFT) + 1 : 1,
505 * the lower device DMA has done; 499 &shinfo->dataref))
506 */ 500 return;
507 if (skb_shinfo(skb)->tx_flags & SKBTX_DEV_ZEROCOPY) {
508 struct ubuf_info *uarg;
509 501
510 uarg = skb_shinfo(skb)->destructor_arg; 502 for (i = 0; i < shinfo->nr_frags; i++)
511 if (uarg->callback) 503 __skb_frag_unref(&shinfo->frags[i]);
512 uarg->callback(uarg, true);
513 }
514 504
515 if (skb_has_frag_list(skb)) 505 /*
516 skb_drop_fraglist(skb); 506 * If skb buf is from userspace, we need to notify the caller
507 * the lower device DMA has done;
508 */
509 if (shinfo->tx_flags & SKBTX_DEV_ZEROCOPY) {
510 struct ubuf_info *uarg;
517 511
518 skb_free_head(skb); 512 uarg = shinfo->destructor_arg;
513 if (uarg->callback)
514 uarg->callback(uarg, true);
519 } 515 }
516
517 if (shinfo->frag_list)
518 kfree_skb_list(shinfo->frag_list);
519
520 skb_free_head(skb);
520} 521}
521 522
522/* 523/*
@@ -524,8 +525,7 @@ static void skb_release_data(struct sk_buff *skb)
524 */ 525 */
525static void kfree_skbmem(struct sk_buff *skb) 526static void kfree_skbmem(struct sk_buff *skb)
526{ 527{
527 struct sk_buff *other; 528 struct sk_buff_fclones *fclones;
528 atomic_t *fclone_ref;
529 529
530 switch (skb->fclone) { 530 switch (skb->fclone) {
531 case SKB_FCLONE_UNAVAILABLE: 531 case SKB_FCLONE_UNAVAILABLE:
@@ -533,22 +533,28 @@ static void kfree_skbmem(struct sk_buff *skb)
533 break; 533 break;
534 534
535 case SKB_FCLONE_ORIG: 535 case SKB_FCLONE_ORIG:
536 fclone_ref = (atomic_t *) (skb + 2); 536 fclones = container_of(skb, struct sk_buff_fclones, skb1);
537 if (atomic_dec_and_test(fclone_ref)) 537 if (atomic_dec_and_test(&fclones->fclone_ref))
538 kmem_cache_free(skbuff_fclone_cache, skb); 538 kmem_cache_free(skbuff_fclone_cache, fclones);
539 break; 539 break;
540 540
541 case SKB_FCLONE_CLONE: 541 case SKB_FCLONE_CLONE:
542 fclone_ref = (atomic_t *) (skb + 1); 542 fclones = container_of(skb, struct sk_buff_fclones, skb2);
543 other = skb - 1;
544 543
545 /* The clone portion is available for 544 /* Warning : We must perform the atomic_dec_and_test() before
546 * fast-cloning again. 545 * setting skb->fclone back to SKB_FCLONE_FREE, otherwise
546 * skb_clone() could set clone_ref to 2 before our decrement.
547 * Anyway, if we are going to free the structure, no need to
548 * rewrite skb->fclone.
547 */ 549 */
548 skb->fclone = SKB_FCLONE_UNAVAILABLE; 550 if (atomic_dec_and_test(&fclones->fclone_ref)) {
549 551 kmem_cache_free(skbuff_fclone_cache, fclones);
550 if (atomic_dec_and_test(fclone_ref)) 552 } else {
551 kmem_cache_free(skbuff_fclone_cache, other); 553 /* The clone portion is available for
554 * fast-cloning again.
555 */
556 skb->fclone = SKB_FCLONE_FREE;
557 }
552 break; 558 break;
553 } 559 }
554} 560}
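
Several of the skbuff hunks replace the old `skb + 1` / `skb + 2` pointer arithmetic with an explicit struct sk_buff_fclones holding the original buffer, its fast-clone companion and the shared refcount, all recovered through container_of(). A userspace sketch of that layout trick, with struct buf standing in for struct sk_buff:

/*
 * The original and its fast-clone companion live in one allocation together
 * with the shared refcount; container_of() recovers the enclosing structure
 * from a pointer to either member.
 */
#include <stddef.h>
#include <stdio.h>

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct buf { int id; };

struct buf_fclones {
	struct buf skb1;
	struct buf skb2;
	int fclone_ref;
};

int main(void)
{
	struct buf_fclones fc = { .skb1 = { 1 }, .skb2 = { 2 }, .fclone_ref = 1 };
	struct buf *clone = &fc.skb2;

	/* from the clone, find the shared refcount without any extra pointer */
	struct buf_fclones *owner = container_of(clone, struct buf_fclones, skb2);

	printf("ref seen via clone: %d\n", owner->fclone_ref);
	return 0;
}
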
@@ -566,7 +572,7 @@ static void skb_release_head_state(struct sk_buff *skb)
566#if IS_ENABLED(CONFIG_NF_CONNTRACK) 572#if IS_ENABLED(CONFIG_NF_CONNTRACK)
567 nf_conntrack_put(skb->nfct); 573 nf_conntrack_put(skb->nfct);
568#endif 574#endif
569#ifdef CONFIG_BRIDGE_NETFILTER 575#if IS_ENABLED(CONFIG_BRIDGE_NETFILTER)
570 nf_bridge_put(skb->nf_bridge); 576 nf_bridge_put(skb->nf_bridge);
571#endif 577#endif
572/* XXX: IS this still necessary? - JHS */ 578/* XXX: IS this still necessary? - JHS */
@@ -674,57 +680,61 @@ void consume_skb(struct sk_buff *skb)
674} 680}
675EXPORT_SYMBOL(consume_skb); 681EXPORT_SYMBOL(consume_skb);
676 682
683/* Make sure a field is enclosed inside headers_start/headers_end section */
684#define CHECK_SKB_FIELD(field) \
685 BUILD_BUG_ON(offsetof(struct sk_buff, field) < \
686 offsetof(struct sk_buff, headers_start)); \
687 BUILD_BUG_ON(offsetof(struct sk_buff, field) > \
688 offsetof(struct sk_buff, headers_end)); \
689
677static void __copy_skb_header(struct sk_buff *new, const struct sk_buff *old) 690static void __copy_skb_header(struct sk_buff *new, const struct sk_buff *old)
678{ 691{
679 new->tstamp = old->tstamp; 692 new->tstamp = old->tstamp;
693 /* We do not copy old->sk */
680 new->dev = old->dev; 694 new->dev = old->dev;
681 new->transport_header = old->transport_header; 695 memcpy(new->cb, old->cb, sizeof(old->cb));
682 new->network_header = old->network_header;
683 new->mac_header = old->mac_header;
684 new->inner_protocol = old->inner_protocol;
685 new->inner_transport_header = old->inner_transport_header;
686 new->inner_network_header = old->inner_network_header;
687 new->inner_mac_header = old->inner_mac_header;
688 skb_dst_copy(new, old); 696 skb_dst_copy(new, old);
689 skb_copy_hash(new, old);
690 new->ooo_okay = old->ooo_okay;
691 new->no_fcs = old->no_fcs;
692 new->encapsulation = old->encapsulation;
693 new->encap_hdr_csum = old->encap_hdr_csum;
694 new->csum_valid = old->csum_valid;
695 new->csum_complete_sw = old->csum_complete_sw;
696#ifdef CONFIG_XFRM 697#ifdef CONFIG_XFRM
697 new->sp = secpath_get(old->sp); 698 new->sp = secpath_get(old->sp);
698#endif 699#endif
699 memcpy(new->cb, old->cb, sizeof(old->cb)); 700 __nf_copy(new, old, false);
700 new->csum = old->csum; 701
701 new->ignore_df = old->ignore_df; 702 /* Note : this field could be in headers_start/headers_end section
702 new->pkt_type = old->pkt_type; 703 * It is not yet because we do not want to have a 16 bit hole
703 new->ip_summed = old->ip_summed; 704 */
704 skb_copy_queue_mapping(new, old); 705 new->queue_mapping = old->queue_mapping;
705 new->priority = old->priority; 706
706#if IS_ENABLED(CONFIG_IP_VS) 707 memcpy(&new->headers_start, &old->headers_start,
707 new->ipvs_property = old->ipvs_property; 708 offsetof(struct sk_buff, headers_end) -
709 offsetof(struct sk_buff, headers_start));
710 CHECK_SKB_FIELD(protocol);
711 CHECK_SKB_FIELD(csum);
712 CHECK_SKB_FIELD(hash);
713 CHECK_SKB_FIELD(priority);
714 CHECK_SKB_FIELD(skb_iif);
715 CHECK_SKB_FIELD(vlan_proto);
716 CHECK_SKB_FIELD(vlan_tci);
717 CHECK_SKB_FIELD(transport_header);
718 CHECK_SKB_FIELD(network_header);
719 CHECK_SKB_FIELD(mac_header);
720 CHECK_SKB_FIELD(inner_protocol);
721 CHECK_SKB_FIELD(inner_transport_header);
722 CHECK_SKB_FIELD(inner_network_header);
723 CHECK_SKB_FIELD(inner_mac_header);
724 CHECK_SKB_FIELD(mark);
725#ifdef CONFIG_NETWORK_SECMARK
726 CHECK_SKB_FIELD(secmark);
727#endif
728#ifdef CONFIG_NET_RX_BUSY_POLL
729 CHECK_SKB_FIELD(napi_id);
708#endif 730#endif
709 new->pfmemalloc = old->pfmemalloc;
710 new->protocol = old->protocol;
711 new->mark = old->mark;
712 new->skb_iif = old->skb_iif;
713 __nf_copy(new, old);
714#ifdef CONFIG_NET_SCHED 731#ifdef CONFIG_NET_SCHED
715 new->tc_index = old->tc_index; 732 CHECK_SKB_FIELD(tc_index);
716#ifdef CONFIG_NET_CLS_ACT 733#ifdef CONFIG_NET_CLS_ACT
717 new->tc_verd = old->tc_verd; 734 CHECK_SKB_FIELD(tc_verd);
718#endif 735#endif
719#endif 736#endif
720 new->vlan_proto = old->vlan_proto;
721 new->vlan_tci = old->vlan_tci;
722 737
723 skb_copy_secmark(new, old);
724
725#ifdef CONFIG_NET_RX_BUSY_POLL
726 new->napi_id = old->napi_id;
727#endif
728} 738}
729 739
730/* 740/*
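
The rewritten __copy_skb_header() above copies everything between the new headers_start/headers_end markers with one memcpy() and uses CHECK_SKB_FIELD() build-time assertions to make sure each field really lives inside that span. A compilable userspace sketch of the same idea, with made-up field names and C11 _Static_assert in place of BUILD_BUG_ON (the zero-length marker arrays are a GNU extension, as in the kernel itself):

#include <stddef.h>
#include <stdio.h>
#include <string.h>

struct pkt {
	int refcount;			/* deliberately not copied */
	char headers_start[0];
	int protocol;
	int mark;
	int queue;
	char headers_end[0];
};

#define CHECK_PKT_FIELD(f) \
	_Static_assert(offsetof(struct pkt, f) >= offsetof(struct pkt, headers_start) && \
		       offsetof(struct pkt, f) <  offsetof(struct pkt, headers_end), \
		       #f " must sit between the markers")

CHECK_PKT_FIELD(protocol);
CHECK_PKT_FIELD(mark);
CHECK_PKT_FIELD(queue);

static void copy_headers(struct pkt *new, const struct pkt *old)
{
	/* one bulk copy instead of a long list of field assignments */
	memcpy(new->headers_start, old->headers_start,
	       offsetof(struct pkt, headers_end) - offsetof(struct pkt, headers_start));
}

int main(void)
{
	struct pkt a = { .refcount = 1, .protocol = 0x0800, .mark = 7, .queue = 3 };
	struct pkt b = { .refcount = 5 };

	copy_headers(&b, &a);
	printf("b: proto=%#x mark=%d queue=%d refcount=%d\n",
	       b.protocol, b.mark, b.queue, b.refcount);
	return 0;
}
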
@@ -855,17 +865,22 @@ EXPORT_SYMBOL_GPL(skb_copy_ubufs);
855 865
856struct sk_buff *skb_clone(struct sk_buff *skb, gfp_t gfp_mask) 866struct sk_buff *skb_clone(struct sk_buff *skb, gfp_t gfp_mask)
857{ 867{
858 struct sk_buff *n; 868 struct sk_buff_fclones *fclones = container_of(skb,
869 struct sk_buff_fclones,
870 skb1);
871 struct sk_buff *n = &fclones->skb2;
859 872
860 if (skb_orphan_frags(skb, gfp_mask)) 873 if (skb_orphan_frags(skb, gfp_mask))
861 return NULL; 874 return NULL;
862 875
863 n = skb + 1;
864 if (skb->fclone == SKB_FCLONE_ORIG && 876 if (skb->fclone == SKB_FCLONE_ORIG &&
865 n->fclone == SKB_FCLONE_UNAVAILABLE) { 877 n->fclone == SKB_FCLONE_FREE) {
866 atomic_t *fclone_ref = (atomic_t *) (n + 1);
867 n->fclone = SKB_FCLONE_CLONE; 878 n->fclone = SKB_FCLONE_CLONE;
868 atomic_inc(fclone_ref); 879 /* As our fastclone was free, clone_ref must be 1 at this point.
880 * We could use atomic_inc() here, but it is faster
881 * to set the final value.
882 */
883 atomic_set(&fclones->fclone_ref, 2);
869 } else { 884 } else {
870 if (skb_pfmemalloc(skb)) 885 if (skb_pfmemalloc(skb))
871 gfp_mask |= __GFP_MEMALLOC; 886 gfp_mask |= __GFP_MEMALLOC;
@@ -875,7 +890,6 @@ struct sk_buff *skb_clone(struct sk_buff *skb, gfp_t gfp_mask)
875 return NULL; 890 return NULL;
876 891
877 kmemcheck_annotate_bitfield(n, flags1); 892 kmemcheck_annotate_bitfield(n, flags1);
878 kmemcheck_annotate_bitfield(n, flags2);
879 n->fclone = SKB_FCLONE_UNAVAILABLE; 893 n->fclone = SKB_FCLONE_UNAVAILABLE;
880 } 894 }
881 895
@@ -3069,6 +3083,11 @@ perform_csum_check:
3069 } 3083 }
3070 } while ((offset += len) < head_skb->len); 3084 } while ((offset += len) < head_skb->len);
3071 3085
3086 /* Some callers want to get the end of the list.
3087 * Put it in segs->prev to avoid walking the list.
3088 * (see validate_xmit_skb_list() for example)
3089 */
3090 segs->prev = tail;
3072 return segs; 3091 return segs;
3073 3092
3074err: 3093err:
@@ -3182,7 +3201,7 @@ int skb_gro_receive(struct sk_buff **head, struct sk_buff *skb)
3182 skb_shinfo(nskb)->frag_list = p; 3201 skb_shinfo(nskb)->frag_list = p;
3183 skb_shinfo(nskb)->gso_size = pinfo->gso_size; 3202 skb_shinfo(nskb)->gso_size = pinfo->gso_size;
3184 pinfo->gso_size = 0; 3203 pinfo->gso_size = 0;
3185 skb_header_release(p); 3204 __skb_header_release(p);
3186 NAPI_GRO_CB(nskb)->last = p; 3205 NAPI_GRO_CB(nskb)->last = p;
3187 3206
3188 nskb->data_len += p->len; 3207 nskb->data_len += p->len;
@@ -3214,7 +3233,7 @@ merge:
3214 else 3233 else
3215 NAPI_GRO_CB(p)->last->next = skb; 3234 NAPI_GRO_CB(p)->last->next = skb;
3216 NAPI_GRO_CB(p)->last = skb; 3235 NAPI_GRO_CB(p)->last = skb;
3217 skb_header_release(skb); 3236 __skb_header_release(skb);
3218 lp = p; 3237 lp = p;
3219 3238
3220done: 3239done:
@@ -3230,7 +3249,6 @@ done:
3230 NAPI_GRO_CB(skb)->same_flow = 1; 3249 NAPI_GRO_CB(skb)->same_flow = 1;
3231 return 0; 3250 return 0;
3232} 3251}
3233EXPORT_SYMBOL_GPL(skb_gro_receive);
3234 3252
3235void __init skb_init(void) 3253void __init skb_init(void)
3236{ 3254{
@@ -3240,8 +3258,7 @@ void __init skb_init(void)
3240 SLAB_HWCACHE_ALIGN|SLAB_PANIC, 3258 SLAB_HWCACHE_ALIGN|SLAB_PANIC,
3241 NULL); 3259 NULL);
3242 skbuff_fclone_cache = kmem_cache_create("skbuff_fclone_cache", 3260 skbuff_fclone_cache = kmem_cache_create("skbuff_fclone_cache",
3243 (2*sizeof(struct sk_buff)) + 3261 sizeof(struct sk_buff_fclones),
3244 sizeof(atomic_t),
3245 0, 3262 0,
3246 SLAB_HWCACHE_ALIGN|SLAB_PANIC, 3263 SLAB_HWCACHE_ALIGN|SLAB_PANIC,
3247 NULL); 3264 NULL);
@@ -3494,32 +3511,66 @@ int sock_queue_err_skb(struct sock *sk, struct sk_buff *skb)
3494} 3511}
3495EXPORT_SYMBOL(sock_queue_err_skb); 3512EXPORT_SYMBOL(sock_queue_err_skb);
3496 3513
3497void __skb_tstamp_tx(struct sk_buff *orig_skb, 3514struct sk_buff *sock_dequeue_err_skb(struct sock *sk)
3498 struct skb_shared_hwtstamps *hwtstamps,
3499 struct sock *sk, int tstype)
3500{ 3515{
3501 struct sock_exterr_skb *serr; 3516 struct sk_buff_head *q = &sk->sk_error_queue;
3502 struct sk_buff *skb; 3517 struct sk_buff *skb, *skb_next;
3503 int err; 3518 int err = 0;
3504 3519
3505 if (!sk) 3520 spin_lock_bh(&q->lock);
3506 return; 3521 skb = __skb_dequeue(q);
3522 if (skb && (skb_next = skb_peek(q)))
3523 err = SKB_EXT_ERR(skb_next)->ee.ee_errno;
3524 spin_unlock_bh(&q->lock);
3507 3525
3508 if (hwtstamps) { 3526 sk->sk_err = err;
3509 *skb_hwtstamps(orig_skb) = 3527 if (err)
3510 *hwtstamps; 3528 sk->sk_error_report(sk);
3511 } else { 3529
3512 /* 3530 return skb;
3513 * no hardware time stamps available, 3531}
3514 * so keep the shared tx_flags and only 3532EXPORT_SYMBOL(sock_dequeue_err_skb);
3515 * store software time stamp 3533
3516 */ 3534/**
3517 orig_skb->tstamp = ktime_get_real(); 3535 * skb_clone_sk - create clone of skb, and take reference to socket
3536 * @skb: the skb to clone
3537 *
3538 * This function creates a clone of a buffer that holds a reference on
3539 * sk_refcnt. Buffers created via this function are meant to be
 3540 * returned using sock_queue_err_skb, or freed via kfree_skb.
3541 *
3542 * When passing buffers allocated with this function to sock_queue_err_skb
3543 * it is necessary to wrap the call with sock_hold/sock_put in order to
3544 * prevent the socket from being released prior to being enqueued on
3545 * the sk_error_queue.
3546 */
3547struct sk_buff *skb_clone_sk(struct sk_buff *skb)
3548{
3549 struct sock *sk = skb->sk;
3550 struct sk_buff *clone;
3551
3552 if (!sk || !atomic_inc_not_zero(&sk->sk_refcnt))
3553 return NULL;
3554
3555 clone = skb_clone(skb, GFP_ATOMIC);
3556 if (!clone) {
3557 sock_put(sk);
3558 return NULL;
3518 } 3559 }
3519 3560
3520 skb = skb_clone(orig_skb, GFP_ATOMIC); 3561 clone->sk = sk;
3521 if (!skb) 3562 clone->destructor = sock_efree;
3522 return; 3563
3564 return clone;
3565}
3566EXPORT_SYMBOL(skb_clone_sk);
3567
3568static void __skb_complete_tx_timestamp(struct sk_buff *skb,
3569 struct sock *sk,
3570 int tstype)
3571{
3572 struct sock_exterr_skb *serr;
3573 int err;
3523 3574
3524 serr = SKB_EXT_ERR(skb); 3575 serr = SKB_EXT_ERR(skb);
3525 memset(serr, 0, sizeof(*serr)); 3576 memset(serr, 0, sizeof(*serr));
@@ -3537,6 +3588,42 @@ void __skb_tstamp_tx(struct sk_buff *orig_skb,
3537 if (err) 3588 if (err)
3538 kfree_skb(skb); 3589 kfree_skb(skb);
3539} 3590}
3591
3592void skb_complete_tx_timestamp(struct sk_buff *skb,
3593 struct skb_shared_hwtstamps *hwtstamps)
3594{
3595 struct sock *sk = skb->sk;
3596
3597 /* take a reference to prevent skb_orphan() from freeing the socket */
3598 sock_hold(sk);
3599
3600 *skb_hwtstamps(skb) = *hwtstamps;
3601 __skb_complete_tx_timestamp(skb, sk, SCM_TSTAMP_SND);
3602
3603 sock_put(sk);
3604}
3605EXPORT_SYMBOL_GPL(skb_complete_tx_timestamp);
3606
3607void __skb_tstamp_tx(struct sk_buff *orig_skb,
3608 struct skb_shared_hwtstamps *hwtstamps,
3609 struct sock *sk, int tstype)
3610{
3611 struct sk_buff *skb;
3612
3613 if (!sk)
3614 return;
3615
3616 if (hwtstamps)
3617 *skb_hwtstamps(orig_skb) = *hwtstamps;
3618 else
3619 orig_skb->tstamp = ktime_get_real();
3620
3621 skb = skb_clone(orig_skb, GFP_ATOMIC);
3622 if (!skb)
3623 return;
3624
3625 __skb_complete_tx_timestamp(skb, sk, tstype);
3626}
3540EXPORT_SYMBOL_GPL(__skb_tstamp_tx); 3627EXPORT_SYMBOL_GPL(__skb_tstamp_tx);
3541 3628
3542void skb_tstamp_tx(struct sk_buff *orig_skb, 3629void skb_tstamp_tx(struct sk_buff *orig_skb,
@@ -3561,9 +3648,14 @@ void skb_complete_wifi_ack(struct sk_buff *skb, bool acked)
3561 serr->ee.ee_errno = ENOMSG; 3648 serr->ee.ee_errno = ENOMSG;
3562 serr->ee.ee_origin = SO_EE_ORIGIN_TXSTATUS; 3649 serr->ee.ee_origin = SO_EE_ORIGIN_TXSTATUS;
3563 3650
3651 /* take a reference to prevent skb_orphan() from freeing the socket */
3652 sock_hold(sk);
3653
3564 err = sock_queue_err_skb(sk, skb); 3654 err = sock_queue_err_skb(sk, skb);
3565 if (err) 3655 if (err)
3566 kfree_skb(skb); 3656 kfree_skb(skb);
3657
3658 sock_put(sk);
3567} 3659}
3568EXPORT_SYMBOL_GPL(skb_complete_wifi_ack); 3660EXPORT_SYMBOL_GPL(skb_complete_wifi_ack);
3569 3661
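
The new skb_clone_sk() above only proceeds if atomic_inc_not_zero() manages to grab a reference on the socket, i.e. if its refcount has not already dropped to zero, and the timestamp completion paths bracket the enqueue with sock_hold()/sock_put() for the same reason. A userspace sketch of the conditional-reference half of that, using a C11 compare-and-swap loop (get_ref_not_zero is an invented name):

#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

static bool get_ref_not_zero(atomic_int *ref)
{
	int old = atomic_load(ref);

	while (old != 0) {
		if (atomic_compare_exchange_weak(ref, &old, old + 1))
			return true;	/* reference taken */
		/* old was reloaded by the failed CAS, try again */
	}
	return false;			/* object is already being torn down */
}

int main(void)
{
	atomic_int live = 1, dying = 0;

	printf("live:  %s\n", get_ref_not_zero(&live) ? "got ref" : "refused");
	printf("dying: %s\n", get_ref_not_zero(&dying) ? "got ref" : "refused");
	return 0;
}
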
@@ -3864,7 +3956,8 @@ bool skb_try_coalesce(struct sk_buff *to, struct sk_buff *from,
3864 return false; 3956 return false;
3865 3957
3866 if (len <= skb_tailroom(to)) { 3958 if (len <= skb_tailroom(to)) {
3867 BUG_ON(skb_copy_bits(from, 0, skb_put(to, len), len)); 3959 if (len)
3960 BUG_ON(skb_copy_bits(from, 0, skb_put(to, len), len));
3868 *delta_truesize = 0; 3961 *delta_truesize = 0;
3869 return true; 3962 return true;
3870 } 3963 }
@@ -4029,3 +4122,81 @@ err_free:
4029 return NULL; 4122 return NULL;
4030} 4123}
4031EXPORT_SYMBOL(skb_vlan_untag); 4124EXPORT_SYMBOL(skb_vlan_untag);
4125
4126/**
4127 * alloc_skb_with_frags - allocate skb with page frags
4128 *
 4129 * @header_len: size of linear part
 4130 * @data_len: needed length in frags
 4131 * @max_page_order: max page order desired.
 4132 * @errcode: pointer to error code if any
 4133 * @gfp_mask: allocation mask
4134 *
4135 * This can be used to allocate a paged skb, given a maximal order for frags.
4136 */
4137struct sk_buff *alloc_skb_with_frags(unsigned long header_len,
4138 unsigned long data_len,
4139 int max_page_order,
4140 int *errcode,
4141 gfp_t gfp_mask)
4142{
4143 int npages = (data_len + (PAGE_SIZE - 1)) >> PAGE_SHIFT;
4144 unsigned long chunk;
4145 struct sk_buff *skb;
4146 struct page *page;
4147 gfp_t gfp_head;
4148 int i;
4149
4150 *errcode = -EMSGSIZE;
4151 /* Note this test could be relaxed, if we succeed to allocate
4152 * high order pages...
4153 */
4154 if (npages > MAX_SKB_FRAGS)
4155 return NULL;
4156
4157 gfp_head = gfp_mask;
4158 if (gfp_head & __GFP_WAIT)
4159 gfp_head |= __GFP_REPEAT;
4160
4161 *errcode = -ENOBUFS;
4162 skb = alloc_skb(header_len, gfp_head);
4163 if (!skb)
4164 return NULL;
4165
4166 skb->truesize += npages << PAGE_SHIFT;
4167
4168 for (i = 0; npages > 0; i++) {
4169 int order = max_page_order;
4170
4171 while (order) {
4172 if (npages >= 1 << order) {
4173 page = alloc_pages(gfp_mask |
4174 __GFP_COMP |
4175 __GFP_NOWARN |
4176 __GFP_NORETRY,
4177 order);
4178 if (page)
4179 goto fill_page;
4180 /* Do not retry other high order allocations */
4181 order = 1;
4182 max_page_order = 0;
4183 }
4184 order--;
4185 }
4186 page = alloc_page(gfp_mask);
4187 if (!page)
4188 goto failure;
4189fill_page:
4190 chunk = min_t(unsigned long, data_len,
4191 PAGE_SIZE << order);
4192 skb_fill_page_desc(skb, i, page, 0, chunk);
4193 data_len -= chunk;
4194 npages -= 1 << order;
4195 }
4196 return skb;
4197
4198failure:
4199 kfree_skb(skb);
4200 return NULL;
4201}
4202EXPORT_SYMBOL(alloc_skb_with_frags);
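
alloc_skb_with_frags() tries the largest page order that still fits the remaining data and, after the first high-order failure, stops retrying high orders and continues with single pages. A userspace model of that fallback strategy, with malloc() standing in for alloc_pages() and a coin flip simulating allocation failure:

#include <stdio.h>
#include <stdlib.h>

#define PAGE_SIZE 4096

/* pretend high-order allocations can fail; order-0 always succeeds here */
static void *try_alloc(int order)
{
	if (order > 0 && rand() % 2)
		return NULL;
	return malloc((size_t)PAGE_SIZE << order);
}

int main(void)
{
	long data_len = 10 * PAGE_SIZE;
	int max_order = 3;
	int npages = (data_len + PAGE_SIZE - 1) / PAGE_SIZE;

	while (npages > 0) {
		int order = max_order;
		void *chunk = NULL;

		while (order) {
			if (npages >= (1 << order)) {
				chunk = try_alloc(order);
				if (chunk)
					break;
				/* do not retry other high order allocations */
				order = 1;
				max_order = 0;
			}
			order--;
		}
		if (!chunk) {
			order = 0;
			chunk = try_alloc(0);
		}
		printf("got an order-%d chunk (%d pages still needed)\n", order, npages);
		npages -= 1 << order;
		free(chunk);
	}
	return 0;
}
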
diff --git a/net/core/sock.c b/net/core/sock.c
index 611f424fb76b..b4f3ea2fce60 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -437,7 +437,6 @@ static void sock_disable_timestamp(struct sock *sk, unsigned long flags)
437int sock_queue_rcv_skb(struct sock *sk, struct sk_buff *skb) 437int sock_queue_rcv_skb(struct sock *sk, struct sk_buff *skb)
438{ 438{
439 int err; 439 int err;
440 int skb_len;
441 unsigned long flags; 440 unsigned long flags;
442 struct sk_buff_head *list = &sk->sk_receive_queue; 441 struct sk_buff_head *list = &sk->sk_receive_queue;
443 442
@@ -459,13 +458,6 @@ int sock_queue_rcv_skb(struct sock *sk, struct sk_buff *skb)
459 skb->dev = NULL; 458 skb->dev = NULL;
460 skb_set_owner_r(skb, sk); 459 skb_set_owner_r(skb, sk);
461 460
462 /* Cache the SKB length before we tack it onto the receive
463 * queue. Once it is added it no longer belongs to us and
464 * may be freed by other threads of control pulling packets
465 * from the queue.
466 */
467 skb_len = skb->len;
468
469 /* we escape from rcu protected region, make sure we dont leak 461 /* we escape from rcu protected region, make sure we dont leak
470 * a norefcounted dst 462 * a norefcounted dst
471 */ 463 */
@@ -1642,18 +1634,24 @@ void sock_rfree(struct sk_buff *skb)
1642} 1634}
1643EXPORT_SYMBOL(sock_rfree); 1635EXPORT_SYMBOL(sock_rfree);
1644 1636
1637void sock_efree(struct sk_buff *skb)
1638{
1639 sock_put(skb->sk);
1640}
1641EXPORT_SYMBOL(sock_efree);
1642
1643#ifdef CONFIG_INET
1645void sock_edemux(struct sk_buff *skb) 1644void sock_edemux(struct sk_buff *skb)
1646{ 1645{
1647 struct sock *sk = skb->sk; 1646 struct sock *sk = skb->sk;
1648 1647
1649#ifdef CONFIG_INET
1650 if (sk->sk_state == TCP_TIME_WAIT) 1648 if (sk->sk_state == TCP_TIME_WAIT)
1651 inet_twsk_put(inet_twsk(sk)); 1649 inet_twsk_put(inet_twsk(sk));
1652 else 1650 else
1653#endif
1654 sock_put(sk); 1651 sock_put(sk);
1655} 1652}
1656EXPORT_SYMBOL(sock_edemux); 1653EXPORT_SYMBOL(sock_edemux);
1654#endif
1657 1655
1658kuid_t sock_i_uid(struct sock *sk) 1656kuid_t sock_i_uid(struct sock *sk)
1659{ 1657{
@@ -1761,21 +1759,12 @@ struct sk_buff *sock_alloc_send_pskb(struct sock *sk, unsigned long header_len,
1761 unsigned long data_len, int noblock, 1759 unsigned long data_len, int noblock,
1762 int *errcode, int max_page_order) 1760 int *errcode, int max_page_order)
1763{ 1761{
1764 struct sk_buff *skb = NULL; 1762 struct sk_buff *skb;
1765 unsigned long chunk;
1766 gfp_t gfp_mask;
1767 long timeo; 1763 long timeo;
1768 int err; 1764 int err;
1769 int npages = (data_len + (PAGE_SIZE - 1)) >> PAGE_SHIFT;
1770 struct page *page;
1771 int i;
1772
1773 err = -EMSGSIZE;
1774 if (npages > MAX_SKB_FRAGS)
1775 goto failure;
1776 1765
1777 timeo = sock_sndtimeo(sk, noblock); 1766 timeo = sock_sndtimeo(sk, noblock);
1778 while (!skb) { 1767 for (;;) {
1779 err = sock_error(sk); 1768 err = sock_error(sk);
1780 if (err != 0) 1769 if (err != 0)
1781 goto failure; 1770 goto failure;
@@ -1784,66 +1773,27 @@ struct sk_buff *sock_alloc_send_pskb(struct sock *sk, unsigned long header_len,
1784 if (sk->sk_shutdown & SEND_SHUTDOWN) 1773 if (sk->sk_shutdown & SEND_SHUTDOWN)
1785 goto failure; 1774 goto failure;
1786 1775
1787 if (atomic_read(&sk->sk_wmem_alloc) >= sk->sk_sndbuf) { 1776 if (sk_wmem_alloc_get(sk) < sk->sk_sndbuf)
1788 set_bit(SOCK_ASYNC_NOSPACE, &sk->sk_socket->flags); 1777 break;
1789 set_bit(SOCK_NOSPACE, &sk->sk_socket->flags);
1790 err = -EAGAIN;
1791 if (!timeo)
1792 goto failure;
1793 if (signal_pending(current))
1794 goto interrupted;
1795 timeo = sock_wait_for_wmem(sk, timeo);
1796 continue;
1797 }
1798
1799 err = -ENOBUFS;
1800 gfp_mask = sk->sk_allocation;
1801 if (gfp_mask & __GFP_WAIT)
1802 gfp_mask |= __GFP_REPEAT;
1803 1778
1804 skb = alloc_skb(header_len, gfp_mask); 1779 set_bit(SOCK_ASYNC_NOSPACE, &sk->sk_socket->flags);
1805 if (!skb) 1780 set_bit(SOCK_NOSPACE, &sk->sk_socket->flags);
1781 err = -EAGAIN;
1782 if (!timeo)
1806 goto failure; 1783 goto failure;
1807 1784 if (signal_pending(current))
1808 skb->truesize += data_len; 1785 goto interrupted;
1809 1786 timeo = sock_wait_for_wmem(sk, timeo);
1810 for (i = 0; npages > 0; i++) {
1811 int order = max_page_order;
1812
1813 while (order) {
1814 if (npages >= 1 << order) {
1815 page = alloc_pages(sk->sk_allocation |
1816 __GFP_COMP |
1817 __GFP_NOWARN |
1818 __GFP_NORETRY,
1819 order);
1820 if (page)
1821 goto fill_page;
1822 /* Do not retry other high order allocations */
1823 order = 1;
1824 max_page_order = 0;
1825 }
1826 order--;
1827 }
1828 page = alloc_page(sk->sk_allocation);
1829 if (!page)
1830 goto failure;
1831fill_page:
1832 chunk = min_t(unsigned long, data_len,
1833 PAGE_SIZE << order);
1834 skb_fill_page_desc(skb, i, page, 0, chunk);
1835 data_len -= chunk;
1836 npages -= 1 << order;
1837 }
1838 } 1787 }
1839 1788 skb = alloc_skb_with_frags(header_len, data_len, max_page_order,
1840 skb_set_owner_w(skb, sk); 1789 errcode, sk->sk_allocation);
1790 if (skb)
1791 skb_set_owner_w(skb, sk);
1841 return skb; 1792 return skb;
1842 1793
1843interrupted: 1794interrupted:
1844 err = sock_intr_errno(timeo); 1795 err = sock_intr_errno(timeo);
1845failure: 1796failure:
1846 kfree_skb(skb);
1847 *errcode = err; 1797 *errcode = err;
1848 return NULL; 1798 return NULL;
1849} 1799}
@@ -2492,11 +2442,11 @@ int sock_recv_errqueue(struct sock *sk, struct msghdr *msg, int len,
2492 int level, int type) 2442 int level, int type)
2493{ 2443{
2494 struct sock_exterr_skb *serr; 2444 struct sock_exterr_skb *serr;
2495 struct sk_buff *skb, *skb2; 2445 struct sk_buff *skb;
2496 int copied, err; 2446 int copied, err;
2497 2447
2498 err = -EAGAIN; 2448 err = -EAGAIN;
2499 skb = skb_dequeue(&sk->sk_error_queue); 2449 skb = sock_dequeue_err_skb(sk);
2500 if (skb == NULL) 2450 if (skb == NULL)
2501 goto out; 2451 goto out;
2502 2452
@@ -2517,16 +2467,6 @@ int sock_recv_errqueue(struct sock *sk, struct msghdr *msg, int len,
2517 msg->msg_flags |= MSG_ERRQUEUE; 2467 msg->msg_flags |= MSG_ERRQUEUE;
2518 err = copied; 2468 err = copied;
2519 2469
2520 /* Reset and regenerate socket error */
2521 spin_lock_bh(&sk->sk_error_queue.lock);
2522 sk->sk_err = 0;
2523 if ((skb2 = skb_peek(&sk->sk_error_queue)) != NULL) {
2524 sk->sk_err = SKB_EXT_ERR(skb2)->ee.ee_errno;
2525 spin_unlock_bh(&sk->sk_error_queue.lock);
2526 sk->sk_error_report(sk);
2527 } else
2528 spin_unlock_bh(&sk->sk_error_queue.lock);
2529
2530out_free_skb: 2470out_free_skb:
2531 kfree_skb(skb); 2471 kfree_skb(skb);
2532out: 2472out:
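
The sock.c hunk above replaces the open-coded error regeneration in sock_recv_errqueue() with sock_dequeue_err_skb(), which dequeues one entry and, still under the queue lock, peeks at the next one so the pending error can be re-armed in a single place. A rough userspace sketch of that dequeue-and-rearm pattern; the ring buffer and pthread mutex are stand-ins, not the kernel's types:

#include <pthread.h>
#include <stdio.h>

struct errq {
	pthread_mutex_t lock;
	int codes[8];
	int head, tail;
	int sk_err;		/* "pending error" the next read will report */
};

static int dequeue_err(struct errq *q)
{
	int code = 0, next = 0;

	pthread_mutex_lock(&q->lock);
	if (q->head != q->tail) {
		code = q->codes[q->head++ % 8];
		if (q->head != q->tail)	/* peek at the following entry */
			next = q->codes[q->head % 8];
	}
	pthread_mutex_unlock(&q->lock);

	q->sk_err = next;	/* re-arm (or clear) the pending error */
	return code;
}

int main(void)
{
	struct errq q = { .lock = PTHREAD_MUTEX_INITIALIZER,
			  .codes = { 111, 113 }, .head = 0, .tail = 2 };

	printf("got %d, pending %d\n", dequeue_err(&q), q.sk_err);
	printf("got %d, pending %d\n", dequeue_err(&q), q.sk_err);
	return 0;
}
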
diff --git a/net/core/timestamping.c b/net/core/timestamping.c
index a8770391ea5b..43d3dd62fcc8 100644
--- a/net/core/timestamping.c
+++ b/net/core/timestamping.c
@@ -36,10 +36,9 @@ void skb_clone_tx_timestamp(struct sk_buff *skb)
36{ 36{
37 struct phy_device *phydev; 37 struct phy_device *phydev;
38 struct sk_buff *clone; 38 struct sk_buff *clone;
39 struct sock *sk = skb->sk;
40 unsigned int type; 39 unsigned int type;
41 40
42 if (!sk) 41 if (!skb->sk)
43 return; 42 return;
44 43
45 type = classify(skb); 44 type = classify(skb);
@@ -48,50 +47,14 @@ void skb_clone_tx_timestamp(struct sk_buff *skb)
48 47
49 phydev = skb->dev->phydev; 48 phydev = skb->dev->phydev;
50 if (likely(phydev->drv->txtstamp)) { 49 if (likely(phydev->drv->txtstamp)) {
51 if (!atomic_inc_not_zero(&sk->sk_refcnt)) 50 clone = skb_clone_sk(skb);
51 if (!clone)
52 return; 52 return;
53
54 clone = skb_clone(skb, GFP_ATOMIC);
55 if (!clone) {
56 sock_put(sk);
57 return;
58 }
59
60 clone->sk = sk;
61 phydev->drv->txtstamp(phydev, clone, type); 53 phydev->drv->txtstamp(phydev, clone, type);
62 } 54 }
63} 55}
64EXPORT_SYMBOL_GPL(skb_clone_tx_timestamp); 56EXPORT_SYMBOL_GPL(skb_clone_tx_timestamp);
65 57
66void skb_complete_tx_timestamp(struct sk_buff *skb,
67 struct skb_shared_hwtstamps *hwtstamps)
68{
69 struct sock *sk = skb->sk;
70 struct sock_exterr_skb *serr;
71 int err;
72
73 if (!hwtstamps) {
74 sock_put(sk);
75 kfree_skb(skb);
76 return;
77 }
78
79 *skb_hwtstamps(skb) = *hwtstamps;
80
81 serr = SKB_EXT_ERR(skb);
82 memset(serr, 0, sizeof(*serr));
83 serr->ee.ee_errno = ENOMSG;
84 serr->ee.ee_origin = SO_EE_ORIGIN_TIMESTAMPING;
85 skb->sk = NULL;
86
87 err = sock_queue_err_skb(sk, skb);
88
89 sock_put(sk);
90 if (err)
91 kfree_skb(skb);
92}
93EXPORT_SYMBOL_GPL(skb_complete_tx_timestamp);
94
95bool skb_defer_rx_timestamp(struct sk_buff *skb) 58bool skb_defer_rx_timestamp(struct sk_buff *skb)
96{ 59{
97 struct phy_device *phydev; 60 struct phy_device *phydev;
diff --git a/net/core/utils.c b/net/core/utils.c
index eed34338736c..efc76dd9dcd1 100644
--- a/net/core/utils.c
+++ b/net/core/utils.c
@@ -306,16 +306,14 @@ EXPORT_SYMBOL(in6_pton);
306void inet_proto_csum_replace4(__sum16 *sum, struct sk_buff *skb, 306void inet_proto_csum_replace4(__sum16 *sum, struct sk_buff *skb,
307 __be32 from, __be32 to, int pseudohdr) 307 __be32 from, __be32 to, int pseudohdr)
308{ 308{
309 __be32 diff[] = { ~from, to };
310 if (skb->ip_summed != CHECKSUM_PARTIAL) { 309 if (skb->ip_summed != CHECKSUM_PARTIAL) {
311 *sum = csum_fold(csum_partial(diff, sizeof(diff), 310 *sum = csum_fold(csum_add(csum_sub(~csum_unfold(*sum), from),
312 ~csum_unfold(*sum))); 311 to));
313 if (skb->ip_summed == CHECKSUM_COMPLETE && pseudohdr) 312 if (skb->ip_summed == CHECKSUM_COMPLETE && pseudohdr)
314 skb->csum = ~csum_partial(diff, sizeof(diff), 313 skb->csum = ~csum_add(csum_sub(~(skb->csum), from), to);
315 ~skb->csum);
316 } else if (pseudohdr) 314 } else if (pseudohdr)
317 *sum = ~csum_fold(csum_partial(diff, sizeof(diff), 315 *sum = ~csum_fold(csum_add(csum_sub(csum_unfold(*sum), from),
318 csum_unfold(*sum))); 316 to));
319} 317}
320EXPORT_SYMBOL(inet_proto_csum_replace4); 318EXPORT_SYMBOL(inet_proto_csum_replace4);
321 319
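
The inet_proto_csum_replace4() change drops the { ~from, to } scratch array and folds the old word out of, and the new word into, the one's-complement sum directly. The sketch below checks, in plain userspace C with 16-bit helpers, that such an incremental update matches a full recomputation; the helpers are illustrative, not the kernel's csum_* primitives:

#include <stdint.h>
#include <stdio.h>

static uint16_t ones_add(uint16_t a, uint16_t b)
{
	uint32_t s = (uint32_t)a + b;

	return (uint16_t)((s & 0xffff) + (s >> 16));	/* end-around carry */
}

static uint16_t ones_sub(uint16_t a, uint16_t b)
{
	return ones_add(a, (uint16_t)~b);
}

/* full checksum over 16-bit words, for comparison */
static uint16_t csum(const uint16_t *w, int n)
{
	uint16_t s = 0;

	while (n--)
		s = ones_add(s, *w++);
	return ~s;
}

int main(void)
{
	uint16_t words[4] = { 0x1234, 0xc0a8, 0x0001, 0xbeef };
	uint16_t old_sum = csum(words, 4);

	/* replace one word and patch the checksum incrementally */
	uint16_t from = words[1], to = 0x0a00;

	words[1] = to;
	uint16_t patched = ~ones_add(ones_sub((uint16_t)~old_sum, from), to);

	printf("recomputed %#06x, patched %#06x\n", csum(words, 4), patched);
	return 0;
}
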
diff --git a/net/dccp/ccid.c b/net/dccp/ccid.c
index 597557254ddb..83498975165f 100644
--- a/net/dccp/ccid.c
+++ b/net/dccp/ccid.c
@@ -99,7 +99,7 @@ static void ccid_kmem_cache_destroy(struct kmem_cache *slab)
99 kmem_cache_destroy(slab); 99 kmem_cache_destroy(slab);
100} 100}
101 101
102static int ccid_activate(struct ccid_operations *ccid_ops) 102static int __init ccid_activate(struct ccid_operations *ccid_ops)
103{ 103{
104 int err = -ENOBUFS; 104 int err = -ENOBUFS;
105 105
diff --git a/net/dccp/ipv6.c b/net/dccp/ipv6.c
index 04cb17d4b0ce..ad2acfe1ca61 100644
--- a/net/dccp/ipv6.c
+++ b/net/dccp/ipv6.c
@@ -404,7 +404,7 @@ static int dccp_v6_conn_request(struct sock *sk, struct sk_buff *skb)
404 ireq->ir_v6_rmt_addr = ipv6_hdr(skb)->saddr; 404 ireq->ir_v6_rmt_addr = ipv6_hdr(skb)->saddr;
405 ireq->ir_v6_loc_addr = ipv6_hdr(skb)->daddr; 405 ireq->ir_v6_loc_addr = ipv6_hdr(skb)->daddr;
406 406
407 if (ipv6_opt_accepted(sk, skb) || 407 if (ipv6_opt_accepted(sk, skb, IP6CB(skb)) ||
408 np->rxopt.bits.rxinfo || np->rxopt.bits.rxoinfo || 408 np->rxopt.bits.rxinfo || np->rxopt.bits.rxoinfo ||
409 np->rxopt.bits.rxhlim || np->rxopt.bits.rxohlim) { 409 np->rxopt.bits.rxhlim || np->rxopt.bits.rxohlim) {
410 atomic_inc(&skb->users); 410 atomic_inc(&skb->users);
diff --git a/net/dccp/proto.c b/net/dccp/proto.c
index f440cc7c9f72..97b0fcc79547 100644
--- a/net/dccp/proto.c
+++ b/net/dccp/proto.c
@@ -1082,7 +1082,7 @@ void dccp_shutdown(struct sock *sk, int how)
1082 1082
1083EXPORT_SYMBOL_GPL(dccp_shutdown); 1083EXPORT_SYMBOL_GPL(dccp_shutdown);
1084 1084
1085static inline int dccp_mib_init(void) 1085static inline int __init dccp_mib_init(void)
1086{ 1086{
1087 dccp_statistics = alloc_percpu(struct dccp_mib); 1087 dccp_statistics = alloc_percpu(struct dccp_mib);
1088 if (!dccp_statistics) 1088 if (!dccp_statistics)
diff --git a/net/decnet/af_decnet.c b/net/decnet/af_decnet.c
index ae011b46c071..25733d538147 100644
--- a/net/decnet/af_decnet.c
+++ b/net/decnet/af_decnet.c
@@ -127,6 +127,7 @@ Version 0.0.6 2.1.110 07-aug-98 Eduardo Marcelo Serrat
127#include <linux/stat.h> 127#include <linux/stat.h>
128#include <linux/init.h> 128#include <linux/init.h>
129#include <linux/poll.h> 129#include <linux/poll.h>
130#include <linux/jiffies.h>
130#include <net/net_namespace.h> 131#include <net/net_namespace.h>
131#include <net/neighbour.h> 132#include <net/neighbour.h>
132#include <net/dst.h> 133#include <net/dst.h>
@@ -598,7 +599,7 @@ int dn_destroy_timer(struct sock *sk)
598 if (sk->sk_socket) 599 if (sk->sk_socket)
599 return 0; 600 return 0;
600 601
601 if ((jiffies - scp->stamp) >= (HZ * decnet_time_wait)) { 602 if (time_after_eq(jiffies, scp->stamp + HZ * decnet_time_wait)) {
602 dn_unhash_sock(sk); 603 dn_unhash_sock(sk);
603 sock_put(sk); 604 sock_put(sk);
604 return 1; 605 return 1;
diff --git a/net/decnet/dn_dev.c b/net/decnet/dn_dev.c
index 3b726f31c64c..4400da7739da 100644
--- a/net/decnet/dn_dev.c
+++ b/net/decnet/dn_dev.c
@@ -41,6 +41,7 @@
41#include <linux/sysctl.h> 41#include <linux/sysctl.h>
42#include <linux/notifier.h> 42#include <linux/notifier.h>
43#include <linux/slab.h> 43#include <linux/slab.h>
44#include <linux/jiffies.h>
44#include <asm/uaccess.h> 45#include <asm/uaccess.h>
45#include <net/net_namespace.h> 46#include <net/net_namespace.h>
46#include <net/neighbour.h> 47#include <net/neighbour.h>
@@ -875,7 +876,7 @@ static void dn_send_endnode_hello(struct net_device *dev, struct dn_ifaddr *ifa)
875static int dn_am_i_a_router(struct dn_neigh *dn, struct dn_dev *dn_db, struct dn_ifaddr *ifa) 876static int dn_am_i_a_router(struct dn_neigh *dn, struct dn_dev *dn_db, struct dn_ifaddr *ifa)
876{ 877{
877 /* First check time since device went up */ 878 /* First check time since device went up */
878 if ((jiffies - dn_db->uptime) < DRDELAY) 879 if (time_before(jiffies, dn_db->uptime + DRDELAY))
879 return 0; 880 return 0;
880 881
881 /* If there is no router, then yes... */ 882 /* If there is no router, then yes... */
diff --git a/net/decnet/dn_timer.c b/net/decnet/dn_timer.c
index d9c150cc59a9..1d330fd43dc7 100644
--- a/net/decnet/dn_timer.c
+++ b/net/decnet/dn_timer.c
@@ -23,6 +23,7 @@
23#include <linux/spinlock.h> 23#include <linux/spinlock.h>
24#include <net/sock.h> 24#include <net/sock.h>
25#include <linux/atomic.h> 25#include <linux/atomic.h>
26#include <linux/jiffies.h>
26#include <net/flow.h> 27#include <net/flow.h>
27#include <net/dn.h> 28#include <net/dn.h>
28 29
@@ -91,7 +92,7 @@ static void dn_slow_timer(unsigned long arg)
91 * since the last successful transmission. 92 * since the last successful transmission.
92 */ 93 */
93 if (scp->keepalive && scp->keepalive_fxn && (scp->state == DN_RUN)) { 94 if (scp->keepalive && scp->keepalive_fxn && (scp->state == DN_RUN)) {
94 if ((jiffies - scp->stamp) >= scp->keepalive) 95 if (time_after_eq(jiffies, scp->stamp + scp->keepalive))
95 scp->keepalive_fxn(sk); 96 scp->keepalive_fxn(sk);
96 } 97 }
97 98
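
The decnet hunks switch open-coded jiffies arithmetic to the time_after_eq()/time_before() helpers, which compare through a signed cast of the difference and therefore stay correct across counter wraparound. A small demonstration of the kind of direct deadline comparison the helpers guard against; jiff_t and the constants are illustrative, and the macro mirrors include/linux/jiffies.h only in spirit:

#include <stdio.h>

typedef unsigned long jiff_t;

#define time_after_eq(a, b)	((long)((a) - (b)) >= 0)

int main(void)
{
	jiff_t stamp = (jiff_t)-50;	/* timer armed shortly before the counter wraps */
	jiff_t deadline = stamp + 20;	/* still a huge value, not wrapped yet */
	jiff_t now = 30;		/* counter has wrapped; 80 ticks have elapsed */

	printf("direct compare: %s\n", now >= deadline ? "expired" : "pending");
	printf("time_after_eq:  %s\n",
	       time_after_eq(now, deadline) ? "expired" : "pending");
	return 0;
}
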
diff --git a/net/dsa/Kconfig b/net/dsa/Kconfig
index f5eede1d6cb8..a585fd6352eb 100644
--- a/net/dsa/Kconfig
+++ b/net/dsa/Kconfig
@@ -12,6 +12,9 @@ config NET_DSA
12if NET_DSA 12if NET_DSA
13 13
14# tagging formats 14# tagging formats
15config NET_DSA_TAG_BRCM
16 bool
17
15config NET_DSA_TAG_DSA 18config NET_DSA_TAG_DSA
16 bool 19 bool
17 20
diff --git a/net/dsa/Makefile b/net/dsa/Makefile
index 7b9fcbbeda5d..da06ed1df620 100644
--- a/net/dsa/Makefile
+++ b/net/dsa/Makefile
@@ -3,6 +3,7 @@ obj-$(CONFIG_NET_DSA) += dsa_core.o
3dsa_core-y += dsa.o slave.o 3dsa_core-y += dsa.o slave.o
4 4
5# tagging formats 5# tagging formats
6dsa_core-$(CONFIG_NET_DSA_TAG_BRCM) += tag_brcm.o
6dsa_core-$(CONFIG_NET_DSA_TAG_DSA) += tag_dsa.o 7dsa_core-$(CONFIG_NET_DSA_TAG_DSA) += tag_dsa.o
7dsa_core-$(CONFIG_NET_DSA_TAG_EDSA) += tag_edsa.o 8dsa_core-$(CONFIG_NET_DSA_TAG_EDSA) += tag_edsa.o
8dsa_core-$(CONFIG_NET_DSA_TAG_TRAILER) += tag_trailer.o 9dsa_core-$(CONFIG_NET_DSA_TAG_TRAILER) += tag_trailer.o
diff --git a/net/dsa/dsa.c b/net/dsa/dsa.c
index 0a49632fac47..22f34cf4cb27 100644
--- a/net/dsa/dsa.c
+++ b/net/dsa/dsa.c
@@ -10,7 +10,6 @@
10 */ 10 */
11 11
12#include <linux/list.h> 12#include <linux/list.h>
13#include <linux/netdevice.h>
14#include <linux/platform_device.h> 13#include <linux/platform_device.h>
15#include <linux/slab.h> 14#include <linux/slab.h>
16#include <linux/module.h> 15#include <linux/module.h>
@@ -44,7 +43,7 @@ void unregister_switch_driver(struct dsa_switch_driver *drv)
44EXPORT_SYMBOL_GPL(unregister_switch_driver); 43EXPORT_SYMBOL_GPL(unregister_switch_driver);
45 44
46static struct dsa_switch_driver * 45static struct dsa_switch_driver *
47dsa_switch_probe(struct mii_bus *bus, int sw_addr, char **_name) 46dsa_switch_probe(struct device *host_dev, int sw_addr, char **_name)
48{ 47{
49 struct dsa_switch_driver *ret; 48 struct dsa_switch_driver *ret;
50 struct list_head *list; 49 struct list_head *list;
@@ -59,7 +58,7 @@ dsa_switch_probe(struct mii_bus *bus, int sw_addr, char **_name)
59 58
60 drv = list_entry(list, struct dsa_switch_driver, list); 59 drv = list_entry(list, struct dsa_switch_driver, list);
61 60
62 name = drv->probe(bus, sw_addr); 61 name = drv->probe(host_dev, sw_addr);
63 if (name != NULL) { 62 if (name != NULL) {
64 ret = drv; 63 ret = drv;
65 break; 64 break;
@@ -76,7 +75,7 @@ dsa_switch_probe(struct mii_bus *bus, int sw_addr, char **_name)
76/* basic switch operations **************************************************/ 75/* basic switch operations **************************************************/
77static struct dsa_switch * 76static struct dsa_switch *
78dsa_switch_setup(struct dsa_switch_tree *dst, int index, 77dsa_switch_setup(struct dsa_switch_tree *dst, int index,
79 struct device *parent, struct mii_bus *bus) 78 struct device *parent, struct device *host_dev)
80{ 79{
81 struct dsa_chip_data *pd = dst->pd->chip + index; 80 struct dsa_chip_data *pd = dst->pd->chip + index;
82 struct dsa_switch_driver *drv; 81 struct dsa_switch_driver *drv;
@@ -89,7 +88,7 @@ dsa_switch_setup(struct dsa_switch_tree *dst, int index,
89 /* 88 /*
90 * Probe for switch model. 89 * Probe for switch model.
91 */ 90 */
92 drv = dsa_switch_probe(bus, pd->sw_addr, &name); 91 drv = dsa_switch_probe(host_dev, pd->sw_addr, &name);
93 if (drv == NULL) { 92 if (drv == NULL) {
94 printk(KERN_ERR "%s[%d]: could not detect attached switch\n", 93 printk(KERN_ERR "%s[%d]: could not detect attached switch\n",
95 dst->master_netdev->name, index); 94 dst->master_netdev->name, index);
@@ -110,8 +109,7 @@ dsa_switch_setup(struct dsa_switch_tree *dst, int index,
110 ds->index = index; 109 ds->index = index;
111 ds->pd = dst->pd->chip + index; 110 ds->pd = dst->pd->chip + index;
112 ds->drv = drv; 111 ds->drv = drv;
113 ds->master_mii_bus = bus; 112 ds->master_dev = host_dev;
114
115 113
116 /* 114 /*
117 * Validate supplied switch configuration. 115 * Validate supplied switch configuration.
@@ -144,14 +142,44 @@ dsa_switch_setup(struct dsa_switch_tree *dst, int index,
144 goto out; 142 goto out;
145 } 143 }
146 144
145 /* Make the built-in MII bus mask match the number of ports,
146 * switch drivers can override this later
147 */
148 ds->phys_mii_mask = ds->phys_port_mask;
149
147 /* 150 /*
148 * If the CPU connects to this switch, set the switch tree 151 * If the CPU connects to this switch, set the switch tree
149 * tagging protocol to the preferred tagging format of this 152 * tagging protocol to the preferred tagging format of this
150 * switch. 153 * switch.
151 */ 154 */
152 if (ds->dst->cpu_switch == index) 155 if (dst->cpu_switch == index) {
153 ds->dst->tag_protocol = drv->tag_protocol; 156 switch (drv->tag_protocol) {
157#ifdef CONFIG_NET_DSA_TAG_DSA
158 case DSA_TAG_PROTO_DSA:
159 dst->rcv = dsa_netdev_ops.rcv;
160 break;
161#endif
162#ifdef CONFIG_NET_DSA_TAG_EDSA
163 case DSA_TAG_PROTO_EDSA:
164 dst->rcv = edsa_netdev_ops.rcv;
165 break;
166#endif
167#ifdef CONFIG_NET_DSA_TAG_TRAILER
168 case DSA_TAG_PROTO_TRAILER:
169 dst->rcv = trailer_netdev_ops.rcv;
170 break;
171#endif
172#ifdef CONFIG_NET_DSA_TAG_BRCM
173 case DSA_TAG_PROTO_BRCM:
174 dst->rcv = brcm_netdev_ops.rcv;
175 break;
176#endif
177 default:
178 break;
179 }
154 180
181 dst->tag_protocol = drv->tag_protocol;
182 }
155 183
156 /* 184 /*
157 * Do basic register setup. 185 * Do basic register setup.
@@ -210,6 +238,51 @@ static void dsa_switch_destroy(struct dsa_switch *ds)
210{ 238{
211} 239}
212 240
241#ifdef CONFIG_PM_SLEEP
242static int dsa_switch_suspend(struct dsa_switch *ds)
243{
244 int i, ret = 0;
245
246 /* Suspend slave network devices */
247 for (i = 0; i < DSA_MAX_PORTS; i++) {
248 if (!(ds->phys_port_mask & (1 << i)))
249 continue;
250
251 ret = dsa_slave_suspend(ds->ports[i]);
252 if (ret)
253 return ret;
254 }
255
256 if (ds->drv->suspend)
257 ret = ds->drv->suspend(ds);
258
259 return ret;
260}
261
262static int dsa_switch_resume(struct dsa_switch *ds)
263{
264 int i, ret = 0;
265
266 if (ds->drv->resume)
267 ret = ds->drv->resume(ds);
268
269 if (ret)
270 return ret;
271
272 /* Resume slave network devices */
273 for (i = 0; i < DSA_MAX_PORTS; i++) {
274 if (!(ds->phys_port_mask & (1 << i)))
275 continue;
276
277 ret = dsa_slave_resume(ds->ports[i]);
278 if (ret)
279 return ret;
280 }
281
282 return 0;
283}
284#endif
285
213 286
214/* link polling *************************************************************/ 287/* link polling *************************************************************/
215static void dsa_link_poll_work(struct work_struct *ugly) 288static void dsa_link_poll_work(struct work_struct *ugly)
@@ -256,7 +329,7 @@ static struct device *dev_find_class(struct device *parent, char *class)
256 return device_find_child(parent, class, dev_is_class); 329 return device_find_child(parent, class, dev_is_class);
257} 330}
258 331
259static struct mii_bus *dev_to_mii_bus(struct device *dev) 332struct mii_bus *dsa_host_dev_to_mii_bus(struct device *dev)
260{ 333{
261 struct device *d; 334 struct device *d;
262 335
@@ -272,6 +345,7 @@ static struct mii_bus *dev_to_mii_bus(struct device *dev)
272 345
273 return NULL; 346 return NULL;
274} 347}
348EXPORT_SYMBOL_GPL(dsa_host_dev_to_mii_bus);
275 349
276static struct net_device *dev_to_net_device(struct device *dev) 350static struct net_device *dev_to_net_device(struct device *dev)
277{ 351{
@@ -410,7 +484,8 @@ static int dsa_of_probe(struct platform_device *pdev)
410 chip_index++; 484 chip_index++;
411 cd = &pd->chip[chip_index]; 485 cd = &pd->chip[chip_index];
412 486
413 cd->mii_bus = &mdio_bus->dev; 487 cd->of_node = child;
488 cd->host_dev = &mdio_bus->dev;
414 489
415 sw_addr = of_get_property(child, "reg", NULL); 490 sw_addr = of_get_property(child, "reg", NULL);
416 if (!sw_addr) 491 if (!sw_addr)
@@ -431,6 +506,8 @@ static int dsa_of_probe(struct platform_device *pdev)
431 if (!port_name) 506 if (!port_name)
432 continue; 507 continue;
433 508
509 cd->port_dn[port_index] = port;
510
434 cd->port_names[port_index] = kstrdup(port_name, 511 cd->port_names[port_index] = kstrdup(port_name,
435 GFP_KERNEL); 512 GFP_KERNEL);
436 if (!cd->port_names[port_index]) { 513 if (!cd->port_names[port_index]) {
@@ -534,17 +611,9 @@ static int dsa_probe(struct platform_device *pdev)
534 dst->cpu_port = -1; 611 dst->cpu_port = -1;
535 612
536 for (i = 0; i < pd->nr_chips; i++) { 613 for (i = 0; i < pd->nr_chips; i++) {
537 struct mii_bus *bus;
538 struct dsa_switch *ds; 614 struct dsa_switch *ds;
539 615
540 bus = dev_to_mii_bus(pd->chip[i].mii_bus); 616 ds = dsa_switch_setup(dst, i, &pdev->dev, pd->chip[i].host_dev);
541 if (bus == NULL) {
542 printk(KERN_ERR "%s[%d]: no mii bus found for "
543 "dsa switch\n", dev->name, i);
544 continue;
545 }
546
547 ds = dsa_switch_setup(dst, i, &pdev->dev, bus);
548 if (IS_ERR(ds)) { 617 if (IS_ERR(ds)) {
549 printk(KERN_ERR "%s[%d]: couldn't create dsa switch " 618 printk(KERN_ERR "%s[%d]: couldn't create dsa switch "
550 "instance (error %ld)\n", dev->name, i, 619 "instance (error %ld)\n", dev->name, i,
@@ -608,7 +677,62 @@ static void dsa_shutdown(struct platform_device *pdev)
608{ 677{
609} 678}
610 679
680static int dsa_switch_rcv(struct sk_buff *skb, struct net_device *dev,
681 struct packet_type *pt, struct net_device *orig_dev)
682{
683 struct dsa_switch_tree *dst = dev->dsa_ptr;
684
685 if (unlikely(dst == NULL)) {
686 kfree_skb(skb);
687 return 0;
688 }
689
690 return dst->rcv(skb, dev, pt, orig_dev);
691}
692
693static struct packet_type dsa_pack_type __read_mostly = {
694 .type = cpu_to_be16(ETH_P_XDSA),
695 .func = dsa_switch_rcv,
696};
697
698#ifdef CONFIG_PM_SLEEP
699static int dsa_suspend(struct device *d)
700{
701 struct platform_device *pdev = to_platform_device(d);
702 struct dsa_switch_tree *dst = platform_get_drvdata(pdev);
703 int i, ret = 0;
704
705 for (i = 0; i < dst->pd->nr_chips; i++) {
706 struct dsa_switch *ds = dst->ds[i];
707
708 if (ds != NULL)
709 ret = dsa_switch_suspend(ds);
710 }
711
712 return ret;
713}
714
715static int dsa_resume(struct device *d)
716{
717 struct platform_device *pdev = to_platform_device(d);
718 struct dsa_switch_tree *dst = platform_get_drvdata(pdev);
719 int i, ret = 0;
720
721 for (i = 0; i < dst->pd->nr_chips; i++) {
722 struct dsa_switch *ds = dst->ds[i];
723
724 if (ds != NULL)
725 ret = dsa_switch_resume(ds);
726 }
727
728 return ret;
729}
730#endif
731
732static SIMPLE_DEV_PM_OPS(dsa_pm_ops, dsa_suspend, dsa_resume);
733
611static const struct of_device_id dsa_of_match_table[] = { 734static const struct of_device_id dsa_of_match_table[] = {
735 { .compatible = "brcm,bcm7445-switch-v4.0" },
612 { .compatible = "marvell,dsa", }, 736 { .compatible = "marvell,dsa", },
613 {} 737 {}
614}; 738};
@@ -622,6 +746,7 @@ static struct platform_driver dsa_driver = {
622 .name = "dsa", 746 .name = "dsa",
623 .owner = THIS_MODULE, 747 .owner = THIS_MODULE,
624 .of_match_table = dsa_of_match_table, 748 .of_match_table = dsa_of_match_table,
749 .pm = &dsa_pm_ops,
625 }, 750 },
626}; 751};
627 752
@@ -633,30 +758,15 @@ static int __init dsa_init_module(void)
633 if (rc) 758 if (rc)
634 return rc; 759 return rc;
635 760
636#ifdef CONFIG_NET_DSA_TAG_DSA 761 dev_add_pack(&dsa_pack_type);
637 dev_add_pack(&dsa_packet_type); 762
638#endif
639#ifdef CONFIG_NET_DSA_TAG_EDSA
640 dev_add_pack(&edsa_packet_type);
641#endif
642#ifdef CONFIG_NET_DSA_TAG_TRAILER
643 dev_add_pack(&trailer_packet_type);
644#endif
645 return 0; 763 return 0;
646} 764}
647module_init(dsa_init_module); 765module_init(dsa_init_module);
648 766
649static void __exit dsa_cleanup_module(void) 767static void __exit dsa_cleanup_module(void)
650{ 768{
651#ifdef CONFIG_NET_DSA_TAG_TRAILER 769 dev_remove_pack(&dsa_pack_type);
652 dev_remove_pack(&trailer_packet_type);
653#endif
654#ifdef CONFIG_NET_DSA_TAG_EDSA
655 dev_remove_pack(&edsa_packet_type);
656#endif
657#ifdef CONFIG_NET_DSA_TAG_DSA
658 dev_remove_pack(&dsa_packet_type);
659#endif
660 platform_driver_unregister(&dsa_driver); 770 platform_driver_unregister(&dsa_driver);
661} 771}
662module_exit(dsa_cleanup_module); 772module_exit(dsa_cleanup_module);
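
The DSA rework above registers one packet handler for ETH_P_XDSA and dispatches through a per-tree rcv callback chosen at setup time from the switch driver's tagging protocol, instead of registering a separate packet_type per tag format. A userspace sketch of that setup-time function-pointer dispatch; the tag names and handlers below are placeholders:

#include <stdio.h>

enum tag_proto { TAG_NONE, TAG_DSA, TAG_BRCM };

struct ops { int (*rcv)(const char *frame); };

static int dsa_rcv(const char *f)  { printf("DSA tag:  %s\n", f); return 0; }
static int brcm_rcv(const char *f) { printf("BRCM tag: %s\n", f); return 0; }

struct tree { const struct ops *ops; };

static void tree_setup(struct tree *t, enum tag_proto proto)
{
	static const struct ops dsa_ops  = { .rcv = dsa_rcv };
	static const struct ops brcm_ops = { .rcv = brcm_rcv };

	switch (proto) {
	case TAG_DSA:  t->ops = &dsa_ops;  break;
	case TAG_BRCM: t->ops = &brcm_ops; break;
	default:       t->ops = NULL;      break;
	}
}

/* the single entry point: drop the frame if no tagger was configured */
static int switch_rcv(struct tree *t, const char *frame)
{
	if (!t->ops)
		return -1;
	return t->ops->rcv(frame);
}

int main(void)
{
	struct tree t;

	tree_setup(&t, TAG_BRCM);
	return switch_rcv(&t, "hello");
}
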
diff --git a/net/dsa/dsa_priv.h b/net/dsa/dsa_priv.h
index d4cf5cc747e3..dc9756d3154c 100644
--- a/net/dsa/dsa_priv.h
+++ b/net/dsa/dsa_priv.h
@@ -12,7 +12,13 @@
12#define __DSA_PRIV_H 12#define __DSA_PRIV_H
13 13
14#include <linux/phy.h> 14#include <linux/phy.h>
15#include <net/dsa.h> 15#include <linux/netdevice.h>
16
17struct dsa_device_ops {
18 netdev_tx_t (*xmit)(struct sk_buff *skb, struct net_device *dev);
19 int (*rcv)(struct sk_buff *skb, struct net_device *dev,
20 struct packet_type *pt, struct net_device *orig_dev);
21};
16 22
17struct dsa_slave_priv { 23struct dsa_slave_priv {
18 /* 24 /*
@@ -20,6 +26,8 @@ struct dsa_slave_priv {
20 * switch port. 26 * switch port.
21 */ 27 */
22 struct net_device *dev; 28 struct net_device *dev;
29 netdev_tx_t (*xmit)(struct sk_buff *skb,
30 struct net_device *dev);
23 31
24 /* 32 /*
25 * Which switch this port is a part of, and the port index 33 * Which switch this port is a part of, and the port index
@@ -33,28 +41,35 @@ struct dsa_slave_priv {
33 * to this port. 41 * to this port.
34 */ 42 */
35 struct phy_device *phy; 43 struct phy_device *phy;
44 phy_interface_t phy_interface;
45 int old_link;
46 int old_pause;
47 int old_duplex;
36}; 48};
37 49
38/* dsa.c */ 50/* dsa.c */
39extern char dsa_driver_version[]; 51extern char dsa_driver_version[];
40 52
41/* slave.c */ 53/* slave.c */
54extern const struct dsa_device_ops notag_netdev_ops;
42void dsa_slave_mii_bus_init(struct dsa_switch *ds); 55void dsa_slave_mii_bus_init(struct dsa_switch *ds);
43struct net_device *dsa_slave_create(struct dsa_switch *ds, 56struct net_device *dsa_slave_create(struct dsa_switch *ds,
44 struct device *parent, 57 struct device *parent,
45 int port, char *name); 58 int port, char *name);
59int dsa_slave_suspend(struct net_device *slave_dev);
60int dsa_slave_resume(struct net_device *slave_dev);
46 61
47/* tag_dsa.c */ 62/* tag_dsa.c */
48netdev_tx_t dsa_xmit(struct sk_buff *skb, struct net_device *dev); 63extern const struct dsa_device_ops dsa_netdev_ops;
49extern struct packet_type dsa_packet_type;
50 64
51/* tag_edsa.c */ 65/* tag_edsa.c */
52netdev_tx_t edsa_xmit(struct sk_buff *skb, struct net_device *dev); 66extern const struct dsa_device_ops edsa_netdev_ops;
53extern struct packet_type edsa_packet_type;
54 67
55/* tag_trailer.c */ 68/* tag_trailer.c */
56netdev_tx_t trailer_xmit(struct sk_buff *skb, struct net_device *dev); 69extern const struct dsa_device_ops trailer_netdev_ops;
57extern struct packet_type trailer_packet_type; 70
71/* tag_brcm.c */
72extern const struct dsa_device_ops brcm_netdev_ops;
58 73
59 74
60#endif 75#endif
diff --git a/net/dsa/slave.c b/net/dsa/slave.c
index 45a1e34c89e0..8030489d9cbe 100644
--- a/net/dsa/slave.c
+++ b/net/dsa/slave.c
@@ -9,9 +9,10 @@
9 */ 9 */
10 10
11#include <linux/list.h> 11#include <linux/list.h>
12#include <linux/netdevice.h>
13#include <linux/etherdevice.h> 12#include <linux/etherdevice.h>
14#include <linux/phy.h> 13#include <linux/phy.h>
14#include <linux/of_net.h>
15#include <linux/of_mdio.h>
15#include "dsa_priv.h" 16#include "dsa_priv.h"
16 17
17/* slave mii_bus handling ***************************************************/ 18/* slave mii_bus handling ***************************************************/
@@ -19,7 +20,7 @@ static int dsa_slave_phy_read(struct mii_bus *bus, int addr, int reg)
19{ 20{
20 struct dsa_switch *ds = bus->priv; 21 struct dsa_switch *ds = bus->priv;
21 22
22 if (ds->phys_port_mask & (1 << addr)) 23 if (ds->phys_mii_mask & (1 << addr))
23 return ds->drv->phy_read(ds, addr, reg); 24 return ds->drv->phy_read(ds, addr, reg);
24 25
25 return 0xffff; 26 return 0xffff;
@@ -29,7 +30,7 @@ static int dsa_slave_phy_write(struct mii_bus *bus, int addr, int reg, u16 val)
29{ 30{
30 struct dsa_switch *ds = bus->priv; 31 struct dsa_switch *ds = bus->priv;
31 32
32 if (ds->phys_port_mask & (1 << addr)) 33 if (ds->phys_mii_mask & (1 << addr))
33 return ds->drv->phy_write(ds, addr, reg, val); 34 return ds->drv->phy_write(ds, addr, reg, val);
34 35
35 return 0; 36 return 0;
@@ -43,7 +44,7 @@ void dsa_slave_mii_bus_init(struct dsa_switch *ds)
43 ds->slave_mii_bus->write = dsa_slave_phy_write; 44 ds->slave_mii_bus->write = dsa_slave_phy_write;
44 snprintf(ds->slave_mii_bus->id, MII_BUS_ID_SIZE, "dsa-%d:%.2x", 45 snprintf(ds->slave_mii_bus->id, MII_BUS_ID_SIZE, "dsa-%d:%.2x",
45 ds->index, ds->pd->sw_addr); 46 ds->index, ds->pd->sw_addr);
46 ds->slave_mii_bus->parent = &ds->master_mii_bus->dev; 47 ds->slave_mii_bus->parent = ds->master_dev;
47} 48}
48 49
49 50
@@ -61,6 +62,7 @@ static int dsa_slave_open(struct net_device *dev)
61{ 62{
62 struct dsa_slave_priv *p = netdev_priv(dev); 63 struct dsa_slave_priv *p = netdev_priv(dev);
63 struct net_device *master = p->parent->dst->master_netdev; 64 struct net_device *master = p->parent->dst->master_netdev;
65 struct dsa_switch *ds = p->parent;
64 int err; 66 int err;
65 67
66 if (!(master->flags & IFF_UP)) 68 if (!(master->flags & IFF_UP))
@@ -83,8 +85,20 @@ static int dsa_slave_open(struct net_device *dev)
83 goto clear_allmulti; 85 goto clear_allmulti;
84 } 86 }
85 87
88 if (ds->drv->port_enable) {
89 err = ds->drv->port_enable(ds, p->port, p->phy);
90 if (err)
91 goto clear_promisc;
92 }
93
94 if (p->phy)
95 phy_start(p->phy);
96
86 return 0; 97 return 0;
87 98
99clear_promisc:
100 if (dev->flags & IFF_PROMISC)
101 dev_set_promiscuity(master, 0);
88clear_allmulti: 102clear_allmulti:
89 if (dev->flags & IFF_ALLMULTI) 103 if (dev->flags & IFF_ALLMULTI)
90 dev_set_allmulti(master, -1); 104 dev_set_allmulti(master, -1);
@@ -99,6 +113,10 @@ static int dsa_slave_close(struct net_device *dev)
99{ 113{
100 struct dsa_slave_priv *p = netdev_priv(dev); 114 struct dsa_slave_priv *p = netdev_priv(dev);
101 struct net_device *master = p->parent->dst->master_netdev; 115 struct net_device *master = p->parent->dst->master_netdev;
116 struct dsa_switch *ds = p->parent;
117
118 if (p->phy)
119 phy_stop(p->phy);
102 120
103 dev_mc_unsync(master, dev); 121 dev_mc_unsync(master, dev);
104 dev_uc_unsync(master, dev); 122 dev_uc_unsync(master, dev);
@@ -110,6 +128,9 @@ static int dsa_slave_close(struct net_device *dev)
110 if (!ether_addr_equal(dev->dev_addr, master->dev_addr)) 128 if (!ether_addr_equal(dev->dev_addr, master->dev_addr))
111 dev_uc_del(master, dev->dev_addr); 129 dev_uc_del(master, dev->dev_addr);
112 130
131 if (ds->drv->port_disable)
132 ds->drv->port_disable(ds, p->port, p->phy);
133
113 return 0; 134 return 0;
114} 135}
115 136
@@ -171,6 +192,24 @@ static int dsa_slave_ioctl(struct net_device *dev, struct ifreq *ifr, int cmd)
171 return -EOPNOTSUPP; 192 return -EOPNOTSUPP;
172} 193}
173 194
195static netdev_tx_t dsa_slave_xmit(struct sk_buff *skb, struct net_device *dev)
196{
197 struct dsa_slave_priv *p = netdev_priv(dev);
198
199 return p->xmit(skb, dev);
200}
201
202static netdev_tx_t dsa_slave_notag_xmit(struct sk_buff *skb,
203 struct net_device *dev)
204{
205 struct dsa_slave_priv *p = netdev_priv(dev);
206
207 skb->dev = p->parent->dst->master_netdev;
208 dev_queue_xmit(skb);
209
210 return NETDEV_TX_OK;
211}
212
174 213
175/* ethtool operations *******************************************************/ 214/* ethtool operations *******************************************************/
176static int 215static int
@@ -282,6 +321,65 @@ static int dsa_slave_get_sset_count(struct net_device *dev, int sset)
282 return -EOPNOTSUPP; 321 return -EOPNOTSUPP;
283} 322}
284 323
324static void dsa_slave_get_wol(struct net_device *dev, struct ethtool_wolinfo *w)
325{
326 struct dsa_slave_priv *p = netdev_priv(dev);
327 struct dsa_switch *ds = p->parent;
328
329 if (ds->drv->get_wol)
330 ds->drv->get_wol(ds, p->port, w);
331}
332
333static int dsa_slave_set_wol(struct net_device *dev, struct ethtool_wolinfo *w)
334{
335 struct dsa_slave_priv *p = netdev_priv(dev);
336 struct dsa_switch *ds = p->parent;
337 int ret = -EOPNOTSUPP;
338
339 if (ds->drv->set_wol)
340 ret = ds->drv->set_wol(ds, p->port, w);
341
342 return ret;
343}
344
345static int dsa_slave_set_eee(struct net_device *dev, struct ethtool_eee *e)
346{
347 struct dsa_slave_priv *p = netdev_priv(dev);
348 struct dsa_switch *ds = p->parent;
349 int ret;
350
351 if (!ds->drv->set_eee)
352 return -EOPNOTSUPP;
353
354 ret = ds->drv->set_eee(ds, p->port, p->phy, e);
355 if (ret)
356 return ret;
357
358 if (p->phy)
359 ret = phy_ethtool_set_eee(p->phy, e);
360
361 return ret;
362}
363
364static int dsa_slave_get_eee(struct net_device *dev, struct ethtool_eee *e)
365{
366 struct dsa_slave_priv *p = netdev_priv(dev);
367 struct dsa_switch *ds = p->parent;
368 int ret;
369
370 if (!ds->drv->get_eee)
371 return -EOPNOTSUPP;
372
373 ret = ds->drv->get_eee(ds, p->port, e);
374 if (ret)
375 return ret;
376
377 if (p->phy)
378 ret = phy_ethtool_get_eee(p->phy, e);
379
380 return ret;
381}
382
285static const struct ethtool_ops dsa_slave_ethtool_ops = { 383static const struct ethtool_ops dsa_slave_ethtool_ops = {
286 .get_settings = dsa_slave_get_settings, 384 .get_settings = dsa_slave_get_settings,
287 .set_settings = dsa_slave_set_settings, 385 .set_settings = dsa_slave_set_settings,
@@ -291,46 +389,143 @@ static const struct ethtool_ops dsa_slave_ethtool_ops = {
291 .get_strings = dsa_slave_get_strings, 389 .get_strings = dsa_slave_get_strings,
292 .get_ethtool_stats = dsa_slave_get_ethtool_stats, 390 .get_ethtool_stats = dsa_slave_get_ethtool_stats,
293 .get_sset_count = dsa_slave_get_sset_count, 391 .get_sset_count = dsa_slave_get_sset_count,
392 .set_wol = dsa_slave_set_wol,
393 .get_wol = dsa_slave_get_wol,
394 .set_eee = dsa_slave_set_eee,
395 .get_eee = dsa_slave_get_eee,
294}; 396};
295 397
296#ifdef CONFIG_NET_DSA_TAG_DSA 398static const struct net_device_ops dsa_slave_netdev_ops = {
297static const struct net_device_ops dsa_netdev_ops = {
298 .ndo_init = dsa_slave_init, 399 .ndo_init = dsa_slave_init,
299 .ndo_open = dsa_slave_open, 400 .ndo_open = dsa_slave_open,
300 .ndo_stop = dsa_slave_close, 401 .ndo_stop = dsa_slave_close,
301 .ndo_start_xmit = dsa_xmit, 402 .ndo_start_xmit = dsa_slave_xmit,
302 .ndo_change_rx_flags = dsa_slave_change_rx_flags, 403 .ndo_change_rx_flags = dsa_slave_change_rx_flags,
303 .ndo_set_rx_mode = dsa_slave_set_rx_mode, 404 .ndo_set_rx_mode = dsa_slave_set_rx_mode,
304 .ndo_set_mac_address = dsa_slave_set_mac_address, 405 .ndo_set_mac_address = dsa_slave_set_mac_address,
305 .ndo_do_ioctl = dsa_slave_ioctl, 406 .ndo_do_ioctl = dsa_slave_ioctl,
306}; 407};
307#endif 408
308#ifdef CONFIG_NET_DSA_TAG_EDSA 409static void dsa_slave_adjust_link(struct net_device *dev)
309static const struct net_device_ops edsa_netdev_ops = { 410{
310 .ndo_init = dsa_slave_init, 411 struct dsa_slave_priv *p = netdev_priv(dev);
311 .ndo_open = dsa_slave_open, 412 struct dsa_switch *ds = p->parent;
312 .ndo_stop = dsa_slave_close, 413 unsigned int status_changed = 0;
313 .ndo_start_xmit = edsa_xmit, 414
314 .ndo_change_rx_flags = dsa_slave_change_rx_flags, 415 if (p->old_link != p->phy->link) {
315 .ndo_set_rx_mode = dsa_slave_set_rx_mode, 416 status_changed = 1;
316 .ndo_set_mac_address = dsa_slave_set_mac_address, 417 p->old_link = p->phy->link;
317 .ndo_do_ioctl = dsa_slave_ioctl, 418 }
318}; 419
319#endif 420 if (p->old_duplex != p->phy->duplex) {
320#ifdef CONFIG_NET_DSA_TAG_TRAILER 421 status_changed = 1;
321static const struct net_device_ops trailer_netdev_ops = { 422 p->old_duplex = p->phy->duplex;
322 .ndo_init = dsa_slave_init, 423 }
323 .ndo_open = dsa_slave_open, 424
324 .ndo_stop = dsa_slave_close, 425 if (p->old_pause != p->phy->pause) {
325 .ndo_start_xmit = trailer_xmit, 426 status_changed = 1;
326 .ndo_change_rx_flags = dsa_slave_change_rx_flags, 427 p->old_pause = p->phy->pause;
327 .ndo_set_rx_mode = dsa_slave_set_rx_mode, 428 }
328 .ndo_set_mac_address = dsa_slave_set_mac_address, 429
329 .ndo_do_ioctl = dsa_slave_ioctl, 430 if (ds->drv->adjust_link && status_changed)
330}; 431 ds->drv->adjust_link(ds, p->port, p->phy);
331#endif 432
433 if (status_changed)
434 phy_print_status(p->phy);
435}
436
437static int dsa_slave_fixed_link_update(struct net_device *dev,
438 struct fixed_phy_status *status)
439{
440 struct dsa_slave_priv *p = netdev_priv(dev);
441 struct dsa_switch *ds = p->parent;
442
443 if (ds->drv->fixed_link_update)
444 ds->drv->fixed_link_update(ds, p->port, status);
445
446 return 0;
447}
332 448
333/* slave device setup *******************************************************/ 449/* slave device setup *******************************************************/
450static void dsa_slave_phy_setup(struct dsa_slave_priv *p,
451 struct net_device *slave_dev)
452{
453 struct dsa_switch *ds = p->parent;
454 struct dsa_chip_data *cd = ds->pd;
455 struct device_node *phy_dn, *port_dn;
456 bool phy_is_fixed = false;
457 u32 phy_flags = 0;
458 int ret;
459
460 port_dn = cd->port_dn[p->port];
461 p->phy_interface = of_get_phy_mode(port_dn);
462
463 phy_dn = of_parse_phandle(port_dn, "phy-handle", 0);
464 if (of_phy_is_fixed_link(port_dn)) {
465 /* In the case of a fixed PHY, the DT node associated
466 * with the fixed PHY is the port DT node
467 */
468 ret = of_phy_register_fixed_link(port_dn);
469 if (ret) {
470 pr_err("failed to register fixed PHY\n");
471 return;
472 }
473 phy_is_fixed = true;
474 phy_dn = port_dn;
475 }
476
477 if (ds->drv->get_phy_flags)
478 phy_flags = ds->drv->get_phy_flags(ds, p->port);
479
480 if (phy_dn)
481 p->phy = of_phy_connect(slave_dev, phy_dn,
482 dsa_slave_adjust_link, phy_flags,
483 p->phy_interface);
484
485 if (p->phy && phy_is_fixed)
486 fixed_phy_set_link_update(p->phy, dsa_slave_fixed_link_update);
487
488 /* We could not connect to a designated PHY, so use the switch internal
489 * MDIO bus instead
490 */
491 if (!p->phy)
492 p->phy = ds->slave_mii_bus->phy_map[p->port];
493 else
494 pr_info("attached PHY at address %d [%s]\n",
495 p->phy->addr, p->phy->drv->name);
496}
497
498int dsa_slave_suspend(struct net_device *slave_dev)
499{
500 struct dsa_slave_priv *p = netdev_priv(slave_dev);
501
502 netif_device_detach(slave_dev);
503
504 if (p->phy) {
505 phy_stop(p->phy);
506 p->old_pause = -1;
507 p->old_link = -1;
508 p->old_duplex = -1;
509 phy_suspend(p->phy);
510 }
511
512 return 0;
513}
514
515int dsa_slave_resume(struct net_device *slave_dev)
516{
517 struct dsa_slave_priv *p = netdev_priv(slave_dev);
518
519 netif_device_attach(slave_dev);
520
521 if (p->phy) {
522 phy_resume(p->phy);
523 phy_start(p->phy);
524 }
525
526 return 0;
527}
528
334struct net_device * 529struct net_device *
335dsa_slave_create(struct dsa_switch *ds, struct device *parent, 530dsa_slave_create(struct dsa_switch *ds, struct device *parent,
336 int port, char *name) 531 int port, char *name)
@@ -349,35 +544,48 @@ dsa_slave_create(struct dsa_switch *ds, struct device *parent,
349 slave_dev->ethtool_ops = &dsa_slave_ethtool_ops; 544 slave_dev->ethtool_ops = &dsa_slave_ethtool_ops;
350 eth_hw_addr_inherit(slave_dev, master); 545 eth_hw_addr_inherit(slave_dev, master);
351 slave_dev->tx_queue_len = 0; 546 slave_dev->tx_queue_len = 0;
547 slave_dev->netdev_ops = &dsa_slave_netdev_ops;
548
549 SET_NETDEV_DEV(slave_dev, parent);
550 slave_dev->dev.of_node = ds->pd->port_dn[port];
551 slave_dev->vlan_features = master->vlan_features;
552
553 p = netdev_priv(slave_dev);
554 p->dev = slave_dev;
555 p->parent = ds;
556 p->port = port;
352 557
353 switch (ds->dst->tag_protocol) { 558 switch (ds->dst->tag_protocol) {
354#ifdef CONFIG_NET_DSA_TAG_DSA 559#ifdef CONFIG_NET_DSA_TAG_DSA
355 case htons(ETH_P_DSA): 560 case DSA_TAG_PROTO_DSA:
356 slave_dev->netdev_ops = &dsa_netdev_ops; 561 p->xmit = dsa_netdev_ops.xmit;
357 break; 562 break;
358#endif 563#endif
359#ifdef CONFIG_NET_DSA_TAG_EDSA 564#ifdef CONFIG_NET_DSA_TAG_EDSA
360 case htons(ETH_P_EDSA): 565 case DSA_TAG_PROTO_EDSA:
361 slave_dev->netdev_ops = &edsa_netdev_ops; 566 p->xmit = edsa_netdev_ops.xmit;
362 break; 567 break;
363#endif 568#endif
364#ifdef CONFIG_NET_DSA_TAG_TRAILER 569#ifdef CONFIG_NET_DSA_TAG_TRAILER
365 case htons(ETH_P_TRAILER): 570 case DSA_TAG_PROTO_TRAILER:
366 slave_dev->netdev_ops = &trailer_netdev_ops; 571 p->xmit = trailer_netdev_ops.xmit;
572 break;
573#endif
574#ifdef CONFIG_NET_DSA_TAG_BRCM
575 case DSA_TAG_PROTO_BRCM:
576 p->xmit = brcm_netdev_ops.xmit;
367 break; 577 break;
368#endif 578#endif
369 default: 579 default:
370 BUG(); 580 p->xmit = dsa_slave_notag_xmit;
581 break;
371 } 582 }
372 583
373 SET_NETDEV_DEV(slave_dev, parent); 584 p->old_pause = -1;
374 slave_dev->vlan_features = master->vlan_features; 585 p->old_link = -1;
586 p->old_duplex = -1;
375 587
376 p = netdev_priv(slave_dev); 588 dsa_slave_phy_setup(p, slave_dev);
377 p->dev = slave_dev;
378 p->parent = ds;
379 p->port = port;
380 p->phy = ds->slave_mii_bus->phy_map[port];
381 589
382 ret = register_netdev(slave_dev); 590 ret = register_netdev(slave_dev);
383 if (ret) { 591 if (ret) {
@@ -390,6 +598,9 @@ dsa_slave_create(struct dsa_switch *ds, struct device *parent,
390 netif_carrier_off(slave_dev); 598 netif_carrier_off(slave_dev);
391 599
392 if (p->phy != NULL) { 600 if (p->phy != NULL) {
601 if (ds->drv->get_phy_flags(ds, port))
602 p->phy->dev_flags |= ds->drv->get_phy_flags(ds, port);
603
393 phy_attach(slave_dev, dev_name(&p->phy->dev), 604 phy_attach(slave_dev, dev_name(&p->phy->dev),
394 PHY_INTERFACE_MODE_GMII); 605 PHY_INTERFACE_MODE_GMII);
395 606
@@ -397,7 +608,6 @@ dsa_slave_create(struct dsa_switch *ds, struct device *parent,
397 p->phy->speed = 0; 608 p->phy->speed = 0;
398 p->phy->duplex = 0; 609 p->phy->duplex = 0;
399 p->phy->advertising = p->phy->supported | ADVERTISED_Autoneg; 610 p->phy->advertising = p->phy->supported | ADVERTISED_Autoneg;
400 phy_start_aneg(p->phy);
401 } 611 }
402 612
403 return slave_dev; 613 return slave_dev;
diff --git a/net/dsa/tag_brcm.c b/net/dsa/tag_brcm.c
new file mode 100644
index 000000000000..83d3572cdb20
--- /dev/null
+++ b/net/dsa/tag_brcm.c
@@ -0,0 +1,171 @@
1/*
2 * Broadcom tag support
3 *
4 * Copyright (C) 2014 Broadcom Corporation
5 *
6 * This program is free software; you can redistribute it and/or modify
7 * it under the terms of the GNU General Public License as published by
8 * the Free Software Foundation; either version 2 of the License, or
9 * (at your option) any later version.
10 */
11
12#include <linux/etherdevice.h>
13#include <linux/list.h>
14#include <linux/slab.h>
15#include "dsa_priv.h"
16
17/* This tag is 4 bytes long; older ones were 6 bytes, and we do not
18 * handle them
19 */
20#define BRCM_TAG_LEN 4
21
22/* Tag is constructed and deconstructed using byte-by-byte access
23 * because the tag is placed after the MAC Source Address, which does
24 * not make it 4-bytes aligned, so this might cause unaligned accesses
25 * on most systems where this is used.
26 */
27
28/* Ingress and egress opcodes */
29#define BRCM_OPCODE_SHIFT 5
30#define BRCM_OPCODE_MASK 0x7
31
32/* Ingress fields */
33/* 1st byte in the tag */
34#define BRCM_IG_TC_SHIFT 2
35#define BRCM_IG_TC_MASK 0x7
36/* 2nd byte in the tag */
37#define BRCM_IG_TE_MASK 0x3
38#define BRCM_IG_TS_SHIFT 7
39/* 3rd byte in the tag */
40#define BRCM_IG_DSTMAP2_MASK 1
41#define BRCM_IG_DSTMAP1_MASK 0xff
42
43/* Egress fields */
44
45/* 2nd byte in the tag */
46#define BRCM_EG_CID_MASK 0xff
47
48/* 3rd byte in the tag */
49#define BRCM_EG_RC_MASK 0xff
50#define BRCM_EG_RC_RSVD (3 << 6)
51#define BRCM_EG_RC_EXCEPTION (1 << 5)
52#define BRCM_EG_RC_PROT_SNOOP (1 << 4)
53#define BRCM_EG_RC_PROT_TERM (1 << 3)
54#define BRCM_EG_RC_SWITCH (1 << 2)
55#define BRCM_EG_RC_MAC_LEARN (1 << 1)
56#define BRCM_EG_RC_MIRROR (1 << 0)
57#define BRCM_EG_TC_SHIFT 5
58#define BRCM_EG_TC_MASK 0x7
59#define BRCM_EG_PID_MASK 0x1f
60
61static netdev_tx_t brcm_tag_xmit(struct sk_buff *skb, struct net_device *dev)
62{
63 struct dsa_slave_priv *p = netdev_priv(dev);
64 u8 *brcm_tag;
65
66 dev->stats.tx_packets++;
67 dev->stats.tx_bytes += skb->len;
68
69 if (skb_cow_head(skb, BRCM_TAG_LEN) < 0)
70 goto out_free;
71
72 skb_push(skb, BRCM_TAG_LEN);
73
74 memmove(skb->data, skb->data + BRCM_TAG_LEN, 2 * ETH_ALEN);
75
76 /* Build the tag after the MAC Source Address */
77 brcm_tag = skb->data + 2 * ETH_ALEN;
78
 79 /* Set the ingress opcode and traffic class; tag enforcement is
80 * deprecated
81 */
82 brcm_tag[0] = (1 << BRCM_OPCODE_SHIFT) |
83 ((skb->priority << BRCM_IG_TC_SHIFT) & BRCM_IG_TC_MASK);
84 brcm_tag[1] = 0;
85 brcm_tag[2] = 0;
86 if (p->port == 8)
87 brcm_tag[2] = BRCM_IG_DSTMAP2_MASK;
88 brcm_tag[3] = (1 << p->port) & BRCM_IG_DSTMAP1_MASK;
89
90 /* Queue the SKB for transmission on the parent interface, but
91 * do not modify its EtherType
92 */
93 skb->dev = p->parent->dst->master_netdev;
94 dev_queue_xmit(skb);
95
96 return NETDEV_TX_OK;
97
98out_free:
99 kfree_skb(skb);
100 return NETDEV_TX_OK;
101}
102
103static int brcm_tag_rcv(struct sk_buff *skb, struct net_device *dev,
104 struct packet_type *pt, struct net_device *orig_dev)
105{
106 struct dsa_switch_tree *dst = dev->dsa_ptr;
107 struct dsa_switch *ds;
108 int source_port;
109 u8 *brcm_tag;
110
111 if (unlikely(dst == NULL))
112 goto out_drop;
113
114 ds = dst->ds[0];
115
116 skb = skb_unshare(skb, GFP_ATOMIC);
117 if (skb == NULL)
118 goto out;
119
120 if (unlikely(!pskb_may_pull(skb, BRCM_TAG_LEN)))
121 goto out_drop;
122
123 /* skb->data points to the EtherType, the tag is right before it */
124 brcm_tag = skb->data - 2;
125
126 /* The opcode should never be different than 0b000 */
127 if (unlikely((brcm_tag[0] >> BRCM_OPCODE_SHIFT) & BRCM_OPCODE_MASK))
128 goto out_drop;
129
130 /* We should never see a reserved reason code without knowing how to
131 * handle it
132 */
133 WARN_ON(brcm_tag[2] & BRCM_EG_RC_RSVD);
134
135 /* Locate which port this is coming from */
136 source_port = brcm_tag[3] & BRCM_EG_PID_MASK;
137
 138 /* Validate port against switch setup: drop if the port is out of range or unused */
139 if (source_port >= DSA_MAX_PORTS || ds->ports[source_port] == NULL)
140 goto out_drop;
141
142 /* Remove Broadcom tag and update checksum */
143 skb_pull_rcsum(skb, BRCM_TAG_LEN);
144
145 /* Move the Ethernet DA and SA */
146 memmove(skb->data - ETH_HLEN,
147 skb->data - ETH_HLEN - BRCM_TAG_LEN,
148 2 * ETH_ALEN);
149
150 skb_push(skb, ETH_HLEN);
151 skb->pkt_type = PACKET_HOST;
152 skb->dev = ds->ports[source_port];
153 skb->protocol = eth_type_trans(skb, skb->dev);
154
155 skb->dev->stats.rx_packets++;
156 skb->dev->stats.rx_bytes += skb->len;
157
158 netif_receive_skb(skb);
159
160 return 0;
161
162out_drop:
163 kfree_skb(skb);
164out:
165 return 0;
166}
167
168const struct dsa_device_ops brcm_netdev_ops = {
169 .xmit = brcm_tag_xmit,
170 .rcv = brcm_tag_rcv,
171};
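
As a worked illustration of the byte layout described in the comments above (purely illustrative, not part of the patch), the four tag bytes brcm_tag_xmit() builds for a priority-0 frame can be computed in plain userspace C:

    #include <stdio.h>

    int main(void)
    {
            unsigned char tag[4];
            int port = 1;                          /* try 8 to exercise DSTMAP2 */

            tag[0] = (1 << 5) | ((0 << 2) & 0x7);  /* opcode 1, traffic class 0 */
            tag[1] = 0;
            tag[2] = (port == 8) ? 1 : 0;          /* BRCM_IG_DSTMAP2_MASK */
            tag[3] = (1 << port) & 0xff;           /* BRCM_IG_DSTMAP1_MASK */

            /* prints "20 00 00 02" for port 1 */
            printf("%02x %02x %02x %02x\n", tag[0], tag[1], tag[2], tag[3]);

            return 0;
    }

These four bytes sit between the source MAC address and the EtherType, which is why the xmit path memmove()s the two MAC addresses forward and the rcv path moves them back before calling eth_type_trans().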
diff --git a/net/dsa/tag_dsa.c b/net/dsa/tag_dsa.c
index cacce1e22f9c..ce90c8bdc658 100644
--- a/net/dsa/tag_dsa.c
+++ b/net/dsa/tag_dsa.c
@@ -10,13 +10,12 @@
10 10
11#include <linux/etherdevice.h> 11#include <linux/etherdevice.h>
12#include <linux/list.h> 12#include <linux/list.h>
13#include <linux/netdevice.h>
14#include <linux/slab.h> 13#include <linux/slab.h>
15#include "dsa_priv.h" 14#include "dsa_priv.h"
16 15
17#define DSA_HLEN 4 16#define DSA_HLEN 4
18 17
19netdev_tx_t dsa_xmit(struct sk_buff *skb, struct net_device *dev) 18static netdev_tx_t dsa_xmit(struct sk_buff *skb, struct net_device *dev)
20{ 19{
21 struct dsa_slave_priv *p = netdev_priv(dev); 20 struct dsa_slave_priv *p = netdev_priv(dev);
22 u8 *dsa_header; 21 u8 *dsa_header;
@@ -186,7 +185,7 @@ out:
186 return 0; 185 return 0;
187} 186}
188 187
189struct packet_type dsa_packet_type __read_mostly = { 188const struct dsa_device_ops dsa_netdev_ops = {
190 .type = cpu_to_be16(ETH_P_DSA), 189 .xmit = dsa_xmit,
191 .func = dsa_rcv, 190 .rcv = dsa_rcv,
192}; 191};
diff --git a/net/dsa/tag_edsa.c b/net/dsa/tag_edsa.c
index e70c43c25e64..94fcce778679 100644
--- a/net/dsa/tag_edsa.c
+++ b/net/dsa/tag_edsa.c
@@ -10,14 +10,13 @@
10 10
11#include <linux/etherdevice.h> 11#include <linux/etherdevice.h>
12#include <linux/list.h> 12#include <linux/list.h>
13#include <linux/netdevice.h>
14#include <linux/slab.h> 13#include <linux/slab.h>
15#include "dsa_priv.h" 14#include "dsa_priv.h"
16 15
17#define DSA_HLEN 4 16#define DSA_HLEN 4
18#define EDSA_HLEN 8 17#define EDSA_HLEN 8
19 18
20netdev_tx_t edsa_xmit(struct sk_buff *skb, struct net_device *dev) 19static netdev_tx_t edsa_xmit(struct sk_buff *skb, struct net_device *dev)
21{ 20{
22 struct dsa_slave_priv *p = netdev_priv(dev); 21 struct dsa_slave_priv *p = netdev_priv(dev);
23 u8 *edsa_header; 22 u8 *edsa_header;
@@ -205,7 +204,7 @@ out:
205 return 0; 204 return 0;
206} 205}
207 206
208struct packet_type edsa_packet_type __read_mostly = { 207const struct dsa_device_ops edsa_netdev_ops = {
209 .type = cpu_to_be16(ETH_P_EDSA), 208 .xmit = edsa_xmit,
210 .func = edsa_rcv, 209 .rcv = edsa_rcv,
211}; 210};
diff --git a/net/dsa/tag_trailer.c b/net/dsa/tag_trailer.c
index 94bc260d015d..115fdca34077 100644
--- a/net/dsa/tag_trailer.c
+++ b/net/dsa/tag_trailer.c
@@ -10,11 +10,10 @@
10 10
11#include <linux/etherdevice.h> 11#include <linux/etherdevice.h>
12#include <linux/list.h> 12#include <linux/list.h>
13#include <linux/netdevice.h>
14#include <linux/slab.h> 13#include <linux/slab.h>
15#include "dsa_priv.h" 14#include "dsa_priv.h"
16 15
17netdev_tx_t trailer_xmit(struct sk_buff *skb, struct net_device *dev) 16static netdev_tx_t trailer_xmit(struct sk_buff *skb, struct net_device *dev)
18{ 17{
19 struct dsa_slave_priv *p = netdev_priv(dev); 18 struct dsa_slave_priv *p = netdev_priv(dev);
20 struct sk_buff *nskb; 19 struct sk_buff *nskb;
@@ -114,7 +113,7 @@ out:
114 return 0; 113 return 0;
115} 114}
116 115
117struct packet_type trailer_packet_type __read_mostly = { 116const struct dsa_device_ops trailer_netdev_ops = {
118 .type = cpu_to_be16(ETH_P_TRAILER), 117 .xmit = trailer_xmit,
119 .func = trailer_rcv, 118 .rcv = trailer_rcv,
120}; 119};
diff --git a/net/ethernet/eth.c b/net/ethernet/eth.c
index f405e0592407..33a140e15834 100644
--- a/net/ethernet/eth.c
+++ b/net/ethernet/eth.c
@@ -146,6 +146,33 @@ int eth_rebuild_header(struct sk_buff *skb)
146EXPORT_SYMBOL(eth_rebuild_header); 146EXPORT_SYMBOL(eth_rebuild_header);
147 147
148/** 148/**
 149 * eth_get_headlen - determine the length of the header for an Ethernet frame
150 * @data: pointer to start of frame
151 * @len: total length of frame
152 *
 153 * Make a best-effort attempt to determine the combined length of all the
 154 * headers of a given frame, so that they can be pulled into a linear buffer.
155 */
156u32 eth_get_headlen(void *data, unsigned int len)
157{
158 const struct ethhdr *eth = (const struct ethhdr *)data;
159 struct flow_keys keys;
160
161 /* this should never happen, but better safe than sorry */
162 if (len < sizeof(*eth))
163 return len;
164
165 /* parse any remaining L2/L3 headers, check for L4 */
166 if (!__skb_flow_dissect(NULL, &keys, data,
167 eth->h_proto, sizeof(*eth), len))
168 return max_t(u32, keys.thoff, sizeof(*eth));
169
170 /* parse for any L4 headers */
171 return min_t(u32, __skb_get_poff(NULL, data, &keys, len), len);
172}
173EXPORT_SYMBOL(eth_get_headlen);
174
175/**
149 * eth_type_trans - determine the packet's protocol ID. 176 * eth_type_trans - determine the packet's protocol ID.
150 * @skb: received socket data 177 * @skb: received socket data
151 * @dev: receiving network device 178 * @dev: receiving network device
@@ -181,11 +208,8 @@ __be16 eth_type_trans(struct sk_buff *skb, struct net_device *dev)
181 * variants has been configured on the receiving interface, 208 * variants has been configured on the receiving interface,
182 * and if so, set skb->protocol without looking at the packet. 209 * and if so, set skb->protocol without looking at the packet.
183 */ 210 */
184 if (unlikely(netdev_uses_dsa_tags(dev))) 211 if (unlikely(netdev_uses_dsa(dev)))
185 return htons(ETH_P_DSA); 212 return htons(ETH_P_XDSA);
186
187 if (unlikely(netdev_uses_trailer_tags(dev)))
188 return htons(ETH_P_TRAILER);
189 213
190 if (likely(ntohs(eth->h_proto) >= ETH_P_802_3_MIN)) 214 if (likely(ntohs(eth->h_proto) >= ETH_P_802_3_MIN))
191 return eth->h_proto; 215 return eth->h_proto;
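
eth_get_headlen(), added above, is aimed at drivers that receive frames into pages and want to copy only the protocol headers into the skb's linear area. A hedged sketch of that pattern follows; the example_* names and the 256-byte header budget are illustrative assumptions, not part of this patch:

    #include <linux/etherdevice.h>
    #include <linux/netdevice.h>
    #include <linux/skbuff.h>

    /* Sketch of a page-based RX path using eth_get_headlen(). */
    static struct sk_buff *example_build_rx_skb(struct net_device *netdev,
                                                struct page *page,
                                                unsigned int size)
    {
            void *va = page_address(page);
            unsigned int pull_len;
            struct sk_buff *skb;

            skb = netdev_alloc_skb(netdev, 256);
            if (!skb)
                    return NULL;

            /* copy only the L2..L4 headers into the linear area */
            pull_len = eth_get_headlen(va, min_t(unsigned int, size, 256));
            memcpy(__skb_put(skb, pull_len), va, pull_len);

            /* hand the remaining payload over as a page fragment */
            if (size > pull_len)
                    skb_add_rx_frag(skb, 0, page, pull_len,
                                    size - pull_len, size - pull_len);

            return skb;
    }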
diff --git a/net/ieee802154/6lowpan_rtnl.c b/net/ieee802154/6lowpan_rtnl.c
index 6591d27e53a4..44136297b673 100644
--- a/net/ieee802154/6lowpan_rtnl.c
+++ b/net/ieee802154/6lowpan_rtnl.c
@@ -71,18 +71,33 @@ struct lowpan_dev_record {
71 struct list_head list; 71 struct list_head list;
72}; 72};
73 73
74/* don't save pan id, it's intra pan */
75struct lowpan_addr {
76 u8 mode;
77 union {
78 /* IPv6 needs big endian here */
79 __be64 extended_addr;
80 __be16 short_addr;
81 } u;
82};
83
84struct lowpan_addr_info {
85 struct lowpan_addr daddr;
86 struct lowpan_addr saddr;
87};
88
74static inline struct 89static inline struct
75lowpan_dev_info *lowpan_dev_info(const struct net_device *dev) 90lowpan_dev_info *lowpan_dev_info(const struct net_device *dev)
76{ 91{
77 return netdev_priv(dev); 92 return netdev_priv(dev);
78} 93}
79 94
80static inline void lowpan_address_flip(u8 *src, u8 *dest) 95static inline struct
96lowpan_addr_info *lowpan_skb_priv(const struct sk_buff *skb)
81{ 97{
82 int i; 98 WARN_ON_ONCE(skb_headroom(skb) < sizeof(struct lowpan_addr_info));
83 99 return (struct lowpan_addr_info *)(skb->data -
84 for (i = 0; i < IEEE802154_ADDR_LEN; i++) 100 sizeof(struct lowpan_addr_info));
85 (dest)[IEEE802154_ADDR_LEN - i - 1] = (src)[i];
86} 101}
87 102
88static int lowpan_header_create(struct sk_buff *skb, struct net_device *dev, 103static int lowpan_header_create(struct sk_buff *skb, struct net_device *dev,
@@ -91,8 +106,7 @@ static int lowpan_header_create(struct sk_buff *skb, struct net_device *dev,
91{ 106{
92 const u8 *saddr = _saddr; 107 const u8 *saddr = _saddr;
93 const u8 *daddr = _daddr; 108 const u8 *daddr = _daddr;
94 struct ieee802154_addr sa, da; 109 struct lowpan_addr_info *info;
95 struct ieee802154_mac_cb *cb = mac_cb_init(skb);
96 110
97 /* TODO: 111 /* TODO:
 98 * if this packet isn't an ipv6 one, where should it be routed? 112
@@ -106,41 +120,17 @@ static int lowpan_header_create(struct sk_buff *skb, struct net_device *dev,
106 raw_dump_inline(__func__, "saddr", (unsigned char *)saddr, 8); 120 raw_dump_inline(__func__, "saddr", (unsigned char *)saddr, 8);
107 raw_dump_inline(__func__, "daddr", (unsigned char *)daddr, 8); 121 raw_dump_inline(__func__, "daddr", (unsigned char *)daddr, 8);
108 122
109 lowpan_header_compress(skb, dev, type, daddr, saddr, len); 123 info = lowpan_skb_priv(skb);
110
111 /* NOTE1: I'm still unsure about the fact that compression and WPAN
112 * header are created here and not later in the xmit. So wait for
113 * an opinion of net maintainers.
114 */
115 /* NOTE2: to be absolutely correct, we must derive PANid information
116 * from MAC subif of the 'dev' and 'real_dev' network devices, but
117 * this isn't implemented in mainline yet, so currently we assign 0xff
118 */
119 cb->type = IEEE802154_FC_TYPE_DATA;
120 124
121 /* prepare wpan address data */ 125 /* TODO: Currently we only support extended_addr */
122 sa.mode = IEEE802154_ADDR_LONG; 126 info->daddr.mode = IEEE802154_ADDR_LONG;
123 sa.pan_id = ieee802154_mlme_ops(dev)->get_pan_id(dev); 127 memcpy(&info->daddr.u.extended_addr, daddr,
124 sa.extended_addr = ieee802154_devaddr_from_raw(saddr); 128 sizeof(info->daddr.u.extended_addr));
129 info->saddr.mode = IEEE802154_ADDR_LONG;
130 memcpy(&info->saddr.u.extended_addr, saddr,
131 sizeof(info->daddr.u.extended_addr));
125 132
126 /* intra-PAN communications */ 133 return 0;
127 da.pan_id = sa.pan_id;
128
129 /* if the destination address is the broadcast address, use the
130 * corresponding short address
131 */
132 if (lowpan_is_addr_broadcast(daddr)) {
133 da.mode = IEEE802154_ADDR_SHORT;
134 da.short_addr = cpu_to_le16(IEEE802154_ADDR_BROADCAST);
135 } else {
136 da.mode = IEEE802154_ADDR_LONG;
137 da.extended_addr = ieee802154_devaddr_from_raw(daddr);
138 }
139
140 cb->ackreq = !lowpan_is_addr_broadcast(daddr);
141
142 return dev_hard_header(skb, lowpan_dev_info(dev)->real_dev,
143 type, (void *)&da, (void *)&sa, 0);
144} 134}
145 135
146static int lowpan_give_skb_to_devices(struct sk_buff *skb, 136static int lowpan_give_skb_to_devices(struct sk_buff *skb,
@@ -338,13 +328,68 @@ err:
338 return rc; 328 return rc;
339} 329}
340 330
331static int lowpan_header(struct sk_buff *skb, struct net_device *dev)
332{
333 struct ieee802154_addr sa, da;
334 struct ieee802154_mac_cb *cb = mac_cb_init(skb);
335 struct lowpan_addr_info info;
336 void *daddr, *saddr;
337
338 memcpy(&info, lowpan_skb_priv(skb), sizeof(info));
339
340 /* TODO: Currently we only support extended_addr */
341 daddr = &info.daddr.u.extended_addr;
342 saddr = &info.saddr.u.extended_addr;
343
344 lowpan_header_compress(skb, dev, ETH_P_IPV6, daddr, saddr, skb->len);
345
346 cb->type = IEEE802154_FC_TYPE_DATA;
347
348 /* prepare wpan address data */
349 sa.mode = IEEE802154_ADDR_LONG;
350 sa.pan_id = ieee802154_mlme_ops(dev)->get_pan_id(dev);
351 sa.extended_addr = ieee802154_devaddr_from_raw(saddr);
352
353 /* intra-PAN communications */
354 da.pan_id = sa.pan_id;
355
356 /* if the destination address is the broadcast address, use the
357 * corresponding short address
358 */
359 if (lowpan_is_addr_broadcast((const u8 *)daddr)) {
360 da.mode = IEEE802154_ADDR_SHORT;
361 da.short_addr = cpu_to_le16(IEEE802154_ADDR_BROADCAST);
362 cb->ackreq = false;
363 } else {
364 da.mode = IEEE802154_ADDR_LONG;
365 da.extended_addr = ieee802154_devaddr_from_raw(daddr);
366 cb->ackreq = true;
367 }
368
369 return dev_hard_header(skb, lowpan_dev_info(dev)->real_dev,
370 ETH_P_IPV6, (void *)&da, (void *)&sa, 0);
371}
372
341static netdev_tx_t lowpan_xmit(struct sk_buff *skb, struct net_device *dev) 373static netdev_tx_t lowpan_xmit(struct sk_buff *skb, struct net_device *dev)
342{ 374{
343 struct ieee802154_hdr wpan_hdr; 375 struct ieee802154_hdr wpan_hdr;
344 int max_single; 376 int max_single, ret;
345 377
346 pr_debug("package xmit\n"); 378 pr_debug("package xmit\n");
347 379
380 /* We must take a copy of the skb before we modify/replace the ipv6
381 * header as the header could be used elsewhere
382 */
383 skb = skb_unshare(skb, GFP_ATOMIC);
384 if (!skb)
385 return NET_XMIT_DROP;
386
387 ret = lowpan_header(skb, dev);
388 if (ret < 0) {
389 kfree_skb(skb);
390 return NET_XMIT_DROP;
391 }
392
348 if (ieee802154_hdr_peek(skb, &wpan_hdr) < 0) { 393 if (ieee802154_hdr_peek(skb, &wpan_hdr) < 0) {
349 kfree_skb(skb); 394 kfree_skb(skb);
350 return NET_XMIT_DROP; 395 return NET_XMIT_DROP;
diff --git a/net/ieee802154/reassembly.c b/net/ieee802154/reassembly.c
index 32755cb7e64e..7cfcd6885225 100644
--- a/net/ieee802154/reassembly.c
+++ b/net/ieee802154/reassembly.c
@@ -485,7 +485,7 @@ static void __net_exit lowpan_frags_ns_sysctl_unregister(struct net *net)
485 485
486static struct ctl_table_header *lowpan_ctl_header; 486static struct ctl_table_header *lowpan_ctl_header;
487 487
488static int lowpan_frags_sysctl_register(void) 488static int __init lowpan_frags_sysctl_register(void)
489{ 489{
490 lowpan_ctl_header = register_net_sysctl(&init_net, 490 lowpan_ctl_header = register_net_sysctl(&init_net,
491 "net/ieee802154/6lowpan", 491 "net/ieee802154/6lowpan",
@@ -507,7 +507,7 @@ static inline void lowpan_frags_ns_sysctl_unregister(struct net *net)
507{ 507{
508} 508}
509 509
510static inline int lowpan_frags_sysctl_register(void) 510static inline int __init lowpan_frags_sysctl_register(void)
511{ 511{
512 return 0; 512 return 0;
513} 513}
diff --git a/net/ipv4/Kconfig b/net/ipv4/Kconfig
index dbc10d84161f..e682b48e0709 100644
--- a/net/ipv4/Kconfig
+++ b/net/ipv4/Kconfig
@@ -309,8 +309,33 @@ config NET_IPVTI
309 309
310config NET_UDP_TUNNEL 310config NET_UDP_TUNNEL
311 tristate 311 tristate
312 select NET_IP_TUNNEL
312 default n 313 default n
313 314
315config NET_FOU
316 tristate "IP: Foo (IP protocols) over UDP"
317 select XFRM
318 select NET_UDP_TUNNEL
319 ---help---
320 Foo over UDP allows any IP protocol to be directly encapsulated
 321 over UDP, including tunnels (IPIP, GRE, SIT). By encapsulating in UDP,
322 network mechanisms and optimizations for UDP (such as ECMP
323 and RSS) can be leveraged to provide better service.
324
325config GENEVE
326 tristate "Generic Network Virtualization Encapsulation (Geneve)"
327 depends on INET
328 select NET_UDP_TUNNEL
329 ---help---
330 This allows one to create Geneve virtual interfaces that provide
331 Layer 2 Networks over Layer 3 Networks. Geneve is often used
332 to tunnel virtual network infrastructure in virtualized environments.
333 For more information see:
334 http://tools.ietf.org/html/draft-gross-geneve-01
335
336 To compile this driver as a module, choose M here: the module
 337	  will be called geneve.
338
314config INET_AH 339config INET_AH
315 tristate "IP: AH transformation" 340 tristate "IP: AH transformation"
316 select XFRM_ALGO 341 select XFRM_ALGO
@@ -560,6 +585,27 @@ config TCP_CONG_ILLINOIS
560 For further details see: 585 For further details see:
561 http://www.ews.uiuc.edu/~shaoliu/tcpillinois/index.html 586 http://www.ews.uiuc.edu/~shaoliu/tcpillinois/index.html
562 587
588config TCP_CONG_DCTCP
589 tristate "DataCenter TCP (DCTCP)"
590 default n
591 ---help---
592 DCTCP leverages Explicit Congestion Notification (ECN) in the network to
593 provide multi-bit feedback to the end hosts. It is designed to provide:
594
595 - High burst tolerance (incast due to partition/aggregate),
596 - Low latency (short flows, queries),
597 - High throughput (continuous data updates, large file transfers) with
598 commodity, shallow-buffered switches.
599
600 All switches in the data center network running DCTCP must support
601 ECN marking and be configured for marking when reaching defined switch
602 buffer thresholds. The default ECN marking threshold heuristic for
603 DCTCP on switches is 20 packets (30KB) at 1Gbps, and 65 packets
604 (~100KB) at 10Gbps, but might need further careful tweaking.
605
606 For further details see:
607 http://simula.stanford.edu/~alizade/Site/DCTCP_files/dctcp-final.pdf
608
563choice 609choice
564 prompt "Default TCP congestion control" 610 prompt "Default TCP congestion control"
565 default DEFAULT_CUBIC 611 default DEFAULT_CUBIC
@@ -588,9 +634,11 @@ choice
588 config DEFAULT_WESTWOOD 634 config DEFAULT_WESTWOOD
589 bool "Westwood" if TCP_CONG_WESTWOOD=y 635 bool "Westwood" if TCP_CONG_WESTWOOD=y
590 636
637 config DEFAULT_DCTCP
638 bool "DCTCP" if TCP_CONG_DCTCP=y
639
591 config DEFAULT_RENO 640 config DEFAULT_RENO
592 bool "Reno" 641 bool "Reno"
593
594endchoice 642endchoice
595 643
596endif 644endif
@@ -610,6 +658,7 @@ config DEFAULT_TCP_CONG
610 default "westwood" if DEFAULT_WESTWOOD 658 default "westwood" if DEFAULT_WESTWOOD
611 default "veno" if DEFAULT_VENO 659 default "veno" if DEFAULT_VENO
612 default "reno" if DEFAULT_RENO 660 default "reno" if DEFAULT_RENO
661 default "dctcp" if DEFAULT_DCTCP
613 default "cubic" 662 default "cubic"
614 663
615config TCP_MD5SIG 664config TCP_MD5SIG
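
TCP_CONG_DCTCP above adds DCTCP as a loadable congestion control module; the system-wide default comes from the DEFAULT_TCP_CONG string, and an individual socket can opt in through the standard TCP_CONGESTION socket option. A minimal userspace illustration (not part of the patch):

    #include <stdio.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <netinet/in.h>
    #include <netinet/tcp.h>

    int main(void)
    {
            int fd = socket(AF_INET, SOCK_STREAM, 0);

            if (fd < 0)
                    return 1;

            /* fails if the dctcp module is not built or not loadable */
            if (setsockopt(fd, IPPROTO_TCP, TCP_CONGESTION,
                           "dctcp", strlen("dctcp")) < 0)
                    perror("TCP_CONGESTION");

            return 0;
    }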
diff --git a/net/ipv4/Makefile b/net/ipv4/Makefile
index 8ee1cd4053ee..518c04ed666e 100644
--- a/net/ipv4/Makefile
+++ b/net/ipv4/Makefile
@@ -20,6 +20,7 @@ obj-$(CONFIG_IP_MULTIPLE_TABLES) += fib_rules.o
20obj-$(CONFIG_IP_MROUTE) += ipmr.o 20obj-$(CONFIG_IP_MROUTE) += ipmr.o
21obj-$(CONFIG_NET_IPIP) += ipip.o 21obj-$(CONFIG_NET_IPIP) += ipip.o
22gre-y := gre_demux.o 22gre-y := gre_demux.o
23obj-$(CONFIG_NET_FOU) += fou.o
23obj-$(CONFIG_NET_IPGRE_DEMUX) += gre.o 24obj-$(CONFIG_NET_IPGRE_DEMUX) += gre.o
24obj-$(CONFIG_NET_IPGRE) += ip_gre.o 25obj-$(CONFIG_NET_IPGRE) += ip_gre.o
25obj-$(CONFIG_NET_UDP_TUNNEL) += udp_tunnel.o 26obj-$(CONFIG_NET_UDP_TUNNEL) += udp_tunnel.o
@@ -42,6 +43,7 @@ obj-$(CONFIG_INET_UDP_DIAG) += udp_diag.o
42obj-$(CONFIG_NET_TCPPROBE) += tcp_probe.o 43obj-$(CONFIG_NET_TCPPROBE) += tcp_probe.o
43obj-$(CONFIG_TCP_CONG_BIC) += tcp_bic.o 44obj-$(CONFIG_TCP_CONG_BIC) += tcp_bic.o
44obj-$(CONFIG_TCP_CONG_CUBIC) += tcp_cubic.o 45obj-$(CONFIG_TCP_CONG_CUBIC) += tcp_cubic.o
46obj-$(CONFIG_TCP_CONG_DCTCP) += tcp_dctcp.o
45obj-$(CONFIG_TCP_CONG_WESTWOOD) += tcp_westwood.o 47obj-$(CONFIG_TCP_CONG_WESTWOOD) += tcp_westwood.o
46obj-$(CONFIG_TCP_CONG_HSTCP) += tcp_highspeed.o 48obj-$(CONFIG_TCP_CONG_HSTCP) += tcp_highspeed.o
47obj-$(CONFIG_TCP_CONG_HYBLA) += tcp_hybla.o 49obj-$(CONFIG_TCP_CONG_HYBLA) += tcp_hybla.o
@@ -54,6 +56,7 @@ obj-$(CONFIG_TCP_CONG_YEAH) += tcp_yeah.o
54obj-$(CONFIG_TCP_CONG_ILLINOIS) += tcp_illinois.o 56obj-$(CONFIG_TCP_CONG_ILLINOIS) += tcp_illinois.o
55obj-$(CONFIG_MEMCG_KMEM) += tcp_memcontrol.o 57obj-$(CONFIG_MEMCG_KMEM) += tcp_memcontrol.o
56obj-$(CONFIG_NETLABEL) += cipso_ipv4.o 58obj-$(CONFIG_NETLABEL) += cipso_ipv4.o
59obj-$(CONFIG_GENEVE) += geneve.o
57 60
58obj-$(CONFIG_XFRM) += xfrm4_policy.o xfrm4_state.o xfrm4_input.o \ 61obj-$(CONFIG_XFRM) += xfrm4_policy.o xfrm4_state.o xfrm4_input.o \
59 xfrm4_output.o xfrm4_protocol.o 62 xfrm4_output.o xfrm4_protocol.o
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index d156b3c5f363..92db7a69f2b9 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -418,10 +418,6 @@ int inet_release(struct socket *sock)
418} 418}
419EXPORT_SYMBOL(inet_release); 419EXPORT_SYMBOL(inet_release);
420 420
421/* It is off by default, see below. */
422int sysctl_ip_nonlocal_bind __read_mostly;
423EXPORT_SYMBOL(sysctl_ip_nonlocal_bind);
424
425int inet_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len) 421int inet_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len)
426{ 422{
427 struct sockaddr_in *addr = (struct sockaddr_in *)uaddr; 423 struct sockaddr_in *addr = (struct sockaddr_in *)uaddr;
@@ -461,7 +457,7 @@ int inet_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len)
461 * is temporarily down) 457 * is temporarily down)
462 */ 458 */
463 err = -EADDRNOTAVAIL; 459 err = -EADDRNOTAVAIL;
464 if (!sysctl_ip_nonlocal_bind && 460 if (!net->ipv4.sysctl_ip_nonlocal_bind &&
465 !(inet->freebind || inet->transparent) && 461 !(inet->freebind || inet->transparent) &&
466 addr->sin_addr.s_addr != htonl(INADDR_ANY) && 462 addr->sin_addr.s_addr != htonl(INADDR_ANY) &&
467 chk_addr_ret != RTN_LOCAL && 463 chk_addr_ret != RTN_LOCAL &&
@@ -1201,40 +1197,6 @@ int inet_sk_rebuild_header(struct sock *sk)
1201} 1197}
1202EXPORT_SYMBOL(inet_sk_rebuild_header); 1198EXPORT_SYMBOL(inet_sk_rebuild_header);
1203 1199
1204static int inet_gso_send_check(struct sk_buff *skb)
1205{
1206 const struct net_offload *ops;
1207 const struct iphdr *iph;
1208 int proto;
1209 int ihl;
1210 int err = -EINVAL;
1211
1212 if (unlikely(!pskb_may_pull(skb, sizeof(*iph))))
1213 goto out;
1214
1215 iph = ip_hdr(skb);
1216 ihl = iph->ihl * 4;
1217 if (ihl < sizeof(*iph))
1218 goto out;
1219
1220 proto = iph->protocol;
1221
1222 /* Warning: after this point, iph might be no longer valid */
1223 if (unlikely(!pskb_may_pull(skb, ihl)))
1224 goto out;
1225 __skb_pull(skb, ihl);
1226
1227 skb_reset_transport_header(skb);
1228 err = -EPROTONOSUPPORT;
1229
1230 ops = rcu_dereference(inet_offloads[proto]);
1231 if (likely(ops && ops->callbacks.gso_send_check))
1232 err = ops->callbacks.gso_send_check(skb);
1233
1234out:
1235 return err;
1236}
1237
1238static struct sk_buff *inet_gso_segment(struct sk_buff *skb, 1200static struct sk_buff *inet_gso_segment(struct sk_buff *skb,
1239 netdev_features_t features) 1201 netdev_features_t features)
1240{ 1202{
@@ -1407,6 +1369,9 @@ static struct sk_buff **inet_gro_receive(struct sk_buff **head,
1407 * immediately following this IP hdr. 1369 * immediately following this IP hdr.
1408 */ 1370 */
1409 1371
1372 /* Note : No need to call skb_gro_postpull_rcsum() here,
1373 * as we already checked checksum over ipv4 header was 0
1374 */
1410 skb_gro_pull(skb, sizeof(*iph)); 1375 skb_gro_pull(skb, sizeof(*iph));
1411 skb_set_transport_header(skb, skb_gro_offset(skb)); 1376 skb_set_transport_header(skb, skb_gro_offset(skb));
1412 1377
@@ -1659,7 +1624,6 @@ static int ipv4_proc_init(void);
1659static struct packet_offload ip_packet_offload __read_mostly = { 1624static struct packet_offload ip_packet_offload __read_mostly = {
1660 .type = cpu_to_be16(ETH_P_IP), 1625 .type = cpu_to_be16(ETH_P_IP),
1661 .callbacks = { 1626 .callbacks = {
1662 .gso_send_check = inet_gso_send_check,
1663 .gso_segment = inet_gso_segment, 1627 .gso_segment = inet_gso_segment,
1664 .gro_receive = inet_gro_receive, 1628 .gro_receive = inet_gro_receive,
1665 .gro_complete = inet_gro_complete, 1629 .gro_complete = inet_gro_complete,
@@ -1668,8 +1632,9 @@ static struct packet_offload ip_packet_offload __read_mostly = {
1668 1632
1669static const struct net_offload ipip_offload = { 1633static const struct net_offload ipip_offload = {
1670 .callbacks = { 1634 .callbacks = {
1671 .gso_send_check = inet_gso_send_check,
1672 .gso_segment = inet_gso_segment, 1635 .gso_segment = inet_gso_segment,
1636 .gro_receive = inet_gro_receive,
1637 .gro_complete = inet_gro_complete,
1673 }, 1638 },
1674}; 1639};
1675 1640
diff --git a/net/ipv4/ah4.c b/net/ipv4/ah4.c
index a2afa89513a0..ac9a32ec3ee4 100644
--- a/net/ipv4/ah4.c
+++ b/net/ipv4/ah4.c
@@ -505,8 +505,6 @@ static int ah_init_state(struct xfrm_state *x)
505 ahp->icv_full_len = aalg_desc->uinfo.auth.icv_fullbits/8; 505 ahp->icv_full_len = aalg_desc->uinfo.auth.icv_fullbits/8;
506 ahp->icv_trunc_len = x->aalg->alg_trunc_len/8; 506 ahp->icv_trunc_len = x->aalg->alg_trunc_len/8;
507 507
508 BUG_ON(ahp->icv_trunc_len > MAX_AH_AUTH_LEN);
509
510 if (x->props.flags & XFRM_STATE_ALIGN4) 508 if (x->props.flags & XFRM_STATE_ALIGN4)
511 x->props.header_len = XFRM_ALIGN4(sizeof(struct ip_auth_hdr) + 509 x->props.header_len = XFRM_ALIGN4(sizeof(struct ip_auth_hdr) +
512 ahp->icv_trunc_len); 510 ahp->icv_trunc_len);
diff --git a/net/ipv4/arp.c b/net/ipv4/arp.c
index 1a9b99e04465..16acb59d665e 100644
--- a/net/ipv4/arp.c
+++ b/net/ipv4/arp.c
@@ -953,10 +953,11 @@ static int arp_rcv(struct sk_buff *skb, struct net_device *dev,
953{ 953{
954 const struct arphdr *arp; 954 const struct arphdr *arp;
955 955
956 /* do not tweak dropwatch on an ARP we will ignore */
956 if (dev->flags & IFF_NOARP || 957 if (dev->flags & IFF_NOARP ||
957 skb->pkt_type == PACKET_OTHERHOST || 958 skb->pkt_type == PACKET_OTHERHOST ||
958 skb->pkt_type == PACKET_LOOPBACK) 959 skb->pkt_type == PACKET_LOOPBACK)
959 goto freeskb; 960 goto consumeskb;
960 961
961 skb = skb_share_check(skb, GFP_ATOMIC); 962 skb = skb_share_check(skb, GFP_ATOMIC);
962 if (!skb) 963 if (!skb)
@@ -974,6 +975,9 @@ static int arp_rcv(struct sk_buff *skb, struct net_device *dev,
974 975
975 return NF_HOOK(NFPROTO_ARP, NF_ARP_IN, skb, dev, NULL, arp_process); 976 return NF_HOOK(NFPROTO_ARP, NF_ARP_IN, skb, dev, NULL, arp_process);
976 977
978consumeskb:
979 consume_skb(skb);
980 return 0;
977freeskb: 981freeskb:
978 kfree_skb(skb); 982 kfree_skb(skb);
979out_of_mem: 983out_of_mem:
diff --git a/net/ipv4/cipso_ipv4.c b/net/ipv4/cipso_ipv4.c
index 05b708bbdb0d..4715f25dfe03 100644
--- a/net/ipv4/cipso_ipv4.c
+++ b/net/ipv4/cipso_ipv4.c
@@ -246,7 +246,7 @@ static u32 cipso_v4_map_cache_hash(const unsigned char *key, u32 key_len)
246 * success, negative values on error. 246 * success, negative values on error.
247 * 247 *
248 */ 248 */
249static int cipso_v4_cache_init(void) 249static int __init cipso_v4_cache_init(void)
250{ 250{
251 u32 iter; 251 u32 iter;
252 252
diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c
index 255aa9946fe7..23104a3f2924 100644
--- a/net/ipv4/fib_frontend.c
+++ b/net/ipv4/fib_frontend.c
@@ -243,7 +243,7 @@ static int __fib_validate_source(struct sk_buff *skb, __be32 src, __be32 dst,
243 u8 tos, int oif, struct net_device *dev, 243 u8 tos, int oif, struct net_device *dev,
244 int rpf, struct in_device *idev, u32 *itag) 244 int rpf, struct in_device *idev, u32 *itag)
245{ 245{
246 int ret, no_addr, accept_local; 246 int ret, no_addr;
247 struct fib_result res; 247 struct fib_result res;
248 struct flowi4 fl4; 248 struct flowi4 fl4;
249 struct net *net; 249 struct net *net;
@@ -258,16 +258,17 @@ static int __fib_validate_source(struct sk_buff *skb, __be32 src, __be32 dst,
258 258
259 no_addr = idev->ifa_list == NULL; 259 no_addr = idev->ifa_list == NULL;
260 260
261 accept_local = IN_DEV_ACCEPT_LOCAL(idev);
262 fl4.flowi4_mark = IN_DEV_SRC_VMARK(idev) ? skb->mark : 0; 261 fl4.flowi4_mark = IN_DEV_SRC_VMARK(idev) ? skb->mark : 0;
263 262
264 net = dev_net(dev); 263 net = dev_net(dev);
265 if (fib_lookup(net, &fl4, &res)) 264 if (fib_lookup(net, &fl4, &res))
266 goto last_resort; 265 goto last_resort;
267 if (res.type != RTN_UNICAST) { 266 if (res.type != RTN_UNICAST &&
268 if (res.type != RTN_LOCAL || !accept_local) 267 (res.type != RTN_LOCAL || !IN_DEV_ACCEPT_LOCAL(idev)))
269 goto e_inval; 268 goto e_inval;
270 } 269 if (!rpf && !fib_num_tclassid_users(dev_net(dev)) &&
270 (dev->ifindex != oif || !IN_DEV_TX_REDIRECTS(idev)))
271 goto last_resort;
271 fib_combine_itag(itag, &res); 272 fib_combine_itag(itag, &res);
272 dev_match = false; 273 dev_match = false;
273 274
@@ -321,6 +322,7 @@ int fib_validate_source(struct sk_buff *skb, __be32 src, __be32 dst,
321 int r = secpath_exists(skb) ? 0 : IN_DEV_RPFILTER(idev); 322 int r = secpath_exists(skb) ? 0 : IN_DEV_RPFILTER(idev);
322 323
323 if (!r && !fib_num_tclassid_users(dev_net(dev)) && 324 if (!r && !fib_num_tclassid_users(dev_net(dev)) &&
325 IN_DEV_ACCEPT_LOCAL(idev) &&
324 (dev->ifindex != oif || !IN_DEV_TX_REDIRECTS(idev))) { 326 (dev->ifindex != oif || !IN_DEV_TX_REDIRECTS(idev))) {
325 *itag = 0; 327 *itag = 0;
326 return 0; 328 return 0;
diff --git a/net/ipv4/fib_semantics.c b/net/ipv4/fib_semantics.c
index b10cd43a4722..5b6efb3d2308 100644
--- a/net/ipv4/fib_semantics.c
+++ b/net/ipv4/fib_semantics.c
@@ -157,9 +157,12 @@ static void rt_fibinfo_free(struct rtable __rcu **rtp)
157 157
158static void free_nh_exceptions(struct fib_nh *nh) 158static void free_nh_exceptions(struct fib_nh *nh)
159{ 159{
160 struct fnhe_hash_bucket *hash = nh->nh_exceptions; 160 struct fnhe_hash_bucket *hash;
161 int i; 161 int i;
162 162
163 hash = rcu_dereference_protected(nh->nh_exceptions, 1);
164 if (!hash)
165 return;
163 for (i = 0; i < FNHE_HASH_SIZE; i++) { 166 for (i = 0; i < FNHE_HASH_SIZE; i++) {
164 struct fib_nh_exception *fnhe; 167 struct fib_nh_exception *fnhe;
165 168
@@ -205,8 +208,7 @@ static void free_fib_info_rcu(struct rcu_head *head)
205 change_nexthops(fi) { 208 change_nexthops(fi) {
206 if (nexthop_nh->nh_dev) 209 if (nexthop_nh->nh_dev)
207 dev_put(nexthop_nh->nh_dev); 210 dev_put(nexthop_nh->nh_dev);
208 if (nexthop_nh->nh_exceptions) 211 free_nh_exceptions(nexthop_nh);
209 free_nh_exceptions(nexthop_nh);
210 rt_fibinfo_free_cpus(nexthop_nh->nh_pcpu_rth_output); 212 rt_fibinfo_free_cpus(nexthop_nh->nh_pcpu_rth_output);
211 rt_fibinfo_free(&nexthop_nh->nh_rth_input); 213 rt_fibinfo_free(&nexthop_nh->nh_rth_input);
212 } endfor_nexthops(fi); 214 } endfor_nexthops(fi);
diff --git a/net/ipv4/fou.c b/net/ipv4/fou.c
new file mode 100644
index 000000000000..efa70ad44906
--- /dev/null
+++ b/net/ipv4/fou.c
@@ -0,0 +1,514 @@
1#include <linux/module.h>
2#include <linux/errno.h>
3#include <linux/socket.h>
4#include <linux/skbuff.h>
5#include <linux/ip.h>
6#include <linux/udp.h>
7#include <linux/types.h>
8#include <linux/kernel.h>
9#include <net/genetlink.h>
10#include <net/gue.h>
11#include <net/ip.h>
12#include <net/protocol.h>
13#include <net/udp.h>
14#include <net/udp_tunnel.h>
15#include <net/xfrm.h>
16#include <uapi/linux/fou.h>
17#include <uapi/linux/genetlink.h>
18
19static DEFINE_SPINLOCK(fou_lock);
20static LIST_HEAD(fou_list);
21
22struct fou {
23 struct socket *sock;
24 u8 protocol;
25 u16 port;
26 struct udp_offload udp_offloads;
27 struct list_head list;
28};
29
30struct fou_cfg {
31 u16 type;
32 u8 protocol;
33 struct udp_port_cfg udp_config;
34};
35
36static inline struct fou *fou_from_sock(struct sock *sk)
37{
38 return sk->sk_user_data;
39}
40
41static int fou_udp_encap_recv_deliver(struct sk_buff *skb,
42 u8 protocol, size_t len)
43{
44 struct iphdr *iph = ip_hdr(skb);
45
46 /* Remove 'len' bytes from the packet (UDP header and
47 * FOU header if present), modify the protocol to the one
48 * we found, and then call rcv_encap.
49 */
50 iph->tot_len = htons(ntohs(iph->tot_len) - len);
51 __skb_pull(skb, len);
52 skb_postpull_rcsum(skb, udp_hdr(skb), len);
53 skb_reset_transport_header(skb);
54
55 return -protocol;
56}
57
58static int fou_udp_recv(struct sock *sk, struct sk_buff *skb)
59{
60 struct fou *fou = fou_from_sock(sk);
61
62 if (!fou)
63 return 1;
64
65 return fou_udp_encap_recv_deliver(skb, fou->protocol,
66 sizeof(struct udphdr));
67}
68
69static int gue_udp_recv(struct sock *sk, struct sk_buff *skb)
70{
71 struct fou *fou = fou_from_sock(sk);
72 size_t len;
73 struct guehdr *guehdr;
74 struct udphdr *uh;
75
76 if (!fou)
77 return 1;
78
79 len = sizeof(struct udphdr) + sizeof(struct guehdr);
80 if (!pskb_may_pull(skb, len))
81 goto drop;
82
83 uh = udp_hdr(skb);
84 guehdr = (struct guehdr *)&uh[1];
85
86 len += guehdr->hlen << 2;
87 if (!pskb_may_pull(skb, len))
88 goto drop;
89
90 if (guehdr->version != 0)
91 goto drop;
92
93 if (guehdr->flags) {
94 /* No support yet */
95 goto drop;
96 }
97
98 return fou_udp_encap_recv_deliver(skb, guehdr->next_hdr, len);
99drop:
100 kfree_skb(skb);
101 return 0;
102}
103
104static struct sk_buff **fou_gro_receive(struct sk_buff **head,
105 struct sk_buff *skb)
106{
107 const struct net_offload *ops;
108 struct sk_buff **pp = NULL;
109 u8 proto = NAPI_GRO_CB(skb)->proto;
110 const struct net_offload **offloads;
111
112 rcu_read_lock();
113 offloads = NAPI_GRO_CB(skb)->is_ipv6 ? inet6_offloads : inet_offloads;
114 ops = rcu_dereference(offloads[proto]);
115 if (!ops || !ops->callbacks.gro_receive)
116 goto out_unlock;
117
118 pp = ops->callbacks.gro_receive(head, skb);
119
120out_unlock:
121 rcu_read_unlock();
122
123 return pp;
124}
125
126static int fou_gro_complete(struct sk_buff *skb, int nhoff)
127{
128 const struct net_offload *ops;
129 u8 proto = NAPI_GRO_CB(skb)->proto;
130 int err = -ENOSYS;
131 const struct net_offload **offloads;
132
133 rcu_read_lock();
134 offloads = NAPI_GRO_CB(skb)->is_ipv6 ? inet6_offloads : inet_offloads;
135 ops = rcu_dereference(offloads[proto]);
136 if (WARN_ON(!ops || !ops->callbacks.gro_complete))
137 goto out_unlock;
138
139 err = ops->callbacks.gro_complete(skb, nhoff);
140
141out_unlock:
142 rcu_read_unlock();
143
144 return err;
145}
146
147static struct sk_buff **gue_gro_receive(struct sk_buff **head,
148 struct sk_buff *skb)
149{
150 const struct net_offload **offloads;
151 const struct net_offload *ops;
152 struct sk_buff **pp = NULL;
153 struct sk_buff *p;
154 u8 proto;
155 struct guehdr *guehdr;
156 unsigned int hlen, guehlen;
157 unsigned int off;
158 int flush = 1;
159
160 off = skb_gro_offset(skb);
161 hlen = off + sizeof(*guehdr);
162 guehdr = skb_gro_header_fast(skb, off);
163 if (skb_gro_header_hard(skb, hlen)) {
164 guehdr = skb_gro_header_slow(skb, hlen, off);
165 if (unlikely(!guehdr))
166 goto out;
167 }
168
169 proto = guehdr->next_hdr;
170
171 rcu_read_lock();
172 offloads = NAPI_GRO_CB(skb)->is_ipv6 ? inet6_offloads : inet_offloads;
173 ops = rcu_dereference(offloads[proto]);
174 if (WARN_ON(!ops || !ops->callbacks.gro_receive))
175 goto out_unlock;
176
177 guehlen = sizeof(*guehdr) + (guehdr->hlen << 2);
178
179 hlen = off + guehlen;
180 if (skb_gro_header_hard(skb, hlen)) {
181 guehdr = skb_gro_header_slow(skb, hlen, off);
182 if (unlikely(!guehdr))
183 goto out_unlock;
184 }
185
186 flush = 0;
187
188 for (p = *head; p; p = p->next) {
189 const struct guehdr *guehdr2;
190
191 if (!NAPI_GRO_CB(p)->same_flow)
192 continue;
193
194 guehdr2 = (struct guehdr *)(p->data + off);
195
196 /* Compare base GUE header to be equal (covers
 197 * hlen, version, next_hdr, and flags).
198 */
199 if (guehdr->word != guehdr2->word) {
200 NAPI_GRO_CB(p)->same_flow = 0;
201 continue;
202 }
203
 204 /* Compare that the optional fields are the same. */
205 if (guehdr->hlen && memcmp(&guehdr[1], &guehdr2[1],
206 guehdr->hlen << 2)) {
207 NAPI_GRO_CB(p)->same_flow = 0;
208 continue;
209 }
210 }
211
212 skb_gro_pull(skb, guehlen);
213
 214 /* Adjust NAPI_GRO_CB(skb)->csum after skb_gro_pull() */
215 skb_gro_postpull_rcsum(skb, guehdr, guehlen);
216
217 pp = ops->callbacks.gro_receive(head, skb);
218
219out_unlock:
220 rcu_read_unlock();
221out:
222 NAPI_GRO_CB(skb)->flush |= flush;
223
224 return pp;
225}
226
227static int gue_gro_complete(struct sk_buff *skb, int nhoff)
228{
229 const struct net_offload **offloads;
230 struct guehdr *guehdr = (struct guehdr *)(skb->data + nhoff);
231 const struct net_offload *ops;
232 unsigned int guehlen;
233 u8 proto;
234 int err = -ENOENT;
235
236 proto = guehdr->next_hdr;
237
238 guehlen = sizeof(*guehdr) + (guehdr->hlen << 2);
239
240 rcu_read_lock();
241 offloads = NAPI_GRO_CB(skb)->is_ipv6 ? inet6_offloads : inet_offloads;
242 ops = rcu_dereference(offloads[proto]);
243 if (WARN_ON(!ops || !ops->callbacks.gro_complete))
244 goto out_unlock;
245
246 err = ops->callbacks.gro_complete(skb, nhoff + guehlen);
247
248out_unlock:
249 rcu_read_unlock();
250 return err;
251}
252
253static int fou_add_to_port_list(struct fou *fou)
254{
255 struct fou *fout;
256
257 spin_lock(&fou_lock);
258 list_for_each_entry(fout, &fou_list, list) {
259 if (fou->port == fout->port) {
260 spin_unlock(&fou_lock);
261 return -EALREADY;
262 }
263 }
264
265 list_add(&fou->list, &fou_list);
266 spin_unlock(&fou_lock);
267
268 return 0;
269}
270
271static void fou_release(struct fou *fou)
272{
273 struct socket *sock = fou->sock;
274 struct sock *sk = sock->sk;
275
276 udp_del_offload(&fou->udp_offloads);
277
278 list_del(&fou->list);
279
280 /* Remove hooks into tunnel socket */
281 sk->sk_user_data = NULL;
282
283 sock_release(sock);
284
285 kfree(fou);
286}
287
288static int fou_encap_init(struct sock *sk, struct fou *fou, struct fou_cfg *cfg)
289{
290 udp_sk(sk)->encap_rcv = fou_udp_recv;
291 fou->protocol = cfg->protocol;
292 fou->udp_offloads.callbacks.gro_receive = fou_gro_receive;
293 fou->udp_offloads.callbacks.gro_complete = fou_gro_complete;
294 fou->udp_offloads.port = cfg->udp_config.local_udp_port;
295 fou->udp_offloads.ipproto = cfg->protocol;
296
297 return 0;
298}
299
300static int gue_encap_init(struct sock *sk, struct fou *fou, struct fou_cfg *cfg)
301{
302 udp_sk(sk)->encap_rcv = gue_udp_recv;
303 fou->udp_offloads.callbacks.gro_receive = gue_gro_receive;
304 fou->udp_offloads.callbacks.gro_complete = gue_gro_complete;
305 fou->udp_offloads.port = cfg->udp_config.local_udp_port;
306
307 return 0;
308}
309
310static int fou_create(struct net *net, struct fou_cfg *cfg,
311 struct socket **sockp)
312{
313 struct fou *fou = NULL;
314 int err;
315 struct socket *sock = NULL;
316 struct sock *sk;
317
318 /* Open UDP socket */
319 err = udp_sock_create(net, &cfg->udp_config, &sock);
320 if (err < 0)
321 goto error;
322
323 /* Allocate FOU port structure */
324 fou = kzalloc(sizeof(*fou), GFP_KERNEL);
325 if (!fou) {
326 err = -ENOMEM;
327 goto error;
328 }
329
330 sk = sock->sk;
331
332 fou->port = cfg->udp_config.local_udp_port;
333
 334 /* Initialize according to the fou encapsulation type */
335 switch (cfg->type) {
336 case FOU_ENCAP_DIRECT:
337 err = fou_encap_init(sk, fou, cfg);
338 if (err)
339 goto error;
340 break;
341 case FOU_ENCAP_GUE:
342 err = gue_encap_init(sk, fou, cfg);
343 if (err)
344 goto error;
345 break;
346 default:
347 err = -EINVAL;
348 goto error;
349 }
350
351 udp_sk(sk)->encap_type = 1;
352 udp_encap_enable();
353
354 sk->sk_user_data = fou;
355 fou->sock = sock;
356
357 udp_set_convert_csum(sk, true);
358
359 sk->sk_allocation = GFP_ATOMIC;
360
361 if (cfg->udp_config.family == AF_INET) {
362 err = udp_add_offload(&fou->udp_offloads);
363 if (err)
364 goto error;
365 }
366
367 err = fou_add_to_port_list(fou);
368 if (err)
369 goto error;
370
371 if (sockp)
372 *sockp = sock;
373
374 return 0;
375
376error:
377 kfree(fou);
378 if (sock)
379 sock_release(sock);
380
381 return err;
382}
383
384static int fou_destroy(struct net *net, struct fou_cfg *cfg)
385{
386 struct fou *fou;
387 u16 port = cfg->udp_config.local_udp_port;
388 int err = -EINVAL;
389
390 spin_lock(&fou_lock);
391 list_for_each_entry(fou, &fou_list, list) {
392 if (fou->port == port) {
393 udp_del_offload(&fou->udp_offloads);
394 fou_release(fou);
395 err = 0;
396 break;
397 }
398 }
399 spin_unlock(&fou_lock);
400
401 return err;
402}
403
404static struct genl_family fou_nl_family = {
405 .id = GENL_ID_GENERATE,
406 .hdrsize = 0,
407 .name = FOU_GENL_NAME,
408 .version = FOU_GENL_VERSION,
409 .maxattr = FOU_ATTR_MAX,
410 .netnsok = true,
411};
412
413static struct nla_policy fou_nl_policy[FOU_ATTR_MAX + 1] = {
414 [FOU_ATTR_PORT] = { .type = NLA_U16, },
415 [FOU_ATTR_AF] = { .type = NLA_U8, },
416 [FOU_ATTR_IPPROTO] = { .type = NLA_U8, },
417 [FOU_ATTR_TYPE] = { .type = NLA_U8, },
418};
419
420static int parse_nl_config(struct genl_info *info,
421 struct fou_cfg *cfg)
422{
423 memset(cfg, 0, sizeof(*cfg));
424
425 cfg->udp_config.family = AF_INET;
426
427 if (info->attrs[FOU_ATTR_AF]) {
428 u8 family = nla_get_u8(info->attrs[FOU_ATTR_AF]);
429
430 if (family != AF_INET && family != AF_INET6)
431 return -EINVAL;
432
433 cfg->udp_config.family = family;
434 }
435
436 if (info->attrs[FOU_ATTR_PORT]) {
437 u16 port = nla_get_u16(info->attrs[FOU_ATTR_PORT]);
438
439 cfg->udp_config.local_udp_port = port;
440 }
441
442 if (info->attrs[FOU_ATTR_IPPROTO])
443 cfg->protocol = nla_get_u8(info->attrs[FOU_ATTR_IPPROTO]);
444
445 if (info->attrs[FOU_ATTR_TYPE])
446 cfg->type = nla_get_u8(info->attrs[FOU_ATTR_TYPE]);
447
448 return 0;
449}
450
451static int fou_nl_cmd_add_port(struct sk_buff *skb, struct genl_info *info)
452{
453 struct fou_cfg cfg;
454 int err;
455
456 err = parse_nl_config(info, &cfg);
457 if (err)
458 return err;
459
460 return fou_create(&init_net, &cfg, NULL);
461}
462
463static int fou_nl_cmd_rm_port(struct sk_buff *skb, struct genl_info *info)
464{
465 struct fou_cfg cfg;
466
467 if (parse_nl_config(info, &cfg))
468 return -EINVAL;
469 return fou_destroy(&init_net, &cfg);
470}
471
472static const struct genl_ops fou_nl_ops[] = {
473 {
474 .cmd = FOU_CMD_ADD,
475 .doit = fou_nl_cmd_add_port,
476 .policy = fou_nl_policy,
477 .flags = GENL_ADMIN_PERM,
478 },
479 {
480 .cmd = FOU_CMD_DEL,
481 .doit = fou_nl_cmd_rm_port,
482 .policy = fou_nl_policy,
483 .flags = GENL_ADMIN_PERM,
484 },
485};
486
487static int __init fou_init(void)
488{
489 int ret;
490
491 ret = genl_register_family_with_ops(&fou_nl_family,
492 fou_nl_ops);
493
494 return ret;
495}
496
497static void __exit fou_fini(void)
498{
499 struct fou *fou, *next;
500
501 genl_unregister_family(&fou_nl_family);
502
503 /* Close all the FOU sockets */
504
505 spin_lock(&fou_lock);
506 list_for_each_entry_safe(fou, next, &fou_list, list)
507 fou_release(fou);
508 spin_unlock(&fou_lock);
509}
510
511module_init(fou_init);
512module_exit(fou_fini);
513MODULE_AUTHOR("Tom Herbert <therbert@google.com>");
514MODULE_LICENSE("GPL");
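The fou module above is driven entirely by the new "fou" generic netlink family (FOU_CMD_ADD/FOU_CMD_DEL plus the FOU_ATTR_* attributes handled in parse_nl_config()). As an illustration only, and not part of this patch set, the sketch below shows how a userspace program might add a FOU receive port with libnl-genl-3; it assumes the exported <linux/fou.h> uapi header is installed. A sufficiently recent iproute2 exposes the same operation as "ip fou add port 5555 ipproto 47".

/* Hypothetical userspace sketch: add a FOU receive port via generic netlink.
 * Assumes libnl-genl-3 and the uapi <linux/fou.h> header; error handling is
 * kept to the minimum needed to stay correct.
 */
#include <stdint.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <linux/fou.h>
#include <netlink/netlink.h>
#include <netlink/genl/genl.h>
#include <netlink/genl/ctrl.h>

int fou_add_port(uint16_t port, uint8_t ipproto)
{
	struct nl_sock *sk;
	struct nl_msg *msg;
	int family, err = -1;

	sk = nl_socket_alloc();
	if (!sk)
		return -1;
	if (genl_connect(sk) < 0)
		goto out;

	/* Resolve the "fou" family registered by fou_init() above */
	family = genl_ctrl_resolve(sk, FOU_GENL_NAME);
	if (family < 0)
		goto out;

	msg = nlmsg_alloc();
	if (!msg)
		goto out;

	genlmsg_put(msg, NL_AUTO_PORT, NL_AUTO_SEQ, family, 0, 0,
		    FOU_CMD_ADD, FOU_GENL_VERSION);
	/* FOU_ATTR_PORT is carried in network byte order; fou_create() copies
	 * it straight into udp_config.local_udp_port.
	 */
	nla_put_u16(msg, FOU_ATTR_PORT, htons(port));
	nla_put_u8(msg, FOU_ATTR_IPPROTO, ipproto);	/* e.g. IPPROTO_GRE */
	nla_put_u8(msg, FOU_ATTR_TYPE, FOU_ENCAP_DIRECT);

	err = nl_send_auto(sk, msg) < 0 ? -1 : 0;
	nlmsg_free(msg);
out:
	nl_socket_free(sk);
	return err;
}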
diff --git a/net/ipv4/geneve.c b/net/ipv4/geneve.c
new file mode 100644
index 000000000000..065cd94c640c
--- /dev/null
+++ b/net/ipv4/geneve.c
@@ -0,0 +1,373 @@
1/*
2 * Geneve: Generic Network Virtualization Encapsulation
3 *
4 * Copyright (c) 2014 Nicira, Inc.
5 *
6 * This program is free software; you can redistribute it and/or
7 * modify it under the terms of the GNU General Public License
8 * as published by the Free Software Foundation; either version
9 * 2 of the License, or (at your option) any later version.
10 */
11
12#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
13
14#include <linux/kernel.h>
15#include <linux/types.h>
16#include <linux/module.h>
17#include <linux/errno.h>
18#include <linux/slab.h>
19#include <linux/skbuff.h>
20#include <linux/rculist.h>
21#include <linux/netdevice.h>
22#include <linux/in.h>
23#include <linux/ip.h>
24#include <linux/udp.h>
25#include <linux/igmp.h>
26#include <linux/etherdevice.h>
27#include <linux/if_ether.h>
28#include <linux/if_vlan.h>
29#include <linux/hash.h>
30#include <linux/ethtool.h>
31#include <net/arp.h>
32#include <net/ndisc.h>
33#include <net/ip.h>
34#include <net/ip_tunnels.h>
35#include <net/icmp.h>
36#include <net/udp.h>
37#include <net/rtnetlink.h>
38#include <net/route.h>
39#include <net/dsfield.h>
40#include <net/inet_ecn.h>
41#include <net/net_namespace.h>
42#include <net/netns/generic.h>
43#include <net/geneve.h>
44#include <net/protocol.h>
45#include <net/udp_tunnel.h>
46#if IS_ENABLED(CONFIG_IPV6)
47#include <net/ipv6.h>
48#include <net/addrconf.h>
49#include <net/ip6_tunnel.h>
50#include <net/ip6_checksum.h>
51#endif
52
53#define PORT_HASH_BITS 8
54#define PORT_HASH_SIZE (1<<PORT_HASH_BITS)
55
56/* per-network namespace private data for this module */
57struct geneve_net {
58 struct hlist_head sock_list[PORT_HASH_SIZE];
59 spinlock_t sock_lock; /* Protects sock_list */
60};
61
62static int geneve_net_id;
63
64static struct workqueue_struct *geneve_wq;
65
66static inline struct genevehdr *geneve_hdr(const struct sk_buff *skb)
67{
68 return (struct genevehdr *)(udp_hdr(skb) + 1);
69}
70
71static struct hlist_head *gs_head(struct net *net, __be16 port)
72{
73 struct geneve_net *gn = net_generic(net, geneve_net_id);
74
75 return &gn->sock_list[hash_32(ntohs(port), PORT_HASH_BITS)];
76}
77
78/* Find geneve socket based on network namespace and UDP port */
79static struct geneve_sock *geneve_find_sock(struct net *net, __be16 port)
80{
81 struct geneve_sock *gs;
82
83 hlist_for_each_entry_rcu(gs, gs_head(net, port), hlist) {
84 if (inet_sk(gs->sock->sk)->inet_sport == port)
85 return gs;
86 }
87
88 return NULL;
89}
90
91static void geneve_build_header(struct genevehdr *geneveh,
92 __be16 tun_flags, u8 vni[3],
93 u8 options_len, u8 *options)
94{
95 geneveh->ver = GENEVE_VER;
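 /* opt_len is expressed in multiples of four bytes */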
96 geneveh->opt_len = options_len / 4;
97 geneveh->oam = !!(tun_flags & TUNNEL_OAM);
98 geneveh->critical = !!(tun_flags & TUNNEL_CRIT_OPT);
99 geneveh->rsvd1 = 0;
100 memcpy(geneveh->vni, vni, 3);
101 geneveh->proto_type = htons(ETH_P_TEB);
102 geneveh->rsvd2 = 0;
103
104 memcpy(geneveh->options, options, options_len);
105}
106
107/* Transmit a fully formatted Geneve frame.
108 *
109 * When calling this function, skb->data should point
110 * to the Geneve header, which must be fully formed.
111 *
112 * This function will add other UDP tunnel headers.
113 */
114int geneve_xmit_skb(struct geneve_sock *gs, struct rtable *rt,
115 struct sk_buff *skb, __be32 src, __be32 dst, __u8 tos,
116 __u8 ttl, __be16 df, __be16 src_port, __be16 dst_port,
117 __be16 tun_flags, u8 vni[3], u8 opt_len, u8 *opt,
118 bool xnet)
119{
120 struct genevehdr *gnvh;
121 int min_headroom;
122 int err;
123
124 skb = udp_tunnel_handle_offloads(skb, !gs->sock->sk->sk_no_check_tx);
 if (IS_ERR(skb))
 return PTR_ERR(skb);
125
126 min_headroom = LL_RESERVED_SPACE(rt->dst.dev) + rt->dst.header_len
127 + GENEVE_BASE_HLEN + opt_len + sizeof(struct iphdr)
128 + (vlan_tx_tag_present(skb) ? VLAN_HLEN : 0);
129
130 err = skb_cow_head(skb, min_headroom);
131 if (unlikely(err))
132 return err;
133
134 if (vlan_tx_tag_present(skb)) {
135 if (unlikely(!__vlan_put_tag(skb,
136 skb->vlan_proto,
137 vlan_tx_tag_get(skb)))) {
138 err = -ENOMEM;
139 return err;
140 }
141 skb->vlan_tci = 0;
142 }
143
144 gnvh = (struct genevehdr *)__skb_push(skb, sizeof(*gnvh) + opt_len);
145 geneve_build_header(gnvh, tun_flags, vni, opt_len, opt);
146
147 return udp_tunnel_xmit_skb(gs->sock, rt, skb, src, dst,
148 tos, ttl, df, src_port, dst_port, xnet);
149}
150EXPORT_SYMBOL_GPL(geneve_xmit_skb);
151
152static void geneve_notify_add_rx_port(struct geneve_sock *gs)
153{
154 struct sock *sk = gs->sock->sk;
155 sa_family_t sa_family = sk->sk_family;
156 int err;
157
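 /* GRO offload hooks are only registered for IPv4 sockets here */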
158 if (sa_family == AF_INET) {
159 err = udp_add_offload(&gs->udp_offloads);
160 if (err)
161 pr_warn("geneve: udp_add_offload failed with status %d\n",
162 err);
163 }
164}
165
166/* Callback from net/ipv4/udp.c to receive packets */
167static int geneve_udp_encap_recv(struct sock *sk, struct sk_buff *skb)
168{
169 struct genevehdr *geneveh;
170 struct geneve_sock *gs;
171 int opts_len;
172
173 /* Need Geneve and inner Ethernet header to be present */
174 if (unlikely(!pskb_may_pull(skb, GENEVE_BASE_HLEN)))
175 goto error;
176
177 /* Return packets with reserved bits set */
178 geneveh = geneve_hdr(skb);
179
180 if (unlikely(geneveh->ver != GENEVE_VER))
181 goto error;
182
183 if (unlikely(geneveh->proto_type != htons(ETH_P_TEB)))
184 goto error;
185
186 opts_len = geneveh->opt_len * 4;
187 if (iptunnel_pull_header(skb, GENEVE_BASE_HLEN + opts_len,
188 htons(ETH_P_TEB)))
189 goto drop;
190
191 gs = rcu_dereference_sk_user_data(sk);
192 if (!gs)
193 goto drop;
194
195 gs->rcv(gs, skb);
196 return 0;
197
198drop:
199 /* Consume bad packet */
200 kfree_skb(skb);
201 return 0;
202
203error:
204 /* Let the UDP layer deal with the skb */
205 return 1;
206}
207
208static void geneve_del_work(struct work_struct *work)
209{
210 struct geneve_sock *gs = container_of(work, struct geneve_sock,
211 del_work);
212
213 udp_tunnel_sock_release(gs->sock);
214 kfree_rcu(gs, rcu);
215}
216
217static struct socket *geneve_create_sock(struct net *net, bool ipv6,
218 __be16 port)
219{
220 struct socket *sock;
221 struct udp_port_cfg udp_conf;
222 int err;
223
224 memset(&udp_conf, 0, sizeof(udp_conf));
225
226 if (ipv6) {
227 udp_conf.family = AF_INET6;
228 } else {
229 udp_conf.family = AF_INET;
230 udp_conf.local_ip.s_addr = htonl(INADDR_ANY);
231 }
232
233 udp_conf.local_udp_port = port;
234
235 /* Open UDP socket */
236 err = udp_sock_create(net, &udp_conf, &sock);
237 if (err < 0)
238 return ERR_PTR(err);
239
240 return sock;
241}
242
243/* Create new listen socket if needed */
244static struct geneve_sock *geneve_socket_create(struct net *net, __be16 port,
245 geneve_rcv_t *rcv, void *data,
246 bool ipv6)
247{
248 struct geneve_net *gn = net_generic(net, geneve_net_id);
249 struct geneve_sock *gs;
250 struct socket *sock;
251 struct udp_tunnel_sock_cfg tunnel_cfg;
252
253 gs = kzalloc(sizeof(*gs), GFP_KERNEL);
254 if (!gs)
255 return ERR_PTR(-ENOMEM);
256
257 INIT_WORK(&gs->del_work, geneve_del_work);
258
259 sock = geneve_create_sock(net, ipv6, port);
260 if (IS_ERR(sock)) {
261 kfree(gs);
262 return ERR_CAST(sock);
263 }
264
265 gs->sock = sock;
266 atomic_set(&gs->refcnt, 1);
267 gs->rcv = rcv;
268 gs->rcv_data = data;
269
270 /* Initialize the geneve udp offloads structure */
271 gs->udp_offloads.port = port;
272 gs->udp_offloads.callbacks.gro_receive = NULL;
273 gs->udp_offloads.callbacks.gro_complete = NULL;
274
275 spin_lock(&gn->sock_lock);
276 hlist_add_head_rcu(&gs->hlist, gs_head(net, port));
277 geneve_notify_add_rx_port(gs);
278 spin_unlock(&gn->sock_lock);
279
280 /* Mark socket as an encapsulation socket */
281 tunnel_cfg.sk_user_data = gs;
282 tunnel_cfg.encap_type = 1;
283 tunnel_cfg.encap_rcv = geneve_udp_encap_recv;
284 tunnel_cfg.encap_destroy = NULL;
285 setup_udp_tunnel_sock(net, sock, &tunnel_cfg);
286
287 return gs;
288}
289
290struct geneve_sock *geneve_sock_add(struct net *net, __be16 port,
291 geneve_rcv_t *rcv, void *data,
292 bool no_share, bool ipv6)
293{
294 struct geneve_sock *gs;
295
296 gs = geneve_socket_create(net, port, rcv, data, ipv6);
297 if (!IS_ERR(gs))
298 return gs;
299
300 if (no_share) /* Return error if sharing is not allowed. */
301 return ERR_PTR(-EINVAL);
302
303 gs = geneve_find_sock(net, port);
304 if (gs) {
305 if (gs->rcv == rcv)
306 atomic_inc(&gs->refcnt);
307 else
308 gs = ERR_PTR(-EBUSY);
309 } else {
310 gs = ERR_PTR(-EINVAL);
311 }
312
313 return gs;
314}
315EXPORT_SYMBOL_GPL(geneve_sock_add);
316
317void geneve_sock_release(struct geneve_sock *gs)
318{
319 if (!atomic_dec_and_test(&gs->refcnt))
320 return;
321
322 queue_work(geneve_wq, &gs->del_work);
323}
324EXPORT_SYMBOL_GPL(geneve_sock_release);
325
326static __net_init int geneve_init_net(struct net *net)
327{
328 struct geneve_net *gn = net_generic(net, geneve_net_id);
329 unsigned int h;
330
331 spin_lock_init(&gn->sock_lock);
332
333 for (h = 0; h < PORT_HASH_SIZE; ++h)
334 INIT_HLIST_HEAD(&gn->sock_list[h]);
335
336 return 0;
337}
338
339static struct pernet_operations geneve_net_ops = {
340 .init = geneve_init_net,
341 .exit = NULL,
342 .id = &geneve_net_id,
343 .size = sizeof(struct geneve_net),
344};
345
346static int __init geneve_init_module(void)
347{
348 int rc;
349
350 geneve_wq = alloc_workqueue("geneve", 0, 0);
351 if (!geneve_wq)
352 return -ENOMEM;
353
354 rc = register_pernet_subsys(&geneve_net_ops);
355 if (rc)
356 return rc;
357
358 pr_info("Geneve driver\n");
359
360 return 0;
361}
362late_initcall(geneve_init_module);
363
364static void __exit geneve_cleanup_module(void)
365{
 unregister_pernet_subsys(&geneve_net_ops);
366 destroy_workqueue(geneve_wq);
367}
368module_exit(geneve_cleanup_module);
369
370MODULE_LICENSE("GPL");
371MODULE_AUTHOR("Jesse Gross <jesse@nicira.com>");
372MODULE_DESCRIPTION("Driver for GENEVE encapsulated traffic");
373MODULE_ALIAS_RTNL_LINK("geneve");
diff --git a/net/ipv4/gre_demux.c b/net/ipv4/gre_demux.c
index 0485bf7f8f03..4a7b5b2a1ce3 100644
--- a/net/ipv4/gre_demux.c
+++ b/net/ipv4/gre_demux.c
@@ -98,7 +98,6 @@ EXPORT_SYMBOL_GPL(gre_build_header);
98static int parse_gre_header(struct sk_buff *skb, struct tnl_ptk_info *tpi, 98static int parse_gre_header(struct sk_buff *skb, struct tnl_ptk_info *tpi,
99 bool *csum_err) 99 bool *csum_err)
100{ 100{
101 unsigned int ip_hlen = ip_hdrlen(skb);
102 const struct gre_base_hdr *greh; 101 const struct gre_base_hdr *greh;
103 __be32 *options; 102 __be32 *options;
104 int hdr_len; 103 int hdr_len;
@@ -106,7 +105,7 @@ static int parse_gre_header(struct sk_buff *skb, struct tnl_ptk_info *tpi,
106 if (unlikely(!pskb_may_pull(skb, sizeof(struct gre_base_hdr)))) 105 if (unlikely(!pskb_may_pull(skb, sizeof(struct gre_base_hdr))))
107 return -EINVAL; 106 return -EINVAL;
108 107
109 greh = (struct gre_base_hdr *)(skb_network_header(skb) + ip_hlen); 108 greh = (struct gre_base_hdr *)skb_transport_header(skb);
110 if (unlikely(greh->flags & (GRE_VERSION | GRE_ROUTING))) 109 if (unlikely(greh->flags & (GRE_VERSION | GRE_ROUTING)))
111 return -EINVAL; 110 return -EINVAL;
112 111
@@ -116,7 +115,7 @@ static int parse_gre_header(struct sk_buff *skb, struct tnl_ptk_info *tpi,
116 if (!pskb_may_pull(skb, hdr_len)) 115 if (!pskb_may_pull(skb, hdr_len))
117 return -EINVAL; 116 return -EINVAL;
118 117
119 greh = (struct gre_base_hdr *)(skb_network_header(skb) + ip_hlen); 118 greh = (struct gre_base_hdr *)skb_transport_header(skb);
120 tpi->proto = greh->protocol; 119 tpi->proto = greh->protocol;
121 120
122 options = (__be32 *)(greh + 1); 121 options = (__be32 *)(greh + 1);
@@ -125,6 +124,10 @@ static int parse_gre_header(struct sk_buff *skb, struct tnl_ptk_info *tpi,
125 *csum_err = true; 124 *csum_err = true;
126 return -EINVAL; 125 return -EINVAL;
127 } 126 }
127
128 skb_checksum_try_convert(skb, IPPROTO_GRE, 0,
129 null_compute_pseudo);
130
128 options++; 131 options++;
129 } 132 }
130 133
diff --git a/net/ipv4/gre_offload.c b/net/ipv4/gre_offload.c
index 6556263c8fa5..a77729503071 100644
--- a/net/ipv4/gre_offload.c
+++ b/net/ipv4/gre_offload.c
@@ -15,13 +15,6 @@
15#include <net/protocol.h> 15#include <net/protocol.h>
16#include <net/gre.h> 16#include <net/gre.h>
17 17
18static int gre_gso_send_check(struct sk_buff *skb)
19{
20 if (!skb->encapsulation)
21 return -EINVAL;
22 return 0;
23}
24
25static struct sk_buff *gre_gso_segment(struct sk_buff *skb, 18static struct sk_buff *gre_gso_segment(struct sk_buff *skb,
26 netdev_features_t features) 19 netdev_features_t features)
27{ 20{
@@ -46,6 +39,9 @@ static struct sk_buff *gre_gso_segment(struct sk_buff *skb,
46 SKB_GSO_IPIP))) 39 SKB_GSO_IPIP)))
47 goto out; 40 goto out;
48 41
42 if (!skb->encapsulation)
43 goto out;
44
49 if (unlikely(!pskb_may_pull(skb, sizeof(*greh)))) 45 if (unlikely(!pskb_may_pull(skb, sizeof(*greh))))
50 goto out; 46 goto out;
51 47
@@ -119,28 +115,6 @@ out:
119 return segs; 115 return segs;
120} 116}
121 117
122/* Compute the whole skb csum in s/w and store it, then verify GRO csum
123 * starting from gro_offset.
124 */
125static __sum16 gro_skb_checksum(struct sk_buff *skb)
126{
127 __sum16 sum;
128
129 skb->csum = skb_checksum(skb, 0, skb->len, 0);
130 NAPI_GRO_CB(skb)->csum = csum_sub(skb->csum,
131 csum_partial(skb->data, skb_gro_offset(skb), 0));
132 sum = csum_fold(NAPI_GRO_CB(skb)->csum);
133 if (unlikely(skb->ip_summed == CHECKSUM_COMPLETE)) {
134 if (unlikely(!sum) && !skb->csum_complete_sw)
135 netdev_rx_csum_fault(skb->dev);
136 } else {
137 skb->ip_summed = CHECKSUM_COMPLETE;
138 skb->csum_complete_sw = 1;
139 }
140
141 return sum;
142}
143
144static struct sk_buff **gre_gro_receive(struct sk_buff **head, 118static struct sk_buff **gre_gro_receive(struct sk_buff **head,
145 struct sk_buff *skb) 119 struct sk_buff *skb)
146{ 120{
@@ -192,22 +166,16 @@ static struct sk_buff **gre_gro_receive(struct sk_buff **head,
192 if (unlikely(!greh)) 166 if (unlikely(!greh))
193 goto out_unlock; 167 goto out_unlock;
194 } 168 }
195 if (greh->flags & GRE_CSUM) { /* Need to verify GRE csum first */ 169
196 __sum16 csum = 0; 170 /* Don't bother verifying checksum if we're going to flush anyway. */
197 171 if ((greh->flags & GRE_CSUM) && !NAPI_GRO_CB(skb)->flush) {
198 if (skb->ip_summed == CHECKSUM_COMPLETE) 172 if (skb_gro_checksum_simple_validate(skb))
199 csum = csum_fold(NAPI_GRO_CB(skb)->csum);
200 /* Don't trust csum error calculated/reported by h/w */
201 if (skb->ip_summed == CHECKSUM_NONE || csum != 0)
202 csum = gro_skb_checksum(skb);
203
204 /* GRE CSUM is the 1's complement of the 1's complement sum
205 * of the GRE hdr plus payload so it should add up to 0xffff
206 * (and 0 after csum_fold()) just like the IPv4 hdr csum.
207 */
208 if (csum)
209 goto out_unlock; 173 goto out_unlock;
174
175 skb_gro_checksum_try_convert(skb, IPPROTO_GRE, 0,
176 null_compute_pseudo);
210 } 177 }
178
211 flush = 0; 179 flush = 0;
212 180
213 for (p = *head; p; p = p->next) { 181 for (p = *head; p; p = p->next) {
@@ -284,7 +252,6 @@ static int gre_gro_complete(struct sk_buff *skb, int nhoff)
284 252
285static const struct net_offload gre_offload = { 253static const struct net_offload gre_offload = {
286 .callbacks = { 254 .callbacks = {
287 .gso_send_check = gre_gso_send_check,
288 .gso_segment = gre_gso_segment, 255 .gso_segment = gre_gso_segment,
289 .gro_receive = gre_gro_receive, 256 .gro_receive = gre_gro_receive,
290 .gro_complete = gre_gro_complete, 257 .gro_complete = gre_gro_complete,
diff --git a/net/ipv4/icmp.c b/net/ipv4/icmp.c
index ea7d4afe8205..5882f584910e 100644
--- a/net/ipv4/icmp.c
+++ b/net/ipv4/icmp.c
@@ -231,12 +231,62 @@ static inline void icmp_xmit_unlock(struct sock *sk)
231 spin_unlock_bh(&sk->sk_lock.slock); 231 spin_unlock_bh(&sk->sk_lock.slock);
232} 232}
233 233
234int sysctl_icmp_msgs_per_sec __read_mostly = 1000;
235int sysctl_icmp_msgs_burst __read_mostly = 50;
236
237static struct {
238 spinlock_t lock;
239 u32 credit;
240 u32 stamp;
241} icmp_global = {
242 .lock = __SPIN_LOCK_UNLOCKED(icmp_global.lock),
243};
244
245/**
246 * icmp_global_allow - Are we allowed to send one more ICMP message?
247 *
248 * Uses a token bucket to limit our ICMP messages to sysctl_icmp_msgs_per_sec.
249 * Returns false if we have reached the limit and cannot send another packet.
250 * Note: called with BH disabled
251 */
252bool icmp_global_allow(void)
253{
254 u32 credit, delta, incr = 0, now = (u32)jiffies;
255 bool rc = false;
256
257 /* Check if token bucket is empty and cannot be refilled
258 * without taking the spinlock.
259 */
260 if (!icmp_global.credit) {
261 delta = min_t(u32, now - icmp_global.stamp, HZ);
262 if (delta < HZ / 50)
263 return false;
264 }
265
266 spin_lock(&icmp_global.lock);
267 delta = min_t(u32, now - icmp_global.stamp, HZ);
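 /* Refill at sysctl_icmp_msgs_per_sec tokens per second, but only in
  * steps of at least HZ/50 jiffies (20 ms).
  */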
268 if (delta >= HZ / 50) {
269 incr = sysctl_icmp_msgs_per_sec * delta / HZ;
270 if (incr)
271 icmp_global.stamp = now;
272 }
273 credit = min_t(u32, icmp_global.credit + incr, sysctl_icmp_msgs_burst);
274 if (credit) {
275 credit--;
276 rc = true;
277 }
278 icmp_global.credit = credit;
279 spin_unlock(&icmp_global.lock);
280 return rc;
281}
282EXPORT_SYMBOL(icmp_global_allow);
283
234/* 284/*
235 * Send an ICMP frame. 285 * Send an ICMP frame.
236 */ 286 */
237 287
238static inline bool icmpv4_xrlim_allow(struct net *net, struct rtable *rt, 288static bool icmpv4_xrlim_allow(struct net *net, struct rtable *rt,
239 struct flowi4 *fl4, int type, int code) 289 struct flowi4 *fl4, int type, int code)
240{ 290{
241 struct dst_entry *dst = &rt->dst; 291 struct dst_entry *dst = &rt->dst;
242 bool rc = true; 292 bool rc = true;
@@ -253,8 +303,14 @@ static inline bool icmpv4_xrlim_allow(struct net *net, struct rtable *rt,
253 goto out; 303 goto out;
254 304
255 /* Limit if icmp type is enabled in ratemask. */ 305 /* Limit if icmp type is enabled in ratemask. */
256 if ((1 << type) & net->ipv4.sysctl_icmp_ratemask) { 306 if (!((1 << type) & net->ipv4.sysctl_icmp_ratemask))
257 struct inet_peer *peer = inet_getpeer_v4(net->ipv4.peers, fl4->daddr, 1); 307 goto out;
308
309 rc = false;
310 if (icmp_global_allow()) {
311 struct inet_peer *peer;
312
313 peer = inet_getpeer_v4(net->ipv4.peers, fl4->daddr, 1);
258 rc = inet_peer_xrlim_allow(peer, 314 rc = inet_peer_xrlim_allow(peer,
259 net->ipv4.sysctl_icmp_ratelimit); 315 net->ipv4.sysctl_icmp_ratelimit);
260 if (peer) 316 if (peer)
diff --git a/net/ipv4/igmp.c b/net/ipv4/igmp.c
index f10eab462282..fb70e3ecc3e4 100644
--- a/net/ipv4/igmp.c
+++ b/net/ipv4/igmp.c
@@ -117,7 +117,7 @@
117#define IGMP_V2_Unsolicited_Report_Interval (10*HZ) 117#define IGMP_V2_Unsolicited_Report_Interval (10*HZ)
118#define IGMP_V3_Unsolicited_Report_Interval (1*HZ) 118#define IGMP_V3_Unsolicited_Report_Interval (1*HZ)
119#define IGMP_Query_Response_Interval (10*HZ) 119#define IGMP_Query_Response_Interval (10*HZ)
120#define IGMP_Unsolicited_Report_Count 2 120#define IGMP_Query_Robustness_Variable 2
121 121
122 122
123#define IGMP_Initial_Report_Delay (1) 123#define IGMP_Initial_Report_Delay (1)
@@ -756,8 +756,7 @@ static void igmp_ifc_event(struct in_device *in_dev)
756{ 756{
757 if (IGMP_V1_SEEN(in_dev) || IGMP_V2_SEEN(in_dev)) 757 if (IGMP_V1_SEEN(in_dev) || IGMP_V2_SEEN(in_dev))
758 return; 758 return;
759 in_dev->mr_ifc_count = in_dev->mr_qrv ? in_dev->mr_qrv : 759 in_dev->mr_ifc_count = in_dev->mr_qrv ?: sysctl_igmp_qrv;
760 IGMP_Unsolicited_Report_Count;
761 igmp_ifc_start_timer(in_dev, 1); 760 igmp_ifc_start_timer(in_dev, 1);
762} 761}
763 762
@@ -932,7 +931,7 @@ static bool igmp_heard_query(struct in_device *in_dev, struct sk_buff *skb,
932 in_dev->mr_qrv = ih3->qrv; 931 in_dev->mr_qrv = ih3->qrv;
933 if (!group) { /* general query */ 932 if (!group) { /* general query */
934 if (ih3->nsrcs) 933 if (ih3->nsrcs)
935 return false; /* no sources allowed */ 934 return true; /* no sources allowed */
936 igmp_gq_start_timer(in_dev); 935 igmp_gq_start_timer(in_dev);
937 return false; 936 return false;
938 } 937 }
@@ -1086,8 +1085,7 @@ static void igmpv3_add_delrec(struct in_device *in_dev, struct ip_mc_list *im)
1086 pmc->interface = im->interface; 1085 pmc->interface = im->interface;
1087 in_dev_hold(in_dev); 1086 in_dev_hold(in_dev);
1088 pmc->multiaddr = im->multiaddr; 1087 pmc->multiaddr = im->multiaddr;
1089 pmc->crcount = in_dev->mr_qrv ? in_dev->mr_qrv : 1088 pmc->crcount = in_dev->mr_qrv ?: sysctl_igmp_qrv;
1090 IGMP_Unsolicited_Report_Count;
1091 pmc->sfmode = im->sfmode; 1089 pmc->sfmode = im->sfmode;
1092 if (pmc->sfmode == MCAST_INCLUDE) { 1090 if (pmc->sfmode == MCAST_INCLUDE) {
1093 struct ip_sf_list *psf; 1091 struct ip_sf_list *psf;
@@ -1226,8 +1224,7 @@ static void igmp_group_added(struct ip_mc_list *im)
1226 } 1224 }
1227 /* else, v3 */ 1225 /* else, v3 */
1228 1226
1229 im->crcount = in_dev->mr_qrv ? in_dev->mr_qrv : 1227 im->crcount = in_dev->mr_qrv ?: sysctl_igmp_qrv;
1230 IGMP_Unsolicited_Report_Count;
1231 igmp_ifc_event(in_dev); 1228 igmp_ifc_event(in_dev);
1232#endif 1229#endif
1233} 1230}
@@ -1322,7 +1319,7 @@ void ip_mc_inc_group(struct in_device *in_dev, __be32 addr)
1322 spin_lock_init(&im->lock); 1319 spin_lock_init(&im->lock);
1323#ifdef CONFIG_IP_MULTICAST 1320#ifdef CONFIG_IP_MULTICAST
1324 setup_timer(&im->timer, igmp_timer_expire, (unsigned long)im); 1321 setup_timer(&im->timer, igmp_timer_expire, (unsigned long)im);
1325 im->unsolicit_count = IGMP_Unsolicited_Report_Count; 1322 im->unsolicit_count = sysctl_igmp_qrv;
1326#endif 1323#endif
1327 1324
1328 im->next_rcu = in_dev->mc_list; 1325 im->next_rcu = in_dev->mc_list;
@@ -1460,7 +1457,7 @@ void ip_mc_init_dev(struct in_device *in_dev)
1460 (unsigned long)in_dev); 1457 (unsigned long)in_dev);
1461 setup_timer(&in_dev->mr_ifc_timer, igmp_ifc_timer_expire, 1458 setup_timer(&in_dev->mr_ifc_timer, igmp_ifc_timer_expire,
1462 (unsigned long)in_dev); 1459 (unsigned long)in_dev);
1463 in_dev->mr_qrv = IGMP_Unsolicited_Report_Count; 1460 in_dev->mr_qrv = sysctl_igmp_qrv;
1464#endif 1461#endif
1465 1462
1466 spin_lock_init(&in_dev->mc_tomb_lock); 1463 spin_lock_init(&in_dev->mc_tomb_lock);
@@ -1474,6 +1471,9 @@ void ip_mc_up(struct in_device *in_dev)
1474 1471
1475 ASSERT_RTNL(); 1472 ASSERT_RTNL();
1476 1473
1474#ifdef CONFIG_IP_MULTICAST
1475 in_dev->mr_qrv = sysctl_igmp_qrv;
1476#endif
1477 ip_mc_inc_group(in_dev, IGMP_ALL_HOSTS); 1477 ip_mc_inc_group(in_dev, IGMP_ALL_HOSTS);
1478 1478
1479 for_each_pmc_rtnl(in_dev, pmc) 1479 for_each_pmc_rtnl(in_dev, pmc)
@@ -1540,7 +1540,9 @@ static struct in_device *ip_mc_find_dev(struct net *net, struct ip_mreqn *imr)
1540 */ 1540 */
1541int sysctl_igmp_max_memberships __read_mostly = IP_MAX_MEMBERSHIPS; 1541int sysctl_igmp_max_memberships __read_mostly = IP_MAX_MEMBERSHIPS;
1542int sysctl_igmp_max_msf __read_mostly = IP_MAX_MSF; 1542int sysctl_igmp_max_msf __read_mostly = IP_MAX_MSF;
1543 1543#ifdef CONFIG_IP_MULTICAST
1544int sysctl_igmp_qrv __read_mostly = IGMP_Query_Robustness_Variable;
1545#endif
1544 1546
1545static int ip_mc_del1_src(struct ip_mc_list *pmc, int sfmode, 1547static int ip_mc_del1_src(struct ip_mc_list *pmc, int sfmode,
1546 __be32 *psfsrc) 1548 __be32 *psfsrc)
@@ -1575,8 +1577,7 @@ static int ip_mc_del1_src(struct ip_mc_list *pmc, int sfmode,
1575#ifdef CONFIG_IP_MULTICAST 1577#ifdef CONFIG_IP_MULTICAST
1576 if (psf->sf_oldin && 1578 if (psf->sf_oldin &&
1577 !IGMP_V1_SEEN(in_dev) && !IGMP_V2_SEEN(in_dev)) { 1579 !IGMP_V1_SEEN(in_dev) && !IGMP_V2_SEEN(in_dev)) {
1578 psf->sf_crcount = in_dev->mr_qrv ? in_dev->mr_qrv : 1580 psf->sf_crcount = in_dev->mr_qrv ?: sysctl_igmp_qrv;
1579 IGMP_Unsolicited_Report_Count;
1580 psf->sf_next = pmc->tomb; 1581 psf->sf_next = pmc->tomb;
1581 pmc->tomb = psf; 1582 pmc->tomb = psf;
1582 rv = 1; 1583 rv = 1;
@@ -1639,8 +1640,7 @@ static int ip_mc_del_src(struct in_device *in_dev, __be32 *pmca, int sfmode,
1639 /* filter mode change */ 1640 /* filter mode change */
1640 pmc->sfmode = MCAST_INCLUDE; 1641 pmc->sfmode = MCAST_INCLUDE;
1641#ifdef CONFIG_IP_MULTICAST 1642#ifdef CONFIG_IP_MULTICAST
1642 pmc->crcount = in_dev->mr_qrv ? in_dev->mr_qrv : 1643 pmc->crcount = in_dev->mr_qrv ?: sysctl_igmp_qrv;
1643 IGMP_Unsolicited_Report_Count;
1644 in_dev->mr_ifc_count = pmc->crcount; 1644 in_dev->mr_ifc_count = pmc->crcount;
1645 for (psf = pmc->sources; psf; psf = psf->sf_next) 1645 for (psf = pmc->sources; psf; psf = psf->sf_next)
1646 psf->sf_crcount = 0; 1646 psf->sf_crcount = 0;
@@ -1818,8 +1818,7 @@ static int ip_mc_add_src(struct in_device *in_dev, __be32 *pmca, int sfmode,
1818#ifdef CONFIG_IP_MULTICAST 1818#ifdef CONFIG_IP_MULTICAST
1819 /* else no filters; keep old mode for reports */ 1819 /* else no filters; keep old mode for reports */
1820 1820
1821 pmc->crcount = in_dev->mr_qrv ? in_dev->mr_qrv : 1821 pmc->crcount = in_dev->mr_qrv ?: sysctl_igmp_qrv;
1822 IGMP_Unsolicited_Report_Count;
1823 in_dev->mr_ifc_count = pmc->crcount; 1822 in_dev->mr_ifc_count = pmc->crcount;
1824 for (psf = pmc->sources; psf; psf = psf->sf_next) 1823 for (psf = pmc->sources; psf; psf = psf->sf_next)
1825 psf->sf_crcount = 0; 1824 psf->sf_crcount = 0;
@@ -2539,7 +2538,7 @@ static int igmp_mc_seq_show(struct seq_file *seq, void *v)
2539 querier = "NONE"; 2538 querier = "NONE";
2540#endif 2539#endif
2541 2540
2542 if (rcu_dereference(state->in_dev->mc_list) == im) { 2541 if (rcu_access_pointer(state->in_dev->mc_list) == im) {
2543 seq_printf(seq, "%d\t%-10s: %5d %7s\n", 2542 seq_printf(seq, "%d\t%-10s: %5d %7s\n",
2544 state->dev->ifindex, state->dev->name, state->in_dev->mc_count, querier); 2543 state->dev->ifindex, state->dev->name, state->in_dev->mc_count, querier);
2545 } 2544 }
diff --git a/net/ipv4/inet_hashtables.c b/net/ipv4/inet_hashtables.c
index 43116e8c8e13..9111a4e22155 100644
--- a/net/ipv4/inet_hashtables.c
+++ b/net/ipv4/inet_hashtables.c
@@ -229,7 +229,7 @@ begin:
229 } 229 }
230 } else if (score == hiscore && reuseport) { 230 } else if (score == hiscore && reuseport) {
231 matches++; 231 matches++;
232 if (((u64)phash * matches) >> 32 == 0) 232 if (reciprocal_scale(phash, matches) == 0)
233 result = sk; 233 result = sk;
234 phash = next_pseudo_random32(phash); 234 phash = next_pseudo_random32(phash);
235 } 235 }
diff --git a/net/ipv4/inetpeer.c b/net/ipv4/inetpeer.c
index bd5f5928167d..241afd743d2c 100644
--- a/net/ipv4/inetpeer.c
+++ b/net/ipv4/inetpeer.c
@@ -72,29 +72,10 @@ void inet_peer_base_init(struct inet_peer_base *bp)
72{ 72{
73 bp->root = peer_avl_empty_rcu; 73 bp->root = peer_avl_empty_rcu;
74 seqlock_init(&bp->lock); 74 seqlock_init(&bp->lock);
75 bp->flush_seq = ~0U;
76 bp->total = 0; 75 bp->total = 0;
77} 76}
78EXPORT_SYMBOL_GPL(inet_peer_base_init); 77EXPORT_SYMBOL_GPL(inet_peer_base_init);
79 78
80static atomic_t v4_seq = ATOMIC_INIT(0);
81static atomic_t v6_seq = ATOMIC_INIT(0);
82
83static atomic_t *inetpeer_seq_ptr(int family)
84{
85 return (family == AF_INET ? &v4_seq : &v6_seq);
86}
87
88static inline void flush_check(struct inet_peer_base *base, int family)
89{
90 atomic_t *fp = inetpeer_seq_ptr(family);
91
92 if (unlikely(base->flush_seq != atomic_read(fp))) {
93 inetpeer_invalidate_tree(base);
94 base->flush_seq = atomic_read(fp);
95 }
96}
97
98#define PEER_MAXDEPTH 40 /* sufficient for about 2^27 nodes */ 79#define PEER_MAXDEPTH 40 /* sufficient for about 2^27 nodes */
99 80
100/* Exported for sysctl_net_ipv4. */ 81/* Exported for sysctl_net_ipv4. */
@@ -444,8 +425,6 @@ struct inet_peer *inet_getpeer(struct inet_peer_base *base,
444 unsigned int sequence; 425 unsigned int sequence;
445 int invalidated, gccnt = 0; 426 int invalidated, gccnt = 0;
446 427
447 flush_check(base, daddr->family);
448
449 /* Attempt a lockless lookup first. 428 /* Attempt a lockless lookup first.
450 * Because of a concurrent writer, we might not find an existing entry. 429 * Because of a concurrent writer, we might not find an existing entry.
451 */ 430 */
diff --git a/net/ipv4/ip_fragment.c b/net/ipv4/ip_fragment.c
index 15f0e2bad7ad..2811cc18701a 100644
--- a/net/ipv4/ip_fragment.c
+++ b/net/ipv4/ip_fragment.c
@@ -790,7 +790,7 @@ static void __net_exit ip4_frags_ns_ctl_unregister(struct net *net)
790 kfree(table); 790 kfree(table);
791} 791}
792 792
793static void ip4_frags_ctl_register(void) 793static void __init ip4_frags_ctl_register(void)
794{ 794{
795 register_net_sysctl(&init_net, "net/ipv4", ip4_frags_ctl_table); 795 register_net_sysctl(&init_net, "net/ipv4", ip4_frags_ctl_table);
796} 796}
@@ -804,7 +804,7 @@ static inline void ip4_frags_ns_ctl_unregister(struct net *net)
804{ 804{
805} 805}
806 806
807static inline void ip4_frags_ctl_register(void) 807static inline void __init ip4_frags_ctl_register(void)
808{ 808{
809} 809}
810#endif 810#endif
diff --git a/net/ipv4/ip_gre.c b/net/ipv4/ip_gre.c
index 9b842544aea3..12055fdbe716 100644
--- a/net/ipv4/ip_gre.c
+++ b/net/ipv4/ip_gre.c
@@ -239,7 +239,9 @@ static void __gre_xmit(struct sk_buff *skb, struct net_device *dev,
239 tpi.seq = htonl(tunnel->o_seqno); 239 tpi.seq = htonl(tunnel->o_seqno);
240 240
241 /* Push GRE header. */ 241 /* Push GRE header. */
242 gre_build_header(skb, &tpi, tunnel->hlen); 242 gre_build_header(skb, &tpi, tunnel->tun_hlen);
243
244 skb_set_inner_protocol(skb, tpi.proto);
243 245
244 ip_tunnel_xmit(skb, dev, tnl_params, tnl_params->protocol); 246 ip_tunnel_xmit(skb, dev, tnl_params, tnl_params->protocol);
245} 247}
@@ -310,7 +312,7 @@ out:
310static int ipgre_tunnel_ioctl(struct net_device *dev, 312static int ipgre_tunnel_ioctl(struct net_device *dev,
311 struct ifreq *ifr, int cmd) 313 struct ifreq *ifr, int cmd)
312{ 314{
313 int err = 0; 315 int err;
314 struct ip_tunnel_parm p; 316 struct ip_tunnel_parm p;
315 317
316 if (copy_from_user(&p, ifr->ifr_ifru.ifru_data, sizeof(p))) 318 if (copy_from_user(&p, ifr->ifr_ifru.ifru_data, sizeof(p)))
@@ -470,13 +472,18 @@ static void ipgre_tunnel_setup(struct net_device *dev)
470static void __gre_tunnel_init(struct net_device *dev) 472static void __gre_tunnel_init(struct net_device *dev)
471{ 473{
472 struct ip_tunnel *tunnel; 474 struct ip_tunnel *tunnel;
475 int t_hlen;
473 476
474 tunnel = netdev_priv(dev); 477 tunnel = netdev_priv(dev);
475 tunnel->hlen = ip_gre_calc_hlen(tunnel->parms.o_flags); 478 tunnel->tun_hlen = ip_gre_calc_hlen(tunnel->parms.o_flags);
476 tunnel->parms.iph.protocol = IPPROTO_GRE; 479 tunnel->parms.iph.protocol = IPPROTO_GRE;
477 480
478 dev->needed_headroom = LL_MAX_HEADER + sizeof(struct iphdr) + 4; 481 tunnel->hlen = tunnel->tun_hlen + tunnel->encap_hlen;
479 dev->mtu = ETH_DATA_LEN - sizeof(struct iphdr) - 4; 482
483 t_hlen = tunnel->hlen + sizeof(struct iphdr);
484
485 dev->needed_headroom = LL_MAX_HEADER + t_hlen + 4;
486 dev->mtu = ETH_DATA_LEN - t_hlen - 4;
480 487
481 dev->features |= GRE_FEATURES; 488 dev->features |= GRE_FEATURES;
482 dev->hw_features |= GRE_FEATURES; 489 dev->hw_features |= GRE_FEATURES;
@@ -503,7 +510,7 @@ static int ipgre_tunnel_init(struct net_device *dev)
503 memcpy(dev->broadcast, &iph->daddr, 4); 510 memcpy(dev->broadcast, &iph->daddr, 4);
504 511
505 dev->flags = IFF_NOARP; 512 dev->flags = IFF_NOARP;
506 dev->priv_flags &= ~IFF_XMIT_DST_RELEASE; 513 netif_keep_dst(dev);
507 dev->addr_len = 4; 514 dev->addr_len = 4;
508 515
509 if (iph->daddr) { 516 if (iph->daddr) {
@@ -628,6 +635,40 @@ static void ipgre_netlink_parms(struct nlattr *data[], struct nlattr *tb[],
628 parms->iph.frag_off = htons(IP_DF); 635 parms->iph.frag_off = htons(IP_DF);
629} 636}
630 637
638/* This function returns true when ENCAP attributes are present in the nl msg */
639static bool ipgre_netlink_encap_parms(struct nlattr *data[],
640 struct ip_tunnel_encap *ipencap)
641{
642 bool ret = false;
643
644 memset(ipencap, 0, sizeof(*ipencap));
645
646 if (!data)
647 return ret;
648
649 if (data[IFLA_GRE_ENCAP_TYPE]) {
650 ret = true;
651 ipencap->type = nla_get_u16(data[IFLA_GRE_ENCAP_TYPE]);
652 }
653
654 if (data[IFLA_GRE_ENCAP_FLAGS]) {
655 ret = true;
656 ipencap->flags = nla_get_u16(data[IFLA_GRE_ENCAP_FLAGS]);
657 }
658
659 if (data[IFLA_GRE_ENCAP_SPORT]) {
660 ret = true;
661 ipencap->sport = nla_get_u16(data[IFLA_GRE_ENCAP_SPORT]);
662 }
663
664 if (data[IFLA_GRE_ENCAP_DPORT]) {
665 ret = true;
666 ipencap->dport = nla_get_u16(data[IFLA_GRE_ENCAP_DPORT]);
667 }
668
669 return ret;
670}
671
631static int gre_tap_init(struct net_device *dev) 672static int gre_tap_init(struct net_device *dev)
632{ 673{
633 __gre_tunnel_init(dev); 674 __gre_tunnel_init(dev);
@@ -657,6 +698,15 @@ static int ipgre_newlink(struct net *src_net, struct net_device *dev,
657 struct nlattr *tb[], struct nlattr *data[]) 698 struct nlattr *tb[], struct nlattr *data[])
658{ 699{
659 struct ip_tunnel_parm p; 700 struct ip_tunnel_parm p;
701 struct ip_tunnel_encap ipencap;
702
703 if (ipgre_netlink_encap_parms(data, &ipencap)) {
704 struct ip_tunnel *t = netdev_priv(dev);
705 int err = ip_tunnel_encap_setup(t, &ipencap);
706
707 if (err < 0)
708 return err;
709 }
660 710
661 ipgre_netlink_parms(data, tb, &p); 711 ipgre_netlink_parms(data, tb, &p);
662 return ip_tunnel_newlink(dev, tb, &p); 712 return ip_tunnel_newlink(dev, tb, &p);
@@ -666,6 +716,15 @@ static int ipgre_changelink(struct net_device *dev, struct nlattr *tb[],
666 struct nlattr *data[]) 716 struct nlattr *data[])
667{ 717{
668 struct ip_tunnel_parm p; 718 struct ip_tunnel_parm p;
719 struct ip_tunnel_encap ipencap;
720
721 if (ipgre_netlink_encap_parms(data, &ipencap)) {
722 struct ip_tunnel *t = netdev_priv(dev);
723 int err = ip_tunnel_encap_setup(t, &ipencap);
724
725 if (err < 0)
726 return err;
727 }
669 728
670 ipgre_netlink_parms(data, tb, &p); 729 ipgre_netlink_parms(data, tb, &p);
671 return ip_tunnel_changelink(dev, tb, &p); 730 return ip_tunnel_changelink(dev, tb, &p);
@@ -694,6 +753,14 @@ static size_t ipgre_get_size(const struct net_device *dev)
694 nla_total_size(1) + 753 nla_total_size(1) +
695 /* IFLA_GRE_PMTUDISC */ 754 /* IFLA_GRE_PMTUDISC */
696 nla_total_size(1) + 755 nla_total_size(1) +
756 /* IFLA_GRE_ENCAP_TYPE */
757 nla_total_size(2) +
758 /* IFLA_GRE_ENCAP_FLAGS */
759 nla_total_size(2) +
760 /* IFLA_GRE_ENCAP_SPORT */
761 nla_total_size(2) +
762 /* IFLA_GRE_ENCAP_DPORT */
763 nla_total_size(2) +
697 0; 764 0;
698} 765}
699 766
@@ -714,6 +781,17 @@ static int ipgre_fill_info(struct sk_buff *skb, const struct net_device *dev)
714 nla_put_u8(skb, IFLA_GRE_PMTUDISC, 781 nla_put_u8(skb, IFLA_GRE_PMTUDISC,
715 !!(p->iph.frag_off & htons(IP_DF)))) 782 !!(p->iph.frag_off & htons(IP_DF))))
716 goto nla_put_failure; 783 goto nla_put_failure;
784
785 if (nla_put_u16(skb, IFLA_GRE_ENCAP_TYPE,
786 t->encap.type) ||
787 nla_put_u16(skb, IFLA_GRE_ENCAP_SPORT,
788 t->encap.sport) ||
789 nla_put_u16(skb, IFLA_GRE_ENCAP_DPORT,
790 t->encap.dport) ||
791 nla_put_u16(skb, IFLA_GRE_ENCAP_FLAGS,
792 t->encap.flags))
793 goto nla_put_failure;
794
717 return 0; 795 return 0;
718 796
719nla_put_failure: 797nla_put_failure:
@@ -731,6 +809,10 @@ static const struct nla_policy ipgre_policy[IFLA_GRE_MAX + 1] = {
731 [IFLA_GRE_TTL] = { .type = NLA_U8 }, 809 [IFLA_GRE_TTL] = { .type = NLA_U8 },
732 [IFLA_GRE_TOS] = { .type = NLA_U8 }, 810 [IFLA_GRE_TOS] = { .type = NLA_U8 },
733 [IFLA_GRE_PMTUDISC] = { .type = NLA_U8 }, 811 [IFLA_GRE_PMTUDISC] = { .type = NLA_U8 },
812 [IFLA_GRE_ENCAP_TYPE] = { .type = NLA_U16 },
813 [IFLA_GRE_ENCAP_FLAGS] = { .type = NLA_U16 },
814 [IFLA_GRE_ENCAP_SPORT] = { .type = NLA_U16 },
815 [IFLA_GRE_ENCAP_DPORT] = { .type = NLA_U16 },
734}; 816};
735 817
736static struct rtnl_link_ops ipgre_link_ops __read_mostly = { 818static struct rtnl_link_ops ipgre_link_ops __read_mostly = {
diff --git a/net/ipv4/ip_options.c b/net/ipv4/ip_options.c
index ad382499bace..5b3d91be2db0 100644
--- a/net/ipv4/ip_options.c
+++ b/net/ipv4/ip_options.c
@@ -87,17 +87,15 @@ void ip_options_build(struct sk_buff *skb, struct ip_options *opt,
87 * NOTE: dopt cannot point to skb. 87 * NOTE: dopt cannot point to skb.
88 */ 88 */
89 89
90int ip_options_echo(struct ip_options *dopt, struct sk_buff *skb) 90int __ip_options_echo(struct ip_options *dopt, struct sk_buff *skb,
91 const struct ip_options *sopt)
91{ 92{
92 const struct ip_options *sopt;
93 unsigned char *sptr, *dptr; 93 unsigned char *sptr, *dptr;
94 int soffset, doffset; 94 int soffset, doffset;
95 int optlen; 95 int optlen;
96 96
97 memset(dopt, 0, sizeof(struct ip_options)); 97 memset(dopt, 0, sizeof(struct ip_options));
98 98
99 sopt = &(IPCB(skb)->opt);
100
101 if (sopt->optlen == 0) 99 if (sopt->optlen == 0)
102 return 0; 100 return 0;
103 101
diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index 215af2b155cb..e35b71289156 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -516,7 +516,7 @@ int ip_fragment(struct sk_buff *skb, int (*output)(struct sk_buff *))
516 516
517 hlen = iph->ihl * 4; 517 hlen = iph->ihl * 4;
518 mtu = mtu - hlen; /* Size of data space */ 518 mtu = mtu - hlen; /* Size of data space */
519#ifdef CONFIG_BRIDGE_NETFILTER 519#if IS_ENABLED(CONFIG_BRIDGE_NETFILTER)
520 if (skb->nf_bridge) 520 if (skb->nf_bridge)
521 mtu -= nf_bridge_mtu_reduction(skb); 521 mtu -= nf_bridge_mtu_reduction(skb);
522#endif 522#endif
@@ -1522,8 +1522,10 @@ static DEFINE_PER_CPU(struct inet_sock, unicast_sock) = {
1522 .uc_ttl = -1, 1522 .uc_ttl = -1,
1523}; 1523};
1524 1524
1525void ip_send_unicast_reply(struct net *net, struct sk_buff *skb, __be32 daddr, 1525void ip_send_unicast_reply(struct net *net, struct sk_buff *skb,
1526 __be32 saddr, const struct ip_reply_arg *arg, 1526 const struct ip_options *sopt,
1527 __be32 daddr, __be32 saddr,
1528 const struct ip_reply_arg *arg,
1527 unsigned int len) 1529 unsigned int len)
1528{ 1530{
1529 struct ip_options_data replyopts; 1531 struct ip_options_data replyopts;
@@ -1534,7 +1536,7 @@ void ip_send_unicast_reply(struct net *net, struct sk_buff *skb, __be32 daddr,
1534 struct sock *sk; 1536 struct sock *sk;
1535 struct inet_sock *inet; 1537 struct inet_sock *inet;
1536 1538
1537 if (ip_options_echo(&replyopts.opt.opt, skb)) 1539 if (__ip_options_echo(&replyopts.opt.opt, skb, sopt))
1538 return; 1540 return;
1539 1541
1540 ipc.addr = daddr; 1542 ipc.addr = daddr;
diff --git a/net/ipv4/ip_sockglue.c b/net/ipv4/ip_sockglue.c
index 5cb830c78990..c373a9ad4555 100644
--- a/net/ipv4/ip_sockglue.c
+++ b/net/ipv4/ip_sockglue.c
@@ -303,7 +303,7 @@ int ip_ra_control(struct sock *sk, unsigned char on,
303 } 303 }
304 /* dont let ip_call_ra_chain() use sk again */ 304 /* dont let ip_call_ra_chain() use sk again */
305 ra->sk = NULL; 305 ra->sk = NULL;
306 rcu_assign_pointer(*rap, ra->next); 306 RCU_INIT_POINTER(*rap, ra->next);
307 spin_unlock_bh(&ip_ra_lock); 307 spin_unlock_bh(&ip_ra_lock);
308 308
309 if (ra->destructor) 309 if (ra->destructor)
@@ -325,7 +325,7 @@ int ip_ra_control(struct sock *sk, unsigned char on,
325 new_ra->sk = sk; 325 new_ra->sk = sk;
326 new_ra->destructor = destructor; 326 new_ra->destructor = destructor;
327 327
328 new_ra->next = ra; 328 RCU_INIT_POINTER(new_ra->next, ra);
329 rcu_assign_pointer(*rap, new_ra); 329 rcu_assign_pointer(*rap, new_ra);
330 sock_hold(sk); 330 sock_hold(sk);
331 spin_unlock_bh(&ip_ra_lock); 331 spin_unlock_bh(&ip_ra_lock);
@@ -405,7 +405,7 @@ void ip_local_error(struct sock *sk, int err, __be32 daddr, __be16 port, u32 inf
405int ip_recv_error(struct sock *sk, struct msghdr *msg, int len, int *addr_len) 405int ip_recv_error(struct sock *sk, struct msghdr *msg, int len, int *addr_len)
406{ 406{
407 struct sock_exterr_skb *serr; 407 struct sock_exterr_skb *serr;
408 struct sk_buff *skb, *skb2; 408 struct sk_buff *skb;
409 DECLARE_SOCKADDR(struct sockaddr_in *, sin, msg->msg_name); 409 DECLARE_SOCKADDR(struct sockaddr_in *, sin, msg->msg_name);
410 struct { 410 struct {
411 struct sock_extended_err ee; 411 struct sock_extended_err ee;
@@ -415,7 +415,7 @@ int ip_recv_error(struct sock *sk, struct msghdr *msg, int len, int *addr_len)
415 int copied; 415 int copied;
416 416
417 err = -EAGAIN; 417 err = -EAGAIN;
418 skb = skb_dequeue(&sk->sk_error_queue); 418 skb = sock_dequeue_err_skb(sk);
419 if (skb == NULL) 419 if (skb == NULL)
420 goto out; 420 goto out;
421 421
@@ -462,17 +462,6 @@ int ip_recv_error(struct sock *sk, struct msghdr *msg, int len, int *addr_len)
462 msg->msg_flags |= MSG_ERRQUEUE; 462 msg->msg_flags |= MSG_ERRQUEUE;
463 err = copied; 463 err = copied;
464 464
465 /* Reset and regenerate socket error */
466 spin_lock_bh(&sk->sk_error_queue.lock);
467 sk->sk_err = 0;
468 skb2 = skb_peek(&sk->sk_error_queue);
469 if (skb2 != NULL) {
470 sk->sk_err = SKB_EXT_ERR(skb2)->ee.ee_errno;
471 spin_unlock_bh(&sk->sk_error_queue.lock);
472 sk->sk_error_report(sk);
473 } else
474 spin_unlock_bh(&sk->sk_error_queue.lock);
475
476out_free_skb: 465out_free_skb:
477 kfree_skb(skb); 466 kfree_skb(skb);
478out: 467out:
diff --git a/net/ipv4/ip_tunnel.c b/net/ipv4/ip_tunnel.c
index bda4bb8ae260..0bb8e141eacc 100644
--- a/net/ipv4/ip_tunnel.c
+++ b/net/ipv4/ip_tunnel.c
@@ -55,6 +55,8 @@
55#include <net/net_namespace.h> 55#include <net/net_namespace.h>
56#include <net/netns/generic.h> 56#include <net/netns/generic.h>
57#include <net/rtnetlink.h> 57#include <net/rtnetlink.h>
58#include <net/udp.h>
59#include <net/gue.h>
58 60
59#if IS_ENABLED(CONFIG_IPV6) 61#if IS_ENABLED(CONFIG_IPV6)
60#include <net/ipv6.h> 62#include <net/ipv6.h>
@@ -487,6 +489,103 @@ drop:
487} 489}
488EXPORT_SYMBOL_GPL(ip_tunnel_rcv); 490EXPORT_SYMBOL_GPL(ip_tunnel_rcv);
489 491
492static int ip_encap_hlen(struct ip_tunnel_encap *e)
493{
494 switch (e->type) {
495 case TUNNEL_ENCAP_NONE:
496 return 0;
497 case TUNNEL_ENCAP_FOU:
498 return sizeof(struct udphdr);
499 case TUNNEL_ENCAP_GUE:
500 return sizeof(struct udphdr) + sizeof(struct guehdr);
501 default:
502 return -EINVAL;
503 }
504}
505
506int ip_tunnel_encap_setup(struct ip_tunnel *t,
507 struct ip_tunnel_encap *ipencap)
508{
509 int hlen;
510
511 memset(&t->encap, 0, sizeof(t->encap));
512
513 hlen = ip_encap_hlen(ipencap);
514 if (hlen < 0)
515 return hlen;
516
517 t->encap.type = ipencap->type;
518 t->encap.sport = ipencap->sport;
519 t->encap.dport = ipencap->dport;
520 t->encap.flags = ipencap->flags;
521
522 t->encap_hlen = hlen;
523 t->hlen = t->encap_hlen + t->tun_hlen;
524
525 return 0;
526}
527EXPORT_SYMBOL_GPL(ip_tunnel_encap_setup);
528
529static int fou_build_header(struct sk_buff *skb, struct ip_tunnel_encap *e,
530 size_t hdr_len, u8 *protocol, struct flowi4 *fl4)
531{
532 struct udphdr *uh;
533 __be16 sport;
534 bool csum = !!(e->flags & TUNNEL_ENCAP_FLAG_CSUM);
535 int type = csum ? SKB_GSO_UDP_TUNNEL_CSUM : SKB_GSO_UDP_TUNNEL;
536
537 skb = iptunnel_handle_offloads(skb, csum, type);
538
539 if (IS_ERR(skb))
540 return PTR_ERR(skb);
541
542 /* Get length and hash before making space in skb */
543
544 sport = e->sport ? : udp_flow_src_port(dev_net(skb->dev),
545 skb, 0, 0, false);
546
547 skb_push(skb, hdr_len);
548
549 skb_reset_transport_header(skb);
550 uh = udp_hdr(skb);
551
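 /* For GUE, a guehdr naming the inner protocol immediately follows
  * the UDP header.
  */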
552 if (e->type == TUNNEL_ENCAP_GUE) {
553 struct guehdr *guehdr = (struct guehdr *)&uh[1];
554
555 guehdr->version = 0;
556 guehdr->hlen = 0;
557 guehdr->flags = 0;
558 guehdr->next_hdr = *protocol;
559 }
560
561 uh->dest = e->dport;
562 uh->source = sport;
563 uh->len = htons(skb->len);
564 uh->check = 0;
565 udp_set_csum(!(e->flags & TUNNEL_ENCAP_FLAG_CSUM), skb,
566 fl4->saddr, fl4->daddr, skb->len);
567
568 *protocol = IPPROTO_UDP;
569
570 return 0;
571}
572
573int ip_tunnel_encap(struct sk_buff *skb, struct ip_tunnel *t,
574 u8 *protocol, struct flowi4 *fl4)
575{
576 switch (t->encap.type) {
577 case TUNNEL_ENCAP_NONE:
578 return 0;
579 case TUNNEL_ENCAP_FOU:
580 case TUNNEL_ENCAP_GUE:
581 return fou_build_header(skb, &t->encap, t->encap_hlen,
582 protocol, fl4);
583 default:
584 return -EINVAL;
585 }
586}
587EXPORT_SYMBOL(ip_tunnel_encap);
588
490static int tnl_update_pmtu(struct net_device *dev, struct sk_buff *skb, 589static int tnl_update_pmtu(struct net_device *dev, struct sk_buff *skb,
491 struct rtable *rt, __be16 df) 590 struct rtable *rt, __be16 df)
492{ 591{
@@ -536,7 +635,7 @@ static int tnl_update_pmtu(struct net_device *dev, struct sk_buff *skb,
536} 635}
537 636
538void ip_tunnel_xmit(struct sk_buff *skb, struct net_device *dev, 637void ip_tunnel_xmit(struct sk_buff *skb, struct net_device *dev,
539 const struct iphdr *tnl_params, const u8 protocol) 638 const struct iphdr *tnl_params, u8 protocol)
540{ 639{
541 struct ip_tunnel *tunnel = netdev_priv(dev); 640 struct ip_tunnel *tunnel = netdev_priv(dev);
542 const struct iphdr *inner_iph; 641 const struct iphdr *inner_iph;
@@ -617,6 +716,9 @@ void ip_tunnel_xmit(struct sk_buff *skb, struct net_device *dev,
617 init_tunnel_flow(&fl4, protocol, dst, tnl_params->saddr, 716 init_tunnel_flow(&fl4, protocol, dst, tnl_params->saddr,
618 tunnel->parms.o_key, RT_TOS(tos), tunnel->parms.link); 717 tunnel->parms.o_key, RT_TOS(tos), tunnel->parms.link);
619 718
719 if (ip_tunnel_encap(skb, tunnel, &protocol, &fl4) < 0)
720 goto tx_error;
721
620 rt = connected ? tunnel_rtable_get(tunnel, 0, &fl4.saddr) : NULL; 722 rt = connected ? tunnel_rtable_get(tunnel, 0, &fl4.saddr) : NULL;
621 723
622 if (!rt) { 724 if (!rt) {
@@ -670,7 +772,7 @@ void ip_tunnel_xmit(struct sk_buff *skb, struct net_device *dev,
670 df |= (inner_iph->frag_off&htons(IP_DF)); 772 df |= (inner_iph->frag_off&htons(IP_DF));
671 773
672 max_headroom = LL_RESERVED_SPACE(rt->dst.dev) + sizeof(struct iphdr) 774 max_headroom = LL_RESERVED_SPACE(rt->dst.dev) + sizeof(struct iphdr)
673 + rt->dst.header_len; 775 + rt->dst.header_len + ip_encap_hlen(&tunnel->encap);
674 if (max_headroom > dev->needed_headroom) 776 if (max_headroom > dev->needed_headroom)
675 dev->needed_headroom = max_headroom; 777 dev->needed_headroom = max_headroom;
676 778
diff --git a/net/ipv4/ip_vti.c b/net/ipv4/ip_vti.c
index e453cb724a95..3e861011e4a3 100644
--- a/net/ipv4/ip_vti.c
+++ b/net/ipv4/ip_vti.c
@@ -364,7 +364,7 @@ static int vti_tunnel_init(struct net_device *dev)
364 dev->iflink = 0; 364 dev->iflink = 0;
365 dev->addr_len = 4; 365 dev->addr_len = 4;
366 dev->features |= NETIF_F_LLTX; 366 dev->features |= NETIF_F_LLTX;
367 dev->priv_flags &= ~IFF_XMIT_DST_RELEASE; 367 netif_keep_dst(dev);
368 368
369 return ip_tunnel_init(dev); 369 return ip_tunnel_init(dev);
370} 370}
diff --git a/net/ipv4/ipconfig.c b/net/ipv4/ipconfig.c
index 5bbef4fdcb43..648fa1490ea7 100644
--- a/net/ipv4/ipconfig.c
+++ b/net/ipv4/ipconfig.c
@@ -262,7 +262,8 @@ static int __init ic_open_devs(void)
262 /* wait for a carrier on at least one device */ 262 /* wait for a carrier on at least one device */
263 start = jiffies; 263 start = jiffies;
264 next_msg = start + msecs_to_jiffies(CONF_CARRIER_TIMEOUT/12); 264 next_msg = start + msecs_to_jiffies(CONF_CARRIER_TIMEOUT/12);
265 while (jiffies - start < msecs_to_jiffies(CONF_CARRIER_TIMEOUT)) { 265 while (time_before(jiffies, start +
266 msecs_to_jiffies(CONF_CARRIER_TIMEOUT))) {
266 int wait, elapsed; 267 int wait, elapsed;
267 268
268 for_each_netdev(&init_net, dev) 269 for_each_netdev(&init_net, dev)
diff --git a/net/ipv4/ipip.c b/net/ipv4/ipip.c
index 62eaa005e146..37096d64730e 100644
--- a/net/ipv4/ipip.c
+++ b/net/ipv4/ipip.c
@@ -224,6 +224,8 @@ static netdev_tx_t ipip_tunnel_xmit(struct sk_buff *skb, struct net_device *dev)
224 if (IS_ERR(skb)) 224 if (IS_ERR(skb))
225 goto out; 225 goto out;
226 226
227 skb_set_inner_ipproto(skb, IPPROTO_IPIP);
228
227 ip_tunnel_xmit(skb, dev, tiph, tiph->protocol); 229 ip_tunnel_xmit(skb, dev, tiph, tiph->protocol);
228 return NETDEV_TX_OK; 230 return NETDEV_TX_OK;
229 231
@@ -287,7 +289,7 @@ static void ipip_tunnel_setup(struct net_device *dev)
287 dev->iflink = 0; 289 dev->iflink = 0;
288 dev->addr_len = 4; 290 dev->addr_len = 4;
289 dev->features |= NETIF_F_LLTX; 291 dev->features |= NETIF_F_LLTX;
290 dev->priv_flags &= ~IFF_XMIT_DST_RELEASE; 292 netif_keep_dst(dev);
291 293
292 dev->features |= IPIP_FEATURES; 294 dev->features |= IPIP_FEATURES;
293 dev->hw_features |= IPIP_FEATURES; 295 dev->hw_features |= IPIP_FEATURES;
@@ -301,7 +303,8 @@ static int ipip_tunnel_init(struct net_device *dev)
301 memcpy(dev->dev_addr, &tunnel->parms.iph.saddr, 4); 303 memcpy(dev->dev_addr, &tunnel->parms.iph.saddr, 4);
302 memcpy(dev->broadcast, &tunnel->parms.iph.daddr, 4); 304 memcpy(dev->broadcast, &tunnel->parms.iph.daddr, 4);
303 305
304 tunnel->hlen = 0; 306 tunnel->tun_hlen = 0;
307 tunnel->hlen = tunnel->tun_hlen + tunnel->encap_hlen;
305 tunnel->parms.iph.protocol = IPPROTO_IPIP; 308 tunnel->parms.iph.protocol = IPPROTO_IPIP;
306 return ip_tunnel_init(dev); 309 return ip_tunnel_init(dev);
307} 310}
@@ -340,10 +343,53 @@ static void ipip_netlink_parms(struct nlattr *data[],
340 parms->iph.frag_off = htons(IP_DF); 343 parms->iph.frag_off = htons(IP_DF);
341} 344}
342 345
346/* This function returns true when ENCAP attributes are present in the nl msg */
347static bool ipip_netlink_encap_parms(struct nlattr *data[],
348 struct ip_tunnel_encap *ipencap)
349{
350 bool ret = false;
351
352 memset(ipencap, 0, sizeof(*ipencap));
353
354 if (!data)
355 return ret;
356
357 if (data[IFLA_IPTUN_ENCAP_TYPE]) {
358 ret = true;
359 ipencap->type = nla_get_u16(data[IFLA_IPTUN_ENCAP_TYPE]);
360 }
361
362 if (data[IFLA_IPTUN_ENCAP_FLAGS]) {
363 ret = true;
364 ipencap->flags = nla_get_u16(data[IFLA_IPTUN_ENCAP_FLAGS]);
365 }
366
367 if (data[IFLA_IPTUN_ENCAP_SPORT]) {
368 ret = true;
369 ipencap->sport = nla_get_u16(data[IFLA_IPTUN_ENCAP_SPORT]);
370 }
371
372 if (data[IFLA_IPTUN_ENCAP_DPORT]) {
373 ret = true;
374 ipencap->dport = nla_get_u16(data[IFLA_IPTUN_ENCAP_DPORT]);
375 }
376
377 return ret;
378}
379
343static int ipip_newlink(struct net *src_net, struct net_device *dev, 380static int ipip_newlink(struct net *src_net, struct net_device *dev,
344 struct nlattr *tb[], struct nlattr *data[]) 381 struct nlattr *tb[], struct nlattr *data[])
345{ 382{
346 struct ip_tunnel_parm p; 383 struct ip_tunnel_parm p;
384 struct ip_tunnel_encap ipencap;
385
386 if (ipip_netlink_encap_parms(data, &ipencap)) {
387 struct ip_tunnel *t = netdev_priv(dev);
388 int err = ip_tunnel_encap_setup(t, &ipencap);
389
390 if (err < 0)
391 return err;
392 }
347 393
348 ipip_netlink_parms(data, &p); 394 ipip_netlink_parms(data, &p);
349 return ip_tunnel_newlink(dev, tb, &p); 395 return ip_tunnel_newlink(dev, tb, &p);
@@ -353,6 +399,15 @@ static int ipip_changelink(struct net_device *dev, struct nlattr *tb[],
353 struct nlattr *data[]) 399 struct nlattr *data[])
354{ 400{
355 struct ip_tunnel_parm p; 401 struct ip_tunnel_parm p;
402 struct ip_tunnel_encap ipencap;
403
404 if (ipip_netlink_encap_parms(data, &ipencap)) {
405 struct ip_tunnel *t = netdev_priv(dev);
406 int err = ip_tunnel_encap_setup(t, &ipencap);
407
408 if (err < 0)
409 return err;
410 }
356 411
357 ipip_netlink_parms(data, &p); 412 ipip_netlink_parms(data, &p);
358 413
@@ -378,6 +433,14 @@ static size_t ipip_get_size(const struct net_device *dev)
378 nla_total_size(1) + 433 nla_total_size(1) +
379 /* IFLA_IPTUN_PMTUDISC */ 434 /* IFLA_IPTUN_PMTUDISC */
380 nla_total_size(1) + 435 nla_total_size(1) +
436 /* IFLA_IPTUN_ENCAP_TYPE */
437 nla_total_size(2) +
438 /* IFLA_IPTUN_ENCAP_FLAGS */
439 nla_total_size(2) +
440 /* IFLA_IPTUN_ENCAP_SPORT */
441 nla_total_size(2) +
442 /* IFLA_IPTUN_ENCAP_DPORT */
443 nla_total_size(2) +
381 0; 444 0;
382} 445}
383 446
@@ -394,6 +457,17 @@ static int ipip_fill_info(struct sk_buff *skb, const struct net_device *dev)
394 nla_put_u8(skb, IFLA_IPTUN_PMTUDISC, 457 nla_put_u8(skb, IFLA_IPTUN_PMTUDISC,
395 !!(parm->iph.frag_off & htons(IP_DF)))) 458 !!(parm->iph.frag_off & htons(IP_DF))))
396 goto nla_put_failure; 459 goto nla_put_failure;
460
461 if (nla_put_u16(skb, IFLA_IPTUN_ENCAP_TYPE,
462 tunnel->encap.type) ||
463 nla_put_u16(skb, IFLA_IPTUN_ENCAP_SPORT,
464 tunnel->encap.sport) ||
465 nla_put_u16(skb, IFLA_IPTUN_ENCAP_DPORT,
466 tunnel->encap.dport) ||
467 nla_put_u16(skb, IFLA_IPTUN_ENCAP_FLAGS,
468 tunnel->encap.flags))
469 goto nla_put_failure;
470
397 return 0; 471 return 0;
398 472
399nla_put_failure: 473nla_put_failure:
@@ -407,6 +481,10 @@ static const struct nla_policy ipip_policy[IFLA_IPTUN_MAX + 1] = {
407 [IFLA_IPTUN_TTL] = { .type = NLA_U8 }, 481 [IFLA_IPTUN_TTL] = { .type = NLA_U8 },
408 [IFLA_IPTUN_TOS] = { .type = NLA_U8 }, 482 [IFLA_IPTUN_TOS] = { .type = NLA_U8 },
409 [IFLA_IPTUN_PMTUDISC] = { .type = NLA_U8 }, 483 [IFLA_IPTUN_PMTUDISC] = { .type = NLA_U8 },
484 [IFLA_IPTUN_ENCAP_TYPE] = { .type = NLA_U16 },
485 [IFLA_IPTUN_ENCAP_FLAGS] = { .type = NLA_U16 },
486 [IFLA_IPTUN_ENCAP_SPORT] = { .type = NLA_U16 },
487 [IFLA_IPTUN_ENCAP_DPORT] = { .type = NLA_U16 },
410}; 488};
411 489
412static struct rtnl_link_ops ipip_link_ops __read_mostly = { 490static struct rtnl_link_ops ipip_link_ops __read_mostly = {
diff --git a/net/ipv4/netfilter/Kconfig b/net/ipv4/netfilter/Kconfig
index 7cbcaf4f0194..4c019d5c3f57 100644
--- a/net/ipv4/netfilter/Kconfig
+++ b/net/ipv4/netfilter/Kconfig
@@ -61,18 +61,13 @@ config NFT_CHAIN_ROUTE_IPV4
61 fields such as the source, destination, type of service and 61 fields such as the source, destination, type of service and
62 the packet mark. 62 the packet mark.
63 63
64config NFT_CHAIN_NAT_IPV4 64config NF_REJECT_IPV4
65 depends on NF_TABLES_IPV4 65 tristate "IPv4 packet rejection"
66 depends on NF_NAT_IPV4 && NFT_NAT 66 default m if NETFILTER_ADVANCED=n
67 tristate "IPv4 nf_tables nat chain support"
68 help
69 This option enables the "nat" chain for IPv4 in nf_tables. This
70 chain type is used to perform Network Address Translation (NAT)
71 packet transformations such as the source, destination address and
72 source and destination ports.
73 67
74config NFT_REJECT_IPV4 68config NFT_REJECT_IPV4
75 depends on NF_TABLES_IPV4 69 depends on NF_TABLES_IPV4
70 select NF_REJECT_IPV4
76 default NFT_REJECT 71 default NFT_REJECT
77 tristate 72 tristate
78 73
@@ -94,6 +89,30 @@ config NF_NAT_IPV4
94 89
95if NF_NAT_IPV4 90if NF_NAT_IPV4
96 91
92config NFT_CHAIN_NAT_IPV4
93 depends on NF_TABLES_IPV4
94 tristate "IPv4 nf_tables nat chain support"
95 help
96 This option enables the "nat" chain for IPv4 in nf_tables. This
97 chain type is used to perform Network Address Translation (NAT)
98 packet transformations such as the source, destination address and
99 source and destination ports.
100
101config NF_NAT_MASQUERADE_IPV4
102 tristate "IPv4 masquerade support"
103 help
104 This is the kernel functionality to provide NAT in the masquerade
105 flavour (automatic source address selection).
106
107config NFT_MASQ_IPV4
108 tristate "IPv4 masquerading support for nf_tables"
109 depends on NF_TABLES_IPV4
110 depends on NFT_MASQ
111 select NF_NAT_MASQUERADE_IPV4
112 help
113 This is the expression that provides IPv4 masquerading support for
114 nf_tables.
115
97config NF_NAT_SNMP_BASIC 116config NF_NAT_SNMP_BASIC
98 tristate "Basic SNMP-ALG support" 117 tristate "Basic SNMP-ALG support"
99 depends on NF_CONNTRACK_SNMP 118 depends on NF_CONNTRACK_SNMP
@@ -194,6 +213,7 @@ config IP_NF_FILTER
194config IP_NF_TARGET_REJECT 213config IP_NF_TARGET_REJECT
195 tristate "REJECT target support" 214 tristate "REJECT target support"
196 depends on IP_NF_FILTER 215 depends on IP_NF_FILTER
216 select NF_REJECT_IPV4
197 default m if NETFILTER_ADVANCED=n 217 default m if NETFILTER_ADVANCED=n
198 help 218 help
199 The REJECT target allows a filtering rule to specify that an ICMP 219 The REJECT target allows a filtering rule to specify that an ICMP
@@ -234,6 +254,7 @@ if IP_NF_NAT
234 254
235config IP_NF_TARGET_MASQUERADE 255config IP_NF_TARGET_MASQUERADE
236 tristate "MASQUERADE target support" 256 tristate "MASQUERADE target support"
257 select NF_NAT_MASQUERADE_IPV4
237 default m if NETFILTER_ADVANCED=n 258 default m if NETFILTER_ADVANCED=n
238 help 259 help
239 Masquerading is a special case of NAT: all outgoing connections are 260 Masquerading is a special case of NAT: all outgoing connections are
diff --git a/net/ipv4/netfilter/Makefile b/net/ipv4/netfilter/Makefile
index edf4af32e9f2..f4cef5af0969 100644
--- a/net/ipv4/netfilter/Makefile
+++ b/net/ipv4/netfilter/Makefile
@@ -23,10 +23,14 @@ obj-$(CONFIG_NF_DEFRAG_IPV4) += nf_defrag_ipv4.o
23obj-$(CONFIG_NF_LOG_ARP) += nf_log_arp.o 23obj-$(CONFIG_NF_LOG_ARP) += nf_log_arp.o
24obj-$(CONFIG_NF_LOG_IPV4) += nf_log_ipv4.o 24obj-$(CONFIG_NF_LOG_IPV4) += nf_log_ipv4.o
25 25
26# reject
27obj-$(CONFIG_NF_REJECT_IPV4) += nf_reject_ipv4.o
28
26# NAT helpers (nf_conntrack) 29# NAT helpers (nf_conntrack)
27obj-$(CONFIG_NF_NAT_H323) += nf_nat_h323.o 30obj-$(CONFIG_NF_NAT_H323) += nf_nat_h323.o
28obj-$(CONFIG_NF_NAT_PPTP) += nf_nat_pptp.o 31obj-$(CONFIG_NF_NAT_PPTP) += nf_nat_pptp.o
29obj-$(CONFIG_NF_NAT_SNMP_BASIC) += nf_nat_snmp_basic.o 32obj-$(CONFIG_NF_NAT_SNMP_BASIC) += nf_nat_snmp_basic.o
33obj-$(CONFIG_NF_NAT_MASQUERADE_IPV4) += nf_nat_masquerade_ipv4.o
30 34
31# NAT protocols (nf_nat) 35# NAT protocols (nf_nat)
32obj-$(CONFIG_NF_NAT_PROTO_GRE) += nf_nat_proto_gre.o 36obj-$(CONFIG_NF_NAT_PROTO_GRE) += nf_nat_proto_gre.o
@@ -35,6 +39,7 @@ obj-$(CONFIG_NF_TABLES_IPV4) += nf_tables_ipv4.o
35obj-$(CONFIG_NFT_CHAIN_ROUTE_IPV4) += nft_chain_route_ipv4.o 39obj-$(CONFIG_NFT_CHAIN_ROUTE_IPV4) += nft_chain_route_ipv4.o
36obj-$(CONFIG_NFT_CHAIN_NAT_IPV4) += nft_chain_nat_ipv4.o 40obj-$(CONFIG_NFT_CHAIN_NAT_IPV4) += nft_chain_nat_ipv4.o
37obj-$(CONFIG_NFT_REJECT_IPV4) += nft_reject_ipv4.o 41obj-$(CONFIG_NFT_REJECT_IPV4) += nft_reject_ipv4.o
42obj-$(CONFIG_NFT_MASQ_IPV4) += nft_masq_ipv4.o
38obj-$(CONFIG_NF_TABLES_ARP) += nf_tables_arp.o 43obj-$(CONFIG_NF_TABLES_ARP) += nf_tables_arp.o
39 44
40# generic IP tables 45# generic IP tables
diff --git a/net/ipv4/netfilter/ipt_CLUSTERIP.c b/net/ipv4/netfilter/ipt_CLUSTERIP.c
index 2510c02c2d21..e90f83a3415b 100644
--- a/net/ipv4/netfilter/ipt_CLUSTERIP.c
+++ b/net/ipv4/netfilter/ipt_CLUSTERIP.c
@@ -285,7 +285,7 @@ clusterip_hashfn(const struct sk_buff *skb,
285 } 285 }
286 286
287 /* node numbers are 1..n, not 0..n */ 287 /* node numbers are 1..n, not 0..n */
288 return (((u64)hashval * config->num_total_nodes) >> 32) + 1; 288 return reciprocal_scale(hashval, config->num_total_nodes) + 1;
289} 289}
290 290
291static inline int 291static inline int
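Aside on the reciprocal_scale() conversion above: the helper is just the open-coded expression it replaces, taking the high 32 bits of a 64-bit product to map a 32-bit hash into [0, num_total_nodes) without a division. A minimal, hedged userspace sketch (names here are illustrative, not kernel code):

#include <stdint.h>

/* Sketch of what reciprocal_scale(val, ep_ro) computes: the high 32 bits
 * of the 64-bit product val * ep_ro, i.e. a value in [0, ep_ro). This
 * mirrors the removed "(((u64)hashval * config->num_total_nodes) >> 32)".
 */
static inline uint32_t reciprocal_scale_sketch(uint32_t val, uint32_t ep_ro)
{
	return (uint32_t)(((uint64_t)val * ep_ro) >> 32);
}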
diff --git a/net/ipv4/netfilter/ipt_MASQUERADE.c b/net/ipv4/netfilter/ipt_MASQUERADE.c
index 00352ce0f0de..da7f02a0b868 100644
--- a/net/ipv4/netfilter/ipt_MASQUERADE.c
+++ b/net/ipv4/netfilter/ipt_MASQUERADE.c
@@ -22,6 +22,7 @@
22#include <linux/netfilter_ipv4.h> 22#include <linux/netfilter_ipv4.h>
23#include <linux/netfilter/x_tables.h> 23#include <linux/netfilter/x_tables.h>
24#include <net/netfilter/nf_nat.h> 24#include <net/netfilter/nf_nat.h>
25#include <net/netfilter/ipv4/nf_nat_masquerade.h>
25 26
26MODULE_LICENSE("GPL"); 27MODULE_LICENSE("GPL");
27MODULE_AUTHOR("Netfilter Core Team <coreteam@netfilter.org>"); 28MODULE_AUTHOR("Netfilter Core Team <coreteam@netfilter.org>");
@@ -46,103 +47,17 @@ static int masquerade_tg_check(const struct xt_tgchk_param *par)
46static unsigned int 47static unsigned int
47masquerade_tg(struct sk_buff *skb, const struct xt_action_param *par) 48masquerade_tg(struct sk_buff *skb, const struct xt_action_param *par)
48{ 49{
49 struct nf_conn *ct; 50 struct nf_nat_range range;
50 struct nf_conn_nat *nat;
51 enum ip_conntrack_info ctinfo;
52 struct nf_nat_range newrange;
53 const struct nf_nat_ipv4_multi_range_compat *mr; 51 const struct nf_nat_ipv4_multi_range_compat *mr;
54 const struct rtable *rt;
55 __be32 newsrc, nh;
56
57 NF_CT_ASSERT(par->hooknum == NF_INET_POST_ROUTING);
58
59 ct = nf_ct_get(skb, &ctinfo);
60 nat = nfct_nat(ct);
61
62 NF_CT_ASSERT(ct && (ctinfo == IP_CT_NEW || ctinfo == IP_CT_RELATED ||
63 ctinfo == IP_CT_RELATED_REPLY));
64
65 /* Source address is 0.0.0.0 - locally generated packet that is
66 * probably not supposed to be masqueraded.
67 */
68 if (ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple.src.u3.ip == 0)
69 return NF_ACCEPT;
70 52
71 mr = par->targinfo; 53 mr = par->targinfo;
72 rt = skb_rtable(skb); 54 range.flags = mr->range[0].flags;
73 nh = rt_nexthop(rt, ip_hdr(skb)->daddr); 55 range.min_proto = mr->range[0].min;
74 newsrc = inet_select_addr(par->out, nh, RT_SCOPE_UNIVERSE); 56 range.max_proto = mr->range[0].max;
75 if (!newsrc) {
76 pr_info("%s ate my IP address\n", par->out->name);
77 return NF_DROP;
78 }
79
80 nat->masq_index = par->out->ifindex;
81
82 /* Transfer from original range. */
83 memset(&newrange.min_addr, 0, sizeof(newrange.min_addr));
84 memset(&newrange.max_addr, 0, sizeof(newrange.max_addr));
85 newrange.flags = mr->range[0].flags | NF_NAT_RANGE_MAP_IPS;
86 newrange.min_addr.ip = newsrc;
87 newrange.max_addr.ip = newsrc;
88 newrange.min_proto = mr->range[0].min;
89 newrange.max_proto = mr->range[0].max;
90 57
91 /* Hand modified range to generic setup. */ 58 return nf_nat_masquerade_ipv4(skb, par->hooknum, &range, par->out);
92 return nf_nat_setup_info(ct, &newrange, NF_NAT_MANIP_SRC);
93} 59}
94 60
95static int
96device_cmp(struct nf_conn *i, void *ifindex)
97{
98 const struct nf_conn_nat *nat = nfct_nat(i);
99
100 if (!nat)
101 return 0;
102 if (nf_ct_l3num(i) != NFPROTO_IPV4)
103 return 0;
104 return nat->masq_index == (int)(long)ifindex;
105}
106
107static int masq_device_event(struct notifier_block *this,
108 unsigned long event,
109 void *ptr)
110{
111 const struct net_device *dev = netdev_notifier_info_to_dev(ptr);
112 struct net *net = dev_net(dev);
113
114 if (event == NETDEV_DOWN) {
115 /* Device was downed. Search entire table for
116 conntracks which were associated with that device,
117 and forget them. */
118 NF_CT_ASSERT(dev->ifindex != 0);
119
120 nf_ct_iterate_cleanup(net, device_cmp,
121 (void *)(long)dev->ifindex, 0, 0);
122 }
123
124 return NOTIFY_DONE;
125}
126
127static int masq_inet_event(struct notifier_block *this,
128 unsigned long event,
129 void *ptr)
130{
131 struct net_device *dev = ((struct in_ifaddr *)ptr)->ifa_dev->dev;
132 struct netdev_notifier_info info;
133
134 netdev_notifier_info_init(&info, dev);
135 return masq_device_event(this, event, &info);
136}
137
138static struct notifier_block masq_dev_notifier = {
139 .notifier_call = masq_device_event,
140};
141
142static struct notifier_block masq_inet_notifier = {
143 .notifier_call = masq_inet_event,
144};
145
146static struct xt_target masquerade_tg_reg __read_mostly = { 61static struct xt_target masquerade_tg_reg __read_mostly = {
147 .name = "MASQUERADE", 62 .name = "MASQUERADE",
148 .family = NFPROTO_IPV4, 63 .family = NFPROTO_IPV4,
@@ -160,12 +75,8 @@ static int __init masquerade_tg_init(void)
160 75
161 ret = xt_register_target(&masquerade_tg_reg); 76 ret = xt_register_target(&masquerade_tg_reg);
162 77
163 if (ret == 0) { 78 if (ret == 0)
164 /* Register for device down reports */ 79 nf_nat_masquerade_ipv4_register_notifier();
165 register_netdevice_notifier(&masq_dev_notifier);
166 /* Register IP address change reports */
167 register_inetaddr_notifier(&masq_inet_notifier);
168 }
169 80
170 return ret; 81 return ret;
171} 82}
@@ -173,8 +84,7 @@ static int __init masquerade_tg_init(void)
173static void __exit masquerade_tg_exit(void) 84static void __exit masquerade_tg_exit(void)
174{ 85{
175 xt_unregister_target(&masquerade_tg_reg); 86 xt_unregister_target(&masquerade_tg_reg);
176 unregister_netdevice_notifier(&masq_dev_notifier); 87 nf_nat_masquerade_ipv4_unregister_notifier();
177 unregister_inetaddr_notifier(&masq_inet_notifier);
178} 88}
179 89
180module_init(masquerade_tg_init); 90module_init(masquerade_tg_init);
diff --git a/net/ipv4/netfilter/ipt_REJECT.c b/net/ipv4/netfilter/ipt_REJECT.c
index 5b6e0df4ccff..8f48f5517e33 100644
--- a/net/ipv4/netfilter/ipt_REJECT.c
+++ b/net/ipv4/netfilter/ipt_REJECT.c
@@ -20,7 +20,7 @@
20#include <linux/netfilter/x_tables.h> 20#include <linux/netfilter/x_tables.h>
21#include <linux/netfilter_ipv4/ip_tables.h> 21#include <linux/netfilter_ipv4/ip_tables.h>
22#include <linux/netfilter_ipv4/ipt_REJECT.h> 22#include <linux/netfilter_ipv4/ipt_REJECT.h>
23#ifdef CONFIG_BRIDGE_NETFILTER 23#if IS_ENABLED(CONFIG_BRIDGE_NETFILTER)
24#include <linux/netfilter_bridge.h> 24#include <linux/netfilter_bridge.h>
25#endif 25#endif
26 26
diff --git a/net/ipv4/netfilter/iptable_nat.c b/net/ipv4/netfilter/iptable_nat.c
index f1787c04a4dd..6b67d7e9a75d 100644
--- a/net/ipv4/netfilter/iptable_nat.c
+++ b/net/ipv4/netfilter/iptable_nat.c
@@ -28,222 +28,57 @@ static const struct xt_table nf_nat_ipv4_table = {
28 .af = NFPROTO_IPV4, 28 .af = NFPROTO_IPV4,
29}; 29};
30 30
31static unsigned int alloc_null_binding(struct nf_conn *ct, unsigned int hooknum) 31static unsigned int iptable_nat_do_chain(const struct nf_hook_ops *ops,
32{ 32 struct sk_buff *skb,
33 /* Force range to this IP; let proto decide mapping for 33 const struct net_device *in,
34 * per-proto parts (hence not IP_NAT_RANGE_PROTO_SPECIFIED). 34 const struct net_device *out,
35 */ 35 struct nf_conn *ct)
36 struct nf_nat_range range;
37
38 range.flags = 0;
39 pr_debug("Allocating NULL binding for %p (%pI4)\n", ct,
40 HOOK2MANIP(hooknum) == NF_NAT_MANIP_SRC ?
41 &ct->tuplehash[IP_CT_DIR_REPLY].tuple.dst.u3.ip :
42 &ct->tuplehash[IP_CT_DIR_REPLY].tuple.src.u3.ip);
43
44 return nf_nat_setup_info(ct, &range, HOOK2MANIP(hooknum));
45}
46
47static unsigned int nf_nat_rule_find(struct sk_buff *skb, unsigned int hooknum,
48 const struct net_device *in,
49 const struct net_device *out,
50 struct nf_conn *ct)
51{ 36{
52 struct net *net = nf_ct_net(ct); 37 struct net *net = nf_ct_net(ct);
53 unsigned int ret;
54 38
55 ret = ipt_do_table(skb, hooknum, in, out, net->ipv4.nat_table); 39 return ipt_do_table(skb, ops->hooknum, in, out, net->ipv4.nat_table);
56 if (ret == NF_ACCEPT) {
57 if (!nf_nat_initialized(ct, HOOK2MANIP(hooknum)))
58 ret = alloc_null_binding(ct, hooknum);
59 }
60 return ret;
61} 40}
62 41
63static unsigned int 42static unsigned int iptable_nat_ipv4_fn(const struct nf_hook_ops *ops,
64nf_nat_ipv4_fn(const struct nf_hook_ops *ops, 43 struct sk_buff *skb,
65 struct sk_buff *skb, 44 const struct net_device *in,
66 const struct net_device *in, 45 const struct net_device *out,
67 const struct net_device *out, 46 int (*okfn)(struct sk_buff *))
68 int (*okfn)(struct sk_buff *))
69{ 47{
70 struct nf_conn *ct; 48 return nf_nat_ipv4_fn(ops, skb, in, out, iptable_nat_do_chain);
71 enum ip_conntrack_info ctinfo;
72 struct nf_conn_nat *nat;
73 /* maniptype == SRC for postrouting. */
74 enum nf_nat_manip_type maniptype = HOOK2MANIP(ops->hooknum);
75
76 /* We never see fragments: conntrack defrags on pre-routing
77 * and local-out, and nf_nat_out protects post-routing.
78 */
79 NF_CT_ASSERT(!ip_is_fragment(ip_hdr(skb)));
80
81 ct = nf_ct_get(skb, &ctinfo);
82 /* Can't track? It's not due to stress, or conntrack would
83 * have dropped it. Hence it's the user's responsibilty to
84 * packet filter it out, or implement conntrack/NAT for that
85 * protocol. 8) --RR
86 */
87 if (!ct)
88 return NF_ACCEPT;
89
90 /* Don't try to NAT if this packet is not conntracked */
91 if (nf_ct_is_untracked(ct))
92 return NF_ACCEPT;
93
94 nat = nf_ct_nat_ext_add(ct);
95 if (nat == NULL)
96 return NF_ACCEPT;
97
98 switch (ctinfo) {
99 case IP_CT_RELATED:
100 case IP_CT_RELATED_REPLY:
101 if (ip_hdr(skb)->protocol == IPPROTO_ICMP) {
102 if (!nf_nat_icmp_reply_translation(skb, ct, ctinfo,
103 ops->hooknum))
104 return NF_DROP;
105 else
106 return NF_ACCEPT;
107 }
108 /* Fall thru... (Only ICMPs can be IP_CT_IS_REPLY) */
109 case IP_CT_NEW:
110 /* Seen it before? This can happen for loopback, retrans,
111 * or local packets.
112 */
113 if (!nf_nat_initialized(ct, maniptype)) {
114 unsigned int ret;
115
116 ret = nf_nat_rule_find(skb, ops->hooknum, in, out, ct);
117 if (ret != NF_ACCEPT)
118 return ret;
119 } else {
120 pr_debug("Already setup manip %s for ct %p\n",
121 maniptype == NF_NAT_MANIP_SRC ? "SRC" : "DST",
122 ct);
123 if (nf_nat_oif_changed(ops->hooknum, ctinfo, nat, out))
124 goto oif_changed;
125 }
126 break;
127
128 default:
129 /* ESTABLISHED */
130 NF_CT_ASSERT(ctinfo == IP_CT_ESTABLISHED ||
131 ctinfo == IP_CT_ESTABLISHED_REPLY);
132 if (nf_nat_oif_changed(ops->hooknum, ctinfo, nat, out))
133 goto oif_changed;
134 }
135
136 return nf_nat_packet(ct, ctinfo, ops->hooknum, skb);
137
138oif_changed:
139 nf_ct_kill_acct(ct, ctinfo, skb);
140 return NF_DROP;
141} 49}
142 50
143static unsigned int 51static unsigned int iptable_nat_ipv4_in(const struct nf_hook_ops *ops,
144nf_nat_ipv4_in(const struct nf_hook_ops *ops, 52 struct sk_buff *skb,
145 struct sk_buff *skb, 53 const struct net_device *in,
146 const struct net_device *in, 54 const struct net_device *out,
147 const struct net_device *out, 55 int (*okfn)(struct sk_buff *))
148 int (*okfn)(struct sk_buff *))
149{ 56{
150 unsigned int ret; 57 return nf_nat_ipv4_in(ops, skb, in, out, iptable_nat_do_chain);
151 __be32 daddr = ip_hdr(skb)->daddr;
152
153 ret = nf_nat_ipv4_fn(ops, skb, in, out, okfn);
154 if (ret != NF_DROP && ret != NF_STOLEN &&
155 daddr != ip_hdr(skb)->daddr)
156 skb_dst_drop(skb);
157
158 return ret;
159} 58}
160 59
161static unsigned int 60static unsigned int iptable_nat_ipv4_out(const struct nf_hook_ops *ops,
162nf_nat_ipv4_out(const struct nf_hook_ops *ops, 61 struct sk_buff *skb,
163 struct sk_buff *skb, 62 const struct net_device *in,
164 const struct net_device *in, 63 const struct net_device *out,
165 const struct net_device *out, 64 int (*okfn)(struct sk_buff *))
166 int (*okfn)(struct sk_buff *))
167{ 65{
168#ifdef CONFIG_XFRM 66 return nf_nat_ipv4_out(ops, skb, in, out, iptable_nat_do_chain);
169 const struct nf_conn *ct;
170 enum ip_conntrack_info ctinfo;
171 int err;
172#endif
173 unsigned int ret;
174
175 /* root is playing with raw sockets. */
176 if (skb->len < sizeof(struct iphdr) ||
177 ip_hdrlen(skb) < sizeof(struct iphdr))
178 return NF_ACCEPT;
179
180 ret = nf_nat_ipv4_fn(ops, skb, in, out, okfn);
181#ifdef CONFIG_XFRM
182 if (ret != NF_DROP && ret != NF_STOLEN &&
183 !(IPCB(skb)->flags & IPSKB_XFRM_TRANSFORMED) &&
184 (ct = nf_ct_get(skb, &ctinfo)) != NULL) {
185 enum ip_conntrack_dir dir = CTINFO2DIR(ctinfo);
186
187 if ((ct->tuplehash[dir].tuple.src.u3.ip !=
188 ct->tuplehash[!dir].tuple.dst.u3.ip) ||
189 (ct->tuplehash[dir].tuple.dst.protonum != IPPROTO_ICMP &&
190 ct->tuplehash[dir].tuple.src.u.all !=
191 ct->tuplehash[!dir].tuple.dst.u.all)) {
192 err = nf_xfrm_me_harder(skb, AF_INET);
193 if (err < 0)
194 ret = NF_DROP_ERR(err);
195 }
196 }
197#endif
198 return ret;
199} 67}
200 68
201static unsigned int 69static unsigned int iptable_nat_ipv4_local_fn(const struct nf_hook_ops *ops,
202nf_nat_ipv4_local_fn(const struct nf_hook_ops *ops, 70 struct sk_buff *skb,
203 struct sk_buff *skb, 71 const struct net_device *in,
204 const struct net_device *in, 72 const struct net_device *out,
205 const struct net_device *out, 73 int (*okfn)(struct sk_buff *))
206 int (*okfn)(struct sk_buff *))
207{ 74{
208 const struct nf_conn *ct; 75 return nf_nat_ipv4_local_fn(ops, skb, in, out, iptable_nat_do_chain);
209 enum ip_conntrack_info ctinfo;
210 unsigned int ret;
211 int err;
212
213 /* root is playing with raw sockets. */
214 if (skb->len < sizeof(struct iphdr) ||
215 ip_hdrlen(skb) < sizeof(struct iphdr))
216 return NF_ACCEPT;
217
218 ret = nf_nat_ipv4_fn(ops, skb, in, out, okfn);
219 if (ret != NF_DROP && ret != NF_STOLEN &&
220 (ct = nf_ct_get(skb, &ctinfo)) != NULL) {
221 enum ip_conntrack_dir dir = CTINFO2DIR(ctinfo);
222
223 if (ct->tuplehash[dir].tuple.dst.u3.ip !=
224 ct->tuplehash[!dir].tuple.src.u3.ip) {
225 err = ip_route_me_harder(skb, RTN_UNSPEC);
226 if (err < 0)
227 ret = NF_DROP_ERR(err);
228 }
229#ifdef CONFIG_XFRM
230 else if (!(IPCB(skb)->flags & IPSKB_XFRM_TRANSFORMED) &&
231 ct->tuplehash[dir].tuple.dst.protonum != IPPROTO_ICMP &&
232 ct->tuplehash[dir].tuple.dst.u.all !=
233 ct->tuplehash[!dir].tuple.src.u.all) {
234 err = nf_xfrm_me_harder(skb, AF_INET);
235 if (err < 0)
236 ret = NF_DROP_ERR(err);
237 }
238#endif
239 }
240 return ret;
241} 76}
242 77
243static struct nf_hook_ops nf_nat_ipv4_ops[] __read_mostly = { 78static struct nf_hook_ops nf_nat_ipv4_ops[] __read_mostly = {
244 /* Before packet filtering, change destination */ 79 /* Before packet filtering, change destination */
245 { 80 {
246 .hook = nf_nat_ipv4_in, 81 .hook = iptable_nat_ipv4_in,
247 .owner = THIS_MODULE, 82 .owner = THIS_MODULE,
248 .pf = NFPROTO_IPV4, 83 .pf = NFPROTO_IPV4,
249 .hooknum = NF_INET_PRE_ROUTING, 84 .hooknum = NF_INET_PRE_ROUTING,
@@ -251,7 +86,7 @@ static struct nf_hook_ops nf_nat_ipv4_ops[] __read_mostly = {
251 }, 86 },
252 /* After packet filtering, change source */ 87 /* After packet filtering, change source */
253 { 88 {
254 .hook = nf_nat_ipv4_out, 89 .hook = iptable_nat_ipv4_out,
255 .owner = THIS_MODULE, 90 .owner = THIS_MODULE,
256 .pf = NFPROTO_IPV4, 91 .pf = NFPROTO_IPV4,
257 .hooknum = NF_INET_POST_ROUTING, 92 .hooknum = NF_INET_POST_ROUTING,
@@ -259,7 +94,7 @@ static struct nf_hook_ops nf_nat_ipv4_ops[] __read_mostly = {
259 }, 94 },
260 /* Before packet filtering, change destination */ 95 /* Before packet filtering, change destination */
261 { 96 {
262 .hook = nf_nat_ipv4_local_fn, 97 .hook = iptable_nat_ipv4_local_fn,
263 .owner = THIS_MODULE, 98 .owner = THIS_MODULE,
264 .pf = NFPROTO_IPV4, 99 .pf = NFPROTO_IPV4,
265 .hooknum = NF_INET_LOCAL_OUT, 100 .hooknum = NF_INET_LOCAL_OUT,
@@ -267,7 +102,7 @@ static struct nf_hook_ops nf_nat_ipv4_ops[] __read_mostly = {
267 }, 102 },
268 /* After packet filtering, change source */ 103 /* After packet filtering, change source */
269 { 104 {
270 .hook = nf_nat_ipv4_fn, 105 .hook = iptable_nat_ipv4_fn,
271 .owner = THIS_MODULE, 106 .owner = THIS_MODULE,
272 .pf = NFPROTO_IPV4, 107 .pf = NFPROTO_IPV4,
273 .hooknum = NF_INET_LOCAL_IN, 108 .hooknum = NF_INET_LOCAL_IN,
diff --git a/net/ipv4/netfilter/nf_defrag_ipv4.c b/net/ipv4/netfilter/nf_defrag_ipv4.c
index 76bd1aef257f..7e5ca6f2d0cd 100644
--- a/net/ipv4/netfilter/nf_defrag_ipv4.c
+++ b/net/ipv4/netfilter/nf_defrag_ipv4.c
@@ -50,7 +50,7 @@ static enum ip_defrag_users nf_ct_defrag_user(unsigned int hooknum,
50 zone = nf_ct_zone((struct nf_conn *)skb->nfct); 50 zone = nf_ct_zone((struct nf_conn *)skb->nfct);
51#endif 51#endif
52 52
53#ifdef CONFIG_BRIDGE_NETFILTER 53#if IS_ENABLED(CONFIG_BRIDGE_NETFILTER)
54 if (skb->nf_bridge && 54 if (skb->nf_bridge &&
55 skb->nf_bridge->mask & BRNF_NF_BRIDGE_PREROUTING) 55 skb->nf_bridge->mask & BRNF_NF_BRIDGE_PREROUTING)
56 return IP_DEFRAG_CONNTRACK_BRIDGE_IN + zone; 56 return IP_DEFRAG_CONNTRACK_BRIDGE_IN + zone;
diff --git a/net/ipv4/netfilter/nf_nat_l3proto_ipv4.c b/net/ipv4/netfilter/nf_nat_l3proto_ipv4.c
index 14f5ccd06337..fc37711e11f3 100644
--- a/net/ipv4/netfilter/nf_nat_l3proto_ipv4.c
+++ b/net/ipv4/netfilter/nf_nat_l3proto_ipv4.c
@@ -254,6 +254,205 @@ int nf_nat_icmp_reply_translation(struct sk_buff *skb,
254} 254}
255EXPORT_SYMBOL_GPL(nf_nat_icmp_reply_translation); 255EXPORT_SYMBOL_GPL(nf_nat_icmp_reply_translation);
256 256
257unsigned int
258nf_nat_ipv4_fn(const struct nf_hook_ops *ops, struct sk_buff *skb,
259 const struct net_device *in, const struct net_device *out,
260 unsigned int (*do_chain)(const struct nf_hook_ops *ops,
261 struct sk_buff *skb,
262 const struct net_device *in,
263 const struct net_device *out,
264 struct nf_conn *ct))
265{
266 struct nf_conn *ct;
267 enum ip_conntrack_info ctinfo;
268 struct nf_conn_nat *nat;
269 /* maniptype == SRC for postrouting. */
270 enum nf_nat_manip_type maniptype = HOOK2MANIP(ops->hooknum);
271
272 /* We never see fragments: conntrack defrags on pre-routing
273 * and local-out, and nf_nat_out protects post-routing.
274 */
275 NF_CT_ASSERT(!ip_is_fragment(ip_hdr(skb)));
276
277 ct = nf_ct_get(skb, &ctinfo);
278 /* Can't track? It's not due to stress, or conntrack would
279 	 * have dropped it. Hence it's the user's responsibility to
280 * packet filter it out, or implement conntrack/NAT for that
281 * protocol. 8) --RR
282 */
283 if (!ct)
284 return NF_ACCEPT;
285
286 /* Don't try to NAT if this packet is not conntracked */
287 if (nf_ct_is_untracked(ct))
288 return NF_ACCEPT;
289
290 nat = nf_ct_nat_ext_add(ct);
291 if (nat == NULL)
292 return NF_ACCEPT;
293
294 switch (ctinfo) {
295 case IP_CT_RELATED:
296 case IP_CT_RELATED_REPLY:
297 if (ip_hdr(skb)->protocol == IPPROTO_ICMP) {
298 if (!nf_nat_icmp_reply_translation(skb, ct, ctinfo,
299 ops->hooknum))
300 return NF_DROP;
301 else
302 return NF_ACCEPT;
303 }
304 /* Fall thru... (Only ICMPs can be IP_CT_IS_REPLY) */
305 case IP_CT_NEW:
306 /* Seen it before? This can happen for loopback, retrans,
307 * or local packets.
308 */
309 if (!nf_nat_initialized(ct, maniptype)) {
310 unsigned int ret;
311
312 ret = do_chain(ops, skb, in, out, ct);
313 if (ret != NF_ACCEPT)
314 return ret;
315
316 if (nf_nat_initialized(ct, HOOK2MANIP(ops->hooknum)))
317 break;
318
319 ret = nf_nat_alloc_null_binding(ct, ops->hooknum);
320 if (ret != NF_ACCEPT)
321 return ret;
322 } else {
323 pr_debug("Already setup manip %s for ct %p\n",
324 maniptype == NF_NAT_MANIP_SRC ? "SRC" : "DST",
325 ct);
326 if (nf_nat_oif_changed(ops->hooknum, ctinfo, nat, out))
327 goto oif_changed;
328 }
329 break;
330
331 default:
332 /* ESTABLISHED */
333 NF_CT_ASSERT(ctinfo == IP_CT_ESTABLISHED ||
334 ctinfo == IP_CT_ESTABLISHED_REPLY);
335 if (nf_nat_oif_changed(ops->hooknum, ctinfo, nat, out))
336 goto oif_changed;
337 }
338
339 return nf_nat_packet(ct, ctinfo, ops->hooknum, skb);
340
341oif_changed:
342 nf_ct_kill_acct(ct, ctinfo, skb);
343 return NF_DROP;
344}
345EXPORT_SYMBOL_GPL(nf_nat_ipv4_fn);
346
347unsigned int
348nf_nat_ipv4_in(const struct nf_hook_ops *ops, struct sk_buff *skb,
349 const struct net_device *in, const struct net_device *out,
350 unsigned int (*do_chain)(const struct nf_hook_ops *ops,
351 struct sk_buff *skb,
352 const struct net_device *in,
353 const struct net_device *out,
354 struct nf_conn *ct))
355{
356 unsigned int ret;
357 __be32 daddr = ip_hdr(skb)->daddr;
358
359 ret = nf_nat_ipv4_fn(ops, skb, in, out, do_chain);
360 if (ret != NF_DROP && ret != NF_STOLEN &&
361 daddr != ip_hdr(skb)->daddr)
362 skb_dst_drop(skb);
363
364 return ret;
365}
366EXPORT_SYMBOL_GPL(nf_nat_ipv4_in);
367
368unsigned int
369nf_nat_ipv4_out(const struct nf_hook_ops *ops, struct sk_buff *skb,
370 const struct net_device *in, const struct net_device *out,
371 unsigned int (*do_chain)(const struct nf_hook_ops *ops,
372 struct sk_buff *skb,
373 const struct net_device *in,
374 const struct net_device *out,
375 struct nf_conn *ct))
376{
377#ifdef CONFIG_XFRM
378 const struct nf_conn *ct;
379 enum ip_conntrack_info ctinfo;
380 int err;
381#endif
382 unsigned int ret;
383
384 /* root is playing with raw sockets. */
385 if (skb->len < sizeof(struct iphdr) ||
386 ip_hdrlen(skb) < sizeof(struct iphdr))
387 return NF_ACCEPT;
388
389 ret = nf_nat_ipv4_fn(ops, skb, in, out, do_chain);
390#ifdef CONFIG_XFRM
391 if (ret != NF_DROP && ret != NF_STOLEN &&
392 !(IPCB(skb)->flags & IPSKB_XFRM_TRANSFORMED) &&
393 (ct = nf_ct_get(skb, &ctinfo)) != NULL) {
394 enum ip_conntrack_dir dir = CTINFO2DIR(ctinfo);
395
396 if ((ct->tuplehash[dir].tuple.src.u3.ip !=
397 ct->tuplehash[!dir].tuple.dst.u3.ip) ||
398 (ct->tuplehash[dir].tuple.dst.protonum != IPPROTO_ICMP &&
399 ct->tuplehash[dir].tuple.src.u.all !=
400 ct->tuplehash[!dir].tuple.dst.u.all)) {
401 err = nf_xfrm_me_harder(skb, AF_INET);
402 if (err < 0)
403 ret = NF_DROP_ERR(err);
404 }
405 }
406#endif
407 return ret;
408}
409EXPORT_SYMBOL_GPL(nf_nat_ipv4_out);
410
411unsigned int
412nf_nat_ipv4_local_fn(const struct nf_hook_ops *ops, struct sk_buff *skb,
413 const struct net_device *in, const struct net_device *out,
414 unsigned int (*do_chain)(const struct nf_hook_ops *ops,
415 struct sk_buff *skb,
416 const struct net_device *in,
417 const struct net_device *out,
418 struct nf_conn *ct))
419{
420 const struct nf_conn *ct;
421 enum ip_conntrack_info ctinfo;
422 unsigned int ret;
423 int err;
424
425 /* root is playing with raw sockets. */
426 if (skb->len < sizeof(struct iphdr) ||
427 ip_hdrlen(skb) < sizeof(struct iphdr))
428 return NF_ACCEPT;
429
430 ret = nf_nat_ipv4_fn(ops, skb, in, out, do_chain);
431 if (ret != NF_DROP && ret != NF_STOLEN &&
432 (ct = nf_ct_get(skb, &ctinfo)) != NULL) {
433 enum ip_conntrack_dir dir = CTINFO2DIR(ctinfo);
434
435 if (ct->tuplehash[dir].tuple.dst.u3.ip !=
436 ct->tuplehash[!dir].tuple.src.u3.ip) {
437 err = ip_route_me_harder(skb, RTN_UNSPEC);
438 if (err < 0)
439 ret = NF_DROP_ERR(err);
440 }
441#ifdef CONFIG_XFRM
442 else if (!(IPCB(skb)->flags & IPSKB_XFRM_TRANSFORMED) &&
443 ct->tuplehash[dir].tuple.dst.protonum != IPPROTO_ICMP &&
444 ct->tuplehash[dir].tuple.dst.u.all !=
445 ct->tuplehash[!dir].tuple.src.u.all) {
446 err = nf_xfrm_me_harder(skb, AF_INET);
447 if (err < 0)
448 ret = NF_DROP_ERR(err);
449 }
450#endif
451 }
452 return ret;
453}
454EXPORT_SYMBOL_GPL(nf_nat_ipv4_local_fn);
455
257static int __init nf_nat_l3proto_ipv4_init(void) 456static int __init nf_nat_l3proto_ipv4_init(void)
258{ 457{
259 int err; 458 int err;
diff --git a/net/ipv4/netfilter/nf_nat_masquerade_ipv4.c b/net/ipv4/netfilter/nf_nat_masquerade_ipv4.c
new file mode 100644
index 000000000000..c6eb42100e9a
--- /dev/null
+++ b/net/ipv4/netfilter/nf_nat_masquerade_ipv4.c
@@ -0,0 +1,153 @@
1/* (C) 1999-2001 Paul `Rusty' Russell
2 * (C) 2002-2006 Netfilter Core Team <coreteam@netfilter.org>
3 *
4 * This program is free software; you can redistribute it and/or modify
5 * it under the terms of the GNU General Public License version 2 as
6 * published by the Free Software Foundation.
7 */
8
9#include <linux/types.h>
10#include <linux/module.h>
11#include <linux/atomic.h>
12#include <linux/inetdevice.h>
13#include <linux/ip.h>
14#include <linux/timer.h>
15#include <linux/netfilter.h>
16#include <net/protocol.h>
17#include <net/ip.h>
18#include <net/checksum.h>
19#include <net/route.h>
20#include <linux/netfilter_ipv4.h>
21#include <linux/netfilter/x_tables.h>
22#include <net/netfilter/nf_nat.h>
23#include <net/netfilter/ipv4/nf_nat_masquerade.h>
24
25unsigned int
26nf_nat_masquerade_ipv4(struct sk_buff *skb, unsigned int hooknum,
27 const struct nf_nat_range *range,
28 const struct net_device *out)
29{
30 struct nf_conn *ct;
31 struct nf_conn_nat *nat;
32 enum ip_conntrack_info ctinfo;
33 struct nf_nat_range newrange;
34 const struct rtable *rt;
35 __be32 newsrc, nh;
36
37 NF_CT_ASSERT(hooknum == NF_INET_POST_ROUTING);
38
39 ct = nf_ct_get(skb, &ctinfo);
40 nat = nfct_nat(ct);
41
42 NF_CT_ASSERT(ct && (ctinfo == IP_CT_NEW || ctinfo == IP_CT_RELATED ||
43 ctinfo == IP_CT_RELATED_REPLY));
44
45 /* Source address is 0.0.0.0 - locally generated packet that is
46 * probably not supposed to be masqueraded.
47 */
48 if (ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple.src.u3.ip == 0)
49 return NF_ACCEPT;
50
51 rt = skb_rtable(skb);
52 nh = rt_nexthop(rt, ip_hdr(skb)->daddr);
53 newsrc = inet_select_addr(out, nh, RT_SCOPE_UNIVERSE);
54 if (!newsrc) {
55 pr_info("%s ate my IP address\n", out->name);
56 return NF_DROP;
57 }
58
59 nat->masq_index = out->ifindex;
60
61 /* Transfer from original range. */
62 memset(&newrange.min_addr, 0, sizeof(newrange.min_addr));
63 memset(&newrange.max_addr, 0, sizeof(newrange.max_addr));
64 newrange.flags = range->flags | NF_NAT_RANGE_MAP_IPS;
65 newrange.min_addr.ip = newsrc;
66 newrange.max_addr.ip = newsrc;
67 newrange.min_proto = range->min_proto;
68 newrange.max_proto = range->max_proto;
69
70 /* Hand modified range to generic setup. */
71 return nf_nat_setup_info(ct, &newrange, NF_NAT_MANIP_SRC);
72}
73EXPORT_SYMBOL_GPL(nf_nat_masquerade_ipv4);
74
75static int device_cmp(struct nf_conn *i, void *ifindex)
76{
77 const struct nf_conn_nat *nat = nfct_nat(i);
78
79 if (!nat)
80 return 0;
81 if (nf_ct_l3num(i) != NFPROTO_IPV4)
82 return 0;
83 return nat->masq_index == (int)(long)ifindex;
84}
85
86static int masq_device_event(struct notifier_block *this,
87 unsigned long event,
88 void *ptr)
89{
90 const struct net_device *dev = netdev_notifier_info_to_dev(ptr);
91 struct net *net = dev_net(dev);
92
93 if (event == NETDEV_DOWN) {
94 /* Device was downed. Search entire table for
95 * conntracks which were associated with that device,
96 * and forget them.
97 */
98 NF_CT_ASSERT(dev->ifindex != 0);
99
100 nf_ct_iterate_cleanup(net, device_cmp,
101 (void *)(long)dev->ifindex, 0, 0);
102 }
103
104 return NOTIFY_DONE;
105}
106
107static int masq_inet_event(struct notifier_block *this,
108 unsigned long event,
109 void *ptr)
110{
111 struct net_device *dev = ((struct in_ifaddr *)ptr)->ifa_dev->dev;
112 struct netdev_notifier_info info;
113
114 netdev_notifier_info_init(&info, dev);
115 return masq_device_event(this, event, &info);
116}
117
118static struct notifier_block masq_dev_notifier = {
119 .notifier_call = masq_device_event,
120};
121
122static struct notifier_block masq_inet_notifier = {
123 .notifier_call = masq_inet_event,
124};
125
126static atomic_t masquerade_notifier_refcount = ATOMIC_INIT(0);
127
128void nf_nat_masquerade_ipv4_register_notifier(void)
129{
130 /* check if the notifier was already set */
131 if (atomic_inc_return(&masquerade_notifier_refcount) > 1)
132 return;
133
134 /* Register for device down reports */
135 register_netdevice_notifier(&masq_dev_notifier);
136 /* Register IP address change reports */
137 register_inetaddr_notifier(&masq_inet_notifier);
138}
139EXPORT_SYMBOL_GPL(nf_nat_masquerade_ipv4_register_notifier);
140
141void nf_nat_masquerade_ipv4_unregister_notifier(void)
142{
143 /* check if the notifier still has clients */
144 if (atomic_dec_return(&masquerade_notifier_refcount) > 0)
145 return;
146
147 unregister_netdevice_notifier(&masq_dev_notifier);
148 unregister_inetaddr_notifier(&masq_inet_notifier);
149}
150EXPORT_SYMBOL_GPL(nf_nat_masquerade_ipv4_unregister_notifier);
151
152MODULE_LICENSE("GPL");
153MODULE_AUTHOR("Rusty Russell <rusty@rustcorp.com.au>");
diff --git a/net/ipv4/netfilter/nf_reject_ipv4.c b/net/ipv4/netfilter/nf_reject_ipv4.c
new file mode 100644
index 000000000000..b023b4eb1a96
--- /dev/null
+++ b/net/ipv4/netfilter/nf_reject_ipv4.c
@@ -0,0 +1,127 @@
1/* (C) 1999-2001 Paul `Rusty' Russell
2 * (C) 2002-2004 Netfilter Core Team <coreteam@netfilter.org>
3 *
4 * This program is free software; you can redistribute it and/or modify
5 * it under the terms of the GNU General Public License version 2 as
6 * published by the Free Software Foundation.
7 */
8
9#include <net/ip.h>
10#include <net/tcp.h>
11#include <net/route.h>
12#include <net/dst.h>
13#include <linux/netfilter_ipv4.h>
14
15/* Send RST reply */
16void nf_send_reset(struct sk_buff *oldskb, int hook)
17{
18 struct sk_buff *nskb;
19 const struct iphdr *oiph;
20 struct iphdr *niph;
21 const struct tcphdr *oth;
22 struct tcphdr _otcph, *tcph;
23
24 /* IP header checks: fragment. */
25 if (ip_hdr(oldskb)->frag_off & htons(IP_OFFSET))
26 return;
27
28 oth = skb_header_pointer(oldskb, ip_hdrlen(oldskb),
29 sizeof(_otcph), &_otcph);
30 if (oth == NULL)
31 return;
32
33 /* No RST for RST. */
34 if (oth->rst)
35 return;
36
37 if (skb_rtable(oldskb)->rt_flags & (RTCF_BROADCAST | RTCF_MULTICAST))
38 return;
39
40 /* Check checksum */
41 if (nf_ip_checksum(oldskb, hook, ip_hdrlen(oldskb), IPPROTO_TCP))
42 return;
43 oiph = ip_hdr(oldskb);
44
45 nskb = alloc_skb(sizeof(struct iphdr) + sizeof(struct tcphdr) +
46 LL_MAX_HEADER, GFP_ATOMIC);
47 if (!nskb)
48 return;
49
50 skb_reserve(nskb, LL_MAX_HEADER);
51
52 skb_reset_network_header(nskb);
53 niph = (struct iphdr *)skb_put(nskb, sizeof(struct iphdr));
54 niph->version = 4;
55 niph->ihl = sizeof(struct iphdr) / 4;
56 niph->tos = 0;
57 niph->id = 0;
58 niph->frag_off = htons(IP_DF);
59 niph->protocol = IPPROTO_TCP;
60 niph->check = 0;
61 niph->saddr = oiph->daddr;
62 niph->daddr = oiph->saddr;
63
64 skb_reset_transport_header(nskb);
65 tcph = (struct tcphdr *)skb_put(nskb, sizeof(struct tcphdr));
66 memset(tcph, 0, sizeof(*tcph));
67 tcph->source = oth->dest;
68 tcph->dest = oth->source;
69 tcph->doff = sizeof(struct tcphdr) / 4;
70
71 if (oth->ack)
72 tcph->seq = oth->ack_seq;
73 else {
74 tcph->ack_seq = htonl(ntohl(oth->seq) + oth->syn + oth->fin +
75 oldskb->len - ip_hdrlen(oldskb) -
76 (oth->doff << 2));
77 tcph->ack = 1;
78 }
79
80 tcph->rst = 1;
81 tcph->check = ~tcp_v4_check(sizeof(struct tcphdr), niph->saddr,
82 niph->daddr, 0);
83 nskb->ip_summed = CHECKSUM_PARTIAL;
84 nskb->csum_start = (unsigned char *)tcph - nskb->head;
85 nskb->csum_offset = offsetof(struct tcphdr, check);
86
87 /* ip_route_me_harder expects skb->dst to be set */
88 skb_dst_set_noref(nskb, skb_dst(oldskb));
89
90 nskb->protocol = htons(ETH_P_IP);
91 if (ip_route_me_harder(nskb, RTN_UNSPEC))
92 goto free_nskb;
93
94 niph->ttl = ip4_dst_hoplimit(skb_dst(nskb));
95
96 /* "Never happens" */
97 if (nskb->len > dst_mtu(skb_dst(nskb)))
98 goto free_nskb;
99
100 nf_ct_attach(nskb, oldskb);
101
102#if IS_ENABLED(CONFIG_BRIDGE_NETFILTER)
103 /* If we use ip_local_out for bridged traffic, the MAC source on
104 * the RST will be ours, instead of the destination's. This confuses
105 * some routers/firewalls, and they drop the packet. So we need to
106 * build the eth header using the original destination's MAC as the
107 * source, and send the RST packet directly.
108 */
109 if (oldskb->nf_bridge) {
110 struct ethhdr *oeth = eth_hdr(oldskb);
111 nskb->dev = oldskb->nf_bridge->physindev;
112 niph->tot_len = htons(nskb->len);
113 ip_send_check(niph);
114 if (dev_hard_header(nskb, nskb->dev, ntohs(nskb->protocol),
115 oeth->h_source, oeth->h_dest, nskb->len) < 0)
116 goto free_nskb;
117 dev_queue_xmit(nskb);
118 } else
119#endif
120 ip_local_out(nskb);
121
122 return;
123
124 free_nskb:
125 kfree_skb(nskb);
126}
127EXPORT_SYMBOL_GPL(nf_send_reset);
diff --git a/net/ipv4/netfilter/nft_chain_nat_ipv4.c b/net/ipv4/netfilter/nft_chain_nat_ipv4.c
index 3964157d826c..df547bf50078 100644
--- a/net/ipv4/netfilter/nft_chain_nat_ipv4.c
+++ b/net/ipv4/netfilter/nft_chain_nat_ipv4.c
@@ -26,136 +26,53 @@
26#include <net/netfilter/nf_nat_l3proto.h> 26#include <net/netfilter/nf_nat_l3proto.h>
27#include <net/ip.h> 27#include <net/ip.h>
28 28
29/* 29static unsigned int nft_nat_do_chain(const struct nf_hook_ops *ops,
30 * NAT chains 30 struct sk_buff *skb,
31 */ 31 const struct net_device *in,
32 32 const struct net_device *out,
33static unsigned int nf_nat_fn(const struct nf_hook_ops *ops, 33 struct nf_conn *ct)
34 struct sk_buff *skb,
35 const struct net_device *in,
36 const struct net_device *out,
37 int (*okfn)(struct sk_buff *))
38{ 34{
39 enum ip_conntrack_info ctinfo;
40 struct nf_conn *ct = nf_ct_get(skb, &ctinfo);
41 struct nf_conn_nat *nat;
42 enum nf_nat_manip_type maniptype = HOOK2MANIP(ops->hooknum);
43 struct nft_pktinfo pkt; 35 struct nft_pktinfo pkt;
44 unsigned int ret;
45
46 if (ct == NULL || nf_ct_is_untracked(ct))
47 return NF_ACCEPT;
48
49 NF_CT_ASSERT(!(ip_hdr(skb)->frag_off & htons(IP_MF | IP_OFFSET)));
50
51 nat = nf_ct_nat_ext_add(ct);
52 if (nat == NULL)
53 return NF_ACCEPT;
54
55 switch (ctinfo) {
56 case IP_CT_RELATED:
57 case IP_CT_RELATED + IP_CT_IS_REPLY:
58 if (ip_hdr(skb)->protocol == IPPROTO_ICMP) {
59 if (!nf_nat_icmp_reply_translation(skb, ct, ctinfo,
60 ops->hooknum))
61 return NF_DROP;
62 else
63 return NF_ACCEPT;
64 }
65 /* Fall through */
66 case IP_CT_NEW:
67 if (nf_nat_initialized(ct, maniptype))
68 break;
69 36
70 nft_set_pktinfo_ipv4(&pkt, ops, skb, in, out); 37 nft_set_pktinfo_ipv4(&pkt, ops, skb, in, out);
71 38
72 ret = nft_do_chain(&pkt, ops); 39 return nft_do_chain(&pkt, ops);
73 if (ret != NF_ACCEPT)
74 return ret;
75 if (!nf_nat_initialized(ct, maniptype)) {
76 ret = nf_nat_alloc_null_binding(ct, ops->hooknum);
77 if (ret != NF_ACCEPT)
78 return ret;
79 }
80 default:
81 break;
82 }
83
84 return nf_nat_packet(ct, ctinfo, ops->hooknum, skb);
85} 40}
86 41
87static unsigned int nf_nat_prerouting(const struct nf_hook_ops *ops, 42static unsigned int nft_nat_ipv4_fn(const struct nf_hook_ops *ops,
88 struct sk_buff *skb, 43 struct sk_buff *skb,
89 const struct net_device *in, 44 const struct net_device *in,
90 const struct net_device *out, 45 const struct net_device *out,
91 int (*okfn)(struct sk_buff *)) 46 int (*okfn)(struct sk_buff *))
92{ 47{
93 __be32 daddr = ip_hdr(skb)->daddr; 48 return nf_nat_ipv4_fn(ops, skb, in, out, nft_nat_do_chain);
94 unsigned int ret;
95
96 ret = nf_nat_fn(ops, skb, in, out, okfn);
97 if (ret != NF_DROP && ret != NF_STOLEN &&
98 ip_hdr(skb)->daddr != daddr) {
99 skb_dst_drop(skb);
100 }
101 return ret;
102} 49}
103 50
104static unsigned int nf_nat_postrouting(const struct nf_hook_ops *ops, 51static unsigned int nft_nat_ipv4_in(const struct nf_hook_ops *ops,
105 struct sk_buff *skb, 52 struct sk_buff *skb,
106 const struct net_device *in, 53 const struct net_device *in,
107 const struct net_device *out, 54 const struct net_device *out,
108 int (*okfn)(struct sk_buff *)) 55 int (*okfn)(struct sk_buff *))
109{ 56{
110 enum ip_conntrack_info ctinfo __maybe_unused; 57 return nf_nat_ipv4_in(ops, skb, in, out, nft_nat_do_chain);
111 const struct nf_conn *ct __maybe_unused;
112 unsigned int ret;
113
114 ret = nf_nat_fn(ops, skb, in, out, okfn);
115#ifdef CONFIG_XFRM
116 if (ret != NF_DROP && ret != NF_STOLEN &&
117 (ct = nf_ct_get(skb, &ctinfo)) != NULL) {
118 enum ip_conntrack_dir dir = CTINFO2DIR(ctinfo);
119
120 if (ct->tuplehash[dir].tuple.src.u3.ip !=
121 ct->tuplehash[!dir].tuple.dst.u3.ip ||
122 ct->tuplehash[dir].tuple.src.u.all !=
123 ct->tuplehash[!dir].tuple.dst.u.all)
124 return nf_xfrm_me_harder(skb, AF_INET) == 0 ?
125 ret : NF_DROP;
126 }
127#endif
128 return ret;
129} 58}
130 59
131static unsigned int nf_nat_output(const struct nf_hook_ops *ops, 60static unsigned int nft_nat_ipv4_out(const struct nf_hook_ops *ops,
132 struct sk_buff *skb, 61 struct sk_buff *skb,
133 const struct net_device *in, 62 const struct net_device *in,
134 const struct net_device *out, 63 const struct net_device *out,
135 int (*okfn)(struct sk_buff *)) 64 int (*okfn)(struct sk_buff *))
136{ 65{
137 enum ip_conntrack_info ctinfo; 66 return nf_nat_ipv4_out(ops, skb, in, out, nft_nat_do_chain);
138 const struct nf_conn *ct; 67}
139 unsigned int ret;
140
141 ret = nf_nat_fn(ops, skb, in, out, okfn);
142 if (ret != NF_DROP && ret != NF_STOLEN &&
143 (ct = nf_ct_get(skb, &ctinfo)) != NULL) {
144 enum ip_conntrack_dir dir = CTINFO2DIR(ctinfo);
145 68
146 if (ct->tuplehash[dir].tuple.dst.u3.ip != 69static unsigned int nft_nat_ipv4_local_fn(const struct nf_hook_ops *ops,
147 ct->tuplehash[!dir].tuple.src.u3.ip) { 70 struct sk_buff *skb,
148 if (ip_route_me_harder(skb, RTN_UNSPEC)) 71 const struct net_device *in,
149 ret = NF_DROP; 72 const struct net_device *out,
150 } 73 int (*okfn)(struct sk_buff *))
151#ifdef CONFIG_XFRM 74{
152 else if (ct->tuplehash[dir].tuple.dst.u.all != 75 return nf_nat_ipv4_local_fn(ops, skb, in, out, nft_nat_do_chain);
153 ct->tuplehash[!dir].tuple.src.u.all)
154 if (nf_xfrm_me_harder(skb, AF_INET))
155 ret = NF_DROP;
156#endif
157 }
158 return ret;
159} 76}
160 77
161static const struct nf_chain_type nft_chain_nat_ipv4 = { 78static const struct nf_chain_type nft_chain_nat_ipv4 = {
@@ -168,10 +85,10 @@ static const struct nf_chain_type nft_chain_nat_ipv4 = {
168 (1 << NF_INET_LOCAL_OUT) | 85 (1 << NF_INET_LOCAL_OUT) |
169 (1 << NF_INET_LOCAL_IN), 86 (1 << NF_INET_LOCAL_IN),
170 .hooks = { 87 .hooks = {
171 [NF_INET_PRE_ROUTING] = nf_nat_prerouting, 88 [NF_INET_PRE_ROUTING] = nft_nat_ipv4_in,
172 [NF_INET_POST_ROUTING] = nf_nat_postrouting, 89 [NF_INET_POST_ROUTING] = nft_nat_ipv4_out,
173 [NF_INET_LOCAL_OUT] = nf_nat_output, 90 [NF_INET_LOCAL_OUT] = nft_nat_ipv4_local_fn,
174 [NF_INET_LOCAL_IN] = nf_nat_fn, 91 [NF_INET_LOCAL_IN] = nft_nat_ipv4_fn,
175 }, 92 },
176}; 93};
177 94
diff --git a/net/ipv4/netfilter/nft_masq_ipv4.c b/net/ipv4/netfilter/nft_masq_ipv4.c
new file mode 100644
index 000000000000..1c636d6b5b50
--- /dev/null
+++ b/net/ipv4/netfilter/nft_masq_ipv4.c
@@ -0,0 +1,77 @@
1/*
2 * Copyright (c) 2014 Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com>
3 *
4 * This program is free software; you can redistribute it and/or modify
5 * it under the terms of the GNU General Public License version 2 as
6 * published by the Free Software Foundation.
7 */
8
9#include <linux/kernel.h>
10#include <linux/init.h>
11#include <linux/module.h>
12#include <linux/netlink.h>
13#include <linux/netfilter.h>
14#include <linux/netfilter/nf_tables.h>
15#include <net/netfilter/nf_tables.h>
16#include <net/netfilter/nft_masq.h>
17#include <net/netfilter/ipv4/nf_nat_masquerade.h>
18
19static void nft_masq_ipv4_eval(const struct nft_expr *expr,
20 struct nft_data data[NFT_REG_MAX + 1],
21 const struct nft_pktinfo *pkt)
22{
23 struct nft_masq *priv = nft_expr_priv(expr);
24 struct nf_nat_range range;
25 unsigned int verdict;
26
27 range.flags = priv->flags;
28
29 verdict = nf_nat_masquerade_ipv4(pkt->skb, pkt->ops->hooknum,
30 &range, pkt->out);
31
32 data[NFT_REG_VERDICT].verdict = verdict;
33}
34
35static struct nft_expr_type nft_masq_ipv4_type;
36static const struct nft_expr_ops nft_masq_ipv4_ops = {
37 .type = &nft_masq_ipv4_type,
38 .size = NFT_EXPR_SIZE(sizeof(struct nft_masq)),
39 .eval = nft_masq_ipv4_eval,
40 .init = nft_masq_init,
41 .dump = nft_masq_dump,
42};
43
44static struct nft_expr_type nft_masq_ipv4_type __read_mostly = {
45 .family = NFPROTO_IPV4,
46 .name = "masq",
47 .ops = &nft_masq_ipv4_ops,
48 .policy = nft_masq_policy,
49 .maxattr = NFTA_MASQ_MAX,
50 .owner = THIS_MODULE,
51};
52
53static int __init nft_masq_ipv4_module_init(void)
54{
55 int ret;
56
57 ret = nft_register_expr(&nft_masq_ipv4_type);
58 if (ret < 0)
59 return ret;
60
61 nf_nat_masquerade_ipv4_register_notifier();
62
63 return ret;
64}
65
66static void __exit nft_masq_ipv4_module_exit(void)
67{
68 nft_unregister_expr(&nft_masq_ipv4_type);
69 nf_nat_masquerade_ipv4_unregister_notifier();
70}
71
72module_init(nft_masq_ipv4_module_init);
73module_exit(nft_masq_ipv4_module_exit);
74
75MODULE_LICENSE("GPL");
76MODULE_AUTHOR("Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com>");
77MODULE_ALIAS_NFT_AF_EXPR(AF_INET, "masq");
diff --git a/net/ipv4/netfilter/nft_reject_ipv4.c b/net/ipv4/netfilter/nft_reject_ipv4.c
index e79718a382f2..ed33299c56d1 100644
--- a/net/ipv4/netfilter/nft_reject_ipv4.c
+++ b/net/ipv4/netfilter/nft_reject_ipv4.c
@@ -16,7 +16,6 @@
16#include <linux/netfilter.h> 16#include <linux/netfilter.h>
17#include <linux/netfilter/nf_tables.h> 17#include <linux/netfilter/nf_tables.h>
18#include <net/netfilter/nf_tables.h> 18#include <net/netfilter/nf_tables.h>
19#include <net/icmp.h>
20#include <net/netfilter/ipv4/nf_reject.h> 19#include <net/netfilter/ipv4/nf_reject.h>
21#include <net/netfilter/nft_reject.h> 20#include <net/netfilter/nft_reject.h>
22 21
diff --git a/net/ipv4/ping.c b/net/ipv4/ping.c
index a3c59a077a5f..57f7c9804139 100644
--- a/net/ipv4/ping.c
+++ b/net/ipv4/ping.c
@@ -311,7 +311,7 @@ static int ping_check_bind_addr(struct sock *sk, struct inet_sock *isk,
311 if (addr->sin_addr.s_addr == htonl(INADDR_ANY)) 311 if (addr->sin_addr.s_addr == htonl(INADDR_ANY))
312 chk_addr_ret = RTN_LOCAL; 312 chk_addr_ret = RTN_LOCAL;
313 313
314 if ((sysctl_ip_nonlocal_bind == 0 && 314 if ((net->ipv4.sysctl_ip_nonlocal_bind == 0 &&
315 isk->freebind == 0 && isk->transparent == 0 && 315 isk->freebind == 0 && isk->transparent == 0 &&
316 chk_addr_ret != RTN_LOCAL) || 316 chk_addr_ret != RTN_LOCAL) ||
317 chk_addr_ret == RTN_MULTICAST || 317 chk_addr_ret == RTN_MULTICAST ||
diff --git a/net/ipv4/protocol.c b/net/ipv4/protocol.c
index 46d6a1c923a8..4b7c0ec65251 100644
--- a/net/ipv4/protocol.c
+++ b/net/ipv4/protocol.c
@@ -30,6 +30,7 @@
30 30
31const struct net_protocol __rcu *inet_protos[MAX_INET_PROTOS] __read_mostly; 31const struct net_protocol __rcu *inet_protos[MAX_INET_PROTOS] __read_mostly;
32const struct net_offload __rcu *inet_offloads[MAX_INET_PROTOS] __read_mostly; 32const struct net_offload __rcu *inet_offloads[MAX_INET_PROTOS] __read_mostly;
33EXPORT_SYMBOL(inet_offloads);
33 34
34int inet_add_protocol(const struct net_protocol *prot, unsigned char protocol) 35int inet_add_protocol(const struct net_protocol *prot, unsigned char protocol)
35{ 36{
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index cbadb942c332..793c0bb8c4fd 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -596,12 +596,12 @@ static struct fib_nh_exception *fnhe_oldest(struct fnhe_hash_bucket *hash)
596 596
597static inline u32 fnhe_hashfun(__be32 daddr) 597static inline u32 fnhe_hashfun(__be32 daddr)
598{ 598{
599 static u32 fnhe_hashrnd __read_mostly;
599 u32 hval; 600 u32 hval;
600 601
601 hval = (__force u32) daddr; 602 net_get_random_once(&fnhe_hashrnd, sizeof(fnhe_hashrnd));
602 hval ^= (hval >> 11) ^ (hval >> 22); 603 hval = jhash_1word((__force u32) daddr, fnhe_hashrnd);
603 604 return hash_32(hval, FNHE_HASH_SHIFT);
604 return hval & (FNHE_HASH_SIZE - 1);
605} 605}
606 606
607static void fill_route_from_fnhe(struct rtable *rt, struct fib_nh_exception *fnhe) 607static void fill_route_from_fnhe(struct rtable *rt, struct fib_nh_exception *fnhe)
@@ -628,12 +628,12 @@ static void update_or_create_fnhe(struct fib_nh *nh, __be32 daddr, __be32 gw,
628 628
629 spin_lock_bh(&fnhe_lock); 629 spin_lock_bh(&fnhe_lock);
630 630
631 hash = nh->nh_exceptions; 631 hash = rcu_dereference(nh->nh_exceptions);
632 if (!hash) { 632 if (!hash) {
633 hash = kzalloc(FNHE_HASH_SIZE * sizeof(*hash), GFP_ATOMIC); 633 hash = kzalloc(FNHE_HASH_SIZE * sizeof(*hash), GFP_ATOMIC);
634 if (!hash) 634 if (!hash)
635 goto out_unlock; 635 goto out_unlock;
636 nh->nh_exceptions = hash; 636 rcu_assign_pointer(nh->nh_exceptions, hash);
637 } 637 }
638 638
639 hash += hval; 639 hash += hval;
@@ -1242,7 +1242,7 @@ static unsigned int ipv4_mtu(const struct dst_entry *dst)
1242 1242
1243static struct fib_nh_exception *find_exception(struct fib_nh *nh, __be32 daddr) 1243static struct fib_nh_exception *find_exception(struct fib_nh *nh, __be32 daddr)
1244{ 1244{
1245 struct fnhe_hash_bucket *hash = nh->nh_exceptions; 1245 struct fnhe_hash_bucket *hash = rcu_dereference(nh->nh_exceptions);
1246 struct fib_nh_exception *fnhe; 1246 struct fib_nh_exception *fnhe;
1247 u32 hval; 1247 u32 hval;
1248 1248
diff --git a/net/ipv4/syncookies.c b/net/ipv4/syncookies.c
index c0c75688896e..0431a8f3c8f4 100644
--- a/net/ipv4/syncookies.c
+++ b/net/ipv4/syncookies.c
@@ -25,7 +25,7 @@
25 25
26extern int sysctl_tcp_syncookies; 26extern int sysctl_tcp_syncookies;
27 27
28static u32 syncookie_secret[2][16-4+SHA_DIGEST_WORDS]; 28static u32 syncookie_secret[2][16-4+SHA_DIGEST_WORDS] __read_mostly;
29 29
30#define COOKIEBITS 24 /* Upper bits store count */ 30#define COOKIEBITS 24 /* Upper bits store count */
31#define COOKIEMASK (((__u32)1 << COOKIEBITS) - 1) 31#define COOKIEMASK (((__u32)1 << COOKIEBITS) - 1)
diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c
index a9fde0eef77c..b3c53c8b331e 100644
--- a/net/ipv4/sysctl_net_ipv4.c
+++ b/net/ipv4/sysctl_net_ipv4.c
@@ -286,13 +286,6 @@ static struct ctl_table ipv4_table[] = {
286 .extra2 = &ip_ttl_max, 286 .extra2 = &ip_ttl_max,
287 }, 287 },
288 { 288 {
289 .procname = "ip_nonlocal_bind",
290 .data = &sysctl_ip_nonlocal_bind,
291 .maxlen = sizeof(int),
292 .mode = 0644,
293 .proc_handler = proc_dointvec
294 },
295 {
296 .procname = "tcp_syn_retries", 289 .procname = "tcp_syn_retries",
297 .data = &sysctl_tcp_syn_retries, 290 .data = &sysctl_tcp_syn_retries,
298 .maxlen = sizeof(int), 291 .maxlen = sizeof(int),
@@ -450,6 +443,16 @@ static struct ctl_table ipv4_table[] = {
450 .mode = 0644, 443 .mode = 0644,
451 .proc_handler = proc_dointvec 444 .proc_handler = proc_dointvec
452 }, 445 },
446#ifdef CONFIG_IP_MULTICAST
447 {
448 .procname = "igmp_qrv",
449 .data = &sysctl_igmp_qrv,
450 .maxlen = sizeof(int),
451 .mode = 0644,
452 .proc_handler = proc_dointvec_minmax,
453 .extra1 = &one
454 },
455#endif
453 { 456 {
454 .procname = "inet_peer_threshold", 457 .procname = "inet_peer_threshold",
455 .data = &inet_peer_threshold, 458 .data = &inet_peer_threshold,
@@ -719,6 +722,22 @@ static struct ctl_table ipv4_table[] = {
719 .extra2 = &one, 722 .extra2 = &one,
720 }, 723 },
721 { 724 {
725 .procname = "icmp_msgs_per_sec",
726 .data = &sysctl_icmp_msgs_per_sec,
727 .maxlen = sizeof(int),
728 .mode = 0644,
729 .proc_handler = proc_dointvec_minmax,
730 .extra1 = &zero,
731 },
732 {
733 .procname = "icmp_msgs_burst",
734 .data = &sysctl_icmp_msgs_burst,
735 .maxlen = sizeof(int),
736 .mode = 0644,
737 .proc_handler = proc_dointvec_minmax,
738 .extra1 = &zero,
739 },
740 {
722 .procname = "udp_mem", 741 .procname = "udp_mem",
723 .data = &sysctl_udp_mem, 742 .data = &sysctl_udp_mem,
724 .maxlen = sizeof(sysctl_udp_mem), 743 .maxlen = sizeof(sysctl_udp_mem),
@@ -830,6 +849,13 @@ static struct ctl_table ipv4_net_table[] = {
830 .proc_handler = proc_dointvec, 849 .proc_handler = proc_dointvec,
831 }, 850 },
832 { 851 {
852 .procname = "ip_nonlocal_bind",
853 .data = &init_net.ipv4.sysctl_ip_nonlocal_bind,
854 .maxlen = sizeof(int),
855 .mode = 0644,
856 .proc_handler = proc_dointvec
857 },
858 {
833 .procname = "fwmark_reflect", 859 .procname = "fwmark_reflect",
834 .data = &init_net.ipv4.sysctl_fwmark_reflect, 860 .data = &init_net.ipv4.sysctl_fwmark_reflect,
835 .maxlen = sizeof(int), 861 .maxlen = sizeof(int),
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 8ee43ae90396..461003d258ba 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -404,7 +404,7 @@ void tcp_init_sock(struct sock *sk)
404 404
405 tp->reordering = sysctl_tcp_reordering; 405 tp->reordering = sysctl_tcp_reordering;
406 tcp_enable_early_retrans(tp); 406 tcp_enable_early_retrans(tp);
407 icsk->icsk_ca_ops = &tcp_init_congestion_ops; 407 tcp_assign_congestion_control(sk);
408 408
409 tp->tsoffset = 0; 409 tp->tsoffset = 0;
410 410
@@ -608,7 +608,7 @@ static inline bool forced_push(const struct tcp_sock *tp)
608 return after(tp->write_seq, tp->pushed_seq + (tp->max_window >> 1)); 608 return after(tp->write_seq, tp->pushed_seq + (tp->max_window >> 1));
609} 609}
610 610
611static inline void skb_entail(struct sock *sk, struct sk_buff *skb) 611static void skb_entail(struct sock *sk, struct sk_buff *skb)
612{ 612{
613 struct tcp_sock *tp = tcp_sk(sk); 613 struct tcp_sock *tp = tcp_sk(sk);
614 struct tcp_skb_cb *tcb = TCP_SKB_CB(skb); 614 struct tcp_skb_cb *tcb = TCP_SKB_CB(skb);
@@ -617,7 +617,7 @@ static inline void skb_entail(struct sock *sk, struct sk_buff *skb)
617 tcb->seq = tcb->end_seq = tp->write_seq; 617 tcb->seq = tcb->end_seq = tp->write_seq;
618 tcb->tcp_flags = TCPHDR_ACK; 618 tcb->tcp_flags = TCPHDR_ACK;
619 tcb->sacked = 0; 619 tcb->sacked = 0;
620 skb_header_release(skb); 620 __skb_header_release(skb);
621 tcp_add_write_queue_tail(sk, skb); 621 tcp_add_write_queue_tail(sk, skb);
622 sk->sk_wmem_queued += skb->truesize; 622 sk->sk_wmem_queued += skb->truesize;
623 sk_mem_charge(sk, skb->truesize); 623 sk_mem_charge(sk, skb->truesize);
@@ -962,7 +962,7 @@ new_segment:
962 skb->ip_summed = CHECKSUM_PARTIAL; 962 skb->ip_summed = CHECKSUM_PARTIAL;
963 tp->write_seq += copy; 963 tp->write_seq += copy;
964 TCP_SKB_CB(skb)->end_seq += copy; 964 TCP_SKB_CB(skb)->end_seq += copy;
965 skb_shinfo(skb)->gso_segs = 0; 965 tcp_skb_pcount_set(skb, 0);
966 966
967 if (!copied) 967 if (!copied)
968 TCP_SKB_CB(skb)->tcp_flags &= ~TCPHDR_PSH; 968 TCP_SKB_CB(skb)->tcp_flags &= ~TCPHDR_PSH;
@@ -1260,7 +1260,7 @@ new_segment:
1260 1260
1261 tp->write_seq += copy; 1261 tp->write_seq += copy;
1262 TCP_SKB_CB(skb)->end_seq += copy; 1262 TCP_SKB_CB(skb)->end_seq += copy;
1263 skb_shinfo(skb)->gso_segs = 0; 1263 tcp_skb_pcount_set(skb, 0);
1264 1264
1265 from += copy; 1265 from += copy;
1266 copied += copy; 1266 copied += copy;
@@ -1476,9 +1476,9 @@ static struct sk_buff *tcp_recv_skb(struct sock *sk, u32 seq, u32 *off)
1476 1476
1477 while ((skb = skb_peek(&sk->sk_receive_queue)) != NULL) { 1477 while ((skb = skb_peek(&sk->sk_receive_queue)) != NULL) {
1478 offset = seq - TCP_SKB_CB(skb)->seq; 1478 offset = seq - TCP_SKB_CB(skb)->seq;
1479 if (tcp_hdr(skb)->syn) 1479 if (TCP_SKB_CB(skb)->tcp_flags & TCPHDR_SYN)
1480 offset--; 1480 offset--;
1481 if (offset < skb->len || tcp_hdr(skb)->fin) { 1481 if (offset < skb->len || (TCP_SKB_CB(skb)->tcp_flags & TCPHDR_FIN)) {
1482 *off = offset; 1482 *off = offset;
1483 return skb; 1483 return skb;
1484 } 1484 }
@@ -1551,7 +1551,7 @@ int tcp_read_sock(struct sock *sk, read_descriptor_t *desc,
1551 if (offset + 1 != skb->len) 1551 if (offset + 1 != skb->len)
1552 continue; 1552 continue;
1553 } 1553 }
1554 if (tcp_hdr(skb)->fin) { 1554 if (TCP_SKB_CB(skb)->tcp_flags & TCPHDR_FIN) {
1555 sk_eat_skb(sk, skb); 1555 sk_eat_skb(sk, skb);
1556 ++seq; 1556 ++seq;
1557 break; 1557 break;
@@ -1665,11 +1665,11 @@ int tcp_recvmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
1665 break; 1665 break;
1666 1666
1667 offset = *seq - TCP_SKB_CB(skb)->seq; 1667 offset = *seq - TCP_SKB_CB(skb)->seq;
1668 if (tcp_hdr(skb)->syn) 1668 if (TCP_SKB_CB(skb)->tcp_flags & TCPHDR_SYN)
1669 offset--; 1669 offset--;
1670 if (offset < skb->len) 1670 if (offset < skb->len)
1671 goto found_ok_skb; 1671 goto found_ok_skb;
1672 if (tcp_hdr(skb)->fin) 1672 if (TCP_SKB_CB(skb)->tcp_flags & TCPHDR_FIN)
1673 goto found_fin_ok; 1673 goto found_fin_ok;
1674 WARN(!(flags & MSG_PEEK), 1674 WARN(!(flags & MSG_PEEK),
1675 "recvmsg bug 2: copied %X seq %X rcvnxt %X fl %X\n", 1675 "recvmsg bug 2: copied %X seq %X rcvnxt %X fl %X\n",
@@ -1857,7 +1857,7 @@ skip_copy:
1857 if (used + offset < skb->len) 1857 if (used + offset < skb->len)
1858 continue; 1858 continue;
1859 1859
1860 if (tcp_hdr(skb)->fin) 1860 if (TCP_SKB_CB(skb)->tcp_flags & TCPHDR_FIN)
1861 goto found_fin_ok; 1861 goto found_fin_ok;
1862 if (!(flags & MSG_PEEK)) 1862 if (!(flags & MSG_PEEK))
1863 sk_eat_skb(sk, skb); 1863 sk_eat_skb(sk, skb);
@@ -2044,8 +2044,10 @@ void tcp_close(struct sock *sk, long timeout)
2044 * reader process may not have drained the data yet! 2044 * reader process may not have drained the data yet!
2045 */ 2045 */
2046 while ((skb = __skb_dequeue(&sk->sk_receive_queue)) != NULL) { 2046 while ((skb = __skb_dequeue(&sk->sk_receive_queue)) != NULL) {
2047 u32 len = TCP_SKB_CB(skb)->end_seq - TCP_SKB_CB(skb)->seq - 2047 u32 len = TCP_SKB_CB(skb)->end_seq - TCP_SKB_CB(skb)->seq;
2048 tcp_hdr(skb)->fin; 2048
2049 if (TCP_SKB_CB(skb)->tcp_flags & TCPHDR_FIN)
2050 len--;
2049 data_was_unread += len; 2051 data_was_unread += len;
2050 __kfree_skb(skb); 2052 __kfree_skb(skb);
2051 } 2053 }
@@ -2572,7 +2574,7 @@ static int do_tcp_setsockopt(struct sock *sk, int level,
2572 break; 2574 break;
2573#endif 2575#endif
2574 case TCP_USER_TIMEOUT: 2576 case TCP_USER_TIMEOUT:
2575 /* Cap the max timeout in ms TCP will retry/retrans 2577 /* Cap the max time in ms TCP will retry or probe the window
2576 * before giving up and aborting (ETIMEDOUT) a connection. 2578 * before giving up and aborting (ETIMEDOUT) a connection.
2577 */ 2579 */
2578 if (val < 0) 2580 if (val < 0)
@@ -3051,7 +3053,7 @@ static int __init set_thash_entries(char *str)
3051} 3053}
3052__setup("thash_entries=", set_thash_entries); 3054__setup("thash_entries=", set_thash_entries);
3053 3055
3054static void tcp_init_mem(void) 3056static void __init tcp_init_mem(void)
3055{ 3057{
3056 unsigned long limit = nr_free_buffer_pages() / 8; 3058 unsigned long limit = nr_free_buffer_pages() / 8;
3057 limit = max(limit, 128UL); 3059 limit = max(limit, 128UL);
@@ -3137,8 +3139,6 @@ void __init tcp_init(void)
3137 tcp_hashinfo.ehash_mask + 1, tcp_hashinfo.bhash_size); 3139 tcp_hashinfo.ehash_mask + 1, tcp_hashinfo.bhash_size);
3138 3140
3139 tcp_metrics_init(); 3141 tcp_metrics_init();
3140 3142 BUG_ON(tcp_register_congestion_control(&tcp_reno) != 0);
3141 tcp_register_congestion_control(&tcp_reno);
3142
3143 tcp_tasklet_init(); 3143 tcp_tasklet_init();
3144} 3144}
diff --git a/net/ipv4/tcp_bic.c b/net/ipv4/tcp_bic.c
index d5de69bc04f5..bb395d46a389 100644
--- a/net/ipv4/tcp_bic.c
+++ b/net/ipv4/tcp_bic.c
@@ -17,7 +17,6 @@
17#include <linux/module.h> 17#include <linux/module.h>
18#include <net/tcp.h> 18#include <net/tcp.h>
19 19
20
21#define BICTCP_BETA_SCALE 1024 /* Scale factor beta calculation 20#define BICTCP_BETA_SCALE 1024 /* Scale factor beta calculation
22 * max_cwnd = snd_cwnd * beta 21 * max_cwnd = snd_cwnd * beta
23 */ 22 */
@@ -46,11 +45,10 @@ MODULE_PARM_DESC(initial_ssthresh, "initial value of slow start threshold");
46module_param(smooth_part, int, 0644); 45module_param(smooth_part, int, 0644);
47MODULE_PARM_DESC(smooth_part, "log(B/(B*Smin))/log(B/(B-1))+B, # of RTT from Wmax-B to Wmax"); 46MODULE_PARM_DESC(smooth_part, "log(B/(B*Smin))/log(B/(B-1))+B, # of RTT from Wmax-B to Wmax");
48 47
49
50/* BIC TCP Parameters */ 48/* BIC TCP Parameters */
51struct bictcp { 49struct bictcp {
52 u32 cnt; /* increase cwnd by 1 after ACKs */ 50 u32 cnt; /* increase cwnd by 1 after ACKs */
53 u32 last_max_cwnd; /* last maximum snd_cwnd */ 51 u32 last_max_cwnd; /* last maximum snd_cwnd */
54 u32 loss_cwnd; /* congestion window at last loss */ 52 u32 loss_cwnd; /* congestion window at last loss */
55 u32 last_cwnd; /* the last snd_cwnd */ 53 u32 last_cwnd; /* the last snd_cwnd */
56 u32 last_time; /* time when updated last_cwnd */ 54 u32 last_time; /* time when updated last_cwnd */
@@ -103,7 +101,7 @@ static inline void bictcp_update(struct bictcp *ca, u32 cwnd)
103 101
104 /* binary increase */ 102 /* binary increase */
105 if (cwnd < ca->last_max_cwnd) { 103 if (cwnd < ca->last_max_cwnd) {
106 __u32 dist = (ca->last_max_cwnd - cwnd) 104 __u32 dist = (ca->last_max_cwnd - cwnd)
107 / BICTCP_B; 105 / BICTCP_B;
108 106
109 if (dist > max_increment) 107 if (dist > max_increment)
@@ -154,7 +152,6 @@ static void bictcp_cong_avoid(struct sock *sk, u32 ack, u32 acked)
154 bictcp_update(ca, tp->snd_cwnd); 152 bictcp_update(ca, tp->snd_cwnd);
155 tcp_cong_avoid_ai(tp, ca->cnt); 153 tcp_cong_avoid_ai(tp, ca->cnt);
156 } 154 }
157
158} 155}
159 156
160/* 157/*
@@ -177,7 +174,6 @@ static u32 bictcp_recalc_ssthresh(struct sock *sk)
177 174
178 ca->loss_cwnd = tp->snd_cwnd; 175 ca->loss_cwnd = tp->snd_cwnd;
179 176
180
181 if (tp->snd_cwnd <= low_window) 177 if (tp->snd_cwnd <= low_window)
182 return max(tp->snd_cwnd >> 1U, 2U); 178 return max(tp->snd_cwnd >> 1U, 2U);
183 else 179 else
@@ -188,6 +184,7 @@ static u32 bictcp_undo_cwnd(struct sock *sk)
188{ 184{
189 const struct tcp_sock *tp = tcp_sk(sk); 185 const struct tcp_sock *tp = tcp_sk(sk);
190 const struct bictcp *ca = inet_csk_ca(sk); 186 const struct bictcp *ca = inet_csk_ca(sk);
187
191 return max(tp->snd_cwnd, ca->loss_cwnd); 188 return max(tp->snd_cwnd, ca->loss_cwnd);
192} 189}
193 190
@@ -206,12 +203,12 @@ static void bictcp_acked(struct sock *sk, u32 cnt, s32 rtt)
206 203
207 if (icsk->icsk_ca_state == TCP_CA_Open) { 204 if (icsk->icsk_ca_state == TCP_CA_Open) {
208 struct bictcp *ca = inet_csk_ca(sk); 205 struct bictcp *ca = inet_csk_ca(sk);
206
209 cnt -= ca->delayed_ack >> ACK_RATIO_SHIFT; 207 cnt -= ca->delayed_ack >> ACK_RATIO_SHIFT;
210 ca->delayed_ack += cnt; 208 ca->delayed_ack += cnt;
211 } 209 }
212} 210}
213 211
214
215static struct tcp_congestion_ops bictcp __read_mostly = { 212static struct tcp_congestion_ops bictcp __read_mostly = {
216 .init = bictcp_init, 213 .init = bictcp_init,
217 .ssthresh = bictcp_recalc_ssthresh, 214 .ssthresh = bictcp_recalc_ssthresh,
diff --git a/net/ipv4/tcp_cong.c b/net/ipv4/tcp_cong.c
index 7b09d8b49fa5..b1c5970d47a1 100644
--- a/net/ipv4/tcp_cong.c
+++ b/net/ipv4/tcp_cong.c
@@ -74,24 +74,34 @@ void tcp_unregister_congestion_control(struct tcp_congestion_ops *ca)
74EXPORT_SYMBOL_GPL(tcp_unregister_congestion_control); 74EXPORT_SYMBOL_GPL(tcp_unregister_congestion_control);
75 75
76/* Assign choice of congestion control. */ 76/* Assign choice of congestion control. */
77void tcp_init_congestion_control(struct sock *sk) 77void tcp_assign_congestion_control(struct sock *sk)
78{ 78{
79 struct inet_connection_sock *icsk = inet_csk(sk); 79 struct inet_connection_sock *icsk = inet_csk(sk);
80 struct tcp_congestion_ops *ca; 80 struct tcp_congestion_ops *ca;
81 81
82 /* if no choice made yet assign the current value set as default */ 82 rcu_read_lock();
83 if (icsk->icsk_ca_ops == &tcp_init_congestion_ops) { 83 list_for_each_entry_rcu(ca, &tcp_cong_list, list) {
84 rcu_read_lock(); 84 if (likely(try_module_get(ca->owner))) {
85 list_for_each_entry_rcu(ca, &tcp_cong_list, list) { 85 icsk->icsk_ca_ops = ca;
86 if (try_module_get(ca->owner)) { 86 goto out;
87 icsk->icsk_ca_ops = ca;
88 break;
89 }
90
91 /* fallback to next available */
92 } 87 }
93 rcu_read_unlock(); 88 /* Fallback to next available. The last really
89 * guaranteed fallback is Reno from this list.
90 */
94 } 91 }
92out:
93 rcu_read_unlock();
94
95 /* Clear out private data before diag gets it and
96 * the ca has not been initialized.
97 */
98 if (ca->get_info)
99 memset(icsk->icsk_ca_priv, 0, sizeof(icsk->icsk_ca_priv));
100}
101
102void tcp_init_congestion_control(struct sock *sk)
103{
104 const struct inet_connection_sock *icsk = inet_csk(sk);
95 105
96 if (icsk->icsk_ca_ops->init) 106 if (icsk->icsk_ca_ops->init)
97 icsk->icsk_ca_ops->init(sk); 107 icsk->icsk_ca_ops->init(sk);
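Algorithms land on tcp_cong_list via tcp_register_congestion_control(), so the walk above simply takes the first entry whose module reference can be pinned, with Reno as the guaranteed last resort; the dctcp_register() routine further down in this diff is a concrete example of such a registration.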
@@ -142,7 +152,6 @@ static int __init tcp_congestion_default(void)
142} 152}
143late_initcall(tcp_congestion_default); 153late_initcall(tcp_congestion_default);
144 154
145
146/* Build string with list of available congestion control values */ 155/* Build string with list of available congestion control values */
147void tcp_get_available_congestion_control(char *buf, size_t maxlen) 156void tcp_get_available_congestion_control(char *buf, size_t maxlen)
148{ 157{
@@ -154,7 +163,6 @@ void tcp_get_available_congestion_control(char *buf, size_t maxlen)
154 offs += snprintf(buf + offs, maxlen - offs, 163 offs += snprintf(buf + offs, maxlen - offs,
155 "%s%s", 164 "%s%s",
156 offs == 0 ? "" : " ", ca->name); 165 offs == 0 ? "" : " ", ca->name);
157
158 } 166 }
159 rcu_read_unlock(); 167 rcu_read_unlock();
160} 168}
@@ -186,7 +194,6 @@ void tcp_get_allowed_congestion_control(char *buf, size_t maxlen)
186 offs += snprintf(buf + offs, maxlen - offs, 194 offs += snprintf(buf + offs, maxlen - offs,
187 "%s%s", 195 "%s%s",
188 offs == 0 ? "" : " ", ca->name); 196 offs == 0 ? "" : " ", ca->name);
189
190 } 197 }
191 rcu_read_unlock(); 198 rcu_read_unlock();
192} 199}
@@ -230,7 +237,6 @@ out:
230 return ret; 237 return ret;
231} 238}
232 239
233
234/* Change congestion control for socket */ 240/* Change congestion control for socket */
235int tcp_set_congestion_control(struct sock *sk, const char *name) 241int tcp_set_congestion_control(struct sock *sk, const char *name)
236{ 242{
@@ -285,15 +291,13 @@ int tcp_set_congestion_control(struct sock *sk, const char *name)
285 * ABC caps N to 2. Slow start exits when cwnd grows over ssthresh and 291 * ABC caps N to 2. Slow start exits when cwnd grows over ssthresh and
286 * returns the leftover acks to adjust cwnd in congestion avoidance mode. 292 * returns the leftover acks to adjust cwnd in congestion avoidance mode.
287 */ 293 */
288int tcp_slow_start(struct tcp_sock *tp, u32 acked) 294void tcp_slow_start(struct tcp_sock *tp, u32 acked)
289{ 295{
290 u32 cwnd = tp->snd_cwnd + acked; 296 u32 cwnd = tp->snd_cwnd + acked;
291 297
292 if (cwnd > tp->snd_ssthresh) 298 if (cwnd > tp->snd_ssthresh)
293 cwnd = tp->snd_ssthresh + 1; 299 cwnd = tp->snd_ssthresh + 1;
294 acked -= cwnd - tp->snd_cwnd;
295 tp->snd_cwnd = min(cwnd, tp->snd_cwnd_clamp); 300 tp->snd_cwnd = min(cwnd, tp->snd_cwnd_clamp);
296 return acked;
297} 301}
298EXPORT_SYMBOL_GPL(tcp_slow_start); 302EXPORT_SYMBOL_GPL(tcp_slow_start);
299 303
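As a worked example of the helper in its new form: with snd_cwnd = 8, snd_ssthresh = 10 and acked = 4, cwnd is first 12, then capped to ssthresh + 1 = 11, and snd_cwnd becomes min(11, snd_cwnd_clamp).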
@@ -337,6 +341,7 @@ EXPORT_SYMBOL_GPL(tcp_reno_cong_avoid);
337u32 tcp_reno_ssthresh(struct sock *sk) 341u32 tcp_reno_ssthresh(struct sock *sk)
338{ 342{
339 const struct tcp_sock *tp = tcp_sk(sk); 343 const struct tcp_sock *tp = tcp_sk(sk);
344
340 return max(tp->snd_cwnd >> 1U, 2U); 345 return max(tp->snd_cwnd >> 1U, 2U);
341} 346}
342EXPORT_SYMBOL_GPL(tcp_reno_ssthresh); 347EXPORT_SYMBOL_GPL(tcp_reno_ssthresh);
@@ -348,15 +353,3 @@ struct tcp_congestion_ops tcp_reno = {
348 .ssthresh = tcp_reno_ssthresh, 353 .ssthresh = tcp_reno_ssthresh,
349 .cong_avoid = tcp_reno_cong_avoid, 354 .cong_avoid = tcp_reno_cong_avoid,
350}; 355};
351
352/* Initial congestion control used (until SYN)
353 * really reno under another name so we can tell difference
354 * during tcp_set_default_congestion_control
355 */
356struct tcp_congestion_ops tcp_init_congestion_ops = {
357 .name = "",
358 .owner = THIS_MODULE,
359 .ssthresh = tcp_reno_ssthresh,
360 .cong_avoid = tcp_reno_cong_avoid,
361};
362EXPORT_SYMBOL_GPL(tcp_init_congestion_ops);
diff --git a/net/ipv4/tcp_cubic.c b/net/ipv4/tcp_cubic.c
index a9bd8a4828a9..20de0118c98e 100644
--- a/net/ipv4/tcp_cubic.c
+++ b/net/ipv4/tcp_cubic.c
@@ -82,12 +82,13 @@ MODULE_PARM_DESC(hystart_ack_delta, "spacing between ack's indicating train (mse
82/* BIC TCP Parameters */ 82/* BIC TCP Parameters */
83struct bictcp { 83struct bictcp {
84 u32 cnt; /* increase cwnd by 1 after ACKs */ 84 u32 cnt; /* increase cwnd by 1 after ACKs */
85 u32 last_max_cwnd; /* last maximum snd_cwnd */ 85 u32 last_max_cwnd; /* last maximum snd_cwnd */
86 u32 loss_cwnd; /* congestion window at last loss */ 86 u32 loss_cwnd; /* congestion window at last loss */
87 u32 last_cwnd; /* the last snd_cwnd */ 87 u32 last_cwnd; /* the last snd_cwnd */
88 u32 last_time; /* time when updated last_cwnd */ 88 u32 last_time; /* time when updated last_cwnd */
89 u32 bic_origin_point;/* origin point of bic function */ 89 u32 bic_origin_point;/* origin point of bic function */
90 u32 bic_K; /* time to origin point from the beginning of the current epoch */ 90 u32 bic_K; /* time to origin point
91 from the beginning of the current epoch */
91 u32 delay_min; /* min delay (msec << 3) */ 92 u32 delay_min; /* min delay (msec << 3) */
92 u32 epoch_start; /* beginning of an epoch */ 93 u32 epoch_start; /* beginning of an epoch */
93 u32 ack_cnt; /* number of acks */ 94 u32 ack_cnt; /* number of acks */
@@ -219,7 +220,7 @@ static inline void bictcp_update(struct bictcp *ca, u32 cwnd)
219 ca->last_time = tcp_time_stamp; 220 ca->last_time = tcp_time_stamp;
220 221
221 if (ca->epoch_start == 0) { 222 if (ca->epoch_start == 0) {
222 ca->epoch_start = tcp_time_stamp; /* record the beginning of an epoch */ 223 ca->epoch_start = tcp_time_stamp; /* record beginning */
223 ca->ack_cnt = 1; /* start counting */ 224 ca->ack_cnt = 1; /* start counting */
224 ca->tcp_cwnd = cwnd; /* syn with cubic */ 225 ca->tcp_cwnd = cwnd; /* syn with cubic */
225 226
@@ -263,9 +264,9 @@ static inline void bictcp_update(struct bictcp *ca, u32 cwnd)
263 264
264 /* c/rtt * (t-K)^3 */ 265 /* c/rtt * (t-K)^3 */
265 delta = (cube_rtt_scale * offs * offs * offs) >> (10+3*BICTCP_HZ); 266 delta = (cube_rtt_scale * offs * offs * offs) >> (10+3*BICTCP_HZ);
266 if (t < ca->bic_K) /* below origin*/ 267 if (t < ca->bic_K) /* below origin*/
267 bic_target = ca->bic_origin_point - delta; 268 bic_target = ca->bic_origin_point - delta;
268 else /* above origin*/ 269 else /* above origin*/
269 bic_target = ca->bic_origin_point + delta; 270 bic_target = ca->bic_origin_point + delta;
270 271
271 /* cubic function - calc bictcp_cnt*/ 272 /* cubic function - calc bictcp_cnt*/
@@ -285,13 +286,14 @@ static inline void bictcp_update(struct bictcp *ca, u32 cwnd)
285 /* TCP Friendly */ 286 /* TCP Friendly */
286 if (tcp_friendliness) { 287 if (tcp_friendliness) {
287 u32 scale = beta_scale; 288 u32 scale = beta_scale;
289
288 delta = (cwnd * scale) >> 3; 290 delta = (cwnd * scale) >> 3;
289 while (ca->ack_cnt > delta) { /* update tcp cwnd */ 291 while (ca->ack_cnt > delta) { /* update tcp cwnd */
290 ca->ack_cnt -= delta; 292 ca->ack_cnt -= delta;
291 ca->tcp_cwnd++; 293 ca->tcp_cwnd++;
292 } 294 }
293 295
294 if (ca->tcp_cwnd > cwnd){ /* if bic is slower than tcp */ 296 if (ca->tcp_cwnd > cwnd) { /* if bic is slower than tcp */
295 delta = ca->tcp_cwnd - cwnd; 297 delta = ca->tcp_cwnd - cwnd;
296 max_cnt = cwnd / delta; 298 max_cnt = cwnd / delta;
297 if (ca->cnt > max_cnt) 299 if (ca->cnt > max_cnt)
@@ -320,7 +322,6 @@ static void bictcp_cong_avoid(struct sock *sk, u32 ack, u32 acked)
320 bictcp_update(ca, tp->snd_cwnd); 322 bictcp_update(ca, tp->snd_cwnd);
321 tcp_cong_avoid_ai(tp, ca->cnt); 323 tcp_cong_avoid_ai(tp, ca->cnt);
322 } 324 }
323
324} 325}
325 326
326static u32 bictcp_recalc_ssthresh(struct sock *sk) 327static u32 bictcp_recalc_ssthresh(struct sock *sk)
@@ -452,7 +453,8 @@ static int __init cubictcp_register(void)
452 * based on SRTT of 100ms 453 * based on SRTT of 100ms
453 */ 454 */
454 455
455 beta_scale = 8*(BICTCP_BETA_SCALE+beta)/ 3 / (BICTCP_BETA_SCALE - beta); 456 beta_scale = 8*(BICTCP_BETA_SCALE+beta) / 3
457 / (BICTCP_BETA_SCALE - beta);
456 458
457 cube_rtt_scale = (bic_scale * 10); /* 1024*c/rtt */ 459 cube_rtt_scale = (bic_scale * 10); /* 1024*c/rtt */
458 460
diff --git a/net/ipv4/tcp_dctcp.c b/net/ipv4/tcp_dctcp.c
new file mode 100644
index 000000000000..b504371af742
--- /dev/null
+++ b/net/ipv4/tcp_dctcp.c
@@ -0,0 +1,344 @@
1/* DataCenter TCP (DCTCP) congestion control.
2 *
3 * http://simula.stanford.edu/~alizade/Site/DCTCP.html
4 *
5 * This is an implementation of DCTCP over Reno, an enhancement to the
6 * TCP congestion control algorithm designed for data centers. DCTCP
7 * leverages Explicit Congestion Notification (ECN) in the network to
8 * provide multi-bit feedback to the end hosts. DCTCP's goal is to meet
9 * the following three data center transport requirements:
10 *
11 * - High burst tolerance (incast due to partition/aggregate)
12 * - Low latency (short flows, queries)
13 * - High throughput (continuous data updates, large file transfers)
14 * with commodity shallow buffered switches
15 *
16 * The algorithm is described in detail in the following two papers:
17 *
18 * 1) Mohammad Alizadeh, Albert Greenberg, David A. Maltz, Jitendra Padhye,
19 * Parveen Patel, Balaji Prabhakar, Sudipta Sengupta, and Murari Sridharan:
20 * "Data Center TCP (DCTCP)", Data Center Networks session
21 * Proc. ACM SIGCOMM, New Delhi, 2010.
22 * http://simula.stanford.edu/~alizade/Site/DCTCP_files/dctcp-final.pdf
23 *
24 * 2) Mohammad Alizadeh, Adel Javanmard, and Balaji Prabhakar:
25 * "Analysis of DCTCP: Stability, Convergence, and Fairness"
26 * Proc. ACM SIGMETRICS, San Jose, 2011.
27 * http://simula.stanford.edu/~alizade/Site/DCTCP_files/dctcp_analysis-full.pdf
28 *
29 * Initial prototype from Abdul Kabbani, Masato Yasuda and Mohammad Alizadeh.
30 *
31 * Authors:
32 *
33 * Daniel Borkmann <dborkman@redhat.com>
34 * Florian Westphal <fw@strlen.de>
35 * Glenn Judd <glenn.judd@morganstanley.com>
36 *
37 * This program is free software; you can redistribute it and/or modify
38 * it under the terms of the GNU General Public License as published by
39 * the Free Software Foundation; either version 2 of the License, or (at
40 * your option) any later version.
41 */
42
43#include <linux/module.h>
44#include <linux/mm.h>
45#include <net/tcp.h>
46#include <linux/inet_diag.h>
47
48#define DCTCP_MAX_ALPHA 1024U
49
50struct dctcp {
51 u32 acked_bytes_ecn;
52 u32 acked_bytes_total;
53 u32 prior_snd_una;
54 u32 prior_rcv_nxt;
55 u32 dctcp_alpha;
56 u32 next_seq;
57 u32 ce_state;
58 u32 delayed_ack_reserved;
59};
60
61static unsigned int dctcp_shift_g __read_mostly = 4; /* g = 1/2^4 */
62module_param(dctcp_shift_g, uint, 0644);
63MODULE_PARM_DESC(dctcp_shift_g, "parameter g for updating dctcp_alpha");
64
65static unsigned int dctcp_alpha_on_init __read_mostly = DCTCP_MAX_ALPHA;
66module_param(dctcp_alpha_on_init, uint, 0644);
67MODULE_PARM_DESC(dctcp_alpha_on_init, "parameter for initial alpha value");
68
69static unsigned int dctcp_clamp_alpha_on_loss __read_mostly;
70module_param(dctcp_clamp_alpha_on_loss, uint, 0644);
71MODULE_PARM_DESC(dctcp_clamp_alpha_on_loss,
72 "parameter for clamping alpha on loss");
73
74static struct tcp_congestion_ops dctcp_reno;
75
76static void dctcp_reset(const struct tcp_sock *tp, struct dctcp *ca)
77{
78 ca->next_seq = tp->snd_nxt;
79
80 ca->acked_bytes_ecn = 0;
81 ca->acked_bytes_total = 0;
82}
83
84static void dctcp_init(struct sock *sk)
85{
86 const struct tcp_sock *tp = tcp_sk(sk);
87
88 if ((tp->ecn_flags & TCP_ECN_OK) ||
89 (sk->sk_state == TCP_LISTEN ||
90 sk->sk_state == TCP_CLOSE)) {
91 struct dctcp *ca = inet_csk_ca(sk);
92
93 ca->prior_snd_una = tp->snd_una;
94 ca->prior_rcv_nxt = tp->rcv_nxt;
95
96 ca->dctcp_alpha = min(dctcp_alpha_on_init, DCTCP_MAX_ALPHA);
97
98 ca->delayed_ack_reserved = 0;
99 ca->ce_state = 0;
100
101 dctcp_reset(tp, ca);
102 return;
103 }
104
105 /* No ECN support? Fall back to Reno. Also need to clear
106 * ECT from sk since it is set during 3WHS for DCTCP.
107 */
108 inet_csk(sk)->icsk_ca_ops = &dctcp_reno;
109 INET_ECN_dontxmit(sk);
110}
111
112static u32 dctcp_ssthresh(struct sock *sk)
113{
114 const struct dctcp *ca = inet_csk_ca(sk);
115 struct tcp_sock *tp = tcp_sk(sk);
116
117 return max(tp->snd_cwnd - ((tp->snd_cwnd * ca->dctcp_alpha) >> 11U), 2U);
118}
119
 120/* Minimal DCTCP CE state machine:
121 *
122 * S: 0 <- last pkt was non-CE
123 * 1 <- last pkt was CE
124 */
125
126static void dctcp_ce_state_0_to_1(struct sock *sk)
127{
128 struct dctcp *ca = inet_csk_ca(sk);
129 struct tcp_sock *tp = tcp_sk(sk);
130
 131 /* State has changed from CE=0 to CE=1 and the delayed
 132 * ACK has not been sent yet.
133 */
134 if (!ca->ce_state && ca->delayed_ack_reserved) {
135 u32 tmp_rcv_nxt;
136
137 /* Save current rcv_nxt. */
138 tmp_rcv_nxt = tp->rcv_nxt;
139
140 /* Generate previous ack with CE=0. */
141 tp->ecn_flags &= ~TCP_ECN_DEMAND_CWR;
142 tp->rcv_nxt = ca->prior_rcv_nxt;
143
144 tcp_send_ack(sk);
145
146 /* Recover current rcv_nxt. */
147 tp->rcv_nxt = tmp_rcv_nxt;
148 }
149
150 ca->prior_rcv_nxt = tp->rcv_nxt;
151 ca->ce_state = 1;
152
153 tp->ecn_flags |= TCP_ECN_DEMAND_CWR;
154}
155
156static void dctcp_ce_state_1_to_0(struct sock *sk)
157{
158 struct dctcp *ca = inet_csk_ca(sk);
159 struct tcp_sock *tp = tcp_sk(sk);
160
 161 /* State has changed from CE=1 to CE=0 and the delayed
 162 * ACK has not been sent yet.
163 */
164 if (ca->ce_state && ca->delayed_ack_reserved) {
165 u32 tmp_rcv_nxt;
166
167 /* Save current rcv_nxt. */
168 tmp_rcv_nxt = tp->rcv_nxt;
169
170 /* Generate previous ack with CE=1. */
171 tp->ecn_flags |= TCP_ECN_DEMAND_CWR;
172 tp->rcv_nxt = ca->prior_rcv_nxt;
173
174 tcp_send_ack(sk);
175
176 /* Recover current rcv_nxt. */
177 tp->rcv_nxt = tmp_rcv_nxt;
178 }
179
180 ca->prior_rcv_nxt = tp->rcv_nxt;
181 ca->ce_state = 0;
182
183 tp->ecn_flags &= ~TCP_ECN_DEMAND_CWR;
184}
185
186static void dctcp_update_alpha(struct sock *sk, u32 flags)
187{
188 const struct tcp_sock *tp = tcp_sk(sk);
189 struct dctcp *ca = inet_csk_ca(sk);
190 u32 acked_bytes = tp->snd_una - ca->prior_snd_una;
191
192 /* If ack did not advance snd_una, count dupack as MSS size.
193 * If ack did update window, do not count it at all.
194 */
195 if (acked_bytes == 0 && !(flags & CA_ACK_WIN_UPDATE))
196 acked_bytes = inet_csk(sk)->icsk_ack.rcv_mss;
197 if (acked_bytes) {
198 ca->acked_bytes_total += acked_bytes;
199 ca->prior_snd_una = tp->snd_una;
200
201 if (flags & CA_ACK_ECE)
202 ca->acked_bytes_ecn += acked_bytes;
203 }
204
205 /* Expired RTT */
206 if (!before(tp->snd_una, ca->next_seq)) {
 207 /* Avoid a zero denominator. */
208 if (ca->acked_bytes_total == 0)
209 ca->acked_bytes_total = 1;
210
211 /* alpha = (1 - g) * alpha + g * F */
212 ca->dctcp_alpha = ca->dctcp_alpha -
213 (ca->dctcp_alpha >> dctcp_shift_g) +
214 (ca->acked_bytes_ecn << (10U - dctcp_shift_g)) /
215 ca->acked_bytes_total;
216
217 if (ca->dctcp_alpha > DCTCP_MAX_ALPHA)
218 /* Clamp dctcp_alpha to max. */
219 ca->dctcp_alpha = DCTCP_MAX_ALPHA;
220
221 dctcp_reset(tp, ca);
222 }
223}
224
225static void dctcp_state(struct sock *sk, u8 new_state)
226{
227 if (dctcp_clamp_alpha_on_loss && new_state == TCP_CA_Loss) {
228 struct dctcp *ca = inet_csk_ca(sk);
229
230 /* If this extension is enabled, we clamp dctcp_alpha to
231 * max on packet loss; the motivation is that dctcp_alpha
 232 * is an indicator of the extent of congestion and packet
233 * loss is an indicator of extreme congestion; setting
234 * this in practice turned out to be beneficial, and
235 * effectively assumes total congestion which reduces the
236 * window by half.
237 */
238 ca->dctcp_alpha = DCTCP_MAX_ALPHA;
239 }
240}
241
242static void dctcp_update_ack_reserved(struct sock *sk, enum tcp_ca_event ev)
243{
244 struct dctcp *ca = inet_csk_ca(sk);
245
246 switch (ev) {
247 case CA_EVENT_DELAYED_ACK:
248 if (!ca->delayed_ack_reserved)
249 ca->delayed_ack_reserved = 1;
250 break;
251 case CA_EVENT_NON_DELAYED_ACK:
252 if (ca->delayed_ack_reserved)
253 ca->delayed_ack_reserved = 0;
254 break;
255 default:
256 /* Don't care for the rest. */
257 break;
258 }
259}
260
261static void dctcp_cwnd_event(struct sock *sk, enum tcp_ca_event ev)
262{
263 switch (ev) {
264 case CA_EVENT_ECN_IS_CE:
265 dctcp_ce_state_0_to_1(sk);
266 break;
267 case CA_EVENT_ECN_NO_CE:
268 dctcp_ce_state_1_to_0(sk);
269 break;
270 case CA_EVENT_DELAYED_ACK:
271 case CA_EVENT_NON_DELAYED_ACK:
272 dctcp_update_ack_reserved(sk, ev);
273 break;
274 default:
275 /* Don't care for the rest. */
276 break;
277 }
278}
279
280static void dctcp_get_info(struct sock *sk, u32 ext, struct sk_buff *skb)
281{
282 const struct dctcp *ca = inet_csk_ca(sk);
283
284 /* Fill it also in case of VEGASINFO due to req struct limits.
285 * We can still correctly retrieve it later.
286 */
287 if (ext & (1 << (INET_DIAG_DCTCPINFO - 1)) ||
288 ext & (1 << (INET_DIAG_VEGASINFO - 1))) {
289 struct tcp_dctcp_info info;
290
291 memset(&info, 0, sizeof(info));
292 if (inet_csk(sk)->icsk_ca_ops != &dctcp_reno) {
293 info.dctcp_enabled = 1;
294 info.dctcp_ce_state = (u16) ca->ce_state;
295 info.dctcp_alpha = ca->dctcp_alpha;
296 info.dctcp_ab_ecn = ca->acked_bytes_ecn;
297 info.dctcp_ab_tot = ca->acked_bytes_total;
298 }
299
300 nla_put(skb, INET_DIAG_DCTCPINFO, sizeof(info), &info);
301 }
302}
303
304static struct tcp_congestion_ops dctcp __read_mostly = {
305 .init = dctcp_init,
306 .in_ack_event = dctcp_update_alpha,
307 .cwnd_event = dctcp_cwnd_event,
308 .ssthresh = dctcp_ssthresh,
309 .cong_avoid = tcp_reno_cong_avoid,
310 .set_state = dctcp_state,
311 .get_info = dctcp_get_info,
312 .flags = TCP_CONG_NEEDS_ECN,
313 .owner = THIS_MODULE,
314 .name = "dctcp",
315};
316
317static struct tcp_congestion_ops dctcp_reno __read_mostly = {
318 .ssthresh = tcp_reno_ssthresh,
319 .cong_avoid = tcp_reno_cong_avoid,
320 .get_info = dctcp_get_info,
321 .owner = THIS_MODULE,
322 .name = "dctcp-reno",
323};
324
325static int __init dctcp_register(void)
326{
327 BUILD_BUG_ON(sizeof(struct dctcp) > ICSK_CA_PRIV_SIZE);
328 return tcp_register_congestion_control(&dctcp);
329}
330
331static void __exit dctcp_unregister(void)
332{
333 tcp_unregister_congestion_control(&dctcp);
334}
335
336module_init(dctcp_register);
337module_exit(dctcp_unregister);
338
339MODULE_AUTHOR("Daniel Borkmann <dborkman@redhat.com>");
340MODULE_AUTHOR("Florian Westphal <fw@strlen.de>");
341MODULE_AUTHOR("Glenn Judd <glenn.judd@morganstanley.com>");
342
343MODULE_LICENSE("GPL v2");
344MODULE_DESCRIPTION("DataCenter TCP (DCTCP)");
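The dctcp_update_alpha() logic above implements the paper's EWMA, alpha <- (1 - g) * alpha + g * F, in fixed point, with DCTCP_MAX_ALPHA = 1024 standing in for 1.0 and g = 1/2^dctcp_shift_g. A self-contained user-space sketch of that arithmetic (illustrative only; dctcp_alpha_step is a made-up helper, not kernel code):

#include <stdio.h>

/* Mirror of the fixed-point update: alpha and the return value are
 * scaled by 1024; bytes_ecn/bytes_total gives the marked fraction F.
 */
static unsigned int dctcp_alpha_step(unsigned int alpha, unsigned int shift_g,
				     unsigned int bytes_ecn,
				     unsigned int bytes_total)
{
	if (bytes_total == 0)
		bytes_total = 1;	/* avoid dividing by zero */

	alpha = alpha - (alpha >> shift_g) +
		(bytes_ecn << (10U - shift_g)) / bytes_total;

	return alpha > 1024U ? 1024U : alpha;
}

int main(void)
{
	/* One RTT with 25% of bytes CE-marked, starting from alpha = 1024
	 * and the default g = 1/16: 1024 - 64 + 16 = 976.
	 */
	printf("alpha = %u/1024\n", dctcp_alpha_step(1024U, 4U, 2500U, 10000U));
	return 0;
}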
diff --git a/net/ipv4/tcp_diag.c b/net/ipv4/tcp_diag.c
index ed3f2ad42e0f..0d73f9ddb55b 100644
--- a/net/ipv4/tcp_diag.c
+++ b/net/ipv4/tcp_diag.c
@@ -9,7 +9,6 @@
9 * 2 of the License, or (at your option) any later version. 9 * 2 of the License, or (at your option) any later version.
10 */ 10 */
11 11
12
13#include <linux/module.h> 12#include <linux/module.h>
14#include <linux/inet_diag.h> 13#include <linux/inet_diag.h>
15 14
@@ -35,13 +34,13 @@ static void tcp_diag_get_info(struct sock *sk, struct inet_diag_msg *r,
35} 34}
36 35
37static void tcp_diag_dump(struct sk_buff *skb, struct netlink_callback *cb, 36static void tcp_diag_dump(struct sk_buff *skb, struct netlink_callback *cb,
38 struct inet_diag_req_v2 *r, struct nlattr *bc) 37 struct inet_diag_req_v2 *r, struct nlattr *bc)
39{ 38{
40 inet_diag_dump_icsk(&tcp_hashinfo, skb, cb, r, bc); 39 inet_diag_dump_icsk(&tcp_hashinfo, skb, cb, r, bc);
41} 40}
42 41
43static int tcp_diag_dump_one(struct sk_buff *in_skb, const struct nlmsghdr *nlh, 42static int tcp_diag_dump_one(struct sk_buff *in_skb, const struct nlmsghdr *nlh,
44 struct inet_diag_req_v2 *req) 43 struct inet_diag_req_v2 *req)
45{ 44{
46 return inet_diag_dump_one_icsk(&tcp_hashinfo, in_skb, nlh, req); 45 return inet_diag_dump_one_icsk(&tcp_hashinfo, in_skb, nlh, req);
47} 46}
diff --git a/net/ipv4/tcp_fastopen.c b/net/ipv4/tcp_fastopen.c
index 9771563ab564..815c85e3b1e0 100644
--- a/net/ipv4/tcp_fastopen.c
+++ b/net/ipv4/tcp_fastopen.c
@@ -115,7 +115,7 @@ static bool tcp_fastopen_cookie_gen(struct request_sock *req,
115 115
116 if (__tcp_fastopen_cookie_gen(&ip6h->saddr, &tmp)) { 116 if (__tcp_fastopen_cookie_gen(&ip6h->saddr, &tmp)) {
117 struct in6_addr *buf = (struct in6_addr *) tmp.val; 117 struct in6_addr *buf = (struct in6_addr *) tmp.val;
118 int i = 4; 118 int i;
119 119
120 for (i = 0; i < 4; i++) 120 for (i = 0; i < 4; i++)
121 buf->s6_addr32[i] ^= ip6h->daddr.s6_addr32[i]; 121 buf->s6_addr32[i] ^= ip6h->daddr.s6_addr32[i];
diff --git a/net/ipv4/tcp_highspeed.c b/net/ipv4/tcp_highspeed.c
index 1c4908280d92..882c08aae2f5 100644
--- a/net/ipv4/tcp_highspeed.c
+++ b/net/ipv4/tcp_highspeed.c
@@ -9,7 +9,6 @@
9#include <linux/module.h> 9#include <linux/module.h>
10#include <net/tcp.h> 10#include <net/tcp.h>
11 11
12
13/* From AIMD tables from RFC 3649 appendix B, 12/* From AIMD tables from RFC 3649 appendix B,
14 * with fixed-point MD scaled <<8. 13 * with fixed-point MD scaled <<8.
15 */ 14 */
@@ -17,78 +16,78 @@ static const struct hstcp_aimd_val {
17 unsigned int cwnd; 16 unsigned int cwnd;
18 unsigned int md; 17 unsigned int md;
19} hstcp_aimd_vals[] = { 18} hstcp_aimd_vals[] = {
20 { 38, 128, /* 0.50 */ }, 19 { 38, 128, /* 0.50 */ },
21 { 118, 112, /* 0.44 */ }, 20 { 118, 112, /* 0.44 */ },
22 { 221, 104, /* 0.41 */ }, 21 { 221, 104, /* 0.41 */ },
23 { 347, 98, /* 0.38 */ }, 22 { 347, 98, /* 0.38 */ },
24 { 495, 93, /* 0.37 */ }, 23 { 495, 93, /* 0.37 */ },
25 { 663, 89, /* 0.35 */ }, 24 { 663, 89, /* 0.35 */ },
26 { 851, 86, /* 0.34 */ }, 25 { 851, 86, /* 0.34 */ },
27 { 1058, 83, /* 0.33 */ }, 26 { 1058, 83, /* 0.33 */ },
28 { 1284, 81, /* 0.32 */ }, 27 { 1284, 81, /* 0.32 */ },
29 { 1529, 78, /* 0.31 */ }, 28 { 1529, 78, /* 0.31 */ },
30 { 1793, 76, /* 0.30 */ }, 29 { 1793, 76, /* 0.30 */ },
31 { 2076, 74, /* 0.29 */ }, 30 { 2076, 74, /* 0.29 */ },
32 { 2378, 72, /* 0.28 */ }, 31 { 2378, 72, /* 0.28 */ },
33 { 2699, 71, /* 0.28 */ }, 32 { 2699, 71, /* 0.28 */ },
34 { 3039, 69, /* 0.27 */ }, 33 { 3039, 69, /* 0.27 */ },
35 { 3399, 68, /* 0.27 */ }, 34 { 3399, 68, /* 0.27 */ },
36 { 3778, 66, /* 0.26 */ }, 35 { 3778, 66, /* 0.26 */ },
37 { 4177, 65, /* 0.26 */ }, 36 { 4177, 65, /* 0.26 */ },
38 { 4596, 64, /* 0.25 */ }, 37 { 4596, 64, /* 0.25 */ },
39 { 5036, 62, /* 0.25 */ }, 38 { 5036, 62, /* 0.25 */ },
40 { 5497, 61, /* 0.24 */ }, 39 { 5497, 61, /* 0.24 */ },
41 { 5979, 60, /* 0.24 */ }, 40 { 5979, 60, /* 0.24 */ },
42 { 6483, 59, /* 0.23 */ }, 41 { 6483, 59, /* 0.23 */ },
43 { 7009, 58, /* 0.23 */ }, 42 { 7009, 58, /* 0.23 */ },
44 { 7558, 57, /* 0.22 */ }, 43 { 7558, 57, /* 0.22 */ },
45 { 8130, 56, /* 0.22 */ }, 44 { 8130, 56, /* 0.22 */ },
46 { 8726, 55, /* 0.22 */ }, 45 { 8726, 55, /* 0.22 */ },
47 { 9346, 54, /* 0.21 */ }, 46 { 9346, 54, /* 0.21 */ },
48 { 9991, 53, /* 0.21 */ }, 47 { 9991, 53, /* 0.21 */ },
49 { 10661, 52, /* 0.21 */ }, 48 { 10661, 52, /* 0.21 */ },
50 { 11358, 52, /* 0.20 */ }, 49 { 11358, 52, /* 0.20 */ },
51 { 12082, 51, /* 0.20 */ }, 50 { 12082, 51, /* 0.20 */ },
52 { 12834, 50, /* 0.20 */ }, 51 { 12834, 50, /* 0.20 */ },
53 { 13614, 49, /* 0.19 */ }, 52 { 13614, 49, /* 0.19 */ },
54 { 14424, 48, /* 0.19 */ }, 53 { 14424, 48, /* 0.19 */ },
55 { 15265, 48, /* 0.19 */ }, 54 { 15265, 48, /* 0.19 */ },
56 { 16137, 47, /* 0.19 */ }, 55 { 16137, 47, /* 0.19 */ },
57 { 17042, 46, /* 0.18 */ }, 56 { 17042, 46, /* 0.18 */ },
58 { 17981, 45, /* 0.18 */ }, 57 { 17981, 45, /* 0.18 */ },
59 { 18955, 45, /* 0.18 */ }, 58 { 18955, 45, /* 0.18 */ },
60 { 19965, 44, /* 0.17 */ }, 59 { 19965, 44, /* 0.17 */ },
61 { 21013, 43, /* 0.17 */ }, 60 { 21013, 43, /* 0.17 */ },
62 { 22101, 43, /* 0.17 */ }, 61 { 22101, 43, /* 0.17 */ },
63 { 23230, 42, /* 0.17 */ }, 62 { 23230, 42, /* 0.17 */ },
64 { 24402, 41, /* 0.16 */ }, 63 { 24402, 41, /* 0.16 */ },
65 { 25618, 41, /* 0.16 */ }, 64 { 25618, 41, /* 0.16 */ },
66 { 26881, 40, /* 0.16 */ }, 65 { 26881, 40, /* 0.16 */ },
67 { 28193, 39, /* 0.16 */ }, 66 { 28193, 39, /* 0.16 */ },
68 { 29557, 39, /* 0.15 */ }, 67 { 29557, 39, /* 0.15 */ },
69 { 30975, 38, /* 0.15 */ }, 68 { 30975, 38, /* 0.15 */ },
70 { 32450, 38, /* 0.15 */ }, 69 { 32450, 38, /* 0.15 */ },
71 { 33986, 37, /* 0.15 */ }, 70 { 33986, 37, /* 0.15 */ },
72 { 35586, 36, /* 0.14 */ }, 71 { 35586, 36, /* 0.14 */ },
73 { 37253, 36, /* 0.14 */ }, 72 { 37253, 36, /* 0.14 */ },
74 { 38992, 35, /* 0.14 */ }, 73 { 38992, 35, /* 0.14 */ },
75 { 40808, 35, /* 0.14 */ }, 74 { 40808, 35, /* 0.14 */ },
76 { 42707, 34, /* 0.13 */ }, 75 { 42707, 34, /* 0.13 */ },
77 { 44694, 33, /* 0.13 */ }, 76 { 44694, 33, /* 0.13 */ },
78 { 46776, 33, /* 0.13 */ }, 77 { 46776, 33, /* 0.13 */ },
79 { 48961, 32, /* 0.13 */ }, 78 { 48961, 32, /* 0.13 */ },
80 { 51258, 32, /* 0.13 */ }, 79 { 51258, 32, /* 0.13 */ },
81 { 53677, 31, /* 0.12 */ }, 80 { 53677, 31, /* 0.12 */ },
82 { 56230, 30, /* 0.12 */ }, 81 { 56230, 30, /* 0.12 */ },
83 { 58932, 30, /* 0.12 */ }, 82 { 58932, 30, /* 0.12 */ },
84 { 61799, 29, /* 0.12 */ }, 83 { 61799, 29, /* 0.12 */ },
85 { 64851, 28, /* 0.11 */ }, 84 { 64851, 28, /* 0.11 */ },
86 { 68113, 28, /* 0.11 */ }, 85 { 68113, 28, /* 0.11 */ },
87 { 71617, 27, /* 0.11 */ }, 86 { 71617, 27, /* 0.11 */ },
88 { 75401, 26, /* 0.10 */ }, 87 { 75401, 26, /* 0.10 */ },
89 { 79517, 26, /* 0.10 */ }, 88 { 79517, 26, /* 0.10 */ },
90 { 84035, 25, /* 0.10 */ }, 89 { 84035, 25, /* 0.10 */ },
91 { 89053, 24, /* 0.10 */ }, 90 { 89053, 24, /* 0.10 */ },
92}; 91};
93 92
94#define HSTCP_AIMD_MAX ARRAY_SIZE(hstcp_aimd_vals) 93#define HSTCP_AIMD_MAX ARRAY_SIZE(hstcp_aimd_vals)
diff --git a/net/ipv4/tcp_htcp.c b/net/ipv4/tcp_htcp.c
index 031361311a8b..58469fff6c18 100644
--- a/net/ipv4/tcp_htcp.c
+++ b/net/ipv4/tcp_htcp.c
@@ -98,7 +98,8 @@ static inline void measure_rtt(struct sock *sk, u32 srtt)
98 } 98 }
99} 99}
100 100
101static void measure_achieved_throughput(struct sock *sk, u32 pkts_acked, s32 rtt) 101static void measure_achieved_throughput(struct sock *sk,
102 u32 pkts_acked, s32 rtt)
102{ 103{
103 const struct inet_connection_sock *icsk = inet_csk(sk); 104 const struct inet_connection_sock *icsk = inet_csk(sk);
104 const struct tcp_sock *tp = tcp_sk(sk); 105 const struct tcp_sock *tp = tcp_sk(sk);
@@ -148,8 +149,8 @@ static inline void htcp_beta_update(struct htcp *ca, u32 minRTT, u32 maxRTT)
148 if (use_bandwidth_switch) { 149 if (use_bandwidth_switch) {
149 u32 maxB = ca->maxB; 150 u32 maxB = ca->maxB;
150 u32 old_maxB = ca->old_maxB; 151 u32 old_maxB = ca->old_maxB;
151 ca->old_maxB = ca->maxB;
152 152
153 ca->old_maxB = ca->maxB;
153 if (!between(5 * maxB, 4 * old_maxB, 6 * old_maxB)) { 154 if (!between(5 * maxB, 4 * old_maxB, 6 * old_maxB)) {
154 ca->beta = BETA_MIN; 155 ca->beta = BETA_MIN;
155 ca->modeswitch = 0; 156 ca->modeswitch = 0;
@@ -270,6 +271,7 @@ static void htcp_state(struct sock *sk, u8 new_state)
270 case TCP_CA_Open: 271 case TCP_CA_Open:
271 { 272 {
272 struct htcp *ca = inet_csk_ca(sk); 273 struct htcp *ca = inet_csk_ca(sk);
274
273 if (ca->undo_last_cong) { 275 if (ca->undo_last_cong) {
274 ca->last_cong = jiffies; 276 ca->last_cong = jiffies;
275 ca->undo_last_cong = 0; 277 ca->undo_last_cong = 0;
diff --git a/net/ipv4/tcp_hybla.c b/net/ipv4/tcp_hybla.c
index d8f8f05a4951..f963b274f2b0 100644
--- a/net/ipv4/tcp_hybla.c
+++ b/net/ipv4/tcp_hybla.c
@@ -29,7 +29,6 @@ static int rtt0 = 25;
29module_param(rtt0, int, 0644); 29module_param(rtt0, int, 0644);
 30MODULE_PARM_DESC(rtt0, "reference round trip time (ms)"); 30MODULE_PARM_DESC(rtt0, "reference round trip time (ms)");
31 31
32
33/* This is called to refresh values for hybla parameters */ 32/* This is called to refresh values for hybla parameters */
34static inline void hybla_recalc_param (struct sock *sk) 33static inline void hybla_recalc_param (struct sock *sk)
35{ 34{
diff --git a/net/ipv4/tcp_illinois.c b/net/ipv4/tcp_illinois.c
index 5999b3972e64..1d5a30a90adf 100644
--- a/net/ipv4/tcp_illinois.c
+++ b/net/ipv4/tcp_illinois.c
@@ -284,7 +284,7 @@ static void tcp_illinois_cong_avoid(struct sock *sk, u32 ack, u32 acked)
284 delta = (tp->snd_cwnd_cnt * ca->alpha) >> ALPHA_SHIFT; 284 delta = (tp->snd_cwnd_cnt * ca->alpha) >> ALPHA_SHIFT;
285 if (delta >= tp->snd_cwnd) { 285 if (delta >= tp->snd_cwnd) {
286 tp->snd_cwnd = min(tp->snd_cwnd + delta / tp->snd_cwnd, 286 tp->snd_cwnd = min(tp->snd_cwnd + delta / tp->snd_cwnd,
287 (u32) tp->snd_cwnd_clamp); 287 (u32)tp->snd_cwnd_clamp);
288 tp->snd_cwnd_cnt = 0; 288 tp->snd_cwnd_cnt = 0;
289 } 289 }
290 } 290 }
@@ -299,7 +299,6 @@ static u32 tcp_illinois_ssthresh(struct sock *sk)
299 return max(tp->snd_cwnd - ((tp->snd_cwnd * ca->beta) >> BETA_SHIFT), 2U); 299 return max(tp->snd_cwnd - ((tp->snd_cwnd * ca->beta) >> BETA_SHIFT), 2U);
300} 300}
301 301
302
303/* Extract info for Tcp socket info provided via netlink. */ 302/* Extract info for Tcp socket info provided via netlink. */
304static void tcp_illinois_info(struct sock *sk, u32 ext, 303static void tcp_illinois_info(struct sock *sk, u32 ext,
305 struct sk_buff *skb) 304 struct sk_buff *skb)
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 0185eea59342..00a41499d52c 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -200,28 +200,25 @@ static inline bool tcp_in_quickack_mode(const struct sock *sk)
200 return icsk->icsk_ack.quick && !icsk->icsk_ack.pingpong; 200 return icsk->icsk_ack.quick && !icsk->icsk_ack.pingpong;
201} 201}
202 202
203static inline void TCP_ECN_queue_cwr(struct tcp_sock *tp) 203static void tcp_ecn_queue_cwr(struct tcp_sock *tp)
204{ 204{
205 if (tp->ecn_flags & TCP_ECN_OK) 205 if (tp->ecn_flags & TCP_ECN_OK)
206 tp->ecn_flags |= TCP_ECN_QUEUE_CWR; 206 tp->ecn_flags |= TCP_ECN_QUEUE_CWR;
207} 207}
208 208
209static inline void TCP_ECN_accept_cwr(struct tcp_sock *tp, const struct sk_buff *skb) 209static void tcp_ecn_accept_cwr(struct tcp_sock *tp, const struct sk_buff *skb)
210{ 210{
211 if (tcp_hdr(skb)->cwr) 211 if (tcp_hdr(skb)->cwr)
212 tp->ecn_flags &= ~TCP_ECN_DEMAND_CWR; 212 tp->ecn_flags &= ~TCP_ECN_DEMAND_CWR;
213} 213}
214 214
215static inline void TCP_ECN_withdraw_cwr(struct tcp_sock *tp) 215static void tcp_ecn_withdraw_cwr(struct tcp_sock *tp)
216{ 216{
217 tp->ecn_flags &= ~TCP_ECN_DEMAND_CWR; 217 tp->ecn_flags &= ~TCP_ECN_DEMAND_CWR;
218} 218}
219 219
220static inline void TCP_ECN_check_ce(struct tcp_sock *tp, const struct sk_buff *skb) 220static void __tcp_ecn_check_ce(struct tcp_sock *tp, const struct sk_buff *skb)
221{ 221{
222 if (!(tp->ecn_flags & TCP_ECN_OK))
223 return;
224
225 switch (TCP_SKB_CB(skb)->ip_dsfield & INET_ECN_MASK) { 222 switch (TCP_SKB_CB(skb)->ip_dsfield & INET_ECN_MASK) {
226 case INET_ECN_NOT_ECT: 223 case INET_ECN_NOT_ECT:
227 /* Funny extension: if ECT is not set on a segment, 224 /* Funny extension: if ECT is not set on a segment,
@@ -232,30 +229,43 @@ static inline void TCP_ECN_check_ce(struct tcp_sock *tp, const struct sk_buff *s
232 tcp_enter_quickack_mode((struct sock *)tp); 229 tcp_enter_quickack_mode((struct sock *)tp);
233 break; 230 break;
234 case INET_ECN_CE: 231 case INET_ECN_CE:
232 if (tcp_ca_needs_ecn((struct sock *)tp))
233 tcp_ca_event((struct sock *)tp, CA_EVENT_ECN_IS_CE);
234
235 if (!(tp->ecn_flags & TCP_ECN_DEMAND_CWR)) { 235 if (!(tp->ecn_flags & TCP_ECN_DEMAND_CWR)) {
236 /* Better not delay acks, sender can have a very low cwnd */ 236 /* Better not delay acks, sender can have a very low cwnd */
237 tcp_enter_quickack_mode((struct sock *)tp); 237 tcp_enter_quickack_mode((struct sock *)tp);
238 tp->ecn_flags |= TCP_ECN_DEMAND_CWR; 238 tp->ecn_flags |= TCP_ECN_DEMAND_CWR;
239 } 239 }
240 /* fallinto */ 240 tp->ecn_flags |= TCP_ECN_SEEN;
241 break;
241 default: 242 default:
243 if (tcp_ca_needs_ecn((struct sock *)tp))
244 tcp_ca_event((struct sock *)tp, CA_EVENT_ECN_NO_CE);
242 tp->ecn_flags |= TCP_ECN_SEEN; 245 tp->ecn_flags |= TCP_ECN_SEEN;
246 break;
243 } 247 }
244} 248}
245 249
246static inline void TCP_ECN_rcv_synack(struct tcp_sock *tp, const struct tcphdr *th) 250static void tcp_ecn_check_ce(struct tcp_sock *tp, const struct sk_buff *skb)
251{
252 if (tp->ecn_flags & TCP_ECN_OK)
253 __tcp_ecn_check_ce(tp, skb);
254}
255
256static void tcp_ecn_rcv_synack(struct tcp_sock *tp, const struct tcphdr *th)
247{ 257{
248 if ((tp->ecn_flags & TCP_ECN_OK) && (!th->ece || th->cwr)) 258 if ((tp->ecn_flags & TCP_ECN_OK) && (!th->ece || th->cwr))
249 tp->ecn_flags &= ~TCP_ECN_OK; 259 tp->ecn_flags &= ~TCP_ECN_OK;
250} 260}
251 261
252static inline void TCP_ECN_rcv_syn(struct tcp_sock *tp, const struct tcphdr *th) 262static void tcp_ecn_rcv_syn(struct tcp_sock *tp, const struct tcphdr *th)
253{ 263{
254 if ((tp->ecn_flags & TCP_ECN_OK) && (!th->ece || !th->cwr)) 264 if ((tp->ecn_flags & TCP_ECN_OK) && (!th->ece || !th->cwr))
255 tp->ecn_flags &= ~TCP_ECN_OK; 265 tp->ecn_flags &= ~TCP_ECN_OK;
256} 266}
257 267
258static bool TCP_ECN_rcv_ecn_echo(const struct tcp_sock *tp, const struct tcphdr *th) 268static bool tcp_ecn_rcv_ecn_echo(const struct tcp_sock *tp, const struct tcphdr *th)
259{ 269{
260 if (th->ece && !th->syn && (tp->ecn_flags & TCP_ECN_OK)) 270 if (th->ece && !th->syn && (tp->ecn_flags & TCP_ECN_OK))
261 return true; 271 return true;
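For reference, the two ECN bits selected by INET_ECN_MASK encode 00 = Not-ECT, 01 = ECT(1), 10 = ECT(0) and 11 = CE (RFC 3168), so the default branch in the __tcp_ecn_check_ce() switch above covers the two ECT codepoints.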
@@ -652,7 +662,7 @@ static void tcp_event_data_recv(struct sock *sk, struct sk_buff *skb)
652 } 662 }
653 icsk->icsk_ack.lrcvtime = now; 663 icsk->icsk_ack.lrcvtime = now;
654 664
655 TCP_ECN_check_ce(tp, skb); 665 tcp_ecn_check_ce(tp, skb);
656 666
657 if (skb->len >= 128) 667 if (skb->len >= 128)
658 tcp_grow_window(sk, skb); 668 tcp_grow_window(sk, skb);
@@ -1294,9 +1304,9 @@ static bool tcp_shifted_skb(struct sock *sk, struct sk_buff *skb,
1294 TCP_SKB_CB(prev)->end_seq += shifted; 1304 TCP_SKB_CB(prev)->end_seq += shifted;
1295 TCP_SKB_CB(skb)->seq += shifted; 1305 TCP_SKB_CB(skb)->seq += shifted;
1296 1306
1297 skb_shinfo(prev)->gso_segs += pcount; 1307 tcp_skb_pcount_add(prev, pcount);
1298 BUG_ON(skb_shinfo(skb)->gso_segs < pcount); 1308 BUG_ON(tcp_skb_pcount(skb) < pcount);
1299 skb_shinfo(skb)->gso_segs -= pcount; 1309 tcp_skb_pcount_add(skb, -pcount);
1300 1310
1301 /* When we're adding to gso_segs == 1, gso_size will be zero, 1311 /* When we're adding to gso_segs == 1, gso_size will be zero,
1302 * in theory this shouldn't be necessary but as long as DSACK 1312 * in theory this shouldn't be necessary but as long as DSACK
@@ -1309,7 +1319,7 @@ static bool tcp_shifted_skb(struct sock *sk, struct sk_buff *skb,
1309 } 1319 }
1310 1320
1311 /* CHECKME: To clear or not to clear? Mimics normal skb currently */ 1321 /* CHECKME: To clear or not to clear? Mimics normal skb currently */
1312 if (skb_shinfo(skb)->gso_segs <= 1) { 1322 if (tcp_skb_pcount(skb) <= 1) {
1313 skb_shinfo(skb)->gso_size = 0; 1323 skb_shinfo(skb)->gso_size = 0;
1314 skb_shinfo(skb)->gso_type = 0; 1324 skb_shinfo(skb)->gso_type = 0;
1315 } 1325 }
@@ -1887,21 +1897,21 @@ static inline void tcp_reset_reno_sack(struct tcp_sock *tp)
1887 tp->sacked_out = 0; 1897 tp->sacked_out = 0;
1888} 1898}
1889 1899
1890static void tcp_clear_retrans_partial(struct tcp_sock *tp) 1900void tcp_clear_retrans(struct tcp_sock *tp)
1891{ 1901{
1892 tp->retrans_out = 0; 1902 tp->retrans_out = 0;
1893 tp->lost_out = 0; 1903 tp->lost_out = 0;
1894
1895 tp->undo_marker = 0; 1904 tp->undo_marker = 0;
1896 tp->undo_retrans = -1; 1905 tp->undo_retrans = -1;
1906 tp->fackets_out = 0;
1907 tp->sacked_out = 0;
1897} 1908}
1898 1909
1899void tcp_clear_retrans(struct tcp_sock *tp) 1910static inline void tcp_init_undo(struct tcp_sock *tp)
1900{ 1911{
1901 tcp_clear_retrans_partial(tp); 1912 tp->undo_marker = tp->snd_una;
1902 1913 /* Retransmission still in flight may cause DSACKs later. */
1903 tp->fackets_out = 0; 1914 tp->undo_retrans = tp->retrans_out ? : -1;
1904 tp->sacked_out = 0;
1905} 1915}
1906 1916
1907/* Enter Loss state. If we detect SACK reneging, forget all SACK information 1917/* Enter Loss state. If we detect SACK reneging, forget all SACK information
@@ -1924,18 +1934,18 @@ void tcp_enter_loss(struct sock *sk)
1924 tp->prior_ssthresh = tcp_current_ssthresh(sk); 1934 tp->prior_ssthresh = tcp_current_ssthresh(sk);
1925 tp->snd_ssthresh = icsk->icsk_ca_ops->ssthresh(sk); 1935 tp->snd_ssthresh = icsk->icsk_ca_ops->ssthresh(sk);
1926 tcp_ca_event(sk, CA_EVENT_LOSS); 1936 tcp_ca_event(sk, CA_EVENT_LOSS);
1937 tcp_init_undo(tp);
1927 } 1938 }
1928 tp->snd_cwnd = 1; 1939 tp->snd_cwnd = 1;
1929 tp->snd_cwnd_cnt = 0; 1940 tp->snd_cwnd_cnt = 0;
1930 tp->snd_cwnd_stamp = tcp_time_stamp; 1941 tp->snd_cwnd_stamp = tcp_time_stamp;
1931 1942
1932 tcp_clear_retrans_partial(tp); 1943 tp->retrans_out = 0;
1944 tp->lost_out = 0;
1933 1945
1934 if (tcp_is_reno(tp)) 1946 if (tcp_is_reno(tp))
1935 tcp_reset_reno_sack(tp); 1947 tcp_reset_reno_sack(tp);
1936 1948
1937 tp->undo_marker = tp->snd_una;
1938
1939 skb = tcp_write_queue_head(sk); 1949 skb = tcp_write_queue_head(sk);
1940 is_reneg = skb && (TCP_SKB_CB(skb)->sacked & TCPCB_SACKED_ACKED); 1950 is_reneg = skb && (TCP_SKB_CB(skb)->sacked & TCPCB_SACKED_ACKED);
1941 if (is_reneg) { 1951 if (is_reneg) {
@@ -1949,9 +1959,6 @@ void tcp_enter_loss(struct sock *sk)
1949 if (skb == tcp_send_head(sk)) 1959 if (skb == tcp_send_head(sk))
1950 break; 1960 break;
1951 1961
1952 if (TCP_SKB_CB(skb)->sacked & TCPCB_SACKED_RETRANS)
1953 tp->undo_marker = 0;
1954
1955 TCP_SKB_CB(skb)->sacked &= (~TCPCB_TAGBITS)|TCPCB_SACKED_ACKED; 1962 TCP_SKB_CB(skb)->sacked &= (~TCPCB_TAGBITS)|TCPCB_SACKED_ACKED;
1956 if (!(TCP_SKB_CB(skb)->sacked&TCPCB_SACKED_ACKED) || is_reneg) { 1963 if (!(TCP_SKB_CB(skb)->sacked&TCPCB_SACKED_ACKED) || is_reneg) {
1957 TCP_SKB_CB(skb)->sacked &= ~TCPCB_SACKED_ACKED; 1964 TCP_SKB_CB(skb)->sacked &= ~TCPCB_SACKED_ACKED;
@@ -1971,7 +1978,7 @@ void tcp_enter_loss(struct sock *sk)
1971 sysctl_tcp_reordering); 1978 sysctl_tcp_reordering);
1972 tcp_set_ca_state(sk, TCP_CA_Loss); 1979 tcp_set_ca_state(sk, TCP_CA_Loss);
1973 tp->high_seq = tp->snd_nxt; 1980 tp->high_seq = tp->snd_nxt;
1974 TCP_ECN_queue_cwr(tp); 1981 tcp_ecn_queue_cwr(tp);
1975 1982
1976 /* F-RTO RFC5682 sec 3.1 step 1: retransmit SND.UNA if no previous 1983 /* F-RTO RFC5682 sec 3.1 step 1: retransmit SND.UNA if no previous
1977 * loss recovery is underway except recurring timeout(s) on 1984 * loss recovery is underway except recurring timeout(s) on
@@ -2363,7 +2370,7 @@ static void tcp_undo_cwnd_reduction(struct sock *sk, bool unmark_loss)
2363 2370
2364 if (tp->prior_ssthresh > tp->snd_ssthresh) { 2371 if (tp->prior_ssthresh > tp->snd_ssthresh) {
2365 tp->snd_ssthresh = tp->prior_ssthresh; 2372 tp->snd_ssthresh = tp->prior_ssthresh;
2366 TCP_ECN_withdraw_cwr(tp); 2373 tcp_ecn_withdraw_cwr(tp);
2367 } 2374 }
2368 } else { 2375 } else {
2369 tp->snd_cwnd = max(tp->snd_cwnd, tp->snd_ssthresh); 2376 tp->snd_cwnd = max(tp->snd_cwnd, tp->snd_ssthresh);
@@ -2493,7 +2500,7 @@ static void tcp_init_cwnd_reduction(struct sock *sk)
2493 tp->prr_delivered = 0; 2500 tp->prr_delivered = 0;
2494 tp->prr_out = 0; 2501 tp->prr_out = 0;
2495 tp->snd_ssthresh = inet_csk(sk)->icsk_ca_ops->ssthresh(sk); 2502 tp->snd_ssthresh = inet_csk(sk)->icsk_ca_ops->ssthresh(sk);
2496 TCP_ECN_queue_cwr(tp); 2503 tcp_ecn_queue_cwr(tp);
2497} 2504}
2498 2505
2499static void tcp_cwnd_reduction(struct sock *sk, const int prior_unsacked, 2506static void tcp_cwnd_reduction(struct sock *sk, const int prior_unsacked,
@@ -2670,8 +2677,7 @@ static void tcp_enter_recovery(struct sock *sk, bool ece_ack)
2670 NET_INC_STATS_BH(sock_net(sk), mib_idx); 2677 NET_INC_STATS_BH(sock_net(sk), mib_idx);
2671 2678
2672 tp->prior_ssthresh = 0; 2679 tp->prior_ssthresh = 0;
2673 tp->undo_marker = tp->snd_una; 2680 tcp_init_undo(tp);
2674 tp->undo_retrans = tp->retrans_out ? : -1;
2675 2681
2676 if (inet_csk(sk)->icsk_ca_state < TCP_CA_CWR) { 2682 if (inet_csk(sk)->icsk_ca_state < TCP_CA_CWR) {
2677 if (!ece_ack) 2683 if (!ece_ack)
@@ -2970,7 +2976,8 @@ void tcp_rearm_rto(struct sock *sk)
2970 if (icsk->icsk_pending == ICSK_TIME_EARLY_RETRANS || 2976 if (icsk->icsk_pending == ICSK_TIME_EARLY_RETRANS ||
2971 icsk->icsk_pending == ICSK_TIME_LOSS_PROBE) { 2977 icsk->icsk_pending == ICSK_TIME_LOSS_PROBE) {
2972 struct sk_buff *skb = tcp_write_queue_head(sk); 2978 struct sk_buff *skb = tcp_write_queue_head(sk);
2973 const u32 rto_time_stamp = TCP_SKB_CB(skb)->when + rto; 2979 const u32 rto_time_stamp =
2980 tcp_skb_timestamp(skb) + rto;
2974 s32 delta = (s32)(rto_time_stamp - tcp_time_stamp); 2981 s32 delta = (s32)(rto_time_stamp - tcp_time_stamp);
2975 /* delta may not be positive if the socket is locked 2982 /* delta may not be positive if the socket is locked
2976 * when the retrans timer fires and is rescheduled. 2983 * when the retrans timer fires and is rescheduled.
@@ -3210,9 +3217,10 @@ static void tcp_ack_probe(struct sock *sk)
3210 * This function is not for random using! 3217 * This function is not for random using!
3211 */ 3218 */
3212 } else { 3219 } else {
3220 unsigned long when = inet_csk_rto_backoff(icsk, TCP_RTO_MAX);
3221
3213 inet_csk_reset_xmit_timer(sk, ICSK_TIME_PROBE0, 3222 inet_csk_reset_xmit_timer(sk, ICSK_TIME_PROBE0,
3214 min(icsk->icsk_rto << icsk->icsk_backoff, TCP_RTO_MAX), 3223 when, TCP_RTO_MAX);
3215 TCP_RTO_MAX);
3216 } 3224 }
3217} 3225}
3218 3226
@@ -3363,6 +3371,14 @@ static void tcp_process_tlp_ack(struct sock *sk, u32 ack, int flag)
3363 } 3371 }
3364} 3372}
3365 3373
3374static inline void tcp_in_ack_event(struct sock *sk, u32 flags)
3375{
3376 const struct inet_connection_sock *icsk = inet_csk(sk);
3377
3378 if (icsk->icsk_ca_ops->in_ack_event)
3379 icsk->icsk_ca_ops->in_ack_event(sk, flags);
3380}
3381
3366/* This routine deals with incoming acks, but not outgoing ones. */ 3382/* This routine deals with incoming acks, but not outgoing ones. */
3367static int tcp_ack(struct sock *sk, const struct sk_buff *skb, int flag) 3383static int tcp_ack(struct sock *sk, const struct sk_buff *skb, int flag)
3368{ 3384{
@@ -3422,10 +3438,12 @@ static int tcp_ack(struct sock *sk, const struct sk_buff *skb, int flag)
3422 tp->snd_una = ack; 3438 tp->snd_una = ack;
3423 flag |= FLAG_WIN_UPDATE; 3439 flag |= FLAG_WIN_UPDATE;
3424 3440
3425 tcp_ca_event(sk, CA_EVENT_FAST_ACK); 3441 tcp_in_ack_event(sk, CA_ACK_WIN_UPDATE);
3426 3442
3427 NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_TCPHPACKS); 3443 NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_TCPHPACKS);
3428 } else { 3444 } else {
3445 u32 ack_ev_flags = CA_ACK_SLOWPATH;
3446
3429 if (ack_seq != TCP_SKB_CB(skb)->end_seq) 3447 if (ack_seq != TCP_SKB_CB(skb)->end_seq)
3430 flag |= FLAG_DATA; 3448 flag |= FLAG_DATA;
3431 else 3449 else
@@ -3437,10 +3455,15 @@ static int tcp_ack(struct sock *sk, const struct sk_buff *skb, int flag)
3437 flag |= tcp_sacktag_write_queue(sk, skb, prior_snd_una, 3455 flag |= tcp_sacktag_write_queue(sk, skb, prior_snd_una,
3438 &sack_rtt_us); 3456 &sack_rtt_us);
3439 3457
3440 if (TCP_ECN_rcv_ecn_echo(tp, tcp_hdr(skb))) 3458 if (tcp_ecn_rcv_ecn_echo(tp, tcp_hdr(skb))) {
3441 flag |= FLAG_ECE; 3459 flag |= FLAG_ECE;
3460 ack_ev_flags |= CA_ACK_ECE;
3461 }
3462
3463 if (flag & FLAG_WIN_UPDATE)
3464 ack_ev_flags |= CA_ACK_WIN_UPDATE;
3442 3465
3443 tcp_ca_event(sk, CA_EVENT_SLOW_ACK); 3466 tcp_in_ack_event(sk, ack_ev_flags);
3444 } 3467 }
3445 3468
3446 /* We passed data and got it acked, remove any soft error 3469 /* We passed data and got it acked, remove any soft error
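The CA_ACK_WIN_UPDATE and CA_ACK_ECE bits assembled here are what an in_ack_event() handler receives; dctcp_update_alpha() earlier in this diff is the in-tree consumer, using CA_ACK_WIN_UPDATE to skip pure window updates and CA_ACK_ECE to count ECN-marked bytes.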
@@ -4062,6 +4085,44 @@ static void tcp_sack_remove(struct tcp_sock *tp)
4062 tp->rx_opt.num_sacks = num_sacks; 4085 tp->rx_opt.num_sacks = num_sacks;
4063} 4086}
4064 4087
4088/**
4089 * tcp_try_coalesce - try to merge skb to prior one
4090 * @sk: socket
4091 * @to: prior buffer
4092 * @from: buffer to add in queue
4093 * @fragstolen: pointer to boolean
4094 *
4095 * Before queueing skb @from after @to, try to merge them
4096 * to reduce overall memory use and queue lengths, if cost is small.
4097 * Packets in ofo or receive queues can stay a long time.
4098 * Better try to coalesce them right now to avoid future collapses.
4099 * Returns true if caller should free @from instead of queueing it
4100 */
4101static bool tcp_try_coalesce(struct sock *sk,
4102 struct sk_buff *to,
4103 struct sk_buff *from,
4104 bool *fragstolen)
4105{
4106 int delta;
4107
4108 *fragstolen = false;
4109
 4110 /* It's possible this segment overlaps with prior segment in queue */
4111 if (TCP_SKB_CB(from)->seq != TCP_SKB_CB(to)->end_seq)
4112 return false;
4113
4114 if (!skb_try_coalesce(to, from, fragstolen, &delta))
4115 return false;
4116
4117 atomic_add(delta, &sk->sk_rmem_alloc);
4118 sk_mem_charge(sk, delta);
4119 NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_TCPRCVCOALESCE);
4120 TCP_SKB_CB(to)->end_seq = TCP_SKB_CB(from)->end_seq;
4121 TCP_SKB_CB(to)->ack_seq = TCP_SKB_CB(from)->ack_seq;
4122 TCP_SKB_CB(to)->tcp_flags |= TCP_SKB_CB(from)->tcp_flags;
4123 return true;
4124}
4125
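Coalescing only applies to byte-contiguous segments: if @to covers the range [1000, 1500) then @from has to start at sequence 1500, and on success the merged buffer takes over @from's end_seq, ack_seq and tcp_flags.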
4065/* This one checks to see if we can put data from the 4126/* This one checks to see if we can put data from the
4066 * out_of_order queue into the receive_queue. 4127 * out_of_order queue into the receive_queue.
4067 */ 4128 */
@@ -4069,7 +4130,8 @@ static void tcp_ofo_queue(struct sock *sk)
4069{ 4130{
4070 struct tcp_sock *tp = tcp_sk(sk); 4131 struct tcp_sock *tp = tcp_sk(sk);
4071 __u32 dsack_high = tp->rcv_nxt; 4132 __u32 dsack_high = tp->rcv_nxt;
4072 struct sk_buff *skb; 4133 struct sk_buff *skb, *tail;
4134 bool fragstolen, eaten;
4073 4135
4074 while ((skb = skb_peek(&tp->out_of_order_queue)) != NULL) { 4136 while ((skb = skb_peek(&tp->out_of_order_queue)) != NULL) {
4075 if (after(TCP_SKB_CB(skb)->seq, tp->rcv_nxt)) 4137 if (after(TCP_SKB_CB(skb)->seq, tp->rcv_nxt))
@@ -4082,9 +4144,9 @@ static void tcp_ofo_queue(struct sock *sk)
4082 tcp_dsack_extend(sk, TCP_SKB_CB(skb)->seq, dsack); 4144 tcp_dsack_extend(sk, TCP_SKB_CB(skb)->seq, dsack);
4083 } 4145 }
4084 4146
4147 __skb_unlink(skb, &tp->out_of_order_queue);
4085 if (!after(TCP_SKB_CB(skb)->end_seq, tp->rcv_nxt)) { 4148 if (!after(TCP_SKB_CB(skb)->end_seq, tp->rcv_nxt)) {
4086 SOCK_DEBUG(sk, "ofo packet was already received\n"); 4149 SOCK_DEBUG(sk, "ofo packet was already received\n");
4087 __skb_unlink(skb, &tp->out_of_order_queue);
4088 __kfree_skb(skb); 4150 __kfree_skb(skb);
4089 continue; 4151 continue;
4090 } 4152 }
@@ -4092,11 +4154,15 @@ static void tcp_ofo_queue(struct sock *sk)
4092 tp->rcv_nxt, TCP_SKB_CB(skb)->seq, 4154 tp->rcv_nxt, TCP_SKB_CB(skb)->seq,
4093 TCP_SKB_CB(skb)->end_seq); 4155 TCP_SKB_CB(skb)->end_seq);
4094 4156
4095 __skb_unlink(skb, &tp->out_of_order_queue); 4157 tail = skb_peek_tail(&sk->sk_receive_queue);
4096 __skb_queue_tail(&sk->sk_receive_queue, skb); 4158 eaten = tail && tcp_try_coalesce(sk, tail, skb, &fragstolen);
4097 tp->rcv_nxt = TCP_SKB_CB(skb)->end_seq; 4159 tp->rcv_nxt = TCP_SKB_CB(skb)->end_seq;
4098 if (tcp_hdr(skb)->fin) 4160 if (!eaten)
4161 __skb_queue_tail(&sk->sk_receive_queue, skb);
4162 if (TCP_SKB_CB(skb)->tcp_flags & TCPHDR_FIN)
4099 tcp_fin(sk); 4163 tcp_fin(sk);
4164 if (eaten)
4165 kfree_skb_partial(skb, fragstolen);
4100 } 4166 }
4101} 4167}
4102 4168
@@ -4123,53 +4189,13 @@ static int tcp_try_rmem_schedule(struct sock *sk, struct sk_buff *skb,
4123 return 0; 4189 return 0;
4124} 4190}
4125 4191
4126/**
4127 * tcp_try_coalesce - try to merge skb to prior one
4128 * @sk: socket
4129 * @to: prior buffer
4130 * @from: buffer to add in queue
4131 * @fragstolen: pointer to boolean
4132 *
4133 * Before queueing skb @from after @to, try to merge them
4134 * to reduce overall memory use and queue lengths, if cost is small.
4135 * Packets in ofo or receive queues can stay a long time.
4136 * Better try to coalesce them right now to avoid future collapses.
4137 * Returns true if caller should free @from instead of queueing it
4138 */
4139static bool tcp_try_coalesce(struct sock *sk,
4140 struct sk_buff *to,
4141 struct sk_buff *from,
4142 bool *fragstolen)
4143{
4144 int delta;
4145
4146 *fragstolen = false;
4147
4148 if (tcp_hdr(from)->fin)
4149 return false;
4150
4151 /* Its possible this segment overlaps with prior segment in queue */
4152 if (TCP_SKB_CB(from)->seq != TCP_SKB_CB(to)->end_seq)
4153 return false;
4154
4155 if (!skb_try_coalesce(to, from, fragstolen, &delta))
4156 return false;
4157
4158 atomic_add(delta, &sk->sk_rmem_alloc);
4159 sk_mem_charge(sk, delta);
4160 NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_TCPRCVCOALESCE);
4161 TCP_SKB_CB(to)->end_seq = TCP_SKB_CB(from)->end_seq;
4162 TCP_SKB_CB(to)->ack_seq = TCP_SKB_CB(from)->ack_seq;
4163 return true;
4164}
4165
4166static void tcp_data_queue_ofo(struct sock *sk, struct sk_buff *skb) 4192static void tcp_data_queue_ofo(struct sock *sk, struct sk_buff *skb)
4167{ 4193{
4168 struct tcp_sock *tp = tcp_sk(sk); 4194 struct tcp_sock *tp = tcp_sk(sk);
4169 struct sk_buff *skb1; 4195 struct sk_buff *skb1;
4170 u32 seq, end_seq; 4196 u32 seq, end_seq;
4171 4197
4172 TCP_ECN_check_ce(tp, skb); 4198 tcp_ecn_check_ce(tp, skb);
4173 4199
4174 if (unlikely(tcp_try_rmem_schedule(sk, skb, skb->truesize))) { 4200 if (unlikely(tcp_try_rmem_schedule(sk, skb, skb->truesize))) {
4175 NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_TCPOFODROP); 4201 NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_TCPOFODROP);
@@ -4308,24 +4334,19 @@ static int __must_check tcp_queue_rcv(struct sock *sk, struct sk_buff *skb, int
4308 4334
4309int tcp_send_rcvq(struct sock *sk, struct msghdr *msg, size_t size) 4335int tcp_send_rcvq(struct sock *sk, struct msghdr *msg, size_t size)
4310{ 4336{
4311 struct sk_buff *skb = NULL; 4337 struct sk_buff *skb;
4312 struct tcphdr *th;
4313 bool fragstolen; 4338 bool fragstolen;
4314 4339
4315 if (size == 0) 4340 if (size == 0)
4316 return 0; 4341 return 0;
4317 4342
4318 skb = alloc_skb(size + sizeof(*th), sk->sk_allocation); 4343 skb = alloc_skb(size, sk->sk_allocation);
4319 if (!skb) 4344 if (!skb)
4320 goto err; 4345 goto err;
4321 4346
4322 if (tcp_try_rmem_schedule(sk, skb, size + sizeof(*th))) 4347 if (tcp_try_rmem_schedule(sk, skb, skb->truesize))
4323 goto err_free; 4348 goto err_free;
4324 4349
4325 th = (struct tcphdr *)skb_put(skb, sizeof(*th));
4326 skb_reset_transport_header(skb);
4327 memset(th, 0, sizeof(*th));
4328
4329 if (memcpy_fromiovec(skb_put(skb, size), msg->msg_iov, size)) 4350 if (memcpy_fromiovec(skb_put(skb, size), msg->msg_iov, size))
4330 goto err_free; 4351 goto err_free;
4331 4352
@@ -4333,7 +4354,7 @@ int tcp_send_rcvq(struct sock *sk, struct msghdr *msg, size_t size)
4333 TCP_SKB_CB(skb)->end_seq = TCP_SKB_CB(skb)->seq + size; 4354 TCP_SKB_CB(skb)->end_seq = TCP_SKB_CB(skb)->seq + size;
4334 TCP_SKB_CB(skb)->ack_seq = tcp_sk(sk)->snd_una - 1; 4355 TCP_SKB_CB(skb)->ack_seq = tcp_sk(sk)->snd_una - 1;
4335 4356
4336 if (tcp_queue_rcv(sk, skb, sizeof(*th), &fragstolen)) { 4357 if (tcp_queue_rcv(sk, skb, 0, &fragstolen)) {
4337 WARN_ON_ONCE(fragstolen); /* should not happen */ 4358 WARN_ON_ONCE(fragstolen); /* should not happen */
4338 __kfree_skb(skb); 4359 __kfree_skb(skb);
4339 } 4360 }
@@ -4347,7 +4368,6 @@ err:
4347 4368
4348static void tcp_data_queue(struct sock *sk, struct sk_buff *skb) 4369static void tcp_data_queue(struct sock *sk, struct sk_buff *skb)
4349{ 4370{
4350 const struct tcphdr *th = tcp_hdr(skb);
4351 struct tcp_sock *tp = tcp_sk(sk); 4371 struct tcp_sock *tp = tcp_sk(sk);
4352 int eaten = -1; 4372 int eaten = -1;
4353 bool fragstolen = false; 4373 bool fragstolen = false;
@@ -4356,9 +4376,9 @@ static void tcp_data_queue(struct sock *sk, struct sk_buff *skb)
4356 goto drop; 4376 goto drop;
4357 4377
4358 skb_dst_drop(skb); 4378 skb_dst_drop(skb);
4359 __skb_pull(skb, th->doff * 4); 4379 __skb_pull(skb, tcp_hdr(skb)->doff * 4);
4360 4380
4361 TCP_ECN_accept_cwr(tp, skb); 4381 tcp_ecn_accept_cwr(tp, skb);
4362 4382
4363 tp->rx_opt.dsack = 0; 4383 tp->rx_opt.dsack = 0;
4364 4384
@@ -4400,7 +4420,7 @@ queue_and_out:
4400 tp->rcv_nxt = TCP_SKB_CB(skb)->end_seq; 4420 tp->rcv_nxt = TCP_SKB_CB(skb)->end_seq;
4401 if (skb->len) 4421 if (skb->len)
4402 tcp_event_data_recv(sk, skb); 4422 tcp_event_data_recv(sk, skb);
4403 if (th->fin) 4423 if (TCP_SKB_CB(skb)->tcp_flags & TCPHDR_FIN)
4404 tcp_fin(sk); 4424 tcp_fin(sk);
4405 4425
4406 if (!skb_queue_empty(&tp->out_of_order_queue)) { 4426 if (!skb_queue_empty(&tp->out_of_order_queue)) {
@@ -4515,7 +4535,7 @@ restart:
4515 * - bloated or contains data before "start" or 4535 * - bloated or contains data before "start" or
4516 * overlaps to the next one. 4536 * overlaps to the next one.
4517 */ 4537 */
4518 if (!tcp_hdr(skb)->syn && !tcp_hdr(skb)->fin && 4538 if (!(TCP_SKB_CB(skb)->tcp_flags & (TCPHDR_SYN | TCPHDR_FIN)) &&
4519 (tcp_win_from_space(skb->truesize) > skb->len || 4539 (tcp_win_from_space(skb->truesize) > skb->len ||
4520 before(TCP_SKB_CB(skb)->seq, start))) { 4540 before(TCP_SKB_CB(skb)->seq, start))) {
4521 end_of_skbs = false; 4541 end_of_skbs = false;
@@ -4534,30 +4554,18 @@ restart:
4534 /* Decided to skip this, advance start seq. */ 4554 /* Decided to skip this, advance start seq. */
4535 start = TCP_SKB_CB(skb)->end_seq; 4555 start = TCP_SKB_CB(skb)->end_seq;
4536 } 4556 }
4537 if (end_of_skbs || tcp_hdr(skb)->syn || tcp_hdr(skb)->fin) 4557 if (end_of_skbs ||
4558 (TCP_SKB_CB(skb)->tcp_flags & (TCPHDR_SYN | TCPHDR_FIN)))
4538 return; 4559 return;
4539 4560
4540 while (before(start, end)) { 4561 while (before(start, end)) {
4562 int copy = min_t(int, SKB_MAX_ORDER(0, 0), end - start);
4541 struct sk_buff *nskb; 4563 struct sk_buff *nskb;
4542 unsigned int header = skb_headroom(skb);
4543 int copy = SKB_MAX_ORDER(header, 0);
4544 4564
4545 /* Too big header? This can happen with IPv6. */ 4565 nskb = alloc_skb(copy, GFP_ATOMIC);
4546 if (copy < 0)
4547 return;
4548 if (end - start < copy)
4549 copy = end - start;
4550 nskb = alloc_skb(copy + header, GFP_ATOMIC);
4551 if (!nskb) 4566 if (!nskb)
4552 return; 4567 return;
4553 4568
4554 skb_set_mac_header(nskb, skb_mac_header(skb) - skb->head);
4555 skb_set_network_header(nskb, (skb_network_header(skb) -
4556 skb->head));
4557 skb_set_transport_header(nskb, (skb_transport_header(skb) -
4558 skb->head));
4559 skb_reserve(nskb, header);
4560 memcpy(nskb->head, skb->head, header);
4561 memcpy(nskb->cb, skb->cb, sizeof(skb->cb)); 4569 memcpy(nskb->cb, skb->cb, sizeof(skb->cb));
4562 TCP_SKB_CB(nskb)->seq = TCP_SKB_CB(nskb)->end_seq = start; 4570 TCP_SKB_CB(nskb)->seq = TCP_SKB_CB(nskb)->end_seq = start;
4563 __skb_queue_before(list, skb, nskb); 4571 __skb_queue_before(list, skb, nskb);
@@ -4581,8 +4589,7 @@ restart:
4581 skb = tcp_collapse_one(sk, skb, list); 4589 skb = tcp_collapse_one(sk, skb, list);
4582 if (!skb || 4590 if (!skb ||
4583 skb == tail || 4591 skb == tail ||
4584 tcp_hdr(skb)->syn || 4592 (TCP_SKB_CB(skb)->tcp_flags & (TCPHDR_SYN | TCPHDR_FIN)))
4585 tcp_hdr(skb)->fin)
4586 return; 4593 return;
4587 } 4594 }
4588 } 4595 }
@@ -5386,7 +5393,7 @@ static int tcp_rcv_synsent_state_process(struct sock *sk, struct sk_buff *skb,
5386 * state to ESTABLISHED..." 5393 * state to ESTABLISHED..."
5387 */ 5394 */
5388 5395
5389 TCP_ECN_rcv_synack(tp, th); 5396 tcp_ecn_rcv_synack(tp, th);
5390 5397
5391 tcp_init_wl(tp, TCP_SKB_CB(skb)->seq); 5398 tcp_init_wl(tp, TCP_SKB_CB(skb)->seq);
5392 tcp_ack(sk, skb, FLAG_SLOWPATH); 5399 tcp_ack(sk, skb, FLAG_SLOWPATH);
@@ -5505,7 +5512,7 @@ discard:
5505 tp->snd_wl1 = TCP_SKB_CB(skb)->seq; 5512 tp->snd_wl1 = TCP_SKB_CB(skb)->seq;
5506 tp->max_window = tp->snd_wnd; 5513 tp->max_window = tp->snd_wnd;
5507 5514
5508 TCP_ECN_rcv_syn(tp, th); 5515 tcp_ecn_rcv_syn(tp, th);
5509 5516
5510 tcp_mtup_init(sk); 5517 tcp_mtup_init(sk);
5511 tcp_sync_mss(sk, icsk->icsk_pmtu_cookie); 5518 tcp_sync_mss(sk, icsk->icsk_pmtu_cookie);
@@ -5835,6 +5842,40 @@ static inline void pr_drop_req(struct request_sock *req, __u16 port, int family)
5835#endif 5842#endif
5836} 5843}
5837 5844
5845/* RFC3168 : 6.1.1 SYN packets must not have ECT/ECN bits set
5846 *
5847 * If we receive a SYN packet with these bits set, it means a
5848 * network is playing bad games with TOS bits. In order to
5849 * avoid possible false congestion notifications, we disable
5850 * TCP ECN negotiation.
5851 *
5852 * Exception: tcp_ca wants ECN. This is required for DCTCP
5853 * congestion control; it requires setting ECT on all packets,
5854 * including SYN. We invert the test in this case: if our
5855 * local socket wants ECN, but the peer only set ece/cwr (and not
5856 * ECT in the IP header), it is probably a non-DCTCP-aware sender.
5857 */
5858static void tcp_ecn_create_request(struct request_sock *req,
5859 const struct sk_buff *skb,
5860 const struct sock *listen_sk)
5861{
5862 const struct tcphdr *th = tcp_hdr(skb);
5863 const struct net *net = sock_net(listen_sk);
5864 bool th_ecn = th->ece && th->cwr;
5865 bool ect, need_ecn;
5866
5867 if (!th_ecn)
5868 return;
5869
5870 ect = !INET_ECN_is_not_ect(TCP_SKB_CB(skb)->ip_dsfield);
5871 need_ecn = tcp_ca_needs_ecn(listen_sk);
5872
5873 if (!ect && !need_ecn && net->ipv4.sysctl_tcp_ecn)
5874 inet_rsk(req)->ecn_ok = 1;
5875 else if (ect && need_ecn)
5876 inet_rsk(req)->ecn_ok = 1;
5877}
5878
5838int tcp_conn_request(struct request_sock_ops *rsk_ops, 5879int tcp_conn_request(struct request_sock_ops *rsk_ops,
5839 const struct tcp_request_sock_ops *af_ops, 5880 const struct tcp_request_sock_ops *af_ops,
5840 struct sock *sk, struct sk_buff *skb) 5881 struct sock *sk, struct sk_buff *skb)
@@ -5843,7 +5884,7 @@ int tcp_conn_request(struct request_sock_ops *rsk_ops,
5843 struct request_sock *req; 5884 struct request_sock *req;
5844 struct tcp_sock *tp = tcp_sk(sk); 5885 struct tcp_sock *tp = tcp_sk(sk);
5845 struct dst_entry *dst = NULL; 5886 struct dst_entry *dst = NULL;
5846 __u32 isn = TCP_SKB_CB(skb)->when; 5887 __u32 isn = TCP_SKB_CB(skb)->tcp_tw_isn;
5847 bool want_cookie = false, fastopen; 5888 bool want_cookie = false, fastopen;
5848 struct flowi fl; 5889 struct flowi fl;
5849 struct tcp_fastopen_cookie foc = { .len = -1 }; 5890 struct tcp_fastopen_cookie foc = { .len = -1 };
@@ -5895,7 +5936,7 @@ int tcp_conn_request(struct request_sock_ops *rsk_ops,
5895 goto drop_and_free; 5936 goto drop_and_free;
5896 5937
5897 if (!want_cookie || tmp_opt.tstamp_ok) 5938 if (!want_cookie || tmp_opt.tstamp_ok)
5898 TCP_ECN_create_request(req, skb, sock_net(sk)); 5939 tcp_ecn_create_request(req, skb, sk);
5899 5940
5900 if (want_cookie) { 5941 if (want_cookie) {
5901 isn = cookie_init_sequence(af_ops, sk, skb, &req->mss); 5942 isn = cookie_init_sequence(af_ops, sk, skb, &req->mss);
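Editor's note: the new tcp_ecn_create_request() above boils down to a three-input decision. A minimal standalone sketch of that decision table, assuming only what the hunk shows (the function name syn_ecn_ok and its bool parameters are illustrative, not kernel API):

#include <stdbool.h>

static bool syn_ecn_ok(bool syn_has_ece_cwr,  /* th->ece && th->cwr           */
                       bool ip_ect_set,       /* ECT set in the IP dsfield    */
                       bool ca_needs_ecn,     /* e.g. DCTCP on the listener   */
                       bool sysctl_tcp_ecn)   /* net.ipv4.tcp_ecn enabled     */
{
	if (!syn_has_ece_cwr)
		return false;             /* peer never asked for ECN */
	if (!ip_ect_set && !ca_needs_ecn && sysctl_tcp_ecn)
		return true;              /* classic RFC 3168 negotiation */
	if (ip_ect_set && ca_needs_ecn)
		return true;              /* DCTCP-style: ECT on the SYN is expected */
	return false;                     /* ECT on a SYN without DCTCP: likely TOS games */
}

Either true branch corresponds to setting inet_rsk(req)->ecn_ok in the hunk above; every other combination leaves ECN disabled for the connection.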
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index fbea536cf5c0..552e87e3c269 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -89,7 +89,6 @@ int sysctl_tcp_tw_reuse __read_mostly;
89int sysctl_tcp_low_latency __read_mostly; 89int sysctl_tcp_low_latency __read_mostly;
90EXPORT_SYMBOL(sysctl_tcp_low_latency); 90EXPORT_SYMBOL(sysctl_tcp_low_latency);
91 91
92
93#ifdef CONFIG_TCP_MD5SIG 92#ifdef CONFIG_TCP_MD5SIG
94static int tcp_v4_md5_hash_hdr(char *md5_hash, const struct tcp_md5sig_key *key, 93static int tcp_v4_md5_hash_hdr(char *md5_hash, const struct tcp_md5sig_key *key,
95 __be32 daddr, __be32 saddr, const struct tcphdr *th); 94 __be32 daddr, __be32 saddr, const struct tcphdr *th);
@@ -430,15 +429,16 @@ void tcp_v4_err(struct sk_buff *icmp_skb, u32 info)
430 break; 429 break;
431 430
432 icsk->icsk_backoff--; 431 icsk->icsk_backoff--;
433 inet_csk(sk)->icsk_rto = (tp->srtt_us ? __tcp_set_rto(tp) : 432 icsk->icsk_rto = tp->srtt_us ? __tcp_set_rto(tp) :
434 TCP_TIMEOUT_INIT) << icsk->icsk_backoff; 433 TCP_TIMEOUT_INIT;
435 tcp_bound_rto(sk); 434 icsk->icsk_rto = inet_csk_rto_backoff(icsk, TCP_RTO_MAX);
436 435
437 skb = tcp_write_queue_head(sk); 436 skb = tcp_write_queue_head(sk);
438 BUG_ON(!skb); 437 BUG_ON(!skb);
439 438
440 remaining = icsk->icsk_rto - min(icsk->icsk_rto, 439 remaining = icsk->icsk_rto -
441 tcp_time_stamp - TCP_SKB_CB(skb)->when); 440 min(icsk->icsk_rto,
441 tcp_time_stamp - tcp_skb_timestamp(skb));
442 442
443 if (remaining) { 443 if (remaining) {
444 inet_csk_reset_xmit_timer(sk, ICSK_TIME_RETRANS, 444 inet_csk_reset_xmit_timer(sk, ICSK_TIME_RETRANS,
@@ -680,8 +680,9 @@ static void tcp_v4_send_reset(struct sock *sk, struct sk_buff *skb)
680 680
681 net = dev_net(skb_dst(skb)->dev); 681 net = dev_net(skb_dst(skb)->dev);
682 arg.tos = ip_hdr(skb)->tos; 682 arg.tos = ip_hdr(skb)->tos;
683 ip_send_unicast_reply(net, skb, ip_hdr(skb)->saddr, 683 ip_send_unicast_reply(net, skb, &TCP_SKB_CB(skb)->header.h4.opt,
684 ip_hdr(skb)->daddr, &arg, arg.iov[0].iov_len); 684 ip_hdr(skb)->saddr, ip_hdr(skb)->daddr,
685 &arg, arg.iov[0].iov_len);
685 686
686 TCP_INC_STATS_BH(net, TCP_MIB_OUTSEGS); 687 TCP_INC_STATS_BH(net, TCP_MIB_OUTSEGS);
687 TCP_INC_STATS_BH(net, TCP_MIB_OUTRSTS); 688 TCP_INC_STATS_BH(net, TCP_MIB_OUTRSTS);
@@ -763,8 +764,9 @@ static void tcp_v4_send_ack(struct sk_buff *skb, u32 seq, u32 ack,
763 if (oif) 764 if (oif)
764 arg.bound_dev_if = oif; 765 arg.bound_dev_if = oif;
765 arg.tos = tos; 766 arg.tos = tos;
766 ip_send_unicast_reply(net, skb, ip_hdr(skb)->saddr, 767 ip_send_unicast_reply(net, skb, &TCP_SKB_CB(skb)->header.h4.opt,
767 ip_hdr(skb)->daddr, &arg, arg.iov[0].iov_len); 768 ip_hdr(skb)->saddr, ip_hdr(skb)->daddr,
769 &arg, arg.iov[0].iov_len);
768 770
769 TCP_INC_STATS_BH(net, TCP_MIB_OUTSEGS); 771 TCP_INC_STATS_BH(net, TCP_MIB_OUTSEGS);
770} 772}
@@ -883,18 +885,16 @@ EXPORT_SYMBOL(tcp_syn_flood_action);
883 */ 885 */
884static struct ip_options_rcu *tcp_v4_save_options(struct sk_buff *skb) 886static struct ip_options_rcu *tcp_v4_save_options(struct sk_buff *skb)
885{ 887{
886 const struct ip_options *opt = &(IPCB(skb)->opt); 888 const struct ip_options *opt = &TCP_SKB_CB(skb)->header.h4.opt;
887 struct ip_options_rcu *dopt = NULL; 889 struct ip_options_rcu *dopt = NULL;
888 890
889 if (opt && opt->optlen) { 891 if (opt && opt->optlen) {
890 int opt_size = sizeof(*dopt) + opt->optlen; 892 int opt_size = sizeof(*dopt) + opt->optlen;
891 893
892 dopt = kmalloc(opt_size, GFP_ATOMIC); 894 dopt = kmalloc(opt_size, GFP_ATOMIC);
893 if (dopt) { 895 if (dopt && __ip_options_echo(&dopt->opt, skb, opt)) {
894 if (ip_options_echo(&dopt->opt, skb)) { 896 kfree(dopt);
895 kfree(dopt); 897 dopt = NULL;
896 dopt = NULL;
897 }
898 } 898 }
899 } 899 }
900 return dopt; 900 return dopt;
@@ -1268,7 +1268,7 @@ struct request_sock_ops tcp_request_sock_ops __read_mostly = {
1268 .send_ack = tcp_v4_reqsk_send_ack, 1268 .send_ack = tcp_v4_reqsk_send_ack,
1269 .destructor = tcp_v4_reqsk_destructor, 1269 .destructor = tcp_v4_reqsk_destructor,
1270 .send_reset = tcp_v4_send_reset, 1270 .send_reset = tcp_v4_send_reset,
1271 .syn_ack_timeout = tcp_syn_ack_timeout, 1271 .syn_ack_timeout = tcp_syn_ack_timeout,
1272}; 1272};
1273 1273
1274static const struct tcp_request_sock_ops tcp_request_sock_ipv4_ops = { 1274static const struct tcp_request_sock_ops tcp_request_sock_ipv4_ops = {
@@ -1428,7 +1428,7 @@ static struct sock *tcp_v4_hnd_req(struct sock *sk, struct sk_buff *skb)
1428 1428
1429#ifdef CONFIG_SYN_COOKIES 1429#ifdef CONFIG_SYN_COOKIES
1430 if (!th->syn) 1430 if (!th->syn)
1431 sk = cookie_v4_check(sk, skb, &(IPCB(skb)->opt)); 1431 sk = cookie_v4_check(sk, skb, &TCP_SKB_CB(skb)->header.h4.opt);
1432#endif 1432#endif
1433 return sk; 1433 return sk;
1434} 1434}
@@ -1558,7 +1558,17 @@ bool tcp_prequeue(struct sock *sk, struct sk_buff *skb)
1558 skb_queue_len(&tp->ucopy.prequeue) == 0) 1558 skb_queue_len(&tp->ucopy.prequeue) == 0)
1559 return false; 1559 return false;
1560 1560
1561 skb_dst_force(skb); 1561 /* Before escaping the RCU-protected region, we need to take care of the
1562 * skb dst. Prequeue is only enabled for established sockets.
1563 * For such sockets, we might need the skb dst only to set sk->sk_rx_dst.
1564 * Instead of doing a full sk_rx_dst validity check here, let's perform
1565 * an optimistic check.
1566 */
1567 if (likely(sk->sk_rx_dst))
1568 skb_dst_drop(skb);
1569 else
1570 skb_dst_force(skb);
1571
1562 __skb_queue_tail(&tp->ucopy.prequeue, skb); 1572 __skb_queue_tail(&tp->ucopy.prequeue, skb);
1563 tp->ucopy.memory += skb->truesize; 1573 tp->ucopy.memory += skb->truesize;
1564 if (tp->ucopy.memory > sk->sk_rcvbuf) { 1574 if (tp->ucopy.memory > sk->sk_rcvbuf) {
@@ -1623,11 +1633,19 @@ int tcp_v4_rcv(struct sk_buff *skb)
1623 1633
1624 th = tcp_hdr(skb); 1634 th = tcp_hdr(skb);
1625 iph = ip_hdr(skb); 1635 iph = ip_hdr(skb);
1636 /* This is tricky: we move IPCB to its correct location inside TCP_SKB_CB().
1637 * barrier() makes sure the compiler won't play fool^Waliasing games.
1638 */
1639 memmove(&TCP_SKB_CB(skb)->header.h4, IPCB(skb),
1640 sizeof(struct inet_skb_parm));
1641 barrier();
1642
1626 TCP_SKB_CB(skb)->seq = ntohl(th->seq); 1643 TCP_SKB_CB(skb)->seq = ntohl(th->seq);
1627 TCP_SKB_CB(skb)->end_seq = (TCP_SKB_CB(skb)->seq + th->syn + th->fin + 1644 TCP_SKB_CB(skb)->end_seq = (TCP_SKB_CB(skb)->seq + th->syn + th->fin +
1628 skb->len - th->doff * 4); 1645 skb->len - th->doff * 4);
1629 TCP_SKB_CB(skb)->ack_seq = ntohl(th->ack_seq); 1646 TCP_SKB_CB(skb)->ack_seq = ntohl(th->ack_seq);
1630 TCP_SKB_CB(skb)->when = 0; 1647 TCP_SKB_CB(skb)->tcp_flags = tcp_flag_byte(th);
1648 TCP_SKB_CB(skb)->tcp_tw_isn = 0;
1631 TCP_SKB_CB(skb)->ip_dsfield = ipv4_get_dsfield(iph); 1649 TCP_SKB_CB(skb)->ip_dsfield = ipv4_get_dsfield(iph);
1632 TCP_SKB_CB(skb)->sacked = 0; 1650 TCP_SKB_CB(skb)->sacked = 0;
1633 1651
@@ -1754,9 +1772,11 @@ void inet_sk_rx_dst_set(struct sock *sk, const struct sk_buff *skb)
1754{ 1772{
1755 struct dst_entry *dst = skb_dst(skb); 1773 struct dst_entry *dst = skb_dst(skb);
1756 1774
1757 dst_hold(dst); 1775 if (dst) {
1758 sk->sk_rx_dst = dst; 1776 dst_hold(dst);
1759 inet_sk(sk)->rx_dst_ifindex = skb->skb_iif; 1777 sk->sk_rx_dst = dst;
1778 inet_sk(sk)->rx_dst_ifindex = skb->skb_iif;
1779 }
1760} 1780}
1761EXPORT_SYMBOL(inet_sk_rx_dst_set); 1781EXPORT_SYMBOL(inet_sk_rx_dst_set);
1762 1782
@@ -2167,7 +2187,7 @@ int tcp_seq_open(struct inode *inode, struct file *file)
2167 2187
2168 s = ((struct seq_file *)file->private_data)->private; 2188 s = ((struct seq_file *)file->private_data)->private;
2169 s->family = afinfo->family; 2189 s->family = afinfo->family;
2170 s->last_pos = 0; 2190 s->last_pos = 0;
2171 return 0; 2191 return 0;
2172} 2192}
2173EXPORT_SYMBOL(tcp_seq_open); 2193EXPORT_SYMBOL(tcp_seq_open);
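Editor's note: several hunks above depend on the IP options having been relocated from IPCB(skb) into TCP_SKB_CB(skb)->header.h4 in tcp_v4_rcv(). A rough sketch of that cb-relocation pattern, with deliberately simplified stand-in structs (the real tcp_skb_cb layout differs):

#include <string.h>

#define barrier() __asm__ __volatile__("" ::: "memory")  /* compiler barrier, as in the kernel */

struct ip_cb  { int opt[6]; };                                      /* stand-in for inet_skb_parm   */
struct tcp_cb { unsigned int seq, end_seq; struct ip_cb header; };  /* stand-in, not the real layout */

static void relocate_cb(char cb[48])
{
	struct tcp_cb *tcb = (struct tcp_cb *)cb;

	/* The IP layer left its control block at offset 0 of cb[]; move it to
	 * the slot the TCP view reserves for it. memmove() because the two
	 * regions may overlap, barrier() so the compiler cannot reuse loads
	 * made through the old aliased view across the move.
	 */
	memmove(&tcb->header, cb, sizeof(struct ip_cb));
	barrier();
	tcb->seq = 0;	/* from here on, only the TCP view of cb[] is written */
}

After this handoff, helpers such as tcp_v4_save_options() and the reset/ack replies read the saved options from TCP_SKB_CB(skb)->header.h4.opt rather than IPCB(skb)->opt, as the hunks above show.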
diff --git a/net/ipv4/tcp_minisocks.c b/net/ipv4/tcp_minisocks.c
index 1649988bd1b6..63d2680b65db 100644
--- a/net/ipv4/tcp_minisocks.c
+++ b/net/ipv4/tcp_minisocks.c
@@ -232,7 +232,7 @@ kill:
232 u32 isn = tcptw->tw_snd_nxt + 65535 + 2; 232 u32 isn = tcptw->tw_snd_nxt + 65535 + 2;
233 if (isn == 0) 233 if (isn == 0)
234 isn++; 234 isn++;
235 TCP_SKB_CB(skb)->when = isn; 235 TCP_SKB_CB(skb)->tcp_tw_isn = isn;
236 return TCP_TW_SYN; 236 return TCP_TW_SYN;
237 } 237 }
238 238
@@ -393,8 +393,8 @@ void tcp_openreq_init_rwin(struct request_sock *req,
393} 393}
394EXPORT_SYMBOL(tcp_openreq_init_rwin); 394EXPORT_SYMBOL(tcp_openreq_init_rwin);
395 395
396static inline void TCP_ECN_openreq_child(struct tcp_sock *tp, 396static void tcp_ecn_openreq_child(struct tcp_sock *tp,
397 struct request_sock *req) 397 const struct request_sock *req)
398{ 398{
399 tp->ecn_flags = inet_rsk(req)->ecn_ok ? TCP_ECN_OK : 0; 399 tp->ecn_flags = inet_rsk(req)->ecn_ok ? TCP_ECN_OK : 0;
400} 400}
@@ -451,9 +451,8 @@ struct sock *tcp_create_openreq_child(struct sock *sk, struct request_sock *req,
451 newtp->snd_cwnd = TCP_INIT_CWND; 451 newtp->snd_cwnd = TCP_INIT_CWND;
452 newtp->snd_cwnd_cnt = 0; 452 newtp->snd_cwnd_cnt = 0;
453 453
454 if (newicsk->icsk_ca_ops != &tcp_init_congestion_ops && 454 if (!try_module_get(newicsk->icsk_ca_ops->owner))
455 !try_module_get(newicsk->icsk_ca_ops->owner)) 455 tcp_assign_congestion_control(newsk);
456 newicsk->icsk_ca_ops = &tcp_init_congestion_ops;
457 456
458 tcp_set_ca_state(newsk, TCP_CA_Open); 457 tcp_set_ca_state(newsk, TCP_CA_Open);
459 tcp_init_xmit_timers(newsk); 458 tcp_init_xmit_timers(newsk);
@@ -508,7 +507,7 @@ struct sock *tcp_create_openreq_child(struct sock *sk, struct request_sock *req,
508 if (skb->len >= TCP_MSS_DEFAULT + newtp->tcp_header_len) 507 if (skb->len >= TCP_MSS_DEFAULT + newtp->tcp_header_len)
509 newicsk->icsk_ack.last_seg_size = skb->len - newtp->tcp_header_len; 508 newicsk->icsk_ack.last_seg_size = skb->len - newtp->tcp_header_len;
510 newtp->rx_opt.mss_clamp = req->mss; 509 newtp->rx_opt.mss_clamp = req->mss;
511 TCP_ECN_openreq_child(newtp, req); 510 tcp_ecn_openreq_child(newtp, req);
512 newtp->fastopen_rsk = NULL; 511 newtp->fastopen_rsk = NULL;
513 newtp->syn_data_acked = 0; 512 newtp->syn_data_acked = 0;
514 513
diff --git a/net/ipv4/tcp_offload.c b/net/ipv4/tcp_offload.c
index bc1b83cb8309..5b90f2f447a5 100644
--- a/net/ipv4/tcp_offload.c
+++ b/net/ipv4/tcp_offload.c
@@ -29,6 +29,28 @@ static void tcp_gso_tstamp(struct sk_buff *skb, unsigned int ts_seq,
29 } 29 }
30} 30}
31 31
32struct sk_buff *tcp4_gso_segment(struct sk_buff *skb,
33 netdev_features_t features)
34{
35 if (!pskb_may_pull(skb, sizeof(struct tcphdr)))
36 return ERR_PTR(-EINVAL);
37
38 if (unlikely(skb->ip_summed != CHECKSUM_PARTIAL)) {
39 const struct iphdr *iph = ip_hdr(skb);
40 struct tcphdr *th = tcp_hdr(skb);
41
42 /* Set up the checksum pseudo-header; normally the stack is expected to
43 * have done this already.
44 */
45
46 th->check = 0;
47 skb->ip_summed = CHECKSUM_PARTIAL;
48 __tcp_v4_send_check(skb, iph->saddr, iph->daddr);
49 }
50
51 return tcp_gso_segment(skb, features);
52}
53
32struct sk_buff *tcp_gso_segment(struct sk_buff *skb, 54struct sk_buff *tcp_gso_segment(struct sk_buff *skb,
33 netdev_features_t features) 55 netdev_features_t features)
34{ 56{
@@ -44,9 +66,6 @@ struct sk_buff *tcp_gso_segment(struct sk_buff *skb,
44 __sum16 newcheck; 66 __sum16 newcheck;
45 bool ooo_okay, copy_destructor; 67 bool ooo_okay, copy_destructor;
46 68
47 if (!pskb_may_pull(skb, sizeof(*th)))
48 goto out;
49
50 th = tcp_hdr(skb); 69 th = tcp_hdr(skb);
51 thlen = th->doff * 4; 70 thlen = th->doff * 4;
52 if (thlen < sizeof(*th)) 71 if (thlen < sizeof(*th))
@@ -269,54 +288,16 @@ int tcp_gro_complete(struct sk_buff *skb)
269} 288}
270EXPORT_SYMBOL(tcp_gro_complete); 289EXPORT_SYMBOL(tcp_gro_complete);
271 290
272static int tcp_v4_gso_send_check(struct sk_buff *skb)
273{
274 const struct iphdr *iph;
275 struct tcphdr *th;
276
277 if (!pskb_may_pull(skb, sizeof(*th)))
278 return -EINVAL;
279
280 iph = ip_hdr(skb);
281 th = tcp_hdr(skb);
282
283 th->check = 0;
284 skb->ip_summed = CHECKSUM_PARTIAL;
285 __tcp_v4_send_check(skb, iph->saddr, iph->daddr);
286 return 0;
287}
288
289static struct sk_buff **tcp4_gro_receive(struct sk_buff **head, struct sk_buff *skb) 291static struct sk_buff **tcp4_gro_receive(struct sk_buff **head, struct sk_buff *skb)
290{ 292{
291 /* Use the IP hdr immediately proceeding for this transport */
292 const struct iphdr *iph = skb_gro_network_header(skb);
293 __wsum wsum;
294
295 /* Don't bother verifying checksum if we're going to flush anyway. */ 293 /* Don't bother verifying checksum if we're going to flush anyway. */
296 if (NAPI_GRO_CB(skb)->flush) 294 if (!NAPI_GRO_CB(skb)->flush &&
297 goto skip_csum; 295 skb_gro_checksum_validate(skb, IPPROTO_TCP,
298 296 inet_gro_compute_pseudo)) {
299 wsum = NAPI_GRO_CB(skb)->csum;
300
301 switch (skb->ip_summed) {
302 case CHECKSUM_NONE:
303 wsum = skb_checksum(skb, skb_gro_offset(skb), skb_gro_len(skb),
304 0);
305
306 /* fall through */
307
308 case CHECKSUM_COMPLETE:
309 if (!tcp_v4_check(skb_gro_len(skb), iph->saddr, iph->daddr,
310 wsum)) {
311 skb->ip_summed = CHECKSUM_UNNECESSARY;
312 break;
313 }
314
315 NAPI_GRO_CB(skb)->flush = 1; 297 NAPI_GRO_CB(skb)->flush = 1;
316 return NULL; 298 return NULL;
317 } 299 }
318 300
319skip_csum:
320 return tcp_gro_receive(head, skb); 301 return tcp_gro_receive(head, skb);
321} 302}
322 303
@@ -334,8 +315,7 @@ static int tcp4_gro_complete(struct sk_buff *skb, int thoff)
334 315
335static const struct net_offload tcpv4_offload = { 316static const struct net_offload tcpv4_offload = {
336 .callbacks = { 317 .callbacks = {
337 .gso_send_check = tcp_v4_gso_send_check, 318 .gso_segment = tcp4_gso_segment,
338 .gso_segment = tcp_gso_segment,
339 .gro_receive = tcp4_gro_receive, 319 .gro_receive = tcp4_gro_receive,
340 .gro_complete = tcp4_gro_complete, 320 .gro_complete = tcp4_gro_complete,
341 }, 321 },
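Editor's note: tcp4_gso_segment() above only seeds th->check via __tcp_v4_send_check() when the skb arrives without CHECKSUM_PARTIAL; the payload portion is left to the checksum offload path. A small sketch of the pseudo-header arithmetic itself, on host-order values (the kernel works on network-order fields through csum_tcpudp_magic() and handles how the result is stored in th->check):

#include <stdint.h>

/* Ones-complement sum of the IPv4 TCP pseudo header (RFC 793 / RFC 1071):
 * source address, destination address, protocol and TCP length.
 */
static uint16_t tcp_pseudo_sum(uint32_t saddr, uint32_t daddr, uint32_t tcp_len)
{
	uint64_t sum = 0;

	sum += (saddr >> 16) + (saddr & 0xffff);
	sum += (daddr >> 16) + (daddr & 0xffff);
	sum += 6;                                 /* IPPROTO_TCP */
	sum += (tcp_len >> 16) + (tcp_len & 0xffff);
	while (sum >> 16)                         /* fold the carries */
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}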
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 5a7c41fbc6d3..8d4eac793700 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -318,36 +318,47 @@ static u16 tcp_select_window(struct sock *sk)
318} 318}
319 319
320/* Packet ECN state for a SYN-ACK */ 320/* Packet ECN state for a SYN-ACK */
321static inline void TCP_ECN_send_synack(const struct tcp_sock *tp, struct sk_buff *skb) 321static void tcp_ecn_send_synack(struct sock *sk, struct sk_buff *skb)
322{ 322{
323 const struct tcp_sock *tp = tcp_sk(sk);
324
323 TCP_SKB_CB(skb)->tcp_flags &= ~TCPHDR_CWR; 325 TCP_SKB_CB(skb)->tcp_flags &= ~TCPHDR_CWR;
324 if (!(tp->ecn_flags & TCP_ECN_OK)) 326 if (!(tp->ecn_flags & TCP_ECN_OK))
325 TCP_SKB_CB(skb)->tcp_flags &= ~TCPHDR_ECE; 327 TCP_SKB_CB(skb)->tcp_flags &= ~TCPHDR_ECE;
328 else if (tcp_ca_needs_ecn(sk))
329 INET_ECN_xmit(sk);
326} 330}
327 331
328/* Packet ECN state for a SYN. */ 332/* Packet ECN state for a SYN. */
329static inline void TCP_ECN_send_syn(struct sock *sk, struct sk_buff *skb) 333static void tcp_ecn_send_syn(struct sock *sk, struct sk_buff *skb)
330{ 334{
331 struct tcp_sock *tp = tcp_sk(sk); 335 struct tcp_sock *tp = tcp_sk(sk);
332 336
333 tp->ecn_flags = 0; 337 tp->ecn_flags = 0;
334 if (sock_net(sk)->ipv4.sysctl_tcp_ecn == 1) { 338 if (sock_net(sk)->ipv4.sysctl_tcp_ecn == 1 ||
339 tcp_ca_needs_ecn(sk)) {
335 TCP_SKB_CB(skb)->tcp_flags |= TCPHDR_ECE | TCPHDR_CWR; 340 TCP_SKB_CB(skb)->tcp_flags |= TCPHDR_ECE | TCPHDR_CWR;
336 tp->ecn_flags = TCP_ECN_OK; 341 tp->ecn_flags = TCP_ECN_OK;
342 if (tcp_ca_needs_ecn(sk))
343 INET_ECN_xmit(sk);
337 } 344 }
338} 345}
339 346
340static __inline__ void 347static void
341TCP_ECN_make_synack(const struct request_sock *req, struct tcphdr *th) 348tcp_ecn_make_synack(const struct request_sock *req, struct tcphdr *th,
349 struct sock *sk)
342{ 350{
343 if (inet_rsk(req)->ecn_ok) 351 if (inet_rsk(req)->ecn_ok) {
344 th->ece = 1; 352 th->ece = 1;
353 if (tcp_ca_needs_ecn(sk))
354 INET_ECN_xmit(sk);
355 }
345} 356}
346 357
347/* Set up ECN state for a packet on a ESTABLISHED socket that is about to 358/* Set up ECN state for a packet on a ESTABLISHED socket that is about to
348 * be sent. 359 * be sent.
349 */ 360 */
350static inline void TCP_ECN_send(struct sock *sk, struct sk_buff *skb, 361static void tcp_ecn_send(struct sock *sk, struct sk_buff *skb,
351 int tcp_header_len) 362 int tcp_header_len)
352{ 363{
353 struct tcp_sock *tp = tcp_sk(sk); 364 struct tcp_sock *tp = tcp_sk(sk);
@@ -362,7 +373,7 @@ static inline void TCP_ECN_send(struct sock *sk, struct sk_buff *skb,
362 tcp_hdr(skb)->cwr = 1; 373 tcp_hdr(skb)->cwr = 1;
363 skb_shinfo(skb)->gso_type |= SKB_GSO_TCP_ECN; 374 skb_shinfo(skb)->gso_type |= SKB_GSO_TCP_ECN;
364 } 375 }
365 } else { 376 } else if (!tcp_ca_needs_ecn(sk)) {
366 /* ACK or retransmitted segment: clear ECT|CE */ 377 /* ACK or retransmitted segment: clear ECT|CE */
367 INET_ECN_dontxmit(sk); 378 INET_ECN_dontxmit(sk);
368 } 379 }
@@ -384,7 +395,7 @@ static void tcp_init_nondata_skb(struct sk_buff *skb, u32 seq, u8 flags)
384 TCP_SKB_CB(skb)->tcp_flags = flags; 395 TCP_SKB_CB(skb)->tcp_flags = flags;
385 TCP_SKB_CB(skb)->sacked = 0; 396 TCP_SKB_CB(skb)->sacked = 0;
386 397
387 shinfo->gso_segs = 1; 398 tcp_skb_pcount_set(skb, 1);
388 shinfo->gso_size = 0; 399 shinfo->gso_size = 0;
389 shinfo->gso_type = 0; 400 shinfo->gso_type = 0;
390 401
@@ -550,7 +561,7 @@ static unsigned int tcp_syn_options(struct sock *sk, struct sk_buff *skb,
550 561
551 if (likely(sysctl_tcp_timestamps && *md5 == NULL)) { 562 if (likely(sysctl_tcp_timestamps && *md5 == NULL)) {
552 opts->options |= OPTION_TS; 563 opts->options |= OPTION_TS;
553 opts->tsval = TCP_SKB_CB(skb)->when + tp->tsoffset; 564 opts->tsval = tcp_skb_timestamp(skb) + tp->tsoffset;
554 opts->tsecr = tp->rx_opt.ts_recent; 565 opts->tsecr = tp->rx_opt.ts_recent;
555 remaining -= TCPOLEN_TSTAMP_ALIGNED; 566 remaining -= TCPOLEN_TSTAMP_ALIGNED;
556 } 567 }
@@ -618,7 +629,7 @@ static unsigned int tcp_synack_options(struct sock *sk,
618 } 629 }
619 if (likely(ireq->tstamp_ok)) { 630 if (likely(ireq->tstamp_ok)) {
620 opts->options |= OPTION_TS; 631 opts->options |= OPTION_TS;
621 opts->tsval = TCP_SKB_CB(skb)->when; 632 opts->tsval = tcp_skb_timestamp(skb);
622 opts->tsecr = req->ts_recent; 633 opts->tsecr = req->ts_recent;
623 remaining -= TCPOLEN_TSTAMP_ALIGNED; 634 remaining -= TCPOLEN_TSTAMP_ALIGNED;
624 } 635 }
@@ -647,7 +658,6 @@ static unsigned int tcp_established_options(struct sock *sk, struct sk_buff *skb
647 struct tcp_out_options *opts, 658 struct tcp_out_options *opts,
648 struct tcp_md5sig_key **md5) 659 struct tcp_md5sig_key **md5)
649{ 660{
650 struct tcp_skb_cb *tcb = skb ? TCP_SKB_CB(skb) : NULL;
651 struct tcp_sock *tp = tcp_sk(sk); 661 struct tcp_sock *tp = tcp_sk(sk);
652 unsigned int size = 0; 662 unsigned int size = 0;
653 unsigned int eff_sacks; 663 unsigned int eff_sacks;
@@ -666,7 +676,7 @@ static unsigned int tcp_established_options(struct sock *sk, struct sk_buff *skb
666 676
667 if (likely(tp->rx_opt.tstamp_ok)) { 677 if (likely(tp->rx_opt.tstamp_ok)) {
668 opts->options |= OPTION_TS; 678 opts->options |= OPTION_TS;
669 opts->tsval = tcb ? tcb->when + tp->tsoffset : 0; 679 opts->tsval = skb ? tcp_skb_timestamp(skb) + tp->tsoffset : 0;
670 opts->tsecr = tp->rx_opt.ts_recent; 680 opts->tsecr = tp->rx_opt.ts_recent;
671 size += TCPOLEN_TSTAMP_ALIGNED; 681 size += TCPOLEN_TSTAMP_ALIGNED;
672 } 682 }
@@ -886,8 +896,6 @@ static int tcp_transmit_skb(struct sock *sk, struct sk_buff *skb, int clone_it,
886 skb = skb_clone(skb, gfp_mask); 896 skb = skb_clone(skb, gfp_mask);
887 if (unlikely(!skb)) 897 if (unlikely(!skb))
888 return -ENOBUFS; 898 return -ENOBUFS;
889 /* Our usage of tstamp should remain private */
890 skb->tstamp.tv64 = 0;
891 } 899 }
892 900
893 inet = inet_sk(sk); 901 inet = inet_sk(sk);
@@ -952,7 +960,7 @@ static int tcp_transmit_skb(struct sock *sk, struct sk_buff *skb, int clone_it,
952 960
953 tcp_options_write((__be32 *)(th + 1), tp, &opts); 961 tcp_options_write((__be32 *)(th + 1), tp, &opts);
954 if (likely((tcb->tcp_flags & TCPHDR_SYN) == 0)) 962 if (likely((tcb->tcp_flags & TCPHDR_SYN) == 0))
955 TCP_ECN_send(sk, skb, tcp_header_size); 963 tcp_ecn_send(sk, skb, tcp_header_size);
956 964
957#ifdef CONFIG_TCP_MD5SIG 965#ifdef CONFIG_TCP_MD5SIG
958 /* Calculate the MD5 hash, as we have all we need now */ 966 /* Calculate the MD5 hash, as we have all we need now */
@@ -975,7 +983,18 @@ static int tcp_transmit_skb(struct sock *sk, struct sk_buff *skb, int clone_it,
975 TCP_ADD_STATS(sock_net(sk), TCP_MIB_OUTSEGS, 983 TCP_ADD_STATS(sock_net(sk), TCP_MIB_OUTSEGS,
976 tcp_skb_pcount(skb)); 984 tcp_skb_pcount(skb));
977 985
986 /* OK, it's time to fill skb_shinfo(skb)->gso_segs */
987 skb_shinfo(skb)->gso_segs = tcp_skb_pcount(skb);
988
989 /* Our usage of tstamp should remain private */
990 skb->tstamp.tv64 = 0;
991
992 /* Cleanup our debris for IP stacks */
993 memset(skb->cb, 0, max(sizeof(struct inet_skb_parm),
994 sizeof(struct inet6_skb_parm)));
995
978 err = icsk->icsk_af_ops->queue_xmit(sk, skb, &inet->cork.fl); 996 err = icsk->icsk_af_ops->queue_xmit(sk, skb, &inet->cork.fl);
997
979 if (likely(err <= 0)) 998 if (likely(err <= 0))
980 return err; 999 return err;
981 1000
@@ -995,7 +1014,7 @@ static void tcp_queue_skb(struct sock *sk, struct sk_buff *skb)
995 1014
996 /* Advance write_seq and place onto the write_queue. */ 1015 /* Advance write_seq and place onto the write_queue. */
997 tp->write_seq = TCP_SKB_CB(skb)->end_seq; 1016 tp->write_seq = TCP_SKB_CB(skb)->end_seq;
998 skb_header_release(skb); 1017 __skb_header_release(skb);
999 tcp_add_write_queue_tail(sk, skb); 1018 tcp_add_write_queue_tail(sk, skb);
1000 sk->sk_wmem_queued += skb->truesize; 1019 sk->sk_wmem_queued += skb->truesize;
1001 sk_mem_charge(sk, skb->truesize); 1020 sk_mem_charge(sk, skb->truesize);
@@ -1014,11 +1033,11 @@ static void tcp_set_skb_tso_segs(const struct sock *sk, struct sk_buff *skb,
1014 /* Avoid the costly divide in the normal 1033 /* Avoid the costly divide in the normal
1015 * non-TSO case. 1034 * non-TSO case.
1016 */ 1035 */
1017 shinfo->gso_segs = 1; 1036 tcp_skb_pcount_set(skb, 1);
1018 shinfo->gso_size = 0; 1037 shinfo->gso_size = 0;
1019 shinfo->gso_type = 0; 1038 shinfo->gso_type = 0;
1020 } else { 1039 } else {
1021 shinfo->gso_segs = DIV_ROUND_UP(skb->len, mss_now); 1040 tcp_skb_pcount_set(skb, DIV_ROUND_UP(skb->len, mss_now));
1022 shinfo->gso_size = mss_now; 1041 shinfo->gso_size = mss_now;
1023 shinfo->gso_type = sk->sk_gso_type; 1042 shinfo->gso_type = sk->sk_gso_type;
1024 } 1043 }
@@ -1146,10 +1165,6 @@ int tcp_fragment(struct sock *sk, struct sk_buff *skb, u32 len,
1146 1165
1147 buff->ip_summed = skb->ip_summed; 1166 buff->ip_summed = skb->ip_summed;
1148 1167
1149 /* Looks stupid, but our code really uses when of
1150 * skbs, which it never sent before. --ANK
1151 */
1152 TCP_SKB_CB(buff)->when = TCP_SKB_CB(skb)->when;
1153 buff->tstamp = skb->tstamp; 1168 buff->tstamp = skb->tstamp;
1154 tcp_fragment_tstamp(skb, buff); 1169 tcp_fragment_tstamp(skb, buff);
1155 1170
@@ -1171,7 +1186,7 @@ int tcp_fragment(struct sock *sk, struct sk_buff *skb, u32 len,
1171 } 1186 }
1172 1187
1173 /* Link BUFF into the send queue. */ 1188 /* Link BUFF into the send queue. */
1174 skb_header_release(buff); 1189 __skb_header_release(buff);
1175 tcp_insert_write_queue_after(skb, buff, sk); 1190 tcp_insert_write_queue_after(skb, buff, sk);
1176 1191
1177 return 0; 1192 return 0;
@@ -1675,7 +1690,7 @@ static int tso_fragment(struct sock *sk, struct sk_buff *skb, unsigned int len,
1675 tcp_set_skb_tso_segs(sk, buff, mss_now); 1690 tcp_set_skb_tso_segs(sk, buff, mss_now);
1676 1691
1677 /* Link BUFF into the send queue. */ 1692 /* Link BUFF into the send queue. */
1678 skb_header_release(buff); 1693 __skb_header_release(buff);
1679 tcp_insert_write_queue_after(skb, buff, sk); 1694 tcp_insert_write_queue_after(skb, buff, sk);
1680 1695
1681 return 0; 1696 return 0;
@@ -1874,8 +1889,8 @@ static int tcp_mtu_probe(struct sock *sk)
1874 tcp_init_tso_segs(sk, nskb, nskb->len); 1889 tcp_init_tso_segs(sk, nskb, nskb->len);
1875 1890
1876 /* We're ready to send. If this fails, the probe will 1891 /* We're ready to send. If this fails, the probe will
1877 * be resegmented into mss-sized pieces by tcp_write_xmit(). */ 1892 * be resegmented into mss-sized pieces by tcp_write_xmit().
1878 TCP_SKB_CB(nskb)->when = tcp_time_stamp; 1893 */
1879 if (!tcp_transmit_skb(sk, nskb, 1, GFP_ATOMIC)) { 1894 if (!tcp_transmit_skb(sk, nskb, 1, GFP_ATOMIC)) {
1880 /* Decrement cwnd here because we are sending 1895 /* Decrement cwnd here because we are sending
1881 * effectively two packets. */ 1896 * effectively two packets. */
@@ -1935,8 +1950,8 @@ static bool tcp_write_xmit(struct sock *sk, unsigned int mss_now, int nonagle,
1935 BUG_ON(!tso_segs); 1950 BUG_ON(!tso_segs);
1936 1951
1937 if (unlikely(tp->repair) && tp->repair_queue == TCP_SEND_QUEUE) { 1952 if (unlikely(tp->repair) && tp->repair_queue == TCP_SEND_QUEUE) {
1938 /* "when" is used as a start point for the retransmit timer */ 1953 /* "skb_mstamp" is used as a start point for the retransmit timer */
1939 TCP_SKB_CB(skb)->when = tcp_time_stamp; 1954 skb_mstamp_get(&skb->skb_mstamp);
1940 goto repair; /* Skip network transmission */ 1955 goto repair; /* Skip network transmission */
1941 } 1956 }
1942 1957
@@ -2000,8 +2015,6 @@ static bool tcp_write_xmit(struct sock *sk, unsigned int mss_now, int nonagle,
2000 unlikely(tso_fragment(sk, skb, limit, mss_now, gfp))) 2015 unlikely(tso_fragment(sk, skb, limit, mss_now, gfp)))
2001 break; 2016 break;
2002 2017
2003 TCP_SKB_CB(skb)->when = tcp_time_stamp;
2004
2005 if (unlikely(tcp_transmit_skb(sk, skb, 1, gfp))) 2018 if (unlikely(tcp_transmit_skb(sk, skb, 1, gfp)))
2006 break; 2019 break;
2007 2020
@@ -2097,10 +2110,7 @@ bool tcp_schedule_loss_probe(struct sock *sk)
2097static bool skb_still_in_host_queue(const struct sock *sk, 2110static bool skb_still_in_host_queue(const struct sock *sk,
2098 const struct sk_buff *skb) 2111 const struct sk_buff *skb)
2099{ 2112{
2100 const struct sk_buff *fclone = skb + 1; 2113 if (unlikely(skb_fclone_busy(skb))) {
2101
2102 if (unlikely(skb->fclone == SKB_FCLONE_ORIG &&
2103 fclone->fclone == SKB_FCLONE_CLONE)) {
2104 NET_INC_STATS_BH(sock_net(sk), 2114 NET_INC_STATS_BH(sock_net(sk),
2105 LINUX_MIB_TCPSPURIOUS_RTX_HOSTQUEUES); 2115 LINUX_MIB_TCPSPURIOUS_RTX_HOSTQUEUES);
2106 return true; 2116 return true;
@@ -2499,7 +2509,6 @@ int __tcp_retransmit_skb(struct sock *sk, struct sk_buff *skb)
2499 /* Make a copy, if the first transmission SKB clone we made 2509 /* Make a copy, if the first transmission SKB clone we made
2500 * is still in somebody's hands, else make a clone. 2510 * is still in somebody's hands, else make a clone.
2501 */ 2511 */
2502 TCP_SKB_CB(skb)->when = tcp_time_stamp;
2503 2512
2504 /* make sure skb->data is aligned on arches that require it 2513 /* make sure skb->data is aligned on arches that require it
2505 * and check if ack-trimming & collapsing extended the headroom 2514 * and check if ack-trimming & collapsing extended the headroom
@@ -2544,7 +2553,7 @@ int tcp_retransmit_skb(struct sock *sk, struct sk_buff *skb)
2544 2553
2545 /* Save stamp of the first retransmit. */ 2554 /* Save stamp of the first retransmit. */
2546 if (!tp->retrans_stamp) 2555 if (!tp->retrans_stamp)
2547 tp->retrans_stamp = TCP_SKB_CB(skb)->when; 2556 tp->retrans_stamp = tcp_skb_timestamp(skb);
2548 2557
2549 /* snd_nxt is stored to detect loss of retransmitted segment, 2558 /* snd_nxt is stored to detect loss of retransmitted segment,
2550 * see tcp_input.c tcp_sacktag_write_queue(). 2559 * see tcp_input.c tcp_sacktag_write_queue().
@@ -2752,7 +2761,6 @@ void tcp_send_active_reset(struct sock *sk, gfp_t priority)
2752 tcp_init_nondata_skb(skb, tcp_acceptable_seq(sk), 2761 tcp_init_nondata_skb(skb, tcp_acceptable_seq(sk),
2753 TCPHDR_ACK | TCPHDR_RST); 2762 TCPHDR_ACK | TCPHDR_RST);
2754 /* Send it off. */ 2763 /* Send it off. */
2755 TCP_SKB_CB(skb)->when = tcp_time_stamp;
2756 if (tcp_transmit_skb(sk, skb, 0, priority)) 2764 if (tcp_transmit_skb(sk, skb, 0, priority))
2757 NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPABORTFAILED); 2765 NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPABORTFAILED);
2758 2766
@@ -2780,7 +2788,7 @@ int tcp_send_synack(struct sock *sk)
2780 if (nskb == NULL) 2788 if (nskb == NULL)
2781 return -ENOMEM; 2789 return -ENOMEM;
2782 tcp_unlink_write_queue(skb, sk); 2790 tcp_unlink_write_queue(skb, sk);
2783 skb_header_release(nskb); 2791 __skb_header_release(nskb);
2784 __tcp_add_write_queue_head(sk, nskb); 2792 __tcp_add_write_queue_head(sk, nskb);
2785 sk_wmem_free_skb(sk, skb); 2793 sk_wmem_free_skb(sk, skb);
2786 sk->sk_wmem_queued += nskb->truesize; 2794 sk->sk_wmem_queued += nskb->truesize;
@@ -2789,9 +2797,8 @@ int tcp_send_synack(struct sock *sk)
2789 } 2797 }
2790 2798
2791 TCP_SKB_CB(skb)->tcp_flags |= TCPHDR_ACK; 2799 TCP_SKB_CB(skb)->tcp_flags |= TCPHDR_ACK;
2792 TCP_ECN_send_synack(tcp_sk(sk), skb); 2800 tcp_ecn_send_synack(sk, skb);
2793 } 2801 }
2794 TCP_SKB_CB(skb)->when = tcp_time_stamp;
2795 return tcp_transmit_skb(sk, skb, 1, GFP_ATOMIC); 2802 return tcp_transmit_skb(sk, skb, 1, GFP_ATOMIC);
2796} 2803}
2797 2804
@@ -2835,10 +2842,10 @@ struct sk_buff *tcp_make_synack(struct sock *sk, struct dst_entry *dst,
2835 memset(&opts, 0, sizeof(opts)); 2842 memset(&opts, 0, sizeof(opts));
2836#ifdef CONFIG_SYN_COOKIES 2843#ifdef CONFIG_SYN_COOKIES
2837 if (unlikely(req->cookie_ts)) 2844 if (unlikely(req->cookie_ts))
2838 TCP_SKB_CB(skb)->when = cookie_init_timestamp(req); 2845 skb->skb_mstamp.stamp_jiffies = cookie_init_timestamp(req);
2839 else 2846 else
2840#endif 2847#endif
2841 TCP_SKB_CB(skb)->when = tcp_time_stamp; 2848 skb_mstamp_get(&skb->skb_mstamp);
2842 tcp_header_size = tcp_synack_options(sk, req, mss, skb, &opts, &md5, 2849 tcp_header_size = tcp_synack_options(sk, req, mss, skb, &opts, &md5,
2843 foc) + sizeof(*th); 2850 foc) + sizeof(*th);
2844 2851
@@ -2849,7 +2856,7 @@ struct sk_buff *tcp_make_synack(struct sock *sk, struct dst_entry *dst,
2849 memset(th, 0, sizeof(struct tcphdr)); 2856 memset(th, 0, sizeof(struct tcphdr));
2850 th->syn = 1; 2857 th->syn = 1;
2851 th->ack = 1; 2858 th->ack = 1;
2852 TCP_ECN_make_synack(req, th); 2859 tcp_ecn_make_synack(req, th, sk);
2853 th->source = htons(ireq->ir_num); 2860 th->source = htons(ireq->ir_num);
2854 th->dest = ireq->ir_rmt_port; 2861 th->dest = ireq->ir_rmt_port;
2855 /* Setting of flags are superfluous here for callers (and ECE is 2862 /* Setting of flags are superfluous here for callers (and ECE is
@@ -2956,7 +2963,7 @@ static void tcp_connect_queue_skb(struct sock *sk, struct sk_buff *skb)
2956 struct tcp_skb_cb *tcb = TCP_SKB_CB(skb); 2963 struct tcp_skb_cb *tcb = TCP_SKB_CB(skb);
2957 2964
2958 tcb->end_seq += skb->len; 2965 tcb->end_seq += skb->len;
2959 skb_header_release(skb); 2966 __skb_header_release(skb);
2960 __tcp_add_write_queue_tail(sk, skb); 2967 __tcp_add_write_queue_tail(sk, skb);
2961 sk->sk_wmem_queued += skb->truesize; 2968 sk->sk_wmem_queued += skb->truesize;
2962 sk_mem_charge(sk, skb->truesize); 2969 sk_mem_charge(sk, skb->truesize);
@@ -3086,9 +3093,9 @@ int tcp_connect(struct sock *sk)
3086 skb_reserve(buff, MAX_TCP_HEADER); 3093 skb_reserve(buff, MAX_TCP_HEADER);
3087 3094
3088 tcp_init_nondata_skb(buff, tp->write_seq++, TCPHDR_SYN); 3095 tcp_init_nondata_skb(buff, tp->write_seq++, TCPHDR_SYN);
3089 tp->retrans_stamp = TCP_SKB_CB(buff)->when = tcp_time_stamp; 3096 tp->retrans_stamp = tcp_time_stamp;
3090 tcp_connect_queue_skb(sk, buff); 3097 tcp_connect_queue_skb(sk, buff);
3091 TCP_ECN_send_syn(sk, buff); 3098 tcp_ecn_send_syn(sk, buff);
3092 3099
3093 /* Send off SYN; include data in Fast Open. */ 3100 /* Send off SYN; include data in Fast Open. */
3094 err = tp->fastopen_req ? tcp_send_syn_data(sk, buff) : 3101 err = tp->fastopen_req ? tcp_send_syn_data(sk, buff) :
@@ -3120,6 +3127,8 @@ void tcp_send_delayed_ack(struct sock *sk)
3120 int ato = icsk->icsk_ack.ato; 3127 int ato = icsk->icsk_ack.ato;
3121 unsigned long timeout; 3128 unsigned long timeout;
3122 3129
3130 tcp_ca_event(sk, CA_EVENT_DELAYED_ACK);
3131
3123 if (ato > TCP_DELACK_MIN) { 3132 if (ato > TCP_DELACK_MIN) {
3124 const struct tcp_sock *tp = tcp_sk(sk); 3133 const struct tcp_sock *tp = tcp_sk(sk);
3125 int max_ato = HZ / 2; 3134 int max_ato = HZ / 2;
@@ -3176,6 +3185,8 @@ void tcp_send_ack(struct sock *sk)
3176 if (sk->sk_state == TCP_CLOSE) 3185 if (sk->sk_state == TCP_CLOSE)
3177 return; 3186 return;
3178 3187
3188 tcp_ca_event(sk, CA_EVENT_NON_DELAYED_ACK);
3189
3179 /* We are not putting this on the write queue, so 3190 /* We are not putting this on the write queue, so
3180 * tcp_transmit_skb() will set the ownership to this 3191 * tcp_transmit_skb() will set the ownership to this
3181 * sock. 3192 * sock.
@@ -3194,9 +3205,10 @@ void tcp_send_ack(struct sock *sk)
3194 tcp_init_nondata_skb(buff, tcp_acceptable_seq(sk), TCPHDR_ACK); 3205 tcp_init_nondata_skb(buff, tcp_acceptable_seq(sk), TCPHDR_ACK);
3195 3206
3196 /* Send it off, this clears delayed acks for us. */ 3207 /* Send it off, this clears delayed acks for us. */
3197 TCP_SKB_CB(buff)->when = tcp_time_stamp; 3208 skb_mstamp_get(&buff->skb_mstamp);
3198 tcp_transmit_skb(sk, buff, 0, sk_gfp_atomic(sk, GFP_ATOMIC)); 3209 tcp_transmit_skb(sk, buff, 0, sk_gfp_atomic(sk, GFP_ATOMIC));
3199} 3210}
3211EXPORT_SYMBOL_GPL(tcp_send_ack);
3200 3212
3201/* This routine sends a packet with an out of date sequence 3213/* This routine sends a packet with an out of date sequence
3202 * number. It assumes the other end will try to ack it. 3214 * number. It assumes the other end will try to ack it.
@@ -3226,7 +3238,7 @@ static int tcp_xmit_probe_skb(struct sock *sk, int urgent)
3226 * send it. 3238 * send it.
3227 */ 3239 */
3228 tcp_init_nondata_skb(skb, tp->snd_una - !urgent, TCPHDR_ACK); 3240 tcp_init_nondata_skb(skb, tp->snd_una - !urgent, TCPHDR_ACK);
3229 TCP_SKB_CB(skb)->when = tcp_time_stamp; 3241 skb_mstamp_get(&skb->skb_mstamp);
3230 return tcp_transmit_skb(sk, skb, 0, GFP_ATOMIC); 3242 return tcp_transmit_skb(sk, skb, 0, GFP_ATOMIC);
3231} 3243}
3232 3244
@@ -3270,7 +3282,6 @@ int tcp_write_wakeup(struct sock *sk)
3270 tcp_set_skb_tso_segs(sk, skb, mss); 3282 tcp_set_skb_tso_segs(sk, skb, mss);
3271 3283
3272 TCP_SKB_CB(skb)->tcp_flags |= TCPHDR_PSH; 3284 TCP_SKB_CB(skb)->tcp_flags |= TCPHDR_PSH;
3273 TCP_SKB_CB(skb)->when = tcp_time_stamp;
3274 err = tcp_transmit_skb(sk, skb, 1, GFP_ATOMIC); 3285 err = tcp_transmit_skb(sk, skb, 1, GFP_ATOMIC);
3275 if (!err) 3286 if (!err)
3276 tcp_event_new_data_sent(sk, skb); 3287 tcp_event_new_data_sent(sk, skb);
@@ -3289,6 +3300,7 @@ void tcp_send_probe0(struct sock *sk)
3289{ 3300{
3290 struct inet_connection_sock *icsk = inet_csk(sk); 3301 struct inet_connection_sock *icsk = inet_csk(sk);
3291 struct tcp_sock *tp = tcp_sk(sk); 3302 struct tcp_sock *tp = tcp_sk(sk);
3303 unsigned long probe_max;
3292 int err; 3304 int err;
3293 3305
3294 err = tcp_write_wakeup(sk); 3306 err = tcp_write_wakeup(sk);
@@ -3304,9 +3316,7 @@ void tcp_send_probe0(struct sock *sk)
3304 if (icsk->icsk_backoff < sysctl_tcp_retries2) 3316 if (icsk->icsk_backoff < sysctl_tcp_retries2)
3305 icsk->icsk_backoff++; 3317 icsk->icsk_backoff++;
3306 icsk->icsk_probes_out++; 3318 icsk->icsk_probes_out++;
3307 inet_csk_reset_xmit_timer(sk, ICSK_TIME_PROBE0, 3319 probe_max = TCP_RTO_MAX;
3308 min(icsk->icsk_rto << icsk->icsk_backoff, TCP_RTO_MAX),
3309 TCP_RTO_MAX);
3310 } else { 3320 } else {
3311 /* If packet was not sent due to local congestion, 3321 /* If packet was not sent due to local congestion,
3312 * do not backoff and do not remember icsk_probes_out. 3322 * do not backoff and do not remember icsk_probes_out.
@@ -3316,11 +3326,11 @@ void tcp_send_probe0(struct sock *sk)
3316 */ 3326 */
3317 if (!icsk->icsk_probes_out) 3327 if (!icsk->icsk_probes_out)
3318 icsk->icsk_probes_out = 1; 3328 icsk->icsk_probes_out = 1;
3319 inet_csk_reset_xmit_timer(sk, ICSK_TIME_PROBE0, 3329 probe_max = TCP_RESOURCE_PROBE_INTERVAL;
3320 min(icsk->icsk_rto << icsk->icsk_backoff,
3321 TCP_RESOURCE_PROBE_INTERVAL),
3322 TCP_RTO_MAX);
3323 } 3330 }
3331 inet_csk_reset_xmit_timer(sk, ICSK_TIME_PROBE0,
3332 inet_csk_rto_backoff(icsk, probe_max),
3333 TCP_RTO_MAX);
3324} 3334}
3325 3335
3326int tcp_rtx_synack(struct sock *sk, struct request_sock *req) 3336int tcp_rtx_synack(struct sock *sk, struct request_sock *req)
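Editor's note: both tcp_v4_err() and tcp_send_probe0() above now delegate the "shift the RTO by the backoff count, but never past a cap" computation to inet_csk_rto_backoff(). A sketch of that clamp, assuming the helper behaves as these call sites suggest (the real helper takes the inet_connection_sock rather than raw values):

/* Exponential backoff with a cap; the 64-bit intermediate keeps a large
 * backoff count from overflowing the shift before the clamp applies.
 * Callers keep backoff well below 64.
 */
static unsigned long rto_backoff(unsigned long rto, unsigned int backoff,
				 unsigned long max_when)
{
	unsigned long long when = (unsigned long long)rto << backoff;

	return when > max_when ? max_when : (unsigned long)when;
}

tcp_send_probe0() then only chooses the cap: TCP_RTO_MAX after a probe was sent, TCP_RESOURCE_PROBE_INTERVAL when the probe could not be sent due to local congestion.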
diff --git a/net/ipv4/tcp_probe.c b/net/ipv4/tcp_probe.c
index 3b66610d4156..ebf5ff57526e 100644
--- a/net/ipv4/tcp_probe.c
+++ b/net/ipv4/tcp_probe.c
@@ -83,7 +83,6 @@ static struct {
83 struct tcp_log *log; 83 struct tcp_log *log;
84} tcp_probe; 84} tcp_probe;
85 85
86
87static inline int tcp_probe_used(void) 86static inline int tcp_probe_used(void)
88{ 87{
89 return (tcp_probe.head - tcp_probe.tail) & (bufsize - 1); 88 return (tcp_probe.head - tcp_probe.tail) & (bufsize - 1);
@@ -101,7 +100,6 @@ static inline int tcp_probe_avail(void)
101 si4.sin_addr.s_addr = inet->inet_##mem##addr; \ 100 si4.sin_addr.s_addr = inet->inet_##mem##addr; \
102 } while (0) \ 101 } while (0) \
103 102
104
105/* 103/*
106 * Hook inserted to be called before each receive packet. 104 * Hook inserted to be called before each receive packet.
107 * Note: arguments must match tcp_rcv_established()! 105 * Note: arguments must match tcp_rcv_established()!
@@ -194,8 +192,8 @@ static int tcpprobe_sprint(char *tbuf, int n)
194 192
195 return scnprintf(tbuf, n, 193 return scnprintf(tbuf, n,
196 "%lu.%09lu %pISpc %pISpc %d %#x %#x %u %u %u %u %u\n", 194 "%lu.%09lu %pISpc %pISpc %d %#x %#x %u %u %u %u %u\n",
197 (unsigned long) tv.tv_sec, 195 (unsigned long)tv.tv_sec,
198 (unsigned long) tv.tv_nsec, 196 (unsigned long)tv.tv_nsec,
199 &p->src, &p->dst, p->length, p->snd_nxt, p->snd_una, 197 &p->src, &p->dst, p->length, p->snd_nxt, p->snd_una,
200 p->snd_cwnd, p->ssthresh, p->snd_wnd, p->srtt, p->rcv_wnd); 198 p->snd_cwnd, p->ssthresh, p->snd_wnd, p->srtt, p->rcv_wnd);
201} 199}
diff --git a/net/ipv4/tcp_scalable.c b/net/ipv4/tcp_scalable.c
index 8250949b8853..6824afb65d93 100644
--- a/net/ipv4/tcp_scalable.c
+++ b/net/ipv4/tcp_scalable.c
@@ -31,10 +31,10 @@ static void tcp_scalable_cong_avoid(struct sock *sk, u32 ack, u32 acked)
31static u32 tcp_scalable_ssthresh(struct sock *sk) 31static u32 tcp_scalable_ssthresh(struct sock *sk)
32{ 32{
33 const struct tcp_sock *tp = tcp_sk(sk); 33 const struct tcp_sock *tp = tcp_sk(sk);
34
34 return max(tp->snd_cwnd - (tp->snd_cwnd>>TCP_SCALABLE_MD_SCALE), 2U); 35 return max(tp->snd_cwnd - (tp->snd_cwnd>>TCP_SCALABLE_MD_SCALE), 2U);
35} 36}
36 37
37
38static struct tcp_congestion_ops tcp_scalable __read_mostly = { 38static struct tcp_congestion_ops tcp_scalable __read_mostly = {
39 .ssthresh = tcp_scalable_ssthresh, 39 .ssthresh = tcp_scalable_ssthresh,
40 .cong_avoid = tcp_scalable_cong_avoid, 40 .cong_avoid = tcp_scalable_cong_avoid,
diff --git a/net/ipv4/tcp_timer.c b/net/ipv4/tcp_timer.c
index df90cd1ce37f..9b21ae8b2e31 100644
--- a/net/ipv4/tcp_timer.c
+++ b/net/ipv4/tcp_timer.c
@@ -52,7 +52,7 @@ static void tcp_write_err(struct sock *sk)
52 * limit. 52 * limit.
53 * 2. If we have strong memory pressure. 53 * 2. If we have strong memory pressure.
54 */ 54 */
55static int tcp_out_of_resources(struct sock *sk, int do_reset) 55static int tcp_out_of_resources(struct sock *sk, bool do_reset)
56{ 56{
57 struct tcp_sock *tp = tcp_sk(sk); 57 struct tcp_sock *tp = tcp_sk(sk);
58 int shift = 0; 58 int shift = 0;
@@ -72,7 +72,7 @@ static int tcp_out_of_resources(struct sock *sk, int do_reset)
72 if ((s32)(tcp_time_stamp - tp->lsndtime) <= TCP_TIMEWAIT_LEN || 72 if ((s32)(tcp_time_stamp - tp->lsndtime) <= TCP_TIMEWAIT_LEN ||
73 /* 2. Window is closed. */ 73 /* 2. Window is closed. */
74 (!tp->snd_wnd && !tp->packets_out)) 74 (!tp->snd_wnd && !tp->packets_out))
75 do_reset = 1; 75 do_reset = true;
76 if (do_reset) 76 if (do_reset)
77 tcp_send_active_reset(sk, GFP_ATOMIC); 77 tcp_send_active_reset(sk, GFP_ATOMIC);
78 tcp_done(sk); 78 tcp_done(sk);
@@ -135,10 +135,9 @@ static bool retransmits_timed_out(struct sock *sk,
135 if (!inet_csk(sk)->icsk_retransmits) 135 if (!inet_csk(sk)->icsk_retransmits)
136 return false; 136 return false;
137 137
138 if (unlikely(!tcp_sk(sk)->retrans_stamp)) 138 start_ts = tcp_sk(sk)->retrans_stamp;
139 start_ts = TCP_SKB_CB(tcp_write_queue_head(sk))->when; 139 if (unlikely(!start_ts))
140 else 140 start_ts = tcp_skb_timestamp(tcp_write_queue_head(sk));
141 start_ts = tcp_sk(sk)->retrans_stamp;
142 141
143 if (likely(timeout == 0)) { 142 if (likely(timeout == 0)) {
144 linear_backoff_thresh = ilog2(TCP_RTO_MAX/rto_base); 143 linear_backoff_thresh = ilog2(TCP_RTO_MAX/rto_base);
@@ -181,7 +180,7 @@ static int tcp_write_timeout(struct sock *sk)
181 180
182 retry_until = sysctl_tcp_retries2; 181 retry_until = sysctl_tcp_retries2;
183 if (sock_flag(sk, SOCK_DEAD)) { 182 if (sock_flag(sk, SOCK_DEAD)) {
184 const int alive = (icsk->icsk_rto < TCP_RTO_MAX); 183 const int alive = icsk->icsk_rto < TCP_RTO_MAX;
185 184
186 retry_until = tcp_orphan_retries(sk, alive); 185 retry_until = tcp_orphan_retries(sk, alive);
187 do_reset = alive || 186 do_reset = alive ||
@@ -271,40 +270,41 @@ static void tcp_probe_timer(struct sock *sk)
271 struct inet_connection_sock *icsk = inet_csk(sk); 270 struct inet_connection_sock *icsk = inet_csk(sk);
272 struct tcp_sock *tp = tcp_sk(sk); 271 struct tcp_sock *tp = tcp_sk(sk);
273 int max_probes; 272 int max_probes;
273 u32 start_ts;
274 274
275 if (tp->packets_out || !tcp_send_head(sk)) { 275 if (tp->packets_out || !tcp_send_head(sk)) {
276 icsk->icsk_probes_out = 0; 276 icsk->icsk_probes_out = 0;
277 return; 277 return;
278 } 278 }
279 279
280 /* *WARNING* RFC 1122 forbids this 280 /* RFC 1122 4.2.2.17 requires the sender to stay open indefinitely as
281 * 281 * long as the receiver continues to respond to probes. We support this by
282 * It doesn't AFAIK, because we kill the retransmit timer -AK 282 * default and reset icsk_probes_out with incoming ACKs. But if the
283 * 283 * socket is orphaned or the user specifies TCP_USER_TIMEOUT, we
284 * FIXME: We ought not to do it, Solaris 2.5 actually has fixing 284 * kill the socket when the retry count and the time exceed the
285 * this behaviour in Solaris down as a bug fix. [AC] 285 * corresponding system limit. We also implement a similar policy when
286 * 286 * we use RTO to probe the window in tcp_retransmit_timer().
287 * Let me to explain. icsk_probes_out is zeroed by incoming ACKs
288 * even if they advertise zero window. Hence, connection is killed only
289 * if we received no ACKs for normal connection timeout. It is not killed
290 * only because window stays zero for some time, window may be zero
291 * until armageddon and even later. We are in full accordance
292 * with RFCs, only probe timer combines both retransmission timeout
293 * and probe timeout in one bottle. --ANK
294 */ 287 */
295 max_probes = sysctl_tcp_retries2; 288 start_ts = tcp_skb_timestamp(tcp_send_head(sk));
289 if (!start_ts)
290 skb_mstamp_get(&tcp_send_head(sk)->skb_mstamp);
291 else if (icsk->icsk_user_timeout &&
292 (s32)(tcp_time_stamp - start_ts) > icsk->icsk_user_timeout)
293 goto abort;
296 294
295 max_probes = sysctl_tcp_retries2;
297 if (sock_flag(sk, SOCK_DEAD)) { 296 if (sock_flag(sk, SOCK_DEAD)) {
298 const int alive = ((icsk->icsk_rto << icsk->icsk_backoff) < TCP_RTO_MAX); 297 const int alive = inet_csk_rto_backoff(icsk, TCP_RTO_MAX) < TCP_RTO_MAX;
299 298
300 max_probes = tcp_orphan_retries(sk, alive); 299 max_probes = tcp_orphan_retries(sk, alive);
301 300 if (!alive && icsk->icsk_backoff >= max_probes)
302 if (tcp_out_of_resources(sk, alive || icsk->icsk_probes_out <= max_probes)) 301 goto abort;
302 if (tcp_out_of_resources(sk, true))
303 return; 303 return;
304 } 304 }
305 305
306 if (icsk->icsk_probes_out > max_probes) { 306 if (icsk->icsk_probes_out > max_probes) {
307 tcp_write_err(sk); 307abort: tcp_write_err(sk);
308 } else { 308 } else {
309 /* Only send another probe if we didn't close things up. */ 309 /* Only send another probe if we didn't close things up. */
310 tcp_send_probe0(sk); 310 tcp_send_probe0(sk);
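Editor's note: the rewritten tcp_probe_timer() above folds TCP_USER_TIMEOUT into the zero-window probe path. A compact sketch of the abort rule it applies (the orphan-socket shortcut is left out; elapsed and user_timeout are assumed to be on the same clock):

#include <stdbool.h>

static bool probe_should_abort(unsigned int elapsed, unsigned int user_timeout,
			       unsigned int probes_out, unsigned int max_probes)
{
	if (user_timeout && elapsed > user_timeout)	/* TCP_USER_TIMEOUT set and exceeded */
		return true;
	return probes_out > max_probes;			/* retry-count limit, as before */
}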
diff --git a/net/ipv4/tcp_vegas.c b/net/ipv4/tcp_vegas.c
index b40ad897f945..a6afde666ab1 100644
--- a/net/ipv4/tcp_vegas.c
+++ b/net/ipv4/tcp_vegas.c
@@ -51,7 +51,6 @@ MODULE_PARM_DESC(beta, "upper bound of packets in network");
51module_param(gamma, int, 0644); 51module_param(gamma, int, 0644);
52MODULE_PARM_DESC(gamma, "limit on increase (scale by 2)"); 52MODULE_PARM_DESC(gamma, "limit on increase (scale by 2)");
53 53
54
55/* There are several situations when we must "re-start" Vegas: 54/* There are several situations when we must "re-start" Vegas:
56 * 55 *
57 * o when a connection is established 56 * o when a connection is established
@@ -133,7 +132,6 @@ EXPORT_SYMBOL_GPL(tcp_vegas_pkts_acked);
133 132
134void tcp_vegas_state(struct sock *sk, u8 ca_state) 133void tcp_vegas_state(struct sock *sk, u8 ca_state)
135{ 134{
136
137 if (ca_state == TCP_CA_Open) 135 if (ca_state == TCP_CA_Open)
138 vegas_enable(sk); 136 vegas_enable(sk);
139 else 137 else
@@ -285,7 +283,6 @@ static void tcp_vegas_cong_avoid(struct sock *sk, u32 ack, u32 acked)
285 /* Use normal slow start */ 283 /* Use normal slow start */
286 else if (tp->snd_cwnd <= tp->snd_ssthresh) 284 else if (tp->snd_cwnd <= tp->snd_ssthresh)
287 tcp_slow_start(tp, acked); 285 tcp_slow_start(tp, acked);
288
289} 286}
290 287
291/* Extract info for Tcp socket info provided via netlink. */ 288/* Extract info for Tcp socket info provided via netlink. */
diff --git a/net/ipv4/tcp_veno.c b/net/ipv4/tcp_veno.c
index 8276977d2c85..a4d2d2d88dca 100644
--- a/net/ipv4/tcp_veno.c
+++ b/net/ipv4/tcp_veno.c
@@ -175,7 +175,6 @@ static void tcp_veno_cong_avoid(struct sock *sk, u32 ack, u32 acked)
175 } else 175 } else
176 tp->snd_cwnd_cnt++; 176 tp->snd_cwnd_cnt++;
177 } 177 }
178
179 } 178 }
180 if (tp->snd_cwnd < 2) 179 if (tp->snd_cwnd < 2)
181 tp->snd_cwnd = 2; 180 tp->snd_cwnd = 2;
diff --git a/net/ipv4/tcp_westwood.c b/net/ipv4/tcp_westwood.c
index b94a04ae2ed5..bb63fba47d47 100644
--- a/net/ipv4/tcp_westwood.c
+++ b/net/ipv4/tcp_westwood.c
@@ -42,7 +42,6 @@ struct westwood {
42 u8 reset_rtt_min; /* Reset RTT min to next RTT sample*/ 42 u8 reset_rtt_min; /* Reset RTT min to next RTT sample*/
43}; 43};
44 44
45
46/* TCP Westwood functions and constants */ 45/* TCP Westwood functions and constants */
47#define TCP_WESTWOOD_RTT_MIN (HZ/20) /* 50ms */ 46#define TCP_WESTWOOD_RTT_MIN (HZ/20) /* 50ms */
48#define TCP_WESTWOOD_INIT_RTT (20*HZ) /* maybe too conservative?! */ 47#define TCP_WESTWOOD_INIT_RTT (20*HZ) /* maybe too conservative?! */
@@ -153,7 +152,6 @@ static inline void update_rtt_min(struct westwood *w)
153 w->rtt_min = min(w->rtt, w->rtt_min); 152 w->rtt_min = min(w->rtt, w->rtt_min);
154} 153}
155 154
156
157/* 155/*
158 * @westwood_fast_bw 156 * @westwood_fast_bw
159 * It is called when we are in fast path. In particular it is called when 157 * It is called when we are in fast path. In particular it is called when
@@ -208,7 +206,6 @@ static inline u32 westwood_acked_count(struct sock *sk)
208 return w->cumul_ack; 206 return w->cumul_ack;
209} 207}
210 208
211
212/* 209/*
213 * TCP Westwood 210 * TCP Westwood
214 * Here limit is evaluated as Bw estimation*RTTmin (for obtaining it 211 * Here limit is evaluated as Bw estimation*RTTmin (for obtaining it
@@ -219,47 +216,51 @@ static u32 tcp_westwood_bw_rttmin(const struct sock *sk)
219{ 216{
220 const struct tcp_sock *tp = tcp_sk(sk); 217 const struct tcp_sock *tp = tcp_sk(sk);
221 const struct westwood *w = inet_csk_ca(sk); 218 const struct westwood *w = inet_csk_ca(sk);
219
222 return max_t(u32, (w->bw_est * w->rtt_min) / tp->mss_cache, 2); 220 return max_t(u32, (w->bw_est * w->rtt_min) / tp->mss_cache, 2);
223} 221}
224 222
223static void tcp_westwood_ack(struct sock *sk, u32 ack_flags)
224{
225 if (ack_flags & CA_ACK_SLOWPATH) {
226 struct westwood *w = inet_csk_ca(sk);
227
228 westwood_update_window(sk);
229 w->bk += westwood_acked_count(sk);
230
231 update_rtt_min(w);
232 return;
233 }
234
235 westwood_fast_bw(sk);
236}
237
225static void tcp_westwood_event(struct sock *sk, enum tcp_ca_event event) 238static void tcp_westwood_event(struct sock *sk, enum tcp_ca_event event)
226{ 239{
227 struct tcp_sock *tp = tcp_sk(sk); 240 struct tcp_sock *tp = tcp_sk(sk);
228 struct westwood *w = inet_csk_ca(sk); 241 struct westwood *w = inet_csk_ca(sk);
229 242
230 switch (event) { 243 switch (event) {
231 case CA_EVENT_FAST_ACK:
232 westwood_fast_bw(sk);
233 break;
234
235 case CA_EVENT_COMPLETE_CWR: 244 case CA_EVENT_COMPLETE_CWR:
236 tp->snd_cwnd = tp->snd_ssthresh = tcp_westwood_bw_rttmin(sk); 245 tp->snd_cwnd = tp->snd_ssthresh = tcp_westwood_bw_rttmin(sk);
237 break; 246 break;
238
239 case CA_EVENT_LOSS: 247 case CA_EVENT_LOSS:
240 tp->snd_ssthresh = tcp_westwood_bw_rttmin(sk); 248 tp->snd_ssthresh = tcp_westwood_bw_rttmin(sk);
241 /* Update RTT_min when next ack arrives */ 249 /* Update RTT_min when next ack arrives */
242 w->reset_rtt_min = 1; 250 w->reset_rtt_min = 1;
243 break; 251 break;
244
245 case CA_EVENT_SLOW_ACK:
246 westwood_update_window(sk);
247 w->bk += westwood_acked_count(sk);
248 update_rtt_min(w);
249 break;
250
251 default: 252 default:
252 /* don't care */ 253 /* don't care */
253 break; 254 break;
254 } 255 }
255} 256}
256 257
257
258/* Extract info for Tcp socket info provided via netlink. */ 258/* Extract info for Tcp socket info provided via netlink. */
259static void tcp_westwood_info(struct sock *sk, u32 ext, 259static void tcp_westwood_info(struct sock *sk, u32 ext,
260 struct sk_buff *skb) 260 struct sk_buff *skb)
261{ 261{
262 const struct westwood *ca = inet_csk_ca(sk); 262 const struct westwood *ca = inet_csk_ca(sk);
263
263 if (ext & (1 << (INET_DIAG_VEGASINFO - 1))) { 264 if (ext & (1 << (INET_DIAG_VEGASINFO - 1))) {
264 struct tcpvegas_info info = { 265 struct tcpvegas_info info = {
265 .tcpv_enabled = 1, 266 .tcpv_enabled = 1,
@@ -271,12 +272,12 @@ static void tcp_westwood_info(struct sock *sk, u32 ext,
271 } 272 }
272} 273}
273 274
274
275static struct tcp_congestion_ops tcp_westwood __read_mostly = { 275static struct tcp_congestion_ops tcp_westwood __read_mostly = {
276 .init = tcp_westwood_init, 276 .init = tcp_westwood_init,
277 .ssthresh = tcp_reno_ssthresh, 277 .ssthresh = tcp_reno_ssthresh,
278 .cong_avoid = tcp_reno_cong_avoid, 278 .cong_avoid = tcp_reno_cong_avoid,
279 .cwnd_event = tcp_westwood_event, 279 .cwnd_event = tcp_westwood_event,
280 .in_ack_event = tcp_westwood_ack,
280 .get_info = tcp_westwood_info, 281 .get_info = tcp_westwood_info,
281 .pkts_acked = tcp_westwood_pkts_acked, 282 .pkts_acked = tcp_westwood_pkts_acked,
282 283
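The Westwood hunk above moves the per-ACK bookkeeping out of the CA_EVENT_FAST_ACK/CA_EVENT_SLOW_ACK cases of .cwnd_event and into the new .in_ack_event hook, whose flags word carries CA_ACK_SLOWPATH when the ACK was handled on the TCP slow path (options, SACK, out-of-order data). A minimal sketch of how another congestion module might wire up the same hook; the module name, private state and counters are invented:

#include <net/tcp.h>

struct demo_ca {
	u32 fast_acks;
	u32 slow_acks;
};

static void demo_in_ack_event(struct sock *sk, u32 ack_flags)
{
	struct demo_ca *ca = inet_csk_ca(sk);

	if (ack_flags & CA_ACK_SLOWPATH)
		ca->slow_acks++;	/* header prediction missed */
	else
		ca->fast_acks++;	/* pure in-order ACK, fast path */
}

static struct tcp_congestion_ops demo_ca_ops __read_mostly = {
	.ssthresh	= tcp_reno_ssthresh,
	.cong_avoid	= tcp_reno_cong_avoid,
	.in_ack_event	= demo_in_ack_event,
	.owner		= THIS_MODULE,
	.name		= "demo_ca",
};

A real module would still register this with tcp_register_congestion_control() from its init path; the point here is only the split between fast-path and slow-path ACK accounting.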
diff --git a/net/ipv4/tcp_yeah.c b/net/ipv4/tcp_yeah.c
index 599b79b8eac0..cd7273218598 100644
--- a/net/ipv4/tcp_yeah.c
+++ b/net/ipv4/tcp_yeah.c
@@ -54,10 +54,8 @@ static void tcp_yeah_init(struct sock *sk)
54 /* Ensure the MD arithmetic works. This is somewhat pedantic, 54 /* Ensure the MD arithmetic works. This is somewhat pedantic,
55 * since I don't think we will see a cwnd this large. :) */ 55 * since I don't think we will see a cwnd this large. :) */
56 tp->snd_cwnd_clamp = min_t(u32, tp->snd_cwnd_clamp, 0xffffffff/128); 56 tp->snd_cwnd_clamp = min_t(u32, tp->snd_cwnd_clamp, 0xffffffff/128);
57
58} 57}
59 58
60
61static void tcp_yeah_pkts_acked(struct sock *sk, u32 pkts_acked, s32 rtt_us) 59static void tcp_yeah_pkts_acked(struct sock *sk, u32 pkts_acked, s32 rtt_us)
62{ 60{
63 const struct inet_connection_sock *icsk = inet_csk(sk); 61 const struct inet_connection_sock *icsk = inet_csk(sk);
@@ -84,7 +82,7 @@ static void tcp_yeah_cong_avoid(struct sock *sk, u32 ack, u32 acked)
84 /* Scalable */ 82 /* Scalable */
85 83
86 tp->snd_cwnd_cnt += yeah->pkts_acked; 84 tp->snd_cwnd_cnt += yeah->pkts_acked;
87 if (tp->snd_cwnd_cnt > min(tp->snd_cwnd, TCP_SCALABLE_AI_CNT)){ 85 if (tp->snd_cwnd_cnt > min(tp->snd_cwnd, TCP_SCALABLE_AI_CNT)) {
88 if (tp->snd_cwnd < tp->snd_cwnd_clamp) 86 if (tp->snd_cwnd < tp->snd_cwnd_clamp)
89 tp->snd_cwnd++; 87 tp->snd_cwnd++;
90 tp->snd_cwnd_cnt = 0; 88 tp->snd_cwnd_cnt = 0;
@@ -120,7 +118,6 @@ static void tcp_yeah_cong_avoid(struct sock *sk, u32 ack, u32 acked)
120 */ 118 */
121 119
122 if (after(ack, yeah->vegas.beg_snd_nxt)) { 120 if (after(ack, yeah->vegas.beg_snd_nxt)) {
123
124 /* We do the Vegas calculations only if we got enough RTT 121 /* We do the Vegas calculations only if we got enough RTT
125 * samples that we can be reasonably sure that we got 122 * samples that we can be reasonably sure that we got
126 * at least one RTT sample that wasn't from a delayed ACK. 123 * at least one RTT sample that wasn't from a delayed ACK.
@@ -189,7 +186,6 @@ static void tcp_yeah_cong_avoid(struct sock *sk, u32 ack, u32 acked)
189 } 186 }
190 187
191 yeah->lastQ = queue; 188 yeah->lastQ = queue;
192
193 } 189 }
194 190
195 /* Save the extent of the current window so we can use this 191 /* Save the extent of the current window so we can use this
@@ -205,7 +201,8 @@ static void tcp_yeah_cong_avoid(struct sock *sk, u32 ack, u32 acked)
205 } 201 }
206} 202}
207 203
208static u32 tcp_yeah_ssthresh(struct sock *sk) { 204static u32 tcp_yeah_ssthresh(struct sock *sk)
205{
209 const struct tcp_sock *tp = tcp_sk(sk); 206 const struct tcp_sock *tp = tcp_sk(sk);
210 struct yeah *yeah = inet_csk_ca(sk); 207 struct yeah *yeah = inet_csk_ca(sk);
211 u32 reduction; 208 u32 reduction;
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index f57c0e4c2326..cd0db5471bb5 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -99,6 +99,7 @@
99#include <linux/slab.h> 99#include <linux/slab.h>
100#include <net/tcp_states.h> 100#include <net/tcp_states.h>
101#include <linux/skbuff.h> 101#include <linux/skbuff.h>
102#include <linux/netdevice.h>
102#include <linux/proc_fs.h> 103#include <linux/proc_fs.h>
103#include <linux/seq_file.h> 104#include <linux/seq_file.h>
104#include <net/net_namespace.h> 105#include <net/net_namespace.h>
@@ -224,7 +225,7 @@ int udp_lib_get_port(struct sock *sk, unsigned short snum,
224 remaining = (high - low) + 1; 225 remaining = (high - low) + 1;
225 226
226 rand = prandom_u32(); 227 rand = prandom_u32();
227 first = (((u64)rand * remaining) >> 32) + low; 228 first = reciprocal_scale(rand, remaining) + low;
228 /* 229 /*
229 * force rand to be an odd multiple of UDP_HTABLE_SIZE 230 * force rand to be an odd multiple of UDP_HTABLE_SIZE
230 */ 231 */
@@ -448,7 +449,7 @@ begin:
448 } 449 }
449 } else if (score == badness && reuseport) { 450 } else if (score == badness && reuseport) {
450 matches++; 451 matches++;
451 if (((u64)hash * matches) >> 32 == 0) 452 if (reciprocal_scale(hash, matches) == 0)
452 result = sk; 453 result = sk;
453 hash = next_pseudo_random32(hash); 454 hash = next_pseudo_random32(hash);
454 } 455 }
@@ -529,7 +530,7 @@ begin:
529 } 530 }
530 } else if (score == badness && reuseport) { 531 } else if (score == badness && reuseport) {
531 matches++; 532 matches++;
532 if (((u64)hash * matches) >> 32 == 0) 533 if (reciprocal_scale(hash, matches) == 0)
533 result = sk; 534 result = sk;
534 hash = next_pseudo_random32(hash); 535 hash = next_pseudo_random32(hash);
535 } 536 }
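Both reuseport hunks, like the port-selection hunk before them, replace the open-coded ((u64)hash * matches) >> 32 with reciprocal_scale(), the helper introduced for exactly this idiom: scaling a uniformly distributed 32-bit value into [0, ep_ro) with a multiply instead of a modulus. Written out as an illustrative restatement (not the header definition):

/* Maps val in [0, 2^32) onto [0, ep_ro): multiply, keep the high 32 bits. */
static inline u32 reciprocal_scale_demo(u32 val, u32 ep_ro)
{
	return (u32)(((u64)val * ep_ro) >> 32);
}

/* e.g. picking a starting port as in udp_lib_get_port() above:
 *	first = reciprocal_scale_demo(prandom_u32(), remaining) + low;
 */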
@@ -1787,6 +1788,10 @@ int __udp4_lib_rcv(struct sk_buff *skb, struct udp_table *udptable,
1787 if (sk != NULL) { 1788 if (sk != NULL) {
1788 int ret; 1789 int ret;
1789 1790
1791 if (udp_sk(sk)->convert_csum && uh->check && !IS_UDPLITE(sk))
1792 skb_checksum_try_convert(skb, IPPROTO_UDP, uh->check,
1793 inet_compute_pseudo);
1794
1790 ret = udp_queue_rcv_skb(sk, skb); 1795 ret = udp_queue_rcv_skb(sk, skb);
1791 sock_put(sk); 1796 sock_put(sk);
1792 1797
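The new receive hunk lets tunnel sockets (convert_csum is switched on by setup_udp_tunnel_sock() further down) turn a checksum the NIC already verified into a CHECKSUM_COMPLETE value, so an encapsulated inner packet can be validated without walking the payload again. The arithmetic behind the conversion: a correct UDP checksum means the 16-bit ones-complement sum of pseudo-header plus UDP segment is 0xffff, so the sum over the segment alone is the complement of the pseudo-header sum, which the stack can recompute from addresses and length alone. A standalone illustration of that folding identity (not the in-tree helper):

/* Ones-complement fold used by Internet checksums. */
static u16 csum16_fold(u32 sum)
{
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (u16)sum;
}

/*
 * If csum16_fold(pseudo + segment) == 0xffff (checksum verified), then
 * csum16_fold(segment) == (u16)~csum16_fold(pseudo), which is all that is
 * needed to synthesize a CHECKSUM_COMPLETE value for the inner packet.
 */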
@@ -1967,7 +1972,7 @@ void udp_v4_early_demux(struct sk_buff *skb)
1967 return; 1972 return;
1968 1973
1969 skb->sk = sk; 1974 skb->sk = sk;
1970 skb->destructor = sock_edemux; 1975 skb->destructor = sock_efree;
1971 dst = sk->sk_rx_dst; 1976 dst = sk->sk_rx_dst;
1972 1977
1973 if (dst) 1978 if (dst)
diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c
index 59035bc3008d..507310ef4b56 100644
--- a/net/ipv4/udp_offload.c
+++ b/net/ipv4/udp_offload.c
@@ -25,30 +25,11 @@ struct udp_offload_priv {
25 struct udp_offload_priv __rcu *next; 25 struct udp_offload_priv __rcu *next;
26}; 26};
27 27
28static int udp4_ufo_send_check(struct sk_buff *skb) 28static struct sk_buff *__skb_udp_tunnel_segment(struct sk_buff *skb,
29{ 29 netdev_features_t features,
30 if (!pskb_may_pull(skb, sizeof(struct udphdr))) 30 struct sk_buff *(*gso_inner_segment)(struct sk_buff *skb,
31 return -EINVAL; 31 netdev_features_t features),
32 32 __be16 new_protocol)
33 if (likely(!skb->encapsulation)) {
34 const struct iphdr *iph;
35 struct udphdr *uh;
36
37 iph = ip_hdr(skb);
38 uh = udp_hdr(skb);
39
40 uh->check = ~csum_tcpudp_magic(iph->saddr, iph->daddr, skb->len,
41 IPPROTO_UDP, 0);
42 skb->csum_start = skb_transport_header(skb) - skb->head;
43 skb->csum_offset = offsetof(struct udphdr, check);
44 skb->ip_summed = CHECKSUM_PARTIAL;
45 }
46
47 return 0;
48}
49
50struct sk_buff *skb_udp_tunnel_segment(struct sk_buff *skb,
51 netdev_features_t features)
52{ 33{
53 struct sk_buff *segs = ERR_PTR(-EINVAL); 34 struct sk_buff *segs = ERR_PTR(-EINVAL);
54 u16 mac_offset = skb->mac_header; 35 u16 mac_offset = skb->mac_header;
@@ -70,7 +51,7 @@ struct sk_buff *skb_udp_tunnel_segment(struct sk_buff *skb,
70 skb_reset_mac_header(skb); 51 skb_reset_mac_header(skb);
71 skb_set_network_header(skb, skb_inner_network_offset(skb)); 52 skb_set_network_header(skb, skb_inner_network_offset(skb));
72 skb->mac_len = skb_inner_network_offset(skb); 53 skb->mac_len = skb_inner_network_offset(skb);
73 skb->protocol = htons(ETH_P_TEB); 54 skb->protocol = new_protocol;
74 55
75 need_csum = !!(skb_shinfo(skb)->gso_type & SKB_GSO_UDP_TUNNEL_CSUM); 56 need_csum = !!(skb_shinfo(skb)->gso_type & SKB_GSO_UDP_TUNNEL_CSUM);
76 if (need_csum) 57 if (need_csum)
@@ -78,7 +59,7 @@ struct sk_buff *skb_udp_tunnel_segment(struct sk_buff *skb,
78 59
79 /* segment inner packet. */ 60 /* segment inner packet. */
80 enc_features = skb->dev->hw_enc_features & netif_skb_features(skb); 61 enc_features = skb->dev->hw_enc_features & netif_skb_features(skb);
81 segs = skb_mac_gso_segment(skb, enc_features); 62 segs = gso_inner_segment(skb, enc_features);
82 if (IS_ERR_OR_NULL(segs)) { 63 if (IS_ERR_OR_NULL(segs)) {
83 skb_gso_error_unwind(skb, protocol, tnl_hlen, mac_offset, 64 skb_gso_error_unwind(skb, protocol, tnl_hlen, mac_offset,
84 mac_len); 65 mac_len);
@@ -123,21 +104,63 @@ out:
123 return segs; 104 return segs;
124} 105}
125 106
107struct sk_buff *skb_udp_tunnel_segment(struct sk_buff *skb,
108 netdev_features_t features,
109 bool is_ipv6)
110{
111 __be16 protocol = skb->protocol;
112 const struct net_offload **offloads;
113 const struct net_offload *ops;
114 struct sk_buff *segs = ERR_PTR(-EINVAL);
115 struct sk_buff *(*gso_inner_segment)(struct sk_buff *skb,
116 netdev_features_t features);
117
118 rcu_read_lock();
119
120 switch (skb->inner_protocol_type) {
121 case ENCAP_TYPE_ETHER:
122 protocol = skb->inner_protocol;
123 gso_inner_segment = skb_mac_gso_segment;
124 break;
125 case ENCAP_TYPE_IPPROTO:
126 offloads = is_ipv6 ? inet6_offloads : inet_offloads;
127 ops = rcu_dereference(offloads[skb->inner_ipproto]);
128 if (!ops || !ops->callbacks.gso_segment)
129 goto out_unlock;
130 gso_inner_segment = ops->callbacks.gso_segment;
131 break;
132 default:
133 goto out_unlock;
134 }
135
136 segs = __skb_udp_tunnel_segment(skb, features, gso_inner_segment,
137 protocol);
138
139out_unlock:
140 rcu_read_unlock();
141
142 return segs;
143}
144
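skb_udp_tunnel_segment() is now a thin dispatcher: it inspects skb->inner_protocol_type to decide whether the encapsulated payload starts with an Ethernet frame (segmented with skb_mac_gso_segment(), the previous behaviour) or with a bare IP protocol, in which case the gso_segment callback of the matching inet/inet6 offload is used. An encapsulating transmit path is expected to record that choice before the packet reaches GSO; a hedged sketch of the two variants, where only the skb fields and constants are real and the helper names are made up:

#include <linux/skbuff.h>
#include <linux/if_ether.h>

/* Inner payload starts with an Ethernet header (VXLAN-style). */
static void demo_mark_ether_encap(struct sk_buff *skb)
{
	skb->inner_protocol_type = ENCAP_TYPE_ETHER;
	skb->inner_protocol = htons(ETH_P_TEB);
}

/* Inner payload is a raw IP protocol (foo-over-UDP style). */
static void demo_mark_ipproto_encap(struct sk_buff *skb, u8 ipproto)
{
	skb->inner_protocol_type = ENCAP_TYPE_IPPROTO;
	skb->inner_ipproto = ipproto;
}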
126static struct sk_buff *udp4_ufo_fragment(struct sk_buff *skb, 145static struct sk_buff *udp4_ufo_fragment(struct sk_buff *skb,
127 netdev_features_t features) 146 netdev_features_t features)
128{ 147{
129 struct sk_buff *segs = ERR_PTR(-EINVAL); 148 struct sk_buff *segs = ERR_PTR(-EINVAL);
130 unsigned int mss; 149 unsigned int mss;
131 int offset;
132 __wsum csum; 150 __wsum csum;
151 struct udphdr *uh;
152 struct iphdr *iph;
133 153
134 if (skb->encapsulation && 154 if (skb->encapsulation &&
135 (skb_shinfo(skb)->gso_type & 155 (skb_shinfo(skb)->gso_type &
136 (SKB_GSO_UDP_TUNNEL|SKB_GSO_UDP_TUNNEL_CSUM))) { 156 (SKB_GSO_UDP_TUNNEL|SKB_GSO_UDP_TUNNEL_CSUM))) {
137 segs = skb_udp_tunnel_segment(skb, features); 157 segs = skb_udp_tunnel_segment(skb, features, false);
138 goto out; 158 goto out;
139 } 159 }
140 160
161 if (!pskb_may_pull(skb, sizeof(struct udphdr)))
162 goto out;
163
141 mss = skb_shinfo(skb)->gso_size; 164 mss = skb_shinfo(skb)->gso_size;
142 if (unlikely(skb->len <= mss)) 165 if (unlikely(skb->len <= mss))
143 goto out; 166 goto out;
@@ -165,10 +188,16 @@ static struct sk_buff *udp4_ufo_fragment(struct sk_buff *skb,
165 * HW cannot do checksum of UDP packets sent as multiple 188 * HW cannot do checksum of UDP packets sent as multiple
166 * IP fragments. 189 * IP fragments.
167 */ 190 */
168 offset = skb_checksum_start_offset(skb); 191
169 csum = skb_checksum(skb, offset, skb->len - offset, 0); 192 uh = udp_hdr(skb);
170 offset += skb->csum_offset; 193 iph = ip_hdr(skb);
171 *(__sum16 *)(skb->data + offset) = csum_fold(csum); 194
195 uh->check = 0;
196 csum = skb_checksum(skb, 0, skb->len, 0);
197 uh->check = udp_v4_check(skb->len, iph->saddr, iph->daddr, csum);
198 if (uh->check == 0)
199 uh->check = CSUM_MANGLED_0;
200
172 skb->ip_summed = CHECKSUM_NONE; 201 skb->ip_summed = CHECKSUM_NONE;
173 202
174 /* Fragment the skb. IP headers of the fragments are updated in 203 /* Fragment the skb. IP headers of the fragments are updated in
@@ -228,30 +257,24 @@ unlock:
228} 257}
229EXPORT_SYMBOL(udp_del_offload); 258EXPORT_SYMBOL(udp_del_offload);
230 259
231static struct sk_buff **udp_gro_receive(struct sk_buff **head, struct sk_buff *skb) 260struct sk_buff **udp_gro_receive(struct sk_buff **head, struct sk_buff *skb,
261 struct udphdr *uh)
232{ 262{
233 struct udp_offload_priv *uo_priv; 263 struct udp_offload_priv *uo_priv;
234 struct sk_buff *p, **pp = NULL; 264 struct sk_buff *p, **pp = NULL;
235 struct udphdr *uh, *uh2; 265 struct udphdr *uh2;
236 unsigned int hlen, off; 266 unsigned int off = skb_gro_offset(skb);
237 int flush = 1; 267 int flush = 1;
238 268
239 if (NAPI_GRO_CB(skb)->udp_mark || 269 if (NAPI_GRO_CB(skb)->udp_mark ||
240 (!skb->encapsulation && skb->ip_summed != CHECKSUM_COMPLETE)) 270 (skb->ip_summed != CHECKSUM_PARTIAL &&
271 NAPI_GRO_CB(skb)->csum_cnt == 0 &&
272 !NAPI_GRO_CB(skb)->csum_valid))
241 goto out; 273 goto out;
242 274
243 /* mark that this skb passed once through the udp gro layer */ 275 /* mark that this skb passed once through the udp gro layer */
244 NAPI_GRO_CB(skb)->udp_mark = 1; 276 NAPI_GRO_CB(skb)->udp_mark = 1;
245 277
246 off = skb_gro_offset(skb);
247 hlen = off + sizeof(*uh);
248 uh = skb_gro_header_fast(skb, off);
249 if (skb_gro_header_hard(skb, hlen)) {
250 uh = skb_gro_header_slow(skb, hlen, off);
251 if (unlikely(!uh))
252 goto out;
253 }
254
255 rcu_read_lock(); 278 rcu_read_lock();
256 uo_priv = rcu_dereference(udp_offload_base); 279 uo_priv = rcu_dereference(udp_offload_base);
257 for (; uo_priv != NULL; uo_priv = rcu_dereference(uo_priv->next)) { 280 for (; uo_priv != NULL; uo_priv = rcu_dereference(uo_priv->next)) {
@@ -269,7 +292,12 @@ unflush:
269 continue; 292 continue;
270 293
271 uh2 = (struct udphdr *)(p->data + off); 294 uh2 = (struct udphdr *)(p->data + off);
272 if ((*(u32 *)&uh->source != *(u32 *)&uh2->source)) { 295
 296 /* Match ports, and require that the checksums are either both
 297 * zero or both nonzero.
 298 */
299 if ((*(u32 *)&uh->source != *(u32 *)&uh2->source) ||
300 (!uh->check ^ !uh2->check)) {
273 NAPI_GRO_CB(p)->same_flow = 0; 301 NAPI_GRO_CB(p)->same_flow = 0;
274 continue; 302 continue;
275 } 303 }
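The added condition keeps GRO from coalescing a packet that carries a UDP checksum with one that does not: !uh->check ^ !uh2->check is true exactly when one check field is zero and the other is non-zero. Spelled out with example values:

/* (!a ^ !b) is 1 iff exactly one of a, b is zero. */
bool differ_a = (!0)    ^ (!0x1234);	/* true:  one checksummed, one not */
bool differ_b = (!0)    ^ (!0);		/* false: both checksum-less */
bool differ_c = (!0x12) ^ (!0x34);	/* false: both checksummed */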
@@ -277,6 +305,7 @@ unflush:
277 305
278 skb_gro_pull(skb, sizeof(struct udphdr)); /* pull encapsulating udp header */ 306 skb_gro_pull(skb, sizeof(struct udphdr)); /* pull encapsulating udp header */
279 skb_gro_postpull_rcsum(skb, uh, sizeof(struct udphdr)); 307 skb_gro_postpull_rcsum(skb, uh, sizeof(struct udphdr));
308 NAPI_GRO_CB(skb)->proto = uo_priv->offload->ipproto;
280 pp = uo_priv->offload->callbacks.gro_receive(head, skb); 309 pp = uo_priv->offload->callbacks.gro_receive(head, skb);
281 310
282out_unlock: 311out_unlock:
@@ -286,7 +315,34 @@ out:
286 return pp; 315 return pp;
287} 316}
288 317
289static int udp_gro_complete(struct sk_buff *skb, int nhoff) 318static struct sk_buff **udp4_gro_receive(struct sk_buff **head,
319 struct sk_buff *skb)
320{
321 struct udphdr *uh = udp_gro_udphdr(skb);
322
323 if (unlikely(!uh))
324 goto flush;
325
326 /* Don't bother verifying checksum if we're going to flush anyway. */
327 if (NAPI_GRO_CB(skb)->flush)
328 goto skip;
329
330 if (skb_gro_checksum_validate_zero_check(skb, IPPROTO_UDP, uh->check,
331 inet_gro_compute_pseudo))
332 goto flush;
333 else if (uh->check)
334 skb_gro_checksum_try_convert(skb, IPPROTO_UDP, uh->check,
335 inet_gro_compute_pseudo);
336skip:
337 NAPI_GRO_CB(skb)->is_ipv6 = 0;
338 return udp_gro_receive(head, skb, uh);
339
340flush:
341 NAPI_GRO_CB(skb)->flush = 1;
342 return NULL;
343}
344
345int udp_gro_complete(struct sk_buff *skb, int nhoff)
290{ 346{
291 struct udp_offload_priv *uo_priv; 347 struct udp_offload_priv *uo_priv;
292 __be16 newlen = htons(skb->len - nhoff); 348 __be16 newlen = htons(skb->len - nhoff);
@@ -304,19 +360,32 @@ static int udp_gro_complete(struct sk_buff *skb, int nhoff)
304 break; 360 break;
305 } 361 }
306 362
307 if (uo_priv != NULL) 363 if (uo_priv != NULL) {
364 NAPI_GRO_CB(skb)->proto = uo_priv->offload->ipproto;
308 err = uo_priv->offload->callbacks.gro_complete(skb, nhoff + sizeof(struct udphdr)); 365 err = uo_priv->offload->callbacks.gro_complete(skb, nhoff + sizeof(struct udphdr));
366 }
309 367
310 rcu_read_unlock(); 368 rcu_read_unlock();
311 return err; 369 return err;
312} 370}
313 371
372static int udp4_gro_complete(struct sk_buff *skb, int nhoff)
373{
374 const struct iphdr *iph = ip_hdr(skb);
375 struct udphdr *uh = (struct udphdr *)(skb->data + nhoff);
376
377 if (uh->check)
378 uh->check = ~udp_v4_check(skb->len - nhoff, iph->saddr,
379 iph->daddr, 0);
380
381 return udp_gro_complete(skb, nhoff);
382}
383
314static const struct net_offload udpv4_offload = { 384static const struct net_offload udpv4_offload = {
315 .callbacks = { 385 .callbacks = {
316 .gso_send_check = udp4_ufo_send_check,
317 .gso_segment = udp4_ufo_fragment, 386 .gso_segment = udp4_ufo_fragment,
318 .gro_receive = udp_gro_receive, 387 .gro_receive = udp4_gro_receive,
319 .gro_complete = udp_gro_complete, 388 .gro_complete = udp4_gro_complete,
320 }, 389 },
321}; 390};
322 391
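With the GRO entry points split into protocol-independent udp_gro_receive()/udp_gro_complete() helpers and an IPv4-specific wrapper, a tunnel driver hooks into this path by registering a struct udp_offload for its port; the ipproto value copied into NAPI_GRO_CB above is what steers foo-over-UDP style users to the right inner offload. A rough registration sketch, assuming a hypothetical driver (the port number and callback bodies are placeholders):

static struct sk_buff **demo_gro_receive(struct sk_buff **head,
					 struct sk_buff *skb)
{
	/* strip the tunnel header, then feed the inner packet back to GRO */
	return NULL;	/* placeholder */
}

static int demo_gro_complete(struct sk_buff *skb, int nhoff)
{
	return 0;	/* placeholder */
}

static struct udp_offload demo_udp_offload = {
	.port	 = htons(4789),		/* example tunnel port */
	.ipproto = IPPROTO_IPIP,	/* inner protocol, for IPPROTO-type encaps */
	.callbacks = {
		.gro_receive	= demo_gro_receive,
		.gro_complete	= demo_gro_complete,
	},
};

/* at init / teardown:
 *	udp_add_offload(&demo_udp_offload);
 *	udp_del_offload(&demo_udp_offload);
 */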
diff --git a/net/ipv4/udp_tunnel.c b/net/ipv4/udp_tunnel.c
index 61ec1a65207e..1671263e5fa0 100644
--- a/net/ipv4/udp_tunnel.c
+++ b/net/ipv4/udp_tunnel.c
@@ -8,83 +8,40 @@
8#include <net/udp_tunnel.h> 8#include <net/udp_tunnel.h>
9#include <net/net_namespace.h> 9#include <net/net_namespace.h>
10 10
11int udp_sock_create(struct net *net, struct udp_port_cfg *cfg, 11int udp_sock_create4(struct net *net, struct udp_port_cfg *cfg,
12 struct socket **sockp) 12 struct socket **sockp)
13{ 13{
14 int err = -EINVAL; 14 int err;
15 struct socket *sock = NULL; 15 struct socket *sock = NULL;
16 struct sockaddr_in udp_addr;
16 17
17#if IS_ENABLED(CONFIG_IPV6) 18 err = sock_create_kern(AF_INET, SOCK_DGRAM, 0, &sock);
18 if (cfg->family == AF_INET6) { 19 if (err < 0)
19 struct sockaddr_in6 udp6_addr; 20 goto error;
20 21
21 err = sock_create_kern(AF_INET6, SOCK_DGRAM, 0, &sock); 22 sk_change_net(sock->sk, net);
22 if (err < 0)
23 goto error;
24
25 sk_change_net(sock->sk, net);
26
27 udp6_addr.sin6_family = AF_INET6;
28 memcpy(&udp6_addr.sin6_addr, &cfg->local_ip6,
29 sizeof(udp6_addr.sin6_addr));
30 udp6_addr.sin6_port = cfg->local_udp_port;
31 err = kernel_bind(sock, (struct sockaddr *)&udp6_addr,
32 sizeof(udp6_addr));
33 if (err < 0)
34 goto error;
35
36 if (cfg->peer_udp_port) {
37 udp6_addr.sin6_family = AF_INET6;
38 memcpy(&udp6_addr.sin6_addr, &cfg->peer_ip6,
39 sizeof(udp6_addr.sin6_addr));
40 udp6_addr.sin6_port = cfg->peer_udp_port;
41 err = kernel_connect(sock,
42 (struct sockaddr *)&udp6_addr,
43 sizeof(udp6_addr), 0);
44 }
45 if (err < 0)
46 goto error;
47 23
48 udp_set_no_check6_tx(sock->sk, !cfg->use_udp6_tx_checksums); 24 udp_addr.sin_family = AF_INET;
49 udp_set_no_check6_rx(sock->sk, !cfg->use_udp6_rx_checksums); 25 udp_addr.sin_addr = cfg->local_ip;
50 } else 26 udp_addr.sin_port = cfg->local_udp_port;
51#endif 27 err = kernel_bind(sock, (struct sockaddr *)&udp_addr,
52 if (cfg->family == AF_INET) { 28 sizeof(udp_addr));
53 struct sockaddr_in udp_addr; 29 if (err < 0)
54 30 goto error;
55 err = sock_create_kern(AF_INET, SOCK_DGRAM, 0, &sock);
56 if (err < 0)
57 goto error;
58
59 sk_change_net(sock->sk, net);
60 31
32 if (cfg->peer_udp_port) {
61 udp_addr.sin_family = AF_INET; 33 udp_addr.sin_family = AF_INET;
62 udp_addr.sin_addr = cfg->local_ip; 34 udp_addr.sin_addr = cfg->peer_ip;
63 udp_addr.sin_port = cfg->local_udp_port; 35 udp_addr.sin_port = cfg->peer_udp_port;
64 err = kernel_bind(sock, (struct sockaddr *)&udp_addr, 36 err = kernel_connect(sock, (struct sockaddr *)&udp_addr,
65 sizeof(udp_addr)); 37 sizeof(udp_addr), 0);
66 if (err < 0) 38 if (err < 0)
67 goto error; 39 goto error;
68
69 if (cfg->peer_udp_port) {
70 udp_addr.sin_family = AF_INET;
71 udp_addr.sin_addr = cfg->peer_ip;
72 udp_addr.sin_port = cfg->peer_udp_port;
73 err = kernel_connect(sock,
74 (struct sockaddr *)&udp_addr,
75 sizeof(udp_addr), 0);
76 if (err < 0)
77 goto error;
78 }
79
80 sock->sk->sk_no_check_tx = !cfg->use_udp_checksums;
81 } else {
82 return -EPFNOSUPPORT;
83 } 40 }
84 41
42 sock->sk->sk_no_check_tx = !cfg->use_udp_checksums;
85 43
86 *sockp = sock; 44 *sockp = sock;
87
88 return 0; 45 return 0;
89 46
90error: 47error:
@@ -95,6 +52,57 @@ error:
95 *sockp = NULL; 52 *sockp = NULL;
96 return err; 53 return err;
97} 54}
98EXPORT_SYMBOL(udp_sock_create); 55EXPORT_SYMBOL(udp_sock_create4);
56
57void setup_udp_tunnel_sock(struct net *net, struct socket *sock,
58 struct udp_tunnel_sock_cfg *cfg)
59{
60 struct sock *sk = sock->sk;
61
62 /* Disable multicast loopback */
63 inet_sk(sk)->mc_loop = 0;
64
65 /* Enable CHECKSUM_UNNECESSARY to CHECKSUM_COMPLETE conversion */
66 udp_set_convert_csum(sk, true);
67
68 rcu_assign_sk_user_data(sk, cfg->sk_user_data);
69
70 udp_sk(sk)->encap_type = cfg->encap_type;
71 udp_sk(sk)->encap_rcv = cfg->encap_rcv;
72 udp_sk(sk)->encap_destroy = cfg->encap_destroy;
73
74 udp_tunnel_encap_enable(sock);
75}
76EXPORT_SYMBOL_GPL(setup_udp_tunnel_sock);
77
78int udp_tunnel_xmit_skb(struct socket *sock, struct rtable *rt,
79 struct sk_buff *skb, __be32 src, __be32 dst,
80 __u8 tos, __u8 ttl, __be16 df, __be16 src_port,
81 __be16 dst_port, bool xnet)
82{
83 struct udphdr *uh;
84
85 __skb_push(skb, sizeof(*uh));
86 skb_reset_transport_header(skb);
87 uh = udp_hdr(skb);
88
89 uh->dest = dst_port;
90 uh->source = src_port;
91 uh->len = htons(skb->len);
92
93 udp_set_csum(sock->sk->sk_no_check_tx, skb, src, dst, skb->len);
94
95 return iptunnel_xmit(sock->sk, rt, skb, src, dst, IPPROTO_UDP,
96 tos, ttl, df, xnet);
97}
98EXPORT_SYMBOL_GPL(udp_tunnel_xmit_skb);
99
100void udp_tunnel_sock_release(struct socket *sock)
101{
102 rcu_assign_sk_user_data(sock->sk, NULL);
103 kernel_sock_shutdown(sock, SHUT_RDWR);
104 sk_release_kernel(sock->sk);
105}
106EXPORT_SYMBOL_GPL(udp_tunnel_sock_release);
99 107
100MODULE_LICENSE("GPL"); 108MODULE_LICENSE("GPL");
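net/ipv4/udp_tunnel.c now handles only the IPv4 case (the IPv6 half moves to ip6_udp_tunnel.o, added by the Makefile hunk below), and the encap-socket boilerplate that tunnel drivers used to duplicate is wrapped in setup_udp_tunnel_sock() and udp_tunnel_xmit_skb(). A minimal open-a-tunnel-socket sketch under the assumption of a hypothetical driver; the port, encap_type value and receive callback are placeholders:

static int demo_encap_rcv(struct sock *sk, struct sk_buff *skb)
{
	kfree_skb(skb);		/* consume it; returning 0 tells UDP it was handled */
	return 0;
}

static int demo_open_tunnel(struct net *net, void *priv)
{
	struct udp_port_cfg port_cfg = {
		.local_ip	   = { .s_addr = htonl(INADDR_ANY) },
		.local_udp_port	   = htons(5555),	/* example port */
		.use_udp_checksums = true,
	};
	struct udp_tunnel_sock_cfg tnl_cfg = {
		.sk_user_data	= priv,			/* per-tunnel state, seen again on rx */
		.encap_type	= 1,			/* the protocol's UDP_ENCAP_* value */
		.encap_rcv	= demo_encap_rcv,
	};
	struct socket *sock;
	int err;

	err = udp_sock_create4(net, &port_cfg, &sock);
	if (err < 0)
		return err;

	setup_udp_tunnel_sock(net, sock, &tnl_cfg);
	return 0;
}

Transmit then goes through udp_tunnel_xmit_skb(), which pushes the UDP header, fills in the checksum according to sk_no_check_tx and hands the packet to iptunnel_xmit(), as shown in the hunk above.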
diff --git a/net/ipv6/Makefile b/net/ipv6/Makefile
index 2fe68364bb20..2e8c06108ab9 100644
--- a/net/ipv6/Makefile
+++ b/net/ipv6/Makefile
@@ -45,3 +45,7 @@ obj-y += addrconf_core.o exthdrs_core.o ip6_checksum.o ip6_icmp.o
45obj-$(CONFIG_INET) += output_core.o protocol.o $(ipv6-offload) 45obj-$(CONFIG_INET) += output_core.o protocol.o $(ipv6-offload)
46 46
47obj-$(subst m,y,$(CONFIG_IPV6)) += inet6_hashtables.o 47obj-$(subst m,y,$(CONFIG_IPV6)) += inet6_hashtables.o
48
49ifneq ($(CONFIG_IPV6),)
50obj-$(CONFIG_NET_UDP_TUNNEL) += ip6_udp_tunnel.o
51endif
diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index 3e118dfddd02..725c763270a0 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -180,7 +180,7 @@ static struct ipv6_devconf ipv6_devconf __read_mostly = {
180 .rtr_solicits = MAX_RTR_SOLICITATIONS, 180 .rtr_solicits = MAX_RTR_SOLICITATIONS,
181 .rtr_solicit_interval = RTR_SOLICITATION_INTERVAL, 181 .rtr_solicit_interval = RTR_SOLICITATION_INTERVAL,
182 .rtr_solicit_delay = MAX_RTR_SOLICITATION_DELAY, 182 .rtr_solicit_delay = MAX_RTR_SOLICITATION_DELAY,
183 .use_tempaddr = 0, 183 .use_tempaddr = 0,
184 .temp_valid_lft = TEMP_VALID_LIFETIME, 184 .temp_valid_lft = TEMP_VALID_LIFETIME,
185 .temp_prefered_lft = TEMP_PREFERRED_LIFETIME, 185 .temp_prefered_lft = TEMP_PREFERRED_LIFETIME,
186 .regen_max_retry = REGEN_MAX_RETRY, 186 .regen_max_retry = REGEN_MAX_RETRY,
@@ -1105,8 +1105,8 @@ retry:
1105 spin_unlock_bh(&ifp->lock); 1105 spin_unlock_bh(&ifp->lock);
1106 1106
1107 regen_advance = idev->cnf.regen_max_retry * 1107 regen_advance = idev->cnf.regen_max_retry *
1108 idev->cnf.dad_transmits * 1108 idev->cnf.dad_transmits *
1109 NEIGH_VAR(idev->nd_parms, RETRANS_TIME) / HZ; 1109 NEIGH_VAR(idev->nd_parms, RETRANS_TIME) / HZ;
1110 write_unlock_bh(&idev->lock); 1110 write_unlock_bh(&idev->lock);
1111 1111
1112 /* A temporary address is created only if this calculated Preferred 1112 /* A temporary address is created only if this calculated Preferred
@@ -1725,7 +1725,7 @@ static void addrconf_join_anycast(struct inet6_ifaddr *ifp)
1725 ipv6_addr_prefix(&addr, &ifp->addr, ifp->prefix_len); 1725 ipv6_addr_prefix(&addr, &ifp->addr, ifp->prefix_len);
1726 if (ipv6_addr_any(&addr)) 1726 if (ipv6_addr_any(&addr))
1727 return; 1727 return;
1728 ipv6_dev_ac_inc(ifp->idev->dev, &addr); 1728 __ipv6_dev_ac_inc(ifp->idev, &addr);
1729} 1729}
1730 1730
1731/* caller must hold RTNL */ 1731/* caller must hold RTNL */
@@ -2844,6 +2844,9 @@ static int addrconf_notify(struct notifier_block *this, unsigned long event,
2844 if (dev->flags & IFF_SLAVE) 2844 if (dev->flags & IFF_SLAVE)
2845 break; 2845 break;
2846 2846
2847 if (idev && idev->cnf.disable_ipv6)
2848 break;
2849
2847 if (event == NETDEV_UP) { 2850 if (event == NETDEV_UP) {
2848 if (!addrconf_qdisc_ok(dev)) { 2851 if (!addrconf_qdisc_ok(dev)) {
2849 /* device is not ready yet. */ 2852 /* device is not ready yet. */
@@ -3030,7 +3033,7 @@ static int addrconf_ifdown(struct net_device *dev, int how)
3030 struct hlist_head *h = &inet6_addr_lst[i]; 3033 struct hlist_head *h = &inet6_addr_lst[i];
3031 3034
3032 spin_lock_bh(&addrconf_hash_lock); 3035 spin_lock_bh(&addrconf_hash_lock);
3033 restart: 3036restart:
3034 hlist_for_each_entry_rcu(ifa, h, addr_lst) { 3037 hlist_for_each_entry_rcu(ifa, h, addr_lst) {
3035 if (ifa->idev == idev) { 3038 if (ifa->idev == idev) {
3036 hlist_del_init_rcu(&ifa->addr_lst); 3039 hlist_del_init_rcu(&ifa->addr_lst);
@@ -3544,8 +3547,8 @@ static void __net_exit if6_proc_net_exit(struct net *net)
3544} 3547}
3545 3548
3546static struct pernet_operations if6_proc_net_ops = { 3549static struct pernet_operations if6_proc_net_ops = {
3547 .init = if6_proc_net_init, 3550 .init = if6_proc_net_init,
3548 .exit = if6_proc_net_exit, 3551 .exit = if6_proc_net_exit,
3549}; 3552};
3550 3553
3551int __init if6_proc_init(void) 3554int __init if6_proc_init(void)
diff --git a/net/ipv6/af_inet6.c b/net/ipv6/af_inet6.c
index 2daa3a133e49..e8c4400f23e9 100644
--- a/net/ipv6/af_inet6.c
+++ b/net/ipv6/af_inet6.c
@@ -7,15 +7,15 @@
7 * 7 *
8 * Adapted from linux/net/ipv4/af_inet.c 8 * Adapted from linux/net/ipv4/af_inet.c
9 * 9 *
10 * Fixes: 10 * Fixes:
11 * piggy, Karl Knutson : Socket protocol table 11 * piggy, Karl Knutson : Socket protocol table
12 * Hideaki YOSHIFUJI : sin6_scope_id support 12 * Hideaki YOSHIFUJI : sin6_scope_id support
13 * Arnaldo Melo : check proc_net_create return, cleanups 13 * Arnaldo Melo : check proc_net_create return, cleanups
14 * 14 *
15 * This program is free software; you can redistribute it and/or 15 * This program is free software; you can redistribute it and/or
16 * modify it under the terms of the GNU General Public License 16 * modify it under the terms of the GNU General Public License
17 * as published by the Free Software Foundation; either version 17 * as published by the Free Software Foundation; either version
18 * 2 of the License, or (at your option) any later version. 18 * 2 of the License, or (at your option) any later version.
19 */ 19 */
20 20
21#define pr_fmt(fmt) "IPv6: " fmt 21#define pr_fmt(fmt) "IPv6: " fmt
@@ -302,7 +302,7 @@ int inet6_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len)
302 /* Reproduce AF_INET checks to make the bindings consistent */ 302 /* Reproduce AF_INET checks to make the bindings consistent */
303 v4addr = addr->sin6_addr.s6_addr32[3]; 303 v4addr = addr->sin6_addr.s6_addr32[3];
304 chk_addr_ret = inet_addr_type(net, v4addr); 304 chk_addr_ret = inet_addr_type(net, v4addr);
305 if (!sysctl_ip_nonlocal_bind && 305 if (!net->ipv4.sysctl_ip_nonlocal_bind &&
306 !(inet->freebind || inet->transparent) && 306 !(inet->freebind || inet->transparent) &&
307 v4addr != htonl(INADDR_ANY) && 307 v4addr != htonl(INADDR_ANY) &&
308 chk_addr_ret != RTN_LOCAL && 308 chk_addr_ret != RTN_LOCAL &&
@@ -672,10 +672,10 @@ int inet6_sk_rebuild_header(struct sock *sk)
672} 672}
673EXPORT_SYMBOL_GPL(inet6_sk_rebuild_header); 673EXPORT_SYMBOL_GPL(inet6_sk_rebuild_header);
674 674
675bool ipv6_opt_accepted(const struct sock *sk, const struct sk_buff *skb) 675bool ipv6_opt_accepted(const struct sock *sk, const struct sk_buff *skb,
676 const struct inet6_skb_parm *opt)
676{ 677{
677 const struct ipv6_pinfo *np = inet6_sk(sk); 678 const struct ipv6_pinfo *np = inet6_sk(sk);
678 const struct inet6_skb_parm *opt = IP6CB(skb);
679 679
680 if (np->rxopt.all) { 680 if (np->rxopt.all) {
681 if ((opt->hop && (np->rxopt.bits.hopopts || 681 if ((opt->hop && (np->rxopt.bits.hopopts ||
@@ -766,7 +766,7 @@ static int __net_init inet6_net_init(struct net *net)
766 net->ipv6.sysctl.icmpv6_time = 1*HZ; 766 net->ipv6.sysctl.icmpv6_time = 1*HZ;
767 net->ipv6.sysctl.flowlabel_consistency = 1; 767 net->ipv6.sysctl.flowlabel_consistency = 1;
768 net->ipv6.sysctl.auto_flowlabels = 0; 768 net->ipv6.sysctl.auto_flowlabels = 0;
769 atomic_set(&net->ipv6.rt_genid, 0); 769 atomic_set(&net->ipv6.fib6_sernum, 1);
770 770
771 err = ipv6_init_mibs(net); 771 err = ipv6_init_mibs(net);
772 if (err) 772 if (err)
diff --git a/net/ipv6/ah6.c b/net/ipv6/ah6.c
index 72a4930bdc0a..6d16eb0e0c7f 100644
--- a/net/ipv6/ah6.c
+++ b/net/ipv6/ah6.c
@@ -17,10 +17,10 @@
17 * Authors 17 * Authors
18 * 18 *
19 * Mitsuru KANDA @USAGI : IPv6 Support 19 * Mitsuru KANDA @USAGI : IPv6 Support
20 * Kazunori MIYAZAWA @USAGI : 20 * Kazunori MIYAZAWA @USAGI :
21 * Kunihiro Ishiguro <kunihiro@ipinfusion.com> 21 * Kunihiro Ishiguro <kunihiro@ipinfusion.com>
22 * 22 *
23 * This file is derived from net/ipv4/ah.c. 23 * This file is derived from net/ipv4/ah.c.
24 */ 24 */
25 25
26#define pr_fmt(fmt) "IPv6: " fmt 26#define pr_fmt(fmt) "IPv6: " fmt
@@ -284,7 +284,7 @@ static int ipv6_clear_mutable_options(struct ipv6hdr *iph, int len, int dir)
284 ipv6_rearrange_rthdr(iph, exthdr.rth); 284 ipv6_rearrange_rthdr(iph, exthdr.rth);
285 break; 285 break;
286 286
287 default : 287 default:
288 return 0; 288 return 0;
289 } 289 }
290 290
@@ -478,7 +478,7 @@ static void ah6_input_done(struct crypto_async_request *base, int err)
478 auth_data = ah_tmp_auth(work_iph, hdr_len); 478 auth_data = ah_tmp_auth(work_iph, hdr_len);
479 icv = ah_tmp_icv(ahp->ahash, auth_data, ahp->icv_trunc_len); 479 icv = ah_tmp_icv(ahp->ahash, auth_data, ahp->icv_trunc_len);
480 480
481 err = memcmp(icv, auth_data, ahp->icv_trunc_len) ? -EBADMSG: 0; 481 err = memcmp(icv, auth_data, ahp->icv_trunc_len) ? -EBADMSG : 0;
482 if (err) 482 if (err)
483 goto out; 483 goto out;
484 484
@@ -622,7 +622,7 @@ static int ah6_input(struct xfrm_state *x, struct sk_buff *skb)
622 goto out_free; 622 goto out_free;
623 } 623 }
624 624
625 err = memcmp(icv, auth_data, ahp->icv_trunc_len) ? -EBADMSG: 0; 625 err = memcmp(icv, auth_data, ahp->icv_trunc_len) ? -EBADMSG : 0;
626 if (err) 626 if (err)
627 goto out_free; 627 goto out_free;
628 628
@@ -647,8 +647,8 @@ static int ah6_err(struct sk_buff *skb, struct inet6_skb_parm *opt,
647 u8 type, u8 code, int offset, __be32 info) 647 u8 type, u8 code, int offset, __be32 info)
648{ 648{
649 struct net *net = dev_net(skb->dev); 649 struct net *net = dev_net(skb->dev);
650 struct ipv6hdr *iph = (struct ipv6hdr*)skb->data; 650 struct ipv6hdr *iph = (struct ipv6hdr *)skb->data;
651 struct ip_auth_hdr *ah = (struct ip_auth_hdr*)(skb->data+offset); 651 struct ip_auth_hdr *ah = (struct ip_auth_hdr *)(skb->data+offset);
652 struct xfrm_state *x; 652 struct xfrm_state *x;
653 653
654 if (type != ICMPV6_PKT_TOOBIG && 654 if (type != ICMPV6_PKT_TOOBIG &&
@@ -713,8 +713,6 @@ static int ah6_init_state(struct xfrm_state *x)
713 ahp->icv_full_len = aalg_desc->uinfo.auth.icv_fullbits/8; 713 ahp->icv_full_len = aalg_desc->uinfo.auth.icv_fullbits/8;
714 ahp->icv_trunc_len = x->aalg->alg_trunc_len/8; 714 ahp->icv_trunc_len = x->aalg->alg_trunc_len/8;
715 715
716 BUG_ON(ahp->icv_trunc_len > MAX_AH_AUTH_LEN);
717
718 x->props.header_len = XFRM_ALIGN8(sizeof(struct ip_auth_hdr) + 716 x->props.header_len = XFRM_ALIGN8(sizeof(struct ip_auth_hdr) +
719 ahp->icv_trunc_len); 717 ahp->icv_trunc_len);
720 switch (x->props.mode) { 718 switch (x->props.mode) {
@@ -755,11 +753,10 @@ static int ah6_rcv_cb(struct sk_buff *skb, int err)
755 return 0; 753 return 0;
756} 754}
757 755
758static const struct xfrm_type ah6_type = 756static const struct xfrm_type ah6_type = {
759{
760 .description = "AH6", 757 .description = "AH6",
761 .owner = THIS_MODULE, 758 .owner = THIS_MODULE,
762 .proto = IPPROTO_AH, 759 .proto = IPPROTO_AH,
763 .flags = XFRM_TYPE_REPLAY_PROT, 760 .flags = XFRM_TYPE_REPLAY_PROT,
764 .init_state = ah6_init_state, 761 .init_state = ah6_init_state,
765 .destructor = ah6_destroy, 762 .destructor = ah6_destroy,
diff --git a/net/ipv6/anycast.c b/net/ipv6/anycast.c
index 9a386842fd62..f5e319a8d4e2 100644
--- a/net/ipv6/anycast.c
+++ b/net/ipv6/anycast.c
@@ -46,10 +46,6 @@
46 46
47static int ipv6_dev_ac_dec(struct net_device *dev, const struct in6_addr *addr); 47static int ipv6_dev_ac_dec(struct net_device *dev, const struct in6_addr *addr);
48 48
49/* Big ac list lock for all the sockets */
50static DEFINE_SPINLOCK(ipv6_sk_ac_lock);
51
52
53/* 49/*
54 * socket join an anycast group 50 * socket join an anycast group
55 */ 51 */
@@ -78,7 +74,6 @@ int ipv6_sock_ac_join(struct sock *sk, int ifindex, const struct in6_addr *addr)
78 pac->acl_addr = *addr; 74 pac->acl_addr = *addr;
79 75
80 rtnl_lock(); 76 rtnl_lock();
81 rcu_read_lock();
82 if (ifindex == 0) { 77 if (ifindex == 0) {
83 struct rt6_info *rt; 78 struct rt6_info *rt;
84 79
@@ -91,11 +86,11 @@ int ipv6_sock_ac_join(struct sock *sk, int ifindex, const struct in6_addr *addr)
91 goto error; 86 goto error;
92 } else { 87 } else {
93 /* router, no matching interface: just pick one */ 88 /* router, no matching interface: just pick one */
94 dev = dev_get_by_flags_rcu(net, IFF_UP, 89 dev = __dev_get_by_flags(net, IFF_UP,
95 IFF_UP | IFF_LOOPBACK); 90 IFF_UP | IFF_LOOPBACK);
96 } 91 }
97 } else 92 } else
98 dev = dev_get_by_index_rcu(net, ifindex); 93 dev = __dev_get_by_index(net, ifindex);
99 94
100 if (dev == NULL) { 95 if (dev == NULL) {
101 err = -ENODEV; 96 err = -ENODEV;
@@ -127,17 +122,14 @@ int ipv6_sock_ac_join(struct sock *sk, int ifindex, const struct in6_addr *addr)
127 goto error; 122 goto error;
128 } 123 }
129 124
130 err = ipv6_dev_ac_inc(dev, addr); 125 err = __ipv6_dev_ac_inc(idev, addr);
131 if (!err) { 126 if (!err) {
132 spin_lock_bh(&ipv6_sk_ac_lock);
133 pac->acl_next = np->ipv6_ac_list; 127 pac->acl_next = np->ipv6_ac_list;
134 np->ipv6_ac_list = pac; 128 np->ipv6_ac_list = pac;
135 spin_unlock_bh(&ipv6_sk_ac_lock);
136 pac = NULL; 129 pac = NULL;
137 } 130 }
138 131
139error: 132error:
140 rcu_read_unlock();
141 rtnl_unlock(); 133 rtnl_unlock();
142 if (pac) 134 if (pac)
143 sock_kfree_s(sk, pac, sizeof(*pac)); 135 sock_kfree_s(sk, pac, sizeof(*pac));
@@ -154,7 +146,7 @@ int ipv6_sock_ac_drop(struct sock *sk, int ifindex, const struct in6_addr *addr)
154 struct ipv6_ac_socklist *pac, *prev_pac; 146 struct ipv6_ac_socklist *pac, *prev_pac;
155 struct net *net = sock_net(sk); 147 struct net *net = sock_net(sk);
156 148
157 spin_lock_bh(&ipv6_sk_ac_lock); 149 rtnl_lock();
158 prev_pac = NULL; 150 prev_pac = NULL;
159 for (pac = np->ipv6_ac_list; pac; pac = pac->acl_next) { 151 for (pac = np->ipv6_ac_list; pac; pac = pac->acl_next) {
160 if ((ifindex == 0 || pac->acl_ifindex == ifindex) && 152 if ((ifindex == 0 || pac->acl_ifindex == ifindex) &&
@@ -163,7 +155,7 @@ int ipv6_sock_ac_drop(struct sock *sk, int ifindex, const struct in6_addr *addr)
163 prev_pac = pac; 155 prev_pac = pac;
164 } 156 }
165 if (!pac) { 157 if (!pac) {
166 spin_unlock_bh(&ipv6_sk_ac_lock); 158 rtnl_unlock();
167 return -ENOENT; 159 return -ENOENT;
168 } 160 }
169 if (prev_pac) 161 if (prev_pac)
@@ -171,14 +163,9 @@ int ipv6_sock_ac_drop(struct sock *sk, int ifindex, const struct in6_addr *addr)
171 else 163 else
172 np->ipv6_ac_list = pac->acl_next; 164 np->ipv6_ac_list = pac->acl_next;
173 165
174 spin_unlock_bh(&ipv6_sk_ac_lock); 166 dev = __dev_get_by_index(net, pac->acl_ifindex);
175
176 rtnl_lock();
177 rcu_read_lock();
178 dev = dev_get_by_index_rcu(net, pac->acl_ifindex);
179 if (dev) 167 if (dev)
180 ipv6_dev_ac_dec(dev, &pac->acl_addr); 168 ipv6_dev_ac_dec(dev, &pac->acl_addr);
181 rcu_read_unlock();
182 rtnl_unlock(); 169 rtnl_unlock();
183 170
184 sock_kfree_s(sk, pac, sizeof(*pac)); 171 sock_kfree_s(sk, pac, sizeof(*pac));
@@ -196,19 +183,16 @@ void ipv6_sock_ac_close(struct sock *sk)
196 if (!np->ipv6_ac_list) 183 if (!np->ipv6_ac_list)
197 return; 184 return;
198 185
199 spin_lock_bh(&ipv6_sk_ac_lock); 186 rtnl_lock();
200 pac = np->ipv6_ac_list; 187 pac = np->ipv6_ac_list;
201 np->ipv6_ac_list = NULL; 188 np->ipv6_ac_list = NULL;
202 spin_unlock_bh(&ipv6_sk_ac_lock);
203 189
204 prev_index = 0; 190 prev_index = 0;
205 rtnl_lock();
206 rcu_read_lock();
207 while (pac) { 191 while (pac) {
208 struct ipv6_ac_socklist *next = pac->acl_next; 192 struct ipv6_ac_socklist *next = pac->acl_next;
209 193
210 if (pac->acl_ifindex != prev_index) { 194 if (pac->acl_ifindex != prev_index) {
211 dev = dev_get_by_index_rcu(net, pac->acl_ifindex); 195 dev = __dev_get_by_index(net, pac->acl_ifindex);
212 prev_index = pac->acl_ifindex; 196 prev_index = pac->acl_ifindex;
213 } 197 }
214 if (dev) 198 if (dev)
@@ -216,10 +200,14 @@ void ipv6_sock_ac_close(struct sock *sk)
216 sock_kfree_s(sk, pac, sizeof(*pac)); 200 sock_kfree_s(sk, pac, sizeof(*pac));
217 pac = next; 201 pac = next;
218 } 202 }
219 rcu_read_unlock();
220 rtnl_unlock(); 203 rtnl_unlock();
221} 204}
222 205
206static void aca_get(struct ifacaddr6 *aca)
207{
208 atomic_inc(&aca->aca_refcnt);
209}
210
223static void aca_put(struct ifacaddr6 *ac) 211static void aca_put(struct ifacaddr6 *ac)
224{ 212{
225 if (atomic_dec_and_test(&ac->aca_refcnt)) { 213 if (atomic_dec_and_test(&ac->aca_refcnt)) {
@@ -229,23 +217,40 @@ static void aca_put(struct ifacaddr6 *ac)
229 } 217 }
230} 218}
231 219
220static struct ifacaddr6 *aca_alloc(struct rt6_info *rt,
221 const struct in6_addr *addr)
222{
223 struct inet6_dev *idev = rt->rt6i_idev;
224 struct ifacaddr6 *aca;
225
226 aca = kzalloc(sizeof(*aca), GFP_ATOMIC);
227 if (aca == NULL)
228 return NULL;
229
230 aca->aca_addr = *addr;
231 in6_dev_hold(idev);
232 aca->aca_idev = idev;
233 aca->aca_rt = rt;
234 aca->aca_users = 1;
235 /* aca_tstamp should be updated upon changes */
236 aca->aca_cstamp = aca->aca_tstamp = jiffies;
237 atomic_set(&aca->aca_refcnt, 1);
238 spin_lock_init(&aca->aca_lock);
239
240 return aca;
241}
242
232/* 243/*
233 * device anycast group inc (add if not found) 244 * device anycast group inc (add if not found)
234 */ 245 */
235int ipv6_dev_ac_inc(struct net_device *dev, const struct in6_addr *addr) 246int __ipv6_dev_ac_inc(struct inet6_dev *idev, const struct in6_addr *addr)
236{ 247{
237 struct ifacaddr6 *aca; 248 struct ifacaddr6 *aca;
238 struct inet6_dev *idev;
239 struct rt6_info *rt; 249 struct rt6_info *rt;
240 int err; 250 int err;
241 251
242 ASSERT_RTNL(); 252 ASSERT_RTNL();
243 253
244 idev = in6_dev_get(dev);
245
246 if (idev == NULL)
247 return -EINVAL;
248
249 write_lock_bh(&idev->lock); 254 write_lock_bh(&idev->lock);
250 if (idev->dead) { 255 if (idev->dead) {
251 err = -ENODEV; 256 err = -ENODEV;
@@ -260,46 +265,35 @@ int ipv6_dev_ac_inc(struct net_device *dev, const struct in6_addr *addr)
260 } 265 }
261 } 266 }
262 267
263 /*
264 * not found: create a new one.
265 */
266
267 aca = kzalloc(sizeof(struct ifacaddr6), GFP_ATOMIC);
268
269 if (aca == NULL) {
270 err = -ENOMEM;
271 goto out;
272 }
273
274 rt = addrconf_dst_alloc(idev, addr, true); 268 rt = addrconf_dst_alloc(idev, addr, true);
275 if (IS_ERR(rt)) { 269 if (IS_ERR(rt)) {
276 kfree(aca);
277 err = PTR_ERR(rt); 270 err = PTR_ERR(rt);
278 goto out; 271 goto out;
279 } 272 }
280 273 aca = aca_alloc(rt, addr);
281 aca->aca_addr = *addr; 274 if (aca == NULL) {
282 aca->aca_idev = idev; 275 ip6_rt_put(rt);
283 aca->aca_rt = rt; 276 err = -ENOMEM;
284 aca->aca_users = 1; 277 goto out;
285 /* aca_tstamp should be updated upon changes */ 278 }
286 aca->aca_cstamp = aca->aca_tstamp = jiffies;
287 atomic_set(&aca->aca_refcnt, 2);
288 spin_lock_init(&aca->aca_lock);
289 279
290 aca->aca_next = idev->ac_list; 280 aca->aca_next = idev->ac_list;
291 idev->ac_list = aca; 281 idev->ac_list = aca;
282
283 /* Hold this for addrconf_join_solict() below before we unlock,
284 * it is already exposed via idev->ac_list.
285 */
286 aca_get(aca);
292 write_unlock_bh(&idev->lock); 287 write_unlock_bh(&idev->lock);
293 288
294 ip6_ins_rt(rt); 289 ip6_ins_rt(rt);
295 290
296 addrconf_join_solict(dev, &aca->aca_addr); 291 addrconf_join_solict(idev->dev, &aca->aca_addr);
297 292
298 aca_put(aca); 293 aca_put(aca);
299 return 0; 294 return 0;
300out: 295out:
301 write_unlock_bh(&idev->lock); 296 write_unlock_bh(&idev->lock);
302 in6_dev_put(idev);
303 return err; 297 return err;
304} 298}
305 299
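The reworked __ipv6_dev_ac_inc() creates the anycast address with a single reference owned by idev->ac_list and takes an explicit extra hold (aca_get) before dropping the lock, instead of the old trick of initialising the refcount to 2; the temporary hold is dropped once addrconf_join_solict() has run. The same discipline shown in isolation with a plain kref rather than the ifacaddr6 helpers:

#include <linux/kref.h>
#include <linux/list.h>
#include <linux/slab.h>
#include <linux/spinlock.h>

struct demo_obj {
	struct kref		ref;
	struct list_head	node;
};

static void demo_release(struct kref *ref)
{
	kfree(container_of(ref, struct demo_obj, ref));
}

static void demo_publish(struct demo_obj *obj, struct list_head *list,
			 spinlock_t *lock)
{
	kref_init(&obj->ref);		/* reference #1: owned by the list */

	spin_lock_bh(lock);
	list_add(&obj->node, list);
	kref_get(&obj->ref);		/* reference #2: for use after unlock */
	spin_unlock_bh(lock);

	/* ... work with obj while others may already see it on the list ... */

	kref_put(&obj->ref, demo_release);	/* drop the temporary hold */
}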
@@ -341,7 +335,7 @@ int __ipv6_dev_ac_dec(struct inet6_dev *idev, const struct in6_addr *addr)
341 return 0; 335 return 0;
342} 336}
343 337
344/* called with rcu_read_lock() */ 338/* called with rtnl_lock() */
345static int ipv6_dev_ac_dec(struct net_device *dev, const struct in6_addr *addr) 339static int ipv6_dev_ac_dec(struct net_device *dev, const struct in6_addr *addr)
346{ 340{
347 struct inet6_dev *idev = __in6_dev_get(dev); 341 struct inet6_dev *idev = __in6_dev_get(dev);
diff --git a/net/ipv6/datagram.c b/net/ipv6/datagram.c
index 2753319524f1..2cdc38338be3 100644
--- a/net/ipv6/datagram.c
+++ b/net/ipv6/datagram.c
@@ -43,13 +43,13 @@ static bool ipv6_mapped_addr_any(const struct in6_addr *a)
43int ip6_datagram_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len) 43int ip6_datagram_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len)
44{ 44{
45 struct sockaddr_in6 *usin = (struct sockaddr_in6 *) uaddr; 45 struct sockaddr_in6 *usin = (struct sockaddr_in6 *) uaddr;
46 struct inet_sock *inet = inet_sk(sk); 46 struct inet_sock *inet = inet_sk(sk);
47 struct ipv6_pinfo *np = inet6_sk(sk); 47 struct ipv6_pinfo *np = inet6_sk(sk);
48 struct in6_addr *daddr, *final_p, final; 48 struct in6_addr *daddr, *final_p, final;
49 struct dst_entry *dst; 49 struct dst_entry *dst;
50 struct flowi6 fl6; 50 struct flowi6 fl6;
51 struct ip6_flowlabel *flowlabel = NULL; 51 struct ip6_flowlabel *flowlabel = NULL;
52 struct ipv6_txoptions *opt; 52 struct ipv6_txoptions *opt;
53 int addr_type; 53 int addr_type;
54 int err; 54 int err;
55 55
@@ -332,7 +332,7 @@ int ipv6_recv_error(struct sock *sk, struct msghdr *msg, int len, int *addr_len)
332{ 332{
333 struct ipv6_pinfo *np = inet6_sk(sk); 333 struct ipv6_pinfo *np = inet6_sk(sk);
334 struct sock_exterr_skb *serr; 334 struct sock_exterr_skb *serr;
335 struct sk_buff *skb, *skb2; 335 struct sk_buff *skb;
336 DECLARE_SOCKADDR(struct sockaddr_in6 *, sin, msg->msg_name); 336 DECLARE_SOCKADDR(struct sockaddr_in6 *, sin, msg->msg_name);
337 struct { 337 struct {
338 struct sock_extended_err ee; 338 struct sock_extended_err ee;
@@ -342,7 +342,7 @@ int ipv6_recv_error(struct sock *sk, struct msghdr *msg, int len, int *addr_len)
342 int copied; 342 int copied;
343 343
344 err = -EAGAIN; 344 err = -EAGAIN;
345 skb = skb_dequeue(&sk->sk_error_queue); 345 skb = sock_dequeue_err_skb(sk);
346 if (skb == NULL) 346 if (skb == NULL)
347 goto out; 347 goto out;
348 348
@@ -415,17 +415,6 @@ int ipv6_recv_error(struct sock *sk, struct msghdr *msg, int len, int *addr_len)
415 msg->msg_flags |= MSG_ERRQUEUE; 415 msg->msg_flags |= MSG_ERRQUEUE;
416 err = copied; 416 err = copied;
417 417
418 /* Reset and regenerate socket error */
419 spin_lock_bh(&sk->sk_error_queue.lock);
420 sk->sk_err = 0;
421 if ((skb2 = skb_peek(&sk->sk_error_queue)) != NULL) {
422 sk->sk_err = SKB_EXT_ERR(skb2)->ee.ee_errno;
423 spin_unlock_bh(&sk->sk_error_queue.lock);
424 sk->sk_error_report(sk);
425 } else {
426 spin_unlock_bh(&sk->sk_error_queue.lock);
427 }
428
429out_free_skb: 418out_free_skb:
430 kfree_skb(skb); 419 kfree_skb(skb);
431out: 420out:
diff --git a/net/ipv6/esp6.c b/net/ipv6/esp6.c
index d15da1377149..83fc3a385a26 100644
--- a/net/ipv6/esp6.c
+++ b/net/ipv6/esp6.c
@@ -17,10 +17,10 @@
17 * Authors 17 * Authors
18 * 18 *
19 * Mitsuru KANDA @USAGI : IPv6 Support 19 * Mitsuru KANDA @USAGI : IPv6 Support
20 * Kazunori MIYAZAWA @USAGI : 20 * Kazunori MIYAZAWA @USAGI :
21 * Kunihiro Ishiguro <kunihiro@ipinfusion.com> 21 * Kunihiro Ishiguro <kunihiro@ipinfusion.com>
22 * 22 *
23 * This file is derived from net/ipv4/esp.c 23 * This file is derived from net/ipv4/esp.c
24 */ 24 */
25 25
26#define pr_fmt(fmt) "IPv6: " fmt 26#define pr_fmt(fmt) "IPv6: " fmt
@@ -598,7 +598,7 @@ static int esp6_init_state(struct xfrm_state *x)
598 case XFRM_MODE_BEET: 598 case XFRM_MODE_BEET:
599 if (x->sel.family != AF_INET6) 599 if (x->sel.family != AF_INET6)
600 x->props.header_len += IPV4_BEET_PHMAXLEN + 600 x->props.header_len += IPV4_BEET_PHMAXLEN +
601 (sizeof(struct ipv6hdr) - sizeof(struct iphdr)); 601 (sizeof(struct ipv6hdr) - sizeof(struct iphdr));
602 break; 602 break;
603 case XFRM_MODE_TRANSPORT: 603 case XFRM_MODE_TRANSPORT:
604 break; 604 break;
@@ -621,11 +621,10 @@ static int esp6_rcv_cb(struct sk_buff *skb, int err)
621 return 0; 621 return 0;
622} 622}
623 623
624static const struct xfrm_type esp6_type = 624static const struct xfrm_type esp6_type = {
625{
626 .description = "ESP6", 625 .description = "ESP6",
627 .owner = THIS_MODULE, 626 .owner = THIS_MODULE,
628 .proto = IPPROTO_ESP, 627 .proto = IPPROTO_ESP,
629 .flags = XFRM_TYPE_REPLAY_PROT, 628 .flags = XFRM_TYPE_REPLAY_PROT,
630 .init_state = esp6_init_state, 629 .init_state = esp6_init_state,
631 .destructor = esp6_destroy, 630 .destructor = esp6_destroy,
diff --git a/net/ipv6/exthdrs.c b/net/ipv6/exthdrs.c
index 8d67900aa003..bfde361b6134 100644
--- a/net/ipv6/exthdrs.c
+++ b/net/ipv6/exthdrs.c
@@ -142,7 +142,7 @@ static bool ip6_parse_tlv(const struct tlvtype_proc *procs, struct sk_buff *skb)
142 default: /* Other TLV code so scan list */ 142 default: /* Other TLV code so scan list */
143 if (optlen > len) 143 if (optlen > len)
144 goto bad; 144 goto bad;
145 for (curr=procs; curr->type >= 0; curr++) { 145 for (curr = procs; curr->type >= 0; curr++) {
146 if (curr->type == nh[off]) { 146 if (curr->type == nh[off]) {
147 /* type specific length/alignment 147 /* type specific length/alignment
148 checks will be performed in the 148 checks will be performed in the
diff --git a/net/ipv6/icmp.c b/net/ipv6/icmp.c
index 06ba3e58320b..97ae70077a4f 100644
--- a/net/ipv6/icmp.c
+++ b/net/ipv6/icmp.c
@@ -170,11 +170,11 @@ static bool is_ineligible(const struct sk_buff *skb)
170/* 170/*
171 * Check the ICMP output rate limit 171 * Check the ICMP output rate limit
172 */ 172 */
173static inline bool icmpv6_xrlim_allow(struct sock *sk, u8 type, 173static bool icmpv6_xrlim_allow(struct sock *sk, u8 type,
174 struct flowi6 *fl6) 174 struct flowi6 *fl6)
175{ 175{
176 struct dst_entry *dst;
177 struct net *net = sock_net(sk); 176 struct net *net = sock_net(sk);
177 struct dst_entry *dst;
178 bool res = false; 178 bool res = false;
179 179
180 /* Informational messages are not limited. */ 180 /* Informational messages are not limited. */
@@ -199,16 +199,20 @@ static inline bool icmpv6_xrlim_allow(struct sock *sk, u8 type,
199 } else { 199 } else {
200 struct rt6_info *rt = (struct rt6_info *)dst; 200 struct rt6_info *rt = (struct rt6_info *)dst;
201 int tmo = net->ipv6.sysctl.icmpv6_time; 201 int tmo = net->ipv6.sysctl.icmpv6_time;
202 struct inet_peer *peer;
203 202
204 /* Give more bandwidth to wider prefixes. */ 203 /* Give more bandwidth to wider prefixes. */
205 if (rt->rt6i_dst.plen < 128) 204 if (rt->rt6i_dst.plen < 128)
206 tmo >>= ((128 - rt->rt6i_dst.plen)>>5); 205 tmo >>= ((128 - rt->rt6i_dst.plen)>>5);
207 206
208 peer = inet_getpeer_v6(net->ipv6.peers, &rt->rt6i_dst.addr, 1); 207 if (icmp_global_allow()) {
209 res = inet_peer_xrlim_allow(peer, tmo); 208 struct inet_peer *peer;
210 if (peer) 209
211 inet_putpeer(peer); 210 peer = inet_getpeer_v6(net->ipv6.peers,
211 &rt->rt6i_dst.addr, 1);
212 res = inet_peer_xrlim_allow(peer, tmo);
213 if (peer)
214 inet_putpeer(peer);
215 }
212 } 216 }
213 dst_release(dst); 217 dst_release(dst);
214 return res; 218 return res;
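The per-destination inet_peer limiter is now consulted only after icmp_global_allow(), the host-wide ICMP output limiter introduced elsewhere in this series, so a burst of errors no longer hammers the peer cache. Conceptually such a limiter is a token bucket refilled at a fixed rate and capped at a burst size; a self-contained sketch under that assumption (not the in-tree implementation, and the names are invented):

#include <linux/jiffies.h>
#include <linux/spinlock.h>

static struct {
	spinlock_t	lock;
	unsigned long	stamp;		/* last refill time, in jiffies */
	u32		credit;		/* tokens currently available */
} demo_limit = {
	.lock	= __SPIN_LOCK_UNLOCKED(demo_limit.lock),
	.credit	= 50,
};

static bool demo_global_allow(u32 rate_per_sec, u32 burst)
{
	unsigned long now = jiffies;
	bool ok = false;
	u32 refill;

	spin_lock(&demo_limit.lock);
	refill = (u32)((now - demo_limit.stamp) * rate_per_sec / HZ);
	demo_limit.credit = min(demo_limit.credit + refill, burst);
	demo_limit.stamp = now;
	if (demo_limit.credit) {
		demo_limit.credit--;
		ok = true;
	}
	spin_unlock(&demo_limit.lock);
	return ok;
}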
@@ -503,7 +507,7 @@ static void icmp6_send(struct sk_buff *skb, u8 type, u8 code, __u32 info)
503 msg.type = type; 507 msg.type = type;
504 508
505 len = skb->len - msg.offset; 509 len = skb->len - msg.offset;
506 len = min_t(unsigned int, len, IPV6_MIN_MTU - sizeof(struct ipv6hdr) -sizeof(struct icmp6hdr)); 510 len = min_t(unsigned int, len, IPV6_MIN_MTU - sizeof(struct ipv6hdr) - sizeof(struct icmp6hdr));
507 if (len < 0) { 511 if (len < 0) {
508 LIMIT_NETDEBUG(KERN_DEBUG "icmp: len problem\n"); 512 LIMIT_NETDEBUG(KERN_DEBUG "icmp: len problem\n");
509 goto out_dst_release; 513 goto out_dst_release;
@@ -636,7 +640,7 @@ void icmpv6_notify(struct sk_buff *skb, u8 type, u8 code, __be32 info)
636 /* now skip over extension headers */ 640 /* now skip over extension headers */
637 inner_offset = ipv6_skip_exthdr(skb, sizeof(struct ipv6hdr), 641 inner_offset = ipv6_skip_exthdr(skb, sizeof(struct ipv6hdr),
638 &nexthdr, &frag_off); 642 &nexthdr, &frag_off);
639 if (inner_offset<0) 643 if (inner_offset < 0)
640 goto out; 644 goto out;
641 } else { 645 } else {
642 inner_offset = sizeof(struct ipv6hdr); 646 inner_offset = sizeof(struct ipv6hdr);
@@ -773,12 +777,12 @@ static int icmpv6_rcv(struct sk_buff *skb)
773 break; 777 break;
774 778
775 default: 779 default:
776 LIMIT_NETDEBUG(KERN_DEBUG "icmpv6: msg of unknown type\n");
777
778 /* informational */ 780 /* informational */
779 if (type & ICMPV6_INFOMSG_MASK) 781 if (type & ICMPV6_INFOMSG_MASK)
780 break; 782 break;
781 783
784 LIMIT_NETDEBUG(KERN_DEBUG "icmpv6: msg of unknown type\n");
785
782 /* 786 /*
783 * error of unknown type. 787 * error of unknown type.
784 * must pass to upper level 788 * must pass to upper level
@@ -808,7 +812,7 @@ void icmpv6_flow_init(struct sock *sk, struct flowi6 *fl6,
808 memset(fl6, 0, sizeof(*fl6)); 812 memset(fl6, 0, sizeof(*fl6));
809 fl6->saddr = *saddr; 813 fl6->saddr = *saddr;
810 fl6->daddr = *daddr; 814 fl6->daddr = *daddr;
811 fl6->flowi6_proto = IPPROTO_ICMPV6; 815 fl6->flowi6_proto = IPPROTO_ICMPV6;
812 fl6->fl6_icmp_type = type; 816 fl6->fl6_icmp_type = type;
813 fl6->fl6_icmp_code = 0; 817 fl6->fl6_icmp_code = 0;
814 fl6->flowi6_oif = oif; 818 fl6->flowi6_oif = oif;
@@ -875,8 +879,8 @@ static void __net_exit icmpv6_sk_exit(struct net *net)
875} 879}
876 880
877static struct pernet_operations icmpv6_sk_ops = { 881static struct pernet_operations icmpv6_sk_ops = {
878 .init = icmpv6_sk_init, 882 .init = icmpv6_sk_init,
879 .exit = icmpv6_sk_exit, 883 .exit = icmpv6_sk_exit,
880}; 884};
881 885
882int __init icmpv6_init(void) 886int __init icmpv6_init(void)
diff --git a/net/ipv6/inet6_connection_sock.c b/net/ipv6/inet6_connection_sock.c
index a245e5ddffbd..29b32206e494 100644
--- a/net/ipv6/inet6_connection_sock.c
+++ b/net/ipv6/inet6_connection_sock.c
@@ -63,7 +63,6 @@ int inet6_csk_bind_conflict(const struct sock *sk,
63 63
64 return sk2 != NULL; 64 return sk2 != NULL;
65} 65}
66
67EXPORT_SYMBOL_GPL(inet6_csk_bind_conflict); 66EXPORT_SYMBOL_GPL(inet6_csk_bind_conflict);
68 67
69struct dst_entry *inet6_csk_route_req(struct sock *sk, 68struct dst_entry *inet6_csk_route_req(struct sock *sk,
@@ -144,7 +143,6 @@ struct request_sock *inet6_csk_search_req(const struct sock *sk,
144 143
145 return NULL; 144 return NULL;
146} 145}
147
148EXPORT_SYMBOL_GPL(inet6_csk_search_req); 146EXPORT_SYMBOL_GPL(inet6_csk_search_req);
149 147
150void inet6_csk_reqsk_queue_hash_add(struct sock *sk, 148void inet6_csk_reqsk_queue_hash_add(struct sock *sk,
@@ -160,10 +158,9 @@ void inet6_csk_reqsk_queue_hash_add(struct sock *sk,
160 reqsk_queue_hash_req(&icsk->icsk_accept_queue, h, req, timeout); 158 reqsk_queue_hash_req(&icsk->icsk_accept_queue, h, req, timeout);
161 inet_csk_reqsk_queue_added(sk, timeout); 159 inet_csk_reqsk_queue_added(sk, timeout);
162} 160}
163
164EXPORT_SYMBOL_GPL(inet6_csk_reqsk_queue_hash_add); 161EXPORT_SYMBOL_GPL(inet6_csk_reqsk_queue_hash_add);
165 162
166void inet6_csk_addr2sockaddr(struct sock *sk, struct sockaddr * uaddr) 163void inet6_csk_addr2sockaddr(struct sock *sk, struct sockaddr *uaddr)
167{ 164{
168 struct sockaddr_in6 *sin6 = (struct sockaddr_in6 *) uaddr; 165 struct sockaddr_in6 *sin6 = (struct sockaddr_in6 *) uaddr;
169 166
@@ -175,7 +172,6 @@ void inet6_csk_addr2sockaddr(struct sock *sk, struct sockaddr * uaddr)
175 sin6->sin6_scope_id = ipv6_iface_scope_id(&sin6->sin6_addr, 172 sin6->sin6_scope_id = ipv6_iface_scope_id(&sin6->sin6_addr,
176 sk->sk_bound_dev_if); 173 sk->sk_bound_dev_if);
177} 174}
178
179EXPORT_SYMBOL_GPL(inet6_csk_addr2sockaddr); 175EXPORT_SYMBOL_GPL(inet6_csk_addr2sockaddr);
180 176
181static inline 177static inline
diff --git a/net/ipv6/inet6_hashtables.c b/net/ipv6/inet6_hashtables.c
index 262e13c02ec2..051dffb49c90 100644
--- a/net/ipv6/inet6_hashtables.c
+++ b/net/ipv6/inet6_hashtables.c
@@ -6,7 +6,7 @@
6 * Generic INET6 transport hashtables 6 * Generic INET6 transport hashtables
7 * 7 *
8 * Authors: Lotsa people, from code originally in tcp, generalised here 8 * Authors: Lotsa people, from code originally in tcp, generalised here
9 * by Arnaldo Carvalho de Melo <acme@mandriva.com> 9 * by Arnaldo Carvalho de Melo <acme@mandriva.com>
10 * 10 *
11 * This program is free software; you can redistribute it and/or 11 * This program is free software; you can redistribute it and/or
12 * modify it under the terms of the GNU General Public License 12 * modify it under the terms of the GNU General Public License
@@ -198,7 +198,7 @@ begin:
198 } 198 }
199 } else if (score == hiscore && reuseport) { 199 } else if (score == hiscore && reuseport) {
200 matches++; 200 matches++;
201 if (((u64)phash * matches) >> 32 == 0) 201 if (reciprocal_scale(phash, matches) == 0)
202 result = sk; 202 result = sk;
203 phash = next_pseudo_random32(phash); 203 phash = next_pseudo_random32(phash);
204 } 204 }
@@ -222,7 +222,6 @@ begin:
222 rcu_read_unlock(); 222 rcu_read_unlock();
223 return result; 223 return result;
224} 224}
225
226EXPORT_SYMBOL_GPL(inet6_lookup_listener); 225EXPORT_SYMBOL_GPL(inet6_lookup_listener);
227 226
228struct sock *inet6_lookup(struct net *net, struct inet_hashinfo *hashinfo, 227struct sock *inet6_lookup(struct net *net, struct inet_hashinfo *hashinfo,
@@ -238,7 +237,6 @@ struct sock *inet6_lookup(struct net *net, struct inet_hashinfo *hashinfo,
238 237
239 return sk; 238 return sk;
240} 239}
241
242EXPORT_SYMBOL_GPL(inet6_lookup); 240EXPORT_SYMBOL_GPL(inet6_lookup);
243 241
244static int __inet6_check_established(struct inet_timewait_death_row *death_row, 242static int __inet6_check_established(struct inet_timewait_death_row *death_row,
@@ -324,5 +322,4 @@ int inet6_hash_connect(struct inet_timewait_death_row *death_row,
324 return __inet_hash_connect(death_row, sk, inet6_sk_port_offset(sk), 322 return __inet_hash_connect(death_row, sk, inet6_sk_port_offset(sk),
325 __inet6_check_established, __inet6_hash); 323 __inet6_check_established, __inet6_hash);
326} 324}
327
328EXPORT_SYMBOL_GPL(inet6_hash_connect); 325EXPORT_SYMBOL_GPL(inet6_hash_connect);
diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c
index 97b9fa8de377..b2d1838897c9 100644
--- a/net/ipv6/ip6_fib.c
+++ b/net/ipv6/ip6_fib.c
@@ -46,20 +46,11 @@
46 46
47static struct kmem_cache *fib6_node_kmem __read_mostly; 47static struct kmem_cache *fib6_node_kmem __read_mostly;
48 48
49enum fib_walk_state_t { 49struct fib6_cleaner {
50#ifdef CONFIG_IPV6_SUBTREES 50 struct fib6_walker w;
51 FWS_S,
52#endif
53 FWS_L,
54 FWS_R,
55 FWS_C,
56 FWS_U
57};
58
59struct fib6_cleaner_t {
60 struct fib6_walker_t w;
61 struct net *net; 51 struct net *net;
62 int (*func)(struct rt6_info *, void *arg); 52 int (*func)(struct rt6_info *, void *arg);
53 int sernum;
63 void *arg; 54 void *arg;
64}; 55};
65 56
@@ -74,8 +65,8 @@ static DEFINE_RWLOCK(fib6_walker_lock);
74static void fib6_prune_clones(struct net *net, struct fib6_node *fn); 65static void fib6_prune_clones(struct net *net, struct fib6_node *fn);
75static struct rt6_info *fib6_find_prefix(struct net *net, struct fib6_node *fn); 66static struct rt6_info *fib6_find_prefix(struct net *net, struct fib6_node *fn);
76static struct fib6_node *fib6_repair_tree(struct net *net, struct fib6_node *fn); 67static struct fib6_node *fib6_repair_tree(struct net *net, struct fib6_node *fn);
77static int fib6_walk(struct fib6_walker_t *w); 68static int fib6_walk(struct fib6_walker *w);
78static int fib6_walk_continue(struct fib6_walker_t *w); 69static int fib6_walk_continue(struct fib6_walker *w);
79 70
80/* 71/*
81 * A routing update causes an increase of the serial number on the 72 * A routing update causes an increase of the serial number on the
@@ -84,34 +75,41 @@ static int fib6_walk_continue(struct fib6_walker_t *w);
84 * result of redirects, path MTU changes, etc. 75 * result of redirects, path MTU changes, etc.
85 */ 76 */
86 77
87static __u32 rt_sernum;
88
89static void fib6_gc_timer_cb(unsigned long arg); 78static void fib6_gc_timer_cb(unsigned long arg);
90 79
91static LIST_HEAD(fib6_walkers); 80static LIST_HEAD(fib6_walkers);
92#define FOR_WALKERS(w) list_for_each_entry(w, &fib6_walkers, lh) 81#define FOR_WALKERS(w) list_for_each_entry(w, &fib6_walkers, lh)
93 82
94static inline void fib6_walker_link(struct fib6_walker_t *w) 83static void fib6_walker_link(struct fib6_walker *w)
95{ 84{
96 write_lock_bh(&fib6_walker_lock); 85 write_lock_bh(&fib6_walker_lock);
97 list_add(&w->lh, &fib6_walkers); 86 list_add(&w->lh, &fib6_walkers);
98 write_unlock_bh(&fib6_walker_lock); 87 write_unlock_bh(&fib6_walker_lock);
99} 88}
100 89
101static inline void fib6_walker_unlink(struct fib6_walker_t *w) 90static void fib6_walker_unlink(struct fib6_walker *w)
102{ 91{
103 write_lock_bh(&fib6_walker_lock); 92 write_lock_bh(&fib6_walker_lock);
104 list_del(&w->lh); 93 list_del(&w->lh);
105 write_unlock_bh(&fib6_walker_lock); 94 write_unlock_bh(&fib6_walker_lock);
106} 95}
107static __inline__ u32 fib6_new_sernum(void) 96
97static int fib6_new_sernum(struct net *net)
108{ 98{
109 u32 n = ++rt_sernum; 99 int new, old;
110 if ((__s32)n <= 0) 100
111 rt_sernum = n = 1; 101 do {
112 return n; 102 old = atomic_read(&net->ipv6.fib6_sernum);
103 new = old < INT_MAX ? old + 1 : 1;
104 } while (atomic_cmpxchg(&net->ipv6.fib6_sernum,
105 old, new) != old);
106 return new;
113} 107}
114 108
109enum {
110 FIB6_NO_SERNUM_CHANGE = 0,
111};
112
115/* 113/*
116 * Auxiliary address test functions for the radix tree. 114 * Auxiliary address test functions for the radix tree.
117 * 115 *
@@ -128,7 +126,7 @@ static __inline__ u32 fib6_new_sernum(void)
128# define BITOP_BE32_SWIZZLE 0 126# define BITOP_BE32_SWIZZLE 0
129#endif 127#endif
130 128
131static __inline__ __be32 addr_bit_set(const void *token, int fn_bit) 129static __be32 addr_bit_set(const void *token, int fn_bit)
132{ 130{
133 const __be32 *addr = token; 131 const __be32 *addr = token;
134 /* 132 /*
@@ -142,7 +140,7 @@ static __inline__ __be32 addr_bit_set(const void *token, int fn_bit)
142 addr[fn_bit >> 5]; 140 addr[fn_bit >> 5];
143} 141}
144 142
145static __inline__ struct fib6_node *node_alloc(void) 143static struct fib6_node *node_alloc(void)
146{ 144{
147 struct fib6_node *fn; 145 struct fib6_node *fn;
148 146
@@ -151,12 +149,12 @@ static __inline__ struct fib6_node *node_alloc(void)
151 return fn; 149 return fn;
152} 150}
153 151
154static __inline__ void node_free(struct fib6_node *fn) 152static void node_free(struct fib6_node *fn)
155{ 153{
156 kmem_cache_free(fib6_node_kmem, fn); 154 kmem_cache_free(fib6_node_kmem, fn);
157} 155}
158 156
159static __inline__ void rt6_release(struct rt6_info *rt) 157static void rt6_release(struct rt6_info *rt)
160{ 158{
161 if (atomic_dec_and_test(&rt->rt6i_ref)) 159 if (atomic_dec_and_test(&rt->rt6i_ref))
162 dst_free(&rt->dst); 160 dst_free(&rt->dst);
@@ -267,7 +265,7 @@ static void __net_init fib6_tables_init(struct net *net)
267 265
268#endif 266#endif
269 267
270static int fib6_dump_node(struct fib6_walker_t *w) 268static int fib6_dump_node(struct fib6_walker *w)
271{ 269{
272 int res; 270 int res;
273 struct rt6_info *rt; 271 struct rt6_info *rt;
@@ -287,7 +285,7 @@ static int fib6_dump_node(struct fib6_walker_t *w)
287 285
288static void fib6_dump_end(struct netlink_callback *cb) 286static void fib6_dump_end(struct netlink_callback *cb)
289{ 287{
290 struct fib6_walker_t *w = (void *)cb->args[2]; 288 struct fib6_walker *w = (void *)cb->args[2];
291 289
292 if (w) { 290 if (w) {
293 if (cb->args[4]) { 291 if (cb->args[4]) {
@@ -310,7 +308,7 @@ static int fib6_dump_done(struct netlink_callback *cb)
310static int fib6_dump_table(struct fib6_table *table, struct sk_buff *skb, 308static int fib6_dump_table(struct fib6_table *table, struct sk_buff *skb,
311 struct netlink_callback *cb) 309 struct netlink_callback *cb)
312{ 310{
313 struct fib6_walker_t *w; 311 struct fib6_walker *w;
314 int res; 312 int res;
315 313
316 w = (void *)cb->args[2]; 314 w = (void *)cb->args[2];
@@ -355,7 +353,7 @@ static int inet6_dump_fib(struct sk_buff *skb, struct netlink_callback *cb)
355 unsigned int h, s_h; 353 unsigned int h, s_h;
356 unsigned int e = 0, s_e; 354 unsigned int e = 0, s_e;
357 struct rt6_rtnl_dump_arg arg; 355 struct rt6_rtnl_dump_arg arg;
358 struct fib6_walker_t *w; 356 struct fib6_walker *w;
359 struct fib6_table *tb; 357 struct fib6_table *tb;
360 struct hlist_head *head; 358 struct hlist_head *head;
361 int res = 0; 359 int res = 0;
@@ -423,14 +421,13 @@ out:
423static struct fib6_node *fib6_add_1(struct fib6_node *root, 421static struct fib6_node *fib6_add_1(struct fib6_node *root,
424 struct in6_addr *addr, int plen, 422 struct in6_addr *addr, int plen,
425 int offset, int allow_create, 423 int offset, int allow_create,
426 int replace_required) 424 int replace_required, int sernum)
427{ 425{
428 struct fib6_node *fn, *in, *ln; 426 struct fib6_node *fn, *in, *ln;
429 struct fib6_node *pn = NULL; 427 struct fib6_node *pn = NULL;
430 struct rt6key *key; 428 struct rt6key *key;
431 int bit; 429 int bit;
432 __be32 dir = 0; 430 __be32 dir = 0;
433 __u32 sernum = fib6_new_sernum();
434 431
435 RT6_TRACE("fib6_add_1\n"); 432 RT6_TRACE("fib6_add_1\n");
436 433
@@ -627,7 +624,7 @@ insert_above:
627 return ln; 624 return ln;
628} 625}
629 626
630static inline bool rt6_qualify_for_ecmp(struct rt6_info *rt) 627static bool rt6_qualify_for_ecmp(struct rt6_info *rt)
631{ 628{
632 return (rt->rt6i_flags & (RTF_GATEWAY|RTF_ADDRCONF|RTF_DYNAMIC)) == 629 return (rt->rt6i_flags & (RTF_GATEWAY|RTF_ADDRCONF|RTF_DYNAMIC)) ==
633 RTF_GATEWAY; 630 RTF_GATEWAY;
@@ -820,7 +817,7 @@ add:
820 return 0; 817 return 0;
821} 818}
822 819
823static __inline__ void fib6_start_gc(struct net *net, struct rt6_info *rt) 820static void fib6_start_gc(struct net *net, struct rt6_info *rt)
824{ 821{
825 if (!timer_pending(&net->ipv6.ip6_fib_timer) && 822 if (!timer_pending(&net->ipv6.ip6_fib_timer) &&
826 (rt->rt6i_flags & (RTF_EXPIRES | RTF_CACHE))) 823 (rt->rt6i_flags & (RTF_EXPIRES | RTF_CACHE)))
@@ -848,6 +845,7 @@ int fib6_add(struct fib6_node *root, struct rt6_info *rt, struct nl_info *info,
848 int err = -ENOMEM; 845 int err = -ENOMEM;
849 int allow_create = 1; 846 int allow_create = 1;
850 int replace_required = 0; 847 int replace_required = 0;
848 int sernum = fib6_new_sernum(info->nl_net);
851 849
852 if (info->nlh) { 850 if (info->nlh) {
853 if (!(info->nlh->nlmsg_flags & NLM_F_CREATE)) 851 if (!(info->nlh->nlmsg_flags & NLM_F_CREATE))
@@ -860,7 +858,7 @@ int fib6_add(struct fib6_node *root, struct rt6_info *rt, struct nl_info *info,
860 858
861 fn = fib6_add_1(root, &rt->rt6i_dst.addr, rt->rt6i_dst.plen, 859 fn = fib6_add_1(root, &rt->rt6i_dst.addr, rt->rt6i_dst.plen,
862 offsetof(struct rt6_info, rt6i_dst), allow_create, 860 offsetof(struct rt6_info, rt6i_dst), allow_create,
863 replace_required); 861 replace_required, sernum);
864 if (IS_ERR(fn)) { 862 if (IS_ERR(fn)) {
865 err = PTR_ERR(fn); 863 err = PTR_ERR(fn);
866 fn = NULL; 864 fn = NULL;
@@ -894,14 +892,14 @@ int fib6_add(struct fib6_node *root, struct rt6_info *rt, struct nl_info *info,
894 sfn->leaf = info->nl_net->ipv6.ip6_null_entry; 892 sfn->leaf = info->nl_net->ipv6.ip6_null_entry;
895 atomic_inc(&info->nl_net->ipv6.ip6_null_entry->rt6i_ref); 893 atomic_inc(&info->nl_net->ipv6.ip6_null_entry->rt6i_ref);
896 sfn->fn_flags = RTN_ROOT; 894 sfn->fn_flags = RTN_ROOT;
897 sfn->fn_sernum = fib6_new_sernum(); 895 sfn->fn_sernum = sernum;
898 896
899 /* Now add the first leaf node to new subtree */ 897 /* Now add the first leaf node to new subtree */
900 898
901 sn = fib6_add_1(sfn, &rt->rt6i_src.addr, 899 sn = fib6_add_1(sfn, &rt->rt6i_src.addr,
902 rt->rt6i_src.plen, 900 rt->rt6i_src.plen,
903 offsetof(struct rt6_info, rt6i_src), 901 offsetof(struct rt6_info, rt6i_src),
904 allow_create, replace_required); 902 allow_create, replace_required, sernum);
905 903
906 if (IS_ERR(sn)) { 904 if (IS_ERR(sn)) {
907 /* If it is failed, discard just allocated 905 /* If it is failed, discard just allocated
@@ -920,7 +918,7 @@ int fib6_add(struct fib6_node *root, struct rt6_info *rt, struct nl_info *info,
920 sn = fib6_add_1(fn->subtree, &rt->rt6i_src.addr, 918 sn = fib6_add_1(fn->subtree, &rt->rt6i_src.addr,
921 rt->rt6i_src.plen, 919 rt->rt6i_src.plen,
922 offsetof(struct rt6_info, rt6i_src), 920 offsetof(struct rt6_info, rt6i_src),
923 allow_create, replace_required); 921 allow_create, replace_required, sernum);
924 922
925 if (IS_ERR(sn)) { 923 if (IS_ERR(sn)) {
926 err = PTR_ERR(sn); 924 err = PTR_ERR(sn);
@@ -1174,7 +1172,7 @@ static struct fib6_node *fib6_repair_tree(struct net *net,
1174 int children; 1172 int children;
1175 int nstate; 1173 int nstate;
1176 struct fib6_node *child, *pn; 1174 struct fib6_node *child, *pn;
1177 struct fib6_walker_t *w; 1175 struct fib6_walker *w;
1178 int iter = 0; 1176 int iter = 0;
1179 1177
1180 for (;;) { 1178 for (;;) {
@@ -1276,7 +1274,7 @@ static struct fib6_node *fib6_repair_tree(struct net *net,
1276static void fib6_del_route(struct fib6_node *fn, struct rt6_info **rtp, 1274static void fib6_del_route(struct fib6_node *fn, struct rt6_info **rtp,
1277 struct nl_info *info) 1275 struct nl_info *info)
1278{ 1276{
1279 struct fib6_walker_t *w; 1277 struct fib6_walker *w;
1280 struct rt6_info *rt = *rtp; 1278 struct rt6_info *rt = *rtp;
1281 struct net *net = info->nl_net; 1279 struct net *net = info->nl_net;
1282 1280
@@ -1414,7 +1412,7 @@ int fib6_del(struct rt6_info *rt, struct nl_info *info)
1414 * <0 -> walk is terminated by an error. 1412 * <0 -> walk is terminated by an error.
1415 */ 1413 */
1416 1414
1417static int fib6_walk_continue(struct fib6_walker_t *w) 1415static int fib6_walk_continue(struct fib6_walker *w)
1418{ 1416{
1419 struct fib6_node *fn, *pn; 1417 struct fib6_node *fn, *pn;
1420 1418
@@ -1498,7 +1496,7 @@ skip:
1498 } 1496 }
1499} 1497}
1500 1498
1501static int fib6_walk(struct fib6_walker_t *w) 1499static int fib6_walk(struct fib6_walker *w)
1502{ 1500{
1503 int res; 1501 int res;
1504 1502
@@ -1512,15 +1510,25 @@ static int fib6_walk(struct fib6_walker_t *w)
1512 return res; 1510 return res;
1513} 1511}
1514 1512
1515static int fib6_clean_node(struct fib6_walker_t *w) 1513static int fib6_clean_node(struct fib6_walker *w)
1516{ 1514{
1517 int res; 1515 int res;
1518 struct rt6_info *rt; 1516 struct rt6_info *rt;
1519 struct fib6_cleaner_t *c = container_of(w, struct fib6_cleaner_t, w); 1517 struct fib6_cleaner *c = container_of(w, struct fib6_cleaner, w);
1520 struct nl_info info = { 1518 struct nl_info info = {
1521 .nl_net = c->net, 1519 .nl_net = c->net,
1522 }; 1520 };
1523 1521
1522 if (c->sernum != FIB6_NO_SERNUM_CHANGE &&
1523 w->node->fn_sernum != c->sernum)
1524 w->node->fn_sernum = c->sernum;
1525
1526 if (!c->func) {
1527 WARN_ON_ONCE(c->sernum == FIB6_NO_SERNUM_CHANGE);
1528 w->leaf = NULL;
1529 return 0;
1530 }
1531
1524 for (rt = w->leaf; rt; rt = rt->dst.rt6_next) { 1532 for (rt = w->leaf; rt; rt = rt->dst.rt6_next) {
1525 res = c->func(rt, c->arg); 1533 res = c->func(rt, c->arg);
1526 if (res < 0) { 1534 if (res < 0) {
@@ -1554,9 +1562,9 @@ static int fib6_clean_node(struct fib6_walker_t *w)
1554 1562
1555static void fib6_clean_tree(struct net *net, struct fib6_node *root, 1563static void fib6_clean_tree(struct net *net, struct fib6_node *root,
1556 int (*func)(struct rt6_info *, void *arg), 1564 int (*func)(struct rt6_info *, void *arg),
1557 int prune, void *arg) 1565 bool prune, int sernum, void *arg)
1558{ 1566{
1559 struct fib6_cleaner_t c; 1567 struct fib6_cleaner c;
1560 1568
1561 c.w.root = root; 1569 c.w.root = root;
1562 c.w.func = fib6_clean_node; 1570 c.w.func = fib6_clean_node;
@@ -1564,14 +1572,16 @@ static void fib6_clean_tree(struct net *net, struct fib6_node *root,
1564 c.w.count = 0; 1572 c.w.count = 0;
1565 c.w.skip = 0; 1573 c.w.skip = 0;
1566 c.func = func; 1574 c.func = func;
1575 c.sernum = sernum;
1567 c.arg = arg; 1576 c.arg = arg;
1568 c.net = net; 1577 c.net = net;
1569 1578
1570 fib6_walk(&c.w); 1579 fib6_walk(&c.w);
1571} 1580}
1572 1581
1573void fib6_clean_all(struct net *net, int (*func)(struct rt6_info *, void *arg), 1582static void __fib6_clean_all(struct net *net,
1574 void *arg) 1583 int (*func)(struct rt6_info *, void *),
1584 int sernum, void *arg)
1575{ 1585{
1576 struct fib6_table *table; 1586 struct fib6_table *table;
1577 struct hlist_head *head; 1587 struct hlist_head *head;
@@ -1583,13 +1593,19 @@ void fib6_clean_all(struct net *net, int (*func)(struct rt6_info *, void *arg),
1583 hlist_for_each_entry_rcu(table, head, tb6_hlist) { 1593 hlist_for_each_entry_rcu(table, head, tb6_hlist) {
1584 write_lock_bh(&table->tb6_lock); 1594 write_lock_bh(&table->tb6_lock);
1585 fib6_clean_tree(net, &table->tb6_root, 1595 fib6_clean_tree(net, &table->tb6_root,
1586 func, 0, arg); 1596 func, false, sernum, arg);
1587 write_unlock_bh(&table->tb6_lock); 1597 write_unlock_bh(&table->tb6_lock);
1588 } 1598 }
1589 } 1599 }
1590 rcu_read_unlock(); 1600 rcu_read_unlock();
1591} 1601}
1592 1602
1603void fib6_clean_all(struct net *net, int (*func)(struct rt6_info *, void *),
1604 void *arg)
1605{
1606 __fib6_clean_all(net, func, FIB6_NO_SERNUM_CHANGE, arg);
1607}
1608
1593static int fib6_prune_clone(struct rt6_info *rt, void *arg) 1609static int fib6_prune_clone(struct rt6_info *rt, void *arg)
1594{ 1610{
1595 if (rt->rt6i_flags & RTF_CACHE) { 1611 if (rt->rt6i_flags & RTF_CACHE) {
@@ -1602,25 +1618,15 @@ static int fib6_prune_clone(struct rt6_info *rt, void *arg)
1602 1618
1603static void fib6_prune_clones(struct net *net, struct fib6_node *fn) 1619static void fib6_prune_clones(struct net *net, struct fib6_node *fn)
1604{ 1620{
1605 fib6_clean_tree(net, fn, fib6_prune_clone, 1, NULL); 1621 fib6_clean_tree(net, fn, fib6_prune_clone, true,
1606} 1622 FIB6_NO_SERNUM_CHANGE, NULL);
1607
1608static int fib6_update_sernum(struct rt6_info *rt, void *arg)
1609{
1610 __u32 sernum = *(__u32 *)arg;
1611
1612 if (rt->rt6i_node &&
1613 rt->rt6i_node->fn_sernum != sernum)
1614 rt->rt6i_node->fn_sernum = sernum;
1615
1616 return 0;
1617} 1623}
1618 1624
1619static void fib6_flush_trees(struct net *net) 1625static void fib6_flush_trees(struct net *net)
1620{ 1626{
1621 __u32 new_sernum = fib6_new_sernum(); 1627 int new_sernum = fib6_new_sernum(net);
1622 1628
1623 fib6_clean_all(net, fib6_update_sernum, &new_sernum); 1629 __fib6_clean_all(net, NULL, new_sernum, NULL);
1624} 1630}
1625 1631
1626/* 1632/*
@@ -1828,10 +1834,10 @@ void fib6_gc_cleanup(void)
1828 1834
1829struct ipv6_route_iter { 1835struct ipv6_route_iter {
1830 struct seq_net_private p; 1836 struct seq_net_private p;
1831 struct fib6_walker_t w; 1837 struct fib6_walker w;
1832 loff_t skip; 1838 loff_t skip;
1833 struct fib6_table *tbl; 1839 struct fib6_table *tbl;
1834 __u32 sernum; 1840 int sernum;
1835}; 1841};
1836 1842
1837static int ipv6_route_seq_show(struct seq_file *seq, void *v) 1843static int ipv6_route_seq_show(struct seq_file *seq, void *v)
@@ -1859,7 +1865,7 @@ static int ipv6_route_seq_show(struct seq_file *seq, void *v)
1859 return 0; 1865 return 0;
1860} 1866}
1861 1867
1862static int ipv6_route_yield(struct fib6_walker_t *w) 1868static int ipv6_route_yield(struct fib6_walker *w)
1863{ 1869{
1864 struct ipv6_route_iter *iter = w->args; 1870 struct ipv6_route_iter *iter = w->args;
1865 1871
@@ -1980,7 +1986,7 @@ static void *ipv6_route_seq_start(struct seq_file *seq, loff_t *pos)
1980 1986
1981static bool ipv6_route_iter_active(struct ipv6_route_iter *iter) 1987static bool ipv6_route_iter_active(struct ipv6_route_iter *iter)
1982{ 1988{
1983 struct fib6_walker_t *w = &iter->w; 1989 struct fib6_walker *w = &iter->w;
1984 return w->node && !(w->state == FWS_U && w->node == w->root); 1990 return w->node && !(w->state == FWS_U && w->node == w->root);
1985} 1991}
1986 1992
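The ip6_fib.c changes above retire the global rt_sernum counter: each netns now carries ipv6.fib6_sernum, bumped lock-free by fib6_new_sernum() with an atomic_cmpxchg() loop that wraps from INT_MAX back to 1, so 0 stays reserved for FIB6_NO_SERNUM_CHANGE and fib6_clean_node() can treat it as "walk the tree but leave fn_sernum alone". A minimal user-space sketch of the same wraparound pattern, written with C11 atomics instead of the kernel's atomic_t helpers (all names here are illustrative, not kernel API):

#include <limits.h>
#include <stdatomic.h>
#include <stdio.h>

static _Atomic int sernum;	/* 0 plays the role of FIB6_NO_SERNUM_CHANGE */

static int new_sernum(void)
{
	int old, new;

	do {
		old = atomic_load(&sernum);
		new = old < INT_MAX ? old + 1 : 1;	/* wrap to 1, never back to 0 */
	} while (!atomic_compare_exchange_weak(&sernum, &old, new));

	return new;
}

int main(void)
{
	for (int i = 0; i < 3; i++)
		printf("%d\n", new_sernum());	/* prints 1 2 3 */
	return 0;
}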
diff --git a/net/ipv6/ip6_flowlabel.c b/net/ipv6/ip6_flowlabel.c
index 4052694c6f2c..3dd7d4ebd7cd 100644
--- a/net/ipv6/ip6_flowlabel.c
+++ b/net/ipv6/ip6_flowlabel.c
@@ -136,7 +136,7 @@ static void ip6_fl_gc(unsigned long dummy)
136 136
137 spin_lock(&ip6_fl_lock); 137 spin_lock(&ip6_fl_lock);
138 138
139 for (i=0; i<=FL_HASH_MASK; i++) { 139 for (i = 0; i <= FL_HASH_MASK; i++) {
140 struct ip6_flowlabel *fl; 140 struct ip6_flowlabel *fl;
141 struct ip6_flowlabel __rcu **flp; 141 struct ip6_flowlabel __rcu **flp;
142 142
@@ -239,7 +239,7 @@ static struct ip6_flowlabel *fl_intern(struct net *net,
239 239
240/* Socket flowlabel lists */ 240/* Socket flowlabel lists */
241 241
242struct ip6_flowlabel * fl6_sock_lookup(struct sock *sk, __be32 label) 242struct ip6_flowlabel *fl6_sock_lookup(struct sock *sk, __be32 label)
243{ 243{
244 struct ipv6_fl_socklist *sfl; 244 struct ipv6_fl_socklist *sfl;
245 struct ipv6_pinfo *np = inet6_sk(sk); 245 struct ipv6_pinfo *np = inet6_sk(sk);
@@ -259,7 +259,6 @@ struct ip6_flowlabel * fl6_sock_lookup(struct sock *sk, __be32 label)
259 rcu_read_unlock_bh(); 259 rcu_read_unlock_bh();
260 return NULL; 260 return NULL;
261} 261}
262
263EXPORT_SYMBOL_GPL(fl6_sock_lookup); 262EXPORT_SYMBOL_GPL(fl6_sock_lookup);
264 263
265void fl6_free_socklist(struct sock *sk) 264void fl6_free_socklist(struct sock *sk)
@@ -293,11 +292,11 @@ void fl6_free_socklist(struct sock *sk)
293 following rthdr. 292 following rthdr.
294 */ 293 */
295 294
296struct ipv6_txoptions *fl6_merge_options(struct ipv6_txoptions * opt_space, 295struct ipv6_txoptions *fl6_merge_options(struct ipv6_txoptions *opt_space,
297 struct ip6_flowlabel * fl, 296 struct ip6_flowlabel *fl,
298 struct ipv6_txoptions * fopt) 297 struct ipv6_txoptions *fopt)
299{ 298{
300 struct ipv6_txoptions * fl_opt = fl->opt; 299 struct ipv6_txoptions *fl_opt = fl->opt;
301 300
302 if (fopt == NULL || fopt->opt_flen == 0) 301 if (fopt == NULL || fopt->opt_flen == 0)
303 return fl_opt; 302 return fl_opt;
@@ -388,7 +387,7 @@ fl_create(struct net *net, struct sock *sk, struct in6_flowlabel_req *freq,
388 goto done; 387 goto done;
389 388
390 msg.msg_controllen = olen; 389 msg.msg_controllen = olen;
391 msg.msg_control = (void*)(fl->opt+1); 390 msg.msg_control = (void *)(fl->opt+1);
392 memset(&flowi6, 0, sizeof(flowi6)); 391 memset(&flowi6, 0, sizeof(flowi6));
393 392
394 err = ip6_datagram_send_ctl(net, sk, &msg, &flowi6, fl->opt, 393 err = ip6_datagram_send_ctl(net, sk, &msg, &flowi6, fl->opt,
@@ -517,7 +516,7 @@ int ipv6_flowlabel_opt(struct sock *sk, char __user *optval, int optlen)
517 struct net *net = sock_net(sk); 516 struct net *net = sock_net(sk);
518 struct ipv6_pinfo *np = inet6_sk(sk); 517 struct ipv6_pinfo *np = inet6_sk(sk);
519 struct in6_flowlabel_req freq; 518 struct in6_flowlabel_req freq;
520 struct ipv6_fl_socklist *sfl1=NULL; 519 struct ipv6_fl_socklist *sfl1 = NULL;
521 struct ipv6_fl_socklist *sfl; 520 struct ipv6_fl_socklist *sfl;
522 struct ipv6_fl_socklist __rcu **sflp; 521 struct ipv6_fl_socklist __rcu **sflp;
523 struct ip6_flowlabel *fl, *fl1 = NULL; 522 struct ip6_flowlabel *fl, *fl1 = NULL;
@@ -542,7 +541,7 @@ int ipv6_flowlabel_opt(struct sock *sk, char __user *optval, int optlen)
542 } 541 }
543 spin_lock_bh(&ip6_sk_fl_lock); 542 spin_lock_bh(&ip6_sk_fl_lock);
544 for (sflp = &np->ipv6_fl_list; 543 for (sflp = &np->ipv6_fl_list;
545 (sfl = rcu_dereference(*sflp))!=NULL; 544 (sfl = rcu_dereference(*sflp)) != NULL;
546 sflp = &sfl->next) { 545 sflp = &sfl->next) {
547 if (sfl->fl->label == freq.flr_label) { 546 if (sfl->fl->label == freq.flr_label) {
548 if (freq.flr_label == (np->flow_label&IPV6_FLOWLABEL_MASK)) 547 if (freq.flr_label == (np->flow_label&IPV6_FLOWLABEL_MASK))
diff --git a/net/ipv6/ip6_gre.c b/net/ipv6/ip6_gre.c
index f304471477dc..12c3c8ef3849 100644
--- a/net/ipv6/ip6_gre.c
+++ b/net/ipv6/ip6_gre.c
@@ -618,6 +618,7 @@ static netdev_tx_t ip6gre_xmit2(struct sk_buff *skb,
618 int err = -1; 618 int err = -1;
619 u8 proto; 619 u8 proto;
620 struct sk_buff *new_skb; 620 struct sk_buff *new_skb;
621 __be16 protocol;
621 622
622 if (dev->type == ARPHRD_ETHER) 623 if (dev->type == ARPHRD_ETHER)
623 IPCB(skb)->flags = 0; 624 IPCB(skb)->flags = 0;
@@ -734,8 +735,9 @@ static netdev_tx_t ip6gre_xmit2(struct sk_buff *skb,
734 ipv6h->daddr = fl6->daddr; 735 ipv6h->daddr = fl6->daddr;
735 736
736 ((__be16 *)(ipv6h + 1))[0] = tunnel->parms.o_flags; 737 ((__be16 *)(ipv6h + 1))[0] = tunnel->parms.o_flags;
737 ((__be16 *)(ipv6h + 1))[1] = (dev->type == ARPHRD_ETHER) ? 738 protocol = (dev->type == ARPHRD_ETHER) ?
738 htons(ETH_P_TEB) : skb->protocol; 739 htons(ETH_P_TEB) : skb->protocol;
740 ((__be16 *)(ipv6h + 1))[1] = protocol;
739 741
740 if (tunnel->parms.o_flags&(GRE_KEY|GRE_CSUM|GRE_SEQ)) { 742 if (tunnel->parms.o_flags&(GRE_KEY|GRE_CSUM|GRE_SEQ)) {
741 __be32 *ptr = (__be32 *)(((u8 *)ipv6h) + tunnel->hlen - 4); 743 __be32 *ptr = (__be32 *)(((u8 *)ipv6h) + tunnel->hlen - 4);
@@ -756,6 +758,8 @@ static netdev_tx_t ip6gre_xmit2(struct sk_buff *skb,
756 } 758 }
757 } 759 }
758 760
761 skb_set_inner_protocol(skb, protocol);
762
759 ip6tunnel_xmit(skb, dev); 763 ip6tunnel_xmit(skb, dev);
760 if (ndst) 764 if (ndst)
761 ip6_tnl_dst_store(tunnel, ndst); 765 ip6_tnl_dst_store(tunnel, ndst);
@@ -782,7 +786,7 @@ static inline int ip6gre_xmit_ipv4(struct sk_buff *skb, struct net_device *dev)
782 encap_limit = t->parms.encap_limit; 786 encap_limit = t->parms.encap_limit;
783 787
784 memcpy(&fl6, &t->fl.u.ip6, sizeof(fl6)); 788 memcpy(&fl6, &t->fl.u.ip6, sizeof(fl6));
785 fl6.flowi6_proto = IPPROTO_IPIP; 789 fl6.flowi6_proto = IPPROTO_GRE;
786 790
787 dsfield = ipv4_get_dsfield(iph); 791 dsfield = ipv4_get_dsfield(iph);
788 792
@@ -832,7 +836,7 @@ static inline int ip6gre_xmit_ipv6(struct sk_buff *skb, struct net_device *dev)
832 encap_limit = t->parms.encap_limit; 836 encap_limit = t->parms.encap_limit;
833 837
834 memcpy(&fl6, &t->fl.u.ip6, sizeof(fl6)); 838 memcpy(&fl6, &t->fl.u.ip6, sizeof(fl6));
835 fl6.flowi6_proto = IPPROTO_IPV6; 839 fl6.flowi6_proto = IPPROTO_GRE;
836 840
837 dsfield = ipv6_get_dsfield(ipv6h); 841 dsfield = ipv6_get_dsfield(ipv6h);
838 if (t->parms.flags & IP6_TNL_F_USE_ORIG_TCLASS) 842 if (t->parms.flags & IP6_TNL_F_USE_ORIG_TCLASS)
@@ -1238,7 +1242,7 @@ static void ip6gre_tunnel_setup(struct net_device *dev)
1238 dev->flags |= IFF_NOARP; 1242 dev->flags |= IFF_NOARP;
1239 dev->iflink = 0; 1243 dev->iflink = 0;
1240 dev->addr_len = sizeof(struct in6_addr); 1244 dev->addr_len = sizeof(struct in6_addr);
1241 dev->priv_flags &= ~IFF_XMIT_DST_RELEASE; 1245 netif_keep_dst(dev);
1242} 1246}
1243 1247
1244static int ip6gre_tunnel_init(struct net_device *dev) 1248static int ip6gre_tunnel_init(struct net_device *dev)
diff --git a/net/ipv6/ip6_icmp.c b/net/ipv6/ip6_icmp.c
index 4578e23834f7..14dacc544c3e 100644
--- a/net/ipv6/ip6_icmp.c
+++ b/net/ipv6/ip6_icmp.c
@@ -13,7 +13,7 @@ static ip6_icmp_send_t __rcu *ip6_icmp_send;
13int inet6_register_icmp_sender(ip6_icmp_send_t *fn) 13int inet6_register_icmp_sender(ip6_icmp_send_t *fn)
14{ 14{
15 return (cmpxchg((ip6_icmp_send_t **)&ip6_icmp_send, NULL, fn) == NULL) ? 15 return (cmpxchg((ip6_icmp_send_t **)&ip6_icmp_send, NULL, fn) == NULL) ?
16 0 : -EBUSY; 16 0 : -EBUSY;
17} 17}
18EXPORT_SYMBOL(inet6_register_icmp_sender); 18EXPORT_SYMBOL(inet6_register_icmp_sender);
19 19
diff --git a/net/ipv6/ip6_input.c b/net/ipv6/ip6_input.c
index 51d54dc376f3..a3084ab5df6c 100644
--- a/net/ipv6/ip6_input.c
+++ b/net/ipv6/ip6_input.c
@@ -15,8 +15,8 @@
15 */ 15 */
16/* Changes 16/* Changes
17 * 17 *
18 * Mitsuru KANDA @USAGI and 18 * Mitsuru KANDA @USAGI and
19 * YOSHIFUJI Hideaki @USAGI: Remove ipv6_parse_exthdrs(). 19 * YOSHIFUJI Hideaki @USAGI: Remove ipv6_parse_exthdrs().
20 */ 20 */
21 21
22#include <linux/errno.h> 22#include <linux/errno.h>
@@ -65,7 +65,7 @@ int ip6_rcv_finish(struct sk_buff *skb)
65int ipv6_rcv(struct sk_buff *skb, struct net_device *dev, struct packet_type *pt, struct net_device *orig_dev) 65int ipv6_rcv(struct sk_buff *skb, struct net_device *dev, struct packet_type *pt, struct net_device *orig_dev)
66{ 66{
67 const struct ipv6hdr *hdr; 67 const struct ipv6hdr *hdr;
68 u32 pkt_len; 68 u32 pkt_len;
69 struct inet6_dev *idev; 69 struct inet6_dev *idev;
70 struct net *net = dev_net(skb->dev); 70 struct net *net = dev_net(skb->dev);
71 71
diff --git a/net/ipv6/ip6_offload.c b/net/ipv6/ip6_offload.c
index 65eda2a8af48..9034f76ae013 100644
--- a/net/ipv6/ip6_offload.c
+++ b/net/ipv6/ip6_offload.c
@@ -53,31 +53,6 @@ static int ipv6_gso_pull_exthdrs(struct sk_buff *skb, int proto)
53 return proto; 53 return proto;
54} 54}
55 55
56static int ipv6_gso_send_check(struct sk_buff *skb)
57{
58 const struct ipv6hdr *ipv6h;
59 const struct net_offload *ops;
60 int err = -EINVAL;
61
62 if (unlikely(!pskb_may_pull(skb, sizeof(*ipv6h))))
63 goto out;
64
65 ipv6h = ipv6_hdr(skb);
66 __skb_pull(skb, sizeof(*ipv6h));
67 err = -EPROTONOSUPPORT;
68
69 ops = rcu_dereference(inet6_offloads[
70 ipv6_gso_pull_exthdrs(skb, ipv6h->nexthdr)]);
71
72 if (likely(ops && ops->callbacks.gso_send_check)) {
73 skb_reset_transport_header(skb);
74 err = ops->callbacks.gso_send_check(skb);
75 }
76
77out:
78 return err;
79}
80
81static struct sk_buff *ipv6_gso_segment(struct sk_buff *skb, 56static struct sk_buff *ipv6_gso_segment(struct sk_buff *skb,
82 netdev_features_t features) 57 netdev_features_t features)
83{ 58{
@@ -244,7 +219,7 @@ static struct sk_buff **ipv6_gro_receive(struct sk_buff **head,
244 continue; 219 continue;
245 220
246 iph2 = (struct ipv6hdr *)(p->data + off); 221 iph2 = (struct ipv6hdr *)(p->data + off);
247 first_word = *(__be32 *)iph ^ *(__be32 *)iph2 ; 222 first_word = *(__be32 *)iph ^ *(__be32 *)iph2;
248 223
249 /* All fields must match except length and Traffic Class. 224 /* All fields must match except length and Traffic Class.
250 * XXX skbs on the gro_list have all been parsed and pulled 225 * XXX skbs on the gro_list have all been parsed and pulled
@@ -261,6 +236,9 @@ static struct sk_buff **ipv6_gro_receive(struct sk_buff **head,
261 /* flush if Traffic Class fields are different */ 236 /* flush if Traffic Class fields are different */
262 NAPI_GRO_CB(p)->flush |= !!(first_word & htonl(0x0FF00000)); 237 NAPI_GRO_CB(p)->flush |= !!(first_word & htonl(0x0FF00000));
263 NAPI_GRO_CB(p)->flush |= flush; 238 NAPI_GRO_CB(p)->flush |= flush;
239
240 /* Clear flush_id, there's really no concept of ID in IPv6. */
241 NAPI_GRO_CB(p)->flush_id = 0;
264 } 242 }
265 243
266 NAPI_GRO_CB(skb)->flush |= flush; 244 NAPI_GRO_CB(skb)->flush |= flush;
@@ -303,7 +281,6 @@ out_unlock:
303static struct packet_offload ipv6_packet_offload __read_mostly = { 281static struct packet_offload ipv6_packet_offload __read_mostly = {
304 .type = cpu_to_be16(ETH_P_IPV6), 282 .type = cpu_to_be16(ETH_P_IPV6),
305 .callbacks = { 283 .callbacks = {
306 .gso_send_check = ipv6_gso_send_check,
307 .gso_segment = ipv6_gso_segment, 284 .gso_segment = ipv6_gso_segment,
308 .gro_receive = ipv6_gro_receive, 285 .gro_receive = ipv6_gro_receive,
309 .gro_complete = ipv6_gro_complete, 286 .gro_complete = ipv6_gro_complete,
@@ -312,8 +289,9 @@ static struct packet_offload ipv6_packet_offload __read_mostly = {
312 289
313static const struct net_offload sit_offload = { 290static const struct net_offload sit_offload = {
314 .callbacks = { 291 .callbacks = {
315 .gso_send_check = ipv6_gso_send_check,
316 .gso_segment = ipv6_gso_segment, 292 .gso_segment = ipv6_gso_segment,
293 .gro_receive = ipv6_gro_receive,
294 .gro_complete = ipv6_gro_complete,
317 }, 295 },
318}; 296};
319 297
diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index 0a3448b2888f..8e950c250ada 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -20,7 +20,7 @@
20 * etc. 20 * etc.
21 * 21 *
22 * H. von Brand : Added missing #include <linux/string.h> 22 * H. von Brand : Added missing #include <linux/string.h>
23 * Imran Patel : frag id should be in NBO 23 * Imran Patel : frag id should be in NBO
24 * Kazunori MIYAZAWA @USAGI 24 * Kazunori MIYAZAWA @USAGI
25 * : add ip6_append_data and related functions 25 * : add ip6_append_data and related functions
26 * for datagram xmit 26 * for datagram xmit
@@ -233,7 +233,6 @@ int ip6_xmit(struct sock *sk, struct sk_buff *skb, struct flowi6 *fl6,
233 kfree_skb(skb); 233 kfree_skb(skb);
234 return -EMSGSIZE; 234 return -EMSGSIZE;
235} 235}
236
237EXPORT_SYMBOL(ip6_xmit); 236EXPORT_SYMBOL(ip6_xmit);
238 237
239static int ip6_call_ra_chain(struct sk_buff *skb, int sel) 238static int ip6_call_ra_chain(struct sk_buff *skb, int sel)
@@ -555,14 +554,14 @@ static void ipv6_select_ident(struct frag_hdr *fhdr, struct rt6_info *rt)
555int ip6_fragment(struct sk_buff *skb, int (*output)(struct sk_buff *)) 554int ip6_fragment(struct sk_buff *skb, int (*output)(struct sk_buff *))
556{ 555{
557 struct sk_buff *frag; 556 struct sk_buff *frag;
558 struct rt6_info *rt = (struct rt6_info*)skb_dst(skb); 557 struct rt6_info *rt = (struct rt6_info *)skb_dst(skb);
559 struct ipv6_pinfo *np = skb->sk ? inet6_sk(skb->sk) : NULL; 558 struct ipv6_pinfo *np = skb->sk ? inet6_sk(skb->sk) : NULL;
560 struct ipv6hdr *tmp_hdr; 559 struct ipv6hdr *tmp_hdr;
561 struct frag_hdr *fh; 560 struct frag_hdr *fh;
562 unsigned int mtu, hlen, left, len; 561 unsigned int mtu, hlen, left, len;
563 int hroom, troom; 562 int hroom, troom;
564 __be32 frag_id = 0; 563 __be32 frag_id = 0;
565 int ptr, offset = 0, err=0; 564 int ptr, offset = 0, err = 0;
566 u8 *prevhdr, nexthdr = 0; 565 u8 *prevhdr, nexthdr = 0;
567 struct net *net = dev_net(skb_dst(skb)->dev); 566 struct net *net = dev_net(skb_dst(skb)->dev);
568 567
@@ -637,7 +636,7 @@ int ip6_fragment(struct sk_buff *skb, int (*output)(struct sk_buff *))
637 } 636 }
638 637
639 __skb_pull(skb, hlen); 638 __skb_pull(skb, hlen);
640 fh = (struct frag_hdr*)__skb_push(skb, sizeof(struct frag_hdr)); 639 fh = (struct frag_hdr *)__skb_push(skb, sizeof(struct frag_hdr));
641 __skb_push(skb, hlen); 640 __skb_push(skb, hlen);
642 skb_reset_network_header(skb); 641 skb_reset_network_header(skb);
643 memcpy(skb_network_header(skb), tmp_hdr, hlen); 642 memcpy(skb_network_header(skb), tmp_hdr, hlen);
@@ -662,7 +661,7 @@ int ip6_fragment(struct sk_buff *skb, int (*output)(struct sk_buff *))
662 if (frag) { 661 if (frag) {
663 frag->ip_summed = CHECKSUM_NONE; 662 frag->ip_summed = CHECKSUM_NONE;
664 skb_reset_transport_header(frag); 663 skb_reset_transport_header(frag);
665 fh = (struct frag_hdr*)__skb_push(frag, sizeof(struct frag_hdr)); 664 fh = (struct frag_hdr *)__skb_push(frag, sizeof(struct frag_hdr));
666 __skb_push(frag, hlen); 665 __skb_push(frag, hlen);
667 skb_reset_network_header(frag); 666 skb_reset_network_header(frag);
668 memcpy(skb_network_header(frag), tmp_hdr, 667 memcpy(skb_network_header(frag), tmp_hdr,
@@ -681,7 +680,7 @@ int ip6_fragment(struct sk_buff *skb, int (*output)(struct sk_buff *))
681 } 680 }
682 681
683 err = output(skb); 682 err = output(skb);
684 if(!err) 683 if (!err)
685 IP6_INC_STATS(net, ip6_dst_idev(&rt->dst), 684 IP6_INC_STATS(net, ip6_dst_idev(&rt->dst),
686 IPSTATS_MIB_FRAGCREATES); 685 IPSTATS_MIB_FRAGCREATES);
687 686
@@ -702,11 +701,7 @@ int ip6_fragment(struct sk_buff *skb, int (*output)(struct sk_buff *))
702 return 0; 701 return 0;
703 } 702 }
704 703
705 while (frag) { 704 kfree_skb_list(frag);
706 skb = frag->next;
707 kfree_skb(frag);
708 frag = skb;
709 }
710 705
711 IP6_INC_STATS(net, ip6_dst_idev(&rt->dst), 706 IP6_INC_STATS(net, ip6_dst_idev(&rt->dst),
712 IPSTATS_MIB_FRAGFAILS); 707 IPSTATS_MIB_FRAGFAILS);
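The removed loop above is folded into kfree_skb_list(), the generic helper in net/core/skbuff.c; its effect is the same walk of the fragment chain, roughly:

#include <linux/skbuff.h>

/* Equivalent of the open-coded loop this hunk removes: walk the ->next
 * chain and drop one reference per skb.  free_skb_chain() is only an
 * illustrative name for what kfree_skb_list() does.
 */
static void free_skb_chain(struct sk_buff *segs)
{
	while (segs) {
		struct sk_buff *next = segs->next;

		kfree_skb(segs);
		segs = next;
	}
}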
@@ -742,7 +737,7 @@ slow_path:
742 /* 737 /*
743 * Keep copying data until we run out. 738 * Keep copying data until we run out.
744 */ 739 */
745 while(left > 0) { 740 while (left > 0) {
746 len = left; 741 len = left;
747 /* IF: it doesn't fit, use 'mtu' - the data space left */ 742 /* IF: it doesn't fit, use 'mtu' - the data space left */
748 if (len > mtu) 743 if (len > mtu)
@@ -865,7 +860,7 @@ static struct dst_entry *ip6_sk_dst_check(struct sock *sk,
865 /* Yes, checking route validity in not connected 860 /* Yes, checking route validity in not connected
866 * case is not very simple. Take into account, 861 * case is not very simple. Take into account,
867 * that we do not support routing by source, TOS, 862 * that we do not support routing by source, TOS,
868 * and MSG_DONTROUTE --ANK (980726) 863 * and MSG_DONTROUTE --ANK (980726)
869 * 864 *
870 * 1. ip6_rt_check(): If route was host route, 865 * 1. ip6_rt_check(): If route was host route,
871 * check that cached destination is current. 866 * check that cached destination is current.
@@ -1049,7 +1044,7 @@ static inline int ip6_ufo_append_data(struct sock *sk,
1049 int getfrag(void *from, char *to, int offset, int len, 1044 int getfrag(void *from, char *to, int offset, int len,
1050 int odd, struct sk_buff *skb), 1045 int odd, struct sk_buff *skb),
1051 void *from, int length, int hh_len, int fragheaderlen, 1046 void *from, int length, int hh_len, int fragheaderlen,
1052 int transhdrlen, int mtu,unsigned int flags, 1047 int transhdrlen, int mtu, unsigned int flags,
1053 struct rt6_info *rt) 1048 struct rt6_info *rt)
1054 1049
1055{ 1050{
@@ -1072,7 +1067,7 @@ static inline int ip6_ufo_append_data(struct sock *sk,
1072 skb_reserve(skb, hh_len); 1067 skb_reserve(skb, hh_len);
1073 1068
1074 /* create space for UDP/IP header */ 1069 /* create space for UDP/IP header */
1075 skb_put(skb,fragheaderlen + transhdrlen); 1070 skb_put(skb, fragheaderlen + transhdrlen);
1076 1071
1077 /* initialize network header pointer */ 1072 /* initialize network header pointer */
1078 skb_reset_network_header(skb); 1073 skb_reset_network_header(skb);
diff --git a/net/ipv6/ip6_tunnel.c b/net/ipv6/ip6_tunnel.c
index 69a84b464009..9409887fb664 100644
--- a/net/ipv6/ip6_tunnel.c
+++ b/net/ipv6/ip6_tunnel.c
@@ -412,12 +412,12 @@ __u16 ip6_tnl_parse_tlv_enc_lim(struct sk_buff *skb, __u8 *raw)
412{ 412{
413 const struct ipv6hdr *ipv6h = (const struct ipv6hdr *) raw; 413 const struct ipv6hdr *ipv6h = (const struct ipv6hdr *) raw;
414 __u8 nexthdr = ipv6h->nexthdr; 414 __u8 nexthdr = ipv6h->nexthdr;
415 __u16 off = sizeof (*ipv6h); 415 __u16 off = sizeof(*ipv6h);
416 416
417 while (ipv6_ext_hdr(nexthdr) && nexthdr != NEXTHDR_NONE) { 417 while (ipv6_ext_hdr(nexthdr) && nexthdr != NEXTHDR_NONE) {
418 __u16 optlen = 0; 418 __u16 optlen = 0;
419 struct ipv6_opt_hdr *hdr; 419 struct ipv6_opt_hdr *hdr;
420 if (raw + off + sizeof (*hdr) > skb->data && 420 if (raw + off + sizeof(*hdr) > skb->data &&
421 !pskb_may_pull(skb, raw - skb->data + off + sizeof (*hdr))) 421 !pskb_may_pull(skb, raw - skb->data + off + sizeof (*hdr)))
422 break; 422 break;
423 423
@@ -534,7 +534,7 @@ ip6_tnl_err(struct sk_buff *skb, __u8 ipproto, struct inet6_skb_parm *opt,
534 mtu = IPV6_MIN_MTU; 534 mtu = IPV6_MIN_MTU;
535 t->dev->mtu = mtu; 535 t->dev->mtu = mtu;
536 536
537 if ((len = sizeof (*ipv6h) + ntohs(ipv6h->payload_len)) > mtu) { 537 if ((len = sizeof(*ipv6h) + ntohs(ipv6h->payload_len)) > mtu) {
538 rel_type = ICMPV6_PKT_TOOBIG; 538 rel_type = ICMPV6_PKT_TOOBIG;
539 rel_code = 0; 539 rel_code = 0;
540 rel_info = mtu; 540 rel_info = mtu;
@@ -995,7 +995,7 @@ static int ip6_tnl_xmit2(struct sk_buff *skb,
995 t->parms.name); 995 t->parms.name);
996 goto tx_err_dst_release; 996 goto tx_err_dst_release;
997 } 997 }
998 mtu = dst_mtu(dst) - sizeof (*ipv6h); 998 mtu = dst_mtu(dst) - sizeof(*ipv6h);
999 if (encap_limit >= 0) { 999 if (encap_limit >= 0) {
1000 max_headroom += 8; 1000 max_headroom += 8;
1001 mtu -= 8; 1001 mtu -= 8;
@@ -1087,7 +1087,7 @@ ip4ip6_tnl_xmit(struct sk_buff *skb, struct net_device *dev)
1087 if (!(t->parms.flags & IP6_TNL_F_IGN_ENCAP_LIMIT)) 1087 if (!(t->parms.flags & IP6_TNL_F_IGN_ENCAP_LIMIT))
1088 encap_limit = t->parms.encap_limit; 1088 encap_limit = t->parms.encap_limit;
1089 1089
1090 memcpy(&fl6, &t->fl.u.ip6, sizeof (fl6)); 1090 memcpy(&fl6, &t->fl.u.ip6, sizeof(fl6));
1091 fl6.flowi6_proto = IPPROTO_IPIP; 1091 fl6.flowi6_proto = IPPROTO_IPIP;
1092 1092
1093 dsfield = ipv4_get_dsfield(iph); 1093 dsfield = ipv4_get_dsfield(iph);
@@ -1139,7 +1139,7 @@ ip6ip6_tnl_xmit(struct sk_buff *skb, struct net_device *dev)
1139 } else if (!(t->parms.flags & IP6_TNL_F_IGN_ENCAP_LIMIT)) 1139 } else if (!(t->parms.flags & IP6_TNL_F_IGN_ENCAP_LIMIT))
1140 encap_limit = t->parms.encap_limit; 1140 encap_limit = t->parms.encap_limit;
1141 1141
1142 memcpy(&fl6, &t->fl.u.ip6, sizeof (fl6)); 1142 memcpy(&fl6, &t->fl.u.ip6, sizeof(fl6));
1143 fl6.flowi6_proto = IPPROTO_IPV6; 1143 fl6.flowi6_proto = IPPROTO_IPV6;
1144 1144
1145 dsfield = ipv6_get_dsfield(ipv6h); 1145 dsfield = ipv6_get_dsfield(ipv6h);
@@ -1233,11 +1233,11 @@ static void ip6_tnl_link_config(struct ip6_tnl *t)
1233 1233
1234 if (rt->dst.dev) { 1234 if (rt->dst.dev) {
1235 dev->hard_header_len = rt->dst.dev->hard_header_len + 1235 dev->hard_header_len = rt->dst.dev->hard_header_len +
1236 sizeof (struct ipv6hdr); 1236 sizeof(struct ipv6hdr);
1237 1237
1238 dev->mtu = rt->dst.dev->mtu - sizeof (struct ipv6hdr); 1238 dev->mtu = rt->dst.dev->mtu - sizeof(struct ipv6hdr);
1239 if (!(t->parms.flags & IP6_TNL_F_IGN_ENCAP_LIMIT)) 1239 if (!(t->parms.flags & IP6_TNL_F_IGN_ENCAP_LIMIT))
1240 dev->mtu-=8; 1240 dev->mtu -= 8;
1241 1241
1242 if (dev->mtu < IPV6_MIN_MTU) 1242 if (dev->mtu < IPV6_MIN_MTU)
1243 dev->mtu = IPV6_MIN_MTU; 1243 dev->mtu = IPV6_MIN_MTU;
@@ -1354,7 +1354,7 @@ ip6_tnl_ioctl(struct net_device *dev, struct ifreq *ifr, int cmd)
1354 switch (cmd) { 1354 switch (cmd) {
1355 case SIOCGETTUNNEL: 1355 case SIOCGETTUNNEL:
1356 if (dev == ip6n->fb_tnl_dev) { 1356 if (dev == ip6n->fb_tnl_dev) {
1357 if (copy_from_user(&p, ifr->ifr_ifru.ifru_data, sizeof (p))) { 1357 if (copy_from_user(&p, ifr->ifr_ifru.ifru_data, sizeof(p))) {
1358 err = -EFAULT; 1358 err = -EFAULT;
1359 break; 1359 break;
1360 } 1360 }
@@ -1366,7 +1366,7 @@ ip6_tnl_ioctl(struct net_device *dev, struct ifreq *ifr, int cmd)
1366 memset(&p, 0, sizeof(p)); 1366 memset(&p, 0, sizeof(p));
1367 } 1367 }
1368 ip6_tnl_parm_to_user(&p, &t->parms); 1368 ip6_tnl_parm_to_user(&p, &t->parms);
1369 if (copy_to_user(ifr->ifr_ifru.ifru_data, &p, sizeof (p))) { 1369 if (copy_to_user(ifr->ifr_ifru.ifru_data, &p, sizeof(p))) {
1370 err = -EFAULT; 1370 err = -EFAULT;
1371 } 1371 }
1372 break; 1372 break;
@@ -1376,7 +1376,7 @@ ip6_tnl_ioctl(struct net_device *dev, struct ifreq *ifr, int cmd)
1376 if (!ns_capable(net->user_ns, CAP_NET_ADMIN)) 1376 if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
1377 break; 1377 break;
1378 err = -EFAULT; 1378 err = -EFAULT;
1379 if (copy_from_user(&p, ifr->ifr_ifru.ifru_data, sizeof (p))) 1379 if (copy_from_user(&p, ifr->ifr_ifru.ifru_data, sizeof(p)))
1380 break; 1380 break;
1381 err = -EINVAL; 1381 err = -EINVAL;
1382 if (p.proto != IPPROTO_IPV6 && p.proto != IPPROTO_IPIP && 1382 if (p.proto != IPPROTO_IPV6 && p.proto != IPPROTO_IPIP &&
@@ -1411,7 +1411,7 @@ ip6_tnl_ioctl(struct net_device *dev, struct ifreq *ifr, int cmd)
1411 1411
1412 if (dev == ip6n->fb_tnl_dev) { 1412 if (dev == ip6n->fb_tnl_dev) {
1413 err = -EFAULT; 1413 err = -EFAULT;
1414 if (copy_from_user(&p, ifr->ifr_ifru.ifru_data, sizeof (p))) 1414 if (copy_from_user(&p, ifr->ifr_ifru.ifru_data, sizeof(p)))
1415 break; 1415 break;
1416 err = -ENOENT; 1416 err = -ENOENT;
1417 ip6_tnl_parm_from_user(&p1, &p); 1417 ip6_tnl_parm_from_user(&p1, &p);
@@ -1486,14 +1486,14 @@ static void ip6_tnl_dev_setup(struct net_device *dev)
1486 dev->destructor = ip6_dev_free; 1486 dev->destructor = ip6_dev_free;
1487 1487
1488 dev->type = ARPHRD_TUNNEL6; 1488 dev->type = ARPHRD_TUNNEL6;
1489 dev->hard_header_len = LL_MAX_HEADER + sizeof (struct ipv6hdr); 1489 dev->hard_header_len = LL_MAX_HEADER + sizeof(struct ipv6hdr);
1490 dev->mtu = ETH_DATA_LEN - sizeof (struct ipv6hdr); 1490 dev->mtu = ETH_DATA_LEN - sizeof(struct ipv6hdr);
1491 t = netdev_priv(dev); 1491 t = netdev_priv(dev);
1492 if (!(t->parms.flags & IP6_TNL_F_IGN_ENCAP_LIMIT)) 1492 if (!(t->parms.flags & IP6_TNL_F_IGN_ENCAP_LIMIT))
1493 dev->mtu-=8; 1493 dev->mtu -= 8;
1494 dev->flags |= IFF_NOARP; 1494 dev->flags |= IFF_NOARP;
1495 dev->addr_len = sizeof(struct in6_addr); 1495 dev->addr_len = sizeof(struct in6_addr);
1496 dev->priv_flags &= ~IFF_XMIT_DST_RELEASE; 1496 netif_keep_dst(dev);
1497 /* This perm addr will be used as interface identifier by IPv6 */ 1497 /* This perm addr will be used as interface identifier by IPv6 */
1498 dev->addr_assign_type = NET_ADDR_RANDOM; 1498 dev->addr_assign_type = NET_ADDR_RANDOM;
1499 eth_random_addr(dev->perm_addr); 1499 eth_random_addr(dev->perm_addr);
diff --git a/net/ipv6/ip6_udp_tunnel.c b/net/ipv6/ip6_udp_tunnel.c
new file mode 100644
index 000000000000..b04ed72c4542
--- /dev/null
+++ b/net/ipv6/ip6_udp_tunnel.c
@@ -0,0 +1,107 @@
1#include <linux/module.h>
2#include <linux/errno.h>
3#include <linux/socket.h>
4#include <linux/udp.h>
5#include <linux/types.h>
6#include <linux/kernel.h>
7#include <linux/in6.h>
8#include <net/udp.h>
9#include <net/udp_tunnel.h>
10#include <net/net_namespace.h>
11#include <net/netns/generic.h>
12#include <net/ip6_tunnel.h>
13#include <net/ip6_checksum.h>
14
15int udp_sock_create6(struct net *net, struct udp_port_cfg *cfg,
16 struct socket **sockp)
17{
18 struct sockaddr_in6 udp6_addr;
19 int err;
20 struct socket *sock = NULL;
21
22 err = sock_create_kern(AF_INET6, SOCK_DGRAM, 0, &sock);
23 if (err < 0)
24 goto error;
25
26 sk_change_net(sock->sk, net);
27
28 udp6_addr.sin6_family = AF_INET6;
29 memcpy(&udp6_addr.sin6_addr, &cfg->local_ip6,
30 sizeof(udp6_addr.sin6_addr));
31 udp6_addr.sin6_port = cfg->local_udp_port;
32 err = kernel_bind(sock, (struct sockaddr *)&udp6_addr,
33 sizeof(udp6_addr));
34 if (err < 0)
35 goto error;
36
37 if (cfg->peer_udp_port) {
38 udp6_addr.sin6_family = AF_INET6;
39 memcpy(&udp6_addr.sin6_addr, &cfg->peer_ip6,
40 sizeof(udp6_addr.sin6_addr));
41 udp6_addr.sin6_port = cfg->peer_udp_port;
42 err = kernel_connect(sock,
43 (struct sockaddr *)&udp6_addr,
44 sizeof(udp6_addr), 0);
45 }
46 if (err < 0)
47 goto error;
48
49 udp_set_no_check6_tx(sock->sk, !cfg->use_udp6_tx_checksums);
50 udp_set_no_check6_rx(sock->sk, !cfg->use_udp6_rx_checksums);
51
52 *sockp = sock;
53 return 0;
54
55error:
56 if (sock) {
57 kernel_sock_shutdown(sock, SHUT_RDWR);
58 sk_release_kernel(sock->sk);
59 }
60 *sockp = NULL;
61 return err;
62}
63EXPORT_SYMBOL_GPL(udp_sock_create6);
64
65int udp_tunnel6_xmit_skb(struct socket *sock, struct dst_entry *dst,
66 struct sk_buff *skb, struct net_device *dev,
67 struct in6_addr *saddr, struct in6_addr *daddr,
68 __u8 prio, __u8 ttl, __be16 src_port, __be16 dst_port)
69{
70 struct udphdr *uh;
71 struct ipv6hdr *ip6h;
72 struct sock *sk = sock->sk;
73
74 __skb_push(skb, sizeof(*uh));
75 skb_reset_transport_header(skb);
76 uh = udp_hdr(skb);
77
78 uh->dest = dst_port;
79 uh->source = src_port;
80
81 uh->len = htons(skb->len);
82 uh->check = 0;
83
84 memset(&(IPCB(skb)->opt), 0, sizeof(IPCB(skb)->opt));
85 IPCB(skb)->flags &= ~(IPSKB_XFRM_TUNNEL_SIZE | IPSKB_XFRM_TRANSFORMED
86 | IPSKB_REROUTED);
87 skb_dst_set(skb, dst);
88
89 udp6_set_csum(udp_get_no_check6_tx(sk), skb, &inet6_sk(sk)->saddr,
90 &sk->sk_v6_daddr, skb->len);
91
92 __skb_push(skb, sizeof(*ip6h));
93 skb_reset_network_header(skb);
94 ip6h = ipv6_hdr(skb);
95 ip6_flow_hdr(ip6h, prio, htonl(0));
96 ip6h->payload_len = htons(skb->len);
97 ip6h->nexthdr = IPPROTO_UDP;
98 ip6h->hop_limit = ttl;
99 ip6h->daddr = *daddr;
100 ip6h->saddr = *saddr;
101
102 ip6tunnel_xmit(skb, dev);
103 return 0;
104}
105EXPORT_SYMBOL_GPL(udp_tunnel6_xmit_skb);
106
107MODULE_LICENSE("GPL");
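The new net/ipv6/ip6_udp_tunnel.c gives IPv6 the kernel-socket helpers that UDP-based tunnels such as vxlan use on the IPv4 side: udp_sock_create6() opens, binds and optionally connects a kernel UDP socket in a given netns, and udp_tunnel6_xmit_skb() pushes the UDP and IPv6 headers and hands the result to ip6tunnel_xmit(). A hedged usage sketch, limited to the udp_port_cfg fields the helper itself reads above; the function name, socket variable and port number are illustrative:

#include <linux/in6.h>
#include <net/udp_tunnel.h>

static struct socket *tnl_sock;

/* Open a kernel UDP socket bound to [::]:4789 in the given netns. */
static int tnl_open_socket(struct net *net)
{
	struct udp_port_cfg cfg = {
		.local_ip6		= IN6ADDR_ANY_INIT,
		.local_udp_port		= htons(4789),
		.use_udp6_tx_checksums	= true,
		.use_udp6_rx_checksums	= true,
	};

	return udp_sock_create6(net, &cfg, &tnl_sock);
}

On teardown the socket is released the same way udp_sock_create6() does on its own error path: kernel_sock_shutdown() followed by sk_release_kernel().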
diff --git a/net/ipv6/ip6_vti.c b/net/ipv6/ip6_vti.c
index 5833a2244467..d440bb585524 100644
--- a/net/ipv6/ip6_vti.c
+++ b/net/ipv6/ip6_vti.c
@@ -807,7 +807,7 @@ static void vti6_dev_setup(struct net_device *dev)
807 dev->mtu = ETH_DATA_LEN; 807 dev->mtu = ETH_DATA_LEN;
808 dev->flags |= IFF_NOARP; 808 dev->flags |= IFF_NOARP;
809 dev->addr_len = sizeof(struct in6_addr); 809 dev->addr_len = sizeof(struct in6_addr);
810 dev->priv_flags &= ~IFF_XMIT_DST_RELEASE; 810 netif_keep_dst(dev);
811} 811}
812 812
813/** 813/**
diff --git a/net/ipv6/ip6mr.c b/net/ipv6/ip6mr.c
index f9a3fd320d1d..0171f08325c3 100644
--- a/net/ipv6/ip6mr.c
+++ b/net/ipv6/ip6mr.c
@@ -845,7 +845,7 @@ static void ip6mr_destroy_unres(struct mr6_table *mrt, struct mfc6_cache *c)
845 845
846 atomic_dec(&mrt->cache_resolve_queue_len); 846 atomic_dec(&mrt->cache_resolve_queue_len);
847 847
848 while((skb = skb_dequeue(&c->mfc_un.unres.unresolved)) != NULL) { 848 while ((skb = skb_dequeue(&c->mfc_un.unres.unresolved)) != NULL) {
849 if (ipv6_hdr(skb)->version == 0) { 849 if (ipv6_hdr(skb)->version == 0) {
850 struct nlmsghdr *nlh = (struct nlmsghdr *)skb_pull(skb, sizeof(struct ipv6hdr)); 850 struct nlmsghdr *nlh = (struct nlmsghdr *)skb_pull(skb, sizeof(struct ipv6hdr));
851 nlh->nlmsg_type = NLMSG_ERROR; 851 nlh->nlmsg_type = NLMSG_ERROR;
@@ -1103,7 +1103,7 @@ static void ip6mr_cache_resolve(struct net *net, struct mr6_table *mrt,
1103 * Play the pending entries through our router 1103 * Play the pending entries through our router
1104 */ 1104 */
1105 1105
1106 while((skb = __skb_dequeue(&uc->mfc_un.unres.unresolved))) { 1106 while ((skb = __skb_dequeue(&uc->mfc_un.unres.unresolved))) {
1107 if (ipv6_hdr(skb)->version == 0) { 1107 if (ipv6_hdr(skb)->version == 0) {
1108 struct nlmsghdr *nlh = (struct nlmsghdr *)skb_pull(skb, sizeof(struct ipv6hdr)); 1108 struct nlmsghdr *nlh = (struct nlmsghdr *)skb_pull(skb, sizeof(struct ipv6hdr));
1109 1109
diff --git a/net/ipv6/ipcomp6.c b/net/ipv6/ipcomp6.c
index d1c793cffcb5..1b9316e1386a 100644
--- a/net/ipv6/ipcomp6.c
+++ b/net/ipv6/ipcomp6.c
@@ -181,8 +181,7 @@ static int ipcomp6_rcv_cb(struct sk_buff *skb, int err)
181 return 0; 181 return 0;
182} 182}
183 183
184static const struct xfrm_type ipcomp6_type = 184static const struct xfrm_type ipcomp6_type = {
185{
186 .description = "IPCOMP6", 185 .description = "IPCOMP6",
187 .owner = THIS_MODULE, 186 .owner = THIS_MODULE,
188 .proto = IPPROTO_COMP, 187 .proto = IPPROTO_COMP,
@@ -193,8 +192,7 @@ static const struct xfrm_type ipcomp6_type =
193 .hdr_offset = xfrm6_find_1stfragopt, 192 .hdr_offset = xfrm6_find_1stfragopt,
194}; 193};
195 194
196static struct xfrm6_protocol ipcomp6_protocol = 195static struct xfrm6_protocol ipcomp6_protocol = {
197{
198 .handler = xfrm6_rcv, 196 .handler = xfrm6_rcv,
199 .cb_handler = ipcomp6_rcv_cb, 197 .cb_handler = ipcomp6_rcv_cb,
200 .err_handler = ipcomp6_err, 198 .err_handler = ipcomp6_err,
diff --git a/net/ipv6/ipv6_sockglue.c b/net/ipv6/ipv6_sockglue.c
index 0c289982796d..e1a9583bb419 100644
--- a/net/ipv6/ipv6_sockglue.c
+++ b/net/ipv6/ipv6_sockglue.c
@@ -66,12 +66,12 @@ int ip6_ra_control(struct sock *sk, int sel)
66 if (sk->sk_type != SOCK_RAW || inet_sk(sk)->inet_num != IPPROTO_RAW) 66 if (sk->sk_type != SOCK_RAW || inet_sk(sk)->inet_num != IPPROTO_RAW)
67 return -ENOPROTOOPT; 67 return -ENOPROTOOPT;
68 68
69 new_ra = (sel>=0) ? kmalloc(sizeof(*new_ra), GFP_KERNEL) : NULL; 69 new_ra = (sel >= 0) ? kmalloc(sizeof(*new_ra), GFP_KERNEL) : NULL;
70 70
71 write_lock_bh(&ip6_ra_lock); 71 write_lock_bh(&ip6_ra_lock);
72 for (rap = &ip6_ra_chain; (ra=*rap) != NULL; rap = &ra->next) { 72 for (rap = &ip6_ra_chain; (ra = *rap) != NULL; rap = &ra->next) {
73 if (ra->sk == sk) { 73 if (ra->sk == sk) {
74 if (sel>=0) { 74 if (sel >= 0) {
75 write_unlock_bh(&ip6_ra_lock); 75 write_unlock_bh(&ip6_ra_lock);
76 kfree(new_ra); 76 kfree(new_ra);
77 return -EADDRINUSE; 77 return -EADDRINUSE;
@@ -130,7 +130,7 @@ static int do_ipv6_setsockopt(struct sock *sk, int level, int optname,
130 int retv = -ENOPROTOOPT; 130 int retv = -ENOPROTOOPT;
131 131
132 if (optval == NULL) 132 if (optval == NULL)
133 val=0; 133 val = 0;
134 else { 134 else {
135 if (optlen >= sizeof(int)) { 135 if (optlen >= sizeof(int)) {
136 if (get_user(val, (int __user *) optval)) 136 if (get_user(val, (int __user *) optval))
@@ -139,7 +139,7 @@ static int do_ipv6_setsockopt(struct sock *sk, int level, int optname,
139 val = 0; 139 val = 0;
140 } 140 }
141 141
142 valbool = (val!=0); 142 valbool = (val != 0);
143 143
144 if (ip6_mroute_opt(optname)) 144 if (ip6_mroute_opt(optname))
145 return ip6_mroute_setsockopt(sk, optname, optval, optlen); 145 return ip6_mroute_setsockopt(sk, optname, optval, optlen);
@@ -474,7 +474,7 @@ sticky_done:
474 goto done; 474 goto done;
475 475
476 msg.msg_controllen = optlen; 476 msg.msg_controllen = optlen;
477 msg.msg_control = (void*)(opt+1); 477 msg.msg_control = (void *)(opt+1);
478 478
479 retv = ip6_datagram_send_ctl(net, sk, &msg, &fl6, opt, &junk, 479 retv = ip6_datagram_send_ctl(net, sk, &msg, &fl6, opt, &junk,
480 &junk, &junk); 480 &junk, &junk);
@@ -687,7 +687,7 @@ done:
687 retv = -ENOBUFS; 687 retv = -ENOBUFS;
688 break; 688 break;
689 } 689 }
690 gsf = kmalloc(optlen,GFP_KERNEL); 690 gsf = kmalloc(optlen, GFP_KERNEL);
691 if (!gsf) { 691 if (!gsf) {
692 retv = -ENOBUFS; 692 retv = -ENOBUFS;
693 break; 693 break;
@@ -873,7 +873,6 @@ int ipv6_setsockopt(struct sock *sk, int level, int optname,
873#endif 873#endif
874 return err; 874 return err;
875} 875}
876
877EXPORT_SYMBOL(ipv6_setsockopt); 876EXPORT_SYMBOL(ipv6_setsockopt);
878 877
879#ifdef CONFIG_COMPAT 878#ifdef CONFIG_COMPAT
@@ -909,7 +908,6 @@ int compat_ipv6_setsockopt(struct sock *sk, int level, int optname,
909#endif 908#endif
910 return err; 909 return err;
911} 910}
912
913EXPORT_SYMBOL(compat_ipv6_setsockopt); 911EXPORT_SYMBOL(compat_ipv6_setsockopt);
914#endif 912#endif
915 913
@@ -921,7 +919,7 @@ static int ipv6_getsockopt_sticky(struct sock *sk, struct ipv6_txoptions *opt,
921 if (!opt) 919 if (!opt)
922 return 0; 920 return 0;
923 921
924 switch(optname) { 922 switch (optname) {
925 case IPV6_HOPOPTS: 923 case IPV6_HOPOPTS:
926 hdr = opt->hopopt; 924 hdr = opt->hopopt;
927 break; 925 break;
@@ -1284,9 +1282,9 @@ static int do_ipv6_getsockopt(struct sock *sk, int level, int optname,
1284 return -ENOPROTOOPT; 1282 return -ENOPROTOOPT;
1285 } 1283 }
1286 len = min_t(unsigned int, sizeof(int), len); 1284 len = min_t(unsigned int, sizeof(int), len);
1287 if(put_user(len, optlen)) 1285 if (put_user(len, optlen))
1288 return -EFAULT; 1286 return -EFAULT;
1289 if(copy_to_user(optval,&val,len)) 1287 if (copy_to_user(optval, &val, len))
1290 return -EFAULT; 1288 return -EFAULT;
1291 return 0; 1289 return 0;
1292} 1290}
@@ -1299,7 +1297,7 @@ int ipv6_getsockopt(struct sock *sk, int level, int optname,
1299 if (level == SOL_IP && sk->sk_type != SOCK_RAW) 1297 if (level == SOL_IP && sk->sk_type != SOCK_RAW)
1300 return udp_prot.getsockopt(sk, level, optname, optval, optlen); 1298 return udp_prot.getsockopt(sk, level, optname, optval, optlen);
1301 1299
1302 if(level != SOL_IPV6) 1300 if (level != SOL_IPV6)
1303 return -ENOPROTOOPT; 1301 return -ENOPROTOOPT;
1304 1302
1305 err = do_ipv6_getsockopt(sk, level, optname, optval, optlen, 0); 1303 err = do_ipv6_getsockopt(sk, level, optname, optval, optlen, 0);
@@ -1321,7 +1319,6 @@ int ipv6_getsockopt(struct sock *sk, int level, int optname,
1321#endif 1319#endif
1322 return err; 1320 return err;
1323} 1321}
1324
1325EXPORT_SYMBOL(ipv6_getsockopt); 1322EXPORT_SYMBOL(ipv6_getsockopt);
1326 1323
1327#ifdef CONFIG_COMPAT 1324#ifdef CONFIG_COMPAT
@@ -1364,7 +1361,6 @@ int compat_ipv6_getsockopt(struct sock *sk, int level, int optname,
1364#endif 1361#endif
1365 return err; 1362 return err;
1366} 1363}
1367
1368EXPORT_SYMBOL(compat_ipv6_getsockopt); 1364EXPORT_SYMBOL(compat_ipv6_getsockopt);
1369#endif 1365#endif
1370 1366
diff --git a/net/ipv6/mcast.c b/net/ipv6/mcast.c
index a23b655a7627..9648de2b6745 100644
--- a/net/ipv6/mcast.c
+++ b/net/ipv6/mcast.c
@@ -64,15 +64,6 @@
64 64
65#include <net/ip6_checksum.h> 65#include <net/ip6_checksum.h>
66 66
67/* Set to 3 to get tracing... */
68#define MCAST_DEBUG 2
69
70#if MCAST_DEBUG >= 3
71#define MDBG(x) printk x
72#else
73#define MDBG(x)
74#endif
75
76/* Ensure that we have struct in6_addr aligned on 32bit word. */ 67/* Ensure that we have struct in6_addr aligned on 32bit word. */
77static void *__mld2_query_bugs[] __attribute__((__unused__)) = { 68static void *__mld2_query_bugs[] __attribute__((__unused__)) = {
78 BUILD_BUG_ON_NULL(offsetof(struct mld2_query, mld2q_srcs) % 4), 69 BUILD_BUG_ON_NULL(offsetof(struct mld2_query, mld2q_srcs) % 4),
@@ -82,9 +73,6 @@ static void *__mld2_query_bugs[] __attribute__((__unused__)) = {
82 73
83static struct in6_addr mld2_all_mcr = MLD2_ALL_MCR_INIT; 74static struct in6_addr mld2_all_mcr = MLD2_ALL_MCR_INIT;
84 75
85/* Big mc list lock for all the sockets */
86static DEFINE_SPINLOCK(ipv6_sk_mc_lock);
87
88static void igmp6_join_group(struct ifmcaddr6 *ma); 76static void igmp6_join_group(struct ifmcaddr6 *ma);
89static void igmp6_leave_group(struct ifmcaddr6 *ma); 77static void igmp6_leave_group(struct ifmcaddr6 *ma);
90static void igmp6_timer_handler(unsigned long data); 78static void igmp6_timer_handler(unsigned long data);
@@ -121,6 +109,7 @@ static int ip6_mc_leave_src(struct sock *sk, struct ipv6_mc_socklist *iml,
121#define IPV6_MLD_MAX_MSF 64 109#define IPV6_MLD_MAX_MSF 64
122 110
123int sysctl_mld_max_msf __read_mostly = IPV6_MLD_MAX_MSF; 111int sysctl_mld_max_msf __read_mostly = IPV6_MLD_MAX_MSF;
112int sysctl_mld_qrv __read_mostly = MLD_QRV_DEFAULT;
124 113
125/* 114/*
126 * socket join on multicast group 115 * socket join on multicast group
@@ -173,7 +162,6 @@ int ipv6_sock_mc_join(struct sock *sk, int ifindex, const struct in6_addr *addr)
173 mc_lst->addr = *addr; 162 mc_lst->addr = *addr;
174 163
175 rtnl_lock(); 164 rtnl_lock();
176 rcu_read_lock();
177 if (ifindex == 0) { 165 if (ifindex == 0) {
178 struct rt6_info *rt; 166 struct rt6_info *rt;
179 rt = rt6_lookup(net, addr, NULL, 0, 0); 167 rt = rt6_lookup(net, addr, NULL, 0, 0);
@@ -182,10 +170,9 @@ int ipv6_sock_mc_join(struct sock *sk, int ifindex, const struct in6_addr *addr)
182 ip6_rt_put(rt); 170 ip6_rt_put(rt);
183 } 171 }
184 } else 172 } else
185 dev = dev_get_by_index_rcu(net, ifindex); 173 dev = __dev_get_by_index(net, ifindex);
186 174
187 if (dev == NULL) { 175 if (dev == NULL) {
188 rcu_read_unlock();
189 rtnl_unlock(); 176 rtnl_unlock();
190 sock_kfree_s(sk, mc_lst, sizeof(*mc_lst)); 177 sock_kfree_s(sk, mc_lst, sizeof(*mc_lst));
191 return -ENODEV; 178 return -ENODEV;
@@ -203,18 +190,14 @@ int ipv6_sock_mc_join(struct sock *sk, int ifindex, const struct in6_addr *addr)
203 err = ipv6_dev_mc_inc(dev, addr); 190 err = ipv6_dev_mc_inc(dev, addr);
204 191
205 if (err) { 192 if (err) {
206 rcu_read_unlock();
207 rtnl_unlock(); 193 rtnl_unlock();
208 sock_kfree_s(sk, mc_lst, sizeof(*mc_lst)); 194 sock_kfree_s(sk, mc_lst, sizeof(*mc_lst));
209 return err; 195 return err;
210 } 196 }
211 197
212 spin_lock(&ipv6_sk_mc_lock);
213 mc_lst->next = np->ipv6_mc_list; 198 mc_lst->next = np->ipv6_mc_list;
214 rcu_assign_pointer(np->ipv6_mc_list, mc_lst); 199 rcu_assign_pointer(np->ipv6_mc_list, mc_lst);
215 spin_unlock(&ipv6_sk_mc_lock);
216 200
217 rcu_read_unlock();
218 rtnl_unlock(); 201 rtnl_unlock();
219 202
220 return 0; 203 return 0;
@@ -234,20 +217,16 @@ int ipv6_sock_mc_drop(struct sock *sk, int ifindex, const struct in6_addr *addr)
234 return -EINVAL; 217 return -EINVAL;
235 218
236 rtnl_lock(); 219 rtnl_lock();
237 spin_lock(&ipv6_sk_mc_lock);
238 for (lnk = &np->ipv6_mc_list; 220 for (lnk = &np->ipv6_mc_list;
239 (mc_lst = rcu_dereference_protected(*lnk, 221 (mc_lst = rtnl_dereference(*lnk)) != NULL;
240 lockdep_is_held(&ipv6_sk_mc_lock))) !=NULL ;
241 lnk = &mc_lst->next) { 222 lnk = &mc_lst->next) {
242 if ((ifindex == 0 || mc_lst->ifindex == ifindex) && 223 if ((ifindex == 0 || mc_lst->ifindex == ifindex) &&
243 ipv6_addr_equal(&mc_lst->addr, addr)) { 224 ipv6_addr_equal(&mc_lst->addr, addr)) {
244 struct net_device *dev; 225 struct net_device *dev;
245 226
246 *lnk = mc_lst->next; 227 *lnk = mc_lst->next;
247 spin_unlock(&ipv6_sk_mc_lock);
248 228
249 rcu_read_lock(); 229 dev = __dev_get_by_index(net, mc_lst->ifindex);
250 dev = dev_get_by_index_rcu(net, mc_lst->ifindex);
251 if (dev != NULL) { 230 if (dev != NULL) {
252 struct inet6_dev *idev = __in6_dev_get(dev); 231 struct inet6_dev *idev = __in6_dev_get(dev);
253 232
@@ -256,7 +235,6 @@ int ipv6_sock_mc_drop(struct sock *sk, int ifindex, const struct in6_addr *addr)
256 __ipv6_dev_mc_dec(idev, &mc_lst->addr); 235 __ipv6_dev_mc_dec(idev, &mc_lst->addr);
257 } else 236 } else
258 (void) ip6_mc_leave_src(sk, mc_lst, NULL); 237 (void) ip6_mc_leave_src(sk, mc_lst, NULL);
259 rcu_read_unlock();
260 rtnl_unlock(); 238 rtnl_unlock();
261 239
262 atomic_sub(sizeof(*mc_lst), &sk->sk_omem_alloc); 240 atomic_sub(sizeof(*mc_lst), &sk->sk_omem_alloc);
@@ -264,7 +242,6 @@ int ipv6_sock_mc_drop(struct sock *sk, int ifindex, const struct in6_addr *addr)
264 return 0; 242 return 0;
265 } 243 }
266 } 244 }
267 spin_unlock(&ipv6_sk_mc_lock);
268 rtnl_unlock(); 245 rtnl_unlock();
269 246
270 return -EADDRNOTAVAIL; 247 return -EADDRNOTAVAIL;
@@ -311,16 +288,12 @@ void ipv6_sock_mc_close(struct sock *sk)
311 return; 288 return;
312 289
313 rtnl_lock(); 290 rtnl_lock();
314 spin_lock(&ipv6_sk_mc_lock); 291 while ((mc_lst = rtnl_dereference(np->ipv6_mc_list)) != NULL) {
315 while ((mc_lst = rcu_dereference_protected(np->ipv6_mc_list,
316 lockdep_is_held(&ipv6_sk_mc_lock))) != NULL) {
317 struct net_device *dev; 292 struct net_device *dev;
318 293
319 np->ipv6_mc_list = mc_lst->next; 294 np->ipv6_mc_list = mc_lst->next;
320 spin_unlock(&ipv6_sk_mc_lock);
321 295
322 rcu_read_lock(); 296 dev = __dev_get_by_index(net, mc_lst->ifindex);
323 dev = dev_get_by_index_rcu(net, mc_lst->ifindex);
324 if (dev) { 297 if (dev) {
325 struct inet6_dev *idev = __in6_dev_get(dev); 298 struct inet6_dev *idev = __in6_dev_get(dev);
326 299
@@ -329,14 +302,11 @@ void ipv6_sock_mc_close(struct sock *sk)
329 __ipv6_dev_mc_dec(idev, &mc_lst->addr); 302 __ipv6_dev_mc_dec(idev, &mc_lst->addr);
330 } else 303 } else
331 (void) ip6_mc_leave_src(sk, mc_lst, NULL); 304 (void) ip6_mc_leave_src(sk, mc_lst, NULL);
332 rcu_read_unlock();
333 305
334 atomic_sub(sizeof(*mc_lst), &sk->sk_omem_alloc); 306 atomic_sub(sizeof(*mc_lst), &sk->sk_omem_alloc);
335 kfree_rcu(mc_lst, rcu); 307 kfree_rcu(mc_lst, rcu);
336 308
337 spin_lock(&ipv6_sk_mc_lock);
338 } 309 }
339 spin_unlock(&ipv6_sk_mc_lock);
340 rtnl_unlock(); 310 rtnl_unlock();
341} 311}
342 312
@@ -400,7 +370,7 @@ int ip6_mc_source(int add, int omode, struct sock *sk,
400 if (!psl) 370 if (!psl)
401 goto done; /* err = -EADDRNOTAVAIL */ 371 goto done; /* err = -EADDRNOTAVAIL */
402 rv = !0; 372 rv = !0;
403 for (i=0; i<psl->sl_count; i++) { 373 for (i = 0; i < psl->sl_count; i++) {
404 rv = !ipv6_addr_equal(&psl->sl_addr[i], source); 374 rv = !ipv6_addr_equal(&psl->sl_addr[i], source);
405 if (rv == 0) 375 if (rv == 0)
406 break; 376 break;
@@ -417,7 +387,7 @@ int ip6_mc_source(int add, int omode, struct sock *sk,
417 /* update the interface filter */ 387 /* update the interface filter */
418 ip6_mc_del_src(idev, group, omode, 1, source, 1); 388 ip6_mc_del_src(idev, group, omode, 1, source, 1);
419 389
420 for (j=i+1; j<psl->sl_count; j++) 390 for (j = i+1; j < psl->sl_count; j++)
421 psl->sl_addr[j-1] = psl->sl_addr[j]; 391 psl->sl_addr[j-1] = psl->sl_addr[j];
422 psl->sl_count--; 392 psl->sl_count--;
423 err = 0; 393 err = 0;
@@ -443,19 +413,19 @@ int ip6_mc_source(int add, int omode, struct sock *sk,
443 newpsl->sl_max = count; 413 newpsl->sl_max = count;
444 newpsl->sl_count = count - IP6_SFBLOCK; 414 newpsl->sl_count = count - IP6_SFBLOCK;
445 if (psl) { 415 if (psl) {
446 for (i=0; i<psl->sl_count; i++) 416 for (i = 0; i < psl->sl_count; i++)
447 newpsl->sl_addr[i] = psl->sl_addr[i]; 417 newpsl->sl_addr[i] = psl->sl_addr[i];
448 sock_kfree_s(sk, psl, IP6_SFLSIZE(psl->sl_max)); 418 sock_kfree_s(sk, psl, IP6_SFLSIZE(psl->sl_max));
449 } 419 }
450 pmc->sflist = psl = newpsl; 420 pmc->sflist = psl = newpsl;
451 } 421 }
452 rv = 1; /* > 0 for insert logic below if sl_count is 0 */ 422 rv = 1; /* > 0 for insert logic below if sl_count is 0 */
453 for (i=0; i<psl->sl_count; i++) { 423 for (i = 0; i < psl->sl_count; i++) {
454 rv = !ipv6_addr_equal(&psl->sl_addr[i], source); 424 rv = !ipv6_addr_equal(&psl->sl_addr[i], source);
455 if (rv == 0) /* There is an error in the address. */ 425 if (rv == 0) /* There is an error in the address. */
456 goto done; 426 goto done;
457 } 427 }
458 for (j=psl->sl_count-1; j>=i; j--) 428 for (j = psl->sl_count-1; j >= i; j--)
459 psl->sl_addr[j+1] = psl->sl_addr[j]; 429 psl->sl_addr[j+1] = psl->sl_addr[j];
460 psl->sl_addr[i] = *source; 430 psl->sl_addr[i] = *source;
461 psl->sl_count++; 431 psl->sl_count++;
@@ -524,7 +494,7 @@ int ip6_mc_msfilter(struct sock *sk, struct group_filter *gsf)
524 goto done; 494 goto done;
525 } 495 }
526 newpsl->sl_max = newpsl->sl_count = gsf->gf_numsrc; 496 newpsl->sl_max = newpsl->sl_count = gsf->gf_numsrc;
527 for (i=0; i<newpsl->sl_count; ++i) { 497 for (i = 0; i < newpsl->sl_count; ++i) {
528 struct sockaddr_in6 *psin6; 498 struct sockaddr_in6 *psin6;
529 499
530 psin6 = (struct sockaddr_in6 *)&gsf->gf_slist[i]; 500 psin6 = (struct sockaddr_in6 *)&gsf->gf_slist[i];
@@ -586,9 +556,8 @@ int ip6_mc_msfget(struct sock *sk, struct group_filter *gsf,
586 } 556 }
587 557
588 err = -EADDRNOTAVAIL; 558 err = -EADDRNOTAVAIL;
589 /* 559 /* changes to the ipv6_mc_list require the socket lock and
590 * changes to the ipv6_mc_list require the socket lock and 560 * rtnl lock. We have the socket lock and rcu read lock,
591 * a read lock on ip6_sk_mc_lock. We have the socket lock,
592 * so reading the list is safe. 561 * so reading the list is safe.
593 */ 562 */
594 563
@@ -612,11 +581,10 @@ int ip6_mc_msfget(struct sock *sk, struct group_filter *gsf,
612 copy_to_user(optval, gsf, GROUP_FILTER_SIZE(0))) { 581 copy_to_user(optval, gsf, GROUP_FILTER_SIZE(0))) {
613 return -EFAULT; 582 return -EFAULT;
614 } 583 }
615 /* changes to psl require the socket lock, a read lock on 584 /* changes to psl require the socket lock, and a write lock
616 * on ipv6_sk_mc_lock and a write lock on pmc->sflock. We 585 * on pmc->sflock. We have the socket lock so reading here is safe.
617 * have the socket lock, so reading here is safe.
618 */ 586 */
619 for (i=0; i<copycount; i++) { 587 for (i = 0; i < copycount; i++) {
620 struct sockaddr_in6 *psin6; 588 struct sockaddr_in6 *psin6;
621 struct sockaddr_storage ss; 589 struct sockaddr_storage ss;
622 590
@@ -658,7 +626,7 @@ bool inet6_mc_check(struct sock *sk, const struct in6_addr *mc_addr,
658 } else { 626 } else {
659 int i; 627 int i;
660 628
661 for (i=0; i<psl->sl_count; i++) { 629 for (i = 0; i < psl->sl_count; i++) {
662 if (ipv6_addr_equal(&psl->sl_addr[i], src_addr)) 630 if (ipv6_addr_equal(&psl->sl_addr[i], src_addr))
663 break; 631 break;
664 } 632 }
@@ -673,14 +641,6 @@ bool inet6_mc_check(struct sock *sk, const struct in6_addr *mc_addr,
673 return rv; 641 return rv;
674} 642}
675 643
676static void ma_put(struct ifmcaddr6 *mc)
677{
678 if (atomic_dec_and_test(&mc->mca_refcnt)) {
679 in6_dev_put(mc->idev);
680 kfree(mc);
681 }
682}
683
684static void igmp6_group_added(struct ifmcaddr6 *mc) 644static void igmp6_group_added(struct ifmcaddr6 *mc)
685{ 645{
686 struct net_device *dev = mc->idev->dev; 646 struct net_device *dev = mc->idev->dev;
@@ -772,7 +732,7 @@ static void mld_add_delrec(struct inet6_dev *idev, struct ifmcaddr6 *im)
772 pmc->mca_tomb = im->mca_tomb; 732 pmc->mca_tomb = im->mca_tomb;
773 pmc->mca_sources = im->mca_sources; 733 pmc->mca_sources = im->mca_sources;
774 im->mca_tomb = im->mca_sources = NULL; 734 im->mca_tomb = im->mca_sources = NULL;
775 for (psf=pmc->mca_sources; psf; psf=psf->sf_next) 735 for (psf = pmc->mca_sources; psf; psf = psf->sf_next)
776 psf->sf_crcount = pmc->mca_crcount; 736 psf->sf_crcount = pmc->mca_crcount;
777 } 737 }
778 spin_unlock_bh(&im->mca_lock); 738 spin_unlock_bh(&im->mca_lock);
@@ -790,7 +750,7 @@ static void mld_del_delrec(struct inet6_dev *idev, const struct in6_addr *pmca)
790 750
791 spin_lock_bh(&idev->mc_lock); 751 spin_lock_bh(&idev->mc_lock);
792 pmc_prev = NULL; 752 pmc_prev = NULL;
793 for (pmc=idev->mc_tomb; pmc; pmc=pmc->next) { 753 for (pmc = idev->mc_tomb; pmc; pmc = pmc->next) {
794 if (ipv6_addr_equal(&pmc->mca_addr, pmca)) 754 if (ipv6_addr_equal(&pmc->mca_addr, pmca))
795 break; 755 break;
796 pmc_prev = pmc; 756 pmc_prev = pmc;
@@ -804,7 +764,7 @@ static void mld_del_delrec(struct inet6_dev *idev, const struct in6_addr *pmca)
804 spin_unlock_bh(&idev->mc_lock); 764 spin_unlock_bh(&idev->mc_lock);
805 765
806 if (pmc) { 766 if (pmc) {
807 for (psf=pmc->mca_tomb; psf; psf=psf_next) { 767 for (psf = pmc->mca_tomb; psf; psf = psf_next) {
808 psf_next = psf->sf_next; 768 psf_next = psf->sf_next;
809 kfree(psf); 769 kfree(psf);
810 } 770 }
@@ -831,14 +791,14 @@ static void mld_clear_delrec(struct inet6_dev *idev)
831 791
832 /* clear dead sources, too */ 792 /* clear dead sources, too */
833 read_lock_bh(&idev->lock); 793 read_lock_bh(&idev->lock);
834 for (pmc=idev->mc_list; pmc; pmc=pmc->next) { 794 for (pmc = idev->mc_list; pmc; pmc = pmc->next) {
835 struct ip6_sf_list *psf, *psf_next; 795 struct ip6_sf_list *psf, *psf_next;
836 796
837 spin_lock_bh(&pmc->mca_lock); 797 spin_lock_bh(&pmc->mca_lock);
838 psf = pmc->mca_tomb; 798 psf = pmc->mca_tomb;
839 pmc->mca_tomb = NULL; 799 pmc->mca_tomb = NULL;
840 spin_unlock_bh(&pmc->mca_lock); 800 spin_unlock_bh(&pmc->mca_lock);
841 for (; psf; psf=psf_next) { 801 for (; psf; psf = psf_next) {
842 psf_next = psf->sf_next; 802 psf_next = psf->sf_next;
843 kfree(psf); 803 kfree(psf);
844 } 804 }
@@ -846,6 +806,48 @@ static void mld_clear_delrec(struct inet6_dev *idev)
846 read_unlock_bh(&idev->lock); 806 read_unlock_bh(&idev->lock);
847} 807}
848 808
809static void mca_get(struct ifmcaddr6 *mc)
810{
811 atomic_inc(&mc->mca_refcnt);
812}
813
814static void ma_put(struct ifmcaddr6 *mc)
815{
816 if (atomic_dec_and_test(&mc->mca_refcnt)) {
817 in6_dev_put(mc->idev);
818 kfree(mc);
819 }
820}
821
822static struct ifmcaddr6 *mca_alloc(struct inet6_dev *idev,
823 const struct in6_addr *addr)
824{
825 struct ifmcaddr6 *mc;
826
827 mc = kzalloc(sizeof(*mc), GFP_ATOMIC);
828 if (mc == NULL)
829 return NULL;
830
831 setup_timer(&mc->mca_timer, igmp6_timer_handler, (unsigned long)mc);
832
833 mc->mca_addr = *addr;
834 mc->idev = idev; /* reference taken by caller */
835 mc->mca_users = 1;
836 /* mca_stamp should be updated upon changes */
837 mc->mca_cstamp = mc->mca_tstamp = jiffies;
838 atomic_set(&mc->mca_refcnt, 1);
839 spin_lock_init(&mc->mca_lock);
840
841 /* initial mode is (EX, empty) */
842 mc->mca_sfmode = MCAST_EXCLUDE;
843 mc->mca_sfcount[MCAST_EXCLUDE] = 1;
844
845 if (ipv6_addr_is_ll_all_nodes(&mc->mca_addr) ||
846 IPV6_ADDR_MC_SCOPE(&mc->mca_addr) < IPV6_ADDR_SCOPE_LINKLOCAL)
847 mc->mca_flags |= MAF_NOREPORT;
848
849 return mc;
850}
849 851
850/* 852/*
851 * device multicast group inc (add if not found) 853 * device multicast group inc (add if not found)
@@ -881,38 +883,20 @@ int ipv6_dev_mc_inc(struct net_device *dev, const struct in6_addr *addr)
881 } 883 }
882 } 884 }
883 885
884 /* 886 mc = mca_alloc(idev, addr);
885 * not found: create a new one. 887 if (!mc) {
886 */
887
888 mc = kzalloc(sizeof(struct ifmcaddr6), GFP_ATOMIC);
889
890 if (mc == NULL) {
891 write_unlock_bh(&idev->lock); 888 write_unlock_bh(&idev->lock);
892 in6_dev_put(idev); 889 in6_dev_put(idev);
893 return -ENOMEM; 890 return -ENOMEM;
894 } 891 }
895 892
896 setup_timer(&mc->mca_timer, igmp6_timer_handler, (unsigned long)mc);
897
898 mc->mca_addr = *addr;
899 mc->idev = idev; /* (reference taken) */
900 mc->mca_users = 1;
901 /* mca_stamp should be updated upon changes */
902 mc->mca_cstamp = mc->mca_tstamp = jiffies;
903 atomic_set(&mc->mca_refcnt, 2);
904 spin_lock_init(&mc->mca_lock);
905
906 /* initial mode is (EX, empty) */
907 mc->mca_sfmode = MCAST_EXCLUDE;
908 mc->mca_sfcount[MCAST_EXCLUDE] = 1;
909
910 if (ipv6_addr_is_ll_all_nodes(&mc->mca_addr) ||
911 IPV6_ADDR_MC_SCOPE(&mc->mca_addr) < IPV6_ADDR_SCOPE_LINKLOCAL)
912 mc->mca_flags |= MAF_NOREPORT;
913
914 mc->next = idev->mc_list; 893 mc->next = idev->mc_list;
915 idev->mc_list = mc; 894 idev->mc_list = mc;
895
896 /* Hold this for the code below before we unlock,
897 * it is already exposed via idev->mc_list.
898 */
899 mca_get(mc);
916 write_unlock_bh(&idev->lock); 900 write_unlock_bh(&idev->lock);
917 901
918 mld_del_delrec(idev, &mc->mca_addr); 902 mld_del_delrec(idev, &mc->mca_addr);
@@ -931,7 +915,7 @@ int __ipv6_dev_mc_dec(struct inet6_dev *idev, const struct in6_addr *addr)
931 ASSERT_RTNL(); 915 ASSERT_RTNL();
932 916
933 write_lock_bh(&idev->lock); 917 write_lock_bh(&idev->lock);
934 for (map = &idev->mc_list; (ma=*map) != NULL; map = &ma->next) { 918 for (map = &idev->mc_list; (ma = *map) != NULL; map = &ma->next) {
935 if (ipv6_addr_equal(&ma->mca_addr, addr)) { 919 if (ipv6_addr_equal(&ma->mca_addr, addr)) {
936 if (--ma->mca_users == 0) { 920 if (--ma->mca_users == 0) {
937 *map = ma->next; 921 *map = ma->next;
@@ -956,7 +940,7 @@ int ipv6_dev_mc_dec(struct net_device *dev, const struct in6_addr *addr)
956 struct inet6_dev *idev; 940 struct inet6_dev *idev;
957 int err; 941 int err;
958 942
959 rcu_read_lock(); 943 ASSERT_RTNL();
960 944
961 idev = __in6_dev_get(dev); 945 idev = __in6_dev_get(dev);
962 if (!idev) 946 if (!idev)
@@ -964,7 +948,6 @@ int ipv6_dev_mc_dec(struct net_device *dev, const struct in6_addr *addr)
964 else 948 else
965 err = __ipv6_dev_mc_dec(idev, addr); 949 err = __ipv6_dev_mc_dec(idev, addr);
966 950
967 rcu_read_unlock();
968 return err; 951 return err;
969} 952}
970 953
@@ -982,7 +965,7 @@ bool ipv6_chk_mcast_addr(struct net_device *dev, const struct in6_addr *group,
982 idev = __in6_dev_get(dev); 965 idev = __in6_dev_get(dev);
983 if (idev) { 966 if (idev) {
984 read_lock_bh(&idev->lock); 967 read_lock_bh(&idev->lock);
985 for (mc = idev->mc_list; mc; mc=mc->next) { 968 for (mc = idev->mc_list; mc; mc = mc->next) {
986 if (ipv6_addr_equal(&mc->mca_addr, group)) 969 if (ipv6_addr_equal(&mc->mca_addr, group))
987 break; 970 break;
988 } 971 }
@@ -991,7 +974,7 @@ bool ipv6_chk_mcast_addr(struct net_device *dev, const struct in6_addr *group,
991 struct ip6_sf_list *psf; 974 struct ip6_sf_list *psf;
992 975
993 spin_lock_bh(&mc->mca_lock); 976 spin_lock_bh(&mc->mca_lock);
994 for (psf=mc->mca_sources;psf;psf=psf->sf_next) { 977 for (psf = mc->mca_sources; psf; psf = psf->sf_next) {
995 if (ipv6_addr_equal(&psf->sf_addr, src_addr)) 978 if (ipv6_addr_equal(&psf->sf_addr, src_addr))
996 break; 979 break;
997 } 980 }
@@ -1000,7 +983,7 @@ bool ipv6_chk_mcast_addr(struct net_device *dev, const struct in6_addr *group,
1000 psf->sf_count[MCAST_EXCLUDE] != 983 psf->sf_count[MCAST_EXCLUDE] !=
1001 mc->mca_sfcount[MCAST_EXCLUDE]; 984 mc->mca_sfcount[MCAST_EXCLUDE];
1002 else 985 else
1003 rv = mc->mca_sfcount[MCAST_EXCLUDE] !=0; 986 rv = mc->mca_sfcount[MCAST_EXCLUDE] != 0;
1004 spin_unlock_bh(&mc->mca_lock); 987 spin_unlock_bh(&mc->mca_lock);
1005 } else 988 } else
1006 rv = true; /* don't filter unspecified source */ 989 rv = true; /* don't filter unspecified source */
@@ -1091,10 +1074,10 @@ static bool mld_xmarksources(struct ifmcaddr6 *pmc, int nsrcs,
1091 int i, scount; 1074 int i, scount;
1092 1075
1093 scount = 0; 1076 scount = 0;
1094 for (psf=pmc->mca_sources; psf; psf=psf->sf_next) { 1077 for (psf = pmc->mca_sources; psf; psf = psf->sf_next) {
1095 if (scount == nsrcs) 1078 if (scount == nsrcs)
1096 break; 1079 break;
1097 for (i=0; i<nsrcs; i++) { 1080 for (i = 0; i < nsrcs; i++) {
1098 /* skip inactive filters */ 1081 /* skip inactive filters */
1099 if (psf->sf_count[MCAST_INCLUDE] || 1082 if (psf->sf_count[MCAST_INCLUDE] ||
1100 pmc->mca_sfcount[MCAST_EXCLUDE] != 1083 pmc->mca_sfcount[MCAST_EXCLUDE] !=
@@ -1124,10 +1107,10 @@ static bool mld_marksources(struct ifmcaddr6 *pmc, int nsrcs,
1124 /* mark INCLUDE-mode sources */ 1107 /* mark INCLUDE-mode sources */
1125 1108
1126 scount = 0; 1109 scount = 0;
1127 for (psf=pmc->mca_sources; psf; psf=psf->sf_next) { 1110 for (psf = pmc->mca_sources; psf; psf = psf->sf_next) {
1128 if (scount == nsrcs) 1111 if (scount == nsrcs)
1129 break; 1112 break;
1130 for (i=0; i<nsrcs; i++) { 1113 for (i = 0; i < nsrcs; i++) {
1131 if (ipv6_addr_equal(&srcs[i], &psf->sf_addr)) { 1114 if (ipv6_addr_equal(&srcs[i], &psf->sf_addr)) {
1132 psf->sf_gsresp = 1; 1115 psf->sf_gsresp = 1;
1133 scount++; 1116 scount++;
@@ -1205,15 +1188,16 @@ static void mld_update_qrv(struct inet6_dev *idev,
1205 * and SHOULD NOT be one. Catch this here if we ever run 1188 * and SHOULD NOT be one. Catch this here if we ever run
1206 * into such a case in future. 1189 * into such a case in future.
1207 */ 1190 */
1191 const int min_qrv = min(MLD_QRV_DEFAULT, sysctl_mld_qrv);
1208 WARN_ON(idev->mc_qrv == 0); 1192 WARN_ON(idev->mc_qrv == 0);
1209 1193
1210 if (mlh2->mld2q_qrv > 0) 1194 if (mlh2->mld2q_qrv > 0)
1211 idev->mc_qrv = mlh2->mld2q_qrv; 1195 idev->mc_qrv = mlh2->mld2q_qrv;
1212 1196
1213 if (unlikely(idev->mc_qrv < 2)) { 1197 if (unlikely(idev->mc_qrv < min_qrv)) {
1214 net_warn_ratelimited("IPv6: MLD: clamping QRV from %u to %u!\n", 1198 net_warn_ratelimited("IPv6: MLD: clamping QRV from %u to %u!\n",
1215 idev->mc_qrv, MLD_QRV_DEFAULT); 1199 idev->mc_qrv, min_qrv);
1216 idev->mc_qrv = MLD_QRV_DEFAULT; 1200 idev->mc_qrv = min_qrv;
1217 } 1201 }
1218} 1202}
1219 1203
@@ -1253,7 +1237,7 @@ static void mld_update_qri(struct inet6_dev *idev,
1253} 1237}
1254 1238
1255static int mld_process_v1(struct inet6_dev *idev, struct mld_msg *mld, 1239static int mld_process_v1(struct inet6_dev *idev, struct mld_msg *mld,
1256 unsigned long *max_delay) 1240 unsigned long *max_delay, bool v1_query)
1257{ 1241{
1258 unsigned long mldv1_md; 1242 unsigned long mldv1_md;
1259 1243
@@ -1261,11 +1245,32 @@ static int mld_process_v1(struct inet6_dev *idev, struct mld_msg *mld,
1261 if (mld_in_v2_mode_only(idev)) 1245 if (mld_in_v2_mode_only(idev))
1262 return -EINVAL; 1246 return -EINVAL;
1263 1247
1264 /* MLDv1 router present */
1265 mldv1_md = ntohs(mld->mld_maxdelay); 1248 mldv1_md = ntohs(mld->mld_maxdelay);
1249
1250 /* When in MLDv1 fallback and a MLDv2 router start-up being
1251 * unaware of current MLDv1 operation, the MRC == MRD mapping
1252 * only works when the exponential algorithm is not being
1253 * used (as MLDv1 is unaware of such things).
1254 *
1255 * According to the RFC author, the MLDv2 implementations
1256 * he's aware of all use a MRC < 32768 on start up queries.
1257 *
1258 * Thus, should we *ever* encounter something else larger
1259 * than that, just assume the maximum possible within our
1260 * reach.
1261 */
1262 if (!v1_query)
1263 mldv1_md = min(mldv1_md, MLDV1_MRD_MAX_COMPAT);
1264
1266 *max_delay = max(msecs_to_jiffies(mldv1_md), 1UL); 1265 *max_delay = max(msecs_to_jiffies(mldv1_md), 1UL);
1267 1266
1268 mld_set_v1_mode(idev); 1267 /* MLDv1 router present: we need to go into v1 mode *only*
1268 * when an MLDv1 query is received as per section 9.12. of
1269 * RFC3810! And we know from RFC2710 section 3.7 that MLDv1
1270 * queries MUST be of exactly 24 octets.
1271 */
1272 if (v1_query)
1273 mld_set_v1_mode(idev);
1269 1274
1270 /* cancel MLDv2 report timer */ 1275 /* cancel MLDv2 report timer */
1271 mld_gq_stop_timer(idev); 1276 mld_gq_stop_timer(idev);
@@ -1280,10 +1285,6 @@ static int mld_process_v1(struct inet6_dev *idev, struct mld_msg *mld,
1280static int mld_process_v2(struct inet6_dev *idev, struct mld2_query *mld, 1285static int mld_process_v2(struct inet6_dev *idev, struct mld2_query *mld,
1281 unsigned long *max_delay) 1286 unsigned long *max_delay)
1282{ 1287{
1283 /* hosts need to stay in MLDv1 mode, discard MLDv2 queries */
1284 if (mld_in_v1_mode(idev))
1285 return -EINVAL;
1286
1287 *max_delay = max(msecs_to_jiffies(mldv2_mrc(mld)), 1UL); 1288 *max_delay = max(msecs_to_jiffies(mldv2_mrc(mld)), 1UL);
1288 1289
1289 mld_update_qrv(idev, mld); 1290 mld_update_qrv(idev, mld);
@@ -1340,8 +1341,11 @@ int igmp6_event_query(struct sk_buff *skb)
1340 !(group_type&IPV6_ADDR_MULTICAST)) 1341 !(group_type&IPV6_ADDR_MULTICAST))
1341 return -EINVAL; 1342 return -EINVAL;
1342 1343
1343 if (len == MLD_V1_QUERY_LEN) { 1344 if (len < MLD_V1_QUERY_LEN) {
1344 err = mld_process_v1(idev, mld, &max_delay); 1345 return -EINVAL;
1346 } else if (len == MLD_V1_QUERY_LEN || mld_in_v1_mode(idev)) {
1347 err = mld_process_v1(idev, mld, &max_delay,
1348 len == MLD_V1_QUERY_LEN);
1345 if (err < 0) 1349 if (err < 0)
1346 return err; 1350 return err;
1347 } else if (len >= MLD_V2_QUERY_LEN_MIN) { 1351 } else if (len >= MLD_V2_QUERY_LEN_MIN) {
@@ -1373,18 +1377,19 @@ int igmp6_event_query(struct sk_buff *skb)
1373 mlh2 = (struct mld2_query *)skb_transport_header(skb); 1377 mlh2 = (struct mld2_query *)skb_transport_header(skb);
1374 mark = 1; 1378 mark = 1;
1375 } 1379 }
1376 } else 1380 } else {
1377 return -EINVAL; 1381 return -EINVAL;
1382 }
1378 1383
1379 read_lock_bh(&idev->lock); 1384 read_lock_bh(&idev->lock);
1380 if (group_type == IPV6_ADDR_ANY) { 1385 if (group_type == IPV6_ADDR_ANY) {
1381 for (ma = idev->mc_list; ma; ma=ma->next) { 1386 for (ma = idev->mc_list; ma; ma = ma->next) {
1382 spin_lock_bh(&ma->mca_lock); 1387 spin_lock_bh(&ma->mca_lock);
1383 igmp6_group_queried(ma, max_delay); 1388 igmp6_group_queried(ma, max_delay);
1384 spin_unlock_bh(&ma->mca_lock); 1389 spin_unlock_bh(&ma->mca_lock);
1385 } 1390 }
1386 } else { 1391 } else {
1387 for (ma = idev->mc_list; ma; ma=ma->next) { 1392 for (ma = idev->mc_list; ma; ma = ma->next) {
1388 if (!ipv6_addr_equal(group, &ma->mca_addr)) 1393 if (!ipv6_addr_equal(group, &ma->mca_addr))
1389 continue; 1394 continue;
1390 spin_lock_bh(&ma->mca_lock); 1395 spin_lock_bh(&ma->mca_lock);
@@ -1448,7 +1453,7 @@ int igmp6_event_report(struct sk_buff *skb)
1448 */ 1453 */
1449 1454
1450 read_lock_bh(&idev->lock); 1455 read_lock_bh(&idev->lock);
1451 for (ma = idev->mc_list; ma; ma=ma->next) { 1456 for (ma = idev->mc_list; ma; ma = ma->next) {
1452 if (ipv6_addr_equal(&ma->mca_addr, &mld->mld_mca)) { 1457 if (ipv6_addr_equal(&ma->mca_addr, &mld->mld_mca)) {
1453 spin_lock(&ma->mca_lock); 1458 spin_lock(&ma->mca_lock);
1454 if (del_timer(&ma->mca_timer)) 1459 if (del_timer(&ma->mca_timer))
@@ -1512,7 +1517,7 @@ mld_scount(struct ifmcaddr6 *pmc, int type, int gdeleted, int sdeleted)
1512 struct ip6_sf_list *psf; 1517 struct ip6_sf_list *psf;
1513 int scount = 0; 1518 int scount = 0;
1514 1519
1515 for (psf=pmc->mca_sources; psf; psf=psf->sf_next) { 1520 for (psf = pmc->mca_sources; psf; psf = psf->sf_next) {
1516 if (!is_in(pmc, psf, type, gdeleted, sdeleted)) 1521 if (!is_in(pmc, psf, type, gdeleted, sdeleted))
1517 continue; 1522 continue;
1518 scount++; 1523 scount++;
@@ -1726,7 +1731,7 @@ static struct sk_buff *add_grec(struct sk_buff *skb, struct ifmcaddr6 *pmc,
1726 } 1731 }
1727 first = 1; 1732 first = 1;
1728 psf_prev = NULL; 1733 psf_prev = NULL;
1729 for (psf=*psf_list; psf; psf=psf_next) { 1734 for (psf = *psf_list; psf; psf = psf_next) {
1730 struct in6_addr *psrc; 1735 struct in6_addr *psrc;
1731 1736
1732 psf_next = psf->sf_next; 1737 psf_next = psf->sf_next;
@@ -1805,7 +1810,7 @@ static void mld_send_report(struct inet6_dev *idev, struct ifmcaddr6 *pmc)
1805 1810
1806 read_lock_bh(&idev->lock); 1811 read_lock_bh(&idev->lock);
1807 if (!pmc) { 1812 if (!pmc) {
1808 for (pmc=idev->mc_list; pmc; pmc=pmc->next) { 1813 for (pmc = idev->mc_list; pmc; pmc = pmc->next) {
1809 if (pmc->mca_flags & MAF_NOREPORT) 1814 if (pmc->mca_flags & MAF_NOREPORT)
1810 continue; 1815 continue;
1811 spin_lock_bh(&pmc->mca_lock); 1816 spin_lock_bh(&pmc->mca_lock);
@@ -1838,7 +1843,7 @@ static void mld_clear_zeros(struct ip6_sf_list **ppsf)
1838 struct ip6_sf_list *psf_prev, *psf_next, *psf; 1843 struct ip6_sf_list *psf_prev, *psf_next, *psf;
1839 1844
1840 psf_prev = NULL; 1845 psf_prev = NULL;
1841 for (psf=*ppsf; psf; psf = psf_next) { 1846 for (psf = *ppsf; psf; psf = psf_next) {
1842 psf_next = psf->sf_next; 1847 psf_next = psf->sf_next;
1843 if (psf->sf_crcount == 0) { 1848 if (psf->sf_crcount == 0) {
1844 if (psf_prev) 1849 if (psf_prev)
@@ -1862,7 +1867,7 @@ static void mld_send_cr(struct inet6_dev *idev)
1862 1867
1863 /* deleted MCA's */ 1868 /* deleted MCA's */
1864 pmc_prev = NULL; 1869 pmc_prev = NULL;
1865 for (pmc=idev->mc_tomb; pmc; pmc=pmc_next) { 1870 for (pmc = idev->mc_tomb; pmc; pmc = pmc_next) {
1866 pmc_next = pmc->next; 1871 pmc_next = pmc->next;
1867 if (pmc->mca_sfmode == MCAST_INCLUDE) { 1872 if (pmc->mca_sfmode == MCAST_INCLUDE) {
1868 type = MLD2_BLOCK_OLD_SOURCES; 1873 type = MLD2_BLOCK_OLD_SOURCES;
@@ -1895,7 +1900,7 @@ static void mld_send_cr(struct inet6_dev *idev)
1895 spin_unlock(&idev->mc_lock); 1900 spin_unlock(&idev->mc_lock);
1896 1901
1897 /* change recs */ 1902 /* change recs */
1898 for (pmc=idev->mc_list; pmc; pmc=pmc->next) { 1903 for (pmc = idev->mc_list; pmc; pmc = pmc->next) {
1899 spin_lock_bh(&pmc->mca_lock); 1904 spin_lock_bh(&pmc->mca_lock);
1900 if (pmc->mca_sfcount[MCAST_EXCLUDE]) { 1905 if (pmc->mca_sfcount[MCAST_EXCLUDE]) {
1901 type = MLD2_BLOCK_OLD_SOURCES; 1906 type = MLD2_BLOCK_OLD_SOURCES;
@@ -2032,7 +2037,7 @@ static void mld_send_initial_cr(struct inet6_dev *idev)
2032 2037
2033 skb = NULL; 2038 skb = NULL;
2034 read_lock_bh(&idev->lock); 2039 read_lock_bh(&idev->lock);
2035 for (pmc=idev->mc_list; pmc; pmc=pmc->next) { 2040 for (pmc = idev->mc_list; pmc; pmc = pmc->next) {
2036 spin_lock_bh(&pmc->mca_lock); 2041 spin_lock_bh(&pmc->mca_lock);
2037 if (pmc->mca_sfcount[MCAST_EXCLUDE]) 2042 if (pmc->mca_sfcount[MCAST_EXCLUDE])
2038 type = MLD2_CHANGE_TO_EXCLUDE; 2043 type = MLD2_CHANGE_TO_EXCLUDE;
@@ -2077,7 +2082,7 @@ static int ip6_mc_del1_src(struct ifmcaddr6 *pmc, int sfmode,
2077 int rv = 0; 2082 int rv = 0;
2078 2083
2079 psf_prev = NULL; 2084 psf_prev = NULL;
2080 for (psf=pmc->mca_sources; psf; psf=psf->sf_next) { 2085 for (psf = pmc->mca_sources; psf; psf = psf->sf_next) {
2081 if (ipv6_addr_equal(&psf->sf_addr, psfsrc)) 2086 if (ipv6_addr_equal(&psf->sf_addr, psfsrc))
2082 break; 2087 break;
2083 psf_prev = psf; 2088 psf_prev = psf;
@@ -2118,7 +2123,7 @@ static int ip6_mc_del_src(struct inet6_dev *idev, const struct in6_addr *pmca,
2118 if (!idev) 2123 if (!idev)
2119 return -ENODEV; 2124 return -ENODEV;
2120 read_lock_bh(&idev->lock); 2125 read_lock_bh(&idev->lock);
2121 for (pmc=idev->mc_list; pmc; pmc=pmc->next) { 2126 for (pmc = idev->mc_list; pmc; pmc = pmc->next) {
2122 if (ipv6_addr_equal(pmca, &pmc->mca_addr)) 2127 if (ipv6_addr_equal(pmca, &pmc->mca_addr))
2123 break; 2128 break;
2124 } 2129 }
@@ -2138,7 +2143,7 @@ static int ip6_mc_del_src(struct inet6_dev *idev, const struct in6_addr *pmca,
2138 pmc->mca_sfcount[sfmode]--; 2143 pmc->mca_sfcount[sfmode]--;
2139 } 2144 }
2140 err = 0; 2145 err = 0;
2141 for (i=0; i<sfcount; i++) { 2146 for (i = 0; i < sfcount; i++) {
2142 int rv = ip6_mc_del1_src(pmc, sfmode, &psfsrc[i]); 2147 int rv = ip6_mc_del1_src(pmc, sfmode, &psfsrc[i]);
2143 2148
2144 changerec |= rv > 0; 2149 changerec |= rv > 0;
@@ -2154,7 +2159,7 @@ static int ip6_mc_del_src(struct inet6_dev *idev, const struct in6_addr *pmca,
2154 pmc->mca_sfmode = MCAST_INCLUDE; 2159 pmc->mca_sfmode = MCAST_INCLUDE;
2155 pmc->mca_crcount = idev->mc_qrv; 2160 pmc->mca_crcount = idev->mc_qrv;
2156 idev->mc_ifc_count = pmc->mca_crcount; 2161 idev->mc_ifc_count = pmc->mca_crcount;
2157 for (psf=pmc->mca_sources; psf; psf = psf->sf_next) 2162 for (psf = pmc->mca_sources; psf; psf = psf->sf_next)
2158 psf->sf_crcount = 0; 2163 psf->sf_crcount = 0;
2159 mld_ifc_event(pmc->idev); 2164 mld_ifc_event(pmc->idev);
2160 } else if (sf_setstate(pmc) || changerec) 2165 } else if (sf_setstate(pmc) || changerec)
@@ -2173,7 +2178,7 @@ static int ip6_mc_add1_src(struct ifmcaddr6 *pmc, int sfmode,
2173 struct ip6_sf_list *psf, *psf_prev; 2178 struct ip6_sf_list *psf, *psf_prev;
2174 2179
2175 psf_prev = NULL; 2180 psf_prev = NULL;
2176 for (psf=pmc->mca_sources; psf; psf=psf->sf_next) { 2181 for (psf = pmc->mca_sources; psf; psf = psf->sf_next) {
2177 if (ipv6_addr_equal(&psf->sf_addr, psfsrc)) 2182 if (ipv6_addr_equal(&psf->sf_addr, psfsrc))
2178 break; 2183 break;
2179 psf_prev = psf; 2184 psf_prev = psf;
@@ -2198,7 +2203,7 @@ static void sf_markstate(struct ifmcaddr6 *pmc)
2198 struct ip6_sf_list *psf; 2203 struct ip6_sf_list *psf;
2199 int mca_xcount = pmc->mca_sfcount[MCAST_EXCLUDE]; 2204 int mca_xcount = pmc->mca_sfcount[MCAST_EXCLUDE];
2200 2205
2201 for (psf=pmc->mca_sources; psf; psf=psf->sf_next) 2206 for (psf = pmc->mca_sources; psf; psf = psf->sf_next)
2202 if (pmc->mca_sfcount[MCAST_EXCLUDE]) { 2207 if (pmc->mca_sfcount[MCAST_EXCLUDE]) {
2203 psf->sf_oldin = mca_xcount == 2208 psf->sf_oldin = mca_xcount ==
2204 psf->sf_count[MCAST_EXCLUDE] && 2209 psf->sf_count[MCAST_EXCLUDE] &&
@@ -2215,7 +2220,7 @@ static int sf_setstate(struct ifmcaddr6 *pmc)
2215 int new_in, rv; 2220 int new_in, rv;
2216 2221
2217 rv = 0; 2222 rv = 0;
2218 for (psf=pmc->mca_sources; psf; psf=psf->sf_next) { 2223 for (psf = pmc->mca_sources; psf; psf = psf->sf_next) {
2219 if (pmc->mca_sfcount[MCAST_EXCLUDE]) { 2224 if (pmc->mca_sfcount[MCAST_EXCLUDE]) {
2220 new_in = mca_xcount == psf->sf_count[MCAST_EXCLUDE] && 2225 new_in = mca_xcount == psf->sf_count[MCAST_EXCLUDE] &&
2221 !psf->sf_count[MCAST_INCLUDE]; 2226 !psf->sf_count[MCAST_INCLUDE];
@@ -2225,8 +2230,8 @@ static int sf_setstate(struct ifmcaddr6 *pmc)
2225 if (!psf->sf_oldin) { 2230 if (!psf->sf_oldin) {
2226 struct ip6_sf_list *prev = NULL; 2231 struct ip6_sf_list *prev = NULL;
2227 2232
2228 for (dpsf=pmc->mca_tomb; dpsf; 2233 for (dpsf = pmc->mca_tomb; dpsf;
2229 dpsf=dpsf->sf_next) { 2234 dpsf = dpsf->sf_next) {
2230 if (ipv6_addr_equal(&dpsf->sf_addr, 2235 if (ipv6_addr_equal(&dpsf->sf_addr,
2231 &psf->sf_addr)) 2236 &psf->sf_addr))
2232 break; 2237 break;
@@ -2248,7 +2253,7 @@ static int sf_setstate(struct ifmcaddr6 *pmc)
2248 * add or update "delete" records if an active filter 2253 * add or update "delete" records if an active filter
2249 * is now inactive 2254 * is now inactive
2250 */ 2255 */
2251 for (dpsf=pmc->mca_tomb; dpsf; dpsf=dpsf->sf_next) 2256 for (dpsf = pmc->mca_tomb; dpsf; dpsf = dpsf->sf_next)
2252 if (ipv6_addr_equal(&dpsf->sf_addr, 2257 if (ipv6_addr_equal(&dpsf->sf_addr,
2253 &psf->sf_addr)) 2258 &psf->sf_addr))
2254 break; 2259 break;
@@ -2282,7 +2287,7 @@ static int ip6_mc_add_src(struct inet6_dev *idev, const struct in6_addr *pmca,
2282 if (!idev) 2287 if (!idev)
2283 return -ENODEV; 2288 return -ENODEV;
2284 read_lock_bh(&idev->lock); 2289 read_lock_bh(&idev->lock);
2285 for (pmc=idev->mc_list; pmc; pmc=pmc->next) { 2290 for (pmc = idev->mc_list; pmc; pmc = pmc->next) {
2286 if (ipv6_addr_equal(pmca, &pmc->mca_addr)) 2291 if (ipv6_addr_equal(pmca, &pmc->mca_addr))
2287 break; 2292 break;
2288 } 2293 }
@@ -2298,7 +2303,7 @@ static int ip6_mc_add_src(struct inet6_dev *idev, const struct in6_addr *pmca,
2298 if (!delta) 2303 if (!delta)
2299 pmc->mca_sfcount[sfmode]++; 2304 pmc->mca_sfcount[sfmode]++;
2300 err = 0; 2305 err = 0;
2301 for (i=0; i<sfcount; i++) { 2306 for (i = 0; i < sfcount; i++) {
2302 err = ip6_mc_add1_src(pmc, sfmode, &psfsrc[i]); 2307 err = ip6_mc_add1_src(pmc, sfmode, &psfsrc[i]);
2303 if (err) 2308 if (err)
2304 break; 2309 break;
@@ -2308,7 +2313,7 @@ static int ip6_mc_add_src(struct inet6_dev *idev, const struct in6_addr *pmca,
2308 2313
2309 if (!delta) 2314 if (!delta)
2310 pmc->mca_sfcount[sfmode]--; 2315 pmc->mca_sfcount[sfmode]--;
2311 for (j=0; j<i; j++) 2316 for (j = 0; j < i; j++)
2312 ip6_mc_del1_src(pmc, sfmode, &psfsrc[j]); 2317 ip6_mc_del1_src(pmc, sfmode, &psfsrc[j]);
2313 } else if (isexclude != (pmc->mca_sfcount[MCAST_EXCLUDE] != 0)) { 2318 } else if (isexclude != (pmc->mca_sfcount[MCAST_EXCLUDE] != 0)) {
2314 struct ip6_sf_list *psf; 2319 struct ip6_sf_list *psf;
@@ -2322,7 +2327,7 @@ static int ip6_mc_add_src(struct inet6_dev *idev, const struct in6_addr *pmca,
2322 2327
2323 pmc->mca_crcount = idev->mc_qrv; 2328 pmc->mca_crcount = idev->mc_qrv;
2324 idev->mc_ifc_count = pmc->mca_crcount; 2329 idev->mc_ifc_count = pmc->mca_crcount;
2325 for (psf=pmc->mca_sources; psf; psf = psf->sf_next) 2330 for (psf = pmc->mca_sources; psf; psf = psf->sf_next)
2326 psf->sf_crcount = 0; 2331 psf->sf_crcount = 0;
2327 mld_ifc_event(idev); 2332 mld_ifc_event(idev);
2328 } else if (sf_setstate(pmc)) 2333 } else if (sf_setstate(pmc))
@@ -2336,12 +2341,12 @@ static void ip6_mc_clear_src(struct ifmcaddr6 *pmc)
2336{ 2341{
2337 struct ip6_sf_list *psf, *nextpsf; 2342 struct ip6_sf_list *psf, *nextpsf;
2338 2343
2339 for (psf=pmc->mca_tomb; psf; psf=nextpsf) { 2344 for (psf = pmc->mca_tomb; psf; psf = nextpsf) {
2340 nextpsf = psf->sf_next; 2345 nextpsf = psf->sf_next;
2341 kfree(psf); 2346 kfree(psf);
2342 } 2347 }
2343 pmc->mca_tomb = NULL; 2348 pmc->mca_tomb = NULL;
2344 for (psf=pmc->mca_sources; psf; psf=nextpsf) { 2349 for (psf = pmc->mca_sources; psf; psf = nextpsf) {
2345 nextpsf = psf->sf_next; 2350 nextpsf = psf->sf_next;
2346 kfree(psf); 2351 kfree(psf);
2347 } 2352 }
@@ -2380,7 +2385,7 @@ static int ip6_mc_leave_src(struct sock *sk, struct ipv6_mc_socklist *iml,
2380{ 2385{
2381 int err; 2386 int err;
2382 2387
2383 /* callers have the socket lock and a write lock on ipv6_sk_mc_lock, 2388 /* callers have the socket lock and rtnl lock
2384 * so no other readers or writers of iml or its sflist 2389 * so no other readers or writers of iml or its sflist
2385 */ 2390 */
2386 if (!iml->sflist) { 2391 if (!iml->sflist) {
@@ -2485,13 +2490,21 @@ void ipv6_mc_down(struct inet6_dev *idev)
2485 mld_gq_stop_timer(idev); 2490 mld_gq_stop_timer(idev);
2486 mld_dad_stop_timer(idev); 2491 mld_dad_stop_timer(idev);
2487 2492
2488 for (i = idev->mc_list; i; i=i->next) 2493 for (i = idev->mc_list; i; i = i->next)
2489 igmp6_group_dropped(i); 2494 igmp6_group_dropped(i);
2490 read_unlock_bh(&idev->lock); 2495 read_unlock_bh(&idev->lock);
2491 2496
2492 mld_clear_delrec(idev); 2497 mld_clear_delrec(idev);
2493} 2498}
2494 2499
2500static void ipv6_mc_reset(struct inet6_dev *idev)
2501{
2502 idev->mc_qrv = sysctl_mld_qrv;
2503 idev->mc_qi = MLD_QI_DEFAULT;
2504 idev->mc_qri = MLD_QRI_DEFAULT;
2505 idev->mc_v1_seen = 0;
2506 idev->mc_maxdelay = unsolicited_report_interval(idev);
2507}
2495 2508
2496/* Device going up */ 2509/* Device going up */
2497 2510
@@ -2502,7 +2515,8 @@ void ipv6_mc_up(struct inet6_dev *idev)
2502 /* Install multicast list, except for all-nodes (already installed) */ 2515 /* Install multicast list, except for all-nodes (already installed) */
2503 2516
2504 read_lock_bh(&idev->lock); 2517 read_lock_bh(&idev->lock);
2505 for (i = idev->mc_list; i; i=i->next) 2518 ipv6_mc_reset(idev);
2519 for (i = idev->mc_list; i; i = i->next)
2506 igmp6_group_added(i); 2520 igmp6_group_added(i);
2507 read_unlock_bh(&idev->lock); 2521 read_unlock_bh(&idev->lock);
2508} 2522}
@@ -2522,13 +2536,7 @@ void ipv6_mc_init_dev(struct inet6_dev *idev)
2522 (unsigned long)idev); 2536 (unsigned long)idev);
2523 setup_timer(&idev->mc_dad_timer, mld_dad_timer_expire, 2537 setup_timer(&idev->mc_dad_timer, mld_dad_timer_expire,
2524 (unsigned long)idev); 2538 (unsigned long)idev);
2525 2539 ipv6_mc_reset(idev);
2526 idev->mc_qrv = MLD_QRV_DEFAULT;
2527 idev->mc_qi = MLD_QI_DEFAULT;
2528 idev->mc_qri = MLD_QRI_DEFAULT;
2529
2530 idev->mc_maxdelay = unsolicited_report_interval(idev);
2531 idev->mc_v1_seen = 0;
2532 write_unlock_bh(&idev->lock); 2540 write_unlock_bh(&idev->lock);
2533} 2541}
2534 2542
diff --git a/net/ipv6/mip6.c b/net/ipv6/mip6.c
index db9b6cbc9db3..f61429d391d3 100644
--- a/net/ipv6/mip6.c
+++ b/net/ipv6/mip6.c
@@ -336,11 +336,10 @@ static void mip6_destopt_destroy(struct xfrm_state *x)
336{ 336{
337} 337}
338 338
339static const struct xfrm_type mip6_destopt_type = 339static const struct xfrm_type mip6_destopt_type = {
340{
341 .description = "MIP6DESTOPT", 340 .description = "MIP6DESTOPT",
342 .owner = THIS_MODULE, 341 .owner = THIS_MODULE,
343 .proto = IPPROTO_DSTOPTS, 342 .proto = IPPROTO_DSTOPTS,
344 .flags = XFRM_TYPE_NON_FRAGMENT | XFRM_TYPE_LOCAL_COADDR, 343 .flags = XFRM_TYPE_NON_FRAGMENT | XFRM_TYPE_LOCAL_COADDR,
345 .init_state = mip6_destopt_init_state, 344 .init_state = mip6_destopt_init_state,
346 .destructor = mip6_destopt_destroy, 345 .destructor = mip6_destopt_destroy,
@@ -469,11 +468,10 @@ static void mip6_rthdr_destroy(struct xfrm_state *x)
469{ 468{
470} 469}
471 470
472static const struct xfrm_type mip6_rthdr_type = 471static const struct xfrm_type mip6_rthdr_type = {
473{
474 .description = "MIP6RT", 472 .description = "MIP6RT",
475 .owner = THIS_MODULE, 473 .owner = THIS_MODULE,
476 .proto = IPPROTO_ROUTING, 474 .proto = IPPROTO_ROUTING,
477 .flags = XFRM_TYPE_NON_FRAGMENT | XFRM_TYPE_REMOTE_COADDR, 475 .flags = XFRM_TYPE_NON_FRAGMENT | XFRM_TYPE_REMOTE_COADDR,
478 .init_state = mip6_rthdr_init_state, 476 .init_state = mip6_rthdr_init_state,
479 .destructor = mip6_rthdr_destroy, 477 .destructor = mip6_rthdr_destroy,
diff --git a/net/ipv6/ndisc.c b/net/ipv6/ndisc.c
index 339078f95d1b..4cb45c1079a2 100644
--- a/net/ipv6/ndisc.c
+++ b/net/ipv6/ndisc.c
@@ -175,7 +175,7 @@ static struct nd_opt_hdr *ndisc_next_option(struct nd_opt_hdr *cur,
175 type = cur->nd_opt_type; 175 type = cur->nd_opt_type;
176 do { 176 do {
177 cur = ((void *)cur) + (cur->nd_opt_len << 3); 177 cur = ((void *)cur) + (cur->nd_opt_len << 3);
178 } while(cur < end && cur->nd_opt_type != type); 178 } while (cur < end && cur->nd_opt_type != type);
179 return cur <= end && cur->nd_opt_type == type ? cur : NULL; 179 return cur <= end && cur->nd_opt_type == type ? cur : NULL;
180} 180}
181 181
@@ -192,7 +192,7 @@ static struct nd_opt_hdr *ndisc_next_useropt(struct nd_opt_hdr *cur,
192 return NULL; 192 return NULL;
193 do { 193 do {
194 cur = ((void *)cur) + (cur->nd_opt_len << 3); 194 cur = ((void *)cur) + (cur->nd_opt_len << 3);
195 } while(cur < end && !ndisc_is_useropt(cur)); 195 } while (cur < end && !ndisc_is_useropt(cur));
196 return cur <= end && ndisc_is_useropt(cur) ? cur : NULL; 196 return cur <= end && ndisc_is_useropt(cur) ? cur : NULL;
197} 197}
198 198
@@ -284,7 +284,6 @@ int ndisc_mc_map(const struct in6_addr *addr, char *buf, struct net_device *dev,
284 } 284 }
285 return -EINVAL; 285 return -EINVAL;
286} 286}
287
288EXPORT_SYMBOL(ndisc_mc_map); 287EXPORT_SYMBOL(ndisc_mc_map);
289 288
290static u32 ndisc_hash(const void *pkey, 289static u32 ndisc_hash(const void *pkey,
@@ -296,7 +295,7 @@ static u32 ndisc_hash(const void *pkey,
296 295
297static int ndisc_constructor(struct neighbour *neigh) 296static int ndisc_constructor(struct neighbour *neigh)
298{ 297{
299 struct in6_addr *addr = (struct in6_addr*)&neigh->primary_key; 298 struct in6_addr *addr = (struct in6_addr *)&neigh->primary_key;
300 struct net_device *dev = neigh->dev; 299 struct net_device *dev = neigh->dev;
301 struct inet6_dev *in6_dev; 300 struct inet6_dev *in6_dev;
302 struct neigh_parms *parms; 301 struct neigh_parms *parms;
@@ -344,7 +343,7 @@ static int ndisc_constructor(struct neighbour *neigh)
344 343
345static int pndisc_constructor(struct pneigh_entry *n) 344static int pndisc_constructor(struct pneigh_entry *n)
346{ 345{
347 struct in6_addr *addr = (struct in6_addr*)&n->key; 346 struct in6_addr *addr = (struct in6_addr *)&n->key;
348 struct in6_addr maddr; 347 struct in6_addr maddr;
349 struct net_device *dev = n->dev; 348 struct net_device *dev = n->dev;
350 349
@@ -357,7 +356,7 @@ static int pndisc_constructor(struct pneigh_entry *n)
357 356
358static void pndisc_destructor(struct pneigh_entry *n) 357static void pndisc_destructor(struct pneigh_entry *n)
359{ 358{
360 struct in6_addr *addr = (struct in6_addr*)&n->key; 359 struct in6_addr *addr = (struct in6_addr *)&n->key;
361 struct in6_addr maddr; 360 struct in6_addr maddr;
362 struct net_device *dev = n->dev; 361 struct net_device *dev = n->dev;
363 362
@@ -1065,7 +1064,7 @@ static void ndisc_router_discovery(struct sk_buff *skb)
1065 int optlen; 1064 int optlen;
1066 unsigned int pref = 0; 1065 unsigned int pref = 0;
1067 1066
1068 __u8 * opt = (__u8 *)(ra_msg + 1); 1067 __u8 *opt = (__u8 *)(ra_msg + 1);
1069 1068
1070 optlen = (skb_tail_pointer(skb) - skb_transport_header(skb)) - 1069 optlen = (skb_tail_pointer(skb) - skb_transport_header(skb)) -
1071 sizeof(struct ra_msg); 1070 sizeof(struct ra_msg);
@@ -1319,7 +1318,7 @@ skip_linkparms:
1319 continue; 1318 continue;
1320 if (ri->prefix_len > in6_dev->cnf.accept_ra_rt_info_max_plen) 1319 if (ri->prefix_len > in6_dev->cnf.accept_ra_rt_info_max_plen)
1321 continue; 1320 continue;
1322 rt6_route_rcv(skb->dev, (u8*)p, (p->nd_opt_len) << 3, 1321 rt6_route_rcv(skb->dev, (u8 *)p, (p->nd_opt_len) << 3,
1323 &ipv6_hdr(skb)->saddr); 1322 &ipv6_hdr(skb)->saddr);
1324 } 1323 }
1325 } 1324 }
@@ -1352,7 +1351,7 @@ skip_routeinfo:
1352 __be32 n; 1351 __be32 n;
1353 u32 mtu; 1352 u32 mtu;
1354 1353
1355 memcpy(&n, ((u8*)(ndopts.nd_opts_mtu+1))+2, sizeof(mtu)); 1354 memcpy(&n, ((u8 *)(ndopts.nd_opts_mtu+1))+2, sizeof(mtu));
1356 mtu = ntohl(n); 1355 mtu = ntohl(n);
1357 1356
1358 if (mtu < IPV6_MIN_MTU || mtu > skb->dev->mtu) { 1357 if (mtu < IPV6_MIN_MTU || mtu > skb->dev->mtu) {
diff --git a/net/ipv6/netfilter/Kconfig b/net/ipv6/netfilter/Kconfig
index 2812816aabdc..6af874fc187f 100644
--- a/net/ipv6/netfilter/Kconfig
+++ b/net/ipv6/netfilter/Kconfig
@@ -40,18 +40,13 @@ config NFT_CHAIN_ROUTE_IPV6
40 fields such as the source, destination, flowlabel, hop-limit and 40 fields such as the source, destination, flowlabel, hop-limit and
41 the packet mark. 41 the packet mark.
42 42
43config NFT_CHAIN_NAT_IPV6 43config NF_REJECT_IPV6
44 depends on NF_TABLES_IPV6 44 tristate "IPv6 packet rejection"
45 depends on NF_NAT_IPV6 && NFT_NAT 45 default m if NETFILTER_ADVANCED=n
46 tristate "IPv6 nf_tables nat chain support"
47 help
48 This option enables the "nat" chain for IPv6 in nf_tables. This
49 chain type is used to perform Network Address Translation (NAT)
50 packet transformations such as the source, destination address and
51 source and destination ports.
52 46
53config NFT_REJECT_IPV6 47config NFT_REJECT_IPV6
54 depends on NF_TABLES_IPV6 48 depends on NF_TABLES_IPV6
49 select NF_REJECT_IPV6
55 default NFT_REJECT 50 default NFT_REJECT
56 tristate 51 tristate
57 52
@@ -70,6 +65,34 @@ config NF_NAT_IPV6
70 forms of full Network Address Port Translation. This can be 65 forms of full Network Address Port Translation. This can be
71 controlled by iptables or nft. 66 controlled by iptables or nft.
72 67
68if NF_NAT_IPV6
69
70config NFT_CHAIN_NAT_IPV6
71 depends on NF_TABLES_IPV6
72 tristate "IPv6 nf_tables nat chain support"
73 help
74 This option enables the "nat" chain for IPv6 in nf_tables. This
75 chain type is used to perform Network Address Translation (NAT)
76 packet transformations such as the source, destination address and
77 source and destination ports.
78
79config NF_NAT_MASQUERADE_IPV6
80 tristate "IPv6 masquerade support"
81 help
82 This is the kernel functionality to provide NAT in the masquerade
83 flavour (automatic source address selection) for IPv6.
84
85config NFT_MASQ_IPV6
86 tristate "IPv6 masquerade support for nf_tables"
87 depends on NF_TABLES_IPV6
88 depends on NFT_MASQ
89 select NF_NAT_MASQUERADE_IPV6
90 help
91 This is the expression that provides IPv4 masquerading support for
92 nf_tables.
93
94endif # NF_NAT_IPV6
95
73config IP6_NF_IPTABLES 96config IP6_NF_IPTABLES
74 tristate "IP6 tables support (required for filtering)" 97 tristate "IP6 tables support (required for filtering)"
75 depends on INET && IPV6 98 depends on INET && IPV6
@@ -190,6 +213,7 @@ config IP6_NF_FILTER
190config IP6_NF_TARGET_REJECT 213config IP6_NF_TARGET_REJECT
191 tristate "REJECT target support" 214 tristate "REJECT target support"
192 depends on IP6_NF_FILTER 215 depends on IP6_NF_FILTER
216 select NF_REJECT_IPV6
193 default m if NETFILTER_ADVANCED=n 217 default m if NETFILTER_ADVANCED=n
194 help 218 help
195 The REJECT target allows a filtering rule to specify that an ICMPv6 219 The REJECT target allows a filtering rule to specify that an ICMPv6
@@ -260,6 +284,7 @@ if IP6_NF_NAT
260 284
261config IP6_NF_TARGET_MASQUERADE 285config IP6_NF_TARGET_MASQUERADE
262 tristate "MASQUERADE target support" 286 tristate "MASQUERADE target support"
287 select NF_NAT_MASQUERADE_IPV6
263 help 288 help
264 Masquerading is a special case of NAT: all outgoing connections are 289 Masquerading is a special case of NAT: all outgoing connections are
265 changed to seem to come from a particular interface's address, and 290 changed to seem to come from a particular interface's address, and
diff --git a/net/ipv6/netfilter/Makefile b/net/ipv6/netfilter/Makefile
index c3d3286db4bb..fbb25f01143c 100644
--- a/net/ipv6/netfilter/Makefile
+++ b/net/ipv6/netfilter/Makefile
@@ -18,6 +18,7 @@ obj-$(CONFIG_NF_CONNTRACK_IPV6) += nf_conntrack_ipv6.o
18 18
19nf_nat_ipv6-y := nf_nat_l3proto_ipv6.o nf_nat_proto_icmpv6.o 19nf_nat_ipv6-y := nf_nat_l3proto_ipv6.o nf_nat_proto_icmpv6.o
20obj-$(CONFIG_NF_NAT_IPV6) += nf_nat_ipv6.o 20obj-$(CONFIG_NF_NAT_IPV6) += nf_nat_ipv6.o
21obj-$(CONFIG_NF_NAT_MASQUERADE_IPV6) += nf_nat_masquerade_ipv6.o
21 22
22# defrag 23# defrag
23nf_defrag_ipv6-y := nf_defrag_ipv6_hooks.o nf_conntrack_reasm.o 24nf_defrag_ipv6-y := nf_defrag_ipv6_hooks.o nf_conntrack_reasm.o
@@ -26,11 +27,15 @@ obj-$(CONFIG_NF_DEFRAG_IPV6) += nf_defrag_ipv6.o
26# logging 27# logging
27obj-$(CONFIG_NF_LOG_IPV6) += nf_log_ipv6.o 28obj-$(CONFIG_NF_LOG_IPV6) += nf_log_ipv6.o
28 29
30# reject
31obj-$(CONFIG_NF_REJECT_IPV6) += nf_reject_ipv6.o
32
29# nf_tables 33# nf_tables
30obj-$(CONFIG_NF_TABLES_IPV6) += nf_tables_ipv6.o 34obj-$(CONFIG_NF_TABLES_IPV6) += nf_tables_ipv6.o
31obj-$(CONFIG_NFT_CHAIN_ROUTE_IPV6) += nft_chain_route_ipv6.o 35obj-$(CONFIG_NFT_CHAIN_ROUTE_IPV6) += nft_chain_route_ipv6.o
32obj-$(CONFIG_NFT_CHAIN_NAT_IPV6) += nft_chain_nat_ipv6.o 36obj-$(CONFIG_NFT_CHAIN_NAT_IPV6) += nft_chain_nat_ipv6.o
33obj-$(CONFIG_NFT_REJECT_IPV6) += nft_reject_ipv6.o 37obj-$(CONFIG_NFT_REJECT_IPV6) += nft_reject_ipv6.o
38obj-$(CONFIG_NFT_MASQ_IPV6) += nft_masq_ipv6.o
34 39
35# matches 40# matches
36obj-$(CONFIG_IP6_NF_MATCH_AH) += ip6t_ah.o 41obj-$(CONFIG_IP6_NF_MATCH_AH) += ip6t_ah.o
diff --git a/net/ipv6/netfilter/ip6t_MASQUERADE.c b/net/ipv6/netfilter/ip6t_MASQUERADE.c
index 3e4e92d5e157..7f9f45d829d2 100644
--- a/net/ipv6/netfilter/ip6t_MASQUERADE.c
+++ b/net/ipv6/netfilter/ip6t_MASQUERADE.c
@@ -19,33 +19,12 @@
19#include <net/netfilter/nf_nat.h> 19#include <net/netfilter/nf_nat.h>
20#include <net/addrconf.h> 20#include <net/addrconf.h>
21#include <net/ipv6.h> 21#include <net/ipv6.h>
22#include <net/netfilter/ipv6/nf_nat_masquerade.h>
22 23
23static unsigned int 24static unsigned int
24masquerade_tg6(struct sk_buff *skb, const struct xt_action_param *par) 25masquerade_tg6(struct sk_buff *skb, const struct xt_action_param *par)
25{ 26{
26 const struct nf_nat_range *range = par->targinfo; 27 return nf_nat_masquerade_ipv6(skb, par->targinfo, par->out);
27 enum ip_conntrack_info ctinfo;
28 struct in6_addr src;
29 struct nf_conn *ct;
30 struct nf_nat_range newrange;
31
32 ct = nf_ct_get(skb, &ctinfo);
33 NF_CT_ASSERT(ct && (ctinfo == IP_CT_NEW || ctinfo == IP_CT_RELATED ||
34 ctinfo == IP_CT_RELATED_REPLY));
35
36 if (ipv6_dev_get_saddr(dev_net(par->out), par->out,
37 &ipv6_hdr(skb)->daddr, 0, &src) < 0)
38 return NF_DROP;
39
40 nfct_nat(ct)->masq_index = par->out->ifindex;
41
42 newrange.flags = range->flags | NF_NAT_RANGE_MAP_IPS;
43 newrange.min_addr.in6 = src;
44 newrange.max_addr.in6 = src;
45 newrange.min_proto = range->min_proto;
46 newrange.max_proto = range->max_proto;
47
48 return nf_nat_setup_info(ct, &newrange, NF_NAT_MANIP_SRC);
49} 28}
50 29
51static int masquerade_tg6_checkentry(const struct xt_tgchk_param *par) 30static int masquerade_tg6_checkentry(const struct xt_tgchk_param *par)
@@ -57,48 +36,6 @@ static int masquerade_tg6_checkentry(const struct xt_tgchk_param *par)
57 return 0; 36 return 0;
58} 37}
59 38
60static int device_cmp(struct nf_conn *ct, void *ifindex)
61{
62 const struct nf_conn_nat *nat = nfct_nat(ct);
63
64 if (!nat)
65 return 0;
66 if (nf_ct_l3num(ct) != NFPROTO_IPV6)
67 return 0;
68 return nat->masq_index == (int)(long)ifindex;
69}
70
71static int masq_device_event(struct notifier_block *this,
72 unsigned long event, void *ptr)
73{
74 const struct net_device *dev = netdev_notifier_info_to_dev(ptr);
75 struct net *net = dev_net(dev);
76
77 if (event == NETDEV_DOWN)
78 nf_ct_iterate_cleanup(net, device_cmp,
79 (void *)(long)dev->ifindex, 0, 0);
80
81 return NOTIFY_DONE;
82}
83
84static struct notifier_block masq_dev_notifier = {
85 .notifier_call = masq_device_event,
86};
87
88static int masq_inet_event(struct notifier_block *this,
89 unsigned long event, void *ptr)
90{
91 struct inet6_ifaddr *ifa = ptr;
92 struct netdev_notifier_info info;
93
94 netdev_notifier_info_init(&info, ifa->idev->dev);
95 return masq_device_event(this, event, &info);
96}
97
98static struct notifier_block masq_inet_notifier = {
99 .notifier_call = masq_inet_event,
100};
101
102static struct xt_target masquerade_tg6_reg __read_mostly = { 39static struct xt_target masquerade_tg6_reg __read_mostly = {
103 .name = "MASQUERADE", 40 .name = "MASQUERADE",
104 .family = NFPROTO_IPV6, 41 .family = NFPROTO_IPV6,
@@ -115,17 +52,14 @@ static int __init masquerade_tg6_init(void)
115 int err; 52 int err;
116 53
117 err = xt_register_target(&masquerade_tg6_reg); 54 err = xt_register_target(&masquerade_tg6_reg);
118 if (err == 0) { 55 if (err == 0)
119 register_netdevice_notifier(&masq_dev_notifier); 56 nf_nat_masquerade_ipv6_register_notifier();
120 register_inet6addr_notifier(&masq_inet_notifier);
121 }
122 57
123 return err; 58 return err;
124} 59}
125static void __exit masquerade_tg6_exit(void) 60static void __exit masquerade_tg6_exit(void)
126{ 61{
127 unregister_inet6addr_notifier(&masq_inet_notifier); 62 nf_nat_masquerade_ipv6_unregister_notifier();
128 unregister_netdevice_notifier(&masq_dev_notifier);
129 xt_unregister_target(&masquerade_tg6_reg); 63 xt_unregister_target(&masquerade_tg6_reg);
130} 64}
131 65
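After the conversion, ip6t_MASQUERADE.c keeps only thin wrappers: the source-address selection, the masq_index bookkeeping and the notifier plumbing are delegated to the new NF_NAT_MASQUERADE_IPV6 helper (nf_nat_masquerade_ipv6.o in the Makefile hunk above). Condensed from the new-side lines of this diff, the target body and its module init reduce to:

static unsigned int
masquerade_tg6(struct sk_buff *skb, const struct xt_action_param *par)
{
	/* conntrack handling and address selection happen in the shared helper */
	return nf_nat_masquerade_ipv6(skb, par->targinfo, par->out);
}

static int __init masquerade_tg6_init(void)
{
	int err;

	err = xt_register_target(&masquerade_tg6_reg);
	if (err == 0)
		nf_nat_masquerade_ipv6_register_notifier();

	return err;
}
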
diff --git a/net/ipv6/netfilter/ip6table_nat.c b/net/ipv6/netfilter/ip6table_nat.c
index 387d8b8fc18d..b0634ac996b7 100644
--- a/net/ipv6/netfilter/ip6table_nat.c
+++ b/net/ipv6/netfilter/ip6table_nat.c
@@ -30,222 +30,57 @@ static const struct xt_table nf_nat_ipv6_table = {
30 .af = NFPROTO_IPV6, 30 .af = NFPROTO_IPV6,
31}; 31};
32 32
33static unsigned int alloc_null_binding(struct nf_conn *ct, unsigned int hooknum) 33static unsigned int ip6table_nat_do_chain(const struct nf_hook_ops *ops,
34{ 34 struct sk_buff *skb,
35 /* Force range to this IP; let proto decide mapping for 35 const struct net_device *in,
36 * per-proto parts (hence not IP_NAT_RANGE_PROTO_SPECIFIED). 36 const struct net_device *out,
37 */ 37 struct nf_conn *ct)
38 struct nf_nat_range range;
39
40 range.flags = 0;
41 pr_debug("Allocating NULL binding for %p (%pI6)\n", ct,
42 HOOK2MANIP(hooknum) == NF_NAT_MANIP_SRC ?
43 &ct->tuplehash[IP_CT_DIR_REPLY].tuple.dst.u3.ip6 :
44 &ct->tuplehash[IP_CT_DIR_REPLY].tuple.src.u3.ip6);
45
46 return nf_nat_setup_info(ct, &range, HOOK2MANIP(hooknum));
47}
48
49static unsigned int nf_nat_rule_find(struct sk_buff *skb, unsigned int hooknum,
50 const struct net_device *in,
51 const struct net_device *out,
52 struct nf_conn *ct)
53{ 38{
54 struct net *net = nf_ct_net(ct); 39 struct net *net = nf_ct_net(ct);
55 unsigned int ret;
56 40
57 ret = ip6t_do_table(skb, hooknum, in, out, net->ipv6.ip6table_nat); 41 return ip6t_do_table(skb, ops->hooknum, in, out, net->ipv6.ip6table_nat);
58 if (ret == NF_ACCEPT) {
59 if (!nf_nat_initialized(ct, HOOK2MANIP(hooknum)))
60 ret = alloc_null_binding(ct, hooknum);
61 }
62 return ret;
63} 42}
64 43
65static unsigned int 44static unsigned int ip6table_nat_fn(const struct nf_hook_ops *ops,
66nf_nat_ipv6_fn(const struct nf_hook_ops *ops, 45 struct sk_buff *skb,
67 struct sk_buff *skb, 46 const struct net_device *in,
68 const struct net_device *in, 47 const struct net_device *out,
69 const struct net_device *out, 48 int (*okfn)(struct sk_buff *))
70 int (*okfn)(struct sk_buff *))
71{ 49{
72 struct nf_conn *ct; 50 return nf_nat_ipv6_fn(ops, skb, in, out, ip6table_nat_do_chain);
73 enum ip_conntrack_info ctinfo;
74 struct nf_conn_nat *nat;
75 enum nf_nat_manip_type maniptype = HOOK2MANIP(ops->hooknum);
76 __be16 frag_off;
77 int hdrlen;
78 u8 nexthdr;
79
80 ct = nf_ct_get(skb, &ctinfo);
81 /* Can't track? It's not due to stress, or conntrack would
82 * have dropped it. Hence it's the user's responsibilty to
83 * packet filter it out, or implement conntrack/NAT for that
84 * protocol. 8) --RR
85 */
86 if (!ct)
87 return NF_ACCEPT;
88
89 /* Don't try to NAT if this packet is not conntracked */
90 if (nf_ct_is_untracked(ct))
91 return NF_ACCEPT;
92
93 nat = nf_ct_nat_ext_add(ct);
94 if (nat == NULL)
95 return NF_ACCEPT;
96
97 switch (ctinfo) {
98 case IP_CT_RELATED:
99 case IP_CT_RELATED_REPLY:
100 nexthdr = ipv6_hdr(skb)->nexthdr;
101 hdrlen = ipv6_skip_exthdr(skb, sizeof(struct ipv6hdr),
102 &nexthdr, &frag_off);
103
104 if (hdrlen >= 0 && nexthdr == IPPROTO_ICMPV6) {
105 if (!nf_nat_icmpv6_reply_translation(skb, ct, ctinfo,
106 ops->hooknum,
107 hdrlen))
108 return NF_DROP;
109 else
110 return NF_ACCEPT;
111 }
112 /* Fall thru... (Only ICMPs can be IP_CT_IS_REPLY) */
113 case IP_CT_NEW:
114 /* Seen it before? This can happen for loopback, retrans,
115 * or local packets.
116 */
117 if (!nf_nat_initialized(ct, maniptype)) {
118 unsigned int ret;
119
120 ret = nf_nat_rule_find(skb, ops->hooknum, in, out, ct);
121 if (ret != NF_ACCEPT)
122 return ret;
123 } else {
124 pr_debug("Already setup manip %s for ct %p\n",
125 maniptype == NF_NAT_MANIP_SRC ? "SRC" : "DST",
126 ct);
127 if (nf_nat_oif_changed(ops->hooknum, ctinfo, nat, out))
128 goto oif_changed;
129 }
130 break;
131
132 default:
133 /* ESTABLISHED */
134 NF_CT_ASSERT(ctinfo == IP_CT_ESTABLISHED ||
135 ctinfo == IP_CT_ESTABLISHED_REPLY);
136 if (nf_nat_oif_changed(ops->hooknum, ctinfo, nat, out))
137 goto oif_changed;
138 }
139
140 return nf_nat_packet(ct, ctinfo, ops->hooknum, skb);
141
142oif_changed:
143 nf_ct_kill_acct(ct, ctinfo, skb);
144 return NF_DROP;
145} 51}
146 52
147static unsigned int 53static unsigned int ip6table_nat_in(const struct nf_hook_ops *ops,
148nf_nat_ipv6_in(const struct nf_hook_ops *ops, 54 struct sk_buff *skb,
149 struct sk_buff *skb, 55 const struct net_device *in,
150 const struct net_device *in, 56 const struct net_device *out,
151 const struct net_device *out, 57 int (*okfn)(struct sk_buff *))
152 int (*okfn)(struct sk_buff *))
153{ 58{
154 unsigned int ret; 59 return nf_nat_ipv6_in(ops, skb, in, out, ip6table_nat_do_chain);
155 struct in6_addr daddr = ipv6_hdr(skb)->daddr;
156
157 ret = nf_nat_ipv6_fn(ops, skb, in, out, okfn);
158 if (ret != NF_DROP && ret != NF_STOLEN &&
159 ipv6_addr_cmp(&daddr, &ipv6_hdr(skb)->daddr))
160 skb_dst_drop(skb);
161
162 return ret;
163} 60}
164 61
165static unsigned int 62static unsigned int ip6table_nat_out(const struct nf_hook_ops *ops,
166nf_nat_ipv6_out(const struct nf_hook_ops *ops, 63 struct sk_buff *skb,
167 struct sk_buff *skb, 64 const struct net_device *in,
168 const struct net_device *in, 65 const struct net_device *out,
169 const struct net_device *out, 66 int (*okfn)(struct sk_buff *))
170 int (*okfn)(struct sk_buff *))
171{ 67{
172#ifdef CONFIG_XFRM 68 return nf_nat_ipv6_out(ops, skb, in, out, ip6table_nat_do_chain);
173 const struct nf_conn *ct;
174 enum ip_conntrack_info ctinfo;
175 int err;
176#endif
177 unsigned int ret;
178
179 /* root is playing with raw sockets. */
180 if (skb->len < sizeof(struct ipv6hdr))
181 return NF_ACCEPT;
182
183 ret = nf_nat_ipv6_fn(ops, skb, in, out, okfn);
184#ifdef CONFIG_XFRM
185 if (ret != NF_DROP && ret != NF_STOLEN &&
186 !(IP6CB(skb)->flags & IP6SKB_XFRM_TRANSFORMED) &&
187 (ct = nf_ct_get(skb, &ctinfo)) != NULL) {
188 enum ip_conntrack_dir dir = CTINFO2DIR(ctinfo);
189
190 if (!nf_inet_addr_cmp(&ct->tuplehash[dir].tuple.src.u3,
191 &ct->tuplehash[!dir].tuple.dst.u3) ||
192 (ct->tuplehash[dir].tuple.dst.protonum != IPPROTO_ICMPV6 &&
193 ct->tuplehash[dir].tuple.src.u.all !=
194 ct->tuplehash[!dir].tuple.dst.u.all)) {
195 err = nf_xfrm_me_harder(skb, AF_INET6);
196 if (err < 0)
197 ret = NF_DROP_ERR(err);
198 }
199 }
200#endif
201 return ret;
202} 69}
203 70
204static unsigned int 71static unsigned int ip6table_nat_local_fn(const struct nf_hook_ops *ops,
205nf_nat_ipv6_local_fn(const struct nf_hook_ops *ops, 72 struct sk_buff *skb,
206 struct sk_buff *skb, 73 const struct net_device *in,
207 const struct net_device *in, 74 const struct net_device *out,
208 const struct net_device *out, 75 int (*okfn)(struct sk_buff *))
209 int (*okfn)(struct sk_buff *))
210{ 76{
211 const struct nf_conn *ct; 77 return nf_nat_ipv6_local_fn(ops, skb, in, out, ip6table_nat_do_chain);
212 enum ip_conntrack_info ctinfo;
213 unsigned int ret;
214 int err;
215
216 /* root is playing with raw sockets. */
217 if (skb->len < sizeof(struct ipv6hdr))
218 return NF_ACCEPT;
219
220 ret = nf_nat_ipv6_fn(ops, skb, in, out, okfn);
221 if (ret != NF_DROP && ret != NF_STOLEN &&
222 (ct = nf_ct_get(skb, &ctinfo)) != NULL) {
223 enum ip_conntrack_dir dir = CTINFO2DIR(ctinfo);
224
225 if (!nf_inet_addr_cmp(&ct->tuplehash[dir].tuple.dst.u3,
226 &ct->tuplehash[!dir].tuple.src.u3)) {
227 err = ip6_route_me_harder(skb);
228 if (err < 0)
229 ret = NF_DROP_ERR(err);
230 }
231#ifdef CONFIG_XFRM
232 else if (!(IP6CB(skb)->flags & IP6SKB_XFRM_TRANSFORMED) &&
233 ct->tuplehash[dir].tuple.dst.protonum != IPPROTO_ICMPV6 &&
234 ct->tuplehash[dir].tuple.dst.u.all !=
235 ct->tuplehash[!dir].tuple.src.u.all) {
236 err = nf_xfrm_me_harder(skb, AF_INET6);
237 if (err < 0)
238 ret = NF_DROP_ERR(err);
239 }
240#endif
241 }
242 return ret;
243} 78}
244 79
245static struct nf_hook_ops nf_nat_ipv6_ops[] __read_mostly = { 80static struct nf_hook_ops nf_nat_ipv6_ops[] __read_mostly = {
246 /* Before packet filtering, change destination */ 81 /* Before packet filtering, change destination */
247 { 82 {
248 .hook = nf_nat_ipv6_in, 83 .hook = ip6table_nat_in,
249 .owner = THIS_MODULE, 84 .owner = THIS_MODULE,
250 .pf = NFPROTO_IPV6, 85 .pf = NFPROTO_IPV6,
251 .hooknum = NF_INET_PRE_ROUTING, 86 .hooknum = NF_INET_PRE_ROUTING,
@@ -253,7 +88,7 @@ static struct nf_hook_ops nf_nat_ipv6_ops[] __read_mostly = {
253 }, 88 },
254 /* After packet filtering, change source */ 89 /* After packet filtering, change source */
255 { 90 {
256 .hook = nf_nat_ipv6_out, 91 .hook = ip6table_nat_out,
257 .owner = THIS_MODULE, 92 .owner = THIS_MODULE,
258 .pf = NFPROTO_IPV6, 93 .pf = NFPROTO_IPV6,
259 .hooknum = NF_INET_POST_ROUTING, 94 .hooknum = NF_INET_POST_ROUTING,
@@ -261,7 +96,7 @@ static struct nf_hook_ops nf_nat_ipv6_ops[] __read_mostly = {
261 }, 96 },
262 /* Before packet filtering, change destination */ 97 /* Before packet filtering, change destination */
263 { 98 {
264 .hook = nf_nat_ipv6_local_fn, 99 .hook = ip6table_nat_local_fn,
265 .owner = THIS_MODULE, 100 .owner = THIS_MODULE,
266 .pf = NFPROTO_IPV6, 101 .pf = NFPROTO_IPV6,
267 .hooknum = NF_INET_LOCAL_OUT, 102 .hooknum = NF_INET_LOCAL_OUT,
@@ -269,7 +104,7 @@ static struct nf_hook_ops nf_nat_ipv6_ops[] __read_mostly = {
269 }, 104 },
270 /* After packet filtering, change source */ 105 /* After packet filtering, change source */
271 { 106 {
272 .hook = nf_nat_ipv6_fn, 107 .hook = ip6table_nat_fn,
273 .owner = THIS_MODULE, 108 .owner = THIS_MODULE,
274 .pf = NFPROTO_IPV6, 109 .pf = NFPROTO_IPV6,
275 .hooknum = NF_INET_LOCAL_IN, 110 .hooknum = NF_INET_LOCAL_IN,
diff --git a/net/ipv6/netfilter/nf_defrag_ipv6_hooks.c b/net/ipv6/netfilter/nf_defrag_ipv6_hooks.c
index 7b9a748c6bac..e70382e4dfb5 100644
--- a/net/ipv6/netfilter/nf_defrag_ipv6_hooks.c
+++ b/net/ipv6/netfilter/nf_defrag_ipv6_hooks.c
@@ -40,7 +40,7 @@ static enum ip6_defrag_users nf_ct6_defrag_user(unsigned int hooknum,
40 zone = nf_ct_zone((struct nf_conn *)skb->nfct); 40 zone = nf_ct_zone((struct nf_conn *)skb->nfct);
41#endif 41#endif
42 42
43#ifdef CONFIG_BRIDGE_NETFILTER 43#if IS_ENABLED(CONFIG_BRIDGE_NETFILTER)
44 if (skb->nf_bridge && 44 if (skb->nf_bridge &&
45 skb->nf_bridge->mask & BRNF_NF_BRIDGE_PREROUTING) 45 skb->nf_bridge->mask & BRNF_NF_BRIDGE_PREROUTING)
46 return IP6_DEFRAG_CONNTRACK_BRIDGE_IN + zone; 46 return IP6_DEFRAG_CONNTRACK_BRIDGE_IN + zone;
diff --git a/net/ipv6/netfilter/nf_nat_l3proto_ipv6.c b/net/ipv6/netfilter/nf_nat_l3proto_ipv6.c
index fc8e49b2ff3e..c5812e1c1ffb 100644
--- a/net/ipv6/netfilter/nf_nat_l3proto_ipv6.c
+++ b/net/ipv6/netfilter/nf_nat_l3proto_ipv6.c
@@ -261,6 +261,205 @@ int nf_nat_icmpv6_reply_translation(struct sk_buff *skb,
261} 261}
262EXPORT_SYMBOL_GPL(nf_nat_icmpv6_reply_translation); 262EXPORT_SYMBOL_GPL(nf_nat_icmpv6_reply_translation);
263 263
264unsigned int
265nf_nat_ipv6_fn(const struct nf_hook_ops *ops, struct sk_buff *skb,
266 const struct net_device *in, const struct net_device *out,
267 unsigned int (*do_chain)(const struct nf_hook_ops *ops,
268 struct sk_buff *skb,
269 const struct net_device *in,
270 const struct net_device *out,
271 struct nf_conn *ct))
272{
273 struct nf_conn *ct;
274 enum ip_conntrack_info ctinfo;
275 struct nf_conn_nat *nat;
276 enum nf_nat_manip_type maniptype = HOOK2MANIP(ops->hooknum);
277 __be16 frag_off;
278 int hdrlen;
279 u8 nexthdr;
280
281 ct = nf_ct_get(skb, &ctinfo);
282 /* Can't track? It's not due to stress, or conntrack would
 283 * have dropped it. Hence it's the user's responsibility to
284 * packet filter it out, or implement conntrack/NAT for that
285 * protocol. 8) --RR
286 */
287 if (!ct)
288 return NF_ACCEPT;
289
290 /* Don't try to NAT if this packet is not conntracked */
291 if (nf_ct_is_untracked(ct))
292 return NF_ACCEPT;
293
294 nat = nf_ct_nat_ext_add(ct);
295 if (nat == NULL)
296 return NF_ACCEPT;
297
298 switch (ctinfo) {
299 case IP_CT_RELATED:
300 case IP_CT_RELATED_REPLY:
301 nexthdr = ipv6_hdr(skb)->nexthdr;
302 hdrlen = ipv6_skip_exthdr(skb, sizeof(struct ipv6hdr),
303 &nexthdr, &frag_off);
304
305 if (hdrlen >= 0 && nexthdr == IPPROTO_ICMPV6) {
306 if (!nf_nat_icmpv6_reply_translation(skb, ct, ctinfo,
307 ops->hooknum,
308 hdrlen))
309 return NF_DROP;
310 else
311 return NF_ACCEPT;
312 }
313 /* Fall thru... (Only ICMPs can be IP_CT_IS_REPLY) */
314 case IP_CT_NEW:
315 /* Seen it before? This can happen for loopback, retrans,
316 * or local packets.
317 */
318 if (!nf_nat_initialized(ct, maniptype)) {
319 unsigned int ret;
320
321 ret = do_chain(ops, skb, in, out, ct);
322 if (ret != NF_ACCEPT)
323 return ret;
324
325 if (nf_nat_initialized(ct, HOOK2MANIP(ops->hooknum)))
326 break;
327
328 ret = nf_nat_alloc_null_binding(ct, ops->hooknum);
329 if (ret != NF_ACCEPT)
330 return ret;
331 } else {
332 pr_debug("Already setup manip %s for ct %p\n",
333 maniptype == NF_NAT_MANIP_SRC ? "SRC" : "DST",
334 ct);
335 if (nf_nat_oif_changed(ops->hooknum, ctinfo, nat, out))
336 goto oif_changed;
337 }
338 break;
339
340 default:
341 /* ESTABLISHED */
342 NF_CT_ASSERT(ctinfo == IP_CT_ESTABLISHED ||
343 ctinfo == IP_CT_ESTABLISHED_REPLY);
344 if (nf_nat_oif_changed(ops->hooknum, ctinfo, nat, out))
345 goto oif_changed;
346 }
347
348 return nf_nat_packet(ct, ctinfo, ops->hooknum, skb);
349
350oif_changed:
351 nf_ct_kill_acct(ct, ctinfo, skb);
352 return NF_DROP;
353}
354EXPORT_SYMBOL_GPL(nf_nat_ipv6_fn);
355
356unsigned int
357nf_nat_ipv6_in(const struct nf_hook_ops *ops, struct sk_buff *skb,
358 const struct net_device *in, const struct net_device *out,
359 unsigned int (*do_chain)(const struct nf_hook_ops *ops,
360 struct sk_buff *skb,
361 const struct net_device *in,
362 const struct net_device *out,
363 struct nf_conn *ct))
364{
365 unsigned int ret;
366 struct in6_addr daddr = ipv6_hdr(skb)->daddr;
367
368 ret = nf_nat_ipv6_fn(ops, skb, in, out, do_chain);
369 if (ret != NF_DROP && ret != NF_STOLEN &&
370 ipv6_addr_cmp(&daddr, &ipv6_hdr(skb)->daddr))
371 skb_dst_drop(skb);
372
373 return ret;
374}
375EXPORT_SYMBOL_GPL(nf_nat_ipv6_in);
376
377unsigned int
378nf_nat_ipv6_out(const struct nf_hook_ops *ops, struct sk_buff *skb,
379 const struct net_device *in, const struct net_device *out,
380 unsigned int (*do_chain)(const struct nf_hook_ops *ops,
381 struct sk_buff *skb,
382 const struct net_device *in,
383 const struct net_device *out,
384 struct nf_conn *ct))
385{
386#ifdef CONFIG_XFRM
387 const struct nf_conn *ct;
388 enum ip_conntrack_info ctinfo;
389 int err;
390#endif
391 unsigned int ret;
392
393 /* root is playing with raw sockets. */
394 if (skb->len < sizeof(struct ipv6hdr))
395 return NF_ACCEPT;
396
397 ret = nf_nat_ipv6_fn(ops, skb, in, out, do_chain);
398#ifdef CONFIG_XFRM
399 if (ret != NF_DROP && ret != NF_STOLEN &&
400 !(IP6CB(skb)->flags & IP6SKB_XFRM_TRANSFORMED) &&
401 (ct = nf_ct_get(skb, &ctinfo)) != NULL) {
402 enum ip_conntrack_dir dir = CTINFO2DIR(ctinfo);
403
404 if (!nf_inet_addr_cmp(&ct->tuplehash[dir].tuple.src.u3,
405 &ct->tuplehash[!dir].tuple.dst.u3) ||
406 (ct->tuplehash[dir].tuple.dst.protonum != IPPROTO_ICMPV6 &&
407 ct->tuplehash[dir].tuple.src.u.all !=
408 ct->tuplehash[!dir].tuple.dst.u.all)) {
409 err = nf_xfrm_me_harder(skb, AF_INET6);
410 if (err < 0)
411 ret = NF_DROP_ERR(err);
412 }
413 }
414#endif
415 return ret;
416}
417EXPORT_SYMBOL_GPL(nf_nat_ipv6_out);
418
419unsigned int
420nf_nat_ipv6_local_fn(const struct nf_hook_ops *ops, struct sk_buff *skb,
421 const struct net_device *in, const struct net_device *out,
422 unsigned int (*do_chain)(const struct nf_hook_ops *ops,
423 struct sk_buff *skb,
424 const struct net_device *in,
425 const struct net_device *out,
426 struct nf_conn *ct))
427{
428 const struct nf_conn *ct;
429 enum ip_conntrack_info ctinfo;
430 unsigned int ret;
431 int err;
432
433 /* root is playing with raw sockets. */
434 if (skb->len < sizeof(struct ipv6hdr))
435 return NF_ACCEPT;
436
437 ret = nf_nat_ipv6_fn(ops, skb, in, out, do_chain);
438 if (ret != NF_DROP && ret != NF_STOLEN &&
439 (ct = nf_ct_get(skb, &ctinfo)) != NULL) {
440 enum ip_conntrack_dir dir = CTINFO2DIR(ctinfo);
441
442 if (!nf_inet_addr_cmp(&ct->tuplehash[dir].tuple.dst.u3,
443 &ct->tuplehash[!dir].tuple.src.u3)) {
444 err = ip6_route_me_harder(skb);
445 if (err < 0)
446 ret = NF_DROP_ERR(err);
447 }
448#ifdef CONFIG_XFRM
449 else if (!(IP6CB(skb)->flags & IP6SKB_XFRM_TRANSFORMED) &&
450 ct->tuplehash[dir].tuple.dst.protonum != IPPROTO_ICMPV6 &&
451 ct->tuplehash[dir].tuple.dst.u.all !=
452 ct->tuplehash[!dir].tuple.src.u.all) {
453 err = nf_xfrm_me_harder(skb, AF_INET6);
454 if (err < 0)
455 ret = NF_DROP_ERR(err);
456 }
457#endif
458 }
459 return ret;
460}
461EXPORT_SYMBOL_GPL(nf_nat_ipv6_local_fn);
462
264static int __init nf_nat_l3proto_ipv6_init(void) 463static int __init nf_nat_l3proto_ipv6_init(void)
265{ 464{
266 int err; 465 int err;
diff --git a/net/ipv6/netfilter/nf_nat_masquerade_ipv6.c b/net/ipv6/netfilter/nf_nat_masquerade_ipv6.c
new file mode 100644
index 000000000000..7745609665cd
--- /dev/null
+++ b/net/ipv6/netfilter/nf_nat_masquerade_ipv6.c
@@ -0,0 +1,120 @@
1/*
2 * Copyright (c) 2011 Patrick McHardy <kaber@trash.net>
3 *
4 * This program is free software; you can redistribute it and/or modify
5 * it under the terms of the GNU General Public License version 2 as
6 * published by the Free Software Foundation.
7 *
8 * Based on Rusty Russell's IPv6 MASQUERADE target. Development of IPv6
9 * NAT funded by Astaro.
10 */
11
12#include <linux/kernel.h>
13#include <linux/module.h>
14#include <linux/atomic.h>
15#include <linux/netdevice.h>
16#include <linux/ipv6.h>
17#include <linux/netfilter.h>
18#include <linux/netfilter_ipv6.h>
19#include <net/netfilter/nf_nat.h>
20#include <net/addrconf.h>
21#include <net/ipv6.h>
22#include <net/netfilter/ipv6/nf_nat_masquerade.h>
23
24unsigned int
25nf_nat_masquerade_ipv6(struct sk_buff *skb, const struct nf_nat_range *range,
26 const struct net_device *out)
27{
28 enum ip_conntrack_info ctinfo;
29 struct in6_addr src;
30 struct nf_conn *ct;
31 struct nf_nat_range newrange;
32
33 ct = nf_ct_get(skb, &ctinfo);
34 NF_CT_ASSERT(ct && (ctinfo == IP_CT_NEW || ctinfo == IP_CT_RELATED ||
35 ctinfo == IP_CT_RELATED_REPLY));
36
37 if (ipv6_dev_get_saddr(dev_net(out), out,
38 &ipv6_hdr(skb)->daddr, 0, &src) < 0)
39 return NF_DROP;
40
41 nfct_nat(ct)->masq_index = out->ifindex;
42
43 newrange.flags = range->flags | NF_NAT_RANGE_MAP_IPS;
44 newrange.min_addr.in6 = src;
45 newrange.max_addr.in6 = src;
46 newrange.min_proto = range->min_proto;
47 newrange.max_proto = range->max_proto;
48
49 return nf_nat_setup_info(ct, &newrange, NF_NAT_MANIP_SRC);
50}
51EXPORT_SYMBOL_GPL(nf_nat_masquerade_ipv6);
52
53static int device_cmp(struct nf_conn *ct, void *ifindex)
54{
55 const struct nf_conn_nat *nat = nfct_nat(ct);
56
57 if (!nat)
58 return 0;
59 if (nf_ct_l3num(ct) != NFPROTO_IPV6)
60 return 0;
61 return nat->masq_index == (int)(long)ifindex;
62}
63
64static int masq_device_event(struct notifier_block *this,
65 unsigned long event, void *ptr)
66{
67 const struct net_device *dev = netdev_notifier_info_to_dev(ptr);
68 struct net *net = dev_net(dev);
69
70 if (event == NETDEV_DOWN)
71 nf_ct_iterate_cleanup(net, device_cmp,
72 (void *)(long)dev->ifindex, 0, 0);
73
74 return NOTIFY_DONE;
75}
76
77static struct notifier_block masq_dev_notifier = {
78 .notifier_call = masq_device_event,
79};
80
81static int masq_inet_event(struct notifier_block *this,
82 unsigned long event, void *ptr)
83{
84 struct inet6_ifaddr *ifa = ptr;
85 struct netdev_notifier_info info;
86
87 netdev_notifier_info_init(&info, ifa->idev->dev);
88 return masq_device_event(this, event, &info);
89}
90
91static struct notifier_block masq_inet_notifier = {
92 .notifier_call = masq_inet_event,
93};
94
95static atomic_t masquerade_notifier_refcount = ATOMIC_INIT(0);
96
97void nf_nat_masquerade_ipv6_register_notifier(void)
98{
99 /* check if the notifier is already set */
100 if (atomic_inc_return(&masquerade_notifier_refcount) > 1)
101 return;
102
103 register_netdevice_notifier(&masq_dev_notifier);
104 register_inet6addr_notifier(&masq_inet_notifier);
105}
106EXPORT_SYMBOL_GPL(nf_nat_masquerade_ipv6_register_notifier);
107
108void nf_nat_masquerade_ipv6_unregister_notifier(void)
109{
110 /* check if the notifier still has clients */
111 if (atomic_dec_return(&masquerade_notifier_refcount) > 0)
112 return;
113
114 unregister_inet6addr_notifier(&masq_inet_notifier);
115 unregister_netdevice_notifier(&masq_dev_notifier);
116}
117EXPORT_SYMBOL_GPL(nf_nat_masquerade_ipv6_unregister_notifier);
118
119MODULE_LICENSE("GPL");
120MODULE_AUTHOR("Patrick McHardy <kaber@trash.net>");
diff --git a/net/ipv6/netfilter/nf_reject_ipv6.c b/net/ipv6/netfilter/nf_reject_ipv6.c
new file mode 100644
index 000000000000..5f5f0438d74d
--- /dev/null
+++ b/net/ipv6/netfilter/nf_reject_ipv6.c
@@ -0,0 +1,163 @@
1/* (C) 1999-2001 Paul `Rusty' Russell
2 * (C) 2002-2004 Netfilter Core Team <coreteam@netfilter.org>
3 *
4 * This program is free software; you can redistribute it and/or modify
5 * it under the terms of the GNU General Public License version 2 as
6 * published by the Free Software Foundation.
7 */
8#include <net/ipv6.h>
9#include <net/ip6_route.h>
10#include <net/ip6_fib.h>
11#include <net/ip6_checksum.h>
12#include <linux/netfilter_ipv6.h>
13
14void nf_send_reset6(struct net *net, struct sk_buff *oldskb, int hook)
15{
16 struct sk_buff *nskb;
17 struct tcphdr otcph, *tcph;
18 unsigned int otcplen, hh_len;
19 int tcphoff, needs_ack;
20 const struct ipv6hdr *oip6h = ipv6_hdr(oldskb);
21 struct ipv6hdr *ip6h;
22#define DEFAULT_TOS_VALUE 0x0U
23 const __u8 tclass = DEFAULT_TOS_VALUE;
24 struct dst_entry *dst = NULL;
25 u8 proto;
26 __be16 frag_off;
27 struct flowi6 fl6;
28
29 if ((!(ipv6_addr_type(&oip6h->saddr) & IPV6_ADDR_UNICAST)) ||
30 (!(ipv6_addr_type(&oip6h->daddr) & IPV6_ADDR_UNICAST))) {
31 pr_debug("addr is not unicast.\n");
32 return;
33 }
34
35 proto = oip6h->nexthdr;
36 tcphoff = ipv6_skip_exthdr(oldskb, ((u8*)(oip6h+1) - oldskb->data), &proto, &frag_off);
37
38 if ((tcphoff < 0) || (tcphoff > oldskb->len)) {
39 pr_debug("Cannot get TCP header.\n");
40 return;
41 }
42
43 otcplen = oldskb->len - tcphoff;
44
45 /* IP header checks: fragment, too short. */
46 if (proto != IPPROTO_TCP || otcplen < sizeof(struct tcphdr)) {
47 pr_debug("proto(%d) != IPPROTO_TCP, "
48 "or too short. otcplen = %d\n",
49 proto, otcplen);
50 return;
51 }
52
53 if (skb_copy_bits(oldskb, tcphoff, &otcph, sizeof(struct tcphdr)))
54 BUG();
55
56 /* No RST for RST. */
57 if (otcph.rst) {
58 pr_debug("RST is set\n");
59 return;
60 }
61
62 /* Check checksum. */
63 if (nf_ip6_checksum(oldskb, hook, tcphoff, IPPROTO_TCP)) {
64 pr_debug("TCP checksum is invalid\n");
65 return;
66 }
67
68 memset(&fl6, 0, sizeof(fl6));
69 fl6.flowi6_proto = IPPROTO_TCP;
70 fl6.saddr = oip6h->daddr;
71 fl6.daddr = oip6h->saddr;
72 fl6.fl6_sport = otcph.dest;
73 fl6.fl6_dport = otcph.source;
74 security_skb_classify_flow(oldskb, flowi6_to_flowi(&fl6));
75 dst = ip6_route_output(net, NULL, &fl6);
76 if (dst == NULL || dst->error) {
77 dst_release(dst);
78 return;
79 }
80 dst = xfrm_lookup(net, dst, flowi6_to_flowi(&fl6), NULL, 0);
81 if (IS_ERR(dst))
82 return;
83
84 hh_len = (dst->dev->hard_header_len + 15)&~15;
85 nskb = alloc_skb(hh_len + 15 + dst->header_len + sizeof(struct ipv6hdr)
86 + sizeof(struct tcphdr) + dst->trailer_len,
87 GFP_ATOMIC);
88
89 if (!nskb) {
90 net_dbg_ratelimited("cannot alloc skb\n");
91 dst_release(dst);
92 return;
93 }
94
95 skb_dst_set(nskb, dst);
96
97 skb_reserve(nskb, hh_len + dst->header_len);
98
99 skb_put(nskb, sizeof(struct ipv6hdr));
100 skb_reset_network_header(nskb);
101 ip6h = ipv6_hdr(nskb);
102 ip6_flow_hdr(ip6h, tclass, 0);
103 ip6h->hop_limit = ip6_dst_hoplimit(dst);
104 ip6h->nexthdr = IPPROTO_TCP;
105 ip6h->saddr = oip6h->daddr;
106 ip6h->daddr = oip6h->saddr;
107
108 skb_reset_transport_header(nskb);
109 tcph = (struct tcphdr *)skb_put(nskb, sizeof(struct tcphdr));
110 /* Truncate to length (no data) */
111 tcph->doff = sizeof(struct tcphdr)/4;
112 tcph->source = otcph.dest;
113 tcph->dest = otcph.source;
114
115 if (otcph.ack) {
116 needs_ack = 0;
117 tcph->seq = otcph.ack_seq;
118 tcph->ack_seq = 0;
119 } else {
120 needs_ack = 1;
121 tcph->ack_seq = htonl(ntohl(otcph.seq) + otcph.syn + otcph.fin
122 + otcplen - (otcph.doff<<2));
123 tcph->seq = 0;
124 }
125
126 /* Reset flags */
127 ((u_int8_t *)tcph)[13] = 0;
128 tcph->rst = 1;
129 tcph->ack = needs_ack;
130 tcph->window = 0;
131 tcph->urg_ptr = 0;
132 tcph->check = 0;
133
134 /* Adjust TCP checksum */
135 tcph->check = csum_ipv6_magic(&ipv6_hdr(nskb)->saddr,
136 &ipv6_hdr(nskb)->daddr,
137 sizeof(struct tcphdr), IPPROTO_TCP,
138 csum_partial(tcph,
139 sizeof(struct tcphdr), 0));
140
141 nf_ct_attach(nskb, oldskb);
142
143#if IS_ENABLED(CONFIG_BRIDGE_NETFILTER)
144 /* If we use ip6_local_out for bridged traffic, the MAC source on
145 * the RST will be ours, instead of the destination's. This confuses
146 * some routers/firewalls, and they drop the packet. So we need to
147 * build the eth header using the original destination's MAC as the
148 * source, and send the RST packet directly.
149 */
150 if (oldskb->nf_bridge) {
151 struct ethhdr *oeth = eth_hdr(oldskb);
152 nskb->dev = oldskb->nf_bridge->physindev;
153 nskb->protocol = htons(ETH_P_IPV6);
154 ip6h->payload_len = htons(sizeof(struct tcphdr));
155 if (dev_hard_header(nskb, nskb->dev, ntohs(nskb->protocol),
156 oeth->h_source, oeth->h_dest, nskb->len) < 0)
157 return;
158 dev_queue_xmit(nskb);
159 } else
160#endif
161 ip6_local_out(nskb);
162}
163EXPORT_SYMBOL_GPL(nf_send_reset6);
diff --git a/net/ipv6/netfilter/nft_chain_nat_ipv6.c b/net/ipv6/netfilter/nft_chain_nat_ipv6.c
index d189fcb437fe..1c4b75dd425b 100644
--- a/net/ipv6/netfilter/nft_chain_nat_ipv6.c
+++ b/net/ipv6/netfilter/nft_chain_nat_ipv6.c
@@ -24,144 +24,53 @@
24#include <net/netfilter/nf_nat_l3proto.h> 24#include <net/netfilter/nf_nat_l3proto.h>
25#include <net/ipv6.h> 25#include <net/ipv6.h>
26 26
27/* 27static unsigned int nft_nat_do_chain(const struct nf_hook_ops *ops,
28 * IPv6 NAT chains 28 struct sk_buff *skb,
29 */ 29 const struct net_device *in,
30 30 const struct net_device *out,
31static unsigned int nf_nat_ipv6_fn(const struct nf_hook_ops *ops, 31 struct nf_conn *ct)
32 struct sk_buff *skb,
33 const struct net_device *in,
34 const struct net_device *out,
35 int (*okfn)(struct sk_buff *))
36{ 32{
37 enum ip_conntrack_info ctinfo;
38 struct nf_conn *ct = nf_ct_get(skb, &ctinfo);
39 struct nf_conn_nat *nat;
40 enum nf_nat_manip_type maniptype = HOOK2MANIP(ops->hooknum);
41 __be16 frag_off;
42 int hdrlen;
43 u8 nexthdr;
44 struct nft_pktinfo pkt; 33 struct nft_pktinfo pkt;
45 unsigned int ret;
46
47 if (ct == NULL || nf_ct_is_untracked(ct))
48 return NF_ACCEPT;
49
50 nat = nf_ct_nat_ext_add(ct);
51 if (nat == NULL)
52 return NF_ACCEPT;
53
54 switch (ctinfo) {
55 case IP_CT_RELATED:
56 case IP_CT_RELATED + IP_CT_IS_REPLY:
57 nexthdr = ipv6_hdr(skb)->nexthdr;
58 hdrlen = ipv6_skip_exthdr(skb, sizeof(struct ipv6hdr),
59 &nexthdr, &frag_off);
60
61 if (hdrlen >= 0 && nexthdr == IPPROTO_ICMPV6) {
62 if (!nf_nat_icmpv6_reply_translation(skb, ct, ctinfo,
63 ops->hooknum,
64 hdrlen))
65 return NF_DROP;
66 else
67 return NF_ACCEPT;
68 }
69 /* Fall through */
70 case IP_CT_NEW:
71 if (nf_nat_initialized(ct, maniptype))
72 break;
73
74 nft_set_pktinfo_ipv6(&pkt, ops, skb, in, out);
75 34
76 ret = nft_do_chain(&pkt, ops); 35 nft_set_pktinfo_ipv6(&pkt, ops, skb, in, out);
77 if (ret != NF_ACCEPT)
78 return ret;
79 if (!nf_nat_initialized(ct, maniptype)) {
80 ret = nf_nat_alloc_null_binding(ct, ops->hooknum);
81 if (ret != NF_ACCEPT)
82 return ret;
83 }
84 default:
85 break;
86 }
87 36
88 return nf_nat_packet(ct, ctinfo, ops->hooknum, skb); 37 return nft_do_chain(&pkt, ops);
89} 38}
90 39
91static unsigned int nf_nat_ipv6_prerouting(const struct nf_hook_ops *ops, 40static unsigned int nft_nat_ipv6_fn(const struct nf_hook_ops *ops,
92 struct sk_buff *skb, 41 struct sk_buff *skb,
93 const struct net_device *in, 42 const struct net_device *in,
94 const struct net_device *out, 43 const struct net_device *out,
95 int (*okfn)(struct sk_buff *)) 44 int (*okfn)(struct sk_buff *))
96{ 45{
97 struct in6_addr daddr = ipv6_hdr(skb)->daddr; 46 return nf_nat_ipv6_fn(ops, skb, in, out, nft_nat_do_chain);
98 unsigned int ret;
99
100 ret = nf_nat_ipv6_fn(ops, skb, in, out, okfn);
101 if (ret != NF_DROP && ret != NF_STOLEN &&
102 ipv6_addr_cmp(&daddr, &ipv6_hdr(skb)->daddr))
103 skb_dst_drop(skb);
104
105 return ret;
106} 47}
107 48
108static unsigned int nf_nat_ipv6_postrouting(const struct nf_hook_ops *ops, 49static unsigned int nft_nat_ipv6_in(const struct nf_hook_ops *ops,
109 struct sk_buff *skb, 50 struct sk_buff *skb,
110 const struct net_device *in, 51 const struct net_device *in,
111 const struct net_device *out, 52 const struct net_device *out,
112 int (*okfn)(struct sk_buff *)) 53 int (*okfn)(struct sk_buff *))
113{ 54{
114 enum ip_conntrack_info ctinfo __maybe_unused; 55 return nf_nat_ipv6_in(ops, skb, in, out, nft_nat_do_chain);
115 const struct nf_conn *ct __maybe_unused;
116 unsigned int ret;
117
118 ret = nf_nat_ipv6_fn(ops, skb, in, out, okfn);
119#ifdef CONFIG_XFRM
120 if (ret != NF_DROP && ret != NF_STOLEN &&
121 !(IP6CB(skb)->flags & IP6SKB_XFRM_TRANSFORMED) &&
122 (ct = nf_ct_get(skb, &ctinfo)) != NULL) {
123 enum ip_conntrack_dir dir = CTINFO2DIR(ctinfo);
124
125 if (!nf_inet_addr_cmp(&ct->tuplehash[dir].tuple.src.u3,
126 &ct->tuplehash[!dir].tuple.dst.u3) ||
127 (ct->tuplehash[dir].tuple.src.u.all !=
128 ct->tuplehash[!dir].tuple.dst.u.all))
129 if (nf_xfrm_me_harder(skb, AF_INET6) < 0)
130 ret = NF_DROP;
131 }
132#endif
133 return ret;
134} 56}
135 57
136static unsigned int nf_nat_ipv6_output(const struct nf_hook_ops *ops, 58static unsigned int nft_nat_ipv6_out(const struct nf_hook_ops *ops,
137 struct sk_buff *skb, 59 struct sk_buff *skb,
138 const struct net_device *in, 60 const struct net_device *in,
139 const struct net_device *out, 61 const struct net_device *out,
140 int (*okfn)(struct sk_buff *)) 62 int (*okfn)(struct sk_buff *))
141{ 63{
142 enum ip_conntrack_info ctinfo; 64 return nf_nat_ipv6_out(ops, skb, in, out, nft_nat_do_chain);
143 const struct nf_conn *ct; 65}
144 unsigned int ret;
145
146 ret = nf_nat_ipv6_fn(ops, skb, in, out, okfn);
147 if (ret != NF_DROP && ret != NF_STOLEN &&
148 (ct = nf_ct_get(skb, &ctinfo)) != NULL) {
149 enum ip_conntrack_dir dir = CTINFO2DIR(ctinfo);
150 66
151 if (!nf_inet_addr_cmp(&ct->tuplehash[dir].tuple.dst.u3, 67static unsigned int nft_nat_ipv6_local_fn(const struct nf_hook_ops *ops,
152 &ct->tuplehash[!dir].tuple.src.u3)) { 68 struct sk_buff *skb,
153 if (ip6_route_me_harder(skb)) 69 const struct net_device *in,
154 ret = NF_DROP; 70 const struct net_device *out,
155 } 71 int (*okfn)(struct sk_buff *))
156#ifdef CONFIG_XFRM 72{
157 else if (!(IP6CB(skb)->flags & IP6SKB_XFRM_TRANSFORMED) && 73 return nf_nat_ipv6_local_fn(ops, skb, in, out, nft_nat_do_chain);
158 ct->tuplehash[dir].tuple.dst.u.all !=
159 ct->tuplehash[!dir].tuple.src.u.all)
160 if (nf_xfrm_me_harder(skb, AF_INET6))
161 ret = NF_DROP;
162#endif
163 }
164 return ret;
165} 74}
166 75
167static const struct nf_chain_type nft_chain_nat_ipv6 = { 76static const struct nf_chain_type nft_chain_nat_ipv6 = {
@@ -174,10 +83,10 @@ static const struct nf_chain_type nft_chain_nat_ipv6 = {
174 (1 << NF_INET_LOCAL_OUT) | 83 (1 << NF_INET_LOCAL_OUT) |
175 (1 << NF_INET_LOCAL_IN), 84 (1 << NF_INET_LOCAL_IN),
176 .hooks = { 85 .hooks = {
177 [NF_INET_PRE_ROUTING] = nf_nat_ipv6_prerouting, 86 [NF_INET_PRE_ROUTING] = nft_nat_ipv6_in,
178 [NF_INET_POST_ROUTING] = nf_nat_ipv6_postrouting, 87 [NF_INET_POST_ROUTING] = nft_nat_ipv6_out,
179 [NF_INET_LOCAL_OUT] = nf_nat_ipv6_output, 88 [NF_INET_LOCAL_OUT] = nft_nat_ipv6_local_fn,
180 [NF_INET_LOCAL_IN] = nf_nat_ipv6_fn, 89 [NF_INET_LOCAL_IN] = nft_nat_ipv6_fn,
181 }, 90 },
182}; 91};
183 92
diff --git a/net/ipv6/netfilter/nft_masq_ipv6.c b/net/ipv6/netfilter/nft_masq_ipv6.c
new file mode 100644
index 000000000000..556262f40761
--- /dev/null
+++ b/net/ipv6/netfilter/nft_masq_ipv6.c
@@ -0,0 +1,77 @@
1/*
2 * Copyright (c) 2014 Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com>
3 *
4 * This program is free software; you can redistribute it and/or modify
5 * it under the terms of the GNU General Public License version 2 as
6 * published by the Free Software Foundation.
7 */
8
9#include <linux/kernel.h>
10#include <linux/init.h>
11#include <linux/module.h>
12#include <linux/netlink.h>
13#include <linux/netfilter.h>
14#include <linux/netfilter/nf_tables.h>
15#include <net/netfilter/nf_tables.h>
16#include <net/netfilter/nf_nat.h>
17#include <net/netfilter/nft_masq.h>
18#include <net/netfilter/ipv6/nf_nat_masquerade.h>
19
20static void nft_masq_ipv6_eval(const struct nft_expr *expr,
21 struct nft_data data[NFT_REG_MAX + 1],
22 const struct nft_pktinfo *pkt)
23{
24 struct nft_masq *priv = nft_expr_priv(expr);
25 struct nf_nat_range range;
26 unsigned int verdict;
27
28 range.flags = priv->flags;
29
30 verdict = nf_nat_masquerade_ipv6(pkt->skb, &range, pkt->out);
31
32 data[NFT_REG_VERDICT].verdict = verdict;
33}
34
35static struct nft_expr_type nft_masq_ipv6_type;
36static const struct nft_expr_ops nft_masq_ipv6_ops = {
37 .type = &nft_masq_ipv6_type,
38 .size = NFT_EXPR_SIZE(sizeof(struct nft_masq)),
39 .eval = nft_masq_ipv6_eval,
40 .init = nft_masq_init,
41 .dump = nft_masq_dump,
42};
43
44static struct nft_expr_type nft_masq_ipv6_type __read_mostly = {
45 .family = NFPROTO_IPV6,
46 .name = "masq",
47 .ops = &nft_masq_ipv6_ops,
48 .policy = nft_masq_policy,
49 .maxattr = NFTA_MASQ_MAX,
50 .owner = THIS_MODULE,
51};
52
53static int __init nft_masq_ipv6_module_init(void)
54{
55 int ret;
56
57 ret = nft_register_expr(&nft_masq_ipv6_type);
58 if (ret < 0)
59 return ret;
60
61 nf_nat_masquerade_ipv6_register_notifier();
62
63 return ret;
64}
65
66static void __exit nft_masq_ipv6_module_exit(void)
67{
68 nft_unregister_expr(&nft_masq_ipv6_type);
69 nf_nat_masquerade_ipv6_unregister_notifier();
70}
71
72module_init(nft_masq_ipv6_module_init);
73module_exit(nft_masq_ipv6_module_exit);
74
75MODULE_LICENSE("GPL");
76MODULE_AUTHOR("Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com>");
77MODULE_ALIAS_NFT_AF_EXPR(AF_INET6, "masq");
diff --git a/net/ipv6/output_core.c b/net/ipv6/output_core.c
index 5ec867e4a8b7..fc24c390af05 100644
--- a/net/ipv6/output_core.c
+++ b/net/ipv6/output_core.c
@@ -35,7 +35,7 @@ int ip6_find_1stfragopt(struct sk_buff *skb, u8 **nexthdr)
35 if (found_rhdr) 35 if (found_rhdr)
36 return offset; 36 return offset;
37 break; 37 break;
38 default : 38 default:
39 return offset; 39 return offset;
40 } 40 }
41 41
diff --git a/net/ipv6/proc.c b/net/ipv6/proc.c
index 2d6f860e5c1e..1752cd0b4882 100644
--- a/net/ipv6/proc.c
+++ b/net/ipv6/proc.c
@@ -8,7 +8,7 @@
8 * except it reports the sockets in the INET6 address family. 8 * except it reports the sockets in the INET6 address family.
9 * 9 *
10 * Authors: David S. Miller (davem@caip.rutgers.edu) 10 * Authors: David S. Miller (davem@caip.rutgers.edu)
11 * YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> 11 * YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
12 * 12 *
13 * This program is free software; you can redistribute it and/or 13 * This program is free software; you can redistribute it and/or
14 * modify it under the terms of the GNU General Public License 14 * modify it under the terms of the GNU General Public License
diff --git a/net/ipv6/protocol.c b/net/ipv6/protocol.c
index e048cf1bb6a2..e3770abe688a 100644
--- a/net/ipv6/protocol.c
+++ b/net/ipv6/protocol.c
@@ -51,6 +51,7 @@ EXPORT_SYMBOL(inet6_del_protocol);
51#endif 51#endif
52 52
53const struct net_offload __rcu *inet6_offloads[MAX_INET_PROTOS] __read_mostly; 53const struct net_offload __rcu *inet6_offloads[MAX_INET_PROTOS] __read_mostly;
54EXPORT_SYMBOL(inet6_offloads);
54 55
55int inet6_add_offload(const struct net_offload *prot, unsigned char protocol) 56int inet6_add_offload(const struct net_offload *prot, unsigned char protocol)
56{ 57{
diff --git a/net/ipv6/raw.c b/net/ipv6/raw.c
index 39d44226e402..896af8807979 100644
--- a/net/ipv6/raw.c
+++ b/net/ipv6/raw.c
@@ -889,7 +889,7 @@ back_from_confirm:
889 else { 889 else {
890 lock_sock(sk); 890 lock_sock(sk);
891 err = ip6_append_data(sk, ip_generic_getfrag, msg->msg_iov, 891 err = ip6_append_data(sk, ip_generic_getfrag, msg->msg_iov,
892 len, 0, hlimit, tclass, opt, &fl6, (struct rt6_info*)dst, 892 len, 0, hlimit, tclass, opt, &fl6, (struct rt6_info *)dst,
893 msg->msg_flags, dontfrag); 893 msg->msg_flags, dontfrag);
894 894
895 if (err) 895 if (err)
@@ -902,7 +902,7 @@ done:
902 dst_release(dst); 902 dst_release(dst);
903out: 903out:
904 fl6_sock_release(flowlabel); 904 fl6_sock_release(flowlabel);
905 return err<0?err:len; 905 return err < 0 ? err : len;
906do_confirm: 906do_confirm:
907 dst_confirm(dst); 907 dst_confirm(dst);
908 if (!(msg->msg_flags & MSG_PROBE) || len) 908 if (!(msg->msg_flags & MSG_PROBE) || len)
@@ -1045,7 +1045,7 @@ static int do_rawv6_getsockopt(struct sock *sk, int level, int optname,
1045 struct raw6_sock *rp = raw6_sk(sk); 1045 struct raw6_sock *rp = raw6_sk(sk);
1046 int val, len; 1046 int val, len;
1047 1047
1048 if (get_user(len,optlen)) 1048 if (get_user(len, optlen))
1049 return -EFAULT; 1049 return -EFAULT;
1050 1050
1051 switch (optname) { 1051 switch (optname) {
@@ -1069,7 +1069,7 @@ static int do_rawv6_getsockopt(struct sock *sk, int level, int optname,
1069 1069
1070 if (put_user(len, optlen)) 1070 if (put_user(len, optlen))
1071 return -EFAULT; 1071 return -EFAULT;
1072 if (copy_to_user(optval,&val,len)) 1072 if (copy_to_user(optval, &val, len))
1073 return -EFAULT; 1073 return -EFAULT;
1074 return 0; 1074 return 0;
1075} 1075}
diff --git a/net/ipv6/reassembly.c b/net/ipv6/reassembly.c
index c6557d9f7808..1a157ca2ebc1 100644
--- a/net/ipv6/reassembly.c
+++ b/net/ipv6/reassembly.c
@@ -62,13 +62,12 @@
62 62
63static const char ip6_frag_cache_name[] = "ip6-frags"; 63static const char ip6_frag_cache_name[] = "ip6-frags";
64 64
65struct ip6frag_skb_cb 65struct ip6frag_skb_cb {
66{
67 struct inet6_skb_parm h; 66 struct inet6_skb_parm h;
68 int offset; 67 int offset;
69}; 68};
70 69
71#define FRAG6_CB(skb) ((struct ip6frag_skb_cb*)((skb)->cb)) 70#define FRAG6_CB(skb) ((struct ip6frag_skb_cb *)((skb)->cb))
72 71
73static inline u8 ip6_frag_ecn(const struct ipv6hdr *ipv6h) 72static inline u8 ip6_frag_ecn(const struct ipv6hdr *ipv6h)
74{ 73{
@@ -289,7 +288,7 @@ static int ip6_frag_queue(struct frag_queue *fq, struct sk_buff *skb,
289 goto found; 288 goto found;
290 } 289 }
291 prev = NULL; 290 prev = NULL;
292 for(next = fq->q.fragments; next != NULL; next = next->next) { 291 for (next = fq->q.fragments; next != NULL; next = next->next) {
293 if (FRAG6_CB(next)->offset >= offset) 292 if (FRAG6_CB(next)->offset >= offset)
294 break; /* bingo! */ 293 break; /* bingo! */
295 prev = next; 294 prev = next;
@@ -529,7 +528,7 @@ static int ipv6_frag_rcv(struct sk_buff *skb)
529 IP6_INC_STATS_BH(net, ip6_dst_idev(skb_dst(skb)), IPSTATS_MIB_REASMREQDS); 528 IP6_INC_STATS_BH(net, ip6_dst_idev(skb_dst(skb)), IPSTATS_MIB_REASMREQDS);
530 529
531 /* Jumbo payload inhibits frag. header */ 530 /* Jumbo payload inhibits frag. header */
532 if (hdr->payload_len==0) 531 if (hdr->payload_len == 0)
533 goto fail_hdr; 532 goto fail_hdr;
534 533
535 if (!pskb_may_pull(skb, (skb_transport_offset(skb) + 534 if (!pskb_may_pull(skb, (skb_transport_offset(skb) +
@@ -575,8 +574,7 @@ fail_hdr:
575 return -1; 574 return -1;
576} 575}
577 576
578static const struct inet6_protocol frag_protocol = 577static const struct inet6_protocol frag_protocol = {
579{
580 .handler = ipv6_frag_rcv, 578 .handler = ipv6_frag_rcv,
581 .flags = INET6_PROTO_NOPOLICY, 579 .flags = INET6_PROTO_NOPOLICY,
582}; 580};
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index bafde82324c5..a318dd89b6d9 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -812,7 +812,7 @@ out:
812 812
813} 813}
814 814
815struct dst_entry * ip6_route_lookup(struct net *net, struct flowi6 *fl6, 815struct dst_entry *ip6_route_lookup(struct net *net, struct flowi6 *fl6,
816 int flags) 816 int flags)
817{ 817{
818 return fib6_rule_lookup(net, fl6, flags, ip6_pol_route_lookup); 818 return fib6_rule_lookup(net, fl6, flags, ip6_pol_route_lookup);
@@ -842,7 +842,6 @@ struct rt6_info *rt6_lookup(struct net *net, const struct in6_addr *daddr,
842 842
843 return NULL; 843 return NULL;
844} 844}
845
846EXPORT_SYMBOL(rt6_lookup); 845EXPORT_SYMBOL(rt6_lookup);
847 846
848/* ip6_ins_rt is called with FREE table->tb6_lock. 847/* ip6_ins_rt is called with FREE table->tb6_lock.
@@ -1023,7 +1022,7 @@ static struct rt6_info *ip6_pol_route_output(struct net *net, struct fib6_table
1023 return ip6_pol_route(net, table, fl6->flowi6_oif, fl6, flags); 1022 return ip6_pol_route(net, table, fl6->flowi6_oif, fl6, flags);
1024} 1023}
1025 1024
1026struct dst_entry * ip6_route_output(struct net *net, const struct sock *sk, 1025struct dst_entry *ip6_route_output(struct net *net, const struct sock *sk,
1027 struct flowi6 *fl6) 1026 struct flowi6 *fl6)
1028{ 1027{
1029 int flags = 0; 1028 int flags = 0;
@@ -1040,7 +1039,6 @@ struct dst_entry * ip6_route_output(struct net *net, const struct sock *sk,
1040 1039
1041 return fib6_rule_lookup(net, fl6, flags, ip6_pol_route_output); 1040 return fib6_rule_lookup(net, fl6, flags, ip6_pol_route_output);
1042} 1041}
1043
1044EXPORT_SYMBOL(ip6_route_output); 1042EXPORT_SYMBOL(ip6_route_output);
1045 1043
1046struct dst_entry *ip6_blackhole_route(struct net *net, struct dst_entry *dst_orig) 1044struct dst_entry *ip6_blackhole_route(struct net *net, struct dst_entry *dst_orig)
@@ -1145,7 +1143,7 @@ static void ip6_link_failure(struct sk_buff *skb)
1145static void ip6_rt_update_pmtu(struct dst_entry *dst, struct sock *sk, 1143static void ip6_rt_update_pmtu(struct dst_entry *dst, struct sock *sk,
1146 struct sk_buff *skb, u32 mtu) 1144 struct sk_buff *skb, u32 mtu)
1147{ 1145{
1148 struct rt6_info *rt6 = (struct rt6_info*)dst; 1146 struct rt6_info *rt6 = (struct rt6_info *)dst;
1149 1147
1150 dst_confirm(dst); 1148 dst_confirm(dst);
1151 if (mtu < dst_mtu(dst) && rt6->rt6i_dst.plen == 128) { 1149 if (mtu < dst_mtu(dst) && rt6->rt6i_dst.plen == 128) {
@@ -1920,7 +1918,7 @@ static struct rt6_info *rt6_get_route_info(struct net *net,
1920 return NULL; 1918 return NULL;
1921 1919
1922 read_lock_bh(&table->tb6_lock); 1920 read_lock_bh(&table->tb6_lock);
1923 fn = fib6_locate(&table->tb6_root, prefix ,prefixlen, NULL, 0); 1921 fn = fib6_locate(&table->tb6_root, prefix, prefixlen, NULL, 0);
1924 if (!fn) 1922 if (!fn)
1925 goto out; 1923 goto out;
1926 1924
@@ -1979,7 +1977,7 @@ struct rt6_info *rt6_get_dflt_router(const struct in6_addr *addr, struct net_dev
1979 return NULL; 1977 return NULL;
1980 1978
1981 read_lock_bh(&table->tb6_lock); 1979 read_lock_bh(&table->tb6_lock);
1982 for (rt = table->tb6_root.leaf; rt; rt=rt->dst.rt6_next) { 1980 for (rt = table->tb6_root.leaf; rt; rt = rt->dst.rt6_next) {
1983 if (dev == rt->dst.dev && 1981 if (dev == rt->dst.dev &&
1984 ((rt->rt6i_flags & (RTF_ADDRCONF | RTF_DEFAULT)) == (RTF_ADDRCONF | RTF_DEFAULT)) && 1982 ((rt->rt6i_flags & (RTF_ADDRCONF | RTF_DEFAULT)) == (RTF_ADDRCONF | RTF_DEFAULT)) &&
1985 ipv6_addr_equal(&rt->rt6i_gateway, addr)) 1983 ipv6_addr_equal(&rt->rt6i_gateway, addr))
@@ -2064,7 +2062,7 @@ int ipv6_route_ioctl(struct net *net, unsigned int cmd, void __user *arg)
2064 struct in6_rtmsg rtmsg; 2062 struct in6_rtmsg rtmsg;
2065 int err; 2063 int err;
2066 2064
2067 switch(cmd) { 2065 switch (cmd) {
2068 case SIOCADDRT: /* Add a route */ 2066 case SIOCADDRT: /* Add a route */
2069 case SIOCDELRT: /* Delete a route */ 2067 case SIOCDELRT: /* Delete a route */
2070 if (!ns_capable(net->user_ns, CAP_NET_ADMIN)) 2068 if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
@@ -2187,7 +2185,7 @@ int ip6_route_get_saddr(struct net *net,
2187 unsigned int prefs, 2185 unsigned int prefs,
2188 struct in6_addr *saddr) 2186 struct in6_addr *saddr)
2189{ 2187{
2190 struct inet6_dev *idev = ip6_dst_idev((struct dst_entry*)rt); 2188 struct inet6_dev *idev = ip6_dst_idev((struct dst_entry *)rt);
2191 int err = 0; 2189 int err = 0;
2192 if (rt->rt6i_prefsrc.plen) 2190 if (rt->rt6i_prefsrc.plen)
2193 *saddr = rt->rt6i_prefsrc.addr; 2191 *saddr = rt->rt6i_prefsrc.addr;
@@ -2482,7 +2480,7 @@ beginning:
2482 return last_err; 2480 return last_err;
2483} 2481}
2484 2482
2485static int inet6_rtm_delroute(struct sk_buff *skb, struct nlmsghdr* nlh) 2483static int inet6_rtm_delroute(struct sk_buff *skb, struct nlmsghdr *nlh)
2486{ 2484{
2487 struct fib6_config cfg; 2485 struct fib6_config cfg;
2488 int err; 2486 int err;
@@ -2497,7 +2495,7 @@ static int inet6_rtm_delroute(struct sk_buff *skb, struct nlmsghdr* nlh)
2497 return ip6_route_del(&cfg); 2495 return ip6_route_del(&cfg);
2498} 2496}
2499 2497
2500static int inet6_rtm_newroute(struct sk_buff *skb, struct nlmsghdr* nlh) 2498static int inet6_rtm_newroute(struct sk_buff *skb, struct nlmsghdr *nlh)
2501{ 2499{
2502 struct fib6_config cfg; 2500 struct fib6_config cfg;
2503 int err; 2501 int err;
@@ -2689,7 +2687,7 @@ int rt6_dump_route(struct rt6_info *rt, void *p_arg)
2689 prefix, 0, NLM_F_MULTI); 2687 prefix, 0, NLM_F_MULTI);
2690} 2688}
2691 2689
2692static int inet6_rtm_getroute(struct sk_buff *in_skb, struct nlmsghdr* nlh) 2690static int inet6_rtm_getroute(struct sk_buff *in_skb, struct nlmsghdr *nlh)
2693{ 2691{
2694 struct net *net = sock_net(in_skb->sk); 2692 struct net *net = sock_net(in_skb->sk);
2695 struct nlattr *tb[RTA_MAX+1]; 2693 struct nlattr *tb[RTA_MAX+1];
diff --git a/net/ipv6/sit.c b/net/ipv6/sit.c
index 6163f851dc01..6eab37cf5345 100644
--- a/net/ipv6/sit.c
+++ b/net/ipv6/sit.c
@@ -812,9 +812,9 @@ static netdev_tx_t ipip6_tunnel_xmit(struct sk_buff *skb,
812 const struct ipv6hdr *iph6 = ipv6_hdr(skb); 812 const struct ipv6hdr *iph6 = ipv6_hdr(skb);
813 u8 tos = tunnel->parms.iph.tos; 813 u8 tos = tunnel->parms.iph.tos;
814 __be16 df = tiph->frag_off; 814 __be16 df = tiph->frag_off;
815 struct rtable *rt; /* Route to the other host */ 815 struct rtable *rt; /* Route to the other host */
816 struct net_device *tdev; /* Device to other host */ 816 struct net_device *tdev; /* Device to other host */
817 unsigned int max_headroom; /* The extra header space needed */ 817 unsigned int max_headroom; /* The extra header space needed */
818 __be32 dst = tiph->daddr; 818 __be32 dst = tiph->daddr;
819 struct flowi4 fl4; 819 struct flowi4 fl4;
820 int mtu; 820 int mtu;
@@ -822,6 +822,8 @@ static netdev_tx_t ipip6_tunnel_xmit(struct sk_buff *skb,
822 int addr_type; 822 int addr_type;
823 u8 ttl; 823 u8 ttl;
824 int err; 824 int err;
825 u8 protocol = IPPROTO_IPV6;
826 int t_hlen = tunnel->hlen + sizeof(struct iphdr);
825 827
826 if (skb->protocol != htons(ETH_P_IPV6)) 828 if (skb->protocol != htons(ETH_P_IPV6))
827 goto tx_error; 829 goto tx_error;
@@ -911,8 +913,14 @@ static netdev_tx_t ipip6_tunnel_xmit(struct sk_buff *skb,
911 goto tx_error; 913 goto tx_error;
912 } 914 }
913 915
916 skb = iptunnel_handle_offloads(skb, false, SKB_GSO_SIT);
917 if (IS_ERR(skb)) {
918 ip_rt_put(rt);
919 goto out;
920 }
921
914 if (df) { 922 if (df) {
915 mtu = dst_mtu(&rt->dst) - sizeof(struct iphdr); 923 mtu = dst_mtu(&rt->dst) - t_hlen;
916 924
917 if (mtu < 68) { 925 if (mtu < 68) {
918 dev->stats.collisions++; 926 dev->stats.collisions++;
@@ -947,7 +955,7 @@ static netdev_tx_t ipip6_tunnel_xmit(struct sk_buff *skb,
947 /* 955 /*
948 * Okay, now see if we can stuff it in the buffer as-is. 956 * Okay, now see if we can stuff it in the buffer as-is.
949 */ 957 */
950 max_headroom = LL_RESERVED_SPACE(tdev)+sizeof(struct iphdr); 958 max_headroom = LL_RESERVED_SPACE(tdev) + t_hlen;
951 959
952 if (skb_headroom(skb) < max_headroom || skb_shared(skb) || 960 if (skb_headroom(skb) < max_headroom || skb_shared(skb) ||
953 (skb_cloned(skb) && !skb_clone_writable(skb, 0))) { 961 (skb_cloned(skb) && !skb_clone_writable(skb, 0))) {
@@ -969,14 +977,15 @@ static netdev_tx_t ipip6_tunnel_xmit(struct sk_buff *skb,
969 ttl = iph6->hop_limit; 977 ttl = iph6->hop_limit;
970 tos = INET_ECN_encapsulate(tos, ipv6_get_dsfield(iph6)); 978 tos = INET_ECN_encapsulate(tos, ipv6_get_dsfield(iph6));
971 979
972 skb = iptunnel_handle_offloads(skb, false, SKB_GSO_SIT); 980 if (ip_tunnel_encap(skb, tunnel, &protocol, &fl4) < 0) {
973 if (IS_ERR(skb)) {
974 ip_rt_put(rt); 981 ip_rt_put(rt);
975 goto out; 982 goto tx_error;
976 } 983 }
977 984
985 skb_set_inner_ipproto(skb, IPPROTO_IPV6);
986
978 err = iptunnel_xmit(skb->sk, rt, skb, fl4.saddr, fl4.daddr, 987 err = iptunnel_xmit(skb->sk, rt, skb, fl4.saddr, fl4.daddr,
979 IPPROTO_IPV6, tos, ttl, df, 988 protocol, tos, ttl, df,
980 !net_eq(tunnel->net, dev_net(dev))); 989 !net_eq(tunnel->net, dev_net(dev)));
981 iptunnel_xmit_stats(err, &dev->stats, dev->tstats); 990 iptunnel_xmit_stats(err, &dev->stats, dev->tstats);
982 return NETDEV_TX_OK; 991 return NETDEV_TX_OK;
@@ -999,6 +1008,8 @@ static netdev_tx_t ipip_tunnel_xmit(struct sk_buff *skb, struct net_device *dev)
999 if (IS_ERR(skb)) 1008 if (IS_ERR(skb))
1000 goto out; 1009 goto out;
1001 1010
1011 skb_set_inner_ipproto(skb, IPPROTO_IPIP);
1012
1002 ip_tunnel_xmit(skb, dev, tiph, IPPROTO_IPIP); 1013 ip_tunnel_xmit(skb, dev, tiph, IPPROTO_IPIP);
1003 return NETDEV_TX_OK; 1014 return NETDEV_TX_OK;
1004out: 1015out:
@@ -1059,8 +1070,10 @@ static void ipip6_tunnel_bind_dev(struct net_device *dev)
1059 tdev = __dev_get_by_index(tunnel->net, tunnel->parms.link); 1070 tdev = __dev_get_by_index(tunnel->net, tunnel->parms.link);
1060 1071
1061 if (tdev) { 1072 if (tdev) {
1073 int t_hlen = tunnel->hlen + sizeof(struct iphdr);
1074
1062 dev->hard_header_len = tdev->hard_header_len + sizeof(struct iphdr); 1075 dev->hard_header_len = tdev->hard_header_len + sizeof(struct iphdr);
1063 dev->mtu = tdev->mtu - sizeof(struct iphdr); 1076 dev->mtu = tdev->mtu - t_hlen;
1064 if (dev->mtu < IPV6_MIN_MTU) 1077 if (dev->mtu < IPV6_MIN_MTU)
1065 dev->mtu = IPV6_MIN_MTU; 1078 dev->mtu = IPV6_MIN_MTU;
1066 } 1079 }
@@ -1123,7 +1136,7 @@ static int ipip6_tunnel_update_6rd(struct ip_tunnel *t,
1123#endif 1136#endif
1124 1137
1125static int 1138static int
1126ipip6_tunnel_ioctl (struct net_device *dev, struct ifreq *ifr, int cmd) 1139ipip6_tunnel_ioctl(struct net_device *dev, struct ifreq *ifr, int cmd)
1127{ 1140{
1128 int err = 0; 1141 int err = 0;
1129 struct ip_tunnel_parm p; 1142 struct ip_tunnel_parm p;
@@ -1307,7 +1320,10 @@ done:
1307 1320
1308static int ipip6_tunnel_change_mtu(struct net_device *dev, int new_mtu) 1321static int ipip6_tunnel_change_mtu(struct net_device *dev, int new_mtu)
1309{ 1322{
1310 if (new_mtu < IPV6_MIN_MTU || new_mtu > 0xFFF8 - sizeof(struct iphdr)) 1323 struct ip_tunnel *tunnel = netdev_priv(dev);
1324 int t_hlen = tunnel->hlen + sizeof(struct iphdr);
1325
1326 if (new_mtu < IPV6_MIN_MTU || new_mtu > 0xFFF8 - t_hlen)
1311 return -EINVAL; 1327 return -EINVAL;
1312 dev->mtu = new_mtu; 1328 dev->mtu = new_mtu;
1313 return 0; 1329 return 0;
@@ -1338,14 +1354,17 @@ static void ipip6_dev_free(struct net_device *dev)
1338 1354
1339static void ipip6_tunnel_setup(struct net_device *dev) 1355static void ipip6_tunnel_setup(struct net_device *dev)
1340{ 1356{
1357 struct ip_tunnel *tunnel = netdev_priv(dev);
1358 int t_hlen = tunnel->hlen + sizeof(struct iphdr);
1359
1341 dev->netdev_ops = &ipip6_netdev_ops; 1360 dev->netdev_ops = &ipip6_netdev_ops;
1342 dev->destructor = ipip6_dev_free; 1361 dev->destructor = ipip6_dev_free;
1343 1362
1344 dev->type = ARPHRD_SIT; 1363 dev->type = ARPHRD_SIT;
1345 dev->hard_header_len = LL_MAX_HEADER + sizeof(struct iphdr); 1364 dev->hard_header_len = LL_MAX_HEADER + t_hlen;
1346 dev->mtu = ETH_DATA_LEN - sizeof(struct iphdr); 1365 dev->mtu = ETH_DATA_LEN - t_hlen;
1347 dev->flags = IFF_NOARP; 1366 dev->flags = IFF_NOARP;
1348 dev->priv_flags &= ~IFF_XMIT_DST_RELEASE; 1367 netif_keep_dst(dev);
1349 dev->iflink = 0; 1368 dev->iflink = 0;
1350 dev->addr_len = 4; 1369 dev->addr_len = 4;
1351 dev->features |= NETIF_F_LLTX; 1370 dev->features |= NETIF_F_LLTX;
@@ -1466,6 +1485,40 @@ static void ipip6_netlink_parms(struct nlattr *data[],
1466 1485
1467} 1486}
1468 1487
1488/* This function returns true when ENCAP attributes are present in the nl msg */
1489static bool ipip6_netlink_encap_parms(struct nlattr *data[],
1490 struct ip_tunnel_encap *ipencap)
1491{
1492 bool ret = false;
1493
1494 memset(ipencap, 0, sizeof(*ipencap));
1495
1496 if (!data)
1497 return ret;
1498
1499 if (data[IFLA_IPTUN_ENCAP_TYPE]) {
1500 ret = true;
1501 ipencap->type = nla_get_u16(data[IFLA_IPTUN_ENCAP_TYPE]);
1502 }
1503
1504 if (data[IFLA_IPTUN_ENCAP_FLAGS]) {
1505 ret = true;
1506 ipencap->flags = nla_get_u16(data[IFLA_IPTUN_ENCAP_FLAGS]);
1507 }
1508
1509 if (data[IFLA_IPTUN_ENCAP_SPORT]) {
1510 ret = true;
1511 ipencap->sport = nla_get_u16(data[IFLA_IPTUN_ENCAP_SPORT]);
1512 }
1513
1514 if (data[IFLA_IPTUN_ENCAP_DPORT]) {
1515 ret = true;
1516 ipencap->dport = nla_get_u16(data[IFLA_IPTUN_ENCAP_DPORT]);
1517 }
1518
1519 return ret;
1520}
1521
1469#ifdef CONFIG_IPV6_SIT_6RD 1522#ifdef CONFIG_IPV6_SIT_6RD
1470/* This function returns true when 6RD attributes are present in the nl msg */ 1523/* This function returns true when 6RD attributes are present in the nl msg */
1471static bool ipip6_netlink_6rd_parms(struct nlattr *data[], 1524static bool ipip6_netlink_6rd_parms(struct nlattr *data[],
@@ -1509,12 +1562,20 @@ static int ipip6_newlink(struct net *src_net, struct net_device *dev,
1509{ 1562{
1510 struct net *net = dev_net(dev); 1563 struct net *net = dev_net(dev);
1511 struct ip_tunnel *nt; 1564 struct ip_tunnel *nt;
1565 struct ip_tunnel_encap ipencap;
1512#ifdef CONFIG_IPV6_SIT_6RD 1566#ifdef CONFIG_IPV6_SIT_6RD
1513 struct ip_tunnel_6rd ip6rd; 1567 struct ip_tunnel_6rd ip6rd;
1514#endif 1568#endif
1515 int err; 1569 int err;
1516 1570
1517 nt = netdev_priv(dev); 1571 nt = netdev_priv(dev);
1572
1573 if (ipip6_netlink_encap_parms(data, &ipencap)) {
1574 err = ip_tunnel_encap_setup(nt, &ipencap);
1575 if (err < 0)
1576 return err;
1577 }
1578
1518 ipip6_netlink_parms(data, &nt->parms); 1579 ipip6_netlink_parms(data, &nt->parms);
1519 1580
1520 if (ipip6_tunnel_locate(net, &nt->parms, 0)) 1581 if (ipip6_tunnel_locate(net, &nt->parms, 0))
@@ -1537,15 +1598,23 @@ static int ipip6_changelink(struct net_device *dev, struct nlattr *tb[],
1537{ 1598{
1538 struct ip_tunnel *t = netdev_priv(dev); 1599 struct ip_tunnel *t = netdev_priv(dev);
1539 struct ip_tunnel_parm p; 1600 struct ip_tunnel_parm p;
1601 struct ip_tunnel_encap ipencap;
1540 struct net *net = t->net; 1602 struct net *net = t->net;
1541 struct sit_net *sitn = net_generic(net, sit_net_id); 1603 struct sit_net *sitn = net_generic(net, sit_net_id);
1542#ifdef CONFIG_IPV6_SIT_6RD 1604#ifdef CONFIG_IPV6_SIT_6RD
1543 struct ip_tunnel_6rd ip6rd; 1605 struct ip_tunnel_6rd ip6rd;
1544#endif 1606#endif
1607 int err;
1545 1608
1546 if (dev == sitn->fb_tunnel_dev) 1609 if (dev == sitn->fb_tunnel_dev)
1547 return -EINVAL; 1610 return -EINVAL;
1548 1611
1612 if (ipip6_netlink_encap_parms(data, &ipencap)) {
1613 err = ip_tunnel_encap_setup(t, &ipencap);
1614 if (err < 0)
1615 return err;
1616 }
1617
1549 ipip6_netlink_parms(data, &p); 1618 ipip6_netlink_parms(data, &p);
1550 1619
1551 if (((dev->flags & IFF_POINTOPOINT) && !p.iph.daddr) || 1620 if (((dev->flags & IFF_POINTOPOINT) && !p.iph.daddr) ||
@@ -1599,6 +1668,14 @@ static size_t ipip6_get_size(const struct net_device *dev)
1599 /* IFLA_IPTUN_6RD_RELAY_PREFIXLEN */ 1668 /* IFLA_IPTUN_6RD_RELAY_PREFIXLEN */
1600 nla_total_size(2) + 1669 nla_total_size(2) +
1601#endif 1670#endif
1671 /* IFLA_IPTUN_ENCAP_TYPE */
1672 nla_total_size(2) +
1673 /* IFLA_IPTUN_ENCAP_FLAGS */
1674 nla_total_size(2) +
1675 /* IFLA_IPTUN_ENCAP_SPORT */
1676 nla_total_size(2) +
1677 /* IFLA_IPTUN_ENCAP_DPORT */
1678 nla_total_size(2) +
1602 0; 1679 0;
1603} 1680}
1604 1681
@@ -1630,6 +1707,16 @@ static int ipip6_fill_info(struct sk_buff *skb, const struct net_device *dev)
1630 goto nla_put_failure; 1707 goto nla_put_failure;
1631#endif 1708#endif
1632 1709
1710 if (nla_put_u16(skb, IFLA_IPTUN_ENCAP_TYPE,
1711 tunnel->encap.type) ||
1712 nla_put_u16(skb, IFLA_IPTUN_ENCAP_SPORT,
1713 tunnel->encap.sport) ||
1714 nla_put_u16(skb, IFLA_IPTUN_ENCAP_DPORT,
1715 tunnel->encap.dport) ||
1716 nla_put_u16(skb, IFLA_IPTUN_ENCAP_FLAGS,
 1717 tunnel->encap.flags))
1718 goto nla_put_failure;
1719
1633 return 0; 1720 return 0;
1634 1721
1635nla_put_failure: 1722nla_put_failure:
@@ -1651,6 +1738,10 @@ static const struct nla_policy ipip6_policy[IFLA_IPTUN_MAX + 1] = {
1651 [IFLA_IPTUN_6RD_PREFIXLEN] = { .type = NLA_U16 }, 1738 [IFLA_IPTUN_6RD_PREFIXLEN] = { .type = NLA_U16 },
1652 [IFLA_IPTUN_6RD_RELAY_PREFIXLEN] = { .type = NLA_U16 }, 1739 [IFLA_IPTUN_6RD_RELAY_PREFIXLEN] = { .type = NLA_U16 },
1653#endif 1740#endif
1741 [IFLA_IPTUN_ENCAP_TYPE] = { .type = NLA_U16 },
1742 [IFLA_IPTUN_ENCAP_FLAGS] = { .type = NLA_U16 },
1743 [IFLA_IPTUN_ENCAP_SPORT] = { .type = NLA_U16 },
1744 [IFLA_IPTUN_ENCAP_DPORT] = { .type = NLA_U16 },
1654}; 1745};
1655 1746
1656static void ipip6_dellink(struct net_device *dev, struct list_head *head) 1747static void ipip6_dellink(struct net_device *dev, struct list_head *head)
diff --git a/net/ipv6/syncookies.c b/net/ipv6/syncookies.c
index 83cea1d39466..9a2838e93cc5 100644
--- a/net/ipv6/syncookies.c
+++ b/net/ipv6/syncookies.c
@@ -24,7 +24,7 @@
24#define COOKIEBITS 24 /* Upper bits store count */ 24#define COOKIEBITS 24 /* Upper bits store count */
25#define COOKIEMASK (((__u32)1 << COOKIEBITS) - 1) 25#define COOKIEMASK (((__u32)1 << COOKIEBITS) - 1)
26 26
27static u32 syncookie6_secret[2][16-4+SHA_DIGEST_WORDS]; 27static u32 syncookie6_secret[2][16-4+SHA_DIGEST_WORDS] __read_mostly;
28 28
29/* RFC 2460, Section 8.3: 29/* RFC 2460, Section 8.3:
30 * [ipv6 tcp] MSS must be computed as the maximum packet size minus 60 [..] 30 * [ipv6 tcp] MSS must be computed as the maximum packet size minus 60 [..]
@@ -203,7 +203,7 @@ struct sock *cookie_v6_check(struct sock *sk, struct sk_buff *skb)
203 ireq->ir_num = ntohs(th->dest); 203 ireq->ir_num = ntohs(th->dest);
204 ireq->ir_v6_rmt_addr = ipv6_hdr(skb)->saddr; 204 ireq->ir_v6_rmt_addr = ipv6_hdr(skb)->saddr;
205 ireq->ir_v6_loc_addr = ipv6_hdr(skb)->daddr; 205 ireq->ir_v6_loc_addr = ipv6_hdr(skb)->daddr;
206 if (ipv6_opt_accepted(sk, skb) || 206 if (ipv6_opt_accepted(sk, skb, &TCP_SKB_CB(skb)->header.h6) ||
207 np->rxopt.bits.rxinfo || np->rxopt.bits.rxoinfo || 207 np->rxopt.bits.rxinfo || np->rxopt.bits.rxoinfo ||
208 np->rxopt.bits.rxhlim || np->rxopt.bits.rxohlim) { 208 np->rxopt.bits.rxhlim || np->rxopt.bits.rxohlim) {
209 atomic_inc(&skb->users); 209 atomic_inc(&skb->users);
diff --git a/net/ipv6/sysctl_net_ipv6.c b/net/ipv6/sysctl_net_ipv6.c
index 0c56c93619e0..c5c10fafcfe2 100644
--- a/net/ipv6/sysctl_net_ipv6.c
+++ b/net/ipv6/sysctl_net_ipv6.c
@@ -16,6 +16,8 @@
16#include <net/addrconf.h> 16#include <net/addrconf.h>
17#include <net/inet_frag.h> 17#include <net/inet_frag.h>
18 18
19static int one = 1;
20
19static struct ctl_table ipv6_table_template[] = { 21static struct ctl_table ipv6_table_template[] = {
20 { 22 {
21 .procname = "bindv6only", 23 .procname = "bindv6only",
@@ -63,6 +65,14 @@ static struct ctl_table ipv6_rotable[] = {
63 .mode = 0644, 65 .mode = 0644,
64 .proc_handler = proc_dointvec 66 .proc_handler = proc_dointvec
65 }, 67 },
68 {
69 .procname = "mld_qrv",
70 .data = &sysctl_mld_qrv,
71 .maxlen = sizeof(int),
72 .mode = 0644,
73 .proc_handler = proc_dointvec_minmax,
74 .extra1 = &one
75 },
66 { } 76 { }
67}; 77};
68 78
diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index 03a5d1ed3340..cf2e45ab2fa4 100644
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -92,13 +92,16 @@ static struct tcp_md5sig_key *tcp_v6_md5_do_lookup(struct sock *sk,
92static void inet6_sk_rx_dst_set(struct sock *sk, const struct sk_buff *skb) 92static void inet6_sk_rx_dst_set(struct sock *sk, const struct sk_buff *skb)
93{ 93{
94 struct dst_entry *dst = skb_dst(skb); 94 struct dst_entry *dst = skb_dst(skb);
95 const struct rt6_info *rt = (const struct rt6_info *)dst;
96 95
97 dst_hold(dst); 96 if (dst) {
98 sk->sk_rx_dst = dst; 97 const struct rt6_info *rt = (const struct rt6_info *)dst;
99 inet_sk(sk)->rx_dst_ifindex = skb->skb_iif; 98
100 if (rt->rt6i_node) 99 dst_hold(dst);
101 inet6_sk(sk)->rx_dst_cookie = rt->rt6i_node->fn_sernum; 100 sk->sk_rx_dst = dst;
101 inet_sk(sk)->rx_dst_ifindex = skb->skb_iif;
102 if (rt->rt6i_node)
103 inet6_sk(sk)->rx_dst_cookie = rt->rt6i_node->fn_sernum;
104 }
102} 105}
103 106
104static void tcp_v6_hash(struct sock *sk) 107static void tcp_v6_hash(struct sock *sk)
@@ -737,8 +740,9 @@ static void tcp_v6_init_req(struct request_sock *req, struct sock *sk,
737 ipv6_addr_type(&ireq->ir_v6_rmt_addr) & IPV6_ADDR_LINKLOCAL) 740 ipv6_addr_type(&ireq->ir_v6_rmt_addr) & IPV6_ADDR_LINKLOCAL)
738 ireq->ir_iif = inet6_iif(skb); 741 ireq->ir_iif = inet6_iif(skb);
739 742
740 if (!TCP_SKB_CB(skb)->when && 743 if (!TCP_SKB_CB(skb)->tcp_tw_isn &&
741 (ipv6_opt_accepted(sk, skb) || np->rxopt.bits.rxinfo || 744 (ipv6_opt_accepted(sk, skb, &TCP_SKB_CB(skb)->header.h6) ||
745 np->rxopt.bits.rxinfo ||
742 np->rxopt.bits.rxoinfo || np->rxopt.bits.rxhlim || 746 np->rxopt.bits.rxoinfo || np->rxopt.bits.rxhlim ||
743 np->rxopt.bits.rxohlim || np->repflow)) { 747 np->rxopt.bits.rxohlim || np->repflow)) {
744 atomic_inc(&skb->users); 748 atomic_inc(&skb->users);
@@ -1363,7 +1367,7 @@ ipv6_pktoptions:
1363 np->rcv_flowinfo = ip6_flowinfo(ipv6_hdr(opt_skb)); 1367 np->rcv_flowinfo = ip6_flowinfo(ipv6_hdr(opt_skb));
1364 if (np->repflow) 1368 if (np->repflow)
1365 np->flow_label = ip6_flowlabel(ipv6_hdr(opt_skb)); 1369 np->flow_label = ip6_flowlabel(ipv6_hdr(opt_skb));
1366 if (ipv6_opt_accepted(sk, opt_skb)) { 1370 if (ipv6_opt_accepted(sk, opt_skb, &TCP_SKB_CB(opt_skb)->header.h6)) {
1367 skb_set_owner_r(opt_skb, sk); 1371 skb_set_owner_r(opt_skb, sk);
1368 opt_skb = xchg(&np->pktoptions, opt_skb); 1372 opt_skb = xchg(&np->pktoptions, opt_skb);
1369 } else { 1373 } else {
@@ -1407,11 +1411,19 @@ static int tcp_v6_rcv(struct sk_buff *skb)
1407 1411
1408 th = tcp_hdr(skb); 1412 th = tcp_hdr(skb);
1409 hdr = ipv6_hdr(skb); 1413 hdr = ipv6_hdr(skb);
1414 /* This is tricky : We move IPCB at its correct location into TCP_SKB_CB()
1415 * barrier() makes sure compiler wont play fool^Waliasing games.
1416 */
1417 memmove(&TCP_SKB_CB(skb)->header.h6, IP6CB(skb),
1418 sizeof(struct inet6_skb_parm));
1419 barrier();
1420
1410 TCP_SKB_CB(skb)->seq = ntohl(th->seq); 1421 TCP_SKB_CB(skb)->seq = ntohl(th->seq);
1411 TCP_SKB_CB(skb)->end_seq = (TCP_SKB_CB(skb)->seq + th->syn + th->fin + 1422 TCP_SKB_CB(skb)->end_seq = (TCP_SKB_CB(skb)->seq + th->syn + th->fin +
1412 skb->len - th->doff*4); 1423 skb->len - th->doff*4);
1413 TCP_SKB_CB(skb)->ack_seq = ntohl(th->ack_seq); 1424 TCP_SKB_CB(skb)->ack_seq = ntohl(th->ack_seq);
1414 TCP_SKB_CB(skb)->when = 0; 1425 TCP_SKB_CB(skb)->tcp_flags = tcp_flag_byte(th);
1426 TCP_SKB_CB(skb)->tcp_tw_isn = 0;
1415 TCP_SKB_CB(skb)->ip_dsfield = ipv6_get_dsfield(hdr); 1427 TCP_SKB_CB(skb)->ip_dsfield = ipv6_get_dsfield(hdr);
1416 TCP_SKB_CB(skb)->sacked = 0; 1428 TCP_SKB_CB(skb)->sacked = 0;
1417 1429
diff --git a/net/ipv6/tcpv6_offload.c b/net/ipv6/tcpv6_offload.c
index 01b0ff9a0c2c..c1ab77105b4c 100644
--- a/net/ipv6/tcpv6_offload.c
+++ b/net/ipv6/tcpv6_offload.c
@@ -15,54 +15,17 @@
15#include <net/ip6_checksum.h> 15#include <net/ip6_checksum.h>
16#include "ip6_offload.h" 16#include "ip6_offload.h"
17 17
18static int tcp_v6_gso_send_check(struct sk_buff *skb)
19{
20 const struct ipv6hdr *ipv6h;
21 struct tcphdr *th;
22
23 if (!pskb_may_pull(skb, sizeof(*th)))
24 return -EINVAL;
25
26 ipv6h = ipv6_hdr(skb);
27 th = tcp_hdr(skb);
28
29 th->check = 0;
30 skb->ip_summed = CHECKSUM_PARTIAL;
31 __tcp_v6_send_check(skb, &ipv6h->saddr, &ipv6h->daddr);
32 return 0;
33}
34
35static struct sk_buff **tcp6_gro_receive(struct sk_buff **head, 18static struct sk_buff **tcp6_gro_receive(struct sk_buff **head,
36 struct sk_buff *skb) 19 struct sk_buff *skb)
37{ 20{
38 const struct ipv6hdr *iph = skb_gro_network_header(skb);
39 __wsum wsum;
40
41 /* Don't bother verifying checksum if we're going to flush anyway. */ 21 /* Don't bother verifying checksum if we're going to flush anyway. */
42 if (NAPI_GRO_CB(skb)->flush) 22 if (!NAPI_GRO_CB(skb)->flush &&
43 goto skip_csum; 23 skb_gro_checksum_validate(skb, IPPROTO_TCP,
44 24 ip6_gro_compute_pseudo)) {
45 wsum = NAPI_GRO_CB(skb)->csum;
46
47 switch (skb->ip_summed) {
48 case CHECKSUM_NONE:
49 wsum = skb_checksum(skb, skb_gro_offset(skb), skb_gro_len(skb),
50 wsum);
51
52 /* fall through */
53
54 case CHECKSUM_COMPLETE:
55 if (!tcp_v6_check(skb_gro_len(skb), &iph->saddr, &iph->daddr,
56 wsum)) {
57 skb->ip_summed = CHECKSUM_UNNECESSARY;
58 break;
59 }
60
61 NAPI_GRO_CB(skb)->flush = 1; 25 NAPI_GRO_CB(skb)->flush = 1;
62 return NULL; 26 return NULL;
63 } 27 }
64 28
65skip_csum:
66 return tcp_gro_receive(head, skb); 29 return tcp_gro_receive(head, skb);
67} 30}
68 31
@@ -78,10 +41,32 @@ static int tcp6_gro_complete(struct sk_buff *skb, int thoff)
78 return tcp_gro_complete(skb); 41 return tcp_gro_complete(skb);
79} 42}
80 43
44struct sk_buff *tcp6_gso_segment(struct sk_buff *skb,
45 netdev_features_t features)
46{
47 struct tcphdr *th;
48
49 if (!pskb_may_pull(skb, sizeof(*th)))
50 return ERR_PTR(-EINVAL);
51
52 if (unlikely(skb->ip_summed != CHECKSUM_PARTIAL)) {
53 const struct ipv6hdr *ipv6h = ipv6_hdr(skb);
54 struct tcphdr *th = tcp_hdr(skb);
55
56 /* Set up pseudo header, usually expect stack to have done
57 * this.
58 */
59
60 th->check = 0;
61 skb->ip_summed = CHECKSUM_PARTIAL;
62 __tcp_v6_send_check(skb, &ipv6h->saddr, &ipv6h->daddr);
63 }
64
65 return tcp_gso_segment(skb, features);
66}
81static const struct net_offload tcpv6_offload = { 67static const struct net_offload tcpv6_offload = {
82 .callbacks = { 68 .callbacks = {
83 .gso_send_check = tcp_v6_gso_send_check, 69 .gso_segment = tcp6_gso_segment,
84 .gso_segment = tcp_gso_segment,
85 .gro_receive = tcp6_gro_receive, 70 .gro_receive = tcp6_gro_receive,
86 .gro_complete = tcp6_gro_complete, 71 .gro_complete = tcp6_gro_complete,
87 }, 72 },
diff --git a/net/ipv6/tunnel6.c b/net/ipv6/tunnel6.c
index 2c4e4c5c7614..3c758007b327 100644
--- a/net/ipv6/tunnel6.c
+++ b/net/ipv6/tunnel6.c
@@ -15,7 +15,7 @@
15 * along with this program; if not, see <http://www.gnu.org/licenses/>. 15 * along with this program; if not, see <http://www.gnu.org/licenses/>.
16 * 16 *
17 * Authors Mitsuru KANDA <mk@linux-ipv6.org> 17 * Authors Mitsuru KANDA <mk@linux-ipv6.org>
18 * YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> 18 * YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
19 */ 19 */
20 20
21#define pr_fmt(fmt) "IPv6: " fmt 21#define pr_fmt(fmt) "IPv6: " fmt
@@ -64,7 +64,6 @@ err:
64 64
65 return ret; 65 return ret;
66} 66}
67
68EXPORT_SYMBOL(xfrm6_tunnel_register); 67EXPORT_SYMBOL(xfrm6_tunnel_register);
69 68
70int xfrm6_tunnel_deregister(struct xfrm6_tunnel *handler, unsigned short family) 69int xfrm6_tunnel_deregister(struct xfrm6_tunnel *handler, unsigned short family)
@@ -92,7 +91,6 @@ int xfrm6_tunnel_deregister(struct xfrm6_tunnel *handler, unsigned short family)
92 91
93 return ret; 92 return ret;
94} 93}
95
96EXPORT_SYMBOL(xfrm6_tunnel_deregister); 94EXPORT_SYMBOL(xfrm6_tunnel_deregister);
97 95
98#define for_each_tunnel_rcu(head, handler) \ 96#define for_each_tunnel_rcu(head, handler) \
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index 4836af8f582d..f6ba535b6feb 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -243,7 +243,7 @@ begin:
243 goto exact_match; 243 goto exact_match;
244 } else if (score == badness && reuseport) { 244 } else if (score == badness && reuseport) {
245 matches++; 245 matches++;
246 if (((u64)hash * matches) >> 32 == 0) 246 if (reciprocal_scale(hash, matches) == 0)
247 result = sk; 247 result = sk;
248 hash = next_pseudo_random32(hash); 248 hash = next_pseudo_random32(hash);
249 } 249 }
@@ -323,7 +323,7 @@ begin:
323 } 323 }
324 } else if (score == badness && reuseport) { 324 } else if (score == badness && reuseport) {
325 matches++; 325 matches++;
326 if (((u64)hash * matches) >> 32 == 0) 326 if (reciprocal_scale(hash, matches) == 0)
327 result = sk; 327 result = sk;
328 hash = next_pseudo_random32(hash); 328 hash = next_pseudo_random32(hash);
329 } 329 }
@@ -373,8 +373,8 @@ EXPORT_SYMBOL_GPL(udp6_lib_lookup);
373 373
374 374
375/* 375/*
376 * This should be easy, if there is something there we 376 * This should be easy, if there is something there we
377 * return it, otherwise we block. 377 * return it, otherwise we block.
378 */ 378 */
379 379
380int udpv6_recvmsg(struct kiocb *iocb, struct sock *sk, 380int udpv6_recvmsg(struct kiocb *iocb, struct sock *sk,
@@ -530,7 +530,7 @@ void __udp6_lib_err(struct sk_buff *skb, struct inet6_skb_parm *opt,
530 const struct ipv6hdr *hdr = (const struct ipv6hdr *)skb->data; 530 const struct ipv6hdr *hdr = (const struct ipv6hdr *)skb->data;
531 const struct in6_addr *saddr = &hdr->saddr; 531 const struct in6_addr *saddr = &hdr->saddr;
532 const struct in6_addr *daddr = &hdr->daddr; 532 const struct in6_addr *daddr = &hdr->daddr;
533 struct udphdr *uh = (struct udphdr*)(skb->data+offset); 533 struct udphdr *uh = (struct udphdr *)(skb->data+offset);
534 struct sock *sk; 534 struct sock *sk;
535 int err; 535 int err;
536 struct net *net = dev_net(skb->dev); 536 struct net *net = dev_net(skb->dev);
@@ -596,7 +596,7 @@ static int __udpv6_queue_rcv_skb(struct sock *sk, struct sk_buff *skb)
596 596
597static __inline__ void udpv6_err(struct sk_buff *skb, 597static __inline__ void udpv6_err(struct sk_buff *skb,
598 struct inet6_skb_parm *opt, u8 type, 598 struct inet6_skb_parm *opt, u8 type,
599 u8 code, int offset, __be32 info ) 599 u8 code, int offset, __be32 info)
600{ 600{
601 __udp6_lib_err(skb, opt, type, code, offset, info, &udp_table); 601 __udp6_lib_err(skb, opt, type, code, offset, info, &udp_table);
602} 602}
@@ -891,6 +891,10 @@ int __udp6_lib_rcv(struct sk_buff *skb, struct udp_table *udptable,
891 goto csum_error; 891 goto csum_error;
892 } 892 }
893 893
894 if (udp_sk(sk)->convert_csum && uh->check && !IS_UDPLITE(sk))
895 skb_checksum_try_convert(skb, IPPROTO_UDP, uh->check,
896 ip6_compute_pseudo);
897
894 ret = udpv6_queue_rcv_skb(sk, skb); 898 ret = udpv6_queue_rcv_skb(sk, skb);
895 sock_put(sk); 899 sock_put(sk);
896 900
@@ -960,10 +964,10 @@ static void udp_v6_flush_pending_frames(struct sock *sk)
960} 964}
961 965
962/** 966/**
963 * udp6_hwcsum_outgoing - handle outgoing HW checksumming 967 * udp6_hwcsum_outgoing - handle outgoing HW checksumming
964 * @sk: socket we are sending on 968 * @sk: socket we are sending on
965 * @skb: sk_buff containing the filled-in UDP header 969 * @skb: sk_buff containing the filled-in UDP header
966 * (checksum field must be zeroed out) 970 * (checksum field must be zeroed out)
967 */ 971 */
968static void udp6_hwcsum_outgoing(struct sock *sk, struct sk_buff *skb, 972static void udp6_hwcsum_outgoing(struct sock *sk, struct sk_buff *skb,
969 const struct in6_addr *saddr, 973 const struct in6_addr *saddr,
@@ -1294,7 +1298,7 @@ do_append_data:
1294 getfrag = is_udplite ? udplite_getfrag : ip_generic_getfrag; 1298 getfrag = is_udplite ? udplite_getfrag : ip_generic_getfrag;
1295 err = ip6_append_data(sk, getfrag, msg->msg_iov, ulen, 1299 err = ip6_append_data(sk, getfrag, msg->msg_iov, ulen,
1296 sizeof(struct udphdr), hlimit, tclass, opt, &fl6, 1300 sizeof(struct udphdr), hlimit, tclass, opt, &fl6,
1297 (struct rt6_info*)dst, 1301 (struct rt6_info *)dst,
1298 corkreq ? msg->msg_flags|MSG_MORE : msg->msg_flags, dontfrag); 1302 corkreq ? msg->msg_flags|MSG_MORE : msg->msg_flags, dontfrag);
1299 if (err) 1303 if (err)
1300 udp_v6_flush_pending_frames(sk); 1304 udp_v6_flush_pending_frames(sk);
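
Editorial note: both udp.c lookup hunks replace the open-coded ((u64)hash * matches) >> 32 with reciprocal_scale(). From memory, the helper in include/linux/kernel.h is essentially that same multiply-and-shift, mapping a uniform 32-bit value into [0, ep_ro) without a division; checking the result against 0 therefore selects the current reuseport socket with probability 1/matches. A standalone sketch (the definition below is reproduced from memory, not taken from this diff):

/*
 * reciprocal_scale(): map a uniformly distributed 32-bit value into
 * [0, ep_ro) using one multiply and a shift instead of a modulo.
 */
#include <stdint.h>
#include <stdio.h>

static inline uint32_t reciprocal_scale(uint32_t val, uint32_t ep_ro)
{
	return (uint32_t)(((uint64_t)val * ep_ro) >> 32);
}

int main(void)
{
	uint32_t hash = 0x9e3779b9;     /* any 32-bit hash value */

	for (uint32_t matches = 1; matches <= 4; matches++)
		printf("hash scaled into [0,%u) -> %u\n",
		       matches, reciprocal_scale(hash, matches));
	return 0;
}
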
diff --git a/net/ipv6/udp_offload.c b/net/ipv6/udp_offload.c
index 0ae3d98f83e0..6b8f543f6ac6 100644
--- a/net/ipv6/udp_offload.c
+++ b/net/ipv6/udp_offload.c
@@ -10,34 +10,13 @@
10 * UDPv6 GSO support 10 * UDPv6 GSO support
11 */ 11 */
12#include <linux/skbuff.h> 12#include <linux/skbuff.h>
13#include <linux/netdevice.h>
13#include <net/protocol.h> 14#include <net/protocol.h>
14#include <net/ipv6.h> 15#include <net/ipv6.h>
15#include <net/udp.h> 16#include <net/udp.h>
16#include <net/ip6_checksum.h> 17#include <net/ip6_checksum.h>
17#include "ip6_offload.h" 18#include "ip6_offload.h"
18 19
19static int udp6_ufo_send_check(struct sk_buff *skb)
20{
21 const struct ipv6hdr *ipv6h;
22 struct udphdr *uh;
23
24 if (!pskb_may_pull(skb, sizeof(*uh)))
25 return -EINVAL;
26
27 if (likely(!skb->encapsulation)) {
28 ipv6h = ipv6_hdr(skb);
29 uh = udp_hdr(skb);
30
31 uh->check = ~csum_ipv6_magic(&ipv6h->saddr, &ipv6h->daddr, skb->len,
32 IPPROTO_UDP, 0);
33 skb->csum_start = skb_transport_header(skb) - skb->head;
34 skb->csum_offset = offsetof(struct udphdr, check);
35 skb->ip_summed = CHECKSUM_PARTIAL;
36 }
37
38 return 0;
39}
40
41static struct sk_buff *udp6_ufo_fragment(struct sk_buff *skb, 20static struct sk_buff *udp6_ufo_fragment(struct sk_buff *skb,
42 netdev_features_t features) 21 netdev_features_t features)
43{ 22{
@@ -48,7 +27,6 @@ static struct sk_buff *udp6_ufo_fragment(struct sk_buff *skb,
48 u8 *packet_start, *prevhdr; 27 u8 *packet_start, *prevhdr;
49 u8 nexthdr; 28 u8 nexthdr;
50 u8 frag_hdr_sz = sizeof(struct frag_hdr); 29 u8 frag_hdr_sz = sizeof(struct frag_hdr);
51 int offset;
52 __wsum csum; 30 __wsum csum;
53 int tnl_hlen; 31 int tnl_hlen;
54 32
@@ -80,15 +58,29 @@ static struct sk_buff *udp6_ufo_fragment(struct sk_buff *skb,
80 58
81 if (skb->encapsulation && skb_shinfo(skb)->gso_type & 59 if (skb->encapsulation && skb_shinfo(skb)->gso_type &
82 (SKB_GSO_UDP_TUNNEL|SKB_GSO_UDP_TUNNEL_CSUM)) 60 (SKB_GSO_UDP_TUNNEL|SKB_GSO_UDP_TUNNEL_CSUM))
83 segs = skb_udp_tunnel_segment(skb, features); 61 segs = skb_udp_tunnel_segment(skb, features, true);
84 else { 62 else {
63 const struct ipv6hdr *ipv6h;
64 struct udphdr *uh;
65
66 if (!pskb_may_pull(skb, sizeof(struct udphdr)))
67 goto out;
68
85 /* Do software UFO. Complete and fill in the UDP checksum as HW cannot 69 /* Do software UFO. Complete and fill in the UDP checksum as HW cannot
86 * do checksum of UDP packets sent as multiple IP fragments. 70 * do checksum of UDP packets sent as multiple IP fragments.
87 */ 71 */
88 offset = skb_checksum_start_offset(skb); 72
89 csum = skb_checksum(skb, offset, skb->len - offset, 0); 73 uh = udp_hdr(skb);
90 offset += skb->csum_offset; 74 ipv6h = ipv6_hdr(skb);
91 *(__sum16 *)(skb->data + offset) = csum_fold(csum); 75
76 uh->check = 0;
77 csum = skb_checksum(skb, 0, skb->len, 0);
78 uh->check = udp_v6_check(skb->len, &ipv6h->saddr,
79 &ipv6h->daddr, csum);
80
81 if (uh->check == 0)
82 uh->check = CSUM_MANGLED_0;
83
92 skb->ip_summed = CHECKSUM_NONE; 84 skb->ip_summed = CHECKSUM_NONE;
93 85
94 /* Check if there is enough headroom to insert fragment header. */ 86 /* Check if there is enough headroom to insert fragment header. */
@@ -127,10 +119,52 @@ static struct sk_buff *udp6_ufo_fragment(struct sk_buff *skb,
127out: 119out:
128 return segs; 120 return segs;
129} 121}
122
123static struct sk_buff **udp6_gro_receive(struct sk_buff **head,
124 struct sk_buff *skb)
125{
126 struct udphdr *uh = udp_gro_udphdr(skb);
127
128 if (unlikely(!uh))
129 goto flush;
130
131 /* Don't bother verifying checksum if we're going to flush anyway. */
132 if (NAPI_GRO_CB(skb)->flush)
133 goto skip;
134
135 if (skb_gro_checksum_validate_zero_check(skb, IPPROTO_UDP, uh->check,
136 ip6_gro_compute_pseudo))
137 goto flush;
138 else if (uh->check)
139 skb_gro_checksum_try_convert(skb, IPPROTO_UDP, uh->check,
140 ip6_gro_compute_pseudo);
141
142skip:
143 NAPI_GRO_CB(skb)->is_ipv6 = 1;
144 return udp_gro_receive(head, skb, uh);
145
146flush:
147 NAPI_GRO_CB(skb)->flush = 1;
148 return NULL;
149}
150
151static int udp6_gro_complete(struct sk_buff *skb, int nhoff)
152{
153 const struct ipv6hdr *ipv6h = ipv6_hdr(skb);
154 struct udphdr *uh = (struct udphdr *)(skb->data + nhoff);
155
156 if (uh->check)
157 uh->check = ~udp_v6_check(skb->len - nhoff, &ipv6h->saddr,
158 &ipv6h->daddr, 0);
159
160 return udp_gro_complete(skb, nhoff);
161}
162
130static const struct net_offload udpv6_offload = { 163static const struct net_offload udpv6_offload = {
131 .callbacks = { 164 .callbacks = {
132 .gso_send_check = udp6_ufo_send_check,
133 .gso_segment = udp6_ufo_fragment, 165 .gso_segment = udp6_ufo_fragment,
166 .gro_receive = udp6_gro_receive,
167 .gro_complete = udp6_gro_complete,
134 }, 168 },
135}; 169};
136 170
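
Editorial note: the udp6_ufo_fragment() hunk now finishes the UDP checksum entirely in software: zero uh->check, sum the whole datagram, fold it against the IPv6 pseudo-header via udp_v6_check(), and substitute CSUM_MANGLED_0 when the result happens to be zero (an all-zero check field means "no checksum" for UDP). The user-space sketch below walks through the same arithmetic; every helper name is local to the example and the sample datagram is made up.

/*
 * Software UDP-over-IPv6 checksum: ones'-complement sum of the RFC 2460
 * pseudo-header plus the UDP header and payload, folded to 16 bits, with
 * a zero result replaced by 0xffff (the kernel's CSUM_MANGLED_0 rule).
 */
#include <stdint.h>
#include <stdio.h>
#include <string.h>

static uint32_t csum_add(uint32_t sum, const uint8_t *p, size_t len)
{
	while (len > 1) {
		sum += (uint32_t)p[0] << 8 | p[1];   /* big-endian 16-bit words */
		p += 2;
		len -= 2;
	}
	if (len)                                     /* odd trailing byte */
		sum += (uint32_t)p[0] << 8;
	return sum;
}

static uint16_t csum_fold(uint32_t sum)
{
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)~sum;
}

/* saddr/daddr are 16-byte IPv6 addresses; udp is the UDP header + payload
 * with its checksum field already zeroed, as in the hunk above. */
static uint16_t udp6_checksum(const uint8_t saddr[16], const uint8_t daddr[16],
			      const uint8_t *udp, uint32_t udplen)
{
	uint8_t pseudo[40] = { 0 };
	uint16_t check;

	memcpy(pseudo, saddr, 16);
	memcpy(pseudo + 16, daddr, 16);
	pseudo[32] = udplen >> 24;                   /* upper-layer length */
	pseudo[33] = udplen >> 16;
	pseudo[34] = udplen >> 8;
	pseudo[35] = udplen;
	pseudo[39] = 17;                             /* next header = UDP */

	check = csum_fold(csum_add(csum_add(0, pseudo, 40), udp, udplen));
	return check ? check : 0xffff;               /* CSUM_MANGLED_0 rule */
}

int main(void)
{
	uint8_t saddr[16] = { [15] = 1 };            /* ::1 */
	uint8_t daddr[16] = { [15] = 1 };
	uint8_t dgram[16] = { 0x30, 0x39, 0x00, 0x35, 0x00, 0x10, 0, 0,
			      'p', 'a', 'y', 'l', 'o', 'a', 'd', '!' };

	printf("udp checksum: 0x%04x\n",
	       udp6_checksum(saddr, daddr, dgram, sizeof(dgram)));
	return 0;
}
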
diff --git a/net/ipv6/xfrm6_input.c b/net/ipv6/xfrm6_input.c
index f8c3cf842f53..f48fbe4d16f5 100644
--- a/net/ipv6/xfrm6_input.c
+++ b/net/ipv6/xfrm6_input.c
@@ -3,8 +3,8 @@
3 * 3 *
4 * Authors: 4 * Authors:
5 * Mitsuru KANDA @USAGI 5 * Mitsuru KANDA @USAGI
6 * Kazunori MIYAZAWA @USAGI 6 * Kazunori MIYAZAWA @USAGI
7 * Kunihiro Ishiguro <kunihiro@ipinfusion.com> 7 * Kunihiro Ishiguro <kunihiro@ipinfusion.com>
8 * YOSHIFUJI Hideaki @USAGI 8 * YOSHIFUJI Hideaki @USAGI
9 * IPv6 support 9 * IPv6 support
10 */ 10 */
@@ -52,7 +52,6 @@ int xfrm6_rcv(struct sk_buff *skb)
52 return xfrm6_rcv_spi(skb, skb_network_header(skb)[IP6CB(skb)->nhoff], 52 return xfrm6_rcv_spi(skb, skb_network_header(skb)[IP6CB(skb)->nhoff],
53 0); 53 0);
54} 54}
55
56EXPORT_SYMBOL(xfrm6_rcv); 55EXPORT_SYMBOL(xfrm6_rcv);
57 56
58int xfrm6_input_addr(struct sk_buff *skb, xfrm_address_t *daddr, 57int xfrm6_input_addr(struct sk_buff *skb, xfrm_address_t *daddr,
@@ -142,5 +141,4 @@ int xfrm6_input_addr(struct sk_buff *skb, xfrm_address_t *daddr,
142drop: 141drop:
143 return -1; 142 return -1;
144} 143}
145
146EXPORT_SYMBOL(xfrm6_input_addr); 144EXPORT_SYMBOL(xfrm6_input_addr);
diff --git a/net/ipv6/xfrm6_output.c b/net/ipv6/xfrm6_output.c
index 433672d07d0b..ca3f29b98ae5 100644
--- a/net/ipv6/xfrm6_output.c
+++ b/net/ipv6/xfrm6_output.c
@@ -25,7 +25,6 @@ int xfrm6_find_1stfragopt(struct xfrm_state *x, struct sk_buff *skb,
25{ 25{
26 return ip6_find_1stfragopt(skb, prevhdr); 26 return ip6_find_1stfragopt(skb, prevhdr);
27} 27}
28
29EXPORT_SYMBOL(xfrm6_find_1stfragopt); 28EXPORT_SYMBOL(xfrm6_find_1stfragopt);
30 29
31static int xfrm6_local_dontfrag(struct sk_buff *skb) 30static int xfrm6_local_dontfrag(struct sk_buff *skb)
diff --git a/net/ipv6/xfrm6_policy.c b/net/ipv6/xfrm6_policy.c
index 2a0bbda2c76a..ac49f84fe2c3 100644
--- a/net/ipv6/xfrm6_policy.c
+++ b/net/ipv6/xfrm6_policy.c
@@ -3,11 +3,11 @@
3 * 3 *
4 * Authors: 4 * Authors:
5 * Mitsuru KANDA @USAGI 5 * Mitsuru KANDA @USAGI
6 * Kazunori MIYAZAWA @USAGI 6 * Kazunori MIYAZAWA @USAGI
7 * Kunihiro Ishiguro <kunihiro@ipinfusion.com> 7 * Kunihiro Ishiguro <kunihiro@ipinfusion.com>
8 * IPv6 support 8 * IPv6 support
9 * YOSHIFUJI Hideaki 9 * YOSHIFUJI Hideaki
10 * Split up af-specific portion 10 * Split up af-specific portion
11 * 11 *
12 */ 12 */
13 13
@@ -84,7 +84,7 @@ static int xfrm6_init_path(struct xfrm_dst *path, struct dst_entry *dst,
84 int nfheader_len) 84 int nfheader_len)
85{ 85{
86 if (dst->ops->family == AF_INET6) { 86 if (dst->ops->family == AF_INET6) {
87 struct rt6_info *rt = (struct rt6_info*)dst; 87 struct rt6_info *rt = (struct rt6_info *)dst;
88 if (rt->rt6i_node) 88 if (rt->rt6i_node)
89 path->path_cookie = rt->rt6i_node->fn_sernum; 89 path->path_cookie = rt->rt6i_node->fn_sernum;
90 } 90 }
@@ -97,7 +97,7 @@ static int xfrm6_init_path(struct xfrm_dst *path, struct dst_entry *dst,
97static int xfrm6_fill_dst(struct xfrm_dst *xdst, struct net_device *dev, 97static int xfrm6_fill_dst(struct xfrm_dst *xdst, struct net_device *dev,
98 const struct flowi *fl) 98 const struct flowi *fl)
99{ 99{
100 struct rt6_info *rt = (struct rt6_info*)xdst->route; 100 struct rt6_info *rt = (struct rt6_info *)xdst->route;
101 101
102 xdst->u.dst.dev = dev; 102 xdst->u.dst.dev = dev;
103 dev_hold(dev); 103 dev_hold(dev);
@@ -296,7 +296,7 @@ static struct xfrm_policy_afinfo xfrm6_policy_afinfo = {
296 .family = AF_INET6, 296 .family = AF_INET6,
297 .dst_ops = &xfrm6_dst_ops, 297 .dst_ops = &xfrm6_dst_ops,
298 .dst_lookup = xfrm6_dst_lookup, 298 .dst_lookup = xfrm6_dst_lookup,
299 .get_saddr = xfrm6_get_saddr, 299 .get_saddr = xfrm6_get_saddr,
300 .decode_session = _decode_session6, 300 .decode_session = _decode_session6,
301 .get_tos = xfrm6_get_tos, 301 .get_tos = xfrm6_get_tos,
302 .init_dst = xfrm6_init_dst, 302 .init_dst = xfrm6_init_dst,
@@ -319,9 +319,9 @@ static void xfrm6_policy_fini(void)
319static struct ctl_table xfrm6_policy_table[] = { 319static struct ctl_table xfrm6_policy_table[] = {
320 { 320 {
321 .procname = "xfrm6_gc_thresh", 321 .procname = "xfrm6_gc_thresh",
322 .data = &init_net.xfrm.xfrm6_dst_ops.gc_thresh, 322 .data = &init_net.xfrm.xfrm6_dst_ops.gc_thresh,
323 .maxlen = sizeof(int), 323 .maxlen = sizeof(int),
324 .mode = 0644, 324 .mode = 0644,
325 .proc_handler = proc_dointvec, 325 .proc_handler = proc_dointvec,
326 }, 326 },
327 { } 327 { }
diff --git a/net/ipv6/xfrm6_state.c b/net/ipv6/xfrm6_state.c
index 3fc970135fc6..8a1f9c0d2a13 100644
--- a/net/ipv6/xfrm6_state.c
+++ b/net/ipv6/xfrm6_state.c
@@ -3,11 +3,11 @@
3 * 3 *
4 * Authors: 4 * Authors:
5 * Mitsuru KANDA @USAGI 5 * Mitsuru KANDA @USAGI
6 * Kazunori MIYAZAWA @USAGI 6 * Kazunori MIYAZAWA @USAGI
7 * Kunihiro Ishiguro <kunihiro@ipinfusion.com> 7 * Kunihiro Ishiguro <kunihiro@ipinfusion.com>
8 * IPv6 support 8 * IPv6 support
9 * YOSHIFUJI Hideaki @USAGI 9 * YOSHIFUJI Hideaki @USAGI
10 * Split up af-specific portion 10 * Split up af-specific portion
11 * 11 *
12 */ 12 */
13 13
@@ -45,10 +45,10 @@ xfrm6_init_temprop(struct xfrm_state *x, const struct xfrm_tmpl *tmpl,
45 const xfrm_address_t *daddr, const xfrm_address_t *saddr) 45 const xfrm_address_t *daddr, const xfrm_address_t *saddr)
46{ 46{
47 x->id = tmpl->id; 47 x->id = tmpl->id;
48 if (ipv6_addr_any((struct in6_addr*)&x->id.daddr)) 48 if (ipv6_addr_any((struct in6_addr *)&x->id.daddr))
49 memcpy(&x->id.daddr, daddr, sizeof(x->sel.daddr)); 49 memcpy(&x->id.daddr, daddr, sizeof(x->sel.daddr));
50 memcpy(&x->props.saddr, &tmpl->saddr, sizeof(x->props.saddr)); 50 memcpy(&x->props.saddr, &tmpl->saddr, sizeof(x->props.saddr));
51 if (ipv6_addr_any((struct in6_addr*)&x->props.saddr)) 51 if (ipv6_addr_any((struct in6_addr *)&x->props.saddr))
52 memcpy(&x->props.saddr, saddr, sizeof(x->props.saddr)); 52 memcpy(&x->props.saddr, saddr, sizeof(x->props.saddr));
53 x->props.mode = tmpl->mode; 53 x->props.mode = tmpl->mode;
54 x->props.reqid = tmpl->reqid; 54 x->props.reqid = tmpl->reqid;
diff --git a/net/ipv6/xfrm6_tunnel.c b/net/ipv6/xfrm6_tunnel.c
index 1c66465a42dd..5743044cd660 100644
--- a/net/ipv6/xfrm6_tunnel.c
+++ b/net/ipv6/xfrm6_tunnel.c
@@ -15,7 +15,7 @@
15 * along with this program; if not, see <http://www.gnu.org/licenses/>. 15 * along with this program; if not, see <http://www.gnu.org/licenses/>.
16 * 16 *
17 * Authors Mitsuru KANDA <mk@linux-ipv6.org> 17 * Authors Mitsuru KANDA <mk@linux-ipv6.org>
18 * YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> 18 * YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
19 * 19 *
20 * Based on net/ipv4/xfrm4_tunnel.c 20 * Based on net/ipv4/xfrm4_tunnel.c
21 * 21 *
@@ -110,7 +110,6 @@ __be32 xfrm6_tunnel_spi_lookup(struct net *net, const xfrm_address_t *saddr)
110 rcu_read_unlock_bh(); 110 rcu_read_unlock_bh();
111 return htonl(spi); 111 return htonl(spi);
112} 112}
113
114EXPORT_SYMBOL(xfrm6_tunnel_spi_lookup); 113EXPORT_SYMBOL(xfrm6_tunnel_spi_lookup);
115 114
116static int __xfrm6_tunnel_spi_check(struct net *net, u32 spi) 115static int __xfrm6_tunnel_spi_check(struct net *net, u32 spi)
@@ -187,7 +186,6 @@ __be32 xfrm6_tunnel_alloc_spi(struct net *net, xfrm_address_t *saddr)
187 186
188 return htonl(spi); 187 return htonl(spi);
189} 188}
190
191EXPORT_SYMBOL(xfrm6_tunnel_alloc_spi); 189EXPORT_SYMBOL(xfrm6_tunnel_alloc_spi);
192 190
193static void x6spi_destroy_rcu(struct rcu_head *head) 191static void x6spi_destroy_rcu(struct rcu_head *head)
diff --git a/net/irda/irlan/irlan_common.c b/net/irda/irlan/irlan_common.c
index 1bc49edf2296..5a2d0a695529 100644
--- a/net/irda/irlan/irlan_common.c
+++ b/net/irda/irlan/irlan_common.c
@@ -98,7 +98,7 @@ static const struct file_operations irlan_fops = {
98extern struct proc_dir_entry *proc_irda; 98extern struct proc_dir_entry *proc_irda;
99#endif /* CONFIG_PROC_FS */ 99#endif /* CONFIG_PROC_FS */
100 100
101static struct irlan_cb *irlan_open(__u32 saddr, __u32 daddr); 101static struct irlan_cb __init *irlan_open(__u32 saddr, __u32 daddr);
102static void __irlan_close(struct irlan_cb *self); 102static void __irlan_close(struct irlan_cb *self);
103static int __irlan_insert_param(struct sk_buff *skb, char *param, int type, 103static int __irlan_insert_param(struct sk_buff *skb, char *param, int type,
104 __u8 value_byte, __u16 value_short, 104 __u8 value_byte, __u16 value_short,
@@ -196,7 +196,7 @@ static void __exit irlan_cleanup(void)
196 * Open new instance of a client/provider, we should only register the 196 * Open new instance of a client/provider, we should only register the
197 * network device if this instance is ment for a particular client/provider 197 * network device if this instance is ment for a particular client/provider
198 */ 198 */
199static struct irlan_cb *irlan_open(__u32 saddr, __u32 daddr) 199static struct irlan_cb __init *irlan_open(__u32 saddr, __u32 daddr)
200{ 200{
201 struct net_device *dev; 201 struct net_device *dev;
202 struct irlan_cb *self; 202 struct irlan_cb *self;
diff --git a/net/iucv/iucv.c b/net/iucv/iucv.c
index da787930df0a..2a6a1fdd62c0 100644
--- a/net/iucv/iucv.c
+++ b/net/iucv/iucv.c
@@ -493,8 +493,8 @@ static void iucv_declare_cpu(void *data)
493 err = "Paging or storage error"; 493 err = "Paging or storage error";
494 break; 494 break;
495 } 495 }
496 pr_warning("Defining an interrupt buffer on CPU %i" 496 pr_warn("Defining an interrupt buffer on CPU %i failed with 0x%02x (%s)\n",
497 " failed with 0x%02x (%s)\n", cpu, rc, err); 497 cpu, rc, err);
498 return; 498 return;
499 } 499 }
500 500
@@ -1831,7 +1831,7 @@ static void iucv_external_interrupt(struct ext_code ext_code,
1831 BUG_ON(p->iptype < 0x01 || p->iptype > 0x09); 1831 BUG_ON(p->iptype < 0x01 || p->iptype > 0x09);
1832 work = kmalloc(sizeof(struct iucv_irq_list), GFP_ATOMIC); 1832 work = kmalloc(sizeof(struct iucv_irq_list), GFP_ATOMIC);
1833 if (!work) { 1833 if (!work) {
1834 pr_warning("iucv_external_interrupt: out of memory\n"); 1834 pr_warn("iucv_external_interrupt: out of memory\n");
1835 return; 1835 return;
1836 } 1836 }
1837 memcpy(&work->data, p, sizeof(work->data)); 1837 memcpy(&work->data, p, sizeof(work->data));
@@ -1974,8 +1974,7 @@ static int iucv_pm_restore(struct device *dev)
1974 printk(KERN_WARNING "iucv_pm_restore %p\n", iucv_path_table); 1974 printk(KERN_WARNING "iucv_pm_restore %p\n", iucv_path_table);
1975#endif 1975#endif
1976 if ((iucv_pm_state != IUCV_PM_RESTORING) && iucv_path_table) 1976 if ((iucv_pm_state != IUCV_PM_RESTORING) && iucv_path_table)
1977 pr_warning("Suspending Linux did not completely close all IUCV " 1977 pr_warn("Suspending Linux did not completely close all IUCV connections\n");
1978 "connections\n");
1979 iucv_pm_state = IUCV_PM_RESTORING; 1978 iucv_pm_state = IUCV_PM_RESTORING;
1980 if (cpumask_empty(&iucv_irq_cpumask)) { 1979 if (cpumask_empty(&iucv_irq_cpumask)) {
1981 rc = iucv_query_maxconn(); 1980 rc = iucv_query_maxconn();
diff --git a/net/l2tp/l2tp_core.c b/net/l2tp/l2tp_core.c
index 1109d3bb8dac..895348e44c7d 100644
--- a/net/l2tp/l2tp_core.c
+++ b/net/l2tp/l2tp_core.c
@@ -148,7 +148,7 @@ do { \
148 atomic_read(&_t->ref_count)); \ 148 atomic_read(&_t->ref_count)); \
149 l2tp_tunnel_inc_refcount_1(_t); \ 149 l2tp_tunnel_inc_refcount_1(_t); \
150} while (0) 150} while (0)
151#define l2tp_tunnel_dec_refcount(_t) 151#define l2tp_tunnel_dec_refcount(_t) \
152do { \ 152do { \
153 pr_debug("l2tp_tunnel_dec_refcount: %s:%d %s: cnt=%d\n", \ 153 pr_debug("l2tp_tunnel_dec_refcount: %s:%d %s: cnt=%d\n", \
154 __func__, __LINE__, (_t)->name, \ 154 __func__, __LINE__, (_t)->name, \
@@ -1582,19 +1582,17 @@ int l2tp_tunnel_create(struct net *net, int fd, int version, u32 tunnel_id, u32
1582 /* Mark socket as an encapsulation socket. See net/ipv4/udp.c */ 1582 /* Mark socket as an encapsulation socket. See net/ipv4/udp.c */
1583 tunnel->encap = encap; 1583 tunnel->encap = encap;
1584 if (encap == L2TP_ENCAPTYPE_UDP) { 1584 if (encap == L2TP_ENCAPTYPE_UDP) {
1585 /* Mark socket as an encapsulation socket. See net/ipv4/udp.c */ 1585 struct udp_tunnel_sock_cfg udp_cfg;
1586 udp_sk(sk)->encap_type = UDP_ENCAP_L2TPINUDP; 1586
1587 udp_sk(sk)->encap_rcv = l2tp_udp_encap_recv; 1587 udp_cfg.sk_user_data = tunnel;
1588 udp_sk(sk)->encap_destroy = l2tp_udp_encap_destroy; 1588 udp_cfg.encap_type = UDP_ENCAP_L2TPINUDP;
1589#if IS_ENABLED(CONFIG_IPV6) 1589 udp_cfg.encap_rcv = l2tp_udp_encap_recv;
1590 if (sk->sk_family == PF_INET6 && !tunnel->v4mapped) 1590 udp_cfg.encap_destroy = l2tp_udp_encap_destroy;
1591 udpv6_encap_enable();
1592 else
1593#endif
1594 udp_encap_enable();
1595 }
1596 1591
1597 sk->sk_user_data = tunnel; 1592 setup_udp_tunnel_sock(net, sock, &udp_cfg);
1593 } else {
1594 sk->sk_user_data = tunnel;
1595 }
1598 1596
1599 /* Hook on the tunnel socket destructor so that we can cleanup 1597 /* Hook on the tunnel socket destructor so that we can cleanup
1600 * if the tunnel socket goes away. 1598 * if the tunnel socket goes away.
diff --git a/net/mac80211/agg-rx.c b/net/mac80211/agg-rx.c
index f0e84bc48038..a48bad468880 100644
--- a/net/mac80211/agg-rx.c
+++ b/net/mac80211/agg-rx.c
@@ -227,7 +227,7 @@ static void ieee80211_send_addba_resp(struct ieee80211_sub_if_data *sdata, u8 *d
227void __ieee80211_start_rx_ba_session(struct sta_info *sta, 227void __ieee80211_start_rx_ba_session(struct sta_info *sta,
228 u8 dialog_token, u16 timeout, 228 u8 dialog_token, u16 timeout,
229 u16 start_seq_num, u16 ba_policy, u16 tid, 229 u16 start_seq_num, u16 ba_policy, u16 tid,
230 u16 buf_size, bool tx) 230 u16 buf_size, bool tx, bool auto_seq)
231{ 231{
232 struct ieee80211_local *local = sta->sdata->local; 232 struct ieee80211_local *local = sta->sdata->local;
233 struct tid_ampdu_rx *tid_agg_rx; 233 struct tid_ampdu_rx *tid_agg_rx;
@@ -326,6 +326,7 @@ void __ieee80211_start_rx_ba_session(struct sta_info *sta,
326 tid_agg_rx->buf_size = buf_size; 326 tid_agg_rx->buf_size = buf_size;
327 tid_agg_rx->timeout = timeout; 327 tid_agg_rx->timeout = timeout;
328 tid_agg_rx->stored_mpdu_num = 0; 328 tid_agg_rx->stored_mpdu_num = 0;
329 tid_agg_rx->auto_seq = auto_seq;
329 status = WLAN_STATUS_SUCCESS; 330 status = WLAN_STATUS_SUCCESS;
330 331
331 /* activate it for RX */ 332 /* activate it for RX */
@@ -367,7 +368,7 @@ void ieee80211_process_addba_request(struct ieee80211_local *local,
367 368
368 __ieee80211_start_rx_ba_session(sta, dialog_token, timeout, 369 __ieee80211_start_rx_ba_session(sta, dialog_token, timeout,
369 start_seq_num, ba_policy, tid, 370 start_seq_num, ba_policy, tid,
370 buf_size, true); 371 buf_size, true, false);
371} 372}
372 373
373void ieee80211_start_rx_ba_session_offl(struct ieee80211_vif *vif, 374void ieee80211_start_rx_ba_session_offl(struct ieee80211_vif *vif,
diff --git a/net/mac80211/cfg.c b/net/mac80211/cfg.c
index 927b4ea0128b..fb6a1502b6df 100644
--- a/net/mac80211/cfg.c
+++ b/net/mac80211/cfg.c
@@ -2,6 +2,7 @@
2 * mac80211 configuration hooks for cfg80211 2 * mac80211 configuration hooks for cfg80211
3 * 3 *
4 * Copyright 2006-2010 Johannes Berg <johannes@sipsolutions.net> 4 * Copyright 2006-2010 Johannes Berg <johannes@sipsolutions.net>
5 * Copyright 2013-2014 Intel Mobile Communications GmbH
5 * 6 *
6 * This file is GPLv2 as found in COPYING. 7 * This file is GPLv2 as found in COPYING.
7 */ 8 */
@@ -682,8 +683,19 @@ static int ieee80211_start_ap(struct wiphy *wiphy, struct net_device *dev,
682 if (old) 683 if (old)
683 return -EALREADY; 684 return -EALREADY;
684 685
685 /* TODO: make hostapd tell us what it wants */ 686 switch (params->smps_mode) {
686 sdata->smps_mode = IEEE80211_SMPS_OFF; 687 case NL80211_SMPS_OFF:
688 sdata->smps_mode = IEEE80211_SMPS_OFF;
689 break;
690 case NL80211_SMPS_STATIC:
691 sdata->smps_mode = IEEE80211_SMPS_STATIC;
692 break;
693 case NL80211_SMPS_DYNAMIC:
694 sdata->smps_mode = IEEE80211_SMPS_DYNAMIC;
695 break;
696 default:
697 return -EINVAL;
698 }
687 sdata->needed_rx_chains = sdata->local->rx_chains; 699 sdata->needed_rx_chains = sdata->local->rx_chains;
688 700
689 mutex_lock(&local->mtx); 701 mutex_lock(&local->mtx);
@@ -1011,15 +1023,8 @@ static int sta_apply_parameters(struct ieee80211_local *local,
1011 clear_sta_flag(sta, WLAN_STA_SHORT_PREAMBLE); 1023 clear_sta_flag(sta, WLAN_STA_SHORT_PREAMBLE);
1012 } 1024 }
1013 1025
1014 if (mask & BIT(NL80211_STA_FLAG_WME)) { 1026 if (mask & BIT(NL80211_STA_FLAG_WME))
1015 if (set & BIT(NL80211_STA_FLAG_WME)) { 1027 sta->sta.wme = set & BIT(NL80211_STA_FLAG_WME);
1016 set_sta_flag(sta, WLAN_STA_WME);
1017 sta->sta.wme = true;
1018 } else {
1019 clear_sta_flag(sta, WLAN_STA_WME);
1020 sta->sta.wme = false;
1021 }
1022 }
1023 1028
1024 if (mask & BIT(NL80211_STA_FLAG_MFP)) { 1029 if (mask & BIT(NL80211_STA_FLAG_MFP)) {
1025 if (set & BIT(NL80211_STA_FLAG_MFP)) 1030 if (set & BIT(NL80211_STA_FLAG_MFP))
@@ -1984,8 +1989,13 @@ static int ieee80211_set_wiphy_params(struct wiphy *wiphy, u32 changed)
1984 return err; 1989 return err;
1985 } 1990 }
1986 1991
1987 if (changed & WIPHY_PARAM_COVERAGE_CLASS) { 1992 if ((changed & WIPHY_PARAM_COVERAGE_CLASS) ||
1988 err = drv_set_coverage_class(local, wiphy->coverage_class); 1993 (changed & WIPHY_PARAM_DYN_ACK)) {
1994 s16 coverage_class;
1995
1996 coverage_class = changed & WIPHY_PARAM_COVERAGE_CLASS ?
1997 wiphy->coverage_class : -1;
1998 err = drv_set_coverage_class(local, coverage_class);
1989 1999
1990 if (err) 2000 if (err)
1991 return err; 2001 return err;
@@ -2358,6 +2368,58 @@ static int ieee80211_set_bitrate_mask(struct wiphy *wiphy,
2358 return 0; 2368 return 0;
2359} 2369}
2360 2370
2371static bool ieee80211_coalesce_started_roc(struct ieee80211_local *local,
2372 struct ieee80211_roc_work *new_roc,
2373 struct ieee80211_roc_work *cur_roc)
2374{
2375 unsigned long j = jiffies;
2376 unsigned long cur_roc_end = cur_roc->hw_start_time +
2377 msecs_to_jiffies(cur_roc->duration);
2378 struct ieee80211_roc_work *next_roc;
2379 int new_dur;
2380
2381 if (WARN_ON(!cur_roc->started || !cur_roc->hw_begun))
2382 return false;
2383
2384 if (time_after(j + IEEE80211_ROC_MIN_LEFT, cur_roc_end))
2385 return false;
2386
2387 ieee80211_handle_roc_started(new_roc);
2388
2389 new_dur = new_roc->duration - jiffies_to_msecs(cur_roc_end - j);
2390
2391 /* cur_roc is long enough - add new_roc to the dependents list. */
2392 if (new_dur <= 0) {
2393 list_add_tail(&new_roc->list, &cur_roc->dependents);
2394 return true;
2395 }
2396
2397 new_roc->duration = new_dur;
2398
2399 /*
2400 * if cur_roc was already coalesced before, we might
2401 * want to extend the next roc instead of adding
2402 * a new one.
2403 */
2404 next_roc = list_entry(cur_roc->list.next,
2405 struct ieee80211_roc_work, list);
2406 if (&next_roc->list != &local->roc_list &&
2407 next_roc->chan == new_roc->chan &&
2408 next_roc->sdata == new_roc->sdata &&
2409 !WARN_ON(next_roc->started)) {
2410 list_add_tail(&new_roc->list, &next_roc->dependents);
2411 next_roc->duration = max(next_roc->duration,
2412 new_roc->duration);
2413 next_roc->type = max(next_roc->type, new_roc->type);
2414 return true;
2415 }
2416
2417 /* add right after cur_roc */
2418 list_add(&new_roc->list, &cur_roc->list);
2419
2420 return true;
2421}
2422
2361static int ieee80211_start_roc_work(struct ieee80211_local *local, 2423static int ieee80211_start_roc_work(struct ieee80211_local *local,
2362 struct ieee80211_sub_if_data *sdata, 2424 struct ieee80211_sub_if_data *sdata,
2363 struct ieee80211_channel *channel, 2425 struct ieee80211_channel *channel,
@@ -2463,8 +2525,6 @@ static int ieee80211_start_roc_work(struct ieee80211_local *local,
2463 2525
2464 /* If it has already started, it's more difficult ... */ 2526 /* If it has already started, it's more difficult ... */
2465 if (local->ops->remain_on_channel) { 2527 if (local->ops->remain_on_channel) {
2466 unsigned long j = jiffies;
2467
2468 /* 2528 /*
2469 * In the offloaded ROC case, if it hasn't begun, add 2529 * In the offloaded ROC case, if it hasn't begun, add
2470 * this new one to the dependent list to be handled 2530 * this new one to the dependent list to be handled
@@ -2487,28 +2547,8 @@ static int ieee80211_start_roc_work(struct ieee80211_local *local,
2487 break; 2547 break;
2488 } 2548 }
2489 2549
2490 if (time_before(j + IEEE80211_ROC_MIN_LEFT, 2550 if (ieee80211_coalesce_started_roc(local, roc, tmp))
2491 tmp->hw_start_time +
2492 msecs_to_jiffies(tmp->duration))) {
2493 int new_dur;
2494
2495 ieee80211_handle_roc_started(roc);
2496
2497 new_dur = roc->duration -
2498 jiffies_to_msecs(tmp->hw_start_time +
2499 msecs_to_jiffies(
2500 tmp->duration) -
2501 j);
2502
2503 if (new_dur > 0) {
2504 /* add right after tmp */
2505 list_add(&roc->list, &tmp->list);
2506 } else {
2507 list_add_tail(&roc->list,
2508 &tmp->dependents);
2509 }
2510 queued = true; 2551 queued = true;
2511 }
2512 } else if (del_timer_sync(&tmp->work.timer)) { 2552 } else if (del_timer_sync(&tmp->work.timer)) {
2513 unsigned long new_end; 2553 unsigned long new_end;
2514 2554
@@ -3352,7 +3392,7 @@ static int ieee80211_probe_client(struct wiphy *wiphy, struct net_device *dev,
3352 band = chanctx_conf->def.chan->band; 3392 band = chanctx_conf->def.chan->band;
3353 sta = sta_info_get_bss(sdata, peer); 3393 sta = sta_info_get_bss(sdata, peer);
3354 if (sta) { 3394 if (sta) {
3355 qos = test_sta_flag(sta, WLAN_STA_WME); 3395 qos = sta->sta.wme;
3356 } else { 3396 } else {
3357 rcu_read_unlock(); 3397 rcu_read_unlock();
3358 return -ENOLINK; 3398 return -ENOLINK;
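
Editorial note: the new ieee80211_coalesce_started_roc() in the cfg.c hunk compares a jiffies deadline (cur_roc_end) against the current jiffies with time_after(), which stays correct across counter wraparound. A small sketch of that comparison follows, with time_after() reproduced from memory (the real macro also typechecks its arguments) and purely hypothetical counter values.

/*
 * Wraparound-safe tick comparison: interpreting the unsigned difference
 * as signed keeps "a is after b" correct even when the counter wraps.
 */
#include <stdio.h>

#define time_after(a, b)  ((long)((b) - (a)) < 0)
#define time_before(a, b) time_after(b, a)

int main(void)
{
	unsigned long j = (unsigned long)-10;   /* counter just before wrap  */
	unsigned long end = j + 100;            /* deadline lands after wrap */

	/* A plain "j > end" wrongly reports the deadline as already passed. */
	printf("plain compare : %s\n", j > end ? "expired" : "still running");
	printf("time_after    : %s\n",
	       time_after(j, end) ? "expired" : "still running");
	return 0;
}
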
diff --git a/net/mac80211/chan.c b/net/mac80211/chan.c
index 399ad82c997f..4c74e8da64b9 100644
--- a/net/mac80211/chan.c
+++ b/net/mac80211/chan.c
@@ -549,12 +549,12 @@ static void ieee80211_recalc_chanctx_chantype(struct ieee80211_local *local,
549 549
550 compat = cfg80211_chandef_compatible( 550 compat = cfg80211_chandef_compatible(
551 &sdata->vif.bss_conf.chandef, compat); 551 &sdata->vif.bss_conf.chandef, compat);
552 if (!compat) 552 if (WARN_ON_ONCE(!compat))
553 break; 553 break;
554 } 554 }
555 rcu_read_unlock(); 555 rcu_read_unlock();
556 556
557 if (WARN_ON_ONCE(!compat)) 557 if (!compat)
558 return; 558 return;
559 559
560 ieee80211_change_chanctx(local, ctx, compat); 560 ieee80211_change_chanctx(local, ctx, compat);
@@ -639,41 +639,6 @@ out:
639 return ret; 639 return ret;
640} 640}
641 641
642static void __ieee80211_vif_release_channel(struct ieee80211_sub_if_data *sdata)
643{
644 struct ieee80211_local *local = sdata->local;
645 struct ieee80211_chanctx_conf *conf;
646 struct ieee80211_chanctx *ctx;
647 bool use_reserved_switch = false;
648
649 lockdep_assert_held(&local->chanctx_mtx);
650
651 conf = rcu_dereference_protected(sdata->vif.chanctx_conf,
652 lockdep_is_held(&local->chanctx_mtx));
653 if (!conf)
654 return;
655
656 ctx = container_of(conf, struct ieee80211_chanctx, conf);
657
658 if (sdata->reserved_chanctx) {
659 if (sdata->reserved_chanctx->replace_state ==
660 IEEE80211_CHANCTX_REPLACES_OTHER &&
661 ieee80211_chanctx_num_reserved(local,
662 sdata->reserved_chanctx) > 1)
663 use_reserved_switch = true;
664
665 ieee80211_vif_unreserve_chanctx(sdata);
666 }
667
668 ieee80211_assign_vif_chanctx(sdata, NULL);
669 if (ieee80211_chanctx_refcount(local, ctx) == 0)
670 ieee80211_free_chanctx(local, ctx);
671
672 /* Unreserving may ready an in-place reservation. */
673 if (use_reserved_switch)
674 ieee80211_vif_use_reserved_switch(local);
675}
676
677void ieee80211_recalc_smps_chanctx(struct ieee80211_local *local, 642void ieee80211_recalc_smps_chanctx(struct ieee80211_local *local,
678 struct ieee80211_chanctx *chanctx) 643 struct ieee80211_chanctx *chanctx)
679{ 644{
@@ -764,63 +729,6 @@ void ieee80211_recalc_smps_chanctx(struct ieee80211_local *local,
764 drv_change_chanctx(local, chanctx, IEEE80211_CHANCTX_CHANGE_RX_CHAINS); 729 drv_change_chanctx(local, chanctx, IEEE80211_CHANCTX_CHANGE_RX_CHAINS);
765} 730}
766 731
767int ieee80211_vif_use_channel(struct ieee80211_sub_if_data *sdata,
768 const struct cfg80211_chan_def *chandef,
769 enum ieee80211_chanctx_mode mode)
770{
771 struct ieee80211_local *local = sdata->local;
772 struct ieee80211_chanctx *ctx;
773 u8 radar_detect_width = 0;
774 int ret;
775
776 lockdep_assert_held(&local->mtx);
777
778 WARN_ON(sdata->dev && netif_carrier_ok(sdata->dev));
779
780 mutex_lock(&local->chanctx_mtx);
781
782 ret = cfg80211_chandef_dfs_required(local->hw.wiphy,
783 chandef,
784 sdata->wdev.iftype);
785 if (ret < 0)
786 goto out;
787 if (ret > 0)
788 radar_detect_width = BIT(chandef->width);
789
790 sdata->radar_required = ret;
791
792 ret = ieee80211_check_combinations(sdata, chandef, mode,
793 radar_detect_width);
794 if (ret < 0)
795 goto out;
796
797 __ieee80211_vif_release_channel(sdata);
798
799 ctx = ieee80211_find_chanctx(local, chandef, mode);
800 if (!ctx)
801 ctx = ieee80211_new_chanctx(local, chandef, mode);
802 if (IS_ERR(ctx)) {
803 ret = PTR_ERR(ctx);
804 goto out;
805 }
806
807 sdata->vif.bss_conf.chandef = *chandef;
808
809 ret = ieee80211_assign_vif_chanctx(sdata, ctx);
810 if (ret) {
811 /* if assign fails refcount stays the same */
812 if (ieee80211_chanctx_refcount(local, ctx) == 0)
813 ieee80211_free_chanctx(local, ctx);
814 goto out;
815 }
816
817 ieee80211_recalc_smps_chanctx(local, ctx);
818 ieee80211_recalc_radar_chanctx(local, ctx);
819 out:
820 mutex_unlock(&local->chanctx_mtx);
821 return ret;
822}
823
824static void 732static void
825__ieee80211_vif_copy_chanctx_to_vlans(struct ieee80211_sub_if_data *sdata, 733__ieee80211_vif_copy_chanctx_to_vlans(struct ieee80211_sub_if_data *sdata,
826 bool clear) 734 bool clear)
@@ -1269,8 +1177,7 @@ err:
1269 return err; 1177 return err;
1270} 1178}
1271 1179
1272int 1180static int ieee80211_vif_use_reserved_switch(struct ieee80211_local *local)
1273ieee80211_vif_use_reserved_switch(struct ieee80211_local *local)
1274{ 1181{
1275 struct ieee80211_sub_if_data *sdata, *sdata_tmp; 1182 struct ieee80211_sub_if_data *sdata, *sdata_tmp;
1276 struct ieee80211_chanctx *ctx, *ctx_tmp, *old_ctx; 1183 struct ieee80211_chanctx *ctx, *ctx_tmp, *old_ctx;
@@ -1522,6 +1429,98 @@ err:
1522 return err; 1429 return err;
1523} 1430}
1524 1431
1432static void __ieee80211_vif_release_channel(struct ieee80211_sub_if_data *sdata)
1433{
1434 struct ieee80211_local *local = sdata->local;
1435 struct ieee80211_chanctx_conf *conf;
1436 struct ieee80211_chanctx *ctx;
1437 bool use_reserved_switch = false;
1438
1439 lockdep_assert_held(&local->chanctx_mtx);
1440
1441 conf = rcu_dereference_protected(sdata->vif.chanctx_conf,
1442 lockdep_is_held(&local->chanctx_mtx));
1443 if (!conf)
1444 return;
1445
1446 ctx = container_of(conf, struct ieee80211_chanctx, conf);
1447
1448 if (sdata->reserved_chanctx) {
1449 if (sdata->reserved_chanctx->replace_state ==
1450 IEEE80211_CHANCTX_REPLACES_OTHER &&
1451 ieee80211_chanctx_num_reserved(local,
1452 sdata->reserved_chanctx) > 1)
1453 use_reserved_switch = true;
1454
1455 ieee80211_vif_unreserve_chanctx(sdata);
1456 }
1457
1458 ieee80211_assign_vif_chanctx(sdata, NULL);
1459 if (ieee80211_chanctx_refcount(local, ctx) == 0)
1460 ieee80211_free_chanctx(local, ctx);
1461
1462 /* Unreserving may ready an in-place reservation. */
1463 if (use_reserved_switch)
1464 ieee80211_vif_use_reserved_switch(local);
1465}
1466
1467int ieee80211_vif_use_channel(struct ieee80211_sub_if_data *sdata,
1468 const struct cfg80211_chan_def *chandef,
1469 enum ieee80211_chanctx_mode mode)
1470{
1471 struct ieee80211_local *local = sdata->local;
1472 struct ieee80211_chanctx *ctx;
1473 u8 radar_detect_width = 0;
1474 int ret;
1475
1476 lockdep_assert_held(&local->mtx);
1477
1478 WARN_ON(sdata->dev && netif_carrier_ok(sdata->dev));
1479
1480 mutex_lock(&local->chanctx_mtx);
1481
1482 ret = cfg80211_chandef_dfs_required(local->hw.wiphy,
1483 chandef,
1484 sdata->wdev.iftype);
1485 if (ret < 0)
1486 goto out;
1487 if (ret > 0)
1488 radar_detect_width = BIT(chandef->width);
1489
1490 sdata->radar_required = ret;
1491
1492 ret = ieee80211_check_combinations(sdata, chandef, mode,
1493 radar_detect_width);
1494 if (ret < 0)
1495 goto out;
1496
1497 __ieee80211_vif_release_channel(sdata);
1498
1499 ctx = ieee80211_find_chanctx(local, chandef, mode);
1500 if (!ctx)
1501 ctx = ieee80211_new_chanctx(local, chandef, mode);
1502 if (IS_ERR(ctx)) {
1503 ret = PTR_ERR(ctx);
1504 goto out;
1505 }
1506
1507 sdata->vif.bss_conf.chandef = *chandef;
1508
1509 ret = ieee80211_assign_vif_chanctx(sdata, ctx);
1510 if (ret) {
1511 /* if assign fails refcount stays the same */
1512 if (ieee80211_chanctx_refcount(local, ctx) == 0)
1513 ieee80211_free_chanctx(local, ctx);
1514 goto out;
1515 }
1516
1517 ieee80211_recalc_smps_chanctx(local, ctx);
1518 ieee80211_recalc_radar_chanctx(local, ctx);
1519 out:
1520 mutex_unlock(&local->chanctx_mtx);
1521 return ret;
1522}
1523
1525int ieee80211_vif_use_reserved_context(struct ieee80211_sub_if_data *sdata) 1524int ieee80211_vif_use_reserved_context(struct ieee80211_sub_if_data *sdata)
1526{ 1525{
1527 struct ieee80211_local *local = sdata->local; 1526 struct ieee80211_local *local = sdata->local;
diff --git a/net/mac80211/debugfs.c b/net/mac80211/debugfs.c
index 0e963bc1ceac..54a189f0393e 100644
--- a/net/mac80211/debugfs.c
+++ b/net/mac80211/debugfs.c
@@ -3,6 +3,7 @@
3 * mac80211 debugfs for wireless PHYs 3 * mac80211 debugfs for wireless PHYs
4 * 4 *
5 * Copyright 2007 Johannes Berg <johannes@sipsolutions.net> 5 * Copyright 2007 Johannes Berg <johannes@sipsolutions.net>
6 * Copyright 2013-2014 Intel Mobile Communications GmbH
6 * 7 *
7 * GPLv2 8 * GPLv2
8 * 9 *
@@ -302,11 +303,6 @@ static ssize_t hwflags_read(struct file *file, char __user *user_buf,
302 sf += scnprintf(buf + sf, mxln - sf, "SUPPORTS_DYNAMIC_PS\n"); 303 sf += scnprintf(buf + sf, mxln - sf, "SUPPORTS_DYNAMIC_PS\n");
303 if (local->hw.flags & IEEE80211_HW_MFP_CAPABLE) 304 if (local->hw.flags & IEEE80211_HW_MFP_CAPABLE)
304 sf += scnprintf(buf + sf, mxln - sf, "MFP_CAPABLE\n"); 305 sf += scnprintf(buf + sf, mxln - sf, "MFP_CAPABLE\n");
305 if (local->hw.flags & IEEE80211_HW_SUPPORTS_STATIC_SMPS)
306 sf += scnprintf(buf + sf, mxln - sf, "SUPPORTS_STATIC_SMPS\n");
307 if (local->hw.flags & IEEE80211_HW_SUPPORTS_DYNAMIC_SMPS)
308 sf += scnprintf(buf + sf, mxln - sf,
309 "SUPPORTS_DYNAMIC_SMPS\n");
310 if (local->hw.flags & IEEE80211_HW_SUPPORTS_UAPSD) 306 if (local->hw.flags & IEEE80211_HW_SUPPORTS_UAPSD)
311 sf += scnprintf(buf + sf, mxln - sf, "SUPPORTS_UAPSD\n"); 307 sf += scnprintf(buf + sf, mxln - sf, "SUPPORTS_UAPSD\n");
312 if (local->hw.flags & IEEE80211_HW_REPORTS_TX_ACK_STATUS) 308 if (local->hw.flags & IEEE80211_HW_REPORTS_TX_ACK_STATUS)
diff --git a/net/mac80211/debugfs_netdev.c b/net/mac80211/debugfs_netdev.c
index e205ebabfa50..c68896adfa96 100644
--- a/net/mac80211/debugfs_netdev.c
+++ b/net/mac80211/debugfs_netdev.c
@@ -226,12 +226,12 @@ static int ieee80211_set_smps(struct ieee80211_sub_if_data *sdata,
226 struct ieee80211_local *local = sdata->local; 226 struct ieee80211_local *local = sdata->local;
227 int err; 227 int err;
228 228
229 if (!(local->hw.flags & IEEE80211_HW_SUPPORTS_STATIC_SMPS) && 229 if (!(local->hw.wiphy->features & NL80211_FEATURE_STATIC_SMPS) &&
230 smps_mode == IEEE80211_SMPS_STATIC) 230 smps_mode == IEEE80211_SMPS_STATIC)
231 return -EINVAL; 231 return -EINVAL;
232 232
233 /* auto should be dynamic if in PS mode */ 233 /* auto should be dynamic if in PS mode */
234 if (!(local->hw.flags & IEEE80211_HW_SUPPORTS_DYNAMIC_SMPS) && 234 if (!(local->hw.wiphy->features & NL80211_FEATURE_DYNAMIC_SMPS) &&
235 (smps_mode == IEEE80211_SMPS_DYNAMIC || 235 (smps_mode == IEEE80211_SMPS_DYNAMIC ||
236 smps_mode == IEEE80211_SMPS_AUTOMATIC)) 236 smps_mode == IEEE80211_SMPS_AUTOMATIC))
237 return -EINVAL; 237 return -EINVAL;
diff --git a/net/mac80211/debugfs_sta.c b/net/mac80211/debugfs_sta.c
index 86173c0de40e..bafe48916229 100644
--- a/net/mac80211/debugfs_sta.c
+++ b/net/mac80211/debugfs_sta.c
@@ -2,6 +2,7 @@
2 * Copyright 2003-2005 Devicescape Software, Inc. 2 * Copyright 2003-2005 Devicescape Software, Inc.
3 * Copyright (c) 2006 Jiri Benc <jbenc@suse.cz> 3 * Copyright (c) 2006 Jiri Benc <jbenc@suse.cz>
4 * Copyright 2007 Johannes Berg <johannes@sipsolutions.net> 4 * Copyright 2007 Johannes Berg <johannes@sipsolutions.net>
5 * Copyright 2013-2014 Intel Mobile Communications GmbH
5 * 6 *
6 * This program is free software; you can redistribute it and/or modify 7 * This program is free software; you can redistribute it and/or modify
7 * it under the terms of the GNU General Public License version 2 as 8 * it under the terms of the GNU General Public License version 2 as
@@ -77,7 +78,8 @@ static ssize_t sta_flags_read(struct file *file, char __user *userbuf,
77 TEST(AUTH), TEST(ASSOC), TEST(PS_STA), 78 TEST(AUTH), TEST(ASSOC), TEST(PS_STA),
78 TEST(PS_DRIVER), TEST(AUTHORIZED), 79 TEST(PS_DRIVER), TEST(AUTHORIZED),
79 TEST(SHORT_PREAMBLE), 80 TEST(SHORT_PREAMBLE),
80 TEST(WME), TEST(WDS), TEST(CLEAR_PS_FILT), 81 sta->sta.wme ? "WME\n" : "",
82 TEST(WDS), TEST(CLEAR_PS_FILT),
81 TEST(MFP), TEST(BLOCK_BA), TEST(PSPOLL), 83 TEST(MFP), TEST(BLOCK_BA), TEST(PSPOLL),
82 TEST(UAPSD), TEST(SP), TEST(TDLS_PEER), 84 TEST(UAPSD), TEST(SP), TEST(TDLS_PEER),
83 TEST(TDLS_PEER_AUTH), TEST(4ADDR_EVENT), 85 TEST(TDLS_PEER_AUTH), TEST(4ADDR_EVENT),
diff --git a/net/mac80211/driver-ops.h b/net/mac80211/driver-ops.h
index 11423958116a..196d48c68134 100644
--- a/net/mac80211/driver-ops.h
+++ b/net/mac80211/driver-ops.h
@@ -450,7 +450,7 @@ static inline int drv_set_rts_threshold(struct ieee80211_local *local,
450} 450}
451 451
452static inline int drv_set_coverage_class(struct ieee80211_local *local, 452static inline int drv_set_coverage_class(struct ieee80211_local *local,
453 u8 value) 453 s16 value)
454{ 454{
455 int ret = 0; 455 int ret = 0;
456 might_sleep(); 456 might_sleep();
diff --git a/net/mac80211/ibss.c b/net/mac80211/ibss.c
index 9713dc54ea4b..56b53571c807 100644
--- a/net/mac80211/ibss.c
+++ b/net/mac80211/ibss.c
@@ -6,6 +6,7 @@
6 * Copyright 2006-2007 Jiri Benc <jbenc@suse.cz> 6 * Copyright 2006-2007 Jiri Benc <jbenc@suse.cz>
7 * Copyright 2007, Michael Wu <flamingice@sourmilk.net> 7 * Copyright 2007, Michael Wu <flamingice@sourmilk.net>
8 * Copyright 2009, Johannes Berg <johannes@sipsolutions.net> 8 * Copyright 2009, Johannes Berg <johannes@sipsolutions.net>
9 * Copyright 2013-2014 Intel Mobile Communications GmbH
9 * 10 *
10 * This program is free software; you can redistribute it and/or modify 11 * This program is free software; you can redistribute it and/or modify
11 * it under the terms of the GNU General Public License version 2 as 12 * it under the terms of the GNU General Public License version 2 as
@@ -1038,7 +1039,7 @@ static void ieee80211_rx_bss_info(struct ieee80211_sub_if_data *sdata,
1038 } 1039 }
1039 1040
1040 if (sta && elems->wmm_info) 1041 if (sta && elems->wmm_info)
1041 set_sta_flag(sta, WLAN_STA_WME); 1042 sta->sta.wme = true;
1042 1043
1043 if (sta && elems->ht_operation && elems->ht_cap_elem && 1044 if (sta && elems->ht_operation && elems->ht_cap_elem &&
1044 sdata->u.ibss.chandef.width != NL80211_CHAN_WIDTH_20_NOHT && 1045 sdata->u.ibss.chandef.width != NL80211_CHAN_WIDTH_20_NOHT &&
diff --git a/net/mac80211/ieee80211_i.h b/net/mac80211/ieee80211_i.h
index ef7a089ac546..c2aaec4dfcf0 100644
--- a/net/mac80211/ieee80211_i.h
+++ b/net/mac80211/ieee80211_i.h
@@ -3,6 +3,7 @@
3 * Copyright 2005, Devicescape Software, Inc. 3 * Copyright 2005, Devicescape Software, Inc.
4 * Copyright 2006-2007 Jiri Benc <jbenc@suse.cz> 4 * Copyright 2006-2007 Jiri Benc <jbenc@suse.cz>
5 * Copyright 2007-2010 Johannes Berg <johannes@sipsolutions.net> 5 * Copyright 2007-2010 Johannes Berg <johannes@sipsolutions.net>
6 * Copyright 2013-2014 Intel Mobile Communications GmbH
6 * 7 *
7 * This program is free software; you can redistribute it and/or modify 8 * This program is free software; you can redistribute it and/or modify
8 * it under the terms of the GNU General Public License version 2 as 9 * it under the terms of the GNU General Public License version 2 as
@@ -354,6 +355,7 @@ enum ieee80211_sta_flags {
354 IEEE80211_STA_DISABLE_80P80MHZ = BIT(12), 355 IEEE80211_STA_DISABLE_80P80MHZ = BIT(12),
355 IEEE80211_STA_DISABLE_160MHZ = BIT(13), 356 IEEE80211_STA_DISABLE_160MHZ = BIT(13),
356 IEEE80211_STA_DISABLE_WMM = BIT(14), 357 IEEE80211_STA_DISABLE_WMM = BIT(14),
358 IEEE80211_STA_ENABLE_RRM = BIT(15),
357}; 359};
358 360
359struct ieee80211_mgd_auth_data { 361struct ieee80211_mgd_auth_data {
@@ -1367,6 +1369,7 @@ struct ieee802_11_elems {
1367 const struct ieee80211_wide_bw_chansw_ie *wide_bw_chansw_ie; 1369 const struct ieee80211_wide_bw_chansw_ie *wide_bw_chansw_ie;
1368 const u8 *country_elem; 1370 const u8 *country_elem;
1369 const u8 *pwr_constr_elem; 1371 const u8 *pwr_constr_elem;
1372 const u8 *cisco_dtpc_elem;
1370 const struct ieee80211_timeout_interval_ie *timeout_int; 1373 const struct ieee80211_timeout_interval_ie *timeout_int;
1371 const u8 *opmode_notif; 1374 const u8 *opmode_notif;
1372 const struct ieee80211_sec_chan_offs_ie *sec_chan_offs; 1375 const struct ieee80211_sec_chan_offs_ie *sec_chan_offs;
@@ -1587,7 +1590,7 @@ void __ieee80211_stop_rx_ba_session(struct sta_info *sta, u16 tid,
1587void __ieee80211_start_rx_ba_session(struct sta_info *sta, 1590void __ieee80211_start_rx_ba_session(struct sta_info *sta,
1588 u8 dialog_token, u16 timeout, 1591 u8 dialog_token, u16 timeout,
1589 u16 start_seq_num, u16 ba_policy, u16 tid, 1592 u16 start_seq_num, u16 ba_policy, u16 tid,
1590 u16 buf_size, bool tx); 1593 u16 buf_size, bool tx, bool auto_seq);
1591void ieee80211_sta_tear_down_BA_sessions(struct sta_info *sta, 1594void ieee80211_sta_tear_down_BA_sessions(struct sta_info *sta,
1592 enum ieee80211_agg_stop_reason reason); 1595 enum ieee80211_agg_stop_reason reason);
1593void ieee80211_process_delba(struct ieee80211_sub_if_data *sdata, 1596void ieee80211_process_delba(struct ieee80211_sub_if_data *sdata,
@@ -1869,7 +1872,6 @@ ieee80211_vif_reserve_chanctx(struct ieee80211_sub_if_data *sdata,
1869int __must_check 1872int __must_check
1870ieee80211_vif_use_reserved_context(struct ieee80211_sub_if_data *sdata); 1873ieee80211_vif_use_reserved_context(struct ieee80211_sub_if_data *sdata);
1871int ieee80211_vif_unreserve_chanctx(struct ieee80211_sub_if_data *sdata); 1874int ieee80211_vif_unreserve_chanctx(struct ieee80211_sub_if_data *sdata);
1872int ieee80211_vif_use_reserved_switch(struct ieee80211_local *local);
1873 1875
1874int __must_check 1876int __must_check
1875ieee80211_vif_change_bandwidth(struct ieee80211_sub_if_data *sdata, 1877ieee80211_vif_change_bandwidth(struct ieee80211_sub_if_data *sdata,
@@ -1918,7 +1920,7 @@ int ieee80211_tdls_mgmt(struct wiphy *wiphy, struct net_device *dev,
1918 size_t extra_ies_len); 1920 size_t extra_ies_len);
1919int ieee80211_tdls_oper(struct wiphy *wiphy, struct net_device *dev, 1921int ieee80211_tdls_oper(struct wiphy *wiphy, struct net_device *dev,
1920 const u8 *peer, enum nl80211_tdls_operation oper); 1922 const u8 *peer, enum nl80211_tdls_operation oper);
1921 1923void ieee80211_tdls_peer_del_work(struct work_struct *wk);
1922 1924
1923extern const struct ethtool_ops ieee80211_ethtool_ops; 1925extern const struct ethtool_ops ieee80211_ethtool_ops;
1924 1926
@@ -1929,4 +1931,3 @@ extern const struct ethtool_ops ieee80211_ethtool_ops;
1929#endif 1931#endif
1930 1932
1931#endif /* IEEE80211_I_H */ 1933#endif /* IEEE80211_I_H */
1932void ieee80211_tdls_peer_del_work(struct work_struct *wk);
diff --git a/net/mac80211/iface.c b/net/mac80211/iface.c
index f75e5f132c5a..af237223a8cd 100644
--- a/net/mac80211/iface.c
+++ b/net/mac80211/iface.c
@@ -5,6 +5,7 @@
5 * Copyright 2005-2006, Devicescape Software, Inc. 5 * Copyright 2005-2006, Devicescape Software, Inc.
6 * Copyright (c) 2006 Jiri Benc <jbenc@suse.cz> 6 * Copyright (c) 2006 Jiri Benc <jbenc@suse.cz>
7 * Copyright 2008, Johannes Berg <johannes@sipsolutions.net> 7 * Copyright 2008, Johannes Berg <johannes@sipsolutions.net>
8 * Copyright 2013-2014 Intel Mobile Communications GmbH
8 * 9 *
9 * This program is free software; you can redistribute it and/or modify 10 * This program is free software; you can redistribute it and/or modify
10 * it under the terms of the GNU General Public License version 2 as 11 * it under the terms of the GNU General Public License version 2 as
@@ -1172,19 +1173,11 @@ static void ieee80211_iface_work(struct work_struct *work)
1172 rx_agg = (void *)&skb->cb; 1173 rx_agg = (void *)&skb->cb;
1173 mutex_lock(&local->sta_mtx); 1174 mutex_lock(&local->sta_mtx);
1174 sta = sta_info_get_bss(sdata, rx_agg->addr); 1175 sta = sta_info_get_bss(sdata, rx_agg->addr);
1175 if (sta) { 1176 if (sta)
1176 u16 last_seq;
1177
1178 last_seq = IEEE80211_SEQ_TO_SN(le16_to_cpu(
1179 sta->last_seq_ctrl[rx_agg->tid]));
1180
1181 __ieee80211_start_rx_ba_session(sta, 1177 __ieee80211_start_rx_ba_session(sta,
1182 0, 0, 1178 0, 0, 0, 1, rx_agg->tid,
1183 ieee80211_sn_inc(last_seq),
1184 1, rx_agg->tid,
1185 IEEE80211_MAX_AMPDU_BUF, 1179 IEEE80211_MAX_AMPDU_BUF,
1186 false); 1180 false, true);
1187 }
1188 mutex_unlock(&local->sta_mtx); 1181 mutex_unlock(&local->sta_mtx);
1189 } else if (skb->pkt_type == IEEE80211_SDATA_QUEUE_RX_AGG_STOP) { 1182 } else if (skb->pkt_type == IEEE80211_SDATA_QUEUE_RX_AGG_STOP) {
1190 rx_agg = (void *)&skb->cb; 1183 rx_agg = (void *)&skb->cb;
diff --git a/net/mac80211/key.c b/net/mac80211/key.c
index d808cff80153..4712150dc210 100644
--- a/net/mac80211/key.c
+++ b/net/mac80211/key.c
@@ -3,6 +3,7 @@
3 * Copyright 2005-2006, Devicescape Software, Inc. 3 * Copyright 2005-2006, Devicescape Software, Inc.
4 * Copyright 2006-2007 Jiri Benc <jbenc@suse.cz> 4 * Copyright 2006-2007 Jiri Benc <jbenc@suse.cz>
5 * Copyright 2007-2008 Johannes Berg <johannes@sipsolutions.net> 5 * Copyright 2007-2008 Johannes Berg <johannes@sipsolutions.net>
6 * Copyright 2013-2014 Intel Mobile Communications GmbH
6 * 7 *
7 * This program is free software; you can redistribute it and/or modify 8 * This program is free software; you can redistribute it and/or modify
8 * it under the terms of the GNU General Public License version 2 as 9 * it under the terms of the GNU General Public License version 2 as
@@ -130,9 +131,7 @@ static int ieee80211_key_enable_hw_accel(struct ieee80211_key *key)
130 if (!ret) { 131 if (!ret) {
131 key->flags |= KEY_FLAG_UPLOADED_TO_HARDWARE; 132 key->flags |= KEY_FLAG_UPLOADED_TO_HARDWARE;
132 133
133 if (!((key->conf.flags & IEEE80211_KEY_FLAG_GENERATE_MMIC) || 134 if (!(key->conf.flags & IEEE80211_KEY_FLAG_GENERATE_MMIC))
134 (key->conf.flags & IEEE80211_KEY_FLAG_GENERATE_IV) ||
135 (key->conf.flags & IEEE80211_KEY_FLAG_PUT_IV_SPACE)))
136 sdata->crypto_tx_tailroom_needed_cnt--; 135 sdata->crypto_tx_tailroom_needed_cnt--;
137 136
138 WARN_ON((key->conf.flags & IEEE80211_KEY_FLAG_PUT_IV_SPACE) && 137 WARN_ON((key->conf.flags & IEEE80211_KEY_FLAG_PUT_IV_SPACE) &&
@@ -180,9 +179,7 @@ static void ieee80211_key_disable_hw_accel(struct ieee80211_key *key)
180 sta = key->sta; 179 sta = key->sta;
181 sdata = key->sdata; 180 sdata = key->sdata;
182 181
183 if (!((key->conf.flags & IEEE80211_KEY_FLAG_GENERATE_MMIC) || 182 if (!(key->conf.flags & IEEE80211_KEY_FLAG_GENERATE_MMIC))
184 (key->conf.flags & IEEE80211_KEY_FLAG_GENERATE_IV) ||
185 (key->conf.flags & IEEE80211_KEY_FLAG_PUT_IV_SPACE)))
186 increment_tailroom_need_count(sdata); 183 increment_tailroom_need_count(sdata);
187 184
188 ret = drv_set_key(key->local, DISABLE_KEY, sdata, 185 ret = drv_set_key(key->local, DISABLE_KEY, sdata,
@@ -425,7 +422,7 @@ static void ieee80211_key_free_common(struct ieee80211_key *key)
425 ieee80211_aes_key_free(key->u.ccmp.tfm); 422 ieee80211_aes_key_free(key->u.ccmp.tfm);
426 if (key->conf.cipher == WLAN_CIPHER_SUITE_AES_CMAC) 423 if (key->conf.cipher == WLAN_CIPHER_SUITE_AES_CMAC)
427 ieee80211_aes_cmac_key_free(key->u.aes_cmac.tfm); 424 ieee80211_aes_cmac_key_free(key->u.aes_cmac.tfm);
428 kfree(key); 425 kzfree(key);
429} 426}
430 427
431static void __ieee80211_key_destroy(struct ieee80211_key *key, 428static void __ieee80211_key_destroy(struct ieee80211_key *key,
@@ -878,9 +875,7 @@ void ieee80211_remove_key(struct ieee80211_key_conf *keyconf)
878 if (key->flags & KEY_FLAG_UPLOADED_TO_HARDWARE) { 875 if (key->flags & KEY_FLAG_UPLOADED_TO_HARDWARE) {
879 key->flags &= ~KEY_FLAG_UPLOADED_TO_HARDWARE; 876 key->flags &= ~KEY_FLAG_UPLOADED_TO_HARDWARE;
880 877
881 if (!((key->conf.flags & IEEE80211_KEY_FLAG_GENERATE_MMIC) || 878 if (!(key->conf.flags & IEEE80211_KEY_FLAG_GENERATE_MMIC))
882 (key->conf.flags & IEEE80211_KEY_FLAG_GENERATE_IV) ||
883 (key->conf.flags & IEEE80211_KEY_FLAG_PUT_IV_SPACE)))
884 increment_tailroom_need_count(key->sdata); 879 increment_tailroom_need_count(key->sdata);
885 } 880 }
886 881
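
Editorial note: the key.c diff also switches the final kfree(key) to kzfree(key) so the key material is cleared before the allocation is returned. A rough user-space equivalent is sketched below; it writes the zeroes through a volatile pointer because a plain memset() immediately before free() may be removed by dead-store elimination (explicit_bzero() or memset_s() serve the same purpose where available). The zfree() helper is an invention of the sketch.

/*
 * Wipe-then-free for secret material, analogous to kzfree() above.
 */
#include <stdlib.h>
#include <string.h>
#include <stdio.h>

static void zfree(void *p, size_t len)
{
	volatile unsigned char *v = p;

	if (!p)
		return;
	while (len--)
		*v++ = 0;       /* volatile stores: the compiler must keep them */
	free(p);
}

int main(void)
{
	unsigned char *key = malloc(16);

	if (!key)
		return 1;
	memcpy(key, "super secret key", 16);
	/* ... use the key ... */
	zfree(key, 16);         /* instead of a bare free(key) */
	puts("key wiped and freed");
	return 0;
}
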
diff --git a/net/mac80211/main.c b/net/mac80211/main.c
index e0ab4320a078..0de7c93bf62b 100644
--- a/net/mac80211/main.c
+++ b/net/mac80211/main.c
@@ -2,6 +2,7 @@
2 * Copyright 2002-2005, Instant802 Networks, Inc. 2 * Copyright 2002-2005, Instant802 Networks, Inc.
3 * Copyright 2005-2006, Devicescape Software, Inc. 3 * Copyright 2005-2006, Devicescape Software, Inc.
4 * Copyright 2006-2007 Jiri Benc <jbenc@suse.cz> 4 * Copyright 2006-2007 Jiri Benc <jbenc@suse.cz>
5 * Copyright 2013-2014 Intel Mobile Communications GmbH
5 * 6 *
6 * This program is free software; you can redistribute it and/or modify 7 * This program is free software; you can redistribute it and/or modify
7 * it under the terms of the GNU General Public License version 2 as 8 * it under the terms of the GNU General Public License version 2 as
diff --git a/net/mac80211/mesh_pathtbl.c b/net/mac80211/mesh_pathtbl.c
index cf032a8db9d7..a6699dceae7c 100644
--- a/net/mac80211/mesh_pathtbl.c
+++ b/net/mac80211/mesh_pathtbl.c
@@ -729,7 +729,7 @@ void mesh_plink_broken(struct sta_info *sta)
729 tbl = rcu_dereference(mesh_paths); 729 tbl = rcu_dereference(mesh_paths);
730 for_each_mesh_entry(tbl, node, i) { 730 for_each_mesh_entry(tbl, node, i) {
731 mpath = node->mpath; 731 mpath = node->mpath;
732 if (rcu_dereference(mpath->next_hop) == sta && 732 if (rcu_access_pointer(mpath->next_hop) == sta &&
733 mpath->flags & MESH_PATH_ACTIVE && 733 mpath->flags & MESH_PATH_ACTIVE &&
734 !(mpath->flags & MESH_PATH_FIXED)) { 734 !(mpath->flags & MESH_PATH_FIXED)) {
735 spin_lock_bh(&mpath->state_lock); 735 spin_lock_bh(&mpath->state_lock);
@@ -794,7 +794,7 @@ void mesh_path_flush_by_nexthop(struct sta_info *sta)
794 tbl = resize_dereference_mesh_paths(); 794 tbl = resize_dereference_mesh_paths();
795 for_each_mesh_entry(tbl, node, i) { 795 for_each_mesh_entry(tbl, node, i) {
796 mpath = node->mpath; 796 mpath = node->mpath;
797 if (rcu_dereference(mpath->next_hop) == sta) { 797 if (rcu_access_pointer(mpath->next_hop) == sta) {
798 spin_lock(&tbl->hashwlock[i]); 798 spin_lock(&tbl->hashwlock[i]);
799 __mesh_path_del(tbl, node); 799 __mesh_path_del(tbl, node);
800 spin_unlock(&tbl->hashwlock[i]); 800 spin_unlock(&tbl->hashwlock[i]);
diff --git a/net/mac80211/mesh_plink.c b/net/mac80211/mesh_plink.c
index c47194d27149..b488e1859b18 100644
--- a/net/mac80211/mesh_plink.c
+++ b/net/mac80211/mesh_plink.c
@@ -431,14 +431,12 @@ __mesh_sta_info_alloc(struct ieee80211_sub_if_data *sdata, u8 *hw_addr)
431 return NULL; 431 return NULL;
432 432
433 sta->plink_state = NL80211_PLINK_LISTEN; 433 sta->plink_state = NL80211_PLINK_LISTEN;
434 sta->sta.wme = true;
434 435
435 sta_info_pre_move_state(sta, IEEE80211_STA_AUTH); 436 sta_info_pre_move_state(sta, IEEE80211_STA_AUTH);
436 sta_info_pre_move_state(sta, IEEE80211_STA_ASSOC); 437 sta_info_pre_move_state(sta, IEEE80211_STA_ASSOC);
437 sta_info_pre_move_state(sta, IEEE80211_STA_AUTHORIZED); 438 sta_info_pre_move_state(sta, IEEE80211_STA_AUTHORIZED);
438 439
439 set_sta_flag(sta, WLAN_STA_WME);
440 sta->sta.wme = true;
441
442 return sta; 440 return sta;
443} 441}
444 442
@@ -1004,7 +1002,6 @@ mesh_process_plink_frame(struct ieee80211_sub_if_data *sdata,
1004 enum ieee80211_self_protected_actioncode ftype; 1002 enum ieee80211_self_protected_actioncode ftype;
1005 u32 changed = 0; 1003 u32 changed = 0;
1006 u8 ie_len = elems->peering_len; 1004 u8 ie_len = elems->peering_len;
1007 __le16 _plid, _llid;
1008 u16 plid, llid = 0; 1005 u16 plid, llid = 0;
1009 1006
1010 if (!elems->peering) { 1007 if (!elems->peering) {
@@ -1039,13 +1036,10 @@ mesh_process_plink_frame(struct ieee80211_sub_if_data *sdata,
1039 /* Note the lines below are correct, the llid in the frame is the plid 1036 /* Note the lines below are correct, the llid in the frame is the plid
1040 * from the point of view of this host. 1037 * from the point of view of this host.
1041 */ 1038 */
1042 memcpy(&_plid, PLINK_GET_LLID(elems->peering), sizeof(__le16)); 1039 plid = get_unaligned_le16(PLINK_GET_LLID(elems->peering));
1043 plid = le16_to_cpu(_plid);
1044 if (ftype == WLAN_SP_MESH_PEERING_CONFIRM || 1040 if (ftype == WLAN_SP_MESH_PEERING_CONFIRM ||
1045 (ftype == WLAN_SP_MESH_PEERING_CLOSE && ie_len == 8)) { 1041 (ftype == WLAN_SP_MESH_PEERING_CLOSE && ie_len == 8))
1046 memcpy(&_llid, PLINK_GET_PLID(elems->peering), sizeof(__le16)); 1042 llid = get_unaligned_le16(PLINK_GET_PLID(elems->peering));
1047 llid = le16_to_cpu(_llid);
1048 }
1049 1043
1050 /* WARNING: Only for sta pointer, is dropped & re-acquired */ 1044 /* WARNING: Only for sta pointer, is dropped & re-acquired */
1051 rcu_read_lock(); 1045 rcu_read_lock();
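
Note on the hunk above: the removed memcpy-into-__le16 plus le16_to_cpu pattern and the get_unaligned_le16() call that replaces it both produce the same host-order value from a possibly unaligned little-endian field. A minimal userspace sketch of that byte handling (demo_get_unaligned_le16() and the buffer contents are illustrative stand-ins, not the kernel helper):

#include <stdint.h>
#include <stdio.h>

/* Illustrative stand-in for the kernel's get_unaligned_le16():
 * assemble a host-order u16 from two little-endian bytes at any
 * (possibly unaligned) address.
 */
static uint16_t demo_get_unaligned_le16(const void *p)
{
	const uint8_t *b = p;

	return (uint16_t)(b[0] | (b[1] << 8));
}

int main(void)
{
	/* Hypothetical peering IE payload; 0x1234 stored little-endian. */
	uint8_t buf[] = { 0x34, 0x12 };

	printf("plid = 0x%04x\n", demo_get_unaligned_le16(buf)); /* 0x1234 */
	return 0;
}
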
diff --git a/net/mac80211/mlme.c b/net/mac80211/mlme.c
index b82a12a9f0f1..2de88704278b 100644
--- a/net/mac80211/mlme.c
+++ b/net/mac80211/mlme.c
@@ -5,6 +5,7 @@
5 * Copyright 2005, Devicescape Software, Inc. 5 * Copyright 2005, Devicescape Software, Inc.
6 * Copyright 2006-2007 Jiri Benc <jbenc@suse.cz> 6 * Copyright 2006-2007 Jiri Benc <jbenc@suse.cz>
7 * Copyright 2007, Michael Wu <flamingice@sourmilk.net> 7 * Copyright 2007, Michael Wu <flamingice@sourmilk.net>
8 * Copyright 2013-2014 Intel Mobile Communications GmbH
8 * 9 *
9 * This program is free software; you can redistribute it and/or modify 10 * This program is free software; you can redistribute it and/or modify
10 * it under the terms of the GNU General Public License version 2 as 11 * it under the terms of the GNU General Public License version 2 as
@@ -149,6 +150,7 @@ static u32
149ieee80211_determine_chantype(struct ieee80211_sub_if_data *sdata, 150ieee80211_determine_chantype(struct ieee80211_sub_if_data *sdata,
150 struct ieee80211_supported_band *sband, 151 struct ieee80211_supported_band *sband,
151 struct ieee80211_channel *channel, 152 struct ieee80211_channel *channel,
153 const struct ieee80211_ht_cap *ht_cap,
152 const struct ieee80211_ht_operation *ht_oper, 154 const struct ieee80211_ht_operation *ht_oper,
153 const struct ieee80211_vht_operation *vht_oper, 155 const struct ieee80211_vht_operation *vht_oper,
154 struct cfg80211_chan_def *chandef, bool tracking) 156 struct cfg80211_chan_def *chandef, bool tracking)
@@ -162,13 +164,19 @@ ieee80211_determine_chantype(struct ieee80211_sub_if_data *sdata,
162 chandef->center_freq1 = channel->center_freq; 164 chandef->center_freq1 = channel->center_freq;
163 chandef->center_freq2 = 0; 165 chandef->center_freq2 = 0;
164 166
165 if (!ht_oper || !sband->ht_cap.ht_supported) { 167 if (!ht_cap || !ht_oper || !sband->ht_cap.ht_supported) {
166 ret = IEEE80211_STA_DISABLE_HT | IEEE80211_STA_DISABLE_VHT; 168 ret = IEEE80211_STA_DISABLE_HT | IEEE80211_STA_DISABLE_VHT;
167 goto out; 169 goto out;
168 } 170 }
169 171
170 chandef->width = NL80211_CHAN_WIDTH_20; 172 chandef->width = NL80211_CHAN_WIDTH_20;
171 173
174 if (!(ht_cap->cap_info &
175 cpu_to_le16(IEEE80211_HT_CAP_SUP_WIDTH_20_40))) {
176 ret = IEEE80211_STA_DISABLE_40MHZ;
177 goto out;
178 }
179
172 ht_cfreq = ieee80211_channel_to_frequency(ht_oper->primary_chan, 180 ht_cfreq = ieee80211_channel_to_frequency(ht_oper->primary_chan,
173 channel->band); 181 channel->band);
174 /* check that channel matches the right operating channel */ 182 /* check that channel matches the right operating channel */
@@ -328,6 +336,7 @@ out:
328 336
329static int ieee80211_config_bw(struct ieee80211_sub_if_data *sdata, 337static int ieee80211_config_bw(struct ieee80211_sub_if_data *sdata,
330 struct sta_info *sta, 338 struct sta_info *sta,
339 const struct ieee80211_ht_cap *ht_cap,
331 const struct ieee80211_ht_operation *ht_oper, 340 const struct ieee80211_ht_operation *ht_oper,
332 const struct ieee80211_vht_operation *vht_oper, 341 const struct ieee80211_vht_operation *vht_oper,
333 const u8 *bssid, u32 *changed) 342 const u8 *bssid, u32 *changed)
@@ -367,8 +376,9 @@ static int ieee80211_config_bw(struct ieee80211_sub_if_data *sdata,
367 sband = local->hw.wiphy->bands[chan->band]; 376 sband = local->hw.wiphy->bands[chan->band];
368 377
369 /* calculate new channel (type) based on HT/VHT operation IEs */ 378 /* calculate new channel (type) based on HT/VHT operation IEs */
370 flags = ieee80211_determine_chantype(sdata, sband, chan, ht_oper, 379 flags = ieee80211_determine_chantype(sdata, sband, chan,
371 vht_oper, &chandef, true); 380 ht_cap, ht_oper, vht_oper,
381 &chandef, true);
372 382
373 /* 383 /*
374 * Downgrade the new channel if we associated with restricted 384 * Downgrade the new channel if we associated with restricted
@@ -663,6 +673,9 @@ static void ieee80211_send_assoc(struct ieee80211_sub_if_data *sdata)
663 (local->hw.flags & IEEE80211_HW_SPECTRUM_MGMT)) 673 (local->hw.flags & IEEE80211_HW_SPECTRUM_MGMT))
664 capab |= WLAN_CAPABILITY_SPECTRUM_MGMT; 674 capab |= WLAN_CAPABILITY_SPECTRUM_MGMT;
665 675
676 if (ifmgd->flags & IEEE80211_STA_ENABLE_RRM)
677 capab |= WLAN_CAPABILITY_RADIO_MEASURE;
678
666 mgmt = (struct ieee80211_mgmt *) skb_put(skb, 24); 679 mgmt = (struct ieee80211_mgmt *) skb_put(skb, 24);
667 memset(mgmt, 0, 24); 680 memset(mgmt, 0, 24);
668 memcpy(mgmt->da, assoc_data->bss->bssid, ETH_ALEN); 681 memcpy(mgmt->da, assoc_data->bss->bssid, ETH_ALEN);
@@ -728,16 +741,17 @@ static void ieee80211_send_assoc(struct ieee80211_sub_if_data *sdata)
728 } 741 }
729 } 742 }
730 743
731 if (capab & WLAN_CAPABILITY_SPECTRUM_MGMT) { 744 if (capab & WLAN_CAPABILITY_SPECTRUM_MGMT ||
732 /* 1. power capabilities */ 745 capab & WLAN_CAPABILITY_RADIO_MEASURE) {
733 pos = skb_put(skb, 4); 746 pos = skb_put(skb, 4);
734 *pos++ = WLAN_EID_PWR_CAPABILITY; 747 *pos++ = WLAN_EID_PWR_CAPABILITY;
735 *pos++ = 2; 748 *pos++ = 2;
736 *pos++ = 0; /* min tx power */ 749 *pos++ = 0; /* min tx power */
737 /* max tx power */ 750 /* max tx power */
738 *pos++ = ieee80211_chandef_max_power(&chanctx_conf->def); 751 *pos++ = ieee80211_chandef_max_power(&chanctx_conf->def);
752 }
739 753
740 /* 2. supported channels */ 754 if (capab & WLAN_CAPABILITY_SPECTRUM_MGMT) {
741 /* TODO: get this in reg domain format */ 755 /* TODO: get this in reg domain format */
742 pos = skb_put(skb, 2 * sband->n_channels + 2); 756 pos = skb_put(skb, 2 * sband->n_channels + 2);
743 *pos++ = WLAN_EID_SUPPORTED_CHANNELS; 757 *pos++ = WLAN_EID_SUPPORTED_CHANNELS;
@@ -1157,19 +1171,21 @@ ieee80211_sta_process_chanswitch(struct ieee80211_sub_if_data *sdata,
1157 TU_TO_EXP_TIME(csa_ie.count * cbss->beacon_interval)); 1171 TU_TO_EXP_TIME(csa_ie.count * cbss->beacon_interval));
1158} 1172}
1159 1173
1160static u32 ieee80211_handle_pwr_constr(struct ieee80211_sub_if_data *sdata, 1174static bool
1161 struct ieee80211_channel *channel, 1175ieee80211_find_80211h_pwr_constr(struct ieee80211_sub_if_data *sdata,
1162 const u8 *country_ie, u8 country_ie_len, 1176 struct ieee80211_channel *channel,
1163 const u8 *pwr_constr_elem) 1177 const u8 *country_ie, u8 country_ie_len,
1178 const u8 *pwr_constr_elem,
1179 int *chan_pwr, int *pwr_reduction)
1164{ 1180{
1165 struct ieee80211_country_ie_triplet *triplet; 1181 struct ieee80211_country_ie_triplet *triplet;
1166 int chan = ieee80211_frequency_to_channel(channel->center_freq); 1182 int chan = ieee80211_frequency_to_channel(channel->center_freq);
1167 int i, chan_pwr, chan_increment, new_ap_level; 1183 int i, chan_increment;
1168 bool have_chan_pwr = false; 1184 bool have_chan_pwr = false;
1169 1185
1170 /* Invalid IE */ 1186 /* Invalid IE */
1171 if (country_ie_len % 2 || country_ie_len < IEEE80211_COUNTRY_IE_MIN_LEN) 1187 if (country_ie_len % 2 || country_ie_len < IEEE80211_COUNTRY_IE_MIN_LEN)
1172 return 0; 1188 return false;
1173 1189
1174 triplet = (void *)(country_ie + 3); 1190 triplet = (void *)(country_ie + 3);
1175 country_ie_len -= 3; 1191 country_ie_len -= 3;
@@ -1197,7 +1213,7 @@ static u32 ieee80211_handle_pwr_constr(struct ieee80211_sub_if_data *sdata,
1197 for (i = 0; i < triplet->chans.num_channels; i++) { 1213 for (i = 0; i < triplet->chans.num_channels; i++) {
1198 if (first_channel + i * chan_increment == chan) { 1214 if (first_channel + i * chan_increment == chan) {
1199 have_chan_pwr = true; 1215 have_chan_pwr = true;
1200 chan_pwr = triplet->chans.max_power; 1216 *chan_pwr = triplet->chans.max_power;
1201 break; 1217 break;
1202 } 1218 }
1203 } 1219 }
@@ -1209,18 +1225,76 @@ static u32 ieee80211_handle_pwr_constr(struct ieee80211_sub_if_data *sdata,
1209 country_ie_len -= 3; 1225 country_ie_len -= 3;
1210 } 1226 }
1211 1227
1212 if (!have_chan_pwr) 1228 if (have_chan_pwr)
1229 *pwr_reduction = *pwr_constr_elem;
1230 return have_chan_pwr;
1231}
1232
1233static void ieee80211_find_cisco_dtpc(struct ieee80211_sub_if_data *sdata,
1234 struct ieee80211_channel *channel,
1235 const u8 *cisco_dtpc_ie,
1236 int *pwr_level)
1237{
1238 /* From practical testing, the first data byte of the DTPC element
1239 * seems to contain the requested dBm level, and the CLI on Cisco
 1240 * APs clearly states the range is -127 to 127 dBm, which indicates
1241 * a signed byte, although it seemingly never actually goes negative.
1242 * The other byte seems to always be zero.
1243 */
1244 *pwr_level = (__s8)cisco_dtpc_ie[4];
1245}
1246
1247static u32 ieee80211_handle_pwr_constr(struct ieee80211_sub_if_data *sdata,
1248 struct ieee80211_channel *channel,
1249 struct ieee80211_mgmt *mgmt,
1250 const u8 *country_ie, u8 country_ie_len,
1251 const u8 *pwr_constr_ie,
1252 const u8 *cisco_dtpc_ie)
1253{
1254 bool has_80211h_pwr = false, has_cisco_pwr = false;
1255 int chan_pwr = 0, pwr_reduction_80211h = 0;
1256 int pwr_level_cisco, pwr_level_80211h;
1257 int new_ap_level;
1258
1259 if (country_ie && pwr_constr_ie &&
1260 mgmt->u.probe_resp.capab_info &
1261 cpu_to_le16(WLAN_CAPABILITY_SPECTRUM_MGMT)) {
1262 has_80211h_pwr = ieee80211_find_80211h_pwr_constr(
1263 sdata, channel, country_ie, country_ie_len,
1264 pwr_constr_ie, &chan_pwr, &pwr_reduction_80211h);
1265 pwr_level_80211h =
1266 max_t(int, 0, chan_pwr - pwr_reduction_80211h);
1267 }
1268
1269 if (cisco_dtpc_ie) {
1270 ieee80211_find_cisco_dtpc(
1271 sdata, channel, cisco_dtpc_ie, &pwr_level_cisco);
1272 has_cisco_pwr = true;
1273 }
1274
1275 if (!has_80211h_pwr && !has_cisco_pwr)
1213 return 0; 1276 return 0;
1214 1277
1215 new_ap_level = max_t(int, 0, chan_pwr - *pwr_constr_elem); 1278 /* If we have both 802.11h and Cisco DTPC, apply both limits
1279 * by picking the smallest of the two power levels advertised.
1280 */
1281 if (has_80211h_pwr &&
1282 (!has_cisco_pwr || pwr_level_80211h <= pwr_level_cisco)) {
1283 sdata_info(sdata,
1284 "Limiting TX power to %d (%d - %d) dBm as advertised by %pM\n",
1285 pwr_level_80211h, chan_pwr, pwr_reduction_80211h,
1286 sdata->u.mgd.bssid);
1287 new_ap_level = pwr_level_80211h;
1288 } else { /* has_cisco_pwr is always true here. */
1289 sdata_info(sdata,
1290 "Limiting TX power to %d dBm as advertised by %pM\n",
1291 pwr_level_cisco, sdata->u.mgd.bssid);
1292 new_ap_level = pwr_level_cisco;
1293 }
1216 1294
1217 if (sdata->ap_power_level == new_ap_level) 1295 if (sdata->ap_power_level == new_ap_level)
1218 return 0; 1296 return 0;
1219 1297
1220 sdata_info(sdata,
1221 "Limiting TX power to %d (%d - %d) dBm as advertised by %pM\n",
1222 new_ap_level, chan_pwr, *pwr_constr_elem,
1223 sdata->u.mgd.bssid);
1224 sdata->ap_power_level = new_ap_level; 1298 sdata->ap_power_level = new_ap_level;
1225 if (__ieee80211_recalc_txpower(sdata)) 1299 if (__ieee80211_recalc_txpower(sdata))
1226 return BSS_CHANGED_TXPOWER; 1300 return BSS_CHANGED_TXPOWER;
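
Worked example for the power-limit selection above, with purely illustrative numbers: if the country IE advertises chan_pwr = 20 dBm, the 802.11h power constraint element asks for a 3 dB reduction, and a Cisco DTPC element requests 15 dBm, then pwr_level_80211h = max(0, 20 - 3) = 17 dBm, pwr_level_cisco = 15 dBm, and the smaller of the two wins, so new_ap_level = 15 dBm. A standalone sketch of that arithmetic (all names and values hypothetical, mirroring the mlme.c logic):

#include <stdbool.h>
#include <stdio.h>

static int max_int(int a, int b) { return a > b ? a : b; }

int main(void)
{
	/* Hypothetical advertised values. */
	int chan_pwr = 20;              /* from country IE, dBm */
	int pwr_reduction_80211h = 3;   /* 802.11h power constraint, dB */
	int pwr_level_cisco = 15;       /* Cisco DTPC request, dBm */
	bool has_80211h_pwr = true, has_cisco_pwr = true;

	int pwr_level_80211h = max_int(0, chan_pwr - pwr_reduction_80211h);
	int new_ap_level;

	/* Apply both limits by picking the smaller advertised level. */
	if (has_80211h_pwr &&
	    (!has_cisco_pwr || pwr_level_80211h <= pwr_level_cisco))
		new_ap_level = pwr_level_80211h;
	else
		new_ap_level = pwr_level_cisco;

	printf("new_ap_level = %d dBm\n", new_ap_level); /* prints 15 */
	return 0;
}
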
@@ -2677,8 +2751,7 @@ static bool ieee80211_assoc_success(struct ieee80211_sub_if_data *sdata,
2677 if (ifmgd->flags & IEEE80211_STA_MFP_ENABLED) 2751 if (ifmgd->flags & IEEE80211_STA_MFP_ENABLED)
2678 set_sta_flag(sta, WLAN_STA_MFP); 2752 set_sta_flag(sta, WLAN_STA_MFP);
2679 2753
2680 if (elems.wmm_param) 2754 sta->sta.wme = elems.wmm_param;
2681 set_sta_flag(sta, WLAN_STA_WME);
2682 2755
2683 err = sta_info_move_state(sta, IEEE80211_STA_ASSOC); 2756 err = sta_info_move_state(sta, IEEE80211_STA_ASSOC);
2684 if (!err && !(ifmgd->flags & IEEE80211_STA_CONTROL_PORT)) 2757 if (!err && !(ifmgd->flags & IEEE80211_STA_CONTROL_PORT))
@@ -2744,6 +2817,7 @@ static void ieee80211_rx_mgmt_assoc_resp(struct ieee80211_sub_if_data *sdata,
2744 struct ieee80211_mgd_assoc_data *assoc_data = ifmgd->assoc_data; 2817 struct ieee80211_mgd_assoc_data *assoc_data = ifmgd->assoc_data;
2745 u16 capab_info, status_code, aid; 2818 u16 capab_info, status_code, aid;
2746 struct ieee802_11_elems elems; 2819 struct ieee802_11_elems elems;
2820 int ac, uapsd_queues = -1;
2747 u8 *pos; 2821 u8 *pos;
2748 bool reassoc; 2822 bool reassoc;
2749 struct cfg80211_bss *bss; 2823 struct cfg80211_bss *bss;
@@ -2813,9 +2887,15 @@ static void ieee80211_rx_mgmt_assoc_resp(struct ieee80211_sub_if_data *sdata,
2813 * is set can cause the interface to go idle 2887 * is set can cause the interface to go idle
2814 */ 2888 */
2815 ieee80211_destroy_assoc_data(sdata, true); 2889 ieee80211_destroy_assoc_data(sdata, true);
2890
2891 /* get uapsd queues configuration */
2892 uapsd_queues = 0;
2893 for (ac = 0; ac < IEEE80211_NUM_ACS; ac++)
2894 if (sdata->tx_conf[ac].uapsd)
2895 uapsd_queues |= BIT(ac);
2816 } 2896 }
2817 2897
2818 cfg80211_rx_assoc_resp(sdata->dev, bss, (u8 *)mgmt, len); 2898 cfg80211_rx_assoc_resp(sdata->dev, bss, (u8 *)mgmt, len, uapsd_queues);
2819} 2899}
2820 2900
2821static void ieee80211_rx_bss_info(struct ieee80211_sub_if_data *sdata, 2901static void ieee80211_rx_bss_info(struct ieee80211_sub_if_data *sdata,
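
Note on the uapsd_queues value built above: it is a per-AC bitmap that now gets passed to cfg80211_rx_assoc_resp(). Assuming the usual mac80211 AC numbering (VO=0, VI=1, BE=2, BK=3), a small sketch of how the bitmap comes out for a hypothetical configuration (DEMO_* names are illustrative, not kernel definitions):

#include <stdbool.h>
#include <stdio.h>

#define DEMO_NUM_ACS 4
#define DEMO_BIT(n) (1U << (n))

int main(void)
{
	/* Hypothetical per-AC U-APSD settings: VO and VI enabled. */
	bool uapsd[DEMO_NUM_ACS] = { true, true, false, false };
	unsigned int uapsd_queues = 0;
	int ac;

	for (ac = 0; ac < DEMO_NUM_ACS; ac++)
		if (uapsd[ac])
			uapsd_queues |= DEMO_BIT(ac);

	printf("uapsd_queues = 0x%x\n", uapsd_queues); /* 0x3: VO | VI */
	return 0;
}
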
@@ -2885,7 +2965,9 @@ static void ieee80211_rx_mgmt_probe_resp(struct ieee80211_sub_if_data *sdata,
2885/* 2965/*
2886 * This is the canonical list of information elements we care about, 2966 * This is the canonical list of information elements we care about,
2887 * the filter code also gives us all changes to the Microsoft OUI 2967 * the filter code also gives us all changes to the Microsoft OUI
2888 * (00:50:F2) vendor IE which is used for WMM which we need to track. 2968 * (00:50:F2) vendor IE which is used for WMM which we need to track,
2969 * as well as the DTPC IE (part of the Cisco OUI) used for signaling
2970 * changes to requested client power.
2889 * 2971 *
2890 * We implement beacon filtering in software since that means we can 2972 * We implement beacon filtering in software since that means we can
2891 * avoid processing the frame here and in cfg80211, and userspace 2973 * avoid processing the frame here and in cfg80211, and userspace
@@ -3174,7 +3256,8 @@ static void ieee80211_rx_mgmt_beacon(struct ieee80211_sub_if_data *sdata,
3174 mutex_lock(&local->sta_mtx); 3256 mutex_lock(&local->sta_mtx);
3175 sta = sta_info_get(sdata, bssid); 3257 sta = sta_info_get(sdata, bssid);
3176 3258
3177 if (ieee80211_config_bw(sdata, sta, elems.ht_operation, 3259 if (ieee80211_config_bw(sdata, sta,
3260 elems.ht_cap_elem, elems.ht_operation,
3178 elems.vht_operation, bssid, &changed)) { 3261 elems.vht_operation, bssid, &changed)) {
3179 mutex_unlock(&local->sta_mtx); 3262 mutex_unlock(&local->sta_mtx);
3180 ieee80211_set_disassoc(sdata, IEEE80211_STYPE_DEAUTH, 3263 ieee80211_set_disassoc(sdata, IEEE80211_STYPE_DEAUTH,
@@ -3190,13 +3273,11 @@ static void ieee80211_rx_mgmt_beacon(struct ieee80211_sub_if_data *sdata,
3190 rx_status->band, true); 3273 rx_status->band, true);
3191 mutex_unlock(&local->sta_mtx); 3274 mutex_unlock(&local->sta_mtx);
3192 3275
3193 if (elems.country_elem && elems.pwr_constr_elem && 3276 changed |= ieee80211_handle_pwr_constr(sdata, chan, mgmt,
3194 mgmt->u.probe_resp.capab_info & 3277 elems.country_elem,
3195 cpu_to_le16(WLAN_CAPABILITY_SPECTRUM_MGMT)) 3278 elems.country_elem_len,
3196 changed |= ieee80211_handle_pwr_constr(sdata, chan, 3279 elems.pwr_constr_elem,
3197 elems.country_elem, 3280 elems.cisco_dtpc_elem);
3198 elems.country_elem_len,
3199 elems.pwr_constr_elem);
3200 3281
3201 ieee80211_bss_info_change_notify(sdata, changed); 3282 ieee80211_bss_info_change_notify(sdata, changed);
3202} 3283}
@@ -3724,7 +3805,7 @@ void ieee80211_sta_setup_sdata(struct ieee80211_sub_if_data *sdata)
3724 ifmgd->uapsd_max_sp_len = sdata->local->hw.uapsd_max_sp_len; 3805 ifmgd->uapsd_max_sp_len = sdata->local->hw.uapsd_max_sp_len;
3725 ifmgd->p2p_noa_index = -1; 3806 ifmgd->p2p_noa_index = -1;
3726 3807
3727 if (sdata->local->hw.flags & IEEE80211_HW_SUPPORTS_DYNAMIC_SMPS) 3808 if (sdata->local->hw.wiphy->features & NL80211_FEATURE_DYNAMIC_SMPS)
3728 ifmgd->req_smps = IEEE80211_SMPS_AUTOMATIC; 3809 ifmgd->req_smps = IEEE80211_SMPS_AUTOMATIC;
3729 else 3810 else
3730 ifmgd->req_smps = IEEE80211_SMPS_OFF; 3811 ifmgd->req_smps = IEEE80211_SMPS_OFF;
@@ -3808,6 +3889,7 @@ static int ieee80211_prep_channel(struct ieee80211_sub_if_data *sdata,
3808{ 3889{
3809 struct ieee80211_local *local = sdata->local; 3890 struct ieee80211_local *local = sdata->local;
3810 struct ieee80211_if_managed *ifmgd = &sdata->u.mgd; 3891 struct ieee80211_if_managed *ifmgd = &sdata->u.mgd;
3892 const struct ieee80211_ht_cap *ht_cap = NULL;
3811 const struct ieee80211_ht_operation *ht_oper = NULL; 3893 const struct ieee80211_ht_operation *ht_oper = NULL;
3812 const struct ieee80211_vht_operation *vht_oper = NULL; 3894 const struct ieee80211_vht_operation *vht_oper = NULL;
3813 struct ieee80211_supported_band *sband; 3895 struct ieee80211_supported_band *sband;
@@ -3824,14 +3906,17 @@ static int ieee80211_prep_channel(struct ieee80211_sub_if_data *sdata,
3824 3906
3825 if (!(ifmgd->flags & IEEE80211_STA_DISABLE_HT) && 3907 if (!(ifmgd->flags & IEEE80211_STA_DISABLE_HT) &&
3826 sband->ht_cap.ht_supported) { 3908 sband->ht_cap.ht_supported) {
3827 const u8 *ht_oper_ie, *ht_cap; 3909 const u8 *ht_oper_ie, *ht_cap_ie;
3828 3910
3829 ht_oper_ie = ieee80211_bss_get_ie(cbss, WLAN_EID_HT_OPERATION); 3911 ht_oper_ie = ieee80211_bss_get_ie(cbss, WLAN_EID_HT_OPERATION);
3830 if (ht_oper_ie && ht_oper_ie[1] >= sizeof(*ht_oper)) 3912 if (ht_oper_ie && ht_oper_ie[1] >= sizeof(*ht_oper))
3831 ht_oper = (void *)(ht_oper_ie + 2); 3913 ht_oper = (void *)(ht_oper_ie + 2);
3832 3914
3833 ht_cap = ieee80211_bss_get_ie(cbss, WLAN_EID_HT_CAPABILITY); 3915 ht_cap_ie = ieee80211_bss_get_ie(cbss, WLAN_EID_HT_CAPABILITY);
3834 if (!ht_cap || ht_cap[1] < sizeof(struct ieee80211_ht_cap)) { 3916 if (ht_cap_ie && ht_cap_ie[1] >= sizeof(*ht_cap))
3917 ht_cap = (void *)(ht_cap_ie + 2);
3918
3919 if (!ht_cap) {
3835 ifmgd->flags |= IEEE80211_STA_DISABLE_HT; 3920 ifmgd->flags |= IEEE80211_STA_DISABLE_HT;
3836 ht_oper = NULL; 3921 ht_oper = NULL;
3837 } 3922 }
@@ -3862,7 +3947,7 @@ static int ieee80211_prep_channel(struct ieee80211_sub_if_data *sdata,
3862 3947
3863 ifmgd->flags |= ieee80211_determine_chantype(sdata, sband, 3948 ifmgd->flags |= ieee80211_determine_chantype(sdata, sband,
3864 cbss->channel, 3949 cbss->channel,
3865 ht_oper, vht_oper, 3950 ht_cap, ht_oper, vht_oper,
3866 &chandef, false); 3951 &chandef, false);
3867 3952
3868 sdata->needed_rx_chains = min(ieee80211_ht_vht_rx_chains(sdata, cbss), 3953 sdata->needed_rx_chains = min(ieee80211_ht_vht_rx_chains(sdata, cbss),
@@ -4395,6 +4480,11 @@ int ieee80211_mgd_assoc(struct ieee80211_sub_if_data *sdata,
4395 ifmgd->flags &= ~IEEE80211_STA_MFP_ENABLED; 4480 ifmgd->flags &= ~IEEE80211_STA_MFP_ENABLED;
4396 } 4481 }
4397 4482
4483 if (req->flags & ASSOC_REQ_USE_RRM)
4484 ifmgd->flags |= IEEE80211_STA_ENABLE_RRM;
4485 else
4486 ifmgd->flags &= ~IEEE80211_STA_ENABLE_RRM;
4487
4398 if (req->crypto.control_port) 4488 if (req->crypto.control_port)
4399 ifmgd->flags |= IEEE80211_STA_CONTROL_PORT; 4489 ifmgd->flags |= IEEE80211_STA_CONTROL_PORT;
4400 else 4490 else
diff --git a/net/mac80211/rc80211_minstrel.c b/net/mac80211/rc80211_minstrel.c
index 1c1469c36dca..2baa7ed8789d 100644
--- a/net/mac80211/rc80211_minstrel.c
+++ b/net/mac80211/rc80211_minstrel.c
@@ -75,7 +75,7 @@ minstrel_sort_best_tp_rates(struct minstrel_sta_info *mi, int i, u8 *tp_list)
75{ 75{
76 int j = MAX_THR_RATES; 76 int j = MAX_THR_RATES;
77 77
78 while (j > 0 && mi->r[i].cur_tp > mi->r[tp_list[j - 1]].cur_tp) 78 while (j > 0 && mi->r[i].stats.cur_tp > mi->r[tp_list[j - 1]].stats.cur_tp)
79 j--; 79 j--;
80 if (j < MAX_THR_RATES - 1) 80 if (j < MAX_THR_RATES - 1)
81 memmove(&tp_list[j + 1], &tp_list[j], MAX_THR_RATES - (j + 1)); 81 memmove(&tp_list[j + 1], &tp_list[j], MAX_THR_RATES - (j + 1));
@@ -92,7 +92,7 @@ minstrel_set_rate(struct minstrel_sta_info *mi, struct ieee80211_sta_rates *rate
92 ratetbl->rate[offset].idx = r->rix; 92 ratetbl->rate[offset].idx = r->rix;
93 ratetbl->rate[offset].count = r->adjusted_retry_count; 93 ratetbl->rate[offset].count = r->adjusted_retry_count;
94 ratetbl->rate[offset].count_cts = r->retry_count_cts; 94 ratetbl->rate[offset].count_cts = r->retry_count_cts;
95 ratetbl->rate[offset].count_rts = r->retry_count_rtscts; 95 ratetbl->rate[offset].count_rts = r->stats.retry_count_rtscts;
96} 96}
97 97
98static void 98static void
@@ -140,44 +140,46 @@ minstrel_update_stats(struct minstrel_priv *mp, struct minstrel_sta_info *mi)
140 140
141 for (i = 0; i < mi->n_rates; i++) { 141 for (i = 0; i < mi->n_rates; i++) {
142 struct minstrel_rate *mr = &mi->r[i]; 142 struct minstrel_rate *mr = &mi->r[i];
143 struct minstrel_rate_stats *mrs = &mi->r[i].stats;
143 144
144 usecs = mr->perfect_tx_time; 145 usecs = mr->perfect_tx_time;
145 if (!usecs) 146 if (!usecs)
146 usecs = 1000000; 147 usecs = 1000000;
147 148
148 if (unlikely(mr->attempts > 0)) { 149 if (unlikely(mrs->attempts > 0)) {
149 mr->sample_skipped = 0; 150 mrs->sample_skipped = 0;
150 mr->cur_prob = MINSTREL_FRAC(mr->success, mr->attempts); 151 mrs->cur_prob = MINSTREL_FRAC(mrs->success,
151 mr->succ_hist += mr->success; 152 mrs->attempts);
152 mr->att_hist += mr->attempts; 153 mrs->succ_hist += mrs->success;
153 mr->probability = minstrel_ewma(mr->probability, 154 mrs->att_hist += mrs->attempts;
154 mr->cur_prob, 155 mrs->probability = minstrel_ewma(mrs->probability,
155 EWMA_LEVEL); 156 mrs->cur_prob,
157 EWMA_LEVEL);
156 } else 158 } else
157 mr->sample_skipped++; 159 mrs->sample_skipped++;
158 160
159 mr->last_success = mr->success; 161 mrs->last_success = mrs->success;
160 mr->last_attempts = mr->attempts; 162 mrs->last_attempts = mrs->attempts;
161 mr->success = 0; 163 mrs->success = 0;
162 mr->attempts = 0; 164 mrs->attempts = 0;
163 165
164 /* Update throughput per rate, reset thr. below 10% success */ 166 /* Update throughput per rate, reset thr. below 10% success */
165 if (mr->probability < MINSTREL_FRAC(10, 100)) 167 if (mrs->probability < MINSTREL_FRAC(10, 100))
166 mr->cur_tp = 0; 168 mrs->cur_tp = 0;
167 else 169 else
168 mr->cur_tp = mr->probability * (1000000 / usecs); 170 mrs->cur_tp = mrs->probability * (1000000 / usecs);
169 171
170 /* Sample less often below the 10% chance of success. 172 /* Sample less often below the 10% chance of success.
171 * Sample less often above the 95% chance of success. */ 173 * Sample less often above the 95% chance of success. */
172 if (mr->probability > MINSTREL_FRAC(95, 100) || 174 if (mrs->probability > MINSTREL_FRAC(95, 100) ||
173 mr->probability < MINSTREL_FRAC(10, 100)) { 175 mrs->probability < MINSTREL_FRAC(10, 100)) {
174 mr->adjusted_retry_count = mr->retry_count >> 1; 176 mr->adjusted_retry_count = mrs->retry_count >> 1;
175 if (mr->adjusted_retry_count > 2) 177 if (mr->adjusted_retry_count > 2)
176 mr->adjusted_retry_count = 2; 178 mr->adjusted_retry_count = 2;
177 mr->sample_limit = 4; 179 mr->sample_limit = 4;
178 } else { 180 } else {
179 mr->sample_limit = -1; 181 mr->sample_limit = -1;
180 mr->adjusted_retry_count = mr->retry_count; 182 mr->adjusted_retry_count = mrs->retry_count;
181 } 183 }
182 if (!mr->adjusted_retry_count) 184 if (!mr->adjusted_retry_count)
183 mr->adjusted_retry_count = 2; 185 mr->adjusted_retry_count = 2;
@@ -190,11 +192,11 @@ minstrel_update_stats(struct minstrel_priv *mp, struct minstrel_sta_info *mi)
190 * choose the maximum throughput rate as max_prob_rate 192 * choose the maximum throughput rate as max_prob_rate
191 * (2) if all success probabilities < 95%, the rate with 193 * (2) if all success probabilities < 95%, the rate with
 192 * highest success probability is chosen as max_prob_rate */ 194
193 if (mr->probability >= MINSTREL_FRAC(95, 100)) { 195 if (mrs->probability >= MINSTREL_FRAC(95, 100)) {
194 if (mr->cur_tp >= mi->r[tmp_prob_rate].cur_tp) 196 if (mrs->cur_tp >= mi->r[tmp_prob_rate].stats.cur_tp)
195 tmp_prob_rate = i; 197 tmp_prob_rate = i;
196 } else { 198 } else {
197 if (mr->probability >= mi->r[tmp_prob_rate].probability) 199 if (mrs->probability >= mi->r[tmp_prob_rate].stats.probability)
198 tmp_prob_rate = i; 200 tmp_prob_rate = i;
199 } 201 }
200 } 202 }
@@ -240,14 +242,14 @@ minstrel_tx_status(void *priv, struct ieee80211_supported_band *sband,
240 if (ndx < 0) 242 if (ndx < 0)
241 continue; 243 continue;
242 244
243 mi->r[ndx].attempts += ar[i].count; 245 mi->r[ndx].stats.attempts += ar[i].count;
244 246
245 if ((i != IEEE80211_TX_MAX_RATES - 1) && (ar[i + 1].idx < 0)) 247 if ((i != IEEE80211_TX_MAX_RATES - 1) && (ar[i + 1].idx < 0))
246 mi->r[ndx].success += success; 248 mi->r[ndx].stats.success += success;
247 } 249 }
248 250
249 if ((info->flags & IEEE80211_TX_CTL_RATE_CTRL_PROBE) && (i >= 0)) 251 if ((info->flags & IEEE80211_TX_CTL_RATE_CTRL_PROBE) && (i >= 0))
250 mi->sample_count++; 252 mi->sample_packets++;
251 253
252 if (mi->sample_deferred > 0) 254 if (mi->sample_deferred > 0)
253 mi->sample_deferred--; 255 mi->sample_deferred--;
@@ -265,7 +267,7 @@ minstrel_get_retry_count(struct minstrel_rate *mr,
265 unsigned int retry = mr->adjusted_retry_count; 267 unsigned int retry = mr->adjusted_retry_count;
266 268
267 if (info->control.use_rts) 269 if (info->control.use_rts)
268 retry = max(2U, min(mr->retry_count_rtscts, retry)); 270 retry = max(2U, min(mr->stats.retry_count_rtscts, retry));
269 else if (info->control.use_cts_prot) 271 else if (info->control.use_cts_prot)
270 retry = max(2U, min(mr->retry_count_cts, retry)); 272 retry = max(2U, min(mr->retry_count_cts, retry));
271 return retry; 273 return retry;
@@ -317,15 +319,15 @@ minstrel_get_rate(void *priv, struct ieee80211_sta *sta,
317 sampling_ratio = mp->lookaround_rate; 319 sampling_ratio = mp->lookaround_rate;
318 320
319 /* increase sum packet counter */ 321 /* increase sum packet counter */
320 mi->packet_count++; 322 mi->total_packets++;
321 323
322#ifdef CONFIG_MAC80211_DEBUGFS 324#ifdef CONFIG_MAC80211_DEBUGFS
323 if (mp->fixed_rate_idx != -1) 325 if (mp->fixed_rate_idx != -1)
324 return; 326 return;
325#endif 327#endif
326 328
327 delta = (mi->packet_count * sampling_ratio / 100) - 329 delta = (mi->total_packets * sampling_ratio / 100) -
328 (mi->sample_count + mi->sample_deferred / 2); 330 (mi->sample_packets + mi->sample_deferred / 2);
329 331
330 /* delta < 0: no sampling required */ 332 /* delta < 0: no sampling required */
331 prev_sample = mi->prev_sample; 333 prev_sample = mi->prev_sample;
@@ -333,10 +335,10 @@ minstrel_get_rate(void *priv, struct ieee80211_sta *sta,
333 if (delta < 0 || (!mrr_capable && prev_sample)) 335 if (delta < 0 || (!mrr_capable && prev_sample))
334 return; 336 return;
335 337
336 if (mi->packet_count >= 10000) { 338 if (mi->total_packets >= 10000) {
337 mi->sample_deferred = 0; 339 mi->sample_deferred = 0;
338 mi->sample_count = 0; 340 mi->sample_packets = 0;
339 mi->packet_count = 0; 341 mi->total_packets = 0;
340 } else if (delta > mi->n_rates * 2) { 342 } else if (delta > mi->n_rates * 2) {
341 /* With multi-rate retry, not every planned sample 343 /* With multi-rate retry, not every planned sample
342 * attempt actually gets used, due to the way the retry 344 * attempt actually gets used, due to the way the retry
@@ -347,7 +349,7 @@ minstrel_get_rate(void *priv, struct ieee80211_sta *sta,
347 * starts getting worse, minstrel would start bursting 349 * starts getting worse, minstrel would start bursting
348 * out lots of sampling frames, which would result 350 * out lots of sampling frames, which would result
349 * in a large throughput loss. */ 351 * in a large throughput loss. */
350 mi->sample_count += (delta - mi->n_rates * 2); 352 mi->sample_packets += (delta - mi->n_rates * 2);
351 } 353 }
352 354
353 /* get next random rate sample */ 355 /* get next random rate sample */
@@ -361,7 +363,7 @@ minstrel_get_rate(void *priv, struct ieee80211_sta *sta,
361 */ 363 */
362 if (mrr_capable && 364 if (mrr_capable &&
363 msr->perfect_tx_time > mr->perfect_tx_time && 365 msr->perfect_tx_time > mr->perfect_tx_time &&
364 msr->sample_skipped < 20) { 366 msr->stats.sample_skipped < 20) {
365 /* Only use IEEE80211_TX_CTL_RATE_CTRL_PROBE to mark 367 /* Only use IEEE80211_TX_CTL_RATE_CTRL_PROBE to mark
366 * packets that have the sampling rate deferred to the 368 * packets that have the sampling rate deferred to the
367 * second MRR stage. Increase the sample counter only 369 * second MRR stage. Increase the sample counter only
@@ -375,7 +377,7 @@ minstrel_get_rate(void *priv, struct ieee80211_sta *sta,
375 if (!msr->sample_limit != 0) 377 if (!msr->sample_limit != 0)
376 return; 378 return;
377 379
378 mi->sample_count++; 380 mi->sample_packets++;
379 if (msr->sample_limit > 0) 381 if (msr->sample_limit > 0)
380 msr->sample_limit--; 382 msr->sample_limit--;
381 } 383 }
@@ -384,7 +386,7 @@ minstrel_get_rate(void *priv, struct ieee80211_sta *sta,
384 * has a probability of >95%, we shouldn't be attempting 386 * has a probability of >95%, we shouldn't be attempting
385 * to use it, as this only wastes precious airtime */ 387 * to use it, as this only wastes precious airtime */
386 if (!mrr_capable && 388 if (!mrr_capable &&
387 (mi->r[ndx].probability > MINSTREL_FRAC(95, 100))) 389 (mi->r[ndx].stats.probability > MINSTREL_FRAC(95, 100)))
388 return; 390 return;
389 391
390 mi->prev_sample = true; 392 mi->prev_sample = true;
@@ -459,6 +461,7 @@ minstrel_rate_init(void *priv, struct ieee80211_supported_band *sband,
459 461
460 for (i = 0; i < sband->n_bitrates; i++) { 462 for (i = 0; i < sband->n_bitrates; i++) {
461 struct minstrel_rate *mr = &mi->r[n]; 463 struct minstrel_rate *mr = &mi->r[n];
464 struct minstrel_rate_stats *mrs = &mi->r[n].stats;
462 unsigned int tx_time = 0, tx_time_cts = 0, tx_time_rtscts = 0; 465 unsigned int tx_time = 0, tx_time_cts = 0, tx_time_rtscts = 0;
463 unsigned int tx_time_single; 466 unsigned int tx_time_single;
464 unsigned int cw = mp->cw_min; 467 unsigned int cw = mp->cw_min;
@@ -471,6 +474,7 @@ minstrel_rate_init(void *priv, struct ieee80211_supported_band *sband,
471 474
472 n++; 475 n++;
473 memset(mr, 0, sizeof(*mr)); 476 memset(mr, 0, sizeof(*mr));
477 memset(mrs, 0, sizeof(*mrs));
474 478
475 mr->rix = i; 479 mr->rix = i;
476 shift = ieee80211_chandef_get_shift(chandef); 480 shift = ieee80211_chandef_get_shift(chandef);
@@ -482,9 +486,9 @@ minstrel_rate_init(void *priv, struct ieee80211_supported_band *sband,
482 /* calculate maximum number of retransmissions before 486 /* calculate maximum number of retransmissions before
483 * fallback (based on maximum segment size) */ 487 * fallback (based on maximum segment size) */
484 mr->sample_limit = -1; 488 mr->sample_limit = -1;
485 mr->retry_count = 1; 489 mrs->retry_count = 1;
486 mr->retry_count_cts = 1; 490 mr->retry_count_cts = 1;
487 mr->retry_count_rtscts = 1; 491 mrs->retry_count_rtscts = 1;
488 tx_time = mr->perfect_tx_time + mi->sp_ack_dur; 492 tx_time = mr->perfect_tx_time + mi->sp_ack_dur;
489 do { 493 do {
490 /* add one retransmission */ 494 /* add one retransmission */
@@ -501,13 +505,13 @@ minstrel_rate_init(void *priv, struct ieee80211_supported_band *sband,
501 (mr->retry_count_cts < mp->max_retry)) 505 (mr->retry_count_cts < mp->max_retry))
502 mr->retry_count_cts++; 506 mr->retry_count_cts++;
503 if ((tx_time_rtscts < mp->segment_size) && 507 if ((tx_time_rtscts < mp->segment_size) &&
504 (mr->retry_count_rtscts < mp->max_retry)) 508 (mrs->retry_count_rtscts < mp->max_retry))
505 mr->retry_count_rtscts++; 509 mrs->retry_count_rtscts++;
506 } while ((tx_time < mp->segment_size) && 510 } while ((tx_time < mp->segment_size) &&
507 (++mr->retry_count < mp->max_retry)); 511 (++mr->stats.retry_count < mp->max_retry));
508 mr->adjusted_retry_count = mr->retry_count; 512 mr->adjusted_retry_count = mrs->retry_count;
509 if (!(sband->bitrates[i].flags & IEEE80211_RATE_ERP_G)) 513 if (!(sband->bitrates[i].flags & IEEE80211_RATE_ERP_G))
510 mr->retry_count_cts = mr->retry_count; 514 mr->retry_count_cts = mrs->retry_count;
511 } 515 }
512 516
513 for (i = n; i < sband->n_bitrates; i++) { 517 for (i = n; i < sband->n_bitrates; i++) {
@@ -665,7 +669,7 @@ static u32 minstrel_get_expected_throughput(void *priv_sta)
665 /* convert pkt per sec in kbps (1200 is the average pkt size used for 669 /* convert pkt per sec in kbps (1200 is the average pkt size used for
666 * computing cur_tp 670 * computing cur_tp
667 */ 671 */
668 return MINSTREL_TRUNC(mi->r[idx].cur_tp) * 1200 * 8 / 1024; 672 return MINSTREL_TRUNC(mi->r[idx].stats.cur_tp) * 1200 * 8 / 1024;
669} 673}
670 674
671const struct rate_control_ops mac80211_minstrel = { 675const struct rate_control_ops mac80211_minstrel = {
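
Note on the throughput numbers moved into struct minstrel_rate_stats above: minstrel_update_stats() computes cur_tp = probability * (1000000 / usecs), i.e. the per-rate packet rate scaled by the delivery probability in fixed point, and minstrel_get_expected_throughput() converts that to kbps assuming a 1200-byte average packet. A standalone sketch with hypothetical values (DEMO_SCALE is an illustrative stand-in for the real MINSTREL_SCALE define):

#include <stdio.h>

/* Illustrative fixed-point helpers; the kernel's MINSTREL_SCALE may differ. */
#define DEMO_SCALE 16
#define DEMO_FRAC(val, div) (((val) << DEMO_SCALE) / (div))
#define DEMO_TRUNC(val)     ((val) >> DEMO_SCALE)

int main(void)
{
	unsigned int usecs = 1000;                     /* hypothetical perfect_tx_time */
	unsigned int probability = DEMO_FRAC(80, 100); /* 80% delivery probability */

	/* Same shape as minstrel_update_stats(): packets/sec scaled by
	 * the delivery probability, still in fixed point.
	 */
	unsigned int cur_tp = probability * (1000000 / usecs);

	/* Same shape as minstrel_get_expected_throughput(): convert to kbps
	 * assuming an average packet size of 1200 bytes.
	 */
	unsigned int kbps = DEMO_TRUNC(cur_tp) * 1200 * 8 / 1024;

	printf("cur_tp = %u pkt/s, approx throughput = %u kbps\n",
	       DEMO_TRUNC(cur_tp), kbps); /* 800 pkt/s, 7500 kbps */
	return 0;
}
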
diff --git a/net/mac80211/rc80211_minstrel.h b/net/mac80211/rc80211_minstrel.h
index 046d1bd598a8..97eca86a4af0 100644
--- a/net/mac80211/rc80211_minstrel.h
+++ b/net/mac80211/rc80211_minstrel.h
@@ -31,6 +31,27 @@ minstrel_ewma(int old, int new, int weight)
31 return (new * (EWMA_DIV - weight) + old * weight) / EWMA_DIV; 31 return (new * (EWMA_DIV - weight) + old * weight) / EWMA_DIV;
32} 32}
33 33
34struct minstrel_rate_stats {
35 /* current / last sampling period attempts/success counters */
36 unsigned int attempts, last_attempts;
37 unsigned int success, last_success;
38
39 /* total attempts/success counters */
40 u64 att_hist, succ_hist;
41
42 /* current throughput */
43 unsigned int cur_tp;
44
45 /* packet delivery probabilities */
46 unsigned int cur_prob, probability;
47
48 /* maximum retry counts */
49 unsigned int retry_count;
50 unsigned int retry_count_rtscts;
51
52 u8 sample_skipped;
53 bool retry_updated;
54};
34 55
35struct minstrel_rate { 56struct minstrel_rate {
36 int bitrate; 57 int bitrate;
@@ -40,26 +61,10 @@ struct minstrel_rate {
40 unsigned int ack_time; 61 unsigned int ack_time;
41 62
42 int sample_limit; 63 int sample_limit;
43 unsigned int retry_count;
44 unsigned int retry_count_cts; 64 unsigned int retry_count_cts;
45 unsigned int retry_count_rtscts;
46 unsigned int adjusted_retry_count; 65 unsigned int adjusted_retry_count;
47 66
48 u32 success; 67 struct minstrel_rate_stats stats;
49 u32 attempts;
50 u32 last_attempts;
51 u32 last_success;
52 u8 sample_skipped;
53
54 /* parts per thousand */
55 u32 cur_prob;
56 u32 probability;
57
58 /* per-rate throughput */
59 u32 cur_tp;
60
61 u64 succ_hist;
62 u64 att_hist;
63}; 68};
64 69
65struct minstrel_sta_info { 70struct minstrel_sta_info {
@@ -73,8 +78,8 @@ struct minstrel_sta_info {
73 78
74 u8 max_tp_rate[MAX_THR_RATES]; 79 u8 max_tp_rate[MAX_THR_RATES];
75 u8 max_prob_rate; 80 u8 max_prob_rate;
76 unsigned int packet_count; 81 unsigned int total_packets;
77 unsigned int sample_count; 82 unsigned int sample_packets;
78 int sample_deferred; 83 int sample_deferred;
79 84
80 unsigned int sample_row; 85 unsigned int sample_row;
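
Note on the probability field consolidated into struct minstrel_rate_stats above: it is smoothed each period with minstrel_ewma(), shown in the rc80211_minstrel.h context. A standalone sketch of that weighted average using hypothetical weight/divisor values (the real EWMA_LEVEL/EWMA_DIV defines live in this header and may differ):

#include <stdio.h>

/* Hypothetical weight/divisor, stand-ins for EWMA_LEVEL/EWMA_DIV. */
#define DEMO_EWMA_DIV   128
#define DEMO_EWMA_LEVEL 96

/* Same formula as minstrel_ewma(): keep `weight` parts of the old value
 * and blend in the remainder from the new sample.
 */
static int demo_ewma(int old, int new_val, int weight)
{
	return (new_val * (DEMO_EWMA_DIV - weight) + old * weight) / DEMO_EWMA_DIV;
}

int main(void)
{
	int prob = 0;
	int samples[] = { 1000, 1000, 500, 1000 }; /* hypothetical per-period cur_prob */
	unsigned int i;

	for (i = 0; i < sizeof(samples) / sizeof(samples[0]); i++) {
		prob = demo_ewma(prob, samples[i], DEMO_EWMA_LEVEL);
		printf("after sample %u: probability = %d\n", i + 1, prob);
	}
	return 0;
}
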
diff --git a/net/mac80211/rc80211_minstrel_debugfs.c b/net/mac80211/rc80211_minstrel_debugfs.c
index fd0b9ca1570e..edde723f9f00 100644
--- a/net/mac80211/rc80211_minstrel_debugfs.c
+++ b/net/mac80211/rc80211_minstrel_debugfs.c
@@ -72,6 +72,7 @@ minstrel_stats_open(struct inode *inode, struct file *file)
72 "this succ/attempt success attempts\n"); 72 "this succ/attempt success attempts\n");
73 for (i = 0; i < mi->n_rates; i++) { 73 for (i = 0; i < mi->n_rates; i++) {
74 struct minstrel_rate *mr = &mi->r[i]; 74 struct minstrel_rate *mr = &mi->r[i];
75 struct minstrel_rate_stats *mrs = &mi->r[i].stats;
75 76
76 *(p++) = (i == mi->max_tp_rate[0]) ? 'A' : ' '; 77 *(p++) = (i == mi->max_tp_rate[0]) ? 'A' : ' ';
77 *(p++) = (i == mi->max_tp_rate[1]) ? 'B' : ' '; 78 *(p++) = (i == mi->max_tp_rate[1]) ? 'B' : ' ';
@@ -81,24 +82,24 @@ minstrel_stats_open(struct inode *inode, struct file *file)
81 p += sprintf(p, "%3u%s", mr->bitrate / 2, 82 p += sprintf(p, "%3u%s", mr->bitrate / 2,
82 (mr->bitrate & 1 ? ".5" : " ")); 83 (mr->bitrate & 1 ? ".5" : " "));
83 84
84 tp = MINSTREL_TRUNC(mr->cur_tp / 10); 85 tp = MINSTREL_TRUNC(mrs->cur_tp / 10);
85 prob = MINSTREL_TRUNC(mr->cur_prob * 1000); 86 prob = MINSTREL_TRUNC(mrs->cur_prob * 1000);
86 eprob = MINSTREL_TRUNC(mr->probability * 1000); 87 eprob = MINSTREL_TRUNC(mrs->probability * 1000);
87 88
88 p += sprintf(p, " %6u.%1u %6u.%1u %6u.%1u " 89 p += sprintf(p, " %6u.%1u %6u.%1u %6u.%1u "
89 " %3u(%3u) %8llu %8llu\n", 90 " %3u(%3u) %8llu %8llu\n",
90 tp / 10, tp % 10, 91 tp / 10, tp % 10,
91 eprob / 10, eprob % 10, 92 eprob / 10, eprob % 10,
92 prob / 10, prob % 10, 93 prob / 10, prob % 10,
93 mr->last_success, 94 mrs->last_success,
94 mr->last_attempts, 95 mrs->last_attempts,
95 (unsigned long long)mr->succ_hist, 96 (unsigned long long)mrs->succ_hist,
96 (unsigned long long)mr->att_hist); 97 (unsigned long long)mrs->att_hist);
97 } 98 }
98 p += sprintf(p, "\nTotal packet count:: ideal %d " 99 p += sprintf(p, "\nTotal packet count:: ideal %d "
99 "lookaround %d\n\n", 100 "lookaround %d\n\n",
100 mi->packet_count - mi->sample_count, 101 mi->total_packets - mi->sample_packets,
101 mi->sample_count); 102 mi->sample_packets);
102 ms->len = p - ms->buf; 103 ms->len = p - ms->buf;
103 104
104 return 0; 105 return 0;
diff --git a/net/mac80211/rc80211_minstrel_ht.c b/net/mac80211/rc80211_minstrel_ht.c
index 85c1e74b7714..df90ce2db00c 100644
--- a/net/mac80211/rc80211_minstrel_ht.c
+++ b/net/mac80211/rc80211_minstrel_ht.c
@@ -135,7 +135,7 @@ minstrel_ht_update_rates(struct minstrel_priv *mp, struct minstrel_ht_sta *mi);
135static int 135static int
136minstrel_ht_get_group_idx(struct ieee80211_tx_rate *rate) 136minstrel_ht_get_group_idx(struct ieee80211_tx_rate *rate)
137{ 137{
138 return GROUP_IDX((rate->idx / 8) + 1, 138 return GROUP_IDX((rate->idx / MCS_GROUP_RATES) + 1,
139 !!(rate->flags & IEEE80211_TX_RC_SHORT_GI), 139 !!(rate->flags & IEEE80211_TX_RC_SHORT_GI),
140 !!(rate->flags & IEEE80211_TX_RC_40_MHZ_WIDTH)); 140 !!(rate->flags & IEEE80211_TX_RC_40_MHZ_WIDTH));
141} 141}
@@ -233,12 +233,151 @@ minstrel_ht_calc_tp(struct minstrel_ht_sta *mi, int group, int rate)
233} 233}
234 234
235/* 235/*
236 * Find & sort topmost throughput rates
237 *
238 * If multiple rates provide equal throughput the sorting is based on their
239 * current success probability. Higher success probability is preferred among
240 * MCS groups, CCK rates do not provide aggregation and are therefore at last.
241 */
242static void
243minstrel_ht_sort_best_tp_rates(struct minstrel_ht_sta *mi, u8 index,
244 u8 *tp_list)
245{
246 int cur_group, cur_idx, cur_thr, cur_prob;
247 int tmp_group, tmp_idx, tmp_thr, tmp_prob;
248 int j = MAX_THR_RATES;
249
250 cur_group = index / MCS_GROUP_RATES;
251 cur_idx = index % MCS_GROUP_RATES;
252 cur_thr = mi->groups[cur_group].rates[cur_idx].cur_tp;
253 cur_prob = mi->groups[cur_group].rates[cur_idx].probability;
254
255 tmp_group = tp_list[j - 1] / MCS_GROUP_RATES;
256 tmp_idx = tp_list[j - 1] % MCS_GROUP_RATES;
257 tmp_thr = mi->groups[tmp_group].rates[tmp_idx].cur_tp;
258 tmp_prob = mi->groups[tmp_group].rates[tmp_idx].probability;
259
260 while (j > 0 && (cur_thr > tmp_thr ||
261 (cur_thr == tmp_thr && cur_prob > tmp_prob))) {
262 j--;
263 tmp_group = tp_list[j - 1] / MCS_GROUP_RATES;
264 tmp_idx = tp_list[j - 1] % MCS_GROUP_RATES;
265 tmp_thr = mi->groups[tmp_group].rates[tmp_idx].cur_tp;
266 tmp_prob = mi->groups[tmp_group].rates[tmp_idx].probability;
267 }
268
269 if (j < MAX_THR_RATES - 1) {
270 memmove(&tp_list[j + 1], &tp_list[j], (sizeof(*tp_list) *
271 (MAX_THR_RATES - (j + 1))));
272 }
273 if (j < MAX_THR_RATES)
274 tp_list[j] = index;
275}
276
277/*
278 * Find and set the topmost probability rate per sta and per group
279 */
280static void
281minstrel_ht_set_best_prob_rate(struct minstrel_ht_sta *mi, u8 index)
282{
283 struct minstrel_mcs_group_data *mg;
284 struct minstrel_rate_stats *mr;
285 int tmp_group, tmp_idx, tmp_tp, tmp_prob, max_tp_group;
286
287 mg = &mi->groups[index / MCS_GROUP_RATES];
288 mr = &mg->rates[index % MCS_GROUP_RATES];
289
290 tmp_group = mi->max_prob_rate / MCS_GROUP_RATES;
291 tmp_idx = mi->max_prob_rate % MCS_GROUP_RATES;
292 tmp_tp = mi->groups[tmp_group].rates[tmp_idx].cur_tp;
293 tmp_prob = mi->groups[tmp_group].rates[tmp_idx].probability;
294
 295 /* If max_tp_rate[0] is from an MCS group, select max_prob_rate from an
 296 * MCS group as well, since CCK rates do not allow aggregation */
297 max_tp_group = mi->max_tp_rate[0] / MCS_GROUP_RATES;
298 if((index / MCS_GROUP_RATES == MINSTREL_CCK_GROUP) &&
299 (max_tp_group != MINSTREL_CCK_GROUP))
300 return;
301
302 if (mr->probability > MINSTREL_FRAC(75, 100)) {
303 if (mr->cur_tp > tmp_tp)
304 mi->max_prob_rate = index;
305 if (mr->cur_tp > mg->rates[mg->max_group_prob_rate].cur_tp)
306 mg->max_group_prob_rate = index;
307 } else {
308 if (mr->probability > tmp_prob)
309 mi->max_prob_rate = index;
310 if (mr->probability > mg->rates[mg->max_group_prob_rate].probability)
311 mg->max_group_prob_rate = index;
312 }
313}
314
315
316/*
317 * Assign new rate set per sta and use CCK rates only if the fastest
318 * rate (max_tp_rate[0]) is from CCK group. This prohibits such sorted
319 * rate sets where MCS and CCK rates are mixed, because CCK rates can
320 * not use aggregation.
321 */
322static void
323minstrel_ht_assign_best_tp_rates(struct minstrel_ht_sta *mi,
324 u8 tmp_mcs_tp_rate[MAX_THR_RATES],
325 u8 tmp_cck_tp_rate[MAX_THR_RATES])
326{
327 unsigned int tmp_group, tmp_idx, tmp_cck_tp, tmp_mcs_tp;
328 int i;
329
330 tmp_group = tmp_cck_tp_rate[0] / MCS_GROUP_RATES;
331 tmp_idx = tmp_cck_tp_rate[0] % MCS_GROUP_RATES;
332 tmp_cck_tp = mi->groups[tmp_group].rates[tmp_idx].cur_tp;
333
334 tmp_group = tmp_mcs_tp_rate[0] / MCS_GROUP_RATES;
335 tmp_idx = tmp_mcs_tp_rate[0] % MCS_GROUP_RATES;
336 tmp_mcs_tp = mi->groups[tmp_group].rates[tmp_idx].cur_tp;
337
338 if (tmp_cck_tp > tmp_mcs_tp) {
339 for(i = 0; i < MAX_THR_RATES; i++) {
340 minstrel_ht_sort_best_tp_rates(mi, tmp_cck_tp_rate[i],
341 tmp_mcs_tp_rate);
342 }
343 }
344
345}
346
347/*
 348 * Try to increase robustness of the max_prob rate by decreasing the
 349 * number of streams if possible.
350 */
351static inline void
352minstrel_ht_prob_rate_reduce_streams(struct minstrel_ht_sta *mi)
353{
354 struct minstrel_mcs_group_data *mg;
355 struct minstrel_rate_stats *mr;
356 int tmp_max_streams, group;
357 int tmp_tp = 0;
358
359 tmp_max_streams = minstrel_mcs_groups[mi->max_tp_rate[0] /
360 MCS_GROUP_RATES].streams;
361 for (group = 0; group < ARRAY_SIZE(minstrel_mcs_groups); group++) {
362 mg = &mi->groups[group];
363 if (!mg->supported || group == MINSTREL_CCK_GROUP)
364 continue;
365 mr = minstrel_get_ratestats(mi, mg->max_group_prob_rate);
366 if (tmp_tp < mr->cur_tp &&
367 (minstrel_mcs_groups[group].streams < tmp_max_streams)) {
368 mi->max_prob_rate = mg->max_group_prob_rate;
369 tmp_tp = mr->cur_tp;
370 }
371 }
372}
373
374/*
236 * Update rate statistics and select new primary rates 375 * Update rate statistics and select new primary rates
237 * 376 *
238 * Rules for rate selection: 377 * Rules for rate selection:
239 * - max_prob_rate must use only one stream, as a tradeoff between delivery 378 * - max_prob_rate must use only one stream, as a tradeoff between delivery
240 * probability and throughput during strong fluctuations 379 * probability and throughput during strong fluctuations
241 * - as long as the max prob rate has a probability of more than 3/4, pick 380 * - as long as the max prob rate has a probability of more than 75%, pick
242 * higher throughput rates, even if the probablity is a bit lower 381 * higher throughput rates, even if the probablity is a bit lower
243 */ 382 */
244static void 383static void
@@ -246,9 +385,9 @@ minstrel_ht_update_stats(struct minstrel_priv *mp, struct minstrel_ht_sta *mi)
246{ 385{
247 struct minstrel_mcs_group_data *mg; 386 struct minstrel_mcs_group_data *mg;
248 struct minstrel_rate_stats *mr; 387 struct minstrel_rate_stats *mr;
249 int cur_prob, cur_prob_tp, cur_tp, cur_tp2; 388 int group, i, j;
250 int group, i, index; 389 u8 tmp_mcs_tp_rate[MAX_THR_RATES], tmp_group_tp_rate[MAX_THR_RATES];
251 bool mi_rates_valid = false; 390 u8 tmp_cck_tp_rate[MAX_THR_RATES], index;
252 391
253 if (mi->ampdu_packets > 0) { 392 if (mi->ampdu_packets > 0) {
254 mi->avg_ampdu_len = minstrel_ewma(mi->avg_ampdu_len, 393 mi->avg_ampdu_len = minstrel_ewma(mi->avg_ampdu_len,
@@ -260,13 +399,14 @@ minstrel_ht_update_stats(struct minstrel_priv *mp, struct minstrel_ht_sta *mi)
260 mi->sample_slow = 0; 399 mi->sample_slow = 0;
261 mi->sample_count = 0; 400 mi->sample_count = 0;
262 401
263 for (group = 0; group < ARRAY_SIZE(minstrel_mcs_groups); group++) { 402 /* Initialize global rate indexes */
264 bool mg_rates_valid = false; 403 for(j = 0; j < MAX_THR_RATES; j++){
404 tmp_mcs_tp_rate[j] = 0;
405 tmp_cck_tp_rate[j] = 0;
406 }
265 407
266 cur_prob = 0; 408 /* Find best rate sets within all MCS groups*/
267 cur_prob_tp = 0; 409 for (group = 0; group < ARRAY_SIZE(minstrel_mcs_groups); group++) {
268 cur_tp = 0;
269 cur_tp2 = 0;
270 410
271 mg = &mi->groups[group]; 411 mg = &mi->groups[group];
272 if (!mg->supported) 412 if (!mg->supported)
@@ -274,24 +414,16 @@ minstrel_ht_update_stats(struct minstrel_priv *mp, struct minstrel_ht_sta *mi)
274 414
275 mi->sample_count++; 415 mi->sample_count++;
276 416
417 /* (re)Initialize group rate indexes */
418 for(j = 0; j < MAX_THR_RATES; j++)
419 tmp_group_tp_rate[j] = group;
420
277 for (i = 0; i < MCS_GROUP_RATES; i++) { 421 for (i = 0; i < MCS_GROUP_RATES; i++) {
278 if (!(mg->supported & BIT(i))) 422 if (!(mg->supported & BIT(i)))
279 continue; 423 continue;
280 424
281 index = MCS_GROUP_RATES * group + i; 425 index = MCS_GROUP_RATES * group + i;
282 426
283 /* initialize rates selections starting indexes */
284 if (!mg_rates_valid) {
285 mg->max_tp_rate = mg->max_tp_rate2 =
286 mg->max_prob_rate = i;
287 if (!mi_rates_valid) {
288 mi->max_tp_rate = mi->max_tp_rate2 =
289 mi->max_prob_rate = index;
290 mi_rates_valid = true;
291 }
292 mg_rates_valid = true;
293 }
294
295 mr = &mg->rates[i]; 427 mr = &mg->rates[i];
296 mr->retry_updated = false; 428 mr->retry_updated = false;
297 minstrel_calc_rate_ewma(mr); 429 minstrel_calc_rate_ewma(mr);
@@ -300,82 +432,47 @@ minstrel_ht_update_stats(struct minstrel_priv *mp, struct minstrel_ht_sta *mi)
300 if (!mr->cur_tp) 432 if (!mr->cur_tp)
301 continue; 433 continue;
302 434
303 if ((mr->cur_tp > cur_prob_tp && mr->probability > 435 /* Find max throughput rate set */
304 MINSTREL_FRAC(3, 4)) || mr->probability > cur_prob) { 436 if (group != MINSTREL_CCK_GROUP) {
305 mg->max_prob_rate = index; 437 minstrel_ht_sort_best_tp_rates(mi, index,
306 cur_prob = mr->probability; 438 tmp_mcs_tp_rate);
307 cur_prob_tp = mr->cur_tp; 439 } else if (group == MINSTREL_CCK_GROUP) {
308 } 440 minstrel_ht_sort_best_tp_rates(mi, index,
309 441 tmp_cck_tp_rate);
310 if (mr->cur_tp > cur_tp) {
311 swap(index, mg->max_tp_rate);
312 cur_tp = mr->cur_tp;
313 mr = minstrel_get_ratestats(mi, index);
314 }
315
316 if (index >= mg->max_tp_rate)
317 continue;
318
319 if (mr->cur_tp > cur_tp2) {
320 mg->max_tp_rate2 = index;
321 cur_tp2 = mr->cur_tp;
322 } 442 }
323 }
324 }
325 443
326 /* try to sample all available rates during each interval */ 444 /* Find max throughput rate set within a group */
327 mi->sample_count *= 8; 445 minstrel_ht_sort_best_tp_rates(mi, index,
446 tmp_group_tp_rate);
328 447
329 cur_prob = 0; 448 /* Find max probability rate per group and global */
330 cur_prob_tp = 0; 449 minstrel_ht_set_best_prob_rate(mi, index);
331 cur_tp = 0;
332 cur_tp2 = 0;
333 for (group = 0; group < ARRAY_SIZE(minstrel_mcs_groups); group++) {
334 mg = &mi->groups[group];
335 if (!mg->supported)
336 continue;
337
338 mr = minstrel_get_ratestats(mi, mg->max_tp_rate);
339 if (cur_tp < mr->cur_tp) {
340 mi->max_tp_rate2 = mi->max_tp_rate;
341 cur_tp2 = cur_tp;
342 mi->max_tp_rate = mg->max_tp_rate;
343 cur_tp = mr->cur_tp;
344 mi->max_prob_streams = minstrel_mcs_groups[group].streams - 1;
345 } 450 }
346 451
347 mr = minstrel_get_ratestats(mi, mg->max_tp_rate2); 452 memcpy(mg->max_group_tp_rate, tmp_group_tp_rate,
348 if (cur_tp2 < mr->cur_tp) { 453 sizeof(mg->max_group_tp_rate));
349 mi->max_tp_rate2 = mg->max_tp_rate2;
350 cur_tp2 = mr->cur_tp;
351 }
352 } 454 }
353 455
354 if (mi->max_prob_streams < 1) 456 /* Assign new rate set per sta */
355 mi->max_prob_streams = 1; 457 minstrel_ht_assign_best_tp_rates(mi, tmp_mcs_tp_rate, tmp_cck_tp_rate);
458 memcpy(mi->max_tp_rate, tmp_mcs_tp_rate, sizeof(mi->max_tp_rate));
356 459
357 for (group = 0; group < ARRAY_SIZE(minstrel_mcs_groups); group++) { 460 /* Try to increase robustness of max_prob_rate*/
358 mg = &mi->groups[group]; 461 minstrel_ht_prob_rate_reduce_streams(mi);
359 if (!mg->supported) 462
360 continue; 463 /* try to sample all available rates during each interval */
361 mr = minstrel_get_ratestats(mi, mg->max_prob_rate); 464 mi->sample_count *= 8;
362 if (cur_prob_tp < mr->cur_tp &&
363 minstrel_mcs_groups[group].streams <= mi->max_prob_streams) {
364 mi->max_prob_rate = mg->max_prob_rate;
365 cur_prob = mr->cur_prob;
366 cur_prob_tp = mr->cur_tp;
367 }
368 }
369 465
370#ifdef CONFIG_MAC80211_DEBUGFS 466#ifdef CONFIG_MAC80211_DEBUGFS
371 /* use fixed index if set */ 467 /* use fixed index if set */
372 if (mp->fixed_rate_idx != -1) { 468 if (mp->fixed_rate_idx != -1) {
373 mi->max_tp_rate = mp->fixed_rate_idx; 469 for (i = 0; i < 4; i++)
374 mi->max_tp_rate2 = mp->fixed_rate_idx; 470 mi->max_tp_rate[i] = mp->fixed_rate_idx;
375 mi->max_prob_rate = mp->fixed_rate_idx; 471 mi->max_prob_rate = mp->fixed_rate_idx;
376 } 472 }
377#endif 473#endif
378 474
475 /* Reset update timer */
379 mi->stats_update = jiffies; 476 mi->stats_update = jiffies;
380} 477}
381 478
@@ -420,8 +517,7 @@ minstrel_next_sample_idx(struct minstrel_ht_sta *mi)
420} 517}
421 518
422static void 519static void
423minstrel_downgrade_rate(struct minstrel_ht_sta *mi, unsigned int *idx, 520minstrel_downgrade_rate(struct minstrel_ht_sta *mi, u8 *idx, bool primary)
424 bool primary)
425{ 521{
426 int group, orig_group; 522 int group, orig_group;
427 523
@@ -437,9 +533,9 @@ minstrel_downgrade_rate(struct minstrel_ht_sta *mi, unsigned int *idx,
437 continue; 533 continue;
438 534
439 if (primary) 535 if (primary)
440 *idx = mi->groups[group].max_tp_rate; 536 *idx = mi->groups[group].max_group_tp_rate[0];
441 else 537 else
442 *idx = mi->groups[group].max_tp_rate2; 538 *idx = mi->groups[group].max_group_tp_rate[1];
443 break; 539 break;
444 } 540 }
445} 541}
@@ -524,19 +620,19 @@ minstrel_ht_tx_status(void *priv, struct ieee80211_supported_band *sband,
524 * check for sudden death of spatial multiplexing, 620 * check for sudden death of spatial multiplexing,
525 * downgrade to a lower number of streams if necessary. 621 * downgrade to a lower number of streams if necessary.
526 */ 622 */
527 rate = minstrel_get_ratestats(mi, mi->max_tp_rate); 623 rate = minstrel_get_ratestats(mi, mi->max_tp_rate[0]);
528 if (rate->attempts > 30 && 624 if (rate->attempts > 30 &&
529 MINSTREL_FRAC(rate->success, rate->attempts) < 625 MINSTREL_FRAC(rate->success, rate->attempts) <
530 MINSTREL_FRAC(20, 100)) { 626 MINSTREL_FRAC(20, 100)) {
531 minstrel_downgrade_rate(mi, &mi->max_tp_rate, true); 627 minstrel_downgrade_rate(mi, &mi->max_tp_rate[0], true);
532 update = true; 628 update = true;
533 } 629 }
534 630
535 rate2 = minstrel_get_ratestats(mi, mi->max_tp_rate2); 631 rate2 = minstrel_get_ratestats(mi, mi->max_tp_rate[1]);
536 if (rate2->attempts > 30 && 632 if (rate2->attempts > 30 &&
537 MINSTREL_FRAC(rate2->success, rate2->attempts) < 633 MINSTREL_FRAC(rate2->success, rate2->attempts) <
538 MINSTREL_FRAC(20, 100)) { 634 MINSTREL_FRAC(20, 100)) {
539 minstrel_downgrade_rate(mi, &mi->max_tp_rate2, false); 635 minstrel_downgrade_rate(mi, &mi->max_tp_rate[1], false);
540 update = true; 636 update = true;
541 } 637 }
542 638
@@ -661,12 +757,12 @@ minstrel_ht_update_rates(struct minstrel_priv *mp, struct minstrel_ht_sta *mi)
661 if (!rates) 757 if (!rates)
662 return; 758 return;
663 759
664 /* Start with max_tp_rate */ 760 /* Start with max_tp_rate[0] */
665 minstrel_ht_set_rate(mp, mi, rates, i++, mi->max_tp_rate); 761 minstrel_ht_set_rate(mp, mi, rates, i++, mi->max_tp_rate[0]);
666 762
667 if (mp->hw->max_rates >= 3) { 763 if (mp->hw->max_rates >= 3) {
668 /* At least 3 tx rates supported, use max_tp_rate2 next */ 764 /* At least 3 tx rates supported, use max_tp_rate[1] next */
669 minstrel_ht_set_rate(mp, mi, rates, i++, mi->max_tp_rate2); 765 minstrel_ht_set_rate(mp, mi, rates, i++, mi->max_tp_rate[1]);
670 } 766 }
671 767
672 if (mp->hw->max_rates >= 2) { 768 if (mp->hw->max_rates >= 2) {
@@ -691,7 +787,7 @@ minstrel_get_sample_rate(struct minstrel_priv *mp, struct minstrel_ht_sta *mi)
691{ 787{
692 struct minstrel_rate_stats *mr; 788 struct minstrel_rate_stats *mr;
693 struct minstrel_mcs_group_data *mg; 789 struct minstrel_mcs_group_data *mg;
694 unsigned int sample_dur, sample_group; 790 unsigned int sample_dur, sample_group, cur_max_tp_streams;
695 int sample_idx = 0; 791 int sample_idx = 0;
696 792
697 if (mi->sample_wait > 0) { 793 if (mi->sample_wait > 0) {
@@ -718,8 +814,8 @@ minstrel_get_sample_rate(struct minstrel_priv *mp, struct minstrel_ht_sta *mi)
718 * to the frame. Hence, don't use sampling for the currently 814 * to the frame. Hence, don't use sampling for the currently
719 * used rates. 815 * used rates.
720 */ 816 */
721 if (sample_idx == mi->max_tp_rate || 817 if (sample_idx == mi->max_tp_rate[0] ||
722 sample_idx == mi->max_tp_rate2 || 818 sample_idx == mi->max_tp_rate[1] ||
723 sample_idx == mi->max_prob_rate) 819 sample_idx == mi->max_prob_rate)
724 return -1; 820 return -1;
725 821
@@ -734,9 +830,12 @@ minstrel_get_sample_rate(struct minstrel_priv *mp, struct minstrel_ht_sta *mi)
734 * Make sure that lower rates get sampled only occasionally, 830 * Make sure that lower rates get sampled only occasionally,
735 * if the link is working perfectly. 831 * if the link is working perfectly.
736 */ 832 */
833
834 cur_max_tp_streams = minstrel_mcs_groups[mi->max_tp_rate[0] /
835 MCS_GROUP_RATES].streams;
737 sample_dur = minstrel_get_duration(sample_idx); 836 sample_dur = minstrel_get_duration(sample_idx);
738 if (sample_dur >= minstrel_get_duration(mi->max_tp_rate2) && 837 if (sample_dur >= minstrel_get_duration(mi->max_tp_rate[1]) &&
739 (mi->max_prob_streams < 838 (cur_max_tp_streams - 1 <
740 minstrel_mcs_groups[sample_group].streams || 839 minstrel_mcs_groups[sample_group].streams ||
741 sample_dur >= minstrel_get_duration(mi->max_prob_rate))) { 840 sample_dur >= minstrel_get_duration(mi->max_prob_rate))) {
742 if (mr->sample_skipped < 20) 841 if (mr->sample_skipped < 20)
@@ -1041,8 +1140,8 @@ static u32 minstrel_ht_get_expected_throughput(void *priv_sta)
1041 if (!msp->is_ht) 1140 if (!msp->is_ht)
1042 return mac80211_minstrel.get_expected_throughput(priv_sta); 1141 return mac80211_minstrel.get_expected_throughput(priv_sta);
1043 1142
1044 i = mi->max_tp_rate / MCS_GROUP_RATES; 1143 i = mi->max_tp_rate[0] / MCS_GROUP_RATES;
1045 j = mi->max_tp_rate % MCS_GROUP_RATES; 1144 j = mi->max_tp_rate[0] % MCS_GROUP_RATES;
1046 1145
1047 /* convert cur_tp from pkt per second in kbps */ 1146 /* convert cur_tp from pkt per second in kbps */
1048 return mi->groups[i].rates[j].cur_tp * AVG_PKT_SIZE * 8 / 1024; 1147 return mi->groups[i].rates[j].cur_tp * AVG_PKT_SIZE * 8 / 1024;
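
The get_expected_throughput() hunk above splits the flat rate index into a group index and an in-group index, then converts the per-packet throughput counter to kbps. Below is a minimal standalone sketch of that arithmetic; MCS_GROUP_RATES and AVG_PKT_SIZE are assumed values for illustration and are not taken from this diff.

	/* Standalone sketch of the index math in minstrel_ht_get_expected_throughput().
	 * MCS_GROUP_RATES and AVG_PKT_SIZE are assumed constants.
	 */
	#include <stdio.h>

	#define MCS_GROUP_RATES 8     /* assumed: rates per MCS group */
	#define AVG_PKT_SIZE    1200  /* assumed: average packet size in bytes */

	int main(void)
	{
		unsigned char max_tp_rate0 = 21;  /* example flat rate index */
		unsigned int cur_tp = 500;        /* example: packets per second */

		unsigned int group = max_tp_rate0 / MCS_GROUP_RATES;  /* MCS group */
		unsigned int idx   = max_tp_rate0 % MCS_GROUP_RATES;  /* rate within group */

		/* convert packets/s to kbps, as the kernel code does */
		unsigned int kbps = cur_tp * AVG_PKT_SIZE * 8 / 1024;

		printf("group=%u idx=%u throughput=%u kbps\n", group, idx, kbps);
		return 0;
	}
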
diff --git a/net/mac80211/rc80211_minstrel_ht.h b/net/mac80211/rc80211_minstrel_ht.h
index d655586773ac..01570e0e014b 100644
--- a/net/mac80211/rc80211_minstrel_ht.h
+++ b/net/mac80211/rc80211_minstrel_ht.h
@@ -26,28 +26,6 @@ struct mcs_group {
26 26
27extern const struct mcs_group minstrel_mcs_groups[]; 27extern const struct mcs_group minstrel_mcs_groups[];
28 28
29struct minstrel_rate_stats {
30 /* current / last sampling period attempts/success counters */
31 unsigned int attempts, last_attempts;
32 unsigned int success, last_success;
33
34 /* total attempts/success counters */
35 u64 att_hist, succ_hist;
36
37 /* current throughput */
38 unsigned int cur_tp;
39
40 /* packet delivery probabilities */
41 unsigned int cur_prob, probability;
42
43 /* maximum retry counts */
44 unsigned int retry_count;
45 unsigned int retry_count_rtscts;
46
47 bool retry_updated;
48 u8 sample_skipped;
49};
50
51struct minstrel_mcs_group_data { 29struct minstrel_mcs_group_data {
52 u8 index; 30 u8 index;
53 u8 column; 31 u8 column;
@@ -55,10 +33,9 @@ struct minstrel_mcs_group_data {
55 /* bitfield of supported MCS rates of this group */ 33 /* bitfield of supported MCS rates of this group */
56 u8 supported; 34 u8 supported;
57 35
 58 /* selected primary rates */ 36 /* sorted rate set within an MCS group */
59 unsigned int max_tp_rate; 37 u8 max_group_tp_rate[MAX_THR_RATES];
60 unsigned int max_tp_rate2; 38 u8 max_group_prob_rate;
61 unsigned int max_prob_rate;
62 39
63 /* MCS rate statistics */ 40 /* MCS rate statistics */
64 struct minstrel_rate_stats rates[MCS_GROUP_RATES]; 41 struct minstrel_rate_stats rates[MCS_GROUP_RATES];
@@ -74,15 +51,9 @@ struct minstrel_ht_sta {
74 /* ampdu length (EWMA) */ 51 /* ampdu length (EWMA) */
75 unsigned int avg_ampdu_len; 52 unsigned int avg_ampdu_len;
76 53
77 /* best throughput rate */ 54 /* overall sorted rate set */
78 unsigned int max_tp_rate; 55 u8 max_tp_rate[MAX_THR_RATES];
79 56 u8 max_prob_rate;
80 /* second best throughput rate */
81 unsigned int max_tp_rate2;
82
83 /* best probability rate */
84 unsigned int max_prob_rate;
85 unsigned int max_prob_streams;
86 57
87 /* time of last status update */ 58 /* time of last status update */
88 unsigned long stats_update; 59 unsigned long stats_update;
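
The header changes replace the two scalar throughput rates with a small array of u8 rate indexes sorted by throughput, max_tp_rate[MAX_THR_RATES]. The sketch below shows one way such a sorted insert can work; it assumes MAX_THR_RATES is 4 (consistent with the four A-D markers printed by the debugfs dump further down) and uses a made-up throughput table, so it illustrates the data layout rather than the actual rate-control algorithm.

	/* Sketch: keep a fixed-size array of rate indexes sorted by throughput.
	 * MAX_THR_RATES and the tp[] table are assumptions for illustration only.
	 */
	#include <stdio.h>
	#include <string.h>
	#include <stdint.h>

	#define MAX_THR_RATES 4   /* assumed size of the sorted rate set */

	/* hypothetical per-rate throughput estimates, indexed by flat rate index */
	static const unsigned int tp[16] = {
		10, 25, 40, 55, 70, 85, 100, 115, 12, 30, 48, 66, 84, 102, 120, 138
	};

	/* Insert @rate into @sorted (descending by tp[]), dropping the worst entry. */
	static void sort_in_rate(uint8_t *sorted, uint8_t rate)
	{
		int i;

		for (i = 0; i < MAX_THR_RATES; i++) {
			if (tp[rate] > tp[sorted[i]]) {
				/* shift worse entries down and insert */
				memmove(&sorted[i + 1], &sorted[i],
					MAX_THR_RATES - i - 1);
				sorted[i] = rate;
				break;
			}
		}
	}

	int main(void)
	{
		uint8_t max_tp_rate[MAX_THR_RATES] = { 0, 0, 0, 0 };
		uint8_t candidates[] = { 3, 14, 7, 11, 2 };
		unsigned int i;

		for (i = 0; i < sizeof(candidates); i++)
			sort_in_rate(max_tp_rate, candidates[i]);

		for (i = 0; i < MAX_THR_RATES; i++)
			printf("max_tp_rate[%u] = %u (tp %u)\n",
			       i, max_tp_rate[i], tp[max_tp_rate[i]]);
		return 0;
	}
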
diff --git a/net/mac80211/rc80211_minstrel_ht_debugfs.c b/net/mac80211/rc80211_minstrel_ht_debugfs.c
index 3e7d793de0c3..a72ad46f2a04 100644
--- a/net/mac80211/rc80211_minstrel_ht_debugfs.c
+++ b/net/mac80211/rc80211_minstrel_ht_debugfs.c
@@ -46,8 +46,10 @@ minstrel_ht_stats_dump(struct minstrel_ht_sta *mi, int i, char *p)
46 else 46 else
47 p += sprintf(p, "HT%c0/%cGI ", htmode, gimode); 47 p += sprintf(p, "HT%c0/%cGI ", htmode, gimode);
48 48
49 *(p++) = (idx == mi->max_tp_rate) ? 'T' : ' '; 49 *(p++) = (idx == mi->max_tp_rate[0]) ? 'A' : ' ';
50 *(p++) = (idx == mi->max_tp_rate2) ? 't' : ' '; 50 *(p++) = (idx == mi->max_tp_rate[1]) ? 'B' : ' ';
51 *(p++) = (idx == mi->max_tp_rate[2]) ? 'C' : ' ';
52 *(p++) = (idx == mi->max_tp_rate[3]) ? 'D' : ' ';
51 *(p++) = (idx == mi->max_prob_rate) ? 'P' : ' '; 53 *(p++) = (idx == mi->max_prob_rate) ? 'P' : ' ';
52 54
53 if (i == max_mcs) { 55 if (i == max_mcs) {
@@ -100,8 +102,8 @@ minstrel_ht_stats_open(struct inode *inode, struct file *file)
100 102
101 file->private_data = ms; 103 file->private_data = ms;
102 p = ms->buf; 104 p = ms->buf;
103 p += sprintf(p, "type rate throughput ewma prob this prob " 105 p += sprintf(p, "type rate throughput ewma prob "
104 "retry this succ/attempt success attempts\n"); 106 "this prob retry this succ/attempt success attempts\n");
105 107
106 p = minstrel_ht_stats_dump(mi, max_mcs, p); 108 p = minstrel_ht_stats_dump(mi, max_mcs, p);
107 for (i = 0; i < max_mcs; i++) 109 for (i = 0; i < max_mcs; i++)
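
With four sorted throughput rates per station, the debugfs dump above marks them A-D (plus P for the best-probability rate) instead of the old T/t pair. A small sketch of how such a marker column can be built for each rate index; the array contents are made up:

	/* Sketch: build the A/B/C/D/P marker column for each rate index.
	 * Values are invented; only the marker logic mirrors the debugfs change.
	 */
	#include <stdio.h>
	#include <stdint.h>

	#define MAX_THR_RATES 4  /* assumed */

	int main(void)
	{
		uint8_t max_tp_rate[MAX_THR_RATES] = { 14, 7, 11, 3 };
		uint8_t max_prob_rate = 7;
		unsigned int idx;

		for (idx = 0; idx < 16; idx++) {
			char m[6];
			char *p = m;

			*(p++) = (idx == max_tp_rate[0]) ? 'A' : ' ';
			*(p++) = (idx == max_tp_rate[1]) ? 'B' : ' ';
			*(p++) = (idx == max_tp_rate[2]) ? 'C' : ' ';
			*(p++) = (idx == max_tp_rate[3]) ? 'D' : ' ';
			*(p++) = (idx == max_prob_rate) ? 'P' : ' ';
			*p = '\0';
			printf("rate %2u [%s]\n", idx, m);
		}
		return 0;
	}
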
diff --git a/net/mac80211/rx.c b/net/mac80211/rx.c
index bd2c9b22c945..b04ca4049c95 100644
--- a/net/mac80211/rx.c
+++ b/net/mac80211/rx.c
@@ -3,6 +3,7 @@
3 * Copyright 2005-2006, Devicescape Software, Inc. 3 * Copyright 2005-2006, Devicescape Software, Inc.
4 * Copyright 2006-2007 Jiri Benc <jbenc@suse.cz> 4 * Copyright 2006-2007 Jiri Benc <jbenc@suse.cz>
5 * Copyright 2007-2010 Johannes Berg <johannes@sipsolutions.net> 5 * Copyright 2007-2010 Johannes Berg <johannes@sipsolutions.net>
6 * Copyright 2013-2014 Intel Mobile Communications GmbH
6 * 7 *
7 * This program is free software; you can redistribute it and/or modify 8 * This program is free software; you can redistribute it and/or modify
8 * it under the terms of the GNU General Public License version 2 as 9 * it under the terms of the GNU General Public License version 2 as
@@ -835,6 +836,16 @@ static bool ieee80211_sta_manage_reorder_buf(struct ieee80211_sub_if_data *sdata
835 836
836 spin_lock(&tid_agg_rx->reorder_lock); 837 spin_lock(&tid_agg_rx->reorder_lock);
837 838
839 /*
840 * Offloaded BA sessions have no known starting sequence number so pick
841 * one from first Rxed frame for this tid after BA was started.
842 */
843 if (unlikely(tid_agg_rx->auto_seq)) {
844 tid_agg_rx->auto_seq = false;
845 tid_agg_rx->ssn = mpdu_seq_num;
846 tid_agg_rx->head_seq_num = mpdu_seq_num;
847 }
848
838 buf_size = tid_agg_rx->buf_size; 849 buf_size = tid_agg_rx->buf_size;
839 head_seq_num = tid_agg_rx->head_seq_num; 850 head_seq_num = tid_agg_rx->head_seq_num;
840 851
@@ -2725,7 +2736,7 @@ ieee80211_rx_h_userspace_mgmt(struct ieee80211_rx_data *rx)
2725 sig = status->signal; 2736 sig = status->signal;
2726 2737
2727 if (cfg80211_rx_mgmt(&rx->sdata->wdev, status->freq, sig, 2738 if (cfg80211_rx_mgmt(&rx->sdata->wdev, status->freq, sig,
2728 rx->skb->data, rx->skb->len, 0, GFP_ATOMIC)) { 2739 rx->skb->data, rx->skb->len, 0)) {
2729 if (rx->sta) 2740 if (rx->sta)
2730 rx->sta->rx_packets++; 2741 rx->sta->rx_packets++;
2731 dev_kfree_skb(rx->skb); 2742 dev_kfree_skb(rx->skb);
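
The reorder-buffer hunk above handles BA sessions set up by the device: since the host never saw the ADDBA exchange, the starting sequence number is unknown, so the first received MPDU for the TID seeds both ssn and head_seq_num. A standalone sketch of that one-shot initialization; the struct and field names here are simplified stand-ins, not the mac80211 types.

	/* Sketch: seed an offloaded BA session's sequence state from the first
	 * received MPDU. The struct below is a stand-in, not struct tid_ampdu_rx.
	 */
	#include <stdio.h>
	#include <stdbool.h>
	#include <stdint.h>

	struct reorder_state {
		bool auto_seq;          /* true until the first frame is seen */
		uint16_t ssn;           /* starting sequence number */
		uint16_t head_seq_num;  /* head of the reorder window */
	};

	static void handle_mpdu(struct reorder_state *st, uint16_t mpdu_seq_num)
	{
		if (st->auto_seq) {
			/* first frame after the offloaded session started */
			st->auto_seq = false;
			st->ssn = mpdu_seq_num;
			st->head_seq_num = mpdu_seq_num;
		}
		printf("seq %u, window head %u\n", mpdu_seq_num, st->head_seq_num);
	}

	int main(void)
	{
		struct reorder_state st = { .auto_seq = true };

		handle_mpdu(&st, 1234);  /* seeds ssn/head_seq_num */
		handle_mpdu(&st, 1235);  /* normal processing from here on */
		return 0;
	}
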
diff --git a/net/mac80211/scan.c b/net/mac80211/scan.c
index a0a938145dcc..af0d094b2f2f 100644
--- a/net/mac80211/scan.c
+++ b/net/mac80211/scan.c
@@ -6,6 +6,7 @@
6 * Copyright 2005, Devicescape Software, Inc. 6 * Copyright 2005, Devicescape Software, Inc.
7 * Copyright 2006-2007 Jiri Benc <jbenc@suse.cz> 7 * Copyright 2006-2007 Jiri Benc <jbenc@suse.cz>
8 * Copyright 2007, Michael Wu <flamingice@sourmilk.net> 8 * Copyright 2007, Michael Wu <flamingice@sourmilk.net>
9 * Copyright 2013-2014 Intel Mobile Communications GmbH
9 * 10 *
10 * This program is free software; you can redistribute it and/or modify 11 * This program is free software; you can redistribute it and/or modify
11 * it under the terms of the GNU General Public License version 2 as 12 * it under the terms of the GNU General Public License version 2 as
@@ -1094,7 +1095,7 @@ int ieee80211_request_sched_scan_stop(struct ieee80211_sub_if_data *sdata)
1094 if (rcu_access_pointer(local->sched_scan_sdata)) { 1095 if (rcu_access_pointer(local->sched_scan_sdata)) {
1095 ret = drv_sched_scan_stop(local, sdata); 1096 ret = drv_sched_scan_stop(local, sdata);
1096 if (!ret) 1097 if (!ret)
1097 rcu_assign_pointer(local->sched_scan_sdata, NULL); 1098 RCU_INIT_POINTER(local->sched_scan_sdata, NULL);
1098 } 1099 }
1099out: 1100out:
1100 mutex_unlock(&local->mtx); 1101 mutex_unlock(&local->mtx);
diff --git a/net/mac80211/sta_info.c b/net/mac80211/sta_info.c
index a1e433b88c66..de494df3bab8 100644
--- a/net/mac80211/sta_info.c
+++ b/net/mac80211/sta_info.c
@@ -1,6 +1,7 @@
1/* 1/*
2 * Copyright 2002-2005, Instant802 Networks, Inc. 2 * Copyright 2002-2005, Instant802 Networks, Inc.
3 * Copyright 2006-2007 Jiri Benc <jbenc@suse.cz> 3 * Copyright 2006-2007 Jiri Benc <jbenc@suse.cz>
4 * Copyright 2013-2014 Intel Mobile Communications GmbH
4 * 5 *
5 * This program is free software; you can redistribute it and/or modify 6 * This program is free software; you can redistribute it and/or modify
6 * it under the terms of the GNU General Public License version 2 as 7 * it under the terms of the GNU General Public License version 2 as
@@ -1182,7 +1183,7 @@ static void ieee80211_send_null_response(struct ieee80211_sub_if_data *sdata,
1182 struct sk_buff *skb; 1183 struct sk_buff *skb;
1183 int size = sizeof(*nullfunc); 1184 int size = sizeof(*nullfunc);
1184 __le16 fc; 1185 __le16 fc;
1185 bool qos = test_sta_flag(sta, WLAN_STA_WME); 1186 bool qos = sta->sta.wme;
1186 struct ieee80211_tx_info *info; 1187 struct ieee80211_tx_info *info;
1187 struct ieee80211_chanctx_conf *chanctx_conf; 1188 struct ieee80211_chanctx_conf *chanctx_conf;
1188 1189
@@ -1837,7 +1838,7 @@ void sta_set_sinfo(struct sta_info *sta, struct station_info *sinfo)
1837 sinfo->sta_flags.set |= BIT(NL80211_STA_FLAG_AUTHORIZED); 1838 sinfo->sta_flags.set |= BIT(NL80211_STA_FLAG_AUTHORIZED);
1838 if (test_sta_flag(sta, WLAN_STA_SHORT_PREAMBLE)) 1839 if (test_sta_flag(sta, WLAN_STA_SHORT_PREAMBLE))
1839 sinfo->sta_flags.set |= BIT(NL80211_STA_FLAG_SHORT_PREAMBLE); 1840 sinfo->sta_flags.set |= BIT(NL80211_STA_FLAG_SHORT_PREAMBLE);
1840 if (test_sta_flag(sta, WLAN_STA_WME)) 1841 if (sta->sta.wme)
1841 sinfo->sta_flags.set |= BIT(NL80211_STA_FLAG_WME); 1842 sinfo->sta_flags.set |= BIT(NL80211_STA_FLAG_WME);
1842 if (test_sta_flag(sta, WLAN_STA_MFP)) 1843 if (test_sta_flag(sta, WLAN_STA_MFP))
1843 sinfo->sta_flags.set |= BIT(NL80211_STA_FLAG_MFP); 1844 sinfo->sta_flags.set |= BIT(NL80211_STA_FLAG_MFP);
diff --git a/net/mac80211/sta_info.h b/net/mac80211/sta_info.h
index d411bcc8ef08..42f68cb8957e 100644
--- a/net/mac80211/sta_info.h
+++ b/net/mac80211/sta_info.h
@@ -1,5 +1,6 @@
1/* 1/*
2 * Copyright 2002-2005, Devicescape Software, Inc. 2 * Copyright 2002-2005, Devicescape Software, Inc.
3 * Copyright 2013-2014 Intel Mobile Communications GmbH
3 * 4 *
4 * This program is free software; you can redistribute it and/or modify 5 * This program is free software; you can redistribute it and/or modify
5 * it under the terms of the GNU General Public License version 2 as 6 * it under the terms of the GNU General Public License version 2 as
@@ -31,7 +32,6 @@
31 * when virtual port control is not in use. 32 * when virtual port control is not in use.
32 * @WLAN_STA_SHORT_PREAMBLE: Station is capable of receiving short-preamble 33 * @WLAN_STA_SHORT_PREAMBLE: Station is capable of receiving short-preamble
33 * frames. 34 * frames.
34 * @WLAN_STA_WME: Station is a QoS-STA.
35 * @WLAN_STA_WDS: Station is one of our WDS peers. 35 * @WLAN_STA_WDS: Station is one of our WDS peers.
36 * @WLAN_STA_CLEAR_PS_FILT: Clear PS filter in hardware (using the 36 * @WLAN_STA_CLEAR_PS_FILT: Clear PS filter in hardware (using the
37 * IEEE80211_TX_CTL_CLEAR_PS_FILT control flag) when the next 37 * IEEE80211_TX_CTL_CLEAR_PS_FILT control flag) when the next
@@ -69,7 +69,6 @@ enum ieee80211_sta_info_flags {
69 WLAN_STA_PS_STA, 69 WLAN_STA_PS_STA,
70 WLAN_STA_AUTHORIZED, 70 WLAN_STA_AUTHORIZED,
71 WLAN_STA_SHORT_PREAMBLE, 71 WLAN_STA_SHORT_PREAMBLE,
72 WLAN_STA_WME,
73 WLAN_STA_WDS, 72 WLAN_STA_WDS,
74 WLAN_STA_CLEAR_PS_FILT, 73 WLAN_STA_CLEAR_PS_FILT,
75 WLAN_STA_MFP, 74 WLAN_STA_MFP,
@@ -169,6 +168,8 @@ struct tid_ampdu_tx {
169 * @dialog_token: dialog token for aggregation session 168 * @dialog_token: dialog token for aggregation session
170 * @rcu_head: RCU head used for freeing this struct 169 * @rcu_head: RCU head used for freeing this struct
171 * @reorder_lock: serializes access to reorder buffer, see below. 170 * @reorder_lock: serializes access to reorder buffer, see below.
 171 * @auto_seq: used for offloaded BA sessions to automatically pick head_seq_num
172 * and ssn.
172 * 173 *
173 * This structure's lifetime is managed by RCU, assignments to 174 * This structure's lifetime is managed by RCU, assignments to
174 * the array holding it must hold the aggregation mutex. 175 * the array holding it must hold the aggregation mutex.
@@ -192,6 +193,7 @@ struct tid_ampdu_rx {
192 u16 buf_size; 193 u16 buf_size;
193 u16 timeout; 194 u16 timeout;
194 u8 dialog_token; 195 u8 dialog_token;
196 bool auto_seq;
195}; 197};
196 198
197/** 199/**
@@ -448,6 +450,9 @@ struct sta_info {
448 enum ieee80211_smps_mode known_smps_mode; 450 enum ieee80211_smps_mode known_smps_mode;
449 const struct ieee80211_cipher_scheme *cipher_scheme; 451 const struct ieee80211_cipher_scheme *cipher_scheme;
450 452
453 /* TDLS timeout data */
454 unsigned long last_tdls_pkt_time;
455
451 /* keep last! */ 456 /* keep last! */
452 struct ieee80211_sta sta; 457 struct ieee80211_sta sta;
453}; 458};
diff --git a/net/mac80211/status.c b/net/mac80211/status.c
index aa06dcad336e..89290e33dafe 100644
--- a/net/mac80211/status.c
+++ b/net/mac80211/status.c
@@ -3,6 +3,7 @@
3 * Copyright 2005-2006, Devicescape Software, Inc. 3 * Copyright 2005-2006, Devicescape Software, Inc.
4 * Copyright 2006-2007 Jiri Benc <jbenc@suse.cz> 4 * Copyright 2006-2007 Jiri Benc <jbenc@suse.cz>
5 * Copyright 2008-2010 Johannes Berg <johannes@sipsolutions.net> 5 * Copyright 2008-2010 Johannes Berg <johannes@sipsolutions.net>
6 * Copyright 2013-2014 Intel Mobile Communications GmbH
6 * 7 *
7 * This program is free software; you can redistribute it and/or modify 8 * This program is free software; you can redistribute it and/or modify
8 * it under the terms of the GNU General Public License version 2 as 9 * it under the terms of the GNU General Public License version 2 as
@@ -537,6 +538,8 @@ static void ieee80211_tx_latency_end_msrmnt(struct ieee80211_local *local,
537 * - current throughput (higher value for higher tpt)? 538 * - current throughput (higher value for higher tpt)?
538 */ 539 */
539#define STA_LOST_PKT_THRESHOLD 50 540#define STA_LOST_PKT_THRESHOLD 50
541#define STA_LOST_TDLS_PKT_THRESHOLD 10
542#define STA_LOST_TDLS_PKT_TIME (10*HZ) /* 10secs since last ACK */
540 543
541static void ieee80211_lost_packet(struct sta_info *sta, struct sk_buff *skb) 544static void ieee80211_lost_packet(struct sta_info *sta, struct sk_buff *skb)
542{ 545{
@@ -547,7 +550,20 @@ static void ieee80211_lost_packet(struct sta_info *sta, struct sk_buff *skb)
547 !(info->flags & IEEE80211_TX_STAT_AMPDU)) 550 !(info->flags & IEEE80211_TX_STAT_AMPDU))
548 return; 551 return;
549 552
550 if (++sta->lost_packets < STA_LOST_PKT_THRESHOLD) 553 sta->lost_packets++;
554 if (!sta->sta.tdls && sta->lost_packets < STA_LOST_PKT_THRESHOLD)
555 return;
556
557 /*
558 * If we're in TDLS mode, make sure that all STA_LOST_TDLS_PKT_THRESHOLD
559 * of the last packets were lost, and that no ACK was received in the
560 * last STA_LOST_TDLS_PKT_TIME ms, before triggering the CQM packet-loss
561 * mechanism.
562 */
563 if (sta->sta.tdls &&
564 (sta->lost_packets < STA_LOST_TDLS_PKT_THRESHOLD ||
565 time_before(jiffies,
566 sta->last_tdls_pkt_time + STA_LOST_TDLS_PKT_TIME)))
551 return; 567 return;
552 568
553 cfg80211_cqm_pktloss_notify(sta->sdata->dev, sta->sta.addr, 569 cfg80211_cqm_pktloss_notify(sta->sdata->dev, sta->sta.addr,
@@ -694,6 +710,10 @@ void ieee80211_tx_status(struct ieee80211_hw *hw, struct sk_buff *skb)
694 if (info->flags & IEEE80211_TX_STAT_ACK) { 710 if (info->flags & IEEE80211_TX_STAT_ACK) {
695 if (sta->lost_packets) 711 if (sta->lost_packets)
696 sta->lost_packets = 0; 712 sta->lost_packets = 0;
713
714 /* Track when last TDLS packet was ACKed */
715 if (test_sta_flag(sta, WLAN_STA_TDLS_PEER_AUTH))
716 sta->last_tdls_pkt_time = jiffies;
697 } else { 717 } else {
698 ieee80211_lost_packet(sta, skb); 718 ieee80211_lost_packet(sta, skb);
699 } 719 }
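
For TDLS peers the status path above only raises the CQM packet-loss event once two conditions hold at the same time: at least STA_LOST_TDLS_PKT_THRESHOLD consecutive losses, and no ACK within STA_LOST_TDLS_PKT_TIME of the last acknowledged frame. A small userspace sketch of that decision, with time modeled in plain seconds instead of jiffies:

	/* Sketch of the TDLS packet-loss gating added above.
	 * Time is in seconds here instead of jiffies; the thresholds match
	 * the #defines in the hunk.
	 */
	#include <stdio.h>
	#include <stdbool.h>

	#define STA_LOST_PKT_THRESHOLD       50
	#define STA_LOST_TDLS_PKT_THRESHOLD  10
	#define STA_LOST_TDLS_PKT_TIME       10  /* seconds since last ACK */

	static bool should_notify(bool tdls, unsigned int lost_packets,
				  long now, long last_tdls_ack)
	{
		if (!tdls)
			return lost_packets >= STA_LOST_PKT_THRESHOLD;

		/* TDLS: need both enough consecutive losses and a stale last ACK */
		return lost_packets >= STA_LOST_TDLS_PKT_THRESHOLD &&
		       now - last_tdls_ack >= STA_LOST_TDLS_PKT_TIME;
	}

	int main(void)
	{
		printf("TDLS, 12 lost, ACK 3s ago:  %d\n",
		       should_notify(true, 12, 100, 97));   /* 0: ACK too recent */
		printf("TDLS, 12 lost, ACK 15s ago: %d\n",
		       should_notify(true, 12, 100, 85));   /* 1: notify */
		printf("infra, 12 lost:             %d\n",
		       should_notify(false, 12, 0, 0));     /* 0: below 50 */
		return 0;
	}
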
diff --git a/net/mac80211/tdls.c b/net/mac80211/tdls.c
index 1b21050be174..4ea25dec0698 100644
--- a/net/mac80211/tdls.c
+++ b/net/mac80211/tdls.c
@@ -3,6 +3,7 @@
3 * 3 *
4 * Copyright 2006-2010 Johannes Berg <johannes@sipsolutions.net> 4 * Copyright 2006-2010 Johannes Berg <johannes@sipsolutions.net>
5 * Copyright 2014, Intel Corporation 5 * Copyright 2014, Intel Corporation
6 * Copyright 2014 Intel Mobile Communications GmbH
6 * 7 *
7 * This file is GPLv2 as found in COPYING. 8 * This file is GPLv2 as found in COPYING.
8 */ 9 */
@@ -316,8 +317,7 @@ ieee80211_tdls_add_setup_cfm_ies(struct ieee80211_sub_if_data *sdata,
316 } 317 }
317 318
318 /* add the QoS param IE if both the peer and we support it */ 319 /* add the QoS param IE if both the peer and we support it */
319 if (local->hw.queues >= IEEE80211_NUM_ACS && 320 if (local->hw.queues >= IEEE80211_NUM_ACS && sta->sta.wme)
320 test_sta_flag(sta, WLAN_STA_WME))
321 ieee80211_tdls_add_wmm_param_ie(sdata, skb); 321 ieee80211_tdls_add_wmm_param_ie(sdata, skb);
322 322
323 /* add any custom IEs that go before HT operation */ 323 /* add any custom IEs that go before HT operation */
@@ -412,6 +412,9 @@ ieee80211_prep_tdls_encap_data(struct wiphy *wiphy, struct net_device *dev,
412 tf->ether_type = cpu_to_be16(ETH_P_TDLS); 412 tf->ether_type = cpu_to_be16(ETH_P_TDLS);
413 tf->payload_type = WLAN_TDLS_SNAP_RFTYPE; 413 tf->payload_type = WLAN_TDLS_SNAP_RFTYPE;
414 414
415 /* network header is after the ethernet header */
416 skb_set_network_header(skb, ETH_HLEN);
417
415 switch (action_code) { 418 switch (action_code) {
416 case WLAN_TDLS_SETUP_REQUEST: 419 case WLAN_TDLS_SETUP_REQUEST:
417 tf->category = WLAN_CATEGORY_TDLS; 420 tf->category = WLAN_CATEGORY_TDLS;
diff --git a/net/mac80211/trace.h b/net/mac80211/trace.h
index 02ac535d1274..38fae7ebe984 100644
--- a/net/mac80211/trace.h
+++ b/net/mac80211/trace.h
@@ -672,13 +672,13 @@ DEFINE_EVENT(local_u32_evt, drv_set_rts_threshold,
672); 672);
673 673
674TRACE_EVENT(drv_set_coverage_class, 674TRACE_EVENT(drv_set_coverage_class,
675 TP_PROTO(struct ieee80211_local *local, u8 value), 675 TP_PROTO(struct ieee80211_local *local, s16 value),
676 676
677 TP_ARGS(local, value), 677 TP_ARGS(local, value),
678 678
679 TP_STRUCT__entry( 679 TP_STRUCT__entry(
680 LOCAL_ENTRY 680 LOCAL_ENTRY
681 __field(u8, value) 681 __field(s16, value)
682 ), 682 ),
683 683
684 TP_fast_assign( 684 TP_fast_assign(
diff --git a/net/mac80211/tx.c b/net/mac80211/tx.c
index 464106c023d8..900632a250ec 100644
--- a/net/mac80211/tx.c
+++ b/net/mac80211/tx.c
@@ -3,6 +3,7 @@
3 * Copyright 2005-2006, Devicescape Software, Inc. 3 * Copyright 2005-2006, Devicescape Software, Inc.
4 * Copyright 2006-2007 Jiri Benc <jbenc@suse.cz> 4 * Copyright 2006-2007 Jiri Benc <jbenc@suse.cz>
5 * Copyright 2007 Johannes Berg <johannes@sipsolutions.net> 5 * Copyright 2007 Johannes Berg <johannes@sipsolutions.net>
6 * Copyright 2013-2014 Intel Mobile Communications GmbH
6 * 7 *
7 * This program is free software; you can redistribute it and/or modify 8 * This program is free software; you can redistribute it and/or modify
8 * it under the terms of the GNU General Public License version 2 as 9 * it under the terms of the GNU General Public License version 2 as
@@ -1478,7 +1479,10 @@ static int ieee80211_skb_resize(struct ieee80211_sub_if_data *sdata,
1478 tail_need = max_t(int, tail_need, 0); 1479 tail_need = max_t(int, tail_need, 0);
1479 } 1480 }
1480 1481
1481 if (skb_cloned(skb)) 1482 if (skb_cloned(skb) &&
1483 (!(local->hw.flags & IEEE80211_HW_SUPPORTS_CLONED_SKBS) ||
1484 !skb_clone_writable(skb, ETH_HLEN) ||
1485 sdata->crypto_tx_tailroom_needed_cnt))
1482 I802_DEBUG_INC(local->tx_expand_skb_head_cloned); 1486 I802_DEBUG_INC(local->tx_expand_skb_head_cloned);
1483 else if (head_need || tail_need) 1487 else if (head_need || tail_need)
1484 I802_DEBUG_INC(local->tx_expand_skb_head); 1488 I802_DEBUG_INC(local->tx_expand_skb_head);
@@ -1785,9 +1789,8 @@ static void ieee80211_tx_latency_start_msrmnt(struct ieee80211_local *local,
1785 * @skb: packet to be sent 1789 * @skb: packet to be sent
1786 * @dev: incoming interface 1790 * @dev: incoming interface
1787 * 1791 *
1788 * Returns: 0 on success (and frees skb in this case) or 1 on failure (skb will 1792 * Returns: NETDEV_TX_OK both on success and on failure. On failure skb will
1789 * not be freed, and caller is responsible for either retrying later or freeing 1793 * be freed.
1790 * skb).
1791 * 1794 *
1792 * This function takes in an Ethernet header and encapsulates it with suitable 1795 * This function takes in an Ethernet header and encapsulates it with suitable
1793 * IEEE 802.11 header based on which interface the packet is coming in. The 1796 * IEEE 802.11 header based on which interface the packet is coming in. The
@@ -1844,7 +1847,7 @@ netdev_tx_t ieee80211_subif_start_xmit(struct sk_buff *skb,
1844 memcpy(hdr.addr4, skb->data + ETH_ALEN, ETH_ALEN); 1847 memcpy(hdr.addr4, skb->data + ETH_ALEN, ETH_ALEN);
1845 hdrlen = 30; 1848 hdrlen = 30;
1846 authorized = test_sta_flag(sta, WLAN_STA_AUTHORIZED); 1849 authorized = test_sta_flag(sta, WLAN_STA_AUTHORIZED);
1847 wme_sta = test_sta_flag(sta, WLAN_STA_WME); 1850 wme_sta = sta->sta.wme;
1848 } 1851 }
1849 ap_sdata = container_of(sdata->bss, struct ieee80211_sub_if_data, 1852 ap_sdata = container_of(sdata->bss, struct ieee80211_sub_if_data,
1850 u.ap); 1853 u.ap);
@@ -1957,7 +1960,7 @@ netdev_tx_t ieee80211_subif_start_xmit(struct sk_buff *skb,
1957 if (sta) { 1960 if (sta) {
1958 authorized = test_sta_flag(sta, 1961 authorized = test_sta_flag(sta,
1959 WLAN_STA_AUTHORIZED); 1962 WLAN_STA_AUTHORIZED);
1960 wme_sta = test_sta_flag(sta, WLAN_STA_WME); 1963 wme_sta = sta->sta.wme;
1961 tdls_peer = test_sta_flag(sta, 1964 tdls_peer = test_sta_flag(sta,
1962 WLAN_STA_TDLS_PEER); 1965 WLAN_STA_TDLS_PEER);
1963 tdls_auth = test_sta_flag(sta, 1966 tdls_auth = test_sta_flag(sta,
@@ -2035,7 +2038,7 @@ netdev_tx_t ieee80211_subif_start_xmit(struct sk_buff *skb,
2035 sta = sta_info_get(sdata, hdr.addr1); 2038 sta = sta_info_get(sdata, hdr.addr1);
2036 if (sta) { 2039 if (sta) {
2037 authorized = test_sta_flag(sta, WLAN_STA_AUTHORIZED); 2040 authorized = test_sta_flag(sta, WLAN_STA_AUTHORIZED);
2038 wme_sta = test_sta_flag(sta, WLAN_STA_WME); 2041 wme_sta = sta->sta.wme;
2039 } 2042 }
2040 } 2043 }
2041 2044
@@ -2069,30 +2072,23 @@ netdev_tx_t ieee80211_subif_start_xmit(struct sk_buff *skb,
2069 2072
2070 if (unlikely(!multicast && skb->sk && 2073 if (unlikely(!multicast && skb->sk &&
2071 skb_shinfo(skb)->tx_flags & SKBTX_WIFI_STATUS)) { 2074 skb_shinfo(skb)->tx_flags & SKBTX_WIFI_STATUS)) {
2072 struct sk_buff *orig_skb = skb; 2075 struct sk_buff *ack_skb = skb_clone_sk(skb);
2073 2076
2074 skb = skb_clone(skb, GFP_ATOMIC); 2077 if (ack_skb) {
2075 if (skb) {
2076 unsigned long flags; 2078 unsigned long flags;
2077 int id; 2079 int id;
2078 2080
2079 spin_lock_irqsave(&local->ack_status_lock, flags); 2081 spin_lock_irqsave(&local->ack_status_lock, flags);
2080 id = idr_alloc(&local->ack_status_frames, orig_skb, 2082 id = idr_alloc(&local->ack_status_frames, ack_skb,
2081 1, 0x10000, GFP_ATOMIC); 2083 1, 0x10000, GFP_ATOMIC);
2082 spin_unlock_irqrestore(&local->ack_status_lock, flags); 2084 spin_unlock_irqrestore(&local->ack_status_lock, flags);
2083 2085
2084 if (id >= 0) { 2086 if (id >= 0) {
2085 info_id = id; 2087 info_id = id;
2086 info_flags |= IEEE80211_TX_CTL_REQ_TX_STATUS; 2088 info_flags |= IEEE80211_TX_CTL_REQ_TX_STATUS;
2087 } else if (skb_shared(skb)) {
2088 kfree_skb(orig_skb);
2089 } else { 2089 } else {
2090 kfree_skb(skb); 2090 kfree_skb(ack_skb);
2091 skb = orig_skb;
2092 } 2091 }
2093 } else {
2094 /* couldn't clone -- lose tx status ... */
2095 skb = orig_skb;
2096 } 2092 }
2097 } 2093 }
2098 2094
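
The TX-status hunk above replaces the open-coded clone/orig_skb juggling with skb_clone_sk(): the clone keeps the socket reference for the later status report, it is stashed in an IDR under a 16-bit cookie, and only on successful ID allocation does the frame get IEEE80211_TX_CTL_REQ_TX_STATUS. The sketch below mimics that control flow with a trivial fixed-size ID table standing in for the IDR; it illustrates the flow, not the kernel APIs.

	/* Sketch of the ack-status bookkeeping: clone, allocate an id, mark the
	 * frame on success, drop the clone on failure.  The id table and "skb"
	 * type are stand-ins for the kernel's IDR and sk_buff.
	 */
	#include <stdio.h>
	#include <stdlib.h>
	#include <string.h>

	#define MAX_IDS 8  /* tiny stand-in for the 1..0xFFFF IDR range */

	struct fake_skb { char payload[32]; };

	static struct fake_skb *id_table[MAX_IDS];

	static int id_alloc(struct fake_skb *skb)
	{
		for (int id = 1; id < MAX_IDS; id++) {
			if (!id_table[id]) {
				id_table[id] = skb;
				return id;
			}
		}
		return -1;  /* table full, like idr_alloc() failing */
	}

	int main(void)
	{
		struct fake_skb tx = { .payload = "frame" };
		int req_tx_status = 0;

		/* "clone" the frame so the original can go to the driver */
		struct fake_skb *ack_skb = malloc(sizeof(*ack_skb));
		if (ack_skb) {
			memcpy(ack_skb, &tx, sizeof(tx));

			int id = id_alloc(ack_skb);
			if (id >= 0)
				req_tx_status = 1;   /* status reported later */
			else
				free(ack_skb);       /* no id: lose tx status */
			printf("id=%d req_tx_status=%d\n", id, req_tx_status);
		}
		return 0;
	}
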
diff --git a/net/mac80211/util.c b/net/mac80211/util.c
index 725af7a468d2..3c61060a4d2b 100644
--- a/net/mac80211/util.c
+++ b/net/mac80211/util.c
@@ -3,6 +3,7 @@
3 * Copyright 2005-2006, Devicescape Software, Inc. 3 * Copyright 2005-2006, Devicescape Software, Inc.
4 * Copyright 2006-2007 Jiri Benc <jbenc@suse.cz> 4 * Copyright 2006-2007 Jiri Benc <jbenc@suse.cz>
5 * Copyright 2007 Johannes Berg <johannes@sipsolutions.net> 5 * Copyright 2007 Johannes Berg <johannes@sipsolutions.net>
6 * Copyright 2013-2014 Intel Mobile Communications GmbH
6 * 7 *
7 * This program is free software; you can redistribute it and/or modify 8 * This program is free software; you can redistribute it and/or modify
8 * it under the terms of the GNU General Public License version 2 as 9 * it under the terms of the GNU General Public License version 2 as
@@ -1014,6 +1015,31 @@ u32 ieee802_11_parse_elems_crc(const u8 *start, size_t len, bool action,
1014 } 1015 }
1015 elems->pwr_constr_elem = pos; 1016 elems->pwr_constr_elem = pos;
1016 break; 1017 break;
1018 case WLAN_EID_CISCO_VENDOR_SPECIFIC:
1019 /* Lots of different options exist, but we only care
1020 * about the Dynamic Transmit Power Control element.
1021 * First check for the Cisco OUI, then for the DTPC
1022 * tag (0x00).
1023 */
1024 if (elen < 4) {
1025 elem_parse_failed = true;
1026 break;
1027 }
1028
1029 if (pos[0] != 0x00 || pos[1] != 0x40 ||
1030 pos[2] != 0x96 || pos[3] != 0x00)
1031 break;
1032
1033 if (elen != 6) {
1034 elem_parse_failed = true;
1035 break;
1036 }
1037
1038 if (calc_crc)
1039 crc = crc32_be(crc, pos - 2, elen + 2);
1040
1041 elems->cisco_dtpc_elem = pos;
1042 break;
1017 case WLAN_EID_TIMEOUT_INTERVAL: 1043 case WLAN_EID_TIMEOUT_INTERVAL:
1018 if (elen >= sizeof(struct ieee80211_timeout_interval_ie)) 1044 if (elen >= sizeof(struct ieee80211_timeout_interval_ie))
1019 elems->timeout_int = (void *)pos; 1045 elems->timeout_int = (void *)pos;
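
The new WLAN_EID_CISCO_VENDOR_SPECIFIC case only accepts the Dynamic Transmit Power Control flavour of Cisco's vendor element: at least 4 bytes, OUI 00:40:96 followed by tag 0x00, and then exactly 6 bytes total. Below is a standalone validator with the same checks; the element payloads are made-up test vectors.

	/* Standalone version of the DTPC vendor-element checks added above.
	 * Returns 1 for a valid DTPC element, 0 to ignore, -1 for a parse error.
	 */
	#include <stdio.h>
	#include <stdint.h>
	#include <stddef.h>

	static int check_cisco_dtpc(const uint8_t *pos, size_t elen)
	{
		if (elen < 4)
			return -1;               /* too short for OUI + tag */

		if (pos[0] != 0x00 || pos[1] != 0x40 ||
		    pos[2] != 0x96 || pos[3] != 0x00)
			return 0;                /* not Cisco DTPC: ignore */

		if (elen != 6)
			return -1;               /* DTPC must be exactly 6 bytes */

		return 1;                        /* valid DTPC element */
	}

	int main(void)
	{
		const uint8_t dtpc[6]  = { 0x00, 0x40, 0x96, 0x00, 0x11, 0x00 };
		const uint8_t other[5] = { 0x00, 0x50, 0xf2, 0x02, 0x00 };
		const uint8_t runt[3]  = { 0x00, 0x40, 0x96 };

		printf("dtpc:  %d\n", check_cisco_dtpc(dtpc, sizeof(dtpc)));    /* 1 */
		printf("other: %d\n", check_cisco_dtpc(other, sizeof(other)));  /* 0 */
		printf("runt:  %d\n", check_cisco_dtpc(runt, sizeof(runt)));    /* -1 */
		return 0;
	}
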
diff --git a/net/mac80211/wme.c b/net/mac80211/wme.c
index d51422c778de..3b873989992c 100644
--- a/net/mac80211/wme.c
+++ b/net/mac80211/wme.c
@@ -1,5 +1,6 @@
1/* 1/*
2 * Copyright 2004, Instant802 Networks, Inc. 2 * Copyright 2004, Instant802 Networks, Inc.
3 * Copyright 2013-2014 Intel Mobile Communications GmbH
3 * 4 *
4 * This program is free software; you can redistribute it and/or modify 5 * This program is free software; you can redistribute it and/or modify
5 * it under the terms of the GNU General Public License version 2 as 6 * it under the terms of the GNU General Public License version 2 as
@@ -118,7 +119,7 @@ u16 ieee80211_select_queue(struct ieee80211_sub_if_data *sdata,
118 case NL80211_IFTYPE_AP_VLAN: 119 case NL80211_IFTYPE_AP_VLAN:
119 sta = rcu_dereference(sdata->u.vlan.sta); 120 sta = rcu_dereference(sdata->u.vlan.sta);
120 if (sta) { 121 if (sta) {
121 qos = test_sta_flag(sta, WLAN_STA_WME); 122 qos = sta->sta.wme;
122 break; 123 break;
123 } 124 }
124 case NL80211_IFTYPE_AP: 125 case NL80211_IFTYPE_AP:
@@ -145,7 +146,7 @@ u16 ieee80211_select_queue(struct ieee80211_sub_if_data *sdata,
145 if (!sta && ra && !is_multicast_ether_addr(ra)) { 146 if (!sta && ra && !is_multicast_ether_addr(ra)) {
146 sta = sta_info_get(sdata, ra); 147 sta = sta_info_get(sdata, ra);
147 if (sta) 148 if (sta)
148 qos = test_sta_flag(sta, WLAN_STA_WME); 149 qos = sta->sta.wme;
149 } 150 }
150 rcu_read_unlock(); 151 rcu_read_unlock();
151 152
diff --git a/net/mac80211/wpa.c b/net/mac80211/wpa.c
index f7d4ca4c46e0..983527a4c1ab 100644
--- a/net/mac80211/wpa.c
+++ b/net/mac80211/wpa.c
@@ -64,8 +64,11 @@ ieee80211_tx_h_michael_mic_add(struct ieee80211_tx_data *tx)
64 if (!info->control.hw_key) 64 if (!info->control.hw_key)
65 tail += IEEE80211_TKIP_ICV_LEN; 65 tail += IEEE80211_TKIP_ICV_LEN;
66 66
67 if (WARN_ON(skb_tailroom(skb) < tail || 67 if (WARN(skb_tailroom(skb) < tail ||
68 skb_headroom(skb) < IEEE80211_TKIP_IV_LEN)) 68 skb_headroom(skb) < IEEE80211_TKIP_IV_LEN,
69 "mmic: not enough head/tail (%d/%d,%d/%d)\n",
70 skb_headroom(skb), IEEE80211_TKIP_IV_LEN,
71 skb_tailroom(skb), tail))
69 return TX_DROP; 72 return TX_DROP;
70 73
71 key = &tx->key->conf.key[NL80211_TKIP_DATA_OFFSET_TX_MIC_KEY]; 74 key = &tx->key->conf.key[NL80211_TKIP_DATA_OFFSET_TX_MIC_KEY];
diff --git a/net/mac802154/rx.c b/net/mac802154/rx.c
index 7f820a108a9c..a14cf9ede171 100644
--- a/net/mac802154/rx.c
+++ b/net/mac802154/rx.c
@@ -86,9 +86,8 @@ fail:
86static void mac802154_rx_worker(struct work_struct *work) 86static void mac802154_rx_worker(struct work_struct *work)
87{ 87{
88 struct rx_work *rw = container_of(work, struct rx_work, work); 88 struct rx_work *rw = container_of(work, struct rx_work, work);
89 struct sk_buff *skb = rw->skb;
90 89
91 mac802154_subif_rx(rw->dev, skb, rw->lqi); 90 mac802154_subif_rx(rw->dev, rw->skb, rw->lqi);
92 kfree(rw); 91 kfree(rw);
93} 92}
94 93
@@ -101,7 +100,7 @@ ieee802154_rx_irqsafe(struct ieee802154_dev *dev, struct sk_buff *skb, u8 lqi)
101 if (!skb) 100 if (!skb)
102 return; 101 return;
103 102
104 work = kzalloc(sizeof(struct rx_work), GFP_ATOMIC); 103 work = kzalloc(sizeof(*work), GFP_ATOMIC);
105 if (!work) 104 if (!work)
106 return; 105 return;
107 106
diff --git a/net/mac802154/tx.c b/net/mac802154/tx.c
index 8124353646ae..fdf4c0e67259 100644
--- a/net/mac802154/tx.c
+++ b/net/mac802154/tx.c
@@ -89,8 +89,7 @@ netdev_tx_t mac802154_tx(struct mac802154_priv *priv, struct sk_buff *skb,
89 89
90 if (!(priv->phy->channels_supported[page] & (1 << chan))) { 90 if (!(priv->phy->channels_supported[page] & (1 << chan))) {
91 WARN_ON(1); 91 WARN_ON(1);
92 kfree_skb(skb); 92 goto err_tx;
93 return NETDEV_TX_OK;
94 } 93 }
95 94
96 mac802154_monitors_rx(mac802154_to_priv(&priv->hw), skb); 95 mac802154_monitors_rx(mac802154_to_priv(&priv->hw), skb);
@@ -103,12 +102,10 @@ netdev_tx_t mac802154_tx(struct mac802154_priv *priv, struct sk_buff *skb,
103 data[1] = crc >> 8; 102 data[1] = crc >> 8;
104 } 103 }
105 104
106 if (skb_cow_head(skb, priv->hw.extra_tx_headroom)) { 105 if (skb_cow_head(skb, priv->hw.extra_tx_headroom))
107 kfree_skb(skb); 106 goto err_tx;
108 return NETDEV_TX_OK;
109 }
110 107
111 work = kzalloc(sizeof(struct xmit_work), GFP_ATOMIC); 108 work = kzalloc(sizeof(*work), GFP_ATOMIC);
112 if (!work) { 109 if (!work) {
113 kfree_skb(skb); 110 kfree_skb(skb);
114 return NETDEV_TX_BUSY; 111 return NETDEV_TX_BUSY;
@@ -129,4 +126,8 @@ netdev_tx_t mac802154_tx(struct mac802154_priv *priv, struct sk_buff *skb,
129 queue_work(priv->dev_workqueue, &work->work); 126 queue_work(priv->dev_workqueue, &work->work);
130 127
131 return NETDEV_TX_OK; 128 return NETDEV_TX_OK;
129
130err_tx:
131 kfree_skb(skb);
132 return NETDEV_TX_OK;
132} 133}
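
The mac802154 TX path above folds its duplicated kfree_skb()/return pairs into a single err_tx label, the usual kernel idiom for one error exit. A minimal illustration of the same restructuring on a generic transmit-like function; the names here are generic stand-ins, not the mac802154 ones.

	/* Sketch: consolidate repeated "free and return" error exits behind one
	 * label, as done for mac802154_tx() above.  Names are illustrative only.
	 */
	#include <stdio.h>
	#include <stdlib.h>

	struct buf { void *data; };

	static int channel_ok(int chan)         { return chan >= 0 && chan < 16; }
	static int grow_headroom(struct buf *b) { return b->data ? 0 : -1; }

	static int do_tx(struct buf *b, int chan)
	{
		if (!channel_ok(chan))
			goto err_tx;           /* was: free(b); return 0; */

		if (grow_headroom(b))
			goto err_tx;           /* was: free(b); return 0; */

		printf("queued on channel %d\n", chan);
		free(b->data);
		free(b);
		return 0;

	err_tx:
		free(b->data);
		free(b);
		return 0;                      /* caller always sees "consumed" */
	}

	int main(void)
	{
		struct buf *b = malloc(sizeof(*b));
		if (!b)
			return 1;
		b->data = malloc(16);
		return do_tx(b, 42);           /* invalid channel: takes err_tx */
	}
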
diff --git a/net/mac802154/wpan.c b/net/mac802154/wpan.c
index d593500ceb3c..4ab86a57dca5 100644
--- a/net/mac802154/wpan.c
+++ b/net/mac802154/wpan.c
@@ -475,8 +475,7 @@ mac802154_subif_frame(struct mac802154_sub_if_data *sdata, struct sk_buff *skb,
475 rc = mac802154_llsec_decrypt(&sdata->sec, skb); 475 rc = mac802154_llsec_decrypt(&sdata->sec, skb);
476 if (rc) { 476 if (rc) {
477 pr_debug("decryption failed: %i\n", rc); 477 pr_debug("decryption failed: %i\n", rc);
478 kfree_skb(skb); 478 goto fail;
479 return NET_RX_DROP;
480 } 479 }
481 480
482 sdata->dev->stats.rx_packets++; 481 sdata->dev->stats.rx_packets++;
@@ -488,9 +487,12 @@ mac802154_subif_frame(struct mac802154_sub_if_data *sdata, struct sk_buff *skb,
488 default: 487 default:
489 pr_warn("ieee802154: bad frame received (type = %d)\n", 488 pr_warn("ieee802154: bad frame received (type = %d)\n",
490 mac_cb(skb)->type); 489 mac_cb(skb)->type);
491 kfree_skb(skb); 490 goto fail;
492 return NET_RX_DROP;
493 } 491 }
492
493fail:
494 kfree_skb(skb);
495 return NET_RX_DROP;
494} 496}
495 497
496static void mac802154_print_addr(const char *name, 498static void mac802154_print_addr(const char *name,
diff --git a/net/mpls/mpls_gso.c b/net/mpls/mpls_gso.c
index 6b38d083e1c9..e28ed2ef5b06 100644
--- a/net/mpls/mpls_gso.c
+++ b/net/mpls/mpls_gso.c
@@ -65,15 +65,9 @@ out:
65 return segs; 65 return segs;
66} 66}
67 67
68static int mpls_gso_send_check(struct sk_buff *skb)
69{
70 return 0;
71}
72
73static struct packet_offload mpls_mc_offload = { 68static struct packet_offload mpls_mc_offload = {
74 .type = cpu_to_be16(ETH_P_MPLS_MC), 69 .type = cpu_to_be16(ETH_P_MPLS_MC),
75 .callbacks = { 70 .callbacks = {
76 .gso_send_check = mpls_gso_send_check,
77 .gso_segment = mpls_gso_segment, 71 .gso_segment = mpls_gso_segment,
78 }, 72 },
79}; 73};
@@ -81,7 +75,6 @@ static struct packet_offload mpls_mc_offload = {
81static struct packet_offload mpls_uc_offload = { 75static struct packet_offload mpls_uc_offload = {
82 .type = cpu_to_be16(ETH_P_MPLS_UC), 76 .type = cpu_to_be16(ETH_P_MPLS_UC),
83 .callbacks = { 77 .callbacks = {
84 .gso_send_check = mpls_gso_send_check,
85 .gso_segment = mpls_gso_segment, 78 .gso_segment = mpls_gso_segment,
86 }, 79 },
87}; 80};
diff --git a/net/netfilter/Kconfig b/net/netfilter/Kconfig
index 6d77cce481d5..ae5096ab65eb 100644
--- a/net/netfilter/Kconfig
+++ b/net/netfilter/Kconfig
@@ -496,6 +496,15 @@ config NFT_LIMIT
496 This option adds the "limit" expression that you can use to 496 This option adds the "limit" expression that you can use to
497 ratelimit rule matchings. 497 ratelimit rule matchings.
498 498
499config NFT_MASQ
500 depends on NF_TABLES
501 depends on NF_CONNTRACK
502 depends on NF_NAT
503 tristate "Netfilter nf_tables masquerade support"
504 help
505 This option adds the "masquerade" expression that you can use
506 to perform NAT in the masquerade flavour.
507
499config NFT_NAT 508config NFT_NAT
500 depends on NF_TABLES 509 depends on NF_TABLES
501 depends on NF_CONNTRACK 510 depends on NF_CONNTRACK
diff --git a/net/netfilter/Makefile b/net/netfilter/Makefile
index fad5fdba34e5..a9571be3f791 100644
--- a/net/netfilter/Makefile
+++ b/net/netfilter/Makefile
@@ -87,6 +87,7 @@ obj-$(CONFIG_NFT_RBTREE) += nft_rbtree.o
87obj-$(CONFIG_NFT_HASH) += nft_hash.o 87obj-$(CONFIG_NFT_HASH) += nft_hash.o
88obj-$(CONFIG_NFT_COUNTER) += nft_counter.o 88obj-$(CONFIG_NFT_COUNTER) += nft_counter.o
89obj-$(CONFIG_NFT_LOG) += nft_log.o 89obj-$(CONFIG_NFT_LOG) += nft_log.o
90obj-$(CONFIG_NFT_MASQ) += nft_masq.o
90 91
91# generic X tables 92# generic X tables
92obj-$(CONFIG_NETFILTER_XTABLES) += x_tables.o xt_tcpudp.o 93obj-$(CONFIG_NETFILTER_XTABLES) += x_tables.o xt_tcpudp.o
diff --git a/net/netfilter/ipset/Kconfig b/net/netfilter/ipset/Kconfig
index 2f7f5c32c6f9..234a8ec82076 100644
--- a/net/netfilter/ipset/Kconfig
+++ b/net/netfilter/ipset/Kconfig
@@ -99,6 +99,15 @@ config IP_SET_HASH_IPPORTNET
99 99
100 To compile it as a module, choose M here. If unsure, say N. 100 To compile it as a module, choose M here. If unsure, say N.
101 101
102config IP_SET_HASH_MAC
103 tristate "hash:mac set support"
104 depends on IP_SET
105 help
106 This option adds the hash:mac set type support, by which
107 one can store MAC (ethernet address) elements in a set.
108
109 To compile it as a module, choose M here. If unsure, say N.
110
102config IP_SET_HASH_NETPORTNET 111config IP_SET_HASH_NETPORTNET
103 tristate "hash:net,port,net set support" 112 tristate "hash:net,port,net set support"
104 depends on IP_SET 113 depends on IP_SET
diff --git a/net/netfilter/ipset/Makefile b/net/netfilter/ipset/Makefile
index 231f10196cb9..3dbd5e958489 100644
--- a/net/netfilter/ipset/Makefile
+++ b/net/netfilter/ipset/Makefile
@@ -18,6 +18,7 @@ obj-$(CONFIG_IP_SET_HASH_IPMARK) += ip_set_hash_ipmark.o
18obj-$(CONFIG_IP_SET_HASH_IPPORT) += ip_set_hash_ipport.o 18obj-$(CONFIG_IP_SET_HASH_IPPORT) += ip_set_hash_ipport.o
19obj-$(CONFIG_IP_SET_HASH_IPPORTIP) += ip_set_hash_ipportip.o 19obj-$(CONFIG_IP_SET_HASH_IPPORTIP) += ip_set_hash_ipportip.o
20obj-$(CONFIG_IP_SET_HASH_IPPORTNET) += ip_set_hash_ipportnet.o 20obj-$(CONFIG_IP_SET_HASH_IPPORTNET) += ip_set_hash_ipportnet.o
21obj-$(CONFIG_IP_SET_HASH_MAC) += ip_set_hash_mac.o
21obj-$(CONFIG_IP_SET_HASH_NET) += ip_set_hash_net.o 22obj-$(CONFIG_IP_SET_HASH_NET) += ip_set_hash_net.o
22obj-$(CONFIG_IP_SET_HASH_NETPORT) += ip_set_hash_netport.o 23obj-$(CONFIG_IP_SET_HASH_NETPORT) += ip_set_hash_netport.o
23obj-$(CONFIG_IP_SET_HASH_NETIFACE) += ip_set_hash_netiface.o 24obj-$(CONFIG_IP_SET_HASH_NETIFACE) += ip_set_hash_netiface.o
diff --git a/net/netfilter/ipset/ip_set_bitmap_gen.h b/net/netfilter/ipset/ip_set_bitmap_gen.h
index f2c7d83dc23f..6f024a8a1534 100644
--- a/net/netfilter/ipset/ip_set_bitmap_gen.h
+++ b/net/netfilter/ipset/ip_set_bitmap_gen.h
@@ -128,6 +128,8 @@ mtype_test(struct ip_set *set, void *value, const struct ip_set_ext *ext,
128 return 0; 128 return 0;
129 if (SET_WITH_COUNTER(set)) 129 if (SET_WITH_COUNTER(set))
130 ip_set_update_counter(ext_counter(x, set), ext, mext, flags); 130 ip_set_update_counter(ext_counter(x, set), ext, mext, flags);
131 if (SET_WITH_SKBINFO(set))
132 ip_set_get_skbinfo(ext_skbinfo(x, set), ext, mext, flags);
131 return 1; 133 return 1;
132} 134}
133 135
@@ -161,6 +163,8 @@ mtype_add(struct ip_set *set, void *value, const struct ip_set_ext *ext,
161 ip_set_init_counter(ext_counter(x, set), ext); 163 ip_set_init_counter(ext_counter(x, set), ext);
162 if (SET_WITH_COMMENT(set)) 164 if (SET_WITH_COMMENT(set))
163 ip_set_init_comment(ext_comment(x, set), ext); 165 ip_set_init_comment(ext_comment(x, set), ext);
166 if (SET_WITH_SKBINFO(set))
167 ip_set_init_skbinfo(ext_skbinfo(x, set), ext);
164 return 0; 168 return 0;
165} 169}
166 170
diff --git a/net/netfilter/ipset/ip_set_bitmap_ip.c b/net/netfilter/ipset/ip_set_bitmap_ip.c
index 6f1f9f494808..55b083ec587a 100644
--- a/net/netfilter/ipset/ip_set_bitmap_ip.c
+++ b/net/netfilter/ipset/ip_set_bitmap_ip.c
@@ -27,7 +27,8 @@
27 27
28#define IPSET_TYPE_REV_MIN 0 28#define IPSET_TYPE_REV_MIN 0
29/* 1 Counter support added */ 29/* 1 Counter support added */
30#define IPSET_TYPE_REV_MAX 2 /* Comment support added */ 30/* 2 Comment support added */
31#define IPSET_TYPE_REV_MAX 3 /* skbinfo support added */
31 32
32MODULE_LICENSE("GPL"); 33MODULE_LICENSE("GPL");
33MODULE_AUTHOR("Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>"); 34MODULE_AUTHOR("Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>");
@@ -112,7 +113,7 @@ bitmap_ip_kadt(struct ip_set *set, const struct sk_buff *skb,
112{ 113{
113 struct bitmap_ip *map = set->data; 114 struct bitmap_ip *map = set->data;
114 ipset_adtfn adtfn = set->variant->adt[adt]; 115 ipset_adtfn adtfn = set->variant->adt[adt];
115 struct bitmap_ip_adt_elem e = { }; 116 struct bitmap_ip_adt_elem e = { .id = 0 };
116 struct ip_set_ext ext = IP_SET_INIT_KEXT(skb, opt, set); 117 struct ip_set_ext ext = IP_SET_INIT_KEXT(skb, opt, set);
117 u32 ip; 118 u32 ip;
118 119
@@ -132,14 +133,17 @@ bitmap_ip_uadt(struct ip_set *set, struct nlattr *tb[],
132 struct bitmap_ip *map = set->data; 133 struct bitmap_ip *map = set->data;
133 ipset_adtfn adtfn = set->variant->adt[adt]; 134 ipset_adtfn adtfn = set->variant->adt[adt];
134 u32 ip = 0, ip_to = 0; 135 u32 ip = 0, ip_to = 0;
135 struct bitmap_ip_adt_elem e = { }; 136 struct bitmap_ip_adt_elem e = { .id = 0 };
136 struct ip_set_ext ext = IP_SET_INIT_UEXT(set); 137 struct ip_set_ext ext = IP_SET_INIT_UEXT(set);
137 int ret = 0; 138 int ret = 0;
138 139
139 if (unlikely(!tb[IPSET_ATTR_IP] || 140 if (unlikely(!tb[IPSET_ATTR_IP] ||
140 !ip_set_optattr_netorder(tb, IPSET_ATTR_TIMEOUT) || 141 !ip_set_optattr_netorder(tb, IPSET_ATTR_TIMEOUT) ||
141 !ip_set_optattr_netorder(tb, IPSET_ATTR_PACKETS) || 142 !ip_set_optattr_netorder(tb, IPSET_ATTR_PACKETS) ||
142 !ip_set_optattr_netorder(tb, IPSET_ATTR_BYTES))) 143 !ip_set_optattr_netorder(tb, IPSET_ATTR_BYTES) ||
144 !ip_set_optattr_netorder(tb, IPSET_ATTR_SKBMARK) ||
145 !ip_set_optattr_netorder(tb, IPSET_ATTR_SKBPRIO) ||
146 !ip_set_optattr_netorder(tb, IPSET_ATTR_SKBQUEUE)))
143 return -IPSET_ERR_PROTOCOL; 147 return -IPSET_ERR_PROTOCOL;
144 148
145 if (tb[IPSET_ATTR_LINENO]) 149 if (tb[IPSET_ATTR_LINENO])
@@ -357,6 +361,9 @@ static struct ip_set_type bitmap_ip_type __read_mostly = {
357 [IPSET_ATTR_BYTES] = { .type = NLA_U64 }, 361 [IPSET_ATTR_BYTES] = { .type = NLA_U64 },
358 [IPSET_ATTR_PACKETS] = { .type = NLA_U64 }, 362 [IPSET_ATTR_PACKETS] = { .type = NLA_U64 },
359 [IPSET_ATTR_COMMENT] = { .type = NLA_NUL_STRING }, 363 [IPSET_ATTR_COMMENT] = { .type = NLA_NUL_STRING },
364 [IPSET_ATTR_SKBMARK] = { .type = NLA_U64 },
365 [IPSET_ATTR_SKBPRIO] = { .type = NLA_U32 },
366 [IPSET_ATTR_SKBQUEUE] = { .type = NLA_U16 },
360 }, 367 },
361 .me = THIS_MODULE, 368 .me = THIS_MODULE,
362}; 369};
diff --git a/net/netfilter/ipset/ip_set_bitmap_ipmac.c b/net/netfilter/ipset/ip_set_bitmap_ipmac.c
index 740eabededd9..86104744b00f 100644
--- a/net/netfilter/ipset/ip_set_bitmap_ipmac.c
+++ b/net/netfilter/ipset/ip_set_bitmap_ipmac.c
@@ -27,7 +27,8 @@
27 27
28#define IPSET_TYPE_REV_MIN 0 28#define IPSET_TYPE_REV_MIN 0
29/* 1 Counter support added */ 29/* 1 Counter support added */
30#define IPSET_TYPE_REV_MAX 2 /* Comment support added */ 30/* 2 Comment support added */
31#define IPSET_TYPE_REV_MAX 3 /* skbinfo support added */
31 32
32MODULE_LICENSE("GPL"); 33MODULE_LICENSE("GPL");
33MODULE_AUTHOR("Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>"); 34MODULE_AUTHOR("Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>");
@@ -203,7 +204,7 @@ bitmap_ipmac_kadt(struct ip_set *set, const struct sk_buff *skb,
203{ 204{
204 struct bitmap_ipmac *map = set->data; 205 struct bitmap_ipmac *map = set->data;
205 ipset_adtfn adtfn = set->variant->adt[adt]; 206 ipset_adtfn adtfn = set->variant->adt[adt];
206 struct bitmap_ipmac_adt_elem e = {}; 207 struct bitmap_ipmac_adt_elem e = { .id = 0 };
207 struct ip_set_ext ext = IP_SET_INIT_KEXT(skb, opt, set); 208 struct ip_set_ext ext = IP_SET_INIT_KEXT(skb, opt, set);
208 u32 ip; 209 u32 ip;
209 210
@@ -232,7 +233,7 @@ bitmap_ipmac_uadt(struct ip_set *set, struct nlattr *tb[],
232{ 233{
233 const struct bitmap_ipmac *map = set->data; 234 const struct bitmap_ipmac *map = set->data;
234 ipset_adtfn adtfn = set->variant->adt[adt]; 235 ipset_adtfn adtfn = set->variant->adt[adt];
235 struct bitmap_ipmac_adt_elem e = {}; 236 struct bitmap_ipmac_adt_elem e = { .id = 0 };
236 struct ip_set_ext ext = IP_SET_INIT_UEXT(set); 237 struct ip_set_ext ext = IP_SET_INIT_UEXT(set);
237 u32 ip = 0; 238 u32 ip = 0;
238 int ret = 0; 239 int ret = 0;
@@ -240,7 +241,10 @@ bitmap_ipmac_uadt(struct ip_set *set, struct nlattr *tb[],
240 if (unlikely(!tb[IPSET_ATTR_IP] || 241 if (unlikely(!tb[IPSET_ATTR_IP] ||
241 !ip_set_optattr_netorder(tb, IPSET_ATTR_TIMEOUT) || 242 !ip_set_optattr_netorder(tb, IPSET_ATTR_TIMEOUT) ||
242 !ip_set_optattr_netorder(tb, IPSET_ATTR_PACKETS) || 243 !ip_set_optattr_netorder(tb, IPSET_ATTR_PACKETS) ||
243 !ip_set_optattr_netorder(tb, IPSET_ATTR_BYTES))) 244 !ip_set_optattr_netorder(tb, IPSET_ATTR_BYTES) ||
245 !ip_set_optattr_netorder(tb, IPSET_ATTR_SKBMARK) ||
246 !ip_set_optattr_netorder(tb, IPSET_ATTR_SKBPRIO) ||
247 !ip_set_optattr_netorder(tb, IPSET_ATTR_SKBQUEUE)))
244 return -IPSET_ERR_PROTOCOL; 248 return -IPSET_ERR_PROTOCOL;
245 249
246 if (tb[IPSET_ATTR_LINENO]) 250 if (tb[IPSET_ATTR_LINENO])
@@ -394,6 +398,9 @@ static struct ip_set_type bitmap_ipmac_type = {
394 [IPSET_ATTR_BYTES] = { .type = NLA_U64 }, 398 [IPSET_ATTR_BYTES] = { .type = NLA_U64 },
395 [IPSET_ATTR_PACKETS] = { .type = NLA_U64 }, 399 [IPSET_ATTR_PACKETS] = { .type = NLA_U64 },
396 [IPSET_ATTR_COMMENT] = { .type = NLA_NUL_STRING }, 400 [IPSET_ATTR_COMMENT] = { .type = NLA_NUL_STRING },
401 [IPSET_ATTR_SKBMARK] = { .type = NLA_U64 },
402 [IPSET_ATTR_SKBPRIO] = { .type = NLA_U32 },
403 [IPSET_ATTR_SKBQUEUE] = { .type = NLA_U16 },
397 }, 404 },
398 .me = THIS_MODULE, 405 .me = THIS_MODULE,
399}; 406};
diff --git a/net/netfilter/ipset/ip_set_bitmap_port.c b/net/netfilter/ipset/ip_set_bitmap_port.c
index cf99676e69f8..005dd36444c3 100644
--- a/net/netfilter/ipset/ip_set_bitmap_port.c
+++ b/net/netfilter/ipset/ip_set_bitmap_port.c
@@ -22,7 +22,8 @@
22 22
23#define IPSET_TYPE_REV_MIN 0 23#define IPSET_TYPE_REV_MIN 0
24/* 1 Counter support added */ 24/* 1 Counter support added */
25#define IPSET_TYPE_REV_MAX 2 /* Comment support added */ 25/* 2 Comment support added */
26#define IPSET_TYPE_REV_MAX 3 /* skbinfo support added */
26 27
27MODULE_LICENSE("GPL"); 28MODULE_LICENSE("GPL");
28MODULE_AUTHOR("Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>"); 29MODULE_AUTHOR("Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>");
@@ -104,7 +105,7 @@ bitmap_port_kadt(struct ip_set *set, const struct sk_buff *skb,
104{ 105{
105 struct bitmap_port *map = set->data; 106 struct bitmap_port *map = set->data;
106 ipset_adtfn adtfn = set->variant->adt[adt]; 107 ipset_adtfn adtfn = set->variant->adt[adt];
107 struct bitmap_port_adt_elem e = {}; 108 struct bitmap_port_adt_elem e = { .id = 0 };
108 struct ip_set_ext ext = IP_SET_INIT_KEXT(skb, opt, set); 109 struct ip_set_ext ext = IP_SET_INIT_KEXT(skb, opt, set);
109 __be16 __port; 110 __be16 __port;
110 u16 port = 0; 111 u16 port = 0;
@@ -129,7 +130,7 @@ bitmap_port_uadt(struct ip_set *set, struct nlattr *tb[],
129{ 130{
130 struct bitmap_port *map = set->data; 131 struct bitmap_port *map = set->data;
131 ipset_adtfn adtfn = set->variant->adt[adt]; 132 ipset_adtfn adtfn = set->variant->adt[adt];
132 struct bitmap_port_adt_elem e = {}; 133 struct bitmap_port_adt_elem e = { .id = 0 };
133 struct ip_set_ext ext = IP_SET_INIT_UEXT(set); 134 struct ip_set_ext ext = IP_SET_INIT_UEXT(set);
134 u32 port; /* wraparound */ 135 u32 port; /* wraparound */
135 u16 port_to; 136 u16 port_to;
@@ -139,7 +140,10 @@ bitmap_port_uadt(struct ip_set *set, struct nlattr *tb[],
139 !ip_set_optattr_netorder(tb, IPSET_ATTR_PORT_TO) || 140 !ip_set_optattr_netorder(tb, IPSET_ATTR_PORT_TO) ||
140 !ip_set_optattr_netorder(tb, IPSET_ATTR_TIMEOUT) || 141 !ip_set_optattr_netorder(tb, IPSET_ATTR_TIMEOUT) ||
141 !ip_set_optattr_netorder(tb, IPSET_ATTR_PACKETS) || 142 !ip_set_optattr_netorder(tb, IPSET_ATTR_PACKETS) ||
142 !ip_set_optattr_netorder(tb, IPSET_ATTR_BYTES))) 143 !ip_set_optattr_netorder(tb, IPSET_ATTR_BYTES) ||
144 !ip_set_optattr_netorder(tb, IPSET_ATTR_SKBMARK) ||
145 !ip_set_optattr_netorder(tb, IPSET_ATTR_SKBPRIO) ||
146 !ip_set_optattr_netorder(tb, IPSET_ATTR_SKBQUEUE)))
143 return -IPSET_ERR_PROTOCOL; 147 return -IPSET_ERR_PROTOCOL;
144 148
145 if (tb[IPSET_ATTR_LINENO]) 149 if (tb[IPSET_ATTR_LINENO])
@@ -291,6 +295,9 @@ static struct ip_set_type bitmap_port_type = {
291 [IPSET_ATTR_BYTES] = { .type = NLA_U64 }, 295 [IPSET_ATTR_BYTES] = { .type = NLA_U64 },
292 [IPSET_ATTR_PACKETS] = { .type = NLA_U64 }, 296 [IPSET_ATTR_PACKETS] = { .type = NLA_U64 },
293 [IPSET_ATTR_COMMENT] = { .type = NLA_NUL_STRING }, 297 [IPSET_ATTR_COMMENT] = { .type = NLA_NUL_STRING },
298 [IPSET_ATTR_SKBMARK] = { .type = NLA_U64 },
299 [IPSET_ATTR_SKBPRIO] = { .type = NLA_U32 },
300 [IPSET_ATTR_SKBQUEUE] = { .type = NLA_U16 },
294 }, 301 },
295 .me = THIS_MODULE, 302 .me = THIS_MODULE,
296}; 303};
diff --git a/net/netfilter/ipset/ip_set_core.c b/net/netfilter/ipset/ip_set_core.c
index 5edbbe829495..912e5a05b79d 100644
--- a/net/netfilter/ipset/ip_set_core.c
+++ b/net/netfilter/ipset/ip_set_core.c
@@ -101,7 +101,7 @@ load_settype(const char *name)
101 nfnl_unlock(NFNL_SUBSYS_IPSET); 101 nfnl_unlock(NFNL_SUBSYS_IPSET);
102 pr_debug("try to load ip_set_%s\n", name); 102 pr_debug("try to load ip_set_%s\n", name);
103 if (request_module("ip_set_%s", name) < 0) { 103 if (request_module("ip_set_%s", name) < 0) {
104 pr_warning("Can't find ip_set type %s\n", name); 104 pr_warn("Can't find ip_set type %s\n", name);
105 nfnl_lock(NFNL_SUBSYS_IPSET); 105 nfnl_lock(NFNL_SUBSYS_IPSET);
106 return false; 106 return false;
107 } 107 }
@@ -195,20 +195,19 @@ ip_set_type_register(struct ip_set_type *type)
195 int ret = 0; 195 int ret = 0;
196 196
197 if (type->protocol != IPSET_PROTOCOL) { 197 if (type->protocol != IPSET_PROTOCOL) {
198 pr_warning("ip_set type %s, family %s, revision %u:%u uses " 198 pr_warn("ip_set type %s, family %s, revision %u:%u uses wrong protocol version %u (want %u)\n",
199 "wrong protocol version %u (want %u)\n", 199 type->name, family_name(type->family),
200 type->name, family_name(type->family), 200 type->revision_min, type->revision_max,
201 type->revision_min, type->revision_max, 201 type->protocol, IPSET_PROTOCOL);
202 type->protocol, IPSET_PROTOCOL);
203 return -EINVAL; 202 return -EINVAL;
204 } 203 }
205 204
206 ip_set_type_lock(); 205 ip_set_type_lock();
207 if (find_set_type(type->name, type->family, type->revision_min)) { 206 if (find_set_type(type->name, type->family, type->revision_min)) {
208 /* Duplicate! */ 207 /* Duplicate! */
209 pr_warning("ip_set type %s, family %s with revision min %u " 208 pr_warn("ip_set type %s, family %s with revision min %u already registered!\n",
210 "already registered!\n", type->name, 209 type->name, family_name(type->family),
211 family_name(type->family), type->revision_min); 210 type->revision_min);
212 ret = -EINVAL; 211 ret = -EINVAL;
213 goto unlock; 212 goto unlock;
214 } 213 }
@@ -228,9 +227,9 @@ ip_set_type_unregister(struct ip_set_type *type)
228{ 227{
229 ip_set_type_lock(); 228 ip_set_type_lock();
230 if (!find_set_type(type->name, type->family, type->revision_min)) { 229 if (!find_set_type(type->name, type->family, type->revision_min)) {
231 pr_warning("ip_set type %s, family %s with revision min %u " 230 pr_warn("ip_set type %s, family %s with revision min %u not registered\n",
232 "not registered\n", type->name, 231 type->name, family_name(type->family),
233 family_name(type->family), type->revision_min); 232 type->revision_min);
234 goto unlock; 233 goto unlock;
235 } 234 }
236 list_del_rcu(&type->list); 235 list_del_rcu(&type->list);
@@ -338,6 +337,12 @@ const struct ip_set_ext_type ip_set_extensions[] = {
338 .len = sizeof(unsigned long), 337 .len = sizeof(unsigned long),
339 .align = __alignof__(unsigned long), 338 .align = __alignof__(unsigned long),
340 }, 339 },
340 [IPSET_EXT_ID_SKBINFO] = {
341 .type = IPSET_EXT_SKBINFO,
342 .flag = IPSET_FLAG_WITH_SKBINFO,
343 .len = sizeof(struct ip_set_skbinfo),
344 .align = __alignof__(struct ip_set_skbinfo),
345 },
341 [IPSET_EXT_ID_COMMENT] = { 346 [IPSET_EXT_ID_COMMENT] = {
342 .type = IPSET_EXT_COMMENT | IPSET_EXT_DESTROY, 347 .type = IPSET_EXT_COMMENT | IPSET_EXT_DESTROY,
343 .flag = IPSET_FLAG_WITH_COMMENT, 348 .flag = IPSET_FLAG_WITH_COMMENT,
@@ -383,6 +388,7 @@ int
383ip_set_get_extensions(struct ip_set *set, struct nlattr *tb[], 388ip_set_get_extensions(struct ip_set *set, struct nlattr *tb[],
384 struct ip_set_ext *ext) 389 struct ip_set_ext *ext)
385{ 390{
391 u64 fullmark;
386 if (tb[IPSET_ATTR_TIMEOUT]) { 392 if (tb[IPSET_ATTR_TIMEOUT]) {
387 if (!(set->extensions & IPSET_EXT_TIMEOUT)) 393 if (!(set->extensions & IPSET_EXT_TIMEOUT))
388 return -IPSET_ERR_TIMEOUT; 394 return -IPSET_ERR_TIMEOUT;
@@ -403,7 +409,25 @@ ip_set_get_extensions(struct ip_set *set, struct nlattr *tb[],
403 return -IPSET_ERR_COMMENT; 409 return -IPSET_ERR_COMMENT;
404 ext->comment = ip_set_comment_uget(tb[IPSET_ATTR_COMMENT]); 410 ext->comment = ip_set_comment_uget(tb[IPSET_ATTR_COMMENT]);
405 } 411 }
406 412 if (tb[IPSET_ATTR_SKBMARK]) {
413 if (!(set->extensions & IPSET_EXT_SKBINFO))
414 return -IPSET_ERR_SKBINFO;
415 fullmark = be64_to_cpu(nla_get_be64(tb[IPSET_ATTR_SKBMARK]));
416 ext->skbmark = fullmark >> 32;
417 ext->skbmarkmask = fullmark & 0xffffffff;
418 }
419 if (tb[IPSET_ATTR_SKBPRIO]) {
420 if (!(set->extensions & IPSET_EXT_SKBINFO))
421 return -IPSET_ERR_SKBINFO;
422 ext->skbprio = be32_to_cpu(nla_get_be32(
423 tb[IPSET_ATTR_SKBPRIO]));
424 }
425 if (tb[IPSET_ATTR_SKBQUEUE]) {
426 if (!(set->extensions & IPSET_EXT_SKBINFO))
427 return -IPSET_ERR_SKBINFO;
428 ext->skbqueue = be16_to_cpu(nla_get_be16(
429 tb[IPSET_ATTR_SKBQUEUE]));
430 }
407 return 0; 431 return 0;
408} 432}
409EXPORT_SYMBOL_GPL(ip_set_get_extensions); 433EXPORT_SYMBOL_GPL(ip_set_get_extensions);
@@ -1398,7 +1422,8 @@ call_ad(struct sock *ctnl, struct sk_buff *skb, struct ip_set *set,
1398 struct nlmsghdr *rep, *nlh = nlmsg_hdr(skb); 1422 struct nlmsghdr *rep, *nlh = nlmsg_hdr(skb);
1399 struct sk_buff *skb2; 1423 struct sk_buff *skb2;
1400 struct nlmsgerr *errmsg; 1424 struct nlmsgerr *errmsg;
1401 size_t payload = sizeof(*errmsg) + nlmsg_len(nlh); 1425 size_t payload = min(SIZE_MAX,
1426 sizeof(*errmsg) + nlmsg_len(nlh));
1402 int min_len = nlmsg_total_size(sizeof(struct nfgenmsg)); 1427 int min_len = nlmsg_total_size(sizeof(struct nfgenmsg));
1403 struct nlattr *cda[IPSET_ATTR_CMD_MAX+1]; 1428 struct nlattr *cda[IPSET_ATTR_CMD_MAX+1];
1404 struct nlattr *cmdattr; 1429 struct nlattr *cmdattr;
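
In ip_set_get_extensions() above, the new IPSET_ATTR_SKBMARK attribute packs both the mark and its mask into one 64-bit netlink value: the high 32 bits carry the mark, the low 32 bits the mask. A standalone sketch of that packing and unpacking:

	/* Sketch of the IPSET_ATTR_SKBMARK encoding used above:
	 * high 32 bits = skb mark, low 32 bits = mark mask.
	 */
	#include <stdio.h>
	#include <stdint.h>
	#include <inttypes.h>

	static uint64_t pack_skbmark(uint32_t mark, uint32_t mask)
	{
		return ((uint64_t)mark << 32) | mask;
	}

	int main(void)
	{
		uint64_t fullmark = pack_skbmark(0x00000abc, 0xffffffff);

		uint32_t skbmark     = fullmark >> 32;          /* as in the hunk */
		uint32_t skbmarkmask = fullmark & 0xffffffff;

		printf("fullmark=0x%016" PRIx64 " mark=0x%08x mask=0x%08x\n",
		       fullmark, skbmark, skbmarkmask);
		return 0;
	}
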
diff --git a/net/netfilter/ipset/ip_set_hash_gen.h b/net/netfilter/ipset/ip_set_hash_gen.h
index 61c7fb052802..fee7c64e4dd1 100644
--- a/net/netfilter/ipset/ip_set_hash_gen.h
+++ b/net/netfilter/ipset/ip_set_hash_gen.h
@@ -565,8 +565,8 @@ retry:
565 set->name, orig->htable_bits, htable_bits, orig); 565 set->name, orig->htable_bits, htable_bits, orig);
566 if (!htable_bits) { 566 if (!htable_bits) {
567 /* In case we have plenty of memory :-) */ 567 /* In case we have plenty of memory :-) */
568 pr_warning("Cannot increase the hashsize of set %s further\n", 568 pr_warn("Cannot increase the hashsize of set %s further\n",
569 set->name); 569 set->name);
570 return -IPSET_ERR_HASH_FULL; 570 return -IPSET_ERR_HASH_FULL;
571 } 571 }
572 t = ip_set_alloc(sizeof(*t) 572 t = ip_set_alloc(sizeof(*t)
@@ -651,8 +651,8 @@ mtype_add(struct ip_set *set, void *value, const struct ip_set_ext *ext,
651 651
652 if (h->elements >= h->maxelem) { 652 if (h->elements >= h->maxelem) {
653 if (net_ratelimit()) 653 if (net_ratelimit())
654 pr_warning("Set %s is full, maxelem %u reached\n", 654 pr_warn("Set %s is full, maxelem %u reached\n",
655 set->name, h->maxelem); 655 set->name, h->maxelem);
656 return -IPSET_ERR_HASH_FULL; 656 return -IPSET_ERR_HASH_FULL;
657 } 657 }
658 658
@@ -720,6 +720,8 @@ reuse_slot:
720 ip_set_init_counter(ext_counter(data, set), ext); 720 ip_set_init_counter(ext_counter(data, set), ext);
721 if (SET_WITH_COMMENT(set)) 721 if (SET_WITH_COMMENT(set))
722 ip_set_init_comment(ext_comment(data, set), ext); 722 ip_set_init_comment(ext_comment(data, set), ext);
723 if (SET_WITH_SKBINFO(set))
724 ip_set_init_skbinfo(ext_skbinfo(data, set), ext);
723 725
724out: 726out:
725 rcu_read_unlock_bh(); 727 rcu_read_unlock_bh();
@@ -797,6 +799,9 @@ mtype_data_match(struct mtype_elem *data, const struct ip_set_ext *ext,
797 if (SET_WITH_COUNTER(set)) 799 if (SET_WITH_COUNTER(set))
798 ip_set_update_counter(ext_counter(data, set), 800 ip_set_update_counter(ext_counter(data, set),
799 ext, mext, flags); 801 ext, mext, flags);
802 if (SET_WITH_SKBINFO(set))
803 ip_set_get_skbinfo(ext_skbinfo(data, set),
804 ext, mext, flags);
800 return mtype_do_data_match(data); 805 return mtype_do_data_match(data);
801} 806}
802 807
@@ -998,8 +1003,8 @@ mtype_list(const struct ip_set *set,
998nla_put_failure: 1003nla_put_failure:
999 nlmsg_trim(skb, incomplete); 1004 nlmsg_trim(skb, incomplete);
1000 if (unlikely(first == cb->args[IPSET_CB_ARG0])) { 1005 if (unlikely(first == cb->args[IPSET_CB_ARG0])) {
1001 pr_warning("Can't list set %s: one bucket does not fit into " 1006 pr_warn("Can't list set %s: one bucket does not fit into a message. Please report it!\n",
1002 "a message. Please report it!\n", set->name); 1007 set->name);
1003 cb->args[IPSET_CB_ARG0] = 0; 1008 cb->args[IPSET_CB_ARG0] = 0;
1004 return -EMSGSIZE; 1009 return -EMSGSIZE;
1005 } 1010 }
@@ -1049,8 +1054,10 @@ IPSET_TOKEN(HTYPE, _create)(struct net *net, struct ip_set *set,
1049 struct HTYPE *h; 1054 struct HTYPE *h;
1050 struct htable *t; 1055 struct htable *t;
1051 1056
1057#ifndef IP_SET_PROTO_UNDEF
1052 if (!(set->family == NFPROTO_IPV4 || set->family == NFPROTO_IPV6)) 1058 if (!(set->family == NFPROTO_IPV4 || set->family == NFPROTO_IPV6))
1053 return -IPSET_ERR_INVALID_FAMILY; 1059 return -IPSET_ERR_INVALID_FAMILY;
1060#endif
1054 1061
1055#ifdef IP_SET_HASH_WITH_MARKMASK 1062#ifdef IP_SET_HASH_WITH_MARKMASK
1056 markmask = 0xffffffff; 1063 markmask = 0xffffffff;
@@ -1093,7 +1100,7 @@ IPSET_TOKEN(HTYPE, _create)(struct net *net, struct ip_set *set,
1093 if (tb[IPSET_ATTR_MARKMASK]) { 1100 if (tb[IPSET_ATTR_MARKMASK]) {
1094 markmask = ntohl(nla_get_u32(tb[IPSET_ATTR_MARKMASK])); 1101 markmask = ntohl(nla_get_u32(tb[IPSET_ATTR_MARKMASK]));
1095 1102
1096 if ((markmask > 4294967295u) || markmask == 0) 1103 if (markmask == 0)
1097 return -IPSET_ERR_INVALID_MARKMASK; 1104 return -IPSET_ERR_INVALID_MARKMASK;
1098 } 1105 }
1099#endif 1106#endif
@@ -1132,25 +1139,32 @@ IPSET_TOKEN(HTYPE, _create)(struct net *net, struct ip_set *set,
1132 rcu_assign_pointer(h->table, t); 1139 rcu_assign_pointer(h->table, t);
1133 1140
1134 set->data = h; 1141 set->data = h;
1142#ifndef IP_SET_PROTO_UNDEF
1135 if (set->family == NFPROTO_IPV4) { 1143 if (set->family == NFPROTO_IPV4) {
1144#endif
1136 set->variant = &IPSET_TOKEN(HTYPE, 4_variant); 1145 set->variant = &IPSET_TOKEN(HTYPE, 4_variant);
1137 set->dsize = ip_set_elem_len(set, tb, 1146 set->dsize = ip_set_elem_len(set, tb,
1138 sizeof(struct IPSET_TOKEN(HTYPE, 4_elem))); 1147 sizeof(struct IPSET_TOKEN(HTYPE, 4_elem)));
1148#ifndef IP_SET_PROTO_UNDEF
1139 } else { 1149 } else {
1140 set->variant = &IPSET_TOKEN(HTYPE, 6_variant); 1150 set->variant = &IPSET_TOKEN(HTYPE, 6_variant);
1141 set->dsize = ip_set_elem_len(set, tb, 1151 set->dsize = ip_set_elem_len(set, tb,
1142 sizeof(struct IPSET_TOKEN(HTYPE, 6_elem))); 1152 sizeof(struct IPSET_TOKEN(HTYPE, 6_elem)));
1143 } 1153 }
1154#endif
1144 if (tb[IPSET_ATTR_TIMEOUT]) { 1155 if (tb[IPSET_ATTR_TIMEOUT]) {
1145 set->timeout = ip_set_timeout_uget(tb[IPSET_ATTR_TIMEOUT]); 1156 set->timeout = ip_set_timeout_uget(tb[IPSET_ATTR_TIMEOUT]);
1157#ifndef IP_SET_PROTO_UNDEF
1146 if (set->family == NFPROTO_IPV4) 1158 if (set->family == NFPROTO_IPV4)
1159#endif
1147 IPSET_TOKEN(HTYPE, 4_gc_init)(set, 1160 IPSET_TOKEN(HTYPE, 4_gc_init)(set,
1148 IPSET_TOKEN(HTYPE, 4_gc)); 1161 IPSET_TOKEN(HTYPE, 4_gc));
1162#ifndef IP_SET_PROTO_UNDEF
1149 else 1163 else
1150 IPSET_TOKEN(HTYPE, 6_gc_init)(set, 1164 IPSET_TOKEN(HTYPE, 6_gc_init)(set,
1151 IPSET_TOKEN(HTYPE, 6_gc)); 1165 IPSET_TOKEN(HTYPE, 6_gc));
1166#endif
1152 } 1167 }
1153
1154 pr_debug("create %s hashsize %u (%u) maxelem %u: %p(%p)\n", 1168 pr_debug("create %s hashsize %u (%u) maxelem %u: %p(%p)\n",
1155 set->name, jhash_size(t->htable_bits), 1169 set->name, jhash_size(t->htable_bits),
1156 t->htable_bits, h->maxelem, set->data, t); 1170 t->htable_bits, h->maxelem, set->data, t);
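Note on the SET_WITH_SKBINFO branches added above: they mirror the existing counter and comment extensions, so a set created with the skbinfo extension stores per-element skb metadata at add time and copies it back out on match. A minimal sketch of what that extension and its init helper plausibly look like (field names and the helper below are illustrative assumptions, not quoted from this merge):

/* Sketch only: per-element skb metadata carried by the skbinfo extension. */
struct ip_set_skbinfo {
	u32 skbmark;		/* fwmark to restore on matching packets */
	u32 skbmarkmask;	/* mask applied when restoring the fwmark */
	u32 skbprio;		/* priority (tc class) to restore */
	u16 skbqueue;		/* tx queue mapping to restore */
};

/* Sketch: copy the userspace-supplied values into the element's extension,
 * the same way ip_set_init_counter()/ip_set_init_comment() do for theirs. */
static inline void
ip_set_init_skbinfo(struct ip_set_skbinfo *skbinfo,
		    const struct ip_set_ext *ext)
{
	skbinfo->skbmark = ext->skbmark;
	skbinfo->skbmarkmask = ext->skbmarkmask;
	skbinfo->skbprio = ext->skbprio;
	skbinfo->skbqueue = ext->skbqueue;
}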
diff --git a/net/netfilter/ipset/ip_set_hash_ip.c b/net/netfilter/ipset/ip_set_hash_ip.c
index dd40607f878e..76959d79e9d1 100644
--- a/net/netfilter/ipset/ip_set_hash_ip.c
+++ b/net/netfilter/ipset/ip_set_hash_ip.c
@@ -26,7 +26,8 @@
26#define IPSET_TYPE_REV_MIN 0 26#define IPSET_TYPE_REV_MIN 0
27/* 1 Counters support */ 27/* 1 Counters support */
28/* 2 Comments support */ 28/* 2 Comments support */
29#define IPSET_TYPE_REV_MAX 3 /* Forceadd support */ 29/* 3 Forceadd support */
30#define IPSET_TYPE_REV_MAX 4 /* skbinfo support */
30 31
31MODULE_LICENSE("GPL"); 32MODULE_LICENSE("GPL");
32MODULE_AUTHOR("Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>"); 33MODULE_AUTHOR("Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>");
@@ -84,7 +85,7 @@ hash_ip4_kadt(struct ip_set *set, const struct sk_buff *skb,
84{ 85{
85 const struct hash_ip *h = set->data; 86 const struct hash_ip *h = set->data;
86 ipset_adtfn adtfn = set->variant->adt[adt]; 87 ipset_adtfn adtfn = set->variant->adt[adt];
87 struct hash_ip4_elem e = {}; 88 struct hash_ip4_elem e = { 0 };
88 struct ip_set_ext ext = IP_SET_INIT_KEXT(skb, opt, set); 89 struct ip_set_ext ext = IP_SET_INIT_KEXT(skb, opt, set);
89 __be32 ip; 90 __be32 ip;
90 91
@@ -103,7 +104,7 @@ hash_ip4_uadt(struct ip_set *set, struct nlattr *tb[],
103{ 104{
104 const struct hash_ip *h = set->data; 105 const struct hash_ip *h = set->data;
105 ipset_adtfn adtfn = set->variant->adt[adt]; 106 ipset_adtfn adtfn = set->variant->adt[adt];
106 struct hash_ip4_elem e = {}; 107 struct hash_ip4_elem e = { 0 };
107 struct ip_set_ext ext = IP_SET_INIT_UEXT(set); 108 struct ip_set_ext ext = IP_SET_INIT_UEXT(set);
108 u32 ip = 0, ip_to = 0, hosts; 109 u32 ip = 0, ip_to = 0, hosts;
109 int ret = 0; 110 int ret = 0;
@@ -111,7 +112,10 @@ hash_ip4_uadt(struct ip_set *set, struct nlattr *tb[],
111 if (unlikely(!tb[IPSET_ATTR_IP] || 112 if (unlikely(!tb[IPSET_ATTR_IP] ||
112 !ip_set_optattr_netorder(tb, IPSET_ATTR_TIMEOUT) || 113 !ip_set_optattr_netorder(tb, IPSET_ATTR_TIMEOUT) ||
113 !ip_set_optattr_netorder(tb, IPSET_ATTR_PACKETS) || 114 !ip_set_optattr_netorder(tb, IPSET_ATTR_PACKETS) ||
114 !ip_set_optattr_netorder(tb, IPSET_ATTR_BYTES))) 115 !ip_set_optattr_netorder(tb, IPSET_ATTR_BYTES) ||
116 !ip_set_optattr_netorder(tb, IPSET_ATTR_SKBMARK) ||
117 !ip_set_optattr_netorder(tb, IPSET_ATTR_SKBPRIO) ||
118 !ip_set_optattr_netorder(tb, IPSET_ATTR_SKBQUEUE)))
115 return -IPSET_ERR_PROTOCOL; 119 return -IPSET_ERR_PROTOCOL;
116 120
117 if (tb[IPSET_ATTR_LINENO]) 121 if (tb[IPSET_ATTR_LINENO])
@@ -222,7 +226,7 @@ hash_ip6_kadt(struct ip_set *set, const struct sk_buff *skb,
222{ 226{
223 const struct hash_ip *h = set->data; 227 const struct hash_ip *h = set->data;
224 ipset_adtfn adtfn = set->variant->adt[adt]; 228 ipset_adtfn adtfn = set->variant->adt[adt];
225 struct hash_ip6_elem e = {}; 229 struct hash_ip6_elem e = { { .all = { 0 } } };
226 struct ip_set_ext ext = IP_SET_INIT_KEXT(skb, opt, set); 230 struct ip_set_ext ext = IP_SET_INIT_KEXT(skb, opt, set);
227 231
228 ip6addrptr(skb, opt->flags & IPSET_DIM_ONE_SRC, &e.ip.in6); 232 ip6addrptr(skb, opt->flags & IPSET_DIM_ONE_SRC, &e.ip.in6);
@@ -239,7 +243,7 @@ hash_ip6_uadt(struct ip_set *set, struct nlattr *tb[],
239{ 243{
240 const struct hash_ip *h = set->data; 244 const struct hash_ip *h = set->data;
241 ipset_adtfn adtfn = set->variant->adt[adt]; 245 ipset_adtfn adtfn = set->variant->adt[adt];
242 struct hash_ip6_elem e = {}; 246 struct hash_ip6_elem e = { { .all = { 0 } } };
243 struct ip_set_ext ext = IP_SET_INIT_UEXT(set); 247 struct ip_set_ext ext = IP_SET_INIT_UEXT(set);
244 int ret; 248 int ret;
245 249
@@ -247,6 +251,9 @@ hash_ip6_uadt(struct ip_set *set, struct nlattr *tb[],
247 !ip_set_optattr_netorder(tb, IPSET_ATTR_TIMEOUT) || 251 !ip_set_optattr_netorder(tb, IPSET_ATTR_TIMEOUT) ||
248 !ip_set_optattr_netorder(tb, IPSET_ATTR_PACKETS) || 252 !ip_set_optattr_netorder(tb, IPSET_ATTR_PACKETS) ||
249 !ip_set_optattr_netorder(tb, IPSET_ATTR_BYTES) || 253 !ip_set_optattr_netorder(tb, IPSET_ATTR_BYTES) ||
254 !ip_set_optattr_netorder(tb, IPSET_ATTR_SKBMARK) ||
255 !ip_set_optattr_netorder(tb, IPSET_ATTR_SKBPRIO) ||
256 !ip_set_optattr_netorder(tb, IPSET_ATTR_SKBQUEUE) ||
250 tb[IPSET_ATTR_IP_TO] || 257 tb[IPSET_ATTR_IP_TO] ||
251 tb[IPSET_ATTR_CIDR])) 258 tb[IPSET_ATTR_CIDR]))
252 return -IPSET_ERR_PROTOCOL; 259 return -IPSET_ERR_PROTOCOL;
@@ -295,6 +302,9 @@ static struct ip_set_type hash_ip_type __read_mostly = {
295 [IPSET_ATTR_BYTES] = { .type = NLA_U64 }, 302 [IPSET_ATTR_BYTES] = { .type = NLA_U64 },
296 [IPSET_ATTR_PACKETS] = { .type = NLA_U64 }, 303 [IPSET_ATTR_PACKETS] = { .type = NLA_U64 },
297 [IPSET_ATTR_COMMENT] = { .type = NLA_NUL_STRING }, 304 [IPSET_ATTR_COMMENT] = { .type = NLA_NUL_STRING },
305 [IPSET_ATTR_SKBMARK] = { .type = NLA_U64 },
306 [IPSET_ATTR_SKBPRIO] = { .type = NLA_U32 },
307 [IPSET_ATTR_SKBQUEUE] = { .type = NLA_U16 },
298 }, 308 },
299 .me = THIS_MODULE, 309 .me = THIS_MODULE,
300}; 310};
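In the adt_policy additions above, IPSET_ATTR_SKBMARK is declared NLA_U64 even though an fwmark is 32 bits wide; the attribute presumably packs the mark and its mask into one 64-bit value so both arrive together. A hedged sketch of how the generic extension parser could unpack these attributes (the helper name and the high/low packing order are assumptions, not code from this diff):

/* Sketch: unpack the skbinfo netlink attributes into the extension request.
 * Assumption: the 64-bit SKBMARK carries the mark in the high 32 bits and
 * the mask in the low 32 bits. */
static int
skbinfo_from_nlattrs(struct ip_set_ext *ext, struct nlattr *tb[])
{
	if (tb[IPSET_ATTR_SKBMARK]) {
		u64 fullmark = be64_to_cpu(nla_get_be64(tb[IPSET_ATTR_SKBMARK]));

		ext->skbmark = fullmark >> 32;
		ext->skbmarkmask = fullmark & 0xffffffff;
	}
	if (tb[IPSET_ATTR_SKBPRIO])
		ext->skbprio = be32_to_cpu(nla_get_be32(tb[IPSET_ATTR_SKBPRIO]));
	if (tb[IPSET_ATTR_SKBQUEUE])
		ext->skbqueue = be16_to_cpu(nla_get_be16(tb[IPSET_ATTR_SKBQUEUE]));
	return 0;
}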
diff --git a/net/netfilter/ipset/ip_set_hash_ipmark.c b/net/netfilter/ipset/ip_set_hash_ipmark.c
index 4eff0a297254..7abf9788cfa8 100644
--- a/net/netfilter/ipset/ip_set_hash_ipmark.c
+++ b/net/netfilter/ipset/ip_set_hash_ipmark.c
@@ -25,7 +25,8 @@
25#include <linux/netfilter/ipset/ip_set_hash.h> 25#include <linux/netfilter/ipset/ip_set_hash.h>
26 26
27#define IPSET_TYPE_REV_MIN 0 27#define IPSET_TYPE_REV_MIN 0
28#define IPSET_TYPE_REV_MAX 1 /* Forceadd support */ 28/* 1 Forceadd support */
29#define IPSET_TYPE_REV_MAX 2 /* skbinfo support */
29 30
30MODULE_LICENSE("GPL"); 31MODULE_LICENSE("GPL");
31MODULE_AUTHOR("Vytas Dauksa <vytas.dauksa@smoothwall.net>"); 32MODULE_AUTHOR("Vytas Dauksa <vytas.dauksa@smoothwall.net>");
@@ -113,7 +114,10 @@ hash_ipmark4_uadt(struct ip_set *set, struct nlattr *tb[],
113 !ip_set_attr_netorder(tb, IPSET_ATTR_MARK) || 114 !ip_set_attr_netorder(tb, IPSET_ATTR_MARK) ||
114 !ip_set_optattr_netorder(tb, IPSET_ATTR_TIMEOUT) || 115 !ip_set_optattr_netorder(tb, IPSET_ATTR_TIMEOUT) ||
115 !ip_set_optattr_netorder(tb, IPSET_ATTR_PACKETS) || 116 !ip_set_optattr_netorder(tb, IPSET_ATTR_PACKETS) ||
116 !ip_set_optattr_netorder(tb, IPSET_ATTR_BYTES))) 117 !ip_set_optattr_netorder(tb, IPSET_ATTR_BYTES) ||
118 !ip_set_optattr_netorder(tb, IPSET_ATTR_SKBMARK) ||
119 !ip_set_optattr_netorder(tb, IPSET_ATTR_SKBPRIO) ||
120 !ip_set_optattr_netorder(tb, IPSET_ATTR_SKBQUEUE)))
117 return -IPSET_ERR_PROTOCOL; 121 return -IPSET_ERR_PROTOCOL;
118 122
119 if (tb[IPSET_ATTR_LINENO]) 123 if (tb[IPSET_ATTR_LINENO])
@@ -244,6 +248,9 @@ hash_ipmark6_uadt(struct ip_set *set, struct nlattr *tb[],
244 !ip_set_optattr_netorder(tb, IPSET_ATTR_TIMEOUT) || 248 !ip_set_optattr_netorder(tb, IPSET_ATTR_TIMEOUT) ||
245 !ip_set_optattr_netorder(tb, IPSET_ATTR_PACKETS) || 249 !ip_set_optattr_netorder(tb, IPSET_ATTR_PACKETS) ||
246 !ip_set_optattr_netorder(tb, IPSET_ATTR_BYTES) || 250 !ip_set_optattr_netorder(tb, IPSET_ATTR_BYTES) ||
251 !ip_set_optattr_netorder(tb, IPSET_ATTR_SKBMARK) ||
252 !ip_set_optattr_netorder(tb, IPSET_ATTR_SKBPRIO) ||
253 !ip_set_optattr_netorder(tb, IPSET_ATTR_SKBQUEUE) ||
247 tb[IPSET_ATTR_IP_TO] || 254 tb[IPSET_ATTR_IP_TO] ||
248 tb[IPSET_ATTR_CIDR])) 255 tb[IPSET_ATTR_CIDR]))
249 return -IPSET_ERR_PROTOCOL; 256 return -IPSET_ERR_PROTOCOL;
@@ -301,6 +308,9 @@ static struct ip_set_type hash_ipmark_type __read_mostly = {
301 [IPSET_ATTR_BYTES] = { .type = NLA_U64 }, 308 [IPSET_ATTR_BYTES] = { .type = NLA_U64 },
302 [IPSET_ATTR_PACKETS] = { .type = NLA_U64 }, 309 [IPSET_ATTR_PACKETS] = { .type = NLA_U64 },
303 [IPSET_ATTR_COMMENT] = { .type = NLA_NUL_STRING }, 310 [IPSET_ATTR_COMMENT] = { .type = NLA_NUL_STRING },
311 [IPSET_ATTR_SKBMARK] = { .type = NLA_U64 },
312 [IPSET_ATTR_SKBPRIO] = { .type = NLA_U32 },
313 [IPSET_ATTR_SKBQUEUE] = { .type = NLA_U16 },
304 }, 314 },
305 .me = THIS_MODULE, 315 .me = THIS_MODULE,
306}; 316};
diff --git a/net/netfilter/ipset/ip_set_hash_ipport.c b/net/netfilter/ipset/ip_set_hash_ipport.c
index 7597b82a8b03..dcbcceb9a52f 100644
--- a/net/netfilter/ipset/ip_set_hash_ipport.c
+++ b/net/netfilter/ipset/ip_set_hash_ipport.c
@@ -28,7 +28,8 @@
28/* 1 SCTP and UDPLITE support added */ 28/* 1 SCTP and UDPLITE support added */
29/* 2 Counters support added */ 29/* 2 Counters support added */
30/* 3 Comments support added */ 30/* 3 Comments support added */
31#define IPSET_TYPE_REV_MAX 4 /* Forceadd support added */ 31/* 4 Forceadd support added */
32#define IPSET_TYPE_REV_MAX 5 /* skbinfo support added */
32 33
33MODULE_LICENSE("GPL"); 34MODULE_LICENSE("GPL");
34MODULE_AUTHOR("Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>"); 35MODULE_AUTHOR("Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>");
@@ -94,7 +95,7 @@ hash_ipport4_kadt(struct ip_set *set, const struct sk_buff *skb,
94 enum ipset_adt adt, struct ip_set_adt_opt *opt) 95 enum ipset_adt adt, struct ip_set_adt_opt *opt)
95{ 96{
96 ipset_adtfn adtfn = set->variant->adt[adt]; 97 ipset_adtfn adtfn = set->variant->adt[adt];
97 struct hash_ipport4_elem e = { }; 98 struct hash_ipport4_elem e = { .ip = 0 };
98 struct ip_set_ext ext = IP_SET_INIT_KEXT(skb, opt, set); 99 struct ip_set_ext ext = IP_SET_INIT_KEXT(skb, opt, set);
99 100
100 if (!ip_set_get_ip4_port(skb, opt->flags & IPSET_DIM_TWO_SRC, 101 if (!ip_set_get_ip4_port(skb, opt->flags & IPSET_DIM_TWO_SRC,
@@ -111,7 +112,7 @@ hash_ipport4_uadt(struct ip_set *set, struct nlattr *tb[],
111{ 112{
112 const struct hash_ipport *h = set->data; 113 const struct hash_ipport *h = set->data;
113 ipset_adtfn adtfn = set->variant->adt[adt]; 114 ipset_adtfn adtfn = set->variant->adt[adt];
114 struct hash_ipport4_elem e = { }; 115 struct hash_ipport4_elem e = { .ip = 0 };
115 struct ip_set_ext ext = IP_SET_INIT_UEXT(set); 116 struct ip_set_ext ext = IP_SET_INIT_UEXT(set);
116 u32 ip, ip_to = 0, p = 0, port, port_to; 117 u32 ip, ip_to = 0, p = 0, port, port_to;
117 bool with_ports = false; 118 bool with_ports = false;
@@ -122,7 +123,10 @@ hash_ipport4_uadt(struct ip_set *set, struct nlattr *tb[],
122 !ip_set_optattr_netorder(tb, IPSET_ATTR_PORT_TO) || 123 !ip_set_optattr_netorder(tb, IPSET_ATTR_PORT_TO) ||
123 !ip_set_optattr_netorder(tb, IPSET_ATTR_TIMEOUT) || 124 !ip_set_optattr_netorder(tb, IPSET_ATTR_TIMEOUT) ||
124 !ip_set_optattr_netorder(tb, IPSET_ATTR_PACKETS) || 125 !ip_set_optattr_netorder(tb, IPSET_ATTR_PACKETS) ||
125 !ip_set_optattr_netorder(tb, IPSET_ATTR_BYTES))) 126 !ip_set_optattr_netorder(tb, IPSET_ATTR_BYTES) ||
127 !ip_set_optattr_netorder(tb, IPSET_ATTR_SKBMARK) ||
128 !ip_set_optattr_netorder(tb, IPSET_ATTR_SKBPRIO) ||
129 !ip_set_optattr_netorder(tb, IPSET_ATTR_SKBQUEUE)))
126 return -IPSET_ERR_PROTOCOL; 130 return -IPSET_ERR_PROTOCOL;
127 131
128 if (tb[IPSET_ATTR_LINENO]) 132 if (tb[IPSET_ATTR_LINENO])
@@ -258,7 +262,7 @@ hash_ipport6_kadt(struct ip_set *set, const struct sk_buff *skb,
258 enum ipset_adt adt, struct ip_set_adt_opt *opt) 262 enum ipset_adt adt, struct ip_set_adt_opt *opt)
259{ 263{
260 ipset_adtfn adtfn = set->variant->adt[adt]; 264 ipset_adtfn adtfn = set->variant->adt[adt];
261 struct hash_ipport6_elem e = { }; 265 struct hash_ipport6_elem e = { .ip = { .all = { 0 } } };
262 struct ip_set_ext ext = IP_SET_INIT_KEXT(skb, opt, set); 266 struct ip_set_ext ext = IP_SET_INIT_KEXT(skb, opt, set);
263 267
264 if (!ip_set_get_ip6_port(skb, opt->flags & IPSET_DIM_TWO_SRC, 268 if (!ip_set_get_ip6_port(skb, opt->flags & IPSET_DIM_TWO_SRC,
@@ -275,7 +279,7 @@ hash_ipport6_uadt(struct ip_set *set, struct nlattr *tb[],
275{ 279{
276 const struct hash_ipport *h = set->data; 280 const struct hash_ipport *h = set->data;
277 ipset_adtfn adtfn = set->variant->adt[adt]; 281 ipset_adtfn adtfn = set->variant->adt[adt];
278 struct hash_ipport6_elem e = { }; 282 struct hash_ipport6_elem e = { .ip = { .all = { 0 } } };
279 struct ip_set_ext ext = IP_SET_INIT_UEXT(set); 283 struct ip_set_ext ext = IP_SET_INIT_UEXT(set);
280 u32 port, port_to; 284 u32 port, port_to;
281 bool with_ports = false; 285 bool with_ports = false;
@@ -287,6 +291,9 @@ hash_ipport6_uadt(struct ip_set *set, struct nlattr *tb[],
287 !ip_set_optattr_netorder(tb, IPSET_ATTR_TIMEOUT) || 291 !ip_set_optattr_netorder(tb, IPSET_ATTR_TIMEOUT) ||
288 !ip_set_optattr_netorder(tb, IPSET_ATTR_PACKETS) || 292 !ip_set_optattr_netorder(tb, IPSET_ATTR_PACKETS) ||
289 !ip_set_optattr_netorder(tb, IPSET_ATTR_BYTES) || 293 !ip_set_optattr_netorder(tb, IPSET_ATTR_BYTES) ||
294 !ip_set_optattr_netorder(tb, IPSET_ATTR_SKBMARK) ||
295 !ip_set_optattr_netorder(tb, IPSET_ATTR_SKBPRIO) ||
296 !ip_set_optattr_netorder(tb, IPSET_ATTR_SKBQUEUE) ||
290 tb[IPSET_ATTR_IP_TO] || 297 tb[IPSET_ATTR_IP_TO] ||
291 tb[IPSET_ATTR_CIDR])) 298 tb[IPSET_ATTR_CIDR]))
292 return -IPSET_ERR_PROTOCOL; 299 return -IPSET_ERR_PROTOCOL;
@@ -370,6 +377,9 @@ static struct ip_set_type hash_ipport_type __read_mostly = {
370 [IPSET_ATTR_BYTES] = { .type = NLA_U64 }, 377 [IPSET_ATTR_BYTES] = { .type = NLA_U64 },
371 [IPSET_ATTR_PACKETS] = { .type = NLA_U64 }, 378 [IPSET_ATTR_PACKETS] = { .type = NLA_U64 },
372 [IPSET_ATTR_COMMENT] = { .type = NLA_NUL_STRING }, 379 [IPSET_ATTR_COMMENT] = { .type = NLA_NUL_STRING },
380 [IPSET_ATTR_SKBMARK] = { .type = NLA_U64 },
381 [IPSET_ATTR_SKBPRIO] = { .type = NLA_U32 },
382 [IPSET_ATTR_SKBQUEUE] = { .type = NLA_U16 },
373 }, 383 },
374 .me = THIS_MODULE, 384 .me = THIS_MODULE,
375}; 385};
diff --git a/net/netfilter/ipset/ip_set_hash_ipportip.c b/net/netfilter/ipset/ip_set_hash_ipportip.c
index 672655ffd573..7ef93fc887a1 100644
--- a/net/netfilter/ipset/ip_set_hash_ipportip.c
+++ b/net/netfilter/ipset/ip_set_hash_ipportip.c
@@ -28,7 +28,8 @@
28/* 1 SCTP and UDPLITE support added */ 28/* 1 SCTP and UDPLITE support added */
29/* 2 Counters support added */ 29/* 2 Counters support added */
30/* 3 Comments support added */ 30/* 3 Comments support added */
31#define IPSET_TYPE_REV_MAX 4 /* Forceadd support added */ 31/* 4 Forceadd support added */
32#define IPSET_TYPE_REV_MAX 5 /* skbinfo support added */
32 33
33MODULE_LICENSE("GPL"); 34MODULE_LICENSE("GPL");
34MODULE_AUTHOR("Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>"); 35MODULE_AUTHOR("Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>");
@@ -95,7 +96,7 @@ hash_ipportip4_kadt(struct ip_set *set, const struct sk_buff *skb,
95 enum ipset_adt adt, struct ip_set_adt_opt *opt) 96 enum ipset_adt adt, struct ip_set_adt_opt *opt)
96{ 97{
97 ipset_adtfn adtfn = set->variant->adt[adt]; 98 ipset_adtfn adtfn = set->variant->adt[adt];
98 struct hash_ipportip4_elem e = { }; 99 struct hash_ipportip4_elem e = { .ip = 0 };
99 struct ip_set_ext ext = IP_SET_INIT_KEXT(skb, opt, set); 100 struct ip_set_ext ext = IP_SET_INIT_KEXT(skb, opt, set);
100 101
101 if (!ip_set_get_ip4_port(skb, opt->flags & IPSET_DIM_TWO_SRC, 102 if (!ip_set_get_ip4_port(skb, opt->flags & IPSET_DIM_TWO_SRC,
@@ -113,7 +114,7 @@ hash_ipportip4_uadt(struct ip_set *set, struct nlattr *tb[],
113{ 114{
114 const struct hash_ipportip *h = set->data; 115 const struct hash_ipportip *h = set->data;
115 ipset_adtfn adtfn = set->variant->adt[adt]; 116 ipset_adtfn adtfn = set->variant->adt[adt];
116 struct hash_ipportip4_elem e = { }; 117 struct hash_ipportip4_elem e = { .ip = 0 };
117 struct ip_set_ext ext = IP_SET_INIT_UEXT(set); 118 struct ip_set_ext ext = IP_SET_INIT_UEXT(set);
118 u32 ip, ip_to = 0, p = 0, port, port_to; 119 u32 ip, ip_to = 0, p = 0, port, port_to;
119 bool with_ports = false; 120 bool with_ports = false;
@@ -124,7 +125,10 @@ hash_ipportip4_uadt(struct ip_set *set, struct nlattr *tb[],
124 !ip_set_optattr_netorder(tb, IPSET_ATTR_PORT_TO) || 125 !ip_set_optattr_netorder(tb, IPSET_ATTR_PORT_TO) ||
125 !ip_set_optattr_netorder(tb, IPSET_ATTR_TIMEOUT) || 126 !ip_set_optattr_netorder(tb, IPSET_ATTR_TIMEOUT) ||
126 !ip_set_optattr_netorder(tb, IPSET_ATTR_PACKETS) || 127 !ip_set_optattr_netorder(tb, IPSET_ATTR_PACKETS) ||
127 !ip_set_optattr_netorder(tb, IPSET_ATTR_BYTES))) 128 !ip_set_optattr_netorder(tb, IPSET_ATTR_BYTES) ||
129 !ip_set_optattr_netorder(tb, IPSET_ATTR_SKBMARK) ||
130 !ip_set_optattr_netorder(tb, IPSET_ATTR_SKBPRIO) ||
131 !ip_set_optattr_netorder(tb, IPSET_ATTR_SKBQUEUE)))
128 return -IPSET_ERR_PROTOCOL; 132 return -IPSET_ERR_PROTOCOL;
129 133
130 if (tb[IPSET_ATTR_LINENO]) 134 if (tb[IPSET_ATTR_LINENO])
@@ -265,7 +269,7 @@ hash_ipportip6_kadt(struct ip_set *set, const struct sk_buff *skb,
265 enum ipset_adt adt, struct ip_set_adt_opt *opt) 269 enum ipset_adt adt, struct ip_set_adt_opt *opt)
266{ 270{
267 ipset_adtfn adtfn = set->variant->adt[adt]; 271 ipset_adtfn adtfn = set->variant->adt[adt];
268 struct hash_ipportip6_elem e = { }; 272 struct hash_ipportip6_elem e = { .ip = { .all = { 0 } } };
269 struct ip_set_ext ext = IP_SET_INIT_KEXT(skb, opt, set); 273 struct ip_set_ext ext = IP_SET_INIT_KEXT(skb, opt, set);
270 274
271 if (!ip_set_get_ip6_port(skb, opt->flags & IPSET_DIM_TWO_SRC, 275 if (!ip_set_get_ip6_port(skb, opt->flags & IPSET_DIM_TWO_SRC,
@@ -283,7 +287,7 @@ hash_ipportip6_uadt(struct ip_set *set, struct nlattr *tb[],
283{ 287{
284 const struct hash_ipportip *h = set->data; 288 const struct hash_ipportip *h = set->data;
285 ipset_adtfn adtfn = set->variant->adt[adt]; 289 ipset_adtfn adtfn = set->variant->adt[adt];
286 struct hash_ipportip6_elem e = { }; 290 struct hash_ipportip6_elem e = { .ip = { .all = { 0 } } };
287 struct ip_set_ext ext = IP_SET_INIT_UEXT(set); 291 struct ip_set_ext ext = IP_SET_INIT_UEXT(set);
288 u32 port, port_to; 292 u32 port, port_to;
289 bool with_ports = false; 293 bool with_ports = false;
@@ -295,6 +299,9 @@ hash_ipportip6_uadt(struct ip_set *set, struct nlattr *tb[],
295 !ip_set_optattr_netorder(tb, IPSET_ATTR_TIMEOUT) || 299 !ip_set_optattr_netorder(tb, IPSET_ATTR_TIMEOUT) ||
296 !ip_set_optattr_netorder(tb, IPSET_ATTR_PACKETS) || 300 !ip_set_optattr_netorder(tb, IPSET_ATTR_PACKETS) ||
297 !ip_set_optattr_netorder(tb, IPSET_ATTR_BYTES) || 301 !ip_set_optattr_netorder(tb, IPSET_ATTR_BYTES) ||
302 !ip_set_optattr_netorder(tb, IPSET_ATTR_SKBMARK) ||
303 !ip_set_optattr_netorder(tb, IPSET_ATTR_SKBPRIO) ||
304 !ip_set_optattr_netorder(tb, IPSET_ATTR_SKBQUEUE) ||
298 tb[IPSET_ATTR_IP_TO] || 305 tb[IPSET_ATTR_IP_TO] ||
299 tb[IPSET_ATTR_CIDR])) 306 tb[IPSET_ATTR_CIDR]))
300 return -IPSET_ERR_PROTOCOL; 307 return -IPSET_ERR_PROTOCOL;
@@ -382,6 +389,9 @@ static struct ip_set_type hash_ipportip_type __read_mostly = {
382 [IPSET_ATTR_BYTES] = { .type = NLA_U64 }, 389 [IPSET_ATTR_BYTES] = { .type = NLA_U64 },
383 [IPSET_ATTR_PACKETS] = { .type = NLA_U64 }, 390 [IPSET_ATTR_PACKETS] = { .type = NLA_U64 },
384 [IPSET_ATTR_COMMENT] = { .type = NLA_NUL_STRING }, 391 [IPSET_ATTR_COMMENT] = { .type = NLA_NUL_STRING },
392 [IPSET_ATTR_SKBMARK] = { .type = NLA_U64 },
393 [IPSET_ATTR_SKBPRIO] = { .type = NLA_U32 },
394 [IPSET_ATTR_SKBQUEUE] = { .type = NLA_U16 },
385 }, 395 },
386 .me = THIS_MODULE, 396 .me = THIS_MODULE,
387}; 397};
diff --git a/net/netfilter/ipset/ip_set_hash_ipportnet.c b/net/netfilter/ipset/ip_set_hash_ipportnet.c
index 7308d84f9277..b6012ad92781 100644
--- a/net/netfilter/ipset/ip_set_hash_ipportnet.c
+++ b/net/netfilter/ipset/ip_set_hash_ipportnet.c
@@ -30,7 +30,8 @@
30/* 3 nomatch flag support added */ 30/* 3 nomatch flag support added */
31/* 4 Counters support added */ 31/* 4 Counters support added */
32/* 5 Comments support added */ 32/* 5 Comments support added */
33#define IPSET_TYPE_REV_MAX 6 /* Forceadd support added */ 33/* 6 Forceadd support added */
34#define IPSET_TYPE_REV_MAX 7 /* skbinfo support added */
34 35
35MODULE_LICENSE("GPL"); 36MODULE_LICENSE("GPL");
36MODULE_AUTHOR("Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>"); 37MODULE_AUTHOR("Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>");
@@ -179,7 +180,10 @@ hash_ipportnet4_uadt(struct ip_set *set, struct nlattr *tb[],
179 !ip_set_optattr_netorder(tb, IPSET_ATTR_TIMEOUT) || 180 !ip_set_optattr_netorder(tb, IPSET_ATTR_TIMEOUT) ||
180 !ip_set_optattr_netorder(tb, IPSET_ATTR_CADT_FLAGS) || 181 !ip_set_optattr_netorder(tb, IPSET_ATTR_CADT_FLAGS) ||
181 !ip_set_optattr_netorder(tb, IPSET_ATTR_PACKETS) || 182 !ip_set_optattr_netorder(tb, IPSET_ATTR_PACKETS) ||
182 !ip_set_optattr_netorder(tb, IPSET_ATTR_BYTES))) 183 !ip_set_optattr_netorder(tb, IPSET_ATTR_BYTES) ||
184 !ip_set_optattr_netorder(tb, IPSET_ATTR_SKBMARK) ||
185 !ip_set_optattr_netorder(tb, IPSET_ATTR_SKBPRIO) ||
186 !ip_set_optattr_netorder(tb, IPSET_ATTR_SKBQUEUE)))
183 return -IPSET_ERR_PROTOCOL; 187 return -IPSET_ERR_PROTOCOL;
184 188
185 if (tb[IPSET_ATTR_LINENO]) 189 if (tb[IPSET_ATTR_LINENO])
@@ -432,6 +436,9 @@ hash_ipportnet6_uadt(struct ip_set *set, struct nlattr *tb[],
432 !ip_set_optattr_netorder(tb, IPSET_ATTR_CADT_FLAGS) || 436 !ip_set_optattr_netorder(tb, IPSET_ATTR_CADT_FLAGS) ||
433 !ip_set_optattr_netorder(tb, IPSET_ATTR_PACKETS) || 437 !ip_set_optattr_netorder(tb, IPSET_ATTR_PACKETS) ||
434 !ip_set_optattr_netorder(tb, IPSET_ATTR_BYTES) || 438 !ip_set_optattr_netorder(tb, IPSET_ATTR_BYTES) ||
439 !ip_set_optattr_netorder(tb, IPSET_ATTR_SKBMARK) ||
440 !ip_set_optattr_netorder(tb, IPSET_ATTR_SKBPRIO) ||
441 !ip_set_optattr_netorder(tb, IPSET_ATTR_SKBQUEUE) ||
435 tb[IPSET_ATTR_IP_TO] || 442 tb[IPSET_ATTR_IP_TO] ||
436 tb[IPSET_ATTR_CIDR])) 443 tb[IPSET_ATTR_CIDR]))
437 return -IPSET_ERR_PROTOCOL; 444 return -IPSET_ERR_PROTOCOL;
@@ -541,6 +548,9 @@ static struct ip_set_type hash_ipportnet_type __read_mostly = {
541 [IPSET_ATTR_BYTES] = { .type = NLA_U64 }, 548 [IPSET_ATTR_BYTES] = { .type = NLA_U64 },
542 [IPSET_ATTR_PACKETS] = { .type = NLA_U64 }, 549 [IPSET_ATTR_PACKETS] = { .type = NLA_U64 },
543 [IPSET_ATTR_COMMENT] = { .type = NLA_NUL_STRING }, 550 [IPSET_ATTR_COMMENT] = { .type = NLA_NUL_STRING },
551 [IPSET_ATTR_SKBMARK] = { .type = NLA_U64 },
552 [IPSET_ATTR_SKBPRIO] = { .type = NLA_U32 },
553 [IPSET_ATTR_SKBQUEUE] = { .type = NLA_U16 },
544 }, 554 },
545 .me = THIS_MODULE, 555 .me = THIS_MODULE,
546}; 556};
diff --git a/net/netfilter/ipset/ip_set_hash_mac.c b/net/netfilter/ipset/ip_set_hash_mac.c
new file mode 100644
index 000000000000..65690b52a4d5
--- /dev/null
+++ b/net/netfilter/ipset/ip_set_hash_mac.c
@@ -0,0 +1,173 @@
1/* Copyright (C) 2014 Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
2 *
3 * This program is free software; you can redistribute it and/or modify
4 * it under the terms of the GNU General Public License version 2 as
5 * published by the Free Software Foundation.
6 */
7
8/* Kernel module implementing an IP set type: the hash:mac type */
9
10#include <linux/jhash.h>
11#include <linux/module.h>
12#include <linux/etherdevice.h>
13#include <linux/skbuff.h>
14#include <linux/errno.h>
15#include <linux/if_ether.h>
16#include <net/netlink.h>
17
18#include <linux/netfilter.h>
19#include <linux/netfilter/ipset/ip_set.h>
20#include <linux/netfilter/ipset/ip_set_hash.h>
21
22#define IPSET_TYPE_REV_MIN 0
23#define IPSET_TYPE_REV_MAX 0
24
25MODULE_LICENSE("GPL");
26MODULE_AUTHOR("Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>");
27IP_SET_MODULE_DESC("hash:mac", IPSET_TYPE_REV_MIN, IPSET_TYPE_REV_MAX);
28MODULE_ALIAS("ip_set_hash:mac");
29
30/* Type specific function prefix */
31#define HTYPE hash_mac
32
33/* Member elements */
34struct hash_mac4_elem {
35 /* Zero valued IP addresses cannot be stored */
36 union {
37 unsigned char ether[ETH_ALEN];
38 __be32 foo[2];
39 };
40};
41
42/* Common functions */
43
44static inline bool
45hash_mac4_data_equal(const struct hash_mac4_elem *e1,
46 const struct hash_mac4_elem *e2,
47 u32 *multi)
48{
49 return ether_addr_equal(e1->ether, e2->ether);
50}
51
52static inline bool
53hash_mac4_data_list(struct sk_buff *skb, const struct hash_mac4_elem *e)
54{
55 return nla_put(skb, IPSET_ATTR_ETHER, ETH_ALEN, e->ether);
56}
57
58static inline void
59hash_mac4_data_next(struct hash_mac4_elem *next,
60 const struct hash_mac4_elem *e)
61{
62}
63
64#define MTYPE hash_mac4
65#define PF 4
66#define HOST_MASK 32
67#define IP_SET_EMIT_CREATE
68#define IP_SET_PROTO_UNDEF
69#include "ip_set_hash_gen.h"
70
71/* Zero valued element is not supported */
72static const unsigned char invalid_ether[ETH_ALEN] = { 0 };
73
74static int
75hash_mac4_kadt(struct ip_set *set, const struct sk_buff *skb,
76 const struct xt_action_param *par,
77 enum ipset_adt adt, struct ip_set_adt_opt *opt)
78{
79 ipset_adtfn adtfn = set->variant->adt[adt];
80 struct hash_mac4_elem e = { { .foo[0] = 0, .foo[1] = 0 } };
81 struct ip_set_ext ext = IP_SET_INIT_KEXT(skb, opt, set);
82
83 /* MAC can be src only */
84 if (!(opt->flags & IPSET_DIM_ONE_SRC))
85 return 0;
86
87 if (skb_mac_header(skb) < skb->head ||
88 (skb_mac_header(skb) + ETH_HLEN) > skb->data)
89 return -EINVAL;
90
91 memcpy(e.ether, eth_hdr(skb)->h_source, ETH_ALEN);
92 if (memcmp(e.ether, invalid_ether, ETH_ALEN) == 0)
93 return -EINVAL;
94 return adtfn(set, &e, &ext, &opt->ext, opt->cmdflags);
95}
96
97static int
98hash_mac4_uadt(struct ip_set *set, struct nlattr *tb[],
99 enum ipset_adt adt, u32 *lineno, u32 flags, bool retried)
100{
101 ipset_adtfn adtfn = set->variant->adt[adt];
102 struct hash_mac4_elem e = { { .foo[0] = 0, .foo[1] = 0 } };
103 struct ip_set_ext ext = IP_SET_INIT_UEXT(set);
104 int ret;
105
106 if (unlikely(!tb[IPSET_ATTR_ETHER] ||
107 !ip_set_optattr_netorder(tb, IPSET_ATTR_TIMEOUT) ||
108 !ip_set_optattr_netorder(tb, IPSET_ATTR_PACKETS) ||
109 !ip_set_optattr_netorder(tb, IPSET_ATTR_BYTES) ||
110 !ip_set_optattr_netorder(tb, IPSET_ATTR_SKBMARK) ||
111 !ip_set_optattr_netorder(tb, IPSET_ATTR_SKBPRIO) ||
112 !ip_set_optattr_netorder(tb, IPSET_ATTR_SKBQUEUE)))
113 return -IPSET_ERR_PROTOCOL;
114
115 if (tb[IPSET_ATTR_LINENO])
116 *lineno = nla_get_u32(tb[IPSET_ATTR_LINENO]);
117
118 ret = ip_set_get_extensions(set, tb, &ext);
119 if (ret)
120 return ret;
121 memcpy(e.ether, nla_data(tb[IPSET_ATTR_ETHER]), ETH_ALEN);
122 if (memcmp(e.ether, invalid_ether, ETH_ALEN) == 0)
123 return -IPSET_ERR_HASH_ELEM;
124
125 return adtfn(set, &e, &ext, &ext, flags);
126}
127
128static struct ip_set_type hash_mac_type __read_mostly = {
129 .name = "hash:mac",
130 .protocol = IPSET_PROTOCOL,
131 .features = IPSET_TYPE_MAC,
132 .dimension = IPSET_DIM_ONE,
133 .family = NFPROTO_UNSPEC,
134 .revision_min = IPSET_TYPE_REV_MIN,
135 .revision_max = IPSET_TYPE_REV_MAX,
136 .create = hash_mac_create,
137 .create_policy = {
138 [IPSET_ATTR_HASHSIZE] = { .type = NLA_U32 },
139 [IPSET_ATTR_MAXELEM] = { .type = NLA_U32 },
140 [IPSET_ATTR_PROBES] = { .type = NLA_U8 },
141 [IPSET_ATTR_RESIZE] = { .type = NLA_U8 },
142 [IPSET_ATTR_TIMEOUT] = { .type = NLA_U32 },
143 [IPSET_ATTR_CADT_FLAGS] = { .type = NLA_U32 },
144 },
145 .adt_policy = {
146 [IPSET_ATTR_ETHER] = { .type = NLA_BINARY,
147 .len = ETH_ALEN },
148 [IPSET_ATTR_TIMEOUT] = { .type = NLA_U32 },
149 [IPSET_ATTR_LINENO] = { .type = NLA_U32 },
150 [IPSET_ATTR_BYTES] = { .type = NLA_U64 },
151 [IPSET_ATTR_PACKETS] = { .type = NLA_U64 },
152 [IPSET_ATTR_COMMENT] = { .type = NLA_NUL_STRING },
153 [IPSET_ATTR_SKBMARK] = { .type = NLA_U64 },
154 [IPSET_ATTR_SKBPRIO] = { .type = NLA_U32 },
155 [IPSET_ATTR_SKBQUEUE] = { .type = NLA_U16 },
156 },
157 .me = THIS_MODULE,
158};
159
160static int __init
161hash_mac_init(void)
162{
163 return ip_set_type_register(&hash_mac_type);
164}
165
166static void __exit
167hash_mac_fini(void)
168{
169 ip_set_type_unregister(&hash_mac_type);
170}
171
172module_init(hash_mac_init);
173module_exit(hash_mac_fini);
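The new hash:mac module only supplies the element layout, the equal/list/next helpers and the add/del/test handlers; resizing, garbage collection and the hash_mac_create() referenced in .create above are stamped out by including ip_set_hash_gen.h with HTYPE and MTYPE set. Roughly, the generator relies on token pasting along these lines (a sketch of the mechanism, with an illustrative helper macro name, not the literal definitions):

/* Sketch of the token pasting used by ip_set_hash_gen.h. */
#define IPSET_CONCAT(a, b)	a##b
#define IPSET_TOKEN(a, b)	IPSET_CONCAT(a, b)

#define HTYPE hash_mac
/* ... so inside the generic header ... */
static int
IPSET_TOKEN(HTYPE, _create)(struct net *net, struct ip_set *set,
			    struct nlattr *tb[], u32 flags);
/* expands to hash_mac_create(...), which the type definition can then
 * reference as .create = hash_mac_create. Defining IP_SET_PROTO_UNDEF
 * additionally compiles out the per-family (NFPROTO_IPV4/IPV6) branches
 * in the generated create function, since a MAC set is family-agnostic. */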
diff --git a/net/netfilter/ipset/ip_set_hash_net.c b/net/netfilter/ipset/ip_set_hash_net.c
index 4c7d495783a3..6b3ac10ac2f1 100644
--- a/net/netfilter/ipset/ip_set_hash_net.c
+++ b/net/netfilter/ipset/ip_set_hash_net.c
@@ -27,7 +27,8 @@
27/* 2 nomatch flag support added */ 27/* 2 nomatch flag support added */
28/* 3 Counters support added */ 28/* 3 Counters support added */
29/* 4 Comments support added */ 29/* 4 Comments support added */
30#define IPSET_TYPE_REV_MAX 5 /* Forceadd support added */ 30/* 5 Forceadd support added */
31#define IPSET_TYPE_REV_MAX 6 /* skbinfo mapping support added */
31 32
32MODULE_LICENSE("GPL"); 33MODULE_LICENSE("GPL");
33MODULE_AUTHOR("Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>"); 34MODULE_AUTHOR("Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>");
@@ -150,7 +151,10 @@ hash_net4_uadt(struct ip_set *set, struct nlattr *tb[],
150 !ip_set_optattr_netorder(tb, IPSET_ATTR_TIMEOUT) || 151 !ip_set_optattr_netorder(tb, IPSET_ATTR_TIMEOUT) ||
151 !ip_set_optattr_netorder(tb, IPSET_ATTR_CADT_FLAGS) || 152 !ip_set_optattr_netorder(tb, IPSET_ATTR_CADT_FLAGS) ||
152 !ip_set_optattr_netorder(tb, IPSET_ATTR_PACKETS) || 153 !ip_set_optattr_netorder(tb, IPSET_ATTR_PACKETS) ||
153 !ip_set_optattr_netorder(tb, IPSET_ATTR_BYTES))) 154 !ip_set_optattr_netorder(tb, IPSET_ATTR_BYTES) ||
155 !ip_set_optattr_netorder(tb, IPSET_ATTR_SKBMARK) ||
156 !ip_set_optattr_netorder(tb, IPSET_ATTR_SKBPRIO) ||
157 !ip_set_optattr_netorder(tb, IPSET_ATTR_SKBQUEUE)))
154 return -IPSET_ERR_PROTOCOL; 158 return -IPSET_ERR_PROTOCOL;
155 159
156 if (tb[IPSET_ATTR_LINENO]) 160 if (tb[IPSET_ATTR_LINENO])
@@ -318,7 +322,10 @@ hash_net6_uadt(struct ip_set *set, struct nlattr *tb[],
318 !ip_set_optattr_netorder(tb, IPSET_ATTR_TIMEOUT) || 322 !ip_set_optattr_netorder(tb, IPSET_ATTR_TIMEOUT) ||
319 !ip_set_optattr_netorder(tb, IPSET_ATTR_CADT_FLAGS) || 323 !ip_set_optattr_netorder(tb, IPSET_ATTR_CADT_FLAGS) ||
320 !ip_set_optattr_netorder(tb, IPSET_ATTR_PACKETS) || 324 !ip_set_optattr_netorder(tb, IPSET_ATTR_PACKETS) ||
321 !ip_set_optattr_netorder(tb, IPSET_ATTR_BYTES))) 325 !ip_set_optattr_netorder(tb, IPSET_ATTR_BYTES) ||
326 !ip_set_optattr_netorder(tb, IPSET_ATTR_SKBMARK) ||
327 !ip_set_optattr_netorder(tb, IPSET_ATTR_SKBPRIO) ||
328 !ip_set_optattr_netorder(tb, IPSET_ATTR_SKBQUEUE)))
322 return -IPSET_ERR_PROTOCOL; 329 return -IPSET_ERR_PROTOCOL;
323 if (unlikely(tb[IPSET_ATTR_IP_TO])) 330 if (unlikely(tb[IPSET_ATTR_IP_TO]))
324 return -IPSET_ERR_HASH_RANGE_UNSUPPORTED; 331 return -IPSET_ERR_HASH_RANGE_UNSUPPORTED;
@@ -377,6 +384,9 @@ static struct ip_set_type hash_net_type __read_mostly = {
377 [IPSET_ATTR_BYTES] = { .type = NLA_U64 }, 384 [IPSET_ATTR_BYTES] = { .type = NLA_U64 },
378 [IPSET_ATTR_PACKETS] = { .type = NLA_U64 }, 385 [IPSET_ATTR_PACKETS] = { .type = NLA_U64 },
379 [IPSET_ATTR_COMMENT] = { .type = NLA_NUL_STRING }, 386 [IPSET_ATTR_COMMENT] = { .type = NLA_NUL_STRING },
387 [IPSET_ATTR_SKBMARK] = { .type = NLA_U64 },
388 [IPSET_ATTR_SKBPRIO] = { .type = NLA_U32 },
389 [IPSET_ATTR_SKBQUEUE] = { .type = NLA_U16 },
380 }, 390 },
381 .me = THIS_MODULE, 391 .me = THIS_MODULE,
382}; 392};
diff --git a/net/netfilter/ipset/ip_set_hash_netiface.c b/net/netfilter/ipset/ip_set_hash_netiface.c
index db2606805b35..35dd35873442 100644
--- a/net/netfilter/ipset/ip_set_hash_netiface.c
+++ b/net/netfilter/ipset/ip_set_hash_netiface.c
@@ -28,7 +28,8 @@
28/* 2 /0 support added */ 28/* 2 /0 support added */
29/* 3 Counters support added */ 29/* 3 Counters support added */
30/* 4 Comments support added */ 30/* 4 Comments support added */
31#define IPSET_TYPE_REV_MAX 5 /* Forceadd support added */ 31/* 5 Forceadd support added */
32#define IPSET_TYPE_REV_MAX 6 /* skbinfo support added */
32 33
33MODULE_LICENSE("GPL"); 34MODULE_LICENSE("GPL");
34MODULE_AUTHOR("Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>"); 35MODULE_AUTHOR("Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>");
@@ -236,7 +237,7 @@ hash_netiface4_kadt(struct ip_set *set, const struct sk_buff *skb,
236#define SRCDIR (opt->flags & IPSET_DIM_TWO_SRC) 237#define SRCDIR (opt->flags & IPSET_DIM_TWO_SRC)
237 238
238 if (opt->cmdflags & IPSET_FLAG_PHYSDEV) { 239 if (opt->cmdflags & IPSET_FLAG_PHYSDEV) {
239#ifdef CONFIG_BRIDGE_NETFILTER 240#if IS_ENABLED(CONFIG_BRIDGE_NETFILTER)
240 const struct nf_bridge_info *nf_bridge = skb->nf_bridge; 241 const struct nf_bridge_info *nf_bridge = skb->nf_bridge;
241 242
242 if (!nf_bridge) 243 if (!nf_bridge)
@@ -281,7 +282,10 @@ hash_netiface4_uadt(struct ip_set *set, struct nlattr *tb[],
281 !ip_set_optattr_netorder(tb, IPSET_ATTR_TIMEOUT) || 282 !ip_set_optattr_netorder(tb, IPSET_ATTR_TIMEOUT) ||
282 !ip_set_optattr_netorder(tb, IPSET_ATTR_CADT_FLAGS) || 283 !ip_set_optattr_netorder(tb, IPSET_ATTR_CADT_FLAGS) ||
283 !ip_set_optattr_netorder(tb, IPSET_ATTR_PACKETS) || 284 !ip_set_optattr_netorder(tb, IPSET_ATTR_PACKETS) ||
284 !ip_set_optattr_netorder(tb, IPSET_ATTR_BYTES))) 285 !ip_set_optattr_netorder(tb, IPSET_ATTR_BYTES) ||
286 !ip_set_optattr_netorder(tb, IPSET_ATTR_SKBMARK) ||
287 !ip_set_optattr_netorder(tb, IPSET_ATTR_SKBPRIO) ||
288 !ip_set_optattr_netorder(tb, IPSET_ATTR_SKBQUEUE)))
285 return -IPSET_ERR_PROTOCOL; 289 return -IPSET_ERR_PROTOCOL;
286 290
287 if (tb[IPSET_ATTR_LINENO]) 291 if (tb[IPSET_ATTR_LINENO])
@@ -470,7 +474,7 @@ hash_netiface6_kadt(struct ip_set *set, const struct sk_buff *skb,
470 ip6_netmask(&e.ip, e.cidr); 474 ip6_netmask(&e.ip, e.cidr);
471 475
472 if (opt->cmdflags & IPSET_FLAG_PHYSDEV) { 476 if (opt->cmdflags & IPSET_FLAG_PHYSDEV) {
473#ifdef CONFIG_BRIDGE_NETFILTER 477#if IS_ENABLED(CONFIG_BRIDGE_NETFILTER)
474 const struct nf_bridge_info *nf_bridge = skb->nf_bridge; 478 const struct nf_bridge_info *nf_bridge = skb->nf_bridge;
475 479
476 if (!nf_bridge) 480 if (!nf_bridge)
@@ -514,7 +518,10 @@ hash_netiface6_uadt(struct ip_set *set, struct nlattr *tb[],
514 !ip_set_optattr_netorder(tb, IPSET_ATTR_TIMEOUT) || 518 !ip_set_optattr_netorder(tb, IPSET_ATTR_TIMEOUT) ||
515 !ip_set_optattr_netorder(tb, IPSET_ATTR_CADT_FLAGS) || 519 !ip_set_optattr_netorder(tb, IPSET_ATTR_CADT_FLAGS) ||
516 !ip_set_optattr_netorder(tb, IPSET_ATTR_PACKETS) || 520 !ip_set_optattr_netorder(tb, IPSET_ATTR_PACKETS) ||
517 !ip_set_optattr_netorder(tb, IPSET_ATTR_BYTES))) 521 !ip_set_optattr_netorder(tb, IPSET_ATTR_BYTES) ||
522 !ip_set_optattr_netorder(tb, IPSET_ATTR_SKBMARK) ||
523 !ip_set_optattr_netorder(tb, IPSET_ATTR_SKBPRIO) ||
524 !ip_set_optattr_netorder(tb, IPSET_ATTR_SKBQUEUE)))
518 return -IPSET_ERR_PROTOCOL; 525 return -IPSET_ERR_PROTOCOL;
519 if (unlikely(tb[IPSET_ATTR_IP_TO])) 526 if (unlikely(tb[IPSET_ATTR_IP_TO]))
520 return -IPSET_ERR_HASH_RANGE_UNSUPPORTED; 527 return -IPSET_ERR_HASH_RANGE_UNSUPPORTED;
@@ -590,6 +597,9 @@ static struct ip_set_type hash_netiface_type __read_mostly = {
590 [IPSET_ATTR_BYTES] = { .type = NLA_U64 }, 597 [IPSET_ATTR_BYTES] = { .type = NLA_U64 },
591 [IPSET_ATTR_PACKETS] = { .type = NLA_U64 }, 598 [IPSET_ATTR_PACKETS] = { .type = NLA_U64 },
592 [IPSET_ATTR_COMMENT] = { .type = NLA_NUL_STRING }, 599 [IPSET_ATTR_COMMENT] = { .type = NLA_NUL_STRING },
600 [IPSET_ATTR_SKBMARK] = { .type = NLA_U64 },
601 [IPSET_ATTR_SKBPRIO] = { .type = NLA_U32 },
602 [IPSET_ATTR_SKBQUEUE] = { .type = NLA_U16 },
593 }, 603 },
594 .me = THIS_MODULE, 604 .me = THIS_MODULE,
595}; 605};
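The switch from #ifdef CONFIG_BRIDGE_NETFILTER to #if IS_ENABLED(CONFIG_BRIDGE_NETFILTER) in the two kadt functions above matters because the bridge netfilter code can also be built as a module: a bare #ifdef only matches the built-in (=y) case, while IS_ENABLED() covers =y and =m. For illustration:

#include <linux/kconfig.h>

#ifdef CONFIG_BRIDGE_NETFILTER
	/* compiled in only when bridge netfilter is built into the kernel (=y) */
#endif

#if IS_ENABLED(CONFIG_BRIDGE_NETFILTER)
	/* compiled in when bridge netfilter is built in (=y) or modular (=m),
	 * so the physdev lookup via skb->nf_bridge stays available in both cases */
#endif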
diff --git a/net/netfilter/ipset/ip_set_hash_netnet.c b/net/netfilter/ipset/ip_set_hash_netnet.c
index 3e99987e4bf2..da00284b3571 100644
--- a/net/netfilter/ipset/ip_set_hash_netnet.c
+++ b/net/netfilter/ipset/ip_set_hash_netnet.c
@@ -24,7 +24,8 @@
24#include <linux/netfilter/ipset/ip_set_hash.h> 24#include <linux/netfilter/ipset/ip_set_hash.h>
25 25
26#define IPSET_TYPE_REV_MIN 0 26#define IPSET_TYPE_REV_MIN 0
27#define IPSET_TYPE_REV_MAX 1 /* Forceadd support added */ 27/* 1 Forceadd support added */
28#define IPSET_TYPE_REV_MAX 2 /* skbinfo support added */
28 29
29MODULE_LICENSE("GPL"); 30MODULE_LICENSE("GPL");
30MODULE_AUTHOR("Oliver Smith <oliver@8.c.9.b.0.7.4.0.1.0.0.2.ip6.arpa>"); 31MODULE_AUTHOR("Oliver Smith <oliver@8.c.9.b.0.7.4.0.1.0.0.2.ip6.arpa>");
@@ -171,7 +172,10 @@ hash_netnet4_uadt(struct ip_set *set, struct nlattr *tb[],
171 !ip_set_optattr_netorder(tb, IPSET_ATTR_TIMEOUT) || 172 !ip_set_optattr_netorder(tb, IPSET_ATTR_TIMEOUT) ||
172 !ip_set_optattr_netorder(tb, IPSET_ATTR_CADT_FLAGS) || 173 !ip_set_optattr_netorder(tb, IPSET_ATTR_CADT_FLAGS) ||
173 !ip_set_optattr_netorder(tb, IPSET_ATTR_PACKETS) || 174 !ip_set_optattr_netorder(tb, IPSET_ATTR_PACKETS) ||
174 !ip_set_optattr_netorder(tb, IPSET_ATTR_BYTES))) 175 !ip_set_optattr_netorder(tb, IPSET_ATTR_BYTES) ||
176 !ip_set_optattr_netorder(tb, IPSET_ATTR_SKBMARK) ||
177 !ip_set_optattr_netorder(tb, IPSET_ATTR_SKBPRIO) ||
178 !ip_set_optattr_netorder(tb, IPSET_ATTR_SKBQUEUE)))
175 return -IPSET_ERR_PROTOCOL; 179 return -IPSET_ERR_PROTOCOL;
176 180
177 if (tb[IPSET_ATTR_LINENO]) 181 if (tb[IPSET_ATTR_LINENO])
@@ -203,7 +207,7 @@ hash_netnet4_uadt(struct ip_set *set, struct nlattr *tb[],
203 flags |= (IPSET_FLAG_NOMATCH << 16); 207 flags |= (IPSET_FLAG_NOMATCH << 16);
204 } 208 }
205 209
206 if (adt == IPSET_TEST || !(tb[IPSET_ATTR_IP_TO] && 210 if (adt == IPSET_TEST || !(tb[IPSET_ATTR_IP_TO] ||
207 tb[IPSET_ATTR_IP2_TO])) { 211 tb[IPSET_ATTR_IP2_TO])) {
208 e.ip[0] = htonl(ip & ip_set_hostmask(e.cidr[0])); 212 e.ip[0] = htonl(ip & ip_set_hostmask(e.cidr[0]));
209 e.ip[1] = htonl(ip2_from & ip_set_hostmask(e.cidr[1])); 213 e.ip[1] = htonl(ip2_from & ip_set_hostmask(e.cidr[1]));
@@ -219,9 +223,10 @@ hash_netnet4_uadt(struct ip_set *set, struct nlattr *tb[],
219 return ret; 223 return ret;
220 if (ip_to < ip) 224 if (ip_to < ip)
221 swap(ip, ip_to); 225 swap(ip, ip_to);
222 if (ip + UINT_MAX == ip_to) 226 if (unlikely(ip + UINT_MAX == ip_to))
223 return -IPSET_ERR_HASH_RANGE; 227 return -IPSET_ERR_HASH_RANGE;
224 } 228 } else
229 ip_set_mask_from_to(ip, ip_to, e.cidr[0]);
225 230
226 ip2_to = ip2_from; 231 ip2_to = ip2_from;
227 if (tb[IPSET_ATTR_IP2_TO]) { 232 if (tb[IPSET_ATTR_IP2_TO]) {
@@ -230,10 +235,10 @@ hash_netnet4_uadt(struct ip_set *set, struct nlattr *tb[],
230 return ret; 235 return ret;
231 if (ip2_to < ip2_from) 236 if (ip2_to < ip2_from)
232 swap(ip2_from, ip2_to); 237 swap(ip2_from, ip2_to);
233 if (ip2_from + UINT_MAX == ip2_to) 238 if (unlikely(ip2_from + UINT_MAX == ip2_to))
234 return -IPSET_ERR_HASH_RANGE; 239 return -IPSET_ERR_HASH_RANGE;
235 240 } else
236 } 241 ip_set_mask_from_to(ip2_from, ip2_to, e.cidr[1]);
237 242
238 if (retried) 243 if (retried)
239 ip = ntohl(h->next.ip[0]); 244 ip = ntohl(h->next.ip[0]);
@@ -393,7 +398,10 @@ hash_netnet6_uadt(struct ip_set *set, struct nlattr *tb[],
393 !ip_set_optattr_netorder(tb, IPSET_ATTR_TIMEOUT) || 398 !ip_set_optattr_netorder(tb, IPSET_ATTR_TIMEOUT) ||
394 !ip_set_optattr_netorder(tb, IPSET_ATTR_CADT_FLAGS) || 399 !ip_set_optattr_netorder(tb, IPSET_ATTR_CADT_FLAGS) ||
395 !ip_set_optattr_netorder(tb, IPSET_ATTR_PACKETS) || 400 !ip_set_optattr_netorder(tb, IPSET_ATTR_PACKETS) ||
396 !ip_set_optattr_netorder(tb, IPSET_ATTR_BYTES))) 401 !ip_set_optattr_netorder(tb, IPSET_ATTR_BYTES) ||
402 !ip_set_optattr_netorder(tb, IPSET_ATTR_SKBMARK) ||
403 !ip_set_optattr_netorder(tb, IPSET_ATTR_SKBPRIO) ||
404 !ip_set_optattr_netorder(tb, IPSET_ATTR_SKBQUEUE)))
397 return -IPSET_ERR_PROTOCOL; 405 return -IPSET_ERR_PROTOCOL;
398 if (unlikely(tb[IPSET_ATTR_IP_TO] || tb[IPSET_ATTR_IP2_TO])) 406 if (unlikely(tb[IPSET_ATTR_IP_TO] || tb[IPSET_ATTR_IP2_TO]))
399 return -IPSET_ERR_HASH_RANGE_UNSUPPORTED; 407 return -IPSET_ERR_HASH_RANGE_UNSUPPORTED;
@@ -461,6 +469,9 @@ static struct ip_set_type hash_netnet_type __read_mostly = {
461 [IPSET_ATTR_BYTES] = { .type = NLA_U64 }, 469 [IPSET_ATTR_BYTES] = { .type = NLA_U64 },
462 [IPSET_ATTR_PACKETS] = { .type = NLA_U64 }, 470 [IPSET_ATTR_PACKETS] = { .type = NLA_U64 },
463 [IPSET_ATTR_COMMENT] = { .type = NLA_NUL_STRING }, 471 [IPSET_ATTR_COMMENT] = { .type = NLA_NUL_STRING },
472 [IPSET_ATTR_SKBMARK] = { .type = NLA_U64 },
473 [IPSET_ATTR_SKBPRIO] = { .type = NLA_U32 },
474 [IPSET_ATTR_SKBQUEUE] = { .type = NLA_U16 },
464 }, 475 },
465 .me = THIS_MODULE, 476 .me = THIS_MODULE,
466}; 477};
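The new else branches in hash_netnet4_uadt() (and the matching ones in hash_netportnet4_uadt() further down) cover the case where an explicit range is given for one dimension but only a network for the other: the CIDR is expanded into a from/to range so the nested iteration still walks both dimensions. ip_set_mask_from_to() presumably does something along these lines (an illustrative reconstruction, not quoted from this merge):

/* Sketch: derive the [from, to] host range covered by a /cidr network. */
#define ip_set_mask_from_to(from, to, cidr)		\
do {							\
	(from) &= ip_set_hostmask(cidr);		\
	(to) = (from) | ~ip_set_hostmask(cidr);		\
} while (0)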
diff --git a/net/netfilter/ipset/ip_set_hash_netport.c b/net/netfilter/ipset/ip_set_hash_netport.c
index 1c645fbd09c7..c0ddb58d19dc 100644
--- a/net/netfilter/ipset/ip_set_hash_netport.c
+++ b/net/netfilter/ipset/ip_set_hash_netport.c
@@ -29,7 +29,8 @@
29/* 3 nomatch flag support added */ 29/* 3 nomatch flag support added */
30/* 4 Counters support added */ 30/* 4 Counters support added */
31/* 5 Comments support added */ 31/* 5 Comments support added */
32#define IPSET_TYPE_REV_MAX 6 /* Forceadd support added */ 32/* 6 Forceadd support added */
33#define IPSET_TYPE_REV_MAX 7 /* skbinfo support added */
33 34
34MODULE_LICENSE("GPL"); 35MODULE_LICENSE("GPL");
35MODULE_AUTHOR("Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>"); 36MODULE_AUTHOR("Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>");
@@ -172,7 +173,10 @@ hash_netport4_uadt(struct ip_set *set, struct nlattr *tb[],
172 !ip_set_optattr_netorder(tb, IPSET_ATTR_TIMEOUT) || 173 !ip_set_optattr_netorder(tb, IPSET_ATTR_TIMEOUT) ||
173 !ip_set_optattr_netorder(tb, IPSET_ATTR_CADT_FLAGS) || 174 !ip_set_optattr_netorder(tb, IPSET_ATTR_CADT_FLAGS) ||
174 !ip_set_optattr_netorder(tb, IPSET_ATTR_PACKETS) || 175 !ip_set_optattr_netorder(tb, IPSET_ATTR_PACKETS) ||
175 !ip_set_optattr_netorder(tb, IPSET_ATTR_BYTES))) 176 !ip_set_optattr_netorder(tb, IPSET_ATTR_BYTES) ||
177 !ip_set_optattr_netorder(tb, IPSET_ATTR_SKBMARK) ||
178 !ip_set_optattr_netorder(tb, IPSET_ATTR_SKBPRIO) ||
179 !ip_set_optattr_netorder(tb, IPSET_ATTR_SKBQUEUE)))
176 return -IPSET_ERR_PROTOCOL; 180 return -IPSET_ERR_PROTOCOL;
177 181
178 if (tb[IPSET_ATTR_LINENO]) 182 if (tb[IPSET_ATTR_LINENO])
@@ -389,7 +393,10 @@ hash_netport6_uadt(struct ip_set *set, struct nlattr *tb[],
389 !ip_set_optattr_netorder(tb, IPSET_ATTR_TIMEOUT) || 393 !ip_set_optattr_netorder(tb, IPSET_ATTR_TIMEOUT) ||
390 !ip_set_optattr_netorder(tb, IPSET_ATTR_CADT_FLAGS) || 394 !ip_set_optattr_netorder(tb, IPSET_ATTR_CADT_FLAGS) ||
391 !ip_set_optattr_netorder(tb, IPSET_ATTR_PACKETS) || 395 !ip_set_optattr_netorder(tb, IPSET_ATTR_PACKETS) ||
392 !ip_set_optattr_netorder(tb, IPSET_ATTR_BYTES))) 396 !ip_set_optattr_netorder(tb, IPSET_ATTR_BYTES) ||
397 !ip_set_optattr_netorder(tb, IPSET_ATTR_SKBMARK) ||
398 !ip_set_optattr_netorder(tb, IPSET_ATTR_SKBPRIO) ||
399 !ip_set_optattr_netorder(tb, IPSET_ATTR_SKBQUEUE)))
393 return -IPSET_ERR_PROTOCOL; 400 return -IPSET_ERR_PROTOCOL;
394 if (unlikely(tb[IPSET_ATTR_IP_TO])) 401 if (unlikely(tb[IPSET_ATTR_IP_TO]))
395 return -IPSET_ERR_HASH_RANGE_UNSUPPORTED; 402 return -IPSET_ERR_HASH_RANGE_UNSUPPORTED;
@@ -489,6 +496,9 @@ static struct ip_set_type hash_netport_type __read_mostly = {
489 [IPSET_ATTR_BYTES] = { .type = NLA_U64 }, 496 [IPSET_ATTR_BYTES] = { .type = NLA_U64 },
490 [IPSET_ATTR_PACKETS] = { .type = NLA_U64 }, 497 [IPSET_ATTR_PACKETS] = { .type = NLA_U64 },
491 [IPSET_ATTR_COMMENT] = { .type = NLA_NUL_STRING }, 498 [IPSET_ATTR_COMMENT] = { .type = NLA_NUL_STRING },
499 [IPSET_ATTR_SKBMARK] = { .type = NLA_U64 },
500 [IPSET_ATTR_SKBPRIO] = { .type = NLA_U32 },
501 [IPSET_ATTR_SKBQUEUE] = { .type = NLA_U16 },
492 }, 502 },
493 .me = THIS_MODULE, 503 .me = THIS_MODULE,
494}; 504};
diff --git a/net/netfilter/ipset/ip_set_hash_netportnet.c b/net/netfilter/ipset/ip_set_hash_netportnet.c
index c0d2ba73f8b2..b8053d675fc3 100644
--- a/net/netfilter/ipset/ip_set_hash_netportnet.c
+++ b/net/netfilter/ipset/ip_set_hash_netportnet.c
@@ -26,7 +26,8 @@
26 26
27#define IPSET_TYPE_REV_MIN 0 27#define IPSET_TYPE_REV_MIN 0
28/* 0 Comments support added */ 28/* 0 Comments support added */
29#define IPSET_TYPE_REV_MAX 1 /* Forceadd support added */ 29/* 1 Forceadd support added */
30#define IPSET_TYPE_REV_MAX 2 /* skbinfo support added */
30 31
31MODULE_LICENSE("GPL"); 32MODULE_LICENSE("GPL");
32MODULE_AUTHOR("Oliver Smith <oliver@8.c.9.b.0.7.4.0.1.0.0.2.ip6.arpa>"); 33MODULE_AUTHOR("Oliver Smith <oliver@8.c.9.b.0.7.4.0.1.0.0.2.ip6.arpa>");
@@ -189,7 +190,10 @@ hash_netportnet4_uadt(struct ip_set *set, struct nlattr *tb[],
189 !ip_set_optattr_netorder(tb, IPSET_ATTR_TIMEOUT) || 190 !ip_set_optattr_netorder(tb, IPSET_ATTR_TIMEOUT) ||
190 !ip_set_optattr_netorder(tb, IPSET_ATTR_CADT_FLAGS) || 191 !ip_set_optattr_netorder(tb, IPSET_ATTR_CADT_FLAGS) ||
191 !ip_set_optattr_netorder(tb, IPSET_ATTR_PACKETS) || 192 !ip_set_optattr_netorder(tb, IPSET_ATTR_PACKETS) ||
192 !ip_set_optattr_netorder(tb, IPSET_ATTR_BYTES))) 193 !ip_set_optattr_netorder(tb, IPSET_ATTR_BYTES) ||
194 !ip_set_optattr_netorder(tb, IPSET_ATTR_SKBMARK) ||
195 !ip_set_optattr_netorder(tb, IPSET_ATTR_SKBPRIO) ||
196 !ip_set_optattr_netorder(tb, IPSET_ATTR_SKBQUEUE)))
193 return -IPSET_ERR_PROTOCOL; 197 return -IPSET_ERR_PROTOCOL;
194 198
195 if (tb[IPSET_ATTR_LINENO]) 199 if (tb[IPSET_ATTR_LINENO])
@@ -257,7 +261,8 @@ hash_netportnet4_uadt(struct ip_set *set, struct nlattr *tb[],
257 swap(ip, ip_to); 261 swap(ip, ip_to);
258 if (unlikely(ip + UINT_MAX == ip_to)) 262 if (unlikely(ip + UINT_MAX == ip_to))
259 return -IPSET_ERR_HASH_RANGE; 263 return -IPSET_ERR_HASH_RANGE;
260 } 264 } else
265 ip_set_mask_from_to(ip, ip_to, e.cidr[0]);
261 266
262 port_to = port = ntohs(e.port); 267 port_to = port = ntohs(e.port);
263 if (tb[IPSET_ATTR_PORT_TO]) { 268 if (tb[IPSET_ATTR_PORT_TO]) {
@@ -275,7 +280,8 @@ hash_netportnet4_uadt(struct ip_set *set, struct nlattr *tb[],
275 swap(ip2_from, ip2_to); 280 swap(ip2_from, ip2_to);
276 if (unlikely(ip2_from + UINT_MAX == ip2_to)) 281 if (unlikely(ip2_from + UINT_MAX == ip2_to))
277 return -IPSET_ERR_HASH_RANGE; 282 return -IPSET_ERR_HASH_RANGE;
278 } 283 } else
284 ip_set_mask_from_to(ip2_from, ip2_to, e.cidr[1]);
279 285
280 if (retried) 286 if (retried)
281 ip = ntohl(h->next.ip[0]); 287 ip = ntohl(h->next.ip[0]);
@@ -458,7 +464,10 @@ hash_netportnet6_uadt(struct ip_set *set, struct nlattr *tb[],
458 !ip_set_optattr_netorder(tb, IPSET_ATTR_TIMEOUT) || 464 !ip_set_optattr_netorder(tb, IPSET_ATTR_TIMEOUT) ||
459 !ip_set_optattr_netorder(tb, IPSET_ATTR_CADT_FLAGS) || 465 !ip_set_optattr_netorder(tb, IPSET_ATTR_CADT_FLAGS) ||
460 !ip_set_optattr_netorder(tb, IPSET_ATTR_PACKETS) || 466 !ip_set_optattr_netorder(tb, IPSET_ATTR_PACKETS) ||
461 !ip_set_optattr_netorder(tb, IPSET_ATTR_BYTES))) 467 !ip_set_optattr_netorder(tb, IPSET_ATTR_BYTES) ||
468 !ip_set_optattr_netorder(tb, IPSET_ATTR_SKBMARK) ||
469 !ip_set_optattr_netorder(tb, IPSET_ATTR_SKBPRIO) ||
470 !ip_set_optattr_netorder(tb, IPSET_ATTR_SKBQUEUE)))
462 return -IPSET_ERR_PROTOCOL; 471 return -IPSET_ERR_PROTOCOL;
463 if (unlikely(tb[IPSET_ATTR_IP_TO] || tb[IPSET_ATTR_IP2_TO])) 472 if (unlikely(tb[IPSET_ATTR_IP_TO] || tb[IPSET_ATTR_IP2_TO]))
464 return -IPSET_ERR_HASH_RANGE_UNSUPPORTED; 473 return -IPSET_ERR_HASH_RANGE_UNSUPPORTED;
@@ -567,6 +576,9 @@ static struct ip_set_type hash_netportnet_type __read_mostly = {
567 [IPSET_ATTR_BYTES] = { .type = NLA_U64 }, 576 [IPSET_ATTR_BYTES] = { .type = NLA_U64 },
568 [IPSET_ATTR_PACKETS] = { .type = NLA_U64 }, 577 [IPSET_ATTR_PACKETS] = { .type = NLA_U64 },
569 [IPSET_ATTR_COMMENT] = { .type = NLA_NUL_STRING }, 578 [IPSET_ATTR_COMMENT] = { .type = NLA_NUL_STRING },
579 [IPSET_ATTR_SKBMARK] = { .type = NLA_U64 },
580 [IPSET_ATTR_SKBPRIO] = { .type = NLA_U32 },
581 [IPSET_ATTR_SKBQUEUE] = { .type = NLA_U16 },
570 }, 582 },
571 .me = THIS_MODULE, 583 .me = THIS_MODULE,
572}; 584};
diff --git a/net/netfilter/ipset/ip_set_list_set.c b/net/netfilter/ipset/ip_set_list_set.c
index 3e2317f3cf68..f8f682806e36 100644
--- a/net/netfilter/ipset/ip_set_list_set.c
+++ b/net/netfilter/ipset/ip_set_list_set.c
@@ -17,7 +17,8 @@
17 17
18#define IPSET_TYPE_REV_MIN 0 18#define IPSET_TYPE_REV_MIN 0
19/* 1 Counters support added */ 19/* 1 Counters support added */
20#define IPSET_TYPE_REV_MAX 2 /* Comments support added */ 20/* 2 Comments support added */
21#define IPSET_TYPE_REV_MAX 3 /* skbinfo support added */
21 22
22MODULE_LICENSE("GPL"); 23MODULE_LICENSE("GPL");
23MODULE_AUTHOR("Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>"); 24MODULE_AUTHOR("Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>");
@@ -73,6 +74,10 @@ list_set_ktest(struct ip_set *set, const struct sk_buff *skb,
73 ip_set_update_counter(ext_counter(e, set), 74 ip_set_update_counter(ext_counter(e, set),
74 ext, &opt->ext, 75 ext, &opt->ext,
75 cmdflags); 76 cmdflags);
77 if (SET_WITH_SKBINFO(set))
78 ip_set_get_skbinfo(ext_skbinfo(e, set),
79 ext, &opt->ext,
80 cmdflags);
76 return ret; 81 return ret;
77 } 82 }
78 } 83 }
@@ -197,6 +202,8 @@ list_set_add(struct ip_set *set, u32 i, struct set_adt_elem *d,
197 ip_set_init_counter(ext_counter(e, set), ext); 202 ip_set_init_counter(ext_counter(e, set), ext);
198 if (SET_WITH_COMMENT(set)) 203 if (SET_WITH_COMMENT(set))
199 ip_set_init_comment(ext_comment(e, set), ext); 204 ip_set_init_comment(ext_comment(e, set), ext);
205 if (SET_WITH_SKBINFO(set))
206 ip_set_init_skbinfo(ext_skbinfo(e, set), ext);
200 return 0; 207 return 0;
201} 208}
202 209
@@ -307,6 +314,8 @@ list_set_uadd(struct ip_set *set, void *value, const struct ip_set_ext *ext,
307 ip_set_init_counter(ext_counter(e, set), ext); 314 ip_set_init_counter(ext_counter(e, set), ext);
308 if (SET_WITH_COMMENT(set)) 315 if (SET_WITH_COMMENT(set))
309 ip_set_init_comment(ext_comment(e, set), ext); 316 ip_set_init_comment(ext_comment(e, set), ext);
317 if (SET_WITH_SKBINFO(set))
318 ip_set_init_skbinfo(ext_skbinfo(e, set), ext);
310 /* Set is already added to the list */ 319 /* Set is already added to the list */
311 ip_set_put_byindex(map->net, d->id); 320 ip_set_put_byindex(map->net, d->id);
312 return 0; 321 return 0;
@@ -378,7 +387,10 @@ list_set_uadt(struct ip_set *set, struct nlattr *tb[],
378 !ip_set_optattr_netorder(tb, IPSET_ATTR_TIMEOUT) || 387 !ip_set_optattr_netorder(tb, IPSET_ATTR_TIMEOUT) ||
379 !ip_set_optattr_netorder(tb, IPSET_ATTR_CADT_FLAGS) || 388 !ip_set_optattr_netorder(tb, IPSET_ATTR_CADT_FLAGS) ||
380 !ip_set_optattr_netorder(tb, IPSET_ATTR_PACKETS) || 389 !ip_set_optattr_netorder(tb, IPSET_ATTR_PACKETS) ||
381 !ip_set_optattr_netorder(tb, IPSET_ATTR_BYTES))) 390 !ip_set_optattr_netorder(tb, IPSET_ATTR_BYTES) ||
391 !ip_set_optattr_netorder(tb, IPSET_ATTR_SKBMARK) ||
392 !ip_set_optattr_netorder(tb, IPSET_ATTR_SKBPRIO) ||
393 !ip_set_optattr_netorder(tb, IPSET_ATTR_SKBQUEUE)))
382 return -IPSET_ERR_PROTOCOL; 394 return -IPSET_ERR_PROTOCOL;
383 395
384 if (tb[IPSET_ATTR_LINENO]) 396 if (tb[IPSET_ATTR_LINENO])
@@ -597,7 +609,9 @@ init_list_set(struct net *net, struct ip_set *set, u32 size)
597 struct set_elem *e; 609 struct set_elem *e;
598 u32 i; 610 u32 i;
599 611
600 map = kzalloc(sizeof(*map) + size * set->dsize, GFP_KERNEL); 612 map = kzalloc(sizeof(*map) +
613 min_t(u32, size, IP_SET_LIST_MAX_SIZE) * set->dsize,
614 GFP_KERNEL);
601 if (!map) 615 if (!map)
602 return false; 616 return false;
603 617
@@ -665,6 +679,9 @@ static struct ip_set_type list_set_type __read_mostly = {
665 [IPSET_ATTR_BYTES] = { .type = NLA_U64 }, 679 [IPSET_ATTR_BYTES] = { .type = NLA_U64 },
666 [IPSET_ATTR_PACKETS] = { .type = NLA_U64 }, 680 [IPSET_ATTR_PACKETS] = { .type = NLA_U64 },
667 [IPSET_ATTR_COMMENT] = { .type = NLA_NUL_STRING }, 681 [IPSET_ATTR_COMMENT] = { .type = NLA_NUL_STRING },
682 [IPSET_ATTR_SKBMARK] = { .type = NLA_U64 },
683 [IPSET_ATTR_SKBPRIO] = { .type = NLA_U32 },
684 [IPSET_ATTR_SKBQUEUE] = { .type = NLA_U16 },
668 }, 685 },
669 .me = THIS_MODULE, 686 .me = THIS_MODULE,
670}; 687};
diff --git a/net/netfilter/ipvs/Kconfig b/net/netfilter/ipvs/Kconfig
index 0c3b1670b0d1..3b6929dec748 100644
--- a/net/netfilter/ipvs/Kconfig
+++ b/net/netfilter/ipvs/Kconfig
@@ -152,6 +152,16 @@ config IP_VS_WLC
152 If you want to compile it in kernel, say Y. To compile it as a 152 If you want to compile it in kernel, say Y. To compile it as a
153 module, choose M here. If unsure, say N. 153 module, choose M here. If unsure, say N.
154 154
155config IP_VS_FO
156 tristate "weighted failover scheduling"
157 ---help---
158 The weighted failover scheduling algorithm directs network
159 connections to the server with the highest weight that is
160 currently available.
161
162 If you want to compile it in kernel, say Y. To compile it as a
163 module, choose M here. If unsure, say N.
164
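The ip_vs_fo scheduler itself is added elsewhere in this merge; going by the help text above, its core is simply "pick the available real server with the highest weight". A minimal sketch of such a selection loop, following the conventions of the other IPVS schedulers (illustrative only, not the literal ip_vs_fo.c):

/* Sketch: weighted failover selection, highest weight wins. */
static struct ip_vs_dest *
ip_vs_fo_schedule(struct ip_vs_service *svc, const struct sk_buff *skb,
		  struct ip_vs_iphdr *iph)
{
	struct ip_vs_dest *dest, *hweight = NULL;
	int hw = 0;	/* highest weight seen so far */

	/* Skip overloaded and zero-weight destinations; keep the heaviest. */
	list_for_each_entry_rcu(dest, &svc->destinations, n_list) {
		if (!(dest->flags & IP_VS_DEST_F_OVERLOAD) &&
		    atomic_read(&dest->weight) > hw) {
			hweight = dest;
			hw = atomic_read(&dest->weight);
		}
	}
	return hweight;	/* NULL means no destination is currently usable */
}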
155config IP_VS_LBLC 165config IP_VS_LBLC
156 tristate "locality-based least-connection scheduling" 166 tristate "locality-based least-connection scheduling"
157 ---help--- 167 ---help---
diff --git a/net/netfilter/ipvs/Makefile b/net/netfilter/ipvs/Makefile
index 34ee602ddb66..38b2723b2e3d 100644
--- a/net/netfilter/ipvs/Makefile
+++ b/net/netfilter/ipvs/Makefile
@@ -26,6 +26,7 @@ obj-$(CONFIG_IP_VS_RR) += ip_vs_rr.o
26obj-$(CONFIG_IP_VS_WRR) += ip_vs_wrr.o 26obj-$(CONFIG_IP_VS_WRR) += ip_vs_wrr.o
27obj-$(CONFIG_IP_VS_LC) += ip_vs_lc.o 27obj-$(CONFIG_IP_VS_LC) += ip_vs_lc.o
28obj-$(CONFIG_IP_VS_WLC) += ip_vs_wlc.o 28obj-$(CONFIG_IP_VS_WLC) += ip_vs_wlc.o
29obj-$(CONFIG_IP_VS_FO) += ip_vs_fo.o
29obj-$(CONFIG_IP_VS_LBLC) += ip_vs_lblc.o 30obj-$(CONFIG_IP_VS_LBLC) += ip_vs_lblc.o
30obj-$(CONFIG_IP_VS_LBLCR) += ip_vs_lblcr.o 31obj-$(CONFIG_IP_VS_LBLCR) += ip_vs_lblcr.o
31obj-$(CONFIG_IP_VS_DH) += ip_vs_dh.o 32obj-$(CONFIG_IP_VS_DH) += ip_vs_dh.o
diff --git a/net/netfilter/ipvs/ip_vs_conn.c b/net/netfilter/ipvs/ip_vs_conn.c
index 610e19c0e13f..b0f7b626b56d 100644
--- a/net/netfilter/ipvs/ip_vs_conn.c
+++ b/net/netfilter/ipvs/ip_vs_conn.c
@@ -27,6 +27,7 @@
27 27
28#include <linux/interrupt.h> 28#include <linux/interrupt.h>
29#include <linux/in.h> 29#include <linux/in.h>
30#include <linux/inet.h>
30#include <linux/net.h> 31#include <linux/net.h>
31#include <linux/kernel.h> 32#include <linux/kernel.h>
32#include <linux/module.h> 33#include <linux/module.h>
@@ -77,6 +78,13 @@ static unsigned int ip_vs_conn_rnd __read_mostly;
77#define CT_LOCKARRAY_SIZE (1<<CT_LOCKARRAY_BITS) 78#define CT_LOCKARRAY_SIZE (1<<CT_LOCKARRAY_BITS)
78#define CT_LOCKARRAY_MASK (CT_LOCKARRAY_SIZE-1) 79#define CT_LOCKARRAY_MASK (CT_LOCKARRAY_SIZE-1)
79 80
81/* We need an addrstrlen that works with or without v6 */
82#ifdef CONFIG_IP_VS_IPV6
83#define IP_VS_ADDRSTRLEN INET6_ADDRSTRLEN
84#else
85#define IP_VS_ADDRSTRLEN (8+1)
86#endif
87
80struct ip_vs_aligned_lock 88struct ip_vs_aligned_lock
81{ 89{
82 spinlock_t l; 90 spinlock_t l;
@@ -488,7 +496,12 @@ static inline void ip_vs_bind_xmit(struct ip_vs_conn *cp)
488 break; 496 break;
489 497
490 case IP_VS_CONN_F_TUNNEL: 498 case IP_VS_CONN_F_TUNNEL:
491 cp->packet_xmit = ip_vs_tunnel_xmit; 499#ifdef CONFIG_IP_VS_IPV6
500 if (cp->daf == AF_INET6)
501 cp->packet_xmit = ip_vs_tunnel_xmit_v6;
502 else
503#endif
504 cp->packet_xmit = ip_vs_tunnel_xmit;
492 break; 505 break;
493 506
494 case IP_VS_CONN_F_DROUTE: 507 case IP_VS_CONN_F_DROUTE:
@@ -514,7 +527,10 @@ static inline void ip_vs_bind_xmit_v6(struct ip_vs_conn *cp)
514 break; 527 break;
515 528
516 case IP_VS_CONN_F_TUNNEL: 529 case IP_VS_CONN_F_TUNNEL:
517 cp->packet_xmit = ip_vs_tunnel_xmit_v6; 530 if (cp->daf == AF_INET6)
531 cp->packet_xmit = ip_vs_tunnel_xmit_v6;
532 else
533 cp->packet_xmit = ip_vs_tunnel_xmit;
518 break; 534 break;
519 535
520 case IP_VS_CONN_F_DROUTE: 536 case IP_VS_CONN_F_DROUTE:
@@ -580,7 +596,7 @@ ip_vs_bind_dest(struct ip_vs_conn *cp, struct ip_vs_dest *dest)
580 ip_vs_proto_name(cp->protocol), 596 ip_vs_proto_name(cp->protocol),
581 IP_VS_DBG_ADDR(cp->af, &cp->caddr), ntohs(cp->cport), 597 IP_VS_DBG_ADDR(cp->af, &cp->caddr), ntohs(cp->cport),
582 IP_VS_DBG_ADDR(cp->af, &cp->vaddr), ntohs(cp->vport), 598 IP_VS_DBG_ADDR(cp->af, &cp->vaddr), ntohs(cp->vport),
583 IP_VS_DBG_ADDR(cp->af, &cp->daddr), ntohs(cp->dport), 599 IP_VS_DBG_ADDR(cp->daf, &cp->daddr), ntohs(cp->dport),
584 ip_vs_fwd_tag(cp), cp->state, 600 ip_vs_fwd_tag(cp), cp->state,
585 cp->flags, atomic_read(&cp->refcnt), 601 cp->flags, atomic_read(&cp->refcnt),
586 atomic_read(&dest->refcnt)); 602 atomic_read(&dest->refcnt));
@@ -616,7 +632,13 @@ void ip_vs_try_bind_dest(struct ip_vs_conn *cp)
616 struct ip_vs_dest *dest; 632 struct ip_vs_dest *dest;
617 633
618 rcu_read_lock(); 634 rcu_read_lock();
619 dest = ip_vs_find_dest(ip_vs_conn_net(cp), cp->af, &cp->daddr, 635
636 /* This function is only invoked by the synchronization code. We do
637 * not currently support heterogeneous pools with synchronization,
638 * so we can make the assumption that the svc_af is the same as the
639 * dest_af
640 */
641 dest = ip_vs_find_dest(ip_vs_conn_net(cp), cp->af, cp->af, &cp->daddr,
620 cp->dport, &cp->vaddr, cp->vport, 642 cp->dport, &cp->vaddr, cp->vport,
621 cp->protocol, cp->fwmark, cp->flags); 643 cp->protocol, cp->fwmark, cp->flags);
622 if (dest) { 644 if (dest) {
@@ -671,7 +693,7 @@ static inline void ip_vs_unbind_dest(struct ip_vs_conn *cp)
671 ip_vs_proto_name(cp->protocol), 693 ip_vs_proto_name(cp->protocol),
672 IP_VS_DBG_ADDR(cp->af, &cp->caddr), ntohs(cp->cport), 694 IP_VS_DBG_ADDR(cp->af, &cp->caddr), ntohs(cp->cport),
673 IP_VS_DBG_ADDR(cp->af, &cp->vaddr), ntohs(cp->vport), 695 IP_VS_DBG_ADDR(cp->af, &cp->vaddr), ntohs(cp->vport),
674 IP_VS_DBG_ADDR(cp->af, &cp->daddr), ntohs(cp->dport), 696 IP_VS_DBG_ADDR(cp->daf, &cp->daddr), ntohs(cp->dport),
675 ip_vs_fwd_tag(cp), cp->state, 697 ip_vs_fwd_tag(cp), cp->state,
676 cp->flags, atomic_read(&cp->refcnt), 698 cp->flags, atomic_read(&cp->refcnt),
677 atomic_read(&dest->refcnt)); 699 atomic_read(&dest->refcnt));
@@ -740,7 +762,7 @@ int ip_vs_check_template(struct ip_vs_conn *ct)
740 ntohs(ct->cport), 762 ntohs(ct->cport),
741 IP_VS_DBG_ADDR(ct->af, &ct->vaddr), 763 IP_VS_DBG_ADDR(ct->af, &ct->vaddr),
742 ntohs(ct->vport), 764 ntohs(ct->vport),
743 IP_VS_DBG_ADDR(ct->af, &ct->daddr), 765 IP_VS_DBG_ADDR(ct->daf, &ct->daddr),
744 ntohs(ct->dport)); 766 ntohs(ct->dport));
745 767
746 /* 768 /*
@@ -848,7 +870,7 @@ void ip_vs_conn_expire_now(struct ip_vs_conn *cp)
848 * Create a new connection entry and hash it into the ip_vs_conn_tab 870 * Create a new connection entry and hash it into the ip_vs_conn_tab
849 */ 871 */
850struct ip_vs_conn * 872struct ip_vs_conn *
851ip_vs_conn_new(const struct ip_vs_conn_param *p, 873ip_vs_conn_new(const struct ip_vs_conn_param *p, int dest_af,
852 const union nf_inet_addr *daddr, __be16 dport, unsigned int flags, 874 const union nf_inet_addr *daddr, __be16 dport, unsigned int flags,
853 struct ip_vs_dest *dest, __u32 fwmark) 875 struct ip_vs_dest *dest, __u32 fwmark)
854{ 876{
@@ -867,6 +889,7 @@ ip_vs_conn_new(const struct ip_vs_conn_param *p,
867 setup_timer(&cp->timer, ip_vs_conn_expire, (unsigned long)cp); 889 setup_timer(&cp->timer, ip_vs_conn_expire, (unsigned long)cp);
868 ip_vs_conn_net_set(cp, p->net); 890 ip_vs_conn_net_set(cp, p->net);
869 cp->af = p->af; 891 cp->af = p->af;
892 cp->daf = dest_af;
870 cp->protocol = p->protocol; 893 cp->protocol = p->protocol;
871 ip_vs_addr_set(p->af, &cp->caddr, p->caddr); 894 ip_vs_addr_set(p->af, &cp->caddr, p->caddr);
872 cp->cport = p->cport; 895 cp->cport = p->cport;
@@ -874,7 +897,7 @@ ip_vs_conn_new(const struct ip_vs_conn_param *p,
874 ip_vs_addr_set(p->protocol == IPPROTO_IP ? AF_UNSPEC : p->af, 897 ip_vs_addr_set(p->protocol == IPPROTO_IP ? AF_UNSPEC : p->af,
875 &cp->vaddr, p->vaddr); 898 &cp->vaddr, p->vaddr);
876 cp->vport = p->vport; 899 cp->vport = p->vport;
877 ip_vs_addr_set(p->af, &cp->daddr, daddr); 900 ip_vs_addr_set(cp->daf, &cp->daddr, daddr);
878 cp->dport = dport; 901 cp->dport = dport;
879 cp->flags = flags; 902 cp->flags = flags;
880 cp->fwmark = fwmark; 903 cp->fwmark = fwmark;
@@ -1036,6 +1059,7 @@ static int ip_vs_conn_seq_show(struct seq_file *seq, void *v)
1036 struct net *net = seq_file_net(seq); 1059 struct net *net = seq_file_net(seq);
1037 char pe_data[IP_VS_PENAME_MAXLEN + IP_VS_PEDATA_MAXLEN + 3]; 1060 char pe_data[IP_VS_PENAME_MAXLEN + IP_VS_PEDATA_MAXLEN + 3];
1038 size_t len = 0; 1061 size_t len = 0;
1062 char dbuf[IP_VS_ADDRSTRLEN];
1039 1063
1040 if (!ip_vs_conn_net_eq(cp, net)) 1064 if (!ip_vs_conn_net_eq(cp, net))
1041 return 0; 1065 return 0;
@@ -1050,24 +1074,32 @@ static int ip_vs_conn_seq_show(struct seq_file *seq, void *v)
1050 pe_data[len] = '\0'; 1074 pe_data[len] = '\0';
1051 1075
1052#ifdef CONFIG_IP_VS_IPV6 1076#ifdef CONFIG_IP_VS_IPV6
1077 if (cp->daf == AF_INET6)
1078 snprintf(dbuf, sizeof(dbuf), "%pI6", &cp->daddr.in6);
1079 else
1080#endif
1081 snprintf(dbuf, sizeof(dbuf), "%08X",
1082 ntohl(cp->daddr.ip));
1083
1084#ifdef CONFIG_IP_VS_IPV6
1053 if (cp->af == AF_INET6) 1085 if (cp->af == AF_INET6)
1054 seq_printf(seq, "%-3s %pI6 %04X %pI6 %04X " 1086 seq_printf(seq, "%-3s %pI6 %04X %pI6 %04X "
1055 "%pI6 %04X %-11s %7lu%s\n", 1087 "%s %04X %-11s %7lu%s\n",
1056 ip_vs_proto_name(cp->protocol), 1088 ip_vs_proto_name(cp->protocol),
1057 &cp->caddr.in6, ntohs(cp->cport), 1089 &cp->caddr.in6, ntohs(cp->cport),
1058 &cp->vaddr.in6, ntohs(cp->vport), 1090 &cp->vaddr.in6, ntohs(cp->vport),
1059 &cp->daddr.in6, ntohs(cp->dport), 1091 dbuf, ntohs(cp->dport),
1060 ip_vs_state_name(cp->protocol, cp->state), 1092 ip_vs_state_name(cp->protocol, cp->state),
1061 (cp->timer.expires-jiffies)/HZ, pe_data); 1093 (cp->timer.expires-jiffies)/HZ, pe_data);
1062 else 1094 else
1063#endif 1095#endif
1064 seq_printf(seq, 1096 seq_printf(seq,
1065 "%-3s %08X %04X %08X %04X" 1097 "%-3s %08X %04X %08X %04X"
1066 " %08X %04X %-11s %7lu%s\n", 1098 " %s %04X %-11s %7lu%s\n",
1067 ip_vs_proto_name(cp->protocol), 1099 ip_vs_proto_name(cp->protocol),
1068 ntohl(cp->caddr.ip), ntohs(cp->cport), 1100 ntohl(cp->caddr.ip), ntohs(cp->cport),
1069 ntohl(cp->vaddr.ip), ntohs(cp->vport), 1101 ntohl(cp->vaddr.ip), ntohs(cp->vport),
1070 ntohl(cp->daddr.ip), ntohs(cp->dport), 1102 dbuf, ntohs(cp->dport),
1071 ip_vs_state_name(cp->protocol, cp->state), 1103 ip_vs_state_name(cp->protocol, cp->state),
1072 (cp->timer.expires-jiffies)/HZ, pe_data); 1104 (cp->timer.expires-jiffies)/HZ, pe_data);
1073 } 1105 }
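
The two seq_show hunks above pre-format the destination address into a local dbuf so a single printf format works whether the real server is IPv4 or IPv6. A minimal userspace sketch of the same pattern follows; it uses inet_ntop() in place of the kernel's %pI6, keeps the IPVS convention of raw hexadecimal for IPv4, and the helper name and buffer macro are illustrative only.

    #include <arpa/inet.h>
    #include <netinet/in.h>
    #include <stdio.h>

    /* Big enough for an IPv6 text address or an 8-digit hex IPv4 value,
     * mirroring the IP_VS_ADDRSTRLEN definition added earlier in this file. */
    #define ADDRBUF_LEN (INET6_ADDRSTRLEN > 9 ? INET6_ADDRSTRLEN : 9)

    /* Format a destination the way the IPVS /proc output does:
     * raw hex for IPv4, textual for IPv6 (kernel side uses %pI6). */
    static void format_dest(int af, const void *addr, char *buf, size_t len)
    {
        if (af == AF_INET6)
            inet_ntop(AF_INET6, addr, buf, len);
        else
            snprintf(buf, len, "%08X",
                     ntohl(((const struct in_addr *)addr)->s_addr));
    }

    int main(void)
    {
        struct in_addr v4;
        struct in6_addr v6;
        char buf[ADDRBUF_LEN];

        inet_pton(AF_INET, "192.0.2.10", &v4);
        format_dest(AF_INET, &v4, buf, sizeof(buf));
        printf("v4 dest: %s\n", buf);   /* C000020A */

        inet_pton(AF_INET6, "2001:db8::1", &v6);
        format_dest(AF_INET6, &v6, buf, sizeof(buf));
        printf("v6 dest: %s\n", buf);   /* 2001:db8::1 */
        return 0;
    }

Sizing the buffer for the larger of the two representations is the whole point of the IP_VS_ADDRSTRLEN helper: the connection's own family (cp->daf) decides at runtime which form lands in the buffer.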
@@ -1105,6 +1137,7 @@ static const char *ip_vs_origin_name(unsigned int flags)
1105 1137
1106static int ip_vs_conn_sync_seq_show(struct seq_file *seq, void *v) 1138static int ip_vs_conn_sync_seq_show(struct seq_file *seq, void *v)
1107{ 1139{
1140 char dbuf[IP_VS_ADDRSTRLEN];
1108 1141
1109 if (v == SEQ_START_TOKEN) 1142 if (v == SEQ_START_TOKEN)
1110 seq_puts(seq, 1143 seq_puts(seq,
@@ -1117,12 +1150,21 @@ static int ip_vs_conn_sync_seq_show(struct seq_file *seq, void *v)
1117 return 0; 1150 return 0;
1118 1151
1119#ifdef CONFIG_IP_VS_IPV6 1152#ifdef CONFIG_IP_VS_IPV6
1153 if (cp->daf == AF_INET6)
1154 snprintf(dbuf, sizeof(dbuf), "%pI6", &cp->daddr.in6);
1155 else
1156#endif
1157 snprintf(dbuf, sizeof(dbuf), "%08X",
1158 ntohl(cp->daddr.ip));
1159
1160#ifdef CONFIG_IP_VS_IPV6
1120 if (cp->af == AF_INET6) 1161 if (cp->af == AF_INET6)
1121 seq_printf(seq, "%-3s %pI6 %04X %pI6 %04X %pI6 %04X %-11s %-6s %7lu\n", 1162 seq_printf(seq, "%-3s %pI6 %04X %pI6 %04X "
1163 "%s %04X %-11s %-6s %7lu\n",
1122 ip_vs_proto_name(cp->protocol), 1164 ip_vs_proto_name(cp->protocol),
1123 &cp->caddr.in6, ntohs(cp->cport), 1165 &cp->caddr.in6, ntohs(cp->cport),
1124 &cp->vaddr.in6, ntohs(cp->vport), 1166 &cp->vaddr.in6, ntohs(cp->vport),
1125 &cp->daddr.in6, ntohs(cp->dport), 1167 dbuf, ntohs(cp->dport),
1126 ip_vs_state_name(cp->protocol, cp->state), 1168 ip_vs_state_name(cp->protocol, cp->state),
1127 ip_vs_origin_name(cp->flags), 1169 ip_vs_origin_name(cp->flags),
1128 (cp->timer.expires-jiffies)/HZ); 1170 (cp->timer.expires-jiffies)/HZ);
@@ -1130,11 +1172,11 @@ static int ip_vs_conn_sync_seq_show(struct seq_file *seq, void *v)
1130#endif 1172#endif
1131 seq_printf(seq, 1173 seq_printf(seq,
1132 "%-3s %08X %04X %08X %04X " 1174 "%-3s %08X %04X %08X %04X "
1133 "%08X %04X %-11s %-6s %7lu\n", 1175 "%s %04X %-11s %-6s %7lu\n",
1134 ip_vs_proto_name(cp->protocol), 1176 ip_vs_proto_name(cp->protocol),
1135 ntohl(cp->caddr.ip), ntohs(cp->cport), 1177 ntohl(cp->caddr.ip), ntohs(cp->cport),
1136 ntohl(cp->vaddr.ip), ntohs(cp->vport), 1178 ntohl(cp->vaddr.ip), ntohs(cp->vport),
1137 ntohl(cp->daddr.ip), ntohs(cp->dport), 1179 dbuf, ntohs(cp->dport),
1138 ip_vs_state_name(cp->protocol, cp->state), 1180 ip_vs_state_name(cp->protocol, cp->state),
1139 ip_vs_origin_name(cp->flags), 1181 ip_vs_origin_name(cp->flags),
1140 (cp->timer.expires-jiffies)/HZ); 1182 (cp->timer.expires-jiffies)/HZ);
diff --git a/net/netfilter/ipvs/ip_vs_core.c b/net/netfilter/ipvs/ip_vs_core.c
index 5c34e8d42e01..990decba1fe4 100644
--- a/net/netfilter/ipvs/ip_vs_core.c
+++ b/net/netfilter/ipvs/ip_vs_core.c
@@ -328,7 +328,7 @@ ip_vs_sched_persist(struct ip_vs_service *svc,
328 * This adds param.pe_data to the template, 328 * This adds param.pe_data to the template,
329 * and thus param.pe_data will be destroyed 329 * and thus param.pe_data will be destroyed
330 * when the template expires */ 330 * when the template expires */
331 ct = ip_vs_conn_new(&param, &dest->addr, dport, 331 ct = ip_vs_conn_new(&param, dest->af, &dest->addr, dport,
332 IP_VS_CONN_F_TEMPLATE, dest, skb->mark); 332 IP_VS_CONN_F_TEMPLATE, dest, skb->mark);
333 if (ct == NULL) { 333 if (ct == NULL) {
334 kfree(param.pe_data); 334 kfree(param.pe_data);
@@ -357,7 +357,8 @@ ip_vs_sched_persist(struct ip_vs_service *svc,
357 ip_vs_conn_fill_param(svc->net, svc->af, iph->protocol, &iph->saddr, 357 ip_vs_conn_fill_param(svc->net, svc->af, iph->protocol, &iph->saddr,
358 src_port, &iph->daddr, dst_port, &param); 358 src_port, &iph->daddr, dst_port, &param);
359 359
360 cp = ip_vs_conn_new(&param, &dest->addr, dport, flags, dest, skb->mark); 360 cp = ip_vs_conn_new(&param, dest->af, &dest->addr, dport, flags, dest,
361 skb->mark);
361 if (cp == NULL) { 362 if (cp == NULL) {
362 ip_vs_conn_put(ct); 363 ip_vs_conn_put(ct);
363 *ignored = -1; 364 *ignored = -1;
@@ -479,7 +480,7 @@ ip_vs_schedule(struct ip_vs_service *svc, struct sk_buff *skb,
479 ip_vs_conn_fill_param(svc->net, svc->af, iph->protocol, 480 ip_vs_conn_fill_param(svc->net, svc->af, iph->protocol,
480 &iph->saddr, pptr[0], &iph->daddr, 481 &iph->saddr, pptr[0], &iph->daddr,
481 pptr[1], &p); 482 pptr[1], &p);
482 cp = ip_vs_conn_new(&p, &dest->addr, 483 cp = ip_vs_conn_new(&p, dest->af, &dest->addr,
483 dest->port ? dest->port : pptr[1], 484 dest->port ? dest->port : pptr[1],
484 flags, dest, skb->mark); 485 flags, dest, skb->mark);
485 if (!cp) { 486 if (!cp) {
@@ -491,9 +492,9 @@ ip_vs_schedule(struct ip_vs_service *svc, struct sk_buff *skb,
491 IP_VS_DBG_BUF(6, "Schedule fwd:%c c:%s:%u v:%s:%u " 492 IP_VS_DBG_BUF(6, "Schedule fwd:%c c:%s:%u v:%s:%u "
492 "d:%s:%u conn->flags:%X conn->refcnt:%d\n", 493 "d:%s:%u conn->flags:%X conn->refcnt:%d\n",
493 ip_vs_fwd_tag(cp), 494 ip_vs_fwd_tag(cp),
494 IP_VS_DBG_ADDR(svc->af, &cp->caddr), ntohs(cp->cport), 495 IP_VS_DBG_ADDR(cp->af, &cp->caddr), ntohs(cp->cport),
495 IP_VS_DBG_ADDR(svc->af, &cp->vaddr), ntohs(cp->vport), 496 IP_VS_DBG_ADDR(cp->af, &cp->vaddr), ntohs(cp->vport),
496 IP_VS_DBG_ADDR(svc->af, &cp->daddr), ntohs(cp->dport), 497 IP_VS_DBG_ADDR(cp->daf, &cp->daddr), ntohs(cp->dport),
497 cp->flags, atomic_read(&cp->refcnt)); 498 cp->flags, atomic_read(&cp->refcnt));
498 499
499 ip_vs_conn_stats(cp, svc); 500 ip_vs_conn_stats(cp, svc);
@@ -550,7 +551,7 @@ int ip_vs_leave(struct ip_vs_service *svc, struct sk_buff *skb,
550 ip_vs_conn_fill_param(svc->net, svc->af, iph->protocol, 551 ip_vs_conn_fill_param(svc->net, svc->af, iph->protocol,
551 &iph->saddr, pptr[0], 552 &iph->saddr, pptr[0],
552 &iph->daddr, pptr[1], &p); 553 &iph->daddr, pptr[1], &p);
553 cp = ip_vs_conn_new(&p, &daddr, 0, 554 cp = ip_vs_conn_new(&p, svc->af, &daddr, 0,
554 IP_VS_CONN_F_BYPASS | flags, 555 IP_VS_CONN_F_BYPASS | flags,
555 NULL, skb->mark); 556 NULL, skb->mark);
556 if (!cp) 557 if (!cp)
diff --git a/net/netfilter/ipvs/ip_vs_ctl.c b/net/netfilter/ipvs/ip_vs_ctl.c
index fd3f444a4f96..ac7ba689efe7 100644
--- a/net/netfilter/ipvs/ip_vs_ctl.c
+++ b/net/netfilter/ipvs/ip_vs_ctl.c
@@ -574,8 +574,8 @@ bool ip_vs_has_real_service(struct net *net, int af, __u16 protocol,
574 * Called under RCU lock. 574 * Called under RCU lock.
575 */ 575 */
576static struct ip_vs_dest * 576static struct ip_vs_dest *
577ip_vs_lookup_dest(struct ip_vs_service *svc, const union nf_inet_addr *daddr, 577ip_vs_lookup_dest(struct ip_vs_service *svc, int dest_af,
578 __be16 dport) 578 const union nf_inet_addr *daddr, __be16 dport)
579{ 579{
580 struct ip_vs_dest *dest; 580 struct ip_vs_dest *dest;
581 581
@@ -583,9 +583,9 @@ ip_vs_lookup_dest(struct ip_vs_service *svc, const union nf_inet_addr *daddr,
583 * Find the destination for the given service 583 * Find the destination for the given service
584 */ 584 */
585 list_for_each_entry_rcu(dest, &svc->destinations, n_list) { 585 list_for_each_entry_rcu(dest, &svc->destinations, n_list) {
586 if ((dest->af == svc->af) 586 if ((dest->af == dest_af) &&
587 && ip_vs_addr_equal(svc->af, &dest->addr, daddr) 587 ip_vs_addr_equal(dest_af, &dest->addr, daddr) &&
588 && (dest->port == dport)) { 588 (dest->port == dport)) {
589 /* HIT */ 589 /* HIT */
590 return dest; 590 return dest;
591 } 591 }
@@ -602,7 +602,7 @@ ip_vs_lookup_dest(struct ip_vs_service *svc, const union nf_inet_addr *daddr,
602 * on the backup. 602 * on the backup.
603 * Called under RCU lock, no refcnt is returned. 603 * Called under RCU lock, no refcnt is returned.
604 */ 604 */
605struct ip_vs_dest *ip_vs_find_dest(struct net *net, int af, 605struct ip_vs_dest *ip_vs_find_dest(struct net *net, int svc_af, int dest_af,
606 const union nf_inet_addr *daddr, 606 const union nf_inet_addr *daddr,
607 __be16 dport, 607 __be16 dport,
608 const union nf_inet_addr *vaddr, 608 const union nf_inet_addr *vaddr,
@@ -613,14 +613,14 @@ struct ip_vs_dest *ip_vs_find_dest(struct net *net, int af,
613 struct ip_vs_service *svc; 613 struct ip_vs_service *svc;
614 __be16 port = dport; 614 __be16 port = dport;
615 615
616 svc = ip_vs_service_find(net, af, fwmark, protocol, vaddr, vport); 616 svc = ip_vs_service_find(net, svc_af, fwmark, protocol, vaddr, vport);
617 if (!svc) 617 if (!svc)
618 return NULL; 618 return NULL;
619 if (fwmark && (flags & IP_VS_CONN_F_FWD_MASK) != IP_VS_CONN_F_MASQ) 619 if (fwmark && (flags & IP_VS_CONN_F_FWD_MASK) != IP_VS_CONN_F_MASQ)
620 port = 0; 620 port = 0;
621 dest = ip_vs_lookup_dest(svc, daddr, port); 621 dest = ip_vs_lookup_dest(svc, dest_af, daddr, port);
622 if (!dest) 622 if (!dest)
623 dest = ip_vs_lookup_dest(svc, daddr, port ^ dport); 623 dest = ip_vs_lookup_dest(svc, dest_af, daddr, port ^ dport);
624 return dest; 624 return dest;
625} 625}
626 626
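
With heterogeneous pools, the virtual service and a real server may live in different address families, so ip_vs_lookup_dest() and ip_vs_find_dest() now take the destination family separately from the service family. A simplified, compilable userspace sketch of that kind of family-aware match over a destination list is below; the struct layout, names, and sample addresses are illustrative, not the kernel's.

    #include <arpa/inet.h>
    #include <netinet/in.h>
    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    union addr {
        struct in_addr  in;
        struct in6_addr in6;
    };

    struct dest {
        int          af;     /* address family of this real server */
        union addr   addr;
        uint16_t     port;   /* network byte order */
        struct dest *next;
    };

    static bool addr_equal(int af, const union addr *a, const union addr *b)
    {
        if (af == AF_INET6)
            return memcmp(&a->in6, &b->in6, sizeof(a->in6)) == 0;
        return a->in.s_addr == b->in.s_addr;
    }

    /* Match purely on the destination's own (family, address, port); the
     * service's family never enters the comparison, which is what lets an
     * IPv4 virtual service be backed by IPv6 real servers and vice versa. */
    static struct dest *lookup_dest(struct dest *head, int dest_af,
                                    const union addr *daddr, uint16_t dport)
    {
        for (struct dest *d = head; d; d = d->next)
            if (d->af == dest_af &&
                addr_equal(dest_af, &d->addr, daddr) &&
                d->port == dport)
                return d;
        return NULL;
    }

    int main(void)
    {
        struct dest d6 = { .af = AF_INET6, .port = htons(8080) };
        struct dest d4 = { .af = AF_INET,  .port = htons(8080), .next = &d6 };
        union addr key;

        inet_pton(AF_INET6, "2001:db8::5", &d6.addr.in6);
        inet_pton(AF_INET,  "198.51.100.7", &d4.addr.in);

        inet_pton(AF_INET6, "2001:db8::5", &key.in6);
        printf("found v6 dest: %s\n",
               lookup_dest(&d4, AF_INET6, &key, htons(8080)) ? "yes" : "no");
        return 0;
    }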
@@ -657,8 +657,8 @@ static void __ip_vs_dst_cache_reset(struct ip_vs_dest *dest)
657 * scheduling. 657 * scheduling.
658 */ 658 */
659static struct ip_vs_dest * 659static struct ip_vs_dest *
660ip_vs_trash_get_dest(struct ip_vs_service *svc, const union nf_inet_addr *daddr, 660ip_vs_trash_get_dest(struct ip_vs_service *svc, int dest_af,
661 __be16 dport) 661 const union nf_inet_addr *daddr, __be16 dport)
662{ 662{
663 struct ip_vs_dest *dest; 663 struct ip_vs_dest *dest;
664 struct netns_ipvs *ipvs = net_ipvs(svc->net); 664 struct netns_ipvs *ipvs = net_ipvs(svc->net);
@@ -671,11 +671,11 @@ ip_vs_trash_get_dest(struct ip_vs_service *svc, const union nf_inet_addr *daddr,
671 IP_VS_DBG_BUF(3, "Destination %u/%s:%u still in trash, " 671 IP_VS_DBG_BUF(3, "Destination %u/%s:%u still in trash, "
672 "dest->refcnt=%d\n", 672 "dest->refcnt=%d\n",
673 dest->vfwmark, 673 dest->vfwmark,
674 IP_VS_DBG_ADDR(svc->af, &dest->addr), 674 IP_VS_DBG_ADDR(dest->af, &dest->addr),
675 ntohs(dest->port), 675 ntohs(dest->port),
676 atomic_read(&dest->refcnt)); 676 atomic_read(&dest->refcnt));
677 if (dest->af == svc->af && 677 if (dest->af == dest_af &&
678 ip_vs_addr_equal(svc->af, &dest->addr, daddr) && 678 ip_vs_addr_equal(dest_af, &dest->addr, daddr) &&
679 dest->port == dport && 679 dest->port == dport &&
680 dest->vfwmark == svc->fwmark && 680 dest->vfwmark == svc->fwmark &&
681 dest->protocol == svc->protocol && 681 dest->protocol == svc->protocol &&
@@ -779,6 +779,12 @@ __ip_vs_update_dest(struct ip_vs_service *svc, struct ip_vs_dest *dest,
779 struct ip_vs_scheduler *sched; 779 struct ip_vs_scheduler *sched;
780 int conn_flags; 780 int conn_flags;
781 781
782 /* We cannot modify an address and change the address family */
783 BUG_ON(!add && udest->af != dest->af);
784
785 if (add && udest->af != svc->af)
786 ipvs->mixed_address_family_dests++;
787
782 /* set the weight and the flags */ 788 /* set the weight and the flags */
783 atomic_set(&dest->weight, udest->weight); 789 atomic_set(&dest->weight, udest->weight);
784 conn_flags = udest->conn_flags & IP_VS_CONN_F_DEST_MASK; 790 conn_flags = udest->conn_flags & IP_VS_CONN_F_DEST_MASK;
@@ -816,6 +822,8 @@ __ip_vs_update_dest(struct ip_vs_service *svc, struct ip_vs_dest *dest,
816 dest->u_threshold = udest->u_threshold; 822 dest->u_threshold = udest->u_threshold;
817 dest->l_threshold = udest->l_threshold; 823 dest->l_threshold = udest->l_threshold;
818 824
825 dest->af = udest->af;
826
819 spin_lock_bh(&dest->dst_lock); 827 spin_lock_bh(&dest->dst_lock);
820 __ip_vs_dst_cache_reset(dest); 828 __ip_vs_dst_cache_reset(dest);
821 spin_unlock_bh(&dest->dst_lock); 829 spin_unlock_bh(&dest->dst_lock);
@@ -847,7 +855,7 @@ ip_vs_new_dest(struct ip_vs_service *svc, struct ip_vs_dest_user_kern *udest,
847 EnterFunction(2); 855 EnterFunction(2);
848 856
849#ifdef CONFIG_IP_VS_IPV6 857#ifdef CONFIG_IP_VS_IPV6
850 if (svc->af == AF_INET6) { 858 if (udest->af == AF_INET6) {
851 atype = ipv6_addr_type(&udest->addr.in6); 859 atype = ipv6_addr_type(&udest->addr.in6);
852 if ((!(atype & IPV6_ADDR_UNICAST) || 860 if ((!(atype & IPV6_ADDR_UNICAST) ||
853 atype & IPV6_ADDR_LINKLOCAL) && 861 atype & IPV6_ADDR_LINKLOCAL) &&
@@ -875,12 +883,12 @@ ip_vs_new_dest(struct ip_vs_service *svc, struct ip_vs_dest_user_kern *udest,
875 u64_stats_init(&ip_vs_dest_stats->syncp); 883 u64_stats_init(&ip_vs_dest_stats->syncp);
876 } 884 }
877 885
878 dest->af = svc->af; 886 dest->af = udest->af;
879 dest->protocol = svc->protocol; 887 dest->protocol = svc->protocol;
880 dest->vaddr = svc->addr; 888 dest->vaddr = svc->addr;
881 dest->vport = svc->port; 889 dest->vport = svc->port;
882 dest->vfwmark = svc->fwmark; 890 dest->vfwmark = svc->fwmark;
883 ip_vs_addr_copy(svc->af, &dest->addr, &udest->addr); 891 ip_vs_addr_copy(udest->af, &dest->addr, &udest->addr);
884 dest->port = udest->port; 892 dest->port = udest->port;
885 893
886 atomic_set(&dest->activeconns, 0); 894 atomic_set(&dest->activeconns, 0);
@@ -928,11 +936,11 @@ ip_vs_add_dest(struct ip_vs_service *svc, struct ip_vs_dest_user_kern *udest)
928 return -ERANGE; 936 return -ERANGE;
929 } 937 }
930 938
931 ip_vs_addr_copy(svc->af, &daddr, &udest->addr); 939 ip_vs_addr_copy(udest->af, &daddr, &udest->addr);
932 940
933 /* We use function that requires RCU lock */ 941 /* We use function that requires RCU lock */
934 rcu_read_lock(); 942 rcu_read_lock();
935 dest = ip_vs_lookup_dest(svc, &daddr, dport); 943 dest = ip_vs_lookup_dest(svc, udest->af, &daddr, dport);
936 rcu_read_unlock(); 944 rcu_read_unlock();
937 945
938 if (dest != NULL) { 946 if (dest != NULL) {
@@ -944,12 +952,12 @@ ip_vs_add_dest(struct ip_vs_service *svc, struct ip_vs_dest_user_kern *udest)
944 * Check if the dest already exists in the trash and 952 * Check if the dest already exists in the trash and
945 * is from the same service 953 * is from the same service
946 */ 954 */
947 dest = ip_vs_trash_get_dest(svc, &daddr, dport); 955 dest = ip_vs_trash_get_dest(svc, udest->af, &daddr, dport);
948 956
949 if (dest != NULL) { 957 if (dest != NULL) {
950 IP_VS_DBG_BUF(3, "Get destination %s:%u from trash, " 958 IP_VS_DBG_BUF(3, "Get destination %s:%u from trash, "
951 "dest->refcnt=%d, service %u/%s:%u\n", 959 "dest->refcnt=%d, service %u/%s:%u\n",
952 IP_VS_DBG_ADDR(svc->af, &daddr), ntohs(dport), 960 IP_VS_DBG_ADDR(udest->af, &daddr), ntohs(dport),
953 atomic_read(&dest->refcnt), 961 atomic_read(&dest->refcnt),
954 dest->vfwmark, 962 dest->vfwmark,
955 IP_VS_DBG_ADDR(svc->af, &dest->vaddr), 963 IP_VS_DBG_ADDR(svc->af, &dest->vaddr),
@@ -992,11 +1000,11 @@ ip_vs_edit_dest(struct ip_vs_service *svc, struct ip_vs_dest_user_kern *udest)
992 return -ERANGE; 1000 return -ERANGE;
993 } 1001 }
994 1002
995 ip_vs_addr_copy(svc->af, &daddr, &udest->addr); 1003 ip_vs_addr_copy(udest->af, &daddr, &udest->addr);
996 1004
997 /* We use function that requires RCU lock */ 1005 /* We use function that requires RCU lock */
998 rcu_read_lock(); 1006 rcu_read_lock();
999 dest = ip_vs_lookup_dest(svc, &daddr, dport); 1007 dest = ip_vs_lookup_dest(svc, udest->af, &daddr, dport);
1000 rcu_read_unlock(); 1008 rcu_read_unlock();
1001 1009
1002 if (dest == NULL) { 1010 if (dest == NULL) {
@@ -1055,6 +1063,9 @@ static void __ip_vs_unlink_dest(struct ip_vs_service *svc,
1055 list_del_rcu(&dest->n_list); 1063 list_del_rcu(&dest->n_list);
1056 svc->num_dests--; 1064 svc->num_dests--;
1057 1065
1066 if (dest->af != svc->af)
1067 net_ipvs(svc->net)->mixed_address_family_dests--;
1068
1058 if (svcupd) { 1069 if (svcupd) {
1059 struct ip_vs_scheduler *sched; 1070 struct ip_vs_scheduler *sched;
1060 1071
@@ -1078,7 +1089,7 @@ ip_vs_del_dest(struct ip_vs_service *svc, struct ip_vs_dest_user_kern *udest)
1078 1089
1079 /* We use function that requires RCU lock */ 1090 /* We use function that requires RCU lock */
1080 rcu_read_lock(); 1091 rcu_read_lock();
1081 dest = ip_vs_lookup_dest(svc, &udest->addr, dport); 1092 dest = ip_vs_lookup_dest(svc, udest->af, &udest->addr, dport);
1082 rcu_read_unlock(); 1093 rcu_read_unlock();
1083 1094
1084 if (dest == NULL) { 1095 if (dest == NULL) {
@@ -2179,29 +2190,41 @@ static int ip_vs_set_timeout(struct net *net, struct ip_vs_timeout_user *u)
2179 return 0; 2190 return 0;
2180} 2191}
2181 2192
2193#define CMDID(cmd) (cmd - IP_VS_BASE_CTL)
2194
2195struct ip_vs_svcdest_user {
2196 struct ip_vs_service_user s;
2197 struct ip_vs_dest_user d;
2198};
2199
2200static const unsigned char set_arglen[CMDID(IP_VS_SO_SET_MAX) + 1] = {
2201 [CMDID(IP_VS_SO_SET_ADD)] = sizeof(struct ip_vs_service_user),
2202 [CMDID(IP_VS_SO_SET_EDIT)] = sizeof(struct ip_vs_service_user),
2203 [CMDID(IP_VS_SO_SET_DEL)] = sizeof(struct ip_vs_service_user),
2204 [CMDID(IP_VS_SO_SET_ADDDEST)] = sizeof(struct ip_vs_svcdest_user),
2205 [CMDID(IP_VS_SO_SET_DELDEST)] = sizeof(struct ip_vs_svcdest_user),
2206 [CMDID(IP_VS_SO_SET_EDITDEST)] = sizeof(struct ip_vs_svcdest_user),
2207 [CMDID(IP_VS_SO_SET_TIMEOUT)] = sizeof(struct ip_vs_timeout_user),
2208 [CMDID(IP_VS_SO_SET_STARTDAEMON)] = sizeof(struct ip_vs_daemon_user),
2209 [CMDID(IP_VS_SO_SET_STOPDAEMON)] = sizeof(struct ip_vs_daemon_user),
2210 [CMDID(IP_VS_SO_SET_ZERO)] = sizeof(struct ip_vs_service_user),
2211};
2182 2212
2183#define SET_CMDID(cmd) (cmd - IP_VS_BASE_CTL) 2213union ip_vs_set_arglen {
2184#define SERVICE_ARG_LEN (sizeof(struct ip_vs_service_user)) 2214 struct ip_vs_service_user field_IP_VS_SO_SET_ADD;
2185#define SVCDEST_ARG_LEN (sizeof(struct ip_vs_service_user) + \ 2215 struct ip_vs_service_user field_IP_VS_SO_SET_EDIT;
2186 sizeof(struct ip_vs_dest_user)) 2216 struct ip_vs_service_user field_IP_VS_SO_SET_DEL;
2187#define TIMEOUT_ARG_LEN (sizeof(struct ip_vs_timeout_user)) 2217 struct ip_vs_svcdest_user field_IP_VS_SO_SET_ADDDEST;
2188#define DAEMON_ARG_LEN (sizeof(struct ip_vs_daemon_user)) 2218 struct ip_vs_svcdest_user field_IP_VS_SO_SET_DELDEST;
2189#define MAX_ARG_LEN SVCDEST_ARG_LEN 2219 struct ip_vs_svcdest_user field_IP_VS_SO_SET_EDITDEST;
2190 2220 struct ip_vs_timeout_user field_IP_VS_SO_SET_TIMEOUT;
2191static const unsigned char set_arglen[SET_CMDID(IP_VS_SO_SET_MAX)+1] = { 2221 struct ip_vs_daemon_user field_IP_VS_SO_SET_STARTDAEMON;
2192 [SET_CMDID(IP_VS_SO_SET_ADD)] = SERVICE_ARG_LEN, 2222 struct ip_vs_daemon_user field_IP_VS_SO_SET_STOPDAEMON;
2193 [SET_CMDID(IP_VS_SO_SET_EDIT)] = SERVICE_ARG_LEN, 2223 struct ip_vs_service_user field_IP_VS_SO_SET_ZERO;
2194 [SET_CMDID(IP_VS_SO_SET_DEL)] = SERVICE_ARG_LEN,
2195 [SET_CMDID(IP_VS_SO_SET_FLUSH)] = 0,
2196 [SET_CMDID(IP_VS_SO_SET_ADDDEST)] = SVCDEST_ARG_LEN,
2197 [SET_CMDID(IP_VS_SO_SET_DELDEST)] = SVCDEST_ARG_LEN,
2198 [SET_CMDID(IP_VS_SO_SET_EDITDEST)] = SVCDEST_ARG_LEN,
2199 [SET_CMDID(IP_VS_SO_SET_TIMEOUT)] = TIMEOUT_ARG_LEN,
2200 [SET_CMDID(IP_VS_SO_SET_STARTDAEMON)] = DAEMON_ARG_LEN,
2201 [SET_CMDID(IP_VS_SO_SET_STOPDAEMON)] = DAEMON_ARG_LEN,
2202 [SET_CMDID(IP_VS_SO_SET_ZERO)] = SERVICE_ARG_LEN,
2203}; 2224};
2204 2225
2226#define MAX_SET_ARGLEN sizeof(union ip_vs_set_arglen)
2227
2205static void ip_vs_copy_usvc_compat(struct ip_vs_service_user_kern *usvc, 2228static void ip_vs_copy_usvc_compat(struct ip_vs_service_user_kern *usvc,
2206 struct ip_vs_service_user *usvc_compat) 2229 struct ip_vs_service_user *usvc_compat)
2207{ 2230{
@@ -2232,6 +2255,7 @@ static void ip_vs_copy_udest_compat(struct ip_vs_dest_user_kern *udest,
2232 udest->weight = udest_compat->weight; 2255 udest->weight = udest_compat->weight;
2233 udest->u_threshold = udest_compat->u_threshold; 2256 udest->u_threshold = udest_compat->u_threshold;
2234 udest->l_threshold = udest_compat->l_threshold; 2257 udest->l_threshold = udest_compat->l_threshold;
2258 udest->af = AF_INET;
2235} 2259}
2236 2260
2237static int 2261static int
@@ -2239,7 +2263,7 @@ do_ip_vs_set_ctl(struct sock *sk, int cmd, void __user *user, unsigned int len)
2239{ 2263{
2240 struct net *net = sock_net(sk); 2264 struct net *net = sock_net(sk);
2241 int ret; 2265 int ret;
2242 unsigned char arg[MAX_ARG_LEN]; 2266 unsigned char arg[MAX_SET_ARGLEN];
2243 struct ip_vs_service_user *usvc_compat; 2267 struct ip_vs_service_user *usvc_compat;
2244 struct ip_vs_service_user_kern usvc; 2268 struct ip_vs_service_user_kern usvc;
2245 struct ip_vs_service *svc; 2269 struct ip_vs_service *svc;
@@ -2247,16 +2271,15 @@ do_ip_vs_set_ctl(struct sock *sk, int cmd, void __user *user, unsigned int len)
2247 struct ip_vs_dest_user_kern udest; 2271 struct ip_vs_dest_user_kern udest;
2248 struct netns_ipvs *ipvs = net_ipvs(net); 2272 struct netns_ipvs *ipvs = net_ipvs(net);
2249 2273
2274 BUILD_BUG_ON(sizeof(arg) > 255);
2250 if (!ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN)) 2275 if (!ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN))
2251 return -EPERM; 2276 return -EPERM;
2252 2277
2253 if (cmd < IP_VS_BASE_CTL || cmd > IP_VS_SO_SET_MAX) 2278 if (cmd < IP_VS_BASE_CTL || cmd > IP_VS_SO_SET_MAX)
2254 return -EINVAL; 2279 return -EINVAL;
2255 if (len < 0 || len > MAX_ARG_LEN) 2280 if (len != set_arglen[CMDID(cmd)]) {
2256 return -EINVAL; 2281 IP_VS_DBG(1, "set_ctl: len %u != %u\n",
2257 if (len != set_arglen[SET_CMDID(cmd)]) { 2282 len, set_arglen[CMDID(cmd)]);
2258 pr_err("set_ctl: len %u != %u\n",
2259 len, set_arglen[SET_CMDID(cmd)]);
2260 return -EINVAL; 2283 return -EINVAL;
2261 } 2284 }
2262 2285
@@ -2469,6 +2492,12 @@ __ip_vs_get_dest_entries(struct net *net, const struct ip_vs_get_dests *get,
2469 if (count >= get->num_dests) 2492 if (count >= get->num_dests)
2470 break; 2493 break;
2471 2494
2495 /* Cannot expose heterogeneous members via sockopt
2496 * interface
2497 */
2498 if (dest->af != svc->af)
2499 continue;
2500
2472 entry.addr = dest->addr.ip; 2501 entry.addr = dest->addr.ip;
2473 entry.port = dest->port; 2502 entry.port = dest->port;
2474 entry.conn_flags = atomic_read(&dest->conn_flags); 2503 entry.conn_flags = atomic_read(&dest->conn_flags);
@@ -2512,51 +2541,51 @@ __ip_vs_get_timeouts(struct net *net, struct ip_vs_timeout_user *u)
2512#endif 2541#endif
2513} 2542}
2514 2543
2544static const unsigned char get_arglen[CMDID(IP_VS_SO_GET_MAX) + 1] = {
2545 [CMDID(IP_VS_SO_GET_VERSION)] = 64,
2546 [CMDID(IP_VS_SO_GET_INFO)] = sizeof(struct ip_vs_getinfo),
2547 [CMDID(IP_VS_SO_GET_SERVICES)] = sizeof(struct ip_vs_get_services),
2548 [CMDID(IP_VS_SO_GET_SERVICE)] = sizeof(struct ip_vs_service_entry),
2549 [CMDID(IP_VS_SO_GET_DESTS)] = sizeof(struct ip_vs_get_dests),
2550 [CMDID(IP_VS_SO_GET_TIMEOUT)] = sizeof(struct ip_vs_timeout_user),
2551 [CMDID(IP_VS_SO_GET_DAEMON)] = 2 * sizeof(struct ip_vs_daemon_user),
2552};
2515 2553
2516#define GET_CMDID(cmd) (cmd - IP_VS_BASE_CTL) 2554union ip_vs_get_arglen {
2517#define GET_INFO_ARG_LEN (sizeof(struct ip_vs_getinfo)) 2555 char field_IP_VS_SO_GET_VERSION[64];
2518#define GET_SERVICES_ARG_LEN (sizeof(struct ip_vs_get_services)) 2556 struct ip_vs_getinfo field_IP_VS_SO_GET_INFO;
2519#define GET_SERVICE_ARG_LEN (sizeof(struct ip_vs_service_entry)) 2557 struct ip_vs_get_services field_IP_VS_SO_GET_SERVICES;
2520#define GET_DESTS_ARG_LEN (sizeof(struct ip_vs_get_dests)) 2558 struct ip_vs_service_entry field_IP_VS_SO_GET_SERVICE;
2521#define GET_TIMEOUT_ARG_LEN (sizeof(struct ip_vs_timeout_user)) 2559 struct ip_vs_get_dests field_IP_VS_SO_GET_DESTS;
2522#define GET_DAEMON_ARG_LEN (sizeof(struct ip_vs_daemon_user) * 2) 2560 struct ip_vs_timeout_user field_IP_VS_SO_GET_TIMEOUT;
2523 2561 struct ip_vs_daemon_user field_IP_VS_SO_GET_DAEMON[2];
2524static const unsigned char get_arglen[GET_CMDID(IP_VS_SO_GET_MAX)+1] = {
2525 [GET_CMDID(IP_VS_SO_GET_VERSION)] = 64,
2526 [GET_CMDID(IP_VS_SO_GET_INFO)] = GET_INFO_ARG_LEN,
2527 [GET_CMDID(IP_VS_SO_GET_SERVICES)] = GET_SERVICES_ARG_LEN,
2528 [GET_CMDID(IP_VS_SO_GET_SERVICE)] = GET_SERVICE_ARG_LEN,
2529 [GET_CMDID(IP_VS_SO_GET_DESTS)] = GET_DESTS_ARG_LEN,
2530 [GET_CMDID(IP_VS_SO_GET_TIMEOUT)] = GET_TIMEOUT_ARG_LEN,
2531 [GET_CMDID(IP_VS_SO_GET_DAEMON)] = GET_DAEMON_ARG_LEN,
2532}; 2562};
2533 2563
2564#define MAX_GET_ARGLEN sizeof(union ip_vs_get_arglen)
2565
2534static int 2566static int
2535do_ip_vs_get_ctl(struct sock *sk, int cmd, void __user *user, int *len) 2567do_ip_vs_get_ctl(struct sock *sk, int cmd, void __user *user, int *len)
2536{ 2568{
2537 unsigned char arg[128]; 2569 unsigned char arg[MAX_GET_ARGLEN];
2538 int ret = 0; 2570 int ret = 0;
2539 unsigned int copylen; 2571 unsigned int copylen;
2540 struct net *net = sock_net(sk); 2572 struct net *net = sock_net(sk);
2541 struct netns_ipvs *ipvs = net_ipvs(net); 2573 struct netns_ipvs *ipvs = net_ipvs(net);
2542 2574
2543 BUG_ON(!net); 2575 BUG_ON(!net);
2576 BUILD_BUG_ON(sizeof(arg) > 255);
2544 if (!ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN)) 2577 if (!ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN))
2545 return -EPERM; 2578 return -EPERM;
2546 2579
2547 if (cmd < IP_VS_BASE_CTL || cmd > IP_VS_SO_GET_MAX) 2580 if (cmd < IP_VS_BASE_CTL || cmd > IP_VS_SO_GET_MAX)
2548 return -EINVAL; 2581 return -EINVAL;
2549 2582
2550 if (*len < get_arglen[GET_CMDID(cmd)]) { 2583 copylen = get_arglen[CMDID(cmd)];
2551 pr_err("get_ctl: len %u < %u\n", 2584 if (*len < (int) copylen) {
2552 *len, get_arglen[GET_CMDID(cmd)]); 2585 IP_VS_DBG(1, "get_ctl: len %d < %u\n", *len, copylen);
2553 return -EINVAL; 2586 return -EINVAL;
2554 } 2587 }
2555 2588
2556 copylen = get_arglen[GET_CMDID(cmd)];
2557 if (copylen > 128)
2558 return -EINVAL;
2559
2560 if (copy_from_user(arg, user, copylen) != 0) 2589 if (copy_from_user(arg, user, copylen) != 0)
2561 return -EFAULT; 2590 return -EFAULT;
2562 /* 2591 /*
@@ -2766,6 +2795,7 @@ static const struct nla_policy ip_vs_dest_policy[IPVS_DEST_ATTR_MAX + 1] = {
2766 [IPVS_DEST_ATTR_INACT_CONNS] = { .type = NLA_U32 }, 2795 [IPVS_DEST_ATTR_INACT_CONNS] = { .type = NLA_U32 },
2767 [IPVS_DEST_ATTR_PERSIST_CONNS] = { .type = NLA_U32 }, 2796 [IPVS_DEST_ATTR_PERSIST_CONNS] = { .type = NLA_U32 },
2768 [IPVS_DEST_ATTR_STATS] = { .type = NLA_NESTED }, 2797 [IPVS_DEST_ATTR_STATS] = { .type = NLA_NESTED },
2798 [IPVS_DEST_ATTR_ADDR_FAMILY] = { .type = NLA_U16 },
2769}; 2799};
2770 2800
2771static int ip_vs_genl_fill_stats(struct sk_buff *skb, int container_type, 2801static int ip_vs_genl_fill_stats(struct sk_buff *skb, int container_type,
@@ -3021,7 +3051,8 @@ static int ip_vs_genl_fill_dest(struct sk_buff *skb, struct ip_vs_dest *dest)
3021 nla_put_u32(skb, IPVS_DEST_ATTR_INACT_CONNS, 3051 nla_put_u32(skb, IPVS_DEST_ATTR_INACT_CONNS,
3022 atomic_read(&dest->inactconns)) || 3052 atomic_read(&dest->inactconns)) ||
3023 nla_put_u32(skb, IPVS_DEST_ATTR_PERSIST_CONNS, 3053 nla_put_u32(skb, IPVS_DEST_ATTR_PERSIST_CONNS,
3024 atomic_read(&dest->persistconns))) 3054 atomic_read(&dest->persistconns)) ||
3055 nla_put_u16(skb, IPVS_DEST_ATTR_ADDR_FAMILY, dest->af))
3025 goto nla_put_failure; 3056 goto nla_put_failure;
3026 if (ip_vs_genl_fill_stats(skb, IPVS_DEST_ATTR_STATS, &dest->stats)) 3057 if (ip_vs_genl_fill_stats(skb, IPVS_DEST_ATTR_STATS, &dest->stats))
3027 goto nla_put_failure; 3058 goto nla_put_failure;
@@ -3102,6 +3133,7 @@ static int ip_vs_genl_parse_dest(struct ip_vs_dest_user_kern *udest,
3102{ 3133{
3103 struct nlattr *attrs[IPVS_DEST_ATTR_MAX + 1]; 3134 struct nlattr *attrs[IPVS_DEST_ATTR_MAX + 1];
3104 struct nlattr *nla_addr, *nla_port; 3135 struct nlattr *nla_addr, *nla_port;
3136 struct nlattr *nla_addr_family;
3105 3137
3106 /* Parse mandatory identifying destination fields first */ 3138 /* Parse mandatory identifying destination fields first */
3107 if (nla == NULL || 3139 if (nla == NULL ||
@@ -3110,6 +3142,7 @@ static int ip_vs_genl_parse_dest(struct ip_vs_dest_user_kern *udest,
3110 3142
3111 nla_addr = attrs[IPVS_DEST_ATTR_ADDR]; 3143 nla_addr = attrs[IPVS_DEST_ATTR_ADDR];
3112 nla_port = attrs[IPVS_DEST_ATTR_PORT]; 3144 nla_port = attrs[IPVS_DEST_ATTR_PORT];
3145 nla_addr_family = attrs[IPVS_DEST_ATTR_ADDR_FAMILY];
3113 3146
3114 if (!(nla_addr && nla_port)) 3147 if (!(nla_addr && nla_port))
3115 return -EINVAL; 3148 return -EINVAL;
@@ -3119,6 +3152,11 @@ static int ip_vs_genl_parse_dest(struct ip_vs_dest_user_kern *udest,
3119 nla_memcpy(&udest->addr, nla_addr, sizeof(udest->addr)); 3152 nla_memcpy(&udest->addr, nla_addr, sizeof(udest->addr));
3120 udest->port = nla_get_be16(nla_port); 3153 udest->port = nla_get_be16(nla_port);
3121 3154
3155 if (nla_addr_family)
3156 udest->af = nla_get_u16(nla_addr_family);
3157 else
3158 udest->af = 0;
3159
3122 /* If a full entry was requested, check for the additional fields */ 3160 /* If a full entry was requested, check for the additional fields */
3123 if (full_entry) { 3161 if (full_entry) {
3124 struct nlattr *nla_fwd, *nla_weight, *nla_u_thresh, 3162 struct nlattr *nla_fwd, *nla_weight, *nla_u_thresh,
@@ -3223,6 +3261,12 @@ static int ip_vs_genl_new_daemon(struct net *net, struct nlattr **attrs)
3223 attrs[IPVS_DAEMON_ATTR_SYNC_ID])) 3261 attrs[IPVS_DAEMON_ATTR_SYNC_ID]))
3224 return -EINVAL; 3262 return -EINVAL;
3225 3263
3264 /* The synchronization protocol is incompatible with mixed family
3265 * services
3266 */
3267 if (net_ipvs(net)->mixed_address_family_dests > 0)
3268 return -EINVAL;
3269
3226 return start_sync_thread(net, 3270 return start_sync_thread(net,
3227 nla_get_u32(attrs[IPVS_DAEMON_ATTR_STATE]), 3271 nla_get_u32(attrs[IPVS_DAEMON_ATTR_STATE]),
3228 nla_data(attrs[IPVS_DAEMON_ATTR_MCAST_IFN]), 3272 nla_data(attrs[IPVS_DAEMON_ATTR_MCAST_IFN]),
@@ -3346,6 +3390,35 @@ static int ip_vs_genl_set_cmd(struct sk_buff *skb, struct genl_info *info)
3346 need_full_dest); 3390 need_full_dest);
3347 if (ret) 3391 if (ret)
3348 goto out; 3392 goto out;
3393
3394 /* Old protocols did not allow the user to specify address
3395 * family, so we set it to zero instead. We also didn't
3396 * allow heterogeneous pools in the old code, so it's safe
3397 * to assume that this will have the same address family as
3398 * the service.
3399 */
3400 if (udest.af == 0)
3401 udest.af = svc->af;
3402
3403 if (udest.af != svc->af) {
3404 /* The synchronization protocol is incompatible
3405 * with mixed family services
3406 */
3407 if (net_ipvs(net)->sync_state) {
3408 ret = -EINVAL;
3409 goto out;
3410 }
3411
3412 /* Which connection types do we support? */
3413 switch (udest.conn_flags) {
3414 case IP_VS_CONN_F_TUNNEL:
3415 /* We are able to forward this */
3416 break;
3417 default:
3418 ret = -EINVAL;
3419 goto out;
3420 }
3421 }
3349 } 3422 }
3350 3423
3351 switch (cmd) { 3424 switch (cmd) {
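
The ip_vs_genl_set_cmd() hunk above only accepts a destination whose family differs from the service's when connection synchronization is not running and the connection will be tunneled, since tunneling wraps the packet in a fresh outer header of the real server's family while NAT and direct routing reuse the incoming header. A condensed restatement of that admissibility check as a standalone predicate is sketched below; the forwarding-method enum is an illustrative stand-in for the kernel's IP_VS_CONN_F_* values.

    #include <stdbool.h>
    #include <stdio.h>
    #include <sys/socket.h>

    enum fwd_method { FWD_MASQ, FWD_TUNNEL, FWD_DROUTE };

    static bool mixed_family_dest_ok(int svc_af, int dest_af,
                                     bool sync_running, enum fwd_method fwd)
    {
        if (dest_af == svc_af)
            return true;              /* not a mixed-family destination */
        if (sync_running)
            return false;             /* sync protocol cannot represent it */
        return fwd == FWD_TUNNEL;     /* only tunneling rewrites the outer header */
    }

    int main(void)
    {
        printf("v4 VIP + v6 dest, tunnel, no sync: %d\n",
               mixed_family_dest_ok(AF_INET, AF_INET6, false, FWD_TUNNEL)); /* 1 */
        printf("v4 VIP + v6 dest, masq,   no sync: %d\n",
               mixed_family_dest_ok(AF_INET, AF_INET6, false, FWD_MASQ));   /* 0 */
        return 0;
    }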
diff --git a/net/netfilter/ipvs/ip_vs_dh.c b/net/netfilter/ipvs/ip_vs_dh.c
index c3b84546ea9e..6be5c538b71e 100644
--- a/net/netfilter/ipvs/ip_vs_dh.c
+++ b/net/netfilter/ipvs/ip_vs_dh.c
@@ -234,7 +234,7 @@ ip_vs_dh_schedule(struct ip_vs_service *svc, const struct sk_buff *skb,
234 234
235 IP_VS_DBG_BUF(6, "DH: destination IP address %s --> server %s:%d\n", 235 IP_VS_DBG_BUF(6, "DH: destination IP address %s --> server %s:%d\n",
236 IP_VS_DBG_ADDR(svc->af, &iph->daddr), 236 IP_VS_DBG_ADDR(svc->af, &iph->daddr),
237 IP_VS_DBG_ADDR(svc->af, &dest->addr), 237 IP_VS_DBG_ADDR(dest->af, &dest->addr),
238 ntohs(dest->port)); 238 ntohs(dest->port));
239 239
240 return dest; 240 return dest;
diff --git a/net/netfilter/ipvs/ip_vs_fo.c b/net/netfilter/ipvs/ip_vs_fo.c
new file mode 100644
index 000000000000..e09874d02938
--- /dev/null
+++ b/net/netfilter/ipvs/ip_vs_fo.c
@@ -0,0 +1,79 @@
1/*
2 * IPVS: Weighted Fail Over module
3 *
4 * Authors: Kenny Mathis <kmathis@chokepoint.net>
5 *
6 * This program is free software; you can redistribute it and/or
7 * modify it under the terms of the GNU General Public License
8 * as published by the Free Software Foundation; either version
9 * 2 of the License, or (at your option) any later version.
10 *
11 * Changes:
12 * Kenny Mathis : added initial functionality based on weight
13 *
14 */
15
16#define KMSG_COMPONENT "IPVS"
17#define pr_fmt(fmt) KMSG_COMPONENT ": " fmt
18
19#include <linux/module.h>
20#include <linux/kernel.h>
21
22#include <net/ip_vs.h>
23
24/* Weighted Fail Over Module */
25static struct ip_vs_dest *
26ip_vs_fo_schedule(struct ip_vs_service *svc, const struct sk_buff *skb,
27 struct ip_vs_iphdr *iph)
28{
29 struct ip_vs_dest *dest, *hweight = NULL;
30 int hw = 0; /* Track highest weight */
31
32 IP_VS_DBG(6, "ip_vs_fo_schedule(): Scheduling...\n");
33
34 /* Basic failover functionality
35 * Find virtual server with highest weight and send it traffic
36 */
37 list_for_each_entry_rcu(dest, &svc->destinations, n_list) {
38 if (!(dest->flags & IP_VS_DEST_F_OVERLOAD) &&
39 atomic_read(&dest->weight) > hw) {
40 hweight = dest;
41 hw = atomic_read(&dest->weight);
42 }
43 }
44
45 if (hweight) {
46 IP_VS_DBG_BUF(6, "FO: server %s:%u activeconns %d weight %d\n",
47 IP_VS_DBG_ADDR(hweight->af, &hweight->addr),
48 ntohs(hweight->port),
49 atomic_read(&hweight->activeconns),
50 atomic_read(&hweight->weight));
51 return hweight;
52 }
53
54 ip_vs_scheduler_err(svc, "no destination available");
55 return NULL;
56}
57
58static struct ip_vs_scheduler ip_vs_fo_scheduler = {
59 .name = "fo",
60 .refcnt = ATOMIC_INIT(0),
61 .module = THIS_MODULE,
62 .n_list = LIST_HEAD_INIT(ip_vs_fo_scheduler.n_list),
63 .schedule = ip_vs_fo_schedule,
64};
65
66static int __init ip_vs_fo_init(void)
67{
68 return register_ip_vs_scheduler(&ip_vs_fo_scheduler);
69}
70
71static void __exit ip_vs_fo_cleanup(void)
72{
73 unregister_ip_vs_scheduler(&ip_vs_fo_scheduler);
74 synchronize_rcu();
75}
76
77module_init(ip_vs_fo_init);
78module_exit(ip_vs_fo_cleanup);
79MODULE_LICENSE("GPL");
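
The new fo scheduler simply walks the destination list and remembers the highest-weight server that is not flagged as overloaded. The same selection logic, detached from the kernel data structures (a plain array instead of the RCU list, ints instead of atomics), could be sketched like this:

    #include <stdbool.h>
    #include <stdio.h>

    struct server {
        const char *name;
        int weight;
        bool overloaded;   /* stands in for IP_VS_DEST_F_OVERLOAD */
    };

    /* Weighted fail-over: always pick the available server with the highest
     * weight; traffic only moves to a lower-weight server when every heavier
     * one is overloaded or removed. */
    static const struct server *fo_select(const struct server *srv, int n)
    {
        const struct server *best = NULL;
        int hw = 0;   /* highest weight seen so far */

        for (int i = 0; i < n; i++) {
            if (!srv[i].overloaded && srv[i].weight > hw) {
                best = &srv[i];
                hw = srv[i].weight;
            }
        }
        return best;   /* NULL when no destination is available */
    }

    int main(void)
    {
        struct server pool[] = {
            { "primary",   100, false },
            { "secondary",  50, false },
            { "spare",      10, false },
        };
        const struct server *s;

        s = fo_select(pool, 3);
        printf("selected: %s\n", s ? s->name : "none");   /* primary */

        pool[0].overloaded = true;                        /* primary fails */
        s = fo_select(pool, 3);
        printf("selected: %s\n", s ? s->name : "none");   /* secondary */
        return 0;
    }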
diff --git a/net/netfilter/ipvs/ip_vs_ftp.c b/net/netfilter/ipvs/ip_vs_ftp.c
index 77c173282f38..a64fa15790e5 100644
--- a/net/netfilter/ipvs/ip_vs_ftp.c
+++ b/net/netfilter/ipvs/ip_vs_ftp.c
@@ -233,7 +233,8 @@ static int ip_vs_ftp_out(struct ip_vs_app *app, struct ip_vs_conn *cp,
233 ip_vs_conn_fill_param(ip_vs_conn_net(cp), 233 ip_vs_conn_fill_param(ip_vs_conn_net(cp),
234 AF_INET, IPPROTO_TCP, &cp->caddr, 234 AF_INET, IPPROTO_TCP, &cp->caddr,
235 0, &cp->vaddr, port, &p); 235 0, &cp->vaddr, port, &p);
236 n_cp = ip_vs_conn_new(&p, &from, port, 236 /* As above, this is ipv4 only */
237 n_cp = ip_vs_conn_new(&p, AF_INET, &from, port,
237 IP_VS_CONN_F_NO_CPORT | 238 IP_VS_CONN_F_NO_CPORT |
238 IP_VS_CONN_F_NFCT, 239 IP_VS_CONN_F_NFCT,
239 cp->dest, skb->mark); 240 cp->dest, skb->mark);
@@ -396,7 +397,8 @@ static int ip_vs_ftp_in(struct ip_vs_app *app, struct ip_vs_conn *cp,
396 htons(ntohs(cp->vport)-1), &p); 397 htons(ntohs(cp->vport)-1), &p);
397 n_cp = ip_vs_conn_in_get(&p); 398 n_cp = ip_vs_conn_in_get(&p);
398 if (!n_cp) { 399 if (!n_cp) {
399 n_cp = ip_vs_conn_new(&p, &cp->daddr, 400 /* This is ipv4 only */
401 n_cp = ip_vs_conn_new(&p, AF_INET, &cp->daddr,
400 htons(ntohs(cp->dport)-1), 402 htons(ntohs(cp->dport)-1),
401 IP_VS_CONN_F_NFCT, cp->dest, 403 IP_VS_CONN_F_NFCT, cp->dest,
402 skb->mark); 404 skb->mark);
diff --git a/net/netfilter/ipvs/ip_vs_lblc.c b/net/netfilter/ipvs/ip_vs_lblc.c
index 547ff33c1efd..127f14046c51 100644
--- a/net/netfilter/ipvs/ip_vs_lblc.c
+++ b/net/netfilter/ipvs/ip_vs_lblc.c
@@ -199,11 +199,11 @@ ip_vs_lblc_get(int af, struct ip_vs_lblc_table *tbl,
199 */ 199 */
200static inline struct ip_vs_lblc_entry * 200static inline struct ip_vs_lblc_entry *
201ip_vs_lblc_new(struct ip_vs_lblc_table *tbl, const union nf_inet_addr *daddr, 201ip_vs_lblc_new(struct ip_vs_lblc_table *tbl, const union nf_inet_addr *daddr,
202 struct ip_vs_dest *dest) 202 u16 af, struct ip_vs_dest *dest)
203{ 203{
204 struct ip_vs_lblc_entry *en; 204 struct ip_vs_lblc_entry *en;
205 205
206 en = ip_vs_lblc_get(dest->af, tbl, daddr); 206 en = ip_vs_lblc_get(af, tbl, daddr);
207 if (en) { 207 if (en) {
208 if (en->dest == dest) 208 if (en->dest == dest)
209 return en; 209 return en;
@@ -213,8 +213,8 @@ ip_vs_lblc_new(struct ip_vs_lblc_table *tbl, const union nf_inet_addr *daddr,
213 if (!en) 213 if (!en)
214 return NULL; 214 return NULL;
215 215
216 en->af = dest->af; 216 en->af = af;
217 ip_vs_addr_copy(dest->af, &en->addr, daddr); 217 ip_vs_addr_copy(af, &en->addr, daddr);
218 en->lastuse = jiffies; 218 en->lastuse = jiffies;
219 219
220 ip_vs_dest_hold(dest); 220 ip_vs_dest_hold(dest);
@@ -521,13 +521,13 @@ ip_vs_lblc_schedule(struct ip_vs_service *svc, const struct sk_buff *skb,
521 /* If we fail to create a cache entry, we'll just use the valid dest */ 521 /* If we fail to create a cache entry, we'll just use the valid dest */
522 spin_lock_bh(&svc->sched_lock); 522 spin_lock_bh(&svc->sched_lock);
523 if (!tbl->dead) 523 if (!tbl->dead)
524 ip_vs_lblc_new(tbl, &iph->daddr, dest); 524 ip_vs_lblc_new(tbl, &iph->daddr, svc->af, dest);
525 spin_unlock_bh(&svc->sched_lock); 525 spin_unlock_bh(&svc->sched_lock);
526 526
527out: 527out:
528 IP_VS_DBG_BUF(6, "LBLC: destination IP address %s --> server %s:%d\n", 528 IP_VS_DBG_BUF(6, "LBLC: destination IP address %s --> server %s:%d\n",
529 IP_VS_DBG_ADDR(svc->af, &iph->daddr), 529 IP_VS_DBG_ADDR(svc->af, &iph->daddr),
530 IP_VS_DBG_ADDR(svc->af, &dest->addr), ntohs(dest->port)); 530 IP_VS_DBG_ADDR(dest->af, &dest->addr), ntohs(dest->port));
531 531
532 return dest; 532 return dest;
533} 533}
diff --git a/net/netfilter/ipvs/ip_vs_lblcr.c b/net/netfilter/ipvs/ip_vs_lblcr.c
index 3f21a2f47de1..2229d2d8bbe0 100644
--- a/net/netfilter/ipvs/ip_vs_lblcr.c
+++ b/net/netfilter/ipvs/ip_vs_lblcr.c
@@ -362,18 +362,18 @@ ip_vs_lblcr_get(int af, struct ip_vs_lblcr_table *tbl,
362 */ 362 */
363static inline struct ip_vs_lblcr_entry * 363static inline struct ip_vs_lblcr_entry *
364ip_vs_lblcr_new(struct ip_vs_lblcr_table *tbl, const union nf_inet_addr *daddr, 364ip_vs_lblcr_new(struct ip_vs_lblcr_table *tbl, const union nf_inet_addr *daddr,
365 struct ip_vs_dest *dest) 365 u16 af, struct ip_vs_dest *dest)
366{ 366{
367 struct ip_vs_lblcr_entry *en; 367 struct ip_vs_lblcr_entry *en;
368 368
369 en = ip_vs_lblcr_get(dest->af, tbl, daddr); 369 en = ip_vs_lblcr_get(af, tbl, daddr);
370 if (!en) { 370 if (!en) {
371 en = kmalloc(sizeof(*en), GFP_ATOMIC); 371 en = kmalloc(sizeof(*en), GFP_ATOMIC);
372 if (!en) 372 if (!en)
373 return NULL; 373 return NULL;
374 374
375 en->af = dest->af; 375 en->af = af;
376 ip_vs_addr_copy(dest->af, &en->addr, daddr); 376 ip_vs_addr_copy(af, &en->addr, daddr);
377 en->lastuse = jiffies; 377 en->lastuse = jiffies;
378 378
379 /* initialize its dest set */ 379 /* initialize its dest set */
@@ -706,13 +706,13 @@ ip_vs_lblcr_schedule(struct ip_vs_service *svc, const struct sk_buff *skb,
706 /* If we fail to create a cache entry, we'll just use the valid dest */ 706 /* If we fail to create a cache entry, we'll just use the valid dest */
707 spin_lock_bh(&svc->sched_lock); 707 spin_lock_bh(&svc->sched_lock);
708 if (!tbl->dead) 708 if (!tbl->dead)
709 ip_vs_lblcr_new(tbl, &iph->daddr, dest); 709 ip_vs_lblcr_new(tbl, &iph->daddr, svc->af, dest);
710 spin_unlock_bh(&svc->sched_lock); 710 spin_unlock_bh(&svc->sched_lock);
711 711
712out: 712out:
713 IP_VS_DBG_BUF(6, "LBLCR: destination IP address %s --> server %s:%d\n", 713 IP_VS_DBG_BUF(6, "LBLCR: destination IP address %s --> server %s:%d\n",
714 IP_VS_DBG_ADDR(svc->af, &iph->daddr), 714 IP_VS_DBG_ADDR(svc->af, &iph->daddr),
715 IP_VS_DBG_ADDR(svc->af, &dest->addr), ntohs(dest->port)); 715 IP_VS_DBG_ADDR(dest->af, &dest->addr), ntohs(dest->port));
716 716
717 return dest; 717 return dest;
718} 718}
diff --git a/net/netfilter/ipvs/ip_vs_lc.c b/net/netfilter/ipvs/ip_vs_lc.c
index 2bdcb1cf2127..19a0769a989a 100644
--- a/net/netfilter/ipvs/ip_vs_lc.c
+++ b/net/netfilter/ipvs/ip_vs_lc.c
@@ -59,7 +59,7 @@ ip_vs_lc_schedule(struct ip_vs_service *svc, const struct sk_buff *skb,
59 else 59 else
60 IP_VS_DBG_BUF(6, "LC: server %s:%u activeconns %d " 60 IP_VS_DBG_BUF(6, "LC: server %s:%u activeconns %d "
61 "inactconns %d\n", 61 "inactconns %d\n",
62 IP_VS_DBG_ADDR(svc->af, &least->addr), 62 IP_VS_DBG_ADDR(least->af, &least->addr),
63 ntohs(least->port), 63 ntohs(least->port),
64 atomic_read(&least->activeconns), 64 atomic_read(&least->activeconns),
65 atomic_read(&least->inactconns)); 65 atomic_read(&least->inactconns));
diff --git a/net/netfilter/ipvs/ip_vs_nq.c b/net/netfilter/ipvs/ip_vs_nq.c
index 961a6de9bb29..a8b63401e773 100644
--- a/net/netfilter/ipvs/ip_vs_nq.c
+++ b/net/netfilter/ipvs/ip_vs_nq.c
@@ -107,7 +107,8 @@ ip_vs_nq_schedule(struct ip_vs_service *svc, const struct sk_buff *skb,
107 out: 107 out:
108 IP_VS_DBG_BUF(6, "NQ: server %s:%u " 108 IP_VS_DBG_BUF(6, "NQ: server %s:%u "
109 "activeconns %d refcnt %d weight %d overhead %d\n", 109 "activeconns %d refcnt %d weight %d overhead %d\n",
110 IP_VS_DBG_ADDR(svc->af, &least->addr), ntohs(least->port), 110 IP_VS_DBG_ADDR(least->af, &least->addr),
111 ntohs(least->port),
111 atomic_read(&least->activeconns), 112 atomic_read(&least->activeconns),
112 atomic_read(&least->refcnt), 113 atomic_read(&least->refcnt),
113 atomic_read(&least->weight), loh); 114 atomic_read(&least->weight), loh);
diff --git a/net/netfilter/ipvs/ip_vs_proto_sctp.c b/net/netfilter/ipvs/ip_vs_proto_sctp.c
index 2f7ea7564044..5b84c0b56642 100644
--- a/net/netfilter/ipvs/ip_vs_proto_sctp.c
+++ b/net/netfilter/ipvs/ip_vs_proto_sctp.c
@@ -432,7 +432,7 @@ set_sctp_state(struct ip_vs_proto_data *pd, struct ip_vs_conn *cp,
432 pd->pp->name, 432 pd->pp->name,
433 ((direction == IP_VS_DIR_OUTPUT) ? 433 ((direction == IP_VS_DIR_OUTPUT) ?
434 "output " : "input "), 434 "output " : "input "),
435 IP_VS_DBG_ADDR(cp->af, &cp->daddr), 435 IP_VS_DBG_ADDR(cp->daf, &cp->daddr),
436 ntohs(cp->dport), 436 ntohs(cp->dport),
437 IP_VS_DBG_ADDR(cp->af, &cp->caddr), 437 IP_VS_DBG_ADDR(cp->af, &cp->caddr),
438 ntohs(cp->cport), 438 ntohs(cp->cport),
diff --git a/net/netfilter/ipvs/ip_vs_proto_tcp.c b/net/netfilter/ipvs/ip_vs_proto_tcp.c
index e3a697234a98..8e92beb0cca9 100644
--- a/net/netfilter/ipvs/ip_vs_proto_tcp.c
+++ b/net/netfilter/ipvs/ip_vs_proto_tcp.c
@@ -510,7 +510,7 @@ set_tcp_state(struct ip_vs_proto_data *pd, struct ip_vs_conn *cp,
510 th->fin ? 'F' : '.', 510 th->fin ? 'F' : '.',
511 th->ack ? 'A' : '.', 511 th->ack ? 'A' : '.',
512 th->rst ? 'R' : '.', 512 th->rst ? 'R' : '.',
513 IP_VS_DBG_ADDR(cp->af, &cp->daddr), 513 IP_VS_DBG_ADDR(cp->daf, &cp->daddr),
514 ntohs(cp->dport), 514 ntohs(cp->dport),
515 IP_VS_DBG_ADDR(cp->af, &cp->caddr), 515 IP_VS_DBG_ADDR(cp->af, &cp->caddr),
516 ntohs(cp->cport), 516 ntohs(cp->cport),
diff --git a/net/netfilter/ipvs/ip_vs_rr.c b/net/netfilter/ipvs/ip_vs_rr.c
index 176b87c35e34..58bacfc461ee 100644
--- a/net/netfilter/ipvs/ip_vs_rr.c
+++ b/net/netfilter/ipvs/ip_vs_rr.c
@@ -95,7 +95,7 @@ stop:
95 spin_unlock_bh(&svc->sched_lock); 95 spin_unlock_bh(&svc->sched_lock);
96 IP_VS_DBG_BUF(6, "RR: server %s:%u " 96 IP_VS_DBG_BUF(6, "RR: server %s:%u "
97 "activeconns %d refcnt %d weight %d\n", 97 "activeconns %d refcnt %d weight %d\n",
98 IP_VS_DBG_ADDR(svc->af, &dest->addr), ntohs(dest->port), 98 IP_VS_DBG_ADDR(dest->af, &dest->addr), ntohs(dest->port),
99 atomic_read(&dest->activeconns), 99 atomic_read(&dest->activeconns),
100 atomic_read(&dest->refcnt), atomic_read(&dest->weight)); 100 atomic_read(&dest->refcnt), atomic_read(&dest->weight));
101 101
diff --git a/net/netfilter/ipvs/ip_vs_sed.c b/net/netfilter/ipvs/ip_vs_sed.c
index e446b9fa7424..f8e2d00f528b 100644
--- a/net/netfilter/ipvs/ip_vs_sed.c
+++ b/net/netfilter/ipvs/ip_vs_sed.c
@@ -108,7 +108,8 @@ ip_vs_sed_schedule(struct ip_vs_service *svc, const struct sk_buff *skb,
108 108
109 IP_VS_DBG_BUF(6, "SED: server %s:%u " 109 IP_VS_DBG_BUF(6, "SED: server %s:%u "
110 "activeconns %d refcnt %d weight %d overhead %d\n", 110 "activeconns %d refcnt %d weight %d overhead %d\n",
111 IP_VS_DBG_ADDR(svc->af, &least->addr), ntohs(least->port), 111 IP_VS_DBG_ADDR(least->af, &least->addr),
112 ntohs(least->port),
112 atomic_read(&least->activeconns), 113 atomic_read(&least->activeconns),
113 atomic_read(&least->refcnt), 114 atomic_read(&least->refcnt),
114 atomic_read(&least->weight), loh); 115 atomic_read(&least->weight), loh);
diff --git a/net/netfilter/ipvs/ip_vs_sh.c b/net/netfilter/ipvs/ip_vs_sh.c
index cc65b2f42cd4..98a13433b68c 100644
--- a/net/netfilter/ipvs/ip_vs_sh.c
+++ b/net/netfilter/ipvs/ip_vs_sh.c
@@ -138,7 +138,7 @@ ip_vs_sh_get_fallback(struct ip_vs_service *svc, struct ip_vs_sh_state *s,
138 return dest; 138 return dest;
139 139
140 IP_VS_DBG_BUF(6, "SH: selected unavailable server %s:%d, reselecting", 140 IP_VS_DBG_BUF(6, "SH: selected unavailable server %s:%d, reselecting",
141 IP_VS_DBG_ADDR(svc->af, &dest->addr), ntohs(dest->port)); 141 IP_VS_DBG_ADDR(dest->af, &dest->addr), ntohs(dest->port));
142 142
143 /* if the original dest is unavailable, loop around the table 143 /* if the original dest is unavailable, loop around the table
144 * starting from ihash to find a new dest 144 * starting from ihash to find a new dest
@@ -153,7 +153,7 @@ ip_vs_sh_get_fallback(struct ip_vs_service *svc, struct ip_vs_sh_state *s,
153 return dest; 153 return dest;
154 IP_VS_DBG_BUF(6, "SH: selected unavailable " 154 IP_VS_DBG_BUF(6, "SH: selected unavailable "
155 "server %s:%d (offset %d), reselecting", 155 "server %s:%d (offset %d), reselecting",
156 IP_VS_DBG_ADDR(svc->af, &dest->addr), 156 IP_VS_DBG_ADDR(dest->af, &dest->addr),
157 ntohs(dest->port), roffset); 157 ntohs(dest->port), roffset);
158 } 158 }
159 159
@@ -192,7 +192,7 @@ ip_vs_sh_reassign(struct ip_vs_sh_state *s, struct ip_vs_service *svc)
192 RCU_INIT_POINTER(b->dest, dest); 192 RCU_INIT_POINTER(b->dest, dest);
193 193
194 IP_VS_DBG_BUF(6, "assigned i: %d dest: %s weight: %d\n", 194 IP_VS_DBG_BUF(6, "assigned i: %d dest: %s weight: %d\n",
195 i, IP_VS_DBG_ADDR(svc->af, &dest->addr), 195 i, IP_VS_DBG_ADDR(dest->af, &dest->addr),
196 atomic_read(&dest->weight)); 196 atomic_read(&dest->weight));
197 197
198 /* Don't move to next dest until filling weight */ 198 /* Don't move to next dest until filling weight */
@@ -342,7 +342,7 @@ ip_vs_sh_schedule(struct ip_vs_service *svc, const struct sk_buff *skb,
342 342
343 IP_VS_DBG_BUF(6, "SH: source IP address %s --> server %s:%d\n", 343 IP_VS_DBG_BUF(6, "SH: source IP address %s --> server %s:%d\n",
344 IP_VS_DBG_ADDR(svc->af, &iph->saddr), 344 IP_VS_DBG_ADDR(svc->af, &iph->saddr),
345 IP_VS_DBG_ADDR(svc->af, &dest->addr), 345 IP_VS_DBG_ADDR(dest->af, &dest->addr),
346 ntohs(dest->port)); 346 ntohs(dest->port));
347 347
348 return dest; 348 return dest;
diff --git a/net/netfilter/ipvs/ip_vs_sync.c b/net/netfilter/ipvs/ip_vs_sync.c
index eadffb29dec0..7162c86fd50d 100644
--- a/net/netfilter/ipvs/ip_vs_sync.c
+++ b/net/netfilter/ipvs/ip_vs_sync.c
@@ -880,10 +880,17 @@ static void ip_vs_proc_conn(struct net *net, struct ip_vs_conn_param *param,
880 * but still handled. 880 * but still handled.
881 */ 881 */
882 rcu_read_lock(); 882 rcu_read_lock();
883 dest = ip_vs_find_dest(net, type, daddr, dport, param->vaddr, 883 /* This function is only invoked by the synchronization
884 param->vport, protocol, fwmark, flags); 884 * code. We do not currently support heterogeneous pools
885 * with synchronization, so we can make the assumption that
886 * the svc_af is the same as the dest_af
887 */
888 dest = ip_vs_find_dest(net, type, type, daddr, dport,
889 param->vaddr, param->vport, protocol,
890 fwmark, flags);
885 891
886 cp = ip_vs_conn_new(param, daddr, dport, flags, dest, fwmark); 892 cp = ip_vs_conn_new(param, type, daddr, dport, flags, dest,
893 fwmark);
887 rcu_read_unlock(); 894 rcu_read_unlock();
888 if (!cp) { 895 if (!cp) {
889 kfree(param->pe_data); 896 kfree(param->pe_data);
diff --git a/net/netfilter/ipvs/ip_vs_wlc.c b/net/netfilter/ipvs/ip_vs_wlc.c
index b5b4650d50a9..6b366fd90554 100644
--- a/net/netfilter/ipvs/ip_vs_wlc.c
+++ b/net/netfilter/ipvs/ip_vs_wlc.c
@@ -80,7 +80,8 @@ ip_vs_wlc_schedule(struct ip_vs_service *svc, const struct sk_buff *skb,
80 80
81 IP_VS_DBG_BUF(6, "WLC: server %s:%u " 81 IP_VS_DBG_BUF(6, "WLC: server %s:%u "
82 "activeconns %d refcnt %d weight %d overhead %d\n", 82 "activeconns %d refcnt %d weight %d overhead %d\n",
83 IP_VS_DBG_ADDR(svc->af, &least->addr), ntohs(least->port), 83 IP_VS_DBG_ADDR(least->af, &least->addr),
84 ntohs(least->port),
84 atomic_read(&least->activeconns), 85 atomic_read(&least->activeconns),
85 atomic_read(&least->refcnt), 86 atomic_read(&least->refcnt),
86 atomic_read(&least->weight), loh); 87 atomic_read(&least->weight), loh);
diff --git a/net/netfilter/ipvs/ip_vs_wrr.c b/net/netfilter/ipvs/ip_vs_wrr.c
index 0546cd572d6b..17e6d4406ca7 100644
--- a/net/netfilter/ipvs/ip_vs_wrr.c
+++ b/net/netfilter/ipvs/ip_vs_wrr.c
@@ -216,7 +216,7 @@ ip_vs_wrr_schedule(struct ip_vs_service *svc, const struct sk_buff *skb,
216found: 216found:
217 IP_VS_DBG_BUF(6, "WRR: server %s:%u " 217 IP_VS_DBG_BUF(6, "WRR: server %s:%u "
218 "activeconns %d refcnt %d weight %d\n", 218 "activeconns %d refcnt %d weight %d\n",
219 IP_VS_DBG_ADDR(svc->af, &dest->addr), ntohs(dest->port), 219 IP_VS_DBG_ADDR(dest->af, &dest->addr), ntohs(dest->port),
220 atomic_read(&dest->activeconns), 220 atomic_read(&dest->activeconns),
221 atomic_read(&dest->refcnt), 221 atomic_read(&dest->refcnt),
222 atomic_read(&dest->weight)); 222 atomic_read(&dest->weight));
diff --git a/net/netfilter/ipvs/ip_vs_xmit.c b/net/netfilter/ipvs/ip_vs_xmit.c
index 56896a412bce..91f17c1eb8a2 100644
--- a/net/netfilter/ipvs/ip_vs_xmit.c
+++ b/net/netfilter/ipvs/ip_vs_xmit.c
@@ -157,18 +157,113 @@ retry:
157 return rt; 157 return rt;
158} 158}
159 159
160#ifdef CONFIG_IP_VS_IPV6
161static inline int __ip_vs_is_local_route6(struct rt6_info *rt)
162{
163 return rt->dst.dev && rt->dst.dev->flags & IFF_LOOPBACK;
164}
165#endif
166
167static inline bool crosses_local_route_boundary(int skb_af, struct sk_buff *skb,
168 int rt_mode,
169 bool new_rt_is_local)
170{
171 bool rt_mode_allow_local = !!(rt_mode & IP_VS_RT_MODE_LOCAL);
172	bool rt_mode_allow_non_local = !!(rt_mode & IP_VS_RT_MODE_NON_LOCAL);
173 bool rt_mode_allow_redirect = !!(rt_mode & IP_VS_RT_MODE_RDR);
174 bool source_is_loopback;
175 bool old_rt_is_local;
176
177#ifdef CONFIG_IP_VS_IPV6
178 if (skb_af == AF_INET6) {
179 int addr_type = ipv6_addr_type(&ipv6_hdr(skb)->saddr);
180
181 source_is_loopback =
182 (!skb->dev || skb->dev->flags & IFF_LOOPBACK) &&
183 (addr_type & IPV6_ADDR_LOOPBACK);
184 old_rt_is_local = __ip_vs_is_local_route6(
185 (struct rt6_info *)skb_dst(skb));
186 } else
187#endif
188 {
189 source_is_loopback = ipv4_is_loopback(ip_hdr(skb)->saddr);
190 old_rt_is_local = skb_rtable(skb)->rt_flags & RTCF_LOCAL;
191 }
192
193 if (unlikely(new_rt_is_local)) {
194 if (!rt_mode_allow_local)
195 return true;
196 if (!rt_mode_allow_redirect && !old_rt_is_local)
197 return true;
198 } else {
199 if (!rt_mode_allow_non_local)
200 return true;
201 if (source_is_loopback)
202 return true;
203 }
204 return false;
205}
206
207static inline void maybe_update_pmtu(int skb_af, struct sk_buff *skb, int mtu)
208{
209 struct sock *sk = skb->sk;
210 struct rtable *ort = skb_rtable(skb);
211
212 if (!skb->dev && sk && sk->sk_state != TCP_TIME_WAIT)
213 ort->dst.ops->update_pmtu(&ort->dst, sk, NULL, mtu);
214}
215
216static inline bool ensure_mtu_is_adequate(int skb_af, int rt_mode,
217 struct ip_vs_iphdr *ipvsh,
218 struct sk_buff *skb, int mtu)
219{
220#ifdef CONFIG_IP_VS_IPV6
221 if (skb_af == AF_INET6) {
222 struct net *net = dev_net(skb_dst(skb)->dev);
223
224 if (unlikely(__mtu_check_toobig_v6(skb, mtu))) {
225 if (!skb->dev)
226 skb->dev = net->loopback_dev;
227 /* only send ICMP too big on first fragment */
228 if (!ipvsh->fragoffs)
229 icmpv6_send(skb, ICMPV6_PKT_TOOBIG, 0, mtu);
230 IP_VS_DBG(1, "frag needed for %pI6c\n",
231 &ipv6_hdr(skb)->saddr);
232 return false;
233 }
234 } else
235#endif
236 {
237 struct netns_ipvs *ipvs = net_ipvs(skb_net(skb));
238
239 /* If we're going to tunnel the packet and pmtu discovery
240 * is disabled, we'll just fragment it anyway
241 */
242 if ((rt_mode & IP_VS_RT_MODE_TUNNEL) && !sysctl_pmtu_disc(ipvs))
243 return true;
244
245 if (unlikely(ip_hdr(skb)->frag_off & htons(IP_DF) &&
246 skb->len > mtu && !skb_is_gso(skb))) {
247 icmp_send(skb, ICMP_DEST_UNREACH, ICMP_FRAG_NEEDED,
248 htonl(mtu));
249 IP_VS_DBG(1, "frag needed for %pI4\n",
250 &ip_hdr(skb)->saddr);
251 return false;
252 }
253 }
254
255 return true;
256}
257
160/* Get route to destination or remote server */ 258/* Get route to destination or remote server */
161static int 259static int
162__ip_vs_get_out_rt(struct sk_buff *skb, struct ip_vs_dest *dest, 260__ip_vs_get_out_rt(int skb_af, struct sk_buff *skb, struct ip_vs_dest *dest,
163 __be32 daddr, int rt_mode, __be32 *ret_saddr) 261 __be32 daddr, int rt_mode, __be32 *ret_saddr,
262 struct ip_vs_iphdr *ipvsh)
164{ 263{
165 struct net *net = dev_net(skb_dst(skb)->dev); 264 struct net *net = dev_net(skb_dst(skb)->dev);
166 struct netns_ipvs *ipvs = net_ipvs(net);
167 struct ip_vs_dest_dst *dest_dst; 265 struct ip_vs_dest_dst *dest_dst;
168 struct rtable *rt; /* Route to the other host */ 266 struct rtable *rt; /* Route to the other host */
169 struct rtable *ort; /* Original route */
170 struct iphdr *iph;
171 __be16 df;
172 int mtu; 267 int mtu;
173 int local, noref = 1; 268 int local, noref = 1;
174 269
@@ -218,30 +313,14 @@ __ip_vs_get_out_rt(struct sk_buff *skb, struct ip_vs_dest *dest,
218 } 313 }
219 314
220 local = (rt->rt_flags & RTCF_LOCAL) ? 1 : 0; 315 local = (rt->rt_flags & RTCF_LOCAL) ? 1 : 0;
221 if (!((local ? IP_VS_RT_MODE_LOCAL : IP_VS_RT_MODE_NON_LOCAL) & 316 if (unlikely(crosses_local_route_boundary(skb_af, skb, rt_mode,
222 rt_mode)) { 317 local))) {
223 IP_VS_DBG_RL("Stopping traffic to %s address, dest: %pI4\n", 318 IP_VS_DBG_RL("We are crossing local and non-local addresses"
224 (rt->rt_flags & RTCF_LOCAL) ? 319 " daddr=%pI4\n", &dest->addr.ip);
225 "local":"non-local", &daddr);
226 goto err_put; 320 goto err_put;
227 } 321 }
228 iph = ip_hdr(skb); 322
229 if (likely(!local)) { 323 if (unlikely(local)) {
230 if (unlikely(ipv4_is_loopback(iph->saddr))) {
231 IP_VS_DBG_RL("Stopping traffic from loopback address "
232 "%pI4 to non-local address, dest: %pI4\n",
233 &iph->saddr, &daddr);
234 goto err_put;
235 }
236 } else {
237 ort = skb_rtable(skb);
238 if (!(rt_mode & IP_VS_RT_MODE_RDR) &&
239 !(ort->rt_flags & RTCF_LOCAL)) {
240 IP_VS_DBG_RL("Redirect from non-local address %pI4 to "
241 "local requires NAT method, dest: %pI4\n",
242 &iph->daddr, &daddr);
243 goto err_put;
244 }
245 /* skb to local stack, preserve old route */ 324 /* skb to local stack, preserve old route */
246 if (!noref) 325 if (!noref)
247 ip_rt_put(rt); 326 ip_rt_put(rt);
@@ -250,28 +329,17 @@ __ip_vs_get_out_rt(struct sk_buff *skb, struct ip_vs_dest *dest,
250 329
251 if (likely(!(rt_mode & IP_VS_RT_MODE_TUNNEL))) { 330 if (likely(!(rt_mode & IP_VS_RT_MODE_TUNNEL))) {
252 mtu = dst_mtu(&rt->dst); 331 mtu = dst_mtu(&rt->dst);
253 df = iph->frag_off & htons(IP_DF);
254 } else { 332 } else {
255 struct sock *sk = skb->sk;
256
257 mtu = dst_mtu(&rt->dst) - sizeof(struct iphdr); 333 mtu = dst_mtu(&rt->dst) - sizeof(struct iphdr);
258 if (mtu < 68) { 334 if (mtu < 68) {
259 IP_VS_DBG_RL("%s(): mtu less than 68\n", __func__); 335 IP_VS_DBG_RL("%s(): mtu less than 68\n", __func__);
260 goto err_put; 336 goto err_put;
261 } 337 }
262 ort = skb_rtable(skb); 338 maybe_update_pmtu(skb_af, skb, mtu);
263 if (!skb->dev && sk && sk->sk_state != TCP_TIME_WAIT)
264 ort->dst.ops->update_pmtu(&ort->dst, sk, NULL, mtu);
265 /* MTU check allowed? */
266 df = sysctl_pmtu_disc(ipvs) ? iph->frag_off & htons(IP_DF) : 0;
267 } 339 }
268 340
269 /* MTU checking */ 341 if (!ensure_mtu_is_adequate(skb_af, rt_mode, ipvsh, skb, mtu))
270 if (unlikely(df && skb->len > mtu && !skb_is_gso(skb))) {
271 icmp_send(skb, ICMP_DEST_UNREACH, ICMP_FRAG_NEEDED, htonl(mtu));
272 IP_VS_DBG(1, "frag needed for %pI4\n", &iph->saddr);
273 goto err_put; 342 goto err_put;
274 }
275 343
276 skb_dst_drop(skb); 344 skb_dst_drop(skb);
277 if (noref) { 345 if (noref) {
@@ -295,12 +363,6 @@ err_unreach:
295} 363}
296 364
297#ifdef CONFIG_IP_VS_IPV6 365#ifdef CONFIG_IP_VS_IPV6
298
299static inline int __ip_vs_is_local_route6(struct rt6_info *rt)
300{
301 return rt->dst.dev && rt->dst.dev->flags & IFF_LOOPBACK;
302}
303
304static struct dst_entry * 366static struct dst_entry *
305__ip_vs_route_output_v6(struct net *net, struct in6_addr *daddr, 367__ip_vs_route_output_v6(struct net *net, struct in6_addr *daddr,
306 struct in6_addr *ret_saddr, int do_xfrm) 368 struct in6_addr *ret_saddr, int do_xfrm)
@@ -339,14 +401,13 @@ out_err:
339 * Get route to destination or remote server 401 * Get route to destination or remote server
340 */ 402 */
341static int 403static int
342__ip_vs_get_out_rt_v6(struct sk_buff *skb, struct ip_vs_dest *dest, 404__ip_vs_get_out_rt_v6(int skb_af, struct sk_buff *skb, struct ip_vs_dest *dest,
343 struct in6_addr *daddr, struct in6_addr *ret_saddr, 405 struct in6_addr *daddr, struct in6_addr *ret_saddr,
344 struct ip_vs_iphdr *ipvsh, int do_xfrm, int rt_mode) 406 struct ip_vs_iphdr *ipvsh, int do_xfrm, int rt_mode)
345{ 407{
346 struct net *net = dev_net(skb_dst(skb)->dev); 408 struct net *net = dev_net(skb_dst(skb)->dev);
347 struct ip_vs_dest_dst *dest_dst; 409 struct ip_vs_dest_dst *dest_dst;
348 struct rt6_info *rt; /* Route to the other host */ 410 struct rt6_info *rt; /* Route to the other host */
349 struct rt6_info *ort; /* Original route */
350 struct dst_entry *dst; 411 struct dst_entry *dst;
351 int mtu; 412 int mtu;
352 int local, noref = 1; 413 int local, noref = 1;
@@ -393,32 +454,15 @@ __ip_vs_get_out_rt_v6(struct sk_buff *skb, struct ip_vs_dest *dest,
393 } 454 }
394 455
395 local = __ip_vs_is_local_route6(rt); 456 local = __ip_vs_is_local_route6(rt);
396 if (!((local ? IP_VS_RT_MODE_LOCAL : IP_VS_RT_MODE_NON_LOCAL) & 457
397 rt_mode)) { 458 if (unlikely(crosses_local_route_boundary(skb_af, skb, rt_mode,
398 IP_VS_DBG_RL("Stopping traffic to %s address, dest: %pI6c\n", 459 local))) {
399 local ? "local":"non-local", daddr); 460 IP_VS_DBG_RL("We are crossing local and non-local addresses"
461 " daddr=%pI6\n", &dest->addr.in6);
400 goto err_put; 462 goto err_put;
401 } 463 }
402 if (likely(!local)) { 464
403 if (unlikely((!skb->dev || skb->dev->flags & IFF_LOOPBACK) && 465 if (unlikely(local)) {
404 ipv6_addr_type(&ipv6_hdr(skb)->saddr) &
405 IPV6_ADDR_LOOPBACK)) {
406 IP_VS_DBG_RL("Stopping traffic from loopback address "
407 "%pI6c to non-local address, "
408 "dest: %pI6c\n",
409 &ipv6_hdr(skb)->saddr, daddr);
410 goto err_put;
411 }
412 } else {
413 ort = (struct rt6_info *) skb_dst(skb);
414 if (!(rt_mode & IP_VS_RT_MODE_RDR) &&
415 !__ip_vs_is_local_route6(ort)) {
416 IP_VS_DBG_RL("Redirect from non-local address %pI6c "
417 "to local requires NAT method, "
418 "dest: %pI6c\n",
419 &ipv6_hdr(skb)->daddr, daddr);
420 goto err_put;
421 }
422 /* skb to local stack, preserve old route */ 466 /* skb to local stack, preserve old route */
423 if (!noref) 467 if (!noref)
424 dst_release(&rt->dst); 468 dst_release(&rt->dst);
@@ -429,28 +473,17 @@ __ip_vs_get_out_rt_v6(struct sk_buff *skb, struct ip_vs_dest *dest,
429 if (likely(!(rt_mode & IP_VS_RT_MODE_TUNNEL))) 473 if (likely(!(rt_mode & IP_VS_RT_MODE_TUNNEL)))
430 mtu = dst_mtu(&rt->dst); 474 mtu = dst_mtu(&rt->dst);
431 else { 475 else {
432 struct sock *sk = skb->sk;
433
434 mtu = dst_mtu(&rt->dst) - sizeof(struct ipv6hdr); 476 mtu = dst_mtu(&rt->dst) - sizeof(struct ipv6hdr);
435 if (mtu < IPV6_MIN_MTU) { 477 if (mtu < IPV6_MIN_MTU) {
436 IP_VS_DBG_RL("%s(): mtu less than %d\n", __func__, 478 IP_VS_DBG_RL("%s(): mtu less than %d\n", __func__,
437 IPV6_MIN_MTU); 479 IPV6_MIN_MTU);
438 goto err_put; 480 goto err_put;
439 } 481 }
440 ort = (struct rt6_info *) skb_dst(skb); 482 maybe_update_pmtu(skb_af, skb, mtu);
441 if (!skb->dev && sk && sk->sk_state != TCP_TIME_WAIT)
442 ort->dst.ops->update_pmtu(&ort->dst, sk, NULL, mtu);
443 } 483 }
444 484
445 if (unlikely(__mtu_check_toobig_v6(skb, mtu))) { 485 if (!ensure_mtu_is_adequate(skb_af, rt_mode, ipvsh, skb, mtu))
446 if (!skb->dev)
447 skb->dev = net->loopback_dev;
448 /* only send ICMP too big on first fragment */
449 if (!ipvsh->fragoffs)
450 icmpv6_send(skb, ICMPV6_PKT_TOOBIG, 0, mtu);
451 IP_VS_DBG(1, "frag needed for %pI6c\n", &ipv6_hdr(skb)->saddr);
452 goto err_put; 486 goto err_put;
453 }
454 487
455 skb_dst_drop(skb); 488 skb_dst_drop(skb);
456 if (noref) { 489 if (noref) {
@@ -556,8 +589,8 @@ ip_vs_bypass_xmit(struct sk_buff *skb, struct ip_vs_conn *cp,
556 EnterFunction(10); 589 EnterFunction(10);
557 590
558 rcu_read_lock(); 591 rcu_read_lock();
559 if (__ip_vs_get_out_rt(skb, NULL, iph->daddr, IP_VS_RT_MODE_NON_LOCAL, 592 if (__ip_vs_get_out_rt(cp->af, skb, NULL, iph->daddr,
560 NULL) < 0) 593 IP_VS_RT_MODE_NON_LOCAL, NULL, ipvsh) < 0)
561 goto tx_error; 594 goto tx_error;
562 595
563 ip_send_check(iph); 596 ip_send_check(iph);
@@ -586,7 +619,7 @@ ip_vs_bypass_xmit_v6(struct sk_buff *skb, struct ip_vs_conn *cp,
586 EnterFunction(10); 619 EnterFunction(10);
587 620
588 rcu_read_lock(); 621 rcu_read_lock();
589 if (__ip_vs_get_out_rt_v6(skb, NULL, &ipvsh->daddr.in6, NULL, 622 if (__ip_vs_get_out_rt_v6(cp->af, skb, NULL, &ipvsh->daddr.in6, NULL,
590 ipvsh, 0, IP_VS_RT_MODE_NON_LOCAL) < 0) 623 ipvsh, 0, IP_VS_RT_MODE_NON_LOCAL) < 0)
591 goto tx_error; 624 goto tx_error;
592 625
@@ -633,10 +666,10 @@ ip_vs_nat_xmit(struct sk_buff *skb, struct ip_vs_conn *cp,
633 } 666 }
634 667
635 was_input = rt_is_input_route(skb_rtable(skb)); 668 was_input = rt_is_input_route(skb_rtable(skb));
636 local = __ip_vs_get_out_rt(skb, cp->dest, cp->daddr.ip, 669 local = __ip_vs_get_out_rt(cp->af, skb, cp->dest, cp->daddr.ip,
637 IP_VS_RT_MODE_LOCAL | 670 IP_VS_RT_MODE_LOCAL |
638 IP_VS_RT_MODE_NON_LOCAL | 671 IP_VS_RT_MODE_NON_LOCAL |
639 IP_VS_RT_MODE_RDR, NULL); 672 IP_VS_RT_MODE_RDR, NULL, ipvsh);
640 if (local < 0) 673 if (local < 0)
641 goto tx_error; 674 goto tx_error;
642 rt = skb_rtable(skb); 675 rt = skb_rtable(skb);
@@ -721,8 +754,8 @@ ip_vs_nat_xmit_v6(struct sk_buff *skb, struct ip_vs_conn *cp,
721 IP_VS_DBG(10, "filled cport=%d\n", ntohs(*p)); 754 IP_VS_DBG(10, "filled cport=%d\n", ntohs(*p));
722 } 755 }
723 756
724 local = __ip_vs_get_out_rt_v6(skb, cp->dest, &cp->daddr.in6, NULL, 757 local = __ip_vs_get_out_rt_v6(cp->af, skb, cp->dest, &cp->daddr.in6,
725 ipvsh, 0, 758 NULL, ipvsh, 0,
726 IP_VS_RT_MODE_LOCAL | 759 IP_VS_RT_MODE_LOCAL |
727 IP_VS_RT_MODE_NON_LOCAL | 760 IP_VS_RT_MODE_NON_LOCAL |
728 IP_VS_RT_MODE_RDR); 761 IP_VS_RT_MODE_RDR);
@@ -791,6 +824,81 @@ tx_error:
791} 824}
792#endif 825#endif
793 826
827/* When forwarding a packet, we must ensure that we've got enough headroom
828 * for the encapsulation packet in the skb. This also gives us an
829 * opportunity to figure out what the payload_len, dsfield, ttl, and df
830 * values should be, so that we won't need to look at the old ip header
831 * again
832 */
833static struct sk_buff *
834ip_vs_prepare_tunneled_skb(struct sk_buff *skb, int skb_af,
835 unsigned int max_headroom, __u8 *next_protocol,
836 __u32 *payload_len, __u8 *dsfield, __u8 *ttl,
837 __be16 *df)
838{
839 struct sk_buff *new_skb = NULL;
840 struct iphdr *old_iph = NULL;
841#ifdef CONFIG_IP_VS_IPV6
842 struct ipv6hdr *old_ipv6h = NULL;
843#endif
844
845 if (skb_headroom(skb) < max_headroom || skb_cloned(skb)) {
846 new_skb = skb_realloc_headroom(skb, max_headroom);
847 if (!new_skb)
848 goto error;
849 consume_skb(skb);
850 skb = new_skb;
851 }
852
853#ifdef CONFIG_IP_VS_IPV6
854 if (skb_af == AF_INET6) {
855 old_ipv6h = ipv6_hdr(skb);
856 *next_protocol = IPPROTO_IPV6;
857 if (payload_len)
858 *payload_len =
859 ntohs(old_ipv6h->payload_len) +
860 sizeof(*old_ipv6h);
861 *dsfield = ipv6_get_dsfield(old_ipv6h);
862 *ttl = old_ipv6h->hop_limit;
863 if (df)
864 *df = 0;
865 } else
866#endif
867 {
868 old_iph = ip_hdr(skb);
869 /* Copy DF, reset fragment offset and MF */
870 if (df)
871 *df = (old_iph->frag_off & htons(IP_DF));
872 *next_protocol = IPPROTO_IPIP;
873
874 /* fix old IP header checksum */
875 ip_send_check(old_iph);
876 *dsfield = ipv4_get_dsfield(old_iph);
877 *ttl = old_iph->ttl;
878 if (payload_len)
879 *payload_len = ntohs(old_iph->tot_len);
880 }
881
882 return skb;
883error:
884 kfree_skb(skb);
885 return ERR_PTR(-ENOMEM);
886}
887
888static inline int __tun_gso_type_mask(int encaps_af, int orig_af)
889{
890 if (encaps_af == AF_INET) {
891 if (orig_af == AF_INET)
892 return SKB_GSO_IPIP;
893
894 return SKB_GSO_SIT;
895 }
896
897 /* GSO: we need to provide proper SKB_GSO_ value for IPv6:
898 * SKB_GSO_SIT/IPV6
899 */
900 return 0;
901}
794 902
795/* 903/*
796 * IP Tunneling transmitter 904 * IP Tunneling transmitter
@@ -819,9 +927,11 @@ ip_vs_tunnel_xmit(struct sk_buff *skb, struct ip_vs_conn *cp,
819 struct rtable *rt; /* Route to the other host */ 927 struct rtable *rt; /* Route to the other host */
820 __be32 saddr; /* Source for tunnel */ 928 __be32 saddr; /* Source for tunnel */
821 struct net_device *tdev; /* Device to other host */ 929 struct net_device *tdev; /* Device to other host */
822 struct iphdr *old_iph = ip_hdr(skb); 930 __u8 next_protocol = 0;
823 u8 tos = old_iph->tos; 931 __u8 dsfield = 0;
824 __be16 df; 932 __u8 ttl = 0;
933 __be16 df = 0;
934 __be16 *dfp = NULL;
825 struct iphdr *iph; /* Our new IP header */ 935 struct iphdr *iph; /* Our new IP header */
826 unsigned int max_headroom; /* The extra header space needed */ 936 unsigned int max_headroom; /* The extra header space needed */
827 int ret, local; 937 int ret, local;
@@ -829,11 +939,11 @@ ip_vs_tunnel_xmit(struct sk_buff *skb, struct ip_vs_conn *cp,
829 EnterFunction(10); 939 EnterFunction(10);
830 940
831 rcu_read_lock(); 941 rcu_read_lock();
832 local = __ip_vs_get_out_rt(skb, cp->dest, cp->daddr.ip, 942 local = __ip_vs_get_out_rt(cp->af, skb, cp->dest, cp->daddr.ip,
833 IP_VS_RT_MODE_LOCAL | 943 IP_VS_RT_MODE_LOCAL |
834 IP_VS_RT_MODE_NON_LOCAL | 944 IP_VS_RT_MODE_NON_LOCAL |
835 IP_VS_RT_MODE_CONNECT | 945 IP_VS_RT_MODE_CONNECT |
836 IP_VS_RT_MODE_TUNNEL, &saddr); 946 IP_VS_RT_MODE_TUNNEL, &saddr, ipvsh);
837 if (local < 0) 947 if (local < 0)
838 goto tx_error; 948 goto tx_error;
839 if (local) { 949 if (local) {
@@ -844,29 +954,21 @@ ip_vs_tunnel_xmit(struct sk_buff *skb, struct ip_vs_conn *cp,
844 rt = skb_rtable(skb); 954 rt = skb_rtable(skb);
845 tdev = rt->dst.dev; 955 tdev = rt->dst.dev;
846 956
847 /* Copy DF, reset fragment offset and MF */
848 df = sysctl_pmtu_disc(ipvs) ? old_iph->frag_off & htons(IP_DF) : 0;
849
850 /* 957 /*
851 * Okay, now see if we can stuff it in the buffer as-is. 958 * Okay, now see if we can stuff it in the buffer as-is.
852 */ 959 */
853 max_headroom = LL_RESERVED_SPACE(tdev) + sizeof(struct iphdr); 960 max_headroom = LL_RESERVED_SPACE(tdev) + sizeof(struct iphdr);
854 961
855 if (skb_headroom(skb) < max_headroom || skb_cloned(skb)) { 962 /* We only care about the df field if sysctl_pmtu_disc(ipvs) is set */
856 struct sk_buff *new_skb = 963 dfp = sysctl_pmtu_disc(ipvs) ? &df : NULL;
857 skb_realloc_headroom(skb, max_headroom); 964 skb = ip_vs_prepare_tunneled_skb(skb, cp->af, max_headroom,
858 965 &next_protocol, NULL, &dsfield,
859 if (!new_skb) 966 &ttl, dfp);
860 goto tx_error; 967 if (IS_ERR(skb))
861 consume_skb(skb); 968 goto tx_error;
862 skb = new_skb;
863 old_iph = ip_hdr(skb);
864 }
865
866 /* fix old IP header checksum */
867 ip_send_check(old_iph);
868 969
869 skb = iptunnel_handle_offloads(skb, false, SKB_GSO_IPIP); 970 skb = iptunnel_handle_offloads(
971 skb, false, __tun_gso_type_mask(AF_INET, cp->af));
870 if (IS_ERR(skb)) 972 if (IS_ERR(skb))
871 goto tx_error; 973 goto tx_error;
872 974
@@ -883,11 +985,11 @@ ip_vs_tunnel_xmit(struct sk_buff *skb, struct ip_vs_conn *cp,
883 iph->version = 4; 985 iph->version = 4;
884 iph->ihl = sizeof(struct iphdr)>>2; 986 iph->ihl = sizeof(struct iphdr)>>2;
885 iph->frag_off = df; 987 iph->frag_off = df;
886 iph->protocol = IPPROTO_IPIP; 988 iph->protocol = next_protocol;
887 iph->tos = tos; 989 iph->tos = dsfield;
888 iph->daddr = cp->daddr.ip; 990 iph->daddr = cp->daddr.ip;
889 iph->saddr = saddr; 991 iph->saddr = saddr;
890 iph->ttl = old_iph->ttl; 992 iph->ttl = ttl;
891 ip_select_ident(skb, NULL); 993 ip_select_ident(skb, NULL);
892 994
893 /* Another hack: avoid icmp_send in ip_fragment */ 995 /* Another hack: avoid icmp_send in ip_fragment */
@@ -920,7 +1022,10 @@ ip_vs_tunnel_xmit_v6(struct sk_buff *skb, struct ip_vs_conn *cp,
920 struct rt6_info *rt; /* Route to the other host */ 1022 struct rt6_info *rt; /* Route to the other host */
921 struct in6_addr saddr; /* Source for tunnel */ 1023 struct in6_addr saddr; /* Source for tunnel */
922 struct net_device *tdev; /* Device to other host */ 1024 struct net_device *tdev; /* Device to other host */
923 struct ipv6hdr *old_iph = ipv6_hdr(skb); 1025 __u8 next_protocol = 0;
1026 __u32 payload_len = 0;
1027 __u8 dsfield = 0;
1028 __u8 ttl = 0;
924 struct ipv6hdr *iph; /* Our new IP header */ 1029 struct ipv6hdr *iph; /* Our new IP header */
925 unsigned int max_headroom; /* The extra header space needed */ 1030 unsigned int max_headroom; /* The extra header space needed */
926 int ret, local; 1031 int ret, local;
@@ -928,7 +1033,7 @@ ip_vs_tunnel_xmit_v6(struct sk_buff *skb, struct ip_vs_conn *cp,
928 EnterFunction(10); 1033 EnterFunction(10);
929 1034
930 rcu_read_lock(); 1035 rcu_read_lock();
931 local = __ip_vs_get_out_rt_v6(skb, cp->dest, &cp->daddr.in6, 1036 local = __ip_vs_get_out_rt_v6(cp->af, skb, cp->dest, &cp->daddr.in6,
932 &saddr, ipvsh, 1, 1037 &saddr, ipvsh, 1,
933 IP_VS_RT_MODE_LOCAL | 1038 IP_VS_RT_MODE_LOCAL |
934 IP_VS_RT_MODE_NON_LOCAL | 1039 IP_VS_RT_MODE_NON_LOCAL |
@@ -948,19 +1053,14 @@ ip_vs_tunnel_xmit_v6(struct sk_buff *skb, struct ip_vs_conn *cp,
948 */ 1053 */
949 max_headroom = LL_RESERVED_SPACE(tdev) + sizeof(struct ipv6hdr); 1054 max_headroom = LL_RESERVED_SPACE(tdev) + sizeof(struct ipv6hdr);
950 1055
951 if (skb_headroom(skb) < max_headroom || skb_cloned(skb)) { 1056 skb = ip_vs_prepare_tunneled_skb(skb, cp->af, max_headroom,
952 struct sk_buff *new_skb = 1057 &next_protocol, &payload_len,
953 skb_realloc_headroom(skb, max_headroom); 1058 &dsfield, &ttl, NULL);
954 1059 if (IS_ERR(skb))
955 if (!new_skb) 1060 goto tx_error;
956 goto tx_error;
957 consume_skb(skb);
958 skb = new_skb;
959 old_iph = ipv6_hdr(skb);
960 }
961 1061
962 /* GSO: we need to provide proper SKB_GSO_ value for IPv6 */ 1062 skb = iptunnel_handle_offloads(
963 skb = iptunnel_handle_offloads(skb, false, 0); /* SKB_GSO_SIT/IPV6 */ 1063 skb, false, __tun_gso_type_mask(AF_INET6, cp->af));
964 if (IS_ERR(skb)) 1064 if (IS_ERR(skb))
965 goto tx_error; 1065 goto tx_error;
966 1066
@@ -975,14 +1075,13 @@ ip_vs_tunnel_xmit_v6(struct sk_buff *skb, struct ip_vs_conn *cp,
975 */ 1075 */
976 iph = ipv6_hdr(skb); 1076 iph = ipv6_hdr(skb);
977 iph->version = 6; 1077 iph->version = 6;
978 iph->nexthdr = IPPROTO_IPV6; 1078 iph->nexthdr = next_protocol;
979 iph->payload_len = old_iph->payload_len; 1079 iph->payload_len = htons(payload_len);
980 be16_add_cpu(&iph->payload_len, sizeof(*old_iph));
981 memset(&iph->flow_lbl, 0, sizeof(iph->flow_lbl)); 1080 memset(&iph->flow_lbl, 0, sizeof(iph->flow_lbl));
982 ipv6_change_dsfield(iph, 0, ipv6_get_dsfield(old_iph)); 1081 ipv6_change_dsfield(iph, 0, dsfield);
983 iph->daddr = cp->daddr.in6; 1082 iph->daddr = cp->daddr.in6;
984 iph->saddr = saddr; 1083 iph->saddr = saddr;
985 iph->hop_limit = old_iph->hop_limit; 1084 iph->hop_limit = ttl;
986 1085
987 /* Another hack: avoid icmp_send in ip_fragment */ 1086 /* Another hack: avoid icmp_send in ip_fragment */
988 skb->ignore_df = 1; 1087 skb->ignore_df = 1;
@@ -1021,10 +1120,10 @@ ip_vs_dr_xmit(struct sk_buff *skb, struct ip_vs_conn *cp,
1021 EnterFunction(10); 1120 EnterFunction(10);
1022 1121
1023 rcu_read_lock(); 1122 rcu_read_lock();
1024 local = __ip_vs_get_out_rt(skb, cp->dest, cp->daddr.ip, 1123 local = __ip_vs_get_out_rt(cp->af, skb, cp->dest, cp->daddr.ip,
1025 IP_VS_RT_MODE_LOCAL | 1124 IP_VS_RT_MODE_LOCAL |
1026 IP_VS_RT_MODE_NON_LOCAL | 1125 IP_VS_RT_MODE_NON_LOCAL |
1027 IP_VS_RT_MODE_KNOWN_NH, NULL); 1126 IP_VS_RT_MODE_KNOWN_NH, NULL, ipvsh);
1028 if (local < 0) 1127 if (local < 0)
1029 goto tx_error; 1128 goto tx_error;
1030 if (local) { 1129 if (local) {
@@ -1060,8 +1159,8 @@ ip_vs_dr_xmit_v6(struct sk_buff *skb, struct ip_vs_conn *cp,
1060 EnterFunction(10); 1159 EnterFunction(10);
1061 1160
1062 rcu_read_lock(); 1161 rcu_read_lock();
1063 local = __ip_vs_get_out_rt_v6(skb, cp->dest, &cp->daddr.in6, NULL, 1162 local = __ip_vs_get_out_rt_v6(cp->af, skb, cp->dest, &cp->daddr.in6,
1064 ipvsh, 0, 1163 NULL, ipvsh, 0,
1065 IP_VS_RT_MODE_LOCAL | 1164 IP_VS_RT_MODE_LOCAL |
1066 IP_VS_RT_MODE_NON_LOCAL); 1165 IP_VS_RT_MODE_NON_LOCAL);
1067 if (local < 0) 1166 if (local < 0)
@@ -1128,7 +1227,8 @@ ip_vs_icmp_xmit(struct sk_buff *skb, struct ip_vs_conn *cp,
1128 IP_VS_RT_MODE_LOCAL | IP_VS_RT_MODE_NON_LOCAL | 1227 IP_VS_RT_MODE_LOCAL | IP_VS_RT_MODE_NON_LOCAL |
1129 IP_VS_RT_MODE_RDR : IP_VS_RT_MODE_NON_LOCAL; 1228 IP_VS_RT_MODE_RDR : IP_VS_RT_MODE_NON_LOCAL;
1130 rcu_read_lock(); 1229 rcu_read_lock();
1131 local = __ip_vs_get_out_rt(skb, cp->dest, cp->daddr.ip, rt_mode, NULL); 1230 local = __ip_vs_get_out_rt(cp->af, skb, cp->dest, cp->daddr.ip, rt_mode,
1231 NULL, iph);
1132 if (local < 0) 1232 if (local < 0)
1133 goto tx_error; 1233 goto tx_error;
1134 rt = skb_rtable(skb); 1234 rt = skb_rtable(skb);
@@ -1219,8 +1319,8 @@ ip_vs_icmp_xmit_v6(struct sk_buff *skb, struct ip_vs_conn *cp,
1219 IP_VS_RT_MODE_LOCAL | IP_VS_RT_MODE_NON_LOCAL | 1319 IP_VS_RT_MODE_LOCAL | IP_VS_RT_MODE_NON_LOCAL |
1220 IP_VS_RT_MODE_RDR : IP_VS_RT_MODE_NON_LOCAL; 1320 IP_VS_RT_MODE_RDR : IP_VS_RT_MODE_NON_LOCAL;
1221 rcu_read_lock(); 1321 rcu_read_lock();
1222 local = __ip_vs_get_out_rt_v6(skb, cp->dest, &cp->daddr.in6, NULL, 1322 local = __ip_vs_get_out_rt_v6(cp->af, skb, cp->dest, &cp->daddr.in6,
1223 ipvsh, 0, rt_mode); 1323 NULL, ipvsh, 0, rt_mode);
1224 if (local < 0) 1324 if (local < 0)
1225 goto tx_error; 1325 goto tx_error;
1226 rt = (struct rt6_info *) skb_dst(skb); 1326 rt = (struct rt6_info *) skb_dst(skb);
diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
index de88c4ab5146..5016a6929085 100644
--- a/net/netfilter/nf_conntrack_core.c
+++ b/net/netfilter/nf_conntrack_core.c
@@ -142,7 +142,7 @@ static u32 hash_conntrack_raw(const struct nf_conntrack_tuple *tuple, u16 zone)
142 142
143static u32 __hash_bucket(u32 hash, unsigned int size) 143static u32 __hash_bucket(u32 hash, unsigned int size)
144{ 144{
145 return ((u64)hash * size) >> 32; 145 return reciprocal_scale(hash, size);
146} 146}
147 147
148static u32 hash_bucket(u32 hash, const struct net *net) 148static u32 hash_bucket(u32 hash, const struct net *net)
@@ -358,7 +358,7 @@ bool nf_ct_delete(struct nf_conn *ct, u32 portid, int report)
358 358
359 tstamp = nf_conn_tstamp_find(ct); 359 tstamp = nf_conn_tstamp_find(ct);
360 if (tstamp && tstamp->stop == 0) 360 if (tstamp && tstamp->stop == 0)
361 tstamp->stop = ktime_to_ns(ktime_get_real()); 361 tstamp->stop = ktime_get_real_ns();
362 362
363 if (nf_ct_is_dying(ct)) 363 if (nf_ct_is_dying(ct))
364 goto delete; 364 goto delete;
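
[The hunk above, and several below in nf_conntrack_expect.c, nf_nat_core.c and friends, replace the open-coded `((u64)hash * size) >> 32` with reciprocal_scale(). The helper is just a named form of the same multiply-shift bucket mapping; the stand-alone user-space sketch below re-implements it for illustration and is not the kernel's header.]

#include <stdint.h>
#include <stdio.h>

/* Same computation the kernel helper performs: map a full-range 32-bit
 * hash value uniformly onto [0, ep_ro) without a division. */
static uint32_t reciprocal_scale(uint32_t val, uint32_t ep_ro)
{
	return (uint32_t)(((uint64_t)val * ep_ro) >> 32);
}

int main(void)
{
	uint32_t hash = 0xdeadbeef, buckets = 16384;

	/* Both expressions select the same bucket; the patch only names
	 * the idiom instead of open-coding it. */
	printf("%u %u\n",
	       (uint32_t)(((uint64_t)hash * buckets) >> 32),
	       reciprocal_scale(hash, buckets));
	return 0;
}

[The same substitution appears again below, including the `minip + reciprocal_scale(j, dist)` address-range case in nf_nat_core.c.]
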
diff --git a/net/netfilter/nf_conntrack_expect.c b/net/netfilter/nf_conntrack_expect.c
index f87e8f68ad45..91a1837acd0e 100644
--- a/net/netfilter/nf_conntrack_expect.c
+++ b/net/netfilter/nf_conntrack_expect.c
@@ -83,7 +83,8 @@ static unsigned int nf_ct_expect_dst_hash(const struct nf_conntrack_tuple *tuple
83 hash = jhash2(tuple->dst.u3.all, ARRAY_SIZE(tuple->dst.u3.all), 83 hash = jhash2(tuple->dst.u3.all, ARRAY_SIZE(tuple->dst.u3.all),
84 (((tuple->dst.protonum ^ tuple->src.l3num) << 16) | 84 (((tuple->dst.protonum ^ tuple->src.l3num) << 16) |
85 (__force __u16)tuple->dst.u.all) ^ nf_conntrack_hash_rnd); 85 (__force __u16)tuple->dst.u.all) ^ nf_conntrack_hash_rnd);
86 return ((u64)hash * nf_ct_expect_hsize) >> 32; 86
87 return reciprocal_scale(hash, nf_ct_expect_hsize);
87} 88}
88 89
89struct nf_conntrack_expect * 90struct nf_conntrack_expect *
diff --git a/net/netfilter/nf_conntrack_netlink.c b/net/netfilter/nf_conntrack_netlink.c
index 355a5c4ef763..1bd9ed9e62f6 100644
--- a/net/netfilter/nf_conntrack_netlink.c
+++ b/net/netfilter/nf_conntrack_netlink.c
@@ -1737,7 +1737,7 @@ ctnetlink_create_conntrack(struct net *net, u16 zone,
1737 } 1737 }
1738 tstamp = nf_conn_tstamp_find(ct); 1738 tstamp = nf_conn_tstamp_find(ct);
1739 if (tstamp) 1739 if (tstamp)
1740 tstamp->start = ktime_to_ns(ktime_get_real()); 1740 tstamp->start = ktime_get_real_ns();
1741 1741
1742 err = nf_conntrack_hash_check_insert(ct); 1742 err = nf_conntrack_hash_check_insert(ct);
1743 if (err < 0) 1743 if (err < 0)
diff --git a/net/netfilter/nf_conntrack_proto_generic.c b/net/netfilter/nf_conntrack_proto_generic.c
index d25f29377648..957c1db66652 100644
--- a/net/netfilter/nf_conntrack_proto_generic.c
+++ b/net/netfilter/nf_conntrack_proto_generic.c
@@ -14,6 +14,30 @@
14 14
15static unsigned int nf_ct_generic_timeout __read_mostly = 600*HZ; 15static unsigned int nf_ct_generic_timeout __read_mostly = 600*HZ;
16 16
17static bool nf_generic_should_process(u8 proto)
18{
19 switch (proto) {
20#ifdef CONFIG_NF_CT_PROTO_SCTP_MODULE
21 case IPPROTO_SCTP:
22 return false;
23#endif
24#ifdef CONFIG_NF_CT_PROTO_DCCP_MODULE
25 case IPPROTO_DCCP:
26 return false;
27#endif
28#ifdef CONFIG_NF_CT_PROTO_GRE_MODULE
29 case IPPROTO_GRE:
30 return false;
31#endif
32#ifdef CONFIG_NF_CT_PROTO_UDPLITE_MODULE
33 case IPPROTO_UDPLITE:
34 return false;
35#endif
36 default:
37 return true;
38 }
39}
40
17static inline struct nf_generic_net *generic_pernet(struct net *net) 41static inline struct nf_generic_net *generic_pernet(struct net *net)
18{ 42{
19 return &net->ct.nf_ct_proto.generic; 43 return &net->ct.nf_ct_proto.generic;
@@ -67,7 +91,7 @@ static int generic_packet(struct nf_conn *ct,
67static bool generic_new(struct nf_conn *ct, const struct sk_buff *skb, 91static bool generic_new(struct nf_conn *ct, const struct sk_buff *skb,
68 unsigned int dataoff, unsigned int *timeouts) 92 unsigned int dataoff, unsigned int *timeouts)
69{ 93{
70 return true; 94 return nf_generic_should_process(nf_ct_protonum(ct));
71} 95}
72 96
73#if IS_ENABLED(CONFIG_NF_CT_NETLINK_TIMEOUT) 97#if IS_ENABLED(CONFIG_NF_CT_NETLINK_TIMEOUT)
diff --git a/net/netfilter/nf_conntrack_standalone.c b/net/netfilter/nf_conntrack_standalone.c
index f641751dba9d..cf65a1e040dd 100644
--- a/net/netfilter/nf_conntrack_standalone.c
+++ b/net/netfilter/nf_conntrack_standalone.c
@@ -101,7 +101,7 @@ static void *ct_seq_start(struct seq_file *seq, loff_t *pos)
101{ 101{
102 struct ct_iter_state *st = seq->private; 102 struct ct_iter_state *st = seq->private;
103 103
104 st->time_now = ktime_to_ns(ktime_get_real()); 104 st->time_now = ktime_get_real_ns();
105 rcu_read_lock(); 105 rcu_read_lock();
106 return ct_get_idx(seq, *pos); 106 return ct_get_idx(seq, *pos);
107} 107}
diff --git a/net/netfilter/nf_log_common.c b/net/netfilter/nf_log_common.c
index eeb8ef4ff1a3..a2233e77cf39 100644
--- a/net/netfilter/nf_log_common.c
+++ b/net/netfilter/nf_log_common.c
@@ -158,7 +158,7 @@ nf_log_dump_packet_common(struct nf_log_buf *m, u_int8_t pf,
158 '0' + loginfo->u.log.level, prefix, 158 '0' + loginfo->u.log.level, prefix,
159 in ? in->name : "", 159 in ? in->name : "",
160 out ? out->name : ""); 160 out ? out->name : "");
161#ifdef CONFIG_BRIDGE_NETFILTER 161#if IS_ENABLED(CONFIG_BRIDGE_NETFILTER)
162 if (skb->nf_bridge) { 162 if (skb->nf_bridge) {
163 const struct net_device *physindev; 163 const struct net_device *physindev;
164 const struct net_device *physoutdev; 164 const struct net_device *physoutdev;
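
[This hunk, and the nf_queue.c hunks further down, switch from `#ifdef CONFIG_BRIDGE_NETFILTER` to `#if IS_ENABLED(CONFIG_BRIDGE_NETFILTER)`. IS_ENABLED() from the kernel's kconfig machinery is true for both built-in (=y) and modular (=m) configurations, which a plain #ifdef misses. The sketch below illustrates only that semantics; the defined()-based test is a simplified stand-in, not the kernel's actual macro.]

#include <stdio.h>

#define CONFIG_BRIDGE_NETFILTER_MODULE 1	/* pretend the option is =m */

int main(void)
{
#if defined(CONFIG_BRIDGE_NETFILTER) || defined(CONFIG_BRIDGE_NETFILTER_MODULE)
	/* This is the case IS_ENABLED(CONFIG_BRIDGE_NETFILTER) covers:
	 * the guarded code is compiled for =y and for =m. */
	printf("bridge-netfilter code compiled in (y or m)\n");
#endif
#ifdef CONFIG_BRIDGE_NETFILTER
	printf("plain #ifdef: built-in only\n");
#else
	printf("plain #ifdef misses the =m build\n");	/* this branch runs */
#endif
	return 0;
}
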
diff --git a/net/netfilter/nf_nat_core.c b/net/netfilter/nf_nat_core.c
index 552f97cd9fde..4e0b47831d43 100644
--- a/net/netfilter/nf_nat_core.c
+++ b/net/netfilter/nf_nat_core.c
@@ -126,7 +126,8 @@ hash_by_src(const struct net *net, u16 zone,
126 /* Original src, to ensure we map it consistently if poss. */ 126 /* Original src, to ensure we map it consistently if poss. */
127 hash = jhash2((u32 *)&tuple->src, sizeof(tuple->src) / sizeof(u32), 127 hash = jhash2((u32 *)&tuple->src, sizeof(tuple->src) / sizeof(u32),
128 tuple->dst.protonum ^ zone ^ nf_conntrack_hash_rnd); 128 tuple->dst.protonum ^ zone ^ nf_conntrack_hash_rnd);
129 return ((u64)hash * net->ct.nat_htable_size) >> 32; 129
130 return reciprocal_scale(hash, net->ct.nat_htable_size);
130} 131}
131 132
132/* Is this tuple already taken? (not by us) */ 133/* Is this tuple already taken? (not by us) */
@@ -274,7 +275,7 @@ find_best_ips_proto(u16 zone, struct nf_conntrack_tuple *tuple,
274 } 275 }
275 276
276 var_ipp->all[i] = (__force __u32) 277 var_ipp->all[i] = (__force __u32)
277 htonl(minip + (((u64)j * dist) >> 32)); 278 htonl(minip + reciprocal_scale(j, dist));
278 if (var_ipp->all[i] != range->max_addr.all[i]) 279 if (var_ipp->all[i] != range->max_addr.all[i])
279 full_range = true; 280 full_range = true;
280 281
diff --git a/net/netfilter/nf_queue.c b/net/netfilter/nf_queue.c
index 5d24b1fdb593..4c8b68e5fa16 100644
--- a/net/netfilter/nf_queue.c
+++ b/net/netfilter/nf_queue.c
@@ -52,7 +52,7 @@ void nf_queue_entry_release_refs(struct nf_queue_entry *entry)
52 dev_put(entry->indev); 52 dev_put(entry->indev);
53 if (entry->outdev) 53 if (entry->outdev)
54 dev_put(entry->outdev); 54 dev_put(entry->outdev);
55#ifdef CONFIG_BRIDGE_NETFILTER 55#if IS_ENABLED(CONFIG_BRIDGE_NETFILTER)
56 if (entry->skb->nf_bridge) { 56 if (entry->skb->nf_bridge) {
57 struct nf_bridge_info *nf_bridge = entry->skb->nf_bridge; 57 struct nf_bridge_info *nf_bridge = entry->skb->nf_bridge;
58 58
@@ -77,7 +77,7 @@ bool nf_queue_entry_get_refs(struct nf_queue_entry *entry)
77 dev_hold(entry->indev); 77 dev_hold(entry->indev);
78 if (entry->outdev) 78 if (entry->outdev)
79 dev_hold(entry->outdev); 79 dev_hold(entry->outdev);
80#ifdef CONFIG_BRIDGE_NETFILTER 80#if IS_ENABLED(CONFIG_BRIDGE_NETFILTER)
81 if (entry->skb->nf_bridge) { 81 if (entry->skb->nf_bridge) {
82 struct nf_bridge_info *nf_bridge = entry->skb->nf_bridge; 82 struct nf_bridge_info *nf_bridge = entry->skb->nf_bridge;
83 struct net_device *physdev; 83 struct net_device *physdev;
diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c
index deeb95fb7028..556a0dfa4abc 100644
--- a/net/netfilter/nf_tables_api.c
+++ b/net/netfilter/nf_tables_api.c
@@ -127,6 +127,204 @@ static void nft_trans_destroy(struct nft_trans *trans)
127 kfree(trans); 127 kfree(trans);
128} 128}
129 129
130static void nf_tables_unregister_hooks(const struct nft_table *table,
131 const struct nft_chain *chain,
132 unsigned int hook_nops)
133{
134 if (!(table->flags & NFT_TABLE_F_DORMANT) &&
135 chain->flags & NFT_BASE_CHAIN)
136 nf_unregister_hooks(nft_base_chain(chain)->ops, hook_nops);
137}
138
139/* Internal table flags */
140#define NFT_TABLE_INACTIVE (1 << 15)
141
142static int nft_trans_table_add(struct nft_ctx *ctx, int msg_type)
143{
144 struct nft_trans *trans;
145
146 trans = nft_trans_alloc(ctx, msg_type, sizeof(struct nft_trans_table));
147 if (trans == NULL)
148 return -ENOMEM;
149
150 if (msg_type == NFT_MSG_NEWTABLE)
151 ctx->table->flags |= NFT_TABLE_INACTIVE;
152
153 list_add_tail(&trans->list, &ctx->net->nft.commit_list);
154 return 0;
155}
156
157static int nft_deltable(struct nft_ctx *ctx)
158{
159 int err;
160
161 err = nft_trans_table_add(ctx, NFT_MSG_DELTABLE);
162 if (err < 0)
163 return err;
164
165 list_del_rcu(&ctx->table->list);
166 return err;
167}
168
169static int nft_trans_chain_add(struct nft_ctx *ctx, int msg_type)
170{
171 struct nft_trans *trans;
172
173 trans = nft_trans_alloc(ctx, msg_type, sizeof(struct nft_trans_chain));
174 if (trans == NULL)
175 return -ENOMEM;
176
177 if (msg_type == NFT_MSG_NEWCHAIN)
178 ctx->chain->flags |= NFT_CHAIN_INACTIVE;
179
180 list_add_tail(&trans->list, &ctx->net->nft.commit_list);
181 return 0;
182}
183
184static int nft_delchain(struct nft_ctx *ctx)
185{
186 int err;
187
188 err = nft_trans_chain_add(ctx, NFT_MSG_DELCHAIN);
189 if (err < 0)
190 return err;
191
192 ctx->table->use--;
193 list_del_rcu(&ctx->chain->list);
194
195 return err;
196}
197
198static inline bool
199nft_rule_is_active(struct net *net, const struct nft_rule *rule)
200{
201 return (rule->genmask & (1 << net->nft.gencursor)) == 0;
202}
203
204static inline int gencursor_next(struct net *net)
205{
206 return net->nft.gencursor+1 == 1 ? 1 : 0;
207}
208
209static inline int
210nft_rule_is_active_next(struct net *net, const struct nft_rule *rule)
211{
212 return (rule->genmask & (1 << gencursor_next(net))) == 0;
213}
214
215static inline void
216nft_rule_activate_next(struct net *net, struct nft_rule *rule)
217{
218 /* Now inactive, will be active in the future */
219 rule->genmask = (1 << net->nft.gencursor);
220}
221
222static inline void
223nft_rule_deactivate_next(struct net *net, struct nft_rule *rule)
224{
225 rule->genmask = (1 << gencursor_next(net));
226}
227
228static inline void nft_rule_clear(struct net *net, struct nft_rule *rule)
229{
230 rule->genmask = 0;
231}
232
233static int
234nf_tables_delrule_deactivate(struct nft_ctx *ctx, struct nft_rule *rule)
235{
236 /* You cannot delete the same rule twice */
237 if (nft_rule_is_active_next(ctx->net, rule)) {
238 nft_rule_deactivate_next(ctx->net, rule);
239 ctx->chain->use--;
240 return 0;
241 }
242 return -ENOENT;
243}
244
245static struct nft_trans *nft_trans_rule_add(struct nft_ctx *ctx, int msg_type,
246 struct nft_rule *rule)
247{
248 struct nft_trans *trans;
249
250 trans = nft_trans_alloc(ctx, msg_type, sizeof(struct nft_trans_rule));
251 if (trans == NULL)
252 return NULL;
253
254 nft_trans_rule(trans) = rule;
255 list_add_tail(&trans->list, &ctx->net->nft.commit_list);
256
257 return trans;
258}
259
260static int nft_delrule(struct nft_ctx *ctx, struct nft_rule *rule)
261{
262 struct nft_trans *trans;
263 int err;
264
265 trans = nft_trans_rule_add(ctx, NFT_MSG_DELRULE, rule);
266 if (trans == NULL)
267 return -ENOMEM;
268
269 err = nf_tables_delrule_deactivate(ctx, rule);
270 if (err < 0) {
271 nft_trans_destroy(trans);
272 return err;
273 }
274
275 return 0;
276}
277
278static int nft_delrule_by_chain(struct nft_ctx *ctx)
279{
280 struct nft_rule *rule;
281 int err;
282
283 list_for_each_entry(rule, &ctx->chain->rules, list) {
284 err = nft_delrule(ctx, rule);
285 if (err < 0)
286 return err;
287 }
288 return 0;
289}
290
291/* Internal set flag */
292#define NFT_SET_INACTIVE (1 << 15)
293
294static int nft_trans_set_add(struct nft_ctx *ctx, int msg_type,
295 struct nft_set *set)
296{
297 struct nft_trans *trans;
298
299 trans = nft_trans_alloc(ctx, msg_type, sizeof(struct nft_trans_set));
300 if (trans == NULL)
301 return -ENOMEM;
302
303 if (msg_type == NFT_MSG_NEWSET && ctx->nla[NFTA_SET_ID] != NULL) {
304 nft_trans_set_id(trans) =
305 ntohl(nla_get_be32(ctx->nla[NFTA_SET_ID]));
306 set->flags |= NFT_SET_INACTIVE;
307 }
308 nft_trans_set(trans) = set;
309 list_add_tail(&trans->list, &ctx->net->nft.commit_list);
310
311 return 0;
312}
313
314static int nft_delset(struct nft_ctx *ctx, struct nft_set *set)
315{
316 int err;
317
318 err = nft_trans_set_add(ctx, NFT_MSG_DELSET, set);
319 if (err < 0)
320 return err;
321
322 list_del_rcu(&set->list);
323 ctx->table->use--;
324
325 return err;
326}
327
130/* 328/*
131 * Tables 329 * Tables
132 */ 330 */
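
[The rule helpers consolidated in the hunk above (nft_rule_is_active, gencursor_next, nft_rule_activate_next/deactivate_next) implement a two-generation scheme: a set genmask bit for generation g marks the rule as inactive in g, and the commit path flips net->nft.gencursor so queued additions and deletions become visible together. The user-space sketch below is a simplified illustration of that bookkeeping; clearing the mask at commit mirrors nft_rule_clear().]

#include <stdio.h>

static unsigned int gencursor;			/* 0 or 1 */

static unsigned int gencursor_next(void)
{
	return gencursor + 1 == 1 ? 1 : 0;
}

struct rule { unsigned int genmask; };

/* A clear bit for the current generation means "active now". */
static int rule_is_active(const struct rule *r)
{
	return (r->genmask & (1u << gencursor)) == 0;
}

int main(void)
{
	struct rule added   = { .genmask = 1u << gencursor };		/* activate_next */
	struct rule deleted = { .genmask = 1u << gencursor_next() };	/* deactivate_next */

	printf("before commit: added=%d deleted=%d\n",
	       rule_is_active(&added), rule_is_active(&deleted));	/* 0 1 */

	gencursor = gencursor_next();	/* commit flips the cursor ... */
	added.genmask = 0;		/* ... and clears the new rule's mask */

	printf("after commit:  added=%d deleted=%d\n",
	       rule_is_active(&added), rule_is_active(&deleted));	/* 1 0 */
	return 0;
}
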
@@ -207,9 +405,9 @@ static const struct nla_policy nft_table_policy[NFTA_TABLE_MAX + 1] = {
207 [NFTA_TABLE_FLAGS] = { .type = NLA_U32 }, 405 [NFTA_TABLE_FLAGS] = { .type = NLA_U32 },
208}; 406};
209 407
210static int nf_tables_fill_table_info(struct sk_buff *skb, u32 portid, u32 seq, 408static int nf_tables_fill_table_info(struct sk_buff *skb, struct net *net,
211 int event, u32 flags, int family, 409 u32 portid, u32 seq, int event, u32 flags,
212 const struct nft_table *table) 410 int family, const struct nft_table *table)
213{ 411{
214 struct nlmsghdr *nlh; 412 struct nlmsghdr *nlh;
215 struct nfgenmsg *nfmsg; 413 struct nfgenmsg *nfmsg;
@@ -222,7 +420,7 @@ static int nf_tables_fill_table_info(struct sk_buff *skb, u32 portid, u32 seq,
222 nfmsg = nlmsg_data(nlh); 420 nfmsg = nlmsg_data(nlh);
223 nfmsg->nfgen_family = family; 421 nfmsg->nfgen_family = family;
224 nfmsg->version = NFNETLINK_V0; 422 nfmsg->version = NFNETLINK_V0;
225 nfmsg->res_id = 0; 423 nfmsg->res_id = htons(net->nft.base_seq & 0xffff);
226 424
227 if (nla_put_string(skb, NFTA_TABLE_NAME, table->name) || 425 if (nla_put_string(skb, NFTA_TABLE_NAME, table->name) ||
228 nla_put_be32(skb, NFTA_TABLE_FLAGS, htonl(table->flags)) || 426 nla_put_be32(skb, NFTA_TABLE_FLAGS, htonl(table->flags)) ||
@@ -250,8 +448,8 @@ static int nf_tables_table_notify(const struct nft_ctx *ctx, int event)
250 if (skb == NULL) 448 if (skb == NULL)
251 goto err; 449 goto err;
252 450
253 err = nf_tables_fill_table_info(skb, ctx->portid, ctx->seq, event, 0, 451 err = nf_tables_fill_table_info(skb, ctx->net, ctx->portid, ctx->seq,
254 ctx->afi->family, ctx->table); 452 event, 0, ctx->afi->family, ctx->table);
255 if (err < 0) { 453 if (err < 0) {
256 kfree_skb(skb); 454 kfree_skb(skb);
257 goto err; 455 goto err;
@@ -290,7 +488,7 @@ static int nf_tables_dump_tables(struct sk_buff *skb,
290 if (idx > s_idx) 488 if (idx > s_idx)
291 memset(&cb->args[1], 0, 489 memset(&cb->args[1], 0,
292 sizeof(cb->args) - sizeof(cb->args[0])); 490 sizeof(cb->args) - sizeof(cb->args[0]));
293 if (nf_tables_fill_table_info(skb, 491 if (nf_tables_fill_table_info(skb, net,
294 NETLINK_CB(cb->skb).portid, 492 NETLINK_CB(cb->skb).portid,
295 cb->nlh->nlmsg_seq, 493 cb->nlh->nlmsg_seq,
296 NFT_MSG_NEWTABLE, 494 NFT_MSG_NEWTABLE,
@@ -309,9 +507,6 @@ done:
309 return skb->len; 507 return skb->len;
310} 508}
311 509
312/* Internal table flags */
313#define NFT_TABLE_INACTIVE (1 << 15)
314
315static int nf_tables_gettable(struct sock *nlsk, struct sk_buff *skb, 510static int nf_tables_gettable(struct sock *nlsk, struct sk_buff *skb,
316 const struct nlmsghdr *nlh, 511 const struct nlmsghdr *nlh,
317 const struct nlattr * const nla[]) 512 const struct nlattr * const nla[])
@@ -345,7 +540,7 @@ static int nf_tables_gettable(struct sock *nlsk, struct sk_buff *skb,
345 if (!skb2) 540 if (!skb2)
346 return -ENOMEM; 541 return -ENOMEM;
347 542
348 err = nf_tables_fill_table_info(skb2, NETLINK_CB(skb).portid, 543 err = nf_tables_fill_table_info(skb2, net, NETLINK_CB(skb).portid,
349 nlh->nlmsg_seq, NFT_MSG_NEWTABLE, 0, 544 nlh->nlmsg_seq, NFT_MSG_NEWTABLE, 0,
350 family, table); 545 family, table);
351 if (err < 0) 546 if (err < 0)
@@ -443,21 +638,6 @@ err:
443 return ret; 638 return ret;
444} 639}
445 640
446static int nft_trans_table_add(struct nft_ctx *ctx, int msg_type)
447{
448 struct nft_trans *trans;
449
450 trans = nft_trans_alloc(ctx, msg_type, sizeof(struct nft_trans_table));
451 if (trans == NULL)
452 return -ENOMEM;
453
454 if (msg_type == NFT_MSG_NEWTABLE)
455 ctx->table->flags |= NFT_TABLE_INACTIVE;
456
457 list_add_tail(&trans->list, &ctx->net->nft.commit_list);
458 return 0;
459}
460
461static int nf_tables_newtable(struct sock *nlsk, struct sk_buff *skb, 641static int nf_tables_newtable(struct sock *nlsk, struct sk_buff *skb,
462 const struct nlmsghdr *nlh, 642 const struct nlmsghdr *nlh,
463 const struct nlattr * const nla[]) 643 const struct nlattr * const nla[])
@@ -527,6 +707,67 @@ static int nf_tables_newtable(struct sock *nlsk, struct sk_buff *skb,
527 return 0; 707 return 0;
528} 708}
529 709
710static int nft_flush_table(struct nft_ctx *ctx)
711{
712 int err;
713 struct nft_chain *chain, *nc;
714 struct nft_set *set, *ns;
715
716 list_for_each_entry_safe(chain, nc, &ctx->table->chains, list) {
717 ctx->chain = chain;
718
719 err = nft_delrule_by_chain(ctx);
720 if (err < 0)
721 goto out;
722
723 err = nft_delchain(ctx);
724 if (err < 0)
725 goto out;
726 }
727
728 list_for_each_entry_safe(set, ns, &ctx->table->sets, list) {
729 if (set->flags & NFT_SET_ANONYMOUS &&
730 !list_empty(&set->bindings))
731 continue;
732
733 err = nft_delset(ctx, set);
734 if (err < 0)
735 goto out;
736 }
737
738 err = nft_deltable(ctx);
739out:
740 return err;
741}
742
743static int nft_flush(struct nft_ctx *ctx, int family)
744{
745 struct nft_af_info *afi;
746 struct nft_table *table, *nt;
747 const struct nlattr * const *nla = ctx->nla;
748 int err = 0;
749
750 list_for_each_entry(afi, &ctx->net->nft.af_info, list) {
751 if (family != AF_UNSPEC && afi->family != family)
752 continue;
753
754 ctx->afi = afi;
755 list_for_each_entry_safe(table, nt, &afi->tables, list) {
756 if (nla[NFTA_TABLE_NAME] &&
757 nla_strcmp(nla[NFTA_TABLE_NAME], table->name) != 0)
758 continue;
759
760 ctx->table = table;
761
762 err = nft_flush_table(ctx);
763 if (err < 0)
764 goto out;
765 }
766 }
767out:
768 return err;
769}
770
530static int nf_tables_deltable(struct sock *nlsk, struct sk_buff *skb, 771static int nf_tables_deltable(struct sock *nlsk, struct sk_buff *skb,
531 const struct nlmsghdr *nlh, 772 const struct nlmsghdr *nlh,
532 const struct nlattr * const nla[]) 773 const struct nlattr * const nla[])
@@ -535,9 +776,13 @@ static int nf_tables_deltable(struct sock *nlsk, struct sk_buff *skb,
535 struct nft_af_info *afi; 776 struct nft_af_info *afi;
536 struct nft_table *table; 777 struct nft_table *table;
537 struct net *net = sock_net(skb->sk); 778 struct net *net = sock_net(skb->sk);
538 int family = nfmsg->nfgen_family, err; 779 int family = nfmsg->nfgen_family;
539 struct nft_ctx ctx; 780 struct nft_ctx ctx;
540 781
782 nft_ctx_init(&ctx, skb, nlh, NULL, NULL, NULL, nla);
783 if (family == AF_UNSPEC || nla[NFTA_TABLE_NAME] == NULL)
784 return nft_flush(&ctx, family);
785
541 afi = nf_tables_afinfo_lookup(net, family, false); 786 afi = nf_tables_afinfo_lookup(net, family, false);
542 if (IS_ERR(afi)) 787 if (IS_ERR(afi))
543 return PTR_ERR(afi); 788 return PTR_ERR(afi);
@@ -547,16 +792,11 @@ static int nf_tables_deltable(struct sock *nlsk, struct sk_buff *skb,
547 return PTR_ERR(table); 792 return PTR_ERR(table);
548 if (table->flags & NFT_TABLE_INACTIVE) 793 if (table->flags & NFT_TABLE_INACTIVE)
549 return -ENOENT; 794 return -ENOENT;
550 if (table->use > 0)
551 return -EBUSY;
552 795
553 nft_ctx_init(&ctx, skb, nlh, afi, table, NULL, nla); 796 ctx.afi = afi;
554 err = nft_trans_table_add(&ctx, NFT_MSG_DELTABLE); 797 ctx.table = table;
555 if (err < 0)
556 return err;
557 798
558 list_del_rcu(&table->list); 799 return nft_flush_table(&ctx);
559 return 0;
560} 800}
561 801
562static void nf_tables_table_destroy(struct nft_ctx *ctx) 802static void nf_tables_table_destroy(struct nft_ctx *ctx)
@@ -674,9 +914,9 @@ nla_put_failure:
674 return -ENOSPC; 914 return -ENOSPC;
675} 915}
676 916
677static int nf_tables_fill_chain_info(struct sk_buff *skb, u32 portid, u32 seq, 917static int nf_tables_fill_chain_info(struct sk_buff *skb, struct net *net,
678 int event, u32 flags, int family, 918 u32 portid, u32 seq, int event, u32 flags,
679 const struct nft_table *table, 919 int family, const struct nft_table *table,
680 const struct nft_chain *chain) 920 const struct nft_chain *chain)
681{ 921{
682 struct nlmsghdr *nlh; 922 struct nlmsghdr *nlh;
@@ -690,7 +930,7 @@ static int nf_tables_fill_chain_info(struct sk_buff *skb, u32 portid, u32 seq,
690 nfmsg = nlmsg_data(nlh); 930 nfmsg = nlmsg_data(nlh);
691 nfmsg->nfgen_family = family; 931 nfmsg->nfgen_family = family;
692 nfmsg->version = NFNETLINK_V0; 932 nfmsg->version = NFNETLINK_V0;
693 nfmsg->res_id = 0; 933 nfmsg->res_id = htons(net->nft.base_seq & 0xffff);
694 934
695 if (nla_put_string(skb, NFTA_CHAIN_TABLE, table->name)) 935 if (nla_put_string(skb, NFTA_CHAIN_TABLE, table->name))
696 goto nla_put_failure; 936 goto nla_put_failure;
@@ -748,8 +988,8 @@ static int nf_tables_chain_notify(const struct nft_ctx *ctx, int event)
748 if (skb == NULL) 988 if (skb == NULL)
749 goto err; 989 goto err;
750 990
751 err = nf_tables_fill_chain_info(skb, ctx->portid, ctx->seq, event, 0, 991 err = nf_tables_fill_chain_info(skb, ctx->net, ctx->portid, ctx->seq,
752 ctx->afi->family, ctx->table, 992 event, 0, ctx->afi->family, ctx->table,
753 ctx->chain); 993 ctx->chain);
754 if (err < 0) { 994 if (err < 0) {
755 kfree_skb(skb); 995 kfree_skb(skb);
@@ -791,7 +1031,8 @@ static int nf_tables_dump_chains(struct sk_buff *skb,
791 if (idx > s_idx) 1031 if (idx > s_idx)
792 memset(&cb->args[1], 0, 1032 memset(&cb->args[1], 0,
793 sizeof(cb->args) - sizeof(cb->args[0])); 1033 sizeof(cb->args) - sizeof(cb->args[0]));
794 if (nf_tables_fill_chain_info(skb, NETLINK_CB(cb->skb).portid, 1034 if (nf_tables_fill_chain_info(skb, net,
1035 NETLINK_CB(cb->skb).portid,
795 cb->nlh->nlmsg_seq, 1036 cb->nlh->nlmsg_seq,
796 NFT_MSG_NEWCHAIN, 1037 NFT_MSG_NEWCHAIN,
797 NLM_F_MULTI, 1038 NLM_F_MULTI,
@@ -850,7 +1091,7 @@ static int nf_tables_getchain(struct sock *nlsk, struct sk_buff *skb,
850 if (!skb2) 1091 if (!skb2)
851 return -ENOMEM; 1092 return -ENOMEM;
852 1093
853 err = nf_tables_fill_chain_info(skb2, NETLINK_CB(skb).portid, 1094 err = nf_tables_fill_chain_info(skb2, net, NETLINK_CB(skb).portid,
854 nlh->nlmsg_seq, NFT_MSG_NEWCHAIN, 0, 1095 nlh->nlmsg_seq, NFT_MSG_NEWCHAIN, 0,
855 family, table, chain); 1096 family, table, chain);
856 if (err < 0) 1097 if (err < 0)
@@ -913,21 +1154,6 @@ static void nft_chain_stats_replace(struct nft_base_chain *chain,
913 rcu_assign_pointer(chain->stats, newstats); 1154 rcu_assign_pointer(chain->stats, newstats);
914} 1155}
915 1156
916static int nft_trans_chain_add(struct nft_ctx *ctx, int msg_type)
917{
918 struct nft_trans *trans;
919
920 trans = nft_trans_alloc(ctx, msg_type, sizeof(struct nft_trans_chain));
921 if (trans == NULL)
922 return -ENOMEM;
923
924 if (msg_type == NFT_MSG_NEWCHAIN)
925 ctx->chain->flags |= NFT_CHAIN_INACTIVE;
926
927 list_add_tail(&trans->list, &ctx->net->nft.commit_list);
928 return 0;
929}
930
931static void nf_tables_chain_destroy(struct nft_chain *chain) 1157static void nf_tables_chain_destroy(struct nft_chain *chain)
932{ 1158{
933 BUG_ON(chain->use > 0); 1159 BUG_ON(chain->use > 0);
@@ -1157,11 +1383,7 @@ static int nf_tables_newchain(struct sock *nlsk, struct sk_buff *skb,
1157 list_add_tail_rcu(&chain->list, &table->chains); 1383 list_add_tail_rcu(&chain->list, &table->chains);
1158 return 0; 1384 return 0;
1159err2: 1385err2:
1160 if (!(table->flags & NFT_TABLE_F_DORMANT) && 1386 nf_tables_unregister_hooks(table, chain, afi->nops);
1161 chain->flags & NFT_BASE_CHAIN) {
1162 nf_unregister_hooks(nft_base_chain(chain)->ops,
1163 afi->nops);
1164 }
1165err1: 1387err1:
1166 nf_tables_chain_destroy(chain); 1388 nf_tables_chain_destroy(chain);
1167 return err; 1389 return err;
@@ -1178,7 +1400,6 @@ static int nf_tables_delchain(struct sock *nlsk, struct sk_buff *skb,
1178 struct net *net = sock_net(skb->sk); 1400 struct net *net = sock_net(skb->sk);
1179 int family = nfmsg->nfgen_family; 1401 int family = nfmsg->nfgen_family;
1180 struct nft_ctx ctx; 1402 struct nft_ctx ctx;
1181 int err;
1182 1403
1183 afi = nf_tables_afinfo_lookup(net, family, false); 1404 afi = nf_tables_afinfo_lookup(net, family, false);
1184 if (IS_ERR(afi)) 1405 if (IS_ERR(afi))
@@ -1199,13 +1420,8 @@ static int nf_tables_delchain(struct sock *nlsk, struct sk_buff *skb,
1199 return -EBUSY; 1420 return -EBUSY;
1200 1421
1201 nft_ctx_init(&ctx, skb, nlh, afi, table, chain, nla); 1422 nft_ctx_init(&ctx, skb, nlh, afi, table, chain, nla);
1202 err = nft_trans_chain_add(&ctx, NFT_MSG_DELCHAIN);
1203 if (err < 0)
1204 return err;
1205 1423
1206 table->use--; 1424 return nft_delchain(&ctx);
1207 list_del_rcu(&chain->list);
1208 return 0;
1209} 1425}
1210 1426
1211/* 1427/*
@@ -1432,8 +1648,9 @@ static const struct nla_policy nft_rule_policy[NFTA_RULE_MAX + 1] = {
1432 .len = NFT_USERDATA_MAXLEN }, 1648 .len = NFT_USERDATA_MAXLEN },
1433}; 1649};
1434 1650
1435static int nf_tables_fill_rule_info(struct sk_buff *skb, u32 portid, u32 seq, 1651static int nf_tables_fill_rule_info(struct sk_buff *skb, struct net *net,
1436 int event, u32 flags, int family, 1652 u32 portid, u32 seq, int event,
1653 u32 flags, int family,
1437 const struct nft_table *table, 1654 const struct nft_table *table,
1438 const struct nft_chain *chain, 1655 const struct nft_chain *chain,
1439 const struct nft_rule *rule) 1656 const struct nft_rule *rule)
@@ -1453,7 +1670,7 @@ static int nf_tables_fill_rule_info(struct sk_buff *skb, u32 portid, u32 seq,
1453 nfmsg = nlmsg_data(nlh); 1670 nfmsg = nlmsg_data(nlh);
1454 nfmsg->nfgen_family = family; 1671 nfmsg->nfgen_family = family;
1455 nfmsg->version = NFNETLINK_V0; 1672 nfmsg->version = NFNETLINK_V0;
1456 nfmsg->res_id = 0; 1673 nfmsg->res_id = htons(net->nft.base_seq & 0xffff);
1457 1674
1458 if (nla_put_string(skb, NFTA_RULE_TABLE, table->name)) 1675 if (nla_put_string(skb, NFTA_RULE_TABLE, table->name))
1459 goto nla_put_failure; 1676 goto nla_put_failure;
@@ -1509,8 +1726,8 @@ static int nf_tables_rule_notify(const struct nft_ctx *ctx,
1509 if (skb == NULL) 1726 if (skb == NULL)
1510 goto err; 1727 goto err;
1511 1728
1512 err = nf_tables_fill_rule_info(skb, ctx->portid, ctx->seq, event, 0, 1729 err = nf_tables_fill_rule_info(skb, ctx->net, ctx->portid, ctx->seq,
1513 ctx->afi->family, ctx->table, 1730 event, 0, ctx->afi->family, ctx->table,
1514 ctx->chain, rule); 1731 ctx->chain, rule);
1515 if (err < 0) { 1732 if (err < 0) {
1516 kfree_skb(skb); 1733 kfree_skb(skb);
@@ -1527,41 +1744,6 @@ err:
1527 return err; 1744 return err;
1528} 1745}
1529 1746
1530static inline bool
1531nft_rule_is_active(struct net *net, const struct nft_rule *rule)
1532{
1533 return (rule->genmask & (1 << net->nft.gencursor)) == 0;
1534}
1535
1536static inline int gencursor_next(struct net *net)
1537{
1538 return net->nft.gencursor+1 == 1 ? 1 : 0;
1539}
1540
1541static inline int
1542nft_rule_is_active_next(struct net *net, const struct nft_rule *rule)
1543{
1544 return (rule->genmask & (1 << gencursor_next(net))) == 0;
1545}
1546
1547static inline void
1548nft_rule_activate_next(struct net *net, struct nft_rule *rule)
1549{
1550 /* Now inactive, will be active in the future */
1551 rule->genmask = (1 << net->nft.gencursor);
1552}
1553
1554static inline void
1555nft_rule_disactivate_next(struct net *net, struct nft_rule *rule)
1556{
1557 rule->genmask = (1 << gencursor_next(net));
1558}
1559
1560static inline void nft_rule_clear(struct net *net, struct nft_rule *rule)
1561{
1562 rule->genmask = 0;
1563}
1564
1565static int nf_tables_dump_rules(struct sk_buff *skb, 1747static int nf_tables_dump_rules(struct sk_buff *skb,
1566 struct netlink_callback *cb) 1748 struct netlink_callback *cb)
1567{ 1749{
@@ -1591,7 +1773,7 @@ static int nf_tables_dump_rules(struct sk_buff *skb,
1591 if (idx > s_idx) 1773 if (idx > s_idx)
1592 memset(&cb->args[1], 0, 1774 memset(&cb->args[1], 0,
1593 sizeof(cb->args) - sizeof(cb->args[0])); 1775 sizeof(cb->args) - sizeof(cb->args[0]));
1594 if (nf_tables_fill_rule_info(skb, NETLINK_CB(cb->skb).portid, 1776 if (nf_tables_fill_rule_info(skb, net, NETLINK_CB(cb->skb).portid,
1595 cb->nlh->nlmsg_seq, 1777 cb->nlh->nlmsg_seq,
1596 NFT_MSG_NEWRULE, 1778 NFT_MSG_NEWRULE,
1597 NLM_F_MULTI | NLM_F_APPEND, 1779 NLM_F_MULTI | NLM_F_APPEND,
@@ -1657,7 +1839,7 @@ static int nf_tables_getrule(struct sock *nlsk, struct sk_buff *skb,
1657 if (!skb2) 1839 if (!skb2)
1658 return -ENOMEM; 1840 return -ENOMEM;
1659 1841
1660 err = nf_tables_fill_rule_info(skb2, NETLINK_CB(skb).portid, 1842 err = nf_tables_fill_rule_info(skb2, net, NETLINK_CB(skb).portid,
1661 nlh->nlmsg_seq, NFT_MSG_NEWRULE, 0, 1843 nlh->nlmsg_seq, NFT_MSG_NEWRULE, 0,
1662 family, table, chain, rule); 1844 family, table, chain, rule);
1663 if (err < 0) 1845 if (err < 0)
@@ -1687,21 +1869,6 @@ static void nf_tables_rule_destroy(const struct nft_ctx *ctx,
1687 kfree(rule); 1869 kfree(rule);
1688} 1870}
1689 1871
1690static struct nft_trans *nft_trans_rule_add(struct nft_ctx *ctx, int msg_type,
1691 struct nft_rule *rule)
1692{
1693 struct nft_trans *trans;
1694
1695 trans = nft_trans_alloc(ctx, msg_type, sizeof(struct nft_trans_rule));
1696 if (trans == NULL)
1697 return NULL;
1698
1699 nft_trans_rule(trans) = rule;
1700 list_add_tail(&trans->list, &ctx->net->nft.commit_list);
1701
1702 return trans;
1703}
1704
1705#define NFT_RULE_MAXEXPRS 128 1872#define NFT_RULE_MAXEXPRS 128
1706 1873
1707static struct nft_expr_info *info; 1874static struct nft_expr_info *info;
@@ -1823,7 +1990,7 @@ static int nf_tables_newrule(struct sock *nlsk, struct sk_buff *skb,
1823 err = -ENOMEM; 1990 err = -ENOMEM;
1824 goto err2; 1991 goto err2;
1825 } 1992 }
1826 nft_rule_disactivate_next(net, old_rule); 1993 nft_rule_deactivate_next(net, old_rule);
1827 chain->use--; 1994 chain->use--;
1828 list_add_tail_rcu(&rule->list, &old_rule->list); 1995 list_add_tail_rcu(&rule->list, &old_rule->list);
1829 } else { 1996 } else {
@@ -1867,33 +2034,6 @@ err1:
1867 return err; 2034 return err;
1868} 2035}
1869 2036
1870static int
1871nf_tables_delrule_one(struct nft_ctx *ctx, struct nft_rule *rule)
1872{
1873 /* You cannot delete the same rule twice */
1874 if (nft_rule_is_active_next(ctx->net, rule)) {
1875 if (nft_trans_rule_add(ctx, NFT_MSG_DELRULE, rule) == NULL)
1876 return -ENOMEM;
1877 nft_rule_disactivate_next(ctx->net, rule);
1878 ctx->chain->use--;
1879 return 0;
1880 }
1881 return -ENOENT;
1882}
1883
1884static int nf_table_delrule_by_chain(struct nft_ctx *ctx)
1885{
1886 struct nft_rule *rule;
1887 int err;
1888
1889 list_for_each_entry(rule, &ctx->chain->rules, list) {
1890 err = nf_tables_delrule_one(ctx, rule);
1891 if (err < 0)
1892 return err;
1893 }
1894 return 0;
1895}
1896
1897static int nf_tables_delrule(struct sock *nlsk, struct sk_buff *skb, 2037static int nf_tables_delrule(struct sock *nlsk, struct sk_buff *skb,
1898 const struct nlmsghdr *nlh, 2038 const struct nlmsghdr *nlh,
1899 const struct nlattr * const nla[]) 2039 const struct nlattr * const nla[])
@@ -1932,14 +2072,14 @@ static int nf_tables_delrule(struct sock *nlsk, struct sk_buff *skb,
1932 if (IS_ERR(rule)) 2072 if (IS_ERR(rule))
1933 return PTR_ERR(rule); 2073 return PTR_ERR(rule);
1934 2074
1935 err = nf_tables_delrule_one(&ctx, rule); 2075 err = nft_delrule(&ctx, rule);
1936 } else { 2076 } else {
1937 err = nf_table_delrule_by_chain(&ctx); 2077 err = nft_delrule_by_chain(&ctx);
1938 } 2078 }
1939 } else { 2079 } else {
1940 list_for_each_entry(chain, &table->chains, list) { 2080 list_for_each_entry(chain, &table->chains, list) {
1941 ctx.chain = chain; 2081 ctx.chain = chain;
1942 err = nf_table_delrule_by_chain(&ctx); 2082 err = nft_delrule_by_chain(&ctx);
1943 if (err < 0) 2083 if (err < 0)
1944 break; 2084 break;
1945 } 2085 }
@@ -2183,7 +2323,7 @@ static int nf_tables_fill_set(struct sk_buff *skb, const struct nft_ctx *ctx,
2183 nfmsg = nlmsg_data(nlh); 2323 nfmsg = nlmsg_data(nlh);
2184 nfmsg->nfgen_family = ctx->afi->family; 2324 nfmsg->nfgen_family = ctx->afi->family;
2185 nfmsg->version = NFNETLINK_V0; 2325 nfmsg->version = NFNETLINK_V0;
2186 nfmsg->res_id = 0; 2326 nfmsg->res_id = htons(ctx->net->nft.base_seq & 0xffff);
2187 2327
2188 if (nla_put_string(skb, NFTA_SET_TABLE, ctx->table->name)) 2328 if (nla_put_string(skb, NFTA_SET_TABLE, ctx->table->name))
2189 goto nla_put_failure; 2329 goto nla_put_failure;
@@ -2204,6 +2344,11 @@ static int nf_tables_fill_set(struct sk_buff *skb, const struct nft_ctx *ctx,
2204 goto nla_put_failure; 2344 goto nla_put_failure;
2205 } 2345 }
2206 2346
2347 if (set->policy != NFT_SET_POL_PERFORMANCE) {
2348 if (nla_put_be32(skb, NFTA_SET_POLICY, htonl(set->policy)))
2349 goto nla_put_failure;
2350 }
2351
2207 desc = nla_nest_start(skb, NFTA_SET_DESC); 2352 desc = nla_nest_start(skb, NFTA_SET_DESC);
2208 if (desc == NULL) 2353 if (desc == NULL)
2209 goto nla_put_failure; 2354 goto nla_put_failure;
@@ -2322,8 +2467,6 @@ static int nf_tables_dump_sets_done(struct netlink_callback *cb)
2322 return 0; 2467 return 0;
2323} 2468}
2324 2469
2325#define NFT_SET_INACTIVE (1 << 15) /* Internal set flag */
2326
2327static int nf_tables_getset(struct sock *nlsk, struct sk_buff *skb, 2470static int nf_tables_getset(struct sock *nlsk, struct sk_buff *skb,
2328 const struct nlmsghdr *nlh, 2471 const struct nlmsghdr *nlh,
2329 const struct nlattr * const nla[]) 2472 const struct nlattr * const nla[])
@@ -2398,26 +2541,6 @@ static int nf_tables_set_desc_parse(const struct nft_ctx *ctx,
2398 return 0; 2541 return 0;
2399} 2542}
2400 2543
2401static int nft_trans_set_add(struct nft_ctx *ctx, int msg_type,
2402 struct nft_set *set)
2403{
2404 struct nft_trans *trans;
2405
2406 trans = nft_trans_alloc(ctx, msg_type, sizeof(struct nft_trans_set));
2407 if (trans == NULL)
2408 return -ENOMEM;
2409
2410 if (msg_type == NFT_MSG_NEWSET && ctx->nla[NFTA_SET_ID] != NULL) {
2411 nft_trans_set_id(trans) =
2412 ntohl(nla_get_be32(ctx->nla[NFTA_SET_ID]));
2413 set->flags |= NFT_SET_INACTIVE;
2414 }
2415 nft_trans_set(trans) = set;
2416 list_add_tail(&trans->list, &ctx->net->nft.commit_list);
2417
2418 return 0;
2419}
2420
2421static int nf_tables_newset(struct sock *nlsk, struct sk_buff *skb, 2544static int nf_tables_newset(struct sock *nlsk, struct sk_buff *skb,
2422 const struct nlmsghdr *nlh, 2545 const struct nlmsghdr *nlh,
2423 const struct nlattr * const nla[]) 2546 const struct nlattr * const nla[])
@@ -2551,6 +2674,7 @@ static int nf_tables_newset(struct sock *nlsk, struct sk_buff *skb,
2551 set->dlen = desc.dlen; 2674 set->dlen = desc.dlen;
2552 set->flags = flags; 2675 set->flags = flags;
2553 set->size = desc.size; 2676 set->size = desc.size;
2677 set->policy = policy;
2554 2678
2555 err = ops->init(set, &desc, nla); 2679 err = ops->init(set, &desc, nla);
2556 if (err < 0) 2680 if (err < 0)
@@ -2611,13 +2735,7 @@ static int nf_tables_delset(struct sock *nlsk, struct sk_buff *skb,
2611 if (!list_empty(&set->bindings)) 2735 if (!list_empty(&set->bindings))
2612 return -EBUSY; 2736 return -EBUSY;
2613 2737
2614 err = nft_trans_set_add(&ctx, NFT_MSG_DELSET, set); 2738 return nft_delset(&ctx, set);
2615 if (err < 0)
2616 return err;
2617
2618 list_del_rcu(&set->list);
2619 ctx.table->use--;
2620 return 0;
2621} 2739}
2622 2740
2623static int nf_tables_bind_check_setelem(const struct nft_ctx *ctx, 2741static int nf_tables_bind_check_setelem(const struct nft_ctx *ctx,
@@ -2815,7 +2933,7 @@ static int nf_tables_dump_set(struct sk_buff *skb, struct netlink_callback *cb)
2815 nfmsg = nlmsg_data(nlh); 2933 nfmsg = nlmsg_data(nlh);
2816 nfmsg->nfgen_family = ctx.afi->family; 2934 nfmsg->nfgen_family = ctx.afi->family;
2817 nfmsg->version = NFNETLINK_V0; 2935 nfmsg->version = NFNETLINK_V0;
2818 nfmsg->res_id = 0; 2936 nfmsg->res_id = htons(ctx.net->nft.base_seq & 0xffff);
2819 2937
2820 if (nla_put_string(skb, NFTA_SET_ELEM_LIST_TABLE, ctx.table->name)) 2938 if (nla_put_string(skb, NFTA_SET_ELEM_LIST_TABLE, ctx.table->name))
2821 goto nla_put_failure; 2939 goto nla_put_failure;
@@ -2896,7 +3014,7 @@ static int nf_tables_fill_setelem_info(struct sk_buff *skb,
2896 nfmsg = nlmsg_data(nlh); 3014 nfmsg = nlmsg_data(nlh);
2897 nfmsg->nfgen_family = ctx->afi->family; 3015 nfmsg->nfgen_family = ctx->afi->family;
2898 nfmsg->version = NFNETLINK_V0; 3016 nfmsg->version = NFNETLINK_V0;
2899 nfmsg->res_id = 0; 3017 nfmsg->res_id = htons(ctx->net->nft.base_seq & 0xffff);
2900 3018
2901 if (nla_put_string(skb, NFTA_SET_TABLE, ctx->table->name)) 3019 if (nla_put_string(skb, NFTA_SET_TABLE, ctx->table->name))
2902 goto nla_put_failure; 3020 goto nla_put_failure;
@@ -3183,6 +3301,87 @@ static int nf_tables_delsetelem(struct sock *nlsk, struct sk_buff *skb,
3183 return err; 3301 return err;
3184} 3302}
3185 3303
3304static int nf_tables_fill_gen_info(struct sk_buff *skb, struct net *net,
3305 u32 portid, u32 seq)
3306{
3307 struct nlmsghdr *nlh;
3308 struct nfgenmsg *nfmsg;
3309 int event = (NFNL_SUBSYS_NFTABLES << 8) | NFT_MSG_NEWGEN;
3310
3311 nlh = nlmsg_put(skb, portid, seq, event, sizeof(struct nfgenmsg), 0);
3312 if (nlh == NULL)
3313 goto nla_put_failure;
3314
3315 nfmsg = nlmsg_data(nlh);
3316 nfmsg->nfgen_family = AF_UNSPEC;
3317 nfmsg->version = NFNETLINK_V0;
3318 nfmsg->res_id = htons(net->nft.base_seq & 0xffff);
3319
3320 if (nla_put_be32(skb, NFTA_GEN_ID, htonl(net->nft.base_seq)))
3321 goto nla_put_failure;
3322
3323 return nlmsg_end(skb, nlh);
3324
3325nla_put_failure:
3326 nlmsg_trim(skb, nlh);
3327 return -EMSGSIZE;
3328}
3329
3330static int nf_tables_gen_notify(struct net *net, struct sk_buff *skb, int event)
3331{
3332 struct nlmsghdr *nlh = nlmsg_hdr(skb);
3333 struct sk_buff *skb2;
3334 int err;
3335
3336 if (nlmsg_report(nlh) &&
3337 !nfnetlink_has_listeners(net, NFNLGRP_NFTABLES))
3338 return 0;
3339
3340 err = -ENOBUFS;
3341 skb2 = nlmsg_new(NLMSG_GOODSIZE, GFP_KERNEL);
3342 if (skb2 == NULL)
3343 goto err;
3344
3345 err = nf_tables_fill_gen_info(skb2, net, NETLINK_CB(skb).portid,
3346 nlh->nlmsg_seq);
3347 if (err < 0) {
3348 kfree_skb(skb2);
3349 goto err;
3350 }
3351
3352 err = nfnetlink_send(skb2, net, NETLINK_CB(skb).portid,
3353 NFNLGRP_NFTABLES, nlmsg_report(nlh), GFP_KERNEL);
3354err:
3355 if (err < 0) {
3356 nfnetlink_set_err(net, NETLINK_CB(skb).portid, NFNLGRP_NFTABLES,
3357 err);
3358 }
3359 return err;
3360}
3361
3362static int nf_tables_getgen(struct sock *nlsk, struct sk_buff *skb,
3363 const struct nlmsghdr *nlh,
3364 const struct nlattr * const nla[])
3365{
3366 struct net *net = sock_net(skb->sk);
3367 struct sk_buff *skb2;
3368 int err;
3369
3370 skb2 = alloc_skb(NLMSG_GOODSIZE, GFP_KERNEL);
3371 if (skb2 == NULL)
3372 return -ENOMEM;
3373
3374 err = nf_tables_fill_gen_info(skb2, net, NETLINK_CB(skb).portid,
3375 nlh->nlmsg_seq);
3376 if (err < 0)
3377 goto err;
3378
3379 return nlmsg_unicast(nlsk, skb2, NETLINK_CB(skb).portid);
3380err:
3381 kfree_skb(skb2);
3382 return err;
3383}
3384
3186static const struct nfnl_callback nf_tables_cb[NFT_MSG_MAX] = { 3385static const struct nfnl_callback nf_tables_cb[NFT_MSG_MAX] = {
3187 [NFT_MSG_NEWTABLE] = { 3386 [NFT_MSG_NEWTABLE] = {
3188 .call_batch = nf_tables_newtable, 3387 .call_batch = nf_tables_newtable,
@@ -3259,6 +3458,9 @@ static const struct nfnl_callback nf_tables_cb[NFT_MSG_MAX] = {
3259 .attr_count = NFTA_SET_ELEM_LIST_MAX, 3458 .attr_count = NFTA_SET_ELEM_LIST_MAX,
3260 .policy = nft_set_elem_list_policy, 3459 .policy = nft_set_elem_list_policy,
3261 }, 3460 },
3461 [NFT_MSG_GETGEN] = {
3462 .call = nf_tables_getgen,
3463 },
3262}; 3464};
3263 3465
3264static void nft_chain_commit_update(struct nft_trans *trans) 3466static void nft_chain_commit_update(struct nft_trans *trans)
@@ -3352,11 +3554,9 @@ static int nf_tables_commit(struct sk_buff *skb)
3352 break; 3554 break;
3353 case NFT_MSG_DELCHAIN: 3555 case NFT_MSG_DELCHAIN:
3354 nf_tables_chain_notify(&trans->ctx, NFT_MSG_DELCHAIN); 3556 nf_tables_chain_notify(&trans->ctx, NFT_MSG_DELCHAIN);
3355 if (!(trans->ctx.table->flags & NFT_TABLE_F_DORMANT) && 3557 nf_tables_unregister_hooks(trans->ctx.table,
3356 trans->ctx.chain->flags & NFT_BASE_CHAIN) { 3558 trans->ctx.chain,
3357 nf_unregister_hooks(nft_base_chain(trans->ctx.chain)->ops, 3559 trans->ctx.afi->nops);
3358 trans->ctx.afi->nops);
3359 }
3360 break; 3560 break;
3361 case NFT_MSG_NEWRULE: 3561 case NFT_MSG_NEWRULE:
3362 nft_rule_clear(trans->ctx.net, nft_trans_rule(trans)); 3562 nft_rule_clear(trans->ctx.net, nft_trans_rule(trans));
@@ -3418,6 +3618,8 @@ static int nf_tables_commit(struct sk_buff *skb)
3418 call_rcu(&trans->rcu_head, nf_tables_commit_release_rcu); 3618 call_rcu(&trans->rcu_head, nf_tables_commit_release_rcu);
3419 } 3619 }
3420 3620
3621 nf_tables_gen_notify(net, skb, NFT_MSG_NEWGEN);
3622
3421 return 0; 3623 return 0;
3422} 3624}
3423 3625
@@ -3479,11 +3681,9 @@ static int nf_tables_abort(struct sk_buff *skb)
3479 } else { 3681 } else {
3480 trans->ctx.table->use--; 3682 trans->ctx.table->use--;
3481 list_del_rcu(&trans->ctx.chain->list); 3683 list_del_rcu(&trans->ctx.chain->list);
3482 if (!(trans->ctx.table->flags & NFT_TABLE_F_DORMANT) && 3684 nf_tables_unregister_hooks(trans->ctx.table,
3483 trans->ctx.chain->flags & NFT_BASE_CHAIN) { 3685 trans->ctx.chain,
3484 nf_unregister_hooks(nft_base_chain(trans->ctx.chain)->ops, 3686 trans->ctx.afi->nops);
3485 trans->ctx.afi->nops);
3486 }
3487 } 3687 }
3488 break; 3688 break;
3489 case NFT_MSG_DELCHAIN: 3689 case NFT_MSG_DELCHAIN:
@@ -3963,6 +4163,7 @@ static void __exit nf_tables_module_exit(void)
3963{ 4163{
3964 unregister_pernet_subsys(&nf_tables_net_ops); 4164 unregister_pernet_subsys(&nf_tables_net_ops);
3965 nfnetlink_subsys_unregister(&nf_tables_subsys); 4165 nfnetlink_subsys_unregister(&nf_tables_subsys);
4166 rcu_barrier();
3966 nf_tables_core_module_exit(); 4167 nf_tables_core_module_exit();
3967 kfree(info); 4168 kfree(info);
3968} 4169}
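
Note on the generation-ID plumbing added above: nf_tables now keeps a per-netns generation counter (net->nft.base_seq), exposes its low 16 bits in the res_id field of dump replies, answers the new NFT_MSG_GETGEN request with the full value in NFTA_GEN_ID, and notifies NFNLGRP_NFTABLES listeners with NFT_MSG_NEWGEN after a successful commit. A dumper can use this to detect that a transaction raced with its dump and retry. A rough userspace-side sketch, where fetch_gen_id() (one NFT_MSG_GETGEN round trip) and dump_ruleset() are hypothetical helpers and not libmnl/libnftnl API:

        uint32_t before, after;

        do {
                before = fetch_gen_id(nl);   /* NFTA_GEN_ID from NFT_MSG_GETGEN */
                dump_ruleset(nl);            /* ordinary GETTABLE/GETCHAIN/GETRULE dumps */
                after  = fetch_gen_id(nl);
        } while (before != after);           /* a commit ran in between; dump again */
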
diff --git a/net/netfilter/nfnetlink.c b/net/netfilter/nfnetlink.c
index f37f0716a9fc..6c5a915cfa75 100644
--- a/net/netfilter/nfnetlink.c
+++ b/net/netfilter/nfnetlink.c
@@ -381,7 +381,7 @@ replay:
381 */ 381 */
382 if (err == -EAGAIN) { 382 if (err == -EAGAIN) {
383 nfnl_err_reset(&err_list); 383 nfnl_err_reset(&err_list);
384 ss->abort(skb); 384 ss->abort(oskb);
385 nfnl_unlock(subsys_id); 385 nfnl_unlock(subsys_id);
386 kfree_skb(nskb); 386 kfree_skb(nskb);
387 goto replay; 387 goto replay;
@@ -418,9 +418,9 @@ ack:
418 } 418 }
419done: 419done:
420 if (success && done) 420 if (success && done)
421 ss->commit(skb); 421 ss->commit(oskb);
422 else 422 else
423 ss->abort(skb); 423 ss->abort(oskb);
424 424
425 nfnl_err_deliver(&err_list, oskb); 425 nfnl_err_deliver(&err_list, oskb);
426 nfnl_unlock(subsys_id); 426 nfnl_unlock(subsys_id);
diff --git a/net/netfilter/nfnetlink_acct.c b/net/netfilter/nfnetlink_acct.c
index 3ea0eacbd970..c18af2f63eef 100644
--- a/net/netfilter/nfnetlink_acct.c
+++ b/net/netfilter/nfnetlink_acct.c
@@ -40,6 +40,11 @@ struct nf_acct {
40 char data[0]; 40 char data[0];
41}; 41};
42 42
43struct nfacct_filter {
44 u32 value;
45 u32 mask;
46};
47
43#define NFACCT_F_QUOTA (NFACCT_F_QUOTA_PKTS | NFACCT_F_QUOTA_BYTES) 48#define NFACCT_F_QUOTA (NFACCT_F_QUOTA_PKTS | NFACCT_F_QUOTA_BYTES)
44#define NFACCT_OVERQUOTA_BIT 2 /* NFACCT_F_OVERQUOTA */ 49#define NFACCT_OVERQUOTA_BIT 2 /* NFACCT_F_OVERQUOTA */
45 50
@@ -181,6 +186,7 @@ static int
181nfnl_acct_dump(struct sk_buff *skb, struct netlink_callback *cb) 186nfnl_acct_dump(struct sk_buff *skb, struct netlink_callback *cb)
182{ 187{
183 struct nf_acct *cur, *last; 188 struct nf_acct *cur, *last;
189 const struct nfacct_filter *filter = cb->data;
184 190
185 if (cb->args[2]) 191 if (cb->args[2])
186 return 0; 192 return 0;
@@ -197,6 +203,10 @@ nfnl_acct_dump(struct sk_buff *skb, struct netlink_callback *cb)
197 203
198 last = NULL; 204 last = NULL;
199 } 205 }
206
207 if (filter && (cur->flags & filter->mask) != filter->value)
208 continue;
209
200 if (nfnl_acct_fill_info(skb, NETLINK_CB(cb->skb).portid, 210 if (nfnl_acct_fill_info(skb, NETLINK_CB(cb->skb).portid,
201 cb->nlh->nlmsg_seq, 211 cb->nlh->nlmsg_seq,
202 NFNL_MSG_TYPE(cb->nlh->nlmsg_type), 212 NFNL_MSG_TYPE(cb->nlh->nlmsg_type),
@@ -211,6 +221,38 @@ nfnl_acct_dump(struct sk_buff *skb, struct netlink_callback *cb)
211 return skb->len; 221 return skb->len;
212} 222}
213 223
224static int nfnl_acct_done(struct netlink_callback *cb)
225{
226 kfree(cb->data);
227 return 0;
228}
229
230static const struct nla_policy filter_policy[NFACCT_FILTER_MAX + 1] = {
231 [NFACCT_FILTER_MASK] = { .type = NLA_U32 },
232 [NFACCT_FILTER_VALUE] = { .type = NLA_U32 },
233};
234
235static struct nfacct_filter *
236nfacct_filter_alloc(const struct nlattr * const attr)
237{
238 struct nfacct_filter *filter;
239 struct nlattr *tb[NFACCT_FILTER_MAX + 1];
240 int err;
241
242 err = nla_parse_nested(tb, NFACCT_FILTER_MAX, attr, filter_policy);
243 if (err < 0)
244 return ERR_PTR(err);
245
246 filter = kzalloc(sizeof(struct nfacct_filter), GFP_KERNEL);
247 if (!filter)
248 return ERR_PTR(-ENOMEM);
249
250 filter->mask = ntohl(nla_get_be32(tb[NFACCT_FILTER_MASK]));
251 filter->value = ntohl(nla_get_be32(tb[NFACCT_FILTER_VALUE]));
252
253 return filter;
254}
255
214static int 256static int
215nfnl_acct_get(struct sock *nfnl, struct sk_buff *skb, 257nfnl_acct_get(struct sock *nfnl, struct sk_buff *skb,
216 const struct nlmsghdr *nlh, const struct nlattr * const tb[]) 258 const struct nlmsghdr *nlh, const struct nlattr * const tb[])
@@ -222,7 +264,18 @@ nfnl_acct_get(struct sock *nfnl, struct sk_buff *skb,
222 if (nlh->nlmsg_flags & NLM_F_DUMP) { 264 if (nlh->nlmsg_flags & NLM_F_DUMP) {
223 struct netlink_dump_control c = { 265 struct netlink_dump_control c = {
224 .dump = nfnl_acct_dump, 266 .dump = nfnl_acct_dump,
267 .done = nfnl_acct_done,
225 }; 268 };
269
270 if (tb[NFACCT_FILTER]) {
271 struct nfacct_filter *filter;
272
273 filter = nfacct_filter_alloc(tb[NFACCT_FILTER]);
274 if (IS_ERR(filter))
275 return PTR_ERR(filter);
276
277 c.data = filter;
278 }
226 return netlink_dump_start(nfnl, skb, nlh, &c); 279 return netlink_dump_start(nfnl, skb, nlh, &c);
227 } 280 }
228 281
@@ -314,6 +367,7 @@ static const struct nla_policy nfnl_acct_policy[NFACCT_MAX+1] = {
314 [NFACCT_PKTS] = { .type = NLA_U64 }, 367 [NFACCT_PKTS] = { .type = NLA_U64 },
315 [NFACCT_FLAGS] = { .type = NLA_U32 }, 368 [NFACCT_FLAGS] = { .type = NLA_U32 },
316 [NFACCT_QUOTA] = { .type = NLA_U64 }, 369 [NFACCT_QUOTA] = { .type = NLA_U64 },
370 [NFACCT_FILTER] = {.type = NLA_NESTED },
317}; 371};
318 372
319static const struct nfnl_callback nfnl_acct_cb[NFNL_MSG_ACCT_MAX] = { 373static const struct nfnl_callback nfnl_acct_cb[NFNL_MSG_ACCT_MAX] = {
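
For reference, the new NFACCT_FILTER attribute above makes the accounting dump skip any object whose flags fail a mask/value test; for example, passing mask = value = NFACCT_F_QUOTA_BYTES would list only byte-quota counters. A minimal sketch of the predicate the dump loop applies (the helper name here is illustrative, not a kernel symbol):

        /* an entry is emitted only when its masked flags equal the requested value */
        static bool nfacct_filter_match(u32 flags, const struct nfacct_filter *f)
        {
                return (flags & f->mask) == f->value;
        }
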
diff --git a/net/netfilter/nfnetlink_log.c b/net/netfilter/nfnetlink_log.c
index a11c5ff2f720..b1e3a0579416 100644
--- a/net/netfilter/nfnetlink_log.c
+++ b/net/netfilter/nfnetlink_log.c
@@ -36,7 +36,7 @@
36 36
37#include <linux/atomic.h> 37#include <linux/atomic.h>
38 38
39#ifdef CONFIG_BRIDGE_NETFILTER 39#if IS_ENABLED(CONFIG_BRIDGE_NETFILTER)
40#include "../bridge/br_private.h" 40#include "../bridge/br_private.h"
41#endif 41#endif
42 42
@@ -429,7 +429,7 @@ __build_packet_message(struct nfnl_log_net *log,
429 goto nla_put_failure; 429 goto nla_put_failure;
430 430
431 if (indev) { 431 if (indev) {
432#ifndef CONFIG_BRIDGE_NETFILTER 432#if !IS_ENABLED(CONFIG_BRIDGE_NETFILTER)
433 if (nla_put_be32(inst->skb, NFULA_IFINDEX_INDEV, 433 if (nla_put_be32(inst->skb, NFULA_IFINDEX_INDEV,
434 htonl(indev->ifindex))) 434 htonl(indev->ifindex)))
435 goto nla_put_failure; 435 goto nla_put_failure;
@@ -460,7 +460,7 @@ __build_packet_message(struct nfnl_log_net *log,
460 } 460 }
461 461
462 if (outdev) { 462 if (outdev) {
463#ifndef CONFIG_BRIDGE_NETFILTER 463#if !IS_ENABLED(CONFIG_BRIDGE_NETFILTER)
464 if (nla_put_be32(inst->skb, NFULA_IFINDEX_OUTDEV, 464 if (nla_put_be32(inst->skb, NFULA_IFINDEX_OUTDEV,
465 htonl(outdev->ifindex))) 465 htonl(outdev->ifindex)))
466 goto nla_put_failure; 466 goto nla_put_failure;
@@ -640,7 +640,7 @@ nfulnl_log_packet(struct net *net,
640 + nla_total_size(sizeof(struct nfulnl_msg_packet_hdr)) 640 + nla_total_size(sizeof(struct nfulnl_msg_packet_hdr))
641 + nla_total_size(sizeof(u_int32_t)) /* ifindex */ 641 + nla_total_size(sizeof(u_int32_t)) /* ifindex */
642 + nla_total_size(sizeof(u_int32_t)) /* ifindex */ 642 + nla_total_size(sizeof(u_int32_t)) /* ifindex */
643#ifdef CONFIG_BRIDGE_NETFILTER 643#if IS_ENABLED(CONFIG_BRIDGE_NETFILTER)
644 + nla_total_size(sizeof(u_int32_t)) /* ifindex */ 644 + nla_total_size(sizeof(u_int32_t)) /* ifindex */
645 + nla_total_size(sizeof(u_int32_t)) /* ifindex */ 645 + nla_total_size(sizeof(u_int32_t)) /* ifindex */
646#endif 646#endif
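
The #ifdef-to-IS_ENABLED() conversions in this and the following files account for CONFIG_BRIDGE_NETFILTER now being buildable as a module: a plain #ifdef CONFIG_FOO only triggers for =y, whereas IS_ENABLED(CONFIG_FOO) from include/linux/kconfig.h is true for both =y and =m. A minimal sketch of the pattern:

        #if IS_ENABLED(CONFIG_BRIDGE_NETFILTER)   /* built-in or modular */
        #include "../bridge/br_private.h"
        #endif

        #if !IS_ENABLED(CONFIG_BRIDGE_NETFILTER)  /* neither =y nor =m */
        /* non-bridge fallback path */
        #endif
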
diff --git a/net/netfilter/nfnetlink_queue_core.c b/net/netfilter/nfnetlink_queue_core.c
index 108120f216b1..a82077d9f59b 100644
--- a/net/netfilter/nfnetlink_queue_core.c
+++ b/net/netfilter/nfnetlink_queue_core.c
@@ -36,7 +36,7 @@
36 36
37#include <linux/atomic.h> 37#include <linux/atomic.h>
38 38
39#ifdef CONFIG_BRIDGE_NETFILTER 39#if IS_ENABLED(CONFIG_BRIDGE_NETFILTER)
40#include "../bridge/br_private.h" 40#include "../bridge/br_private.h"
41#endif 41#endif
42 42
@@ -302,7 +302,7 @@ nfqnl_build_packet_message(struct net *net, struct nfqnl_instance *queue,
302 + nla_total_size(sizeof(struct nfqnl_msg_packet_hdr)) 302 + nla_total_size(sizeof(struct nfqnl_msg_packet_hdr))
303 + nla_total_size(sizeof(u_int32_t)) /* ifindex */ 303 + nla_total_size(sizeof(u_int32_t)) /* ifindex */
304 + nla_total_size(sizeof(u_int32_t)) /* ifindex */ 304 + nla_total_size(sizeof(u_int32_t)) /* ifindex */
305#ifdef CONFIG_BRIDGE_NETFILTER 305#if IS_ENABLED(CONFIG_BRIDGE_NETFILTER)
306 + nla_total_size(sizeof(u_int32_t)) /* ifindex */ 306 + nla_total_size(sizeof(u_int32_t)) /* ifindex */
307 + nla_total_size(sizeof(u_int32_t)) /* ifindex */ 307 + nla_total_size(sizeof(u_int32_t)) /* ifindex */
308#endif 308#endif
@@ -380,7 +380,7 @@ nfqnl_build_packet_message(struct net *net, struct nfqnl_instance *queue,
380 380
381 indev = entry->indev; 381 indev = entry->indev;
382 if (indev) { 382 if (indev) {
383#ifndef CONFIG_BRIDGE_NETFILTER 383#if !IS_ENABLED(CONFIG_BRIDGE_NETFILTER)
384 if (nla_put_be32(skb, NFQA_IFINDEX_INDEV, htonl(indev->ifindex))) 384 if (nla_put_be32(skb, NFQA_IFINDEX_INDEV, htonl(indev->ifindex)))
385 goto nla_put_failure; 385 goto nla_put_failure;
386#else 386#else
@@ -410,7 +410,7 @@ nfqnl_build_packet_message(struct net *net, struct nfqnl_instance *queue,
410 } 410 }
411 411
412 if (outdev) { 412 if (outdev) {
413#ifndef CONFIG_BRIDGE_NETFILTER 413#if !IS_ENABLED(CONFIG_BRIDGE_NETFILTER)
414 if (nla_put_be32(skb, NFQA_IFINDEX_OUTDEV, htonl(outdev->ifindex))) 414 if (nla_put_be32(skb, NFQA_IFINDEX_OUTDEV, htonl(outdev->ifindex)))
415 goto nla_put_failure; 415 goto nla_put_failure;
416#else 416#else
@@ -569,7 +569,7 @@ nf_queue_entry_dup(struct nf_queue_entry *e)
569 return NULL; 569 return NULL;
570} 570}
571 571
572#ifdef CONFIG_BRIDGE_NETFILTER 572#if IS_ENABLED(CONFIG_BRIDGE_NETFILTER)
573/* When called from bridge netfilter, skb->data must point to MAC header 573/* When called from bridge netfilter, skb->data must point to MAC header
574 * before calling skb_gso_segment(). Else, original MAC header is lost 574 * before calling skb_gso_segment(). Else, original MAC header is lost
575 * and segmented skbs will be sent to wrong destination. 575 * and segmented skbs will be sent to wrong destination.
@@ -763,7 +763,7 @@ dev_cmp(struct nf_queue_entry *entry, unsigned long ifindex)
763 if (entry->outdev) 763 if (entry->outdev)
764 if (entry->outdev->ifindex == ifindex) 764 if (entry->outdev->ifindex == ifindex)
765 return 1; 765 return 1;
766#ifdef CONFIG_BRIDGE_NETFILTER 766#if IS_ENABLED(CONFIG_BRIDGE_NETFILTER)
767 if (entry->skb->nf_bridge) { 767 if (entry->skb->nf_bridge) {
768 if (entry->skb->nf_bridge->physindev && 768 if (entry->skb->nf_bridge->physindev &&
769 entry->skb->nf_bridge->physindev->ifindex == ifindex) 769 entry->skb->nf_bridge->physindev->ifindex == ifindex)
diff --git a/net/netfilter/nft_compat.c b/net/netfilter/nft_compat.c
index 1840989092ed..7e2683c8a44a 100644
--- a/net/netfilter/nft_compat.c
+++ b/net/netfilter/nft_compat.c
@@ -101,26 +101,12 @@ nft_target_set_tgchk_param(struct xt_tgchk_param *par,
101 101
102static void target_compat_from_user(struct xt_target *t, void *in, void *out) 102static void target_compat_from_user(struct xt_target *t, void *in, void *out)
103{ 103{
104#ifdef CONFIG_COMPAT 104 int pad;
105 if (t->compat_from_user) {
106 int pad;
107
108 t->compat_from_user(out, in);
109 pad = XT_ALIGN(t->targetsize) - t->targetsize;
110 if (pad > 0)
111 memset(out + t->targetsize, 0, pad);
112 } else
113#endif
114 memcpy(out, in, XT_ALIGN(t->targetsize));
115}
116 105
117static inline int nft_compat_target_offset(struct xt_target *target) 106 memcpy(out, in, t->targetsize);
118{ 107 pad = XT_ALIGN(t->targetsize) - t->targetsize;
119#ifdef CONFIG_COMPAT 108 if (pad > 0)
120 return xt_compat_target_offset(target); 109 memset(out + t->targetsize, 0, pad);
121#else
122 return 0;
123#endif
124} 110}
125 111
126static const struct nla_policy nft_rule_compat_policy[NFTA_RULE_COMPAT_MAX + 1] = { 112static const struct nla_policy nft_rule_compat_policy[NFTA_RULE_COMPAT_MAX + 1] = {
@@ -208,34 +194,6 @@ nft_target_destroy(const struct nft_ctx *ctx, const struct nft_expr *expr)
208 module_put(target->me); 194 module_put(target->me);
209} 195}
210 196
211static int
212target_dump_info(struct sk_buff *skb, const struct xt_target *t, const void *in)
213{
214 int ret;
215
216#ifdef CONFIG_COMPAT
217 if (t->compat_to_user) {
218 mm_segment_t old_fs;
219 void *out;
220
221 out = kmalloc(XT_ALIGN(t->targetsize), GFP_ATOMIC);
222 if (out == NULL)
223 return -ENOMEM;
224
225 /* We want to reuse existing compat_to_user */
226 old_fs = get_fs();
227 set_fs(KERNEL_DS);
228 t->compat_to_user(out, in);
229 set_fs(old_fs);
230 ret = nla_put(skb, NFTA_TARGET_INFO, XT_ALIGN(t->targetsize), out);
231 kfree(out);
232 } else
233#endif
234 ret = nla_put(skb, NFTA_TARGET_INFO, XT_ALIGN(t->targetsize), in);
235
236 return ret;
237}
238
239static int nft_target_dump(struct sk_buff *skb, const struct nft_expr *expr) 197static int nft_target_dump(struct sk_buff *skb, const struct nft_expr *expr)
240{ 198{
241 const struct xt_target *target = expr->ops->data; 199 const struct xt_target *target = expr->ops->data;
@@ -243,7 +201,7 @@ static int nft_target_dump(struct sk_buff *skb, const struct nft_expr *expr)
243 201
244 if (nla_put_string(skb, NFTA_TARGET_NAME, target->name) || 202 if (nla_put_string(skb, NFTA_TARGET_NAME, target->name) ||
245 nla_put_be32(skb, NFTA_TARGET_REV, htonl(target->revision)) || 203 nla_put_be32(skb, NFTA_TARGET_REV, htonl(target->revision)) ||
246 target_dump_info(skb, target, info)) 204 nla_put(skb, NFTA_TARGET_INFO, XT_ALIGN(target->targetsize), info))
247 goto nla_put_failure; 205 goto nla_put_failure;
248 206
249 return 0; 207 return 0;
@@ -341,17 +299,12 @@ nft_match_set_mtchk_param(struct xt_mtchk_param *par, const struct nft_ctx *ctx,
341 299
342static void match_compat_from_user(struct xt_match *m, void *in, void *out) 300static void match_compat_from_user(struct xt_match *m, void *in, void *out)
343{ 301{
344#ifdef CONFIG_COMPAT 302 int pad;
345 if (m->compat_from_user) { 303
346 int pad; 304 memcpy(out, in, m->matchsize);
347 305 pad = XT_ALIGN(m->matchsize) - m->matchsize;
348 m->compat_from_user(out, in); 306 if (pad > 0)
349 pad = XT_ALIGN(m->matchsize) - m->matchsize; 307 memset(out + m->matchsize, 0, pad);
350 if (pad > 0)
351 memset(out + m->matchsize, 0, pad);
352 } else
353#endif
354 memcpy(out, in, XT_ALIGN(m->matchsize));
355} 308}
356 309
357static int 310static int
@@ -404,43 +357,6 @@ nft_match_destroy(const struct nft_ctx *ctx, const struct nft_expr *expr)
404 module_put(match->me); 357 module_put(match->me);
405} 358}
406 359
407static int
408match_dump_info(struct sk_buff *skb, const struct xt_match *m, const void *in)
409{
410 int ret;
411
412#ifdef CONFIG_COMPAT
413 if (m->compat_to_user) {
414 mm_segment_t old_fs;
415 void *out;
416
417 out = kmalloc(XT_ALIGN(m->matchsize), GFP_ATOMIC);
418 if (out == NULL)
419 return -ENOMEM;
420
421 /* We want to reuse existing compat_to_user */
422 old_fs = get_fs();
423 set_fs(KERNEL_DS);
424 m->compat_to_user(out, in);
425 set_fs(old_fs);
426 ret = nla_put(skb, NFTA_MATCH_INFO, XT_ALIGN(m->matchsize), out);
427 kfree(out);
428 } else
429#endif
430 ret = nla_put(skb, NFTA_MATCH_INFO, XT_ALIGN(m->matchsize), in);
431
432 return ret;
433}
434
435static inline int nft_compat_match_offset(struct xt_match *match)
436{
437#ifdef CONFIG_COMPAT
438 return xt_compat_match_offset(match);
439#else
440 return 0;
441#endif
442}
443
444static int nft_match_dump(struct sk_buff *skb, const struct nft_expr *expr) 360static int nft_match_dump(struct sk_buff *skb, const struct nft_expr *expr)
445{ 361{
446 void *info = nft_expr_priv(expr); 362 void *info = nft_expr_priv(expr);
@@ -448,7 +364,7 @@ static int nft_match_dump(struct sk_buff *skb, const struct nft_expr *expr)
448 364
449 if (nla_put_string(skb, NFTA_MATCH_NAME, match->name) || 365 if (nla_put_string(skb, NFTA_MATCH_NAME, match->name) ||
450 nla_put_be32(skb, NFTA_MATCH_REV, htonl(match->revision)) || 366 nla_put_be32(skb, NFTA_MATCH_REV, htonl(match->revision)) ||
451 match_dump_info(skb, match, info)) 367 nla_put(skb, NFTA_MATCH_INFO, XT_ALIGN(match->matchsize), info))
452 goto nla_put_failure; 368 goto nla_put_failure;
453 369
454 return 0; 370 return 0;
@@ -643,8 +559,7 @@ nft_match_select_ops(const struct nft_ctx *ctx,
643 return ERR_PTR(-ENOMEM); 559 return ERR_PTR(-ENOMEM);
644 560
645 nft_match->ops.type = &nft_match_type; 561 nft_match->ops.type = &nft_match_type;
646 nft_match->ops.size = NFT_EXPR_SIZE(XT_ALIGN(match->matchsize) + 562 nft_match->ops.size = NFT_EXPR_SIZE(XT_ALIGN(match->matchsize));
647 nft_compat_match_offset(match));
648 nft_match->ops.eval = nft_match_eval; 563 nft_match->ops.eval = nft_match_eval;
649 nft_match->ops.init = nft_match_init; 564 nft_match->ops.init = nft_match_init;
650 nft_match->ops.destroy = nft_match_destroy; 565 nft_match->ops.destroy = nft_match_destroy;
@@ -714,8 +629,7 @@ nft_target_select_ops(const struct nft_ctx *ctx,
714 return ERR_PTR(-ENOMEM); 629 return ERR_PTR(-ENOMEM);
715 630
716 nft_target->ops.type = &nft_target_type; 631 nft_target->ops.type = &nft_target_type;
717 nft_target->ops.size = NFT_EXPR_SIZE(XT_ALIGN(target->targetsize) + 632 nft_target->ops.size = NFT_EXPR_SIZE(XT_ALIGN(target->targetsize));
718 nft_compat_target_offset(target));
719 nft_target->ops.eval = nft_target_eval; 633 nft_target->ops.eval = nft_target_eval;
720 nft_target->ops.init = nft_target_init; 634 nft_target->ops.init = nft_target_init;
721 nft_target->ops.destroy = nft_target_destroy; 635 nft_target->ops.destroy = nft_target_destroy;
diff --git a/net/netfilter/nft_masq.c b/net/netfilter/nft_masq.c
new file mode 100644
index 000000000000..6637bab00567
--- /dev/null
+++ b/net/netfilter/nft_masq.c
@@ -0,0 +1,59 @@
1/*
2 * Copyright (c) 2014 Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com>
3 *
4 * This program is free software; you can redistribute it and/or modify
5 * it under the terms of the GNU General Public License version 2 as
6 * published by the Free Software Foundation.
7 */
8
9#include <linux/kernel.h>
10#include <linux/init.h>
11#include <linux/module.h>
12#include <linux/netlink.h>
13#include <linux/netfilter.h>
14#include <linux/netfilter/nf_tables.h>
15#include <net/netfilter/nf_tables.h>
16#include <net/netfilter/nf_nat.h>
17#include <net/netfilter/nft_masq.h>
18
19const struct nla_policy nft_masq_policy[NFTA_MASQ_MAX + 1] = {
20 [NFTA_MASQ_FLAGS] = { .type = NLA_U32 },
21};
22EXPORT_SYMBOL_GPL(nft_masq_policy);
23
24int nft_masq_init(const struct nft_ctx *ctx,
25 const struct nft_expr *expr,
26 const struct nlattr * const tb[])
27{
28 struct nft_masq *priv = nft_expr_priv(expr);
29
30 if (tb[NFTA_MASQ_FLAGS] == NULL)
31 return 0;
32
33 priv->flags = ntohl(nla_get_be32(tb[NFTA_MASQ_FLAGS]));
34 if (priv->flags & ~NF_NAT_RANGE_MASK)
35 return -EINVAL;
36
37 return 0;
38}
39EXPORT_SYMBOL_GPL(nft_masq_init);
40
41int nft_masq_dump(struct sk_buff *skb, const struct nft_expr *expr)
42{
43 const struct nft_masq *priv = nft_expr_priv(expr);
44
45 if (priv->flags == 0)
46 return 0;
47
48 if (nla_put_be32(skb, NFTA_MASQ_FLAGS, htonl(priv->flags)))
49 goto nla_put_failure;
50
51 return 0;
52
53nla_put_failure:
54 return -1;
55}
56EXPORT_SYMBOL_GPL(nft_masq_dump);
57
58MODULE_LICENSE("GPL");
59MODULE_AUTHOR("Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com>");
diff --git a/net/netfilter/nft_meta.c b/net/netfilter/nft_meta.c
index 852b178c6ae7..1e7c076ca63a 100644
--- a/net/netfilter/nft_meta.c
+++ b/net/netfilter/nft_meta.c
@@ -14,6 +14,10 @@
14#include <linux/netlink.h> 14#include <linux/netlink.h>
15#include <linux/netfilter.h> 15#include <linux/netfilter.h>
16#include <linux/netfilter/nf_tables.h> 16#include <linux/netfilter/nf_tables.h>
17#include <linux/in.h>
18#include <linux/ip.h>
19#include <linux/ipv6.h>
20#include <linux/smp.h>
17#include <net/dst.h> 21#include <net/dst.h>
18#include <net/sock.h> 22#include <net/sock.h>
19#include <net/tcp_states.h> /* for TCP_TIME_WAIT */ 23#include <net/tcp_states.h> /* for TCP_TIME_WAIT */
@@ -124,6 +128,43 @@ void nft_meta_get_eval(const struct nft_expr *expr,
124 dest->data[0] = skb->secmark; 128 dest->data[0] = skb->secmark;
125 break; 129 break;
126#endif 130#endif
131 case NFT_META_PKTTYPE:
132 if (skb->pkt_type != PACKET_LOOPBACK) {
133 dest->data[0] = skb->pkt_type;
134 break;
135 }
136
137 switch (pkt->ops->pf) {
138 case NFPROTO_IPV4:
139 if (ipv4_is_multicast(ip_hdr(skb)->daddr))
140 dest->data[0] = PACKET_MULTICAST;
141 else
142 dest->data[0] = PACKET_BROADCAST;
143 break;
144 case NFPROTO_IPV6:
145 if (ipv6_hdr(skb)->daddr.s6_addr[0] == 0xFF)
146 dest->data[0] = PACKET_MULTICAST;
147 else
148 dest->data[0] = PACKET_BROADCAST;
149 break;
150 default:
151 WARN_ON(1);
152 goto err;
153 }
154 break;
155 case NFT_META_CPU:
156 dest->data[0] = smp_processor_id();
157 break;
158 case NFT_META_IIFGROUP:
159 if (in == NULL)
160 goto err;
161 dest->data[0] = in->group;
162 break;
163 case NFT_META_OIFGROUP:
164 if (out == NULL)
165 goto err;
166 dest->data[0] = out->group;
167 break;
127 default: 168 default:
128 WARN_ON(1); 169 WARN_ON(1);
129 goto err; 170 goto err;
@@ -195,6 +236,10 @@ int nft_meta_get_init(const struct nft_ctx *ctx,
195#ifdef CONFIG_NETWORK_SECMARK 236#ifdef CONFIG_NETWORK_SECMARK
196 case NFT_META_SECMARK: 237 case NFT_META_SECMARK:
197#endif 238#endif
239 case NFT_META_PKTTYPE:
240 case NFT_META_CPU:
241 case NFT_META_IIFGROUP:
242 case NFT_META_OIFGROUP:
198 break; 243 break;
199 default: 244 default:
200 return -EOPNOTSUPP; 245 return -EOPNOTSUPP;
diff --git a/net/netfilter/nft_nat.c b/net/netfilter/nft_nat.c
index 79ff58cd36dc..799550b476fb 100644
--- a/net/netfilter/nft_nat.c
+++ b/net/netfilter/nft_nat.c
@@ -33,6 +33,7 @@ struct nft_nat {
33 enum nft_registers sreg_proto_max:8; 33 enum nft_registers sreg_proto_max:8;
34 enum nf_nat_manip_type type:8; 34 enum nf_nat_manip_type type:8;
35 u8 family; 35 u8 family;
36 u16 flags;
36}; 37};
37 38
38static void nft_nat_eval(const struct nft_expr *expr, 39static void nft_nat_eval(const struct nft_expr *expr,
@@ -71,6 +72,8 @@ static void nft_nat_eval(const struct nft_expr *expr,
71 range.flags |= NF_NAT_RANGE_PROTO_SPECIFIED; 72 range.flags |= NF_NAT_RANGE_PROTO_SPECIFIED;
72 } 73 }
73 74
75 range.flags |= priv->flags;
76
74 data[NFT_REG_VERDICT].verdict = 77 data[NFT_REG_VERDICT].verdict =
75 nf_nat_setup_info(ct, &range, priv->type); 78 nf_nat_setup_info(ct, &range, priv->type);
76} 79}
@@ -82,6 +85,7 @@ static const struct nla_policy nft_nat_policy[NFTA_NAT_MAX + 1] = {
82 [NFTA_NAT_REG_ADDR_MAX] = { .type = NLA_U32 }, 85 [NFTA_NAT_REG_ADDR_MAX] = { .type = NLA_U32 },
83 [NFTA_NAT_REG_PROTO_MIN] = { .type = NLA_U32 }, 86 [NFTA_NAT_REG_PROTO_MIN] = { .type = NLA_U32 },
84 [NFTA_NAT_REG_PROTO_MAX] = { .type = NLA_U32 }, 87 [NFTA_NAT_REG_PROTO_MAX] = { .type = NLA_U32 },
88 [NFTA_NAT_FLAGS] = { .type = NLA_U32 },
85}; 89};
86 90
87static int nft_nat_init(const struct nft_ctx *ctx, const struct nft_expr *expr, 91static int nft_nat_init(const struct nft_ctx *ctx, const struct nft_expr *expr,
@@ -149,6 +153,12 @@ static int nft_nat_init(const struct nft_ctx *ctx, const struct nft_expr *expr,
149 } else 153 } else
150 priv->sreg_proto_max = priv->sreg_proto_min; 154 priv->sreg_proto_max = priv->sreg_proto_min;
151 155
156 if (tb[NFTA_NAT_FLAGS]) {
157 priv->flags = ntohl(nla_get_be32(tb[NFTA_NAT_FLAGS]));
158 if (priv->flags & ~NF_NAT_RANGE_MASK)
159 return -EINVAL;
160 }
161
152 return 0; 162 return 0;
153} 163}
154 164
@@ -183,6 +193,12 @@ static int nft_nat_dump(struct sk_buff *skb, const struct nft_expr *expr)
183 htonl(priv->sreg_proto_max))) 193 htonl(priv->sreg_proto_max)))
184 goto nla_put_failure; 194 goto nla_put_failure;
185 } 195 }
196
197 if (priv->flags != 0) {
198 if (nla_put_be32(skb, NFTA_NAT_FLAGS, htonl(priv->flags)))
199 goto nla_put_failure;
200 }
201
186 return 0; 202 return 0;
187 203
188nla_put_failure: 204nla_put_failure:
diff --git a/net/netfilter/nft_reject.c b/net/netfilter/nft_reject.c
index f3448c296446..ec8a456092a7 100644
--- a/net/netfilter/nft_reject.c
+++ b/net/netfilter/nft_reject.c
@@ -17,6 +17,8 @@
17#include <linux/netfilter/nf_tables.h> 17#include <linux/netfilter/nf_tables.h>
18#include <net/netfilter/nf_tables.h> 18#include <net/netfilter/nf_tables.h>
19#include <net/netfilter/nft_reject.h> 19#include <net/netfilter/nft_reject.h>
20#include <linux/icmp.h>
21#include <linux/icmpv6.h>
20 22
21const struct nla_policy nft_reject_policy[NFTA_REJECT_MAX + 1] = { 23const struct nla_policy nft_reject_policy[NFTA_REJECT_MAX + 1] = {
22 [NFTA_REJECT_TYPE] = { .type = NLA_U32 }, 24 [NFTA_REJECT_TYPE] = { .type = NLA_U32 },
@@ -70,5 +72,40 @@ nla_put_failure:
70} 72}
71EXPORT_SYMBOL_GPL(nft_reject_dump); 73EXPORT_SYMBOL_GPL(nft_reject_dump);
72 74
75static u8 icmp_code_v4[NFT_REJECT_ICMPX_MAX] = {
76 [NFT_REJECT_ICMPX_NO_ROUTE] = ICMP_NET_UNREACH,
77 [NFT_REJECT_ICMPX_PORT_UNREACH] = ICMP_PORT_UNREACH,
78 [NFT_REJECT_ICMPX_HOST_UNREACH] = ICMP_HOST_UNREACH,
79 [NFT_REJECT_ICMPX_ADMIN_PROHIBITED] = ICMP_PKT_FILTERED,
80};
81
82int nft_reject_icmp_code(u8 code)
83{
84 if (code > NFT_REJECT_ICMPX_MAX)
85 return -EINVAL;
86
87 return icmp_code_v4[code];
88}
89
90EXPORT_SYMBOL_GPL(nft_reject_icmp_code);
91
92
93static u8 icmp_code_v6[NFT_REJECT_ICMPX_MAX] = {
94 [NFT_REJECT_ICMPX_NO_ROUTE] = ICMPV6_NOROUTE,
95 [NFT_REJECT_ICMPX_PORT_UNREACH] = ICMPV6_PORT_UNREACH,
96 [NFT_REJECT_ICMPX_HOST_UNREACH] = ICMPV6_ADDR_UNREACH,
97 [NFT_REJECT_ICMPX_ADMIN_PROHIBITED] = ICMPV6_ADM_PROHIBITED,
98};
99
100int nft_reject_icmpv6_code(u8 code)
101{
102 if (code > NFT_REJECT_ICMPX_MAX)
103 return -EINVAL;
104
105 return icmp_code_v6[code];
106}
107
108EXPORT_SYMBOL_GPL(nft_reject_icmpv6_code);
109
73MODULE_LICENSE("GPL"); 110MODULE_LICENSE("GPL");
74MODULE_AUTHOR("Patrick McHardy <kaber@trash.net>"); 111MODULE_AUTHOR("Patrick McHardy <kaber@trash.net>");
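
The two lookup tables above implement the family-neutral "icmpx" reject codes used by the inet family: the same abstract code resolves to the matching ICMP or ICMPv6 unreachable code per family. Illustrative use, with return values taken from the tables above:

        int v4 = nft_reject_icmp_code(NFT_REJECT_ICMPX_HOST_UNREACH);   /* ICMP_HOST_UNREACH (1) */
        int v6 = nft_reject_icmpv6_code(NFT_REJECT_ICMPX_HOST_UNREACH); /* ICMPV6_ADDR_UNREACH (3) */
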
diff --git a/net/netfilter/nft_reject_inet.c b/net/netfilter/nft_reject_inet.c
index b718a52a4654..7b5f9d58680a 100644
--- a/net/netfilter/nft_reject_inet.c
+++ b/net/netfilter/nft_reject_inet.c
@@ -14,17 +14,103 @@
14#include <linux/netfilter/nf_tables.h> 14#include <linux/netfilter/nf_tables.h>
15#include <net/netfilter/nf_tables.h> 15#include <net/netfilter/nf_tables.h>
16#include <net/netfilter/nft_reject.h> 16#include <net/netfilter/nft_reject.h>
17#include <net/netfilter/ipv4/nf_reject.h>
18#include <net/netfilter/ipv6/nf_reject.h>
17 19
18static void nft_reject_inet_eval(const struct nft_expr *expr, 20static void nft_reject_inet_eval(const struct nft_expr *expr,
19 struct nft_data data[NFT_REG_MAX + 1], 21 struct nft_data data[NFT_REG_MAX + 1],
20 const struct nft_pktinfo *pkt) 22 const struct nft_pktinfo *pkt)
21{ 23{
24 struct nft_reject *priv = nft_expr_priv(expr);
25 struct net *net = dev_net((pkt->in != NULL) ? pkt->in : pkt->out);
26
22 switch (pkt->ops->pf) { 27 switch (pkt->ops->pf) {
23 case NFPROTO_IPV4: 28 case NFPROTO_IPV4:
24 return nft_reject_ipv4_eval(expr, data, pkt); 29 switch (priv->type) {
30 case NFT_REJECT_ICMP_UNREACH:
31 nf_send_unreach(pkt->skb, priv->icmp_code);
32 break;
33 case NFT_REJECT_TCP_RST:
34 nf_send_reset(pkt->skb, pkt->ops->hooknum);
35 break;
36 case NFT_REJECT_ICMPX_UNREACH:
37 nf_send_unreach(pkt->skb,
38 nft_reject_icmp_code(priv->icmp_code));
39 break;
40 }
41 break;
25 case NFPROTO_IPV6: 42 case NFPROTO_IPV6:
26 return nft_reject_ipv6_eval(expr, data, pkt); 43 switch (priv->type) {
44 case NFT_REJECT_ICMP_UNREACH:
45 nf_send_unreach6(net, pkt->skb, priv->icmp_code,
46 pkt->ops->hooknum);
47 break;
48 case NFT_REJECT_TCP_RST:
49 nf_send_reset6(net, pkt->skb, pkt->ops->hooknum);
50 break;
51 case NFT_REJECT_ICMPX_UNREACH:
52 nf_send_unreach6(net, pkt->skb,
53 nft_reject_icmpv6_code(priv->icmp_code),
54 pkt->ops->hooknum);
55 break;
56 }
57 break;
58 }
59 data[NFT_REG_VERDICT].verdict = NF_DROP;
60}
61
62static int nft_reject_inet_init(const struct nft_ctx *ctx,
63 const struct nft_expr *expr,
64 const struct nlattr * const tb[])
65{
66 struct nft_reject *priv = nft_expr_priv(expr);
67 int icmp_code;
68
69 if (tb[NFTA_REJECT_TYPE] == NULL)
70 return -EINVAL;
71
72 priv->type = ntohl(nla_get_be32(tb[NFTA_REJECT_TYPE]));
73 switch (priv->type) {
74 case NFT_REJECT_ICMP_UNREACH:
75 case NFT_REJECT_ICMPX_UNREACH:
76 if (tb[NFTA_REJECT_ICMP_CODE] == NULL)
77 return -EINVAL;
78
79 icmp_code = nla_get_u8(tb[NFTA_REJECT_ICMP_CODE]);
80 if (priv->type == NFT_REJECT_ICMPX_UNREACH &&
81 icmp_code > NFT_REJECT_ICMPX_MAX)
82 return -EINVAL;
83
84 priv->icmp_code = icmp_code;
85 break;
86 case NFT_REJECT_TCP_RST:
87 break;
88 default:
89 return -EINVAL;
27 } 90 }
91 return 0;
92}
93
94static int nft_reject_inet_dump(struct sk_buff *skb,
95 const struct nft_expr *expr)
96{
97 const struct nft_reject *priv = nft_expr_priv(expr);
98
99 if (nla_put_be32(skb, NFTA_REJECT_TYPE, htonl(priv->type)))
100 goto nla_put_failure;
101
102 switch (priv->type) {
103 case NFT_REJECT_ICMP_UNREACH:
104 case NFT_REJECT_ICMPX_UNREACH:
105 if (nla_put_u8(skb, NFTA_REJECT_ICMP_CODE, priv->icmp_code))
106 goto nla_put_failure;
107 break;
108 }
109
110 return 0;
111
112nla_put_failure:
113 return -1;
28} 114}
29 115
30static struct nft_expr_type nft_reject_inet_type; 116static struct nft_expr_type nft_reject_inet_type;
@@ -32,8 +118,8 @@ static const struct nft_expr_ops nft_reject_inet_ops = {
32 .type = &nft_reject_inet_type, 118 .type = &nft_reject_inet_type,
33 .size = NFT_EXPR_SIZE(sizeof(struct nft_reject)), 119 .size = NFT_EXPR_SIZE(sizeof(struct nft_reject)),
34 .eval = nft_reject_inet_eval, 120 .eval = nft_reject_inet_eval,
35 .init = nft_reject_init, 121 .init = nft_reject_inet_init,
36 .dump = nft_reject_dump, 122 .dump = nft_reject_inet_dump,
37}; 123};
38 124
39static struct nft_expr_type nft_reject_inet_type __read_mostly = { 125static struct nft_expr_type nft_reject_inet_type __read_mostly = {
diff --git a/net/netfilter/x_tables.c b/net/netfilter/x_tables.c
index 272ae4d6fdf4..133eb4772f12 100644
--- a/net/netfilter/x_tables.c
+++ b/net/netfilter/x_tables.c
@@ -1101,22 +1101,11 @@ static const struct seq_operations xt_match_seq_ops = {
1101 1101
1102static int xt_match_open(struct inode *inode, struct file *file) 1102static int xt_match_open(struct inode *inode, struct file *file)
1103{ 1103{
1104 struct seq_file *seq;
1105 struct nf_mttg_trav *trav; 1104 struct nf_mttg_trav *trav;
1106 int ret; 1105 trav = __seq_open_private(file, &xt_match_seq_ops, sizeof(*trav));
1107 1106 if (!trav)
1108 trav = kmalloc(sizeof(*trav), GFP_KERNEL);
1109 if (trav == NULL)
1110 return -ENOMEM; 1107 return -ENOMEM;
1111 1108
1112 ret = seq_open(file, &xt_match_seq_ops);
1113 if (ret < 0) {
1114 kfree(trav);
1115 return ret;
1116 }
1117
1118 seq = file->private_data;
1119 seq->private = trav;
1120 trav->nfproto = (unsigned long)PDE_DATA(inode); 1109 trav->nfproto = (unsigned long)PDE_DATA(inode);
1121 return 0; 1110 return 0;
1122} 1111}
@@ -1165,22 +1154,11 @@ static const struct seq_operations xt_target_seq_ops = {
1165 1154
1166static int xt_target_open(struct inode *inode, struct file *file) 1155static int xt_target_open(struct inode *inode, struct file *file)
1167{ 1156{
1168 struct seq_file *seq;
1169 struct nf_mttg_trav *trav; 1157 struct nf_mttg_trav *trav;
1170 int ret; 1158 trav = __seq_open_private(file, &xt_target_seq_ops, sizeof(*trav));
1171 1159 if (!trav)
1172 trav = kmalloc(sizeof(*trav), GFP_KERNEL);
1173 if (trav == NULL)
1174 return -ENOMEM; 1160 return -ENOMEM;
1175 1161
1176 ret = seq_open(file, &xt_target_seq_ops);
1177 if (ret < 0) {
1178 kfree(trav);
1179 return ret;
1180 }
1181
1182 seq = file->private_data;
1183 seq->private = trav;
1184 trav->nfproto = (unsigned long)PDE_DATA(inode); 1162 trav->nfproto = (unsigned long)PDE_DATA(inode);
1185 return 0; 1163 return 0;
1186} 1164}
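
The xt_match_open()/xt_target_open() simplification above relies on __seq_open_private(), which combines seq_open() with a zeroed private-data allocation and pairs with seq_release_private() on close. A minimal self-contained sketch of the pattern (the demo_* names are placeholders, not kernel symbols):

        static int demo_open(struct inode *inode, struct file *file)
        {
                struct demo_priv *p;

                p = __seq_open_private(file, &demo_seq_ops, sizeof(*p));
                if (!p)
                        return -ENOMEM;
                /* p is already zeroed; fill in per-open state here */
                return 0;
        }

        static const struct file_operations demo_fops = {
                .open    = demo_open,
                .read    = seq_read,
                .llseek  = seq_lseek,
                .release = seq_release_private, /* frees the private area again */
        };
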
diff --git a/net/netfilter/xt_HMARK.c b/net/netfilter/xt_HMARK.c
index 73b73f687c58..02afaf48a729 100644
--- a/net/netfilter/xt_HMARK.c
+++ b/net/netfilter/xt_HMARK.c
@@ -126,7 +126,7 @@ hmark_hash(struct hmark_tuple *t, const struct xt_hmark_info *info)
126 hash = jhash_3words(src, dst, t->uports.v32, info->hashrnd); 126 hash = jhash_3words(src, dst, t->uports.v32, info->hashrnd);
127 hash = hash ^ (t->proto & info->proto_mask); 127 hash = hash ^ (t->proto & info->proto_mask);
128 128
129 return (((u64)hash * info->hmodulus) >> 32) + info->hoffset; 129 return reciprocal_scale(hash, info->hmodulus) + info->hoffset;
130} 130}
131 131
132static void 132static void
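
The hashing change in xt_HMARK (and in xt_cluster and xt_hashlimit below) simply switches the open-coded multiply-and-shift to the reciprocal_scale() helper, which maps a 32-bit value onto [0, ep_ro) without a division and is defined in include/linux/kernel.h essentially as:

        static inline u32 reciprocal_scale(u32 val, u32 ep_ro)
        {
                return (u32)(((u64) val * ep_ro) >> 32);
        }

so the generated code is unchanged; only the expression becomes self-describing.
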
diff --git a/net/netfilter/xt_RATEEST.c b/net/netfilter/xt_RATEEST.c
index 370adf622cef..604df6fae6fc 100644
--- a/net/netfilter/xt_RATEEST.c
+++ b/net/netfilter/xt_RATEEST.c
@@ -136,7 +136,7 @@ static int xt_rateest_tg_checkentry(const struct xt_tgchk_param *par)
136 cfg.est.interval = info->interval; 136 cfg.est.interval = info->interval;
137 cfg.est.ewma_log = info->ewma_log; 137 cfg.est.ewma_log = info->ewma_log;
138 138
139 ret = gen_new_estimator(&est->bstats, &est->rstats, 139 ret = gen_new_estimator(&est->bstats, NULL, &est->rstats,
140 &est->lock, &cfg.opt); 140 &est->lock, &cfg.opt);
141 if (ret < 0) 141 if (ret < 0)
142 goto err2; 142 goto err2;
diff --git a/net/netfilter/xt_cluster.c b/net/netfilter/xt_cluster.c
index f4af1bfafb1c..96fa26b20b67 100644
--- a/net/netfilter/xt_cluster.c
+++ b/net/netfilter/xt_cluster.c
@@ -55,7 +55,8 @@ xt_cluster_hash(const struct nf_conn *ct,
55 WARN_ON(1); 55 WARN_ON(1);
56 break; 56 break;
57 } 57 }
58 return (((u64)hash * info->total_nodes) >> 32); 58
59 return reciprocal_scale(hash, info->total_nodes);
59} 60}
60 61
61static inline bool 62static inline bool
diff --git a/net/netfilter/xt_connbytes.c b/net/netfilter/xt_connbytes.c
index 1e634615ab9d..d4bec261e74e 100644
--- a/net/netfilter/xt_connbytes.c
+++ b/net/netfilter/xt_connbytes.c
@@ -120,7 +120,7 @@ static int connbytes_mt_check(const struct xt_mtchk_param *par)
120 * accounting is enabled, so complain in the hope that someone notices. 120 * accounting is enabled, so complain in the hope that someone notices.
121 */ 121 */
122 if (!nf_ct_acct_enabled(par->net)) { 122 if (!nf_ct_acct_enabled(par->net)) {
123 pr_warning("Forcing CT accounting to be enabled\n"); 123 pr_warn("Forcing CT accounting to be enabled\n");
124 nf_ct_set_acct(par->net, true); 124 nf_ct_set_acct(par->net, true);
125 } 125 }
126 126
diff --git a/net/netfilter/xt_hashlimit.c b/net/netfilter/xt_hashlimit.c
index 47dc6836830a..05fbc2a0be46 100644
--- a/net/netfilter/xt_hashlimit.c
+++ b/net/netfilter/xt_hashlimit.c
@@ -135,7 +135,7 @@ hash_dst(const struct xt_hashlimit_htable *ht, const struct dsthash_dst *dst)
135 * give results between [0 and cfg.size-1] and same hash distribution, 135 * give results between [0 and cfg.size-1] and same hash distribution,
136 * but using a multiply, less expensive than a divide 136 * but using a multiply, less expensive than a divide
137 */ 137 */
138 return ((u64)hash * ht->cfg.size) >> 32; 138 return reciprocal_scale(hash, ht->cfg.size);
139} 139}
140 140
141static struct dsthash_ent * 141static struct dsthash_ent *
@@ -943,7 +943,7 @@ static int __init hashlimit_mt_init(void)
943 sizeof(struct dsthash_ent), 0, 0, 943 sizeof(struct dsthash_ent), 0, 0,
944 NULL); 944 NULL);
945 if (!hashlimit_cachep) { 945 if (!hashlimit_cachep) {
946 pr_warning("unable to create slab cache\n"); 946 pr_warn("unable to create slab cache\n");
947 goto err2; 947 goto err2;
948 } 948 }
949 return 0; 949 return 0;
diff --git a/net/netfilter/xt_physdev.c b/net/netfilter/xt_physdev.c
index d7ca16b8b8df..f440f57a452f 100644
--- a/net/netfilter/xt_physdev.c
+++ b/net/netfilter/xt_physdev.c
@@ -13,6 +13,7 @@
13#include <linux/netfilter_bridge.h> 13#include <linux/netfilter_bridge.h>
14#include <linux/netfilter/xt_physdev.h> 14#include <linux/netfilter/xt_physdev.h>
15#include <linux/netfilter/x_tables.h> 15#include <linux/netfilter/x_tables.h>
16#include <net/netfilter/br_netfilter.h>
16 17
17MODULE_LICENSE("GPL"); 18MODULE_LICENSE("GPL");
18MODULE_AUTHOR("Bart De Schuymer <bdschuym@pandora.be>"); 19MODULE_AUTHOR("Bart De Schuymer <bdschuym@pandora.be>");
@@ -87,6 +88,8 @@ static int physdev_mt_check(const struct xt_mtchk_param *par)
87{ 88{
88 const struct xt_physdev_info *info = par->matchinfo; 89 const struct xt_physdev_info *info = par->matchinfo;
89 90
91 br_netfilter_enable();
92
90 if (!(info->bitmask & XT_PHYSDEV_OP_MASK) || 93 if (!(info->bitmask & XT_PHYSDEV_OP_MASK) ||
91 info->bitmask & ~XT_PHYSDEV_OP_MASK) 94 info->bitmask & ~XT_PHYSDEV_OP_MASK)
92 return -EINVAL; 95 return -EINVAL;
diff --git a/net/netfilter/xt_set.c b/net/netfilter/xt_set.c
index 80c2e2d603e0..5732cd64acc0 100644
--- a/net/netfilter/xt_set.c
+++ b/net/netfilter/xt_set.c
@@ -84,13 +84,12 @@ set_match_v0_checkentry(const struct xt_mtchk_param *par)
84 index = ip_set_nfnl_get_byindex(par->net, info->match_set.index); 84 index = ip_set_nfnl_get_byindex(par->net, info->match_set.index);
85 85
86 if (index == IPSET_INVALID_ID) { 86 if (index == IPSET_INVALID_ID) {
87 pr_warning("Cannot find set identified by id %u to match\n", 87 pr_warn("Cannot find set identified by id %u to match\n",
88 info->match_set.index); 88 info->match_set.index);
89 return -ENOENT; 89 return -ENOENT;
90 } 90 }
91 if (info->match_set.u.flags[IPSET_DIM_MAX-1] != 0) { 91 if (info->match_set.u.flags[IPSET_DIM_MAX-1] != 0) {
92 pr_warning("Protocol error: set match dimension " 92 pr_warn("Protocol error: set match dimension is over the limit!\n");
93 "is over the limit!\n");
94 ip_set_nfnl_put(par->net, info->match_set.index); 93 ip_set_nfnl_put(par->net, info->match_set.index);
95 return -ERANGE; 94 return -ERANGE;
96 } 95 }
@@ -134,13 +133,12 @@ set_match_v1_checkentry(const struct xt_mtchk_param *par)
134 index = ip_set_nfnl_get_byindex(par->net, info->match_set.index); 133 index = ip_set_nfnl_get_byindex(par->net, info->match_set.index);
135 134
136 if (index == IPSET_INVALID_ID) { 135 if (index == IPSET_INVALID_ID) {
137 pr_warning("Cannot find set identified by id %u to match\n", 136 pr_warn("Cannot find set identified by id %u to match\n",
138 info->match_set.index); 137 info->match_set.index);
139 return -ENOENT; 138 return -ENOENT;
140 } 139 }
141 if (info->match_set.dim > IPSET_DIM_MAX) { 140 if (info->match_set.dim > IPSET_DIM_MAX) {
142 pr_warning("Protocol error: set match dimension " 141 pr_warn("Protocol error: set match dimension is over the limit!\n");
143 "is over the limit!\n");
144 ip_set_nfnl_put(par->net, info->match_set.index); 142 ip_set_nfnl_put(par->net, info->match_set.index);
145 return -ERANGE; 143 return -ERANGE;
146 } 144 }
@@ -230,8 +228,8 @@ set_target_v0_checkentry(const struct xt_tgchk_param *par)
230 if (info->add_set.index != IPSET_INVALID_ID) { 228 if (info->add_set.index != IPSET_INVALID_ID) {
231 index = ip_set_nfnl_get_byindex(par->net, info->add_set.index); 229 index = ip_set_nfnl_get_byindex(par->net, info->add_set.index);
232 if (index == IPSET_INVALID_ID) { 230 if (index == IPSET_INVALID_ID) {
233 pr_warning("Cannot find add_set index %u as target\n", 231 pr_warn("Cannot find add_set index %u as target\n",
234 info->add_set.index); 232 info->add_set.index);
235 return -ENOENT; 233 return -ENOENT;
236 } 234 }
237 } 235 }
@@ -239,8 +237,8 @@ set_target_v0_checkentry(const struct xt_tgchk_param *par)
239 if (info->del_set.index != IPSET_INVALID_ID) { 237 if (info->del_set.index != IPSET_INVALID_ID) {
240 index = ip_set_nfnl_get_byindex(par->net, info->del_set.index); 238 index = ip_set_nfnl_get_byindex(par->net, info->del_set.index);
241 if (index == IPSET_INVALID_ID) { 239 if (index == IPSET_INVALID_ID) {
242 pr_warning("Cannot find del_set index %u as target\n", 240 pr_warn("Cannot find del_set index %u as target\n",
243 info->del_set.index); 241 info->del_set.index);
244 if (info->add_set.index != IPSET_INVALID_ID) 242 if (info->add_set.index != IPSET_INVALID_ID)
245 ip_set_nfnl_put(par->net, info->add_set.index); 243 ip_set_nfnl_put(par->net, info->add_set.index);
246 return -ENOENT; 244 return -ENOENT;
@@ -248,8 +246,7 @@ set_target_v0_checkentry(const struct xt_tgchk_param *par)
248 } 246 }
249 if (info->add_set.u.flags[IPSET_DIM_MAX-1] != 0 || 247 if (info->add_set.u.flags[IPSET_DIM_MAX-1] != 0 ||
250 info->del_set.u.flags[IPSET_DIM_MAX-1] != 0) { 248 info->del_set.u.flags[IPSET_DIM_MAX-1] != 0) {
251 pr_warning("Protocol error: SET target dimension " 249 pr_warn("Protocol error: SET target dimension is over the limit!\n");
252 "is over the limit!\n");
253 if (info->add_set.index != IPSET_INVALID_ID) 250 if (info->add_set.index != IPSET_INVALID_ID)
254 ip_set_nfnl_put(par->net, info->add_set.index); 251 ip_set_nfnl_put(par->net, info->add_set.index);
255 if (info->del_set.index != IPSET_INVALID_ID) 252 if (info->del_set.index != IPSET_INVALID_ID)
@@ -303,8 +300,8 @@ set_target_v1_checkentry(const struct xt_tgchk_param *par)
303 if (info->add_set.index != IPSET_INVALID_ID) { 300 if (info->add_set.index != IPSET_INVALID_ID) {
304 index = ip_set_nfnl_get_byindex(par->net, info->add_set.index); 301 index = ip_set_nfnl_get_byindex(par->net, info->add_set.index);
305 if (index == IPSET_INVALID_ID) { 302 if (index == IPSET_INVALID_ID) {
306 pr_warning("Cannot find add_set index %u as target\n", 303 pr_warn("Cannot find add_set index %u as target\n",
307 info->add_set.index); 304 info->add_set.index);
308 return -ENOENT; 305 return -ENOENT;
309 } 306 }
310 } 307 }
@@ -312,8 +309,8 @@ set_target_v1_checkentry(const struct xt_tgchk_param *par)
312 if (info->del_set.index != IPSET_INVALID_ID) { 309 if (info->del_set.index != IPSET_INVALID_ID) {
313 index = ip_set_nfnl_get_byindex(par->net, info->del_set.index); 310 index = ip_set_nfnl_get_byindex(par->net, info->del_set.index);
314 if (index == IPSET_INVALID_ID) { 311 if (index == IPSET_INVALID_ID) {
315 pr_warning("Cannot find del_set index %u as target\n", 312 pr_warn("Cannot find del_set index %u as target\n",
316 info->del_set.index); 313 info->del_set.index);
317 if (info->add_set.index != IPSET_INVALID_ID) 314 if (info->add_set.index != IPSET_INVALID_ID)
318 ip_set_nfnl_put(par->net, info->add_set.index); 315 ip_set_nfnl_put(par->net, info->add_set.index);
319 return -ENOENT; 316 return -ENOENT;
@@ -321,8 +318,7 @@ set_target_v1_checkentry(const struct xt_tgchk_param *par)
321 } 318 }
322 if (info->add_set.dim > IPSET_DIM_MAX || 319 if (info->add_set.dim > IPSET_DIM_MAX ||
323 info->del_set.dim > IPSET_DIM_MAX) { 320 info->del_set.dim > IPSET_DIM_MAX) {
324 pr_warning("Protocol error: SET target dimension " 321 pr_warn("Protocol error: SET target dimension is over the limit!\n");
325 "is over the limit!\n");
326 if (info->add_set.index != IPSET_INVALID_ID) 322 if (info->add_set.index != IPSET_INVALID_ID)
327 ip_set_nfnl_put(par->net, info->add_set.index); 323 ip_set_nfnl_put(par->net, info->add_set.index);
328 if (info->del_set.index != IPSET_INVALID_ID) 324 if (info->del_set.index != IPSET_INVALID_ID)
@@ -370,6 +366,140 @@ set_target_v2(struct sk_buff *skb, const struct xt_action_param *par)
370#define set_target_v2_checkentry set_target_v1_checkentry 366#define set_target_v2_checkentry set_target_v1_checkentry
371#define set_target_v2_destroy set_target_v1_destroy 367#define set_target_v2_destroy set_target_v1_destroy
372 368
369/* Revision 3 target */
370
371static unsigned int
372set_target_v3(struct sk_buff *skb, const struct xt_action_param *par)
373{
374 const struct xt_set_info_target_v3 *info = par->targinfo;
375 ADT_OPT(add_opt, par->family, info->add_set.dim,
376 info->add_set.flags, info->flags, info->timeout);
377 ADT_OPT(del_opt, par->family, info->del_set.dim,
378 info->del_set.flags, 0, UINT_MAX);
379 ADT_OPT(map_opt, par->family, info->map_set.dim,
380 info->map_set.flags, 0, UINT_MAX);
381
382 int ret;
383
384 /* Normalize to fit into jiffies */
385 if (add_opt.ext.timeout != IPSET_NO_TIMEOUT &&
386 add_opt.ext.timeout > UINT_MAX/MSEC_PER_SEC)
387 add_opt.ext.timeout = UINT_MAX/MSEC_PER_SEC;
388 if (info->add_set.index != IPSET_INVALID_ID)
389 ip_set_add(info->add_set.index, skb, par, &add_opt);
390 if (info->del_set.index != IPSET_INVALID_ID)
391 ip_set_del(info->del_set.index, skb, par, &del_opt);
392 if (info->map_set.index != IPSET_INVALID_ID) {
393 map_opt.cmdflags |= info->flags & (IPSET_FLAG_MAP_SKBMARK |
394 IPSET_FLAG_MAP_SKBPRIO |
395 IPSET_FLAG_MAP_SKBQUEUE);
396 ret = match_set(info->map_set.index, skb, par, &map_opt,
397 info->map_set.flags & IPSET_INV_MATCH);
398 if (!ret)
399 return XT_CONTINUE;
400 if (map_opt.cmdflags & IPSET_FLAG_MAP_SKBMARK)
401 skb->mark = (skb->mark & ~(map_opt.ext.skbmarkmask))
402 ^ (map_opt.ext.skbmark);
403 if (map_opt.cmdflags & IPSET_FLAG_MAP_SKBPRIO)
404 skb->priority = map_opt.ext.skbprio;
405 if ((map_opt.cmdflags & IPSET_FLAG_MAP_SKBQUEUE) &&
406 skb->dev &&
407 skb->dev->real_num_tx_queues > map_opt.ext.skbqueue)
408 skb_set_queue_mapping(skb, map_opt.ext.skbqueue);
409 }
410 return XT_CONTINUE;
411}
412
413
414static int
415set_target_v3_checkentry(const struct xt_tgchk_param *par)
416{
417 const struct xt_set_info_target_v3 *info = par->targinfo;
418 ip_set_id_t index;
419
420 if (info->add_set.index != IPSET_INVALID_ID) {
421 index = ip_set_nfnl_get_byindex(par->net,
422 info->add_set.index);
423 if (index == IPSET_INVALID_ID) {
424 pr_warn("Cannot find add_set index %u as target\n",
425 info->add_set.index);
426 return -ENOENT;
427 }
428 }
429
430 if (info->del_set.index != IPSET_INVALID_ID) {
431 index = ip_set_nfnl_get_byindex(par->net,
432 info->del_set.index);
433 if (index == IPSET_INVALID_ID) {
434 pr_warn("Cannot find del_set index %u as target\n",
435 info->del_set.index);
436 if (info->add_set.index != IPSET_INVALID_ID)
437 ip_set_nfnl_put(par->net,
438 info->add_set.index);
439 return -ENOENT;
440 }
441 }
442
443 if (info->map_set.index != IPSET_INVALID_ID) {
444 if (strncmp(par->table, "mangle", 7)) {
445 pr_warn("--map-set only usable from mangle table\n");
446 return -EINVAL;
447 }
448 if (((info->flags & IPSET_FLAG_MAP_SKBPRIO) |
449 (info->flags & IPSET_FLAG_MAP_SKBQUEUE)) &&
450 !(par->hook_mask & (1 << NF_INET_FORWARD |
451 1 << NF_INET_LOCAL_OUT |
452 1 << NF_INET_POST_ROUTING))) {
453 pr_warn("mapping of prio or/and queue is allowed only"
454 "from OUTPUT/FORWARD/POSTROUTING chains\n");
455 return -EINVAL;
456 }
457 index = ip_set_nfnl_get_byindex(par->net,
458 info->map_set.index);
459 if (index == IPSET_INVALID_ID) {
460 pr_warn("Cannot find map_set index %u as target\n",
461 info->map_set.index);
462 if (info->add_set.index != IPSET_INVALID_ID)
463 ip_set_nfnl_put(par->net,
464 info->add_set.index);
465 if (info->del_set.index != IPSET_INVALID_ID)
466 ip_set_nfnl_put(par->net,
467 info->del_set.index);
468 return -ENOENT;
469 }
470 }
471
472 if (info->add_set.dim > IPSET_DIM_MAX ||
473 info->del_set.dim > IPSET_DIM_MAX ||
474 info->map_set.dim > IPSET_DIM_MAX) {
475 pr_warn("Protocol error: SET target dimension "
476 "is over the limit!\n");
477 if (info->add_set.index != IPSET_INVALID_ID)
478 ip_set_nfnl_put(par->net, info->add_set.index);
479 if (info->del_set.index != IPSET_INVALID_ID)
480 ip_set_nfnl_put(par->net, info->del_set.index);
481 if (info->map_set.index != IPSET_INVALID_ID)
482 ip_set_nfnl_put(par->net, info->map_set.index);
483 return -ERANGE;
484 }
485
486 return 0;
487}
488
489static void
490set_target_v3_destroy(const struct xt_tgdtor_param *par)
491{
492 const struct xt_set_info_target_v3 *info = par->targinfo;
493
494 if (info->add_set.index != IPSET_INVALID_ID)
495 ip_set_nfnl_put(par->net, info->add_set.index);
496 if (info->del_set.index != IPSET_INVALID_ID)
497 ip_set_nfnl_put(par->net, info->del_set.index);
498 if (info->map_set.index != IPSET_INVALID_ID)
499 ip_set_nfnl_put(par->net, info->map_set.index);
500}
501
502
373static struct xt_match set_matches[] __read_mostly = { 503static struct xt_match set_matches[] __read_mostly = {
374 { 504 {
375 .name = "set", 505 .name = "set",
@@ -497,6 +627,27 @@ static struct xt_target set_targets[] __read_mostly = {
497 .destroy = set_target_v2_destroy, 627 .destroy = set_target_v2_destroy,
498 .me = THIS_MODULE 628 .me = THIS_MODULE
499 }, 629 },
630 /* --map-set support */
631 {
632 .name = "SET",
633 .revision = 3,
634 .family = NFPROTO_IPV4,
635 .target = set_target_v3,
636 .targetsize = sizeof(struct xt_set_info_target_v3),
637 .checkentry = set_target_v3_checkentry,
638 .destroy = set_target_v3_destroy,
639 .me = THIS_MODULE
640 },
641 {
642 .name = "SET",
643 .revision = 3,
644 .family = NFPROTO_IPV6,
645 .target = set_target_v3,
646 .targetsize = sizeof(struct xt_set_info_target_v3),
647 .checkentry = set_target_v3_checkentry,
648 .destroy = set_target_v3_destroy,
649 .me = THIS_MODULE
650 },
500}; 651};
501 652
502static int __init xt_set_init(void) 653static int __init xt_set_init(void)
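
For readers skimming the new revision-3 checkentry above: the pattern is "take references in order, release exactly what was already taken on failure". The following is a small userspace sketch of that unwinding discipline, not the xt_set code itself; get_ref() and put_ref() are invented stand-ins for ip_set_nfnl_get_byindex()/ip_set_nfnl_put().

#include <stdio.h>

/* Hypothetical stand-ins for the ipset reference helpers. */
static int get_ref(const char *name, int found)
{
    if (!found) {
        fprintf(stderr, "Cannot find %s\n", name);
        return -1;
    }
    printf("took reference on %s\n", name);
    return 0;
}

static void put_ref(const char *name)
{
    printf("dropped reference on %s\n", name);
}

/* Take add_set, del_set and map_set in order; on failure, drop only the
 * references taken so far, mirroring set_target_v3_checkentry(). */
static int checkentry_model(int have_add, int have_del, int have_map)
{
    if (get_ref("add_set", have_add))
        return -1;
    if (get_ref("del_set", have_del)) {
        put_ref("add_set");
        return -1;
    }
    if (get_ref("map_set", have_map)) {
        put_ref("del_set");
        put_ref("add_set");
        return -1;
    }
    return 0;
}

int main(void)
{
    checkentry_model(1, 1, 0);    /* map_set lookup fails: add/del are released */
    return 0;
}
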
diff --git a/net/netfilter/xt_string.c b/net/netfilter/xt_string.c
index d3c48b14ab94..5699adb97652 100644
--- a/net/netfilter/xt_string.c
+++ b/net/netfilter/xt_string.c
@@ -29,7 +29,6 @@ string_mt(const struct sk_buff *skb, struct xt_action_param *par)
29 struct ts_state state; 29 struct ts_state state;
30 bool invert; 30 bool invert;
31 31
32 memset(&state, 0, sizeof(struct ts_state));
33 invert = conf->u.v1.flags & XT_STRING_FLAG_INVERT; 32 invert = conf->u.v1.flags & XT_STRING_FLAG_INVERT;
34 33
35 return (skb_find_text((struct sk_buff *)skb, conf->from_offset, 34 return (skb_find_text((struct sk_buff *)skb, conf->from_offset,
diff --git a/net/netlabel/netlabel_user.c b/net/netlabel/netlabel_user.c
index 1e779bb7fa43..adf8b7900da2 100644
--- a/net/netlabel/netlabel_user.c
+++ b/net/netlabel/netlabel_user.c
@@ -71,11 +71,7 @@ int __init netlbl_netlink_init(void)
71 if (ret_val != 0) 71 if (ret_val != 0)
72 return ret_val; 72 return ret_val;
73 73
74 ret_val = netlbl_unlabel_genl_init(); 74 return netlbl_unlabel_genl_init();
75 if (ret_val != 0)
76 return ret_val;
77
78 return 0;
79} 75}
80 76
81/* 77/*
diff --git a/net/nfc/digital_dep.c b/net/nfc/digital_dep.c
index e1638dab076d..b60aa35c074f 100644
--- a/net/nfc/digital_dep.c
+++ b/net/nfc/digital_dep.c
@@ -33,6 +33,8 @@
33#define DIGITAL_ATR_REQ_MAX_SIZE 64 33#define DIGITAL_ATR_REQ_MAX_SIZE 64
34 34
35#define DIGITAL_LR_BITS_PAYLOAD_SIZE_254B 0x30 35#define DIGITAL_LR_BITS_PAYLOAD_SIZE_254B 0x30
36#define DIGITAL_FSL_BITS_PAYLOAD_SIZE_254B \
37 (DIGITAL_LR_BITS_PAYLOAD_SIZE_254B >> 4)
36#define DIGITAL_GB_BIT 0x02 38#define DIGITAL_GB_BIT 0x02
37 39
38#define DIGITAL_NFC_DEP_PFB_TYPE(pfb) ((pfb) & 0xE0) 40#define DIGITAL_NFC_DEP_PFB_TYPE(pfb) ((pfb) & 0xE0)
@@ -127,6 +129,98 @@ static int digital_skb_pull_dep_sod(struct nfc_digital_dev *ddev,
127 return 0; 129 return 0;
128} 130}
129 131
132static void digital_in_recv_psl_res(struct nfc_digital_dev *ddev, void *arg,
133 struct sk_buff *resp)
134{
135 struct nfc_target *target = arg;
136 struct digital_psl_res *psl_res;
137 int rc;
138
139 if (IS_ERR(resp)) {
140 rc = PTR_ERR(resp);
141 resp = NULL;
142 goto exit;
143 }
144
145 rc = ddev->skb_check_crc(resp);
146 if (rc) {
147 PROTOCOL_ERR("14.4.1.6");
148 goto exit;
149 }
150
151 rc = digital_skb_pull_dep_sod(ddev, resp);
152 if (rc) {
153 PROTOCOL_ERR("14.4.1.2");
154 goto exit;
155 }
156
157 psl_res = (struct digital_psl_res *)resp->data;
158
159 if ((resp->len != sizeof(*psl_res)) ||
160 (psl_res->dir != DIGITAL_NFC_DEP_FRAME_DIR_IN) ||
161 (psl_res->cmd != DIGITAL_CMD_PSL_RES)) {
162 rc = -EIO;
163 goto exit;
164 }
165
166 rc = digital_in_configure_hw(ddev, NFC_DIGITAL_CONFIG_RF_TECH,
167 NFC_DIGITAL_RF_TECH_424F);
168 if (rc)
169 goto exit;
170
171 rc = digital_in_configure_hw(ddev, NFC_DIGITAL_CONFIG_FRAMING,
172 NFC_DIGITAL_FRAMING_NFCF_NFC_DEP);
173 if (rc)
174 goto exit;
175
176 if (!DIGITAL_DRV_CAPS_IN_CRC(ddev) &&
177 (ddev->curr_rf_tech == NFC_DIGITAL_RF_TECH_106A)) {
178 ddev->skb_add_crc = digital_skb_add_crc_f;
179 ddev->skb_check_crc = digital_skb_check_crc_f;
180 }
181
182 ddev->curr_rf_tech = NFC_DIGITAL_RF_TECH_424F;
183
184 nfc_dep_link_is_up(ddev->nfc_dev, target->idx, NFC_COMM_ACTIVE,
185 NFC_RF_INITIATOR);
186
187 ddev->curr_nfc_dep_pni = 0;
188
189exit:
190 dev_kfree_skb(resp);
191
192 if (rc)
193 ddev->curr_protocol = 0;
194}
195
196static int digital_in_send_psl_req(struct nfc_digital_dev *ddev,
197 struct nfc_target *target)
198{
199 struct sk_buff *skb;
200 struct digital_psl_req *psl_req;
201
202 skb = digital_skb_alloc(ddev, sizeof(*psl_req));
203 if (!skb)
204 return -ENOMEM;
205
206 skb_put(skb, sizeof(*psl_req));
207
208 psl_req = (struct digital_psl_req *)skb->data;
209
210 psl_req->dir = DIGITAL_NFC_DEP_FRAME_DIR_OUT;
211 psl_req->cmd = DIGITAL_CMD_PSL_REQ;
212 psl_req->did = 0;
213 psl_req->brs = (0x2 << 3) | 0x2; /* 424F both directions */
214 psl_req->fsl = DIGITAL_FSL_BITS_PAYLOAD_SIZE_254B;
215
216 digital_skb_push_dep_sod(ddev, skb);
217
218 ddev->skb_add_crc(skb);
219
220 return digital_in_send_cmd(ddev, skb, 500, digital_in_recv_psl_res,
221 target);
222}
223
130static void digital_in_recv_atr_res(struct nfc_digital_dev *ddev, void *arg, 224static void digital_in_recv_atr_res(struct nfc_digital_dev *ddev, void *arg,
131 struct sk_buff *resp) 225 struct sk_buff *resp)
132{ 226{
@@ -166,6 +260,13 @@ static void digital_in_recv_atr_res(struct nfc_digital_dev *ddev, void *arg,
166 if (rc) 260 if (rc)
167 goto exit; 261 goto exit;
168 262
263 if ((ddev->protocols & NFC_PROTO_FELICA_MASK) &&
264 (ddev->curr_rf_tech != NFC_DIGITAL_RF_TECH_424F)) {
265 rc = digital_in_send_psl_req(ddev, target);
266 if (!rc)
267 goto exit;
268 }
269
169 rc = nfc_dep_link_is_up(ddev->nfc_dev, target->idx, NFC_COMM_ACTIVE, 270 rc = nfc_dep_link_is_up(ddev->nfc_dev, target->idx, NFC_COMM_ACTIVE,
170 NFC_RF_INITIATOR); 271 NFC_RF_INITIATOR);
171 272
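
A note on the PSL_REQ built above: BRS packs the send and receive bit-rate codes into one byte (code 0x2 selects 424 kbps, hence the "424F both directions" comment), and FSL is derived from the ATR LR bits shifted down by four. The snippet below is only an illustrative, standalone rendering of that packing; the macro names mirror the diff, but the bit-field reading is the sketch's own interpretation of the shifts.

#include <stdio.h>
#include <stdint.h>

#define LR_BITS_PAYLOAD_SIZE_254B   0x30
#define FSL_BITS_PAYLOAD_SIZE_254B  (LR_BITS_PAYLOAD_SIZE_254B >> 4)

int main(void)
{
    /* 0x2 = 424 kbps, packed for both directions as in the diff. */
    uint8_t brs = (0x2 << 3) | 0x2;
    uint8_t fsl = FSL_BITS_PAYLOAD_SIZE_254B;

    printf("BRS = 0x%02x (424 kbps both directions)\n", brs);
    printf("FSL = 0x%02x (max payload 254 bytes)\n", fsl);
    return 0;
}
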
diff --git a/net/nfc/nci/core.c b/net/nfc/nci/core.c
index 2b400e1a8695..90b16cb40058 100644
--- a/net/nfc/nci/core.c
+++ b/net/nfc/nci/core.c
@@ -231,6 +231,14 @@ static void nci_rf_discover_req(struct nci_dev *ndev, unsigned long opt)
231 cmd.num_disc_configs++; 231 cmd.num_disc_configs++;
232 } 232 }
233 233
234 if ((cmd.num_disc_configs < NCI_MAX_NUM_RF_CONFIGS) &&
235 (protocols & NFC_PROTO_ISO15693_MASK)) {
236 cmd.disc_configs[cmd.num_disc_configs].rf_tech_and_mode =
237 NCI_NFC_V_PASSIVE_POLL_MODE;
238 cmd.disc_configs[cmd.num_disc_configs].frequency = 1;
239 cmd.num_disc_configs++;
240 }
241
234 nci_send_cmd(ndev, NCI_OP_RF_DISCOVER_CMD, 242 nci_send_cmd(ndev, NCI_OP_RF_DISCOVER_CMD,
235 (1 + (cmd.num_disc_configs * sizeof(struct disc_config))), 243 (1 + (cmd.num_disc_configs * sizeof(struct disc_config))),
236 &cmd); 244 &cmd);
@@ -751,10 +759,6 @@ int nci_register_device(struct nci_dev *ndev)
751 struct device *dev = &ndev->nfc_dev->dev; 759 struct device *dev = &ndev->nfc_dev->dev;
752 char name[32]; 760 char name[32];
753 761
754 rc = nfc_register_device(ndev->nfc_dev);
755 if (rc)
756 goto exit;
757
758 ndev->flags = 0; 762 ndev->flags = 0;
759 763
760 INIT_WORK(&ndev->cmd_work, nci_cmd_work); 764 INIT_WORK(&ndev->cmd_work, nci_cmd_work);
@@ -762,7 +766,7 @@ int nci_register_device(struct nci_dev *ndev)
762 ndev->cmd_wq = create_singlethread_workqueue(name); 766 ndev->cmd_wq = create_singlethread_workqueue(name);
763 if (!ndev->cmd_wq) { 767 if (!ndev->cmd_wq) {
764 rc = -ENOMEM; 768 rc = -ENOMEM;
765 goto unreg_exit; 769 goto exit;
766 } 770 }
767 771
768 INIT_WORK(&ndev->rx_work, nci_rx_work); 772 INIT_WORK(&ndev->rx_work, nci_rx_work);
@@ -792,6 +796,10 @@ int nci_register_device(struct nci_dev *ndev)
792 796
793 mutex_init(&ndev->req_lock); 797 mutex_init(&ndev->req_lock);
794 798
799 rc = nfc_register_device(ndev->nfc_dev);
800 if (rc)
801 goto destroy_rx_wq_exit;
802
795 goto exit; 803 goto exit;
796 804
797destroy_rx_wq_exit: 805destroy_rx_wq_exit:
@@ -800,9 +808,6 @@ destroy_rx_wq_exit:
800destroy_cmd_wq_exit: 808destroy_cmd_wq_exit:
801 destroy_workqueue(ndev->cmd_wq); 809 destroy_workqueue(ndev->cmd_wq);
802 810
803unreg_exit:
804 nfc_unregister_device(ndev->nfc_dev);
805
806exit: 811exit:
807 return rc; 812 return rc;
808} 813}
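
The nci_register_device() reshuffle above makes nfc_register_device() the last step, presumably so nothing can call back into a device whose command/rx/tx workqueues do not exist yet, and it drops the now-unneeded unreg_exit unwind. A minimal userspace model of that "register last, unwind in reverse" shape (all names and resources here are invented for the sketch):

#include <stdio.h>

static int create(const char *what, int ok)
{
    if (!ok)
        return -1;
    printf("created %s\n", what);
    return 0;
}

static void destroy(const char *what)
{
    printf("destroyed %s\n", what);
}

/* Make the device visible to the outside world only once every internal
 * resource exists; on failure, tear down only what was actually built. */
static int register_device(int wq_ok, int reg_ok)
{
    int rc;

    rc = create("cmd workqueue", wq_ok);
    if (rc)
        goto exit;
    rc = create("rx workqueue", wq_ok);
    if (rc)
        goto destroy_cmd;
    rc = create("public registration", reg_ok);
    if (rc)
        goto destroy_rx;
    return 0;

destroy_rx:
    destroy("rx workqueue");
destroy_cmd:
    destroy("cmd workqueue");
exit:
    return rc;
}

int main(void)
{
    register_device(1, 0);    /* registration fails: both workqueues unwound */
    return 0;
}
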
diff --git a/net/nfc/nci/data.c b/net/nfc/nci/data.c
index 6c3aef852876..427ef2c7ab68 100644
--- a/net/nfc/nci/data.c
+++ b/net/nfc/nci/data.c
@@ -241,9 +241,12 @@ void nci_rx_data_packet(struct nci_dev *ndev, struct sk_buff *skb)
241 /* strip the nci data header */ 241 /* strip the nci data header */
242 skb_pull(skb, NCI_DATA_HDR_SIZE); 242 skb_pull(skb, NCI_DATA_HDR_SIZE);
243 243
244 if (ndev->target_active_prot == NFC_PROTO_MIFARE) { 244 if (ndev->target_active_prot == NFC_PROTO_MIFARE ||
245 ndev->target_active_prot == NFC_PROTO_JEWEL ||
246 ndev->target_active_prot == NFC_PROTO_FELICA ||
247 ndev->target_active_prot == NFC_PROTO_ISO15693) {
245 /* frame I/F => remove the status byte */ 248 /* frame I/F => remove the status byte */
246 pr_debug("NFC_PROTO_MIFARE => remove the status byte\n"); 249 pr_debug("frame I/F => remove the status byte\n");
247 skb_trim(skb, (skb->len - 1)); 250 skb_trim(skb, (skb->len - 1));
248 } 251 }
249 252
diff --git a/net/nfc/nci/ntf.c b/net/nfc/nci/ntf.c
index df91bb95b12a..205b35f666db 100644
--- a/net/nfc/nci/ntf.c
+++ b/net/nfc/nci/ntf.c
@@ -2,6 +2,7 @@
2 * The NFC Controller Interface is the communication protocol between an 2 * The NFC Controller Interface is the communication protocol between an
3 * NFC Controller (NFCC) and a Device Host (DH). 3 * NFC Controller (NFCC) and a Device Host (DH).
4 * 4 *
5 * Copyright (C) 2014 Marvell International Ltd.
5 * Copyright (C) 2011 Texas Instruments, Inc. 6 * Copyright (C) 2011 Texas Instruments, Inc.
6 * 7 *
7 * Written by Ilan Elias <ilane@ti.com> 8 * Written by Ilan Elias <ilane@ti.com>
@@ -155,6 +156,24 @@ static __u8 *nci_extract_rf_params_nfcf_passive_poll(struct nci_dev *ndev,
155 return data; 156 return data;
156} 157}
157 158
159static __u8 *nci_extract_rf_params_nfcv_passive_poll(struct nci_dev *ndev,
160 struct rf_tech_specific_params_nfcv_poll *nfcv_poll,
161 __u8 *data)
162{
163 ++data;
164 nfcv_poll->dsfid = *data++;
165 memcpy(nfcv_poll->uid, data, NFC_ISO15693_UID_MAXSIZE);
166 data += NFC_ISO15693_UID_MAXSIZE;
167 return data;
168}
169
170__u32 nci_get_prop_rf_protocol(struct nci_dev *ndev, __u8 rf_protocol)
171{
172 if (ndev->ops->get_rfprotocol)
173 return ndev->ops->get_rfprotocol(ndev, rf_protocol);
174 return 0;
175}
176
158static int nci_add_new_protocol(struct nci_dev *ndev, 177static int nci_add_new_protocol(struct nci_dev *ndev,
159 struct nfc_target *target, 178 struct nfc_target *target,
160 __u8 rf_protocol, 179 __u8 rf_protocol,
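
nci_get_prop_rf_protocol() above gives drivers an optional hook for translating proprietary RF protocol values, falling back to 0 when no hook is provided. Below is a compressed userspace model of that optional-ops pattern; the struct layout and names are invented for the sketch and are not the NCI API.

#include <stdio.h>
#include <stdint.h>

struct dev_ops {
    /* Optional: map a vendor RF protocol value to a protocol mask. */
    uint32_t (*get_rfprotocol)(uint8_t rf_protocol);
};

struct dev {
    const struct dev_ops *ops;
};

static uint32_t vendor_hook(uint8_t rf_protocol)
{
    return rf_protocol == 0x80 ? 0x100 : 0;
}

static const struct dev_ops vendor_ops = { .get_rfprotocol = vendor_hook };
static const struct dev_ops plain_ops = { 0 };

static uint32_t get_prop_rf_protocol(const struct dev *d, uint8_t rf_protocol)
{
    if (d->ops->get_rfprotocol)
        return d->ops->get_rfprotocol(rf_protocol);
    return 0;    /* no driver hook: protocol stays unknown */
}

int main(void)
{
    struct dev with = { .ops = &vendor_ops };
    struct dev without = { .ops = &plain_ops };

    printf("with hook: 0x%x, without: 0x%x\n",
           get_prop_rf_protocol(&with, 0x80),
           get_prop_rf_protocol(&without, 0x80));
    return 0;
}
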
@@ -164,6 +183,7 @@ static int nci_add_new_protocol(struct nci_dev *ndev,
164 struct rf_tech_specific_params_nfca_poll *nfca_poll; 183 struct rf_tech_specific_params_nfca_poll *nfca_poll;
165 struct rf_tech_specific_params_nfcb_poll *nfcb_poll; 184 struct rf_tech_specific_params_nfcb_poll *nfcb_poll;
166 struct rf_tech_specific_params_nfcf_poll *nfcf_poll; 185 struct rf_tech_specific_params_nfcf_poll *nfcf_poll;
186 struct rf_tech_specific_params_nfcv_poll *nfcv_poll;
167 __u32 protocol; 187 __u32 protocol;
168 188
169 if (rf_protocol == NCI_RF_PROTOCOL_T1T) 189 if (rf_protocol == NCI_RF_PROTOCOL_T1T)
@@ -179,8 +199,10 @@ static int nci_add_new_protocol(struct nci_dev *ndev,
179 protocol = NFC_PROTO_FELICA_MASK; 199 protocol = NFC_PROTO_FELICA_MASK;
180 else if (rf_protocol == NCI_RF_PROTOCOL_NFC_DEP) 200 else if (rf_protocol == NCI_RF_PROTOCOL_NFC_DEP)
181 protocol = NFC_PROTO_NFC_DEP_MASK; 201 protocol = NFC_PROTO_NFC_DEP_MASK;
202 else if (rf_protocol == NCI_RF_PROTOCOL_T5T)
203 protocol = NFC_PROTO_ISO15693_MASK;
182 else 204 else
183 protocol = 0; 205 protocol = nci_get_prop_rf_protocol(ndev, rf_protocol);
184 206
185 if (!(protocol & ndev->poll_prots)) { 207 if (!(protocol & ndev->poll_prots)) {
186 pr_err("the target found does not have the desired protocol\n"); 208 pr_err("the target found does not have the desired protocol\n");
@@ -213,6 +235,12 @@ static int nci_add_new_protocol(struct nci_dev *ndev,
213 memcpy(target->sensf_res, nfcf_poll->sensf_res, 235 memcpy(target->sensf_res, nfcf_poll->sensf_res,
214 target->sensf_res_len); 236 target->sensf_res_len);
215 } 237 }
238 } else if (rf_tech_and_mode == NCI_NFC_V_PASSIVE_POLL_MODE) {
239 nfcv_poll = (struct rf_tech_specific_params_nfcv_poll *)params;
240
241 target->is_iso15693 = 1;
242 target->iso15693_dsfid = nfcv_poll->dsfid;
243 memcpy(target->iso15693_uid, nfcv_poll->uid, NFC_ISO15693_UID_MAXSIZE);
216 } else { 244 } else {
217 pr_err("unsupported rf_tech_and_mode 0x%x\n", rf_tech_and_mode); 245 pr_err("unsupported rf_tech_and_mode 0x%x\n", rf_tech_and_mode);
218 return -EPROTO; 246 return -EPROTO;
@@ -305,6 +333,11 @@ static void nci_rf_discover_ntf_packet(struct nci_dev *ndev,
305 &(ntf.rf_tech_specific_params.nfcf_poll), data); 333 &(ntf.rf_tech_specific_params.nfcf_poll), data);
306 break; 334 break;
307 335
336 case NCI_NFC_V_PASSIVE_POLL_MODE:
337 data = nci_extract_rf_params_nfcv_passive_poll(ndev,
338 &(ntf.rf_tech_specific_params.nfcv_poll), data);
339 break;
340
308 default: 341 default:
309 pr_err("unsupported rf_tech_and_mode 0x%x\n", 342 pr_err("unsupported rf_tech_and_mode 0x%x\n",
310 ntf.rf_tech_and_mode); 343 ntf.rf_tech_and_mode);
@@ -455,6 +488,11 @@ static void nci_rf_intf_activated_ntf_packet(struct nci_dev *ndev,
455 &(ntf.rf_tech_specific_params.nfcf_poll), data); 488 &(ntf.rf_tech_specific_params.nfcf_poll), data);
456 break; 489 break;
457 490
491 case NCI_NFC_V_PASSIVE_POLL_MODE:
492 data = nci_extract_rf_params_nfcv_passive_poll(ndev,
493 &(ntf.rf_tech_specific_params.nfcv_poll), data);
494 break;
495
458 default: 496 default:
459 pr_err("unsupported activation_rf_tech_and_mode 0x%x\n", 497 pr_err("unsupported activation_rf_tech_and_mode 0x%x\n",
460 ntf.activation_rf_tech_and_mode); 498 ntf.activation_rf_tech_and_mode);
diff --git a/net/openvswitch/Kconfig b/net/openvswitch/Kconfig
index 6ecf491ad509..ba3bb8203b99 100644
--- a/net/openvswitch/Kconfig
+++ b/net/openvswitch/Kconfig
@@ -54,3 +54,14 @@ config OPENVSWITCH_VXLAN
54 Say N to exclude this support and reduce the binary size. 54 Say N to exclude this support and reduce the binary size.
55 55
56 If unsure, say Y. 56 If unsure, say Y.
57
58config OPENVSWITCH_GENEVE
59 bool "Open vSwitch Geneve tunneling support"
60 depends on INET
61 depends on OPENVSWITCH
62 depends on GENEVE && !(OPENVSWITCH=y && GENEVE=m)
63 default y
64 ---help---
65 If you say Y here, then Open vSwitch will be able to create Geneve vports.
66
67 Say N to exclude this support and reduce the binary size.
diff --git a/net/openvswitch/Makefile b/net/openvswitch/Makefile
index 3591cb5dae91..9a33a273c375 100644
--- a/net/openvswitch/Makefile
+++ b/net/openvswitch/Makefile
@@ -15,6 +15,10 @@ openvswitch-y := \
15 vport-internal_dev.o \ 15 vport-internal_dev.o \
16 vport-netdev.o 16 vport-netdev.o
17 17
18ifneq ($(CONFIG_OPENVSWITCH_GENEVE),)
19openvswitch-y += vport-geneve.o
20endif
21
18ifneq ($(CONFIG_OPENVSWITCH_VXLAN),) 22ifneq ($(CONFIG_OPENVSWITCH_VXLAN),)
19openvswitch-y += vport-vxlan.o 23openvswitch-y += vport-vxlan.o
20endif 24endif
diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
index 5231652a95d9..006886dbee36 100644
--- a/net/openvswitch/actions.c
+++ b/net/openvswitch/actions.c
@@ -1,5 +1,5 @@
1/* 1/*
2 * Copyright (c) 2007-2013 Nicira, Inc. 2 * Copyright (c) 2007-2014 Nicira, Inc.
3 * 3 *
4 * This program is free software; you can redistribute it and/or 4 * This program is free software; you can redistribute it and/or
5 * modify it under the terms of version 2 of the GNU General Public 5 * modify it under the terms of version 2 of the GNU General Public
@@ -35,11 +35,78 @@
35#include <net/sctp/checksum.h> 35#include <net/sctp/checksum.h>
36 36
37#include "datapath.h" 37#include "datapath.h"
38#include "flow.h"
38#include "vport.h" 39#include "vport.h"
39 40
40static int do_execute_actions(struct datapath *dp, struct sk_buff *skb, 41static int do_execute_actions(struct datapath *dp, struct sk_buff *skb,
42 struct sw_flow_key *key,
41 const struct nlattr *attr, int len); 43 const struct nlattr *attr, int len);
42 44
45struct deferred_action {
46 struct sk_buff *skb;
47 const struct nlattr *actions;
48
49 /* Store pkt_key clone when creating deferred action. */
50 struct sw_flow_key pkt_key;
51};
52
53#define DEFERRED_ACTION_FIFO_SIZE 10
54struct action_fifo {
55 int head;
56 int tail;
57 /* Deferred action fifo queue storage. */
58 struct deferred_action fifo[DEFERRED_ACTION_FIFO_SIZE];
59};
60
61static struct action_fifo __percpu *action_fifos;
62static DEFINE_PER_CPU(int, exec_actions_level);
63
64static void action_fifo_init(struct action_fifo *fifo)
65{
66 fifo->head = 0;
67 fifo->tail = 0;
68}
69
70static bool action_fifo_is_empty(struct action_fifo *fifo)
71{
72 return (fifo->head == fifo->tail);
73}
74
75static struct deferred_action *action_fifo_get(struct action_fifo *fifo)
76{
77 if (action_fifo_is_empty(fifo))
78 return NULL;
79
80 return &fifo->fifo[fifo->tail++];
81}
82
83static struct deferred_action *action_fifo_put(struct action_fifo *fifo)
84{
85 if (fifo->head >= DEFERRED_ACTION_FIFO_SIZE - 1)
86 return NULL;
87
88 return &fifo->fifo[fifo->head++];
89}
90
91/* Return the deferred action slot to fill, or NULL if the fifo is full */
92static struct deferred_action *add_deferred_actions(struct sk_buff *skb,
93 struct sw_flow_key *key,
94 const struct nlattr *attr)
95{
96 struct action_fifo *fifo;
97 struct deferred_action *da;
98
99 fifo = this_cpu_ptr(action_fifos);
100 da = action_fifo_put(fifo);
101 if (da) {
102 da->skb = skb;
103 da->actions = attr;
104 da->pkt_key = *key;
105 }
106
107 return da;
108}
109
43static int make_writable(struct sk_buff *skb, int write_len) 110static int make_writable(struct sk_buff *skb, int write_len)
44{ 111{
45 if (!pskb_may_pull(skb, write_len)) 112 if (!pskb_may_pull(skb, write_len))
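
The action_fifo machinery above replaces recursion with a bounded, per-CPU list of deferred actions. The sketch below models the same head/tail discipline on a single FIFO in plain userspace C: no per-CPU allocation, no skbs, just an integer payload; only the put/get/drain shape is taken from the diff.

#include <stdio.h>
#include <stdbool.h>

#define FIFO_SIZE 10

struct fifo {
    int head;
    int tail;
    int items[FIFO_SIZE];
};

static bool fifo_is_empty(const struct fifo *f)
{
    return f->head == f->tail;
}

/* Returns a slot to fill, or NULL when the FIFO is full. */
static int *fifo_put(struct fifo *f)
{
    if (f->head >= FIFO_SIZE - 1)
        return NULL;
    return &f->items[f->head++];
}

/* Returns the next queued item, or NULL when the FIFO is empty. */
static int *fifo_get(struct fifo *f)
{
    if (fifo_is_empty(f))
        return NULL;
    return &f->items[f->tail++];
}

int main(void)
{
    struct fifo f = { 0 };
    int *slot, *item;
    int i;

    for (i = 0; i < 3; i++) {
        slot = fifo_put(&f);
        if (slot)
            *slot = i;    /* defer "action" i for later */
    }

    while ((item = fifo_get(&f)))    /* drain, as process_deferred_actions() does */
        printf("executing deferred action %d\n", *item);

    return 0;
}
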
@@ -410,16 +477,14 @@ static int do_output(struct datapath *dp, struct sk_buff *skb, int out_port)
410} 477}
411 478
412static int output_userspace(struct datapath *dp, struct sk_buff *skb, 479static int output_userspace(struct datapath *dp, struct sk_buff *skb,
413 const struct nlattr *attr) 480 struct sw_flow_key *key, const struct nlattr *attr)
414{ 481{
415 struct dp_upcall_info upcall; 482 struct dp_upcall_info upcall;
416 const struct nlattr *a; 483 const struct nlattr *a;
417 int rem; 484 int rem;
418 485
419 BUG_ON(!OVS_CB(skb)->pkt_key);
420
421 upcall.cmd = OVS_PACKET_CMD_ACTION; 486 upcall.cmd = OVS_PACKET_CMD_ACTION;
422 upcall.key = OVS_CB(skb)->pkt_key; 487 upcall.key = key;
423 upcall.userdata = NULL; 488 upcall.userdata = NULL;
424 upcall.portid = 0; 489 upcall.portid = 0;
425 490
@@ -445,11 +510,10 @@ static bool last_action(const struct nlattr *a, int rem)
445} 510}
446 511
447static int sample(struct datapath *dp, struct sk_buff *skb, 512static int sample(struct datapath *dp, struct sk_buff *skb,
448 const struct nlattr *attr) 513 struct sw_flow_key *key, const struct nlattr *attr)
449{ 514{
450 const struct nlattr *acts_list = NULL; 515 const struct nlattr *acts_list = NULL;
451 const struct nlattr *a; 516 const struct nlattr *a;
452 struct sk_buff *sample_skb;
453 int rem; 517 int rem;
454 518
455 for (a = nla_data(attr), rem = nla_len(attr); rem > 0; 519 for (a = nla_data(attr), rem = nla_len(attr); rem > 0;
@@ -469,31 +533,47 @@ static int sample(struct datapath *dp, struct sk_buff *skb,
469 rem = nla_len(acts_list); 533 rem = nla_len(acts_list);
470 a = nla_data(acts_list); 534 a = nla_data(acts_list);
471 535
472 /* Actions list is either empty or only contains a single user-space 536 /* Actions list is empty, do nothing */
473 * action, the latter being a special case as it is the only known 537 if (unlikely(!rem))
474 * usage of the sample action. 538 return 0;
475 * In these special cases don't clone the skb as there are no 539
476 * side-effects in the nested actions. 540 /* The only known usage of sample action is having a single user-space
477 * Otherwise, clone in case the nested actions have side effects. 541 * action. Treat this usage as a special case.
542 * The output_userspace() should clone the skb to be sent to the
543 * user space. This skb will be consumed by its caller.
478 */ 544 */
479 if (likely(rem == 0 || (nla_type(a) == OVS_ACTION_ATTR_USERSPACE && 545 if (likely(nla_type(a) == OVS_ACTION_ATTR_USERSPACE &&
480 last_action(a, rem)))) { 546 last_action(a, rem)))
481 sample_skb = skb; 547 return output_userspace(dp, skb, key, a);
482 skb_get(skb); 548
483 } else { 549 skb = skb_clone(skb, GFP_ATOMIC);
484 sample_skb = skb_clone(skb, GFP_ATOMIC); 550 if (!skb)
485 if (!sample_skb) /* Skip sample action when out of memory. */ 551 /* Skip the sample action when out of memory. */
486 return 0; 552 return 0;
553
554 if (!add_deferred_actions(skb, key, a)) {
555 if (net_ratelimit())
556 pr_warn("%s: deferred actions limit reached, dropping sample action\n",
557 ovs_dp_name(dp));
558
559 kfree_skb(skb);
487 } 560 }
561 return 0;
562}
488 563
489 /* Note that do_execute_actions() never consumes skb. 564static void execute_hash(struct sk_buff *skb, struct sw_flow_key *key,
490 * In the case where skb has been cloned above it is the clone that 565 const struct nlattr *attr)
491 * is consumed. Otherwise the skb_get(skb) call prevents 566{
492 * consumption by do_execute_actions(). Thus, it is safe to simply 567 struct ovs_action_hash *hash_act = nla_data(attr);
493 * return the error code and let the caller (also 568 u32 hash = 0;
494 * do_execute_actions()) free skb on error. 569
495 */ 570 /* OVS_HASH_ALG_L4 is the only possible hash algorithm. */
496 return do_execute_actions(dp, sample_skb, a, rem); 571 hash = skb_get_hash(skb);
572 hash = jhash_1word(hash, hash_act->hash_basis);
573 if (!hash)
574 hash = 0x1;
575
576 key->ovs_flow_hash = hash;
497} 577}
498 578
499static int execute_set_action(struct sk_buff *skb, 579static int execute_set_action(struct sk_buff *skb,
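
execute_hash() above computes the skb hash, mixes it with the per-action basis and forces the result to be non-zero, presumably so a zero ovs_flow_hash can keep meaning "no hash computed". The standalone sketch below only illustrates that shape; mix() is an arbitrary stand-in, not jhash_1word().

#include <stdio.h>
#include <stdint.h>

/* Arbitrary 32-bit mixer standing in for jhash_1word(). */
static uint32_t mix(uint32_t hash, uint32_t basis)
{
    uint32_t h = hash ^ basis;

    h *= 0x9e3779b1u;
    h ^= h >> 16;
    return h;
}

static uint32_t flow_hash(uint32_t skb_hash, uint32_t hash_basis)
{
    uint32_t hash = mix(skb_hash, hash_basis);

    if (!hash)
        hash = 0x1;    /* keep zero reserved for "no hash" */
    return hash;
}

int main(void)
{
    printf("0x%08x\n", flow_hash(0xdeadbeef, 42));
    printf("0x%08x\n", flow_hash(0, 0));    /* still non-zero */
    return 0;
}
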
@@ -510,8 +590,8 @@ static int execute_set_action(struct sk_buff *skb,
510 skb->mark = nla_get_u32(nested_attr); 590 skb->mark = nla_get_u32(nested_attr);
511 break; 591 break;
512 592
513 case OVS_KEY_ATTR_IPV4_TUNNEL: 593 case OVS_KEY_ATTR_TUNNEL_INFO:
514 OVS_CB(skb)->tun_key = nla_data(nested_attr); 594 OVS_CB(skb)->egress_tun_info = nla_data(nested_attr);
515 break; 595 break;
516 596
517 case OVS_KEY_ATTR_ETHERNET: 597 case OVS_KEY_ATTR_ETHERNET:
@@ -542,8 +622,47 @@ static int execute_set_action(struct sk_buff *skb,
542 return err; 622 return err;
543} 623}
544 624
625static int execute_recirc(struct datapath *dp, struct sk_buff *skb,
626 struct sw_flow_key *key,
627 const struct nlattr *a, int rem)
628{
629 struct deferred_action *da;
630 int err;
631
632 err = ovs_flow_key_update(skb, key);
633 if (err)
634 return err;
635
636 if (!last_action(a, rem)) {
637 /* Recirc action is not the last action
638 * of the action list, need to clone the skb.
639 */
640 skb = skb_clone(skb, GFP_ATOMIC);
641
642 /* Skip the recirc action when out of memory, but
643 * continue on with the rest of the action list.
644 */
645 if (!skb)
646 return 0;
647 }
648
649 da = add_deferred_actions(skb, key, NULL);
650 if (da) {
651 da->pkt_key.recirc_id = nla_get_u32(a);
652 } else {
653 kfree_skb(skb);
654
655 if (net_ratelimit())
656 pr_warn("%s: deferred action limit reached, dropping recirc action\n",
657 ovs_dp_name(dp));
658 }
659
660 return 0;
661}
662
545/* Execute a list of actions against 'skb'. */ 663/* Execute a list of actions against 'skb'. */
546static int do_execute_actions(struct datapath *dp, struct sk_buff *skb, 664static int do_execute_actions(struct datapath *dp, struct sk_buff *skb,
665 struct sw_flow_key *key,
547 const struct nlattr *attr, int len) 666 const struct nlattr *attr, int len)
548{ 667{
549 /* Every output action needs a separate clone of 'skb', but the common 668 /* Every output action needs a separate clone of 'skb', but the common
@@ -569,7 +688,11 @@ static int do_execute_actions(struct datapath *dp, struct sk_buff *skb,
569 break; 688 break;
570 689
571 case OVS_ACTION_ATTR_USERSPACE: 690 case OVS_ACTION_ATTR_USERSPACE:
572 output_userspace(dp, skb, a); 691 output_userspace(dp, skb, key, a);
692 break;
693
694 case OVS_ACTION_ATTR_HASH:
695 execute_hash(skb, key, a);
573 break; 696 break;
574 697
575 case OVS_ACTION_ATTR_PUSH_VLAN: 698 case OVS_ACTION_ATTR_PUSH_VLAN:
@@ -582,12 +705,23 @@ static int do_execute_actions(struct datapath *dp, struct sk_buff *skb,
582 err = pop_vlan(skb); 705 err = pop_vlan(skb);
583 break; 706 break;
584 707
708 case OVS_ACTION_ATTR_RECIRC:
709 err = execute_recirc(dp, skb, key, a, rem);
710 if (last_action(a, rem)) {
711 /* If this is the last action, the skb has
712 * been consumed or freed.
713 * Return immediately.
714 */
715 return err;
716 }
717 break;
718
585 case OVS_ACTION_ATTR_SET: 719 case OVS_ACTION_ATTR_SET:
586 err = execute_set_action(skb, nla_data(a)); 720 err = execute_set_action(skb, nla_data(a));
587 break; 721 break;
588 722
589 case OVS_ACTION_ATTR_SAMPLE: 723 case OVS_ACTION_ATTR_SAMPLE:
590 err = sample(dp, skb, a); 724 err = sample(dp, skb, key, a);
591 if (unlikely(err)) /* skb already freed. */ 725 if (unlikely(err)) /* skb already freed. */
592 return err; 726 return err;
593 break; 727 break;
@@ -607,11 +741,64 @@ static int do_execute_actions(struct datapath *dp, struct sk_buff *skb,
607 return 0; 741 return 0;
608} 742}
609 743
744static void process_deferred_actions(struct datapath *dp)
745{
746 struct action_fifo *fifo = this_cpu_ptr(action_fifos);
747
748 /* Do not touch the FIFO if there are no deferred actions. */
749 if (action_fifo_is_empty(fifo))
750 return;
751
752 /* Finish executing all deferred actions. */
753 do {
754 struct deferred_action *da = action_fifo_get(fifo);
755 struct sk_buff *skb = da->skb;
756 struct sw_flow_key *key = &da->pkt_key;
757 const struct nlattr *actions = da->actions;
758
759 if (actions)
760 do_execute_actions(dp, skb, key, actions,
761 nla_len(actions));
762 else
763 ovs_dp_process_packet(skb, key);
764 } while (!action_fifo_is_empty(fifo));
765
766 /* Reset FIFO for the next packet. */
767 action_fifo_init(fifo);
768}
769
610/* Execute a list of actions against 'skb'. */ 770/* Execute a list of actions against 'skb'. */
611int ovs_execute_actions(struct datapath *dp, struct sk_buff *skb) 771int ovs_execute_actions(struct datapath *dp, struct sk_buff *skb,
772 struct sw_flow_key *key)
612{ 773{
613 struct sw_flow_actions *acts = rcu_dereference(OVS_CB(skb)->flow->sf_acts); 774 int level = this_cpu_read(exec_actions_level);
775 struct sw_flow_actions *acts;
776 int err;
777
778 acts = rcu_dereference(OVS_CB(skb)->flow->sf_acts);
779
780 this_cpu_inc(exec_actions_level);
781 OVS_CB(skb)->egress_tun_info = NULL;
782 err = do_execute_actions(dp, skb, key,
783 acts->actions, acts->actions_len);
614 784
615 OVS_CB(skb)->tun_key = NULL; 785 if (!level)
616 return do_execute_actions(dp, skb, acts->actions, acts->actions_len); 786 process_deferred_actions(dp);
787
788 this_cpu_dec(exec_actions_level);
789 return err;
790}
791
792int action_fifos_init(void)
793{
794 action_fifos = alloc_percpu(struct action_fifo);
795 if (!action_fifos)
796 return -ENOMEM;
797
798 return 0;
799}
800
801void action_fifos_exit(void)
802{
803 free_percpu(action_fifos);
617} 804}
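
The rewritten ovs_execute_actions() brackets execution with a per-CPU nesting counter so that only the outermost call drains the deferred FIFO; re-entrant executions (for example when output loops a packet back into the datapath) leave the draining to the outermost caller, while sample and recirc queue work on the FIFO instead of recursing. Here is a single-threaded toy of that guard, with a plain counter standing in for the per-CPU level and FIFO:

#include <stdio.h>

static int exec_level;
static int deferred;    /* stand-in for the per-CPU action FIFO depth */

static void process_deferred(void)
{
    while (deferred > 0) {
        printf("processing deferred action\n");
        deferred--;
    }
}

static void execute_actions(int defers_work)
{
    int level = exec_level++;

    printf("executing actions at level %d\n", level);
    if (defers_work)    /* e.g. a recirc or sample action queued something */
        deferred++;

    if (!level)         /* only the outermost invocation drains the FIFO */
        process_deferred();
    exec_level--;
}

int main(void)
{
    execute_actions(1);
    return 0;
}
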
diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
index 64dc864a417f..2e31d9e7f4dc 100644
--- a/net/openvswitch/datapath.c
+++ b/net/openvswitch/datapath.c
@@ -157,7 +157,7 @@ static struct datapath *get_dp(struct net *net, int dp_ifindex)
157} 157}
158 158
159/* Must be called with rcu_read_lock or ovs_mutex. */ 159/* Must be called with rcu_read_lock or ovs_mutex. */
160static const char *ovs_dp_name(const struct datapath *dp) 160const char *ovs_dp_name(const struct datapath *dp)
161{ 161{
162 struct vport *vport = ovs_vport_ovsl_rcu(dp, OVSP_LOCAL); 162 struct vport *vport = ovs_vport_ovsl_rcu(dp, OVSP_LOCAL);
163 return vport->ops->get_name(vport); 163 return vport->ops->get_name(vport);
@@ -238,32 +238,25 @@ void ovs_dp_detach_port(struct vport *p)
238} 238}
239 239
240/* Must be called with rcu_read_lock. */ 240/* Must be called with rcu_read_lock. */
241void ovs_dp_process_received_packet(struct vport *p, struct sk_buff *skb) 241void ovs_dp_process_packet(struct sk_buff *skb, struct sw_flow_key *key)
242{ 242{
243 const struct vport *p = OVS_CB(skb)->input_vport;
243 struct datapath *dp = p->dp; 244 struct datapath *dp = p->dp;
244 struct sw_flow *flow; 245 struct sw_flow *flow;
245 struct dp_stats_percpu *stats; 246 struct dp_stats_percpu *stats;
246 struct sw_flow_key key;
247 u64 *stats_counter; 247 u64 *stats_counter;
248 u32 n_mask_hit; 248 u32 n_mask_hit;
249 int error;
250 249
251 stats = this_cpu_ptr(dp->stats_percpu); 250 stats = this_cpu_ptr(dp->stats_percpu);
252 251
253 /* Extract flow from 'skb' into 'key'. */
254 error = ovs_flow_extract(skb, p->port_no, &key);
255 if (unlikely(error)) {
256 kfree_skb(skb);
257 return;
258 }
259
260 /* Look up flow. */ 252 /* Look up flow. */
261 flow = ovs_flow_tbl_lookup_stats(&dp->table, &key, &n_mask_hit); 253 flow = ovs_flow_tbl_lookup_stats(&dp->table, key, &n_mask_hit);
262 if (unlikely(!flow)) { 254 if (unlikely(!flow)) {
263 struct dp_upcall_info upcall; 255 struct dp_upcall_info upcall;
256 int error;
264 257
265 upcall.cmd = OVS_PACKET_CMD_MISS; 258 upcall.cmd = OVS_PACKET_CMD_MISS;
266 upcall.key = &key; 259 upcall.key = key;
267 upcall.userdata = NULL; 260 upcall.userdata = NULL;
268 upcall.portid = ovs_vport_find_upcall_portid(p, skb); 261 upcall.portid = ovs_vport_find_upcall_portid(p, skb);
269 error = ovs_dp_upcall(dp, skb, &upcall); 262 error = ovs_dp_upcall(dp, skb, &upcall);
@@ -276,10 +269,9 @@ void ovs_dp_process_received_packet(struct vport *p, struct sk_buff *skb)
276 } 269 }
277 270
278 OVS_CB(skb)->flow = flow; 271 OVS_CB(skb)->flow = flow;
279 OVS_CB(skb)->pkt_key = &key;
280 272
281 ovs_flow_stats_update(OVS_CB(skb)->flow, key.tp.flags, skb); 273 ovs_flow_stats_update(OVS_CB(skb)->flow, key->tp.flags, skb);
282 ovs_execute_actions(dp, skb); 274 ovs_execute_actions(dp, skb, key);
283 stats_counter = &stats->n_hit; 275 stats_counter = &stats->n_hit;
284 276
285out: 277out:
@@ -377,6 +369,8 @@ static size_t key_attr_size(void)
377 + nla_total_size(1) /* OVS_TUNNEL_KEY_ATTR_TTL */ 369 + nla_total_size(1) /* OVS_TUNNEL_KEY_ATTR_TTL */
378 + nla_total_size(0) /* OVS_TUNNEL_KEY_ATTR_DONT_FRAGMENT */ 370 + nla_total_size(0) /* OVS_TUNNEL_KEY_ATTR_DONT_FRAGMENT */
379 + nla_total_size(0) /* OVS_TUNNEL_KEY_ATTR_CSUM */ 371 + nla_total_size(0) /* OVS_TUNNEL_KEY_ATTR_CSUM */
372 + nla_total_size(0) /* OVS_TUNNEL_KEY_ATTR_OAM */
373 + nla_total_size(256) /* OVS_TUNNEL_KEY_ATTR_GENEVE_OPTS */
380 + nla_total_size(4) /* OVS_KEY_ATTR_IN_PORT */ 374 + nla_total_size(4) /* OVS_KEY_ATTR_IN_PORT */
381 + nla_total_size(4) /* OVS_KEY_ATTR_SKB_MARK */ 375 + nla_total_size(4) /* OVS_KEY_ATTR_SKB_MARK */
382 + nla_total_size(12) /* OVS_KEY_ATTR_ETHERNET */ 376 + nla_total_size(12) /* OVS_KEY_ATTR_ETHERNET */
@@ -516,6 +510,7 @@ static int ovs_packet_cmd_execute(struct sk_buff *skb, struct genl_info *info)
516 struct sw_flow *flow; 510 struct sw_flow *flow;
517 struct datapath *dp; 511 struct datapath *dp;
518 struct ethhdr *eth; 512 struct ethhdr *eth;
513 struct vport *input_vport;
519 int len; 514 int len;
520 int err; 515 int err;
521 516
@@ -550,13 +545,11 @@ static int ovs_packet_cmd_execute(struct sk_buff *skb, struct genl_info *info)
550 if (IS_ERR(flow)) 545 if (IS_ERR(flow))
551 goto err_kfree_skb; 546 goto err_kfree_skb;
552 547
553 err = ovs_flow_extract(packet, -1, &flow->key); 548 err = ovs_flow_key_extract_userspace(a[OVS_PACKET_ATTR_KEY], packet,
549 &flow->key);
554 if (err) 550 if (err)
555 goto err_flow_free; 551 goto err_flow_free;
556 552
557 err = ovs_nla_get_flow_metadata(flow, a[OVS_PACKET_ATTR_KEY]);
558 if (err)
559 goto err_flow_free;
560 acts = ovs_nla_alloc_flow_actions(nla_len(a[OVS_PACKET_ATTR_ACTIONS])); 553 acts = ovs_nla_alloc_flow_actions(nla_len(a[OVS_PACKET_ATTR_ACTIONS]));
561 err = PTR_ERR(acts); 554 err = PTR_ERR(acts);
562 if (IS_ERR(acts)) 555 if (IS_ERR(acts))
@@ -564,12 +557,13 @@ static int ovs_packet_cmd_execute(struct sk_buff *skb, struct genl_info *info)
564 557
565 err = ovs_nla_copy_actions(a[OVS_PACKET_ATTR_ACTIONS], 558 err = ovs_nla_copy_actions(a[OVS_PACKET_ATTR_ACTIONS],
566 &flow->key, 0, &acts); 559 &flow->key, 0, &acts);
567 rcu_assign_pointer(flow->sf_acts, acts);
568 if (err) 560 if (err)
569 goto err_flow_free; 561 goto err_flow_free;
570 562
563 rcu_assign_pointer(flow->sf_acts, acts);
564
565 OVS_CB(packet)->egress_tun_info = NULL;
571 OVS_CB(packet)->flow = flow; 566 OVS_CB(packet)->flow = flow;
572 OVS_CB(packet)->pkt_key = &flow->key;
573 packet->priority = flow->key.phy.priority; 567 packet->priority = flow->key.phy.priority;
574 packet->mark = flow->key.phy.skb_mark; 568 packet->mark = flow->key.phy.skb_mark;
575 569
@@ -579,8 +573,17 @@ static int ovs_packet_cmd_execute(struct sk_buff *skb, struct genl_info *info)
579 if (!dp) 573 if (!dp)
580 goto err_unlock; 574 goto err_unlock;
581 575
576 input_vport = ovs_vport_rcu(dp, flow->key.phy.in_port);
577 if (!input_vport)
578 input_vport = ovs_vport_rcu(dp, OVSP_LOCAL);
579
580 if (!input_vport)
581 goto err_unlock;
582
583 OVS_CB(packet)->input_vport = input_vport;
584
582 local_bh_disable(); 585 local_bh_disable();
583 err = ovs_execute_actions(dp, packet); 586 err = ovs_execute_actions(dp, packet, &flow->key);
584 local_bh_enable(); 587 local_bh_enable();
585 rcu_read_unlock(); 588 rcu_read_unlock();
586 589
@@ -933,11 +936,34 @@ error:
933 return error; 936 return error;
934} 937}
935 938
939static struct sw_flow_actions *get_flow_actions(const struct nlattr *a,
940 const struct sw_flow_key *key,
941 const struct sw_flow_mask *mask)
942{
943 struct sw_flow_actions *acts;
944 struct sw_flow_key masked_key;
945 int error;
946
947 acts = ovs_nla_alloc_flow_actions(nla_len(a));
948 if (IS_ERR(acts))
949 return acts;
950
951 ovs_flow_mask_key(&masked_key, key, mask);
952 error = ovs_nla_copy_actions(a, &masked_key, 0, &acts);
953 if (error) {
954 OVS_NLERR("Flow actions may not be safe on all matching packets.\n");
955 kfree(acts);
956 return ERR_PTR(error);
957 }
958
959 return acts;
960}
961
936static int ovs_flow_cmd_set(struct sk_buff *skb, struct genl_info *info) 962static int ovs_flow_cmd_set(struct sk_buff *skb, struct genl_info *info)
937{ 963{
938 struct nlattr **a = info->attrs; 964 struct nlattr **a = info->attrs;
939 struct ovs_header *ovs_header = info->userhdr; 965 struct ovs_header *ovs_header = info->userhdr;
940 struct sw_flow_key key, masked_key; 966 struct sw_flow_key key;
941 struct sw_flow *flow; 967 struct sw_flow *flow;
942 struct sw_flow_mask mask; 968 struct sw_flow_mask mask;
943 struct sk_buff *reply = NULL; 969 struct sk_buff *reply = NULL;
@@ -959,17 +985,10 @@ static int ovs_flow_cmd_set(struct sk_buff *skb, struct genl_info *info)
959 985
960 /* Validate actions. */ 986 /* Validate actions. */
961 if (a[OVS_FLOW_ATTR_ACTIONS]) { 987 if (a[OVS_FLOW_ATTR_ACTIONS]) {
962 acts = ovs_nla_alloc_flow_actions(nla_len(a[OVS_FLOW_ATTR_ACTIONS])); 988 acts = get_flow_actions(a[OVS_FLOW_ATTR_ACTIONS], &key, &mask);
963 error = PTR_ERR(acts); 989 if (IS_ERR(acts)) {
964 if (IS_ERR(acts)) 990 error = PTR_ERR(acts);
965 goto error; 991 goto error;
966
967 ovs_flow_mask_key(&masked_key, &key, &mask);
968 error = ovs_nla_copy_actions(a[OVS_FLOW_ATTR_ACTIONS],
969 &masked_key, 0, &acts);
970 if (error) {
971 OVS_NLERR("Flow actions may not be safe on all matching packets.\n");
972 goto err_kfree_acts;
973 } 992 }
974 } 993 }
975 994
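
get_flow_actions() above validates the actions against the masked key rather than the exact key: if a field is wildcarded, the actions have to be safe for every value that field can take, which is what the "Flow actions may not be safe on all matching packets" error is about. A toy model of that mask-then-validate step (one field, invented names):

#include <stdio.h>
#include <stdint.h>

struct key { uint32_t ip_proto; };

/* Keep only the bits the flow actually matches on. */
static void mask_key(struct key *dst, const struct key *key, const struct key *mask)
{
    dst->ip_proto = key->ip_proto & mask->ip_proto;
}

/* An action that rewrites TCP ports is only safe if every matching packet
 * is guaranteed to be TCP, i.e. the masked key still pins ip_proto to 6. */
static int validate_set_tcp_action(const struct key *masked)
{
    return masked->ip_proto == 6 ? 0 : -1;
}

int main(void)
{
    struct key key = { .ip_proto = 6 };
    struct key exact = { .ip_proto = 0xffffffff };
    struct key wildcard = { .ip_proto = 0 };
    struct key masked;

    mask_key(&masked, &key, &exact);
    printf("exact match: %s\n", validate_set_tcp_action(&masked) ? "reject" : "ok");

    mask_key(&masked, &key, &wildcard);
    printf("wildcarded:  %s\n", validate_set_tcp_action(&masked) ? "reject" : "ok");
    return 0;
}
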
@@ -2067,10 +2086,14 @@ static int __init dp_init(void)
2067 2086
2068 pr_info("Open vSwitch switching datapath\n"); 2087 pr_info("Open vSwitch switching datapath\n");
2069 2088
2070 err = ovs_internal_dev_rtnl_link_register(); 2089 err = action_fifos_init();
2071 if (err) 2090 if (err)
2072 goto error; 2091 goto error;
2073 2092
2093 err = ovs_internal_dev_rtnl_link_register();
2094 if (err)
2095 goto error_action_fifos_exit;
2096
2074 err = ovs_flow_init(); 2097 err = ovs_flow_init();
2075 if (err) 2098 if (err)
2076 goto error_unreg_rtnl_link; 2099 goto error_unreg_rtnl_link;
@@ -2103,6 +2126,8 @@ error_flow_exit:
2103 ovs_flow_exit(); 2126 ovs_flow_exit();
2104error_unreg_rtnl_link: 2127error_unreg_rtnl_link:
2105 ovs_internal_dev_rtnl_link_unregister(); 2128 ovs_internal_dev_rtnl_link_unregister();
2129error_action_fifos_exit:
2130 action_fifos_exit();
2106error: 2131error:
2107 return err; 2132 return err;
2108} 2133}
@@ -2116,6 +2141,7 @@ static void dp_cleanup(void)
2116 ovs_vport_exit(); 2141 ovs_vport_exit();
2117 ovs_flow_exit(); 2142 ovs_flow_exit();
2118 ovs_internal_dev_rtnl_link_unregister(); 2143 ovs_internal_dev_rtnl_link_unregister();
2144 action_fifos_exit();
2119} 2145}
2120 2146
2121module_init(dp_init); 2147module_init(dp_init);
diff --git a/net/openvswitch/datapath.h b/net/openvswitch/datapath.h
index 701b5738c38a..974135439c5c 100644
--- a/net/openvswitch/datapath.h
+++ b/net/openvswitch/datapath.h
@@ -1,5 +1,5 @@
1/* 1/*
2 * Copyright (c) 2007-2012 Nicira, Inc. 2 * Copyright (c) 2007-2014 Nicira, Inc.
3 * 3 *
4 * This program is free software; you can redistribute it and/or 4 * This program is free software; you can redistribute it and/or
5 * modify it under the terms of version 2 of the GNU General Public 5 * modify it under the terms of version 2 of the GNU General Public
@@ -95,14 +95,15 @@ struct datapath {
95/** 95/**
96 * struct ovs_skb_cb - OVS data in skb CB 96 * struct ovs_skb_cb - OVS data in skb CB
97 * @flow: The flow associated with this packet. May be %NULL if no flow. 97 * @flow: The flow associated with this packet. May be %NULL if no flow.
98 * @pkt_key: The flow information extracted from the packet. Must be nonnull. 98 * @egress_tun_key: Tunnel information about this packet on egress path.
99 * @tun_key: Key for the tunnel that encapsulated this packet. NULL if the 99 * NULL if the packet is not being tunneled.
100 * packet is not being tunneled. 100 * @input_vport: The original vport packet came in on. This value is cached
101 * when a packet is received by OVS.
101 */ 102 */
102struct ovs_skb_cb { 103struct ovs_skb_cb {
103 struct sw_flow *flow; 104 struct sw_flow *flow;
104 struct sw_flow_key *pkt_key; 105 struct ovs_tunnel_info *egress_tun_info;
105 struct ovs_key_ipv4_tunnel *tun_key; 106 struct vport *input_vport;
106}; 107};
107#define OVS_CB(skb) ((struct ovs_skb_cb *)(skb)->cb) 108#define OVS_CB(skb) ((struct ovs_skb_cb *)(skb)->cb)
108 109
@@ -183,17 +184,23 @@ static inline struct vport *ovs_vport_ovsl(const struct datapath *dp, int port_n
183extern struct notifier_block ovs_dp_device_notifier; 184extern struct notifier_block ovs_dp_device_notifier;
184extern struct genl_family dp_vport_genl_family; 185extern struct genl_family dp_vport_genl_family;
185 186
186void ovs_dp_process_received_packet(struct vport *, struct sk_buff *); 187void ovs_dp_process_packet(struct sk_buff *skb, struct sw_flow_key *key);
187void ovs_dp_detach_port(struct vport *); 188void ovs_dp_detach_port(struct vport *);
188int ovs_dp_upcall(struct datapath *, struct sk_buff *, 189int ovs_dp_upcall(struct datapath *, struct sk_buff *,
189 const struct dp_upcall_info *); 190 const struct dp_upcall_info *);
190 191
192const char *ovs_dp_name(const struct datapath *dp);
191struct sk_buff *ovs_vport_cmd_build_info(struct vport *, u32 pid, u32 seq, 193struct sk_buff *ovs_vport_cmd_build_info(struct vport *, u32 pid, u32 seq,
192 u8 cmd); 194 u8 cmd);
193 195
194int ovs_execute_actions(struct datapath *dp, struct sk_buff *skb); 196int ovs_execute_actions(struct datapath *dp, struct sk_buff *skb,
197 struct sw_flow_key *);
198
195void ovs_dp_notify_wq(struct work_struct *work); 199void ovs_dp_notify_wq(struct work_struct *work);
196 200
201int action_fifos_init(void);
202void action_fifos_exit(void);
203
197#define OVS_NLERR(fmt, ...) \ 204#define OVS_NLERR(fmt, ...) \
198do { \ 205do { \
199 if (net_ratelimit()) \ 206 if (net_ratelimit()) \
diff --git a/net/openvswitch/flow.c b/net/openvswitch/flow.c
index d07ab538fc9d..62db02ba36bc 100644
--- a/net/openvswitch/flow.c
+++ b/net/openvswitch/flow.c
@@ -1,5 +1,5 @@
1/* 1/*
2 * Copyright (c) 2007-2013 Nicira, Inc. 2 * Copyright (c) 2007-2014 Nicira, Inc.
3 * 3 *
4 * This program is free software; you can redistribute it and/or 4 * This program is free software; you can redistribute it and/or
5 * modify it under the terms of version 2 of the GNU General Public 5 * modify it under the terms of version 2 of the GNU General Public
@@ -16,8 +16,6 @@
16 * 02110-1301, USA 16 * 02110-1301, USA
17 */ 17 */
18 18
19#include "flow.h"
20#include "datapath.h"
21#include <linux/uaccess.h> 19#include <linux/uaccess.h>
22#include <linux/netdevice.h> 20#include <linux/netdevice.h>
23#include <linux/etherdevice.h> 21#include <linux/etherdevice.h>
@@ -46,6 +44,10 @@
46#include <net/ipv6.h> 44#include <net/ipv6.h>
47#include <net/ndisc.h> 45#include <net/ndisc.h>
48 46
47#include "datapath.h"
48#include "flow.h"
49#include "flow_netlink.h"
50
49u64 ovs_flow_used_time(unsigned long flow_jiffies) 51u64 ovs_flow_used_time(unsigned long flow_jiffies)
50{ 52{
51 struct timespec cur_ts; 53 struct timespec cur_ts;
@@ -89,7 +91,7 @@ void ovs_flow_stats_update(struct sw_flow *flow, __be16 tcp_flags,
89 * allocated stats as we have already locked them. 91 * allocated stats as we have already locked them.
90 */ 92 */
91 if (likely(flow->stats_last_writer != NUMA_NO_NODE) 93 if (likely(flow->stats_last_writer != NUMA_NO_NODE)
92 && likely(!rcu_dereference(flow->stats[node]))) { 94 && likely(!rcu_access_pointer(flow->stats[node]))) {
93 /* Try to allocate node-specific stats. */ 95 /* Try to allocate node-specific stats. */
94 struct flow_stats *new_stats; 96 struct flow_stats *new_stats;
95 97
@@ -420,10 +422,9 @@ invalid:
420} 422}
421 423
422/** 424/**
423 * ovs_flow_extract - extracts a flow key from an Ethernet frame. 425 * key_extract - extracts a flow key from an Ethernet frame.
424 * @skb: sk_buff that contains the frame, with skb->data pointing to the 426 * @skb: sk_buff that contains the frame, with skb->data pointing to the
425 * Ethernet header 427 * Ethernet header
426 * @in_port: port number on which @skb was received.
427 * @key: output flow key 428 * @key: output flow key
428 * 429 *
429 * The caller must ensure that skb->len >= ETH_HLEN. 430 * The caller must ensure that skb->len >= ETH_HLEN.
@@ -442,18 +443,13 @@ invalid:
442 * of a correct length, otherwise the same as skb->network_header. 443 * of a correct length, otherwise the same as skb->network_header.
443 * For other key->eth.type values it is left untouched. 444 * For other key->eth.type values it is left untouched.
444 */ 445 */
445int ovs_flow_extract(struct sk_buff *skb, u16 in_port, struct sw_flow_key *key) 446static int key_extract(struct sk_buff *skb, struct sw_flow_key *key)
446{ 447{
447 int error; 448 int error;
448 struct ethhdr *eth; 449 struct ethhdr *eth;
449 450
450 memset(key, 0, sizeof(*key)); 451 /* Flags are always used as part of stats */
451 452 key->tp.flags = 0;
452 key->phy.priority = skb->priority;
453 if (OVS_CB(skb)->tun_key)
454 memcpy(&key->tun_key, OVS_CB(skb)->tun_key, sizeof(key->tun_key));
455 key->phy.in_port = in_port;
456 key->phy.skb_mark = skb->mark;
457 453
458 skb_reset_mac_header(skb); 454 skb_reset_mac_header(skb);
459 455
@@ -469,6 +465,7 @@ int ovs_flow_extract(struct sk_buff *skb, u16 in_port, struct sw_flow_key *key)
469 * update skb->csum here. 465 * update skb->csum here.
470 */ 466 */
471 467
468 key->eth.tci = 0;
472 if (vlan_tx_tag_present(skb)) 469 if (vlan_tx_tag_present(skb))
473 key->eth.tci = htons(skb->vlan_tci); 470 key->eth.tci = htons(skb->vlan_tci);
474 else if (eth->h_proto == htons(ETH_P_8021Q)) 471 else if (eth->h_proto == htons(ETH_P_8021Q))
@@ -489,6 +486,8 @@ int ovs_flow_extract(struct sk_buff *skb, u16 in_port, struct sw_flow_key *key)
489 486
490 error = check_iphdr(skb); 487 error = check_iphdr(skb);
491 if (unlikely(error)) { 488 if (unlikely(error)) {
489 memset(&key->ip, 0, sizeof(key->ip));
490 memset(&key->ipv4, 0, sizeof(key->ipv4));
492 if (error == -EINVAL) { 491 if (error == -EINVAL) {
493 skb->transport_header = skb->network_header; 492 skb->transport_header = skb->network_header;
494 error = 0; 493 error = 0;
@@ -510,8 +509,10 @@ int ovs_flow_extract(struct sk_buff *skb, u16 in_port, struct sw_flow_key *key)
510 return 0; 509 return 0;
511 } 510 }
512 if (nh->frag_off & htons(IP_MF) || 511 if (nh->frag_off & htons(IP_MF) ||
513 skb_shinfo(skb)->gso_type & SKB_GSO_UDP) 512 skb_shinfo(skb)->gso_type & SKB_GSO_UDP)
514 key->ip.frag = OVS_FRAG_TYPE_FIRST; 513 key->ip.frag = OVS_FRAG_TYPE_FIRST;
514 else
515 key->ip.frag = OVS_FRAG_TYPE_NONE;
515 516
516 /* Transport layer. */ 517 /* Transport layer. */
517 if (key->ip.proto == IPPROTO_TCP) { 518 if (key->ip.proto == IPPROTO_TCP) {
@@ -520,18 +521,25 @@ int ovs_flow_extract(struct sk_buff *skb, u16 in_port, struct sw_flow_key *key)
520 key->tp.src = tcp->source; 521 key->tp.src = tcp->source;
521 key->tp.dst = tcp->dest; 522 key->tp.dst = tcp->dest;
522 key->tp.flags = TCP_FLAGS_BE16(tcp); 523 key->tp.flags = TCP_FLAGS_BE16(tcp);
524 } else {
525 memset(&key->tp, 0, sizeof(key->tp));
523 } 526 }
527
524 } else if (key->ip.proto == IPPROTO_UDP) { 528 } else if (key->ip.proto == IPPROTO_UDP) {
525 if (udphdr_ok(skb)) { 529 if (udphdr_ok(skb)) {
526 struct udphdr *udp = udp_hdr(skb); 530 struct udphdr *udp = udp_hdr(skb);
527 key->tp.src = udp->source; 531 key->tp.src = udp->source;
528 key->tp.dst = udp->dest; 532 key->tp.dst = udp->dest;
533 } else {
534 memset(&key->tp, 0, sizeof(key->tp));
529 } 535 }
530 } else if (key->ip.proto == IPPROTO_SCTP) { 536 } else if (key->ip.proto == IPPROTO_SCTP) {
531 if (sctphdr_ok(skb)) { 537 if (sctphdr_ok(skb)) {
532 struct sctphdr *sctp = sctp_hdr(skb); 538 struct sctphdr *sctp = sctp_hdr(skb);
533 key->tp.src = sctp->source; 539 key->tp.src = sctp->source;
534 key->tp.dst = sctp->dest; 540 key->tp.dst = sctp->dest;
541 } else {
542 memset(&key->tp, 0, sizeof(key->tp));
535 } 543 }
536 } else if (key->ip.proto == IPPROTO_ICMP) { 544 } else if (key->ip.proto == IPPROTO_ICMP) {
537 if (icmphdr_ok(skb)) { 545 if (icmphdr_ok(skb)) {
@@ -541,33 +549,44 @@ int ovs_flow_extract(struct sk_buff *skb, u16 in_port, struct sw_flow_key *key)
541 * them in 16-bit network byte order. */ 549 * them in 16-bit network byte order. */
542 key->tp.src = htons(icmp->type); 550 key->tp.src = htons(icmp->type);
543 key->tp.dst = htons(icmp->code); 551 key->tp.dst = htons(icmp->code);
552 } else {
553 memset(&key->tp, 0, sizeof(key->tp));
544 } 554 }
545 } 555 }
546 556
547 } else if ((key->eth.type == htons(ETH_P_ARP) || 557 } else if (key->eth.type == htons(ETH_P_ARP) ||
548 key->eth.type == htons(ETH_P_RARP)) && arphdr_ok(skb)) { 558 key->eth.type == htons(ETH_P_RARP)) {
549 struct arp_eth_header *arp; 559 struct arp_eth_header *arp;
550 560
551 arp = (struct arp_eth_header *)skb_network_header(skb); 561 arp = (struct arp_eth_header *)skb_network_header(skb);
552 562
553 if (arp->ar_hrd == htons(ARPHRD_ETHER) 563 if (arphdr_ok(skb) &&
554 && arp->ar_pro == htons(ETH_P_IP) 564 arp->ar_hrd == htons(ARPHRD_ETHER) &&
555 && arp->ar_hln == ETH_ALEN 565 arp->ar_pro == htons(ETH_P_IP) &&
556 && arp->ar_pln == 4) { 566 arp->ar_hln == ETH_ALEN &&
567 arp->ar_pln == 4) {
557 568
558 /* We only match on the lower 8 bits of the opcode. */ 569 /* We only match on the lower 8 bits of the opcode. */
559 if (ntohs(arp->ar_op) <= 0xff) 570 if (ntohs(arp->ar_op) <= 0xff)
560 key->ip.proto = ntohs(arp->ar_op); 571 key->ip.proto = ntohs(arp->ar_op);
572 else
573 key->ip.proto = 0;
574
561 memcpy(&key->ipv4.addr.src, arp->ar_sip, sizeof(key->ipv4.addr.src)); 575 memcpy(&key->ipv4.addr.src, arp->ar_sip, sizeof(key->ipv4.addr.src));
562 memcpy(&key->ipv4.addr.dst, arp->ar_tip, sizeof(key->ipv4.addr.dst)); 576 memcpy(&key->ipv4.addr.dst, arp->ar_tip, sizeof(key->ipv4.addr.dst));
563 ether_addr_copy(key->ipv4.arp.sha, arp->ar_sha); 577 ether_addr_copy(key->ipv4.arp.sha, arp->ar_sha);
564 ether_addr_copy(key->ipv4.arp.tha, arp->ar_tha); 578 ether_addr_copy(key->ipv4.arp.tha, arp->ar_tha);
579 } else {
580 memset(&key->ip, 0, sizeof(key->ip));
581 memset(&key->ipv4, 0, sizeof(key->ipv4));
565 } 582 }
566 } else if (key->eth.type == htons(ETH_P_IPV6)) { 583 } else if (key->eth.type == htons(ETH_P_IPV6)) {
567 int nh_len; /* IPv6 Header + Extensions */ 584 int nh_len; /* IPv6 Header + Extensions */
568 585
569 nh_len = parse_ipv6hdr(skb, key); 586 nh_len = parse_ipv6hdr(skb, key);
570 if (unlikely(nh_len < 0)) { 587 if (unlikely(nh_len < 0)) {
588 memset(&key->ip, 0, sizeof(key->ip));
589 memset(&key->ipv6.addr, 0, sizeof(key->ipv6.addr));
571 if (nh_len == -EINVAL) { 590 if (nh_len == -EINVAL) {
572 skb->transport_header = skb->network_header; 591 skb->transport_header = skb->network_header;
573 error = 0; 592 error = 0;
@@ -589,27 +608,87 @@ int ovs_flow_extract(struct sk_buff *skb, u16 in_port, struct sw_flow_key *key)
589 key->tp.src = tcp->source; 608 key->tp.src = tcp->source;
590 key->tp.dst = tcp->dest; 609 key->tp.dst = tcp->dest;
591 key->tp.flags = TCP_FLAGS_BE16(tcp); 610 key->tp.flags = TCP_FLAGS_BE16(tcp);
611 } else {
612 memset(&key->tp, 0, sizeof(key->tp));
592 } 613 }
593 } else if (key->ip.proto == NEXTHDR_UDP) { 614 } else if (key->ip.proto == NEXTHDR_UDP) {
594 if (udphdr_ok(skb)) { 615 if (udphdr_ok(skb)) {
595 struct udphdr *udp = udp_hdr(skb); 616 struct udphdr *udp = udp_hdr(skb);
596 key->tp.src = udp->source; 617 key->tp.src = udp->source;
597 key->tp.dst = udp->dest; 618 key->tp.dst = udp->dest;
619 } else {
620 memset(&key->tp, 0, sizeof(key->tp));
598 } 621 }
599 } else if (key->ip.proto == NEXTHDR_SCTP) { 622 } else if (key->ip.proto == NEXTHDR_SCTP) {
600 if (sctphdr_ok(skb)) { 623 if (sctphdr_ok(skb)) {
601 struct sctphdr *sctp = sctp_hdr(skb); 624 struct sctphdr *sctp = sctp_hdr(skb);
602 key->tp.src = sctp->source; 625 key->tp.src = sctp->source;
603 key->tp.dst = sctp->dest; 626 key->tp.dst = sctp->dest;
627 } else {
628 memset(&key->tp, 0, sizeof(key->tp));
604 } 629 }
605 } else if (key->ip.proto == NEXTHDR_ICMP) { 630 } else if (key->ip.proto == NEXTHDR_ICMP) {
606 if (icmp6hdr_ok(skb)) { 631 if (icmp6hdr_ok(skb)) {
607 error = parse_icmpv6(skb, key, nh_len); 632 error = parse_icmpv6(skb, key, nh_len);
608 if (error) 633 if (error)
609 return error; 634 return error;
635 } else {
636 memset(&key->tp, 0, sizeof(key->tp));
610 } 637 }
611 } 638 }
612 } 639 }
613
614 return 0; 640 return 0;
615} 641}
642
643int ovs_flow_key_update(struct sk_buff *skb, struct sw_flow_key *key)
644{
645 return key_extract(skb, key);
646}
647
648int ovs_flow_key_extract(struct ovs_tunnel_info *tun_info,
649 struct sk_buff *skb, struct sw_flow_key *key)
650{
651 /* Extract metadata from packet. */
652 if (tun_info) {
653 memcpy(&key->tun_key, &tun_info->tunnel, sizeof(key->tun_key));
654
655 if (tun_info->options) {
656 BUILD_BUG_ON((1 << (sizeof(tun_info->options_len) *
657 8)) - 1
658 > sizeof(key->tun_opts));
659 memcpy(GENEVE_OPTS(key, tun_info->options_len),
660 tun_info->options, tun_info->options_len);
661 key->tun_opts_len = tun_info->options_len;
662 } else {
663 key->tun_opts_len = 0;
664 }
665 } else {
666 key->tun_opts_len = 0;
667 memset(&key->tun_key, 0, sizeof(key->tun_key));
668 }
669
670 key->phy.priority = skb->priority;
671 key->phy.in_port = OVS_CB(skb)->input_vport->port_no;
672 key->phy.skb_mark = skb->mark;
673 key->ovs_flow_hash = 0;
674 key->recirc_id = 0;
675
676 /* Flags are always used as part of stats */
677 key->tp.flags = 0;
678
679 return key_extract(skb, key);
680}
681
682int ovs_flow_key_extract_userspace(const struct nlattr *attr,
683 struct sk_buff *skb,
684 struct sw_flow_key *key)
685{
686 int err;
687
688 /* Extract metadata from netlink attributes. */
689 err = ovs_nla_get_flow_metadata(attr, key);
690 if (err)
691 return err;
692
693 return key_extract(skb, key);
694}
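
The extraction split above separates metadata from packet headers: ovs_flow_key_extract() fills tunnel info, input port, skb mark and recirc id and then calls key_extract() for the headers, while ovs_flow_key_update() re-runs only the header pass after an action has rewritten the packet. Below is a compressed userspace model of that split; the struct layout and field names are invented for the sketch.

#include <stdio.h>
#include <string.h>

struct key {
    int in_port;
    unsigned int mark;
    unsigned int recirc_id;
    unsigned short eth_type;    /* stands in for the header-derived fields */
};

struct packet {
    int in_port;
    unsigned int mark;
    unsigned short eth_type;
};

/* Header-only pass: safe to re-run after an action modified the packet. */
static int key_extract(const struct packet *p, struct key *k)
{
    k->eth_type = p->eth_type;
    return 0;
}

/* Full extraction: metadata first, then the shared header pass. */
static int key_extract_full(const struct packet *p, struct key *k)
{
    k->in_port = p->in_port;
    k->mark = p->mark;
    k->recirc_id = 0;
    return key_extract(p, k);
}

int main(void)
{
    struct packet p = { .in_port = 3, .mark = 7, .eth_type = 0x0800 };
    struct key k;

    memset(&k, 0, sizeof(k));
    key_extract_full(&p, &k);
    p.eth_type = 0x86dd;        /* an action rewrote the packet... */
    key_extract(&p, &k);        /* ...so refresh only the header part */
    printf("port %d mark %u type 0x%04x\n", k.in_port, k.mark, k.eth_type);
    return 0;
}
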
diff --git a/net/openvswitch/flow.h b/net/openvswitch/flow.h
index 5e5aaed3a85b..71813318c8c7 100644
--- a/net/openvswitch/flow.h
+++ b/net/openvswitch/flow.h
@@ -49,29 +49,53 @@ struct ovs_key_ipv4_tunnel {
49 u8 ipv4_ttl; 49 u8 ipv4_ttl;
50} __packed __aligned(4); /* Minimize padding. */ 50} __packed __aligned(4); /* Minimize padding. */
51 51
52static inline void ovs_flow_tun_key_init(struct ovs_key_ipv4_tunnel *tun_key, 52struct ovs_tunnel_info {
53 const struct iphdr *iph, __be64 tun_id, 53 struct ovs_key_ipv4_tunnel tunnel;
54 __be16 tun_flags) 54 struct geneve_opt *options;
55 u8 options_len;
56};
57
58/* Store options at the end of the array if they are less than the
59 * maximum size. This allows us to get the benefits of variable length
60 * matching for small options.
61 */
62#define GENEVE_OPTS(flow_key, opt_len) \
63 ((struct geneve_opt *)((flow_key)->tun_opts + \
64 FIELD_SIZEOF(struct sw_flow_key, tun_opts) - \
65 opt_len))
66
67static inline void ovs_flow_tun_info_init(struct ovs_tunnel_info *tun_info,
68 const struct iphdr *iph,
69 __be64 tun_id, __be16 tun_flags,
70 struct geneve_opt *opts,
71 u8 opts_len)
55{ 72{
56 tun_key->tun_id = tun_id; 73 tun_info->tunnel.tun_id = tun_id;
57 tun_key->ipv4_src = iph->saddr; 74 tun_info->tunnel.ipv4_src = iph->saddr;
58 tun_key->ipv4_dst = iph->daddr; 75 tun_info->tunnel.ipv4_dst = iph->daddr;
59 tun_key->ipv4_tos = iph->tos; 76 tun_info->tunnel.ipv4_tos = iph->tos;
60 tun_key->ipv4_ttl = iph->ttl; 77 tun_info->tunnel.ipv4_ttl = iph->ttl;
61 tun_key->tun_flags = tun_flags; 78 tun_info->tunnel.tun_flags = tun_flags;
62 79
63 /* clear struct padding. */ 80 /* clear struct padding. */
64 memset((unsigned char *) tun_key + OVS_TUNNEL_KEY_SIZE, 0, 81 memset((unsigned char *)&tun_info->tunnel + OVS_TUNNEL_KEY_SIZE, 0,
65 sizeof(*tun_key) - OVS_TUNNEL_KEY_SIZE); 82 sizeof(tun_info->tunnel) - OVS_TUNNEL_KEY_SIZE);
83
84 tun_info->options = opts;
85 tun_info->options_len = opts_len;
66} 86}
67 87
68struct sw_flow_key { 88struct sw_flow_key {
89 u8 tun_opts[255];
90 u8 tun_opts_len;
69 struct ovs_key_ipv4_tunnel tun_key; /* Encapsulating tunnel key. */ 91 struct ovs_key_ipv4_tunnel tun_key; /* Encapsulating tunnel key. */
70 struct { 92 struct {
71 u32 priority; /* Packet QoS priority. */ 93 u32 priority; /* Packet QoS priority. */
72 u32 skb_mark; /* SKB mark. */ 94 u32 skb_mark; /* SKB mark. */
73 u16 in_port; /* Input switch port (or DP_MAX_PORTS). */ 95 u16 in_port; /* Input switch port (or DP_MAX_PORTS). */
74 } __packed phy; /* Safe when right after 'tun_key'. */ 96 } __packed phy; /* Safe when right after 'tun_key'. */
97 u32 ovs_flow_hash; /* Datapath computed hash value. */
98 u32 recirc_id; /* Recirculation ID. */
75 struct { 99 struct {
76 u8 src[ETH_ALEN]; /* Ethernet source address. */ 100 u8 src[ETH_ALEN]; /* Ethernet source address. */
77 u8 dst[ETH_ALEN]; /* Ethernet destination address. */ 101 u8 dst[ETH_ALEN]; /* Ethernet destination address. */
@@ -187,6 +211,12 @@ void ovs_flow_stats_get(const struct sw_flow *, struct ovs_flow_stats *,
187void ovs_flow_stats_clear(struct sw_flow *); 211void ovs_flow_stats_clear(struct sw_flow *);
188u64 ovs_flow_used_time(unsigned long flow_jiffies); 212u64 ovs_flow_used_time(unsigned long flow_jiffies);
189 213
190int ovs_flow_extract(struct sk_buff *, u16 in_port, struct sw_flow_key *); 214int ovs_flow_key_update(struct sk_buff *skb, struct sw_flow_key *key);
215int ovs_flow_key_extract(struct ovs_tunnel_info *tun_info, struct sk_buff *skb,
216 struct sw_flow_key *key);
217/* Extract key from packet coming from userspace. */
218int ovs_flow_key_extract_userspace(const struct nlattr *attr,
219 struct sk_buff *skb,
220 struct sw_flow_key *key);
191 221
192#endif /* flow.h */ 222#endif /* flow.h */
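
As the comment above GENEVE_OPTS() says, variable-length tunnel options are stored at the end of the fixed tun_opts buffer, so options of any length always occupy a contiguous suffix and short options leave a clean prefix for masking. The standalone snippet below only demonstrates that offset arithmetic; the buffer size and option bytes are arbitrary.

#include <stdio.h>
#include <string.h>

#define TUN_OPTS_MAX 255

struct flow_key {
    unsigned char tun_opts[TUN_OPTS_MAX];
    unsigned char tun_opts_len;
};

/* Options of length 'len' live in the last 'len' bytes of tun_opts. */
static unsigned char *tun_opts_ptr(struct flow_key *key, unsigned char len)
{
    return key->tun_opts + sizeof(key->tun_opts) - len;
}

int main(void)
{
    struct flow_key key = { { 0 } };
    unsigned char opts[8] = { 1, 2, 3, 4, 5, 6, 7, 8 };

    memcpy(tun_opts_ptr(&key, sizeof(opts)), opts, sizeof(opts));
    key.tun_opts_len = sizeof(opts);

    printf("options occupy offsets %zu..%d of tun_opts\n",
           (size_t)(tun_opts_ptr(&key, key.tun_opts_len) - key.tun_opts),
           TUN_OPTS_MAX - 1);
    return 0;
}
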
diff --git a/net/openvswitch/flow_netlink.c b/net/openvswitch/flow_netlink.c
index d757848da89c..368f23307911 100644
--- a/net/openvswitch/flow_netlink.c
+++ b/net/openvswitch/flow_netlink.c
@@ -1,5 +1,5 @@
1/* 1/*
2 * Copyright (c) 2007-2013 Nicira, Inc. 2 * Copyright (c) 2007-2014 Nicira, Inc.
3 * 3 *
4 * This program is free software; you can redistribute it and/or 4 * This program is free software; you can redistribute it and/or
5 * modify it under the terms of version 2 of the GNU General Public 5 * modify it under the terms of version 2 of the GNU General Public
@@ -42,6 +42,7 @@
42#include <linux/icmp.h> 42#include <linux/icmp.h>
43#include <linux/icmpv6.h> 43#include <linux/icmpv6.h>
44#include <linux/rculist.h> 44#include <linux/rculist.h>
45#include <net/geneve.h>
45#include <net/ip.h> 46#include <net/ip.h>
46#include <net/ipv6.h> 47#include <net/ipv6.h>
47#include <net/ndisc.h> 48#include <net/ndisc.h>
@@ -88,18 +89,20 @@ static void update_range__(struct sw_flow_match *match,
88 } \ 89 } \
89 } while (0) 90 } while (0)
90 91
91#define SW_FLOW_KEY_MEMCPY(match, field, value_p, len, is_mask) \ 92#define SW_FLOW_KEY_MEMCPY_OFFSET(match, offset, value_p, len, is_mask) \
92 do { \ 93 do { \
93 update_range__(match, offsetof(struct sw_flow_key, field), \ 94 update_range__(match, offset, len, is_mask); \
94 len, is_mask); \ 95 if (is_mask) \
95 if (is_mask) { \ 96 memcpy((u8 *)&(match)->mask->key + offset, value_p, \
96 if ((match)->mask) \ 97 len); \
97 memcpy(&(match)->mask->key.field, value_p, len);\ 98 else \
98 } else { \ 99 memcpy((u8 *)(match)->key + offset, value_p, len); \
99 memcpy(&(match)->key->field, value_p, len); \
100 } \
101 } while (0) 100 } while (0)
102 101
102#define SW_FLOW_KEY_MEMCPY(match, field, value_p, len, is_mask) \
103 SW_FLOW_KEY_MEMCPY_OFFSET(match, offsetof(struct sw_flow_key, field), \
104 value_p, len, is_mask)
105
103static u16 range_n_bytes(const struct sw_flow_key_range *range) 106static u16 range_n_bytes(const struct sw_flow_key_range *range)
104{ 107{
105 return range->end - range->start; 108 return range->end - range->start;
@@ -251,6 +254,8 @@ static const int ovs_key_lens[OVS_KEY_ATTR_MAX + 1] = {
251 [OVS_KEY_ATTR_ICMPV6] = sizeof(struct ovs_key_icmpv6), 254 [OVS_KEY_ATTR_ICMPV6] = sizeof(struct ovs_key_icmpv6),
252 [OVS_KEY_ATTR_ARP] = sizeof(struct ovs_key_arp), 255 [OVS_KEY_ATTR_ARP] = sizeof(struct ovs_key_arp),
253 [OVS_KEY_ATTR_ND] = sizeof(struct ovs_key_nd), 256 [OVS_KEY_ATTR_ND] = sizeof(struct ovs_key_nd),
257 [OVS_KEY_ATTR_RECIRC_ID] = sizeof(u32),
258 [OVS_KEY_ATTR_DP_HASH] = sizeof(u32),
254 [OVS_KEY_ATTR_TUNNEL] = -1, 259 [OVS_KEY_ATTR_TUNNEL] = -1,
255}; 260};
256 261
@@ -333,6 +338,7 @@ static int ipv4_tun_from_nlattr(const struct nlattr *attr,
333 int rem; 338 int rem;
334 bool ttl = false; 339 bool ttl = false;
335 __be16 tun_flags = 0; 340 __be16 tun_flags = 0;
341 unsigned long opt_key_offset;
336 342
337 nla_for_each_nested(a, attr, rem) { 343 nla_for_each_nested(a, attr, rem) {
338 int type = nla_type(a); 344 int type = nla_type(a);
@@ -344,6 +350,8 @@ static int ipv4_tun_from_nlattr(const struct nlattr *attr,
344 [OVS_TUNNEL_KEY_ATTR_TTL] = 1, 350 [OVS_TUNNEL_KEY_ATTR_TTL] = 1,
345 [OVS_TUNNEL_KEY_ATTR_DONT_FRAGMENT] = 0, 351 [OVS_TUNNEL_KEY_ATTR_DONT_FRAGMENT] = 0,
346 [OVS_TUNNEL_KEY_ATTR_CSUM] = 0, 352 [OVS_TUNNEL_KEY_ATTR_CSUM] = 0,
353 [OVS_TUNNEL_KEY_ATTR_OAM] = 0,
354 [OVS_TUNNEL_KEY_ATTR_GENEVE_OPTS] = -1,
347 }; 355 };
348 356
349 if (type > OVS_TUNNEL_KEY_ATTR_MAX) { 357 if (type > OVS_TUNNEL_KEY_ATTR_MAX) {
@@ -352,7 +360,8 @@ static int ipv4_tun_from_nlattr(const struct nlattr *attr,
352 return -EINVAL; 360 return -EINVAL;
353 } 361 }
354 362
355 if (ovs_tunnel_key_lens[type] != nla_len(a)) { 363 if (ovs_tunnel_key_lens[type] != nla_len(a) &&
364 ovs_tunnel_key_lens[type] != -1) {
356 OVS_NLERR("IPv4 tunnel attribute type has unexpected " 365 OVS_NLERR("IPv4 tunnel attribute type has unexpected "
357 " length (type=%d, length=%d, expected=%d).\n", 366 " length (type=%d, length=%d, expected=%d).\n",
358 type, nla_len(a), ovs_tunnel_key_lens[type]); 367 type, nla_len(a), ovs_tunnel_key_lens[type]);
@@ -388,7 +397,63 @@ static int ipv4_tun_from_nlattr(const struct nlattr *attr,
388 case OVS_TUNNEL_KEY_ATTR_CSUM: 397 case OVS_TUNNEL_KEY_ATTR_CSUM:
389 tun_flags |= TUNNEL_CSUM; 398 tun_flags |= TUNNEL_CSUM;
390 break; 399 break;
400 case OVS_TUNNEL_KEY_ATTR_OAM:
401 tun_flags |= TUNNEL_OAM;
402 break;
403 case OVS_TUNNEL_KEY_ATTR_GENEVE_OPTS:
404 tun_flags |= TUNNEL_OPTIONS_PRESENT;
405 if (nla_len(a) > sizeof(match->key->tun_opts)) {
406 OVS_NLERR("Geneve option length exceeds maximum size (len %d, max %zu).\n",
407 nla_len(a),
408 sizeof(match->key->tun_opts));
409 return -EINVAL;
410 }
411
412 if (nla_len(a) % 4 != 0) {
413 OVS_NLERR("Geneve option length is not a multiple of 4 (len %d).\n",
414 nla_len(a));
415 return -EINVAL;
416 }
417
418 /* We need to record the length of the options passed
419 * down, otherwise packets with the same format but
420 * additional options will be silently matched.
421 */
422 if (!is_mask) {
423 SW_FLOW_KEY_PUT(match, tun_opts_len, nla_len(a),
424 false);
425 } else {
426 /* This is somewhat unusual because it looks at
427 * both the key and mask while parsing the
428 * attributes (and by extension assumes the key
429 * is parsed first). Normally, we would verify
430 * that each is the correct length and that the
431 * attributes line up in the validate function.
432 * However, that is difficult because this is
433 * variable length and we won't have the
434 * information later.
435 */
436 if (match->key->tun_opts_len != nla_len(a)) {
437 OVS_NLERR("Geneve option key length (%d) is different from mask length (%d).",
438 match->key->tun_opts_len,
439 nla_len(a));
440 return -EINVAL;
441 }
442
443 SW_FLOW_KEY_PUT(match, tun_opts_len, 0xff,
444 true);
445 }
446
447 opt_key_offset = (unsigned long)GENEVE_OPTS(
448 (struct sw_flow_key *)0,
449 nla_len(a));
450 SW_FLOW_KEY_MEMCPY_OFFSET(match, opt_key_offset,
451 nla_data(a), nla_len(a),
452 is_mask);
453 break;
391 default: 454 default:
455 OVS_NLERR("Unknown IPv4 tunnel attribute (%d).\n",
456 type);
392 return -EINVAL; 457 return -EINVAL;
393 } 458 }
394 } 459 }
@@ -415,45 +480,80 @@ static int ipv4_tun_from_nlattr(const struct nlattr *attr,
415 return 0; 480 return 0;
416} 481}
417 482
418static int ipv4_tun_to_nlattr(struct sk_buff *skb, 483static int __ipv4_tun_to_nlattr(struct sk_buff *skb,
419 const struct ovs_key_ipv4_tunnel *tun_key, 484 const struct ovs_key_ipv4_tunnel *output,
420 const struct ovs_key_ipv4_tunnel *output) 485 const struct geneve_opt *tun_opts,
486 int swkey_tun_opts_len)
421{ 487{
422 struct nlattr *nla;
423
424 nla = nla_nest_start(skb, OVS_KEY_ATTR_TUNNEL);
425 if (!nla)
426 return -EMSGSIZE;
427
428 if (output->tun_flags & TUNNEL_KEY && 488 if (output->tun_flags & TUNNEL_KEY &&
429 nla_put_be64(skb, OVS_TUNNEL_KEY_ATTR_ID, output->tun_id)) 489 nla_put_be64(skb, OVS_TUNNEL_KEY_ATTR_ID, output->tun_id))
430 return -EMSGSIZE; 490 return -EMSGSIZE;
431 if (output->ipv4_src && 491 if (output->ipv4_src &&
432 nla_put_be32(skb, OVS_TUNNEL_KEY_ATTR_IPV4_SRC, output->ipv4_src)) 492 nla_put_be32(skb, OVS_TUNNEL_KEY_ATTR_IPV4_SRC, output->ipv4_src))
433 return -EMSGSIZE; 493 return -EMSGSIZE;
434 if (output->ipv4_dst && 494 if (output->ipv4_dst &&
435 nla_put_be32(skb, OVS_TUNNEL_KEY_ATTR_IPV4_DST, output->ipv4_dst)) 495 nla_put_be32(skb, OVS_TUNNEL_KEY_ATTR_IPV4_DST, output->ipv4_dst))
436 return -EMSGSIZE; 496 return -EMSGSIZE;
437 if (output->ipv4_tos && 497 if (output->ipv4_tos &&
438 nla_put_u8(skb, OVS_TUNNEL_KEY_ATTR_TOS, output->ipv4_tos)) 498 nla_put_u8(skb, OVS_TUNNEL_KEY_ATTR_TOS, output->ipv4_tos))
439 return -EMSGSIZE; 499 return -EMSGSIZE;
440 if (nla_put_u8(skb, OVS_TUNNEL_KEY_ATTR_TTL, output->ipv4_ttl)) 500 if (nla_put_u8(skb, OVS_TUNNEL_KEY_ATTR_TTL, output->ipv4_ttl))
441 return -EMSGSIZE; 501 return -EMSGSIZE;
442 if ((output->tun_flags & TUNNEL_DONT_FRAGMENT) && 502 if ((output->tun_flags & TUNNEL_DONT_FRAGMENT) &&
443 nla_put_flag(skb, OVS_TUNNEL_KEY_ATTR_DONT_FRAGMENT)) 503 nla_put_flag(skb, OVS_TUNNEL_KEY_ATTR_DONT_FRAGMENT))
444 return -EMSGSIZE; 504 return -EMSGSIZE;
445 if ((output->tun_flags & TUNNEL_CSUM) && 505 if ((output->tun_flags & TUNNEL_CSUM) &&
446 nla_put_flag(skb, OVS_TUNNEL_KEY_ATTR_CSUM)) 506 nla_put_flag(skb, OVS_TUNNEL_KEY_ATTR_CSUM))
507 return -EMSGSIZE;
508 if ((output->tun_flags & TUNNEL_OAM) &&
509 nla_put_flag(skb, OVS_TUNNEL_KEY_ATTR_OAM))
510 return -EMSGSIZE;
511 if (tun_opts &&
512 nla_put(skb, OVS_TUNNEL_KEY_ATTR_GENEVE_OPTS,
513 swkey_tun_opts_len, tun_opts))
447 return -EMSGSIZE; 514 return -EMSGSIZE;
448 515
449 nla_nest_end(skb, nla);
450 return 0; 516 return 0;
451} 517}
452 518
453 519
520static int ipv4_tun_to_nlattr(struct sk_buff *skb,
521 const struct ovs_key_ipv4_tunnel *output,
522 const struct geneve_opt *tun_opts,
523 int swkey_tun_opts_len)
524{
525 struct nlattr *nla;
526 int err;
527
528 nla = nla_nest_start(skb, OVS_KEY_ATTR_TUNNEL);
529 if (!nla)
530 return -EMSGSIZE;
531
532 err = __ipv4_tun_to_nlattr(skb, output, tun_opts, swkey_tun_opts_len);
533 if (err)
534 return err;
535
536 nla_nest_end(skb, nla);
537 return 0;
538}
539
454static int metadata_from_nlattrs(struct sw_flow_match *match, u64 *attrs, 540static int metadata_from_nlattrs(struct sw_flow_match *match, u64 *attrs,
455 const struct nlattr **a, bool is_mask) 541 const struct nlattr **a, bool is_mask)
456{ 542{
543 if (*attrs & (1 << OVS_KEY_ATTR_DP_HASH)) {
544 u32 hash_val = nla_get_u32(a[OVS_KEY_ATTR_DP_HASH]);
545
546 SW_FLOW_KEY_PUT(match, ovs_flow_hash, hash_val, is_mask);
547 *attrs &= ~(1 << OVS_KEY_ATTR_DP_HASH);
548 }
549
550 if (*attrs & (1 << OVS_KEY_ATTR_RECIRC_ID)) {
551 u32 recirc_id = nla_get_u32(a[OVS_KEY_ATTR_RECIRC_ID]);
552
553 SW_FLOW_KEY_PUT(match, recirc_id, recirc_id, is_mask);
554 *attrs &= ~(1 << OVS_KEY_ATTR_RECIRC_ID);
555 }
556
457 if (*attrs & (1 << OVS_KEY_ATTR_PRIORITY)) { 557 if (*attrs & (1 << OVS_KEY_ATTR_PRIORITY)) {
458 SW_FLOW_KEY_PUT(match, phy.priority, 558 SW_FLOW_KEY_PUT(match, phy.priority,
459 nla_get_u32(a[OVS_KEY_ATTR_PRIORITY]), is_mask); 559 nla_get_u32(a[OVS_KEY_ATTR_PRIORITY]), is_mask);
@@ -836,7 +936,7 @@ int ovs_nla_get_match(struct sw_flow_match *match,
836 936
837/** 937/**
838 * ovs_nla_get_flow_metadata - parses Netlink attributes into a flow key. 938 * ovs_nla_get_flow_metadata - parses Netlink attributes into a flow key.
839 * @flow: Receives extracted in_port, priority, tun_key and skb_mark. 939 * @key: Receives extracted in_port, priority, tun_key and skb_mark.
840 * @attr: Netlink attribute holding nested %OVS_KEY_ATTR_* Netlink attribute 940 * @attr: Netlink attribute holding nested %OVS_KEY_ATTR_* Netlink attribute
841 * sequence. 941 * sequence.
842 * 942 *
@@ -846,32 +946,24 @@ int ovs_nla_get_match(struct sw_flow_match *match,
846 * extracted from the packet itself. 946 * extracted from the packet itself.
847 */ 947 */
848 948
849int ovs_nla_get_flow_metadata(struct sw_flow *flow, 949int ovs_nla_get_flow_metadata(const struct nlattr *attr,
850 const struct nlattr *attr) 950 struct sw_flow_key *key)
851{ 951{
852 struct ovs_key_ipv4_tunnel *tun_key = &flow->key.tun_key;
853 const struct nlattr *a[OVS_KEY_ATTR_MAX + 1]; 952 const struct nlattr *a[OVS_KEY_ATTR_MAX + 1];
953 struct sw_flow_match match;
854 u64 attrs = 0; 954 u64 attrs = 0;
855 int err; 955 int err;
856 struct sw_flow_match match;
857
858 flow->key.phy.in_port = DP_MAX_PORTS;
859 flow->key.phy.priority = 0;
860 flow->key.phy.skb_mark = 0;
861 memset(tun_key, 0, sizeof(flow->key.tun_key));
862 956
863 err = parse_flow_nlattrs(attr, a, &attrs); 957 err = parse_flow_nlattrs(attr, a, &attrs);
864 if (err) 958 if (err)
865 return -EINVAL; 959 return -EINVAL;
866 960
867 memset(&match, 0, sizeof(match)); 961 memset(&match, 0, sizeof(match));
868 match.key = &flow->key; 962 match.key = key;
869 963
870 err = metadata_from_nlattrs(&match, &attrs, a, false); 964 key->phy.in_port = DP_MAX_PORTS;
871 if (err)
872 return err;
873 965
874 return 0; 966 return metadata_from_nlattrs(&match, &attrs, a, false);
875} 967}
876 968
877int ovs_nla_put_flow(const struct sw_flow_key *swkey, 969int ovs_nla_put_flow(const struct sw_flow_key *swkey,
@@ -881,13 +973,26 @@ int ovs_nla_put_flow(const struct sw_flow_key *swkey,
881 struct nlattr *nla, *encap; 973 struct nlattr *nla, *encap;
882 bool is_mask = (swkey != output); 974 bool is_mask = (swkey != output);
883 975
884 if (nla_put_u32(skb, OVS_KEY_ATTR_PRIORITY, output->phy.priority)) 976 if (nla_put_u32(skb, OVS_KEY_ATTR_RECIRC_ID, output->recirc_id))
977 goto nla_put_failure;
978
979 if (nla_put_u32(skb, OVS_KEY_ATTR_DP_HASH, output->ovs_flow_hash))
885 goto nla_put_failure; 980 goto nla_put_failure;
886 981
887 if ((swkey->tun_key.ipv4_dst || is_mask) && 982 if (nla_put_u32(skb, OVS_KEY_ATTR_PRIORITY, output->phy.priority))
888 ipv4_tun_to_nlattr(skb, &swkey->tun_key, &output->tun_key))
889 goto nla_put_failure; 983 goto nla_put_failure;
890 984
985 if ((swkey->tun_key.ipv4_dst || is_mask)) {
986 const struct geneve_opt *opts = NULL;
987
988 if (output->tun_key.tun_flags & TUNNEL_OPTIONS_PRESENT)
989 opts = GENEVE_OPTS(output, swkey->tun_opts_len);
990
991 if (ipv4_tun_to_nlattr(skb, &output->tun_key, opts,
992 swkey->tun_opts_len))
993 goto nla_put_failure;
994 }
995
891 if (swkey->phy.in_port == DP_MAX_PORTS) { 996 if (swkey->phy.in_port == DP_MAX_PORTS) {
892 if (is_mask && (output->phy.in_port == 0xffff)) 997 if (is_mask && (output->phy.in_port == 0xffff))
893 if (nla_put_u32(skb, OVS_KEY_ATTR_IN_PORT, 0xffffffff)) 998 if (nla_put_u32(skb, OVS_KEY_ATTR_IN_PORT, 0xffffffff))
@@ -1127,13 +1232,14 @@ out:
1127 return (struct nlattr *) ((unsigned char *)(*sfa) + next_offset); 1232 return (struct nlattr *) ((unsigned char *)(*sfa) + next_offset);
1128} 1233}
1129 1234
1130static int add_action(struct sw_flow_actions **sfa, int attrtype, void *data, int len) 1235static struct nlattr *__add_action(struct sw_flow_actions **sfa,
1236 int attrtype, void *data, int len)
1131{ 1237{
1132 struct nlattr *a; 1238 struct nlattr *a;
1133 1239
1134 a = reserve_sfa_size(sfa, nla_attr_size(len)); 1240 a = reserve_sfa_size(sfa, nla_attr_size(len));
1135 if (IS_ERR(a)) 1241 if (IS_ERR(a))
1136 return PTR_ERR(a); 1242 return a;
1137 1243
1138 a->nla_type = attrtype; 1244 a->nla_type = attrtype;
1139 a->nla_len = nla_attr_size(len); 1245 a->nla_len = nla_attr_size(len);
@@ -1142,6 +1248,18 @@ static int add_action(struct sw_flow_actions **sfa, int attrtype, void *data, in
1142 memcpy(nla_data(a), data, len); 1248 memcpy(nla_data(a), data, len);
1143 memset((unsigned char *) a + a->nla_len, 0, nla_padlen(len)); 1249 memset((unsigned char *) a + a->nla_len, 0, nla_padlen(len));
1144 1250
1251 return a;
1252}
1253
1254static int add_action(struct sw_flow_actions **sfa, int attrtype,
1255 void *data, int len)
1256{
1257 struct nlattr *a;
1258
1259 a = __add_action(sfa, attrtype, data, len);
1260 if (IS_ERR(a))
1261 return PTR_ERR(a);
1262
1145 return 0; 1263 return 0;
1146} 1264}
1147 1265
@@ -1247,6 +1365,8 @@ static int validate_and_copy_set_tun(const struct nlattr *attr,
1247{ 1365{
1248 struct sw_flow_match match; 1366 struct sw_flow_match match;
1249 struct sw_flow_key key; 1367 struct sw_flow_key key;
1368 struct ovs_tunnel_info *tun_info;
1369 struct nlattr *a;
1250 int err, start; 1370 int err, start;
1251 1371
1252 ovs_match_init(&match, &key, NULL); 1372 ovs_match_init(&match, &key, NULL);
@@ -1254,12 +1374,56 @@ static int validate_and_copy_set_tun(const struct nlattr *attr,
1254 if (err) 1374 if (err)
1255 return err; 1375 return err;
1256 1376
1377 if (key.tun_opts_len) {
1378 struct geneve_opt *option = GENEVE_OPTS(&key,
1379 key.tun_opts_len);
1380 int opts_len = key.tun_opts_len;
1381 bool crit_opt = false;
1382
1383 while (opts_len > 0) {
1384 int len;
1385
1386 if (opts_len < sizeof(*option))
1387 return -EINVAL;
1388
1389 len = sizeof(*option) + option->length * 4;
1390 if (len > opts_len)
1391 return -EINVAL;
1392
1393 crit_opt |= !!(option->type & GENEVE_CRIT_OPT_TYPE);
1394
1395 option = (struct geneve_opt *)((u8 *)option + len);
1396 opts_len -= len;
1397 };
1398
1399 key.tun_key.tun_flags |= crit_opt ? TUNNEL_CRIT_OPT : 0;
1400 };
1401
1257 start = add_nested_action_start(sfa, OVS_ACTION_ATTR_SET); 1402 start = add_nested_action_start(sfa, OVS_ACTION_ATTR_SET);
1258 if (start < 0) 1403 if (start < 0)
1259 return start; 1404 return start;
1260 1405
1261 err = add_action(sfa, OVS_KEY_ATTR_IPV4_TUNNEL, &match.key->tun_key, 1406 a = __add_action(sfa, OVS_KEY_ATTR_TUNNEL_INFO, NULL,
1262 sizeof(match.key->tun_key)); 1407 sizeof(*tun_info) + key.tun_opts_len);
1408 if (IS_ERR(a))
1409 return PTR_ERR(a);
1410
1411 tun_info = nla_data(a);
1412 tun_info->tunnel = key.tun_key;
1413 tun_info->options_len = key.tun_opts_len;
1414
1415 if (tun_info->options_len) {
1416 /* We need to store the options in the action itself since
1417 * everything else will go away after flow setup. We can append
1418 * it to tun_info and then point there.
1419 */
1420 memcpy((tun_info + 1), GENEVE_OPTS(&key, key.tun_opts_len),
1421 key.tun_opts_len);
1422 tun_info->options = (struct geneve_opt *)(tun_info + 1);
1423 } else {
1424 tun_info->options = NULL;
1425 }
1426
1263 add_nested_action_end(*sfa, start); 1427 add_nested_action_end(*sfa, start);
1264 1428
1265 return err; 1429 return err;
@@ -1409,11 +1573,13 @@ int ovs_nla_copy_actions(const struct nlattr *attr,
1409 /* Expected argument lengths, (u32)-1 for variable length. */ 1573 /* Expected argument lengths, (u32)-1 for variable length. */
1410 static const u32 action_lens[OVS_ACTION_ATTR_MAX + 1] = { 1574 static const u32 action_lens[OVS_ACTION_ATTR_MAX + 1] = {
1411 [OVS_ACTION_ATTR_OUTPUT] = sizeof(u32), 1575 [OVS_ACTION_ATTR_OUTPUT] = sizeof(u32),
1576 [OVS_ACTION_ATTR_RECIRC] = sizeof(u32),
1412 [OVS_ACTION_ATTR_USERSPACE] = (u32)-1, 1577 [OVS_ACTION_ATTR_USERSPACE] = (u32)-1,
1413 [OVS_ACTION_ATTR_PUSH_VLAN] = sizeof(struct ovs_action_push_vlan), 1578 [OVS_ACTION_ATTR_PUSH_VLAN] = sizeof(struct ovs_action_push_vlan),
1414 [OVS_ACTION_ATTR_POP_VLAN] = 0, 1579 [OVS_ACTION_ATTR_POP_VLAN] = 0,
1415 [OVS_ACTION_ATTR_SET] = (u32)-1, 1580 [OVS_ACTION_ATTR_SET] = (u32)-1,
1416 [OVS_ACTION_ATTR_SAMPLE] = (u32)-1 1581 [OVS_ACTION_ATTR_SAMPLE] = (u32)-1,
1582 [OVS_ACTION_ATTR_HASH] = sizeof(struct ovs_action_hash)
1417 }; 1583 };
1418 const struct ovs_action_push_vlan *vlan; 1584 const struct ovs_action_push_vlan *vlan;
1419 int type = nla_type(a); 1585 int type = nla_type(a);
@@ -1440,6 +1606,18 @@ int ovs_nla_copy_actions(const struct nlattr *attr,
1440 return -EINVAL; 1606 return -EINVAL;
1441 break; 1607 break;
1442 1608
1609 case OVS_ACTION_ATTR_HASH: {
1610 const struct ovs_action_hash *act_hash = nla_data(a);
1611
1612 switch (act_hash->hash_alg) {
1613 case OVS_HASH_ALG_L4:
1614 break;
1615 default:
1616 return -EINVAL;
1617 }
1618
1619 break;
1620 }
1443 1621
1444 case OVS_ACTION_ATTR_POP_VLAN: 1622 case OVS_ACTION_ATTR_POP_VLAN:
1445 break; 1623 break;
@@ -1452,6 +1630,9 @@ int ovs_nla_copy_actions(const struct nlattr *attr,
1452 return -EINVAL; 1630 return -EINVAL;
1453 break; 1631 break;
1454 1632
1633 case OVS_ACTION_ATTR_RECIRC:
1634 break;
1635
1455 case OVS_ACTION_ATTR_SET: 1636 case OVS_ACTION_ATTR_SET:
1456 err = validate_set(a, key, sfa, &skip_copy); 1637 err = validate_set(a, key, sfa, &skip_copy);
1457 if (err) 1638 if (err)
@@ -1525,17 +1706,22 @@ static int set_action_to_attr(const struct nlattr *a, struct sk_buff *skb)
1525 int err; 1706 int err;
1526 1707
1527 switch (key_type) { 1708 switch (key_type) {
1528 case OVS_KEY_ATTR_IPV4_TUNNEL: 1709 case OVS_KEY_ATTR_TUNNEL_INFO: {
1710 struct ovs_tunnel_info *tun_info = nla_data(ovs_key);
1711
1529 start = nla_nest_start(skb, OVS_ACTION_ATTR_SET); 1712 start = nla_nest_start(skb, OVS_ACTION_ATTR_SET);
1530 if (!start) 1713 if (!start)
1531 return -EMSGSIZE; 1714 return -EMSGSIZE;
1532 1715
1533 err = ipv4_tun_to_nlattr(skb, nla_data(ovs_key), 1716 err = ipv4_tun_to_nlattr(skb, &tun_info->tunnel,
1534 nla_data(ovs_key)); 1717 tun_info->options_len ?
1718 tun_info->options : NULL,
1719 tun_info->options_len);
1535 if (err) 1720 if (err)
1536 return err; 1721 return err;
1537 nla_nest_end(skb, start); 1722 nla_nest_end(skb, start);
1538 break; 1723 break;
1724 }
1539 default: 1725 default:
1540 if (nla_put(skb, OVS_ACTION_ATTR_SET, nla_len(a), ovs_key)) 1726 if (nla_put(skb, OVS_ACTION_ATTR_SET, nla_len(a), ovs_key))
1541 return -EMSGSIZE; 1727 return -EMSGSIZE;
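The option walk added to validate_and_copy_set_tun() above steps through Geneve TLVs whose length field counts 4-byte words, rejects truncated entries, and records whether any critical option was present. A self-contained sketch of the same loop follows; the option struct and DEMO_CRIT_OPT_TYPE mask are simplified stand-ins for the kernel's struct geneve_opt and GENEVE_CRIT_OPT_TYPE.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define DEMO_CRIT_OPT_TYPE 0x80         /* stand-in for GENEVE_CRIT_OPT_TYPE */

struct demo_geneve_opt {
        uint16_t opt_class;
        uint8_t type;
        uint8_t length;                 /* option body length in 4-byte words */
};

/* Walk a buffer of TLV options; reject truncated entries and note whether
 * any option has the "critical" bit set.  Returns 0 on success, -1 on error. */
static int walk_options(const uint8_t *opts, int opts_len, bool *crit_opt)
{
        const struct demo_geneve_opt *option = (const void *)opts;

        *crit_opt = false;
        while (opts_len > 0) {
                int len;

                if (opts_len < (int)sizeof(*option))
                        return -1;

                len = sizeof(*option) + option->length * 4;
                if (len > opts_len)
                        return -1;

                if (option->type & DEMO_CRIT_OPT_TYPE)
                        *crit_opt = true;

                option = (const void *)((const uint8_t *)option + len);
                opts_len -= len;
        }
        return 0;
}

int main(void)
{
        uint8_t buf[8] = { 0 };
        struct demo_geneve_opt opt = { .opt_class = 1, .type = 0x81, .length = 1 };
        bool crit;

        memcpy(buf, &opt, sizeof(opt));
        printf("ok=%d critical=%d\n", walk_options(buf, sizeof(buf), &crit), crit);
        return 0;
}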
diff --git a/net/openvswitch/flow_netlink.h b/net/openvswitch/flow_netlink.h
index 440151045d39..206e45add888 100644
--- a/net/openvswitch/flow_netlink.h
+++ b/net/openvswitch/flow_netlink.h
@@ -42,8 +42,8 @@ void ovs_match_init(struct sw_flow_match *match,
42 42
43int ovs_nla_put_flow(const struct sw_flow_key *, 43int ovs_nla_put_flow(const struct sw_flow_key *,
44 const struct sw_flow_key *, struct sk_buff *); 44 const struct sw_flow_key *, struct sk_buff *);
45int ovs_nla_get_flow_metadata(struct sw_flow *flow, 45int ovs_nla_get_flow_metadata(const struct nlattr *, struct sw_flow_key *);
46 const struct nlattr *attr); 46
47int ovs_nla_get_match(struct sw_flow_match *match, 47int ovs_nla_get_match(struct sw_flow_match *match,
48 const struct nlattr *, 48 const struct nlattr *,
49 const struct nlattr *); 49 const struct nlattr *);
diff --git a/net/openvswitch/vport-geneve.c b/net/openvswitch/vport-geneve.c
new file mode 100644
index 000000000000..910b3ef2c0d5
--- /dev/null
+++ b/net/openvswitch/vport-geneve.c
@@ -0,0 +1,235 @@
1/*
2 * Copyright (c) 2014 Nicira, Inc.
3 *
4 * This program is free software; you can redistribute it and/or
5 * modify it under the terms of the GNU General Public License
6 * as published by the Free Software Foundation; either version
7 * 2 of the License, or (at your option) any later version.
8 */
9
10#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
11
12#include <linux/version.h>
13
14#include <linux/in.h>
15#include <linux/ip.h>
16#include <linux/net.h>
17#include <linux/rculist.h>
18#include <linux/udp.h>
19#include <linux/if_vlan.h>
20
21#include <net/geneve.h>
22#include <net/icmp.h>
23#include <net/ip.h>
24#include <net/route.h>
25#include <net/udp.h>
26#include <net/xfrm.h>
27
28#include "datapath.h"
29#include "vport.h"
30
31/**
32 * struct geneve_port - Keeps track of open UDP ports
33 * @gs: The Geneve socket created for this port number.
34 * @name: vport name.
35 */
36struct geneve_port {
37 struct geneve_sock *gs;
38 char name[IFNAMSIZ];
39};
40
41static LIST_HEAD(geneve_ports);
42
43static inline struct geneve_port *geneve_vport(const struct vport *vport)
44{
45 return vport_priv(vport);
46}
47
48static inline struct genevehdr *geneve_hdr(const struct sk_buff *skb)
49{
50 return (struct genevehdr *)(udp_hdr(skb) + 1);
51}
52
53/* Convert 64 bit tunnel ID to 24 bit VNI. */
54static void tunnel_id_to_vni(__be64 tun_id, __u8 *vni)
55{
56#ifdef __BIG_ENDIAN
57 vni[0] = (__force __u8)(tun_id >> 16);
58 vni[1] = (__force __u8)(tun_id >> 8);
59 vni[2] = (__force __u8)tun_id;
60#else
61 vni[0] = (__force __u8)((__force u64)tun_id >> 40);
62 vni[1] = (__force __u8)((__force u64)tun_id >> 48);
63 vni[2] = (__force __u8)((__force u64)tun_id >> 56);
64#endif
65}
66
67/* Convert 24 bit VNI to 64 bit tunnel ID. */
68static __be64 vni_to_tunnel_id(__u8 *vni)
69{
70#ifdef __BIG_ENDIAN
71 return (vni[0] << 16) | (vni[1] << 8) | vni[2];
72#else
73 return (__force __be64)(((__force u64)vni[0] << 40) |
74 ((__force u64)vni[1] << 48) |
75 ((__force u64)vni[2] << 56));
76#endif
77}
78
79static void geneve_rcv(struct geneve_sock *gs, struct sk_buff *skb)
80{
81 struct vport *vport = gs->rcv_data;
82 struct genevehdr *geneveh = geneve_hdr(skb);
83 int opts_len;
84 struct ovs_tunnel_info tun_info;
85 __be64 key;
86 __be16 flags;
87
88 opts_len = geneveh->opt_len * 4;
89
90 flags = TUNNEL_KEY | TUNNEL_OPTIONS_PRESENT |
91 (udp_hdr(skb)->check != 0 ? TUNNEL_CSUM : 0) |
92 (geneveh->oam ? TUNNEL_OAM : 0) |
93 (geneveh->critical ? TUNNEL_CRIT_OPT : 0);
94
95 key = vni_to_tunnel_id(geneveh->vni);
96
97 ovs_flow_tun_info_init(&tun_info, ip_hdr(skb), key, flags,
98 geneveh->options, opts_len);
99
100 ovs_vport_receive(vport, skb, &tun_info);
101}
102
103static int geneve_get_options(const struct vport *vport,
104 struct sk_buff *skb)
105{
106 struct geneve_port *geneve_port = geneve_vport(vport);
107 struct inet_sock *sk = inet_sk(geneve_port->gs->sock->sk);
108
109 if (nla_put_u16(skb, OVS_TUNNEL_ATTR_DST_PORT, ntohs(sk->inet_sport)))
110 return -EMSGSIZE;
111 return 0;
112}
113
114static void geneve_tnl_destroy(struct vport *vport)
115{
116 struct geneve_port *geneve_port = geneve_vport(vport);
117
118 geneve_sock_release(geneve_port->gs);
119
120 ovs_vport_deferred_free(vport);
121}
122
123static struct vport *geneve_tnl_create(const struct vport_parms *parms)
124{
125 struct net *net = ovs_dp_get_net(parms->dp);
126 struct nlattr *options = parms->options;
127 struct geneve_port *geneve_port;
128 struct geneve_sock *gs;
129 struct vport *vport;
130 struct nlattr *a;
131 int err;
132 u16 dst_port;
133
134 if (!options) {
135 err = -EINVAL;
136 goto error;
137 }
138
139 a = nla_find_nested(options, OVS_TUNNEL_ATTR_DST_PORT);
140 if (a && nla_len(a) == sizeof(u16)) {
141 dst_port = nla_get_u16(a);
142 } else {
143 /* Require destination port from userspace. */
144 err = -EINVAL;
145 goto error;
146 }
147
148 vport = ovs_vport_alloc(sizeof(struct geneve_port),
149 &ovs_geneve_vport_ops, parms);
150 if (IS_ERR(vport))
151 return vport;
152
153 geneve_port = geneve_vport(vport);
154 strncpy(geneve_port->name, parms->name, IFNAMSIZ);
155
156 gs = geneve_sock_add(net, htons(dst_port), geneve_rcv, vport, true, 0);
157 if (IS_ERR(gs)) {
158 ovs_vport_free(vport);
159 return (void *)gs;
160 }
161 geneve_port->gs = gs;
162
163 return vport;
164error:
165 return ERR_PTR(err);
166}
167
168static int geneve_tnl_send(struct vport *vport, struct sk_buff *skb)
169{
170 struct ovs_key_ipv4_tunnel *tun_key;
171 struct ovs_tunnel_info *tun_info;
172 struct net *net = ovs_dp_get_net(vport->dp);
173 struct geneve_port *geneve_port = geneve_vport(vport);
174 __be16 dport = inet_sk(geneve_port->gs->sock->sk)->inet_sport;
175 __be16 sport;
176 struct rtable *rt;
177 struct flowi4 fl;
178 u8 vni[3];
179 __be16 df;
180 int err;
181
182 tun_info = OVS_CB(skb)->egress_tun_info;
183 if (unlikely(!tun_info)) {
184 err = -EINVAL;
185 goto error;
186 }
187
188 tun_key = &tun_info->tunnel;
189
190 /* Route lookup */
191 memset(&fl, 0, sizeof(fl));
192 fl.daddr = tun_key->ipv4_dst;
193 fl.saddr = tun_key->ipv4_src;
194 fl.flowi4_tos = RT_TOS(tun_key->ipv4_tos);
195 fl.flowi4_mark = skb->mark;
196 fl.flowi4_proto = IPPROTO_UDP;
197
198 rt = ip_route_output_key(net, &fl);
199 if (IS_ERR(rt)) {
200 err = PTR_ERR(rt);
201 goto error;
202 }
203
204 df = tun_key->tun_flags & TUNNEL_DONT_FRAGMENT ? htons(IP_DF) : 0;
205 sport = udp_flow_src_port(net, skb, 1, USHRT_MAX, true);
206 tunnel_id_to_vni(tun_key->tun_id, vni);
207 skb->ignore_df = 1;
208
209 err = geneve_xmit_skb(geneve_port->gs, rt, skb, fl.saddr,
210 tun_key->ipv4_dst, tun_key->ipv4_tos,
211 tun_key->ipv4_ttl, df, sport, dport,
212 tun_key->tun_flags, vni,
213 tun_info->options_len, (u8 *)tun_info->options,
214 false);
215 if (err < 0)
216 ip_rt_put(rt);
217error:
218 return err;
219}
220
221static const char *geneve_get_name(const struct vport *vport)
222{
223 struct geneve_port *geneve_port = geneve_vport(vport);
224
225 return geneve_port->name;
226}
227
228const struct vport_ops ovs_geneve_vport_ops = {
229 .type = OVS_VPORT_TYPE_GENEVE,
230 .create = geneve_tnl_create,
231 .destroy = geneve_tnl_destroy,
232 .get_name = geneve_get_name,
233 .get_options = geneve_get_options,
234 .send = geneve_tnl_send,
235};
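tunnel_id_to_vni() and vni_to_tunnel_id() above pack a 24-bit Geneve VNI into, and recover it from, a 64-bit tunnel ID; the kernel versions operate on __be64 and handle both byte orders with __force casts. The hedged userspace sketch below shows the same 24-bit packing on a plain host-order integer only, so it deliberately ignores the endianness handling.

#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Pack/unpack a 24-bit VNI in the low bits of a host-order 64-bit ID. */
static void id_to_vni(uint64_t id, uint8_t vni[3])
{
        vni[0] = (id >> 16) & 0xff;
        vni[1] = (id >> 8) & 0xff;
        vni[2] = id & 0xff;
}

static uint64_t vni_to_id(const uint8_t vni[3])
{
        return ((uint64_t)vni[0] << 16) | ((uint64_t)vni[1] << 8) | vni[2];
}

int main(void)
{
        uint8_t vni[3];

        id_to_vni(0xabcdef, vni);
        assert(vni_to_id(vni) == 0xabcdef);
        printf("vni = %02x%02x%02x\n", vni[0], vni[1], vni[2]);
        return 0;
}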
diff --git a/net/openvswitch/vport-gre.c b/net/openvswitch/vport-gre.c
index f49148a07da2..108b82da2fd9 100644
--- a/net/openvswitch/vport-gre.c
+++ b/net/openvswitch/vport-gre.c
@@ -1,5 +1,5 @@
1/* 1/*
2 * Copyright (c) 2007-2013 Nicira, Inc. 2 * Copyright (c) 2007-2014 Nicira, Inc.
3 * 3 *
4 * This program is free software; you can redistribute it and/or 4 * This program is free software; you can redistribute it and/or
5 * modify it under the terms of version 2 of the GNU General Public 5 * modify it under the terms of version 2 of the GNU General Public
@@ -63,8 +63,10 @@ static __be16 filter_tnl_flags(__be16 flags)
63static struct sk_buff *__build_header(struct sk_buff *skb, 63static struct sk_buff *__build_header(struct sk_buff *skb,
64 int tunnel_hlen) 64 int tunnel_hlen)
65{ 65{
66 const struct ovs_key_ipv4_tunnel *tun_key = OVS_CB(skb)->tun_key;
67 struct tnl_ptk_info tpi; 66 struct tnl_ptk_info tpi;
67 const struct ovs_key_ipv4_tunnel *tun_key;
68
69 tun_key = &OVS_CB(skb)->egress_tun_info->tunnel;
68 70
69 skb = gre_handle_offloads(skb, !!(tun_key->tun_flags & TUNNEL_CSUM)); 71 skb = gre_handle_offloads(skb, !!(tun_key->tun_flags & TUNNEL_CSUM));
70 if (IS_ERR(skb)) 72 if (IS_ERR(skb))
@@ -92,7 +94,7 @@ static __be64 key_to_tunnel_id(__be32 key, __be32 seq)
92static int gre_rcv(struct sk_buff *skb, 94static int gre_rcv(struct sk_buff *skb,
93 const struct tnl_ptk_info *tpi) 95 const struct tnl_ptk_info *tpi)
94{ 96{
95 struct ovs_key_ipv4_tunnel tun_key; 97 struct ovs_tunnel_info tun_info;
96 struct ovs_net *ovs_net; 98 struct ovs_net *ovs_net;
97 struct vport *vport; 99 struct vport *vport;
98 __be64 key; 100 __be64 key;
@@ -103,10 +105,10 @@ static int gre_rcv(struct sk_buff *skb,
103 return PACKET_REJECT; 105 return PACKET_REJECT;
104 106
105 key = key_to_tunnel_id(tpi->key, tpi->seq); 107 key = key_to_tunnel_id(tpi->key, tpi->seq);
106 ovs_flow_tun_key_init(&tun_key, ip_hdr(skb), key, 108 ovs_flow_tun_info_init(&tun_info, ip_hdr(skb), key,
107 filter_tnl_flags(tpi->flags)); 109 filter_tnl_flags(tpi->flags), NULL, 0);
108 110
109 ovs_vport_receive(vport, skb, &tun_key); 111 ovs_vport_receive(vport, skb, &tun_info);
110 return PACKET_RCVD; 112 return PACKET_RCVD;
111} 113}
112 114
@@ -129,6 +131,7 @@ static int gre_err(struct sk_buff *skb, u32 info,
129static int gre_tnl_send(struct vport *vport, struct sk_buff *skb) 131static int gre_tnl_send(struct vport *vport, struct sk_buff *skb)
130{ 132{
131 struct net *net = ovs_dp_get_net(vport->dp); 133 struct net *net = ovs_dp_get_net(vport->dp);
134 struct ovs_key_ipv4_tunnel *tun_key;
132 struct flowi4 fl; 135 struct flowi4 fl;
133 struct rtable *rt; 136 struct rtable *rt;
134 int min_headroom; 137 int min_headroom;
@@ -136,16 +139,17 @@ static int gre_tnl_send(struct vport *vport, struct sk_buff *skb)
136 __be16 df; 139 __be16 df;
137 int err; 140 int err;
138 141
139 if (unlikely(!OVS_CB(skb)->tun_key)) { 142 if (unlikely(!OVS_CB(skb)->egress_tun_info)) {
140 err = -EINVAL; 143 err = -EINVAL;
141 goto error; 144 goto error;
142 } 145 }
143 146
147 tun_key = &OVS_CB(skb)->egress_tun_info->tunnel;
144 /* Route lookup */ 148 /* Route lookup */
145 memset(&fl, 0, sizeof(fl)); 149 memset(&fl, 0, sizeof(fl));
146 fl.daddr = OVS_CB(skb)->tun_key->ipv4_dst; 150 fl.daddr = tun_key->ipv4_dst;
147 fl.saddr = OVS_CB(skb)->tun_key->ipv4_src; 151 fl.saddr = tun_key->ipv4_src;
148 fl.flowi4_tos = RT_TOS(OVS_CB(skb)->tun_key->ipv4_tos); 152 fl.flowi4_tos = RT_TOS(tun_key->ipv4_tos);
149 fl.flowi4_mark = skb->mark; 153 fl.flowi4_mark = skb->mark;
150 fl.flowi4_proto = IPPROTO_GRE; 154 fl.flowi4_proto = IPPROTO_GRE;
151 155
@@ -153,7 +157,7 @@ static int gre_tnl_send(struct vport *vport, struct sk_buff *skb)
153 if (IS_ERR(rt)) 157 if (IS_ERR(rt))
154 return PTR_ERR(rt); 158 return PTR_ERR(rt);
155 159
156 tunnel_hlen = ip_gre_calc_hlen(OVS_CB(skb)->tun_key->tun_flags); 160 tunnel_hlen = ip_gre_calc_hlen(tun_key->tun_flags);
157 161
158 min_headroom = LL_RESERVED_SPACE(rt->dst.dev) + rt->dst.header_len 162 min_headroom = LL_RESERVED_SPACE(rt->dst.dev) + rt->dst.header_len
159 + tunnel_hlen + sizeof(struct iphdr) 163 + tunnel_hlen + sizeof(struct iphdr)
@@ -185,15 +189,14 @@ static int gre_tnl_send(struct vport *vport, struct sk_buff *skb)
185 goto err_free_rt; 189 goto err_free_rt;
186 } 190 }
187 191
188 df = OVS_CB(skb)->tun_key->tun_flags & TUNNEL_DONT_FRAGMENT ? 192 df = tun_key->tun_flags & TUNNEL_DONT_FRAGMENT ?
189 htons(IP_DF) : 0; 193 htons(IP_DF) : 0;
190 194
191 skb->ignore_df = 1; 195 skb->ignore_df = 1;
192 196
193 return iptunnel_xmit(skb->sk, rt, skb, fl.saddr, 197 return iptunnel_xmit(skb->sk, rt, skb, fl.saddr,
194 OVS_CB(skb)->tun_key->ipv4_dst, IPPROTO_GRE, 198 tun_key->ipv4_dst, IPPROTO_GRE,
195 OVS_CB(skb)->tun_key->ipv4_tos, 199 tun_key->ipv4_tos, tun_key->ipv4_ttl, df, false);
196 OVS_CB(skb)->tun_key->ipv4_ttl, df, false);
197err_free_rt: 200err_free_rt:
198 ip_rt_put(rt); 201 ip_rt_put(rt);
199error: 202error:
diff --git a/net/openvswitch/vport-vxlan.c b/net/openvswitch/vport-vxlan.c
index d8b7e247bebf..2735e01dca73 100644
--- a/net/openvswitch/vport-vxlan.c
+++ b/net/openvswitch/vport-vxlan.c
@@ -1,5 +1,5 @@
1/* 1/*
2 * Copyright (c) 2013 Nicira, Inc. 2 * Copyright (c) 2014 Nicira, Inc.
3 * Copyright (c) 2013 Cisco Systems, Inc. 3 * Copyright (c) 2013 Cisco Systems, Inc.
4 * 4 *
5 * This program is free software; you can redistribute it and/or 5 * This program is free software; you can redistribute it and/or
@@ -58,7 +58,7 @@ static inline struct vxlan_port *vxlan_vport(const struct vport *vport)
58/* Called with rcu_read_lock and BH disabled. */ 58/* Called with rcu_read_lock and BH disabled. */
59static void vxlan_rcv(struct vxlan_sock *vs, struct sk_buff *skb, __be32 vx_vni) 59static void vxlan_rcv(struct vxlan_sock *vs, struct sk_buff *skb, __be32 vx_vni)
60{ 60{
61 struct ovs_key_ipv4_tunnel tun_key; 61 struct ovs_tunnel_info tun_info;
62 struct vport *vport = vs->data; 62 struct vport *vport = vs->data;
63 struct iphdr *iph; 63 struct iphdr *iph;
64 __be64 key; 64 __be64 key;
@@ -66,9 +66,9 @@ static void vxlan_rcv(struct vxlan_sock *vs, struct sk_buff *skb, __be32 vx_vni)
66 /* Save outer tunnel values */ 66 /* Save outer tunnel values */
67 iph = ip_hdr(skb); 67 iph = ip_hdr(skb);
68 key = cpu_to_be64(ntohl(vx_vni) >> 8); 68 key = cpu_to_be64(ntohl(vx_vni) >> 8);
69 ovs_flow_tun_key_init(&tun_key, iph, key, TUNNEL_KEY); 69 ovs_flow_tun_info_init(&tun_info, iph, key, TUNNEL_KEY, NULL, 0);
70 70
71 ovs_vport_receive(vport, skb, &tun_key); 71 ovs_vport_receive(vport, skb, &tun_info);
72} 72}
73 73
74static int vxlan_get_options(const struct vport *vport, struct sk_buff *skb) 74static int vxlan_get_options(const struct vport *vport, struct sk_buff *skb)
@@ -140,22 +140,24 @@ static int vxlan_tnl_send(struct vport *vport, struct sk_buff *skb)
140 struct net *net = ovs_dp_get_net(vport->dp); 140 struct net *net = ovs_dp_get_net(vport->dp);
141 struct vxlan_port *vxlan_port = vxlan_vport(vport); 141 struct vxlan_port *vxlan_port = vxlan_vport(vport);
142 __be16 dst_port = inet_sk(vxlan_port->vs->sock->sk)->inet_sport; 142 __be16 dst_port = inet_sk(vxlan_port->vs->sock->sk)->inet_sport;
143 struct ovs_key_ipv4_tunnel *tun_key;
143 struct rtable *rt; 144 struct rtable *rt;
144 struct flowi4 fl; 145 struct flowi4 fl;
145 __be16 src_port; 146 __be16 src_port;
146 __be16 df; 147 __be16 df;
147 int err; 148 int err;
148 149
149 if (unlikely(!OVS_CB(skb)->tun_key)) { 150 if (unlikely(!OVS_CB(skb)->egress_tun_info)) {
150 err = -EINVAL; 151 err = -EINVAL;
151 goto error; 152 goto error;
152 } 153 }
153 154
155 tun_key = &OVS_CB(skb)->egress_tun_info->tunnel;
154 /* Route lookup */ 156 /* Route lookup */
155 memset(&fl, 0, sizeof(fl)); 157 memset(&fl, 0, sizeof(fl));
156 fl.daddr = OVS_CB(skb)->tun_key->ipv4_dst; 158 fl.daddr = tun_key->ipv4_dst;
157 fl.saddr = OVS_CB(skb)->tun_key->ipv4_src; 159 fl.saddr = tun_key->ipv4_src;
158 fl.flowi4_tos = RT_TOS(OVS_CB(skb)->tun_key->ipv4_tos); 160 fl.flowi4_tos = RT_TOS(tun_key->ipv4_tos);
159 fl.flowi4_mark = skb->mark; 161 fl.flowi4_mark = skb->mark;
160 fl.flowi4_proto = IPPROTO_UDP; 162 fl.flowi4_proto = IPPROTO_UDP;
161 163
@@ -165,7 +167,7 @@ static int vxlan_tnl_send(struct vport *vport, struct sk_buff *skb)
165 goto error; 167 goto error;
166 } 168 }
167 169
168 df = OVS_CB(skb)->tun_key->tun_flags & TUNNEL_DONT_FRAGMENT ? 170 df = tun_key->tun_flags & TUNNEL_DONT_FRAGMENT ?
169 htons(IP_DF) : 0; 171 htons(IP_DF) : 0;
170 172
171 skb->ignore_df = 1; 173 skb->ignore_df = 1;
@@ -173,11 +175,10 @@ static int vxlan_tnl_send(struct vport *vport, struct sk_buff *skb)
173 src_port = udp_flow_src_port(net, skb, 0, 0, true); 175 src_port = udp_flow_src_port(net, skb, 0, 0, true);
174 176
175 err = vxlan_xmit_skb(vxlan_port->vs, rt, skb, 177 err = vxlan_xmit_skb(vxlan_port->vs, rt, skb,
176 fl.saddr, OVS_CB(skb)->tun_key->ipv4_dst, 178 fl.saddr, tun_key->ipv4_dst,
177 OVS_CB(skb)->tun_key->ipv4_tos, 179 tun_key->ipv4_tos, tun_key->ipv4_ttl, df,
178 OVS_CB(skb)->tun_key->ipv4_ttl, df,
179 src_port, dst_port, 180 src_port, dst_port,
180 htonl(be64_to_cpu(OVS_CB(skb)->tun_key->tun_id) << 8), 181 htonl(be64_to_cpu(tun_key->tun_id) << 8),
181 false); 182 false);
182 if (err < 0) 183 if (err < 0)
183 ip_rt_put(rt); 184 ip_rt_put(rt);
diff --git a/net/openvswitch/vport.c b/net/openvswitch/vport.c
index 6d8f2ec481d9..53001b020ca7 100644
--- a/net/openvswitch/vport.c
+++ b/net/openvswitch/vport.c
@@ -1,5 +1,5 @@
1/* 1/*
2 * Copyright (c) 2007-2012 Nicira, Inc. 2 * Copyright (c) 2007-2014 Nicira, Inc.
3 * 3 *
4 * This program is free software; you can redistribute it and/or 4 * This program is free software; you can redistribute it and/or
5 * modify it under the terms of version 2 of the GNU General Public 5 * modify it under the terms of version 2 of the GNU General Public
@@ -48,6 +48,9 @@ static const struct vport_ops *vport_ops_list[] = {
48#ifdef CONFIG_OPENVSWITCH_VXLAN 48#ifdef CONFIG_OPENVSWITCH_VXLAN
49 &ovs_vxlan_vport_ops, 49 &ovs_vxlan_vport_ops,
50#endif 50#endif
51#ifdef CONFIG_OPENVSWITCH_GENEVE
52 &ovs_geneve_vport_ops,
53#endif
51}; 54};
52 55
53/* Protected by RCU read lock for reading, ovs_mutex for writing. */ 56/* Protected by RCU read lock for reading, ovs_mutex for writing. */
@@ -148,8 +151,6 @@ struct vport *ovs_vport_alloc(int priv_size, const struct vport_ops *ops,
148 return ERR_PTR(-ENOMEM); 151 return ERR_PTR(-ENOMEM);
149 } 152 }
150 153
151 spin_lock_init(&vport->stats_lock);
152
153 return vport; 154 return vport;
154} 155}
155 156
@@ -268,14 +269,10 @@ void ovs_vport_get_stats(struct vport *vport, struct ovs_vport_stats *stats)
268 * netdev-stats can be directly read over netlink-ioctl. 269 * netdev-stats can be directly read over netlink-ioctl.
269 */ 270 */
270 271
271 spin_lock_bh(&vport->stats_lock); 272 stats->rx_errors = atomic_long_read(&vport->err_stats.rx_errors);
272 273 stats->tx_errors = atomic_long_read(&vport->err_stats.tx_errors);
273 stats->rx_errors = vport->err_stats.rx_errors; 274 stats->tx_dropped = atomic_long_read(&vport->err_stats.tx_dropped);
274 stats->tx_errors = vport->err_stats.tx_errors; 275 stats->rx_dropped = atomic_long_read(&vport->err_stats.rx_dropped);
275 stats->tx_dropped = vport->err_stats.tx_dropped;
276 stats->rx_dropped = vport->err_stats.rx_dropped;
277
278 spin_unlock_bh(&vport->stats_lock);
279 276
280 for_each_possible_cpu(i) { 277 for_each_possible_cpu(i) {
281 const struct pcpu_sw_netstats *percpu_stats; 278 const struct pcpu_sw_netstats *percpu_stats;
@@ -438,9 +435,11 @@ u32 ovs_vport_find_upcall_portid(const struct vport *p, struct sk_buff *skb)
438 * skb->data should point to the Ethernet header. 435 * skb->data should point to the Ethernet header.
439 */ 436 */
440void ovs_vport_receive(struct vport *vport, struct sk_buff *skb, 437void ovs_vport_receive(struct vport *vport, struct sk_buff *skb,
441 struct ovs_key_ipv4_tunnel *tun_key) 438 struct ovs_tunnel_info *tun_info)
442{ 439{
443 struct pcpu_sw_netstats *stats; 440 struct pcpu_sw_netstats *stats;
441 struct sw_flow_key key;
442 int error;
444 443
445 stats = this_cpu_ptr(vport->percpu_stats); 444 stats = this_cpu_ptr(vport->percpu_stats);
446 u64_stats_update_begin(&stats->syncp); 445 u64_stats_update_begin(&stats->syncp);
@@ -448,8 +447,15 @@ void ovs_vport_receive(struct vport *vport, struct sk_buff *skb,
448 stats->rx_bytes += skb->len; 447 stats->rx_bytes += skb->len;
449 u64_stats_update_end(&stats->syncp); 448 u64_stats_update_end(&stats->syncp);
450 449
451 OVS_CB(skb)->tun_key = tun_key; 450 OVS_CB(skb)->input_vport = vport;
452 ovs_dp_process_received_packet(vport, skb); 451 OVS_CB(skb)->egress_tun_info = NULL;
452 /* Extract flow from 'skb' into 'key'. */
453 error = ovs_flow_key_extract(tun_info, skb, &key);
454 if (unlikely(error)) {
455 kfree_skb(skb);
456 return;
457 }
458 ovs_dp_process_packet(skb, &key);
453} 459}
454 460
455/** 461/**
@@ -495,27 +501,24 @@ int ovs_vport_send(struct vport *vport, struct sk_buff *skb)
495static void ovs_vport_record_error(struct vport *vport, 501static void ovs_vport_record_error(struct vport *vport,
496 enum vport_err_type err_type) 502 enum vport_err_type err_type)
497{ 503{
498 spin_lock(&vport->stats_lock);
499
500 switch (err_type) { 504 switch (err_type) {
501 case VPORT_E_RX_DROPPED: 505 case VPORT_E_RX_DROPPED:
502 vport->err_stats.rx_dropped++; 506 atomic_long_inc(&vport->err_stats.rx_dropped);
503 break; 507 break;
504 508
505 case VPORT_E_RX_ERROR: 509 case VPORT_E_RX_ERROR:
506 vport->err_stats.rx_errors++; 510 atomic_long_inc(&vport->err_stats.rx_errors);
507 break; 511 break;
508 512
509 case VPORT_E_TX_DROPPED: 513 case VPORT_E_TX_DROPPED:
510 vport->err_stats.tx_dropped++; 514 atomic_long_inc(&vport->err_stats.tx_dropped);
511 break; 515 break;
512 516
513 case VPORT_E_TX_ERROR: 517 case VPORT_E_TX_ERROR:
514 vport->err_stats.tx_errors++; 518 atomic_long_inc(&vport->err_stats.tx_errors);
515 break; 519 break;
516 } 520 }
517 521
518 spin_unlock(&vport->stats_lock);
519} 522}
520 523
521static void free_vport_rcu(struct rcu_head *rcu) 524static void free_vport_rcu(struct rcu_head *rcu)
diff --git a/net/openvswitch/vport.h b/net/openvswitch/vport.h
index 35f89d84b45e..8942125de3a6 100644
--- a/net/openvswitch/vport.h
+++ b/net/openvswitch/vport.h
@@ -35,7 +35,6 @@ struct vport_parms;
35 35
36/* The following definitions are for users of the vport subsystem: */ 36/* The following definitions are for users of the vport subsystem: */
37 37
38/* The following definitions are for users of the vport subsytem: */
39struct vport_net { 38struct vport_net {
40 struct vport __rcu *gre_vport; 39 struct vport __rcu *gre_vport;
41}; 40};
@@ -62,10 +61,10 @@ int ovs_vport_send(struct vport *, struct sk_buff *);
62/* The following definitions are for implementers of vport devices: */ 61/* The following definitions are for implementers of vport devices: */
63 62
64struct vport_err_stats { 63struct vport_err_stats {
65 u64 rx_dropped; 64 atomic_long_t rx_dropped;
66 u64 rx_errors; 65 atomic_long_t rx_errors;
67 u64 tx_dropped; 66 atomic_long_t tx_dropped;
68 u64 tx_errors; 67 atomic_long_t tx_errors;
69}; 68};
70/** 69/**
71 * struct vport_portids - array of netlink portids of a vport. 70 * struct vport_portids - array of netlink portids of a vport.
@@ -93,7 +92,6 @@ struct vport_portids {
93 * @dp_hash_node: Element in @datapath->ports hash table in datapath.c. 92 * @dp_hash_node: Element in @datapath->ports hash table in datapath.c.
94 * @ops: Class structure. 93 * @ops: Class structure.
95 * @percpu_stats: Points to per-CPU statistics used and maintained by vport 94 * @percpu_stats: Points to per-CPU statistics used and maintained by vport
96 * @stats_lock: Protects @err_stats;
97 * @err_stats: Points to error statistics used and maintained by vport 95 * @err_stats: Points to error statistics used and maintained by vport
98 */ 96 */
99struct vport { 97struct vport {
@@ -108,7 +106,6 @@ struct vport {
108 106
109 struct pcpu_sw_netstats __percpu *percpu_stats; 107 struct pcpu_sw_netstats __percpu *percpu_stats;
110 108
111 spinlock_t stats_lock;
112 struct vport_err_stats err_stats; 109 struct vport_err_stats err_stats;
113}; 110};
114 111
@@ -210,7 +207,7 @@ static inline struct vport *vport_from_priv(void *priv)
210} 207}
211 208
212void ovs_vport_receive(struct vport *, struct sk_buff *, 209void ovs_vport_receive(struct vport *, struct sk_buff *,
213 struct ovs_key_ipv4_tunnel *); 210 struct ovs_tunnel_info *);
214 211
215/* List of statically compiled vport implementations. Don't forget to also 212/* List of statically compiled vport implementations. Don't forget to also
216 * add yours to the list at the top of vport.c. */ 213 * add yours to the list at the top of vport.c. */
@@ -218,6 +215,7 @@ extern const struct vport_ops ovs_netdev_vport_ops;
218extern const struct vport_ops ovs_internal_vport_ops; 215extern const struct vport_ops ovs_internal_vport_ops;
219extern const struct vport_ops ovs_gre_vport_ops; 216extern const struct vport_ops ovs_gre_vport_ops;
220extern const struct vport_ops ovs_vxlan_vport_ops; 217extern const struct vport_ops ovs_vxlan_vport_ops;
218extern const struct vport_ops ovs_geneve_vport_ops;
221 219
222static inline void ovs_skb_postpush_rcsum(struct sk_buff *skb, 220static inline void ovs_skb_postpush_rcsum(struct sk_buff *skb,
223 const void *start, unsigned int len) 221 const void *start, unsigned int len)
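The vport changes above drop stats_lock and turn the err_stats counters into atomic_long_t, so error paths bump a counter without locking and ovs_vport_get_stats() simply reads each value; minor skew between fields is acceptable for statistics, which is why the lock could go away. A rough C11 equivalent, independent of the vport code and not tied to the kernel's atomic_long_t API:

#include <stdatomic.h>
#include <stdio.h>

struct demo_err_stats {
        atomic_long rx_dropped;
        atomic_long rx_errors;
        atomic_long tx_dropped;
        atomic_long tx_errors;
};

/* Error path: one relaxed increment, no lock needed. */
static void record_tx_error(struct demo_err_stats *s)
{
        atomic_fetch_add_explicit(&s->tx_errors, 1, memory_order_relaxed);
}

/* Readers just load each counter independently. */
static void print_stats(struct demo_err_stats *s)
{
        printf("tx_errors=%ld rx_dropped=%ld\n",
               atomic_load_explicit(&s->tx_errors, memory_order_relaxed),
               atomic_load_explicit(&s->rx_dropped, memory_order_relaxed));
}

int main(void)
{
        struct demo_err_stats stats = { 0 };

        record_tx_error(&stats);
        print_stats(&stats);
        return 0;
}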
diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index 93896d2092f6..87d20f48ff06 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -240,11 +240,9 @@ static void __fanout_link(struct sock *sk, struct packet_sock *po);
240static int packet_direct_xmit(struct sk_buff *skb) 240static int packet_direct_xmit(struct sk_buff *skb)
241{ 241{
242 struct net_device *dev = skb->dev; 242 struct net_device *dev = skb->dev;
243 const struct net_device_ops *ops = dev->netdev_ops;
244 netdev_features_t features; 243 netdev_features_t features;
245 struct netdev_queue *txq; 244 struct netdev_queue *txq;
246 int ret = NETDEV_TX_BUSY; 245 int ret = NETDEV_TX_BUSY;
247 u16 queue_map;
248 246
249 if (unlikely(!netif_running(dev) || 247 if (unlikely(!netif_running(dev) ||
250 !netif_carrier_ok(dev))) 248 !netif_carrier_ok(dev)))
@@ -255,17 +253,13 @@ static int packet_direct_xmit(struct sk_buff *skb)
255 __skb_linearize(skb)) 253 __skb_linearize(skb))
256 goto drop; 254 goto drop;
257 255
258 queue_map = skb_get_queue_mapping(skb); 256 txq = skb_get_tx_queue(dev, skb);
259 txq = netdev_get_tx_queue(dev, queue_map);
260 257
261 local_bh_disable(); 258 local_bh_disable();
262 259
263 HARD_TX_LOCK(dev, txq, smp_processor_id()); 260 HARD_TX_LOCK(dev, txq, smp_processor_id());
264 if (!netif_xmit_frozen_or_drv_stopped(txq)) { 261 if (!netif_xmit_frozen_or_drv_stopped(txq))
265 ret = ops->ndo_start_xmit(skb, dev); 262 ret = netdev_start_xmit(skb, dev, txq, false);
266 if (ret == NETDEV_TX_OK)
267 txq_trans_update(txq);
268 }
269 HARD_TX_UNLOCK(dev, txq); 263 HARD_TX_UNLOCK(dev, txq);
270 264
271 local_bh_enable(); 265 local_bh_enable();
diff --git a/net/phonet/pn_dev.c b/net/phonet/pn_dev.c
index 56a6146ac94b..a58680016472 100644
--- a/net/phonet/pn_dev.c
+++ b/net/phonet/pn_dev.c
@@ -36,7 +36,7 @@
36 36
37struct phonet_routes { 37struct phonet_routes {
38 struct mutex lock; 38 struct mutex lock;
39 struct net_device *table[64]; 39 struct net_device __rcu *table[64];
40}; 40};
41 41
42struct phonet_net { 42struct phonet_net {
@@ -275,7 +275,7 @@ static void phonet_route_autodel(struct net_device *dev)
275 bitmap_zero(deleted, 64); 275 bitmap_zero(deleted, 64);
276 mutex_lock(&pnn->routes.lock); 276 mutex_lock(&pnn->routes.lock);
277 for (i = 0; i < 64; i++) 277 for (i = 0; i < 64; i++)
278 if (dev == pnn->routes.table[i]) { 278 if (rcu_access_pointer(pnn->routes.table[i]) == dev) {
279 RCU_INIT_POINTER(pnn->routes.table[i], NULL); 279 RCU_INIT_POINTER(pnn->routes.table[i], NULL);
280 set_bit(i, deleted); 280 set_bit(i, deleted);
281 } 281 }
@@ -388,7 +388,7 @@ int phonet_route_del(struct net_device *dev, u8 daddr)
388 388
389 daddr = daddr >> 2; 389 daddr = daddr >> 2;
390 mutex_lock(&routes->lock); 390 mutex_lock(&routes->lock);
391 if (dev == routes->table[daddr]) 391 if (rcu_access_pointer(routes->table[daddr]) == dev)
392 RCU_INIT_POINTER(routes->table[daddr], NULL); 392 RCU_INIT_POINTER(routes->table[daddr], NULL);
393 else 393 else
394 dev = NULL; 394 dev = NULL;
diff --git a/net/rds/send.c b/net/rds/send.c
index 23718160d71e..0a64541020b0 100644
--- a/net/rds/send.c
+++ b/net/rds/send.c
@@ -593,8 +593,11 @@ static void rds_send_remove_from_sock(struct list_head *messages, int status)
593 sock_put(rds_rs_to_sk(rs)); 593 sock_put(rds_rs_to_sk(rs));
594 } 594 }
595 rs = rm->m_rs; 595 rs = rm->m_rs;
596 sock_hold(rds_rs_to_sk(rs)); 596 if (rs)
597 sock_hold(rds_rs_to_sk(rs));
597 } 598 }
599 if (!rs)
600 goto unlock_and_drop;
598 spin_lock(&rs->rs_lock); 601 spin_lock(&rs->rs_lock);
599 602
600 if (test_and_clear_bit(RDS_MSG_ON_SOCK, &rm->m_flags)) { 603 if (test_and_clear_bit(RDS_MSG_ON_SOCK, &rm->m_flags)) {
@@ -638,9 +641,6 @@ unlock_and_drop:
638 * queue. This means that in the TCP case, the message may not have been 641 * queue. This means that in the TCP case, the message may not have been
639 * assigned the m_ack_seq yet - but that's fine as long as tcp_is_acked 642 * assigned the m_ack_seq yet - but that's fine as long as tcp_is_acked
640 * checks the RDS_MSG_HAS_ACK_SEQ bit. 643 * checks the RDS_MSG_HAS_ACK_SEQ bit.
641 *
642 * XXX It's not clear to me how this is safely serialized with socket
643 * destruction. Maybe it should bail if it sees SOCK_DEAD.
644 */ 644 */
645void rds_send_drop_acked(struct rds_connection *conn, u64 ack, 645void rds_send_drop_acked(struct rds_connection *conn, u64 ack,
646 is_acked_func is_acked) 646 is_acked_func is_acked)
@@ -711,6 +711,9 @@ void rds_send_drop_to(struct rds_sock *rs, struct sockaddr_in *dest)
711 */ 711 */
712 if (!test_and_clear_bit(RDS_MSG_ON_CONN, &rm->m_flags)) { 712 if (!test_and_clear_bit(RDS_MSG_ON_CONN, &rm->m_flags)) {
713 spin_unlock_irqrestore(&conn->c_lock, flags); 713 spin_unlock_irqrestore(&conn->c_lock, flags);
714 spin_lock_irqsave(&rm->m_rs_lock, flags);
715 rm->m_rs = NULL;
716 spin_unlock_irqrestore(&rm->m_rs_lock, flags);
714 continue; 717 continue;
715 } 718 }
716 list_del_init(&rm->m_conn_item); 719 list_del_init(&rm->m_conn_item);
diff --git a/net/rds/tcp_connect.c b/net/rds/tcp_connect.c
index a65ee78db0c5..f9f564a6c960 100644
--- a/net/rds/tcp_connect.c
+++ b/net/rds/tcp_connect.c
@@ -106,11 +106,14 @@ int rds_tcp_conn_connect(struct rds_connection *conn)
106 rds_tcp_set_callbacks(sock, conn); 106 rds_tcp_set_callbacks(sock, conn);
107 ret = sock->ops->connect(sock, (struct sockaddr *)&dest, sizeof(dest), 107 ret = sock->ops->connect(sock, (struct sockaddr *)&dest, sizeof(dest),
108 O_NONBLOCK); 108 O_NONBLOCK);
109 sock = NULL;
110 109
111 rdsdebug("connect to address %pI4 returned %d\n", &conn->c_faddr, ret); 110 rdsdebug("connect to address %pI4 returned %d\n", &conn->c_faddr, ret);
112 if (ret == -EINPROGRESS) 111 if (ret == -EINPROGRESS)
113 ret = 0; 112 ret = 0;
113 if (ret == 0)
114 sock = NULL;
115 else
116 rds_tcp_restore_callbacks(sock, conn->c_transport_data);
114 117
115out: 118out:
116 if (sock) 119 if (sock)
diff --git a/net/rds/threads.c b/net/rds/threads.c
index 65eaefcab241..dc2402e871fd 100644
--- a/net/rds/threads.c
+++ b/net/rds/threads.c
@@ -78,8 +78,7 @@ void rds_connect_complete(struct rds_connection *conn)
78 "current state is %d\n", 78 "current state is %d\n",
79 __func__, 79 __func__,
80 atomic_read(&conn->c_state)); 80 atomic_read(&conn->c_state));
81 atomic_set(&conn->c_state, RDS_CONN_ERROR); 81 rds_conn_drop(conn);
82 queue_work(rds_wq, &conn->c_down_w);
83 return; 82 return;
84 } 83 }
85 84
diff --git a/net/rose/rose_link.c b/net/rose/rose_link.c
index bc5514211b0c..e873d7d9f857 100644
--- a/net/rose/rose_link.c
+++ b/net/rose/rose_link.c
@@ -160,7 +160,8 @@ void rose_link_rx_restart(struct sk_buff *skb, struct rose_neigh *neigh, unsigne
160 break; 160 break;
161 161
162 case ROSE_DIAGNOSTIC: 162 case ROSE_DIAGNOSTIC:
163 printk(KERN_WARNING "ROSE: received diagnostic #%d - %02X %02X %02X\n", skb->data[3], skb->data[4], skb->data[5], skb->data[6]); 163 pr_warn("ROSE: received diagnostic #%d - %3ph\n", skb->data[3],
164 skb->data + 4);
164 break; 165 break;
165 166
166 default: 167 default:
diff --git a/net/rxrpc/ar-error.c b/net/rxrpc/ar-error.c
index db57458c824c..74c0fcd36838 100644
--- a/net/rxrpc/ar-error.c
+++ b/net/rxrpc/ar-error.c
@@ -37,7 +37,7 @@ void rxrpc_UDP_error_report(struct sock *sk)
37 37
38 _enter("%p{%d}", sk, local->debug_id); 38 _enter("%p{%d}", sk, local->debug_id);
39 39
40 skb = skb_dequeue(&sk->sk_error_queue); 40 skb = sock_dequeue_err_skb(sk);
41 if (!skb) { 41 if (!skb) {
42 _leave("UDP socket errqueue empty"); 42 _leave("UDP socket errqueue empty");
43 return; 43 return;
@@ -111,18 +111,6 @@ void rxrpc_UDP_error_report(struct sock *sk)
111 skb_queue_tail(&trans->error_queue, skb); 111 skb_queue_tail(&trans->error_queue, skb);
112 rxrpc_queue_work(&trans->error_handler); 112 rxrpc_queue_work(&trans->error_handler);
113 113
114 /* reset and regenerate socket error */
115 spin_lock_bh(&sk->sk_error_queue.lock);
116 sk->sk_err = 0;
117 skb = skb_peek(&sk->sk_error_queue);
118 if (skb) {
119 sk->sk_err = SKB_EXT_ERR(skb)->ee.ee_errno;
120 spin_unlock_bh(&sk->sk_error_queue.lock);
121 sk->sk_error_report(sk);
122 } else {
123 spin_unlock_bh(&sk->sk_error_queue.lock);
124 }
125
126 _leave(""); 114 _leave("");
127} 115}
128 116
diff --git a/net/rxrpc/ar-input.c b/net/rxrpc/ar-input.c
index 63b21e580de9..481f89f93789 100644
--- a/net/rxrpc/ar-input.c
+++ b/net/rxrpc/ar-input.c
@@ -45,7 +45,7 @@ int rxrpc_queue_rcv_skb(struct rxrpc_call *call, struct sk_buff *skb,
45 struct rxrpc_skb_priv *sp; 45 struct rxrpc_skb_priv *sp;
46 struct rxrpc_sock *rx = call->socket; 46 struct rxrpc_sock *rx = call->socket;
47 struct sock *sk; 47 struct sock *sk;
48 int skb_len, ret; 48 int ret;
49 49
50 _enter(",,%d,%d", force, terminal); 50 _enter(",,%d,%d", force, terminal);
51 51
@@ -101,13 +101,6 @@ int rxrpc_queue_rcv_skb(struct rxrpc_call *call, struct sk_buff *skb,
101 rx->interceptor(sk, call->user_call_ID, skb); 101 rx->interceptor(sk, call->user_call_ID, skb);
102 spin_unlock_bh(&sk->sk_receive_queue.lock); 102 spin_unlock_bh(&sk->sk_receive_queue.lock);
103 } else { 103 } else {
104
105 /* Cache the SKB length before we tack it onto the
106 * receive queue. Once it is added it no longer
107 * belongs to us and may be freed by other threads of
108 * control pulling packets from the queue */
109 skb_len = skb->len;
110
111 _net("post skb %p", skb); 104 _net("post skb %p", skb);
112 __skb_queue_tail(&sk->sk_receive_queue, skb); 105 __skb_queue_tail(&sk->sk_receive_queue, skb);
113 spin_unlock_bh(&sk->sk_receive_queue.lock); 106 spin_unlock_bh(&sk->sk_receive_queue.lock);
diff --git a/net/sched/act_api.c b/net/sched/act_api.c
index 648778aef1a2..3d43e4979f27 100644
--- a/net/sched/act_api.c
+++ b/net/sched/act_api.c
@@ -252,7 +252,8 @@ int tcf_hash_create(u32 index, struct nlattr *est, struct tc_action *a,
252 p->tcfc_tm.install = jiffies; 252 p->tcfc_tm.install = jiffies;
253 p->tcfc_tm.lastuse = jiffies; 253 p->tcfc_tm.lastuse = jiffies;
254 if (est) { 254 if (est) {
255 int err = gen_new_estimator(&p->tcfc_bstats, &p->tcfc_rate_est, 255 int err = gen_new_estimator(&p->tcfc_bstats, NULL,
256 &p->tcfc_rate_est,
256 &p->tcfc_lock, est); 257 &p->tcfc_lock, est);
257 if (err) { 258 if (err) {
258 kfree(p); 259 kfree(p);
@@ -619,10 +620,12 @@ int tcf_action_copy_stats(struct sk_buff *skb, struct tc_action *a,
619 if (err < 0) 620 if (err < 0)
620 goto errout; 621 goto errout;
621 622
622 if (gnet_stats_copy_basic(&d, &p->tcfc_bstats) < 0 || 623 if (gnet_stats_copy_basic(&d, NULL, &p->tcfc_bstats) < 0 ||
623 gnet_stats_copy_rate_est(&d, &p->tcfc_bstats, 624 gnet_stats_copy_rate_est(&d, &p->tcfc_bstats,
624 &p->tcfc_rate_est) < 0 || 625 &p->tcfc_rate_est) < 0 ||
625 gnet_stats_copy_queue(&d, &p->tcfc_qstats) < 0) 626 gnet_stats_copy_queue(&d, NULL,
627 &p->tcfc_qstats,
628 p->tcfc_qstats.qlen) < 0)
626 goto errout; 629 goto errout;
627 630
628 if (gnet_stats_finish_copy(&d) < 0) 631 if (gnet_stats_finish_copy(&d) < 0)
diff --git a/net/sched/act_police.c b/net/sched/act_police.c
index 0566e4606a4a..69791ca77a05 100644
--- a/net/sched/act_police.c
+++ b/net/sched/act_police.c
@@ -178,7 +178,7 @@ override:
178 178
179 spin_lock_bh(&police->tcf_lock); 179 spin_lock_bh(&police->tcf_lock);
180 if (est) { 180 if (est) {
181 err = gen_replace_estimator(&police->tcf_bstats, 181 err = gen_replace_estimator(&police->tcf_bstats, NULL,
182 &police->tcf_rate_est, 182 &police->tcf_rate_est,
183 &police->tcf_lock, est); 183 &police->tcf_lock, est);
184 if (err) 184 if (err)
@@ -231,7 +231,7 @@ override:
231 if (ret != ACT_P_CREATED) 231 if (ret != ACT_P_CREATED)
232 return ret; 232 return ret;
233 233
234 police->tcfp_t_c = ktime_to_ns(ktime_get()); 234 police->tcfp_t_c = ktime_get_ns();
235 police->tcf_index = parm->index ? parm->index : 235 police->tcf_index = parm->index ? parm->index :
236 tcf_hash_new_index(hinfo); 236 tcf_hash_new_index(hinfo);
237 h = tcf_hash(police->tcf_index, POL_TAB_MASK); 237 h = tcf_hash(police->tcf_index, POL_TAB_MASK);
@@ -279,7 +279,7 @@ static int tcf_act_police(struct sk_buff *skb, const struct tc_action *a,
279 return police->tcfp_result; 279 return police->tcfp_result;
280 } 280 }
281 281
282 now = ktime_to_ns(ktime_get()); 282 now = ktime_get_ns();
283 toks = min_t(s64, now - police->tcfp_t_c, 283 toks = min_t(s64, now - police->tcfp_t_c,
284 police->tcfp_burst); 284 police->tcfp_burst);
285 if (police->peak_present) { 285 if (police->peak_present) {
diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index c28b0d327b12..aad6a679fb13 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -117,7 +117,6 @@ static int tc_ctl_tfilter(struct sk_buff *skb, struct nlmsghdr *n)
117{ 117{
118 struct net *net = sock_net(skb->sk); 118 struct net *net = sock_net(skb->sk);
119 struct nlattr *tca[TCA_MAX + 1]; 119 struct nlattr *tca[TCA_MAX + 1];
120 spinlock_t *root_lock;
121 struct tcmsg *t; 120 struct tcmsg *t;
122 u32 protocol; 121 u32 protocol;
123 u32 prio; 122 u32 prio;
@@ -125,7 +124,8 @@ static int tc_ctl_tfilter(struct sk_buff *skb, struct nlmsghdr *n)
125 u32 parent; 124 u32 parent;
126 struct net_device *dev; 125 struct net_device *dev;
127 struct Qdisc *q; 126 struct Qdisc *q;
128 struct tcf_proto **back, **chain; 127 struct tcf_proto __rcu **back;
128 struct tcf_proto __rcu **chain;
129 struct tcf_proto *tp; 129 struct tcf_proto *tp;
130 const struct tcf_proto_ops *tp_ops; 130 const struct tcf_proto_ops *tp_ops;
131 const struct Qdisc_class_ops *cops; 131 const struct Qdisc_class_ops *cops;
@@ -197,7 +197,9 @@ replay:
197 goto errout; 197 goto errout;
198 198
199 /* Check the chain for existence of proto-tcf with this priority */ 199 /* Check the chain for existence of proto-tcf with this priority */
200 for (back = chain; (tp = *back) != NULL; back = &tp->next) { 200 for (back = chain;
201 (tp = rtnl_dereference(*back)) != NULL;
202 back = &tp->next) {
201 if (tp->prio >= prio) { 203 if (tp->prio >= prio) {
202 if (tp->prio == prio) { 204 if (tp->prio == prio) {
203 if (!nprio || 205 if (!nprio ||
@@ -209,8 +211,6 @@ replay:
209 } 211 }
210 } 212 }
211 213
212 root_lock = qdisc_root_sleeping_lock(q);
213
214 if (tp == NULL) { 214 if (tp == NULL) {
215 /* Proto-tcf does not exist, create new one */ 215 /* Proto-tcf does not exist, create new one */
216 216
@@ -259,7 +259,8 @@ replay:
259 } 259 }
260 tp->ops = tp_ops; 260 tp->ops = tp_ops;
261 tp->protocol = protocol; 261 tp->protocol = protocol;
262 tp->prio = nprio ? : TC_H_MAJ(tcf_auto_prio(*back)); 262 tp->prio = nprio ? :
263 TC_H_MAJ(tcf_auto_prio(rtnl_dereference(*back)));
263 tp->q = q; 264 tp->q = q;
264 tp->classify = tp_ops->classify; 265 tp->classify = tp_ops->classify;
265 tp->classid = parent; 266 tp->classid = parent;
@@ -280,9 +281,9 @@ replay:
280 281
281 if (fh == 0) { 282 if (fh == 0) {
282 if (n->nlmsg_type == RTM_DELTFILTER && t->tcm_handle == 0) { 283 if (n->nlmsg_type == RTM_DELTFILTER && t->tcm_handle == 0) {
283 spin_lock_bh(root_lock); 284 struct tcf_proto *next = rtnl_dereference(tp->next);
284 *back = tp->next; 285
285 spin_unlock_bh(root_lock); 286 RCU_INIT_POINTER(*back, next);
286 287
287 tfilter_notify(net, skb, n, tp, fh, RTM_DELTFILTER); 288 tfilter_notify(net, skb, n, tp, fh, RTM_DELTFILTER);
288 tcf_destroy(tp); 289 tcf_destroy(tp);
@@ -322,10 +323,8 @@ replay:
322 n->nlmsg_flags & NLM_F_CREATE ? TCA_ACT_NOREPLACE : TCA_ACT_REPLACE); 323 n->nlmsg_flags & NLM_F_CREATE ? TCA_ACT_NOREPLACE : TCA_ACT_REPLACE);
323 if (err == 0) { 324 if (err == 0) {
324 if (tp_created) { 325 if (tp_created) {
325 spin_lock_bh(root_lock); 326 RCU_INIT_POINTER(tp->next, rtnl_dereference(*back));
326 tp->next = *back; 327 rcu_assign_pointer(*back, tp);
327 *back = tp;
328 spin_unlock_bh(root_lock);
329 } 328 }
330 tfilter_notify(net, skb, n, tp, fh, RTM_NEWTFILTER); 329 tfilter_notify(net, skb, n, tp, fh, RTM_NEWTFILTER);
331 } else { 330 } else {
@@ -420,7 +419,7 @@ static int tc_dump_tfilter(struct sk_buff *skb, struct netlink_callback *cb)
420 int s_t; 419 int s_t;
421 struct net_device *dev; 420 struct net_device *dev;
422 struct Qdisc *q; 421 struct Qdisc *q;
423 struct tcf_proto *tp, **chain; 422 struct tcf_proto *tp, __rcu **chain;
424 struct tcmsg *tcm = nlmsg_data(cb->nlh); 423 struct tcmsg *tcm = nlmsg_data(cb->nlh);
425 unsigned long cl = 0; 424 unsigned long cl = 0;
426 const struct Qdisc_class_ops *cops; 425 const struct Qdisc_class_ops *cops;
@@ -454,7 +453,8 @@ static int tc_dump_tfilter(struct sk_buff *skb, struct netlink_callback *cb)
454 453
455 s_t = cb->args[0]; 454 s_t = cb->args[0];
456 455
457 for (tp = *chain, t = 0; tp; tp = tp->next, t++) { 456 for (tp = rtnl_dereference(*chain), t = 0;
457 tp; tp = rtnl_dereference(tp->next), t++) {
458 if (t < s_t) 458 if (t < s_t)
459 continue; 459 continue;
460 if (TC_H_MAJ(tcm->tcm_info) && 460 if (TC_H_MAJ(tcm->tcm_info) &&
@@ -496,7 +496,7 @@ out:
496 return skb->len; 496 return skb->len;
497} 497}
498 498
499void tcf_exts_destroy(struct tcf_proto *tp, struct tcf_exts *exts) 499void tcf_exts_destroy(struct tcf_exts *exts)
500{ 500{
501#ifdef CONFIG_NET_CLS_ACT 501#ifdef CONFIG_NET_CLS_ACT
502 tcf_action_destroy(&exts->actions, TCA_ACT_UNBIND); 502 tcf_action_destroy(&exts->actions, TCA_ACT_UNBIND);
@@ -549,6 +549,7 @@ void tcf_exts_change(struct tcf_proto *tp, struct tcf_exts *dst,
549 tcf_tree_lock(tp); 549 tcf_tree_lock(tp);
550 list_splice_init(&dst->actions, &tmp); 550 list_splice_init(&dst->actions, &tmp);
551 list_splice(&src->actions, &dst->actions); 551 list_splice(&src->actions, &dst->actions);
552 dst->type = src->type;
552 tcf_tree_unlock(tp); 553 tcf_tree_unlock(tp);
553 tcf_action_destroy(&tmp, TCA_ACT_UNBIND); 554 tcf_action_destroy(&tmp, TCA_ACT_UNBIND);
554#endif 555#endif
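
Editorial note on the cls_api.c hunks above: the qdisc root lock is gone from the filter-chain writers. Chain updates now happen under RTNL only, walking the chain with rtnl_dereference() and publishing changes with rcu_assign_pointer()/RCU_INIT_POINTER(), so the classify fast path can traverse the chain without any lock. Below is a minimal, hedged user-space sketch of that writer-side pattern; the rcu_assign_pointer()/rtnl_dereference() stand-ins (a release store and a plain load) are simplifying assumptions, not the kernel implementations, and the sketch is not part of the patch.

/* Illustrative sketch only: publish a new filter into an __rcu-linked
 * chain the way tc_ctl_tfilter now does. Stand-in macros approximate the
 * kernel primitives; real code relies on RTNL for writers, RCU for readers.
 */
#include <stddef.h>

/* Stand-ins (assumption): release store for publication; plain load because
 * the writer already holds the update-side lock (RTNL).
 */
#define rcu_assign_pointer(p, v) __atomic_store_n(&(p), (v), __ATOMIC_RELEASE)
#define rtnl_dereference(p)      (p)

struct tcf_proto_sketch {
	unsigned int prio;
	struct tcf_proto_sketch *next;	/* __rcu in the kernel */
};

/* Insert tp in priority order; readers always see either the old chain or
 * the fully initialised new one, never a half-linked node.
 */
static void chain_insert(struct tcf_proto_sketch **chain,
			 struct tcf_proto_sketch *tp)
{
	struct tcf_proto_sketch **back = chain;
	struct tcf_proto_sketch *cur;

	while ((cur = rtnl_dereference(*back)) != NULL && cur->prio < tp->prio)
		back = &cur->next;

	tp->next = cur;			/* not yet visible to readers */
	rcu_assign_pointer(*back, tp);	/* single pointer swing publishes it */
}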
diff --git a/net/sched/cls_basic.c b/net/sched/cls_basic.c
index 0ae1813e3e90..cd61280941e5 100644
--- a/net/sched/cls_basic.c
+++ b/net/sched/cls_basic.c
@@ -24,6 +24,7 @@
24struct basic_head { 24struct basic_head {
25 u32 hgenerator; 25 u32 hgenerator;
26 struct list_head flist; 26 struct list_head flist;
27 struct rcu_head rcu;
27}; 28};
28 29
29struct basic_filter { 30struct basic_filter {
@@ -31,17 +32,19 @@ struct basic_filter {
31 struct tcf_exts exts; 32 struct tcf_exts exts;
32 struct tcf_ematch_tree ematches; 33 struct tcf_ematch_tree ematches;
33 struct tcf_result res; 34 struct tcf_result res;
35 struct tcf_proto *tp;
34 struct list_head link; 36 struct list_head link;
37 struct rcu_head rcu;
35}; 38};
36 39
37static int basic_classify(struct sk_buff *skb, const struct tcf_proto *tp, 40static int basic_classify(struct sk_buff *skb, const struct tcf_proto *tp,
38 struct tcf_result *res) 41 struct tcf_result *res)
39{ 42{
40 int r; 43 int r;
41 struct basic_head *head = tp->root; 44 struct basic_head *head = rcu_dereference_bh(tp->root);
42 struct basic_filter *f; 45 struct basic_filter *f;
43 46
44 list_for_each_entry(f, &head->flist, link) { 47 list_for_each_entry_rcu(f, &head->flist, link) {
45 if (!tcf_em_tree_match(skb, &f->ematches, NULL)) 48 if (!tcf_em_tree_match(skb, &f->ematches, NULL))
46 continue; 49 continue;
47 *res = f->res; 50 *res = f->res;
@@ -56,7 +59,7 @@ static int basic_classify(struct sk_buff *skb, const struct tcf_proto *tp,
56static unsigned long basic_get(struct tcf_proto *tp, u32 handle) 59static unsigned long basic_get(struct tcf_proto *tp, u32 handle)
57{ 60{
58 unsigned long l = 0UL; 61 unsigned long l = 0UL;
59 struct basic_head *head = tp->root; 62 struct basic_head *head = rtnl_dereference(tp->root);
60 struct basic_filter *f; 63 struct basic_filter *f;
61 64
62 if (head == NULL) 65 if (head == NULL)
@@ -81,41 +84,43 @@ static int basic_init(struct tcf_proto *tp)
81 if (head == NULL) 84 if (head == NULL)
82 return -ENOBUFS; 85 return -ENOBUFS;
83 INIT_LIST_HEAD(&head->flist); 86 INIT_LIST_HEAD(&head->flist);
84 tp->root = head; 87 rcu_assign_pointer(tp->root, head);
85 return 0; 88 return 0;
86} 89}
87 90
88static void basic_delete_filter(struct tcf_proto *tp, struct basic_filter *f) 91static void basic_delete_filter(struct rcu_head *head)
89{ 92{
90 tcf_unbind_filter(tp, &f->res); 93 struct basic_filter *f = container_of(head, struct basic_filter, rcu);
91 tcf_exts_destroy(tp, &f->exts); 94
92 tcf_em_tree_destroy(tp, &f->ematches); 95 tcf_exts_destroy(&f->exts);
96 tcf_em_tree_destroy(&f->ematches);
93 kfree(f); 97 kfree(f);
94} 98}
95 99
96static void basic_destroy(struct tcf_proto *tp) 100static void basic_destroy(struct tcf_proto *tp)
97{ 101{
98 struct basic_head *head = tp->root; 102 struct basic_head *head = rtnl_dereference(tp->root);
99 struct basic_filter *f, *n; 103 struct basic_filter *f, *n;
100 104
101 list_for_each_entry_safe(f, n, &head->flist, link) { 105 list_for_each_entry_safe(f, n, &head->flist, link) {
102 list_del(&f->link); 106 list_del_rcu(&f->link);
103 basic_delete_filter(tp, f); 107 tcf_unbind_filter(tp, &f->res);
108 call_rcu(&f->rcu, basic_delete_filter);
104 } 109 }
105 kfree(head); 110 RCU_INIT_POINTER(tp->root, NULL);
111 kfree_rcu(head, rcu);
106} 112}
107 113
108static int basic_delete(struct tcf_proto *tp, unsigned long arg) 114static int basic_delete(struct tcf_proto *tp, unsigned long arg)
109{ 115{
110 struct basic_head *head = tp->root; 116 struct basic_head *head = rtnl_dereference(tp->root);
111 struct basic_filter *t, *f = (struct basic_filter *) arg; 117 struct basic_filter *t, *f = (struct basic_filter *) arg;
112 118
113 list_for_each_entry(t, &head->flist, link) 119 list_for_each_entry(t, &head->flist, link)
114 if (t == f) { 120 if (t == f) {
115 tcf_tree_lock(tp); 121 list_del_rcu(&t->link);
116 list_del(&t->link); 122 tcf_unbind_filter(tp, &t->res);
117 tcf_tree_unlock(tp); 123 call_rcu(&t->rcu, basic_delete_filter);
118 basic_delete_filter(tp, t);
119 return 0; 124 return 0;
120 } 125 }
121 126
@@ -152,10 +157,11 @@ static int basic_set_parms(struct net *net, struct tcf_proto *tp,
152 157
153 tcf_exts_change(tp, &f->exts, &e); 158 tcf_exts_change(tp, &f->exts, &e);
154 tcf_em_tree_change(tp, &f->ematches, &t); 159 tcf_em_tree_change(tp, &f->ematches, &t);
160 f->tp = tp;
155 161
156 return 0; 162 return 0;
157errout: 163errout:
158 tcf_exts_destroy(tp, &e); 164 tcf_exts_destroy(&e);
159 return err; 165 return err;
160} 166}
161 167
@@ -164,9 +170,10 @@ static int basic_change(struct net *net, struct sk_buff *in_skb,
164 struct nlattr **tca, unsigned long *arg, bool ovr) 170 struct nlattr **tca, unsigned long *arg, bool ovr)
165{ 171{
166 int err; 172 int err;
167 struct basic_head *head = tp->root; 173 struct basic_head *head = rtnl_dereference(tp->root);
168 struct nlattr *tb[TCA_BASIC_MAX + 1]; 174 struct nlattr *tb[TCA_BASIC_MAX + 1];
169 struct basic_filter *f = (struct basic_filter *) *arg; 175 struct basic_filter *fold = (struct basic_filter *) *arg;
176 struct basic_filter *fnew;
170 177
171 if (tca[TCA_OPTIONS] == NULL) 178 if (tca[TCA_OPTIONS] == NULL)
172 return -EINVAL; 179 return -EINVAL;
@@ -176,22 +183,23 @@ static int basic_change(struct net *net, struct sk_buff *in_skb,
176 if (err < 0) 183 if (err < 0)
177 return err; 184 return err;
178 185
179 if (f != NULL) { 186 if (fold != NULL) {
180 if (handle && f->handle != handle) 187 if (handle && fold->handle != handle)
181 return -EINVAL; 188 return -EINVAL;
182 return basic_set_parms(net, tp, f, base, tb, tca[TCA_RATE], ovr);
183 } 189 }
184 190
185 err = -ENOBUFS; 191 err = -ENOBUFS;
186 f = kzalloc(sizeof(*f), GFP_KERNEL); 192 fnew = kzalloc(sizeof(*fnew), GFP_KERNEL);
187 if (f == NULL) 193 if (fnew == NULL)
188 goto errout; 194 goto errout;
189 195
190 tcf_exts_init(&f->exts, TCA_BASIC_ACT, TCA_BASIC_POLICE); 196 tcf_exts_init(&fnew->exts, TCA_BASIC_ACT, TCA_BASIC_POLICE);
191 err = -EINVAL; 197 err = -EINVAL;
192 if (handle) 198 if (handle) {
193 f->handle = handle; 199 fnew->handle = handle;
194 else { 200 } else if (fold) {
201 fnew->handle = fold->handle;
202 } else {
195 unsigned int i = 0x80000000; 203 unsigned int i = 0x80000000;
196 do { 204 do {
197 if (++head->hgenerator == 0x7FFFFFFF) 205 if (++head->hgenerator == 0x7FFFFFFF)
@@ -203,29 +211,32 @@ static int basic_change(struct net *net, struct sk_buff *in_skb,
203 goto errout; 211 goto errout;
204 } 212 }
205 213
206 f->handle = head->hgenerator; 214 fnew->handle = head->hgenerator;
207 } 215 }
208 216
209 err = basic_set_parms(net, tp, f, base, tb, tca[TCA_RATE], ovr); 217 err = basic_set_parms(net, tp, fnew, base, tb, tca[TCA_RATE], ovr);
210 if (err < 0) 218 if (err < 0)
211 goto errout; 219 goto errout;
212 220
213 tcf_tree_lock(tp); 221 *arg = (unsigned long)fnew;
214 list_add(&f->link, &head->flist); 222
215 tcf_tree_unlock(tp); 223 if (fold) {
216 *arg = (unsigned long) f; 224 list_replace_rcu(&fold->link, &fnew->link);
225 tcf_unbind_filter(tp, &fold->res);
226 call_rcu(&fold->rcu, basic_delete_filter);
227 } else {
228 list_add_rcu(&fnew->link, &head->flist);
229 }
217 230
218 return 0; 231 return 0;
219errout: 232errout:
220 if (*arg == 0UL && f) 233 kfree(fnew);
221 kfree(f);
222
223 return err; 234 return err;
224} 235}
225 236
226static void basic_walk(struct tcf_proto *tp, struct tcf_walker *arg) 237static void basic_walk(struct tcf_proto *tp, struct tcf_walker *arg)
227{ 238{
228 struct basic_head *head = tp->root; 239 struct basic_head *head = rtnl_dereference(tp->root);
229 struct basic_filter *f; 240 struct basic_filter *f;
230 241
231 list_for_each_entry(f, &head->flist, link) { 242 list_for_each_entry(f, &head->flist, link) {
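
Editorial note on the cls_basic.c conversion above: a live filter is never modified in place any more. basic_change() allocates a replacement, fills it in, swaps it into the list with list_replace_rcu()/list_add_rcu(), and frees the displaced node only after a grace period via call_rcu(). The hedged sketch below shows that copy-on-update shape on a simple singly linked chain; the primitives are stand-in assumptions and the deferred free is only marked in a comment, since reclamation safety comes from the kernel's grace-period machinery, not from this sketch.

/* Illustrative sketch only: replace one filter with an updated copy, as
 * basic_change() now does, instead of editing it in place under a lock.
 */
#include <stdlib.h>

#define rcu_assign_pointer(p, v) __atomic_store_n(&(p), (v), __ATOMIC_RELEASE)
#define rtnl_dereference(p)      (p)	/* writer holds RTNL (assumption) */

struct filter_sketch {
	unsigned int handle;
	unsigned int classid;
	struct filter_sketch *next;	/* __rcu in the kernel */
};

/* *pprev is the predecessor's next pointer; returns 0 on success. */
static int filter_replace(struct filter_sketch **pprev, unsigned int classid)
{
	struct filter_sketch *old = rtnl_dereference(*pprev);
	struct filter_sketch *new;

	if (!old)
		return -1;

	new = malloc(sizeof(*new));
	if (!new)
		return -1;

	*new = *old;			/* copy handle, link, ... */
	new->classid = classid;		/* apply the change to the copy */

	rcu_assign_pointer(*pprev, new);/* readers now see old or new */

	/* Kernel: call_rcu(&old->rcu, free_cb) -- the old node may only be
	 * freed after all current readers are done, so the sketch leaves
	 * reclamation out rather than calling free(old) here.
	 */
	(void)old;
	return 0;
}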
diff --git a/net/sched/cls_bpf.c b/net/sched/cls_bpf.c
index 0e30d58149da..eed49d1d0878 100644
--- a/net/sched/cls_bpf.c
+++ b/net/sched/cls_bpf.c
@@ -27,6 +27,7 @@ MODULE_DESCRIPTION("TC BPF based classifier");
27struct cls_bpf_head { 27struct cls_bpf_head {
28 struct list_head plist; 28 struct list_head plist;
29 u32 hgen; 29 u32 hgen;
30 struct rcu_head rcu;
30}; 31};
31 32
32struct cls_bpf_prog { 33struct cls_bpf_prog {
@@ -37,6 +38,8 @@ struct cls_bpf_prog {
37 struct list_head link; 38 struct list_head link;
38 u32 handle; 39 u32 handle;
39 u16 bpf_len; 40 u16 bpf_len;
41 struct tcf_proto *tp;
42 struct rcu_head rcu;
40}; 43};
41 44
42static const struct nla_policy bpf_policy[TCA_BPF_MAX + 1] = { 45static const struct nla_policy bpf_policy[TCA_BPF_MAX + 1] = {
@@ -49,11 +52,11 @@ static const struct nla_policy bpf_policy[TCA_BPF_MAX + 1] = {
49static int cls_bpf_classify(struct sk_buff *skb, const struct tcf_proto *tp, 52static int cls_bpf_classify(struct sk_buff *skb, const struct tcf_proto *tp,
50 struct tcf_result *res) 53 struct tcf_result *res)
51{ 54{
52 struct cls_bpf_head *head = tp->root; 55 struct cls_bpf_head *head = rcu_dereference_bh(tp->root);
53 struct cls_bpf_prog *prog; 56 struct cls_bpf_prog *prog;
54 int ret; 57 int ret;
55 58
56 list_for_each_entry(prog, &head->plist, link) { 59 list_for_each_entry_rcu(prog, &head->plist, link) {
57 int filter_res = BPF_PROG_RUN(prog->filter, skb); 60 int filter_res = BPF_PROG_RUN(prog->filter, skb);
58 61
59 if (filter_res == 0) 62 if (filter_res == 0)
@@ -81,16 +84,15 @@ static int cls_bpf_init(struct tcf_proto *tp)
81 if (head == NULL) 84 if (head == NULL)
82 return -ENOBUFS; 85 return -ENOBUFS;
83 86
84 INIT_LIST_HEAD(&head->plist); 87 INIT_LIST_HEAD_RCU(&head->plist);
85 tp->root = head; 88 rcu_assign_pointer(tp->root, head);
86 89
87 return 0; 90 return 0;
88} 91}
89 92
90static void cls_bpf_delete_prog(struct tcf_proto *tp, struct cls_bpf_prog *prog) 93static void cls_bpf_delete_prog(struct tcf_proto *tp, struct cls_bpf_prog *prog)
91{ 94{
92 tcf_unbind_filter(tp, &prog->res); 95 tcf_exts_destroy(&prog->exts);
93 tcf_exts_destroy(tp, &prog->exts);
94 96
95 bpf_prog_destroy(prog->filter); 97 bpf_prog_destroy(prog->filter);
96 98
@@ -98,18 +100,23 @@ static void cls_bpf_delete_prog(struct tcf_proto *tp, struct cls_bpf_prog *prog)
98 kfree(prog); 100 kfree(prog);
99} 101}
100 102
103static void __cls_bpf_delete_prog(struct rcu_head *rcu)
104{
105 struct cls_bpf_prog *prog = container_of(rcu, struct cls_bpf_prog, rcu);
106
107 cls_bpf_delete_prog(prog->tp, prog);
108}
109
101static int cls_bpf_delete(struct tcf_proto *tp, unsigned long arg) 110static int cls_bpf_delete(struct tcf_proto *tp, unsigned long arg)
102{ 111{
103 struct cls_bpf_head *head = tp->root; 112 struct cls_bpf_head *head = rtnl_dereference(tp->root);
104 struct cls_bpf_prog *prog, *todel = (struct cls_bpf_prog *) arg; 113 struct cls_bpf_prog *prog, *todel = (struct cls_bpf_prog *) arg;
105 114
106 list_for_each_entry(prog, &head->plist, link) { 115 list_for_each_entry(prog, &head->plist, link) {
107 if (prog == todel) { 116 if (prog == todel) {
108 tcf_tree_lock(tp); 117 list_del_rcu(&prog->link);
109 list_del(&prog->link); 118 tcf_unbind_filter(tp, &prog->res);
110 tcf_tree_unlock(tp); 119 call_rcu(&prog->rcu, __cls_bpf_delete_prog);
111
112 cls_bpf_delete_prog(tp, prog);
113 return 0; 120 return 0;
114 } 121 }
115 } 122 }
@@ -119,27 +126,29 @@ static int cls_bpf_delete(struct tcf_proto *tp, unsigned long arg)
119 126
120static void cls_bpf_destroy(struct tcf_proto *tp) 127static void cls_bpf_destroy(struct tcf_proto *tp)
121{ 128{
122 struct cls_bpf_head *head = tp->root; 129 struct cls_bpf_head *head = rtnl_dereference(tp->root);
123 struct cls_bpf_prog *prog, *tmp; 130 struct cls_bpf_prog *prog, *tmp;
124 131
125 list_for_each_entry_safe(prog, tmp, &head->plist, link) { 132 list_for_each_entry_safe(prog, tmp, &head->plist, link) {
126 list_del(&prog->link); 133 list_del_rcu(&prog->link);
127 cls_bpf_delete_prog(tp, prog); 134 tcf_unbind_filter(tp, &prog->res);
135 call_rcu(&prog->rcu, __cls_bpf_delete_prog);
128 } 136 }
129 137
130 kfree(head); 138 RCU_INIT_POINTER(tp->root, NULL);
139 kfree_rcu(head, rcu);
131} 140}
132 141
133static unsigned long cls_bpf_get(struct tcf_proto *tp, u32 handle) 142static unsigned long cls_bpf_get(struct tcf_proto *tp, u32 handle)
134{ 143{
135 struct cls_bpf_head *head = tp->root; 144 struct cls_bpf_head *head = rtnl_dereference(tp->root);
136 struct cls_bpf_prog *prog; 145 struct cls_bpf_prog *prog;
137 unsigned long ret = 0UL; 146 unsigned long ret = 0UL;
138 147
139 if (head == NULL) 148 if (head == NULL)
140 return 0UL; 149 return 0UL;
141 150
142 list_for_each_entry(prog, &head->plist, link) { 151 list_for_each_entry_rcu(prog, &head->plist, link) {
143 if (prog->handle == handle) { 152 if (prog->handle == handle) {
144 ret = (unsigned long) prog; 153 ret = (unsigned long) prog;
145 break; 154 break;
@@ -158,10 +167,10 @@ static int cls_bpf_modify_existing(struct net *net, struct tcf_proto *tp,
158 unsigned long base, struct nlattr **tb, 167 unsigned long base, struct nlattr **tb,
159 struct nlattr *est, bool ovr) 168 struct nlattr *est, bool ovr)
160{ 169{
161 struct sock_filter *bpf_ops, *bpf_old; 170 struct sock_filter *bpf_ops;
162 struct tcf_exts exts; 171 struct tcf_exts exts;
163 struct sock_fprog_kern tmp; 172 struct sock_fprog_kern tmp;
164 struct bpf_prog *fp, *fp_old; 173 struct bpf_prog *fp;
165 u16 bpf_size, bpf_len; 174 u16 bpf_size, bpf_len;
166 u32 classid; 175 u32 classid;
167 int ret; 176 int ret;
@@ -197,30 +206,19 @@ static int cls_bpf_modify_existing(struct net *net, struct tcf_proto *tp,
197 if (ret) 206 if (ret)
198 goto errout_free; 207 goto errout_free;
199 208
200 tcf_tree_lock(tp);
201 fp_old = prog->filter;
202 bpf_old = prog->bpf_ops;
203
204 prog->bpf_len = bpf_len; 209 prog->bpf_len = bpf_len;
205 prog->bpf_ops = bpf_ops; 210 prog->bpf_ops = bpf_ops;
206 prog->filter = fp; 211 prog->filter = fp;
207 prog->res.classid = classid; 212 prog->res.classid = classid;
208 tcf_tree_unlock(tp);
209 213
210 tcf_bind_filter(tp, &prog->res, base); 214 tcf_bind_filter(tp, &prog->res, base);
211 tcf_exts_change(tp, &prog->exts, &exts); 215 tcf_exts_change(tp, &prog->exts, &exts);
212 216
213 if (fp_old)
214 bpf_prog_destroy(fp_old);
215 if (bpf_old)
216 kfree(bpf_old);
217
218 return 0; 217 return 0;
219
220errout_free: 218errout_free:
221 kfree(bpf_ops); 219 kfree(bpf_ops);
222errout: 220errout:
223 tcf_exts_destroy(tp, &exts); 221 tcf_exts_destroy(&exts);
224 return ret; 222 return ret;
225} 223}
226 224
@@ -244,9 +242,10 @@ static int cls_bpf_change(struct net *net, struct sk_buff *in_skb,
244 u32 handle, struct nlattr **tca, 242 u32 handle, struct nlattr **tca,
245 unsigned long *arg, bool ovr) 243 unsigned long *arg, bool ovr)
246{ 244{
247 struct cls_bpf_head *head = tp->root; 245 struct cls_bpf_head *head = rtnl_dereference(tp->root);
248 struct cls_bpf_prog *prog = (struct cls_bpf_prog *) *arg; 246 struct cls_bpf_prog *oldprog = (struct cls_bpf_prog *) *arg;
249 struct nlattr *tb[TCA_BPF_MAX + 1]; 247 struct nlattr *tb[TCA_BPF_MAX + 1];
248 struct cls_bpf_prog *prog;
250 int ret; 249 int ret;
251 250
252 if (tca[TCA_OPTIONS] == NULL) 251 if (tca[TCA_OPTIONS] == NULL)
@@ -256,18 +255,19 @@ static int cls_bpf_change(struct net *net, struct sk_buff *in_skb,
256 if (ret < 0) 255 if (ret < 0)
257 return ret; 256 return ret;
258 257
259 if (prog != NULL) {
260 if (handle && prog->handle != handle)
261 return -EINVAL;
262 return cls_bpf_modify_existing(net, tp, prog, base, tb,
263 tca[TCA_RATE], ovr);
264 }
265
266 prog = kzalloc(sizeof(*prog), GFP_KERNEL); 258 prog = kzalloc(sizeof(*prog), GFP_KERNEL);
267 if (prog == NULL) 259 if (!prog)
268 return -ENOBUFS; 260 return -ENOBUFS;
269 261
270 tcf_exts_init(&prog->exts, TCA_BPF_ACT, TCA_BPF_POLICE); 262 tcf_exts_init(&prog->exts, TCA_BPF_ACT, TCA_BPF_POLICE);
263
264 if (oldprog) {
265 if (handle && oldprog->handle != handle) {
266 ret = -EINVAL;
267 goto errout;
268 }
269 }
270
271 if (handle == 0) 271 if (handle == 0)
272 prog->handle = cls_bpf_grab_new_handle(tp, head); 272 prog->handle = cls_bpf_grab_new_handle(tp, head);
273 else 273 else
@@ -281,16 +281,18 @@ static int cls_bpf_change(struct net *net, struct sk_buff *in_skb,
281 if (ret < 0) 281 if (ret < 0)
282 goto errout; 282 goto errout;
283 283
284 tcf_tree_lock(tp); 284 if (oldprog) {
285 list_add(&prog->link, &head->plist); 285 list_replace_rcu(&prog->link, &oldprog->link);
286 tcf_tree_unlock(tp); 286 tcf_unbind_filter(tp, &oldprog->res);
287 call_rcu(&oldprog->rcu, __cls_bpf_delete_prog);
288 } else {
289 list_add_rcu(&prog->link, &head->plist);
290 }
287 291
288 *arg = (unsigned long) prog; 292 *arg = (unsigned long) prog;
289
290 return 0; 293 return 0;
291errout: 294errout:
292 if (*arg == 0UL && prog) 295 kfree(prog);
293 kfree(prog);
294 296
295 return ret; 297 return ret;
296} 298}
@@ -339,10 +341,10 @@ nla_put_failure:
339 341
340static void cls_bpf_walk(struct tcf_proto *tp, struct tcf_walker *arg) 342static void cls_bpf_walk(struct tcf_proto *tp, struct tcf_walker *arg)
341{ 343{
342 struct cls_bpf_head *head = tp->root; 344 struct cls_bpf_head *head = rtnl_dereference(tp->root);
343 struct cls_bpf_prog *prog; 345 struct cls_bpf_prog *prog;
344 346
345 list_for_each_entry(prog, &head->plist, link) { 347 list_for_each_entry_rcu(prog, &head->plist, link) {
346 if (arg->count < arg->skip) 348 if (arg->count < arg->skip)
347 goto skip; 349 goto skip;
348 if (arg->fn(tp, (unsigned long) prog, arg) < 0) { 350 if (arg->fn(tp, (unsigned long) prog, arg) < 0) {
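
Editorial note on the cls_bpf.c read side above: cls_bpf_classify(), like the other classifiers in this series, now walks the program list with rcu_dereference_bh()/list_for_each_entry_rcu() inside the softirq RCU-bh read-side section instead of taking the tree lock. The sketch below shows that lockless reader shape; the acquire-load macro is a user-space stand-in for rcu_dereference_bh() (an assumption), and it omits the read-side critical section the kernel provides.

/* Illustrative sketch only: a lockless reader walking a chain that writers
 * publish with a release store (see the writer sketch after cls_api.c).
 */
#define rcu_dereference_sketch(p) __atomic_load_n(&(p), __ATOMIC_ACQUIRE)

struct prog_sketch {
	unsigned int handle;
	int (*match)(const void *pkt);
	struct prog_sketch *next;	/* __rcu in the kernel */
};

/* Return the handle of the first matching program, or 0 for no match. */
static unsigned int classify_sketch(struct prog_sketch **head, const void *pkt)
{
	struct prog_sketch *p;

	for (p = rcu_dereference_sketch(*head); p;
	     p = rcu_dereference_sketch(p->next))
		if (p->match(pkt))
			return p->handle;

	return 0;
}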
diff --git a/net/sched/cls_cgroup.c b/net/sched/cls_cgroup.c
index cacf01bd04f0..d61a801222c1 100644
--- a/net/sched/cls_cgroup.c
+++ b/net/sched/cls_cgroup.c
@@ -22,17 +22,17 @@ struct cls_cgroup_head {
22 u32 handle; 22 u32 handle;
23 struct tcf_exts exts; 23 struct tcf_exts exts;
24 struct tcf_ematch_tree ematches; 24 struct tcf_ematch_tree ematches;
25 struct tcf_proto *tp;
26 struct rcu_head rcu;
25}; 27};
26 28
27static int cls_cgroup_classify(struct sk_buff *skb, const struct tcf_proto *tp, 29static int cls_cgroup_classify(struct sk_buff *skb, const struct tcf_proto *tp,
28 struct tcf_result *res) 30 struct tcf_result *res)
29{ 31{
30 struct cls_cgroup_head *head = tp->root; 32 struct cls_cgroup_head *head = rcu_dereference_bh(tp->root);
31 u32 classid; 33 u32 classid;
32 34
33 rcu_read_lock();
34 classid = task_cls_state(current)->classid; 35 classid = task_cls_state(current)->classid;
35 rcu_read_unlock();
36 36
37 /* 37 /*
38 * Due to the nature of the classifier it is required to ignore all 38 * Due to the nature of the classifier it is required to ignore all
@@ -80,13 +80,25 @@ static const struct nla_policy cgroup_policy[TCA_CGROUP_MAX + 1] = {
80 [TCA_CGROUP_EMATCHES] = { .type = NLA_NESTED }, 80 [TCA_CGROUP_EMATCHES] = { .type = NLA_NESTED },
81}; 81};
82 82
83static void cls_cgroup_destroy_rcu(struct rcu_head *root)
84{
85 struct cls_cgroup_head *head = container_of(root,
86 struct cls_cgroup_head,
87 rcu);
88
89 tcf_exts_destroy(&head->exts);
90 tcf_em_tree_destroy(&head->ematches);
91 kfree(head);
92}
93
83static int cls_cgroup_change(struct net *net, struct sk_buff *in_skb, 94static int cls_cgroup_change(struct net *net, struct sk_buff *in_skb,
84 struct tcf_proto *tp, unsigned long base, 95 struct tcf_proto *tp, unsigned long base,
85 u32 handle, struct nlattr **tca, 96 u32 handle, struct nlattr **tca,
86 unsigned long *arg, bool ovr) 97 unsigned long *arg, bool ovr)
87{ 98{
88 struct nlattr *tb[TCA_CGROUP_MAX + 1]; 99 struct nlattr *tb[TCA_CGROUP_MAX + 1];
89 struct cls_cgroup_head *head = tp->root; 100 struct cls_cgroup_head *head = rtnl_dereference(tp->root);
101 struct cls_cgroup_head *new;
90 struct tcf_ematch_tree t; 102 struct tcf_ematch_tree t;
91 struct tcf_exts e; 103 struct tcf_exts e;
92 int err; 104 int err;
@@ -94,53 +106,58 @@ static int cls_cgroup_change(struct net *net, struct sk_buff *in_skb,
94 if (!tca[TCA_OPTIONS]) 106 if (!tca[TCA_OPTIONS])
95 return -EINVAL; 107 return -EINVAL;
96 108
97 if (head == NULL) { 109 if (!head && !handle)
98 if (!handle) 110 return -EINVAL;
99 return -EINVAL;
100
101 head = kzalloc(sizeof(*head), GFP_KERNEL);
102 if (head == NULL)
103 return -ENOBUFS;
104 111
105 tcf_exts_init(&head->exts, TCA_CGROUP_ACT, TCA_CGROUP_POLICE); 112 if (head && handle != head->handle)
106 head->handle = handle; 113 return -ENOENT;
107 114
108 tcf_tree_lock(tp); 115 new = kzalloc(sizeof(*head), GFP_KERNEL);
109 tp->root = head; 116 if (!new)
110 tcf_tree_unlock(tp); 117 return -ENOBUFS;
111 }
112 118
113 if (handle != head->handle) 119 tcf_exts_init(&new->exts, TCA_CGROUP_ACT, TCA_CGROUP_POLICE);
114 return -ENOENT; 120 if (head)
121 new->handle = head->handle;
122 else
123 new->handle = handle;
115 124
125 new->tp = tp;
116 err = nla_parse_nested(tb, TCA_CGROUP_MAX, tca[TCA_OPTIONS], 126 err = nla_parse_nested(tb, TCA_CGROUP_MAX, tca[TCA_OPTIONS],
117 cgroup_policy); 127 cgroup_policy);
118 if (err < 0) 128 if (err < 0)
119 return err; 129 goto errout;
120 130
121 tcf_exts_init(&e, TCA_CGROUP_ACT, TCA_CGROUP_POLICE); 131 tcf_exts_init(&e, TCA_CGROUP_ACT, TCA_CGROUP_POLICE);
122 err = tcf_exts_validate(net, tp, tb, tca[TCA_RATE], &e, ovr); 132 err = tcf_exts_validate(net, tp, tb, tca[TCA_RATE], &e, ovr);
123 if (err < 0) 133 if (err < 0)
124 return err; 134 goto errout;
125 135
126 err = tcf_em_tree_validate(tp, tb[TCA_CGROUP_EMATCHES], &t); 136 err = tcf_em_tree_validate(tp, tb[TCA_CGROUP_EMATCHES], &t);
127 if (err < 0) 137 if (err < 0) {
128 return err; 138 tcf_exts_destroy(&e);
139 goto errout;
140 }
129 141
130 tcf_exts_change(tp, &head->exts, &e); 142 tcf_exts_change(tp, &new->exts, &e);
131 tcf_em_tree_change(tp, &head->ematches, &t); 143 tcf_em_tree_change(tp, &new->ematches, &t);
132 144
145 rcu_assign_pointer(tp->root, new);
146 if (head)
147 call_rcu(&head->rcu, cls_cgroup_destroy_rcu);
133 return 0; 148 return 0;
149errout:
150 kfree(new);
151 return err;
134} 152}
135 153
136static void cls_cgroup_destroy(struct tcf_proto *tp) 154static void cls_cgroup_destroy(struct tcf_proto *tp)
137{ 155{
138 struct cls_cgroup_head *head = tp->root; 156 struct cls_cgroup_head *head = rtnl_dereference(tp->root);
139 157
140 if (head) { 158 if (head) {
141 tcf_exts_destroy(tp, &head->exts); 159 RCU_INIT_POINTER(tp->root, NULL);
142 tcf_em_tree_destroy(tp, &head->ematches); 160 call_rcu(&head->rcu, cls_cgroup_destroy_rcu);
143 kfree(head);
144 } 161 }
145} 162}
146 163
@@ -151,7 +168,7 @@ static int cls_cgroup_delete(struct tcf_proto *tp, unsigned long arg)
151 168
152static void cls_cgroup_walk(struct tcf_proto *tp, struct tcf_walker *arg) 169static void cls_cgroup_walk(struct tcf_proto *tp, struct tcf_walker *arg)
153{ 170{
154 struct cls_cgroup_head *head = tp->root; 171 struct cls_cgroup_head *head = rtnl_dereference(tp->root);
155 172
156 if (arg->count < arg->skip) 173 if (arg->count < arg->skip)
157 goto skip; 174 goto skip;
@@ -167,7 +184,7 @@ skip:
167static int cls_cgroup_dump(struct net *net, struct tcf_proto *tp, unsigned long fh, 184static int cls_cgroup_dump(struct net *net, struct tcf_proto *tp, unsigned long fh,
168 struct sk_buff *skb, struct tcmsg *t) 185 struct sk_buff *skb, struct tcmsg *t)
169{ 186{
170 struct cls_cgroup_head *head = tp->root; 187 struct cls_cgroup_head *head = rtnl_dereference(tp->root);
171 unsigned char *b = skb_tail_pointer(skb); 188 unsigned char *b = skb_tail_pointer(skb);
172 struct nlattr *nest; 189 struct nlattr *nest;
173 190
diff --git a/net/sched/cls_flow.c b/net/sched/cls_flow.c
index 35be16f7c192..4ac515f2a6ce 100644
--- a/net/sched/cls_flow.c
+++ b/net/sched/cls_flow.c
@@ -34,12 +34,14 @@
34 34
35struct flow_head { 35struct flow_head {
36 struct list_head filters; 36 struct list_head filters;
37 struct rcu_head rcu;
37}; 38};
38 39
39struct flow_filter { 40struct flow_filter {
40 struct list_head list; 41 struct list_head list;
41 struct tcf_exts exts; 42 struct tcf_exts exts;
42 struct tcf_ematch_tree ematches; 43 struct tcf_ematch_tree ematches;
44 struct tcf_proto *tp;
43 struct timer_list perturb_timer; 45 struct timer_list perturb_timer;
44 u32 perturb_period; 46 u32 perturb_period;
45 u32 handle; 47 u32 handle;
@@ -54,6 +56,7 @@ struct flow_filter {
54 u32 divisor; 56 u32 divisor;
55 u32 baseclass; 57 u32 baseclass;
56 u32 hashrnd; 58 u32 hashrnd;
59 struct rcu_head rcu;
57}; 60};
58 61
59static inline u32 addr_fold(void *addr) 62static inline u32 addr_fold(void *addr)
@@ -276,14 +279,14 @@ static u32 flow_key_get(struct sk_buff *skb, int key, struct flow_keys *flow)
276static int flow_classify(struct sk_buff *skb, const struct tcf_proto *tp, 279static int flow_classify(struct sk_buff *skb, const struct tcf_proto *tp,
277 struct tcf_result *res) 280 struct tcf_result *res)
278{ 281{
279 struct flow_head *head = tp->root; 282 struct flow_head *head = rcu_dereference_bh(tp->root);
280 struct flow_filter *f; 283 struct flow_filter *f;
281 u32 keymask; 284 u32 keymask;
282 u32 classid; 285 u32 classid;
283 unsigned int n, key; 286 unsigned int n, key;
284 int r; 287 int r;
285 288
286 list_for_each_entry(f, &head->filters, list) { 289 list_for_each_entry_rcu(f, &head->filters, list) {
287 u32 keys[FLOW_KEY_MAX + 1]; 290 u32 keys[FLOW_KEY_MAX + 1];
288 struct flow_keys flow_keys; 291 struct flow_keys flow_keys;
289 292
@@ -346,13 +349,23 @@ static const struct nla_policy flow_policy[TCA_FLOW_MAX + 1] = {
346 [TCA_FLOW_PERTURB] = { .type = NLA_U32 }, 349 [TCA_FLOW_PERTURB] = { .type = NLA_U32 },
347}; 350};
348 351
352static void flow_destroy_filter(struct rcu_head *head)
353{
354 struct flow_filter *f = container_of(head, struct flow_filter, rcu);
355
356 del_timer_sync(&f->perturb_timer);
357 tcf_exts_destroy(&f->exts);
358 tcf_em_tree_destroy(&f->ematches);
359 kfree(f);
360}
361
349static int flow_change(struct net *net, struct sk_buff *in_skb, 362static int flow_change(struct net *net, struct sk_buff *in_skb,
350 struct tcf_proto *tp, unsigned long base, 363 struct tcf_proto *tp, unsigned long base,
351 u32 handle, struct nlattr **tca, 364 u32 handle, struct nlattr **tca,
352 unsigned long *arg, bool ovr) 365 unsigned long *arg, bool ovr)
353{ 366{
354 struct flow_head *head = tp->root; 367 struct flow_head *head = rtnl_dereference(tp->root);
355 struct flow_filter *f; 368 struct flow_filter *fold, *fnew;
356 struct nlattr *opt = tca[TCA_OPTIONS]; 369 struct nlattr *opt = tca[TCA_OPTIONS];
357 struct nlattr *tb[TCA_FLOW_MAX + 1]; 370 struct nlattr *tb[TCA_FLOW_MAX + 1];
358 struct tcf_exts e; 371 struct tcf_exts e;
@@ -401,20 +414,42 @@ static int flow_change(struct net *net, struct sk_buff *in_skb,
401 if (err < 0) 414 if (err < 0)
402 goto err1; 415 goto err1;
403 416
404 f = (struct flow_filter *)*arg; 417 err = -ENOBUFS;
405 if (f != NULL) { 418 fnew = kzalloc(sizeof(*fnew), GFP_KERNEL);
419 if (!fnew)
420 goto err2;
421
422 fold = (struct flow_filter *)*arg;
423 if (fold) {
406 err = -EINVAL; 424 err = -EINVAL;
407 if (f->handle != handle && handle) 425 if (fold->handle != handle && handle)
408 goto err2; 426 goto err2;
409 427
410 mode = f->mode; 428 /* Copy fold into fnew */
429 fnew->handle = fold->handle;
430 fnew->keymask = fold->keymask;
431 fnew->tp = fold->tp;
432
433 fnew->handle = fold->handle;
434 fnew->nkeys = fold->nkeys;
435 fnew->keymask = fold->keymask;
436 fnew->mode = fold->mode;
437 fnew->mask = fold->mask;
438 fnew->xor = fold->xor;
439 fnew->rshift = fold->rshift;
440 fnew->addend = fold->addend;
441 fnew->divisor = fold->divisor;
442 fnew->baseclass = fold->baseclass;
443 fnew->hashrnd = fold->hashrnd;
444
445 mode = fold->mode;
411 if (tb[TCA_FLOW_MODE]) 446 if (tb[TCA_FLOW_MODE])
412 mode = nla_get_u32(tb[TCA_FLOW_MODE]); 447 mode = nla_get_u32(tb[TCA_FLOW_MODE]);
413 if (mode != FLOW_MODE_HASH && nkeys > 1) 448 if (mode != FLOW_MODE_HASH && nkeys > 1)
414 goto err2; 449 goto err2;
415 450
416 if (mode == FLOW_MODE_HASH) 451 if (mode == FLOW_MODE_HASH)
417 perturb_period = f->perturb_period; 452 perturb_period = fold->perturb_period;
418 if (tb[TCA_FLOW_PERTURB]) { 453 if (tb[TCA_FLOW_PERTURB]) {
419 if (mode != FLOW_MODE_HASH) 454 if (mode != FLOW_MODE_HASH)
420 goto err2; 455 goto err2;
@@ -444,83 +479,72 @@ static int flow_change(struct net *net, struct sk_buff *in_skb,
444 if (TC_H_MIN(baseclass) == 0) 479 if (TC_H_MIN(baseclass) == 0)
445 baseclass = TC_H_MAKE(baseclass, 1); 480 baseclass = TC_H_MAKE(baseclass, 1);
446 481
447 err = -ENOBUFS; 482 fnew->handle = handle;
448 f = kzalloc(sizeof(*f), GFP_KERNEL); 483 fnew->mask = ~0U;
449 if (f == NULL) 484 fnew->tp = tp;
450 goto err2; 485 get_random_bytes(&fnew->hashrnd, 4);
451 486 tcf_exts_init(&fnew->exts, TCA_FLOW_ACT, TCA_FLOW_POLICE);
452 f->handle = handle;
453 f->mask = ~0U;
454 tcf_exts_init(&f->exts, TCA_FLOW_ACT, TCA_FLOW_POLICE);
455
456 get_random_bytes(&f->hashrnd, 4);
457 f->perturb_timer.function = flow_perturbation;
458 f->perturb_timer.data = (unsigned long)f;
459 init_timer_deferrable(&f->perturb_timer);
460 } 487 }
461 488
462 tcf_exts_change(tp, &f->exts, &e); 489 fnew->perturb_timer.function = flow_perturbation;
463 tcf_em_tree_change(tp, &f->ematches, &t); 490 fnew->perturb_timer.data = (unsigned long)fnew;
491 init_timer_deferrable(&fnew->perturb_timer);
464 492
465 tcf_tree_lock(tp); 493 tcf_exts_change(tp, &fnew->exts, &e);
494 tcf_em_tree_change(tp, &fnew->ematches, &t);
495
496 netif_keep_dst(qdisc_dev(tp->q));
466 497
467 if (tb[TCA_FLOW_KEYS]) { 498 if (tb[TCA_FLOW_KEYS]) {
468 f->keymask = keymask; 499 fnew->keymask = keymask;
469 f->nkeys = nkeys; 500 fnew->nkeys = nkeys;
470 } 501 }
471 502
472 f->mode = mode; 503 fnew->mode = mode;
473 504
474 if (tb[TCA_FLOW_MASK]) 505 if (tb[TCA_FLOW_MASK])
475 f->mask = nla_get_u32(tb[TCA_FLOW_MASK]); 506 fnew->mask = nla_get_u32(tb[TCA_FLOW_MASK]);
476 if (tb[TCA_FLOW_XOR]) 507 if (tb[TCA_FLOW_XOR])
477 f->xor = nla_get_u32(tb[TCA_FLOW_XOR]); 508 fnew->xor = nla_get_u32(tb[TCA_FLOW_XOR]);
478 if (tb[TCA_FLOW_RSHIFT]) 509 if (tb[TCA_FLOW_RSHIFT])
479 f->rshift = nla_get_u32(tb[TCA_FLOW_RSHIFT]); 510 fnew->rshift = nla_get_u32(tb[TCA_FLOW_RSHIFT]);
480 if (tb[TCA_FLOW_ADDEND]) 511 if (tb[TCA_FLOW_ADDEND])
481 f->addend = nla_get_u32(tb[TCA_FLOW_ADDEND]); 512 fnew->addend = nla_get_u32(tb[TCA_FLOW_ADDEND]);
482 513
483 if (tb[TCA_FLOW_DIVISOR]) 514 if (tb[TCA_FLOW_DIVISOR])
484 f->divisor = nla_get_u32(tb[TCA_FLOW_DIVISOR]); 515 fnew->divisor = nla_get_u32(tb[TCA_FLOW_DIVISOR]);
485 if (baseclass) 516 if (baseclass)
486 f->baseclass = baseclass; 517 fnew->baseclass = baseclass;
487 518
488 f->perturb_period = perturb_period; 519 fnew->perturb_period = perturb_period;
489 del_timer(&f->perturb_timer);
490 if (perturb_period) 520 if (perturb_period)
491 mod_timer(&f->perturb_timer, jiffies + perturb_period); 521 mod_timer(&fnew->perturb_timer, jiffies + perturb_period);
492 522
493 if (*arg == 0) 523 if (*arg == 0)
494 list_add_tail(&f->list, &head->filters); 524 list_add_tail_rcu(&fnew->list, &head->filters);
525 else
526 list_replace_rcu(&fnew->list, &fold->list);
495 527
496 tcf_tree_unlock(tp); 528 *arg = (unsigned long)fnew;
497 529
498 *arg = (unsigned long)f; 530 if (fold)
531 call_rcu(&fold->rcu, flow_destroy_filter);
499 return 0; 532 return 0;
500 533
501err2: 534err2:
502 tcf_em_tree_destroy(tp, &t); 535 tcf_em_tree_destroy(&t);
536 kfree(fnew);
503err1: 537err1:
504 tcf_exts_destroy(tp, &e); 538 tcf_exts_destroy(&e);
505 return err; 539 return err;
506} 540}
507 541
508static void flow_destroy_filter(struct tcf_proto *tp, struct flow_filter *f)
509{
510 del_timer_sync(&f->perturb_timer);
511 tcf_exts_destroy(tp, &f->exts);
512 tcf_em_tree_destroy(tp, &f->ematches);
513 kfree(f);
514}
515
516static int flow_delete(struct tcf_proto *tp, unsigned long arg) 542static int flow_delete(struct tcf_proto *tp, unsigned long arg)
517{ 543{
518 struct flow_filter *f = (struct flow_filter *)arg; 544 struct flow_filter *f = (struct flow_filter *)arg;
519 545
520 tcf_tree_lock(tp); 546 list_del_rcu(&f->list);
521 list_del(&f->list); 547 call_rcu(&f->rcu, flow_destroy_filter);
522 tcf_tree_unlock(tp);
523 flow_destroy_filter(tp, f);
524 return 0; 548 return 0;
525} 549}
526 550
@@ -532,28 +556,29 @@ static int flow_init(struct tcf_proto *tp)
532 if (head == NULL) 556 if (head == NULL)
533 return -ENOBUFS; 557 return -ENOBUFS;
534 INIT_LIST_HEAD(&head->filters); 558 INIT_LIST_HEAD(&head->filters);
535 tp->root = head; 559 rcu_assign_pointer(tp->root, head);
536 return 0; 560 return 0;
537} 561}
538 562
539static void flow_destroy(struct tcf_proto *tp) 563static void flow_destroy(struct tcf_proto *tp)
540{ 564{
541 struct flow_head *head = tp->root; 565 struct flow_head *head = rtnl_dereference(tp->root);
542 struct flow_filter *f, *next; 566 struct flow_filter *f, *next;
543 567
544 list_for_each_entry_safe(f, next, &head->filters, list) { 568 list_for_each_entry_safe(f, next, &head->filters, list) {
545 list_del(&f->list); 569 list_del_rcu(&f->list);
546 flow_destroy_filter(tp, f); 570 call_rcu(&f->rcu, flow_destroy_filter);
547 } 571 }
548 kfree(head); 572 RCU_INIT_POINTER(tp->root, NULL);
573 kfree_rcu(head, rcu);
549} 574}
550 575
551static unsigned long flow_get(struct tcf_proto *tp, u32 handle) 576static unsigned long flow_get(struct tcf_proto *tp, u32 handle)
552{ 577{
553 struct flow_head *head = tp->root; 578 struct flow_head *head = rtnl_dereference(tp->root);
554 struct flow_filter *f; 579 struct flow_filter *f;
555 580
556 list_for_each_entry(f, &head->filters, list) 581 list_for_each_entry_rcu(f, &head->filters, list)
557 if (f->handle == handle) 582 if (f->handle == handle)
558 return (unsigned long)f; 583 return (unsigned long)f;
559 return 0; 584 return 0;
@@ -626,10 +651,10 @@ nla_put_failure:
626 651
627static void flow_walk(struct tcf_proto *tp, struct tcf_walker *arg) 652static void flow_walk(struct tcf_proto *tp, struct tcf_walker *arg)
628{ 653{
629 struct flow_head *head = tp->root; 654 struct flow_head *head = rtnl_dereference(tp->root);
630 struct flow_filter *f; 655 struct flow_filter *f;
631 656
632 list_for_each_entry(f, &head->filters, list) { 657 list_for_each_entry_rcu(f, &head->filters, list) {
633 if (arg->count < arg->skip) 658 if (arg->count < arg->skip)
634 goto skip; 659 goto skip;
635 if (arg->fn(tp, (unsigned long)f, arg) < 0) { 660 if (arg->fn(tp, (unsigned long)f, arg) < 0) {
diff --git a/net/sched/cls_fw.c b/net/sched/cls_fw.c
index 861b03ccfed0..dbfdfd1f1a9f 100644
--- a/net/sched/cls_fw.c
+++ b/net/sched/cls_fw.c
@@ -33,17 +33,20 @@
33 33
34struct fw_head { 34struct fw_head {
35 u32 mask; 35 u32 mask;
36 struct fw_filter *ht[HTSIZE]; 36 struct fw_filter __rcu *ht[HTSIZE];
37 struct rcu_head rcu;
37}; 38};
38 39
39struct fw_filter { 40struct fw_filter {
40 struct fw_filter *next; 41 struct fw_filter __rcu *next;
41 u32 id; 42 u32 id;
42 struct tcf_result res; 43 struct tcf_result res;
43#ifdef CONFIG_NET_CLS_IND 44#ifdef CONFIG_NET_CLS_IND
44 int ifindex; 45 int ifindex;
45#endif /* CONFIG_NET_CLS_IND */ 46#endif /* CONFIG_NET_CLS_IND */
46 struct tcf_exts exts; 47 struct tcf_exts exts;
48 struct tcf_proto *tp;
49 struct rcu_head rcu;
47}; 50};
48 51
49static u32 fw_hash(u32 handle) 52static u32 fw_hash(u32 handle)
@@ -56,14 +59,16 @@ static u32 fw_hash(u32 handle)
56static int fw_classify(struct sk_buff *skb, const struct tcf_proto *tp, 59static int fw_classify(struct sk_buff *skb, const struct tcf_proto *tp,
57 struct tcf_result *res) 60 struct tcf_result *res)
58{ 61{
59 struct fw_head *head = tp->root; 62 struct fw_head *head = rcu_dereference_bh(tp->root);
60 struct fw_filter *f; 63 struct fw_filter *f;
61 int r; 64 int r;
62 u32 id = skb->mark; 65 u32 id = skb->mark;
63 66
64 if (head != NULL) { 67 if (head != NULL) {
65 id &= head->mask; 68 id &= head->mask;
66 for (f = head->ht[fw_hash(id)]; f; f = f->next) { 69
70 for (f = rcu_dereference_bh(head->ht[fw_hash(id)]); f;
71 f = rcu_dereference_bh(f->next)) {
67 if (f->id == id) { 72 if (f->id == id) {
68 *res = f->res; 73 *res = f->res;
69#ifdef CONFIG_NET_CLS_IND 74#ifdef CONFIG_NET_CLS_IND
@@ -92,13 +97,14 @@ static int fw_classify(struct sk_buff *skb, const struct tcf_proto *tp,
92 97
93static unsigned long fw_get(struct tcf_proto *tp, u32 handle) 98static unsigned long fw_get(struct tcf_proto *tp, u32 handle)
94{ 99{
95 struct fw_head *head = tp->root; 100 struct fw_head *head = rtnl_dereference(tp->root);
96 struct fw_filter *f; 101 struct fw_filter *f;
97 102
98 if (head == NULL) 103 if (head == NULL)
99 return 0; 104 return 0;
100 105
101 for (f = head->ht[fw_hash(handle)]; f; f = f->next) { 106 f = rtnl_dereference(head->ht[fw_hash(handle)]);
107 for (; f; f = rtnl_dereference(f->next)) {
102 if (f->id == handle) 108 if (f->id == handle)
103 return (unsigned long)f; 109 return (unsigned long)f;
104 } 110 }
@@ -114,16 +120,17 @@ static int fw_init(struct tcf_proto *tp)
114 return 0; 120 return 0;
115} 121}
116 122
117static void fw_delete_filter(struct tcf_proto *tp, struct fw_filter *f) 123static void fw_delete_filter(struct rcu_head *head)
118{ 124{
119 tcf_unbind_filter(tp, &f->res); 125 struct fw_filter *f = container_of(head, struct fw_filter, rcu);
120 tcf_exts_destroy(tp, &f->exts); 126
127 tcf_exts_destroy(&f->exts);
121 kfree(f); 128 kfree(f);
122} 129}
123 130
124static void fw_destroy(struct tcf_proto *tp) 131static void fw_destroy(struct tcf_proto *tp)
125{ 132{
126 struct fw_head *head = tp->root; 133 struct fw_head *head = rtnl_dereference(tp->root);
127 struct fw_filter *f; 134 struct fw_filter *f;
128 int h; 135 int h;
129 136
@@ -131,29 +138,35 @@ static void fw_destroy(struct tcf_proto *tp)
131 return; 138 return;
132 139
133 for (h = 0; h < HTSIZE; h++) { 140 for (h = 0; h < HTSIZE; h++) {
134 while ((f = head->ht[h]) != NULL) { 141 while ((f = rtnl_dereference(head->ht[h])) != NULL) {
135 head->ht[h] = f->next; 142 RCU_INIT_POINTER(head->ht[h],
136 fw_delete_filter(tp, f); 143 rtnl_dereference(f->next));
144 tcf_unbind_filter(tp, &f->res);
145 call_rcu(&f->rcu, fw_delete_filter);
137 } 146 }
138 } 147 }
139 kfree(head); 148 RCU_INIT_POINTER(tp->root, NULL);
149 kfree_rcu(head, rcu);
140} 150}
141 151
142static int fw_delete(struct tcf_proto *tp, unsigned long arg) 152static int fw_delete(struct tcf_proto *tp, unsigned long arg)
143{ 153{
144 struct fw_head *head = tp->root; 154 struct fw_head *head = rtnl_dereference(tp->root);
145 struct fw_filter *f = (struct fw_filter *)arg; 155 struct fw_filter *f = (struct fw_filter *)arg;
146 struct fw_filter **fp; 156 struct fw_filter __rcu **fp;
157 struct fw_filter *pfp;
147 158
148 if (head == NULL || f == NULL) 159 if (head == NULL || f == NULL)
149 goto out; 160 goto out;
150 161
151 for (fp = &head->ht[fw_hash(f->id)]; *fp; fp = &(*fp)->next) { 162 fp = &head->ht[fw_hash(f->id)];
152 if (*fp == f) { 163
153 tcf_tree_lock(tp); 164 for (pfp = rtnl_dereference(*fp); pfp;
154 *fp = f->next; 165 fp = &pfp->next, pfp = rtnl_dereference(*fp)) {
155 tcf_tree_unlock(tp); 166 if (pfp == f) {
156 fw_delete_filter(tp, f); 167 RCU_INIT_POINTER(*fp, rtnl_dereference(f->next));
168 tcf_unbind_filter(tp, &f->res);
169 call_rcu(&f->rcu, fw_delete_filter);
157 return 0; 170 return 0;
158 } 171 }
159 } 172 }
@@ -171,7 +184,7 @@ static int
171fw_change_attrs(struct net *net, struct tcf_proto *tp, struct fw_filter *f, 184fw_change_attrs(struct net *net, struct tcf_proto *tp, struct fw_filter *f,
172 struct nlattr **tb, struct nlattr **tca, unsigned long base, bool ovr) 185 struct nlattr **tb, struct nlattr **tca, unsigned long base, bool ovr)
173{ 186{
174 struct fw_head *head = tp->root; 187 struct fw_head *head = rtnl_dereference(tp->root);
175 struct tcf_exts e; 188 struct tcf_exts e;
176 u32 mask; 189 u32 mask;
177 int err; 190 int err;
@@ -210,7 +223,7 @@ fw_change_attrs(struct net *net, struct tcf_proto *tp, struct fw_filter *f,
210 223
211 return 0; 224 return 0;
212errout: 225errout:
213 tcf_exts_destroy(tp, &e); 226 tcf_exts_destroy(&e);
214 return err; 227 return err;
215} 228}
216 229
@@ -220,7 +233,7 @@ static int fw_change(struct net *net, struct sk_buff *in_skb,
220 struct nlattr **tca, 233 struct nlattr **tca,
221 unsigned long *arg, bool ovr) 234 unsigned long *arg, bool ovr)
222{ 235{
223 struct fw_head *head = tp->root; 236 struct fw_head *head = rtnl_dereference(tp->root);
224 struct fw_filter *f = (struct fw_filter *) *arg; 237 struct fw_filter *f = (struct fw_filter *) *arg;
225 struct nlattr *opt = tca[TCA_OPTIONS]; 238 struct nlattr *opt = tca[TCA_OPTIONS];
226 struct nlattr *tb[TCA_FW_MAX + 1]; 239 struct nlattr *tb[TCA_FW_MAX + 1];
@@ -233,10 +246,45 @@ static int fw_change(struct net *net, struct sk_buff *in_skb,
233 if (err < 0) 246 if (err < 0)
234 return err; 247 return err;
235 248
236 if (f != NULL) { 249 if (f) {
250 struct fw_filter *pfp, *fnew;
251 struct fw_filter __rcu **fp;
252
237 if (f->id != handle && handle) 253 if (f->id != handle && handle)
238 return -EINVAL; 254 return -EINVAL;
239 return fw_change_attrs(net, tp, f, tb, tca, base, ovr); 255
256 fnew = kzalloc(sizeof(struct fw_filter), GFP_KERNEL);
257 if (!fnew)
258 return -ENOBUFS;
259
260 fnew->id = f->id;
261 fnew->res = f->res;
262#ifdef CONFIG_NET_CLS_IND
263 fnew->ifindex = f->ifindex;
264#endif /* CONFIG_NET_CLS_IND */
265 fnew->tp = f->tp;
266
267 tcf_exts_init(&fnew->exts, TCA_FW_ACT, TCA_FW_POLICE);
268
269 err = fw_change_attrs(net, tp, fnew, tb, tca, base, ovr);
270 if (err < 0) {
271 kfree(fnew);
272 return err;
273 }
274
275 fp = &head->ht[fw_hash(fnew->id)];
276 for (pfp = rtnl_dereference(*fp); pfp;
277 fp = &pfp->next, pfp = rtnl_dereference(*fp))
278 if (pfp == f)
279 break;
280
281 RCU_INIT_POINTER(fnew->next, rtnl_dereference(pfp->next));
282 rcu_assign_pointer(*fp, fnew);
283 tcf_unbind_filter(tp, &f->res);
284 call_rcu(&f->rcu, fw_delete_filter);
285
286 *arg = (unsigned long)fnew;
287 return err;
240 } 288 }
241 289
242 if (!handle) 290 if (!handle)
@@ -252,9 +300,7 @@ static int fw_change(struct net *net, struct sk_buff *in_skb,
252 return -ENOBUFS; 300 return -ENOBUFS;
253 head->mask = mask; 301 head->mask = mask;
254 302
255 tcf_tree_lock(tp); 303 rcu_assign_pointer(tp->root, head);
256 tp->root = head;
257 tcf_tree_unlock(tp);
258 } 304 }
259 305
260 f = kzalloc(sizeof(struct fw_filter), GFP_KERNEL); 306 f = kzalloc(sizeof(struct fw_filter), GFP_KERNEL);
@@ -263,15 +309,14 @@ static int fw_change(struct net *net, struct sk_buff *in_skb,
263 309
264 tcf_exts_init(&f->exts, TCA_FW_ACT, TCA_FW_POLICE); 310 tcf_exts_init(&f->exts, TCA_FW_ACT, TCA_FW_POLICE);
265 f->id = handle; 311 f->id = handle;
312 f->tp = tp;
266 313
267 err = fw_change_attrs(net, tp, f, tb, tca, base, ovr); 314 err = fw_change_attrs(net, tp, f, tb, tca, base, ovr);
268 if (err < 0) 315 if (err < 0)
269 goto errout; 316 goto errout;
270 317
271 f->next = head->ht[fw_hash(handle)]; 318 RCU_INIT_POINTER(f->next, head->ht[fw_hash(handle)]);
272 tcf_tree_lock(tp); 319 rcu_assign_pointer(head->ht[fw_hash(handle)], f);
273 head->ht[fw_hash(handle)] = f;
274 tcf_tree_unlock(tp);
275 320
276 *arg = (unsigned long)f; 321 *arg = (unsigned long)f;
277 return 0; 322 return 0;
@@ -283,7 +328,7 @@ errout:
283 328
284static void fw_walk(struct tcf_proto *tp, struct tcf_walker *arg) 329static void fw_walk(struct tcf_proto *tp, struct tcf_walker *arg)
285{ 330{
286 struct fw_head *head = tp->root; 331 struct fw_head *head = rtnl_dereference(tp->root);
287 int h; 332 int h;
288 333
289 if (head == NULL) 334 if (head == NULL)
@@ -295,7 +340,8 @@ static void fw_walk(struct tcf_proto *tp, struct tcf_walker *arg)
295 for (h = 0; h < HTSIZE; h++) { 340 for (h = 0; h < HTSIZE; h++) {
296 struct fw_filter *f; 341 struct fw_filter *f;
297 342
298 for (f = head->ht[h]; f; f = f->next) { 343 for (f = rtnl_dereference(head->ht[h]); f;
344 f = rtnl_dereference(f->next)) {
299 if (arg->count < arg->skip) { 345 if (arg->count < arg->skip) {
300 arg->count++; 346 arg->count++;
301 continue; 347 continue;
@@ -312,7 +358,7 @@ static void fw_walk(struct tcf_proto *tp, struct tcf_walker *arg)
312static int fw_dump(struct net *net, struct tcf_proto *tp, unsigned long fh, 358static int fw_dump(struct net *net, struct tcf_proto *tp, unsigned long fh,
313 struct sk_buff *skb, struct tcmsg *t) 359 struct sk_buff *skb, struct tcmsg *t)
314{ 360{
315 struct fw_head *head = tp->root; 361 struct fw_head *head = rtnl_dereference(tp->root);
316 struct fw_filter *f = (struct fw_filter *)fh; 362 struct fw_filter *f = (struct fw_filter *)fh;
317 unsigned char *b = skb_tail_pointer(skb); 363 unsigned char *b = skb_tail_pointer(skb);
318 struct nlattr *nest; 364 struct nlattr *nest;
diff --git a/net/sched/cls_route.c b/net/sched/cls_route.c
index dd9fc2523c76..109a329b7198 100644
--- a/net/sched/cls_route.c
+++ b/net/sched/cls_route.c
@@ -29,25 +29,26 @@
29 * are mutually exclusive. 29 * are mutually exclusive.
30 * 3. "to TAG from ANY" has higher priority, than "to ANY from XXX" 30 * 3. "to TAG from ANY" has higher priority, than "to ANY from XXX"
31 */ 31 */
32
33struct route4_fastmap { 32struct route4_fastmap {
34 struct route4_filter *filter; 33 struct route4_filter *filter;
35 u32 id; 34 u32 id;
36 int iif; 35 int iif;
37}; 36};
38 37
39struct route4_head { 38struct route4_head {
40 struct route4_fastmap fastmap[16]; 39 struct route4_fastmap fastmap[16];
41 struct route4_bucket *table[256 + 1]; 40 struct route4_bucket __rcu *table[256 + 1];
41 struct rcu_head rcu;
42}; 42};
43 43
44struct route4_bucket { 44struct route4_bucket {
45 /* 16 FROM buckets + 16 IIF buckets + 1 wildcard bucket */ 45 /* 16 FROM buckets + 16 IIF buckets + 1 wildcard bucket */
46 struct route4_filter *ht[16 + 16 + 1]; 46 struct route4_filter __rcu *ht[16 + 16 + 1];
47 struct rcu_head rcu;
47}; 48};
48 49
49struct route4_filter { 50struct route4_filter {
50 struct route4_filter *next; 51 struct route4_filter __rcu *next;
51 u32 id; 52 u32 id;
52 int iif; 53 int iif;
53 54
@@ -55,6 +56,8 @@ struct route4_filter {
55 struct tcf_exts exts; 56 struct tcf_exts exts;
56 u32 handle; 57 u32 handle;
57 struct route4_bucket *bkt; 58 struct route4_bucket *bkt;
59 struct tcf_proto *tp;
60 struct rcu_head rcu;
58}; 61};
59 62
60#define ROUTE4_FAILURE ((struct route4_filter *)(-1L)) 63#define ROUTE4_FAILURE ((struct route4_filter *)(-1L))
@@ -64,14 +67,13 @@ static inline int route4_fastmap_hash(u32 id, int iif)
64 return id & 0xF; 67 return id & 0xF;
65} 68}
66 69
70static DEFINE_SPINLOCK(fastmap_lock);
67static void 71static void
68route4_reset_fastmap(struct Qdisc *q, struct route4_head *head, u32 id) 72route4_reset_fastmap(struct route4_head *head)
69{ 73{
70 spinlock_t *root_lock = qdisc_root_sleeping_lock(q); 74 spin_lock_bh(&fastmap_lock);
71
72 spin_lock_bh(root_lock);
73 memset(head->fastmap, 0, sizeof(head->fastmap)); 75 memset(head->fastmap, 0, sizeof(head->fastmap));
74 spin_unlock_bh(root_lock); 76 spin_unlock_bh(&fastmap_lock);
75} 77}
76 78
77static void 79static void
@@ -80,9 +82,12 @@ route4_set_fastmap(struct route4_head *head, u32 id, int iif,
80{ 82{
81 int h = route4_fastmap_hash(id, iif); 83 int h = route4_fastmap_hash(id, iif);
82 84
85 /* fastmap updates must look atomic to align id, iif, filter */
86 spin_lock_bh(&fastmap_lock);
83 head->fastmap[h].id = id; 87 head->fastmap[h].id = id;
84 head->fastmap[h].iif = iif; 88 head->fastmap[h].iif = iif;
85 head->fastmap[h].filter = f; 89 head->fastmap[h].filter = f;
90 spin_unlock_bh(&fastmap_lock);
86} 91}
87 92
88static inline int route4_hash_to(u32 id) 93static inline int route4_hash_to(u32 id)
@@ -123,7 +128,7 @@ static inline int route4_hash_wild(void)
123static int route4_classify(struct sk_buff *skb, const struct tcf_proto *tp, 128static int route4_classify(struct sk_buff *skb, const struct tcf_proto *tp,
124 struct tcf_result *res) 129 struct tcf_result *res)
125{ 130{
126 struct route4_head *head = tp->root; 131 struct route4_head *head = rcu_dereference_bh(tp->root);
127 struct dst_entry *dst; 132 struct dst_entry *dst;
128 struct route4_bucket *b; 133 struct route4_bucket *b;
129 struct route4_filter *f; 134 struct route4_filter *f;
@@ -141,32 +146,43 @@ static int route4_classify(struct sk_buff *skb, const struct tcf_proto *tp,
141 iif = inet_iif(skb); 146 iif = inet_iif(skb);
142 147
143 h = route4_fastmap_hash(id, iif); 148 h = route4_fastmap_hash(id, iif);
149
150 spin_lock(&fastmap_lock);
144 if (id == head->fastmap[h].id && 151 if (id == head->fastmap[h].id &&
145 iif == head->fastmap[h].iif && 152 iif == head->fastmap[h].iif &&
146 (f = head->fastmap[h].filter) != NULL) { 153 (f = head->fastmap[h].filter) != NULL) {
147 if (f == ROUTE4_FAILURE) 154 if (f == ROUTE4_FAILURE) {
155 spin_unlock(&fastmap_lock);
148 goto failure; 156 goto failure;
157 }
149 158
150 *res = f->res; 159 *res = f->res;
160 spin_unlock(&fastmap_lock);
151 return 0; 161 return 0;
152 } 162 }
163 spin_unlock(&fastmap_lock);
153 164
154 h = route4_hash_to(id); 165 h = route4_hash_to(id);
155 166
156restart: 167restart:
157 b = head->table[h]; 168 b = rcu_dereference_bh(head->table[h]);
158 if (b) { 169 if (b) {
159 for (f = b->ht[route4_hash_from(id)]; f; f = f->next) 170 for (f = rcu_dereference_bh(b->ht[route4_hash_from(id)]);
171 f;
172 f = rcu_dereference_bh(f->next))
160 if (f->id == id) 173 if (f->id == id)
161 ROUTE4_APPLY_RESULT(); 174 ROUTE4_APPLY_RESULT();
162 175
163 for (f = b->ht[route4_hash_iif(iif)]; f; f = f->next) 176 for (f = rcu_dereference_bh(b->ht[route4_hash_iif(iif)]);
177 f;
178 f = rcu_dereference_bh(f->next))
164 if (f->iif == iif) 179 if (f->iif == iif)
165 ROUTE4_APPLY_RESULT(); 180 ROUTE4_APPLY_RESULT();
166 181
167 for (f = b->ht[route4_hash_wild()]; f; f = f->next) 182 for (f = rcu_dereference_bh(b->ht[route4_hash_wild()]);
183 f;
184 f = rcu_dereference_bh(f->next))
168 ROUTE4_APPLY_RESULT(); 185 ROUTE4_APPLY_RESULT();
169
170 } 186 }
171 if (h < 256) { 187 if (h < 256) {
172 h = 256; 188 h = 256;
@@ -213,7 +229,7 @@ static inline u32 from_hash(u32 id)
213 229
214static unsigned long route4_get(struct tcf_proto *tp, u32 handle) 230static unsigned long route4_get(struct tcf_proto *tp, u32 handle)
215{ 231{
216 struct route4_head *head = tp->root; 232 struct route4_head *head = rtnl_dereference(tp->root);
217 struct route4_bucket *b; 233 struct route4_bucket *b;
218 struct route4_filter *f; 234 struct route4_filter *f;
219 unsigned int h1, h2; 235 unsigned int h1, h2;
@@ -229,9 +245,11 @@ static unsigned long route4_get(struct tcf_proto *tp, u32 handle)
229 if (h2 > 32) 245 if (h2 > 32)
230 return 0; 246 return 0;
231 247
232 b = head->table[h1]; 248 b = rtnl_dereference(head->table[h1]);
233 if (b) { 249 if (b) {
234 for (f = b->ht[h2]; f; f = f->next) 250 for (f = rtnl_dereference(b->ht[h2]);
251 f;
252 f = rtnl_dereference(f->next))
235 if (f->handle == handle) 253 if (f->handle == handle)
236 return (unsigned long)f; 254 return (unsigned long)f;
237 } 255 }
@@ -248,16 +266,17 @@ static int route4_init(struct tcf_proto *tp)
248} 266}
249 267
250static void 268static void
251route4_delete_filter(struct tcf_proto *tp, struct route4_filter *f) 269route4_delete_filter(struct rcu_head *head)
252{ 270{
253 tcf_unbind_filter(tp, &f->res); 271 struct route4_filter *f = container_of(head, struct route4_filter, rcu);
254 tcf_exts_destroy(tp, &f->exts); 272
273 tcf_exts_destroy(&f->exts);
255 kfree(f); 274 kfree(f);
256} 275}
257 276
258static void route4_destroy(struct tcf_proto *tp) 277static void route4_destroy(struct tcf_proto *tp)
259{ 278{
260 struct route4_head *head = tp->root; 279 struct route4_head *head = rtnl_dereference(tp->root);
261 int h1, h2; 280 int h1, h2;
262 281
263 if (head == NULL) 282 if (head == NULL)
@@ -266,28 +285,36 @@ static void route4_destroy(struct tcf_proto *tp)
266 for (h1 = 0; h1 <= 256; h1++) { 285 for (h1 = 0; h1 <= 256; h1++) {
267 struct route4_bucket *b; 286 struct route4_bucket *b;
268 287
269 b = head->table[h1]; 288 b = rtnl_dereference(head->table[h1]);
270 if (b) { 289 if (b) {
271 for (h2 = 0; h2 <= 32; h2++) { 290 for (h2 = 0; h2 <= 32; h2++) {
272 struct route4_filter *f; 291 struct route4_filter *f;
273 292
274 while ((f = b->ht[h2]) != NULL) { 293 while ((f = rtnl_dereference(b->ht[h2])) != NULL) {
275 b->ht[h2] = f->next; 294 struct route4_filter *next;
276 route4_delete_filter(tp, f); 295
296 next = rtnl_dereference(f->next);
297 RCU_INIT_POINTER(b->ht[h2], next);
298 tcf_unbind_filter(tp, &f->res);
299 call_rcu(&f->rcu, route4_delete_filter);
277 } 300 }
278 } 301 }
279 kfree(b); 302 RCU_INIT_POINTER(head->table[h1], NULL);
303 kfree_rcu(b, rcu);
280 } 304 }
281 } 305 }
282 kfree(head); 306 RCU_INIT_POINTER(tp->root, NULL);
307 kfree_rcu(head, rcu);
283} 308}
284 309
285static int route4_delete(struct tcf_proto *tp, unsigned long arg) 310static int route4_delete(struct tcf_proto *tp, unsigned long arg)
286{ 311{
287 struct route4_head *head = tp->root; 312 struct route4_head *head = rtnl_dereference(tp->root);
288 struct route4_filter **fp, *f = (struct route4_filter *)arg; 313 struct route4_filter *f = (struct route4_filter *)arg;
289 unsigned int h = 0; 314 struct route4_filter __rcu **fp;
315 struct route4_filter *nf;
290 struct route4_bucket *b; 316 struct route4_bucket *b;
317 unsigned int h = 0;
291 int i; 318 int i;
292 319
293 if (!head || !f) 320 if (!head || !f)
@@ -296,27 +323,36 @@ static int route4_delete(struct tcf_proto *tp, unsigned long arg)
296 h = f->handle; 323 h = f->handle;
297 b = f->bkt; 324 b = f->bkt;
298 325
299 for (fp = &b->ht[from_hash(h >> 16)]; *fp; fp = &(*fp)->next) { 326 fp = &b->ht[from_hash(h >> 16)];
300 if (*fp == f) { 327 for (nf = rtnl_dereference(*fp); nf;
301 tcf_tree_lock(tp); 328 fp = &nf->next, nf = rtnl_dereference(*fp)) {
302 *fp = f->next; 329 if (nf == f) {
303 tcf_tree_unlock(tp); 330 /* unlink it */
304 331 RCU_INIT_POINTER(*fp, rtnl_dereference(f->next));
305 route4_reset_fastmap(tp->q, head, f->id); 332
306 route4_delete_filter(tp, f); 333 /* Remove any fastmap lookups that might ref filter
307 334 * notice we unlink'd the filter so we can't get it
308 /* Strip tree */ 335 * back in the fastmap.
309 336 */
310 for (i = 0; i <= 32; i++) 337 route4_reset_fastmap(head);
311 if (b->ht[i]) 338
339 /* Delete it */
340 tcf_unbind_filter(tp, &f->res);
341 call_rcu(&f->rcu, route4_delete_filter);
342
343 /* Strip RTNL protected tree */
344 for (i = 0; i <= 32; i++) {
345 struct route4_filter *rt;
346
347 rt = rtnl_dereference(b->ht[i]);
348 if (rt)
312 return 0; 349 return 0;
350 }
313 351
314 /* OK, session has no flows */ 352 /* OK, session has no flows */
315 tcf_tree_lock(tp); 353 RCU_INIT_POINTER(head->table[to_hash(h)], NULL);
316 head->table[to_hash(h)] = NULL; 354 kfree_rcu(b, rcu);
317 tcf_tree_unlock(tp);
318 355
319 kfree(b);
320 return 0; 356 return 0;
321 } 357 }
322 } 358 }
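
route4_delete() above shows the deletion idiom used throughout these conversions: unlink the node under RTNL with RCU_INIT_POINTER(), then hand the actual kfree() to call_rcu() so any classifier still traversing the old chain finishes before the memory goes away. Here is a hedged sketch of that idiom with invented names, not the cls_route code itself.

#include <linux/kernel.h>
#include <linux/rcupdate.h>
#include <linux/rtnetlink.h>
#include <linux/slab.h>

struct demo_filter {
        struct demo_filter __rcu *next;
        struct rcu_head rcu;
};

static void demo_filter_free_rcu(struct rcu_head *head)
{
        struct demo_filter *f = container_of(head, struct demo_filter, rcu);

        kfree(f);       /* runs after a grace period; no reader can still hold f */
}

/* Unlink @f from the chain rooted at @head; the caller holds RTNL. */
static void demo_filter_unlink(struct demo_filter __rcu **head, struct demo_filter *f)
{
        struct demo_filter __rcu **fp = head;
        struct demo_filter *it;

        for (it = rtnl_dereference(*fp); it;
             fp = &it->next, it = rtnl_dereference(*fp)) {
                if (it == f) {
                        /* readers mid-walk still see a valid f->next */
                        RCU_INIT_POINTER(*fp, rtnl_dereference(f->next));
                        call_rcu(&f->rcu, demo_filter_free_rcu);
                        return;
                }
        }
}

kfree_rcu(), which several of these hunks use for buckets and heads, is the shorthand for exactly this kind of callback when all it would do is call kfree().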
@@ -380,26 +416,25 @@ static int route4_set_parms(struct net *net, struct tcf_proto *tp,
380 } 416 }
381 417
382 h1 = to_hash(nhandle); 418 h1 = to_hash(nhandle);
383 b = head->table[h1]; 419 b = rtnl_dereference(head->table[h1]);
384 if (!b) { 420 if (!b) {
385 err = -ENOBUFS; 421 err = -ENOBUFS;
386 b = kzalloc(sizeof(struct route4_bucket), GFP_KERNEL); 422 b = kzalloc(sizeof(struct route4_bucket), GFP_KERNEL);
387 if (b == NULL) 423 if (b == NULL)
388 goto errout; 424 goto errout;
389 425
390 tcf_tree_lock(tp); 426 rcu_assign_pointer(head->table[h1], b);
391 head->table[h1] = b;
392 tcf_tree_unlock(tp);
393 } else { 427 } else {
394 unsigned int h2 = from_hash(nhandle >> 16); 428 unsigned int h2 = from_hash(nhandle >> 16);
395 429
396 err = -EEXIST; 430 err = -EEXIST;
397 for (fp = b->ht[h2]; fp; fp = fp->next) 431 for (fp = rtnl_dereference(b->ht[h2]);
432 fp;
433 fp = rtnl_dereference(fp->next))
398 if (fp->handle == f->handle) 434 if (fp->handle == f->handle)
399 goto errout; 435 goto errout;
400 } 436 }
401 437
402 tcf_tree_lock(tp);
403 if (tb[TCA_ROUTE4_TO]) 438 if (tb[TCA_ROUTE4_TO])
404 f->id = to; 439 f->id = to;
405 440
@@ -410,7 +445,7 @@ static int route4_set_parms(struct net *net, struct tcf_proto *tp,
410 445
411 f->handle = nhandle; 446 f->handle = nhandle;
412 f->bkt = b; 447 f->bkt = b;
413 tcf_tree_unlock(tp); 448 f->tp = tp;
414 449
415 if (tb[TCA_ROUTE4_CLASSID]) { 450 if (tb[TCA_ROUTE4_CLASSID]) {
416 f->res.classid = nla_get_u32(tb[TCA_ROUTE4_CLASSID]); 451 f->res.classid = nla_get_u32(tb[TCA_ROUTE4_CLASSID]);
@@ -421,7 +456,7 @@ static int route4_set_parms(struct net *net, struct tcf_proto *tp,
421 456
422 return 0; 457 return 0;
423errout: 458errout:
424 tcf_exts_destroy(tp, &e); 459 tcf_exts_destroy(&e);
425 return err; 460 return err;
426} 461}
427 462
@@ -431,14 +466,15 @@ static int route4_change(struct net *net, struct sk_buff *in_skb,
431 struct nlattr **tca, 466 struct nlattr **tca,
432 unsigned long *arg, bool ovr) 467 unsigned long *arg, bool ovr)
433{ 468{
434 struct route4_head *head = tp->root; 469 struct route4_head *head = rtnl_dereference(tp->root);
435 struct route4_filter *f, *f1, **fp; 470 struct route4_filter __rcu **fp;
471 struct route4_filter *fold, *f1, *pfp, *f = NULL;
436 struct route4_bucket *b; 472 struct route4_bucket *b;
437 struct nlattr *opt = tca[TCA_OPTIONS]; 473 struct nlattr *opt = tca[TCA_OPTIONS];
438 struct nlattr *tb[TCA_ROUTE4_MAX + 1]; 474 struct nlattr *tb[TCA_ROUTE4_MAX + 1];
439 unsigned int h, th; 475 unsigned int h, th;
440 u32 old_handle = 0;
441 int err; 476 int err;
477 bool new = true;
442 478
443 if (opt == NULL) 479 if (opt == NULL)
444 return handle ? -EINVAL : 0; 480 return handle ? -EINVAL : 0;
@@ -447,70 +483,73 @@ static int route4_change(struct net *net, struct sk_buff *in_skb,
447 if (err < 0) 483 if (err < 0)
448 return err; 484 return err;
449 485
450 f = (struct route4_filter *)*arg; 486 fold = (struct route4_filter *)*arg;
451 if (f) { 487 if (fold && handle && fold->handle != handle)
452 if (f->handle != handle && handle)
453 return -EINVAL; 488 return -EINVAL;
454 489
455 if (f->bkt)
456 old_handle = f->handle;
457
458 err = route4_set_parms(net, tp, base, f, handle, head, tb,
459 tca[TCA_RATE], 0, ovr);
460 if (err < 0)
461 return err;
462
463 goto reinsert;
464 }
465
466 err = -ENOBUFS; 490 err = -ENOBUFS;
467 if (head == NULL) { 491 if (head == NULL) {
468 head = kzalloc(sizeof(struct route4_head), GFP_KERNEL); 492 head = kzalloc(sizeof(struct route4_head), GFP_KERNEL);
469 if (head == NULL) 493 if (head == NULL)
470 goto errout; 494 goto errout;
471 495 rcu_assign_pointer(tp->root, head);
472 tcf_tree_lock(tp);
473 tp->root = head;
474 tcf_tree_unlock(tp);
475 } 496 }
476 497
477 f = kzalloc(sizeof(struct route4_filter), GFP_KERNEL); 498 f = kzalloc(sizeof(struct route4_filter), GFP_KERNEL);
478 if (f == NULL) 499 if (!f)
479 goto errout; 500 goto errout;
480 501
481 tcf_exts_init(&f->exts, TCA_ROUTE4_ACT, TCA_ROUTE4_POLICE); 502 tcf_exts_init(&f->exts, TCA_ROUTE4_ACT, TCA_ROUTE4_POLICE);
503 if (fold) {
504 f->id = fold->id;
505 f->iif = fold->iif;
506 f->res = fold->res;
507 f->handle = fold->handle;
508
509 f->tp = fold->tp;
510 f->bkt = fold->bkt;
511 new = false;
512 }
513
482 err = route4_set_parms(net, tp, base, f, handle, head, tb, 514 err = route4_set_parms(net, tp, base, f, handle, head, tb,
483 tca[TCA_RATE], 1, ovr); 515 tca[TCA_RATE], new, ovr);
484 if (err < 0) 516 if (err < 0)
485 goto errout; 517 goto errout;
486 518
487reinsert:
488 h = from_hash(f->handle >> 16); 519 h = from_hash(f->handle >> 16);
489 for (fp = &f->bkt->ht[h]; (f1 = *fp) != NULL; fp = &f1->next) 520 fp = &f->bkt->ht[h];
521 for (pfp = rtnl_dereference(*fp);
522 (f1 = rtnl_dereference(*fp)) != NULL;
523 fp = &f1->next)
490 if (f->handle < f1->handle) 524 if (f->handle < f1->handle)
491 break; 525 break;
492 526
493 f->next = f1; 527 netif_keep_dst(qdisc_dev(tp->q));
494 tcf_tree_lock(tp); 528 rcu_assign_pointer(f->next, f1);
495 *fp = f; 529 rcu_assign_pointer(*fp, f);
496 530
497 if (old_handle && f->handle != old_handle) { 531 if (fold && fold->handle && f->handle != fold->handle) {
498 th = to_hash(old_handle); 532 th = to_hash(fold->handle);
499 h = from_hash(old_handle >> 16); 533 h = from_hash(fold->handle >> 16);
500 b = head->table[th]; 534 b = rtnl_dereference(head->table[th]);
501 if (b) { 535 if (b) {
502 for (fp = &b->ht[h]; *fp; fp = &(*fp)->next) { 536 fp = &b->ht[h];
503 if (*fp == f) { 537 for (pfp = rtnl_dereference(*fp); pfp;
538 fp = &pfp->next, pfp = rtnl_dereference(*fp)) {
539 if (pfp == f) {
504 *fp = f->next; 540 *fp = f->next;
505 break; 541 break;
506 } 542 }
507 } 543 }
508 } 544 }
509 } 545 }
510 tcf_tree_unlock(tp);
511 546
512 route4_reset_fastmap(tp->q, head, f->id); 547 route4_reset_fastmap(head);
513 *arg = (unsigned long)f; 548 *arg = (unsigned long)f;
549 if (fold) {
550 tcf_unbind_filter(tp, &fold->res);
551 call_rcu(&fold->rcu, route4_delete_filter);
552 }
514 return 0; 553 return 0;
515 554
516errout: 555errout:
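
The insertion side of route4_change() uses the matching idiom: walk the chain with an __rcu double pointer to find the slot, set up the new node's forward pointer while it is still private, and only then publish it with rcu_assign_pointer(). A small sketch under invented names:

#include <linux/types.h>
#include <linux/rcupdate.h>
#include <linux/rtnetlink.h>

struct demo_filter {
        struct demo_filter __rcu *next;
        u32 handle;
};

/* Insert @f in handle order; the caller holds RTNL. */
static void demo_filter_insert(struct demo_filter __rcu **head, struct demo_filter *f)
{
        struct demo_filter __rcu **fp = head;
        struct demo_filter *it;

        ASSERT_RTNL();
        for (it = rtnl_dereference(*fp); it;
             fp = &it->next, it = rtnl_dereference(*fp))
                if (f->handle < it->handle)
                        break;

        /* f is not reachable yet, so a plain initialisation is enough here... */
        RCU_INIT_POINTER(f->next, it);
        /* ...and rcu_assign_pointer() publishes it with the required barrier. */
        rcu_assign_pointer(*fp, f);
}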
@@ -520,7 +559,7 @@ errout:
520 559
521static void route4_walk(struct tcf_proto *tp, struct tcf_walker *arg) 560static void route4_walk(struct tcf_proto *tp, struct tcf_walker *arg)
522{ 561{
523 struct route4_head *head = tp->root; 562 struct route4_head *head = rtnl_dereference(tp->root);
524 unsigned int h, h1; 563 unsigned int h, h1;
525 564
526 if (head == NULL) 565 if (head == NULL)
@@ -530,13 +569,15 @@ static void route4_walk(struct tcf_proto *tp, struct tcf_walker *arg)
530 return; 569 return;
531 570
532 for (h = 0; h <= 256; h++) { 571 for (h = 0; h <= 256; h++) {
533 struct route4_bucket *b = head->table[h]; 572 struct route4_bucket *b = rtnl_dereference(head->table[h]);
534 573
535 if (b) { 574 if (b) {
536 for (h1 = 0; h1 <= 32; h1++) { 575 for (h1 = 0; h1 <= 32; h1++) {
537 struct route4_filter *f; 576 struct route4_filter *f;
538 577
539 for (f = b->ht[h1]; f; f = f->next) { 578 for (f = rtnl_dereference(b->ht[h1]);
579 f;
580 f = rtnl_dereference(f->next)) {
540 if (arg->count < arg->skip) { 581 if (arg->count < arg->skip) {
541 arg->count++; 582 arg->count++;
542 continue; 583 continue;
diff --git a/net/sched/cls_rsvp.h b/net/sched/cls_rsvp.h
index 1020e233a5d6..6bb55f277a5a 100644
--- a/net/sched/cls_rsvp.h
+++ b/net/sched/cls_rsvp.h
@@ -70,31 +70,34 @@ struct rsvp_head {
70 u32 tmap[256/32]; 70 u32 tmap[256/32];
71 u32 hgenerator; 71 u32 hgenerator;
72 u8 tgenerator; 72 u8 tgenerator;
73 struct rsvp_session *ht[256]; 73 struct rsvp_session __rcu *ht[256];
74 struct rcu_head rcu;
74}; 75};
75 76
76struct rsvp_session { 77struct rsvp_session {
77 struct rsvp_session *next; 78 struct rsvp_session __rcu *next;
78 __be32 dst[RSVP_DST_LEN]; 79 __be32 dst[RSVP_DST_LEN];
79 struct tc_rsvp_gpi dpi; 80 struct tc_rsvp_gpi dpi;
80 u8 protocol; 81 u8 protocol;
81 u8 tunnelid; 82 u8 tunnelid;
82 /* 16 (src,sport) hash slots, and one wildcard source slot */ 83 /* 16 (src,sport) hash slots, and one wildcard source slot */
83 struct rsvp_filter *ht[16 + 1]; 84 struct rsvp_filter __rcu *ht[16 + 1];
85 struct rcu_head rcu;
84}; 86};
85 87
86 88
87struct rsvp_filter { 89struct rsvp_filter {
88 struct rsvp_filter *next; 90 struct rsvp_filter __rcu *next;
89 __be32 src[RSVP_DST_LEN]; 91 __be32 src[RSVP_DST_LEN];
90 struct tc_rsvp_gpi spi; 92 struct tc_rsvp_gpi spi;
91 u8 tunnelhdr; 93 u8 tunnelhdr;
92 94
93 struct tcf_result res; 95 struct tcf_result res;
94 struct tcf_exts exts; 96 struct tcf_exts exts;
95 97
96 u32 handle; 98 u32 handle;
97 struct rsvp_session *sess; 99 struct rsvp_session *sess;
100 struct rcu_head rcu;
98}; 101};
99 102
100static inline unsigned int hash_dst(__be32 *dst, u8 protocol, u8 tunnelid) 103static inline unsigned int hash_dst(__be32 *dst, u8 protocol, u8 tunnelid)
@@ -128,7 +131,7 @@ static inline unsigned int hash_src(__be32 *src)
128static int rsvp_classify(struct sk_buff *skb, const struct tcf_proto *tp, 131static int rsvp_classify(struct sk_buff *skb, const struct tcf_proto *tp,
129 struct tcf_result *res) 132 struct tcf_result *res)
130{ 133{
131 struct rsvp_session **sht = ((struct rsvp_head *)tp->root)->ht; 134 struct rsvp_head *head = rcu_dereference_bh(tp->root);
132 struct rsvp_session *s; 135 struct rsvp_session *s;
133 struct rsvp_filter *f; 136 struct rsvp_filter *f;
134 unsigned int h1, h2; 137 unsigned int h1, h2;
@@ -169,7 +172,8 @@ restart:
169 h1 = hash_dst(dst, protocol, tunnelid); 172 h1 = hash_dst(dst, protocol, tunnelid);
170 h2 = hash_src(src); 173 h2 = hash_src(src);
171 174
172 for (s = sht[h1]; s; s = s->next) { 175 for (s = rcu_dereference_bh(head->ht[h1]); s;
176 s = rcu_dereference_bh(s->next)) {
173 if (dst[RSVP_DST_LEN-1] == s->dst[RSVP_DST_LEN - 1] && 177 if (dst[RSVP_DST_LEN-1] == s->dst[RSVP_DST_LEN - 1] &&
174 protocol == s->protocol && 178 protocol == s->protocol &&
175 !(s->dpi.mask & 179 !(s->dpi.mask &
@@ -181,7 +185,8 @@ restart:
181#endif 185#endif
182 tunnelid == s->tunnelid) { 186 tunnelid == s->tunnelid) {
183 187
184 for (f = s->ht[h2]; f; f = f->next) { 188 for (f = rcu_dereference_bh(s->ht[h2]); f;
189 f = rcu_dereference_bh(f->next)) {
185 if (src[RSVP_DST_LEN-1] == f->src[RSVP_DST_LEN - 1] && 190 if (src[RSVP_DST_LEN-1] == f->src[RSVP_DST_LEN - 1] &&
186 !(f->spi.mask & (*(u32 *)(xprt + f->spi.offset) ^ f->spi.key)) 191 !(f->spi.mask & (*(u32 *)(xprt + f->spi.offset) ^ f->spi.key))
187#if RSVP_DST_LEN == 4 192#if RSVP_DST_LEN == 4
@@ -205,7 +210,8 @@ matched:
205 } 210 }
206 211
207 /* And wildcard bucket... */ 212 /* And wildcard bucket... */
208 for (f = s->ht[16]; f; f = f->next) { 213 for (f = rcu_dereference_bh(s->ht[16]); f;
214 f = rcu_dereference_bh(f->next)) {
209 *res = f->res; 215 *res = f->res;
210 RSVP_APPLY_RESULT(); 216 RSVP_APPLY_RESULT();
211 goto matched; 217 goto matched;
@@ -216,9 +222,36 @@ matched:
216 return -1; 222 return -1;
217} 223}
218 224
225static void rsvp_replace(struct tcf_proto *tp, struct rsvp_filter *n, u32 h)
226{
227 struct rsvp_head *head = rtnl_dereference(tp->root);
228 struct rsvp_session *s;
229 struct rsvp_filter __rcu **ins;
230 struct rsvp_filter *pins;
231 unsigned int h1 = h & 0xFF;
232 unsigned int h2 = (h >> 8) & 0xFF;
233
234 for (s = rtnl_dereference(head->ht[h1]); s;
235 s = rtnl_dereference(s->next)) {
236 for (ins = &s->ht[h2], pins = rtnl_dereference(*ins); ;
237 ins = &pins->next, pins = rtnl_dereference(*ins)) {
238 if (pins->handle == h) {
239 RCU_INIT_POINTER(n->next, pins->next);
240 rcu_assign_pointer(*ins, n);
241 return;
242 }
243 }
244 }
245
 246 /* Something went wrong if we are trying to replace a non-existent
 247 * node. Might as well halt instead of silently failing.
248 */
249 BUG_ON(1);
250}
251
219static unsigned long rsvp_get(struct tcf_proto *tp, u32 handle) 252static unsigned long rsvp_get(struct tcf_proto *tp, u32 handle)
220{ 253{
221 struct rsvp_session **sht = ((struct rsvp_head *)tp->root)->ht; 254 struct rsvp_head *head = rtnl_dereference(tp->root);
222 struct rsvp_session *s; 255 struct rsvp_session *s;
223 struct rsvp_filter *f; 256 struct rsvp_filter *f;
224 unsigned int h1 = handle & 0xFF; 257 unsigned int h1 = handle & 0xFF;
@@ -227,8 +260,10 @@ static unsigned long rsvp_get(struct tcf_proto *tp, u32 handle)
227 if (h2 > 16) 260 if (h2 > 16)
228 return 0; 261 return 0;
229 262
230 for (s = sht[h1]; s; s = s->next) { 263 for (s = rtnl_dereference(head->ht[h1]); s;
231 for (f = s->ht[h2]; f; f = f->next) { 264 s = rtnl_dereference(s->next)) {
265 for (f = rtnl_dereference(s->ht[h2]); f;
266 f = rtnl_dereference(f->next)) {
232 if (f->handle == handle) 267 if (f->handle == handle)
233 return (unsigned long)f; 268 return (unsigned long)f;
234 } 269 }
@@ -246,7 +281,7 @@ static int rsvp_init(struct tcf_proto *tp)
246 281
247 data = kzalloc(sizeof(struct rsvp_head), GFP_KERNEL); 282 data = kzalloc(sizeof(struct rsvp_head), GFP_KERNEL);
248 if (data) { 283 if (data) {
249 tp->root = data; 284 rcu_assign_pointer(tp->root, data);
250 return 0; 285 return 0;
251 } 286 }
252 return -ENOBUFS; 287 return -ENOBUFS;
@@ -256,54 +291,55 @@ static void
256rsvp_delete_filter(struct tcf_proto *tp, struct rsvp_filter *f) 291rsvp_delete_filter(struct tcf_proto *tp, struct rsvp_filter *f)
257{ 292{
258 tcf_unbind_filter(tp, &f->res); 293 tcf_unbind_filter(tp, &f->res);
259 tcf_exts_destroy(tp, &f->exts); 294 tcf_exts_destroy(&f->exts);
260 kfree(f); 295 kfree_rcu(f, rcu);
261} 296}
262 297
263static void rsvp_destroy(struct tcf_proto *tp) 298static void rsvp_destroy(struct tcf_proto *tp)
264{ 299{
265 struct rsvp_head *data = xchg(&tp->root, NULL); 300 struct rsvp_head *data = rtnl_dereference(tp->root);
266 struct rsvp_session **sht;
267 int h1, h2; 301 int h1, h2;
268 302
269 if (data == NULL) 303 if (data == NULL)
270 return; 304 return;
271 305
272 sht = data->ht; 306 RCU_INIT_POINTER(tp->root, NULL);
273 307
274 for (h1 = 0; h1 < 256; h1++) { 308 for (h1 = 0; h1 < 256; h1++) {
275 struct rsvp_session *s; 309 struct rsvp_session *s;
276 310
277 while ((s = sht[h1]) != NULL) { 311 while ((s = rtnl_dereference(data->ht[h1])) != NULL) {
278 sht[h1] = s->next; 312 RCU_INIT_POINTER(data->ht[h1], s->next);
279 313
280 for (h2 = 0; h2 <= 16; h2++) { 314 for (h2 = 0; h2 <= 16; h2++) {
281 struct rsvp_filter *f; 315 struct rsvp_filter *f;
282 316
283 while ((f = s->ht[h2]) != NULL) { 317 while ((f = rtnl_dereference(s->ht[h2])) != NULL) {
284 s->ht[h2] = f->next; 318 rcu_assign_pointer(s->ht[h2], f->next);
285 rsvp_delete_filter(tp, f); 319 rsvp_delete_filter(tp, f);
286 } 320 }
287 } 321 }
288 kfree(s); 322 kfree_rcu(s, rcu);
289 } 323 }
290 } 324 }
291 kfree(data); 325 kfree_rcu(data, rcu);
292} 326}
293 327
294static int rsvp_delete(struct tcf_proto *tp, unsigned long arg) 328static int rsvp_delete(struct tcf_proto *tp, unsigned long arg)
295{ 329{
296 struct rsvp_filter **fp, *f = (struct rsvp_filter *)arg; 330 struct rsvp_head *head = rtnl_dereference(tp->root);
331 struct rsvp_filter *nfp, *f = (struct rsvp_filter *)arg;
332 struct rsvp_filter __rcu **fp;
297 unsigned int h = f->handle; 333 unsigned int h = f->handle;
298 struct rsvp_session **sp; 334 struct rsvp_session __rcu **sp;
299 struct rsvp_session *s = f->sess; 335 struct rsvp_session *nsp, *s = f->sess;
300 int i; 336 int i;
301 337
302 for (fp = &s->ht[(h >> 8) & 0xFF]; *fp; fp = &(*fp)->next) { 338 fp = &s->ht[(h >> 8) & 0xFF];
303 if (*fp == f) { 339 for (nfp = rtnl_dereference(*fp); nfp;
304 tcf_tree_lock(tp); 340 fp = &nfp->next, nfp = rtnl_dereference(*fp)) {
305 *fp = f->next; 341 if (nfp == f) {
306 tcf_tree_unlock(tp); 342 RCU_INIT_POINTER(*fp, f->next);
307 rsvp_delete_filter(tp, f); 343 rsvp_delete_filter(tp, f);
308 344
309 /* Strip tree */ 345 /* Strip tree */
@@ -313,14 +349,12 @@ static int rsvp_delete(struct tcf_proto *tp, unsigned long arg)
313 return 0; 349 return 0;
314 350
315 /* OK, session has no flows */ 351 /* OK, session has no flows */
316 for (sp = &((struct rsvp_head *)tp->root)->ht[h & 0xFF]; 352 sp = &head->ht[h & 0xFF];
317 *sp; sp = &(*sp)->next) { 353 for (nsp = rtnl_dereference(*sp); nsp;
318 if (*sp == s) { 354 sp = &nsp->next, nsp = rtnl_dereference(*sp)) {
319 tcf_tree_lock(tp); 355 if (nsp == s) {
320 *sp = s->next; 356 RCU_INIT_POINTER(*sp, s->next);
321 tcf_tree_unlock(tp); 357 kfree_rcu(s, rcu);
322
323 kfree(s);
324 return 0; 358 return 0;
325 } 359 }
326 } 360 }
@@ -333,7 +367,7 @@ static int rsvp_delete(struct tcf_proto *tp, unsigned long arg)
333 367
334static unsigned int gen_handle(struct tcf_proto *tp, unsigned salt) 368static unsigned int gen_handle(struct tcf_proto *tp, unsigned salt)
335{ 369{
336 struct rsvp_head *data = tp->root; 370 struct rsvp_head *data = rtnl_dereference(tp->root);
337 int i = 0xFFFF; 371 int i = 0xFFFF;
338 372
339 while (i-- > 0) { 373 while (i-- > 0) {
@@ -361,7 +395,7 @@ static int tunnel_bts(struct rsvp_head *data)
361 395
362static void tunnel_recycle(struct rsvp_head *data) 396static void tunnel_recycle(struct rsvp_head *data)
363{ 397{
364 struct rsvp_session **sht = data->ht; 398 struct rsvp_session __rcu **sht = data->ht;
365 u32 tmap[256/32]; 399 u32 tmap[256/32];
366 int h1, h2; 400 int h1, h2;
367 401
@@ -369,11 +403,13 @@ static void tunnel_recycle(struct rsvp_head *data)
369 403
370 for (h1 = 0; h1 < 256; h1++) { 404 for (h1 = 0; h1 < 256; h1++) {
371 struct rsvp_session *s; 405 struct rsvp_session *s;
372 for (s = sht[h1]; s; s = s->next) { 406 for (s = rtnl_dereference(sht[h1]); s;
407 s = rtnl_dereference(s->next)) {
373 for (h2 = 0; h2 <= 16; h2++) { 408 for (h2 = 0; h2 <= 16; h2++) {
374 struct rsvp_filter *f; 409 struct rsvp_filter *f;
375 410
376 for (f = s->ht[h2]; f; f = f->next) { 411 for (f = rtnl_dereference(s->ht[h2]); f;
412 f = rtnl_dereference(f->next)) {
377 if (f->tunnelhdr == 0) 413 if (f->tunnelhdr == 0)
378 continue; 414 continue;
379 data->tgenerator = f->res.classid; 415 data->tgenerator = f->res.classid;
@@ -417,9 +453,11 @@ static int rsvp_change(struct net *net, struct sk_buff *in_skb,
417 struct nlattr **tca, 453 struct nlattr **tca,
418 unsigned long *arg, bool ovr) 454 unsigned long *arg, bool ovr)
419{ 455{
420 struct rsvp_head *data = tp->root; 456 struct rsvp_head *data = rtnl_dereference(tp->root);
421 struct rsvp_filter *f, **fp; 457 struct rsvp_filter *f, *nfp;
422 struct rsvp_session *s, **sp; 458 struct rsvp_filter __rcu **fp;
459 struct rsvp_session *nsp, *s;
460 struct rsvp_session __rcu **sp;
423 struct tc_rsvp_pinfo *pinfo = NULL; 461 struct tc_rsvp_pinfo *pinfo = NULL;
424 struct nlattr *opt = tca[TCA_OPTIONS]; 462 struct nlattr *opt = tca[TCA_OPTIONS];
425 struct nlattr *tb[TCA_RSVP_MAX + 1]; 463 struct nlattr *tb[TCA_RSVP_MAX + 1];
@@ -443,15 +481,26 @@ static int rsvp_change(struct net *net, struct sk_buff *in_skb,
443 f = (struct rsvp_filter *)*arg; 481 f = (struct rsvp_filter *)*arg;
444 if (f) { 482 if (f) {
445 /* Node exists: adjust only classid */ 483 /* Node exists: adjust only classid */
484 struct rsvp_filter *n;
446 485
447 if (f->handle != handle && handle) 486 if (f->handle != handle && handle)
448 goto errout2; 487 goto errout2;
488
489 n = kmemdup(f, sizeof(*f), GFP_KERNEL);
490 if (!n) {
491 err = -ENOMEM;
492 goto errout2;
493 }
494
495 tcf_exts_init(&n->exts, TCA_RSVP_ACT, TCA_RSVP_POLICE);
496
449 if (tb[TCA_RSVP_CLASSID]) { 497 if (tb[TCA_RSVP_CLASSID]) {
450 f->res.classid = nla_get_u32(tb[TCA_RSVP_CLASSID]); 498 n->res.classid = nla_get_u32(tb[TCA_RSVP_CLASSID]);
451 tcf_bind_filter(tp, &f->res, base); 499 tcf_bind_filter(tp, &n->res, base);
452 } 500 }
453 501
454 tcf_exts_change(tp, &f->exts, &e); 502 tcf_exts_change(tp, &n->exts, &e);
503 rsvp_replace(tp, n, handle);
455 return 0; 504 return 0;
456 } 505 }
457 506
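
Note the shape of the update above: rather than editing a filter that classifiers may be traversing, the new code kmemdup()s it, applies the classid change to the private copy, and lets rsvp_replace() swap the copy into the hash chain. The sketch below shows that clone-and-swap step with invented names; the deferred free of the old node is a simplification added here, not something taken from the hunk.

#include <linux/types.h>
#include <linux/kernel.h>
#include <linux/errno.h>
#include <linux/rcupdate.h>
#include <linux/rtnetlink.h>
#include <linux/slab.h>
#include <linux/string.h>

struct demo_filter {
        struct demo_filter __rcu *next;
        u32 classid;
        struct rcu_head rcu;
};

static void demo_filter_free_rcu(struct rcu_head *head)
{
        kfree(container_of(head, struct demo_filter, rcu));
}

/* Replace @old (currently linked at @slot) with an updated copy; caller holds RTNL. */
static int demo_filter_set_classid(struct demo_filter __rcu **slot,
                                   struct demo_filter *old, u32 classid)
{
        struct demo_filter *new = kmemdup(old, sizeof(*old), GFP_KERNEL);

        if (!new)
                return -ENOMEM;
        new->classid = classid;
        RCU_INIT_POINTER(new->next, rtnl_dereference(old->next));
        rcu_assign_pointer(*slot, new);         /* readers switch to the copy atomically */
        call_rcu(&old->rcu, demo_filter_free_rcu);
        return 0;
}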
@@ -499,7 +548,9 @@ static int rsvp_change(struct net *net, struct sk_buff *in_skb,
499 goto errout; 548 goto errout;
500 } 549 }
501 550
502 for (sp = &data->ht[h1]; (s = *sp) != NULL; sp = &s->next) { 551 for (sp = &data->ht[h1];
552 (s = rtnl_dereference(*sp)) != NULL;
553 sp = &s->next) {
503 if (dst[RSVP_DST_LEN-1] == s->dst[RSVP_DST_LEN-1] && 554 if (dst[RSVP_DST_LEN-1] == s->dst[RSVP_DST_LEN-1] &&
504 pinfo && pinfo->protocol == s->protocol && 555 pinfo && pinfo->protocol == s->protocol &&
505 memcmp(&pinfo->dpi, &s->dpi, sizeof(s->dpi)) == 0 && 556 memcmp(&pinfo->dpi, &s->dpi, sizeof(s->dpi)) == 0 &&
@@ -521,12 +572,16 @@ insert:
521 572
522 tcf_exts_change(tp, &f->exts, &e); 573 tcf_exts_change(tp, &f->exts, &e);
523 574
524 for (fp = &s->ht[h2]; *fp; fp = &(*fp)->next) 575 fp = &s->ht[h2];
525 if (((*fp)->spi.mask & f->spi.mask) != f->spi.mask) 576 for (nfp = rtnl_dereference(*fp); nfp;
577 fp = &nfp->next, nfp = rtnl_dereference(*fp)) {
578 __u32 mask = nfp->spi.mask & f->spi.mask;
579
580 if (mask != f->spi.mask)
526 break; 581 break;
527 f->next = *fp; 582 }
528 wmb(); 583 RCU_INIT_POINTER(f->next, nfp);
529 *fp = f; 584 rcu_assign_pointer(*fp, f);
530 585
531 *arg = (unsigned long)f; 586 *arg = (unsigned long)f;
532 return 0; 587 return 0;
@@ -546,26 +601,27 @@ insert:
546 s->protocol = pinfo->protocol; 601 s->protocol = pinfo->protocol;
547 s->tunnelid = pinfo->tunnelid; 602 s->tunnelid = pinfo->tunnelid;
548 } 603 }
549 for (sp = &data->ht[h1]; *sp; sp = &(*sp)->next) { 604 sp = &data->ht[h1];
550 if (((*sp)->dpi.mask&s->dpi.mask) != s->dpi.mask) 605 for (nsp = rtnl_dereference(*sp); nsp;
606 sp = &nsp->next, nsp = rtnl_dereference(*sp)) {
607 if ((nsp->dpi.mask & s->dpi.mask) != s->dpi.mask)
551 break; 608 break;
552 } 609 }
553 s->next = *sp; 610 RCU_INIT_POINTER(s->next, nsp);
554 wmb(); 611 rcu_assign_pointer(*sp, s);
555 *sp = s;
556 612
557 goto insert; 613 goto insert;
558 614
559errout: 615errout:
560 kfree(f); 616 kfree(f);
561errout2: 617errout2:
562 tcf_exts_destroy(tp, &e); 618 tcf_exts_destroy(&e);
563 return err; 619 return err;
564} 620}
565 621
566static void rsvp_walk(struct tcf_proto *tp, struct tcf_walker *arg) 622static void rsvp_walk(struct tcf_proto *tp, struct tcf_walker *arg)
567{ 623{
568 struct rsvp_head *head = tp->root; 624 struct rsvp_head *head = rtnl_dereference(tp->root);
569 unsigned int h, h1; 625 unsigned int h, h1;
570 626
571 if (arg->stop) 627 if (arg->stop)
@@ -574,11 +630,13 @@ static void rsvp_walk(struct tcf_proto *tp, struct tcf_walker *arg)
574 for (h = 0; h < 256; h++) { 630 for (h = 0; h < 256; h++) {
575 struct rsvp_session *s; 631 struct rsvp_session *s;
576 632
577 for (s = head->ht[h]; s; s = s->next) { 633 for (s = rtnl_dereference(head->ht[h]); s;
634 s = rtnl_dereference(s->next)) {
578 for (h1 = 0; h1 <= 16; h1++) { 635 for (h1 = 0; h1 <= 16; h1++) {
579 struct rsvp_filter *f; 636 struct rsvp_filter *f;
580 637
581 for (f = s->ht[h1]; f; f = f->next) { 638 for (f = rtnl_dereference(s->ht[h1]); f;
639 f = rtnl_dereference(f->next)) {
582 if (arg->count < arg->skip) { 640 if (arg->count < arg->skip) {
583 arg->count++; 641 arg->count++;
584 continue; 642 continue;
diff --git a/net/sched/cls_tcindex.c b/net/sched/cls_tcindex.c
index 3e9f76413b3b..30f10fb07f4a 100644
--- a/net/sched/cls_tcindex.c
+++ b/net/sched/cls_tcindex.c
@@ -32,19 +32,21 @@ struct tcindex_filter_result {
32struct tcindex_filter { 32struct tcindex_filter {
33 u16 key; 33 u16 key;
34 struct tcindex_filter_result result; 34 struct tcindex_filter_result result;
35 struct tcindex_filter *next; 35 struct tcindex_filter __rcu *next;
36 struct rcu_head rcu;
36}; 37};
37 38
38 39
39struct tcindex_data { 40struct tcindex_data {
40 struct tcindex_filter_result *perfect; /* perfect hash; NULL if none */ 41 struct tcindex_filter_result *perfect; /* perfect hash; NULL if none */
41 struct tcindex_filter **h; /* imperfect hash; only used if !perfect; 42 struct tcindex_filter __rcu **h; /* imperfect hash; */
42 NULL if unused */ 43 struct tcf_proto *tp;
43 u16 mask; /* AND key with mask */ 44 u16 mask; /* AND key with mask */
44 int shift; /* shift ANDed key to the right */ 45 u32 shift; /* shift ANDed key to the right */
45 int hash; /* hash table size; 0 if undefined */ 46 u32 hash; /* hash table size; 0 if undefined */
46 int alloc_hash; /* allocated size */ 47 u32 alloc_hash; /* allocated size */
47 int fall_through; /* 0: only classify if explicit match */ 48 u32 fall_through; /* 0: only classify if explicit match */
49 struct rcu_head rcu;
48}; 50};
49 51
50static inline int 52static inline int
@@ -56,13 +58,18 @@ tcindex_filter_is_set(struct tcindex_filter_result *r)
56static struct tcindex_filter_result * 58static struct tcindex_filter_result *
57tcindex_lookup(struct tcindex_data *p, u16 key) 59tcindex_lookup(struct tcindex_data *p, u16 key)
58{ 60{
59 struct tcindex_filter *f; 61 if (p->perfect) {
62 struct tcindex_filter_result *f = p->perfect + key;
63
64 return tcindex_filter_is_set(f) ? f : NULL;
65 } else if (p->h) {
66 struct tcindex_filter __rcu **fp;
67 struct tcindex_filter *f;
60 68
61 if (p->perfect) 69 fp = &p->h[key % p->hash];
62 return tcindex_filter_is_set(p->perfect + key) ? 70 for (f = rcu_dereference_bh_rtnl(*fp);
63 p->perfect + key : NULL; 71 f;
64 else if (p->h) { 72 fp = &f->next, f = rcu_dereference_bh_rtnl(*fp))
65 for (f = p->h[key % p->hash]; f; f = f->next)
66 if (f->key == key) 73 if (f->key == key)
67 return &f->result; 74 return &f->result;
68 } 75 }
@@ -74,7 +81,7 @@ tcindex_lookup(struct tcindex_data *p, u16 key)
74static int tcindex_classify(struct sk_buff *skb, const struct tcf_proto *tp, 81static int tcindex_classify(struct sk_buff *skb, const struct tcf_proto *tp,
75 struct tcf_result *res) 82 struct tcf_result *res)
76{ 83{
77 struct tcindex_data *p = tp->root; 84 struct tcindex_data *p = rcu_dereference_bh(tp->root);
78 struct tcindex_filter_result *f; 85 struct tcindex_filter_result *f;
79 int key = (skb->tc_index & p->mask) >> p->shift; 86 int key = (skb->tc_index & p->mask) >> p->shift;
80 87
@@ -99,7 +106,7 @@ static int tcindex_classify(struct sk_buff *skb, const struct tcf_proto *tp,
99 106
100static unsigned long tcindex_get(struct tcf_proto *tp, u32 handle) 107static unsigned long tcindex_get(struct tcf_proto *tp, u32 handle)
101{ 108{
102 struct tcindex_data *p = tp->root; 109 struct tcindex_data *p = rtnl_dereference(tp->root);
103 struct tcindex_filter_result *r; 110 struct tcindex_filter_result *r;
104 111
105 pr_debug("tcindex_get(tp %p,handle 0x%08x)\n", tp, handle); 112 pr_debug("tcindex_get(tp %p,handle 0x%08x)\n", tp, handle);
@@ -129,49 +136,59 @@ static int tcindex_init(struct tcf_proto *tp)
129 p->hash = DEFAULT_HASH_SIZE; 136 p->hash = DEFAULT_HASH_SIZE;
130 p->fall_through = 1; 137 p->fall_through = 1;
131 138
132 tp->root = p; 139 rcu_assign_pointer(tp->root, p);
133 return 0; 140 return 0;
134} 141}
135 142
136
137static int 143static int
138__tcindex_delete(struct tcf_proto *tp, unsigned long arg, int lock) 144tcindex_delete(struct tcf_proto *tp, unsigned long arg)
139{ 145{
140 struct tcindex_data *p = tp->root; 146 struct tcindex_data *p = rtnl_dereference(tp->root);
141 struct tcindex_filter_result *r = (struct tcindex_filter_result *) arg; 147 struct tcindex_filter_result *r = (struct tcindex_filter_result *) arg;
148 struct tcindex_filter __rcu **walk;
142 struct tcindex_filter *f = NULL; 149 struct tcindex_filter *f = NULL;
143 150
144 pr_debug("tcindex_delete(tp %p,arg 0x%lx),p %p,f %p\n", tp, arg, p, f); 151 pr_debug("tcindex_delete(tp %p,arg 0x%lx),p %p\n", tp, arg, p);
145 if (p->perfect) { 152 if (p->perfect) {
146 if (!r->res.class) 153 if (!r->res.class)
147 return -ENOENT; 154 return -ENOENT;
148 } else { 155 } else {
149 int i; 156 int i;
150 struct tcindex_filter **walk = NULL;
151 157
152 for (i = 0; i < p->hash; i++) 158 for (i = 0; i < p->hash; i++) {
153 for (walk = p->h+i; *walk; walk = &(*walk)->next) 159 walk = p->h + i;
154 if (&(*walk)->result == r) 160 for (f = rtnl_dereference(*walk); f;
161 walk = &f->next, f = rtnl_dereference(*walk)) {
162 if (&f->result == r)
155 goto found; 163 goto found;
164 }
165 }
156 return -ENOENT; 166 return -ENOENT;
157 167
158found: 168found:
159 f = *walk; 169 rcu_assign_pointer(*walk, rtnl_dereference(f->next));
160 if (lock)
161 tcf_tree_lock(tp);
162 *walk = f->next;
163 if (lock)
164 tcf_tree_unlock(tp);
165 } 170 }
166 tcf_unbind_filter(tp, &r->res); 171 tcf_unbind_filter(tp, &r->res);
167 tcf_exts_destroy(tp, &r->exts); 172 tcf_exts_destroy(&r->exts);
168 kfree(f); 173 if (f)
174 kfree_rcu(f, rcu);
169 return 0; 175 return 0;
170} 176}
171 177
172static int tcindex_delete(struct tcf_proto *tp, unsigned long arg) 178static int tcindex_destroy_element(struct tcf_proto *tp,
179 unsigned long arg,
180 struct tcf_walker *walker)
173{ 181{
174 return __tcindex_delete(tp, arg, 1); 182 return tcindex_delete(tp, arg);
183}
184
185static void __tcindex_destroy(struct rcu_head *head)
186{
187 struct tcindex_data *p = container_of(head, struct tcindex_data, rcu);
188
189 kfree(p->perfect);
190 kfree(p->h);
191 kfree(p);
175} 192}
176 193
177static inline int 194static inline int
@@ -194,6 +211,14 @@ static void tcindex_filter_result_init(struct tcindex_filter_result *r)
194 tcf_exts_init(&r->exts, TCA_TCINDEX_ACT, TCA_TCINDEX_POLICE); 211 tcf_exts_init(&r->exts, TCA_TCINDEX_ACT, TCA_TCINDEX_POLICE);
195} 212}
196 213
214static void __tcindex_partial_destroy(struct rcu_head *head)
215{
216 struct tcindex_data *p = container_of(head, struct tcindex_data, rcu);
217
218 kfree(p->perfect);
219 kfree(p);
220}
221
197static int 222static int
198tcindex_set_parms(struct net *net, struct tcf_proto *tp, unsigned long base, 223tcindex_set_parms(struct net *net, struct tcf_proto *tp, unsigned long base,
199 u32 handle, struct tcindex_data *p, 224 u32 handle, struct tcindex_data *p,
@@ -203,7 +228,7 @@ tcindex_set_parms(struct net *net, struct tcf_proto *tp, unsigned long base,
203 int err, balloc = 0; 228 int err, balloc = 0;
204 struct tcindex_filter_result new_filter_result, *old_r = r; 229 struct tcindex_filter_result new_filter_result, *old_r = r;
205 struct tcindex_filter_result cr; 230 struct tcindex_filter_result cr;
206 struct tcindex_data cp; 231 struct tcindex_data *cp, *oldp;
207 struct tcindex_filter *f = NULL; /* make gcc behave */ 232 struct tcindex_filter *f = NULL; /* make gcc behave */
208 struct tcf_exts e; 233 struct tcf_exts e;
209 234
@@ -212,89 +237,130 @@ tcindex_set_parms(struct net *net, struct tcf_proto *tp, unsigned long base,
212 if (err < 0) 237 if (err < 0)
213 return err; 238 return err;
214 239
215 memcpy(&cp, p, sizeof(cp)); 240 err = -ENOMEM;
216 tcindex_filter_result_init(&new_filter_result); 241 /* tcindex_data attributes must look atomic to classifier/lookup so
242 * allocate new tcindex data and RCU assign it onto root. Keeping
243 * perfect hash and hash pointers from old data.
244 */
245 cp = kzalloc(sizeof(*cp), GFP_KERNEL);
246 if (!cp)
247 goto errout;
248
249 cp->mask = p->mask;
250 cp->shift = p->shift;
251 cp->hash = p->hash;
252 cp->alloc_hash = p->alloc_hash;
253 cp->fall_through = p->fall_through;
254 cp->tp = tp;
217 255
256 if (p->perfect) {
257 int i;
258
259 cp->perfect = kmemdup(p->perfect,
260 sizeof(*r) * cp->hash, GFP_KERNEL);
261 if (!cp->perfect)
262 goto errout;
263 for (i = 0; i < cp->hash; i++)
264 tcf_exts_init(&cp->perfect[i].exts,
265 TCA_TCINDEX_ACT, TCA_TCINDEX_POLICE);
266 balloc = 1;
267 }
268 cp->h = p->h;
269
270 tcindex_filter_result_init(&new_filter_result);
218 tcindex_filter_result_init(&cr); 271 tcindex_filter_result_init(&cr);
219 if (old_r) 272 if (old_r)
220 cr.res = r->res; 273 cr.res = r->res;
221 274
222 if (tb[TCA_TCINDEX_HASH]) 275 if (tb[TCA_TCINDEX_HASH])
223 cp.hash = nla_get_u32(tb[TCA_TCINDEX_HASH]); 276 cp->hash = nla_get_u32(tb[TCA_TCINDEX_HASH]);
224 277
225 if (tb[TCA_TCINDEX_MASK]) 278 if (tb[TCA_TCINDEX_MASK])
226 cp.mask = nla_get_u16(tb[TCA_TCINDEX_MASK]); 279 cp->mask = nla_get_u16(tb[TCA_TCINDEX_MASK]);
227 280
228 if (tb[TCA_TCINDEX_SHIFT]) 281 if (tb[TCA_TCINDEX_SHIFT])
229 cp.shift = nla_get_u32(tb[TCA_TCINDEX_SHIFT]); 282 cp->shift = nla_get_u32(tb[TCA_TCINDEX_SHIFT]);
230 283
231 err = -EBUSY; 284 err = -EBUSY;
285
232 /* Hash already allocated, make sure that we still meet the 286 /* Hash already allocated, make sure that we still meet the
233 * requirements for the allocated hash. 287 * requirements for the allocated hash.
234 */ 288 */
235 if (cp.perfect) { 289 if (cp->perfect) {
236 if (!valid_perfect_hash(&cp) || 290 if (!valid_perfect_hash(cp) ||
237 cp.hash > cp.alloc_hash) 291 cp->hash > cp->alloc_hash)
238 goto errout; 292 goto errout_alloc;
239 } else if (cp.h && cp.hash != cp.alloc_hash) 293 } else if (cp->h && cp->hash != cp->alloc_hash) {
240 goto errout; 294 goto errout_alloc;
295 }
241 296
242 err = -EINVAL; 297 err = -EINVAL;
243 if (tb[TCA_TCINDEX_FALL_THROUGH]) 298 if (tb[TCA_TCINDEX_FALL_THROUGH])
244 cp.fall_through = nla_get_u32(tb[TCA_TCINDEX_FALL_THROUGH]); 299 cp->fall_through = nla_get_u32(tb[TCA_TCINDEX_FALL_THROUGH]);
245 300
246 if (!cp.hash) { 301 if (!cp->hash) {
247 /* Hash not specified, use perfect hash if the upper limit 302 /* Hash not specified, use perfect hash if the upper limit
248 * of the hashing index is below the threshold. 303 * of the hashing index is below the threshold.
249 */ 304 */
250 if ((cp.mask >> cp.shift) < PERFECT_HASH_THRESHOLD) 305 if ((cp->mask >> cp->shift) < PERFECT_HASH_THRESHOLD)
251 cp.hash = (cp.mask >> cp.shift) + 1; 306 cp->hash = (cp->mask >> cp->shift) + 1;
252 else 307 else
253 cp.hash = DEFAULT_HASH_SIZE; 308 cp->hash = DEFAULT_HASH_SIZE;
254 } 309 }
255 310
256 if (!cp.perfect && !cp.h) 311 if (!cp->perfect && !cp->h)
257 cp.alloc_hash = cp.hash; 312 cp->alloc_hash = cp->hash;
258 313
259 /* Note: this could be as restrictive as if (handle & ~(mask >> shift)) 314 /* Note: this could be as restrictive as if (handle & ~(mask >> shift))
260 * but then, we'd fail handles that may become valid after some future 315 * but then, we'd fail handles that may become valid after some future
261 * mask change. While this is extremely unlikely to ever matter, 316 * mask change. While this is extremely unlikely to ever matter,
262 * the check below is safer (and also more backwards-compatible). 317 * the check below is safer (and also more backwards-compatible).
263 */ 318 */
264 if (cp.perfect || valid_perfect_hash(&cp)) 319 if (cp->perfect || valid_perfect_hash(cp))
265 if (handle >= cp.alloc_hash) 320 if (handle >= cp->alloc_hash)
266 goto errout; 321 goto errout_alloc;
267 322
268 323
269 err = -ENOMEM; 324 err = -ENOMEM;
270 if (!cp.perfect && !cp.h) { 325 if (!cp->perfect && !cp->h) {
271 if (valid_perfect_hash(&cp)) { 326 if (valid_perfect_hash(cp)) {
272 int i; 327 int i;
273 328
274 cp.perfect = kcalloc(cp.hash, sizeof(*r), GFP_KERNEL); 329 cp->perfect = kcalloc(cp->hash, sizeof(*r), GFP_KERNEL);
275 if (!cp.perfect) 330 if (!cp->perfect)
276 goto errout; 331 goto errout_alloc;
277 for (i = 0; i < cp.hash; i++) 332 for (i = 0; i < cp->hash; i++)
278 tcf_exts_init(&cp.perfect[i].exts, TCA_TCINDEX_ACT, 333 tcf_exts_init(&cp->perfect[i].exts,
334 TCA_TCINDEX_ACT,
279 TCA_TCINDEX_POLICE); 335 TCA_TCINDEX_POLICE);
280 balloc = 1; 336 balloc = 1;
281 } else { 337 } else {
282 cp.h = kcalloc(cp.hash, sizeof(f), GFP_KERNEL); 338 struct tcindex_filter __rcu **hash;
283 if (!cp.h) 339
284 goto errout; 340 hash = kcalloc(cp->hash,
341 sizeof(struct tcindex_filter *),
342 GFP_KERNEL);
343
344 if (!hash)
345 goto errout_alloc;
346
347 cp->h = hash;
285 balloc = 2; 348 balloc = 2;
286 } 349 }
287 } 350 }
288 351
289 if (cp.perfect) 352 if (cp->perfect)
290 r = cp.perfect + handle; 353 r = cp->perfect + handle;
291 else 354 else
292 r = tcindex_lookup(&cp, handle) ? : &new_filter_result; 355 r = tcindex_lookup(cp, handle) ? : &new_filter_result;
293 356
294 if (r == &new_filter_result) { 357 if (r == &new_filter_result) {
295 f = kzalloc(sizeof(*f), GFP_KERNEL); 358 f = kzalloc(sizeof(*f), GFP_KERNEL);
296 if (!f) 359 if (!f)
297 goto errout_alloc; 360 goto errout_alloc;
361 f->key = handle;
362 tcindex_filter_result_init(&f->result);
363 f->next = NULL;
298 } 364 }
299 365
300 if (tb[TCA_TCINDEX_CLASSID]) { 366 if (tb[TCA_TCINDEX_CLASSID]) {
@@ -307,34 +373,40 @@ tcindex_set_parms(struct net *net, struct tcf_proto *tp, unsigned long base,
307 else 373 else
308 tcf_exts_change(tp, &cr.exts, &e); 374 tcf_exts_change(tp, &cr.exts, &e);
309 375
310 tcf_tree_lock(tp);
311 if (old_r && old_r != r) 376 if (old_r && old_r != r)
312 tcindex_filter_result_init(old_r); 377 tcindex_filter_result_init(old_r);
313 378
314 memcpy(p, &cp, sizeof(cp)); 379 oldp = p;
315 r->res = cr.res; 380 r->res = cr.res;
381 rcu_assign_pointer(tp->root, cp);
316 382
317 if (r == &new_filter_result) { 383 if (r == &new_filter_result) {
318 struct tcindex_filter **fp; 384 struct tcindex_filter *nfp;
385 struct tcindex_filter __rcu **fp;
319 386
320 f->key = handle; 387 tcf_exts_change(tp, &f->result.exts, &r->exts);
321 f->result = new_filter_result; 388
322 f->next = NULL; 389 fp = cp->h + (handle % cp->hash);
323 for (fp = p->h+(handle % p->hash); *fp; fp = &(*fp)->next) 390 for (nfp = rtnl_dereference(*fp);
324 /* nothing */; 391 nfp;
325 *fp = f; 392 fp = &nfp->next, nfp = rtnl_dereference(*fp))
393 ; /* nothing */
394
395 rcu_assign_pointer(*fp, f);
326 } 396 }
327 tcf_tree_unlock(tp);
328 397
398 if (oldp)
399 call_rcu(&oldp->rcu, __tcindex_partial_destroy);
329 return 0; 400 return 0;
330 401
331errout_alloc: 402errout_alloc:
332 if (balloc == 1) 403 if (balloc == 1)
333 kfree(cp.perfect); 404 kfree(cp->perfect);
334 else if (balloc == 2) 405 else if (balloc == 2)
335 kfree(cp.h); 406 kfree(cp->h);
336errout: 407errout:
337 tcf_exts_destroy(tp, &e); 408 kfree(cp);
409 tcf_exts_destroy(&e);
338 return err; 410 return err;
339} 411}
340 412
@@ -345,7 +417,7 @@ tcindex_change(struct net *net, struct sk_buff *in_skb,
345{ 417{
346 struct nlattr *opt = tca[TCA_OPTIONS]; 418 struct nlattr *opt = tca[TCA_OPTIONS];
347 struct nlattr *tb[TCA_TCINDEX_MAX + 1]; 419 struct nlattr *tb[TCA_TCINDEX_MAX + 1];
348 struct tcindex_data *p = tp->root; 420 struct tcindex_data *p = rtnl_dereference(tp->root);
349 struct tcindex_filter_result *r = (struct tcindex_filter_result *) *arg; 421 struct tcindex_filter_result *r = (struct tcindex_filter_result *) *arg;
350 int err; 422 int err;
351 423
@@ -364,10 +436,9 @@ tcindex_change(struct net *net, struct sk_buff *in_skb,
364 tca[TCA_RATE], ovr); 436 tca[TCA_RATE], ovr);
365} 437}
366 438
367
368static void tcindex_walk(struct tcf_proto *tp, struct tcf_walker *walker) 439static void tcindex_walk(struct tcf_proto *tp, struct tcf_walker *walker)
369{ 440{
370 struct tcindex_data *p = tp->root; 441 struct tcindex_data *p = rtnl_dereference(tp->root);
371 struct tcindex_filter *f, *next; 442 struct tcindex_filter *f, *next;
372 int i; 443 int i;
373 444
@@ -390,8 +461,8 @@ static void tcindex_walk(struct tcf_proto *tp, struct tcf_walker *walker)
390 if (!p->h) 461 if (!p->h)
391 return; 462 return;
392 for (i = 0; i < p->hash; i++) { 463 for (i = 0; i < p->hash; i++) {
393 for (f = p->h[i]; f; f = next) { 464 for (f = rtnl_dereference(p->h[i]); f; f = next) {
394 next = f->next; 465 next = rtnl_dereference(f->next);
395 if (walker->count >= walker->skip) { 466 if (walker->count >= walker->skip) {
396 if (walker->fn(tp, (unsigned long) &f->result, 467 if (walker->fn(tp, (unsigned long) &f->result,
397 walker) < 0) { 468 walker) < 0) {
@@ -404,17 +475,9 @@ static void tcindex_walk(struct tcf_proto *tp, struct tcf_walker *walker)
404 } 475 }
405} 476}
406 477
407
408static int tcindex_destroy_element(struct tcf_proto *tp,
409 unsigned long arg, struct tcf_walker *walker)
410{
411 return __tcindex_delete(tp, arg, 0);
412}
413
414
415static void tcindex_destroy(struct tcf_proto *tp) 478static void tcindex_destroy(struct tcf_proto *tp)
416{ 479{
417 struct tcindex_data *p = tp->root; 480 struct tcindex_data *p = rtnl_dereference(tp->root);
418 struct tcf_walker walker; 481 struct tcf_walker walker;
419 482
420 pr_debug("tcindex_destroy(tp %p),p %p\n", tp, p); 483 pr_debug("tcindex_destroy(tp %p),p %p\n", tp, p);
@@ -422,17 +485,16 @@ static void tcindex_destroy(struct tcf_proto *tp)
422 walker.skip = 0; 485 walker.skip = 0;
423 walker.fn = tcindex_destroy_element; 486 walker.fn = tcindex_destroy_element;
424 tcindex_walk(tp, &walker); 487 tcindex_walk(tp, &walker);
425 kfree(p->perfect); 488
426 kfree(p->h); 489 RCU_INIT_POINTER(tp->root, NULL);
427 kfree(p); 490 call_rcu(&p->rcu, __tcindex_destroy);
428 tp->root = NULL;
429} 491}
430 492
431 493
432static int tcindex_dump(struct net *net, struct tcf_proto *tp, unsigned long fh, 494static int tcindex_dump(struct net *net, struct tcf_proto *tp, unsigned long fh,
433 struct sk_buff *skb, struct tcmsg *t) 495 struct sk_buff *skb, struct tcmsg *t)
434{ 496{
435 struct tcindex_data *p = tp->root; 497 struct tcindex_data *p = rtnl_dereference(tp->root);
436 struct tcindex_filter_result *r = (struct tcindex_filter_result *) fh; 498 struct tcindex_filter_result *r = (struct tcindex_filter_result *) fh;
437 unsigned char *b = skb_tail_pointer(skb); 499 unsigned char *b = skb_tail_pointer(skb);
438 struct nlattr *nest; 500 struct nlattr *nest;
@@ -455,15 +517,18 @@ static int tcindex_dump(struct net *net, struct tcf_proto *tp, unsigned long fh,
455 nla_nest_end(skb, nest); 517 nla_nest_end(skb, nest);
456 } else { 518 } else {
457 if (p->perfect) { 519 if (p->perfect) {
458 t->tcm_handle = r-p->perfect; 520 t->tcm_handle = r - p->perfect;
459 } else { 521 } else {
460 struct tcindex_filter *f; 522 struct tcindex_filter *f;
523 struct tcindex_filter __rcu **fp;
461 int i; 524 int i;
462 525
463 t->tcm_handle = 0; 526 t->tcm_handle = 0;
464 for (i = 0; !t->tcm_handle && i < p->hash; i++) { 527 for (i = 0; !t->tcm_handle && i < p->hash; i++) {
465 for (f = p->h[i]; !t->tcm_handle && f; 528 fp = &p->h[i];
466 f = f->next) { 529 for (f = rtnl_dereference(*fp);
530 !t->tcm_handle && f;
531 fp = &f->next, f = rtnl_dereference(*fp)) {
467 if (&f->result == r) 532 if (&f->result == r)
468 t->tcm_handle = f->key; 533 t->tcm_handle = f->key;
469 } 534 }
diff --git a/net/sched/cls_u32.c b/net/sched/cls_u32.c
index 70c0be8d0121..0472909bb014 100644
--- a/net/sched/cls_u32.c
+++ b/net/sched/cls_u32.c
@@ -36,6 +36,7 @@
36#include <linux/kernel.h> 36#include <linux/kernel.h>
37#include <linux/string.h> 37#include <linux/string.h>
38#include <linux/errno.h> 38#include <linux/errno.h>
39#include <linux/percpu.h>
39#include <linux/rtnetlink.h> 40#include <linux/rtnetlink.h>
40#include <linux/skbuff.h> 41#include <linux/skbuff.h>
41#include <linux/bitmap.h> 42#include <linux/bitmap.h>
@@ -44,40 +45,49 @@
44#include <net/pkt_cls.h> 45#include <net/pkt_cls.h>
45 46
46struct tc_u_knode { 47struct tc_u_knode {
47 struct tc_u_knode *next; 48 struct tc_u_knode __rcu *next;
48 u32 handle; 49 u32 handle;
49 struct tc_u_hnode *ht_up; 50 struct tc_u_hnode __rcu *ht_up;
50 struct tcf_exts exts; 51 struct tcf_exts exts;
51#ifdef CONFIG_NET_CLS_IND 52#ifdef CONFIG_NET_CLS_IND
52 int ifindex; 53 int ifindex;
53#endif 54#endif
54 u8 fshift; 55 u8 fshift;
55 struct tcf_result res; 56 struct tcf_result res;
56 struct tc_u_hnode *ht_down; 57 struct tc_u_hnode __rcu *ht_down;
57#ifdef CONFIG_CLS_U32_PERF 58#ifdef CONFIG_CLS_U32_PERF
58 struct tc_u32_pcnt *pf; 59 struct tc_u32_pcnt __percpu *pf;
59#endif 60#endif
60#ifdef CONFIG_CLS_U32_MARK 61#ifdef CONFIG_CLS_U32_MARK
61 struct tc_u32_mark mark; 62 u32 val;
63 u32 mask;
64 u32 __percpu *pcpu_success;
62#endif 65#endif
66 struct tcf_proto *tp;
67 struct rcu_head rcu;
68 /* The 'sel' field MUST be the last field in structure to allow for
69 * tc_u32_keys allocated at end of structure.
70 */
63 struct tc_u32_sel sel; 71 struct tc_u32_sel sel;
64}; 72};
65 73
66struct tc_u_hnode { 74struct tc_u_hnode {
67 struct tc_u_hnode *next; 75 struct tc_u_hnode __rcu *next;
68 u32 handle; 76 u32 handle;
69 u32 prio; 77 u32 prio;
70 struct tc_u_common *tp_c; 78 struct tc_u_common *tp_c;
71 int refcnt; 79 int refcnt;
72 unsigned int divisor; 80 unsigned int divisor;
73 struct tc_u_knode *ht[1]; 81 struct tc_u_knode __rcu *ht[1];
82 struct rcu_head rcu;
74}; 83};
75 84
76struct tc_u_common { 85struct tc_u_common {
77 struct tc_u_hnode *hlist; 86 struct tc_u_hnode __rcu *hlist;
78 struct Qdisc *q; 87 struct Qdisc *q;
79 int refcnt; 88 int refcnt;
80 u32 hgenerator; 89 u32 hgenerator;
90 struct rcu_head rcu;
81}; 91};
82 92
83static inline unsigned int u32_hash_fold(__be32 key, 93static inline unsigned int u32_hash_fold(__be32 key,
@@ -96,7 +106,7 @@ static int u32_classify(struct sk_buff *skb, const struct tcf_proto *tp, struct
96 unsigned int off; 106 unsigned int off;
97 } stack[TC_U32_MAXDEPTH]; 107 } stack[TC_U32_MAXDEPTH];
98 108
99 struct tc_u_hnode *ht = tp->root; 109 struct tc_u_hnode *ht = rcu_dereference_bh(tp->root);
100 unsigned int off = skb_network_offset(skb); 110 unsigned int off = skb_network_offset(skb);
101 struct tc_u_knode *n; 111 struct tc_u_knode *n;
102 int sdepth = 0; 112 int sdepth = 0;
@@ -108,23 +118,23 @@ static int u32_classify(struct sk_buff *skb, const struct tcf_proto *tp, struct
108 int i, r; 118 int i, r;
109 119
110next_ht: 120next_ht:
111 n = ht->ht[sel]; 121 n = rcu_dereference_bh(ht->ht[sel]);
112 122
113next_knode: 123next_knode:
114 if (n) { 124 if (n) {
115 struct tc_u32_key *key = n->sel.keys; 125 struct tc_u32_key *key = n->sel.keys;
116 126
117#ifdef CONFIG_CLS_U32_PERF 127#ifdef CONFIG_CLS_U32_PERF
118 n->pf->rcnt += 1; 128 __this_cpu_inc(n->pf->rcnt);
119 j = 0; 129 j = 0;
120#endif 130#endif
121 131
122#ifdef CONFIG_CLS_U32_MARK 132#ifdef CONFIG_CLS_U32_MARK
123 if ((skb->mark & n->mark.mask) != n->mark.val) { 133 if ((skb->mark & n->mask) != n->val) {
124 n = n->next; 134 n = rcu_dereference_bh(n->next);
125 goto next_knode; 135 goto next_knode;
126 } else { 136 } else {
127 n->mark.success++; 137 __this_cpu_inc(*n->pcpu_success);
128 } 138 }
129#endif 139#endif
130 140
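
The statistics handling in this hunk is the other half of making the fast path lock-free: the old per-filter counters, updated in place from the packet path, become per-CPU variables bumped with __this_cpu_inc() and are only folded together when the filter is dumped. A self-contained sketch of that pattern, with invented names and assuming kernel headers:

#include <linux/types.h>
#include <linux/percpu.h>
#include <linux/cpumask.h>

struct demo_stats {
        u64 hits;
};

/* Hot path: lock-free, touches only this CPU's copy. Preemption is already
 * disabled here in practice because classification runs in softirq context.
 */
static void demo_stats_hit(struct demo_stats __percpu *stats)
{
        __this_cpu_inc(stats->hits);
}

/* Slow path (e.g. dumping to userspace): fold the per-CPU copies into a total. */
static u64 demo_stats_sum(struct demo_stats __percpu *stats)
{
        u64 total = 0;
        int cpu;

        for_each_possible_cpu(cpu)
                total += per_cpu_ptr(stats, cpu)->hits;
        return total;
}

/* Allocation and teardown; alloc_percpu() zeroes every CPU's copy. */
static struct demo_stats __percpu *demo_stats_alloc(void)
{
        return alloc_percpu(struct demo_stats);
}

static void demo_stats_free(struct demo_stats __percpu *stats)
{
        free_percpu(stats);
}

The totals read this way can be momentarily stale, which is acceptable for traffic counters; in exchange the hot path never bounces a shared cache line between CPUs.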
@@ -139,37 +149,39 @@ next_knode:
139 if (!data) 149 if (!data)
140 goto out; 150 goto out;
141 if ((*data ^ key->val) & key->mask) { 151 if ((*data ^ key->val) & key->mask) {
142 n = n->next; 152 n = rcu_dereference_bh(n->next);
143 goto next_knode; 153 goto next_knode;
144 } 154 }
145#ifdef CONFIG_CLS_U32_PERF 155#ifdef CONFIG_CLS_U32_PERF
146 n->pf->kcnts[j] += 1; 156 __this_cpu_inc(n->pf->kcnts[j]);
147 j++; 157 j++;
148#endif 158#endif
149 } 159 }
150 if (n->ht_down == NULL) { 160
161 ht = rcu_dereference_bh(n->ht_down);
162 if (!ht) {
151check_terminal: 163check_terminal:
152 if (n->sel.flags & TC_U32_TERMINAL) { 164 if (n->sel.flags & TC_U32_TERMINAL) {
153 165
154 *res = n->res; 166 *res = n->res;
155#ifdef CONFIG_NET_CLS_IND 167#ifdef CONFIG_NET_CLS_IND
156 if (!tcf_match_indev(skb, n->ifindex)) { 168 if (!tcf_match_indev(skb, n->ifindex)) {
157 n = n->next; 169 n = rcu_dereference_bh(n->next);
158 goto next_knode; 170 goto next_knode;
159 } 171 }
160#endif 172#endif
161#ifdef CONFIG_CLS_U32_PERF 173#ifdef CONFIG_CLS_U32_PERF
162 n->pf->rhit += 1; 174 __this_cpu_inc(n->pf->rhit);
163#endif 175#endif
164 r = tcf_exts_exec(skb, &n->exts, res); 176 r = tcf_exts_exec(skb, &n->exts, res);
165 if (r < 0) { 177 if (r < 0) {
166 n = n->next; 178 n = rcu_dereference_bh(n->next);
167 goto next_knode; 179 goto next_knode;
168 } 180 }
169 181
170 return r; 182 return r;
171 } 183 }
172 n = n->next; 184 n = rcu_dereference_bh(n->next);
173 goto next_knode; 185 goto next_knode;
174 } 186 }
175 187
@@ -180,7 +192,7 @@ check_terminal:
180 stack[sdepth].off = off; 192 stack[sdepth].off = off;
181 sdepth++; 193 sdepth++;
182 194
183 ht = n->ht_down; 195 ht = rcu_dereference_bh(n->ht_down);
184 sel = 0; 196 sel = 0;
185 if (ht->divisor) { 197 if (ht->divisor) {
186 __be32 *data, hdata; 198 __be32 *data, hdata;
@@ -222,7 +234,7 @@ check_terminal:
222 /* POP */ 234 /* POP */
223 if (sdepth--) { 235 if (sdepth--) {
224 n = stack[sdepth].knode; 236 n = stack[sdepth].knode;
225 ht = n->ht_up; 237 ht = rcu_dereference_bh(n->ht_up);
226 off = stack[sdepth].off; 238 off = stack[sdepth].off;
227 goto check_terminal; 239 goto check_terminal;
228 } 240 }
@@ -239,7 +251,9 @@ u32_lookup_ht(struct tc_u_common *tp_c, u32 handle)
239{ 251{
240 struct tc_u_hnode *ht; 252 struct tc_u_hnode *ht;
241 253
242 for (ht = tp_c->hlist; ht; ht = ht->next) 254 for (ht = rtnl_dereference(tp_c->hlist);
255 ht;
256 ht = rtnl_dereference(ht->next))
243 if (ht->handle == handle) 257 if (ht->handle == handle)
244 break; 258 break;
245 259
@@ -256,7 +270,9 @@ u32_lookup_key(struct tc_u_hnode *ht, u32 handle)
256 if (sel > ht->divisor) 270 if (sel > ht->divisor)
257 goto out; 271 goto out;
258 272
259 for (n = ht->ht[sel]; n; n = n->next) 273 for (n = rtnl_dereference(ht->ht[sel]);
274 n;
275 n = rtnl_dereference(n->next))
260 if (n->handle == handle) 276 if (n->handle == handle)
261 break; 277 break;
262out: 278out:
@@ -270,7 +286,7 @@ static unsigned long u32_get(struct tcf_proto *tp, u32 handle)
270 struct tc_u_common *tp_c = tp->data; 286 struct tc_u_common *tp_c = tp->data;
271 287
272 if (TC_U32_HTID(handle) == TC_U32_ROOT) 288 if (TC_U32_HTID(handle) == TC_U32_ROOT)
273 ht = tp->root; 289 ht = rtnl_dereference(tp->root);
274 else 290 else
275 ht = u32_lookup_ht(tp_c, TC_U32_HTID(handle)); 291 ht = u32_lookup_ht(tp_c, TC_U32_HTID(handle));
276 292
@@ -291,6 +307,9 @@ static u32 gen_new_htid(struct tc_u_common *tp_c)
291{ 307{
292 int i = 0x800; 308 int i = 0x800;
293 309
 310 /* hgenerator is only used under the rtnl lock, so it is safe to
 311 * increment it without read-copy-update semantics
312 */
294 do { 313 do {
295 if (++tp_c->hgenerator == 0x7FF) 314 if (++tp_c->hgenerator == 0x7FF)
296 tp_c->hgenerator = 1; 315 tp_c->hgenerator = 1;
@@ -326,41 +345,78 @@ static int u32_init(struct tcf_proto *tp)
326 } 345 }
327 346
328 tp_c->refcnt++; 347 tp_c->refcnt++;
329 root_ht->next = tp_c->hlist; 348 RCU_INIT_POINTER(root_ht->next, tp_c->hlist);
330 tp_c->hlist = root_ht; 349 rcu_assign_pointer(tp_c->hlist, root_ht);
331 root_ht->tp_c = tp_c; 350 root_ht->tp_c = tp_c;
332 351
333 tp->root = root_ht; 352 rcu_assign_pointer(tp->root, root_ht);
334 tp->data = tp_c; 353 tp->data = tp_c;
335 return 0; 354 return 0;
336} 355}
337 356
338static int u32_destroy_key(struct tcf_proto *tp, struct tc_u_knode *n) 357static int u32_destroy_key(struct tcf_proto *tp,
358 struct tc_u_knode *n,
359 bool free_pf)
339{ 360{
340 tcf_unbind_filter(tp, &n->res); 361 tcf_exts_destroy(&n->exts);
341 tcf_exts_destroy(tp, &n->exts);
342 if (n->ht_down) 362 if (n->ht_down)
343 n->ht_down->refcnt--; 363 n->ht_down->refcnt--;
344#ifdef CONFIG_CLS_U32_PERF 364#ifdef CONFIG_CLS_U32_PERF
345 kfree(n->pf); 365 if (free_pf)
366 free_percpu(n->pf);
367#endif
368#ifdef CONFIG_CLS_U32_MARK
369 if (free_pf)
370 free_percpu(n->pcpu_success);
346#endif 371#endif
347 kfree(n); 372 kfree(n);
348 return 0; 373 return 0;
349} 374}
350 375
376/* u32_delete_key_rcu should be called when free'ing a copied
377 * version of a tc_u_knode obtained from u32_init_knode(). When
378 * copies are obtained from u32_init_knode() the statistics are
379 * shared between the old and new copies to allow readers to
380 * continue to update the statistics during the copy. To support
381 * this the u32_delete_key_rcu variant does not free the percpu
382 * statistics.
383 */
384static void u32_delete_key_rcu(struct rcu_head *rcu)
385{
386 struct tc_u_knode *key = container_of(rcu, struct tc_u_knode, rcu);
387
388 u32_destroy_key(key->tp, key, false);
389}
390
391/* u32_delete_key_freepf_rcu is the rcu callback variant
 392 * that frees the entire structure including the statistics
 393 * percpu variables. Only use this if the key is not a copy
 394 * returned by u32_init_knode(). See u32_delete_key_rcu()
 395 * for the variant that should be used with keys returned from
396 * u32_init_knode()
397 */
398static void u32_delete_key_freepf_rcu(struct rcu_head *rcu)
399{
400 struct tc_u_knode *key = container_of(rcu, struct tc_u_knode, rcu);
401
402 u32_destroy_key(key->tp, key, true);
403}
404
351static int u32_delete_key(struct tcf_proto *tp, struct tc_u_knode *key) 405static int u32_delete_key(struct tcf_proto *tp, struct tc_u_knode *key)
352{ 406{
353 struct tc_u_knode **kp; 407 struct tc_u_knode __rcu **kp;
354 struct tc_u_hnode *ht = key->ht_up; 408 struct tc_u_knode *pkp;
409 struct tc_u_hnode *ht = rtnl_dereference(key->ht_up);
355 410
356 if (ht) { 411 if (ht) {
357 for (kp = &ht->ht[TC_U32_HASH(key->handle)]; *kp; kp = &(*kp)->next) { 412 kp = &ht->ht[TC_U32_HASH(key->handle)];
358 if (*kp == key) { 413 for (pkp = rtnl_dereference(*kp); pkp;
359 tcf_tree_lock(tp); 414 kp = &pkp->next, pkp = rtnl_dereference(*kp)) {
360 *kp = key->next; 415 if (pkp == key) {
361 tcf_tree_unlock(tp); 416 RCU_INIT_POINTER(*kp, key->next);
362 417
363 u32_destroy_key(tp, key); 418 tcf_unbind_filter(tp, &key->res);
419 call_rcu(&key->rcu, u32_delete_key_freepf_rcu);
364 return 0; 420 return 0;
365 } 421 }
366 } 422 }
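
The two RCU callbacks introduced above differ only in whether they free the per-CPU statistics, and the block comments explain why: a knode copied via u32_init_knode() shares its percpu counters with the original so packets keep being counted while both versions are reachable, and only the callback used for a genuine delete may free them. A short sketch of that ownership split, with invented names:

#include <linux/types.h>
#include <linux/percpu.h>
#include <linux/slab.h>
#include <linux/string.h>

struct demo_node {
        u64 __percpu *hits;     /* shared between an original and its replacement copy */
};

/* The copy aliases the original's counters instead of allocating fresh ones. */
static struct demo_node *demo_node_clone(const struct demo_node *old)
{
        return kmemdup(old, sizeof(*old), GFP_KERNEL);
}

/* @free_stats is true only when the node is really going away, i.e. the
 * counters are no longer referenced by a replacement copy.
 */
static void demo_node_destroy(struct demo_node *n, bool free_stats)
{
        if (free_stats)
                free_percpu(n->hits);
        kfree(n);
}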
@@ -375,10 +431,11 @@ static void u32_clear_hnode(struct tcf_proto *tp, struct tc_u_hnode *ht)
375 unsigned int h; 431 unsigned int h;
376 432
377 for (h = 0; h <= ht->divisor; h++) { 433 for (h = 0; h <= ht->divisor; h++) {
378 while ((n = ht->ht[h]) != NULL) { 434 while ((n = rtnl_dereference(ht->ht[h])) != NULL) {
379 ht->ht[h] = n->next; 435 RCU_INIT_POINTER(ht->ht[h],
380 436 rtnl_dereference(n->next));
381 u32_destroy_key(tp, n); 437 tcf_unbind_filter(tp, &n->res);
438 call_rcu(&n->rcu, u32_delete_key_freepf_rcu);
382 } 439 }
383 } 440 }
384} 441}
@@ -386,28 +443,31 @@ static void u32_clear_hnode(struct tcf_proto *tp, struct tc_u_hnode *ht)
386static int u32_destroy_hnode(struct tcf_proto *tp, struct tc_u_hnode *ht) 443static int u32_destroy_hnode(struct tcf_proto *tp, struct tc_u_hnode *ht)
387{ 444{
388 struct tc_u_common *tp_c = tp->data; 445 struct tc_u_common *tp_c = tp->data;
389 struct tc_u_hnode **hn; 446 struct tc_u_hnode __rcu **hn;
447 struct tc_u_hnode *phn;
390 448
391 WARN_ON(ht->refcnt); 449 WARN_ON(ht->refcnt);
392 450
393 u32_clear_hnode(tp, ht); 451 u32_clear_hnode(tp, ht);
394 452
395 for (hn = &tp_c->hlist; *hn; hn = &(*hn)->next) { 453 hn = &tp_c->hlist;
396 if (*hn == ht) { 454 for (phn = rtnl_dereference(*hn);
397 *hn = ht->next; 455 phn;
398 kfree(ht); 456 hn = &phn->next, phn = rtnl_dereference(*hn)) {
457 if (phn == ht) {
458 RCU_INIT_POINTER(*hn, ht->next);
459 kfree_rcu(ht, rcu);
399 return 0; 460 return 0;
400 } 461 }
401 } 462 }
402 463
403 WARN_ON(1);
404 return -ENOENT; 464 return -ENOENT;
405} 465}
406 466
407static void u32_destroy(struct tcf_proto *tp) 467static void u32_destroy(struct tcf_proto *tp)
408{ 468{
409 struct tc_u_common *tp_c = tp->data; 469 struct tc_u_common *tp_c = tp->data;
410 struct tc_u_hnode *root_ht = tp->root; 470 struct tc_u_hnode *root_ht = rtnl_dereference(tp->root);
411 471
412 WARN_ON(root_ht == NULL); 472 WARN_ON(root_ht == NULL);
413 473
@@ -419,17 +479,16 @@ static void u32_destroy(struct tcf_proto *tp)
419 479
420 tp->q->u32_node = NULL; 480 tp->q->u32_node = NULL;
421 481
422 for (ht = tp_c->hlist; ht; ht = ht->next) { 482 for (ht = rtnl_dereference(tp_c->hlist);
483 ht;
484 ht = rtnl_dereference(ht->next)) {
423 ht->refcnt--; 485 ht->refcnt--;
424 u32_clear_hnode(tp, ht); 486 u32_clear_hnode(tp, ht);
425 } 487 }
426 488
427 while ((ht = tp_c->hlist) != NULL) { 489 while ((ht = rtnl_dereference(tp_c->hlist)) != NULL) {
428 tp_c->hlist = ht->next; 490 RCU_INIT_POINTER(tp_c->hlist, ht->next);
429 491 kfree_rcu(ht, rcu);
430 WARN_ON(ht->refcnt != 0);
431
432 kfree(ht);
433 } 492 }
434 493
435 kfree(tp_c); 494 kfree(tp_c);
@@ -441,6 +500,7 @@ static void u32_destroy(struct tcf_proto *tp)
441static int u32_delete(struct tcf_proto *tp, unsigned long arg) 500static int u32_delete(struct tcf_proto *tp, unsigned long arg)
442{ 501{
443 struct tc_u_hnode *ht = (struct tc_u_hnode *)arg; 502 struct tc_u_hnode *ht = (struct tc_u_hnode *)arg;
503 struct tc_u_hnode *root_ht = rtnl_dereference(tp->root);
444 504
445 if (ht == NULL) 505 if (ht == NULL)
446 return 0; 506 return 0;
@@ -448,7 +508,7 @@ static int u32_delete(struct tcf_proto *tp, unsigned long arg)
448 if (TC_U32_KEY(ht->handle)) 508 if (TC_U32_KEY(ht->handle))
449 return u32_delete_key(tp, (struct tc_u_knode *)ht); 509 return u32_delete_key(tp, (struct tc_u_knode *)ht);
450 510
451 if (tp->root == ht) 511 if (root_ht == ht)
452 return -EINVAL; 512 return -EINVAL;
453 513
454 if (ht->refcnt == 1) { 514 if (ht->refcnt == 1) {
@@ -471,7 +531,9 @@ static u32 gen_new_kid(struct tc_u_hnode *ht, u32 handle)
471 if (!bitmap) 531 if (!bitmap)
472 return handle | 0xFFF; 532 return handle | 0xFFF;
473 533
474 for (n = ht->ht[TC_U32_HASH(handle)]; n; n = n->next) 534 for (n = rtnl_dereference(ht->ht[TC_U32_HASH(handle)]);
535 n;
536 n = rtnl_dereference(n->next))
475 set_bit(TC_U32_NODE(n->handle), bitmap); 537 set_bit(TC_U32_NODE(n->handle), bitmap);
476 538
477 i = find_next_zero_bit(bitmap, NR_U32_NODE, 0x800); 539 i = find_next_zero_bit(bitmap, NR_U32_NODE, 0x800);
@@ -521,10 +583,8 @@ static int u32_set_parms(struct net *net, struct tcf_proto *tp,
521 ht_down->refcnt++; 583 ht_down->refcnt++;
522 } 584 }
523 585
524 tcf_tree_lock(tp); 586 ht_old = rtnl_dereference(n->ht_down);
525 ht_old = n->ht_down; 587 rcu_assign_pointer(n->ht_down, ht_down);
526 n->ht_down = ht_down;
527 tcf_tree_unlock(tp);
528 588
529 if (ht_old) 589 if (ht_old)
530 ht_old->refcnt--; 590 ht_old->refcnt--;
@@ -547,10 +607,86 @@ static int u32_set_parms(struct net *net, struct tcf_proto *tp,
547 607
548 return 0; 608 return 0;
549errout: 609errout:
550 tcf_exts_destroy(tp, &e); 610 tcf_exts_destroy(&e);
551 return err; 611 return err;
552} 612}
553 613
614static void u32_replace_knode(struct tcf_proto *tp,
615 struct tc_u_common *tp_c,
616 struct tc_u_knode *n)
617{
618 struct tc_u_knode __rcu **ins;
619 struct tc_u_knode *pins;
620 struct tc_u_hnode *ht;
621
622 if (TC_U32_HTID(n->handle) == TC_U32_ROOT)
623 ht = rtnl_dereference(tp->root);
624 else
625 ht = u32_lookup_ht(tp_c, TC_U32_HTID(n->handle));
626
627 ins = &ht->ht[TC_U32_HASH(n->handle)];
628
629 /* The node must always exist for it to be replaced; if this is not
630 * the case, something went very wrong elsewhere.
631 */
632 for (pins = rtnl_dereference(*ins); ;
633 ins = &pins->next, pins = rtnl_dereference(*ins))
634 if (pins->handle == n->handle)
635 break;
636
637 RCU_INIT_POINTER(n->next, pins->next);
638 rcu_assign_pointer(*ins, n);
639}
640
641static struct tc_u_knode *u32_init_knode(struct tcf_proto *tp,
642 struct tc_u_knode *n)
643{
644 struct tc_u_knode *new;
645 struct tc_u32_sel *s = &n->sel;
646
647 new = kzalloc(sizeof(*n) + s->nkeys*sizeof(struct tc_u32_key),
648 GFP_KERNEL);
649
650 if (!new)
651 return NULL;
652
653 RCU_INIT_POINTER(new->next, n->next);
654 new->handle = n->handle;
655 RCU_INIT_POINTER(new->ht_up, n->ht_up);
656
657#ifdef CONFIG_NET_CLS_IND
658 new->ifindex = n->ifindex;
659#endif
660 new->fshift = n->fshift;
661 new->res = n->res;
662 RCU_INIT_POINTER(new->ht_down, n->ht_down);
663
664 /* bump reference count as long as we hold pointer to structure */
665 if (new->ht_down)
666 new->ht_down->refcnt++;
667
668#ifdef CONFIG_CLS_U32_PERF
669 /* Statistics may be incremented by readers during update
670 * so we must keep them intact. When the node is later destroyed
671 * a special destroy call must be made to not free the pf memory.
672 */
673 new->pf = n->pf;
674#endif
675
676#ifdef CONFIG_CLS_U32_MARK
677 new->val = n->val;
678 new->mask = n->mask;
679 /* Similarly success statistics must be moved as pointers */
680 new->pcpu_success = n->pcpu_success;
681#endif
682 new->tp = tp;
683 memcpy(&new->sel, s, sizeof(*s) + s->nkeys*sizeof(struct tc_u32_key));
684
685 tcf_exts_init(&new->exts, TCA_U32_ACT, TCA_U32_POLICE);
686
687 return new;
688}
689
554static int u32_change(struct net *net, struct sk_buff *in_skb, 690static int u32_change(struct net *net, struct sk_buff *in_skb,
555 struct tcf_proto *tp, unsigned long base, u32 handle, 691 struct tcf_proto *tp, unsigned long base, u32 handle,
556 struct nlattr **tca, 692 struct nlattr **tca,
@@ -564,6 +700,9 @@ static int u32_change(struct net *net, struct sk_buff *in_skb,
564 struct nlattr *tb[TCA_U32_MAX + 1]; 700 struct nlattr *tb[TCA_U32_MAX + 1];
565 u32 htid; 701 u32 htid;
566 int err; 702 int err;
703#ifdef CONFIG_CLS_U32_PERF
704 size_t size;
705#endif
567 706
568 if (opt == NULL) 707 if (opt == NULL)
569 return handle ? -EINVAL : 0; 708 return handle ? -EINVAL : 0;
@@ -574,11 +713,28 @@ static int u32_change(struct net *net, struct sk_buff *in_skb,
574 713
575 n = (struct tc_u_knode *)*arg; 714 n = (struct tc_u_knode *)*arg;
576 if (n) { 715 if (n) {
716 struct tc_u_knode *new;
717
577 if (TC_U32_KEY(n->handle) == 0) 718 if (TC_U32_KEY(n->handle) == 0)
578 return -EINVAL; 719 return -EINVAL;
579 720
580 return u32_set_parms(net, tp, base, n->ht_up, n, tb, 721 new = u32_init_knode(tp, n);
581 tca[TCA_RATE], ovr); 722 if (!new)
723 return -ENOMEM;
724
725 err = u32_set_parms(net, tp, base,
726 rtnl_dereference(n->ht_up), new, tb,
727 tca[TCA_RATE], ovr);
728
729 if (err) {
730 u32_destroy_key(tp, new, false);
731 return err;
732 }
733
734 u32_replace_knode(tp, tp_c, new);
735 tcf_unbind_filter(tp, &n->res);
736 call_rcu(&n->rcu, u32_delete_key_rcu);
737 return 0;
582 } 738 }
583 739
584 if (tb[TCA_U32_DIVISOR]) { 740 if (tb[TCA_U32_DIVISOR]) {
@@ -601,8 +757,8 @@ static int u32_change(struct net *net, struct sk_buff *in_skb,
601 ht->divisor = divisor; 757 ht->divisor = divisor;
602 ht->handle = handle; 758 ht->handle = handle;
603 ht->prio = tp->prio; 759 ht->prio = tp->prio;
604 ht->next = tp_c->hlist; 760 RCU_INIT_POINTER(ht->next, tp_c->hlist);
605 tp_c->hlist = ht; 761 rcu_assign_pointer(tp_c->hlist, ht);
606 *arg = (unsigned long)ht; 762 *arg = (unsigned long)ht;
607 return 0; 763 return 0;
608 } 764 }
@@ -610,7 +766,7 @@ static int u32_change(struct net *net, struct sk_buff *in_skb,
610 if (tb[TCA_U32_HASH]) { 766 if (tb[TCA_U32_HASH]) {
611 htid = nla_get_u32(tb[TCA_U32_HASH]); 767 htid = nla_get_u32(tb[TCA_U32_HASH]);
612 if (TC_U32_HTID(htid) == TC_U32_ROOT) { 768 if (TC_U32_HTID(htid) == TC_U32_ROOT) {
613 ht = tp->root; 769 ht = rtnl_dereference(tp->root);
614 htid = ht->handle; 770 htid = ht->handle;
615 } else { 771 } else {
616 ht = u32_lookup_ht(tp->data, TC_U32_HTID(htid)); 772 ht = u32_lookup_ht(tp->data, TC_U32_HTID(htid));
@@ -618,7 +774,7 @@ static int u32_change(struct net *net, struct sk_buff *in_skb,
618 return -EINVAL; 774 return -EINVAL;
619 } 775 }
620 } else { 776 } else {
621 ht = tp->root; 777 ht = rtnl_dereference(tp->root);
622 htid = ht->handle; 778 htid = ht->handle;
623 } 779 }
624 780
@@ -642,46 +798,62 @@ static int u32_change(struct net *net, struct sk_buff *in_skb,
642 return -ENOBUFS; 798 return -ENOBUFS;
643 799
644#ifdef CONFIG_CLS_U32_PERF 800#ifdef CONFIG_CLS_U32_PERF
645 n->pf = kzalloc(sizeof(struct tc_u32_pcnt) + s->nkeys*sizeof(u64), GFP_KERNEL); 801 size = sizeof(struct tc_u32_pcnt) + s->nkeys * sizeof(u64);
646 if (n->pf == NULL) { 802 n->pf = __alloc_percpu(size, __alignof__(struct tc_u32_pcnt));
803 if (!n->pf) {
647 kfree(n); 804 kfree(n);
648 return -ENOBUFS; 805 return -ENOBUFS;
649 } 806 }
650#endif 807#endif
651 808
652 memcpy(&n->sel, s, sizeof(*s) + s->nkeys*sizeof(struct tc_u32_key)); 809 memcpy(&n->sel, s, sizeof(*s) + s->nkeys*sizeof(struct tc_u32_key));
653 n->ht_up = ht; 810 RCU_INIT_POINTER(n->ht_up, ht);
654 n->handle = handle; 811 n->handle = handle;
655 n->fshift = s->hmask ? ffs(ntohl(s->hmask)) - 1 : 0; 812 n->fshift = s->hmask ? ffs(ntohl(s->hmask)) - 1 : 0;
656 tcf_exts_init(&n->exts, TCA_U32_ACT, TCA_U32_POLICE); 813 tcf_exts_init(&n->exts, TCA_U32_ACT, TCA_U32_POLICE);
814 n->tp = tp;
657 815
658#ifdef CONFIG_CLS_U32_MARK 816#ifdef CONFIG_CLS_U32_MARK
817 n->pcpu_success = alloc_percpu(u32);
818 if (!n->pcpu_success) {
819 err = -ENOMEM;
820 goto errout;
821 }
822
659 if (tb[TCA_U32_MARK]) { 823 if (tb[TCA_U32_MARK]) {
660 struct tc_u32_mark *mark; 824 struct tc_u32_mark *mark;
661 825
662 mark = nla_data(tb[TCA_U32_MARK]); 826 mark = nla_data(tb[TCA_U32_MARK]);
663 memcpy(&n->mark, mark, sizeof(struct tc_u32_mark)); 827 n->val = mark->val;
664 n->mark.success = 0; 828 n->mask = mark->mask;
665 } 829 }
666#endif 830#endif
667 831
668 err = u32_set_parms(net, tp, base, ht, n, tb, tca[TCA_RATE], ovr); 832 err = u32_set_parms(net, tp, base, ht, n, tb, tca[TCA_RATE], ovr);
669 if (err == 0) { 833 if (err == 0) {
670 struct tc_u_knode **ins; 834 struct tc_u_knode __rcu **ins;
671 for (ins = &ht->ht[TC_U32_HASH(handle)]; *ins; ins = &(*ins)->next) 835 struct tc_u_knode *pins;
672 if (TC_U32_NODE(handle) < TC_U32_NODE((*ins)->handle)) 836
837 ins = &ht->ht[TC_U32_HASH(handle)];
838 for (pins = rtnl_dereference(*ins); pins;
839 ins = &pins->next, pins = rtnl_dereference(*ins))
840 if (TC_U32_NODE(handle) < TC_U32_NODE(pins->handle))
673 break; 841 break;
674 842
675 n->next = *ins; 843 RCU_INIT_POINTER(n->next, pins);
676 tcf_tree_lock(tp); 844 rcu_assign_pointer(*ins, n);
677 *ins = n;
678 tcf_tree_unlock(tp);
679 845
680 *arg = (unsigned long)n; 846 *arg = (unsigned long)n;
681 return 0; 847 return 0;
682 } 848 }
849
850#ifdef CONFIG_CLS_U32_MARK
851 free_percpu(n->pcpu_success);
852errout:
853#endif
854
683#ifdef CONFIG_CLS_U32_PERF 855#ifdef CONFIG_CLS_U32_PERF
684 kfree(n->pf); 856 free_percpu(n->pf);
685#endif 857#endif
686 kfree(n); 858 kfree(n);
687 return err; 859 return err;
@@ -697,7 +869,9 @@ static void u32_walk(struct tcf_proto *tp, struct tcf_walker *arg)
697 if (arg->stop) 869 if (arg->stop)
698 return; 870 return;
699 871
700 for (ht = tp_c->hlist; ht; ht = ht->next) { 872 for (ht = rtnl_dereference(tp_c->hlist);
873 ht;
874 ht = rtnl_dereference(ht->next)) {
701 if (ht->prio != tp->prio) 875 if (ht->prio != tp->prio)
702 continue; 876 continue;
703 if (arg->count >= arg->skip) { 877 if (arg->count >= arg->skip) {
@@ -708,7 +882,9 @@ static void u32_walk(struct tcf_proto *tp, struct tcf_walker *arg)
708 } 882 }
709 arg->count++; 883 arg->count++;
710 for (h = 0; h <= ht->divisor; h++) { 884 for (h = 0; h <= ht->divisor; h++) {
711 for (n = ht->ht[h]; n; n = n->next) { 885 for (n = rtnl_dereference(ht->ht[h]);
886 n;
887 n = rtnl_dereference(n->next)) {
712 if (arg->count < arg->skip) { 888 if (arg->count < arg->skip) {
713 arg->count++; 889 arg->count++;
714 continue; 890 continue;
@@ -727,6 +903,7 @@ static int u32_dump(struct net *net, struct tcf_proto *tp, unsigned long fh,
727 struct sk_buff *skb, struct tcmsg *t) 903 struct sk_buff *skb, struct tcmsg *t)
728{ 904{
729 struct tc_u_knode *n = (struct tc_u_knode *)fh; 905 struct tc_u_knode *n = (struct tc_u_knode *)fh;
906 struct tc_u_hnode *ht_up, *ht_down;
730 struct nlattr *nest; 907 struct nlattr *nest;
731 908
732 if (n == NULL) 909 if (n == NULL)
@@ -745,11 +922,18 @@ static int u32_dump(struct net *net, struct tcf_proto *tp, unsigned long fh,
745 if (nla_put_u32(skb, TCA_U32_DIVISOR, divisor)) 922 if (nla_put_u32(skb, TCA_U32_DIVISOR, divisor))
746 goto nla_put_failure; 923 goto nla_put_failure;
747 } else { 924 } else {
925#ifdef CONFIG_CLS_U32_PERF
926 struct tc_u32_pcnt *gpf;
927 int cpu;
928#endif
929
748 if (nla_put(skb, TCA_U32_SEL, 930 if (nla_put(skb, TCA_U32_SEL,
749 sizeof(n->sel) + n->sel.nkeys*sizeof(struct tc_u32_key), 931 sizeof(n->sel) + n->sel.nkeys*sizeof(struct tc_u32_key),
750 &n->sel)) 932 &n->sel))
751 goto nla_put_failure; 933 goto nla_put_failure;
752 if (n->ht_up) { 934
935 ht_up = rtnl_dereference(n->ht_up);
936 if (ht_up) {
753 u32 htid = n->handle & 0xFFFFF000; 937 u32 htid = n->handle & 0xFFFFF000;
754 if (nla_put_u32(skb, TCA_U32_HASH, htid)) 938 if (nla_put_u32(skb, TCA_U32_HASH, htid))
755 goto nla_put_failure; 939 goto nla_put_failure;
@@ -757,14 +941,28 @@ static int u32_dump(struct net *net, struct tcf_proto *tp, unsigned long fh,
757 if (n->res.classid && 941 if (n->res.classid &&
758 nla_put_u32(skb, TCA_U32_CLASSID, n->res.classid)) 942 nla_put_u32(skb, TCA_U32_CLASSID, n->res.classid))
759 goto nla_put_failure; 943 goto nla_put_failure;
760 if (n->ht_down && 944
761 nla_put_u32(skb, TCA_U32_LINK, n->ht_down->handle)) 945 ht_down = rtnl_dereference(n->ht_down);
946 if (ht_down &&
947 nla_put_u32(skb, TCA_U32_LINK, ht_down->handle))
762 goto nla_put_failure; 948 goto nla_put_failure;
763 949
764#ifdef CONFIG_CLS_U32_MARK 950#ifdef CONFIG_CLS_U32_MARK
765 if ((n->mark.val || n->mark.mask) && 951 if ((n->val || n->mask)) {
766 nla_put(skb, TCA_U32_MARK, sizeof(n->mark), &n->mark)) 952 struct tc_u32_mark mark = {.val = n->val,
767 goto nla_put_failure; 953 .mask = n->mask,
954 .success = 0};
955 int cpum;
956
957 for_each_possible_cpu(cpum) {
958 __u32 cnt = *per_cpu_ptr(n->pcpu_success, cpum);
959
960 mark.success += cnt;
961 }
962
963 if (nla_put(skb, TCA_U32_MARK, sizeof(mark), &mark))
964 goto nla_put_failure;
965 }
768#endif 966#endif
769 967
770 if (tcf_exts_dump(skb, &n->exts) < 0) 968 if (tcf_exts_dump(skb, &n->exts) < 0)
@@ -779,10 +977,29 @@ static int u32_dump(struct net *net, struct tcf_proto *tp, unsigned long fh,
779 } 977 }
780#endif 978#endif
781#ifdef CONFIG_CLS_U32_PERF 979#ifdef CONFIG_CLS_U32_PERF
980 gpf = kzalloc(sizeof(struct tc_u32_pcnt) +
981 n->sel.nkeys * sizeof(u64),
982 GFP_KERNEL);
983 if (!gpf)
984 goto nla_put_failure;
985
986 for_each_possible_cpu(cpu) {
987 int i;
988 struct tc_u32_pcnt *pf = per_cpu_ptr(n->pf, cpu);
989
990 gpf->rcnt += pf->rcnt;
991 gpf->rhit += pf->rhit;
992 for (i = 0; i < n->sel.nkeys; i++)
993 gpf->kcnts[i] += pf->kcnts[i];
994 }
995
782 if (nla_put(skb, TCA_U32_PCNT, 996 if (nla_put(skb, TCA_U32_PCNT,
783 sizeof(struct tc_u32_pcnt) + n->sel.nkeys*sizeof(u64), 997 sizeof(struct tc_u32_pcnt) + n->sel.nkeys*sizeof(u64),
784 n->pf)) 998 gpf)) {
999 kfree(gpf);
785 goto nla_put_failure; 1000 goto nla_put_failure;
1001 }
1002 kfree(gpf);
786#endif 1003#endif
787 } 1004 }
788 1005
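The cls_u32 conversion above replaces the tcf_tree_lock()-protected pointer manipulation with RTNL-guarded updates and RCU-deferred frees: writers walk the chains with rtnl_dereference(), unlink with RCU_INIT_POINTER()/rcu_assign_pointer(), and defer the kfree via call_rcu() so classification in softirq context can keep traversing without taking a lock. Below is a minimal sketch of that update-side pattern; struct demo_node and demo_list_del() are illustrative names, not part of the patch.

#include <linux/errno.h>
#include <linux/kernel.h>
#include <linux/rcupdate.h>
#include <linux/rtnetlink.h>
#include <linux/slab.h>

struct demo_node {
	struct demo_node __rcu	*next;
	struct rcu_head		rcu;
};

static void demo_node_free_rcu(struct rcu_head *rcu)
{
	kfree(container_of(rcu, struct demo_node, rcu));
}

/* Caller holds RTNL; rtnl_dereference() documents that assumption. */
static int demo_list_del(struct demo_node __rcu **head, struct demo_node *key)
{
	struct demo_node __rcu **kp = head;
	struct demo_node *p;

	for (p = rtnl_dereference(*kp); p;
	     kp = &p->next, p = rtnl_dereference(*kp)) {
		if (p != key)
			continue;
		/* Unlink first; readers that already loaded the old pointer
		 * still see a valid node until call_rcu() fires after a
		 * grace period.
		 */
		RCU_INIT_POINTER(*kp, key->next);
		call_rcu(&key->rcu, demo_node_free_rcu);
		return 0;
	}
	return -ENOENT;
}

The same split explains the two callbacks above: when the node being freed is a copy produced by u32_init_knode(), the per-CPU counters are shared with the replacement and must survive, so only the freepf variant releases them.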
diff --git a/net/sched/em_canid.c b/net/sched/em_canid.c
index 7c292d474f47..ddd883ca55b2 100644
--- a/net/sched/em_canid.c
+++ b/net/sched/em_canid.c
@@ -120,7 +120,7 @@ static int em_canid_match(struct sk_buff *skb, struct tcf_ematch *m,
120 return match; 120 return match;
121} 121}
122 122
123static int em_canid_change(struct tcf_proto *tp, void *data, int len, 123static int em_canid_change(struct net *net, void *data, int len,
124 struct tcf_ematch *m) 124 struct tcf_ematch *m)
125{ 125{
126 struct can_filter *conf = data; /* Array with rules */ 126 struct can_filter *conf = data; /* Array with rules */
@@ -183,7 +183,7 @@ static int em_canid_change(struct tcf_proto *tp, void *data, int len,
183 return 0; 183 return 0;
184} 184}
185 185
186static void em_canid_destroy(struct tcf_proto *tp, struct tcf_ematch *m) 186static void em_canid_destroy(struct tcf_ematch *m)
187{ 187{
188 struct canid_match *cm = em_canid_priv(m); 188 struct canid_match *cm = em_canid_priv(m);
189 189
diff --git a/net/sched/em_ipset.c b/net/sched/em_ipset.c
index 527aeb7a3ff0..5b4a4efe468c 100644
--- a/net/sched/em_ipset.c
+++ b/net/sched/em_ipset.c
@@ -19,12 +19,11 @@
19#include <net/ip.h> 19#include <net/ip.h>
20#include <net/pkt_cls.h> 20#include <net/pkt_cls.h>
21 21
22static int em_ipset_change(struct tcf_proto *tp, void *data, int data_len, 22static int em_ipset_change(struct net *net, void *data, int data_len,
23 struct tcf_ematch *em) 23 struct tcf_ematch *em)
24{ 24{
25 struct xt_set_info *set = data; 25 struct xt_set_info *set = data;
26 ip_set_id_t index; 26 ip_set_id_t index;
27 struct net *net = dev_net(qdisc_dev(tp->q));
28 27
29 if (data_len != sizeof(*set)) 28 if (data_len != sizeof(*set))
30 return -EINVAL; 29 return -EINVAL;
@@ -42,11 +41,11 @@ static int em_ipset_change(struct tcf_proto *tp, void *data, int data_len,
42 return -ENOMEM; 41 return -ENOMEM;
43} 42}
44 43
45static void em_ipset_destroy(struct tcf_proto *p, struct tcf_ematch *em) 44static void em_ipset_destroy(struct tcf_ematch *em)
46{ 45{
47 const struct xt_set_info *set = (const void *) em->data; 46 const struct xt_set_info *set = (const void *) em->data;
48 if (set) { 47 if (set) {
49 ip_set_nfnl_put(dev_net(qdisc_dev(p->q)), set->index); 48 ip_set_nfnl_put(em->net, set->index);
50 kfree((void *) em->data); 49 kfree((void *) em->data);
51 } 50 }
52} 51}
diff --git a/net/sched/em_meta.c b/net/sched/em_meta.c
index 9b8c0b0e60d7..c8f8c399b99a 100644
--- a/net/sched/em_meta.c
+++ b/net/sched/em_meta.c
@@ -856,7 +856,7 @@ static const struct nla_policy meta_policy[TCA_EM_META_MAX + 1] = {
856 [TCA_EM_META_HDR] = { .len = sizeof(struct tcf_meta_hdr) }, 856 [TCA_EM_META_HDR] = { .len = sizeof(struct tcf_meta_hdr) },
857}; 857};
858 858
859static int em_meta_change(struct tcf_proto *tp, void *data, int len, 859static int em_meta_change(struct net *net, void *data, int len,
860 struct tcf_ematch *m) 860 struct tcf_ematch *m)
861{ 861{
862 int err; 862 int err;
@@ -908,7 +908,7 @@ errout:
908 return err; 908 return err;
909} 909}
910 910
911static void em_meta_destroy(struct tcf_proto *tp, struct tcf_ematch *m) 911static void em_meta_destroy(struct tcf_ematch *m)
912{ 912{
913 if (m) 913 if (m)
914 meta_delete((struct meta_match *) m->data); 914 meta_delete((struct meta_match *) m->data);
diff --git a/net/sched/em_nbyte.c b/net/sched/em_nbyte.c
index a3bed07a008b..df3110d69585 100644
--- a/net/sched/em_nbyte.c
+++ b/net/sched/em_nbyte.c
@@ -23,7 +23,7 @@ struct nbyte_data {
23 char pattern[0]; 23 char pattern[0];
24}; 24};
25 25
26static int em_nbyte_change(struct tcf_proto *tp, void *data, int data_len, 26static int em_nbyte_change(struct net *net, void *data, int data_len,
27 struct tcf_ematch *em) 27 struct tcf_ematch *em)
28{ 28{
29 struct tcf_em_nbyte *nbyte = data; 29 struct tcf_em_nbyte *nbyte = data;
diff --git a/net/sched/em_text.c b/net/sched/em_text.c
index 15d353d2e4be..f03c3de16c27 100644
--- a/net/sched/em_text.c
+++ b/net/sched/em_text.c
@@ -45,7 +45,7 @@ static int em_text_match(struct sk_buff *skb, struct tcf_ematch *m,
45 return skb_find_text(skb, from, to, tm->config, &state) != UINT_MAX; 45 return skb_find_text(skb, from, to, tm->config, &state) != UINT_MAX;
46} 46}
47 47
48static int em_text_change(struct tcf_proto *tp, void *data, int len, 48static int em_text_change(struct net *net, void *data, int len,
49 struct tcf_ematch *m) 49 struct tcf_ematch *m)
50{ 50{
51 struct text_match *tm; 51 struct text_match *tm;
@@ -100,7 +100,7 @@ retry:
100 return 0; 100 return 0;
101} 101}
102 102
103static void em_text_destroy(struct tcf_proto *tp, struct tcf_ematch *m) 103static void em_text_destroy(struct tcf_ematch *m)
104{ 104{
105 if (EM_TEXT_PRIV(m) && EM_TEXT_PRIV(m)->config) 105 if (EM_TEXT_PRIV(m) && EM_TEXT_PRIV(m)->config)
106 textsearch_destroy(EM_TEXT_PRIV(m)->config); 106 textsearch_destroy(EM_TEXT_PRIV(m)->config);
diff --git a/net/sched/ematch.c b/net/sched/ematch.c
index ad57f4444b9c..6742200b1307 100644
--- a/net/sched/ematch.c
+++ b/net/sched/ematch.c
@@ -178,6 +178,7 @@ static int tcf_em_validate(struct tcf_proto *tp,
178 struct tcf_ematch_hdr *em_hdr = nla_data(nla); 178 struct tcf_ematch_hdr *em_hdr = nla_data(nla);
179 int data_len = nla_len(nla) - sizeof(*em_hdr); 179 int data_len = nla_len(nla) - sizeof(*em_hdr);
180 void *data = (void *) em_hdr + sizeof(*em_hdr); 180 void *data = (void *) em_hdr + sizeof(*em_hdr);
181 struct net *net = dev_net(qdisc_dev(tp->q));
181 182
182 if (!TCF_EM_REL_VALID(em_hdr->flags)) 183 if (!TCF_EM_REL_VALID(em_hdr->flags))
183 goto errout; 184 goto errout;
@@ -240,7 +241,7 @@ static int tcf_em_validate(struct tcf_proto *tp,
240 goto errout; 241 goto errout;
241 242
242 if (em->ops->change) { 243 if (em->ops->change) {
243 err = em->ops->change(tp, data, data_len, em); 244 err = em->ops->change(net, data, data_len, em);
244 if (err < 0) 245 if (err < 0)
245 goto errout; 246 goto errout;
246 } else if (data_len > 0) { 247 } else if (data_len > 0) {
@@ -271,6 +272,7 @@ static int tcf_em_validate(struct tcf_proto *tp,
271 em->matchid = em_hdr->matchid; 272 em->matchid = em_hdr->matchid;
272 em->flags = em_hdr->flags; 273 em->flags = em_hdr->flags;
273 em->datalen = data_len; 274 em->datalen = data_len;
275 em->net = net;
274 276
275 err = 0; 277 err = 0;
276errout: 278errout:
@@ -378,7 +380,7 @@ errout:
378 return err; 380 return err;
379 381
380errout_abort: 382errout_abort:
381 tcf_em_tree_destroy(tp, tree); 383 tcf_em_tree_destroy(tree);
382 return err; 384 return err;
383} 385}
384EXPORT_SYMBOL(tcf_em_tree_validate); 386EXPORT_SYMBOL(tcf_em_tree_validate);
@@ -393,7 +395,7 @@ EXPORT_SYMBOL(tcf_em_tree_validate);
393 * tcf_em_tree_validate()/tcf_em_tree_change(). You must ensure that 395 * tcf_em_tree_validate()/tcf_em_tree_change(). You must ensure that
394 * the ematch tree is not in use before calling this function. 396 * the ematch tree is not in use before calling this function.
395 */ 397 */
396void tcf_em_tree_destroy(struct tcf_proto *tp, struct tcf_ematch_tree *tree) 398void tcf_em_tree_destroy(struct tcf_ematch_tree *tree)
397{ 399{
398 int i; 400 int i;
399 401
@@ -405,7 +407,7 @@ void tcf_em_tree_destroy(struct tcf_proto *tp, struct tcf_ematch_tree *tree)
405 407
406 if (em->ops) { 408 if (em->ops) {
407 if (em->ops->destroy) 409 if (em->ops->destroy)
408 em->ops->destroy(tp, em); 410 em->ops->destroy(em);
409 else if (!tcf_em_is_simple(em)) 411 else if (!tcf_em_is_simple(em))
410 kfree((void *) em->data); 412 kfree((void *) em->data);
411 module_put(em->ops->owner); 413 module_put(em->ops->owner);
@@ -526,9 +528,10 @@ pop_stack:
526 match_idx = stack[--stackp]; 528 match_idx = stack[--stackp];
527 cur_match = tcf_em_get_match(tree, match_idx); 529 cur_match = tcf_em_get_match(tree, match_idx);
528 530
531 if (tcf_em_is_inverted(cur_match))
532 res = !res;
533
529 if (tcf_em_early_end(cur_match, res)) { 534 if (tcf_em_early_end(cur_match, res)) {
530 if (tcf_em_is_inverted(cur_match))
531 res = !res;
532 goto pop_stack; 535 goto pop_stack;
533 } else { 536 } else {
534 match_idx++; 537 match_idx++;
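The ematch.c hunk above changes the handling of inverted matches in the tree walk: the TCF_EM_INVERT negation is now applied to the sub-match result before tcf_em_early_end() decides whether the container can stop evaluating, rather than only inside the early-end branch, so both the short-circuit test and any continued evaluation see the effective result. A small userspace model of the corrected ordering follows; the struct and helper names are illustrative stand-ins for the pop_stack path shown above.

#include <stdbool.h>
#include <stdio.h>

struct demo_match {
	bool inverted;	/* models TCF_EM_INVERT */
	bool rel_and;	/* relation to the next match: AND vs OR */
};

/* "Early end": with AND a false result already decides the container,
 * with OR a true result does. */
static bool demo_early_end(const struct demo_match *m, bool res)
{
	return m->rel_and ? !res : res;
}

static bool demo_eval(const struct demo_match *m, bool raw)
{
	bool res = raw;

	/* Order from the fix above: invert first ... */
	if (m->inverted)
		res = !res;

	/* ... so the short-circuit decision uses the effective result. */
	if (demo_early_end(m, res))
		printf("container decided early, res=%d\n", res);

	return res;
}

int main(void)
{
	/* Inverted match inside an AND container: the raw match hits, so
	 * the effective result is false and the AND may end early. */
	struct demo_match m = { .inverted = true, .rel_and = true };

	demo_eval(&m, true);
	return 0;
}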
diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
index 58bed7599db7..2cf61b3e633c 100644
--- a/net/sched/sch_api.c
+++ b/net/sched/sch_api.c
@@ -578,31 +578,34 @@ static enum hrtimer_restart qdisc_watchdog(struct hrtimer *timer)
578 struct qdisc_watchdog *wd = container_of(timer, struct qdisc_watchdog, 578 struct qdisc_watchdog *wd = container_of(timer, struct qdisc_watchdog,
579 timer); 579 timer);
580 580
581 rcu_read_lock();
581 qdisc_unthrottled(wd->qdisc); 582 qdisc_unthrottled(wd->qdisc);
582 __netif_schedule(qdisc_root(wd->qdisc)); 583 __netif_schedule(qdisc_root(wd->qdisc));
584 rcu_read_unlock();
583 585
584 return HRTIMER_NORESTART; 586 return HRTIMER_NORESTART;
585} 587}
586 588
587void qdisc_watchdog_init(struct qdisc_watchdog *wd, struct Qdisc *qdisc) 589void qdisc_watchdog_init(struct qdisc_watchdog *wd, struct Qdisc *qdisc)
588{ 590{
589 hrtimer_init(&wd->timer, CLOCK_MONOTONIC, HRTIMER_MODE_ABS); 591 hrtimer_init(&wd->timer, CLOCK_MONOTONIC, HRTIMER_MODE_ABS_PINNED);
590 wd->timer.function = qdisc_watchdog; 592 wd->timer.function = qdisc_watchdog;
591 wd->qdisc = qdisc; 593 wd->qdisc = qdisc;
592} 594}
593EXPORT_SYMBOL(qdisc_watchdog_init); 595EXPORT_SYMBOL(qdisc_watchdog_init);
594 596
595void qdisc_watchdog_schedule_ns(struct qdisc_watchdog *wd, u64 expires) 597void qdisc_watchdog_schedule_ns(struct qdisc_watchdog *wd, u64 expires, bool throttle)
596{ 598{
597 if (test_bit(__QDISC_STATE_DEACTIVATED, 599 if (test_bit(__QDISC_STATE_DEACTIVATED,
598 &qdisc_root_sleeping(wd->qdisc)->state)) 600 &qdisc_root_sleeping(wd->qdisc)->state))
599 return; 601 return;
600 602
601 qdisc_throttled(wd->qdisc); 603 if (throttle)
604 qdisc_throttled(wd->qdisc);
602 605
603 hrtimer_start(&wd->timer, 606 hrtimer_start(&wd->timer,
604 ns_to_ktime(expires), 607 ns_to_ktime(expires),
605 HRTIMER_MODE_ABS); 608 HRTIMER_MODE_ABS_PINNED);
606} 609}
607EXPORT_SYMBOL(qdisc_watchdog_schedule_ns); 610EXPORT_SYMBOL(qdisc_watchdog_schedule_ns);
608 611
@@ -763,7 +766,7 @@ void qdisc_tree_decrease_qlen(struct Qdisc *sch, unsigned int n)
763 cops->put(sch, cl); 766 cops->put(sch, cl);
764 } 767 }
765 sch->q.qlen -= n; 768 sch->q.qlen -= n;
766 sch->qstats.drops += drops; 769 __qdisc_qstats_drop(sch, drops);
767 } 770 }
768} 771}
769EXPORT_SYMBOL(qdisc_tree_decrease_qlen); 772EXPORT_SYMBOL(qdisc_tree_decrease_qlen);
@@ -942,6 +945,17 @@ qdisc_create(struct net_device *dev, struct netdev_queue *dev_queue,
942 sch->handle = handle; 945 sch->handle = handle;
943 946
944 if (!ops->init || (err = ops->init(sch, tca[TCA_OPTIONS])) == 0) { 947 if (!ops->init || (err = ops->init(sch, tca[TCA_OPTIONS])) == 0) {
948 if (qdisc_is_percpu_stats(sch)) {
949 sch->cpu_bstats =
950 alloc_percpu(struct gnet_stats_basic_cpu);
951 if (!sch->cpu_bstats)
952 goto err_out4;
953
954 sch->cpu_qstats = alloc_percpu(struct gnet_stats_queue);
955 if (!sch->cpu_qstats)
956 goto err_out4;
957 }
958
945 if (tca[TCA_STAB]) { 959 if (tca[TCA_STAB]) {
946 stab = qdisc_get_stab(tca[TCA_STAB]); 960 stab = qdisc_get_stab(tca[TCA_STAB]);
947 if (IS_ERR(stab)) { 961 if (IS_ERR(stab)) {
@@ -964,8 +978,11 @@ qdisc_create(struct net_device *dev, struct netdev_queue *dev_queue,
964 else 978 else
965 root_lock = qdisc_lock(sch); 979 root_lock = qdisc_lock(sch);
966 980
967 err = gen_new_estimator(&sch->bstats, &sch->rate_est, 981 err = gen_new_estimator(&sch->bstats,
968 root_lock, tca[TCA_RATE]); 982 sch->cpu_bstats,
983 &sch->rate_est,
984 root_lock,
985 tca[TCA_RATE]);
969 if (err) 986 if (err)
970 goto err_out4; 987 goto err_out4;
971 } 988 }
@@ -984,6 +1001,8 @@ err_out:
984 return NULL; 1001 return NULL;
985 1002
986err_out4: 1003err_out4:
1004 free_percpu(sch->cpu_bstats);
1005 free_percpu(sch->cpu_qstats);
987 /* 1006 /*
988 * Any broken qdiscs that would require a ops->reset() here? 1007 * Any broken qdiscs that would require a ops->reset() here?
989 * The qdisc was never in action so it shouldn't be necessary. 1008 * The qdisc was never in action so it shouldn't be necessary.
@@ -1022,9 +1041,11 @@ static int qdisc_change(struct Qdisc *sch, struct nlattr **tca)
1022 because change can't be undone. */ 1041 because change can't be undone. */
1023 if (sch->flags & TCQ_F_MQROOT) 1042 if (sch->flags & TCQ_F_MQROOT)
1024 goto out; 1043 goto out;
1025 gen_replace_estimator(&sch->bstats, &sch->rate_est, 1044 gen_replace_estimator(&sch->bstats,
1026 qdisc_root_sleeping_lock(sch), 1045 sch->cpu_bstats,
1027 tca[TCA_RATE]); 1046 &sch->rate_est,
1047 qdisc_root_sleeping_lock(sch),
1048 tca[TCA_RATE]);
1028 } 1049 }
1029out: 1050out:
1030 return 0; 1051 return 0;
@@ -1299,11 +1320,14 @@ graft:
1299static int tc_fill_qdisc(struct sk_buff *skb, struct Qdisc *q, u32 clid, 1320static int tc_fill_qdisc(struct sk_buff *skb, struct Qdisc *q, u32 clid,
1300 u32 portid, u32 seq, u16 flags, int event) 1321 u32 portid, u32 seq, u16 flags, int event)
1301{ 1322{
1323 struct gnet_stats_basic_cpu __percpu *cpu_bstats = NULL;
1324 struct gnet_stats_queue __percpu *cpu_qstats = NULL;
1302 struct tcmsg *tcm; 1325 struct tcmsg *tcm;
1303 struct nlmsghdr *nlh; 1326 struct nlmsghdr *nlh;
1304 unsigned char *b = skb_tail_pointer(skb); 1327 unsigned char *b = skb_tail_pointer(skb);
1305 struct gnet_dump d; 1328 struct gnet_dump d;
1306 struct qdisc_size_table *stab; 1329 struct qdisc_size_table *stab;
1330 __u32 qlen;
1307 1331
1308 cond_resched(); 1332 cond_resched();
1309 nlh = nlmsg_put(skb, portid, seq, event, sizeof(*tcm), flags); 1333 nlh = nlmsg_put(skb, portid, seq, event, sizeof(*tcm), flags);
@@ -1321,7 +1345,7 @@ static int tc_fill_qdisc(struct sk_buff *skb, struct Qdisc *q, u32 clid,
1321 goto nla_put_failure; 1345 goto nla_put_failure;
1322 if (q->ops->dump && q->ops->dump(q, skb) < 0) 1346 if (q->ops->dump && q->ops->dump(q, skb) < 0)
1323 goto nla_put_failure; 1347 goto nla_put_failure;
1324 q->qstats.qlen = q->q.qlen; 1348 qlen = q->q.qlen;
1325 1349
1326 stab = rtnl_dereference(q->stab); 1350 stab = rtnl_dereference(q->stab);
1327 if (stab && qdisc_dump_stab(skb, stab) < 0) 1351 if (stab && qdisc_dump_stab(skb, stab) < 0)
@@ -1334,9 +1358,14 @@ static int tc_fill_qdisc(struct sk_buff *skb, struct Qdisc *q, u32 clid,
1334 if (q->ops->dump_stats && q->ops->dump_stats(q, &d) < 0) 1358 if (q->ops->dump_stats && q->ops->dump_stats(q, &d) < 0)
1335 goto nla_put_failure; 1359 goto nla_put_failure;
1336 1360
1337 if (gnet_stats_copy_basic(&d, &q->bstats) < 0 || 1361 if (qdisc_is_percpu_stats(q)) {
1362 cpu_bstats = q->cpu_bstats;
1363 cpu_qstats = q->cpu_qstats;
1364 }
1365
1366 if (gnet_stats_copy_basic(&d, cpu_bstats, &q->bstats) < 0 ||
1338 gnet_stats_copy_rate_est(&d, &q->bstats, &q->rate_est) < 0 || 1367 gnet_stats_copy_rate_est(&d, &q->bstats, &q->rate_est) < 0 ||
1339 gnet_stats_copy_queue(&d, &q->qstats) < 0) 1368 gnet_stats_copy_queue(&d, cpu_qstats, &q->qstats, qlen) < 0)
1340 goto nla_put_failure; 1369 goto nla_put_failure;
1341 1370
1342 if (gnet_stats_finish_copy(&d) < 0) 1371 if (gnet_stats_finish_copy(&d) < 0)
@@ -1781,7 +1810,7 @@ int tc_classify_compat(struct sk_buff *skb, const struct tcf_proto *tp,
1781 __be16 protocol = skb->protocol; 1810 __be16 protocol = skb->protocol;
1782 int err; 1811 int err;
1783 1812
1784 for (; tp; tp = tp->next) { 1813 for (; tp; tp = rcu_dereference_bh(tp->next)) {
1785 if (tp->protocol != protocol && 1814 if (tp->protocol != protocol &&
1786 tp->protocol != htons(ETH_P_ALL)) 1815 tp->protocol != htons(ETH_P_ALL))
1787 continue; 1816 continue;
@@ -1833,15 +1862,15 @@ void tcf_destroy(struct tcf_proto *tp)
1833{ 1862{
1834 tp->ops->destroy(tp); 1863 tp->ops->destroy(tp);
1835 module_put(tp->ops->owner); 1864 module_put(tp->ops->owner);
1836 kfree(tp); 1865 kfree_rcu(tp, rcu);
1837} 1866}
1838 1867
1839void tcf_destroy_chain(struct tcf_proto **fl) 1868void tcf_destroy_chain(struct tcf_proto __rcu **fl)
1840{ 1869{
1841 struct tcf_proto *tp; 1870 struct tcf_proto *tp;
1842 1871
1843 while ((tp = *fl) != NULL) { 1872 while ((tp = rtnl_dereference(*fl)) != NULL) {
1844 *fl = tp->next; 1873 RCU_INIT_POINTER(*fl, tp->next);
1845 tcf_destroy(tp); 1874 tcf_destroy(tp);
1846 } 1875 }
1847} 1876}
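The sch_api.c changes above make qdisc_create() allocate per-CPU byte/packet and queue counters for qdiscs flagged as percpu-stats, and tc_fill_qdisc() hand those per-CPU arrays to gnet_stats_copy_basic()/gnet_stats_copy_queue() so the dump path folds them into a single snapshot. A rough sketch of that fold is below, with an illustrative demo_bstats type standing in for struct gnet_stats_basic_cpu (the real type additionally wraps the counters in u64_stats_sync so 32-bit readers see consistent 64-bit values).

#include <linux/cpumask.h>
#include <linux/percpu.h>
#include <linux/types.h>

struct demo_bstats {
	u64	bytes;
	u64	packets;
};

/* Sum per-CPU counters into one snapshot for dumping. */
static void demo_fold_bstats(const struct demo_bstats __percpu *cpu_bstats,
			     struct demo_bstats *sum)
{
	int cpu;

	sum->bytes = 0;
	sum->packets = 0;

	for_each_possible_cpu(cpu) {
		const struct demo_bstats *bs = per_cpu_ptr(cpu_bstats, cpu);

		sum->bytes += bs->bytes;
		sum->packets += bs->packets;
	}
}

The same aggregation idea appears in the cls_u32 dump earlier in this series, where u32_dump() sums the per-CPU struct tc_u32_pcnt counters into a temporary buffer before nla_put().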
diff --git a/net/sched/sch_atm.c b/net/sched/sch_atm.c
index 8449b337f9e3..e3e2cc5fd068 100644
--- a/net/sched/sch_atm.c
+++ b/net/sched/sch_atm.c
@@ -41,7 +41,7 @@
41 41
42struct atm_flow_data { 42struct atm_flow_data {
43 struct Qdisc *q; /* FIFO, TBF, etc. */ 43 struct Qdisc *q; /* FIFO, TBF, etc. */
44 struct tcf_proto *filter_list; 44 struct tcf_proto __rcu *filter_list;
45 struct atm_vcc *vcc; /* VCC; NULL if VCC is closed */ 45 struct atm_vcc *vcc; /* VCC; NULL if VCC is closed */
46 void (*old_pop)(struct atm_vcc *vcc, 46 void (*old_pop)(struct atm_vcc *vcc,
47 struct sk_buff *skb); /* chaining */ 47 struct sk_buff *skb); /* chaining */
@@ -273,7 +273,7 @@ static int atm_tc_change(struct Qdisc *sch, u32 classid, u32 parent,
273 error = -ENOBUFS; 273 error = -ENOBUFS;
274 goto err_out; 274 goto err_out;
275 } 275 }
276 flow->filter_list = NULL; 276 RCU_INIT_POINTER(flow->filter_list, NULL);
277 flow->q = qdisc_create_dflt(sch->dev_queue, &pfifo_qdisc_ops, classid); 277 flow->q = qdisc_create_dflt(sch->dev_queue, &pfifo_qdisc_ops, classid);
278 if (!flow->q) 278 if (!flow->q)
279 flow->q = &noop_qdisc; 279 flow->q = &noop_qdisc;
@@ -311,7 +311,7 @@ static int atm_tc_delete(struct Qdisc *sch, unsigned long arg)
311 pr_debug("atm_tc_delete(sch %p,[qdisc %p],flow %p)\n", sch, p, flow); 311 pr_debug("atm_tc_delete(sch %p,[qdisc %p],flow %p)\n", sch, p, flow);
312 if (list_empty(&flow->list)) 312 if (list_empty(&flow->list))
313 return -EINVAL; 313 return -EINVAL;
314 if (flow->filter_list || flow == &p->link) 314 if (rcu_access_pointer(flow->filter_list) || flow == &p->link)
315 return -EBUSY; 315 return -EBUSY;
316 /* 316 /*
317 * Reference count must be 2: one for "keepalive" (set at class 317 * Reference count must be 2: one for "keepalive" (set at class
@@ -345,7 +345,8 @@ static void atm_tc_walk(struct Qdisc *sch, struct qdisc_walker *walker)
345 } 345 }
346} 346}
347 347
348static struct tcf_proto **atm_tc_find_tcf(struct Qdisc *sch, unsigned long cl) 348static struct tcf_proto __rcu **atm_tc_find_tcf(struct Qdisc *sch,
349 unsigned long cl)
349{ 350{
350 struct atm_qdisc_data *p = qdisc_priv(sch); 351 struct atm_qdisc_data *p = qdisc_priv(sch);
351 struct atm_flow_data *flow = (struct atm_flow_data *)cl; 352 struct atm_flow_data *flow = (struct atm_flow_data *)cl;
@@ -369,11 +370,12 @@ static int atm_tc_enqueue(struct sk_buff *skb, struct Qdisc *sch)
369 flow = NULL; 370 flow = NULL;
370 if (TC_H_MAJ(skb->priority) != sch->handle || 371 if (TC_H_MAJ(skb->priority) != sch->handle ||
371 !(flow = (struct atm_flow_data *)atm_tc_get(sch, skb->priority))) { 372 !(flow = (struct atm_flow_data *)atm_tc_get(sch, skb->priority))) {
373 struct tcf_proto *fl;
374
372 list_for_each_entry(flow, &p->flows, list) { 375 list_for_each_entry(flow, &p->flows, list) {
373 if (flow->filter_list) { 376 fl = rcu_dereference_bh(flow->filter_list);
374 result = tc_classify_compat(skb, 377 if (fl) {
375 flow->filter_list, 378 result = tc_classify_compat(skb, fl, &res);
376 &res);
377 if (result < 0) 379 if (result < 0)
378 continue; 380 continue;
379 flow = (struct atm_flow_data *)res.class; 381 flow = (struct atm_flow_data *)res.class;
@@ -415,7 +417,7 @@ done:
415 if (ret != NET_XMIT_SUCCESS) { 417 if (ret != NET_XMIT_SUCCESS) {
416drop: __maybe_unused 418drop: __maybe_unused
417 if (net_xmit_drop_count(ret)) { 419 if (net_xmit_drop_count(ret)) {
418 sch->qstats.drops++; 420 qdisc_qstats_drop(sch);
419 if (flow) 421 if (flow)
420 flow->qstats.drops++; 422 flow->qstats.drops++;
421 } 423 }
@@ -544,7 +546,7 @@ static int atm_tc_init(struct Qdisc *sch, struct nlattr *opt)
544 if (!p->link.q) 546 if (!p->link.q)
545 p->link.q = &noop_qdisc; 547 p->link.q = &noop_qdisc;
546 pr_debug("atm_tc_init: link (%p) qdisc %p\n", &p->link, p->link.q); 548 pr_debug("atm_tc_init: link (%p) qdisc %p\n", &p->link, p->link.q);
547 p->link.filter_list = NULL; 549 RCU_INIT_POINTER(p->link.filter_list, NULL);
548 p->link.vcc = NULL; 550 p->link.vcc = NULL;
549 p->link.sock = NULL; 551 p->link.sock = NULL;
550 p->link.classid = sch->handle; 552 p->link.classid = sch->handle;
@@ -635,10 +637,8 @@ atm_tc_dump_class_stats(struct Qdisc *sch, unsigned long arg,
635{ 637{
636 struct atm_flow_data *flow = (struct atm_flow_data *)arg; 638 struct atm_flow_data *flow = (struct atm_flow_data *)arg;
637 639
638 flow->qstats.qlen = flow->q->q.qlen; 640 if (gnet_stats_copy_basic(d, NULL, &flow->bstats) < 0 ||
639 641 gnet_stats_copy_queue(d, NULL, &flow->qstats, flow->q->q.qlen) < 0)
640 if (gnet_stats_copy_basic(d, &flow->bstats) < 0 ||
641 gnet_stats_copy_queue(d, &flow->qstats) < 0)
642 return -1; 642 return -1;
643 643
644 return 0; 644 return 0;
diff --git a/net/sched/sch_cbq.c b/net/sched/sch_cbq.c
index 762a04bb8f6d..beeb75f80fdb 100644
--- a/net/sched/sch_cbq.c
+++ b/net/sched/sch_cbq.c
@@ -133,7 +133,7 @@ struct cbq_class {
133 struct gnet_stats_rate_est64 rate_est; 133 struct gnet_stats_rate_est64 rate_est;
134 struct tc_cbq_xstats xstats; 134 struct tc_cbq_xstats xstats;
135 135
136 struct tcf_proto *filter_list; 136 struct tcf_proto __rcu *filter_list;
137 137
138 int refcnt; 138 int refcnt;
139 int filters; 139 int filters;
@@ -221,6 +221,7 @@ cbq_classify(struct sk_buff *skb, struct Qdisc *sch, int *qerr)
221 struct cbq_class **defmap; 221 struct cbq_class **defmap;
222 struct cbq_class *cl = NULL; 222 struct cbq_class *cl = NULL;
223 u32 prio = skb->priority; 223 u32 prio = skb->priority;
224 struct tcf_proto *fl;
224 struct tcf_result res; 225 struct tcf_result res;
225 226
226 /* 227 /*
@@ -235,11 +236,12 @@ cbq_classify(struct sk_buff *skb, struct Qdisc *sch, int *qerr)
235 int result = 0; 236 int result = 0;
236 defmap = head->defaults; 237 defmap = head->defaults;
237 238
239 fl = rcu_dereference_bh(head->filter_list);
238 /* 240 /*
239 * Step 2+n. Apply classifier. 241 * Step 2+n. Apply classifier.
240 */ 242 */
241 if (!head->filter_list || 243 result = tc_classify_compat(skb, fl, &res);
242 (result = tc_classify_compat(skb, head->filter_list, &res)) < 0) 244 if (!fl || result < 0)
243 goto fallback; 245 goto fallback;
244 246
245 cl = (void *)res.class; 247 cl = (void *)res.class;
@@ -375,7 +377,7 @@ cbq_enqueue(struct sk_buff *skb, struct Qdisc *sch)
375#endif 377#endif
376 if (cl == NULL) { 378 if (cl == NULL) {
377 if (ret & __NET_XMIT_BYPASS) 379 if (ret & __NET_XMIT_BYPASS)
378 sch->qstats.drops++; 380 qdisc_qstats_drop(sch);
379 kfree_skb(skb); 381 kfree_skb(skb);
380 return ret; 382 return ret;
381 } 383 }
@@ -393,7 +395,7 @@ cbq_enqueue(struct sk_buff *skb, struct Qdisc *sch)
393 } 395 }
394 396
395 if (net_xmit_drop_count(ret)) { 397 if (net_xmit_drop_count(ret)) {
396 sch->qstats.drops++; 398 qdisc_qstats_drop(sch);
397 cbq_mark_toplevel(q, cl); 399 cbq_mark_toplevel(q, cl);
398 cl->qstats.drops++; 400 cl->qstats.drops++;
399 } 401 }
@@ -615,7 +617,7 @@ static enum hrtimer_restart cbq_undelay(struct hrtimer *timer)
615 617
616 time = ktime_set(0, 0); 618 time = ktime_set(0, 0);
617 time = ktime_add_ns(time, PSCHED_TICKS2NS(now + delay)); 619 time = ktime_add_ns(time, PSCHED_TICKS2NS(now + delay));
618 hrtimer_start(&q->delay_timer, time, HRTIMER_MODE_ABS); 620 hrtimer_start(&q->delay_timer, time, HRTIMER_MODE_ABS_PINNED);
619 } 621 }
620 622
621 qdisc_unthrottled(sch); 623 qdisc_unthrottled(sch);
@@ -648,11 +650,11 @@ static int cbq_reshape_fail(struct sk_buff *skb, struct Qdisc *child)
648 return 0; 650 return 0;
649 } 651 }
650 if (net_xmit_drop_count(ret)) 652 if (net_xmit_drop_count(ret))
651 sch->qstats.drops++; 653 qdisc_qstats_drop(sch);
652 return 0; 654 return 0;
653 } 655 }
654 656
655 sch->qstats.drops++; 657 qdisc_qstats_drop(sch);
656 return -1; 658 return -1;
657} 659}
658#endif 660#endif
@@ -993,7 +995,7 @@ cbq_dequeue(struct Qdisc *sch)
993 */ 995 */
994 996
995 if (sch->q.qlen) { 997 if (sch->q.qlen) {
996 sch->qstats.overlimits++; 998 qdisc_qstats_overlimit(sch);
997 if (q->wd_expires) 999 if (q->wd_expires)
998 qdisc_watchdog_schedule(&q->watchdog, 1000 qdisc_watchdog_schedule(&q->watchdog,
999 now + q->wd_expires); 1001 now + q->wd_expires);
@@ -1384,7 +1386,7 @@ static int cbq_init(struct Qdisc *sch, struct nlattr *opt)
1384 q->link.minidle = -0x7FFFFFFF; 1386 q->link.minidle = -0x7FFFFFFF;
1385 1387
1386 qdisc_watchdog_init(&q->watchdog, sch); 1388 qdisc_watchdog_init(&q->watchdog, sch);
1387 hrtimer_init(&q->delay_timer, CLOCK_MONOTONIC, HRTIMER_MODE_ABS); 1389 hrtimer_init(&q->delay_timer, CLOCK_MONOTONIC, HRTIMER_MODE_ABS_PINNED);
1388 q->delay_timer.function = cbq_undelay; 1390 q->delay_timer.function = cbq_undelay;
1389 q->toplevel = TC_CBQ_MAXLEVEL; 1391 q->toplevel = TC_CBQ_MAXLEVEL;
1390 q->now = psched_get_time(); 1392 q->now = psched_get_time();
@@ -1592,16 +1594,15 @@ cbq_dump_class_stats(struct Qdisc *sch, unsigned long arg,
1592 struct cbq_sched_data *q = qdisc_priv(sch); 1594 struct cbq_sched_data *q = qdisc_priv(sch);
1593 struct cbq_class *cl = (struct cbq_class *)arg; 1595 struct cbq_class *cl = (struct cbq_class *)arg;
1594 1596
1595 cl->qstats.qlen = cl->q->q.qlen;
1596 cl->xstats.avgidle = cl->avgidle; 1597 cl->xstats.avgidle = cl->avgidle;
1597 cl->xstats.undertime = 0; 1598 cl->xstats.undertime = 0;
1598 1599
1599 if (cl->undertime != PSCHED_PASTPERFECT) 1600 if (cl->undertime != PSCHED_PASTPERFECT)
1600 cl->xstats.undertime = cl->undertime - q->now; 1601 cl->xstats.undertime = cl->undertime - q->now;
1601 1602
1602 if (gnet_stats_copy_basic(d, &cl->bstats) < 0 || 1603 if (gnet_stats_copy_basic(d, NULL, &cl->bstats) < 0 ||
1603 gnet_stats_copy_rate_est(d, &cl->bstats, &cl->rate_est) < 0 || 1604 gnet_stats_copy_rate_est(d, &cl->bstats, &cl->rate_est) < 0 ||
1604 gnet_stats_copy_queue(d, &cl->qstats) < 0) 1605 gnet_stats_copy_queue(d, NULL, &cl->qstats, cl->q->q.qlen) < 0)
1605 return -1; 1606 return -1;
1606 1607
1607 return gnet_stats_copy_app(d, &cl->xstats, sizeof(cl->xstats)); 1608 return gnet_stats_copy_app(d, &cl->xstats, sizeof(cl->xstats));
@@ -1757,7 +1758,8 @@ cbq_change_class(struct Qdisc *sch, u32 classid, u32 parentid, struct nlattr **t
1757 } 1758 }
1758 1759
1759 if (tca[TCA_RATE]) { 1760 if (tca[TCA_RATE]) {
1760 err = gen_replace_estimator(&cl->bstats, &cl->rate_est, 1761 err = gen_replace_estimator(&cl->bstats, NULL,
1762 &cl->rate_est,
1761 qdisc_root_sleeping_lock(sch), 1763 qdisc_root_sleeping_lock(sch),
1762 tca[TCA_RATE]); 1764 tca[TCA_RATE]);
1763 if (err) { 1765 if (err) {
@@ -1850,7 +1852,7 @@ cbq_change_class(struct Qdisc *sch, u32 classid, u32 parentid, struct nlattr **t
1850 goto failure; 1852 goto failure;
1851 1853
1852 if (tca[TCA_RATE]) { 1854 if (tca[TCA_RATE]) {
1853 err = gen_new_estimator(&cl->bstats, &cl->rate_est, 1855 err = gen_new_estimator(&cl->bstats, NULL, &cl->rate_est,
1854 qdisc_root_sleeping_lock(sch), 1856 qdisc_root_sleeping_lock(sch),
1855 tca[TCA_RATE]); 1857 tca[TCA_RATE]);
1856 if (err) { 1858 if (err) {
@@ -1954,7 +1956,8 @@ static int cbq_delete(struct Qdisc *sch, unsigned long arg)
1954 return 0; 1956 return 0;
1955} 1957}
1956 1958
1957static struct tcf_proto **cbq_find_tcf(struct Qdisc *sch, unsigned long arg) 1959static struct tcf_proto __rcu **cbq_find_tcf(struct Qdisc *sch,
1960 unsigned long arg)
1958{ 1961{
1959 struct cbq_sched_data *q = qdisc_priv(sch); 1962 struct cbq_sched_data *q = qdisc_priv(sch);
1960 struct cbq_class *cl = (struct cbq_class *)arg; 1963 struct cbq_class *cl = (struct cbq_class *)arg;
diff --git a/net/sched/sch_choke.c b/net/sched/sch_choke.c
index fb666d1e4de3..c009eb9045ce 100644
--- a/net/sched/sch_choke.c
+++ b/net/sched/sch_choke.c
@@ -57,7 +57,7 @@ struct choke_sched_data {
57 57
58/* Variables */ 58/* Variables */
59 struct red_vars vars; 59 struct red_vars vars;
60 struct tcf_proto *filter_list; 60 struct tcf_proto __rcu *filter_list;
61 struct { 61 struct {
62 u32 prob_drop; /* Early probability drops */ 62 u32 prob_drop; /* Early probability drops */
63 u32 prob_mark; /* Early probability marks */ 63 u32 prob_mark; /* Early probability marks */
@@ -127,7 +127,7 @@ static void choke_drop_by_idx(struct Qdisc *sch, unsigned int idx)
127 if (idx == q->tail) 127 if (idx == q->tail)
128 choke_zap_tail_holes(q); 128 choke_zap_tail_holes(q);
129 129
130 sch->qstats.backlog -= qdisc_pkt_len(skb); 130 qdisc_qstats_backlog_dec(sch, skb);
131 qdisc_drop(skb, sch); 131 qdisc_drop(skb, sch);
132 qdisc_tree_decrease_qlen(sch, 1); 132 qdisc_tree_decrease_qlen(sch, 1);
133 --sch->q.qlen; 133 --sch->q.qlen;
@@ -203,9 +203,11 @@ static bool choke_classify(struct sk_buff *skb,
203{ 203{
204 struct choke_sched_data *q = qdisc_priv(sch); 204 struct choke_sched_data *q = qdisc_priv(sch);
205 struct tcf_result res; 205 struct tcf_result res;
206 struct tcf_proto *fl;
206 int result; 207 int result;
207 208
208 result = tc_classify(skb, q->filter_list, &res); 209 fl = rcu_dereference_bh(q->filter_list);
210 result = tc_classify(skb, fl, &res);
209 if (result >= 0) { 211 if (result >= 0) {
210#ifdef CONFIG_NET_CLS_ACT 212#ifdef CONFIG_NET_CLS_ACT
211 switch (result) { 213 switch (result) {
@@ -259,7 +261,7 @@ static bool choke_match_random(const struct choke_sched_data *q,
259 return false; 261 return false;
260 262
261 oskb = choke_peek_random(q, pidx); 263 oskb = choke_peek_random(q, pidx);
262 if (q->filter_list) 264 if (rcu_access_pointer(q->filter_list))
263 return choke_get_classid(nskb) == choke_get_classid(oskb); 265 return choke_get_classid(nskb) == choke_get_classid(oskb);
264 266
265 return choke_match_flow(oskb, nskb); 267 return choke_match_flow(oskb, nskb);
@@ -267,11 +269,11 @@ static bool choke_match_random(const struct choke_sched_data *q,
267 269
268static int choke_enqueue(struct sk_buff *skb, struct Qdisc *sch) 270static int choke_enqueue(struct sk_buff *skb, struct Qdisc *sch)
269{ 271{
272 int ret = NET_XMIT_SUCCESS | __NET_XMIT_BYPASS;
270 struct choke_sched_data *q = qdisc_priv(sch); 273 struct choke_sched_data *q = qdisc_priv(sch);
271 const struct red_parms *p = &q->parms; 274 const struct red_parms *p = &q->parms;
272 int ret = NET_XMIT_SUCCESS | __NET_XMIT_BYPASS;
273 275
274 if (q->filter_list) { 276 if (rcu_access_pointer(q->filter_list)) {
275 /* If using external classifiers, get result and record it. */ 277 /* If using external classifiers, get result and record it. */
276 if (!choke_classify(skb, sch, &ret)) 278 if (!choke_classify(skb, sch, &ret))
277 goto other_drop; /* Packet was eaten by filter */ 279 goto other_drop; /* Packet was eaten by filter */
@@ -300,7 +302,7 @@ static int choke_enqueue(struct sk_buff *skb, struct Qdisc *sch)
300 if (q->vars.qavg > p->qth_max) { 302 if (q->vars.qavg > p->qth_max) {
301 q->vars.qcount = -1; 303 q->vars.qcount = -1;
302 304
303 sch->qstats.overlimits++; 305 qdisc_qstats_overlimit(sch);
304 if (use_harddrop(q) || !use_ecn(q) || 306 if (use_harddrop(q) || !use_ecn(q) ||
305 !INET_ECN_set_ce(skb)) { 307 !INET_ECN_set_ce(skb)) {
306 q->stats.forced_drop++; 308 q->stats.forced_drop++;
@@ -313,7 +315,7 @@ static int choke_enqueue(struct sk_buff *skb, struct Qdisc *sch)
313 q->vars.qcount = 0; 315 q->vars.qcount = 0;
314 q->vars.qR = red_random(p); 316 q->vars.qR = red_random(p);
315 317
316 sch->qstats.overlimits++; 318 qdisc_qstats_overlimit(sch);
317 if (!use_ecn(q) || !INET_ECN_set_ce(skb)) { 319 if (!use_ecn(q) || !INET_ECN_set_ce(skb)) {
318 q->stats.prob_drop++; 320 q->stats.prob_drop++;
319 goto congestion_drop; 321 goto congestion_drop;
@@ -330,7 +332,7 @@ static int choke_enqueue(struct sk_buff *skb, struct Qdisc *sch)
330 q->tab[q->tail] = skb; 332 q->tab[q->tail] = skb;
331 q->tail = (q->tail + 1) & q->tab_mask; 333 q->tail = (q->tail + 1) & q->tab_mask;
332 ++sch->q.qlen; 334 ++sch->q.qlen;
333 sch->qstats.backlog += qdisc_pkt_len(skb); 335 qdisc_qstats_backlog_inc(sch, skb);
334 return NET_XMIT_SUCCESS; 336 return NET_XMIT_SUCCESS;
335 } 337 }
336 338
@@ -343,7 +345,7 @@ congestion_drop:
343 345
344other_drop: 346other_drop:
345 if (ret & __NET_XMIT_BYPASS) 347 if (ret & __NET_XMIT_BYPASS)
346 sch->qstats.drops++; 348 qdisc_qstats_drop(sch);
347 kfree_skb(skb); 349 kfree_skb(skb);
348 return ret; 350 return ret;
349} 351}
@@ -363,7 +365,7 @@ static struct sk_buff *choke_dequeue(struct Qdisc *sch)
363 q->tab[q->head] = NULL; 365 q->tab[q->head] = NULL;
364 choke_zap_head_holes(q); 366 choke_zap_head_holes(q);
365 --sch->q.qlen; 367 --sch->q.qlen;
366 sch->qstats.backlog -= qdisc_pkt_len(skb); 368 qdisc_qstats_backlog_dec(sch, skb);
367 qdisc_bstats_update(sch, skb); 369 qdisc_bstats_update(sch, skb);
368 370
369 return skb; 371 return skb;
@@ -458,7 +460,7 @@ static int choke_change(struct Qdisc *sch, struct nlattr *opt)
458 ntab[tail++] = skb; 460 ntab[tail++] = skb;
459 continue; 461 continue;
460 } 462 }
461 sch->qstats.backlog -= qdisc_pkt_len(skb); 463 qdisc_qstats_backlog_dec(sch, skb);
462 --sch->q.qlen; 464 --sch->q.qlen;
463 qdisc_drop(skb, sch); 465 qdisc_drop(skb, sch);
464 } 466 }
@@ -564,7 +566,8 @@ static unsigned long choke_bind(struct Qdisc *sch, unsigned long parent,
564 return 0; 566 return 0;
565} 567}
566 568
567static struct tcf_proto **choke_find_tcf(struct Qdisc *sch, unsigned long cl) 569static struct tcf_proto __rcu **choke_find_tcf(struct Qdisc *sch,
570 unsigned long cl)
568{ 571{
569 struct choke_sched_data *q = qdisc_priv(sch); 572 struct choke_sched_data *q = qdisc_priv(sch);
570 573
diff --git a/net/sched/sch_codel.c b/net/sched/sch_codel.c
index 2f9ab17db85a..de28f8e968e8 100644
--- a/net/sched/sch_codel.c
+++ b/net/sched/sch_codel.c
@@ -149,7 +149,7 @@ static int codel_change(struct Qdisc *sch, struct nlattr *opt)
149 while (sch->q.qlen > sch->limit) { 149 while (sch->q.qlen > sch->limit) {
150 struct sk_buff *skb = __skb_dequeue(&sch->q); 150 struct sk_buff *skb = __skb_dequeue(&sch->q);
151 151
152 sch->qstats.backlog -= qdisc_pkt_len(skb); 152 qdisc_qstats_backlog_dec(sch, skb);
153 qdisc_drop(skb, sch); 153 qdisc_drop(skb, sch);
154 } 154 }
155 qdisc_tree_decrease_qlen(sch, qlen - sch->q.qlen); 155 qdisc_tree_decrease_qlen(sch, qlen - sch->q.qlen);
diff --git a/net/sched/sch_drr.c b/net/sched/sch_drr.c
index 7bbbfe112192..338706092c27 100644
--- a/net/sched/sch_drr.c
+++ b/net/sched/sch_drr.c
@@ -35,7 +35,7 @@ struct drr_class {
35 35
36struct drr_sched { 36struct drr_sched {
37 struct list_head active; 37 struct list_head active;
38 struct tcf_proto *filter_list; 38 struct tcf_proto __rcu *filter_list;
39 struct Qdisc_class_hash clhash; 39 struct Qdisc_class_hash clhash;
40}; 40};
41 41
@@ -88,7 +88,8 @@ static int drr_change_class(struct Qdisc *sch, u32 classid, u32 parentid,
88 88
89 if (cl != NULL) { 89 if (cl != NULL) {
90 if (tca[TCA_RATE]) { 90 if (tca[TCA_RATE]) {
91 err = gen_replace_estimator(&cl->bstats, &cl->rate_est, 91 err = gen_replace_estimator(&cl->bstats, NULL,
92 &cl->rate_est,
92 qdisc_root_sleeping_lock(sch), 93 qdisc_root_sleeping_lock(sch),
93 tca[TCA_RATE]); 94 tca[TCA_RATE]);
94 if (err) 95 if (err)
@@ -116,7 +117,7 @@ static int drr_change_class(struct Qdisc *sch, u32 classid, u32 parentid,
116 cl->qdisc = &noop_qdisc; 117 cl->qdisc = &noop_qdisc;
117 118
118 if (tca[TCA_RATE]) { 119 if (tca[TCA_RATE]) {
119 err = gen_replace_estimator(&cl->bstats, &cl->rate_est, 120 err = gen_replace_estimator(&cl->bstats, NULL, &cl->rate_est,
120 qdisc_root_sleeping_lock(sch), 121 qdisc_root_sleeping_lock(sch),
121 tca[TCA_RATE]); 122 tca[TCA_RATE]);
122 if (err) { 123 if (err) {
@@ -184,7 +185,8 @@ static void drr_put_class(struct Qdisc *sch, unsigned long arg)
184 drr_destroy_class(sch, cl); 185 drr_destroy_class(sch, cl);
185} 186}
186 187
187static struct tcf_proto **drr_tcf_chain(struct Qdisc *sch, unsigned long cl) 188static struct tcf_proto __rcu **drr_tcf_chain(struct Qdisc *sch,
189 unsigned long cl)
188{ 190{
189 struct drr_sched *q = qdisc_priv(sch); 191 struct drr_sched *q = qdisc_priv(sch);
190 192
@@ -273,17 +275,16 @@ static int drr_dump_class_stats(struct Qdisc *sch, unsigned long arg,
273 struct gnet_dump *d) 275 struct gnet_dump *d)
274{ 276{
275 struct drr_class *cl = (struct drr_class *)arg; 277 struct drr_class *cl = (struct drr_class *)arg;
278 __u32 qlen = cl->qdisc->q.qlen;
276 struct tc_drr_stats xstats; 279 struct tc_drr_stats xstats;
277 280
278 memset(&xstats, 0, sizeof(xstats)); 281 memset(&xstats, 0, sizeof(xstats));
279 if (cl->qdisc->q.qlen) { 282 if (qlen)
280 xstats.deficit = cl->deficit; 283 xstats.deficit = cl->deficit;
281 cl->qdisc->qstats.qlen = cl->qdisc->q.qlen;
282 }
283 284
284 if (gnet_stats_copy_basic(d, &cl->bstats) < 0 || 285 if (gnet_stats_copy_basic(d, NULL, &cl->bstats) < 0 ||
285 gnet_stats_copy_rate_est(d, &cl->bstats, &cl->rate_est) < 0 || 286 gnet_stats_copy_rate_est(d, &cl->bstats, &cl->rate_est) < 0 ||
286 gnet_stats_copy_queue(d, &cl->qdisc->qstats) < 0) 287 gnet_stats_copy_queue(d, NULL, &cl->qdisc->qstats, qlen) < 0)
287 return -1; 288 return -1;
288 289
289 return gnet_stats_copy_app(d, &xstats, sizeof(xstats)); 290 return gnet_stats_copy_app(d, &xstats, sizeof(xstats));
@@ -319,6 +320,7 @@ static struct drr_class *drr_classify(struct sk_buff *skb, struct Qdisc *sch,
319 struct drr_sched *q = qdisc_priv(sch); 320 struct drr_sched *q = qdisc_priv(sch);
320 struct drr_class *cl; 321 struct drr_class *cl;
321 struct tcf_result res; 322 struct tcf_result res;
323 struct tcf_proto *fl;
322 int result; 324 int result;
323 325
324 if (TC_H_MAJ(skb->priority ^ sch->handle) == 0) { 326 if (TC_H_MAJ(skb->priority ^ sch->handle) == 0) {
@@ -328,7 +330,8 @@ static struct drr_class *drr_classify(struct sk_buff *skb, struct Qdisc *sch,
328 } 330 }
329 331
330 *qerr = NET_XMIT_SUCCESS | __NET_XMIT_BYPASS; 332 *qerr = NET_XMIT_SUCCESS | __NET_XMIT_BYPASS;
331 result = tc_classify(skb, q->filter_list, &res); 333 fl = rcu_dereference_bh(q->filter_list);
334 result = tc_classify(skb, fl, &res);
332 if (result >= 0) { 335 if (result >= 0) {
333#ifdef CONFIG_NET_CLS_ACT 336#ifdef CONFIG_NET_CLS_ACT
334 switch (result) { 337 switch (result) {
@@ -356,7 +359,7 @@ static int drr_enqueue(struct sk_buff *skb, struct Qdisc *sch)
356 cl = drr_classify(skb, sch, &err); 359 cl = drr_classify(skb, sch, &err);
357 if (cl == NULL) { 360 if (cl == NULL) {
358 if (err & __NET_XMIT_BYPASS) 361 if (err & __NET_XMIT_BYPASS)
359 sch->qstats.drops++; 362 qdisc_qstats_drop(sch);
360 kfree_skb(skb); 363 kfree_skb(skb);
361 return err; 364 return err;
362 } 365 }
@@ -365,7 +368,7 @@ static int drr_enqueue(struct sk_buff *skb, struct Qdisc *sch)
365 if (unlikely(err != NET_XMIT_SUCCESS)) { 368 if (unlikely(err != NET_XMIT_SUCCESS)) {
366 if (net_xmit_drop_count(err)) { 369 if (net_xmit_drop_count(err)) {
367 cl->qstats.drops++; 370 cl->qstats.drops++;
368 sch->qstats.drops++; 371 qdisc_qstats_drop(sch);
369 } 372 }
370 return err; 373 return err;
371 } 374 }
diff --git a/net/sched/sch_dsmark.c b/net/sched/sch_dsmark.c
index 49d6ef338b55..227114f27f94 100644
--- a/net/sched/sch_dsmark.c
+++ b/net/sched/sch_dsmark.c
@@ -37,7 +37,7 @@
37 37
38struct dsmark_qdisc_data { 38struct dsmark_qdisc_data {
39 struct Qdisc *q; 39 struct Qdisc *q;
40 struct tcf_proto *filter_list; 40 struct tcf_proto __rcu *filter_list;
41 u8 *mask; /* "owns" the array */ 41 u8 *mask; /* "owns" the array */
42 u8 *value; 42 u8 *value;
43 u16 indices; 43 u16 indices;
@@ -186,8 +186,8 @@ ignore:
186 } 186 }
187} 187}
188 188
189static inline struct tcf_proto **dsmark_find_tcf(struct Qdisc *sch, 189static inline struct tcf_proto __rcu **dsmark_find_tcf(struct Qdisc *sch,
190 unsigned long cl) 190 unsigned long cl)
191{ 191{
192 struct dsmark_qdisc_data *p = qdisc_priv(sch); 192 struct dsmark_qdisc_data *p = qdisc_priv(sch);
193 return &p->filter_list; 193 return &p->filter_list;
@@ -229,7 +229,8 @@ static int dsmark_enqueue(struct sk_buff *skb, struct Qdisc *sch)
229 skb->tc_index = TC_H_MIN(skb->priority); 229 skb->tc_index = TC_H_MIN(skb->priority);
230 else { 230 else {
231 struct tcf_result res; 231 struct tcf_result res;
232 int result = tc_classify(skb, p->filter_list, &res); 232 struct tcf_proto *fl = rcu_dereference_bh(p->filter_list);
233 int result = tc_classify(skb, fl, &res);
233 234
234 pr_debug("result %d class 0x%04x\n", result, res.classid); 235 pr_debug("result %d class 0x%04x\n", result, res.classid);
235 236
@@ -257,7 +258,7 @@ static int dsmark_enqueue(struct sk_buff *skb, struct Qdisc *sch)
257 err = qdisc_enqueue(skb, p->q); 258 err = qdisc_enqueue(skb, p->q);
258 if (err != NET_XMIT_SUCCESS) { 259 if (err != NET_XMIT_SUCCESS) {
259 if (net_xmit_drop_count(err)) 260 if (net_xmit_drop_count(err))
260 sch->qstats.drops++; 261 qdisc_qstats_drop(sch);
261 return err; 262 return err;
262 } 263 }
263 264
diff --git a/net/sched/sch_fifo.c b/net/sched/sch_fifo.c
index e15a9eb29087..2e2398cfc694 100644
--- a/net/sched/sch_fifo.c
+++ b/net/sched/sch_fifo.c
@@ -42,7 +42,7 @@ static int pfifo_tail_enqueue(struct sk_buff *skb, struct Qdisc *sch)
42 42
43 /* queue full, remove one skb to fulfill the limit */ 43 /* queue full, remove one skb to fulfill the limit */
44 __qdisc_queue_drop_head(sch, &sch->q); 44 __qdisc_queue_drop_head(sch, &sch->q);
45 sch->qstats.drops++; 45 qdisc_qstats_drop(sch);
46 qdisc_enqueue_tail(skb, sch); 46 qdisc_enqueue_tail(skb, sch);
47 47
48 return NET_XMIT_CN; 48 return NET_XMIT_CN;
diff --git a/net/sched/sch_fq.c b/net/sched/sch_fq.c
index ba32c2b005d0..cbd7e1fd23b4 100644
--- a/net/sched/sch_fq.c
+++ b/net/sched/sch_fq.c
@@ -290,7 +290,7 @@ static struct sk_buff *fq_dequeue_head(struct Qdisc *sch, struct fq_flow *flow)
290 flow->head = skb->next; 290 flow->head = skb->next;
291 skb->next = NULL; 291 skb->next = NULL;
292 flow->qlen--; 292 flow->qlen--;
293 sch->qstats.backlog -= qdisc_pkt_len(skb); 293 qdisc_qstats_backlog_dec(sch, skb);
294 sch->q.qlen--; 294 sch->q.qlen--;
295 } 295 }
296 return skb; 296 return skb;
@@ -371,13 +371,12 @@ static int fq_enqueue(struct sk_buff *skb, struct Qdisc *sch)
371 f->qlen++; 371 f->qlen++;
372 if (skb_is_retransmit(skb)) 372 if (skb_is_retransmit(skb))
373 q->stat_tcp_retrans++; 373 q->stat_tcp_retrans++;
374 sch->qstats.backlog += qdisc_pkt_len(skb); 374 qdisc_qstats_backlog_inc(sch, skb);
375 if (fq_flow_is_detached(f)) { 375 if (fq_flow_is_detached(f)) {
376 fq_flow_add_tail(&q->new_flows, f); 376 fq_flow_add_tail(&q->new_flows, f);
377 if (time_after(jiffies, f->age + q->flow_refill_delay)) 377 if (time_after(jiffies, f->age + q->flow_refill_delay))
378 f->credit = max_t(u32, f->credit, q->quantum); 378 f->credit = max_t(u32, f->credit, q->quantum);
379 q->inactive_flows--; 379 q->inactive_flows--;
380 qdisc_unthrottled(sch);
381 } 380 }
382 381
383 /* Note: this overwrites f->age */ 382 /* Note: this overwrites f->age */
@@ -385,7 +384,6 @@ static int fq_enqueue(struct sk_buff *skb, struct Qdisc *sch)
385 384
386 if (unlikely(f == &q->internal)) { 385 if (unlikely(f == &q->internal)) {
387 q->stat_internal_packets++; 386 q->stat_internal_packets++;
388 qdisc_unthrottled(sch);
389 } 387 }
390 sch->q.qlen++; 388 sch->q.qlen++;
391 389
@@ -416,7 +414,7 @@ static void fq_check_throttled(struct fq_sched_data *q, u64 now)
416static struct sk_buff *fq_dequeue(struct Qdisc *sch) 414static struct sk_buff *fq_dequeue(struct Qdisc *sch)
417{ 415{
418 struct fq_sched_data *q = qdisc_priv(sch); 416 struct fq_sched_data *q = qdisc_priv(sch);
419 u64 now = ktime_to_ns(ktime_get()); 417 u64 now = ktime_get_ns();
420 struct fq_flow_head *head; 418 struct fq_flow_head *head;
421 struct sk_buff *skb; 419 struct sk_buff *skb;
422 struct fq_flow *f; 420 struct fq_flow *f;
@@ -433,7 +431,8 @@ begin:
433 if (!head->first) { 431 if (!head->first) {
434 if (q->time_next_delayed_flow != ~0ULL) 432 if (q->time_next_delayed_flow != ~0ULL)
435 qdisc_watchdog_schedule_ns(&q->watchdog, 433 qdisc_watchdog_schedule_ns(&q->watchdog,
436 q->time_next_delayed_flow); 434 q->time_next_delayed_flow,
435 false);
437 return NULL; 436 return NULL;
438 } 437 }
439 } 438 }
@@ -495,7 +494,6 @@ begin:
495 } 494 }
496out: 495out:
497 qdisc_bstats_update(sch, skb); 496 qdisc_bstats_update(sch, skb);
498 qdisc_unthrottled(sch);
499 return skb; 497 return skb;
500} 498}
501 499
@@ -787,7 +785,7 @@ nla_put_failure:
787static int fq_dump_stats(struct Qdisc *sch, struct gnet_dump *d) 785static int fq_dump_stats(struct Qdisc *sch, struct gnet_dump *d)
788{ 786{
789 struct fq_sched_data *q = qdisc_priv(sch); 787 struct fq_sched_data *q = qdisc_priv(sch);
790 u64 now = ktime_to_ns(ktime_get()); 788 u64 now = ktime_get_ns();
791 struct tc_fq_qd_stats st = { 789 struct tc_fq_qd_stats st = {
792 .gc_flows = q->stat_gc_flows, 790 .gc_flows = q->stat_gc_flows,
793 .highprio_packets = q->stat_internal_packets, 791 .highprio_packets = q->stat_internal_packets,
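
Note: besides adopting the backlog helpers, sch_fq.c drops its qdisc_unthrottled() calls (the watchdog call now carries an explicit throttle flag, false here) and replaces ktime_to_ns(ktime_get()) with ktime_get_ns(). The two clock reads are equivalent; the shorthand presumably reduces to (sketch):

    static inline u64 ktime_get_ns(void)
    {
            return ktime_to_ns(ktime_get());        /* monotonic time in nanoseconds */
    }
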
diff --git a/net/sched/sch_fq_codel.c b/net/sched/sch_fq_codel.c
index 063b726bf1f8..b9ca32ebc1de 100644
--- a/net/sched/sch_fq_codel.c
+++ b/net/sched/sch_fq_codel.c
@@ -52,7 +52,7 @@ struct fq_codel_flow {
52}; /* please try to keep this structure <= 64 bytes */ 52}; /* please try to keep this structure <= 64 bytes */
53 53
54struct fq_codel_sched_data { 54struct fq_codel_sched_data {
55 struct tcf_proto *filter_list; /* optional external classifier */ 55 struct tcf_proto __rcu *filter_list; /* optional external classifier */
56 struct fq_codel_flow *flows; /* Flows table [flows_cnt] */ 56 struct fq_codel_flow *flows; /* Flows table [flows_cnt] */
57 u32 *backlogs; /* backlog table [flows_cnt] */ 57 u32 *backlogs; /* backlog table [flows_cnt] */
58 u32 flows_cnt; /* number of flows */ 58 u32 flows_cnt; /* number of flows */
@@ -77,13 +77,15 @@ static unsigned int fq_codel_hash(const struct fq_codel_sched_data *q,
77 hash = jhash_3words((__force u32)keys.dst, 77 hash = jhash_3words((__force u32)keys.dst,
78 (__force u32)keys.src ^ keys.ip_proto, 78 (__force u32)keys.src ^ keys.ip_proto,
79 (__force u32)keys.ports, q->perturbation); 79 (__force u32)keys.ports, q->perturbation);
80 return ((u64)hash * q->flows_cnt) >> 32; 80
81 return reciprocal_scale(hash, q->flows_cnt);
81} 82}
82 83
83static unsigned int fq_codel_classify(struct sk_buff *skb, struct Qdisc *sch, 84static unsigned int fq_codel_classify(struct sk_buff *skb, struct Qdisc *sch,
84 int *qerr) 85 int *qerr)
85{ 86{
86 struct fq_codel_sched_data *q = qdisc_priv(sch); 87 struct fq_codel_sched_data *q = qdisc_priv(sch);
88 struct tcf_proto *filter;
87 struct tcf_result res; 89 struct tcf_result res;
88 int result; 90 int result;
89 91
@@ -92,11 +94,12 @@ static unsigned int fq_codel_classify(struct sk_buff *skb, struct Qdisc *sch,
92 TC_H_MIN(skb->priority) <= q->flows_cnt) 94 TC_H_MIN(skb->priority) <= q->flows_cnt)
93 return TC_H_MIN(skb->priority); 95 return TC_H_MIN(skb->priority);
94 96
95 if (!q->filter_list) 97 filter = rcu_dereference(q->filter_list);
98 if (!filter)
96 return fq_codel_hash(q, skb) + 1; 99 return fq_codel_hash(q, skb) + 1;
97 100
98 *qerr = NET_XMIT_SUCCESS | __NET_XMIT_BYPASS; 101 *qerr = NET_XMIT_SUCCESS | __NET_XMIT_BYPASS;
99 result = tc_classify(skb, q->filter_list, &res); 102 result = tc_classify(skb, filter, &res);
100 if (result >= 0) { 103 if (result >= 0) {
101#ifdef CONFIG_NET_CLS_ACT 104#ifdef CONFIG_NET_CLS_ACT
102 switch (result) { 105 switch (result) {
@@ -161,8 +164,8 @@ static unsigned int fq_codel_drop(struct Qdisc *sch)
161 q->backlogs[idx] -= len; 164 q->backlogs[idx] -= len;
162 kfree_skb(skb); 165 kfree_skb(skb);
163 sch->q.qlen--; 166 sch->q.qlen--;
164 sch->qstats.drops++; 167 qdisc_qstats_drop(sch);
165 sch->qstats.backlog -= len; 168 qdisc_qstats_backlog_dec(sch, skb);
166 flow->dropped++; 169 flow->dropped++;
167 return idx; 170 return idx;
168} 171}
@@ -177,7 +180,7 @@ static int fq_codel_enqueue(struct sk_buff *skb, struct Qdisc *sch)
177 idx = fq_codel_classify(skb, sch, &ret); 180 idx = fq_codel_classify(skb, sch, &ret);
178 if (idx == 0) { 181 if (idx == 0) {
179 if (ret & __NET_XMIT_BYPASS) 182 if (ret & __NET_XMIT_BYPASS)
180 sch->qstats.drops++; 183 qdisc_qstats_drop(sch);
181 kfree_skb(skb); 184 kfree_skb(skb);
182 return ret; 185 return ret;
183 } 186 }
@@ -187,7 +190,7 @@ static int fq_codel_enqueue(struct sk_buff *skb, struct Qdisc *sch)
187 flow = &q->flows[idx]; 190 flow = &q->flows[idx];
188 flow_queue_add(flow, skb); 191 flow_queue_add(flow, skb);
189 q->backlogs[idx] += qdisc_pkt_len(skb); 192 q->backlogs[idx] += qdisc_pkt_len(skb);
190 sch->qstats.backlog += qdisc_pkt_len(skb); 193 qdisc_qstats_backlog_inc(sch, skb);
191 194
192 if (list_empty(&flow->flowchain)) { 195 if (list_empty(&flow->flowchain)) {
193 list_add_tail(&flow->flowchain, &q->new_flows); 196 list_add_tail(&flow->flowchain, &q->new_flows);
@@ -495,7 +498,8 @@ static void fq_codel_put(struct Qdisc *q, unsigned long cl)
495{ 498{
496} 499}
497 500
498static struct tcf_proto **fq_codel_find_tcf(struct Qdisc *sch, unsigned long cl) 501static struct tcf_proto __rcu **fq_codel_find_tcf(struct Qdisc *sch,
502 unsigned long cl)
499{ 503{
500 struct fq_codel_sched_data *q = qdisc_priv(sch); 504 struct fq_codel_sched_data *q = qdisc_priv(sch);
501 505
@@ -546,7 +550,7 @@ static int fq_codel_dump_class_stats(struct Qdisc *sch, unsigned long cl,
546 qs.backlog = q->backlogs[idx]; 550 qs.backlog = q->backlogs[idx];
547 qs.drops = flow->dropped; 551 qs.drops = flow->dropped;
548 } 552 }
549 if (gnet_stats_copy_queue(d, &qs) < 0) 553 if (gnet_stats_copy_queue(d, NULL, &qs, 0) < 0)
550 return -1; 554 return -1;
551 if (idx < q->flows_cnt) 555 if (idx < q->flows_cnt)
552 return gnet_stats_copy_app(d, &xstats, sizeof(xstats)); 556 return gnet_stats_copy_app(d, &xstats, sizeof(xstats));
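
Note: in sch_fq_codel.c the open-coded ((u64)hash * q->flows_cnt) >> 32 becomes reciprocal_scale(), which maps a 32-bit hash uniformly onto [0, flows_cnt) without a division. A sketch of what the helper boils down to, equivalent to the expression it replaces:

    static inline u32 reciprocal_scale(u32 val, u32 ep_ro)
    {
            return (u32)(((u64)val * ep_ro) >> 32);
    }
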
diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index fc04fe93c2da..38d58e6cef07 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -47,7 +47,6 @@ EXPORT_SYMBOL(default_qdisc_ops);
47 47
48static inline int dev_requeue_skb(struct sk_buff *skb, struct Qdisc *q) 48static inline int dev_requeue_skb(struct sk_buff *skb, struct Qdisc *q)
49{ 49{
50 skb_dst_force(skb);
51 q->gso_skb = skb; 50 q->gso_skb = skb;
52 q->qstats.requeues++; 51 q->qstats.requeues++;
53 q->q.qlen++; /* it's still part of the queue */ 52 q->q.qlen++; /* it's still part of the queue */
@@ -56,24 +55,52 @@ static inline int dev_requeue_skb(struct sk_buff *skb, struct Qdisc *q)
56 return 0; 55 return 0;
57} 56}
58 57
59static inline struct sk_buff *dequeue_skb(struct Qdisc *q) 58static void try_bulk_dequeue_skb(struct Qdisc *q,
59 struct sk_buff *skb,
60 const struct netdev_queue *txq)
61{
62 int bytelimit = qdisc_avail_bulklimit(txq) - skb->len;
63
64 while (bytelimit > 0) {
65 struct sk_buff *nskb = q->dequeue(q);
66
67 if (!nskb)
68 break;
69
70 bytelimit -= nskb->len; /* covers GSO len */
71 skb->next = nskb;
72 skb = nskb;
73 }
74 skb->next = NULL;
75}
76
77/* Note that dequeue_skb can possibly return a SKB list (via skb->next).
78 * A requeued skb (via q->gso_skb) can also be a SKB list.
79 */
80static struct sk_buff *dequeue_skb(struct Qdisc *q, bool *validate)
60{ 81{
61 struct sk_buff *skb = q->gso_skb; 82 struct sk_buff *skb = q->gso_skb;
62 const struct netdev_queue *txq = q->dev_queue; 83 const struct netdev_queue *txq = q->dev_queue;
63 84
85 *validate = true;
64 if (unlikely(skb)) { 86 if (unlikely(skb)) {
65 /* check the reason of requeuing without tx lock first */ 87 /* check the reason of requeuing without tx lock first */
66 txq = netdev_get_tx_queue(txq->dev, skb_get_queue_mapping(skb)); 88 txq = skb_get_tx_queue(txq->dev, skb);
67 if (!netif_xmit_frozen_or_stopped(txq)) { 89 if (!netif_xmit_frozen_or_stopped(txq)) {
68 q->gso_skb = NULL; 90 q->gso_skb = NULL;
69 q->q.qlen--; 91 q->q.qlen--;
70 } else 92 } else
71 skb = NULL; 93 skb = NULL;
94 /* skb in gso_skb were already validated */
95 *validate = false;
72 } else { 96 } else {
73 if (!(q->flags & TCQ_F_ONETXQUEUE) || !netif_xmit_frozen_or_stopped(txq)) 97 if (!(q->flags & TCQ_F_ONETXQUEUE) ||
98 !netif_xmit_frozen_or_stopped(txq)) {
74 skb = q->dequeue(q); 99 skb = q->dequeue(q);
100 if (skb && qdisc_may_bulk(q))
101 try_bulk_dequeue_skb(q, skb, txq);
102 }
75 } 103 }
76
77 return skb; 104 return skb;
78} 105}
79 106
@@ -90,7 +117,7 @@ static inline int handle_dev_cpu_collision(struct sk_buff *skb,
90 * detect it by checking xmit owner and drop the packet when 117 * detect it by checking xmit owner and drop the packet when
91 * deadloop is detected. Return OK to try the next skb. 118 * deadloop is detected. Return OK to try the next skb.
92 */ 119 */
93 kfree_skb(skb); 120 kfree_skb_list(skb);
94 net_warn_ratelimited("Dead loop on netdevice %s, fix it urgently!\n", 121 net_warn_ratelimited("Dead loop on netdevice %s, fix it urgently!\n",
95 dev_queue->dev->name); 122 dev_queue->dev->name);
96 ret = qdisc_qlen(q); 123 ret = qdisc_qlen(q);
@@ -107,9 +134,9 @@ static inline int handle_dev_cpu_collision(struct sk_buff *skb,
107} 134}
108 135
109/* 136/*
110 * Transmit one skb, and handle the return status as required. Holding the 137 * Transmit possibly several skbs, and handle the return status as
111 * __QDISC___STATE_RUNNING bit guarantees that only one CPU can execute this 138 * required. Holding the __QDISC___STATE_RUNNING bit guarantees that
112 * function. 139 * only one CPU can execute this function.
113 * 140 *
114 * Returns to the caller: 141 * Returns to the caller:
115 * 0 - queue is empty or throttled. 142 * 0 - queue is empty or throttled.
@@ -117,19 +144,24 @@ static inline int handle_dev_cpu_collision(struct sk_buff *skb,
117 */ 144 */
118int sch_direct_xmit(struct sk_buff *skb, struct Qdisc *q, 145int sch_direct_xmit(struct sk_buff *skb, struct Qdisc *q,
119 struct net_device *dev, struct netdev_queue *txq, 146 struct net_device *dev, struct netdev_queue *txq,
120 spinlock_t *root_lock) 147 spinlock_t *root_lock, bool validate)
121{ 148{
122 int ret = NETDEV_TX_BUSY; 149 int ret = NETDEV_TX_BUSY;
123 150
124 /* And release qdisc */ 151 /* And release qdisc */
125 spin_unlock(root_lock); 152 spin_unlock(root_lock);
126 153
127 HARD_TX_LOCK(dev, txq, smp_processor_id()); 154 /* Note that we validate skb (GSO, checksum, ...) outside of locks */
128 if (!netif_xmit_frozen_or_stopped(txq)) 155 if (validate)
129 ret = dev_hard_start_xmit(skb, dev, txq); 156 skb = validate_xmit_skb_list(skb, dev);
130 157
131 HARD_TX_UNLOCK(dev, txq); 158 if (skb) {
159 HARD_TX_LOCK(dev, txq, smp_processor_id());
160 if (!netif_xmit_frozen_or_stopped(txq))
161 skb = dev_hard_start_xmit(skb, dev, txq, &ret);
132 162
163 HARD_TX_UNLOCK(dev, txq);
164 }
133 spin_lock(root_lock); 165 spin_lock(root_lock);
134 166
135 if (dev_xmit_complete(ret)) { 167 if (dev_xmit_complete(ret)) {
@@ -178,17 +210,18 @@ static inline int qdisc_restart(struct Qdisc *q)
178 struct net_device *dev; 210 struct net_device *dev;
179 spinlock_t *root_lock; 211 spinlock_t *root_lock;
180 struct sk_buff *skb; 212 struct sk_buff *skb;
213 bool validate;
181 214
182 /* Dequeue packet */ 215 /* Dequeue packet */
183 skb = dequeue_skb(q); 216 skb = dequeue_skb(q, &validate);
184 if (unlikely(!skb)) 217 if (unlikely(!skb))
185 return 0; 218 return 0;
186 WARN_ON_ONCE(skb_dst_is_noref(skb)); 219
187 root_lock = qdisc_lock(q); 220 root_lock = qdisc_lock(q);
188 dev = qdisc_dev(q); 221 dev = qdisc_dev(q);
189 txq = netdev_get_tx_queue(dev, skb_get_queue_mapping(skb)); 222 txq = skb_get_tx_queue(dev, skb);
190 223
191 return sch_direct_xmit(skb, q, dev, txq, root_lock); 224 return sch_direct_xmit(skb, q, dev, txq, root_lock, validate);
192} 225}
193 226
194void __qdisc_run(struct Qdisc *q) 227void __qdisc_run(struct Qdisc *q)
@@ -518,7 +551,7 @@ static int pfifo_fast_init(struct Qdisc *qdisc, struct nlattr *opt)
518 struct pfifo_fast_priv *priv = qdisc_priv(qdisc); 551 struct pfifo_fast_priv *priv = qdisc_priv(qdisc);
519 552
520 for (prio = 0; prio < PFIFO_FAST_BANDS; prio++) 553 for (prio = 0; prio < PFIFO_FAST_BANDS; prio++)
521 skb_queue_head_init(band2list(priv, prio)); 554 __skb_queue_head_init(band2list(priv, prio));
522 555
523 /* Can by-pass the queue discipline */ 556 /* Can by-pass the queue discipline */
524 qdisc->flags |= TCQ_F_CAN_BYPASS; 557 qdisc->flags |= TCQ_F_CAN_BYPASS;
@@ -616,7 +649,7 @@ void qdisc_reset(struct Qdisc *qdisc)
616 ops->reset(qdisc); 649 ops->reset(qdisc);
617 650
618 if (qdisc->gso_skb) { 651 if (qdisc->gso_skb) {
619 kfree_skb(qdisc->gso_skb); 652 kfree_skb_list(qdisc->gso_skb);
620 qdisc->gso_skb = NULL; 653 qdisc->gso_skb = NULL;
621 qdisc->q.qlen = 0; 654 qdisc->q.qlen = 0;
622 } 655 }
@@ -627,6 +660,9 @@ static void qdisc_rcu_free(struct rcu_head *head)
627{ 660{
628 struct Qdisc *qdisc = container_of(head, struct Qdisc, rcu_head); 661 struct Qdisc *qdisc = container_of(head, struct Qdisc, rcu_head);
629 662
663 if (qdisc_is_percpu_stats(qdisc))
664 free_percpu(qdisc->cpu_bstats);
665
630 kfree((char *) qdisc - qdisc->padded); 666 kfree((char *) qdisc - qdisc->padded);
631} 667}
632 668
@@ -652,7 +688,7 @@ void qdisc_destroy(struct Qdisc *qdisc)
652 module_put(ops->owner); 688 module_put(ops->owner);
653 dev_put(qdisc_dev(qdisc)); 689 dev_put(qdisc_dev(qdisc));
654 690
655 kfree_skb(qdisc->gso_skb); 691 kfree_skb_list(qdisc->gso_skb);
656 /* 692 /*
657 * gen_estimator est_timer() might access qdisc->q.lock, 693 * gen_estimator est_timer() might access qdisc->q.lock,
658 * wait a RCU grace period before freeing qdisc. 694 * wait a RCU grace period before freeing qdisc.
@@ -778,7 +814,7 @@ static void dev_deactivate_queue(struct net_device *dev,
778 struct Qdisc *qdisc_default = _qdisc_default; 814 struct Qdisc *qdisc_default = _qdisc_default;
779 struct Qdisc *qdisc; 815 struct Qdisc *qdisc;
780 816
781 qdisc = dev_queue->qdisc; 817 qdisc = rtnl_dereference(dev_queue->qdisc);
782 if (qdisc) { 818 if (qdisc) {
783 spin_lock_bh(qdisc_lock(qdisc)); 819 spin_lock_bh(qdisc_lock(qdisc));
784 820
@@ -871,7 +907,7 @@ static void dev_init_scheduler_queue(struct net_device *dev,
871{ 907{
872 struct Qdisc *qdisc = _qdisc; 908 struct Qdisc *qdisc = _qdisc;
873 909
874 dev_queue->qdisc = qdisc; 910 rcu_assign_pointer(dev_queue->qdisc, qdisc);
875 dev_queue->qdisc_sleeping = qdisc; 911 dev_queue->qdisc_sleeping = qdisc;
876} 912}
877 913
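
Note: the sch_generic.c hunks are the core of the bulk-dequeue change. dequeue_skb() may now return a list of packets chained through skb->next, built by try_bulk_dequeue_skb() within the byte budget qdisc_avail_bulklimit() reports for the txq; GSO segmentation and checksum validation move out from under the qdisc and HARD_TX locks via validate_xmit_skb_list(); and the requeue/error paths switch to kfree_skb_list(), since gso_skb can itself be a list. A driver on the receiving end can then defer its doorbell while skb->xmit_more indicates that more packets of the burst follow. Purely illustrative driver-side sketch (the foo_* names are hypothetical, not a real driver):

    static netdev_tx_t foo_start_xmit(struct sk_buff *skb, struct net_device *dev)
    {
            struct foo_tx_ring *ring = foo_select_ring(dev, skb);

            foo_post_descriptor(ring, skb);

            /* Ring the doorbell once per burst instead of once per packet,
             * unless the queue is about to stop.
             */
            if (!skb->xmit_more || netif_xmit_stopped(ring->txq))
                    foo_write_doorbell(ring);

            return NETDEV_TX_OK;
    }
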
diff --git a/net/sched/sch_gred.c b/net/sched/sch_gred.c
index 12cbc09157fc..a4ca4517cdc8 100644
--- a/net/sched/sch_gred.c
+++ b/net/sched/sch_gred.c
@@ -209,7 +209,7 @@ static int gred_enqueue(struct sk_buff *skb, struct Qdisc *sch)
209 break; 209 break;
210 210
211 case RED_PROB_MARK: 211 case RED_PROB_MARK:
212 sch->qstats.overlimits++; 212 qdisc_qstats_overlimit(sch);
213 if (!gred_use_ecn(t) || !INET_ECN_set_ce(skb)) { 213 if (!gred_use_ecn(t) || !INET_ECN_set_ce(skb)) {
214 q->stats.prob_drop++; 214 q->stats.prob_drop++;
215 goto congestion_drop; 215 goto congestion_drop;
@@ -219,7 +219,7 @@ static int gred_enqueue(struct sk_buff *skb, struct Qdisc *sch)
219 break; 219 break;
220 220
221 case RED_HARD_MARK: 221 case RED_HARD_MARK:
222 sch->qstats.overlimits++; 222 qdisc_qstats_overlimit(sch);
223 if (gred_use_harddrop(t) || !gred_use_ecn(t) || 223 if (gred_use_harddrop(t) || !gred_use_ecn(t) ||
224 !INET_ECN_set_ce(skb)) { 224 !INET_ECN_set_ce(skb)) {
225 q->stats.forced_drop++; 225 q->stats.forced_drop++;
diff --git a/net/sched/sch_hfsc.c b/net/sched/sch_hfsc.c
index ec8aeaac1dd7..e6c7416d0332 100644
--- a/net/sched/sch_hfsc.c
+++ b/net/sched/sch_hfsc.c
@@ -116,7 +116,7 @@ struct hfsc_class {
116 struct gnet_stats_queue qstats; 116 struct gnet_stats_queue qstats;
117 struct gnet_stats_rate_est64 rate_est; 117 struct gnet_stats_rate_est64 rate_est;
118 unsigned int level; /* class level in hierarchy */ 118 unsigned int level; /* class level in hierarchy */
119 struct tcf_proto *filter_list; /* filter list */ 119 struct tcf_proto __rcu *filter_list; /* filter list */
120 unsigned int filter_cnt; /* filter count */ 120 unsigned int filter_cnt; /* filter count */
121 121
122 struct hfsc_sched *sched; /* scheduler data */ 122 struct hfsc_sched *sched; /* scheduler data */
@@ -1014,9 +1014,12 @@ hfsc_change_class(struct Qdisc *sch, u32 classid, u32 parentid,
1014 cur_time = psched_get_time(); 1014 cur_time = psched_get_time();
1015 1015
1016 if (tca[TCA_RATE]) { 1016 if (tca[TCA_RATE]) {
1017 err = gen_replace_estimator(&cl->bstats, &cl->rate_est, 1017 spinlock_t *lock = qdisc_root_sleeping_lock(sch);
1018 qdisc_root_sleeping_lock(sch), 1018
1019 tca[TCA_RATE]); 1019 err = gen_replace_estimator(&cl->bstats, NULL,
1020 &cl->rate_est,
1021 lock,
1022 tca[TCA_RATE]);
1020 if (err) 1023 if (err)
1021 return err; 1024 return err;
1022 } 1025 }
@@ -1063,7 +1066,7 @@ hfsc_change_class(struct Qdisc *sch, u32 classid, u32 parentid,
1063 return -ENOBUFS; 1066 return -ENOBUFS;
1064 1067
1065 if (tca[TCA_RATE]) { 1068 if (tca[TCA_RATE]) {
1066 err = gen_new_estimator(&cl->bstats, &cl->rate_est, 1069 err = gen_new_estimator(&cl->bstats, NULL, &cl->rate_est,
1067 qdisc_root_sleeping_lock(sch), 1070 qdisc_root_sleeping_lock(sch),
1068 tca[TCA_RATE]); 1071 tca[TCA_RATE]);
1069 if (err) { 1072 if (err) {
@@ -1161,7 +1164,7 @@ hfsc_classify(struct sk_buff *skb, struct Qdisc *sch, int *qerr)
1161 1164
1162 *qerr = NET_XMIT_SUCCESS | __NET_XMIT_BYPASS; 1165 *qerr = NET_XMIT_SUCCESS | __NET_XMIT_BYPASS;
1163 head = &q->root; 1166 head = &q->root;
1164 tcf = q->root.filter_list; 1167 tcf = rcu_dereference_bh(q->root.filter_list);
1165 while (tcf && (result = tc_classify(skb, tcf, &res)) >= 0) { 1168 while (tcf && (result = tc_classify(skb, tcf, &res)) >= 0) {
1166#ifdef CONFIG_NET_CLS_ACT 1169#ifdef CONFIG_NET_CLS_ACT
1167 switch (result) { 1170 switch (result) {
@@ -1185,7 +1188,7 @@ hfsc_classify(struct sk_buff *skb, struct Qdisc *sch, int *qerr)
1185 return cl; /* hit leaf class */ 1188 return cl; /* hit leaf class */
1186 1189
1187 /* apply inner filter chain */ 1190 /* apply inner filter chain */
1188 tcf = cl->filter_list; 1191 tcf = rcu_dereference_bh(cl->filter_list);
1189 head = cl; 1192 head = cl;
1190 } 1193 }
1191 1194
@@ -1285,7 +1288,7 @@ hfsc_unbind_tcf(struct Qdisc *sch, unsigned long arg)
1285 cl->filter_cnt--; 1288 cl->filter_cnt--;
1286} 1289}
1287 1290
1288static struct tcf_proto ** 1291static struct tcf_proto __rcu **
1289hfsc_tcf_chain(struct Qdisc *sch, unsigned long arg) 1292hfsc_tcf_chain(struct Qdisc *sch, unsigned long arg)
1290{ 1293{
1291 struct hfsc_sched *q = qdisc_priv(sch); 1294 struct hfsc_sched *q = qdisc_priv(sch);
@@ -1367,16 +1370,15 @@ hfsc_dump_class_stats(struct Qdisc *sch, unsigned long arg,
1367 struct hfsc_class *cl = (struct hfsc_class *)arg; 1370 struct hfsc_class *cl = (struct hfsc_class *)arg;
1368 struct tc_hfsc_stats xstats; 1371 struct tc_hfsc_stats xstats;
1369 1372
1370 cl->qstats.qlen = cl->qdisc->q.qlen;
1371 cl->qstats.backlog = cl->qdisc->qstats.backlog; 1373 cl->qstats.backlog = cl->qdisc->qstats.backlog;
1372 xstats.level = cl->level; 1374 xstats.level = cl->level;
1373 xstats.period = cl->cl_vtperiod; 1375 xstats.period = cl->cl_vtperiod;
1374 xstats.work = cl->cl_total; 1376 xstats.work = cl->cl_total;
1375 xstats.rtwork = cl->cl_cumul; 1377 xstats.rtwork = cl->cl_cumul;
1376 1378
1377 if (gnet_stats_copy_basic(d, &cl->bstats) < 0 || 1379 if (gnet_stats_copy_basic(d, NULL, &cl->bstats) < 0 ||
1378 gnet_stats_copy_rate_est(d, &cl->bstats, &cl->rate_est) < 0 || 1380 gnet_stats_copy_rate_est(d, &cl->bstats, &cl->rate_est) < 0 ||
1379 gnet_stats_copy_queue(d, &cl->qstats) < 0) 1381 gnet_stats_copy_queue(d, NULL, &cl->qstats, cl->qdisc->q.qlen) < 0)
1380 return -1; 1382 return -1;
1381 1383
1382 return gnet_stats_copy_app(d, &xstats, sizeof(xstats)); 1384 return gnet_stats_copy_app(d, &xstats, sizeof(xstats));
@@ -1588,7 +1590,7 @@ hfsc_enqueue(struct sk_buff *skb, struct Qdisc *sch)
1588 cl = hfsc_classify(skb, sch, &err); 1590 cl = hfsc_classify(skb, sch, &err);
1589 if (cl == NULL) { 1591 if (cl == NULL) {
1590 if (err & __NET_XMIT_BYPASS) 1592 if (err & __NET_XMIT_BYPASS)
1591 sch->qstats.drops++; 1593 qdisc_qstats_drop(sch);
1592 kfree_skb(skb); 1594 kfree_skb(skb);
1593 return err; 1595 return err;
1594 } 1596 }
@@ -1597,7 +1599,7 @@ hfsc_enqueue(struct sk_buff *skb, struct Qdisc *sch)
1597 if (unlikely(err != NET_XMIT_SUCCESS)) { 1599 if (unlikely(err != NET_XMIT_SUCCESS)) {
1598 if (net_xmit_drop_count(err)) { 1600 if (net_xmit_drop_count(err)) {
1599 cl->qstats.drops++; 1601 cl->qstats.drops++;
1600 sch->qstats.drops++; 1602 qdisc_qstats_drop(sch);
1601 } 1603 }
1602 return err; 1604 return err;
1603 } 1605 }
@@ -1640,7 +1642,7 @@ hfsc_dequeue(struct Qdisc *sch)
1640 */ 1642 */
1641 cl = vttree_get_minvt(&q->root, cur_time); 1643 cl = vttree_get_minvt(&q->root, cur_time);
1642 if (cl == NULL) { 1644 if (cl == NULL) {
1643 sch->qstats.overlimits++; 1645 qdisc_qstats_overlimit(sch);
1644 hfsc_schedule_watchdog(sch); 1646 hfsc_schedule_watchdog(sch);
1645 return NULL; 1647 return NULL;
1646 } 1648 }
@@ -1695,7 +1697,7 @@ hfsc_drop(struct Qdisc *sch)
1695 list_move_tail(&cl->dlist, &q->droplist); 1697 list_move_tail(&cl->dlist, &q->droplist);
1696 } 1698 }
1697 cl->qstats.drops++; 1699 cl->qstats.drops++;
1698 sch->qstats.drops++; 1700 qdisc_qstats_drop(sch);
1699 sch->q.qlen--; 1701 sch->q.qlen--;
1700 return len; 1702 return len;
1701 } 1703 }
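
Note: sch_hfsc.c also shows the reworked statistics export used throughout this series: gen_new_estimator()/gen_replace_estimator() and gnet_stats_copy_basic()/gnet_stats_copy_queue() grow an argument for per-CPU counters (passed as NULL here, since these classful qdiscs keep plain counters), and copy_queue now takes the queue length explicitly instead of relying on a qlen field cached inside qstats. Prototypes as implied by the call sites above (parameter names are mine, for illustration only):

    int gen_replace_estimator(struct gnet_stats_basic_packed *bstats,
                              struct gnet_stats_basic_cpu __percpu *cpu_bstats,
                              struct gnet_stats_rate_est64 *rate_est,
                              spinlock_t *stats_lock, struct nlattr *opt);

    int gnet_stats_copy_queue(struct gnet_dump *d,
                              struct gnet_stats_queue __percpu *cpu_qstats,
                              struct gnet_stats_queue *qstats, __u32 qlen);
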
diff --git a/net/sched/sch_hhf.c b/net/sched/sch_hhf.c
index d85b6812a7d4..15d3aabfe250 100644
--- a/net/sched/sch_hhf.c
+++ b/net/sched/sch_hhf.c
@@ -376,8 +376,8 @@ static unsigned int hhf_drop(struct Qdisc *sch)
376 struct sk_buff *skb = dequeue_head(bucket); 376 struct sk_buff *skb = dequeue_head(bucket);
377 377
378 sch->q.qlen--; 378 sch->q.qlen--;
379 sch->qstats.drops++; 379 qdisc_qstats_drop(sch);
380 sch->qstats.backlog -= qdisc_pkt_len(skb); 380 qdisc_qstats_backlog_dec(sch, skb);
381 kfree_skb(skb); 381 kfree_skb(skb);
382 } 382 }
383 383
@@ -395,7 +395,7 @@ static int hhf_enqueue(struct sk_buff *skb, struct Qdisc *sch)
395 395
396 bucket = &q->buckets[idx]; 396 bucket = &q->buckets[idx];
397 bucket_add(bucket, skb); 397 bucket_add(bucket, skb);
398 sch->qstats.backlog += qdisc_pkt_len(skb); 398 qdisc_qstats_backlog_inc(sch, skb);
399 399
400 if (list_empty(&bucket->bucketchain)) { 400 if (list_empty(&bucket->bucketchain)) {
401 unsigned int weight; 401 unsigned int weight;
@@ -457,7 +457,7 @@ begin:
457 if (bucket->head) { 457 if (bucket->head) {
458 skb = dequeue_head(bucket); 458 skb = dequeue_head(bucket);
459 sch->q.qlen--; 459 sch->q.qlen--;
460 sch->qstats.backlog -= qdisc_pkt_len(skb); 460 qdisc_qstats_backlog_dec(sch, skb);
461 } 461 }
462 462
463 if (!skb) { 463 if (!skb) {
diff --git a/net/sched/sch_htb.c b/net/sched/sch_htb.c
index 9f949abcacef..f1acb0f60dc3 100644
--- a/net/sched/sch_htb.c
+++ b/net/sched/sch_htb.c
@@ -103,7 +103,7 @@ struct htb_class {
103 u32 prio; /* these two are used only by leaves... */ 103 u32 prio; /* these two are used only by leaves... */
104 int quantum; /* but stored for parent-to-leaf return */ 104 int quantum; /* but stored for parent-to-leaf return */
105 105
106 struct tcf_proto *filter_list; /* class attached filters */ 106 struct tcf_proto __rcu *filter_list; /* class attached filters */
107 int filter_cnt; 107 int filter_cnt;
108 int refcnt; /* usage count of this class */ 108 int refcnt; /* usage count of this class */
109 109
@@ -153,7 +153,7 @@ struct htb_sched {
153 int rate2quantum; /* quant = rate / rate2quantum */ 153 int rate2quantum; /* quant = rate / rate2quantum */
154 154
155 /* filters for qdisc itself */ 155 /* filters for qdisc itself */
156 struct tcf_proto *filter_list; 156 struct tcf_proto __rcu *filter_list;
157 157
158#define HTB_WARN_TOOMANYEVENTS 0x1 158#define HTB_WARN_TOOMANYEVENTS 0x1
159 unsigned int warned; /* only one warning */ 159 unsigned int warned; /* only one warning */
@@ -223,9 +223,9 @@ static struct htb_class *htb_classify(struct sk_buff *skb, struct Qdisc *sch,
223 if (cl->level == 0) 223 if (cl->level == 0)
224 return cl; 224 return cl;
225 /* Start with inner filter chain if a non-leaf class is selected */ 225 /* Start with inner filter chain if a non-leaf class is selected */
226 tcf = cl->filter_list; 226 tcf = rcu_dereference_bh(cl->filter_list);
227 } else { 227 } else {
228 tcf = q->filter_list; 228 tcf = rcu_dereference_bh(q->filter_list);
229 } 229 }
230 230
231 *qerr = NET_XMIT_SUCCESS | __NET_XMIT_BYPASS; 231 *qerr = NET_XMIT_SUCCESS | __NET_XMIT_BYPASS;
@@ -251,7 +251,7 @@ static struct htb_class *htb_classify(struct sk_buff *skb, struct Qdisc *sch,
251 return cl; /* we hit leaf; return it */ 251 return cl; /* we hit leaf; return it */
252 252
253 /* we have got inner class; apply inner filter chain */ 253 /* we have got inner class; apply inner filter chain */
254 tcf = cl->filter_list; 254 tcf = rcu_dereference_bh(cl->filter_list);
255 } 255 }
256 /* classification failed; try to use default class */ 256 /* classification failed; try to use default class */
257 cl = htb_find(TC_H_MAKE(TC_H_MAJ(sch->handle), q->defcls), sch); 257 cl = htb_find(TC_H_MAKE(TC_H_MAJ(sch->handle), q->defcls), sch);
@@ -586,13 +586,13 @@ static int htb_enqueue(struct sk_buff *skb, struct Qdisc *sch)
586#ifdef CONFIG_NET_CLS_ACT 586#ifdef CONFIG_NET_CLS_ACT
587 } else if (!cl) { 587 } else if (!cl) {
588 if (ret & __NET_XMIT_BYPASS) 588 if (ret & __NET_XMIT_BYPASS)
589 sch->qstats.drops++; 589 qdisc_qstats_drop(sch);
590 kfree_skb(skb); 590 kfree_skb(skb);
591 return ret; 591 return ret;
592#endif 592#endif
593 } else if ((ret = qdisc_enqueue(skb, cl->un.leaf.q)) != NET_XMIT_SUCCESS) { 593 } else if ((ret = qdisc_enqueue(skb, cl->un.leaf.q)) != NET_XMIT_SUCCESS) {
594 if (net_xmit_drop_count(ret)) { 594 if (net_xmit_drop_count(ret)) {
595 sch->qstats.drops++; 595 qdisc_qstats_drop(sch);
596 cl->qstats.drops++; 596 cl->qstats.drops++;
597 } 597 }
598 return ret; 598 return ret;
@@ -895,7 +895,7 @@ ok:
895 895
896 if (!sch->q.qlen) 896 if (!sch->q.qlen)
897 goto fin; 897 goto fin;
898 q->now = ktime_to_ns(ktime_get()); 898 q->now = ktime_get_ns();
899 start_at = jiffies; 899 start_at = jiffies;
900 900
901 next_event = q->now + 5LLU * NSEC_PER_SEC; 901 next_event = q->now + 5LLU * NSEC_PER_SEC;
@@ -925,14 +925,14 @@ ok:
925 goto ok; 925 goto ok;
926 } 926 }
927 } 927 }
928 sch->qstats.overlimits++; 928 qdisc_qstats_overlimit(sch);
929 if (likely(next_event > q->now)) { 929 if (likely(next_event > q->now)) {
930 if (!test_bit(__QDISC_STATE_DEACTIVATED, 930 if (!test_bit(__QDISC_STATE_DEACTIVATED,
931 &qdisc_root_sleeping(q->watchdog.qdisc)->state)) { 931 &qdisc_root_sleeping(q->watchdog.qdisc)->state)) {
932 ktime_t time = ns_to_ktime(next_event); 932 ktime_t time = ns_to_ktime(next_event);
933 qdisc_throttled(q->watchdog.qdisc); 933 qdisc_throttled(q->watchdog.qdisc);
934 hrtimer_start(&q->watchdog.timer, time, 934 hrtimer_start(&q->watchdog.timer, time,
935 HRTIMER_MODE_ABS); 935 HRTIMER_MODE_ABS_PINNED);
936 } 936 }
937 } else { 937 } else {
938 schedule_work(&q->work); 938 schedule_work(&q->work);
@@ -1044,7 +1044,7 @@ static int htb_init(struct Qdisc *sch, struct nlattr *opt)
1044 1044
1045 qdisc_watchdog_init(&q->watchdog, sch); 1045 qdisc_watchdog_init(&q->watchdog, sch);
1046 INIT_WORK(&q->work, htb_work_func); 1046 INIT_WORK(&q->work, htb_work_func);
1047 skb_queue_head_init(&q->direct_queue); 1047 __skb_queue_head_init(&q->direct_queue);
1048 1048
1049 if (tb[TCA_HTB_DIRECT_QLEN]) 1049 if (tb[TCA_HTB_DIRECT_QLEN])
1050 q->direct_qlen = nla_get_u32(tb[TCA_HTB_DIRECT_QLEN]); 1050 q->direct_qlen = nla_get_u32(tb[TCA_HTB_DIRECT_QLEN]);
@@ -1138,15 +1138,16 @@ static int
1138htb_dump_class_stats(struct Qdisc *sch, unsigned long arg, struct gnet_dump *d) 1138htb_dump_class_stats(struct Qdisc *sch, unsigned long arg, struct gnet_dump *d)
1139{ 1139{
1140 struct htb_class *cl = (struct htb_class *)arg; 1140 struct htb_class *cl = (struct htb_class *)arg;
1141 __u32 qlen = 0;
1141 1142
1142 if (!cl->level && cl->un.leaf.q) 1143 if (!cl->level && cl->un.leaf.q)
1143 cl->qstats.qlen = cl->un.leaf.q->q.qlen; 1144 qlen = cl->un.leaf.q->q.qlen;
1144 cl->xstats.tokens = PSCHED_NS2TICKS(cl->tokens); 1145 cl->xstats.tokens = PSCHED_NS2TICKS(cl->tokens);
1145 cl->xstats.ctokens = PSCHED_NS2TICKS(cl->ctokens); 1146 cl->xstats.ctokens = PSCHED_NS2TICKS(cl->ctokens);
1146 1147
1147 if (gnet_stats_copy_basic(d, &cl->bstats) < 0 || 1148 if (gnet_stats_copy_basic(d, NULL, &cl->bstats) < 0 ||
1148 gnet_stats_copy_rate_est(d, NULL, &cl->rate_est) < 0 || 1149 gnet_stats_copy_rate_est(d, NULL, &cl->rate_est) < 0 ||
1149 gnet_stats_copy_queue(d, &cl->qstats) < 0) 1150 gnet_stats_copy_queue(d, NULL, &cl->qstats, qlen) < 0)
1150 return -1; 1151 return -1;
1151 1152
1152 return gnet_stats_copy_app(d, &cl->xstats, sizeof(cl->xstats)); 1153 return gnet_stats_copy_app(d, &cl->xstats, sizeof(cl->xstats));
@@ -1225,7 +1226,7 @@ static void htb_parent_to_leaf(struct htb_sched *q, struct htb_class *cl,
1225 parent->un.leaf.q = new_q ? new_q : &noop_qdisc; 1226 parent->un.leaf.q = new_q ? new_q : &noop_qdisc;
1226 parent->tokens = parent->buffer; 1227 parent->tokens = parent->buffer;
1227 parent->ctokens = parent->cbuffer; 1228 parent->ctokens = parent->cbuffer;
1228 parent->t_c = ktime_to_ns(ktime_get()); 1229 parent->t_c = ktime_get_ns();
1229 parent->cmode = HTB_CAN_SEND; 1230 parent->cmode = HTB_CAN_SEND;
1230} 1231}
1231 1232
@@ -1402,7 +1403,8 @@ static int htb_change_class(struct Qdisc *sch, u32 classid,
1402 goto failure; 1403 goto failure;
1403 1404
1404 if (htb_rate_est || tca[TCA_RATE]) { 1405 if (htb_rate_est || tca[TCA_RATE]) {
1405 err = gen_new_estimator(&cl->bstats, &cl->rate_est, 1406 err = gen_new_estimator(&cl->bstats, NULL,
1407 &cl->rate_est,
1406 qdisc_root_sleeping_lock(sch), 1408 qdisc_root_sleeping_lock(sch),
1407 tca[TCA_RATE] ? : &est.nla); 1409 tca[TCA_RATE] ? : &est.nla);
1408 if (err) { 1410 if (err) {
@@ -1455,7 +1457,7 @@ static int htb_change_class(struct Qdisc *sch, u32 classid,
1455 cl->tokens = PSCHED_TICKS2NS(hopt->buffer); 1457 cl->tokens = PSCHED_TICKS2NS(hopt->buffer);
1456 cl->ctokens = PSCHED_TICKS2NS(hopt->cbuffer); 1458 cl->ctokens = PSCHED_TICKS2NS(hopt->cbuffer);
1457 cl->mbuffer = 60ULL * NSEC_PER_SEC; /* 1min */ 1459 cl->mbuffer = 60ULL * NSEC_PER_SEC; /* 1min */
1458 cl->t_c = ktime_to_ns(ktime_get()); 1460 cl->t_c = ktime_get_ns();
1459 cl->cmode = HTB_CAN_SEND; 1461 cl->cmode = HTB_CAN_SEND;
1460 1462
1461 /* attach to the hash list and parent's family */ 1463 /* attach to the hash list and parent's family */
@@ -1464,8 +1466,11 @@ static int htb_change_class(struct Qdisc *sch, u32 classid,
1464 parent->children++; 1466 parent->children++;
1465 } else { 1467 } else {
1466 if (tca[TCA_RATE]) { 1468 if (tca[TCA_RATE]) {
1467 err = gen_replace_estimator(&cl->bstats, &cl->rate_est, 1469 spinlock_t *lock = qdisc_root_sleeping_lock(sch);
1468 qdisc_root_sleeping_lock(sch), 1470
1471 err = gen_replace_estimator(&cl->bstats, NULL,
1472 &cl->rate_est,
1473 lock,
1469 tca[TCA_RATE]); 1474 tca[TCA_RATE]);
1470 if (err) 1475 if (err)
1471 return err; 1476 return err;
@@ -1519,11 +1524,12 @@ failure:
1519 return err; 1524 return err;
1520} 1525}
1521 1526
1522static struct tcf_proto **htb_find_tcf(struct Qdisc *sch, unsigned long arg) 1527static struct tcf_proto __rcu **htb_find_tcf(struct Qdisc *sch,
1528 unsigned long arg)
1523{ 1529{
1524 struct htb_sched *q = qdisc_priv(sch); 1530 struct htb_sched *q = qdisc_priv(sch);
1525 struct htb_class *cl = (struct htb_class *)arg; 1531 struct htb_class *cl = (struct htb_class *)arg;
1526 struct tcf_proto **fl = cl ? &cl->filter_list : &q->filter_list; 1532 struct tcf_proto __rcu **fl = cl ? &cl->filter_list : &q->filter_list;
1527 1533
1528 return fl; 1534 return fl;
1529} 1535}
diff --git a/net/sched/sch_ingress.c b/net/sched/sch_ingress.c
index 62871c14e1f9..eb5b8445fef9 100644
--- a/net/sched/sch_ingress.c
+++ b/net/sched/sch_ingress.c
@@ -17,7 +17,7 @@
17 17
18 18
19struct ingress_qdisc_data { 19struct ingress_qdisc_data {
20 struct tcf_proto *filter_list; 20 struct tcf_proto __rcu *filter_list;
21}; 21};
22 22
23/* ------------------------- Class/flow operations ------------------------- */ 23/* ------------------------- Class/flow operations ------------------------- */
@@ -46,7 +46,8 @@ static void ingress_walk(struct Qdisc *sch, struct qdisc_walker *walker)
46{ 46{
47} 47}
48 48
49static struct tcf_proto **ingress_find_tcf(struct Qdisc *sch, unsigned long cl) 49static struct tcf_proto __rcu **ingress_find_tcf(struct Qdisc *sch,
50 unsigned long cl)
50{ 51{
51 struct ingress_qdisc_data *p = qdisc_priv(sch); 52 struct ingress_qdisc_data *p = qdisc_priv(sch);
52 53
@@ -59,15 +60,16 @@ static int ingress_enqueue(struct sk_buff *skb, struct Qdisc *sch)
59{ 60{
60 struct ingress_qdisc_data *p = qdisc_priv(sch); 61 struct ingress_qdisc_data *p = qdisc_priv(sch);
61 struct tcf_result res; 62 struct tcf_result res;
63 struct tcf_proto *fl = rcu_dereference_bh(p->filter_list);
62 int result; 64 int result;
63 65
64 result = tc_classify(skb, p->filter_list, &res); 66 result = tc_classify(skb, fl, &res);
65 67
66 qdisc_bstats_update(sch, skb); 68 qdisc_bstats_update(sch, skb);
67 switch (result) { 69 switch (result) {
68 case TC_ACT_SHOT: 70 case TC_ACT_SHOT:
69 result = TC_ACT_SHOT; 71 result = TC_ACT_SHOT;
70 sch->qstats.drops++; 72 qdisc_qstats_drop(sch);
71 break; 73 break;
72 case TC_ACT_STOLEN: 74 case TC_ACT_STOLEN:
73 case TC_ACT_QUEUED: 75 case TC_ACT_QUEUED:
diff --git a/net/sched/sch_mq.c b/net/sched/sch_mq.c
index a8b2864a696b..f3cbaecd283a 100644
--- a/net/sched/sch_mq.c
+++ b/net/sched/sch_mq.c
@@ -112,7 +112,6 @@ static int mq_dump(struct Qdisc *sch, struct sk_buff *skb)
112 sch->q.qlen += qdisc->q.qlen; 112 sch->q.qlen += qdisc->q.qlen;
113 sch->bstats.bytes += qdisc->bstats.bytes; 113 sch->bstats.bytes += qdisc->bstats.bytes;
114 sch->bstats.packets += qdisc->bstats.packets; 114 sch->bstats.packets += qdisc->bstats.packets;
115 sch->qstats.qlen += qdisc->qstats.qlen;
116 sch->qstats.backlog += qdisc->qstats.backlog; 115 sch->qstats.backlog += qdisc->qstats.backlog;
117 sch->qstats.drops += qdisc->qstats.drops; 116 sch->qstats.drops += qdisc->qstats.drops;
118 sch->qstats.requeues += qdisc->qstats.requeues; 117 sch->qstats.requeues += qdisc->qstats.requeues;
@@ -200,9 +199,8 @@ static int mq_dump_class_stats(struct Qdisc *sch, unsigned long cl,
200 struct netdev_queue *dev_queue = mq_queue_get(sch, cl); 199 struct netdev_queue *dev_queue = mq_queue_get(sch, cl);
201 200
202 sch = dev_queue->qdisc_sleeping; 201 sch = dev_queue->qdisc_sleeping;
203 sch->qstats.qlen = sch->q.qlen; 202 if (gnet_stats_copy_basic(d, NULL, &sch->bstats) < 0 ||
204 if (gnet_stats_copy_basic(d, &sch->bstats) < 0 || 203 gnet_stats_copy_queue(d, NULL, &sch->qstats, sch->q.qlen) < 0)
205 gnet_stats_copy_queue(d, &sch->qstats) < 0)
206 return -1; 204 return -1;
207 return 0; 205 return 0;
208} 206}
diff --git a/net/sched/sch_mqprio.c b/net/sched/sch_mqprio.c
index 6749e2f540d0..3811a745452c 100644
--- a/net/sched/sch_mqprio.c
+++ b/net/sched/sch_mqprio.c
@@ -231,12 +231,11 @@ static int mqprio_dump(struct Qdisc *sch, struct sk_buff *skb)
231 memset(&sch->qstats, 0, sizeof(sch->qstats)); 231 memset(&sch->qstats, 0, sizeof(sch->qstats));
232 232
233 for (i = 0; i < dev->num_tx_queues; i++) { 233 for (i = 0; i < dev->num_tx_queues; i++) {
234 qdisc = netdev_get_tx_queue(dev, i)->qdisc; 234 qdisc = rtnl_dereference(netdev_get_tx_queue(dev, i)->qdisc);
235 spin_lock_bh(qdisc_lock(qdisc)); 235 spin_lock_bh(qdisc_lock(qdisc));
236 sch->q.qlen += qdisc->q.qlen; 236 sch->q.qlen += qdisc->q.qlen;
237 sch->bstats.bytes += qdisc->bstats.bytes; 237 sch->bstats.bytes += qdisc->bstats.bytes;
238 sch->bstats.packets += qdisc->bstats.packets; 238 sch->bstats.packets += qdisc->bstats.packets;
239 sch->qstats.qlen += qdisc->qstats.qlen;
240 sch->qstats.backlog += qdisc->qstats.backlog; 239 sch->qstats.backlog += qdisc->qstats.backlog;
241 sch->qstats.drops += qdisc->qstats.drops; 240 sch->qstats.drops += qdisc->qstats.drops;
242 sch->qstats.requeues += qdisc->qstats.requeues; 241 sch->qstats.requeues += qdisc->qstats.requeues;
@@ -327,6 +326,7 @@ static int mqprio_dump_class_stats(struct Qdisc *sch, unsigned long cl,
327 326
328 if (cl <= netdev_get_num_tc(dev)) { 327 if (cl <= netdev_get_num_tc(dev)) {
329 int i; 328 int i;
329 __u32 qlen = 0;
330 struct Qdisc *qdisc; 330 struct Qdisc *qdisc;
331 struct gnet_stats_queue qstats = {0}; 331 struct gnet_stats_queue qstats = {0};
332 struct gnet_stats_basic_packed bstats = {0}; 332 struct gnet_stats_basic_packed bstats = {0};
@@ -340,11 +340,13 @@ static int mqprio_dump_class_stats(struct Qdisc *sch, unsigned long cl,
340 spin_unlock_bh(d->lock); 340 spin_unlock_bh(d->lock);
341 341
342 for (i = tc.offset; i < tc.offset + tc.count; i++) { 342 for (i = tc.offset; i < tc.offset + tc.count; i++) {
343 qdisc = netdev_get_tx_queue(dev, i)->qdisc; 343 struct netdev_queue *q = netdev_get_tx_queue(dev, i);
344
345 qdisc = rtnl_dereference(q->qdisc);
344 spin_lock_bh(qdisc_lock(qdisc)); 346 spin_lock_bh(qdisc_lock(qdisc));
347 qlen += qdisc->q.qlen;
345 bstats.bytes += qdisc->bstats.bytes; 348 bstats.bytes += qdisc->bstats.bytes;
346 bstats.packets += qdisc->bstats.packets; 349 bstats.packets += qdisc->bstats.packets;
347 qstats.qlen += qdisc->qstats.qlen;
348 qstats.backlog += qdisc->qstats.backlog; 350 qstats.backlog += qdisc->qstats.backlog;
349 qstats.drops += qdisc->qstats.drops; 351 qstats.drops += qdisc->qstats.drops;
350 qstats.requeues += qdisc->qstats.requeues; 352 qstats.requeues += qdisc->qstats.requeues;
@@ -353,16 +355,16 @@ static int mqprio_dump_class_stats(struct Qdisc *sch, unsigned long cl,
353 } 355 }
354 /* Reclaim root sleeping lock before completing stats */ 356 /* Reclaim root sleeping lock before completing stats */
355 spin_lock_bh(d->lock); 357 spin_lock_bh(d->lock);
356 if (gnet_stats_copy_basic(d, &bstats) < 0 || 358 if (gnet_stats_copy_basic(d, NULL, &bstats) < 0 ||
357 gnet_stats_copy_queue(d, &qstats) < 0) 359 gnet_stats_copy_queue(d, NULL, &qstats, qlen) < 0)
358 return -1; 360 return -1;
359 } else { 361 } else {
360 struct netdev_queue *dev_queue = mqprio_queue_get(sch, cl); 362 struct netdev_queue *dev_queue = mqprio_queue_get(sch, cl);
361 363
362 sch = dev_queue->qdisc_sleeping; 364 sch = dev_queue->qdisc_sleeping;
363 sch->qstats.qlen = sch->q.qlen; 365 if (gnet_stats_copy_basic(d, NULL, &sch->bstats) < 0 ||
364 if (gnet_stats_copy_basic(d, &sch->bstats) < 0 || 366 gnet_stats_copy_queue(d, NULL,
365 gnet_stats_copy_queue(d, &sch->qstats) < 0) 367 &sch->qstats, sch->q.qlen) < 0)
366 return -1; 368 return -1;
367 } 369 }
368 return 0; 370 return 0;
diff --git a/net/sched/sch_multiq.c b/net/sched/sch_multiq.c
index afb050a735fa..42dd218871e0 100644
--- a/net/sched/sch_multiq.c
+++ b/net/sched/sch_multiq.c
@@ -31,7 +31,7 @@ struct multiq_sched_data {
31 u16 bands; 31 u16 bands;
32 u16 max_bands; 32 u16 max_bands;
33 u16 curband; 33 u16 curband;
34 struct tcf_proto *filter_list; 34 struct tcf_proto __rcu *filter_list;
35 struct Qdisc **queues; 35 struct Qdisc **queues;
36}; 36};
37 37
@@ -42,10 +42,11 @@ multiq_classify(struct sk_buff *skb, struct Qdisc *sch, int *qerr)
42 struct multiq_sched_data *q = qdisc_priv(sch); 42 struct multiq_sched_data *q = qdisc_priv(sch);
43 u32 band; 43 u32 band;
44 struct tcf_result res; 44 struct tcf_result res;
45 struct tcf_proto *fl = rcu_dereference_bh(q->filter_list);
45 int err; 46 int err;
46 47
47 *qerr = NET_XMIT_SUCCESS | __NET_XMIT_BYPASS; 48 *qerr = NET_XMIT_SUCCESS | __NET_XMIT_BYPASS;
48 err = tc_classify(skb, q->filter_list, &res); 49 err = tc_classify(skb, fl, &res);
49#ifdef CONFIG_NET_CLS_ACT 50#ifdef CONFIG_NET_CLS_ACT
50 switch (err) { 51 switch (err) {
51 case TC_ACT_STOLEN: 52 case TC_ACT_STOLEN:
@@ -74,7 +75,7 @@ multiq_enqueue(struct sk_buff *skb, struct Qdisc *sch)
74 if (qdisc == NULL) { 75 if (qdisc == NULL) {
75 76
76 if (ret & __NET_XMIT_BYPASS) 77 if (ret & __NET_XMIT_BYPASS)
77 sch->qstats.drops++; 78 qdisc_qstats_drop(sch);
78 kfree_skb(skb); 79 kfree_skb(skb);
79 return ret; 80 return ret;
80 } 81 }
@@ -86,7 +87,7 @@ multiq_enqueue(struct sk_buff *skb, struct Qdisc *sch)
86 return NET_XMIT_SUCCESS; 87 return NET_XMIT_SUCCESS;
87 } 88 }
88 if (net_xmit_drop_count(ret)) 89 if (net_xmit_drop_count(ret))
89 sch->qstats.drops++; 90 qdisc_qstats_drop(sch);
90 return ret; 91 return ret;
91} 92}
92 93
@@ -359,9 +360,8 @@ static int multiq_dump_class_stats(struct Qdisc *sch, unsigned long cl,
359 struct Qdisc *cl_q; 360 struct Qdisc *cl_q;
360 361
361 cl_q = q->queues[cl - 1]; 362 cl_q = q->queues[cl - 1];
362 cl_q->qstats.qlen = cl_q->q.qlen; 363 if (gnet_stats_copy_basic(d, NULL, &cl_q->bstats) < 0 ||
363 if (gnet_stats_copy_basic(d, &cl_q->bstats) < 0 || 364 gnet_stats_copy_queue(d, NULL, &cl_q->qstats, cl_q->q.qlen) < 0)
364 gnet_stats_copy_queue(d, &cl_q->qstats) < 0)
365 return -1; 365 return -1;
366 366
367 return 0; 367 return 0;
@@ -388,7 +388,8 @@ static void multiq_walk(struct Qdisc *sch, struct qdisc_walker *arg)
388 } 388 }
389} 389}
390 390
391static struct tcf_proto **multiq_find_tcf(struct Qdisc *sch, unsigned long cl) 391static struct tcf_proto __rcu **multiq_find_tcf(struct Qdisc *sch,
392 unsigned long cl)
392{ 393{
393 struct multiq_sched_data *q = qdisc_priv(sch); 394 struct multiq_sched_data *q = qdisc_priv(sch);
394 395
diff --git a/net/sched/sch_netem.c b/net/sched/sch_netem.c
index 111d70fddaea..b34331967e02 100644
--- a/net/sched/sch_netem.c
+++ b/net/sched/sch_netem.c
@@ -429,12 +429,12 @@ static int netem_enqueue(struct sk_buff *skb, struct Qdisc *sch)
429 /* Drop packet? */ 429 /* Drop packet? */
430 if (loss_event(q)) { 430 if (loss_event(q)) {
431 if (q->ecn && INET_ECN_set_ce(skb)) 431 if (q->ecn && INET_ECN_set_ce(skb))
432 sch->qstats.drops++; /* mark packet */ 432 qdisc_qstats_drop(sch); /* mark packet */
433 else 433 else
434 --count; 434 --count;
435 } 435 }
436 if (count == 0) { 436 if (count == 0) {
437 sch->qstats.drops++; 437 qdisc_qstats_drop(sch);
438 kfree_skb(skb); 438 kfree_skb(skb);
439 return NET_XMIT_SUCCESS | __NET_XMIT_BYPASS; 439 return NET_XMIT_SUCCESS | __NET_XMIT_BYPASS;
440 } 440 }
@@ -478,7 +478,7 @@ static int netem_enqueue(struct sk_buff *skb, struct Qdisc *sch)
478 if (unlikely(skb_queue_len(&sch->q) >= sch->limit)) 478 if (unlikely(skb_queue_len(&sch->q) >= sch->limit))
479 return qdisc_reshape_fail(skb, sch); 479 return qdisc_reshape_fail(skb, sch);
480 480
481 sch->qstats.backlog += qdisc_pkt_len(skb); 481 qdisc_qstats_backlog_inc(sch, skb);
482 482
483 cb = netem_skb_cb(skb); 483 cb = netem_skb_cb(skb);
484 if (q->gap == 0 || /* not doing reordering */ 484 if (q->gap == 0 || /* not doing reordering */
@@ -549,15 +549,14 @@ static unsigned int netem_drop(struct Qdisc *sch)
549 sch->q.qlen--; 549 sch->q.qlen--;
550 skb->next = NULL; 550 skb->next = NULL;
551 skb->prev = NULL; 551 skb->prev = NULL;
552 len = qdisc_pkt_len(skb); 552 qdisc_qstats_backlog_dec(sch, skb);
553 sch->qstats.backlog -= len;
554 kfree_skb(skb); 553 kfree_skb(skb);
555 } 554 }
556 } 555 }
557 if (!len && q->qdisc && q->qdisc->ops->drop) 556 if (!len && q->qdisc && q->qdisc->ops->drop)
558 len = q->qdisc->ops->drop(q->qdisc); 557 len = q->qdisc->ops->drop(q->qdisc);
559 if (len) 558 if (len)
560 sch->qstats.drops++; 559 qdisc_qstats_drop(sch);
561 560
562 return len; 561 return len;
563} 562}
@@ -575,7 +574,7 @@ tfifo_dequeue:
575 skb = __skb_dequeue(&sch->q); 574 skb = __skb_dequeue(&sch->q);
576 if (skb) { 575 if (skb) {
577deliver: 576deliver:
578 sch->qstats.backlog -= qdisc_pkt_len(skb); 577 qdisc_qstats_backlog_dec(sch, skb);
579 qdisc_unthrottled(sch); 578 qdisc_unthrottled(sch);
580 qdisc_bstats_update(sch, skb); 579 qdisc_bstats_update(sch, skb);
581 return skb; 580 return skb;
@@ -610,7 +609,7 @@ deliver:
610 609
611 if (unlikely(err != NET_XMIT_SUCCESS)) { 610 if (unlikely(err != NET_XMIT_SUCCESS)) {
612 if (net_xmit_drop_count(err)) { 611 if (net_xmit_drop_count(err)) {
613 sch->qstats.drops++; 612 qdisc_qstats_drop(sch);
614 qdisc_tree_decrease_qlen(sch, 1); 613 qdisc_tree_decrease_qlen(sch, 1);
615 } 614 }
616 } 615 }
diff --git a/net/sched/sch_pie.c b/net/sched/sch_pie.c
index fefeeb73f15f..33d7a98a7a97 100644
--- a/net/sched/sch_pie.c
+++ b/net/sched/sch_pie.c
@@ -232,7 +232,7 @@ static int pie_change(struct Qdisc *sch, struct nlattr *opt)
232 while (sch->q.qlen > sch->limit) { 232 while (sch->q.qlen > sch->limit) {
233 struct sk_buff *skb = __skb_dequeue(&sch->q); 233 struct sk_buff *skb = __skb_dequeue(&sch->q);
234 234
235 sch->qstats.backlog -= qdisc_pkt_len(skb); 235 qdisc_qstats_backlog_dec(sch, skb);
236 qdisc_drop(skb, sch); 236 qdisc_drop(skb, sch);
237 } 237 }
238 qdisc_tree_decrease_qlen(sch, qlen - sch->q.qlen); 238 qdisc_tree_decrease_qlen(sch, qlen - sch->q.qlen);
diff --git a/net/sched/sch_prio.c b/net/sched/sch_prio.c
index 79359b69ad8d..8e5cd34aaa74 100644
--- a/net/sched/sch_prio.c
+++ b/net/sched/sch_prio.c
@@ -24,7 +24,7 @@
24 24
25struct prio_sched_data { 25struct prio_sched_data {
26 int bands; 26 int bands;
27 struct tcf_proto *filter_list; 27 struct tcf_proto __rcu *filter_list;
28 u8 prio2band[TC_PRIO_MAX+1]; 28 u8 prio2band[TC_PRIO_MAX+1];
29 struct Qdisc *queues[TCQ_PRIO_BANDS]; 29 struct Qdisc *queues[TCQ_PRIO_BANDS];
30}; 30};
@@ -36,11 +36,13 @@ prio_classify(struct sk_buff *skb, struct Qdisc *sch, int *qerr)
36 struct prio_sched_data *q = qdisc_priv(sch); 36 struct prio_sched_data *q = qdisc_priv(sch);
37 u32 band = skb->priority; 37 u32 band = skb->priority;
38 struct tcf_result res; 38 struct tcf_result res;
39 struct tcf_proto *fl;
39 int err; 40 int err;
40 41
41 *qerr = NET_XMIT_SUCCESS | __NET_XMIT_BYPASS; 42 *qerr = NET_XMIT_SUCCESS | __NET_XMIT_BYPASS;
42 if (TC_H_MAJ(skb->priority) != sch->handle) { 43 if (TC_H_MAJ(skb->priority) != sch->handle) {
43 err = tc_classify(skb, q->filter_list, &res); 44 fl = rcu_dereference_bh(q->filter_list);
45 err = tc_classify(skb, fl, &res);
44#ifdef CONFIG_NET_CLS_ACT 46#ifdef CONFIG_NET_CLS_ACT
45 switch (err) { 47 switch (err) {
46 case TC_ACT_STOLEN: 48 case TC_ACT_STOLEN:
@@ -50,7 +52,7 @@ prio_classify(struct sk_buff *skb, struct Qdisc *sch, int *qerr)
50 return NULL; 52 return NULL;
51 } 53 }
52#endif 54#endif
53 if (!q->filter_list || err < 0) { 55 if (!fl || err < 0) {
54 if (TC_H_MAJ(band)) 56 if (TC_H_MAJ(band))
55 band = 0; 57 band = 0;
56 return q->queues[q->prio2band[band & TC_PRIO_MAX]]; 58 return q->queues[q->prio2band[band & TC_PRIO_MAX]];
@@ -75,7 +77,7 @@ prio_enqueue(struct sk_buff *skb, struct Qdisc *sch)
75 if (qdisc == NULL) { 77 if (qdisc == NULL) {
76 78
77 if (ret & __NET_XMIT_BYPASS) 79 if (ret & __NET_XMIT_BYPASS)
78 sch->qstats.drops++; 80 qdisc_qstats_drop(sch);
79 kfree_skb(skb); 81 kfree_skb(skb);
80 return ret; 82 return ret;
81 } 83 }
@@ -87,7 +89,7 @@ prio_enqueue(struct sk_buff *skb, struct Qdisc *sch)
87 return NET_XMIT_SUCCESS; 89 return NET_XMIT_SUCCESS;
88 } 90 }
89 if (net_xmit_drop_count(ret)) 91 if (net_xmit_drop_count(ret))
90 sch->qstats.drops++; 92 qdisc_qstats_drop(sch);
91 return ret; 93 return ret;
92} 94}
93 95
@@ -322,9 +324,8 @@ static int prio_dump_class_stats(struct Qdisc *sch, unsigned long cl,
322 struct Qdisc *cl_q; 324 struct Qdisc *cl_q;
323 325
324 cl_q = q->queues[cl - 1]; 326 cl_q = q->queues[cl - 1];
325 cl_q->qstats.qlen = cl_q->q.qlen; 327 if (gnet_stats_copy_basic(d, NULL, &cl_q->bstats) < 0 ||
326 if (gnet_stats_copy_basic(d, &cl_q->bstats) < 0 || 328 gnet_stats_copy_queue(d, NULL, &cl_q->qstats, cl_q->q.qlen) < 0)
327 gnet_stats_copy_queue(d, &cl_q->qstats) < 0)
328 return -1; 329 return -1;
329 330
330 return 0; 331 return 0;
@@ -351,7 +352,8 @@ static void prio_walk(struct Qdisc *sch, struct qdisc_walker *arg)
351 } 352 }
352} 353}
353 354
354static struct tcf_proto **prio_find_tcf(struct Qdisc *sch, unsigned long cl) 355static struct tcf_proto __rcu **prio_find_tcf(struct Qdisc *sch,
356 unsigned long cl)
355{ 357{
356 struct prio_sched_data *q = qdisc_priv(sch); 358 struct prio_sched_data *q = qdisc_priv(sch);
357 359
diff --git a/net/sched/sch_qfq.c b/net/sched/sch_qfq.c
index 8056fb4e618a..3ec7e88a43ca 100644
--- a/net/sched/sch_qfq.c
+++ b/net/sched/sch_qfq.c
@@ -181,7 +181,7 @@ struct qfq_group {
181}; 181};
182 182
183struct qfq_sched { 183struct qfq_sched {
184 struct tcf_proto *filter_list; 184 struct tcf_proto __rcu *filter_list;
185 struct Qdisc_class_hash clhash; 185 struct Qdisc_class_hash clhash;
186 186
187 u64 oldV, V; /* Precise virtual times. */ 187 u64 oldV, V; /* Precise virtual times. */
@@ -459,7 +459,8 @@ static int qfq_change_class(struct Qdisc *sch, u32 classid, u32 parentid,
459 459
460 if (cl != NULL) { /* modify existing class */ 460 if (cl != NULL) { /* modify existing class */
461 if (tca[TCA_RATE]) { 461 if (tca[TCA_RATE]) {
462 err = gen_replace_estimator(&cl->bstats, &cl->rate_est, 462 err = gen_replace_estimator(&cl->bstats, NULL,
463 &cl->rate_est,
463 qdisc_root_sleeping_lock(sch), 464 qdisc_root_sleeping_lock(sch),
464 tca[TCA_RATE]); 465 tca[TCA_RATE]);
465 if (err) 466 if (err)
@@ -484,7 +485,8 @@ static int qfq_change_class(struct Qdisc *sch, u32 classid, u32 parentid,
484 cl->qdisc = &noop_qdisc; 485 cl->qdisc = &noop_qdisc;
485 486
486 if (tca[TCA_RATE]) { 487 if (tca[TCA_RATE]) {
487 err = gen_new_estimator(&cl->bstats, &cl->rate_est, 488 err = gen_new_estimator(&cl->bstats, NULL,
489 &cl->rate_est,
488 qdisc_root_sleeping_lock(sch), 490 qdisc_root_sleeping_lock(sch),
489 tca[TCA_RATE]); 491 tca[TCA_RATE]);
490 if (err) 492 if (err)
@@ -576,7 +578,8 @@ static void qfq_put_class(struct Qdisc *sch, unsigned long arg)
576 qfq_destroy_class(sch, cl); 578 qfq_destroy_class(sch, cl);
577} 579}
578 580
579static struct tcf_proto **qfq_tcf_chain(struct Qdisc *sch, unsigned long cl) 581static struct tcf_proto __rcu **qfq_tcf_chain(struct Qdisc *sch,
582 unsigned long cl)
580{ 583{
581 struct qfq_sched *q = qdisc_priv(sch); 584 struct qfq_sched *q = qdisc_priv(sch);
582 585
@@ -661,14 +664,14 @@ static int qfq_dump_class_stats(struct Qdisc *sch, unsigned long arg,
661 struct tc_qfq_stats xstats; 664 struct tc_qfq_stats xstats;
662 665
663 memset(&xstats, 0, sizeof(xstats)); 666 memset(&xstats, 0, sizeof(xstats));
664 cl->qdisc->qstats.qlen = cl->qdisc->q.qlen;
665 667
666 xstats.weight = cl->agg->class_weight; 668 xstats.weight = cl->agg->class_weight;
667 xstats.lmax = cl->agg->lmax; 669 xstats.lmax = cl->agg->lmax;
668 670
669 if (gnet_stats_copy_basic(d, &cl->bstats) < 0 || 671 if (gnet_stats_copy_basic(d, NULL, &cl->bstats) < 0 ||
670 gnet_stats_copy_rate_est(d, &cl->bstats, &cl->rate_est) < 0 || 672 gnet_stats_copy_rate_est(d, &cl->bstats, &cl->rate_est) < 0 ||
671 gnet_stats_copy_queue(d, &cl->qdisc->qstats) < 0) 673 gnet_stats_copy_queue(d, NULL,
674 &cl->qdisc->qstats, cl->qdisc->q.qlen) < 0)
672 return -1; 675 return -1;
673 676
674 return gnet_stats_copy_app(d, &xstats, sizeof(xstats)); 677 return gnet_stats_copy_app(d, &xstats, sizeof(xstats));
@@ -704,6 +707,7 @@ static struct qfq_class *qfq_classify(struct sk_buff *skb, struct Qdisc *sch,
704 struct qfq_sched *q = qdisc_priv(sch); 707 struct qfq_sched *q = qdisc_priv(sch);
705 struct qfq_class *cl; 708 struct qfq_class *cl;
706 struct tcf_result res; 709 struct tcf_result res;
710 struct tcf_proto *fl;
707 int result; 711 int result;
708 712
709 if (TC_H_MAJ(skb->priority ^ sch->handle) == 0) { 713 if (TC_H_MAJ(skb->priority ^ sch->handle) == 0) {
@@ -714,7 +718,8 @@ static struct qfq_class *qfq_classify(struct sk_buff *skb, struct Qdisc *sch,
714 } 718 }
715 719
716 *qerr = NET_XMIT_SUCCESS | __NET_XMIT_BYPASS; 720 *qerr = NET_XMIT_SUCCESS | __NET_XMIT_BYPASS;
717 result = tc_classify(skb, q->filter_list, &res); 721 fl = rcu_dereference_bh(q->filter_list);
722 result = tc_classify(skb, fl, &res);
718 if (result >= 0) { 723 if (result >= 0) {
719#ifdef CONFIG_NET_CLS_ACT 724#ifdef CONFIG_NET_CLS_ACT
720 switch (result) { 725 switch (result) {
@@ -1224,7 +1229,7 @@ static int qfq_enqueue(struct sk_buff *skb, struct Qdisc *sch)
1224 cl = qfq_classify(skb, sch, &err); 1229 cl = qfq_classify(skb, sch, &err);
1225 if (cl == NULL) { 1230 if (cl == NULL) {
1226 if (err & __NET_XMIT_BYPASS) 1231 if (err & __NET_XMIT_BYPASS)
1227 sch->qstats.drops++; 1232 qdisc_qstats_drop(sch);
1228 kfree_skb(skb); 1233 kfree_skb(skb);
1229 return err; 1234 return err;
1230 } 1235 }
@@ -1244,7 +1249,7 @@ static int qfq_enqueue(struct sk_buff *skb, struct Qdisc *sch)
1244 pr_debug("qfq_enqueue: enqueue failed %d\n", err); 1249 pr_debug("qfq_enqueue: enqueue failed %d\n", err);
1245 if (net_xmit_drop_count(err)) { 1250 if (net_xmit_drop_count(err)) {
1246 cl->qstats.drops++; 1251 cl->qstats.drops++;
1247 sch->qstats.drops++; 1252 qdisc_qstats_drop(sch);
1248 } 1253 }
1249 return err; 1254 return err;
1250 } 1255 }
diff --git a/net/sched/sch_red.c b/net/sched/sch_red.c
index 633e32defdcc..6c0534cc7758 100644
--- a/net/sched/sch_red.c
+++ b/net/sched/sch_red.c
@@ -74,7 +74,7 @@ static int red_enqueue(struct sk_buff *skb, struct Qdisc *sch)
74 break; 74 break;
75 75
76 case RED_PROB_MARK: 76 case RED_PROB_MARK:
77 sch->qstats.overlimits++; 77 qdisc_qstats_overlimit(sch);
78 if (!red_use_ecn(q) || !INET_ECN_set_ce(skb)) { 78 if (!red_use_ecn(q) || !INET_ECN_set_ce(skb)) {
79 q->stats.prob_drop++; 79 q->stats.prob_drop++;
80 goto congestion_drop; 80 goto congestion_drop;
@@ -84,7 +84,7 @@ static int red_enqueue(struct sk_buff *skb, struct Qdisc *sch)
84 break; 84 break;
85 85
86 case RED_HARD_MARK: 86 case RED_HARD_MARK:
87 sch->qstats.overlimits++; 87 qdisc_qstats_overlimit(sch);
88 if (red_use_harddrop(q) || !red_use_ecn(q) || 88 if (red_use_harddrop(q) || !red_use_ecn(q) ||
89 !INET_ECN_set_ce(skb)) { 89 !INET_ECN_set_ce(skb)) {
90 q->stats.forced_drop++; 90 q->stats.forced_drop++;
@@ -100,7 +100,7 @@ static int red_enqueue(struct sk_buff *skb, struct Qdisc *sch)
100 sch->q.qlen++; 100 sch->q.qlen++;
101 } else if (net_xmit_drop_count(ret)) { 101 } else if (net_xmit_drop_count(ret)) {
102 q->stats.pdrop++; 102 q->stats.pdrop++;
103 sch->qstats.drops++; 103 qdisc_qstats_drop(sch);
104 } 104 }
105 return ret; 105 return ret;
106 106
@@ -142,7 +142,7 @@ static unsigned int red_drop(struct Qdisc *sch)
142 142
143 if (child->ops->drop && (len = child->ops->drop(child)) > 0) { 143 if (child->ops->drop && (len = child->ops->drop(child)) > 0) {
144 q->stats.other++; 144 q->stats.other++;
145 sch->qstats.drops++; 145 qdisc_qstats_drop(sch);
146 sch->q.qlen--; 146 sch->q.qlen--;
147 return len; 147 return len;
148 } 148 }
diff --git a/net/sched/sch_sfb.c b/net/sched/sch_sfb.c
index 9b0f7093d970..5819dd82630d 100644
--- a/net/sched/sch_sfb.c
+++ b/net/sched/sch_sfb.c
@@ -55,7 +55,7 @@ struct sfb_bins {
55 55
56struct sfb_sched_data { 56struct sfb_sched_data {
57 struct Qdisc *qdisc; 57 struct Qdisc *qdisc;
58 struct tcf_proto *filter_list; 58 struct tcf_proto __rcu *filter_list;
59 unsigned long rehash_interval; 59 unsigned long rehash_interval;
60 unsigned long warmup_time; /* double buffering warmup time in jiffies */ 60 unsigned long warmup_time; /* double buffering warmup time in jiffies */
61 u32 max; 61 u32 max;
@@ -253,13 +253,13 @@ static bool sfb_rate_limit(struct sk_buff *skb, struct sfb_sched_data *q)
253 return false; 253 return false;
254} 254}
255 255
256static bool sfb_classify(struct sk_buff *skb, struct sfb_sched_data *q, 256static bool sfb_classify(struct sk_buff *skb, struct tcf_proto *fl,
257 int *qerr, u32 *salt) 257 int *qerr, u32 *salt)
258{ 258{
259 struct tcf_result res; 259 struct tcf_result res;
260 int result; 260 int result;
261 261
262 result = tc_classify(skb, q->filter_list, &res); 262 result = tc_classify(skb, fl, &res);
263 if (result >= 0) { 263 if (result >= 0) {
264#ifdef CONFIG_NET_CLS_ACT 264#ifdef CONFIG_NET_CLS_ACT
265 switch (result) { 265 switch (result) {
@@ -281,6 +281,7 @@ static int sfb_enqueue(struct sk_buff *skb, struct Qdisc *sch)
281 281
282 struct sfb_sched_data *q = qdisc_priv(sch); 282 struct sfb_sched_data *q = qdisc_priv(sch);
283 struct Qdisc *child = q->qdisc; 283 struct Qdisc *child = q->qdisc;
284 struct tcf_proto *fl;
284 int i; 285 int i;
285 u32 p_min = ~0; 286 u32 p_min = ~0;
286 u32 minqlen = ~0; 287 u32 minqlen = ~0;
@@ -289,7 +290,7 @@ static int sfb_enqueue(struct sk_buff *skb, struct Qdisc *sch)
289 struct flow_keys keys; 290 struct flow_keys keys;
290 291
291 if (unlikely(sch->q.qlen >= q->limit)) { 292 if (unlikely(sch->q.qlen >= q->limit)) {
292 sch->qstats.overlimits++; 293 qdisc_qstats_overlimit(sch);
293 q->stats.queuedrop++; 294 q->stats.queuedrop++;
294 goto drop; 295 goto drop;
295 } 296 }
@@ -306,9 +307,10 @@ static int sfb_enqueue(struct sk_buff *skb, struct Qdisc *sch)
306 } 307 }
307 } 308 }
308 309
309 if (q->filter_list) { 310 fl = rcu_dereference_bh(q->filter_list);
311 if (fl) {
310 /* If using external classifiers, get result and record it. */ 312 /* If using external classifiers, get result and record it. */
311 if (!sfb_classify(skb, q, &ret, &salt)) 313 if (!sfb_classify(skb, fl, &ret, &salt))
312 goto other_drop; 314 goto other_drop;
313 keys.src = salt; 315 keys.src = salt;
314 keys.dst = 0; 316 keys.dst = 0;
@@ -346,7 +348,7 @@ static int sfb_enqueue(struct sk_buff *skb, struct Qdisc *sch)
346 sfb_skb_cb(skb)->hashes[slot] = 0; 348 sfb_skb_cb(skb)->hashes[slot] = 0;
347 349
348 if (unlikely(minqlen >= q->max)) { 350 if (unlikely(minqlen >= q->max)) {
349 sch->qstats.overlimits++; 351 qdisc_qstats_overlimit(sch);
350 q->stats.bucketdrop++; 352 q->stats.bucketdrop++;
351 goto drop; 353 goto drop;
352 } 354 }
@@ -374,7 +376,7 @@ static int sfb_enqueue(struct sk_buff *skb, struct Qdisc *sch)
374 } 376 }
375 } 377 }
376 if (sfb_rate_limit(skb, q)) { 378 if (sfb_rate_limit(skb, q)) {
377 sch->qstats.overlimits++; 379 qdisc_qstats_overlimit(sch);
378 q->stats.penaltydrop++; 380 q->stats.penaltydrop++;
379 goto drop; 381 goto drop;
380 } 382 }
@@ -409,7 +411,7 @@ enqueue:
409 increment_qlen(skb, q); 411 increment_qlen(skb, q);
410 } else if (net_xmit_drop_count(ret)) { 412 } else if (net_xmit_drop_count(ret)) {
411 q->stats.childdrop++; 413 q->stats.childdrop++;
412 sch->qstats.drops++; 414 qdisc_qstats_drop(sch);
413 } 415 }
414 return ret; 416 return ret;
415 417
@@ -418,7 +420,7 @@ drop:
418 return NET_XMIT_CN; 420 return NET_XMIT_CN;
419other_drop: 421other_drop:
420 if (ret & __NET_XMIT_BYPASS) 422 if (ret & __NET_XMIT_BYPASS)
421 sch->qstats.drops++; 423 qdisc_qstats_drop(sch);
422 kfree_skb(skb); 424 kfree_skb(skb);
423 return ret; 425 return ret;
424} 426}
@@ -660,7 +662,8 @@ static void sfb_walk(struct Qdisc *sch, struct qdisc_walker *walker)
660 } 662 }
661} 663}
662 664
663static struct tcf_proto **sfb_find_tcf(struct Qdisc *sch, unsigned long cl) 665static struct tcf_proto __rcu **sfb_find_tcf(struct Qdisc *sch,
666 unsigned long cl)
664{ 667{
665 struct sfb_sched_data *q = qdisc_priv(sch); 668 struct sfb_sched_data *q = qdisc_priv(sch);
666 669
diff --git a/net/sched/sch_sfq.c b/net/sched/sch_sfq.c
index 1af2f73906d0..b877140beda5 100644
--- a/net/sched/sch_sfq.c
+++ b/net/sched/sch_sfq.c
@@ -125,7 +125,7 @@ struct sfq_sched_data {
125 u8 cur_depth; /* depth of longest slot */ 125 u8 cur_depth; /* depth of longest slot */
126 u8 flags; 126 u8 flags;
127 unsigned short scaled_quantum; /* SFQ_ALLOT_SIZE(quantum) */ 127 unsigned short scaled_quantum; /* SFQ_ALLOT_SIZE(quantum) */
128 struct tcf_proto *filter_list; 128 struct tcf_proto __rcu *filter_list;
129 sfq_index *ht; /* Hash table ('divisor' slots) */ 129 sfq_index *ht; /* Hash table ('divisor' slots) */
130 struct sfq_slot *slots; /* Flows table ('maxflows' entries) */ 130 struct sfq_slot *slots; /* Flows table ('maxflows' entries) */
131 131
@@ -187,6 +187,7 @@ static unsigned int sfq_classify(struct sk_buff *skb, struct Qdisc *sch,
187{ 187{
188 struct sfq_sched_data *q = qdisc_priv(sch); 188 struct sfq_sched_data *q = qdisc_priv(sch);
189 struct tcf_result res; 189 struct tcf_result res;
190 struct tcf_proto *fl;
190 int result; 191 int result;
191 192
192 if (TC_H_MAJ(skb->priority) == sch->handle && 193 if (TC_H_MAJ(skb->priority) == sch->handle &&
@@ -194,13 +195,14 @@ static unsigned int sfq_classify(struct sk_buff *skb, struct Qdisc *sch,
194 TC_H_MIN(skb->priority) <= q->divisor) 195 TC_H_MIN(skb->priority) <= q->divisor)
195 return TC_H_MIN(skb->priority); 196 return TC_H_MIN(skb->priority);
196 197
197 if (!q->filter_list) { 198 fl = rcu_dereference_bh(q->filter_list);
199 if (!fl) {
198 skb_flow_dissect(skb, &sfq_skb_cb(skb)->keys); 200 skb_flow_dissect(skb, &sfq_skb_cb(skb)->keys);
199 return sfq_hash(q, skb) + 1; 201 return sfq_hash(q, skb) + 1;
200 } 202 }
201 203
202 *qerr = NET_XMIT_SUCCESS | __NET_XMIT_BYPASS; 204 *qerr = NET_XMIT_SUCCESS | __NET_XMIT_BYPASS;
203 result = tc_classify(skb, q->filter_list, &res); 205 result = tc_classify(skb, fl, &res);
204 if (result >= 0) { 206 if (result >= 0) {
205#ifdef CONFIG_NET_CLS_ACT 207#ifdef CONFIG_NET_CLS_ACT
206 switch (result) { 208 switch (result) {
@@ -310,11 +312,6 @@ static inline void slot_queue_add(struct sfq_slot *slot, struct sk_buff *skb)
310 slot->skblist_prev = skb; 312 slot->skblist_prev = skb;
311} 313}
312 314
313#define slot_queue_walk(slot, skb) \
314 for (skb = slot->skblist_next; \
315 skb != (struct sk_buff *)slot; \
316 skb = skb->next)
317
318static unsigned int sfq_drop(struct Qdisc *sch) 315static unsigned int sfq_drop(struct Qdisc *sch)
319{ 316{
320 struct sfq_sched_data *q = qdisc_priv(sch); 317 struct sfq_sched_data *q = qdisc_priv(sch);
@@ -334,8 +331,8 @@ drop:
334 sfq_dec(q, x); 331 sfq_dec(q, x);
335 kfree_skb(skb); 332 kfree_skb(skb);
336 sch->q.qlen--; 333 sch->q.qlen--;
337 sch->qstats.drops++; 334 qdisc_qstats_drop(sch);
338 sch->qstats.backlog -= len; 335 qdisc_qstats_backlog_dec(sch, skb);
339 return len; 336 return len;
340 } 337 }
341 338
@@ -382,7 +379,7 @@ sfq_enqueue(struct sk_buff *skb, struct Qdisc *sch)
382 hash = sfq_classify(skb, sch, &ret); 379 hash = sfq_classify(skb, sch, &ret);
383 if (hash == 0) { 380 if (hash == 0) {
384 if (ret & __NET_XMIT_BYPASS) 381 if (ret & __NET_XMIT_BYPASS)
385 sch->qstats.drops++; 382 qdisc_qstats_drop(sch);
386 kfree_skb(skb); 383 kfree_skb(skb);
387 return ret; 384 return ret;
388 } 385 }
@@ -412,7 +409,7 @@ sfq_enqueue(struct sk_buff *skb, struct Qdisc *sch)
412 break; 409 break;
413 410
414 case RED_PROB_MARK: 411 case RED_PROB_MARK:
415 sch->qstats.overlimits++; 412 qdisc_qstats_overlimit(sch);
416 if (sfq_prob_mark(q)) { 413 if (sfq_prob_mark(q)) {
417 /* We know we have at least one packet in queue */ 414 /* We know we have at least one packet in queue */
418 if (sfq_headdrop(q) && 415 if (sfq_headdrop(q) &&
@@ -429,7 +426,7 @@ sfq_enqueue(struct sk_buff *skb, struct Qdisc *sch)
429 goto congestion_drop; 426 goto congestion_drop;
430 427
431 case RED_HARD_MARK: 428 case RED_HARD_MARK:
432 sch->qstats.overlimits++; 429 qdisc_qstats_overlimit(sch);
433 if (sfq_hard_mark(q)) { 430 if (sfq_hard_mark(q)) {
434 /* We know we have at least one packet in queue */ 431 /* We know we have at least one packet in queue */
435 if (sfq_headdrop(q) && 432 if (sfq_headdrop(q) &&
@@ -464,7 +461,7 @@ congestion_drop:
464 } 461 }
465 462
466enqueue: 463enqueue:
467 sch->qstats.backlog += qdisc_pkt_len(skb); 464 qdisc_qstats_backlog_inc(sch, skb);
468 slot->backlog += qdisc_pkt_len(skb); 465 slot->backlog += qdisc_pkt_len(skb);
469 slot_queue_add(slot, skb); 466 slot_queue_add(slot, skb);
470 sfq_inc(q, x); 467 sfq_inc(q, x);
@@ -523,7 +520,7 @@ next_slot:
523 sfq_dec(q, a); 520 sfq_dec(q, a);
524 qdisc_bstats_update(sch, skb); 521 qdisc_bstats_update(sch, skb);
525 sch->q.qlen--; 522 sch->q.qlen--;
526 sch->qstats.backlog -= qdisc_pkt_len(skb); 523 qdisc_qstats_backlog_dec(sch, skb);
527 slot->backlog -= qdisc_pkt_len(skb); 524 slot->backlog -= qdisc_pkt_len(skb);
528 /* Is the slot empty? */ 525 /* Is the slot empty? */
529 if (slot->qlen == 0) { 526 if (slot->qlen == 0) {
@@ -589,7 +586,8 @@ static void sfq_rehash(struct Qdisc *sch)
589 if (x == SFQ_EMPTY_SLOT) { 586 if (x == SFQ_EMPTY_SLOT) {
590 x = q->dep[0].next; /* get a free slot */ 587 x = q->dep[0].next; /* get a free slot */
591 if (x >= SFQ_MAX_FLOWS) { 588 if (x >= SFQ_MAX_FLOWS) {
592drop: sch->qstats.backlog -= qdisc_pkt_len(skb); 589drop:
590 qdisc_qstats_backlog_dec(sch, skb);
593 kfree_skb(skb); 591 kfree_skb(skb);
594 dropped++; 592 dropped++;
595 continue; 593 continue;
@@ -841,7 +839,8 @@ static void sfq_put(struct Qdisc *q, unsigned long cl)
841{ 839{
842} 840}
843 841
844static struct tcf_proto **sfq_find_tcf(struct Qdisc *sch, unsigned long cl) 842static struct tcf_proto __rcu **sfq_find_tcf(struct Qdisc *sch,
843 unsigned long cl)
845{ 844{
846 struct sfq_sched_data *q = qdisc_priv(sch); 845 struct sfq_sched_data *q = qdisc_priv(sch);
847 846
@@ -872,7 +871,7 @@ static int sfq_dump_class_stats(struct Qdisc *sch, unsigned long cl,
872 qs.qlen = slot->qlen; 871 qs.qlen = slot->qlen;
873 qs.backlog = slot->backlog; 872 qs.backlog = slot->backlog;
874 } 873 }
875 if (gnet_stats_copy_queue(d, &qs) < 0) 874 if (gnet_stats_copy_queue(d, NULL, &qs, qs.qlen) < 0)
876 return -1; 875 return -1;
877 return gnet_stats_copy_app(d, &xstats, sizeof(xstats)); 876 return gnet_stats_copy_app(d, &xstats, sizeof(xstats));
878} 877}
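The sch_sfb.c and sch_sfq.c changes above follow the same RCU conversion pattern as sch_qfq.c: the qdisc's tcf_proto pointer gains the __rcu annotation, and the classify fast path loads it exactly once with rcu_dereference_bh() before calling tc_classify(). A hedged sketch of that pattern in a generic classify routine (names are illustrative, not part of this patch):

	/* Illustrative only: the shape of an RCU-safe classify path, assuming
	 * enqueue runs under rcu_read_lock_bh() as in the qdisc core.
	 */
	struct my_sched_data {
		struct tcf_proto __rcu *filter_list;	/* updated under RTNL */
	};

	static int my_classify(struct sk_buff *skb, struct Qdisc *sch, int *qerr)
	{
		struct my_sched_data *q = qdisc_priv(sch);
		struct tcf_result res;
		struct tcf_proto *fl;

		*qerr = NET_XMIT_SUCCESS | __NET_XMIT_BYPASS;

		/* Single RCU-protected load, reused for the whole classification
		 * so the filter chain cannot change underneath us.
		 */
		fl = rcu_dereference_bh(q->filter_list);
		if (!fl)
			return 0;	/* no classifier attached */

		return tc_classify(skb, fl, &res);
	}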
diff --git a/net/sched/sch_tbf.c b/net/sched/sch_tbf.c
index 18ff63433709..a4afde14e865 100644
--- a/net/sched/sch_tbf.c
+++ b/net/sched/sch_tbf.c
@@ -175,7 +175,7 @@ static int tbf_segment(struct sk_buff *skb, struct Qdisc *sch)
175 ret = qdisc_enqueue(segs, q->qdisc); 175 ret = qdisc_enqueue(segs, q->qdisc);
176 if (ret != NET_XMIT_SUCCESS) { 176 if (ret != NET_XMIT_SUCCESS) {
177 if (net_xmit_drop_count(ret)) 177 if (net_xmit_drop_count(ret))
178 sch->qstats.drops++; 178 qdisc_qstats_drop(sch);
179 } else { 179 } else {
180 nb++; 180 nb++;
181 } 181 }
@@ -201,7 +201,7 @@ static int tbf_enqueue(struct sk_buff *skb, struct Qdisc *sch)
201 ret = qdisc_enqueue(skb, q->qdisc); 201 ret = qdisc_enqueue(skb, q->qdisc);
202 if (ret != NET_XMIT_SUCCESS) { 202 if (ret != NET_XMIT_SUCCESS) {
203 if (net_xmit_drop_count(ret)) 203 if (net_xmit_drop_count(ret))
204 sch->qstats.drops++; 204 qdisc_qstats_drop(sch);
205 return ret; 205 return ret;
206 } 206 }
207 207
@@ -216,7 +216,7 @@ static unsigned int tbf_drop(struct Qdisc *sch)
216 216
217 if (q->qdisc->ops->drop && (len = q->qdisc->ops->drop(q->qdisc)) != 0) { 217 if (q->qdisc->ops->drop && (len = q->qdisc->ops->drop(q->qdisc)) != 0) {
218 sch->q.qlen--; 218 sch->q.qlen--;
219 sch->qstats.drops++; 219 qdisc_qstats_drop(sch);
220 } 220 }
221 return len; 221 return len;
222} 222}
@@ -239,7 +239,7 @@ static struct sk_buff *tbf_dequeue(struct Qdisc *sch)
239 s64 ptoks = 0; 239 s64 ptoks = 0;
240 unsigned int len = qdisc_pkt_len(skb); 240 unsigned int len = qdisc_pkt_len(skb);
241 241
242 now = ktime_to_ns(ktime_get()); 242 now = ktime_get_ns();
243 toks = min_t(s64, now - q->t_c, q->buffer); 243 toks = min_t(s64, now - q->t_c, q->buffer);
244 244
245 if (tbf_peak_present(q)) { 245 if (tbf_peak_present(q)) {
@@ -268,7 +268,8 @@ static struct sk_buff *tbf_dequeue(struct Qdisc *sch)
268 } 268 }
269 269
270 qdisc_watchdog_schedule_ns(&q->watchdog, 270 qdisc_watchdog_schedule_ns(&q->watchdog,
271 now + max_t(long, -toks, -ptoks)); 271 now + max_t(long, -toks, -ptoks),
272 true);
272 273
273 /* Maybe we have a shorter packet in the queue, 274 /* Maybe we have a shorter packet in the queue,
274 which can be sent now. It sounds cool, 275 which can be sent now. It sounds cool,
@@ -281,7 +282,7 @@ static struct sk_buff *tbf_dequeue(struct Qdisc *sch)
281 (cf. CSZ, HPFQ, HFSC) 282 (cf. CSZ, HPFQ, HFSC)
282 */ 283 */
283 284
284 sch->qstats.overlimits++; 285 qdisc_qstats_overlimit(sch);
285 } 286 }
286 return NULL; 287 return NULL;
287} 288}
@@ -292,7 +293,7 @@ static void tbf_reset(struct Qdisc *sch)
292 293
293 qdisc_reset(q->qdisc); 294 qdisc_reset(q->qdisc);
294 sch->q.qlen = 0; 295 sch->q.qlen = 0;
295 q->t_c = ktime_to_ns(ktime_get()); 296 q->t_c = ktime_get_ns();
296 q->tokens = q->buffer; 297 q->tokens = q->buffer;
297 q->ptokens = q->mtu; 298 q->ptokens = q->mtu;
298 qdisc_watchdog_cancel(&q->watchdog); 299 qdisc_watchdog_cancel(&q->watchdog);
@@ -431,7 +432,7 @@ static int tbf_init(struct Qdisc *sch, struct nlattr *opt)
431 if (opt == NULL) 432 if (opt == NULL)
432 return -EINVAL; 433 return -EINVAL;
433 434
434 q->t_c = ktime_to_ns(ktime_get()); 435 q->t_c = ktime_get_ns();
435 qdisc_watchdog_init(&q->watchdog, sch); 436 qdisc_watchdog_init(&q->watchdog, sch);
436 q->qdisc = &noop_qdisc; 437 q->qdisc = &noop_qdisc;
437 438
diff --git a/net/sched/sch_teql.c b/net/sched/sch_teql.c
index bd33793b527e..6ada42396a24 100644
--- a/net/sched/sch_teql.c
+++ b/net/sched/sch_teql.c
@@ -96,11 +96,14 @@ teql_dequeue(struct Qdisc *sch)
96 struct teql_sched_data *dat = qdisc_priv(sch); 96 struct teql_sched_data *dat = qdisc_priv(sch);
97 struct netdev_queue *dat_queue; 97 struct netdev_queue *dat_queue;
98 struct sk_buff *skb; 98 struct sk_buff *skb;
99 struct Qdisc *q;
99 100
100 skb = __skb_dequeue(&dat->q); 101 skb = __skb_dequeue(&dat->q);
101 dat_queue = netdev_get_tx_queue(dat->m->dev, 0); 102 dat_queue = netdev_get_tx_queue(dat->m->dev, 0);
103 q = rcu_dereference_bh(dat_queue->qdisc);
104
102 if (skb == NULL) { 105 if (skb == NULL) {
103 struct net_device *m = qdisc_dev(dat_queue->qdisc); 106 struct net_device *m = qdisc_dev(q);
104 if (m) { 107 if (m) {
105 dat->m->slaves = sch; 108 dat->m->slaves = sch;
106 netif_wake_queue(m); 109 netif_wake_queue(m);
@@ -108,7 +111,7 @@ teql_dequeue(struct Qdisc *sch)
108 } else { 111 } else {
109 qdisc_bstats_update(sch, skb); 112 qdisc_bstats_update(sch, skb);
110 } 113 }
111 sch->q.qlen = dat->q.qlen + dat_queue->qdisc->q.qlen; 114 sch->q.qlen = dat->q.qlen + q->q.qlen;
112 return skb; 115 return skb;
113} 116}
114 117
@@ -157,9 +160,9 @@ teql_destroy(struct Qdisc *sch)
157 txq = netdev_get_tx_queue(master->dev, 0); 160 txq = netdev_get_tx_queue(master->dev, 0);
158 master->slaves = NULL; 161 master->slaves = NULL;
159 162
160 root_lock = qdisc_root_sleeping_lock(txq->qdisc); 163 root_lock = qdisc_root_sleeping_lock(rtnl_dereference(txq->qdisc));
161 spin_lock_bh(root_lock); 164 spin_lock_bh(root_lock);
162 qdisc_reset(txq->qdisc); 165 qdisc_reset(rtnl_dereference(txq->qdisc));
163 spin_unlock_bh(root_lock); 166 spin_unlock_bh(root_lock);
164 } 167 }
165 } 168 }
@@ -266,7 +269,7 @@ static inline int teql_resolve(struct sk_buff *skb,
266 struct dst_entry *dst = skb_dst(skb); 269 struct dst_entry *dst = skb_dst(skb);
267 int res; 270 int res;
268 271
269 if (txq->qdisc == &noop_qdisc) 272 if (rcu_access_pointer(txq->qdisc) == &noop_qdisc)
270 return -ENODEV; 273 return -ENODEV;
271 274
272 if (!dev->header_ops || !dst) 275 if (!dev->header_ops || !dst)
@@ -301,7 +304,6 @@ restart:
301 do { 304 do {
302 struct net_device *slave = qdisc_dev(q); 305 struct net_device *slave = qdisc_dev(q);
303 struct netdev_queue *slave_txq = netdev_get_tx_queue(slave, 0); 306 struct netdev_queue *slave_txq = netdev_get_tx_queue(slave, 0);
304 const struct net_device_ops *slave_ops = slave->netdev_ops;
305 307
306 if (slave_txq->qdisc_sleeping != q) 308 if (slave_txq->qdisc_sleeping != q)
307 continue; 309 continue;
@@ -317,8 +319,8 @@ restart:
317 unsigned int length = qdisc_pkt_len(skb); 319 unsigned int length = qdisc_pkt_len(skb);
318 320
319 if (!netif_xmit_frozen_or_stopped(slave_txq) && 321 if (!netif_xmit_frozen_or_stopped(slave_txq) &&
320 slave_ops->ndo_start_xmit(skb, slave) == NETDEV_TX_OK) { 322 netdev_start_xmit(skb, slave, slave_txq, false) ==
321 txq_trans_update(slave_txq); 323 NETDEV_TX_OK) {
322 __netif_tx_unlock(slave_txq); 324 __netif_tx_unlock(slave_txq);
323 master->slaves = NEXT_SLAVE(q); 325 master->slaves = NEXT_SLAVE(q);
324 netif_wake_queue(dev); 326 netif_wake_queue(dev);
@@ -468,7 +470,7 @@ static __init void teql_master_setup(struct net_device *dev)
468 dev->tx_queue_len = 100; 470 dev->tx_queue_len = 100;
469 dev->flags = IFF_NOARP; 471 dev->flags = IFF_NOARP;
470 dev->hard_header_len = LL_MAX_HEADER; 472 dev->hard_header_len = LL_MAX_HEADER;
471 dev->priv_flags &= ~IFF_XMIT_DST_RELEASE; 473 netif_keep_dst(dev);
472} 474}
473 475
474static LIST_HEAD(master_dev_list); 476static LIST_HEAD(master_dev_list);
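In sch_teql.c the direct slave_ops->ndo_start_xmit() call is replaced by netdev_start_xmit(skb, slave, slave_txq, false), which also absorbs the explicit txq_trans_update() the old code did by hand. A rough sketch of what such a wrapper presumably does (assumed shape, not quoted from this series):

	/* Assumed shape of the netdev_start_xmit() wrapper used above: it
	 * forwards to the driver's ndo_start_xmit(), passing along the hint
	 * that more packets may follow, and updates the queue trans stamp on
	 * success so callers no longer call txq_trans_update() themselves.
	 */
	static inline netdev_tx_t netdev_start_xmit(struct sk_buff *skb,
						    struct net_device *dev,
						    struct netdev_queue *txq,
						    bool more)
	{
		const struct net_device_ops *ops = dev->netdev_ops;
		netdev_tx_t rc;

		skb->xmit_more = more;		/* hint: another skb follows */
		rc = ops->ndo_start_xmit(skb, dev);
		if (rc == NETDEV_TX_OK)
			txq_trans_update(txq);
		return rc;
	}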
diff --git a/net/sctp/input.c b/net/sctp/input.c
index c1b991294516..b6493b3f11a9 100644
--- a/net/sctp/input.c
+++ b/net/sctp/input.c
@@ -133,9 +133,13 @@ int sctp_rcv(struct sk_buff *skb)
133 __skb_pull(skb, skb_transport_offset(skb)); 133 __skb_pull(skb, skb_transport_offset(skb));
134 if (skb->len < sizeof(struct sctphdr)) 134 if (skb->len < sizeof(struct sctphdr))
135 goto discard_it; 135 goto discard_it;
136 if (!sctp_checksum_disable && !skb_csum_unnecessary(skb) && 136
137 sctp_rcv_checksum(net, skb) < 0) 137 skb->csum_valid = 0; /* Previous value not applicable */
138 if (skb_csum_unnecessary(skb))
139 __skb_decr_checksum_unnecessary(skb);
140 else if (!sctp_checksum_disable && sctp_rcv_checksum(net, skb) < 0)
138 goto discard_it; 141 goto discard_it;
142 skb->csum_valid = 1;
139 143
140 skb_pull(skb, sizeof(struct sctphdr)); 144 skb_pull(skb, sizeof(struct sctphdr));
141 145
diff --git a/net/sctp/protocol.c b/net/sctp/protocol.c
index 6240834f4b95..9d2c6c9facb6 100644
--- a/net/sctp/protocol.c
+++ b/net/sctp/protocol.c
@@ -366,7 +366,7 @@ static int sctp_v4_available(union sctp_addr *addr, struct sctp_sock *sp)
366 if (addr->v4.sin_addr.s_addr != htonl(INADDR_ANY) && 366 if (addr->v4.sin_addr.s_addr != htonl(INADDR_ANY) &&
367 ret != RTN_LOCAL && 367 ret != RTN_LOCAL &&
368 !sp->inet.freebind && 368 !sp->inet.freebind &&
369 !sysctl_ip_nonlocal_bind) 369 !net->ipv4.sysctl_ip_nonlocal_bind)
370 return 0; 370 return 0;
371 371
372 if (ipv6_only_sock(sctp_opt2sk(sp))) 372 if (ipv6_only_sock(sctp_opt2sk(sp)))
diff --git a/net/sctp/sm_statefuns.c b/net/sctp/sm_statefuns.c
index d3f1ea460c50..c8f606324134 100644
--- a/net/sctp/sm_statefuns.c
+++ b/net/sctp/sm_statefuns.c
@@ -1775,9 +1775,22 @@ static sctp_disposition_t sctp_sf_do_dupcook_a(struct net *net,
1775 /* Update the content of current association. */ 1775 /* Update the content of current association. */
1776 sctp_add_cmd_sf(commands, SCTP_CMD_UPDATE_ASSOC, SCTP_ASOC(new_asoc)); 1776 sctp_add_cmd_sf(commands, SCTP_CMD_UPDATE_ASSOC, SCTP_ASOC(new_asoc));
1777 sctp_add_cmd_sf(commands, SCTP_CMD_EVENT_ULP, SCTP_ULPEVENT(ev)); 1777 sctp_add_cmd_sf(commands, SCTP_CMD_EVENT_ULP, SCTP_ULPEVENT(ev));
1778 sctp_add_cmd_sf(commands, SCTP_CMD_NEW_STATE, 1778 if (sctp_state(asoc, SHUTDOWN_PENDING) &&
1779 SCTP_STATE(SCTP_STATE_ESTABLISHED)); 1779 (sctp_sstate(asoc->base.sk, CLOSING) ||
1780 sctp_add_cmd_sf(commands, SCTP_CMD_REPLY, SCTP_CHUNK(repl)); 1780 sock_flag(asoc->base.sk, SOCK_DEAD))) {
 1781 /* if we're currently in SHUTDOWN_PENDING, but the socket
1782 * has been closed by user, don't transition to ESTABLISHED.
1783 * Instead trigger SHUTDOWN bundled with COOKIE_ACK.
1784 */
1785 sctp_add_cmd_sf(commands, SCTP_CMD_REPLY, SCTP_CHUNK(repl));
1786 return sctp_sf_do_9_2_start_shutdown(net, ep, asoc,
1787 SCTP_ST_CHUNK(0), NULL,
1788 commands);
1789 } else {
1790 sctp_add_cmd_sf(commands, SCTP_CMD_NEW_STATE,
1791 SCTP_STATE(SCTP_STATE_ESTABLISHED));
1792 sctp_add_cmd_sf(commands, SCTP_CMD_REPLY, SCTP_CHUNK(repl));
1793 }
1781 return SCTP_DISPOSITION_CONSUME; 1794 return SCTP_DISPOSITION_CONSUME;
1782 1795
1783nomem_ev: 1796nomem_ev:
diff --git a/net/socket.c b/net/socket.c
index 4cdbc107606f..ffd9cb46902b 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -610,7 +610,7 @@ void sock_release(struct socket *sock)
610} 610}
611EXPORT_SYMBOL(sock_release); 611EXPORT_SYMBOL(sock_release);
612 612
613void sock_tx_timestamp(const struct sock *sk, __u8 *tx_flags) 613void __sock_tx_timestamp(const struct sock *sk, __u8 *tx_flags)
614{ 614{
615 u8 flags = *tx_flags; 615 u8 flags = *tx_flags;
616 616
@@ -626,12 +626,9 @@ void sock_tx_timestamp(const struct sock *sk, __u8 *tx_flags)
626 if (sk->sk_tsflags & SOF_TIMESTAMPING_TX_ACK) 626 if (sk->sk_tsflags & SOF_TIMESTAMPING_TX_ACK)
627 flags |= SKBTX_ACK_TSTAMP; 627 flags |= SKBTX_ACK_TSTAMP;
628 628
629 if (sock_flag(sk, SOCK_WIFI_STATUS))
630 flags |= SKBTX_WIFI_STATUS;
631
632 *tx_flags = flags; 629 *tx_flags = flags;
633} 630}
634EXPORT_SYMBOL(sock_tx_timestamp); 631EXPORT_SYMBOL(__sock_tx_timestamp);
635 632
636static inline int __sock_sendmsg_nosec(struct kiocb *iocb, struct socket *sock, 633static inline int __sock_sendmsg_nosec(struct kiocb *iocb, struct socket *sock,
637 struct msghdr *msg, size_t size) 634 struct msghdr *msg, size_t size)
diff --git a/net/tipc/Makefile b/net/tipc/Makefile
index a080c66d819a..b8a13caad59a 100644
--- a/net/tipc/Makefile
+++ b/net/tipc/Makefile
@@ -7,7 +7,7 @@ obj-$(CONFIG_TIPC) := tipc.o
7tipc-y += addr.o bcast.o bearer.o config.o \ 7tipc-y += addr.o bcast.o bearer.o config.o \
8 core.o link.o discover.o msg.o \ 8 core.o link.o discover.o msg.o \
9 name_distr.o subscr.o name_table.o net.o \ 9 name_distr.o subscr.o name_table.o net.o \
10 netlink.o node.o node_subscr.o port.o ref.o \ 10 netlink.o node.o node_subscr.o \
11 socket.o log.o eth_media.o server.o 11 socket.o log.o eth_media.o server.o
12 12
13tipc-$(CONFIG_TIPC_MEDIA_IB) += ib_media.o 13tipc-$(CONFIG_TIPC_MEDIA_IB) += ib_media.o
diff --git a/net/tipc/bcast.c b/net/tipc/bcast.c
index dd13bfa09333..b8670bf262e2 100644
--- a/net/tipc/bcast.c
+++ b/net/tipc/bcast.c
@@ -37,7 +37,6 @@
37 37
38#include "core.h" 38#include "core.h"
39#include "link.h" 39#include "link.h"
40#include "port.h"
41#include "socket.h" 40#include "socket.h"
42#include "msg.h" 41#include "msg.h"
43#include "bcast.h" 42#include "bcast.h"
@@ -227,6 +226,17 @@ static void bclink_retransmit_pkt(u32 after, u32 to)
227} 226}
228 227
229/** 228/**
229 * tipc_bclink_wakeup_users - wake up pending users
230 *
231 * Called with no locks taken
232 */
233void tipc_bclink_wakeup_users(void)
234{
235 while (skb_queue_len(&bclink->link.waiting_sks))
236 tipc_sk_rcv(skb_dequeue(&bclink->link.waiting_sks));
237}
238
239/**
230 * tipc_bclink_acknowledge - handle acknowledgement of broadcast packets 240 * tipc_bclink_acknowledge - handle acknowledgement of broadcast packets
231 * @n_ptr: node that sent acknowledgement info 241 * @n_ptr: node that sent acknowledgement info
232 * @acked: broadcast sequence # that has been acknowledged 242 * @acked: broadcast sequence # that has been acknowledged
@@ -300,8 +310,9 @@ void tipc_bclink_acknowledge(struct tipc_node *n_ptr, u32 acked)
300 tipc_link_push_queue(bcl); 310 tipc_link_push_queue(bcl);
301 bclink_set_last_sent(); 311 bclink_set_last_sent();
302 } 312 }
303 if (unlikely(released && !list_empty(&bcl->waiting_ports))) 313 if (unlikely(released && !skb_queue_empty(&bcl->waiting_sks)))
304 tipc_link_wakeup_ports(bcl, 0); 314 n_ptr->action_flags |= TIPC_WAKEUP_BCAST_USERS;
315
305exit: 316exit:
306 tipc_bclink_unlock(); 317 tipc_bclink_unlock();
307} 318}
@@ -840,9 +851,10 @@ int tipc_bclink_init(void)
840 sprintf(bcbearer->media.name, "tipc-broadcast"); 851 sprintf(bcbearer->media.name, "tipc-broadcast");
841 852
842 spin_lock_init(&bclink->lock); 853 spin_lock_init(&bclink->lock);
843 INIT_LIST_HEAD(&bcl->waiting_ports); 854 __skb_queue_head_init(&bcl->waiting_sks);
844 bcl->next_out_no = 1; 855 bcl->next_out_no = 1;
845 spin_lock_init(&bclink->node.lock); 856 spin_lock_init(&bclink->node.lock);
857 __skb_queue_head_init(&bclink->node.waiting_sks);
846 bcl->owner = &bclink->node; 858 bcl->owner = &bclink->node;
847 bcl->max_pkt = MAX_PKT_DEFAULT_MCAST; 859 bcl->max_pkt = MAX_PKT_DEFAULT_MCAST;
848 tipc_link_set_queue_limits(bcl, BCLINK_WIN_DEFAULT); 860 tipc_link_set_queue_limits(bcl, BCLINK_WIN_DEFAULT);
diff --git a/net/tipc/bcast.h b/net/tipc/bcast.h
index 4875d9536aee..e7b0f85a82bc 100644
--- a/net/tipc/bcast.h
+++ b/net/tipc/bcast.h
@@ -99,5 +99,5 @@ int tipc_bclink_set_queue_limits(u32 limit);
99void tipc_bcbearer_sort(struct tipc_node_map *nm_ptr, u32 node, bool action); 99void tipc_bcbearer_sort(struct tipc_node_map *nm_ptr, u32 node, bool action);
100uint tipc_bclink_get_mtu(void); 100uint tipc_bclink_get_mtu(void);
101int tipc_bclink_xmit(struct sk_buff *buf); 101int tipc_bclink_xmit(struct sk_buff *buf);
102 102void tipc_bclink_wakeup_users(void);
103#endif 103#endif
diff --git a/net/tipc/config.c b/net/tipc/config.c
index 2b42403ad33a..876f4c6a2631 100644
--- a/net/tipc/config.c
+++ b/net/tipc/config.c
@@ -35,7 +35,7 @@
35 */ 35 */
36 36
37#include "core.h" 37#include "core.h"
38#include "port.h" 38#include "socket.h"
39#include "name_table.h" 39#include "name_table.h"
40#include "config.h" 40#include "config.h"
41#include "server.h" 41#include "server.h"
@@ -266,7 +266,7 @@ struct sk_buff *tipc_cfg_do_cmd(u32 orig_node, u16 cmd, const void *request_area
266 rep_tlv_buf = tipc_media_get_names(); 266 rep_tlv_buf = tipc_media_get_names();
267 break; 267 break;
268 case TIPC_CMD_SHOW_PORTS: 268 case TIPC_CMD_SHOW_PORTS:
269 rep_tlv_buf = tipc_port_get_ports(); 269 rep_tlv_buf = tipc_sk_socks_show();
270 break; 270 break;
271 case TIPC_CMD_SHOW_STATS: 271 case TIPC_CMD_SHOW_STATS:
272 rep_tlv_buf = tipc_show_stats(); 272 rep_tlv_buf = tipc_show_stats();
diff --git a/net/tipc/core.c b/net/tipc/core.c
index 676d18015dd8..a5737b8407dd 100644
--- a/net/tipc/core.c
+++ b/net/tipc/core.c
@@ -35,11 +35,10 @@
35 */ 35 */
36 36
37#include "core.h" 37#include "core.h"
38#include "ref.h"
39#include "name_table.h" 38#include "name_table.h"
40#include "subscr.h" 39#include "subscr.h"
41#include "config.h" 40#include "config.h"
42#include "port.h" 41#include "socket.h"
43 42
44#include <linux/module.h> 43#include <linux/module.h>
45 44
@@ -85,7 +84,7 @@ static void tipc_core_stop(void)
85 tipc_netlink_stop(); 84 tipc_netlink_stop();
86 tipc_subscr_stop(); 85 tipc_subscr_stop();
87 tipc_nametbl_stop(); 86 tipc_nametbl_stop();
88 tipc_ref_table_stop(); 87 tipc_sk_ref_table_stop();
89 tipc_socket_stop(); 88 tipc_socket_stop();
90 tipc_unregister_sysctl(); 89 tipc_unregister_sysctl();
91} 90}
@@ -99,7 +98,7 @@ static int tipc_core_start(void)
99 98
100 get_random_bytes(&tipc_random, sizeof(tipc_random)); 99 get_random_bytes(&tipc_random, sizeof(tipc_random));
101 100
102 err = tipc_ref_table_init(tipc_max_ports, tipc_random); 101 err = tipc_sk_ref_table_init(tipc_max_ports, tipc_random);
103 if (err) 102 if (err)
104 goto out_reftbl; 103 goto out_reftbl;
105 104
@@ -139,7 +138,7 @@ out_socket:
139out_netlink: 138out_netlink:
140 tipc_nametbl_stop(); 139 tipc_nametbl_stop();
141out_nametbl: 140out_nametbl:
142 tipc_ref_table_stop(); 141 tipc_sk_ref_table_stop();
143out_reftbl: 142out_reftbl:
144 return err; 143 return err;
145} 144}
diff --git a/net/tipc/core.h b/net/tipc/core.h
index bb26ed1ee966..f773b148722f 100644
--- a/net/tipc/core.h
+++ b/net/tipc/core.h
@@ -81,6 +81,7 @@ extern u32 tipc_own_addr __read_mostly;
81extern int tipc_max_ports __read_mostly; 81extern int tipc_max_ports __read_mostly;
82extern int tipc_net_id __read_mostly; 82extern int tipc_net_id __read_mostly;
83extern int sysctl_tipc_rmem[3] __read_mostly; 83extern int sysctl_tipc_rmem[3] __read_mostly;
84extern int sysctl_tipc_named_timeout __read_mostly;
84 85
85/* 86/*
86 * Other global variables 87 * Other global variables
@@ -187,8 +188,11 @@ static inline void k_term_timer(struct timer_list *timer)
187 188
188struct tipc_skb_cb { 189struct tipc_skb_cb {
189 void *handle; 190 void *handle;
190 bool deferred;
191 struct sk_buff *tail; 191 struct sk_buff *tail;
192 bool deferred;
193 bool wakeup_pending;
194 u16 chain_sz;
195 u16 chain_imp;
192}; 196};
193 197
194#define TIPC_SKB_CB(__skb) ((struct tipc_skb_cb *)&((__skb)->cb[0])) 198#define TIPC_SKB_CB(__skb) ((struct tipc_skb_cb *)&((__skb)->cb[0]))
diff --git a/net/tipc/link.c b/net/tipc/link.c
index fb1485dc6736..65410e18b8a6 100644
--- a/net/tipc/link.c
+++ b/net/tipc/link.c
@@ -36,7 +36,6 @@
36 36
37#include "core.h" 37#include "core.h"
38#include "link.h" 38#include "link.h"
39#include "port.h"
40#include "socket.h" 39#include "socket.h"
41#include "name_distr.h" 40#include "name_distr.h"
42#include "discover.h" 41#include "discover.h"
@@ -275,7 +274,7 @@ struct tipc_link *tipc_link_create(struct tipc_node *n_ptr,
275 link_init_max_pkt(l_ptr); 274 link_init_max_pkt(l_ptr);
276 275
277 l_ptr->next_out_no = 1; 276 l_ptr->next_out_no = 1;
278 INIT_LIST_HEAD(&l_ptr->waiting_ports); 277 __skb_queue_head_init(&l_ptr->waiting_sks);
279 278
280 link_reset_statistics(l_ptr); 279 link_reset_statistics(l_ptr);
281 280
@@ -322,66 +321,47 @@ void tipc_link_delete_list(unsigned int bearer_id, bool shutting_down)
322} 321}
323 322
324/** 323/**
325 * link_schedule_port - schedule port for deferred sending 324 * link_schedule_user - schedule user for wakeup after congestion
326 * @l_ptr: pointer to link 325 * @link: congested link
327 * @origport: reference to sending port 326 * @oport: sending port
328 * @sz: amount of data to be sent 327 * @chain_sz: size of buffer chain that was attempted sent
329 * 328 * @imp: importance of message attempted sent
330 * Schedules port for renewed sending of messages after link congestion 329 * Create pseudo msg to send back to user when congestion abates
331 * has abated.
332 */ 330 */
333static int link_schedule_port(struct tipc_link *l_ptr, u32 origport, u32 sz) 331static bool link_schedule_user(struct tipc_link *link, u32 oport,
332 uint chain_sz, uint imp)
334{ 333{
335 struct tipc_port *p_ptr; 334 struct sk_buff *buf;
336 struct tipc_sock *tsk;
337 335
338 spin_lock_bh(&tipc_port_list_lock); 336 buf = tipc_msg_create(SOCK_WAKEUP, 0, INT_H_SIZE, 0, tipc_own_addr,
339 p_ptr = tipc_port_lock(origport); 337 tipc_own_addr, oport, 0, 0);
340 if (p_ptr) { 338 if (!buf)
341 if (!list_empty(&p_ptr->wait_list)) 339 return false;
342 goto exit; 340 TIPC_SKB_CB(buf)->chain_sz = chain_sz;
343 tsk = tipc_port_to_sock(p_ptr); 341 TIPC_SKB_CB(buf)->chain_imp = imp;
344 tsk->link_cong = 1; 342 __skb_queue_tail(&link->waiting_sks, buf);
345 p_ptr->waiting_pkts = 1 + ((sz - 1) / l_ptr->max_pkt); 343 link->stats.link_congs++;
346 list_add_tail(&p_ptr->wait_list, &l_ptr->waiting_ports); 344 return true;
347 l_ptr->stats.link_congs++;
348exit:
349 tipc_port_unlock(p_ptr);
350 }
351 spin_unlock_bh(&tipc_port_list_lock);
352 return -ELINKCONG;
353} 345}
354 346
355void tipc_link_wakeup_ports(struct tipc_link *l_ptr, int all) 347/**
348 * link_prepare_wakeup - prepare users for wakeup after congestion
349 * @link: congested link
350 * Move a number of waiting users, as permitted by available space in
351 * the send queue, from link wait queue to node wait queue for wakeup
352 */
353static void link_prepare_wakeup(struct tipc_link *link)
356{ 354{
357 struct tipc_port *p_ptr; 355 struct sk_buff_head *wq = &link->waiting_sks;
358 struct tipc_sock *tsk; 356 struct sk_buff *buf;
359 struct tipc_port *temp_p_ptr; 357 uint pend_qsz = link->out_queue_size;
360 int win = l_ptr->queue_limit[0] - l_ptr->out_queue_size; 358
361 359 for (buf = skb_peek(wq); buf; buf = skb_peek(wq)) {
362 if (all) 360 if (pend_qsz >= link->queue_limit[TIPC_SKB_CB(buf)->chain_imp])
363 win = 100000;
364 if (win <= 0)
365 return;
366 if (!spin_trylock_bh(&tipc_port_list_lock))
367 return;
368 if (link_congested(l_ptr))
369 goto exit;
370 list_for_each_entry_safe(p_ptr, temp_p_ptr, &l_ptr->waiting_ports,
371 wait_list) {
372 if (win <= 0)
373 break; 361 break;
374 tsk = tipc_port_to_sock(p_ptr); 362 pend_qsz += TIPC_SKB_CB(buf)->chain_sz;
375 list_del_init(&p_ptr->wait_list); 363 __skb_queue_tail(&link->owner->waiting_sks, __skb_dequeue(wq));
376 spin_lock_bh(p_ptr->lock);
377 tsk->link_cong = 0;
378 tipc_sock_wakeup(tsk);
379 win -= p_ptr->waiting_pkts;
380 spin_unlock_bh(p_ptr->lock);
381 } 364 }
382
383exit:
384 spin_unlock_bh(&tipc_port_list_lock);
385} 365}
386 366
387/** 367/**
@@ -423,6 +403,7 @@ void tipc_link_reset(struct tipc_link *l_ptr)
423 u32 prev_state = l_ptr->state; 403 u32 prev_state = l_ptr->state;
424 u32 checkpoint = l_ptr->next_in_no; 404 u32 checkpoint = l_ptr->next_in_no;
425 int was_active_link = tipc_link_is_active(l_ptr); 405 int was_active_link = tipc_link_is_active(l_ptr);
406 struct tipc_node *owner = l_ptr->owner;
426 407
427 msg_set_session(l_ptr->pmsg, ((msg_session(l_ptr->pmsg) + 1) & 0xffff)); 408 msg_set_session(l_ptr->pmsg, ((msg_session(l_ptr->pmsg) + 1) & 0xffff));
428 409
@@ -450,9 +431,10 @@ void tipc_link_reset(struct tipc_link *l_ptr)
450 kfree_skb(l_ptr->proto_msg_queue); 431 kfree_skb(l_ptr->proto_msg_queue);
451 l_ptr->proto_msg_queue = NULL; 432 l_ptr->proto_msg_queue = NULL;
452 kfree_skb_list(l_ptr->oldest_deferred_in); 433 kfree_skb_list(l_ptr->oldest_deferred_in);
453 if (!list_empty(&l_ptr->waiting_ports)) 434 if (!skb_queue_empty(&l_ptr->waiting_sks)) {
454 tipc_link_wakeup_ports(l_ptr, 1); 435 skb_queue_splice_init(&l_ptr->waiting_sks, &owner->waiting_sks);
455 436 owner->action_flags |= TIPC_WAKEUP_USERS;
437 }
456 l_ptr->retransm_queue_head = 0; 438 l_ptr->retransm_queue_head = 0;
457 l_ptr->retransm_queue_size = 0; 439 l_ptr->retransm_queue_size = 0;
458 l_ptr->last_out = NULL; 440 l_ptr->last_out = NULL;
@@ -688,19 +670,23 @@ static void link_state_event(struct tipc_link *l_ptr, unsigned int event)
688static int tipc_link_cong(struct tipc_link *link, struct sk_buff *buf) 670static int tipc_link_cong(struct tipc_link *link, struct sk_buff *buf)
689{ 671{
690 struct tipc_msg *msg = buf_msg(buf); 672 struct tipc_msg *msg = buf_msg(buf);
691 uint psz = msg_size(msg);
692 uint imp = tipc_msg_tot_importance(msg); 673 uint imp = tipc_msg_tot_importance(msg);
693 u32 oport = msg_tot_origport(msg); 674 u32 oport = msg_tot_origport(msg);
694 675
695 if (likely(imp <= TIPC_CRITICAL_IMPORTANCE)) { 676 if (unlikely(imp > TIPC_CRITICAL_IMPORTANCE)) {
696 if (!msg_errcode(msg) && !msg_reroute_cnt(msg)) {
697 link_schedule_port(link, oport, psz);
698 return -ELINKCONG;
699 }
700 } else {
701 pr_warn("%s<%s>, send queue full", link_rst_msg, link->name); 677 pr_warn("%s<%s>, send queue full", link_rst_msg, link->name);
702 tipc_link_reset(link); 678 tipc_link_reset(link);
679 goto drop;
703 } 680 }
681 if (unlikely(msg_errcode(msg)))
682 goto drop;
683 if (unlikely(msg_reroute_cnt(msg)))
684 goto drop;
685 if (TIPC_SKB_CB(buf)->wakeup_pending)
686 return -ELINKCONG;
687 if (link_schedule_user(link, oport, TIPC_SKB_CB(buf)->chain_sz, imp))
688 return -ELINKCONG;
689drop:
704 kfree_skb_list(buf); 690 kfree_skb_list(buf);
705 return -EHOSTUNREACH; 691 return -EHOSTUNREACH;
706} 692}
@@ -1202,8 +1188,10 @@ void tipc_rcv(struct sk_buff *head, struct tipc_bearer *b_ptr)
1202 if (unlikely(l_ptr->next_out)) 1188 if (unlikely(l_ptr->next_out))
1203 tipc_link_push_queue(l_ptr); 1189 tipc_link_push_queue(l_ptr);
1204 1190
1205 if (unlikely(!list_empty(&l_ptr->waiting_ports))) 1191 if (released && !skb_queue_empty(&l_ptr->waiting_sks)) {
1206 tipc_link_wakeup_ports(l_ptr, 0); 1192 link_prepare_wakeup(l_ptr);
1193 l_ptr->owner->action_flags |= TIPC_WAKEUP_USERS;
1194 }
1207 1195
1208 /* Process the incoming packet */ 1196 /* Process the incoming packet */
1209 if (unlikely(!link_working_working(l_ptr))) { 1197 if (unlikely(!link_working_working(l_ptr))) {
diff --git a/net/tipc/link.h b/net/tipc/link.h
index 782983ccd323..b567a3427fda 100644
--- a/net/tipc/link.h
+++ b/net/tipc/link.h
@@ -1,7 +1,7 @@
1/* 1/*
2 * net/tipc/link.h: Include file for TIPC link code 2 * net/tipc/link.h: Include file for TIPC link code
3 * 3 *
4 * Copyright (c) 1995-2006, 2013, Ericsson AB 4 * Copyright (c) 1995-2006, 2013-2014, Ericsson AB
5 * Copyright (c) 2004-2005, 2010-2011, Wind River Systems 5 * Copyright (c) 2004-2005, 2010-2011, Wind River Systems
6 * All rights reserved. 6 * All rights reserved.
7 * 7 *
@@ -133,7 +133,7 @@ struct tipc_stats {
133 * @retransm_queue_size: number of messages to retransmit 133 * @retransm_queue_size: number of messages to retransmit
134 * @retransm_queue_head: sequence number of first message to retransmit 134 * @retransm_queue_head: sequence number of first message to retransmit
135 * @next_out: ptr to first unsent outbound message in queue 135 * @next_out: ptr to first unsent outbound message in queue
136 * @waiting_ports: linked list of ports waiting for link congestion to abate 136 * @waiting_sks: linked list of sockets waiting for link congestion to abate
137 * @long_msg_seq_no: next identifier to use for outbound fragmented messages 137 * @long_msg_seq_no: next identifier to use for outbound fragmented messages
138 * @reasm_buf: head of partially reassembled inbound message fragments 138 * @reasm_buf: head of partially reassembled inbound message fragments
139 * @stats: collects statistics regarding link activity 139 * @stats: collects statistics regarding link activity
@@ -194,7 +194,7 @@ struct tipc_link {
194 u32 retransm_queue_size; 194 u32 retransm_queue_size;
195 u32 retransm_queue_head; 195 u32 retransm_queue_head;
196 struct sk_buff *next_out; 196 struct sk_buff *next_out;
197 struct list_head waiting_ports; 197 struct sk_buff_head waiting_sks;
198 198
199 /* Fragmentation/reassembly */ 199 /* Fragmentation/reassembly */
200 u32 long_msg_seq_no; 200 u32 long_msg_seq_no;
@@ -235,7 +235,6 @@ void tipc_link_proto_xmit(struct tipc_link *l_ptr, u32 msg_typ, int prob,
235void tipc_link_push_queue(struct tipc_link *l_ptr); 235void tipc_link_push_queue(struct tipc_link *l_ptr);
236u32 tipc_link_defer_pkt(struct sk_buff **head, struct sk_buff **tail, 236u32 tipc_link_defer_pkt(struct sk_buff **head, struct sk_buff **tail,
237 struct sk_buff *buf); 237 struct sk_buff *buf);
238void tipc_link_wakeup_ports(struct tipc_link *l_ptr, int all);
239void tipc_link_set_queue_limits(struct tipc_link *l_ptr, u32 window); 238void tipc_link_set_queue_limits(struct tipc_link *l_ptr, u32 window);
240void tipc_link_retransmit(struct tipc_link *l_ptr, 239void tipc_link_retransmit(struct tipc_link *l_ptr,
241 struct sk_buff *start, u32 retransmits); 240 struct sk_buff *start, u32 retransmits);
diff --git a/net/tipc/msg.c b/net/tipc/msg.c
index 9680be6d388a..74745a47d72a 100644
--- a/net/tipc/msg.c
+++ b/net/tipc/msg.c
@@ -56,8 +56,35 @@ void tipc_msg_init(struct tipc_msg *m, u32 user, u32 type, u32 hsize,
56 msg_set_size(m, hsize); 56 msg_set_size(m, hsize);
57 msg_set_prevnode(m, tipc_own_addr); 57 msg_set_prevnode(m, tipc_own_addr);
58 msg_set_type(m, type); 58 msg_set_type(m, type);
59 msg_set_orignode(m, tipc_own_addr); 59 if (hsize > SHORT_H_SIZE) {
60 msg_set_destnode(m, destnode); 60 msg_set_orignode(m, tipc_own_addr);
61 msg_set_destnode(m, destnode);
62 }
63}
64
65struct sk_buff *tipc_msg_create(uint user, uint type, uint hdr_sz,
66 uint data_sz, u32 dnode, u32 onode,
67 u32 dport, u32 oport, int errcode)
68{
69 struct tipc_msg *msg;
70 struct sk_buff *buf;
71
72 buf = tipc_buf_acquire(hdr_sz + data_sz);
73 if (unlikely(!buf))
74 return NULL;
75
76 msg = buf_msg(buf);
77 tipc_msg_init(msg, user, type, hdr_sz, dnode);
78 msg_set_size(msg, hdr_sz + data_sz);
79 msg_set_prevnode(msg, onode);
80 msg_set_origport(msg, oport);
81 msg_set_destport(msg, dport);
82 msg_set_errcode(msg, errcode);
83 if (hdr_sz > SHORT_H_SIZE) {
84 msg_set_orignode(msg, onode);
85 msg_set_destnode(msg, dnode);
86 }
87 return buf;
61} 88}
62 89
63/* tipc_buf_append(): Append a buffer to the fragment list of another buffer 90/* tipc_buf_append(): Append a buffer to the fragment list of another buffer
@@ -155,7 +182,7 @@ int tipc_msg_build(struct tipc_msg *mhdr, struct iovec const *iov,
155 struct sk_buff *buf, *prev; 182 struct sk_buff *buf, *prev;
156 char *pktpos; 183 char *pktpos;
157 int rc; 184 int rc;
158 185 uint chain_sz = 0;
159 msg_set_size(mhdr, msz); 186 msg_set_size(mhdr, msz);
160 187
161 /* No fragmentation needed? */ 188 /* No fragmentation needed? */
@@ -166,6 +193,7 @@ int tipc_msg_build(struct tipc_msg *mhdr, struct iovec const *iov,
166 return -ENOMEM; 193 return -ENOMEM;
167 skb_copy_to_linear_data(buf, mhdr, mhsz); 194 skb_copy_to_linear_data(buf, mhdr, mhsz);
168 pktpos = buf->data + mhsz; 195 pktpos = buf->data + mhsz;
196 TIPC_SKB_CB(buf)->chain_sz = 1;
169 if (!dsz || !memcpy_fromiovecend(pktpos, iov, offset, dsz)) 197 if (!dsz || !memcpy_fromiovecend(pktpos, iov, offset, dsz))
170 return dsz; 198 return dsz;
171 rc = -EFAULT; 199 rc = -EFAULT;
@@ -182,6 +210,7 @@ int tipc_msg_build(struct tipc_msg *mhdr, struct iovec const *iov,
182 *chain = buf = tipc_buf_acquire(pktmax); 210 *chain = buf = tipc_buf_acquire(pktmax);
183 if (!buf) 211 if (!buf)
184 return -ENOMEM; 212 return -ENOMEM;
213 chain_sz = 1;
185 pktpos = buf->data; 214 pktpos = buf->data;
186 skb_copy_to_linear_data(buf, &pkthdr, INT_H_SIZE); 215 skb_copy_to_linear_data(buf, &pkthdr, INT_H_SIZE);
187 pktpos += INT_H_SIZE; 216 pktpos += INT_H_SIZE;
@@ -215,6 +244,7 @@ int tipc_msg_build(struct tipc_msg *mhdr, struct iovec const *iov,
215 rc = -ENOMEM; 244 rc = -ENOMEM;
216 goto error; 245 goto error;
217 } 246 }
247 chain_sz++;
218 prev->next = buf; 248 prev->next = buf;
219 msg_set_type(&pkthdr, FRAGMENT); 249 msg_set_type(&pkthdr, FRAGMENT);
220 msg_set_size(&pkthdr, pktsz); 250 msg_set_size(&pkthdr, pktsz);
@@ -224,7 +254,7 @@ int tipc_msg_build(struct tipc_msg *mhdr, struct iovec const *iov,
224 pktrem = pktsz - INT_H_SIZE; 254 pktrem = pktsz - INT_H_SIZE;
225 255
226 } while (1); 256 } while (1);
227 257 TIPC_SKB_CB(*chain)->chain_sz = chain_sz;
228 msg_set_type(buf_msg(buf), LAST_FRAGMENT); 258 msg_set_type(buf_msg(buf), LAST_FRAGMENT);
229 return dsz; 259 return dsz;
230error: 260error:
diff --git a/net/tipc/msg.h b/net/tipc/msg.h
index 462fa194a6af..0ea7b695ac4d 100644
--- a/net/tipc/msg.h
+++ b/net/tipc/msg.h
@@ -442,6 +442,7 @@ static inline struct tipc_msg *msg_get_wrapped(struct tipc_msg *m)
442#define NAME_DISTRIBUTOR 11 442#define NAME_DISTRIBUTOR 11
443#define MSG_FRAGMENTER 12 443#define MSG_FRAGMENTER 12
444#define LINK_CONFIG 13 444#define LINK_CONFIG 13
445#define SOCK_WAKEUP 14 /* pseudo user */
445 446
446/* 447/*
447 * Connection management protocol message types 448 * Connection management protocol message types
@@ -732,6 +733,10 @@ int tipc_msg_eval(struct sk_buff *buf, u32 *dnode);
732void tipc_msg_init(struct tipc_msg *m, u32 user, u32 type, u32 hsize, 733void tipc_msg_init(struct tipc_msg *m, u32 user, u32 type, u32 hsize,
733 u32 destnode); 734 u32 destnode);
734 735
736struct sk_buff *tipc_msg_create(uint user, uint type, uint hdr_sz,
737 uint data_sz, u32 dnode, u32 onode,
738 u32 dport, u32 oport, int errcode);
739
735int tipc_buf_append(struct sk_buff **headbuf, struct sk_buff **buf); 740int tipc_buf_append(struct sk_buff **headbuf, struct sk_buff **buf);
736 741
737bool tipc_msg_bundle(struct sk_buff *bbuf, struct sk_buff *buf, u32 mtu); 742bool tipc_msg_bundle(struct sk_buff *bbuf, struct sk_buff *buf, u32 mtu);
diff --git a/net/tipc/name_distr.c b/net/tipc/name_distr.c
index dcc15bcd5692..376d2bb51d8d 100644
--- a/net/tipc/name_distr.c
+++ b/net/tipc/name_distr.c
@@ -1,7 +1,7 @@
1/* 1/*
2 * net/tipc/name_distr.c: TIPC name distribution code 2 * net/tipc/name_distr.c: TIPC name distribution code
3 * 3 *
4 * Copyright (c) 2000-2006, Ericsson AB 4 * Copyright (c) 2000-2006, 2014, Ericsson AB
5 * Copyright (c) 2005, 2010-2011, Wind River Systems 5 * Copyright (c) 2005, 2010-2011, Wind River Systems
6 * All rights reserved. 6 * All rights reserved.
7 * 7 *
@@ -71,6 +71,21 @@ static struct publ_list *publ_lists[] = {
71}; 71};
72 72
73 73
74int sysctl_tipc_named_timeout __read_mostly = 2000;
75
76/**
77 * struct tipc_dist_queue - queue holding deferred name table updates
78 */
79static struct list_head tipc_dist_queue = LIST_HEAD_INIT(tipc_dist_queue);
80
81struct distr_queue_item {
82 struct distr_item i;
83 u32 dtype;
84 u32 node;
85 unsigned long expires;
86 struct list_head next;
87};
88
74/** 89/**
75 * publ_to_item - add publication info to a publication message 90 * publ_to_item - add publication info to a publication message
76 */ 91 */
@@ -263,54 +278,105 @@ static void named_purge_publ(struct publication *publ)
263} 278}
264 279
265/** 280/**
281 * tipc_update_nametbl - try to process a nametable update and notify
282 * subscribers
283 *
284 * tipc_nametbl_lock must be held.
 285 * Returns true if the update was applied, otherwise false.
286 */
287static bool tipc_update_nametbl(struct distr_item *i, u32 node, u32 dtype)
288{
289 struct publication *publ = NULL;
290
291 if (dtype == PUBLICATION) {
292 publ = tipc_nametbl_insert_publ(ntohl(i->type), ntohl(i->lower),
293 ntohl(i->upper),
294 TIPC_CLUSTER_SCOPE, node,
295 ntohl(i->ref), ntohl(i->key));
296 if (publ) {
297 tipc_nodesub_subscribe(&publ->subscr, node, publ,
298 (net_ev_handler)
299 named_purge_publ);
300 return true;
301 }
302 } else if (dtype == WITHDRAWAL) {
303 publ = tipc_nametbl_remove_publ(ntohl(i->type), ntohl(i->lower),
304 node, ntohl(i->ref),
305 ntohl(i->key));
306 if (publ) {
307 tipc_nodesub_unsubscribe(&publ->subscr);
308 kfree(publ);
309 return true;
310 }
311 } else {
312 pr_warn("Unrecognized name table message received\n");
313 }
314 return false;
315}
316
317/**
318 * tipc_named_add_backlog - add a failed name table update to the backlog
319 *
320 */
321static void tipc_named_add_backlog(struct distr_item *i, u32 type, u32 node)
322{
323 struct distr_queue_item *e;
324 unsigned long now = get_jiffies_64();
325
326 e = kzalloc(sizeof(*e), GFP_ATOMIC);
327 if (!e)
328 return;
329 e->dtype = type;
330 e->node = node;
331 e->expires = now + msecs_to_jiffies(sysctl_tipc_named_timeout);
332 memcpy(e, i, sizeof(*i));
333 list_add_tail(&e->next, &tipc_dist_queue);
334}
335
336/**
337 * tipc_named_process_backlog - try to process any pending name table updates
338 * from the network.
339 */
340void tipc_named_process_backlog(void)
341{
342 struct distr_queue_item *e, *tmp;
343 char addr[16];
344 unsigned long now = get_jiffies_64();
345
346 list_for_each_entry_safe(e, tmp, &tipc_dist_queue, next) {
347 if (time_after(e->expires, now)) {
348 if (!tipc_update_nametbl(&e->i, e->node, e->dtype))
349 continue;
350 } else {
351 tipc_addr_string_fill(addr, e->node);
352 pr_warn_ratelimited("Dropping name table update (%d) of {%u, %u, %u} from %s key=%u\n",
353 e->dtype, ntohl(e->i.type),
354 ntohl(e->i.lower),
355 ntohl(e->i.upper),
356 addr, ntohl(e->i.key));
357 }
358 list_del(&e->next);
359 kfree(e);
360 }
361}
362
363/**
266 * tipc_named_rcv - process name table update message sent by another node 364 * tipc_named_rcv - process name table update message sent by another node
267 */ 365 */
268void tipc_named_rcv(struct sk_buff *buf) 366void tipc_named_rcv(struct sk_buff *buf)
269{ 367{
270 struct publication *publ;
271 struct tipc_msg *msg = buf_msg(buf); 368 struct tipc_msg *msg = buf_msg(buf);
272 struct distr_item *item = (struct distr_item *)msg_data(msg); 369 struct distr_item *item = (struct distr_item *)msg_data(msg);
273 u32 count = msg_data_sz(msg) / ITEM_SIZE; 370 u32 count = msg_data_sz(msg) / ITEM_SIZE;
371 u32 node = msg_orignode(msg);
274 372
275 write_lock_bh(&tipc_nametbl_lock); 373 write_lock_bh(&tipc_nametbl_lock);
276 while (count--) { 374 while (count--) {
277 if (msg_type(msg) == PUBLICATION) { 375 if (!tipc_update_nametbl(item, node, msg_type(msg)))
278 publ = tipc_nametbl_insert_publ(ntohl(item->type), 376 tipc_named_add_backlog(item, msg_type(msg), node);
279 ntohl(item->lower),
280 ntohl(item->upper),
281 TIPC_CLUSTER_SCOPE,
282 msg_orignode(msg),
283 ntohl(item->ref),
284 ntohl(item->key));
285 if (publ) {
286 tipc_nodesub_subscribe(&publ->subscr,
287 msg_orignode(msg),
288 publ,
289 (net_ev_handler)
290 named_purge_publ);
291 }
292 } else if (msg_type(msg) == WITHDRAWAL) {
293 publ = tipc_nametbl_remove_publ(ntohl(item->type),
294 ntohl(item->lower),
295 msg_orignode(msg),
296 ntohl(item->ref),
297 ntohl(item->key));
298
299 if (publ) {
300 tipc_nodesub_unsubscribe(&publ->subscr);
301 kfree(publ);
302 } else {
303 pr_err("Unable to remove publication by node 0x%x\n"
304 " (type=%u, lower=%u, ref=%u, key=%u)\n",
305 msg_orignode(msg), ntohl(item->type),
306 ntohl(item->lower), ntohl(item->ref),
307 ntohl(item->key));
308 }
309 } else {
310 pr_warn("Unrecognized name table message received\n");
311 }
312 item++; 377 item++;
313 } 378 }
379 tipc_named_process_backlog();
314 write_unlock_bh(&tipc_nametbl_lock); 380 write_unlock_bh(&tipc_nametbl_lock);
315 kfree_skb(buf); 381 kfree_skb(buf);
316} 382}
diff --git a/net/tipc/name_distr.h b/net/tipc/name_distr.h
index 8afe32b7fc9a..b9e75feb3434 100644
--- a/net/tipc/name_distr.h
+++ b/net/tipc/name_distr.h
@@ -73,5 +73,6 @@ void named_cluster_distribute(struct sk_buff *buf);
73void tipc_named_node_up(u32 dnode); 73void tipc_named_node_up(u32 dnode);
74void tipc_named_rcv(struct sk_buff *buf); 74void tipc_named_rcv(struct sk_buff *buf);
75void tipc_named_reinit(void); 75void tipc_named_reinit(void);
76void tipc_named_process_backlog(void);
76 77
77#endif 78#endif
diff --git a/net/tipc/name_table.c b/net/tipc/name_table.c
index 9d7d37d95187..3a6a0a7c0759 100644
--- a/net/tipc/name_table.c
+++ b/net/tipc/name_table.c
@@ -39,7 +39,6 @@
39#include "name_table.h" 39#include "name_table.h"
40#include "name_distr.h" 40#include "name_distr.h"
41#include "subscr.h" 41#include "subscr.h"
42#include "port.h"
43 42
44#define TIPC_NAMETBL_SIZE 1024 /* must be a power of 2 */ 43#define TIPC_NAMETBL_SIZE 1024 /* must be a power of 2 */
45 44
@@ -262,8 +261,6 @@ static struct publication *tipc_nameseq_insert_publ(struct name_seq *nseq,
262 261
263 /* Lower end overlaps existing entry => need an exact match */ 262 /* Lower end overlaps existing entry => need an exact match */
264 if ((sseq->lower != lower) || (sseq->upper != upper)) { 263 if ((sseq->lower != lower) || (sseq->upper != upper)) {
265 pr_warn("Cannot publish {%u,%u,%u}, overlap error\n",
266 type, lower, upper);
267 return NULL; 264 return NULL;
268 } 265 }
269 266
@@ -285,8 +282,6 @@ static struct publication *tipc_nameseq_insert_publ(struct name_seq *nseq,
285 /* Fail if upper end overlaps into an existing entry */ 282 /* Fail if upper end overlaps into an existing entry */
286 if ((inspos < nseq->first_free) && 283 if ((inspos < nseq->first_free) &&
287 (upper >= nseq->sseqs[inspos].lower)) { 284 (upper >= nseq->sseqs[inspos].lower)) {
288 pr_warn("Cannot publish {%u,%u,%u}, overlap error\n",
289 type, lower, upper);
290 return NULL; 285 return NULL;
291 } 286 }
292 287
@@ -678,6 +673,8 @@ struct publication *tipc_nametbl_publish(u32 type, u32 lower, u32 upper,
678 if (likely(publ)) { 673 if (likely(publ)) {
679 table.local_publ_count++; 674 table.local_publ_count++;
680 buf = tipc_named_publish(publ); 675 buf = tipc_named_publish(publ);
676 /* Any pending external events? */
677 tipc_named_process_backlog();
681 } 678 }
682 write_unlock_bh(&tipc_nametbl_lock); 679 write_unlock_bh(&tipc_nametbl_lock);
683 680
@@ -699,6 +696,8 @@ int tipc_nametbl_withdraw(u32 type, u32 lower, u32 ref, u32 key)
699 if (likely(publ)) { 696 if (likely(publ)) {
700 table.local_publ_count--; 697 table.local_publ_count--;
701 buf = tipc_named_withdraw(publ); 698 buf = tipc_named_withdraw(publ);
699 /* Any pending external events? */
700 tipc_named_process_backlog();
702 write_unlock_bh(&tipc_nametbl_lock); 701 write_unlock_bh(&tipc_nametbl_lock);
703 list_del_init(&publ->pport_list); 702 list_del_init(&publ->pport_list);
704 kfree(publ); 703 kfree(publ);
diff --git a/net/tipc/net.c b/net/tipc/net.c
index 7fcc94998fea..93b9944a6a8b 100644
--- a/net/tipc/net.c
+++ b/net/tipc/net.c
@@ -38,7 +38,6 @@
38#include "net.h" 38#include "net.h"
39#include "name_distr.h" 39#include "name_distr.h"
40#include "subscr.h" 40#include "subscr.h"
41#include "port.h"
42#include "socket.h" 41#include "socket.h"
43#include "node.h" 42#include "node.h"
44#include "config.h" 43#include "config.h"
@@ -111,7 +110,7 @@ int tipc_net_start(u32 addr)
111 110
112 tipc_own_addr = addr; 111 tipc_own_addr = addr;
113 tipc_named_reinit(); 112 tipc_named_reinit();
114 tipc_port_reinit(); 113 tipc_sk_reinit();
115 res = tipc_bclink_init(); 114 res = tipc_bclink_init();
116 if (res) 115 if (res)
117 return res; 116 return res;
diff --git a/net/tipc/node.c b/net/tipc/node.c
index f7069299943f..90cee4a6fce4 100644
--- a/net/tipc/node.c
+++ b/net/tipc/node.c
@@ -38,6 +38,7 @@
38#include "config.h" 38#include "config.h"
39#include "node.h" 39#include "node.h"
40#include "name_distr.h" 40#include "name_distr.h"
41#include "socket.h"
41 42
42#define NODE_HTABLE_SIZE 512 43#define NODE_HTABLE_SIZE 512
43 44
@@ -50,6 +51,13 @@ static u32 tipc_num_nodes;
50static u32 tipc_num_links; 51static u32 tipc_num_links;
51static DEFINE_SPINLOCK(node_list_lock); 52static DEFINE_SPINLOCK(node_list_lock);
52 53
54struct tipc_sock_conn {
55 u32 port;
56 u32 peer_port;
57 u32 peer_node;
58 struct list_head list;
59};
60
53/* 61/*
54 * A trivial power-of-two bitmask technique is used for speed, since this 62 * A trivial power-of-two bitmask technique is used for speed, since this
55 * operation is done for every incoming TIPC packet. The number of hash table 63 * operation is done for every incoming TIPC packet. The number of hash table
@@ -100,6 +108,8 @@ struct tipc_node *tipc_node_create(u32 addr)
100 INIT_HLIST_NODE(&n_ptr->hash); 108 INIT_HLIST_NODE(&n_ptr->hash);
101 INIT_LIST_HEAD(&n_ptr->list); 109 INIT_LIST_HEAD(&n_ptr->list);
102 INIT_LIST_HEAD(&n_ptr->nsub); 110 INIT_LIST_HEAD(&n_ptr->nsub);
111 INIT_LIST_HEAD(&n_ptr->conn_sks);
112 __skb_queue_head_init(&n_ptr->waiting_sks);
103 113
104 hlist_add_head_rcu(&n_ptr->hash, &node_htable[tipc_hashfn(addr)]); 114 hlist_add_head_rcu(&n_ptr->hash, &node_htable[tipc_hashfn(addr)]);
105 115
@@ -136,6 +146,71 @@ void tipc_node_stop(void)
136 spin_unlock_bh(&node_list_lock); 146 spin_unlock_bh(&node_list_lock);
137} 147}
138 148
149int tipc_node_add_conn(u32 dnode, u32 port, u32 peer_port)
150{
151 struct tipc_node *node;
152 struct tipc_sock_conn *conn;
153
154 if (in_own_node(dnode))
155 return 0;
156
157 node = tipc_node_find(dnode);
158 if (!node) {
159 pr_warn("Connecting sock to node 0x%x failed\n", dnode);
160 return -EHOSTUNREACH;
161 }
162 conn = kmalloc(sizeof(*conn), GFP_ATOMIC);
163 if (!conn)
164 return -EHOSTUNREACH;
165 conn->peer_node = dnode;
166 conn->port = port;
167 conn->peer_port = peer_port;
168
169 tipc_node_lock(node);
170 list_add_tail(&conn->list, &node->conn_sks);
171 tipc_node_unlock(node);
172 return 0;
173}
174
175void tipc_node_remove_conn(u32 dnode, u32 port)
176{
177 struct tipc_node *node;
178 struct tipc_sock_conn *conn, *safe;
179
180 if (in_own_node(dnode))
181 return;
182
183 node = tipc_node_find(dnode);
184 if (!node)
185 return;
186
187 tipc_node_lock(node);
188 list_for_each_entry_safe(conn, safe, &node->conn_sks, list) {
189 if (port != conn->port)
190 continue;
191 list_del(&conn->list);
192 kfree(conn);
193 }
194 tipc_node_unlock(node);
195}
196
197void tipc_node_abort_sock_conns(struct list_head *conns)
198{
199 struct tipc_sock_conn *conn, *safe;
200 struct sk_buff *buf;
201
202 list_for_each_entry_safe(conn, safe, conns, list) {
203 buf = tipc_msg_create(TIPC_CRITICAL_IMPORTANCE, TIPC_CONN_MSG,
204 SHORT_H_SIZE, 0, tipc_own_addr,
205 conn->peer_node, conn->port,
206 conn->peer_port, TIPC_ERR_NO_NODE);
207 if (likely(buf))
208 tipc_sk_rcv(buf);
209 list_del(&conn->list);
210 kfree(conn);
211 }
212}
213
139/** 214/**
140 * tipc_node_link_up - handle addition of link 215 * tipc_node_link_up - handle addition of link
141 * 216 *
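The tipc_node_add_conn()/tipc_node_remove_conn() pair added above keeps, per remote node, a list of the local sockets that hold connections to that node, so that tipc_node_abort_sock_conns() can fabricate a TIPC_ERR_NO_PORT/NO_NODE abort for each of them once the node is declared down. A minimal sketch of that bookkeeping pattern follows; the names are illustrative only, not the TIPC functions above:

/* Sketch only; illustrative names, not the kernel API. */
#include <linux/list.h>
#include <linux/slab.h>
#include <linux/spinlock.h>
#include <linux/types.h>

struct conn_entry {
	u32 local_port;
	u32 peer_port;
	struct list_head list;
};

struct node_state {
	spinlock_t lock;
	struct list_head conns;		/* sockets connected to this node */
};

static int conn_track(struct node_state *n, u32 local_port, u32 peer_port)
{
	struct conn_entry *c = kmalloc(sizeof(*c), GFP_ATOMIC);

	if (!c)
		return -ENOMEM;
	c->local_port = local_port;
	c->peer_port = peer_port;
	spin_lock_bh(&n->lock);
	list_add_tail(&c->list, &n->conns);
	spin_unlock_bh(&n->lock);
	return 0;
}

static void conn_untrack(struct node_state *n, u32 local_port)
{
	struct conn_entry *c, *tmp;

	spin_lock_bh(&n->lock);
	list_for_each_entry_safe(c, tmp, &n->conns, list) {
		if (c->local_port != local_port)
			continue;
		list_del(&c->list);
		kfree(c);
	}
	spin_unlock_bh(&n->lock);
}

The GFP_ATOMIC allocation mirrors the kmalloc() call in tipc_node_add_conn(), which can run from softirq context.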
@@ -474,25 +549,45 @@ int tipc_node_get_linkname(u32 bearer_id, u32 addr, char *linkname, size_t len)
474void tipc_node_unlock(struct tipc_node *node) 549void tipc_node_unlock(struct tipc_node *node)
475{ 550{
476 LIST_HEAD(nsub_list); 551 LIST_HEAD(nsub_list);
552 LIST_HEAD(conn_sks);
553 struct sk_buff_head waiting_sks;
477 u32 addr = 0; 554 u32 addr = 0;
555 unsigned int flags = node->action_flags;
478 556
479 if (likely(!node->action_flags)) { 557 if (likely(!node->action_flags)) {
480 spin_unlock_bh(&node->lock); 558 spin_unlock_bh(&node->lock);
481 return; 559 return;
482 } 560 }
483 561
562 __skb_queue_head_init(&waiting_sks);
563 if (node->action_flags & TIPC_WAKEUP_USERS) {
564 skb_queue_splice_init(&node->waiting_sks, &waiting_sks);
565 node->action_flags &= ~TIPC_WAKEUP_USERS;
566 }
484 if (node->action_flags & TIPC_NOTIFY_NODE_DOWN) { 567 if (node->action_flags & TIPC_NOTIFY_NODE_DOWN) {
485 list_replace_init(&node->nsub, &nsub_list); 568 list_replace_init(&node->nsub, &nsub_list);
569 list_replace_init(&node->conn_sks, &conn_sks);
486 node->action_flags &= ~TIPC_NOTIFY_NODE_DOWN; 570 node->action_flags &= ~TIPC_NOTIFY_NODE_DOWN;
487 } 571 }
488 if (node->action_flags & TIPC_NOTIFY_NODE_UP) { 572 if (node->action_flags & TIPC_NOTIFY_NODE_UP) {
489 node->action_flags &= ~TIPC_NOTIFY_NODE_UP; 573 node->action_flags &= ~TIPC_NOTIFY_NODE_UP;
490 addr = node->addr; 574 addr = node->addr;
491 } 575 }
576 node->action_flags &= ~TIPC_WAKEUP_BCAST_USERS;
492 spin_unlock_bh(&node->lock); 577 spin_unlock_bh(&node->lock);
493 578
579 while (!skb_queue_empty(&waiting_sks))
580 tipc_sk_rcv(__skb_dequeue(&waiting_sks));
581
582 if (!list_empty(&conn_sks))
583 tipc_node_abort_sock_conns(&conn_sks);
584
494 if (!list_empty(&nsub_list)) 585 if (!list_empty(&nsub_list))
495 tipc_nodesub_notify(&nsub_list); 586 tipc_nodesub_notify(&nsub_list);
587
588 if (flags & TIPC_WAKEUP_BCAST_USERS)
589 tipc_bclink_wakeup_users();
590
496 if (addr) 591 if (addr)
497 tipc_named_node_up(addr); 592 tipc_named_node_up(addr);
498} 593}
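The rewritten tipc_node_unlock() in the hunk above follows a harvest-then-act pattern: everything that needs servicing (queued wake-up buffers, dying socket connections, name-table subscriptions) is moved onto local lists while the node spinlock is still held, and the callbacks run only after the lock has been dropped, since tipc_sk_rcv() and the notification handlers may take further locks of their own. A condensed sketch of the pattern, assuming a hypothetical node type and delivery callback:

/* Sketch only; 'n->lock' is held by the caller on entry, as in the kernel code. */
#include <linux/skbuff.h>
#include <linux/spinlock.h>

#define WAKEUP_PENDING	(1 << 0)	/* stand-in for TIPC_WAKEUP_USERS */

struct node_like {
	spinlock_t lock;
	unsigned int action_flags;
	struct sk_buff_head waiting_sks;
};

static void node_unlock_sketch(struct node_like *n,
			       void (*deliver)(struct sk_buff *))
{
	struct sk_buff_head pending;
	struct sk_buff *skb;

	__skb_queue_head_init(&pending);

	/* Harvest pending work while still holding the lock */
	if (n->action_flags & WAKEUP_PENDING) {
		skb_queue_splice_init(&n->waiting_sks, &pending);
		n->action_flags &= ~WAKEUP_PENDING;
	}
	spin_unlock_bh(&n->lock);

	/* Act on it lock-free; deliver() may safely take socket locks here */
	while ((skb = __skb_dequeue(&pending)) != NULL)
		deliver(skb);
}

Splicing the queue under the lock keeps the critical section short while still guaranteeing that each buffer is handed out exactly once.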
diff --git a/net/tipc/node.h b/net/tipc/node.h
index b61716a8218e..67513c3c852c 100644
--- a/net/tipc/node.h
+++ b/net/tipc/node.h
@@ -58,7 +58,9 @@ enum {
58 TIPC_WAIT_PEER_LINKS_DOWN = (1 << 1), 58 TIPC_WAIT_PEER_LINKS_DOWN = (1 << 1),
59 TIPC_WAIT_OWN_LINKS_DOWN = (1 << 2), 59 TIPC_WAIT_OWN_LINKS_DOWN = (1 << 2),
60 TIPC_NOTIFY_NODE_DOWN = (1 << 3), 60 TIPC_NOTIFY_NODE_DOWN = (1 << 3),
61 TIPC_NOTIFY_NODE_UP = (1 << 4) 61 TIPC_NOTIFY_NODE_UP = (1 << 4),
62 TIPC_WAKEUP_USERS = (1 << 5),
63 TIPC_WAKEUP_BCAST_USERS = (1 << 6)
62}; 64};
63 65
64/** 66/**
@@ -115,6 +117,8 @@ struct tipc_node {
115 int working_links; 117 int working_links;
116 u32 signature; 118 u32 signature;
117 struct list_head nsub; 119 struct list_head nsub;
120 struct sk_buff_head waiting_sks;
121 struct list_head conn_sks;
118 struct rcu_head rcu; 122 struct rcu_head rcu;
119}; 123};
120 124
@@ -133,6 +137,8 @@ struct sk_buff *tipc_node_get_links(const void *req_tlv_area, int req_tlv_space)
133struct sk_buff *tipc_node_get_nodes(const void *req_tlv_area, int req_tlv_space); 137struct sk_buff *tipc_node_get_nodes(const void *req_tlv_area, int req_tlv_space);
134int tipc_node_get_linkname(u32 bearer_id, u32 node, char *linkname, size_t len); 138int tipc_node_get_linkname(u32 bearer_id, u32 node, char *linkname, size_t len);
135void tipc_node_unlock(struct tipc_node *node); 139void tipc_node_unlock(struct tipc_node *node);
140int tipc_node_add_conn(u32 dnode, u32 port, u32 peer_port);
141void tipc_node_remove_conn(u32 dnode, u32 port);
136 142
137static inline void tipc_node_lock(struct tipc_node *node) 143static inline void tipc_node_lock(struct tipc_node *node)
138{ 144{
diff --git a/net/tipc/port.c b/net/tipc/port.c
deleted file mode 100644
index 7e096a5e7701..000000000000
--- a/net/tipc/port.c
+++ /dev/null
@@ -1,514 +0,0 @@
1/*
2 * net/tipc/port.c: TIPC port code
3 *
4 * Copyright (c) 1992-2007, 2014, Ericsson AB
5 * Copyright (c) 2004-2008, 2010-2013, Wind River Systems
6 * All rights reserved.
7 *
8 * Redistribution and use in source and binary forms, with or without
9 * modification, are permitted provided that the following conditions are met:
10 *
11 * 1. Redistributions of source code must retain the above copyright
12 * notice, this list of conditions and the following disclaimer.
13 * 2. Redistributions in binary form must reproduce the above copyright
14 * notice, this list of conditions and the following disclaimer in the
15 * documentation and/or other materials provided with the distribution.
16 * 3. Neither the names of the copyright holders nor the names of its
17 * contributors may be used to endorse or promote products derived from
18 * this software without specific prior written permission.
19 *
20 * Alternatively, this software may be distributed under the terms of the
21 * GNU General Public License ("GPL") version 2 as published by the Free
22 * Software Foundation.
23 *
24 * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
25 * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
26 * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
27 * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
28 * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
29 * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
30 * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
31 * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
32 * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
33 * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
34 * POSSIBILITY OF SUCH DAMAGE.
35 */
36
37#include "core.h"
38#include "config.h"
39#include "port.h"
40#include "name_table.h"
41#include "socket.h"
42
43/* Connection management: */
44#define PROBING_INTERVAL 3600000 /* [ms] => 1 h */
45
46#define MAX_REJECT_SIZE 1024
47
48DEFINE_SPINLOCK(tipc_port_list_lock);
49
50static LIST_HEAD(ports);
51static void port_handle_node_down(unsigned long ref);
52static struct sk_buff *port_build_self_abort_msg(struct tipc_port *, u32 err);
53static struct sk_buff *port_build_peer_abort_msg(struct tipc_port *, u32 err);
54static void port_timeout(unsigned long ref);
55
56/**
57 * tipc_port_peer_msg - verify message was sent by connected port's peer
58 *
59 * Handles cases where the node's network address has changed from
60 * the default of <0.0.0> to its configured setting.
61 */
62int tipc_port_peer_msg(struct tipc_port *p_ptr, struct tipc_msg *msg)
63{
64 u32 peernode;
65 u32 orignode;
66
67 if (msg_origport(msg) != tipc_port_peerport(p_ptr))
68 return 0;
69
70 orignode = msg_orignode(msg);
71 peernode = tipc_port_peernode(p_ptr);
72 return (orignode == peernode) ||
73 (!orignode && (peernode == tipc_own_addr)) ||
74 (!peernode && (orignode == tipc_own_addr));
75}
76
77/* tipc_port_init - intiate TIPC port and lock it
78 *
79 * Returns obtained reference if initialization is successful, zero otherwise
80 */
81u32 tipc_port_init(struct tipc_port *p_ptr,
82 const unsigned int importance)
83{
84 struct tipc_msg *msg;
85 u32 ref;
86
87 ref = tipc_ref_acquire(p_ptr, &p_ptr->lock);
88 if (!ref) {
89 pr_warn("Port registration failed, ref. table exhausted\n");
90 return 0;
91 }
92
93 p_ptr->max_pkt = MAX_PKT_DEFAULT;
94 p_ptr->ref = ref;
95 INIT_LIST_HEAD(&p_ptr->wait_list);
96 INIT_LIST_HEAD(&p_ptr->subscription.nodesub_list);
97 k_init_timer(&p_ptr->timer, (Handler)port_timeout, ref);
98 INIT_LIST_HEAD(&p_ptr->publications);
99 INIT_LIST_HEAD(&p_ptr->port_list);
100
101 /*
102 * Must hold port list lock while initializing message header template
103 * to ensure a change to node's own network address doesn't result
104 * in template containing out-dated network address information
105 */
106 spin_lock_bh(&tipc_port_list_lock);
107 msg = &p_ptr->phdr;
108 tipc_msg_init(msg, importance, TIPC_NAMED_MSG, NAMED_H_SIZE, 0);
109 msg_set_origport(msg, ref);
110 list_add_tail(&p_ptr->port_list, &ports);
111 spin_unlock_bh(&tipc_port_list_lock);
112 return ref;
113}
114
115void tipc_port_destroy(struct tipc_port *p_ptr)
116{
117 struct sk_buff *buf = NULL;
118 struct tipc_msg *msg = NULL;
119 u32 peer;
120
121 tipc_withdraw(p_ptr, 0, NULL);
122
123 spin_lock_bh(p_ptr->lock);
124 tipc_ref_discard(p_ptr->ref);
125 spin_unlock_bh(p_ptr->lock);
126
127 k_cancel_timer(&p_ptr->timer);
128 if (p_ptr->connected) {
129 buf = port_build_peer_abort_msg(p_ptr, TIPC_ERR_NO_PORT);
130 tipc_nodesub_unsubscribe(&p_ptr->subscription);
131 msg = buf_msg(buf);
132 peer = msg_destnode(msg);
133 tipc_link_xmit(buf, peer, msg_link_selector(msg));
134 }
135 spin_lock_bh(&tipc_port_list_lock);
136 list_del(&p_ptr->port_list);
137 list_del(&p_ptr->wait_list);
138 spin_unlock_bh(&tipc_port_list_lock);
139 k_term_timer(&p_ptr->timer);
140}
141
142/*
143 * port_build_proto_msg(): create connection protocol message for port
144 *
145 * On entry the port must be locked and connected.
146 */
147static struct sk_buff *port_build_proto_msg(struct tipc_port *p_ptr,
148 u32 type, u32 ack)
149{
150 struct sk_buff *buf;
151 struct tipc_msg *msg;
152
153 buf = tipc_buf_acquire(INT_H_SIZE);
154 if (buf) {
155 msg = buf_msg(buf);
156 tipc_msg_init(msg, CONN_MANAGER, type, INT_H_SIZE,
157 tipc_port_peernode(p_ptr));
158 msg_set_destport(msg, tipc_port_peerport(p_ptr));
159 msg_set_origport(msg, p_ptr->ref);
160 msg_set_msgcnt(msg, ack);
161 buf->next = NULL;
162 }
163 return buf;
164}
165
166static void port_timeout(unsigned long ref)
167{
168 struct tipc_port *p_ptr = tipc_port_lock(ref);
169 struct sk_buff *buf = NULL;
170 struct tipc_msg *msg = NULL;
171
172 if (!p_ptr)
173 return;
174
175 if (!p_ptr->connected) {
176 tipc_port_unlock(p_ptr);
177 return;
178 }
179
180 /* Last probe answered ? */
181 if (p_ptr->probing_state == TIPC_CONN_PROBING) {
182 buf = port_build_self_abort_msg(p_ptr, TIPC_ERR_NO_PORT);
183 } else {
184 buf = port_build_proto_msg(p_ptr, CONN_PROBE, 0);
185 p_ptr->probing_state = TIPC_CONN_PROBING;
186 k_start_timer(&p_ptr->timer, p_ptr->probing_interval);
187 }
188 tipc_port_unlock(p_ptr);
189 msg = buf_msg(buf);
190 tipc_link_xmit(buf, msg_destnode(msg), msg_link_selector(msg));
191}
192
193
194static void port_handle_node_down(unsigned long ref)
195{
196 struct tipc_port *p_ptr = tipc_port_lock(ref);
197 struct sk_buff *buf = NULL;
198 struct tipc_msg *msg = NULL;
199
200 if (!p_ptr)
201 return;
202 buf = port_build_self_abort_msg(p_ptr, TIPC_ERR_NO_NODE);
203 tipc_port_unlock(p_ptr);
204 msg = buf_msg(buf);
205 tipc_link_xmit(buf, msg_destnode(msg), msg_link_selector(msg));
206}
207
208
209static struct sk_buff *port_build_self_abort_msg(struct tipc_port *p_ptr, u32 err)
210{
211 struct sk_buff *buf = port_build_peer_abort_msg(p_ptr, err);
212
213 if (buf) {
214 struct tipc_msg *msg = buf_msg(buf);
215 msg_swap_words(msg, 4, 5);
216 msg_swap_words(msg, 6, 7);
217 buf->next = NULL;
218 }
219 return buf;
220}
221
222
223static struct sk_buff *port_build_peer_abort_msg(struct tipc_port *p_ptr, u32 err)
224{
225 struct sk_buff *buf;
226 struct tipc_msg *msg;
227 u32 imp;
228
229 if (!p_ptr->connected)
230 return NULL;
231
232 buf = tipc_buf_acquire(BASIC_H_SIZE);
233 if (buf) {
234 msg = buf_msg(buf);
235 memcpy(msg, &p_ptr->phdr, BASIC_H_SIZE);
236 msg_set_hdr_sz(msg, BASIC_H_SIZE);
237 msg_set_size(msg, BASIC_H_SIZE);
238 imp = msg_importance(msg);
239 if (imp < TIPC_CRITICAL_IMPORTANCE)
240 msg_set_importance(msg, ++imp);
241 msg_set_errcode(msg, err);
242 buf->next = NULL;
243 }
244 return buf;
245}
246
247static int port_print(struct tipc_port *p_ptr, char *buf, int len, int full_id)
248{
249 struct publication *publ;
250 int ret;
251
252 if (full_id)
253 ret = tipc_snprintf(buf, len, "<%u.%u.%u:%u>:",
254 tipc_zone(tipc_own_addr),
255 tipc_cluster(tipc_own_addr),
256 tipc_node(tipc_own_addr), p_ptr->ref);
257 else
258 ret = tipc_snprintf(buf, len, "%-10u:", p_ptr->ref);
259
260 if (p_ptr->connected) {
261 u32 dport = tipc_port_peerport(p_ptr);
262 u32 destnode = tipc_port_peernode(p_ptr);
263
264 ret += tipc_snprintf(buf + ret, len - ret,
265 " connected to <%u.%u.%u:%u>",
266 tipc_zone(destnode),
267 tipc_cluster(destnode),
268 tipc_node(destnode), dport);
269 if (p_ptr->conn_type != 0)
270 ret += tipc_snprintf(buf + ret, len - ret,
271 " via {%u,%u}", p_ptr->conn_type,
272 p_ptr->conn_instance);
273 } else if (p_ptr->published) {
274 ret += tipc_snprintf(buf + ret, len - ret, " bound to");
275 list_for_each_entry(publ, &p_ptr->publications, pport_list) {
276 if (publ->lower == publ->upper)
277 ret += tipc_snprintf(buf + ret, len - ret,
278 " {%u,%u}", publ->type,
279 publ->lower);
280 else
281 ret += tipc_snprintf(buf + ret, len - ret,
282 " {%u,%u,%u}", publ->type,
283 publ->lower, publ->upper);
284 }
285 }
286 ret += tipc_snprintf(buf + ret, len - ret, "\n");
287 return ret;
288}
289
290struct sk_buff *tipc_port_get_ports(void)
291{
292 struct sk_buff *buf;
293 struct tlv_desc *rep_tlv;
294 char *pb;
295 int pb_len;
296 struct tipc_port *p_ptr;
297 int str_len = 0;
298
299 buf = tipc_cfg_reply_alloc(TLV_SPACE(ULTRA_STRING_MAX_LEN));
300 if (!buf)
301 return NULL;
302 rep_tlv = (struct tlv_desc *)buf->data;
303 pb = TLV_DATA(rep_tlv);
304 pb_len = ULTRA_STRING_MAX_LEN;
305
306 spin_lock_bh(&tipc_port_list_lock);
307 list_for_each_entry(p_ptr, &ports, port_list) {
308 spin_lock_bh(p_ptr->lock);
309 str_len += port_print(p_ptr, pb, pb_len, 0);
310 spin_unlock_bh(p_ptr->lock);
311 }
312 spin_unlock_bh(&tipc_port_list_lock);
313 str_len += 1; /* for "\0" */
314 skb_put(buf, TLV_SPACE(str_len));
315 TLV_SET(rep_tlv, TIPC_TLV_ULTRA_STRING, NULL, str_len);
316
317 return buf;
318}
319
320void tipc_port_reinit(void)
321{
322 struct tipc_port *p_ptr;
323 struct tipc_msg *msg;
324
325 spin_lock_bh(&tipc_port_list_lock);
326 list_for_each_entry(p_ptr, &ports, port_list) {
327 msg = &p_ptr->phdr;
328 msg_set_prevnode(msg, tipc_own_addr);
329 msg_set_orignode(msg, tipc_own_addr);
330 }
331 spin_unlock_bh(&tipc_port_list_lock);
332}
333
334void tipc_acknowledge(u32 ref, u32 ack)
335{
336 struct tipc_port *p_ptr;
337 struct sk_buff *buf = NULL;
338 struct tipc_msg *msg;
339
340 p_ptr = tipc_port_lock(ref);
341 if (!p_ptr)
342 return;
343 if (p_ptr->connected)
344 buf = port_build_proto_msg(p_ptr, CONN_ACK, ack);
345
346 tipc_port_unlock(p_ptr);
347 if (!buf)
348 return;
349 msg = buf_msg(buf);
350 tipc_link_xmit(buf, msg_destnode(msg), msg_link_selector(msg));
351}
352
353int tipc_publish(struct tipc_port *p_ptr, unsigned int scope,
354 struct tipc_name_seq const *seq)
355{
356 struct publication *publ;
357 u32 key;
358
359 if (p_ptr->connected)
360 return -EINVAL;
361 key = p_ptr->ref + p_ptr->pub_count + 1;
362 if (key == p_ptr->ref)
363 return -EADDRINUSE;
364
365 publ = tipc_nametbl_publish(seq->type, seq->lower, seq->upper,
366 scope, p_ptr->ref, key);
367 if (publ) {
368 list_add(&publ->pport_list, &p_ptr->publications);
369 p_ptr->pub_count++;
370 p_ptr->published = 1;
371 return 0;
372 }
373 return -EINVAL;
374}
375
376int tipc_withdraw(struct tipc_port *p_ptr, unsigned int scope,
377 struct tipc_name_seq const *seq)
378{
379 struct publication *publ;
380 struct publication *tpubl;
381 int res = -EINVAL;
382
383 if (!seq) {
384 list_for_each_entry_safe(publ, tpubl,
385 &p_ptr->publications, pport_list) {
386 tipc_nametbl_withdraw(publ->type, publ->lower,
387 publ->ref, publ->key);
388 }
389 res = 0;
390 } else {
391 list_for_each_entry_safe(publ, tpubl,
392 &p_ptr->publications, pport_list) {
393 if (publ->scope != scope)
394 continue;
395 if (publ->type != seq->type)
396 continue;
397 if (publ->lower != seq->lower)
398 continue;
399 if (publ->upper != seq->upper)
400 break;
401 tipc_nametbl_withdraw(publ->type, publ->lower,
402 publ->ref, publ->key);
403 res = 0;
404 break;
405 }
406 }
407 if (list_empty(&p_ptr->publications))
408 p_ptr->published = 0;
409 return res;
410}
411
412int tipc_port_connect(u32 ref, struct tipc_portid const *peer)
413{
414 struct tipc_port *p_ptr;
415 int res;
416
417 p_ptr = tipc_port_lock(ref);
418 if (!p_ptr)
419 return -EINVAL;
420 res = __tipc_port_connect(ref, p_ptr, peer);
421 tipc_port_unlock(p_ptr);
422 return res;
423}
424
425/*
426 * __tipc_port_connect - connect to a remote peer
427 *
428 * Port must be locked.
429 */
430int __tipc_port_connect(u32 ref, struct tipc_port *p_ptr,
431 struct tipc_portid const *peer)
432{
433 struct tipc_msg *msg;
434 int res = -EINVAL;
435
436 if (p_ptr->published || p_ptr->connected)
437 goto exit;
438 if (!peer->ref)
439 goto exit;
440
441 msg = &p_ptr->phdr;
442 msg_set_destnode(msg, peer->node);
443 msg_set_destport(msg, peer->ref);
444 msg_set_type(msg, TIPC_CONN_MSG);
445 msg_set_lookup_scope(msg, 0);
446 msg_set_hdr_sz(msg, SHORT_H_SIZE);
447
448 p_ptr->probing_interval = PROBING_INTERVAL;
449 p_ptr->probing_state = TIPC_CONN_OK;
450 p_ptr->connected = 1;
451 k_start_timer(&p_ptr->timer, p_ptr->probing_interval);
452
453 tipc_nodesub_subscribe(&p_ptr->subscription, peer->node,
454 (void *)(unsigned long)ref,
455 (net_ev_handler)port_handle_node_down);
456 res = 0;
457exit:
458 p_ptr->max_pkt = tipc_node_get_mtu(peer->node, ref);
459 return res;
460}
461
462/*
463 * __tipc_disconnect - disconnect port from peer
464 *
465 * Port must be locked.
466 */
467int __tipc_port_disconnect(struct tipc_port *tp_ptr)
468{
469 if (tp_ptr->connected) {
470 tp_ptr->connected = 0;
471 /* let timer expire on its own to avoid deadlock! */
472 tipc_nodesub_unsubscribe(&tp_ptr->subscription);
473 return 0;
474 }
475
476 return -ENOTCONN;
477}
478
479/*
480 * tipc_port_disconnect(): Disconnect port from peer.
481 * This is a node local operation.
482 */
483int tipc_port_disconnect(u32 ref)
484{
485 struct tipc_port *p_ptr;
486 int res;
487
488 p_ptr = tipc_port_lock(ref);
489 if (!p_ptr)
490 return -EINVAL;
491 res = __tipc_port_disconnect(p_ptr);
492 tipc_port_unlock(p_ptr);
493 return res;
494}
495
496/*
497 * tipc_port_shutdown(): Send a SHUTDOWN msg to peer and disconnect
498 */
499int tipc_port_shutdown(u32 ref)
500{
501 struct tipc_msg *msg;
502 struct tipc_port *p_ptr;
503 struct sk_buff *buf = NULL;
504
505 p_ptr = tipc_port_lock(ref);
506 if (!p_ptr)
507 return -EINVAL;
508
509 buf = port_build_peer_abort_msg(p_ptr, TIPC_CONN_SHUTDOWN);
510 tipc_port_unlock(p_ptr);
511 msg = buf_msg(buf);
512 tipc_link_xmit(buf, msg_destnode(msg), msg_link_selector(msg));
513 return tipc_port_disconnect(ref);
514}
diff --git a/net/tipc/port.h b/net/tipc/port.h
deleted file mode 100644
index 3087da39ee47..000000000000
--- a/net/tipc/port.h
+++ /dev/null
@@ -1,190 +0,0 @@
1/*
2 * net/tipc/port.h: Include file for TIPC port code
3 *
4 * Copyright (c) 1994-2007, 2014, Ericsson AB
5 * Copyright (c) 2004-2007, 2010-2013, Wind River Systems
6 * All rights reserved.
7 *
8 * Redistribution and use in source and binary forms, with or without
9 * modification, are permitted provided that the following conditions are met:
10 *
11 * 1. Redistributions of source code must retain the above copyright
12 * notice, this list of conditions and the following disclaimer.
13 * 2. Redistributions in binary form must reproduce the above copyright
14 * notice, this list of conditions and the following disclaimer in the
15 * documentation and/or other materials provided with the distribution.
16 * 3. Neither the names of the copyright holders nor the names of its
17 * contributors may be used to endorse or promote products derived from
18 * this software without specific prior written permission.
19 *
20 * Alternatively, this software may be distributed under the terms of the
21 * GNU General Public License ("GPL") version 2 as published by the Free
22 * Software Foundation.
23 *
24 * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
25 * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
26 * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
27 * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
28 * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
29 * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
30 * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
31 * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
32 * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
33 * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
34 * POSSIBILITY OF SUCH DAMAGE.
35 */
36
37#ifndef _TIPC_PORT_H
38#define _TIPC_PORT_H
39
40#include "ref.h"
41#include "net.h"
42#include "msg.h"
43#include "node_subscr.h"
44
45#define TIPC_CONNACK_INTV 256
46#define TIPC_FLOWCTRL_WIN (TIPC_CONNACK_INTV * 2)
47#define TIPC_CONN_OVERLOAD_LIMIT ((TIPC_FLOWCTRL_WIN * 2 + 1) * \
48 SKB_TRUESIZE(TIPC_MAX_USER_MSG_SIZE))
49
50/**
51 * struct tipc_port - TIPC port structure
52 * @lock: pointer to spinlock for controlling access to port
53 * @connected: non-zero if port is currently connected to a peer port
54 * @conn_type: TIPC type used when connection was established
55 * @conn_instance: TIPC instance used when connection was established
56 * @published: non-zero if port has one or more associated names
57 * @max_pkt: maximum packet size "hint" used when building messages sent by port
58 * @ref: unique reference to port in TIPC object registry
59 * @phdr: preformatted message header used when sending messages
60 * @port_list: adjacent ports in TIPC's global list of ports
61 * @wait_list: adjacent ports in list of ports waiting on link congestion
62 * @waiting_pkts:
63 * @publications: list of publications for port
64 * @pub_count: total # of publications port has made during its lifetime
65 * @probing_state:
66 * @probing_interval:
67 * @timer_ref:
68 * @subscription: "node down" subscription used to terminate failed connections
69 */
70struct tipc_port {
71 spinlock_t *lock;
72 int connected;
73 u32 conn_type;
74 u32 conn_instance;
75 int published;
76 u32 max_pkt;
77 u32 ref;
78 struct tipc_msg phdr;
79 struct list_head port_list;
80 struct list_head wait_list;
81 u32 waiting_pkts;
82 struct list_head publications;
83 u32 pub_count;
84 u32 probing_state;
85 u32 probing_interval;
86 struct timer_list timer;
87 struct tipc_node_subscr subscription;
88};
89
90extern spinlock_t tipc_port_list_lock;
91struct tipc_port_list;
92
93/*
94 * TIPC port manipulation routines
95 */
96u32 tipc_port_init(struct tipc_port *p_ptr,
97 const unsigned int importance);
98
99void tipc_acknowledge(u32 port_ref, u32 ack);
100
101void tipc_port_destroy(struct tipc_port *p_ptr);
102
103int tipc_publish(struct tipc_port *p_ptr, unsigned int scope,
104 struct tipc_name_seq const *name_seq);
105
106int tipc_withdraw(struct tipc_port *p_ptr, unsigned int scope,
107 struct tipc_name_seq const *name_seq);
108
109int tipc_port_connect(u32 portref, struct tipc_portid const *port);
110
111int tipc_port_disconnect(u32 portref);
112
113int tipc_port_shutdown(u32 ref);
114
115/*
116 * The following routines require that the port be locked on entry
117 */
118int __tipc_port_disconnect(struct tipc_port *tp_ptr);
119int __tipc_port_connect(u32 ref, struct tipc_port *p_ptr,
120 struct tipc_portid const *peer);
121int tipc_port_peer_msg(struct tipc_port *p_ptr, struct tipc_msg *msg);
122
123struct sk_buff *tipc_port_get_ports(void);
124void tipc_port_reinit(void);
125
126/**
127 * tipc_port_lock - lock port instance referred to and return its pointer
128 */
129static inline struct tipc_port *tipc_port_lock(u32 ref)
130{
131 return (struct tipc_port *)tipc_ref_lock(ref);
132}
133
134/**
135 * tipc_port_unlock - unlock a port instance
136 *
137 * Can use pointer instead of tipc_ref_unlock() since port is already locked.
138 */
139static inline void tipc_port_unlock(struct tipc_port *p_ptr)
140{
141 spin_unlock_bh(p_ptr->lock);
142}
143
144static inline u32 tipc_port_peernode(struct tipc_port *p_ptr)
145{
146 return msg_destnode(&p_ptr->phdr);
147}
148
149static inline u32 tipc_port_peerport(struct tipc_port *p_ptr)
150{
151 return msg_destport(&p_ptr->phdr);
152}
153
154static inline bool tipc_port_unreliable(struct tipc_port *port)
155{
156 return msg_src_droppable(&port->phdr) != 0;
157}
158
159static inline void tipc_port_set_unreliable(struct tipc_port *port,
160 bool unreliable)
161{
162 msg_set_src_droppable(&port->phdr, unreliable ? 1 : 0);
163}
164
165static inline bool tipc_port_unreturnable(struct tipc_port *port)
166{
167 return msg_dest_droppable(&port->phdr) != 0;
168}
169
170static inline void tipc_port_set_unreturnable(struct tipc_port *port,
171 bool unreturnable)
172{
173 msg_set_dest_droppable(&port->phdr, unreturnable ? 1 : 0);
174}
175
176
177static inline int tipc_port_importance(struct tipc_port *port)
178{
179 return msg_importance(&port->phdr);
180}
181
182static inline int tipc_port_set_importance(struct tipc_port *port, int imp)
183{
184 if (imp > TIPC_CRITICAL_IMPORTANCE)
185 return -EINVAL;
186 msg_set_importance(&port->phdr, (u32)imp);
187 return 0;
188}
189
190#endif
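Note that neither the removed struct tipc_port nor the new struct tipc_sock stores the peer identity separately: the preformatted header phdr is kept with its destination fields set to the connected peer, and the tipc_port_peernode()/tipc_port_peerport() accessors above (like tsk_peer_node()/tsk_peer_port() in socket.c below) simply read those fields back. A toy illustration of the idea, using a stand-in header type rather than struct tipc_msg:

#include <linux/types.h>

/* Toy stand-in for struct tipc_msg: only the fields the sketch needs. */
struct hdr_like {
	u32 destnode;
	u32 destport;
};

struct endpoint {
	struct hdr_like phdr;	/* template copied into every outgoing message */
};

/* "Connecting" just means stamping the peer into the header template... */
static void endpoint_connect(struct endpoint *e, u32 node, u32 port)
{
	e->phdr.destnode = node;
	e->phdr.destport = port;
}

/* ...so the peer identity can always be read back from the template. */
static u32 endpoint_peer_node(const struct endpoint *e)
{
	return e->phdr.destnode;
}

static u32 endpoint_peer_port(const struct endpoint *e)
{
	return e->phdr.destport;
}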
diff --git a/net/tipc/ref.c b/net/tipc/ref.c
deleted file mode 100644
index 3d4ecd754eee..000000000000
--- a/net/tipc/ref.c
+++ /dev/null
@@ -1,266 +0,0 @@
1/*
2 * net/tipc/ref.c: TIPC object registry code
3 *
4 * Copyright (c) 1991-2006, Ericsson AB
5 * Copyright (c) 2004-2007, Wind River Systems
6 * All rights reserved.
7 *
8 * Redistribution and use in source and binary forms, with or without
9 * modification, are permitted provided that the following conditions are met:
10 *
11 * 1. Redistributions of source code must retain the above copyright
12 * notice, this list of conditions and the following disclaimer.
13 * 2. Redistributions in binary form must reproduce the above copyright
14 * notice, this list of conditions and the following disclaimer in the
15 * documentation and/or other materials provided with the distribution.
16 * 3. Neither the names of the copyright holders nor the names of its
17 * contributors may be used to endorse or promote products derived from
18 * this software without specific prior written permission.
19 *
20 * Alternatively, this software may be distributed under the terms of the
21 * GNU General Public License ("GPL") version 2 as published by the Free
22 * Software Foundation.
23 *
24 * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
25 * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
26 * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
27 * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
28 * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
29 * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
30 * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
31 * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
32 * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
33 * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
34 * POSSIBILITY OF SUCH DAMAGE.
35 */
36
37#include "core.h"
38#include "ref.h"
39
40/**
41 * struct reference - TIPC object reference entry
42 * @object: pointer to object associated with reference entry
43 * @lock: spinlock controlling access to object
44 * @ref: reference value for object (combines instance & array index info)
45 */
46struct reference {
47 void *object;
48 spinlock_t lock;
49 u32 ref;
50};
51
52/**
53 * struct ref_table - table of TIPC object reference entries
54 * @entries: pointer to array of reference entries
55 * @capacity: array index of first unusable entry
56 * @init_point: array index of first uninitialized entry
57 * @first_free: array index of first unused object reference entry
58 * @last_free: array index of last unused object reference entry
59 * @index_mask: bitmask for array index portion of reference values
60 * @start_mask: initial value for instance value portion of reference values
61 */
62struct ref_table {
63 struct reference *entries;
64 u32 capacity;
65 u32 init_point;
66 u32 first_free;
67 u32 last_free;
68 u32 index_mask;
69 u32 start_mask;
70};
71
72/*
73 * Object reference table consists of 2**N entries.
74 *
75 * State Object ptr Reference
76 * ----- ---------- ---------
77 * In use non-NULL XXXX|own index
78 * (XXXX changes each time entry is acquired)
79 * Free NULL YYYY|next free index
80 * (YYYY is one more than last used XXXX)
81 * Uninitialized NULL 0
82 *
83 * Entry 0 is not used; this allows index 0 to denote the end of the free list.
84 *
85 * Note that a reference value of 0 does not necessarily indicate that an
86 * entry is uninitialized, since the last entry in the free list could also
87 * have a reference value of 0 (although this is unlikely).
88 */
89
90static struct ref_table tipc_ref_table;
91
92static DEFINE_SPINLOCK(ref_table_lock);
93
94/**
95 * tipc_ref_table_init - create reference table for objects
96 */
97int tipc_ref_table_init(u32 requested_size, u32 start)
98{
99 struct reference *table;
100 u32 actual_size;
101
102 /* account for unused entry, then round up size to a power of 2 */
103
104 requested_size++;
105 for (actual_size = 16; actual_size < requested_size; actual_size <<= 1)
106 /* do nothing */ ;
107
108 /* allocate table & mark all entries as uninitialized */
109 table = vzalloc(actual_size * sizeof(struct reference));
110 if (table == NULL)
111 return -ENOMEM;
112
113 tipc_ref_table.entries = table;
114 tipc_ref_table.capacity = requested_size;
115 tipc_ref_table.init_point = 1;
116 tipc_ref_table.first_free = 0;
117 tipc_ref_table.last_free = 0;
118 tipc_ref_table.index_mask = actual_size - 1;
119 tipc_ref_table.start_mask = start & ~tipc_ref_table.index_mask;
120
121 return 0;
122}
123
124/**
125 * tipc_ref_table_stop - destroy reference table for objects
126 */
127void tipc_ref_table_stop(void)
128{
129 vfree(tipc_ref_table.entries);
130 tipc_ref_table.entries = NULL;
131}
132
133/**
134 * tipc_ref_acquire - create reference to an object
135 *
136 * Register an object pointer in reference table and lock the object.
137 * Returns a unique reference value that is used from then on to retrieve the
138 * object pointer, or to determine that the object has been deregistered.
139 *
140 * Note: The object is returned in the locked state so that the caller can
141 * register a partially initialized object, without running the risk that
142 * the object will be accessed before initialization is complete.
143 */
144u32 tipc_ref_acquire(void *object, spinlock_t **lock)
145{
146 u32 index;
147 u32 index_mask;
148 u32 next_plus_upper;
149 u32 ref;
150 struct reference *entry = NULL;
151
152 if (!object) {
153 pr_err("Attempt to acquire ref. to non-existent obj\n");
154 return 0;
155 }
156 if (!tipc_ref_table.entries) {
157 pr_err("Ref. table not found in acquisition attempt\n");
158 return 0;
159 }
160
161 /* take a free entry, if available; otherwise initialize a new entry */
162 spin_lock_bh(&ref_table_lock);
163 if (tipc_ref_table.first_free) {
164 index = tipc_ref_table.first_free;
165 entry = &(tipc_ref_table.entries[index]);
166 index_mask = tipc_ref_table.index_mask;
167 next_plus_upper = entry->ref;
168 tipc_ref_table.first_free = next_plus_upper & index_mask;
169 ref = (next_plus_upper & ~index_mask) + index;
170 } else if (tipc_ref_table.init_point < tipc_ref_table.capacity) {
171 index = tipc_ref_table.init_point++;
172 entry = &(tipc_ref_table.entries[index]);
173 spin_lock_init(&entry->lock);
174 ref = tipc_ref_table.start_mask + index;
175 } else {
176 ref = 0;
177 }
178 spin_unlock_bh(&ref_table_lock);
179
180 /*
181 * Grab the lock so no one else can modify this entry
182 * While we assign its ref value & object pointer
183 */
184 if (entry) {
185 spin_lock_bh(&entry->lock);
186 entry->ref = ref;
187 entry->object = object;
188 *lock = &entry->lock;
189 /*
190 * keep it locked, the caller is responsible
191 * for unlocking this when they're done with it
192 */
193 }
194
195 return ref;
196}
197
198/**
199 * tipc_ref_discard - invalidate references to an object
200 *
201 * Disallow future references to an object and free up the entry for re-use.
202 * Note: The entry's spin_lock may still be busy after discard
203 */
204void tipc_ref_discard(u32 ref)
205{
206 struct reference *entry;
207 u32 index;
208 u32 index_mask;
209
210 if (!tipc_ref_table.entries) {
211 pr_err("Ref. table not found during discard attempt\n");
212 return;
213 }
214
215 index_mask = tipc_ref_table.index_mask;
216 index = ref & index_mask;
217 entry = &(tipc_ref_table.entries[index]);
218
219 spin_lock_bh(&ref_table_lock);
220
221 if (!entry->object) {
222 pr_err("Attempt to discard ref. to non-existent obj\n");
223 goto exit;
224 }
225 if (entry->ref != ref) {
226 pr_err("Attempt to discard non-existent reference\n");
227 goto exit;
228 }
229
230 /*
231 * mark entry as unused; increment instance part of entry's reference
232 * to invalidate any subsequent references
233 */
234 entry->object = NULL;
235 entry->ref = (ref & ~index_mask) + (index_mask + 1);
236
237 /* append entry to free entry list */
238 if (tipc_ref_table.first_free == 0)
239 tipc_ref_table.first_free = index;
240 else
241 tipc_ref_table.entries[tipc_ref_table.last_free].ref |= index;
242 tipc_ref_table.last_free = index;
243
244exit:
245 spin_unlock_bh(&ref_table_lock);
246}
247
248/**
249 * tipc_ref_lock - lock referenced object and return pointer to it
250 */
251void *tipc_ref_lock(u32 ref)
252{
253 if (likely(tipc_ref_table.entries)) {
254 struct reference *entry;
255
256 entry = &tipc_ref_table.entries[ref &
257 tipc_ref_table.index_mask];
258 if (likely(entry->ref != 0)) {
259 spin_lock_bh(&entry->lock);
260 if (likely((entry->ref == ref) && (entry->object)))
261 return entry->object;
262 spin_unlock_bh(&entry->lock);
263 }
264 }
265 return NULL;
266}
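The reference values handed out by this (now removed) registry pack two pieces of information: the low bits are the entry's array index, the high bits an instance counter that is advanced each time the slot is recycled, so a stale reference to a reused slot no longer compares equal to entry->ref. A compact sketch of that encoding, assuming a power-of-two table size:

#include <linux/types.h>

#define REF_TABLE_SIZE	1024			/* must be a power of two */
#define REF_INDEX_MASK	(REF_TABLE_SIZE - 1)

/* Split a reference into its array index and instance counter. */
static inline u32 ref_index(u32 ref)
{
	return ref & REF_INDEX_MASK;
}

static inline u32 ref_instance(u32 ref)
{
	return ref & ~REF_INDEX_MASK;
}

/*
 * Recycling a slot: advance the instance part so any reference handed
 * out before the discard no longer compares equal to the stored value.
 */
static inline u32 ref_next_instance(u32 old_ref)
{
	return (old_ref & ~REF_INDEX_MASK) + REF_TABLE_SIZE;
}

For a 1024-entry table, for instance, reference 0x405 names index 5 with instance 0x400; after a discard the slot's stored value moves on to the 0x800 instance, so a lookup with the old reference fails.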
diff --git a/net/tipc/ref.h b/net/tipc/ref.h
deleted file mode 100644
index d01aa1df63b8..000000000000
--- a/net/tipc/ref.h
+++ /dev/null
@@ -1,48 +0,0 @@
1/*
2 * net/tipc/ref.h: Include file for TIPC object registry code
3 *
4 * Copyright (c) 1991-2006, Ericsson AB
5 * Copyright (c) 2005-2006, Wind River Systems
6 * All rights reserved.
7 *
8 * Redistribution and use in source and binary forms, with or without
9 * modification, are permitted provided that the following conditions are met:
10 *
11 * 1. Redistributions of source code must retain the above copyright
12 * notice, this list of conditions and the following disclaimer.
13 * 2. Redistributions in binary form must reproduce the above copyright
14 * notice, this list of conditions and the following disclaimer in the
15 * documentation and/or other materials provided with the distribution.
16 * 3. Neither the names of the copyright holders nor the names of its
17 * contributors may be used to endorse or promote products derived from
18 * this software without specific prior written permission.
19 *
20 * Alternatively, this software may be distributed under the terms of the
21 * GNU General Public License ("GPL") version 2 as published by the Free
22 * Software Foundation.
23 *
24 * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
25 * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
26 * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
27 * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
28 * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
29 * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
30 * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
31 * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
32 * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
33 * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
34 * POSSIBILITY OF SUCH DAMAGE.
35 */
36
37#ifndef _TIPC_REF_H
38#define _TIPC_REF_H
39
40int tipc_ref_table_init(u32 requested_size, u32 start);
41void tipc_ref_table_stop(void);
42
43u32 tipc_ref_acquire(void *object, spinlock_t **lock);
44void tipc_ref_discard(u32 ref);
45
46void *tipc_ref_lock(u32 ref);
47
48#endif
diff --git a/net/tipc/socket.c b/net/tipc/socket.c
index ff8c8118d56e..75275c5cf929 100644
--- a/net/tipc/socket.c
+++ b/net/tipc/socket.c
@@ -35,17 +35,67 @@
35 */ 35 */
36 36
37#include "core.h" 37#include "core.h"
38#include "port.h"
39#include "name_table.h" 38#include "name_table.h"
40#include "node.h" 39#include "node.h"
41#include "link.h" 40#include "link.h"
42#include <linux/export.h> 41#include <linux/export.h>
42#include "config.h"
43#include "socket.h"
43 44
44#define SS_LISTENING -1 /* socket is listening */ 45#define SS_LISTENING -1 /* socket is listening */
45#define SS_READY -2 /* socket is connectionless */ 46#define SS_READY -2 /* socket is connectionless */
46 47
47#define CONN_TIMEOUT_DEFAULT 8000 /* default connect timeout = 8s */ 48#define CONN_TIMEOUT_DEFAULT 8000 /* default connect timeout = 8s */
48#define TIPC_FWD_MSG 1 49#define CONN_PROBING_INTERVAL 3600000 /* [ms] => 1 h */
50#define TIPC_FWD_MSG 1
51#define TIPC_CONN_OK 0
52#define TIPC_CONN_PROBING 1
53
54/**
55 * struct tipc_sock - TIPC socket structure
56 * @sk: socket - interacts with 'port' and with user via the socket API
57 * @connected: non-zero if port is currently connected to a peer port
58 * @conn_type: TIPC type used when connection was established
59 * @conn_instance: TIPC instance used when connection was established
60 * @published: non-zero if port has one or more associated names
61 * @max_pkt: maximum packet size "hint" used when building messages sent by port
62 * @ref: unique reference to port in TIPC object registry
63 * @phdr: preformatted message header used when sending messages
64 * @port_list: adjacent ports in TIPC's global list of ports
65 * @publications: list of publications for port
66 * @pub_count: total # of publications port has made during its lifetime
67 * @probing_state:
68 * @probing_interval:
69 * @timer:
70 * @port: port - interacts with 'sk' and with the rest of the TIPC stack
71 * @peer_name: the peer of the connection, if any
72 * @conn_timeout: the time we can wait for an unresponded setup request
73 * @dupl_rcvcnt: number of bytes counted twice, in both backlog and rcv queue
74 * @link_cong: non-zero if owner must sleep because of link congestion
75 * @sent_unacked: # messages sent by socket, and not yet acked by peer
76 * @rcv_unacked: # messages read by user, but not yet acked back to peer
77 */
78struct tipc_sock {
79 struct sock sk;
80 int connected;
81 u32 conn_type;
82 u32 conn_instance;
83 int published;
84 u32 max_pkt;
85 u32 ref;
86 struct tipc_msg phdr;
87 struct list_head sock_list;
88 struct list_head publications;
89 u32 pub_count;
90 u32 probing_state;
91 u32 probing_interval;
92 struct timer_list timer;
93 uint conn_timeout;
94 atomic_t dupl_rcvcnt;
95 bool link_cong;
96 uint sent_unacked;
97 uint rcv_unacked;
98};
49 99
50static int tipc_backlog_rcv(struct sock *sk, struct sk_buff *skb); 100static int tipc_backlog_rcv(struct sock *sk, struct sk_buff *skb);
51static void tipc_data_ready(struct sock *sk); 101static void tipc_data_ready(struct sock *sk);
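The consolidated struct tipc_sock above places struct sock as its first member; that layout is what allows the tipc_sk() helper added later in this file to recover the protocol-private state from a generic struct sock pointer with container_of(). A minimal illustration of the pattern with generic names, not the TIPC definitions themselves:

/* Generic sketch of the embedded-socket pattern. */
#include <linux/kernel.h>	/* container_of() */
#include <linux/types.h>
#include <net/sock.h>

struct proto_sock {
	struct sock sk;		/* must stay first: the core allocates this block */
	u32 ref;		/* protocol-private fields follow */
	bool link_cong;
};

static inline struct proto_sock *proto_sk(struct sock *sk)
{
	return container_of(sk, struct proto_sock, sk);
}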
@@ -53,6 +103,16 @@ static void tipc_write_space(struct sock *sk);
53static int tipc_release(struct socket *sock); 103static int tipc_release(struct socket *sock);
54static int tipc_accept(struct socket *sock, struct socket *new_sock, int flags); 104static int tipc_accept(struct socket *sock, struct socket *new_sock, int flags);
55static int tipc_wait_for_sndmsg(struct socket *sock, long *timeo_p); 105static int tipc_wait_for_sndmsg(struct socket *sock, long *timeo_p);
106static void tipc_sk_timeout(unsigned long ref);
107static int tipc_sk_publish(struct tipc_sock *tsk, uint scope,
108 struct tipc_name_seq const *seq);
109static int tipc_sk_withdraw(struct tipc_sock *tsk, uint scope,
110 struct tipc_name_seq const *seq);
111static u32 tipc_sk_ref_acquire(struct tipc_sock *tsk);
112static void tipc_sk_ref_discard(u32 ref);
113static struct tipc_sock *tipc_sk_get(u32 ref);
114static struct tipc_sock *tipc_sk_get_next(u32 *ref);
115static void tipc_sk_put(struct tipc_sock *tsk);
56 116
57static const struct proto_ops packet_ops; 117static const struct proto_ops packet_ops;
58static const struct proto_ops stream_ops; 118static const struct proto_ops stream_ops;
@@ -106,24 +166,75 @@ static struct proto tipc_proto_kern;
106 * - port reference 166 * - port reference
107 */ 167 */
108 168
109#include "socket.h" 169static u32 tsk_peer_node(struct tipc_sock *tsk)
170{
171 return msg_destnode(&tsk->phdr);
172}
173
174static u32 tsk_peer_port(struct tipc_sock *tsk)
175{
176 return msg_destport(&tsk->phdr);
177}
178
179static bool tsk_unreliable(struct tipc_sock *tsk)
180{
181 return msg_src_droppable(&tsk->phdr) != 0;
182}
183
184static void tsk_set_unreliable(struct tipc_sock *tsk, bool unreliable)
185{
186 msg_set_src_droppable(&tsk->phdr, unreliable ? 1 : 0);
187}
188
189static bool tsk_unreturnable(struct tipc_sock *tsk)
190{
191 return msg_dest_droppable(&tsk->phdr) != 0;
192}
193
194static void tsk_set_unreturnable(struct tipc_sock *tsk, bool unreturnable)
195{
196 msg_set_dest_droppable(&tsk->phdr, unreturnable ? 1 : 0);
197}
198
199static int tsk_importance(struct tipc_sock *tsk)
200{
201 return msg_importance(&tsk->phdr);
202}
203
204static int tsk_set_importance(struct tipc_sock *tsk, int imp)
205{
206 if (imp > TIPC_CRITICAL_IMPORTANCE)
207 return -EINVAL;
208 msg_set_importance(&tsk->phdr, (u32)imp);
209 return 0;
210}
211
212static struct tipc_sock *tipc_sk(const struct sock *sk)
213{
214 return container_of(sk, struct tipc_sock, sk);
215}
216
217static int tsk_conn_cong(struct tipc_sock *tsk)
218{
219 return tsk->sent_unacked >= TIPC_FLOWCTRL_WIN;
220}
110 221
111/** 222/**
112 * advance_rx_queue - discard first buffer in socket receive queue 223 * tsk_advance_rx_queue - discard first buffer in socket receive queue
113 * 224 *
114 * Caller must hold socket lock 225 * Caller must hold socket lock
115 */ 226 */
116static void advance_rx_queue(struct sock *sk) 227static void tsk_advance_rx_queue(struct sock *sk)
117{ 228{
118 kfree_skb(__skb_dequeue(&sk->sk_receive_queue)); 229 kfree_skb(__skb_dequeue(&sk->sk_receive_queue));
119} 230}
120 231
121/** 232/**
122 * reject_rx_queue - reject all buffers in socket receive queue 233 * tsk_rej_rx_queue - reject all buffers in socket receive queue
123 * 234 *
124 * Caller must hold socket lock 235 * Caller must hold socket lock
125 */ 236 */
126static void reject_rx_queue(struct sock *sk) 237static void tsk_rej_rx_queue(struct sock *sk)
127{ 238{
128 struct sk_buff *buf; 239 struct sk_buff *buf;
129 u32 dnode; 240 u32 dnode;
@@ -134,6 +245,38 @@ static void reject_rx_queue(struct sock *sk)
134 } 245 }
135} 246}
136 247
248/* tsk_peer_msg - verify if message was sent by connected port's peer
249 *
250 * Handles cases where the node's network address has changed from
251 * the default of <0.0.0> to its configured setting.
252 */
253static bool tsk_peer_msg(struct tipc_sock *tsk, struct tipc_msg *msg)
254{
255 u32 peer_port = tsk_peer_port(tsk);
256 u32 orig_node;
257 u32 peer_node;
258
259 if (unlikely(!tsk->connected))
260 return false;
261
262 if (unlikely(msg_origport(msg) != peer_port))
263 return false;
264
265 orig_node = msg_orignode(msg);
266 peer_node = tsk_peer_node(tsk);
267
268 if (likely(orig_node == peer_node))
269 return true;
270
271 if (!orig_node && (peer_node == tipc_own_addr))
272 return true;
273
274 if (!peer_node && (orig_node == tipc_own_addr))
275 return true;
276
277 return false;
278}
279
137/** 280/**
138 * tipc_sk_create - create a TIPC socket 281 * tipc_sk_create - create a TIPC socket
139 * @net: network namespace (must be default network) 282 * @net: network namespace (must be default network)
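tsk_peer_msg() above has to tolerate the window in which a node's own network address is being changed from the default <0.0.0> to its configured value: the originating port must always match the stored peer port, but the node part matches either exactly, or when one side is still 0 and the other side is our own address. A small sketch of that three-way check, using an illustrative helper rather than the kernel function:

#include <linux/types.h>

/*
 * Sketch: a message origin <orig_node, orig_port> matches the stored peer
 * <peer_node, peer_port> if the ports agree and the node either agrees
 * exactly or one side is still the unconfigured address 0, in which case
 * the other side must be our own address.
 */
static bool origin_is_peer(u32 orig_node, u32 orig_port,
			   u32 peer_node, u32 peer_port, u32 own_addr)
{
	if (orig_port != peer_port)
		return false;
	if (orig_node == peer_node)
		return true;
	if (!orig_node && peer_node == own_addr)
		return true;
	if (!peer_node && orig_node == own_addr)
		return true;
	return false;
}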
@@ -153,7 +296,7 @@ static int tipc_sk_create(struct net *net, struct socket *sock,
153 socket_state state; 296 socket_state state;
154 struct sock *sk; 297 struct sock *sk;
155 struct tipc_sock *tsk; 298 struct tipc_sock *tsk;
156 struct tipc_port *port; 299 struct tipc_msg *msg;
157 u32 ref; 300 u32 ref;
158 301
159 /* Validate arguments */ 302 /* Validate arguments */
@@ -188,20 +331,24 @@ static int tipc_sk_create(struct net *net, struct socket *sock,
188 return -ENOMEM; 331 return -ENOMEM;
189 332
190 tsk = tipc_sk(sk); 333 tsk = tipc_sk(sk);
191 port = &tsk->port; 334 ref = tipc_sk_ref_acquire(tsk);
192
193 ref = tipc_port_init(port, TIPC_LOW_IMPORTANCE);
194 if (!ref) { 335 if (!ref) {
195 pr_warn("Socket registration failed, ref. table exhausted\n"); 336 pr_warn("Socket create failed; reference table exhausted\n");
196 sk_free(sk);
197 return -ENOMEM; 337 return -ENOMEM;
198 } 338 }
339 tsk->max_pkt = MAX_PKT_DEFAULT;
340 tsk->ref = ref;
341 INIT_LIST_HEAD(&tsk->publications);
342 msg = &tsk->phdr;
343 tipc_msg_init(msg, TIPC_LOW_IMPORTANCE, TIPC_NAMED_MSG,
344 NAMED_H_SIZE, 0);
345 msg_set_origport(msg, ref);
199 346
200 /* Finish initializing socket data structures */ 347 /* Finish initializing socket data structures */
201 sock->ops = ops; 348 sock->ops = ops;
202 sock->state = state; 349 sock->state = state;
203
204 sock_init_data(sock, sk); 350 sock_init_data(sock, sk);
351 k_init_timer(&tsk->timer, (Handler)tipc_sk_timeout, ref);
205 sk->sk_backlog_rcv = tipc_backlog_rcv; 352 sk->sk_backlog_rcv = tipc_backlog_rcv;
206 sk->sk_rcvbuf = sysctl_tipc_rmem[1]; 353 sk->sk_rcvbuf = sysctl_tipc_rmem[1];
207 sk->sk_data_ready = tipc_data_ready; 354 sk->sk_data_ready = tipc_data_ready;
@@ -209,12 +356,11 @@ static int tipc_sk_create(struct net *net, struct socket *sock,
209 tsk->conn_timeout = CONN_TIMEOUT_DEFAULT; 356 tsk->conn_timeout = CONN_TIMEOUT_DEFAULT;
210 tsk->sent_unacked = 0; 357 tsk->sent_unacked = 0;
211 atomic_set(&tsk->dupl_rcvcnt, 0); 358 atomic_set(&tsk->dupl_rcvcnt, 0);
212 tipc_port_unlock(port);
213 359
214 if (sock->state == SS_READY) { 360 if (sock->state == SS_READY) {
215 tipc_port_set_unreturnable(port, true); 361 tsk_set_unreturnable(tsk, true);
216 if (sock->type == SOCK_DGRAM) 362 if (sock->type == SOCK_DGRAM)
217 tipc_port_set_unreliable(port, true); 363 tsk_set_unreliable(tsk, true);
218 } 364 }
219 return 0; 365 return 0;
220} 366}
@@ -308,7 +454,6 @@ static int tipc_release(struct socket *sock)
308{ 454{
309 struct sock *sk = sock->sk; 455 struct sock *sk = sock->sk;
310 struct tipc_sock *tsk; 456 struct tipc_sock *tsk;
311 struct tipc_port *port;
312 struct sk_buff *buf; 457 struct sk_buff *buf;
313 u32 dnode; 458 u32 dnode;
314 459
@@ -320,13 +465,13 @@ static int tipc_release(struct socket *sock)
320 return 0; 465 return 0;
321 466
322 tsk = tipc_sk(sk); 467 tsk = tipc_sk(sk);
323 port = &tsk->port;
324 lock_sock(sk); 468 lock_sock(sk);
325 469
326 /* 470 /*
327 * Reject all unreceived messages, except on an active connection 471 * Reject all unreceived messages, except on an active connection
328 * (which disconnects locally & sends a 'FIN+' to peer) 472 * (which disconnects locally & sends a 'FIN+' to peer)
329 */ 473 */
474 dnode = tsk_peer_node(tsk);
330 while (sock->state != SS_DISCONNECTING) { 475 while (sock->state != SS_DISCONNECTING) {
331 buf = __skb_dequeue(&sk->sk_receive_queue); 476 buf = __skb_dequeue(&sk->sk_receive_queue);
332 if (buf == NULL) 477 if (buf == NULL)
@@ -337,17 +482,27 @@ static int tipc_release(struct socket *sock)
337 if ((sock->state == SS_CONNECTING) || 482 if ((sock->state == SS_CONNECTING) ||
338 (sock->state == SS_CONNECTED)) { 483 (sock->state == SS_CONNECTED)) {
339 sock->state = SS_DISCONNECTING; 484 sock->state = SS_DISCONNECTING;
340 tipc_port_disconnect(port->ref); 485 tsk->connected = 0;
486 tipc_node_remove_conn(dnode, tsk->ref);
341 } 487 }
342 if (tipc_msg_reverse(buf, &dnode, TIPC_ERR_NO_PORT)) 488 if (tipc_msg_reverse(buf, &dnode, TIPC_ERR_NO_PORT))
343 tipc_link_xmit(buf, dnode, 0); 489 tipc_link_xmit(buf, dnode, 0);
344 } 490 }
345 } 491 }
346 492
347 /* Destroy TIPC port; also disconnects an active connection and 493 tipc_sk_withdraw(tsk, 0, NULL);
348 * sends a 'FIN-' to peer. 494 tipc_sk_ref_discard(tsk->ref);
349 */ 495 k_cancel_timer(&tsk->timer);
350 tipc_port_destroy(port); 496 if (tsk->connected) {
497 buf = tipc_msg_create(TIPC_CRITICAL_IMPORTANCE, TIPC_CONN_MSG,
498 SHORT_H_SIZE, 0, dnode, tipc_own_addr,
499 tsk_peer_port(tsk),
500 tsk->ref, TIPC_ERR_NO_PORT);
501 if (buf)
502 tipc_link_xmit(buf, dnode, tsk->ref);
503 tipc_node_remove_conn(dnode, tsk->ref);
504 }
505 k_term_timer(&tsk->timer);
351 506
352 /* Discard any remaining (connection-based) messages in receive queue */ 507 /* Discard any remaining (connection-based) messages in receive queue */
353 __skb_queue_purge(&sk->sk_receive_queue); 508 __skb_queue_purge(&sk->sk_receive_queue);
@@ -355,7 +510,6 @@ static int tipc_release(struct socket *sock)
355 /* Reject any messages that accumulated in backlog queue */ 510 /* Reject any messages that accumulated in backlog queue */
356 sock->state = SS_DISCONNECTING; 511 sock->state = SS_DISCONNECTING;
357 release_sock(sk); 512 release_sock(sk);
358
359 sock_put(sk); 513 sock_put(sk);
360 sock->sk = NULL; 514 sock->sk = NULL;
361 515
@@ -387,7 +541,7 @@ static int tipc_bind(struct socket *sock, struct sockaddr *uaddr,
387 541
388 lock_sock(sk); 542 lock_sock(sk);
389 if (unlikely(!uaddr_len)) { 543 if (unlikely(!uaddr_len)) {
390 res = tipc_withdraw(&tsk->port, 0, NULL); 544 res = tipc_sk_withdraw(tsk, 0, NULL);
391 goto exit; 545 goto exit;
392 } 546 }
393 547
@@ -415,8 +569,8 @@ static int tipc_bind(struct socket *sock, struct sockaddr *uaddr,
415 } 569 }
416 570
417 res = (addr->scope > 0) ? 571 res = (addr->scope > 0) ?
418 tipc_publish(&tsk->port, addr->scope, &addr->addr.nameseq) : 572 tipc_sk_publish(tsk, addr->scope, &addr->addr.nameseq) :
419 tipc_withdraw(&tsk->port, -addr->scope, &addr->addr.nameseq); 573 tipc_sk_withdraw(tsk, -addr->scope, &addr->addr.nameseq);
420exit: 574exit:
421 release_sock(sk); 575 release_sock(sk);
422 return res; 576 return res;
@@ -446,10 +600,10 @@ static int tipc_getname(struct socket *sock, struct sockaddr *uaddr,
446 if ((sock->state != SS_CONNECTED) && 600 if ((sock->state != SS_CONNECTED) &&
447 ((peer != 2) || (sock->state != SS_DISCONNECTING))) 601 ((peer != 2) || (sock->state != SS_DISCONNECTING)))
448 return -ENOTCONN; 602 return -ENOTCONN;
449 addr->addr.id.ref = tipc_port_peerport(&tsk->port); 603 addr->addr.id.ref = tsk_peer_port(tsk);
450 addr->addr.id.node = tipc_port_peernode(&tsk->port); 604 addr->addr.id.node = tsk_peer_node(tsk);
451 } else { 605 } else {
452 addr->addr.id.ref = tsk->port.ref; 606 addr->addr.id.ref = tsk->ref;
453 addr->addr.id.node = tipc_own_addr; 607 addr->addr.id.node = tipc_own_addr;
454 } 608 }
455 609
@@ -518,7 +672,7 @@ static unsigned int tipc_poll(struct file *file, struct socket *sock,
518 break; 672 break;
519 case SS_READY: 673 case SS_READY:
520 case SS_CONNECTED: 674 case SS_CONNECTED:
521 if (!tsk->link_cong && !tipc_sk_conn_cong(tsk)) 675 if (!tsk->link_cong && !tsk_conn_cong(tsk))
522 mask |= POLLOUT; 676 mask |= POLLOUT;
523 /* fall thru' */ 677 /* fall thru' */
524 case SS_CONNECTING: 678 case SS_CONNECTING:
@@ -549,7 +703,7 @@ static int tipc_sendmcast(struct socket *sock, struct tipc_name_seq *seq,
549 struct iovec *iov, size_t dsz, long timeo) 703 struct iovec *iov, size_t dsz, long timeo)
550{ 704{
551 struct sock *sk = sock->sk; 705 struct sock *sk = sock->sk;
552 struct tipc_msg *mhdr = &tipc_sk(sk)->port.phdr; 706 struct tipc_msg *mhdr = &tipc_sk(sk)->phdr;
553 struct sk_buff *buf; 707 struct sk_buff *buf;
554 uint mtu; 708 uint mtu;
555 int rc; 709 int rc;
@@ -579,6 +733,7 @@ new_mtu:
579 goto new_mtu; 733 goto new_mtu;
580 if (rc != -ELINKCONG) 734 if (rc != -ELINKCONG)
581 break; 735 break;
736 tipc_sk(sk)->link_cong = 1;
582 rc = tipc_wait_for_sndmsg(sock, &timeo); 737 rc = tipc_wait_for_sndmsg(sock, &timeo);
583 if (rc) 738 if (rc)
584 kfree_skb_list(buf); 739 kfree_skb_list(buf);
@@ -638,20 +793,19 @@ static int tipc_sk_proto_rcv(struct tipc_sock *tsk, u32 *dnode,
638 struct sk_buff *buf) 793 struct sk_buff *buf)
639{ 794{
640 struct tipc_msg *msg = buf_msg(buf); 795 struct tipc_msg *msg = buf_msg(buf);
641 struct tipc_port *port = &tsk->port;
642 int conn_cong; 796 int conn_cong;
643 797
644 /* Ignore if connection cannot be validated: */ 798 /* Ignore if connection cannot be validated: */
645 if (!port->connected || !tipc_port_peer_msg(port, msg)) 799 if (!tsk_peer_msg(tsk, msg))
646 goto exit; 800 goto exit;
647 801
648 port->probing_state = TIPC_CONN_OK; 802 tsk->probing_state = TIPC_CONN_OK;
649 803
650 if (msg_type(msg) == CONN_ACK) { 804 if (msg_type(msg) == CONN_ACK) {
651 conn_cong = tipc_sk_conn_cong(tsk); 805 conn_cong = tsk_conn_cong(tsk);
652 tsk->sent_unacked -= msg_msgcnt(msg); 806 tsk->sent_unacked -= msg_msgcnt(msg);
653 if (conn_cong) 807 if (conn_cong)
654 tipc_sock_wakeup(tsk); 808 tsk->sk.sk_write_space(&tsk->sk);
655 } else if (msg_type(msg) == CONN_PROBE) { 809 } else if (msg_type(msg) == CONN_PROBE) {
656 if (!tipc_msg_reverse(buf, dnode, TIPC_OK)) 810 if (!tipc_msg_reverse(buf, dnode, TIPC_OK))
657 return TIPC_OK; 811 return TIPC_OK;
@@ -742,8 +896,7 @@ static int tipc_sendmsg(struct kiocb *iocb, struct socket *sock,
742 DECLARE_SOCKADDR(struct sockaddr_tipc *, dest, m->msg_name); 896 DECLARE_SOCKADDR(struct sockaddr_tipc *, dest, m->msg_name);
743 struct sock *sk = sock->sk; 897 struct sock *sk = sock->sk;
744 struct tipc_sock *tsk = tipc_sk(sk); 898 struct tipc_sock *tsk = tipc_sk(sk);
745 struct tipc_port *port = &tsk->port; 899 struct tipc_msg *mhdr = &tsk->phdr;
746 struct tipc_msg *mhdr = &port->phdr;
747 struct iovec *iov = m->msg_iov; 900 struct iovec *iov = m->msg_iov;
748 u32 dnode, dport; 901 u32 dnode, dport;
749 struct sk_buff *buf; 902 struct sk_buff *buf;
@@ -774,13 +927,13 @@ static int tipc_sendmsg(struct kiocb *iocb, struct socket *sock,
774 rc = -EISCONN; 927 rc = -EISCONN;
775 goto exit; 928 goto exit;
776 } 929 }
777 if (tsk->port.published) { 930 if (tsk->published) {
778 rc = -EOPNOTSUPP; 931 rc = -EOPNOTSUPP;
779 goto exit; 932 goto exit;
780 } 933 }
781 if (dest->addrtype == TIPC_ADDR_NAME) { 934 if (dest->addrtype == TIPC_ADDR_NAME) {
782 tsk->port.conn_type = dest->addr.name.name.type; 935 tsk->conn_type = dest->addr.name.name.type;
783 tsk->port.conn_instance = dest->addr.name.name.instance; 936 tsk->conn_instance = dest->addr.name.name.instance;
784 } 937 }
785 } 938 }
786 rc = dest_name_check(dest, m); 939 rc = dest_name_check(dest, m);
@@ -820,13 +973,14 @@ static int tipc_sendmsg(struct kiocb *iocb, struct socket *sock,
820 } 973 }
821 974
822new_mtu: 975new_mtu:
823 mtu = tipc_node_get_mtu(dnode, tsk->port.ref); 976 mtu = tipc_node_get_mtu(dnode, tsk->ref);
824 rc = tipc_msg_build(mhdr, iov, 0, dsz, mtu, &buf); 977 rc = tipc_msg_build(mhdr, iov, 0, dsz, mtu, &buf);
825 if (rc < 0) 978 if (rc < 0)
826 goto exit; 979 goto exit;
827 980
828 do { 981 do {
829 rc = tipc_link_xmit(buf, dnode, tsk->port.ref); 982 TIPC_SKB_CB(buf)->wakeup_pending = tsk->link_cong;
983 rc = tipc_link_xmit(buf, dnode, tsk->ref);
830 if (likely(rc >= 0)) { 984 if (likely(rc >= 0)) {
831 if (sock->state != SS_READY) 985 if (sock->state != SS_READY)
832 sock->state = SS_CONNECTING; 986 sock->state = SS_CONNECTING;
@@ -835,10 +989,9 @@ new_mtu:
835 } 989 }
836 if (rc == -EMSGSIZE) 990 if (rc == -EMSGSIZE)
837 goto new_mtu; 991 goto new_mtu;
838
839 if (rc != -ELINKCONG) 992 if (rc != -ELINKCONG)
840 break; 993 break;
841 994 tsk->link_cong = 1;
842 rc = tipc_wait_for_sndmsg(sock, &timeo); 995 rc = tipc_wait_for_sndmsg(sock, &timeo);
843 if (rc) 996 if (rc)
844 kfree_skb_list(buf); 997 kfree_skb_list(buf);
@@ -873,8 +1026,8 @@ static int tipc_wait_for_sndpkt(struct socket *sock, long *timeo_p)
873 prepare_to_wait(sk_sleep(sk), &wait, TASK_INTERRUPTIBLE); 1026 prepare_to_wait(sk_sleep(sk), &wait, TASK_INTERRUPTIBLE);
874 done = sk_wait_event(sk, timeo_p, 1027 done = sk_wait_event(sk, timeo_p,
875 (!tsk->link_cong && 1028 (!tsk->link_cong &&
876 !tipc_sk_conn_cong(tsk)) || 1029 !tsk_conn_cong(tsk)) ||
877 !tsk->port.connected); 1030 !tsk->connected);
878 finish_wait(sk_sleep(sk), &wait); 1031 finish_wait(sk_sleep(sk), &wait);
879 } while (!done); 1032 } while (!done);
880 return 0; 1033 return 0;
@@ -897,11 +1050,10 @@ static int tipc_send_stream(struct kiocb *iocb, struct socket *sock,
897{ 1050{
898 struct sock *sk = sock->sk; 1051 struct sock *sk = sock->sk;
899 struct tipc_sock *tsk = tipc_sk(sk); 1052 struct tipc_sock *tsk = tipc_sk(sk);
900 struct tipc_port *port = &tsk->port; 1053 struct tipc_msg *mhdr = &tsk->phdr;
901 struct tipc_msg *mhdr = &port->phdr;
902 struct sk_buff *buf; 1054 struct sk_buff *buf;
903 DECLARE_SOCKADDR(struct sockaddr_tipc *, dest, m->msg_name); 1055 DECLARE_SOCKADDR(struct sockaddr_tipc *, dest, m->msg_name);
904 u32 ref = port->ref; 1056 u32 ref = tsk->ref;
905 int rc = -EINVAL; 1057 int rc = -EINVAL;
906 long timeo; 1058 long timeo;
907 u32 dnode; 1059 u32 dnode;
@@ -929,16 +1081,16 @@ static int tipc_send_stream(struct kiocb *iocb, struct socket *sock,
929 } 1081 }
930 1082
931 timeo = sock_sndtimeo(sk, m->msg_flags & MSG_DONTWAIT); 1083 timeo = sock_sndtimeo(sk, m->msg_flags & MSG_DONTWAIT);
932 dnode = tipc_port_peernode(port); 1084 dnode = tsk_peer_node(tsk);
933 1085
934next: 1086next:
935 mtu = port->max_pkt; 1087 mtu = tsk->max_pkt;
936 send = min_t(uint, dsz - sent, TIPC_MAX_USER_MSG_SIZE); 1088 send = min_t(uint, dsz - sent, TIPC_MAX_USER_MSG_SIZE);
937 rc = tipc_msg_build(mhdr, m->msg_iov, sent, send, mtu, &buf); 1089 rc = tipc_msg_build(mhdr, m->msg_iov, sent, send, mtu, &buf);
938 if (unlikely(rc < 0)) 1090 if (unlikely(rc < 0))
939 goto exit; 1091 goto exit;
940 do { 1092 do {
941 if (likely(!tipc_sk_conn_cong(tsk))) { 1093 if (likely(!tsk_conn_cong(tsk))) {
942 rc = tipc_link_xmit(buf, dnode, ref); 1094 rc = tipc_link_xmit(buf, dnode, ref);
943 if (likely(!rc)) { 1095 if (likely(!rc)) {
944 tsk->sent_unacked++; 1096 tsk->sent_unacked++;
@@ -948,11 +1100,12 @@ next:
948 goto next; 1100 goto next;
949 } 1101 }
950 if (rc == -EMSGSIZE) { 1102 if (rc == -EMSGSIZE) {
951 port->max_pkt = tipc_node_get_mtu(dnode, ref); 1103 tsk->max_pkt = tipc_node_get_mtu(dnode, ref);
952 goto next; 1104 goto next;
953 } 1105 }
954 if (rc != -ELINKCONG) 1106 if (rc != -ELINKCONG)
955 break; 1107 break;
1108 tsk->link_cong = 1;
956 } 1109 }
957 rc = tipc_wait_for_sndpkt(sock, &timeo); 1110 rc = tipc_wait_for_sndpkt(sock, &timeo);
958 if (rc) 1111 if (rc)
@@ -984,29 +1137,25 @@ static int tipc_send_packet(struct kiocb *iocb, struct socket *sock,
984 return tipc_send_stream(iocb, sock, m, dsz); 1137 return tipc_send_stream(iocb, sock, m, dsz);
985} 1138}
986 1139
987/** 1140/* tipc_sk_finish_conn - complete the setup of a connection
988 * auto_connect - complete connection setup to a remote port
989 * @tsk: tipc socket structure
990 * @msg: peer's response message
991 *
992 * Returns 0 on success, errno otherwise
993 */ 1141 */
994static int auto_connect(struct tipc_sock *tsk, struct tipc_msg *msg) 1142static void tipc_sk_finish_conn(struct tipc_sock *tsk, u32 peer_port,
1143 u32 peer_node)
995{ 1144{
996 struct tipc_port *port = &tsk->port; 1145 struct tipc_msg *msg = &tsk->phdr;
997 struct socket *sock = tsk->sk.sk_socket; 1146
998 struct tipc_portid peer; 1147 msg_set_destnode(msg, peer_node);
999 1148 msg_set_destport(msg, peer_port);
1000 peer.ref = msg_origport(msg); 1149 msg_set_type(msg, TIPC_CONN_MSG);
1001 peer.node = msg_orignode(msg); 1150 msg_set_lookup_scope(msg, 0);
1002 1151 msg_set_hdr_sz(msg, SHORT_H_SIZE);
1003 __tipc_port_connect(port->ref, port, &peer); 1152
1004 1153 tsk->probing_interval = CONN_PROBING_INTERVAL;
1005 if (msg_importance(msg) > TIPC_CRITICAL_IMPORTANCE) 1154 tsk->probing_state = TIPC_CONN_OK;
1006 return -EINVAL; 1155 tsk->connected = 1;
1007 msg_set_importance(&port->phdr, (u32)msg_importance(msg)); 1156 k_start_timer(&tsk->timer, tsk->probing_interval);
1008 sock->state = SS_CONNECTED; 1157 tipc_node_add_conn(peer_node, tsk->ref, peer_port);
1009 return 0; 1158 tsk->max_pkt = tipc_node_get_mtu(peer_node, tsk->ref);
1010} 1159}
1011 1160
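[Editor's note] tipc_sk_finish_conn() above does its work by rewriting the header template stored in the socket (tsk->phdr): destination node and port are filled in once, the type becomes TIPC_CONN_MSG and the header shrinks to SHORT_H_SIZE, so every later send just stamps out a copy of the template. The sketch below, not part of the patch, shows that template idea in plain C; the field names and the 24-byte value are illustrative only.

/* Illustrative sketch of a per-socket message header template; not kernel code. */
#include <stdio.h>
#include <string.h>

struct msg_hdr {
        unsigned int destnode;
        unsigned int destport;
        unsigned int type;
        unsigned int hdr_sz;
};

struct conn_sock {
        struct msg_hdr phdr;            /* template, like tsk->phdr */
};

enum { CONN_MSG = 1, SHORT_HDR = 24 };  /* 24 is only illustrative */

/* Fill the template once when the connection is established. */
static void finish_conn(struct conn_sock *s, unsigned int node, unsigned int port)
{
        s->phdr.destnode = node;
        s->phdr.destport = port;
        s->phdr.type = CONN_MSG;
        s->phdr.hdr_sz = SHORT_HDR;
}

/* Every send starts from a copy of the template. */
static struct msg_hdr build_send_hdr(const struct conn_sock *s)
{
        struct msg_hdr h;

        memcpy(&h, &s->phdr, sizeof(h));
        return h;
}

int main(void)
{
        struct conn_sock s = { .phdr = { 0, 0, 0, 0 } };
        struct msg_hdr h;

        finish_conn(&s, 0x1001001, 42);
        h = build_send_hdr(&s);
        printf("to <%x:%u>, %u byte header\n", h.destnode, h.destport, h.hdr_sz);
        return 0;
}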
1012/** 1161/**
@@ -1033,17 +1182,17 @@ static void set_orig_addr(struct msghdr *m, struct tipc_msg *msg)
1033} 1182}
1034 1183
1035/** 1184/**
1036 * anc_data_recv - optionally capture ancillary data for received message 1185 * tipc_sk_anc_data_recv - optionally capture ancillary data for received message
1037 * @m: descriptor for message info 1186 * @m: descriptor for message info
1038 * @msg: received message header 1187 * @msg: received message header
1039 * @tport: TIPC port associated with message 1188 * @tsk: TIPC port associated with message
1040 * 1189 *
1041 * Note: Ancillary data is not captured if not requested by receiver. 1190 * Note: Ancillary data is not captured if not requested by receiver.
1042 * 1191 *
1043 * Returns 0 if successful, otherwise errno 1192 * Returns 0 if successful, otherwise errno
1044 */ 1193 */
1045static int anc_data_recv(struct msghdr *m, struct tipc_msg *msg, 1194static int tipc_sk_anc_data_recv(struct msghdr *m, struct tipc_msg *msg,
1046 struct tipc_port *tport) 1195 struct tipc_sock *tsk)
1047{ 1196{
1048 u32 anc_data[3]; 1197 u32 anc_data[3];
1049 u32 err; 1198 u32 err;
@@ -1086,10 +1235,10 @@ static int anc_data_recv(struct msghdr *m, struct tipc_msg *msg,
1086 anc_data[2] = msg_nameupper(msg); 1235 anc_data[2] = msg_nameupper(msg);
1087 break; 1236 break;
1088 case TIPC_CONN_MSG: 1237 case TIPC_CONN_MSG:
1089 has_name = (tport->conn_type != 0); 1238 has_name = (tsk->conn_type != 0);
1090 anc_data[0] = tport->conn_type; 1239 anc_data[0] = tsk->conn_type;
1091 anc_data[1] = tport->conn_instance; 1240 anc_data[1] = tsk->conn_instance;
1092 anc_data[2] = tport->conn_instance; 1241 anc_data[2] = tsk->conn_instance;
1093 break; 1242 break;
1094 default: 1243 default:
1095 has_name = 0; 1244 has_name = 0;
@@ -1103,6 +1252,24 @@ static int anc_data_recv(struct msghdr *m, struct tipc_msg *msg,
1103 return 0; 1252 return 0;
1104} 1253}
1105 1254
1255static void tipc_sk_send_ack(struct tipc_sock *tsk, uint ack)
1256{
1257 struct sk_buff *buf = NULL;
1258 struct tipc_msg *msg;
1259 u32 peer_port = tsk_peer_port(tsk);
1260 u32 dnode = tsk_peer_node(tsk);
1261
1262 if (!tsk->connected)
1263 return;
1264 buf = tipc_msg_create(CONN_MANAGER, CONN_ACK, INT_H_SIZE, 0, dnode,
1265 tipc_own_addr, peer_port, tsk->ref, TIPC_OK);
1266 if (!buf)
1267 return;
1268 msg = buf_msg(buf);
1269 msg_set_msgcnt(msg, ack);
1270 tipc_link_xmit(buf, dnode, msg_link_selector(msg));
1271}
1272
1106static int tipc_wait_for_rcvmsg(struct socket *sock, long *timeop) 1273static int tipc_wait_for_rcvmsg(struct socket *sock, long *timeop)
1107{ 1274{
1108 struct sock *sk = sock->sk; 1275 struct sock *sk = sock->sk;
@@ -1153,7 +1320,6 @@ static int tipc_recvmsg(struct kiocb *iocb, struct socket *sock,
1153{ 1320{
1154 struct sock *sk = sock->sk; 1321 struct sock *sk = sock->sk;
1155 struct tipc_sock *tsk = tipc_sk(sk); 1322 struct tipc_sock *tsk = tipc_sk(sk);
1156 struct tipc_port *port = &tsk->port;
1157 struct sk_buff *buf; 1323 struct sk_buff *buf;
1158 struct tipc_msg *msg; 1324 struct tipc_msg *msg;
1159 long timeo; 1325 long timeo;
@@ -1188,7 +1354,7 @@ restart:
1188 1354
1189 /* Discard an empty non-errored message & try again */ 1355 /* Discard an empty non-errored message & try again */
1190 if ((!sz) && (!err)) { 1356 if ((!sz) && (!err)) {
1191 advance_rx_queue(sk); 1357 tsk_advance_rx_queue(sk);
1192 goto restart; 1358 goto restart;
1193 } 1359 }
1194 1360
@@ -1196,7 +1362,7 @@ restart:
1196 set_orig_addr(m, msg); 1362 set_orig_addr(m, msg);
1197 1363
1198 /* Capture ancillary data (optional) */ 1364 /* Capture ancillary data (optional) */
1199 res = anc_data_recv(m, msg, port); 1365 res = tipc_sk_anc_data_recv(m, msg, tsk);
1200 if (res) 1366 if (res)
1201 goto exit; 1367 goto exit;
1202 1368
@@ -1223,10 +1389,10 @@ restart:
1223 if (likely(!(flags & MSG_PEEK))) { 1389 if (likely(!(flags & MSG_PEEK))) {
1224 if ((sock->state != SS_READY) && 1390 if ((sock->state != SS_READY) &&
1225 (++tsk->rcv_unacked >= TIPC_CONNACK_INTV)) { 1391 (++tsk->rcv_unacked >= TIPC_CONNACK_INTV)) {
1226 tipc_acknowledge(port->ref, tsk->rcv_unacked); 1392 tipc_sk_send_ack(tsk, tsk->rcv_unacked);
1227 tsk->rcv_unacked = 0; 1393 tsk->rcv_unacked = 0;
1228 } 1394 }
1229 advance_rx_queue(sk); 1395 tsk_advance_rx_queue(sk);
1230 } 1396 }
1231exit: 1397exit:
1232 release_sock(sk); 1398 release_sock(sk);
@@ -1250,7 +1416,6 @@ static int tipc_recv_stream(struct kiocb *iocb, struct socket *sock,
1250{ 1416{
1251 struct sock *sk = sock->sk; 1417 struct sock *sk = sock->sk;
1252 struct tipc_sock *tsk = tipc_sk(sk); 1418 struct tipc_sock *tsk = tipc_sk(sk);
1253 struct tipc_port *port = &tsk->port;
1254 struct sk_buff *buf; 1419 struct sk_buff *buf;
1255 struct tipc_msg *msg; 1420 struct tipc_msg *msg;
1256 long timeo; 1421 long timeo;
@@ -1288,14 +1453,14 @@ restart:
1288 1453
1289 /* Discard an empty non-errored message & try again */ 1454 /* Discard an empty non-errored message & try again */
1290 if ((!sz) && (!err)) { 1455 if ((!sz) && (!err)) {
1291 advance_rx_queue(sk); 1456 tsk_advance_rx_queue(sk);
1292 goto restart; 1457 goto restart;
1293 } 1458 }
1294 1459
1295 /* Optionally capture sender's address & ancillary data of first msg */ 1460 /* Optionally capture sender's address & ancillary data of first msg */
1296 if (sz_copied == 0) { 1461 if (sz_copied == 0) {
1297 set_orig_addr(m, msg); 1462 set_orig_addr(m, msg);
1298 res = anc_data_recv(m, msg, port); 1463 res = tipc_sk_anc_data_recv(m, msg, tsk);
1299 if (res) 1464 if (res)
1300 goto exit; 1465 goto exit;
1301 } 1466 }
@@ -1334,10 +1499,10 @@ restart:
1334 /* Consume received message (optional) */ 1499 /* Consume received message (optional) */
1335 if (likely(!(flags & MSG_PEEK))) { 1500 if (likely(!(flags & MSG_PEEK))) {
1336 if (unlikely(++tsk->rcv_unacked >= TIPC_CONNACK_INTV)) { 1501 if (unlikely(++tsk->rcv_unacked >= TIPC_CONNACK_INTV)) {
1337 tipc_acknowledge(port->ref, tsk->rcv_unacked); 1502 tipc_sk_send_ack(tsk, tsk->rcv_unacked);
1338 tsk->rcv_unacked = 0; 1503 tsk->rcv_unacked = 0;
1339 } 1504 }
1340 advance_rx_queue(sk); 1505 tsk_advance_rx_queue(sk);
1341 } 1506 }
1342 1507
1343 /* Loop around if more data is required */ 1508 /* Loop around if more data is required */
@@ -1396,12 +1561,9 @@ static void tipc_data_ready(struct sock *sk)
1396static int filter_connect(struct tipc_sock *tsk, struct sk_buff **buf) 1561static int filter_connect(struct tipc_sock *tsk, struct sk_buff **buf)
1397{ 1562{
1398 struct sock *sk = &tsk->sk; 1563 struct sock *sk = &tsk->sk;
1399 struct tipc_port *port = &tsk->port;
1400 struct socket *sock = sk->sk_socket; 1564 struct socket *sock = sk->sk_socket;
1401 struct tipc_msg *msg = buf_msg(*buf); 1565 struct tipc_msg *msg = buf_msg(*buf);
1402
1403 int retval = -TIPC_ERR_NO_PORT; 1566 int retval = -TIPC_ERR_NO_PORT;
1404 int res;
1405 1567
1406 if (msg_mcast(msg)) 1568 if (msg_mcast(msg))
1407 return retval; 1569 return retval;
@@ -1409,16 +1571,23 @@ static int filter_connect(struct tipc_sock *tsk, struct sk_buff **buf)
1409 switch ((int)sock->state) { 1571 switch ((int)sock->state) {
1410 case SS_CONNECTED: 1572 case SS_CONNECTED:
1411 /* Accept only connection-based messages sent by peer */ 1573 /* Accept only connection-based messages sent by peer */
1412 if (msg_connected(msg) && tipc_port_peer_msg(port, msg)) { 1574 if (tsk_peer_msg(tsk, msg)) {
1413 if (unlikely(msg_errcode(msg))) { 1575 if (unlikely(msg_errcode(msg))) {
1414 sock->state = SS_DISCONNECTING; 1576 sock->state = SS_DISCONNECTING;
1415 __tipc_port_disconnect(port); 1577 tsk->connected = 0;
 1578 /* let timer expire on its own */
1579 tipc_node_remove_conn(tsk_peer_node(tsk),
1580 tsk->ref);
1416 } 1581 }
1417 retval = TIPC_OK; 1582 retval = TIPC_OK;
1418 } 1583 }
1419 break; 1584 break;
1420 case SS_CONNECTING: 1585 case SS_CONNECTING:
1421 /* Accept only ACK or NACK message */ 1586 /* Accept only ACK or NACK message */
1587
1588 if (unlikely(!msg_connected(msg)))
1589 break;
1590
1422 if (unlikely(msg_errcode(msg))) { 1591 if (unlikely(msg_errcode(msg))) {
1423 sock->state = SS_DISCONNECTING; 1592 sock->state = SS_DISCONNECTING;
1424 sk->sk_err = ECONNREFUSED; 1593 sk->sk_err = ECONNREFUSED;
@@ -1426,17 +1595,17 @@ static int filter_connect(struct tipc_sock *tsk, struct sk_buff **buf)
1426 break; 1595 break;
1427 } 1596 }
1428 1597
1429 if (unlikely(!msg_connected(msg))) 1598 if (unlikely(msg_importance(msg) > TIPC_CRITICAL_IMPORTANCE)) {
1430 break;
1431
1432 res = auto_connect(tsk, msg);
1433 if (res) {
1434 sock->state = SS_DISCONNECTING; 1599 sock->state = SS_DISCONNECTING;
1435 sk->sk_err = -res; 1600 sk->sk_err = EINVAL;
1436 retval = TIPC_OK; 1601 retval = TIPC_OK;
1437 break; 1602 break;
1438 } 1603 }
1439 1604
1605 tipc_sk_finish_conn(tsk, msg_origport(msg), msg_orignode(msg));
1606 msg_set_importance(&tsk->phdr, msg_importance(msg));
1607 sock->state = SS_CONNECTED;
1608
1440 /* If an incoming message is an 'ACK-', it should be 1609 /* If an incoming message is an 'ACK-', it should be
1441 * discarded here because it doesn't contain useful 1610 * discarded here because it doesn't contain useful
1442 * data. In addition, we should try to wake up 1611 * data. In addition, we should try to wake up
@@ -1518,6 +1687,13 @@ static int filter_rcv(struct sock *sk, struct sk_buff *buf)
1518 if (unlikely(msg_user(msg) == CONN_MANAGER)) 1687 if (unlikely(msg_user(msg) == CONN_MANAGER))
1519 return tipc_sk_proto_rcv(tsk, &onode, buf); 1688 return tipc_sk_proto_rcv(tsk, &onode, buf);
1520 1689
1690 if (unlikely(msg_user(msg) == SOCK_WAKEUP)) {
1691 kfree_skb(buf);
1692 tsk->link_cong = 0;
1693 sk->sk_write_space(sk);
1694 return TIPC_OK;
1695 }
1696
1521 /* Reject message if it is wrong sort of message for socket */ 1697 /* Reject message if it is wrong sort of message for socket */
1522 if (msg_type(msg) > TIPC_DIRECT_MSG) 1698 if (msg_type(msg) > TIPC_DIRECT_MSG)
1523 return -TIPC_ERR_NO_PORT; 1699 return -TIPC_ERR_NO_PORT;
@@ -1585,7 +1761,6 @@ static int tipc_backlog_rcv(struct sock *sk, struct sk_buff *buf)
1585int tipc_sk_rcv(struct sk_buff *buf) 1761int tipc_sk_rcv(struct sk_buff *buf)
1586{ 1762{
1587 struct tipc_sock *tsk; 1763 struct tipc_sock *tsk;
1588 struct tipc_port *port;
1589 struct sock *sk; 1764 struct sock *sk;
1590 u32 dport = msg_destport(buf_msg(buf)); 1765 u32 dport = msg_destport(buf_msg(buf));
1591 int rc = TIPC_OK; 1766 int rc = TIPC_OK;
@@ -1593,13 +1768,11 @@ int tipc_sk_rcv(struct sk_buff *buf)
1593 u32 dnode; 1768 u32 dnode;
1594 1769
1595 /* Validate destination and message */ 1770 /* Validate destination and message */
1596 port = tipc_port_lock(dport); 1771 tsk = tipc_sk_get(dport);
1597 if (unlikely(!port)) { 1772 if (unlikely(!tsk)) {
1598 rc = tipc_msg_eval(buf, &dnode); 1773 rc = tipc_msg_eval(buf, &dnode);
1599 goto exit; 1774 goto exit;
1600 } 1775 }
1601
1602 tsk = tipc_port_to_sock(port);
1603 sk = &tsk->sk; 1776 sk = &tsk->sk;
1604 1777
1605 /* Queue message */ 1778 /* Queue message */
@@ -1615,8 +1788,7 @@ int tipc_sk_rcv(struct sk_buff *buf)
1615 rc = -TIPC_ERR_OVERLOAD; 1788 rc = -TIPC_ERR_OVERLOAD;
1616 } 1789 }
1617 bh_unlock_sock(sk); 1790 bh_unlock_sock(sk);
1618 tipc_port_unlock(port); 1791 tipc_sk_put(tsk);
1619
1620 if (likely(!rc)) 1792 if (likely(!rc))
1621 return 0; 1793 return 0;
1622exit: 1794exit:
@@ -1803,10 +1975,8 @@ static int tipc_accept(struct socket *sock, struct socket *new_sock, int flags)
1803{ 1975{
1804 struct sock *new_sk, *sk = sock->sk; 1976 struct sock *new_sk, *sk = sock->sk;
1805 struct sk_buff *buf; 1977 struct sk_buff *buf;
1806 struct tipc_port *new_port; 1978 struct tipc_sock *new_tsock;
1807 struct tipc_msg *msg; 1979 struct tipc_msg *msg;
1808 struct tipc_portid peer;
1809 u32 new_ref;
1810 long timeo; 1980 long timeo;
1811 int res; 1981 int res;
1812 1982
@@ -1828,8 +1998,7 @@ static int tipc_accept(struct socket *sock, struct socket *new_sock, int flags)
1828 goto exit; 1998 goto exit;
1829 1999
1830 new_sk = new_sock->sk; 2000 new_sk = new_sock->sk;
1831 new_port = &tipc_sk(new_sk)->port; 2001 new_tsock = tipc_sk(new_sk);
1832 new_ref = new_port->ref;
1833 msg = buf_msg(buf); 2002 msg = buf_msg(buf);
1834 2003
1835 /* we lock on new_sk; but lockdep sees the lock on sk */ 2004 /* we lock on new_sk; but lockdep sees the lock on sk */
@@ -1839,18 +2008,16 @@ static int tipc_accept(struct socket *sock, struct socket *new_sock, int flags)
1839 * Reject any stray messages received by new socket 2008 * Reject any stray messages received by new socket
1840 * before the socket lock was taken (very, very unlikely) 2009 * before the socket lock was taken (very, very unlikely)
1841 */ 2010 */
1842 reject_rx_queue(new_sk); 2011 tsk_rej_rx_queue(new_sk);
1843 2012
1844 /* Connect new socket to it's peer */ 2013 /* Connect new socket to it's peer */
1845 peer.ref = msg_origport(msg); 2014 tipc_sk_finish_conn(new_tsock, msg_origport(msg), msg_orignode(msg));
1846 peer.node = msg_orignode(msg);
1847 tipc_port_connect(new_ref, &peer);
1848 new_sock->state = SS_CONNECTED; 2015 new_sock->state = SS_CONNECTED;
1849 2016
1850 tipc_port_set_importance(new_port, msg_importance(msg)); 2017 tsk_set_importance(new_tsock, msg_importance(msg));
1851 if (msg_named(msg)) { 2018 if (msg_named(msg)) {
1852 new_port->conn_type = msg_nametype(msg); 2019 new_tsock->conn_type = msg_nametype(msg);
1853 new_port->conn_instance = msg_nameinst(msg); 2020 new_tsock->conn_instance = msg_nameinst(msg);
1854 } 2021 }
1855 2022
1856 /* 2023 /*
@@ -1860,7 +2027,7 @@ static int tipc_accept(struct socket *sock, struct socket *new_sock, int flags)
1860 if (!msg_data_sz(msg)) { 2027 if (!msg_data_sz(msg)) {
1861 struct msghdr m = {NULL,}; 2028 struct msghdr m = {NULL,};
1862 2029
1863 advance_rx_queue(sk); 2030 tsk_advance_rx_queue(sk);
1864 tipc_send_packet(NULL, new_sock, &m, 0); 2031 tipc_send_packet(NULL, new_sock, &m, 0);
1865 } else { 2032 } else {
1866 __skb_dequeue(&sk->sk_receive_queue); 2033 __skb_dequeue(&sk->sk_receive_queue);
@@ -1886,9 +2053,8 @@ static int tipc_shutdown(struct socket *sock, int how)
1886{ 2053{
1887 struct sock *sk = sock->sk; 2054 struct sock *sk = sock->sk;
1888 struct tipc_sock *tsk = tipc_sk(sk); 2055 struct tipc_sock *tsk = tipc_sk(sk);
1889 struct tipc_port *port = &tsk->port;
1890 struct sk_buff *buf; 2056 struct sk_buff *buf;
1891 u32 peer; 2057 u32 dnode;
1892 int res; 2058 int res;
1893 2059
1894 if (how != SHUT_RDWR) 2060 if (how != SHUT_RDWR)
@@ -1908,15 +2074,21 @@ restart:
1908 kfree_skb(buf); 2074 kfree_skb(buf);
1909 goto restart; 2075 goto restart;
1910 } 2076 }
1911 tipc_port_disconnect(port->ref); 2077 if (tipc_msg_reverse(buf, &dnode, TIPC_CONN_SHUTDOWN))
1912 if (tipc_msg_reverse(buf, &peer, TIPC_CONN_SHUTDOWN)) 2078 tipc_link_xmit(buf, dnode, tsk->ref);
1913 tipc_link_xmit(buf, peer, 0); 2079 tipc_node_remove_conn(dnode, tsk->ref);
1914 } else { 2080 } else {
1915 tipc_port_shutdown(port->ref); 2081 dnode = tsk_peer_node(tsk);
2082 buf = tipc_msg_create(TIPC_CRITICAL_IMPORTANCE,
2083 TIPC_CONN_MSG, SHORT_H_SIZE,
2084 0, dnode, tipc_own_addr,
2085 tsk_peer_port(tsk),
2086 tsk->ref, TIPC_CONN_SHUTDOWN);
2087 tipc_link_xmit(buf, dnode, tsk->ref);
1916 } 2088 }
1917 2089 tsk->connected = 0;
1918 sock->state = SS_DISCONNECTING; 2090 sock->state = SS_DISCONNECTING;
1919 2091 tipc_node_remove_conn(dnode, tsk->ref);
1920 /* fall through */ 2092 /* fall through */
1921 2093
1922 case SS_DISCONNECTING: 2094 case SS_DISCONNECTING:
@@ -1937,6 +2109,432 @@ restart:
1937 return res; 2109 return res;
1938} 2110}
1939 2111
2112static void tipc_sk_timeout(unsigned long ref)
2113{
2114 struct tipc_sock *tsk;
2115 struct sock *sk;
2116 struct sk_buff *buf = NULL;
2117 u32 peer_port, peer_node;
2118
2119 tsk = tipc_sk_get(ref);
2120 if (!tsk)
2121 return;
2122
2123 sk = &tsk->sk;
2124 bh_lock_sock(sk);
2125 if (!tsk->connected) {
2126 bh_unlock_sock(sk);
2127 goto exit;
2128 }
2129 peer_port = tsk_peer_port(tsk);
2130 peer_node = tsk_peer_node(tsk);
2131
2132 if (tsk->probing_state == TIPC_CONN_PROBING) {
2133 /* Previous probe not answered -> self abort */
2134 buf = tipc_msg_create(TIPC_CRITICAL_IMPORTANCE, TIPC_CONN_MSG,
2135 SHORT_H_SIZE, 0, tipc_own_addr,
2136 peer_node, ref, peer_port,
2137 TIPC_ERR_NO_PORT);
2138 } else {
2139 buf = tipc_msg_create(CONN_MANAGER, CONN_PROBE, INT_H_SIZE,
2140 0, peer_node, tipc_own_addr,
2141 peer_port, ref, TIPC_OK);
2142 tsk->probing_state = TIPC_CONN_PROBING;
2143 k_start_timer(&tsk->timer, tsk->probing_interval);
2144 }
2145 bh_unlock_sock(sk);
2146 if (buf)
2147 tipc_link_xmit(buf, peer_node, ref);
2148exit:
2149 tipc_sk_put(tsk);
2150}
2151
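[Editor's note] The timer handler above implements connection supervision as a two-state machine: if the previous probe is still unanswered (probing_state == TIPC_CONN_PROBING), the socket aborts the connection by sending itself a TIPC_ERR_NO_PORT message; otherwise it sends a CONN_PROBE, marks the state as probing and re-arms the timer with probing_interval. A minimal userspace sketch of just that state machine follows; probe_answered() is a hypothetical stand-in for the code elsewhere in this patch that resets the state when peer traffic arrives.

/* Illustrative model of the probing state machine in tipc_sk_timeout(); not kernel code. */
#include <stdbool.h>
#include <stdio.h>

enum probing_state { CONN_OK, CONN_PROBING };

struct conn {
        enum probing_state probing_state;
        bool connected;
};

/* Hypothetical hook: any valid message from the peer counts as an answer. */
static void probe_answered(struct conn *c)
{
        c->probing_state = CONN_OK;
}

/* Called on every expiry of the probing timer. */
static void on_probe_timer(struct conn *c)
{
        if (!c->connected)
                return;
        if (c->probing_state == CONN_PROBING) {
                c->connected = false;           /* previous probe unanswered: self abort */
                printf("connection aborted\n");
                return;
        }
        c->probing_state = CONN_PROBING;        /* send CONN_PROBE (omitted), re-arm timer */
        printf("probe sent, timer re-armed\n");
}

int main(void)
{
        struct conn c = { CONN_OK, true };

        on_probe_timer(&c);             /* probe sent */
        probe_answered(&c);             /* peer answered in time */
        on_probe_timer(&c);             /* probe sent again */
        on_probe_timer(&c);             /* still unanswered -> abort */
        return 0;
}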
2152static int tipc_sk_publish(struct tipc_sock *tsk, uint scope,
2153 struct tipc_name_seq const *seq)
2154{
2155 struct publication *publ;
2156 u32 key;
2157
2158 if (tsk->connected)
2159 return -EINVAL;
2160 key = tsk->ref + tsk->pub_count + 1;
2161 if (key == tsk->ref)
2162 return -EADDRINUSE;
2163
2164 publ = tipc_nametbl_publish(seq->type, seq->lower, seq->upper,
2165 scope, tsk->ref, key);
2166 if (unlikely(!publ))
2167 return -EINVAL;
2168
2169 list_add(&publ->pport_list, &tsk->publications);
2170 tsk->pub_count++;
2171 tsk->published = 1;
2172 return 0;
2173}
2174
2175static int tipc_sk_withdraw(struct tipc_sock *tsk, uint scope,
2176 struct tipc_name_seq const *seq)
2177{
2178 struct publication *publ;
2179 struct publication *safe;
2180 int rc = -EINVAL;
2181
2182 list_for_each_entry_safe(publ, safe, &tsk->publications, pport_list) {
2183 if (seq) {
2184 if (publ->scope != scope)
2185 continue;
2186 if (publ->type != seq->type)
2187 continue;
2188 if (publ->lower != seq->lower)
2189 continue;
2190 if (publ->upper != seq->upper)
2191 break;
2192 tipc_nametbl_withdraw(publ->type, publ->lower,
2193 publ->ref, publ->key);
2194 rc = 0;
2195 break;
2196 }
2197 tipc_nametbl_withdraw(publ->type, publ->lower,
2198 publ->ref, publ->key);
2199 rc = 0;
2200 }
2201 if (list_empty(&tsk->publications))
2202 tsk->published = 0;
2203 return rc;
2204}
2205
2206static int tipc_sk_show(struct tipc_sock *tsk, char *buf,
2207 int len, int full_id)
2208{
2209 struct publication *publ;
2210 int ret;
2211
2212 if (full_id)
2213 ret = tipc_snprintf(buf, len, "<%u.%u.%u:%u>:",
2214 tipc_zone(tipc_own_addr),
2215 tipc_cluster(tipc_own_addr),
2216 tipc_node(tipc_own_addr), tsk->ref);
2217 else
2218 ret = tipc_snprintf(buf, len, "%-10u:", tsk->ref);
2219
2220 if (tsk->connected) {
2221 u32 dport = tsk_peer_port(tsk);
2222 u32 destnode = tsk_peer_node(tsk);
2223
2224 ret += tipc_snprintf(buf + ret, len - ret,
2225 " connected to <%u.%u.%u:%u>",
2226 tipc_zone(destnode),
2227 tipc_cluster(destnode),
2228 tipc_node(destnode), dport);
2229 if (tsk->conn_type != 0)
2230 ret += tipc_snprintf(buf + ret, len - ret,
2231 " via {%u,%u}", tsk->conn_type,
2232 tsk->conn_instance);
2233 } else if (tsk->published) {
2234 ret += tipc_snprintf(buf + ret, len - ret, " bound to");
2235 list_for_each_entry(publ, &tsk->publications, pport_list) {
2236 if (publ->lower == publ->upper)
2237 ret += tipc_snprintf(buf + ret, len - ret,
2238 " {%u,%u}", publ->type,
2239 publ->lower);
2240 else
2241 ret += tipc_snprintf(buf + ret, len - ret,
2242 " {%u,%u,%u}", publ->type,
2243 publ->lower, publ->upper);
2244 }
2245 }
2246 ret += tipc_snprintf(buf + ret, len - ret, "\n");
2247 return ret;
2248}
2249
2250struct sk_buff *tipc_sk_socks_show(void)
2251{
2252 struct sk_buff *buf;
2253 struct tlv_desc *rep_tlv;
2254 char *pb;
2255 int pb_len;
2256 struct tipc_sock *tsk;
2257 int str_len = 0;
2258 u32 ref = 0;
2259
2260 buf = tipc_cfg_reply_alloc(TLV_SPACE(ULTRA_STRING_MAX_LEN));
2261 if (!buf)
2262 return NULL;
2263 rep_tlv = (struct tlv_desc *)buf->data;
2264 pb = TLV_DATA(rep_tlv);
2265 pb_len = ULTRA_STRING_MAX_LEN;
2266
2267 tsk = tipc_sk_get_next(&ref);
2268 for (; tsk; tsk = tipc_sk_get_next(&ref)) {
2269 lock_sock(&tsk->sk);
2270 str_len += tipc_sk_show(tsk, pb + str_len,
2271 pb_len - str_len, 0);
2272 release_sock(&tsk->sk);
2273 tipc_sk_put(tsk);
2274 }
2275 str_len += 1; /* for "\0" */
2276 skb_put(buf, TLV_SPACE(str_len));
2277 TLV_SET(rep_tlv, TIPC_TLV_ULTRA_STRING, NULL, str_len);
2278
2279 return buf;
2280}
2281
2282/* tipc_sk_reinit: set non-zero address in all existing sockets
2283 * when we go from standalone to network mode.
2284 */
2285void tipc_sk_reinit(void)
2286{
2287 struct tipc_msg *msg;
2288 u32 ref = 0;
2289 struct tipc_sock *tsk = tipc_sk_get_next(&ref);
2290
2291 for (; tsk; tsk = tipc_sk_get_next(&ref)) {
2292 lock_sock(&tsk->sk);
2293 msg = &tsk->phdr;
2294 msg_set_prevnode(msg, tipc_own_addr);
2295 msg_set_orignode(msg, tipc_own_addr);
2296 release_sock(&tsk->sk);
2297 tipc_sk_put(tsk);
2298 }
2299}
2300
2301/**
2302 * struct reference - TIPC socket reference entry
2303 * @tsk: pointer to socket associated with reference entry
2304 * @ref: reference value for socket (combines instance & array index info)
2305 */
2306struct reference {
2307 struct tipc_sock *tsk;
2308 u32 ref;
2309};
2310
2311/**
 2312 * struct ref_table - table of TIPC socket reference entries
2313 * @entries: pointer to array of reference entries
2314 * @capacity: array index of first unusable entry
2315 * @init_point: array index of first uninitialized entry
2316 * @first_free: array index of first unused socket reference entry
2317 * @last_free: array index of last unused socket reference entry
2318 * @index_mask: bitmask for array index portion of reference values
2319 * @start_mask: initial value for instance value portion of reference values
2320 */
2321struct ref_table {
2322 struct reference *entries;
2323 u32 capacity;
2324 u32 init_point;
2325 u32 first_free;
2326 u32 last_free;
2327 u32 index_mask;
2328 u32 start_mask;
2329};
2330
2331/* Socket reference table consists of 2**N entries.
2332 *
2333 * State Socket ptr Reference
2334 * ----- ---------- ---------
2335 * In use non-NULL XXXX|own index
2336 * (XXXX changes each time entry is acquired)
2337 * Free NULL YYYY|next free index
2338 * (YYYY is one more than last used XXXX)
2339 * Uninitialized NULL 0
2340 *
2341 * Entry 0 is not used; this allows index 0 to denote the end of the free list.
2342 *
2343 * Note that a reference value of 0 does not necessarily indicate that an
2344 * entry is uninitialized, since the last entry in the free list could also
2345 * have a reference value of 0 (although this is unlikely).
2346 */
2347
2348static struct ref_table tipc_ref_table;
2349
2350static DEFINE_RWLOCK(ref_table_lock);
2351
2352/**
2353 * tipc_ref_table_init - create reference table for sockets
2354 */
2355int tipc_sk_ref_table_init(u32 req_sz, u32 start)
2356{
2357 struct reference *table;
2358 u32 actual_sz;
2359
2360 /* account for unused entry, then round up size to a power of 2 */
2361
2362 req_sz++;
2363 for (actual_sz = 16; actual_sz < req_sz; actual_sz <<= 1) {
2364 /* do nothing */
2365 };
2366
2367 /* allocate table & mark all entries as uninitialized */
2368 table = vzalloc(actual_sz * sizeof(struct reference));
2369 if (table == NULL)
2370 return -ENOMEM;
2371
2372 tipc_ref_table.entries = table;
2373 tipc_ref_table.capacity = req_sz;
2374 tipc_ref_table.init_point = 1;
2375 tipc_ref_table.first_free = 0;
2376 tipc_ref_table.last_free = 0;
2377 tipc_ref_table.index_mask = actual_sz - 1;
2378 tipc_ref_table.start_mask = start & ~tipc_ref_table.index_mask;
2379
2380 return 0;
2381}
2382
2383/**
 2384 * tipc_sk_ref_table_stop - destroy reference table for sockets
2385 */
2386void tipc_sk_ref_table_stop(void)
2387{
2388 if (!tipc_ref_table.entries)
2389 return;
2390 vfree(tipc_ref_table.entries);
2391 tipc_ref_table.entries = NULL;
2392}
2393
 2394/* tipc_sk_ref_acquire - create reference to a socket
2395 *
 2396 * Register a socket pointer in the reference table.
2397 * Returns a unique reference value that is used from then on to retrieve the
2398 * socket pointer, or to determine if the socket has been deregistered.
2399 */
2400u32 tipc_sk_ref_acquire(struct tipc_sock *tsk)
2401{
2402 u32 index;
2403 u32 index_mask;
2404 u32 next_plus_upper;
2405 u32 ref = 0;
2406 struct reference *entry;
2407
2408 if (unlikely(!tsk)) {
2409 pr_err("Attempt to acquire ref. to non-existent obj\n");
2410 return 0;
2411 }
2412 if (unlikely(!tipc_ref_table.entries)) {
2413 pr_err("Ref. table not found in acquisition attempt\n");
2414 return 0;
2415 }
2416
2417 /* Take a free entry, if available; otherwise initialize a new one */
2418 write_lock_bh(&ref_table_lock);
2419 index = tipc_ref_table.first_free;
2420 entry = &tipc_ref_table.entries[index];
2421
2422 if (likely(index)) {
2423 index = tipc_ref_table.first_free;
2424 entry = &tipc_ref_table.entries[index];
2425 index_mask = tipc_ref_table.index_mask;
2426 next_plus_upper = entry->ref;
2427 tipc_ref_table.first_free = next_plus_upper & index_mask;
2428 ref = (next_plus_upper & ~index_mask) + index;
2429 entry->tsk = tsk;
2430 } else if (tipc_ref_table.init_point < tipc_ref_table.capacity) {
2431 index = tipc_ref_table.init_point++;
2432 entry = &tipc_ref_table.entries[index];
2433 ref = tipc_ref_table.start_mask + index;
2434 }
2435
2436 if (ref) {
2437 entry->ref = ref;
2438 entry->tsk = tsk;
2439 }
2440 write_unlock_bh(&ref_table_lock);
2441 return ref;
2442}
2443
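[Editor's note] The reference table comment above and the index_mask arithmetic in tipc_sk_ref_acquire() amount to one encoding rule: the low bits of a reference select the array slot and the high bits carry an instance count that is bumped every time the slot is recycled, which is why tipc_sk_get() can reject stale references with a simple entry->ref == ref test. The sketch below, not kernel code, shows the same arithmetic on its own; the 16-entry table size is arbitrary.

/* Illustrative sketch of the index/instance split in a socket reference; not kernel code. */
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define TABLE_SIZE 16u                  /* power of two, as required by the table init */
#define INDEX_MASK (TABLE_SIZE - 1)

static uint32_t make_ref(uint32_t instance, uint32_t index)
{
        return (instance & ~INDEX_MASK) + index;
}

static uint32_t ref_index(uint32_t ref)
{
        return ref & INDEX_MASK;
}

/* Recycling a slot bumps the instance part, invalidating old references. */
static uint32_t next_instance(uint32_t old_ref)
{
        return (old_ref & ~INDEX_MASK) + (INDEX_MASK + 1);
}

int main(void)
{
        uint32_t stale, ref = make_ref(0x100, 5);       /* instance 0x100, slot 5 */

        assert(ref_index(ref) == 5);
        stale = ref;
        ref = make_ref(next_instance(ref), 5);          /* slot 5 reused */
        printf("stale %#x no longer matches current %#x\n",
               (unsigned)stale, (unsigned)ref);
        return ref_index(ref) == 5 ? 0 : 1;
}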
 2444/* tipc_sk_ref_discard - invalidate reference to a socket
2445 *
 2446 * Disallow future references to a socket and free up the entry for re-use.
2447 */
2448void tipc_sk_ref_discard(u32 ref)
2449{
2450 struct reference *entry;
2451 u32 index;
2452 u32 index_mask;
2453
2454 if (unlikely(!tipc_ref_table.entries)) {
2455 pr_err("Ref. table not found during discard attempt\n");
2456 return;
2457 }
2458
2459 index_mask = tipc_ref_table.index_mask;
2460 index = ref & index_mask;
2461 entry = &tipc_ref_table.entries[index];
2462
2463 write_lock_bh(&ref_table_lock);
2464
2465 if (unlikely(!entry->tsk)) {
2466 pr_err("Attempt to discard ref. to non-existent socket\n");
2467 goto exit;
2468 }
2469 if (unlikely(entry->ref != ref)) {
2470 pr_err("Attempt to discard non-existent reference\n");
2471 goto exit;
2472 }
2473
2474 /* Mark entry as unused; increment instance part of entry's
2475 * reference to invalidate any subsequent references
2476 */
2477
2478 entry->tsk = NULL;
2479 entry->ref = (ref & ~index_mask) + (index_mask + 1);
2480
2481 /* Append entry to free entry list */
2482 if (unlikely(tipc_ref_table.first_free == 0))
2483 tipc_ref_table.first_free = index;
2484 else
2485 tipc_ref_table.entries[tipc_ref_table.last_free].ref |= index;
2486 tipc_ref_table.last_free = index;
2487exit:
2488 write_unlock_bh(&ref_table_lock);
2489}
2490
2491/* tipc_sk_get - find referenced socket and return pointer to it
2492 */
2493struct tipc_sock *tipc_sk_get(u32 ref)
2494{
2495 struct reference *entry;
2496 struct tipc_sock *tsk;
2497
2498 if (unlikely(!tipc_ref_table.entries))
2499 return NULL;
2500 read_lock_bh(&ref_table_lock);
2501 entry = &tipc_ref_table.entries[ref & tipc_ref_table.index_mask];
2502 tsk = entry->tsk;
2503 if (likely(tsk && (entry->ref == ref)))
2504 sock_hold(&tsk->sk);
2505 else
2506 tsk = NULL;
2507 read_unlock_bh(&ref_table_lock);
2508 return tsk;
2509}
2510
2511/* tipc_sk_get_next - lock & return next socket after referenced one
2512*/
2513struct tipc_sock *tipc_sk_get_next(u32 *ref)
2514{
2515 struct reference *entry;
2516 struct tipc_sock *tsk = NULL;
2517 uint index = *ref & tipc_ref_table.index_mask;
2518
2519 read_lock_bh(&ref_table_lock);
2520 while (++index < tipc_ref_table.capacity) {
2521 entry = &tipc_ref_table.entries[index];
2522 if (!entry->tsk)
2523 continue;
2524 tsk = entry->tsk;
2525 sock_hold(&tsk->sk);
2526 *ref = entry->ref;
2527 break;
2528 }
2529 read_unlock_bh(&ref_table_lock);
2530 return tsk;
2531}
2532
2533static void tipc_sk_put(struct tipc_sock *tsk)
2534{
2535 sock_put(&tsk->sk);
2536}
2537
1940/** 2538/**
1941 * tipc_setsockopt - set socket option 2539 * tipc_setsockopt - set socket option
1942 * @sock: socket structure 2540 * @sock: socket structure
@@ -1955,7 +2553,6 @@ static int tipc_setsockopt(struct socket *sock, int lvl, int opt,
1955{ 2553{
1956 struct sock *sk = sock->sk; 2554 struct sock *sk = sock->sk;
1957 struct tipc_sock *tsk = tipc_sk(sk); 2555 struct tipc_sock *tsk = tipc_sk(sk);
1958 struct tipc_port *port = &tsk->port;
1959 u32 value; 2556 u32 value;
1960 int res; 2557 int res;
1961 2558
@@ -1973,16 +2570,16 @@ static int tipc_setsockopt(struct socket *sock, int lvl, int opt,
1973 2570
1974 switch (opt) { 2571 switch (opt) {
1975 case TIPC_IMPORTANCE: 2572 case TIPC_IMPORTANCE:
1976 res = tipc_port_set_importance(port, value); 2573 res = tsk_set_importance(tsk, value);
1977 break; 2574 break;
1978 case TIPC_SRC_DROPPABLE: 2575 case TIPC_SRC_DROPPABLE:
1979 if (sock->type != SOCK_STREAM) 2576 if (sock->type != SOCK_STREAM)
1980 tipc_port_set_unreliable(port, value); 2577 tsk_set_unreliable(tsk, value);
1981 else 2578 else
1982 res = -ENOPROTOOPT; 2579 res = -ENOPROTOOPT;
1983 break; 2580 break;
1984 case TIPC_DEST_DROPPABLE: 2581 case TIPC_DEST_DROPPABLE:
1985 tipc_port_set_unreturnable(port, value); 2582 tsk_set_unreturnable(tsk, value);
1986 break; 2583 break;
1987 case TIPC_CONN_TIMEOUT: 2584 case TIPC_CONN_TIMEOUT:
1988 tipc_sk(sk)->conn_timeout = value; 2585 tipc_sk(sk)->conn_timeout = value;
@@ -2015,7 +2612,6 @@ static int tipc_getsockopt(struct socket *sock, int lvl, int opt,
2015{ 2612{
2016 struct sock *sk = sock->sk; 2613 struct sock *sk = sock->sk;
2017 struct tipc_sock *tsk = tipc_sk(sk); 2614 struct tipc_sock *tsk = tipc_sk(sk);
2018 struct tipc_port *port = &tsk->port;
2019 int len; 2615 int len;
2020 u32 value; 2616 u32 value;
2021 int res; 2617 int res;
@@ -2032,16 +2628,16 @@ static int tipc_getsockopt(struct socket *sock, int lvl, int opt,
2032 2628
2033 switch (opt) { 2629 switch (opt) {
2034 case TIPC_IMPORTANCE: 2630 case TIPC_IMPORTANCE:
2035 value = tipc_port_importance(port); 2631 value = tsk_importance(tsk);
2036 break; 2632 break;
2037 case TIPC_SRC_DROPPABLE: 2633 case TIPC_SRC_DROPPABLE:
2038 value = tipc_port_unreliable(port); 2634 value = tsk_unreliable(tsk);
2039 break; 2635 break;
2040 case TIPC_DEST_DROPPABLE: 2636 case TIPC_DEST_DROPPABLE:
2041 value = tipc_port_unreturnable(port); 2637 value = tsk_unreturnable(tsk);
2042 break; 2638 break;
2043 case TIPC_CONN_TIMEOUT: 2639 case TIPC_CONN_TIMEOUT:
2044 value = tipc_sk(sk)->conn_timeout; 2640 value = tsk->conn_timeout;
2045 /* no need to set "res", since already 0 at this point */ 2641 /* no need to set "res", since already 0 at this point */
2046 break; 2642 break;
2047 case TIPC_NODE_RECVQ_DEPTH: 2643 case TIPC_NODE_RECVQ_DEPTH:
diff --git a/net/tipc/socket.h b/net/tipc/socket.h
index 43b75b3ceced..baa43d03901e 100644
--- a/net/tipc/socket.h
+++ b/net/tipc/socket.h
@@ -35,56 +35,17 @@
35#ifndef _TIPC_SOCK_H 35#ifndef _TIPC_SOCK_H
36#define _TIPC_SOCK_H 36#define _TIPC_SOCK_H
37 37
38#include "port.h"
39#include <net/sock.h> 38#include <net/sock.h>
40 39
41#define TIPC_CONN_OK 0 40#define TIPC_CONNACK_INTV 256
42#define TIPC_CONN_PROBING 1 41#define TIPC_FLOWCTRL_WIN (TIPC_CONNACK_INTV * 2)
43 42#define TIPC_CONN_OVERLOAD_LIMIT ((TIPC_FLOWCTRL_WIN * 2 + 1) * \
44/** 43 SKB_TRUESIZE(TIPC_MAX_USER_MSG_SIZE))
45 * struct tipc_sock - TIPC socket structure
46 * @sk: socket - interacts with 'port' and with user via the socket API
47 * @port: port - interacts with 'sk' and with the rest of the TIPC stack
48 * @peer_name: the peer of the connection, if any
49 * @conn_timeout: the time we can wait for an unresponded setup request
50 * @dupl_rcvcnt: number of bytes counted twice, in both backlog and rcv queue
51 * @link_cong: non-zero if owner must sleep because of link congestion
52 * @sent_unacked: # messages sent by socket, and not yet acked by peer
53 * @rcv_unacked: # messages read by user, but not yet acked back to peer
54 */
55
56struct tipc_sock {
57 struct sock sk;
58 struct tipc_port port;
59 unsigned int conn_timeout;
60 atomic_t dupl_rcvcnt;
61 int link_cong;
62 uint sent_unacked;
63 uint rcv_unacked;
64};
65
66static inline struct tipc_sock *tipc_sk(const struct sock *sk)
67{
68 return container_of(sk, struct tipc_sock, sk);
69}
70
71static inline struct tipc_sock *tipc_port_to_sock(const struct tipc_port *port)
72{
73 return container_of(port, struct tipc_sock, port);
74}
75
76static inline void tipc_sock_wakeup(struct tipc_sock *tsk)
77{
78 tsk->sk.sk_write_space(&tsk->sk);
79}
80
81static inline int tipc_sk_conn_cong(struct tipc_sock *tsk)
82{
83 return tsk->sent_unacked >= TIPC_FLOWCTRL_WIN;
84}
85
86int tipc_sk_rcv(struct sk_buff *buf); 44int tipc_sk_rcv(struct sk_buff *buf);
87 45struct sk_buff *tipc_sk_socks_show(void);
88void tipc_sk_mcast_rcv(struct sk_buff *buf); 46void tipc_sk_mcast_rcv(struct sk_buff *buf);
47void tipc_sk_reinit(void);
48int tipc_sk_ref_table_init(u32 requested_size, u32 start);
49void tipc_sk_ref_table_stop(void);
89 50
90#endif 51#endif
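[Editor's note] The two defines above carry the connection-level flow control that socket.c now implements directly: the receive paths ack every TIPC_CONNACK_INTV consumed messages through tipc_sk_send_ack(), and the sender-side congestion test (tsk_conn_cong(), formerly tipc_sk_conn_cong() in this header) blocks once sent_unacked reaches TIPC_FLOWCTRL_WIN, i.e. two ack intervals. The userspace sketch below models only those counters; the truesize-based overload limit is left out.

/* Illustrative model of the CONNACK_INTV / FLOWCTRL_WIN counters; not kernel code. */
#include <stdbool.h>
#include <stdio.h>

#define CONNACK_INTV    256
#define FLOWCTRL_WIN    (CONNACK_INTV * 2)

struct flow {
        unsigned int sent_unacked;      /* sender side */
        unsigned int rcv_unacked;       /* receiver side */
};

static bool conn_congested(const struct flow *f)
{
        return f->sent_unacked >= FLOWCTRL_WIN;
}

/* Receiver consumes one message; returns a non-zero ack count when one is due. */
static unsigned int consume_one(struct flow *f)
{
        unsigned int ack;

        if (++f->rcv_unacked < CONNACK_INTV)
                return 0;
        ack = f->rcv_unacked;
        f->rcv_unacked = 0;
        return ack;
}

int main(void)
{
        struct flow f = { 0, 0 };
        unsigned int ack;
        int i;

        for (i = 0; i < 600; i++) {
                if (conn_congested(&f))
                        break;                  /* sender would sleep here */
                f.sent_unacked++;
                ack = consume_one(&f);
                if (ack)
                        f.sent_unacked -= ack;  /* CONN_ACK received */
        }
        printf("unacked at exit: %u\n", f.sent_unacked);
        return 0;
}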
diff --git a/net/tipc/subscr.c b/net/tipc/subscr.c
index 642437231ad5..31b5cb232a43 100644
--- a/net/tipc/subscr.c
+++ b/net/tipc/subscr.c
@@ -36,7 +36,6 @@
36 36
37#include "core.h" 37#include "core.h"
38#include "name_table.h" 38#include "name_table.h"
39#include "port.h"
40#include "subscr.h" 39#include "subscr.h"
41 40
42/** 41/**
diff --git a/net/tipc/sysctl.c b/net/tipc/sysctl.c
index f3fef93325a8..1a779b1e8510 100644
--- a/net/tipc/sysctl.c
+++ b/net/tipc/sysctl.c
@@ -47,6 +47,13 @@ static struct ctl_table tipc_table[] = {
47 .mode = 0644, 47 .mode = 0644,
48 .proc_handler = proc_dointvec, 48 .proc_handler = proc_dointvec,
49 }, 49 },
50 {
51 .procname = "named_timeout",
52 .data = &sysctl_tipc_named_timeout,
53 .maxlen = sizeof(sysctl_tipc_named_timeout),
54 .mode = 0644,
55 .proc_handler = proc_dointvec,
56 },
50 {} 57 {}
51}; 58};
52 59
diff --git a/net/unix/garbage.c b/net/unix/garbage.c
index 9bc73f87f64a..99f7012b23b9 100644
--- a/net/unix/garbage.c
+++ b/net/unix/garbage.c
@@ -258,7 +258,7 @@ static void inc_inflight_move_tail(struct unix_sock *u)
258 list_move_tail(&u->link, &gc_candidates); 258 list_move_tail(&u->link, &gc_candidates);
259} 259}
260 260
261static bool gc_in_progress = false; 261static bool gc_in_progress;
262#define UNIX_INFLIGHT_TRIGGER_GC 16000 262#define UNIX_INFLIGHT_TRIGGER_GC 16000
263 263
264void wait_for_unix_gc(void) 264void wait_for_unix_gc(void)
diff --git a/net/wimax/id-table.c b/net/wimax/id-table.c
index 72273abfcb16..a21508d11036 100644
--- a/net/wimax/id-table.c
+++ b/net/wimax/id-table.c
@@ -137,7 +137,7 @@ void wimax_id_table_release(void)
137#endif 137#endif
138 spin_lock(&wimax_id_table_lock); 138 spin_lock(&wimax_id_table_lock);
139 list_for_each_entry(wimax_dev, &wimax_id_table, id_table_node) { 139 list_for_each_entry(wimax_dev, &wimax_id_table, id_table_node) {
140 printk(KERN_ERR "BUG: %s wimax_dev %p ifindex %d not cleared\n", 140 pr_err("BUG: %s wimax_dev %p ifindex %d not cleared\n",
141 __func__, wimax_dev, wimax_dev->net_dev->ifindex); 141 __func__, wimax_dev, wimax_dev->net_dev->ifindex);
142 WARN_ON(1); 142 WARN_ON(1);
143 } 143 }
diff --git a/net/wimax/op-msg.c b/net/wimax/op-msg.c
index c278b3356f75..54aa146930bd 100644
--- a/net/wimax/op-msg.c
+++ b/net/wimax/op-msg.c
@@ -189,7 +189,7 @@ const void *wimax_msg_data_len(struct sk_buff *msg, size_t *size)
189 nla = nlmsg_find_attr(nlh, sizeof(struct genlmsghdr), 189 nla = nlmsg_find_attr(nlh, sizeof(struct genlmsghdr),
190 WIMAX_GNL_MSG_DATA); 190 WIMAX_GNL_MSG_DATA);
191 if (nla == NULL) { 191 if (nla == NULL) {
192 printk(KERN_ERR "Cannot find attribute WIMAX_GNL_MSG_DATA\n"); 192 pr_err("Cannot find attribute WIMAX_GNL_MSG_DATA\n");
193 return NULL; 193 return NULL;
194 } 194 }
195 *size = nla_len(nla); 195 *size = nla_len(nla);
@@ -211,7 +211,7 @@ const void *wimax_msg_data(struct sk_buff *msg)
211 nla = nlmsg_find_attr(nlh, sizeof(struct genlmsghdr), 211 nla = nlmsg_find_attr(nlh, sizeof(struct genlmsghdr),
212 WIMAX_GNL_MSG_DATA); 212 WIMAX_GNL_MSG_DATA);
213 if (nla == NULL) { 213 if (nla == NULL) {
214 printk(KERN_ERR "Cannot find attribute WIMAX_GNL_MSG_DATA\n"); 214 pr_err("Cannot find attribute WIMAX_GNL_MSG_DATA\n");
215 return NULL; 215 return NULL;
216 } 216 }
217 return nla_data(nla); 217 return nla_data(nla);
@@ -232,7 +232,7 @@ ssize_t wimax_msg_len(struct sk_buff *msg)
232 nla = nlmsg_find_attr(nlh, sizeof(struct genlmsghdr), 232 nla = nlmsg_find_attr(nlh, sizeof(struct genlmsghdr),
233 WIMAX_GNL_MSG_DATA); 233 WIMAX_GNL_MSG_DATA);
234 if (nla == NULL) { 234 if (nla == NULL) {
235 printk(KERN_ERR "Cannot find attribute WIMAX_GNL_MSG_DATA\n"); 235 pr_err("Cannot find attribute WIMAX_GNL_MSG_DATA\n");
236 return -EINVAL; 236 return -EINVAL;
237 } 237 }
238 return nla_len(nla); 238 return nla_len(nla);
@@ -343,8 +343,7 @@ int wimax_gnl_doit_msg_from_user(struct sk_buff *skb, struct genl_info *info)
343 d_fnstart(3, NULL, "(skb %p info %p)\n", skb, info); 343 d_fnstart(3, NULL, "(skb %p info %p)\n", skb, info);
344 result = -ENODEV; 344 result = -ENODEV;
345 if (info->attrs[WIMAX_GNL_MSG_IFIDX] == NULL) { 345 if (info->attrs[WIMAX_GNL_MSG_IFIDX] == NULL) {
346 printk(KERN_ERR "WIMAX_GNL_MSG_FROM_USER: can't find IFIDX " 346 pr_err("WIMAX_GNL_MSG_FROM_USER: can't find IFIDX attribute\n");
347 "attribute\n");
348 goto error_no_wimax_dev; 347 goto error_no_wimax_dev;
349 } 348 }
350 ifindex = nla_get_u32(info->attrs[WIMAX_GNL_MSG_IFIDX]); 349 ifindex = nla_get_u32(info->attrs[WIMAX_GNL_MSG_IFIDX]);
diff --git a/net/wimax/op-reset.c b/net/wimax/op-reset.c
index eb4580784d9d..a42079165e1f 100644
--- a/net/wimax/op-reset.c
+++ b/net/wimax/op-reset.c
@@ -107,8 +107,7 @@ int wimax_gnl_doit_reset(struct sk_buff *skb, struct genl_info *info)
107 d_fnstart(3, NULL, "(skb %p info %p)\n", skb, info); 107 d_fnstart(3, NULL, "(skb %p info %p)\n", skb, info);
108 result = -ENODEV; 108 result = -ENODEV;
109 if (info->attrs[WIMAX_GNL_RESET_IFIDX] == NULL) { 109 if (info->attrs[WIMAX_GNL_RESET_IFIDX] == NULL) {
110 printk(KERN_ERR "WIMAX_GNL_OP_RFKILL: can't find IFIDX " 110 pr_err("WIMAX_GNL_OP_RFKILL: can't find IFIDX attribute\n");
111 "attribute\n");
112 goto error_no_wimax_dev; 111 goto error_no_wimax_dev;
113 } 112 }
114 ifindex = nla_get_u32(info->attrs[WIMAX_GNL_RESET_IFIDX]); 113 ifindex = nla_get_u32(info->attrs[WIMAX_GNL_RESET_IFIDX]);
diff --git a/net/wimax/op-rfkill.c b/net/wimax/op-rfkill.c
index 403078d670a9..7d730543f243 100644
--- a/net/wimax/op-rfkill.c
+++ b/net/wimax/op-rfkill.c
@@ -421,8 +421,7 @@ int wimax_gnl_doit_rfkill(struct sk_buff *skb, struct genl_info *info)
421 d_fnstart(3, NULL, "(skb %p info %p)\n", skb, info); 421 d_fnstart(3, NULL, "(skb %p info %p)\n", skb, info);
422 result = -ENODEV; 422 result = -ENODEV;
423 if (info->attrs[WIMAX_GNL_RFKILL_IFIDX] == NULL) { 423 if (info->attrs[WIMAX_GNL_RFKILL_IFIDX] == NULL) {
424 printk(KERN_ERR "WIMAX_GNL_OP_RFKILL: can't find IFIDX " 424 pr_err("WIMAX_GNL_OP_RFKILL: can't find IFIDX attribute\n");
425 "attribute\n");
426 goto error_no_wimax_dev; 425 goto error_no_wimax_dev;
427 } 426 }
428 ifindex = nla_get_u32(info->attrs[WIMAX_GNL_RFKILL_IFIDX]); 427 ifindex = nla_get_u32(info->attrs[WIMAX_GNL_RFKILL_IFIDX]);
diff --git a/net/wimax/op-state-get.c b/net/wimax/op-state-get.c
index 995c08c827b5..e6788d281d0e 100644
--- a/net/wimax/op-state-get.c
+++ b/net/wimax/op-state-get.c
@@ -49,8 +49,7 @@ int wimax_gnl_doit_state_get(struct sk_buff *skb, struct genl_info *info)
49 d_fnstart(3, NULL, "(skb %p info %p)\n", skb, info); 49 d_fnstart(3, NULL, "(skb %p info %p)\n", skb, info);
50 result = -ENODEV; 50 result = -ENODEV;
51 if (info->attrs[WIMAX_GNL_STGET_IFIDX] == NULL) { 51 if (info->attrs[WIMAX_GNL_STGET_IFIDX] == NULL) {
52 printk(KERN_ERR "WIMAX_GNL_OP_STATE_GET: can't find IFIDX " 52 pr_err("WIMAX_GNL_OP_STATE_GET: can't find IFIDX attribute\n");
53 "attribute\n");
54 goto error_no_wimax_dev; 53 goto error_no_wimax_dev;
55 } 54 }
56 ifindex = nla_get_u32(info->attrs[WIMAX_GNL_STGET_IFIDX]); 55 ifindex = nla_get_u32(info->attrs[WIMAX_GNL_STGET_IFIDX]);
diff --git a/net/wimax/stack.c b/net/wimax/stack.c
index ec8b577db135..3f816e2971ee 100644
--- a/net/wimax/stack.c
+++ b/net/wimax/stack.c
@@ -191,8 +191,8 @@ void __check_new_state(enum wimax_st old_state, enum wimax_st new_state,
191 unsigned int allowed_states_bm) 191 unsigned int allowed_states_bm)
192{ 192{
193 if (WARN_ON(((1 << new_state) & allowed_states_bm) == 0)) { 193 if (WARN_ON(((1 << new_state) & allowed_states_bm) == 0)) {
194 printk(KERN_ERR "SW BUG! Forbidden state change %u -> %u\n", 194 pr_err("SW BUG! Forbidden state change %u -> %u\n",
195 old_state, new_state); 195 old_state, new_state);
196 } 196 }
197} 197}
198 198
@@ -602,8 +602,7 @@ int __init wimax_subsys_init(void)
602 wimax_gnl_ops, 602 wimax_gnl_ops,
603 wimax_gnl_mcgrps); 603 wimax_gnl_mcgrps);
604 if (unlikely(result < 0)) { 604 if (unlikely(result < 0)) {
605 printk(KERN_ERR "cannot register generic netlink family: %d\n", 605 pr_err("cannot register generic netlink family: %d\n", result);
606 result);
607 goto error_register_family; 606 goto error_register_family;
608 } 607 }
609 608
diff --git a/net/wimax/wimax-internal.h b/net/wimax/wimax-internal.h
index b445b82020a8..733c4bf8d4b3 100644
--- a/net/wimax/wimax-internal.h
+++ b/net/wimax/wimax-internal.h
@@ -30,6 +30,12 @@
30#define __WIMAX_INTERNAL_H__ 30#define __WIMAX_INTERNAL_H__
31#ifdef __KERNEL__ 31#ifdef __KERNEL__
32 32
33#ifdef pr_fmt
34#undef pr_fmt
35#endif
36
37#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
38
33#include <linux/device.h> 39#include <linux/device.h>
34#include <net/wimax.h> 40#include <net/wimax.h>
35 41
diff --git a/net/wireless/chan.c b/net/wireless/chan.c
index 992b34070bcb..72d81e2154d5 100644
--- a/net/wireless/chan.c
+++ b/net/wireless/chan.c
@@ -4,6 +4,7 @@
4 * any point in time. 4 * any point in time.
5 * 5 *
6 * Copyright 2009 Johannes Berg <johannes@sipsolutions.net> 6 * Copyright 2009 Johannes Berg <johannes@sipsolutions.net>
7 * Copyright 2013-2014 Intel Mobile Communications GmbH
7 */ 8 */
8 9
9#include <linux/export.h> 10#include <linux/export.h>
diff --git a/net/wireless/core.c b/net/wireless/core.c
index afee5e0455ea..f52a4cd7017c 100644
--- a/net/wireless/core.c
+++ b/net/wireless/core.c
@@ -2,6 +2,7 @@
2 * This is the linux wireless configuration interface. 2 * This is the linux wireless configuration interface.
3 * 3 *
4 * Copyright 2006-2010 Johannes Berg <johannes@sipsolutions.net> 4 * Copyright 2006-2010 Johannes Berg <johannes@sipsolutions.net>
5 * Copyright 2013-2014 Intel Mobile Communications GmbH
5 */ 6 */
6 7
7#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt 8#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
@@ -492,12 +493,6 @@ int wiphy_register(struct wiphy *wiphy)
492 int i; 493 int i;
493 u16 ifmodes = wiphy->interface_modes; 494 u16 ifmodes = wiphy->interface_modes;
494 495
495 /*
496 * There are major locking problems in nl80211/mac80211 for CSA,
497 * disable for all drivers until this has been reworked.
498 */
499 wiphy->flags &= ~WIPHY_FLAG_HAS_CHANNEL_SWITCH;
500
501#ifdef CONFIG_PM 496#ifdef CONFIG_PM
502 if (WARN_ON(wiphy->wowlan && 497 if (WARN_ON(wiphy->wowlan &&
503 (wiphy->wowlan->flags & WIPHY_WOWLAN_GTK_REKEY_FAILURE) && 498 (wiphy->wowlan->flags & WIPHY_WOWLAN_GTK_REKEY_FAILURE) &&
@@ -635,6 +630,9 @@ int wiphy_register(struct wiphy *wiphy)
635 if (IS_ERR(rdev->wiphy.debugfsdir)) 630 if (IS_ERR(rdev->wiphy.debugfsdir))
636 rdev->wiphy.debugfsdir = NULL; 631 rdev->wiphy.debugfsdir = NULL;
637 632
633 cfg80211_debugfs_rdev_add(rdev);
634 nl80211_notify_wiphy(rdev, NL80211_CMD_NEW_WIPHY);
635
638 if (wiphy->regulatory_flags & REGULATORY_CUSTOM_REG) { 636 if (wiphy->regulatory_flags & REGULATORY_CUSTOM_REG) {
639 struct regulatory_request request; 637 struct regulatory_request request;
640 638
@@ -646,8 +644,6 @@ int wiphy_register(struct wiphy *wiphy)
646 nl80211_send_reg_change_event(&request); 644 nl80211_send_reg_change_event(&request);
647 } 645 }
648 646
649 cfg80211_debugfs_rdev_add(rdev);
650
651 rdev->wiphy.registered = true; 647 rdev->wiphy.registered = true;
652 rtnl_unlock(); 648 rtnl_unlock();
653 649
@@ -659,8 +655,6 @@ int wiphy_register(struct wiphy *wiphy)
659 return res; 655 return res;
660 } 656 }
661 657
662 nl80211_notify_wiphy(rdev, NL80211_CMD_NEW_WIPHY);
663
664 return 0; 658 return 0;
665} 659}
666EXPORT_SYMBOL(wiphy_register); 660EXPORT_SYMBOL(wiphy_register);
@@ -1012,7 +1006,7 @@ static int cfg80211_netdev_notifier_call(struct notifier_block *nb,
1012 rdev->devlist_generation++; 1006 rdev->devlist_generation++;
1013 cfg80211_mlme_purge_registrations(wdev); 1007 cfg80211_mlme_purge_registrations(wdev);
1014#ifdef CONFIG_CFG80211_WEXT 1008#ifdef CONFIG_CFG80211_WEXT
1015 kfree(wdev->wext.keys); 1009 kzfree(wdev->wext.keys);
1016#endif 1010#endif
1017 } 1011 }
1018 /* 1012 /*
diff --git a/net/wireless/ibss.c b/net/wireless/ibss.c
index 8f345da3ea5f..e24fc585c883 100644
--- a/net/wireless/ibss.c
+++ b/net/wireless/ibss.c
@@ -115,7 +115,7 @@ static int __cfg80211_join_ibss(struct cfg80211_registered_device *rdev,
115 } 115 }
116 116
117 if (WARN_ON(wdev->connect_keys)) 117 if (WARN_ON(wdev->connect_keys))
118 kfree(wdev->connect_keys); 118 kzfree(wdev->connect_keys);
119 wdev->connect_keys = connkeys; 119 wdev->connect_keys = connkeys;
120 120
121 wdev->ibss_fixed = params->channel_fixed; 121 wdev->ibss_fixed = params->channel_fixed;
@@ -161,7 +161,7 @@ static void __cfg80211_clear_ibss(struct net_device *dev, bool nowext)
161 161
162 ASSERT_WDEV_LOCK(wdev); 162 ASSERT_WDEV_LOCK(wdev);
163 163
164 kfree(wdev->connect_keys); 164 kzfree(wdev->connect_keys);
165 wdev->connect_keys = NULL; 165 wdev->connect_keys = NULL;
166 166
167 rdev_set_qos_map(rdev, dev, NULL); 167 rdev_set_qos_map(rdev, dev, NULL);
diff --git a/net/wireless/mlme.c b/net/wireless/mlme.c
index 266766b8d80b..2c52b59e43f3 100644
--- a/net/wireless/mlme.c
+++ b/net/wireless/mlme.c
@@ -19,7 +19,7 @@
19 19
20 20
21void cfg80211_rx_assoc_resp(struct net_device *dev, struct cfg80211_bss *bss, 21void cfg80211_rx_assoc_resp(struct net_device *dev, struct cfg80211_bss *bss,
22 const u8 *buf, size_t len) 22 const u8 *buf, size_t len, int uapsd_queues)
23{ 23{
24 struct wireless_dev *wdev = dev->ieee80211_ptr; 24 struct wireless_dev *wdev = dev->ieee80211_ptr;
25 struct wiphy *wiphy = wdev->wiphy; 25 struct wiphy *wiphy = wdev->wiphy;
@@ -43,7 +43,7 @@ void cfg80211_rx_assoc_resp(struct net_device *dev, struct cfg80211_bss *bss,
43 return; 43 return;
44 } 44 }
45 45
46 nl80211_send_rx_assoc(rdev, dev, buf, len, GFP_KERNEL); 46 nl80211_send_rx_assoc(rdev, dev, buf, len, GFP_KERNEL, uapsd_queues);
47 /* update current_bss etc., consumes the bss reference */ 47 /* update current_bss etc., consumes the bss reference */
48 __cfg80211_connect_result(dev, mgmt->bssid, NULL, 0, ie, len - ieoffs, 48 __cfg80211_connect_result(dev, mgmt->bssid, NULL, 0, ie, len - ieoffs,
49 status_code, 49 status_code,
@@ -605,7 +605,7 @@ int cfg80211_mlme_mgmt_tx(struct cfg80211_registered_device *rdev,
605} 605}
606 606
607bool cfg80211_rx_mgmt(struct wireless_dev *wdev, int freq, int sig_mbm, 607bool cfg80211_rx_mgmt(struct wireless_dev *wdev, int freq, int sig_mbm,
608 const u8 *buf, size_t len, u32 flags, gfp_t gfp) 608 const u8 *buf, size_t len, u32 flags)
609{ 609{
610 struct wiphy *wiphy = wdev->wiphy; 610 struct wiphy *wiphy = wdev->wiphy;
611 struct cfg80211_registered_device *rdev = wiphy_to_rdev(wiphy); 611 struct cfg80211_registered_device *rdev = wiphy_to_rdev(wiphy);
@@ -648,7 +648,7 @@ bool cfg80211_rx_mgmt(struct wireless_dev *wdev, int freq, int sig_mbm,
648 /* Indicate the received Action frame to user space */ 648 /* Indicate the received Action frame to user space */
649 if (nl80211_send_mgmt(rdev, wdev, reg->nlportid, 649 if (nl80211_send_mgmt(rdev, wdev, reg->nlportid,
650 freq, sig_mbm, 650 freq, sig_mbm,
651 buf, len, flags, gfp)) 651 buf, len, flags, GFP_ATOMIC))
652 continue; 652 continue;
653 653
654 result = true; 654 result = true;
diff --git a/net/wireless/nl80211.c b/net/wireless/nl80211.c
index 7257164af91b..cb9f5a44ffad 100644
--- a/net/wireless/nl80211.c
+++ b/net/wireless/nl80211.c
@@ -2,6 +2,7 @@
2 * This is the new netlink-based wireless configuration interface. 2 * This is the new netlink-based wireless configuration interface.
3 * 3 *
4 * Copyright 2006-2010 Johannes Berg <johannes@sipsolutions.net> 4 * Copyright 2006-2010 Johannes Berg <johannes@sipsolutions.net>
5 * Copyright 2013-2014 Intel Mobile Communications GmbH
5 */ 6 */
6 7
7#include <linux/if.h> 8#include <linux/if.h>
@@ -225,6 +226,7 @@ static const struct nla_policy nl80211_policy[NL80211_ATTR_MAX+1] = {
225 [NL80211_ATTR_WIPHY_FRAG_THRESHOLD] = { .type = NLA_U32 }, 226 [NL80211_ATTR_WIPHY_FRAG_THRESHOLD] = { .type = NLA_U32 },
226 [NL80211_ATTR_WIPHY_RTS_THRESHOLD] = { .type = NLA_U32 }, 227 [NL80211_ATTR_WIPHY_RTS_THRESHOLD] = { .type = NLA_U32 },
227 [NL80211_ATTR_WIPHY_COVERAGE_CLASS] = { .type = NLA_U8 }, 228 [NL80211_ATTR_WIPHY_COVERAGE_CLASS] = { .type = NLA_U8 },
229 [NL80211_ATTR_WIPHY_DYN_ACK] = { .type = NLA_FLAG },
228 230
229 [NL80211_ATTR_IFTYPE] = { .type = NLA_U32 }, 231 [NL80211_ATTR_IFTYPE] = { .type = NLA_U32 },
230 [NL80211_ATTR_IFINDEX] = { .type = NLA_U32 }, 232 [NL80211_ATTR_IFINDEX] = { .type = NLA_U32 },
@@ -388,6 +390,11 @@ static const struct nla_policy nl80211_policy[NL80211_ATTR_MAX+1] = {
388 [NL80211_ATTR_TDLS_PEER_CAPABILITY] = { .type = NLA_U32 }, 390 [NL80211_ATTR_TDLS_PEER_CAPABILITY] = { .type = NLA_U32 },
389 [NL80211_ATTR_IFACE_SOCKET_OWNER] = { .type = NLA_FLAG }, 391 [NL80211_ATTR_IFACE_SOCKET_OWNER] = { .type = NLA_FLAG },
390 [NL80211_ATTR_CSA_C_OFFSETS_TX] = { .type = NLA_BINARY }, 392 [NL80211_ATTR_CSA_C_OFFSETS_TX] = { .type = NLA_BINARY },
393 [NL80211_ATTR_USE_RRM] = { .type = NLA_FLAG },
394 [NL80211_ATTR_TSID] = { .type = NLA_U8 },
395 [NL80211_ATTR_USER_PRIO] = { .type = NLA_U8 },
396 [NL80211_ATTR_ADMITTED_TIME] = { .type = NLA_U16 },
397 [NL80211_ATTR_SMPS_MODE] = { .type = NLA_U8 },
391}; 398};
392 399
393/* policy for the key attributes */ 400/* policy for the key attributes */
@@ -1507,6 +1514,9 @@ static int nl80211_send_wiphy(struct cfg80211_registered_device *rdev,
1507 if (rdev->wiphy.flags & WIPHY_FLAG_HAS_CHANNEL_SWITCH) 1514 if (rdev->wiphy.flags & WIPHY_FLAG_HAS_CHANNEL_SWITCH)
1508 CMD(channel_switch, CHANNEL_SWITCH); 1515 CMD(channel_switch, CHANNEL_SWITCH);
1509 CMD(set_qos_map, SET_QOS_MAP); 1516 CMD(set_qos_map, SET_QOS_MAP);
1517 if (rdev->wiphy.flags &
1518 WIPHY_FLAG_SUPPORTS_WMM_ADMISSION)
1519 CMD(add_tx_ts, ADD_TX_TS);
1510 } 1520 }
1511 /* add into the if now */ 1521 /* add into the if now */
1512#undef CMD 1522#undef CMD
@@ -2237,11 +2247,21 @@ static int nl80211_set_wiphy(struct sk_buff *skb, struct genl_info *info)
2237 } 2247 }
2238 2248
2239 if (info->attrs[NL80211_ATTR_WIPHY_COVERAGE_CLASS]) { 2249 if (info->attrs[NL80211_ATTR_WIPHY_COVERAGE_CLASS]) {
2250 if (info->attrs[NL80211_ATTR_WIPHY_DYN_ACK])
2251 return -EINVAL;
2252
2240 coverage_class = nla_get_u8( 2253 coverage_class = nla_get_u8(
2241 info->attrs[NL80211_ATTR_WIPHY_COVERAGE_CLASS]); 2254 info->attrs[NL80211_ATTR_WIPHY_COVERAGE_CLASS]);
2242 changed |= WIPHY_PARAM_COVERAGE_CLASS; 2255 changed |= WIPHY_PARAM_COVERAGE_CLASS;
2243 } 2256 }
2244 2257
2258 if (info->attrs[NL80211_ATTR_WIPHY_DYN_ACK]) {
2259 if (!(rdev->wiphy.features & NL80211_FEATURE_ACKTO_ESTIMATION))
2260 return -EOPNOTSUPP;
2261
2262 changed |= WIPHY_PARAM_DYN_ACK;
2263 }
2264
2245 if (changed) { 2265 if (changed) {
2246 u8 old_retry_short, old_retry_long; 2266 u8 old_retry_short, old_retry_long;
2247 u32 old_frag_threshold, old_rts_threshold; 2267 u32 old_frag_threshold, old_rts_threshold;
@@ -3326,6 +3346,29 @@ static int nl80211_start_ap(struct sk_buff *skb, struct genl_info *info)
3326 return PTR_ERR(params.acl); 3346 return PTR_ERR(params.acl);
3327 } 3347 }
3328 3348
3349 if (info->attrs[NL80211_ATTR_SMPS_MODE]) {
3350 params.smps_mode =
3351 nla_get_u8(info->attrs[NL80211_ATTR_SMPS_MODE]);
3352 switch (params.smps_mode) {
3353 case NL80211_SMPS_OFF:
3354 break;
3355 case NL80211_SMPS_STATIC:
3356 if (!(rdev->wiphy.features &
3357 NL80211_FEATURE_STATIC_SMPS))
3358 return -EINVAL;
3359 break;
3360 case NL80211_SMPS_DYNAMIC:
3361 if (!(rdev->wiphy.features &
3362 NL80211_FEATURE_DYNAMIC_SMPS))
3363 return -EINVAL;
3364 break;
3365 default:
3366 return -EINVAL;
3367 }
3368 } else {
3369 params.smps_mode = NL80211_SMPS_OFF;
3370 }
3371
3329 wdev_lock(wdev); 3372 wdev_lock(wdev);
3330 err = rdev_start_ap(rdev, dev, &params); 3373 err = rdev_start_ap(rdev, dev, &params);
3331 if (!err) { 3374 if (!err) {
@@ -6033,7 +6076,6 @@ static int nl80211_send_bss(struct sk_buff *msg, struct netlink_callback *cb,
6033 const struct cfg80211_bss_ies *ies; 6076 const struct cfg80211_bss_ies *ies;
6034 void *hdr; 6077 void *hdr;
6035 struct nlattr *bss; 6078 struct nlattr *bss;
6036 bool tsf = false;
6037 6079
6038 ASSERT_WDEV_LOCK(wdev); 6080 ASSERT_WDEV_LOCK(wdev);
6039 6081
@@ -6060,18 +6102,27 @@ static int nl80211_send_bss(struct sk_buff *msg, struct netlink_callback *cb,
6060 goto nla_put_failure; 6102 goto nla_put_failure;
6061 6103
6062 rcu_read_lock(); 6104 rcu_read_lock();
6105 /* indicate whether we have probe response data or not */
6106 if (rcu_access_pointer(res->proberesp_ies) &&
6107 nla_put_flag(msg, NL80211_BSS_PRESP_DATA))
6108 goto fail_unlock_rcu;
6109
6110 /* this pointer prefers to be pointed to probe response data
6111 * but is always valid
6112 */
6063 ies = rcu_dereference(res->ies); 6113 ies = rcu_dereference(res->ies);
6064 if (ies) { 6114 if (ies) {
6065 if (nla_put_u64(msg, NL80211_BSS_TSF, ies->tsf)) 6115 if (nla_put_u64(msg, NL80211_BSS_TSF, ies->tsf))
6066 goto fail_unlock_rcu; 6116 goto fail_unlock_rcu;
6067 tsf = true;
6068 if (ies->len && nla_put(msg, NL80211_BSS_INFORMATION_ELEMENTS, 6117 if (ies->len && nla_put(msg, NL80211_BSS_INFORMATION_ELEMENTS,
6069 ies->len, ies->data)) 6118 ies->len, ies->data))
6070 goto fail_unlock_rcu; 6119 goto fail_unlock_rcu;
6071 } 6120 }
6121
6122 /* and this pointer is always (unless driver didn't know) beacon data */
6072 ies = rcu_dereference(res->beacon_ies); 6123 ies = rcu_dereference(res->beacon_ies);
6073 if (ies) { 6124 if (ies && ies->from_beacon) {
6074 if (!tsf && nla_put_u64(msg, NL80211_BSS_TSF, ies->tsf)) 6125 if (nla_put_u64(msg, NL80211_BSS_BEACON_TSF, ies->tsf))
6075 goto fail_unlock_rcu; 6126 goto fail_unlock_rcu;
6076 if (ies->len && nla_put(msg, NL80211_BSS_BEACON_IES, 6127 if (ies->len && nla_put(msg, NL80211_BSS_BEACON_IES,
6077 ies->len, ies->data)) 6128 ies->len, ies->data))
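With this change NL80211_BSS_INFORMATION_ELEMENTS no longer silently mixes beacon and probe response contents: the new NL80211_BSS_PRESP_DATA flag (added in the matching uapi patch, not shown in this section) tells userspace which frame the IEs came from, and beacon IEs plus their TSF are reported separately. A hedged consumer-side sketch, assuming the per-BSS attribute table has already been parsed out of the scan dump:

    #include <stdio.h>
    #include <linux/netlink.h>
    #include <linux/nl80211.h>

    static void show_bss_ies(struct nlattr *bss[NL80211_BSS_MAX + 1])
    {
        int presp = bss[NL80211_BSS_PRESP_DATA] != NULL;

        if (bss[NL80211_BSS_INFORMATION_ELEMENTS])
            printf("IEs from %s\n",
                   presp ? "a probe response"
                         : "a beacon (or unknown frame)");
        if (bss[NL80211_BSS_BEACON_IES])
            printf("beacon IEs reported separately, beacon TSF %s\n",
                   bss[NL80211_BSS_BEACON_TSF] ? "present" : "absent");
    }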
@@ -6575,6 +6626,14 @@ static int nl80211_associate(struct sk_buff *skb, struct genl_info *info)
6575 sizeof(req.vht_capa)); 6626 sizeof(req.vht_capa));
6576 } 6627 }
6577 6628
6629 if (nla_get_flag(info->attrs[NL80211_ATTR_USE_RRM])) {
6630 if (!(rdev->wiphy.features &
6631 NL80211_FEATURE_DS_PARAM_SET_IE_IN_PROBES) ||
6632 !(rdev->wiphy.features & NL80211_FEATURE_QUIET))
6633 return -EINVAL;
6634 req.flags |= ASSOC_REQ_USE_RRM;
6635 }
6636
6578 err = nl80211_crypto_settings(rdev, info, &req.crypto, 1); 6637 err = nl80211_crypto_settings(rdev, info, &req.crypto, 1);
6579 if (!err) { 6638 if (!err) {
6580 wdev_lock(dev->ieee80211_ptr); 6639 wdev_lock(dev->ieee80211_ptr);
@@ -6837,7 +6896,7 @@ static int nl80211_join_ibss(struct sk_buff *skb, struct genl_info *info)
6837 6896
6838 err = cfg80211_join_ibss(rdev, dev, &ibss, connkeys); 6897 err = cfg80211_join_ibss(rdev, dev, &ibss, connkeys);
6839 if (err) 6898 if (err)
6840 kfree(connkeys); 6899 kzfree(connkeys);
6841 return err; 6900 return err;
6842} 6901}
6843 6902
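These kfree() -> kzfree() conversions matter because connkeys holds connect/WEP key material; kzfree() zeroes the allocation before handing it back to the allocator. For reference, the mm/slab_common.c implementation at the time was roughly the following (reproduced from memory, so treat it as a sketch rather than the exact source):

    void kzfree(const void *p)
    {
        size_t ks;
        void *mem = (void *)p;

        if (unlikely(ZERO_OR_NULL_PTR(mem)))
            return;
        ks = ksize(mem);    /* full allocated size, not just the
                             * size originally requested */
        memset(mem, 0, ks);
        kfree(mem);
    }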
@@ -7209,7 +7268,7 @@ static int nl80211_connect(struct sk_buff *skb, struct genl_info *info)
7209 7268
7210 if (info->attrs[NL80211_ATTR_HT_CAPABILITY]) { 7269 if (info->attrs[NL80211_ATTR_HT_CAPABILITY]) {
7211 if (!info->attrs[NL80211_ATTR_HT_CAPABILITY_MASK]) { 7270 if (!info->attrs[NL80211_ATTR_HT_CAPABILITY_MASK]) {
7212 kfree(connkeys); 7271 kzfree(connkeys);
7213 return -EINVAL; 7272 return -EINVAL;
7214 } 7273 }
7215 memcpy(&connect.ht_capa, 7274 memcpy(&connect.ht_capa,
@@ -7227,7 +7286,7 @@ static int nl80211_connect(struct sk_buff *skb, struct genl_info *info)
7227 7286
7228 if (info->attrs[NL80211_ATTR_VHT_CAPABILITY]) { 7287 if (info->attrs[NL80211_ATTR_VHT_CAPABILITY]) {
7229 if (!info->attrs[NL80211_ATTR_VHT_CAPABILITY_MASK]) { 7288 if (!info->attrs[NL80211_ATTR_VHT_CAPABILITY_MASK]) {
7230 kfree(connkeys); 7289 kzfree(connkeys);
7231 return -EINVAL; 7290 return -EINVAL;
7232 } 7291 }
7233 memcpy(&connect.vht_capa, 7292 memcpy(&connect.vht_capa,
@@ -7235,11 +7294,19 @@ static int nl80211_connect(struct sk_buff *skb, struct genl_info *info)
7235 sizeof(connect.vht_capa)); 7294 sizeof(connect.vht_capa));
7236 } 7295 }
7237 7296
7297 if (nla_get_flag(info->attrs[NL80211_ATTR_USE_RRM])) {
7298 if (!(rdev->wiphy.features &
7299 NL80211_FEATURE_DS_PARAM_SET_IE_IN_PROBES) ||
7300 !(rdev->wiphy.features & NL80211_FEATURE_QUIET))
7301 return -EINVAL;
7302 connect.flags |= ASSOC_REQ_USE_RRM;
7303 }
7304
7238 wdev_lock(dev->ieee80211_ptr); 7305 wdev_lock(dev->ieee80211_ptr);
7239 err = cfg80211_connect(rdev, dev, &connect, connkeys, NULL); 7306 err = cfg80211_connect(rdev, dev, &connect, connkeys, NULL);
7240 wdev_unlock(dev->ieee80211_ptr); 7307 wdev_unlock(dev->ieee80211_ptr);
7241 if (err) 7308 if (err)
7242 kfree(connkeys); 7309 kzfree(connkeys);
7243 return err; 7310 return err;
7244} 7311}
7245 7312
@@ -8925,13 +8992,9 @@ static int nl80211_set_rekey_data(struct sk_buff *skb, struct genl_info *info)
8925 if (nla_len(tb[NL80211_REKEY_DATA_KCK]) != NL80211_KCK_LEN) 8992 if (nla_len(tb[NL80211_REKEY_DATA_KCK]) != NL80211_KCK_LEN)
8926 return -ERANGE; 8993 return -ERANGE;
8927 8994
8928 memcpy(rekey_data.kek, nla_data(tb[NL80211_REKEY_DATA_KEK]), 8995 rekey_data.kek = nla_data(tb[NL80211_REKEY_DATA_KEK]);
8929 NL80211_KEK_LEN); 8996 rekey_data.kck = nla_data(tb[NL80211_REKEY_DATA_KCK]);
8930 memcpy(rekey_data.kck, nla_data(tb[NL80211_REKEY_DATA_KCK]), 8997 rekey_data.replay_ctr = nla_data(tb[NL80211_REKEY_DATA_REPLAY_CTR]);
8931 NL80211_KCK_LEN);
8932 memcpy(rekey_data.replay_ctr,
8933 nla_data(tb[NL80211_REKEY_DATA_REPLAY_CTR]),
8934 NL80211_REPLAY_CTR_LEN);
8935 8998
8936 wdev_lock(wdev); 8999 wdev_lock(wdev);
8937 if (!wdev->current_bss) { 9000 if (!wdev->current_bss) {
@@ -9363,6 +9426,93 @@ static int nl80211_set_qos_map(struct sk_buff *skb,
9363 return ret; 9426 return ret;
9364} 9427}
9365 9428
9429static int nl80211_add_tx_ts(struct sk_buff *skb, struct genl_info *info)
9430{
9431 struct cfg80211_registered_device *rdev = info->user_ptr[0];
9432 struct net_device *dev = info->user_ptr[1];
9433 struct wireless_dev *wdev = dev->ieee80211_ptr;
9434 const u8 *peer;
9435 u8 tsid, up;
9436 u16 admitted_time = 0;
9437 int err;
9438
9439 if (!(rdev->wiphy.flags & WIPHY_FLAG_SUPPORTS_WMM_ADMISSION))
9440 return -EOPNOTSUPP;
9441
9442 if (!info->attrs[NL80211_ATTR_TSID] || !info->attrs[NL80211_ATTR_MAC] ||
9443 !info->attrs[NL80211_ATTR_USER_PRIO])
9444 return -EINVAL;
9445
9446 tsid = nla_get_u8(info->attrs[NL80211_ATTR_TSID]);
9447 if (tsid >= IEEE80211_NUM_TIDS)
9448 return -EINVAL;
9449
9450 up = nla_get_u8(info->attrs[NL80211_ATTR_USER_PRIO]);
9451 if (up >= IEEE80211_NUM_UPS)
9452 return -EINVAL;
9453
9454 /* WMM uses TIDs 0-7 even for TSPEC */
9455 if (tsid < IEEE80211_FIRST_TSPEC_TSID) {
9456 if (!(rdev->wiphy.flags & WIPHY_FLAG_SUPPORTS_WMM_ADMISSION))
9457 return -EINVAL;
9458 } else {
9459 /* TODO: handle 802.11 TSPEC/admission control
9460 * need more attributes for that (e.g. BA session requirement)
9461 */
9462 return -EINVAL;
9463 }
9464
9465 peer = nla_data(info->attrs[NL80211_ATTR_MAC]);
9466
9467 if (info->attrs[NL80211_ATTR_ADMITTED_TIME]) {
9468 admitted_time =
9469 nla_get_u16(info->attrs[NL80211_ATTR_ADMITTED_TIME]);
9470 if (!admitted_time)
9471 return -EINVAL;
9472 }
9473
9474 wdev_lock(wdev);
9475 switch (wdev->iftype) {
9476 case NL80211_IFTYPE_STATION:
9477 case NL80211_IFTYPE_P2P_CLIENT:
9478 if (wdev->current_bss)
9479 break;
9480 err = -ENOTCONN;
9481 goto out;
9482 default:
9483 err = -EOPNOTSUPP;
9484 goto out;
9485 }
9486
9487 err = rdev_add_tx_ts(rdev, dev, tsid, peer, up, admitted_time);
9488
9489 out:
9490 wdev_unlock(wdev);
9491 return err;
9492}
9493
9494static int nl80211_del_tx_ts(struct sk_buff *skb, struct genl_info *info)
9495{
9496 struct cfg80211_registered_device *rdev = info->user_ptr[0];
9497 struct net_device *dev = info->user_ptr[1];
9498 struct wireless_dev *wdev = dev->ieee80211_ptr;
9499 const u8 *peer;
9500 u8 tsid;
9501 int err;
9502
9503 if (!info->attrs[NL80211_ATTR_TSID] || !info->attrs[NL80211_ATTR_MAC])
9504 return -EINVAL;
9505
9506 tsid = nla_get_u8(info->attrs[NL80211_ATTR_TSID]);
9507 peer = nla_data(info->attrs[NL80211_ATTR_MAC]);
9508
9509 wdev_lock(wdev);
9510 err = rdev_del_tx_ts(rdev, dev, tsid, peer);
9511 wdev_unlock(wdev);
9512
9513 return err;
9514}
9515
9366#define NL80211_FLAG_NEED_WIPHY 0x01 9516#define NL80211_FLAG_NEED_WIPHY 0x01
9367#define NL80211_FLAG_NEED_NETDEV 0x02 9517#define NL80211_FLAG_NEED_NETDEV 0x02
9368#define NL80211_FLAG_NEED_RTNL 0x04 9518#define NL80211_FLAG_NEED_RTNL 0x04
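From userspace, the new WMM admission API is driven with NL80211_CMD_ADD_TX_TS / NL80211_CMD_DEL_TX_TS plus the TSID, MAC, USER_PRIO and ADMITTED_TIME attributes handled above. A hedged libnl-3 sketch of the add path; the command and attribute values come from the matching uapi header change (not part of this section), error handling is minimal, and the admitted-time units (the spec's 32 us units) are an assumption:

    #include <errno.h>
    #include <netlink/genl/genl.h>
    #include <linux/nl80211.h>

    static int add_tx_ts(struct nl_sock *sk, int nl80211_id, int ifindex,
                         const unsigned char peer[6], unsigned char tsid,
                         unsigned char up, unsigned short admitted_time)
    {
        struct nl_msg *msg = nlmsg_alloc();
        int err;

        if (!msg)
            return -ENOMEM;

        genlmsg_put(msg, NL_AUTO_PORT, NL_AUTO_SEQ, nl80211_id, 0, 0,
                    NL80211_CMD_ADD_TX_TS, 0);
        nla_put_u32(msg, NL80211_ATTR_IFINDEX, ifindex);
        nla_put_u8(msg, NL80211_ATTR_TSID, tsid);       /* 0..7 for WMM */
        nla_put_u8(msg, NL80211_ATTR_USER_PRIO, up);    /* 0..7 */
        nla_put(msg, NL80211_ATTR_MAC, 6, peer);        /* the AP/peer */
        nla_put_u16(msg, NL80211_ATTR_ADMITTED_TIME, admitted_time);

        err = nl_send_auto(sk, msg);
        nlmsg_free(msg);
        return err;
    }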
@@ -9373,6 +9523,7 @@ static int nl80211_set_qos_map(struct sk_buff *skb,
9373/* If a netdev is associated, it must be UP, P2P must be started */ 9523/* If a netdev is associated, it must be UP, P2P must be started */
9374#define NL80211_FLAG_NEED_WDEV_UP (NL80211_FLAG_NEED_WDEV |\ 9524#define NL80211_FLAG_NEED_WDEV_UP (NL80211_FLAG_NEED_WDEV |\
9375 NL80211_FLAG_CHECK_NETDEV_UP) 9525 NL80211_FLAG_CHECK_NETDEV_UP)
9526#define NL80211_FLAG_CLEAR_SKB 0x20
9376 9527
9377static int nl80211_pre_doit(const struct genl_ops *ops, struct sk_buff *skb, 9528static int nl80211_pre_doit(const struct genl_ops *ops, struct sk_buff *skb,
9378 struct genl_info *info) 9529 struct genl_info *info)
@@ -9456,8 +9607,20 @@ static void nl80211_post_doit(const struct genl_ops *ops, struct sk_buff *skb,
9456 dev_put(info->user_ptr[1]); 9607 dev_put(info->user_ptr[1]);
9457 } 9608 }
9458 } 9609 }
9610
9459 if (ops->internal_flags & NL80211_FLAG_NEED_RTNL) 9611 if (ops->internal_flags & NL80211_FLAG_NEED_RTNL)
9460 rtnl_unlock(); 9612 rtnl_unlock();
9613
9614 /* If needed, clear the netlink message payload from the SKB
9615 * as it might contain key data that shouldn't stick around on
9616 * the heap after the SKB is freed. The netlink message header
9617 * is still needed for further processing, so leave it intact.
9618 */
9619 if (ops->internal_flags & NL80211_FLAG_CLEAR_SKB) {
9620 struct nlmsghdr *nlh = nlmsg_hdr(skb);
9621
9622 memset(nlmsg_data(nlh), 0, nlmsg_len(nlh));
9623 }
9461} 9624}
9462 9625
9463static const struct genl_ops nl80211_ops[] = { 9626static const struct genl_ops nl80211_ops[] = {
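Opting a command into this payload wipe is just a matter of OR-ing in the new internal flag wherever the request may carry key material. Purely as an illustration (this section does not actually change the entry), a key-carrying command such as NL80211_CMD_SET_REKEY_OFFLOAD would be registered like so:

    {
        .cmd = NL80211_CMD_SET_REKEY_OFFLOAD,
        .doit = nl80211_set_rekey_data,
        .policy = nl80211_policy,
        .flags = GENL_ADMIN_PERM,
        .internal_flags = NL80211_FLAG_NEED_NETDEV_UP |
                          NL80211_FLAG_NEED_RTNL |
                          NL80211_FLAG_CLEAR_SKB,
    },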
@@ -9525,7 +9688,8 @@ static const struct genl_ops nl80211_ops[] = {
9525 .policy = nl80211_policy, 9688 .policy = nl80211_policy,
9526 .flags = GENL_ADMIN_PERM, 9689 .flags = GENL_ADMIN_PERM,
9527 .internal_flags = NL80211_FLAG_NEED_NETDEV_UP | 9690 .internal_flags = NL80211_FLAG_NEED_NETDEV_UP |
9528 NL80211_FLAG_NEED_RTNL, 9691 NL80211_FLAG_NEED_RTNL |
9692 NL80211_FLAG_CLEAR_SKB,
9529 }, 9693 },
9530 { 9694 {
9531 .cmd = NL80211_CMD_NEW_KEY, 9695 .cmd = NL80211_CMD_NEW_KEY,
@@ -9533,7 +9697,8 @@ static const struct genl_ops nl80211_ops[] = {
9533 .policy = nl80211_policy, 9697 .policy = nl80211_policy,
9534 .flags = GENL_ADMIN_PERM, 9698 .flags = GENL_ADMIN_PERM,
9535 .internal_flags = NL80211_FLAG_NEED_NETDEV_UP | 9699 .internal_flags = NL80211_FLAG_NEED_NETDEV_UP |
9536 NL80211_FLAG_NEED_RTNL, 9700 NL80211_FLAG_NEED_RTNL |
9701 NL80211_FLAG_CLEAR_SKB,
9537 }, 9702 },
9538 { 9703 {
9539 .cmd = NL80211_CMD_DEL_KEY, 9704 .cmd = NL80211_CMD_DEL_KEY,
@@ -9711,7 +9876,8 @@ static const struct genl_ops nl80211_ops[] = {
9711 .policy = nl80211_policy, 9876 .policy = nl80211_policy,
9712 .flags = GENL_ADMIN_PERM, 9877 .flags = GENL_ADMIN_PERM,
9713 .internal_flags = NL80211_FLAG_NEED_NETDEV_UP | 9878 .internal_flags = NL80211_FLAG_NEED_NETDEV_UP |
9714 NL80211_FLAG_NEED_RTNL, 9879 NL80211_FLAG_NEED_RTNL |
9880 NL80211_FLAG_CLEAR_SKB,
9715 }, 9881 },
9716 { 9882 {
9717 .cmd = NL80211_CMD_ASSOCIATE, 9883 .cmd = NL80211_CMD_ASSOCIATE,
@@ -9945,7 +10111,8 @@ static const struct genl_ops nl80211_ops[] = {
9945 .policy = nl80211_policy, 10111 .policy = nl80211_policy,
9946 .flags = GENL_ADMIN_PERM, 10112 .flags = GENL_ADMIN_PERM,
9947 .internal_flags = NL80211_FLAG_NEED_NETDEV_UP | 10113 .internal_flags = NL80211_FLAG_NEED_NETDEV_UP |
9948 NL80211_FLAG_NEED_RTNL, 10114 NL80211_FLAG_NEED_RTNL |
10115 NL80211_FLAG_CLEAR_SKB,
9949 }, 10116 },
9950 { 10117 {
9951 .cmd = NL80211_CMD_TDLS_MGMT, 10118 .cmd = NL80211_CMD_TDLS_MGMT,
@@ -10103,6 +10270,22 @@ static const struct genl_ops nl80211_ops[] = {
10103 .internal_flags = NL80211_FLAG_NEED_NETDEV_UP | 10270 .internal_flags = NL80211_FLAG_NEED_NETDEV_UP |
10104 NL80211_FLAG_NEED_RTNL, 10271 NL80211_FLAG_NEED_RTNL,
10105 }, 10272 },
10273 {
10274 .cmd = NL80211_CMD_ADD_TX_TS,
10275 .doit = nl80211_add_tx_ts,
10276 .policy = nl80211_policy,
10277 .flags = GENL_ADMIN_PERM,
10278 .internal_flags = NL80211_FLAG_NEED_NETDEV_UP |
10279 NL80211_FLAG_NEED_RTNL,
10280 },
10281 {
10282 .cmd = NL80211_CMD_DEL_TX_TS,
10283 .doit = nl80211_del_tx_ts,
10284 .policy = nl80211_policy,
10285 .flags = GENL_ADMIN_PERM,
10286 .internal_flags = NL80211_FLAG_NEED_NETDEV_UP |
10287 NL80211_FLAG_NEED_RTNL,
10288 },
10106}; 10289};
10107 10290
10108/* notification functions */ 10291/* notification functions */
@@ -10371,7 +10554,8 @@ nla_put_failure:
10371static void nl80211_send_mlme_event(struct cfg80211_registered_device *rdev, 10554static void nl80211_send_mlme_event(struct cfg80211_registered_device *rdev,
10372 struct net_device *netdev, 10555 struct net_device *netdev,
10373 const u8 *buf, size_t len, 10556 const u8 *buf, size_t len,
10374 enum nl80211_commands cmd, gfp_t gfp) 10557 enum nl80211_commands cmd, gfp_t gfp,
10558 int uapsd_queues)
10375{ 10559{
10376 struct sk_buff *msg; 10560 struct sk_buff *msg;
10377 void *hdr; 10561 void *hdr;
@@ -10391,6 +10575,19 @@ static void nl80211_send_mlme_event(struct cfg80211_registered_device *rdev,
10391 nla_put(msg, NL80211_ATTR_FRAME, len, buf)) 10575 nla_put(msg, NL80211_ATTR_FRAME, len, buf))
10392 goto nla_put_failure; 10576 goto nla_put_failure;
10393 10577
10578 if (uapsd_queues >= 0) {
10579 struct nlattr *nla_wmm =
10580 nla_nest_start(msg, NL80211_ATTR_STA_WME);
10581 if (!nla_wmm)
10582 goto nla_put_failure;
10583
10584 if (nla_put_u8(msg, NL80211_STA_WME_UAPSD_QUEUES,
10585 uapsd_queues))
10586 goto nla_put_failure;
10587
10588 nla_nest_end(msg, nla_wmm);
10589 }
10590
10394 genlmsg_end(msg, hdr); 10591 genlmsg_end(msg, hdr);
10395 10592
10396 genlmsg_multicast_netns(&nl80211_fam, wiphy_net(&rdev->wiphy), msg, 0, 10593 genlmsg_multicast_netns(&nl80211_fam, wiphy_net(&rdev->wiphy), msg, 0,
@@ -10407,15 +10604,15 @@ void nl80211_send_rx_auth(struct cfg80211_registered_device *rdev,
10407 size_t len, gfp_t gfp) 10604 size_t len, gfp_t gfp)
10408{ 10605{
10409 nl80211_send_mlme_event(rdev, netdev, buf, len, 10606 nl80211_send_mlme_event(rdev, netdev, buf, len,
10410 NL80211_CMD_AUTHENTICATE, gfp); 10607 NL80211_CMD_AUTHENTICATE, gfp, -1);
10411} 10608}
10412 10609
10413void nl80211_send_rx_assoc(struct cfg80211_registered_device *rdev, 10610void nl80211_send_rx_assoc(struct cfg80211_registered_device *rdev,
10414 struct net_device *netdev, const u8 *buf, 10611 struct net_device *netdev, const u8 *buf,
10415 size_t len, gfp_t gfp) 10612 size_t len, gfp_t gfp, int uapsd_queues)
10416{ 10613{
10417 nl80211_send_mlme_event(rdev, netdev, buf, len, 10614 nl80211_send_mlme_event(rdev, netdev, buf, len,
10418 NL80211_CMD_ASSOCIATE, gfp); 10615 NL80211_CMD_ASSOCIATE, gfp, uapsd_queues);
10419} 10616}
10420 10617
10421void nl80211_send_deauth(struct cfg80211_registered_device *rdev, 10618void nl80211_send_deauth(struct cfg80211_registered_device *rdev,
@@ -10423,7 +10620,7 @@ void nl80211_send_deauth(struct cfg80211_registered_device *rdev,
10423 size_t len, gfp_t gfp) 10620 size_t len, gfp_t gfp)
10424{ 10621{
10425 nl80211_send_mlme_event(rdev, netdev, buf, len, 10622 nl80211_send_mlme_event(rdev, netdev, buf, len,
10426 NL80211_CMD_DEAUTHENTICATE, gfp); 10623 NL80211_CMD_DEAUTHENTICATE, gfp, -1);
10427} 10624}
10428 10625
10429void nl80211_send_disassoc(struct cfg80211_registered_device *rdev, 10626void nl80211_send_disassoc(struct cfg80211_registered_device *rdev,
@@ -10431,7 +10628,7 @@ void nl80211_send_disassoc(struct cfg80211_registered_device *rdev,
10431 size_t len, gfp_t gfp) 10628 size_t len, gfp_t gfp)
10432{ 10629{
10433 nl80211_send_mlme_event(rdev, netdev, buf, len, 10630 nl80211_send_mlme_event(rdev, netdev, buf, len,
10434 NL80211_CMD_DISASSOCIATE, gfp); 10631 NL80211_CMD_DISASSOCIATE, gfp, -1);
10435} 10632}
10436 10633
10437void cfg80211_rx_unprot_mlme_mgmt(struct net_device *dev, const u8 *buf, 10634void cfg80211_rx_unprot_mlme_mgmt(struct net_device *dev, const u8 *buf,
@@ -10452,7 +10649,7 @@ void cfg80211_rx_unprot_mlme_mgmt(struct net_device *dev, const u8 *buf,
10452 cmd = NL80211_CMD_UNPROT_DISASSOCIATE; 10649 cmd = NL80211_CMD_UNPROT_DISASSOCIATE;
10453 10650
10454 trace_cfg80211_rx_unprot_mlme_mgmt(dev, buf, len); 10651 trace_cfg80211_rx_unprot_mlme_mgmt(dev, buf, len);
10455 nl80211_send_mlme_event(rdev, dev, buf, len, cmd, GFP_ATOMIC); 10652 nl80211_send_mlme_event(rdev, dev, buf, len, cmd, GFP_ATOMIC, -1);
10456} 10653}
10457EXPORT_SYMBOL(cfg80211_rx_unprot_mlme_mgmt); 10654EXPORT_SYMBOL(cfg80211_rx_unprot_mlme_mgmt);
10458 10655
diff --git a/net/wireless/nl80211.h b/net/wireless/nl80211.h
index 49c9a482dd12..7ad70d6f0cc6 100644
--- a/net/wireless/nl80211.h
+++ b/net/wireless/nl80211.h
@@ -23,7 +23,8 @@ void nl80211_send_rx_auth(struct cfg80211_registered_device *rdev,
23 const u8 *buf, size_t len, gfp_t gfp); 23 const u8 *buf, size_t len, gfp_t gfp);
24void nl80211_send_rx_assoc(struct cfg80211_registered_device *rdev, 24void nl80211_send_rx_assoc(struct cfg80211_registered_device *rdev,
25 struct net_device *netdev, 25 struct net_device *netdev,
26 const u8 *buf, size_t len, gfp_t gfp); 26 const u8 *buf, size_t len, gfp_t gfp,
27 int uapsd_queues);
27void nl80211_send_deauth(struct cfg80211_registered_device *rdev, 28void nl80211_send_deauth(struct cfg80211_registered_device *rdev,
28 struct net_device *netdev, 29 struct net_device *netdev,
29 const u8 *buf, size_t len, gfp_t gfp); 30 const u8 *buf, size_t len, gfp_t gfp);
diff --git a/net/wireless/rdev-ops.h b/net/wireless/rdev-ops.h
index 56c2240c30ce..f6d457d6a558 100644
--- a/net/wireless/rdev-ops.h
+++ b/net/wireless/rdev-ops.h
@@ -915,4 +915,35 @@ rdev_set_ap_chanwidth(struct cfg80211_registered_device *rdev,
915 return ret; 915 return ret;
916} 916}
917 917
918static inline int
919rdev_add_tx_ts(struct cfg80211_registered_device *rdev,
920 struct net_device *dev, u8 tsid, const u8 *peer,
921 u8 user_prio, u16 admitted_time)
922{
923 int ret = -EOPNOTSUPP;
924
925 trace_rdev_add_tx_ts(&rdev->wiphy, dev, tsid, peer,
926 user_prio, admitted_time);
927 if (rdev->ops->add_tx_ts)
928 ret = rdev->ops->add_tx_ts(&rdev->wiphy, dev, tsid, peer,
929 user_prio, admitted_time);
930 trace_rdev_return_int(&rdev->wiphy, ret);
931
932 return ret;
933}
934
935static inline int
936rdev_del_tx_ts(struct cfg80211_registered_device *rdev,
937 struct net_device *dev, u8 tsid, const u8 *peer)
938{
939 int ret = -EOPNOTSUPP;
940
941 trace_rdev_del_tx_ts(&rdev->wiphy, dev, tsid, peer);
942 if (rdev->ops->del_tx_ts)
943 ret = rdev->ops->del_tx_ts(&rdev->wiphy, dev, tsid, peer);
944 trace_rdev_return_int(&rdev->wiphy, ret);
945
946 return ret;
947}
948
918#endif /* __CFG80211_RDEV_OPS */ 949#endif /* __CFG80211_RDEV_OPS */
diff --git a/net/wireless/reg.c b/net/wireless/reg.c
index 1afdf45db38f..b725a31a4751 100644
--- a/net/wireless/reg.c
+++ b/net/wireless/reg.c
@@ -3,6 +3,7 @@
3 * Copyright 2005-2006, Devicescape Software, Inc. 3 * Copyright 2005-2006, Devicescape Software, Inc.
4 * Copyright 2007 Johannes Berg <johannes@sipsolutions.net> 4 * Copyright 2007 Johannes Berg <johannes@sipsolutions.net>
5 * Copyright 2008-2011 Luis R. Rodriguez <mcgrof@qca.qualcomm.com> 5 * Copyright 2008-2011 Luis R. Rodriguez <mcgrof@qca.qualcomm.com>
6 * Copyright 2013-2014 Intel Mobile Communications GmbH
6 * 7 *
7 * Permission to use, copy, modify, and/or distribute this software for any 8 * Permission to use, copy, modify, and/or distribute this software for any
8 * purpose with or without fee is hereby granted, provided that the above 9 * purpose with or without fee is hereby granted, provided that the above
@@ -798,6 +799,57 @@ static int reg_rules_intersect(const struct ieee80211_regdomain *rd1,
798 return 0; 799 return 0;
799} 800}
800 801
802/* check whether old rule contains new rule */
803static bool rule_contains(struct ieee80211_reg_rule *r1,
804 struct ieee80211_reg_rule *r2)
805{
806 /* for simplicity, currently consider only same flags */
807 if (r1->flags != r2->flags)
808 return false;
809
810 /* verify r1 is more restrictive */
811 if ((r1->power_rule.max_antenna_gain >
812 r2->power_rule.max_antenna_gain) ||
813 r1->power_rule.max_eirp > r2->power_rule.max_eirp)
814 return false;
815
816 /* make sure r2's range is contained within r1 */
817 if (r1->freq_range.start_freq_khz > r2->freq_range.start_freq_khz ||
818 r1->freq_range.end_freq_khz < r2->freq_range.end_freq_khz)
819 return false;
820
821 /* and finally verify that r1.max_bw >= r2.max_bw */
822 if (r1->freq_range.max_bandwidth_khz <
823 r2->freq_range.max_bandwidth_khz)
824 return false;
825
826 return true;
827}
828
829/* add or extend current rules. do nothing if rule is already contained */
830static void add_rule(struct ieee80211_reg_rule *rule,
831 struct ieee80211_reg_rule *reg_rules, u32 *n_rules)
832{
833 struct ieee80211_reg_rule *tmp_rule;
834 int i;
835
836 for (i = 0; i < *n_rules; i++) {
837 tmp_rule = &reg_rules[i];
838 /* rule is already contained - do nothing */
839 if (rule_contains(tmp_rule, rule))
840 return;
841
842 /* extend rule if possible */
843 if (rule_contains(rule, tmp_rule)) {
844 memcpy(tmp_rule, rule, sizeof(*rule));
845 return;
846 }
847 }
848
849 memcpy(&reg_rules[*n_rules], rule, sizeof(*rule));
850 (*n_rules)++;
851}
852
801/** 853/**
802 * regdom_intersect - do the intersection between two regulatory domains 854 * regdom_intersect - do the intersection between two regulatory domains
803 * @rd1: first regulatory domain 855 * @rd1: first regulatory domain
@@ -817,12 +869,10 @@ regdom_intersect(const struct ieee80211_regdomain *rd1,
817{ 869{
818 int r, size_of_regd; 870 int r, size_of_regd;
819 unsigned int x, y; 871 unsigned int x, y;
820 unsigned int num_rules = 0, rule_idx = 0; 872 unsigned int num_rules = 0;
821 const struct ieee80211_reg_rule *rule1, *rule2; 873 const struct ieee80211_reg_rule *rule1, *rule2;
822 struct ieee80211_reg_rule *intersected_rule; 874 struct ieee80211_reg_rule intersected_rule;
823 struct ieee80211_regdomain *rd; 875 struct ieee80211_regdomain *rd;
824 /* This is just a dummy holder to help us count */
825 struct ieee80211_reg_rule dummy_rule;
826 876
827 if (!rd1 || !rd2) 877 if (!rd1 || !rd2)
828 return NULL; 878 return NULL;
@@ -840,7 +890,7 @@ regdom_intersect(const struct ieee80211_regdomain *rd1,
840 for (y = 0; y < rd2->n_reg_rules; y++) { 890 for (y = 0; y < rd2->n_reg_rules; y++) {
841 rule2 = &rd2->reg_rules[y]; 891 rule2 = &rd2->reg_rules[y];
842 if (!reg_rules_intersect(rd1, rd2, rule1, rule2, 892 if (!reg_rules_intersect(rd1, rd2, rule1, rule2,
843 &dummy_rule)) 893 &intersected_rule))
844 num_rules++; 894 num_rules++;
845 } 895 }
846 } 896 }
@@ -855,34 +905,24 @@ regdom_intersect(const struct ieee80211_regdomain *rd1,
855 if (!rd) 905 if (!rd)
856 return NULL; 906 return NULL;
857 907
858 for (x = 0; x < rd1->n_reg_rules && rule_idx < num_rules; x++) { 908 for (x = 0; x < rd1->n_reg_rules; x++) {
859 rule1 = &rd1->reg_rules[x]; 909 rule1 = &rd1->reg_rules[x];
860 for (y = 0; y < rd2->n_reg_rules && rule_idx < num_rules; y++) { 910 for (y = 0; y < rd2->n_reg_rules; y++) {
861 rule2 = &rd2->reg_rules[y]; 911 rule2 = &rd2->reg_rules[y];
862 /*
863 * This time around instead of using the stack lets
864 * write to the target rule directly saving ourselves
865 * a memcpy()
866 */
867 intersected_rule = &rd->reg_rules[rule_idx];
868 r = reg_rules_intersect(rd1, rd2, rule1, rule2, 912 r = reg_rules_intersect(rd1, rd2, rule1, rule2,
869 intersected_rule); 913 &intersected_rule);
870 /* 914 /*
871 * No need to memset here the intersected rule here as 915 * No need to memset here the intersected rule here as
872 * we're not using the stack anymore 916 * we're not using the stack anymore
873 */ 917 */
874 if (r) 918 if (r)
875 continue; 919 continue;
876 rule_idx++;
877 }
878 }
879 920
880 if (rule_idx != num_rules) { 921 add_rule(&intersected_rule, rd->reg_rules,
881 kfree(rd); 922 &rd->n_reg_rules);
882 return NULL; 923 }
883 } 924 }
884 925
885 rd->n_reg_rules = num_rules;
886 rd->alpha2[0] = '9'; 926 rd->alpha2[0] = '9';
887 rd->alpha2[1] = '8'; 927 rd->alpha2[1] = '8';
888 rd->dfs_region = reg_intersect_dfs_region(rd1->dfs_region, 928 rd->dfs_region = reg_intersect_dfs_region(rd1->dfs_region,
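A concrete (made-up) example of the containment test driving the new add_rule() logic: with identical flags and power limits, a 2400-2483.5 MHz rule allowing 40 MHz contains a 2402-2472 MHz rule allowing only 20 MHz, so an intersection that would otherwise produce both keeps just the wider one instead of appending a near-duplicate. In code (values are illustrative; max_eirp is in mBm):

    struct ieee80211_reg_rule wide = {
        .freq_range = {
            .start_freq_khz    = 2400000,
            .end_freq_khz      = 2483500,
            .max_bandwidth_khz = 40000,
        },
        .power_rule = { .max_antenna_gain = 0, .max_eirp = 2000 },
        .flags = 0,
    };
    struct ieee80211_reg_rule narrow = {
        .freq_range = {
            .start_freq_khz    = 2402000,
            .end_freq_khz      = 2472000,
            .max_bandwidth_khz = 20000,
        },
        .power_rule = { .max_antenna_gain = 0, .max_eirp = 2000 },
        .flags = 0,
    };

    /* rule_contains(&wide, &narrow) is true and rule_contains(&narrow,
     * &wide) is false, so add_rule() overwrites an existing 'narrow'
     * entry with 'wide' rather than growing n_reg_rules.  Note that the
     * containing rule must not allow more antenna gain or EIRP than the
     * contained one, per the power checks in rule_contains(). */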
diff --git a/net/wireless/scan.c b/net/wireless/scan.c
index 0798c62e6085..bda39f149810 100644
--- a/net/wireless/scan.c
+++ b/net/wireless/scan.c
@@ -2,6 +2,7 @@
2 * cfg80211 scan result handling 2 * cfg80211 scan result handling
3 * 3 *
4 * Copyright 2008 Johannes Berg <johannes@sipsolutions.net> 4 * Copyright 2008 Johannes Berg <johannes@sipsolutions.net>
5 * Copyright 2013-2014 Intel Mobile Communications GmbH
5 */ 6 */
6#include <linux/kernel.h> 7#include <linux/kernel.h>
7#include <linux/slab.h> 8#include <linux/slab.h>
@@ -884,6 +885,7 @@ struct cfg80211_bss*
884cfg80211_inform_bss_width(struct wiphy *wiphy, 885cfg80211_inform_bss_width(struct wiphy *wiphy,
885 struct ieee80211_channel *rx_channel, 886 struct ieee80211_channel *rx_channel,
886 enum nl80211_bss_scan_width scan_width, 887 enum nl80211_bss_scan_width scan_width,
888 enum cfg80211_bss_frame_type ftype,
887 const u8 *bssid, u64 tsf, u16 capability, 889 const u8 *bssid, u64 tsf, u16 capability,
888 u16 beacon_interval, const u8 *ie, size_t ielen, 890 u16 beacon_interval, const u8 *ie, size_t ielen,
889 s32 signal, gfp_t gfp) 891 s32 signal, gfp_t gfp)
@@ -911,21 +913,32 @@ cfg80211_inform_bss_width(struct wiphy *wiphy,
911 tmp.pub.beacon_interval = beacon_interval; 913 tmp.pub.beacon_interval = beacon_interval;
912 tmp.pub.capability = capability; 914 tmp.pub.capability = capability;
913 /* 915 /*
914 * Since we do not know here whether the IEs are from a Beacon or Probe 916 * If we do not know here whether the IEs are from a Beacon or Probe
915 * Response frame, we need to pick one of the options and only use it 917 * Response frame, we need to pick one of the options and only use it
916 * with the driver that does not provide the full Beacon/Probe Response 918 * with the driver that does not provide the full Beacon/Probe Response
917 * frame. Use Beacon frame pointer to avoid indicating that this should 919 * frame. Use Beacon frame pointer to avoid indicating that this should
918 * override the IEs pointer should we have received an earlier 920 * override the IEs pointer should we have received an earlier
919 * indication of Probe Response data. 921 * indication of Probe Response data.
920 */ 922 */
921 ies = kmalloc(sizeof(*ies) + ielen, gfp); 923 ies = kzalloc(sizeof(*ies) + ielen, gfp);
922 if (!ies) 924 if (!ies)
923 return NULL; 925 return NULL;
924 ies->len = ielen; 926 ies->len = ielen;
925 ies->tsf = tsf; 927 ies->tsf = tsf;
928 ies->from_beacon = false;
926 memcpy(ies->data, ie, ielen); 929 memcpy(ies->data, ie, ielen);
927 930
928 rcu_assign_pointer(tmp.pub.beacon_ies, ies); 931 switch (ftype) {
932 case CFG80211_BSS_FTYPE_BEACON:
933 ies->from_beacon = true;
934 /* fall through to assign */
935 case CFG80211_BSS_FTYPE_UNKNOWN:
936 rcu_assign_pointer(tmp.pub.beacon_ies, ies);
937 break;
938 case CFG80211_BSS_FTYPE_PRESP:
939 rcu_assign_pointer(tmp.pub.proberesp_ies, ies);
940 break;
941 }
929 rcu_assign_pointer(tmp.pub.ies, ies); 942 rcu_assign_pointer(tmp.pub.ies, ies);
930 943
931 signal_valid = abs(rx_channel->center_freq - channel->center_freq) <= 944 signal_valid = abs(rx_channel->center_freq - channel->center_freq) <=
@@ -982,11 +995,12 @@ cfg80211_inform_bss_width_frame(struct wiphy *wiphy,
982 if (!channel) 995 if (!channel)
983 return NULL; 996 return NULL;
984 997
985 ies = kmalloc(sizeof(*ies) + ielen, gfp); 998 ies = kzalloc(sizeof(*ies) + ielen, gfp);
986 if (!ies) 999 if (!ies)
987 return NULL; 1000 return NULL;
988 ies->len = ielen; 1001 ies->len = ielen;
989 ies->tsf = le64_to_cpu(mgmt->u.probe_resp.timestamp); 1002 ies->tsf = le64_to_cpu(mgmt->u.probe_resp.timestamp);
1003 ies->from_beacon = ieee80211_is_beacon(mgmt->frame_control);
990 memcpy(ies->data, mgmt->u.probe_resp.variable, ielen); 1004 memcpy(ies->data, mgmt->u.probe_resp.variable, ielen);
991 1005
992 if (ieee80211_is_probe_resp(mgmt->frame_control)) 1006 if (ieee80211_is_probe_resp(mgmt->frame_control))
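Because cfg80211_inform_bss_width() grew a frame-type parameter, every caller now states what it knows about the source frame. A hedged call-site sketch for a full-MAC driver that parsed a probe response itself (all values other than the new ftype argument are placeholders):

    bss = cfg80211_inform_bss_width(wiphy, chan, NL80211_BSS_CHAN_WIDTH_20,
                                    CFG80211_BSS_FTYPE_PRESP,
                                    bssid, tsf, capability, beacon_interval,
                                    ie, ielen, signal_mbm, GFP_KERNEL);
    /* drivers that genuinely cannot tell keep the old behaviour by
     * passing CFG80211_BSS_FTYPE_UNKNOWN */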
diff --git a/net/wireless/sme.c b/net/wireless/sme.c
index 8bbeeb302216..dc1668ff543b 100644
--- a/net/wireless/sme.c
+++ b/net/wireless/sme.c
@@ -641,7 +641,7 @@ void __cfg80211_connect_result(struct net_device *dev, const u8 *bssid,
641 } 641 }
642 642
643 if (status != WLAN_STATUS_SUCCESS) { 643 if (status != WLAN_STATUS_SUCCESS) {
644 kfree(wdev->connect_keys); 644 kzfree(wdev->connect_keys);
645 wdev->connect_keys = NULL; 645 wdev->connect_keys = NULL;
646 wdev->ssid_len = 0; 646 wdev->ssid_len = 0;
647 if (bss) { 647 if (bss) {
@@ -918,7 +918,7 @@ int cfg80211_connect(struct cfg80211_registered_device *rdev,
918 ASSERT_WDEV_LOCK(wdev); 918 ASSERT_WDEV_LOCK(wdev);
919 919
920 if (WARN_ON(wdev->connect_keys)) { 920 if (WARN_ON(wdev->connect_keys)) {
921 kfree(wdev->connect_keys); 921 kzfree(wdev->connect_keys);
922 wdev->connect_keys = NULL; 922 wdev->connect_keys = NULL;
923 } 923 }
924 924
@@ -978,7 +978,7 @@ int cfg80211_disconnect(struct cfg80211_registered_device *rdev,
978 978
979 ASSERT_WDEV_LOCK(wdev); 979 ASSERT_WDEV_LOCK(wdev);
980 980
981 kfree(wdev->connect_keys); 981 kzfree(wdev->connect_keys);
982 wdev->connect_keys = NULL; 982 wdev->connect_keys = NULL;
983 983
984 if (wdev->conn) 984 if (wdev->conn)
diff --git a/net/wireless/trace.h b/net/wireless/trace.h
index 0c524cd76c83..625a6e6d1168 100644
--- a/net/wireless/trace.h
+++ b/net/wireless/trace.h
@@ -1896,6 +1896,51 @@ TRACE_EVENT(rdev_set_ap_chanwidth,
1896 WIPHY_PR_ARG, NETDEV_PR_ARG, CHAN_DEF_PR_ARG) 1896 WIPHY_PR_ARG, NETDEV_PR_ARG, CHAN_DEF_PR_ARG)
1897); 1897);
1898 1898
1899TRACE_EVENT(rdev_add_tx_ts,
1900 TP_PROTO(struct wiphy *wiphy, struct net_device *netdev,
1901 u8 tsid, const u8 *peer, u8 user_prio, u16 admitted_time),
1902 TP_ARGS(wiphy, netdev, tsid, peer, user_prio, admitted_time),
1903 TP_STRUCT__entry(
1904 WIPHY_ENTRY
1905 NETDEV_ENTRY
1906 MAC_ENTRY(peer)
1907 __field(u8, tsid)
1908 __field(u8, user_prio)
1909 __field(u16, admitted_time)
1910 ),
1911 TP_fast_assign(
1912 WIPHY_ASSIGN;
1913 NETDEV_ASSIGN;
1914 MAC_ASSIGN(peer, peer);
1915 __entry->tsid = tsid;
1916 __entry->user_prio = user_prio;
1917 __entry->admitted_time = admitted_time;
1918 ),
1919 TP_printk(WIPHY_PR_FMT ", " NETDEV_PR_FMT ", " MAC_PR_FMT ", TSID %d, UP %d, time %d",
1920 WIPHY_PR_ARG, NETDEV_PR_ARG, MAC_PR_ARG(peer),
1921 __entry->tsid, __entry->user_prio, __entry->admitted_time)
1922);
1923
1924TRACE_EVENT(rdev_del_tx_ts,
1925 TP_PROTO(struct wiphy *wiphy, struct net_device *netdev,
1926 u8 tsid, const u8 *peer),
1927 TP_ARGS(wiphy, netdev, tsid, peer),
1928 TP_STRUCT__entry(
1929 WIPHY_ENTRY
1930 NETDEV_ENTRY
1931 MAC_ENTRY(peer)
1932 __field(u8, tsid)
1933 ),
1934 TP_fast_assign(
1935 WIPHY_ASSIGN;
1936 NETDEV_ASSIGN;
1937 MAC_ASSIGN(peer, peer);
1938 __entry->tsid = tsid;
1939 ),
1940 TP_printk(WIPHY_PR_FMT ", " NETDEV_PR_FMT ", " MAC_PR_FMT ", TSID %d",
1941 WIPHY_PR_ARG, NETDEV_PR_ARG, MAC_PR_ARG(peer), __entry->tsid)
1942);
1943
1899/************************************************************* 1944/*************************************************************
1900 * cfg80211 exported functions traces * 1945 * cfg80211 exported functions traces *
1901 *************************************************************/ 1946 *************************************************************/
diff --git a/net/wireless/util.c b/net/wireless/util.c
index 728f1c0dc70d..5e233a577d0f 100644
--- a/net/wireless/util.c
+++ b/net/wireless/util.c
@@ -2,6 +2,7 @@
2 * Wireless utility functions 2 * Wireless utility functions
3 * 3 *
4 * Copyright 2007-2009 Johannes Berg <johannes@sipsolutions.net> 4 * Copyright 2007-2009 Johannes Berg <johannes@sipsolutions.net>
5 * Copyright 2013-2014 Intel Mobile Communications GmbH
5 */ 6 */
6#include <linux/export.h> 7#include <linux/export.h>
7#include <linux/bitops.h> 8#include <linux/bitops.h>
@@ -796,7 +797,7 @@ void cfg80211_upload_connect_keys(struct wireless_dev *wdev)
796 netdev_err(dev, "failed to set mgtdef %d\n", i); 797 netdev_err(dev, "failed to set mgtdef %d\n", i);
797 } 798 }
798 799
799 kfree(wdev->connect_keys); 800 kzfree(wdev->connect_keys);
800 wdev->connect_keys = NULL; 801 wdev->connect_keys = NULL;
801} 802}
802 803
diff --git a/net/wireless/wext-compat.c b/net/wireless/wext-compat.c
index 11120bb14162..0f47948c572f 100644
--- a/net/wireless/wext-compat.c
+++ b/net/wireless/wext-compat.c
@@ -496,6 +496,8 @@ static int __cfg80211_set_encryption(struct cfg80211_registered_device *rdev,
496 err = 0; 496 err = 0;
497 if (!err) { 497 if (!err) {
498 if (!addr) { 498 if (!addr) {
499 memset(wdev->wext.keys->data[idx], 0,
500 sizeof(wdev->wext.keys->data[idx]));
499 wdev->wext.keys->params[idx].key_len = 0; 501 wdev->wext.keys->params[idx].key_len = 0;
500 wdev->wext.keys->params[idx].cipher = 0; 502 wdev->wext.keys->params[idx].cipher = 0;
501 } 503 }
diff --git a/net/wireless/wext-sme.c b/net/wireless/wext-sme.c
index c7e5c8eb4f24..368611c05739 100644
--- a/net/wireless/wext-sme.c
+++ b/net/wireless/wext-sme.c
@@ -57,7 +57,7 @@ int cfg80211_mgd_wext_connect(struct cfg80211_registered_device *rdev,
57 err = cfg80211_connect(rdev, wdev->netdev, 57 err = cfg80211_connect(rdev, wdev->netdev,
58 &wdev->wext.connect, ck, prev_bssid); 58 &wdev->wext.connect, ck, prev_bssid);
59 if (err) 59 if (err)
60 kfree(ck); 60 kzfree(ck);
61 61
62 return err; 62 return err;
63} 63}
diff --git a/net/xfrm/xfrm_hash.h b/net/xfrm/xfrm_hash.h
index 0622d319e1f2..666c5ffe929d 100644
--- a/net/xfrm/xfrm_hash.h
+++ b/net/xfrm/xfrm_hash.h
@@ -3,6 +3,7 @@
3 3
4#include <linux/xfrm.h> 4#include <linux/xfrm.h>
5#include <linux/socket.h> 5#include <linux/socket.h>
6#include <linux/jhash.h>
6 7
7static inline unsigned int __xfrm4_addr_hash(const xfrm_address_t *addr) 8static inline unsigned int __xfrm4_addr_hash(const xfrm_address_t *addr)
8{ 9{
@@ -28,6 +29,58 @@ static inline unsigned int __xfrm6_daddr_saddr_hash(const xfrm_address_t *daddr,
28 saddr->a6[2] ^ saddr->a6[3]); 29 saddr->a6[2] ^ saddr->a6[3]);
29} 30}
30 31
32static inline u32 __bits2mask32(__u8 bits)
33{
34 u32 mask32 = 0xffffffff;
35
36 if (bits == 0)
37 mask32 = 0;
38 else if (bits < 32)
39 mask32 <<= (32 - bits);
40
41 return mask32;
42}
43
44static inline unsigned int __xfrm4_dpref_spref_hash(const xfrm_address_t *daddr,
45 const xfrm_address_t *saddr,
46 __u8 dbits,
47 __u8 sbits)
48{
49 return jhash_2words(ntohl(daddr->a4) & __bits2mask32(dbits),
50 ntohl(saddr->a4) & __bits2mask32(sbits),
51 0);
52}
53
54static inline unsigned int __xfrm6_pref_hash(const xfrm_address_t *addr,
55 __u8 prefixlen)
56{
57 int pdw;
58 int pbi;
59 u32 initval = 0;
60
61 pdw = prefixlen >> 5; /* num of whole u32 in prefix */
62 pbi = prefixlen & 0x1f; /* num of bits in incomplete u32 in prefix */
63
64 if (pbi) {
65 __be32 mask;
66
67 mask = htonl((0xffffffff) << (32 - pbi));
68
69 initval = (__force u32)(addr->a6[pdw] & mask);
70 }
71
72 return jhash2((__force u32 *)addr->a6, pdw, initval);
73}
74
75static inline unsigned int __xfrm6_dpref_spref_hash(const xfrm_address_t *daddr,
76 const xfrm_address_t *saddr,
77 __u8 dbits,
78 __u8 sbits)
79{
80 return __xfrm6_pref_hash(daddr, dbits) ^
81 __xfrm6_pref_hash(saddr, sbits);
82}
83
31static inline unsigned int __xfrm_dst_hash(const xfrm_address_t *daddr, 84static inline unsigned int __xfrm_dst_hash(const xfrm_address_t *daddr,
32 const xfrm_address_t *saddr, 85 const xfrm_address_t *saddr,
33 u32 reqid, unsigned short family, 86 u32 reqid, unsigned short family,
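The prefix hashing above stands or falls with __bits2mask32(); a quick user-space sanity check of the masking logic (plain C, no jhash needed, addresses written as host-order constants):

    #include <assert.h>
    #include <stdint.h>

    static uint32_t bits2mask32(uint8_t bits)
    {
        uint32_t mask32 = 0xffffffff;

        if (bits == 0)
            mask32 = 0;
        else if (bits < 32)
            mask32 <<= (32 - bits);
        return mask32;
    }

    int main(void)
    {
        assert(bits2mask32(0)  == 0x00000000);
        assert(bits2mask32(24) == 0xffffff00);
        assert(bits2mask32(32) == 0xffffffff);
        /* 192.0.2.1 and 192.0.2.200 collapse to the same value once
         * masked with dbits = 24, so they land in the same bucket */
        assert((0xc0000201u & bits2mask32(24)) ==
               (0xc00002c8u & bits2mask32(24)));
        return 0;
    }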
@@ -84,7 +137,8 @@ static inline unsigned int __idx_hash(u32 index, unsigned int hmask)
84} 137}
85 138
86static inline unsigned int __sel_hash(const struct xfrm_selector *sel, 139static inline unsigned int __sel_hash(const struct xfrm_selector *sel,
87 unsigned short family, unsigned int hmask) 140 unsigned short family, unsigned int hmask,
141 u8 dbits, u8 sbits)
88{ 142{
89 const xfrm_address_t *daddr = &sel->daddr; 143 const xfrm_address_t *daddr = &sel->daddr;
90 const xfrm_address_t *saddr = &sel->saddr; 144 const xfrm_address_t *saddr = &sel->saddr;
@@ -92,19 +146,19 @@ static inline unsigned int __sel_hash(const struct xfrm_selector *sel,
92 146
93 switch (family) { 147 switch (family) {
94 case AF_INET: 148 case AF_INET:
95 if (sel->prefixlen_d != 32 || 149 if (sel->prefixlen_d < dbits ||
96 sel->prefixlen_s != 32) 150 sel->prefixlen_s < sbits)
97 return hmask + 1; 151 return hmask + 1;
98 152
99 h = __xfrm4_daddr_saddr_hash(daddr, saddr); 153 h = __xfrm4_dpref_spref_hash(daddr, saddr, dbits, sbits);
100 break; 154 break;
101 155
102 case AF_INET6: 156 case AF_INET6:
103 if (sel->prefixlen_d != 128 || 157 if (sel->prefixlen_d < dbits ||
104 sel->prefixlen_s != 128) 158 sel->prefixlen_s < sbits)
105 return hmask + 1; 159 return hmask + 1;
106 160
107 h = __xfrm6_daddr_saddr_hash(daddr, saddr); 161 h = __xfrm6_dpref_spref_hash(daddr, saddr, dbits, sbits);
108 break; 162 break;
109 } 163 }
110 h ^= (h >> 16); 164 h ^= (h >> 16);
@@ -113,17 +167,19 @@ static inline unsigned int __sel_hash(const struct xfrm_selector *sel,
113 167
114static inline unsigned int __addr_hash(const xfrm_address_t *daddr, 168static inline unsigned int __addr_hash(const xfrm_address_t *daddr,
115 const xfrm_address_t *saddr, 169 const xfrm_address_t *saddr,
116 unsigned short family, unsigned int hmask) 170 unsigned short family,
171 unsigned int hmask,
172 u8 dbits, u8 sbits)
117{ 173{
118 unsigned int h = 0; 174 unsigned int h = 0;
119 175
120 switch (family) { 176 switch (family) {
121 case AF_INET: 177 case AF_INET:
122 h = __xfrm4_daddr_saddr_hash(daddr, saddr); 178 h = __xfrm4_dpref_spref_hash(daddr, saddr, dbits, sbits);
123 break; 179 break;
124 180
125 case AF_INET6: 181 case AF_INET6:
126 h = __xfrm6_daddr_saddr_hash(daddr, saddr); 182 h = __xfrm6_dpref_spref_hash(daddr, saddr, dbits, sbits);
127 break; 183 break;
128 } 184 }
129 h ^= (h >> 16); 185 h ^= (h >> 16);
diff --git a/net/xfrm/xfrm_output.c b/net/xfrm/xfrm_output.c
index c51e8f7b8653..499d6c18a8ce 100644
--- a/net/xfrm/xfrm_output.c
+++ b/net/xfrm/xfrm_output.c
@@ -166,11 +166,7 @@ static int xfrm_output_gso(struct sk_buff *skb)
166 err = xfrm_output2(segs); 166 err = xfrm_output2(segs);
167 167
168 if (unlikely(err)) { 168 if (unlikely(err)) {
169 while ((segs = nskb)) { 169 kfree_skb_list(nskb);
170 nskb = segs->next;
171 segs->next = NULL;
172 kfree_skb(segs);
173 }
174 return err; 170 return err;
175 } 171 }
176 172
diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c
index fdde51f4271a..4c4e457e7888 100644
--- a/net/xfrm/xfrm_policy.c
+++ b/net/xfrm/xfrm_policy.c
@@ -349,12 +349,39 @@ static inline unsigned int idx_hash(struct net *net, u32 index)
349 return __idx_hash(index, net->xfrm.policy_idx_hmask); 349 return __idx_hash(index, net->xfrm.policy_idx_hmask);
350} 350}
351 351
352/* calculate policy hash thresholds */
353static void __get_hash_thresh(struct net *net,
354 unsigned short family, int dir,
355 u8 *dbits, u8 *sbits)
356{
357 switch (family) {
358 case AF_INET:
359 *dbits = net->xfrm.policy_bydst[dir].dbits4;
360 *sbits = net->xfrm.policy_bydst[dir].sbits4;
361 break;
362
363 case AF_INET6:
364 *dbits = net->xfrm.policy_bydst[dir].dbits6;
365 *sbits = net->xfrm.policy_bydst[dir].sbits6;
366 break;
367
368 default:
369 *dbits = 0;
370 *sbits = 0;
371 }
372}
373
352static struct hlist_head *policy_hash_bysel(struct net *net, 374static struct hlist_head *policy_hash_bysel(struct net *net,
353 const struct xfrm_selector *sel, 375 const struct xfrm_selector *sel,
354 unsigned short family, int dir) 376 unsigned short family, int dir)
355{ 377{
356 unsigned int hmask = net->xfrm.policy_bydst[dir].hmask; 378 unsigned int hmask = net->xfrm.policy_bydst[dir].hmask;
357 unsigned int hash = __sel_hash(sel, family, hmask); 379 unsigned int hash;
380 u8 dbits;
381 u8 sbits;
382
383 __get_hash_thresh(net, family, dir, &dbits, &sbits);
384 hash = __sel_hash(sel, family, hmask, dbits, sbits);
358 385
359 return (hash == hmask + 1 ? 386 return (hash == hmask + 1 ?
360 &net->xfrm.policy_inexact[dir] : 387 &net->xfrm.policy_inexact[dir] :
@@ -367,25 +394,35 @@ static struct hlist_head *policy_hash_direct(struct net *net,
367 unsigned short family, int dir) 394 unsigned short family, int dir)
368{ 395{
369 unsigned int hmask = net->xfrm.policy_bydst[dir].hmask; 396 unsigned int hmask = net->xfrm.policy_bydst[dir].hmask;
370 unsigned int hash = __addr_hash(daddr, saddr, family, hmask); 397 unsigned int hash;
398 u8 dbits;
399 u8 sbits;
400
401 __get_hash_thresh(net, family, dir, &dbits, &sbits);
402 hash = __addr_hash(daddr, saddr, family, hmask, dbits, sbits);
371 403
372 return net->xfrm.policy_bydst[dir].table + hash; 404 return net->xfrm.policy_bydst[dir].table + hash;
373} 405}
374 406
375static void xfrm_dst_hash_transfer(struct hlist_head *list, 407static void xfrm_dst_hash_transfer(struct net *net,
408 struct hlist_head *list,
376 struct hlist_head *ndsttable, 409 struct hlist_head *ndsttable,
377 unsigned int nhashmask) 410 unsigned int nhashmask,
411 int dir)
378{ 412{
379 struct hlist_node *tmp, *entry0 = NULL; 413 struct hlist_node *tmp, *entry0 = NULL;
380 struct xfrm_policy *pol; 414 struct xfrm_policy *pol;
381 unsigned int h0 = 0; 415 unsigned int h0 = 0;
416 u8 dbits;
417 u8 sbits;
382 418
383redo: 419redo:
384 hlist_for_each_entry_safe(pol, tmp, list, bydst) { 420 hlist_for_each_entry_safe(pol, tmp, list, bydst) {
385 unsigned int h; 421 unsigned int h;
386 422
423 __get_hash_thresh(net, pol->family, dir, &dbits, &sbits);
387 h = __addr_hash(&pol->selector.daddr, &pol->selector.saddr, 424 h = __addr_hash(&pol->selector.daddr, &pol->selector.saddr,
388 pol->family, nhashmask); 425 pol->family, nhashmask, dbits, sbits);
389 if (!entry0) { 426 if (!entry0) {
390 hlist_del(&pol->bydst); 427 hlist_del(&pol->bydst);
391 hlist_add_head(&pol->bydst, ndsttable+h); 428 hlist_add_head(&pol->bydst, ndsttable+h);
@@ -439,7 +476,7 @@ static void xfrm_bydst_resize(struct net *net, int dir)
439 write_lock_bh(&net->xfrm.xfrm_policy_lock); 476 write_lock_bh(&net->xfrm.xfrm_policy_lock);
440 477
441 for (i = hmask; i >= 0; i--) 478 for (i = hmask; i >= 0; i--)
442 xfrm_dst_hash_transfer(odst + i, ndst, nhashmask); 479 xfrm_dst_hash_transfer(net, odst + i, ndst, nhashmask, dir);
443 480
444 net->xfrm.policy_bydst[dir].table = ndst; 481 net->xfrm.policy_bydst[dir].table = ndst;
445 net->xfrm.policy_bydst[dir].hmask = nhashmask; 482 net->xfrm.policy_bydst[dir].hmask = nhashmask;
@@ -534,6 +571,86 @@ static void xfrm_hash_resize(struct work_struct *work)
534 mutex_unlock(&hash_resize_mutex); 571 mutex_unlock(&hash_resize_mutex);
535} 572}
536 573
574static void xfrm_hash_rebuild(struct work_struct *work)
575{
576 struct net *net = container_of(work, struct net,
577 xfrm.policy_hthresh.work);
578 unsigned int hmask;
579 struct xfrm_policy *pol;
580 struct xfrm_policy *policy;
581 struct hlist_head *chain;
582 struct hlist_head *odst;
583 struct hlist_node *newpos;
584 int i;
585 int dir;
586 unsigned seq;
587 u8 lbits4, rbits4, lbits6, rbits6;
588
589 mutex_lock(&hash_resize_mutex);
590
591 /* read selector prefixlen thresholds */
592 do {
593 seq = read_seqbegin(&net->xfrm.policy_hthresh.lock);
594
595 lbits4 = net->xfrm.policy_hthresh.lbits4;
596 rbits4 = net->xfrm.policy_hthresh.rbits4;
597 lbits6 = net->xfrm.policy_hthresh.lbits6;
598 rbits6 = net->xfrm.policy_hthresh.rbits6;
599 } while (read_seqretry(&net->xfrm.policy_hthresh.lock, seq));
600
601 write_lock_bh(&net->xfrm.xfrm_policy_lock);
602
603 /* reset the bydst and inexact table in all directions */
604 for (dir = 0; dir < XFRM_POLICY_MAX * 2; dir++) {
605 INIT_HLIST_HEAD(&net->xfrm.policy_inexact[dir]);
606 hmask = net->xfrm.policy_bydst[dir].hmask;
607 odst = net->xfrm.policy_bydst[dir].table;
608 for (i = hmask; i >= 0; i--)
609 INIT_HLIST_HEAD(odst + i);
610 if ((dir & XFRM_POLICY_MASK) == XFRM_POLICY_OUT) {
611 /* dir out => dst = remote, src = local */
612 net->xfrm.policy_bydst[dir].dbits4 = rbits4;
613 net->xfrm.policy_bydst[dir].sbits4 = lbits4;
614 net->xfrm.policy_bydst[dir].dbits6 = rbits6;
615 net->xfrm.policy_bydst[dir].sbits6 = lbits6;
616 } else {
617 /* dir in/fwd => dst = local, src = remote */
618 net->xfrm.policy_bydst[dir].dbits4 = lbits4;
619 net->xfrm.policy_bydst[dir].sbits4 = rbits4;
620 net->xfrm.policy_bydst[dir].dbits6 = lbits6;
621 net->xfrm.policy_bydst[dir].sbits6 = rbits6;
622 }
623 }
624
625 /* re-insert all policies by order of creation */
626 list_for_each_entry_reverse(policy, &net->xfrm.policy_all, walk.all) {
627 newpos = NULL;
628 chain = policy_hash_bysel(net, &policy->selector,
629 policy->family,
630 xfrm_policy_id2dir(policy->index));
631 hlist_for_each_entry(pol, chain, bydst) {
632 if (policy->priority >= pol->priority)
633 newpos = &pol->bydst;
634 else
635 break;
636 }
637 if (newpos)
638 hlist_add_behind(&policy->bydst, newpos);
639 else
640 hlist_add_head(&policy->bydst, chain);
641 }
642
643 write_unlock_bh(&net->xfrm.xfrm_policy_lock);
644
645 mutex_unlock(&hash_resize_mutex);
646}
647
648void xfrm_policy_hash_rebuild(struct net *net)
649{
650 schedule_work(&net->xfrm.policy_hthresh.work);
651}
652EXPORT_SYMBOL(xfrm_policy_hash_rebuild);
653
537/* Generate new index... KAME seems to generate them ordered by cost 654/* Generate new index... KAME seems to generate them ordered by cost
538 * of an absolute inpredictability of ordering of rules. This will not pass. */ 655 * of an absolute inpredictability of ordering of rules. This will not pass. */
539static u32 xfrm_gen_index(struct net *net, int dir, u32 index) 656static u32 xfrm_gen_index(struct net *net, int dir, u32 index)
@@ -1844,10 +1961,8 @@ static int xdst_queue_output(struct sock *sk, struct sk_buff *skb)
1844 struct xfrm_dst *xdst = (struct xfrm_dst *) dst; 1961 struct xfrm_dst *xdst = (struct xfrm_dst *) dst;
1845 struct xfrm_policy *pol = xdst->pols[0]; 1962 struct xfrm_policy *pol = xdst->pols[0];
1846 struct xfrm_policy_queue *pq = &pol->polq; 1963 struct xfrm_policy_queue *pq = &pol->polq;
1847 const struct sk_buff *fclone = skb + 1;
1848 1964
1849 if (unlikely(skb->fclone == SKB_FCLONE_ORIG && 1965 if (unlikely(skb_fclone_busy(skb))) {
1850 fclone->fclone == SKB_FCLONE_CLONE)) {
1851 kfree_skb(skb); 1966 kfree_skb(skb);
1852 return 0; 1967 return 0;
1853 } 1968 }
@@ -2862,10 +2977,21 @@ static int __net_init xfrm_policy_init(struct net *net)
2862 if (!htab->table) 2977 if (!htab->table)
2863 goto out_bydst; 2978 goto out_bydst;
2864 htab->hmask = hmask; 2979 htab->hmask = hmask;
2980 htab->dbits4 = 32;
2981 htab->sbits4 = 32;
2982 htab->dbits6 = 128;
2983 htab->sbits6 = 128;
2865 } 2984 }
2985 net->xfrm.policy_hthresh.lbits4 = 32;
2986 net->xfrm.policy_hthresh.rbits4 = 32;
2987 net->xfrm.policy_hthresh.lbits6 = 128;
2988 net->xfrm.policy_hthresh.rbits6 = 128;
2989
2990 seqlock_init(&net->xfrm.policy_hthresh.lock);
2866 2991
2867 INIT_LIST_HEAD(&net->xfrm.policy_all); 2992 INIT_LIST_HEAD(&net->xfrm.policy_all);
2868 INIT_WORK(&net->xfrm.policy_hash_work, xfrm_hash_resize); 2993 INIT_WORK(&net->xfrm.policy_hash_work, xfrm_hash_resize);
2994 INIT_WORK(&net->xfrm.policy_hthresh.work, xfrm_hash_rebuild);
2869 if (net_eq(net, &init_net)) 2995 if (net_eq(net, &init_net))
2870 register_netdevice_notifier(&xfrm_dev_notifier); 2996 register_netdevice_notifier(&xfrm_dev_notifier);
2871 return 0; 2997 return 0;
diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c
index 0ab54134bb40..de971b6d38c5 100644
--- a/net/xfrm/xfrm_state.c
+++ b/net/xfrm/xfrm_state.c
@@ -97,8 +97,6 @@ static unsigned long xfrm_hash_new_size(unsigned int state_hmask)
97 return ((state_hmask + 1) << 1) * sizeof(struct hlist_head); 97 return ((state_hmask + 1) << 1) * sizeof(struct hlist_head);
98} 98}
99 99
100static DEFINE_MUTEX(hash_resize_mutex);
101
102static void xfrm_hash_resize(struct work_struct *work) 100static void xfrm_hash_resize(struct work_struct *work)
103{ 101{
104 struct net *net = container_of(work, struct net, xfrm.state_hash_work); 102 struct net *net = container_of(work, struct net, xfrm.state_hash_work);
@@ -107,22 +105,20 @@ static void xfrm_hash_resize(struct work_struct *work)
107 unsigned int nhashmask, ohashmask; 105 unsigned int nhashmask, ohashmask;
108 int i; 106 int i;
109 107
110 mutex_lock(&hash_resize_mutex);
111
112 nsize = xfrm_hash_new_size(net->xfrm.state_hmask); 108 nsize = xfrm_hash_new_size(net->xfrm.state_hmask);
113 ndst = xfrm_hash_alloc(nsize); 109 ndst = xfrm_hash_alloc(nsize);
114 if (!ndst) 110 if (!ndst)
115 goto out_unlock; 111 return;
116 nsrc = xfrm_hash_alloc(nsize); 112 nsrc = xfrm_hash_alloc(nsize);
117 if (!nsrc) { 113 if (!nsrc) {
118 xfrm_hash_free(ndst, nsize); 114 xfrm_hash_free(ndst, nsize);
119 goto out_unlock; 115 return;
120 } 116 }
121 nspi = xfrm_hash_alloc(nsize); 117 nspi = xfrm_hash_alloc(nsize);
122 if (!nspi) { 118 if (!nspi) {
123 xfrm_hash_free(ndst, nsize); 119 xfrm_hash_free(ndst, nsize);
124 xfrm_hash_free(nsrc, nsize); 120 xfrm_hash_free(nsrc, nsize);
125 goto out_unlock; 121 return;
126 } 122 }
127 123
128 spin_lock_bh(&net->xfrm.xfrm_state_lock); 124 spin_lock_bh(&net->xfrm.xfrm_state_lock);
@@ -148,9 +144,6 @@ static void xfrm_hash_resize(struct work_struct *work)
148 xfrm_hash_free(odst, osize); 144 xfrm_hash_free(odst, osize);
149 xfrm_hash_free(osrc, osize); 145 xfrm_hash_free(osrc, osize);
150 xfrm_hash_free(ospi, osize); 146 xfrm_hash_free(ospi, osize);
151
152out_unlock:
153 mutex_unlock(&hash_resize_mutex);
154} 147}
155 148
156static DEFINE_SPINLOCK(xfrm_state_afinfo_lock); 149static DEFINE_SPINLOCK(xfrm_state_afinfo_lock);
diff --git a/net/xfrm/xfrm_user.c b/net/xfrm/xfrm_user.c
index d4db6ebb089d..e812e988c111 100644
--- a/net/xfrm/xfrm_user.c
+++ b/net/xfrm/xfrm_user.c
@@ -333,8 +333,7 @@ static int attach_auth_trunc(struct xfrm_algo_auth **algpp, u8 *props,
333 algo = xfrm_aalg_get_byname(ualg->alg_name, 1); 333 algo = xfrm_aalg_get_byname(ualg->alg_name, 1);
334 if (!algo) 334 if (!algo)
335 return -ENOSYS; 335 return -ENOSYS;
336 if ((ualg->alg_trunc_len / 8) > MAX_AH_AUTH_LEN || 336 if (ualg->alg_trunc_len > algo->uinfo.auth.icv_fullbits)
337 ualg->alg_trunc_len > algo->uinfo.auth.icv_fullbits)
338 return -EINVAL; 337 return -EINVAL;
339 *props = algo->desc.sadb_alg_id; 338 *props = algo->desc.sadb_alg_id;
340 339
@@ -964,7 +963,9 @@ static inline size_t xfrm_spdinfo_msgsize(void)
964{ 963{
965 return NLMSG_ALIGN(4) 964 return NLMSG_ALIGN(4)
966 + nla_total_size(sizeof(struct xfrmu_spdinfo)) 965 + nla_total_size(sizeof(struct xfrmu_spdinfo))
967 + nla_total_size(sizeof(struct xfrmu_spdhinfo)); 966 + nla_total_size(sizeof(struct xfrmu_spdhinfo))
967 + nla_total_size(sizeof(struct xfrmu_spdhthresh))
968 + nla_total_size(sizeof(struct xfrmu_spdhthresh));
968} 969}
969 970
970static int build_spdinfo(struct sk_buff *skb, struct net *net, 971static int build_spdinfo(struct sk_buff *skb, struct net *net,
@@ -973,9 +974,11 @@ static int build_spdinfo(struct sk_buff *skb, struct net *net,
973 struct xfrmk_spdinfo si; 974 struct xfrmk_spdinfo si;
974 struct xfrmu_spdinfo spc; 975 struct xfrmu_spdinfo spc;
975 struct xfrmu_spdhinfo sph; 976 struct xfrmu_spdhinfo sph;
977 struct xfrmu_spdhthresh spt4, spt6;
976 struct nlmsghdr *nlh; 978 struct nlmsghdr *nlh;
977 int err; 979 int err;
978 u32 *f; 980 u32 *f;
981 unsigned lseq;
979 982
980 nlh = nlmsg_put(skb, portid, seq, XFRM_MSG_NEWSPDINFO, sizeof(u32), 0); 983 nlh = nlmsg_put(skb, portid, seq, XFRM_MSG_NEWSPDINFO, sizeof(u32), 0);
981 if (nlh == NULL) /* shouldn't really happen ... */ 984 if (nlh == NULL) /* shouldn't really happen ... */
@@ -993,9 +996,22 @@ static int build_spdinfo(struct sk_buff *skb, struct net *net,
993 sph.spdhcnt = si.spdhcnt; 996 sph.spdhcnt = si.spdhcnt;
994 sph.spdhmcnt = si.spdhmcnt; 997 sph.spdhmcnt = si.spdhmcnt;
995 998
999 do {
1000 lseq = read_seqbegin(&net->xfrm.policy_hthresh.lock);
1001
1002 spt4.lbits = net->xfrm.policy_hthresh.lbits4;
1003 spt4.rbits = net->xfrm.policy_hthresh.rbits4;
1004 spt6.lbits = net->xfrm.policy_hthresh.lbits6;
1005 spt6.rbits = net->xfrm.policy_hthresh.rbits6;
1006 } while (read_seqretry(&net->xfrm.policy_hthresh.lock, lseq));
1007
996 err = nla_put(skb, XFRMA_SPD_INFO, sizeof(spc), &spc); 1008 err = nla_put(skb, XFRMA_SPD_INFO, sizeof(spc), &spc);
997 if (!err) 1009 if (!err)
998 err = nla_put(skb, XFRMA_SPD_HINFO, sizeof(sph), &sph); 1010 err = nla_put(skb, XFRMA_SPD_HINFO, sizeof(sph), &sph);
1011 if (!err)
1012 err = nla_put(skb, XFRMA_SPD_IPV4_HTHRESH, sizeof(spt4), &spt4);
1013 if (!err)
1014 err = nla_put(skb, XFRMA_SPD_IPV6_HTHRESH, sizeof(spt6), &spt6);
999 if (err) { 1015 if (err) {
1000 nlmsg_cancel(skb, nlh); 1016 nlmsg_cancel(skb, nlh);
1001 return err; 1017 return err;
@@ -1004,6 +1020,51 @@ static int build_spdinfo(struct sk_buff *skb, struct net *net,
1004 return nlmsg_end(skb, nlh); 1020 return nlmsg_end(skb, nlh);
1005} 1021}
1006 1022
1023static int xfrm_set_spdinfo(struct sk_buff *skb, struct nlmsghdr *nlh,
1024 struct nlattr **attrs)
1025{
1026 struct net *net = sock_net(skb->sk);
1027 struct xfrmu_spdhthresh *thresh4 = NULL;
1028 struct xfrmu_spdhthresh *thresh6 = NULL;
1029
1030 /* selector prefixlen thresholds to hash policies */
1031 if (attrs[XFRMA_SPD_IPV4_HTHRESH]) {
1032 struct nlattr *rta = attrs[XFRMA_SPD_IPV4_HTHRESH];
1033
1034 if (nla_len(rta) < sizeof(*thresh4))
1035 return -EINVAL;
1036 thresh4 = nla_data(rta);
1037 if (thresh4->lbits > 32 || thresh4->rbits > 32)
1038 return -EINVAL;
1039 }
1040 if (attrs[XFRMA_SPD_IPV6_HTHRESH]) {
1041 struct nlattr *rta = attrs[XFRMA_SPD_IPV6_HTHRESH];
1042
1043 if (nla_len(rta) < sizeof(*thresh6))
1044 return -EINVAL;
1045 thresh6 = nla_data(rta);
1046 if (thresh6->lbits > 128 || thresh6->rbits > 128)
1047 return -EINVAL;
1048 }
1049
1050 if (thresh4 || thresh6) {
1051 write_seqlock(&net->xfrm.policy_hthresh.lock);
1052 if (thresh4) {
1053 net->xfrm.policy_hthresh.lbits4 = thresh4->lbits;
1054 net->xfrm.policy_hthresh.rbits4 = thresh4->rbits;
1055 }
1056 if (thresh6) {
1057 net->xfrm.policy_hthresh.lbits6 = thresh6->lbits;
1058 net->xfrm.policy_hthresh.rbits6 = thresh6->rbits;
1059 }
1060 write_sequnlock(&net->xfrm.policy_hthresh.lock);
1061
1062 xfrm_policy_hash_rebuild(net);
1063 }
1064
1065 return 0;
1066}
1067
1007static int xfrm_get_spdinfo(struct sk_buff *skb, struct nlmsghdr *nlh, 1068static int xfrm_get_spdinfo(struct sk_buff *skb, struct nlmsghdr *nlh,
1008 struct nlattr **attrs) 1069 struct nlattr **attrs)
1009{ 1070{
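Setting the thresholds from userspace is a plain XFRM_MSG_NEWSPDINFO request carrying the new attributes. A hedged libmnl sketch follows; XFRMA_SPD_IPV4_HTHRESH and struct xfrmu_spdhthresh come from the matching uapi change (not shown in this section), and newer iproute2 exposes the same knob via "ip xfrm policy set hthresh4 <lbits> <rbits>" (exact syntax may vary by version):

    #include <stdint.h>
    #include <libmnl/libmnl.h>
    #include <linux/netlink.h>
    #include <linux/xfrm.h>

    static int set_hthresh4(struct mnl_socket *nl, unsigned int seq,
                            uint8_t lbits, uint8_t rbits)
    {
        char buf[MNL_SOCKET_BUFFER_SIZE];
        struct nlmsghdr *nlh = mnl_nlmsg_put_header(buf);
        struct xfrmu_spdhthresh thresh = {
            .lbits = lbits,     /* local address prefix length */
            .rbits = rbits,     /* remote address prefix length */
        };
        uint32_t *flags;

        nlh->nlmsg_type  = XFRM_MSG_NEWSPDINFO;
        nlh->nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK;
        nlh->nlmsg_seq   = seq;

        /* xfrm_msg_min[] demands a u32 right after the header */
        flags = mnl_nlmsg_put_extra_header(nlh, sizeof(uint32_t));
        *flags = 0;

        mnl_attr_put(nlh, XFRMA_SPD_IPV4_HTHRESH, sizeof(thresh), &thresh);

        return mnl_socket_sendto(nl, nlh, nlh->nlmsg_len);
    }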
@@ -2274,6 +2335,7 @@ static const int xfrm_msg_min[XFRM_NR_MSGTYPES] = {
2274 [XFRM_MSG_REPORT - XFRM_MSG_BASE] = XMSGSIZE(xfrm_user_report), 2335 [XFRM_MSG_REPORT - XFRM_MSG_BASE] = XMSGSIZE(xfrm_user_report),
2275 [XFRM_MSG_MIGRATE - XFRM_MSG_BASE] = XMSGSIZE(xfrm_userpolicy_id), 2336 [XFRM_MSG_MIGRATE - XFRM_MSG_BASE] = XMSGSIZE(xfrm_userpolicy_id),
2276 [XFRM_MSG_GETSADINFO - XFRM_MSG_BASE] = sizeof(u32), 2337 [XFRM_MSG_GETSADINFO - XFRM_MSG_BASE] = sizeof(u32),
2338 [XFRM_MSG_NEWSPDINFO - XFRM_MSG_BASE] = sizeof(u32),
2277 [XFRM_MSG_GETSPDINFO - XFRM_MSG_BASE] = sizeof(u32), 2339 [XFRM_MSG_GETSPDINFO - XFRM_MSG_BASE] = sizeof(u32),
2278}; 2340};
2279 2341
@@ -2308,10 +2370,17 @@ static const struct nla_policy xfrma_policy[XFRMA_MAX+1] = {
2308 [XFRMA_ADDRESS_FILTER] = { .len = sizeof(struct xfrm_address_filter) }, 2370 [XFRMA_ADDRESS_FILTER] = { .len = sizeof(struct xfrm_address_filter) },
2309}; 2371};
2310 2372
2373static const struct nla_policy xfrma_spd_policy[XFRMA_SPD_MAX+1] = {
2374 [XFRMA_SPD_IPV4_HTHRESH] = { .len = sizeof(struct xfrmu_spdhthresh) },
2375 [XFRMA_SPD_IPV6_HTHRESH] = { .len = sizeof(struct xfrmu_spdhthresh) },
2376};
2377
2311static const struct xfrm_link { 2378static const struct xfrm_link {
2312 int (*doit)(struct sk_buff *, struct nlmsghdr *, struct nlattr **); 2379 int (*doit)(struct sk_buff *, struct nlmsghdr *, struct nlattr **);
2313 int (*dump)(struct sk_buff *, struct netlink_callback *); 2380 int (*dump)(struct sk_buff *, struct netlink_callback *);
2314 int (*done)(struct netlink_callback *); 2381 int (*done)(struct netlink_callback *);
2382 const struct nla_policy *nla_pol;
2383 int nla_max;
2315} xfrm_dispatch[XFRM_NR_MSGTYPES] = { 2384} xfrm_dispatch[XFRM_NR_MSGTYPES] = {
2316 [XFRM_MSG_NEWSA - XFRM_MSG_BASE] = { .doit = xfrm_add_sa }, 2385 [XFRM_MSG_NEWSA - XFRM_MSG_BASE] = { .doit = xfrm_add_sa },
2317 [XFRM_MSG_DELSA - XFRM_MSG_BASE] = { .doit = xfrm_del_sa }, 2386 [XFRM_MSG_DELSA - XFRM_MSG_BASE] = { .doit = xfrm_del_sa },
@@ -2335,6 +2404,9 @@ static const struct xfrm_link {
2335 [XFRM_MSG_GETAE - XFRM_MSG_BASE] = { .doit = xfrm_get_ae }, 2404 [XFRM_MSG_GETAE - XFRM_MSG_BASE] = { .doit = xfrm_get_ae },
2336 [XFRM_MSG_MIGRATE - XFRM_MSG_BASE] = { .doit = xfrm_do_migrate }, 2405 [XFRM_MSG_MIGRATE - XFRM_MSG_BASE] = { .doit = xfrm_do_migrate },
2337 [XFRM_MSG_GETSADINFO - XFRM_MSG_BASE] = { .doit = xfrm_get_sadinfo }, 2406 [XFRM_MSG_GETSADINFO - XFRM_MSG_BASE] = { .doit = xfrm_get_sadinfo },
2407 [XFRM_MSG_NEWSPDINFO - XFRM_MSG_BASE] = { .doit = xfrm_set_spdinfo,
2408 .nla_pol = xfrma_spd_policy,
2409 .nla_max = XFRMA_SPD_MAX },
2338 [XFRM_MSG_GETSPDINFO - XFRM_MSG_BASE] = { .doit = xfrm_get_spdinfo }, 2410 [XFRM_MSG_GETSPDINFO - XFRM_MSG_BASE] = { .doit = xfrm_get_spdinfo },
2339}; 2411};
2340 2412
@@ -2371,8 +2443,9 @@ static int xfrm_user_rcv_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
2371 } 2443 }
2372 } 2444 }
2373 2445
2374 err = nlmsg_parse(nlh, xfrm_msg_min[type], attrs, XFRMA_MAX, 2446 err = nlmsg_parse(nlh, xfrm_msg_min[type], attrs,
2375 xfrma_policy); 2447 link->nla_max ? : XFRMA_MAX,
2448 link->nla_pol ? : xfrma_policy);
2376 if (err < 0) 2449 if (err < 0)
2377 return err; 2450 return err;
2378 2451