aboutsummaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAge
* Altera TSE: Add main and header file for Altera Ethernet DriverVince Bridgers2014-03-17
| | | | | | | | This patch adds the main driver and header file for the Altera Triple Speed Ethernet driver. Signed-off-by: Vince Bridgers <vbridgers2013@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* Altera TSE: Add Miscellaneous Files for Altera Ethernet DriverVince Bridgers2014-03-17
| | | | | | | | This patch adds miscellaneous files for the Altera Ethernet Driver, including ethtool support. Signed-off-by: Vince Bridgers <vbridgers2013@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* Altera TSE: Add Altera Ethernet Driver SGDMA file componentsVince Bridgers2014-03-17
| | | | | | | | This patch adds the SGDMA soft IP support for the Altera Triple Speed Ethernet driver. Signed-off-by: Vince Bridgers <vbridgers2013@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* Altera TSE: Add Altera Ethernet Driver MSGDMA File ComponentsVince Bridgers2014-03-17
| | | | | | | | This patch adds the MSGDMA soft IP support for the Altera Triple Speed Ethernet driver. Signed-off-by: Vince Bridgers <vbridgers2013@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* Documentation: networking: Add Altera Ethernet (TSE) DocumentationVince Bridgers2014-03-17
| | | | | | | | | | | This patch adds a bindings description for the Altera Triple Speed Ethernet (TSE) driver. The bindings support the legacy SGDMA soft IP as well as the preferred MSGDMA soft IP. The TSE can be configured and synthesized in soft logic using Altera's Quartus toolchain. Please consult the bindings document for supported options. Signed-off-by: Vince Bridgers <vbridgers2013@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* dts: Add bindings for the Altera Triple Speed Ethernet driverVince Bridgers2014-03-17
| | | | | | | | | | | This patch adds a bindings description for the Altera Triple Speed Ethernet (TSE) driver. The bindings support the legacy SGDMA soft IP as well as the preferred MSGDMA soft IP. The TSE can be configured and synthesized in soft logic using Altera's Quartus toolchain. Please consult the bindings document for supported options. Signed-off-by: Vince Bridgers <vbridgers2013@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* netfilter: conntrack: Fix UP buildsEric Dumazet2014-03-17
| | | | | | | | | | | | ARRAY_SIZE(nf_conntrack_locks) is undefined if spinlock_t is an empty structure. Replace it by CONNTRACK_LOCKS Fixes: 93bb0ceb75be ("netfilter: conntrack: remove central spinlock nf_conntrack_lock") Reported-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Jesper Dangaard Brouer <brouer@redhat.com> Cc: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge branch 'at86rf230'David S. Miller2014-03-17
|\ | | | | | | | | | | | | | | | | | | | | | | | | Alexander Aring says: ==================== at86rf230: various fixes and devicetree support this patch series fix some bugs with the at86rf231 chip and cleaup some code. Also add devicetree support for the at86rf230 driver. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| * at86rf230: add support for devicetreeAlexander Aring2014-03-17
| | | | | | | | | | | | | | | | | | | | | | This patch adds devicetree support for the at86rf230 driver. Possible gpios to configure are "reset-gpio" and "sleep-gpio". Also add support to configure the "irq-type" for the irq polarity register. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * at86rf230: make reset pin optionallyAlexander Aring2014-03-17
| | | | | | | | | | | | | | | | | | This patch make the reset pin optionally. Some devices like the atben from qi-hardware don't have a reset pin externally. The usually way is to turn power off/on for the atben device to initiate a device reset. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * at86rf230: change reset timingsAlexander Aring2014-03-17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | While checkpatch another patch I got a: "WARNING: msleep < 20ms can sleep for up to 20ms" The datasheet of at86rf231 and at86rf212 says a minimum delay for reset pulse width and spi access latency after reset is 625 nanoseconds. This patch removes the 1 milliseconds sleep and replace it with a 1 microseconds udelay which should be also okay for the reset pulse width. To change the state from RESET -> TRX_OFF the at86rf230 device needs 120 microseconds, this is a worst case of all at86rf* chips. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * at86rf230: move locking state in xmitAlexander Aring2014-03-17
| | | | | | | | | | | | | | There is no need to lock the clearing of IRQ_TRX_END in status. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * at86rf230: fix unexpected state changeAlexander Aring2014-03-17
|/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch fix a unexpected state change for the at86rf231 chip. We can't change into STATE_FORCE_TX_ON while the chip is in one of SLEEP, P_ON, RESET, TRX_OFF, and all *_NOCLK states. In this case we are in the TRX_OFF state. See datasheet [1] page 71 for more information. Without this patch you will get the following message on a at86rf231 device: [ 20.065218] unexpected state change: 8, asked for 4 [ 20.070527] ------------[ cut here ]------------ [ 20.075414] WARNING: CPU: 0 PID: 160 at net/mac802154/ieee802154_dev.c:43 mac802154_slave_open+0x70/0xb8() [ 20.085594] Modules linked in: autofs4 [ 20.089667] CPU: 0 PID: 160 Comm: ifconfig Not tainted 3.14.0-20140108-1-00993-g905c192 #162 [ 20.098612] [<c00127b8>] (unwind_backtrace) from [<c0010b1c>] (show_stack+0x10/0x14) [ 20.106819] [<c0010b1c>] (show_stack) from [<c0033838>] (warn_slowpath_common+0x60/0x80) [ 20.115311] [<c0033838>] (warn_slowpath_common) from [<c00338e8>] (warn_slowpath_null+0x18/0x20) [ 20.124590] [<c00338e8>] (warn_slowpath_null) from [<c057b7e8>] (mac802154_slave_open+0x70/0xb8) [ 20.133880] [<c057b7e8>] (mac802154_slave_open) from [<c0488a58>] (__dev_open+0xa8/0x108) [ 20.142553] [<c0488a58>] (__dev_open) from [<c0488cb0>] (__dev_change_flags+0x8c/0x148) [ 20.151051] [<c0488cb0>] (__dev_change_flags) from [<c0488d84>] (dev_change_flags+0x18/0x48) [ 20.159968] [<c0488d84>] (dev_change_flags) from [<c04e2e9c>] (devinet_ioctl+0x2b0/0x63c) [ 20.168623] [<c04e2e9c>] (devinet_ioctl) from [<c04712e4>] (sock_ioctl+0x23c/0x29c) [ 20.176727] [<c04712e4>] (sock_ioctl) from [<c00e3cb8>] (do_vfs_ioctl+0x4a8/0x578) [ 20.184671] [<c00e3cb8>] (do_vfs_ioctl) from [<c00e3dd4>] (SyS_ioctl+0x4c/0x78) [ 20.192402] [<c00e3dd4>] (SyS_ioctl) from [<c000da00>] (ret_fast_syscall+0x0/0x48) [ 20.200392] ---[ end trace 9a34542f4ea08e47 ]--- This patch was tested on at86rf231 and at86rf212. [1] http://www.atmel.com/images/doc8111.pdf Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge branch 'sh_eth'David S. Miller2014-03-17
|\ | | | | | | | | | | | | | | | | | | | | | | | | Sergei Shtylyov says: ==================== Beautify 'sh_eth' driver's messages This patchset converts te driver to using netdev_*() and netif_*() to print out its messages whenever possible. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| * sh_eth: fold netif_msg_*() and netdev_*() calls into netif_*() invocationsSergei Shtylyov2014-03-17
| | | | | | | | | | | | | | | | | | Now that we call netdev_*() under netif_msg_*() checks, we can fold these into netif_*() macro invocations. Suggested-by: Joe Perches <joe@perches.com> Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * sh_eth: convert dev_*() to netdev_*() callsSergei Shtylyov2014-03-17
| | | | | | | | | | | | | | | | | | Convert dev_*(&ndev->dev, ...) to netdev_*(ndev, ...) calls since they are a bit shorter and at the same time give more information on a device. Suggested-by: Joe Perches <joe@perches.com> Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * sh_eth: convert pr_*() to netdev_*() callsSergei Shtylyov2014-03-17
|/ | | | | | | | Convert pr_*() to netdev_*() calls as the latter provide info on a device. Suggested-by: Joe Perches <joe@perches.com> Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* sh_eth: exit probe with unknown register layoutSergei Shtylyov2014-03-17
| | | | | | | | | | | | Exit the driver's probe() method when the register layout is unknown as the driver would cause kernel oops in this case anyway. While at it, move the corresponding error message printout and convert it from pr_err() to dev_err(). Suggested-by: Joe Perches <joe@perches.com> Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge branch 'netpoll-next'David S. Miller2014-03-17
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Eric W. Biederman says: ==================== netpoll: Cleanup received packet processing This is the long-winded, careful, and polite version of removing the netpoll receive packet processing. First I untangle the code in small steps. Then I modify the code to not force reception and dropping of packets when we are transmiting a packet with netpoll. Finally I move all of the packet reception under CONFIG_NETPOLL_TRAP and delete CONFIG_NETPOLL_TRAP. If someone wants to do a stable backport of these patches, it would require backporting the first 18 patches that handle the budget == 0 in the networking drivers, and the first 6 of these patches. If anyone wants to resurrect netpoll packet reception someday it should just be a matter of reverting the last patch. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| * netpoll: Remove dead packet receive code (CONFIG_NETPOLL_TRAP)Eric W. Biederman2014-03-17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The netpoll packet receive code only becomes active if the netpoll rx_skb_hook is implemented, and there is not a single implementation of the netpoll rx_skb_hook in the kernel. All of the out of tree implementations I have found all call netpoll_poll which was removed from the kernel in 2011, so this change should not add any additional breakage. There are problems with the netpoll packet receive code. __netpoll_rx does not call dev_kfree_skb_irq or dev_kfree_skb_any in hard irq context. netpoll_neigh_reply leaks every skb it receives. Reception of packets does not work successfully on stacked devices (aka bonding, team, bridge, and vlans). Given that the netpoll packet receive code is buggy, there are no out of tree users that will be merged soon, and the code has not been used for in tree for a decade let's just remove it. Reverting this commit can server as a starting point for anyone who wants to resurrect netpoll packet reception support. Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * netpoll: Move all receive processing under CONFIG_NETPOLL_TRAPEric W. Biederman2014-03-17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Make rx_skb_hook, and rx in struct netpoll depend on CONFIG_NETPOLL_TRAP Make rx_lock, rx_np, and neigh_tx in struct netpoll_info depend on CONFIG_NETPOLL_TRAP Make the functions netpoll_rx_on, netpoll_rx, and netpoll_receive_skb no-ops when CONFIG_NETPOLL_TRAP is not set. Only build netpoll_neigh_reply, checksum_udp service_neigh_queue, pkt_is_ns, and __netpoll_rx when CONFIG_NETPOLL_TRAP is defined. Add helper functions netpoll_trap_setup, netpoll_trap_setup_info, netpoll_trap_cleanup, and netpoll_trap_cleanup_info that initialize and cleanup the struct netpoll and struct netpoll_info receive specific fields when CONFIG_NETPOLL_TRAP is enabled and do nothing otherwise. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * netpoll: Consolidate neigh_tx processing in service_neigh_queueEric W. Biederman2014-03-17
| | | | | | | | | | | | | | | | | | | | | | Move the bond slave device neigh_tx handling into service_neigh_queue. In connection with neigh_tx processing remove unnecessary tests of a NULL netpoll_info. As the netpoll_poll_dev has already used and thus verified the existince of the netpoll_info. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * netpoll: Move netpoll_trap under CONFIG_NETPOLL_TRAPEric W. Biederman2014-03-17
| | | | | | | | | | | | | | | | | | | | | | | | Now that we no longer need to receive packets to safely drain the network drivers receive queue move netpoll_trap and netpoll_set_trap under CONFIG_NETPOLL_TRAP Making netpoll_trap and netpoll_set_trap noop inline functions when CONFIG_NETPOLL_TRAP is not set. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * netpoll: Don't drop all received packets.Eric W. Biederman2014-03-17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Change the strategy of netpoll from dropping all packets received during netpoll_poll_dev to calling napi poll with a budget of 0 (to avoid processing drivers rx queue), and to ignore packets received with netif_rx (those will safely be placed on the backlog queue). All of the netpoll supporting drivers have been reviewed to ensure either thay use netif_rx or that a budget of 0 is supported by their napi poll routine and that a budget of 0 will not process the drivers rx queues. Not dropping packets makes NETPOLL_RX_DROP unnecesary so it is removed. npinfo->rx_flags is removed as rx_flags with just the NETPOLL_RX_ENABLED flag becomes just a redundant mirror of list_empty(&npinfo->rx_np). Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * netpoll: Add netpoll_rx_processingEric W. Biederman2014-03-17
| | | | | | | | | | | | | | | | Add a helper netpoll_rx_processing that reports when netpoll has receive side processing to perform. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * netpoll: Warn if more packets are processed than are budgetedEric W. Biederman2014-03-17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | There is already a warning for this case in the normal netpoll path, but put a copy here in case how netpoll calls the poll functions causes a differenet result. netpoll will shortly call the napi poll routine with a budget 0 to avoid any rx packets being processed. As nothing does that today we may encounter drivers that have problems so a netpoll specific warning seems desirable. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * netpoll: Visit all napi handlers in poll_napiEric W. Biederman2014-03-17
| | | | | | | | | | | | | | | | | | In poll_napi loop through all of the napi handlers even when the budget falls to 0 to ensure that we process all of the tx_queues, and so that we continue to call into drivers when our initial budget is 0. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * netpoll: Pass budget into poll_napiEric W. Biederman2014-03-17
| | | | | | | | | | | | | | | | | | This moves the control logic to the top level in netpoll_poll_dev instead of having it dispersed throughout netpoll_poll_dev, poll_napi and poll_one_napi. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * netpoll: move setting of NETPOLL_RX_DROP into netpoll_poll_devEric W. Biederman2014-03-17
|/ | | | | | | | | | | | Today netpoll depends on setting NETPOLL_RX_DROP before networking drivers receive packets in interrupt context so that the packets can be dropped. Move this setting into netpoll_poll_dev from poll_one_napi so that if ndo_poll_controller happens to receive packets we will drop the packets on the floor instead of letting the packets bounce through the networking stack and potentially cause problems. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge branch 'master' of ↵David S. Miller2014-03-17
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next Pablo Neira Ayuso says: ==================== Netfilter/IPVS updates for net-next The following patchset contains Netfilter/IPVS updates for net-next, most relevantly they are: * cleanup to remove double semicolon from stephen hemminger. * calm down sparse warning in xt_ipcomp, from Fan Du. * nf_ct_labels support for nf_tables, from Florian Westphal. * new macros to simplify rcu dereferences in the scope of nfnetlink and nf_tables, from Patrick McHardy. * Accept queue and drop (including reason for drop) to verdict parsing in nf_tables, also from Patrick. * Remove unused random seed initialization in nfnetlink_log, from Florian Westphal. * Allow to attach user-specific information to nf_tables rules, useful to attach user comments to rule, from me. * Return errors in ipset according to the manpage documentation, from Jozsef Kadlecsik. * Fix coccinelle warnings related to incorrect bool type usage for ipset, from Fengguang Wu. * Add hash:ip,mark set type to ipset, from Vytas Dauksa. * Fix message for each spotted by ipset for each netns that is created, from Ilia Mirkin. * Add forceadd option to ipset, which evicts a random entry from the set if it becomes full, from Josh Hunt. * Minor IPVS cleanups and fixes from Andi Kleen and Tingwei Liu. * Improve conntrack scalability by removing a central spinlock, original work from Eric Dumazet. Jesper Dangaard Brouer took them over to address remaining issues. Several patches to prepare this change come in first place. * Rework nft_hash to resolve bugs (leaking chain, missing rcu synchronization on element removal, etc. from Patrick McHardy. * Restore context in the rule deletion path, as we now release rule objects synchronously, from Patrick McHardy. This gets back event notification for anonymous sets. * Fix NAT family validation in nft_nat, also from Patrick. * Improve scalability of xt_connlimit by using an array of spinlocks and by introducing a rb-tree of hashtables for faster lookup of accounted objects per network. This patch was preceded by several patches and refactorizations to accomodate this change including the use of kmem_cache, from Florian Westphal. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| * netfilter: connlimit: use rbtree for per-host conntrack obj storageFlorian Westphal2014-03-17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | With current match design every invocation of the connlimit_match function means we have to perform (number_of_conntracks % 256) lookups in the conntrack table [ to perform GC/delete stale entries ]. This is also the reason why ____nf_conntrack_find() in perf top has > 20% cpu time per core. This patch changes the storage to rbtree which cuts down the number of ct objects that need testing. When looking up a new tuple, we only test the connections of the host objects we visit while searching for the wanted host/network (or the leaf we need to insert at). The slot count is reduced to 32. Increasing slot count doesn't speed up things much because of rbtree nature. before patch (50kpps rx, 10kpps tx): + 20.95% ksoftirqd/0 [nf_conntrack] [k] ____nf_conntrack_find + 20.50% ksoftirqd/1 [nf_conntrack] [k] ____nf_conntrack_find + 20.27% ksoftirqd/2 [nf_conntrack] [k] ____nf_conntrack_find + 5.76% ksoftirqd/1 [nf_conntrack] [k] hash_conntrack_raw + 5.39% ksoftirqd/2 [nf_conntrack] [k] hash_conntrack_raw + 5.35% ksoftirqd/0 [nf_conntrack] [k] hash_conntrack_raw after (90kpps, 51kpps tx): + 17.24% swapper [nf_conntrack] [k] ____nf_conntrack_find + 6.60% ksoftirqd/2 [nf_conntrack] [k] ____nf_conntrack_find + 2.73% swapper [nf_conntrack] [k] hash_conntrack_raw + 2.36% swapper [xt_connlimit] [k] count_tree Obvious disadvantages to previous version are the increase in code complexity and the increased memory cost. Partially based on Eric Dumazets fq scheduler. Reviewed-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * netfilter: connlimit: make same_source_net signedFlorian Westphal2014-03-17
| | | | | | | | | | | | | | | | | | currently returns 1 if they're the same. Make it work like mem/strcmp so it can be used as rbtree search function. Reviewed-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * netfilter: connlimit: use keyed locksFlorian Westphal2014-03-17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | connlimit currently suffers from spinlock contention, example for 4-core system with rps enabled: + 20.84% ksoftirqd/2 [kernel.kallsyms] [k] _raw_spin_lock_bh + 20.76% ksoftirqd/1 [kernel.kallsyms] [k] _raw_spin_lock_bh + 20.42% ksoftirqd/0 [kernel.kallsyms] [k] _raw_spin_lock_bh + 6.07% ksoftirqd/2 [nf_conntrack] [k] ____nf_conntrack_find + 6.07% ksoftirqd/1 [nf_conntrack] [k] ____nf_conntrack_find + 5.97% ksoftirqd/0 [nf_conntrack] [k] ____nf_conntrack_find + 2.47% ksoftirqd/2 [nf_conntrack] [k] hash_conntrack_raw + 2.45% ksoftirqd/0 [nf_conntrack] [k] hash_conntrack_raw + 2.44% ksoftirqd/1 [nf_conntrack] [k] hash_conntrack_raw May allow parallel lookup/insert/delete if the entry is hashed to another slot. With patch: + 20.95% ksoftirqd/0 [nf_conntrack] [k] ____nf_conntrack_find + 20.50% ksoftirqd/1 [nf_conntrack] [k] ____nf_conntrack_find + 20.27% ksoftirqd/2 [nf_conntrack] [k] ____nf_conntrack_find + 5.76% ksoftirqd/1 [nf_conntrack] [k] hash_conntrack_raw + 5.39% ksoftirqd/2 [nf_conntrack] [k] hash_conntrack_raw + 5.35% ksoftirqd/0 [nf_conntrack] [k] hash_conntrack_raw + 2.00% ksoftirqd/1 [kernel.kallsyms] [k] __rcu_read_unlock Improved rx processing rate from ~35kpps to ~50 kpps. Reviewed-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * netfilter: Convert uses of __constant_<foo> to <foo>Joe Perches2014-03-13
| | | | | | | | | | | | | | | | | | | | The use of __constant_<foo> has been unnecessary for quite awhile now. Make these uses consistent with the rest of the kernel. Signed-off-by: Joe Perches <joe@perches.com> Acked-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * netfilter: connlimit: use kmem_cache for conn objectsFlorian Westphal2014-03-12
| | | | | | | | | | | | | | | | | | | | We might allocate thousands of these (one object per connection). Use distinct kmem cache to permit simplte tracking on how many objects are currently used by the connlimit match via the sysfs. Reviewed-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * netfilter: connlimit: move insertion of new element out of count functionFlorian Westphal2014-03-12
| | | | | | | | | | | | | | | | Allows easier code-reuse in followup patches. Reviewed-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * netfilter: connlimit: improve packet-to-closed-connection logicFlorian Westphal2014-03-12
| | | | | | | | | | | | | | | | | | | | | | Instead of freeing the entry from our list and then adding it back again in the 'packet to closing connection' case just keep the matching entry around. Also drop the found_ct != NULL test as nf_ct_tuplehash_to_ctrack is just container_of(). Reviewed-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * netfilter: connlimit: factor hlist search into new functionFlorian Westphal2014-03-12
| | | | | | | | | | | | | | | | | | Simplifies followup patch that introduces separate locks for each of the hash slots. Reviewed-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * netfilter: nft_nat: fix family validationPatrick McHardy2014-03-08
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The family in the NAT expression is basically completely useless since we have it available during runtime anyway. Nevertheless it is used to decide the NAT family, so at least validate it properly. As we don't support cross-family NAT, it needs to match the family of the table the expression exists in. Unfortunately we can't remove it completely since we need to dump it for userspace (*sigh*), so at least reduce the memory waste. Additionally clean up the module init function by removing useless temporary variables. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * netfilter: nft_ct: remove family from struct nft_ctPatrick McHardy2014-03-08
| | | | | | | | | | | | | | | | Since we have the context available during destruction again, we can remove the family from the private structure. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * netfilter: nf_tables: restore notifications for anonymous set destructionPatrick McHardy2014-03-08
| | | | | | | | | | | | | | | | Since we have the context available again, we can restore notifications for destruction of anonymous sets. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * netfilter: nf_tables: restore context for expression destructorsPatrick McHardy2014-03-08
| | | | | | | | | | | | | | | | | | | | | | | | In order to fix set destruction notifications and get rid of unnecessary members in private data structures, pass the context to expressions' destructor functions again. In order to do so, replace various members in the nft_rule_trans structure by the full context. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * netfilter: nf_tables: clean up nf_tables_trans_add() argument orderPatrick McHardy2014-03-08
| | | | | | | | | | | | | | | | The context argument logically comes first, and this is what every other function dealing with contexts does. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * netfilter: nft_hash: bug fixes and resizingPatrick McHardy2014-03-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The hash set type is very broken and was never meant to be merged in this state. Missing RCU synchronization on element removal, leaking chain refcounts when used as a verdict map, races during lookups, a fixed table size are probably just some of the problems. Luckily it is currently never chosen by the kernel when the rbtree type is also available. Rewrite it to be usable. The new implementation supports automatic hash table resizing using RCU, based on Paul McKenney's and Josh Triplett's algorithm "Optimized Resizing For RCU-Protected Hash Tables" described in [1]. Resizing doesn't require a second list head in the elements, it works by chosing a hash function that remaps elements to a predictable set of buckets, only resizing by integral factors and - during expansion: linking new buckets to the old bucket that contains elements for any of the new buckets, thereby creating imprecise chains, then incrementally seperating the elements until the new buckets only contain elements that hash directly to them. - during shrinking: linking the hash chains of all old buckets that hash to the same new bucket to form a single chain. Expansion requires at most the number of elements in the longest hash chain grace periods, shrinking requires a single grace period. Due to the requirement of having hash chains/elements linked to multiple buckets during resizing, homemade single linked lists are used instead of the existing list helpers, that don't support this in a clean fashion. As a side effect, the amount of memory required per element is reduced by one pointer. Expansion is triggered when the load factors exceeds 75%, shrinking when the load factor goes below 30%. Both operations are allowed to fail and will be retried on the next insertion or removal if their respective conditions still hold. [1] http://dl.acm.org/citation.cfm?id=2002181.2002192 Reviewed-by: Josh Triplett <josh@joshtriplett.org> Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * netfilter: conntrack: remove central spinlock nf_conntrack_lockJesper Dangaard Brouer2014-03-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | nf_conntrack_lock is a monolithic lock and suffers from huge contention on current generation servers (8 or more core/threads). Perf locking congestion is clear on base kernel: - 72.56% ksoftirqd/6 [kernel.kallsyms] [k] _raw_spin_lock_bh - _raw_spin_lock_bh + 25.33% init_conntrack + 24.86% nf_ct_delete_from_lists + 24.62% __nf_conntrack_confirm + 24.38% destroy_conntrack + 0.70% tcp_packet + 2.21% ksoftirqd/6 [kernel.kallsyms] [k] fib_table_lookup + 1.15% ksoftirqd/6 [kernel.kallsyms] [k] __slab_free + 0.77% ksoftirqd/6 [kernel.kallsyms] [k] inet_getpeer + 0.70% ksoftirqd/6 [nf_conntrack] [k] nf_ct_delete + 0.55% ksoftirqd/6 [ip_tables] [k] ipt_do_table This patch change conntrack locking and provides a huge performance improvement. SYN-flood attack tested on a 24-core E5-2695v2(ES) with 10Gbit/s ixgbe (with tool trafgen): Base kernel: 810.405 new conntrack/sec After patch: 2.233.876 new conntrack/sec Notice other floods attack (SYN+ACK or ACK) can easily be deflected using: # iptables -A INPUT -m state --state INVALID -j DROP # sysctl -w net/netfilter/nf_conntrack_tcp_loose=0 Use an array of hashed spinlocks to protect insertions/deletions of conntracks into the hash table. 1024 spinlocks seem to give good results, at minimal cost (4KB memory). Due to lockdep max depth, 1024 becomes 8 if CONFIG_LOCKDEP=y The hash resize is a bit tricky, because we need to take all locks in the array. A seqcount_t is used to synchronize the hash table users with the resizing process. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Reviewed-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * netfilter: conntrack: seperate expect locking from nf_conntrack_lockJesper Dangaard Brouer2014-03-07
| | | | | | | | | | | | | | | | | | | | | | Netfilter expectations are protected with the same lock as conntrack entries (nf_conntrack_lock). This patch split out expectations locking to use it's own lock (nf_conntrack_expect_lock). Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Reviewed-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * netfilter: avoid race with exp->master ctJesper Dangaard Brouer2014-03-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Preparation for disconnecting the nf_conntrack_lock from the expectations code. Once the nf_conntrack_lock is lifted, a race condition is exposed. The expectations master conntrack exp->master, can race with delete operations, as the refcnt increment happens too late in init_conntrack(). Race is against other CPUs invoking ->destroy() (destroy_conntrack()), or nf_ct_delete() (via timeout or early_drop()). Avoid this race in nf_ct_find_expectation() by using atomic_inc_not_zero(), and checking if nf_ct_is_dying() (path via nf_ct_delete()). Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * netfilter: conntrack: spinlock per cpu to protect special lists.Jesper Dangaard Brouer2014-03-07
| | | | | | | | | | | | | | | | | | | | | | | | | | One spinlock per cpu to protect dying/unconfirmed/template special lists. (These lists are now per cpu, a bit like the untracked ct) Add a @cpu field to nf_conn, to make sure we hold the appropriate spinlock at removal time. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Reviewed-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * netfilter: trivial code cleanup and doc changesJesper Dangaard Brouer2014-03-07
| | | | | | | | | | | | | | | | | | | | | | | | Changes while reading through the netfilter code. Added hint about how conntrack nf_conn refcnt is accessed. And renamed repl_hash to reply_hash for readability Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Reviewed-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * Merge git://git.kernel.org/pub/scm/linux/kernel/git/horms/ipvs-nextPablo Neira Ayuso2014-03-07
| |\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Via Simon Horman: ==================== * Whitespace cleanup spotted by checkpatch.pl from Tingwei Liu. * Section conflict cleanup, basically removal of one wrong __read_mostly, from Andi Kleen. ==================== Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>