aboutsummaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAge
* Merge branch 'linus' into sched/core, to pick up fixesIngo Molnar2016-11-16
|\ | | | | | | Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * Merge tag 'trace-v4.9-rc5' of ↵Linus Torvalds2016-11-15
| |\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing fixes from Steven Rostedt: "Alexei discovered a race condition in modules failing to load that can cause a ftrace check to trigger and disable ftrace. This is because of the way modules are registered to ftrace. Their functions are loaded in the ftrace function tables but set to "disabled" since they are still in the process of being loaded by the module. After the module is finished, it calls back into the ftrace infrastructure to enable it. Looking deeper into the locations that access all the functions in the table, I found more locations that should ignore the disabled ones" * tag 'trace-v4.9-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: ftrace: Add more checks for FTRACE_FL_DISABLED in processing ip records ftrace: Ignore FTRACE_FL_DISABLED while walking dyn_ftrace records
| | * ftrace: Add more checks for FTRACE_FL_DISABLED in processing ip recordsSteven Rostedt (Red Hat)2016-11-14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When a module is first loaded and its function ip records are added to the ftrace list of functions to modify, they are set to DISABLED, as their text is still in a read only state. When the module is fully loaded, and can be updated, the flag is cleared, and if their's any functions that should be tracing them, it is updated at that moment. But there's several locations that do record accounting and should ignore records that are marked as disabled, or they can cause issues. Alexei already fixed one location, but others need to be addressed. Cc: stable@vger.kernel.org Fixes: b7ffffbb46f2 "ftrace: Add infrastructure for delayed enabling of module functions" Reported-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
| | * ftrace: Ignore FTRACE_FL_DISABLED while walking dyn_ftrace recordsAlexei Starovoitov2016-11-14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ftrace_shutdown() checks for sanity of ftrace records and if dyn_ftrace->flags is not zero, it will warn. It can happen that 'flags' are set to FTRACE_FL_DISABLED at this point, since some module was loaded, but before ftrace_module_enable() cleared the flags for this module. In other words the module.c is doing: ftrace_module_init(mod); // calls ftrace_update_code() that sets flags=FTRACE_FL_DISABLED ... // here ftrace_shutdown() is called that warns, since err = prepare_coming_module(mod); // didn't have a chance to clear FTRACE_FL_DISABLED Fix it by ignoring disabled records. It's similar to what __ftrace_hash_rec_update() is already doing. Link: http://lkml.kernel.org/r/1478560460-3818619-1-git-send-email-ast@fb.com Cc: stable@vger.kernel.org Fixes: b7ffffbb46f2 "ftrace: Add infrastructure for delayed enabling of module functions" Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
| * | Merge tag 'fbdev-fixes-4.9' of ↵Linus Torvalds2016-11-15
| |\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tomba/linux Pull fbdev fix from Tomi Valkeinen: "Fix CLCD regression on Vexpress" * tag 'fbdev-fixes-4.9' of git://git.kernel.org/pub/scm/linux/kernel/git/tomba/linux: video: ARM CLCD: fix Vexpress regression
| | * | video: ARM CLCD: fix Vexpress regressionLinus Walleij2016-11-03
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The CLCD does not come up on Versatile Express as it does not (currently) have a syscon node for controlling the block apart from the CLCD itself. Make sure the .init() function can bail out without an error making it probe again. Reported-by: Amit Pundir <amit.pundir@linaro.org> Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Tested-by: Amit Pundir <amit.pundir@linaro.org> Tested-by: Nicolae Rosia <nicolae_rosia@mentor.com> Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ti.com>
| * | | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds2016-11-14
| |\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull networking fixes from David Miller: 1) Fix off by one wrt. indexing when dumping /proc/net/route entries, from Alexander Duyck. 2) Fix lockdep splats in iwlwifi, from Johannes Berg. 3) Cure panic when inserting certain netfilter rules when NFT_SET_HASH is disabled, from Liping Zhang. 4) Memory leak when nft_expr_clone() fails, also from Liping Zhang. 5) Disable UFO when path will apply IPSEC tranformations, from Jakub Sitnicki. 6) Don't bogusly double cwnd in dctcp module, from Florian Westphal. 7) skb_checksum_help() should never actually use the value "0" for the resulting checksum, that has a special meaning, use CSUM_MANGLED_0 instead. From Eric Dumazet. 8) Per-tx/rx queue statistic strings are wrong in qed driver, fix from Yuval MIntz. 9) Fix SCTP reference counting of associations and transports in sctp_diag. From Xin Long. 10) When we hit ip6tunnel_xmit() we could have come from an ipv4 path in a previous layer or similar, so explicitly clear the ipv6 control block in the skb. From Eli Cooper. 11) Fix bogus sleeping inside of inet_wait_for_connect(), from WANG Cong. 12) Correct deivce ID of T6 adapter in cxgb4 driver, from Hariprasad Shenai. 13) Fix potential access past the end of the skb page frag array in tcp_sendmsg(). From Eric Dumazet. 14) 'skb' can legitimately be NULL in inet{,6}_exact_dif_match(). Fix from David Ahern. 15) Don't return an error in tcp_sendmsg() if we wronte any bytes successfully, from Eric Dumazet. 16) Extraneous unlocks in netlink_diag_dump(), we removed the locking but forgot to purge these unlock calls. From Eric Dumazet. 17) Fix memory leak in error path of __genl_register_family(). We leak the attrbuf, from WANG Cong. 18) cgroupstats netlink policy table is mis-sized, from WANG Cong. 19) Several XDP bug fixes in mlx5, from Saeed Mahameed. 20) Fix several device refcount leaks in network drivers, from Johan Hovold. 21) icmp6_send() should use skb dst device not skb->dev to determine L3 routing domain. From David Ahern. 22) ip_vs_genl_family sets maxattr incorrectly, from WANG Cong. 23) We leak new macvlan port in some cases of maclan_common_netlink() errors. Fix from Gao Feng. 24) Similar to the icmp6_send() fix, icmp_route_lookup() should determine L3 routing domain using skb_dst(skb)->dev not skb->dev. Also from David Ahern. 25) Several fixes for route offloading and FIB notification handling in mlxsw driver, from Jiri Pirko. 26) Properly cap __skb_flow_dissect()'s return value, from Eric Dumazet. 27) Fix long standing regression in ipv4 redirect handling, wrt. validating the new neighbour's reachability. From Stephen Suryaputra Lin. 28) If sk_filter() trims the packet excessively, handle it reasonably in tcp input instead of exploding. From Eric Dumazet. 29) Fix handling of napi hash state when copying channels in sfc driver, from Bert Kenward. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (121 commits) mlxsw: spectrum_router: Flush FIB tables during fini net: stmmac: Fix lack of link transition for fixed PHYs sctp: change sk state only when it has assocs in sctp_shutdown bnx2: Wait for in-flight DMA to complete at probe stage Revert "bnx2: Reset device during driver initialization" ps3_gelic: fix spelling mistake in debug message net: ethernet: ixp4xx_eth: fix spelling mistake in debug message ibmvnic: Fix size of debugfs name buffer ibmvnic: Unmap ibmvnic_statistics structure sfc: clear napi_hash state when copying channels mlxsw: spectrum_router: Correctly dump neighbour activity mlxsw: spectrum: Fix refcount bug on span entries bnxt_en: Fix VF virtual link state. bnxt_en: Fix ring arithmetic in bnxt_setup_tc(). Revert "include/uapi/linux/atm_zatm.h: include linux/time.h" tcp: take care of truncations done by sk_filter() ipv4: use new_gw for redirect neigh lookup r8152: Fix error path in open function net: bpqether.h: remove if_ether.h guard net: __skb_flow_dissect() must cap its return value ...
| | * | | mlxsw: spectrum_router: Flush FIB tables during finiIdo Schimmel2016-11-14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since commit b45f64d16d45 ("mlxsw: spectrum_router: Use FIB notifications instead of switchdev calls") we reflect to the device the entire FIB table and not only FIBs that point to netdevs created by the driver. During module removal, FIBs of the second type are removed following NETDEV_UNREGISTER events sent. The other FIBs are still present in both the driver's cache and the device's table. Fix this by iterating over all the FIB tables in the device and flush them. There's no need to take locks, as we're the only writer. Fixes: b45f64d16d45 ("mlxsw: spectrum_router: Use FIB notifications instead of switchdev calls") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| | * | | net: stmmac: Fix lack of link transition for fixed PHYsFlorian Fainelli2016-11-14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 52f95bbfcf72 ("stmmac: fix adjust link call in case of a switch is attached") added some logic to avoid polling the fixed PHY and therefore invoking the adjust_link callback more than once, since this is a fixed PHY and link events won't be generated. This works fine the first time, because we start with phydev->irq = PHY_POLL, so we call adjust_link, then we set phydev->irq = PHY_IGNORE_INTERRUPT and we stop polling the PHY. Now, if we called ndo_close(), which calls both phy_stop() and does an explicit netif_carrier_off(), we end up with a link down. Upon calling ndo_open() again, despite starting the PHY state machine, we have PHY_IGNORE_INTERRUPT set, and we generate no link event at all, so the link is permanently down. Fixes: 52f95bbfcf72 ("stmmac: fix adjust link call in case of a switch is attached") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Acked-by: Giuseppe Cavallaro <peppe.cavallaro@st.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| | * | | sctp: change sk state only when it has assocs in sctp_shutdownXin Long2016-11-14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Now when users shutdown a sock with SEND_SHUTDOWN in sctp, even if this sock has no connection (assoc), sk state would be changed to SCTP_SS_CLOSING, which is not as we expect. Besides, after that if users try to listen on this sock, kernel could even panic when it dereference sctp_sk(sk)->bind_hash in sctp_inet_listen, as bind_hash is null when sock has no assoc. This patch is to move sk state change after checking sk assocs is not empty, and also merge these two if() conditions and reduce indent level. Fixes: d46e416c11c8 ("sctp: sctp should change socket state when shutdown is received") Reported-by: Andrey Konovalov <andreyknvl@google.com> Tested-by: Andrey Konovalov <andreyknvl@google.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| | * | | Merge branch 'bnx2-kdump-fix'David S. Miller2016-11-14
| | |\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Baoquan He says: ==================== bnx2: Wait for in-flight DMA to complete at probe stage This is v2 post. In commit 3e1be7a ("bnx2: Reset device during driver initialization"), firmware requesting code was moved from open stage to probe stage. The reason is in kdump kernel hardware iommu need device be reset in driver probe stage, otherwise those in-flight DMA from 1st kernel will continue going and look up into the newly created io-page tables. However bnx2 chip resetting involves firmware requesting issue, that need be done in open stage. Michale Chan suggested we can just wait for the old in-flight DMA to complete at probe stage, then though without device resetting, we don't need to worry the old in-flight DMA could continue looking up the newly created io-page tables. v1->v2: Michael suggested to wait for the in-flight DMA to complete at probe stage. So give up the old method of trying to reset chip at probe stage, take the new way accordingly. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| | | * | | bnx2: Wait for in-flight DMA to complete at probe stageBaoquan He2016-11-14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In-flight DMA from 1st kernel could continue going in kdump kernel. New io-page table has been created before bnx2 does reset at open stage. We have to wait for the in-flight DMA to complete to avoid it look up into the newly created io-page table at probe stage. Suggested-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: Baoquan He <bhe@redhat.com> Acked-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| | | * | | Revert "bnx2: Reset device during driver initialization"Baoquan He2016-11-14
| | |/ / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This reverts commit 3e1be7ad2d38c6bd6aeef96df9bd0a7822f4e51c. When people build bnx2 driver into kernel, it will fail to detect and load firmware because firmware is contained in initramfs and initramfs has not been uncompressed yet during do_initcalls. So revert commit 3e1be7a and work out a new way in the later patch. Signed-off-by: Baoquan He <bhe@redhat.com> Acked-by: Paul Menzel <pmenzel@molgen.mpg.de> Signed-off-by: David S. Miller <davem@davemloft.net>
| | * | | ps3_gelic: fix spelling mistake in debug messageColin Ian King2016-11-14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Trivial fix to spelling mistake "unmached" to "unmatched" in debug message. Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| | * | | net: ethernet: ixp4xx_eth: fix spelling mistake in debug messageColin Ian King2016-11-13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Trivial fix to spelling mistake "successed" to "succeeded" in debug message. Also unwrap multi-line literal string. Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| | * | | ibmvnic: Fix size of debugfs name bufferThomas Falcon2016-11-13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This mistake was causing debugfs directory creation failures when multiple ibmvnic devices were probed. Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| | * | | ibmvnic: Unmap ibmvnic_statistics structureThomas Falcon2016-11-13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This structure was mapped but never subsequently unmapped. Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| | * | | sfc: clear napi_hash state when copying channelsBert Kenward2016-11-13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | efx_copy_channel() doesn't correctly clear the napi_hash related state. This means that when napi_hash_add is called for that channel nothing is done, and we are left with a copy of the napi_hash_node from the old channel. When we later call napi_hash_del() on this channel we have a stale napi_hash_node. Corruption is only seen when there are multiple entries in one of the napi_hash lists. This is made more likely by having a very large number of channels. Testing was carried out with 512 channels - 32 channels on each of 16 ports. This failure typically appears as protection faults within napi_by_id() or napi_hash_add(). efx_copy_channel() is only used when tx or rx ring sizes are changed (ethtool -G). Fixes: 36763266bbe8 ("sfc: Add support for busy polling") Signed-off-by: Bert Kenward <bkenward@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| | * | | Merge branch 'mlxsw-fixes'David S. Miller2016-11-13
| | |\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Jiri Pirko says: ==================== mlxsw: Couple of fixes Please, queue-up both for stable. Thanks! ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| | | * | | mlxsw: spectrum_router: Correctly dump neighbour activityArkadi Sharshevsky2016-11-13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The device's neighbour table is periodically dumped in order to update the kernel about active neighbours. A single dump session may span multiple queries, until the response carries less records than requested or when a record (can contain up to four neighbour entries) is not full. Current code stops the session when the number of returned records is zero, which can result in infinite loop in case of high packet rate. Fix this by stopping the session according to the above logic. Fixes: c723c735fa6b ("mlxsw: spectrum_router: Periodically update the kernel's neigh table") Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| | | * | | mlxsw: spectrum: Fix refcount bug on span entriesYotam Gigi2016-11-13
| | |/ / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When binding port to a newly created span entry, its refcount is initialized to zero even though it has a bound port. That leads to unexpected behaviour when the user tries to delete that port from the span entry. Fix this by initializing the reference count to 1. Also add a warning to put function. Fixes: 763b4b70afcd ("mlxsw: spectrum: Add support in matchall mirror TC offloading") Signed-off-by: Yotam Gigi <yotamg@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| | * | | Merge branch 'bnxt_en-fixes'David S. Miller2016-11-13
| | |\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Michael Chan says: ==================== bnxt_en: 2 bug fixes. Bug fixes in bnxt_setup_tc() and VF vitual link state. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| | | * | | bnxt_en: Fix VF virtual link state.Michael Chan2016-11-13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If the physical link is down and the VF virtual link is set to "enable", the current code does not always work. If the link is down but the cable is attached, the firmware returns LINK_SIGNAL instead of NO_LINK. The current code is treating LINK_SIGNAL as link up. The fix is to treat link as down when the link_status != LINK. Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| | | * | | bnxt_en: Fix ring arithmetic in bnxt_setup_tc().Michael Chan2016-11-13
| | |/ / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The logic is missing the check on whether the tx and rx rings are sharing completion rings or not. Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| | * | | Revert "include/uapi/linux/atm_zatm.h: include linux/time.h"Mike Frysinger2016-11-13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This reverts commit cf00713a655d ("include/uapi/linux/atm_zatm.h: include linux/time.h"). This attempted to fix userspace breakage that no longer existed when the patch was merged. Almost one year earlier, commit 70ba07b675b5 ("atm: remove 'struct zatm_t_hist'") deleted the struct in question. After this patch was merged, we now have to deal with people being unable to include this header in conjunction with standard C library headers like stdlib.h (which linux-atm does). Example breakage: x86_64-pc-linux-gnu-gcc -DHAVE_CONFIG_H -I. -I../.. -I./../q2931 -I./../saal \ -I. -DCPPFLAGS_TEST -I../../src/include -O2 -march=native -pipe -g \ -frecord-gcc-switches -freport-bug -Wimplicit-function-declaration \ -Wnonnull -Wstrict-aliasing -Wparentheses -Warray-bounds \ -Wfree-nonheap-object -Wreturn-local-addr -fno-strict-aliasing -Wall \ -Wshadow -Wpointer-arith -Wwrite-strings -Wstrict-prototypes -c zntune.c In file included from /usr/include/linux/atm_zatm.h:17:0, from zntune.c:17: /usr/include/linux/time.h:9:8: error: redefinition of ‘struct timespec’ struct timespec { ^ In file included from /usr/include/sys/select.h:43:0, from /usr/include/sys/types.h:219, from /usr/include/stdlib.h:314, from zntune.c:9: /usr/include/time.h:120:8: note: originally defined here struct timespec ^ Signed-off-by: Mike Frysinger <vapier@gentoo.org> Acked-by: Mikko Rapeli <mikko.rapeli@iki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
| | * | | tcp: take care of truncations done by sk_filter()Eric Dumazet2016-11-13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | With syzkaller help, Marco Grassi found a bug in TCP stack, crashing in tcp_collapse() Root cause is that sk_filter() can truncate the incoming skb, but TCP stack was not really expecting this to happen. It probably was expecting a simple DROP or ACCEPT behavior. We first need to make sure no part of TCP header could be removed. Then we need to adjust TCP_SKB_CB(skb)->end_seq Many thanks to syzkaller team and Marco for giving us a reproducer. Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Marco Grassi <marco.gra@gmail.com> Reported-by: Vladis Dronov <vdronov@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| | * | | ipv4: use new_gw for redirect neigh lookupStephen Suryaputra Lin2016-11-13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In v2.6, ip_rt_redirect() calls arp_bind_neighbour() which returns 0 and then the state of the neigh for the new_gw is checked. If the state isn't valid then the redirected route is deleted. This behavior is maintained up to v3.5.7 by check_peer_redirect() because rt->rt_gateway is assigned to peer->redirect_learned.a4 before calling ipv4_neigh_lookup(). After commit 5943634fc559 ("ipv4: Maintain redirect and PMTU info in struct rtable again."), ipv4_neigh_lookup() is performed without the rt_gateway assigned to the new_gw. In the case when rt_gateway (old_gw) isn't zero, the function uses it as the key. The neigh is most likely valid since the old_gw is the one that sends the ICMP redirect message. Then the new_gw is assigned to fib_nh_exception. The problem is: the new_gw ARP may never gets resolved and the traffic is blackholed. So, use the new_gw for neigh lookup. Changes from v1: - use __ipv4_neigh_lookup instead (per Eric Dumazet). Fixes: 5943634fc559 ("ipv4: Maintain redirect and PMTU info in struct rtable again.") Signed-off-by: Stephen Suryaputra Lin <ssurya@ieee.org> Signed-off-by: David S. Miller <davem@davemloft.net>
| | * | | r8152: Fix error path in open functionGuenter Roeck2016-11-13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If usb_submit_urb() called from the open function fails, the following crash may be observed. r8152 8-1:1.0 eth0: intr_urb submit failed: -19 ... r8152 8-1:1.0 eth0: v1.08.3 Unable to handle kernel paging request at virtual address 6b6b6b6b6b6b6b7b pgd = ffffffc0e7305000 [6b6b6b6b6b6b6b7b] *pgd=0000000000000000, *pud=0000000000000000 Internal error: Oops: 96000004 [#1] PREEMPT SMP ... PC is at notifier_chain_register+0x2c/0x58 LR is at blocking_notifier_chain_register+0x54/0x70 ... Call trace: [<ffffffc0002407f8>] notifier_chain_register+0x2c/0x58 [<ffffffc000240bdc>] blocking_notifier_chain_register+0x54/0x70 [<ffffffc00026991c>] register_pm_notifier+0x24/0x2c [<ffffffbffc183200>] rtl8152_open+0x3dc/0x3f8 [r8152] [<ffffffc000808000>] __dev_open+0xac/0x104 [<ffffffc0008082f8>] __dev_change_flags+0xb0/0x148 [<ffffffc0008083c4>] dev_change_flags+0x34/0x70 [<ffffffc000818344>] do_setlink+0x2c8/0x888 [<ffffffc0008199d4>] rtnl_newlink+0x328/0x644 [<ffffffc000819e98>] rtnetlink_rcv_msg+0x1a8/0x1d4 [<ffffffc0008373c8>] netlink_rcv_skb+0x68/0xd0 [<ffffffc000817990>] rtnetlink_rcv+0x2c/0x3c [<ffffffc000836d1c>] netlink_unicast+0x16c/0x234 [<ffffffc00083720c>] netlink_sendmsg+0x340/0x364 [<ffffffc0007e85d0>] sock_sendmsg+0x48/0x60 [<ffffffc0007e9c30>] SyS_sendto+0xe0/0x120 [<ffffffc0007e9cb0>] SyS_send+0x40/0x4c [<ffffffc000203e34>] el0_svc_naked+0x24/0x28 Clean up error handling to avoid registering the notifier if the open function is going to fail. Signed-off-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: David S. Miller <davem@davemloft.net>
| | * | | net: bpqether.h: remove if_ether.h guardBaruch Siach2016-11-13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | __LINUX_IF_ETHER_H is not defined anywhere, and if_ether.h can keep itself from double inclusion, though it uses a single underscore prefix. Signed-off-by: Baruch Siach <baruch@tkos.co.il> Signed-off-by: David S. Miller <davem@davemloft.net>
| | * | | net: __skb_flow_dissect() must cap its return valueEric Dumazet2016-11-12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | After Tom patch, thoff field could point past the end of the buffer, this could fool some callers. If an skb was provided, skb->len should be the upper limit. If not, hlen is supposed to be the upper limit. Fixes: a6e544b0a88b ("flow_dissector: Jump to exit code in __skb_flow_dissect") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Yibin Yang <yibyang@cisco.com Acked-by: Alexander Duyck <alexander.h.duyck@intel.com> Acked-by: Willem de Bruijn <willemb@google.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
| | * | | Merge branch 'fix-bpf_redirect'David S. Miller2016-11-12
| | |\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Martin KaFai Lau says: ==================== bpf: Fix bpf_redirect to an ipip/ip6tnl dev This patch set fixes a bug in bpf_redirect(dev, flags) when dev is an ipip/ip6tnl. The current problem is IP-EthHdr-IP is sent out instead of IP-IP. Patch 1 adds a dev->type test similar to dev_is_mac_header_xmit() in act_mirred.c which is only available in net-next. We can consider to refactor it once this patch is pulled into net-next from net. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| | | * | | bpf: Add test for bpf_redirect to ipip/ip6tnlMartin KaFai Lau2016-11-12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The test creates two netns, ns1 and ns2. The host (the default netns) has an ipip or ip6tnl dev configured for tunneling traffic to the ns2. ping VIPS from ns1 <----> host <--tunnel--> ns2 (VIPs at loopback) The test is to have ns1 pinging VIPs configured at the loopback interface in ns2. The VIPs are 10.10.1.102 and 2401:face::66 (which are configured at lo@ns2). [Note: 0x66 => 102]. At ns1, the VIPs are routed _via_ the host. At the host, bpf programs are installed at the veth to redirect packets from a veth to the ipip/ip6tnl. The test is configured in a way so that both ingress and egress can be tested. At ns2, the ipip/ip6tnl dev is configured with the local and remote address specified. The return path is routed to the dev ipip/ip6tnl. During egress test, the host also locally tests pinging the VIPs to ensure that bpf_redirect at egress also works for the direct egress (i.e. not forwarding from dev ve1 to ve2). Acked-by: Alexei Starovoitov <ast@fb.com> Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| | | * | | bpf: Fix bpf_redirect to an ipip/ip6tnl devMartin KaFai Lau2016-11-12
| | |/ / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If the bpf program calls bpf_redirect(dev, 0) and dev is an ipip/ip6tnl, it currently includes the mac header. e.g. If dev is ipip, the end result is IP-EthHdr-IP instead of IP-IP. The fix is to pull the mac header. At ingress, skb_postpull_rcsum() is not needed because the ethhdr should have been pulled once already and then got pushed back just before calling the bpf_prog. At egress, this patch calls skb_postpull_rcsum(). If bpf_redirect(dev, BPF_F_INGRESS) is called, it also fails now because it calls dev_forward_skb() which eventually calls eth_type_trans(skb, dev). The eth_type_trans() will set skb->type = PACKET_OTHERHOST because the mac address does not match the redirecting dev->dev_addr. The PACKET_OTHERHOST will eventually cause the ip_rcv() errors out. To fix this, ____dev_forward_skb() is added. Joint work with Daniel Borkmann. Fixes: cfc7381b3002 ("ip_tunnel: add collect_md mode to IPIP tunnel") Fixes: 8d79266bc48c ("ip6_tunnel: add collect_md mode to IPv6 tunnels") Acked-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@fb.com> Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| | * | | Merge branch 'mlxsw-fixes'David S. Miller2016-11-10
| | |\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Jiri Pirko says: ==================== mlxsw: Couple of router fixes v1->v2: - patch2: - use net_eq ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| | | * | | mlxsw: spectrum_router: Ignore FIB notification events for non-init namespacesJiri Pirko2016-11-10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since now, the table with same id in multiple netnamespaces were squashed to a single virtual router. That is not only incorrect, it also causes error messages when trying to use RALUE register to do double remove of FIB entries, like this one: mlxsw_spectrum 0000:03:00.0: EMAD reg access failed (tid=facb831c00007b20,reg_id=8013(ralue),type=write,status=7(bad parameter)) Since we don't allow ports to change namespaces (NETIF_F_NETNS_LOCAL), and the infrastructure is not yet prepared to handle netnamespaces, just ignore FIB notification events for non-init namespaces. That is clear to do since we don't need to offload them. Fixes: b45f64d16d45 ("mlxsw: spectrum_router: Use FIB notifications instead of switchdev calls") Signed-off-by: Jiri Pirko <jiri@mellanox.com> Acked-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| | | * | | mlxsw: spectrum_router: Fix handling of neighbour structureJiri Pirko2016-11-10
| | |/ / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | __neigh_create function works in a different way than assumed. It passes "n" as a parameter to ndo_neigh_construct. But this "n" might be destroyed right away before __neigh_create() returns in case there is already another neighbour struct in the hashtable with the same dev and primary key. That is not expected by mlxsw_sp_router_neigh_construct() and the stored "n" points to freed memory, eventually leading to crash. Fix this by doing tight 1:1 coupling between neighbour struct and internal driver neigh_entry. That allows to narrow down the key in internal driver hashtable to do lookups by "n" only. Fixes: 6cf3c971dc84 ("mlxsw: spectrum_router: Add private neigh table") Signed-off-by: Jiri Pirko <jiri@mellanox.com> Acked-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| | * | | Merge branch 'qed-fixes'David S. Miller2016-11-10
| | |\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Yuval Mintz says: ==================== qed: Fix RoCE infrastructure This series fixes 2 basic issues with RoCE support, one handles a missing configuration in the initial infrastructure support while the other is a regression introduced by one of the initial fix submissions. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| | | * | | qed: Correct rdma params configurationRam Amrani2016-11-10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Previous fix has broken RoCE support as the rdma_pf_params are now being set into the parameters only after the params are alrady assigned into the hw-function. Fixes: 0189efb8f4f8 ("qed*: Fix Kconfig dependencies with INFINIBAND_QEDR") Signed-off-by: Ram Amrani <Ram.Amrani@cavium.com> Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| | | * | | qed: configure ll2 RoCE v1/v2 flavor correctlyRam Amrani2016-11-10
| | |/ / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently RoCE v2 won't operate with RDMA CM due to missing setting of the roce-flavour in the ll2 configuration. This patch properly sets the flavour, and deletes incorrect HSI that doesn't [yet] exist. Fixes: abd49676c707 ("qed: Add RoCE ll2 & GSI support") Signed-off-by: Ram Amrani <Ram.Amrani@cavium.com> Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| | * | | ipv4: update comment to document GSO fragmentation cases.Lance Richardson2016-11-10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is a follow-up to commit 9ee6c5dc816a ("ipv4: allow local fragmentation in ip_finish_output_gso()"), updating the comment documenting cases in which fragmentation is needed for egress GSO packets. Suggested-by: Shmulik Ladkani <shmulik.ladkani@gmail.com> Reviewed-by: Shmulik Ladkani <shmulik.ladkani@gmail.com> Signed-off-by: Lance Richardson <lrichard@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| | * | | net: tcp response should set oif only if it is L3 masterDavid Ahern2016-11-09
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Lorenzo noted an Android unit test failed due to e0d56fdd7342: "The expectation in the test was that the RST replying to a SYN sent to a closed port should be generated with oif=0. In other words it should not prefer the interface where the SYN came in on, but instead should follow whatever the routing table says it should do." Revert the change to ip_send_unicast_reply and tcp_v6_send_response such that the oif in the flow is set to the skb_iif only if skb_iif is an L3 master. Fixes: e0d56fdd7342 ("net: l3mdev: remove redundant calls") Reported-by: Lorenzo Colitti <lorenzo@google.com> Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Tested-by: Lorenzo Colitti <lorenzo@google.com> Acked-by: Lorenzo Colitti <lorenzo@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| | * | | Net Driver: Add Cypress GX3 VID=04b4 PID=3610.Allan Chou2016-11-09
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add support for Cypress GX3 SuperSpeed to Gigabit Ethernet Bridge Controller (Vendor=04b4 ProdID=3610). Patch verified on x64 linux kernel 4.7.4, 4.8.6, 4.9-rc4 systems with the Kensington SD4600P USB-C Universal Dock with Power, which uses the Cypress GX3 SuperSpeed to Gigabit Ethernet Bridge Controller. A similar patch was signed-off and tested-by Allan Chou <allan@asix.com.tw> on 2015-12-01. Allan verified his similar patch on x86 Linux kernel 4.1.6 system with Cypress GX3 SuperSpeed to Gigabit Ethernet Bridge Controller. Tested-by: Allan Chou <allan@asix.com.tw> Tested-by: Chris Roth <chris.roth@usask.ca> Tested-by: Artjom Simon <artjom.simon@gmail.com> Signed-off-by: Allan Chou <allan@asix.com.tw> Signed-off-by: Chris Roth <chris.roth@usask.ca> Signed-off-by: David S. Miller <davem@davemloft.net>
| | * | | Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nfDavid S. Miller2016-11-09
| | |\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following patchset contains a larger than usual batch of Netfilter fixes for your net tree. This series contains a mixture of old bugs and recently introduced bugs, they are: 1) Fix a crash when using nft_dynset with nft_set_rbtree, which doesn't support the set element updates from the packet path. From Liping Zhang. 2) Fix leak when nft_expr_clone() fails, from Liping Zhang. 3) Fix a race when inserting new elements to the set hash from the packet path, also from Liping. 4) Handle segmented TCP SIP packets properly, basically avoid that the INVITE in the allow header create bogus expectations by performing stricter SIP message parsing, from Ulrich Weber. 5) nft_parse_u32_check() should return signed integer for errors, from John Linville. 6) Fix wrong allocation instead of connlabels, allocate 16 instead of 32 bytes, from Florian Westphal. 7) Fix compilation breakage when building the ip_vs_sync code with CONFIG_OPTIMIZE_INLINING on x86, from Arnd Bergmann. 8) Destroy the new set if the transaction object cannot be allocated, also from Liping Zhang. 9) Use device to route duplicated packets via nft_dup only when set by the user, otherwise packets may not follow the right route, again from Liping. 10) Fix wrong maximum genetlink attribute definition in IPVS, from WANG Cong. 11) Ignore untracked conntrack objects from xt_connmark, from Florian Westphal. 12) Allow to use conntrack helpers that are registered NFPROTO_UNSPEC via CT target, otherwise we cannot use the h.245 helper, from Florian. 13) Revisit garbage collection heuristic in the new workqueue-based timer approach for conntrack to evict objects earlier, again from Florian. 14) Fix crash in nf_tables when inserting an element into a verdict map, from Liping Zhang. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| | | * | | netfilter: nf_tables: fix oops when inserting an element into a verdict mapLiping Zhang2016-11-08
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Dalegaard says: The following ruleset, when loaded with 'nft -f bad.txt' ----snip---- flush ruleset table ip inlinenat { map sourcemap { type ipv4_addr : verdict; } chain postrouting { ip saddr vmap @sourcemap accept } } add chain inlinenat test add element inlinenat sourcemap { 100.123.10.2 : jump test } ----snip---- results in a kernel oops: BUG: unable to handle kernel paging request at 0000000000001344 IP: [<ffffffffa07bf704>] nf_tables_check_loops+0x114/0x1f0 [nf_tables] [...] Call Trace: [<ffffffffa07c2aae>] ? nft_data_init+0x13e/0x1a0 [nf_tables] [<ffffffffa07c1950>] nft_validate_register_store+0x60/0xb0 [nf_tables] [<ffffffffa07c74b5>] nft_add_set_elem+0x545/0x5e0 [nf_tables] [<ffffffffa07bfdd0>] ? nft_table_lookup+0x30/0x60 [nf_tables] [<ffffffff8132c630>] ? nla_strcmp+0x40/0x50 [<ffffffffa07c766e>] nf_tables_newsetelem+0x11e/0x210 [nf_tables] [<ffffffff8132c400>] ? nla_validate+0x60/0x80 [<ffffffffa030d9b4>] nfnetlink_rcv+0x354/0x5a7 [nfnetlink] Because we forget to fill the net pointer in bind_ctx, so dereferencing it may cause kernel crash. Reported-by: Dalegaard <dalegaard@gmail.com> Signed-off-by: Liping Zhang <zlpnobody@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| | | * | | netfilter: conntrack: refine gc worker heuristicsFlorian Westphal2016-11-08
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Nicolas Dichtel says: After commit b87a2f9199ea ("netfilter: conntrack: add gc worker to remove timed-out entries"), netlink conntrack deletion events may be sent with a huge delay. Nicolas further points at this line: goal = min(nf_conntrack_htable_size / GC_MAX_BUCKETS_DIV, GC_MAX_BUCKETS); and indeed, this isn't optimal at all. Rationale here was to ensure that we don't block other work items for too long, even if nf_conntrack_htable_size is huge. But in order to have some guarantee about maximum time period where a scan of the full conntrack table completes we should always use a fixed slice size, so that once every N scans the full table has been examined at least once. We also need to balance this vs. the case where the system is either idle (i.e., conntrack table (almost) empty) or very busy (i.e. eviction happens from packet path). So, after some discussion with Nicolas: 1. want hard guarantee that we scan entire table at least once every X s -> need to scan fraction of table (get rid of upper bound) 2. don't want to eat cycles on idle or very busy system -> increase interval if we did not evict any entries 3. don't want to block other worker items for too long -> make fraction really small, and prefer small scan interval instead 4. Want reasonable short time where we detect timed-out entry when system went idle after a burst of traffic, while not doing scans all the time. -> Store next gc scan in worker, increasing delays when no eviction happened and shrinking delay when we see timed out entries. The old gc interval is turned into a max number, scans can now happen every jiffy if stale entries are present. Longest possible time period until an entry is evicted is now 2 minutes in worst case (entry expires right after it was deemed 'not expired'). Reported-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| | | * | | netfilter: conntrack: fix CT target for UNSPEC helpersFlorian Westphal2016-11-08
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Thomas reports its not possible to attach the H.245 helper: iptables -t raw -A PREROUTING -p udp -j CT --helper H.245 iptables: No chain/target/match by that name. xt_CT: No such helper "H.245" This is because H.245 registers as NFPROTO_UNSPEC, but the CT target passes NFPROTO_IPV4/IPV6 to nf_conntrack_helper_try_module_get. We should treat UNSPEC as wildcard and ignore the l3num instead. Reported-by: Thomas Woerner <twoerner@redhat.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| | | * | | netfilter: connmark: ignore skbs with magic untracked conntrack objectsFlorian Westphal2016-11-08
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The (percpu) untracked conntrack entries can end up with nonzero connmarks. The 'untracked' conntrack objects are merely a way to distinguish INVALID (i.e. protocol connection tracker says payload doesn't meet some requirements or packet was never seen by the connection tracking code) from packets that are intentionally not tracked (some icmpv6 types such as neigh solicitation, or by using 'iptables -j CT --notrack' option). Untracked conntrack objects are implementation detail, we might as well use invalid magic address instead to tell INVALID and UNTRACKED apart. Check skb->nfct for untracked dummy and behave as if skb->nfct is NULL. Reported-by: XU Tianwen <evan.xu.tianwen@gmail.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| | | * | | ipvs: use IPVS_CMD_ATTR_MAX for family.maxattrWANG Cong2016-11-08
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | family.maxattr is the max index for policy[], the size of ops[] is determined with ARRAY_SIZE(). Reported-by: Andrey Konovalov <andreyknvl@google.com> Tested-by: Andrey Konovalov <andreyknvl@google.com> Cc: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Simon Horman <horms@verge.net.au> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| | | * | | netfilter: nft_dup: do not use sreg_dev if the user doesn't specify itLiping Zhang2016-10-31
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The NFTA_DUP_SREG_DEV attribute is not a must option, so we should use it in routing lookup only when the user specify it. Fixes: d877f07112f1 ("netfilter: nf_tables: add nft_dup expression") Signed-off-by: Liping Zhang <zlpnobody@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| | | * | | netfilter: nf_tables: destroy the set if fail to add transactionLiping Zhang2016-10-31
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When the memory is exhausted, then we will fail to add the NFT_MSG_NEWSET transaction. In such case, we should destroy the set before we free it. Fixes: 958bee14d071 ("netfilter: nf_tables: use new transaction infrastructure to handle sets") Signed-off-by: Liping Zhang <zlpnobody@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>