| Commit message (Collapse) | Author | Age |
|
|
|
|
|
|
|
|
|
|
| |
Since netif_carrier_on() will do nothing if device's
carrier is already on, so it's unnecessary to do
carrier status check.
It's the same for netif_carrier_off().
Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|\
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Nathan Fontenot says:
====================
ibmvnic: Updated reset handler and code fixes
This set of patches multiple code fixes and a new rest handler
for the ibmvnic driver. In order to implement the new reset handler
for the ibmvnic driver resource initialization needed to be moved to
its own routine, a state variable is introduced to replace the
various is_* flags in the driver, and a new routine to handle the
assorted reasons the driver can be reset.
v4 updates:
Patch 3/11: Corrected trailing whitespace
Patch 7/11: Corrected trailing whitespace
v3 updates:
Patch 10/11: Correct patch subject line to be a description of the patch.
v2 updates:
Patch 11/11: Use __netif_subqueue_stopped() instead of
netif_subqueue_stopped() to avoid possible use of an un-initialized
skb variable.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| |
| |
| |
| |
| |
| |
| |
| | |
Restart of the subqueue should occur outside of the loop processing
any tx buffers instead of doing this in the middle of the loop.
Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Map each RX SKB to the RX queue associated with the driver's RX SCRQ.
This should improve the RX CPU load balancing issues seen by the
performance team.
Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| |
| |
| |
| |
| |
| |
| |
| | |
There is not a need to stop processing skbs if we encounter a
skb that has a receive completion error.
Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| |
| |
| |
| |
| |
| |
| |
| | |
Move the check for the driver resetting to the first thing
in ibmvnic_xmit().
Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| |
| |
| |
| |
| |
| |
| |
| | |
When closing the ibmvnic driver we need to wait for any pending
sub crq entries to ensure they are handled.
Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
When closing the ibmvnic driver, most notably during the reset
path, the tx pools need to be cleaned to ensure there are no
hanging skbs that need to be free'ed.
The need for this was found during debugging a loss of network
traffic after handling a driver reset. The underlying cause was
some skbs in the tx pool that were never free'ed. As a
result the upper network layers never tried a re-send since it
believed the driver still had the skb.
Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| |
| |
| |
| |
| | |
Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| |
| |
| |
| |
| |
| |
| |
| | |
The napi structs allocated at drivier initializatio need to be
free'ed when releasing the drivers resources.
Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
The ibmvnic driver has multiple handlers for resetting the driver
depending on the reason the reset is needed (failover, lpm,
fatal erors,...). All of the reset handlers do essentially the same
thing, this patch moves this work to a common reset handler.
By doing this we also allow the driver to better handle situations
where we can get a reset while handling a reset.
The updated reset handling works by adding a reset work item to the
list of resets and then scheduling work to perform the reset. This
step is necessary because we can receive a reset in interrupt context
and we want to handle the reset out of interrupt context.
Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Replace the is_closed flag in the ibmvnic adapter strcut with a
more comprehensive state field that tracks the current state of
the driver.
Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|/
|
|
|
|
|
|
| |
Move all of the calls to initialize resources for the driver to
a separate routine.
Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|\
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Pablo Neira Ayuso says:
====================
Netfilter/IPVS/OVS fixes for net
The following patchset contains a rather large batch of Netfilter, IPVS
and OVS fixes for your net tree. This includes fixes for ctnetlink, the
userspace conntrack helper infrastructure, conntrack OVS support,
ebtables DNAT target, several leaks in error path among other. More
specifically, they are:
1) Fix reference count leak in the CT target error path, from Gao Feng.
2) Remove conntrack entry clashing with a matching expectation, patch
from Jarno Rajahalme.
3) Fix bogus EEXIST when registering two different userspace helpers,
from Liping Zhang.
4) Don't leak dummy elements in the new bitmap set type in nf_tables,
from Liping Zhang.
5) Get rid of module autoload from conntrack update path in ctnetlink,
we don't need autoload at this late stage and it is happening with
rcu read lock held which is not good. From Liping Zhang.
6) Fix deadlock due to double-acquire of the expect_lock from conntrack
update path, this fixes a bug that was introduced when the central
spinlock got removed. Again from Liping Zhang.
7) Safe ct->status update from ctnetlink path, from Liping. The expect_lock
protection that was selected when the central spinlock was removed was
not really protecting anything at all.
8) Protect sequence adjustment under ct->lock.
9) Missing socket match with IPv6, from Peter Tirsek.
10) Adjust skb->pkt_type of DNAT'ed frames from ebtables, from
Linus Luessing.
11) Don't give up on evaluating the expression on new entries added via
dynset expression in nf_tables, from Liping Zhang.
12) Use skb_checksum() when mangling icmpv6 in IPv6 NAT as this deals
with non-linear skbuffs.
13) Don't allow IPv6 service in IPVS if no IPv6 support is available,
from Paolo Abeni.
14) Missing mutex release in error path of xt_find_table_lock(), from
Dan Carpenter.
15) Update maintainers files, Netfilter section. Add Florian to the
file, refer to nftables.org and change project status from Supported
to Maintained.
16) Bail out on mismatching extensions in element updates in nf_tables.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| |
| |
| |
| |
| |
| |
| | |
If no NLM_F_EXCL is set and the element already exists in the set, make
sure that both elements have the same extensions.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Several updates on the MAINTAINERS section for Netfilter:
1) Add Florian Westphal, he's been part of the coreteam since October 2012.
He's been dedicating tireless efforts to improve the Netfilter codebase,
fix bugs and push ongoing new developments ever since.
2) Add http://www.nftables.org/ URL, currently pointing to
http://www.netfilter.org.
3) Update project status from Supported to Maintained.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Acked-by: Florian Westphal <fw@strlen.de>
|
| |\
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
http://git.kernel.org/pub/scm/linux/kernel/git/horms/ipvs
Simon Horman says:
====================
IPVS Fixes for v4.11
I would also like it considered for stable.
* Explicitly forbid ipv6 service/dest creation if ipv6 mod is disabled
to avoid oops caused by IPVS accesing IPv6 routing code in such
circumstances.
====================
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
When creating a new ipvs service, ipv6 addresses are always accepted
if CONFIG_IP_VS_IPV6 is enabled. On dest creation the address family
is not explicitly checked.
This allows the user-space to configure ipvs services even if the
system is booted with ipv6.disable=1. On specific configuration, ipvs
can try to call ipv6 routing code at setup time, causing the kernel to
oops due to fib6_rules_ops being NULL.
This change addresses the issue adding a check for the ipv6
module being enabled while validating ipv6 service operations and
adding the same validation for dest operations.
According to git history, this issue is apparently present since
the introduction of ipv6 support, and the oops can be triggered
since commit 09571c7ae30865ad ("IPVS: Add function to determine
if IPv6 address is local")
Fixes: 09571c7ae30865ad ("IPVS: Add function to determine if IPv6 address is local")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
|
| |/
| |
| |
| |
| |
| |
| |
| |
| |
| | |
According to my static checker we should unlock here before the return.
That seems reasonable to me as well.
Fixes" b9e69e127397 ("netfilter: xtables: don't hook tables by default")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
When recalculating the outer ICMPv6 checksum for a reverse path NATv6
such as ICMPV6_TIME_EXCEED nf_nat_icmpv6_reply_translation() was
accessing data beyond the headlen of the skb for non-linear skb. This
resulted in incorrect ICMPv6 checksum as garbage data was used.
Patch replaces csum_partial() with skb_checksum() which supports
non-linear skbs similar to nf_nat_icmp_reply_translation() from ipv4.
Signed-off-by: Dave Johnson <dave-kernel@centerclick.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Currently, after adding the following nft rules:
# nft add set x target1 { type ipv4_addr \; flags timeout \;}
# nft add rule x y set add ip daddr timeout 1d @target1 counter
the counters will always be zero despite of the elements are added
to the dynamic set "target1" or not, as we will break the nft expr
traversal unconditionally:
# nft list ruleset
...
set target1 {
...
elements = { 8.8.8.8 expires 23h59m53s}
}
chain output {
...
set add ip daddr timeout 1d @target1 counter packets 0 bytes 0
^ ^
...
}
Since we add the elements to the set successfully, we should continue
to the next expression.
Additionally, if elements are added to "flow table" successfully, we
will _always_ continue to the next expr, even if the operation is
_OP_ADD. So it's better to keep them to be consistent.
Fixes: 22fe54d5fefc ("netfilter: nf_tables: add support for dynamic set updates")
Reported-by: Robert White <rwhite@pobox.com>
Signed-off-by: Liping Zhang <zlpnobody@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
When trying to redirect bridged frames to the bridge device itself or
a bridge port (brouting) via the dnat target then this currently fails:
The ethernet destination of the frame is dnat'ed to the MAC address of
the bridge device or port just fine. However, the IP code drops it in
the beginning of ip_input.c/ip_rcv() as the dnat target left
the skb->pkt_type as PACKET_OTHERHOST.
Fixing this by resetting skb->pkt_type to an appropriate type after
dnat'ing.
Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Commit 834184b1f3a4 ("netfilter: defrag: only register defrag
functionality if needed") used the outdated XT_SOCKET_HAVE_IPV6 macro
which was removed earlier in commit 8db4c5be88f6 ("netfilter: move
socket lookup infrastructure to nf_socket_ipv{4,6}.c"). With that macro
never being defined, the xt_socket match emits an "Unknown family 10"
warning when used with IPv6:
WARNING: CPU: 0 PID: 1377 at net/netfilter/xt_socket.c:160 socket_mt_enable_defrag+0x47/0x50 [xt_socket]
Unknown family 10
Modules linked in: xt_socket nf_socket_ipv4 nf_socket_ipv6 nf_defrag_ipv4 [...]
CPU: 0 PID: 1377 Comm: ip6tables-resto Not tainted 4.10.10 #1
Hardware name: [...]
Call Trace:
? __warn+0xe7/0x100
? socket_mt_enable_defrag+0x47/0x50 [xt_socket]
? socket_mt_enable_defrag+0x47/0x50 [xt_socket]
? warn_slowpath_fmt+0x39/0x40
? socket_mt_enable_defrag+0x47/0x50 [xt_socket]
? socket_mt_v2_check+0x12/0x40 [xt_socket]
? xt_check_match+0x6b/0x1a0 [x_tables]
? xt_find_match+0x93/0xd0 [x_tables]
? xt_request_find_match+0x20/0x80 [x_tables]
? translate_table+0x48e/0x870 [ip6_tables]
? translate_table+0x577/0x870 [ip6_tables]
? walk_component+0x3a/0x200
? kmalloc_order+0x1d/0x50
? do_ip6t_set_ctl+0x181/0x490 [ip6_tables]
? filename_lookup+0xa5/0x120
? nf_setsockopt+0x3a/0x60
? ipv6_setsockopt+0xb0/0xc0
? sock_common_setsockopt+0x23/0x30
? SyS_socketcall+0x41d/0x630
? vfs_read+0xfa/0x120
? do_fast_syscall_32+0x7a/0x110
? entry_SYSENTER_32+0x47/0x71
This patch brings the conditional back in line with how the rest of the
file handles IPv6.
Fixes: 834184b1f3a4 ("netfilter: defrag: only register defrag functionality if needed")
Signed-off-by: Peter Tirsek <peter@tirsek.com>
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
| |
| |
| |
| |
| |
| |
| |
| |
| | |
We should acquire the ct->lock before accessing or modifying the
nf_ct_seqadj, as another CPU may modify the nf_ct_seqadj at the same
time during its packet proccessing.
Signed-off-by: Liping Zhang <zlpnobody@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
After converting to use rcu for conntrack hash, one CPU may update
the ct->status via ctnetlink, while another CPU may process the
packets and update the ct->status.
So the non-atomic operation "ct->status |= status;" via ctnetlink
becomes unsafe, and this may clear the IPS_DYING_BIT bit set by
another CPU unexpectedly. For example:
CPU0 CPU1
ctnetlink_change_status __nf_conntrack_find_get
old = ct->status nf_ct_gc_expired
- nf_ct_kill
- test_and_set_bit(IPS_DYING_BIT
new = old | status; -
ct->status = new; <-- oops, _DYING_ is cleared!
Now using a series of atomic bit operation to solve the above issue.
Also note, user shouldn't set IPS_TEMPLATE, IPS_SEQ_ADJUST directly,
so make these two bits be unchangable too.
If we set the IPS_TEMPLATE_BIT, ct will be freed by nf_ct_tmpl_free,
but actually it is alloced by nf_conntrack_alloc.
If we set the IPS_SEQ_ADJUST_BIT, this may cause the NULL pointer
deference, as the nfct_seqadj(ct) maybe NULL.
Last, add some comments to describe the logic change due to the
commit a963d710f367 ("netfilter: ctnetlink: Fix regression in CTA_STATUS
processing"), which makes me feel a little confusing.
Fixes: 76507f69c44e ("[NETFILTER]: nf_conntrack: use RCU for conntrack hash")
Signed-off-by: Liping Zhang <zlpnobody@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Currently, ctnetlink_change_conntrack is always protected by _expect_lock,
but this will cause a deadlock when deleting the helper from a conntrack,
as the _expect_lock will be acquired again by nf_ct_remove_expectations:
CPU0
----
lock(nf_conntrack_expect_lock);
lock(nf_conntrack_expect_lock);
*** DEADLOCK ***
May be due to missing lock nesting notation
2 locks held by lt-conntrack_gr/12853:
#0: (&table[i].mutex){+.+.+.}, at: [<ffffffffa05e2009>]
nfnetlink_rcv_msg+0x399/0x6a9 [nfnetlink]
#1: (nf_conntrack_expect_lock){+.....}, at: [<ffffffffa05f2c1f>]
ctnetlink_new_conntrack+0x17f/0x408 [nf_conntrack_netlink]
Call Trace:
dump_stack+0x85/0xc2
__lock_acquire+0x1608/0x1680
? ctnetlink_parse_tuple_proto+0x10f/0x1c0 [nf_conntrack_netlink]
lock_acquire+0x100/0x1f0
? nf_ct_remove_expectations+0x32/0x90 [nf_conntrack]
_raw_spin_lock_bh+0x3f/0x50
? nf_ct_remove_expectations+0x32/0x90 [nf_conntrack]
nf_ct_remove_expectations+0x32/0x90 [nf_conntrack]
ctnetlink_change_helper+0xc6/0x190 [nf_conntrack_netlink]
ctnetlink_new_conntrack+0x1b2/0x408 [nf_conntrack_netlink]
nfnetlink_rcv_msg+0x60a/0x6a9 [nfnetlink]
? nfnetlink_rcv_msg+0x1b9/0x6a9 [nfnetlink]
? nfnetlink_bind+0x1a0/0x1a0 [nfnetlink]
netlink_rcv_skb+0xa4/0xc0
nfnetlink_rcv+0x87/0x770 [nfnetlink]
Since the operations are unrelated to nf_ct_expect, so we can drop the
_expect_lock. Also note, after removing the _expect_lock protection,
another CPU may invoke nf_conntrack_helper_unregister, so we should
use rcu_read_lock to protect __nf_conntrack_helper_find invoked by
ctnetlink_change_helper.
Fixes: ca7433df3a67 ("netfilter: conntrack: seperate expect locking from nf_conntrack_lock")
Signed-off-by: Liping Zhang <zlpnobody@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
First, when creating a new ct, we will invoke request_module to try to
load the related inkernel cthelper. So there's no need to call the
request_module again when updating the ct helpinfo.
Second, ctnetlink_change_helper may be called with rcu_read_lock held,
i.e. rcu_read_lock -> nfqnl_recv_verdict -> nfqnl_ct_parse ->
ctnetlink_glue_parse -> ctnetlink_glue_parse_ct ->
ctnetlink_change_helper. But the request_module invocation may sleep,
so we can't call it with the rcu_read_lock held.
Remove it now.
Signed-off-by: Liping Zhang <zlpnobody@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
We forget to free dummy elements when deleting the set. So when I was
running nft-test.py, I saw many kmemleak warnings:
kmemleak: 1344 new suspected memory leaks ...
# cat /sys/kernel/debug/kmemleak
unreferenced object 0xffff8800631345c8 (size 32):
comm "nft", pid 9075, jiffies 4295743309 (age 1354.815s)
hex dump (first 32 bytes):
f8 63 13 63 00 88 ff ff 88 79 13 63 00 88 ff ff .c.c.....y.c....
04 0c 00 00 00 00 00 00 00 00 00 00 08 03 00 00 ................
backtrace:
[<ffffffff819059da>] kmemleak_alloc+0x4a/0xa0
[<ffffffff81288174>] __kmalloc+0x164/0x310
[<ffffffffa061269d>] nft_set_elem_init+0x3d/0x1b0 [nf_tables]
[<ffffffffa06130da>] nft_add_set_elem+0x45a/0x8c0 [nf_tables]
[<ffffffffa0613645>] nf_tables_newsetelem+0x105/0x1d0 [nf_tables]
[<ffffffffa05fe6d4>] nfnetlink_rcv+0x414/0x770 [nfnetlink]
[<ffffffff817f0ca6>] netlink_unicast+0x1f6/0x310
[<ffffffff817f10c6>] netlink_sendmsg+0x306/0x3b0
...
Fixes: e920dde516088 ("netfilter: nft_set_bitmap: keep a list of dummy elements")
Signed-off-by: Liping Zhang <zlpnobody@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
cthelpers added via nfnetlink may have the same tuple, i.e. except for
the l3proto and l4proto, other fields are all zero. So even with the
different names, we will also fail to add them:
# nfct helper add ssdp inet udp
# nfct helper add tftp inet udp
nfct v1.4.3: netlink error: File exists
So in order to avoid unpredictable behaviour, we should:
1. cthelpers can be selected by nft ct helper obj or xt_CT target, so
report error if duplicated { name, l3proto, l4proto } tuple exist.
2. cthelpers can be selected by nf_ct_tuple_src_mask_cmp when
nf_ct_auto_assign_helper is enabled, so also report error if duplicated
{ l3proto, l4proto, src-port } tuple exist.
Also note, if the cthelper is added from userspace, then the src-port will
always be zero, it's invalid for nf_ct_auto_assign_helper, so there's no
need to check the second point listed above.
Fixes: 893e093c786c ("netfilter: nf_ct_helper: bail out on duplicated helpers")
Signed-off-by: Liping Zhang <zlpnobody@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Conntrack helpers do not check for a potentially clashing conntrack
entry when creating a new expectation. Also, nf_conntrack_in() will
check expectations (via init_conntrack()) only if a conntrack entry
can not be found. The expectation for a packet which also matches an
existing conntrack entry will not be removed by conntrack, and is
currently handled inconsistently by OVS, as OVS expects the
expectation to be removed when the connection tracking entry matching
that expectation is confirmed.
It should be noted that normally an IP stack would not allow reuse of
a 5-tuple of an old (possibly lingering) connection for a new data
connection, so this is somewhat unlikely corner case. However, it is
possible that a misbehaving source could cause conntrack entries be
created that could then interfere with new related connections.
Fix this in the OVS module by deleting the clashing conntrack entry
after an expectation has been matched. This causes the following
nf_conntrack_in() call also find the expectation and remove it when
creating the new conntrack entry, as well as the forthcoming reply
direction packets to match the new related connection instead of the
old clashing conntrack entry.
Fixes: 7f8a436eaa2c ("openvswitch: Add conntrack action")
Reported-by: Yang Song <yangsong@vmware.com>
Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
Acked-by: Joe Stringer <joe@ovn.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
There are two cases which causes refcnt leak.
1. When nf_ct_timeout_ext_add failed in xt_ct_set_timeout, it should
free the timeout refcnt.
Now goto the err_put_timeout error handler instead of going ahead.
2. When the time policy is not found, we should call module_put.
Otherwise, the related cthelper module cannot be removed anymore.
It is easy to reproduce by typing the following command:
# iptables -t raw -A OUTPUT -p tcp -j CT --helper ftp --timeout xxx
Signed-off-by: Gao Feng <fgao@ikuai8.com>
Signed-off-by: Liping Zhang <zlpnobody@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
If gcc (e.g. 4.1.2) decides not to inline total_extension_size(), the
build will fail with:
net/built-in.o: In function `nf_conntrack_init_start':
(.text+0x9baf6): undefined reference to `__compiletime_assert_1893'
or
ERROR: "__compiletime_assert_1893" [net/netfilter/nf_conntrack.ko] undefined!
Fix this by forcing inlining of total_extension_size().
Fixes: b3a5db109e0670d6 ("netfilter: conntrack: use u8 for extension sizes again")
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
On 32-bit:
lib/test_bpf.c:4772: warning: integer constant is too large for ‘unsigned long’ type
lib/test_bpf.c:4772: warning: integer constant is too large for ‘unsigned long’ type
lib/test_bpf.c:4773: warning: integer constant is too large for ‘unsigned long’ type
lib/test_bpf.c:4773: warning: integer constant is too large for ‘unsigned long’ type
lib/test_bpf.c:4787: warning: integer constant is too large for ‘unsigned long’ type
lib/test_bpf.c:4787: warning: integer constant is too large for ‘unsigned long’ type
lib/test_bpf.c:4801: warning: integer constant is too large for ‘unsigned long’ type
lib/test_bpf.c:4801: warning: integer constant is too large for ‘unsigned long’ type
lib/test_bpf.c:4802: warning: integer constant is too large for ‘unsigned long’ type
lib/test_bpf.c:4802: warning: integer constant is too large for ‘unsigned long’ type
On 32-bit systems, "long" is only 32-bit.
Replace the "UL" suffix by "ULL" to fix this.
Fixes: 85f68fe898320575 ("bpf, arm64: implement jiting of BPF_XADD")
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| |
| |
| |
| |
| |
| |
| |
| | |
This patch adds support for Telit ME910 PID 0x1100.
Signed-off-by: Daniele Palmas <dnlplm@gmail.com>
Acked-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Now tg3 NIC's stats will be cleared after ifdown/ifup. bond_get_stats traverse
its salves to get statistics,cumulative the increment.If a tg3 NIC is added to
bonding as a slave,ifdown/ifup will cause bonding's stats become tremendous value
(ex.1638.3 PiB) because of negative increment.
Fixes: 92feeabf3f67 ("tg3: Save stats across chip resets")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
-D__x86_64__ workaround was used to make /usr/include/features.h
to follow expected path through the system include headers.
This is not portable.
Instead define dummy stubs.h which is used by 'clang -target bpf'
Fixes: 6882804c916b ("selftests/bpf: add a test for overlapping packet range checks")
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
With clang/llvm 4.0+, the test case is able to generate
the following pattern:
....
440: (b7) r1 = 15
441: (05) goto pc+73
515: (79) r6 = *(u64 *)(r10 -152)
516: (bf) r7 = r10
517: (07) r7 += -112
518: (bf) r2 = r7
519: (0f) r2 += r1
520: (71) r1 = *(u8 *)(r8 +0)
521: (73) *(u8 *)(r2 +45) = r1
....
commit 332270fdc8b6 ("bpf: enhance verifier to understand stack
pointer arithmetic") improved verifier to handle such a pattern.
This patch adds a C test case to actually generate such a pattern.
A dummy tracepoint interface is used to load the program
into the kernel.
Signed-off-by: Yonghong Song <yhs@fb.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Small follow-up to d74a32acd59a ("xdp: use netlink extended ACK reporting")
in order to let drivers all use the same NL_SET_ERR_MSG_MOD() helper macro
for reporting. This also ensures that we consistently add the driver's
prefix for dumping the report in user space to indicate that the error
message is driver specific and not coming from core code. Furthermore,
NL_SET_ERR_MSG_MOD() now reuses NL_SET_ERR_MSG() and thus makes all macros
check the pointer as suggested.
References: https://www.spinics.net/lists/netdev/msg433267.html
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Andrey reported a warning triggered by the rcu code:
------------[ cut here ]------------
WARNING: CPU: 1 PID: 5911 at lib/debugobjects.c:289
debug_print_object+0x175/0x210
ODEBUG: activate active (active state 1) object type: rcu_head hint:
(null)
Modules linked in:
CPU: 1 PID: 5911 Comm: a.out Not tainted 4.11.0-rc8+ #271
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:16
dump_stack+0x192/0x22d lib/dump_stack.c:52
__warn+0x19f/0x1e0 kernel/panic.c:549
warn_slowpath_fmt+0xe0/0x120 kernel/panic.c:564
debug_print_object+0x175/0x210 lib/debugobjects.c:286
debug_object_activate+0x574/0x7e0 lib/debugobjects.c:442
debug_rcu_head_queue kernel/rcu/rcu.h:75
__call_rcu.constprop.76+0xff/0x9c0 kernel/rcu/tree.c:3229
call_rcu_sched+0x12/0x20 kernel/rcu/tree.c:3288
rt6_rcu_free net/ipv6/ip6_fib.c:158
rt6_release+0x1ea/0x290 net/ipv6/ip6_fib.c:188
fib6_del_route net/ipv6/ip6_fib.c:1461
fib6_del+0xa42/0xdc0 net/ipv6/ip6_fib.c:1500
__ip6_del_rt+0x100/0x160 net/ipv6/route.c:2174
ip6_del_rt+0x140/0x1b0 net/ipv6/route.c:2187
__ipv6_ifa_notify+0x269/0x780 net/ipv6/addrconf.c:5520
addrconf_ifdown+0xe60/0x1a20 net/ipv6/addrconf.c:3672
...
Andrey's reproducer program runs in a very tight loop, calling
'unshare -n' and then spawning 2 sets of 14 threads running random ioctl
calls. The relevant networking sequence:
1. New network namespace created via unshare -n
- ip6tnl0 device is created in down state
2. address added to ip6tnl0
- equivalent to ip -6 addr add dev ip6tnl0 fd00::bb/1
- DAD is started on the address and when it completes the host
route is inserted into the FIB
3. ip6tnl0 is brought up
- the new fixup_permanent_addr function restarts DAD on the address
4. exit namespace
- teardown / cleanup sequence starts
- once in a blue moon, lo teardown appears to happen BEFORE teardown
of ip6tunl0
+ down on 'lo' removes the host route from the FIB since the dst->dev
for the route is loobback
+ host route added to rcu callback list
* rcu callback has not run yet, so rt is NOT on the gc list so it has
NOT been marked obsolete
5. in parallel to 4. worker_thread runs addrconf_dad_completed
- DAD on the address on ip6tnl0 completes
- calls ipv6_ifa_notify which inserts the host route
All of that happens very quickly. The result is that a host route that
has been deleted from the IPv6 FIB and added to the RCU list is re-inserted
into the FIB.
The exit namespace eventually gets to cleaning up ip6tnl0 which removes the
host route from the FIB again, calls the rcu function for cleanup -- and
triggers the double rcu trace.
The root cause is duplicate DAD on the address -- steps 2 and 3. Arguably,
DAD should not be started in step 2. The interface is in the down state,
so it can not really send out requests for the address which makes starting
DAD pointless.
Since the second DAD was introduced by a recent change, seems appropriate
to use it for the Fixes tag and have the fixup function only start DAD for
addresses in the PREDAD state which occurs in addrconf_ifdown if the
address is retained.
Big thanks to Andrey for isolating a reliable reproducer for this problem.
Fixes: f1705ec197e7 ("net: ipv6: Make address flushing on ifdown optional")
Reported-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
Tested-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| |
| |
| |
| |
| |
| |
| |
| | |
Adding support for Microchip LAN9250 Ethernet controller.
Signed-off-by: David Cai <david.cai@microchip.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|\ \
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Jesper Dangaard Brouer says:
====================
Improve bpf ELF-loader under samples/bpf
This series improves and fixes bpf ELF loader and programs under
samples/bpf. The bpf_load.c created some hard to debug issues when
the struct (bpf_map_def) used in the ELF maps section format changed
in commit fb30d4b71214 ("bpf: Add tests for map-in-map").
This was hotfixed in commit 409526bea3c3 ("samples/bpf: bpf_load.c
detect and abort if ELF maps section size is wrong") by detecting the
issue and aborting the program.
In most situations the bpf-loader should be able to handle these kind
of changes to the struct size. This patch series aim to do proper
backward and forward compabilility handling when loading ELF files.
This series also adjust the callback that was introduced in commit
9fd63d05f3e8 ("bpf: Allow bpf sample programs (*_user.c) to change
bpf_map_def") to use the new bpf_map_data structure, before more users
start to use this callback.
Hoping these changes can make the merge window, as above mentioned
commits have not been merged yet, and it would be good to avoid users
hitting these issues.
====================
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Giving *_user.c side tools access to map_data[] provides easier
access to information on the maps being loaded. Still provide
the guarantee that the order maps are being defined in inside the
_kern.c file corresponds with the order in the array. Now user
tools are not blind, but can inspect and verify the maps that got
loaded from the ELF binary.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Do this change before others start to use this callback.
Change map_perf_test_user.c which seems to be the only user.
This patch extends capabilities of commit 9fd63d05f3e8 ("bpf:
Allow bpf sample programs (*_user.c) to change bpf_map_def").
Give fixup callback access to struct bpf_map_data, instead of
only stuct bpf_map_def. This add flexibility to allow userspace
to reassign the map file descriptor. This is very useful when
wanting to share maps between several bpf programs.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
This patch does proper parsing of the ELF "maps" section, in-order to
be both backwards and forwards compatible with changes to the map
definition struct bpf_map_def, which gets compiled into the ELF file.
The assumption is that new features with value zero, means that they
are not in-use. For backward compatibility where loading an ELF file
with a smaller struct bpf_map_def, only copy objects ELF size, leaving
rest of loaders struct zero. For forward compatibility where ELF file
have a larger struct bpf_map_def, only copy loaders own struct size
and verify that rest of the larger struct is zero, assuming this means
the newer feature was not activated, thus it should be safe for this
older loader to load this newer ELF file.
Fixes: fb30d4b71214 ("bpf: Add tests for map-in-map")
Fixes: 409526bea3c3 ("samples/bpf: bpf_load.c detect and abort if ELF maps section size is wrong")
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|/ /
| |
| |
| |
| |
| |
| |
| | |
Needed to adjust max locked memory RLIMIT_MEMLOCK for testing these bpf samples
as these are using more and larger maps than can fit in distro default 64Kbytes limit.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|\ \
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial
Pull trivial tree updates from Jiri Kosina.
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial:
tty: fix comment for __tty_alloc_driver()
init/main: properly align the multi-line comment
init/main: Fix double "the" in comment
Fix dead URLs to ftp.kernel.org
drivers: Clean up duplicated email address
treewide: Fix typo in xml/driver-api/basics.xml
tools/testing/selftests/powerpc: remove redundant CFLAGS in Makefile: "-Wall -O2 -Wall" -> "-O2 -Wall"
selftests/timers: Spelling s/privledges/privileges/
HID: picoLCD: Spelling s/REPORT_WRTIE_MEMORY/REPORT_WRITE_MEMORY/
net: phy: dp83848: Fix Typo
UBI: Fix typos
Documentation: ftrace.txt: Correct nice value of 120 priority
net: fec: Fix typo in error msg and comment
treewide: Fix typos in printk
|
| | |
| | |
| | |
| | |
| | | |
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@cascardo.eti.br>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Add a tab before it to follow standard practices. Also add the missing
full stop '.'.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
s/the\ the/the
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
URLs to ftp.kernel.org are still exist though the service is closed [0].
This commit fixes the URLs to use www.kernel.org instead.
[0] https://www.kernel.org/shutting-down-ftp-services.html
Signed-off-by: SeongJae Park <sj38.park@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
|