| Commit message (Collapse) | Author | Age |
... | |
| |\ \ \ \ \ \ \ \ \ \ |
|
| |\ \ \ \ \ \ \ \ \ \ \
| | | | | | | | | | | | |
| | | | | | | | | | | | |
| | | | | | | | | | | | |
| | | | | | | | | | | | |
| | | | | | | | | | | | |
| | | | | | | | | | | | |
| | | | | | | | | | | | |
| | | | | | | | | | | | |
| | | | | | | | | | | | |
| | | | | | | | | | | | |
| | | | | | | | | | | | |
| | | | | | | | | | | | |
| | | | | | | | | | | | |
| | | | | | | | | | | | |
| | | | | | | | | | | | |
| | | | | | | | | | | | |
| | | | | | | | | | | | | |
git://git.kernel.org/pub/scm/linux/kernel/git/pshelar/openvswitch
Pravin B Shelar says:
====================
Open vSwitch
First two patches are related to OVS MPLS support. Rest of patches
are mostly refactoring and minor improvements to openvswitch.
v1-v2:
- Fix conflicts due to "gue: Remote checksum offload"
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | | | | | | | | | | | |
| | | | | | | | | | | | |
| | | | | | | | | | | | |
| | | | | | | | | | | | |
| | | | | | | | | | | | |
| | | | | | | | | | | | |
| | | | | | | | | | | | |
| | | | | | | | | | | | |
| | | | | | | | | | | | |
| | | | | | | | | | | | |
| | | | | | | | | | | | |
| | | | | | | | | | | | |
| | | | | | | | | | | | |
| | | | | | | | | | | | |
| | | | | | | | | | | | | |
Allow datapath to recognize and extract MPLS labels into flow keys
and execute actions which push, pop, and set labels on packets.
Based heavily on work by Leo Alterman, Ravi K, Isaku Yamahata and Joe Stringer.
Cc: Ravi K <rkerur@gmail.com>
Cc: Leo Alterman <lalterman@nicira.com>
Cc: Isaku Yamahata <yamahata@valinux.co.jp>
Cc: Joe Stringer <joe@wand.net.nz>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
|
| | | | | | | | | | | | |
| | | | | | | | | | | | |
| | | | | | | | | | | | |
| | | | | | | | | | | | |
| | | | | | | | | | | | |
| | | | | | | | | | | | |
| | | | | | | | | | | | | |
It's unused now, just delete it.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| |/ / / / / / / / / / /
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | | |
Currently we need the IP6_TNL_F_CAP_XMIT capabiltiy to transmit
packets through an ipv6 tunnel. This capability is set when the
tunnel gets configured, based on the tunnel endpoint addresses.
On tunnels with wildcard tunnel endpoints, we need to do the
capabiltiy checking on a per packet basis like it is done in
the receive path.
This patch extends ip6_tnl_xmit_ctl() to take local and remote
addresses as parameters to allow for per packet capabiltiy
checking.
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | | |
It is only used in net/ipv6/inet6_hashtables.c.
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | | |
Add if_tunnel flag TUNNEL_ENCAP_FLAG_REMCSUM to configure
remote checksum offload on an IP tunnel. Add logic in gue_build_header
to insert remote checksum offload option.
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | | |
Define a private flag for remote checksun offload as well as a length
for the option.
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | | |
Add functions and basic definitions for processing standard flags,
private flags, and control messages. This includes definitions
to compute length of optional fields corresponding to a set of flags.
Flag validation is in validate_gue_flags function. This checks for
unknown flags, and that length of optional fields is <= length
in guehdr hlen.
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | | |
Move fou_build_header out of ip_tunnel.c and into fou.c splitting
it up into fou_build_header, gue_build_header, and fou_build_udp.
This allows for other users for TX of FOU or GUE. Change ip_tunnel_encap
to call fou_build_header or gue_build_header based on the tunnel
encapsulation type. Similarly, added fou_encap_hlen and gue_encap_hlen
functions which are called by ip_encap_hlen. New net/fou.h has
prototypes and defines for this.
Added NET_FOU_IP_TUNNELS configuration. When this is set, IP tunnels
can use FOU/GUE and fou module is also selected.
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | | |
This patch allows to set ECN on a per-route basis in case the sysctl
tcp_ecn is not set to 1. In other words, when ECN is set for specific
routes, it provides a tcp_ecn=1 behaviour for that route while the rest
of the stack acts according to the global settings.
One can use 'ip route change dev $dev $net features ecn' to toggle this.
Having a more fine-grained per-route setting can be beneficial for various
reasons, for example, 1) within data centers, or 2) local ISPs may deploy
ECN support for their own video/streaming services [1], etc.
There was a recent measurement study/paper [2] which scanned the Alexa's
publicly available top million websites list from a vantage point in US,
Europe and Asia:
Half of the Alexa list will now happily use ECN (tcp_ecn=2, most likely
blamed to commit 255cac91c3 ("tcp: extend ECN sysctl to allow server-side
only ECN") ;)); the break in connectivity on-path was found is about
1 in 10,000 cases. Timeouts rather than receiving back RSTs were much
more common in the negotiation phase (and mostly seen in the Alexa
middle band, ranks around 50k-150k): from 12-thousand hosts on which
there _may_ be ECN-linked connection failures, only 79 failed with RST
when _not_ failing with RST when ECN is not requested.
It's unclear though, how much equipment in the wild actually marks CE
when buffers start to fill up.
We thought about a fallback to non-ECN for retransmitted SYNs as another
global option (which could perhaps one day be made default), but as Eric
points out, there's much more work needed to detect broken middleboxes.
Two examples Eric mentioned are buggy firewalls that accept only a single
SYN per flow, and middleboxes that successfully let an ECN flow establish,
but later mark CE for all packets (so cwnd converges to 1).
[1] http://www.ietf.org/proceedings/89/slides/slides-89-tsvarea-1.pdf, p.15
[2] http://ecn.ethz.ch/
Joint work with Daniel Borkmann.
Reference: http://thread.gmane.org/gmane.linux.network/335797
Suggested-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | | |
The function cookie_check_timestamp(), both called from IPv4/6 context,
is being used to decode the echoed timestamp from the SYN/ACK into TCP
options used for follow-up communication with the peer.
We can remove ECN handling from that function, split it into a separate
one, and simply rename the original function into cookie_decode_options().
cookie_decode_options() just fills in tcp_option struct based on the
echoed timestamp received from the peer. Anything that fails in this
function will actually discard the request socket.
While this is the natural place for decoding options such as ECN which
commit 172d69e63c7f ("syncookies: add support for ECN") added, we argue
that in particular for ECN handling, it can be checked at a later point
in time as the request sock would actually not need to be dropped from
this, but just ECN support turned off.
Therefore, we split this functionality into cookie_ecn_ok(), which tells
us if the timestamp indicates ECN support AND the tcp_ecn sysctl is enabled.
This prepares for per-route ECN support: just looking at the tcp_ecn sysctl
won't be enough anymore at that point; if the timestamp indicates ECN
and sysctl tcp_ecn == 0, we will also need to check the ECN dst metric.
This would mean adding a route lookup to cookie_check_timestamp(), which
we definitely want to avoid. As we already do a route lookup at a later
point in cookie_{v4,v6}_check(), we can simply make use of that as well
for the new cookie_ecn_ok() function w/o any additional cost.
Joint work with Daniel Borkmann.
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| |\ \ \ \ \ \ \ \ \ \ \
| | | |_|_|_|_|_|_|_|/ /
| | |/| | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | | |
Conflicts:
drivers/net/phy/marvell.c
Simple overlapping changes in drivers/net/phy/marvell.c
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | | |
Add support for reading switch registers with 'ethtool -d'.
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | | |
On some chips it is possible to access the switch eeprom.
Add infrastructure support for it.
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | | |
Some switches provide chip temperature data.
Add support for reporting it through the hwmon subsystem.
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | | |
In neigh_parms_release() we loop over all entries to find the entry given in
argument and being able to remove it from the list. By using a double linked
list, we can avoid this loop.
Here are some numbers with 30 000 dummy interfaces configured:
Before the patch:
$ time rmmod dummy
real 2m0.118s
user 0m0.000s
sys 1m50.048s
After the patch:
$ time rmmod dummy
real 1m9.970s
user 0m0.000s
sys 0m47.976s
Suggested-by: Thierry Herbelot <thierry.herbelot@6wind.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | | |
While testing upcoming Yaogong patch (converting out of order queue
into an RB tree), I hit the max reordering level of linux TCP stack.
Reordering level was limited to 127 for no good reason, and some
network setups [1] can easily reach this limit and get limited
throughput.
Allow a new max limit of 300, and add a sysctl to allow admins to even
allow bigger (or lower) values if needed.
[1] Aggregation of links, per packet load balancing, fabrics not doing
deep packet inspections, alternative TCP congestion modules...
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Yaogong Wang <wygivan@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | |_|_|_|_|_|/ / / /
| |/| | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | | |
The original motivation for this change was to allow the helper to be used
in files other than actions.c as part of work on an odp select group
action.
It was as pointed out by Thomas Graf that this helper would be best off
living in netlink.h. Furthermore, I think that the generic nature of this
helper means it is best off in netlink.h regardless of if it is used more
than one .c file or not. Thus, I would like it considered independent of
the work on an odp select group action.
Cc: Thomas Graf <tgraf@suug.ch>
Cc: Pravin Shelar <pshelar@nicira.com>
Cc: Andy Zhou <azhou@nicira.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Acked-by: Thomas Graf <tgraf@noironetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | |_|_|_|_|_|_|/ /
| |/| | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | | |
include ipx.h from sysctl_net_ipx.c
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | | |
sctp_addr_is_valid() only appeared in its definition.
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: Sébastien Barré <sebastien.barre@uclouvain.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | | |
Remove trailing whitespace in tcp.h icmp.c syncookies.c
Signed-off-by: Kenjiro Nakayama <nakayamakenjiro@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|\ \ \ \ \ \ \ \ \ \
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | | |
Merge first patchbomb from Andrew Morton:
- a few minor cifs fixes
- dma-debug upadtes
- ocfs2
- slab
- about half of MM
- procfs
- kernel/exit.c
- panic.c tweaks
- printk upates
- lib/ updates
- checkpatch updates
- fs/binfmt updates
- the drivers/rtc tree
- nilfs
- kmod fixes
- more kernel/exit.c
- various other misc tweaks and fixes
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (190 commits)
exit: pidns: fix/update the comments in zap_pid_ns_processes()
exit: pidns: alloc_pid() leaks pid_namespace if child_reaper is exiting
exit: exit_notify: re-use "dead" list to autoreap current
exit: reparent: call forget_original_parent() under tasklist_lock
exit: reparent: avoid find_new_reaper() if no children
exit: reparent: introduce find_alive_thread()
exit: reparent: introduce find_child_reaper()
exit: reparent: document the ->has_child_subreaper checks
exit: reparent: s/while_each_thread/for_each_thread/ in find_new_reaper()
exit: reparent: fix the cross-namespace PR_SET_CHILD_SUBREAPER reparenting
exit: reparent: fix the dead-parent PR_SET_CHILD_SUBREAPER reparenting
exit: proc: don't try to flush /proc/tgid/task/tgid
exit: release_task: fix the comment about group leader accounting
exit: wait: drop tasklist_lock before psig->c* accounting
exit: wait: don't use zombie->real_parent
exit: wait: cleanup the ptrace_reparented() checks
usermodehelper: kill the kmod_thread_locker logic
usermodehelper: don't use CLONE_VFORK for ____call_usermodehelper()
fs/hfs/catalog.c: fix comparison bug in hfs_cat_keycmp
nilfs2: fix the nilfs_iget() vs. nilfs_new_inode() races
...
|
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | | |
Memory is internally accounted in bytes, using spinlock-protected 64-bit
counters, even though the smallest accounting delta is a page. The
counter interface is also convoluted and does too many things.
Introduce a new lockless word-sized page counter API, then change all
memory accounting over to it. The translation from and to bytes then only
happens when interfacing with userspace.
The removed locking overhead is noticable when scaling beyond the per-cpu
charge caches - on a 4-socket machine with 144-threads, the following test
shows the performance differences of 288 memcgs concurrently running a
page fault benchmark:
vanilla:
18631648.500498 task-clock (msec) # 140.643 CPUs utilized ( +- 0.33% )
1,380,638 context-switches # 0.074 K/sec ( +- 0.75% )
24,390 cpu-migrations # 0.001 K/sec ( +- 8.44% )
1,843,305,768 page-faults # 0.099 M/sec ( +- 0.00% )
50,134,994,088,218 cycles # 2.691 GHz ( +- 0.33% )
<not supported> stalled-cycles-frontend
<not supported> stalled-cycles-backend
8,049,712,224,651 instructions # 0.16 insns per cycle ( +- 0.04% )
1,586,970,584,979 branches # 85.176 M/sec ( +- 0.05% )
1,724,989,949 branch-misses # 0.11% of all branches ( +- 0.48% )
132.474343877 seconds time elapsed ( +- 0.21% )
lockless:
12195979.037525 task-clock (msec) # 133.480 CPUs utilized ( +- 0.18% )
832,850 context-switches # 0.068 K/sec ( +- 0.54% )
15,624 cpu-migrations # 0.001 K/sec ( +- 10.17% )
1,843,304,774 page-faults # 0.151 M/sec ( +- 0.00% )
32,811,216,801,141 cycles # 2.690 GHz ( +- 0.18% )
<not supported> stalled-cycles-frontend
<not supported> stalled-cycles-backend
9,999,265,091,727 instructions # 0.30 insns per cycle ( +- 0.10% )
2,076,759,325,203 branches # 170.282 M/sec ( +- 0.12% )
1,656,917,214 branch-misses # 0.08% of all branches ( +- 0.55% )
91.369330729 seconds time elapsed ( +- 0.45% )
On top of improved scalability, this also gets rid of the icky long long
types in the very heart of memcg, which is great for 32 bit and also makes
the code a lot more readable.
Notable differences between the old and new API:
- res_counter_charge() and res_counter_charge_nofail() become
page_counter_try_charge() and page_counter_charge() resp. to match
the more common kernel naming scheme of try_do()/do()
- res_counter_uncharge_until() is only ever used to cancel a local
counter and never to uncharge bigger segments of a hierarchy, so
it's replaced by the simpler page_counter_cancel()
- res_counter_set_limit() is replaced by page_counter_limit(), which
expects its callers to serialize against themselves
- res_counter_memparse_write_strategy() is replaced by
page_counter_limit(), which rounds down to the nearest page size -
rather than up. This is more reasonable for explicitely requested
hard upper limits.
- to keep charging light-weight, page_counter_try_charge() charges
speculatively, only to roll back if the result exceeds the limit.
Because of this, a failing bigger charge can temporarily lock out
smaller charges that would otherwise succeed. The error is bounded
to the difference between the smallest and the biggest possible
charge size, so for memcg, this means that a failing THP charge can
send base page charges into reclaim upto 2MB (4MB) before the limit
would have been reached. This should be acceptable.
[akpm@linux-foundation.org: add includes for WARN_ON_ONCE and memparse]
[akpm@linux-foundation.org: add includes for WARN_ON_ONCE, memparse, strncmp, and PAGE_SIZE]
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@suse.cz>
Acked-by: Vladimir Davydov <vdavydov@parallels.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|\ \ \ \ \ \ \ \ \ \ \
| |/ / / / / / / / / /
|/| | | | | | | | | /
| | |_|_|_|_|_|_|_|/
| |/| | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | | |
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull VFS changes from Al Viro:
"First pile out of several (there _definitely_ will be more). Stuff in
this one:
- unification of d_splice_alias()/d_materialize_unique()
- iov_iter rewrite
- killing a bunch of ->f_path.dentry users (and f_dentry macro).
Getting that completed will make life much simpler for
unionmount/overlayfs, since then we'll be able to limit the places
sensitive to file _dentry_ to reasonably few. Which allows to have
file_inode(file) pointing to inode in a covered layer, with dentry
pointing to (negative) dentry in union one.
Still not complete, but much closer now.
- crapectomy in lustre (dead code removal, mostly)
- "let's make seq_printf return nothing" preparations
- assorted cleanups and fixes
There _definitely_ will be more piles"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (63 commits)
copy_from_iter_nocache()
new helper: iov_iter_kvec()
csum_and_copy_..._iter()
iov_iter.c: handle ITER_KVEC directly
iov_iter.c: convert copy_to_iter() to iterate_and_advance
iov_iter.c: convert copy_from_iter() to iterate_and_advance
iov_iter.c: get rid of bvec_copy_page_{to,from}_iter()
iov_iter.c: convert iov_iter_zero() to iterate_and_advance
iov_iter.c: convert iov_iter_get_pages_alloc() to iterate_all_kinds
iov_iter.c: convert iov_iter_get_pages() to iterate_all_kinds
iov_iter.c: convert iov_iter_npages() to iterate_all_kinds
iov_iter.c: iterate_and_advance
iov_iter.c: macros for iterating over iov_iter
kill f_dentry macro
dcache: fix kmemcheck warning in switch_names
new helper: audit_file()
nfsd_vfs_write(): use file_inode()
ncpfs: use file_inode()
kill f_dentry uses
lockd: get rid of ->f_path.dentry->d_sb
...
|
| |\ \ \ \ \ \ \ \ \
| | | |_|_|_|_|_|/ /
| | |/| | | | | | | |
|
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | | |
Since adding a new function to seq_file (seq_has_overflowed())
there isn't any value for functions called from seq_show to
return anything. Remove the int returns of the various
print_tuple/<foo>_print_tuple functions.
Link: http://lkml.kernel.org/p/f2e8cf8df433a197daa62cbaf124c900c708edc7.1412031505.git.joe@perches.com
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: Patrick McHardy <kaber@trash.net>
Cc: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Cc: netfilter-devel@vger.kernel.org
Cc: coreteam@netfilter.org
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
|
| | |_|_|_|/ / / /
| |/| | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | | |
The seq_printf() and friends are having their return values removed.
The print_conntrack() returns the result of seq_printf(), which is
meaningless when seq_printf() returns void. Might as well remove the
return values of print_conntrack() as well.
Link: http://lkml.kernel.org/r/20141029220107.465008329@goodmis.org
Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: Patrick McHardy <kaber@trash.net>
Cc: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Cc: netfilter-devel@vger.kernel.org
Cc: coreteam@netfilter.org
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
|
|\ \ \ \ \ \ \ \ \
| |_|_|_|_|_|_|_|/
|/| | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | | |
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler updates from Ingo Molnar:
"The main changes in this cycle are:
- 'Nested Sleep Debugging', activated when CONFIG_DEBUG_ATOMIC_SLEEP=y.
This instruments might_sleep() checks to catch places that nest
blocking primitives - such as mutex usage in a wait loop. Such
bugs can result in hard to debug races/hangs.
Another category of invalid nesting that this facility will detect
is the calling of blocking functions from within schedule() ->
sched_submit_work() -> blk_schedule_flush_plug().
There's some potential for false positives (if secondary blocking
primitives themselves are not ready yet for this facility), but the
kernel will warn once about such bugs per bootup, so the warning
isn't much of a nuisance.
This feature comes with a number of fixes, for problems uncovered
with it, so no messages are expected normally.
- Another round of sched/numa optimizations and refinements, for
CONFIG_NUMA_BALANCING=y.
- Another round of sched/dl fixes and refinements.
Plus various smaller fixes and cleanups"
* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (54 commits)
sched: Add missing rcu protection to wake_up_all_idle_cpus
sched/deadline: Introduce start_hrtick_dl() for !CONFIG_SCHED_HRTICK
sched/numa: Init numa balancing fields of init_task
sched/deadline: Remove unnecessary definitions in cpudeadline.h
sched/cpupri: Remove unnecessary definitions in cpupri.h
sched/deadline: Fix rq->dl.pushable_tasks bug in push_dl_task()
sched/fair: Fix stale overloaded status in the busiest group finding logic
sched: Move p->nr_cpus_allowed check to select_task_rq()
sched/completion: Document when to use wait_for_completion_io_*()
sched: Update comments about CLONE_NEWUTS and CLONE_NEWIPC
sched/fair: Kill task_struct::numa_entry and numa_group::task_list
sched: Refactor task_struct to use numa_faults instead of numa_* pointers
sched/deadline: Don't check CONFIG_SMP in switched_from_dl()
sched/deadline: Reschedule from switched_from_dl() after a successful pull
sched/deadline: Push task away if the deadline is equal to curr during wakeup
sched/deadline: Add deadline rq status print
sched/deadline: Fix artificial overrun introduced by yield_task_dl()
sched/rt: Clean up check_preempt_equal_prio()
sched/core: Use dl_bw_of() under rcu_read_lock_sched()
sched: Check if we got a shallowest_idle_cpu before searching for least_loaded_cpu
...
|
| |\ \ \ \ \ \ \ \
| | | |_|_|/ / / /
| | |/| | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | | |
applying more changes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
| | |/ / / / / /
| |/| | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | | |
WARNING: CPU: 1 PID: 1744 at kernel/sched/core.c:7104 __might_sleep+0x58/0x90()
do not call blocking ops when !TASK_RUNNING; state=1 set at [<ffffffff81070e10>] prepare_to_wait+0x50 /0xa0
[<ffffffff8105bc38>] __might_sleep+0x58/0x90
[<ffffffff8148c671>] lock_sock_nested+0x31/0xb0
[<ffffffff81498aaa>] sk_stream_wait_memory+0x18a/0x2d0
Which is a false positive because sk_wait_event() will already have
TASK_RUNNING at that point if it would've gone through
schedule_timeout().
So annotate with sched_annotate_sleep(); which goes away on !DEBUG builds.
Reported-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20140924082242.524407432@infradead.org
Cc: David S. Miller <davem@davemloft.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: netdev@vger.kernel.org
Cc: tglx@linutronix.de
Cc: ilya.dryomov@inktank.com
Cc: umgwanakikbuti@gmail.com
Cc: oleg@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
| |_|/ / / / /
|/| | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
TCP timestamping introduced MSG_ERRQUEUE handling for TCP sockets.
If the socket is of family AF_INET6, call ipv6_recv_error instead
of ip_recv_error.
This change is more complex than a single branch due to the loadable
ipv6 module. It reuses a pre-existing indirect function call from
ping. The ping code is safe to call, because it is part of the core
ipv6 module and always present when AF_INET6 sockets are active.
Fixes: 4ed2d765 (net-timestamp: TCP timestamping)
Signed-off-by: Willem de Bruijn <willemb@google.com>
----
It may also be worthwhile to add WARN_ON_ONCE(sk->family == AF_INET6)
to ip_recv_error.
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
Suggested-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Joe Stringer <joestringer@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|\ \ \ \ \ \ \
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | | |
Pablo Neira Ayuso says:
====================
Netfilter/IPVS fixes for net
The following patchset contains Netfilter updates for your net tree,
they are:
1) Fix missing initialization of the range structure (allocated in the
stack) in nft_masq_{ipv4, ipv6}_eval, from Daniel Borkmann.
2) Make sure the data we receive from userspace contains the req_version
structure, otherwise return an error incomplete on truncated input.
From Dan Carpenter.
3) Fix handling og skb->sk which may cause incorrect handling
of connections from a local process. Via Simon Horman, patch from
Calvin Owens.
4) Fix wrong netns in nft_compat when setting target and match params
structure.
5) Relax chain type validation in nft_compat that was recently included,
this broke the matches that need to be run from the route chain type.
Now iptables-test.py automated regression tests report success again
and we avoid the only possible problematic case, which is the use of
nat targets out of nat chain type.
6) Use match->table to validate the tablename, instead of the match->name.
Again patch for nft_compat.
7) Restore the synchronous release of objects from the commit and abort
path in nf_tables. This is causing two major problems: splats when using
nft_compat, given that matches and targets may sleep and call_rcu is
invoked from softirq context. Moreover Patrick reported possible event
notification reordering when rules refer to anonymous sets.
8) Fix race condition in between packets that are being confirmed by
conntrack and the ctnetlink flush operation. This happens since the
removal of the central spinlock. Thanks to Jesper D. Brouer to looking
into this.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | |/ / / / /
| |/| | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
The existing xtables matches and targets, when used from nft_compat, may
sleep from the destroy path, ie. when removing rules. Since the objects
are released via call_rcu from softirq context, this results in lockdep
splats and possible lockups that may be hard to reproduce.
Patrick also indicated that delayed object release via call_rcu can
cause us problems in the ordering of event notifications when anonymous
sets are in place.
So, this patch restores the synchronous object release from the commit
and abort paths. This includes a call to synchronize_rcu() to make sure
that no packets are walking on the objects that are going to be
released. This is slowier though, but it's simple and it resolves the
aforementioned problems.
This is a partial revert of c7c32e7 ("netfilter: nf_tables: defer all
object release via rcu") that was introduced in 3.16 to speed up
interaction with userspace.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
| |_|_|_|_|/
|/| | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
Most NICs that report NETIF_F_GSO_UDP_TUNNEL support VXLAN, and not
other UDP-based encapsulation protocols where the format and size of the
header differs. This patch implements a generic ndo_gso_check() for
VXLAN which will only advertise GSO support when the skb looks like it
contains VXLAN (or no UDP tunnelling at all).
Implementation shamelessly stolen from Tom Herbert:
http://thread.gmane.org/gmane.linux.network/332428/focus=333111
Signed-off-by: Joe Stringer <joestringer@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| |_|_|/ /
|/| | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
When doing GRO processing for UDP tunnels, we never add
SKB_GSO_UDP_TUNNEL to gso_type - only the type of the inner protocol
is added (such as SKB_GSO_TCPV4). The result is that if the packet is
later resegmented we will do GSO but not treat it as a tunnel. This
results in UDP fragmentation of the outer header instead of (i.e.) TCP
segmentation of the inner header as was originally on the wire.
Signed-off-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|/ / / /
| | | |
| | | |
| | | |
| | | | |
Signed-off-by: Ryo Munakata <ryomnktml@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|\ \ \ \
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Pablo Neira Ayuso says:
====================
netfilter/ipvs fixes for net
The following patchset contains fixes for netfilter/ipvs. This round of
fixes is larger than usual at this stage, specifically because of the
nf_tables bridge reject fixes that I would like to see in 3.18. The
patches are:
1) Fix a null-pointer dereference that may occur when logging
errors. This problem was introduced by 4a4739d56b0 ("ipvs: Pull
out crosses_local_route_boundary logic") in v3.17-rc5.
2) Update hook mask in nft_reject_bridge so we can also filter out
packets from there. This fixes 36d2af5 ("netfilter: nf_tables: allow
to filter from prerouting and postrouting"), which needs this chunk
to work.
3) Two patches to refactor common code to forge the IPv4 and IPv6
reject packets from the bridge. These are required by the nf_tables
reject bridge fix.
4) Fix nft_reject_bridge by avoiding the use of the IP stack to reject
packets from the bridge. The idea is to forge the reject packets and
inject them to the original port via br_deliver() which is now
exported for that purpose.
5) Restrict nft_reject_bridge to bridge prerouting and input hooks.
the original skbuff may cloned after prerouting when the bridge stack
needs to flood it to several bridge ports, it is too late to reject
the traffic.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
That can be reused by the reject bridge expression to build the reject
packet. The new functions are:
* nf_reject_ip6_tcphdr_get(): to sanitize and to obtain the TCP header.
* nf_reject_ip6hdr_put(): to build the IPv6 header.
* nf_reject_ip6_tcphdr_put(): to build the TCP header.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
| | |/ /
| |/| |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
That can be reused by the reject bridge expression to build the reject
packet. The new functions are:
* nf_reject_ip_tcphdr_get(): to sanitize and to obtain the TCP header.
* nf_reject_iphdr_put(): to build the IPv4 header.
* nf_reject_ip_tcphdr_put(): to build the TCP header.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|/ / /
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
UFO is now disabled on all drivers that work with virtio net headers,
but userland may try to send UFO/IPv6 packets anyway. Instead of
sending with ID=0, we should select identifiers on their behalf (as we
used to).
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Fixes: 916e4cf46d02 ("ipv6: reuse ip6_frag_id from ip6_ufo_append_data")
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|\ \ \
| |/ /
|/| |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Pablo Neira Ayuso says:
====================
netfilter fixes for net
The following patchset contains netfilter fixes for your net tree,
they are:
1) Fix missing MODULE_LICENSE() in the new nf_reject_ipv{4,6} modules.
2) Restrict nat and masq expressions to the nat chain type. Otherwise,
users may crash their kernel if they attach a nat/masq rule to a non
nat chain.
3) Fix hook validation in nft_compat when non-base chains are used.
Basically, initialize hook_mask to zero.
4) Make sure you use match/targets in nft_compat from the right chain
type. The existing validation relies on the table name which can be
avoided by
5) Better netlink attribute validation in nft_nat. This expression has
to reject the configuration when no address and proto configurations
are specified.
6) Interpret NFTA_NAT_REG_*_MAX if only if NFTA_NAT_REG_*_MIN is set.
Yet another sanity check to reject incorrect configurations from
userspace.
7) Conditional NAT attribute dumping depending on the existing
configuration.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
This adds the missing validation code to avoid the use of nat/masq from
non-nat chains. The validation assumes two possible configuration
scenarios:
1) Use of nat from base chain that is not of nat type. Reject this
configuration from the nft_*_init() path of the expression.
2) Use of nat from non-base chain. In this case, we have to wait until
the non-base chain is referenced by at least one base chain via
jump/goto. This is resolved from the nft_*_validate() path which is
called from nf_tables_check_loops().
The user gets an -EOPNOTSUPP in both cases.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|\ \ \
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Pull networking fixes from David Miller:
"A quick batch of bug fixes:
1) Fix build with IPV6 disabled, from Eric Dumazet.
2) Several more cases of caching SKB data pointers across calls to
pskb_may_pull(), thus referencing potentially free'd memory. From
Li RongQing.
3) DSA phy code tests operation presence improperly, instead of going:
if (x->ops->foo)
r = x->ops->foo(args);
it was going:
if (x->ops->foo(args))
r = x->ops->foo(args);
Fix from Andew Lunn"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
Net: DSA: Fix checking for get_phy_flags function
ipv6: fix a potential use after free in sit.c
ipv6: fix a potential use after free in ip6_offload.c
ipv4: fix a potential use after free in gre_offload.c
tcp: fix build error if IPv6 is not enabled
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
$ make M=net/ipv4
CC net/ipv4/route.o
In file included from net/ipv4/route.c:102:0:
include/net/tcp.h: In function ‘tcp_v6_iif’:
include/net/tcp.h:738:32: error: ‘union <anonymous>’ has no member named ‘h6’
return TCP_SKB_CB(skb)->header.h6.iif;
Signed-off-by: Eric Dumazet <edumazet@google.com>
Fixes: 870c3151382c ("ipv6: introduce tcp_v6_iif()")
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|\| | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Pull networking fixes from David Miller:
1) Include fixes for netrom and dsa (Fabian Frederick and Florian
Fainelli)
2) Fix FIXED_PHY support in stmmac, from Giuseppe CAVALLARO.
3) Several SKB use after free fixes (vxlan, openvswitch, vxlan,
ip_tunnel, fou), from Li ROngQing.
4) fec driver PTP support fixes from Luwei Zhou and Nimrod Andy.
5) Use after free in virtio_net, from Michael S Tsirkin.
6) Fix flow mask handling for megaflows in openvswitch, from Pravin B
Shelar.
7) ISDN gigaset and capi bug fixes from Tilman Schmidt.
8) Fix route leak in ip_send_unicast_reply(), from Vasily Averin.
9) Fix two eBPF JIT bugs on x86, from Alexei Starovoitov.
10) TCP_SKB_CB() reorganization caused a few regressions, fixed by Cong
Wang and Eric Dumazet.
11) Don't overwrite end of SKB when parsing malformed sctp ASCONF
chunks, from Daniel Borkmann.
12) Don't call sock_kfree_s() with NULL pointers, this function also has
the side effect of adjusting the socket memory usage. From Cong Wang.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (90 commits)
bna: fix skb->truesize underestimation
net: dsa: add includes for ethtool and phy_fixed definitions
openvswitch: Set flow-key members.
netrom: use linux/uaccess.h
dsa: Fix conversion from host device to mii bus
tipc: fix bug in bundled buffer reception
ipv6: introduce tcp_v6_iif()
sfc: add support for skb->xmit_more
r8152: return -EBUSY for runtime suspend
ipv4: fix a potential use after free in fou.c
ipv4: fix a potential use after free in ip_tunnel_core.c
hyperv: Add handling of IP header with option field in netvsc_set_hash()
openvswitch: Create right mask with disabled megaflows
vxlan: fix a free after use
openvswitch: fix a use after free
ipv4: dst_entry leak in ip_send_unicast_reply()
ipv4: clean up cookie_v4_check()
ipv4: share tcp_v4_save_options() with cookie_v4_check()
ipv4: call __ip_options_echo() in cookie_v4_check()
atm: simplify lanai.c by using module_pci_driver
...
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
net/dsa/slave.c uses functions and structures declared in phy_fixed.h
but does not explicitely include it, while dsa.h needs structure
declarations for 'struct ethtool_wolinfo' and 'struct ethtool_eee', fix
those by including the correct header files.
Fixes: ec9436baedb6 ("net: dsa: allow drivers to do link adjustment")
Fixes: ce31b31c68e7 ("net: dsa: allow updating fixed PHY link information")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Commit 971f10eca186 ("tcp: better TCP_SKB_CB layout to reduce cache line
misses") added a regression for SO_BINDTODEVICE on IPv6.
This is because we still use inet6_iif() which expects that IP6 control
block is still at the beginning of skb->cb[]
This patch adds tcp_v6_iif() helper and uses it where necessary.
Because __inet6_lookup_skb() is used by TCP and DCCP, we add an iif
parameter to it.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Fixes: 971f10eca186 ("tcp: better TCP_SKB_CB layout to reduce cache line misses")
Acked-by: Cong Wang <cwang@twopensource.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
We can retrieve opt from skb, no need to pass it as a parameter.
And opt should always be non-NULL, no need to check.
Cc: Krzysztof Kolasa <kkolasa@winsoft.pl>
Cc: Eric Dumazet <edumazet@google.com>
Tested-by: Krzysztof Kolasa <kkolasa@winsoft.pl>
Signed-off-by: Cong Wang <cwang@twopensource.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|