aboutsummaryrefslogtreecommitdiffstats
path: root/net/core
Commit message (Collapse)AuthorAge
...
| * | | net: Allow userns root control of the core of the network stack.Eric W. Biederman2012-11-18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Allow an unpriviled user who has created a user namespace, and then created a network namespace to effectively use the new network namespace, by reducing capable(CAP_NET_ADMIN) and capable(CAP_NET_RAW) calls to be ns_capable(net->user_ns, CAP_NET_ADMIN), or capable(net->user_ns, CAP_NET_RAW) calls. Settings that merely control a single network device are allowed. Either the network device is a logical network device where restrictions make no difference or the network device is hardware NIC that has been explicity moved from the initial network namespace. In general policy and network stack state changes are allowed while resource control is left unchanged. Allow ethtool ioctls. Allow binding to network devices. Allow setting the socket mark. Allow setting the socket priority. Allow setting the network device alias via sysfs. Allow setting the mtu via sysfs. Allow changing the network device flags via sysfs. Allow setting the network device group via sysfs. Allow the following network device ioctls. SIOCGMIIPHY SIOCGMIIREG SIOCSIFNAME SIOCSIFFLAGS SIOCSIFMETRIC SIOCSIFMTU SIOCSIFHWADDR SIOCSIFSLAVE SIOCADDMULTI SIOCDELMULTI SIOCSIFHWBROADCAST SIOCSMIIREG SIOCBONDENSLAVE SIOCBONDRELEASE SIOCBONDSETHWADDR SIOCBONDCHANGEACTIVE SIOCBRADDIF SIOCBRDELIF SIOCSHWTSTAMP Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | net: Allow userns root to force the scm credsEric W. Biederman2012-11-18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If the user calling sendmsg has the appropriate privieleges in their user namespace allow them to set the uid, gid, and pid in the SCM_CREDENTIALS control message to any valid value. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | net: Push capable(CAP_NET_ADMIN) into the rtnl methodsEric W. Biederman2012-11-18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | - In rtnetlink_rcv_msg convert the capable(CAP_NET_ADMIN) check to ns_capable(net->user-ns, CAP_NET_ADMIN). Allowing unprivileged users to make netlink calls to modify their local network namespace. - In the rtnetlink doit methods add capable(CAP_NET_ADMIN) so that calls that are not safe for unprivileged users are still protected. Later patches will remove the extra capable calls from methods that are safe for unprivilged users. Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | net: Don't export sysctls to unprivileged usersEric W. Biederman2012-11-18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In preparation for supporting the creation of network namespaces by unprivileged users, modify all of the per net sysctl exports and refuse to allow them to unprivileged users. This makes it safe for unprivileged users in general to access per net sysctls, and allows sysctls to be exported to unprivileged users on an individual basis as they are deemed safe. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | userns: make each net (net_ns) belong to a user_nsEric W. Biederman2012-11-18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The user namespace which creates a new network namespace owns that namespace and all resources created in it. This way we can target capability checks for privileged operations against network resources to the user_ns which created the network namespace in which the resource lives. Privilege to the user namespace which owns the network namespace, or any parent user namespace thereof, provides the same privilege to the network resource. This patch is reworked from a version originally by Serge E. Hallyn <serge.hallyn@canonical.com> Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | netns: Deduplicate and fix copy_net_ns when !CONFIG_NET_NSEric W. Biederman2012-11-18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The copy of copy_net_ns used when the network stack is not built is broken as it does not return -EINVAL when attempting to create a new network namespace. We don't even have a previous network namespace. Since we need a copy of copy_net_ns in net/net_namespace.h that is available when the networking stack is not built at all move the correct version of copy_net_ns from net_namespace.c into net_namespace.h Leaving us with just 2 versions of copy_net_ns. One version for when we compile in network namespace suport and another stub for all other occasions. Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller2012-11-17
| |\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | Minor line offset auto-merges. Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | net: use right lock in __dev_remove_offloadEric Dumazet2012-11-16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | offload_base is protected by offload_lock, not ptype_lock Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Vlad Yasevich <vyasevic@redhat.com> Acked-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | net: Remove code duplication between offload structuresVlad Yasevich2012-11-15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Move the offload callbacks into its own structure. Signed-off-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | net: Switch to using the new packet offload infrustructureVlad Yasevich2012-11-15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Convert to using the new GSO/GRO registration mechanism and new packet offload structure. Signed-off-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | net: Add generic packet offload infrastructure.Vlad Yasevich2012-11-15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Create a new data structure to contain the GRO/GSO callbacks and add a new registration mechanism. Singed-off-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller2012-11-10
| |\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Conflicts: drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c Minor conflict between the BCM_CNIC define removal in net-next and a bug fix added to net. Based upon a conflict resolution patch posted by Stephen Rothwell. Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | net: fix bridge notify hook to manage flags correctlyJohn Fastabend2012-11-03
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The bridge notify hook rtnl_bridge_notify() was not handling the case where the master flags was set or with both flags set. First flags are not being passed correctly and second the logic to parse them is broken. This patch passes the original flags value and fixes the logic. Reported-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: John Fastabend <john.r.fastabend@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | pktgen: clean up ktime_t helpersDaniel Borkmann2012-11-03
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Some years ago, the ktime_t helper functions ktime_now() and ktime_lt() have been introduced. Instead of defining them inside pktgen.c, they should either use ktime_t library functions or, if not available, they should be defined in ktime.h, so that also others can benefit from them. ktime_compare() is introduced with a similar notion as in timespec_compare(). Signed-off-by: Daniel Borkmann <daniel.borkmann@tik.ee.ethz.ch> Cc: Cong Wang <xiyou.wangcong@gmail.com> Cc: Stephen Hemminger <shemminger@vyatta.com> Cc: Thomas Gleixner <tglx@linutronix.de> Acked-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | net: Fix continued iteration in rtnl_bridge_getlink()Ben Hutchings2012-11-02
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit e5a55a898720096f43bc24938f8875c0a1b34cd7 ('net: create generic bridge ops') broke the handling of a non-zero starting index in rtnl_bridge_getlink() (based on the old br_dump_ifinfo()). When the starting index is non-zero, we need to increment the current index for each entry that we are skipping. Also, we need to check the index before both cases, since we may previously have stopped iteration between getting information about a device from its master and from itself. Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Tested-by: John Fastabend <john.r.fastabend@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | skb: api to report errors for zero copy skbsMichael S. Tsirkin2012-11-02
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Orphaning frags for zero copy skbs needs to allocate data in atomic context so is has a chance to fail. If it does we currently discard the skb which is safe, but we don't report anything to the caller, so it can not recover by e.g. disabling zero copy. Add an API to free skb reporting such errors: this is used by tun in case orphaning frags fails. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | skb: report completion status for zero copy skbsMichael S. Tsirkin2012-11-02
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Even if skb is marked for zero copy, net core might still decide to copy it later which is somewhat slower than a copy in user context: besides copying the data we need to pin/unpin the pages. Add a parameter reporting such cases through zero copy callback: if this happens a lot, device can take this into account and switch to copying in user context. This patch updates all users but ignores the passed value for now: it will be used by follow-up patches. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | sk-filter: Add ability to get socket filter program (v2)Pavel Emelyanov2012-11-01
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The SO_ATTACH_FILTER option is set only. I propose to add the get ability by using SO_ATTACH_FILTER in getsockopt. To be less irritating to eyes the SO_GET_FILTER alias to it is declared. This ability is required by checkpoint-restore project to be able to save full state of a socket. There are two issues with getting filter back. First, kernel modifies the sock_filter->code on filter load, thus in order to return the filter element back to user we have to decode it into user-visible constants. Fortunately the modification in question is interconvertible. Second, the BPF_S_ALU_DIV_K code modifies the command argument k to speed up the run-time division by doing kernel_k = reciprocal(user_k). Bad news is that different user_k may result in same kernel_k, so we can't get the original user_k back. Good news is that we don't have to do it. What we need to is calculate a user2_k so, that reciprocal(user2_k) == reciprocal(user_k) == kernel_k i.e. if it's re-loaded back the compiled again value will be exactly the same as it was. That said, the user2_k can be calculated like this user2_k = reciprocal(kernel_k) with an exception, that if kernel_k == 0, then user2_k == 1. The optlen argument is treated like this -- when zero, kernel returns the amount of sock_fprog elements in filter, otherwise it should be large enough for the sock_fprog array. changes since v1: * Declared SO_GET_FILTER in all arch headers * Added decode of vlan-tag codes Signed-off-by: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | net: filter: add vlan tag accessEric Dumazet2012-10-31
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | BPF filters lack ability to access skb->vlan_tci This patch adds two new ancillary accessors : SKF_AD_VLAN_TAG (44) mapped to vlan_tx_tag_get(skb) SKF_AD_VLAN_TAG_PRESENT (48) mapped to vlan_tx_tag_present(skb) This allows libpcap/tcpdump to use a kernel filter instead of having to fallback to accept all packets, then filter them in user space. Signed-off-by: Eric Dumazet <edumazet@google.com> Suggested-by: Ani Sinha <ani@aristanetworks.com> Suggested-by: Daniel Borkmann <danborkmann@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | ixgbe: add setlink, getlink support to ixgbe and ixgbevfJohn Fastabend2012-10-31
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This adds support for the net device ops to manage the embedded hardware bridge on ixgbe devices. With this patch the bridge mode can be toggled between VEB and VEPA to support stacking macvlan devices or using the embedded switch without any SW component in 802.1Qbg/br environments. Additionally, this adds source address pruning to the ixgbevf driver to prune any frames sent back from a reflective relay on the switch. This is required because the existing hardware does not support this. Without it frames get pushed into the stack with its own src mac which is invalid per 802.1Qbg VEPA definition. Signed-off-by: John Fastabend <john.r.fastabend@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | net: set and query VEB/VEPA bridge mode via PF_BRIDGEJohn Fastabend2012-10-31
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Hardware switches may support enabling and disabling the loopback switch which puts the device in a VEPA mode defined in the IEEE 802.1Qbg specification. In this mode frames are not switched in the hardware but sent directly to the switch. SR-IOV capable NICs will likely support this mode I am aware of at least two such devices. Also I am told (but don't have any of this hardware available) that there are devices that only support VEPA modes. In these cases it is important at a minimum to be able to query these attributes. This patch adds an additional IFLA_BRIDGE_MODE attribute that can be set and dumped via the PF_BRIDGE:{SET|GET}LINK operations. Also anticipating bridge attributes that may be common for both embedded bridges and software bridges this adds a flags attribute IFLA_BRIDGE_FLAGS currently used to determine if the command or event is being generated to/from an embedded bridge or software bridge. Finally, the event generation is pulled out of the bridge module and into rtnetlink proper. For example using the macvlan driver in VEPA mode on top of an embedded switch requires putting the embedded switch into a VEPA mode to get the expected results. -------- -------- | VEPA | | VEPA | <-- macvlan vepa edge relays -------- -------- | | | | ------------------ | VEPA | <-- embedded switch in NIC ------------------ | | ------------------- | external switch | <-- shiny new physical ------------------- switch with VEPA support A packet sent from the macvlan VEPA at the top could be loopbacked on the embedded switch and never seen by the external switch. So in order for this to work the embedded switch needs to be set in the VEPA state via the above described commands. By making these attributes nested in IFLA_AF_SPEC we allow future extensions to be made as needed. CC: Lennert Buytenhek <buytenh@wantstofly.org> CC: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: John Fastabend <john.r.fastabend@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | net: create generic bridge opsJohn Fastabend2012-10-31
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The PF_BRIDGE:RTM_{GET|SET}LINK nlmsg family and type are currently embedded in the ./net/bridge module. This prohibits them from being used by other bridging devices. One example of this being hardware that has embedded bridging components. In order to use these nlmsg types more generically this patch adds two net_device_ops hooks. One to set link bridge attributes and another to dump the current bride attributes. ndo_bridge_setlink() ndo_bridge_getlink() CC: Lennert Buytenhek <buytenh@wantstofly.org> CC: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: John Fastabend <john.r.fastabend@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | cgroup: net_cls: Pass in task to sock_update_classid()Daniel Wagner2012-10-26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | sock_update_classid() assumes that the update operation always are applied on the current task. sock_update_classid() needs to know on which tasks to work on in order to be able to migrate task between cgroups using the struct cgroup_subsys attach() callback. Signed-off-by: Daniel Wagner <daniel.wagner@bmw-carit.de> Cc: "David S. Miller" <davem@davemloft.net> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Glauber Costa <glommer@parallels.com> Cc: Joe Perches <joe@perches.com> Cc: Neil Horman <nhorman@tuxdriver.com> Cc: Stanislav Kinsbursky <skinsbursky@parallels.com> Cc: Tejun Heo <tj@kernel.org> Cc: <netdev@vger.kernel.org> Cc: <cgroups@vger.kernel.org> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | cgroup: net_cls: Remove rcu_read_lock/unlockDaniel Wagner2012-10-26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As Eric pointed out: "Hey task_cls_classid() has its own rcu protection since commit 3fb5a991916091a908d (cls_cgroup: Fix rcu lockdep warning) So we can safely revert Paul commit (1144182a8757f2a1) (We no longer need rcu_read_lock/unlock here)" Signed-off-by: Daniel Wagner <daniel.wagner@bmw-carit.de> Cc: "David S. Miller" <davem@davemloft.net> Cc: Eric Dumazet <edumazet@google.com> Cc: Glauber Costa <glommer@parallels.com> Cc: Li Zefan <lizefan@huawei.com> Cc: Neil Horman <nhorman@tuxdriver.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Tejun Heo <tj@kernel.org> Cc: netdev@vger.kernel.org Cc: cgroups@vger.kernel.org Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | cgroup: net_prio: Mark local used function staticDaniel Wagner2012-10-26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | net_prio_attach() is only access via cgroup_subsys callbacks, therefore we can reduce the visibility of this function. Signed-off-by: Daniel Wagner <daniel.wagner@bmw-carit.de> Cc: "David S. Miller" <davem@davemloft.net> Cc: John Fastabend <john.r.fastabend@intel.com> Cc: Li Zefan <lizefan@huawei.com> Cc: Neil Horman <nhorman@tuxdriver.com> Cc: Tejun Heo <tj@kernel.org> Cc: <netdev@vger.kernel.org> Cc: <cgroups@vger.kernel.org> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | netlink: cleanup the unnecessary return value checkHans Zhang2012-10-23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It's no needed to check the return value of tab since the NULL situation has been handled already, and the rtnl_msg_handlers[PF_UNSPEC] has been initialized as non-NULL during the rtnetlink_init(). Signed-off-by: Hans Zhang <zhanghonghui@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | net:dev: remove double indentical assignment in dev_change_net_namespace().Rami Rosen2012-10-21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch removes double assignment of err to -EINVAL in dev_change_net_namespace(). Signed-off-by: Rami Rosen <ramirose@gmail.com> Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | sockopt: Make SO_BINDTODEVICE readablePavel Emelyanov2012-10-21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The SO_BINDTODEVICE option is the only SOL_SOCKET one that can be set, but cannot be get via sockopt API. The only way we can find the device id a socket is bound to is via sock-diag interface. But the diag works only on hashed sockets, while the opt in question can be set for yet unhashed one. That said, in order to know what device a socket is bound to (we do want to know this in checkpoint-restore project) I propose to make this option getsockopt-able and report the respective device index. Another solution to the problem might be to teach the sock-diag reporting info on unhashed sockets. Should I go this way instead? Signed-off-by: Pavel Emelyanov <xemul@parallels.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | pktgen: Use ipv6_addr_anyJoe Perches2012-10-21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Use the standard test for a non-zero ipv6 address. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | | | Merge branch 'for-3.8' of ↵Linus Torvalds2012-12-12
|\ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup Pull cgroup changes from Tejun Heo: "A lot of activities on cgroup side. The big changes are focused on making cgroup hierarchy handling saner. - cgroup_rmdir() had peculiar semantics - it allowed cgroup destruction to be vetoed by individual controllers and tried to drain refcnt synchronously. The vetoing never worked properly and caused good deal of contortions in cgroup. memcg was the last reamining user. Michal Hocko removed the usage and cgroup_rmdir() path has been simplified significantly. This was done in a separate branch so that the memcg people can base further memcg changes on top. - The above allowed cleaning up cgroup lifecycle management and implementation of generic cgroup iterators which are used to improve hierarchy support. - cgroup_freezer updated to allow migration in and out of a frozen cgroup and handle hierarchy. If a cgroup is frozen, all descendant cgroups are frozen. - netcls_cgroup and netprio_cgroup updated to handle hierarchy properly. - Various fixes and cleanups. - Two merge commits. One to pull in memcg and rmdir cleanups (needed to build iterators). The other pulled in cgroup/for-3.7-fixes for device_cgroup fixes so that further device_cgroup patches can be stacked on top." Fixed up a trivial conflict in mm/memcontrol.c as per Tejun (due to commit bea8c150a7 ("memcg: fix hotplugged memory zone oops") in master touching code close to commit 2ef37d3fe4 ("memcg: Simplify mem_cgroup_force_empty_list error handling") in for-3.8) * 'for-3.8' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: (65 commits) cgroup: update Documentation/cgroups/00-INDEX cgroup_rm_file: don't delete the uncreated files cgroup: remove subsystem files when remounting cgroup cgroup: use cgroup_addrm_files() in cgroup_clear_directory() cgroup: warn about broken hierarchies only after css_online cgroup: list_del_init() on removed events cgroup: fix lockdep warning for event_control cgroup: move list add after list head initilization netprio_cgroup: allow nesting and inherit config on cgroup creation netprio_cgroup: implement netprio[_set]_prio() helpers netprio_cgroup: use cgroup->id instead of cgroup_netprio_state->prioidx netprio_cgroup: reimplement priomap expansion netprio_cgroup: shorten variable names in extend_netdev_table() netprio_cgroup: simplify write_priomap() netcls_cgroup: move config inheritance to ->css_online() and remove .broken_hierarchy marking cgroup: remove obsolete guarantee from cgroup_task_migrate. cgroup: add cgroup->id cgroup, cpuset: remove cgroup_subsys->post_clone() cgroup: s/CGRP_CLONE_CHILDREN/CGRP_CPUSET_CLONE_CHILDREN/ cgroup: rename ->create/post_create/pre_destroy/destroy() to ->css_alloc/online/offline/free() ...
| * | | | | | netprio_cgroup: allow nesting and inherit config on cgroup creationTejun Heo2012-11-22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Inherit netprio configuration from ->css_online(), allow nesting and remove .broken_hierarchy marking. This makes netprio_cgroup's behavior match netcls_cgroup's. Note that this patch changes userland-visible behavior. Nesting is allowed and the first level cgroups below the root cgroup behave differently - they inherit priorities from the root cgroup on creation instead of starting with 0. This is unfortunate but not doing so is much crazier. Signed-off-by: Tejun Heo <tj@kernel.org> Tested-and-Acked-by: Daniel Wagner <daniel.wagner@bmw-carit.de> Acked-by: David S. Miller <davem@davemloft.net>
| * | | | | | netprio_cgroup: implement netprio[_set]_prio() helpersTejun Heo2012-11-22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Introduce two helpers - netprio_prio() and netprio_set_prio() - which hide the details of priomap access and expansion. This will help implementing hierarchy support. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Neil Horman <nhorman@tuxdriver.com> Tested-and-Acked-by: Daniel Wagner <daniel.wagner@bmw-carit.de> Acked-by: David S. Miller <davem@davemloft.net>
| * | | | | | netprio_cgroup: use cgroup->id instead of cgroup_netprio_state->prioidxTejun Heo2012-11-22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | With priomap expansion no longer depending on knowing max id allocated, netprio_cgroup can use cgroup->id insted of cs->prioidx. Drop prioidx alloc/free logic and convert all uses to cgroup->id. * In cgrp_css_alloc(), parent->id test is moved above @cs allocation to simplify error path. * In cgrp_css_free(), @cs assignment is made initialization. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Neil Horman <nhorman@tuxdriver.com> Tested-and-Acked-by: Daniel Wagner <daniel.wagner@bmw-carit.de> Acked-by: David S. Miller <davem@davemloft.net>
| * | | | | | netprio_cgroup: reimplement priomap expansionTejun Heo2012-11-22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | netprio kept track of the highest prioidx allocated and resized priomaps accordingly when necessary. This makes it necessary to keep track of prioidx allocation and may end up resizing on every new prioidx. Update extend_netdev_table() such that it takes @target_idx which the priomap should be able to accomodate. If the priomap is large enough, nothing happens; otherwise, the size is doubled until @target_idx can be accomodated. This makes max_prioidx and write_update_netdev_table() unnecessary. write_priomap() now calls extend_netdev_table() directly. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Neil Horman <nhorman@tuxdriver.com> Tested-and-Acked-by: Daniel Wagner <daniel.wagner@bmw-carit.de> Acked-by: David S. Miller <davem@davemloft.net>
| * | | | | | netprio_cgroup: shorten variable names in extend_netdev_table()Tejun Heo2012-11-22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The function is about to go through a rewrite. In preparation, shorten the variable names so that we don't repeat "priomap" so often. This patch is cosmetic. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Neil Horman <nhorman@tuxdriver.com> Tested-and-Acked-by: Daniel Wagner <daniel.wagner@bmw-carit.de> Acked-by: David S. Miller <davem@davemloft.net>
| * | | | | | netprio_cgroup: simplify write_priomap()Tejun Heo2012-11-22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | sscanf() doesn't bite. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Neil Horman <nhorman@tuxdriver.com> Tested-and-Acked-by: Daniel Wagner <daniel.wagner@bmw-carit.de> Acked-by: David S. Miller <davem@davemloft.net>
| * | | | | | cgroup: rename ->create/post_create/pre_destroy/destroy() to ↵Tejun Heo2012-11-19
| | |_|_|_|/ | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ->css_alloc/online/offline/free() Rename cgroup_subsys css lifetime related callbacks to better describe what their roles are. Also, update documentation. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com>
* | | | | | net: gro: fix possible panic in skb_gro_receive()Eric Dumazet2012-12-07
| |_|_|_|/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 2e71a6f8084e (net: gro: selective flush of packets) added a bug for skbs using frag_list. This part of the GRO stack is rarely used, as it needs skb not using a page fragment for their skb->head. Most drivers do use a page fragment, but some of them use GFP_KERNEL allocations for the initial fill of their RX ring buffer. napi_gro_flush() overwrite skb->prev that was used for these skb to point to the last skb in frag_list. Fix this using a separate field in struct napi_gro_cb to point to the last fragment. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | | Merge branch 'master' of ↵John W. Linville2012-11-21
|\ \ \ \ \ | |_|_|_|/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless into for-davem John W. Linville says: ==================== This is a batch of fixes intended for 3.7... Included are two pulls. Regarding the mac80211 tree, Johannes says: "Please pull my mac80211.git tree (see below) to get two more fixes for 3.7. Both fix regressions introduced *before* this cycle that weren't noticed until now, one for IBSS not cleaning up properly and the other to add back the "wireless" sysfs directory for Fedora's startup scripts." Regarding the iwlwifi tree, Johannes says: "Please also pull my iwlwifi.git tree, I have two fixes: one to remove a spurious warning that can actually trigger in legitimate situations, and the other to fix a regression from when monitor mode was changed to use the "sniffer" firmware mode." Also included is an nfc tree pull. Samuel says: "We mostly have pn533 fixes here, 2 memory leaks and an early unlocking fix. Moreover, we also have an LLCP adapter linked list insertion fix." On top of that, a few more bits... Albert Pool adds a USB ID to rtlwifi. Bing Zhao provides two mwifiex fixes -- one to fix a system hang during a command timeout, and the other to properly report a suspend error to the MMC core. Finally, Sujith Manoharan fixes a thinko that would trigger an ath9k hang during device reset. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | Merge branch 'for-john' of ↵John W. Linville2012-11-19
| |\ \ \ \ | | |_|/ / | |/| | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211
| | * | | wireless: add back sysfs directoryJohannes Berg2012-11-16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 35b2a113cb0298d4f9a1263338b456094a414057 broke (at least) Fedora's networking scripts, they check for the existence of the wireless directory. As the files aren't used, add the directory back and not the files. Also do it for both drivers based on the old wireless extensions and cfg80211, regardless of whether the compat code for wext is built into cfg80211 or not. Cc: stable@vger.kernel.org [3.6] Reported-by: Dave Airlie <airlied@gmail.com> Reported-by: Bill Nottingham <notting@redhat.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
* | | | | net-rps: Fix brokeness causing OOO packetsTom Herbert2012-11-16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In commit c445477d74ab3779 which adds aRFS to the kernel, the CPU selected for RFS is not set correctly when CPU is changing. This is causing OOO packets and probably other issues. Signed-off-by: Tom Herbert <therbert@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Acked-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | | net: correct check in dev_addr_del()Jiri Pirko2012-11-15
| |_|_|/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | Check (ha->addr == dev->dev_addr) is always true because dev_addr_init() sets this. Correct the check to behave properly on addr removal. Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | af-packet: fix oops when socket is not presentEric Leblond2012-11-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Due to a NULL dereference, the following patch is causing oops in normal trafic condition: commit c0de08d04215031d68fa13af36f347a6cfa252ca Author: Eric Leblond <eric@regit.org> Date:   Thu Aug 16 22:02:58 2012 +0000     af_packet: don't emit packet on orig fanout group This buggy patch was a feature fix and has reached most stable branches. When skb->sk is NULL and when packet fanout is used, there is a crash in match_fanout_group where skb->sk is accessed. This patch fixes the issue by returning false as soon as the socket is NULL: this correspond to the wanted behavior because the kernel as to resend the skb to all the listening socket in this case. Signed-off-by: Eric Leblond <eric@regit.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | rtnetlink: Use nlmsg type RTM_NEWNEIGH from dflt fdb dumpJohn Fastabend2012-11-03
| |_|/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Change the dflt fdb dump handler to use RTM_NEWNEIGH to be compatible with bridge dump routines. The dump reply from the network driver handlers should match the reply from bridge handler. The fact they were not in the ixgbe case was effectively a bug. This patch resolves it. Applications that were not checking the nlmsg type will continue to work. And now applications that do check the type will work as expected. Signed-off-by: John Fastabend <john.r.fastabend@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | net: fix secpath kmemleakEric Dumazet2012-10-22
|/ / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Mike Kazantsev found 3.5 kernels and beyond were leaking memory, and tracked the faulty commit to a1c7fff7e18f59e ("net: netdev_alloc_skb() use build_skb()") While this commit seems fine, it uncovered a bug introduced in commit bad43ca8325 ("net: introduce skb_try_coalesce()), in function kfree_skb_partial()"): If head is stolen, we free the sk_buff, without removing references on secpath (skb->sp). So IPsec + IP defrag/reassembly (using skb coalescing), or TCP coalescing could leak secpath objects. Fix this bug by calling skb_release_head_state(skb) to properly release all possible references to linked objects. Reported-by: Mike Kazantsev <mk.fraggod@gmail.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Bisected-by: Mike Kazantsev <mk.fraggod@gmail.com> Tested-by: Mike Kazantsev <mk.fraggod@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: add doc for in4_pton()Amerigo Wang2012-10-12
| | | | | | | | | | | | | | | | | | It is not easy to use in4_pton() correctly without reading its definition, so add some doc for it. Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Cong Wang <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: add doc for in6_pton()Amerigo Wang2012-10-12
| | | | | | | | | | | | | | | | | | It is not easy to use in6_pton() correctly without reading its definition, so add some doc for it. Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Cong Wang <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | pktgen: replace scan_ip6() with in6_pton()Amerigo Wang2012-10-10
| | | | | | | | | | | | Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Cong Wang <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | pktgen: enable automatic IPv6 address settingAmerigo Wang2012-10-10
| | | | | | | | | | | | Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Cong Wang <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>