aboutsummaryrefslogtreecommitdiffstats
path: root/net
Commit message (Collapse)AuthorAge
...
| | * | | | | | | | ipv6: gre: correct calculation of max_headroomHannes Frederic Sowa2013-10-01
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | gre_hlen already accounts for sizeof(struct ipv6_hdr) + gre header, so initialize max_headroom to zero. Otherwise the if (encap_limit >= 0) { max_headroom += 8; mtu -= 8; } increments an uninitialized variable before max_headroom was reset. Found with coverity: 728539 Cc: Dmitry Kozlov <xeb@mail.ru> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| | * | | | | | | | tcp: TSQ can use a dynamic limitEric Dumazet2013-09-30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When TCP Small Queues was added, we used a sysctl to limit amount of packets queues on Qdisc/device queues for a given TCP flow. Problem is this limit is either too big for low rates, or too small for high rates. Now TCP stack has rate estimation in sk->sk_pacing_rate, and TSO auto sizing, it can better control number of packets in Qdisc/device queues. New limit is two packets or at least 1 to 2 ms worth of packets. Low rates flows benefit from this patch by having even smaller number of packets in queues, allowing for faster recovery, better RTT estimations. High rates flows benefit from this patch by allowing more than 2 packets in flight as we had reports this was a limiting factor to reach line rate. [ In particular if TX completion is delayed because of coalescing parameters ] Example for a single flow on 10Gbp link controlled by FQ/pacing 14 packets in flight instead of 2 $ tc -s -d qd qdisc fq 8001: dev eth0 root refcnt 32 limit 10000p flow_limit 100p buckets 1024 quantum 3028 initial_quantum 15140 Sent 1168459366606 bytes 771822841 pkt (dropped 0, overlimits 0 requeues 6822476) rate 9346Mbit 771713pps backlog 953820b 14p requeues 6822476 2047 flow, 2046 inactive, 1 throttled, delay 15673 ns 2372 gc, 0 highprio, 0 retrans, 9739249 throttled, 0 flows_plimit Note that sk_pacing_rate is currently set to twice the actual rate, but this might be refined in the future when a flow is in congestion avoidance. Additional change : skb->destructor should be set to tcp_wfree(). A future patch (for linux 3.13+) might remove tcp_limit_output_bytes Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Wei Liu <wei.liu2@citrix.com> Cc: Cong Wang <xiyou.wangcong@gmail.com> Cc: Yuchung Cheng <ycheng@google.com> Cc: Neal Cardwell <ncardwell@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| | * | | | | | | | pkt_sched: fq: qdisc dismantle fixesEric Dumazet2013-09-30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | fq_reset() should drops all packets in queue, including throttled flows. This patch moves code from fq_destroy() to fq_reset() to do the cleaning. fq_change() must stop calling fq_dequeue() if all remaining packets are from throttled flows. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| | * | | | | | | | net: flow_dissector: fix thoff for IPPROTO_AHEric Dumazet2013-09-30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In commit 8ed781668dd49 ("flow_keys: include thoff into flow_keys for later usage"), we missed that existing code was using nhoff as a temporary variable that could not always contain transport header offset. This is not a problem for TCP/UDP because port offset (@poff) is 0 for these protocols. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Daniel Borkmann <dborkman@redhat.com> Cc: Nikolay Aleksandrov <nikolay@redhat.com> Acked-by: Nikolay Aleksandrov <nikolay@redhat.com> Acked-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| | * | | | | | | | ipv6: Fix preferred_lft not updating in some casesPaul Marks2013-09-30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Consider the scenario where an IPv6 router is advertising a fixed preferred_lft of 1800 seconds, while the valid_lft begins at 3600 seconds and counts down in realtime. A client should reset its preferred_lft to 1800 every time the RA is received, but a bug is causing Linux to ignore the update. The core problem is here: if (prefered_lft != ifp->prefered_lft) { Note that ifp->prefered_lft is an offset, so it doesn't decrease over time. Thus, the comparison is always (1800 != 1800), which fails to trigger an update. The most direct solution would be to compute a "stored_prefered_lft", and use that value in the comparison. But I think that trying to filter out unnecessary updates here is a premature optimization. In order for the filter to apply, both of these would need to hold: - The advertised valid_lft and preferred_lft are both declining in real time. - No clock skew exists between the router & client. So in this patch, I've set "update_lft = 1" unconditionally, which allows the surrounding code to be greatly simplified. Signed-off-by: Paul Marks <pmarks@google.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
| | * | | | | | | | ip_tunnel: Do not use stale inner_iph pointer.Pravin B Shelar2013-09-30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | While sending packet skb_cow_head() can change skb header which invalidates inner_iph pointer to skb header. Following patch avoid using it. Found by code inspection. This bug was introduced by commit 0e6fbc5b6c6218 (ip_tunnels: extend iptunnel_xmit()). Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| | * | | | | | | | net: net_secret should not depend on TCPEric Dumazet2013-09-28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A host might need net_secret[] and never open a single socket. Problem added in commit aebda156a570782 ("net: defer net_secret[] initialization") Based on prior patch from Hannes Frederic Sowa. Reported-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Hannes Frederic Sowa <hannes@strressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
| | * | | | | | | | net: Delay default_device_exit_batch until no devices are unregistering v2Eric W. Biederman2013-09-28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There is currently serialization network namespaces exiting and network devices exiting as the final part of netdev_run_todo does not happen under the rtnl_lock. This is compounded by the fact that the only list of devices unregistering in netdev_run_todo is local to the netdev_run_todo. This lack of serialization in extreme cases results in network devices unregistering in netdev_run_todo after the loopback device of their network namespace has been freed (making dst_ifdown unsafe), and after the their network namespace has exited (making the NETDEV_UNREGISTER, and NETDEV_UNREGISTER_FINAL callbacks unsafe). Add the missing serialization by a per network namespace count of how many network devices are unregistering and having a wait queue that is woken up whenever the count is decreased. The count and wait queue allow default_device_exit_batch to wait until all of the unregistration activity for a network namespace has finished before proceeding to unregister the loopback device and then allowing the network namespace to exit. Only a single global wait queue is used because there is a single global lock, and there is a single waiter, per network namespace wait queues would be a waste of resources. The per network namespace count of unregistering devices gives a progress guarantee because the number of network devices unregistering in an exiting network namespace must ultimately drop to zero (assuming network device unregistration completes). The basic logic remains the same as in v1. This patch is now half comment and half rtnl_lock_unregistering an expanded version of wait_event performs no extra work in the common case where no network devices are unregistering when we get to default_device_exit_batch. Reported-by: Francesco Ruggeri <fruggeri@aristanetworks.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| | * | | | | | | | IPv6 NAT: Do not drop DNATed 6to4/6rd packetsCatalin\(ux\) M. BOIE2013-09-28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When a router is doing DNAT for 6to4/6rd packets the latest anti-spoofing commit 218774dc ("ipv6: add anti-spoofing checks for 6to4 and 6rd") will drop them because the IPv6 address embedded does not match the IPv4 destination. This patch will allow them to pass by testing if we have an address that matches on 6to4/6rd interface. I have been hit by this problem using Fedora and IPV6TO4_IPV4ADDR. Also, log the dropped packets (with rate limit). Signed-off-by: Catalin(ux) M. BOIE <catab@embedromix.ro> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
| | * | | | | | | | Merge branch 'master' of ↵John W. Linville2013-09-27
| | |\ \ \ \ \ \ \ \ | | | | |_|_|_|_|/ / | | | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless into for-davem Also fixed-up a badly indented closing brace... Signed-off-by: John W. Linville <linville@tuxdriver.com>
| | | * | | | | | | Merge branch 'master' of ↵John W. Linville2013-09-26
| | | |\ \ \ \ \ \ \ | | | | |_|_|/ / / / | | | |/| | | | / / | | | | | |_|_|/ / | | | | |/| | | | git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth
| | | | * | | | | Bluetooth: don't release the port in rfcomm_dev_state_change()Gianluca Anzolin2013-09-20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When the dlc is closed, rfcomm_dev_state_change() tries to release the port in the case it cannot get a reference to the tty. However this is racy and not even needed. Infact as Peter Hurley points out: 1. Only consider dlcs that are 'stolen' from a connected socket, ie. reused. Allocated dlcs cannot have been closed prior to port activate and so for these dlcs a tty reference will always be avail in rfcomm_dev_state_change() -- except for the conditions covered by #2b below. 2. If a tty was at some point previously created for this rfcomm, then either (a) the tty reference is still avail, so rfcomm_dev_state_change() will perform a hangup. So nothing to do, or, (b) the tty reference is no longer avail, and the tty_port will be destroyed by the last tty_port_put() in rfcomm_tty_cleanup. Again, no action required. 3. Prior to obtaining the dlc lock in rfcomm_dev_add(), rfcomm_dev_state_change() will not 'see' a rfcomm_dev so nothing to do here. 4. After releasing the dlc lock in rfcomm_dev_add(), rfcomm_dev_state_change() will 'see' an incomplete rfcomm_dev if a tty reference could not be obtained. Again, the best thing to do here is nothing. Any future attempted open() will block on rfcomm_dev_carrier_raised(). The unconnected device will exist until released by ioctl(RFCOMMRELEASEDEV). The patch removes the aforementioned code and uses the tty_port_tty_hangup() helper to hangup the tty. Signed-off-by: Gianluca Anzolin <gianluca@sottospazio.it> Reviewed-by: Peter Hurley <peter@hurleysoftware.com> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
| | | | * | | | | Bluetooth: Fix rfkill functionality during the HCI setup stageJohan Hedberg2013-09-18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We need to let the setup stage complete cleanly even when the HCI device is rfkilled. Otherwise the HCI device will stay in an undefined state and never get notified to user space through mgmt (even when it gets unblocked through rfkill). This patch makes sure that hci_dev_open() can be called in the HCI_SETUP stage, that blocking the device doesn't abort the setup stage, and that the device gets proper powered down as soon as the setup stage completes in case it was blocked meanwhile. The bug that this patch fixed can be very easily reproduced using e.g. the rfkill command line too. By running "rfkill block all" before inserting a Bluetooth dongle the resulting HCI device goes into a state where it is never announced over mgmt, not even when "rfkill unblock all" is run. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Cc: stable@vger.kernel.org Acked-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
| | | | * | | | | Bluetooth: Introduce a new HCI_RFKILLED flagJohan Hedberg2013-09-18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This makes it more convenient to check for rfkill (no need to check for dev->rfkill before calling rfkill_blocked()) and also avoids potential races if the RFKILL state needs to be checked from within the rfkill callback. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Cc: stable@vger.kernel.org Acked-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
| | | | * | | | | Bluetooth: Fix ACL alive for long in case of non pariable devicesSyam Sidhardhan2013-09-16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | For certain devices (ex: HID mouse), support for authentication, pairing and bonding is optional. For such devices, the ACL alive for too long after the L2CAP disconnection. To avoid the ACL alive for too long after L2CAP disconnection, reset the ACL disconnect timeout back to HCI_DISCONN_TIMEOUT during L2CAP connect. While merging the commit id:a9ea3ed9b71cc3271dd59e76f65748adcaa76422 this issue might have introduced. Hcidump info: sh-4.1# /opt/hcidump -Xt 2013-08-05 16:49:00.894129 < ACL data: handle 12 flags 0x00 dlen 12 L2CAP(s): Disconn req: dcid 0x004a scid 0x0041 2013-08-05 16:49:00.894195 < HCI Command: Exit Sniff Mode (0x02|0x0004) plen 2 handle 12 2013-08-05 16:49:00.894269 < ACL data: handle 12 flags 0x00 dlen 12 L2CAP(s): Disconn req: dcid 0x0049 scid 0x0040 2013-08-05 16:49:00.895645 > HCI Event: Command Status (0x0f) plen 4 Exit Sniff Mode (0x02|0x0004) status 0x00 ncmd 1 2013-08-05 16:49:00.934391 > HCI Event: Mode Change (0x14) plen 6 status 0x00 handle 12 mode 0x00 interval 0 Mode: Active 2013-08-05 16:49:00.936592 > HCI Event: Number of Completed Packets (0x13) plen 5 handle 12 packets 2 2013-08-05 16:49:00.951577 > ACL data: handle 12 flags 0x02 dlen 12 L2CAP(s): Disconn rsp: dcid 0x004a scid 0x0041 2013-08-05 16:49:00.952820 > ACL data: handle 12 flags 0x02 dlen 12 L2CAP(s): Disconn rsp: dcid 0x0049 scid 0x0040 2013-08-05 16:49:00.969165 > HCI Event: Mode Change (0x14) plen 6 status 0x00 handle 12 mode 0x02 interval 50 Mode: Sniff 2013-08-05 16:49:48.175533 > HCI Event: Mode Change (0x14) plen 6 status 0x00 handle 12 mode 0x00 interval 0 Mode: Active 2013-08-05 16:49:48.219045 > HCI Event: Mode Change (0x14) plen 6 status 0x00 handle 12 mode 0x02 interval 108 Mode: Sniff 2013-08-05 16:51:00.968209 < HCI Command: Disconnect (0x01|0x0006) plen 3 handle 12 reason 0x13 Reason: Remote User Terminated Connection 2013-08-05 16:51:00.969056 > HCI Event: Command Status (0x0f) plen 4 Disconnect (0x01|0x0006) status 0x00 ncmd 1 2013-08-05 16:51:01.013495 > HCI Event: Mode Change (0x14) plen 6 status 0x00 handle 12 mode 0x00 interval 0 Mode: Active 2013-08-05 16:51:01.073777 > HCI Event: Disconn Complete (0x05) plen 4 status 0x00 handle 12 reason 0x16 Reason: Connection Terminated by Local Host ============================ After fix ================================ 2013-08-05 16:57:35.986648 < ACL data: handle 11 flags 0x00 dlen 12 L2CAP(s): Disconn req: dcid 0x004c scid 0x0041 2013-08-05 16:57:35.986713 < HCI Command: Exit Sniff Mode (0x02|0x0004) plen 2 handle 11 2013-08-05 16:57:35.986785 < ACL data: handle 11 flags 0x00 dlen 12 L2CAP(s): Disconn req: dcid 0x004b scid 0x0040 2013-08-05 16:57:35.988110 > HCI Event: Command Status (0x0f) plen 4 Exit Sniff Mode (0x02|0x0004) status 0x00 ncmd 1 2013-08-05 16:57:36.030714 > HCI Event: Mode Change (0x14) plen 6 status 0x00 handle 11 mode 0x00 interval 0 Mode: Active 2013-08-05 16:57:36.032950 > HCI Event: Number of Completed Packets (0x13) plen 5 handle 11 packets 2 2013-08-05 16:57:36.047926 > ACL data: handle 11 flags 0x02 dlen 12 L2CAP(s): Disconn rsp: dcid 0x004c scid 0x0041 2013-08-05 16:57:36.049200 > ACL data: handle 11 flags 0x02 dlen 12 L2CAP(s): Disconn rsp: dcid 0x004b scid 0x0040 2013-08-05 16:57:36.065509 > HCI Event: Mode Change (0x14) plen 6 status 0x00 handle 11 mode 0x02 interval 50 Mode: Sniff 2013-08-05 16:57:40.052006 < HCI Command: Disconnect (0x01|0x0006) plen 3 handle 11 reason 0x13 Reason: Remote User Terminated Connection 2013-08-05 16:57:40.052869 > HCI Event: Command Status (0x0f) plen 4 Disconnect (0x01|0x0006) status 0x00 ncmd 1 2013-08-05 16:57:40.104731 > HCI Event: Mode Change (0x14) plen 6 status 0x00 handle 11 mode 0x00 interval 0 Mode: Active 2013-08-05 16:57:40.146935 > HCI Event: Disconn Complete (0x05) plen 4 status 0x00 handle 11 reason 0x16 Reason: Connection Terminated by Local Host Signed-off-by: Sang-Ki Park <sangki79.park@samsung.com> Signed-off-by: Chan-yeol Park <chanyeol.park@samsung.com> Signed-off-by: Jaganath Kanakkassery <jaganath.k@samsung.com> Signed-off-by: Szymon Janc <szymon.janc@tieto.com> Signed-off-by: Syam Sidhardhan <s.syam@samsung.com> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
| | | | * | | | | Bluetooth: Fix encryption key size for peripheral roleAndre Guedes2013-09-16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch fixes the connection encryption key size information when the host is playing the peripheral role. We should set conn->enc_key_ size in hci_le_ltk_request_evt, otherwise it is left uninitialized. Cc: Stable <stable@vger.kernel.org> Signed-off-by: Andre Guedes <andre.guedes@openbossa.org> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
| | | | * | | | | Bluetooth: Fix security level for peripheral roleAndre Guedes2013-09-16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | While playing the peripheral role, the host gets a LE Long Term Key Request Event from the controller when a connection is established with a bonded device. The host then informs the LTK which should be used for the connection. Once the link is encrypted, the host gets an Encryption Change Event. Therefore we should set conn->pending_sec_level instead of conn-> sec_level in hci_le_ltk_request_evt. This way, conn->sec_level is properly updated in hci_encrypt_change_evt. Moreover, since we have a LTK associated to the device, we have at least BT_SECURITY_MEDIUM security level. Cc: Stable <stable@vger.kernel.org> Signed-off-by: Andre Guedes <andre.guedes@openbossa.org> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
| | * | | | | | | ipv6: udp packets following an UFO enqueued packet need also be handled by UFOHannes Frederic Sowa2013-09-24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In the following scenario the socket is corked: If the first UDP packet is larger then the mtu we try to append it to the write queue via ip6_ufo_append_data. A following packet, which is smaller than the mtu would be appended to the already queued up gso-skb via plain ip6_append_data. This causes random memory corruptions. In ip6_ufo_append_data we also have to be careful to not queue up the same skb multiple times. So setup the gso frame only when no first skb is available. This also fixes a shortcoming where we add the current packet's length to cork->length but return early because of a packet > mtu with dontfrag set (instead of sutracting it again). Found with trinity. Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| | * | | | | | | net: raw: do not report ICMP redirects to user spaceDuan Jiong2013-09-24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Redirect isn't an error condition, it should leave the error handler without touching the socket. Signed-off-by: Duan Jiong <duanj.fnst@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| | * | | | | | | net: udp: do not report ICMP redirects to user spaceDuan Jiong2013-09-24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Redirect isn't an error condition, it should leave the error handler without touching the socket. Signed-off-by: Duan Jiong <duanj.fnst@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| | * | | | | | | mrp: add periodictimer to allow retries when packets get lostNoel Burton-Krahn2013-09-23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | MRP doesn't implement the periodictimer in 802.1Q, so it never retries if packets get lost. I ran into this problem when MRP sent a MVRP JoinIn before the interface was fully up. The JoinIn was lost, MRP didn't retry, and MVRP registration failed. Tested against Juniper QFabric switches Signed-off-by: Noel Burton-Krahn <noel@burton-krahn.com> Acked-by: David Ward <david.ward@ll.mit.edu> Signed-off-by: David S. Miller <davem@davemloft.net>
| | * | | | | | | net/lapb: re-send packets on timeoutjosselin.costanzi@mobile-devices.fr2013-09-23
| | |/ / / / / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Actually re-send packets when the T1 timer runs out. This fixes a bug where packets are waiting on the write queue until disconnection when no other traffic is outstanding. Signed-off-by: Josselin Costanzi <josselin.costanzi@mobile-devices.fr> Signed-off-by: Maxime Jayat <maxime.jayat@mobile-devices.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | | | Merge tag 'nfs-for-3.12-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds2013-09-21
| |\ \ \ \ \ \ \ | | |/ / / / / / | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull NFS client bugfix from Trond Myklebust: "Fix a regression due to incorrect sharing of gss auth caches" * tag 'nfs-for-3.12-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: RPCSEC_GSS: fix crash on destroying gss auth
| | * | | | | | RPCSEC_GSS: fix crash on destroying gss authJ. Bruce Fields2013-09-18
| | | |_|_|_|/ | | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This fixes a regression since eb6dc19d8e72ce3a957af5511d20c0db0a8bd007 "RPCSEC_GSS: Share all credential caches on a per-transport basis" which could cause an occasional oops in the nfsd code (see below). The problem was that an auth was left referencing a client that had been freed. To avoid this we need to ensure that auths are shared only between descendants of a common client; the fact that a clone of an rpc_client takes a reference on its parent then ensures that the parent client will last as long as the auth. Also add a comment explaining what I think was the intention of this code. general protection fault: 0000 [#1] PREEMPT SMP Modules linked in: rpcsec_gss_krb5 nfsd auth_rpcgss oid_registry nfs_acl lockd sunrpc CPU: 3 PID: 4071 Comm: kworker/u8:2 Not tainted 3.11.0-rc2-00182-g025145f #1665 Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 Workqueue: nfsd4_callbacks nfsd4_do_callback_rpc [nfsd] task: ffff88003e206080 ti: ffff88003c384000 task.ti: ffff88003c384000 RIP: 0010:[<ffffffffa00001f3>] [<ffffffffa00001f3>] rpc_net_ns+0x53/0x70 [sunrpc] RSP: 0000:ffff88003c385ab8 EFLAGS: 00010246 RAX: 6b6b6b6b6b6b6b6b RBX: ffff88003af9a800 RCX: 0000000000000002 RDX: ffffffffa00001a5 RSI: 0000000000000001 RDI: ffffffff81e284e0 RBP: ffff88003c385ad8 R08: 0000000000000001 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000015 R12: ffff88003c990840 R13: ffff88003c990878 R14: ffff88003c385ba8 R15: ffff88003e206080 FS: 0000000000000000(0000) GS:ffff88003fd80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 00007fcdf737e000 CR3: 000000003ad2b000 CR4: 00000000000006e0 Stack: ffffffffa00001a5 0000000000000006 0000000000000006 ffff88003af9a800 ffff88003c385b08 ffffffffa00d52a4 ffff88003c385ba8 ffff88003c751bd8 ffff88003c751bc0 ffff88003e113600 ffff88003c385b18 ffffffffa00d530c Call Trace: [<ffffffffa00001a5>] ? rpc_net_ns+0x5/0x70 [sunrpc] [<ffffffffa00d52a4>] __gss_pipe_release+0x54/0x90 [auth_rpcgss] [<ffffffffa00d530c>] gss_pipe_free+0x2c/0x30 [auth_rpcgss] [<ffffffffa00d678b>] gss_destroy+0x9b/0xf0 [auth_rpcgss] [<ffffffffa000de63>] rpcauth_release+0x23/0x30 [sunrpc] [<ffffffffa0001e81>] rpc_release_client+0x51/0xb0 [sunrpc] [<ffffffffa00020d5>] rpc_shutdown_client+0xe5/0x170 [sunrpc] [<ffffffff81098a14>] ? cpuacct_charge+0xa4/0xb0 [<ffffffff81098975>] ? cpuacct_charge+0x5/0xb0 [<ffffffffa019556f>] nfsd4_process_cb_update.isra.17+0x2f/0x210 [nfsd] [<ffffffff819a4ac0>] ? _raw_spin_unlock_irq+0x30/0x60 [<ffffffff819a4acb>] ? _raw_spin_unlock_irq+0x3b/0x60 [<ffffffff810703ab>] ? process_one_work+0x15b/0x510 [<ffffffffa01957dd>] nfsd4_do_callback_rpc+0x8d/0xa0 [nfsd] [<ffffffff8107041e>] process_one_work+0x1ce/0x510 [<ffffffff810703ab>] ? process_one_work+0x15b/0x510 [<ffffffff810712ab>] worker_thread+0x11b/0x370 [<ffffffff81071190>] ? manage_workers.isra.24+0x2b0/0x2b0 [<ffffffff8107854b>] kthread+0xdb/0xe0 [<ffffffff819a4ac0>] ? _raw_spin_unlock_irq+0x30/0x60 [<ffffffff81078470>] ? __init_kthread_worker+0x70/0x70 [<ffffffff819ac7dc>] ret_from_fork+0x7c/0xb0 [<ffffffff81078470>] ? __init_kthread_worker+0x70/0x70 Code: a5 01 00 a0 31 d2 31 f6 48 c7 c7 e0 84 e2 81 e8 f4 91 0a e1 48 8b 43 60 48 c7 c2 a5 01 00 a0 be 01 00 00 00 48 c7 c7 e0 84 e2 81 <48> 8b 98 10 07 00 00 e8 91 8f 0a e1 e8 +3c 4e 07 e1 48 83 c4 18 RIP [<ffffffffa00001f3>] rpc_net_ns+0x53/0x70 [sunrpc] RSP <ffff88003c385ab8> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
* | | | | | | net ipv4: Convert ipv4.ip_local_port_range to be per netns v3Eric W. Biederman2013-10-01
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | - Move sysctl_local_ports from a global variable into struct netns_ipv4. - Modify inet_get_local_port_range to take a struct net, and update all of the callers. - Move the initialization of sysctl_local_ports into sysctl_net_ipv4.c:ipv4_sysctl_init_net from inet_connection_sock.c v2: - Ensure indentation used tabs - Fixed ip.h so it applies cleanly to todays net-next v3: - Compile fixes of strange callers of inet_get_local_port_range. This patch now successfully passes an allmodconfig build. Removed manual inlining of inet_get_local_port_range in ipv4_local_port_range Originally-by: Samya <samya@twitter.com> Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | | | | ethernet: use likely() for common Ethernet encapstephen hemminger2013-10-01
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Mark code path's likely/unlikely based on most common usage. * Very few devices use dsa tags. * Most traffic is Ethernet (not 802.2) * No sane person uses trailer type or Novell encapsulation Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | | | | ethernet: cleanup eth_type_transstephen hemminger2013-10-01
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Remove old legacy comment and weird if condition. The comment has outlived it's stay and is throwback to some early net code (before my time). Maybe Dave remembers what it meant. Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | | | | ipv6: Not need to set fl6.flowi6_flags as zeroLi RongQing2013-09-30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | setting fl6.flowi6_flags as zero after memset is redundant, Remove it. Signed-off-by: Li RongQing <roy.qing.li@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | | | | qdisc: basic classifier - remove unnecessary initializationstephen hemminger2013-09-30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | err is set once, then first code resets it. err = tcf_exts_validate(...) Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Acked-by: Jamal Hadi Salim <hadi@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | | | | qdisc: meta return ENOMEM on alloc failurestephen hemminger2013-09-30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Rather than returning earlier value (EINVAL), return ENOMEM if kzalloc fails. Found while reviewing to find another EINVAL condition. Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Acked-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | | | | Merge branch 'master' of ↵David S. Miller2013-09-30
|\ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next Conflicts: include/net/xfrm.h Simple conflict between Joe Perches "extern" removal for function declarations in header files and the changes in Steffen's tree. Steffen Klassert says: ==================== Two patches that are left from the last development cycle. Manual merging of include/net/xfrm.h is needed. The conflict can be solved as it is currently done in linux-next. 1) We announce the creation of temporary acquire state via an asyc event, so the deletion should be annunced too. From Nicolas Dichtel. 2) The VTI tunnels do not real tunning, they just provide a routable IPsec tunnel interface. So introduce and use xfrm_tunnel_notifier instead of xfrm_tunnel for xfrm tunnel mode callback. From Fan Du. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | | | {ipv4,xfrm}: Introduce xfrm_tunnel_notifier for xfrm tunnel mode callbackFan Du2013-08-28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Some thoughts on IPv4 VTI implementation: The connection between VTI receiving part and xfrm tunnel mode input process is hardly a "xfrm_tunnel", xfrm_tunnel is used in places where, e.g ipip/sit and xfrm4_tunnel, acts like a true "tunnel" device. In addition, IMHO, VTI doesn't need vti_err to do something meaningful, as all VTI needs is just a notifier to be called whenever xfrm_input ingress a packet to update statistics. A IPsec protected packet is first handled by protocol handlers, e.g AH/ESP, to check packet authentication or encryption rightness. PMTU update is taken care of in this stage by protocol error handler. Then the packet is rearranged properly depending on whether it's transport mode or tunnel mode packed by mode "input" handler. The VTI handler code takes effects in this stage in tunnel mode only. So it neither need propagate PMTU, as it has already been done if necessary, nor the VTI handler is qualified as a xfrm_tunnel. So this patch introduces xfrm_tunnel_notifier and meanwhile wipe out vti_err code. Signed-off-by: Fan Du <fan.du@windriver.com> Cc: Steffen Klassert <steffen.klassert@secunet.com> Cc: David S. Miller <davem@davemloft.net> Reviewed-by: Saurabh Mohan <saurabh.mohan@vyatta.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
| * | | | | | | xfrm: announce deleation of temporary SANicolas Dichtel2013-08-26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Creation of temporary SA are announced by netlink, but there is no notification for the deletion. This patch fix this asymmetric situation. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
* | | | | | | | dev: always advertise rx_flags changes via netlinkNicolas Dichtel2013-09-30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When flags IFF_PROMISC and IFF_ALLMULTI are changed, netlink messages are not consistent. For example, if a multicast daemon is running (flag IFF_ALLMULTI set in dev->flags but not dev->gflags, ie not exported to userspace) and then a user sets it via netlink (flag IFF_ALLMULTI set in dev->flags and dev->gflags, ie exported to userspace), no netlink message is sent. Same for IFF_PROMISC and because dev->promiscuity is exported via IFLA_PROMISCUITY, we may send a netlink message after each change of this counter. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | | | | | dev: update __dev_notify_flags() to send rtnl msgNicolas Dichtel2013-09-30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch only prepares the next one, there is no functional change. Now, __dev_notify_flags() can also be used to notify flags changes via rtnetlink. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | | | | | net: introduce SO_MAX_PACING_RATEEric Dumazet2013-09-28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As mentioned in commit afe4fd062416b ("pkt_sched: fq: Fair Queue packet scheduler"), this patch adds a new socket option. SO_MAX_PACING_RATE offers the application the ability to cap the rate computed by transport layer. Value is in bytes per second. u32 val = 1000000; setsockopt(sockfd, SOL_SOCKET, SO_MAX_PACING_RATE, &val, sizeof(val)); To be effectively paced, a flow must use FQ packet scheduler. Note that a packet scheduler takes into account the headers for its computations. The effective payload rate depends on MSS and retransmits if any. I chose to make this pacing rate a SOL_SOCKET option instead of a TCP one because this can be used by other protocols. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Steinar H. Gunderson <sesse@google.com> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | | | | | ipv4: processing ancillary IP_TOS or IP_TTLFrancesco Fusco2013-09-28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If IP_TOS or IP_TTL are specified as ancillary data, then sendmsg() sends out packets with the specified TTL or TOS overriding the socket values specified with the traditional setsockopt(). The struct inet_cork stores the values of TOS, TTL and priority that are passed through the struct ipcm_cookie. If there are user-specified TOS (tos != -1) or TTL (ttl != 0) in the struct ipcm_cookie, these values are used to override the per-socket values. In case of TOS also the priority is changed accordingly. Two helper functions get_rttos and get_rtconn_flags are defined to take into account the presence of a user specified TOS value when computing RT_TOS and RT_CONN_FLAGS. Signed-off-by: Francesco Fusco <ffusco@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | | | | | ipv4: IP_TOS and IP_TTL can be specified as ancillary dataFrancesco Fusco2013-09-28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch enables the IP_TTL and IP_TOS values passed from userspace to be stored in the ipcm_cookie struct. Three fields are added to the struct: - the TTL, expressed as __u8. The allowed values are in the [1-255]. A value of 0 means that the TTL is not specified. - the TOS, expressed as __s16. The allowed values are in the range [0,255]. A value of -1 means that the TOS is not specified. - the priority, expressed as a char and computed when handling the ancillary data. Signed-off-by: Francesco Fusco <ffusco@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | | | | | ipv6: compare sernum when walking fib for /proc/net/ipv6_route as safety netHannes Frederic Sowa2013-09-27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch provides an additional safety net against NULL pointer dereferences while walking the fib trie for the new /proc/net/ipv6_route walkers. I never needed it myself and am unsure if it is needed at all, but the same checks where introduced in 2bec5a369ee79576a3eea2c23863325089785a2c ("ipv6: fib: fix crash when changing large fib while dumping it") to fix NULL pointer bugs. This patch is separated from the first patch to make it easier to revert if we are sure we can drop this logic. Cc: Ben Greear <greearb@candelatech.com> Cc: Patrick McHardy <kaber@trash.net> Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | | | | | ipv6: avoid high order memory allocations for /proc/net/ipv6_routeHannes Frederic Sowa2013-09-27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Dumping routes on a system with lots rt6_infos in the fibs causes up to 11-order allocations in seq_file (which fail). While we could switch there to vmalloc we could just implement the streaming interface for /proc/net/ipv6_route. This patch switches /proc/net/ipv6_route from single_open_net to seq_open_net. loff_t *pos tracks dst entries. Also kill never used struct rt6_proc_arg and now unused function fib6_clean_all_ro. Cc: Ben Greear <greearb@candelatech.com> Cc: Patrick McHardy <kaber@trash.net> Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | | | | | net: create sysfs symlinks for neighbour devicesVeaceslav Falico2013-09-26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Also, remove the same functionality from bonding - it will be already done for any device that links to its lower/upper neighbour. The links will be created for dev's kobject, and will look like lower_eth0 for lower device eth0 and upper_bridge0 for upper device bridge0. CC: Jay Vosburgh <fubar@us.ibm.com> CC: Andy Gospodarek <andy@greyhouse.net> CC: "David S. Miller" <davem@davemloft.net> CC: Eric Dumazet <edumazet@google.com> CC: Jiri Pirko <jiri@resnulli.us> CC: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: Veaceslav Falico <vfalico@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | | | | | net: expose the master link to sysfs, and remove it from bondVeaceslav Falico2013-09-26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, we can have only one master upper neighbour, so it would be useful to create a symlink to it in the sysfs device directory, the way that bonding now does it, for every device. Lower devices from bridge/team/etc will automagically get it, so we could rely on it. Also, remove the same functionality from bonding. CC: Jay Vosburgh <fubar@us.ibm.com> CC: Andy Gospodarek <andy@greyhouse.net> CC: "David S. Miller" <davem@davemloft.net> CC: Eric Dumazet <edumazet@google.com> CC: Jiri Pirko <jiri@resnulli.us> CC: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: Veaceslav Falico <vfalico@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | | | | | vlan: unlink the upper neighbour before unregisteringVeaceslav Falico2013-09-26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | On netdev unregister we're removing also all of its sysfs-associated stuff, including the sysfs symlinks that are controlled by netdev neighbour code. Also, it's a subtle race condition - cause we can still access it after unregistering. Move the unlinking right before the unregistering to fix both. CC: Patrick McHardy <kaber@trash.net> CC: "David S. Miller" <davem@davemloft.net> Signed-off-by: Veaceslav Falico <vfalico@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | | | | | vlan: link the upper neighbour only after registeringVeaceslav Falico2013-09-26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Otherwise users might access it without being fully registered, as per sysfs - it only inits in register_netdevice(), so is unusable till it is called. CC: Patrick McHardy <kaber@trash.net> CC: "David S. Miller" <davem@davemloft.net> Signed-off-by: Veaceslav Falico <vfalico@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | | | | | net: add a possibility to get private from netdev_adjacent->listVeaceslav Falico2013-09-26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It will be useful to get first/last element. CC: "David S. Miller" <davem@davemloft.net> CC: Eric Dumazet <edumazet@google.com> CC: Jiri Pirko <jiri@resnulli.us> CC: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: Veaceslav Falico <vfalico@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | | | | | net: add for_each iterators through neighbour lower link's privateVeaceslav Falico2013-09-26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add a possibility to iterate through netdev_adjacent's private, currently only for lower neighbours. Add both RCU and RTNL/other locking variants of iterators, and make the non-rcu variant to be safe from removal. CC: "David S. Miller" <davem@davemloft.net> CC: Eric Dumazet <edumazet@google.com> CC: Jiri Pirko <jiri@resnulli.us> CC: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: Veaceslav Falico <vfalico@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | | | | | net: add netdev_adjacent->private and allow to use itVeaceslav Falico2013-09-26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, even though we can access any linked device, we can't attach anything to it, which is vital to properly manage them. To fix this, add a new void *private to netdev_adjacent and functions setting/getting it (per link), so that we can save, per example, bonding's slave structures there, per slave device. netdev_master_upper_dev_link_private(dev, upper_dev, private) links dev to upper dev and populates the neighbour link only with private. netdev_lower_dev_get_private{,_rcu}() returns the private, if found. CC: "David S. Miller" <davem@davemloft.net> CC: Eric Dumazet <edumazet@google.com> CC: Jiri Pirko <jiri@resnulli.us> CC: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: Veaceslav Falico <vfalico@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | | | | | net: add RCU variant to search for netdev_adjacent linkVeaceslav Falico2013-09-26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently we have only the RTNL flavour, however we can traverse it while holding only RCU, so add the RCU search. Add an RCU variant that uses list_head * as an argument, so that it can be universally used afterwards. CC: "David S. Miller" <davem@davemloft.net> CC: Eric Dumazet <edumazet@google.com> CC: Jiri Pirko <jiri@resnulli.us> CC: Alexander Duyck <alexander.h.duyck@intel.com> CC: Cong Wang <amwang@redhat.com> Signed-off-by: Veaceslav Falico <vfalico@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | | | | | net: add adj_list to save only neighboursVeaceslav Falico2013-09-26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, we distinguish neighbours (first-level linked devices) from non-neighbours by the neighbour bool in the netdev_adjacent. This could be quite time-consuming in case we would like to traverse *only* through neighbours - cause we'd have to traverse through all devices and check for this flag, and in a (quite common) scenario where we have lots of vlans on top of bridge, which is on top of a bond - the bonding would have to go through all those vlans to get its upper neighbour linked devices. This situation is really unpleasant, cause there are already a lot of cases when a device with slaves needs to go through them in hot path. To fix this, introduce a new upper/lower device lists structure - adj_list, which contains only the neighbours. It works always in pair with the all_adj_list structure (renamed from upper/lower_dev_list), i.e. both of them contain the same links, only that all_adj_list contains also non-neighbour device links. It's really a small change visible, currently, only for __netdev_adjacent_dev_insert/remove(), and doesn't change the main linked logic at all. Also, add some comments a fix a name collision in netdev_for_each_upper_dev_rcu() and rework the naming by the following rules: netdev_(all_)(upper|lower)_* If "all_" is present, then we work with the whole list of upper/lower devices, otherwise - only with direct neighbours. Uninline functions - to get better stack traces. CC: "David S. Miller" <davem@davemloft.net> CC: Eric Dumazet <edumazet@google.com> CC: Jiri Pirko <jiri@resnulli.us> CC: Alexander Duyck <alexander.h.duyck@intel.com> CC: Cong Wang <amwang@redhat.com> Signed-off-by: Veaceslav Falico <vfalico@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | | | | | net: use lists as arguments instead of bool upperVeaceslav Falico2013-09-26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently we make use of bool upper when we want to specify if we want to work with upper/lower list. It's, however, harder to read, debug and occupies a lot more code. Fix this by just passing the correct upper/lower_dev_list list_head pointer instead of bool upper, and work internally with it. CC: "David S. Miller" <davem@davemloft.net> CC: Eric Dumazet <edumazet@google.com> CC: Jiri Pirko <jiri@resnulli.us> CC: Alexander Duyck <alexander.h.duyck@intel.com> CC: Cong Wang <amwang@redhat.com> Signed-off-by: Veaceslav Falico <vfalico@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>