aboutsummaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAge
* net: gro: allow to build full sized skbEric Dumazet2013-10-10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | skb_gro_receive() is currently limited to 16 or 17 MSS per GRO skb, typically 24616 bytes, because it fills up to MAX_SKB_FRAGS frags. It's relatively easy to extend the skb using frag_list to allow more frags to be appended into the last sk_buff. This still builds very efficient skbs, and allows reaching 45 MSS per skb. (45 MSS GRO packet uses one skb plus a frag_list containing 2 additional sk_buff) High speed TCP flows benefit from this extension by lowering TCP stack cpu usage (less packets stored in receive queue, less ACK packets processed) Forwarding setups could be hurt, as such skbs will need to be linearized, although its not a new problem, as GRO could already provide skbs with a frag_list. We could make the 65536 bytes threshold a tunable to mitigate this. (First time we need to linearize skb in skb_needs_linearize(), we could lower the tunable to ~16*1460 so that following skb_gro_receive() calls build smaller skbs) Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* fib_trie: only calc for the un-first nodebaker.zhang2013-10-10
| | | | | | | | | | This is a enhancement. for the first node in fib_trie, newpos is 0, bit is 1. Only for the leaf or node with unmatched key need calc pos. Signed-off-by: baker.zhang <baker.kernel@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* veth: allow to setup multicast address for veth deviceGao feng2013-10-10
| | | | | | | | | | | | | We can only setup multicast address for network device when net_device_ops->ndo_set_rx_mode is not null. Some configurations need to add multicast address for net device, such as netfilter cluster match module. Add a fake ndo_set_rx_mode function to allow this operation. Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* be2net: change the driver version number to 4.9.224.0Ajit Khaparde2013-10-09
| | | | | Signed-off-by: Ajit Khaparde <ajit.khaparde@emulex.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* be2net: Display RoCE specific counters in ethtool -SAjit Khaparde2013-10-09
| | | | | | | | SkyHawk-R can support RoCE. Add code to display RoCE specific counters maintained in hardware. Signed-off-by: Ajit Khaparde <ajit.khaparde@emulex.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* be2net: Call version 2 of GET_STATS ioctl for Skyhawk-RAjit Khaparde2013-10-09
| | | | | | | | Moving to version 2 of GET_STATS command as SkyHawk-R supports higher number of rings. Signed-off-by: Ajit Khaparde <ajit.khaparde@emulex.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* bnx2x: Add ndo_get_phys_port_id supportYuval Mintz2013-10-09
| | | | | | | | | Each network interface (either PF or VF) is identified by its port's MAC id. Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com> Signed-off-by: Ariel Elior <ariele@broadcom.com> Signed-off-by: Eilon Greenstein <eilong@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: fix build errors if ipv6 is disabledEric Dumazet2013-10-09
| | | | | | | | | | CONFIG_IPV6=n is still a valid choice ;) It appears we can remove dead code. Reported-by: Wu Fengguang <fengguang.wu@intel.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* udp: fix a typo in __udp4_lib_mcast_demux_lookupEric Dumazet2013-10-09
| | | | | | | At this point sk might contain garbage. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* ipv6: make lookups simpler and fasterEric Dumazet2013-10-09
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | TCP listener refactoring, part 4 : To speed up inet lookups, we moved IPv4 addresses from inet to struct sock_common Now is time to do the same for IPv6, because it permits us to have fast lookups for all kind of sockets, including upcoming SYN_RECV. Getting IPv6 addresses in TCP lookups currently requires two extra cache lines, plus a dereference (and memory stall). inet6_sk(sk) does the dereference of inet_sk(__sk)->pinet6 This patch is way bigger than its IPv4 counter part, because for IPv4, we could add aliases (inet_daddr, inet_rcv_saddr), while on IPv6, it's not doable easily. inet6_sk(sk)->daddr becomes sk->sk_v6_daddr inet6_sk(sk)->rcv_saddr becomes sk->sk_v6_rcv_saddr And timewait socket also have tw->tw_v6_daddr & tw->tw_v6_rcv_saddr at the same offset. We get rid of INET6_TW_MATCH() as INET6_MATCH() is now the generic macro. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* tcp/dccp: remove twchainEric Dumazet2013-10-08
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | TCP listener refactoring, part 3 : Our goal is to hash SYN_RECV sockets into main ehash for fast lookup, and parallel SYN processing. Current inet_ehash_bucket contains two chains, one for ESTABLISH (and friend states) sockets, another for TIME_WAIT sockets only. As the hash table is sized to get at most one socket per bucket, it makes little sense to have separate twchain, as it makes the lookup slightly more complicated, and doubles hash table memory usage. If we make sure all socket types have the lookup keys at the same offsets, we can use a generic and faster lookup. It turns out TIME_WAIT and ESTABLISHED sockets already have common lookup fields for IPv4. [ INET_TW_MATCH() is no longer needed ] I'll provide a follow-up to factorize IPv6 lookup as well, to remove INET6_TW_MATCH() This way, SYN_RECV pseudo sockets will be supported the same. A new sock_gen_put() helper is added, doing either a sock_put() or inet_twsk_put() [ and will support SYN_RECV later ]. Note this helper should only be called in real slow path, when rcu lookup found a socket that was moved to another identity (freed/reused immediately), but could eventually be used in other contexts, like sock_edemux() Before patch : dmesg | grep "TCP established" TCP established hash table entries: 524288 (order: 11, 8388608 bytes) After patch : TCP established hash table entries: 524288 (order: 10, 4194304 bytes) Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller2013-10-08
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | Conflicts: include/linux/netdevice.h net/core/sock.c Trivial merge issues. Removal of "extern" for functions declaration in netdevice.h at the same time "const" was added to an argument. Two parallel line additions in net/core/sock.c Signed-off-by: David S. Miller <davem@davemloft.net>
| * Merge branch 'sfc-3.12' of git://git.kernel.org/pub/scm/linux/kernel/git/bwh/sfcDavid S. Miller2013-10-08
| |\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Ben Hutchings says: ==================== Some more fixes for EF10 support; hopefully the last lot: 1. Fixes for reading statistics, from Edward Cree and Jon Cooper. 2. Addition of ethtool statistics for packets dropped by the hardware before they were associated with a specific function, from Edward Cree. 3. Only bind to functions that are in control of their associated port, as the driver currently assumes this is the case. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| | * sfc: Only bind to EF10 functions with the LinkCtrl and Trusted flagsBen Hutchings2013-10-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Although we do not yet enable multiple PFs per port, it is possible that a board will be reconfigured to enable them while the driver has not yet been updated to fully support this. The most obvious problem is that multiple functions may try to set conflicting link settings. But we will also run into trouble if the firmware doesn't consider us fully trusted. So, abort probing unless both the LinkCtrl and Trusted flags are set for this function. Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
| | * sfc: Add PM and RXDP drop counters to ethtool statsEdward Cree2013-10-04
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Recognise the new Packet Memory and RX Data Path counters. The following counters are added: rx_pm_{trunc,discard}_bb_overflow - burst buffer overflowed. This should not occur if BB correctly configured. rx_pm_{trunc,discard}_vfifo_full - not enough space in packet memory. May indicate RX performance problems. rx_pm_{trunc,discard}_qbb - dropped by 802.1Qbb early discard mechanism. Since Qbb is not supported at present, this should not occur. rx_pm_discard_mapping - 802.1p priority configured to be dropped. This should not occur in normal operation. rx_dp_q_disabled_packets - packet was to be delivered to a queue but queue is disabled. May indicate misconfiguration by the driver. rx_dp_di_dropped_packets - parser-dispatcher indicated that a packet should be dropped. rx_dp_streaming_packets - packet was sent to the RXDP streaming bus, ie. a filter directed the packet to the MCPU. rx_dp_emerg_{fetch,wait} - RX datapath had to wait for descriptors to be loaded. Indicates performance problems but not drops. These are only provided if the MC firmware has the PM_AND_RXDP_COUNTERS capability. Otherwise, mask them out. Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
| | * sfc: Add definitions for new stats counters and capability flagMatthew Slattery2013-10-04
| | | | | | | | | | | | Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
| | * sfc: Refactor EF10 stat mask code to allow for more conditional statsEdward Cree2013-10-04
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Previously, efx_ef10_stat_mask returned a static const unsigned long[], which meant that each possible mask had to be declared statically with STAT_MASK_BITMAP. Since adding a condition would double the size of the decision tree, we now create the bitmask dynamically. To do this, we have two functions efx_ef10_raw_stat_mask, which returns a u64, and efx_ef10_get_stat_mask, which fills in an unsigned long * argument. Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
| | * sfc: Fix internal indices of ethtool stats for EF10Edward Cree2013-10-04
| | | | | | | | | | | | | | | | | | | | | | | | The indices in nic_data->stats need to match the EF10_STAT_whatever enum values. In efx_nic_update_stats, only mask; gaps are removed in efx_ef10_update_stats. Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
| | * sfc: Add rmb() between reading stats and generation count to ensure consistencyJon Cooper2013-10-04
| | | | | | | | | | | | Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
| * | pkt_sched: fq: fix non TCP flows pacingEric Dumazet2013-10-08
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Steinar reported FQ pacing was not working for UDP flows. It looks like the initial sk->sk_pacing_rate value of 0 was a wrong choice. We should init it to ~0U (unlimited) Then, TCA_FQ_FLOW_DEFAULT_RATE should be removed because it makes no real sense. The default rate is really unlimited, and we need to avoid a zero divide. Reported-by: Steinar H. Gunderson <sesse@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | qlcnic: add missing destroy_workqueue() on error path in qlcnic_probe()Wei Yongjun2013-10-08
| | | | | | | | | | | | | | | | | | | | | | | | Add the missing destroy_workqueue() before return from qlcnic_probe() in the error handling case. Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | moxa: fix the error handling in moxart_mac_probe()Wei Yongjun2013-10-08
| | | | | | | | | | | | | | | | | | | | | | | | | | | This patch fix the error handling in moxart_mac_probe(): - return -ENOMEM in some memory alloc fail cases - add missing free_netdev() in the error handling case Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | net: vlan: fix nlmsg size calculation in vlan_get_size()Marc Kleine-Budde2013-10-08
| | | | | | | | | | | | | | | | | | | | | | | | | | | This patch fixes the calculation of the nlmsg size, by adding the missing nla_total_size(). Cc: Patrick McHardy <kaber@trash.net> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | pkt_sched: fq: fix typo for initial_quantumEric Dumazet2013-10-08
| | | | | | | | | | | | | | | | | | | | | TCA_FQ_INITIAL_QUANTUM should set q->initial_quantum Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | ipv6: Fix the upper MTU limit in GRE tunnelOussama Ghorbel2013-10-08
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Unlike ipv4, the struct member hlen holds the length of the GRE and ipv6 headers. This length is also counted in dev->hard_header_len. Perhaps, it's more clean to modify the hlen to count only the GRE header without ipv6 header as the variable name suggest, but the simple way to fix this without regression risk is simply modify the calculation of the limit in ip6gre_tunnel_change_mtu function. Verified in kernel version v3.11. Signed-off-by: Oussama Ghorbel <ou.ghorbel@gmail.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | xen-netback: transition to CLOSED when removing a VIFDavid Vrabel2013-10-08
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If a guest is destroyed without transitioning its frontend to CLOSED, the domain becomes a zombie as netback was not grant unmapping the shared rings. When removing a VIF, transition the backend to CLOSED so the VIF is disconnected if necessary (which will unmap the shared rings etc). This fixes a regression introduced by 279f438e36c0a70b23b86d2090aeec50155034a9 (xen-netback: Don't destroy the netdev until the vif is shut down). Signed-off-by: David Vrabel <david.vrabel@citrix.com> Cc: Ian Campbell <ian.campbell@citrix.com> Cc: Wei Liu <wei.liu2@citrix.com> Cc: Paul Durrant <Paul.Durrant@citrix.com> Acked-by: Wei Liu <wei.liu2@citrix.com> Reviewed-by: Paul Durrant <paul.durrant@citrix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | Merge branch 'mlx4'David S. Miller2013-10-08
| |\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Amir Vadai says: ==================== net/mlx4_en: Fix pages never dma unmapped on rx This patchset fixes a bug introduced by commit 51151a16 (mlx4: allow order-0 memory allocations in RX path). Where dma_unmap_page wasn't called. Changes from V0: - Added "Rename name of mlx4_en_rx_alloc members". Old names were confusing. - Last frag in page calculation was wrong. Since all frags in page are of the same size, need to add this frag_stride to end of frag offset, and not the size of next frag in skb. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| | * | net/mlx4_en: Fix pages never dma unmapped on rxAmir Vadai2013-10-08
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch fixes a bug introduced by commit 51151a16 (mlx4: allow order-0 memory allocations in RX path). dma_unmap_page never reached because condition to detect last fragment in page is wrong. offset+frag_stride can't be greater than size, need to make sure no additional frag will fit in page => compare offset + frag_stride + next_frag_size instead. next_frag_size is the same as the current one, since page is shared only with frags of the same size. CC: Eric Dumazet <edumazet@google.com> Signed-off-by: Amir Vadai <amirv@mellanox.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| | * | net/mlx4_en: Rename name of mlx4_en_rx_alloc membersAmir Vadai2013-10-08
| |/ / | | | | | | | | | | | | | | | | | | | | | | | | | | | Add page prefix to page related members: @size and @offset into @page_size and @page_offset CC: Eric Dumazet <edumazet@google.com> Signed-off-by: Amir Vadai <amirv@mellanox.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | net: sh_eth: Fix RX packets errors on R8A7740Nguyen Hong Ky2013-10-08
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch will fix RX packets errors when receiving big size of data by set bit RNC = 1. RNC - Receive Enable Control 0: Upon completion of reception of one frame, the E-DMAC writes the receive status to the descriptor and clears the RR bit in EDRRR to 0. 1: Upon completion of reception of one frame, the E-DMAC writes (writes back) the receive status to the descriptor. In addition, the E-DMAC reads the next descriptor and prepares for reception of the next frame. In addition, for get more stable when receiving packets, I set maximum size for the transmit/receive FIFO and inserts padding in receive data. Signed-off-by: Nguyen Hong Ky <nh-ky@jinso.co.jp> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | l2tp: Fix build warning with ipv6 disabled.David S. Miller2013-10-08
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | net/l2tp/l2tp_core.c: In function ‘l2tp_verify_udp_checksum’: net/l2tp/l2tp_core.c:499:22: warning: unused variable ‘tunnel’ [-Wunused-variable] Create a helper "l2tp_tunnel()" to facilitate this, and as a side effect get rid of a bunch of unnecessary void pointer casts. Signed-off-by: David S. Miller <davem@davemloft.net>
| * | tun: don't look at current when non-blockingMichael S. Tsirkin2013-10-08
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We play with a wait queue even if socket is non blocking. This is an obvious waste. Besides, it will prevent calling the non blocking variant when current is not valid. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | Merge branch 'mrf24j40'David S. Miller2013-10-08
| |\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Alan Ott says: ==================== Fix race conditions in mrf24j40 interrupts After testing with the betas of this patchset, it's been rebased and is ready for inclusion. David Hauweele noticed that the mrf24j40 would hang arbitrarily after some period of heavy traffic. Two race conditions were discovered, and the driver was changed to use threaded interrupts, since the enable/disable of interrupts in the driver has recently been a lighning rod whenever issues arise related to interrupts (costing engineering time), and since threaded interrupts are the right way to do it. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| | * | mrf24j40: Use level-triggered interruptsAlan Ott2013-10-08
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The mrf24j40 generates level interrupts. There are rare cases where it appears that the interrupt line never gets de-asserted between interrupts, causing interrupts to be lost, and causing a hung device from the driver's perspective. Switching the driver to interpret these interrupts as level-triggered fixes this issue. Signed-off-by: Alan Ott <alan@signal11.us> Signed-off-by: David S. Miller <davem@davemloft.net>
| | * | mrf24j40: Use threaded IRQ handlerAlan Ott2013-10-08
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Eliminate all the workqueue and interrupt enable/disable. Signed-off-by: Alan Ott <alan@signal11.us> Signed-off-by: David S. Miller <davem@davemloft.net>
| | * | mrf24j40: Move INIT_COMPLETION() to before packet transmissionAlan Ott2013-10-08
| |/ / | | | | | | | | | | | | | | | | | | | | | This avoids a race condition where complete(tx_complete) could be called before tx_complete is initialized. Signed-off-by: Alan Ott <alan@signal11.us> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | Merge branch '6lowpan'David S. Miller2013-10-08
| |\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Alan Ott says: ==================== Alexander Aring suggested that devices desired to be linked to 6lowpan be checked for actually being of type IEEE802154, since IEEE802154 devices are all that are supported by 6lowpan at present. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| | * | 6lowpan: Sync default hardware address of lowpan links to their wpanAlan Ott2013-10-08
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When a lowpan link to a wpan device is created, set the hardware address of the lowpan link to that of the wpan device. Signed-off-by: Alan Ott <alan@signal11.us> Signed-off-by: David S. Miller <davem@davemloft.net>
| | * | 6lowpan: Only make 6lowpan links to IEEE802154 devicesAlan Ott2013-10-08
| |/ / | | | | | | | | | | | | | | | | | | | | | | | | Refuse to create 6lowpan links if the actual hardware interface is of any type other than ARPHRD_IEEE802154. Signed-off-by: Alan Ott <alan@signal11.us> Suggested-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | net: Update the sysctl permissions handler to test effective uid/gidEric W. Biederman2013-10-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | On Tue, 20 Aug 2013 11:40:04 -0500 Eric Sandeen <sandeen@redhat.com> wrote: > This was brought up in a Red Hat bug (which may be marked private, I'm sorry): > > Bug 987055 - open O_WRONLY succeeds on some root owned files in /proc for process running with unprivileged EUID > > "On RHEL7 some of the files in /proc can be opened for writing by an unprivileged EUID." > > The flaw existed upstream as well last I checked. > > This commit in kernel v3.8 caused the regression: > > commit cff109768b2d9c03095848f4cd4b0754117262aa > Author: Eric W. Biederman <ebiederm@xmission.com> > Date: Fri Nov 16 03:03:01 2012 +0000 > > net: Update the per network namespace sysctls to be available to the network namespace owner > > - Allow anyone with CAP_NET_ADMIN rights in the user namespace of the > the netowrk namespace to change sysctls. > - Allow anyone the uid of the user namespace root the same > permissions over the network namespace sysctls as the global root. > - Allow anyone with gid of the user namespace root group the same > permissions over the network namespace sysctl as the global root group. > > Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> > Signed-off-by: David S. Miller <davem@davemloft.net> > > because it changed /sys/net's special permission handler to test current_uid, not > current_euid; same for current_gid/current_egid. > > So in this case, root cannot drop privs via set[ug]id, and retains all privs > in this codepath. Modify the code to use current_euid(), and in_egroup_p, as in done in fs/proc/proc_sysctl.c:test_perm() Cc: stable@vger.kernel.org Reviewed-by: Eric Sandeen <sandeen@redhat.com> Reported-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | can: dev: fix nlmsg size calculation in can_get_size()Marc Kleine-Budde2013-10-07
| | | | | | | | | | | | | | | | | | | | | | | | This patch fixes the calculation of the nlmsg size, by adding the missing nla_total_size(). Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | ipv4: fix ineffective source address selectionJiri Benc2013-10-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When sending out multicast messages, the source address in inet->mc_addr is ignored and rewritten by an autoselected one. This is caused by a typo in commit 813b3b5db831 ("ipv4: Use caller's on-stack flowi as-is in output route lookups"). Signed-off-by: Jiri Benc <jbenc@redhat.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | net/ethernet: cpsw: DT read bool dual_emacMarkus Pargmann2013-10-07
| | | | | | | | | | | | | | | | | | | | | Signed-off-by: Markus Pargmann <mpa@pengutronix.de> Acked-by: Mugunthan V N <mugunthanvnm@ti.com> Acked-by: Peter Korsgaard <jacmet@sunsite.dk> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | net: ethernet: cpsw: Search childs for slave nodesMarkus Pargmann2013-10-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The current implementation searches the whole DT for nodes named "slave". This patch changes it to search only child nodes for slaves. Signed-off-by: Markus Pargmann <mpa@pengutronix.de> Acked-by: Mugunthan V N <mugunthanvnm@ti.com> Acked-by: Peter Korsgaard <jacmet@sunsite.dk> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | net: fix unsafe set_memory_rw from softirqAlexei Starovoitov2013-10-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | on x86 system with net.core.bpf_jit_enable = 1 sudo tcpdump -i eth1 'tcp port 22' causes the warning: [ 56.766097] Possible unsafe locking scenario: [ 56.766097] [ 56.780146] CPU0 [ 56.786807] ---- [ 56.793188] lock(&(&vb->lock)->rlock); [ 56.799593] <Interrupt> [ 56.805889] lock(&(&vb->lock)->rlock); [ 56.812266] [ 56.812266] *** DEADLOCK *** [ 56.812266] [ 56.830670] 1 lock held by ksoftirqd/1/13: [ 56.836838] #0: (rcu_read_lock){.+.+..}, at: [<ffffffff8118f44c>] vm_unmap_aliases+0x8c/0x380 [ 56.849757] [ 56.849757] stack backtrace: [ 56.862194] CPU: 1 PID: 13 Comm: ksoftirqd/1 Not tainted 3.12.0-rc3+ #45 [ 56.868721] Hardware name: System manufacturer System Product Name/P8Z77 WS, BIOS 3007 07/26/2012 [ 56.882004] ffffffff821944c0 ffff88080bbdb8c8 ffffffff8175a145 0000000000000007 [ 56.895630] ffff88080bbd5f40 ffff88080bbdb928 ffffffff81755b14 0000000000000001 [ 56.909313] ffff880800000001 ffff880800000000 ffffffff8101178f 0000000000000001 [ 56.923006] Call Trace: [ 56.929532] [<ffffffff8175a145>] dump_stack+0x55/0x76 [ 56.936067] [<ffffffff81755b14>] print_usage_bug+0x1f7/0x208 [ 56.942445] [<ffffffff8101178f>] ? save_stack_trace+0x2f/0x50 [ 56.948932] [<ffffffff810cc0a0>] ? check_usage_backwards+0x150/0x150 [ 56.955470] [<ffffffff810ccb52>] mark_lock+0x282/0x2c0 [ 56.961945] [<ffffffff810ccfed>] __lock_acquire+0x45d/0x1d50 [ 56.968474] [<ffffffff810cce6e>] ? __lock_acquire+0x2de/0x1d50 [ 56.975140] [<ffffffff81393bf5>] ? cpumask_next_and+0x55/0x90 [ 56.981942] [<ffffffff810cef72>] lock_acquire+0x92/0x1d0 [ 56.988745] [<ffffffff8118f52a>] ? vm_unmap_aliases+0x16a/0x380 [ 56.995619] [<ffffffff817628f1>] _raw_spin_lock+0x41/0x50 [ 57.002493] [<ffffffff8118f52a>] ? vm_unmap_aliases+0x16a/0x380 [ 57.009447] [<ffffffff8118f52a>] vm_unmap_aliases+0x16a/0x380 [ 57.016477] [<ffffffff8118f44c>] ? vm_unmap_aliases+0x8c/0x380 [ 57.023607] [<ffffffff810436b0>] change_page_attr_set_clr+0xc0/0x460 [ 57.030818] [<ffffffff810cfb8d>] ? trace_hardirqs_on+0xd/0x10 [ 57.037896] [<ffffffff811a8330>] ? kmem_cache_free+0xb0/0x2b0 [ 57.044789] [<ffffffff811b59c3>] ? free_object_rcu+0x93/0xa0 [ 57.051720] [<ffffffff81043d9f>] set_memory_rw+0x2f/0x40 [ 57.058727] [<ffffffff8104e17c>] bpf_jit_free+0x2c/0x40 [ 57.065577] [<ffffffff81642cba>] sk_filter_release_rcu+0x1a/0x30 [ 57.072338] [<ffffffff811108e2>] rcu_process_callbacks+0x202/0x7c0 [ 57.078962] [<ffffffff81057f17>] __do_softirq+0xf7/0x3f0 [ 57.085373] [<ffffffff81058245>] run_ksoftirqd+0x35/0x70 cannot reuse jited filter memory, since it's readonly, so use original bpf insns memory to hold work_struct defer kfree of sk_filter until jit completed freeing tested on x86_64 and i386 Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | ipv6: Allow the MTU of ipip6 tunnel to be set below 1280Oussama Ghorbel2013-10-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The (inner) MTU of a ipip6 (IPv4-in-IPv6) tunnel cannot be set below 1280, which is the minimum MTU in IPv6. However, there should be no IPv6 on the tunnel interface at all, so the IPv6 rules should not apply. More info at https://bugzilla.kernel.org/show_bug.cgi?id=15530 This patch allows to check the minimum MTU for ipv6 tunnel according to these rules: -In case the tunnel is configured with ipip6 mode the minimum MTU is 68. -In case the tunnel is configured with ip6ip6 or any mode the minimum MTU is 1280. Signed-off-by: Oussama Ghorbel <ou.ghorbel@gmail.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | netif_set_xps_queue: make cpu mask constMichael S. Tsirkin2013-10-07
| |/ | | | | | | | | | | | | | | virtio wants to pass in cpumask_of(cpu), make parameter const to avoid build warnings. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * tcp: do not forget FIN in tcp_shifted_skb()Eric Dumazet2013-10-04
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Yuchung found following problem : There are bugs in the SACK processing code, merging part in tcp_shift_skb_data(), that incorrectly resets or ignores the sacked skbs FIN flag. When a receiver first SACK the FIN sequence, and later throw away ofo queue (e.g., sack-reneging), the sender will stop retransmitting the FIN flag, and hangs forever. Following packetdrill test can be used to reproduce the bug. $ cat sack-merge-bug.pkt `sysctl -q net.ipv4.tcp_fack=0` // Establish a connection and send 10 MSS. 0.000 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3 +.000 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0 +.000 bind(3, ..., ...) = 0 +.000 listen(3, 1) = 0 +.050 < S 0:0(0) win 32792 <mss 1000,sackOK,nop,nop,nop,wscale 7> +.000 > S. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK,nop,wscale 6> +.001 < . 1:1(0) ack 1 win 1024 +.000 accept(3, ..., ...) = 4 +.100 write(4, ..., 12000) = 12000 +.000 shutdown(4, SHUT_WR) = 0 +.000 > . 1:10001(10000) ack 1 +.050 < . 1:1(0) ack 2001 win 257 +.000 > FP. 10001:12001(2000) ack 1 +.050 < . 1:1(0) ack 2001 win 257 <sack 10001:11001,nop,nop> +.050 < . 1:1(0) ack 2001 win 257 <sack 10001:12002,nop,nop> // SACK reneg +.050 < . 1:1(0) ack 12001 win 257 +0 %{ print "unacked: ",tcpi_unacked }% +5 %{ print "" }% First, a typo inverted left/right of one OR operation, then code forgot to advance end_seq if the merged skb carried FIN. Bug was added in 2.6.29 by commit 832d11c5cd076ab ("tcp: Try to restore large SKBs while SACK processing") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Cc: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Acked-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
| * Merge branch 'for-davem' of ↵David S. Miller2013-10-03
| |\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless John W. Linville says: ==================== Here is another batch of fixes intended for the 3.12 stream... For the mac80211 bits, Johannes says: "This time I have two fixes for IBSS (including one for wext, hah), a fix for extended rates IEs, an active monitor checking fix and a sysfs registration race fix." On top of those... Amitkumar Karwar brings an mwifiex fix for an interrupt loss issue w/ SDIO devices. The problem was due to a command timeout issue introduced by an earlier patch. Felix Fietkau a stall in the ath9k driver. This patch fixes the regression introduced in the commit "ath9k: use software queues for un-aggregated data packets". Stanislaw Gruszka reverts an rt2x00 patch that was found to cause connection problems with some devices. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| | * Merge branch 'master' of ↵John W. Linville2013-10-03
| | |\ | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless into for-davem