aboutsummaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAge
* Merge branch 'mlx4-misc-fixes'David S. Miller2017-02-23
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Tariq Toukan says: ==================== mlx4 misc fixes This patchset contains misc bug fixes from Eric Dumazet and our team to the mlx4 Core and Eth drivers. Series generated against net commit: eee2faabc63d tcp: account for ts offset only if tsecr not zero v3: * Rebased, conflict solved. v2: * Added Eric's fix (patch 5/5). ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| * net/mlx4_en: Use __skb_fill_page_desc()Eric Dumazet2017-02-23
| | | | | | | | | | | | | | | | | | Or we might miss the fact that a page was allocated from memory reserves. Fixes: dceeab0e5258 ("mlx4: support __GFP_MEMALLOC for rx") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * net/mlx4_core: Use cq quota in SRIOV when creating completion EQsJack Morgenstein2017-02-23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When creating EQs to handle CQ completion events for the PF or for VFs, we create enough EQE entries to handle completions for the max number of CQs that can use that EQ. When SRIOV is activated, the max number of CQs a VF (or the PF) can obtain is its CQ quota (determined by the Hypervisor resource tracker). Therefore, when creating an EQ, the number of EQE entries that the VF should request for that EQ is the CQ quota value (and not the total number of CQs available in the FW). Under SRIOV, the PF, also must use its CQ quota, because the resource tracker also controls how many CQs the PF can obtain. Using the FW total CQs instead of the CQ quota when creating EQs resulted wasting MTT entries, due to allocating more EQEs than were needed. Fixes: 5a0d0a6161ae ("mlx4: Structures and init/teardown for VF resource quotas") Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Reported-by: Dexuan Cui <decui@microsoft.com> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * net/mlx4_core: Fix VF overwrite of module param which disables DMFS on new ↵Majd Dibbiny2017-02-23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | probed PFs In the VF driver, module parameter mlx4_log_num_mgm_entry_size was mistakenly overwritten -- and in a manner which overrode the device-managed flow steering option encoded in the parameter. log_num_mgm_entry_size is a global module parameter which affects all ConnectX-3 PFs installed on that host. If a VF changes log_num_mgm_entry_size, this will affect all PFs which are probed subsequent to the change (by disabling DMFS for those PFs). Fixes: 3c439b5586e9 ("mlx4_core: Allow choosing flow steering mode") Signed-off-by: Majd Dibbiny <majd@mellanox.com> Reviewed-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * net/mlx4: Spoofcheck and zero MAC can't coexistEugenia Emantayev2017-02-23
| | | | | | | | | | | | | | | | | | | | Spoofcheck can't be enabled if VF MAC is zero. Vice versa, can't zero MAC if spoofcheck is on. Fixes: 8f7ba3ca12f6 ('net/mlx4: Add set VF mac address support') Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * net/mlx4: Change ENOTSUPP to EOPNOTSUPPOr Gerlitz2017-02-23
|/ | | | | | | | | | | As ENOTSUPP is specific to NFS, change the return error value to EOPNOTSUPP in various places in the mlx4 driver. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Suggested-by: Yotam Gigi <yotamg@mellanox.com> Reviewed-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* uapi: fix linux/rds.h userspace compilation errorsDmitry V. Levin2017-02-23
| | | | | | | | | | | | | | | | | | | | | Consistently use types from linux/types.h to fix the following linux/rds.h userspace compilation errors: /usr/include/linux/rds.h:198:2: error: unknown type name 'u8' u8 rx_traces; /usr/include/linux/rds.h:199:2: error: unknown type name 'u8' u8 rx_trace_pos[RDS_MSG_RX_DGRAM_TRACE_MAX]; /usr/include/linux/rds.h:203:2: error: unknown type name 'u8' u8 rx_traces; /usr/include/linux/rds.h:204:2: error: unknown type name 'u8' u8 rx_trace_pos[RDS_MSG_RX_DGRAM_TRACE_MAX]; /usr/include/linux/rds.h:205:2: error: unknown type name 'u64' u64 rx_trace[RDS_MSG_RX_DGRAM_TRACE_MAX]; Fixes: 3289025aedc0 ("RDS: add receive message trace used by application") Signed-off-by: Dmitry V. Levin <ldv@altlinux.org> Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* uapi: fix linux/seg6.h and linux/seg6_iptunnel.h userspace compilation errorsDmitry V. Levin2017-02-23
| | | | | | | | | | | | | | | | | | Include <linux/in6.h> in uapi/linux/seg6.h to fix the following linux/seg6.h userspace compilation error: /usr/include/linux/seg6.h:31:18: error: array type has incomplete element type 'struct in6_addr' struct in6_addr segments[0]; Include <linux/seg6.h> in uapi/linux/seg6_iptunnel.h to fix the following linux/seg6_iptunnel.h userspace compilation error: /usr/include/linux/seg6_iptunnel.h:26:21: error: array type has incomplete element type 'struct ipv6_sr_hdr' struct ipv6_sr_hdr srh[0]; Fixes: a50a05f497a2 ("ipv6: sr: add missing Kbuild export for header files") Signed-off-by: Dmitry V. Levin <ldv@altlinux.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* lib: Remove string from parman config selectionJiri Pirko2017-02-23
| | | | | | | | | | | | As reported by Geert, remove the string so the user does not see this config option. The option is explicitly selected only as a dependency of in-kernel users. Reported-by: Geert Uytterhoeven <geert@linux-m68k.org> Fixes: 44091d29f207 ("lib: Introduce priority array area manager") Signed-off-by: Jiri Pirko <jiri@mellanox.com> Tested-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* forcedeth: Remove return from a void functionZhu Yanjun2017-02-23
| | | | | | | In a void function, it is not necessary to append a return statement in it. Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* bpf: fix spelling mistake: "proccessed" -> "processed"Colin Ian King2017-02-23
| | | | | | | trivial fix to spelling mistake in verbose log message Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* uapi: fix linux/llc.h userspace compilation errorDmitry V. Levin2017-02-23
| | | | | | | | | | | Include <linux/if.h> to fix the following linux/llc.h userspace compilation error: /usr/include/linux/llc.h:26:27: error: 'IFHWADDRLEN' undeclared here (not in a function) unsigned char sllc_mac[IFHWADDRLEN]; Signed-off-by: Dmitry V. Levin <ldv@altlinux.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* uapi: fix linux/ip6_tunnel.h userspace compilation errorsDmitry V. Levin2017-02-23
| | | | | | | | | | | | | Include <linux/if.h> and <linux/in6.h> to fix the following linux/ip6_tunnel.h userspace compilation errors: /usr/include/linux/ip6_tunnel.h:23:12: error: 'IFNAMSIZ' undeclared here (not in a function) char name[IFNAMSIZ]; /* name of tunnel device */ /usr/include/linux/ip6_tunnel.h:30:18: error: field 'laddr' has incomplete type struct in6_addr laddr; /* local tunnel end-point address */ Signed-off-by: Dmitry V. Levin <ldv@altlinux.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge branch 'mlx5-fixes'David S. Miller2017-02-23
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Saeed Mahameed says: ==================== Mellanox mlx5e fixes for 4.11-rc1 This series includes some important bug fixes for mlx5e driver. Three misc fixes: From Mohamad, compilation fix on s390 system From Me, A fix for driver unload when switchdev mode is on. From Tariq, HW LRO frag size optimization for when build_skb is not used (striding RQ mode). Three CQE compression related fixes: Two fixes from Tariq and I, to correctly setup CQE compression parameters on driver load and on arbitrary user modifications. Last patch, fixes a very critical issue that was originally reported by Tom, where the driver reported csum errors or even page ref issues for when cqe compression is enabled and rapidly active. For your convenience this series was generated on top of net-next branch: 005c3490e9db ('Revert "ath10k: Search SMBIOS for OEM board file extension"') for -stable: net/mlx5e: Register/unregister vport representors on interface (for kernel >= 4.9) net/mlx5e: Do not reduce LRO WQE size when not using build_skb (for kernel >= 4.9) net/mlx5e: Fix broken CQE compression initialization (for kernel >= 4.9) net/mlx5e: Update MPWQE stride size when modifying CQE compress state (for kernel >= 4.7) net/mlx5e: Fix wrong CQE decompression (for kernel >= 4.7) ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| * net/mlx5e: Fix wrong CQE decompressionTariq Toukan2017-02-23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In cqe compression with striding RQ, the decompression of the CQE field wqe_counter was done with a wrong wraparound value. This caused handling cqes with a wrong pointer to wqe (rx descriptor) and creating SKBs with wrong data, pointing to wrong (and already consumed) strides/pages. The meaning of the CQE field wqe_counter in striding RQ holds the stride index instead of the WQE index. Hence, when decompressing a CQE, wqe_counter should have wrapped-around the number of strides in a single multi-packet WQE. We dropped this wrap-around mask at all in CQE decompression of striding RQ. It is not needed as in such cases the CQE compression session would break because of different value of wqe_id field, starting a new compression session. Tested: ethtool -K ethxx lro off/on ethtool --set-priv-flags ethxx rx_cqe_compress on super_netperf 16 {ipv4,ipv6} -t TCP_STREAM -m 50 -D verified no csum errors and no page refcount issues. Fixes: 7219ab34f184 ("net/mlx5e: CQE compression") Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Reported-by: Tom Herbert <tom@herbertland.com> Cc: kernel-team@fb.com Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * net/mlx5e: Update MPWQE stride size when modifying CQE compress stateSaeed Mahameed2017-02-23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When the admin enables/disables cqe compression, updating mpwqe stride size is required: CQE compress ON ==> stride size = 256B CQE compress OFF ==> stride size = 64B This is already done on driver load via mlx5e_set_rq_type_params, all we need is just to call it on arbitrary admin changes of cqe compression state via priv flags or when changing timestamping state (as it is mutually exclusive with cqe compression). This bug introduces no functional damage, it only makes cqe compression occur less often, since in ConnectX4-LX CQE compression is performed only on packets smaller than stride size. Tested: ethtool --set-priv-flags ethxx rx_cqe_compress on pktgen with 64 < pkt size < 256 and netperf TCP_STREAM (IPv4/IPv6) verify `ethtool -S ethxx | grep compress` are advancing more often (rapidly) Fixes: 7219ab34f184 ("net/mlx5e: CQE compression") Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Cc: kernel-team@fb.com Signed-off-by: David S. Miller <davem@davemloft.net>
| * net/mlx5e: Fix broken CQE compression initializationTariq Toukan2017-02-23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Some of RQ type parameters are derived from CQE compression state flag, CQE compression flag was initialized only after RQ type parameters setup. This leads to load RQ with stride size smaller than what we want for when CQE compression is on. This bug introduces no functional damage, it only makes CQE compression occur less often, since in ConnectX4-LX CQE compression is performed only on packets smaller than stride size. Fix this by marking default status of CQE compression in PFLAG prior to calling mlx5e_set_rq_priv_params(), as it inits some fields based on it. Tested: load driver on systems where rx CQE compress will be on (MH) pktgen with 64 < pkt size < 256 and netperf TCP_STREAM (IPv4/IPv6) verify `ethtool -S ethxx | grep compress` are advancing more often (rapidly) Fixes: 2fc4bfb7250d ("net/mlx5e: Dynamic RQ type infrastructure") Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Cc: kernel-team@fb.com Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * net/mlx5e: Do not reduce LRO WQE size when not using build_skbTariq Toukan2017-02-23
| | | | | | | | | | | | | | | | | | | | When rq_type is Striding RQ, no room of SKB_RESERVE is needed as SKB allocation is not done via build_skb. Fixes: e4b85508072b ("net/mlx5e: Slightly reduce hardware LRO size") Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * net/mlx5e: Register/unregister vport representors on interface attach/detachSaeed Mahameed2017-02-23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently vport representors are added only on driver load and removed on driver unload. Apparently we forgot to handle them when we added the seamless reset flow feature. This caused to leave the representors netdevs alive and active with open HW resources on pci shutdown and on error reset flows. To overcome this we move their handling to interface attach/detach, so they would be cleaned up on shutdown and recreated on reset flows. Fixes: 26e59d8077a3 ("net/mlx5e: Implement mlx5e interface attach/detach callbacks") Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Reviewed-by: Hadar Hen Zion <hadarh@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * net/mlx5e: s390 system compilation fixMohamad Haj Yahia2017-02-23
|/ | | | | | | | | | Add necessary headers include for s390 arch compilation. Fixes: e586b3b0baee ("net/mlx5: Ethernet Datapath files") Fixes: d605d6686dc7 ("net/mlx5e: Add support for ethtool self..") Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* tcp: account for ts offset only if tsecr not zeroAlexey Kodanev2017-02-22
| | | | | | | | | We can get SYN with zero tsecr, don't apply offset in this case. Fixes: ee684b6f2830 ("tcp: send packets with a socket timestamp") Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* tcp: setup timestamp offset when write_seq already setAlexey Kodanev2017-02-22
| | | | | | | | | | | | | | | | | | | | | | | | | Found that when randomized tcp offsets are enabled (by default) TCP client can still start new connections without them. Later, if server does active close and re-uses sockets in TIME-WAIT state, new SYN from client can be rejected on PAWS check inside tcp_timewait_state_process(), because either tw_ts_recent or rcv_tsval doesn't really have an offset set. Here is how to reproduce it with LTP netstress tool: netstress -R 1 & netstress -H 127.0.0.1 -lr 1000000 -a1 [...] < S seq 1956977072 win 43690 TS val 295618 ecr 459956970 > . ack 1956911535 win 342 TS val 459967184 ecr 1547117608 < R seq 1956911535 win 0 length 0 +1. < S seq 1956977072 win 43690 TS val 296640 ecr 459956970 > S. seq 657450664 ack 1956977073 win 43690 TS val 459968205 ecr 296640 Fixes: 95a22caee396 ("tcp: randomize tcp timestamp offsets for each connection") Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net/dccp: fix use after free in tw_timer_handler()Andrey Ryabinin2017-02-22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | DCCP doesn't purge timewait sockets on network namespace shutdown. So, after net namespace destroyed we could still have an active timer which will trigger use after free in tw_timer_handler(): BUG: KASAN: use-after-free in tw_timer_handler+0x4a/0xa0 at addr ffff88010e0d1e10 Read of size 8 by task swapper/1/0 Call Trace: __asan_load8+0x54/0x90 tw_timer_handler+0x4a/0xa0 call_timer_fn+0x127/0x480 expire_timers+0x1db/0x2e0 run_timer_softirq+0x12f/0x2a0 __do_softirq+0x105/0x5b4 irq_exit+0xdd/0xf0 smp_apic_timer_interrupt+0x57/0x70 apic_timer_interrupt+0x90/0xa0 Object at ffff88010e0d1bc0, in cache net_namespace size: 6848 Allocated: save_stack_trace+0x1b/0x20 kasan_kmalloc+0xee/0x180 kasan_slab_alloc+0x12/0x20 kmem_cache_alloc+0x134/0x310 copy_net_ns+0x8d/0x280 create_new_namespaces+0x23f/0x340 unshare_nsproxy_namespaces+0x75/0xf0 SyS_unshare+0x299/0x4f0 entry_SYSCALL_64_fastpath+0x18/0xad Freed: save_stack_trace+0x1b/0x20 kasan_slab_free+0xae/0x180 kmem_cache_free+0xb4/0x350 net_drop_ns+0x3f/0x50 cleanup_net+0x3df/0x450 process_one_work+0x419/0xbb0 worker_thread+0x92/0x850 kthread+0x192/0x1e0 ret_from_fork+0x2e/0x40 Add .exit_batch hook to dccp_v4_ops()/dccp_v6_ops() which will purge timewait sockets on net namespace destruction and prevent above issue. Fixes: f2bf415cfed7 ("mib: add net to NET_ADD_STATS_BH") Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* uapi: fix linux/if.h userspace compilation errorsDmitry V. Levin2017-02-22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Include <sys/socket.h> (guarded by ifndef __KERNEL__) to fix the following linux/if.h userspace compilation errors: /usr/include/linux/if.h:234:19: error: field 'ifru_addr' has incomplete type struct sockaddr ifru_addr; /usr/include/linux/if.h:235:19: error: field 'ifru_dstaddr' has incomplete type struct sockaddr ifru_dstaddr; /usr/include/linux/if.h:236:19: error: field 'ifru_broadaddr' has incomplete type struct sockaddr ifru_broadaddr; /usr/include/linux/if.h:237:19: error: field 'ifru_netmask' has incomplete type struct sockaddr ifru_netmask; /usr/include/linux/if.h:238:20: error: field 'ifru_hwaddr' has incomplete type struct sockaddr ifru_hwaddr; This also fixes userspace compilation of the following uapi headers: linux/atmbr2684.h linux/gsmmux.h linux/if_arp.h linux/if_bonding.h linux/if_frad.h linux/if_pppox.h linux/if_tunnel.h linux/netdevice.h linux/route.h linux/wireless.h As no uapi header provides a definition of struct sockaddr, inclusion of <sys/socket.h> seems to be the most conservative and the only safe fix available. All current users of <linux/if.h> are very likely to be including <sys/socket.h> already because the latter is the sole provider of struct sockaddr definition in libc, so adding a uapi header with a definition of struct sockaddr would create a potential conflict with <sys/socket.h>. Replacing struct sockaddr in the definition of struct ifreq with a different type would create a potential incompatibility with current users of struct ifreq who might rely on ifru_addr et al members being of type struct sockaddr. Signed-off-by: Dmitry V. Levin <ldv@altlinux.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* l2tp: Avoid schedule while atomic in exit_netRidge Kennedy2017-02-22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | While destroying a network namespace that contains a L2TP tunnel a "BUG: scheduling while atomic" can be observed. Enabling lockdep shows that this is happening because l2tp_exit_net() is calling l2tp_tunnel_closeall() (via l2tp_tunnel_delete()) from within an RCU critical section. l2tp_exit_net() takes rcu_read_lock_bh() << list_for_each_entry_rcu() >> l2tp_tunnel_delete() l2tp_tunnel_closeall() __l2tp_session_unhash() synchronize_rcu() << Illegal inside RCU critical section >> BUG: sleeping function called from invalid context in_atomic(): 1, irqs_disabled(): 0, pid: 86, name: kworker/u16:2 INFO: lockdep is turned off. CPU: 2 PID: 86 Comm: kworker/u16:2 Tainted: G W O 4.4.6-at1 #2 Hardware name: Xen HVM domU, BIOS 4.6.1-xs125300 05/09/2016 Workqueue: netns cleanup_net 0000000000000000 ffff880202417b90 ffffffff812b0013 ffff880202410ac0 ffffffff81870de8 ffff880202417bb8 ffffffff8107aee8 ffffffff81870de8 0000000000000c51 0000000000000000 ffff880202417be0 ffffffff8107b024 Call Trace: [<ffffffff812b0013>] dump_stack+0x85/0xc2 [<ffffffff8107aee8>] ___might_sleep+0x148/0x240 [<ffffffff8107b024>] __might_sleep+0x44/0x80 [<ffffffff810b21bd>] synchronize_sched+0x2d/0xe0 [<ffffffff8109be6d>] ? trace_hardirqs_on+0xd/0x10 [<ffffffff8105c7bb>] ? __local_bh_enable_ip+0x6b/0xc0 [<ffffffff816a1b00>] ? _raw_spin_unlock_bh+0x30/0x40 [<ffffffff81667482>] __l2tp_session_unhash+0x172/0x220 [<ffffffff81667397>] ? __l2tp_session_unhash+0x87/0x220 [<ffffffff8166888b>] l2tp_tunnel_closeall+0x9b/0x140 [<ffffffff81668c74>] l2tp_tunnel_delete+0x14/0x60 [<ffffffff81668dd0>] l2tp_exit_net+0x110/0x270 [<ffffffff81668d5c>] ? l2tp_exit_net+0x9c/0x270 [<ffffffff815001c3>] ops_exit_list.isra.6+0x33/0x60 [<ffffffff81501166>] cleanup_net+0x1b6/0x280 ... This bug can easily be reproduced with a few steps: $ sudo unshare -n bash # Create a shell in a new namespace # ip link set lo up # ip addr add 127.0.0.1 dev lo # ip l2tp add tunnel remote 127.0.0.1 local 127.0.0.1 tunnel_id 1 \ peer_tunnel_id 1 udp_sport 50000 udp_dport 50000 # ip l2tp add session name foo tunnel_id 1 session_id 1 \ peer_session_id 1 # ip link set foo up # exit # Exit the shell, in turn exiting the namespace $ dmesg ... [942121.089216] BUG: scheduling while atomic: kworker/u16:3/13872/0x00000200 ... To fix this, move the call to l2tp_tunnel_closeall() out of the RCU critical section, and instead call it from l2tp_tunnel_del_work(), which is running from the l2tp_wq workqueue. Fixes: 2b551c6e7d5b ("l2tp: close sessions before initiating tunnel delete") Signed-off-by: Ridge Kennedy <ridge.kennedy@alliedtelesis.co.nz> Acked-by: Guillaume Nault <g.nault@alphalink.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
* qlogic: netxen: constify bin_attribute structuresBhumika Goyal2017-02-22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Declare bin_attribute structures as const as they are only passed as an arguments to the functions device_remove_bin_file and device_create_bin_file. These function arguments are of type const, so bin_attribute structures having this property can be made const too. Done using Coccinelle: @r1 disable optional_qualifier @ identifier i; position p; @@ static struct bin_attribute i@p = {...}; @ok1@ identifier r1.i; position p,p1; @@ ( device_remove_bin_file(...,&i@p) | device_create_bin_file(..., &i@p1) ) @bad@ position p!={r1.p,ok1.p,ok1.p1}; identifier r1.i; @@ i@p @depends on !bad disable optional_qualifier@ identifier r1.i; @@ +const struct bin_attribute i; Signed-off-by: Bhumika Goyal <bhumirks@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* qlogic: qlcnic_sysfs: constify bin_attribute structuresBhumika Goyal2017-02-22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Declare bin_attribute structures as const as they are only passed as an arguments to the functions device_remove_bin_file and device_create_bin_file. These function arguments are of type const, so bin_attribute structures having this property can be made const too. Done using Coccinelle: @r1 disable optional_qualifier @ identifier i; position p; @@ static struct bin_attribute i@p = {...}; @ok1@ identifier r1.i; position p,p1; @@ ( device_remove_bin_file(...,&i@p) | device_create_bin_file(..., &i@p1) ) @bad@ position p!={r1.p,ok1.p,ok1.p1}; identifier r1.i; @@ i@p @depends on !bad disable optional_qualifier@ identifier r1.i; @@ +const struct bin_attribute i; Signed-off-by: Bhumika Goyal <bhumirks@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: emac: add support for device-tree based PHY discovery and setupChristian Lamparter2017-02-22
| | | | | | | | | | | | | | | | | | | | | | | This patch adds glue-code that allows the EMAC driver to interface with the existing dt-supported PHYs in drivers/net/phy. Because currently, the emac driver maintains a small library of supported phys for in a private phy.c file located in the drivers directory. The support is limited to mostly single ethernet transceiver like the: CIS8201, BCM5248, ET1011C, Marvell 88E1111 and 88E1112, AR8035. However, routers like the Netgear WNDR4700 and Cisco Meraki MX60(W) have a 5-port switch (AR8327N) attached to the EMAC. The switch chip is supported by the qca8k mdio driver, which uses the generic phy library. Another reason is that PHYLIB also supports the BCM54610, which was used for the Western Digital My Book Live. This will now also make EMAC select PHYLIB. Signed-off-by: Christian Lamparter <chunkeey@googlemail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge tag 'arm64-upstream' of ↵Linus Torvalds2017-02-22
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 updates from Will Deacon: - Errata workarounds for Qualcomm's Falkor CPU - Qualcomm L2 Cache PMU driver - Qualcomm SMCCC firmware quirk - Support for DEBUG_VIRTUAL - CPU feature detection for userspace via MRS emulation - Preliminary work for the Statistical Profiling Extension - Misc cleanups and non-critical fixes * tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (74 commits) arm64/kprobes: consistently handle MRS/MSR with XZR arm64: cpufeature: correctly handle MRS to XZR arm64: traps: correctly handle MRS/MSR with XZR arm64: ptrace: add XZR-safe regs accessors arm64: include asm/assembler.h in entry-ftrace.S arm64: fix warning about swapper_pg_dir overflow arm64: Work around Falkor erratum 1003 arm64: head.S: Enable EL1 (host) access to SPE when entered at EL2 arm64: arch_timer: document Hisilicon erratum 161010101 arm64: use is_vmalloc_addr arm64: use linux/sizes.h for constants arm64: uaccess: consistently check object sizes perf: add qcom l2 cache perf events driver arm64: remove wrong CONFIG_PROC_SYSCTL ifdef ARM: smccc: Update HVC comment to describe new quirk parameter arm64: do not trace atomic operations ACPI/IORT: Fix the error return code in iort_add_smmu_platform_device() ACPI/IORT: Fix iort_node_get_id() mapping entries indexing arm64: mm: enable CONFIG_HOLES_IN_ZONE for NUMA perf: xgene: Include module.h ...
| * arm64/kprobes: consistently handle MRS/MSR with XZRMark Rutland2017-02-15
| | | | | | | | | | | | | | | | | | | | Now that we have XZR-safe helpers for fiddling with registers, use these in the arm64 kprobes code rather than open-coding the logic. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
| * arm64: cpufeature: correctly handle MRS to XZRMark Rutland2017-02-15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In emulate_mrs() we may erroneously write back to the user SP rather than XZR if we trap an MRS instruction where Xt == 31. Use the new pt_regs_write_reg() helper to handle this correctly. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Fixes: 77c97b4ee21290f5 ("arm64: cpufeature: Expose CPUID registers by emulation") Cc: Andre Przywara <andre.przywara@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Marc Zyngier <marc.zyngier@arm.com> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
| * arm64: traps: correctly handle MRS/MSR with XZRMark Rutland2017-02-15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently we hand-roll XZR-safe register handling in user_cache_maint_handler(), though we forget to do the same in ctr_read_handler(), and may erroneously write back to the user SP rather than XZR. Use the new helpers to handle these cases correctly and consistently. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Fixes: 116c81f427ff6c53 ("arm64: Work around systems with mismatched cache line sizes") Cc: Andre Przywara <andre.przywara@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Marc Zyngier <marc.zyngier@arm.com> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
| * arm64: ptrace: add XZR-safe regs accessorsMark Rutland2017-02-15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In A64, XZR and the SP share the same encoding (31), and whether an instruction accesses XZR or SP for a particular register parameter depends on the definition of the instruction. We store the SP in pt_regs::regs[31], and thus when emulating instructions, we must be careful to not erroneously read from or write back to the saved SP. Unfortunately, we often fail to be this careful. In all cases, instructions using a transfer register parameter Xt use this to refer to XZR rather than SP. This patch adds helpers so that we can more easily and consistently handle these cases. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Marc Zyngier <marc.zyngier@arm.com> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
| * arm64: include asm/assembler.h in entry-ftrace.SArnd Bergmann2017-02-15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | In a randconfig build I ran into this build error: arch/arm64/kernel/entry-ftrace.S: Assembler messages: arch/arm64/kernel/entry-ftrace.S:101: Error: unknown mnemonic `ldr_l' -- `ldr_l x2,ftrace_trace_function' The macro is defined in asm/assembler.h, so we should include that file. Fixes: 829d2bd13392 ("arm64: entry-ftrace.S: avoid open-coded {adr,ldr}_l") Acked-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Will Deacon <will.deacon@arm.com>
| * arm64: fix warning about swapper_pg_dir overflowArnd Bergmann2017-02-15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | With 4 levels of 16KB pages, we get this warning about the fact that we are copying a whole page into an array that is declared as having only two pointers for the top level of the page table: arch/arm64/mm/mmu.c: In function 'paging_init': arch/arm64/mm/mmu.c:528:2: error: 'memcpy' writing 16384 bytes into a region of size 16 overflows the destination [-Werror=stringop-overflow=] This is harmless since we actually reserve a whole page in the definition of the array that comes from, and just the extern declaration is short. The pgdir is initialized to zero either way, so copying the actual entries here seems like the best solution. Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Acked-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Will Deacon <will.deacon@arm.com>
| * arm64: Work around Falkor erratum 1003Christopher Covington2017-02-10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The Qualcomm Datacenter Technologies Falkor v1 CPU may allocate TLB entries using an incorrect ASID when TTBRx_EL1 is being updated. When the erratum is triggered, page table entries using the new translation table base address (BADDR) will be allocated into the TLB using the old ASID. All circumstances leading to the incorrect ASID being cached in the TLB arise when software writes TTBRx_EL1[ASID] and TTBRx_EL1[BADDR], a memory operation is in the process of performing a translation using the specific TTBRx_EL1 being written, and the memory operation uses a translation table descriptor designated as non-global. EL2 and EL3 code changing the EL1&0 ASID is not subject to this erratum because hardware is prohibited from performing translations from an out-of-context translation regime. Consider the following pseudo code. write new BADDR and ASID values to TTBRx_EL1 Replacing the above sequence with the one below will ensure that no TLB entries with an incorrect ASID are used by software. write reserved value to TTBRx_EL1[ASID] ISB write new value to TTBRx_EL1[BADDR] ISB write new value to TTBRx_EL1[ASID] ISB When the above sequence is used, page table entries using the new BADDR value may still be incorrectly allocated into the TLB using the reserved ASID. Yet this will not reduce functionality, since TLB entries incorrectly tagged with the reserved ASID will never be hit by a later instruction. Based on work by Shanker Donthineni <shankerd@codeaurora.org> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Christopher Covington <cov@codeaurora.org> Signed-off-by: Will Deacon <will.deacon@arm.com>
| * arm64: head.S: Enable EL1 (host) access to SPE when entered at EL2Will Deacon2017-02-09
| | | | | | | | | | | | | | | | | | | | | | | | | | The SPE architecture requires each exception level to enable access to the SPE controls for the exception level below it, since additional context-switch logic may be required to handle the buffer safely. This patch allows EL1 (host) access to the SPE controls when entered at EL2. Acked-by: Mark Rutland <mark.rutland@arm.com> Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
| * arm64: arch_timer: document Hisilicon erratum 161010101Ding Tianhong2017-02-09
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Now that we have a workaround for Hisilicon erratum 161010101, notes this in the arm64 silicon-errata document. The new config option is too long to fit in the existing kconfig column, so this is widened to accomodate it. At the same time, an existing whitespace error is corrected, and the existing pattern of a line space between vendors is enforced for recent additions. Signed-off-by: Ding Tianhong <dingtianhong@huawei.com> [Mark: split patch, reword commit message, rework table] Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
| * arm64: use is_vmalloc_addrMiles Chen2017-02-09
| | | | | | | | | | | | | | | | To is_vmalloc_addr() to check if an address is a vmalloc address instead of checking VMALLOC_START and VMALLOC_END manually. Signed-off-by: Miles Chen <miles.chen@mediatek.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
| * arm64: use linux/sizes.h for constantsMiles Chen2017-02-09
| | | | | | | | | | | | | | Use linux/size.h to improve code readability. Signed-off-by: Miles Chen <miles.chen@mediatek.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
| * arm64: uaccess: consistently check object sizesMark Rutland2017-02-08
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently in arm64's copy_{to,from}_user, we only check the source/destination object size if access_ok() tells us the user access is permissible. However, in copy_from_user() we'll subsequently zero any remainder on the destination object. If we failed the access_ok() check, that applies to the whole object size, which we didn't check. To ensure that we catch that case, this patch hoists check_object_size() to the start of copy_from_user(), matching __copy_from_user() and __copy_to_user(). To make all of our uaccess copy primitives consistent, the same is done to copy_to_user(). Cc: Catalin Marinas <catalin.marinas@arm.com> Acked-by: Kees Cook <keescook@chromium.org> Signed-off-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
| * perf: add qcom l2 cache perf events driverNeil Leeder2017-02-08
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Adds perf events support for L2 cache PMU. The L2 cache PMU driver is named 'l2cache_0' and can be used with perf events to profile L2 events such as cache hits and misses on Qualcomm Technologies processors. Reviewed-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Neil Leeder <nleeder@codeaurora.org> [will: minimise nesting in l2_cache_associate_cpu_with_cluster] [will: use kstrtoul for unsigned long, remove redunant .owner setting] Signed-off-by: Will Deacon <will.deacon@arm.com>
| * arm64: remove wrong CONFIG_PROC_SYSCTL ifdefJuri Lelli2017-02-08
| | | | | | | | | | | | | | | | | | | | | | | | | | | | The sysfs cpu_capacity entry for each CPU has nothing to do with PROC_FS, nor it's in /proc/sys path. Remove such ifdef. Cc: Will Deacon <will.deacon@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Reported-and-suggested-by: Sudeep Holla <sudeep.holla@arm.com> Fixes: be8f185d8af4 ('arm64: add sysfs cpu_capacity attribute') Signed-off-by: Juri Lelli <juri.lelli@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
| * ARM: smccc: Update HVC comment to describe new quirk parameterWill Deacon2017-02-08
| | | | | | | | | | | | | | | | | | Commit 680a0873e193 ("arm: kernel: Add SMC structure parameter") added a new "quirk" parameter to the SMC and HVC SMCCC backends, but only updated the comment for the SMC version. This patch adds the new paramater to the comment describing the HVC version too. Signed-off-by: Will Deacon <will.deacon@arm.com>
| * arm64: do not trace atomic operationsPratyush Anand2017-02-06
| | | | | | | | | | | | | | | | | | | | Atomic operation function symbols are exported,when CONFIG_ARM64_LSE_ATOMICS is defined. Prefix them with notrace, so that an user can not trace these functions. Tracing these functions causes kernel crash. Signed-off-by: Pratyush Anand <panand@redhat.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
| * ACPI/IORT: Fix the error return code in iort_add_smmu_platform_device()Dan Carpenter2017-02-06
| | | | | | | | | | | | | | | | | | | | | | | | | | | | The function iort_add_smmu_platform_device() accidentally returns 0 (ie PTR_ERR(pdev) where pdev == NULL) if platform_device_alloc() fails; fix the bug by returning a proper error value. Fixes: 846f0e9e74a0 ("ACPI/IORT: Add support for ARM SMMU platform devices creation") Acked-by: Hanjun Guo <hanjun.guo@linaro.org> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> [lorenzo.pieralisi@arm.com: improved commit log] Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
| * ACPI/IORT: Fix iort_node_get_id() mapping entries indexingLorenzo Pieralisi2017-02-06
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 618f535a6062 ("ACPI/IORT: Add single mapping function") introduced a function (iort_node_get_id()) to retrieve ids for IORT named components. The iort_node_get_id() takes an index as input to refer to a specific mapping entry in the named component IORT node mapping array. For a mapping entry at a given index, iort_node_get_id() should return the id value (through the id_out function parameter) and the IORT node output_reference (through function return value) the given mapping entry refers to. Technically output_reference values may differ for different map entries, (see diagram below - mapped id values may refer to different eg IORT SMMU nodes; the kernel may not be able to handle different output_reference values for a given named component but the IORT kernel layer should still report the IORT mappings as reported by firmware) but current code in iort_node_get_id() fails to use the index function parameter to return the correct output_reference value (ie it always returns the output_reference value of the first entry in the mapping array whilst using the index correctly to retrieve the id value from the respective entry). |----------------------| | named component | |----------------------| | map entry[0] | |----------------------| | id value | | output_reference----------------> eg SMMU 1 |----------------------| | map entry[1] | |----------------------| | id value | | output_reference----------------> eg SMMU 2 |----------------------| . . . |----------------------| | map entry[N] | |----------------------| | id value | | output_reference----------------> eg SMMU 1 |----------------------| Consequently the iort_node_get_id() function always returns the IORT node pointed at by the output_reference value of the first named component mapping array entry, irrespective of the index parameter, which is a bug. Update the map array entry pointer computation in iort_node_get_id() to take into account the index value, fixing the issue. Fixes: 618f535a6062 ("ACPI/IORT: Add single mapping function") Reported-by: Hanjun Guo <hanjun.guo@linaro.org> Reviewed-by: Hanjun Guo <hanjun.guo@linaro.org> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Cc: Hanjun Guo <hanjun.guo@linaro.org> Cc: Sinan Kaya <okaya@codeaurora.org> Cc: Tomasz Nowicki <tn@semihalf.com> Cc: Nate Watterson <nwatters@codeaurora.org> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Signed-off-by: Will Deacon <will.deacon@arm.com>
| * arm64: mm: enable CONFIG_HOLES_IN_ZONE for NUMAArd Biesheuvel2017-02-06
| | | | | | | | | | | | | | | | | | | | | | | | | | | | The NUMA code may get confused by the presence of NOMAP regions within zones, resulting in spurious BUG() checks where the node id deviates from the containing zone's node id. Since the kernel has no business reasoning about node ids of pages it does not own in the first place, enable CONFIG_HOLES_IN_ZONE to ensure that such pages are disregarded. Acked-by: Robert Richter <rrichter@cavium.com> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Will Deacon <will.deacon@arm.com>
| * perf: xgene: Include module.hStephen Boyd2017-02-03
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | I ran into a build error when I disabled CONFIG_ACPI and tried to compile this driver: drivers/perf/xgene_pmu.c:1242:1: warning: data definition has no type or storage class MODULE_DEVICE_TABLE(of, xgene_pmu_of_match); ^ drivers/perf/xgene_pmu.c:1242:1: error: type defaults to 'int' in declaration of 'MODULE_DEVICE_TABLE' [-Werror=implicit-int] Include module.h for the MODULE_DEVICE_TABLE macro that's implicitly included through ACPI. Tested-by: Tai Nguyen <ttnguyen@apm.com> Signed-off-by: Stephen Boyd <sboyd@codeaurora.org> Signed-off-by: Will Deacon <will.deacon@arm.com>
| * arm: perf: use builtin_platform_driverGeliang Tang2017-02-03
| | | | | | | | | | | | | | Use builtin_platform_driver() helper to simplify the code. Signed-off-by: Geliang Tang <geliangtang@gmail.com> Signed-off-by: Will Deacon <will.deacon@arm.com>