path: root/net/ipv6
Commit message (Author, Age)
...
* | | | | | | ip6_tunnel: remove magic mtu value 0xFFF8 (Nicolas Dichtel, 2018-06-01)

I don't know where this value comes from (probably a copy and paste and paste and paste ...). Let's use standard values which are a bit greater.

Link: https://git.kernel.org/pub/scm/linux/kernel/git/davem/netdev-vger-cvs.git/commit/?id=e5afd356a411a
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | | | | Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec (David S. Miller, 2018-06-01)

Steffen Klassert says:

====================
pull request (net): ipsec 2018-05-31

1) Avoid possible overflow of the offset variable in _decode_session6(), this fixes an infinite loop there. From Eric Dumazet.

2) We may use an error pointer in the error path of xfrm_bundle_create(). Fix this by returning this pointer directly to the caller.

Please pull or let me know if there are problems.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | | xfrm6: avoid potential infinite loop in _decode_session6() (Eric Dumazet, 2018-05-14)

syzbot found a way to trigger an infinite loop by overflowing @offset variable that has been forced to use u16 for some very obscure reason in the past.

We probably want to look at NEXTHDR_FRAGMENT handling which looks wrong, in a separate patch.

In net-next, we shall try to use skb_header_pointer() instead of pskb_may_pull().

watchdog: BUG: soft lockup - CPU#1 stuck for 134s! [syz-executor738:4553]
Modules linked in:
irq event stamp: 13885653
hardirqs last enabled at (13885652): [<ffffffff878009d5>] restore_regs_and_return_to_kernel+0x0/0x2b
hardirqs last disabled at (13885653): [<ffffffff87800905>] interrupt_entry+0xb5/0xf0 arch/x86/entry/entry_64.S:625
softirqs last enabled at (13614028): [<ffffffff84df0809>] tun_napi_alloc_frags drivers/net/tun.c:1478 [inline]
softirqs last enabled at (13614028): [<ffffffff84df0809>] tun_get_user+0x1dd9/0x4290 drivers/net/tun.c:1825
softirqs last disabled at (13614032): [<ffffffff84df1b6f>] tun_get_user+0x313f/0x4290 drivers/net/tun.c:1942
CPU: 1 PID: 4553 Comm: syz-executor738 Not tainted 4.17.0-rc3+ #40
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:check_kcov_mode kernel/kcov.c:67 [inline]
RIP: 0010:__sanitizer_cov_trace_pc+0x20/0x50 kernel/kcov.c:101
RSP: 0018:ffff8801d8cfe250 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
RAX: ffff8801d88a8080 RBX: ffff8801d7389e40 RCX: 0000000000000006
RDX: 0000000000000000 RSI: ffffffff868da4ad RDI: ffff8801c8a53277
RBP: ffff8801d8cfe250 R08: ffff8801d88a8080 R09: ffff8801d8cfe3e8
R10: ffffed003b19fc87 R11: ffff8801d8cfe43f R12: ffff8801c8a5327f
R13: 0000000000000000 R14: ffff8801c8a4e5fe R15: ffff8801d8cfe3e8
FS: 0000000000d88940(0000) GS:ffff8801daf00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffffffff600400 CR3: 00000001acab3000 CR4: 00000000001406e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 _decode_session6+0xc1d/0x14f0 net/ipv6/xfrm6_policy.c:150
 __xfrm_decode_session+0x71/0x140 net/xfrm/xfrm_policy.c:2368
 xfrm_decode_session_reverse include/net/xfrm.h:1213 [inline]
 icmpv6_route_lookup+0x395/0x6e0 net/ipv6/icmp.c:372
 icmp6_send+0x1982/0x2da0 net/ipv6/icmp.c:551
 icmpv6_send+0x17a/0x300 net/ipv6/ip6_icmp.c:43
 ip6_input_finish+0x14e1/0x1a30 net/ipv6/ip6_input.c:305
 NF_HOOK include/linux/netfilter.h:288 [inline]
 ip6_input+0xe1/0x5e0 net/ipv6/ip6_input.c:327
 dst_input include/net/dst.h:450 [inline]
 ip6_rcv_finish+0x29c/0xa10 net/ipv6/ip6_input.c:71
 NF_HOOK include/linux/netfilter.h:288 [inline]
 ipv6_rcv+0xeb8/0x2040 net/ipv6/ip6_input.c:208
 __netif_receive_skb_core+0x2468/0x3650 net/core/dev.c:4646
 __netif_receive_skb+0x2c/0x1e0 net/core/dev.c:4711
 netif_receive_skb_internal+0x126/0x7b0 net/core/dev.c:4785
 napi_frags_finish net/core/dev.c:5226 [inline]
 napi_gro_frags+0x631/0xc40 net/core/dev.c:5299
 tun_get_user+0x3168/0x4290 drivers/net/tun.c:1951
 tun_chr_write_iter+0xb9/0x154 drivers/net/tun.c:1996
 call_write_iter include/linux/fs.h:1784 [inline]
 do_iter_readv_writev+0x859/0xa50 fs/read_write.c:680
 do_iter_write+0x185/0x5f0 fs/read_write.c:959
 vfs_writev+0x1c7/0x330 fs/read_write.c:1004
 do_writev+0x112/0x2f0 fs/read_write.c:1039
 __do_sys_writev fs/read_write.c:1112 [inline]
 __se_sys_writev fs/read_write.c:1109 [inline]
 __x64_sys_writev+0x75/0xb0 fs/read_write.c:1109
 do_syscall_64+0x1b1/0x800 arch/x86/entry/common.c:287
 entry_SYSCALL_64_after_hwframe+0x49/0xbe

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Reported-by: syzbot+0053c8...@syzkaller.appspotmail.com
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
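The wrap-around described in the xfrm6 message above can be sketched in user space. This is a minimal illustration, not the kernel code: the function names are hypothetical, the loop is a simplification of header walking, and widening the offset is shown only as one plausible remedy, since the excerpt does not include the patch itself.

```c
#include <stdint.h>

/*
 * Illustrative only: these helpers are hypothetical and are not kernel code.
 * Advance a header offset that is stored in a 16-bit variable. The sum is
 * truncated to 16 bits, so a large step can wrap back to a small offset.
 */
uint16_t advance_offset_u16(uint16_t offset, uint32_t hdr_len)
{
    return (uint16_t)(offset + hdr_len);
}

/* The same step with a 32-bit offset keeps growing for these magnitudes. */
uint32_t advance_offset_u32(uint32_t offset, uint32_t hdr_len)
{
    return offset + hdr_len;
}

/*
 * Walk headers of hdr_len bytes until the offset reaches pkt_len.
 * Returns the number of steps taken, or -1 if max_steps is exhausted,
 * which is what a wrapping 16-bit offset can cause.
 */
int walk_headers_u16(uint32_t hdr_len, uint32_t pkt_len, int max_steps)
{
    uint16_t offset = 0;

    for (int steps = 0; steps < max_steps; steps++) {
        if (offset >= pkt_len)
            return steps;
        offset = advance_offset_u16(offset, hdr_len);
    }
    return -1;
}
```

With a 16-bit offset, any target beyond 65535 is unreachable, so the walk only stops because of the artificial max_steps cap; the kernel loop had no such cap.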
* | | | | | | ipv6: sr: fix memory OOB access in seg6_do_srh_encap/inline (Mathieu Xhonneux, 2018-05-28)

seg6_do_srh_encap and seg6_do_srh_inline can possibly do an out-of-bounds access when adding the SRH to the packet. This no longer happens when the skb is expanded not only by the size of the SRH (+ outer IPv6 header), but also by skb->mac_len.

[ 53.793056] BUG: KASAN: use-after-free in seg6_do_srh_encap+0x284/0x620 [ 53.794564] Write of size 14 at addr ffff88011975ecfa by task ping/674 [ 53.796665] CPU: 0 PID: 674 Comm: ping Not tainted 4.17.0-rc3-ARCH+ #90 [ 53.796670] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-20171110_100015-anatol 04/01/2014 [ 53.796673] Call Trace: [ 53.796679] <IRQ> [ 53.796689] dump_stack+0x71/0xab [ 53.796700] print_address_description+0x6a/0x270 [ 53.796707] kasan_report+0x258/0x380 [ 53.796715] ? seg6_do_srh_encap+0x284/0x620 [ 53.796722] memmove+0x34/0x50 [ 53.796730] seg6_do_srh_encap+0x284/0x620 [ 53.796741] ? seg6_do_srh+0x29b/0x360 [ 53.796747] seg6_do_srh+0x29b/0x360 [ 53.796756] seg6_input+0x2e/0x2e0 [ 53.796765] lwtunnel_input+0x93/0xd0 [ 53.796774] ipv6_rcv+0x690/0x920 [ 53.796783] ? ip6_input+0x170/0x170 [ 53.796791] ? eth_gro_receive+0x2d0/0x2d0 [ 53.796800] ? ip6_input+0x170/0x170 [ 53.796809] __netif_receive_skb_core+0xcc0/0x13f0 [ 53.796820] ? netdev_info+0x110/0x110 [ 53.796827] ? napi_complete_done+0xb6/0x170 [ 53.796834] ? e1000_clean+0x6da/0xf70 [ 53.796845] ? process_backlog+0x129/0x2a0 [ 53.796853] process_backlog+0x129/0x2a0 [ 53.796862] net_rx_action+0x211/0x5c0 [ 53.796870] ? napi_complete_done+0x170/0x170 [ 53.796887] ? run_rebalance_domains+0x11f/0x150 [ 53.796891] __do_softirq+0x10e/0x39e [ 53.796894] do_softirq_own_stack+0x2a/0x40 [ 53.796895] </IRQ> [ 53.796898] do_softirq.part.16+0x54/0x60 [ 53.796900] __local_bh_enable_ip+0x5b/0x60 [ 53.796903] ip6_finish_output2+0x416/0x9f0 [ 53.796906] ? ip6_dst_lookup_flow+0x110/0x110 [ 53.796909] ? ip6_sk_dst_lookup_flow+0x390/0x390 [ 53.796911] ? __rcu_read_unlock+0x66/0x80 [ 53.796913] ? ip6_mtu+0x44/0xf0 [ 53.796916] ? ip6_output+0xfc/0x220 [ 53.796918] ip6_output+0xfc/0x220 [ 53.796921] ? ip6_finish_output+0x2b0/0x2b0 [ 53.796923] ?
memcpy+0x34/0x50 [ 53.796926] ip6_send_skb+0x43/0xc0 [ 53.796929] rawv6_sendmsg+0x1216/0x1530 [ 53.796932] ? __orc_find+0x6b/0xc0 [ 53.796934] ? rawv6_rcv_skb+0x160/0x160 [ 53.796937] ? __rcu_read_unlock+0x66/0x80 [ 53.796939] ? __rcu_read_unlock+0x66/0x80 [ 53.796942] ? is_bpf_text_address+0x1e/0x30 [ 53.796944] ? kernel_text_address+0xec/0x100 [ 53.796946] ? __kernel_text_address+0xe/0x30 [ 53.796948] ? unwind_get_return_address+0x2f/0x50 [ 53.796950] ? __save_stack_trace+0x92/0x100 [ 53.796954] ? save_stack+0x89/0xb0 [ 53.796956] ? kasan_kmalloc+0xa0/0xd0 [ 53.796958] ? kmem_cache_alloc+0xd2/0x1f0 [ 53.796961] ? prepare_creds+0x23/0x160 [ 53.796963] ? __x64_sys_capset+0x252/0x3e0 [ 53.796966] ? do_syscall_64+0x69/0x160 [ 53.796968] ? entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 53.796971] ? __alloc_pages_nodemask+0x170/0x380 [ 53.796973] ? __alloc_pages_slowpath+0x12c0/0x12c0 [ 53.796977] ? tty_vhangup+0x20/0x20 [ 53.796979] ? policy_nodemask+0x1a/0x90 [ 53.796982] ? __mod_node_page_state+0x8d/0xa0 [ 53.796986] ? __check_object_size+0xe7/0x240 [ 53.796989] ? __sys_sendto+0x229/0x290 [ 53.796991] ? rawv6_rcv_skb+0x160/0x160 [ 53.796993] __sys_sendto+0x229/0x290 [ 53.796996] ? __ia32_sys_getpeername+0x50/0x50 [ 53.796999] ? commit_creds+0x2de/0x520 [ 53.797002] ? security_capset+0x57/0x70 [ 53.797004] ? __x64_sys_capset+0x29f/0x3e0 [ 53.797007] ? __x64_sys_rt_sigsuspend+0xe0/0xe0 [ 53.797011] ? 
__do_page_fault+0x664/0x770 [ 53.797014] __x64_sys_sendto+0x74/0x90 [ 53.797017] do_syscall_64+0x69/0x160 [ 53.797019] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 53.797022] RIP: 0033:0x7f43b7a6714a [ 53.797023] RSP: 002b:00007ffd891bd368 EFLAGS: 00000246 ORIG_RAX: 000000000000002c [ 53.797026] RAX: ffffffffffffffda RBX: 00000000006129c0 RCX: 00007f43b7a6714a [ 53.797028] RDX: 0000000000000040 RSI: 00000000006129c0 RDI: 0000000000000004 [ 53.797029] RBP: 00007ffd891be640 R08: 0000000000610940 R09: 000000000000001c [ 53.797030] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040 [ 53.797032] R13: 000000000060e6a0 R14: 0000000000008004 R15: 000000000040b661 [ 53.797171] Allocated by task 642: [ 53.797460] kasan_kmalloc+0xa0/0xd0 [ 53.797463] kmem_cache_alloc+0xd2/0x1f0 [ 53.797465] getname_flags+0x40/0x210 [ 53.797467] user_path_at_empty+0x1d/0x40 [ 53.797469] do_faccessat+0x12a/0x320 [ 53.797471] do_syscall_64+0x69/0x160 [ 53.797473] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 53.797607] Freed by task 642: [ 53.797869] __kasan_slab_free+0x130/0x180 [ 53.797871] kmem_cache_free+0xa8/0x230 [ 53.797872] filename_lookup+0x15b/0x230 [ 53.797874] do_faccessat+0x12a/0x320 [ 53.797876] do_syscall_64+0x69/0x160 [ 53.797878] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 53.798014] The buggy address belongs to the object at ffff88011975e600 which belongs to the cache names_cache of size 4096 [ 53.799043] The buggy address is located 1786 bytes inside of 4096-byte region [ffff88011975e600, ffff88011975f600) [ 53.800013] The buggy address belongs to the page: [ 53.800414] page:ffffea000465d600 count:1 mapcount:0 mapping:0000000000000000 index:0x0 compound_mapcount: 0 [ 53.801259] flags: 0x17fff0000008100(slab|head) [ 53.801640] raw: 017fff0000008100 0000000000000000 0000000000000000 0000000100070007 [ 53.803147] raw: dead000000000100 dead000000000200 ffff88011b185a40 0000000000000000 [ 53.803787] page dumped because: kasan: bad access detected [ 53.804384] Memory state 
around the buggy address: [ 53.804788] ffff88011975eb80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 53.805384] ffff88011975ec00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 53.805979] >ffff88011975ec80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 53.806577] ^ [ 53.807165] ffff88011975ed00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 53.807762] ffff88011975ed80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 53.808356] ================================================================== [ 53.808949] Disabling lock debugging due to kernel taint Fixes: 6c8702c60b88 ("ipv6: sr: add support for SRH encapsulation and injection with lwtunnels") Signed-off-by: David Lebrun <dlebrun@google.com> Signed-off-by: Mathieu Xhonneux <m.xhonneux@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
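The sizing rule stated in the seg6 message above (expand by the SRH, the outer IPv6 header in encap mode, and also skb->mac_len) amounts to simple arithmetic. The sketch below is a user-space illustration under that reading; the helper names are hypothetical and the real functions operate on an skb rather than on plain integers.

```c
#include <stddef.h>

#define IPV6_HDR_LEN 40u /* size of the fixed IPv6 header in bytes */

/*
 * Hypothetical helper: bytes to reserve in encap mode, where an outer
 * IPv6 header and the SRH are added, plus room for the link-layer header.
 */
size_t srh_encap_headroom(size_t srh_len, size_t mac_len)
{
    return IPV6_HDR_LEN + srh_len + mac_len;
}

/*
 * Hypothetical helper: bytes to reserve in inline mode, where only the
 * SRH is inserted, plus room for the link-layer header.
 */
size_t srh_inline_headroom(size_t srh_len, size_t mac_len)
{
    return srh_len + mac_len;
}
```

For example, a 24-byte SRH (8-byte fixed part plus one 16-byte segment) over Ethernet, whose header is 14 bytes, needs 78 bytes in encap mode. The 14-byte write reported by KASAN is consistent with an Ethernet header landing outside the reserved space.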
* | | | | | net: ip6_gre: fix tunnel metadata device sharing. (William Tu, 2018-05-19)

Currently ip6gre and ip6erspan share a single metadata-mode device, using 'collect_md_tun'. Thus, when doing:

    ip link add dev ip6gre11 type ip6gretap external
    ip link add dev ip6erspan12 type ip6erspan external
    RTNETLINK answers: File exists

the second command simply fails, because it tries to create the same collect_md_tun.

The patch fixes it by adding a separate collect md tunnel device for the ip6erspan, 'collect_md_tun_erspan'. As a result, a couple of places need to be refactored/split up in order to distinguish ip6gre and ip6erspan.

First, move the collect_md check at ip6gre_tunnel_{unlink,link} and create separate functions {ip6gre,ip6erspan}_tunnel_{link_md,unlink_md}. Then before link/unlink, make sure the link_md/unlink_md is called. Finally, a separate ndo_uninit is created for ip6erspan.

Tested it using the samples/bpf/test_tunnel_bpf.sh.

Fixes: ef7baf5e083c ("ip6_gre: add ip6 erspan collect_md mode")
Signed-off-by: William Tu <u9012063@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | | | net: test tailroom before appending to linear skb (Willem de Bruijn, 2018-05-17)

Device features may change during transmission. In particular with corking, a device may toggle scatter-gather in between allocating and writing to an skb.

Do not unconditionally assume that !NETIF_F_SG at write time implies that the same held at alloc time and thus the skb has sufficient tailroom.

This issue predates git history.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Reported-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
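The principle in the tailroom message above is to check the space actually left in the buffer at write time instead of inferring it from a device flag. A minimal user-space sketch of that check follows; struct linear_buf and both functions are hypothetical stand-ins and do not reflect the sk_buff API.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical fixed-size linear buffer standing in for a linear skb. */
struct linear_buf {
    unsigned char data[64];
    size_t len; /* bytes already written */
};

/* Space left at the end of the buffer. */
size_t buf_tailroom(const struct linear_buf *b)
{
    return sizeof(b->data) - b->len;
}

/*
 * Append n bytes only if they fit. The check uses the buffer's real
 * tailroom rather than any assumption made when it was allocated.
 */
bool buf_append(struct linear_buf *b, const void *src, size_t n)
{
    if (n > buf_tailroom(b))
        return false; /* caller must take a different path */
    memcpy(b->data + b->len, src, n);
    b->len += n;
    return true;
}
```

A caller that gets false back would fall back to some other way of attaching the data; what that fallback looks like in the kernel is not described in the excerpt.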
* | | | | | net: ip6_gre: Fix ip6erspan hlen calculation (Petr Machata, 2018-05-17)

Even though ip6erspan_tap_init() sets up hlen and tun_hlen according to what ERSPAN needs, it goes ahead to call ip6gre_tnl_link_config() which overwrites these settings with GRE-specific ones.

Similarly for changelink callbacks, which are handled by ip6gre_changelink(), which calls ip6gre_tnl_change(), which calls ip6gre_tnl_link_config() as well.

The difference ends up being 12 vs. 20 bytes, and this is generally not a problem, because a 12-byte request likely ends up allocating more and the extra 8 bytes are thus available. However correct it is not.

So replace the newlink and changelink callbacks with ERSPAN-specific ones, reusing the newly-introduced _common() functions.

Fixes: 5a963eb61b7c ("ip6_gre: Add ERSPAN native tunnel support")
Signed-off-by: Petr Machata <petrm@mellanox.com>
Acked-by: William Tu <u9012063@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | | | net: ip6_gre: Split up ip6gre_changelink() (Petr Machata, 2018-05-17)

Extract from ip6gre_changelink() a reusable function ip6gre_changelink_common(). This will allow introduction of an ERSPAN-specific _changelink() function without much code duplication.

Fixes: 5a963eb61b7c ("ip6_gre: Add ERSPAN native tunnel support")
Signed-off-by: Petr Machata <petrm@mellanox.com>
Acked-by: William Tu <u9012063@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | | | net: ip6_gre: Split up ip6gre_newlink() (Petr Machata, 2018-05-17)

Extract from ip6gre_newlink() a reusable function ip6gre_newlink_common(). The ip6gre_tnl_link_config() call needs to be made customizable for ERSPAN, thus reorder it with calls to ip6_tnl_change_mtu() and dev_hold(), and extract the whole tail to the caller, ip6gre_newlink(). Thus enable an ERSPAN-specific _newlink() function without a lot of duplication.

Fixes: 5a963eb61b7c ("ip6_gre: Add ERSPAN native tunnel support")
Signed-off-by: Petr Machata <petrm@mellanox.com>
Acked-by: William Tu <u9012063@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | | | net: ip6_gre: Split up ip6gre_tnl_change() (Petr Machata, 2018-05-17)

Split a reusable function ip6gre_tnl_copy_tnl_parm() from ip6gre_tnl_change(). This will allow ERSPAN-specific code to reuse the common parts while customizing the behavior for ERSPAN.

Fixes: 5a963eb61b7c ("ip6_gre: Add ERSPAN native tunnel support")
Signed-off-by: Petr Machata <petrm@mellanox.com>
Acked-by: William Tu <u9012063@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | | | net: ip6_gre: Split up ip6gre_tnl_link_config() (Petr Machata, 2018-05-17)

The function ip6gre_tnl_link_config() is used for setting up configuration of both ip6gretap and ip6erspan tunnels. Split the function into the common part and the route-lookup part. The latter then takes the calculated header length as an argument. This split will allow the patches down the line to sneak in a custom header length computation for the ERSPAN tunnel.

Fixes: 5a963eb61b7c ("ip6_gre: Add ERSPAN native tunnel support")
Signed-off-by: Petr Machata <petrm@mellanox.com>
Acked-by: William Tu <u9012063@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | | | net: ip6_gre: Fix headroom request in ip6erspan_tunnel_xmit() (Petr Machata, 2018-05-17)

dev->needed_headroom is not primed until ip6_tnl_xmit(), so it starts out zero. Thus the call to skb_cow_head() fails to actually make sure there's enough headroom to push the ERSPAN headers to. That can lead to the panic cited below. (Reproducer below that).

Fix by requesting either needed_headroom if already primed, or just the bare minimum needed for the header otherwise.

[ 190.703567] kernel BUG at net/core/skbuff.c:104! [ 190.708384] invalid opcode: 0000 [#1] PREEMPT SMP KASAN PTI [ 190.714007] Modules linked in: act_mirred cls_matchall ip6_gre ip6_tunnel tunnel6 gre sch_ingress vrf veth x86_pkg_temp_thermal mlx_platform nfsd e1000e leds_mlxcpld [ 190.728975] CPU: 1 PID: 959 Comm: kworker/1:2 Not tainted 4.17.0-rc4-net_master-custom-139 #10 [ 190.737647] Hardware name: Mellanox Technologies Ltd. "MSN2410-CB2F"/"SA000874", BIOS 4.6.5 03/08/2016 [ 190.747006] Workqueue: ipv6_addrconf addrconf_dad_work [ 190.752222] RIP: 0010:skb_panic+0xc3/0x100 [ 190.756358] RSP: 0018:ffff8801d54072f0 EFLAGS: 00010282 [ 190.761629] RAX: 0000000000000085 RBX: ffff8801c1a8ecc0 RCX: 0000000000000000 [ 190.768830] RDX: 0000000000000085 RSI: dffffc0000000000 RDI: ffffed003aa80e54 [ 190.776025] RBP: ffff8801bd1ec5a0 R08: ffffed003aabce19 R09: ffffed003aabce19 [ 190.783226] R10: 0000000000000001 R11: ffffed003aabce18 R12: ffff8801bf695dbe [ 190.790418] R13: 0000000000000084 R14: 00000000000006c0 R15: ffff8801bf695dc8 [ 190.797621] FS: 0000000000000000(0000) GS:ffff8801d5400000(0000) knlGS:0000000000000000 [ 190.805786] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 190.811582] CR2: 000055fa929aced0 CR3: 0000000003228004 CR4: 00000000001606e0 [ 190.818790] Call Trace: [ 190.821264] <IRQ> [ 190.823314] ? ip6erspan_tunnel_xmit+0x5e4/0x1982 [ip6_gre] [ 190.828940] ? ip6erspan_tunnel_xmit+0x5e4/0x1982 [ip6_gre] [ 190.834562] skb_push+0x78/0x90 [ 190.837749] ip6erspan_tunnel_xmit+0x5e4/0x1982 [ip6_gre] [ 190.843219] ?
ip6gre_tunnel_ioctl+0xd90/0xd90 [ip6_gre] [ 190.848577] ? debug_check_no_locks_freed+0x210/0x210 [ 190.853679] ? debug_check_no_locks_freed+0x210/0x210 [ 190.858783] ? print_irqtrace_events+0x120/0x120 [ 190.863451] ? sched_clock_cpu+0x18/0x210 [ 190.867496] ? cyc2ns_read_end+0x10/0x10 [ 190.871474] ? skb_network_protocol+0x76/0x200 [ 190.875977] dev_hard_start_xmit+0x137/0x770 [ 190.880317] ? do_raw_spin_trylock+0x6d/0xa0 [ 190.884624] sch_direct_xmit+0x2ef/0x5d0 [ 190.888589] ? pfifo_fast_dequeue+0x3fa/0x670 [ 190.892994] ? pfifo_fast_change_tx_queue_len+0x810/0x810 [ 190.898455] ? __lock_is_held+0xa0/0x160 [ 190.902422] __qdisc_run+0x39e/0xfc0 [ 190.906041] ? _raw_spin_unlock+0x29/0x40 [ 190.910090] ? pfifo_fast_enqueue+0x24b/0x3e0 [ 190.914501] ? sch_direct_xmit+0x5d0/0x5d0 [ 190.918658] ? pfifo_fast_dequeue+0x670/0x670 [ 190.923047] ? __dev_queue_xmit+0x172/0x1770 [ 190.927365] ? preempt_count_sub+0xf/0xd0 [ 190.931421] __dev_queue_xmit+0x410/0x1770 [ 190.935553] ? ___slab_alloc+0x605/0x930 [ 190.939524] ? print_irqtrace_events+0x120/0x120 [ 190.944186] ? memcpy+0x34/0x50 [ 190.947364] ? netdev_pick_tx+0x1c0/0x1c0 [ 190.951428] ? __skb_clone+0x2fd/0x3d0 [ 190.955218] ? __copy_skb_header+0x270/0x270 [ 190.959537] ? rcu_read_lock_sched_held+0x93/0xa0 [ 190.964282] ? kmem_cache_alloc+0x344/0x4d0 [ 190.968520] ? cyc2ns_read_end+0x10/0x10 [ 190.972495] ? skb_clone+0x123/0x230 [ 190.976112] ? skb_split+0x820/0x820 [ 190.979747] ? tcf_mirred+0x554/0x930 [act_mirred] [ 190.984582] tcf_mirred+0x554/0x930 [act_mirred] [ 190.989252] ? tcf_mirred_act_wants_ingress.part.2+0x10/0x10 [act_mirred] [ 190.996109] ? __lock_acquire+0x706/0x26e0 [ 191.000239] ? sched_clock_cpu+0x18/0x210 [ 191.004294] tcf_action_exec+0xcf/0x2a0 [ 191.008179] tcf_classify+0xfa/0x340 [ 191.011794] __netif_receive_skb_core+0x8e1/0x1c60 [ 191.016630] ? debug_check_no_locks_freed+0x210/0x210 [ 191.021732] ? nf_ingress+0x500/0x500 [ 191.025458] ? process_backlog+0x347/0x4b0 [ 191.029619] ? 
print_irqtrace_events+0x120/0x120 [ 191.034302] ? lock_acquire+0xd8/0x320 [ 191.038089] ? process_backlog+0x1b6/0x4b0 [ 191.042246] ? process_backlog+0xc2/0x4b0 [ 191.046303] process_backlog+0xc2/0x4b0 [ 191.050189] net_rx_action+0x5cc/0x980 [ 191.053991] ? napi_complete_done+0x2c0/0x2c0 [ 191.058386] ? mark_lock+0x13d/0xb40 [ 191.062001] ? clockevents_program_event+0x6b/0x1d0 [ 191.066922] ? print_irqtrace_events+0x120/0x120 [ 191.071593] ? __lock_is_held+0xa0/0x160 [ 191.075566] __do_softirq+0x1d4/0x9d2 [ 191.079282] ? ip6_finish_output2+0x524/0x1460 [ 191.083771] do_softirq_own_stack+0x2a/0x40 [ 191.087994] </IRQ> [ 191.090130] do_softirq.part.13+0x38/0x40 [ 191.094178] __local_bh_enable_ip+0x135/0x190 [ 191.098591] ip6_finish_output2+0x54d/0x1460 [ 191.102916] ? ip6_forward_finish+0x2f0/0x2f0 [ 191.107314] ? ip6_mtu+0x3c/0x2c0 [ 191.110674] ? ip6_finish_output+0x2f8/0x650 [ 191.114992] ? ip6_output+0x12a/0x500 [ 191.118696] ip6_output+0x12a/0x500 [ 191.122223] ? ip6_route_dev_notify+0x5b0/0x5b0 [ 191.126807] ? ip6_finish_output+0x650/0x650 [ 191.131120] ? ip6_fragment+0x1a60/0x1a60 [ 191.135182] ? icmp6_dst_alloc+0x26e/0x470 [ 191.139317] mld_sendpack+0x672/0x830 [ 191.143021] ? igmp6_mcf_seq_next+0x2f0/0x2f0 [ 191.147429] ? __local_bh_enable_ip+0x77/0x190 [ 191.151913] ipv6_mc_dad_complete+0x47/0x90 [ 191.156144] addrconf_dad_completed+0x561/0x720 [ 191.160731] ? addrconf_rs_timer+0x3a0/0x3a0 [ 191.165036] ? mark_held_locks+0xc9/0x140 [ 191.169095] ? __local_bh_enable_ip+0x77/0x190 [ 191.173570] ? addrconf_dad_work+0x50d/0xa20 [ 191.177886] ? addrconf_dad_work+0x529/0xa20 [ 191.182194] addrconf_dad_work+0x529/0xa20 [ 191.186342] ? addrconf_dad_completed+0x720/0x720 [ 191.191088] ? __lock_is_held+0xa0/0x160 [ 191.195059] ? process_one_work+0x45d/0xe20 [ 191.199302] ? process_one_work+0x51e/0xe20 [ 191.203531] ? rcu_read_lock_sched_held+0x93/0xa0 [ 191.208279] process_one_work+0x51e/0xe20 [ 191.212340] ? pwq_dec_nr_in_flight+0x200/0x200 [ 191.216912] ? 
get_lock_stats+0x4b/0xf0 [ 191.220788] ? preempt_count_sub+0xf/0xd0 [ 191.224844] ? worker_thread+0x219/0x860 [ 191.228823] ? do_raw_spin_trylock+0x6d/0xa0 [ 191.233142] worker_thread+0xeb/0x860 [ 191.236848] ? process_one_work+0xe20/0xe20 [ 191.241095] kthread+0x206/0x300 [ 191.244352] ? process_one_work+0xe20/0xe20 [ 191.248587] ? kthread_stop+0x570/0x570 [ 191.252459] ret_from_fork+0x3a/0x50 [ 191.256082] Code: 14 3e ff 8b 4b 78 55 4d 89 f9 41 56 41 55 48 c7 c7 a0 cf db 82 41 54 44 8b 44 24 2c 48 8b 54 24 30 48 8b 74 24 20 e8 16 94 13 ff <0f> 0b 48 c7 c7 60 8e 1f 85 48 83 c4 20 e8 55 ef a6 ff 89 74 24 [ 191.275327] RIP: skb_panic+0xc3/0x100 RSP: ffff8801d54072f0 [ 191.281024] ---[ end trace 7ea51094e099e006 ]--- [ 191.285724] Kernel panic - not syncing: Fatal exception in interrupt [ 191.292168] Kernel Offset: disabled [ 191.295697] ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]---

Reproducer:

    ip link add h1 type veth peer name swp1
    ip link add h3 type veth peer name swp3
    ip link set dev h1 up
    ip address add 192.0.2.1/28 dev h1
    ip link add dev vh3 type vrf table 20
    ip link set dev h3 master vh3
    ip link set dev vh3 up
    ip link set dev h3 up
    ip link set dev swp3 up
    ip address add dev swp3 2001:db8:2::1/64
    ip link set dev swp1 up
    tc qdisc add dev swp1 clsact
    ip link add name gt6 type ip6erspan \
        local 2001:db8:2::1 remote 2001:db8:2::2 oseq okey 123
    ip link set dev gt6 up
    sleep 1
    tc filter add dev swp1 ingress pref 1000 matchall skip_hw \
        action mirred egress mirror dev gt6
    ping -I h1 192.0.2.2

Fixes: e41c7c68ea77 ("ip6erspan: make sure enough headroom at xmit.")
Signed-off-by: Petr Machata <petrm@mellanox.com>
Acked-by: William Tu <u9012063@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
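The fix described in the headroom message above ("requesting either needed_headroom if already primed, or just the bare minimum needed for the header otherwise") reduces to picking between two values. A minimal sketch of that selection, assuming "primed" means non-zero; the function name is hypothetical and the exact expression used in the patch is not shown in this log.

```c
/*
 * Hypothetical helper: choose how much headroom to request before
 * pushing a tunnel header. needed_headroom is 0 until it has been
 * primed, in which case fall back to the header length itself.
 */
unsigned int headroom_request(unsigned int needed_headroom,
                              unsigned int hdr_len)
{
    return needed_headroom ? needed_headroom : hdr_len;
}
```

With an unprimed device (needed_headroom of 0) and a 20-byte header, the request is 20 bytes rather than 0, which is the case that previously left the header push without any guaranteed room.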
* | | | | | net: ip6_gre: Request headroom in __gre6_xmit() (Petr Machata, 2018-05-17)

__gre6_xmit() pushes GRE headers before handing over to ip6_tnl_xmit() for generic IP-in-IP processing. However it doesn't make sure that there is enough headroom to push the header to. That can lead to the panic cited below. (Reproducer below that).

Fix by requesting either needed_headroom if already primed, or just the bare minimum needed for the header otherwise.

[ 158.576725] kernel BUG at net/core/skbuff.c:104! [ 158.581510] invalid opcode: 0000 [#1] PREEMPT SMP KASAN PTI [ 158.587174] Modules linked in: act_mirred cls_matchall ip6_gre ip6_tunnel tunnel6 gre sch_ingress vrf veth x86_pkg_temp_thermal mlx_platform nfsd e1000e leds_mlxcpld [ 158.602268] CPU: 1 PID: 16 Comm: ksoftirqd/1 Not tainted 4.17.0-rc4-net_master-custom-139 #10 [ 158.610938] Hardware name: Mellanox Technologies Ltd. "MSN2410-CB2F"/"SA000874", BIOS 4.6.5 03/08/2016 [ 158.620426] RIP: 0010:skb_panic+0xc3/0x100 [ 158.624586] RSP: 0018:ffff8801d3f27110 EFLAGS: 00010286 [ 158.629882] RAX: 0000000000000082 RBX: ffff8801c02cc040 RCX: 0000000000000000 [ 158.637127] RDX: 0000000000000082 RSI: dffffc0000000000 RDI: ffffed003a7e4e18 [ 158.644366] RBP: ffff8801bfec8020 R08: ffffed003aabce19 R09: ffffed003aabce19 [ 158.651574] R10: 000000000000000b R11: ffffed003aabce18 R12: ffff8801c364de66 [ 158.658786] R13: 000000000000002c R14: 00000000000000c0 R15: ffff8801c364de68 [ 158.666007] FS: 0000000000000000(0000) GS:ffff8801d5400000(0000) knlGS:0000000000000000 [ 158.674212] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 158.680036] CR2: 00007f4b3702dcd0 CR3: 0000000003228002 CR4: 00000000001606e0 [ 158.687228] Call Trace: [ 158.689752] ? __gre6_xmit+0x246/0xd80 [ip6_gre] [ 158.694475] ? __gre6_xmit+0x246/0xd80 [ip6_gre] [ 158.699141] skb_push+0x78/0x90 [ 158.702344] __gre6_xmit+0x246/0xd80 [ip6_gre] [ 158.706872] ip6gre_tunnel_xmit+0x3bc/0x610 [ip6_gre] [ 158.711992] ? __gre6_xmit+0xd80/0xd80 [ip6_gre] [ 158.716668] ? debug_check_no_locks_freed+0x210/0x210 [ 158.721761] ? print_irqtrace_events+0x120/0x120 [ 158.726461] ? sched_clock_cpu+0x18/0x210 [ 158.730572] ? sched_clock_cpu+0x18/0x210 [ 158.734692] ? cyc2ns_read_end+0x10/0x10 [ 158.738705] ? skb_network_protocol+0x76/0x200 [ 158.743216] ? netif_skb_features+0x1b2/0x550 [ 158.747648] dev_hard_start_xmit+0x137/0x770 [ 158.752010] sch_direct_xmit+0x2ef/0x5d0 [ 158.755992] ? 
pfifo_fast_dequeue+0x3fa/0x670 [ 158.760460] ? pfifo_fast_change_tx_queue_len+0x810/0x810 [ 158.765975] ? __lock_is_held+0xa0/0x160 [ 158.770002] __qdisc_run+0x39e/0xfc0 [ 158.773673] ? _raw_spin_unlock+0x29/0x40 [ 158.777781] ? pfifo_fast_enqueue+0x24b/0x3e0 [ 158.782191] ? sch_direct_xmit+0x5d0/0x5d0 [ 158.786372] ? pfifo_fast_dequeue+0x670/0x670 [ 158.790818] ? __dev_queue_xmit+0x172/0x1770 [ 158.795195] ? preempt_count_sub+0xf/0xd0 [ 158.799313] __dev_queue_xmit+0x410/0x1770 [ 158.803512] ? ___slab_alloc+0x605/0x930 [ 158.807525] ? ___slab_alloc+0x605/0x930 [ 158.811540] ? memcpy+0x34/0x50 [ 158.814768] ? netdev_pick_tx+0x1c0/0x1c0 [ 158.818895] ? __skb_clone+0x2fd/0x3d0 [ 158.822712] ? __copy_skb_header+0x270/0x270 [ 158.827079] ? rcu_read_lock_sched_held+0x93/0xa0 [ 158.831903] ? kmem_cache_alloc+0x344/0x4d0 [ 158.836199] ? skb_clone+0x123/0x230 [ 158.839869] ? skb_split+0x820/0x820 [ 158.843521] ? tcf_mirred+0x554/0x930 [act_mirred] [ 158.848407] tcf_mirred+0x554/0x930 [act_mirred] [ 158.853104] ? tcf_mirred_act_wants_ingress.part.2+0x10/0x10 [act_mirred] [ 158.860005] ? __lock_acquire+0x706/0x26e0 [ 158.864162] ? mark_lock+0x13d/0xb40 [ 158.867832] tcf_action_exec+0xcf/0x2a0 [ 158.871736] tcf_classify+0xfa/0x340 [ 158.875402] __netif_receive_skb_core+0x8e1/0x1c60 [ 158.880334] ? nf_ingress+0x500/0x500 [ 158.884059] ? process_backlog+0x347/0x4b0 [ 158.888241] ? lock_acquire+0xd8/0x320 [ 158.892050] ? process_backlog+0x1b6/0x4b0 [ 158.896228] ? process_backlog+0xc2/0x4b0 [ 158.900291] process_backlog+0xc2/0x4b0 [ 158.904210] net_rx_action+0x5cc/0x980 [ 158.908047] ? napi_complete_done+0x2c0/0x2c0 [ 158.912525] ? rcu_read_unlock+0x80/0x80 [ 158.916534] ? __lock_is_held+0x34/0x160 [ 158.920541] __do_softirq+0x1d4/0x9d2 [ 158.924308] ? trace_event_raw_event_irq_handler_exit+0x140/0x140 [ 158.930515] run_ksoftirqd+0x1d/0x40 [ 158.934152] smpboot_thread_fn+0x32b/0x690 [ 158.938299] ? sort_range+0x20/0x20 [ 158.941842] ? preempt_count_sub+0xf/0xd0 [ 158.945940] ? 
schedule+0x5b/0x140 [ 158.949412] kthread+0x206/0x300 [ 158.952689] ? sort_range+0x20/0x20 [ 158.956249] ? kthread_stop+0x570/0x570 [ 158.960164] ret_from_fork+0x3a/0x50 [ 158.963823] Code: 14 3e ff 8b 4b 78 55 4d 89 f9 41 56 41 55 48 c7 c7 a0 cf db 82 41 54 44 8b 44 24 2c 48 8b 54 24 30 48 8b 74 24 20 e8 16 94 13 ff <0f> 0b 48 c7 c7 60 8e 1f 85 48 83 c4 20 e8 55 ef a6 ff 89 74 24 [ 158.983235] RIP: skb_panic+0xc3/0x100 RSP: ffff8801d3f27110 [ 158.988935] ---[ end trace 5af56ee845aa6cc8 ]--- [ 158.993641] Kernel panic - not syncing: Fatal exception in interrupt [ 159.000176] Kernel Offset: disabled [ 159.003767] ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]---

Reproducer:

    ip link add h1 type veth peer name swp1
    ip link add h3 type veth peer name swp3
    ip link set dev h1 up
    ip address add 192.0.2.1/28 dev h1
    ip link add dev vh3 type vrf table 20
    ip link set dev h3 master vh3
    ip link set dev vh3 up
    ip link set dev h3 up
    ip link set dev swp3 up
    ip address add dev swp3 2001:db8:2::1/64
    ip link set dev swp1 up
    tc qdisc add dev swp1 clsact
    ip link add name gt6 type ip6gretap \
        local 2001:db8:2::1 remote 2001:db8:2::2
    ip link set dev gt6 up
    sleep 1
    tc filter add dev swp1 ingress pref 1000 matchall skip_hw \
        action mirred egress mirror dev gt6
    ping -I h1 192.0.2.2

Fixes: c12b395a4664 ("gre: Support GRE over IPv6")
Signed-off-by: Petr Machata <petrm@mellanox.com>
Acked-by: William Tu <u9012063@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
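The headroom problem described in this commit can be modelled in user space. What follows is only a minimal sketch under stated assumptions, not the kernel patch: `toy_buf`, `toy_headroom()` and `toy_push()` are hypothetical names made up for illustration, and the buffer is a plain array rather than a struct sk_buff. It shows why the available headroom has to be checked (or requested) before a header is pushed in front of the packet data.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-space model of a packet buffer (not struct sk_buff). */
struct toy_buf {
	unsigned char storage[128];
	size_t data_off;	/* offset of the first packet byte == free headroom */
	size_t len;		/* bytes of packet data starting at data_off */
};

/* Bytes available in front of the packet data. */
static size_t toy_headroom(const struct toy_buf *b)
{
	return b->data_off;
}

/*
 * Prepend hdr_len bytes of header space.
 * Returns 0 on success, -1 when the headroom is too small. Pushing without
 * such a check corresponds to the situation that ends in skb_panic() in the
 * trace above.
 */
static int toy_push(struct toy_buf *b, size_t hdr_len)
{
	assert(b->data_off + b->len <= sizeof(b->storage));
	if (toy_headroom(b) < hdr_len)
		return -1;
	b->data_off -= hdr_len;
	b->len += hdr_len;
	return 0;
}
```

In this model a caller that gets -1 would have to obtain a buffer with more headroom before retrying, which is the role the commit message assigns to requesting needed_headroom or the bare minimum for the header.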
* | | | | | erspan: fix invalid erspan version.William Tu2018-05-17
ERSPAN only supports versions 1 and 2. When packets are sent to an erspan device that does not have a proper version number set, drop the packet. In a real case, we observed multicast packets sent to the erspan pernet device, erspan0, which does not have an erspan version configured.

Reported-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: William Tu <u9012063@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | | | Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nfDavid S. Miller2018-05-13
Pablo Neira Ayuso says:

====================
Netfilter/IPVS fixes for net

The following patchset contains Netfilter/IPVS fixes for your net tree, they are:

1) Fix handling of simultaneous open TCP connection in conntrack, from Jozsef Kadlecsik.
2) Insufficient sanity check of xtables extension names, from Florian Westphal.
3) Skip unnecessary synchronize_rcu() call when transaction log is already empty, from Florian Westphal.
4) Incorrect destination mac validation in ebt_stp, from Stephen Hemminger.
5) xtables module reference counter leak in nft_compat, from Florian Westphal.
6) Incorrect connection reference counting logic in IPVS one-packet scheduler, from Julian Anastasov.
7) Wrong stats for 32-bit CPUs in IPVS, also from Julian.
8) Calm down sparse error in netfilter core, also from Florian.
9) Use nla_strlcpy to fix compilation warning in nfnetlink_acct and nfnetlink_cthelper, again from Florian.
10) Missing module alias in icmp and icmp6 xtables extensions, from Florian Westphal.
11) Base chain statistics in nf_tables may be unset/null, from Florian.
12) Fix handling of large matchinfo size in nft_compat; this includes one preparation patch before this fix. From Florian.
13) Fix bogus EBUSY error when deleting chains due to incorrect reference counting from the preparation phase of the two-phase commit protocol.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | netfilter: x_tables: add module alias for icmp matchesFlorian Westphal2018-05-08
The icmp matches are implemented in ip_tables and ip6_tables, respectively, so for normal iptables they are always available: those modules are loaded once iptables calls getsockopt() to fetch available module revisions. In the iptables-over-nftables case probing occurs via nfnetlink, so these modules might not be loaded. Add aliases so modprobe can load these when icmp/icmp6 is requested.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* | | | | udp: fix SO_BINDTODEVICEPaolo Abeni2018-05-10
Damir reported a breakage of SO_BINDTODEVICE for UDP sockets. In the absence of VRF devices, after commit fb74c27735f0 ("net: ipv4: add second dif to udp socket lookups") a dif mismatch is no longer fatal for a UDP socket lookup with a non-zero sk_bound_dev_if, which breaks SO_BINDTODEVICE semantics. This changeset addresses the issue by making the dif match mandatory again in the above scenario.

Reported-by: Damir Mansurov <dnman@oktetlabs.ru>
Fixes: fb74c27735f0 ("net: ipv4: add second dif to udp socket lookups")
Fixes: 1801b570dd2a ("net: ipv6: add second dif to udp socket lookups")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
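The rule this commit restores, that a device mismatch must disqualify a socket bound with SO_BINDTODEVICE, can be sketched as a small scoring function. This is an illustrative user-space model and not the kernel's lookup code: `toy_sock`, `toy_dev_score()` and the score values 4 and 0 are assumptions chosen for the example, and `sdif` is treated simply as a second interface index that may also match.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified socket: only the bound device index matters here. */
struct toy_sock {
	int bound_dev_if;	/* 0 means "not bound to a device" */
};

/*
 * Returns a score >= 0 when the socket may receive a packet that arrived on
 * interface dif (sdif is a second candidate index, 0 if there is none),
 * or -1 when the socket must be skipped.
 */
static int toy_dev_score(const struct toy_sock *sk, int dif, int sdif)
{
	assert(sk);
	if (sk->bound_dev_if) {
		bool dev_match = sk->bound_dev_if == dif ||
				 (sdif && sk->bound_dev_if == sdif);

		if (!dev_match)
			return -1;	/* mismatch is fatal for a bound socket */
		return 4;		/* arbitrary bonus: bound sockets preferred */
	}
	return 0;			/* unbound socket: any device is acceptable */
}
```

With this rule a socket bound to interface 3 is never selected for a packet that arrived on interface 5 alone, while an unbound socket still matches any interface.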
* | | | trivial: fix inconsistent help textsGeorg Hofmann2018-05-08
This patch removes "experimental" from the help text where the dependency on CONFIG_EXPERIMENTAL was already removed.

Signed-off-by: Georg Hofmann <georg@hofmannsweb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | Merge branch 'master' of ↵David S. Miller2018-05-07
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec

Steffen Klassert says:

====================
pull request (net): ipsec 2018-05-07

1) Always verify length of provided sadb_key to fix a slab-out-of-bounds read in pfkey_add. From Kevin Easton.
2) Make sure that all states are really deleted before we check that the state lists are empty. Otherwise we trigger a warning.
3) Fix MTU handling of the VTI6 interfaces on interfamily tunnels. From Stefano Brivio.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | vti6: Change minimum MTU to IPV4_MIN_MTU, vti6 can carry IPv4 tooStefano Brivio2018-04-27
A vti6 interface can carry IPv4 as well, so it makes no sense to enforce a minimum MTU of IPV6_MIN_MTU. If the user sets an MTU below IPV6_MIN_MTU, IPv6 will be disabled on the interface, courtesy of addrconf_notify().

Reported-by: Xin Long <lucien.xin@gmail.com>
Fixes: b96f9afee4eb ("ipv4/6: use core net MTU range checking")
Fixes: c6741fbed6dc ("vti6: Properly adjust vti6 MTU from MTU of lower device")
Fixes: 53c81e95df17 ("ip6_vti: adjust vti mtu according to mtu of lower device")
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
| * | | xfrm: Fix warning in xfrm6_tunnel_net_exit.Steffen Klassert2018-04-16
We need to make sure that all states are really deleted before we check that the state lists are empty. Otherwise we trigger a warning.

Fixes: baeb0dbbb5659 ("xfrm6_tunnel: exit_net cleanup check added")
Reported-and-tested-by: syzbot+777bf170a89e7b326405@syzkaller.appspotmail.com
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
* | | ipv6: Revert "ipv6: Allow non-gateway ECMP for IPv6"Ido Schimmel2018-05-02
This reverts commit edd7ceb78296 ("ipv6: Allow non-gateway ECMP for IPv6").

Eric reported a division by zero in rt6_multipath_rebalance() which is caused by the above commit, which considers identical local routes to be siblings. The division by zero happens because a nexthop weight is not set for local routes. Revert the commit as it does not fix a bug and has side effects.

To reproduce:

    # ip -6 address add 2001:db8::1/64 dev dummy0
    # ip -6 address add 2001:db8::1/64 dev dummy1

Fixes: edd7ceb78296 ("ipv6: Allow non-gateway ECMP for IPv6")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
Tested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
* | | ipv6: Allow non-gateway ECMP for IPv6Thomas Winter2018-05-01
It is valid to have static routes where the nexthop is an interface rather than an address, as with tunnels. For IPv4 it was possible to use ECMP on these routes, but not for IPv6.

Signed-off-by: Thomas Winter <Thomas.Winter@alliedtelesis.co.nz>
Cc: David Ahern <dsahern@gmail.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Acked-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
* | | ipv6: fix uninit-value in ip6_multipath_l3_keys()Eric Dumazet2018-05-01
| |/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | syzbot/KMSAN reported an uninit-value in ip6_multipath_l3_keys(), root caused to a bad assumption of ICMP header being already pulled in skb->head ip_multipath_l3_keys() does the correct thing, so it is an IPv6 only bug. BUG: KMSAN: uninit-value in ip6_multipath_l3_keys net/ipv6/route.c:1830 [inline] BUG: KMSAN: uninit-value in rt6_multipath_hash+0x5c4/0x640 net/ipv6/route.c:1858 CPU: 0 PID: 4507 Comm: syz-executor661 Not tainted 4.16.0+ #87 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:17 [inline] dump_stack+0x185/0x1d0 lib/dump_stack.c:53 kmsan_report+0x142/0x240 mm/kmsan/kmsan.c:1067 __msan_warning_32+0x6c/0xb0 mm/kmsan/kmsan_instr.c:683 ip6_multipath_l3_keys net/ipv6/route.c:1830 [inline] rt6_multipath_hash+0x5c4/0x640 net/ipv6/route.c:1858 ip6_route_input+0x65a/0x920 net/ipv6/route.c:1884 ip6_rcv_finish+0x413/0x6e0 net/ipv6/ip6_input.c:69 NF_HOOK include/linux/netfilter.h:288 [inline] ipv6_rcv+0x1e16/0x2340 net/ipv6/ip6_input.c:208 __netif_receive_skb_core+0x47df/0x4a90 net/core/dev.c:4562 __netif_receive_skb net/core/dev.c:4627 [inline] netif_receive_skb_internal+0x49d/0x630 net/core/dev.c:4701 netif_receive_skb+0x230/0x240 net/core/dev.c:4725 tun_rx_batched drivers/net/tun.c:1555 [inline] tun_get_user+0x740f/0x7c60 drivers/net/tun.c:1962 tun_chr_write_iter+0x1d4/0x330 drivers/net/tun.c:1990 call_write_iter include/linux/fs.h:1782 [inline] new_sync_write fs/read_write.c:469 [inline] __vfs_write+0x7fb/0x9f0 fs/read_write.c:482 vfs_write+0x463/0x8d0 fs/read_write.c:544 SYSC_write+0x172/0x360 fs/read_write.c:589 SyS_write+0x55/0x80 fs/read_write.c:581 do_syscall_64+0x309/0x430 arch/x86/entry/common.c:287 entry_SYSCALL_64_after_hwframe+0x3d/0xa2 Fixes: 23aebdacb05d ("ipv6: Compute multipath hash 
for ICMP errors from offending packet") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Cc: Jakub Sitnicki <jkbs@redhat.com> Acked-by: Jakub Sitnicki <jkbs@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nfDavid S. Miller2018-04-23
Pablo Neira Ayuso says:

====================
Netfilter/IPVS fixes for net

The following patchset contains Netfilter/IPVS fixes for your net tree, they are:

1) Fix SIP conntrack with phones sending session descriptions for different media types but same port numbers, from Florian Westphal.
2) Fix incorrect rtnl_lock mutex logic from IPVS sync thread, from Julian Anastasov.
3) Skip compat array allocation in ebtables if there are no entries, also from Florian.
4) Do not lose left/right bits when shifting marks from xt_connmark, from Jack Ma.
5) Silence false positive memleak in conntrack extensions, from Cong Wang.
6) Fix CONFIG_NF_REJECT_IPV6=m link problems, from Arnd Bergmann.
7) Cannot kfree rule that is already in list in nf_tables, switch order so this error handling is not required, from Florian Westphal.
8) Release set name in error path, from Florian.
9) Include kmemleak.h in nf_conntrack_extend.c, from Stephen Rothwell.
10) NAT chain and extensions depend on NF_TABLES.
11) Out of bound access when renaming chains, from Taehee Yoo.
12) Incorrect casting in xt_connmark leads to wrong bitshifting.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
| * | netfilter: nf_tables: NAT chain and extensions require NF_TABLESPablo Neira Ayuso2018-04-19
Move these options inside the scope of the 'if' NF_TABLES and NF_TABLES_IPV6 dependencies. This patch fixes:

   net/ipv6/netfilter/nft_chain_nat_ipv6.o: In function `nft_nat_do_chain':
>> net/ipv6/netfilter/nft_chain_nat_ipv6.c:37: undefined reference to `nft_do_chain'
   net/ipv6/netfilter/nft_chain_nat_ipv6.o: In function `nft_chain_nat_ipv6_exit':
>> net/ipv6/netfilter/nft_chain_nat_ipv6.c:94: undefined reference to `nft_unregister_chain_type'
   net/ipv6/netfilter/nft_chain_nat_ipv6.o: In function `nft_chain_nat_ipv6_init':
>> net/ipv6/netfilter/nft_chain_nat_ipv6.c:87: undefined reference to `nft_register_chain_type'

that happens with:

CONFIG_NF_TABLES=m
CONFIG_NFT_CHAIN_NAT_IPV6=y

Fixes: 02c7b25e5f54 ("netfilter: nf_tables: build-in filter chain type")
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* | ipv6: add RTA_TABLE and RTA_PREFSRC to rtm_ipv6_policyEric Dumazet2018-04-23
KMSAN reported a use of an uninit-value that I tracked to the lack of a proper size check on the RTA_TABLE attribute. I also believe RTA_PREFSRC lacks a similar check.

Fixes: 86872cb57925 ("[IPv6] route: FIB6 configuration using struct fib6_config")
Fixes: c3968a857a6b ("ipv6: RTA_PREFSRC support for ipv6 route source address selection")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Acked-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
* | ipv6: sr: fix NULL pointer dereference in seg6_do_srh_encap()- v4 pktsAhmed Abdelsalam2018-04-22
|/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In case of seg6 in encap mode, seg6_do_srh_encap() calls set_tun_src() in order to set the src addr of outer IPv6 header. The net_device is required for set_tun_src(). However calling ip6_dst_idev() on dst_entry in case of IPv4 traffic results on the following bug. Using just dst->dev should fix this BUG. [ 196.242461] BUG: unable to handle kernel NULL pointer dereference at 0000000000000000 [ 196.242975] PGD 800000010f076067 P4D 800000010f076067 PUD 10f060067 PMD 0 [ 196.243329] Oops: 0000 [#1] SMP PTI [ 196.243468] Modules linked in: nfsd auth_rpcgss nfs_acl nfs lockd grace fscache sunrpc crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel aes_x86_64 crypto_simd cryptd input_leds glue_helper led_class pcspkr serio_raw mac_hid video autofs4 hid_generic usbhid hid e1000 i2c_piix4 ahci pata_acpi libahci [ 196.244362] CPU: 2 PID: 1089 Comm: ping Not tainted 4.16.0+ #1 [ 196.244606] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006 [ 196.244968] RIP: 0010:seg6_do_srh_encap+0x1ac/0x300 [ 196.245236] RSP: 0018:ffffb2ce00b23a60 EFLAGS: 00010202 [ 196.245464] RAX: 0000000000000000 RBX: ffff8c7f53eea300 RCX: 0000000000000000 [ 196.245742] RDX: 0000f10000000000 RSI: ffff8c7f52085a6c RDI: ffff8c7f41166850 [ 196.246018] RBP: ffffb2ce00b23aa8 R08: 00000000000261e0 R09: ffff8c7f41166800 [ 196.246294] R10: ffffdce5040ac780 R11: ffff8c7f41166828 R12: ffff8c7f41166808 [ 196.246570] R13: ffff8c7f52085a44 R14: ffffffffb73211c0 R15: ffff8c7e69e44200 [ 196.246846] FS: 00007fc448789700(0000) GS:ffff8c7f59d00000(0000) knlGS:0000000000000000 [ 196.247286] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 196.247526] CR2: 0000000000000000 CR3: 000000010f05a000 CR4: 00000000000406e0 [ 196.247804] Call Trace: [ 196.247972] seg6_do_srh+0x15b/0x1c0 [ 196.248156] seg6_output+0x3c/0x220 [ 196.248341] ? 
prandom_u32+0x14/0x20 [ 196.248526] ? ip_idents_reserve+0x6c/0x80 [ 196.248723] ? __ip_select_ident+0x90/0x100 [ 196.248923] ? ip_append_data.part.50+0x6c/0xd0 [ 196.249133] lwtunnel_output+0x44/0x70 [ 196.249328] ip_send_skb+0x15/0x40 [ 196.249515] raw_sendmsg+0x8c3/0xac0 [ 196.249701] ? _copy_from_user+0x2e/0x60 [ 196.249897] ? rw_copy_check_uvector+0x53/0x110 [ 196.250106] ? _copy_from_user+0x2e/0x60 [ 196.250299] ? copy_msghdr_from_user+0xce/0x140 [ 196.250508] sock_sendmsg+0x36/0x40 [ 196.250690] ___sys_sendmsg+0x292/0x2a0 [ 196.250881] ? _cond_resched+0x15/0x30 [ 196.251074] ? copy_termios+0x1e/0x70 [ 196.251261] ? _copy_to_user+0x22/0x30 [ 196.251575] ? tty_mode_ioctl+0x1c3/0x4e0 [ 196.251782] ? _cond_resched+0x15/0x30 [ 196.251972] ? mutex_lock+0xe/0x30 [ 196.252152] ? vvar_fault+0xd2/0x110 [ 196.252337] ? __do_fault+0x1f/0xc0 [ 196.252521] ? __handle_mm_fault+0xc1f/0x12d0 [ 196.252727] ? __sys_sendmsg+0x63/0xa0 [ 196.252919] __sys_sendmsg+0x63/0xa0 [ 196.253107] do_syscall_64+0x72/0x200 [ 196.253305] entry_SYSCALL_64_after_hwframe+0x3d/0xa2 [ 196.253530] RIP: 0033:0x7fc4480b0690 [ 196.253715] RSP: 002b:00007ffde9f252f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e [ 196.254053] RAX: ffffffffffffffda RBX: 0000000000000040 RCX: 00007fc4480b0690 [ 196.254331] RDX: 0000000000000000 RSI: 000000000060a360 RDI: 0000000000000003 [ 196.254608] RBP: 00007ffde9f253f0 R08: 00000000002d1e81 R09: 0000000000000002 [ 196.254884] R10: 00007ffde9f250c0 R11: 0000000000000246 R12: 0000000000b22070 [ 196.255205] R13: 20c49ba5e353f7cf R14: 431bde82d7b634db R15: 00007ffde9f278fe [ 196.255484] Code: a5 0f b6 45 c0 41 88 41 28 41 0f b6 41 2c 48 c1 e0 04 49 8b 54 01 38 49 8b 44 01 30 49 89 51 20 49 89 41 18 48 8b 83 b0 00 00 00 <48> 8b 30 49 8b 86 08 0b 00 00 48 8b 40 20 48 8b 50 08 48 0b 10 [ 196.256190] RIP: seg6_do_srh_encap+0x1ac/0x300 RSP: ffffb2ce00b23a60 [ 196.256445] CR2: 0000000000000000 [ 196.256676] ---[ end trace 71af7d093603885c ]--- Fixes: 8936ef7604c11 ("ipv6: sr: fix 
NULL pointer dereference when setting encap source address") Signed-off-by: Ahmed Abdelsalam <amsalam20@gmail.com> Acked-by: David Lebrun <dlebrun@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net/ipv6: Increment OUTxxx counters after netfilter hookJeff Barnhill2018-04-05
At the end of ip6_forward(), IPSTATS_MIB_OUTFORWDATAGRAMS and IPSTATS_MIB_OUTOCTETS are incremented immediately before the NF_HOOK call for NFPROTO_IPV6 / NF_INET_FORWARD. As a result, these counters get incremented regardless of whether or not the netfilter hook allows the packet to continue being processed. This change increments the counters in ip6_forward_finish() so that it will not happen if the netfilter hook chooses to terminate the packet, which is similar to how IPv4 works.

Signed-off-by: Jeff Barnhill <0xeffeff@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
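The ordering issue in this commit can be illustrated with a small user-space sketch. It is not the kernel code: `toy_stats`, `toy_forward()`, `toy_forward_finish()` and the hook type are hypothetical names, and the hook is reduced to a function returning accept or drop. The point it shows is that counters updated in the "finish" step are only touched for packets the hook let through.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the two MIB counters named in the commit. */
struct toy_stats {
	unsigned long out_forw_datagrams;
	unsigned long out_octets;
};

/* A hook returns true to let the packet continue, false to drop it. */
typedef bool (*toy_hook_fn)(size_t pkt_len);

/* Runs only for packets the hook accepted, so counting here is accurate. */
static void toy_forward_finish(struct toy_stats *st, size_t pkt_len)
{
	st->out_forw_datagrams++;
	st->out_octets += pkt_len;
}

/* Returns true if the packet was forwarded. */
static bool toy_forward(struct toy_stats *st, toy_hook_fn hook, size_t pkt_len)
{
	assert(st && hook);
	if (!hook(pkt_len))
		return false;	/* dropped by the hook: counters stay untouched */
	toy_forward_finish(st, pkt_len);
	return true;
}

/* Two trivial example hooks. */
static bool toy_accept(size_t pkt_len) { (void)pkt_len; return true; }
static bool toy_drop(size_t pkt_len) { (void)pkt_len; return false; }
```

Had the increments been placed in `toy_forward()` before the hook call, a dropped packet would still have been counted, which is the behaviour the commit removes.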
* vti6: better validate user provided tunnel namesEric Dumazet2018-04-05
Use dev_valid_name() to make sure the user does not provide an illegal device name.

Fixes: ed1efb2aefbb ("ipv6: Add support for IPsec virtual tunnel interfaces")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
* ip6_tunnel: better validate user provided tunnel namesEric Dumazet2018-04-05
Use dev_valid_name() to make sure the user does not provide an illegal device name.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
* ip6_gre: better validate user provided tunnel namesEric Dumazet2018-04-05
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Use dev_valid_name() to make sure user does not provide illegal device name. syzbot caught the following bug : BUG: KASAN: stack-out-of-bounds in strlcpy include/linux/string.h:300 [inline] BUG: KASAN: stack-out-of-bounds in ip6gre_tunnel_locate+0x334/0x860 net/ipv6/ip6_gre.c:339 Write of size 20 at addr ffff8801afb9f7b8 by task syzkaller851048/4466 CPU: 1 PID: 4466 Comm: syzkaller851048 Not tainted 4.16.0+ #1 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:17 [inline] dump_stack+0x1b9/0x29f lib/dump_stack.c:53 print_address_description+0x6c/0x20b mm/kasan/report.c:256 kasan_report_error mm/kasan/report.c:354 [inline] kasan_report.cold.7+0xac/0x2f5 mm/kasan/report.c:412 check_memory_region_inline mm/kasan/kasan.c:260 [inline] check_memory_region+0x13e/0x1b0 mm/kasan/kasan.c:267 memcpy+0x37/0x50 mm/kasan/kasan.c:303 strlcpy include/linux/string.h:300 [inline] ip6gre_tunnel_locate+0x334/0x860 net/ipv6/ip6_gre.c:339 ip6gre_tunnel_ioctl+0x69d/0x12e0 net/ipv6/ip6_gre.c:1195 dev_ifsioc+0x43e/0xb90 net/core/dev_ioctl.c:334 dev_ioctl+0x69a/0xcc0 net/core/dev_ioctl.c:525 sock_ioctl+0x47e/0x680 net/socket.c:1015 vfs_ioctl fs/ioctl.c:46 [inline] file_ioctl fs/ioctl.c:500 [inline] do_vfs_ioctl+0x1cf/0x1650 fs/ioctl.c:684 ksys_ioctl+0xa9/0xd0 fs/ioctl.c:701 SYSC_ioctl fs/ioctl.c:708 [inline] SyS_ioctl+0x24/0x30 fs/ioctl.c:706 do_syscall_64+0x29e/0x9d0 arch/x86/entry/common.c:287 entry_SYSCALL_64_after_hwframe+0x42/0xb7 Fixes: c12b395a4664 ("gre: Support GRE over IPv6") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* ipv6: sit: better validate user provided tunnel namesEric Dumazet2018-04-05
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Use dev_valid_name() to make sure user does not provide illegal device name. syzbot caught the following bug : BUG: KASAN: stack-out-of-bounds in strlcpy include/linux/string.h:300 [inline] BUG: KASAN: stack-out-of-bounds in ipip6_tunnel_locate+0x63b/0xaa0 net/ipv6/sit.c:254 Write of size 33 at addr ffff8801b64076d8 by task syzkaller932654/4453 CPU: 0 PID: 4453 Comm: syzkaller932654 Not tainted 4.16.0+ #1 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:17 [inline] dump_stack+0x1b9/0x29f lib/dump_stack.c:53 print_address_description+0x6c/0x20b mm/kasan/report.c:256 kasan_report_error mm/kasan/report.c:354 [inline] kasan_report.cold.7+0xac/0x2f5 mm/kasan/report.c:412 check_memory_region_inline mm/kasan/kasan.c:260 [inline] check_memory_region+0x13e/0x1b0 mm/kasan/kasan.c:267 memcpy+0x37/0x50 mm/kasan/kasan.c:303 strlcpy include/linux/string.h:300 [inline] ipip6_tunnel_locate+0x63b/0xaa0 net/ipv6/sit.c:254 ipip6_tunnel_ioctl+0xe71/0x241b net/ipv6/sit.c:1221 dev_ifsioc+0x43e/0xb90 net/core/dev_ioctl.c:334 dev_ioctl+0x69a/0xcc0 net/core/dev_ioctl.c:525 sock_ioctl+0x47e/0x680 net/socket.c:1015 vfs_ioctl fs/ioctl.c:46 [inline] file_ioctl fs/ioctl.c:500 [inline] do_vfs_ioctl+0x1cf/0x1650 fs/ioctl.c:684 ksys_ioctl+0xa9/0xd0 fs/ioctl.c:701 SYSC_ioctl fs/ioctl.c:708 [inline] SyS_ioctl+0x24/0x30 fs/ioctl.c:706 do_syscall_64+0x29e/0x9d0 arch/x86/entry/common.c:287 entry_SYSCALL_64_after_hwframe+0x42/0xb7 Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David S. Miller <davem@davemloft.net>
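The four tunnel-name commits above share one idea: reject a user-supplied name before it is copied into a fixed-size buffer. The following is a user-space sketch of such a check, written under assumptions: `toy_valid_name()` and `TOY_IFNAMSIZ` are hypothetical names, the size 16 is modelled on the usual interface-name limit, and the exact rule set is an approximation rather than a copy of the kernel's dev_valid_name().

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TOY_IFNAMSIZ 16	/* assumed buffer size, including the trailing NUL */

/*
 * User-space sketch of a device-name check. Rejects names that are empty,
 * do not fit in TOY_IFNAMSIZ bytes, are "." or "..", or contain '/', ':'
 * or whitespace.
 */
static bool toy_valid_name(const char *name)
{
	size_t i;

	assert(name);
	if (name[0] == '\0')
		return false;
	if (!strcmp(name, ".") || !strcmp(name, ".."))
		return false;
	for (i = 0; name[i] != '\0'; i++) {
		if (i >= TOY_IFNAMSIZ - 1)
			return false;	/* would overflow a TOY_IFNAMSIZ buffer */
		if (name[i] == '/' || name[i] == ':' ||
		    isspace((unsigned char)name[i]))
			return false;
	}
	return true;
}
```

A caller would run this check first and only then copy the name into its TOY_IFNAMSIZ-byte buffer, so an over-long name is refused instead of being written past the end of a stack buffer as in the KASAN reports above.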
* inet: frags: fix ip6frag_low_thresh boundaryEric Dumazet2018-04-04
Giving an integer to proc_doulongvec_minmax() is dangerous on 64-bit arches, since the linker might place a non-zero value next to it, preventing a change to ip6frag_low_thresh. ip6frag_low_thresh is not used anymore in the kernel, but we do not want to prematurely break user scripts wanting to change it. Since specifying a minimal value of 0 for proc_doulongvec_minmax() is moot, let's remove these zero values in all defrag units.

Fixes: 6e00f7dd5e4e ("ipv6: frags: fix /proc/sys/net/ipv6/ip6frag_low_thresh")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Maciej Żenczykowski <maze@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
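Why an int bound is hazardous for a handler that expects unsigned long can be shown with a small, well-defined user-space model. This is a sketch under assumptions: `toy_layout` and `toy_read_min_as_ulong()` are hypothetical, and the struct merely imitates "an int zero followed by whatever the linker put next to it". On a platform where unsigned long is twice the size of int, the wider read also picks up the neighbouring value.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical layout: an int "zero" bound followed by unrelated data. */
struct toy_layout {
	int min_zero;	/* intended lower bound: 0 */
	int neighbour;	/* whatever happens to be placed next to it */
};

/*
 * Read the bound the way a handler expecting unsigned long would: copy
 * sizeof(unsigned long) bytes starting at the address of the int. The copy
 * is clamped to the struct size so the model itself stays well defined.
 */
static unsigned long toy_read_min_as_ulong(const struct toy_layout *l)
{
	unsigned long v = 0;
	size_t n = sizeof(v) < sizeof(*l) ? sizeof(v) : sizeof(*l);

	assert(l);
	memcpy(&v, l, n);
	return v;
}
```

When unsigned long and int have the same size the function returns 0 as intended; when unsigned long is wider, a non-zero neighbour turns the "minimum of 0" into some large value, which matches the failure mode the commit message describes.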
* net: avoid unneeded atomic operation in ip*_append_data()Paolo Abeni2018-04-04
After commit 694aba690de0 ("ipv4: factorize sk_wmem_alloc updates done by __ip_append_data()") and commit 1f4c6eb24029 ("ipv6: factorize sk_wmem_alloc updates done by __ip6_append_data()"), when transmitting a sub-MTU datagram, an additional, unneeded atomic operation is performed in ip*_append_data() to update wmem_alloc: in the above condition the delta is 0. This causes a small but measurable performance regression in the UDP xmit tput test with packet sizes below the MTU. This change avoids such overhead by updating wmem_alloc only if wmem_alloc_delta is non-zero. The error path is intentionally left unmodified: it's a slow path and simplicity is preferred over performance.

Fixes: 694aba690de0 ("ipv4: factorize sk_wmem_alloc updates done by __ip_append_data()")
Fixes: 1f4c6eb24029 ("ipv6: factorize sk_wmem_alloc updates done by __ip6_append_data()")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
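The optimisation in this commit amounts to guarding an atomic update with a cheap test on the delta. Below is a minimal C11 sketch of that pattern, not the kernel implementation: `toy_sock_mem` and `toy_commit_wmem()` are hypothetical names, C11 `atomic_int` stands in for the kernel's own counter type, and the `atomic_ops` field exists only so the example can show that the atomic is skipped.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-in for a socket's write-memory accounting. */
struct toy_sock_mem {
	atomic_int wmem_alloc;
	int atomic_ops;		/* counts how often the atomic was touched */
};

/* Apply the accumulated delta, skipping the atomic operation when it is 0. */
static void toy_commit_wmem(struct toy_sock_mem *m, int wmem_alloc_delta)
{
	assert(m);
	if (wmem_alloc_delta) {
		atomic_fetch_add(&m->wmem_alloc, wmem_alloc_delta);
		m->atomic_ops++;
	}
}
```

For a datagram that fits in one buffer the delta is 0 and the function returns without touching the shared counter, which is the saving the commit message measures.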
* ipv6: udp: set dst cache for a connected sk if current not validAlexey Kodanev2018-04-04
A new RTF_CACHE route can be created between the ip6_sk_dst_lookup_flow() and ip6_dst_store() calls in udpv6_sendmsg(), when sending a datagram results in an ICMPV6_PKT_TOOBIG error:

    udp_v6_send_skb(), for example with vti6 tunnel:
        vti6_xmit(), get ICMPV6_PKT_TOOBIG error
            skb_dst_update_pmtu(), can create a RTF_CACHE clone
            icmpv6_send()
    ...
    udpv6_err()
        ip6_sk_update_pmtu()
            ip6_update_pmtu(), can create a RTF_CACHE clone
            ...
            ip6_datagram_dst_update()
                ip6_dst_store()

And after commit 33c162a980fe ("ipv6: datagram: Update dst cache of a connected datagram sk during pmtu update"), the UDPv6 error handler can update the socket's dst cache, but it can happen before the update at the end of udpv6_sendmsg(), preventing the next udpv6_sendmsg() calls from getting the new dst cache.

In order to fix it, save dst in a connected socket only if the current socket's dst cache is invalid. The previous patch prepared ip6_sk_dst_lookup_flow() to do that with the new argument, and this patch enables it in udpv6_sendmsg().

Fixes: 33c162a980fe ("ipv6: datagram: Update dst cache of a connected datagram sk during pmtu update")
Fixes: 45e4fd26683c ("ipv6: Only create RTF_CACHE routes after encountering pmtu exception")
Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
* ipv6: udp: convert 'connected' to bool type in udpv6_sendmsg() (Alexey Kodanev, 2018-04-04)

This makes it consistent with ip6_sk_dst_lookup_flow(), which accepts the new 'connected' parameter of type bool.

Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
* ipv6: allow to cache dst for a connected sk in ip6_sk_dst_lookup_flow() (Alexey Kodanev, 2018-04-04)

Add a 'connected' parameter to ip6_sk_dst_lookup_flow() and update the cache only if ip6_sk_dst_check() returns NULL and the socket is connected.

The function is used as before; the new behavior for UDP sockets in udpv6_sendmsg() will be enabled in the next patch.

Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
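The rule stated above (store the looked-up dst only when the cached one fails validation and the socket is connected) can be modelled with a small sketch. All types and function names here are hypothetical simplifications, not the kernel's structures; `fresh` stands for the result of a new route lookup.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct dst_sketch { int id; };              /* hypothetical route entry */

struct sk_cache_sketch {
    struct dst_sketch *dst_cache;           /* cached route, may be stale */
    bool cache_valid;                       /* models the validity check */
};

/* Models ip6_sk_dst_check(): the cached dst if still valid, else NULL. */
static struct dst_sketch *dst_check_sketch(struct sk_cache_sketch *sk)
{
    return sk->cache_valid ? sk->dst_cache : NULL;
}

/* Return the dst to use, caching the fresh one only when allowed. */
static struct dst_sketch *dst_lookup_flow_sketch(struct sk_cache_sketch *sk,
                                                 struct dst_sketch *fresh,
                                                 bool connected)
{
    struct dst_sketch *dst;

    assert(sk != NULL && fresh != NULL);
    dst = dst_check_sketch(sk);
    if (dst)
        return dst;             /* valid cache: reuse it, never overwrite */
    if (connected) {
        sk->dst_cache = fresh;  /* invalid cache and connected: store */
        sk->cache_valid = true;
    }
    return fresh;
}
```

In this model a valid cache entry, such as one installed by an error handler, is never replaced by a later lookup result.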
* ipv6: add a wrapper for ip6_dst_store() with flowi6 checks (Alexey Kodanev, 2018-04-04)

Move a commonly used pattern of ip6_dst_store() usage to a separate function, ip6_sk_dst_store_flow(), which checks the addresses for equality using the flow information before saving them.

There are no functional changes in this patch. In addition, the new function will be used in the next patch, in ip6_sk_dst_lookup_flow().

Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
* ipv6: frags: fix /proc/sys/net/ipv6/ip6frag_low_thresh (Eric Dumazet, 2018-04-02)

I forgot to change ip6frag_low_thresh proc_handler from proc_dointvec_minmax to proc_doulongvec_minmax.

Fixes: 3e67f106f619 ("inet: frags: break the 2GB limit for frags storage")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Maciej Żenczykowski <maze@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net (David S. Miller, 2018-04-01)
|\

Minor conflicts in drivers/net/ethernet/mellanox/mlx5/core/en_rep.c, we had some overlapping changes:

1) In 'net' MLX5E_PARAMS_LOG_{SQ,RQ}_SIZE --> MLX5E_REP_PARAMS_LOG_{SQ,RQ}_SIZE

2) In 'net-next' params->log_rq_size is renamed to be params->log_rq_mtu_frames.

3) In 'net-next' params->hard_mtu is added.

Signed-off-by: David S. Miller <davem@davemloft.net>
| * net/ipv6: Fix route leaking between VRFs (David Ahern, 2018-03-30)

Donald reported that IPv6 route leaking between VRFs is not working. The root cause is the strict argument in the call to rt6_lookup when validating the nexthop spec.

ip6_route_check_nh validates the gateway and device (if given) of a route spec. It in turn could call rt6_lookup (e.g., lookup in a given table did not succeed so it falls back to a full lookup) and if so sets the strict argument to 1. That means if the egress device is given, the route lookup needs to return a result with the same device. This strict requirement does not work with VRFs (IPv4 or IPv6) because the oif in the flow struct is overridden with the index of the VRF device to trigger a match on the l3mdev rule and force the lookup to its table.

The right long term solution is to add an l3mdev index to the flow struct such that the oif is not overridden. That solution will not backport well, so this patch aims for a simpler solution to relax the strict argument if the route spec device is an l3mdev slave. As done in other places, use the FLOWI_FLAG_SKIP_NH_OIF to know that the RT6_LOOKUP_F_IFACE flag needs to be removed.

Fixes: ca254490c8df ("net: Add VRF support to IPv6 stack")
Reported-by: Donald Sharp <sharpd@cumulusnetworks.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
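The flag handling described above can be sketched as plain bit manipulation. The two macro values below are hypothetical placeholders (the real definitions live in kernel headers), and `nh_lookup_flags()` is an illustrative helper, not a kernel function.

```c
#include <assert.h>

/* Hypothetical bit values chosen for this sketch only. */
#define SKETCH_FLOWI_FLAG_SKIP_NH_OIF 0x01u  /* set for l3mdev (VRF) slaves */
#define SKETCH_RT6_LOOKUP_F_IFACE     0x01u  /* "strict": match egress dev */

/* Compute route lookup flags for nexthop validation. */
static unsigned int nh_lookup_flags(unsigned int flowi_flags, int dev_given)
{
    /* Strict device matching is requested when a device was given. */
    unsigned int flags = dev_given ? SKETCH_RT6_LOOKUP_F_IFACE : 0u;

    /* Relax the strict match when the flow says to skip the nexthop oif. */
    if (flowi_flags & SKETCH_FLOWI_FLAG_SKIP_NH_OIF)
        flags &= ~SKETCH_RT6_LOOKUP_F_IFACE;

    /* Postcondition: only the interface-match bit can be set here. */
    assert((flags & ~SKETCH_RT6_LOOKUP_F_IFACE) == 0u);
    return flags;
}
```

Under these assumptions, a route spec whose device is a VRF slave gets a non-strict lookup, while all other specs keep the strict device match.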
| * ipv6: sr: fix seg6 encap performances with TSO enabled (David Lebrun, 2018-03-30)

Enabling TSO can lead to abysmal performance when using seg6 in encap mode, such as with the ixgbe driver. This patch adds a call to iptunnel_handle_offloads() to remove the encapsulation bit if needed.

Before:

    root@comp4-seg6bpf:~# iperf3 -c fc00::55
    Connecting to host fc00::55, port 5201
    [  4] local fc45::4 port 36592 connected to fc00::55 port 5201
    [ ID] Interval           Transfer     Bandwidth       Retr  Cwnd
    [  4]   0.00-1.00   sec   196 KBytes  1.60 Mbits/sec   47   6.66 KBytes
    [  4]   1.00-2.00   sec   304 KBytes  2.49 Mbits/sec  100   5.33 KBytes
    [  4]   2.00-3.00   sec   284 KBytes  2.32 Mbits/sec   92   5.33 KBytes

After:

    root@comp4-seg6bpf:~# iperf3 -c fc00::55
    Connecting to host fc00::55, port 5201
    [  4] local fc45::4 port 43062 connected to fc00::55 port 5201
    [ ID] Interval           Transfer     Bandwidth       Retr  Cwnd
    [  4]   0.00-1.00   sec  1.03 GBytes  8.89 Gbits/sec    0    743 KBytes
    [  4]   1.00-2.00   sec  1.03 GBytes  8.87 Gbits/sec    0    743 KBytes
    [  4]   2.00-3.00   sec  1.03 GBytes  8.87 Gbits/sec    0    743 KBytes

Reported-by: Tom Herbert <tom@quantonium.net>
Fixes: 6c8702c60b88 ("ipv6: sr: add support for SRH encapsulation and injection with lwtunnels")
Signed-off-by: David Lebrun <dlebrun@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
| * Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec (David S. Miller, 2018-03-29)
| |\

Steffen Klassert says:

====================
pull request (net): ipsec 2018-03-29

1) Fix a rcu_read_lock/rcu_read_unlock imbalance in the error path of xfrm_local_error(). From Taehee Yoo.

2) Some VTI MTU fixes. From Stefano Brivio.

3) Fix a too early overwritten skb control buffer on xfrm transport mode.

Please note that this pull request has a merge conflict in net/ipv4/ip_tunnel.c.

The conflict is between commit f6cc9c054e77 ("ip_tunnel: Emit events for post-register MTU changes") from the net tree and commit 24fc79798b8d ("ip_tunnel: Clamp MTU to bounds on new link") from the ipsec tree.

It can be solved as it is currently done in linux-next.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
| | * vti6: Fix dev->max_mtu setting (Stefano Brivio, 2018-03-19)

We shouldn't allow a tunnel to have IP_MAX_MTU as MTU, because another IPv6 header is going on top of our packets. Without this patch, we might end up building packets bigger than IP_MAX_MTU.

Fixes: b96f9afee4eb ("ipv4/6: use core net MTU range checking")
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Acked-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
| | * vti6: Keep set MTU on link creation or change, validate it (Stefano Brivio, 2018-03-19)

In vti6_link_config(), if the MTU is already given on link creation or change, validate and use it instead of recomputing it. To do that, we need to propagate the knowledge that the MTU was set by userspace all the way down to vti6_link_config().

To keep this simple, vti6_dev_init() sets the new 'keep_mtu' argument of vti6_link_config() to true: on initialization we don't have convenient access to netlink attributes, but vti6_link_config() checks whether dev->mtu is set anyway. If it's non-zero, it was set to the value of the IFLA_MTU attribute during creation. Otherwise, determine a reasonable value.

Fixes: ed1efb2aefbb ("ipv6: Add support for IPsec virtual tunnel interfaces")
Fixes: 53c81e95df17 ("ip6_vti: adjust vti mtu according to mtu of lower device")
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Acked-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
| | * vti6: Properly adjust vti6 MTU from MTU of lower device (Stefano Brivio, 2018-03-19)

If a lower device is found, we don't need to subtract LL_MAX_HEADER to calculate our MTU: just use its MTU, the link layer headers are already taken into account by it.

If the lower device is not found, start from ETH_DATA_LEN instead, and only in this case subtract a worst-case LL_MAX_HEADER.

We then need to subtract our additional IPv6 header from the calculation.

While at it, note that vti6 doesn't have a hardware header, so it doesn't need to set dev->hard_header_len. And as vti6_link_config() now always sets the MTU, there's no need to set a default value in vti6_dev_setup().

This makes the behaviour consistent with IPv4 vti, after commit a32452366b72 ("vti4: Don't count header length twice."), which was accidentally reverted by merge commit f895f0cfbb77 ("Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec").

While commit 53c81e95df17 ("ip6_vti: adjust vti mtu according to mtu of lower device") improved on the original situation, this was still not ideal. As reported in that commit message itself, if we start from an underlying veth MTU of 9000, we end up with an MTU of 8832, that is, 9000 - LL_MAX_HEADER - sizeof(ipv6hdr). This should simply be 8960, or 9000 - sizeof(ipv6hdr) instead: we found the lower device (veth) and we know we don't have any additional link layer header, so there's no need to subtract a hypothetical worst-case number.

Fixes: 53c81e95df17 ("ip6_vti: adjust vti mtu according to mtu of lower device")
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Acked-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
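The MTU arithmetic above can be checked with a short sketch. The constants are assumptions for illustration: the IPv6 header is 40 bytes, ETH_DATA_LEN is 1500, and LL_MAX_HEADER is taken as 128 because that is the value implied by the quoted figures (9000 - 8832 - 40). The function names are hypothetical. With a 40-byte header, 9000 - sizeof(ipv6hdr) works out to 8960.

```c
#include <assert.h>

#define SKETCH_IPV6_HDR_LEN  40    /* fixed IPv6 header size in bytes */
#define SKETCH_ETH_DATA_LEN  1500  /* default Ethernet payload size */
#define SKETCH_LL_MAX_HEADER 128   /* assumed; config dependent in practice */

/* New calculation. lower_mtu <= 0 means "no lower device found". */
static int vti6_mtu_sketch(int lower_mtu)
{
    int mtu;

    if (lower_mtu > 0)
        mtu = lower_mtu;    /* link layer headers already accounted for */
    else
        mtu = SKETCH_ETH_DATA_LEN - SKETCH_LL_MAX_HEADER;

    mtu -= SKETCH_IPV6_HDR_LEN;     /* room for the extra IPv6 header */
    assert(mtu > 0);                /* sketch only: no clamping to bounds */
    return mtu;
}

/* Previous calculation, which always subtracted the worst-case LL header. */
static int vti6_mtu_old_sketch(int lower_mtu)
{
    return lower_mtu - SKETCH_LL_MAX_HEADER - SKETCH_IPV6_HDR_LEN;
}
```

For a lower device with MTU 9000, the old sketch yields 8832 and the new one 8960; without a lower device the new one yields 1500 - 128 - 40 = 1332.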
| * | ipv6: the entire IPv6 header chain must fit the first fragment (Paolo Abeni, 2018-03-25)

While building an IPv6 datagram we currently allow arbitrarily large extheaders, even beyond the pmtu size. Syzbot has found a way to exploit this to trigger the following splat:

    kernel BUG at ./include/linux/skbuff.h:2073!
    invalid opcode: 0000 [#1] SMP KASAN
    Dumping ftrace buffer:
       (ftrace buffer empty)
    Modules linked in:
    CPU: 1 PID: 4230 Comm: syzkaller672661 Not tainted 4.16.0-rc2+ #326
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
    RIP: 0010:__skb_pull include/linux/skbuff.h:2073 [inline]
    RIP: 0010:__ip6_make_skb+0x1ac8/0x2190 net/ipv6/ip6_output.c:1636
    RSP: 0018:ffff8801bc18f0f0 EFLAGS: 00010293
    RAX: ffff8801b17400c0 RBX: 0000000000000738 RCX: ffffffff84f01828
    RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff8801b415ac18
    RBP: ffff8801bc18f360 R08: ffff8801b4576844 R09: 0000000000000000
    R10: ffff8801bc18f380 R11: ffffed00367aee4e R12: 00000000000000d6
    R13: ffff8801b415a740 R14: dffffc0000000000 R15: ffff8801b45767c0
    FS: 0000000001535880(0000) GS:ffff8801db300000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 000000002000b000 CR3: 00000001b4123001 CR4: 00000000001606e0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    Call Trace:
     ip6_finish_skb include/net/ipv6.h:969 [inline]
     udp_v6_push_pending_frames+0x269/0x3b0 net/ipv6/udp.c:1073
     udpv6_sendmsg+0x2a96/0x3400 net/ipv6/udp.c:1343
     inet_sendmsg+0x11f/0x5e0 net/ipv4/af_inet.c:764
     sock_sendmsg_nosec net/socket.c:630 [inline]
     sock_sendmsg+0xca/0x110 net/socket.c:640
     ___sys_sendmsg+0x320/0x8b0 net/socket.c:2046
     __sys_sendmmsg+0x1ee/0x620 net/socket.c:2136
     SYSC_sendmmsg net/socket.c:2167 [inline]
     SyS_sendmmsg+0x35/0x60 net/socket.c:2162
     do_syscall_64+0x280/0x940 arch/x86/entry/common.c:287
     entry_SYSCALL_64_after_hwframe+0x42/0xb7
    RIP: 0033:0x4404c9
    RSP: 002b:00007ffdce35f948 EFLAGS: 00000217 ORIG_RAX: 0000000000000133
    RAX: ffffffffffffffda RBX: 00000000004002c8 RCX: 00000000004404c9
    RDX: 0000000000000003 RSI: 0000000020001f00 RDI: 0000000000000003
    RBP: 00000000006cb018 R08: 00000000004002c8 R09: 00000000004002c8
    R10: 0000000020000080 R11: 0000000000000217 R12: 0000000000401df0
    R13: 0000000000401e80 R14: 0000000000000000 R15: 0000000000000000
    Code: ff e8 1d 5e b9 fc e9 15 e9 ff ff e8 13 5e b9 fc e9 44 e8 ff ff e8 29 5e b9 fc e9 c0 e6 ff ff e8 3f f3 80 fc 0f 0b e8 38 f3 80 fc <0f> 0b 49 8d 87 80 00 00 00 4d 8d 87 84 00 00 00 48 89 85 20 fe
    RIP: __skb_pull include/linux/skbuff.h:2073 [inline] RSP: ffff8801bc18f0f0
    RIP: __ip6_make_skb+0x1ac8/0x2190 net/ipv6/ip6_output.c:1636 RSP: ffff8801bc18f0f0

As stated by RFC 7112 section 5:

    When a host fragments an IPv6 datagram, it MUST include the entire IPv6 Header Chain in the First Fragment.

So this patch addresses the issue by dropping datagrams with excessive extheader length. It also updates the error path to report non-negative pmtu values to the calling socket.

The issue apparently predates git history.

v1 -> v2: cleanup error path, as per Eric's suggestion

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Reported-by: syzbot+91e6f9932ff122fa4410@syzkaller.appspotmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
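The RFC 7112 rule quoted above amounts to a size check before the datagram is built. The following is a minimal sketch under stated assumptions: `check_header_chain()` and its parameters are hypothetical, the IPv6 header is taken as 40 bytes, and returning -EMSGSIZE is an assumption of this sketch rather than something the commit message states.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define SKETCH_IPV6_HDR_LEN 40u  /* fixed IPv6 header size in bytes */

/*
 * Return 0 if the whole header chain (IPv6 header, extension headers,
 * upper-layer header) fits in one packet of size mtu, else -EMSGSIZE.
 */
static int check_header_chain(size_t ext_hdr_len, size_t transport_hdr_len,
                              size_t mtu)
{
    size_t chain = SKETCH_IPV6_HDR_LEN + ext_hdr_len + transport_hdr_len;

    assert(chain >= ext_hdr_len);   /* guard against size_t wrap-around */
    if (chain > mtu)
        return -EMSGSIZE;           /* drop: chain exceeds first fragment */
    return 0;
}
```

For example, 2000 bytes of extension headers with an 8-byte transport header cannot fit a 1500-byte MTU, so the sketch rejects the datagram instead of letting fragmentation split the header chain.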
| * | net/ipv4: disable SMC TCP option with SYN Cookies (Hans Wippel, 2018-03-25)

Currently, the SMC experimental TCP option in a SYN packet is lost on the server side when SYN Cookies are active. However, the corresponding SYNACK sent back to the client contains the SMC option. This causes an inconsistent view of the SMC capabilities on the client and server.

This patch disables the SMC option in the SYNACK when SYN Cookies are active to avoid this issue.

Fixes: 60e2a7780793b ("tcp: TCP experimental option for SMC")
Signed-off-by: Hans Wippel <hwippel@linux.vnet.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
| * | Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf (David S. Miller, 2018-03-24)
| |\ \

Pablo Neira Ayuso says:

====================
Netfilter fixes for net

The following patchset contains Netfilter fixes for your net tree, they are:

1) Don't pick fixed hash implementation for NFT_SET_EVAL sets, otherwise userspace hits EOPNOTSUPP with valid rules using the meter statement, from Florian Westphal.

2) If you send a batch that flushes the existing ruleset (that contains a NAT chain) and the new ruleset definition comes with a new NAT chain, don't bogusly hit EBUSY. Also from Florian.

3) Missing netlink policy attribute validation, from Florian.

4) Detach conntrack template from skbuff if IP_NODEFRAG is set on, from Paolo Abeni.

5) Cache device names in flowtable object, otherwise we may end up walking over devices going away given no rtnl_lock is held.

6) Fix incorrect net_device ingress with ingress hooks.

7) Fix crash when trying to read more data than available in UDP packets from the nf_socket infrastructure, from Subash.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>