aboutsummaryrefslogtreecommitdiffstats
path: root/net
Commit message (Collapse)AuthorAge
...
* | | | | | | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds2018-11-01
|\ \ \ \ \ \ \ | |_|_|_|/ / / |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull networking fixes from David Miller: 1) BPF verifier fixes from Daniel Borkmann. 2) HNS driver fixes from Huazhong Tan. 3) FDB only works for ethernet devices, reject attempts to install FDB rules for others. From Ido Schimmel. 4) Fix spectre V1 in vhost, from Jason Wang. 5) Don't pass on-stack object to irq_set_affinity_hint() in mvpp2 driver, from Marc Zyngier. 6) Fix mlx5e checksum handling when RXFCS is enabled, from Eric Dumazet. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (49 commits) openvswitch: Fix push/pop ethernet validation net: stmmac: Fix stmmac_mdio_reset() when building stmmac as modules bpf: test make sure to run unpriv test cases in test_verifier bpf: add various test cases to test_verifier bpf: don't set id on after map lookup with ptr_to_map_val return bpf: fix partial copy of map_ptr when dst is scalar libbpf: Fix compile error in libbpf_attach_type_by_name kselftests/bpf: use ping6 as the default ipv6 ping binary if it exists selftests: mlxsw: qos_mc_aware: Add a test for UC awareness selftests: mlxsw: qos_mc_aware: Tweak for min shaper mlxsw: spectrum: Set minimum shaper on MC TCs mlxsw: reg: QEEC: Add minimum shaper fields net: hns3: bugfix for rtnl_lock's range in the hclgevf_reset() net: hns3: bugfix for rtnl_lock's range in the hclge_reset() net: hns3: bugfix for handling mailbox while the command queue reinitialized net: hns3: fix incorrect return value/type of some functions net: hns3: bugfix for hclge_mdio_write and hclge_mdio_read net: hns3: bugfix for is_valid_csq_clean_head() net: hns3: remove unnecessary queue reset in the hns3_uninit_all_ring() net: hns3: bugfix for the initialization of command queue's spin lock ...
| * | | | | | openvswitch: Fix push/pop ethernet validationJaime Caamaño Ruiz2018-10-31
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When there are both pop and push ethernet header actions among the actions to be applied to a packet, an unexpected EINVAL (Invalid argument) error is obtained. This is due to mac_proto not being reset correctly when those actions are validated. Reported-at: https://mail.openvswitch.org/pipermail/ovs-discuss/2018-October/047554.html Fixes: 91820da6ae85 ("openvswitch: add Ethernet push and pop actions") Signed-off-by: Jaime Caamaño Ruiz <jcaamano@suse.com> Tested-by: Greg Rose <gvrose8192@gmail.com> Reviewed-by: Greg Rose <gvrose8192@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | | Merge branch '10GbE' of ↵David S. Miller2018-10-31
| |\ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-queue Jeff Kirsher says: ==================== Intel Wired LAN Driver Updates 2018-10-31 This series contains a various collection of fixes. Miroslav Lichvar from Red Hat or should I say IBM now? Updates the PHC timecounter interval for igb so that it gets updated at least once every 550 seconds. Ngai-Mint provides a fix for fm10k to prevent a soft lockup or system crash by adding a new condition to determine if the SM mailbox is in the correct state before proceeding. Jake provides several fm10k fixes, first one marks complier aborts as non-fatal since on some platforms trigger machine check errors when the compile aborts. Added missing device ids to the in-kernel driver. Due to the recent fixes, bumped the driver version. I (Jeff Kirsher) fixed a XFRM_ALGO dependency for both ixgbe and ixgbevf. This fix was based on the original work from Arnd Bergmann, which only fixed ixgbe. Mitch provides a fix for i40e/avf to update the status codes, which resolves an issue between a mis-match between i40e and the iavf driver, which also supports the ice LAN driver. Radoslaw fixes the ixgbe where the driver is logging a message about spoofed packets detected when the VF is re-started with a different MAC address. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| | * | | | | | ixgbe/ixgbevf: fix XFRM_ALGO dependencyJeff Kirsher2018-10-31
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Based on the original work from Arnd Bergmann. When XFRM_ALGO is not enabled, the new ixgbe IPsec code produces a link error: drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.o: In function `ixgbe_ipsec_vf_add_sa': ixgbe_ipsec.c:(.text+0x1266): undefined reference to `xfrm_aead_get_byname' Simply selecting XFRM_ALGO from here causes circular dependencies, so to fix it, we probably want this slightly more complex solution that is similar to what other drivers with XFRM offload do: A separate Kconfig symbol now controls whether we include the IPsec offload code. To keep the old behavior, this is left as 'default y'. The dependency in XFRM_OFFLOAD still causes a circular dependency but is not actually needed because this symbol is not user visible, so removing that dependency on top makes it all work. CC: Arnd Bergmann <arnd@arndb.de> CC: Shannon Nelson <shannon.nelson@oracle.com> Fixes: eda0333ac293 ("ixgbe: add VF IPsec management") Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
| * | | | | | | bpf: tcp_bpf_recvmsg should return EAGAIN when nonblocking and no dataJohn Fastabend2018-10-30
| |/ / / / / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We return 0 in the case of a nonblocking socket that has no data available. However, this is incorrect and may confuse applications. After this patch we do the correct thing and return the error EAGAIN. Quoting return codes from recvmsg manpage, EAGAIN or EWOULDBLOCK The socket is marked nonblocking and the receive operation would block, or a receive timeout had been set and the timeout expired before data was received. Fixes: 604326b41a6f ("bpf, sockmap: convert to generic sk_msg interface") Signed-off-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Song Liu <songliubraving@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
| * | | | | | rtnetlink: Disallow FDB configuration for non-Ethernet deviceIdo Schimmel2018-10-29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When an FDB entry is configured, the address is validated to have the length of an Ethernet address, but the device for which the address is configured can be of any type. The above can result in the use of uninitialized memory when the address is later compared against existing addresses since 'dev->addr_len' is used and it may be greater than ETH_ALEN, as with ip6tnl devices. Fix this by making sure that FDB entries are only configured for Ethernet devices. BUG: KMSAN: uninit-value in memcmp+0x11d/0x180 lib/string.c:863 CPU: 1 PID: 4318 Comm: syz-executor998 Not tainted 4.19.0-rc3+ #49 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x14b/0x190 lib/dump_stack.c:113 kmsan_report+0x183/0x2b0 mm/kmsan/kmsan.c:956 __msan_warning+0x70/0xc0 mm/kmsan/kmsan_instr.c:645 memcmp+0x11d/0x180 lib/string.c:863 dev_uc_add_excl+0x165/0x7b0 net/core/dev_addr_lists.c:464 ndo_dflt_fdb_add net/core/rtnetlink.c:3463 [inline] rtnl_fdb_add+0x1081/0x1270 net/core/rtnetlink.c:3558 rtnetlink_rcv_msg+0xa0b/0x1530 net/core/rtnetlink.c:4715 netlink_rcv_skb+0x36e/0x5f0 net/netlink/af_netlink.c:2454 rtnetlink_rcv+0x50/0x60 net/core/rtnetlink.c:4733 netlink_unicast_kernel net/netlink/af_netlink.c:1317 [inline] netlink_unicast+0x1638/0x1720 net/netlink/af_netlink.c:1343 netlink_sendmsg+0x1205/0x1290 net/netlink/af_netlink.c:1908 sock_sendmsg_nosec net/socket.c:621 [inline] sock_sendmsg net/socket.c:631 [inline] ___sys_sendmsg+0xe70/0x1290 net/socket.c:2114 __sys_sendmsg net/socket.c:2152 [inline] __do_sys_sendmsg net/socket.c:2161 [inline] __se_sys_sendmsg+0x2a3/0x3d0 net/socket.c:2159 __x64_sys_sendmsg+0x4a/0x70 net/socket.c:2159 do_syscall_64+0xb8/0x100 arch/x86/entry/common.c:291 entry_SYSCALL_64_after_hwframe+0x63/0xe7 RIP: 0033:0x440ee9 Code: e8 cc ab 02 00 48 83 c4 18 c3 0f 1f 80 00 00 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 bb 0a fc ff c3 66 2e 0f 1f 84 00 00 00 00 RSP: 002b:00007fff6a93b518 EFLAGS: 00000213 ORIG_RAX: 000000000000002e RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 0000000000440ee9 RDX: 0000000000000000 RSI: 0000000020000240 RDI: 0000000000000003 RBP: 0000000000000000 R08: 00000000004002c8 R09: 00000000004002c8 R10: 00000000004002c8 R11: 0000000000000213 R12: 000000000000b4b0 R13: 0000000000401ec0 R14: 0000000000000000 R15: 0000000000000000 Uninit was created at: kmsan_save_stack_with_flags mm/kmsan/kmsan.c:256 [inline] kmsan_internal_poison_shadow+0xb8/0x1b0 mm/kmsan/kmsan.c:181 kmsan_kmalloc+0x98/0x100 mm/kmsan/kmsan_hooks.c:91 kmsan_slab_alloc+0x10/0x20 mm/kmsan/kmsan_hooks.c:100 slab_post_alloc_hook mm/slab.h:446 [inline] slab_alloc_node mm/slub.c:2718 [inline] __kmalloc_node_track_caller+0x9e7/0x1160 mm/slub.c:4351 __kmalloc_reserve net/core/skbuff.c:138 [inline] __alloc_skb+0x2f5/0x9e0 net/core/skbuff.c:206 alloc_skb include/linux/skbuff.h:996 [inline] netlink_alloc_large_skb net/netlink/af_netlink.c:1189 [inline] netlink_sendmsg+0xb49/0x1290 net/netlink/af_netlink.c:1883 sock_sendmsg_nosec net/socket.c:621 [inline] sock_sendmsg net/socket.c:631 [inline] ___sys_sendmsg+0xe70/0x1290 net/socket.c:2114 __sys_sendmsg net/socket.c:2152 [inline] __do_sys_sendmsg net/socket.c:2161 [inline] __se_sys_sendmsg+0x2a3/0x3d0 net/socket.c:2159 __x64_sys_sendmsg+0x4a/0x70 net/socket.c:2159 do_syscall_64+0xb8/0x100 arch/x86/entry/common.c:291 entry_SYSCALL_64_after_hwframe+0x63/0xe7 v2: * Make error message more specific (David) Fixes: 090096bf3db1 ("net: generic fdb support for drivers without ndo_fdb_<op>") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reported-and-tested-by: syzbot+3a288d5f5530b901310e@syzkaller.appspotmail.com Reported-and-tested-by: syzbot+d53ab4e92a1db04110ff@syzkaller.appspotmail.com Cc: Vlad Yasevich <vyasevich@gmail.com> Cc: David Ahern <dsahern@gmail.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | | sctp: check policy more carefully when getting pr statusXin Long2018-10-29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When getting pr_assocstatus and pr_streamstatus by sctp_getsockopt, it doesn't correctly process the case when policy is set with SCTP_PR_SCTP_ALL | SCTP_PR_SCTP_MASK. It even causes a slab-out-of-bounds in sctp_getsockopt_pr_streamstatus(). This patch fixes it by return -EINVAL for this case. Fixes: 0ac1077e3a54 ("sctp: get pr_assoc and pr_stream all status with SCTP_PR_SCTP_ALL") Reported-by: syzbot+5da0d0a72a9e7d791748@syzkaller.appspotmail.com Suggested-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | | sctp: clear the transport of some out_chunk_list chunks in sctp_assoc_rm_peerXin Long2018-10-29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If a transport is removed by asconf but there still are some chunks with this transport queuing on out_chunk_list, later an use-after-free issue will be caused when accessing this transport from these chunks in sctp_outq_flush(). This is an old bug, we fix it by clearing the transport of these chunks in out_chunk_list when removing a transport in sctp_assoc_rm_peer(). Reported-by: syzbot+56a40ceee5fb35932f4d@syzkaller.appspotmail.com Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | | ipv4/igmp: fix v1/v2 switchback timeout based on rfc3376, 8.12Hangbin Liu2018-10-29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Similiar with ipv6 mcast commit 89225d1ce6af3 ("net: ipv6: mld: fix v1/v2 switchback timeout to rfc3810, 9.12.") i) RFC3376 8.12. Older Version Querier Present Timeout says: The Older Version Querier Interval is the time-out for transitioning a host back to IGMPv3 mode once an older version query is heard. When an older version query is received, hosts set their Older Version Querier Present Timer to Older Version Querier Interval. This value MUST be ((the Robustness Variable) times (the Query Interval in the last Query received)) plus (one Query Response Interval). Currently we only use a hardcode value IGMP_V1/v2_ROUTER_PRESENT_TIMEOUT. Fix it by adding two new items mr_qi(Query Interval) and mr_qri(Query Response Interval) in struct in_device. Now we can calculate the switchback time via (mr_qrv * mr_qi) + mr_qri. We need update these values when receive IGMPv3 queries. Reported-by: Ying Xu <yinxu@redhat.com> Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | | | | Merge tag 'ceph-for-4.20-rc1' of git://github.com/ceph/ceph-clientLinus Torvalds2018-10-31
|\ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull ceph updates from Ilya Dryomov: "The highlights are: - a series that fixes some old memory allocation issues in libceph (myself). We no longer allocate memory in places where allocation failures cannot be handled and BUG when the allocation fails. - support for copy_file_range() syscall (Luis Henriques). If size and alignment conditions are met, it leverages RADOS copy-from operation. Otherwise, a local copy is performed. - a patch that reduces memory requirement of ceph_sync_read() from the size of the entire read to the size of one object (Zheng Yan). - fallocate() syscall is now restricted to FALLOC_FL_PUNCH_HOLE (Luis Henriques)" * tag 'ceph-for-4.20-rc1' of git://github.com/ceph/ceph-client: (25 commits) ceph: new mount option to disable usage of copy-from op ceph: support copy_file_range file operation libceph: support the RADOS copy-from operation ceph: add non-blocking parameter to ceph_try_get_caps() libceph: check reply num_data_items in setup_request_data() libceph: preallocate message data items libceph, rbd, ceph: move ceph_osdc_alloc_messages() calls libceph: introduce alloc_watch_request() libceph: assign cookies in linger_submit() libceph: enable fallback to ceph_msg_new() in ceph_msgpool_get() ceph: num_ops is off by one in ceph_aio_retry_work() libceph: no need to call osd_req_opcode_valid() in osd_req_encode_op() ceph: set timeout conditionally in __cap_delay_requeue libceph: don't consume a ref on pagelist in ceph_msg_data_add_pagelist() libceph: introduce ceph_pagelist_alloc() libceph: osd_req_op_cls_init() doesn't need to take opcode libceph: bump CEPH_MSG_MAX_DATA_LEN ceph: only allow punch hole mode in fallocate ceph: refactor ceph_sync_read() ceph: check if LOOKUPNAME request was aborted when filling trace ...
| * | | | | | | libceph: support the RADOS copy-from operationLuis Henriques2018-10-22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add support for performing remote object copies using the 'copy-from' operation. [ Add COPY_FROM to get_num_data_items(). ] Signed-off-by: Luis Henriques <lhenriques@suse.com> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
| * | | | | | | libceph: check reply num_data_items in setup_request_data()Ilya Dryomov2018-10-22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | setup_request_data() adds message data items to both request and reply messages, but only checks request num_data_items before proceeding with the loop. This is wrong because if an op doesn't have any request data items but has a reply data item (e.g. read), a duplicate data item gets added to the message on every resend attempt. This went unnoticed for years but now that message data items are preallocated, it promptly crashes in ceph_msg_data_add(). Amend the signature to make it clear that setup_request_data() operates on both request and reply messages. Also, remove data_len assert -- we have another one in prepare_write_message(). Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
| * | | | | | | libceph: preallocate message data itemsIlya Dryomov2018-10-22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently message data items are allocated with ceph_msg_data_create() in setup_request_data() inside send_request(). send_request() has never been allowed to fail, so each allocation is followed by a BUG_ON: data = ceph_msg_data_create(...); BUG_ON(!data); It's been this way since support for multiple message data items was added in commit 6644ed7b7e04 ("libceph: make message data be a pointer") in 3.10. There is no reason to delay the allocation of message data items until the last possible moment and we certainly don't need a linked list of them as they are only ever appended to the end and never erased. Make ceph_msg_new2() take max_data_items and adapt the rest of the code. Reported-by: Jerry Lee <leisurelysw24@gmail.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
| * | | | | | | libceph, rbd, ceph: move ceph_osdc_alloc_messages() callsIlya Dryomov2018-10-22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The current requirement is that ceph_osdc_alloc_messages() should be called after oid and oloc are known. In preparation for preallocating message data items, move ceph_osdc_alloc_messages() further down, so that it is called when OSD op codes are known. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
| * | | | | | | libceph: introduce alloc_watch_request()Ilya Dryomov2018-10-22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ceph_osdc_alloc_messages() call will be moved out of alloc_linger_request() in the next commit, which means that ceph_osdc_watch() will need to call ceph_osdc_alloc_messages() twice. Add a helper for that. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
| * | | | | | | libceph: assign cookies in linger_submit()Ilya Dryomov2018-10-22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Register lingers directly in linger_submit(). This avoids allocating memory for notify pagelist while holding osdc->lock and simplifies both callers of linger_submit(). Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
| * | | | | | | libceph: enable fallback to ceph_msg_new() in ceph_msgpool_get()Ilya Dryomov2018-10-22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ceph_msgpool_get() can fall back to ceph_msg_new() when it is asked for a message whose front portion is larger than pool->front_len. However the caller always passes 0, effectively disabling that code path. The allocation goes to the message pool and returns a message with a front that is smaller than requested, setting us up for a crash. One example of this is a directory with a large number of snapshots. If its snap context doesn't fit, we oops in encode_request_partial(). Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
| * | | | | | | libceph: no need to call osd_req_opcode_valid() in osd_req_encode_op()Ilya Dryomov2018-10-22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Any uninitialized or unknown ops will be caught by the default clause anyway. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
| * | | | | | | libceph: don't consume a ref on pagelist in ceph_msg_data_add_pagelist()Ilya Dryomov2018-10-22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Because send_mds_reconnect() wants to send a message with a pagelist and pass the ownership to the messenger, ceph_msg_data_add_pagelist() consumes a ref which is then put in ceph_msg_data_destroy(). This makes managing pagelists in the OSD client (where they are wrapped in ceph_osd_data) unnecessarily hard because the handoff only happens in ceph_osdc_start_request() instead of when the pagelist is passed to ceph_osd_data_pagelist_init(). I counted several memory leaks on various error paths. Fix up ceph_msg_data_add_pagelist() and carry a pagelist ref in ceph_osd_data. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
| * | | | | | | libceph: introduce ceph_pagelist_alloc()Ilya Dryomov2018-10-22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | struct ceph_pagelist cannot be embedded into anything else because it has its own refcount. Merge allocation and initialization together. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
| * | | | | | | libceph: osd_req_op_cls_init() doesn't need to take opcodeIlya Dryomov2018-10-22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
* | | | | | | | Merge branch 'akpm' (patches from Andrew)Linus Torvalds2018-10-31
|\ \ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Merge more updates from Andrew Morton: - the rest of MM - lib/bitmap updates - hfs updates - fatfs updates - various other misc things * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (94 commits) mm/gup.c: fix __get_user_pages_fast() comment mm: Fix warning in insert_pfn() memory-hotplug.rst: add some details about locking internals powerpc/powernv: hold device_hotplug_lock when calling memtrace_offline_pages() powerpc/powernv: hold device_hotplug_lock when calling device_online() mm/memory_hotplug: fix online/offline_pages called w.o. mem_hotplug_lock mm/memory_hotplug: make add_memory() take the device_hotplug_lock mm/memory_hotplug: make remove_memory() take the device_hotplug_lock mm/memblock.c: warn if zero alignment was requested memblock: stop using implicit alignment to SMP_CACHE_BYTES docs/boot-time-mm: remove bootmem documentation mm: remove include/linux/bootmem.h memblock: replace BOOTMEM_ALLOC_* with MEMBLOCK variants mm: remove nobootmem memblock: rename __free_pages_bootmem to memblock_free_pages memblock: rename free_all_bootmem to memblock_free_all memblock: replace free_bootmem_late with memblock_free_late memblock: replace free_bootmem{_node} with memblock_free mm: nobootmem: remove bootmem allocation APIs memblock: replace alloc_bootmem with memblock_alloc ...
| * | | | | | | | mm: remove include/linux/bootmem.hMike Rapoport2018-10-31
| | |/ / / / / / | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Move remaining definitions and declarations from include/linux/bootmem.h into include/linux/memblock.h and remove the redundant header. The includes were replaced with the semantic patch below and then semi-automated removal of duplicated '#include <linux/memblock.h> @@ @@ - #include <linux/bootmem.h> + #include <linux/memblock.h> [sfr@canb.auug.org.au: dma-direct: fix up for the removal of linux/bootmem.h] Link: http://lkml.kernel.org/r/20181002185342.133d1680@canb.auug.org.au [sfr@canb.auug.org.au: powerpc: fix up for removal of linux/bootmem.h] Link: http://lkml.kernel.org/r/20181005161406.73ef8727@canb.auug.org.au [sfr@canb.auug.org.au: x86/kaslr, ACPI/NUMA: fix for linux/bootmem.h removal] Link: http://lkml.kernel.org/r/20181008190341.5e396491@canb.auug.org.au Link: http://lkml.kernel.org/r/1536927045-23536-30-git-send-email-rppt@linux.vnet.ibm.com Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Chris Zankel <chris@zankel.net> Cc: "David S. Miller" <davem@davemloft.net> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Greentime Hu <green.hu@gmail.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Guan Xuetao <gxt@pku.edu.cn> Cc: Ingo Molnar <mingo@redhat.com> Cc: "James E.J. Bottomley" <jejb@parisc-linux.org> Cc: Jonas Bonn <jonas@southpole.se> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Ley Foon Tan <lftan@altera.com> Cc: Mark Salter <msalter@redhat.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Matt Turner <mattst88@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Simek <monstr@monstr.eu> Cc: Palmer Dabbelt <palmer@sifive.com> Cc: Paul Burton <paul.burton@mips.com> Cc: Richard Kuo <rkuo@codeaurora.org> Cc: Richard Weinberger <richard@nod.at> Cc: Rich Felker <dalias@libc.org> Cc: Russell King <linux@armlinux.org.uk> Cc: Serge Semin <fancer.lancer@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: Vineet Gupta <vgupta@synopsys.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | | | | | Merge tag 'nfsd-4.20' of git://linux-nfs.org/~bfields/linuxLinus Torvalds2018-10-30
|\ \ \ \ \ \ \ \ | |/ / / / / / / |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull nfsd updates from Bruce Fields: "Olga added support for the NFSv4.2 asynchronous copy protocol. We already supported COPY, by copying a limited amount of data and then returning a short result, letting the client resend. The asynchronous protocol should offer better performance at the expense of some complexity. The other highlight is Trond's work to convert the duplicate reply cache to a red-black tree, and to move it and some other server caches to RCU. (Previously these have meant taking global spinlocks on every RPC) Otherwise, some RDMA work and miscellaneous bugfixes" * tag 'nfsd-4.20' of git://linux-nfs.org/~bfields/linux: (30 commits) lockd: fix access beyond unterminated strings in prints nfsd: Fix an Oops in free_session() nfsd: correctly decrement odstate refcount in error path svcrdma: Increase the default connection credit limit svcrdma: Remove try_module_get from backchannel svcrdma: Remove ->release_rqst call in bc reply handler svcrdma: Reduce max_send_sges nfsd: fix fall-through annotations knfsd: Improve lookup performance in the duplicate reply cache using an rbtree knfsd: Further simplify the cache lookup knfsd: Simplify NFS duplicate replay cache knfsd: Remove dead code from nfsd_cache_lookup SUNRPC: Simplify TCP receive code SUNRPC: Replace the cache_detail->hash_lock with a regular spinlock SUNRPC: Remove non-RCU protected lookup NFS: Fix up a typo in nfs_dns_ent_put NFS: Lockless DNS lookups knfsd: Lockless lookup of NFSv4 identities. SUNRPC: Lockless server RPCSEC_GSS context lookup knfsd: Allow lockless lookups of the exports ...
| * | | | | | | nfsd: Fix an Oops in free_session()Trond Myklebust2018-10-29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In call_xpt_users(), we delete the entry from the list, but we do not reinitialise it. This triggers the list poisoning when we later call unregister_xpt_user() in nfsd4_del_conns(). Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Cc: stable@vger.kernel.org Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| * | | | | | | svcrdma: Remove try_module_get from backchannelChuck Lever2018-10-29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since commit ffe1f0df5862 ("rpcrdma: Merge svcrdma and xprtrdma modules into one"), the forward and backchannel components are part of the same kernel module. A separate try_module_get() call in the backchannel code is no longer necessary. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| * | | | | | | svcrdma: Remove ->release_rqst call in bc reply handlerChuck Lever2018-10-29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Similar to a change made in the client's forward channel reply handler: The xprt_release_rqst_cong() call is not necessary. Also, release xprt->recv_lock when taking xprt->transport_lock to avoid disabling and enabling BH's while holding another spin lock. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| * | | | | | | svcrdma: Reduce max_send_sgesChuck Lever2018-10-29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There's no need to request a large number of send SGEs because the inline threshold already constrains the number of SGEs per Send. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| * | | | | | | SUNRPC: Simplify TCP receive codeTrond Myklebust2018-10-29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Use the fact that the iov iterators already have functionality for skipping a base offset. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| * | | | | | | SUNRPC: Replace the cache_detail->hash_lock with a regular spinlockTrond Myklebust2018-10-29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Now that the reader functions are all RCU protected, use a regular spinlock rather than a reader/writer lock. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| * | | | | | | SUNRPC: Remove non-RCU protected lookupTrond Myklebust2018-10-29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Clean up the cache code by removing the non-RCU protected lookup. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| * | | | | | | SUNRPC: Lockless server RPCSEC_GSS context lookupTrond Myklebust2018-10-29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Use RCU protection for looking up the RPCSEC_GSS context. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| * | | | | | | SUNRPC: Make server side AUTH_UNIX use lockless lookupsTrond Myklebust2018-10-29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Convert structs ip_map and unix_gid to use RCU protected lookups. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| * | | | | | | SUNRPC: Allow cache lookups to use RCU protection rather than the r/w spinlockTrond Myklebust2018-10-29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Instead of the reader/writer spinlock, allow cache lookups to use RCU for looking up entries. This is more efficient since modifications can occur while other entries are being looked up. Note that for now, we keep the reader/writer spinlock until all users have been converted to use RCU-safe freeing of their cache entries. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| * | | | | | | SUNRPC: Refactor sunrpc_cache_lookupTrond Myklebust2018-10-03
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is a trivial split into lookup and insert functions, no change in behavior. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| * | | | | | | SUNRPC: Add lockless lookup of the server's auth domainTrond Myklebust2018-10-03
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Avoid taking the global auth_domain_lock in most lookups of the auth domain by adding an RCU protected lookup. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| * | | | | | | SUNRPC: Remove the server 'authtab_lock' and just use RCUTrond Myklebust2018-10-03
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Module removal is RCU safe by design, so we really have no need to lock the 'authtab[]' array. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
* | | | | | | | Merge tag '9p-for-4.20' of git://github.com/martinetd/linuxLinus Torvalds2018-10-29
|\ \ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull 9p updates from Dominique Martinet: "Highlights this time around are the end of Matthew's work to remove the custom 9p request cache and use a slab directly for requests, with some extra patches on my end to not degrade performance, but it's a very good cleanup. Tomas and I fixed a few more syzkaller bugs (refcount is the big one), and I had a go at the coverity bugs and at some of the bugzilla reports we had open for a while. I'm a bit disappointed that I couldn't get much reviews for a few of my own patches, but the big ones got some and it's all been soaking in linux-next for quite a while so I think it should be OK. Summary: - Finish removing the custom 9p request cache mechanism - Embed part of the fcall in the request to have better slab performance (msize usually is power of two aligned) - syzkaller fixes: * add a refcount to 9p requests to avoid use after free * a few double free issues - A few coverity fixes - Some old patches that were in the bugzilla: * do not trust pdu content for size header * mount option for lock retry interval" * tag '9p-for-4.20' of git://github.com/martinetd/linux: (21 commits) 9p/trans_fd: put worker reqs on destroy 9p/trans_fd: abort p9_read_work if req status changed 9p: potential NULL dereference 9p locks: fix glock.client_id leak in do_lock 9p: p9dirent_read: check network-provided name length 9p/rdma: remove useless check in cm_event_handler 9p: acl: fix uninitialized iattr access 9p locks: add mount option for lock retry interval 9p: do not trust pdu content for stat item size 9p: Rename req to rreq in trans_fd 9p: fix spelling mistake in fall-through annotation 9p/rdma: do not disconnect on down_interruptible EAGAIN 9p: Add refcount to p9_req_t 9p: rename p9_free_req() function 9p: add a per-client fcall kmem_cache 9p: embed fcall in req to round down buffer allocs 9p: Remove p9_idpool 9p: Use a slab for allocating requests 9p: clear dangling pointers in p9stat_free v9fs_dir_readdir: fix double-free on p9stat_read error ...
| * | | | | | | | 9p/trans_fd: put worker reqs on destroyDominique Martinet2018-10-09
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | p9_read_work/p9_write_work might still hold references to a req after having been cancelled; make sure we put any of these to avoid potential request leak on disconnect. Fixes: 728356dedeff8 ("9p: Add refcount to p9_req_t") Link: http://lkml.kernel.org/r/1539057956-23741-2-git-send-email-asmadeus@codewreck.org Signed-off-by: Dominique Martinet <dominique.martinet@cea.fr> Cc: Eric Van Hensbergen <ericvh@gmail.com> Cc: Latchesar Ionkov <lucho@ionkov.net> Reviewed-by: Tomas Bortoli <tomasbortoli@gmail.com>
| * | | | | | | | 9p/trans_fd: abort p9_read_work if req status changedDominique Martinet2018-10-09
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | p9_read_work would try to handle an errored req even if it got put to error state by another thread between the lookup (that worked) and the time it had been fully read. The request itself is safe to use because we hold a ref to it from the lookup (for m->rreq, so it was safe to read into the request data buffer until this point), but the req_list has been deleted at the same time status changed, and client_cb already has been called as well, so we should not do either. Link: http://lkml.kernel.org/r/1539057956-23741-1-git-send-email-asmadeus@codewreck.org Signed-off-by: Dominique Martinet <dominique.martinet@cea.fr> Reported-by: syzbot+2222c34dc40b515f30dc@syzkaller.appspotmail.com Cc: Eric Van Hensbergen <ericvh@gmail.com> Cc: Latchesar Ionkov <lucho@ionkov.net>
| * | | | | | | | 9p: potential NULL dereferenceDan Carpenter2018-10-09
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | p9_tag_alloc() is supposed to return error pointers, but we accidentally return a NULL here. It would cause a NULL dereference in the caller. Link: http://lkml.kernel.org/m/20180926103934.GA14535@mwanda Fixes: 996d5b4db4b1 ("9p: Use a slab for allocating requests") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Dominique Martinet <dominique.martinet@cea.fr>
| * | | | | | | | 9p: p9dirent_read: check network-provided name lengthDominique Martinet2018-09-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | strcpy to dirent->d_name could overflow the buffer, use strscpy to check the provided string length and error out if the size was too big. While we are here, make the function return an error when the pdu parsing failed, instead of returning the pdu offset as if it had been a success... Link: http://lkml.kernel.org/r/1536339057-21974-4-git-send-email-asmadeus@codewreck.org Addresses-Coverity-ID: 139133 ("Copy into fixed size buffer") Signed-off-by: Dominique Martinet <dominique.martinet@cea.fr>
| * | | | | | | | 9p/rdma: remove useless check in cm_event_handlerDominique Martinet2018-09-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | the client c is always dereferenced to get the rdma struct, so c has to be a valid pointer at this point. Gcc would optimize that away but let's make coverity happy... Link: http://lkml.kernel.org/r/1536339057-21974-3-git-send-email-asmadeus@codewreck.org Addresses-Coverity-ID: 102778 ("Dereference before null check") Signed-off-by: Dominique Martinet <dominique.martinet@cea.fr>
| * | | | | | | | 9p: do not trust pdu content for stat item sizeGertjan Halkes2018-09-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | v9fs_dir_readdir() could deadloop if a struct was sent with a size set to -2 Link: http://lkml.kernel.org/r/1536134432-11997-1-git-send-email-asmadeus@codewreck.org Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=88021 Signed-off-by: Gertjan Halkes <gertjan@google.com> Signed-off-by: Dominique Martinet <dominique.martinet@cea.fr>
| * | | | | | | | 9p: Rename req to rreq in trans_fdTomas Bortoli2018-09-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In struct p9_conn, rename req to rreq as it is used by the read routine. Link: http://lkml.kernel.org/r/20180903160321.2181-1-tomasbortoli@gmail.com Signed-off-by: Tomas Bortoli <tomasbortoli@gmail.com> Suggested-by: Jun Piao <piaojun@huawei.com> Signed-off-by: Dominique Martinet <dominique.martinet@cea.fr>
| * | | | | | | | 9p/rdma: do not disconnect on down_interruptible EAGAINDominique Martinet2018-09-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | 9p/rdma would sometimes drop the connection and display errors in recv_done when the user does ^C. The errors were caused by recv buffers that were posted at the time of disconnect, and we just do not want to disconnect when down_interruptible is... interrupted. Link: http://lkml.kernel.org/r/1535625307-18019-1-git-send-email-asmadeus@codewreck.org Signed-off-by: Dominique Martinet <dominique.martinet@cea.fr>
| * | | | | | | | 9p: Add refcount to p9_req_tTomas Bortoli2018-09-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | To avoid use-after-free(s), use a refcount to keep track of the usable references to any instantiated struct p9_req_t. This commit adds p9_req_put(), p9_req_get() and p9_req_try_get() as wrappers to kref_put(), kref_get() and kref_get_unless_zero(). These are used by the client and the transports to keep track of valid requests' references. p9_free_req() is added back and used as callback by kref_put(). Add SLAB_TYPESAFE_BY_RCU as it ensures that the memory freed by kmem_cache_free() will not be reused for another type until the rcu synchronisation period is over, so an address gotten under rcu read lock is safe to inc_ref() without corrupting random memory while the lock is held. Link: http://lkml.kernel.org/r/1535626341-20693-1-git-send-email-asmadeus@codewreck.org Co-developed-by: Dominique Martinet <dominique.martinet@cea.fr> Signed-off-by: Tomas Bortoli <tomasbortoli@gmail.com> Reported-by: syzbot+467050c1ce275af2a5b8@syzkaller.appspotmail.com Signed-off-by: Dominique Martinet <dominique.martinet@cea.fr>
| * | | | | | | | 9p: rename p9_free_req() functionTomas Bortoli2018-09-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In sight of the next patch to add a refcount in p9_req_t, rename the p9_free_req() function in p9_release_req(). In the next patch the actual kfree will be moved to another function. Link: http://lkml.kernel.org/r/20180811144254.23665-1-tomasbortoli@gmail.com Signed-off-by: Tomas Bortoli <tomasbortoli@gmail.com> Acked-by: Jun Piao <piaojun@huawei.com> Signed-off-by: Dominique Martinet <dominique.martinet@cea.fr>
| * | | | | | | | 9p: add a per-client fcall kmem_cacheDominique Martinet2018-09-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Having a specific cache for the fcall allocations helps speed up end-to-end latency. The caches will automatically be merged if there are multiple caches of items with the same size so we do not need to try to share a cache between different clients of the same size. Since the msize is negotiated with the server, only allocate the cache after that negotiation has happened - previous allocations or allocations of different sizes (e.g. zero-copy fcall) are made with kmalloc directly. Some figures on two beefy VMs with Connect-IB (sriov) / trans=rdma, with ior running 32 processes in parallel doing small 32 bytes IOs: - no alloc (4.18-rc7 request cache): 65.4k req/s - non-power of two alloc, no patch: 61.6k req/s - power of two alloc, no patch: 62.2k req/s - non-power of two alloc, with patch: 64.7k req/s - power of two alloc, with patch: 65.1k req/s Link: http://lkml.kernel.org/r/1532943263-24378-2-git-send-email-asmadeus@codewreck.org Signed-off-by: Dominique Martinet <dominique.martinet@cea.fr> Acked-by: Jun Piao <piaojun@huawei.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Greg Kurz <groug@kaod.org>
| * | | | | | | | 9p: embed fcall in req to round down buffer allocsDominique Martinet2018-09-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | 'msize' is often a power of two, or at least page-aligned, so avoiding an overhead of two dozen bytes for each allocation will help the allocator do its work and reduce memory fragmentation. Link: http://lkml.kernel.org/r/1533825236-22896-1-git-send-email-asmadeus@codewreck.org Suggested-by: Matthew Wilcox <willy@infradead.org> Signed-off-by: Dominique Martinet <dominique.martinet@cea.fr> Reviewed-by: Greg Kurz <groug@kaod.org> Acked-by: Jun Piao <piaojun@huawei.com> Cc: Matthew Wilcox <willy@infradead.org>