aboutsummaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAge
* atm: mpoa: remove 32-bit timekeepingTina Ruchandani2017-11-30
| | | | | | | | | | | | | net/atm/mpoa_* files use 'struct timeval' to store event timestamps. struct timeval uses a 32-bit seconds field which will overflow in the year 2038 and beyond. Morever, the timestamps are being compared only to get seconds elapsed, so struct timeval which stores a seconds and microseconds field is an overkill. This patch replaces the use of struct timeval with time64_t to store a 64-bit seconds field. Signed-off-by: Tina Ruchandani <ruchandani.tina@gmail.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>
* atm: eni: fix several indentation issuesColin Ian King2017-11-30
| | | | | | | | There are several statements that have incorrect indentation. Fix these. Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* openvswitch: use ktime_get_ts64() instead of ktime_get_ts()Arnd Bergmann2017-11-30
| | | | | | | | | | | timespec is deprecated because of the y2038 overflow, so let's convert this one to ktime_get_ts64(). The code is already safe even on 32-bit architectures, since it uses monotonic times. On 64-bit architectures, nothing changes, while on 32-bit architectures this avoids one type conversion. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>
* netxen: remove timespec usageArnd Bergmann2017-11-30
| | | | | | | | | | | | netxen_collect_minidump() evidently just wants to get a monotonic timestamp. Using jiffies_to_timespec(jiffies, &ts) is not appropriate here, since it will overflow after 2^32 jiffies, which may be as short as 49 days of uptime. ktime_get_seconds() is the correct interface here. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: phy: harmonize phy_id{,_mask} data typeRichard Leitner2017-11-30
| | | | | | | | | | | Previously phy_id was u32 and phy_id_mask was unsigned int. As the phy_id_mask defines the important bits of the phy_id (and is therefore the same size) these two variables should be the same data type. Signed-off-by: Richard Leitner <richard.leitner@skidata.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: ethernet: davinci_emac: Deduplicate bus_find_device() by name matchingLukas Wunner2017-11-30
| | | | | | | | No need to reinvent the wheel, we have bus_find_device_by_name(). Cc: Grygorii Strashko <grygorii.strashko@ti.com> Signed-off-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: thunderx: Set max queue count taking XDP_TX into accountSunil Goutham2017-11-30
| | | | | | | | | | | on T81 there are only 4 cores, hence setting max queue count to 4 would leave nothing for XDP_TX. This patch fixes this by doubling max queue count in above scenarios. Signed-off-by: Sunil Goutham <sgoutham@cavium.com> Signed-off-by: cjacob <cjacob@caviumnetworks.com> Signed-off-by: Aleksey Makarov <aleksey.makarov@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: thunderx: Add support for xdp redirectSunil Goutham2017-11-30
| | | | | | | | | | This patch adds support for XDP_REDIRECT. Flush is not yet supported. Signed-off-by: Sunil Goutham <sgoutham@cavium.com> Signed-off-by: cjacob <cjacob@caviumnetworks.com> Signed-off-by: Aleksey Makarov <aleksey.makarov@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge tag 'nfsd-4.15-1' of git://linux-nfs.org/~bfields/linuxLinus Torvalds2017-11-29
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull nfsd fixes from Bruce Fields: "I screwed up my merge window pull request; I only sent half of what I meant to. There were no new features, just bugfixes of various importance and some very minor cleanup, so I think it's all still appropriate for -rc2. Highlights: - Fixes from Trond for some races in the NFSv4 state code. - Fix from Naofumi Honda for a typo in the blocked lock notificiation code - Fixes from Vasily Averin for some problems starting and stopping lockd especially in network namespaces" * tag 'nfsd-4.15-1' of git://linux-nfs.org/~bfields/linux: (23 commits) lockd: fix "list_add double add" caused by legacy signal interface nlm_shutdown_hosts_net() cleanup race of nfsd inetaddr notifiers vs nn->nfsd_serv change race of lockd inetaddr notifiers vs nlmsvc_rqst change SUNRPC: make cache_detail structures const NFSD: make cache_detail structures const sunrpc: make the function arg as const nfsd: check for use of the closed special stateid nfsd: fix panic in posix_unblock_lock called from nfs4_laundromat lockd: lost rollback of set_grace_period() in lockd_down_net() lockd: added cleanup checks in exit_net hook grace: replace BUG_ON by WARN_ONCE in exit_net hook nfsd: fix locking validator warning on nfs4_ol_stateid->st_mutex class lockd: remove net pointer from messages nfsd: remove net pointer from debug messages nfsd: Fix races with check_stateid_generation() nfsd: Ensure we check stateid validity in the seqid operation checks nfsd: Fix race in lock stateid creation nfsd4: move find_lock_stateid nfsd: Ensure we don't recognise lock stateids after freeing them ...
| * lockd: fix "list_add double add" caused by legacy signal interfaceVasily Averin2017-11-27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | restart_grace() uses hardcoded init_net. It can cause to "list_add double add" in following scenario: 1) nfsd and lockd was started in several net namespaces 2) nfsd in init_net was stopped (lockd was not stopped because it have users from another net namespaces) 3) lockd got signal, called restart_grace() -> set_grace_period() and enabled lock_manager in hardcoded init_net. 4) nfsd in init_net is started again, its lockd_up() calls set_grace_period() and tries to add lock_manager into init_net 2nd time. Jeff Layton suggest: "Make it safe to call locks_start_grace multiple times on the same lock_manager. If it's already on the global grace_list, then don't try to add it again. (But we don't intentionally add twice, so for now we WARN about that case.) With this change, we also need to ensure that the nfsd4 lock manager initializes the list before we call locks_start_grace. While we're at it, move the rest of the nfsd_net initialization into nfs4_state_create_net. I see no reason to have it spread over two functions like it is today." Suggested patch was updated to generate warning in described situation. Suggested-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Vasily Averin <vvs@virtuozzo.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| * nlm_shutdown_hosts_net() cleanupVasily Averin2017-11-27
| | | | | | | | | | | | | | | | | | nlm_complain_hosts() walks through nlm_server_hosts hlist, which should be protected by nlm_host_mutex. Signed-off-by: Vasily Averin <vvs@virtuozzo.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| * race of nfsd inetaddr notifiers vs nn->nfsd_serv changeVasily Averin2017-11-27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | nfsd_inet[6]addr_event uses nn->nfsd_serv without taking nfsd_mutex, which can be changed during execution of notifiers and crash the host. Moreover if notifiers were enabled in one net namespace they are enabled in all other net namespaces, from creation until destruction. This patch allows notifiers to access nn->nfsd_serv only after the pointer is correctly initialized and delays cleanup until notifiers are no longer in use. Signed-off-by: Vasily Averin <vvs@virtuozzo.com> Tested-by: Scott Mayhew <smayhew@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| * race of lockd inetaddr notifiers vs nlmsvc_rqst changeVasily Averin2017-11-27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | lockd_inet[6]addr_event use nlmsvc_rqst without taken nlmsvc_mutex, nlmsvc_rqst can be changed during execution of notifiers and crash the host. Patch enables access to nlmsvc_rqst only when it was correctly initialized and delays its cleanup until notifiers are no longer in use. Note that nlmsvc_rqst can be temporally set to ERR_PTR, so the "if (nlmsvc_rqst)" check in notifiers is insufficient on its own. Signed-off-by: Vasily Averin <vvs@virtuozzo.com> Tested-by: Scott Mayhew <smayhew@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| * SUNRPC: make cache_detail structures constBhumika Goyal2017-11-27
| | | | | | | | | | | | | | | | | | Make these const as they are only getting passed to the function cache_create_net having the argument as const. Signed-off-by: Bhumika Goyal <bhumirks@gmail.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| * NFSD: make cache_detail structures constBhumika Goyal2017-11-27
| | | | | | | | | | | | | | | | | | Make these const as they are only getting passed to the function cache_create_net having the argument as const. Signed-off-by: Bhumika Goyal <bhumirks@gmail.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| * sunrpc: make the function arg as constBhumika Goyal2017-11-27
| | | | | | | | | | | | | | | | | | | | | | Make the struct cache_detail *tmpl argument of the function cache_create_net as const as it is only getting passed to kmemup having the argument as const void *. Add const to the prototype too. Signed-off-by: Bhumika Goyal <bhumirks@gmail.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| * nfsd: check for use of the closed special stateidAndrew Elble2017-11-27
| | | | | | | | | | | | | | Prevent the use of the closed (invalid) special stateid by clients. Signed-off-by: Andrew Elble <aweits@rit.edu> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| * nfsd: fix panic in posix_unblock_lock called from nfs4_laundromatNaofumi Honda2017-11-27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | From kernel 4.9, my two nfsv4 servers sometimes suffer from "panic: unable to handle kernel page request" in posix_unblock_lock() called from nfs4_laundromat(). These panics diseappear if we revert the commit "nfsd: add a LRU list for blocked locks". The cause appears to be a typo in nfs4_laundromat(), which is also present in nfs4_state_shutdown_net(). Cc: stable@vger.kernel.org Fixes: 7919d0a27f1e "nfsd: add a LRU list for blocked locks" Cc: jlayton@redhat.com Reveiwed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| * lockd: lost rollback of set_grace_period() in lockd_down_net()Vasily Averin2017-11-27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit efda760fe95ea ("lockd: fix lockd shutdown race") is incorrect, it removes lockd_manager and disarm grace_period_end for init_net only. If nfsd was started from another net namespace lockd_up_net() calls set_grace_period() that adds lockd_manager into per-netns list and queues grace_period_end delayed work. These action should be reverted in lockd_down_net(). Otherwise it can lead to double list_add on after restart nfsd in netns, and to use-after-free if non-disarmed delayed work will be executed after netns destroy. Fixes: efda760fe95e ("lockd: fix lockd shutdown race") Cc: stable@vger.kernel.org Signed-off-by: Vasily Averin <vvs@virtuozzo.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| * lockd: added cleanup checks in exit_net hookVasily Averin2017-11-27
| | | | | | | | | | Signed-off-by: Vasily Averin <vvs@virtuozzo.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| * grace: replace BUG_ON by WARN_ONCE in exit_net hookVasily Averin2017-11-27
| | | | | | | | | | Signed-off-by: Vasily Averin <vvs@virtuozzo.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| * nfsd: fix locking validator warning on nfs4_ol_stateid->st_mutex classAndrew Elble2017-11-27
| | | | | | | | | | | | | | | | The use of the st_mutex has been confusing the validator. Use the proper nested notation so as to not produce warnings. Signed-off-by: Andrew Elble <aweits@rit.edu> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| * lockd: remove net pointer from messagesVasily Averin2017-11-27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Publishing of net pointer is not safe, use net->ns.inum as net ID in debug messages [ 171.757678] lockd_up_net: per-net data created; net=f00001e7 [ 171.767188] NFSD: starting 90-second grace period (net f00001e7) [ 300.653313] lockd: nuking all hosts in net f00001e7... [ 300.653641] lockd: host garbage collection for net f00001e7 [ 300.653968] lockd: nlmsvc_mark_resources for net f00001e7 [ 300.711483] lockd_down_net: per-net data destroyed; net=f00001e7 [ 300.711847] lockd: nuking all hosts in net 0... [ 300.711847] lockd: host garbage collection for net 0 [ 300.711848] lockd: nlmsvc_mark_resources for net 0 Signed-off-by: Vasily Averin <vvs@virtuozzo.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| * nfsd: remove net pointer from debug messagesVasily Averin2017-11-27
| | | | | | | | | | | | | | | | | | | | | | | | | | Publishing of net pointer is not safe, replace it in debug meesages by net->ns.inum [ 119.989161] nfsd: initializing export module (net: f00001e7). [ 171.767188] NFSD: starting 90-second grace period (net f00001e7) [ 322.185240] nfsd: shutting down export module (net: f00001e7). [ 322.186062] nfsd: export shutdown complete (net: f00001e7). Signed-off-by: Vasily Averin <vvs@virtuozzo.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| * nfsd: Fix races with check_stateid_generation()Trond Myklebust2017-11-27
| | | | | | | | | | | | | | | | | | | | | | The various functions that call check_stateid_generation() in order to compare a client-supplied stateid with the nfs4_stid state, usually need to atomically check for closed state. Those that perform the check after locking the st_mutex using nfsd4_lock_ol_stateid() should now be OK, but we do want to fix up the others. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| * nfsd: Ensure we check stateid validity in the seqid operation checksTrond Myklebust2017-11-27
| | | | | | | | | | | | | | | | | | After taking the stateid st_mutex, we want to know that the stateid still represents valid state before performing any non-idempotent actions. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| * nfsd: Fix race in lock stateid creationTrond Myklebust2017-11-27
| | | | | | | | | | | | | | | | | | | | If we're looking up a new lock state, and the creation fails, then we want to unhash it, just like we do for OPEN. However in order to do so, we need to that no other LOCK requests can grab the mutex until we have unhashed it (and marked it as closed). Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| * nfsd4: move find_lock_stateidTrond Myklebust2017-11-27
| | | | | | | | | | | | | | Trivial cleanup to simplify following patch. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| * nfsd: Ensure we don't recognise lock stateids after freeing themTrond Myklebust2017-11-27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | In order to deal with lookup races, nfsd4_free_lock_stateid() needs to be able to signal to other stateful functions that the lock stateid is no longer valid. Right now, nfsd_lock() will check whether or not an existing stateid is still hashed, but only in the "new lock" path. To ensure the stateid invalidation is also recognised by the "existing lock" path, and also by a second call to nfsd4_free_lock_stateid() itself, we can change the type to NFS4_CLOSED_STID under the stp->st_mutex. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| * nfsd: CLOSE SHOULD return the invalid special stateid for NFSv4.x (x>0)Trond Myklebust2017-11-27
| | | | | | | | | | Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| * nfsd: Fix another OPEN stateid raceTrond Myklebust2017-11-27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If nfsd4_process_open2() is initialising a new stateid, and yet the call to nfs4_get_vfs_file() fails for some reason, then we must declare the stateid closed, and unhash it before dropping the mutex. Right now, we unhash the stateid after dropping the mutex, and without changing the stateid type, meaning that another OPEN could theoretically look it up and attempt to use it. Reported-by: Andrew W Elble <aweits@rit.edu> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Cc: stable@vger.kernel.org Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| * nfsd: Fix stateid races between OPEN and CLOSETrond Myklebust2017-11-27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Open file stateids can linger on the nfs4_file list of stateids even after they have been closed. In order to avoid reusing such a stateid, and confusing the client, we need to recheck the nfs4_stid's type after taking the mutex. Otherwise, we risk reusing an old stateid that was already closed, which will confuse clients that expect new stateids to conform to RFC7530 Sections 9.1.4.2 and 16.2.5 or RFC5661 Sections 8.2.2 and 18.2.4. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Cc: stable@vger.kernel.org Signed-off-by: J. Bruce Fields <bfields@redhat.com>
* | Merge tag 'for-4.15-rc2-tag' of ↵Linus Torvalds2017-11-29
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fixes from David Sterba: "We've collected some fixes in since the pre-merge window freeze. There's technically only one regression fix for 4.15, but the rest seems important and candidates for stable. - fix missing flush bio puts in error cases (is serious, but rarely happens) - fix reporting stat::st_blocks for buffered append writes - fix space cache invalidation - fix out of bound memory access when setting zlib level - fix potential memory corruption when fsync fails in the middle - fix crash in integrity checker - incremetnal send fix, path mixup for certain unlink/rename combination - pass flags to writeback so compressed writes can be throttled properly - error handling fixes" * tag 'for-4.15-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: Btrfs: incremental send, fix wrong unlink path after renaming file btrfs: tree-checker: Fix false panic for sanity test Btrfs: fix list_add corruption and soft lockups in fsync btrfs: Fix wild memory access in compression level parser btrfs: fix deadlock when writing out space cache btrfs: clear space cache inode generation always Btrfs: fix reported number of inode blocks after buffered append writes Btrfs: move definition of the function btrfs_find_new_delalloc_bytes Btrfs: bail out gracefully rather than BUG_ON btrfs: dev_alloc_list is not protected by RCU, use normal list_del btrfs: add missing device::flush_bio puts btrfs: Fix transaction abort during failure in btrfs_rm_dev_item Btrfs: add write_flags for compression bio
| * | Btrfs: incremental send, fix wrong unlink path after renaming fileFilipe Manana2017-11-28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Under some circumstances, an incremental send operation can issue wrong paths for unlink commands related to files that have multiple hard links and some (or all) of those links were renamed between the parent and send snapshots. Consider the following example: Parent snapshot . (ino 256) |---- a/ (ino 257) | |---- b/ (ino 259) | | |---- c/ (ino 260) | | |---- f2 (ino 261) | | | |---- f2l1 (ino 261) | |---- d/ (ino 262) |---- f1l1_2 (ino 258) |---- f2l2 (ino 261) |---- f1_2 (ino 258) Send snapshot . (ino 256) |---- a/ (ino 257) | |---- f2l1/ (ino 263) | |---- b2/ (ino 259) | |---- c/ (ino 260) | | |---- d3 (ino 262) | | |---- f1l1_2 (ino 258) | | |---- f2l2_2 (ino 261) | | |---- f1_2 (ino 258) | | | |---- f2 (ino 261) | |---- f1l2 (ino 258) | |---- d (ino 261) When computing the incremental send stream the following steps happen: 1) When processing inode 261, a rename operation is issued that renames inode 262, which currently as a path of "d", to an orphan name of "o262-7-0". This is done because in the send snapshot, inode 261 has of its hard links with a path of "d" as well. 2) Two link operations are issued that create the new hard links for inode 261, whose names are "d" and "f2l2_2", at paths "/" and "o262-7-0/" respectively. 3) Still while processing inode 261, unlink operations are issued to remove the old hard links of inode 261, with names "f2l1" and "f2l2", at paths "a/" and "d/". However path "d/" does not correspond anymore to the directory inode 262 but corresponds instead to a hard link of inode 261 (link command issued in the previous step). This makes the receiver fail with a ENOTDIR error when attempting the unlink operation. The problem happens because before sending the unlink operation, we failed to detect that inode 262 was one of ancestors for inode 261 in the parent snapshot, and therefore we didn't recompute the path for inode 262 before issuing the unlink operation for the link named "f2l2" of inode 262. The detection failed because the function "is_ancestor()" only follows the first hard link it finds for an inode instead of all of its hard links (as it was originally created for being used with directories only, for which only one hard link exists). So fix this by making "is_ancestor()" follow all hard links of the input inode. A test case for fstests follows soon. Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
| * | btrfs: tree-checker: Fix false panic for sanity testQu Wenruo2017-11-28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [BUG] If we run btrfs with CONFIG_BTRFS_FS_RUN_SANITY_TESTS=y, it will instantly cause kernel panic like: ------ ... assertion failed: 0, file: fs/btrfs/disk-io.c, line: 3853 ... Call Trace: btrfs_mark_buffer_dirty+0x187/0x1f0 [btrfs] setup_items_for_insert+0x385/0x650 [btrfs] __btrfs_drop_extents+0x129a/0x1870 [btrfs] ... ----- [Cause] Btrfs will call btrfs_check_leaf() in btrfs_mark_buffer_dirty() to check if the leaf is valid with CONFIG_BTRFS_FS_RUN_SANITY_TESTS=y. However quite some btrfs_mark_buffer_dirty() callers(*) don't really initialize its item data but only initialize its item pointers, leaving item data uninitialized. This makes tree-checker catch uninitialized data as error, causing such panic. *: These callers include but not limited to setup_items_for_insert() btrfs_split_item() btrfs_expand_item() [Fix] Add a new parameter @check_item_data to btrfs_check_leaf(). With @check_item_data set to false, item data check will be skipped and fallback to old btrfs_check_leaf() behavior. So we can still get early warning if we screw up item pointers, and avoid false panic. Cc: Filipe Manana <fdmanana@gmail.com> Reported-by: Lakshmipathi.G <lakshmipathi.g@gmail.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: Liu Bo <bo.li.liu@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
| * | Btrfs: fix list_add corruption and soft lockups in fsyncLiu Bo2017-11-27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Xfstests btrfs/146 revealed this corruption, [ 58.138831] Buffer I/O error on dev dm-0, logical block 2621424, async page read [ 58.151233] BTRFS error (device sdf): bdev /dev/mapper/error-test errs: wr 1, rd 0, flush 0, corrupt 0, gen 0 [ 58.152403] list_add corruption. prev->next should be next (ffff88005e6775d8), but was ffffc9000189be88. (prev=ffffc9000189be88). [ 58.153518] ------------[ cut here ]------------ [ 58.153892] WARNING: CPU: 1 PID: 1287 at lib/list_debug.c:31 __list_add_valid+0x169/0x1f0 ... [ 58.157379] RIP: 0010:__list_add_valid+0x169/0x1f0 ... [ 58.161956] Call Trace: [ 58.162264] btrfs_log_inode_parent+0x5bd/0xfb0 [btrfs] [ 58.163583] btrfs_log_dentry_safe+0x60/0x80 [btrfs] [ 58.164003] btrfs_sync_file+0x4c2/0x6f0 [btrfs] [ 58.164393] vfs_fsync_range+0x5f/0xd0 [ 58.164898] do_fsync+0x5a/0x90 [ 58.165170] SyS_fsync+0x10/0x20 [ 58.165395] entry_SYSCALL_64_fastpath+0x1f/0xbe ... It turns out that we could record btrfs_log_ctx:io_err in log_one_extents when IO fails, but make log_one_extents() return '0' instead of -EIO, so the IO error is not acknowledged by the callers, i.e. btrfs_log_inode_parent(), which would remove btrfs_log_ctx:list from list head 'root->log_ctxs'. Since btrfs_log_ctx is allocated from stack memory, it'd get freed with a object alive on the list. then a future list_add will throw the above warning. This returns the correct error in the above case. Jeff also reported this while testing against his fsync error patch set[1]. [1]: https://www.spinics.net/lists/linux-btrfs/msg65308.html "btrfs list corruption and soft lockups while testing writeback error handling" Fixes: 8407f553268a4611f254 ("Btrfs: fix data corruption after fast fsync and writeback error") Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
| * | btrfs: Fix wild memory access in compression level parserQu Wenruo2017-11-27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [BUG] Kernel panic when mounting with "-o compress" mount option. KASAN will report like: ------ ================================================================== BUG: KASAN: wild-memory-access in strncmp+0x31/0xc0 Read of size 1 at addr d86735fce994f800 by task mount/662 ... Call Trace: dump_stack+0xe3/0x175 kasan_report+0x163/0x370 __asan_load1+0x47/0x50 strncmp+0x31/0xc0 btrfs_compress_str2level+0x20/0x70 [btrfs] btrfs_parse_options+0xff4/0x1870 [btrfs] open_ctree+0x2679/0x49f0 [btrfs] btrfs_mount+0x1b7f/0x1d30 [btrfs] mount_fs+0x49/0x190 vfs_kern_mount.part.29+0xba/0x280 vfs_kern_mount+0x13/0x20 btrfs_mount+0x31e/0x1d30 [btrfs] mount_fs+0x49/0x190 vfs_kern_mount.part.29+0xba/0x280 do_mount+0xaad/0x1a00 SyS_mount+0x98/0xe0 entry_SYSCALL_64_fastpath+0x1f/0xbe ------ [Cause] For 'compress' and 'compress_force' options, its token doesn't expect any parameter so its args[0] contains uninitialized data. Accessing args[0] will cause above wild memory access. [Fix] For Opt_compress and Opt_compress_force, set compression level to the default. Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> [ set the default in advance ] Signed-off-by: David Sterba <dsterba@suse.com>
| * | btrfs: fix deadlock when writing out space cacheJosef Bacik2017-11-27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If we fail to prepare our pages for whatever reason (out of memory in our case) we need to make sure to drop the block_group->data_rwsem, otherwise hilarity ensues. Signed-off-by: Josef Bacik <jbacik@fb.com> Reviewed-by: Omar Sandoval <osandov@fb.com> Reviewed-by: Liu Bo <bo.li.liu@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> [ add label and use existing unlocking code ] Signed-off-by: David Sterba <dsterba@suse.com>
| * | btrfs: clear space cache inode generation alwaysJosef Bacik2017-11-20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We discovered a box that had double allocations, and suspected the space cache may be to blame. While auditing the write out path I noticed that if we've already setup the space cache we will just carry on. This means that any error we hit after cache_save_setup before we go to actually write the cache out we won't reset the inode generation, so whatever was already written will be considered correct, except it'll be stale. Fix this by _always_ resetting the generation on the block group inode, this way we only ever have valid or invalid cache. With this patch I was no longer able to reproduce cache corruption with dm-log-writes and my bpf error injection tool. Cc: stable@vger.kernel.org Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: David Sterba <dsterba@suse.com>
| * | Btrfs: fix reported number of inode blocks after buffered append writesFilipe Manana2017-11-15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The patch from commit a7e3b975a0f9 ("Btrfs: fix reported number of inode blocks") introduced a regression where if we do a buffered write starting at position equal to or greater than the file's size and then stat(2) the file before writeback is triggered, the number of used blocks does not change (unless there's a prealloc/unwritten extent). Example: $ xfs_io -f -c "pwrite -S 0xab 0 64K" foobar $ du -h foobar 0 foobar $ sync $ du -h foobar 64K foobar The first version of that patch didn't had this regression and the second version, which was the one committed, was made only to address some performance regression detected by the intel test robots using fs_mark. This fixes the regression by setting the new delaloc bit in the range, and doing it at btrfs_dirty_pages() while setting the regular dealloc bit as well, so that this way we set both bits at once avoiding navigation of the inode's io tree twice. Doing it at btrfs_dirty_pages() is also the most meaninful place, as we should set the new dellaloc bit when if we set the delalloc bit, which happens only if we copied bytes into the pages at __btrfs_buffered_write(). This was making some of LTP's du tests fail, which can be quickly run using a command line like the following: $ ./runltp -q -p -l /ltp.log -f commands -s du -d /mnt Fixes: a7e3b975a0f9 ("Btrfs: fix reported number of inode blocks") Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
| * | Btrfs: move definition of the function btrfs_find_new_delalloc_bytesFilipe Manana2017-11-15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Move the definition of the function btrfs_find_new_delalloc_bytes() closer to the function btrfs_dirty_pages(), because in a future commit it will be used exclusively by btrfs_dirty_pages(). This just moves the function's definition, with no functional changes at all. Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
| * | Btrfs: bail out gracefully rather than BUG_ONLiu Bo2017-11-15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If a file's DIR_ITEM key is invalid (due to memory errors) and gets written to disk, a future lookup_path can end up with kernel panic due to BUG_ON(). This gets rid of the BUG_ON(), meanwhile output the corrupted key and return ENOENT if it's invalid. Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Reported-by: Guillaume Bouchard <bouchard@mercs-eng.com> Signed-off-by: David Sterba <dsterba@suse.com>
| * | btrfs: dev_alloc_list is not protected by RCU, use normal list_delDavid Sterba2017-11-15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The dev_alloc_list list could be protected by various mutexes, depending on the context. The list tracks devices that can take part of allocating new chunks, so the closest mutex is chunk_mutex. Adding a new device from inside the ADD_DEV ioctl will need device_list_mutex and registering a new device from the ioctl needs uuid_mutex. All mutexes naturally guarantee exclusivity against the same context. The device ownership can move between the contexts and the exclusivity is guaranteed by other means, eg. during the mount with the uuid_mutex. There's no RCU involved for dev_alloc_list. Signed-off-by: David Sterba <dsterba@suse.com>
| * | btrfs: add missing device::flush_bio putsDavid Sterba2017-11-15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This fixes potential bio leaks, in several error paths. Unfortunatelly the device structure freeing is opencoded in many places and I missed them when introducing the flush_bio. Most of the time, devices get freed through call_rcu(..., free_device), so it at least it's not that easy to hit the leak, but it's still possible through the path that frees stale devices. Fixes: e0ae99941423 ("btrfs: preallocate device flush bio") Reviewed-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
| * | btrfs: Fix transaction abort during failure in btrfs_rm_dev_itemNikolay Borisov2017-11-15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | btrfs_rm_dev_item calls several function under an active transaction, however it fails to abort it if an error happens. Fix this by adding explicit btrfs_abort_transaction/btrfs_end_transaction calls. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
| * | Btrfs: add write_flags for compression bioLiu Bo2017-11-15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Compression code path has only flaged bios with REQ_OP_WRITE no matter where the bios come from, but it could be a sync write if fsync starts this writeback or a normal writeback write if wb kthread starts a periodic writeback. It breaks the rule that sync writes and writeback writes need to be differentiated from each other, because from the POV of block layer, all bios need to be recognized by these flags in order to do some management, e.g. throttlling. This passes writeback_control to compression write path so that it can send bios with proper flags to block layer. Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
* | | Merge tag 'microblaze-4.15-rc2' of git://git.monstr.eu/linux-2.6-microblazeLinus Torvalds2017-11-29
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull Microblaze fix from Michal Simek: "Add missing header to mmu_context_mm.h" * tag 'microblaze-4.15-rc2' of git://git.monstr.eu/linux-2.6-microblaze: microblaze: add missing include to mmu_context_mm.h
| * | | microblaze: add missing include to mmu_context_mm.hOded Gabbay2017-10-30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | mmu_context_mm.h is using struct task_struct, which is defined in linux/sched.h. Source files that include mm_context_mm.h (directly or indirectly) and doesn't include linux/sched.h will generate an error. An example of that is drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c This patch adds an include of linux/sched.h to mmu_context_mm.h to avoid such errors. Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com> Signed-off-by: Michal Simek <michal.simek@xilinx.com>
* | | | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparcLinus Torvalds2017-11-29
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull sparc fix from David Miller: "Sparc T4 and later cpu bootup regression fix" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc: sparc64: Fix boot on T4 and later.
| * | | | sparc64: Fix boot on T4 and later.David S. Miller2017-11-29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If we don't put the NG4fls.o object into the same part of the link as the generic sparc64 objects for fls() and __fls() then the relocation in the branch we use for patching will not fit. Move NG4fls.o into lib-y to fix this problem. Fixes: 46ad8d2d22c1 ("sparc64: Use sparc optimized fls and __fls for T4 and above") Signed-off-by: David S. Miller <davem@davemloft.net> Reported-by: Anatoly Pugachev <matorola@gmail.com> Tested-by: Anatoly Pugachev <matorola@gmail.com>