| Commit message (Collapse) | Author | Age |
|
|
|
| |
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently instruction_pointer() returns pt_regs->ret and so return value
is of type "long", which implicitly stands for "signed long".
While that's perfectly fine when dealing with 32-bit values if return
value of instruction_pointer() gets assigned to 64-bit variable sign
extension may happen.
And at least in one real use-case it happens already.
In perf_prepare_sample() return value of perf_instruction_pointer()
(which is an alias to instruction_pointer() in case of ARC) is assigned
to (struct perf_sample_data)->ip (which type is "u64").
And what we see if instuction pointer points to user-space application
that in case of ARC lays below 0x8000_0000 "ip" gets set properly with
leading 32 zeros. But if instruction pointer points to kernel address
space that starts from 0x8000_0000 then "ip" is set with 32 leadig
"f"-s. I.e. id instruction_pointer() returns 0x8100_0000, "ip" will be
assigned with 0xffff_ffff__8100_0000. Which is obviously wrong.
In particular that issuse broke output of perf, because perf was unable
to associate addresses like 0xffff_ffff__8100_0000 with anything from
/proc/kallsyms.
That's what we used to see:
----------->8----------
6.27% ls [unknown] [k] 0xffffffff8046c5cc
2.96% ls libuClibc-0.9.34-git.so [.] memcpy
2.25% ls libuClibc-0.9.34-git.so [.] memset
1.66% ls [unknown] [k] 0xffffffff80666536
1.54% ls libuClibc-0.9.34-git.so [.] 0x000224d6
1.18% ls libuClibc-0.9.34-git.so [.] 0x00022472
----------->8----------
With that change perf output looks much better now:
----------->8----------
8.21% ls [kernel.kallsyms] [k] memset
3.52% ls libuClibc-0.9.34-git.so [.] memcpy
2.11% ls libuClibc-0.9.34-git.so [.] malloc
1.88% ls libuClibc-0.9.34-git.so [.] memset
1.64% ls [kernel.kallsyms] [k] _raw_spin_unlock_irqrestore
1.41% ls [kernel.kallsyms] [k] __d_lookup_rcu
----------->8----------
Signed-off-by: Alexey Brodkin <abrodkin@synopsys.com>
Cc: arc-linux-dev@synopsys.com
Cc: stable@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
|
|
|
|
| |
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
|
|
|
|
| |
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The IRQCHIP_DECLARE macro migrated to 'include/linux/irqchip.h'.
See commit 91e20b5040c67c51aad88cf87db4305c5bd7f79d
("irqchip: Move IRQCHIP_DECLARE macro to include/linux/irqchip.h").
This patch removes the inclusions of private header 'drivers/irqchip/irqchip.h'
and if necessary replaces them with inclusions of 'include/linux/irqchip.h'.
Signed-off-by: Joel Porquet <joel@porquet.org>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
ARCompact/ARCv2 ISA provide that any instructions which deals with
bitpos/count operand ASL, LSL, BSET, BCLR, BMSK .... will only consider
lower 5 bits. i.e. auto-clamp the pos to 0-31.
ARC Linux bitops exploited this fact by NOT explicitly masking out upper
bits for @nr operand in general, saving a bunch of AND/BMSK instructions
in generated code around bitops.
While this micro-optimization has worked well over years it is NOT safe
as shifting a number with a value, greater than native size is
"undefined" per "C" spec.
So as it turns outm EZChip ran into this eventually, in their massive
muti-core SMP build with 64 cpus. There was a test_bit() inside a loop
from 63 to 0 and gcc was weirdly optimizing away the first iteration
(so it was really adhering to standard by implementing undefined behaviour
vs. removing all the iterations which were phony i.e. (1 << [63..32])
| for i = 63 to 0
| X = ( 1 << i )
| if X == 0
| continue
So fix the code to do the explicit masking at the expense of generating
additional instructions. Fortunately, this can be mitigated to a large
extent as gcc has SHIFT_COUNT_TRUNCATED which allows combiner to fold
masking into shift operation itself. It is currently not enabled in ARC
gcc backend, but could be done after a bit of testing.
Fixes STAR 9000866918 ("unsafe "undefined behavior" code in kernel")
Reported-by: Noam Camus <noamc@ezchip.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
|
|
|
|
|
|
|
|
| |
With up-to-date FPGA builds ARC cores are supposed to correctly operate
even with 90 MHz clock (which is a target frequency for AXS103 release).
Signed-off-by: Alexey Brodkin <abrodkin@synopsys.com>
Cc: arc-linux-dev@synopsys.com
|
|
|
|
| |
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
|
|
|
|
|
|
|
| |
With this nsim standlone / OSCI have working irq affinity - AXS103 still
needs some work as IDU is not visible in intc hierarchy yet !
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
|
|
|
|
|
| |
Fixes: 9bf39ab2adaf ("vfs: add file_path() helper")
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
|
|
|
|
|
|
|
|
| |
alloc_pages_exact() get gfp flags and handle zero'ing already
And while it, fix the case where ioremap fails: return rightaway.
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
ARC kernels have historically been built with -O3, despite top level
Makefile defaulting to -O2. This was facilitated by implicitly ordering
of arch makefile include AFTER top level assigned -O2.
An upstream fix to top level a1c48bb160f ("Makefile: Fix unrecognized
cross-compiler command line options") changed the ordering, making ARC
-O3 defunct.
Fix that by NOT relying on any ordering whatsoever and use the proper
arch override facility now present in kbuild (ARCH_*FLAGS)
Depends-on: ("kbuild: Allow arch Makefiles to override {cpp,ld,c}flags")
Suggested-by: Michal Marek <mmarek@suse.cz>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: stable@vger.kernel.org # 3.16+
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Since commit a1c48bb1 (Makefile: Fix unrecognized cross-compiler command
line options), the arch Makefile is included earlier by the main
Makefile, preventing the arc architecture to set its -O3 compiler
option. Since there might be more use cases for an arch Makefile to
fine-tune the options, add support for ARCH_CPPFLAGS, ARCH_AFLAGS and
ARCH_CFLAGS variables that are appended to the respective kbuild
variables. The user still has the final say via the KCPPFLAGS, KAFLAGS
and KCFLAGS variables.
Reported-by: Vineet Gupta <Vineet.Gupta1@synopsys.com>
Cc: stable@vger.kernel.org # 3.16+
Signed-off-by: Michal Marek <mmarek@suse.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
SLC maintenance ops need to be serialized by software as there is no
inherent buffering / quequing of aux commands. It can silently ignore a
new aux operation if previous one is still ongoing (SLC_CTRL_BUSY)
So gaurd the SLC op using a spin lock
The spin lock doesn't seem to be contended even in heavy workloads such
as iperf. On FPGA @ 75 MHz.
[1] Before this change:
============================================================
# iperf -c 10.42.0.1
------------------------------------------------------------
Client connecting to 10.42.0.1, TCP port 5001
TCP window size: 43.8 KByte (default)
------------------------------------------------------------
[ 3] local 10.42.0.110 port 38935 connected with 10.42.0.1 port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0-10.0 sec 48.4 MBytes 40.6 Mbits/sec
============================================================
[2] After this change:
============================================================
# iperf -c 10.42.0.1
------------------------------------------------------------
Client connecting to 10.42.0.1, TCP port 5001
TCP window size: 43.8 KByte (default)
------------------------------------------------------------
[ 3] local 10.42.0.243 port 60248 connected with 10.42.0.1 port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0-10.0 sec 47.5 MBytes 39.8 Mbits/sec
# iperf -c 10.42.0.1
------------------------------------------------------------
Client connecting to 10.42.0.1, TCP port 5001
TCP window size: 43.8 KByte (default)
------------------------------------------------------------
[ 3] local 10.42.0.243 port 60249 connected with 10.42.0.1 port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0-10.0 sec 54.9 MBytes 46.0 Mbits/sec
============================================================
Signed-off-by: Alexey Brodkin <abrodkin@synopsys.com>
Cc: arc-linux-dev@synopsys.com
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
|
|
|
|
| |
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
|
| |
|
|\
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
git://git.infradead.org/users/dvhart/linux-platform-drivers-x86
Pull late x86 platform driver updates from Darren Hart:
"The following came in a bit later and I wanted them to bake in next a
few more days before submitting, thus the second pull.
A new intel_pmc_ipc driver, a symmetrical allocation and free fix in
dell-laptop, a couple minor fixes, and some updated documentation in
the dell-laptop comments.
intel_pmc_ipc:
- Add Intel Apollo Lake PMC IPC driver
tc1100-wmi:
- Delete an unnecessary check before the function call "kfree"
dell-laptop:
- Fix allocating & freeing SMI buffer page
- Show info about WiGig and UWB in debugfs
- Update information about wireless control"
* tag 'platform-drivers-x86-v4.2-2' of git://git.infradead.org/users/dvhart/linux-platform-drivers-x86:
intel_pmc_ipc: Add Intel Apollo Lake PMC IPC driver
tc1100-wmi: Delete an unnecessary check before the function call "kfree"
dell-laptop: Fix allocating & freeing SMI buffer page
dell-laptop: Show info about WiGig and UWB in debugfs
dell-laptop: Update information about wireless control
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
This driver provides support for PMC control on Apollo Lake platforms.
The PMC is an ARC processor which defines some IPC commands for
communication with other entities in the CPU.
Signed-off-by: qipeng.zha <qipeng.zha@intel.com>
[fengguang.wu@intel.com: Fix Sparse and Cocinelle warnings]
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Darren Hart <dvhart@linux.intel.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
The kfree() function tests whether its argument is NULL and then
returns immediately. Thus the test around the call is not needed.
This issue was detected by using the Coccinelle software.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Darren Hart <dvhart@linux.intel.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
This commit fix kernel crash when probing for rfkill devices in dell-laptop
driver failed. Function free_page() was incorrectly used on struct page *
instead of virtual address of SMI buffer.
This commit also simplify allocating page for SMI buffer by using
__get_free_page() function instead of sequential call of functions
alloc_page() and page_address().
Signed-off-by: Pali Rohár <pali.rohar@gmail.com>
Acked-by: Michal Hocko <mhocko@suse.cz>
Cc: stable@vger.kernel.org
Signed-off-by: Darren Hart <dvhart@linux.intel.com>
|
| |
| |
| |
| |
| |
| |
| |
| | |
This commit show additional information about rfkill state in debugfs based
on newly released documentation by Dell.
Signed-off-by: Pali Rohár <pali.rohar@gmail.com>
Signed-off-by: Darren Hart <dvhart@linux.intel.com>
|
| |
| |
| |
| |
| |
| |
| |
| | |
Make sure that all existing SMBIOS calls for wireless control are properly
documented. This commit also add new documentation released by Dell.
Signed-off-by: Pali Rohár <pali.rohar@gmail.com>
Signed-off-by: Darren Hart <dvhart@linux.intel.com>
|
|\ \
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull more vfs updates from Al Viro:
"Assorted VFS fixes and related cleanups (IMO the most interesting in
that part are f_path-related things and Eric's descriptor-related
stuff). UFS regression fixes (it got broken last cycle). 9P fixes.
fs-cache series, DAX patches, Jan's file_remove_suid() work"
[ I'd say this is much more than "fixes and related cleanups". The
file_table locking rule change by Eric Dumazet is a rather big and
fundamental update even if the patch isn't huge. - Linus ]
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (49 commits)
9p: cope with bogus responses from server in p9_client_{read,write}
p9_client_write(): avoid double p9_free_req()
9p: forgetting to cancel request on interrupted zero-copy RPC
dax: bdev_direct_access() may sleep
block: Add support for DAX reads/writes to block devices
dax: Use copy_from_iter_nocache
dax: Add block size note to documentation
fs/file.c: __fget() and dup2() atomicity rules
fs/file.c: don't acquire files->file_lock in fd_install()
fs:super:get_anon_bdev: fix race condition could cause dev exceed its upper limitation
vfs: avoid creation of inode number 0 in get_next_ino
namei: make set_root_rcu() return void
make simple_positive() public
ufs: use dir_pages instead of ufs_dir_pages()
pagemap.h: move dir_pages() over there
remove the pointless include of lglock.h
fs: cleanup slight list_entry abuse
xfs: Correctly lock inode when removing suid and file capabilities
fs: Call security_ops->inode_killpriv on truncate
fs: Provide function telling whether file_remove_privs() will do anything
...
|
| | |
| | |
| | |
| | |
| | |
| | | |
if server claims to have written/read more than we'd told it to,
warn and cap the claimed byte count to avoid advancing more than
we are ready to.
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Braino in "9p: switch p9_client_write() to passing it struct iov_iter *";
if response is impossible to parse and we discard the request, get the
out of the loop right there.
Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
If we'd already sent a request and decide to abort it, we *must*
issue TFLUSH properly and not just blindly reuse the tag, or
we'll get seriously screwed when response eventually arrives
and we confuse it for response to later request that had reused
the same tag.
Cc: stable@vger.kernel.org # v3.2 and later
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
The brd driver is the only in-tree driver that may sleep currently.
After some discussion on linux-fsdevel, we decided that any driver
may choose to sleep in its ->direct_access method. To ensure that all
callers of bdev_direct_access() are prepared for this, add a call
to might_sleep().
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
If a block device supports the ->direct_access methods, bypass the normal
DIO path and use DAX to go straight to memcpy() instead of allocating
a DIO and a BIO.
Includes support for the DIO_SKIP_DIO_COUNT flag in DAX, as is done in
do_blockdev_direct_IO().
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
When userspace does a write, there's no need for the written data to
pollute the CPU cache. This matches the original XIP code.
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
For block devices which are small enough, mkfs will default to creating
a filesystem with block sizes smaller than page size.
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
__fget() does lockless fetch of pointer from the descriptor
table, attempts to grab a reference and treats "it was already
zero" as "it's already gone from the table, we just hadn't
seen the store, let's fail". Unfortunately, that breaks the
atomicity of dup2() - __fget() might see the old pointer,
notice that it's been already dropped and treat that as
"it's closed". What we should be getting is either the
old file or new one, depending whether we come before or after
dup2().
Dmitry had following test failing sometimes :
int fd;
void *Thread(void *x) {
char buf;
int n = read(fd, &buf, 1);
if (n != 1)
exit(printf("read failed: n=%d errno=%d\n", n, errno));
return 0;
}
int main()
{
fd = open("/dev/urandom", O_RDONLY);
int fd2 = open("/dev/urandom", O_RDONLY);
if (fd == -1 || fd2 == -1)
exit(printf("open failed\n"));
pthread_t th;
pthread_create(&th, 0, Thread, 0);
if (dup2(fd2, fd) == -1)
exit(printf("dup2 failed\n"));
pthread_join(th, 0);
if (close(fd) == -1)
exit(printf("close failed\n"));
if (close(fd2) == -1)
exit(printf("close failed\n"));
printf("DONE\n");
return 0;
}
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Mateusz Guzik reported :
Currently obtaining a new file descriptor results in locking fdtable
twice - once in order to reserve a slot and second time to fill it.
Holding the spinlock in __fd_install() is needed in case a resize is
done, or to prevent a resize.
Mateusz provided an RFC patch and a micro benchmark :
http://people.redhat.com/~mguzik/pipebench.c
A resize is an unlikely operation in a process lifetime,
as table size is at least doubled at every resize.
We can use RCU instead of the spinlock.
__fd_install() must wait if a resize is in progress.
The resize must block new __fd_install() callers from starting,
and wait that ongoing install are finished (synchronize_sched())
resize should be attempted by a single thread to not waste resources.
rcu_sched variant is used, as __fd_install() and expand_fdtable() run
from process context.
It gives us a ~30% speedup using pipebench on a dual Intel(R) Xeon(R)
CPU E5-2696 v2 @ 2.50GHz
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Mateusz Guzik <mguzik@redhat.com>
Acked-by: Mateusz Guzik <mguzik@redhat.com>
Tested-by: Mateusz Guzik <mguzik@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
limitation
Execution of get_anon_bdev concurrently and preemptive kernel all
could bring race condition, it isn't enough to check dev against
its upper limitation with equality operator only.
This patch fix it.
Signed-off-by: Wang YanQing <udknight@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
currently, get_next_ino() is able to create inodes with inode number = 0.
This have a bad impact in the filesystems relying in this function to generate
inode numbers.
While there is no problem at all in having inodes with number 0, userspace tools
which handle file management tasks can have problems handling these files, like
for example, the impossiblity of users to delete these files, since glibc will
ignore them. So, I believe the best way is kernel to avoid creating them.
This problem has been raised previously, but the old thread didn't have any
other update for a year+, and I've seen too many users hitting the same issue
regarding the impossibility to delete files while using filesystems relying on
this function. So, I'm starting the thread again, with the same patch
that I believe is enough to address this problem.
Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
The only caller that cares about its return value can just
as easily pick it from nd->root_seq itself. We used to just
calculate it and return to caller, but these days we are
storing it in nd->root_seq in all cases.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
| | |
| | |
| | |
| | | |
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
dir_pages was declared in a lot of filesystems.
Use newly dir_pages() from pagemap.h
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
That function was declared in a lot of filesystems to calculate
directory pages.
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
| | |
| | |
| | |
| | | |
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
list_entry is just a wrapper for container_of, but it is arguably
wrong (and slightly confusing) to use it when the pointed-to struct
member is not a struct list_head. Use container_of directly instead.
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
| |\ \ |
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Now that the retrieval operation may be disposed of by fscache_put_operation()
before we actually set the context, the retrieval-specific cleanup operation
can produce a NULL-pointer dereference when it tries to unconditionally clean
up the netfs context.
Given that it is expected that we'll get at least as far as the place where we
currently set the context pointer and it is unlikely we'll go through the
error handling paths prior to that point, retain the context right from the
point that the retrieval op is allocated.
Concomitant to this, we need to retain the cookie pointer in the retrieval op
also so that we can call the netfs to release its context in the release
method.
In addition, we might now get into fscache_release_retrieval_op() with the op
only initialised. To this end, set the operation to DEAD only after the
release method has been called and skip the n_pages test upon cleanup if the
op is still in the INITIALISED state.
Without these changes, the following oops might be seen:
BUG: unable to handle kernel NULL pointer dereference at 00000000000000b8
...
RIP: 0010:[<ffffffffa0089c98>] fscache_release_retrieval_op+0xae/0x100
...
Call Trace:
[<ffffffffa0088560>] fscache_put_operation+0x117/0x2e0
[<ffffffffa008b8f5>] __fscache_read_or_alloc_pages+0x351/0x3ac
[<ffffffffa00b761f>] __nfs_readpages_from_fscache+0x59/0xbf [nfs]
[<ffffffffa00b06c5>] nfs_readpages+0x10c/0x185 [nfs]
[<ffffffff81124925>] ? alloc_pages_current+0x119/0x13e
[<ffffffff810ee5fd>] ? __page_cache_alloc+0xfb/0x10a
[<ffffffff810f87f8>] __do_page_cache_readahead+0x188/0x22c
[<ffffffff810f8b3a>] ondemand_readahead+0x29e/0x2af
[<ffffffff810f8c92>] page_cache_sync_readahead+0x38/0x3a
[<ffffffff810ef337>] generic_file_read_iter+0x1a2/0x55a
[<ffffffffa00a9dff>] ? nfs_revalidate_mapping+0xd6/0x288 [nfs]
[<ffffffffa00a6a23>] nfs_file_read+0x49/0x70 [nfs]
[<ffffffff811363be>] new_sync_read+0x78/0x9c
[<ffffffff81137164>] __vfs_read+0x13/0x38
[<ffffffff8113721e>] vfs_read+0x95/0x121
[<ffffffff811372f6>] SyS_read+0x4c/0x8a
[<ffffffff81557a52>] system_call_fastpath+0x12/0x17
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Steve Dickson <steved@redhat.com>
Acked-by: Jeff Layton <jeff.layton@primarydata.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Any time an incomplete operation is cancelled, the operation cancellation
function needs to be called to clean up. This is currently being passed
directly to some of the functions that might want to call it, but not all.
Instead, pass the cancellation method pointer to the fscache_operation_init()
and have that cache it in the operation struct. Further, plug in a dummy
cancellation handler if the caller declines to set one as this allows us to
call the function unconditionally (the extra overhead isn't worth bothering
about as we don't expect to be calling this typically).
The cancellation method must thence be called everywhere the CANCELLED state
is set. Note that we call it *before* setting the CANCELLED state such that
the method can use the old state value to guide its operation.
fscache_do_cancel_retrieval() needs moving higher up in the sources so that
the init function can use it now.
Without this, the following oops may be seen:
FS-Cache: Assertion failed
FS-Cache: 3 == 0 is false
------------[ cut here ]------------
kernel BUG at ../fs/fscache/page.c:261!
...
RIP: 0010:[<ffffffffa0089c1b>] fscache_release_retrieval_op+0x77/0x100
[<ffffffffa008853d>] fscache_put_operation+0x114/0x2da
[<ffffffffa008b8c2>] __fscache_read_or_alloc_pages+0x358/0x3b3
[<ffffffffa00b761f>] __nfs_readpages_from_fscache+0x59/0xbf [nfs]
[<ffffffffa00b06c5>] nfs_readpages+0x10c/0x185 [nfs]
[<ffffffff81124925>] ? alloc_pages_current+0x119/0x13e
[<ffffffff810ee5fd>] ? __page_cache_alloc+0xfb/0x10a
[<ffffffff810f87f8>] __do_page_cache_readahead+0x188/0x22c
[<ffffffff810f8b3a>] ondemand_readahead+0x29e/0x2af
[<ffffffff810f8c92>] page_cache_sync_readahead+0x38/0x3a
[<ffffffff810ef337>] generic_file_read_iter+0x1a2/0x55a
[<ffffffffa00a9dff>] ? nfs_revalidate_mapping+0xd6/0x288 [nfs]
[<ffffffffa00a6a23>] nfs_file_read+0x49/0x70 [nfs]
[<ffffffff811363be>] new_sync_read+0x78/0x9c
[<ffffffff81137164>] __vfs_read+0x13/0x38
[<ffffffff8113721e>] vfs_read+0x95/0x121
[<ffffffff811372f6>] SyS_read+0x4c/0x8a
[<ffffffff81557a52>] system_call_fastpath+0x12/0x17
The assertion is showing that the remaining number of pages (n_pages) is not 0
when the operation is being released.
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Steve Dickson <steved@redhat.com>
Acked-by: Jeff Layton <jeff.layton@primarydata.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Call fscache_put_operation() or a wrapper on any op that has gone through
fscache_operation_init() so that the accounting shown in /proc is done
correctly, specifically fscache_n_op_release.
fscache_put_operation() therefore now allows an op in the INITIALISED state as
well as in the CANCELLED and COMPLETE states.
Note that this means that an operation can get put that doesn't have its
->object pointer filled in, so anything that depends on the object needs to be
conditional in fscache_put_operation().
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Steve Dickson <steved@redhat.com>
Acked-by: Jeff Layton <jeff.layton@primarydata.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Cancellation of an in-progress operation needs to update the relevant counters
and start any operations that are pending waiting on this one.
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Steve Dickson <steved@redhat.com>
Acked-by: Jeff Layton <jeff.layton@primarydata.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Count and display through /proc/fs/fscache/stats the number of initialised
operations.
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Steve Dickson <steved@redhat.com>
Acked-by: Jeff Layton <jeff.layton@primarydata.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Out of line fscache_operation_init() so that it can access internal FS-Cache
features, such as stats, in a later commit.
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Steve Dickson <steved@redhat.com>
Acked-by: Jeff Layton <jeff.layton@primarydata.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Currently, fscache_cancel_op() only cancels pending operations - attempts to
cancel in-progress operations are ignored. This leads to a problem in
fscache_wait_for_operation_activation() whereby the wait is terminated, but
the object has been killed.
The check at the end of the function now triggers because it's no longer
contingent on the cache having produced an I/O error since the commit that
fixed the logic error in fscache_object_is_dead().
The result of the check is that it tries to cancel the operation - but since
the object may not be pending by this point, the cancellation request may be
ignored - with the result that the the object is just put by the caller and
fscache_put_operation has an assertion failure because the operation isn't in
either the COMPLETE or the CANCELLED states.
To fix this, we permit in-progress ops to be cancelled under some
circumstances.
The bug results in an oops that looks something like this:
FS-Cache: fscache_wait_for_operation_activation() = -ENOBUFS [obj dead 3]
FS-Cache:
FS-Cache: Assertion failed
FS-Cache: 3 == 5 is false
------------[ cut here ]------------
kernel BUG at ../fs/fscache/operation.c:432!
...
RIP: 0010:[<ffffffffa0088574>] fscache_put_operation+0xf2/0x2cd
Call Trace:
[<ffffffffa008b92a>] __fscache_read_or_alloc_pages+0x2ec/0x3b3
[<ffffffffa00b761f>] __nfs_readpages_from_fscache+0x59/0xbf [nfs]
[<ffffffffa00b06c5>] nfs_readpages+0x10c/0x185 [nfs]
[<ffffffff81124925>] ? alloc_pages_current+0x119/0x13e
[<ffffffff810ee5fd>] ? __page_cache_alloc+0xfb/0x10a
[<ffffffff810f87f8>] __do_page_cache_readahead+0x188/0x22c
[<ffffffff810f8b3a>] ondemand_readahead+0x29e/0x2af
[<ffffffff810f8c92>] page_cache_sync_readahead+0x38/0x3a
[<ffffffff810ef337>] generic_file_read_iter+0x1a2/0x55a
[<ffffffffa00a9dff>] ? nfs_revalidate_mapping+0xd6/0x288 [nfs]
[<ffffffffa00a6a23>] nfs_file_read+0x49/0x70 [nfs]
[<ffffffff811363be>] new_sync_read+0x78/0x9c
[<ffffffff81137164>] __vfs_read+0x13/0x38
[<ffffffff8113721e>] vfs_read+0x95/0x121
[<ffffffff811372f6>] SyS_read+0x4c/0x8a
[<ffffffff81557a52>] system_call_fastpath+0x12/0x17
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Steve Dickson <steved@redhat.com>
Acked-by: Jeff Layton <jeff.layton@primarydata.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
fscache_object_is_dead() returns true only if the object is marked dead and
the cache got an I/O error. This should be a logical OR instead. Since two
of the callers got split up into handling for separate subcases, expand the
other callers and kill the function. This is probably the right thing to do
anyway since one of the subcases isn't about the object at all, but rather
about the cache.
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Steve Dickson <steved@redhat.com>
Acked-by: Jeff Layton <jeff.layton@primarydata.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
When an object is being marked as no longer live, do this under the object
spinlock to prevent a race with operation submission targeted on that object.
The problem occurs due to the following pair of intertwined sequences when the
cache tries to create an object that would take it over the hard available
space limit:
NETFS INTERFACE
===============
(A) The netfs calls fscache_acquire_cookie(). object creation is deferred to
the object state machine and the netfs is allowed to continue.
OBJECT STATE MACHINE KTHREAD
============================
(1) The object is looked up on disk by fscache_look_up_object()
calling cachefiles_walk_to_object(). The latter finds that the
object is not yet represented on disk and calls
fscache_object_lookup_negative().
(2) fscache_object_lookup_negative() sets FSCACHE_COOKIE_NO_DATA_YET
and clears FSCACHE_COOKIE_LOOKING_UP, thus allowing the netfs to
start queuing read operations.
(B) The netfs calls fscache_read_or_alloc_pages(). This calls
fscache_wait_for_deferred_lookup() which sees FSCACHE_COOKIE_LOOKING_UP
become clear, allowing the read to begin.
(C) A read operation is set up and passed to fscache_submit_op() to deal
with.
(3) cachefiles_walk_to_object() calls cachefiles_has_space(), which
fails (or one of the file operations to create stuff fails).
cachefiles returns an error to fscache.
(4) fscache_look_up_object() transits to the LOOKUP_FAILURE state,
(5) fscache_lookup_failure() sets FSCACHE_OBJECT_LOOKED_UP and
FSCACHE_COOKIE_UNAVAILABLE and clears FSCACHE_COOKIE_LOOKING_UP
then transits to the KILL_OBJECT state.
(6) fscache_kill_object() clears FSCACHE_OBJECT_IS_LIVE in an attempt
to reject any further requests from the netfs.
(7) object->n_ops is examined and found to be 0.
fscache_kill_object() transits to the DROP_OBJECT state.
(D) fscache_submit_op() locks the object spinlock, sees if it can dispatch
the op immediately by calling fscache_object_is_active() - which fails
since FSCACHE_OBJECT_IS_AVAILABLE has not yet been set.
(E) fscache_submit_op() then tests FSCACHE_OBJECT_LOOKED_UP - which is set.
It then queues the object and increments object->n_ops.
(8) fscache_drop_object() releases the object and eventually
fscache_put_object() calls cachefiles_put_object() which suffers
an assertion failure here:
ASSERTCMP(object->fscache.n_ops, ==, 0);
Locking the object spinlock in step (6) around the clearance of
FSCACHE_OBJECT_IS_LIVE ensures that the the decision trees in
fscache_submit_op() and fscache_submit_exclusive_op() don't see the IS_LIVE
flag being cleared mid-decision: either the op is queued before step (7) - in
which case fscache_kill_object() will see n_ops>0 and will deal with the op -
or the op will be rejected.
This, combined with rejecting op submission if the target object is dying, fix
the problem.
The problem shows up as the following oops:
CacheFiles: Assertion failed
CacheFiles: 1 == 0 is false
------------[ cut here ]------------
kernel BUG at ../fs/cachefiles/interface.c:339!
...
RIP: 0010:[<ffffffffa014fd9c>] [<ffffffffa014fd9c>] cachefiles_put_object+0x2a4/0x301 [cachefiles]
...
Call Trace:
[<ffffffffa008674b>] fscache_put_object+0x18/0x21 [fscache]
[<ffffffffa00883e6>] fscache_object_work_func+0x3ba/0x3c9 [fscache]
[<ffffffff81054dad>] process_one_work+0x226/0x441
[<ffffffff81055d91>] worker_thread+0x273/0x36b
[<ffffffff81055b1e>] ? rescuer_thread+0x2e1/0x2e1
[<ffffffff81059b9d>] kthread+0x10e/0x116
[<ffffffff81059a8f>] ? kthread_create_on_node+0x1bb/0x1bb
[<ffffffff815579ac>] ret_from_fork+0x7c/0xb0
[<ffffffff81059a8f>] ? kthread_create_on_node+0x1bb/0x1bb
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Steve Dickson <steved@redhat.com>
Acked-by: Jeff Layton <jeff.layton@primarydata.com>
|