| Commit message (Collapse) | Author | Age |
... | |
| | |/ / /
| |/| | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
This platform driver has a OF device ID table but the OF module
alias information is not created so module autoloading won't work.
Signed-off-by: Luis de Bethencourt <luis@debethencourt.com>
Signed-off-by: Jacek Anaszewski <j.anaszewski@samsung.com>
|
|\ \ \ \ \
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma
Pull rdma fixes from Doug Ledford:
"The new hfi1 driver in staging/rdma has had a number of fixup patches
since being added to the tree. This is the first batch of those fixes"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma:
IB/hfi: Properly set permissions for user device files
IB/hfi1: mask vs shift confusion
IB/hfi1: clean up some defines
IB/hfi1: info leak in get_ctxt_info()
IB/hfi1: fix a locking bug
IB/hfi1: checking for NULL instead of IS_ERR
IB/hfi1: fix sdma_descq_cnt parameter parsing
IB/hfi1: fix copy_to/from_user() error handling
IB/hfi1: fix pstateinfo from returning improperly byteswapped value
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
Some of the device files are required to be user accessible for PSM while
most should remain accessible only by root.
Add a parameter to hfi1_cdev_init which controls if the user should have access
to this device which places it in a different class with the appropriate
devnode callback.
In addition set the devnode call back for the existing class to be a bit more
explicit for those permissions.
Finally remove the unnecessary null check before class_destroy
Tested-by: Donald Dutile <ddutile@redhat.com>
Signed-off-by: Haralanov, Mitko (mitko.haralanov@intel.com)
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
We are shifting by the _MASK macros instead of the _SHIFT ones.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
I added spaces around operators so it matches kernel style because
normally "-1ULL" is a number and " - 1" is a subtract operation. Also
removed some superflous "ULL" types so "1ULL" becomes "1".
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
The cinfo struct has a hole after the last struct member so we need to
zero it out. Otherwise we disclose some uninitialized stack data.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
mutex_trylock() returns zero on failure, not EBUSY.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
__get_txreq() returns an ERR_PTR() but this checks for NULL so it would
oops on failure.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
The boolean tests should have been or-ed.
Reported-by: David Binderman <dcb314@hotmail.com>
Reviewed-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
copy_to/from_user() returns the number of bytes which we were not able
to copy. It doesn't return an error code.
Also a couple places had a printk() on error and I removed that because
people can take advantage of it to fill /var/log/messages with spam.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
Byteswap link_width_downgrade_*_active values before sending on the wire. In
addition properly define the Port State Info structure.
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Christian Gomez <christian.gomez@intel.com>
Signed-off-by: Rimmer, Todd <todd.rimmer@intel.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Acked-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|\ \ \ \ \ \
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm
Pull libnvdimm fixes from Dan Williams:
- a boot regression (since v4.2) fix for some ARM configurations from
Tyler
- regression (since v4.1) fixes for mkfs.xfs on a DAX enabled device
from Jeff. These are tagged for -stable.
- a pair of locking fixes from Axel that are hidden from lockdep since
they involve device_lock(). The "btt" one is tagged for -stable, the
other only applies to the new "pfn" mechanism in v4.3.
- a fix for the pmem ->rw_page() path to use wmb_pmem() from Ross.
* 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
mm: fix type cast in __pfn_to_phys()
pmem: add proper fencing to pmem_rw_page()
libnvdimm: pfn_devs: Fix locking in namespace_store
libnvdimm: btt_devs: Fix locking in namespace_store
blockdev: don't set S_DAX for misaligned partitions
dax: fix O_DIRECT I/O to the last block of a blockdev
|
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
The various definitions of __pfn_to_phys() have been consolidated to
use a generic macro in include/asm-generic/memory_model.h. This hit
mainline in the form of 012dcef3f058 "mm: move __phys_to_pfn and
__pfn_to_phys to asm/generic/memory_model.h". When the generic macro
was implemented the type cast to phys_addr_t was dropped which caused
boot regressions on ARM platforms with more than 4GB of memory and
LPAE enabled.
It was suggested to use PFN_PHYS() defined in include/linux/pfn.h
as provides the correct logic and avoids further duplication.
Reported-by: kernelci.org bot <bot@kernelci.org>
Suggested-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Tyler Baker <tyler.baker@linaro.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
pmem_rw_page() needs to call wmb_pmem() on writes to make sure that the
newly written data is durable. This flow was added to pmem_rw_bytes()
and pmem_make_request() with this commit:
commit 61031952f4c8 ("arch, x86: pmem api for ensuring durability of
persistent memory updates")
...the pmem_rw_page() path was missed.
Cc: <stable@vger.kernel.org>
Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
Always take device_lock() before nvdimm_bus_lock() to prevent deadlock.
Signed-off-by: Axel Lin <axel.lin@ingics.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
Always take device_lock() before nvdimm_bus_lock() to prevent deadlock.
Cc: <stable@vger.kernel.org>
Signed-off-by: Axel Lin <axel.lin@ingics.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
The dax code doesn't currently support misaligned partitions,
so disable O_DIRECT via dax until such time as that support
materializes.
Cc: <stable@vger.kernel.org>
Suggested-by: Boaz Harrosh <boaz@plexistor.com>
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
| | |/ / / /
| |/| | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
commit bbab37ddc20b (block: Add support for DAX reads/writes to
block devices) caused a regression in mkfs.xfs. That utility
sets the block size of the device to the logical block size
using the BLKBSZSET ioctl, and then issues a single sector read
from the last sector of the device. This results in the dax_io
code trying to do a page-sized read from 512 bytes from the end
of the device. The result is -ERANGE being returned to userspace.
The fix is to align the block to the page size before calling
get_block.
Thanks to willy for simplifying my original patch.
Cc: <stable@vger.kernel.org>
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
Tested-by: Linda Knippers <linda.knippers@hp.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
|\ \ \ \ \ \
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
Pull block updates from Jens Axboe:
"This is a bit bigger than it should be, but I could (did) not want to
send it off last week due to both wanting extra testing, and expecting
a fix for the bounce regression as well. In any case, this contains:
- Fix for the blk-merge.c compilation warning on gcc 5.x from me.
- A set of back/front SG gap merge fixes, from me and from Sagi.
This ensures that we honor SG gapping for integrity payloads as
well.
- Two small fixes for null_blk from Matias, fixing a leak and a
capacity propagation issue.
- A blkcg fix from Tejun, fixing a NULL dereference.
- A fast clone optimization from Ming, fixing a performance
regression since the arbitrarily sized bio's were introduced.
- Also from Ming, a regression fix for bouncing IOs"
* 'for-linus' of git://git.kernel.dk/linux-block:
block: fix bounce_end_io
block: blk-merge: fast-clone bio when splitting rw bios
block: blkg_destroy_all() should clear q->root_blkg and ->root_rl.blkg
block: Copy a user iovec if it includes gaps
block: Refuse adding appending a gapped integrity page to a bio
block: Refuse request/bio merges with gaps in the integrity payload
block: Check for gaps on front and back merges
null_blk: fix wrong capacity when bs is not 512 bytes
null_blk: fix memory leak on cleanup
block: fix bogus compiler warnings in blk-merge.c
|
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
When bio bounce is involved, one new bio and its biovecs are
cloned from the comming bio, which can be one fast-cloned bio
from upper layer(such as dm).
So it is obviously wrong to assume the start index of the coming(
original) bio's io vector is zero, which can be any value between
0 and (bi_max_vecs - 1), especially in case of bio split.
This patch fixes Fedora's booting oops on i386, often with the
following kernel log together:
> [ 9.026738] systemd[1]: Switching root.
> [ 9.036467] systemd-journald[149]: Received SIGTERM from PID 1
> (systemd).
> [ 9.082262] BUG: Bad page state in process kworker/u5:1 pfn:372ac
> [ 9.083989] page:f3d32ae0 count:0 mapcount:0 mapping:f2252178
> index:0x16a
> [ 9.085755] flags: 0x40020021(locked|lru|mappedtodisk)
> [ 9.087284] page dumped because: page still charged to cgroup
> [ 9.088772] bad because of flags:
> [ 9.089731] flags: 0x21(locked|lru)
> [ 9.090818] page->mem_cgroup:f2c3e400
Reported-by: Josh Boyer <jwboyer@fedoraproject.org>
Tested-by: Adam Williamson <awilliam@redhat.com>
Cc: Ming Lin <mlin@kernel.org>
Cc: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Ming Lei <ming.lei@canonical.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
biovecs has become immutable since v3.13, so it isn't necessary
to allocate biovecs for the new cloned bios, then we can save
one extra biovecs allocation/copy, and the allocation is often
not fixed-length and a bit more expensive.
For example, if the 'max_sectors_kb' of null blk's queue is set
as 16(32 sectors) via sysfs just for making more splits, this patch
can increase throught about ~70% in the sequential read test over
null_blk(direct io, bs: 1M).
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Kent Overstreet <kent.overstreet@gmail.com>
Cc: Ming Lin <ming.l@ssi.samsung.com>
Cc: Dongsu Park <dpark@posteo.net>
Signed-off-by: Ming Lei <ming.lei@canonical.com>
This fixes a performance regression introduced by commit 54efd50bfd,
and allows us to take full advantage of the fact that we have immutable
bio_vecs. Hand applied, as it rejected violently with commit
5014c311baa2.
Signed-off-by: Jens Axboe <axboe@fb.com>
|
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
While making the root blkg unconditional, ec13b1d6f0a0 ("blkcg: always
create the blkcg_gq for the root blkcg") removed the part which clears
q->root_blkg and ->root_rl.blkg during q exit. This leaves the two
pointers dangling after blkg_destroy_all(). blk-throttle exit path
performs blkg traversals and dereferences ->root_blkg and can lead to
the following oops.
BUG: unable to handle kernel NULL pointer dereference at 0000000000000558
IP: [<ffffffff81389746>] __blkg_lookup+0x26/0x70
...
task: ffff88001b4e2580 ti: ffff88001ac0c000 task.ti: ffff88001ac0c000
RIP: 0010:[<ffffffff81389746>] [<ffffffff81389746>] __blkg_lookup+0x26/0x70
...
Call Trace:
[<ffffffff8138d14a>] blk_throtl_drain+0x5a/0x110
[<ffffffff8138a108>] blkcg_drain_queue+0x18/0x20
[<ffffffff81369a70>] __blk_drain_queue+0xc0/0x170
[<ffffffff8136a101>] blk_queue_bypass_start+0x61/0x80
[<ffffffff81388c59>] blkcg_deactivate_policy+0x39/0x100
[<ffffffff8138d328>] blk_throtl_exit+0x38/0x50
[<ffffffff8138a14e>] blkcg_exit_queue+0x3e/0x50
[<ffffffff8137016e>] blk_release_queue+0x1e/0xc0
...
While the bug is a straigh-forward use-after-free bug, it is tricky to
reproduce because blkg release is RCU protected and the rest of exit
path usually finishes before RCU grace period.
This patch fixes the bug by updating blkg_destro_all() to clear
q->root_blkg and ->root_rl.blkg.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: "Richard W.M. Jones" <rjones@redhat.com>
Reported-by: Josh Boyer <jwboyer@fedoraproject.org>
Link: http://lkml.kernel.org/g/CA+5PVA5rzQ0s4723n5rHBcxQa9t0cW8BPPBekr_9aMRoWt2aYg@mail.gmail.com
Fixes: ec13b1d6f0a0 ("blkcg: always create the blkcg_gq for the root blkcg")
Cc: stable@vger.kernel.org # v4.2+
Tested-by: Richard W.M. Jones <rjones@redhat.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
For drivers that don't support gaps in the SG lists handed to
them we must bounce (copy the user buffers) and pass a bio that
does not include gaps. This doesn't matter for any current user,
but will help to allow iser which can't handle gaps to use the
block virtual boundary instead of using driver-local bounce
buffering when handling SG_IO commands.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
This is only theoretical at the moment given that the only
subsystems that generate integrity payloads are the block layer
itself and the scsi target (which generate well aligned integrity
payloads). But when we will expose integrity meta-data to user-space,
we'll need to refuse appending a page with a gap (if the queue
virtual boundary is set).
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
If a driver sets the block queue virtual boundary mask, it means that
it cannot handle gaps so we must not allow those in the integrity
payload as well.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Fixed up by me to have duplicate integrity merge functions, depending
on whether block integrity is enabled or not. Fixes a compilations
issue with CONFIG_BLK_DEV_INTEGRITY unset.
Signed-off-by: Jens Axboe <axboe@fb.com>
|
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
We are checking for gaps to previous bio_vec, which can
only detect back merges gaps. Moreover, at the point where
we check for a gap, we don't know if we will attempt a back
or a front merge. Thus, check for gap to prev in a back merge
attempt and check for a gap to next in a front merge attempt.
Signed-off-by: Jens Axboe <axboe@fb.com>
[sagig: Minor rename change]
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
|
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
set_capacity() sets device's capacity using 512 bytes sectors.
null_blk calculates the number of sectors by size / bs, which
set_capacity is called with. This led to null_blk exposing the
wrong number of sectors when bs is not 512 bytes.
Signed-off-by: Matias Bjørling <m@bjorling.me>
Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
Driver was not freeing the memory allocated for internal nullb queues.
This patch frees the memory during driver unload.
Signed-off-by: Matias Bjørling <mb@lightnvm.io>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
The compiler can't figure out that bvprv is initialized whenever 'prev'
is set to 1 as well. Use a pointer to bvprv instead, setting it to NULL
initially, and get rid of the 'prev' tracking. This dumbs it down
enough that gcc is happy.
Signed-off-by: Jens Axboe <axboe@fb.com>
|
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
Commit 505a666ee3fc ("writeback: plug writeback in wb_writeback() and
writeback_inodes_wb()") has us holding a plug during writeback_sb_inodes,
which increases the merge rate when relatively contiguous small files
are written by the filesystem. It helps both on flash and spindles.
For an fs_mark workload creating 4K files in parallel across 8 drives,
this commit improves performance ~9% more by unplugging before calling
cond_resched(). cond_resched() doesn't trigger an implicit unplug, so
explicitly getting the IO down to the device before scheduling reduces
latencies for anyone waiting on clean pages.
It also cuts down on how often we use kblockd to unplug, which means
less work bouncing from one workqueue to another.
Many more details about how we got here:
https://lkml.org/lkml/2015/9/11/570
Signed-off-by: Chris Mason <clm@fb.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|\ \ \ \ \ \ \
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | | |
Pull virtio fixes and cleanups from Michael Tsirkin:
"This fixes the virtio-test tool, and improves the error handling for
virtio-ccw"
* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
virtio/s390: handle failures of READ_VQ_CONF ccw
tools/virtio: propagate V=X to kernel build
vhost: move features to core
tools/virtio: fix build after 4.2 changes
|
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | | |
In virtio_ccw_read_vq_conf() the return value of ccw_io_helper()
was not checked.
If the configuration could not be read properly, we'd wrongly assume a
queue size of 0.
Let's propagate any I/O error to virtio_ccw_setup_vq() so it may
properly fail.
Signed-off-by: Pierre Morel <pmorel@linux.vnet.ibm.com>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
|
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | | |
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
|
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | | |
virtio 1 and any layout are core features, move them
there. This fixes vhost test.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
|
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | | |
more stubs, mostly
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
|
|\ \ \ \ \ \ \ \
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | | |
Pull KVM fixes from Paolo Bonzini:
"Mostly stable material, a lot of ARM fixes"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (22 commits)
sched: access local runqueue directly in single_task_running
arm/arm64: KVM: Remove 'config KVM_ARM_MAX_VCPUS'
arm64: KVM: Remove all traces of the ThumbEE registers
arm: KVM: Disable virtual timer even if the guest is not using it
arm64: KVM: Disable virtual timer even if the guest is not using it
arm/arm64: KVM: vgic: Check for !irqchip_in_kernel() when mapping resources
KVM: s390: Replace incorrect atomic_or with atomic_andnot
arm: KVM: Fix incorrect device to IPA mapping
arm64: KVM: Fix user access for debug registers
KVM: vmx: fix VPID is 0000H in non-root operation
KVM: add halt_attempted_poll to VCPU stats
kvm: fix zero length mmio searching
kvm: fix double free for fast mmio eventfd
kvm: factor out core eventfd assign/deassign logic
kvm: don't try to register to KVM_FAST_MMIO_BUS for non mmio eventfd
KVM: make the declaration of functions within 80 characters
KVM: arm64: add workaround for Cortex-A57 erratum #852523
KVM: fix polling for guest halt continued even if disable it
arm/arm64: KVM: Fix PSCI affinity info return value for non valid cores
arm64: KVM: set {v,}TCR_EL2 RES1 bits
...
|
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | | |
Commit 2ee507c47293 ("sched: Add function single_task_running to let a task
check if it is the only task running on a cpu") referenced the current
runqueue with the smp_processor_id. When CONFIG_DEBUG_PREEMPT is enabled,
that is only allowed if preemption is disabled or the currrent task is
bound to the local cpu (e.g. kernel worker).
With commit f78195129963 ("kvm: add halt_poll_ns module parameter") KVM
calls single_task_running. If CONFIG_DEBUG_PREEMPT is enabled that
generates a lot of kernel messages.
To avoid adding preemption in that cases, as it would limit the usefulness,
we change single_task_running to access directly the cpu local runqueue.
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: <stable@vger.kernel.org>
Fixes: 2ee507c472939db4b146d545352b8a7c79ef47f8
Signed-off-by: Dominik Dingel <dingel@linux.vnet.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
| |\ \ \ \ \ \ \ \
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | | |
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into kvm-master
Second set of KVM/ARM changes for 4.3-rc2
- Workaround for a Cortex-A57 erratum
- Bug fix for the debugging infrastructure
- Fix for 32bit guests with more than 4GB of address space
on a 32bit host
- A number of fixes for the (unusual) case when we don't use
the in-kernel GIC emulation
- Removal of ThumbEE handling on arm64, since these have been
dropped from the architecture before anyone actually ever
built a CPU
- Remove the KVM_ARM_MAX_VCPUS limitation which has become
fairly pointless
|
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | | |
This patch removes config option of KVM_ARM_MAX_VCPUS,
and like other ARCHs, just choose the maximum allowed
value from hardware, and follows the reasons:
1) from distribution view, the option has to be
defined as the max allowed value because it need to
meet all kinds of virtulization applications and
need to support most of SoCs;
2) using a bigger value doesn't introduce extra memory
consumption, and the help text in Kconfig isn't accurate
because kvm_vpu structure isn't allocated until request
of creating VCPU is sent from QEMU;
3) the main effect is that the field of vcpus[] in 'struct kvm'
becomes a bit bigger(sizeof(void *) per vcpu) and need more cache
lines to hold the structure, but 'struct kvm' is one generic struct,
and it has worked well on other ARCHs already in this way. Also,
the world switch frequecy is often low, for example, it is ~2000
when running kernel building load in VM from APM xgene KVM host,
so the effect is very small, and the difference can't be observed
in my test at all.
Cc: Dann Frazier <dann.frazier@canonical.com>
Signed-off-by: Ming Lei <ming.lei@canonical.com>
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
|
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | | |
Although the ThumbEE registers and traps were present in earlier
versions of the v8 architecture, it was retrospectively removed and so
we can do the same.
Whilst this breaks migrating a guest started on a previous version of
the kernel, it is much better to kill these (non existent) registers
as soon as possible.
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
[maz: added commend about migration]
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
|
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | | |
When running a guest with the architected timer disabled (with QEMU and
the kernel_irqchip=off option, for example), it is important to make
sure the timer gets turned off. Otherwise, the guest may try to
enable it anyway, leading to a screaming HW interrupt.
The fix is to unconditionally turn off the virtual timer on guest
exit.
Cc: stable@vger.kernel.org
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
|
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | | |
When running a guest with the architected timer disabled (with QEMU and
the kernel_irqchip=off option, for example), it is important to make
sure the timer gets turned off. Otherwise, the guest may try to
enable it anyway, leading to a screaming HW interrupt.
The fix is to unconditionally turn off the virtual timer on guest
exit.
Cc: stable@vger.kernel.org
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
|
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | | |
Until b26e5fdac43c ("arm/arm64: KVM: introduce per-VM ops"),
kvm_vgic_map_resources() used to include a check on irqchip_in_kernel(),
and vgic_v2_map_resources() still has it.
But now vm_ops are not initialized until we call kvm_vgic_create().
Therefore kvm_vgic_map_resources() can being called without a VGIC,
and we die because vm_ops.map_resources is NULL.
Fixing this restores QEMU's kernel-irqchip=off option to a working state,
allowing to use GIC emulation in userspace.
Fixes: b26e5fdac43c ("arm/arm64: KVM: introduce per-VM ops")
Cc: stable@vger.kernel.org
Signed-off-by: Pavel Fedin <p.fedin@samsung.com>
[maz: reworked commit message]
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
|
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | | |
A critical bug has been found in device memory stage1 translation for
VMs with more then 4GB of address space. Once vm_pgoff size is smaller
then pa (which is true for LPAE case, u32 and u64 respectively) some
more significant bits of pa may be lost as a shift operation is performed
on u32 and later cast onto u64.
Example: vm_pgoff(u32)=0x00210030, PAGE_SHIFT=12
expected pa(u64): 0x0000002010030000
produced pa(u64): 0x0000000010030000
The fix is to change the order of operations (casting first onto phys_addr_t
and then shifting).
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
[maz: fixed changelog and patch formatting]
Cc: stable@vger.kernel.org
Signed-off-by: Marek Majtyka <marek.majtyka@tieto.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
|
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | | |
When setting the debug register from userspace, make sure that
copy_from_user() is called with its parameters in the expected
order. It otherwise doesn't do what you think.
Fixes: 84e690bfbed1 ("KVM: arm64: introduce vcpu->arch.debug_ptr")
Reported-by: Peter Maydell <peter.maydell@linaro.org>
Cc: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
|
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | | |
When restoring the system register state for an AArch32 guest at EL2,
writes to DACR32_EL2 may not be correctly synchronised by Cortex-A57,
which can lead to the guest effectively running with junk in the DACR
and running into unexpected domain faults.
This patch works around the issue by re-ordering our restoration of the
AArch32 register aliases so that they happen before the AArch64 system
registers. Ensuring that the registers are restored in this order
guarantees that they will be correctly synchronised by the core.
Cc: <stable@vger.kernel.org>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
|
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | | |
The offending commit accidentally replaces an atomic_clear with an
atomic_or instead of an atomic_andnot in kvm_s390_vcpu_request_handled.
The symptom is that kvm guests on s390 hang on startup.
This patch simply replaces the incorrect atomic_or with atomic_andnot
Fixes: 805de8f43c20 (atomic: Replace atomic_{set,clear}_mask() usage)
Signed-off-by: Jason J. Herne <jjherne@linux.vnet.ibm.com>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | | |
Reference SDM 28.1:
The current VPID is 0000H in the following situations:
- Outside VMX operation. (This includes operation in system-management
mode under the default treatment of SMIs and SMM with VMX operation;
see Section 34.14.)
- In VMX root operation.
- In VMX non-root operation when the “enable VPID” VM-execution control
is 0.
The VPID should never be 0000H in non-root operation when "enable VPID"
VM-execution control is 1. However, commit 34a1cd60 ("kvm: x86: vmx:
move some vmx setting from vmx_init() to hardware_setup()") remove the
codes which reserve 0000H for VMX root operation.
This patch fix it by again reserving 0000H for VMX root operation.
Cc: stable@vger.kernel.org # 3.19+
Fixes: 34a1cd60d17f62c1f077c1478a6c2ca8c3d17af4
Reported-by: Wincy Van <fanwenyi0529@gmail.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | | |
This new statistic can help diagnosing VCPUs that, for any reason,
trigger bad behavior of halt_poll_ns autotuning.
For example, say halt_poll_ns = 480000, and wakeups are spaced exactly
like 479us, 481us, 479us, 481us. Then KVM always fails polling and wastes
10+20+40+80+160+320+480 = 1110 microseconds out of every
479+481+479+481+479+481+479 = 3359 microseconds. The VCPU then
is consuming about 30% more CPU than it would use without
polling. This would show as an abnormally high number of
attempted polling compared to the successful polls.
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com<
Reviewed-by: David Matlack <dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | | |
Currently, if we had a zero length mmio eventfd assigned on
KVM_MMIO_BUS. It will never be found by kvm_io_bus_cmp() since it
always compares the kvm_io_range() with the length that guest
wrote. This will cause e.g for vhost, kick will be trapped by qemu
userspace instead of vhost. Fixing this by using zero length if an
iodevice is zero length.
Cc: stable@vger.kernel.org
Cc: Gleb Natapov <gleb@kernel.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|