| Commit message (Collapse) | Author | Age |
|
|
|
|
|
|
|
| |
Reformat the existing documentation to have more structure. This allows
for more documentation seperated from the existing paragraphs.
Signed-off-by: Markus Pargmann <mpa@pengutronix.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
|
|
|
|
|
|
|
|
| |
Usually the admin queue depth of 64 is plenty, but for some use cases we
really need it larger. Examples are use cases like MAT, where you have
to touch all of NAND for init/format like purposes. In those cases, we
see a good 2x increase with an increased queue depth.
Signed-off-by: Jens Axboe <axboe@fb.com>
Acked-by: Keith Busch <keith.busch@intel.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
PRP list calculation is supposed to be based on device's page size.
Systems with page size larger than device's page size cause corruption
to the name space as well as system memory with out this fix.
Systems like x86 might not experience this issue because it uses
PAGE_SIZE of 4K where as powerpc uses PAGE_SIZE of 64k while NVMe device's
page size varies depending upon the vendor.
Signed-off-by: Murali Iyer <mniyer@us.ibm.com>
Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Acked-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The driver may issue commands to a device that may never return, so its
request_queue could always have active requests while the controller is
running. Waiting for the queue to freeze could block forever, which is
what blk-mq's hot cpu notification handler was doing when nvme drives
were in use.
This has the nvme driver make the asynchronous event command's tag
reserved and does not keep the request active. We can't have more than
one since the request is released back to the request_queue before the
command is completed. Having only one avoids potential tag collisions,
and reserving the tag for this purpose prevents other admin tasks from
reusing the tag.
I also couldn't think of a scenario where issuing AEN requests single
depth is worse than issuing them in batches, so I don't think we lose
anything with this change.
As an added bonus, doing it this way removes "Cancelling I/O" warnings
observed when unbinding the nvme driver from a device.
Reported-by: Yigal Korman <yigal@plexistor.com>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
|
|
|
|
|
|
|
| |
Remove unused mask in nvme_alloc_iod
Signed-off-by: Chong Yuan <chong.yuan@memblaze.com>
Reviewed-by: Wenbo Wang <wenbo.wang@memblaze.com>
Acked-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
|
|
|
|
|
|
|
|
| |
This fixes a race accessing an invalid address when a controller's admin
queue is in use during a reset for failure or hot removal occurs. The
admin queue will be frozen to prevent new users from entering prior to
the doorbell queue being unmapped.
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
|
|
|
|
|
|
|
|
| |
Mempools created for slab caches should use
mempool_create_slab_pool().
Cc: Lars Ellenberg <drbd-dev@lists.linbit.com>
Cc: Jens Axboe <axboe@fb.com>
Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
mempool_alloc() does not support __GFP_ZERO since elements may come from
memory that has already been released by mempool_free().
Remove __GFP_ZERO from mempool_alloc() in drbd_req_new() and properly
initialize it to 0.
Cc: Lars Ellenberg <drbd-dev@lists.linbit.com>
Cc: Jens Axboe <axboe@fb.com>
Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
|
|
|
|
|
|
|
|
| |
Nick Piggin is currently listed as the maintainer of BRD in MAINTAINERS,
but the mails sent to the listed address are returned as undeliverable.
Update the maintainer for BRD to be Jens Axboe, since patches for BRD
flow up through him.
Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
|
|
|
|
|
|
|
| |
Return -EBUSY if we're unable to enter a queue immediately when
allocating a blk-mq request without __GFP_WAIT.
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
|
|
|
|
|
|
|
|
|
| |
Rename blk_mq_run_queues to blk_mq_run_hw_queues, add async argument,
and export it.
DM's suspend support must be able to run the queue without starting
stopped hw queues.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add a variant of blk_mq_init_queue that allows a previously allocated
queue to be initialized. blk_mq_init_allocated_queue models
blk_init_allocated_queue -- which was also created for DM's use.
DM's approach to device creation requires a placeholder request_queue be
allocated for use with alloc_dev() but the decision about what type of
request_queue will be ultimately created is deferred until all component
devices referenced in the DM table are processed to determine the table
type (request-based, blk-mq request-based, or bio-based).
Also, because of DM's late finalization of the request_queue type
the call to blk_mq_register_disk() doesn't happen during alloc_dev().
Must export blk_mq_register_disk() so that DM can backfill the 'mq' dir
once the blk-mq queue is fully allocated.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Reviewed-by: Ming Lei <ming.lei@canonical.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|\ |
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
If percpu_ref_init() fails the allocated q and hctxs must get cleaned
up; using 'err_map' doesn't allow that to happen.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Reviewed-by: Ming Lei <ming.lei@canonical.com>
Cc: stable@kernel.org
Signed-off-by: Jens Axboe <axboe@fb.com>
|
| |\
| | |
| | |
| | |
| | |
| | | |
into for-linus
NBD fixes based on v4.0-rc1
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
we have already allocated memory for nbd_dev, but we were not
releasing that memory and just returning the error value.
Signed-off-by: Sudip Mukherjee <sudip@vectorindia.org>
Acked-by: Paul Clements <Paul.Clements@SteelEye.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Markus Pargmann <mpa@pengutronix.de>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
global_update_bandwidth() uses static variable update_time as the
timestamp for the last update but forgets to initialize it to
INITIALIZE_JIFFIES.
This means that global_dirty_limit will be 5 mins into the future on
32bit and some large amount jiffies into the past on 64bit. This
isn't critical as the only effect is that global_dirty_limit won't be
updated for the first 5 mins after booting on 32bit machines,
especially given the auxiliary nature of global_dirty_limit's role -
protecting against global dirty threshold's sudden dips; however, it
does lead to unintended suboptimal behavior. Fix it.
Fixes: c42843f2f0bb ("writeback: introduce smoothed global dirty limit")
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Jan Kara <jack@suse.cz>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: stable@vger.kernel.org
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|\ \ \
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Merge misc fixes from Andrew Morton:
"13 fixes"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
memcg: disable hierarchy support if bound to the legacy cgroup hierarchy
mm: reorder can_do_mlock to fix audit denial
kasan, module: move MODULE_ALIGN macro into <linux/moduleloader.h>
kasan, module, vmalloc: rework shadow allocation for modules
fanotify: fix event filtering with FAN_ONDIR set
mm/nommu.c: export symbol max_mapnr
arch/c6x/include/asm/pgtable.h: define dummy pgprot_writecombine for !MMU
nilfs2: fix deadlock of segment constructor during recovery
mm: cma: fix CMA aligned offset calculation
mm, hugetlb: close race when setting PageTail for gigantic pages
mm, oom: do not fail __GFP_NOFAIL allocation if oom killer is disabled
drivers/rtc/rtc-s3c.c: add .needs_src_clk to s3c6410 RTC data
ocfs2: make append_dio an incompat feature
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
If the memory cgroup controller is initially mounted in the scope of the
default cgroup hierarchy and then remounted to a legacy hierarchy, it will
still have hierarchy support enabled, which is incorrect. We should
disable hierarchy support if bound to the legacy cgroup hierarchy.
Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
A userspace call to mmap(MAP_LOCKED) may result in the successful locking
of memory while also producing a confusing audit log denial. can_do_mlock
checks capable and rlimit. If either of these return positive
can_do_mlock returns true. The capable check leads to an LSM hook used by
apparmour and selinux which produce the audit denial. Reordering so
rlimit is checked first eliminates the denial on success, only recording a
denial when the lock is unsuccessful as a result of the denial.
Signed-off-by: Jeff Vander Stoep <jeffv@google.com>
Acked-by: Nick Kralevich <nnk@google.com>
Cc: Jeff Vander Stoep <jeffv@google.com>
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Paul Cassella <cassella@cray.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
include/linux/moduleloader.h is more suitable place for this macro.
Also change alignment to PAGE_SIZE for CONFIG_KASAN=n as such
alignment already assumed in several places.
Signed-off-by: Andrey Ryabinin <a.ryabinin@samsung.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Current approach in handling shadow memory for modules is broken.
Shadow memory could be freed only after memory shadow corresponds it is no
longer used. vfree() called from interrupt context could use memory its
freeing to store 'struct llist_node' in it:
void vfree(const void *addr)
{
...
if (unlikely(in_interrupt())) {
struct vfree_deferred *p = this_cpu_ptr(&vfree_deferred);
if (llist_add((struct llist_node *)addr, &p->list))
schedule_work(&p->wq);
Later this list node used in free_work() which actually frees memory.
Currently module_memfree() called in interrupt context will free shadow
before freeing module's memory which could provoke kernel crash.
So shadow memory should be freed after module's memory. However, such
deallocation order could race with kasan_module_alloc() in module_alloc().
Free shadow right before releasing vm area. At this point vfree()'d
memory is not used anymore and yet not available for other allocations.
New VM_KASAN flag used to indicate that vm area has dynamically allocated
shadow memory so kasan frees shadow only if it was previously allocated.
Signed-off-by: Andrey Ryabinin <a.ryabinin@samsung.com>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
With FAN_ONDIR set, the user can end up getting events, which it hasn't
marked. This was revealed with fanotify04 testcase failure on
Linux-4.0-rc1, and is a regression from 3.19, revealed with 66ba93c0d7fe6
("fanotify: don't set FAN_ONDIR implicitly on a marks ignored mask").
# /opt/ltp/testcases/bin/fanotify04
[ ... ]
fanotify04 7 TPASS : event generated properly for type 100000
fanotify04 8 TFAIL : fanotify04.c:147: got unexpected event 30
fanotify04 9 TPASS : No event as expected
The testcase sets the adds the following marks : FAN_OPEN | FAN_ONDIR for
a fanotify on a dir. Then does an open(), followed by close() of the
directory and expects to see an event FAN_OPEN(0x20). However, the
fanotify returns (FAN_OPEN|FAN_CLOSE_NOWRITE(0x10)). This happens due to
the flaw in the check for event_mask in fanotify_should_send_event() which
does:
if (event_mask & marks_mask & ~marks_ignored_mask)
return true;
where, event_mask == (FAN_ONDIR | FAN_CLOSE_NOWRITE),
marks_mask == (FAN_ONDIR | FAN_OPEN),
marks_ignored_mask == 0
Fix this by masking the outgoing events to the user, as we already take
care of FAN_ONDIR and FAN_EVENT_ON_CHILD.
Signed-off-by: Suzuki K. Poulose <suzuki.poulose@arm.com>
Tested-by: Lino Sanfilippo <LinoSanfilippo@gmx.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Eric Paris <eparis@redhat.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Several modules may need max_mapnr, so export, the related error with
allmodconfig under c6x:
MODPOST 3327 modules
ERROR: "max_mapnr" [fs/pstore/ramoops.ko] undefined!
ERROR: "max_mapnr" [drivers/media/v4l2-core/videobuf2-dma-contig.ko] undefined!
Signed-off-by: Chen Gang <gang.chen.5i5j@gmail.com>
Cc: Mark Salter <msalter@redhat.com>
Cc: Aurelien Jacquiot <a-jacquiot@ti.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
When !MMU, asm-generic will not define default pgprot_writecombine, so c6x
needs to define it by itself. The related error:
CC [M] fs/pstore/ram_core.o
fs/pstore/ram_core.c: In function 'persistent_ram_vmap':
fs/pstore/ram_core.c:399:10: error: implicit declaration of function 'pgprot_writecombine' [-Werror=implicit-function-declaration]
prot = pgprot_writecombine(PAGE_KERNEL);
^
fs/pstore/ram_core.c:399:8: error: incompatible types when assigning to type 'pgprot_t {aka struct <anonymous>}' from type 'int'
prot = pgprot_writecombine(PAGE_KERNEL);
^
Signed-off-by: Chen Gang <gang.chen.5i5j@gmail.com>
Cc: Mark Salter <msalter@redhat.com>
Cc: Aurelien Jacquiot <a-jacquiot@ti.com>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
According to a report from Yuxuan Shui, nilfs2 in kernel 3.19 got stuck
during recovery at mount time. The code path that caused the deadlock was
as follows:
nilfs_fill_super()
load_nilfs()
nilfs_salvage_orphan_logs()
* Do roll-forwarding, attach segment constructor for recovery,
and kick it.
nilfs_segctor_thread()
nilfs_segctor_thread_construct()
* A lock is held with nilfs_transaction_lock()
nilfs_segctor_do_construct()
nilfs_segctor_drop_written_files()
iput()
iput_final()
write_inode_now()
writeback_single_inode()
__writeback_single_inode()
do_writepages()
nilfs_writepage()
nilfs_construct_dsync_segment()
nilfs_transaction_lock() --> deadlock
This can happen if commit 7ef3ff2fea8b ("nilfs2: fix deadlock of segment
constructor over I_SYNC flag") is applied and roll-forward recovery was
performed at mount time. The roll-forward recovery can happen if datasync
write is done and the file system crashes immediately after that. For
instance, we can reproduce the issue with the following steps:
< nilfs2 is mounted on /nilfs (device: /dev/sdb1) >
# dd if=/dev/zero of=/nilfs/test bs=4k count=1 && sync
# dd if=/dev/zero of=/nilfs/test conv=notrunc oflag=dsync bs=4k
count=1 && reboot -nfh
< the system will immediately reboot >
# mount -t nilfs2 /dev/sdb1 /nilfs
The deadlock occurs because iput() can run segment constructor through
writeback_single_inode() if MS_ACTIVE flag is not set on sb->s_flags. The
above commit changed segment constructor so that it calls iput()
asynchronously for inodes with i_nlink == 0, but that change was
imperfect.
This fixes the another deadlock by deferring iput() in segment constructor
even for the case that mount is not finished, that is, for the case that
MS_ACTIVE flag is not set.
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Reported-by: Yuxuan Shui <yshuiv7@gmail.com>
Tested-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
The CMA aligned offset calculation is incorrect for non-zero order_per_bit
values.
For example, if cma->order_per_bit=1, cma->base_pfn= 0x2f800000 and
align_order=12, the function returns a value of 0x17c00 instead of 0x400.
This patch fixes the CMA aligned offset calculation.
The previous calculation was wrong and would return too-large values for
the offset, so that when cma_alloc looks for free pages in the bitmap with
the requested alignment > order_per_bit, it starts too far into the bitmap
and so CMA allocations will fail despite there actually being plenty of
free pages remaining. It will also probably have the wrong alignment.
With this change, we will get the correct offset into the bitmap.
One affected user is powerpc KVM, which has kvm_cma->order_per_bit set to
KVM_CMA_CHUNK_ORDER - PAGE_SHIFT, or 18 - 12 = 6.
[gregory.0xf0@gmail.com: changelog additions]
Signed-off-by: Danesh Petigara <dpetigara@broadcom.com>
Reviewed-by: Gregory Fong <gregory.0xf0@gmail.com>
Acked-by: Michal Nazarewicz <mina86@mina86.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Now that gigantic pages are dynamically allocatable, care must be taken to
ensure that p->first_page is valid before setting PageTail.
If this isn't done, then it is possible to race and have compound_head()
return NULL.
Signed-off-by: David Rientjes <rientjes@google.com>
Acked-by: Davidlohr Bueso <dave@stgolabs.net>
Cc: Luiz Capitulino <lcapitulino@redhat.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Tetsuo Handa has pointed out that __GFP_NOFAIL allocations might fail
after OOM killer is disabled if the allocation is performed by a kernel
thread. This behavior was introduced from the very beginning by
7f33d49a2ed5 ("mm, PM/Freezer: Disable OOM killer when tasks are frozen").
This means that the basic contract for the allocation request is broken
and the context requesting such an allocation might blow up unexpectedly.
There are basically two ways forward.
1) move oom_killer_disable after kernel threads are frozen. This has a
risk that the OOM victim wouldn't be able to finish because it would
depend on an already frozen kernel thread. This would be really tricky
to debug.
2) do not fail GFP_NOFAIL allocation no matter what and risk a
potential Freezable kernel threads will loop and fail the suspend.
Incidental allocations after kernel threads are frozen will at least
dump a warning - if we are lucky and the serial console is still active
of course...
This patch implements the later option because it is safer. We would see
warning rather than allocation failures for the kernel threads which would
blow up otherwise and have a higher chances to identify __GFP_NOFAIL users
from deeper pm code.
Signed-off-by: Michal Hocko <mhocko@suse.cz>
Acked-by: David Rientjes <rientjes@gooogle.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Commit df9e26d093d3 ("rtc: s3c: add support for RTC of Exynos3250 SoC")
added an "rtc_src" DT property to specify the clock used as a source to
the S3C real-time clock.
Not all SoCs needs this so commit eaf3a659086e ("drivers/rtc/rtc-s3c.c:
fix initialization failure without rtc source clock") changed to check
the struct s3c_rtc_data .needs_src_clk to conditionally grab the clock.
But that commit didn't update the data for each IP version so the RTC
broke on the boards that needs a source clock. This is the case of at
least Exynos5250 and Exynos5440 which uses the s3c6410 RTC IP block.
This commit fixes the S3C rtc on the Exynos5250 Snow and Exynos5420
Peach Pit and Pi Chromebooks.
Signed-off-by: Javier Martinez Canillas <javier.martinez@collabora.co.uk>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Chanwoo Choi <cw00.choi@samsung.com>
Cc: Doug Anderson <dianders@chromium.org>
Cc: Olof Johansson <olof@lixom.net>
Cc: Kevin Hilman <khilman@linaro.org>
Cc: Tyler Baker <tyler.baker@linaro.org>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
It turns out that making this feature ro_compat isn't quite enough to
prevent accidental corruption on mount from older kernels. Ocfs2 (like
other file systems) will process orphaned inodes even when the user mounts
in 'ro' mode. So for the case of a filesystem not knowing the append_dio
feature, mounting the filesystem could result in orphaned-for-dio files
being deleted, which we clearly don't want.
So instead, turn this into an incompat flag.
Btw, this is kind of my fault - initially I asked that we add a flag to
cover the feature and even suggested that we use an ro flag. It wasn't
until I was looking through our commits for v4.0-rc1 that I realized we
actually want this to be incompat.
Signed-off-by: Mark Fasheh <mfasheh@suse.de>
Cc: Joseph Qi <joseph.qi@huawei.com>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|/ / /
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
The wrong value is being returned by change_huge_pmd since commit
10c1045f28e8 ("mm: numa: avoid unnecessary TLB flushes when setting
NUMA hinting entries") which allows a fallthrough that tries to adjust
non-existent PTEs. This patch corrects it.
Signed-off-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|\ \ \
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
Pull i2c fix from Wolfram Sang:
"An important bugfix for the I2C subsystem core"
* 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
Revert "i2c: core: Dispose OF IRQ mapping at client removal time"
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
This reverts commit e4df3a0b6228
("i2c: core: Dispose OF IRQ mapping at client removal time")
Calling irq_dispose_mapping() will destroy the mapping and disassociate
the IRQ from the IRQ chip to which it belongs. Keeping it is OK, because
existent mappings are reused properly.
Also, this commit breaks drivers using devm* for IRQ management on
OF-based systems because devm* cleanup happens in device code, after
bus's remove() method returns.
Signed-off-by: Jakub Kicinski <kubakici@wp.pl>
Reported-by: Sébastien Szymanski <sebastien.szymanski@armadeus.com>
Acked-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Acked-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
[wsa: updated the commit message with findings fromt the other bug report]
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Cc: stable@kernel.org
Fixes: e4df3a0b6228
|
|\ \ \ \
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci
Pull PCI fixes from Bjorn Helgaas:
"Here are a couple updates for v4.0.
One fixes a config accessor problem on APM X-Gene that we introduced
when switching to generic config accessors, and the other fixes an
older read-past-end-of-buffer problem in sysfs.
APM X-Gene host bridge driver
- Add register offset to config space base address (Feng Kan)
Miscellaneous
- Don't read past the end of sysfs "driver_override" buffer (Sasha Levin)"
* tag 'pci-v4.0-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
PCI: xgene: Add register offset to config space base address
PCI: Don't read past the end of sysfs "driver_override" buffer
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
In xgene_pcie_map_bus(), we neglected to add in the register offset when
calculating the config space address. This means all config accesses
operated on the first four bytes of config space.
Add the register offset to the config space base address.
Also correct the xgene_pcie_map_bus() prototype to fix a compiler warning.
[bhelgaas: changelog]
Fixes: 350f8be5bb40 ("PCI: xgene: Convert to use generic config accessors")
Posting: http://lkml.kernel.org/r/1424214840-26498-1-git-send-email-fkan@apm.com
Signed-off-by: Feng Kan <fkan@apm.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Tanmay Inamdar <tinamdar@apm.com>
Acked-by: Rob Herring <robh@kernel.org>
|
| | |_|/
| |/| |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
When printing the driver_override parameter when it is 4095 and 4094 bytes
long, the printing code would access invalid memory because we need count+1
bytes for printing.
Fixes: 782a985d7af2 ("PCI: Introduce new device binding path using pci_dev.driver_override")
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Alex Williamson <alex.williamson@redhat.com>
CC: stable@vger.kernel.org # v3.16+
CC: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
CC: Alexander Graf <agraf@suse.de>
CC: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|\ \ \ \
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Pull arch/microblaze fixes from Michal Simek:
"Fix syscall error recovery.
Two patches - one is just preparation patch for the second which is
fixing the problem with syscalls"
* tag 'microblaze-4.0-rc4' of git://git.monstr.eu/linux-2.6-microblaze:
microblaze: Fix syscall error recovery for invalid syscall IDs
microblaze: Coding style cleanup
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
This patch fixes two bugs in the Microblaze syscall trap handler when an invalid
syscall ID is used.
First, the range check on line 351 only checks for syscall IDs greater than
__NR_syscalls. A negative syscall ID (either passed to `syscall()` or as returned
by `do_syscall_trace_enter()` on error) will still satisfy this test and cause
the Linux kernel to access an invalid memory location and cause a kernel oops.
This has been fixed by also checking for r12 < 0.
Secondly, the current error recovery at line 378 returns using the wrong register
(r15 instead of r14) and does not restore the previous stack state. This has been
fixed by invoking `ret_from_trap` on error, setting r3 to `-ENOSYS`, similar to
what would happen when calling a valid syscall.
Signed-off-by: Jamie Garside <jamie.garside@york.ac.uk>
Signed-off-by: Michal Simek <michal.simek@xilinx.com>
|
| | |_|/
| |/| |
| | | |
| | | |
| | | |
| | | | |
No function change.
Signed-off-by: Michal Simek <michal.simek@xilinx.com>
|
|\ \ \ \
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Pull arch/nios2 fix from Ley Foon Tan:
"Remove pt_regs from user header and use generic ucontext.h"
* tag 'nios2-fix-4.0-rc4' of git://git.rocketboards.org/linux-socfpga-next:
nios2: update pt_regs
|
| | |_|/
| |/| |
| | | |
| | | |
| | | |
| | | |
| | | | |
Remove struct pt_regs from user header and use generic ucontext.h.
Signed-off-by: Chung-Ling Tang <cltang@codesourcery.com>
Acked-by: Ley Foon Tan <lftan@altera.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Dave Chinner reported that commit 4d9424669946 ("mm: convert
p[te|md]_mknonnuma and remaining page table manipulations") slowed down
his xfsrepair test enormously. In particular, it was using more system
time due to extra TLB flushing.
The ultimate reason turns out to be how the change to use the regular
page table accessor functions broke the NUMA grouping logic. The old
special mknuma/mknonnuma code accessed the page table present bit and
the magic NUMA bit directly, while the new code just changes the page
protections using PROT_NONE and the regular vma protections.
That sounds equivalent, and from a fault standpoint it really is, but a
subtle side effect is that the *other* protection bits of the page table
entries also change. And the code to decide how to group the NUMA
entries together used the writable bit to decide whether a particular
page was likely to be shared read-only or not.
And with the change to make the NUMA handling use the regular permission
setting functions, that writable bit was basically always cleared for
private mappings due to COW. So even if the page actually ends up being
written to in the end, the NUMA balancing would act as if it was always
shared RO.
This code is a heuristic anyway, so the fix - at least for now - is to
instead check whether the page is dirty rather than writable. The bit
doesn't change with protection changes.
NOTE! This also adds a FIXME comment to revisit this issue,
Not only should we probably re-visit the whole "is this a shared
read-only page" heuristic (we might want to take the vma permissions
into account and base this more on those than the per-page ones, and
also look at whether the particular access that triggers it is a write
or not), but the whole COW issue shows that we should think about the
NUMA fault handling some more.
For example, maybe we should do the early-COW thing that a regular fault
does. Or maybe we should accept that while using the same bits as
PROTNONE was a good thing (and got rid of the specual NUMA bit), we
might still want to just preseve the other protection bits across NUMA
faulting.
Those are bigger questions, left for later. This just fixes up the
heuristic so that it at least approximates working again. More analysis
and work needed.
Reported-by: Dave Chinner <david@fromorbit.com>
Tested-by: Mel Gorman <mgorman@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>,
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|\ \ \ \
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Pull MTD fixes from Brian Norris:
* pxa3xx_nand
- fix timeout issues when draining the FIFO (BCH only)
- don't crash when no chip-selects are used
* hisi504_nand
- depend on HAS_DMA, to fix compile errors
* tag 'for-linus-20150310' of git://git.infradead.org/linux-mtd:
mtd: nand: MTD_NAND_HISI504 should depend on HAS_DMA
mtd: pxa3xx_nand: fix driver when num_cs is 0
mtd: nand: pxa3xx: Fix PIO FIFO draining
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
If NO_DMA=y:
drivers/built-in.o: In function `hisi_nfc_probe':
hisi504_nand.c:(.text+0x23e646): undefined reference to `dmam_alloc_coherent'
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Brian Norris <computersforpeace@gmail.com>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
As the devicetree binding doesn't require num_cs to exist or be strictly
positive, and neither does the platform data case, a bug appear when
num_cs is set to 0 and panics the kernel.
The issue is that in alloc_nand_resource(), chip is dereferenced without
having a value assigned when num_cs == 0.
Fix this by returning ENODEV is num_cs == 0.
The panic seen is :
Unable to handle kernel NULL pointer dereference at virtual address 000002b8
pgd = c0004000
[000002b8] *pgd=00000000
Internal error: Oops: 5 [#1] PREEMPT ARM
Modules linked in:
Hardware name: Marvell PXA3xx (Device Tree Support)
task: c3822aa0 ti: c3826000 task.ti: c3826000
PC is at alloc_nand_resource+0x180/0x4a8
LR is at alloc_nand_resource+0xa0/0x4a8
pc : [<c0275b90>] lr : [<c0275ab0>] psr: 68000013
sp : c3827d90 ip : 00000000 fp : 00000000
r10: c3862200 r9 : 0000005e r8 : 00000000
r7 : c3865610 r6 : c3862210 r5 : c3924210 r4 : c3862200
r3 : 00000000 r2 : 00000000 r1 : 00000000 r0 : 00000000
Flags: nZCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment kernel
Control: 0000397f Table: 80004018 DAC: 00000035
Process swapper (pid: 1, stack limit = 0xc3826198)
Stack: (0xc3827d90 to 0xc3828000)
...zip...
[<c0275b90>] (alloc_nand_resource) from [<c0275ff8>] (pxa3xx_nand_probe+0x140/0x978)
[<c0275ff8>] (pxa3xx_nand_probe) from [<c0258c40>] (platform_drv_probe+0x48/0xa4)
[<c0258c40>] (platform_drv_probe) from [<c0257650>] (driver_probe_device+0x80/0x21c)
[<c0257650>] (driver_probe_device) from [<c0257878>] (__driver_attach+0x8c/0x90)
[<c0257878>] (__driver_attach) from [<c0255ec4>] (bus_for_each_dev+0x58/0x88)
[<c0255ec4>] (bus_for_each_dev) from [<c0256ec8>] (bus_add_driver+0xd8/0x1d4)
[<c0256ec8>] (bus_add_driver) from [<c0257f14>] (driver_register+0x78/0xf4)
[<c0257f14>] (driver_register) from [<c00088a8>] (do_one_initcall+0x80/0x1e4)
[<c00088a8>] (do_one_initcall) from [<c048ed08>] (kernel_init_freeable+0xec/0x1b4)
[<c048ed08>] (kernel_init_freeable) from [<c0377d8c>] (kernel_init+0x8/0xe4)
[<c0377d8c>] (kernel_init) from [<c00095f8>] (ret_from_fork+0x14/0x3c)
Code: e503b234 e5953008 e1530001 caffffd1 (e59002b8)
---[ end trace a5770060c8441895 ]---
Signed-off-by: Robert Jarzmik <robert.jarzmik@free.fr>
Acked-by: Ezequiel Garcia <ezequiel.garcia@free-electrons.com>
Signed-off-by: Brian Norris <computersforpeace@gmail.com>
|
| | |_|/
| |/| |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
The NDDB register holds the data that are needed by the read and write
commands.
However, during a read PIO access, the datasheet specifies that after each 32
bytes read in that register, when BCH is enabled, we have to make sure that the
RDDREQ bit is set in the NDSR register.
This fixes an issue that was seen on the Armada 385, and presumably other mvebu
SoCs, when a read on a newly erased page would end up in the driver reporting a
timeout from the NAND.
Cc: <stable@vger.kernel.org> # v3.14
Signed-off-by: Maxime Ripard <maxime.ripard@free-electrons.com>
Reviewed-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Acked-by: Ezequiel Garcia <ezequiel.garcia@free-electrons.com>
Signed-off-by: Brian Norris <computersforpeace@gmail.com>
|
|\ \ \ \
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu
Pull iommu fixes from Joerg Roedel:
"The patches contain:
- fix multiple ARM IOMMU drivers to behave well when the hardware is
not present
- mark MSM driver as broken
- fix build errors with the new ARM generic io-page-table code"
* tag 'iommu-fixes-v4.0-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
iommu/io-pgtable-arm: Add built time dependency
iommu/msm: Mark driver BROKEN
iommu/rockchip: Play nice in multi-platform builds
iommu/omap: Play nice in multi-platform builds
iommu/exynos: Play nice in multi-platform builds
iommu/io-pgtable-arm: Fix self-test WARNs on i386
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
If io-pgtable-arm is an ARM-specific driver then configuration option
IOMMU_IO_PGTABLE_LPAE should not be presented to the user by default
for non-ARM kernels.
Signed-off-by: Jean Delvare <jdelvare@suse.de>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Joerg Roedel <joro@8bytes.org>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
The MSM IOMMU driver unconditionally calls bus_set_iommu(), which is a
very stupid thing to do on multi-platform kernels. While marking the
driver BROKEN may seem a little extreme, there is no other way to make
the driver skip initialization. One of the problems is that it doesn't
have devicetree binding documentation and the driver doesn't contain a
struct of_device_id table either, so no way to check that it is indeed
valid to set up the IOMMU operations for this driver.
This fixes a problem on Tegra20 where the DRM driver will try to use the
obviously non-existent MSM IOMMU.
Marking the driver BROKEN shouldn't do any harm, since there aren't any
users currently. There is no struct of_device_id table, so the device
can't be instantiated from device tree, and I couldn't find any code
that would instantiate a matching platform_device either, so the driver
is effectively unused.
Reported-by: Nicolas Chauvet <kwizart@gmail.com>
Cc: David Brown <davidb@codeaurora.org>
Cc: Daniel Walker <dwalker@fifo99.com>
Cc: Bryan Huntsman <bryanh@codeaurora.org>
Cc: Olav Haugan <ohaugan@codeaurora.org>
Acked-by: Rob Clark <robdclark@gmail.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|