| Commit message (Collapse) | Author | Age |
|
|
|
|
|
|
| |
Add stacktrace support for User Mode Linux
Signed-off-by: Daniel Walter <dwalter@google.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Starting with Linux 3.12 processes get stuck in D state forever in
UserModeLinux under sync heavy workloads. This bug was introduced by
commit 805f11a0d5 (um: ubd: Add REQ_FLUSH suppport).
Fix bug by adding a check if FLUSH request was successfully submitted to
the I/O thread and keeping the FLUSH request on the request queue on
submission failures.
Fixes: 805f11a0d5 (um: ubd: Add REQ_FLUSH suppport)
Signed-off-by: Thorsten Knabe <linux@thorsten-knabe.de>
Cc: stable@kernel.org # >= 3.12
Signed-off-by: Richard Weinberger <richard@nod.at>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
1) uml kernel bootmem managed through bootmem_data->node_bootmem_map,
not the struct page array, so the array is unnecessary.
2) the bootmem struct page array has been pointed by a *local* pointer,
struct page *map, in init_maps function. The array can be accessed only
in init_maps's scope. As a result, uml kernel wastes about 1% of total
memory.
Signed-off-by: Honggang Li <enjoymindful@gmail.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
|
|
|
|
|
|
|
|
|
|
|
| |
arch/x86/um/checksum_32.S had been copy & paste from x86. When build
x86 uml, csum_partial_copy_generic_i386 mess up the exception table.
In fact, exception table dose not work in uml kernel.
And csum_partial_copy_generic_i386 never been called. So, delete it.
Signed-off-by: Honggang Li <enjoymindful@gmail.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
|
|\
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu
Pull percpu updates from Tejun Heo:
"A lot of activities on percpu front. Notable changes are...
- percpu allocator now can take @gfp. If @gfp doesn't contain
GFP_KERNEL, it tries to allocate from what's already available to
the allocator and a work item tries to keep the reserve around
certain level so that these atomic allocations usually succeed.
This will replace the ad-hoc percpu memory pool used by
blk-throttle and also be used by the planned blkcg support for
writeback IOs.
Please note that I noticed a bug in how @gfp is interpreted while
preparing this pull request and applied the fix 6ae833c7fe0c
("percpu: fix how @gfp is interpreted by the percpu allocator")
just now.
- percpu_ref now uses longs for percpu and global counters instead of
ints. It leads to more sparse packing of the percpu counters on
64bit machines but the overhead should be negligible and this
allows using percpu_ref for refcnting pages and in-memory objects
directly.
- The switching between percpu and single counter modes of a
percpu_ref is made independent of putting the base ref and a
percpu_ref can now optionally be initialized in single or killed
mode. This allows avoiding percpu shutdown latency for cases where
the refcounted objects may be synchronously created and destroyed
in rapid succession with only a fraction of them reaching fully
operational status (SCSI probing does this when combined with
blk-mq support). It's also planned to be used to implement forced
single mode to detect underflow more timely for debugging.
There's a separate branch percpu/for-3.18-consistent-ops which cleans
up the duplicate percpu accessors. That branch causes a number of
conflicts with s390 and other trees. I'll send a separate pull
request w/ resolutions once other branches are merged"
* 'for-3.18' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: (33 commits)
percpu: fix how @gfp is interpreted by the percpu allocator
blk-mq, percpu_ref: start q->mq_usage_counter in atomic mode
percpu_ref: make INIT_ATOMIC and switch_to_atomic() sticky
percpu_ref: add PERCPU_REF_INIT_* flags
percpu_ref: decouple switching to percpu mode and reinit
percpu_ref: decouple switching to atomic mode and killing
percpu_ref: add PCPU_REF_DEAD
percpu_ref: rename things to prepare for decoupling percpu/atomic mode switch
percpu_ref: replace pcpu_ prefix with percpu_
percpu_ref: minor code and comment updates
percpu_ref: relocate percpu_ref_reinit()
Revert "blk-mq, percpu_ref: implement a kludge for SCSI blk-mq stall during probe"
Revert "percpu: free percpu allocation info for uniprocessor system"
percpu-refcount: make percpu_ref based on longs instead of ints
percpu-refcount: improve WARN messages
percpu: fix locking regression in the failure path of pcpu_alloc()
percpu-refcount: add @gfp to percpu_ref_init()
proportions: add @gfp to init functions
percpu_counter: add @gfp to percpu_counter_init()
percpu_counter: make percpu_counters_lock irq-safe
...
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
When @gfp is specified, the percpu allocator is interested in whether
it contains all of GFP_KERNEL or not. If it does, the normal
allocation path is taken; otherwise, the atomic allocation path.
Unfortunately, pcpu_alloc() was incorrectly testing for whether @gfp
contains any part of GFP_KERNEL.
Fix it by testing "(gfp & GFP_KERNEL) != GFP_KERNEL" instead of
"!(gfp & GFP_KERNEL)" to decide whether the allocation should be
atomic or not.
Signed-off-by: Tejun Heo <tj@kernel.org>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
blk-mq uses percpu_ref for its usage counter which tracks the number
of in-flight commands and used to synchronously drain the queue on
freeze. percpu_ref shutdown takes measureable wallclock time as it
involves a sched RCU grace period. This means that draining a blk-mq
takes measureable wallclock time. One would think that this shouldn't
matter as queue shutdown should be a rare event which takes place
asynchronously w.r.t. userland.
Unfortunately, SCSI probing involves synchronously setting up and then
tearing down a lot of request_queues back-to-back for non-existent
LUNs. This means that SCSI probing may take above ten seconds when
scsi-mq is used.
[ 0.949892] scsi host0: Virtio SCSI HBA
[ 1.007864] scsi 0:0:0:0: Direct-Access QEMU QEMU HARDDISK 1.1. PQ: 0 ANSI: 5
[ 1.021299] scsi 0:0:1:0: Direct-Access QEMU QEMU HARDDISK 1.1. PQ: 0 ANSI: 5
[ 1.520356] tsc: Refined TSC clocksource calibration: 2491.910 MHz
<stall>
[ 16.186549] sd 0:0:0:0: Attached scsi generic sg0 type 0
[ 16.190478] sd 0:0:1:0: Attached scsi generic sg1 type 0
[ 16.194099] osd: LOADED open-osd 0.2.1
[ 16.203202] sd 0:0:0:0: [sda] 31457280 512-byte logical blocks: (16.1 GB/15.0 GiB)
[ 16.208478] sd 0:0:0:0: [sda] Write Protect is off
[ 16.211439] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[ 16.218771] sd 0:0:1:0: [sdb] 31457280 512-byte logical blocks: (16.1 GB/15.0 GiB)
[ 16.223264] sd 0:0:1:0: [sdb] Write Protect is off
[ 16.225682] sd 0:0:1:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
This is also the reason why request_queues start in bypass mode which
is ended on blk_register_queue() as shutting down a fully functional
queue also involves a RCU grace period and the queues for non-existent
SCSI devices never reach registration.
blk-mq basically needs to do the same thing - start the mq in a
degraded mode which is faster to shut down and then make it fully
functional only after the queue reaches registration. percpu_ref
recently grew facilities to force atomic operation until explicitly
switched to percpu mode, which can be used for this purpose. This
patch makes blk-mq initialize q->mq_usage_counter in atomic mode and
switch it to percpu mode only once blk_register_queue() is reached.
Note that this issue was previously worked around by 0a30288da1ae
("blk-mq, percpu_ref: implement a kludge for SCSI blk-mq stall during
probe") for v3.17. The temp fix was reverted in preparation of adding
persistent atomic mode to percpu_ref by 9eca80461a45 ("Revert "blk-mq,
percpu_ref: implement a kludge for SCSI blk-mq stall during probe"").
This patch and the prerequisite percpu_ref changes will be merged
during v3.18 devel cycle.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Christoph Hellwig <hch@infradead.org>
Link: http://lkml.kernel.org/g/20140919113815.GA10791@lst.de
Fixes: add703fda981 ("blk-mq: use percpu_ref for mq usage count")
Reviewed-by: Kent Overstreet <kmo@daterainc.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Johannes Weiner <hannes@cmpxchg.org>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Currently, a percpu_ref which is initialized with
PERPCU_REF_INIT_ATOMIC or switched to atomic mode via
switch_to_atomic() automatically reverts to percpu mode on the first
percpu_ref_reinit(). This makes the atomic mode difficult to use for
cases where a percpu_ref is used as a persistent on/off switch which
may be cycled multiple times.
This patch makes such atomic state sticky so that it survives through
kill/reinit cycles. After this patch, atomic state is cleared only by
an explicit percpu_ref_switch_to_percpu() call.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Kent Overstreet <kmo@daterainc.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
With the recent addition of percpu_ref_reinit(), percpu_ref now can be
used as a persistent switch which can be turned on and off repeatedly
where turning off maps to killing the ref and waiting for it to drain;
however, there currently isn't a way to initialize a percpu_ref in its
off (killed and drained) state, which can be inconvenient for certain
persistent switch use cases.
Similarly, percpu_ref_switch_to_atomic/percpu() allow dynamic
selection of operation mode; however, currently a newly initialized
percpu_ref is always in percpu mode making it impossible to avoid the
latency overhead of switching to atomic mode.
This patch adds @flags to percpu_ref_init() and implements the
following flags.
* PERCPU_REF_INIT_ATOMIC : start ref in atomic mode
* PERCPU_REF_INIT_DEAD : start ref killed and drained
These flags should be able to serve the above two use cases.
v2: target_core_tpg.c conversion was missing. Fixed.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Kent Overstreet <kmo@daterainc.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
percpu_ref has treated the dropping of the base reference and
switching to atomic mode as an integral operation; however, there's
nothing inherent tying the two together.
The use cases for percpu_ref have been expanding continuously. While
the current init/kill/reinit/exit model can cover a lot, the coupling
of kill/reinit with atomic/percpu mode switching is turning out to be
too restrictive for use cases where many percpu_refs are created and
destroyed back-to-back with only some of them reaching extended
operation. The coupling also makes implementing always-atomic debug
mode difficult.
This patch separates out percpu mode switching into
percpu_ref_switch_to_percpu() and reimplements percpu_ref_reinit() on
top of it.
* DEAD still requires ATOMIC. A dead ref can't be switched to percpu
mode w/o going through reinit.
v2: __percpu_ref_switch_to_percpu() was missing static. Fixed.
Reported by Fengguang aka kbuild test robot.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Kent Overstreet <kmo@daterainc.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: kbuild test robot <fengguang.wu@intel.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
percpu_ref has treated the dropping of the base reference and
switching to atomic mode as an integral operation; however, there's
nothing inherent tying the two together.
The use cases for percpu_ref have been expanding continuously. While
the current init/kill/reinit/exit model can cover a lot, the coupling
of kill/reinit with atomic/percpu mode switching is turning out to be
too restrictive for use cases where many percpu_refs are created and
destroyed back-to-back with only some of them reaching extended
operation. The coupling also makes implementing always-atomic debug
mode difficult.
This patch separates out atomic mode switching into
percpu_ref_switch_to_atomic() and reimplements
percpu_ref_kill_and_confirm() on top of it.
* The handling of __PERCPU_REF_ATOMIC and __PERCPU_REF_DEAD is now
differentiated. Among get/put operations, percpu_ref_tryget_live()
is the only one which cares about DEAD.
* percpu_ref_switch_to_atomic() can be called multiple times on the
same ref. This means that multiple @confirm_switch may get queued
up which we can't do reliably without extra memory area. This is
handled by making the later invocation synchronously wait for the
completion of the previous one. This isn't particularly desirable
but such synchronous waits shouldn't happen in most cases.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Kent Overstreet <kmo@daterainc.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
percpu_ref will be restructured so that percpu/atomic mode switching
and reference killing are dedoupled. In preparation, add
PCPU_REF_DEAD and PCPU_REF_ATOMIC_DEAD which is OR of ATOMIC and DEAD.
For now, ATOMIC and DEAD are changed together and all PCPU_REF_ATOMIC
uses are converted to PCPU_REF_ATOMIC_DEAD without causing any
behavior changes.
percpu_ref_init() now specifies an explicit alignment when allocating
the percpu counters so that the pointer has enough unused low bits to
accomodate the flags. Note that one flag was fine as min alignment
for percpu memory is 2 bytes but two flags are already too many for
the natural alignment of unsigned longs on archs like cris and m68k.
v2: The original patch had BUILD_BUG_ON() which triggers if unsigned
long's alignment isn't enough to accomodate the flags, which
triggered on cris and m64k. percpu_ref_init() updated to specify
the required alignment explicitly. Reported by Fengguang.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Kent Overstreet <kmo@daterainc.com>
Cc: kbuild test robot <fengguang.wu@intel.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
percpu_ref will be restructured so that percpu/atomic mode switching
and reference killing are dedoupled. In preparation, do the following
renames.
* percpu_ref->confirm_kill -> percpu_ref->confirm_switch
* __PERCPU_REF_DEAD -> __PERCPU_REF_ATOMIC
* __percpu_ref_alive() -> __ref_is_percpu()
This patch is pure rename and doesn't introduce any functional
changes.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Kent Overstreet <kmo@daterainc.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
percpu_ref uses pcpu_ prefix for internal stuff and percpu_ for
externally visible ones. This is the same convention used in the
percpu allocator implementation. It works fine there but percpu_ref
doesn't have too much internal-only stuff and scattered usages of
pcpu_ prefix are confusing than helpful.
This patch replaces all pcpu_ prefixes with percpu_. This is pure
rename and there's no functional change. Note that PCPU_REF_DEAD is
renamed to __PERCPU_REF_DEAD to signify that the flag is internal.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Kent Overstreet <kmo@daterainc.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
* Some comments became stale. Updated.
* percpu_ref_tryget() unnecessarily initializes @ret. Removed.
* A blank line removed from percpu_ref_kill_rcu().
* Explicit function name in a WARN format string replaced with __func__.
* WARN_ON() in percpu_ref_reinit() converted to WARN_ON_ONCE().
Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Kent Overstreet <kmo@daterainc.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
percpu_ref is gonna go through restructuring. Move
percpu_ref_reinit() after percpu_ref_kill_and_confirm(). This will
make later changes easier to follow and result in cleaner
organization.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Kent Overstreet <kmo@daterainc.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
probe"
This reverts commit 0a30288da1aec914e158c2d7a3482a85f632750f, which
was a temporary fix for SCSI blk-mq stall issue. The following
patches will fix the issue properly by introducing atomic mode to
percpu_ref.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Kent Overstreet <kmo@daterainc.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Christoph Hellwig <hch@lst.de>
|
| |\
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-block into for-3.18
This is to receive 0a30288da1ae ("blk-mq, percpu_ref: implement a
kludge for SCSI blk-mq stall during probe") which implements
__percpu_ref_kill_expedited() to work around SCSI blk-mq stall. The
commit reverted and patches to implement proper fix will be added.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Kent Overstreet <kmo@daterainc.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Christoph Hellwig <hch@lst.de>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
This reverts commit 3189eddbcafc ("percpu: free percpu allocation info for
uniprocessor system").
The commit causes a hang with a crisv32 image. This may be an architecture
problem, but at least for now the revert is necessary to be able to boot a
crisv32 image.
Cc: Tejun Heo <tj@kernel.org>
Cc: Honggang Li <enjoymindful@gmail.com>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Tejun Heo <tj@kernel.org>
Fixes: 3189eddbcafc ("percpu: free percpu allocation info for uniprocessor system")
Cc: stable@vger.kernel.org # Please don't apply 3189eddbcafc
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
percpu_ref is currently based on ints and the number of refs it can
cover is (1 << 31). This makes it impossible to use a percpu_ref to
count memory objects or pages on 64bit machines as it may overflow.
This forces those users to somehow aggregate the references before
contributing to the percpu_ref which is often cumbersome and sometimes
challenging to get the same level of performance as using the
percpu_ref directly.
While using ints for the percpu counters makes them pack tighter on
64bit machines, the possible gain from using ints instead of longs is
extremely small compared to the overall gain from per-cpu operation.
This patch makes percpu_ref based on longs so that it can be used to
directly count memory objects or pages.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Kent Overstreet <kmo@daterainc.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
percpu_ref's WARN messages can be a lot more helpful by indicating
who's the culprit. Make them report the release function that the
offending percpu-refcount is associated with. This should make it a
lot easier to track down the reported invalid refcnting operations.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Kent Overstreet <kmo@daterainc.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
While updating locking, b38d08f3181c ("percpu: restructure locking")
broke pcpu_create_chunk() creation path in pcpu_alloc(). It returns
without releasing pcpu_alloc_mutex. Fix it.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Julia Lawall <julia.lawall@lip6.fr>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Percpu allocator now supports allocation mask. Add @gfp to
percpu_ref_init() so that !GFP_KERNEL allocation masks can be used
with percpu_refs too.
This patch doesn't make any functional difference.
v2: blk-mq conversion was missing. Updated.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Kent Overstreet <koverstreet@google.com>
Cc: Benjamin LaHaise <bcrl@kvack.org>
Cc: Li Zefan <lizefan@huawei.com>
Cc: Nicholas A. Bellinger <nab@linux-iscsi.org>
Cc: Jens Axboe <axboe@kernel.dk>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Percpu allocator now supports allocation mask. Add @gfp to
[flex_]proportions init functions so that !GFP_KERNEL allocation masks
can be used with them too.
This patch doesn't make any functional difference.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Peter Zijlstra <peterz@infradead.org>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Percpu allocator now supports allocation mask. Add @gfp to
percpu_counter_init() so that !GFP_KERNEL allocation masks can be used
with percpu_counters too.
We could have left percpu_counter_init() alone and added
percpu_counter_init_gfp(); however, the number of users isn't that
high and introducing _gfp variants to all percpu data structures would
be quite ugly, so let's just do the conversion. This is the one with
the most users. Other percpu data structures are a lot easier to
convert.
This patch doesn't make any functional difference.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Jan Kara <jack@suse.cz>
Acked-by: "David S. Miller" <davem@davemloft.net>
Cc: x86@kernel.org
Cc: Jens Axboe <axboe@kernel.dk>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
percpu_counter is scheduled to grow @gfp support to allow atomic
initialization. This patch makes percpu_counters_lock irq-safe so
that it can be safely used from atomic contexts.
Signed-off-by: Tejun Heo <tj@kernel.org>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
The percpu allocator now supports atomic allocations by only
allocating from already populated areas but the mechanism to ensure
that there's adequate amount of populated areas was missing.
This patch expands pcpu_balance_work so that in addition to freeing
excess free chunks it also populates chunks to maintain an adequate
level of populated areas. pcpu_alloc() schedules pcpu_balance_work if
the amount of free populated areas is too low or after an atomic
allocation failure.
* PERPCU_DYNAMIC_RESERVE is increased by two pages to account for
PCPU_EMPTY_POP_PAGES_LOW.
* pcpu_async_enabled is added to gate both async jobs -
chunk->map_extend_work and pcpu_balance_work - so that we don't end
up scheduling them while the needed subsystems aren't up yet.
Signed-off-by: Tejun Heo <tj@kernel.org>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
pcpu_reclaim_work will also be used to populate chunks asynchronously.
Rename it to pcpu_balance_work in preparation. pcpu_reclaim() is
renamed to pcpu_balance_workfn() and some of its local variables are
renamed too.
This is pure rename.
Signed-off-by: Tejun Heo <tj@kernel.org>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
pcpu_nr_empty_pop_pages counts the number of empty populated pages
across all chunks and chunk->nr_populated counts the number of
populated pages in a chunk. Both will be used to implement pre/async
population for atomic allocations.
pcpu_chunk_[de]populated() are added to update chunk->populated,
chunk->nr_populated and pcpu_nr_empty_pop_pages together. All
successful chunk [de]populations should be followed by the
corresponding pcpu_chunk_[de]populated() calls.
Signed-off-by: Tejun Heo <tj@kernel.org>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
An allocation attempt may require extending chunk->map array which
requires GFP_KERNEL context which isn't available for atomic
allocations. This patch ensures that chunk->map array usually keeps
some amount of available space by directly allocating buffer space
during GFP_KERNEL allocations and scheduling async extension during
atomic ones. This should make atomic allocation failures from map
space exhaustion rare.
Signed-off-by: Tejun Heo <tj@kernel.org>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Now that pcpu_alloc_area() can allocate only from populated areas,
it's easy to add atomic allocation support to [__]alloc_percpu().
Update pcpu_alloc() so that it accepts @gfp and skips all the blocking
operations and allocates only from the populated areas if @gfp doesn't
contain GFP_KERNEL. New interface functions [__]alloc_percpu_gfp()
are added.
While this means that atomic allocations are possible, this isn't
complete yet as there's no mechanism to ensure that certain amount of
populated areas is kept available and atomic allocations may keep
failing under certain conditions.
Signed-off-by: Tejun Heo <tj@kernel.org>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
The next patch will conditionalize the population block in
pcpu_alloc() which will end up making a rather large indentation
change obfuscating the actual logic change. This patch puts the block
under "if (true)" so that the next patch can avoid indentation
changes. The defintions of the local variables which are used only in
the block are moved into the block.
This patch is purely cosmetic.
Signed-off-by: Tejun Heo <tj@kernel.org>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Update pcpu_alloc_area() so that it can skip unpopulated areas if the
new parameter @pop_only is true. This is implemented by a new
function, pcpu_fit_in_area(), which determines the amount of head
padding considering the alignment and populated state.
@pop_only is currently always false but this will be used to implement
atomic allocation.
Signed-off-by: Tejun Heo <tj@kernel.org>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
At first, the percpu allocator required a sleepable context for both
alloc and free paths and used pcpu_alloc_mutex to protect everything.
Later, pcpu_lock was introduced to protect the index data structure so
that the free path can be invoked from atomic contexts. The
conversion only updated what's necessary and left most of the
allocation path under pcpu_alloc_mutex.
The percpu allocator is planned to add support for atomic allocation
and this patch restructures locking so that the coverage of
pcpu_alloc_mutex is further reduced.
* pcpu_alloc() now grab pcpu_alloc_mutex only while creating a new
chunk and populating the allocated area. Everything else is now
protected soley by pcpu_lock.
After this change, multiple instances of pcpu_extend_area_map() may
race but the function already implements sufficient synchronization
using pcpu_lock.
This also allows multiple allocators to arrive at new chunk
creation. To avoid creating multiple empty chunks back-to-back, a
new chunk is created iff there is no other empty chunk after
grabbing pcpu_alloc_mutex.
* pcpu_lock is now held while modifying chunk->populated bitmap.
After this, all data structures are protected by pcpu_lock.
Signed-off-by: Tejun Heo <tj@kernel.org>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
percpu-km instantiates the whole chunk on creation and doesn't make
use of chunk->populated bitmap and leaves it as zero. While this
currently doesn't cause any problem, the inconsistency makes it
difficult to build further logic on top of chunk->populated. This
patch makes percpu-km fill chunk->populated on creation so that the
bitmap is always consistent.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Christoph Lameter <cl@linux.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Previously, pcpu_[de]populate_chunk() were called with the range which
may contain multiple target regions in it and
pcpu_[de]populate_chunk() iterated over the regions. This has the
benefit of batching up cache flushes for all the regions; however,
we're planning to add more bookkeeping logic around [de]population to
support atomic allocations and this delegation of iterations gets in
the way.
This patch moves the region iterations out of
pcpu_[de]populate_chunk() into its callers - pcpu_alloc() and
pcpu_reclaim() - so that we can later add logic to track more states
around them. This change may make cache and tlb flushes more frequent
but multi-region [de]populations are rare anyway and if this actually
becomes a problem, it's not difficult to factor out cache flushes as
separate callbacks which are directly invoked from percpu.c.
Signed-off-by: Tejun Heo <tj@kernel.org>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
percpu-vm and percpu-km implement separate versions of
pcpu_[de]populate_chunk() and some part which is or should be common
are currently in the specific implementations. Make the following
changes.
* Allocate area clearing is moved from the pcpu_populate_chunk()
implementations to pcpu_alloc(). This makes percpu-km's version
noop.
* Quick exit tests in pcpu_[de]populate_chunk() of percpu-vm are moved
to their respective callers so that they are applied to percpu-km
too. This doesn't make any meaningful difference as both functions
are noop for percpu-km; however, this is more consistent and will
help implementing atomic allocation support.
Signed-off-by: Tejun Heo <tj@kernel.org>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
pcpu_get_pages() creates the temp pages array if not already allocated
and returns the pointer to it. As the function is called from both
[de]population paths and depopulation can only happen after at least
one successful population, the param doesn't make any difference - the
allocation will always happen on the population path anyway.
Remove @may_alloc from pcpu_get_pages(). Also, add an lockdep
assertion pcpu_alloc_mutex instead of vaguely stating that the
exclusion is the caller's responsibility.
Signed-off-by: Tejun Heo <tj@kernel.org>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
percpu-vm uses pcpu_get_pages_and_bitmap() to acquire temp pages array
and populated bitmap and uses the two during [de]population. The temp
bitmap is used only to build the new bitmap that is copied to
chunk->populated after the operation succeeds; however, the new bitmap
can be trivially set after success without using the temp bitmap.
This patch removes the temp populated bitmap usage from percpu-vm.c.
* pcpu_get_pages_and_bitmap() is renamed to pcpu_get_pages() and no
longer hands out the temp bitmap.
* @populated arugment is dropped from all the related functions.
@populated updates in pcpu_[un]map_pages() are dropped.
* Two loops in pcpu_map_pages() are merged.
* pcpu_[de]populated_chunk() modify chunk->populated bitmap directly
from @page_start and @page_end after success.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Christoph Lameter <cl@linux.com>
|
|\ \ \
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup updates from Tejun Heo:
"Nothing too interesting. Just a handful of cleanup patches"
* 'for-3.18' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
Revert "cgroup: remove redundant variable in cgroup_mount()"
cgroup: remove redundant variable in cgroup_mount()
cgroup: fix missing unlock in cgroup_release_agent()
cgroup: remove CGRP_RELEASABLE flag
perf/cgroup: Remove perf_put_cgroup()
cgroup: remove redundant check in cgroup_ino()
cpuset: simplify proc_cpuset_show()
cgroup: simplify proc_cgroup_show()
cgroup: use a per-cgroup work for release agent
cgroup: remove bogus comments
cgroup: remove redundant code in cgroup_rmdir()
cgroup: remove some useless forward declarations
cgroup: fix a typo in comment.
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
This reverts commit 0c7bf3e8cab7900e17ce7f97104c39927d835469.
If there are child cgroups in the cgroupfs and then we umount it,
the superblock will be destroyed but the cgroup_root will be kept
around. When we mount it again, cgroup_mount() will find this
cgroup_root and allocate a new sb for it.
So with this commit we will be trapped in a dead loop in the case
described above, because kernfs_pin_sb() keeps returning NULL.
Currently I don't see how we can avoid using both pinned_sb and
new_sb, so just revert it.
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Reported-by: Andrey Wagin <avagin@gmail.com>
Signed-off-by: Zefan Li <lizefan@huawei.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Both pinned_sb and new_sb indicate if a new superblock is needed,
so we can just remove new_sb.
Note now we must check if kernfs_tryget_sb() returns NULL, because
when it returns NULL, kernfs_mount() may still re-use an existing
superblock, which is just allocated by another concurent mount.
Suggested-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Zefan Li <lizefan@huawei.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
The patch 971ff4935538: "cgroup: use a per-cgroup work for release
agent" from Sep 18, 2014, leads to the following static checker
warning:
kernel/cgroup.c:5310 cgroup_release_agent()
warn: 'mutex:&cgroup_mutex' is sometimes locked here and sometimes unlocked.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Zefan Li <lizefan@huawei.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
We call put_css_set() after setting CGRP_RELEASABLE flag in
cgroup_task_migrate(), but in other places we call it without setting
the flag. I don't see the necessity of this flag.
Moreover once the flag is set, it will never be cleared, unless writing
to the notify_on_release control file, so it can be quite confusing
if we look at the output of debug.releasable.
# mount -t cgroup -o debug xxx /cgroup
# mkdir /cgroup/child
# cat /cgroup/child/debug.releasable
0 <-- shows 0 though the cgroup is empty
# echo $$ > /cgroup/child/tasks
# cat /cgroup/child/debug.releasable
0
# echo $$ > /cgroup/tasks && echo $$ > /cgroup/child/tasks
# cat /proc/child/debug.releasable
1 <-- shows 1 though the cgroup is not empty
This patch removes the flag, and now debug.releasable shows if the
cgroup is empty or not.
Signed-off-by: Zefan Li <lizefan@huawei.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Commit 5a17f543ed68 ("cgroup: improve css_from_dir() into css_tryget_from_dir()")
removed perf_tryget_cgroup(), so let's also remove perf_put_cgroup().
Signed-off-by: Zefan Li <lizefan@huawei.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
After we implemented default unified hierarchy, cgrp->kn can never
be NULL.
Signed-off-by: Zefan Li <lizefan@huawei.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Use the ONE macro instead of REG, and we can simplify proc_cpuset_show().
Signed-off-by: Zefan Li <lizefan@huawei.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Use the ONE macro instead of REG, and we can simplify proc_cgroup_show().
Signed-off-by: Zefan Li <lizefan@huawei.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Instead of using a global work to schedule release agent on removable
cgroups, we change to use a per-cgroup work to do this, which makes
the code much simpler.
v2: use a dedicated work instead of reusing css->destroy_work. (Tejun)
Signed-off-by: Zefan Li <lizefan@huawei.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
We never grab cgroup mutex in fork and exit paths no matter whether
notify_on_release is set or not.
Signed-off-by: Zefan Li <lizefan@huawei.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
|