| Commit message (Collapse) | Author | Age |
|\
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
* 'unicore32' of git://github.com/gxt/linux:
rtc-puv3: solve section mismatch in rtc-puv3.c
rtc-puv3: using module_platform_driver()
i2c-puv3: using module_platform_driver()
rtc-puv3: irq: remove IRQF_DISABLED
unicore32: Remove IRQF_DISABLED
unicore32: Use set_current_blocked()
unicore32: add ioremap_nocache definition
unicore32: delete specified xlate_dev_mem_ptr
of: add include asm/setup.h in drivers/of/fdt.c
unicore32: standardize /proc/iomem "Kernel code" name
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
The patch renames puv3_rtcdrv to puv3_rtc_driver, so that modpost will know
that this is simply a list of pointers to driver functions, in which case
the section mismatch is OK. (Thanks Michal Marek)
Cc: Axel Lin <axel.lin@gmail.com>
Cc: Michal Marek <mmarek@suse.cz>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Cc: rtc-linux@googlegroups.com
Signed-off-by: Guan Xuetao <gxt@mprc.pku.edu.cn>
--
Section mismatch warning information:
WARNING: drivers/rtc/built-in.o(.data+0x90): Section mismatch in
reference from the variable puv3_rtcdrv to the
function .devinit.text:puv3_rtc_probe()
The variable puv3_rtcdrv references
the function __devinit puv3_rtc_probe()
If the reference is valid then annotate the
variable with __init* or __refdata (see linux/init.h) or name the
variable:
*driver, *_template, *_timer, *_sht, *_ops, *_probe, *_probe_one,
*_console
WARNING: drivers/rtc/built-in.o(.data+0x94): Section mismatch in
reference from the variable puv3_rtcdrv to the
function .devexit.text:puv3_rtc_remove()
The variable puv3_rtcdrv references
the function __devexit puv3_rtc_remove()
If the reference is valid then annotate the
variable with __exit* (see linux/init.h) or name the variable:
*driver, *_template, *_timer, *_sht, *_ops, *_probe, *_probe_one,
*_console
WARNING: drivers/built-in.o(.data+0x6c04): Section mismatch in reference
from the variable puv3_rtcdrv to the
function .devinit.text:puv3_rtc_probe()
The variable puv3_rtcdrv references
the function __devinit puv3_rtc_probe()
If the reference is valid then annotate the
variable with __init* or __refdata (see linux/init.h) or name the
variable:
*driver, *_template, *_timer, *_sht, *_ops, *_probe, *_probe_one,
*_console
WARNING: drivers/built-in.o(.data+0x6c08): Section mismatch in reference
from the variable puv3_rtcdrv to the
function .devexit.text:puv3_rtc_remove()
The variable puv3_rtcdrv references
the function __devexit puv3_rtc_remove()
If the reference is valid then annotate the
variable with __exit* (see linux/init.h) or name the variable:
*driver, *_template, *_timer, *_sht, *_ops, *_probe, *_probe_one,
*_console
WARNING: vmlinux.o(.data+0x1126c): Section mismatch in reference from
the variable puv3_rtcdrv to the function .devinit.text:puv3_rtc_probe()
The variable puv3_rtcdrv references
the function __devinit puv3_rtc_probe()
If the reference is valid then annotate the
variable with __init* or __refdata (see linux/init.h) or name the
variable:
*driver, *_template, *_timer, *_sht, *_ops, *_probe, *_probe_one,
*_console
WARNING: vmlinux.o(.data+0x11270): Section mismatch in reference from
the variable puv3_rtcdrv to the function .devexit.text:puv3_rtc_remove()
The variable puv3_rtcdrv references
the function __devexit puv3_rtc_remove()
If the reference is valid then annotate the
variable with __exit* (see linux/init.h) or name the variable:
*driver, *_template, *_timer, *_sht, *_ops, *_probe, *_probe_one,
*_console
|
| |
| |
| |
| |
| |
| |
| |
| | |
This patch converts the driver to use the module_platform_driver()
macro which makes the code smaller and a bit simpler.
Signed-off-by: Axel Lin <axel.lin@gmail.com>
Signed-off-by: Guan Xuetao <gxt@mprc.pku.edu.cn>
|
| |
| |
| |
| |
| |
| |
| |
| | |
This patch converts the driver to use the module_platform_driver()
macro which makes the code smaller and a bit simpler.
Signed-off-by: Axel Lin <axel.lin@gmail.com>
Signed-off-by: Guan Xuetao <gxt@mprc.pku.edu.cn>
|
| |
| |
| |
| |
| |
| |
| | |
This flag is deprecated, so is removed now.
Signed-off-by: Yong Zhang <yong.zhang@gmail.com>
Signed-off-by: Guan Xuetao <gxt@mprc.pku.edu.cn>
|
| |
| |
| |
| |
| |
| | |
This flag is a NOOP and can be removed now.
Signed-off-by: Yong Zhang <yong.zhang0@gmail.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
As described in e6fa16ab ("signal: sigprocmask() should do
retarget_shared_pending()") the modification of current->blocked is
incorrect as we need to check whether the signal we're about to block
is pending in the shared queue.
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Acked-by: Guan Xuetao <gxt@mprc.pku.edu.cn>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Bugfix for following error messages:
lib/iomap.c: In function 'pci_iomap':
lib/iomap.c:274: error: implicit declaration of function 'ioremap_nocache'
lib/iomap.c:274: warning: return makes pointer from integer without a cast
Also see commit <f1ecc69838a2d7c8a3e1909f637d4083c071777d>
it will hide the ioremap_nocache function for systems with an MMU
Signed-off-by: Guan Xuetao <gxt@mprc.pku.edu.cn>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Jonas Bonn <jonas@southpole.se>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
For the commit <f1ecc69838a2d7c8a3e1909f637d4083c071777d> has changed
xlate_dev_mem_ptr definition in asm-generic/io.h for the systems with
an MMU, so delete it from unicore32 specified io.h.
Signed-off-by: Guan Xuetao <gxt@mprc.pku.edu.cn>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Jonas Bonn <jonas@southpole.se>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
In the file drivers/of/fdt.c, it uses the COMMAND_LINE_SIZE which is stated
in asm/setup.h, so asm/setup.h should be included in drivers/of/fdt.c.
Signed-off-by: Yu Yue <yuyue@mprc.pku.edu.cn>
Signed-off-by: Guan Xuetao <guanxuetao@mprc.pku.edu.cn>
Cc: Grant Likerly <grant.likely@secretlab.ca>
Cc: devicetree-discuss@lists.ozlabs.org
Cc: Arnd Bergmann <arnd@arndb.de>
|
| |
| |
| |
| |
| |
| |
| |
| | |
All other ports use "Kernel code" to identify the Kernel text segment
in /proc/iomem. Change the unicore32 resources to do the same.
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Arnd Bergmann <arnd@arndb.de>
|
|\ \
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
git://git.kernel.org/pub/scm/linux/kernel/git/lliubbo/blackfin
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/lliubbo/blackfin:
blackfin: bf561: add adv7183 capture support
blackfin: bf537: add capture support
blackfin: bf548: add capture support
blackfin: time-ts: rm unused func broadcast_timer_setup()
blackfin: i2c-lcd: change default clock rate
blackfin: mac: dsa: add vlan mask in board file
blackfin: bf537: change num_chipselect for spi-sport
blackfin: serial: bfin-uart: remove unused field
bf54x: get mem size: missing break in switch
blackfin: smp: fix msg queue overflow issue
blackfin: config: update macro SPI_BFIN in board file
blackfin: config: update def config for all boards
blackfin: smp: cleanup smp code
blackfin: smp: add suspend and wakeup irq flags
blackfin: bf533-stamp: add missed patches for new asoc driver
blackfin: bf533-stamp: fix ad1836 name
|
| | |
| | |
| | |
| | |
| | | |
Signed-off-by: Scott Jiang <scott.jiang.linux@gmail.com>
Signed-off-by: Bob Liu <lliubbo@gmail.com>
|
| | |
| | |
| | |
| | |
| | | |
Signed-off-by: Scott Jiang <scott.jiang.linux@gmail.com>
Signed-off-by: Bob Liu <lliubbo@gmail.com>
|
| | |
| | |
| | |
| | |
| | | |
Signed-off-by: Scott Jiang <scott.jiang.linux@gmail.com>
Signed-off-by: Bob Liu <lliubbo@gmail.com>
|
| | |
| | |
| | |
| | |
| | |
| | | |
broadcast_timer_setup() has no user now, drop it.
Signed-off-by: Bob Liu <lliubbo@gmail.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Change default clock rate of GPIO based I2C operation for BF533
and BF561 to bring up the I2C interface LCD display
Signed-off-by: Aaron Wu <Aaron.Wu@analog.com>
Signed-off-by: Bob Liu <lliubbo@gmail.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Else push file through ftp/rcp will fail.
Signed-off-by: Steven Miao <realmz6@gmail.com>
Signed-off-by: Bob Liu <lliubbo@gmail.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
spi sport driver can support any gpio as its cs pin.
Signed-off-by: Scott Jiang <scott.jiang.linux@gmail.com>
Signed-off-by: Bob Liu <lliubbo@gmail.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Remove unused field for hardware flow control.
Signed-off-by: Sonic Zhang <sonic.zhang@analog.com>
Signed-off-by: Bob Liu <lliubbo@gmail.com>
|
| | |
| | |
| | |
| | |
| | | |
Signed-off-by: Steven Miao <realmz6@gmail.com>
Signed-off-by: Bob Liu <lliubbo@gmail.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
disable preemption when icache flush and use wait mode cross call, hold the msg
queue lock while handle cross function call to avoid overflow
Signed-off-by: Steven Miao <realmz6@gmail.com>
Signed-off-by: Bob Liu <lliubbo@gmail.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Macro name for spi controller driver has been modified, so update default
board file accordingly.
Signed-off-by: Sonic Zhang <sonic.zhang@analog.com>
Signed-off-by: Bob Liu <lliubbo@gmail.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Config name for spi controller driver has been modified in previous commit,
so update default config accordingly.
Signed-off-by: Scott Jiang <scott.jiang.linux@gmail.com>
Signed-off-by: Bob Liu <lliubbo@gmail.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
move idle task point to percpu blackfin_cpudata and add smp_timer_broadcast
interface.
enable SUPPLE_1_WAKEUP and add BFIN_IPI_TIMER ipi support.
Signed-off-by: Steven Miao <realmz6@gmail.com>
Signed-off-by: Bob Liu <lliubbo@gmail.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Add IRQF_NO_SUSPEND | IRQF_FORCE_RESUME to irq flags, supplement irq should
not be disabled when system do suspend.
Signed-off-by: Steven Miao <realmz6@gmail.com>
Signed-off-by: Bob Liu <lliubbo@gmail.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
ASoC drivers changed between 2.6.37 and 3.0, but we didn't apply these changes
in bf533 board file.
So apply missed patches for asoc since 2.6.37 together.
Signed-off-by: Scott Jiang <scott.jiang.linux@gmail.com>
Signed-off-by: Bob Liu <lliubbo@gmail.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
The ASoC codec name is "ad1836" and not "ad183x" as the change to rename
things ultimately did not get merged.
Signed-off-by: Scott Jiang <scott.jiang.linux@gmail.com>
Signed-off-by: Bob Liu <lliubbo@gmail.com>
|
|\ \ \
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
git://git.kernel.org/pub/scm/linux/kernel/git/wfg/linux
* 'writeback-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/wfg/linux:
writeback: move MIN_WRITEBACK_PAGES to fs-writeback.c
writeback: balanced_rate cannot exceed write bandwidth
writeback: do strict bdi dirty_exceeded
writeback: avoid tiny dirty poll intervals
writeback: max, min and target dirty pause time
writeback: dirty ratelimit - think time compensation
btrfs: fix dirtied pages accounting on sub-page writes
writeback: fix dirtied pages accounting on redirty
writeback: fix dirtied pages accounting on sub-page writes
writeback: charge leaked page dirties to active tasks
writeback: Include all dirty inodes in background writeback
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Fix compile error
fs/fs-writeback.c:515:33: error: ‘PAGE_CACHE_SHIFT’ undeclared (first use in this function)
Reported-by: Randy Dunlap <rdunlap@xenotime.net>
Acked-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Add an upper limit to balanced_rate according to the below inequality.
This filters out some rare but huge singular points, which at least
enables more readable gnuplot figures.
When there are N dd dirtiers,
balanced_dirty_ratelimit = write_bw / N
So it holds that
balanced_dirty_ratelimit <= write_bw
The singular points originate from dirty_rate in the below formular:
balanced_dirty_ratelimit = task_ratelimit * write_bw / dirty_rate
where
dirty_rate = (number of page dirties in the past 200ms) / 200ms
In the extreme case, if all dd tasks suddenly get blocked on something
else and hence no pages are dirtied at all, dirty_rate will be 0 and
balanced_dirty_ratelimit will be inf. This could happen in reality.
Note that these huge singular points are not a real threat, since they
are _guaranteed_ to be filtered out by the
min(balanced_dirty_ratelimit, task_ratelimit)
line in bdi_update_dirty_ratelimit(). task_ratelimit is based on the
number of dirty pages, which will never _suddenly_ fly away like
balanced_dirty_ratelimit. So any weirdly large balanced_dirty_ratelimit
will be cut down to the level of task_ratelimit.
There won't be tiny singular points though, as long as the dirty pages
lie inside the dirty throttling region (above the freerun region).
Because there the dd tasks will be throttled by balanced_dirty_pages()
and won't be able to suddenly dirty much more pages than average.
Acked-by: Jan Kara <jack@suse.cz>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
This helps to reduce dirty throttling polls and hence CPU overheads.
bdi->dirty_exceeded typically only helps when suddenly starting 100+
dd's on a disk, in which case the dd's may need to poll
balance_dirty_pages() earlier than tsk->nr_dirtied_pause.
CC: Jan Kara <jack@suse.cz>
CC: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
The LKP tests see big 56% regression for the case fio_mmap_randwrite_64k.
Shaohua manages to root cause it to be the much smaller dirty pause times
and hence much more frequent invocations to the IO-less balance_dirty_pages().
Since fio_mmap_randwrite_64k effectively contains both reads and writes,
the more frequent pauses triggered more idling in the cfq IO scheduler.
The solution is to increase pause time all the way up to the max 200ms
in this case, which is found to restore most performance. This will help
reduce CPU overheads in other cases, too.
Note that I don't expect many performance critical workloads to run this
access pattern: the mmap read-on-write is rather inefficient and could
be avoided by doing normal writes syscalls.
CC: Jan Kara <jack@suse.cz>
CC: Peter Zijlstra <a.p.zijlstra@chello.nl>
Reported-by: Li Shaohua <shaohua.li@intel.com>
Tested-by: Li Shaohua <shaohua.li@intel.com>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Control the pause time and the call intervals to balance_dirty_pages()
with three parameters:
1) max_pause, limited by bdi_dirty and MAX_PAUSE
2) the target pause time, grows with the number of dd tasks
and is normally limited by max_pause/2
3) the minimal pause, set to half the target pause
and is used to skip short sleeps and accumulate them into bigger ones
The typical behaviors after patch:
- if ever task_ratelimit is far below dirty_ratelimit, the pause time
will remain constant at max_pause and nr_dirtied_pause will be
fluctuating with task_ratelimit
- in the normal cases, nr_dirtied_pause will remain stable (keep in the
same pace with dirty_ratelimit) and the pause time will be fluctuating
with task_ratelimit
In summary, someone has to fluctuate with task_ratelimit, because
task_ratelimit = nr_dirtied_pause / pause
We normally prefer a stable nr_dirtied_pause, until reaching max_pause.
The notable behavior changes are:
- in stable workloads, there will no longer be sudden big trajectory
switching of nr_dirtied_pause as concerned by Peter. It will be as
smooth as dirty_ratelimit and changing proportionally with it (as
always, assuming bdi bandwidth does not fluctuate across 2^N lines,
otherwise nr_dirtied_pause will show up in 2+ parallel trajectories)
- in the rare cases when something keeps task_ratelimit far below
dirty_ratelimit, the smoothness can no longer be retained and
nr_dirtied_pause will be "dancing" with task_ratelimit. This fixes a
(not that destructive but still not good) bug that
dirty_ratelimit gets brought down undesirably
<= balanced_dirty_ratelimit is under estimated
<= weakly executed task_ratelimit
<= pause goes too large and gets trimmed down to max_pause
<= nr_dirtied_pause (based on dirty_ratelimit) is set too large
<= dirty_ratelimit being much larger than task_ratelimit
- introduce min_pause to avoid small pause sleeps
- when pause is trimmed down to max_pause, try to compensate it at the
next pause time
The "refactor" type of changes are:
The max_pause equation is slightly transformed to make it slightly more
efficient.
We now scale target_pause by (N * 10ms) on 2^N concurrent tasks, which
is effectively equal to the original scaling max_pause by (N * 20ms)
because the original code does implicit target_pause ~= max_pause / 2.
Based on the same implicit ratio, target_pause starts with 10ms on 1 dd.
CC: Jan Kara <jack@suse.cz>
CC: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Compensate the task's think time when computing the final pause time,
so that ->dirty_ratelimit can be executed accurately.
think time := time spend outside of balance_dirty_pages()
In the rare case that the task slept longer than the 200ms period time
(result in negative pause time), the sleep time will be compensated in
the following periods, too, if it's less than 1 second.
Accumulated errors are carefully avoided as long as the max pause area
is not hitted.
Pseudo code:
period = pages_dirtied / task_ratelimit;
think = jiffies - dirty_paused_when;
pause = period - think;
1) normal case: period > think
pause = period - think
dirty_paused_when = jiffies + pause
nr_dirtied = 0
period time
|===============================>|
think time pause time
|===============>|==============>|
------|----------------|---------------|------------------------
dirty_paused_when jiffies
2) no pause case: period <= think
don't pause; reduce future pause time by:
dirty_paused_when += period
nr_dirtied = 0
period time
|===============================>|
think time
|===================================================>|
------|--------------------------------+-------------------|----
dirty_paused_when jiffies
Acked-by: Jan Kara <jack@suse.cz>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
When doing 1KB sequential writes to the same page,
balance_dirty_pages_ratelimited_nr() should be called once instead of 4
times, the latter makes the dirtier tasks be throttled much too heavy.
Fix it with proper de-accounting on clear_page_dirty_for_io().
CC: Chris Mason <chris.mason@oracle.com>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
De-account the accumulative dirty counters on page redirty.
Page redirties (very common in ext4) will introduce mismatch between
counters (a) and (b)
a) NR_DIRTIED, BDI_DIRTIED, tsk->nr_dirtied
b) NR_WRITTEN, BDI_WRITTEN
This will introduce systematic errors in balanced_rate and result in
dirty page position errors (ie. the dirty pages are no longer balanced
around the global/bdi setpoints).
Acked-by: Jan Kara <jack@suse.cz>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
When dd in 512bytes, generic_perform_write() calls
balance_dirty_pages_ratelimited() 8 times for the same page, but
obviously the page is only dirtied once.
Fix it by accounting tsk->nr_dirtied and bdp_ratelimits at page dirty time.
Acked-by: Jan Kara <jack@suse.cz>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
It's a years long problem that a large number of short-lived dirtiers
(eg. gcc instances in a fast kernel build) may starve long-run dirtiers
(eg. dd) as well as pushing the dirty pages to the global hard limit.
The solution is to charge the pages dirtied by the exited gcc to the
other random dirtying tasks. It sounds not perfect, however should
behave good enough in practice, seeing as that throttled tasks aren't
actually running so those that are running are more likely to pick it up
and get throttled, therefore promoting an equal spread.
Randy: fix compile error: 'dirty_throttle_leaks' undeclared in exit.c
Acked-by: Jan Kara <jack@suse.cz>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Current livelock avoidance code makes background work to include only inodes
that were dirtied before background writeback has started. However background
writeback can be running for a long time and thus excluding newly dirtied
inodes can eventually exclude significant portion of dirty inodes making
background writeback inefficient. Since background writeback avoids livelocking
the flusher thread by yielding to any other work, there is no real reason why
background work should not include all dirty inodes so change the logic in
wb_writeback().
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
|
|\ \ \ \
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Andrew elucidates:
- First installmeant of MM. We have a HUGE number of MM patches this
time. It's crazy.
- MAINTAINERS updates
- backlight updates
- leds
- checkpatch updates
- misc ELF stuff
- rtc updates
- reiserfs
- procfs
- some misc other bits
* akpm: (124 commits)
user namespace: make signal.c respect user namespaces
workqueue: make alloc_workqueue() take printf fmt and args for name
procfs: add hidepid= and gid= mount options
procfs: parse mount options
procfs: introduce the /proc/<pid>/map_files/ directory
procfs: make proc_get_link to use dentry instead of inode
signal: add block_sigmask() for adding sigmask to current->blocked
sparc: make SA_NOMASK a synonym of SA_NODEFER
reiserfs: don't lock root inode searching
reiserfs: don't lock journal_init()
reiserfs: delay reiserfs lock until journal initialization
reiserfs: delete comments referring to the BKL
drivers/rtc/interface.c: fix alarm rollover when day or month is out-of-range
drivers/rtc/rtc-twl.c: add DT support for RTC inside twl4030/twl6030
drivers/rtc/: remove redundant spi driver bus initialization
drivers/rtc/rtc-jz4740.c: make jz4740_rtc_driver static
drivers/rtc/rtc-mc13xxx.c: make mc13xxx_rtc_idtable static
rtc: convert drivers/rtc/* to use module_platform_driver()
drivers/rtc/rtc-wm831x.c: convert to devm_kzalloc()
drivers/rtc/rtc-wm831x.c: remove unused period IRQ handler
...
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
ipc/mqueue.c: for __SI_MESQ, convert the uid being sent to recipient's
user namespace. (new, thanks Oleg)
__send_signal: convert current's uid to the recipient's user namespace
for any siginfo which is not SI_FROMKERNEL (patch from Oleg, thanks
again :)
do_notify_parent and do_notify_parent_cldstop: map task's uid to parent's
user namespace
ptrace_signal maps parent's uid into current's user namespace before
including in signal to current. IIUC Oleg has argued that this shouldn't
matter as the debugger will play with it, but it seems like not converting
the value currently being set is misleading.
Changelog:
Sep 20: Inspired by Oleg's suggestion, define map_cred_ns() helper to
simplify callers and help make clear what we are translating
(which uid into which namespace). Passing the target task would
make callers even easier to read, but we pass in user_ns because
current_user_ns() != task_cred_xxx(current, user_ns).
Sep 20: As recommended by Oleg, also put task_pid_vnr() under rcu_read_lock
in ptrace_signal().
Sep 23: In send_signal(), detect when (user) signal is coming from an
ancestor or unrelated user namespace. Pass that on to __send_signal,
which sets si_uid to 0 or overflowuid if needed.
Oct 12: Base on Oleg's fixup_uid() patch. On top of that, handle all
SI_FROMKERNEL cases at callers, because we can't assume sender is
current in those cases.
Nov 10: (mhelsley) rename fixup_uid to more meaningful usern_fixup_signal_uid
Nov 10: (akpm) make the !CONFIG_USER_NS case clearer
Signed-off-by: Serge Hallyn <serge.hallyn@canonical.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Matt Helsley <matthltc@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
From: Serge Hallyn <serge.hallyn@canonical.com>
Subject: __send_signal: pass q->info, not info, to userns_fixup_signal_uid (v2)
Eric Biederman pointed out that passing info is a bug and could lead to a
NULL pointer deref to boot.
A collection of signal, securebits, filecaps, cap_bounds, and a few other
ltp tests passed with this kernel.
Changelog:
Nov 18: previous patch missed a leading '&'
Signed-off-by: Serge Hallyn <serge.hallyn@canonical.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
From: Dan Carpenter <dan.carpenter@oracle.com>
Subject: ipc/mqueue: lock() => unlock() typo
There was a double lock typo introduced in b085f4bd6b21 "user namespace:
make signal.c respect user namespaces"
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Matt Helsley <matthltc@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: Serge Hallyn <serge@hallyn.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
alloc_workqueue() currently expects the passed in @name pointer to remain
accessible. This is inconvenient and a bit silly given that the whole wq
is being dynamically allocated. This patch updates alloc_workqueue() and
friends to take printf format string instead of opaque string and matching
varargs at the end. The name is allocated together with the wq and
formatted.
alloc_ordered_workqueue() is converted to a macro to unify varargs
handling with alloc_workqueue(), and, while at it, add comment to
alloc_workqueue().
None of the current in-kernel users pass in string with '%' as constant
name and this change shouldn't cause any problem.
[akpm@linux-foundation.org: use __printf]
Signed-off-by: Tejun Heo <tj@kernel.org>
Suggested-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Add support for mount options to restrict access to /proc/PID/
directories. The default backward-compatible "relaxed" behaviour is left
untouched.
The first mount option is called "hidepid" and its value defines how much
info about processes we want to be available for non-owners:
hidepid=0 (default) means the old behavior - anybody may read all
world-readable /proc/PID/* files.
hidepid=1 means users may not access any /proc/<pid>/ directories, but
their own. Sensitive files like cmdline, sched*, status are now protected
against other users. As permission checking done in proc_pid_permission()
and files' permissions are left untouched, programs expecting specific
files' modes are not confused.
hidepid=2 means hidepid=1 plus all /proc/PID/ will be invisible to other
users. It doesn't mean that it hides whether a process exists (it can be
learned by other means, e.g. by kill -0 $PID), but it hides process' euid
and egid. It compicates intruder's task of gathering info about running
processes, whether some daemon runs with elevated privileges, whether
another user runs some sensitive program, whether other users run any
program at all, etc.
gid=XXX defines a group that will be able to gather all processes' info
(as in hidepid=0 mode). This group should be used instead of putting
nonroot user in sudoers file or something. However, untrusted users (like
daemons, etc.) which are not supposed to monitor the tasks in the whole
system should not be added to the group.
hidepid=1 or higher is designed to restrict access to procfs files, which
might reveal some sensitive private information like precise keystrokes
timings:
http://www.openwall.com/lists/oss-security/2011/11/05/3
hidepid=1/2 doesn't break monitoring userspace tools. ps, top, pgrep, and
conky gracefully handle EPERM/ENOENT and behave as if the current user is
the only user running processes. pstree shows the process subtree which
contains "pstree" process.
Note: the patch doesn't deal with setuid/setgid issues of keeping
preopened descriptors of procfs files (like
https://lkml.org/lkml/2011/2/7/368). We rely on that the leaked
information like the scheduling counters of setuid apps doesn't threaten
anybody's privacy - only the user started the setuid program may read the
counters.
Signed-off-by: Vasiliy Kulikov <segoon@openwall.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Randy Dunlap <rdunlap@xenotime.net>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Greg KH <greg@kroah.com>
Cc: Theodore Tso <tytso@MIT.EDU>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: James Morris <jmorris@namei.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Add support for procfs mount options. Actual mount options are coming in
the next patches.
Signed-off-by: Vasiliy Kulikov <segoon@openwall.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Randy Dunlap <rdunlap@xenotime.net>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Greg KH <greg@kroah.com>
Cc: Theodore Tso <tytso@MIT.EDU>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: James Morris <jmorris@namei.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
This one behaves similarly to the /proc/<pid>/fd/ one - it contains
symlinks one for each mapping with file, the name of a symlink is
"vma->vm_start-vma->vm_end", the target is the file. Opening a symlink
results in a file that point exactly to the same inode as them vma's one.
For example the ls -l of some arbitrary /proc/<pid>/map_files/
| lr-x------ 1 root root 64 Aug 26 06:40 7f8f80403000-7f8f80404000 -> /lib64/libc-2.5.so
| lr-x------ 1 root root 64 Aug 26 06:40 7f8f8061e000-7f8f80620000 -> /lib64/libselinux.so.1
| lr-x------ 1 root root 64 Aug 26 06:40 7f8f80826000-7f8f80827000 -> /lib64/libacl.so.1.1.0
| lr-x------ 1 root root 64 Aug 26 06:40 7f8f80a2f000-7f8f80a30000 -> /lib64/librt-2.5.so
| lr-x------ 1 root root 64 Aug 26 06:40 7f8f80a30000-7f8f80a4c000 -> /lib64/ld-2.5.so
This *helps* checkpointing process in three ways:
1. When dumping a task mappings we do know exact file that is mapped
by particular region. We do this by opening
/proc/$pid/map_files/$address symlink the way we do with file
descriptors.
2. This also helps in determining which anonymous shared mappings are
shared with each other by comparing the inodes of them.
3. When restoring a set of processes in case two of them has a mapping
shared, we map the memory by the 1st one and then open its
/proc/$pid/map_files/$address file and map it by the 2nd task.
Using /proc/$pid/maps for this is quite inconvenient since it brings
repeatable re-reading and reparsing for this text file which slows down
restore procedure significantly. Also as being pointed in (3) it is a way
easier to use top level shared mapping in children as
/proc/$pid/map_files/$address when needed.
[akpm@linux-foundation.org: coding-style fixes]
[gorcunov@openvz.org: make map_files depend on CHECKPOINT_RESTORE]
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Reviewed-by: Vasiliy Kulikov <segoon@openwall.com>
Reviewed-by: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Tejun Heo <tj@kernel.org>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Prepare the ground for the next "map_files" patch which needs a name of a
link file to analyse.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Vasiliy Kulikov <segoon@openwall.com>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Abstract the code sequence for adding a signal handler's sa_mask to
current->blocked because the sequence is identical for all architectures.
Furthermore, in the past some architectures actually got this code wrong,
so introduce a wrapper that all architectures can use.
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Unlike other architectures, sparc currently has no SA_NODEFER definition
but only the older SA_NOMASK. Since SA_NOMASK is the historical name for
SA_NODEFER, add SA_NODEFER and copy what other architectures do by making
SA_NOMASK a synonym for SA_NODEFER.
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Nothing requires that we lock the filesystem until the root inode is
provided.
Also iget5_locked() triggers a warning because we are holding the
filesystem lock while allocating the inode, which result in a lockdep
suspicion that we have a lock inversion against the reclaim path:
[ 1986.896979] =================================
[ 1986.896990] [ INFO: inconsistent lock state ]
[ 1986.896997] 3.1.1-main #8
[ 1986.897001] ---------------------------------
[ 1986.897007] inconsistent {RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-W} usage.
[ 1986.897016] kswapd0/16 [HC0[0]:SC0[0]:HE1:SE1] takes:
[ 1986.897023] (&REISERFS_SB(s)->lock){+.+.?.}, at: [<c01f8bd4>] reiserfs_write_lock+0x20/0x2a
[ 1986.897044] {RECLAIM_FS-ON-W} state was registered at:
[ 1986.897050] [<c014a5b9>] mark_held_locks+0xae/0xd0
[ 1986.897060] [<c014aab3>] lockdep_trace_alloc+0x7d/0x91
[ 1986.897068] [<c0190ee0>] kmem_cache_alloc+0x1a/0x93
[ 1986.897078] [<c01e7728>] reiserfs_alloc_inode+0x13/0x3d
[ 1986.897088] [<c01a5b06>] alloc_inode+0x14/0x5f
[ 1986.897097] [<c01a5cb9>] iget5_locked+0x62/0x13a
[ 1986.897106] [<c01e99e0>] reiserfs_fill_super+0x410/0x8b9
[ 1986.897114] [<c01953da>] mount_bdev+0x10b/0x159
[ 1986.897123] [<c01e764d>] get_super_block+0x10/0x12
[ 1986.897131] [<c0195b38>] mount_fs+0x59/0x12d
[ 1986.897138] [<c01a80d1>] vfs_kern_mount+0x45/0x7a
[ 1986.897147] [<c01a83e3>] do_kern_mount+0x2f/0xb0
[ 1986.897155] [<c01a987a>] do_mount+0x5c2/0x612
[ 1986.897163] [<c01a9a72>] sys_mount+0x61/0x8f
[ 1986.897170] [<c044060c>] sysenter_do_call+0x12/0x32
[ 1986.897181] irq event stamp: 7509691
[ 1986.897186] hardirqs last enabled at (7509691): [<c0190f34>] kmem_cache_alloc+0x6e/0x93
[ 1986.897197] hardirqs last disabled at (7509690): [<c0190eea>] kmem_cache_alloc+0x24/0x93
[ 1986.897209] softirqs last enabled at (7508896): [<c01294bd>] __do_softirq+0xee/0xfd
[ 1986.897222] softirqs last disabled at (7508859): [<c01030ed>] do_softirq+0x50/0x9d
[ 1986.897234]
[ 1986.897235] other info that might help us debug this:
[ 1986.897242] Possible unsafe locking scenario:
[ 1986.897244]
[ 1986.897250] CPU0
[ 1986.897254] ----
[ 1986.897257] lock(&REISERFS_SB(s)->lock);
[ 1986.897265] <Interrupt>
[ 1986.897269] lock(&REISERFS_SB(s)->lock);
[ 1986.897276]
[ 1986.897277] *** DEADLOCK ***
[ 1986.897278]
[ 1986.897286] no locks held by kswapd0/16.
[ 1986.897291]
[ 1986.897292] stack backtrace:
[ 1986.897299] Pid: 16, comm: kswapd0 Not tainted 3.1.1-main #8
[ 1986.897306] Call Trace:
[ 1986.897314] [<c0439e76>] ? printk+0xf/0x11
[ 1986.897324] [<c01482d1>] print_usage_bug+0x20e/0x21a
[ 1986.897332] [<c01479b8>] ? print_irq_inversion_bug+0x172/0x172
[ 1986.897341] [<c014855c>] mark_lock+0x27f/0x483
[ 1986.897349] [<c0148d88>] __lock_acquire+0x628/0x1472
[ 1986.897358] [<c0149fae>] lock_acquire+0x47/0x5e
[ 1986.897366] [<c01f8bd4>] ? reiserfs_write_lock+0x20/0x2a
[ 1986.897384] [<c01f8bd4>] ? reiserfs_write_lock+0x20/0x2a
[ 1986.897397] [<c043b5ef>] mutex_lock_nested+0x35/0x26f
[ 1986.897409] [<c01f8bd4>] ? reiserfs_write_lock+0x20/0x2a
[ 1986.897421] [<c01f8bd4>] reiserfs_write_lock+0x20/0x2a
[ 1986.897433] [<c01e2edd>] map_block_for_writepage+0xc9/0x590
[ 1986.897448] [<c01b1706>] ? create_empty_buffers+0x33/0x8f
[ 1986.897461] [<c0121124>] ? get_parent_ip+0xb/0x31
[ 1986.897472] [<c043ef7f>] ? sub_preempt_count+0x81/0x8e
[ 1986.897485] [<c043cae0>] ? _raw_spin_unlock+0x27/0x3d
[ 1986.897496] [<c0121124>] ? get_parent_ip+0xb/0x31
[ 1986.897508] [<c01e355d>] reiserfs_writepage+0x1b9/0x3e7
[ 1986.897521] [<c0173b40>] ? clear_page_dirty_for_io+0xcb/0xde
[ 1986.897533] [<c014a6e3>] ? trace_hardirqs_on_caller+0x108/0x138
[ 1986.897546] [<c014a71e>] ? trace_hardirqs_on+0xb/0xd
[ 1986.897559] [<c0177b38>] shrink_page_list+0x34f/0x5e2
[ 1986.897572] [<c01780a7>] shrink_inactive_list+0x172/0x22c
[ 1986.897585] [<c0178464>] shrink_zone+0x303/0x3b1
[ 1986.897597] [<c043cae0>] ? _raw_spin_unlock+0x27/0x3d
[ 1986.897611] [<c01788c9>] kswapd+0x3b7/0x5f2
The deadlock shouldn't happen since we are doing that allocation in the
mount path, the filesystem is not available for any reclaim. Still the
warning is annoying.
To solve this, acquire the lock later only where we need it, right before
calling reiserfs_read_locked_inode() that wants to lock to walk the tree.
Reported-by: Knut Petersen <Knut_Petersen@t-online.de>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Jeff Mahoney <jeffm@suse.com>
Cc: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|