| Commit message (Collapse) | Author | Age |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
After merging the block tree, today's linux-next build (powerpc ppc64_defconfig)
failed like this:
fs/nilfs2/the_nilfs.c: In function 'nilfs_discard_segments':
fs/nilfs2/the_nilfs.c:673: error: 'DISCARD_FL_BARRIER' undeclared (first use in this function)
Caused by commit fbd9b09a177a481eda256447c881f014f29034fe ("blkdev:
generalize flags for blkdev_issue_fn functions") interacting with commit
e902ec9906e844f4613fa6190c6fa65f162dc86e ("nilfs2: issue discard request
after cleaning segments") (which netered Linus' tree on about March 4 -
before v2.6.34-rc1).
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
|
|
|
|
|
|
| |
Just cast the page size to sector_t, that will always fit.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
|
|
|
|
|
|
|
|
| |
CFQ_GROUP_IOSCHED=n
We should return the cfq_group for this case, not NULL.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
|
|
|
|
|
|
|
|
|
|
| |
- Add bio_batch helper primitive. This is rather generic primitive
for submitting/waiting a complex request which consists of several
bios.
- blkdev_issue_zeroout() generate number of zero filed write bios.
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
|
|
|
|
|
|
|
|
|
| |
Move blkdev_issue_discard from blk-barrier.c because it is
not barrier related.
Later the file will be populated by other helpers.
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
|
|
|
|
|
|
|
| |
In some places caller don't want to wait a request to complete.
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
|
|
|
|
|
|
|
|
| |
The patch just convert all blkdev_issue_xxx function to common
set of flags. Wait/allocation semantics preserved.
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently, device claiming for exclusive open is done after low level
open - disk->fops->open() - has completed successfully. This means
that exclusive open attempts while a device is already exclusively
open will fail only after disk->fops->open() is called.
cdrom driver issues commands during open() which means that O_EXCL
open attempt can unintentionally inject commands to in-progress
command stream for burning thus disturbing burning process. In most
cases, this doesn't cause problems because the first command to be
issued is TUR which most devices can process in the middle of burning.
However, depending on how a device replies to TUR during burning,
cdrom driver may end up issuing further commands.
This can't be resolved trivially by moving bd_claim() before doing
actual open() because that means an open attempt which will end up
failing could interfere other legit O_EXCL open attempts.
ie. unconfirmed open attempts can fail others.
This patch resolves the problem by introducing claiming block which is
started by bd_start_claiming() and terminated either by bd_claim() or
bd_abort_claiming(). bd_claim() from inside a claiming block is
guaranteed to succeed and once a claiming block is started, other
bd_start_claiming() or bd_claim() attempts block till the current
claiming block is terminated.
bd_claim() can still be used standalone although now it always
synchronizes against claiming blocks, so the existing users will keep
working without any change.
blkdev_open() and open_bdev_exclusive() are converted to use claiming
blocks so that exclusive open attempts from these functions don't
interfere with the existing exclusive open.
This problem was discovered while investigating bko#15403.
https://bugzilla.kernel.org/show_bug.cgi?id=15403
The burning problem itself can be resolved by updating userspace
probing tools to always open w/ O_EXCL.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Matthias-Christian Ott <ott@mirix.org>
Cc: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
|
|
|
|
|
|
|
|
|
| |
Factor out bd_may_claim() from bd_claim(), add comments and apply a
couple of cosmetic edits. This is to prepare for further updates to
claim path.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch fixes few usability and configurability issues.
o All the cgroup based controller options are configurable from
"Genral Setup/Control Group Support/" menu. blkio is the only exception.
Hence make this option visible in above menu and make it configurable from
there to bring it inline with rest of the cgroup based controllers.
o Get rid of CONFIG_DEBUG_CFQ_IOSCHED.
This option currently does two things.
- Enable printing of cgroup paths in blktrace
- Enables CONFIG_DEBUG_BLK_CGROUP, which in turn displays additional stat
files in cgroup.
If we are using group scheduling, blktrace data is of not really much use
if cgroup information is not present. To get this data, currently one has to
also enable CONFIG_DEBUG_CFQ_IOSCHED, which in turn brings the overhead of
all the additional debug stat files which is not desired.
Hence, this patch moves printing of cgroup paths under
CONFIG_CFQ_GROUP_IOSCHED.
This allows us to get rid of CONFIG_DEBUG_CFQ_IOSCHED completely. Now all
the debug stat files are controlled only by CONFIG_DEBUG_BLK_CGROUP which
can be enabled through config menu.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Acked-by: Divyesh Shah <dpshah@google.com>
Reviewed-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
o Once in a while, I was hitting a BUG_ON() in blkio code. empty_time was
assuming that upon slice expiry, group can't be marked empty already (except
forced dispatch).
But this assumption is broken if cfqq can move (group_isolation=0) across
groups after receiving a request.
I think most likely in this case we got a request in a cfqq and accounted
the rq in one group, later while adding the cfqq to tree, we moved the queue
to a different group which was already marked empty and after dispatch from
slice we found group already marked empty and raised alarm.
This patch does not error out if group is already marked empty. This can
introduce some empty_time stat error only in case of group_isolation=0. This
is better than crashing. In case of group_isolation=1 we should still get
same stats as before this patch.
[ 222.308546] ------------[ cut here ]------------
[ 222.309311] kernel BUG at block/blk-cgroup.c:236!
[ 222.309311] invalid opcode: 0000 [#1] SMP
[ 222.309311] last sysfs file: /sys/devices/virtual/block/dm-3/queue/scheduler
[ 222.309311] CPU 1
[ 222.309311] Modules linked in: dm_round_robin dm_multipath qla2xxx scsi_transport_fc dm_zero dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]
[ 222.309311]
[ 222.309311] Pid: 4780, comm: fio Not tainted 2.6.34-rc4-blkio-config #68 0A98h/HP xw8600 Workstation
[ 222.309311] RIP: 0010:[<ffffffff8121ad88>] [<ffffffff8121ad88>] blkiocg_set_start_empty_time+0x50/0x83
[ 222.309311] RSP: 0018:ffff8800ba6e79f8 EFLAGS: 00010002
[ 222.309311] RAX: 0000000000000082 RBX: ffff8800a13b7990 RCX: ffff8800a13b7808
[ 222.309311] RDX: 0000000000002121 RSI: 0000000000000082 RDI: ffff8800a13b7a30
[ 222.309311] RBP: ffff8800ba6e7a18 R08: 0000000000000000 R09: 0000000000000001
[ 222.309311] R10: 000000000002f8c8 R11: ffff8800ba6e7ad8 R12: ffff8800a13b78ff
[ 222.309311] R13: ffff8800a13b7990 R14: 0000000000000001 R15: ffff8800a13b7808
[ 222.309311] FS: 00007f3beec476f0(0000) GS:ffff880001e40000(0000) knlGS:0000000000000000
[ 222.309311] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 222.309311] CR2: 000000000040e7f0 CR3: 00000000a12d5000 CR4: 00000000000006e0
[ 222.309311] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 222.309311] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 222.309311] Process fio (pid: 4780, threadinfo ffff8800ba6e6000, task ffff8800b3d6bf00)
[ 222.309311] Stack:
[ 222.309311] 0000000000000001 ffff8800bab17a48 ffff8800bab17a48 ffff8800a13b7800
[ 222.309311] <0> ffff8800ba6e7a68 ffffffff8121da35 ffff880000000001 00ff8800ba5c5698
[ 222.309311] <0> ffff8800ba6e7a68 ffff8800a13b7800 0000000000000000 ffff8800bab17a48
[ 222.309311] Call Trace:
[ 222.309311] [<ffffffff8121da35>] __cfq_slice_expired+0x2af/0x3ec
[ 222.309311] [<ffffffff8121fd7b>] cfq_dispatch_requests+0x2c8/0x8e8
[ 222.309311] [<ffffffff8120f1cd>] ? spin_unlock_irqrestore+0xe/0x10
[ 222.309311] [<ffffffff8120fb1a>] ? blk_insert_cloned_request+0x70/0x7b
[ 222.309311] [<ffffffff81210461>] blk_peek_request+0x191/0x1a7
[ 222.309311] [<ffffffffa0002799>] dm_request_fn+0x38/0x14c [dm_mod]
[ 222.309311] [<ffffffff810ae61f>] ? sync_page_killable+0x0/0x35
[ 222.309311] [<ffffffff81210fd4>] __generic_unplug_device+0x32/0x37
[ 222.309311] [<ffffffff81211274>] generic_unplug_device+0x2e/0x3c
[ 222.309311] [<ffffffffa00011a6>] dm_unplug_all+0x42/0x5b [dm_mod]
[ 222.309311] [<ffffffff8120ca37>] blk_unplug+0x29/0x2d
[ 222.309311] [<ffffffff8120ca4d>] blk_backing_dev_unplug+0x12/0x14
[ 222.309311] [<ffffffff81109a7a>] block_sync_page+0x35/0x39
[ 222.309311] [<ffffffff810ae616>] sync_page+0x41/0x4a
[ 222.309311] [<ffffffff810ae62d>] sync_page_killable+0xe/0x35
[ 222.309311] [<ffffffff8158aa59>] __wait_on_bit_lock+0x46/0x8f
[ 222.309311] [<ffffffff810ae4f5>] __lock_page_killable+0x66/0x6d
[ 222.309311] [<ffffffff81056f9c>] ? wake_bit_function+0x0/0x33
[ 222.309311] [<ffffffff810ae528>] lock_page_killable+0x2c/0x2e
[ 222.309311] [<ffffffff810afbc5>] generic_file_aio_read+0x361/0x4f0
[ 222.309311] [<ffffffff810ea044>] do_sync_read+0xcb/0x108
[ 222.309311] [<ffffffff811e42f7>] ? security_file_permission+0x16/0x18
[ 222.309311] [<ffffffff810ea6ab>] vfs_read+0xab/0x108
[ 222.309311] [<ffffffff810ea7c8>] sys_read+0x4a/0x6e
[ 222.309311] [<ffffffff81002b5b>] system_call_fastpath+0x16/0x1b
[ 222.309311] Code: 58 01 00 00 00 48 89 c6 75 0a 48 83 bb 60 01 00 00 00 74 09 48 8d bb a0 00 00 00 eb 35 41 fe cc 74 0d f6 83 c0 01 00 00 04 74 04 <0f> 0b eb fe 48 89 75 e8 e8 be e0 de ff 66 83 8b c0 01 00 00 04
[ 222.309311] RIP [<ffffffff8121ad88>] blkiocg_set_start_empty_time+0x50/0x83
[ 222.309311] RSP <ffff8800ba6e79f8>
[ 222.309311] ---[ end trace 32b4f71dffc15712 ]---
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Acked-by: Divyesh Shah <dpshah@google.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
blkio + cfq was crashing even when two sequential readers were put in two
separate cgroups (group_isolation=0).
The reason being that cfqq can migrate across groups based on its being
sync-noidle or not, it can happen that at request insertion time, cfqq
belonged to one cfqg and at request dispatch time, it belonged to root
group. In this case request stats per cgroup can go wrong and it also runs
into BUG_ON().
This patch implements rq stashing away a cfq group pointer and not relying
on cfqq->cfqg pointer alone for rq stat accounting.
[ 65.163523] ------------[ cut here ]------------
[ 65.164301] kernel BUG at block/blk-cgroup.c:117!
[ 65.164301] invalid opcode: 0000 [#1] SMP
[ 65.164301] last sysfs file: /sys/devices/pci0000:00/0000:00:05.0/0000:60:00.1/host9/rport-9:0-0/target9:0:0/9:0:0:2/block/sde/stat
[ 65.164301] CPU 1
[ 65.164301] Modules linked in: dm_round_robin dm_multipath qla2xxx scsi_transport_fc dm_zero dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]
[ 65.164301]
[ 65.164301] Pid: 4505, comm: fio Not tainted 2.6.34-rc4-blk-for-35 #34 0A98h/HP xw8600 Workstation
[ 65.164301] RIP: 0010:[<ffffffff8121924f>] [<ffffffff8121924f>] blkiocg_update_io_remove_stats+0x5b/0xaf
[ 65.164301] RSP: 0018:ffff8800ba5a79e8 EFLAGS: 00010046
[ 65.164301] RAX: 0000000000000096 RBX: ffff8800bb268d60 RCX: 0000000000000000
[ 65.164301] RDX: ffff8800bb268eb8 RSI: 0000000000000000 RDI: ffff8800bb268e00
[ 65.164301] RBP: ffff8800ba5a7a08 R08: 0000000000000064 R09: 0000000000000001
[ 65.164301] R10: 0000000000079640 R11: ffff8800a0bd5bf0 R12: ffff8800bab4af01
[ 65.164301] R13: ffff8800bab4af00 R14: ffff8800bb1d8928 R15: 0000000000000000
[ 65.164301] FS: 00007f18f75056f0(0000) GS:ffff880001e40000(0000) knlGS:0000000000000000
[ 65.164301] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 65.164301] CR2: 000000000040e7f0 CR3: 00000000ba52b000 CR4: 00000000000006e0
[ 65.164301] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 65.164301] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 65.164301] Process fio (pid: 4505, threadinfo ffff8800ba5a6000, task ffff8800ba45ae80)
[ 65.164301] Stack:
[ 65.164301] ffff8800ba5a7a08 ffff8800ba722540 ffff8800bab4af68 ffff8800bab4af68
[ 65.164301] <0> ffff8800ba5a7a38 ffffffff8121d814 ffff8800ba722540 ffff8800bab4af68
[ 65.164301] <0> ffff8800ba722540 ffff8800a08f6800 ffff8800ba5a7a68 ffffffff8121d8ca
[ 65.164301] Call Trace:
[ 65.164301] [<ffffffff8121d814>] cfq_remove_request+0xe4/0x116
[ 65.164301] [<ffffffff8121d8ca>] cfq_dispatch_insert+0x84/0xe1
[ 65.164301] [<ffffffff8121e833>] cfq_dispatch_requests+0x767/0x8e8
[ 65.164301] [<ffffffff8120e524>] ? submit_bio+0xc3/0xcc
[ 65.164301] [<ffffffff810ad657>] ? sync_page_killable+0x0/0x35
[ 65.164301] [<ffffffff8120ea8d>] blk_peek_request+0x191/0x1a7
[ 65.164301] [<ffffffffa000109c>] ? dm_get_live_table+0x44/0x4f [dm_mod]
[ 65.164301] [<ffffffffa0002799>] dm_request_fn+0x38/0x14c [dm_mod]
[ 65.164301] [<ffffffff810ad657>] ? sync_page_killable+0x0/0x35
[ 65.164301] [<ffffffff8120f600>] __generic_unplug_device+0x32/0x37
[ 65.164301] [<ffffffff8120f8a0>] generic_unplug_device+0x2e/0x3c
[ 65.164301] [<ffffffffa00011a6>] dm_unplug_all+0x42/0x5b [dm_mod]
[ 65.164301] [<ffffffff8120b063>] blk_unplug+0x29/0x2d
[ 65.164301] [<ffffffff8120b079>] blk_backing_dev_unplug+0x12/0x14
[ 65.164301] [<ffffffff81108a82>] block_sync_page+0x35/0x39
[ 65.164301] [<ffffffff810ad64e>] sync_page+0x41/0x4a
[ 65.164301] [<ffffffff810ad665>] sync_page_killable+0xe/0x35
[ 65.164301] [<ffffffff81589027>] __wait_on_bit_lock+0x46/0x8f
[ 65.164301] [<ffffffff810ad52d>] __lock_page_killable+0x66/0x6d
[ 65.164301] [<ffffffff81055fd4>] ? wake_bit_function+0x0/0x33
[ 65.164301] [<ffffffff810ad560>] lock_page_killable+0x2c/0x2e
[ 65.164301] [<ffffffff810aebfd>] generic_file_aio_read+0x361/0x4f0
[ 65.164301] [<ffffffff810e906c>] do_sync_read+0xcb/0x108
[ 65.164301] [<ffffffff811e32a3>] ? security_file_permission+0x16/0x18
[ 65.164301] [<ffffffff810e96d3>] vfs_read+0xab/0x108
[ 65.164301] [<ffffffff810e97f0>] sys_read+0x4a/0x6e
[ 65.164301] [<ffffffff81002b5b>] system_call_fastpath+0x16/0x1b
[ 65.164301] Code: 00 74 1c 48 8b 8b 60 01 00 00 48 85 c9 75 04 0f 0b eb fe 48 ff c9 48 89 8b 60 01 00 00 eb 1a 48 8b 8b 58 01 00 00 48 85 c9 75 04 <0f> 0b eb fe 48 ff c9 48 89 8b 58 01 00 00 45 84 e4 74 16 48 8b
[ 65.164301] RIP [<ffffffff8121924f>] blkiocg_update_io_remove_stats+0x5b/0xaf
[ 65.164301] RSP <ffff8800ba5a79e8>
[ 65.164301] ---[ end trace 1b2b828753032e68 ]---
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
|
|
|
|
|
|
|
|
| |
This fixes the lockdep warning reported by Gui Jianfeng.
Signed-off-by: Divyesh Shah <dpshah@google.com>
Reviewed-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
After merging the block tree, 20100414's linux-next build (x86_64
allmodconfig) failed like this:
ERROR: "get_gendisk" [block/blk-cgroup.ko] undefined!
ERROR: "sched_clock" [block/blk-cgroup.ko] undefined!
This happens because the two symbols aren't exported and hence not available
when blk-cgroup code is built as a module. I've tried to stay consistent with
the use of EXPORT_SYMBOL or EXPORT_SYMBOL_GPL with the other symbols in the
respective files.
Signed-off-by: Divyesh Shah <dpshah@google.com>
Acked-by: Gui Jianfeng <guijianfeng@cn.fujitsu.cn>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
blk_rq_timed_out_timer() relied on blk_add_timer() never returning a
timer value of zero, but commit 7838c15b8dd18e78a523513749e5b54bda07b0cb
removed the code that bumped this value when it was zero.
Therefore when jiffies is near wrap we could get unlucky & not set the
timeout value correctly.
This patch uses a flag to indicate that the timeout value was set and so
handles jiffies wrap correctly, and it keeps all the logic in one
function so should be easier to maintain in the future.
Signed-off-by: Richard Kennedy <richard@rsk.demon.co.uk>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
|
|
|
|
|
|
|
|
| |
Fixes compile errors in blk-cgroup code for empty_time stat and a merge fix in
CFQ. The first error was when CONFIG_DEBUG_CFQ_IOSCHED is not set.
Signed-off-by: Divyesh Shah <dpshah@google.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
|
|\
| |
| |
| |
| |
| |
| |
| | |
Conflicts:
block/blk-cgroup.c
block/cfq-iosched.c
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
|
| | |
|
| |\
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
* anonvma:
anonvma: when setting up page->mapping, we need to pick the _oldest_ anonvma
anon_vma: clone the anon_vma chain in the right order
vma_adjust: fix the copying of anon_vma chains
Simplify and comment on anon_vma re-use for anon_vma_prepare()
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Otherwise we might be mapping in a page in a new mapping, but that page
(through the swapcache) would later be mapped into an old mapping too.
The page->mapping must be the case that works for everybody, not just
the mapping that happened to page it in first.
Here's the scenario:
- page gets allocated/mapped by process A. Let's call the anon_vma we
associate the page with 'A' to keep it easy to track.
- Process A forks, creating process B. The anon_vma in B is 'B', and has
a chain that looks like 'B' -> 'A'. Everything is fine.
- Swapping happens. The page (with mapping pointing to 'A') gets swapped
out (perhaps not to disk - it's enough to assume that it's just not
mapped any more, and lives entirely in the swap-cache)
- Process B pages it in, which goes like this:
do_swap_page ->
page = lookup_swap_cache(entry);
...
set_pte_at(mm, address, page_table, pte);
page_add_anon_rmap(page, vma, address);
And think about what happens here!
In particular, what happens is that this will now be the "first"
mapping of that page, so page_add_anon_rmap() used to do
if (first)
__page_set_anon_rmap(page, vma, address);
and notice what anon_vma it will use? It will use the anon_vma for
process B!
What happens then? Trivial: process 'A' also pages it in (nothing
happens, it's not the first mapping), and then process 'B' execve's
or exits or unmaps, making anon_vma B go away.
End result: process A has a page that points to anon_vma B, but
anon_vma B does not exist any more. This can go on forever. Forget
about RCU grace periods, forget about locking, forget anything like
that. The bug is simply that page->mapping points to an anon_vma
that was correct at one point, but was _not_ the one that was shared
by all users of that possible mapping.
Changing it to always use the deepest anon_vma in the anonvma chain gets
us to the safest model.
This can be improved in certain cases: if we know the page is private to
just this particular mapping (for example, it's a new page, or it is the
only swapcache entry), we could pick the top (most specific) anon_vma.
But that's a future optimization. Make it _work_ reliably first.
Reviewed-by: Rik van Riel <riel@redhat.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Tested-by: Borislav Petkov <bp@alien8.de> [ "What do you know, I think you fixed it!" ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
We want to walk the chain in reverse order when cloning it, so that the
order of the result chain will be the same as the order in the source
chain. When we add entries to the chain, they go at the head of the
chain, so we want to add the source head last.
Reviewed-by: Rik van Riel <riel@redhat.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Tested-by: Borislav Petkov <bp@alien8.de> [ "No, it still oopses" ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
When we move the boundaries between two vma's due to things like
mprotect, we need to make sure that the anon_vma of the pages that got
moved from one vma to another gets properly copied around. And that was
not always the case, in this rather hard-to-follow code sequence.
Clarify the code, and fix it so that it copies the anon_vma from the
right source.
Reviewed-by: Rik van Riel <riel@redhat.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Tested-by: Borislav Petkov <bp@alien8.de> [ "Yeah, not so much this one either" ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
This changes the anon_vma reuse case to require that we only reuse
simple anon_vma's - ie the case when the vma only has a single anon_vma
associated with it.
This means that a reuse of an anon_vma from an adjacent vma will always
guarantee that both vma's are associated not only with the same
anon_vma, they will also have the same anon_vma chain (of just a single
entry in this case).
And since anon_vma re-use was the only case where the same anon_vma
might be associated with different chains of anon_vma's, we now have the
case that every vma that shares the same anon_vma will always also have
the same chain. That makes it much easier to think about merging vma's
that share the same anon_vma's: you can always just drop the other
anon_vma chain in anon_vma_merge() since you know that they are always
identical.
This also splits up the function to validate the anon_vma re-use, and
adds a lot of commentary about the possible races.
Reviewed-by: Rik van Riel <riel@redhat.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Tested-by: Borislav Petkov <bp@alien8.de> [ "That didn't fix it" ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
| |\ \
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
* master.kernel.org:/home/rmk/linux-2.6-arm: (21 commits)
ARM: Fix ioremap_cached()/ioremap_wc() for SMP platforms
ARM: 6043/1: AT91 slow-clock resume: Don't wait for a disabled PLL to lock
ARM: 6031/1: fix Thumb-2 decompressor
ARM: 6029/1: ep93xx: gpio.c: local functions should be static
ARM: 6028/1: ARM: add MAINTAINERS for U300
ARM: 6024/1: bcmring: fix missing down on semaphore in dma.c
MXC: mach_armadillo5x0: Add USB Host support.
ARM mach-mx3: duplicated include
ARM mach-mx3: duplicated include
imx31: add watchdog device on litekit board.
imx3: Add watchdog platform device support
MXC: mach-mx31_3ds: add support for freescale mc13783 power management device.
MXC: mach-mx31_3ds: Add SPI1 device support.
MXC: mach-mx31_3ds: Add support for on board NAND Flash.
MXC: mach-mx31_3ds: Update variable names over recent mach name modification.
imx31: fix parent clock for rtc
i.MX51: remove NFC AXI static mapping
i.MX51: determine silicon revision dynamically
i.MX51: map TZIC dynamically
i.MX51: Use correct clock for gpt
...
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Write combining/cached device mappings are not setting the shared bit,
which could potentially cause problems on SMP systems since the cache
lines won't participate in the cache coherency protocol.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Tested-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
at91 slow-clock resume: Don't wait for a disabled PLL to lock.
We run into this problem with the PLLB on the at91: ohci-at91 disables
the PLLB when going to suspend. The slowclock code however tries to do
the same: It saves the PLLB register value and when restoring the value
during resume, it waits for the PLLB to lock again. However the PLL will
never lock and the loop would run into its timeout because the slowclock
code just stored and restored an empty register.
This fixes the problem by only restoring PLLA/PLLB when they were enabled
at suspend time.
Cc: Andrew Victor <avictor.za@gmail.com>
Signed-off-by: Anders Larsen <al@alarsen.net>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
|
| | |\ \
| | | | |
| | | | |
| | | | |
| | | | | |
Conflicts:
arch/arm/mach-mx3/mach-pcm037.c
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
This add USB Host capability. The Armadillo 500 board is supplied
with two USB Host connectors driven by the USB OTG and USB Host 2
ports, through two NXP isp 1504 transceivers.
Signed-off-by: Alberto Panizzo <maramaopercheseimorto@gmail.com>
Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
arch/arm/mach-mx3/mx31lite-db.c: linux/platform_device.h is included more than once.
Signed-off-by: Andrea Gelmini <andrea.gelmini@gelma.net>
Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
arch/arm/mach-mx3/mach-pcm037.c: linux/fsl_devices.h is included more than once.
Signed-off-by: Andrea Gelmini <andrea.gelmini@gelma.net>
Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
This patch adds support for SoC build-in watchdog device on litekit
board.
Signed-off-by: Vladimir Zapolskiy <vzapolskiy@gmail.com>
Cc: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Cc: Sascha Hauer <s.hauer@pengutronix.de>
Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
This patch adds support for build-in watchdog device found on
Freescale imx31 and imx35 SoCs.
Signed-off-by: Vladimir Zapolskiy <vzapolskiy@gmail.com>
Cc: Sascha Hauer <s.hauer@pengutronix.de>
Cc: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>
|
| | | |\ \ |
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
This area contains the Nand Flash controller registers. There
is no need to map them statically as the Nand driver uses ioremap().
Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
Freescale redboot passes the silicon revision via
ATAG_REVISION. Remove this bootloader dependency by
doing the same as redboot does in the Kernel.
Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
This looks cleaner and allows us to call mx51_revision
later when we can use ioremap to determine the silicon
revision dynamically.
Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
The gpt uses the ipg clock, not ipg_perclk
Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
Power Gates must to be always enabled.
Signed-off-by: Alberto Panizzo <maramaopercheseimorto@gmail.com>
Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
Signed-off-by: Alberto Panizzo <maramaopercheseimorto@gmail.com>
Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
Since the using of Bad Block Table is not constantly a good behave
I had made it configurable.
Signed-off-by: Alberto Panizzo <maramaopercheseimorto@gmail.com>
Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
Signed-off-by: Alberto Panizzo <maramaopercheseimorto@gmail.com>
Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
According to imx31 reference manual the signal from external low
frequency clock is sent to RTC clock.
The patch makes redundant the previously defined mxc_rtc clock.
Signed-off-by: Vladimir Zapolskiy <vzapolskiy@gmail.com>
Cc: Sascha Hauer <s.hauer@pengutronix.de>
Cc: Daniel Mack <daniel@caiaq.de>
Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
98e12b5a6e05413 ("ARM: Fix decompressor's kernel size estimation for
ROM=y") broke the Thumb-2 decompressor because it added an entry in the
LC0 table but didn't adjust the offset the Thumb-2 code uses to load the
SP from that table. Fix it.
Cc: stable <stable@kernel.org>
Signed-off-by: Rabin Vincent <rabin@rab.in>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
The functions ep93xx_gpio_update_int_params and ep93xx_gpio_int_mask
are not exported and should be static. This was overlooked when
moving the code from core.c.
Also, change a comment to better indicate what the code is for.
Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com>
Acked-by: Ryan Mallon <ryan@bluewatersys.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
This adds myself as maintainer of the U300 machine and associated
system-on-chip drivers.
Signed-off-by: Linus Walleij <linus.walleij@stericsson.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
Added missing down on the memMap->lock semaphore. Also fixed a return
statement so that we always exit with an up (i.e. early exit via return
is not allowed)
Signed-off-by: Leo Hao Chen <leochen@broadcom.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
|
| |\ \ \ \ \
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable
* 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable:
Btrfs: make sure the chunk allocator doesn't create zero length chunks
Btrfs: fix data enospc check overflow
|
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
A recent commit allowed for smaller chunks to be created, but didn't
make sure they were always bigger than a stripe. After some divides,
this led to zero length stripes.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
Because we account for reserved space we get from the allocator before we
actually account for allocating delalloc space, we can have a small window where
the amount of "used" space in a space_info is more than the total amount of
space in the space_info. This will cause a overflow in our check, so it will
seem like we have _tons_ of free space, and we'll allow reservations to occur
that will end up larger than the amount of space we have. I've seen users
report ENOSPC panic's in cow_file_range a few times recently, so I tried to
reproduce this problem and found I could reproduce it if I ran one of my tests
in a loop for like 20 minutes. With this patch my test ran all night without
issues. Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
|