| Commit message (Collapse) | Author | Age |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Implement support for writing to regular AFS files, including:
(1) write
(2) truncate
(3) fsync, fdatasync
(4) chmod, chown, chgrp, utime.
AFS writeback attempts to batch writes into as chunks as large as it can manage
up to the point that it writes back 65535 pages in one chunk or it meets a
locked page.
Furthermore, if a page has been written to using a particular key, then should
another write to that page use some other key, the first write will be flushed
before the second is allowed to take place. If the first write fails due to a
security error, then the page will be scrapped and reread before the second
write takes place.
If a page is dirty and the callback on it is broken by the server, then the
dirty data is not discarded (same behaviour as NFS).
Shared-writable mappings are not supported by this patch.
[akpm@linux-foundation.org: fix a bunch of warnings]
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Make some miscellaneous changes to the AFS filesystem:
(1) Assert RCU barriers on module exit to make sure RCU has finished with
callbacks in this module.
(2) Correctly handle the AFS server returning a zero-length read.
(3) Split out data zapping calls into one function (afs_zap_data).
(4) Rename some afs_file_*() functions to afs_*() where they apply to
non-regular files too.
(5) Be consistent about the presentation of volume ID:vnode ID in debugging
output.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
|
|
|
|
|
|
|
|
| |
Since path_walk sets the total_link_count to 0 and calls link_path_walk, we
can just call path_walk directly.
Signed-off-by: Josef 'Jeff' Sipek <jsipek@cs.sunysb.edu>
Acked-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
|
|
|
|
|
| |
Signed-off-by: Josef 'Jeff' Sipek <jsipek@cs.sunysb.edu>
Acked-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
|
|
|
|
|
|
|
|
| |
Cleanup using simple_read_from_buffer() in binfmt_misc, configfs, and sysfs.
Cc: Greg Kroah-Hartman <gregkh@suse.de>
Cc: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
|
|
|
|
|
|
|
|
| |
/proc/pid/clear_refs is only defined in the CONFIG_MMU case, so make sure we
don't have any references to clear_refs_smap() in generic procfs code.
Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
|
|
|
|
|
|
| |
.. to match what we do on write(). This way, people who write to files
by using [f]truncate + writable mmap have the same semantics as if they
were using the write() family of system calls.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|\
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
* git://oss.sgi.com:8090/xfs/xfs-2.6:
[XFS] Add lockdep support for XFS
[XFS] Fix race in xfs_write() b/w dmapi callout and direct I/O checks.
[XFS] Get rid of redundant "required" in msg.
[XFS] Export via a function xfs_buftarg_list for use by kdb/xfsidbg.
[XFS] Remove unused ilen variable and references.
[XFS] Fix to prevent the notorious 'NULL files' problem after a crash.
[XFS] Fix race condition in xfs_write().
[XFS] Fix uquota and oquota enforcement problems.
[XFS] propogate return codes from flush routines
[XFS] Fix quotaon syscall failures for group enforcement requests.
[XFS] Invalidate quotacheck when mounting without a quota type.
[XFS] reducing the number of random number functions.
[XFS] remove more misc. unused args
[XFS] the "aendp" arg to xfs_dir2_data_freescan is always NULL, remove it.
[XFS] The last argument "lsn" of xfs_trans_commit() is always called with
|
| |
| |
| |
| |
| |
| |
| |
| |
| | |
SGI-PV: 963965
SGI-Modid: xfs-linux-melb:xfs-kern:28485a
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Tim Shimmin <tes@sgi.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
In xfs_write() the iolock is dropped and reacquired in XFS_SEND_DATA()
which means that the file could change from not-cached to cached and we
need to redo the direct I/O checks. We should also redo the direct I/O
checks when the file size changes regardless if O_APPEND is set or not.
SGI-PV: 963483
SGI-Modid: xfs-linux-melb:xfs-kern:28440a
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Tim Shimmin <tes@sgi.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| | |
SGI-PV: 963466
SGI-Modid: xfs-linux-melb:xfs-kern:28416a
Signed-off-by: Utako Kusaka <utako@tnes.nec.co.jp>
Signed-off-by: Tim Shimmin <tes@sgi.com>
Signed-off-by: Christoph Hellwig <hch@infradead.org>
|
| |
| |
| |
| |
| |
| |
| |
| | |
SGI-PV: 963465
SGI-Modid: xfs-linux-melb:xfs-kern:28414a
Signed-off-by: Tim Shimmin <tes@sgi.com>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| | |
SGI-PV: 907752
SGI-Modid: xfs-linux-melb:xfs-kern:28344a
Signed-off-by: Tim Shimmin <tes@sgi.com>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
The problem that has been addressed is that of synchronising updates of
the file size with writes that extend a file. Without the fix the update
of a file's size, as a result of a write beyond eof, is independent of
when the cached data is flushed to disk. Often the file size update would
be written to the filesystem log before the data is flushed to disk. When
a system crashes between these two events and the filesystem log is
replayed on mount the file's size will be set but since the contents never
made it to disk the file is full of holes. If some of the cached data was
flushed to disk then it may just be a section of the file at the end that
has holes.
There are existing fixes to help alleviate this problem, particularly in
the case where a file has been truncated, that force cached data to be
flushed to disk when the file is closed. If the system crashes while the
file(s) are still open then this flushing will never occur.
The fix that we have implemented is to introduce a second file size,
called the in-memory file size, that represents the current file size as
viewed by the user. The existing file size, called the on-disk file size,
is the one that get's written to the filesystem log and we only update it
when it is safe to do so. When we write to a file beyond eof we only
update the in- memory file size in the write operation. Later when the I/O
operation, that flushes the cached data to disk completes, an I/O
completion routine will update the on-disk file size. The on-disk file
size will be updated to the maximum offset of the I/O or to the value of
the in-memory file size if the I/O includes eof.
SGI-PV: 958522
SGI-Modid: xfs-linux-melb:xfs-kern:28322a
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Tim Shimmin <tes@sgi.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
This change addresses a race in xfs_write() where, for direct I/O, the
flags need_i_mutex and need_flush are setup before the iolock is acquired.
The logic used to setup the flags may change between setting the flags and
acquiring the iolock resulting in these flags having incorrect values. For
example, if a file is not currently cached then need_i_mutex is set to
zero and then if the file is cached before the iolock is acquired we will
fail to do the flushinval before the direct write.
The flush (and also the call to xfs_zero_eof()) need to be done with the
iolock held exclusive so we need to acquire the iolock before checking for
cached data (or if the write begins after eof) to prevent this state from
changing. For direct I/O I've chosen to always acquire the iolock in
shared mode initially and if there is a need to promote it then drop it
and reacquire it.
There's also some other tidy-ups including removing the O_APPEND offset
adjustment since that work is done in generic_write_checks() (and we don't
use offset as an input parameter anywhere).
SGI-PV: 962170
SGI-Modid: xfs-linux-melb:xfs-kern:28319a
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Tim Shimmin <tes@sgi.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
When uquota and oquota (gquota/pquota) are enabled for accounting both are
enforced if ether has enforcement active.
Conditions:
- Both XFS_UQUOTA_ACCT and XFS_GQUOTA_ACCT are enabled.
- Either XFS_UQUOTA_ENFD or XFS_OQUOTA_ENFD is enabled.
- The usage without enforce is reached at the soft limit.
Problems:
1. "repquota" shows all grace time even if no enforcement.
2. we cannot make a file over a hard limits even if no enforcement.
SGI-PV: 962291
SGI-Modid: xfs-linux-melb:xfs-kern:28272a
Signed-off-by: Kouta Ooizumi <k-ooizumi@tnes.nec.co.jp>
Signed-off-by: Donald Douwsma <donaldd@sgi.com>
Signed-off-by: Tim Shimmin <tes@sgi.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
This patch handles error return values in fs_flush_pages and
fs_flushinval_pages. It changes the prototype of fs_flushinval_pages so we
can propogate the errors and handle them at higher layers. I also modified
xfs_itruncate_start so that it could propogate the error further.
SGI-PV: 961990
SGI-Modid: xfs-linux-melb:xfs-kern:28231a
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Signed-off-by: Stewart Smith <stewart@flamingspork.com>
Signed-off-by: Tim Shimmin <tes@sgi.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
xfs_qm_scall_quotaon was incorrectly failing requests to enable group
quota enforcement. Fixes logic error in OQUOTA handling.
SGI-PV: 961964
SGI-Modid: xfs-linux-melb:xfs-kern:28227a
Signed-off-by: Donald Douwsma <donaldd@sgi.com>
Signed-off-by: Tim Shimmin <tes@sgi.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
When quotas are mounted or remounted without a particular quota type the
quota accounting for that type becomes invalid. Previously we were
ignoring this leading to accounting errors.
SGI-PV: 961964
SGI-Modid: xfs-linux-melb:xfs-kern:28225a
Signed-off-by: Donald Douwsma <donaldd@sgi.com>
Signed-off-by: Utako Kusaka <utako@tnes.nec.co.jp>
Signed-off-by: Vlad Apostolov <vapo@sgi.com>
Signed-off-by: Tim Shimmin <tes@sgi.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Patch provided by Joe Perches
SGI-PV: 961696
SGI-Modid: xfs-linux-melb:xfs-kern:28209a
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Signed-off-by: Tim Shimmin <tes@sgi.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Patch provided by Eric Sandeen.
SGI-PV: 961695
SGI-Modid: xfs-linux-melb:xfs-kern:28205a
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Signed-off-by: Tim Shimmin <tes@sgi.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Patch provided by Eric Sandeen.
SGI-PV: 961694
SGI-Modid: xfs-linux-melb:xfs-kern:28204a
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Signed-off-by: Tim Shimmin <tes@sgi.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
NULL.
Patch provided by Eric Sandeen.
SGI-PV: 961693
SGI-Modid: xfs-linux-melb:xfs-kern:28199a
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Signed-off-by: Tim Shimmin <tes@sgi.com>
|
|\ \
| | |
| | |
| | |
| | |
| | |
| | | |
* 'for-2.6.22' of git://git.kernel.dk/data/git/linux-2.6-block:
[PATCH] ll_rw_blk: fix missing bounce in blk_rq_map_kern()
[PATCH] splice: always call into page_cache_readahead()
[PATCH] splice(): fix interaction with readahead
|
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Don't try to guess what the read-ahead logic will do, allow it
to make its own decisions.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
|
| |/
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Eric Dumazet, thank you for disclosing this bug.
Readahead logic somehow fails to populate the page range with data.
It can be because
1) the readahead routine is not always called in the following lines of
fs/splice.c:
if (!loff || nr_pages > 1)
page_cache_readahead(mapping, &in->f_ra, in, index, nr_pages);
2) even called, page_cache_readahead() wont guarantee the pages are there.
It wont submit readahead I/O for pages already in the radix tree, or when
(ra_pages == 0), or after 256 cache hits.
In your case, it should be because of the retried reads, which lead to
excessive cache hits, and disables readahead at some time.
And that _one_ failure of readahead blocks the whole read process.
The application receives EAGAIN and retries the read, but
__generic_file_splice_read() refuse to make progress:
- in the previous invocation, it has allocated a blank page and inserted it
into the radix tree, but never has the chance to start I/O for it: the test
of SPLICE_F_NONBLOCK goes before that.
- in the retried invocation, the readahead code will neither get out of the
cache hit mode, nor will it submit I/O for an already existing page.
Cc: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
|
|\ \
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
git://git.kernel.org/pub/scm/linux/kernel/git/shaggy/jfs-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/shaggy/jfs-2.6:
JFS: Fix race waking up jfsIO kernel thread
JFS: use __set_current_state()
Copy i_flags to jfs inode flags on write
JFS: document uid, gid, and umask mount options in jfs.txt
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
It's possible for a journal I/O request to be added to the log_redrive
queue and the jfsIO thread to be awakened after the thread releases
log_redrive_lock but before it sets its state to TASK_INTERRUPTIBLE.
The jfsIO thread should set the state before giving up the spinlock, so
the waking thread will really wake it.
Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
use __set_current_state(TASK_*) instead of current->state = TASK_*
Signed-off-by: Milind Arun Choudhary <milindchoudhary@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
This mirrors Jan Kara's patches for ext3. This patch makes sure that
changes made to inode->i_flags are reflected on disk for jfs. It also
moves a call of jfs_set_inode_flags() to be more consistent with where
jfs_get_inode_flags() is called.
Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
sb_read may return NULL, let's explicitly check it.
Signed-off-by: Dmitriy Monakhov <dmonakhov@openvz.org>
Cc: Jan Kara <jack@ucw.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Make UDF work correctly for files larger than 1GB. As no extent can be
longer than (1<<30)-blocksize bytes, we have to create several extents if a
big hole is being created. As a side-effect, we now don't discard
preallocated blocks when creating a hole.
Signed-off-by: Jan Kara <jack@suse.cz>
Acked-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Add a few assertions into udf_discard_prealloc() to check that the file is
sane (mostly helps debugging further patches ;).
Signed-off-by: Jan Kara <jack@suse.cz>
Acked-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Make UDF use get_bh() instead of directly accessing b_count and use
brelse() instead of udf_release_data() which does just brelse()...
Signed-off-by: Jan Kara <jack@suse.cz>
Acked-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Introduce a structure extent_position to store a position of an extent and
the corresponding buffer_head in one place.
Signed-off-by: Jan Kara <jack@suse.cz>
Acked-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Use sector_t and loff_t for file offsets in UDF filesystem. Otherwise an
overflow may occur for long files. Also make inode_bmap() return offset in
the extent in number of blocks instead of number of bytes - for most
callers this is more convenient.
Signed-off-by: Jan Kara <jack@suse.cz>
Acked-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Change the atomic_t in struct nfs_server to atomic_long_t in anticipation
of machines that can handle 8+TB of (4K) pages under writeback.
However I suspect other things in NFS will start going *bang* by then.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Implement utimensat(2) which is an extension to futimesat(2) in that it
a) supports nano-second resolution for the timestamps
b) allows to selectively ignore the atime/mtime value
c) allows to selectively use the current time for either atime or mtime
d) supports changing the atime/mtime of a symlink itself along the lines
of the BSD lutimes(3) functions
For this change the internally used do_utimes() functions was changed to
accept a timespec time value and an additional flags parameter.
Additionally the sys_utime function was changed to match compat_sys_utime
which already use do_utimes instead of duplicating the work.
Also, the completely missing futimensat() functionality is added. We have
such a function in glibc but we have to resort to using /proc/self/fd/* which
not everybody likes (chroot etc).
Test application (the syscall number will need per-arch editing):
#include <errno.h>
#include <fcntl.h>
#include <time.h>
#include <sys/time.h>
#include <stddef.h>
#include <syscall.h>
#define __NR_utimensat 280
#define UTIME_NOW ((1l << 30) - 1l)
#define UTIME_OMIT ((1l << 30) - 2l)
int
main(void)
{
int status = 0;
int fd = open("ttt", O_RDWR|O_CREAT|O_EXCL, 0666);
if (fd == -1)
error (1, errno, "failed to create test file \"ttt\"");
struct stat64 st1;
if (fstat64 (fd, &st1) != 0)
error (1, errno, "fstat failed");
struct timespec t[2];
t[0].tv_sec = 0;
t[0].tv_nsec = 0;
t[1].tv_sec = 0;
t[1].tv_nsec = 0;
if (syscall(__NR_utimensat, AT_FDCWD, "ttt", t, 0) != 0)
error (1, errno, "utimensat failed");
struct stat64 st2;
if (fstat64 (fd, &st2) != 0)
error (1, errno, "fstat failed");
if (st2.st_atim.tv_sec != 0 || st2.st_atim.tv_nsec != 0)
{
puts ("atim not reset to zero");
status = 1;
}
if (st2.st_mtim.tv_sec != 0 || st2.st_mtim.tv_nsec != 0)
{
puts ("mtim not reset to zero");
status = 1;
}
if (status != 0)
goto out;
t[0] = st1.st_atim;
t[1].tv_sec = 0;
t[1].tv_nsec = UTIME_OMIT;
if (syscall(__NR_utimensat, AT_FDCWD, "ttt", t, 0) != 0)
error (1, errno, "utimensat failed");
if (fstat64 (fd, &st2) != 0)
error (1, errno, "fstat failed");
if (st2.st_atim.tv_sec != st1.st_atim.tv_sec
|| st2.st_atim.tv_nsec != st1.st_atim.tv_nsec)
{
puts ("atim not set");
status = 1;
}
if (st2.st_mtim.tv_sec != 0 || st2.st_mtim.tv_nsec != 0)
{
puts ("mtim changed from zero");
status = 1;
}
if (status != 0)
goto out;
t[0].tv_sec = 0;
t[0].tv_nsec = UTIME_OMIT;
t[1] = st1.st_mtim;
if (syscall(__NR_utimensat, AT_FDCWD, "ttt", t, 0) != 0)
error (1, errno, "utimensat failed");
if (fstat64 (fd, &st2) != 0)
error (1, errno, "fstat failed");
if (st2.st_atim.tv_sec != st1.st_atim.tv_sec
|| st2.st_atim.tv_nsec != st1.st_atim.tv_nsec)
{
puts ("mtim changed from original time");
status = 1;
}
if (st2.st_mtim.tv_sec != st1.st_mtim.tv_sec
|| st2.st_mtim.tv_nsec != st1.st_mtim.tv_nsec)
{
puts ("mtim not set");
status = 1;
}
if (status != 0)
goto out;
sleep (2);
t[0].tv_sec = 0;
t[0].tv_nsec = UTIME_NOW;
t[1].tv_sec = 0;
t[1].tv_nsec = UTIME_NOW;
if (syscall(__NR_utimensat, AT_FDCWD, "ttt", t, 0) != 0)
error (1, errno, "utimensat failed");
if (fstat64 (fd, &st2) != 0)
error (1, errno, "fstat failed");
struct timeval tv;
gettimeofday(&tv,NULL);
if (st2.st_atim.tv_sec <= st1.st_atim.tv_sec
|| st2.st_atim.tv_sec > tv.tv_sec)
{
puts ("atim not set to NOW");
status = 1;
}
if (st2.st_mtim.tv_sec <= st1.st_mtim.tv_sec
|| st2.st_mtim.tv_sec > tv.tv_sec)
{
puts ("mtim not set to NOW");
status = 1;
}
if (symlink ("ttt", "tttsym") != 0)
error (1, errno, "cannot create symlink");
t[0].tv_sec = 0;
t[0].tv_nsec = 0;
t[1].tv_sec = 0;
t[1].tv_nsec = 0;
if (syscall(__NR_utimensat, AT_FDCWD, "tttsym", t, AT_SYMLINK_NOFOLLOW) != 0)
error (1, errno, "utimensat failed");
if (lstat64 ("tttsym", &st2) != 0)
error (1, errno, "lstat failed");
if (st2.st_atim.tv_sec != 0 || st2.st_atim.tv_nsec != 0)
{
puts ("symlink atim not reset to zero");
status = 1;
}
if (st2.st_mtim.tv_sec != 0 || st2.st_mtim.tv_nsec != 0)
{
puts ("symlink mtim not reset to zero");
status = 1;
}
if (status != 0)
goto out;
t[0].tv_sec = 1;
t[0].tv_nsec = 0;
t[1].tv_sec = 1;
t[1].tv_nsec = 0;
if (syscall(__NR_utimensat, fd, NULL, t, 0) != 0)
error (1, errno, "utimensat failed");
if (fstat64 (fd, &st2) != 0)
error (1, errno, "fstat failed");
if (st2.st_atim.tv_sec != 1 || st2.st_atim.tv_nsec != 0)
{
puts ("atim not reset to one");
status = 1;
}
if (st2.st_mtim.tv_sec != 1 || st2.st_mtim.tv_nsec != 0)
{
puts ("mtim not reset to one");
status = 1;
}
if (status == 0)
puts ("all OK");
out:
close (fd);
unlink ("ttt");
unlink ("tttsym");
return status;
}
[akpm@linux-foundation.org: add missing i386 syscall table entry]
Signed-off-by: Ulrich Drepper <drepper@redhat.com>
Cc: Alexey Dobriyan <adobriyan@openvz.org>
Cc: Michael Kerrisk <mtk-manpages@gmx.net>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
their root inodes
This patch makes it so that simple_fill_super and get_sb_pseudo assign their
root inodes to be number 1. It also fixes up a couple of callers of
simple_fill_super that were passing in files arrays that had an index at
number 1, and adds a warning for any caller that sends in such an array.
It would have been nice to have made it so that it wasn't possible to make
such a collision, but some callers need to be able to control what inode
number their entries get, so I think this is the best that can be done.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
The problems are:
- on filesystems w/o permanent inode numbers, i_ino values can be larger
than 32 bits, which can cause problems for some 32 bit userspace programs on
a 64 bit kernel. We can't do anything for filesystems that have actual
>32-bit inode numbers, but on filesystems that generate i_ino values on the
fly, we should try to have them fit in 32 bits. We could trivially fix this
by making the static counters in new_inode and iunique 32 bits, but...
- many filesystems call new_inode and assume that the i_ino values they are
given are unique. They are not guaranteed to be so, since the static
counter can wrap. This problem is exacerbated by the fix for #1.
- after allocating a new inode, some filesystems call iunique to try to get
a unique i_ino value, but they don't actually add their inodes to the
hashtable, and so they're still not guaranteed to be unique if that counter
wraps.
This patch set takes the simpler approach of simply using iunique and hashing
the inodes afterward. Christoph H. previously mentioned that he thought that
this approach may slow down lookups for filesystems that currently hash their
inodes.
The questions are:
1) how much would this slow down lookups for these filesystems?
2) is it enough to justify adding more infrastructure to avoid it?
What might be best is to start with this approach and then only move to using
IDR or some other scheme if these extra inodes in the hashtable prove to be
problematic.
I've done some cursory testing with this patch and the overhead of hashing and
unhashing the inodes with pipefs is pretty low -- just a few seconds of system
time added on to the creation and destruction of 10 million pipes (very
similar to the overhead that the IDR approach would add).
The hard thing to measure is what effect this has on other filesystems. I'm
open to ways to try and gauge this.
Again, I've only converted pipefs as an example. If this approach is
acceptable then I'll start work on patches to convert other filesystems.
With a pretty-much-worst-case microbenchmark provided by Eric Dumazet
<dada1@cosmosbay.com>:
hashing patch (pipebench):
sys 1m15.329s
sys 1m16.249s
sys 1m17.169s
unpatched (pipebench):
sys 1m9.836s
sys 1m12.541s
sys 1m14.153s
Which works out to 1.05642174294555027017. So ~5-6% slowdown.
This patch:
When a 32-bit program that was not compiled with large file offsets does a
stat and gets a st_ino value back that won't fit in the 32 bit field, glibc
(correctly) generates an EOVERFLOW error. We can't do anything about fs's
with larger permanent inode numbers, but when we generate them on the fly, we
ought to try and have them fit within a 32 bit field.
This patch takes the first step toward this by making the static counters in
these two functions be 32 bits.
[jlayton@redhat.com: mention that it's only the case for 32bit, non-LFS stat]
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
When elf loader fails to map executable (due to memory shortage or because
binary is malformed), it can return 0. Normally, this is invisible because
process is killed with SIGKILL and it never returns to user space.
But if exec() is called from kernel thread (hotplug, whatever)
consequences are more interesting and vary depending on architecture.
i386. Nothing especially interesting, execve() just returns
with "success" :-)
x86_64. Fake zero frame is used on way to caller, RSP/RIP are loaded
with zeros, ergo... double fault.
ia64. Similar to i386, but r32...r95 are corrupted. Sometimes it
oopses due to return to zero PC, sometimes it sees NaT in
rXX and oopses due to NaT consumption.
Signed-off-by: Alexey Kuznetsov <alexey@openvz.org>
Signed-off-by: Kirill Korotaev <dev@openvz.org>
Signed-off-by: Pavel Emelianov <xemul@openvz.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Cleanup using simple_read_from_buffer() in procfs.
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Don't clobber error from sys_ioctl in HDIO_GETGEO compat wrapper.
Signed-off-by: Andreas Schwab <schwab@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
It appears that a minor thinko occurred in udf_rmdir and the
(already-cleared) link count on the directory that is being removed was
being decremented instead of the link count on its parent directory. This
gives rise to lots of kernel messages similar to:
UDF-fs warning (device loop1): udf_rmdir: empty directory has nlink != 2 (8)
when removing directory trees. No other ill effects have been observed but
I guess it could theoretically result in the link count overflowing on a
very long-lived, much modified directory.
Signed-off-by: Stephen Mollett <molletts@yahoo.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Jan Kara <jack@ucw.cz>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
If you compile and run the below test case in an msdos or vfat directory on
an x86-64 system with -m32 you'll get garbage in the kernel_dirent struct
followed by a SIGSEGV.
The patch fixes this.
Reported and initial fix by Bart Oldeman
#include <sys/types.h>
#include <sys/ioctl.h>
#include <dirent.h>
#include <stdio.h>
#include <unistd.h>
#include <fcntl.h>
struct kernel_dirent {
long d_ino;
long d_off;
unsigned short d_reclen;
char d_name[256]; /* We must not include limits.h! */
};
#define VFAT_IOCTL_READDIR_BOTH _IOR('r', 1, struct kernel_dirent [2])
#define VFAT_IOCTL_READDIR_SHORT _IOR('r', 2, struct kernel_dirent [2])
int main(void)
{
int fd = open(".", O_RDONLY);
struct kernel_dirent de[2];
while (1) {
int i = ioctl(fd, VFAT_IOCTL_READDIR_BOTH, (long)de);
if (i == -1) break;
if (de[0].d_reclen == 0) break;
printf("SFN: reclen=%2d off=%d ino=%d, %-12s",
de[0].d_reclen, de[0].d_off, de[0].d_ino, de[0].d_name);
if (de[1].d_reclen)
printf("\tLFN: reclen=%2d off=%d ino=%d, %s",
de[1].d_reclen, de[1].d_off, de[1].d_ino, de[1].d_name);
printf("\n");
}
return 0;
}
Signed-off-by: Bart Oldeman <bartoldeman@users.sourceforge.net>
Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Propagate flags such as S_APPEND, S_IMMUTABLE, etc. from i_flags into
ext2-specific i_flags. Hence, when someone sets these flags via a different
interface than ioctl, they are stored correctly.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
It seems that the recent Windows changed specification, and it's
undocumented. Windows doesn't update ->free_clusters correctly.
This patch doesn't use ->free_clusters by default. (instead, add "usefree"
for forcing to use it)
Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Cc: Juergen Beisert <juergen127@kreuzholzen.de>
Cc: Andreas Schwab <schwab@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
use __set_current_state(TASK_*) instead of current->state = TASK_*, in
fs/reiserfs
Signed-off-by: Milind Arun Choudhary <milindchoudhary@gmail.com>
Cc: <reiserfs-dev@namesys.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
If the thread failed to create the subsequent wait_event will hang forever.
This is likely to happen if kernel hits max_threads limit.
Will be critical for virtualization systems that limit the number of tasks
and kernel memory usage within the container.
(akpm: JBD should be converted fully to the kthread API: kthread_should_stop()
and kthread_stop()).
Cc: <linux-ext4@vger.kernel.org>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
There's a missing check for CAP_SYS_ADMIN in do_change_type().
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|