aboutsummaryrefslogtreecommitdiffstats
path: root/fs
Commit message (Collapse)AuthorAge
* timerfd use waitqueue lock ...Davide Libenzi2007-05-18
| | | | | | | | | | The timerfd was using the unlocked waitqueue operations, but it was using a different lock, so poll_wait() would race with it. This makes timerfd directly use the waitqueue lock. Signed-off-by: Davide Libenzi <davidel@xmailserver.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* eventfd use waitqueue lock ...Davide Libenzi2007-05-18
| | | | | | | | | | The eventfd was using the unlocked waitqueue operations, but it was using a different lock, so poll_wait() would race with it. This makes eventfd directly use the waitqueue lock. Signed-off-by: Davide Libenzi <davidel@xmailserver.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Merge branch 'master' of /home/trondmy/repositories/git/linux-2.6/Trond Myklebust2007-05-17
|\
| * Fix page allocation flags in grow_dev_page()Christoph Lameter2007-05-17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | grow_dev_page() simply passes GFP_NOFS to find_or_create_page. This means the allocation of radix tree nodes is done with GFP_NOFS and the allocation of a new page is done using GFP_NOFS. The mapping has a flags field that contains the necessary allocation flags for the page cache allocation. These need to be consulted in order to get DMA and HIGHMEM allocations etc right. And yes a blockdev could be allowing Highmem allocations if its a ramdisk. Cc: Hugh Dickins <hugh@veritas.com> Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * circular locking dependency found in QUOTA OFFJan Kara2007-05-17
| | | | | | | | | | | | | | | | | | | | | | | | i_mutex on quota files is special. Unlike i_mutexes for other inodes it is acquired under dqonoff_mutex. Tell lockdep about this lock ranking. Also comment and code in quota_sync_sb() seem to be bogus (as i_mutex for quota file can be acquired under dqonoff_mutex). Move truncate_inode_pages() call under dqonoff_mutex and save some problems with races... Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * ecryptfs: use zero_user_pageNate Diller2007-05-17
| | | | | | | | | | | | | | | | | | Use zero_user_page() instead of open-coding it. Signed-off-by: Nate Diller <nate.diller@gmail.com> Cc: Michael Halcrow <mhalcrow@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * make sysctl/kernel/core_pattern and fs/exec.c agree on maximum core filename ↵Dan Aloni2007-05-17
| | | | | | | | | | | | | | | | | | | | | | | | | | size Make sysctl/kernel/core_pattern and fs/exec.c agree on maximum core filename size and change it to 128, so that extensive patterns such as '/local/cores/%e-%h-%s-%t-%p.core' won't result in truncated filename generation. Signed-off-by: Dan Aloni <da-x@monatomic.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * simplify compat_sys_timerfdHeiko Carstens2007-05-17
| | | | | | | | | | | | | | | | | | Just thought this is easier to read. Acked-by: Davide Libenzi <davidel@xmailserver.org> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * Remove SLAB_CTOR_CONSTRUCTORChristoph Lameter2007-05-17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | SLAB_CTOR_CONSTRUCTOR is always specified. No point in checking it. Signed-off-by: Christoph Lameter <clameter@sgi.com> Cc: David Howells <dhowells@redhat.com> Cc: Jens Axboe <jens.axboe@oracle.com> Cc: Steven French <sfrench@us.ibm.com> Cc: Michael Halcrow <mhalcrow@us.ibm.com> Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Cc: Miklos Szeredi <miklos@szeredi.hu> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Roman Zippel <zippel@linux-m68k.org> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Dave Kleikamp <shaggy@austin.ibm.com> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Cc: "J. Bruce Fields" <bfields@fieldses.org> Cc: Anton Altaparmakov <aia21@cantab.net> Cc: Mark Fasheh <mark.fasheh@oracle.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Christoph Hellwig <hch@lst.de> Cc: Jan Kara <jack@ucw.cz> Cc: David Chinner <dgc@sgi.com> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | Merge branch 'master' of /home/trondmy/repositories/git/linux-2.6/Trond Myklebust2007-05-17
|\|
| * AFS: Fix afs_prepare_write()David Howells2007-05-17
| | | | | | | | | | | | | | | | | | | | afs_prepare_write() should not mark a page up to date if it only partially fills it in, in expectation of the caller filling in the rest prior to calling commit_write(). commit_write(), however, should mark the page up to date. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * AFS: write back dirty data on unmountDavid Howells2007-05-17
| | | | | | | | | | | | | | | | | | | | Fix AFS to write back dirty on unmounting. This didn't happen because afs_super_ops.drop_inode was pointing to generic_delete_inode. Now this pointer is left set to NULL so that the default behaviour occurs instead. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | Merge branch 'origin'Trond Myklebust2007-05-15
|\|
| * epoll: move kfree inside ep_freeDavide Libenzi2007-05-15
| | | | | | | | | | | | | | | | Move the kfree() call inside the ep_free() function. Signed-off-by: Davide Libenzi <davidel@xmailserver.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * epoll: fix some commentsDavide Libenzi2007-05-15
| | | | | | | | | | | | | | | | Fixes some epoll code comments. Signed-off-by: Davide Libenzi <davidel@xmailserver.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * epoll locks changes and cleanupsDavide Libenzi2007-05-15
| | | | | | | | | | | | | | | | | | | | Changes the rwlock to a spinlock, and drops the use-count variable. Operations are always bound by the mutex now, so the use-count is no more needed. For the same reason, the rwlock can become a simple spinlock. Signed-off-by: Davide Libenzi <davidel@xmailserver.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * fix epoll single pass code and add wait-exclusive flagDavide Libenzi2007-05-15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fixes the epoll single pass code. During the unlocked event delivery (to userspace) code, the poll callback can re-issue new events, and we must receive them correctly. Since we loop in a lockless fashion, we want to be O(nready), and we don't want to flash on/off the spinlock for every event, we have the poll callback to use a secondary list to queue events while we're inside the event delivery loop. The rw_semaphore has been turned into a mutex. This patch also adds the wait-exclusive flag, as suggested by Davi Arnaut. Signed-off-by: Davide Libenzi <davidel@xmailserver.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | NLM: Fix sparse warningsTrond Myklebust2007-05-14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | - fs/lockd/xdr4.c:140:27: warning: incorrect type in argument 2 (different explicit signedness) - fs/lockd/xdr4.c:141:27: warning: incorrect type in argument 2 (different explicit signedness) - fs/lockd/xdr4.c:432:28: warning: incorrect type in argument 2 (different explicit signedness) - fs/lockd/xdr4.c:433:28: warning: incorrect type in argument 2 (different explicit signedness) - fs/lockd/xdr4.c:587:20: warning: symbol 'nlm_version4' was not declared. Should it be static? Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
* | NFS: Fix more sparse warningsTrond Myklebust2007-05-14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | - fs/nfs/nfs4xdr.c:2499:42: warning: incorrect type in argument 2 (different signedness) - fs/nfs/nfs4xdr.c:2658:49: warning: incorrect type in argument 4 (different explicit signedness) - fs/nfs/nfs4xdr.c:2683:50: warning: incorrect type in argument 4 (different explicit signedness) - fs/nfs/nfs4xdr.c:3063:68: warning: incorrect type in argument 4 (different explicit signedness) - fs/nfs/nfs4xdr.c:3065:68: warning: incorrect type in argument 4 (different explicit signedness) - fs/nfs/callback_xdr.c:138:31: warning: incorrect type in argument 2 (different signedness) Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
* | NFS: Fix some 'sparse' warnings...Trond Myklebust2007-05-14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | - fs/nfs/dir.c:610:8: warning: symbol 'nfs_llseek_dir' was not declared. Should it be static? - fs/nfs/dir.c:636:5: warning: symbol 'nfs_fsync_dir' was not declared. Should it be static? - fs/nfs/write.c:925:19: warning: symbol 'req' shadows an earlier one - fs/nfs/write.c:61:6: warning: symbol 'nfs_commit_rcu_free' was not declared. Should it be static? - fs/nfs/nfs4proc.c:793:5: warning: symbol 'nfs4_recover_expired_lease' was not declared. Should it be static? Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
* | NFS4: Fix incorrect use of sizeof() in fs/nfs/nfs4xdr.cTrond Myklebust2007-05-14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | The XDR code should not depend on the physical allocation size of structures like nfs4_stateid and nfs4_verifier since those may have to change at some future date. We therefore replace all uses of sizeof() with constants like NFS4_VERIFIER_SIZE and NFS4_STATEID_SIZE. This also has the side-effect of fixing some warnings of the type format ‘%u’ expects type ‘unsigned int’, but argument X has type ‘long unsigned int’ on 64-bit systems Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
* | NFS: use zero_user_pageNate Diller2007-05-14
| | | | | | | | | | | | | | | | | | | | Use zero_user_page() instead of the newly deprecated memclear_highpage_flush(). Signed-off-by: Nate Diller <nate.diller@gmail.com> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Cc: "J. Bruce Fields" <bfields@fieldses.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
* | NLM: don't use CLONE_SIGHAND in nlmclnt_recoveryOleg Nesterov2007-05-14
| | | | | | | | | | | | | | | | | | | | | | reclaimer() calls allow_signal() which plays with parent process's ->sighand. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Cc: "J. Bruce Fields" <bfields@fieldses.org> Cc: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
* | NLM: Fix locking client timeouts...Trond Myklebust2007-05-14
|/ | | | | | nlmsvc_timeout is already in units of HZ... Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
* ntfs: use zero_user_pageNate Diller2007-05-12
| | | | | | | | | | Use zero_user_page() instead of open-coding it. [akpm@linux-foundation.org: kmap-type fixes] Signed-off-by: Nate Diller <nate.diller@gmail.com> Acked-by: Anton Altaparmakov <aia21@cantab.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Merge branch 'audit.b38' of ↵Linus Torvalds2007-05-11
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/viro/audit-current * 'audit.b38' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/audit-current: [PATCH] Abnormal End of Processes [PATCH] match audit name data [PATCH] complete message queue auditing [PATCH] audit inode for all xattr syscalls [PATCH] initialize name osid [PATCH] audit signal recipients [PATCH] add SIGNAL syscall class (v3) [PATCH] auditing ptrace
| * [PATCH] Abnormal End of ProcessesSteve Grubb2007-05-11
| | | | | | | | | | | | | | | | | | | | | | | | | | Hi, I have been working on some code that detects abnormal events based on audit system events. One kind of event that we currently have no visibility for is when a program terminates due to segfault - which should never happen on a production machine. And if it did, you'd want to investigate it. Attached is a patch that collects these events and sends them into the audit system. Signed-off-by: Steve Grubb <sgrubb@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * [PATCH] complete message queue auditingAmy Griffis2007-05-11
| | | | | | | | | | | | | | | | | | | | Handle the edge cases for POSIX message queue auditing. Collect inode info when opening an existing mq, and for send/receive operations. Remove audit_inode_update() as it has really evolved into the equivalent of audit_inode(). Signed-off-by: Amy Griffis <amy.griffis@hp.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * [PATCH] audit inode for all xattr syscallsAmy Griffis2007-05-11
| | | | | | | | | | | | | | | | | | Collect inode info for the remaining xattr syscalls that operate on a file descriptor. These don't call a path_lookup variant, so they aren't covered by the general audit hook. Signed-off-by: Amy Griffis <amy.griffis@hp.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | epoll cleanups: epoll remove static pre-declarations and akpm-ize the codeDavide Libenzi2007-05-11
| | | | | | | | | | | | | | | | | | Re-arrange epoll code to avoid static functions pre-declarations, and apply akpm-filter on it. Signed-off-by: Davide Libenzi <davidel@xmailserver.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | epoll cleanups: epoll no moduleDavide Libenzi2007-05-11
| | | | | | | | | | | | | | | | | | Epoll is either compiled it, or not (if EMBEDDED). Remove the module code and use fs_initcall(). Signed-off-by: Davide Libenzi <davidel@xmailserver.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | epoll: use anonymous inodesDavide Libenzi2007-05-11
| | | | | | | | | | | | | | | | | | Cut out lots of code from epoll, by reusing the anonymous inode source patch (fs/anon_inodes.c). Signed-off-by: Davide Libenzi <davidel@xmailserver.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | signal/timer/event: KAIO eventfd support exampleDavide Libenzi2007-05-11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is an example about how to add eventfd support to the current KAIO code, in order to enable KAIO to post readiness events to a pollable fd (hence compatible with POSIX select/poll). The KAIO code simply signals the eventfd fd when events are ready, and this triggers a POLLIN in the fd. This patch uses a reserved for future use member of the struct iocb to pass an eventfd file descriptor, that KAIO will use to post events every time a request completes. At that point, an aio_getevents() will return the completed result to a struct io_event. I made a quick test program to verify the patch, and it runs fine here: http://www.xmailserver.org/eventfd-aio-test.c The test program uses poll(2), but it'd, of course, work with select and epoll too. This can allow to schedule both block I/O and other poll-able devices requests, and wait for results using select/poll/epoll. In a typical scenario, an application would submit KAIO request using aio_submit(), and will also use epoll_ctl() on the whole other class of devices (that with the addition of signals, timers and user events, now it's pretty much complete), and then would: epoll_wait(...); for_each_event { if (curr_event_is_kaiofd) { aio_getevents(); dispatch_aio_events(); } else { dispatch_epoll_event(); } } Signed-off-by: Davide Libenzi <davidel@xmailserver.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | signal/timer/event: eventfd coreDavide Libenzi2007-05-11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is a very simple and light file descriptor, that can be used as event wait/dispatch by userspace (both wait and dispatch) and by the kernel (dispatch only). It can be used instead of pipe(2) in all cases where those would simply be used to signal events. Their kernel overhead is much lower than pipes, and they do not consume two fds. When used in the kernel, it can offer an fd-bridge to enable, for example, functionalities like KAIO or syslets/threadlets to signal to an fd the completion of certain operations. But more in general, an eventfd can be used by the kernel to signal readiness, in a POSIX poll/select way, of interfaces that would otherwise be incompatible with it. The API is: int eventfd(unsigned int count); The eventfd API accepts an initial "count" parameter, and returns an eventfd fd. It supports poll(2) (POLLIN, POLLOUT, POLLERR), read(2) and write(2). The POLLIN flag is raised when the internal counter is greater than zero. The POLLOUT flag is raised when at least a value of "1" can be written to the internal counter. The POLLERR flag is raised when an overflow in the counter value is detected. The write(2) operation can never overflow the counter, since it blocks (unless O_NONBLOCK is set, in which case -EAGAIN is returned). But the eventfd_signal() function can do it, since it's supposed to not sleep during its operation. The read(2) function reads the __u64 counter value, and reset the internal value to zero. If the value read is equal to (__u64) -1, an overflow happened on the internal counter (due to 2^64 eventfd_signal() posts that has never been retired - unlickely, but possible). The write(2) call writes an __u64 count value, and adds it to the current counter. The eventfd fd supports O_NONBLOCK also. On the kernel side, we have: struct file *eventfd_fget(int fd); int eventfd_signal(struct file *file, unsigned int n); The eventfd_fget() should be called to get a struct file* from an eventfd fd (this is an fget() + check of f_op being an eventfd fops pointer). The kernel can then call eventfd_signal() every time it wants to post an event to userspace. The eventfd_signal() function can be called from any context. An eventfd() simple test and bench is available here: http://www.xmailserver.org/eventfd-bench.c This is the eventfd-based version of pipetest-4 (pipe(2) based): http://www.xmailserver.org/pipetest-4.c Not that performance matters much in the eventfd case, but eventfd-bench shows almost as double as performance than pipetest-4. [akpm@linux-foundation.org: fix i386 build] [akpm@linux-foundation.org: add sys_eventfd to sys_ni.c] Signed-off-by: Davide Libenzi <davidel@xmailserver.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | signal/timer/event: timerfd compat codeDavide Libenzi2007-05-11
| | | | | | | | | | | | | | | | This patch implements the necessary compat code for the timerfd system call. Signed-off-by: Davide Libenzi <davidel@xmailserver.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | signal/timer/event: timerfd coreDavide Libenzi2007-05-11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch introduces a new system call for timers events delivered though file descriptors. This allows timer event to be used with standard POSIX poll(2), select(2) and read(2). As a consequence of supporting the Linux f_op->poll subsystem, they can be used with epoll(2) too. The system call is defined as: int timerfd(int ufd, int clockid, int flags, const struct itimerspec *utmr); The "ufd" parameter allows for re-use (re-programming) of an existing timerfd w/out going through the close/open cycle (same as signalfd). If "ufd" is -1, s new file descriptor will be created, otherwise the existing "ufd" will be re-programmed. The "clockid" parameter is either CLOCK_MONOTONIC or CLOCK_REALTIME. The time specified in the "utmr->it_value" parameter is the expiry time for the timer. If the TFD_TIMER_ABSTIME flag is set in "flags", this is an absolute time, otherwise it's a relative time. If the time specified in the "utmr->it_interval" is not zero (.tv_sec == 0, tv_nsec == 0), this is the period at which the following ticks should be generated. The "utmr->it_interval" should be set to zero if only one tick is requested. Setting the "utmr->it_value" to zero will disable the timer, or will create a timerfd without the timer enabled. The function returns the new (or same, in case "ufd" is a valid timerfd descriptor) file, or -1 in case of error. As stated before, the timerfd file descriptor supports poll(2), select(2) and epoll(2). When a timer event happened on the timerfd, a POLLIN mask will be returned. The read(2) call can be used, and it will return a u32 variable holding the number of "ticks" that happened on the interface since the last call to read(2). The read(2) call supportes the O_NONBLOCK flag too, and EAGAIN will be returned if no ticks happened. A quick test program, shows timerfd working correctly on my amd64 box: http://www.xmailserver.org/timerfd-test.c [akpm@linux-foundation.org: add sys_timerfd to sys_ni.c] Signed-off-by: Davide Libenzi <davidel@xmailserver.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | signal/timer/event: signalfd compat codeDavide Libenzi2007-05-11
| | | | | | | | | | | | | | | | This patch implements the necessary compat code for the signalfd system call. Signed-off-by: Davide Libenzi <davidel@xmailserver.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | signal/timer/event: signalfd coreDavide Libenzi2007-05-11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch series implements the new signalfd() system call. I took part of the original Linus code (and you know how badly it can be broken :), and I added even more breakage ;) Signals are fetched from the same signal queue used by the process, so signalfd will compete with standard kernel delivery in dequeue_signal(). If you want to reliably fetch signals on the signalfd file, you need to block them with sigprocmask(SIG_BLOCK). This seems to be working fine on my Dual Opteron machine. I made a quick test program for it: http://www.xmailserver.org/signafd-test.c The signalfd() system call implements signal delivery into a file descriptor receiver. The signalfd file descriptor if created with the following API: int signalfd(int ufd, const sigset_t *mask, size_t masksize); The "ufd" parameter allows to change an existing signalfd sigmask, w/out going to close/create cycle (Linus idea). Use "ufd" == -1 if you want a brand new signalfd file. The "mask" allows to specify the signal mask of signals that we are interested in. The "masksize" parameter is the size of "mask". The signalfd fd supports the poll(2) and read(2) system calls. The poll(2) will return POLLIN when signals are available to be dequeued. As a direct consequence of supporting the Linux poll subsystem, the signalfd fd can use used together with epoll(2) too. The read(2) system call will return a "struct signalfd_siginfo" structure in the userspace supplied buffer. The return value is the number of bytes copied in the supplied buffer, or -1 in case of error. The read(2) call can also return 0, in case the sighand structure to which the signalfd was attached, has been orphaned. The O_NONBLOCK flag is also supported, and read(2) will return -EAGAIN in case no signal is available. If the size of the buffer passed to read(2) is lower than sizeof(struct signalfd_siginfo), -EINVAL is returned. A read from the signalfd can also return -ERESTARTSYS in case a signal hits the process. The format of the struct signalfd_siginfo is, and the valid fields depends of the (->code & __SI_MASK) value, in the same way a struct siginfo would: struct signalfd_siginfo { __u32 signo; /* si_signo */ __s32 err; /* si_errno */ __s32 code; /* si_code */ __u32 pid; /* si_pid */ __u32 uid; /* si_uid */ __s32 fd; /* si_fd */ __u32 tid; /* si_fd */ __u32 band; /* si_band */ __u32 overrun; /* si_overrun */ __u32 trapno; /* si_trapno */ __s32 status; /* si_status */ __s32 svint; /* si_int */ __u64 svptr; /* si_ptr */ __u64 utime; /* si_utime */ __u64 stime; /* si_stime */ __u64 addr; /* si_addr */ }; [akpm@linux-foundation.org: fix signalfd_copyinfo() on i386] Signed-off-by: Davide Libenzi <davidel@xmailserver.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | signal/timer/event fds: anonymous inode sourceDavide Libenzi2007-05-11
| | | | | | | | | | | | | | | | | | | | | | | | | | This patch add an anonymous inode source, to be used for files that need and inode only in order to create a file*. We do not care of having an inode for each file, and we do not even care of having different names in the associated dentries (dentry names will be same for classes of file*). This allow code reuse, and will be used by epoll, signalfd and timerfd (and whatever else there'll be). Signed-off-by: Davide Libenzi <davidel@xmailserver.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | Replace pid_t in autofs with struct pid referenceSukadev Bhattiprolu2007-05-11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Make autofs container-friendly by caching struct pid reference rather than pid_t and using pid_nr() to retreive a task's pid_t. ChangeLog: - Fix Eric Biederman's comments - Use find_get_pid() to hold a reference to oz_pgrp and release while unmounting; separate out changes to autofs and autofs4. - Fix Cedric's comments: retain old prototype of parse_options() and move necessary change to its caller. Signed-off-by: Sukadev Bhattiprolu <sukadev@us.ibm.com> Cc: Cedric Le Goater <clg@fr.ibm.com> Cc: Dave Hansen <haveblue@us.ibm.com> Cc: Serge Hallyn <serue@us.ibm.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: containers@lists.osdl.org Acked-by: Eric W. Biederman <ebiederm@xmission.com> Cc: Ian Kent <raven@themaw.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | Fix some coding-style errors in autofsSukadev Bhattiprolu2007-05-11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix coding style errors (extra spaces, long lines) in autofs and autofs4 files being modified for container/pidspace issues. Signed-off-by: Sukadev Bhattiprolu <sukadev@us.ibm.com> Cc: Cedric Le Goater <clg@fr.ibm.com> Cc: Dave Hansen <haveblue@us.ibm.com> Cc: Serge Hallyn <serue@us.ibm.com> Cc: <containers@lists.osdl.org> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Ian Kent <raven@themaw.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | attach_pid() with struct pid parameterSukadev Bhattiprolu2007-05-11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | attach_pid() currently takes a pid_t and then uses find_pid() to find the corresponding struct pid. Sometimes we already have the struct pid. We can then skip find_pid() if attach_pid() were to take a struct pid parameter. Signed-off-by: Sukadev Bhattiprolu <sukadev@us.ibm.com> Cc: Cedric Le Goater <clg@fr.ibm.com> Cc: Dave Hansen <haveblue@us.ibm.com> Cc: Serge Hallyn <serue@us.ibm.com> Cc: <containers@lists.osdl.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | consolidate generic_writepages and mpage_writepagesMiklos Szeredi2007-05-11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Clean up massive code duplication between mpage_writepages() and generic_writepages(). The new generic function, write_cache_pages() takes a function pointer argument, which will be called for each page to be written. Maybe cifs_writepages() too can use this infrastructure, but I'm not touching that with a ten-foot pole. The upcoming page writeback support in fuse will also want this. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Acked-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | small cleanup in gpt partition handlingOlaf Hering2007-05-11
| | | | | | | | | | | | | | | | | | Remove unused argument in is_pmbr_valid() Remove unneeded initialization of local variable legacy_mbr Signed-off-by: Olaf Hering <olaf@aepfle.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | Let SYSV68_PARTITION default to yes on VME onlyGeert Uytterhoeven2007-05-11
| | | | | | | | | | | | | | | | | | | | Don't enable SYSV68 partition table support on all m68k boxes by default, only on Motorola VME boards. Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Philippe De Muyter <phdm@macqel.be> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | AFS: implement statfsDavid Howells2007-05-11
| | | | | | | | | | | | | | | | Implement the statfs() op for AFS. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | AFS: fix a couple of problems with unlinking AFS filesDavid Howells2007-05-11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix a couple of problems with unlinking AFS files. (1) The parent directory wasn't being updated properly between unlink() and the following lookup(). It seems that, for some reason, invalidate_remote_inode() wasn't discarding the directory contents correctly, so this patch calls invalidate_inode_pages2() instead on non-regular files. (2) afs_vnode_deleted_remotely() should handle vnodes that don't have a source server recorded without oopsing. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | AFS: fix interminable loop in afs_write_back_from_locked_page()David Howells2007-05-11
|/ | | | | | | | | | | | | | | | Following bug was uncovered by compiling with '-W' flag: CC [M] fs/afs/write.o fs/afs/write.c: In function ‘afs_write_back_from_locked_page’: fs/afs/write.c:398: warning: comparison of unsigned expression >= 0 is always true Loop variable 'n' is unsigned, so wraps around happily as far as I can see. Trival fix attached (compile tested only). Signed-off-by: Mika Kukkonen <mikukkon@iki.fi> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* locks: fix F_GETLK regression (failure to find conflicts)J. Bruce Fields2007-05-10
| | | | | | | | | | | | | | | | | | | | | | In 9d6a8c5c213e34c475e72b245a8eb709258e968c we changed posix_test_lock to modify its single file_lock argument instead of taking separate input and output arguments. This makes it no longer safe to set the output lock's fl_type to F_UNLCK before looking for a conflict, since that means searching for a conflict against a lock with type F_UNLCK. This fixes a regression which causes F_GETLK to incorrectly report no conflict on most filesystems (including any filesystem that doesn't do its own locking). Also fix posix_lock_to_flock() to copy the lock type. This isn't strictly necessary, since the caller already does this; but it seems less likely to cause confusion in the future. Thanks to Doug Chapman for the bug report. Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu> Acked-by: Doug Chapman <doug.chapman@hp.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Allow compat_ioctl.c to compile without CONFIG_NETSimon Horman2007-05-10
| | | | | | | | | | | | | | | | | | | | A small regression appears to have been introduced in the recent patch "cleanup compat ioctl handling", which was included in Linus' tree after 2.6.20. siocdevprivate_ioctl() is no longer defined if CONFIG_NET is undefined, whereas previously it was a dummy function in this case. This causes compilation with CONFIG_COMPAT but without CONFIG_NET to fail. fs/compat_ioctl.c: In function `compat_sys_ioctl': fs/compat_ioctl.c:3571: warning: implicit declaration of function `siocdevprivate_ioctl' Cc: Christoph Hellwig <hch@lst.de> Acked-by: Arnd Bergmann <arnd@arndb.de> Cc: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>