author     Robert Love <rml@novell.com>           2005-07-12 17:06:03 -0400
committer  Linus Torvalds <torvalds@g5.osdl.org>  2005-07-12 23:38:38 -0400
commit     0eeca28300df110bd6ed54b31193c83b87921443
tree       7db42d8a18d80eca538f5b7d25e0532b8fa38b85
parent     bd4c625c061c2a38568d0add3478f59172455159
[PATCH] inotify

inotify is intended to correct the deficiencies of dnotify, particularly
its inability to scale and its terrible user interface:

        * dnotify requires the opening of one fd per each directory
          that you intend to watch. This quickly results in too many
          open files and pins removable media, preventing unmount.
        * dnotify is directory-based. You only learn about changes to
          directories. Sure, a change to a file in a directory affects
          the directory, but you are then forced to keep a cache of
          stat structures.
        * dnotify's interface to user-space is awful.  Signals?

inotify provides a more usable, simple, powerful solution to file change
notification:

        * inotify's interface is a system call that returns a fd, not SIGIO.
          You get a single fd, which is select()-able.
        * inotify has an event that says "the filesystem that the item
          you were watching is on was unmounted."
        * inotify can watch directories or files.

Inotify is currently used by Beagle (a desktop search infrastructure),
Gamin (a FAM replacement), and other projects.

See Documentation/filesystems/inotify.txt.

Signed-off-by: Robert Love <rml@novell.com>
Cc: John McCutchan <ttb@tentacle.dhs.org>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

-rw-r--r--  Documentation/filesystems/inotify.txt  138
-rw-r--r--  arch/i386/kernel/syscall_table.S         3
-rw-r--r--  fs/Kconfig                              13
-rw-r--r--  fs/Makefile                              1
-rw-r--r--  fs/attr.c                               33
-rw-r--r--  fs/compat.c                             12
-rw-r--r--  fs/file_table.c                          3
-rw-r--r--  fs/inode.c                               6
-rw-r--r--  fs/inotify.c                           999
-rw-r--r--  fs/namei.c                              30
-rw-r--r--  fs/nfsd/vfs.c                            6
-rw-r--r--  fs/open.c                                3
-rw-r--r--  fs/read_write.c                         15
-rw-r--r--  fs/sysfs/file.c                          7
-rw-r--r--  fs/xattr.c                               5
-rw-r--r--  include/asm-i386/unistd.h                5
-rw-r--r--  include/linux/fs.h                       6
-rw-r--r--  include/linux/fsnotify.h               248
-rw-r--r--  include/linux/inotify.h                108
-rw-r--r--  include/linux/sched.h                    4
-rw-r--r--  include/linux/sysctl.h                  11
-rw-r--r--  kernel/sys_ni.c                          3
-rw-r--r--  kernel/sysctl.c                         43
-rw-r--r--  kernel/user.c                            4
24 files changed, 1639 insertions, 67 deletions
diff --git a/Documentation/filesystems/inotify.txt b/Documentation/filesystems/inotify.txt
new file mode 100644
index 000000000000..2c716041f578
--- /dev/null
+++ b/Documentation/filesystems/inotify.txt
@@ -0,0 +1,138 @@
				   inotify
	    a powerful yet simple file change notification system



Document started 15 Mar 2005 by Robert Love <rml@novell.com>

(i) User Interface

Inotify is controlled by a set of three system calls: inotify_init(),
inotify_add_watch(), and inotify_rm_watch().

The first step in using inotify is to initialize an inotify instance:

	int fd = inotify_init ();

Change events are managed by "watches".  A watch is an (object,mask) pair where
the object is a file or directory and the mask is a bit mask of one or more
inotify events that the application wishes to receive.  See <linux/inotify.h>
for valid events.  A watch is referenced by a watch descriptor, or wd.

Watches are added via a path to the file.

Watches on a directory will return events on any files inside of the directory.

Adding a watch is simple:

	int wd = inotify_add_watch (fd, path, mask);

You can add a large number of files via something like:

	for each file to watch {
		int wd = inotify_add_watch (fd, file, mask);
	}

You can update an existing watch in the same manner, by passing in a new mask.

An existing watch is removed via the inotify_rm_watch() system call, for example:

	inotify_rm_watch (fd, wd);
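
The calls above can be strung together as follows.  This is a minimal
user-space sketch, not part of the patch: it assumes a C library that exposes
the three system calls through <sys/inotify.h> (at the time of this patch,
callers would have had to invoke the raw syscalls), and the helper name
watch_update_remove() is hypothetical.

```c
#include <stdio.h>
#include <unistd.h>
#include <sys/inotify.h>

/*
 * Hypothetical helper: create an instance, add a watch on 'path',
 * update its mask, then remove it.  Returns 0 on success, -1 on error.
 */
int watch_update_remove(const char *path)
{
	int fd, wd, rc;

	fd = inotify_init();
	if (fd < 0) {
		perror("inotify_init");
		return -1;
	}

	/* Add a watch: an (object, mask) pair referenced by a wd. */
	wd = inotify_add_watch(fd, path, IN_MODIFY);
	if (wd < 0) {
		perror("inotify_add_watch");
		close(fd);
		return -1;
	}

	/* Update the existing watch by passing in a new mask. */
	if (inotify_add_watch(fd, path, IN_MODIFY | IN_CREATE | IN_DELETE) < 0) {
		perror("inotify_add_watch (update)");
		close(fd);
		return -1;
	}

	/* Remove the watch; closing the fd would also clean it up. */
	rc = inotify_rm_watch(fd, wd);
	if (rc < 0)
		perror("inotify_rm_watch");

	close(fd);
	return rc < 0 ? -1 : 0;
}
```

Calling watch_update_remove(".") exercises the whole add/update/remove cycle
on the current directory.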

Events are provided in the form of an inotify_event structure that is read(2)
from an inotify instance fd.  The filename is of dynamic length and follows the
struct.  It is of size len.  The filename is padded with null bytes to ensure
proper alignment.  This padding is reflected in len.
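
The padding rule can be illustrated with a little arithmetic.  The sketch
below mirrors what kernel_event() in fs/inotify.c does in this patch: the
name, including its terminating null byte, is rounded up to the next multiple
of sizeof(struct inotify_event).  The function name padded_name_len() is
hypothetical, and user-space code should simply trust the len field rather
than recompute it, since the rounding is an implementation detail.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: the value this patch's kernel_event() would store
 * in event.len for 'name', given the size of struct inotify_event.
 */
size_t padded_name_len(const char *name, size_t event_size)
{
	size_t len = strlen(name) + 1;	/* include the trailing null byte */

	/* round up to the next multiple of event_size */
	return ((len + event_size - 1) / event_size) * event_size;
}
```

With a 16-byte event structure, a three-character name occupies 16 bytes, and
a 16-character name (17 bytes with its terminator) occupies 32.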

You can slurp multiple events by passing a large buffer, for example:

	ssize_t len = read (fd, buf, BUF_LEN);

This will return as many events as are available and fit in BUF_LEN.
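
A buffer filled this way holds a sequence of variable-length records.  The
following is one possible way to walk it, assuming the struct inotify_event
layout from <sys/inotify.h> (wd, mask, cookie, len, then the padded name);
the function name walk_events() is hypothetical.

```c
#include <stdio.h>
#include <string.h>
#include <sys/inotify.h>

/*
 * Hypothetical helper: print every complete event found in buf[0..nbytes)
 * and return how many there were.
 */
size_t walk_events(const char *buf, size_t nbytes)
{
	size_t off = 0, count = 0;

	while (off + sizeof(struct inotify_event) <= nbytes) {
		struct inotify_event ev;

		/* copy the fixed-size header out to avoid unaligned access */
		memcpy(&ev, buf + off, sizeof(ev));

		/* stop if the padded name would run past the buffer */
		if (off + sizeof(ev) + ev.len > nbytes)
			break;

		printf("wd=%d mask=0x%x cookie=%u", ev.wd,
		       (unsigned int)ev.mask, (unsigned int)ev.cookie);
		if (ev.len)
			printf(" name=%s", buf + off + sizeof(ev));
		putchar('\n');

		/* the next event starts right after the padded name */
		off += sizeof(ev) + ev.len;
		count++;
	}

	return count;
}
```

A typical caller would pass the buffer and the positive return value of
read(2) straight to walk_events().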

Each inotify instance fd is also select()- and poll()-able.

You can find the size of the current event queue via the FIONREAD ioctl.
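
As an illustration of both points, the sketch below waits on an inotify fd
with poll(2) and then asks how many bytes are queued with FIONREAD.  It
assumes the standard <poll.h> and <sys/ioctl.h> interfaces; the helper name
pending_event_bytes() is hypothetical.

```c
#include <poll.h>
#include <sys/ioctl.h>
#include <sys/inotify.h>
#include <unistd.h>

/*
 * Hypothetical helper: wait up to timeout_ms for events on an inotify fd.
 * Returns the number of bytes queued, 0 on timeout, or -1 on error.
 */
int pending_event_bytes(int fd, int timeout_ms)
{
	struct pollfd pfd = { .fd = fd, .events = POLLIN };
	int nbytes = 0;
	int ready;

	ready = poll(&pfd, 1, timeout_ms);
	if (ready < 0)
		return -1;
	if (ready == 0)
		return 0;	/* timed out, nothing queued */

	/* FIONREAD reports the size of the current event queue in bytes */
	if (ioctl(fd, FIONREAD, &nbytes) < 0)
		return -1;

	return nbytes;
}
```

The returned byte count can be used to size the buffer handed to read(2).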

All watches are destroyed and cleaned up on close.


(ii) Internal Kernel Implementation

Each open inotify instance is associated with an inotify_device structure.

Each watch is associated with an inotify_watch structure.  Watches are chained
off of each associated device and each associated inode.

See fs/inotify.c for the locking and lifetime rules.


(iii) Rationale

Q: What is the design decision behind not tying the watch to the open fd of
   the watched object?

A: Watches are associated with an open inotify device, not an open file.
   This solves the primary problem with dnotify: keeping the file open pins
   the file and thus, worse, pins the mount.  Dnotify is therefore infeasible
   for use on a desktop system with removable media as the media cannot be
   unmounted.

Q: What is the design decision behind using an fd-per-device as opposed to
   an fd-per-watch?

A: An fd-per-watch quickly consumes more file descriptors than are allowed,
   more fd's than are feasible to manage, and more fd's than are optimally
   select()-able.  Yes, root can bump the per-process fd limit and yes, users
   can use epoll, but requiring both is a silly and extraneous requirement.
   A watch consumes less memory than an open file; separating the number
   spaces is thus sensible.  The current design is what user-space developers
   want: Users initialize inotify, once, and add n watches, requiring but one fd
   and no twiddling with fd limits.  Initializing an inotify instance two
   thousand times is silly.  If we can implement user-space's preferences
   cleanly--and we can, the idr layer makes stuff like this trivial--then we
   should.

   There are other good arguments.  With a single fd, there is a single
   item to block on, which is mapped to a single queue of events.  The single
   fd returns all watch events and also any potential out-of-band data.  If
   every fd were a separate watch,

   - There would be no way to get event ordering.  Events on file foo and
     file bar would pop poll() on both fd's, but there would be no way to tell
     which happened first.  A single queue trivially gives you ordering.  Such
     ordering is crucial to existing applications such as Beagle.  Imagine
     "mv a b ; mv b a" events without ordering.

   - We'd have to maintain n fd's and n internal queues with state,
     versus just one.  It is a lot messier in the kernel.  A single, linear
     queue is the data structure that makes sense.

   - User-space developers prefer the current API.  The Beagle guys, for
     example, love it.  Trust me, I asked.  It is not a surprise: Who'd want
     to manage and block on 1000 fd's via select?

   - You'd have to manage the fd's, as an example: Call close() when you
     received a delete event.

   - No way to get out of band data.

   - 1024 is still too low.  ;-)

   When you talk about designing a file change notification system that
   scales to 1000s of directories, juggling 1000s of fd's just does not seem
   the right interface.  It is too heavy.

Q: Why the system call approach?

A: The poor user-space interface is the second biggest problem with dnotify.
   Signals are a terrible, terrible interface for file notification.  Or for
   anything, for that matter.  The ideal solution, from all perspectives, is a
   file descriptor-based one that allows basic file I/O and poll/select.
   Obtaining the fd and managing the watches could have been done either via a
   device file or a family of new system calls.  We decided to implement a
   family of system calls because that is the preferred approach for new kernel
   features and it meets our user interface requirements.

   Additionally, it _is_ possible to have more than one instance and
   juggle more than one queue and thus more than one associated fd.

diff --git a/arch/i386/kernel/syscall_table.S b/arch/i386/kernel/syscall_table.S
index 3db9a04aec6e..468500a7e894 100644
--- a/arch/i386/kernel/syscall_table.S
+++ b/arch/i386/kernel/syscall_table.S
@@ -291,3 +291,6 @@ ENTRY(sys_call_table)
 	.long sys_keyctl
 	.long sys_ioprio_set
 	.long sys_ioprio_get		/* 290 */
+	.long sys_inotify_init
+	.long sys_inotify_add_watch
+	.long sys_inotify_rm_watch
diff --git a/fs/Kconfig b/fs/Kconfig
index f93fd41b025d..5d0c4be43dba 100644
--- a/fs/Kconfig
+++ b/fs/Kconfig
@@ -359,6 +359,19 @@ config ROMFS_FS
 	  If you don't know whether you need it, then you don't need it:
 	  answer N.
 
+config INOTIFY
+	bool "Inotify file change notification support"
+	default y
+	---help---
+	  Say Y here to enable inotify support and the /dev/inotify character
+	  device.  Inotify is a file change notification system and a
+	  replacement for dnotify.  Inotify fixes numerous shortcomings in
+	  dnotify and introduces several new features.  It allows monitoring
+	  of both files and directories via a single open fd.  Multiple file
+	  events are supported.
+
+	  If unsure, say Y.
+
 config QUOTA
 	bool "Quota support"
 	help
diff --git a/fs/Makefile b/fs/Makefile
index 20edcf28bfd2..cf95eb894fd5 100644
--- a/fs/Makefile
+++ b/fs/Makefile
@@ -12,6 +12,7 @@ obj-y := open.o read_write.o file_table.o buffer.o bio.o super.o \
 		seq_file.o xattr.o libfs.o fs-writeback.o mpage.o direct-io.o \
 		ioprio.o
 
+obj-$(CONFIG_INOTIFY)		+= inotify.o
 obj-$(CONFIG_EPOLL)		+= eventpoll.o
 obj-$(CONFIG_COMPAT)		+= compat.o
 
diff --git a/fs/attr.c b/fs/attr.c
index c3c76fe78346..b1796fb9e524 100644
--- a/fs/attr.c
+++ b/fs/attr.c
@@ -10,7 +10,7 @@
 #include <linux/mm.h>
 #include <linux/string.h>
 #include <linux/smp_lock.h>
-#include <linux/dnotify.h>
+#include <linux/fsnotify.h>
 #include <linux/fcntl.h>
 #include <linux/quotaops.h>
 #include <linux/security.h>
@@ -107,31 +107,8 @@ int inode_setattr(struct inode * inode, struct iattr * attr)
 out:
 	return error;
 }
-
 EXPORT_SYMBOL(inode_setattr);
 
-int setattr_mask(unsigned int ia_valid)
-{
-	unsigned long dn_mask = 0;
-
-	if (ia_valid & ATTR_UID)
-		dn_mask |= DN_ATTRIB;
-	if (ia_valid & ATTR_GID)
-		dn_mask |= DN_ATTRIB;
-	if (ia_valid & ATTR_SIZE)
-		dn_mask |= DN_MODIFY;
-	/* both times implies a utime(s) call */
-	if ((ia_valid & (ATTR_ATIME|ATTR_MTIME)) == (ATTR_ATIME|ATTR_MTIME))
-		dn_mask |= DN_ATTRIB;
-	else if (ia_valid & ATTR_ATIME)
-		dn_mask |= DN_ACCESS;
-	else if (ia_valid & ATTR_MTIME)
-		dn_mask |= DN_MODIFY;
-	if (ia_valid & ATTR_MODE)
-		dn_mask |= DN_ATTRIB;
-	return dn_mask;
-}
-
 int notify_change(struct dentry * dentry, struct iattr * attr)
 {
 	struct inode *inode = dentry->d_inode;
@@ -197,11 +174,9 @@ int notify_change(struct dentry * dentry, struct iattr * attr)
 	if (ia_valid & ATTR_SIZE)
 		up_write(&dentry->d_inode->i_alloc_sem);
 
-	if (!error) {
-		unsigned long dn_mask = setattr_mask(ia_valid);
-		if (dn_mask)
-			dnotify_parent(dentry, dn_mask);
-	}
+	if (!error)
+		fsnotify_change(dentry, ia_valid);
+
 	return error;
 }
 
diff --git a/fs/compat.c b/fs/compat.c
index 728cd8365384..6b06b6bae35e 100644
--- a/fs/compat.c
+++ b/fs/compat.c
@@ -37,7 +37,7 @@
 #include <linux/ctype.h>
 #include <linux/module.h>
 #include <linux/dirent.h>
-#include <linux/dnotify.h>
+#include <linux/fsnotify.h>
 #include <linux/highuid.h>
 #include <linux/sunrpc/svc.h>
 #include <linux/nfsd/nfsd.h>
@@ -1307,9 +1307,13 @@ static ssize_t compat_do_readv_writev(int type, struct file *file,
 out:
 	if (iov != iovstack)
 		kfree(iov);
-	if ((ret + (type == READ)) > 0)
-		dnotify_parent(file->f_dentry,
-				(type == READ) ? DN_ACCESS : DN_MODIFY);
+	if ((ret + (type == READ)) > 0) {
+		struct dentry *dentry = file->f_dentry;
+		if (type == READ)
+			fsnotify_access(dentry);
+		else
+			fsnotify_modify(dentry);
+	}
 	return ret;
 }
 
diff --git a/fs/file_table.c b/fs/file_table.c
index fa7849fae134..1d3de78e6bc9 100644
--- a/fs/file_table.c
+++ b/fs/file_table.c
@@ -16,6 +16,7 @@
 #include <linux/eventpoll.h>
 #include <linux/mount.h>
 #include <linux/cdev.h>
+#include <linux/fsnotify.h>
 
 /* sysctl tunables... */
 struct files_stat_struct files_stat = {
@@ -126,6 +127,8 @@ void fastcall __fput(struct file *file)
 	struct inode *inode = dentry->d_inode;
 
 	might_sleep();
+
+	fsnotify_close(file);
 	/*
 	 * The function eventpoll_release() should be the first called
 	 * in the file cleanup chain.
diff --git a/fs/inode.c b/fs/inode.c
index 5bc97507eeaa..96364fae0844 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -21,6 +21,7 @@
 #include <linux/pagemap.h>
 #include <linux/cdev.h>
 #include <linux/bootmem.h>
+#include <linux/inotify.h>
 
 /*
  * This is needed for the following functions:
@@ -202,6 +203,10 @@ void inode_init_once(struct inode *inode)
 	INIT_LIST_HEAD(&inode->i_data.i_mmap_nonlinear);
 	spin_lock_init(&inode->i_lock);
 	i_size_ordered_init(inode);
+#ifdef CONFIG_INOTIFY
+	INIT_LIST_HEAD(&inode->inotify_watches);
+	sema_init(&inode->inotify_sem, 1);
+#endif
 }
 
 EXPORT_SYMBOL(inode_init_once);
@@ -351,6 +356,7 @@ int invalidate_inodes(struct super_block * sb)
 
 	down(&iprune_sem);
 	spin_lock(&inode_lock);
+	inotify_unmount_inodes(&sb->s_inodes);
 	busy = invalidate_list(&sb->s_inodes, &throw_away);
 	spin_unlock(&inode_lock);
 
diff --git a/fs/inotify.c b/fs/inotify.c
new file mode 100644
index 000000000000..e423bfe0c86f
--- /dev/null
+++ b/fs/inotify.c
@@ -0,0 +1,999 @@
/*
 * fs/inotify.c - inode-based file event notifications
 *
 * Authors:
 *	John McCutchan	<ttb@tentacle.dhs.org>
 *	Robert Love	<rml@novell.com>
 *
 * Copyright (C) 2005 John McCutchan
 *
 * This program is free software; you can redistribute it and/or modify it
 * under the terms of the GNU General Public License as published by the
 * Free Software Foundation; either version 2, or (at your option) any
 * later version.
 *
 * This program is distributed in the hope that it will be useful, but
 * WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
 * General Public License for more details.
 */

#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/sched.h>
#include <linux/spinlock.h>
#include <linux/idr.h>
#include <linux/slab.h>
#include <linux/fs.h>
#include <linux/file.h>
#include <linux/mount.h>
#include <linux/namei.h>
#include <linux/poll.h>
#include <linux/device.h>
#include <linux/miscdevice.h>
#include <linux/init.h>
#include <linux/list.h>
#include <linux/writeback.h>
#include <linux/inotify.h>

#include <asm/ioctls.h>

static atomic_t inotify_cookie;

static kmem_cache_t *watch_cachep;
static kmem_cache_t *event_cachep;

static struct vfsmount *inotify_mnt;

/* These are configurable via /proc/sys/inotify */
int inotify_max_user_devices;
int inotify_max_user_watches;
int inotify_max_queued_events;

/*
 * Lock ordering:
 *
 * dentry->d_lock (used to keep d_move() away from dentry->d_parent)
 * iprune_sem (synchronize shrink_icache_memory())
 * inode_lock (protects the super_block->s_inodes list)
 * inode->inotify_sem (protects inode->inotify_watches and watches->i_list)
 * inotify_dev->sem (protects inotify_device and watches->d_list)
 */

/*
 * Lifetimes of the three main data structures--inotify_device, inode, and
 * inotify_watch--are managed by reference count.
 *
 * inotify_device: Lifetime is from open until release.  Additional references
 * can bump the count via get_inotify_dev() and drop the count via
 * put_inotify_dev().
 *
 * inotify_watch: Lifetime is from create_watch() to destroy_watch().
 * Additional references can bump the count via get_inotify_watch() and drop
 * the count via put_inotify_watch().
 *
 * inode: Pinned so long as the inode is associated with a watch, from
 * create_watch() to put_inotify_watch().
 */

/*
 * struct inotify_device - represents an open instance of an inotify device
 *
 * This structure is protected by the semaphore 'sem'.
 */
struct inotify_device {
	wait_queue_head_t	wq;		/* wait queue for i/o */
	struct idr		idr;		/* idr mapping wd -> watch */
	struct semaphore	sem;		/* protects this bad boy */
	struct list_head	events;		/* list of queued events */
	struct list_head	watches;	/* list of watches */
	atomic_t		count;		/* reference count */
	struct user_struct	*user;		/* user who opened this dev */
	unsigned int		queue_size;	/* size of the queue (bytes) */
	unsigned int		event_count;	/* number of pending events */
	unsigned int		max_events;	/* maximum number of events */
};

/*
 * struct inotify_kernel_event - An inotify event, originating from a watch and
 * queued for user-space.  A list of these is attached to each instance of the
 * device.  In read(), this list is walked and all events that can fit in the
 * buffer are returned.
 *
 * Protected by dev->sem of the device in which we are queued.
 */
struct inotify_kernel_event {
	struct inotify_event	event;	/* the user-space event */
	struct list_head	list;	/* entry in inotify_device's list */
	char			*name;	/* filename, if any */
};

/*
 * struct inotify_watch - represents a watch request on a specific inode
 *
 * d_list is protected by dev->sem of the associated watch->dev.
 * i_list and mask are protected by inode->inotify_sem of the associated inode.
 * dev, inode, and wd are never written to once the watch is created.
 */
struct inotify_watch {
	struct list_head	d_list;	/* entry in inotify_device's list */
	struct list_head	i_list;	/* entry in inode's list */
	atomic_t		count;	/* reference count */
	struct inotify_device	*dev;	/* associated device */
	struct inode		*inode;	/* associated inode */
	s32			wd;	/* watch descriptor */
	u32			mask;	/* event mask for this watch */
};

static inline void get_inotify_dev(struct inotify_device *dev)
{
	atomic_inc(&dev->count);
}

static inline void put_inotify_dev(struct inotify_device *dev)
{
	if (atomic_dec_and_test(&dev->count)) {
		atomic_dec(&dev->user->inotify_devs);
		free_uid(dev->user);
		kfree(dev);
	}
}

static inline void get_inotify_watch(struct inotify_watch *watch)
{
	atomic_inc(&watch->count);
}

/*
 * put_inotify_watch - decrements the ref count on a given watch.  cleans up
 * the watch and its references if the count reaches zero.
 */
static inline void put_inotify_watch(struct inotify_watch *watch)
{
	if (atomic_dec_and_test(&watch->count)) {
		put_inotify_dev(watch->dev);
		iput(watch->inode);
		kmem_cache_free(watch_cachep, watch);
	}
}

/*
 * kernel_event - create a new kernel event with the given parameters
 *
 * This function can sleep.
 */
static struct inotify_kernel_event * kernel_event(s32 wd, u32 mask, u32 cookie,
						  const char *name)
{
	struct inotify_kernel_event *kevent;

	kevent = kmem_cache_alloc(event_cachep, GFP_KERNEL);
	if (unlikely(!kevent))
		return NULL;

	/* we hand this out to user-space, so zero it just in case */
	memset(&kevent->event, 0, sizeof(struct inotify_event));

	kevent->event.wd = wd;
	kevent->event.mask = mask;
	kevent->event.cookie = cookie;

	INIT_LIST_HEAD(&kevent->list);

	if (name) {
		size_t len, rem, event_size = sizeof(struct inotify_event);

		/*
		 * We need to pad the filename so as to properly align an
		 * array of inotify_event structures.  Because the structure is
		 * small and the common case is a small filename, we just round
		 * up to the next multiple of the structure's sizeof.  This is
		 * simple and safe for all architectures.
		 */
		len = strlen(name) + 1;
		rem = event_size - len;
		if (len > event_size) {
			rem = event_size - (len % event_size);
			if (len % event_size == 0)
				rem = 0;
		}

		kevent->name = kmalloc(len + rem, GFP_KERNEL);
		if (unlikely(!kevent->name)) {
			kmem_cache_free(event_cachep, kevent);
			return NULL;
		}
		memcpy(kevent->name, name, len);
		if (rem)
			memset(kevent->name + len, 0, rem);
		kevent->event.len = len + rem;
	} else {
		kevent->event.len = 0;
		kevent->name = NULL;
	}

	return kevent;
}

/*
 * inotify_dev_get_event - return the next event in the given dev's queue
 *
 * Caller must hold dev->sem.
 */
static inline struct inotify_kernel_event *
inotify_dev_get_event(struct inotify_device *dev)
{
	return list_entry(dev->events.next, struct inotify_kernel_event, list);
}

/*
 * inotify_dev_queue_event - add a new event to the given device
 *
 * Caller must hold dev->sem.  Can sleep (calls kernel_event()).
 */
static void inotify_dev_queue_event(struct inotify_device *dev,
				    struct inotify_watch *watch, u32 mask,
				    u32 cookie, const char *name)
{
	struct inotify_kernel_event *kevent, *last;

	/* coalescing: drop this event if it is a dupe of the previous */
	last = inotify_dev_get_event(dev);
	if (last && last->event.mask == mask && last->event.wd == watch->wd &&
			last->event.cookie == cookie) {
		const char *lastname = last->name;

		if (!name && !lastname)
			return;
		if (name && lastname && !strcmp(lastname, name))
			return;
	}

	/* the queue overflowed and we already sent the Q_OVERFLOW event */
	if (unlikely(dev->event_count > dev->max_events))
		return;

	/* if the queue overflows, we need to notify user space */
	if (unlikely(dev->event_count == dev->max_events))
		kevent = kernel_event(-1, IN_Q_OVERFLOW, cookie, NULL);
	else
		kevent = kernel_event(watch->wd, mask, cookie, name);

	if (unlikely(!kevent))
		return;

	/* queue the event and wake up anyone waiting */
	dev->event_count++;
	dev->queue_size += sizeof(struct inotify_event) + kevent->event.len;
	list_add_tail(&kevent->list, &dev->events);
	wake_up_interruptible(&dev->wq);
}

/*
 * remove_kevent - cleans up and ultimately frees the given kevent
 *
 * Caller must hold dev->sem.
 */
static void remove_kevent(struct inotify_device *dev,
			  struct inotify_kernel_event *kevent)
{
	list_del(&kevent->list);

	dev->event_count--;
	dev->queue_size -= sizeof(struct inotify_event) + kevent->event.len;

	kfree(kevent->name);
	kmem_cache_free(event_cachep, kevent);
}

/*
 * inotify_dev_event_dequeue - destroy an event on the given device
 *
 * Caller must hold dev->sem.
 */
static void inotify_dev_event_dequeue(struct inotify_device *dev)
{
	if (!list_empty(&dev->events)) {
		struct inotify_kernel_event *kevent;
		kevent = inotify_dev_get_event(dev);
		remove_kevent(dev, kevent);
	}
}

/*
 * inotify_dev_get_wd - returns the next WD for use by the given dev
 *
 * Callers must hold dev->sem.  This function can sleep.
 */
static int inotify_dev_get_wd(struct inotify_device *dev,
			      struct inotify_watch *watch)
{
	int ret;

	do {
		if (unlikely(!idr_pre_get(&dev->idr, GFP_KERNEL)))
			return -ENOSPC;
		ret = idr_get_new(&dev->idr, watch, &watch->wd);
	} while (ret == -EAGAIN);

	return ret;
}

/*
 * find_inode - resolve a user-given path to a specific inode and return a nd
 */
static int find_inode(const char __user *dirname, struct nameidata *nd)
{
	int error;

	error = __user_walk(dirname, LOOKUP_FOLLOW, nd);
	if (error)
		return error;
	/* you can only watch an inode if you have read permissions on it */
	error = permission(nd->dentry->d_inode, MAY_READ, NULL);
	if (error)
		path_release (nd);
	return error;
}

/*
 * create_watch - creates a watch on the given device.
 *
 * Callers must hold dev->sem.  Calls inotify_dev_get_wd() so may sleep.
 * Both 'dev' and 'inode' (by way of nameidata) need to be pinned.
 */
static struct inotify_watch *create_watch(struct inotify_device *dev,
					  u32 mask, struct inode *inode)
{
	struct inotify_watch *watch;
	int ret;

	if (atomic_read(&dev->user->inotify_watches) >= inotify_max_user_watches)
		return ERR_PTR(-ENOSPC);

	watch = kmem_cache_alloc(watch_cachep, GFP_KERNEL);
	if (unlikely(!watch))
		return ERR_PTR(-ENOMEM);

	ret = inotify_dev_get_wd(dev, watch);
	if (unlikely(ret)) {
		kmem_cache_free(watch_cachep, watch);
		return ERR_PTR(ret);
	}

	watch->mask = mask;
	atomic_set(&watch->count, 0);
	INIT_LIST_HEAD(&watch->d_list);
	INIT_LIST_HEAD(&watch->i_list);

	/* save a reference to device and bump the count to make it official */
	get_inotify_dev(dev);
	watch->dev = dev;

	/*
	 * Save a reference to the inode and bump the ref count to make it
	 * official.  We hold a reference to nameidata, which makes this safe.
	 */
	watch->inode = igrab(inode);

	/* bump our own count, corresponding to our entry in dev->watches */
	get_inotify_watch(watch);

	atomic_inc(&dev->user->inotify_watches);

	return watch;
}

/*
 * inode_find_dev - find the watch associated with the given inode and dev
 *
 * Callers must hold inode->inotify_sem.
 */
static struct inotify_watch *inode_find_dev(struct inode *inode,
					    struct inotify_device *dev)
{
	struct inotify_watch *watch;

	list_for_each_entry(watch, &inode->inotify_watches, i_list) {
		if (watch->dev == dev)
			return watch;
	}

	return NULL;
}

/*
 * remove_watch_no_event - remove_watch() without the IN_IGNORED event.
 */
static void remove_watch_no_event(struct inotify_watch *watch,
				  struct inotify_device *dev)
{
	list_del(&watch->i_list);
	list_del(&watch->d_list);

	atomic_dec(&dev->user->inotify_watches);
	idr_remove(&dev->idr, watch->wd);
	put_inotify_watch(watch);
}

/*
 * remove_watch - Remove a watch from both the device and the inode.  Sends
 * the IN_IGNORED event to the given device signifying that the inode is no
 * longer watched.
 *
 * Callers must hold both inode->inotify_sem and dev->sem.
 *
 * The watch's own reference is dropped via put_inotify_watch(), which
 * releases the device and inode references once the count reaches zero.
 */
static void remove_watch(struct inotify_watch *watch,struct inotify_device *dev)
{
	inotify_dev_queue_event(dev, watch, IN_IGNORED, 0, NULL);
	remove_watch_no_event(watch, dev);
}

/*
 * inotify_inode_watched - returns nonzero if there are watches on this inode
 * and zero otherwise.  We call this lockless, we do not care if we race.
 */
static inline int inotify_inode_watched(struct inode *inode)
{
	return !list_empty(&inode->inotify_watches);
}

/* Kernel API */

/**
 * inotify_inode_queue_event - queue an event to all watches on this inode
 * @inode: inode event is originating from
 * @mask: event mask describing this event
 * @cookie: cookie for synchronization, or zero
 * @name: filename, if any
 */
void inotify_inode_queue_event(struct inode *inode, u32 mask, u32 cookie,
			       const char *name)
{
	struct inotify_watch *watch, *next;

	if (!inotify_inode_watched(inode))
		return;

	down(&inode->inotify_sem);
	list_for_each_entry_safe(watch, next, &inode->inotify_watches, i_list) {
		u32 watch_mask = watch->mask;
		if (watch_mask & mask) {
			struct inotify_device *dev = watch->dev;
			get_inotify_watch(watch);
			down(&dev->sem);
			inotify_dev_queue_event(dev, watch, mask, cookie, name);
			if (watch_mask & IN_ONESHOT)
				remove_watch_no_event(watch, dev);
			up(&dev->sem);
			put_inotify_watch(watch);
		}
	}
	up(&inode->inotify_sem);
}
EXPORT_SYMBOL_GPL(inotify_inode_queue_event);

/**
 * inotify_dentry_parent_queue_event - queue an event to a dentry's parent
 * @dentry: the dentry in question, we queue against this dentry's parent
 * @mask: event mask describing this event
 * @cookie: cookie for synchronization, or zero
 * @name: filename, if any
 */
void inotify_dentry_parent_queue_event(struct dentry *dentry, u32 mask,
				       u32 cookie, const char *name)
{
	struct dentry *parent;
	struct inode *inode;

	spin_lock(&dentry->d_lock);
	parent = dentry->d_parent;
	inode = parent->d_inode;

	if (inotify_inode_watched(inode)) {
		dget(parent);
		spin_unlock(&dentry->d_lock);
		inotify_inode_queue_event(inode, mask, cookie, name);
		dput(parent);
	} else
		spin_unlock(&dentry->d_lock);
}
EXPORT_SYMBOL_GPL(inotify_dentry_parent_queue_event);

/**
 * inotify_get_cookie - return a unique cookie for use in synchronizing events.
 */
u32 inotify_get_cookie(void)
{
	return atomic_inc_return(&inotify_cookie);
}
EXPORT_SYMBOL_GPL(inotify_get_cookie);

/**
 * inotify_unmount_inodes - an sb is unmounting.  handle any watched inodes.
 * @list: list of inodes being unmounted (sb->s_inodes)
 *
 * Called with inode_lock held, protecting the unmounting super block's list
 * of inodes, and with iprune_sem held, keeping shrink_icache_memory() at bay.
 * We temporarily drop inode_lock, however, and CAN block.
 */
void inotify_unmount_inodes(struct list_head *list)
{
	struct inode *inode, *next_i, *need_iput = NULL;

	list_for_each_entry_safe(inode, next_i, list, i_sb_list) {
		struct inotify_watch *watch, *next_w;
		struct inode *need_iput_tmp;
		struct list_head *watches;

		/*
		 * If i_count is zero, the inode cannot have any watches and
		 * doing an __iget/iput with MS_ACTIVE clear would actually
		 * evict all inodes with zero i_count from icache which is
		 * unnecessarily violent and may in fact be illegal to do.
		 */
		if (!atomic_read(&inode->i_count))
			continue;

		/*
		 * We cannot __iget() an inode in state I_CLEAR, I_FREEING, or
		 * I_WILL_FREE which is fine because by that point the inode
		 * cannot have any associated watches.
		 */
		if (inode->i_state & (I_CLEAR | I_FREEING | I_WILL_FREE))
			continue;

		need_iput_tmp = need_iput;
		need_iput = NULL;
		/* In case the remove_watch() drops a reference. */
		if (inode != need_iput_tmp)
			__iget(inode);
		else
			need_iput_tmp = NULL;
		/* In case the dropping of a reference would nuke next_i. */
		if ((&next_i->i_sb_list != list) &&
				atomic_read(&next_i->i_count) &&
				!(next_i->i_state & (I_CLEAR | I_FREEING |
					I_WILL_FREE))) {
			__iget(next_i);
			need_iput = next_i;
		}

		/*
		 * We can safely drop inode_lock here because we hold
		 * references on both inode and next_i.  Also no new inodes
		 * will be added since the umount has begun.  Finally,
		 * iprune_sem keeps shrink_icache_memory() away.
		 */
		spin_unlock(&inode_lock);

		if (need_iput_tmp)
			iput(need_iput_tmp);

		/* for each watch, send IN_UNMOUNT and then remove it */
		down(&inode->inotify_sem);
		watches = &inode->inotify_watches;
		list_for_each_entry_safe(watch, next_w, watches, i_list) {
			struct inotify_device *dev = watch->dev;
			down(&dev->sem);
			inotify_dev_queue_event(dev, watch, IN_UNMOUNT,0,NULL);
			remove_watch(watch, dev);
			up(&dev->sem);
		}
		up(&inode->inotify_sem);
		iput(inode);

		spin_lock(&inode_lock);
591 }
592}
593EXPORT_SYMBOL_GPL(inotify_unmount_inodes);
594
595/**
596 * inotify_inode_is_dead - an inode has been deleted, cleanup any watches
597 * @inode: inode that is about to be removed
598 */
599void inotify_inode_is_dead(struct inode *inode)
600{
601 struct inotify_watch *watch, *next;
602
603 down(&inode->inotify_sem);
604 list_for_each_entry_safe(watch, next, &inode->inotify_watches, i_list) {
605 struct inotify_device *dev = watch->dev;
606 down(&dev->sem);
607 remove_watch(watch, dev);
608 up(&dev->sem);
609 }
610 up(&inode->inotify_sem);
611}
612EXPORT_SYMBOL_GPL(inotify_inode_is_dead);
613
614/* Device Interface */
615
616static unsigned int inotify_poll(struct file *file, poll_table *wait)
617{
618 struct inotify_device *dev = file->private_data;
619 int ret = 0;
620
621 poll_wait(file, &dev->wq, wait);
622 down(&dev->sem);
623 if (!list_empty(&dev->events))
624 ret = POLLIN | POLLRDNORM;
625 up(&dev->sem);
626
627 return ret;
628}
629
630static ssize_t inotify_read(struct file *file, char __user *buf,
631 size_t count, loff_t *pos)
632{
633 size_t event_size = sizeof (struct inotify_event);
634 struct inotify_device *dev;
635 char __user *start;
636 int ret;
637 DEFINE_WAIT(wait);
638
639 start = buf;
640 dev = file->private_data;
641
642 while (1) {
643 int events;
644
645 prepare_to_wait(&dev->wq, &wait, TASK_INTERRUPTIBLE);
646
647 down(&dev->sem);
648 events = !list_empty(&dev->events);
649 up(&dev->sem);
650 if (events) {
651 ret = 0;
652 break;
653 }
654
655 if (file->f_flags & O_NONBLOCK) {
656 ret = -EAGAIN;
657 break;
658 }
659
660 if (signal_pending(current)) {
661 ret = -EINTR;
662 break;
663 }
664
665 schedule();
666 }
667
668 finish_wait(&dev->wq, &wait);
669 if (ret)
670 return ret;
671
672 down(&dev->sem);
673 while (1) {
674 struct inotify_kernel_event *kevent;
675
676 ret = buf - start;
677 if (list_empty(&dev->events))
678 break;
679
680 kevent = inotify_dev_get_event(dev);
681 if (event_size + kevent->event.len > count)
682 break;
683
684 if (copy_to_user(buf, &kevent->event, event_size)) {
685 ret = -EFAULT;
686 break;
687 }
688 buf += event_size;
689 count -= event_size;
690
691 if (kevent->name) {
692 if (copy_to_user(buf, kevent->name, kevent->event.len)){
693 ret = -EFAULT;
694 break;
695 }
696 buf += kevent->event.len;
697 count -= kevent->event.len;
698 }
699
700 remove_kevent(dev, kevent);
701 }
702 up(&dev->sem);
703
704 return ret;
705}
706
707static int inotify_release(struct inode *ignored, struct file *file)
708{
709 struct inotify_device *dev = file->private_data;
710
711 /*
712 * Destroy all of the watches on this device. Unfortunately, not very
713 * pretty. We cannot do a simple iteration over the list, because we
714 * do not know the inode until we iterate to the watch. But we need to
715 * hold inode->inotify_sem before dev->sem. The following works.
716 */
717 while (1) {
718 struct inotify_watch *watch;
719 struct list_head *watches;
720 struct inode *inode;
721
722 down(&dev->sem);
723 watches = &dev->watches;
724 if (list_empty(watches)) {
725 up(&dev->sem);
726 break;
727 }
728 watch = list_entry(watches->next, struct inotify_watch, d_list);
729 get_inotify_watch(watch);
730 up(&dev->sem);
731
732 inode = watch->inode;
733 down(&inode->inotify_sem);
734 down(&dev->sem);
735 remove_watch_no_event(watch, dev);
736 up(&dev->sem);
737 up(&inode->inotify_sem);
738 put_inotify_watch(watch);
739 }
740
741 /* destroy all of the events on this device */
742 down(&dev->sem);
743 while (!list_empty(&dev->events))
744 inotify_dev_event_dequeue(dev);
745 up(&dev->sem);
746
747 /* free this device: the put matching the get in inotify_open() */
748 put_inotify_dev(dev);
749
750 return 0;
751}
752
753/*
754 * inotify_ignore - remove a given wd from the device, on behalf of
755 * sys_inotify_rm_watch().
756 *
757 * Can sleep.
758 */
759static int inotify_ignore(struct inotify_device *dev, s32 wd)
760{
761 struct inotify_watch *watch;
762 struct inode *inode;
763
764 down(&dev->sem);
765 watch = idr_find(&dev->idr, wd);
766 if (unlikely(!watch)) {
767 up(&dev->sem);
768 return -EINVAL;
769 }
770 get_inotify_watch(watch);
771 inode = watch->inode;
772 up(&dev->sem);
773
774 down(&inode->inotify_sem);
775 down(&dev->sem);
776
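777 /* make sure that we did not race: compare, do not overwrite watch, */
778 /* since put_inotify_watch() below still needs our referenced pointer */
779 if (likely(idr_find(&dev->idr, wd) == watch))
780 remove_watch(watch, dev);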
781
782 up(&dev->sem);
783 up(&inode->inotify_sem);
784 put_inotify_watch(watch);
785
786 return 0;
787}
788
789static long inotify_ioctl(struct file *file, unsigned int cmd,
790 unsigned long arg)
791{
792 struct inotify_device *dev;
793 void __user *p;
794 int ret = -ENOTTY;
795
796 dev = file->private_data;
797 p = (void __user *) arg;
798
799 switch (cmd) {
800 case FIONREAD:
801 ret = put_user(dev->queue_size, (int __user *) p);
802 break;
803 }
804
805 return ret;
806}
807
808static struct file_operations inotify_fops = {
809 .poll = inotify_poll,
810 .read = inotify_read,
811 .release = inotify_release,
812 .unlocked_ioctl = inotify_ioctl,
813 .compat_ioctl = inotify_ioctl,
814};
815
816asmlinkage long sys_inotify_init(void)
817{
818 struct inotify_device *dev;
819 struct user_struct *user;
820 int ret = -ENOTTY;
821 int fd;
822 struct file *filp;
823
824 fd = get_unused_fd();
825 if (fd < 0) {
826 ret = fd;
827 goto out;
828 }
829
830 filp = get_empty_filp();
831 if (!filp) {
832 put_unused_fd(fd);
833 ret = -ENFILE;
834 goto out;
835 }
836 filp->f_op = &inotify_fops;
837 filp->f_vfsmnt = mntget(inotify_mnt);
838 filp->f_dentry = dget(inotify_mnt->mnt_root);
839 filp->f_mapping = filp->f_dentry->d_inode->i_mapping;
840 filp->f_mode = FMODE_READ;
841 filp->f_flags = O_RDONLY;
842
843 user = get_uid(current->user);
844
845 if (unlikely(atomic_read(&user->inotify_devs) >= inotify_max_user_devices)) {
846 ret = -EMFILE;
847 goto out_err;
848 }
849
850 dev = kmalloc(sizeof(struct inotify_device), GFP_KERNEL);
851 if (unlikely(!dev)) {
852 ret = -ENOMEM;
853 goto out_err;
854 }
855
856 idr_init(&dev->idr);
857 INIT_LIST_HEAD(&dev->events);
858 INIT_LIST_HEAD(&dev->watches);
859 init_waitqueue_head(&dev->wq);
860 sema_init(&dev->sem, 1);
861 dev->event_count = 0;
862 dev->queue_size = 0;
863 dev->max_events = inotify_max_queued_events;
864 dev->user = user;
865 atomic_set(&dev->count, 0);
866
867 get_inotify_dev(dev);
868 atomic_inc(&user->inotify_devs);
869
870 filp->private_data = dev;
871 fd_install (fd, filp);
872 return fd;
873out_err:
874 put_unused_fd (fd);
875 put_filp (filp);
876 free_uid(user);
877out:
878 return ret;
879}
880
881asmlinkage long sys_inotify_add_watch(int fd, const char *path, u32 mask)
882{
883 struct inotify_watch *watch, *old;
884 struct inode *inode;
885 struct inotify_device *dev;
886 struct nameidata nd;
887 struct file *filp;
888 int ret;
889
890 filp = fget(fd);
891 if (!filp)
892 return -EBADF;
893
894 dev = filp->private_data;
895
896 ret = find_inode ((const char __user*)path, &nd);
897 if (ret)
898 goto fput_and_out;
899
900 /* Held in place by reference in nd */
901 inode = nd.dentry->d_inode;
902
903 down(&inode->inotify_sem);
904 down(&dev->sem);
905
906 /* don't let user-space set invalid bits: we don't want flags set */
907 mask &= IN_ALL_EVENTS;
908 if (!mask) {
909 ret = -EINVAL;
910 goto out;
911 }
912
913 /*
914 * Handle the case of re-adding a watch on an (inode,dev) pair that we
915 * are already watching. We just update the mask and return its wd.
916 */
917 old = inode_find_dev(inode, dev);
918 if (unlikely(old)) {
919 old->mask = mask;
920 ret = old->wd;
921 goto out;
922 }
923
924 watch = create_watch(dev, mask, inode);
925 if (unlikely(IS_ERR(watch))) {
926 ret = PTR_ERR(watch);
927 goto out;
928 }
929
930 /* Add the watch to the device's and the inode's list */
931 list_add(&watch->d_list, &dev->watches);
932 list_add(&watch->i_list, &inode->inotify_watches);
933 ret = watch->wd;
934out:
935 path_release (&nd);
936 up(&dev->sem);
937 up(&inode->inotify_sem);
938fput_and_out:
939 fput(filp);
940 return ret;
941}
942
943asmlinkage long sys_inotify_rm_watch(int fd, u32 wd)
944{
945 struct file *filp;
946 struct inotify_device *dev;
947 int ret;
948
949 filp = fget(fd);
950 if (!filp)
951 return -EBADF;
952 dev = filp->private_data;
953 ret = inotify_ignore (dev, wd);
954 fput(filp);
955 return ret;
956}
957
958static struct super_block *
959inotify_get_sb(struct file_system_type *fs_type, int flags,
960 const char *dev_name, void *data)
961{
962 return get_sb_pseudo(fs_type, "inotify", NULL, 0xBAD1DEA);
963}
964
965static struct file_system_type inotify_fs_type = {
966 .name = "inotifyfs",
967 .get_sb = inotify_get_sb,
968 .kill_sb = kill_anon_super,
969};
970
971/*
972 * inotify_init - Our initialization function. Note that we cannot return
973 * error because we have compiled-in VFS hooks. So an (unlikely) failure here
974 * must result in panic().
975 */
976static int __init inotify_init(void)
977{
978 register_filesystem(&inotify_fs_type);
979 inotify_mnt = kern_mount(&inotify_fs_type);
980
981 inotify_max_queued_events = 8192;
982 inotify_max_user_devices = 128;
983 inotify_max_user_watches = 8192;
984
985 atomic_set(&inotify_cookie, 0);
986
987 watch_cachep = kmem_cache_create("inotify_watch_cache",
988 sizeof(struct inotify_watch),
989 0, SLAB_PANIC, NULL, NULL);
990 event_cachep = kmem_cache_create("inotify_event_cache",
991 sizeof(struct inotify_kernel_event),
992 0, SLAB_PANIC, NULL, NULL);
993
994 printk(KERN_INFO "inotify syscall\n");
995
996 return 0;
997}
998
999module_init(inotify_init);
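
A note for readers of inotify_read() above: each record copied to user space is a fixed struct inotify_event header followed by event.len bytes of name. The following is a minimal user-space sketch of walking such a buffer, not code from this patch. The names ex_event_hdr, ex_event_fn and ex_walk_events are hypothetical, and the header layout is assumed to be the four 32-bit fields declared in include/linux/inotify.h.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local mirror of the fixed part of struct inotify_event (assumption:
 * four 32-bit fields, as declared in include/linux/inotify.h). */
struct ex_event_hdr {
	int32_t  wd;
	uint32_t mask;
	uint32_t cookie;
	uint32_t len;	/* bytes of name that follow, NUL padding included */
};

/* Hypothetical callback type, invoked once per decoded record. */
typedef void (*ex_event_fn)(const struct ex_event_hdr *hdr,
			    const char *name, void *ctx);

/*
 * Walk a buffer filled by read() on the inotify fd. Returns the number of
 * complete records decoded and stops at the first truncated record.
 */
static size_t ex_walk_events(const char *buf, size_t n,
			     ex_event_fn fn, void *ctx)
{
	size_t off = 0, count = 0;

	while (n - off >= sizeof(struct ex_event_hdr)) {
		struct ex_event_hdr hdr;

		/* memcpy avoids unaligned loads from the byte buffer */
		memcpy(&hdr, buf + off, sizeof(hdr));
		if (hdr.len > n - off - sizeof(hdr))
			break;	/* name would run past the data we have */

		if (fn)
			fn(&hdr, hdr.len ? buf + off + sizeof(hdr) : NULL, ctx);
		off += sizeof(hdr) + hdr.len;
		count++;
	}
	return count;
}
```

A caller would pass the byte count returned by read() as n. Because inotify_read() only copies whole records, a truncated record should not occur in practice; the bounds check is defensive.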
diff --git a/fs/namei.c b/fs/namei.c
index 1d93cb4f7c5f..02a824cd3c5c 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -21,7 +21,7 @@
 #include <linux/namei.h>
 #include <linux/quotaops.h>
 #include <linux/pagemap.h>
-#include <linux/dnotify.h>
+#include <linux/fsnotify.h>
 #include <linux/smp_lock.h>
 #include <linux/personality.h>
 #include <linux/security.h>
@@ -1312,7 +1312,7 @@ int vfs_create(struct inode *dir, struct dentry *dentry, int mode,
 	DQUOT_INIT(dir);
 	error = dir->i_op->create(dir, dentry, mode, nd);
 	if (!error) {
-		inode_dir_notify(dir, DN_CREATE);
+		fsnotify_create(dir, dentry->d_name.name);
 		security_inode_post_create(dir, dentry, mode);
 	}
 	return error;
@@ -1637,7 +1637,7 @@ int vfs_mknod(struct inode *dir, struct dentry *dentry, int mode, dev_t dev)
 	DQUOT_INIT(dir);
 	error = dir->i_op->mknod(dir, dentry, mode, dev);
 	if (!error) {
-		inode_dir_notify(dir, DN_CREATE);
+		fsnotify_create(dir, dentry->d_name.name);
 		security_inode_post_mknod(dir, dentry, mode, dev);
 	}
 	return error;
@@ -1710,7 +1710,7 @@ int vfs_mkdir(struct inode *dir, struct dentry *dentry, int mode)
 	DQUOT_INIT(dir);
 	error = dir->i_op->mkdir(dir, dentry, mode);
 	if (!error) {
-		inode_dir_notify(dir, DN_CREATE);
+		fsnotify_mkdir(dir, dentry->d_name.name);
 		security_inode_post_mkdir(dir,dentry, mode);
 	}
 	return error;
@@ -1801,7 +1801,7 @@ int vfs_rmdir(struct inode *dir, struct dentry *dentry)
 	}
 	up(&dentry->d_inode->i_sem);
 	if (!error) {
-		inode_dir_notify(dir, DN_DELETE);
+		fsnotify_rmdir(dentry, dentry->d_inode, dir);
 		d_delete(dentry);
 	}
 	dput(dentry);
@@ -1874,9 +1874,10 @@ int vfs_unlink(struct inode *dir, struct dentry *dentry)
 
 	/* We don't d_delete() NFS sillyrenamed files--they still exist. */
 	if (!error && !(dentry->d_flags & DCACHE_NFSFS_RENAMED)) {
+		fsnotify_unlink(dentry, dir);
 		d_delete(dentry);
-		inode_dir_notify(dir, DN_DELETE);
 	}
+
 	return error;
 }
 
@@ -1950,7 +1951,7 @@ int vfs_symlink(struct inode *dir, struct dentry *dentry, const char *oldname, i
 	DQUOT_INIT(dir);
 	error = dir->i_op->symlink(dir, dentry, oldname);
 	if (!error) {
-		inode_dir_notify(dir, DN_CREATE);
+		fsnotify_create(dir, dentry->d_name.name);
 		security_inode_post_symlink(dir, dentry, oldname);
 	}
 	return error;
@@ -2023,7 +2024,7 @@ int vfs_link(struct dentry *old_dentry, struct inode *dir, struct dentry *new_de
 	error = dir->i_op->link(old_dentry, dir, new_dentry);
 	up(&old_dentry->d_inode->i_sem);
 	if (!error) {
-		inode_dir_notify(dir, DN_CREATE);
+		fsnotify_create(dir, new_dentry->d_name.name);
 		security_inode_post_link(old_dentry, dir, new_dentry);
 	}
 	return error;
@@ -2187,6 +2188,7 @@ int vfs_rename(struct inode *old_dir, struct dentry *old_dentry,
 {
 	int error;
 	int is_dir = S_ISDIR(old_dentry->d_inode->i_mode);
+	const char *old_name;
 
 	if (old_dentry->d_inode == new_dentry->d_inode)
 		return 0;
@@ -2208,18 +2210,18 @@ int vfs_rename(struct inode *old_dir, struct dentry *old_dentry,
 	DQUOT_INIT(old_dir);
 	DQUOT_INIT(new_dir);
 
+	old_name = fsnotify_oldname_init(old_dentry->d_name.name);
+
 	if (is_dir)
 		error = vfs_rename_dir(old_dir,old_dentry,new_dir,new_dentry);
 	else
 		error = vfs_rename_other(old_dir,old_dentry,new_dir,new_dentry);
 	if (!error) {
-		if (old_dir == new_dir)
-			inode_dir_notify(old_dir, DN_RENAME);
-		else {
-			inode_dir_notify(old_dir, DN_DELETE);
-			inode_dir_notify(new_dir, DN_CREATE);
-		}
+		const char *new_name = old_dentry->d_name.name;
+		fsnotify_move(old_dir, new_dir, old_name, new_name, is_dir);
 	}
+	fsnotify_oldname_free(old_name);
+
 	return error;
 }
 
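
A note on the vfs_rename() hunk above: fsnotify_move(), defined in include/linux/fsnotify.h in this patch, queues IN_MOVED_FROM on the old directory and IN_MOVED_TO on the new one with the same cookie. The sketch below shows how a user-space consumer might pair the two halves. It is illustrative only: ex_event and ex_find_move_partner are hypothetical names, the EX_* bit values are placeholders for the real IN_* constants, and it assumes both directories are watched through the same inotify fd so that the MOVED_TO record follows the MOVED_FROM record.

```c
#include <stddef.h>
#include <stdint.h>

#define EX_MOVED_FROM 0x40u	/* placeholder bit standing in for IN_MOVED_FROM */
#define EX_MOVED_TO   0x80u	/* placeholder bit standing in for IN_MOVED_TO */

/* Hypothetical decoded event, reduced to the fields needed here. */
struct ex_event {
	uint32_t mask;
	uint32_t cookie;
	const char *name;
};

/*
 * Find the MOVED_TO partner of events[from] by cookie. Returns its index,
 * or -1 if events[from] is not a MOVED_FROM event or no partner is present
 * (for example, because the destination directory is not watched).
 */
static int ex_find_move_partner(const struct ex_event *events, size_t n,
				size_t from)
{
	size_t i;

	if (from >= n || !(events[from].mask & EX_MOVED_FROM))
		return -1;
	for (i = from + 1; i < n; i++) {
		if ((events[i].mask & EX_MOVED_TO) &&
		    events[i].cookie == events[from].cookie)
			return (int)i;
	}
	return -1;
}
```

An unmatched MOVED_FROM is usually best treated as a deletion from the watcher's point of view, since the file left the watched set.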
diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index 5e0bf3917607..4f2cd3d27566 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -45,7 +45,7 @@
 #endif /* CONFIG_NFSD_V3 */
 #include <linux/nfsd/nfsfh.h>
 #include <linux/quotaops.h>
-#include <linux/dnotify.h>
+#include <linux/fsnotify.h>
 #include <linux/posix_acl.h>
 #include <linux/posix_acl_xattr.h>
 #ifdef CONFIG_NFSD_V4
@@ -860,7 +860,7 @@ nfsd_vfs_read(struct svc_rqst *rqstp, struct svc_fh *fhp, struct file *file,
 		nfsdstats.io_read += err;
 		*count = err;
 		err = 0;
-		dnotify_parent(file->f_dentry, DN_ACCESS);
+		fsnotify_access(file->f_dentry);
 	} else
 		err = nfserrno(err);
 out:
@@ -916,7 +916,7 @@ nfsd_vfs_write(struct svc_rqst *rqstp, struct svc_fh *fhp, struct file *file,
 	set_fs(oldfs);
 	if (err >= 0) {
 		nfsdstats.io_write += cnt;
-		dnotify_parent(file->f_dentry, DN_MODIFY);
+		fsnotify_modify(file->f_dentry);
 	}
 
 	/* clear setuid/setgid flag after write */
diff --git a/fs/open.c b/fs/open.c
index 3f4a4286fdc4..32bf05e2996d 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -10,7 +10,7 @@
 #include <linux/file.h>
 #include <linux/smp_lock.h>
 #include <linux/quotaops.h>
-#include <linux/dnotify.h>
+#include <linux/fsnotify.h>
 #include <linux/module.h>
 #include <linux/slab.h>
 #include <linux/tty.h>
@@ -951,6 +951,7 @@ asmlinkage long sys_open(const char __user * filename, int flags, int mode)
 				put_unused_fd(fd);
 				fd = PTR_ERR(f);
 			} else {
+				fsnotify_open(f->f_dentry);
 				fd_install(fd, f);
 			}
 		}
diff --git a/fs/read_write.c b/fs/read_write.c
index 9292f5fa4d62..563abd09b5c8 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -10,7 +10,7 @@
 #include <linux/file.h>
 #include <linux/uio.h>
 #include <linux/smp_lock.h>
-#include <linux/dnotify.h>
+#include <linux/fsnotify.h>
 #include <linux/security.h>
 #include <linux/module.h>
 #include <linux/syscalls.h>
@@ -252,7 +252,7 @@ ssize_t vfs_read(struct file *file, char __user *buf, size_t count, loff_t *pos)
 			else
 				ret = do_sync_read(file, buf, count, pos);
 			if (ret > 0) {
-				dnotify_parent(file->f_dentry, DN_ACCESS);
+				fsnotify_access(file->f_dentry);
 				current->rchar += ret;
 			}
 			current->syscr++;
@@ -303,7 +303,7 @@ ssize_t vfs_write(struct file *file, const char __user *buf, size_t count, loff_
 			else
 				ret = do_sync_write(file, buf, count, pos);
 			if (ret > 0) {
-				dnotify_parent(file->f_dentry, DN_MODIFY);
+				fsnotify_modify(file->f_dentry);
 				current->wchar += ret;
 			}
 			current->syscw++;
@@ -539,9 +539,12 @@ static ssize_t do_readv_writev(int type, struct file *file,
 out:
 	if (iov != iovstack)
 		kfree(iov);
-	if ((ret + (type == READ)) > 0)
-		dnotify_parent(file->f_dentry,
-				(type == READ) ? DN_ACCESS : DN_MODIFY);
+	if ((ret + (type == READ)) > 0) {
+		if (type == READ)
+			fsnotify_access(file->f_dentry);
+		else
+			fsnotify_modify(file->f_dentry);
+	}
 	return ret;
 Efault:
 	ret = -EFAULT;
diff --git a/fs/sysfs/file.c b/fs/sysfs/file.c
index d72c1ce48559..335288b9be0f 100644
--- a/fs/sysfs/file.c
+++ b/fs/sysfs/file.c
@@ -3,7 +3,7 @@
  */
 
 #include <linux/module.h>
-#include <linux/dnotify.h>
+#include <linux/fsnotify.h>
 #include <linux/kobject.h>
 #include <linux/namei.h>
 #include <asm/uaccess.h>
@@ -391,9 +391,6 @@ int sysfs_create_file(struct kobject * kobj, const struct attribute * attr)
  * sysfs_update_file - update the modified timestamp on an object attribute.
  * @kobj: object we're acting for.
  * @attr: attribute descriptor.
- *
- * Also call dnotify for the dentry, which lots of userspace programs
- * use.
  */
 int sysfs_update_file(struct kobject * kobj, const struct attribute * attr)
 {
@@ -408,7 +405,7 @@ int sysfs_update_file(struct kobject * kobj, const struct attribute * attr)
 		if (victim->d_inode &&
 		    (victim->d_parent->d_inode == dir->d_inode)) {
 			victim->d_inode->i_mtime = CURRENT_TIME;
-			dnotify_parent(victim, DN_MODIFY);
+			fsnotify_modify(victim);
 
 			/**
 			 * Drop reference from initial sysfs_get_dentry().
diff --git a/fs/xattr.c b/fs/xattr.c
index 93dee70a1dbe..6acd5c63da91 100644
--- a/fs/xattr.c
+++ b/fs/xattr.c
@@ -16,6 +16,7 @@
 #include <linux/security.h>
 #include <linux/syscalls.h>
 #include <linux/module.h>
+#include <linux/fsnotify.h>
 #include <asm/uaccess.h>
 
 /*
@@ -57,8 +58,10 @@ setxattr(struct dentry *d, char __user *name, void __user *value,
 		if (error)
 			goto out;
 		error = d->d_inode->i_op->setxattr(d, kname, kvalue, size, flags);
-		if (!error)
+		if (!error) {
+			fsnotify_xattr(d);
 			security_inode_post_setxattr(d, kname, kvalue, size, flags);
+		}
 out:
 		up(&d->d_inode->i_sem);
 	}
diff --git a/include/asm-i386/unistd.h b/include/asm-i386/unistd.h
index e25e4c71a879..a7cb377745bf 100644
--- a/include/asm-i386/unistd.h
+++ b/include/asm-i386/unistd.h
@@ -296,8 +296,11 @@
 #define __NR_keyctl 288
 #define __NR_ioprio_set 289
 #define __NR_ioprio_get 290
+#define __NR_inotify_init 291
+#define __NR_inotify_add_watch 292
+#define __NR_inotify_rm_watch 293
 
-#define NR_syscalls 291
+#define NR_syscalls 294
 
 /*
  * user-visible error numbers are in the range -1 - -128: see
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 302ec20838ca..c9bf3746a9fb 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -474,6 +474,11 @@ struct inode {
 	struct dnotify_struct *i_dnotify; /* for directory notifications */
 #endif
 
+#ifdef CONFIG_INOTIFY
+	struct list_head inotify_watches; /* watches on this inode */
+	struct semaphore inotify_sem; /* protects the watches list */
+#endif
+
 	unsigned long i_state;
 	unsigned long dirtied_when; /* jiffies of first dirtying */
 
@@ -1393,7 +1398,6 @@ extern void emergency_remount(void);
 extern int do_remount_sb(struct super_block *sb, int flags,
 			 void *data, int force);
 extern sector_t bmap(struct inode *, sector_t);
-extern int setattr_mask(unsigned int);
 extern int notify_change(struct dentry *, struct iattr *);
 extern int permission(struct inode *, int, struct nameidata *);
 extern int generic_permission(struct inode *, int,
diff --git a/include/linux/fsnotify.h b/include/linux/fsnotify.h
new file mode 100644
index 000000000000..eb581b6cfca9
--- /dev/null
+++ b/include/linux/fsnotify.h
@@ -0,0 +1,248 @@
1#ifndef _LINUX_FS_NOTIFY_H
2#define _LINUX_FS_NOTIFY_H
3
4/*
5 * include/linux/fsnotify.h - generic hooks for filesystem notification, to
6 * reduce in-source duplication from both dnotify and inotify.
7 *
8 * We don't compile any of this away in some complicated menagerie of ifdefs.
9 * Instead, we rely on the code inside to optimize away as needed.
10 *
11 * (C) Copyright 2005 Robert Love
12 */
13
14#ifdef __KERNEL__
15
16#include <linux/dnotify.h>
17#include <linux/inotify.h>
18
19/*
20 * fsnotify_move - file old_name at old_dir was moved to new_name at new_dir
21 */
22static inline void fsnotify_move(struct inode *old_dir, struct inode *new_dir,
23 const char *old_name, const char *new_name,
24 int isdir)
25{
26 u32 cookie = inotify_get_cookie();
27
28 if (old_dir == new_dir)
29 inode_dir_notify(old_dir, DN_RENAME);
30 else {
31 inode_dir_notify(old_dir, DN_DELETE);
32 inode_dir_notify(new_dir, DN_CREATE);
33 }
34
35 if (isdir)
36 isdir = IN_ISDIR;
37 inotify_inode_queue_event(old_dir, IN_MOVED_FROM|isdir,cookie,old_name);
38 inotify_inode_queue_event(new_dir, IN_MOVED_TO|isdir, cookie, new_name);
39}
40
41/*
42 * fsnotify_unlink - file was unlinked
43 */
44static inline void fsnotify_unlink(struct dentry *dentry, struct inode *dir)
45{
46 struct inode *inode = dentry->d_inode;
47
48 inode_dir_notify(dir, DN_DELETE);
49 inotify_inode_queue_event(dir, IN_DELETE, 0, dentry->d_name.name);
50 inotify_inode_queue_event(inode, IN_DELETE_SELF, 0, NULL);
51
52 inotify_inode_is_dead(inode);
53}
54
55/*
56 * fsnotify_rmdir - directory was removed
57 */
58static inline void fsnotify_rmdir(struct dentry *dentry, struct inode *inode,
59 struct inode *dir)
60{
61 inode_dir_notify(dir, DN_DELETE);
62 inotify_inode_queue_event(dir,IN_DELETE|IN_ISDIR,0,dentry->d_name.name);
63 inotify_inode_queue_event(inode, IN_DELETE_SELF | IN_ISDIR, 0, NULL);
64 inotify_inode_is_dead(inode);
65}
66
67/*
68 * fsnotify_create - 'name' was linked in
69 */
70static inline void fsnotify_create(struct inode *inode, const char *name)
71{
72 inode_dir_notify(inode, DN_CREATE);
73 inotify_inode_queue_event(inode, IN_CREATE, 0, name);
74}
75
76/*
77 * fsnotify_mkdir - directory 'name' was created
78 */
79static inline void fsnotify_mkdir(struct inode *inode, const char *name)
80{
81 inode_dir_notify(inode, DN_CREATE);
82 inotify_inode_queue_event(inode, IN_CREATE | IN_ISDIR, 0, name);
83}
84
85/*
86 * fsnotify_access - file was read
87 */
88static inline void fsnotify_access(struct dentry *dentry)
89{
90 struct inode *inode = dentry->d_inode;
91 u32 mask = IN_ACCESS;
92
93 if (S_ISDIR(inode->i_mode))
94 mask |= IN_ISDIR;
95
96 dnotify_parent(dentry, DN_ACCESS);
97 inotify_dentry_parent_queue_event(dentry, mask, 0, dentry->d_name.name);
98 inotify_inode_queue_event(inode, mask, 0, NULL);
99}
100
101/*
102 * fsnotify_modify - file was modified
103 */
104static inline void fsnotify_modify(struct dentry *dentry)
105{
106 struct inode *inode = dentry->d_inode;
107 u32 mask = IN_MODIFY;
108
109 if (S_ISDIR(inode->i_mode))
110 mask |= IN_ISDIR;
111
112 dnotify_parent(dentry, DN_MODIFY);
113 inotify_dentry_parent_queue_event(dentry, mask, 0, dentry->d_name.name);
114 inotify_inode_queue_event(inode, mask, 0, NULL);
115}
116
117/*
118 * fsnotify_open - file was opened
119 */
120static inline void fsnotify_open(struct dentry *dentry)
121{
122 struct inode *inode = dentry->d_inode;
123 u32 mask = IN_OPEN;
124
125 if (S_ISDIR(inode->i_mode))
126 mask |= IN_ISDIR;
127
128 inotify_inode_queue_event(inode, mask, 0, NULL);
129 inotify_dentry_parent_queue_event(dentry, mask, 0, dentry->d_name.name);
130}
131
132/*
133 * fsnotify_close - file was closed
134 */
135static inline void fsnotify_close(struct file *file)
136{
137 struct dentry *dentry = file->f_dentry;
138 struct inode *inode = dentry->d_inode;
139 const char *name = dentry->d_name.name;
140 mode_t mode = file->f_mode;
141 u32 mask = (mode & FMODE_WRITE) ? IN_CLOSE_WRITE : IN_CLOSE_NOWRITE;
142
143 if (S_ISDIR(inode->i_mode))
144 mask |= IN_ISDIR;
145
146 inotify_dentry_parent_queue_event(dentry, mask, 0, name);
147 inotify_inode_queue_event(inode, mask, 0, NULL);
148}
149
150/*
151 * fsnotify_xattr - extended attributes were changed
152 */
153static inline void fsnotify_xattr(struct dentry *dentry)
154{
155 struct inode *inode = dentry->d_inode;
156 u32 mask = IN_ATTRIB;
157
158 if (S_ISDIR(inode->i_mode))
159 mask |= IN_ISDIR;
160
161 inotify_dentry_parent_queue_event(dentry, mask, 0, dentry->d_name.name);
162 inotify_inode_queue_event(inode, mask, 0, NULL);
163}
164
165/*
166 * fsnotify_change - notify_change event. file was modified and/or metadata
167 * was changed.
168 */
169static inline void fsnotify_change(struct dentry *dentry, unsigned int ia_valid)
170{
171 struct inode *inode = dentry->d_inode;
172 int dn_mask = 0;
173 u32 in_mask = 0;
174
175 if (ia_valid & ATTR_UID) {
176 in_mask |= IN_ATTRIB;
177 dn_mask |= DN_ATTRIB;
178 }
179 if (ia_valid & ATTR_GID) {
180 in_mask |= IN_ATTRIB;
181 dn_mask |= DN_ATTRIB;
182 }
183 if (ia_valid & ATTR_SIZE) {
184 in_mask |= IN_MODIFY;
185 dn_mask |= DN_MODIFY;
186 }
187 /* both times being set implies a utime(s) call */
188 if ((ia_valid & (ATTR_ATIME | ATTR_MTIME)) == (ATTR_ATIME | ATTR_MTIME))
189 {
190 in_mask |= IN_ATTRIB;
191 dn_mask |= DN_ATTRIB;
192 } else if (ia_valid & ATTR_ATIME) {
193 in_mask |= IN_ACCESS;
194 dn_mask |= DN_ACCESS;
195 } else if (ia_valid & ATTR_MTIME) {
196 in_mask |= IN_MODIFY;
197 dn_mask |= DN_MODIFY;
198 }
199 if (ia_valid & ATTR_MODE) {
200 in_mask |= IN_ATTRIB;
201 dn_mask |= DN_ATTRIB;
202 }
203
204 if (dn_mask)
205 dnotify_parent(dentry, dn_mask);
206 if (in_mask) {
207 if (S_ISDIR(inode->i_mode))
208 in_mask |= IN_ISDIR;
209 inotify_inode_queue_event(inode, in_mask, 0, NULL);
210 inotify_dentry_parent_queue_event(dentry, in_mask, 0,
211 dentry->d_name.name);
212 }
213}
214
215#ifdef CONFIG_INOTIFY /* inotify helpers */
216
217/*
218 * fsnotify_oldname_init - save off the old filename before we change it
219 */
220static inline const char *fsnotify_oldname_init(const char *name)
221{
222 return kstrdup(name, GFP_KERNEL);
223}
224
225/*
226 * fsnotify_oldname_free - free the name we got from fsnotify_oldname_init
227 */
228static inline void fsnotify_oldname_free(const char *old_name)
229{
230 kfree(old_name);
231}
232
233#else /* CONFIG_INOTIFY */
234
235static inline const char *fsnotify_oldname_init(const char *name)
236{
237 return NULL;
238}
239
240static inline void fsnotify_oldname_free(const char *old_name)
241{
242}
243
244#endif /* ! CONFIG_INOTIFY */
245
246#endif /* __KERNEL__ */
247
248#endif /* _LINUX_FS_NOTIFY_H */
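
The attribute-to-event mapping in fsnotify_change() above can be sketched as a pure function that is easy to exercise outside the kernel. This is a minimal illustration, not the kernel code: the `ATTR_*` values are local stand-ins assumed for the example, `attr_to_inotify_mask` is a hypothetical name, and the dnotify mask and the `IN_ISDIR` handling are left out.

```c
#include <stdint.h>

/* Local stand-ins for the kernel's ATTR_* bits (assumed values, illustration only). */
#define ATTR_MODE  (1u << 0)
#define ATTR_UID   (1u << 1)
#define ATTR_GID   (1u << 2)
#define ATTR_SIZE  (1u << 3)
#define ATTR_ATIME (1u << 4)
#define ATTR_MTIME (1u << 5)

/* Event bits as defined in include/linux/inotify.h by this patch. */
#define IN_ACCESS 0x00000001
#define IN_MODIFY 0x00000002
#define IN_ATTRIB 0x00000004

/* Hypothetical helper: mirrors the inotify half of fsnotify_change(). */
static uint32_t attr_to_inotify_mask(unsigned int ia_valid)
{
    const unsigned int both_times = ATTR_ATIME | ATTR_MTIME;
    uint32_t mask = 0;

    /* Ownership and mode changes are metadata changes. */
    if (ia_valid & (ATTR_UID | ATTR_GID | ATTR_MODE))
        mask |= IN_ATTRIB;
    /* A size change (truncate) modifies the file contents. */
    if (ia_valid & ATTR_SIZE)
        mask |= IN_MODIFY;
    /* Both timestamps together imply a utime(s) call: metadata only. */
    if ((ia_valid & both_times) == both_times)
        mask |= IN_ATTRIB;
    else if (ia_valid & ATTR_ATIME)
        mask |= IN_ACCESS;
    else if (ia_valid & ATTR_MTIME)
        mask |= IN_MODIFY;

    return mask;
}
```

The timestamp branch is the only one that is order-sensitive: testing for both times first keeps a utime(s) call from being reported as an access or a modification.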
diff --git a/include/linux/inotify.h b/include/linux/inotify.h
new file mode 100644
index 000000000000..a40c2bf0408e
--- /dev/null
+++ b/include/linux/inotify.h
@@ -0,0 +1,108 @@
1/*
2 * Inode based directory notification for Linux
3 *
4 * Copyright (C) 2005 John McCutchan
5 */
6
7#ifndef _LINUX_INOTIFY_H
8#define _LINUX_INOTIFY_H
9
10#include <linux/types.h>
11
12/*
13 * struct inotify_event - structure read from the inotify device for each event
14 *
15 * When you are watching a directory, you will receive the filename for events
16 * such as IN_CREATE, IN_DELETE, IN_OPEN, IN_CLOSE, ..., relative to the wd.
17 */
18struct inotify_event {
19 __s32 wd; /* watch descriptor */
20 __u32 mask; /* watch mask */
21 __u32 cookie; /* cookie to synchronize two events */
22 __u32 len; /* length (including nulls) of name */
23 char name[0]; /* stub for possible name */
24};
25
26/* the following are legal, implemented events that user-space can watch for */
27#define IN_ACCESS 0x00000001 /* File was accessed */
28#define IN_MODIFY 0x00000002 /* File was modified */
29#define IN_ATTRIB 0x00000004 /* Metadata changed */
30#define IN_CLOSE_WRITE 0x00000008 /* Writable file was closed */
31#define IN_CLOSE_NOWRITE 0x00000010 /* Unwritable file was closed */
32#define IN_OPEN 0x00000020 /* File was opened */
33#define IN_MOVED_FROM 0x00000040 /* File was moved from X */
34#define IN_MOVED_TO 0x00000080 /* File was moved to Y */
35#define IN_CREATE 0x00000100 /* Subfile was created */
36#define IN_DELETE 0x00000200 /* Subfile was deleted */
37#define IN_DELETE_SELF 0x00000400 /* Self was deleted */
38
39/* the following are legal events. they are sent as needed to any watch */
40#define IN_UNMOUNT 0x00002000 /* Backing fs was unmounted */
41#define IN_Q_OVERFLOW 0x00004000 /* Event queue overflowed */
42#define IN_IGNORED 0x00008000 /* File was ignored */
43
44/* helper events */
45#define IN_CLOSE (IN_CLOSE_WRITE | IN_CLOSE_NOWRITE) /* close */
46#define IN_MOVE (IN_MOVED_FROM | IN_MOVED_TO) /* moves */
47
48/* special flags */
49#define IN_ISDIR 0x40000000 /* event occurred against dir */
50#define IN_ONESHOT 0x80000000 /* only send event once */
51
52/*
53 * All of the events - we build the list by hand so that we can add flags in
54 * the future and not break backward compatibility. Apps will get only the
55 * events that they originally wanted. Be sure to add new events here!
56 */
57#define IN_ALL_EVENTS (IN_ACCESS | IN_MODIFY | IN_ATTRIB | IN_CLOSE_WRITE | \
58 IN_CLOSE_NOWRITE | IN_OPEN | IN_MOVED_FROM | \
59 IN_MOVED_TO | IN_DELETE | IN_CREATE | IN_DELETE_SELF)
60
61#ifdef __KERNEL__
62
63#include <linux/dcache.h>
64#include <linux/fs.h>
65#include <linux/config.h>
66
67#ifdef CONFIG_INOTIFY
68
69extern void inotify_inode_queue_event(struct inode *, __u32, __u32,
70 const char *);
71extern void inotify_dentry_parent_queue_event(struct dentry *, __u32, __u32,
72 const char *);
73extern void inotify_unmount_inodes(struct list_head *);
74extern void inotify_inode_is_dead(struct inode *);
75extern u32 inotify_get_cookie(void);
76
77#else
78
79static inline void inotify_inode_queue_event(struct inode *inode,
80 __u32 mask, __u32 cookie,
81 const char *filename)
82{
83}
84
85static inline void inotify_dentry_parent_queue_event(struct dentry *dentry,
86 __u32 mask, __u32 cookie,
87 const char *filename)
88{
89}
90
91static inline void inotify_unmount_inodes(struct list_head *list)
92{
93}
94
95static inline void inotify_inode_is_dead(struct inode *inode)
96{
97}
98
99static inline u32 inotify_get_cookie(void)
100{
101 return 0;
102}
103
104#endif /* CONFIG_INOTIFY */
105
106#endif /* __KERNEL__ */
107
108#endif /* _LINUX_INOTIFY_H */
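
Because struct inotify_event ends in a variable-length name, a buffer returned by read() holds records of differing sizes, each `sizeof(struct inotify_event) + len` bytes long. The sketch below walks such a buffer from user space. It assumes the userspace header `<sys/inotify.h>` is available (it declares the same structure and `IN_*` bits), and `count_matching_events` is a hypothetical helper name, not part of the patch.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/inotify.h>

/*
 * Hypothetical helper: count the records in buf whose mask intersects
 * `wanted`. buf is assumed to hold `len` bytes read from an inotify fd.
 */
static size_t count_matching_events(const char *buf, size_t len, uint32_t wanted)
{
    size_t off = 0;
    size_t hits = 0;

    while (off + sizeof(struct inotify_event) <= len) {
        struct inotify_event ev;
        size_t rec;

        /* Copy the fixed-size header out to avoid unaligned access. */
        memcpy(&ev, buf + off, sizeof(ev));

        /* The name (ev.len bytes, including padding nulls) follows the header. */
        rec = sizeof(ev) + ev.len;
        if (rec > len - off)
            break; /* truncated record: stop rather than read past the end */

        if (ev.mask & wanted)
            hits++;
        off += rec;
    }
    return hits;
}
```

A caller would typically fill `buf` with a single read() on the descriptor returned by the inotify system call, then pass the byte count that read() reported as `len`.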
diff --git a/include/linux/sched.h b/include/linux/sched.h
index ff48815bd3a2..dec5827c7742 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -410,6 +410,10 @@ struct user_struct {
 	atomic_t processes; /* How many processes does this user have? */
 	atomic_t files; /* How many open files does this user have? */
 	atomic_t sigpending; /* How many pending signals does this user have? */
+#ifdef CONFIG_INOTIFY
+	atomic_t inotify_watches; /* How many inotify watches does this user have? */
+	atomic_t inotify_devs; /* How many inotify devs does this user have opened? */
+#endif
 	/* protected by mq_lock */
 	unsigned long mq_bytes; /* How many bytes can be allocated to mqueue? */
 	unsigned long locked_shm; /* How many pages of mlocked shm ? */
diff --git a/include/linux/sysctl.h b/include/linux/sysctl.h
index 5b5f434ac9a0..ce19a2aa0b21 100644
--- a/include/linux/sysctl.h
+++ b/include/linux/sysctl.h
@@ -61,7 +61,8 @@ enum
 	CTL_DEV=7, /* Devices */
 	CTL_BUS=8, /* Busses */
 	CTL_ABI=9, /* Binary emulation */
-	CTL_CPU=10 /* CPU stuff (speed scaling, etc) */
+	CTL_CPU=10, /* CPU stuff (speed scaling, etc) */
+	CTL_INOTIFY=11 /* Inotify */
 };
 
 /* CTL_BUS names: */
@@ -70,6 +71,14 @@ enum
 	CTL_BUS_ISA=1 /* ISA */
 };
 
+/* CTL_INOTIFY names: */
+enum
+{
+	INOTIFY_MAX_USER_DEVICES=1, /* max number of inotify device instances per user */
+	INOTIFY_MAX_USER_WATCHES=2, /* max number of inotify watches per user */
+	INOTIFY_MAX_QUEUED_EVENTS=3 /* Max number of queued events per inotify device instance */
+};
+
 /* CTL_KERN names: */
 enum
 {
diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c
index 29196ce9b40f..42b40ae5eada 100644
--- a/kernel/sys_ni.c
+++ b/kernel/sys_ni.c
@@ -80,6 +80,9 @@ cond_syscall(sys_keyctl);
 cond_syscall(compat_sys_keyctl);
 cond_syscall(compat_sys_socketcall);
 cond_syscall(sys_set_zone_reclaim);
+cond_syscall(sys_inotify_init);
+cond_syscall(sys_inotify_add_watch);
+cond_syscall(sys_inotify_rm_watch);
 
 /* arch-specific weak syscall entries */
 cond_syscall(sys_pciconfig_read);
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 270ee7fadbd8..b240e2cb86fc 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -67,6 +67,12 @@ extern int printk_ratelimit_jiffies;
 extern int printk_ratelimit_burst;
 extern int pid_max_min, pid_max_max;
 
+#ifdef CONFIG_INOTIFY
+extern int inotify_max_user_devices;
+extern int inotify_max_user_watches;
+extern int inotify_max_queued_events;
+#endif
+
 #if defined(CONFIG_X86_LOCAL_APIC) && defined(CONFIG_X86)
 int unknown_nmi_panic;
 extern int proc_unknown_nmi_panic(ctl_table *, int, struct file *,
@@ -218,6 +224,7 @@ static ctl_table root_table[] = {
 		.mode = 0555,
 		.child = dev_table,
 	},
+
 	{ .ctl_name = 0 }
 };
 
@@ -959,6 +966,40 @@ static ctl_table fs_table[] = {
 		.mode = 0644,
 		.proc_handler = &proc_dointvec,
 	},
+#ifdef CONFIG_INOTIFY
+	{
+		.ctl_name = INOTIFY_MAX_USER_DEVICES,
+		.procname = "max_user_devices",
+		.data = &inotify_max_user_devices,
+		.maxlen = sizeof(int),
+		.mode = 0644,
+		.proc_handler = &proc_dointvec_minmax,
+		.strategy = &sysctl_intvec,
+		.extra1 = &zero,
+	},
+
+	{
+		.ctl_name = INOTIFY_MAX_USER_WATCHES,
+		.procname = "max_user_watches",
+		.data = &inotify_max_user_watches,
+		.maxlen = sizeof(int),
+		.mode = 0644,
+		.proc_handler = &proc_dointvec_minmax,
+		.strategy = &sysctl_intvec,
+		.extra1 = &zero,
+	},
+
+	{
+		.ctl_name = INOTIFY_MAX_QUEUED_EVENTS,
+		.procname = "max_queued_events",
+		.data = &inotify_max_queued_events,
+		.maxlen = sizeof(int),
+		.mode = 0644,
+		.proc_handler = &proc_dointvec_minmax,
+		.strategy = &sysctl_intvec,
+		.extra1 = &zero
+	},
+#endif
 	{ .ctl_name = 0 }
 };
 
@@ -968,7 +1009,7 @@ static ctl_table debug_table[] = {
968 1009
969static ctl_table dev_table[] = { 1010static ctl_table dev_table[] = {
970 { .ctl_name = 0 } 1011 { .ctl_name = 0 }
971}; 1012};
972 1013
973extern void init_irq_proc (void); 1014extern void init_irq_proc (void);
974 1015
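
The three sysctl entries added above are plain integers handled by proc_dointvec_minmax with a lower bound of zero, so user space can read them as text. The following is a minimal sketch of such a reader. The helper name `read_int_knob` is hypothetical, and the location of the files under /proc/sys is an assumption left to the caller, since the patch only fixes the procname strings ("max_user_devices", "max_user_watches", "max_queued_events").

```c
#include <stdio.h>

/*
 * Hypothetical helper: read one integer tunable from a text file.
 * Returns 0 and stores the value in *out on success, or -1 if the
 * file cannot be opened or does not start with an integer.
 */
static int read_int_knob(const char *path, int *out)
{
    FILE *f = fopen(path, "r");
    int ok;

    if (!f)
        return -1;
    ok = (fscanf(f, "%d", out) == 1);
    fclose(f);
    return ok ? 0 : -1;
}
```

A watcher that expects to place many watches could call this at startup with the path of the max_user_watches file on the running system and warn when the limit looks too low for its workload.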
diff --git a/kernel/user.c b/kernel/user.c
index 734575d55769..89e562feb1b1 100644
--- a/kernel/user.c
+++ b/kernel/user.c
@@ -120,6 +120,10 @@ struct user_struct * alloc_uid(uid_t uid)
 		atomic_set(&new->processes, 0);
 		atomic_set(&new->files, 0);
 		atomic_set(&new->sigpending, 0);
+#ifdef CONFIG_INOTIFY
+		atomic_set(&new->inotify_watches, 0);
+		atomic_set(&new->inotify_devs, 0);
+#endif
 
 		new->mq_bytes = 0;
 		new->locked_shm = 0;