litmus-rt.git - The LITMUS^RT kernel.

	Commit message (Collapse)	Author	Age
*	fsnotify: make fasync generic for both inotify and fanotify	Eric Paris	2012-12-11
\| \| \| \| \| \| \| \|	inotify is supposed to support async signal notification when information is available on the inotify fd. This patch moves that support to generic fsnotify functions so it can be used by all notification mechanisms. Signed-off-by: Eric Paris <eparis@redhat.com>
*	fsnotify: change locking order	Lino Sanfilippo	2012-12-11
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	On Mon, Aug 01, 2011 at 04:38:22PM -0400, Eric Paris wrote: > > I finally built and tested a v3.0 kernel with these patches (I know I'm > SOOOOOO far behind). Not what I hoped for: > > > [ 150.937798] VFS: Busy inodes after unmount of tmpfs. Self-destruct in 5 seconds. Have a nice day... > > [ 150.945290] BUG: unable to handle kernel NULL pointer dereference at 0000000000000070 > > [ 150.946012] IP: [<ffffffff810ffd58>] shmem_free_inode+0x18/0x50 > > [ 150.946012] PGD 2bf9e067 PUD 2bf9f067 PMD 0 > > [ 150.946012] Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC > > [ 150.946012] CPU 0 > > [ 150.946012] Modules linked in: nfs lockd fscache auth_rpcgss nfs_acl sunrpc ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables ext4 jbd2 crc16 joydev ata_piix i2c_piix4 pcspkr uinput ipv6 autofs4 usbhid [last unloaded: scsi_wait_scan] > > [ 150.946012] > > [ 150.946012] Pid: 2764, comm: syscall_thrash Not tainted 3.0.0+ #1 Red Hat KVM > > [ 150.946012] RIP: 0010:[<ffffffff810ffd58>] [<ffffffff810ffd58>] shmem_free_inode+0x18/0x50 > > [ 150.946012] RSP: 0018:ffff88002c2e5df8 EFLAGS: 00010282 > > [ 150.946012] RAX: 000000004e370d9f RBX: 0000000000000000 RCX: ffff88003a029438 > > [ 150.946012] RDX: 0000000033630a5f RSI: 0000000000000000 RDI: ffff88003491c240 > > [ 150.946012] RBP: ffff88002c2e5e08 R08: 0000000000000000 R09: 0000000000000000 > > [ 150.946012] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88003a029428 > > [ 150.946012] R13: ffff88003a029428 R14: ffff88003a029428 R15: ffff88003499a610 > > [ 150.946012] FS: 00007f5a05420700(0000) GS:ffff88003f600000(0000) knlGS:0000000000000000 > > [ 150.946012] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b > > [ 150.946012] CR2: 0000000000000070 CR3: 000000002a662000 CR4: 00000000000006f0 > > [ 150.946012] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > > [ 150.946012] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 > > [ 150.946012] Process syscall_thrash (pid: 2764, threadinfo ffff88002c2e4000, task ffff88002bfbc760) > > [ 150.946012] Stack: > > [ 150.946012] ffff88003a029438 ffff88003a029428 ffff88002c2e5e38 ffffffff81102f76 > > [ 150.946012] ffff88003a029438 ffff88003a029598 ffffffff8160f9c0 ffff88002c221250 > > [ 150.946012] ffff88002c2e5e68 ffffffff8115e9be ffff88002c2e5e68 ffff88003a029438 > > [ 150.946012] Call Trace: > > [ 150.946012] [<ffffffff81102f76>] shmem_evict_inode+0x76/0x130 > > [ 150.946012] [<ffffffff8115e9be>] evict+0x7e/0x170 > > [ 150.946012] [<ffffffff8115ee40>] iput_final+0xd0/0x190 > > [ 150.946012] [<ffffffff8115ef33>] iput+0x33/0x40 > > [ 150.946012] [<ffffffff81180205>] fsnotify_destroy_mark_locked+0x145/0x160 > > [ 150.946012] [<ffffffff81180316>] fsnotify_destroy_mark+0x36/0x50 > > [ 150.946012] [<ffffffff81181937>] sys_inotify_rm_watch+0x77/0xd0 > > [ 150.946012] [<ffffffff815aca52>] system_call_fastpath+0x16/0x1b > > [ 150.946012] Code: 67 4a 00 b8 e4 ff ff ff eb aa 66 0f 1f 84 00 00 00 00 00 55 48 89 e5 48 83 ec 10 48 89 1c 24 4c 89 64 24 08 48 8b 9f 40 05 00 00 > > [ 150.946012] 83 7b 70 00 74 1c 4c 8d a3 80 00 00 00 4c 89 e7 e8 d2 5d 4a > > [ 150.946012] RIP [<ffffffff810ffd58>] shmem_free_inode+0x18/0x50 > > [ 150.946012] RSP <ffff88002c2e5df8> > > [ 150.946012] CR2: 0000000000000070 > > Looks at aweful lot like the problem from: > http://www.spinics.net/lists/linux-fsdevel/msg46101.html > I tried to reproduce this bug with your test program, but without success. However, if I understand correctly, this occurs since we dont hold any locks when we call iput() in mark_destroy(), right? With the patches you tested, iput() is also not called within any lock, since the groups mark_mutex is released temporarily before iput() is called. This is, since the original codes behaviour is similar. However since we now have a mutex as the biggest lock, we can do what you suggested (http://www.spinics.net/lists/linux-fsdevel/msg46107.html) and call iput() with the mutex held to avoid the race. The patch below implements this. It uses nested locking to avoid deadlock in case we do the final iput() on an inode which still holds marks and thus would take the mutex again when calling fsnotify_inode_delete() in destroy_inode(). Signed-off-by: Lino Sanfilippo <LinoSanfilippo@gmx.de> Signed-off-by: Eric Paris <eparis@redhat.com>
*	fsnotify: dont put marks on temporary list when clearing marks by group	Lino Sanfilippo	2012-12-11
\| \| \| \| \| \| \| \| \| \| \|	In clear_marks_by_group_flags() the mark list of a group is iterated and the marks are put on a temporary list. Since we introduced fsnotify_destroy_mark_locked() we dont need the temp list any more and are able to remove the marks while the mark list is iterated and the mark list mutex is held. Signed-off-by: Lino Sanfilippo <LinoSanfilippo@gmx.de> Signed-off-by: Eric Paris <eparis@redhat.com>
*	fsnotify: introduce locked versions of fsnotify_add_mark() and ↵	Lino Sanfilippo	2012-12-11
\| \| \| \| \| \| \| \| \| \| \|	fsnotify_remove_mark() This patch introduces fsnotify_add_mark_locked() and fsnotify_remove_mark_locked() which are essentially the same as fsnotify_add_mark() and fsnotify_remove_mark() but assume that the caller has already taken the groups mark mutex. Signed-off-by: Lino Sanfilippo <LinoSanfilippo@gmx.de> Signed-off-by: Eric Paris <eparis@redhat.com>
*	fsnotify: pass group to fsnotify_destroy_mark()	Lino Sanfilippo	2012-12-11
\| \| \| \| \| \| \| \|	In fsnotify_destroy_mark() dont get the group from the passed mark anymore, but pass the group itself as an additional parameter to the function. Signed-off-by: Lino Sanfilippo <LinoSanfilippo@gmx.de> Signed-off-by: Eric Paris <eparis@redhat.com>
*	fsnotify: use a mutex instead of a spinlock to protect a groups mark list	Lino Sanfilippo	2012-12-11
\| \| \| \| \| \| \| \| \|	Replaces the groups mark_lock spinlock with a mutex. Using a mutex instead of a spinlock results in more flexibility (i.e it allows to sleep while the lock is held). Signed-off-by: Lino Sanfilippo <LinoSanfilippo@gmx.de> Signed-off-by: Eric Paris <eparis@redhat.com>
*	fanotify: add an extra flag to mark_remove_from_mask that indicates wheather ↵	Lino Sanfilippo	2012-12-11
\| \| \| \| \| \| \| \| \| \| \| \|	a mark should be destroyed This patch adds an extra flag to mark_remove_from_mask() to inform the caller if the mark should be destroyed. With this we dont destroy the mark implicitly in the function itself any more but let the caller handle it. Signed-off-by: Lino Sanfilippo <LinoSanfilippo@gmx.de> Signed-off-by: Eric Paris <eparis@redhat.com>
*	fsnotify: take groups mark_lock before mark lock	Lino Sanfilippo	2012-12-11
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Race-free addition and removal of a mark to a groups mark list would be easier if we could lock the mark list of group before we lock the specific mark. This patch changes the order used to add/remove marks to/from mark lists from 1. mark->lock 2. group->mark_lock 3. inode->i_lock to 1. group->mark_lock 2. mark->lock 3. inode->i_lock Signed-off-by: Lino Sanfilippo <LinoSanfilippo@gmx.de> Signed-off-by: Eric Paris <eparis@redhat.com>
*	fsnotify: use reference counting for groups	Lino Sanfilippo	2012-12-11
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Get a group ref for each mark that is added to the groups list and release that ref when the mark is freed in fsnotify_put_mark(). We also use get a group reference for duplicated marks and for private event data. Now we dont free a group any more when the number of marks becomes 0 but when the groups ref count does. Since this will only happen when all marks are removed from a groups mark list, we dont have to set the groups number of marks to 1 at group creation. Beside clearing all marks in fsnotify_destroy_group() we do also flush the groups event queue. This is since events may hold references to groups (due to private event data) and we have to put those references first before we get a chance to put the final ref, which will result in a call to fsnotify_final_destroy_group(). Signed-off-by: Lino Sanfilippo <LinoSanfilippo@gmx.de> Signed-off-by: Eric Paris <eparis@redhat.com>
*	fsnotify: introduce fsnotify_get_group()	Lino Sanfilippo	2012-12-11
\| \| \| \| \| \| \|	Introduce fsnotify_get_group() which increments the reference counter of a group. Signed-off-by: Lino Sanfilippo <LinoSanfilippo@gmx.de> Signed-off-by: Eric Paris <eparis@redhat.com>
*	inotify, fanotify: replace fsnotify_put_group() with fsnotify_destroy_group()	Lino Sanfilippo	2012-12-11
\| \| \| \| \| \| \| \| \| \| \|	Currently in fsnotify_put_group() the ref count of a group is decremented and if it becomes 0 fsnotify_destroy_group() is called. Since a groups ref count is only at group creation set to 1 and never increased after that a call to fsnotify_put_group() always results in a call to fsnotify_destroy_group(). With this patch fsnotify_destroy_group() is called directly. Signed-off-by: Lino Sanfilippo <LinoSanfilippo@gmx.de> Signed-off-by: Eric Paris <eparis@redhat.com>
*	switch dentry_open() to struct path, make it grab references itself	Al Viro	2012-07-22
\| \| \| \|	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
*	vfs: switch i_dentry/d_alias to hlist	Al Viro	2012-07-14
\| \| \| \|	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
*	fsnotify: remove unused parameter from send_to_group()	Dan Carpenter	2012-05-30
\| \| \| \| \| \| \| \| \| \|	We don't use "mnt" anymore in send_to_group() after 1968f5eed5 ("fanotify: use both marks when possible") was applied. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
*	fs/notify/notification.c: make subsys_initcall function static	H Hartley Sweeten	2012-03-23
\| \| \| \| \| \| \| \| \|	Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Mike Frysinger <vapier@gentoo.org> Cc: Arun Sharma <asharma@fb.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	fsnotify: don't BUG in fsnotify_destroy_mark()	Miklos Szeredi	2012-01-14
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Removing the parent of a watched file results in "kernel BUG at fs/notify/mark.c:139". To reproduce add "-w /tmp/audit/dir/watched_file" to audit.rules rm -rf /tmp/audit/dir This is caused by fsnotify_destroy_mark() being called without an extra reference taken by the caller. Reported by Francesco Cosoleto here: https://bugzilla.novell.com/show_bug.cgi?id=689860 Fix by removing the BUG_ON and adding a comment about not accessing mark after the iput. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> CC: stable@vger.kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	vfs: move fsnotify junk to struct mount	Al Viro	2012-01-03
\| \| \| \|	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
*	atomic: use <linux/atomic.h>	Arun Sharma	2011-07-26
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	This allows us to move duplicated code in <asm/atomic.h> (atomic_inc_not_zero() for now) to <linux/atomic.h> Signed-off-by: Arun Sharma <asharma@fb.com> Reviewed-by: Eric Dumazet <eric.dumazet@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: David Miller <davem@davemloft.net> Cc: Eric Dumazet <eric.dumazet@gmail.com> Acked-by: Mike Frysinger <vapier@gentoo.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	Merge branch 'for-linus2' of git://git.profusion.mobi/users/lucas/linux-2.6	Linus Torvalds	2011-04-07
\|\ \| \| \| \| \| \| \| \|	* 'for-linus2' of git://git.profusion.mobi/users/lucas/linux-2.6: Fix common misspellings
\| *	Fix common misspellings	Lucas De Marchi	2011-03-31
\| \| \| \| \| \| \| \| \| \| \| \|	Fixes generated by 'codespell' and manually reviewed. Signed-off-by: Lucas De Marchi <lucas.demarchi@profusion.mobi>
* \|	inotify: fix double free/corruption of stuct user	Eric Paris	2011-04-05
\|/ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	On an error path in inotify_init1 a normal user can trigger a double free of struct user. This is a regression introduced by a2ae4cc9a16e ("inotify: stop kernel memory leak on file creation failure"). We fix this by making sure that if a group exists the user reference is dropped when the group is cleaned up. We should not explictly drop the reference on error and also drop the reference when the group is cleaned up. The new lifetime rules are that an inotify group lives from inotify_new_group to the last fsnotify_put_group. Since the struct user and inotify_devs are directly tied to this lifetime they are only changed/updated in those two locations. We get rid of all special casing of struct user or user->inotify_devs. Signed-off-by: Eric Paris <eparis@redhat.com> Cc: stable@kernel.org (2.6.37 and up) Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	fs: rename inode_lock to inode_hash_lock	Dave Chinner	2011-03-24
\| \| \| \| \| \| \| \| \|	All that remains of the inode_lock is protecting the inode hash list manipulation and traversals. Rename the inode_lock to inode_hash_lock to reflect it's actual function. Signed-off-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
*	fs: move i_sb_list out from under inode_lock	Dave Chinner	2011-03-24
\| \| \| \| \| \| \| \| \| \| \|	Protect the per-sb inode list with a new global lock inode_sb_list_lock and use it to protect the list manipulations and traversals. This lock replaces the inode_lock as the inodes on the list can be validity checked while holding the inode->i_lock and hence the inode_lock is no longer needed to protect the list. Signed-off-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
*	fs: protect inode->i_state with inode->i_lock	Dave Chinner	2011-03-24
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Protect inode state transitions and validity checks with the inode->i_lock. This enables us to make inode state transitions independently of the inode_lock and is the first step to peeling away the inode_lock from the code. This requires that __iget() is done atomically with i_state checks during list traversals so that we don't race with another thread marking the inode I_FREEING between the state check and grabbing the reference. Also remove the unlock_new_inode() memory barrier optimisation required to avoid taking the inode_lock when clearing I_NEW. Simplify the code by simply taking the inode->i_lock around the state change and wakeup. Because the wakeup is no longer tricky, remove the wake_up_inode() function and open code the wakeup where necessary. Signed-off-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
*	Remove one to many n's in a word	Justin P. Mattock	2011-03-01
\| \| \| \| \|	Signed-off-by: Justin P. Mattock <justinmattock@gmail.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
*	Merge branch 'for-next' of ↵	Linus Torvalds	2011-01-13
\|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial * 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (43 commits) Documentation/trace/events.txt: Remove obsolete sched_signal_send. writeback: fix global_dirty_limits comment runtime -> real-time ppc: fix comment typo singal -> signal drivers: fix comment typo diable -> disable. m68k: fix comment typo diable -> disable. wireless: comment typo fix diable -> disable. media: comment typo fix diable -> disable. remove doc for obsolete dynamic-printk kernel-parameter remove extraneous 'is' from Documentation/iostats.txt Fix spelling milisec -> ms in snd_ps3 module parameter description Fix spelling mistakes in comments Revert conflicting V4L changes i7core_edac: fix typos in comments mm/rmap.c: fix comment sound, ca0106: Fix assignment to 'channel'. hrtimer: fix a typo in comment init/Kconfig: fix typo anon_inodes: fix wrong function name in comment fix comment typos concerning "consistent" poll: fix a typo in comment ... Fix up trivial conflicts in: - drivers/net/wireless/iwlwifi/iwl-core.c (moved to iwl-legacy.c) - fs/ext4/ext4.h Also fix missed 'diabled' typo in drivers/net/bnx2x/bnx2x.h while at it.
\| *	Merge branch 'master' into for-next	Jiri Kosina	2010-12-22
\| \|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Conflicts: MAINTAINERS arch/arm/mach-omap2/pm24xx.c drivers/scsi/bfa/bfa_fcpim.c Needed to update to apply fixes for which the old branch was too outdated.
\| * \|	Kconfig: typo: and -> an	Michael Witten	2010-11-01
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Signed-off-by: Michael Witten <mfwitten@gmail.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
* \| \|	fs: dcache per-inode inode alias locking	Nick Piggin	2011-01-07
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	dcache_inode_lock can be replaced with per-inode locking. Use existing inode->i_lock for this. This is slightly non-trivial because we sometimes need to find the inode from the dentry, which requires d_inode to be stabilised (either with refcount or d_lock). Signed-off-by: Nick Piggin <npiggin@kernel.dk>
* \| \|	fs: dcache remove dcache_lock	Nick Piggin	2011-01-07
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	dcache_lock no longer protects anything. remove it. Signed-off-by: Nick Piggin <npiggin@kernel.dk>
* \| \|	fs: scale inode alias list	Nick Piggin	2011-01-07
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Add a new lock, dcache_inode_lock, to protect the inode's i_dentry list from concurrent modification. d_alias is also protected by d_lock. Signed-off-by: Nick Piggin <npiggin@kernel.dk>
* \| \|	fs: dcache scale subdirs	Nick Piggin	2011-01-07
\| \|/ \|/\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Protect d_subdirs and d_child with d_lock, except in filesystems that aren't using dcache_lock for these anyway (eg. using i_mutex). Note: if we change the locking rule in future so that ->d_child protection is provided only with ->d_parent->d_lock, it may allow us to reduce some locking. But it would be an exception to an otherwise regular locking scheme, so we'd have to see some good results. Probably not worthwhile. Signed-off-by: Nick Piggin <npiggin@kernel.dk>
* \|	fanotify: fill in the metadata_len field on struct fanotify_event_metadata	Eric Paris	2010-12-15
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The fanotify_event_metadata now has a field which is supposed to indicate the length of the metadata portion of the event. Fill in that field as well. Based-in-part-on-patch-by: Alexey Zaytsev <alexey.zaytsev@gmail.com> Signed-off-by: Eric Paris <eparis@redhat.com>
* \|	fanotify: Dont try to open a file descriptor for the overflow event	Lino Sanfilippo	2010-12-07
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We should not try to open a file descriptor for the overflow event since this will always fail. Signed-off-by: Lino Sanfilippo <LinoSanfilippo@gmx.de> Signed-off-by: Eric Paris <eparis@redhat.com>
* \|	fanotify: do not leak user reference on allocation failure	Eric Paris	2010-12-07
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	If fanotify_init is unable to allocate a new fsnotify group it will return but will not drop its reference on the associated user struct. Drop that reference on error. Reported-by: Vegard Nossum <vegard.nossum@gmail.com> Signed-off-by: Eric Paris <eparis@redhat.com>
* \|	inotify: stop kernel memory leak on file creation failure	Eric Paris	2010-12-07
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	If inotify_init is unable to allocate a new file for the new inotify group we leak the new group. This patch drops the reference on the group on file allocation failure. Reported-by: Vegard Nossum <vegard.nossum@gmail.com> cc: stable@kernel.org Signed-off-by: Eric Paris <eparis@redhat.com>
* \|	fanotify: on group destroy allow all waiters to bypass permission check	Lino Sanfilippo	2010-12-07
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When fanotify_release() is called, there may still be processes waiting for access permission. Currently only processes for which an event has already been queued into the groups access list will be woken up. Processes for which no event has been queued will continue to sleep and thus cause a deadlock when fsnotify_put_group() is called. Furthermore there is a race allowing further processes to be waiting on the access wait queue after wake_up (if they arrive before clear_marks_by_group() is called). This patch corrects this by setting a flag to inform processes that the group is about to be destroyed and thus not to wait for access permission. [additional changelog from eparis] Lets think about the 4 relevant code paths from the PoV of the 'operator' 'listener' 'responder' and 'closer'. Where operator is the process doing an action (like open/read) which could require permission. Listener is the task (or in this case thread) slated with reading from the fanotify file descriptor. The 'responder' is the thread responsible for responding to access requests. 'Closer' is the thread attempting to close the fanotify file descriptor. The 'operator' is going to end up in: fanotify_handle_event() get_response_from_access() (THIS BLOCKS WAITING ON USERSPACE) The 'listener' interesting code path fanotify_read() copy_event_to_user() prepare_for_access_response() (THIS CREATES AN fanotify_response_event) The 'responder' code path: fanotify_write() process_access_response() (REMOVE A fanotify_response_event, SET RESPONSE, WAKE UP 'operator') The 'closer': fanotify_release() (SUPPOSED TO CLEAN UP THE REST OF THIS MESS) What we have today is that in the closer we remove all of the fanotify_response_events and set a bit so no more response events are ever created in prepare_for_access_response(). The bug is that we never wake all of the operators up and tell them to move along. You fix that in fanotify_get_response_from_access(). You also fix other operators which haven't gotten there yet. So I agree that's a good fix. [/additional changelog from eparis] [remove additional changes to minimize patch size] [move initialization so it was inside CONFIG_FANOTIFY_PERMISSION] Signed-off-by: Lino Sanfilippo <LinoSanfilippo@gmx.de> Signed-off-by: Eric Paris <eparis@redhat.com>
* \|	fanotify: Dont allow a mask of 0 if setting or removing a mark	Lino Sanfilippo	2010-12-07
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In mark_remove_from_mask() we destroy marks that have their event mask cleared. Thus we should not allow the creation of those marks in the first place. With this patch we check if the mask given from user is 0 in case of FAN_MARK_ADD. If so we return an error. Same for FAN_MARK_REMOVE since this does not have any effect. Signed-off-by: Lino Sanfilippo <LinoSanfilippo@gmx.de> Signed-off-by: Eric Paris <eparis@redhat.com>
* \|	fanotify: correct broken ref counting in case adding a mark failed	Lino Sanfilippo	2010-12-07
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	If adding a mount or inode mark failed fanotify_free_mark() is called explicitly. But at this time the mark has already been put into the destroy list of the fsnotify_mark kernel thread. If the thread is too slow it will try to decrease the reference of a mark, that has already been freed by fanotify_free_mark(). (If its fast enough it will only decrease the marks ref counter from 2 to 1 - note that the counter has been increased to 2 in add_mark() - which has practically no effect.) This patch fixes the ref counting by not calling free_mark() explicitly, but decreasing the ref counter and rely on the fsnotify_mark thread to cleanup in case adding the mark has failed. Signed-off-by: Lino Sanfilippo <LinoSanfilippo@gmx.de> Signed-off-by: Eric Paris <eparis@redhat.com>
* \|	fanotify: deny permissions when no event was sent	Eric Paris	2010-12-07
\|/ \| \| \| \| \| \| \| \|	If no event was sent to userspace we cannot expect userspace to respond to permissions requests. Today such requests just hang forever. This patch will deny any permissions event which was unable to be sent to userspace. Reported-by: Tvrtko Ursulin <tvrtko.ursulin@sophos.com> Signed-off-by: Eric Paris <eparis@redhat.com>
*	make fanotify_read() restartable across signals	Lino Sanfilippo	2010-10-30
\| \| \| \| \| \| \|	In fanotify_read() return -ERESTARTSYS instead of -EINTR to make read() restartable across signals (BSD semantic). Signed-off-by: Eric Paris <eparis@redhat.com>
*	fs/notify/fanotify/fanotify_user.c: fix warnings	Andrew Morton	2010-10-28
\| \| \| \| \| \| \| \| \| \| \| \|	fs/notify/fanotify/fanotify_user.c: In function 'fanotify_release': fs/notify/fanotify/fanotify_user.c:375: warning: unused variable 'lre' fs/notify/fanotify/fanotify_user.c:375: warning: unused variable 're' this is really ugly. Cc: Eric Paris <eparis@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Eric Paris <eparis@redhat.com>
*	fanotify: do not recalculate the mask if the ignored mask changed	Eric Paris	2010-10-28
\| \| \| \| \| \| \| \|	If fanotify sets a new bit in the ignored mask it will cause the generic fsnotify layer to recalculate the real mask. This is stupid since we didn't change that part. Signed-off-by: Eric Paris <eparis@redhat.com>
*	fanotify: ignore events on directories unless specifically requested	Eric Paris	2010-10-28
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	fanotify has a very limited number of events it sends on directories. The usefulness of these events is yet to be seen and still we send them. This is particularly painful for mount marks where one might receive many of these useless events. As such this patch will drop events on IS_DIR() inodes unless they were explictly requested with FAN_ON_DIR. This means that a mark on a directory without FAN_EVENT_ON_CHILD or FAN_ON_DIR is meaningless and will result in no events ever (although it will still be allowed since detecting it is hard) Signed-off-by: Eric Paris <eparis@redhat.com>
*	fsnotify: rename FS_IN_ISDIR to FS_ISDIR	Eric Paris	2010-10-28
\| \| \| \| \| \| \| \|	The _IN_ in the naming is reserved for flags only used by inotify. Since I am about to use this flag for fanotify rename it to be generic like the rest. Signed-off-by: Eric Paris <eparis@redhat.com>
*	fanotify: do not send events for irregular files	Eric Paris	2010-10-28
\| \| \| \| \| \| \| \| \| \|	fanotify_should_send_event has a test to see if an object is a file or directory and does not send an event otherwise. The problem is that the test is actually checking if the object with a mark is a file or directory, not if the object the event happened on is a file or directory. We should check the latter. Signed-off-by: Eric Paris <eparis@redhat.com>
*	fanotify: limit number of listeners per user	Eric Paris	2010-10-28
\| \| \| \| \| \| \| \|	fanotify currently has no limit on the number of listeners a given user can have open. This patch limits the total number of listeners per user to 128. This is the same as the inotify default limit. Signed-off-by: Eric Paris <eparis@redhat.com>
*	fanotify: allow userspace to override max marks	Eric Paris	2010-10-28
\| \| \| \| \| \| \| \| \| \| \| \|	Some fanotify groups, especially those like AV scanners, will need to place lots of marks, particularly ignore marks. Since ignore marks do not pin inodes in cache and are cleared if the inode is removed from core (usually under memory pressure) we expose an interface for listeners, with CAP_SYS_ADMIN, to override the maximum number of marks and be allowed to set and 'unlimited' number of marks. Programs which make use of this feature will be able to OOM a machine. Signed-off-by: Eric Paris <eparis@redhat.com>
*	fanotify: limit the number of marks in a single fanotify group	Eric Paris	2010-10-28
\| \| \| \| \| \| \| \| \| \|	There is currently no limit on the number of marks a given fanotify group can have. Since fanotify is gated on CAP_SYS_ADMIN this was not seen as a serious DoS threat. This patch implements a default of 8192, the same as inotify to work towards removing the CAP_SYS_ADMIN gating and eliminating the default DoS'able status. Signed-off-by: Eric Paris <eparis@redhat.com>
*	fanotify: allow userspace to override max queue depth	Eric Paris	2010-10-28
\| \| \| \| \| \| \| \| \|	fanotify has a defualt max queue depth. This patch allows processes which explicitly request it to have an 'unlimited' queue depth. These processes need to be very careful to make sure they cannot fall far enough behind that they OOM the box. Thus this flag is gated on CAP_SYS_ADMIN. Signed-off-by: Eric Paris <eparis@redhat.com>