litmus-rt.git - The LITMUS^RT kernel.

	Commit message (Collapse)	Author	Age
*	User namespaces: set of cleanups (v2)	Serge Hallyn	2008-11-24
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The user_ns is moved from nsproxy to user_struct, so that a struct cred by itself is sufficient to determine access (which it otherwise would not be). Corresponding ecryptfs fixes (by David Howells) are here as well. Fix refcounting. The following rules now apply: 1. The task pins the user struct. 2. The user struct pins its user namespace. 3. The user namespace pins the struct user which created it. User namespaces are cloned during copy_creds(). Unsharing a new user_ns is no longer possible. (We could re-add that, but it'll cause code duplication and doesn't seem useful if PAM doesn't need to clone user namespaces). When a user namespace is created, its first user (uid 0) gets empty keyrings and a clean group_info. This incorporates a previous patch by David Howells. Here is his original patch description: >I suggest adding the attached incremental patch. It makes the following >changes: > > (1) Provides a current_user_ns() macro to wrap accesses to current's user > namespace. > > (2) Fixes eCryptFS. > > (3) Renames create_new_userns() to create_user_ns() to be more consistent > with the other associated functions and because the 'new' in the name is > superfluous. > > (4) Moves the argument and permission checks made for CLONE_NEWUSER to the > beginning of do_fork() so that they're done prior to making any attempts > at allocation. > > (5) Calls create_user_ns() after prepare_creds(), and gives it the new creds > to fill in rather than have it return the new root user. I don't imagine > the new root user being used for anything other than filling in a cred > struct. > > This also permits me to get rid of a get_uid() and a free_uid(), as the > reference the creds were holding on the old user_struct can just be > transferred to the new namespace's creator pointer. > > (6) Makes create_user_ns() reset the UIDs and GIDs of the creds under > preparation rather than doing it in copy_creds(). > >David >Signed-off-by: David Howells <dhowells@redhat.com> Changelog: Oct 20: integrate dhowells comments 1. leave thread_keyring alone 2. use current_user_ns() in set_user() Signed-off-by: Serge Hallyn <serue@us.ibm.com>
*	nfsctl: add headers for credentials	Randy Dunlap	2008-11-19
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Needs headers help for current_cred: Adding only cred.h wasn't enough. linux-next-20081023/fs/nfsctl.c:45: error: implicit declaration of function 'current_cred' Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Cc: David Howells <dhowells@redhat.com> Cc: James Morris <jmorris@namei.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: James Morris <jmorris@namei.org>
*	coda: fix creds reference	Randy Dunlap	2008-11-19
\| \| \| \| \| \| \| \| \| \| \| \| \|	Needs a header file for credentials struct: linux-next-20081023/fs/coda/file.c:177: error: dereferencing pointer to incomplete type Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Cc: Jan Harkes <jaharkes@cs.cmu.edu> Cc: David Howells <dhowells@redhat.com> Cc: James Morris <jmorris@namei.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: James Morris <jmorris@namei.org>
*	Merge branch 'master' into next	James Morris	2008-11-18
\|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Conflicts: fs/cifs/misc.c Merge to resolve above, per the patch below. Signed-off-by: James Morris <jmorris@namei.org> diff --cc fs/cifs/misc.c index ec36410,addd1dc..0000000 --- a/fs/cifs/misc.c +++ b/fs/cifs/misc.c @@@ -347,13 -338,13 +338,13 @@@ header_assemble(struct smb_hdr buffer / BB Add support for establishing new tCon and SMB Session / / with userid/password pairs found on the smb session / / for other target tcp/ip addresses BB */ - if (current->fsuid != treeCon->ses->linux_uid) { + if (current_fsuid() != treeCon->ses->linux_uid) { cFYI(1, ("Multiuser mode and UID " "did not match tcon uid")); - read_lock(&GlobalSMBSeslock); - list_for_each(temp_item, &GlobalSMBSessionList) { - ses = list_entry(temp_item, struct cifsSesInfo, cifsSessionList); + read_lock(&cifs_tcp_ses_lock); + list_for_each(temp_item, &treeCon->ses->server->smb_ses_list) { + ses = list_entry(temp_item, struct cifsSesInfo, smb_ses_list); - if (ses->linux_uid == current->fsuid) { + if (ses->linux_uid == current_fsuid()) { if (ses->server == treeCon->ses->server) { cFYI(1, ("found matching uid substitute right smb_uid")); buffer->Uid = ses->Suid;
\| *	Merge git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6	Linus Torvalds	2008-11-17
\| \|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6: prevent cifs_writepages() from skipping unwritten pages Fixed parsing of mount options when doing DFS submount [CIFS] Fix check for tcon seal setting and fix oops on failed mount from earlier patch [CIFS] Fix build break cifs: reinstate sharing of tree connections [CIFS] minor cleanup to cifs_mount cifs: reinstate sharing of SMB sessions sans races cifs: disable sharing session and tcon and add new TCP sharing code [CIFS] clean up server protocol handling [CIFS] remove unused list, add new cifs sock list to prepare for mount/umount fix [CIFS] Fix cifs reconnection flags [CIFS] Can't rely on iov length and base when kernel_recvmsg returns error
\| \| *	prevent cifs_writepages() from skipping unwritten pages	Dave Kleikamp	2008-11-17
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Fixes a data corruption under heavy stress in which pages could be left dirty after all open instances of a inode have been closed. In order to write contiguous pages whenever possible, cifs_writepages() asks pagevec_lookup_tag() for more pages than it may write at one time. Normally, it then resets index just past the last page written before calling pagevec_lookup_tag() again. If cifs_writepages() can't write the first page returned, it wasn't resetting index, and the next call to pagevec_lookup_tag() resulted in skipping all of the pages it previously returned, even though cifs_writepages() did nothing with them. This can result in data loss when the file descriptor is about to be closed. This patch ensures that index gets set back to the next returned page so that none get skipped. Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com> Acked-by: Jeff Layton <jlayton@redhat.com> Cc: Shirish S Pargaonkar <shirishp@us.ibm.com> Signed-off-by: Steve French <sfrench@us.ibm.com>
\| \| *	Fixed parsing of mount options when doing DFS submount	Igor Mammedov	2008-11-17
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Since these hit the same routines, and are relatively small, it is easier to review them as one patch. Fixed incorrect handling of the last option in some cases Fixed prefixpath handling convert path_consumed into host depended string length (in bytes) Use non default separator if it is provided in the original mount options Acked-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Igor Mammedov <niallain@gmail.com> Signed-off-by: Steve French <sfrench@us.ibm.com>
\| \| *	[CIFS] Fix check for tcon seal setting and fix oops on failed mount from ↵	Steve French	2008-11-17
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	earlier patch set tcon->ses earlier If the inital tree connect fails, we'll end up calling cifs_put_smb_ses with a NULL pointer. Fix it by setting the tcon->ses earlier. Acked-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>
\| \| *	[CIFS] Fix build break	Steve French	2008-11-16
\| \| \| \| \| \| \| \| \| \| \| \|	Signed-off-by: Steve French <sfrench@us.ibm.com>
\| \| *	cifs: reinstate sharing of tree connections	Jeff Layton	2008-11-16
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Use a similar approach to the SMB session sharing. Add a list of tcons attached to each SMB session. Move the refcount to non-atomic. Protect all of the above with the cifs_tcp_ses_lock. Add functions to properly find and put references to the tcons. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>
\| \| *	[CIFS] minor cleanup to cifs_mount	Steve French	2008-11-14
\| \| \| \| \| \| \| \| \| \| \| \|	Signed-off-by: Steve French <sfrench@us.ibm.com>
\| \| *	cifs: reinstate sharing of SMB sessions sans races	Jeff Layton	2008-11-14
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We do this by abandoning the global list of SMB sessions and instead moving to a per-server list. This entails adding a new list head to the TCP_Server_Info struct. The refcounting for the cifsSesInfo is moved to a non-atomic variable. We have to protect it by a lock anyway, so there's no benefit to making it an atomic. The list and refcount are protected by the global cifs_tcp_ses_lock. The patch also adds a new routines to find and put SMB sessions and that properly take and put references under the lock. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>
\| \| *	cifs: disable sharing session and tcon and add new TCP sharing code	Jeff Layton	2008-11-14
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The code that allows these structs to be shared is extremely racy. Disable the sharing of SMB and tcon structs for now until we can come up with a way to do this that's race free. We want to continue to share TCP sessions, however since they are required for multiuser mounts. For that, implement a new (hopefully race-free) scheme. Add a new global list of TCP sessions, and take care to get a reference to it whenever we're dealing with one. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>
\| \| *	[CIFS] clean up server protocol handling	Steve French	2008-11-13
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We're currently declaring both a sockaddr_in and sockaddr6_in on the stack, but we really only need storage for one of them. Declare a sockaddr struct and cast it to the proper type. Also, eliminate the protocolType field in the TCP_Server_Info struct. It's redundant since we have a sa_family field in the sockaddr anyway. We may need to revisit this if SCTP is ever implemented, but for now this will simplify the code. CIFS over IPv6 also has a number of problems currently. This fixes all of them that I found. Eventually, it would be nice to move more of the code to be protocol independent, but this is a start. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>
\| \| *	[CIFS] remove unused list, add new cifs sock list to prepare for ↵	Steve French	2008-11-13
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	mount/umount fix Also adds two lines missing from the previous patch (for the need reconnect flag in the /proc/fs/cifs/DebugData handling) The new global_cifs_sock_list is added, and initialized in init_cifs but not used yet. Jeff Layton will be adding code in to use that and to remove the GlobalTcon and GlobalSMBSession lists. CC: Jeff Layton <jlayton@redhat.com> CC: Shirish Pargaonkar <shirishp@us.ibm.com> Signed-off-by: Steve French <sfrench@us.ibm.com>
\| \| *	[CIFS] Fix cifs reconnection flags	Steve French	2008-11-13
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In preparation for Jeff's big umount/mount fixes to remove the possibility of various races in cifs mount and linked list handling of sessions, sockets and tree connections, this patch cleans up some repetitive code in cifs_mount, and addresses a problem with ses->status and tcon->tidStatus in which we were overloading the "need_reconnect" state with other status in that field. So the "need_reconnect" flag has been broken out from those two state fields (need reconnect was not mutually exclusive from some of the other possible tid and ses states). In addition, a few exit cases in cifs_mount were cleaned up, and a problem with a tcon flag (for lease support) was not being set consistently for the 2nd mount of the same share CC: Jeff Layton <jlayton@redhat.com> CC: Shirish Pargaonkar <shirishp@us.ibm.com> Signed-off-by: Steve French <sfrench@us.ibm.com>
\| \| *	[CIFS] Can't rely on iov length and base when kernel_recvmsg returns error	Steve French	2008-11-03
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When retrying kernel_recvmsg, reset iov_base and iov_len. Note comment from Sridhar: "In the normal path, iov.iov_len is clearly set to 4. But i think you are running into a case where kernel_recvmsg() is called via 'goto incomplete_rcv' It happens if the previous call fails with EAGAIN. If you want to call recvmsg() after EAGAIN failure, you need to reset iov." Signed-off-by: Shirish Pargaonkar <shirishp@us.ibm.com> Signed-off-by: Steve French <sfrench@us.ibm.com>
\| * \|	Fix broken ownership of /proc/sys/ files	Al Viro	2008-11-16
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	D'oh... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Reported-and-tested-by: Peter Palfrader <peter@palfrader.org> Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
\| * \|	Fix inotify watch removal/umount races	Al Viro	2008-11-15
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Inotify watch removals suck violently. To kick the watch out we need (in this order) inode->inotify_mutex and ih->mutex. That's fine if we have a hold on inode; however, for all other cases we need to make damn sure we don't race with umount. We can NOT just grab a reference to a watch - inotify_unmount_inodes() will happily sail past it and we'll end with reference to inode potentially outliving its superblock. Ideally we just want to grab an active reference to superblock if we can; that will make sure we won't go into inotify_umount_inodes() until we are done. Cleanup is just deactivate_super(). However, that leaves a messy case - what if we are racing with umount() and active references to superblock can't be acquired anymore? We can bump ->s_count, grab ->s_umount, which will almost certainly wait until the superblock is shut down and the watch in question is pining for fjords. That's fine, but there is a problem - we might have hit the window between ->s_active getting to 0 / ->s_count - below S_BIAS (i.e. the moment when superblock is past the point of no return and is heading for shutdown) and the moment when deactivate_super() acquires ->s_umount. We could just do drop_super() yield() and retry, but that's rather antisocial and this stuff is luser-triggerable. OTOH, having grabbed ->s_umount and having found that we'd got there first (i.e. that ->s_root is non-NULL) we know that we won't race with inotify_umount_inodes(). So we could grab a reference to watch and do the rest as above, just with drop_super() instead of deactivate_super(), right? Wrong. We had to drop ih->mutex before we could grab ->s_umount. So the watch could've been gone already. That still can be dealt with - we need to save watch->wd, do idr_find() and compare its result with our pointer. If they match, we either have the damn thing still alive or we'd lost not one but two races at once, the watch had been killed and a new one got created with the same ->wd at the same address. That couldn't have happened in inotify_destroy(), but inotify_rm_wd() could run into that. Still, "new one got created" is not a problem - we have every right to kill it or leave it alone, whatever's more convenient. So we can use idr_find(...) == watch && watch->inode->i_sb == sb as "grab it and kill it" check. If it's been our original watch, we are fine, if it's a newcomer - nevermind, just pretend that we'd won the race and kill the fscker anyway; we are safe since we know that its superblock won't be going away. And yes, this is far beyond mere "not very pretty"; so's the entire concept of inotify to start with. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Acked-by: Greg KH <greg@kroah.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* \| \|	Merge branch 'master' into next	James Morris	2008-11-13
\|\\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Conflicts: security/keys/internal.h security/keys/process_keys.c security/keys/request_key.c Fixed conflicts above by using the non 'tsk' versions. Signed-off-by: James Morris <jmorris@namei.org>
\| * \|	Merge branch 'for-linus' of ↵	Linus Torvalds	2008-11-13
\| \|\ \ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	git://git.kernel.org/pub/scm/linux/kernel/git/teigland/dlm * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/dlm: dlm: fix shutdown cleanup
\| \| * \|	dlm: fix shutdown cleanup	David Teigland	2008-11-13
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Fixes a regression from commit 0f8e0d9a317406612700426fad3efab0b7bbc467, "dlm: allow multiple lockspace creates". An extraneous 'else' slipped into a code fragment being moved from release_lockspace() to dlm_release_lockspace(). The result of the unwanted 'else' is that dlm threads and structures are not stopped and cleaned up when the final dlm lockspace is removed. Trying to create a new lockspace again afterward will fail with "kmem_cache_create: duplicate cache dlm_conn" because the cache was not previously destroyed. Signed-off-by: David Teigland <teigland@redhat.com>
\| * \| \|	ext3: Clean up outdated and incorrect comment for ext3_write_super()	Theodore Tso	2008-11-12
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
\| * \| \|	vfs: fix shrink_submounts	Eric W. Biederman	2008-11-12
\| \|/ / \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In the last refactoring of shrink_submounts a variable was not completely renamed. So finish the renaming of mnt to m now. Without this if you attempt to mount an nfs mount that has both automatic nfs sub mounts on it, and has normal mounts on it. The unmount will succeed when it should not. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Al Viro <viro@ZenIV.linux.org.uk Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
\| * \|	Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfs	Linus Torvalds	2008-11-11
\| \|\ \ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* 'for-linus' of git://oss.sgi.com/xfs/xfs: [XFS] XFS: Check for valid transaction headers in recovery [XFS] handle memory allocation failures during log initialisation [XFS] Account for allocated blocks when expanding directories [XFS] Wait for all I/O on truncate to zero file size [XFS] Fix use-after-free with log and quotas
\| \| * \|	[XFS] XFS: Check for valid transaction headers in recovery	David Chinner	2008-11-10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When we are about to add a new item to a transaction in recovery, we need to check that it is valid first. Currently we just assert that header magic number matches, but in production systems that is not present and we add a corrupted transaction to the list to be processed. This results in a kernel oops later when processing the corrupted transaction. Instead, if we detect a corrupted transaction, abort recovery and leave the user to clean up the mess that has occurred. SGI-PV: 988145 SGI-Modid: xfs-linux-melb:xfs-kern:32356a Signed-off-by: David Chinner <david@fromorbit.com> Signed-off-by: Tim Shimmin <tes@sgi.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
\| \| * \|	[XFS] handle memory allocation failures during log initialisation	Dave Chinner	2008-11-10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When there is no memory left in the system, xfs_buf_get_noaddr() can fail. If this happens at mount time during xlog_alloc_log() we fail to catch the error and oops. Catch the error from xfs_buf_get_noaddr(), and allow other memory allocations to fail and catch those errors too. Report the error to the console and fail the mount with ENOMEM. Tested by manually injecting errors into xfs_buf_get_noaddr() and xlog_alloc_log(). Version 2: o remove unnecessary casts of the returned pointer from kmem_zalloc() SGI-PV: 987246 Signed-off-by: Dave Chinner <david@fromorbit.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
\| \| * \|	[XFS] Account for allocated blocks when expanding directories	David Chinner	2008-11-10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When we create a directory, we reserve a number of blocks for the maximum possible expansion of of the directory due to various btree splits, freespace allocation, etc. Unfortunately, each allocation is not reflected in the total number of blocks still available to the transaction, so the maximal reservation is used over and over again. This leads to problems where an allocation group has only enough blocks for some of the allocations required for the directory modification. After the first N allocations, the remaining blocks in the allocation group drops below the total reservation, and subsequent allocations fail because the allocator will not allow the allocation to proceed if the AG does not have the enough blocks available for the entire allocation total. This results in an ENOSPC occurring after an allocation has already occurred. This results in aborting the directory operation (leaving the directory in an inconsistent state) and cancelling a dirty transaction, which results in a filesystem shutdown. Avoid the problem by reflecting the number of blocks allocated in any directory expansion in the total number of blocks available to the modification in progress. This prevents a directory modification from being aborted part way through with an ENOSPC. SGI-PV: 988144 SGI-Modid: xfs-linux-melb:xfs-kern:32340a Signed-off-by: David Chinner <david@fromorbit.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
\| \| * \|	[XFS] Wait for all I/O on truncate to zero file size	Lachlan McIlroy	2008-11-10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	It's possible to have outstanding xfs_ioend_t's queued when the file size is zero. This can happen in the direct I/O path when a direct I/O write fails due to ENOSPC. In this case the xfs_ioend_t will still be queued (ie xfs_end_io_direct() does not know that the I/O failed so can't force the xfs_ioend_t to be flushed synchronously). When we truncate a file on unlink we don't know to wait for these xfs_ioend_ts and we can have a use-after-free situation if the inode is reclaimed before the xfs_ioend_t is finally processed. As was suggested by Dave Chinner lets wait for all I/Os to complete when truncating the file size to zero. SGI-PV: 981668 SGI-Modid: xfs-linux-melb:xfs-kern:32216a Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org>
\| \| * \|	[XFS] Fix use-after-free with log and quotas	Lachlan McIlroy	2008-11-10
\| \| \|/ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Destroying the quota stuff on unmount can access the log - ie XFS_QM_DONE() ends up in xfs_dqunlock() which calls xfs_trans_unlocked_item() and then xfs_log_move_tail(). By this time the log has already been destroyed. Just move the cleanup of the quota code earlier in xfs_unmountfs() before the call to xfs_log_unmount(). Moving XFS_QM_DONE() up near XFS_QM_DQPURGEALL() seems like a good spot. SGI-PV: 987086 SGI-Modid: xfs-linux-melb:xfs-kern:32148a Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Peter Leckie <pleckie@sgi.com>
\| * \|	ocfs2: Check search result in ocfs2_xattr_block_get()	Tiger Yang	2008-11-10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	ocfs2_xattr_block_get() calls ocfs2_xattr_search() to find an external xattr, but doesn't check the search result that is passed back via struct ocfs2_xattr_search. Add a check for search result, and pass back -ENODATA if the xattr search failed. This avoids a later NULL pointer error. Signed-off-by: Tiger Yang <tiger.yang@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
\| * \|	ocfs2: fix printk related build warnings in xattr.c	Mark Fasheh	2008-11-10
\| \| \| \| \| \| \| \| \| \| \| \|	Signed-off-by: Mark Fasheh <mfasheh@suse.com>
\| * \|	ocfs2: truncate outstanding block after direct io failure	Dmitri Monakhov	2008-11-10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Signed-off-by: Dmitri Monakhov <dmonakhov@openvz.org> Cc: Jeff Moyer <jmoyer@redhat.com> Cc: Mark Fasheh <mark.fasheh@oracle.com> Cc: Joel Becker <Joel.Becker@oracle.com> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
\| * \|	ocfs2/xattr: Proper hash collision handle in bucket division	Tao Ma	2008-11-10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In ocfs2/xattr, we must make sure the xattrs which have the same hash value exist in the same bucket so that the search schema can work. But in the old implementation, when we want to extend a bucket, we just move half number of xattrs to the new bucket. This works in most cases, but if we are lucky enough we will move 2 xattrs into 2 different buckets. This means that an xattr from the previous bucket cannot be found anymore. This patch fix this problem by finding the right position during extending the bucket and extend an empty bucket if needed. Signed-off-by: Tao Ma <tao.ma@oracle.com> Cc: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
\| * \|	ocfs2: return 0 in page_mkwrite to let VFS retry.	Tao Ma	2008-11-10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In ocfs2_page_mkwrite, we return -EINVAL when we found the page mapping isn't updated, and it will cause the user space program get SIGBUS and exit. The reason is that during race writeable mmap, we will do unmap_mapping_range in ocfs2_data_downconvert_worker. The good thing is that if we reuturn 0 in page_mkwrite, VFS will retry fault and then call page_mkwrite again, so it is safe to return 0 here. Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
\| * \|	ocfs2: Set journal descriptor to NULL after journal shutdown	Sunil Mushran	2008-11-10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Patch sets journal descriptor to NULL after the journal is shutdown. This ensures that jbd2_journal_release_jbd_inode(), which removes the jbd2 inode from txn lists, can be called safely from ocfs2_clear_inode() even after the journal has been shutdown. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
\| * \|	ocfs2: Fix check of return value of ocfs2_start_trans() in xattr.c.	Tao Ma	2008-11-10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	On failure, ocfs2_start_trans() returns values like ERR_PTR(-ENOMEM), so we should check whether handle is NULL. Fix them to use IS_ERR(). Jan has made the patch for other part in ocfs2(thank Jan for it), so this is just the fix for fs/ocfs2/xattr.c. Signed-off-by: Tao Ma <tao.ma@oracle.com> Cc: Jan Kara <jack@suse.cz> Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
\| * \|	ocfs2: Let inode be really deleted when ocfs2_mknod_locked() fails	Jan Kara	2008-11-10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We forgot to set i_nlink to 0 when returning due to error from ocfs2_mknod_locked() and thus inode was not properly released via ocfs2_delete_inode() (e.g. claimed space was not released). Fix it. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
\| * \|	ocfs2: Fix checking of return value of new_inode()	Jan Kara	2008-11-10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	new_inode() does not return ERR_PTR() but NULL in case of failure. Correct checking of the return value. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
\| * \|	ocfs2: Fix check of return value of ocfs2_start_trans()	Jan Kara	2008-11-10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	On failure, ocfs2_start_trans() returns values like ERR_PTR(-ENOMEM). Thus checks for !handle are wrong. Fix them to use IS_ERR(). Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
\| * \|	ocfs2: Fix some typos in xattr annotations.	Tao Ma	2008-11-10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Fix some typos in the xattr annotations. Signed-off-by: Tao Ma <tao.ma@oracle.com> Reported-by: Coly Li <coyli@suse.de> Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
\| * \|	ocfs2: Remove unused ocfs2_restore_xattr_block().	Tao Ma	2008-11-10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Since now ocfs2 supports empty xattr buckets, we will never remove the xattr index tree even if all the xattrs are removed, so this function will never be called. So remove it. Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
\| * \|	ocfs2: Don't repeat ocfs2_xattr_block_find()	Joel Becker	2008-11-10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	ocfs2_xattr_block_get() looks up the xattr in a startlingly familiar way; it's identical to the function ocfs2_xattr_block_find(). Let's just use the later in the former. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
\| * \|	ocfs2: Specify appropriate journal access for new xattr buckets.	Joel Becker	2008-11-10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	There are a couple places that get an xattr bucket that may be reading an existing one or may be allocating a new one. They should specify the correct journal access mode depending. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
\| * \|	ocfs2: Check errors from ocfs2_xattr_update_xattr_search()	Joel Becker	2008-11-10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The ocfs2_xattr_update_xattr_search() function can return an error when trying to read blocks off of disk. The caller needs to check this error before using those (possibly invalid) blocks. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
\| * \|	ocfs2: Don't return -EFAULT from a corrupt xattr entry.	Joel Becker	2008-11-10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	If the xattr disk structures are corrupt, return -EIO, not -EFAULT. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
\| * \|	ocfs2: Check xattr block signatures properly.	Joel Becker	2008-11-10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The xattr.c code is currently memcmp()ing naking buffer pointers. Create the OCFS2_IS_VALID_XATTR_BLOCK() macro to match its peers and use that. In addition, failed signature checks were returning -EFAULT, which is completely wrong. Return -EIO. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
\| * \|	ocfs2: add handler_map array bounds checking	Tiger Yang	2008-11-10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Make the handler_map array as large as the possible value range to avoid a fencepost error. [ Utilize alternate method -- Joel ] Signed-off-by: Tiger Yang <tiger.yang@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
\| * \|	ocfs2: remove duplicate definition in xattr	Tiger Yang	2008-11-10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Include/linux/xattr.h already has the definition about xattr prefix, so remove the duplicate definitions in xattr.c. Signed-off-by: Tiger Yang <tiger.yang@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
\| * \|	ocfs2: fix function declaration and definition in xattr	Tiger Yang	2008-11-10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Because we merged the xattr sources into one file, some functions no longer belong in the header file. Signed-off-by: Tiger Yang <tiger.yang@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>