litmus-rt.git - The LITMUS^RT kernel.

	Commit message (Collapse)	Author	Age
*	[GFS2] Recovery for lost unlinked inodes	Steven Whitehouse	2007-07-09
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Under certain circumstances its possible (though rather unlikely) that inodes which were unlinked by one node while still open on another might get "lost" in the sense that they don't get deallocated if the node which held the inode open crashed before it was unlinked. This patch adds the recovery code which allows automatic deallocation of the inode if its found during block allocation (the sensible time to look for such inodes since we are scanning the rgrp's bitmaps anyway at this time, so it adds no overhead to do this). Since the inode will have had its i_nlink set to zero, all we need to trigger recovery is a lookup and an iput(), and the normal deallocation code takes care of the rest. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
*	[GFS2] Can't mount GFS2 file system on AoE device	Robert Peterson	2007-07-09
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch fixes bug 243131: Can't mount GFS2 file system on AoE device. When using AoE devices with lock_nolock, there is no locking table, so gfs2 (and gfs1) uses the superblock s_id. This turns out to be the device name in some cases. In the case of AoE, the device contains a slash, (e.g. "etherd/e1.1p2") which is an invalid character when we try to register the table in sysfs. This patch replaces the "/" with underscore. Rather than add a new variable to the stack, I'm just reusing a (char *) variable that's no longer used: table. This code has been tested on the failing system using a RHEL5 patch. The upstream code was tested by using gfs2_tool sb to interject a "/" into the table name of a clustered gfs2 file system. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
*	[GFS2] Fix bug in error path of inode	Steven Whitehouse	2007-07-09
\| \| \| \| \| \| \| \| \| \|	This fixes a bug in the ordering of operations in the error path of createi. Its not valid to do an iput() when holding the inode's glock since the iput() will (in this case) result in delete_inode() being called which needs to grab the lock itself. This was causing the recursive lock checking code to trigger. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
*	[GFS2] Fix typo in rename of directories	Steven Whitehouse	2007-07-09
\| \| \| \| \| \| \|	A typo caused us to pass a NULL pointer when renaming directories. It was accidentally introduced in: [GFS2] Clean up inode number handling Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
*	[DLM] variable allocation	Patrick Caulfield	2007-07-09
\| \| \| \| \| \| \| \| \| \|	Add a new flag, DLM_LSFL_FS, to be used when a file system creates a lockspace. This flag causes the dlm to use GFP_NOFS for allocations instead of GFP_KERNEL. (This updated version of the patch uses gfp_t for ls_allocation.) Signed-Off-By: Patrick Caulfield <pcaulfie@redhat.com> Signed-Off-By: David Teigland <teigland@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
*	[DLM] fix reference counting	Josef Bacik	2007-07-09
\| \| \| \| \| \| \| \| \| \| \| \|	This is a fix for the patch 021d2ff3a08019260a1dc002793c92d6bf18afb6 I left off a dlm_hold_rsb which causes the box to panic if you try to use debugfs. This patch fixes the problem. Sorry about that, Signed-off-by: Josef Bacik <jwhiter@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
*	[GFS2] Add nanosecond timestamp feature	Steven Whitehouse	2007-07-09
\| \| \| \| \| \| \| \| \| \| \|	This adds a nanosecond timestamp feature to the GFS2 filesystem. Due to the way that the on-disk format works, older filesystems will just appear to have this field set to zero. When mounted by an older version of GFS2, the filesystem will simply ignore the extra fields so that it will again appear to have whole second resolution, so that its trivially backward compatible. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
*	[GFS2] Fix sign problem in quota/statfs and cleanup _host structures	Steven Whitehouse	2007-07-09
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch fixes some sign issues which were accidentally introduced into the quota & statfs code during the endianess annotation process. Also included is a general clean up which moves all of the _host structures out of gfs2_ondisk.h (where they should not have been to start with) and into the places where they are actually used (often only one place). Also those _host structures which are not required any more are removed entirely (which is the eventual plan for all of them). The conversion routines from ondisk.c are also moved into the places where they are actually used, which for almost every one, was just one single place, so all those are now static functions. This also cleans up the end of gfs2_ondisk.h which no longer needs the #ifdef __KERNEL__. The net result is a reduction of about 100 lines of code, many functions now marked static plus the bug fixes as mentioned above. For good measure I ran the code through sparse after making these changes to check that there are no warnings generated. This fixes Red Hat bz #239686 Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
*	[GFS2] fix jdata issues	Benjamin Marzinski	2007-07-09
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This is a patch for the first three issues of RHBZ #238162 The first issue is that when you allocate a new page for a file, it will not start off uptodate. This makes sense, since you haven't written anything to that part of the file yet. Unfortunately, gfs2_pin() checks to make sure that the buffers are uptodate. The solution to this is to mark the buffers uptodate in gfs2_commit_write(), after they have been zeroed out and have the data written into them. I'm pretty confident with this fix, although it's not completely obvious that there is no problem with marking the buffers uptodate here. The second issue is simply that you can try to pin a data buffer that is already on the incore log, and thus, already pinned. This patch checks to see if this buffer is already on the log, and exits databuf_lo_add() if it is, just like buf_lo_add() does. The third issue is that gfs2_log_flush() doesn't do it's block accounting correctly. Both metadata and journaled data are logged, but gfs2_log_flush() only compares the number of metadata blocks with the number of blocks to commit to the ondisk journal. This patch also counts the journaled data blocks. Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
*	[DLM] fix socket shutdown	Patrick Caulfield	2007-07-09
\| \| \| \| \| \| \| \| \| \|	This patch clears the user_data of active sockets as part of cleanup. This prevents any late-arriving data from trying to add jobs to the work queue while we are tidying up. Signed-Off-By: Patrick Caulfield <pcaulfie@redhat.com> Signed-Off-By: David Teigland <teigland@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
*	[GFS2] Make the log reserved blocks depend on block size	Steven Whitehouse	2007-07-09
\| \| \| \| \| \| \| \| \| \| \|	The number of blocks which we reserve in the log at the start of each transaction needs to depends upon the block size since the overhead is related to the number of "pointers" which can be fitted into a single block. This relates to Red Hat bz #240435 Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
*	[GFS2] Quotas non-functional - fix another bug	Abhijith Das	2007-07-09
\| \| \| \| \| \| \| \|	This patch fixes a bug where gfs2 was writing update quota usage information to the wrong location in the quota file. Signed-off-by: Abhijith Das <adas@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
*	[DLM] show default protocol	David Teigland	2007-07-09
\| \| \| \| \| \| \| \| \|	Display the initial value of the "protocol" config value in configfs. The default value has always been 0 in the past anyway, so it's always appeared to be correct. Signed-off-by: David Teigland <teigland@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
*	[DLM] dumping master locks	David Teigland	2007-07-09
\| \| \| \| \| \| \| \| \| \| \| \| \|	Add a new debugfs file that dumps a compact list of mastered locks. This will be used by a userland daemon to collect state for deadlock detection. Also, for the existing function that prints all lock state, lock the rsb before going through the lock lists since they can be changing in the course of normal dlm activity. Signed-off-by: David Teigland <teigland@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
*	[DLM] canceling deadlocked lock	David Teigland	2007-07-09
\| \| \| \| \| \| \| \| \|	Add a function that can be used through libdlm by a system daemon to cancel another process's deadlocked lock. A completion ast with EDEADLK is returned to the process waiting for the lock. Signed-off-by: David Teigland <teigland@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
*	[DLM] timeout fixes	David Teigland	2007-07-09
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Various fixes related to the new timeout feature: - add_timeout() missed setting TIMEWARN flag on lkb's when the TIMEOUT flag was already set - clear_proc_locks should remove a dead process's locks from the timeout list - the end-of-life calculation for user locks needs to consider that ETIMEDOUT is equivalent to -DLM_ECANCEL - make initial default timewarn_cs config value visible in configfs - change bit position of TIMEOUT_CANCEL flag so it's not copied to a remote master node - set timestamp on remote lkb's so a lock dump will display the time they've been waiting Signed-off-by: David Teigland <teigland@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
*	[DLM] Compile fix	Steven Whitehouse	2007-07-09
\| \| \| \| \| \| \| \|	A one liner fix which got missed from the earlier patches. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Cc: Fabio Massimo Di Nitto <fabbione@ubuntu.com> Cc: David Teigland <teigland@redhat.com>
*	[DLM] fix compile breakage	David Teigland	2007-07-09
\| \| \| \| \| \| \| \| \|	In the rush to get the previous patch set sent, a compilation bug I fixed shortly before sending somehow got clobbered, probably by a missed quilt refresh or something. Signed-off-by: David Teigland <teigland@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
*	[DLM] wait for config check during join [6/6]	David Teigland	2007-07-09
\| \| \| \| \| \| \| \| \| \|	Joining the lockspace should wait for the initial round of inter-node config checks to complete before returning. This way, if there's a configuration mismatch between the joining node and the existing nodes, the join can fail and return an error to the application. Signed-off-by: David Teigland <teigland@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
*	[DLM] fix new_lockspace error exit [5/6]	David Teigland	2007-07-09
\| \| \| \| \| \| \| \| \| \|	Fix the error path when exiting new_lockspace(). It was kfree'ing the lockspace struct at the end, but that's only valid if it exits before kobject_register occured. After kobject_register we have to let the kobject do the freeing. Signed-off-by: David Teigland <teigland@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
*	[DLM] cancel in conversion deadlock [4/6]	David Teigland	2007-07-09
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When conversion deadlock is detected, cancel the conversion and return EDEADLK to the application. This is a new default behavior where before the dlm would allow the deadlock to exist indefinately. The DLM_LKF_NODLCKWT flag can now be used in a conversion to prevent the dlm from performing conversion deadlock detection/cancelation on it. The DLM_LKF_CONVDEADLK flag can continue to be used as before to tell the dlm to demote the granted mode of the lock being converted if it gets into a conversion deadlock. Signed-off-by: David Teigland <teigland@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
*	[DLM] dlm_device interface changes [3/6]	David Teigland	2007-07-09
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Change the user/kernel device interface used by libdlm: - Add ability for userspace to check the version of the interface. libdlm can now adapt to different versions of the kernel interface. - Increase the size of the flags passed in a lock request so all possible flags can be used from userspace. - Add an opaque "xid" value for each lock. This "transaction id" will be used later to associate locks with each other during deadlock detection. - Add a "timeout" value for each lock. This is used along with the DLM_LKF_TIMEOUT flag. Also, remove a fragment of unused code in device_read(). This patch requires updating libdlm which is backward compatible with older kernels. Signed-off-by: David Teigland <teigland@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
*	[DLM] add lock timeouts and warnings [2/6]	David Teigland	2007-07-09
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	New features: lock timeouts and time warnings. If the DLM_LKF_TIMEOUT flag is set, then the request/conversion will be canceled after waiting the specified number of centiseconds (specified per lock). This feature is only available for locks requested through libdlm (can be enabled for kernel dlm users if there's a use for it.) If the new DLM_LSFL_TIMEWARN flag is set when creating the lockspace, then a warning message will be sent to userspace (using genetlink) after a request/conversion has been waiting for a given number of centiseconds (configurable per node). The time warnings will be used in the future to do deadlock detection in userspace. Signed-off-by: David Teigland <teigland@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
*	[DLM] block scand during recovery [1/6]	David Teigland	2007-07-09
\| \| \| \| \| \| \| \|	Don't let dlm_scand run during recovery since it may try to do a resource directory removal while the directory nodes are changing. Signed-off-by: David Teigland <teigland@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
*	[DLM] keep dlm from panicing when traversing rsb list in debugfs	Josef Bacik	2007-07-09
\| \| \| \| \| \| \| \| \| \| \|	This problem was originally reported against GFS6.1, but the same issue exists in upstream DLM. This patch keeps the rsb iterator assigning under the rsbtbl list lock. Each time we process an rsb we grab a reference to it to make sure it is not freed out from underneath us, and then put it when we get the next rsb in the list or move onto another list. Signed-off-by: Josef Bacik <jwhiter@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
*	[GFS2] Quotas non-functional - fix bug	Abhijith Das	2007-07-09
\| \| \| \| \| \| \| \| \| \|	This patch fixes an error in the quota code where a 'struct gfs2_quota_lvb' was being passed to gfs2_adjust_quota() instead of a 'struct gfs2_quota_data'. Also moved 'struct gfs2_quota_lvb' from fs/gfs2/incore.h to include/linux/gfs2_ondisk.h as per Steve's suggestion. Signed-off-by: Abhijith Das <adas@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
*	[GFS2] Clean up inode number handling	Steven Whitehouse	2007-07-09
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch cleans up the inode number handling code. The main difference is that instead of looking up the inodes using a struct gfs2_inum_host we now use just the no_addr member of this structure. The tests relating to no_formal_ino can then be done by the calling code. This has advantages in that we want to do different things in different code paths if the no_formal_ino doesn't match. In the NFS patch we want to return -ESTALE, but in the ->lookup() path, its a bug in the fs if the no_formal_ino doesn't match and thus we can withdraw in this case. In order to later fix bz #201012, we need to be able to look up an inode without knowing no_formal_ino, as the only information that is known to us is the on-disk location of the inode in question. This patch will also help us to fix bz #236099 at a later date by cleaning up a lot of the code in that area. There are no user visible changes as a result of this patch and there are no changes to the on-disk format either. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
*	[GFS2] Reduce size of struct gdlm_lock	Steven Whitehouse	2007-07-09
\| \| \| \| \| \| \| \| \| \| \|	This patch removes the completion (which is rather large) from struct gdlm_lock in favour of using the wait_on_bit() functions. We don't need to add any extra fields to the structure to do this, so we save 32 bytes (on x86_64) per structure. This adds up to quite a lot when we may potentially have millions of these lock structures, Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Acked-by: David Teigland <teigland@redhat.com>
*	[GFS2] Addendum patch 2 for gfs2_grow	Robert Peterson	2007-07-09
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This addendum patch 2 corrects three things: 1. It fixes a stupid mistake in the previous addendum that broke gfs2. Ref: https://www.redhat.com/archives/cluster-devel/2007-May/msg00162.html 2. It fixes a problem that Dave Teigland pointed out regarding the external declarations in ops_address.h being in the wrong place. 3. It recasts a couple more %llu printks to (unsigned long long) as requested by Steve Whitehouse. I would have loved to put this all in one revised patch, but there was a rush to get some patches for RHEL5. Therefore, the previous patches were applied to the git tree "as is" and therefore, I'm posting another addendum. Sorry. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
*	[GFS2] use zero_user_page	Nate Diller	2007-07-09
\| \| \| \| \| \| \| \|	Use zero_user_page() instead of open-coding it. Signed-off-by: Nate Diller <nate.diller@gmail.com> Cc: Steven Whitehouse <swhiteho@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
*	[GFS2] Kernel changes to support new gfs2_grow command (part 2)	Robert Peterson	2007-07-09
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	To avoid code redundancy, I separated out the operational "guts" into a new function called read_rindex_entry. Then I made two functions: the closer-to-original gfs2_ri_update (without the special condition checks) and gfs2_ri_update_special that's designed with that condition in mind. (I don't like the name, but if you have a suggestion, I'm all ears). Oh, and there's an added benefit: we don't need all the ugly gotos anymore. ;) This patch has been tested with gfs2_fsck_hellfire (which runs for three and a half hours, btw). Signed-off-By: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
*	[GFS2] kernel changes to support new gfs2_grow command	Robert Peterson	2007-07-09
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This is another revision of my gfs2 kernel patch that allows gfs2_grow to function properly. Steve Whitehouse expressed some concerns about the previous patch and I restructured it based on his comments. The previous patch was doing the statfs_change at file close time, under its own transaction. The current patch does the statfs_change inside the gfs2_commit_write function, which keeps it under the umbrella of the inode transaction. I can't call ri_update to re-read the rindex file during the transaction because the transaction may have outstanding unwritten buffers attached to the rgrps that would be otherwise blown away. So instead, I created a new function, gfs2_ri_total, that will re-read the rindex file just to total the file system space for the sake of the statfs_change. The ri_update will happen later, when gfs2 realizes the version number has changed, as it happened before my patch. Since the statfs_change is happening at write_commit time and there may be multiple writes to the rindex file for one grow operation. So one consequence of this restructuring is that instead of getting one kernel message to indicate the change, you may see several. For example, before when you did a gfs2_grow, you'd get a single message like: GFS2: File system extended by 247876 blocks (968MB) Now you get something like: GFS2: File system extended by 207896 blocks (812MB) GFS2: File system extended by 39980 blocks (156MB) This version has also been successfully run against the hours-long "gfs2_fsck_hellfire" test that does several gfs2_grow and gfs2_fsck while interjecting file system damage. It does this repeatedly under a variety Resource Group conditions. Signed-off-By: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
*	[DLM] fix a couple of races	Satyam Sharma	2007-07-09
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Fix two races in fs/dlm/config.c: (1) Grab the configfs subsystem semaphore before calling config_group_find_obj() in get_space(). This solves a potential race between get_space() and concurrent mkdir(2) or rmdir(2). (2) Grab a reference on the found config_item _while_ holding the configfs subsystem semaphore in get_comm(), and not after it. This solves a potential race between get_comm() and concurrent rmdir(2). Signed-off-by: Satyam Sharma <ssatyam@cse.iitk.ac.in> Signed-off-by: David Teigland <teigland@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
*	[GFS2] flush the glock completely in inode_go_sync	Benjamin Marzinski	2007-07-09
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Fix for bz #231910 When filemap_fdatawrite() is called on the inode mapping in data=ordered mode, it will add the glock to the log. In inode_go_sync(), if you do the gfs2_log_flush() before this, after the filemap_fdatawrite() call, the glock and its associated data buffers will be on the log again. This means you can demote a lock from exclusive, without having it flushed from the log. The attached patch simply moves the gfs2_log_flush up to after the filemap_fdatawrite() call. Originally, I tried moving the gfs2_log_flush to after gfs2_meta_sync(), but that caused me to trip the following assert. GFS2: fsid=cypher-36:test.0: fatal: assertion "!buffer_busy(bh)" failed GFS2: fsid=cypher-36:test.0: function = gfs2_ail_empty_gl, file = fs/gfs2/glops.c, line = 61 It appears that gfs2_log_flush() puts some of the glocks buffers in the busy state and the filemap_fdatawrite() call is necessary to flush them. This makes me worry slightly that a related problem could happen because of moving the gfs2_log_flush() after the initial filemap_fdatawrite(), but I assume that gfs2_ail_empty_gl() would catch that case as well. Signed-off-by: Benjamin E. Marzinski <bmarzins@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
*	Fix permission checking for the new utimensat() system call	Linus Torvalds	2007-07-08
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Commit 1c710c896eb461895d3c399e15bb5f20b39c9073 added the utimensat() system call, but didn't handle the case of checking for the writability of the target right, when the target was a file descriptor, not a filename. We cannot use vfs_permission(MAY_WRITE) for that case, and need to simply check whether the file descriptor is writable. The oops from using the wrong function was noticed and narrowed down by Markus Trippelsdorf. Cc: Ulrich Drepper <drepper@redhat.com> Cc: Markus Trippelsdorf <markus@trippelsdorf.de> Cc: Andrew Morton <akpm@linux-foundation.org> Acked-by: Al Viro <viro@ftp.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	DLM must depend on SYSFS	Adrian Bunk	2007-07-07
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The dependency of DLM on SYSFS got lost in commit 6ed7257b46709e87d79ac2b6b819b7e0c9184998 resulting in the following compile error with CONFIG_DLM=y, CONFIG_SYSFS=n: <-- snip --> ... LD .tmp_vmlinux1 fs/built-in.o: In function `dlm_lockspace_init': /home/bunk/linux/kernel-2.6/linux-2.6.22-rc6-mm1/fs/dlm/lockspace.c:231: undefined reference to `kernel_subsys' fs/built-in.o: In function `configfs_init': /home/bunk/linux/kernel-2.6/linux-2.6.22-rc6-mm1/fs/configfs/mount.c:143: undefined reference to `kernel_subsys' make[1]: *** [.tmp_vmlinux1] Error 1 <-- snip --> Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	Fix elf_core_dump() when writing arch specific notes (spu coredumps)	Michael Ellerman	2007-07-06
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	elf_core_dump() supports dumping arch specific ELF notes, via the #define ELF_CORE_WRITE_EXTRA_NOTES. Currently the only user of this is the powerpc spu coredump code. There is a bug in the handling of foffset WRT the arch notes, which causes us to erroneously increment foffset by the size of the arch notes, leaving a block of zeroes in the file, and causing all subsequent data in the file to be at <supposed position> + <arch note size>. eg: LOAD 0x050000 0x00100000 0x00000000 0x20000 0x20000 R E 0x10000 Tells us we should have a chunk of data at 0x50000. The truth is the data is at 0x90dbc = 0x50000 + 0x40dbc (the size of the arch notes). This bug prevents gdb from reading the core file correctly. The simplest fix is to simply remember the size of the arch notes, and add it to foffset after we've written the arch notes. The only drawback is that if the arch code doesn't write as many bytes as it said it would, we end up with a broken core dump again. For now I think that's a reasonable requirement. Tested on a Cell blade, gdb no longer complains about the core file being bogus. While I'm here I should point out that the spu coredump code does not work if we're dumping to a pipe - we'll have to wait for 23 to fix that. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Acked-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Acked-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	[JFFS2] Fix readinode failure when read_dnode() detects CRC failure.	David Woodhouse	2007-07-04
\| \| \| \| \| \| \| \|	We should have stopped returning 1 from read_dnode() to indicate failure. We can just mark the damn thing obsolete immediately. But I missed a case where we don't. Signed-off-by: David Woodhouse <dwmw2@infradead.org>
*	dio: remove bogus refcounting BUG_ON	Zach Brown	2007-07-03
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Badari Pulavarty reported a case of this BUG_ON is triggering during testing. It's completely bogus and should be removed. It's trying to notice if we left references to the dio hanging around in the sync case. They should have been dropped as IO completed while this path was in dio_await_completion(). This condition will also be checked, via some twisty logic, by the BUG_ON(ret != -EIOCBQUEUED) a few lines lower. So to start this BUG_ON() is redundant. More fatally, it's dereferencing dio-> after having dropped its reference. It's only safe to dereference the dio after releasing the lock if the final reference was just dropped. Another CPU might free the dio in bio completion and reuse the memory after this path drops the dio lock but before the BUG_ON() is evaluated. This patch passed aio+dio regression unit tests and aio-stress on ext3. Signed-off-by: Zach Brown <zach.brown@oracle.com> Cc: Badari Pulavarty <pbadari@us.ibm.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	Introduce fixed sys_sync_file_range2() syscall, implement on PowerPC and ARM	David Woodhouse	2007-06-28
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Not all the world is an i386. Many architectures need 64-bit arguments to be aligned in suitable pairs of registers, and the original sys_sync_file_range(int, loff_t, loff_t, int) was therefore wasting an argument register for padding after the first integer. Since we don't normally have more than 6 arguments for system calls, that left no room for the final argument on some architectures. Fix this by introducing sys_sync_file_range2(int, int, loff_t, loff_t) which all fits nicely. In fact, ARM already had that, but called it sys_arm_sync_file_range. Move it to fs/sync.c and rename it, then implement the needed compatibility routine. And stop the missing syscall check from bitching about the absence of sys_sync_file_range() if we've implemented sys_sync_file_range2() instead. Tested on PPC32 and with 32-bit and 64-bit userspace on PPC64. Signed-off-by: David Woodhouse <dwmw2@infradead.org> Acked-by: Russell King <rmk+kernel@arm.linux.org.uk> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Paul Mackerras <paulus@samba.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	ext2: fix return of uninitialised variable	Andrew Morton	2007-06-28
\| \| \| \| \| \| \| \| \| \|	gcc correctly says fs/ext2/super.c: In function 'ext2_remount': fs/ext2/super.c:1055: warning: 'err' may be used uninitialized in this function Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	avoid spurious POLLIN returns in signalfd	Davide Libenzi	2007-06-28
\| \| \| \| \| \| \| \| \| \| \|	The new code in kernel/signal.c does not allow fetching private signals from another task. This patch avoid spurious POLLIN returns from a signalfd poll(2) operation. Signed-off-by: Davide Libenzi <davidel@xmailserver.org> Cc: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	zero out last page for llseek/write	Michael Halcrow	2007-06-28
\| \| \| \| \| \| \| \| \| \| \|	When one llseek's past the end of the file and then writes, every page past the previous end of the file should be cleared. Trevor found that the code, as is, does not assure that the very last page is always cleared. This patch takes care of that. Signed-off-by: Michael Halcrow <mhalcrow@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	eCryptfs: initialize crypt_stat in setattr	Michael Halcrow	2007-06-28
\| \| \| \| \| \| \| \| \| \| \|	Recent changes in eCryptfs have made it possible to get to ecryptfs_setattr() with an uninitialized crypt_stat struct. This results in a wide and colorful variety of unpleasantries. This patch properly initializes the crypt_stat structure in ecryptfs_setattr() when it is necessary to do so. Signed-off-by: Michael Halcrow <mhalcrow@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	eCryptfs: fix write zeros behavior	Michael Halcrow	2007-06-28
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch fixes the processes involved in wiping regions of the data during truncate and write events, fixing a kernel hang in 2.6.22-rc4 while assuring that zero values are written out to the appropriate locations during events in which the i_size will change. The range passed to ecryptfs_truncate() from ecryptfs_prepare_write() includes the page that is the object of ecryptfs_prepare_write(). This leads to a kernel hang as read_cache_page() is executed on the same page in the ecryptfs_truncate() execution path. This patch remedies this by limiting the range passed to ecryptfs_truncate() so as to exclude the page that is the object of ecryptfs_prepare_write(); it also adds code to ecryptfs_prepare_write() to zero out the region of its own page when writing past the i_size position. This patch also modifies ecryptfs_truncate() so that when a file is truncated to a smaller size, eCryptfs will zero out the contents of the new last page from the new size through to the end of the last page. Signed-off-by: Michael Halcrow <mhalcrow@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	ext4: lost brelse in ext4_read_inode()	Kirill Korotaev	2007-06-24
\| \| \| \| \| \| \| \| \| \|	One of error path in ext4_read_inode() leaks bh since brelse is forgoten. Signed-off-by: Kirill Korotaev <dev@openvz.org> Acked-by: Vasily Averin <vvs@sw.ru> Cc: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	ext3: lost brelse in ext3_read_inode()	Kirill Korotaev	2007-06-24
\| \| \| \| \| \| \| \| \|	One of error path in ext3_read_inode() leaks bh since brelse is forgoten. Signed-off-by: Kirill Korotaev <dev@openvz.org> Acked-by: Vasily Averin <vvs@sw.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	ext2: disallow setting xip on remount	Carsten Otte	2007-06-24
\| \| \| \| \| \| \| \| \| \| \|	Yan Zheng pointed out that ext2_remount lacks checking if -o xip should be enabled or not. This patch checks for presence of direct_access on the backing block device and if the blocksize meets the requirements. Signed-off-by: Carsten Otte <cotte@de.ibm.com> Cc: Yan Zheng <yanzheng@21cn.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	[XFS] s/memclear_highpage_flush/zero_user_page/	Christoph Hellwig	2007-06-19
\| \| \| \| \| \| \| \|	SGI-PV: 957103 SGI-Modid: xfs-linux-melb:xfs-kern:28678a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Tim Shimmin <tes@sgi.com>
*	shm: fix the filename of hugetlb sysv shared memory	Eric W. Biederman	2007-06-16
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Some user space tools need to identify SYSV shared memory when examining /proc/<pid>/maps. To do so they look for a block device with major zero, a dentry named SYSV<sysv key>, and having the minor of the internal sysv shared memory kernel mount. To help these tools and to make it easier for people just browsing /proc/<pid>/maps this patch modifies hugetlb sysv shared memory to use the SYSV<key> dentry naming convention. User space tools will still have to be aware that hugetlb sysv shared memory lives on a different internal kernel mount and so has a different block device minor number from the rest of sysv shared memory. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Cc: "Serge E. Hallyn" <serge@hallyn.com> Cc: Albert Cahalan <acahalan@gmail.com> Cc: Badari Pulavarty <pbadari@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>