litmus-rt.git - The LITMUS^RT kernel.

	Commit message	Author	Age
*	ext4: Fix compat EXT4_IOC_ADD_GROUP	Ben Hutchings	2010-05-17
\| \| \| \| \| \| \| \|	struct ext4_new_group_input needs to be converted because u64 has only 32-bit alignment on some 32-bit architectures, notably i386. Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
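The alignment problem described above can be sketched in userspace C. This is a minimal illustration, assuming GCC or Clang attribute syntax; `demo_native`, `demo_compat` and `demo_from_compat` are hypothetical names, and the field list is not the real layout of struct ext4_new_group_input.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* offsetof */
#include <stdint.h>

/* Hypothetical layout as a 64-bit kernel would see it. */
struct demo_native {
	uint32_t group;
	uint64_t block_bitmap;   /* typically 8-byte aligned on 64-bit ABIs */
	uint16_t unused;
};

/*
 * The same fields laid out the way an i386 caller would pass them:
 * the 64-bit member is only 4-byte aligned, so there is no padding
 * after 'group' and the struct is smaller overall.
 */
struct demo_compat {
	uint32_t group;
	uint64_t block_bitmap;
	uint16_t unused;
} __attribute__((packed, aligned(4)));

static_assert(offsetof(struct demo_compat, block_bitmap) == 4,
	      "compat layout places the u64 directly after the u32");
static_assert(sizeof(struct demo_compat) == 16,
	      "compat layout is padded to a multiple of 4 only");

/* Field-by-field conversion, so layout differences do not matter. */
static struct demo_native demo_from_compat(const struct demo_compat *c)
{
	struct demo_native n = {
		.group = c->group,
		.block_bitmap = c->block_bitmap,
		.unused = c->unused,
	};
	return n;
}
```

Copying the compat struct byte-for-byte into the native one would misplace every field after `group` on ABIs that align 64-bit integers to 8 bytes, which is why a field-by-field conversion is used in this sketch.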
*	ext4: Conditionally define compat ioctl numbers	Ben Hutchings	2010-05-17
\| \| \| \| \| \| \| \| \|	It is unnecessary, and in general impossible, to define the compat ioctl numbers except when building the filesystem with CONFIG_COMPAT defined. Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
*	tracing: Convert more ext4 events to DEFINE_EVENT	Li Zefan	2010-05-17
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Use DECLARE_EVENT_CLASS, and save ~2.7K:
	   text    data     bss     dec     hex filename
	 274441    7200     260  281901   44d2d fs/ext4/ext4.o.orig
	 271881    7040     256  279177   44289 fs/ext4/ext4.o
	4 events are converted:
	  ext4__mb_new_pa: ext4_mb_new_inode_pa, ext4_mb_new_group_pa
	  ext4__mballoc: ext4_mballoc_discard, ext4_mballoc_free
	No change in functionality. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
*	ext4: Add new tracepoints to track mballoc's buddy bitmap loads	Theodore Ts'o	2010-05-17
\| \| \| \|	Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
*	ext4: Add a missing trace hook	Li Zefan	2010-05-17
\| \| \| \| \| \| \| \| \|	Commit f8ec9d6837241865cf99bed97bb99f4399fd5a03 added a trace event ext4_da_release_space, but didn't add some corresponding trace hook. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
*	ext4: restart ext4_ext_remove_space() after transaction restart	Dmitry Monakhov	2010-05-17
\| \| \| \| \| \| \| \| \| \| \| \|	If i_data_sem was internally dropped due to transaction restart, it is necessary to restart path look-up because extents tree was possibly modified by ext4_get_block(). https://bugzilla.kernel.org/show_bug.cgi?id=15827 Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Acked-by: Jan Kara <jack@suse.cz>
*	ext4: Clear the EXT4_EOFBLOCKS_FL flag only when warranted	Theodore Ts'o	2010-05-17
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Dmitry Monakhov discovered an edge case where it was possible for the EXT4_EOFBLOCKS_FL flag to get cleared unnecessarily. This is true; I have a test case that can be exercised via downloading and decompressing the file: wget ftp://ftp.kernel.org/pub/linux/kernel/people/tytso/ext4-testcases/eofblocks-fl-test-case.img.bz2 bunzip2 eofblocks-fl-test-case.img dd if=/dev/zero of=eofblocks-fl-test-case.img bs=1k seek=17925 bs=1k count=1 conv=notrunc However, triggering it in real life is highly unlikely since it requires an extremely fragmented sparse file with a hole in exactly the right place in the extent tree. (It actually took quite a bit of work to generate this test case.) Still, it's nice to get even extreme corner cases to be correct, so this patch makes sure that we don't clear the EXT4_EOFBLOCKS_FL incorrectly even in this corner case. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
*	ext4: Avoid crashing on NULL ptr dereference on a filesystem error	Theodore Ts'o	2010-05-16
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	If the EOFBLOCK_FL flag is set when it should not be and the inode is zero length, then eh_entries is zero, and ex is NULL, so dereferencing ex to print ex->ee_block causes a kernel OOPS in ext4_ext_map_blocks(). On top of that, the error message which is printed isn't very helpful. So we fix this by printing something more explanatory which doesn't involve trying to print ex->ee_block. Addresses-Google-Bug: #2655740 Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
*	ext4: Use bitops to read/modify i_flags in struct ext4_inode_info	Dmitry Monakhov	2010-05-16
\| \| \| \| \| \| \| \| \| \| \| \|	At several places we modify EXT4_I(inode)->i_flags without holding i_mutex (ext4_do_update_inode, ...). These modifications are racy and we can lose updates to i_flags. So convert handling of i_flags to use bitops which are atomic. https://bugzilla.kernel.org/show_bug.cgi?id=15792 Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
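The lost-update race described above comes from plain read-modify-write on a shared flags word. A minimal userspace sketch of the atomic alternative, using C11 `<stdatomic.h>` as a stand-in for the kernel's bitops; `demo_inode_info`, the `DEMO_FLAG_*` bit numbers and the helper names are hypothetical and not taken from the patch:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical bit numbers, for illustration only. */
enum { DEMO_FLAG_A = 0, DEMO_FLAG_B = 1 };

struct demo_inode_info {
	atomic_ulong i_flags;   /* flags word shared between threads */
};

/* Atomically set one bit; concurrent setters cannot lose each other's update. */
static void demo_set_flag(struct demo_inode_info *ei, int bit)
{
	assert(bit >= 0 && bit < 32);
	atomic_fetch_or(&ei->i_flags, 1UL << bit);
}

/* Atomically clear one bit. */
static void demo_clear_flag(struct demo_inode_info *ei, int bit)
{
	assert(bit >= 0 && bit < 32);
	atomic_fetch_and(&ei->i_flags, ~(1UL << bit));
}

/* Read one bit from a consistent snapshot of the flags word. */
static bool demo_test_flag(struct demo_inode_info *ei, int bit)
{
	assert(bit >= 0 && bit < 32);
	return (atomic_load(&ei->i_flags) & (1UL << bit)) != 0;
}
```

With a non-atomic `flags |= mask`, two threads can both read the old value and one write then overwrites the other; the atomic fetch-or and fetch-and operations make each bit update a single indivisible step.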
*	ext4: Convert calls of ext4_error() to EXT4_ERROR_INODE()	Theodore Ts'o	2010-05-16
\| \| \| \| \| \| \| \| \| \|	EXT4_ERROR_INODE() tends to provide better error information and in a more consistent format. Some errors were not even identifying the inode or directory which was corrupted, which made them not very useful. Addresses-Google-Bug: #2507977 Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
*	ext4: Convert callers of ext4_get_blocks() to use ext4_map_blocks()	Theodore Ts'o	2010-05-16
\| \| \| \| \| \| \| \| \| \| \| \|	This saves a huge amount of stack space by avoiding unnecessary allocation of struct buffer_head's on the stack. In addition, to make the code easier to understand, collapse and refactor ext4_get_block(), ext4_get_block_write(), and noalloc_get_block_write() into a single function. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
*	ext4: Add new abstraction ext4_map_blocks() underneath ext4_get_blocks()	Theodore Ts'o	2010-05-16
\| \| \| \| \| \| \| \| \| \| \|	Jack up ext4_get_blocks() and add a new function, ext4_map_blocks(), which uses a much smaller structure, struct ext4_map_blocks, which is 20 bytes, as opposed to a struct buffer_head, which is nearly 5 times bigger on an x86_64 machine. By switching things to use ext4_map_blocks(), we can save stack space since we can avoid allocating a struct buffer_head on the stack. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
*	ext4: Use our own write_cache_pages()	Theodore Ts'o	2010-05-16
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Make a copy of write_cache_pages() for the benefit of ext4_da_writepages(). This allows us to simplify the code some, and will allow us to further customize the code in future patches. There are some nasty hacks in write_cache_pages(), which Linus has (correctly) characterized as vile. I've just copied it into write_cache_pages_da(), without trying to clean those bits up lest I break something in the ext4's delalloc implementation, which is a bit fragile right now. This will allow Dave Chinner to clean up write_cache_pages() in mm/page-writeback.c, without worrying about breaking ext4. Eventually write_cache_pages_da() will go away when I rewrite ext4's delayed allocation and create a general ext4_writepages() which is used for all of ext4's writeback. Until then, this is the lowest risk way to clean up the core write_cache_pages() function. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: Dave Chinner <david@fromorbit.com>
*	ext4: Show journal_checksum option	Jan Kara	2010-05-16
\| \| \| \| \| \| \|	We failed to show journal_checksum option in /proc/mounts. Fix it. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
*	ext4: Fix for ext4_mb_collect_stats()	Curt Wohlgemuth	2010-05-16
\| \| \| \| \| \| \| \| \|	Fix ext4_mb_collect_stats() to use the correct test for s_bal_success; it should be testing "best-extent.fe_len >= orig-extent.fe_len", not "orig-extent.fe_len >= goal-extent.fe_len". Signed-off-by: Curt Wohlgemuth <curtw@google.org> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
*	ext4: check for a good block group before loading buddy pages	Curt Wohlgemuth	2010-05-16
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This adds a new field in ext4_group_info to cache the largest available block range in a block group, and avoids loading the buddy pages until after we've done a sanity check on the block group. With large allocation requests (e.g., fallocate(), 8MiB) and relatively full partitions, it's easy to have no block groups with a block extent large enough to satisfy the input request length. This currently causes the loop during cr == 0 in ext4_mb_regular_allocator() to load the buddy bitmap pages for EVERY block group. That can be a lot of pages. The patch below allows us to call ext4_mb_good_group() BEFORE we load the buddy pages (although we have to check again after we lock the block group). Addresses-Google-Bug: #2578108 Addresses-Google-Bug: #2704453 Signed-off-by: Curt Wohlgemuth <curtw@google.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
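The idea of rejecting a block group from cached information before paying for the buddy-page load can be sketched as follows. This is a simplified userspace model, not the mballoc code; `demo_group`, `demo_load_buddy` and `demo_find_group` are hypothetical names, and the `loads` counter exists only to make the saved work visible.

```c
#include <assert.h>
#include <stddef.h>

struct demo_group {
	unsigned largest_free;  /* cached size of the largest free range, in blocks */
	int loads;              /* how many times the expensive load was performed */
};

/* Stand-in for the expensive step (loading buddy bitmap pages). */
static void demo_load_buddy(struct demo_group *g)
{
	g->loads++;
}

/* Return the index of the first group that can satisfy 'want' blocks, or -1. */
static int demo_find_group(struct demo_group *groups, size_t n, unsigned want)
{
	assert(groups != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		/* Cheap check on the cached value: skip without loading anything. */
		if (groups[i].largest_free < want)
			continue;
		demo_load_buddy(&groups[i]);
		/* Re-check after the load, since the cached value may be stale. */
		if (groups[i].largest_free >= want)
			return (int)i;
	}
	return -1;
}
```

For a request larger than any free range, the loop in this sketch touches only the cached field of each group and performs no loads at all.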
*	ext4: Prevent creation of files larger than RLIMIT_FSIZE using fallocate	Nikanth Karthikesan	2010-05-16
\| \| \| \| \| \| \| \| \|	Currently using posix_fallocate one can bypass an RLIMIT_FSIZE limit and create a file larger than the limit. Add a check for that. Signed-off-by: Nikanth Karthikesan <knikanth@suse.de> Signed-off-by: Amit Arora <aarora@in.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
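The kind of check this adds can be sketched as a pure function. This is an illustration under assumptions: `demo_check_fsize` is a hypothetical helper, the limit is passed in as a parameter (in userspace it could be read with getrlimit(RLIMIT_FSIZE)), and returning -EFBIG mirrors the usual "file too large" error rather than quoting the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/*
 * Return 0 if preallocating 'len' bytes at 'offset' keeps the file
 * within 'limit' bytes, or -EFBIG if it would exceed the limit.
 */
static int demo_check_fsize(uint64_t offset, uint64_t len, uint64_t limit)
{
	/* Guard against offset + len wrapping around. */
	if (len > UINT64_MAX - offset)
		return -EFBIG;
	assert(offset + len >= offset);  /* cannot wrap past this point */
	if (offset + len > limit)
		return -EFBIG;
	return 0;
}
```

The overflow guard matters because a wrapped sum would look small and pass the limit comparison.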
*	ext4: Remove extraneous newlines in ext4_msg() calls	Curt Wohlgemuth	2010-05-16
\| \| \| \| \| \| \|	Addresses-Google-Bug: #2562325 Signed-off-by: Curt Wohlgemuth <curtw@google.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
*	ext4: Print mount options when mounting and add a remount message	Curt Wohlgemuth	2010-05-16
\| \| \| \| \| \| \| \| \|	This adds a "re-mounted" message to ext4_remount(), and both it and the mount message in ext4_fill_super() now have the original mount options data string. Signed-off-by: Curt Wohlgemuth <curtw@google.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
*	ext4: don't use quota reservation for speculative metadata	Eric Sandeen	2010-05-16
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Because we can badly over-reserve metadata when we calculate worst-case, it complicates things for quota, since we must reserve and then claim later, retry on EDQUOT, etc. Quota is also a generally smaller pool than fs free blocks, so this over-reservation hurts more, and more often. I'm of the opinion that it's not the worst thing to allow metadata to push a user slightly over quota. This simplifies the code and avoids the false quota rejections that result from worst-case speculation. This patch stops the speculative quota-charging for worst-case metadata requirements, and just charges quota when the blocks are allocated at writeout. It also is able to remove the try-again loop on EDQUOT. This patch has been tested indirectly by running the xfstests suite with a hack to mount & enable quota prior to the test. I also did a more specific test of fragmenting freespace and then doing a large delalloc write under quota; quota stopped me at the right amount of file IO, and then the writeout generated enough metadata (due to the fragmentation) that it put me slightly over quota, as expected. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
*	quota: add the option to not fail with EDQUOT in block	Eric Sandeen	2010-05-16
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	To simplify metadata tracking for delalloc writes, ext4 will simply claim metadata blocks at allocation time, without first speculatively reserving the worst case and then freeing what was not used. To do this, we need a mechanism to track allocations in the quota subsystem, but potentially allow that allocation to actually go over quota. This patch adds a DQUOT_SPACE_NOFAIL flag and function variants for this purpose. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
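A "nofail" allocation flag of the kind described above can be modelled in a few lines. This is a simplified sketch, not the quota subsystem's code: `demo_dquot`, `demo_alloc_space` and the `DEMO_SPACE_*` values are hypothetical, chosen only to echo the DQUOT_SPACE_NOFAIL flag named in the message.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag values for illustration. */
#define DEMO_SPACE_WARN    0x1
#define DEMO_SPACE_RESERVE 0x2
#define DEMO_SPACE_NOFAIL  0x4

struct demo_dquot {
	uint64_t used;   /* blocks currently charged */
	uint64_t limit;  /* hard limit in blocks */
};

/*
 * Charge 'n' blocks to the quota. Without DEMO_SPACE_NOFAIL an
 * over-limit request is rejected with -EDQUOT and nothing is charged;
 * with it, the usage is still tracked but allowed to exceed the limit.
 */
static int demo_alloc_space(struct demo_dquot *dq, uint64_t n, int flags)
{
	assert(dq != NULL);
	if (dq->used + n > dq->limit && !(flags & DEMO_SPACE_NOFAIL))
		return -EDQUOT;
	dq->used += n;
	return 0;
}
```

The point of the flag in this model is that the allocation is always accounted for, so usage stays accurate even when the caller cannot back out.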
*	quota: use flags interface for dquot alloc/free space	Eric Sandeen	2010-05-16
\| \| \| \| \| \| \| \| \| \| \|	Switch __dquot_alloc_space and __dquot_free_space to take flags to indicate whether to warn and/or to reserve (or free reserve). This is slightly more readable at the callpoints, and makes it cleaner to add a "nofail" option in the next patch. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
*	ext4: init statistics after journal recovery	Dmitry Monakhov	2010-05-16
\| \| \| \| \| \| \| \| \| \| \| \| \|	Currently block/inode/dir counters are initialized before the journal is recovered. In fact, after journal recovery this info will probably change, and freeblocks is critical for correct delalloc mode accounting. https://bugzilla.kernel.org/show_bug.cgi?id=15768 Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Acked-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
*	ext4: clean up inode bitmaps manipulation in ext4_free_inode	Dmitry Monakhov	2010-05-16
\| \| \| \| \| \| \| \| \| \| \| \| \|	- Reorganize locking scheme to batch two atomic operations into one. This also allows us to state that a healthy group must obey the following rule: ext4_free_inodes_count(sb, gdp) == ext4_count_free(inode_bitmap, NUM); - Fix possible undefined pointer dereference. - Even if group descriptor stats aren't accessible we have to update inode bitmaps. - Move non-group members update out of group_lock. Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
*	ext4: Do not zero out uninitialized extents beyond i_size	Dmitry Monakhov	2010-05-16
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	The extents code will sometimes zero out blocks and mark them as initialized instead of splitting an extent into several smaller ones. This optimization however, causes problems if the extent is beyond i_size because fsck will complain if there are uninitialized blocks after i_size as this can not be distinguished from an inode that has an incorrect i_size field. https://bugzilla.kernel.org/show_bug.cgi?id=15742 Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
*	jbd2: Improve scalability by not taking j_state_lock in jbd2_journal_stop()	Theodore Ts'o	2010-05-16
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	One of the most contended locks in the jbd2 layer is j_state_lock when running dbench. This is especially true if using the real-time kernel with its "sleeping spinlocks" patch that replaces spinlocks with priority inheriting mutexes --- but it also shows up on large SMP benchmarks. Thanks to John Stultz for pointing this out. Reviewed by Mingming Cao and Jan Kara. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
*	ext4: don't scan/accumulate more pages than mballoc will allocate	Eric Sandeen	2010-05-16
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	There was a bug reported on RHEL5 that a 10G dd on a 12G box had a very, very slow sync after that. At issue was the loop in write_cache_pages scanning all the way to the end of the 10G file, even though the subsequent call to mpage_da_submit_io would only actually write a smallish amt; then we went back to the write_cache_pages loop ... wasting tons of time in calling __mpage_da_writepage for thousands of pages we would just revisit (many times) later. Upstream it's not such a big issue for sys_sync because we get to the loop with a much smaller nr_to_write, which limits the loop. However, talking with Aneesh he realized that fsync upstream still gets here with a very large nr_to_write and we face the same problem. This patch makes mpage_add_bh_to_extent stop the loop after we've accumulated 2048 pages, by setting mpd->io_done = 1; which ultimately causes the write_cache_pages loop to break. Repeating the test with a dirty_ratio of 80 (to leave something for fsync to do), I don't see huge IO performance gains, but the reduction in cpu usage is striking: 80% usage with stock, and 2% with the below patch. Instrumenting the loop in write_cache_pages clearly shows that we are wasting time here. Eventually we need to change mpage_da_map_pages() to also submit its I/O to the block layer, subsuming mpage_da_submit_io(), and then change it to call ext4_get_blocks() multiple times. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
*	ext4: stop issuing discards if not supported by device	Eric Sandeen	2010-05-16
\| \| \| \| \| \| \| \| \| \| \|	Turn off issuance of discard requests if the device does not support it - similar to the action we take for barriers. This will save a little computation time if a non-discardable device is mounted with -o discard, and also makes it obvious that it's not doing what was asked at mount time ... Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
*	ext4: don't return to userspace after freezing the fs with a mutex held	Eric Sandeen	2010-05-16
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	ext4_freeze() used jbd2_journal_lock_updates() which takes the j_barrier mutex, and then returns to userspace. The kernel does not like this:
	================================================
	[ BUG: lock held when returning to user space! ]
	------------------------------------------------
	lvcreate/1075 is leaving the kernel with locks still held!
	1 lock held by lvcreate/1075:
	 #0: (&journal->j_barrier){+.+...}, at: [<ffffffff811c6214>] jbd2_journal_lock_updates+0xe1/0xf0
	Use vfs_check_frozen() added to ext4_journal_start_sb() and ext4_force_commit() instead. Addresses-Red-Hat-Bugzilla: #568503 Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
*	ext4: symlink must be handled via filesystem specific operation	Dmitry Monakhov	2010-05-16
\| \| \| \| \| \| \| \|	The generic setattr implementation is no longer responsible for quota transfer, so symlinks must be handled via ext4_setattr. Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
*	ext4: check s_log_groups_per_flex in online resize code	Eric Sandeen	2010-05-16
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	If groups_per_flex < 2, sbi->s_flex_groups[] doesn't get filled out, and every other access to this first tests s_log_groups_per_flex; same thing needs to happen in resize or we'll wander off into a null pointer when doing an online resize of the file system. Thanks to Christoph Biedl, who came up with the trivial testcase: # truncate --size 128M fsfile # mkfs.ext3 -F fsfile # tune2fs -O extents,uninit_bg,dir_index,flex_bg,huge_file,dir_nlink,extra_isize fsfile # e2fsck -yDf -C0 fsfile # truncate --size 132M fsfile # losetup /dev/loop0 fsfile # mount /dev/loop0 mnt # resize2fs -p /dev/loop0 https://bugzilla.kernel.org/show_bug.cgi?id=13549 Reported-by: Alessandro Polverini <alex@nibbles.it> Test-case-by: Christoph Biedl <bugzilla.kernel.bpeb@manchmal.in-ulm.de> Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
*	ext4: fix quota accounting in case of fallocate	Dmitry Monakhov	2010-05-16
\| \| \| \| \| \| \|	allocated_meta_data is already included in 'used' variable. Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
*	ext4: allow defrag (EXT4_IOC_MOVE_EXT) in 32bit compat mode	Christian Borntraeger	2010-05-15
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	I have an x86_64 kernel with i386 userspace. e4defrag fails on the EXT4_IOC_MOVE_EXT ioctl because it is not wired up for the compat case. It seems that struct move_extent is compat safe, since only types with fixed widths are used: { __u32 reserved; /* should be zero */ __u32 donor_fd; /* donor file descriptor */ __u64 orig_start; /* logical start offset in block for orig */ __u64 donor_start; /* logical start offset in block for donor */ __u64 len; /* block length to be moved */ __u64 moved_len; /* moved block length */ }; Let's just wire up EXT4_IOC_MOVE_EXT for the compat case. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Reviewed-by: Eric Sandeen <sandeen@redhat.com> CC: Akira Fujita <a-fujita@rs.jp.nec.com>
*	ext4: rename ext4_mb_release_desc() to ext4_mb_unload_buddy()	Jing Zhang	2010-05-14
\| \| \| \| \| \| \| \| \| \|	This function cleans up after ext4_mb_load_buddy(), so the renaming makes the code clearer. Signed-off-by: Jing Zhang <zj.barak@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
*	ext4: Remove unnecessary call to ext4_get_group_desc() in mballoc	Jing Zhang	2010-05-13
\| \| \| \| \| \|	Signed-off-by: Jing Zhang <zj.barak@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
*	ext4: fix memory leaks in error path handling of ext4_ext_zeroout()	Jing Zhang	2010-05-12
\| \| \| \| \| \| \| \| \| \|	When EIO occurs after bio is submitted, there is no memory free operation for bio, which results in memory leakage. And there is also no check against bio_alloc() for bio. Acked-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com> Signed-off-by: Jing Zhang <zj.barak@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
*	ext4: Fix coding style in fs/ext4/move_extent.c	Steven Liu	2010-05-11
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Making sure ee_block is initialized to zero to prevent gcc from kvetching. It's harmless (although it's not obvious that it's harmless) from code inspection: fs/ext4/move_extent.c:478: warning: 'start_ext.ee_block' may be used uninitialized in this function Thanks to Stefan Richter for first bringing this to the attention of linux-ext4@vger.kernel.org. Signed-off-by: LiuQi <lingjiujianke@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: Stefan Richter <stefanr@s5r6.in-berlin.de>
*	ext4: check missed return value in ext4_sync_file()	Dmitry Monakhov	2010-05-10
\| \| \| \| \|	Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
*	Linux 2.6.34-rc7 [v2.6.34-rc7]	Linus Torvalds	2010-05-09
\|
*	Merge git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-rc-fixes-2.6	Linus Torvalds	2010-05-09
\|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-rc-fixes-2.6: [SCSI] Retry commands with UNIT_ATTENTION sense codes to fix ext3/ext4 I/O error [SCSI] Enable retries for SYNCRONIZE_CACHE commands to fix I/O error [SCSI] scsi_debug: virtual_gb ignores sector_size [SCSI] libiscsi: regression: fix header digest errors [SCSI] fix locking around blk_abort_request() [SCSI] advansys: fix narrow board error path
\| *	[SCSI] Retry commands with UNIT_ATTENTION sense codes to fix ext3/ext4 I/O error	James Bottomley	2010-05-05
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	There's nastiness in the way we currently handle barriers (and discards): They're effectively filesystem commands, but they get processed as BLOCK_PC commands. Unfortunately BLOCK_PC commands are taken by SCSI to be SG_IO commands and the issuer expects to see and handle any returned errors, however trivial. This leads to a huge problem, because the block layer doesn't expect this to happen and any trivially retryable error on a barrier causes an immediate I/O error to the filesystem. The only real way to hack around this is to take the usual class of offending errors (unit attentions) and make them all retryable in the case of a REQ_HARDBARRIER. A correct fix would involve a rework of the entire block and SCSI submit system, and so is out of scope for a quick fix. Cc: Hannes Reinecke <hare@suse.de> Cc: Stable Tree <stable@kernel.org> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
\| *	[SCSI] Enable retries for SYNCRONIZE_CACHE commands to fix I/O error	Hannes Reinecke	2010-05-05
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Some arrays are giving I/O errors with ext3 filesystems when SYNCHRONIZE_CACHE gets a UNIT_ATTENTION. What is happening is that these commands have no retries, so the UNIT_ATTENTION causes the barrier to fail. We should enable retries here to clear any transient error and allow the barrier to succeed. Signed-off-by: Hannes Reinecke <hare@suse.de> Cc: Stable Tree <stable@kernel.org> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
\| *	[SCSI] scsi_debug: virtual_gb ignores sector_size	Douglas Gilbert	2010-05-02
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In the scsi_debug driver, the virtual_gb option ignores the sector_size, implicitly assuming that it is 512 bytes. So with 'virtual_gb=1 sector_size=4096' the result is an 8 GB (virtual) disk. Signed-off-by: Douglas Gilbert <dgilbert@interlog.com> Cc: Stable Tree <stable@kernel.org> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
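The arithmetic behind the 8 GB figure can be worked through in a short sketch. The helper names are hypothetical and the code is not taken from scsi_debug; it assumes a "GB" of 2^30 bytes and models the bug as a sector count computed with a hard-coded 512-byte sector.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stdint.h>

#define DEMO_GIB (1024ULL * 1024 * 1024)

static_assert(DEMO_GIB == (1ULL << 30), "one GiB is 2^30 bytes");

/* Sector count that silently assumes 512-byte sectors. */
static uint64_t demo_sectors_buggy(unsigned virtual_gb)
{
	return virtual_gb * DEMO_GIB / 512;
}

/* Sector count that honours the configured sector size. */
static uint64_t demo_sectors_fixed(unsigned virtual_gb, unsigned sector_size)
{
	return virtual_gb * DEMO_GIB / sector_size;
}

/* Capacity the device actually reports: sectors times real sector size. */
static uint64_t demo_capacity_bytes(uint64_t sectors, unsigned sector_size)
{
	return sectors * sector_size;
}
```

With virtual_gb=1 the assumed-512 computation yields 2097152 sectors; at 4096 bytes each that is 8 GiB, whereas dividing by the real sector size yields 262144 sectors and the intended 1 GiB.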
\| *	[SCSI] libiscsi: regression: fix header digest errors	Mike Christie	2010-05-02
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This fixes a regression introduced in 2.6.32 by commit d3305f3407fa3e9452079ec6cc8379067456e4aa ([SCSI] libiscsi: don't increment cmdsn if cmd is not sent; Author: Mike Christie <michaelc@cs.wisc.edu>; Date: Thu Aug 20 15:10:58 2009 -0500). When I moved the hdr->cmdsn after init_task, I added a bug when header digests are used. The problem is that the LLD may calculate the header digest in init_task, so if we then set the cmdsn after the init_task call we change the digest that the target will calculate. Signed-off-by: Mike Christie <michaelc@cs.wisc.edu> Cc: Stable Tree <stable@kernel.org> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
\| *	[SCSI] fix locking around blk_abort_request()	Tejun Heo	2010-05-01
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	blk_abort_request() expects queue lock to be held by the caller. Grab it before calling the function. Lack of this synchronization led to infinite loop on corrupt q->timeout_list. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: stable@kernel.org Signed-off-by: James Bottomley <James.Bottomley@suse.de>
\| *	[SCSI] advansys: fix narrow board error path	Herton Ronaldo Krzesinski	2010-05-01
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Fix error handling in advansys_board_found, which is buggy when ASC_NARROW_BOARD is set and a failure happens in the AscInitAsc1000Driver step: it was freeing items of the wrong struct in the dvc_var union of struct asc_board, which could lead to an oops if some fields of the narrow board struct were set, because the code always freed the wide board fields, and not everything was being freed/released properly. Signed-off-by: Herton Ronaldo Krzesinski <herton@mandriva.com.br> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
* \|	cpuidle: Fix incorrect optimization	Arjan van de Ven	2010-05-09
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	commit 672917dcc78 ("cpuidle: menu governor: reduce latency on exit") added an optimization, where the analysis on the past idle period moved from the end of idle, to the beginning of the new idle. Unfortunately, this optimization had a bug where it zeroed one key variable for new use, that is needed for the analysis. The fix is simple, zero the variable after doing the work from the previous idle. During the audit of the code that found this issue, another issue was also found; the ->measured_us data structure member is never set, a local variable is always used instead. Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Cc: Corrado Zoccolo <czoccolo@gmail.com> Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* \|	Merge branch 'for-linus' of git://neil.brown.name/md	Linus Torvalds	2010-05-07
\|\ \ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* 'for-linus' of git://neil.brown.name/md: md: restore ability of spare drives to spin down. md/raid6: Fix raid-6 read-error correction in degraded state
\| * \|	md: restore ability of spare drives to spin down.	NeilBrown	2010-05-07
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Some time ago we stopped the clean/active metadata updates from being written to a 'spare' device in most cases so that it could spin down and stay spun down. Device failure/removal etc are still recorded on spares. However commit 51d5668cb2e3fd1827a55 broke this 50% of the time, depending on whether the event count is even or odd. The change log entry said: "This means that the alignment between 'odd/even' and 'clean/dirty' might take a little longer to attain", however the code makes no attempt to create that alignment, so it could take arbitrarily long. So when we find that clean/dirty is not aligned with odd/even, force a second metadata-update immediately. There are already cases where a second metadata-update is needed immediately (e.g. when a device fails during the metadata update). We just piggy-back on that. Reported-by: Joe Bryant <tenminjoe@yahoo.com> Signed-off-by: NeilBrown <neilb@suse.de> Cc: stable@kernel.org
\| * \|	md/raid6: Fix raid-6 read-error correction in degraded state	Gabriele A. Trombetti	2010-05-07
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Fix: Raid-6 was not trying to correct a read-error when in singly-degraded state and was instead dropping one more device, going to doubly-degraded state. This patch fixes this behaviour. Tested-by: Janos Haar <janos.haar@netcenter.hu> Signed-off-by: Gabriele A. Trombetti <g.trombetti.lkrnl1213@logicschema.com> Reported-by: Janos Haar <janos.haar@netcenter.hu> Signed-off-by: NeilBrown <neilb@suse.de> Cc: stable@kernel.org