aboutsummaryrefslogtreecommitdiffstats
path: root/fs
Commit message (Collapse)AuthorAge
* UDF: Close small mem leak in udf_find_entry()Jesper Juhl2011-01-06
| | | | | | | | | | | | | | | Hi, There's a small memory leak in fs/udf/namei.c::udf_find_entry(). We dynamically allocate memory for 'fname' with kmalloc() and in most situations we free it before we leave the function, but there is one situation where we do not (but should). This patch closes the leak by jumping to the 'out_ok' label which does the correct cleanup rather than doing half the cleanup and returning directly. Signed-off-by: Jesper Juhl <jj@chaosbits.net> Signed-off-by: Jan Kara <jack@suse.cz>
* udf: Fix directory corruption after extent mergingJan Kara2011-01-06
| | | | | | | | | | | If udf_bread() called from udf_add_entry() managed to merge created extent to an already existing one (or if previous extents could be merged), the code truncating the last extent to proper size would just overwrite the freshly allocated extent with an extent that used to be in that place. This obviously results in a directory corruption. Fix the problem by properly reloading the last extent. Signed-off-by: Jan Kara <jack@suse.cz>
* udf: Protect udf_file_aio_write from possible racesJan Kara2011-01-06
| | | | | | | | | Code doing conversion from INICB file to a normal file in udf_file_aio_write() is not protected by any lock from other code modifying the inode. Use i_alloc_sem for that. Reported-by: Alessio Igor Bogani <abogani@texware.it> Signed-off-by: Jan Kara <jack@suse.cz>
* udf: Remove unnecessary bkl usagesAlessio Igor Bogani2011-01-06
| | | | | | | | | | | | | | The udf_readdir(), udf_lookup(), udf_create(), udf_mknod(), udf_mkdir(), udf_rmdir(), udf_link(), udf_get_parent() and udf_unlink() seems already adequately protected by i_mutex held by VFS invoking calls. The udf_rename() instead should be already protected by lock_rename again by VFS. The udf_ioctl(), udf_fill_super() and udf_evict_inode() don't requires any further protection. This work was supported by a hardware donation from the CE Linux Forum. Signed-off-by: Alessio Igor Bogani <abogani@texware.it> Signed-off-by: Jan Kara <jack@suse.cz>
* udf: Use of s_alloc_mutex to serialize udf_relocate_blocks() executionAlessio Igor Bogani2011-01-06
| | | | | | | This work was supported by a hardware donation from the CE Linux Forum. Signed-off-by: Alessio Igor Bogani <abogani@texware.it> Signed-off-by: Jan Kara <jack@suse.cz>
* udf: Replace bkl with the UDF_I(inode)->i_data_sem for protect ↵Alessio Igor Bogani2011-01-06
| | | | | | | | | | | | | | | udf_inode_info struct Replace bkl with the UDF_I(inode)->i_data_sem rw semaphore in udf_release_file(), udf_symlink(), udf_symlink_filler(), udf_get_block(), udf_block_map(), and udf_setattr(). The rule now is that any operation on regular file's or symlink's extents (or generally allocation information including goal block) needs to hold i_data_sem. This work was supported by a hardware donation from the CE Linux Forum. Signed-off-by: Alessio Igor Bogani <abogani@texware.it> Signed-off-by: Jan Kara <jack@suse.cz>
* udf: Remove BKL from free space counting functionsJan Kara2011-01-06
| | | | | | | | | udf_count_free_bitmap() does not need BKL because bitmaps are in a fixed place on disk and so we can count set bits without serialization. udf_count_free_table() is now protected by s_alloc_mutex instead of BKL to get a consistent view of free space extents. Signed-off-by: Jan Kara <jack@suse.cz>
* udf: Call udf_add_free_space() for more blocks at once in udf_free_blocks()Jan Kara2011-01-06
| | | | | | | | | | There's no need to call udf_add_free_space() for one block at a time. It saves us noticeable amount of work and yields different result from the original code only if the filesystem is corrupted and bitmap bit is already cleared. In such case counter of free blocks is probably wrong anyways so the change does not matter. Signed-off-by: Jan Kara <jack@suse.cz>
* udf: Remove BKL from udf_put_super() and udf_remount_fs()Jan Kara2011-01-06
| | | | | | | | | udf_put_super() does not need BKL because the filesystem is shut down so there's nothing to race with. The credential changes in udf_remount_fs() and LVID changes are now protected by dedicated locks so we can remove BKL from this function as well. Signed-off-by: Jan Kara <jack@suse.cz>
* udf: Protect default inode credentials by rwlockJan Kara2011-01-06
| | | | | | | | | Superblock carries credentials (uid, gid, etc.) which are used as default values in __udf_read_inode() when media does not provide these. These credentials can change during remount so we protect them by a rwlock so that each inode gets a consistent set of credentials. Signed-off-by: Jan Kara <jack@suse.cz>
* udf: Protect all modifications of LVID with s_alloc_mutexJan Kara2011-01-06
| | | | | | | | | udf_open_lvid() and udf_close_lvid() were modifying LVID without s_alloc_mutex. Since they can be called from remount, the modification could race with other filesystem modifications of LVID so protect them by s_alloc_mutex just to be sure. Signed-off-by: Jan Kara <jack@suse.cz>
* udf: Move handling of uniqueID into a helper function and protect it by a ↵Jan Kara2011-01-06
| | | | | | | | | | s_alloc_mutex uniqueID handling has been duplicated in three places. Move it into a common helper. Since we modify an LVID buffer with uniqueID update, we take sbi->s_alloc_mutex to protect agaist other modifications of the structure. Signed-off-by: Jan Kara <jack@suse.cz>
* udf: Remove BKL from udf_update_inodeJan Kara2011-01-06
| | | | | | | | | | | udf_update_inode() does not need BKL since on-disk inode modifications are protected by the buffer lock and reading of values of in-memory inode is safe without any lock. In some cases we can write inconsistent inode state to disk but in that case inode will be marked dirty and overwritten later. Also make unnecessarily global udf_sync_inode() static. Signed-off-by: Jan Kara <jack@suse.cz>
* udf: Convert UDF_SB(sb)->s_flags to use bitopsJan Kara2011-01-06
| | | | | | | Use atomic bitops to manipulate with sb flags to make manipulation safe without any locking. Signed-off-by: Jan Kara <jack@suse.cz>
* fs/udf: Add printf format/argument verificationJoe Perches2011-01-06
| | | | | | | | | Add __attribute__((format... to udf_warning. All arguments matched formats, no other changes necessary. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Jan Kara <jack@suse.cz>
* fs/udf: Use vzallocJoe Perches2011-01-06
| | | | | Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Jan Kara <jack@suse.cz>
* Merge branch 'upstream-linus' of ↵Linus Torvalds2010-12-23
|\ | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2 * 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2: ocfs2: Fix system inodes cache overflow. ocfs2: Hold ip_lock when set/clear flags for indexed dir. ocfs2: Adjust masklog flag values Ocfs2: Teach 'coherency=full' O_DIRECT writes to correctly up_read i_alloc_sem. ocfs2/dlm: Migrate lockres with no locks if it has a reference
| * ocfs2: Fix system inodes cache overflow.Tao Ma2010-12-22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When we store system inodes cache in ocfs2_super, we use a array for global system inodes. But unfortunately, the range is calculated wrongly which makes it overflow and pollute ocfs2_super->local_system_inodes. This patch fix it by setting the range properly. The corresponding bug is ossbug1303. http://oss.oracle.com/bugzilla/show_bug.cgi?id=1303 Cc: stable@kernel.org Signed-off-by: Tao Ma <boyu.mt@taobao.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
| * ocfs2: Hold ip_lock when set/clear flags for indexed dir.Tao Ma2010-12-16
| | | | | | | | | | | | | | | | When we set/clear the dyn_features for an inode we hold the ip_lock. So do it when we set/clear OCFS2_INDEXED_DIR_FL also. Signed-off-by: Tao Ma <boyu.mt@taobao.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
| * ocfs2: Adjust masklog flag valuesSunil Mushran2010-12-16
| | | | | | | | | | | | | | Two masklogs had the same flag value. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
| * Ocfs2: Teach 'coherency=full' O_DIRECT writes to correctly up_read i_alloc_sem.Tristan Ye2010-12-09
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Due to newly-introduced 'coherency=full' O_DIRECT writes also takes the EX rw_lock like buffered writes did(rw_level == 1), it turns out messing the usage of 'level' in ocfs2_dio_end_io() up, which caused i_alloc_sem being failed to get up_read'd correctly. This patch tries to teach ocfs2_dio_end_io to understand well on all locking stuffs by explicitly introducing a new bit for i_alloc_sem in iocb's private data, just like what we did for rw_lock. Signed-off-by: Tristan Ye <tristan.ye@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
| * ocfs2/dlm: Migrate lockres with no locks if it has a referenceSunil Mushran2010-12-09
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | o2dlm was not migrating resources with zero locks because it assumed that that resource would get purged by dlm_thread. However, some usage patterns involve creating and dropping locks at a high rate leading to the migrate thread seeing zero locks but the purge thread seeing an active reference. When this happens, the dlm_thread cannot purge the resource and the migrate thread sees no reason to migrate that resource. The spell is broken when the migrate thread catches the resource with a lock. The fix is to make the migrate thread also consider the reference map. This usage pattern can be triggered by userspace on userdlm locks and flocks. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
* | Merge branch 'linus-hot-fix' of ↵Linus Torvalds2010-12-23
|\ \ | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 * 'linus-hot-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: ext4: fix on-line resizing regression
| * | ext4: fix on-line resizing regressionTheodore Ts'o2010-12-23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | https://bugzilla.kernel.org/show_bug.cgi?id=25352 This regression was caused by commit a31437b85: "ext4: use sb_issue_zeroout in setup_new_group_blocks", by accidentally dropping the code which reserved the block group descriptor and inode table blocks. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
* | | logfs: fix "Kernel BUG at readwrite.c:1193"Prasad Joshi2010-12-22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This happens when __logfs_create() tries to write a new inode to the disk which is full. __logfs_create() associates the transaction pointer with inode. During the logfs_write_inode() function call chain this transaction pointer is moved from inode to page->private using function move_inode_to_page (do_write_inode() -> inode_to_page() -> move_inode_to_page) When the write inode fails, the transaction is aborted and iput is called on the failed inode. During delete_inode the same transaction pointer associated with the page is getting used. Thus causing kernel BUG. The patch checks for error in write_inode() and restores the page->private to NULL. Addresses https://bugzilla.kernel.org/show_bug.cgi?id=20162 Signed-off-by: Prasad Joshi <prasadjoshi124@gmail.com> Cc: Joern Engel <joern@logfs.org> Cc: Florian Mickler <florian@mickler.org> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Cc: Maciej Rutecki <maciej.rutecki@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | logfs: fix deadlock in logfs_get_wblocks, hold and wait on super->s_write_mutexPrasad Joshi2010-12-22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | do_logfs_journal_wl_pass() should use GFP_NOFS for memory allocation GC code calls btree_insert32 with GFP_KERNEL while holding a mutex super->s_write_mutex. The same mutex is used in address_space_operations->writepage(), and a call to writepage() could be triggered as a result of memory allocation in btree_insert32, causing a deadlock. Addresses https://bugzilla.kernel.org/show_bug.cgi?id=20342 Signed-off-by: Prasad Joshi <prasadjoshi124@gmail.com> Cc: Joern Engel <joern@logfs.org> Cc: Florian Mickler <florian@mickler.org> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Cc: Maciej Rutecki <maciej.rutecki@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | Merge branch 'for-linus' of ↵Linus Torvalds2010-12-21
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: ceph: handle partial result from get_user_pages ceph: mark user pages dirty on direct-io reads ceph: fix null pointer dereference in ceph_init_dentry for nfs reexport ceph: fix direct-io on non-page-aligned buffers ceph: fix msgr_init error path
| * | | ceph: mark user pages dirty on direct-io readsHenry C Chang2010-12-17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | For read operation, we have to set the argument _write_ of get_user_pages to 1 since we will write data to pages. Also, we need to SetPageDirty before releasing these pages. Signed-off-by: Henry C Chang <henry_c_chang@tcloudcomputing.com> Signed-off-by: Sage Weil <sage@newdream.net>
| * | | ceph: fix null pointer dereference in ceph_init_dentry for nfs reexportSage Weil2010-12-17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | The fh_to_dentry etc. methods use ceph_init_dentry(), which assumes that d_parent is defined. It isn't for those callers, so check! Signed-off-by: Sage Weil <sage@newdream.net>
| * | | ceph: fix direct-io on non-page-aligned buffersHenry C Chang2010-12-15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The user buffer may be 512-byte aligned, not page-aligned. We were assuming the buffer was page-aligned and only accounting for non-page-aligned io offsets. Signed-off-by: Henry C Chang <henry_c_chang@tcloudcomputing.com> Signed-off-by: Sage Weil <sage@newdream.net>
* | | | Fix btrfs b0rkageAl Viro2010-12-20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Buggered-in: 76dda93c6ae2 ("Btrfs: add snapshot/subvolume destroy ioctl") Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Acked-by: Chris Mason <chris.mason@oracle.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | Merge branch 'for-linus' of git://git.infradead.org/users/eparis/notifyLinus Torvalds2010-12-16
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * 'for-linus' of git://git.infradead.org/users/eparis/notify: fanotify: fill in the metadata_len field on struct fanotify_event_metadata fanotify: split version into version and metadata_len fanotify: Dont try to open a file descriptor for the overflow event fanotify: Introduce FAN_NOFD fanotify: do not leak user reference on allocation failure inotify: stop kernel memory leak on file creation failure fanotify: on group destroy allow all waiters to bypass permission check fanotify: Dont allow a mask of 0 if setting or removing a mark fanotify: correct broken ref counting in case adding a mark failed fanotify: if set by user unset FMODE_NONOTIFY before fsnotify_perm() is called fanotify: remove packed from access response message fanotify: deny permissions when no event was sent
| * | | | fanotify: fill in the metadata_len field on struct fanotify_event_metadataEric Paris2010-12-15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The fanotify_event_metadata now has a field which is supposed to indicate the length of the metadata portion of the event. Fill in that field as well. Based-in-part-on-patch-by: Alexey Zaytsev <alexey.zaytsev@gmail.com> Signed-off-by: Eric Paris <eparis@redhat.com>
| * | | | fanotify: Dont try to open a file descriptor for the overflow eventLino Sanfilippo2010-12-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We should not try to open a file descriptor for the overflow event since this will always fail. Signed-off-by: Lino Sanfilippo <LinoSanfilippo@gmx.de> Signed-off-by: Eric Paris <eparis@redhat.com>
| * | | | fanotify: do not leak user reference on allocation failureEric Paris2010-12-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If fanotify_init is unable to allocate a new fsnotify group it will return but will not drop its reference on the associated user struct. Drop that reference on error. Reported-by: Vegard Nossum <vegard.nossum@gmail.com> Signed-off-by: Eric Paris <eparis@redhat.com>
| * | | | inotify: stop kernel memory leak on file creation failureEric Paris2010-12-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If inotify_init is unable to allocate a new file for the new inotify group we leak the new group. This patch drops the reference on the group on file allocation failure. Reported-by: Vegard Nossum <vegard.nossum@gmail.com> cc: stable@kernel.org Signed-off-by: Eric Paris <eparis@redhat.com>
| * | | | fanotify: on group destroy allow all waiters to bypass permission checkLino Sanfilippo2010-12-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When fanotify_release() is called, there may still be processes waiting for access permission. Currently only processes for which an event has already been queued into the groups access list will be woken up. Processes for which no event has been queued will continue to sleep and thus cause a deadlock when fsnotify_put_group() is called. Furthermore there is a race allowing further processes to be waiting on the access wait queue after wake_up (if they arrive before clear_marks_by_group() is called). This patch corrects this by setting a flag to inform processes that the group is about to be destroyed and thus not to wait for access permission. [additional changelog from eparis] Lets think about the 4 relevant code paths from the PoV of the 'operator' 'listener' 'responder' and 'closer'. Where operator is the process doing an action (like open/read) which could require permission. Listener is the task (or in this case thread) slated with reading from the fanotify file descriptor. The 'responder' is the thread responsible for responding to access requests. 'Closer' is the thread attempting to close the fanotify file descriptor. The 'operator' is going to end up in: fanotify_handle_event() get_response_from_access() (THIS BLOCKS WAITING ON USERSPACE) The 'listener' interesting code path fanotify_read() copy_event_to_user() prepare_for_access_response() (THIS CREATES AN fanotify_response_event) The 'responder' code path: fanotify_write() process_access_response() (REMOVE A fanotify_response_event, SET RESPONSE, WAKE UP 'operator') The 'closer': fanotify_release() (SUPPOSED TO CLEAN UP THE REST OF THIS MESS) What we have today is that in the closer we remove all of the fanotify_response_events and set a bit so no more response events are ever created in prepare_for_access_response(). The bug is that we never wake all of the operators up and tell them to move along. You fix that in fanotify_get_response_from_access(). You also fix other operators which haven't gotten there yet. So I agree that's a good fix. [/additional changelog from eparis] [remove additional changes to minimize patch size] [move initialization so it was inside CONFIG_FANOTIFY_PERMISSION] Signed-off-by: Lino Sanfilippo <LinoSanfilippo@gmx.de> Signed-off-by: Eric Paris <eparis@redhat.com>
| * | | | fanotify: Dont allow a mask of 0 if setting or removing a markLino Sanfilippo2010-12-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In mark_remove_from_mask() we destroy marks that have their event mask cleared. Thus we should not allow the creation of those marks in the first place. With this patch we check if the mask given from user is 0 in case of FAN_MARK_ADD. If so we return an error. Same for FAN_MARK_REMOVE since this does not have any effect. Signed-off-by: Lino Sanfilippo <LinoSanfilippo@gmx.de> Signed-off-by: Eric Paris <eparis@redhat.com>
| * | | | fanotify: correct broken ref counting in case adding a mark failedLino Sanfilippo2010-12-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If adding a mount or inode mark failed fanotify_free_mark() is called explicitly. But at this time the mark has already been put into the destroy list of the fsnotify_mark kernel thread. If the thread is too slow it will try to decrease the reference of a mark, that has already been freed by fanotify_free_mark(). (If its fast enough it will only decrease the marks ref counter from 2 to 1 - note that the counter has been increased to 2 in add_mark() - which has practically no effect.) This patch fixes the ref counting by not calling free_mark() explicitly, but decreasing the ref counter and rely on the fsnotify_mark thread to cleanup in case adding the mark has failed. Signed-off-by: Lino Sanfilippo <LinoSanfilippo@gmx.de> Signed-off-by: Eric Paris <eparis@redhat.com>
| * | | | fanotify: if set by user unset FMODE_NONOTIFY before fsnotify_perm() is calledLino Sanfilippo2010-12-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Unsetting FMODE_NONOTIFY in fsnotify_open() is too late, since fsnotify_perm() is called before. If FMODE_NONOTIFY is set fsnotify_perm() will skip permission checks, so a user can still disable permission checks by setting this flag in an open() call. This patch corrects this by unsetting the flag before fsnotify_perm is called. Signed-off-by: Lino Sanfilippo <LinoSanfilippo@gmx.de> Signed-off-by: Eric Paris <eparis@redhat.com>
| * | | | fanotify: deny permissions when no event was sentEric Paris2010-12-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If no event was sent to userspace we cannot expect userspace to respond to permissions requests. Today such requests just hang forever. This patch will deny any permissions event which was unable to be sent to userspace. Reported-by: Tvrtko Ursulin <tvrtko.ursulin@sophos.com> Signed-off-by: Eric Paris <eparis@redhat.com>
* | | | | nilfs2: fix regression of garbage collection ioctlRyusuke Konishi2010-12-16
| |_|/ / |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | On 2.6.37-rc1, garbage collection ioctl of nilfs was broken due to the commit 263d90cefc7d82a0 ("nilfs2: remove own inode hash used for GC"), and leading to filesystem corruption. The patch doesn't queue gc-inodes for log writer if they are reused through the vfs inode cache. Here, gc-inode is the inode which buffers blocks to be relocated on GC. That patch queues gc-inodes in nilfs_init_gcinode() function, but this function is not called when they don't have I_NEW flag. Thus, some of live blocks are wrongly overrode without being moved to new logs. This resolves the problem by moving the gc-inode queueing to an outer function to ensure it's done right. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
* | | | Merge branch 'for_linus' of ↵Linus Torvalds2010-12-15
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: ext4: fix typo which broke '..' detection in ext4_find_entry() ext4: Turn off multiple page-io submission by default
| * | | | ext4: fix typo which broke '..' detection in ext4_find_entry()Aaro Koskinen2010-12-14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There should be a check for the NUL character instead of '0'. Fortunately the only thing that cares about this is NFS serving, which is why we didn't notice this in the merge window testing. Reported-by: Phil Carmody <ext-phil.2.carmody@nokia.com> Signed-off-by: Aaro Koskinen <aaro.koskinen@nokia.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| * | | | ext4: Turn off multiple page-io submission by defaultTheodore Ts'o2010-12-14
| |/ / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Jon Nelson has found a test case which causes postgresql to fail with the error: psql:t.sql:4: ERROR: invalid page header in block 38269 of relation base/16384/16581 Under memory pressure, it looks like part of a file can end up getting replaced by zero's. Until we can figure out the cause, we'll roll back the change and use block_write_full_page() instead of ext4_bio_write_page(). The new, more efficient writing function can be used via the mount option mblk_io_submit, so we can test and fix the new page I/O code. To reproduce the problem, install postgres 8.4 or 9.0, and pin enough memory such that the system just at the end of triggering writeback before running the following sql script: begin; create temporary table foo as select x as a, ARRAY[x] as b FROM generate_series(1, 10000000 ) AS x; create index foo_a_idx on foo (a); create index foo_b_idx on foo USING GIN (b); rollback; If the temporary table is created on a hard drive partition which is encrypted using dm_crypt, then under memory pressure, approximately 30-40% of the time, pgsql will issue the above failure. This patch should fix this problem, and the problem will come back if the file system is mounted with the mblk_io_submit mount option. Reported-by: Jon Nelson <jnelson@jamponi.net> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
* | | | install_special_mapping skips security_file_mmap check.Tavis Ormandy2010-12-15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The install_special_mapping routine (used, for example, to setup the vdso) skips the security check before insert_vm_struct, allowing a local attacker to bypass the mmap_min_addr security restriction by limiting the available pages for special mappings. bprm_mm_init() also skips the check, and although I don't think this can be used to bypass any restrictions, I don't see any reason not to have the security check. $ uname -m x86_64 $ cat /proc/sys/vm/mmap_min_addr 65536 $ cat install_special_mapping.s section .bss resb BSS_SIZE section .text global _start _start: mov eax, __NR_pause int 0x80 $ nasm -D__NR_pause=29 -DBSS_SIZE=0xfffed000 -f elf -o install_special_mapping.o install_special_mapping.s $ ld -m elf_i386 -Ttext=0x10000 -Tbss=0x11000 -o install_special_mapping install_special_mapping.o $ ./install_special_mapping & [1] 14303 $ cat /proc/14303/maps 0000f000-00010000 r-xp 00000000 00:00 0 [vdso] 00010000-00011000 r-xp 00001000 00:19 2453665 /home/taviso/install_special_mapping 00011000-ffffe000 rwxp 00000000 00:00 0 [stack] It's worth noting that Red Hat are shipping with mmap_min_addr set to 4096. Signed-off-by: Tavis Ormandy <taviso@google.com> Acked-by: Kees Cook <kees@ubuntu.com> Acked-by: Robert Swiecki <swiecki@google.com> [ Changed to not drop the error code - akpm ] Reviewed-by: James Morris <jmorris@namei.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | Merge branch 'for-2.6.37' of git://linux-nfs.org/~bfields/linuxLinus Torvalds2010-12-14
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | * 'for-2.6.37' of git://linux-nfs.org/~bfields/linux: nfsd: Fix possible BUG_ON firing in set_change_info sunrpc: prevent use-after-free on clearing XPT_BUSY
| * | | | nfsd: Fix possible BUG_ON firing in set_change_infoNeil Brown2010-12-08
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If vfs_getattr in fill_post_wcc returns an error, we don't set fh_post_change. For NFSv4, this can result in set_change_info triggering a BUG_ON. i.e. fh_post_saved being zero isn't really a bug. So: - instead of BUGging when fh_post_saved is zero, just clear ->atomic. - if vfs_getattr fails in fill_post_wcc, take a copy of i_ctime anyway. This will be used i seg_change_info, but not overly trusted. - While we are there, remove the pointless 'if' statements in set_change_info. There is no harm setting all the values. Signed-off-by: NeilBrown <neilb@suse.de> Cc: stable@kernel.org Signed-off-by: J. Bruce Fields <bfields@redhat.com>
* | | | | Merge git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstableLinus Torvalds2010-12-14
|\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable: Btrfs: prevent RAID level downgrades when space is low Btrfs: account for missing devices in RAID allocation profiles Btrfs: EIO when we fail to read tree roots Btrfs: fix compiler warnings Btrfs: Make async snapshot ioctl more generic Btrfs: pwrite blocked when writing from the mmaped buffer of the same page Btrfs: Fix a crash when mounting a subvolume Btrfs: fix sync subvol/snapshot creation Btrfs: Fix page leak in compressed writeback path Btrfs: do not BUG if we fail to remove the orphan item for dead snapshots Btrfs: fixup return code for btrfs_del_orphan_item Btrfs: do not do fast caching if we are allocating blocks for tree_root Btrfs: deal with space cache errors better Btrfs: fix use after free in O_DIRECT
| * | | | | Btrfs: prevent RAID level downgrades when space is lowChris Mason2010-12-13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The extent allocator has code that allows us to fill allocations from any available block group, even if it doesn't match the raid level we've requested. This was put in because adding a new drive to a filesystem made with the default mkfs options actually upgrades the metadata from single spindle dup to full RAID1. But, the code also allows us to allocate from a raid0 chunk when we really want a raid1 or raid10 chunk. This can cause big trouble because mkfs creates a small (4MB) raid0 chunk for data and metadata which then goes unused for raid1/raid10 installs. The allocator will happily wander in and allocate from that chunk when things get tight, which is not correct. The fix here is to make sure that we provide duplication when the caller has asked for it. It does all the dups to be any raid level, which preserves the dup->raid1 upgrade abilities. Signed-off-by: Chris Mason <chris.mason@oracle.com>