aboutsummaryrefslogtreecommitdiffstats
path: root/fs
Commit message (Collapse)AuthorAge
...
| * | | | | | | | | | | | | | proc: make proc_fd_permission() thread-friendlyOleg Nesterov2013-09-11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | proc_fd_permission() says "process can still access /proc/self/fd after it has executed a setuid()", but the "task_pid() = proc_pid() check only helps if the task is group leader, /proc/self points to /proc/<leader-pid>. Change this check to use task_tgid() so that the whole thread group can access its /proc/self/fd or /proc/<tid-of-sub-thread>/fd. Notes: - CLONE_THREAD does not require CLONE_FILES so task->files can differ, but I don't think this can lead to any security problem. And this matches same_thread_group() in __ptrace_may_access(). - /proc/self should probably point to /proc/<thread-tid>, but it is too late to change the rules. Perhaps it makes sense to add /proc/thread though. Test-case: void *tfunc(void *arg) { assert(opendir("/proc/self/fd")); return NULL; } int main(void) { pthread_t t; pthread_create(&t, NULL, tfunc, NULL); pthread_join(t, NULL); return 0; } fails if, say, this executable is not readable and suid_dumpable = 0. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * | | | | | | | | | | | | | fs/proc/task_mmu.c: check the return value of mpol_to_str()Chen Gang2013-09-11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | mpol_to_str() may fail, and not fill the buffer (e.g. -EINVAL), so need check about it, or buffer may not be zero based, and next seq_printf() will cause issue. The failure return need after mpol_cond_put() to match get_vma_policy(). Signed-off-by: Chen Gang <gang.chen@asianux.com> Cc: Cyrill Gorcunov <gorcunov@gmail.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Andi Kleen <andi@firstfloor.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * | | | | | | | | | | | | | fs/file_table.c:fput(): make comment more truthfulAndrew Morton2013-09-11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Andrey Vagin <avagin@openvz.org> Cc: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * | | | | | | | | | | | | | coredump: add new %P variable in core_patternStéphane Graber2013-09-11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add a new %P variable to be used in core_pattern. This variable contains the global PID (PID in the init namespace) as %p contains the PID in the current namespace which isn't always what we want. The main use for this is to make it easier to handle crashes that happened within a container. With that new variables it's possible to have the crashes dumped into the container or forwarded to the host with the right PID (from the host's point of view). Signed-off-by: Stéphane Graber <stgraber@ubuntu.com> Reported-by: Hans Feldt <hans.feldt@ericsson.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Andy Whitcroft <apw@canonical.com> Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * | | | | | | | | | | | | | hfsplus: integrate POSIX ACLs support into driverVyacheslav Dubeyko2013-09-11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Integrate implemented POSIX ACLs support into hfsplus driver. Signed-off-by: Vyacheslav Dubeyko <slava@dubeyko.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christoph Hellwig <hch@infradead.org> Cc: Hin-Tak Leung <htl10@users.sourceforge.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * | | | | | | | | | | | | | hfsplus: implement POSIX ACLs supportVyacheslav Dubeyko2013-09-11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Implement POSIX ACLs support in hfsplus driver. Signed-off-by: Vyacheslav Dubeyko <slava@dubeyko.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christoph Hellwig <hch@infradead.org> Cc: Hin-Tak Leung <htl10@users.sourceforge.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * | | | | | | | | | | | | | hfsplus: add necessary declarations for POSIX ACLs supportVyacheslav Dubeyko2013-09-11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patchset implements POSIX ACLs support in hfsplus driver. Mac OS X beginning with version 10.4 ("Tiger") support NFSv4 ACLs, which are part of the NFSv4 standard. HFS+ stores ACLs in the form of specially named extended attributes (com.apple.system.Security). But this patchset doesn't use "com.apple.system.Security" extended attributes. It implements support of POSIX ACLs in the form of extended attributes with names "system.posix_acl_access" and "system.posix_acl_default". These xattrs are treated only under Linux. POSIX ACLs doesn't mean something under Mac OS X. Thereby, this patch set provides opportunity to use POSIX ACLs under Linux on HFS+ filesystem. This patch: Add CONFIG_HFSPLUS_FS_POSIX_ACL kernel configuration option, DBG_ACL_MOD debugging flag and acl.h file with declaration of essential functions for support POSIX ACLs in hfsplus driver. Signed-off-by: Vyacheslav Dubeyko <slava@dubeyko.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christoph Hellwig <hch@infradead.org> Cc: Hin-Tak Leung <htl10@users.sourceforge.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * | | | | | | | | | | | | | epoll: add a reschedule point in ep_free()Eric Dumazet2013-09-11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ep_free() might iterate on a huge set of epitems and hold cpu too long. Add two cond_resched() in order to yield cpu to other tasks. This is safe as we only hold mutexes in this function. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Al Viro <viro@ZenIV.linux.org.uk> Cc: Theodore Ts'o <tytso@mit.edu> Acked-by: Eric Wong <normalperson@yhbt.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * | | | | | | | | | | | | | fs/bio-integrity: fix a potential mem leakGu Zheng2013-09-11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Free the bio_integrity_pool in the fail path of biovec_create_pool in function bioset_integrity_create(). Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * | | | | | | | | | | | | | writeback: fix race that cause writeback hungJunxiao Bi2013-09-11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There is a race between mark inode dirty and writeback thread, see the following scenario. In this case, writeback thread will not run though there is dirty_io. __mark_inode_dirty() bdi_writeback_workfn() ... ... spin_lock(&inode->i_lock); ... if (bdi_cap_writeback_dirty(bdi)) { <<< assume wb has dirty_io, so wakeup_bdi is false. <<< the following inode_dirty also have wakeup_bdi false. if (!wb_has_dirty_io(&bdi->wb)) wakeup_bdi = true; } spin_unlock(&inode->i_lock); <<< assume last dirty_io is removed here. pages_written = wb_do_writeback(wb); ... <<< work_list empty and wb has no dirty_io, <<< delayed_work will not be queued. if (!list_empty(&bdi->work_list) || (wb_has_dirty_io(wb) && dirty_writeback_interval)) queue_delayed_work(bdi_wq, &wb->dwork, msecs_to_jiffies(dirty_writeback_interval * 10)); spin_lock(&bdi->wb.list_lock); inode->dirtied_when = jiffies; <<< new dirty_io is added. list_move(&inode->i_wb_list, &bdi->wb.b_dirty); spin_unlock(&bdi->wb.list_lock); <<< though there is dirty_io, but wakeup_bdi is false, <<< so writeback thread will not be waked up and <<< the new dirty_io will not be flushed. if (wakeup_bdi) bdi_wakeup_thread_delayed(bdi); Writeback will run until there is a new flush work queued. This may cause a lot of dirty pages stay in memory for a long time. Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com> Reviewed-by: Jan Kara <jack@suse.cz> Cc: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * | | | | | | | | | | | | | mm/page-writeback.c: add strictlimit featureMaxim Patlasov2013-09-11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The feature prevents mistrusted filesystems (ie: FUSE mounts created by unprivileged users) to grow a large number of dirty pages before throttling. For such filesystems balance_dirty_pages always check bdi counters against bdi limits. I.e. even if global "nr_dirty" is under "freerun", it's not allowed to skip bdi checks. The only use case for now is fuse: it sets bdi max_ratio to 1% by default and system administrators are supposed to expect that this limit won't be exceeded. The feature is on if a BDI is marked by BDI_CAP_STRICTLIMIT flag. A filesystem may set the flag when it initializes its BDI. The problematic scenario comes from the fact that nobody pays attention to the NR_WRITEBACK_TEMP counter (i.e. number of pages under fuse writeback). The implementation of fuse writeback releases original page (by calling end_page_writeback) almost immediately. A fuse request queued for real processing bears a copy of original page. Hence, if userspace fuse daemon doesn't finalize write requests in timely manner, an aggressive mmap writer can pollute virtually all memory by those temporary fuse page copies. They are carefully accounted in NR_WRITEBACK_TEMP, but nobody cares. To make further explanations shorter, let me use "NR_WRITEBACK_TEMP problem" as a shortcut for "a possibility of uncontrolled grow of amount of RAM consumed by temporary pages allocated by kernel fuse to process writeback". The problem was very easy to reproduce. There is a trivial example filesystem implementation in fuse userspace distribution: fusexmp_fh.c. I added "sleep(1);" to the write methods, then recompiled and mounted it. Then created a huge file on the mount point and run a simple program which mmap-ed the file to a memory region, then wrote a data to the region. An hour later I observed almost all RAM consumed by fuse writeback. Since then some unrelated changes in kernel fuse made it more difficult to reproduce, but it is still possible now. Putting this theoretical happens-in-the-lab thing aside, there is another thing that really hurts real world (FUSE) users. This is write-through page cache policy FUSE currently uses. I.e. handling write(2), kernel fuse populates page cache and flushes user data to the server synchronously. This is excessively suboptimal. Pavel Emelyanov's patches ("writeback cache policy") solve the problem, but they also make resolving NR_WRITEBACK_TEMP problem absolutely necessary. Otherwise, simply copying a huge file to a fuse mount would result in memory starvation. Miklos, the maintainer of FUSE, believes strictlimit feature the way to go. And eventually putting FUSE topics aside, there is one more use-case for strictlimit feature. Using a slow USB stick (mass storage) in a machine with huge amount of RAM installed is a well-known pain. Let's make simple computations. Assuming 64GB of RAM installed, existing implementation of balance_dirty_pages will start throttling only after 9.6GB of RAM becomes dirty (freerun == 15% of total RAM). So, the command "cp 9GB_file /media/my-usb-storage/" may return in a few seconds, but subsequent "umount /media/my-usb-storage/" will take more than two hours if effective throughput of the storage is, to say, 1MB/sec. After inclusion of strictlimit feature, it will be trivial to add a knob (e.g. /sys/devices/virtual/bdi/x:y/strictlimit) to enable it on demand. Manually or via udev rule. May be I'm wrong, but it seems to be quite a natural desire to limit the amount of dirty memory for some devices we are not fully trust (in the sense of sustainable throughput). [akpm@linux-foundation.org: fix warning in page-writeback.c] Signed-off-by: Maxim Patlasov <MPatlasov@parallels.com> Cc: Jan Kara <jack@suse.cz> Cc: Miklos Szeredi <miklos@szeredi.hu> Cc: Wu Fengguang <fengguang.wu@intel.com> Cc: Pavel Emelyanov <xemul@parallels.com> Cc: James Bottomley <James.Bottomley@HansenPartnership.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * | | | | | | | | | | | | | mm/writeback: make writeback_inodes_wb staticWanpeng Li2013-09-11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It's not used globally and could be static. Signed-off-by: Wanpeng Li <liwanp@linux.vnet.ibm.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Rik van Riel <riel@redhat.com> Cc: Fengguang Wu <fengguang.wu@intel.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Tejun Heo <tj@kernel.org> Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Cc: David Rientjes <rientjes@google.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Wanpeng Li <liwanp@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * | | | | | | | | | | | | | mm: track vma changes with VM_SOFTDIRTY bitCyrill Gorcunov2013-09-11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pavel reported that in case if vma area get unmapped and then mapped (or expanded) in-place, the soft dirty tracker won't be able to recognize this situation since it works on pte level and ptes are get zapped on unmap, loosing soft dirty bit of course. So to resolve this situation we need to track actions on vma level, there VM_SOFTDIRTY flag comes in. When new vma area created (or old expanded) we set this bit, and keep it here until application calls for clearing soft dirty bit. Thus when user space application track memory changes now it can detect if vma area is renewed. Reported-by: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Matt Mackall <mpm@selenic.com> Cc: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Peter Zijlstra <peterz@infradead.org> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> Cc: Rob Landley <rob@landley.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * | | | | | | | | | | | | | writeback: fix occasional slow sync(1)Jan Kara2013-09-11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In case when system contains no dirty pages, wakeup_flusher_threads() will submit WB_SYNC_NONE writeback for 0 pages so wb_writeback() exits immediately without doing anything, even though there are dirty inodes in the system. Thus sync(1) will write all the dirty inodes from a WB_SYNC_ALL writeback pass which is slow. Fix the problem by using get_nr_dirty_pages() in wakeup_flusher_threads() instead of calculating number of dirty pages manually. That function also takes number of dirty inodes into account. Signed-off-by: Jan Kara <jack@suse.cz> Reported-by: Paul Taysom <taysom@chromium.org> Cc: Wu Fengguang <fengguang.wu@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * | | | | | | | | | | | | | ocfs2: fix the end cluster offset of FIEMAPJie Liu2013-09-11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Call fiemap ioctl(2) with given start offset as well as an desired mapping range should show extents if possible. However, we somehow figure out the end offset of mapping via 'mapping_end -= cpos' before iterating the extent records which would cause problems if the given fiemap length is too small to a cluster size, e.g, Cluster size 4096: debugfs.ocfs2 1.6.3 Block Size Bits: 12 Cluster Size Bits: 12 The extended fiemap test utility From David: https://gist.github.com/anonymous/6172331 # dd if=/dev/urandom of=/ocfs2/test_file bs=1M count=1000 # ./fiemap /ocfs2/test_file 4096 10 start: 4096, length: 10 File /ocfs2/test_file has 0 extents: # Logical Physical Length Flags ^^^^^ <-- No extent is shown In this case, at ocfs2_fiemap(): cpos == mapping_end == 1. Hence the loop of searching extent records was not executed at all. This patch remove the in question 'mapping_end -= cpos', and loops until the cpos is larger than the mapping_end as usual. # ./fiemap /ocfs2/test_file 4096 10 start: 4096, length: 10 File /ocfs2/test_file has 1 extents: # Logical Physical Length Flags 0: 0000000000000000 0000000056a01000 0000000006a00000 0000 Signed-off-by: Jie Liu <jeff.liu@oracle.com> Reported-by: David Weber <wb@munzinger.de> Tested-by: David Weber <wb@munzinger.de> Cc: Sunil Mushran <sunil.mushran@gmail.com> Cc: Mark Fashen <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * | | | | | | | | | | | | | ocfs2: remove unused variable ip in dlmfs_get_root_inode()Joseph Qi2013-09-11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Variable ip in dlmfs_get_root_inode() is defined but not used. So clean it up. Signed-off-by: Joseph Qi <joseph.qi@huawei.com> Reviewed-by: Jie Liu <jeff.liu@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * | | | | | | | | | | | | | ocfs2: fix a tiny race case when firing callbacksJoyce2013-09-11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In o2hb_shutdown_slot() and o2hb_check_slot(), since event is defined as local, it is only valid during the call stack. So the following tiny race case may happen in a multi-volumes mounted environment: o2hb-vol1 o2hb-vol2 1) o2hb_shutdown_slot allocate local event1 2) queue_node_event add event1 to global o2hb_node_events 3) o2hb_shutdown_slot allocate local event2 4) queue_node_event add event2 to global o2hb_node_events 5) o2hb_run_event_list delete event1 from o2hb_node_events 6) o2hb_run_event_list event1 empty, return 7) o2hb_shutdown_slot event1 lifecycle ends 8) o2hb_fire_callbacks event1 is already *invalid* This patch lets it wait on o2hb_callback_sem when another thread is firing callbacks. And for performance consideration, we only call o2hb_run_event_list when there is an event queued. Signed-off-by: Joyce <xuejiufei@huawei.com> Signed-off-by: Joseph Qi <joseph.qi@huawei.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Mark Fasheh <mfasheh@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * | | | | | | | | | | | | | ocfs2: avoid possible NULL pointer dereference in o2net_accept_one()Joseph Qi2013-09-11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since o2nm_get_node_by_num() may return NULL, we add this check in o2net_accept_one() to avoid possible NULL pointer dereference. Signed-off-by: Joseph Qi <joseph.qi@huawei.com> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * | | | | | | | | | | | | | ocfs2: adjust code style for o2net_handler_tree_lookup()Joseph Qi2013-09-11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Code in o2net_handler_tree_lookup() may be corrupted by mistake. So adjust it to promote readability. Signed-off-by: Joseph Qi <joseph.qi@huawei.com> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * | | | | | | | | | | | | | ocfs2: free path in ocfs2_remove_inode_range()Younger Liu2013-09-11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In ocfs2_remove_inode_range(), there is a memory leak. The variable path has allocated memory with ocfs2_new_path_from_et(), but it is not free. Signed-off-by: Younger Liu <younger.liu@huawei.com> Reviewed-by: Jie Liu <jeff.liu@oracle.com> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * | | | | | | | | | | | | | ocfs2: fix possible double free in ocfs2_reflink_xattr_recJoseph Qi2013-09-11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In ocfs2_reflink_xattr_rec(), meta_ac and data_ac are allocated by calling ocfs2_lock_reflink_xattr_rec_allocators(). Once an error occurs when allocating *data_ac, it frees *meta_ac which is allocated before. Here it mistakenly sets meta_ac to NULL but *meta_ac. Then ocfs2_reflink_xattr_rec() will try to free meta_ac again which is already invalid. Signed-off-by: Joseph Qi <joseph.qi@huawei.com> Reviewed-by: Jie Liu <jeff.liu@oracle.com> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * | | | | | | | | | | | | | ocfs2/dlm: force clean refmap when doing local cleanupXue jiufei2013-09-11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | dlm_do_local_recovery_cleanup() should force clean refmap if the owner of lockres is UNKNOWN. Otherwise node may hang when umounting filesystems. Here's the situation: Node1 Node2 dlmlock() -> dlm_get_lock_resource() send DLM_MASTER_REQUEST_MSG to other nodes. trying to master this lockres, return MAYBE. selected as the master of lockresA, set mle->master to Node1, and do assert_master, send DLM_ASSERT_MASTER_MSG to Node2. Node 2 has interest on lockresA and return DLM_ASSERT_RESPONSE_MASTERY_REF then something happened and Node2 crashed. Receiving DLM_ASSERT_RESPONSE_MASTERY_REF, set Node2 into refmap, and keep sending DLM_ASSERT_MASTER_MSG to other nodes o2hb found node2 down, calling dlm_hb_node_down() --> dlm_do_local_recovery_cleanup() the master of lockresA is still UNKNOWN, no need to call dlm_free_dead_locks(). Set the master of lockresA to Node1, but Node2 stills remains in refmap. When Node1 umount, it found that the refmap of lockresA is not empty and attempted to migrate it to Node2, But Node2 is already down, so umount hang, trying to migrate lockresA again and again. Signed-off-by: joyce <xuejiufei@huawei.com> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Jie Liu <jeff.liu@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * | | | | | | | | | | | | | ocfs2: free meta_ac and data_ac when ocfs2_start_trans fails in ↵Younger Liu2013-09-11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ocfs2_xattr_set() In ocfs2_xattr_set(), if ocfs2_start_trans failed, meta_ac and data_ac should be free. Otherwise, It would lead to a memory leak. Signed-off-by: Younger Liu <younger.liu@huawei.com> Cc: Joseph Qi <joseph.qi@huawei.com> Reviewed-by: Jie Liu <jeff.liu@oracle.com> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * | | | | | | | | | | | | | ocfs2: add the missing return value check of ocfs2_xattr_get_clustersJoseph Qi2013-09-11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In ocfs2_xattr_value_attach_refcount(), if error occurs when calling ocfs2_xattr_get_clusters(), it will go with unexpected behavior since local variables p_cluster, num_clusters and ext_flags are declared without initialization. Signed-off-by: Joseph Qi <joseph.qi@huawei.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Mark Fasheh <mfasheh@suse.com> Acked-by: Jie Liu <jeff.liu@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * | | | | | | | | | | | | | ocfs2: fix a memory leak in __ocfs2_move_extents()Jie Liu2013-09-11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The ocfs2 path is not properly freed which leads to a memory leak at __ocfs2_move_extents(). This patch stops the leaks of the ocfs2_path structure. Signed-off-by: Jie Liu <jeff.liu@oracle.com> Reviewed-by: Younger Liu <younger.liu@huawei.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Mark Fasheh <mfasheh@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * | | | | | | | | | | | | | ocfs2: add missing return value check of ocfs2_get_clusters()Joseph Qi2013-09-11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In ocfs2_attach_refcount_tree() and ocfs2_duplicate_extent_list(), if error occurs when calling ocfs2_get_clusters(), it will go with unexpected behavior as local variables p_cluster, num_clusters and ext_flags are declared without initialization. Signed-off-by: Joseph Qi <joseph.qi@huawei.com> Reviewed-by: Jie Liu <jeff.liu@oracle.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Mark Fasheh <mfasheh@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * | | | | | | | | | | | | | ocfs2: clean up dead code in ocfs2_acl_from_xattr()Joseph Qi2013-09-11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In ocfs2_acl_from_xattr(), if size is less than sizeof(struct posix_acl_entry), it returns ERR_PTR(-EINVAL) directly. Then assign (size / sizeof(struct posix_acl_entry)) to count which will be at least 1, that means the following branch (count < 0) and (count == 0) will never be true. Signed-off-by: Joseph Qi <joseph.qi@huawei.com> Cc: Mark Fasheh <mfasheh@suse.com> Acked-by: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * | | | | | | | | | | | | | ocfs2: use list_for_each_entry() instead of list_for_each()Dong Fang2013-09-11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [dan.carpenter@oracle.com: fix up some NULL dereference bugs] Signed-off-by: Dong Fang <yp.fangdong@gmail.com> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Jeff Liu <jeff.liu@oracle.com> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * | | | | | | | | | | | | | fs/ocfs2/cluster/tcp.c: fix possible null pointer dereferencesSunil Mushran2013-09-11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix some possible null pointer dereferences that were detected by the static code analyser, smatch. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Reported-by: Dan Carpenter <error27@gmail.com> Reported-by: Guozhonghua <guozhonghua@h3c.com> Cc: Sunil Mushran <sunil.mushran@gmail.com> Cc: Joseph Qi <joseph.qi@huawei.com> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * | | | | | | | | | | | | | ocfs2: ac_bits_wanted should be local_alloc_bits when returns -ENOSPCYounger Liu2013-09-11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There is an issue in reserving and claiming space for localalloc, When localalloc space is not enough, it would claim space from global_bitmap. And if there is not enough free space in global_bitmap, the size of claiming space would set to half of orignal size and retry. The issue is as follows: osb->local_alloc_bits is set to half of orignal size in ocfs2_recalc_la_window(), but ac->ac_bits_wanted is set to osb->local_alloc_default_bits which is not changed. localalloc always reserves and claims local_alloc_default_bits space and returns ENOSPC. So, ac->ac_bits_wanted should be osb->local_alloc_bits which would be changed. Signed-off-by: Younger Liu <younger.liu@huawei.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Jeff Liu <jeff.liu@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * | | | | | | | | | | | | | ocfs2: dlm_request_all_locks() should deal with the status sent from target nodeXue jiufei2013-09-11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | dlm_request_all_locks() should deal with the status sent from target node if DLM_LOCK_REQUEST_MSG is sent successfully, or recovery master will fall into endless loop, waiting for other nodes to send locks and DLM_RECO_DATA_DONE_MSG to me. NodeA NodeB selected as recovery master dlm_remaster_locks() ->dlm_request_all_locks() send DLM_LOCK_REQUEST_MSG to nodeA It happened that NodeA cannot alloc memory when it processes this message. dlm_request_all_locks_handler() do not queue dlm_request_all_locks_worker and returns -ENOMEM. It will never send locks and DLM_RECO_DATA_DONE_MSG to NodeB. NodeB do not deal with the status sent from nodeA, and will fall in endless loop waiting for the recovery state of NodeA to be changed. Signed-off-by: joyce <xuejiufei@huawei.com> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Jeff Liu <jeff.liu@oracle.com> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * | | | | | | | | | | | | | ocfs2: use i_size_read() to access i_sizeJunxiao Bi2013-09-11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Though ocfs2 uses inode->i_mutex to protect i_size, there are both i_size_read/write() and direct accesses. Clean up all direct access to eliminate confusion. Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com> Cc: Jie Liu <jeff.liu@oracle.com> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * | | | | | | | | | | | | | ocfs2: lighten up allocate transactionYounger Liu2013-09-11
| | |_|_|_|_|_|_|_|_|/ / / / | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The issue scenario is as following: When fallocating a very large disk space for a small file, __ocfs2_extend_allocation attempts to get a very large transaction. For some journal sizes, there may be not enough room for this transaction, and the fallocate will fail. The patch below extends & restarts the transaction as necessary while allocating space, and should work with even the smallest journal. This patch refers ext4 resize. Test: # mkfs.ocfs2 -b 4K -C 32K -T datafiles /dev/sdc ...(jounral size is 32M) # mount.ocfs2 /dev/sdc /mnt/ocfs2/ # touch /mnt/ocfs2/1.log # fallocate -o 0 -l 400G /mnt/ocfs2/1.log fallocate: /mnt/ocfs2/1.log: fallocate failed: Cannot allocate memory # tail -f /var/log/messages [ 7372.278591] JBD: fallocate wants too many credits (2051 > 2048) [ 7372.278597] (fallocate,6438,0):__ocfs2_extend_allocation:709 ERROR: status = -12 [ 7372.278603] (fallocate,6438,0):ocfs2_allocate_unwritten_extents:1504 ERROR: status = -12 [ 7372.278607] (fallocate,6438,0):__ocfs2_change_file_space:1955 ERROR: status = -12 ^C With this patch, the test works well. Signed-off-by: Younger Liu <younger.liu@huawei.com> Cc: Jie Liu <jeff.liu@oracle.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Mark Fasheh <mfasheh@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | | | | | | | | | | | Merge tag 'for-linus-3.12-merge' of ↵Linus Torvalds2013-09-11
|\ \ \ \ \ \ \ \ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs Pull 9p updates from Eric Van Hensbergen: "Minor 9p fixes and tweaks for 3.12 merge window The first fixes namespace issues which causes a kernel NULL pointer dereference, the second fixes uevent handling to work better with udev, and the third switches some code to use srlcpy instead of strncpy in order to be safer. All changes have been baking in for-next for at least 2 weeks" * tag 'for-linus-3.12-merge' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs: fs/9p: avoid accessing utsname after namespace has been torn down 9p: send uevent after adding/removing mount_tag attribute fs: 9p: use strlcpy instead of strncpy
| * | | | | | | | | | | | | | fs/9p: avoid accessing utsname after namespace has been torn downWill Deacon2013-08-26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | During trinity fuzzing in a kvmtool guest, I stumbled across the following: Unable to handle kernel NULL pointer dereference at virtual address 00000004 PC is at v9fs_file_do_lock+0xc8/0x1a0 LR is at v9fs_file_do_lock+0x48/0x1a0 [<c01e2ed0>] (v9fs_file_do_lock+0xc8/0x1a0) from [<c0119154>] (locks_remove_flock+0x8c/0x124) [<c0119154>] (locks_remove_flock+0x8c/0x124) from [<c00d9bf0>] (__fput+0x58/0x1e4) [<c00d9bf0>] (__fput+0x58/0x1e4) from [<c0044340>] (task_work_run+0xac/0xe8) [<c0044340>] (task_work_run+0xac/0xe8) from [<c002e36c>] (do_exit+0x6bc/0x8d8) [<c002e36c>] (do_exit+0x6bc/0x8d8) from [<c002e674>] (do_group_exit+0x3c/0xb0) [<c002e674>] (do_group_exit+0x3c/0xb0) from [<c002e6f8>] (__wake_up_parent+0x0/0x18) I believe this is due to an attempt to access utsname()->nodename, after exit_task_namespaces() has been called, leaving current->nsproxy->uts_ns as NULL and causing the above dereference. A similar issue was fixed for lockd in 9a1b6bf818e7 ("LOCKD: Don't call utsname()->nodename from nlmclnt_setlockargs"), so this patch attempts something similar for 9pfs. Cc: Eric Van Hensbergen <ericvh@gmail.com> Cc: Ron Minnich <rminnich@sandia.gov> Cc: Latchesar Ionkov <lucho@ionkov.net> Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
| * | | | | | | | | | | | | | fs: 9p: use strlcpy instead of strncpyChen Gang2013-07-30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | For 'NULL' terminated string, recommend always to be ended by zero. Signed-off-by: Chen Gang <gang.chen@asianux.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
* | | | | | | | | | | | | | | Merge tag 'squashfs-updates' of ↵Linus Torvalds2013-09-11
|\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ | |_|/ / / / / / / / / / / / / |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/pkl/squashfs-next Pull squashfs updates from Phillip Lougher: "A couple of minor additional sanity check patches for corrupted information, and some fixes. Apart from that there's a minor loop optimisation. These sanity checks mainly exist to trap maliciously corrupted filesystems either through using a deliberately modified mksquashfs, or where the user has deliberately chosen to generate uncompressed metadata and then corrupted it. Normally metadata in Squashfs filesystems is compressed, which means corruption (either accidental or malicious) is detected when trying to decompress the metadata. So corrupted data does not normally get as far as the code paths in question here" * tag 'squashfs-updates' of git://git.kernel.org/pub/scm/linux/kernel/git/pkl/squashfs-next: Squashfs: add corruption check for type in squashfs_readdir() Squashfs: add corruption check in get_dir_index_using_offset() Squashfs: fix corruption checks in squashfs_readdir() Squashfs: fix corruption checks in squashfs_lookup() Squashfs: fix corruption check in get_dir_index_using_name() Squashfs: Optimized uncompressed buffer loop Squashfs: sanity check information from disk
| * | | | | | | | | | | | | | Squashfs: add corruption check for type in squashfs_readdir()Phillip Lougher2013-09-05
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We read the type field from disk. This value should be sanity checked for correctness to avoid an out of bounds access when reading the squashfs_filetype_table array. Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk>
| * | | | | | | | | | | | | | Squashfs: add corruption check in get_dir_index_using_offset()Phillip Lougher2013-09-05
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We read the size (of the name) field from disk. This value should be sanity checked for correctness to avoid blindly reading huge amounts of unnecessary data from disk on corruption. Note, here we're not actually reading the name into a buffer, but skipping it, and so corruption doesn't cause buffer overflow, merely lots of unnecessary amounts of data to be read. Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk>
| * | | | | | | | | | | | | | Squashfs: fix corruption checks in squashfs_readdir()Phillip Lougher2013-09-05
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The dir_count and size fields when read from disk are sanity checked for correctness. However, the sanity checks only check the values are not greater than expected. As dir_count and size were incorrectly defined as signed ints, this can lead to corrupted values appearing as negative which are not trapped. Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk>
| * | | | | | | | | | | | | | Squashfs: fix corruption checks in squashfs_lookup()Phillip Lougher2013-09-05
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The dir_count and size fields when read from disk are sanity checked for correctness. However, the sanity checks only check the values are not greater than expected. As dir_count and size were incorrectly defined as signed ints, this can lead to corrupted values appearing as negative which are not trapped. Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk>
| * | | | | | | | | | | | | | Squashfs: fix corruption check in get_dir_index_using_name()Phillip Lougher2013-09-05
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Patch "Squashfs: sanity check information from disk" from Dan Carpenter adds a missing check for corruption in the "size" field while reading the directory index from disk. It, however, sets err to -EINVAL, this value is not used later, and so setting it is completely redundant. So remove it. Errors in reading the index are deliberately non-fatal. If we get an error in reading the index we just return the part of the index we have managed to read - the index isn't essential, just quicker. Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk>
| * | | | | | | | | | | | | | Squashfs: Optimized uncompressed buffer loopManish Sharma2013-09-04
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Merged the two for loops. We might get a little gain by overlapping wait_on_bh and the memcpy operations. Signed-off-by: Manish Sharma <manishrma@gmail.com> Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk>
| * | | | | | | | | | | | | | Squashfs: sanity check information from diskDan Carpenter2013-08-28
| | |_|_|/ / / / / / / / / / | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We read the size of the name from the disk, but a larger name than expected would cause memory corruption. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk>
* | | | | | | | | | | | | | Merge branch 'nfsd-next' of git://linux-nfs.org/~bfields/linuxLinus Torvalds2013-09-10
|\ \ \ \ \ \ \ \ \ \ \ \ \ \ | |_|_|_|_|_|/ / / / / / / / |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull nfsd updates from Bruce Fields: "This was a very quiet cycle! Just a few bugfixes and some cleanup" * 'nfsd-next' of git://linux-nfs.org/~bfields/linux: rpc: let xdr layer allocate gssproxy receieve pages rpc: fix huge kmalloc's in gss-proxy rpc: comment on linux_cred encoding, treat all as unsigned rpc: clean up decoding of gssproxy linux creds svcrpc: remove unused rq_resused nfsd4: nfsd4_create_clid_dir prints uninitialized data nfsd4: fix leak of inode reference on delegation failure Revert "nfsd: nfs4_file_get_access: need to be more careful with O_RDWR" sunrpc: prepare NFS for 2038 nfsd4: fix setlease error return nfsd: nfs4_file_get_access: need to be more careful with O_RDWR
| * | | | | | | | | | | | | nfsd4: nfsd4_create_clid_dir prints uninitialized dataJ. Bruce Fields2013-08-30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Take the easy way out and just remove the printk. Reported-by: David Howells <dhowells@redhat.com>
| * | | | | | | | | | | | | nfsd4: fix leak of inode reference on delegation failureJ. Bruce Fields2013-08-30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This fixes a regression from 68a3396178e6688ad7367202cdf0af8ed03c8727 "nfsd4: shut down more of delegation earlier". After that commit, nfs4_set_delegation() failures result in nfs4_put_delegation being called, but nfs4_put_delegation doesn't free the nfs4_file that has already been set by alloc_init_deleg(). This can result in an oops on later unmounting the exported filesystem. Note also delaying the fi_had_conflict check we're able to return a better error (hence give 4.1 clients a better idea why the delegation failed; though note CONFLICT isn't an exact match here, as that's supposed to indicate a current conflict, but all we know here is that there was one recently). Reported-by: Toralf Förster <toralf.foerster@gmx.de> Tested-by: Toralf Förster <toralf.foerster@gmx.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| * | | | | | | | | | | | | Revert "nfsd: nfs4_file_get_access: need to be more careful with O_RDWR"J. Bruce Fields2013-08-30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This reverts commit df66e75395c839c3a373bae897dbb1248f741b45. nfsd4_lock can get a read-only or write-only reference when only a read-write open is available. This is normal. Cc: Harshula Jayasuriya <harshula@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| * | | | | | | | | | | | | Merge tag 'v3.11-rc5' into for-3.12 branchJ. Bruce Fields2013-08-30
| |\ \ \ \ \ \ \ \ \ \ \ \ \ | | | |_|_|_|_|_|_|_|/ / / / | | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | For testing purposes I want some nfs and nfsd bugfixes (specifically, 58cd57bfd9db3bc213bf9d6a10920f82095f0114 and previous nfsd patches, and Trond's 4f3cc4809a98a165a9708b72b47de71643797bbd).
| * | | | | | | | | | | | | nfsd4: fix setlease error returnJ. Bruce Fields2013-07-26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This actually makes a difference in the 4.1 case, since we use the status to decide what reason to give the client for the delegation refusal (see nfsd4_open_deleg_none_ext), and in theory a client might choose suboptimal behavior if we give the wrong answer. Reported-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>