path: root/kernel
Commit message    Author    Age
* Merge branch 'writable_limits' of git://decibel.fi.muni.cz/~xslaby/linuxLinus Torvalds2010-08-10
|\
| | * 'writable_limits' of git://decibel.fi.muni.cz/~xslaby/linux:
| |   unistd: add __NR_prlimit64 syscall numbers
| |   rlimits: implement prlimit64 syscall
| |   rlimits: switch more rlimit syscalls to do_prlimit
| |   rlimits: redo do_setrlimit to more generic do_prlimit
| |   rlimits: add rlimit64 structure
| |   rlimits: do security check under task_lock
| |   rlimits: allow setrlimit to non-current tasks
| |   rlimits: split sys_setrlimit
| |   rlimits: selinux, do rlimits changes under task_lock
| |   rlimits: make sure ->rlim_max never grows in sys_setrlimit
| |   rlimits: add task_struct to update_rlimit_cpu
| |   rlimits: security, add task_struct to setrlimit
| |
| | Fix up various system call number conflicts. We not only added fanotify system calls in the meantime, but asm-generic/unistd.h added a wait4 along with a range of reserved per-architecture system calls.
| * rlimits: implement prlimit64 syscallJiri Slaby2010-07-16
| | This patch adds the code to support the sys_prlimit64 syscall, which modifies and returns the rlim values of a selected process atomically. The first parameter, pid, being 0 means the current process.
| |
| | Unlike the current implementation, it is a generic, architecture-independent interface, so we no longer need to handle compat stuff. In the future, after glibc starts to use this, we can deprecate sys_setrlimit and sys_getrlimit and finally clean up the code.
| |
| | It also adds the possibility of changing the limits of other processes. We check the user's permissions to do that and, if the check succeeds, the new limits are propagated online. This is good for large-scale applications such as SAP or databases, where administrators need to change limits from time to time (e.g. increase the core size after crashes) and restarting the service is unacceptable.
| |
| | For safety, all rlim users now either use accessors or don't need them due to
| | - locking
| | - the fact that a process was just forked and nobody else knows about it yet (and thus nobody can read/write its limits)
| | hence it is safe to modify limits now.
| |
| | The limitation is that we currently stay with the ulong internal representation. So the rlim64_is_infinity check is used where the value is compared against ULONG_MAX on 32-bit, which is the maximum value there.
| |
| | Since the limits are held internally in struct rlimit, converters are introduced that run before and after the do_prlimit call in sys_prlimit64.
| |
| | Signed-off-by: Jiri Slaby <jslaby@suse.cz>
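The modify-and-return behaviour described above can be tried from user space. The following is a minimal sketch, assuming a Linux system whose glibc exposes the prlimit() wrapper (declared in <sys/resource.h> under _GNU_SOURCE); the helper names read_limit and swap_limit are hypothetical and are not part of the patch.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/resource.h>
#include <sys/types.h>

/* Hypothetical helper: fetch the limits of `pid` without changing them.
 * A pid of 0 means the calling process. */
int read_limit(pid_t pid, int resource, struct rlimit *out)
{
    assert(out != NULL);
    /* new_limit == NULL: nothing is written, only the old value is returned */
    return prlimit(pid, resource, NULL, out);
}

/* Hypothetical helper: install *new_lim and receive the previous limits
 * from the same call. */
int swap_limit(pid_t pid, int resource,
               const struct rlimit *new_lim, struct rlimit *old)
{
    return prlimit(pid, resource, new_lim, old);
}
```

Calling swap_limit() with another process's pid is subject to the permission check the commit message mentions, so it may fail with EPERM for an unprivileged caller.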
| * rlimits: switch more rlimit syscalls to do_prlimitJiri Slaby2010-07-16
| | | | | | | | | | | | | | | | After we added more generic do_prlimit, switch sys_getrlimit to that. Also switch compat handling, so we can get rid of ugly __user casts and avoid setting process' address limit to kernel data and back. Signed-off-by: Jiri Slaby <jslaby@suse.cz>
| * rlimits: redo do_setrlimit to more generic do_prlimitJiri Slaby2010-07-16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | It now allows also reading of limits. I.e. all read and writes will later use this function. It takes two parameters, new and old limits which can be both NULL. If new is non-NULL, the value in it is set to rlimits. If old is non-NULL, current rlimits are stored there. If both are non-NULL, old are stored prior to setting the new ones, atomically. (Similar to sigaction.) Signed-off-by: Jiri Slaby <jslaby@suse.cz>
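The old/new convention above can be modelled in a few lines of user-space C. This is only a sketch of the semantics, not the kernel function: struct limit, struct task_model, and model_prlimit are hypothetical names, and a C11 atomic_flag spinlock stands in for task_lock().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct limit { unsigned long cur, max; };

struct task_model {
    atomic_flag lock;   /* stand-in for task_lock() */
    struct limit lim;
};

/* Both pointers are optional. When both are given, the old value is
 * copied out before the new one is stored, inside one critical section. */
int model_prlimit(struct task_model *t,
                  const struct limit *new_lim, struct limit *old_lim)
{
    assert(t != NULL);
    if (new_lim && new_lim->cur > new_lim->max)
        return -1;      /* soft limit above hard limit is invalid */

    while (atomic_flag_test_and_set_explicit(&t->lock, memory_order_acquire))
        ;               /* spin until the lock is free */
    if (old_lim)
        *old_lim = t->lim;
    if (new_lim)
        t->lim = *new_lim;
    atomic_flag_clear_explicit(&t->lock, memory_order_release);
    return 0;
}
```

Because the copy-out and the store share a critical section, a caller that passes both pointers sees exactly the value it replaced, which is the property the sigaction comparison refers to.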
| * rlimits: do security check under task_lockJiri Slaby2010-07-16
| | Do security_task_setrlimit under task_lock. Other tasks may change limits under our hands while we are checking limits inside the function. From now on, they can't.
| |
| | Note that all the security work is now done under a spinlock here. Security hooks already account for that: they are called from interrupt context (like security_task_kill) and with spinlocks already held (e.g. capable->security_capable).
| |
| | Signed-off-by: Jiri Slaby <jslaby@suse.cz>
| | Acked-by: James Morris <jmorris@namei.org>
| | Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
| * rlimits: allow setrlimit to non-current tasksJiri Slaby2010-07-16
| | Add locking to allow setrlimit to accept a task parameter other than current.
| |
| | Namely, lock tasklist_lock for read and check whether the task structure has a non-null sighand. Do all the signal processing with that lock still held.
| |
| | There are some points:
| | 1) security_task_setrlimit is now called with that lock held. This is not new; many security_* functions are already called with this lock held, so it does no harm (all this security_* stuff does almost the same).
| | 2) task->sighand->siglock (in update_rlimit_cpu) is nested in tasklist_lock. This dependence already exists.
| | 3) tsk->alloc_lock is nested in tasklist_lock. This is OK too; the dependence already exists.
| |
| | Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
| | Cc: Oleg Nesterov <oleg@redhat.com>
| * rlimits: split sys_setrlimitJiri Slaby2010-07-16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Create do_setrlimit from sys_setrlimit and declare do_setrlimit in the resource header. This is the first phase to have generic do_prlimit which allows to be called from read, write and compat rlimits code. The new do_setrlimit also accepts a task pointer to change the limits of. Currently, it cannot be other than current, but this will change with locking later. Also pass tsk->group_leader to security_task_setrlimit to check whether current is allowed to change rlimits of the process and not its arbitrary thread because it makes more sense given that rlimit are per process and not per-thread. Signed-off-by: Jiri Slaby <jslaby@suse.cz>
| * rlimits: make sure ->rlim_max never grows in sys_setrlimitOleg Nesterov2010-07-16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Mostly preparation for Jiri's changes, but probably makes sense anyway. sys_setrlimit() checks new_rlim.rlim_max <= old_rlim->rlim_max, but when it takes task_lock() old_rlim->rlim_max can be already lowered. Move this check under task_lock(). Currently this is not important, we can only race with our sub-thread, this means the application is stupid. But when we change the code to allow the update of !current task's limits, it becomes important to make sure ->rlim_max can be lowered "reliably" even if we race with the application doing sys_setrlimit(). Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
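The point of moving the check can be illustrated with a small user-space model. This is a sketch under stated assumptions: struct limit_pair, struct locked_limits, and set_limit_checked are hypothetical names, the `privileged` flag stands in for the kernel's capability check, and an atomic_flag spinlock stands in for task_lock().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct limit_pair { unsigned long cur, max; };

struct locked_limits {
    atomic_flag lock;       /* stand-in for task_lock() */
    struct limit_pair lim;
};

/* Returns 0 on success, -1 if the request would raise the hard limit
 * without privilege. The comparison uses the value read while the lock
 * is held, so a concurrent lowering of lim.max cannot be missed. */
int set_limit_checked(struct locked_limits *t,
                      struct limit_pair new_lim, bool privileged)
{
    int ret = 0;

    assert(t != NULL);
    while (atomic_flag_test_and_set_explicit(&t->lock, memory_order_acquire))
        ;                   /* spin until the lock is free */
    if (new_lim.max > t->lim.max && !privileged)
        ret = -1;           /* hard limit may only shrink */
    else
        t->lim = new_lim;
    atomic_flag_clear_explicit(&t->lock, memory_order_release);
    return ret;
}
```

Had the comparison been done before taking the lock, another thread could lower lim.max in between, and the stale check would then let the hard limit grow back.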
| * rlimits: add task_struct to update_rlimit_cpuJiri Slaby2010-07-16
| | | | | | | | | | | | | | | | Add task_struct as a parameter to update_rlimit_cpu to be able to set rlimit_cpu of different task than current. Signed-off-by: Jiri Slaby <jirislaby@gmail.com> Acked-by: James Morris <jmorris@namei.org>
| * rlimits: security, add task_struct to setrlimitJiri Slaby2010-07-16
| | | | | | | | | | | | | | | | | | Add task_struct to task_setrlimit of security_operations to be able to set rlimit of task other than current. Signed-off-by: Jiri Slaby <jirislaby@gmail.com> Acked-by: Eric Paris <eparis@redhat.com> Acked-by: James Morris <jmorris@namei.org>
* | Merge branch 'for-linus' of git://git.infradead.org/users/eparis/notifyLinus Torvalds2010-08-10
|\ \
| | | * 'for-linus' of git://git.infradead.org/users/eparis/notify: (132 commits)
| | |   fanotify: use both marks when possible
| | |   fsnotify: pass both the vfsmount mark and inode mark
| | |   fsnotify: walk the inode and vfsmount lists simultaneously
| | |   fsnotify: rework ignored mark flushing
| | |   fsnotify: remove global fsnotify groups lists
| | |   fsnotify: remove group->mask
| | |   fsnotify: remove the global masks
| | |   fsnotify: cleanup should_send_event
| | |   fanotify: use the mark in handler functions
| | |   audit: use the mark in handler functions
| | |   dnotify: use the mark in handler functions
| | |   inotify: use the mark in handler functions
| | |   fsnotify: send fsnotify_mark to groups in event handling functions
| | |   fsnotify: Exchange list heads instead of moving elements
| | |   fsnotify: srcu to protect read side of inode and vfsmount locks
| | |   fsnotify: use an explicit flag to indicate fsnotify_destroy_mark has been called
| | |   fsnotify: use _rcu functions for mark list traversal
| | |   fsnotify: place marks on object in order of group memory address
| | |   vfs/fsnotify: fsnotify_close can delay the final work in fput
| | |   fsnotify: store struct file not struct path
| | |   ...
| | |
| | | Fix up trivial delete/modify conflict in fs/notify/inotify/inotify.c.
| * | fanotify: use both marks when possibleEric Paris2010-07-28
| | | | | | | | | | | | | | | | | | | | | | | | fanotify currently, when given a vfsmount_mark will look up (if it exists) the corresponding inode mark. This patch drops that lookup and uses the mark provided. Signed-off-by: Eric Paris <eparis@redhat.com>
| * | fsnotify: pass both the vfsmount mark and inode markEric Paris2010-07-28
| | | | | | | | | | | | | | | | | | | | | | | | should_send_event() and handle_event() will both need to look up the inode event if they get a vfsmount event. Lets just pass both at the same time since we have them both after walking the lists in lockstep. Signed-off-by: Eric Paris <eparis@redhat.com>
| * | fsnotify: remove group->maskEric Paris2010-07-28
| | | | | | | | | | | | | | | | | | | | | group->mask is now useless. It was originally a shortcut for fsnotify to save on performance. These checks are now redundant, so we remove them. Signed-off-by: Eric Paris <eparis@redhat.com>
| * | fsnotify: cleanup should_send_eventEric Paris2010-07-28
| | | | | | | | | | | | | | | | | | | | | | | | The change to use srcu and walk the object list rather than the global fsnotify_group list means that should_send_event is no longer needed for a number of groups and can be simplified for others. Do that. Signed-off-by: Eric Paris <eparis@redhat.com>
| * | audit: use the mark in handler functionsEric Paris2010-07-28
| | | | | | | | | | | | | | | | | | | | | | | | audit now gets a mark in the should_send_event and handle_event functions. Rather than look up the mark themselves audit should just use the mark it was handed. Signed-off-by: Eric Paris <eparis@redhat.com>
| * | fsnotify: send fsnotify_mark to groups in event handling functionsEric Paris2010-07-28
| | | | | | | | | | | | | | | | | | | | | | | | | | | With the change of fsnotify to use srcu walking the marks list instead of walking the global groups list we now know the mark in question. The code can send the mark to the group's handling functions and the groups won't have to find those marks themselves. Signed-off-by: Eric Paris <eparis@redhat.com>
| * | fsnotify: store struct file not struct pathEric Paris2010-07-28
| | | Al explains that calling dentry_open() with a mnt/dentry pair is only guaranteed to be safe if they are already used in an open struct file. To make sure this is the case, don't store and use a struct path in fsnotify; always use a struct file.
| | |
| | | Signed-off-by: Eric Paris <eparis@redhat.com>
| * | sysctl extern cleanup: inotifyDave Young2010-07-28
| | | Extern declarations in sysctl.c should be moved to their own header files, which are then included in the relevant .c files.
| | |
| | | Move the inotify_table extern declaration to linux/inotify.h.
| | |
| | | Signed-off-by: Dave Young <hidave.darkstar@gmail.com>
| | | Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
| | | Signed-off-by: Eric Paris <eparis@redhat.com>
| * | dnotify: move dir_notify_enable declarationAlexey Dobriyan2010-07-28
| | | | | | | | | | | | | | | | | | | | | Move dir_notify_enable declaration to where it belongs -- dnotify.h . Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Eric Paris <eparis@redhat.com>
| * | fsnotify: split generic and inode specific mark codeEric Paris2010-07-28
| | | | | | | | | | | | | | | | | | | | | | | | currently all marking is done by functions in inode-mark.c. Some of this is pretty generic and should be instead done in a generic function and we should only put the inode specific code in inode-mark.c Signed-off-by: Eric Paris <eparis@redhat.com>
| * | fanotify: sys_fanotify_mark declartionEric Paris2010-07-28
| | | This patch simply declares the new sys_fanotify_mark syscall:
| | |
| | | int fanotify_mark(int fanotify_fd, unsigned int flags, u64 mask, int dfd, const char *pathname)
| | |
| | | Signed-off-by: Eric Paris <eparis@redhat.com>
| * | fanotify: fanotify_init syscall declarationEric Paris2010-07-28
| | | This patch defines a new syscall fanotify_init() of the form:
| | |
| | | int sys_fanotify_init(unsigned int flags, unsigned int event_f_flags, unsigned int priority)
| | |
| | | This syscall is used to create an fanotify group. This is very similar to the inotify_init() syscall.
| | |
| | | Signed-off-by: Eric Paris <eparis@redhat.com>
| * | fsnotify: take inode->i_lock inside fsnotify_find_mark_entry()Andreas Gruenbacher2010-07-28
| | | | | | | | | | | | | | | | | | | | | | | | | | | All callers to fsnotify_find_mark_entry() except one take and release inode->i_lock around the call. Take the lock inside fsnotify_find_mark_entry() instead. Signed-off-by: Andreas Gruenbacher <agruen@suse.de> Signed-off-by: Eric Paris <eparis@redhat.com>
| * | fsnotify: rename fsnotify_find_mark_entry to fsnotify_find_markEric Paris2010-07-28
| | | | | | | | | | | | | | | | | | the _entry portion of fsnotify functions is useless. Drop it. Signed-off-by: Eric Paris <eparis@redhat.com>
| * | fsnotify: rename fsnotify_mark_entry to just fsnotify_markEric Paris2010-07-28
| | | | | | | | | | | | | | | | | | | | | The name is long and it serves no real purpose. So rename fsnotify_mark_entry to just fsnotify_mark. Signed-off-by: Eric Paris <eparis@redhat.com>
| * | fsnotify: put inode specific fields in an fsnotify_mark in a unionEric Paris2010-07-28
| | | | | | | | | | | | | | | | | | | | | | | | | | | The addition of marks on vfs mounts will be simplified if the inode specific parts of a mark and the vfsmnt specific parts of a mark are actually in a union so naming can be easy. This patch just implements the inode struct and the union. Signed-off-by: Eric Paris <eparis@redhat.com>
| * | fsnotify: include vfsmount in should_send_event when appropriateEric Paris2010-07-28
| | | | | | | | | | | | | | | | | | | | | | | | | | | To ensure that a group will not duplicate events when it receives it based on the vfsmount and the inode should_send_event test we should distinguish those two cases. We pass a vfsmount to this function so groups can make their own determinations. Signed-off-by: Eric Paris <eparis@redhat.com>
| * | fsnotify: drop mask argument from fsnotify_alloc_groupEric Paris2010-07-28
| | | | | | | | | | | | | | | | | | | | | Nothing uses the mask argument to fsnotify_alloc_group. This patch drops that argument. Signed-off-by: Eric Paris <eparis@redhat.com>
| * | Audit: only set group mask when something is being watchedEric Paris2010-07-28
| | | | | | | | | | | | | | | | | | | | | | | | | | | Currently the audit watch group always sets a mask equal to all events it might care about. We instead should only set the group mask if we are actually watching inodes. This should be a perf win when audit watches are compiled in. Signed-off-by: Eric Paris <eparis@redhat.com>
| * | fsnotify: fsnotify_obtain_group should be fsnotify_alloc_groupEric Paris2010-07-28
| | | | | | | | | | | | | | | | | | | | | | | | fsnotify_obtain_group was intended to be able to find an already existing group. Nothing uses that functionality. This just renames it to fsnotify_alloc_group so it is clear what it is doing. Signed-off-by: Eric Paris <eparis@redhat.com>
| * | fsnotify: remove group_num altogetherEric Paris2010-07-28
| | | | | | | | | | | | | | | | | | | | | | | | The original fsnotify interface has a group-num which was intended to be able to find a group after it was added. I no longer think this is a necessary thing to do and so we remove the group_num. Signed-off-by: Eric Paris <eparis@redhat.com>
| * | fsnotify: include data in should_send callsEric Paris2010-07-28
| | | | | | | | | | | | | | | | | | | | | | | | | | | fanotify is going to need to look at file->private_data to know if an event should be sent or not. This passes the data (which might be a file, dentry, inode, or none) to the should_send function calls so fanotify can get that information when available Signed-off-by: Eric Paris <eparis@redhat.com>
| * | fsnotify: provide the data type to should_send_eventEric Paris2010-07-28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | fanotify is only interested in event types which contain enough information to open the original file in the context of the fanotify listener. Since fanotify may not want to send events if that data isn't present we pass the data type to the should_send_event function call so fanotify can express its lack of interest. Signed-off-by: Eric Paris <eparis@redhat.com>
| * | inotify: remove inotify in kernel interfaceEric Paris2010-07-28
| | | | | | | | | | | | | | | | | | nothing uses inotify in the kernel, drop it! Signed-off-by: Eric Paris <eparis@redhat.com>
| * | Audit: audit watch init should not be before fsnotify initEric Paris2010-07-28
| | | Audit watch init and fsnotify init both use subsys_initcall(), but since the audit watch code is linked in before the fsnotify code, the audit watch code would be using the fsnotify srcu struct before it was initialized. This patch fixes that problem by moving audit watch init to device_initcall() so it happens after fsnotify is ready.
| | |
| | | Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
| | | Signed-off-by: Eric Paris <eparis@redhat.com>
| | | Tested-by: Sachin Sant <sachinp@in.ibm.com>
| * | Audit: split audit watch KconfigEric Paris2010-07-28
| | | Audit watch should depend on CONFIG_AUDIT_SYSCALL and should select FSNOTIFY. This splits the spaghetti-like mixing of audit_watch and audit_filter code so they can be configured separately.
| | |
| | | Signed-off-by: Eric Paris <eparis@redhat.com>
| * | audit: reimplement audit_trees using fsnotify rather than inotifyEric Paris2010-07-28
| | | Simply switch audit_trees from using inotify to using fsnotify for its inode pinning and disappearing act information.
| | |
| | | Signed-off-by: Eric Paris <eparis@redhat.com>
| * | fsnotify: allow addition of duplicate fsnotify marksEric Paris2010-07-28
| | | This patch allows a task to add a second fsnotify mark to an inode for the same group. This mark will be added to the end of the inode's list and will never be found by the standard fsnotify_find_mark() function. This is useful if a user wants to add a new mark before removing the old one.
| | |
| | | Signed-off-by: Eric Paris <eparis@redhat.com>
| * | audit: do not get and put just to free a watchEric Paris2010-07-28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | deleting audit watch rules is not currently done under audit_filter_mutex. It was done this way because we could not hold the mutex during inotify manipulation. Since we are using fsnotify we don't need to do the extra get/put pair nor do we need the private list on which to store the parents while they are about to be freed. Signed-off-by: Eric Paris <eparis@redhat.com>
| * | audit: redo audit watch locking and refcnt in light of fsnotifyEric Paris2010-07-28
| | | fsnotify can handle mutexes being held across all fsnotify operations since it deals strictly in spinlocks. This can simplify and reduce some of the audit_filter_mutex taking and dropping.
| | |
| | | Signed-off-by: Eric Paris <eparis@redhat.com>
| * | audit: convert audit watches to use fsnotify instead of inotifyEric Paris2010-07-28
| | | | | | | | | | | | | | | | | | | | | | | | Audit currently uses inotify to pin inodes in core and to detect when watched inodes are deleted or unmounted. This patch uses fsnotify instead of inotify. Signed-off-by: Eric Paris <eparis@redhat.com>
| * | Audit: clean up the audit_watch splitEric Paris2010-07-28
| | | No real changes, just cleanup to the audit_watch split patch, which was done with minimal code changes for easy review. Now fix the interfaces to make things work better.
| | |
| | | Signed-off-by: Eric Paris <eparis@redhat.com>
* | | Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6Linus Torvalds2010-08-10
|\ \ \
| | | | * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (96 commits)
| | | |   no need for list_for_each_entry_safe()/resetting with superblock list
| | | |   Fix sget() race with failing mount
| | | |   vfs: don't hold s_umount over close_bdev_exclusive() call
| | | |   sysv: do not mark superblock dirty on remount
| | | |   sysv: do not mark superblock dirty on mount
| | | |   btrfs: remove junk sb_dirt change
| | | |   BFS: clean up the superblock usage
| | | |   AFFS: wait for sb synchronization when needed
| | | |   AFFS: clean up dirty flag usage
| | | |   cifs: truncate fallout
| | | |   mbcache: fix shrinker function return value
| | | |   mbcache: Remove unused features
| | | |   add f_flags to struct statfs(64)
| | | |   pass a struct path to vfs_statfs
| | | |   update VFS documentation for method changes.
| | | |   All filesystems that need invalidate_inode_buffers() are doing that explicitly
| | | |   convert remaining ->clear_inode() to ->evict_inode()
| | | |   Make ->drop_inode() just return whether inode needs to be dropped
| | | |   fs/inode.c:clear_inode() is gone
| | | |   fs/inode.c:evict() doesn't care about delete vs. non-delete paths now
| | | |   ...
| | | |
| | | | Fix up trivial conflicts in fs/nilfs2/super.c
| * | | pass a struct path to vfs_statfsChristoph Hellwig2010-08-09
| | | | We'll need the path to implement the flags field for statvfs support. We do have it available in all callers except:
| | | |
| | | | - ecryptfs_statfs. This one doesn't actually need vfs_statfs but just needs to call the lower filesystem's statfs method.
| | | | - sys_ustat. Add a non-exported statfs_by_dentry helper for it, which won't be able to fill out the flags field later on.
| | | |
| | | | In addition, rename the helpers for statfs vs fstatfs to do_*statfs instead of the misleading vfs prefix.
| | | |
| | | | Signed-off-by: Christoph Hellwig <hch@lst.de>
| | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | | | gcc-4.6: printk: use stable variable to dump kmsg bufferAndi Kleen2010-08-09
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | kmsg_dump takes care to sample the global variables inside a spinlock, but then goes on to use the same variables outside the spinlock region too. Use the correct variable. This will make the race window smaller. Found by gcc 4.6's new warnings. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
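The pattern the fix relies on, sampling shared state under the lock and afterwards touching only the sampled copies, can be sketched as follows. All names here (log_end_model, logged_chars_model, struct log_snapshot, sample_log, dump_start) are hypothetical stand-ins rather than the printk code, and an atomic_flag spinlock replaces the kernel spinlock.

```c
#include <assert.h>
#include <stdatomic.h>

/* Shared state that a writer may update at any time (model only). */
static atomic_flag log_lock = ATOMIC_FLAG_INIT;
static unsigned long log_end_model;
static unsigned long logged_chars_model;

struct log_snapshot { unsigned long end, chars; };

/* Copy the globals into locals while holding the lock. */
struct log_snapshot sample_log(void)
{
    struct log_snapshot s;

    while (atomic_flag_test_and_set_explicit(&log_lock, memory_order_acquire))
        ;                           /* spin until the lock is free */
    s.end = log_end_model;
    s.chars = logged_chars_model;
    atomic_flag_clear_explicit(&log_lock, memory_order_release);
    return s;
}

/* Everything after the sample uses the stable copies, never the globals. */
unsigned long dump_start(void)
{
    struct log_snapshot s = sample_log();

    assert(s.chars <= s.end);
    return s.end - s.chars;
}
```

Reading log_end_model again after the unlock would reintroduce the inconsistency, because the two values could then come from different moments in time.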
* | | | stop_machine: struct cpu_stopper, remove alignment padding on 64 bitsRichard Kennedy2010-08-09
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Reorder elements in structure cpu_stopper to remove alignment padding on 64 bit builds, this shrinks its size from 40 to 32 bytes saving 8 bytes per cpu. Signed-off-by: Richard Kennedy <richard@rsk.demon.co.uk> Acked-by: Tejun Heo <tj@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
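The saving comes purely from member order. The sketch below uses stand-in member types (a 32-bit lock word, a two-pointer list head, a thread pointer, and a bool); the struct and field names are assumptions for illustration and are not copied from the kernel source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct list_model { void *next, *prev; };   /* stand-in for a list head */

/* 4-byte lock followed by pointer-aligned members, with the bool last:
 * padding is needed after `lock` and again after `enabled`. */
struct stopper_before {
    uint32_t lock;
    struct list_model works;
    void *thread;
    bool enabled;
};

/* Moving the bool next to the lock lets both share one aligned slot. */
struct stopper_after {
    uint32_t lock;
    bool enabled;
    struct list_model works;
    void *thread;
};

static_assert(sizeof(struct stopper_after) <= sizeof(struct stopper_before),
              "reordering must not grow the structure");

/* Bytes saved per instance by the reordering on this platform. */
size_t bytes_saved(void)
{
    return sizeof(struct stopper_before) - sizeof(struct stopper_after);
}
```

On a typical LP64 target the two layouts come to 40 and 32 bytes, matching the figures in the commit message; on 32-bit targets the sizes are usually equal, which is why the message speaks only of 64-bit builds.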
* | | | kernel/range: remove unused definition of ARRAY_SIZE()Geert Uytterhoeven2010-08-09
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Remove duplicate definition of ARRAY_SIZE(), which was never used anyway. Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | sys_personality: remove the bogus checks in sys_personality()->__set_personality() pathOleg Nesterov2010-08-09
| | | | Cleanup, no functional changes.
| | | |
| | | | - __set_personality() always changes ->exec_domain/personality; the special case when ->exec_domain remains the same buys nothing but complicates the code. Unify both cases to simplify the code.
| | | | - The -EINVAL check in sys_personality() was never right. If we assume that set_personality() can fail, we should check the value it returns instead of verifying that task->personality was actually changed. Remove it. Before the previous patch it was possible to hit this case due to overflow problems, but this -EINVAL just indicated a kernel bug.
| | | |
| | | | OTOH, it probably makes sense to change lookup_exec_domain() to return ERR_PTR() instead of default_exec_domain if the search in the exec_domains list fails, and to report this error to user space. But this means another user-space change, and we have in-kernel users which need fixes. For example, PER_OSF4 falls into PER_MASK for an unknown reason and nobody cares to register this domain.
| | | |
| | | | Signed-off-by: Oleg Nesterov <oleg@redhat.com>
| | | | Cc: Wenming Zhang <wezhang@redhat.com>
| | | | Cc: "H. Peter Anvin" <hpa@zytor.com>
| | | | Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
| | | | Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | hibernation: freeze swap at hibernationKAMEZAWA Hiroyuki2010-08-09
| | | | When taking a memory snapshot in hibernate_snapshot(), all (directly called) memory allocations use GFP_ATOMIC. Hence swap misuse during hibernation never occurs.
| | | |
| | | | But from a pessimistic point of view, there is no guarantee that no page allocation has __GFP_WAIT. It is better to have a global indication: "we are entering hibernation, don't use swap!".
| | | |
| | | | This patch tries to freeze new swap allocation during hibernation. (All user processes are frozen, so swapin is not a concern.)
| | | |
| | | | This way, no updates will happen to swap_map[] between hibernate_snapshot() and save_image(). Swap is thawed when swsusp_free() is called. We can be assured that swap corruption will not occur.
| | | |
| | | | Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
| | | | Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
| | | | Cc: Hugh Dickins <hughd@google.com>
| | | | Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
| | | | Cc: Ondrej Zary <linux@rainbow-software.org>
| | | | Cc: Balbir Singh <balbir@in.ibm.com>
| | | | Cc: Andrea Arcangeli <aarcange@redhat.com>
| | | | Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
| | | | Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>