aboutsummaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAge
* Merge branch 'for-linus' of ↵Linus Torvalds2012-10-12
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal Pull third pile of kernel_execve() patches from Al Viro: "The last bits of infrastructure for kernel_thread() et.al., with alpha/arm/x86 use of those. Plus sanitizing the asm glue and do_notify_resume() on alpha, fixing the "disabled irq while running task_work stuff" breakage there. At that point the rest of kernel_thread/kernel_execve/sys_execve work can be done independently for different architectures. The only pending bits that do depend on having all architectures converted are restrictred to fs/* and kernel/* - that'll obviously have to wait for the next cycle. I thought we'd have to wait for all of them done before we start eliminating the longjump-style insanity in kernel_execve(), but it turned out there's a very simple way to do that without flagday-style changes." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal: alpha: switch to saner kernel_execve() semantics arm: switch to saner kernel_execve() semantics x86, um: convert to saner kernel_execve() semantics infrastructure for saner ret_from_kernel_thread semantics make sure that kernel_thread() callbacks call do_exit() themselves make sure that we always have a return path from kernel_execve() ppc: eeh_event should just use kthread_run() don't bother with kernel_thread/kernel_execve for launching linuxrc alpha: get rid of switch_stack argument of do_work_pending() alpha: don't bother passing switch_stack separately from regs alpha: take SIGPENDING/NOTIFY_RESUME loop into signal.c alpha: simplify TIF_NEED_RESCHED handling
| * alpha: switch to saner kernel_execve() semanticsAl Viro2012-10-12
| | | | | | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * arm: switch to saner kernel_execve() semanticsAl Viro2012-10-12
| | | | | | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * x86, um: convert to saner kernel_execve() semanticsAl Viro2012-10-12
| | | | | | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * infrastructure for saner ret_from_kernel_thread semanticsAl Viro2012-10-12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * allow kernel_execve() leave the actual return to userland to caller (selected by CONFIG_GENERIC_KERNEL_EXECVE). Callers updated accordingly. * architecture that does select GENERIC_KERNEL_EXECVE in its Kconfig should have its ret_from_kernel_thread() do this: call schedule_tail call the callback left for it by copy_thread(); if it ever returns, that's because it has just done successful kernel_execve() jump to return from syscall IOW, its only difference from ret_from_fork() is that it does call the callback. * such an architecture should also get rid of ret_from_kernel_execve() and __ARCH_WANT_KERNEL_EXECVE This is the last part of infrastructure patches in that area - from that point on work on different architectures can live independently. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * make sure that kernel_thread() callbacks call do_exit() themselvesAl Viro2012-10-11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Most of them never returned anyway - only two functions had to be changed. That allows to simplify their callers a whole lot. Note that this does *not* apply to kthread_run() callbacks - all of those had been called from the same kernel_thread() callback, which did do_exit() already. This is strictly about very few low-level kernel_thread() callbacks (there are only 6 of those, mostly as part of kthread.h and kmod.h exported mechanisms, plus kernel_init() itself). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * make sure that we always have a return path from kernel_execve()Al Viro2012-10-11
| | | | | | | | | | | | | | | | | | | | | | The only place where kernel_execve() is called without a way to return to the caller of kernel_thread() callback is kernel_post(). Reorganize kernel_init()/kernel_post() - instead of the former calling the latter in the end (and getting freed by it), have the latter *begin* with calling the former (and turn the latter into kernel_thread() callback, of course). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * ppc: eeh_event should just use kthread_run()Al Viro2012-10-11
| | | | | | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * don't bother with kernel_thread/kernel_execve for launching linuxrcAl Viro2012-10-11
| | | | | | | | | | | | exec_usermodehelper_fns() will do just fine... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * alpha: get rid of switch_stack argument of do_work_pending()Al Viro2012-10-11
| | | | | | | | | | | | ... and now the asm glue side of that. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * alpha: don't bother passing switch_stack separately from regsAl Viro2012-10-11
| | | | | | | | | | | | | | | | It's needed only in setup_sigcontext() and it's always reg - <constant>; no point passing it all way down through the call chain. This is just the signal.c side of that stuff; next will come the asm glue one... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * alpha: take SIGPENDING/NOTIFY_RESUME loop into signal.cAl Viro2012-10-11
| | | | | | | | | | | | | | | | | | | | | | | | | | Turn the slow side of work_pending into C function, including all the looping. What we get out of that: * we do _not_ call get_signal_to_deliver() with IRQs disabled anymore * no need to save/restore volatiles on each pass if there turns to be more than one (unlikely, but still) * all double-restart prevention is in C now. * glue gets simpler. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * alpha: simplify TIF_NEED_RESCHED handlingAl Viro2012-10-11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In case we have both NEED_RESCHED and SIGPENDING/NOTIFY_RESUME, handle the latter first. We'll get to original priorities in the next commit, but now that allows to simplify the treatment of NEED_RESCHED-only case nicely. Namely, now there no need to preserve the data for restarts across the call of schedule() in $work_resched; we can get there only if we had either returned from syscall without SIGPENDING (in which case we should've had no restart-worthy return value and want no restarts) or already got through do_notify_resume() call (in which case we want no restarts anymore). So we can just slap 0 into $19 instead of preserving it (and $20). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | Merge branch 'for-linus' of ↵Linus Torvalds2012-10-12
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull third pile of VFS updates from Al Viro: "Stuff from Jeff Layton, mostly. Sanitizing interplay between audit and namei, removing a lot of insanity from audit_inode() mess and getting things ready for his ESTALE patchset." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: procfs: don't need a PATH_MAX allocation to hold a string representation of an int vfs: embed struct filename inside of names_cache allocation if possible audit: make audit_inode take struct filename vfs: make path_openat take a struct filename pointer vfs: turn do_path_lookup into wrapper around struct filename variant audit: allow audit code to satisfy getname requests from its names_list vfs: define struct filename and have getname() return it vfs: unexport getname and putname symbols acct: constify the name arg to acct_on vfs: allocate page instead of names_cache buffer in mount_block_root audit: overhaul __audit_inode_child to accomodate retrying audit: optimize audit_compare_dname_path audit: make audit_compare_dname_path use parent_len helper audit: remove dirlen argument to audit_compare_dname_path audit: set the name_len in audit_inode for parent lookups audit: add a new "type" field to audit_names struct audit: reverse arguments to audit_inode_child audit: no need to walk list in audit_inode if name is NULL audit: pass in dentry to audit_copy_inode wherever possible audit: remove unnecessary NULL ptr checks from do_path_lookup
| * | procfs: don't need a PATH_MAX allocation to hold a string representation of ↵Jeff Layton2012-10-12
| | | | | | | | | | | | | | | | | | | | | an int Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | vfs: embed struct filename inside of names_cache allocation if possibleJeff Layton2012-10-12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In the common case where a name is much smaller than PATH_MAX, an extra allocation for struct filename is unnecessary. Before allocating a separate one, try to embed the struct filename inside the buffer first. If it turns out that that's not long enough, then fall back to allocating a separate struct filename and redoing the copy. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | audit: make audit_inode take struct filenameJeff Layton2012-10-12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Keep a pointer to the audit_names "slot" in struct filename. Have all of the audit_inode callers pass a struct filename ponter to audit_inode instead of a string pointer. If the aname field is already populated, then we can skip walking the list altogether and just use it directly. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | vfs: make path_openat take a struct filename pointerJeff Layton2012-10-12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ...and fix up the callers. For do_file_open_root, just declare a struct filename on the stack and fill out the .name field. For do_filp_open, make it also take a struct filename pointer, and fix up its callers to call it appropriately. For filp_open, add a variant that takes a struct filename pointer and turn filp_open into a wrapper around it. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | vfs: turn do_path_lookup into wrapper around struct filename variantJeff Layton2012-10-12
| | | | | | | | | | | | | | | | | | | | | ...and make the user_path callers use that variant instead. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | audit: allow audit code to satisfy getname requests from its names_listJeff Layton2012-10-12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, if we call getname() on a userland string more than once, we'll get multiple copies of the string and multiple audit_names records. Add a function that will allow the audit_names code to satisfy getname requests using info from the audit_names list, avoiding a new allocation and audit_names records. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | vfs: define struct filename and have getname() return itJeff Layton2012-10-12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | getname() is intended to copy pathname strings from userspace into a kernel buffer. The result is just a string in kernel space. It would however be quite helpful to be able to attach some ancillary info to the string. For instance, we could attach some audit-related info to reduce the amount of audit-related processing needed. When auditing is enabled, we could also call getname() on the string more than once and not need to recopy it from userspace. This patchset converts the getname()/putname() interfaces to return a struct instead of a string. For now, the struct just tracks the string in kernel space and the original userland pointer for it. Later, we'll add other information to the struct as it becomes convenient. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | vfs: unexport getname and putname symbolsJeff Layton2012-10-12
| | | | | | | | | | | | | | | | | | | | | I see no callers in module code. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | acct: constify the name arg to acct_onJeff Layton2012-10-12
| | | | | | | | | | | | | | | Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | vfs: allocate page instead of names_cache buffer in mount_block_rootJeff Layton2012-10-12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | First, it's incorrect to call putname() after __getname_gfp() since the bare __getname_gfp() call skips the auditing code, while putname() doesn't. mount_block_root allocates a PATH_MAX buffer via __getname_gfp, and then calls get_fs_names to fill the buffer. That function can call get_filesystem_list which assumes that that buffer is a full page in size. On arches where PAGE_SIZE != 4k, then this could potentially overrun. In practice, it's hard to imagine the list of filesystem names even approaching 4k, but it's best to be safe. Just allocate a page for this purpose instead. With this, we can also remove the __getname_gfp() definition since there are no more callers. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | audit: overhaul __audit_inode_child to accomodate retryingJeff Layton2012-10-12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In order to accomodate retrying path-based syscalls, we need to add a new "type" argument to audit_inode_child. This will tell us whether we're looking for a child entry that represents a create or a delete. If we find a parent, don't automatically assume that we need to create a new entry. Instead, use the information we have to try to find an existing entry first. Update it if one is found and create a new one if not. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | audit: optimize audit_compare_dname_pathJeff Layton2012-10-12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In the cases where we already know the length of the parent, pass it as a parm so we don't need to recompute it. In the cases where we don't know the length, pass in AUDIT_NAME_FULL (-1) to indicate that it should be determined. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | audit: make audit_compare_dname_path use parent_len helperEric Paris2012-10-12
| | | | | | | | | | | | | | | Signed-off-by: Eric Paris <eparis@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | audit: remove dirlen argument to audit_compare_dname_pathJeff Layton2012-10-12
| | | | | | | | | | | | | | | | | | | | | All the callers set this to NULL now. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | audit: set the name_len in audit_inode for parent lookupsJeff Layton2012-10-12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, this gets set mostly by happenstance when we call into audit_inode_child. While that might be a little more efficient, it seems wrong. If the syscall ends up failing before audit_inode_child ever gets called, then you'll have an audit_names record that shows the full path but has the parent inode info attached. Fix this by passing in a parent flag when we call audit_inode that gets set to the value of LOOKUP_PARENT. We can then fix up the pathname for the audit entry correctly from the get-go. While we're at it, clean up the no-op macro for audit_inode in the !CONFIG_AUDITSYSCALL case. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | audit: add a new "type" field to audit_names structJeff Layton2012-10-12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | For now, we just have two possibilities: UNKNOWN: for a new audit_names record that we don't know anything about yet NORMAL: for everything else In later patches, we'll add other types so we can distinguish and update records created under different circumstances. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | audit: reverse arguments to audit_inode_childJeff Layton2012-10-12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Most of the callers get called with an inode and dentry in the reverse order. The compiler then has to reshuffle the arg registers and/or stack in order to pass them on to audit_inode_child. Reverse those arguments for a micro-optimization. Reported-by: Eric Paris <eparis@redhat.com> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | audit: no need to walk list in audit_inode if name is NULLJeff Layton2012-10-12
| | | | | | | | | | | | | | | | | | | | | | | | | | | If name is NULL then the condition in the loop will never be true. Also, with this change, we can eliminate the check for n->name == NULL since the equivalence check will never be true if it is. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | audit: pass in dentry to audit_copy_inode wherever possibleJeff Layton2012-10-12
| | | | | | | | | | | | | | | | | | | | | | | | In some cases, we were passing in NULL even when we have a dentry. Reported-by: Eric Paris <eparis@redhat.com> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | audit: remove unnecessary NULL ptr checks from do_path_lookupJeff Layton2012-10-12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As best I can tell, whenever retval == 0, nd->path.dentry and nd->inode are also non-NULL. Eliminate those checks and the superfluous audit_context check. Signed-off-by: Eric Paris <eparis@redhat.com> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | | Merge tag 'stable/for-linus-3.7-rc0-tag' of ↵Linus Torvalds2012-10-12
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen Pull Xen fixes from Konrad Rzeszutek Wilk: "This has four bug-fixes and one tiny feature that I forgot to put initially in my tree due to oversight. The feature is for kdump kernels to speed up the /proc/vmcore reading. There is a ram_is_pfn helper function that the different platforms can register for. We are now doing that. The bug-fixes cover some embarrassing struct pv_cpu_ops variables being set to NULL on Xen (but not baremetal). We had a similar issue in the past with {write|read}_msr_safe and this fills the three missing ones. The other bug-fix is to make the console output (hvc) be capable of dealing with misbehaving backends and not fall flat on its face. Lastly, a quirk for older XenBus implementations that came with an ancient v3.4 hypervisor (so RHEL5 based) - reading of certain non-existent attributes just hangs the guest during bootup - so we take precaution of not doing that on such older installations. Feature: - Register a pfn_is_ram helper to speed up reading of /proc/vmcore. Bug-fixes: - Three pvops call for Xen were undefined causing BUG_ONs. - Add a quirk so that the shutdown watches (used by kdump) are not used with older Xen (3.4). - Fix ungraceful state transition for the HVC console." * tag 'stable/for-linus-3.7-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen: xen/pv-on-hvm kexec: add quirk for Xen 3.4 and shutdown watches. xen/bootup: allow {read|write}_cr8 pvops call. xen/bootup: allow read_tscp call for Xen PV guests. xen pv-on-hvm: add pfn_is_ram helper for kdump xen/hvc: handle backend CLOSED without CLOSING
| * | | xen/pv-on-hvm kexec: add quirk for Xen 3.4 and shutdown watches.Konrad Rzeszutek Wilk2012-10-12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The commit 254d1a3f02ebc10ccc6e4903394d8d3f484f715e, titled "xen/pv-on-hvm kexec: shutdown watches from old kernel" assumes that the XenBus backend can deal with reading of values from: "control/platform-feature-xs_reset_watches": ... a patch for xenstored is required so that it accepts the XS_RESET_WATCHES request from a client (see changeset 23839:42a45baf037d in xen-unstable.hg). Without the patch for xenstored the registration of watches will fail and some features of a PVonHVM guest are not available. The guest is still able to boot, but repeated kexec boots will fail." Sadly this is not true when using a Xen 3.4 hypervisor and booting a PVHVM guest. We end up hanging at: err = xenbus_scanf(XBT_NIL, "control", "platform-feature-xs_reset_watches", "%d", &supported); This can easily be seen with guests hanging at xenbus_init: NX (Execute Disable) protection: active SMBIOS 2.4 present. DMI: Xen HVM domU, BIOS 3.4.0 05/13/2011 Hypervisor detected: Xen HVM Xen version 3.4. Xen Platform PCI: I/O protocol version 1 ... snip .. calling xenbus_init+0x0/0x27e @ 1 Reverting the commit or using the attached patch fixes the issue. This fix checks whether the hypervisor is older than 4.0 and if so does not try to perform the read. Fixes-Oracle-Bug: 14708233 CC: stable@vger.kernel.org Acked-by: Olaf Hering <olaf@aepfle.de> [v2: Added a comment in the source code] Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
| * | | xen/bootup: allow {read|write}_cr8 pvops call.Konrad Rzeszutek Wilk2012-10-12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We actually do not do anything about it. Just return a default value of zero and if the kernel tries to write anything but 0 we BUG_ON. This fixes the case when an user tries to suspend the machine and it blows up in save_processor_state b/c 'read_cr8' is set to NULL and we get: kernel BUG at /home/konrad/ssd/linux/arch/x86/include/asm/paravirt.h:100! invalid opcode: 0000 [#1] SMP Pid: 2687, comm: init.late Tainted: G O 3.6.0upstream-00002-gac264ac-dirty #4 Bochs Bochs RIP: e030:[<ffffffff814d5f42>] [<ffffffff814d5f42>] save_processor_state+0x212/0x270 .. snip.. Call Trace: [<ffffffff810733bf>] do_suspend_lowlevel+0xf/0xac [<ffffffff8107330c>] ? x86_acpi_suspend_lowlevel+0x10c/0x150 [<ffffffff81342ee2>] acpi_suspend_enter+0x57/0xd5 CC: stable@vger.kernel.org Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
| * | | xen/bootup: allow read_tscp call for Xen PV guests.Konrad Rzeszutek Wilk2012-10-12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The hypervisor will trap it. However without this patch, we would crash as the .read_tscp is set to NULL. This patch fixes it and sets it to the native_read_tscp call. CC: stable@vger.kernel.org Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
| * | | xen pv-on-hvm: add pfn_is_ram helper for kdumpOlaf Hering2012-10-04
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Register pfn_is_ram helper speed up reading /proc/vmcore in the kdump kernel. See commit message of 997c136f518c ("fs/proc/vmcore.c: add hook to read_from_oldmem() to check for non-ram pages") for details. It makes use of a new hvmop HVMOP_get_mem_type which was introduced in xen 4.2 (23298:26413986e6e0) and backported to 4.1.1. The new function is currently only enabled for reading /proc/vmcore. Later it will be used also for the kexec kernel. Since that requires more changes in the generic kernel make it static for the time being. Signed-off-by: Olaf Hering <olaf@aepfle.de> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
| * | | xen/hvc: handle backend CLOSED without CLOSINGDavid Vrabel2012-10-04
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Backend drivers shouldn't transistion to CLOSED unless the frontend is CLOSED. If a backend does transition to CLOSED too soon then the frontend may not see the CLOSING state and will not properly shutdown. So, treat an unexpected backend CLOSED state the same as CLOSING. Signed-off-by: David Vrabel <david.vrabel@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
* | | | Merge branch 'slab/urgent' of ↵Linus Torvalds2012-10-12
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/penberg/linux Pull SLAB fix from Pekka Enberg: "This contains a lockdep false positive fix from Jiri Kosina I missed from the previous pull request." * 'slab/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/linux: mm, slab: release slab_mutex earlier in kmem_cache_destroy()
| * | | | mm, slab: release slab_mutex earlier in kmem_cache_destroy()Jiri Kosina2012-10-10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 1331e7a1bbe1 ("rcu: Remove _rcu_barrier() dependency on __stop_machine()") introduced slab_mutex -> cpu_hotplug.lock dependency through kmem_cache_destroy() -> rcu_barrier() -> _rcu_barrier() -> get_online_cpus(). Lockdep thinks that this might actually result in ABBA deadlock, and reports it as below: === [ cut here ] === ====================================================== [ INFO: possible circular locking dependency detected ] 3.6.0-rc5-00004-g0d8ee37 #143 Not tainted ------------------------------------------------------- kworker/u:2/40 is trying to acquire lock: (rcu_sched_state.barrier_mutex){+.+...}, at: [<ffffffff810f2126>] _rcu_barrier+0x26/0x1e0 but task is already holding lock: (slab_mutex){+.+.+.}, at: [<ffffffff81176e15>] kmem_cache_destroy+0x45/0xe0 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #2 (slab_mutex){+.+.+.}: [<ffffffff810ae1e2>] validate_chain+0x632/0x720 [<ffffffff810ae5d9>] __lock_acquire+0x309/0x530 [<ffffffff810ae921>] lock_acquire+0x121/0x190 [<ffffffff8155d4cc>] __mutex_lock_common+0x5c/0x450 [<ffffffff8155d9ee>] mutex_lock_nested+0x3e/0x50 [<ffffffff81558cb5>] cpuup_callback+0x2f/0xbe [<ffffffff81564b83>] notifier_call_chain+0x93/0x140 [<ffffffff81076f89>] __raw_notifier_call_chain+0x9/0x10 [<ffffffff8155719d>] _cpu_up+0xba/0x14e [<ffffffff815572ed>] cpu_up+0xbc/0x117 [<ffffffff81ae05e3>] smp_init+0x6b/0x9f [<ffffffff81ac47d6>] kernel_init+0x147/0x1dc [<ffffffff8156ab44>] kernel_thread_helper+0x4/0x10 -> #1 (cpu_hotplug.lock){+.+.+.}: [<ffffffff810ae1e2>] validate_chain+0x632/0x720 [<ffffffff810ae5d9>] __lock_acquire+0x309/0x530 [<ffffffff810ae921>] lock_acquire+0x121/0x190 [<ffffffff8155d4cc>] __mutex_lock_common+0x5c/0x450 [<ffffffff8155d9ee>] mutex_lock_nested+0x3e/0x50 [<ffffffff81049197>] get_online_cpus+0x37/0x50 [<ffffffff810f21bb>] _rcu_barrier+0xbb/0x1e0 [<ffffffff810f22f0>] rcu_barrier_sched+0x10/0x20 [<ffffffff810f2309>] rcu_barrier+0x9/0x10 [<ffffffff8118c129>] deactivate_locked_super+0x49/0x90 [<ffffffff8118cc01>] deactivate_super+0x61/0x70 [<ffffffff811aaaa7>] mntput_no_expire+0x127/0x180 [<ffffffff811ab49e>] sys_umount+0x6e/0xd0 [<ffffffff81569979>] system_call_fastpath+0x16/0x1b -> #0 (rcu_sched_state.barrier_mutex){+.+...}: [<ffffffff810adb4e>] check_prev_add+0x3de/0x440 [<ffffffff810ae1e2>] validate_chain+0x632/0x720 [<ffffffff810ae5d9>] __lock_acquire+0x309/0x530 [<ffffffff810ae921>] lock_acquire+0x121/0x190 [<ffffffff8155d4cc>] __mutex_lock_common+0x5c/0x450 [<ffffffff8155d9ee>] mutex_lock_nested+0x3e/0x50 [<ffffffff810f2126>] _rcu_barrier+0x26/0x1e0 [<ffffffff810f22f0>] rcu_barrier_sched+0x10/0x20 [<ffffffff810f2309>] rcu_barrier+0x9/0x10 [<ffffffff81176ea1>] kmem_cache_destroy+0xd1/0xe0 [<ffffffffa04c3154>] nf_conntrack_cleanup_net+0xe4/0x110 [nf_conntrack] [<ffffffffa04c31aa>] nf_conntrack_cleanup+0x2a/0x70 [nf_conntrack] [<ffffffffa04c42ce>] nf_conntrack_net_exit+0x5e/0x80 [nf_conntrack] [<ffffffff81454b79>] ops_exit_list+0x39/0x60 [<ffffffff814551ab>] cleanup_net+0xfb/0x1b0 [<ffffffff8106917b>] process_one_work+0x26b/0x4c0 [<ffffffff81069f3e>] worker_thread+0x12e/0x320 [<ffffffff8106f73e>] kthread+0x9e/0xb0 [<ffffffff8156ab44>] kernel_thread_helper+0x4/0x10 other info that might help us debug this: Chain exists of: rcu_sched_state.barrier_mutex --> cpu_hotplug.lock --> slab_mutex Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(slab_mutex); lock(cpu_hotplug.lock); lock(slab_mutex); lock(rcu_sched_state.barrier_mutex); *** DEADLOCK *** === [ cut here ] === This is actually a false positive. Lockdep has no way of knowing the fact that the ABBA can actually never happen, because of special semantics of cpu_hotplug.refcount and its handling in cpu_hotplug_begin(); the mutual exclusion there is not achieved through mutex, but through cpu_hotplug.refcount. The "neither cpu_up() nor cpu_down() will proceed past cpu_hotplug_begin() until everyone who called get_online_cpus() will call put_online_cpus()" semantics is totally invisible to lockdep. This patch therefore moves the unlock of slab_mutex so that rcu_barrier() is being called with it unlocked. It has two advantages: - it slightly reduces hold time of slab_mutex; as it's used to protect the cachep list, it's not necessary to hold it over kmem_cache_free() call any more - it silences the lockdep false positive warning, as it avoids lockdep ever learning about slab_mutex -> cpu_hotplug.lock dependency Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Acked-by: David Rientjes <rientjes@google.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Pekka Enberg <penberg@kernel.org>
* | | | | Merge branch 'timers-core-for-linus' of ↵Linus Torvalds2012-10-12
|\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer core update from Thomas Gleixner: - Bug fixes (one for a longstanding dead loop issue) - Rework of time related vsyscalls - Alarm timer updates - Jiffies updates to remove compile time dependencies * 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: timekeeping: Cast raw_interval to u64 to avoid shift overflow timers: Fix endless looping between cascade() and internal_add_timer() time/jiffies: bring back unconditional LATCH definition time: Convert x86_64 to using new update_vsyscall time: Only do nanosecond rounding on GENERIC_TIME_VSYSCALL_OLD systems time: Introduce new GENERIC_TIME_VSYSCALL time: Convert CONFIG_GENERIC_TIME_VSYSCALL to CONFIG_GENERIC_TIME_VSYSCALL_OLD time: Move update_vsyscall definitions to timekeeper_internal.h time: Move timekeeper structure to timekeeper_internal.h for vsyscall changes jiffies: Remove compile time assumptions about CLOCK_TICK_RATE jiffies: Kill unused TICK_USEC_TO_NSEC alarmtimer: Rename alarmtimer_remove to alarmtimer_dequeue alarmtimer: Remove unused helpers & defines alarmtimer: Use hrtimer per-alarm instead of per-base alarmtimer: Implement minimum alarm interval for allowing suspend
| * | | | | timekeeping: Cast raw_interval to u64 to avoid shift overflowDan Carpenter2012-10-09
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We fixed a bunch of integer overflows in timekeeping code during the 3.6 cycle. I did an audit based on that and found this potential overflow. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: John Stultz <johnstul@us.ibm.com> Link: http://lkml.kernel.org/r/20121009071823.GA19159@elgon.mountain Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org
| * | | | | timers: Fix endless looping between cascade() and internal_add_timer()Hildner, Christian2012-10-09
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Adding two (or more) timers with large values for "expires" (they have to reside within tv5 in the same list) leads to endless looping between cascade() and internal_add_timer() in case CONFIG_BASE_SMALL is one and jiffies are crossing the value 1 << 18. The bug was introduced between 2.6.11 and 2.6.12 (and survived for quite some time). This patch ensures that when cascade() is called timers within tv5 are not added endlessly to their own list again, instead they are added to the next lower tv level tv4 (as expected). Signed-off-by: Christian Hildner <christian.hildner@siemens.com> Reviewed-by: Jan Kiszka <jan.kiszka@siemens.com> Link: http://lkml.kernel.org/r/98673C87CB31274881CFFE0B65ECC87B0F5FC1963E@DEFTHW99EA4MSX.ww902.siemens.net Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org
| * | | | | Merge branch 'fortglx/3.7/time' of git://git.linaro.org/people/jstultz/linux ↵Thomas Gleixner2012-10-09
| |\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | into timers/core
| | * \ \ \ \ Merge branch 'arnds-jiffies-fix' into fortglx/3.7/timeJohn Stultz2012-09-28
| | |\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Sort out conflict with Arnd's patch that preserves the unconditional LATCH value. Signed-off-by: John Stultz <john.stultz@linaro.org>
| | | * | | | | time/jiffies: bring back unconditional LATCH definitionArnd Bergmann2012-09-28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Patch a7ea3bbf5d "time/jiffies: Allow CLOCK_TICK_RATE to be undefined" breaks the compilation of targets that rely on the LATCH definition, because of recursive header file inclusion not defining CLOCK_TICK_RATE before it is checked here. This fixes the problem by moving LATCH back to where it was, but it seems that there are still cases where SHIFTED_HZ is defined incorrectly because of the same problem. Need to investigate further. Without this patch, building h7201_defconfig results in: arch/arm/mach-h720x/common.c: In function 'h720x_gettimeoffset': arch/arm/mach-h720x/common.c:50:73: error: 'LATCH' undeclared (first use in this function) arch/arm/mach-h720x/common.c:50:73: note: each undeclared identifier is reported only once for each function it appears in arch/arm/mach-h720x/common.c:51:1: warning: control reaches end of non-void function [-Wreturn-type] Signed-off-by: Arnd Bergmann <arnd@arndb.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: John Stultz <john.stultz@linaro.org> Cc: Ingo Molnar <mingo@kernel.org> Signed-off-by: John Stultz <john.stultz@linaro.org>
| | * | | | | | time: Convert x86_64 to using new update_vsyscallJohn Stultz2012-09-24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Switch x86_64 to using sub-ns precise vsyscall Cc: Tony Luck <tony.luck@intel.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Paul Turner <pjt@google.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: John Stultz <john.stultz@linaro.org>
| | * | | | | | time: Only do nanosecond rounding on GENERIC_TIME_VSYSCALL_OLD systemsJohn Stultz2012-09-24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We only do rounding to the next nanosecond so we don't see minor 1ns inconsistencies in the vsyscall implementations. Since we're changing the vsyscall implementations to avoid this, conditionalize the rounding only to the GENERIC_TIME_VSYSCALL_OLD architectures. Cc: Tony Luck <tony.luck@intel.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Paul Turner <pjt@google.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: John Stultz <john.stultz@linaro.org>