aboutsummaryrefslogtreecommitdiffstats
path: root/kernel
Commit message (Collapse)AuthorAge
...
| | * | | | | | | | | | | uprobes: Fix prepare_uprobe() race with itselfOleg Nesterov2012-10-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | install_breakpoint() is called under mm->mmap_sem, this protects set_swbp() but not prepare_uprobe(). Two or more different tasks can call install_breakpoint()->prepare_uprobe() at the same time, this leads to numerous problems if UPROBE_COPY_INSN is not set. Just for example, the second copy_insn() can corrupt the already analyzed/fixuped uprobe->arch.insn and race with handle_swbp(). This patch simply adds uprobe->copy_mutex to serialize this code. We could probably reuse ->consumer_rwsem, but this would mean that consumer->handler() can not use mm->mmap_sem, not good. Note: this is another temporary ugly hack until we move this logic into uprobe_register(). Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
| | * | | | | | | | | | | uprobes: Introduce prepare_uprobe()Oleg Nesterov2012-10-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Preparation. Extract the copy_insn/arch_uprobe_analyze_insn code from install_breakpoint() into the new helper, prepare_uprobe(). And move uprobe->flags defines from uprobes.h to uprobes.c, nobody else can use them anyway. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
| | * | | | | | | | | | | uprobes: Fix handle_swbp() vs unregister() + register() raceOleg Nesterov2012-10-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Strictly speaking this race was added by me in 56bb4cf6. However I think that this bug is just another indication that we should move copy_insn/uprobe_analyze_insn code from install_breakpoint() to uprobe_register(), there are a lot of other reasons for that. Until then, add a hack to close the race. A task can hit uprobe U1, but before it calls find_uprobe() this uprobe can be unregistered *AND* another uprobe U2 can be added to uprobes_tree at the same inode/offset. In this case handle_swbp() will use the not-fully-initialized U2, in particular its arch.insn for xol. Add the additional !UPROBE_COPY_INSN check into handle_swbp(), if this flag is not set we simply restart as if the new uprobe was not inserted yet. This is not very nice, we need barriers, but we will remove this hack when we change uprobe_register(). Note: with or without this patch install_breakpoint() can race with itself, yet another reson to kill UPROBE_COPY_INSN altogether. And even the usage of uprobe->flags is not safe. See the next patches. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
| | * | | | | | | | | | | uprobes: Do not delete uprobe if uprobe_unregister() failsOleg Nesterov2012-10-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | delete_uprobe() must not be called if register_for_each_vma(false) fails to remove all breakpoints, __uprobe_unregister() is correct. The problem is that register_for_each_vma(false) always returns 0 and thus this logic does not work. 1. Change verify_opcode() to return 0 rather than -EINVAL when unregister detects the !is_swbp insn, we can treat this case as success and currently unregister paths ignore the error code anyway. 2. Change remove_breakpoint() to propagate the error code from write_opcode(). 3. Change register_for_each_vma(is_register => false) to remove as much breakpoints as possible but return non-zero if remove_breakpoint() fails at least once. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
| | * | | | | | | | | | | uprobes: Don't return success if alloc_uprobe() failsOleg Nesterov2012-10-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If alloc_uprobe() fails uprobe_register() should return ENOMEM, not 0. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
| | * | | | | | | | | | | uprobes: Simplify is_swbp_at_addr(), remove stale commentsOleg Nesterov2012-09-29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | After the previous change is_swbp_at_addr() is always called with current->mm. Remove this check and move it close to its single caller. Also, remove the obsolete comment about is_swbp_at_addr() and uprobe_state.count. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
| | * | | | | | | | | | | uprobes: Kill set_orig_insn()->is_swbp_at_addr()Oleg Nesterov2012-09-29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Unlike set_swbp(), set_orig_insn()->is_swbp_at_addr() makes sense, although it can't prevent all confusions. But the usage of is_swbp_at_addr() is equally confusing, and it adds the extra get_user_pages() we can avoid. This patch removes set_orig_insn()->is_swbp_at_addr() but changes write_opcode() to do the necessary checks before replace_page(). Perhaps it also makes sense to ensure PAGE_MAPPING_ANON in unregister case. find_active_uprobe() becomes the only user of is_swbp_at_addr(), we can change its semantics. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
| | * | | | | | | | | | | uprobes: Introduce copy_opcode(), kill read_opcode()Oleg Nesterov2012-09-29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | No functional changes, preparations. 1. Extract the kmap-and-memcpy code from read_opcode() into the new trivial helper, copy_opcode(). The next patch will add another user. 2. read_opcode() becomes really trivial, fold it into its single caller, is_swbp_at_addr(). 3. Remove "auprobe" argument from write_opcode(), it is not used since f403072c6. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
| | * | | | | | | | | | | uprobes: Kill set_swbp()->is_swbp_at_addr()Oleg Nesterov2012-09-29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A separate patch for better documentation. set_swbp()->is_swbp_at_addr() is not needed for correctness, it is harmless to do the unnecessary __replace_page(old_page, new_page) when these 2 pages are identical. And it can not be counted as optimization. mmap/register races are very unlikely, while in the likely case is_swbp_at_addr() adds the extra get_user_pages() even if the caller is uprobe_mmap(current->mm) and returns false. Note also that the semantics/usage of is_swbp_at_addr() in uprobe.c is confusing. set_swbp() uses it to detect the case when this insn was already modified by uprobes, that is why it should always compare the opcode with UPROBE_SWBP_INSN even if the hardware (like powerpc) has other trap insns. It doesn't matter if this breakpoint was in fact installed by gdb or application itself, we are going to "steal" this breakpoint anyway and execute the original insn from vm_file even if it no longer matches the memory. OTOH, handle_swbp()->find_active_uprobe() uses is_swbp_at_addr() to figure out whether we need to send SIGTRAP or not if we can not find uprobe, so in this case it should return true for all trap variants, not only for UPROBE_SWBP_INSN. This patch removes set_swbp()->is_swbp_at_addr(), the next patches will remove it from set_orig_insn() which is similar to set_swbp() in this respect. So the only caller will be handle_swbp() and we can make its semantics clear. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
| | * | | | | | | | | | | uprobes: Restrict valid_vma(false) to skip VM_SHARED vmasOleg Nesterov2012-09-29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | valid_vma(false) ignores ->vm_flags, this is not actually right. We should never try to write into MAP_SHARED mapping, this can confuse an apllication which actually writes to ->vm_file. With this patch valid_vma(false) ignores VM_WRITE only but checks other (immutable) bits checked by valid_vma(true). This can also speedup uprobe_munmap() and uprobe_unregister(). Note: even after this patch _unregister can confuse the probed application if it does mprotect(PROT_WRITE) after _register and installs "int3", but this is hardly possible to avoid and this doesn't differ from gdb case. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
| | * | | | | | | | | | | uprobes: Change valid_vma() to demand VM_MAYEXEC rather than VM_EXECOleg Nesterov2012-09-29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | uprobe_register() or uprobe_mmap() requires VM_READ | VM_EXEC, this is not right. An apllication can do mprotect(PROT_EXEC) later and execute this code. Change valid_vma(is_register => true) to check VM_MAYEXEC instead. No need to check VM_MAYREAD, it is always set. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
| | * | | | | | | | | | | uprobes: Change write_opcode() to use FOLL_FORCEOleg Nesterov2012-09-29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | write_opcode()->get_user_pages() needs FOLL_FORCE to ensure we can read the page even if the probed task did mprotect(PROT_NONE) after uprobe_register(). Without FOLL_WRITE, FOLL_FORCE doesn't have any side effect but allows to read the !VM_READ memory. Otherwiese the subsequent uprobe_unregister()->set_orig_insn() fails and we leak "int3". If that task does mprotect(PROT_READ | EXEC) and execute the probed insn later it will be killed. Note: in fact this is also needed for _register, see the next patch. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
| | * | | | | | | | | | | uprobes: Move clear_thread_flag(TIF_UPROBE) to uprobe_notify_resume()Oleg Nesterov2012-09-29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Move clear_thread_flag(TIF_UPROBE) from do_notify_resume() to uprobe_notify_resume() for !CONFIG_UPROBES case. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
| | * | | | | | | | | | | uprobes: Kill UTASK_BP_HIT stateOleg Nesterov2012-09-29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Kill UTASK_BP_HIT state, it buys nothing but complicates the code. It is only used in uprobe_notify_resume() to decide who should be called, we can check utask->active_uprobe != NULL instead. And this allows us to simplify handle_swbp(), no need to clear utask->state. Likewise we could kill UTASK_SSTEP, but UTASK_BP_HIT is worse and imho should die. The problem is, it creates the special case when task->utask is NULL, we can't distinguish RUNNING and BP_HIT. With this patch utask == NULL always means RUNNING. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
| | * | | | | | | | | | | uprobes: Fix UPROBE_SKIP_SSTEP checks in handle_swbp()Oleg Nesterov2012-09-29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If handle_swbp()->add_utask() fails but UPROBE_SKIP_SSTEP is set, cleanup_ret: path do not restart the insn, this is wrong. Remove this check and add the additional label for can_skip_sstep() = T case. Note also that UPROBE_SKIP_SSTEP can be false positive, we simply can not trust it unless arch_uprobe_skip_sstep() was already called. Also, move another UPROBE_SKIP_SSTEP check before can_skip_sstep() into this helper, this looks more clean and understandable. Note: probably we should rename "skip" to "emulate" and I think that "clear UPROBE_SKIP_SSTEP" should be moved to arch_can_skip. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
| | * | | | | | | | | | | uprobes: Do not setup ->active_uprobe/state prematurelyOleg Nesterov2012-09-29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | handle_swbp() sets utask->active_uprobe before handler_chain(), and UTASK_SSTEP before pre_ssout(). This complicates the code for no reason, arch_ hooks or consumer->handler() should not (and can't) use this info. Change handle_swbp() to initialize them after pre_ssout(), and remove the no longer needed cleanup-utask code. Signed-off-by: Oleg Nesterov <oleg@redhat.com> cked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
| | * | | | | | | | | | | uprobes: Do not leak UTASK_BP_HIT if find_active_uprobe() failsOleg Nesterov2012-09-29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If handle_swbp()->find_active_uprobe() fails we return with utask->state = UTASK_BP_HIT. Change handle_swbp() to reset utask->state at the start. Note that we do this unconditionally, see the next patch(es). Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
* | | | | | | | | | | | | module_signing: fix printk format warningRandy Dunlap2012-10-22
|/ / / / / / / / / / / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix the warning: kernel/module_signing.c:195:2: warning: format '%lu' expects type 'long unsigned int', but argument 3 has type 'size_t' by using the proper 'z' modifier for printing a size_t. Signed-off-by: Randy Dunlap <rdunlap@xenotime.net> Cc: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | | | | | | | | | use clamp_t in UNAME26 fixKees Cook2012-10-19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The min/max call needed to have explicit types on some architectures (e.g. mn10300). Use clamp_t instead to avoid the warning: kernel/sys.c: In function 'override_release': kernel/sys.c:1287:10: warning: comparison of distinct pointer types lacks a cast [enabled by default] Reported-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | | | | | | | | | MODSIGN: Move the magic string to the end of a module and eliminate the searchDavid Howells2012-10-19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Emit the magic string that indicates a module has a signature after the signature data instead of before it. This allows module_sig_check() to be made simpler and faster by the elimination of the search for the magic string. Instead we just need to do a single memcmp(). This works because at the end of the signature data there is the fixed-length signature information block. This block then falls immediately prior to the magic number. From the contents of the information block, it is trivial to calculate the size of the signature data and thus the size of the actual module data. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | | | | | | | | | pidns: remove recursion from free_pid_ns()Cyrill Gorcunov2012-10-19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | free_pid_ns() operates in a recursive fashion: free_pid_ns(parent) put_pid_ns(parent) kref_put(&ns->kref, free_pid_ns); free_pid_ns thus if there was a huge nesting of namespaces the userspace may trigger avalanche calling of free_pid_ns leading to kernel stack exhausting and a panic eventually. This patch turns the recursion into an iterative loop. Based on a patch by Andrew Vagin. [akpm@linux-foundation.org: export put_pid_ns() to modules] Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Andrew Vagin <avagin@openvz.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Pavel Emelyanov <xemul@parallels.com> Cc: Greg KH <greg@kroah.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | | | | | | | | | kernel/sys.c: fix stack memory content leak via UNAME26Kees Cook2012-10-19
| |_|_|_|_|/ / / / / / |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Calling uname() with the UNAME26 personality set allows a leak of kernel stack contents. This fixes it by defensively calculating the length of copy_to_user() call, making the len argument unsigned, and initializing the stack buffer to zero (now technically unneeded, but hey, overkill). CVE-2012-0957 Reported-by: PaX Team <pageexec@freemail.hu> Signed-off-by: Kees Cook <keescook@chromium.org> Cc: Andi Kleen <ak@linux.intel.com> Cc: PaX Team <pageexec@freemail.hu> Cc: Brad Spengler <spender@grsecurity.net> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | | | | | | | | printk: Fix scheduling-while-atomic problem in console_cpu_notify()Paul E. McKenney2012-10-16
| |_|/ / / / / / / / |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The console_cpu_notify() function runs with interrupts disabled in the CPU_DYING case. It therefore cannot block, for example, as will happen when it calls console_lock(). Therefore, remove the CPU_DYING leg of the switch statement to avoid this problem. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | | | | | | | Merge branch 'modules-next' of ↵Linus Torvalds2012-10-14
|\ \ \ \ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux Pull module signing support from Rusty Russell: "module signing is the highlight, but it's an all-over David Howells frenzy..." Hmm "Magrathea: Glacier signing key". Somebody has been reading too much HHGTTG. * 'modules-next' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux: (37 commits) X.509: Fix indefinite length element skip error handling X.509: Convert some printk calls to pr_devel asymmetric keys: fix printk format warning MODSIGN: Fix 32-bit overflow in X.509 certificate validity date checking MODSIGN: Make mrproper should remove generated files. MODSIGN: Use utf8 strings in signer's name in autogenerated X.509 certs MODSIGN: Use the same digest for the autogen key sig as for the module sig MODSIGN: Sign modules during the build process MODSIGN: Provide a script for generating a key ID from an X.509 cert MODSIGN: Implement module signature checking MODSIGN: Provide module signing public keys to the kernel MODSIGN: Automatically generate module signing keys if missing MODSIGN: Provide Kconfig options MODSIGN: Provide gitignore and make clean rules for extra files MODSIGN: Add FIPS policy module: signature checking hook X.509: Add a crypto key parser for binary (DER) X.509 certificates MPILIB: Provide a function to read raw data into an MPI X.509: Add an ASN.1 decoder X.509: Add simple ASN.1 grammar compiler ...
| * | | | | | | | | | MODSIGN: Make mrproper should remove generated files.Rusty Russell2012-10-10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It doesn't, because the clean targets don't include kernel/Makefile, and because two files were missing from the list. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
| * | | | | | | | | | MODSIGN: Use utf8 strings in signer's name in autogenerated X.509 certsDavid Howells2012-10-10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Place an indication that the certificate should use utf8 strings into the x509.genkey template generated by kernel/Makefile. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
| * | | | | | | | | | MODSIGN: Use the same digest for the autogen key sig as for the module sigDavid Howells2012-10-10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Use the same digest type for the autogenerated key signature as for the module signature so that the hash algorithm is guaranteed to be present in the kernel. Without this, the X.509 certificate loader may reject the X.509 certificate so generated because it was self-signed and the signature will be checked against itself - but this won't work if the digest algorithm must be loaded as a module. The symptom is that the key fails to load with the following message emitted into the kernel log: MODSIGN: Problem loading in-kernel X.509 certificate (-65) the error in brackets being -ENOPKG. What you should see is something like: MODSIGN: Loaded cert 'Magarathea: Glacier signing key: 9588321144239a119d3406d4c4cf1fbae1836fa0' Note that this doesn't apply to certificates that are not self-signed as we don't check those currently as they require the parent CA certificate to be available. Reported-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
| * | | | | | | | | | MODSIGN: Implement module signature checkingDavid Howells2012-10-10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Check the signature on the module against the keys compiled into the kernel or available in a hardware key store. Currently, only RSA keys are supported - though that's easy enough to change, and the signature is expected to contain raw components (so not a PGP or PKCS#7 formatted blob). The signature blob is expected to consist of the following pieces in order: (1) The binary identifier for the key. This is expected to match the SubjectKeyIdentifier from an X.509 certificate. Only X.509 type identifiers are currently supported. (2) The signature data, consisting of a series of MPIs in which each is in the format of a 2-byte BE word sizes followed by the content data. (3) A 12 byte information block of the form: struct module_signature { enum pkey_algo algo : 8; enum pkey_hash_algo hash : 8; enum pkey_id_type id_type : 8; u8 __pad; __be32 id_length; __be32 sig_length; }; The three enums are defined in crypto/public_key.h. 'algo' contains the public-key algorithm identifier (0->DSA, 1->RSA). 'hash' contains the digest algorithm identifier (0->MD4, 1->MD5, 2->SHA1, etc.). 'id_type' contains the public-key identifier type (0->PGP, 1->X.509). '__pad' should be 0. 'id_length' should contain in the binary identifier length in BE form. 'sig_length' should contain in the signature data length in BE form. The lengths are in BE order rather than CPU order to make dealing with cross-compilation easier. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (minor Kconfig fix)
| * | | | | | | | | | MODSIGN: Provide module signing public keys to the kernelDavid Howells2012-10-10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Include a PGP keyring containing the public keys required to perform module verification in the kernel image during build and create a special keyring during boot which is then populated with keys of crypto type holding the public keys found in the PGP keyring. These can be seen by root: [root@andromeda ~]# cat /proc/keys 07ad4ee0 I----- 1 perm 3f010000 0 0 crypto modsign.0: RSA 87b9b3bd [] 15c7f8c3 I----- 1 perm 1f030000 0 0 keyring .module_sign: 1/4 ... It is probably worth permitting root to invalidate these keys, resulting in their removal and preventing further modules from being loaded with that key. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
| * | | | | | | | | | MODSIGN: Automatically generate module signing keys if missingDavid Howells2012-10-10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Automatically generate keys for module signing if they're absent so that allyesconfig doesn't break. The builder should consider generating their own key and certificate, however, so that the keys are appropriately named. The private key for the module signer should be placed in signing_key.priv (unencrypted!) and the public key in an X.509 certificate as signing_key.x509. If a transient key is desired for signing the modules, a config file for 'openssl req' can be placed in x509.genkey, looking something like the following: [ req ] default_bits = 4096 distinguished_name = req_distinguished_name prompt = no x509_extensions = myexts [ req_distinguished_name ] O = Magarathea CN = Glacier signing key emailAddress = slartibartfast@magrathea.h2g2 [ myexts ] basicConstraints=critical,CA:FALSE keyUsage=digitalSignature subjectKeyIdentifier=hash authorityKeyIdentifier=hash The build process will use this to configure: openssl req -new -nodes -utf8 -sha1 -days 36500 -batch \ -x509 -config x509.genkey \ -outform DER -out signing_key.x509 \ -keyout signing_key.priv to generate the key. Note that it is required that the X.509 certificate have a subjectKeyIdentifier and an authorityKeyIdentifier. Without those, the certificate will be rejected. These can be used to check the validity of a certificate. Note that 'make distclean' will remove signing_key.{priv,x509} and x509.genkey, whether or not they were generated automatically. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
| * | | | | | | | | | MODSIGN: Add FIPS policyDavid Howells2012-10-10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If we're in FIPS mode, we should panic if we fail to verify the signature on a module or we're asked to load an unsigned module in signature enforcing mode. Possibly FIPS mode should automatically enable enforcing mode. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
| * | | | | | | | | | module: signature checking hookRusty Russell2012-10-10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We do a very simple search for a particular string appended to the module (which is cache-hot and about to be SHA'd anyway). There's both a config option and a boot parameter which control whether we accept or fail with unsigned modules and modules that are signed with an unknown key. If module signing is enabled, the kernel will be tainted if a module is loaded that is unsigned or has a signature for which we don't have the key. (Useful feedback and tweaks by David Howells <dhowells@redhat.com>) Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
| * | | | | | | | | | module: wait when loading a module which is currently initializing.Rusty Russell2012-09-28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The original module-init-tools module loader used a fnctl lock on the .ko file to avoid attempts to simultaneously load a module. Unfortunately, you can't get an exclusive fcntl lock on a read-only fd, making this not work for read-only mounted filesystems. module-init-tools has a hacky sleep-and-loop for this now. It's not that hard to wait in the kernel, and only return -EEXIST once the first module has finished loading (or continue loading the module if the first one failed to initialize for some reason). It's also consistent with what we do for dependent modules which are still loading. Suggested-by: Lucas De Marchi <lucas.demarchi@profusion.mobi> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
| * | | | | | | | | | module: fix symbol waiting when module fails before initRusty Russell2012-09-28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We use resolve_symbol_wait(), which blocks if the module containing the symbol is still loading. However: 1) The module_wq we use is only woken after calling the modules' init function, but there are other failure paths after the module is placed in the linked list where we need to do the same thing. 2) wake_up() only wakes one waiter, and our waitqueue is shared by all modules, so we need to wake them all. 3) wake_up_all() doesn't imply a memory barrier: I feel happier calling it after we've grabbed and dropped the module_mutex, not just after the state assignment. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
| * | | | | | | | | | Make most arch asm/module.h files use asm-generic/module.hDavid Howells2012-09-28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Use the mapping of Elf_[SPE]hdr, Elf_Addr, Elf_Sym, Elf_Dyn, Elf_Rel/Rela, ELF_R_TYPE() and ELF_R_SYM() to either the 32-bit version or the 64-bit version into asm-generic/module.h for all arches bar MIPS. Also, use the generic definition mod_arch_specific where possible. To this end, I've defined three new config bools: (*) HAVE_MOD_ARCH_SPECIFIC Arches define this if they don't want to use the empty generic mod_arch_specific struct. (*) MODULES_USE_ELF_RELA Arches define this if their modules can contain RELA records. This causes the Elf_Rela mapping to be emitted and allows apply_relocate_add() to be defined by the arch rather than have the core emit an error message. (*) MODULES_USE_ELF_REL Arches define this if their modules can contain REL records. This causes the Elf_Rel mapping to be emitted and allows apply_relocate() to be defined by the arch rather than have the core emit an error message. Note that it is possible to allow both REL and RELA records: m68k and mips are two arches that do this. With this, some arch asm/module.h files can be deleted entirely and replaced with a generic-y marker in the arch Kbuild file. Additionally, I have removed the bits from m32r and score that handle the unsupported type of relocation record as that's now handled centrally. Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Sam Ravnborg <sam@ravnborg.org> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
| * | | | | | | | | | module: taint kernel when lve module is loadedMatthew Garrett2012-09-28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Cloudlinux have a product called lve that includes a kernel module. This was previously GPLed but is now under a proprietary license, but the module continues to declare MODULE_LICENSE("GPL") and makes use of some EXPORT_SYMBOL_GPL symbols. Forcibly taint it in order to avoid this. Signed-off-by: Matthew Garrett <mjg59@srcf.ucam.org> Cc: Alex Lyashkov <umka@cloudlinux.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Cc: stable@kernel.org
* | | | | | | | | | | Merge tag 'for_linus-3.7' of ↵Linus Torvalds2012-10-12
|\ \ \ \ \ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/jwessel/kgdb Pull KGDB/KDB fixes and cleanups from Jason Wessel: "Cleanups - Clean up compile warnings in kgdboc.c and x86/kernel/kgdb.c - Add module event hooks for simplified debugging with gdb Fixes - Fix kdb to stop paging with 'q' on bta and dmesg - Fix for data that scrolls off the vga console due to line wrapping when using the kdb pager New - The debug core registers for kernel module events which allows a kernel aware gdb to automatically load symbols and break on entry to a kernel module - Allow kgdboc=kdb to setup kdb on the vga console" * tag 'for_linus-3.7' of git://git.kernel.org/pub/scm/linux/kernel/git/jwessel/kgdb: tty/console: fix warnings in drivers/tty/serial/kgdboc.c kdb,vt_console: Fix missed data due to pager overruns kdb: Fix dmesg/bta scroll to quit with 'q' kgdboc: Accept either kbd or kdb to activate the vga + keyboard kdb shell kgdb,x86: fix warning about unused variable mips,kgdb: fix recursive page fault with CONFIG_KPROBES kgdb: Add module event hooks
| * | | | | | | | | | | kdb,vt_console: Fix missed data due to pager overrunsJason Wessel2012-10-12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It is possible to miss data when using the kdb pager. The kdb pager does not pay attention to the maximum column constraint of the screen or serial terminal. This result is not incrementing the shown lines correctly and the pager will print more lines that fit on the screen. Obviously that is less than useful when using a VGA console where you cannot scroll back. The pager will now look at the kdb_buffer string to see how many characters are printed. It might not be perfect considering you can output ASCII that might move the cursor position, but it is a substantially better approximation for viewing dmesg and trace logs. This also means that the vt screen needs to set the kdb COLUMNS variable. Cc: <stable@vger.kernel.org> Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
| * | | | | | | | | | | kdb: Fix dmesg/bta scroll to quit with 'q'Jason Wessel2012-10-12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If you press 'q' the pager should exit instead of printing everything from dmesg which can really bog down a 9600 baud serial link. The same is true for the bta command. Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
| * | | | | | | | | | | kgdb: Add module event hooksJason Wessel2012-10-12
| | |_|_|_|/ / / / / / | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Allow gdb to auto load kernel modules when it is attached, which makes it trivially easy to debug module init functions or pre-set breakpoints in a kernel module that has not loaded yet. Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
* | | | | | | | | | | Merge branch 'perf-urgent-for-linus' of ↵Linus Torvalds2012-10-12
|\ \ \ \ \ \ \ \ \ \ \ | | |_|_|/ / / / / / / | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf updates from Ingo Molnar: "This tree includes some late late perf items that missed the first round: tools: - Bash auto completion improvements, now we can auto complete the tools long options, tracepoint event names, etc, from Namhyung Kim. - Look up thread using tid instead of pid in 'perf sched'. - Move global variables into a perf_kvm struct, from David Ahern. - Hists refactorings, preparatory for improved 'diff' command, from Jiri Olsa. - Hists refactorings, preparatory for event group viewieng work, from Namhyung Kim. - Remove double negation on optional feature macro definitions, from Namhyung Kim. - Remove several cases of needless global variables, on most builtins. - misc fixes kernel: - sysfs support for IBS on AMD CPUs, from Robert Richter. - Support for an upcoming Intel CPU, the Xeon-Phi / Knights Corner HPC blade PMU, from Vince Weaver. - misc fixes" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (46 commits) perf: Fix perf_cgroup_switch for sw-events perf: Clarify perf_cpu_context::active_pmu usage by renaming it to ::unique_pmu perf/AMD/IBS: Add sysfs support perf hists: Add more helpers for hist entry stat perf hists: Move he->stat.nr_events initialization to a template perf hists: Introduce struct he_stat perf diff: Removing the total_period argument from output code perf tool: Add hpp interface to enable/disable hpp column perf tools: Removing hists pair argument from output path perf hists: Separate overhead and baseline columns perf diff: Refactor diff displacement possition info perf hists: Add struct hists pointer to struct hist_entry perf tools: Complete tracepoint event names perf/x86: Add support for Intel Xeon-Phi Knights Corner PMU perf evlist: Remove some unused methods perf evlist: Introduce add_newtp method perf kvm: Move global variables into a perf_kvm struct perf tools: Convert to BACKTRACE_SUPPORT perf tools: Long option completion support for each subcommands perf tools: Complete long option names of perf command ...
| * | | | | | | | | | perf: Fix perf_cgroup_switch for sw-eventsPeter Zijlstra2012-10-05
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Jiri reported that he could trigger the WARN_ON_ONCE() in perf_cgroup_switch() using sw-events. This is because sw-events share a cpuctx with multiple PMUs. Use the ->unique_pmu pointer to limit the pmu iteration to unique cpuctx instances. Reported-and-Tested-by: Jiri Olsa <jolsa@redhat.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/n/tip-so7wi2zf3jjzrwcutm2mkz0j@git.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | | | | | | | | | perf: Clarify perf_cpu_context::active_pmu usage by renaming it to ::unique_pmuPeter Zijlstra2012-10-05
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Stephane thought the perf_cpu_context::active_pmu name confusing and suggested using 'unique_pmu' instead. This pointer is a pointer to a 'random' pmu sharing the cpuctx instance, therefore limiting a for_each_pmu loop to those where cpuctx->unique_pmu matches the pmu we get a loop over unique cpuctx instances. Suggested-by: Stephane Eranian <eranian@google.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/n/tip-kxyjqpfj2fn9gt7kwu5ag9ks@git.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
* | | | | | | | | | | Merge branch 'for-linus' of ↵Linus Torvalds2012-10-12
|\ \ \ \ \ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal Pull third pile of kernel_execve() patches from Al Viro: "The last bits of infrastructure for kernel_thread() et.al., with alpha/arm/x86 use of those. Plus sanitizing the asm glue and do_notify_resume() on alpha, fixing the "disabled irq while running task_work stuff" breakage there. At that point the rest of kernel_thread/kernel_execve/sys_execve work can be done independently for different architectures. The only pending bits that do depend on having all architectures converted are restrictred to fs/* and kernel/* - that'll obviously have to wait for the next cycle. I thought we'd have to wait for all of them done before we start eliminating the longjump-style insanity in kernel_execve(), but it turned out there's a very simple way to do that without flagday-style changes." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal: alpha: switch to saner kernel_execve() semantics arm: switch to saner kernel_execve() semantics x86, um: convert to saner kernel_execve() semantics infrastructure for saner ret_from_kernel_thread semantics make sure that kernel_thread() callbacks call do_exit() themselves make sure that we always have a return path from kernel_execve() ppc: eeh_event should just use kthread_run() don't bother with kernel_thread/kernel_execve for launching linuxrc alpha: get rid of switch_stack argument of do_work_pending() alpha: don't bother passing switch_stack separately from regs alpha: take SIGPENDING/NOTIFY_RESUME loop into signal.c alpha: simplify TIF_NEED_RESCHED handling
| * | | | | | | | | | | infrastructure for saner ret_from_kernel_thread semanticsAl Viro2012-10-12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * allow kernel_execve() leave the actual return to userland to caller (selected by CONFIG_GENERIC_KERNEL_EXECVE). Callers updated accordingly. * architecture that does select GENERIC_KERNEL_EXECVE in its Kconfig should have its ret_from_kernel_thread() do this: call schedule_tail call the callback left for it by copy_thread(); if it ever returns, that's because it has just done successful kernel_execve() jump to return from syscall IOW, its only difference from ret_from_fork() is that it does call the callback. * such an architecture should also get rid of ret_from_kernel_execve() and __ARCH_WANT_KERNEL_EXECVE This is the last part of infrastructure patches in that area - from that point on work on different architectures can live independently. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | | | | | | | | | | make sure that kernel_thread() callbacks call do_exit() themselvesAl Viro2012-10-11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Most of them never returned anyway - only two functions had to be changed. That allows to simplify their callers a whole lot. Note that this does *not* apply to kthread_run() callbacks - all of those had been called from the same kernel_thread() callback, which did do_exit() already. This is strictly about very few low-level kernel_thread() callbacks (there are only 6 of those, mostly as part of kthread.h and kmod.h exported mechanisms, plus kernel_init() itself). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | | | | | | | | | | | Merge branch 'for-linus' of ↵Linus Torvalds2012-10-12
|\ \ \ \ \ \ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull third pile of VFS updates from Al Viro: "Stuff from Jeff Layton, mostly. Sanitizing interplay between audit and namei, removing a lot of insanity from audit_inode() mess and getting things ready for his ESTALE patchset." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: procfs: don't need a PATH_MAX allocation to hold a string representation of an int vfs: embed struct filename inside of names_cache allocation if possible audit: make audit_inode take struct filename vfs: make path_openat take a struct filename pointer vfs: turn do_path_lookup into wrapper around struct filename variant audit: allow audit code to satisfy getname requests from its names_list vfs: define struct filename and have getname() return it vfs: unexport getname and putname symbols acct: constify the name arg to acct_on vfs: allocate page instead of names_cache buffer in mount_block_root audit: overhaul __audit_inode_child to accomodate retrying audit: optimize audit_compare_dname_path audit: make audit_compare_dname_path use parent_len helper audit: remove dirlen argument to audit_compare_dname_path audit: set the name_len in audit_inode for parent lookups audit: add a new "type" field to audit_names struct audit: reverse arguments to audit_inode_child audit: no need to walk list in audit_inode if name is NULL audit: pass in dentry to audit_copy_inode wherever possible audit: remove unnecessary NULL ptr checks from do_path_lookup
| * | | | | | | | | | | | audit: make audit_inode take struct filenameJeff Layton2012-10-12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Keep a pointer to the audit_names "slot" in struct filename. Have all of the audit_inode callers pass a struct filename ponter to audit_inode instead of a string pointer. If the aname field is already populated, then we can skip walking the list altogether and just use it directly. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | | | | | | | | | | | vfs: make path_openat take a struct filename pointerJeff Layton2012-10-12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ...and fix up the callers. For do_file_open_root, just declare a struct filename on the stack and fill out the .name field. For do_filp_open, make it also take a struct filename pointer, and fix up its callers to call it appropriately. For filp_open, add a variant that takes a struct filename pointer and turn filp_open into a wrapper around it. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | | | | | | | | | | | audit: allow audit code to satisfy getname requests from its names_listJeff Layton2012-10-12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, if we call getname() on a userland string more than once, we'll get multiple copies of the string and multiple audit_names records. Add a function that will allow the audit_names code to satisfy getname requests using info from the audit_names list, avoiding a new allocation and audit_names records. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>