aboutsummaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAge
...
| * | | | x86/irq: Reorganize the search in assign_irq_vectorThomas Gleixner2016-01-15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Split out the code which advances the target cpu for the search so we can reuse it for the next patch which adds an early validation check for the vectormask which we get from the apic. Add comments while at it. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Borislav Petkov <bp@alien8.de> Tested-by: Joe Lawrence <joe.lawrence@stratus.com> Cc: Jiang Liu <jiang.liu@linux.intel.com> Cc: Jeremiah Mahler <jmmahler@gmail.com> Cc: andy.shevchenko@gmail.com Cc: Guenter Roeck <linux@roeck-us.net> Cc: stable@vger.kernel.org #4.3+ Link: http://lkml.kernel.org/r/20151231160106.484562040@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * | | | x86/irq: Reorganize the return path in assign_irq_vectorThomas Gleixner2016-01-15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Use an explicit goto for the cases where we have success in the search/update and return -ENOSPC if the search loop ends due to no space. Preparatory patch for fixes. No functional change. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Borislav Petkov <bp@alien8.de> Tested-by: Joe Lawrence <joe.lawrence@stratus.com> Cc: Jiang Liu <jiang.liu@linux.intel.com> Cc: Jeremiah Mahler <jmmahler@gmail.com> Cc: andy.shevchenko@gmail.com Cc: Guenter Roeck <linux@roeck-us.net> Cc: stable@vger.kernel.org #4.3+ Link: http://lkml.kernel.org/r/20151231160106.403491024@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * | | | x86/irq: Do not use apic_chip_data.old_domain as temporary bufferJiang Liu2016-01-15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Function __assign_irq_vector() makes use of apic_chip_data.old_domain as a temporary buffer, which is in the way of using apic_chip_data.old_domain for synchronizing the vector cleanup with the vector assignement code. Use a proper temporary cpumask for this. [ tglx: Renamed the mask to searched_cpumask for clarity ] Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com> Tested-by: Borislav Petkov <bp@alien8.de> Tested-by: Joe Lawrence <joe.lawrence@stratus.com> Cc: Jeremiah Mahler <jmmahler@gmail.com> Cc: andy.shevchenko@gmail.com Cc: Guenter Roeck <linux@roeck-us.net> Cc: stable@vger.kernel.org #4.3+ Link: http://lkml.kernel.org/r/1450880014-11741-1-git-send-email-jiang.liu@linux.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * | | | x86/irq: Validate that irq descriptor is still activeThomas Gleixner2016-01-15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In fixup_irqs() we unconditionally dereference the irq chip of an irq descriptor. The descriptor might still be valid, but already cleaned up, i.e. the chip removed. Add a check for this condition. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Jiang Liu <jiang.liu@linux.intel.com> Cc: Joe Lawrence <joe.lawrence@stratus.com> Cc: Jeremiah Mahler <jmmahler@gmail.com> Cc: Borislav Petkov <bp@alien8.de> Cc: andy.shevchenko@gmail.com Cc: Guenter Roeck <linux@roeck-us.net> Cc: stable@vger.kernel.org #4.3+ Link: http://lkml.kernel.org/r/20151231160106.236423282@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * | | | x86/irq: Fix a race in x86_vector_free_irqs()Jiang Liu2016-01-15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There's a race condition between x86_vector_free_irqs() { free_apic_chip_data(irq_data->chip_data); xxxxx //irq_data->chip_data has been freed, but the pointer //hasn't been reset yet irq_domain_reset_irq_data(irq_data); } and smp_irq_move_cleanup_interrupt() { raw_spin_lock(&vector_lock); data = apic_chip_data(irq_desc_get_irq_data(desc)); access data->xxxx // may access freed memory raw_spin_unlock(&desc->lock); } which may cause smp_irq_move_cleanup_interrupt() to access freed memory. Call irq_domain_reset_irq_data(), which clears the pointer with vector lock held. [ tglx: Free memory outside of lock held region. ] Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com> Tested-by: Borislav Petkov <bp@alien8.de> Tested-by: Joe Lawrence <joe.lawrence@stratus.com> Cc: Jeremiah Mahler <jmmahler@gmail.com> Cc: andy.shevchenko@gmail.com Cc: Guenter Roeck <linux@roeck-us.net> Cc: stable@vger.kernel.org #4.3+ Link: http://lkml.kernel.org/r/1450880014-11741-3-git-send-email-jiang.liu@linux.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * | | | x86/irq: Call chip->irq_set_affinity in proper contextThomas Gleixner2016-01-15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | setup_ioapic_dest() calls irqchip->irq_set_affinity() completely unprotected. That's wrong in several aspects: - it opens a race window where irq_set_affinity() can be interrupted and the irq chip left in unconsistent state. - it triggers a lockdep splat when we fix the vector race for 4.3+ because vector lock is taken with interrupts enabled. The proper calling convention is irq descriptor lock held and interrupts disabled. Reported-and-tested-by: Borislav Petkov <bp@alien8.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Jiang Liu <jiang.liu@linux.intel.com> Cc: Jeremiah Mahler <jmmahler@gmail.com> Cc: andy.shevchenko@gmail.com Cc: Guenter Roeck <linux@roeck-us.net> Cc: Joe Lawrence <joe.lawrence@stratus.com> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1601140919420.3575@nanos Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
* | | | | Merge branch 'timers-urgent-for-linus' of ↵Linus Torvalds2016-01-31
|\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer fixes from Thomas Gleixner: "The timer departement delivers: - a regression fix for the NTP code along with a proper selftest - prevent a spurious timer interrupt in the NOHZ lowres code - a fix for user space interfaces returning the remaining time on architectures with CONFIG_TIME_LOW_RES=y - a few patches to fix COMPILE_TEST fallout" * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: tick/nohz: Set the correct expiry when switching to nohz/lowres mode clocksource: Fix dependencies for archs w/o HAS_IOMEM clocksource: Select CLKSRC_MMIO where needed tick/sched: Hide unused oneshot timer code kselftests: timers: Add adjtimex SETOFFSET validity tests ntp: Fix ADJ_SETOFFSET being used w/ ADJ_NANO itimers: Handle relative timers with CONFIG_TIME_LOW_RES proper posix-timers: Handle relative timers with CONFIG_TIME_LOW_RES proper timerfd: Handle relative timers with CONFIG_TIME_LOW_RES proper hrtimer: Handle remaining time proper for TIME_LOW_RES clockevents/tcb_clksrc: Prevent disabling an already disabled clock
| * | | | | tick/nohz: Set the correct expiry when switching to nohz/lowres modeWanpeng Li2016-01-27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 0ff53d096422 sets the next tick interrupt to the last jiffies update, i.e. in the past, because the forward operation is invoked before the set operation. There is no resulting damage (yet), but we get an extra pointless tick interrupt. Revert the order so we get the next tick interrupt in the future. Fixes: commit 0ff53d096422 "tick: sched: Force tick interrupt and get rid of softirq magic" Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/1453893967-3458-1-git-send-email-wanpeng.li@hotmail.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * | | | | clocksource: Fix dependencies for archs w/o HAS_IOMEMRichard Weinberger2016-01-26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Not every arch has io memory. So, unbreak the build by fixing the dependencies. Signed-off-by: Richard Weinberger <richard@nod.at> Cc: user-mode-linux-devel@lists.sourceforge.net Cc: Daniel Lezcano <daniel.lezcano@linaro.org> Link: http://lkml.kernel.org/r/1453760661-1444-21-git-send-email-richard@nod.at Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * | | | | clocksource: Select CLKSRC_MMIO where neededArnd Bergmann2016-01-26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The Tegra clocksource implementation uses the clocksource_mmio helper functions, but currently can be configured without them, which fails: drivers/clocksource/built-in.o: In function `tegra20_init_timer': :(.init.text+0xac): undefined reference to `clocksource_mmio_init' :(.init.text+0x140): undefined reference to `clocksource_mmio_readl_up' The same problem exists for Digicolor: drivers/clocksource/built-in.o: In function `digicolor_timer_init': :(.init.text+0xfa): undefined reference to `clocksource_mmio_init' :(.init.text+0x14c): undefined reference to `clocksource_mmio_readl_down' I've inspected the Kconfig file to look for other cases that I have not yet run into, and added an explicit 'select' to each one to ensure we can successfully link the drivers. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Cc: linux-arm-kernel@lists.infradead.org Cc: Daniel Lezcano <daniel.lezcano@linaro.org> Link: http://lkml.kernel.org/r/1453737776-1960372-1-git-send-email-arnd@arndb.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * | | | | tick/sched: Hide unused oneshot timer codeArnd Bergmann2016-01-26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A couple of functions in kernel/time/tick-sched.c are only relevant for oneshot timer mode, i.e. when hires-timers or nohz mode are enabled. If both are disabled, we get gcc warnings about them: kernel/time/tick-sched.c:98:16: warning: 'tick_init_jiffy_update' defined but not used [-Wunused-function] static ktime_t tick_init_jiffy_update(void) ^ kernel/time/tick-sched.c:112:13: warning: 'tick_sched_do_timer' defined but not used [-Wunused-function] static void tick_sched_do_timer(ktime_t now) ^ kernel/time/tick-sched.c:134:13: warning: 'tick_sched_handle' defined but not used [-Wunused-function] static void tick_sched_handle(struct tick_sched *ts, struct pt_regs *regs) ^ This encloses the whole set of functions in an appropriate ifdef to avoid the warning and to make it clearer when they are used. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Cc: linux-arm-kernel@lists.infradead.org Link: http://lkml.kernel.org/r/1453736525-1959191-1-git-send-email-arnd@arndb.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * | | | | kselftests: timers: Add adjtimex SETOFFSET validity testsJohn Stultz2016-01-26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add some simple tests to check both valid and invalid offsets when using adjtimex's ADJ_SETOFFSET method. Signed-off-by: John Stultz <john.stultz@linaro.org> Acked-by: Shuah Khan <shuahkh@osg.samsung.com> Cc: Sasha Levin <sasha.levin@oracle.com> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Harald Hoyer <harald@redhat.com> Cc: Kay Sievers <kay@vrfy.org> Cc: David Herrmann <dh.herrmann@gmail.com> Link: http://lkml.kernel.org/r/1453417415-19110-3-git-send-email-john.stultz@linaro.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * | | | | ntp: Fix ADJ_SETOFFSET being used w/ ADJ_NANOJohn Stultz2016-01-22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Recently, in commit 37cf4dc3370f I forgot to check if the timeval being passed was actually a timespec (as is signaled with ADJ_NANO). This resulted in that patch breaking ADJ_SETOFFSET users who set ADJ_NANO, by rejecting valid timespecs that were compared with valid timeval ranges. This patch addresses this by checking for the ADJ_NANO flag and using the timepsec check instead in that case. Reported-by: Harald Hoyer <harald@redhat.com> Reported-by: Kay Sievers <kay@vrfy.org> Fixes: 37cf4dc3370f "time: Verify time values in adjtimex ADJ_SETOFFSET to avoid overflow" Signed-off-by: John Stultz <john.stultz@linaro.org> Cc: Sasha Levin <sasha.levin@oracle.com> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Prarit Bhargava <prarit@redhat.com> Cc: David Herrmann <dh.herrmann@gmail.com> Link: http://lkml.kernel.org/r/1453417415-19110-2-git-send-email-john.stultz@linaro.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * | | | | itimers: Handle relative timers with CONFIG_TIME_LOW_RES properThomas Gleixner2016-01-17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As Helge reported for timerfd we have the same issue in itimers. We return remaining time larger than the programmed relative time to user space in case of CONFIG_TIME_LOW_RES=y. Use the proper function to adjust the extra time added in hrtimer_start_range_ns(). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Helge Deller <deller@gmx.de> Cc: John Stultz <john.stultz@linaro.org> Cc: linux-m68k@lists.linux-m68k.org Cc: dhowells@redhat.com Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/20160114164159.528222587@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * | | | | posix-timers: Handle relative timers with CONFIG_TIME_LOW_RES properThomas Gleixner2016-01-17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As Helge reported for timerfd we have the same issue in posix timers. We return remaining time larger than the programmed relative time to user space in case of CONFIG_TIME_LOW_RES=y. Use the proper function to adjust the extra time added in hrtimer_start_range_ns(). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Helge Deller <deller@gmx.de> Cc: John Stultz <john.stultz@linaro.org> Cc: linux-m68k@lists.linux-m68k.org Cc: dhowells@redhat.com Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/20160114164159.450510905@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * | | | | timerfd: Handle relative timers with CONFIG_TIME_LOW_RES properThomas Gleixner2016-01-17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Helge reported that a relative timer can return a remaining time larger than the programmed relative time on parisc and other architectures which have CONFIG_TIME_LOW_RES set. This happens because we add a jiffie to the resulting expiry time to prevent short timeouts. Use the new function hrtimer_expires_remaining_adjusted() to calculate the remaining time. It takes that extra added time into account for relative timers. Reported-and-tested-by: Helge Deller <deller@gmx.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: John Stultz <john.stultz@linaro.org> Cc: linux-m68k@lists.linux-m68k.org Cc: dhowells@redhat.com Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/20160114164159.354500742@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * | | | | hrtimer: Handle remaining time proper for TIME_LOW_RESThomas Gleixner2016-01-17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If CONFIG_TIME_LOW_RES is enabled we add a jiffie to the relative timeout to prevent short sleeps, but we do not account for that in interfaces which retrieve the remaining time. Helge observed that timerfd can return a remaining time larger than the relative timeout. That's not expected and breaks userland test programs. Store the information that the timer was armed relative and provide functions to adjust the remaining time. To avoid bloating the hrtimer struct make state a u8, which as a bonus results in better code on x86 at least. Reported-and-tested-by: Helge Deller <deller@gmx.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: John Stultz <john.stultz@linaro.org> Cc: linux-m68k@lists.linux-m68k.org Cc: dhowells@redhat.com Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/20160114164159.273328486@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * | | | | clockevents/tcb_clksrc: Prevent disabling an already disabled clockAlexandre Belloni2016-01-15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | clockevents_exchange_device is calling clockevents_shutdown() on the new clockenvents device but it may have never been enabled in the first place. This results in the tcb clock being disabled without being enabled first: ------------[ cut here ]------------ WARNING: CPU: 0 PID: 1 at drivers/clk/clk.c:680 clk_disable+0x28/0x34() Modules linked in: CPU: 0 PID: 1 Comm: swapper Not tainted 4.4.0+ #6 Hardware name: Atmel AT91SAM9 [<c000f2b8>] (unwind_backtrace) from [<c000d01c>] (show_stack+0x10/0x14) [<c000d01c>] (show_stack) from [<c00172f0>] (warn_slowpath_common+0x78/0xa0) [<c00172f0>] (warn_slowpath_common) from [<c00173a8>] (warn_slowpath_null+0x18/0x20) [<c00173a8>] (warn_slowpath_null) from [<c0361528>] (clk_disable+0x28/0x34) [<c0361528>] (clk_disable) from [<c034d560>] (tc_shutdown+0x38/0x4c) [<c034d560>] (tc_shutdown) from [<c0059ad4>] (clockevents_switch_state+0x38/0x6c) [<c0059ad4>] (clockevents_switch_state) from [<c0059b18>] (clockevents_shutdown+0x10/0x24) [<c0059b18>] (clockevents_shutdown) from [<c005a458>] (tick_check_new_device+0x84/0xac) [<c005a458>] (tick_check_new_device) from [<c0059660>] (clockevents_register_device+0x7c/0x108) [<c0059660>] (clockevents_register_device) from [<c06b5a68>] (tcb_clksrc_init+0x390/0x3e8) [<c06b5a68>] (tcb_clksrc_init) from [<c00097cc>] (do_one_initcall+0x114/0x1d4) [<c00097cc>] (do_one_initcall) from [<c069bd54>] (kernel_init_freeable+0xfc/0x1b8) [<c069bd54>] (kernel_init_freeable) from [<c04c3818>] (kernel_init+0x8/0xe0) [<c04c3818>] (kernel_init) from [<c000a410>] (ret_from_fork+0x14/0x24) ---[ end trace 0000000000000001 ]--- Check what state we were in before trying to disable the clock. Fixes: cf4541c101ea ("clockevents/drivers/tcb_clksrc: Migrate to new 'set-state' interface") Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com> Cc: Nicolas Ferre <nicolas.ferre@atmel.com> Cc: Boris Brezillon <boris.brezillon@free-electrons.com> Cc: linux-arm-kernel@lists.infradead.org Cc: Daniel Lezcano <daniel.lezcano@linaro.org> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/1452854061-30370-1-git-send-email-alexandre.belloni@free-electrons.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
* | | | | | Merge branch 'sched-urgent-for-linus' of ↵Linus Torvalds2016-01-31
|\ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler fixes from Thomas Gleixner: "Three small fixes in the scheduler/core: - use after free in the numa code - crash in the numa init code - a simple spelling fix" * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: pid: Fix spelling in comments sched/numa: Fix use-after-free bug in the task_numa_compare sched: Fix crash in sched_init_numa()
| * | | | | | pid: Fix spelling in commentsZhen Lei2016-01-30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Accidentally discovered this typo when I studied this module. Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com> Cc: Hanjun Guo <guohanjun@huawei.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tianhong Ding <dingtianhong@huawei.com> Cc: Xinwei Hu <huxinwei@huawei.com> Cc: Zefan Li <lizefan@huawei.com> Link: http://lkml.kernel.org/r/1454119457-11272-1-git-send-email-thunder.leizhen@huawei.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | | | | | sched/numa: Fix use-after-free bug in the task_numa_compareGavin Guo2016-01-22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The following message can be observed on the Ubuntu v3.13.0-65 with KASan backported: ================================================================== BUG: KASan: use after free in task_numa_find_cpu+0x64c/0x890 at addr ffff880dd393ecd8 Read of size 8 by task qemu-system-x86/3998900 ============================================================================= BUG kmalloc-128 (Tainted: G B ): kasan: bad access detected ----------------------------------------------------------------------------- INFO: Allocated in task_numa_fault+0xc1b/0xed0 age=41980 cpu=18 pid=3998890 __slab_alloc+0x4f8/0x560 __kmalloc+0x1eb/0x280 task_numa_fault+0xc1b/0xed0 do_numa_page+0x192/0x200 handle_mm_fault+0x808/0x1160 __do_page_fault+0x218/0x750 do_page_fault+0x1a/0x70 page_fault+0x28/0x30 SyS_poll+0x66/0x1a0 system_call_fastpath+0x1a/0x1f INFO: Freed in task_numa_free+0x1d2/0x200 age=62 cpu=18 pid=0 __slab_free+0x2ab/0x3f0 kfree+0x161/0x170 task_numa_free+0x1d2/0x200 finish_task_switch+0x1d2/0x210 __schedule+0x5d4/0xc60 schedule_preempt_disabled+0x40/0xc0 cpu_startup_entry+0x2da/0x340 start_secondary+0x28f/0x360 Call Trace: [<ffffffff81a6ce35>] dump_stack+0x45/0x56 [<ffffffff81244aed>] print_trailer+0xfd/0x170 [<ffffffff8124ac36>] object_err+0x36/0x40 [<ffffffff8124cbf9>] kasan_report_error+0x1e9/0x3a0 [<ffffffff8124d260>] kasan_report+0x40/0x50 [<ffffffff810dda7c>] ? task_numa_find_cpu+0x64c/0x890 [<ffffffff8124bee9>] __asan_load8+0x69/0xa0 [<ffffffff814f5c38>] ? find_next_bit+0xd8/0x120 [<ffffffff810dda7c>] task_numa_find_cpu+0x64c/0x890 [<ffffffff810de16c>] task_numa_migrate+0x4ac/0x7b0 [<ffffffff810de523>] numa_migrate_preferred+0xb3/0xc0 [<ffffffff810e0b88>] task_numa_fault+0xb88/0xed0 [<ffffffff8120ef02>] do_numa_page+0x192/0x200 [<ffffffff81211038>] handle_mm_fault+0x808/0x1160 [<ffffffff810d7dbd>] ? sched_clock_cpu+0x10d/0x160 [<ffffffff81068c52>] ? native_load_tls+0x82/0xa0 [<ffffffff81a7bd68>] __do_page_fault+0x218/0x750 [<ffffffff810c2186>] ? hrtimer_try_to_cancel+0x76/0x160 [<ffffffff81a6f5e7>] ? schedule_hrtimeout_range_clock.part.24+0xf7/0x1c0 [<ffffffff81a7c2ba>] do_page_fault+0x1a/0x70 [<ffffffff81a772e8>] page_fault+0x28/0x30 [<ffffffff8128cbd4>] ? do_sys_poll+0x1c4/0x6d0 [<ffffffff810e64f6>] ? enqueue_task_fair+0x4b6/0xaa0 [<ffffffff810233c9>] ? sched_clock+0x9/0x10 [<ffffffff810cf70a>] ? resched_task+0x7a/0xc0 [<ffffffff810d0663>] ? check_preempt_curr+0xb3/0x130 [<ffffffff8128b5c0>] ? poll_select_copy_remaining+0x170/0x170 [<ffffffff810d3bc0>] ? wake_up_state+0x10/0x20 [<ffffffff8112a28f>] ? drop_futex_key_refs.isra.14+0x1f/0x90 [<ffffffff8112d40e>] ? futex_requeue+0x3de/0xba0 [<ffffffff8112e49e>] ? do_futex+0xbe/0x8f0 [<ffffffff81022c89>] ? read_tsc+0x9/0x20 [<ffffffff8111bd9d>] ? ktime_get_ts+0x12d/0x170 [<ffffffff8108f699>] ? timespec_add_safe+0x59/0xe0 [<ffffffff8128d1f6>] SyS_poll+0x66/0x1a0 [<ffffffff81a830dd>] system_call_fastpath+0x1a/0x1f As commit 1effd9f19324 ("sched/numa: Fix unsafe get_task_struct() in task_numa_assign()") points out, the rcu_read_lock() cannot protect the task_struct from being freed in the finish_task_switch(). And the bug happens in the process of calculation of imp which requires the access of p->numa_faults being freed in the following path: do_exit() current->flags |= PF_EXITING; release_task() ~~delayed_put_task_struct()~~ schedule() ... ... rq->curr = next; context_switch() finish_task_switch() put_task_struct() __put_task_struct() task_numa_free() The fix here to get_task_struct() early before end of dst_rq->lock to protect the calculation process and also put_task_struct() in the corresponding point if finally the dst_rq->curr somehow cannot be assigned. Additional credit to Liang Chen who helped fix the error logic and add the put_task_struct() to the place it missed. Signed-off-by: Gavin Guo <gavin.guo@canonical.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Hugh Dickins <hughd@google.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: jay.vosburgh@canonical.com Cc: liang.chen@canonical.com Link: http://lkml.kernel.org/r/1453264618-17645-1-git-send-email-gavin.guo@canonical.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | | | | | sched: Fix crash in sched_init_numa()Raghavendra K T2016-01-19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The following PowerPC commit: c118baf80256 ("arch/powerpc/mm/numa.c: do not allocate bootmem memory for non existing nodes") avoids allocating bootmem memory for non existent nodes. But when DEBUG_PER_CPU_MAPS=y is enabled, my powerNV system failed to boot because in sched_init_numa(), cpumask_or() operation was done on unallocated nodes. Fix that by making cpumask_or() operation only on existing nodes. [ Tested with and w/o DEBUG_PER_CPU_MAPS=y on x86 and PowerPC. ] Reported-by: Jan Stancek <jstancek@redhat.com> Tested-by: Jan Stancek <jstancek@redhat.com> Signed-off-by: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com> Cc: <gkurz@linux.vnet.ibm.com> Cc: <grant.likely@linaro.org> Cc: <nikunj@linux.vnet.ibm.com> Cc: <vdavydov@parallels.com> Cc: <linuxppc-dev@lists.ozlabs.org> Cc: <linux-mm@kvack.org> Cc: <peterz@infradead.org> Cc: <benh@kernel.crashing.org> Cc: <paulus@samba.org> Cc: <mpe@ellerman.id.au> Cc: <anton@samba.org> Link: http://lkml.kernel.org/r/1452884483-11676-1-git-send-email-raghavendra.kt@linux.vnet.ibm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
* | | | | | | Merge branch 'perf-urgent-for-linus' of ↵Linus Torvalds2016-01-31
|\ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Thomas Gleixner: "This is much bigger than typical fixes, but Peter found a category of races that spurred more fixes and more debugging enhancements. Work started before the merge window, but got finished only now. Aside of that this contains the usual small fixes to perf and tools. Nothing particular exciting" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (43 commits) perf: Remove/simplify lockdep annotation perf: Synchronously clean up child events perf: Untangle 'owner' confusion perf: Add flags argument to perf_remove_from_context() perf: Clean up sync_child_event() perf: Robustify event->owner usage and SMP ordering perf: Fix STATE_EXIT usage perf: Update locking order perf: Remove __free_event() perf/bpf: Convert perf_event_array to use struct file perf: Fix NULL deref perf/x86: De-obfuscate code perf/x86: Fix uninitialized value usage perf: Fix race in perf_event_exit_task_context() perf: Fix orphan hole perf stat: Do not clean event's private stats perf hists: Fix HISTC_MEM_DCACHELINE width setting perf annotate browser: Fix behaviour of Shift-Tab with nothing focussed perf tests: Remove wrong semicolon in while loop in CQM test perf: Synchronously free aux pages in case of allocation failure ...
| * \ \ \ \ \ \ Merge tag 'perf-urgent-for-mingo' of ↵Ingo Molnar2016-01-30
| |\ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/urgent Pull perf/urgent fixes from Arnaldo Carvalho de Melo: - Fix 'perf stat' stddev reporting due to mistakenly cleaning event private stats (Jiri Olsa) - Fix 'perf test CQM' endless loop detected by 'gcc6 -Wmisleading-indentation' (Markus Trippelsdorf) - Fix behaviour of Shift-Tab when nothing is focussed in the annotate TUI browser, detected with gcc6 -Wmisleading-indentation (Markus Trippelsdorf) - Fix mem data cacheline hists browser width setting for unresolved addresses (Jiri Olsa) Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Ingo Molnar <mingo@kernel.org>
| | * | | | | | | perf stat: Do not clean event's private statsJiri Olsa2016-01-26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Mel reported stddev reporting was broken due to following commit: 106a94a0f8c2 ("perf stat: Introduce read_counters function") This commit merged interval and overall counters reading into single read_counters function. The old interval code cleaned the stddev data for some reason (it's never displayed in interval mode) and the mentioned commit kept on cleaning the stddev data in merged function, which resulted in the stddev not being displayed. Removing the wrong stddev data cleanup init_stats call. Reported-and-Tested-by: Mel Gorman <mgorman@techsingularity.net> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: David Ahern <dsahern@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: stable@vger.kernel.org # v4.2+ Fixes: 106a94a0f8c2 ("perf stat: Introduce read_counters function") Link: http://lkml.kernel.org/r/1453290995-18485-4-git-send-email-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| | * | | | | | | perf hists: Fix HISTC_MEM_DCACHELINE width settingJiri Olsa2016-01-26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Set correct width for unresolved mem_dcacheline addr. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: David Ahern <dsahern@gmail.com> Cc: Don Zickus <dzickus@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Fixes: 9b32ba71ba90 ("perf tools: Add dcacheline sort") Link: http://lkml.kernel.org/r/1453290995-18485-3-git-send-email-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| | * | | | | | | perf annotate browser: Fix behaviour of Shift-Tab with nothing focussedMarkus Trippelsdorf2016-01-26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The issue was pointed out by gcc-6's -Wmisleading-indentation. Signed-off-by: Markus Trippelsdorf <markus@trippelsdorf.de> Acked-by: Ingo Molnar <mingo@kernel.org> Cc: Ben Hutchings <ben@decadent.org.uk> Cc: Matt Fleming <matt@codeblueprint.co.uk> Cc: Peter Zijlstra <peterz@infradead.org> Fixes: c97cf42219b7 ("perf top: Live TUI Annotation") Link: http://lkml.kernel.org/r/20151214154403.GB1409@x4 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| | * | | | | | | perf tests: Remove wrong semicolon in while loop in CQM testMarkus Trippelsdorf2016-01-26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The while loop was spinning. Fix by removing a semicolon. The issue was pointed out by gcc-6's -Wmisleading-indentation. Signed-off-by: Markus Trippelsdorf <markus@trippelsdorf.de> Reviewed-by: Matt Fleming <matt@codeblueprint.co.uk> Acked-by: Ingo Molnar <mingo@kernel.org> Cc: Ben Hutchings <ben@decadent.org.uk> Cc: Peter Zijlstra <peterz@infradead.org> Fixes: 035827e9f2bd ("perf tests: Add Intel CQM test") Link: http://lkml.kernel.org/r/20151214154335.GA1409@x4 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| * | | | | | | | perf: Remove/simplify lockdep annotationPeter Zijlstra2016-01-29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Now that the perf_event_ctx_lock_nested() call has moved from put_event() into perf_event_release_kernel() the first reason is no longer valid as that can no longer happen. The second reason seems to have been invalidated when Al Viro made fput() unconditionally async in the following commit: 4a9d4b024a31 ("switch fput to task_work_add") such that munmap()->fput()->release()->perf_release() would no longer happen. Therefore, remove the annotation. This should increase the efficiency of lockdep coverage of perf locking. Suggested-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | | | | | | | perf: Synchronously clean up child eventsPeter Zijlstra2016-01-29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The orphan cleanup workqueue doesn't always catch orphans, for example, if they never schedule after they are orphaned. IOW, the event leak is still very real. It also wouldn't work for kernel counters. Doing it synchonously is a little hairy due to lock inversion issues, but is made to work. Patch based on work by Alexander Shishkin. Suggested-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: vince@deater.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | | | | | | | perf: Untangle 'owner' confusionPeter Zijlstra2016-01-29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There are two concepts of owner wrt an event and they are conflated: - event::owner / event::owner_list, used by prctl(.option = PR_TASK_PERF_EVENTS_{EN,DIS}ABLE). - the 'owner' of the event object, typically the file descriptor. Currently these two concepts are conflated, which gives trouble with scm_rights passing of file descriptors. Passing the event and then closing the creating task would render the event 'orphan' and would have it cleared out. Unlikely what is expectd. This patch untangles these two concepts by using PERF_EVENT_STATE_EXIT to denote the second type. Reported-by: Alexei Starovoitov <alexei.starovoitov@gmail.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | | | | | | | perf: Add flags argument to perf_remove_from_context()Peter Zijlstra2016-01-29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In preparation to adding more options, convert the boolean argument into a flags word. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | | | | | | | perf: Clean up sync_child_event()Peter Zijlstra2016-01-29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | sync_child_event() has outgrown its purpose, it does far too much. Bring it back to its named purpose. Rename __perf_event_exit_task() to perf_event_exit_event() to better reflect what it does and move the event->state assignment under the ctx->lock, like state changes ought to be. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | | | | | | | perf: Robustify event->owner usage and SMP orderingPeter Zijlstra2016-01-29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Use smp_store_release() to clear event->owner and lockless_dereference() to observe it. Further use READ_ONCE() for all lockless reads. This changes perf_remove_from_owner() to leave event->owner cleared. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | | | | | | | perf: Fix STATE_EXIT usagePeter Zijlstra2016-01-29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We should never attempt to enable a STATE_EXIT event. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | | | | | | | perf: Update locking orderPeter Zijlstra2016-01-29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Update the locking order to note that ctx::lock nests inside of child_mutex, as per: perf_ioctl(): ctx::mutex -> perf_event_for_each(): event::child_mutex -> _perf_event_enable(): ctx::lock Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | | | | | | | perf: Remove __free_event()Peter Zijlstra2016-01-29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There is but a single caller, remove the function - we already have _free_event(), the extra indirection is nonsensical.. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | | | | | | | perf/bpf: Convert perf_event_array to use struct fileAlexei Starovoitov2016-01-29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Robustify refcounting. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: Wang Nan <wangnan0@huawei.com> Cc: vince@deater.net Link: http://lkml.kernel.org/r/20160126045947.GA40151@ast-mbp.thefacebook.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | | | | | | | perf: Fix NULL derefPeter Zijlstra2016-01-29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Dan reported: 1229 if (ctx->task == TASK_TOMBSTONE || 1230 !atomic_inc_not_zero(&ctx->refcount)) { 1231 raw_spin_unlock(&ctx->lock); 1232 ctx = NULL; ^^^^^^^^^^ ctx is NULL. 1233 } 1234 1235 WARN_ON_ONCE(ctx->task != task); ^^^^^^^^^^^^^^^^^ The patch adds a NULL dereference. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Fixes: 63b6da39bb38 ("perf: Fix perf_event_exit_task() race") Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | | | | | | | perf/x86: De-obfuscate codePeter Zijlstra2016-01-29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Get rid of the 'onln' obfuscation. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | | | | | | | perf/x86: Fix uninitialized value usagePeter Zijlstra2016-01-29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When calling intel_alt_er() with .idx != EXTRA_REG_RSP_* we will not initialize alt_idx and then use this uninitialized value to index an array. When that is not fatal, it can result in an infinite loop in its caller __intel_shared_reg_get_constraints(), with IRQs disabled. Alternative error modes are random memory corruption due to the cpuc->shared_regs->regs[] array overrun, which manifest in either get_constraints or put_constraints doing weird stuff. Only took 6 hours of painful debugging to find this. Neither GCC nor Smatch warnings flagged this bug. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kan Liang <kan.liang@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Fixes: ae3f011fc251 ("perf/x86/intel: Fix SLM MSR_OFFCORE_RSP1 valid_mask") Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | | | | | | | perf: Fix race in perf_event_exit_task_context()Peter Zijlstra2016-01-28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There is a race between perf_event_exit_task_context() and orphans_remove_work() which results in a use-after-free. We mark ctx->task with TASK_TOMBSTONE to indicate a context is 'dead', under ctx->lock. After which point event_function_call() on any event of that context will NOP A concurrent orphans_remove_work() will only hold ctx->mutex for the list iteration and not serialize against this. Therefore its possible that orphans_remove_work()'s perf_remove_from_context() call will fail, but we'll continue to free the event, with the result of free'd memory still being on lists and everything. Once perf_event_exit_task_context() gets around to acquiring ctx->mutex it too will iterate the event list, encounter the already free'd event and proceed to free it _again_. This fails with the WARN in free_event(). Plug the race by having perf_event_exit_task_context() hold ctx::mutex over the whole tear-down, thereby 'naturally' serializing against all other sites, including the orphan work. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: alexander.shishkin@linux.intel.com Cc: dsahern@gmail.com Cc: namhyung@kernel.org Link: http://lkml.kernel.org/r/20160125130954.GY6357@twins.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | | | | | | | perf: Fix orphan holePeter Zijlstra2016-01-28
| |/ / / / / / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We should set event->owner before we install the event, otherwise there is a hole where the target task can fork() and we'll not inherit the event because it thinks the event is orphaned. Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | | | | | | perf: Synchronously free aux pages in case of allocation failureAlexander Shishkin2016-01-21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We are currently using asynchronous deallocation in the error path in AUX mmap code, which is unnecessary and also presents a problem for users that wish to probe for the biggest possible buffer size they can get: they'll get -EINVAL on all subsequent attemts to allocate a smaller buffer before the asynchronous deallocation callback frees up the pages from the previous unsuccessful attempt. Currently, gdb does that for allocating AUX buffers for Intel PT traces. More specifically, overwrite mode of AUX pmus that don't support hardware sg (some implementations of Intel PT, for instance) is limited to only one contiguous high order allocation for its buffer and there is no way of knowing its size without trying. This patch changes error path freeing to be synchronous as there won't be any contenders for the AUX pages at that point. Reported-by: Markus Metzger <markus.t.metzger@intel.com> Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: vince@deater.net Link: http://lkml.kernel.org/r/1453216469-9509-1-git-send-email-alexander.shishkin@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | | | | | | perf/x86: add Intel SkyLake uncore IMC PMU supportStephane Eranian2016-01-21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch enables the uncore_imc PMU for Intel SkyLake Desktop processors (Core i7-6700, model 94). It is possible to compute memory read/write bandwidth using: $ perf stat -a -e uncore_imc/data_reads/,uncore_imc/data_writes/ .... Signed-off-by: Stephane Eranian <eranian@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: kan.liang@intel.com Link: http://lkml.kernel.org/r/1452151546-8853-1-git-send-email-eranian@google.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | | | | | | perf: Fix perf_event_exit_task() racePeter Zijlstra2016-01-21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There is a race against perf_event_exit_task() vs event_function_call(),find_get_context(),perf_install_in_context() (iow, everyone). Since there is no permanent marker on a context that its dead, it is quite possible that we access (and even modify) a context after its passed through perf_event_exit_task(). For instance, find_get_context() might find the context still installed, but by the time we get to perf_install_in_context() it might already have passed through perf_event_exit_task() and be considered dead, we will however still add the event to it. Solve this by marking a ctx dead by setting its ctx->task value to -1, it must be !0 so we still know its a (former) task context. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | | | | | | perf: Add more assertionsPeter Zijlstra2016-01-21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Try to trigger warnings before races do damage. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | | | | | | perf: Collapse and fix event_function_call() usersPeter Zijlstra2016-01-21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There is one common bug left in all the event_function_call() users, between loading ctx->task and getting to the remote_function(), ctx->task can already have been changed. Therefore we need to double check and retry if ctx->task != current. Insert another trampoline specific to event_function_call() that checks for this and further validates state. This also allows getting rid of the active/inactive functions. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | | | | | | perf: Specialize perf_event_exit_task()Peter Zijlstra2016-01-21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The perf_remove_from_context() usage in __perf_event_exit_task() is different from the other usages in that this site has already detached and scheduled out the task context. This will stand in the way of stronger assertions checking the (task) context scheduling invariants. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | | | | | | perf: Fix task context schedulingPeter Zijlstra2016-01-21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There is a very nasty problem wrt disabling the perf task scheduling hooks. Currently we {set,clear} ctx->is_active on every __perf_event_task_sched_{in,out}, _however_ this means that if we disable these calls we'll have task contexts with ->is_active set that are not active and 'active' task contexts without ->is_active set. This can result in event_function_call() looping on the ctx->is_active condition basically indefinitely. Resolve this by changing things such that contexts without events do not set ->is_active like we used to. From this invariant it trivially follows that if there are no (task) events, every task ctx is inactive and disabling the context switch hooks is harmless. This leaves two places that need attention (and already had accumulated weird and wonderful hacks to work around, without recognising this actual problem). Namely: - perf_install_in_context() will need to deal with installing events in an inactive context, meaning it cannot rely on ctx-is_active for its IPIs. - perf_remove_from_context() will have to mark a context as inactive when it removes the last event. For specific detail, see the patch/comments. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: David Ahern <dsahern@gmail.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Signed-off-by: Ingo Molnar <mingo@kernel.org>