aboutsummaryrefslogtreecommitdiffstats
path: root/kernel/sched.c
diff options
context:
space:
mode:
authorStephane Eranian <eranian@google.com>2011-08-25 09:58:03 -0400
committerIngo Molnar <mingo@elte.hu>2011-08-29 06:28:33 -0400
commita8d757ef076f0f95f13a918808824058de25b3eb (patch)
tree3c1151ef886d9b72d0a7b7b267d9f37c72d5f475 /kernel/sched.c
parentc6a389f123b9f68d605bb7e0f9b32ec1e3e14132 (diff)
perf events: Fix slow and broken cgroup context switch code
The current cgroup context switch code was incorrect leading to bogus counts. Furthermore, as soon as there was an active cgroup event on a CPU, the context switch cost on that CPU would increase by a significant amount as demonstrated by a simple ping/pong example: $ ./pong Both processes pinned to CPU1, running for 10s 10684.51 ctxsw/s Now start a cgroup perf stat: $ perf stat -e cycles,cycles -A -a -G test -C 1 -- sleep 100 $ ./pong Both processes pinned to CPU1, running for 10s 6674.61 ctxsw/s That's a 37% penalty. Note that pong is not even in the monitored cgroup. The results shown by perf stat are bogus: $ perf stat -e cycles,cycles -A -a -G test -C 1 -- sleep 100 Performance counter stats for 'sleep 100': CPU1 <not counted> cycles test CPU1 16,984,189,138 cycles # 0.000 GHz The second 'cycles' event should report a count @ CPU clock (here 2.4GHz) as it is counting across all cgroups. The patch below fixes the bogus accounting and bypasses any cgroup switches in case the outgoing and incoming tasks are in the same cgroup. With this patch the same test now yields: $ ./pong Both processes pinned to CPU1, running for 10s 10775.30 ctxsw/s Start perf stat with cgroup: $ perf stat -e cycles,cycles -A -a -G test -C 1 -- sleep 10 Run pong outside the cgroup: $ /pong Both processes pinned to CPU1, running for 10s 10687.80 ctxsw/s The penalty is now less than 2%. And the results for perf stat are correct: $ perf stat -e cycles,cycles -A -a -G test -C 1 -- sleep 10 Performance counter stats for 'sleep 10': CPU1 <not counted> cycles test # 0.000 GHz CPU1 23,933,981,448 cycles # 0.000 GHz Now perf stat reports the correct counts for for the non cgroup event. If we run pong inside the cgroup, then we also get the correct counts: $ perf stat -e cycles,cycles -A -a -G test -C 1 -- sleep 10 Performance counter stats for 'sleep 10': CPU1 22,297,726,205 cycles test # 0.000 GHz CPU1 23,933,981,448 cycles # 0.000 GHz 10.001457237 seconds time elapsed Signed-off-by: Stephane Eranian <eranian@google.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/20110825135803.GA4697@quad Signed-off-by: Ingo Molnar <mingo@elte.hu>
Diffstat (limited to 'kernel/sched.c')
-rw-r--r--kernel/sched.c2
1 files changed, 1 insertions, 1 deletions
diff --git a/kernel/sched.c b/kernel/sched.c
index ccacdbdecf45..0408cdc6d572 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -3065,7 +3065,7 @@ static void finish_task_switch(struct rq *rq, struct task_struct *prev)
3065#ifdef __ARCH_WANT_INTERRUPTS_ON_CTXSW 3065#ifdef __ARCH_WANT_INTERRUPTS_ON_CTXSW
3066 local_irq_disable(); 3066 local_irq_disable();
3067#endif /* __ARCH_WANT_INTERRUPTS_ON_CTXSW */ 3067#endif /* __ARCH_WANT_INTERRUPTS_ON_CTXSW */
3068 perf_event_task_sched_in(current); 3068 perf_event_task_sched_in(prev, current);
3069#ifdef __ARCH_WANT_INTERRUPTS_ON_CTXSW 3069#ifdef __ARCH_WANT_INTERRUPTS_ON_CTXSW
3070 local_irq_enable(); 3070 local_irq_enable();
3071#endif /* __ARCH_WANT_INTERRUPTS_ON_CTXSW */ 3071#endif /* __ARCH_WANT_INTERRUPTS_ON_CTXSW */