diff options
author | Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> | 2009-12-02 03:28:07 -0500 |
---|---|---|
committer | Ingo Molnar <mingo@elte.hu> | 2009-12-02 11:32:40 -0500 |
commit | 0cf55e1ec08bb5a22e068309e2d8ba1180ab4239 (patch) | |
tree | 6102662a9594d51155bee11666fe8517fcbe6039 /fs | |
parent | d99ca3b977fc5a93141304f571475c2af9e6c1c5 (diff) |
sched, cputime: Introduce thread_group_times()
This is a real fix for problem of utime/stime values decreasing
described in the thread:
http://lkml.org/lkml/2009/11/3/522
Now cputime is accounted in the following way:
- {u,s}time in task_struct are increased every time when the thread
is interrupted by a tick (timer interrupt).
- When a thread exits, its {u,s}time are added to signal->{u,s}time,
after adjusted by task_times().
- When all threads in a thread_group exits, accumulated {u,s}time
(and also c{u,s}time) in signal struct are added to c{u,s}time
in signal struct of the group's parent.
So {u,s}time in task struct are "raw" tick count, while
{u,s}time and c{u,s}time in signal struct are "adjusted" values.
And accounted values are used by:
- task_times(), to get cputime of a thread:
This function returns adjusted values that originates from raw
{u,s}time and scaled by sum_exec_runtime that accounted by CFS.
- thread_group_cputime(), to get cputime of a thread group:
This function returns sum of all {u,s}time of living threads in
the group, plus {u,s}time in the signal struct that is sum of
adjusted cputimes of all exited threads belonged to the group.
The problem is the return value of thread_group_cputime(),
because it is mixed sum of "raw" value and "adjusted" value:
group's {u,s}time = foreach(thread){{u,s}time} + exited({u,s}time)
This misbehavior can break {u,s}time monotonicity.
Assume that if there is a thread that have raw values greater
than adjusted values (e.g. interrupted by 1000Hz ticks 50 times
but only runs 45ms) and if it exits, cputime will decrease (e.g.
-5ms).
To fix this, we could do:
group's {u,s}time = foreach(t){task_times(t)} + exited({u,s}time)
But task_times() contains hard divisions, so applying it for
every thread should be avoided.
This patch fixes the above problem in the following way:
- Modify thread's exit (= __exit_signal()) not to use task_times().
It means {u,s}time in signal struct accumulates raw values instead
of adjusted values. As the result it makes thread_group_cputime()
to return pure sum of "raw" values.
- Introduce a new function thread_group_times(*task, *utime, *stime)
that converts "raw" values of thread_group_cputime() to "adjusted"
values, in same calculation procedure as task_times().
- Modify group's exit (= wait_task_zombie()) to use this introduced
thread_group_times(). It make c{u,s}time in signal struct to
have adjusted values like before this patch.
- Replace some thread_group_cputime() by thread_group_times().
This replacements are only applied where conveys the "adjusted"
cputime to users, and where already uses task_times() near by it.
(i.e. sys_times(), getrusage(), and /proc/<PID>/stat.)
This patch have a positive side effect:
- Before this patch, if a group contains many short-life threads
(e.g. runs 0.9ms and not interrupted by ticks), the group's
cputime could be invisible since thread's cputime was accumulated
after adjusted: imagine adjustment function as adj(ticks, runtime),
{adj(0, 0.9) + adj(0, 0.9) + ....} = {0 + 0 + ....} = 0.
After this patch it will not happen because the adjustment is
applied after accumulated.
v2:
- remove if()s, put new variables into signal_struct.
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Spencer Candland <spencer@bluehost.com>
Cc: Americo Wang <xiyou.wangcong@gmail.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Stanislaw Gruszka <sgruszka@redhat.com>
LKML-Reference: <4B162517.8040909@jp.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Diffstat (limited to 'fs')
-rw-r--r-- | fs/proc/array.c | 5 |
1 files changed, 1 insertions, 4 deletions
diff --git a/fs/proc/array.c b/fs/proc/array.c index ca61a88aed66..2571da43c736 100644 --- a/fs/proc/array.c +++ b/fs/proc/array.c | |||
@@ -506,7 +506,6 @@ static int do_task_stat(struct seq_file *m, struct pid_namespace *ns, | |||
506 | 506 | ||
507 | /* add up live thread stats at the group level */ | 507 | /* add up live thread stats at the group level */ |
508 | if (whole) { | 508 | if (whole) { |
509 | struct task_cputime cputime; | ||
510 | struct task_struct *t = task; | 509 | struct task_struct *t = task; |
511 | do { | 510 | do { |
512 | min_flt += t->min_flt; | 511 | min_flt += t->min_flt; |
@@ -517,9 +516,7 @@ static int do_task_stat(struct seq_file *m, struct pid_namespace *ns, | |||
517 | 516 | ||
518 | min_flt += sig->min_flt; | 517 | min_flt += sig->min_flt; |
519 | maj_flt += sig->maj_flt; | 518 | maj_flt += sig->maj_flt; |
520 | thread_group_cputime(task, &cputime); | 519 | thread_group_times(task, &utime, &stime); |
521 | utime = cputime.utime; | ||
522 | stime = cputime.stime; | ||
523 | gtime = cputime_add(gtime, sig->gtime); | 520 | gtime = cputime_add(gtime, sig->gtime); |
524 | } | 521 | } |
525 | 522 | ||