diff options
author | Hugh Dickins <hugh@veritas.com> | 2005-10-29 21:16:18 -0400 |
---|---|---|
committer | Linus Torvalds <torvalds@g5.osdl.org> | 2005-10-30 00:40:39 -0400 |
commit | 365e9c87a982c03d0af3886e29d877f581b59611 (patch) | |
tree | d06c1918ca9fe6677d7e4e869555e095004274f7 /kernel | |
parent | 861f2fb8e796022b4928cab9c74fca6681a1c557 (diff) |
[PATCH] mm: update_hiwaters just in time
update_mem_hiwater has attracted various criticisms, in particular from those
concerned with mm scalability. Originally it was called whenever rss or
total_vm got raised. Then many of those callsites were replaced by a timer
tick call from account_system_time. Now Frank van Maarseveen reports that to
be found inadequate. How about this? Works for Frank.
Replace update_mem_hiwater, a poor combination of two unrelated ops, by macros
update_hiwater_rss and update_hiwater_vm. Don't attempt to keep
mm->hiwater_rss up to date at timer tick, nor every time we raise rss (usually
by 1): those are hot paths. Do the opposite, update only when about to lower
rss (usually by many), or just before final accounting in do_exit. Handle
mm->hiwater_vm in the same way, though it's much less of an issue. Demand
that whoever collects these hiwater statistics do the work of taking the
maximum with rss or total_vm.
And there has been no collector of these hiwater statistics in the tree. The
new convention needs an example, so match Frank's usage by adding a VmPeak
line above VmSize to /proc/<pid>/status, and also a VmHWM line above VmRSS
(High-Water-Mark or High-Water-Memory).
There was a particular anomaly during mremap move, that hiwater_vm might be
captured too high. A fleeting such anomaly remains, but it's quickly
corrected now, whereas before it would stick.
What locking? None: if the app is racy then these statistics will be racy,
it's not worth any overhead to make them exact. But whenever it suits,
hiwater_vm is updated under exclusive mmap_sem, and hiwater_rss under
page_table_lock (for now) or with preemption disabled (later on): without
going to any trouble, minimize the time between reading current values and
updating, to minimize those occasions when a racing thread bumps a count up
and back down in between.
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Diffstat (limited to 'kernel')
-rw-r--r-- | kernel/exit.c | 5 | ||||
-rw-r--r-- | kernel/sched.c | 2 |
2 files changed, 4 insertions, 3 deletions
diff --git a/kernel/exit.c b/kernel/exit.c index 3b25b182d2be..79f52b85d6ed 100644 --- a/kernel/exit.c +++ b/kernel/exit.c | |||
@@ -839,7 +839,10 @@ fastcall NORET_TYPE void do_exit(long code) | |||
839 | preempt_count()); | 839 | preempt_count()); |
840 | 840 | ||
841 | acct_update_integrals(tsk); | 841 | acct_update_integrals(tsk); |
842 | update_mem_hiwater(tsk); | 842 | if (tsk->mm) { |
843 | update_hiwater_rss(tsk->mm); | ||
844 | update_hiwater_vm(tsk->mm); | ||
845 | } | ||
843 | group_dead = atomic_dec_and_test(&tsk->signal->live); | 846 | group_dead = atomic_dec_and_test(&tsk->signal->live); |
844 | if (group_dead) { | 847 | if (group_dead) { |
845 | del_timer_sync(&tsk->signal->real_timer); | 848 | del_timer_sync(&tsk->signal->real_timer); |
diff --git a/kernel/sched.c b/kernel/sched.c index 1e5cafdf4e27..4f26c544d02c 100644 --- a/kernel/sched.c +++ b/kernel/sched.c | |||
@@ -2511,8 +2511,6 @@ void account_system_time(struct task_struct *p, int hardirq_offset, | |||
2511 | cpustat->idle = cputime64_add(cpustat->idle, tmp); | 2511 | cpustat->idle = cputime64_add(cpustat->idle, tmp); |
2512 | /* Account for system time used */ | 2512 | /* Account for system time used */ |
2513 | acct_update_integrals(p); | 2513 | acct_update_integrals(p); |
2514 | /* Update rss highwater mark */ | ||
2515 | update_mem_hiwater(p); | ||
2516 | } | 2514 | } |
2517 | 2515 | ||
2518 | /* | 2516 | /* |