[PATCH] cpuset: memory pressure meter

Provide a simple per-cpuset metric of memory pressure, tracking the -rate-
that the tasks in a cpuset call try_to_free_pages(), the synchronous
(direct) memory reclaim code.

This enables batch managers monitoring jobs running in dedicated cpusets to
efficiently detect what level of memory pressure that job is causing.

This is useful both on tightly managed systems running a wide mix of
submitted jobs, which may choose to terminate or reprioritize jobs that are
trying to use more memory than allowed on the nodes assigned them, and with
tightly coupled, long running, massively parallel scientific computing jobs
that will dramatically fail to meet required performance goals if they
start to use more memory than allowed to them.

This patch just provides a very economical way for the batch manager to
monitor a cpuset for signs of memory pressure.  It's up to the batch
manager or other user code to decide what to do about it and take action.

==> Unless this feature is enabled by writing "1" to the special file
    /dev/cpuset/memory_pressure_enabled, the hook in the rebalance
    code of __alloc_pages() for this metric reduces to simply noticing
    that the cpuset_memory_pressure_enabled flag is zero.  So only
    systems that enable this feature will compute the metric.

Why a per-cpuset, running average:

    Because this meter is per-cpuset, rather than per-task or mm, the
    system load imposed by a batch scheduler monitoring this metric is
    sharply reduced on large systems, because a scan of the tasklist can
    be avoided on each set of queries.

    Because this meter is a running average, instead of an accumulating
    counter, a batch scheduler can detect memory pressure with a single
    read, instead of having to read and accumulate results for a period
    of time.

    Because this meter is per-cpuset rather than per-task or mm, the
    batch scheduler can obtain the key information, memory pressure in a
    cpuset, with a single read, rather than having to query and
    accumulate results over all the (dynamically changing) set of tasks
    in the cpuset.

A per-cpuset simple digital filter (requires a spinlock and 3 words of data
per-cpuset) is kept, and updated by any task attached to that cpuset, if it
enters the synchronous (direct) page reclaim code.

A per-cpuset file provides an integer number representing the recent
(half-life of 10 seconds) rate of direct page reclaims caused by the tasks
in the cpuset, in units of reclaims attempted per second, times 1000.

Signed-off-by: Paul Jackson <pj@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
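
The "simple digital filter" described in the message can be illustrated
with a small user-space sketch.  This is not the kernel code from this
patch: the names (rate_meter, meter_mark, meter_read) and the constants are
assumptions chosen for illustration.  A per-second decay factor of 0.933 is
used because 0.933^10 is roughly 0.5, which gives a half-life of about 10
seconds, and the reading is scaled by 1000 to match the units quoted above.
Locking is omitted, and the caller supplies the time in whole seconds.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical constants, chosen to match the units in the message. */
#define METER_SCALE    1000  /* readings are events per second * 1000 */
#define METER_DECAY    933   /* per-second decay * 1000; 0.933^10 ~= 0.5 */
#define METER_MAXTICKS 99    /* bounds the decay loop after a long idle gap */

/* Three words of state per meter (a real version would also need a lock). */
struct rate_meter {
	int  pending;  /* events seen since the last update */
	int  value;    /* filtered rate, scaled by METER_SCALE */
	long last;     /* time of the last update, in whole seconds */
};

/* Age the filtered value by the elapsed seconds, then fold in new events. */
static void meter_update(struct rate_meter *m, long now)
{
	long ticks;

	assert(m != NULL);
	ticks = now - m->last;
	if (ticks <= 0)
		return;  /* less than one second elapsed: nothing to do yet */
	m->last = now;
	if (ticks > METER_MAXTICKS)
		ticks = METER_MAXTICKS;
	while (ticks-- > 0)
		m->value = (METER_DECAY * m->value) / METER_SCALE;
	/* Each event adds (1 - decay) * scale, so a steady r events/s
	 * settles near r * METER_SCALE.  No overflow guard in this sketch. */
	m->value += (METER_SCALE - METER_DECAY) * m->pending;
	m->pending = 0;
}

/* Record one event (for example, one direct reclaim attempt). */
static void meter_mark(struct rate_meter *m, long now)
{
	meter_update(m, now);
	m->pending++;
}

/* Return the current filtered rate, events per second * METER_SCALE. */
static int meter_read(struct rate_meter *m, long now)
{
	meter_update(m, now);
	return m->value;
}
```

With one event marked per second for a few minutes, meter_read() in this
sketch returns a value a little under 1000 (integer truncation keeps it
slightly low); ten idle seconds later the reading is roughly half of that,
which is the half-life behaviour the message describes.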
author: Paul Jackson <pj@sgi.com> 2006-01-08 04:01:49 -0500
committer: Linus Torvalds <torvalds@g5.osdl.org> 2006-01-08 23:13:42 -0500
commit: 3e0d98b9f1eb757fc98efc84e74e54a08308aa73 (patch)
tree: 7cf1c75994f734ede7ec89373de640c4a58b237a /include/linux
parent: 5966514db662fb24c9bb43226a80106bcffd51f8 (diff)
1 files changed, 11 insertions, 0 deletions
diff --git a/include/linux/cpuset.h b/include/linux/cpuset.h
index 8b21786490ee..736d73801cb6 100644
--- a/include/linux/cpuset.h
+++ b/include/linux/cpuset.h
@@ -26,6 +26,15 @@ void cpuset_update_current_mems_allowed(void);
 int cpuset_zonelist_valid_mems_allowed(struct zonelist *zl);
 extern int cpuset_zone_allowed(struct zone *z, gfp_t gfp_mask);
 extern int cpuset_excl_nodes_overlap(const struct task_struct *p);
+#define cpuset_memory_pressure_bump()                           \
+        do {                                                    \
+                if (cpuset_memory_pressure_enabled)             \
+                        __cpuset_memory_pressure_bump();        \
+        } while (0)
+extern int cpuset_memory_pressure_enabled;
+extern void __cpuset_memory_pressure_bump(void);
 extern struct file_operations proc_cpuset_operations;
 extern char *cpuset_task_status_allowed(struct task_struct *task, char *buffer);
@@ -60,6 +69,8 @@ static inline int cpuset_excl_nodes_overlap(const struct task_struct *p)
        return 1;
 }
+static inline void cpuset_memory_pressure_bump(void) {}
 static inline char *cpuset_task_status_allowed(struct task_struct *task,
                                                        char *buffer)
 {
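
The header change above wraps the call in a flag test inside a
do { } while (0) macro, so the disabled case costs one comparison and the
macro still behaves as a single statement.  The following user-space sketch
shows that pattern in isolation.  All names here (pressure_enabled,
slow_pressure_bump, pressure_bump, reclaim_or_skip) are hypothetical
stand-ins, not kernel symbols, and the counter only stands in for whatever
work the real slow path does.

```c
#include <assert.h>

/* Stand-in for a global enable flag such as cpuset_memory_pressure_enabled. */
static int pressure_enabled;

/* Counts how often the (hypothetical) slow path actually ran. */
static unsigned long slow_path_calls;

static void slow_pressure_bump(void)
{
	slow_path_calls++;
}

/* Cheap gate: when the flag is zero, only the test is executed. */
#define pressure_bump()                         \
	do {                                    \
		if (pressure_enabled)           \
			slow_pressure_bump();   \
	} while (0)

/* The do/while(0) wrapper lets the macro sit in an unbraced if/else. */
static int reclaim_or_skip(int need_reclaim)
{
	if (need_reclaim)
		pressure_bump();
	else
		return 0;
	return 1;
}

/* Call the gated hook a given number of times. */
static void simulate_reclaim(int attempts)
{
	int i;

	assert(attempts >= 0);
	for (i = 0; i < attempts; i++)
		pressure_bump();
}
```

With the flag left at zero, simulate_reclaim() never reaches the slow path;
after setting the flag to 1, every call does.  The empty static inline stub
in the second hunk plays the same role for builds without cpuset support:
callers compile unchanged and the hook does nothing.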