From 383f2835eb9afb723af71850037b2f074ac9db60 Mon Sep 17 00:00:00 2001
From: "Chen, Kenneth W"
Date: Fri, 9 Sep 2005 13:02:02 -0700
Subject: [PATCH] Prefetch kernel stacks to speed up context switch

For architectures like ia64, the switch stack structure is fairly large
(currently 528 bytes).  For context-switch-intensive applications, we found
that a significant number of cache misses occur in the switch_to() function.
The following patch adds a hook in the schedule() function to prefetch the
switch stack structure as soon as the 'next' task is determined.  This
allows maximum overlap in prefetching the cache lines for that structure.

Signed-off-by: Ken Chen
Cc: Ingo Molnar
Cc: "Luck, Tony"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
---
 include/linux/sched.h | 5 +++++
 1 file changed, 5 insertions(+)
(limited to 'include/linux/sched.h')

diff --git a/include/linux/sched.h b/include/linux/sched.h
index ea1b5f32ec5c..c551e6a1447e 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -604,6 +604,11 @@ extern int groups_search(struct group_info *group_info, gid_t grp);
 #define GROUP_AT(gi, i) \
 	((gi)->blocks[(i)/NGROUPS_PER_BLOCK][(i)%NGROUPS_PER_BLOCK])
 
+#ifdef ARCH_HAS_PREFETCH_SWITCH_STACK
+extern void prefetch_stack(struct task_struct*);
+#else
+static inline void prefetch_stack(struct task_struct *t) { }
+#endif
 struct audit_context;	/* See audit.c */
 struct mempolicy;
--
cgit v1.2.2

From 4247bdc60048018b98f71228b45cfbc5f5270c86 Mon Sep 17 00:00:00 2001
From: Paul Jackson
Date: Sat, 10 Sep 2005 00:26:06 -0700
Subject: [PATCH] cpuset semaphore depth check deadlock fix

The cpusets-formalize-intermediate-gfp_kernel-containment patch has a
deadlock problem.

That patch was part of a set of four patches to make more extensive use of
the cpuset 'mem_exclusive' attribute to manage kernel GFP_KERNEL memory
allocations and to constrain the out-of-memory (oom) killer.

A task that is changing cpusets in particular ways on a system that is very
short of free memory could double-trip over the global cpuset_sem semaphore
(get the lock and then deadlock trying to get it again).  The second attempt
to get cpuset_sem would be in the routine cpuset_zone_allowed().

This was discovered by code inspection.  I cannot reproduce the problem
except with an artificially hacked kernel and a specialized stress test.  In
real life you cannot hit this unless you are manipulating cpusets, and are
very unlikely to hit it unless you are rapidly modifying cpusets on a
memory-tight system.  Even then it would be a rare occurrence.

If you did hit it, the task double-tripping over cpuset_sem would deadlock
in the kernel, and any other task also trying to manipulate cpusets would
deadlock there too, on cpuset_sem.  Your batch manager would be wedged solid
(if it was cpuset savvy), but classic Unix shells and utilities would work
well enough to reboot the system.

The unusual condition that led to this bug is that, unlike most semaphores,
cpuset_sem _can_ be acquired while in the page allocation code, when
__alloc_pages() calls cpuset_zone_allowed().  So it is easy to mistakenly
perform the following sequence:

  1) task makes system call to alter a cpuset
  2) take cpuset_sem
  3) try to allocate memory
  4) memory allocator, via cpuset_zone_allowed(), tries to take cpuset_sem
  5) deadlock

The reason that this is not a serious bug for most users is that almost all
calls to allocate memory don't require taking cpuset_sem.  Only some code
paths off the beaten track require taking cpuset_sem -- which is good.
Taking a global semaphore on the main code path for allocating memory would
not scale well.

This patch fixes the deadlock by wrapping the up() and down() calls on
cpuset_sem in kernel/cpuset.c with code that tracks the current task's
nesting depth on that semaphore: the real down() is done only if the task
doesn't already hold the lock, and the real up() only when the nesting
depth (the number of unmatched downs) is exactly one.

The previously required use of refresh_mems() -- needed whenever cpuset_sem
was acquired and the code running under it might try to allocate memory --
is no longer necessary.  Two refresh_mems() calls were removed thanks to
this.  This is a good change, as failing to place all the necessary
refresh_mems() calls was a primary source of bugs in this cpuset code.  The
only remaining call to refresh_mems() is made while doing a memory
allocation, if certain task memory placement data needs to be updated from
its cpuset, due to the cpuset having been changed behind the task's back.

Signed-off-by: Paul Jackson
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
---
 include/linux/sched.h | 1 +
 1 file changed, 1 insertion(+)
(limited to 'include/linux/sched.h')

diff --git a/include/linux/sched.h b/include/linux/sched.h
index c551e6a1447e..8a1fcfe80fc7 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -782,6 +782,7 @@ struct task_struct {
 	short il_next;
 #endif
 #ifdef CONFIG_CPUSETS
+	short cpuset_sem_nest_depth;
 	struct cpuset *cpuset;
 	nodemask_t mems_allowed;
 	int cpuset_mems_generation;
--
cgit v1.2.2
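[Editor's note: a minimal sketch of the depth-tracking wrappers this commit
describes.  The diff above adds only the cpuset_sem_nest_depth task field;
the helper names cpuset_down()/cpuset_up() and their bodies here are
assumptions for illustration, not the committed kernel/cpuset.c code.]

#include <linux/sched.h>
#include <asm/semaphore.h>	/* 2.6.13-era counting semaphore API */

static DECLARE_MUTEX(cpuset_sem);	/* semaphore initialized to 1 */

/* Take the real semaphore only on the outermost, unnested down(). */
static void cpuset_down(struct semaphore *psem)
{
	if (current->cpuset_sem_nest_depth == 0)
		down(psem);
	current->cpuset_sem_nest_depth++;
}

/* Release only when the last unmatched down() unwinds. */
static void cpuset_up(struct semaphore *psem)
{
	current->cpuset_sem_nest_depth--;
	if (current->cpuset_sem_nest_depth == 0)
		up(psem);
}

With such a guard in place, step 4 of the deadlock sequence above sees a
nonzero nesting depth for the current task and returns without blocking.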
From d79fc0fc6645b0cf5cd980da76942ca6d6300fa4 Mon Sep 17 00:00:00 2001
From: Ingo Molnar
Date: Sat, 10 Sep 2005 00:26:12 -0700
Subject: [PATCH] sched: TASK_NONINTERACTIVE

This patch implements a task state bit (TASK_NONINTERACTIVE) that blocking
points can use to mark a task's wait as "non-interactive".  This does not
mean the task will be considered a CPU hog -- the wait simply has no effect,
positive or negative, on the waiting task's priority.  Right now only
pipe_wait() makes use of it, because pipes are a common source of
not-so-interactive waits (kernel compilation jobs, etc.).

Signed-off-by: Ingo Molnar
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
---
 include/linux/sched.h | 1 +
 1 file changed, 1 insertion(+)
(limited to 'include/linux/sched.h')

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 8a1fcfe80fc7..ac70f845b5b1 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -114,6 +114,7 @@ extern unsigned long nr_iowait(void);
 #define TASK_TRACED		8
 #define EXIT_ZOMBIE		16
 #define EXIT_DEAD		32
+#define TASK_NONINTERACTIVE	64
 
 #define __set_task_state(tsk, state_value)		\
 	do { (tsk)->state = (state_value); } while (0)
--
cgit v1.2.2

From 64ed93a268bc18fa6f72f61420d0e0022c5e38d1 Mon Sep 17 00:00:00 2001
From: Nishanth Aravamudan
Date: Sat, 10 Sep 2005 00:27:21 -0700
Subject: [PATCH] add schedule_timeout_{,un}interruptible() interfaces

Add schedule_timeout_{,un}interruptible() interfaces so that
schedule_timeout() callers don't have to worry about forgetting the
set_current_state() call beforehand.

Signed-off-by: Nishanth Aravamudan
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
---
 include/linux/sched.h | 2 ++
 1 file changed, 2 insertions(+)
(limited to 'include/linux/sched.h')

diff --git a/include/linux/sched.h b/include/linux/sched.h
index ac70f845b5b1..4b83cb230006 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -203,6 +203,8 @@ extern int in_sched_functions(unsigned long addr);
 #define	MAX_SCHEDULE_TIMEOUT	LONG_MAX
 extern signed long FASTCALL(schedule_timeout(signed long timeout));
+extern signed long schedule_timeout_interruptible(signed long timeout);
+extern signed long schedule_timeout_uninterruptible(signed long timeout);
 asmlinkage void schedule(void);
 
 struct namespace;
--
cgit v1.2.2
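[Editor's note: the helper bodies are not part of the sched.h diff above,
which only adds the declarations.  Assuming they sit next to
schedule_timeout() in kernel/timer.c, a minimal sketch matching the stated
intent would be:]

#include <linux/sched.h>	/* set_current_state(), schedule_timeout() */

signed long schedule_timeout_interruptible(signed long timeout)
{
	/* The step that callers kept forgetting. */
	set_current_state(TASK_INTERRUPTIBLE);
	return schedule_timeout(timeout);
}

signed long schedule_timeout_uninterruptible(signed long timeout)
{
	set_current_state(TASK_UNINTERRUPTIBLE);
	return schedule_timeout(timeout);
}

A caller that previously wrote set_current_state(TASK_INTERRUPTIBLE);
followed by schedule_timeout(5*HZ); can now collapse both lines into a
single schedule_timeout_interruptible(5*HZ) call.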
From a2a979821b6ab75a4f143cfaa1c4672cc259ec10 Mon Sep 17 00:00:00 2001
From: Keith Owens
Date: Sun, 11 Sep 2005 17:19:06 +1000
Subject: [PATCH] MCA/INIT: scheduler hooks

Scheduler hooks to see/change which process is deemed to be on a cpu.

Signed-off-by: Keith Owens
Signed-off-by: Tony Luck
---
 include/linux/sched.h | 2 ++
 1 file changed, 2 insertions(+)
(limited to 'include/linux/sched.h')

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 4b83cb230006..ed3bb19d1337 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -904,6 +904,8 @@ extern int task_curr(const task_t *p);
 extern int idle_cpu(int cpu);
 extern int sched_setscheduler(struct task_struct *, int, struct sched_param *);
 extern task_t *idle_task(int cpu);
+extern task_t *curr_task(int cpu);
+extern void set_curr_task(int cpu, task_t *p);
 
 void yield(void);
--
cgit v1.2.2

From b3426599af9524104be6938bcb1fcaab314781c7 Mon Sep 17 00:00:00 2001
From: Paul Jackson
Date: Mon, 12 Sep 2005 04:30:30 -0700
Subject: [PATCH] cpuset semaphore depth check optimize

Optimize the deadlock-avoidance check on the global cpuset semaphore
cpuset_sem.  Instead of adding a depth counter to each task's task struct,
just two words are enough: one to store the depth and the other the current
cpuset_sem holder.

Thanks to Nikita Danilov for the idea.

Signed-off-by: Paul Jackson

[ We may want to change this further, but at least it's now
  a totally internal decision to the cpusets code ]

Signed-off-by: Linus Torvalds
---
 include/linux/sched.h | 1 -
 1 file changed, 1 deletion(-)
(limited to 'include/linux/sched.h')

diff --git a/include/linux/sched.h b/include/linux/sched.h
index ed3bb19d1337..38c8654aaa96 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -785,7 +785,6 @@ struct task_struct {
 	short il_next;
 #endif
 #ifdef CONFIG_CPUSETS
-	short cpuset_sem_nest_depth;
 	struct cpuset *cpuset;
 	nodemask_t mems_allowed;
 	int cpuset_mems_generation;
--
cgit v1.2.2
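[Editor's note: a minimal sketch of the optimized two-word scheme, reusing
the hypothetical cpuset_down()/cpuset_up() helper names from the earlier
note.  All identifiers and bodies are illustrative assumptions; the diff
above only removes the per-task field, making the bookkeeping private to
kernel/cpuset.c.]

#include <linux/sched.h>
#include <asm/semaphore.h>

static DECLARE_MUTEX(cpuset_sem);
static struct task_struct *cpuset_sem_owner;	/* task holding cpuset_sem */
static int cpuset_sem_depth;			/* unmatched downs by owner */

static void cpuset_down(struct semaphore *psem)
{
	/* Block only if some other task holds the semaphore. */
	if (cpuset_sem_owner != current) {
		down(psem);
		cpuset_sem_owner = current;
	}
	cpuset_sem_depth++;
}

static void cpuset_up(struct semaphore *psem)
{
	/* Release only when the outermost down() is matched. */
	if (--cpuset_sem_depth == 0) {
		cpuset_sem_owner = NULL;
		up(psem);
	}
}

The unlocked owner test is the point of the optimization: a non-holder may
read a stale owner value, but never one equal to itself (only the holder
stores current there), so it correctly falls through to the blocking down().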