From 3773dc92055f219c2f2403f9de774b8ad41a9214 Mon Sep 17 00:00:00 2001
From: Jan Blunck
Date: Sun, 13 Aug 2006 23:24:19 -0700
Subject: [PATCH] fix hrtimer percpu usage typo

The percpu variable is used incorrectly in switch_hrtimer_base().

Signed-off-by: Jan Blunck
Acked-by: Thomas Gleixner
Signed-off-by: Andrew Morton
Signed-off-by: Greg Kroah-Hartman
---
 kernel/hrtimer.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/hrtimer.c b/kernel/hrtimer.c
index be989efc7856..21c38a7e666b 100644
--- a/kernel/hrtimer.c
+++ b/kernel/hrtimer.c
@@ -187,7 +187,7 @@ switch_hrtimer_base(struct hrtimer *timer, struct hrtimer_base *base)
 {
 	struct hrtimer_base *new_base;
 
-	new_base = &__get_cpu_var(hrtimer_bases[base->index]);
+	new_base = &__get_cpu_var(hrtimer_bases)[base->index];
 
 	if (base != new_base) {
 		/*
-- 
cgit v1.2.2

From 657b3010d8f8a72195dfcbe63040127d596f0b14 Mon Sep 17 00:00:00 2001
From: Andrew Morton
Date: Sun, 13 Aug 2006 23:24:19 -0700
Subject: [PATCH] panic.c build fix

kernel/panic.c: In function 'add_taint':
kernel/panic.c:176: warning: implicit declaration of function 'debug_locks_off'

Cc: Andi Kleen
Signed-off-by: Andrew Morton
Signed-off-by: Greg Kroah-Hartman
---
 kernel/panic.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/kernel/panic.c b/kernel/panic.c
index d8a0bca21233..9b8dcfd1ca93 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -18,6 +18,7 @@
 #include
 #include
 #include
+#include <linux/debug_locks.h>
 
 int panic_on_oops;
 int tainted;
-- 
cgit v1.2.2

From 6997a6faaa129a1c91775f7344c8d371a05178ea Mon Sep 17 00:00:00 2001
From: Kirill Korotaev
Date: Sun, 13 Aug 2006 23:24:23 -0700
Subject: [PATCH] sys_getppid oopses on debug kernel

The sys_getppid() optimization can access freed memory.  On kernels
with DEBUG_SLAB turned ON, this results in an Oops.  As Dave Hansen
noted, this optimization is also unsafe for memory hotplug.

So this patch always takes the lock to be safe.

[oleg@tv-sign.ru: simplifications]
Signed-off-by: Kirill Korotaev
Cc:
Cc: Dave Hansen
Signed-off-by: Oleg Nesterov
Signed-off-by: Andrew Morton
Signed-off-by: Greg Kroah-Hartman
---
 kernel/timer.c | 41 +++++++----------------------------------
 1 file changed, 7 insertions(+), 34 deletions(-)

diff --git a/kernel/timer.c b/kernel/timer.c
index b650f04888ed..1d7dd6267c2d 100644
--- a/kernel/timer.c
+++ b/kernel/timer.c
@@ -1324,46 +1324,19 @@ asmlinkage long sys_getpid(void)
 }
 
 /*
- * Accessing ->group_leader->real_parent is not SMP-safe, it could
- * change from under us. However, rather than getting any lock
- * we can use an optimistic algorithm: get the parent
- * pid, and go back and check that the parent is still
- * the same. If it has changed (which is extremely unlikely
- * indeed), we just try again..
- *
- * NOTE! This depends on the fact that even if we _do_
- * get an old value of "parent", we can happily dereference
- * the pointer (it was and remains a dereferencable kernel pointer
- * no matter what): we just can't necessarily trust the result
- * until we know that the parent pointer is valid.
- *
- * NOTE2: ->group_leader never changes from under us.
+ * Accessing ->real_parent is not SMP-safe, it could
+ * change from under us. However, we can use a stale
+ * value of ->real_parent under rcu_read_lock(), see
+ * release_task()->call_rcu(delayed_put_task_struct).
  */
 asmlinkage long sys_getppid(void)
 {
 	int pid;
-	struct task_struct *me = current;
-	struct task_struct *parent;
 
-	parent = me->group_leader->real_parent;
-	for (;;) {
-		pid = parent->tgid;
-#if defined(CONFIG_SMP) || defined(CONFIG_PREEMPT)
-{
-		struct task_struct *old = parent;
+	rcu_read_lock();
+	pid = rcu_dereference(current->real_parent)->tgid;
+	rcu_read_unlock();
 
-		/*
-		 * Make sure we read the pid before re-reading the
-		 * parent pointer:
-		 */
-		smp_rmb();
-		parent = me->group_leader->real_parent;
-		if (old != parent)
-			continue;
-}
-#endif
-		break;
-	}
 	return pid;
 }
-- 
cgit v1.2.2
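For reference, the pattern the new sys_getppid() relies on can be shown in isolation. A minimal sketch (illustration only; the helper name is invented, not part of the patch): the parent pointer read under rcu_read_lock() may be stale, but it always points at memory that has not been freed, because release_task() frees the task_struct via call_rcu() and that callback cannot run until all current read-side critical sections have finished.

    #include <linux/rcupdate.h>
    #include <linux/sched.h>

    /* Hypothetical helper, for illustration only. */
    static pid_t read_parent_tgid(struct task_struct *task)
    {
            pid_t pid;

            rcu_read_lock();
            /* The parent may be stale, but it cannot have been freed. */
            pid = rcu_dereference(task->real_parent)->tgid;
            rcu_read_unlock();

            return pid;
    }

This is why the patch can delete the optimistic retry loop entirely: a stale ->real_parent is acceptable for getppid(), and RCU guarantees the dereference itself is safe.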
From e579dcbf23604cb33c08b5c3c3ac06ca36e7c683 Mon Sep 17 00:00:00 2001
From: john stultz
Date: Sun, 13 Aug 2006 23:24:24 -0700
Subject: [PATCH] futex_handle_fault always fails

We found this issue last week with the -RT kernel, but it seems the
same issue is in mainline as well.  Basically it is possible for
futex_unlock_pi() to return without actually freeing the lock.  This
is due to buggy logic in the use of futex_handle_fault() and its
attempt argument in a failure case.

Looking at futex.c, the logic is as follows:

1) In futex_unlock_pi() we start with ret=0 and go down to the first
   futex_atomic_cmpxchg_inatomic(), where we find uval==-EFAULT.  We
   then jump to the pi_faulted label.

2) From pi_faulted: we increment attempt, unlock the sem and hit the
   retry label.

3) From the retry label, with ret still zero, we again hit EFAULT on
   the first futex_atomic_cmpxchg_inatomic(), and again goto the
   pi_faulted label.

4) Again from pi_faulted: we increment attempt and enter the
   conditional, where we call futex_handle_fault().

5) futex_handle_fault() fails, and we goto the out_unlock_release_sem
   label.

6) From out_unlock_release_sem we return, and since ret is still
   zero, we return without error, while never actually unlocking the
   lock.

Issue #1: at the first futex_atomic_cmpxchg_inatomic() we should
probably be setting ret=-EFAULT before jumping to pi_faulted.
However, in our case this doesn't really affect anything, as the
glibc we're using ignores the error value from futex_unlock_pi().

Issue #2: look at futex_handle_fault(): its first conditional will
return -EFAULT if attempt is >= 2.  However, from the
"if (attempt++) futex_handle_fault(attempt)" logic above, we'll
*never* call futex_handle_fault() when attempt is less than two.  So
we never get a chance to even try to fault the page in.

The following patch addresses these two issues by 1) always setting
ret to -EFAULT if futex_handle_fault() fails, and 2) removing the =
in futex_handle_fault's (attempt >= 2) check.

I'm really not sure this is the right fix, but wanted to bring it up
so folks knew the issue is alive and well in the current -git tree.
From looking at the git logs, the logic was first introduced (then
later copied to other places) in the following commit almost a year
ago:

http://www.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=4732efbeb997189d9f9b04708dc26bf8613ed721;hp=5b039e681b8c5f30aac9cc04385cc94be45d0823

Cc: Rusty Russell
Cc: Ingo Molnar
Acked-by: Thomas Gleixner
Signed-off-by: Andrew Morton
Signed-off-by: Greg Kroah-Hartman
---
 kernel/futex.c | 16 ++++++++++------
 1 file changed, 10 insertions(+), 6 deletions(-)

diff --git a/kernel/futex.c b/kernel/futex.c
index c2b2e0b83abf..d4633c588f33 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -297,7 +297,7 @@ static int futex_handle_fault(unsigned long address, int attempt)
 	struct vm_area_struct * vma;
 	struct mm_struct *mm = current->mm;
 
-	if (attempt >= 2 || !(vma = find_vma(mm, address)) ||
+	if (attempt > 2 || !(vma = find_vma(mm, address)) ||
 	    vma->vm_start > address || !(vma->vm_flags & VM_WRITE))
 		return -EFAULT;
 
@@ -747,8 +747,10 @@ retry:
 	 */
 	if (attempt++) {
 		if (futex_handle_fault((unsigned long)uaddr2,
-					attempt))
+					attempt)) {
+			ret = -EFAULT;
 			goto out;
+		}
 		goto retry;
 	}
 
@@ -1322,9 +1324,10 @@ static int do_futex_lock_pi(u32 __user *uaddr, int detect, int trylock,
 	 * still holding the mmap_sem.
 	 */
 	if (attempt++) {
-		if (futex_handle_fault((unsigned long)uaddr, attempt))
+		if (futex_handle_fault((unsigned long)uaddr, attempt)) {
+			ret = -EFAULT;
 			goto out_unlock_release_sem;
-
+		}
 		goto retry_locked;
 	}
 
@@ -1506,9 +1509,10 @@ pi_faulted:
 	 * still holding the mmap_sem.
 	 */
 	if (attempt++) {
-		if (futex_handle_fault((unsigned long)uaddr, attempt))
+		if (futex_handle_fault((unsigned long)uaddr, attempt)) {
+			ret = -EFAULT;
 			goto out_unlock;
-
+		}
 		goto retry_locked;
 	}
-- 
cgit v1.2.2
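To make Issue #2 concrete, here is a stand-alone sketch (ordinary user-space C, a simulation rather than kernel code) of how the two checks interact with the "if (attempt++)" guard. The guard skips the handler on the first fault, so the handler is only ever entered with attempt >= 2, which the original check rejected unconditionally:

    #include <stdio.h>

    int main(void)
    {
            /*
             * "if (attempt++)" is false on the first fault (attempt == 0),
             * so futex_handle_fault() only ever sees post-increment
             * values of attempt >= 2.
             */
            for (int attempt = 2; attempt <= 4; attempt++)
                    printf("attempt=%d: original (>= 2): %s, patched (> 2): %s\n",
                           attempt,
                           attempt >= 2 ? "-EFAULT" : "try to fault in",
                           attempt >  2 ? "-EFAULT" : "try to fault in");
            return 0;
    }

With the original check every entry returns -EFAULT immediately; with the patched check the first real entry (attempt == 2) gets one chance to fault the page in before the code gives up.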
From 9b41ea7289a589993d3daabc61f999b4147872c4 Mon Sep 17 00:00:00 2001
From: Andrew Morton
Date: Sun, 13 Aug 2006 23:24:26 -0700
Subject: [PATCH] workqueue: remove lock_cpu_hotplug()

Use a private lock instead.  It protects all per-cpu data structures
in workqueue.c, including the workqueues list.

Fix a bug in schedule_on_each_cpu(): it was forgetting to lock down
the per-cpu resources.

Unfixed long-standing bug: if someone unplugs the CPU identified by
`singlethread_cpu' the kernel will get very sick.

Cc: Dave Jones
Signed-off-by: Andrew Morton
Signed-off-by: Greg Kroah-Hartman
---
 kernel/workqueue.c | 33 +++++++++++++++++++++------------
 1 file changed, 21 insertions(+), 12 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 448e8f7b342d..835fe28b87a8 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -68,7 +68,7 @@ struct workqueue_struct {
 
 /* All the per-cpu workqueues on the system, for hotplug cpu to
    add/remove threads to each one as cpus come/go. */
-static DEFINE_SPINLOCK(workqueue_lock);
+static DEFINE_MUTEX(workqueue_mutex);
 static LIST_HEAD(workqueues);
 
 static int singlethread_cpu;
@@ -320,10 +320,10 @@ void fastcall flush_workqueue(struct workqueue_struct *wq)
 	} else {
 		int cpu;
 
-		lock_cpu_hotplug();
+		mutex_lock(&workqueue_mutex);
 		for_each_online_cpu(cpu)
 			flush_cpu_workqueue(per_cpu_ptr(wq->cpu_wq, cpu));
-		unlock_cpu_hotplug();
+		mutex_unlock(&workqueue_mutex);
 	}
 }
 EXPORT_SYMBOL_GPL(flush_workqueue);
@@ -371,8 +371,7 @@ struct workqueue_struct *__create_workqueue(const char *name,
 	}
 
 	wq->name = name;
-	/* We don't need the distraction of CPUs appearing and vanishing. */
-	lock_cpu_hotplug();
+	mutex_lock(&workqueue_mutex);
 	if (singlethread) {
 		INIT_LIST_HEAD(&wq->list);
 		p = create_workqueue_thread(wq, singlethread_cpu);
@@ -381,9 +380,7 @@ struct workqueue_struct *__create_workqueue(const char *name,
 		else
 			wake_up_process(p);
 	} else {
-		spin_lock(&workqueue_lock);
 		list_add(&wq->list, &workqueues);
-		spin_unlock(&workqueue_lock);
 		for_each_online_cpu(cpu) {
 			p = create_workqueue_thread(wq, cpu);
 			if (p) {
@@ -393,7 +390,7 @@ struct workqueue_struct *__create_workqueue(const char *name,
 				destroy = 1;
 		}
 	}
-	unlock_cpu_hotplug();
+	mutex_unlock(&workqueue_mutex);
 
 	/*
 	 * Was there any error during startup? If yes then clean up:
@@ -434,17 +431,15 @@ void destroy_workqueue(struct workqueue_struct *wq)
 	flush_workqueue(wq);
 
 	/* We don't need the distraction of CPUs appearing and vanishing. */
-	lock_cpu_hotplug();
+	mutex_lock(&workqueue_mutex);
 	if (is_single_threaded(wq))
 		cleanup_workqueue_thread(wq, singlethread_cpu);
 	else {
 		for_each_online_cpu(cpu)
 			cleanup_workqueue_thread(wq, cpu);
-		spin_lock(&workqueue_lock);
 		list_del(&wq->list);
-		spin_unlock(&workqueue_lock);
 	}
-	unlock_cpu_hotplug();
+	mutex_unlock(&workqueue_mutex);
 	free_percpu(wq->cpu_wq);
 	kfree(wq);
 }
@@ -515,11 +510,13 @@ int schedule_on_each_cpu(void (*func)(void *info), void *info)
 	if (!works)
 		return -ENOMEM;
 
+	mutex_lock(&workqueue_mutex);
 	for_each_online_cpu(cpu) {
 		INIT_WORK(per_cpu_ptr(works, cpu), func, info);
 		__queue_work(per_cpu_ptr(keventd_wq->cpu_wq, cpu),
 				per_cpu_ptr(works, cpu));
 	}
+	mutex_unlock(&workqueue_mutex);
 	flush_workqueue(keventd_wq);
 	free_percpu(works);
 	return 0;
@@ -635,6 +632,7 @@ static int __devinit workqueue_cpu_callback(struct notifier_block *nfb,
 
 	switch (action) {
 	case CPU_UP_PREPARE:
+		mutex_lock(&workqueue_mutex);
 		/* Create a new workqueue thread for it. */
 		list_for_each_entry(wq, &workqueues, list) {
 			if (!create_workqueue_thread(wq, hotcpu)) {
@@ -653,6 +651,7 @@ static int __devinit workqueue_cpu_callback(struct notifier_block *nfb,
 			kthread_bind(cwq->thread, hotcpu);
 			wake_up_process(cwq->thread);
 		}
+		mutex_unlock(&workqueue_mutex);
 		break;
 
 	case CPU_UP_CANCELED:
@@ -664,6 +663,15 @@ static int __devinit workqueue_cpu_callback(struct notifier_block *nfb,
 				any_online_cpu(cpu_online_map));
 			cleanup_workqueue_thread(wq, hotcpu);
 		}
+		mutex_unlock(&workqueue_mutex);
+		break;
+
+	case CPU_DOWN_PREPARE:
+		mutex_lock(&workqueue_mutex);
+		break;
+
+	case CPU_DOWN_FAILED:
+		mutex_unlock(&workqueue_mutex);
 		break;
 
 	case CPU_DEAD:
@@ -671,6 +679,7 @@ static int __devinit workqueue_cpu_callback(struct notifier_block *nfb,
 		cleanup_workqueue_thread(wq, hotcpu);
 		list_for_each_entry(wq, &workqueues, list)
 			take_over_work(wq, hotcpu);
+		mutex_unlock(&workqueue_mutex);
 		break;
 	}
-- 
cgit v1.2.2
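The notifier is the subtle part: as the patch reads, workqueue_mutex is taken in CPU_UP_PREPARE and CPU_DOWN_PREPARE and only released in a *later* notifier invocation (after the threads are bound, or in CPU_UP_CANCELED on the way up; in CPU_DEAD or CPU_DOWN_FAILED on the way down), i.e. the mutex is deliberately held across separate notifier calls while the hotplug operation is in flight. A stripped-down sketch of the down-path half of that pattern (hypothetical names, simplified from the patch above):

    #include <linux/cpu.h>
    #include <linux/mutex.h>
    #include <linux/notifier.h>

    static DEFINE_MUTEX(example_mutex);     /* hypothetical private lock */

    static int example_cpu_callback(struct notifier_block *nfb,
                                    unsigned long action, void *hcpu)
    {
            switch (action) {
            case CPU_DOWN_PREPARE:
                    /* Keep per-cpu users out while the CPU goes away. */
                    mutex_lock(&example_mutex);
                    break;
            case CPU_DOWN_FAILED:           /* unplug aborted */
            case CPU_DEAD:                  /* unplug completed */
                    mutex_unlock(&example_mutex);
                    break;
            }
            return NOTIFY_OK;
    }

    static struct notifier_block example_nb = {
            .notifier_call = example_cpu_callback,
    };
    /* registered at init time with register_cpu_notifier(&example_nb) */

Every other path in workqueue.c that walks the per-cpu data simply takes the same mutex, which is what makes lock_cpu_hotplug() unnecessary there.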
From f8986c241dfd54d51c9eff967129a550ae230144 Mon Sep 17 00:00:00 2001
From: Oleg Nesterov
Date: Sun, 27 Aug 2006 01:23:34 -0700
Subject: [PATCH] revert "Drop tasklist lock in do_sched_setscheduler"

sched_setscheduler() looks at ->signal->rlim[].  It is unsafe to
dereference ->signal unless tasklist_lock or ->siglock is held (or
p == current).  We pin the task structure, but this can't prevent
release_task()->__exit_signal(), which sets ->signal = NULL.

Restore tasklist_lock across the setscheduler call.

Signed-off-by: Oleg Nesterov
Cc: Greg KH
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
---
 kernel/sched.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/kernel/sched.c b/kernel/sched.c
index a2be2d055299..a234fbee1238 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -4162,10 +4162,8 @@ do_sched_setscheduler(pid_t pid, int policy, struct sched_param __user *param)
 		read_unlock_irq(&tasklist_lock);
 		return -ESRCH;
 	}
-	get_task_struct(p);
-	read_unlock_irq(&tasklist_lock);
 	retval = sched_setscheduler(p, policy, &lparam);
-	put_task_struct(p);
+	read_unlock_irq(&tasklist_lock);
 
 	return retval;
 }
-- 
cgit v1.2.2
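The lifetime rule behind the revert, as a sketch (hypothetical helper, illustration only, not from the patch): get_task_struct() pins *p itself, but not p->signal; release_task()->__exit_signal() frees the signal_struct and clears the pointer regardless of the task's reference count. Since release_task() runs under write_lock_irq(&tasklist_lock), holding the lock for reading keeps ->signal stable:

    #include <linux/sched.h>

    /* Hypothetical helper: read an rlimit out of another task. */
    static unsigned long read_rt_rlimit(struct task_struct *p)
    {
            unsigned long limit = 0;

            /*
             * tasklist_lock excludes release_task(), so ->signal cannot
             * be freed under us; it may already be NULL if the task was
             * reaped earlier, hence the check.
             */
            read_lock(&tasklist_lock);
            if (p->signal)
                    limit = p->signal->rlim[RLIMIT_RTPRIO].rlim_cur;
            read_unlock(&tasklist_lock);

            return limit;
    }

This is exactly the window the dropped get_task_struct()/put_task_struct() pair failed to close: the task stayed allocated, but its ->signal did not.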
From d015baebba44613ef59ddffeae2114fa4ede7104 Mon Sep 17 00:00:00 2001
From: Oleg Nesterov
Date: Sun, 27 Aug 2006 01:23:40 -0700
Subject: [PATCH] futex_find_get_task(): remove an obscure EXIT_ZOMBIE check

futex_find_get_task() does:

	if (p->state == EXIT_ZOMBIE || p->exit_state == EXIT_ZOMBIE)
		return NULL;

I can't understand this.  First, p->state can't be EXIT_ZOMBIE.  The
->exit_state check looks strange too.  Sub-threads or tasks whose
->parent ignores SIGCHLD go directly to the EXIT_DEAD state (I am
ignoring the ptrace case).  Why should EXIT_DEAD tasks be ok?  Yes,
EXIT_ZOMBIE is more important (a task may stay zombie for a long
time), but this doesn't mean we should explicitly ignore other
EXIT_XXX states.

Signed-off-by: Oleg Nesterov
Acked-by: Ingo Molnar
Cc: Thomas Gleixner
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
---
 kernel/futex.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/futex.c b/kernel/futex.c
index d4633c588f33..b9b8aea5389e 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -397,7 +397,7 @@ static struct task_struct * futex_find_get_task(pid_t pid)
 		p = NULL;
 		goto out_unlock;
 	}
-	if (p->state == EXIT_ZOMBIE || p->exit_state == EXIT_ZOMBIE) {
+	if (p->exit_state != 0) {
 		p = NULL;
 		goto out_unlock;
 	}
-- 
cgit v1.2.2

From 4edb9a143e31d2e191c199262226e1a5923ff8f7 Mon Sep 17 00:00:00 2001
From: Yingchao Zhou
Date: Sun, 27 Aug 2006 01:23:46 -0700
Subject: [PATCH] Remove redundant up() in stop_machine()

An up() is called in kernel/stop_machine.c on the failure path, but
the caller also does an up() unconditionally, so the semaphore gets
released twice.  Remove the redundant up().

Signed-off-by: Zhou Yingchao
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
---
 kernel/stop_machine.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c
index dcfb5d731466..51cacd111dbd 100644
--- a/kernel/stop_machine.c
+++ b/kernel/stop_machine.c
@@ -111,7 +111,6 @@ static int stop_machine(void)
 	/* If some failed, kill them all. */
 	if (ret < 0) {
 		stopmachine_set_state(STOPMACHINE_EXIT);
-		up(&stopmachine_mutex);
 		return ret;
 	}
-- 
cgit v1.2.2
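The shape of that double-release bug in a stand-alone sketch (plain C, with an integer standing in for the semaphore count; all names are invented for the demonstration): when the callee releases on failure and the caller releases unconditionally, the count ends up one too high, letting two later callers into the critical section at once:

    #include <stdio.h>

    static int count;   /* binary semaphore model: 1 == free */

    static void down(void) { count--; }
    static void up(void)   { count++; }

    /* callee: the buggy version also released the lock on failure */
    static int stop_machine_sim(int buggy)
    {
            if (buggy)
                    up();           /* the redundant up() removed by the patch */
            return -1;              /* simulate the failure path */
    }

    static void run(int buggy)
    {
            count = 1;
            down();                 /* caller acquires ... */
            stop_machine_sim(buggy);
            up();                   /* ... and releases unconditionally */
            printf("%s: count after failed run = %d (should be 1)\n",
                   buggy ? "buggy" : "fixed", count);
    }

    int main(void)
    {
            run(1);                 /* prints 2: semaphore released twice */
            run(0);                 /* prints 1: balanced */
            return 0;
    }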
From 4c4d50f7b39cc58f1064b93a61ad617451ae41df Mon Sep 17 00:00:00 2001
From: Paul Jackson
Date: Sun, 27 Aug 2006 01:23:51 -0700
Subject: [PATCH] cpuset: top_cpuset tracks hotplug changes to cpu_online_map

Change the list of cpus allowed to tasks in the top (root) cpuset to
dynamically track what cpus are online, using a CPU hotplug notifier.
Make this top cpus file read-only.

On systems that have cpusets configured in their kernel, but that
aren't actively using cpusets (for some distros, this covers the
majority of systems), all tasks end up in the top cpuset.

If that system does support CPU hotplug, then these tasks cannot make
use of CPUs that are added after system boot, because the CPUs are
not allowed in the top cpuset.  This is a surprising regression over
earlier kernels that didn't have cpusets enabled.

In order to keep the behaviour of cpusets consistent between systems
actively making use of them and systems not using them, this patch
changes the behaviour of the 'cpus' file in the top (root) cpuset,
making it read-only, and making it automatically track the value of
cpu_online_map.  Thus tasks in the top cpuset will have automatic use
of hot-plugged CPUs allowed by their cpuset.

Thanks to Anton Blanchard and Nathan Lynch for reporting this
problem, driving the fix, and for earlier versions of this patch.

Signed-off-by: Paul Jackson
Cc: Nathan Lynch
Cc: Anton Blanchard
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
---
 kernel/cpuset.c | 33 +++++++++++++++++++++++++++++++++
 1 file changed, 33 insertions(+)

diff --git a/kernel/cpuset.c b/kernel/cpuset.c
index 1a649f2bb9bb..f1dda98bdffe 100644
--- a/kernel/cpuset.c
+++ b/kernel/cpuset.c
@@ -816,6 +816,10 @@ static int update_cpumask(struct cpuset *cs, char *buf)
 	struct cpuset trialcs;
 	int retval, cpus_unchanged;
 
+	/* top_cpuset.cpus_allowed tracks cpu_online_map; it's read-only */
+	if (cs == &top_cpuset)
+		return -EACCES;
+
 	trialcs = *cs;
 	retval = cpulist_parse(buf, trialcs.cpus_allowed);
 	if (retval < 0)
@@ -2033,6 +2037,33 @@ out:
 	return err;
 }
 
+/*
+ * The top_cpuset tracks what CPUs and Memory Nodes are online,
+ * period.  This is necessary in order to make cpusets transparent
+ * (of no affect) on systems that are actively using CPU hotplug
+ * but making no active use of cpusets.
+ *
+ * This handles CPU hotplug (cpuhp) events.  If someday Memory
+ * Nodes can be hotplugged (dynamically changing node_online_map)
+ * then we should handle that too, perhaps in a similar way.
+ */
+
+#ifdef CONFIG_HOTPLUG_CPU
+static int cpuset_handle_cpuhp(struct notifier_block *nb,
+				unsigned long phase, void *cpu)
+{
+	mutex_lock(&manage_mutex);
+	mutex_lock(&callback_mutex);
+
+	top_cpuset.cpus_allowed = cpu_online_map;
+
+	mutex_unlock(&callback_mutex);
+	mutex_unlock(&manage_mutex);
+
+	return 0;
+}
+#endif
+
 /**
  * cpuset_init_smp - initialize cpus_allowed
  *
@@ -2043,6 +2074,8 @@ void __init cpuset_init_smp(void)
 {
 	top_cpuset.cpus_allowed = cpu_online_map;
 	top_cpuset.mems_allowed = node_online_map;
+
+	hotcpu_notifier(cpuset_handle_cpuhp, 0);
 }
 
 /**
-- 
cgit v1.2.2
From 0d673a5a4775d3dc565b6668ed75fd2db2ede624 Mon Sep 17 00:00:00 2001
From: Nick Piggin
Date: Sun, 27 Aug 2006 01:23:54 -0700
Subject: [PATCH] cpuset: oom panic fix

cpuset_excl_nodes_overlap() always returns 0 if current is exiting.
This caused customers' systems to panic in the OOM killer when
processes were having trouble getting memory for the final put_user
in mm_release, even though there were lots of processes to kill.

Change it to return 1 in this case.  This achieves parity with the
!CONFIG_CPUSETS case, and was observed to fix the problem.

Signed-off-by: Nick Piggin
Acked-by: Paul Jackson
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
---
 kernel/cpuset.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/cpuset.c b/kernel/cpuset.c
index f1dda98bdffe..4ea6f0dc2fc5 100644
--- a/kernel/cpuset.c
+++ b/kernel/cpuset.c
@@ -2420,7 +2420,7 @@ EXPORT_SYMBOL_GPL(cpuset_mem_spread_node);
 int cpuset_excl_nodes_overlap(const struct task_struct *p)
 {
 	const struct cpuset *cs1, *cs2;	/* my and p's cpuset ancestors */
-	int overlap = 0;		/* do cpusets overlap? */
+	int overlap = 1;		/* do cpusets overlap? */
 
 	task_lock(current);
 	if (current->flags & PF_EXITING) {
-- 
cgit v1.2.2
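On the parity argument: when CONFIG_CPUSETS is not set, the header provides a stub that unconditionally reports overlap, so the OOM killer never skips a victim for cpuset reasons. A paraphrased sketch of that stub (illustration only, not a verbatim quote of include/linux/cpuset.h):

    /* !CONFIG_CPUSETS stub, paraphrased: every task's nodes "overlap". */
    static inline int cpuset_excl_nodes_overlap(const struct task_struct *p)
    {
            return 1;
    }

Defaulting overlap to 1 on the PF_EXITING early-return gives the CONFIG_CPUSETS build the same fail-safe behaviour: when nothing definite is known, let the OOM killer consider the task.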