aboutsummaryrefslogtreecommitdiffstats
path: root/kernel
Commit message (Collapse)AuthorAge
* make cancel_rearming_delayed_work() reliableOleg Nesterov2007-05-09
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Thanks to Jarek Poplawski for the ideas and for spotting the bug in the initial draft patch. cancel_rearming_delayed_work() currently has many limitations, because it requires that dwork always re-arms itself via queue_delayed_work(). So it hangs forever if dwork doesn't do this, or cancel_rearming_delayed_work/ cancel_delayed_work was already called. It uses flush_workqueue() in a loop, so it can't be used if workqueue was freezed, and it is potentially live- lockable on busy system if delay is small. With this patch cancel_rearming_delayed_work() doesn't make any assumptions about dwork, it can re-arm itself via queue_delayed_work(), or queue_work(), or do nothing. As a "side effect", cancel_work_sync() was changed to handle re-arming works as well. Disadvantages: - this patch adds wmb() to insert_work(). - slowdowns the fast path (when del_timer() succeeds on entry) of cancel_rearming_delayed_work(), because wait_on_work() is called unconditionally. In that case, compared to the old version, we are doing "unneeded" lock/unlock for each online CPU. On the other hand, this means we don't need to use cancel_work_sync() after cancel_rearming_delayed_work(). - complicates the code (.text grows by 130 bytes). [akpm@linux-foundation.org: fix speling] Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: David Chinner <dgc@sgi.com> Cc: David Howells <dhowells@redhat.com> Cc: Gautham Shenoy <ego@in.ibm.com> Acked-by: Jarek Poplawski <jarkao2@o2.pl> Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Remove kthread_bind() call from _cpu_down()Gautham R Shenoy2007-05-09
| | | | | | | | | | | | | | | | | We are anyway kthread_stop()ping other per-cpu kernel threads after move_task_off_dead_cpu(), so we can do it with the stop_machine_run thread as well. I just checked with Vatsa if there was any subtle reason why they had put in the kthread_bind() in cpu.c. Vatsa cannot seem to recollect any and I can't see any. So let us just remove the kthread_bind. Signed-off-by: Gautham R Shenoy <ego@in.ibm.com> Cc: Oleg Nesterov <oleg@tv-sign.ru> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* change kernel threads to ignore signals instead of blocking themOleg Nesterov2007-05-09
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently kernel threads use sigprocmask(SIG_BLOCK) to protect against signals. This doesn't prevent the signal delivery, this only blocks signal_wake_up(). Every "killall -33 kthreadd" means a "struct siginfo" leak. Change kthreadd_setup() to set all handlers to SIG_IGN instead of blocking them (make a new helper ignore_signals() for that). If the kernel thread needs some signal, it should use allow_signal() anyway, and in that case it should not use CLONE_SIGHAND. Note that we can't change daemonize() (should die!) in the same way, because it can be used along with CLONE_SIGHAND. This means that allow_signal() still should unblock the signal to work correctly with daemonize()ed threads. However, disallow_signal() doesn't block the signal any longer but ignores it. NOTE: with or without this patch the kernel threads are not protected from handle_stop_signal(), this seems harmless, but not good. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Acked-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* worker_thread: don't play with SIGCHLD and numa policyOleg Nesterov2007-05-09
| | | | | | | | | | worker_thread() inherits ignored SIGCHLD and numa_default_policy() from its parent, kthreadd. No need to setup this again. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Acked-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* wait_for_helper: remove unneeded do_sigaction()Oleg Nesterov2007-05-09
| | | | | | | | | | | allow_signal(SIGCHLD) does all necessary job, no need to call do_sigaction() prior to. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Change reparent_to_init to reparent_to_kthreaddEric W. Biederman2007-05-09
| | | | | | | | | | | | | | | When a kernel thread calls daemonize, instead of reparenting the thread to init reparent the thread to kthreadd next to the threads created by kthread_create. This is really just a stop gap until daemonize goes away, but it does ensure no kernel threads are under init and they are all in one place that is easy to find. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Cc: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* kthread: don't depend on work queuesEric W. Biederman2007-05-09
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently there is a circular reference between work queue initialization and kthread initialization. This prevents the kthread infrastructure from initializing until after work queues have been initialized. We want the properties of tasks created with kthread_create to be as close as possible to the init_task and to not be contaminated by user processes. The later we start our kthreadd that creates these tasks the harder it is to avoid contamination from user processes and the more of a mess we have to clean up because the defaults have changed on us. So this patch modifies the kthread support to not use work queues but to instead use a simple list of structures, and to have kthreadd start from init_task immediately after our kernel thread that execs /sbin/init. By being a true child of init_task we only have to change those process settings that we want to have different from init_task, such as our process name, the cpus that are allowed, blocking all signals and setting SIGCHLD to SIG_IGN so that all of our children are reaped automatically. By being a true child of init_task we also naturally get our ppid set to 0 and do not wind up as a child of PID == 1. Ensuring that tasks generated by kthread_create will not slow down the functioning of the wait family of functions. [akpm@linux-foundation.org: use interruptible sleeps] Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Cc: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* ____call_usermodehelper: don't flush_signals()Oleg Nesterov2007-05-09
| | | | | | | | | | | ____call_usermodehelper() has no reason for flush_signals(). It is a fresh forked process which is going to exec a user-space application or exit on failure. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* unify flush_work/flush_work_keventd and rename it to cancel_work_syncOleg Nesterov2007-05-09
| | | | | | | | | | | | | | | | | | | | flush_work(wq, work) doesn't need the first parameter, we can use cwq->wq (this was possible from the very beginnig, I missed this). So we can unify flush_work_keventd and flush_work. Also, rename flush_work() to cancel_work_sync() and fix all callers. Perhaps this is not the best name, but "flush_work" is really bad. (akpm: this is why the earlier patches bypassed maintainers) Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Jeff Garzik <jeff@garzik.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Jens Axboe <jens.axboe@oracle.com> Cc: Tejun Heo <htejun@gmail.com> Cc: Auke Kok <auke-jan.h.kok@intel.com>, Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* zap_other_threads: remove unneeded ->exit_signal changeOleg Nesterov2007-05-09
| | | | | | | | | We already depend on fact that all sub-threads have ->exit_signal == -1, no need to set it in zap_other_threads(). Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* worker_thread: fix racy try_to_freeze() usageOleg Nesterov2007-05-09
| | | | | | | | | | | | | | | | worker_thread() can miss freeze_process()->signal_wake_up() if it happens between try_to_freeze() and prepare_to_wait(). We should check freezing() before entering schedule(). This race was introduced by me in [PATCH 1/1] workqueue: don't migrate pending works from the dead CPU Looks like mm/vmscan.c:kswapd() has the same race. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* worker_thread: don't play with signalsOleg Nesterov2007-05-09
| | | | | | | | | worker_thread() doesn't need to "Block and flush all signals", this was already done by its caller, kthread(). Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* workqueue: kill NOAUTOREL worksOleg Nesterov2007-05-09
| | | | | | | | | | | | | We don't have any users, and it is not so trivial to use NOAUTOREL works correctly. It is better to simplify API. Delete NOAUTOREL support and rename work_release to work_clear_pending to avoid a confusion. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Acked-by: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* make cancel_rearming_delayed_work() work on any workqueue, not just keventd_wqOleg Nesterov2007-05-09
| | | | | | | | | | | | | | | | cancel_rearming_delayed_workqueue(wq, dwork) doesn't need the first parameter. We don't hang on un-queued dwork any longer, and work->data doesn't change its type. This means we can always figure out "wq" from dwork when it is needed. Remove this parameter, and rename the function to cancel_rearming_delayed_work(). Re-create an inline "obsolete" cancel_rearming_delayed_workqueue(wq) which just calls cancel_rearming_delayed_work(). Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* workqueue: introduce wq_per_cpu() helperOleg Nesterov2007-05-09
| | | | | | | | | Cleanup. A number of per_cpu_ptr(wq->cpu_wq, cpu) users have to check that cpu is valid for this wq. Make a simple helper. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* unify queue_delayed_work() and queue_delayed_work_on()Oleg Nesterov2007-05-09
| | | | | | | | | | | | | Change queue_delayed_work() to use queue_delayed_work_on() to avoid the code duplication (saves 133 bytes). Q: queue_delayed_work() enqueues &dwork->work directly when delay == 0, why? [jirislaby@gmail.com: oops fix] Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Jiri Slaby <jirislaby@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* make queue_delayed_work() friendly to flush_fork()Oleg Nesterov2007-05-09
| | | | | | | | | | | | | | | | | | | | | | Currently typeof(delayed_work->work.data) is "struct workqueue_struct" when the timer is pending "struct cpu_workqueue_struct" whe the work is queued This makes impossible to use flush_fork(delayed_work->work) in addition to cancel_delayed_work/cancel_rearming_delayed_work, not good. Change queue_delayed_work/delayed_work_timer_fn to use cwq, not wq. This complicates (and uglifies) these functions a little bit, but alows us to use flush_fork(dwork) and imho makes the whole code more consistent. Also, document the fact that cancel_rearming_delayed_work() doesn't garantee the completion of work->func() upon return. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* workqueues: shift kthread_bind() from CPU_UP_PREPARE to CPU_ONLINEOleg Nesterov2007-05-09
| | | | | | | | | | | | | | | CPU_UP_PREPARE binds cwq->thread to the new CPU. So CPU_UP_CANCELED tries to wake up the task which is bound to the failed CPU. With this patch we don't bind cwq->thread until CPU becomes online. The first wake_up() after kthread_create() is a bit special, make a simple helper for that. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Gautham R Shenoy <ego@in.ibm.com> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* workqueue: make init_workqueues() __initOleg Nesterov2007-05-09
| | | | | | | | | The only caller of init_workqueues() is do_basic_setup(). Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* workqueue: introduce workqueue_struct->singlethreadOleg Nesterov2007-05-09
| | | | | | | | | | | Add explicit workqueue_struct->singlethread flag. This lessens .text a little, but most importantly this allows us to manipulate wq->list without changine the meaning of is_single_threaded(). Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* workqueue: introduce cpu_singlethread_mapOleg Nesterov2007-05-09
| | | | | | | | | | | | | | | | | | | The code like if (is_single_threaded(wq)) do_something(singlethread_cpu); else { for_each_cpu_mask(cpu, cpu_populated_map) do_something(cpu); } looks very annoying. We can add "static cpumask_t cpu_singlethread_map" and simplify the code. Lessens .text a bit, and imho makes the code more readable. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* workqueue: make cancel_rearming_delayed_workqueue() work on idle dworkOleg Nesterov2007-05-09
| | | | | | | | | | | | | | | | | | cancel_rearming_delayed_workqueue(dwork) will hang forever if dwork was not scheduled, because in that case cancel_delayed_work()->del_timer_sync() never returns true. I don't know if there are any callers which may have problems, but this is not so convenient, and the fix is very simple. Q: looks like we don't need "struct workqueue_struct *wq" parameter. If the timer was aborted successfully, get_wq_data() == wq. Is it worth to add the new function? Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* workqueue: don't save interrupts in run_workqueue()Oleg Nesterov2007-05-09
| | | | | | | | | work->func() may sleep, it's a bug to call run_workqueue() with irqs disabled. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* workqueue: kill run_scheduled_work()Oleg Nesterov2007-05-09
| | | | | | | | | | | | Because it has no callers. Actually, I think the whole idea of run_scheduled_work() was not right, not good to mix "unqueue this work and execute its ->func()" in one function. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* workqueue: don't migrate pending works from the dead CPUOleg Nesterov2007-05-09
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently CPU_DEAD uses kthread_stop() to stop cwq->thread and then transfers cwq->worklist to another CPU. However, it is very unlikely that worker_thread() will notice kthread_should_stop() before flushing cwq->worklist. It is only possible if worker_thread() was preempted after run_workqueue(cwq), a new work_struct was added, and CPU_DEAD happened before cwq->thread has a chance to run. This means that take_over_work() mostly adds unneeded complications. Note also that kthread_stop() is not good per se, wake_up_process() may confuse work->func() if it sleeps waiting for some event. Remove take_over_work() and migrate_sequence complications. CPU_DEAD sets the cwq->should_stop flag (introduced by this patch) and waits for cwq->thread to flush cwq->worklist and exit. Because the dead CPU is not on cpu_online_map, no more works can be added to that cwq. cpu_populated_map was introduced to optimize for_each_possible_cpu(), it is not strictly needed, and it is more a documentation in fact. Saves 418 bytes. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com> Cc: "Pallipadi, Venkatesh" <venkatesh.pallipadi@intel.com> Cc: Gautham shenoy <ego@in.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* workqueue: don't clear cwq->thread until it exitsOleg Nesterov2007-05-09
| | | | | | | | | | | | | | | | | | Pointed out by Srivatsa Vaddagiri. cleanup_workqueue_thread() sets cwq->thread = NULL and does kthread_stop(). This breaks the "if (cwq->thread == current)" logic in flush_cpu_workqueue() and leads to deadlock. Kill the thead first, then clear cwq->thread. workqueue_mutex protects us from create_workqueue_thread() so we don't need cwq->lock. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com> Cc: "Pallipadi, Venkatesh" <venkatesh.pallipadi@intel.com> Cc: Gautham shenoy <ego@in.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* workqueue: fix flush_workqueue() vs CPU_DEAD raceOleg Nesterov2007-05-09
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Many thanks to Srivatsa Vaddagiri for the helpful discussion and for spotting the bug in my previous attempt. work->func() (and thus flush_workqueue()) must not use workqueue_mutex, this leads to deadlock when CPU_DEAD does kthread_stop(). However without this mutex held we can't detect CPU_DEAD in progress, which can move pending works to another CPU while the dead one is not on cpu_online_map. Change flush_workqueue() to use for_each_possible_cpu(). This means that flush_cpu_workqueue() may hit CPU which is already dead. However in that case !list_empty(&cwq->worklist) || cwq->current_work != NULL means that CPU_DEAD in progress, it will do kthread_stop() + take_over_work() so we can proceed and insert a barrier. We hold cwq->lock, so we are safe. Also, add migrate_sequence incremented by take_over_work() under cwq->lock. If take_over_work() happened before we checked this CPU, we should see the new value after spin_unlock(). Further possible changes: remove CPU_DEAD handling (along with take_over_work, migrate_sequence) from workqueue.c. CPU_DEAD just sets cwq->please_exit_after_flush flag. CPU_UP_PREPARE->create_workqueue_thread() clears this flag, and creates the new thread if cwq->thread == NULL. This way the workqueue/cpu-hotplug interaction is almost zero, workqueue_mutex just protects "workqueues" list, CPU_LOCK_ACQUIRE/CPU_LOCK_RELEASE go away. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com> Cc: "Pallipadi, Venkatesh" <venkatesh.pallipadi@intel.com> Cc: Gautham shenoy <ego@in.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* workqueue: fix freezeable workqueues implementationOleg Nesterov2007-05-09
| | | | | | | | | | | | Currently ->freezeable is per-cpu, this is wrong. CPU_UP_PREPARE creates cwq->thread which is not freezeable. Move ->freezeable to workqueue_struct. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com> Cc: "Pallipadi, Venkatesh" <venkatesh.pallipadi@intel.com> Cc: Gautham shenoy <ego@in.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* call cpu_chain with CPU_DOWN_FAILED if CPU_DOWN_PREPARE failedHeiko Carstens2007-05-09
| | | | | | | | | | | | | | This makes cpu hotplug symmetrical: if CPU_UP_PREPARE fails we get CPU_UP_CANCELED, so we can undo what ever happened on PREPARE. The same should happen for CPU_DOWN_PREPARE. [akpm@linux-foundation.org: fix for reduce-size-of-task_struct-on-64-bit-machines] Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com> Cc: Gautham Shenoy <ego@in.ibm.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Eliminate lock_cpu_hotplug in kernel/schedcGautham R Shenoy2007-05-09
| | | | | | | | | | | | | | Eliminate lock_cpu_hotplug from kernel/sched.c and use sched_hotcpu_mutex instead to postpone a hotplug event. In the migration_call hotcpu callback function, take sched_hotcpu_mutex while handling the event CPU_LOCK_ACQUIRE and release it while handling CPU_LOCK_RELEASE event. [akpm@linux-foundation.org: fix deadlock] Signed-off-by: Gautham R Shenoy <ego@in.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Define and use new events,CPU_LOCK_ACQUIRE and CPU_LOCK_RELEASEGautham R Shenoy2007-05-09
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is an attempt to provide an alternate mechanism for postponing a hotplug event instead of using a global mechanism like lock_cpu_hotplug. The proposal is to add two new events namely CPU_LOCK_ACQUIRE and CPU_LOCK_RELEASE. The notification for these two events would be sent out before and after a cpu_hotplug event respectively. During the CPU_LOCK_ACQUIRE event, a cpu-hotplug-aware subsystem is supposed to acquire any per-subsystem hotcpu mutex ( Eg. workqueue_mutex in kernel/workqueue.c ). During the CPU_LOCK_RELEASE release event the cpu-hotplug-aware subsystem is supposed to release the per-subsystem hotcpu mutex. The reasons for defining new events as opposed to reusing the existing events like CPU_UP_PREPARE/CPU_UP_FAILED/CPU_ONLINE for locking/unlocking of per-subsystem hotcpu mutexes are as follow: - CPU_LOCK_ACQUIRE: All hotcpu mutexes are taken before subsystems start handling pre-hotplug events like CPU_UP_PREPARE/CPU_DOWN_PREPARE etc, thus ensuring a clean handling of these events. - CPU_LOCK_RELEASE: The hotcpu mutexes will be released only after all subsystems have handled post-hotplug events like CPU_DOWN_FAILED, CPU_DEAD,CPU_ONLINE etc thereby ensuring that there are no subsequent clashes amongst the interdependent subsystems after a cpu hotplugs. This patch also uses __raw_notifier_call chain in _cpu_up to take care of the dependency between the two consequetive calls to raw_notifier_call_chain. [akpm@linux-foundation.org: fix a bug] Signed-off-by: Gautham R Shenoy <ego@in.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Extend notifier_call_chain to count nr_calls madeGautham R Shenoy2007-05-09
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since 2.6.18-something, the community has been bugged by the problem to provide a clean and a stable mechanism to postpone a cpu-hotplug event as lock_cpu_hotplug was badly broken. This is another proposal towards solving that problem. This one is along the lines of the solution provided in kernel/workqueue.c Instead of having a global mechanism like lock_cpu_hotplug, we allow the subsytems to define their own per-subsystem hot cpu mutexes. These would be taken(released) where ever we are currently calling lock_cpu_hotplug(unlock_cpu_hotplug). Also, in the per-subsystem hotcpu callback function,we take this mutex before we handle any pre-cpu-hotplug events and release it once we finish handling the post-cpu-hotplug events. A standard means for doing this has been provided in [PATCH 2/4] and demonstrated in [PATCH 3/4]. The ordering of these per-subsystem mutexes might still prove to be a problem, but hopefully lockdep should help us get out of that muddle. The patch set to be applied against linux-2.6.19-rc5 is as follows: [PATCH 1/4] : Extend notifier_call_chain with an option to specify the number of notifications to be sent and also count the number of notifications actually sent. [PATCH 2/4] : Define events CPU_LOCK_ACQUIRE and CPU_LOCK_RELEASE and send out notifications for these in _cpu_up and _cpu_down. This would help us standardise the acquire and release of the subsystem locks in the hotcpu callback functions of these subsystems. [PATCH 3/4] : Eliminate lock_cpu_hotplug from kernel/sched.c. [PATCH 4/4] : In workqueue_cpu_callback function, acquire(release) the workqueue_mutex while handling CPU_LOCK_ACQUIRE(CPU_LOCK_RELEASE). If the per-subsystem-locking approach survives the test of time, we can expect a slow phasing out of lock_cpu_hotplug, which has not yet been eliminated in these patches :) This patch: Provide notifier_call_chain with an option to call only a specified number of notifiers and also record the number of call to notifiers made. The need for this enhancement was identified in the post entitled "Slab - Eliminate lock_cpu_hotplug from slab" (http://lkml.org/lkml/2006/10/28/92) by Ravikiran G Thirumalai and Andrew Morton. This patch adds two additional parameters to notifier_call_chain API namely - int nr_to_calls : Number of notifier_functions to be called. The don't care value is -1. - unsigned int *nr_calls : Records the total number of notifier_funtions called by notifier_call_chain. The don't care value is NULL. [michal.k.k.piotrowski@gmail.com: build fix] Credit: Andrew Morton <akpm@osdl.org> Signed-off-by: Gautham R Shenoy <ego@in.ibm.com> Signed-off-by: Michal Piotrowski <michal.k.k.piotrowski@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* relay: use plain timer instead of delayed workTom Zanussi2007-05-09
| | | | | | | | | | | relay doesn't need to use schedule_delayed_work() for waking readers when a simple timer will do. Signed-off-by: Tom Zanussi <zanussi@comcast.net> Cc: Satyam Sharma <satyam.sharma@gmail.com> Cc: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* flush_cpu_workqueue: don't flush an empty ->worklistOleg Nesterov2007-05-09
| | | | | | | | | | | | | | Now when we have ->current_work we can avoid adding a barrier and waiting for its completition when cwq's queue is empty. Note: this change is also useful if we change flush_workqueue() to also check the dead CPUs. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com> Cc: Gautham Shenoy <ego@in.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* flush_workqueue(): use preempt_disable to hold off cpu hotplugAndrew Morton2007-05-09
| | | | | | | | Cc: Oleg Nesterov <oleg@tv-sign.ru> Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com> Cc: Gautham Shenoy <ego@in.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* implement flush_work()Oleg Nesterov2007-05-09
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A basic problem with flush_scheduled_work() is that it blocks behind _all_ presently-queued works, rather than just the work whcih the caller wants to flush. If the caller holds some lock, and if one of the queued work happens to want that lock as well then accidental deadlocks can occur. One example of this is the phy layer: it wants to flush work while holding rtnl_lock(). But if a linkwatch event happens to be queued, the phy code will deadlock because the linkwatch callback function takes rtnl_lock. So we implement a new function which will flush a *single* work - just the one which the caller wants to free up. Thus we avoid the accidental deadlocks which can arise from unrelated subsystems' callbacks taking shared locks. flush_work() non-blockingly dequeues the work_struct which we want to kill, then it waits for its handler to complete on all CPUs. Add ->current_work to the "struct cpu_workqueue_struct", it points to currently running "struct work_struct". When flush_work(work) detects ->current_work == work, it inserts a barrier at the _head_ of ->worklist (and thus right _after_ that work) and waits for completition. This means that the next work fired on that CPU will be this barrier, or another barrier queued by concurrent flush_work(), so the caller of flush_work() will be woken before any "regular" work has a chance to run. When wait_on_work() unlocks workqueue_mutex (or whatever we choose to protect against CPU hotplug), CPU may go away. But in that case take_over_work() will move a barrier we queued to another CPU, it will be fired sometime, and wait_on_work() will be woken. Actually, we are doing cleanup_workqueue_thread()->kthread_stop() before take_over_work(), so cwq->thread should complete its ->worklist (and thus the barrier), because currently we don't check kthread_should_stop() in run_workqueue(). But even if we did, everything should be ok. [akpm@osdl.org: cleanup] [akpm@osdl.org: add flush_work_keventd() wrapper] Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* reimplement flush_workqueue()Oleg Nesterov2007-05-09
| | | | | | | | | | | | | | | Remove ->remove_sequence, ->insert_sequence, and ->work_done from struct cpu_workqueue_struct. To implement flush_workqueue() we can queue a barrier work on each CPU and wait for its completition. The barrier is queued under workqueue_mutex to ensure that per cpu wq->cpu_wq is alive, we drop this mutex before going to sleep. If CPU goes down while we are waiting for completition, take_over_work() will move the barrier on another CPU, and the handler will wake up us eventually. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* schedule_on_each_cpu(): use preempt_disable()Andrew Morton2007-05-09
| | | | | | | | | We take workqueue_mutex in there to keep CPU hotplug away. But preempt_disable() will suffice for that. Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Fix printk format warnings in timer_list.cDavid Miller2007-05-09
| | | | | | | | | | | | | u64 and s64 are not necessarily 'long long' on some 64-bit platforms, so explicit the type to kill the compiler warnings. Also consistently use '%Lu' which is unsigned. Signed-off-by: David S. Miller <davem@davemloft.net> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* clocksource: spelling error in watchdog codeDaniel Walker2007-05-09
| | | | | | | | There's more that need fixing, and fix my own subject spelling error too. Signed-off-by: Daniel Walker <dwalker@mvista.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Move sig_kernel_* et al macros to linux/signal.hRoland McGrath2007-05-09
| | | | | | | | | | | This patch moves the sig_kernel_* and related macros from kernel/signal.c to linux/signal.h, and cleans them up slightly. I need the sig_kernel_* macros for default signal behavior in the utrace code, and want to avoid duplication or overhead to share the knowledge. Signed-off-by: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* use simple_read_from_buffer in kernel/Akinobu Mita2007-05-09
| | | | | | | | | | | Cleanup using simple_read_from_buffer() for /dev/cpuset/tasks and /proc/config.gz. Cc: Paul Jackson <pj@sgi.com> Cc: Randy Dunlap <rdunlap@xenotime.net> Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Fix Linuxdoc commentJeff Dike2007-05-09
| | | | | | | | | A linuxdoc comment had fallen out of date - it refers to an argument which no longer exists. Signed-off-by: Jeff Dike <jdike@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* PM: Separate hibernation code from suspend codeRafael J. Wysocki2007-05-09
| | | | | | | | | | | | | | | | | | | | | | [ With Johannes Berg <johannes@sipsolutions.net> ] Separate the hibernation (aka suspend to disk code) from the other suspend code. In particular: * Remove the definitions related to hibernation from include/linux/pm.h * Introduce struct hibernation_ops and a new hibernate() function to hibernate the system, defined in include/linux/suspend.h * Separate suspend code in kernel/power/main.c from hibernation-related code in kernel/power/disk.c and kernel/power/user.c (with the help of hibernation_ops) * Switch ACPI (the only user of pm_ops.pm_disk_mode) to hibernation_ops Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Cc: Greg KH <greg@kroah.com> Cc: Pavel Machek <pavel@ucw.cz> Cc: Nigel Cunningham <nigel@nigel.suspend2.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* swsusp: clean up printkAndrew Morton2007-05-09
| | | | | | | | | Remove an inexplicable / Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Cc: Pavel Machek <pavel@ucw.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* revert 'sched: redundant reschedule when set_user_nice() boosts a prio of a ↵Andrew Morton2007-05-08
| | | | | | | | | | | | | | | | | | | | | | | | | task from the "expired" array' Revert commit bd53f96ca54a21c07e7a0ae1886fa623d370b85f. Con says: This is no good, sorry. The one I saw originally was with the staircase deadline cpu scheduler in situ and was different. #define TASK_PREEMPTS_CURR(p, rq) \ ((p)->prio < (rq)->curr->prio) (((p)->prio < (rq)->curr->prio) && ((p)->array == (rq)->active)) This will fail to wake up a runqueue for a task that has been migrated to the expired array of a runqueue which is otherwise idle which can happen with smp balancing, Cc: Dmitry Adamushko <dmitry.adamushko@gmail.com> Cc: Con Kolivas <kernel@kolivas.org> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Merge branch 'master' of ↵Linus Torvalds2007-05-08
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc * 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc: (77 commits) [POWERPC] Abolish powerpc_flash_init() [POWERPC] Early serial debug support for PPC44x [POWERPC] Support for the Ebony 440GP reference board in arch/powerpc [POWERPC] Add device tree for Ebony [POWERPC] Add powerpc/platforms/44x, disable platforms/4xx for now [POWERPC] MPIC U3/U4 MSI backend [POWERPC] MPIC MSI allocator [POWERPC] Enable MSI mappings for MPIC [POWERPC] Tell Phyp we support MSI [POWERPC] RTAS MSI implementation [POWERPC] PowerPC MSI infrastructure [POWERPC] Rip out the existing powerpc msi stubs [POWERPC] Remove use of 4level-fixup.h for ppc32 [POWERPC] Add powerpc PCI-E reset API implementation [POWERPC] Holly bootwrapper [POWERPC] Holly DTS [POWERPC] Holly defconfig [POWERPC] Add support for 750CL Holly board [POWERPC] Generalize tsi108 PCI setup [POWERPC] Generalize tsi108 PHY types ... Fixed conflict in include/asm-powerpc/kdebug.h manually Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * Merge branch 'linux-2.6'Paul Mackerras2007-05-07
| |\
| * | [POWERPC] powermac: Suspend to disk on G5Johannes Berg2007-05-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Powermac G5 suspend to disk implementation. The code is platform agnostic but only tested on powermac, no other 64-bit powerpc machines. Because nvidiafb still breaks suspend I have marked it EXPERIMENTAL on powermac and because I can't test it and some lowlevel code will need changes it is BROKEN on all other 64-bit platforms. Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org>
* | | Add IRQF_IRQPOLL flag (common code)Bernhard Walle2007-05-08
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | irqpoll is broken on some architectures that don't use the IRQ 0 for the timer interrupt like IA64. This patch adds a IRQF_IRQPOLL flag. Each architecture is handled in a separate pach. As I left the irq == 0 as condition, this should not break existing architectures that use timer_irq == 0 and that I did't address with that patch (because I don't know). This patch: This patch adds a IRQF_IRQPOLL flag that the interrupt registration code could use for the interrupt it wants to use for IRQ polling. Because this must not be the timer interrupt, an additional flag was added instead of re-using the IRQF_TIMER constant. Until all architectures will have an IRQF_IRQPOLL interrupt, irq == 0 will stay as alternative as it should not break anything. Also, note_interrupt() is called on CPU-specific interrupts to be used as interrupt source for IRQ polling. Signed-off-by: Bernhard Walle <bwalle@suse.de> Cc: Russell King <rmk@arm.linux.org.uk> Cc: Kyle McMartin <kyle@mcmartin.ca> Cc: Matthew Wilcox <willy@debian.org> Cc: Grant Grundler <grundler@google.com> Cc: Paul Mundt <lethal@linux-sh.org> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>