author    Peter Zijlstra <peterz@infradead.org>    2015-11-17 13:01:11 -0500
committer Ingo Molnar <mingo@kernel.org>           2015-12-04 04:33:41 -0500
commit    8643cda549ca49a403160892db68504569ac9052 (patch)
tree      e6a333ec181b60487584cbb3bca73e202d69c349
parent    b3e0b1b6d841a4b2f64fc09ea728913da8218424 (diff)
sched/core, locking: Document Program-Order guarantees
These are some notes on the scheduler locking and how it provides
program-order guarantees on SMP systems.
( This commit is in the locking tree, because the new documentation
refers to a newly introduced locking primitive. )
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-rw-r--r--  kernel/sched/core.c  91
1 file changed, 91 insertions(+), 0 deletions(-)
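
For context: the "newly introduced locking primitive" the commit message refers
to is smp_cond_acquire(), added by the parent commit (b3e0b1b6d841). A minimal
sketch of its definition at the time, assuming the form from that commit: a
control-dependency spin loop whose ordering is upgraded to ACQUIRE with an
smp_rmb(). cpu_relax() and smp_rmb() are existing kernel primitives.

/*
 * smp_cond_acquire() - spin until @cond is true, with ACQUIRE ordering.
 * The control dependency of the loop (orders against later stores) plus
 * the smp_rmb() (orders against later loads) together give the load of
 * @cond ACQUIRE semantics.
 */
#define smp_cond_acquire(cond)	do {		\
	while (!(cond))				\
		cpu_relax();			\
	smp_rmb(); /* ctrl + rmb := acquire */	\
} while (0)
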
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 9f7862da2cd1..91db75018652 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1905,6 +1905,97 @@ static void ttwu_queue(struct task_struct *p, int cpu)
 	raw_spin_unlock(&rq->lock);
 }
 
+/*
+ * Notes on Program-Order guarantees on SMP systems.
+ *
+ *  MIGRATION
+ *
+ * The basic program-order guarantee on SMP systems is that when a task [t]
+ * migrates, all its activity on its old cpu [c0] happens-before any subsequent
+ * execution on its new cpu [c1].
+ *
+ * For migration (of runnable tasks) this is provided by the following means:
+ *
+ *  A) UNLOCK of the rq(c0)->lock scheduling out task t
+ *  B) migration for t is required to synchronize *both* rq(c0)->lock and
+ *     rq(c1)->lock (if not at the same time, then in that order).
+ *  C) LOCK of the rq(c1)->lock scheduling in task t
+ *
+ * Transitivity guarantees that B happens after A and C after B.
+ * Note: we only require RCpc transitivity.
+ * Note: the cpu doing B need not be c0 or c1.
+ *
+ * Example:
+ *
+ *   CPU0            CPU1            CPU2
+ *
+ *   LOCK rq(0)->lock
+ *   sched-out X
+ *   sched-in Y
+ *   UNLOCK rq(0)->lock
+ *
+ *                                   LOCK rq(0)->lock // orders against CPU0
+ *                                   dequeue X
+ *                                   UNLOCK rq(0)->lock
+ *
+ *                                   LOCK rq(1)->lock
+ *                                   enqueue X
+ *                                   UNLOCK rq(1)->lock
+ *
+ *                   LOCK rq(1)->lock // orders against CPU2
+ *                   sched-out Z
+ *                   sched-in X
+ *                   UNLOCK rq(1)->lock
+ *
+ *
+ * BLOCKING -- aka. SLEEP + WAKEUP
+ *
+ * For blocking we (obviously) need to provide the same guarantee as for
+ * migration. However the means are completely different as there is no lock
+ * chain to provide order. Instead we do:
+ *
+ *   1) smp_store_release(X->on_cpu, 0)
+ *   2) smp_cond_acquire(!X->on_cpu)
+ *
+ * Example:
+ *
+ *   CPU0 (schedule)  CPU1 (try_to_wake_up)  CPU2 (schedule)
+ *
+ *   LOCK rq(0)->lock LOCK X->pi_lock
+ *   dequeue X
+ *   sched-out X
+ *   smp_store_release(X->on_cpu, 0);
+ *
+ *                    smp_cond_acquire(!X->on_cpu);
+ *                    X->state = WAKING
+ *                    set_task_cpu(X,2)
+ *
+ *                    LOCK rq(2)->lock
+ *                    enqueue X
+ *                    X->state = RUNNING
+ *                    UNLOCK rq(2)->lock
+ *
+ *                                           LOCK rq(2)->lock // orders against CPU1
+ *                                           sched-out Z
+ *                                           sched-in X
+ *                                           UNLOCK rq(2)->lock
+ *
+ *                    UNLOCK X->pi_lock
+ *   UNLOCK rq(0)->lock
+ *
+ *
+ * However, for wakeups there is a second guarantee we must provide, namely we
+ * must observe the state that led to our wakeup. That is, not only must our
+ * task observe its own prior state, it must also observe the stores prior to
+ * its wakeup.
+ *
+ * This means that any mechanism for doing remote wakeups must order the CPU
+ * doing the wakeup against the CPU the task is going to end up running on.
+ * This, however, is already required for the regular Program-Order guarantee
+ * above, since the waking CPU is the one issuing the ACQUIRE (smp_cond_acquire).
+ *
+ */
+
 /**
  * try_to_wake_up - wake up a thread
  * @p: the thread to be awakened
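
The release/acquire pairing in the BLOCKING section can be illustrated outside
the kernel as well. Below is a minimal userspace sketch using C11 atomics in
place of smp_store_release()/smp_cond_acquire(); the names on_cpu and
task_state are hypothetical stand-ins that mirror the comment above, not
kernel code.

#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

static atomic_int on_cpu = 1;	/* task still running on its old CPU */
static int task_state;		/* plain store; ordered by the release */

/* CPU0's role: finish the task's activity, then release on_cpu. */
static void *scheduler_cpu(void *arg)
{
	task_state = 42;				/* "sched-out X" */
	/* 1) smp_store_release(X->on_cpu, 0) */
	atomic_store_explicit(&on_cpu, 0, memory_order_release);
	return NULL;
}

/* CPU1's role: spin until the release is observed, with ACQUIRE. */
static void *waker_cpu(void *arg)
{
	/* 2) smp_cond_acquire(!X->on_cpu) */
	while (atomic_load_explicit(&on_cpu, memory_order_acquire))
		;
	/* The acquire load pairs with the release store, so every store
	 * made on the "old CPU" before the release is visible here. */
	printf("observed task_state = %d\n", task_state);	/* prints 42 */
	return NULL;
}

int main(void)
{
	pthread_t t0, t1;

	pthread_create(&t1, NULL, waker_cpu, NULL);
	pthread_create(&t0, NULL, scheduler_cpu, NULL);
	pthread_join(t0, NULL);
	pthread_join(t1, NULL);
	return 0;
}

Compile with cc -pthread. The acquire/release pair is what lets the waking CPU
observe all of the old CPU's prior activity, which is exactly the
happens-before edge the CPU0/CPU1 columns in the comment rely on.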