author    Steven Rostedt <rostedt@goodmis.org>  2015-03-18 14:49:46 -0400
committer Ingo Molnar <mingo@kernel.org>        2015-03-23 05:55:22 -0400
commit    b6366f048e0caff28af5335b7af2031266e1b06b (patch)
tree      41e0f5adf67d4a6c8b53d7f32bbdb0fbe329aeaf /kernel/sched
parent    71ad00d61ec861dc68b4544887729850e58cb99b (diff)
sched/rt: Use IPI to trigger RT task push migration instead of pulling
When debugging the latencies on a 40 core box, where we hit 300 to 500 microsecond latencies, I found there was a huge contention on the runqueue locks. Investigating it further, running ftrace, I found that it was due to the pulling of RT tasks.

The test that was run was the following:

 cyclictest --numa -p95 -m -d0 -i100

This created a thread on each CPU, that would set its wakeup in iterations of 100 microseconds. The -d0 means that all the threads had the same interval (100us). Each thread sleeps for 100us and wakes up and measures its latencies.

cyclictest is maintained at:
 git://git.kernel.org/pub/scm/linux/kernel/git/clrkwllms/rt-tests.git

What happened was that another RT task would be scheduled on one of the CPUs that was running our test, while the tests on the other CPUs went to sleep and scheduled idle. This caused the "pull" operation to execute on all these CPUs. Each one of these saw the RT task that was overloaded on the CPU of the test that was still running, and each one tried to grab that task in a thundering herd way.

To grab the task, each thread would do a double rq lock grab, grabbing its own lock as well as the rq lock of the overloaded CPU. As the sched domains on this box were rather flat for its size, I saw up to 12 CPUs block on this lock at once. This caused a ripple effect with the rq locks, especially since the taking was done via a double rq lock, which means that several of the CPUs had their own rq locks held while trying to take this rq lock. As these locks were blocked, any wakeups or load balancing on these CPUs would also block on these locks, and the wait time escalated.

I've tried various methods to lessen the load, but things like an atomic counter to only let one CPU grab the task won't work, because the task may have a limited affinity, and we may pick the wrong CPU to take that lock and do the pull, only to find out that the CPU we picked isn't in the task's affinity.

Instead of doing the PULL, I now have the CPUs that want the pull send an IPI to the overloaded CPU, and let that CPU pick what CPU to push the task to. There is no more need to grab the rq lock, and the push/pull algorithm still works fine.

With this patch, the latency dropped to just 150us over a 20 hour run. Without the patch, the huge latencies would trigger within seconds.

I've created a new sched feature called RT_PUSH_IPI, which is enabled by default. When RT_PUSH_IPI is not enabled, the old method of grabbing the rq locks and having the pulling CPU do the work is used. When RT_PUSH_IPI is enabled, an IPI is sent to the overloaded CPU to do a push.

To enable or disable this at run time:

 # mount -t debugfs nodev /sys/kernel/debug
 # echo RT_PUSH_IPI > /sys/kernel/debug/sched_features
or
 # echo NO_RT_PUSH_IPI > /sys/kernel/debug/sched_features

Update: The original version of this patch would send an IPI to all CPUs in the RT overload list. But that could theoretically cause the reverse issue: there could be lots of overloaded RT queues while one CPU lowers its priority. It would then send an IPI to all the overloaded RT queues, and they could all try to grab the rq lock of the CPU lowering its priority, and then we have the same problem.

The latest design sends out only one IPI, to the first overloaded CPU. That CPU tries to push any tasks that it can, and then the IPI is passed to the next overloaded CPU that can push to the source CPU. The IPIs stop when all overloaded CPUs that have pushable tasks with priorities greater than the source CPU are covered. In case the source CPU lowers its priority again, a flag is set to tell the IPI traversal to restart with the first RT overloaded CPU after the source CPU.

Parts-suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Joern Engel <joern@purestorage.com>
Cc: Clark Williams <williams@redhat.com>
Cc: Mike Galbraith <umgwanakikbuti@gmail.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20150318144946.2f3cc982@gandalf.local.home
Signed-off-by: Ingo Molnar <mingo@kernel.org>
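A minimal user-space sketch of the wrap-around search described in the update above. This is not the kernel code: the real logic is rto_next_cpu() and find_next_push_cpu() in the kernel/sched/rt.c hunk below, and the plain bitmask, the NR_CPUS value and the helper name here are illustrative assumptions; the priority filtering that find_next_push_cpu() adds on top is left out.

#include <stdio.h>

#define NR_CPUS 8                       /* illustrative; the kernel uses nr_cpu_ids */

/*
 * Walk the "RT overloaded" bitmask starting just after 'prev', wrapping
 * around at the end, and report NR_CPUS once the walk would reach the
 * source CPU 'src' again -- so 'src' itself is never returned.
 */
static int next_overloaded_cpu(unsigned int rto_mask, int src, int prev)
{
        int cpu;

        for (cpu = prev + 1; cpu < NR_CPUS; cpu++)
                if (rto_mask & (1u << cpu))
                        break;

        /* The previous hop had already wrapped: stop before reaching src */
        if (prev < src)
                return cpu >= src ? NR_CPUS : cpu;

        /* Fell off the end of the mask: wrap to the beginning */
        if (cpu >= NR_CPUS) {
                for (cpu = 0; cpu < NR_CPUS; cpu++)
                        if (rto_mask & (1u << cpu))
                                break;
                if (cpu >= src)
                        return NR_CPUS;
        }
        return cpu;
}

int main(void)
{
        unsigned int rto_mask = 0xa4;   /* CPUs 2, 5 and 7 are RT overloaded */
        int src = 5;                    /* CPU 5 just lowered its priority   */
        int cpu = src;                  /* the walk starts at the source CPU */

        /* Prints CPU 7 and then CPU 2: after src, wrapped, never src itself */
        while ((cpu = next_overloaded_cpu(rto_mask, src, cpu)) < NR_CPUS)
                printf("IPI chained to CPU %d\n", cpu);

        return 0;
}

Running the sketch shows the property the changelog relies on: the chain visits each overloaded CPU at most once per pass and terminates before it would come back around to the source CPU.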
Diffstat (limited to 'kernel/sched')
-rw-r--r--   kernel/sched/features.h    13
-rw-r--r--   kernel/sched/rt.c         177
-rw-r--r--   kernel/sched/sched.h       12
3 files changed, 202 insertions, 0 deletions
diff --git a/kernel/sched/features.h b/kernel/sched/features.h
index 90284d117fe6..91e33cd485f6 100644
--- a/kernel/sched/features.h
+++ b/kernel/sched/features.h
@@ -56,6 +56,19 @@ SCHED_FEAT(NONTASK_CAPACITY, true)
  */
 SCHED_FEAT(TTWU_QUEUE, true)
 
+#ifdef HAVE_RT_PUSH_IPI
+/*
+ * In order to avoid a thundering herd attack of CPUs that are
+ * lowering their priorities at the same time, and there being
+ * a single CPU that has an RT task that can migrate and is waiting
+ * to run, where the other CPUs will try to take that CPUs
+ * rq lock and possibly create a large contention, sending an
+ * IPI to that CPU and let that CPU push the RT task to where
+ * it should go may be a better scenario.
+ */
+SCHED_FEAT(RT_PUSH_IPI, true)
+#endif
+
 SCHED_FEAT(FORCE_SD_OVERLAP, false)
 SCHED_FEAT(RT_RUNTIME_SHARE, true)
 SCHED_FEAT(LB_MIN, false)
diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index f4d4b077eba0..ad0241561c3e 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -6,6 +6,7 @@
 #include "sched.h"
 
 #include <linux/slab.h>
+#include <linux/irq_work.h>
 
 int sched_rr_timeslice = RR_TIMESLICE;
 
@@ -59,6 +60,10 @@ static void start_rt_bandwidth(struct rt_bandwidth *rt_b)
         raw_spin_unlock(&rt_b->rt_runtime_lock);
 }
 
+#ifdef CONFIG_SMP
+static void push_irq_work_func(struct irq_work *work);
+#endif
+
 void init_rt_rq(struct rt_rq *rt_rq, struct rq *rq)
 {
         struct rt_prio_array *array;
@@ -78,7 +83,14 @@ void init_rt_rq(struct rt_rq *rt_rq, struct rq *rq)
         rt_rq->rt_nr_migratory = 0;
         rt_rq->overloaded = 0;
         plist_head_init(&rt_rq->pushable_tasks);
+
+#ifdef HAVE_RT_PUSH_IPI
+        rt_rq->push_flags = 0;
+        rt_rq->push_cpu = nr_cpu_ids;
+        raw_spin_lock_init(&rt_rq->push_lock);
+        init_irq_work(&rt_rq->push_work, push_irq_work_func);
 #endif
+#endif /* CONFIG_SMP */
         /* We start is dequeued state, because no RT tasks are queued */
         rt_rq->rt_queued = 0;
 
@@ -1778,6 +1790,164 @@ static void push_rt_tasks(struct rq *rq)
                 ;
 }
 
+#ifdef HAVE_RT_PUSH_IPI
+/*
+ * The search for the next cpu always starts at rq->cpu and ends
+ * when we reach rq->cpu again. It will never return rq->cpu.
+ * This returns the next cpu to check, or nr_cpu_ids if the loop
+ * is complete.
+ *
+ * rq->rt.push_cpu holds the last cpu returned by this function,
+ * or if this is the first instance, it must hold rq->cpu.
+ */
+static int rto_next_cpu(struct rq *rq)
+{
+        int prev_cpu = rq->rt.push_cpu;
+        int cpu;
+
+        cpu = cpumask_next(prev_cpu, rq->rd->rto_mask);
+
+        /*
+         * If the previous cpu is less than the rq's CPU, then it already
+         * passed the end of the mask, and has started from the beginning.
+         * We end if the next CPU is greater or equal to rq's CPU.
+         */
+        if (prev_cpu < rq->cpu) {
+                if (cpu >= rq->cpu)
+                        return nr_cpu_ids;
+
+        } else if (cpu >= nr_cpu_ids) {
+                /*
+                 * We passed the end of the mask, start at the beginning.
+                 * If the result is greater or equal to the rq's CPU, then
+                 * the loop is finished.
+                 */
+                cpu = cpumask_first(rq->rd->rto_mask);
+                if (cpu >= rq->cpu)
+                        return nr_cpu_ids;
+        }
+        rq->rt.push_cpu = cpu;
+
+        /* Return cpu to let the caller know if the loop is finished or not */
+        return cpu;
+}
+
+static int find_next_push_cpu(struct rq *rq)
+{
+        struct rq *next_rq;
+        int cpu;
+
+        while (1) {
+                cpu = rto_next_cpu(rq);
+                if (cpu >= nr_cpu_ids)
+                        break;
+                next_rq = cpu_rq(cpu);
+
+                /* Make sure the next rq can push to this rq */
+                if (next_rq->rt.highest_prio.next < rq->rt.highest_prio.curr)
+                        break;
+        }
+
+        return cpu;
+}
+
+#define RT_PUSH_IPI_EXECUTING   1
+#define RT_PUSH_IPI_RESTART     2
+
+static void tell_cpu_to_push(struct rq *rq)
+{
+        int cpu;
+
+        if (rq->rt.push_flags & RT_PUSH_IPI_EXECUTING) {
+                raw_spin_lock(&rq->rt.push_lock);
+                /* Make sure it's still executing */
+                if (rq->rt.push_flags & RT_PUSH_IPI_EXECUTING) {
+                        /*
+                         * Tell the IPI to restart the loop as things have
+                         * changed since it started.
+                         */
+                        rq->rt.push_flags |= RT_PUSH_IPI_RESTART;
+                        raw_spin_unlock(&rq->rt.push_lock);
+                        return;
+                }
+                raw_spin_unlock(&rq->rt.push_lock);
+        }
+
+        /* When here, there's no IPI going around */
+
+        rq->rt.push_cpu = rq->cpu;
+        cpu = find_next_push_cpu(rq);
+        if (cpu >= nr_cpu_ids)
+                return;
+
+        rq->rt.push_flags = RT_PUSH_IPI_EXECUTING;
+
+        irq_work_queue_on(&rq->rt.push_work, cpu);
+}
+
+/* Called from hardirq context */
+static void try_to_push_tasks(void *arg)
+{
+        struct rt_rq *rt_rq = arg;
+        struct rq *rq, *src_rq;
+        int this_cpu;
+        int cpu;
+
+        this_cpu = rt_rq->push_cpu;
+
+        /* Paranoid check */
+        BUG_ON(this_cpu != smp_processor_id());
+
+        rq = cpu_rq(this_cpu);
+        src_rq = rq_of_rt_rq(rt_rq);
+
+again:
+        if (has_pushable_tasks(rq)) {
+                raw_spin_lock(&rq->lock);
+                push_rt_task(rq);
+                raw_spin_unlock(&rq->lock);
+        }
+
+        /* Pass the IPI to the next rt overloaded queue */
+        raw_spin_lock(&rt_rq->push_lock);
+        /*
+         * If the source queue changed since the IPI went out,
+         * we need to restart the search from that CPU again.
+         */
+        if (rt_rq->push_flags & RT_PUSH_IPI_RESTART) {
+                rt_rq->push_flags &= ~RT_PUSH_IPI_RESTART;
+                rt_rq->push_cpu = src_rq->cpu;
+        }
+
+        cpu = find_next_push_cpu(src_rq);
+
+        if (cpu >= nr_cpu_ids)
+                rt_rq->push_flags &= ~RT_PUSH_IPI_EXECUTING;
+        raw_spin_unlock(&rt_rq->push_lock);
+
+        if (cpu >= nr_cpu_ids)
+                return;
+
+        /*
+         * It is possible that a restart caused this CPU to be
+         * chosen again. Don't bother with an IPI, just see if we
+         * have more to push.
+         */
+        if (unlikely(cpu == rq->cpu))
+                goto again;
+
+        /* Try the next RT overloaded CPU */
+        irq_work_queue_on(&rt_rq->push_work, cpu);
+}
+
+static void push_irq_work_func(struct irq_work *work)
+{
+        struct rt_rq *rt_rq = container_of(work, struct rt_rq, push_work);
+
+        try_to_push_tasks(rt_rq);
+}
+#endif /* HAVE_RT_PUSH_IPI */
+
 static int pull_rt_task(struct rq *this_rq)
 {
         int this_cpu = this_rq->cpu, ret = 0, cpu;
@@ -1793,6 +1963,13 @@ static int pull_rt_task(struct rq *this_rq)
          */
         smp_rmb();
 
+#ifdef HAVE_RT_PUSH_IPI
+        if (sched_feat(RT_PUSH_IPI)) {
+                tell_cpu_to_push(this_rq);
+                return 0;
+        }
+#endif
+
         for_each_cpu(cpu, this_rq->rd->rto_mask) {
                 if (this_cpu == cpu)
                         continue;
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index dc0f435a2779..c2c0d7bd5027 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -6,6 +6,7 @@
 #include <linux/mutex.h>
 #include <linux/spinlock.h>
 #include <linux/stop_machine.h>
+#include <linux/irq_work.h>
 #include <linux/tick.h>
 #include <linux/slab.h>
 
@@ -418,6 +419,11 @@ static inline int rt_bandwidth_enabled(void)
         return sysctl_sched_rt_runtime >= 0;
 }
 
+/* RT IPI pull logic requires IRQ_WORK */
+#ifdef CONFIG_IRQ_WORK
+# define HAVE_RT_PUSH_IPI
+#endif
+
 /* Real-Time classes' related field in a runqueue: */
 struct rt_rq {
         struct rt_prio_array active;
@@ -435,7 +441,13 @@ struct rt_rq {
         unsigned long rt_nr_total;
         int overloaded;
         struct plist_head pushable_tasks;
+#ifdef HAVE_RT_PUSH_IPI
+        int push_flags;
+        int push_cpu;
+        struct irq_work push_work;
+        raw_spinlock_t push_lock;
 #endif
+#endif /* CONFIG_SMP */
         int rt_queued;
 
         int rt_throttled;
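As a closing illustration of the handshake these new rt_rq fields implement (a second pull request arriving while an IPI chain is already executing only sets RT_PUSH_IPI_RESTART instead of starting another chain), here is a small user-space analogue. It is a sketch under stated assumptions, not the kernel code: the real handshake is in tell_cpu_to_push() and try_to_push_tasks() above, and the pthread mutex merely stands in for rt_rq->push_lock.

#include <pthread.h>
#include <stdio.h>

#define RT_PUSH_IPI_EXECUTING   1
#define RT_PUSH_IPI_RESTART     2

static pthread_mutex_t push_lock = PTHREAD_MUTEX_INITIALIZER;  /* stands in for rt_rq->push_lock  */
static int push_flags;                                         /* stands in for rt_rq->push_flags */

/*
 * Called by a CPU that lowered its priority and wants overloaded CPUs to
 * push RT tasks to it.  Returns 1 if the caller must start a new IPI
 * chain, 0 if an already-running chain was simply told to restart.
 */
static int request_push(void)
{
        pthread_mutex_lock(&push_lock);
        if (push_flags & RT_PUSH_IPI_EXECUTING) {
                push_flags |= RT_PUSH_IPI_RESTART;      /* piggyback on the running chain */
                pthread_mutex_unlock(&push_lock);
                return 0;
        }
        push_flags = RT_PUSH_IPI_EXECUTING;             /* start a new chain */
        pthread_mutex_unlock(&push_lock);
        return 1;
}

/* Called at each hop of the chain: consume and report the restart request. */
static int chain_saw_restart(void)
{
        int restart;

        pthread_mutex_lock(&push_lock);
        restart = !!(push_flags & RT_PUSH_IPI_RESTART);
        push_flags &= ~RT_PUSH_IPI_RESTART;
        pthread_mutex_unlock(&push_lock);
        return restart;
}

int main(void)
{
        printf("first request starts a chain:     %d\n", request_push());
        printf("second request only sets RESTART: %d\n", request_push());
        printf("next hop sees the restart flag:   %d\n", chain_saw_restart());
        return 0;
}

The point of the design, visible here, is that concurrent requesters never queue extra work: they coalesce into a single flag bit that the running chain notices at its next hop.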