| author | Linus Torvalds <torvalds@linux-foundation.org> | 2016-04-27 15:03:59 -0400 |
|---|---|---|
| committer | Linus Torvalds <torvalds@linux-foundation.org> | 2016-04-27 15:03:59 -0400 |
| commit | b75a2bf899b668b1d52de8846aafdbcf81349c73 | |
| tree | 6696988bd5a31543833d4b275605d766e0a20840 /kernel/workqueue.c | |
| parent | 763cfc86ee8fd728a7cf2334b8d3a897af7a7ade | |
| parent | 346c09f80459a3ad97df1816d6d606169a51001a | |
Merge branch 'for-4.6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq
Pull workqueue fix from Tejun Heo:
"So, it turns out we had a silly bug in the most fundamental part of
workqueue for a very long time. AFAICS, this dates back to pre-git
era and has quite likely been there from the time workqueue was first
introduced.
A work item uses its PENDING bit to synchronize multiple queuers.
Anyone who wins the PENDING bit owns the pending state of the work
item. Whether a queuer wins or loses the race, one thing should be
guaranteed - there will soon be at least one execution of the work
item after the queueing attempt - where "after" means that the
execution instance would be able
to see all the changes that the queuer has made prior to the queueing
attempt.
Unfortunately, we were missing a smp_mb() after clearing PENDING for
execution, so nothing guaranteed visibility of the changes that a
queueing loser has made, which manifested as a reproducible blk-mq
stall.
Lots of kudos to Roman for debugging the problem. The patch for
-stable is the minimal one. For v4.7, Peter is working on a patch to
make the code path slightly more efficient and less fragile"
* 'for-4.6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
workqueue: fix ghost PENDING flag while doing MQ IO
Diffstat (limited to 'kernel/workqueue.c')
| -rw-r--r-- | kernel/workqueue.c | 29 |
1 file changed, 29 insertions, 0 deletions
```diff
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 2232ae3e3ad6..3bfdff06eea7 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -666,6 +666,35 @@ static void set_work_pool_and_clear_pending(struct work_struct *work,
 	 */
 	smp_wmb();
 	set_work_data(work, (unsigned long)pool_id << WORK_OFFQ_POOL_SHIFT, 0);
+	/*
+	 * The following mb guarantees that previous clear of a PENDING bit
+	 * will not be reordered with any speculative LOADS or STORES from
+	 * work->current_func, which is executed afterwards.  This possible
+	 * reordering can lead to a missed execution on attempt to queue
+	 * the same @work.  E.g. consider this case:
+	 *
+	 *   CPU#0                         CPU#1
+	 *   ----------------------------  --------------------------------
+	 *
+	 * 1  STORE event_indicated
+	 * 2  queue_work_on() {
+	 * 3    test_and_set_bit(PENDING)
+	 * 4 }                             set_..._and_clear_pending() {
+	 * 5                                 set_work_data() # clear bit
+	 * 6                                 smp_mb()
+	 * 7                               work->current_func() {
+	 * 8                                 LOAD event_indicated
+	 *                               }
+	 *
+	 * Without an explicit full barrier speculative LOAD on line 8 can
+	 * be executed before CPU#0 does STORE on line 1.  If that happens,
+	 * CPU#0 observes the PENDING bit is still set and new execution of
+	 * a @work is not queued in a hope, that CPU#1 will eventually
+	 * finish the queued @work.  Meanwhile CPU#1 does not see
+	 * event_indicated is set, because speculative LOAD was executed
+	 * before actual STORE.
+	 */
+	smp_mb();
 }
 
 static void clear_work_data(struct work_struct *work)
```
