| author | Linus Torvalds <torvalds@linux-foundation.org> | 2016-04-27 15:03:59 -0400 |
|---|---|---|
| committer | Linus Torvalds <torvalds@linux-foundation.org> | 2016-04-27 15:03:59 -0400 |
| commit | b75a2bf899b668b1d52de8846aafdbcf81349c73 | |
| tree | 6696988bd5a31543833d4b275605d766e0a20840 /kernel/workqueue.c | |
| parent | 763cfc86ee8fd728a7cf2334b8d3a897af7a7ade | |
| parent | 346c09f80459a3ad97df1816d6d606169a51001a | |
Merge branch 'for-4.6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq
Pull workqueue fix from Tejun Heo:
"So, it turns out we had a silly bug in the most fundamental part of
workqueue for a very long time. AFAICS, this dates back to pre-git
era and has quite likely been there from the time workqueue was first
introduced.
A work item uses its PENDING bit to synchronize multiple queuers.
Anyone who wins the PENDING bit owns the pending state of the work
item. Whether a queuer wins or loses the race, one thing should be
guaranteed - there will soon be at least one execution of the work
item after the queueing attempt - where "after" means that the
execution instance would be able
to see all the changes that the queuer has made prior to the queueing
attempt.
Unfortunately, we were missing a smp_mb() after clearing PENDING for
execution, so nothing guaranteed visibility of the changes that a
queueing loser has made, which manifested as a reproducible blk-mq
stall.
Lots of kudos to Roman for debugging the problem. The patch for
-stable is the minimal one. For v4.7, Peter is working on a patch to
make the code path slightly more efficient and less fragile"
* 'for-4.6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
workqueue: fix ghost PENDING flag while doing MQ IO
Diffstat (limited to 'kernel/workqueue.c')
| -rw-r--r-- | kernel/workqueue.c | 29 |
1 file changed, 29 insertions, 0 deletions
```diff
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 2232ae3e3ad6..3bfdff06eea7 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -666,6 +666,35 @@ static void set_work_pool_and_clear_pending(struct work_struct *work,
 	 */
 	smp_wmb();
 	set_work_data(work, (unsigned long)pool_id << WORK_OFFQ_POOL_SHIFT, 0);
+	/*
+	 * The following mb guarantees that previous clear of a PENDING bit
+	 * will not be reordered with any speculative LOADS or STORES from
+	 * work->current_func, which is executed afterwards.  This possible
+	 * reordering can lead to a missed execution on attempt to queue
+	 * the same @work.  E.g. consider this case:
+	 *
+	 *   CPU#0                         CPU#1
+	 *   ----------------------------  --------------------------------
+	 *
+	 * 1  STORE event_indicated
+	 * 2  queue_work_on() {
+	 * 3    test_and_set_bit(PENDING)
+	 * 4 }                             set_..._and_clear_pending() {
+	 * 5                                 set_work_data() # clear bit
+	 * 6                                 smp_mb()
+	 * 7                               work->current_func() {
+	 * 8                                 LOAD event_indicated
+	 *                               }
+	 *
+	 * Without an explicit full barrier speculative LOAD on line 8 can
+	 * be executed before CPU#0 does STORE on line 1.  If that happens,
+	 * CPU#0 observes the PENDING bit is still set and new execution of
+	 * a @work is not queued in a hope, that CPU#1 will eventually
+	 * finish the queued @work.  Meanwhile CPU#1 does not see
+	 * event_indicated is set, because speculative LOAD was executed
+	 * before actual STORE.
+	 */
+	smp_mb();
 }
 
 static void clear_work_data(struct work_struct *work)
```
