aboutsummaryrefslogtreecommitdiffstats
diff options
context:
space:
mode:
authorQuentin Barnes <qbarnes+linux@yahoo-inc.com>2008-03-19 20:00:39 -0400
committerLinus Torvalds <torvalds@linux-foundation.org>2008-03-19 21:53:35 -0400
commit6cb2a21049b8990df4576c5fce4d48d0206c22d5 (patch)
tree4303438449e0c0d4859e1818115e09d2fd8340ca
parent264e3e889d86e552b4191d69bb60f4f3b383135a (diff)
aio: bad AIO race in aio_complete() leads to process hang
My group ran into a AIO process hang on a 2.6.24 kernel with the process sleeping indefinitely in io_getevents(2) waiting for the last wakeup to come and it never would. We ran the tests on x86_64 SMP. The hang only occurred on a Xeon box ("Clovertown") but not a Core2Duo ("Conroe"). On the Xeon, the L2 cache isn't shared between all eight processors, but is L2 is shared between between all two processors on the Core2Duo we use. My analysis of the hang is if you go down to the second while-loop in read_events(), what happens on processor #1: 1) add_wait_queue_exclusive() adds thread to ctx->wait 2) aio_read_evt() to check tail 3) if aio_read_evt() returned 0, call [io_]schedule() and sleep In aio_complete() with processor #2: A) info->tail = tail; B) waitqueue_active(&ctx->wait) C) if waitqueue_active() returned non-0, call wake_up() The way the code is written, step 1 must be seen by all other processors before processor 1 checks for pending events in step 2 (that were recorded by step A) and step A by processor 2 must be seen by all other processors (checked in step 2) before step B is done. The race I believed I was seeing is that steps 1 and 2 were effectively swapped due to the __list_add() being delayed by the L2 cache not shared by some of the other processors. Imagine: proc 2: just before step A proc 1, step 1: adds to ctx->wait, but is not visible by other processors yet proc 1, step 2: checks tail and sees no pending events proc 2, step A: updates tail proc 1, step 3: calls [io_]schedule() and sleeps proc 2, step B: checks ctx->wait, but sees no one waiting, skips wakeup so proc 1 sleeps indefinitely My patch adds a memory barrier between steps A and B. It ensures that the update in step 1 gets seen on processor 2 before continuing. If processor 1 was just before step 1, the memory barrier makes sure that step A (update tail) gets seen by the time processor 1 makes it to step 2 (check tail). Before the patch our AIO process would hang virtually 100% of the time. After the patch, we have yet to see the process ever hang. Signed-off-by: Quentin Barnes <qbarnes+linux@yahoo-inc.com> Reviewed-by: Zach Brown <zach.brown@oracle.com> Cc: Benjamin LaHaise <bcrl@kvack.org> Cc: <stable@kernel.org> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> [ We should probably disallow that "if (waitqueue_active()) wake_up()" coding pattern, because it's so often buggy wrt memory ordering ] Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-rw-r--r--fs/aio.c8
1 files changed, 8 insertions, 0 deletions
diff --git a/fs/aio.c b/fs/aio.c
index b74c567383bc..6af921940622 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -996,6 +996,14 @@ put_rq:
996 /* everything turned out well, dispose of the aiocb. */ 996 /* everything turned out well, dispose of the aiocb. */
997 ret = __aio_put_req(ctx, iocb); 997 ret = __aio_put_req(ctx, iocb);
998 998
999 /*
1000 * We have to order our ring_info tail store above and test
1001 * of the wait list below outside the wait lock. This is
1002 * like in wake_up_bit() where clearing a bit has to be
1003 * ordered with the unlocked test.
1004 */
1005 smp_mb();
1006
999 if (waitqueue_active(&ctx->wait)) 1007 if (waitqueue_active(&ctx->wait))
1000 wake_up(&ctx->wait); 1008 wake_up(&ctx->wait);
1001 1009