aboutsummaryrefslogtreecommitdiffstats
path: root/fs/xfs
diff options
context:
space:
mode:
authorDave Chinner <dchinner@redhat.com>2014-09-23 01:57:59 -0400
committerDave Chinner <david@fromorbit.com>2014-09-23 01:57:59 -0400
commit8af3dcd3c89aef10375bdd79d5f336b22b57487c (patch)
treeb09268c1a3c1756edc0356bdcbd34c4e90f608f4 /fs/xfs
parentab6978c295b074eb2ba4b06fdf206c7ab4f293e5 (diff)
xfs: xlog_cil_force_lsn doesn't always wait correctly
When running a tight mount/unmount loop on an older kernel, RedHat QE found that unmount would occasionally hang in xfs_buf_unpin_wait() on the superblock buffer. Tracing and other debug work by Eric Sandeen indicated that it was hanging on the writing of the superblock during unmount immediately after logging the superblock counters in a synchronous transaction. Further debug indicated that the synchronous transaction was not waiting for completion correctly, and we narrowed it down to xlog_cil_force_lsn() returning NULLCOMMITLSN and hence not pushing the transaction in the iclog buffer to disk correctly. While this unmount superblock write code is now very different in mainline kernels, the xlog_cil_force_lsn() code is identical, and it was bisected to the backport of commit f876e44 ("xfs: always do log forces via the workqueue"). This commit made the CIL push asynchronous for log forces and hence exposed a race condition that couldn't occur on a synchronous push. Essentially, the xlog_cil_force_lsn() relied implicitly on the fact that the sequence push would be complete by the time xlog_cil_push_now() returned, resulting in the context being pushed being in the committing list. When it was made asynchronous, it was recognised that there was a race condition in detecting whether an asynchronous push has started or not and code was added to handle it. Unfortunately, the fix was not quite right and left a race condition where it it would detect an empty CIL while a push was in progress before the context had been added to the committing list. This was incorrectly seen as a "nothing to do" condition and so would tell xfs_log_force_lsn() that there is nothing to wait for, and hence it would push the iclogbufs in memory. The fix is simple, but explaining the logic and the race condition is a lot more complex. The fix is to add the context to the committing list before we start emptying the CIL. This allows us to detect the difference between an empty "do nothing" push and a push that has not started by adding a discrete "emptying the CIL" state to avoid the transient, incorrect "empty" condition that the (unchanged) waiting code was seeing. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
Diffstat (limited to 'fs/xfs')
-rw-r--r--fs/xfs/xfs_log_cil.c47
1 files changed, 38 insertions, 9 deletions
diff --git a/fs/xfs/xfs_log_cil.c b/fs/xfs/xfs_log_cil.c
index f6b79e5325dd..f506c457011e 100644
--- a/fs/xfs/xfs_log_cil.c
+++ b/fs/xfs/xfs_log_cil.c
@@ -463,12 +463,40 @@ xlog_cil_push(
463 spin_unlock(&cil->xc_push_lock); 463 spin_unlock(&cil->xc_push_lock);
464 goto out_skip; 464 goto out_skip;
465 } 465 }
466 spin_unlock(&cil->xc_push_lock);
467 466
468 467
469 /* check for a previously pushed seqeunce */ 468 /* check for a previously pushed seqeunce */
470 if (push_seq < cil->xc_ctx->sequence) 469 if (push_seq < cil->xc_ctx->sequence) {
470 spin_unlock(&cil->xc_push_lock);
471 goto out_skip; 471 goto out_skip;
472 }
473
474 /*
475 * We are now going to push this context, so add it to the committing
476 * list before we do anything else. This ensures that anyone waiting on
477 * this push can easily detect the difference between a "push in
478 * progress" and "CIL is empty, nothing to do".
479 *
480 * IOWs, a wait loop can now check for:
481 * the current sequence not being found on the committing list;
482 * an empty CIL; and
483 * an unchanged sequence number
484 * to detect a push that had nothing to do and therefore does not need
485 * waiting on. If the CIL is not empty, we get put on the committing
486 * list before emptying the CIL and bumping the sequence number. Hence
487 * an empty CIL and an unchanged sequence number means we jumped out
488 * above after doing nothing.
489 *
490 * Hence the waiter will either find the commit sequence on the
491 * committing list or the sequence number will be unchanged and the CIL
492 * still dirty. In that latter case, the push has not yet started, and
493 * so the waiter will have to continue trying to check the CIL
494 * committing list until it is found. In extreme cases of delay, the
495 * sequence may fully commit between the attempts the wait makes to wait
496 * on the commit sequence.
497 */
498 list_add(&ctx->committing, &cil->xc_committing);
499 spin_unlock(&cil->xc_push_lock);
472 500
473 /* 501 /*
474 * pull all the log vectors off the items in the CIL, and 502 * pull all the log vectors off the items in the CIL, and
@@ -532,7 +560,6 @@ xlog_cil_push(
532 */ 560 */
533 spin_lock(&cil->xc_push_lock); 561 spin_lock(&cil->xc_push_lock);
534 cil->xc_current_sequence = new_ctx->sequence; 562 cil->xc_current_sequence = new_ctx->sequence;
535 list_add(&ctx->committing, &cil->xc_committing);
536 spin_unlock(&cil->xc_push_lock); 563 spin_unlock(&cil->xc_push_lock);
537 up_write(&cil->xc_ctx_lock); 564 up_write(&cil->xc_ctx_lock);
538 565
@@ -855,13 +882,15 @@ restart:
855 * Hence by the time we have got here it our sequence may not have been 882 * Hence by the time we have got here it our sequence may not have been
856 * pushed yet. This is true if the current sequence still matches the 883 * pushed yet. This is true if the current sequence still matches the
857 * push sequence after the above wait loop and the CIL still contains 884 * push sequence after the above wait loop and the CIL still contains
858 * dirty objects. 885 * dirty objects. This is guaranteed by the push code first adding the
886 * context to the committing list before emptying the CIL.
859 * 887 *
860 * When the push occurs, it will empty the CIL and atomically increment 888 * Hence if we don't find the context in the committing list and the
861 * the currect sequence past the push sequence and move it into the 889 * current sequence number is unchanged then the CIL contents are
862 * committing list. Of course, if the CIL is clean at the time of the 890 * significant. If the CIL is empty, if means there was nothing to push
863 * push, it won't have pushed the CIL at all, so in that case we should 891 * and that means there is nothing to wait for. If the CIL is not empty,
864 * try the push for this sequence again from the start just in case. 892 * it means we haven't yet started the push, because if it had started
893 * we would have found the context on the committing list.
865 */ 894 */
866 if (sequence == cil->xc_current_sequence && 895 if (sequence == cil->xc_current_sequence &&
867 !list_empty(&cil->xc_cil)) { 896 !list_empty(&cil->xc_cil)) {