summaryrefslogtreecommitdiffstats
diff options
context:
space:
mode:
authorFilipe Manana <fdmanana@suse.com>2019-09-24 05:49:54 -0400
committerDavid Sterba <dsterba@suse.com>2019-09-24 10:38:53 -0400
commit13fc1d271a2e3ab8a02071e711add01fab9271f6 (patch)
tree6e88d30879d67a5940f7c7b104be6cf7a81c1503
parent0607eb1d452d45c5ac4c745a9e9e0d95152ea9d0 (diff)
Btrfs: fix race setting up and completing qgroup rescan workers
There is a race between setting up a qgroup rescan worker and completing a qgroup rescan worker that can lead to callers of the qgroup rescan wait ioctl to either not wait for the rescan worker to complete or to hang forever due to missing wake ups. The following diagram shows a sequence of steps that illustrates the race. CPU 1 CPU 2 CPU 3 btrfs_ioctl_quota_rescan() btrfs_qgroup_rescan() qgroup_rescan_init() mutex_lock(&fs_info->qgroup_rescan_lock) spin_lock(&fs_info->qgroup_lock) fs_info->qgroup_flags |= BTRFS_QGROUP_STATUS_FLAG_RESCAN init_completion( &fs_info->qgroup_rescan_completion) fs_info->qgroup_rescan_running = true mutex_unlock(&fs_info->qgroup_rescan_lock) spin_unlock(&fs_info->qgroup_lock) btrfs_init_work() --> starts the worker btrfs_qgroup_rescan_worker() mutex_lock(&fs_info->qgroup_rescan_lock) fs_info->qgroup_flags &= ~BTRFS_QGROUP_STATUS_FLAG_RESCAN mutex_unlock(&fs_info->qgroup_rescan_lock) starts transaction, updates qgroup status item, etc btrfs_ioctl_quota_rescan() btrfs_qgroup_rescan() qgroup_rescan_init() mutex_lock(&fs_info->qgroup_rescan_lock) spin_lock(&fs_info->qgroup_lock) fs_info->qgroup_flags |= BTRFS_QGROUP_STATUS_FLAG_RESCAN init_completion( &fs_info->qgroup_rescan_completion) fs_info->qgroup_rescan_running = true mutex_unlock(&fs_info->qgroup_rescan_lock) spin_unlock(&fs_info->qgroup_lock) btrfs_init_work() --> starts another worker mutex_lock(&fs_info->qgroup_rescan_lock) fs_info->qgroup_rescan_running = false mutex_unlock(&fs_info->qgroup_rescan_lock) complete_all(&fs_info->qgroup_rescan_completion) Before the rescan worker started by the task at CPU 3 completes, if another task calls btrfs_ioctl_quota_rescan(), it will get -EINPROGRESS because the flag BTRFS_QGROUP_STATUS_FLAG_RESCAN is set at fs_info->qgroup_flags, which is expected and correct behaviour. 
However if another task calls btrfs_ioctl_quota_rescan_wait() before the rescan worker started by the task at CPU 3 completes, it will return immediately without waiting for the new rescan worker to complete, because fs_info->qgroup_rescan_running is set to false by CPU 2. This race is making test case btrfs/171 (from fstests) fail often: btrfs/171 9s ... - output mismatch (see /home/fdmanana/git/hub/xfstests/results//btrfs/171.out.bad) --- tests/btrfs/171.out 2018-09-16 21:30:48.505104287 +0100 +++ /home/fdmanana/git/hub/xfstests/results//btrfs/171.out.bad 2019-09-19 02:01:36.938486039 +0100 @@ -1,2 +1,3 @@ QA output created by 171 +ERROR: quota rescan failed: Operation now in progress Silence is golden ... (Run 'diff -u /home/fdmanana/git/hub/xfstests/tests/btrfs/171.out /home/fdmanana/git/hub/xfstests/results//btrfs/171.out.bad' to see the entire diff) That is because the test calls the btrfs-progs commands "qgroup quota rescan -w", "qgroup assign" and "qgroup remove" in a sequence that makes calls to the rescan start ioctl fail with -EINPROGRESS (note the "btrfs" commands 'qgroup assign' and 'qgroup remove' often call the rescan start ioctl after calling the qgroup assign ioctl, btrfs_ioctl_qgroup_assign()), since previous waits didn't actually wait for a rescan worker to complete. Another problem the race can cause is missing wake ups for waiters, since the call to complete_all() happens outside a critical section and after clearing the flag BTRFS_QGROUP_STATUS_FLAG_RESCAN. 
In the sequence diagram above, if we have a waiter for the first rescan task (executed by CPU 2), then fs_info->qgroup_rescan_completion.wait is not empty, and if after the rescan worker clears BTRFS_QGROUP_STATUS_FLAG_RESCAN and before it calls complete_all() against fs_info->qgroup_rescan_completion, the task at CPU 3 calls init_completion() against fs_info->qgroup_rescan_completion which re-initializes its wait queue to an empty queue, therefore causing the rescan worker at CPU 2 to call complete_all() against an empty queue, never waking up the task waiting for that rescan worker. Fix this by clearing BTRFS_QGROUP_STATUS_FLAG_RESCAN and setting fs_info->qgroup_rescan_running to false in the same critical section, delimited by the mutex fs_info->qgroup_rescan_lock, as well as doing the call to complete_all() in that same critical section. This gives the protection needed to avoid rescan wait ioctl callers not waiting for a running rescan worker and the lost wake ups problem, since setting that rescan flag and boolean as well as initializing the wait queue is done already in a critical section delimited by that mutex (at qgroup_rescan_init()). Fixes: 57254b6ebce4ce ("Btrfs: add ioctl to wait for qgroup rescan completion") Fixes: d2c609b834d62f ("btrfs: properly track when rescan worker is running") CC: stable@vger.kernel.org # 4.4+ Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-rw-r--r--fs/btrfs/qgroup.c33
1 files changed, 19 insertions, 14 deletions
diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index 8d3bd799ac7d..52701c1be109 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -3166,9 +3166,6 @@ out:
3166 btrfs_free_path(path); 3166 btrfs_free_path(path);
3167 3167
3168 mutex_lock(&fs_info->qgroup_rescan_lock); 3168 mutex_lock(&fs_info->qgroup_rescan_lock);
3169 if (!btrfs_fs_closing(fs_info))
3170 fs_info->qgroup_flags &= ~BTRFS_QGROUP_STATUS_FLAG_RESCAN;
3171
3172 if (err > 0 && 3169 if (err > 0 &&
3173 fs_info->qgroup_flags & BTRFS_QGROUP_STATUS_FLAG_INCONSISTENT) { 3170 fs_info->qgroup_flags & BTRFS_QGROUP_STATUS_FLAG_INCONSISTENT) {
3174 fs_info->qgroup_flags &= ~BTRFS_QGROUP_STATUS_FLAG_INCONSISTENT; 3171 fs_info->qgroup_flags &= ~BTRFS_QGROUP_STATUS_FLAG_INCONSISTENT;
@@ -3184,16 +3181,30 @@ out:
3184 trans = btrfs_start_transaction(fs_info->quota_root, 1); 3181 trans = btrfs_start_transaction(fs_info->quota_root, 1);
3185 if (IS_ERR(trans)) { 3182 if (IS_ERR(trans)) {
3186 err = PTR_ERR(trans); 3183 err = PTR_ERR(trans);
3184 trans = NULL;
3187 btrfs_err(fs_info, 3185 btrfs_err(fs_info,
3188 "fail to start transaction for status update: %d", 3186 "fail to start transaction for status update: %d",
3189 err); 3187 err);
3190 goto done;
3191 } 3188 }
3192 ret = update_qgroup_status_item(trans); 3189
3193 if (ret < 0) { 3190 mutex_lock(&fs_info->qgroup_rescan_lock);
3194 err = ret; 3191 if (!btrfs_fs_closing(fs_info))
3195 btrfs_err(fs_info, "fail to update qgroup status: %d", err); 3192 fs_info->qgroup_flags &= ~BTRFS_QGROUP_STATUS_FLAG_RESCAN;
3193 if (trans) {
3194 ret = update_qgroup_status_item(trans);
3195 if (ret < 0) {
3196 err = ret;
3197 btrfs_err(fs_info, "fail to update qgroup status: %d",
3198 err);
3199 }
3196 } 3200 }
3201 fs_info->qgroup_rescan_running = false;
3202 complete_all(&fs_info->qgroup_rescan_completion);
3203 mutex_unlock(&fs_info->qgroup_rescan_lock);
3204
3205 if (!trans)
3206 return;
3207
3197 btrfs_end_transaction(trans); 3208 btrfs_end_transaction(trans);
3198 3209
3199 if (btrfs_fs_closing(fs_info)) { 3210 if (btrfs_fs_closing(fs_info)) {
@@ -3204,12 +3215,6 @@ out:
3204 } else { 3215 } else {
3205 btrfs_err(fs_info, "qgroup scan failed with %d", err); 3216 btrfs_err(fs_info, "qgroup scan failed with %d", err);
3206 } 3217 }
3207
3208done:
3209 mutex_lock(&fs_info->qgroup_rescan_lock);
3210 fs_info->qgroup_rescan_running = false;
3211 mutex_unlock(&fs_info->qgroup_rescan_lock);
3212 complete_all(&fs_info->qgroup_rescan_completion);
3213} 3218}
3214 3219
3215/* 3220/*