From f7d219dd1c95ba9de2349b4de9f8cb510ec001cb Mon Sep 17 00:00:00 2001
From: Alex Waterman
Date: Thu, 21 Jan 2016 14:50:23 -0800
Subject: gpu: nvgpu: Fix semaphore race condition

A race condition existed in gk20a_channel_semaphore_wait_fd(). In some
instances the semaphore underlying the sync_fence being waited on would
have already signaled. This would cause the subsequent
sync_fence_wait_async() call to return 1 and do nothing. Normally, the
sync_fence_wait_async() call would release the newly created semaphore
but in the above case that would not happen and hang any channel waiting
on that semaphore.

To fix this problem if sync_fence_wait_async() returns 1 immediately
release the newly created semaphore.

Bug 1604892

Change-Id: I1f5e811695bb099f71b7762835aba4a7e27362ec
Signed-off-by: Alex Waterman
Reviewed-on: http://git-master/r/935910
Reviewed-by: Automatic_Commit_Validation_User
Reviewed-by: Terje Bergstrom
GVS: Gerrit_Virtual_Submit
---
 drivers/gpu/nvgpu/gk20a/channel_sync_gk20a.c | 14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)

(limited to 'drivers/gpu/nvgpu/gk20a/channel_sync_gk20a.c')

diff --git a/drivers/gpu/nvgpu/gk20a/channel_sync_gk20a.c b/drivers/gpu/nvgpu/gk20a/channel_sync_gk20a.c
index 952e6e6a..bba18789 100644
--- a/drivers/gpu/nvgpu/gk20a/channel_sync_gk20a.c
+++ b/drivers/gpu/nvgpu/gk20a/channel_sync_gk20a.c
@@ -456,7 +456,7 @@ static int gk20a_channel_semaphore_wait_fd(
 	struct priv_cmd_entry *wait_cmd = NULL;
 	struct wait_fence_work *w;
 	int written;
-	int err;
+	int err, ret;
 	u64 va;
 
 	sync_fence = gk20a_sync_fence_fdget(fd);
@@ -490,8 +490,18 @@ static int gk20a_channel_semaphore_wait_fd(
 	va = gk20a_semaphore_gpu_va(w->sema, c->vm);
 	/* GPU unblocked when when the semaphore value becomes 1. */
 	written = add_sema_cmd(wait_cmd->ptr, va, 1, true, false);
+
 	WARN_ON(written != wait_cmd->size);
-	sync_fence_wait_async(sync_fence, &w->waiter);
+	ret = sync_fence_wait_async(sync_fence, &w->waiter);
+
+	/*
+	 * If the sync_fence has already signaled then the above async_wait
+	 * will never trigger. This causes the semaphore release op to never
+	 * happen which, in turn, hangs the GPU. That's bad. So let's just
+	 * do the semaphore_release right now.
+	 */
+	if (ret == 1)
+		gk20a_semaphore_release(w->sema);
 
 	/* XXX - this fixes an actual bug, we need to hold a ref to this
 	   semaphore while the job is in flight. */
-- 
cgit v1.2.2
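
Illustration (not part of the patch): the race described in the commit message can be modeled in plain userspace C. This is only a sketch under stated assumptions. Every mock_* name and wait_on_fence() are hypothetical stand-ins, not the nvgpu or Android sync framework API. The one behavior taken from the commit message is that the async wait returns 1 and registers nothing when the fence has already signaled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel objects (assumption, not real types). */
struct mock_sema {
	int value;			/* 0 = GPU blocked, 1 = released */
};

struct mock_fence {
	bool signaled;
	void (*cb)(void *arg);		/* callback run when the fence signals */
	void *cb_arg;
};

/* Release the semaphore; doubles as the fence callback. */
void mock_sema_release(void *arg)
{
	struct mock_sema *s = arg;

	s->value = 1;
}

/*
 * Models the return convention described in the commit message:
 * returns 1 and registers nothing if the fence already signaled,
 * otherwise stores the callback and returns 0.
 */
int mock_fence_wait_async(struct mock_fence *f, void (*cb)(void *), void *arg)
{
	if (f->signaled)
		return 1;
	f->cb = cb;
	f->cb_arg = arg;
	return 0;
}

/* Signal the fence and run the pending callback, if any. */
void mock_fence_signal(struct mock_fence *f)
{
	f->signaled = true;
	if (f->cb) {
		f->cb(f->cb_arg);
		f->cb = NULL;
	}
}

/* The pattern the patch applies: release right away when ret == 1. */
void wait_on_fence(struct mock_fence *f, struct mock_sema *s)
{
	int ret;

	assert(f != NULL && s != NULL);
	ret = mock_fence_wait_async(f, mock_sema_release, s);

	/* Without this check an already-signaled fence leaves s->value at 0. */
	if (ret == 1)
		mock_sema_release(s);
}
```

With a fence that is already signaled, wait_on_fence() sets the semaphore value to 1 immediately. With a pending fence, the value stays 0 until mock_fence_signal() runs the stored callback.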