path: root/kernel/futex.c
author		Lai Jiangshan <laijs@cn.fujitsu.com>	2011-01-14 04:09:41 -0500
committer	Steven Rostedt <rostedt@goodmis.org>	2011-01-27 21:13:51 -0500
commit		8161239a8bcce9ad6b537c04a1fa3b5c68bae693 (patch)
tree		a30738ef6e6be053e3604d7ca966a4805ef0039b /kernel/futex.c
parent		6fb1b304255efc5c4c93874ac8c066272e257e28 (diff)
rtmutex: Simplify PI algorithm and make highest prio task get lock
In the current rtmutex, the pending owner may be boosted by the tasks in
the rtmutex's waitlist when the pending owner is deboosted or a task in
the waitlist is boosted. This boosting is spurious, because the pending
owner does not actually hold the rtmutex yet. It is not reasonable.

Example:

time1:  A (high prio) owns the rtmutex.
        B (mid prio) and C (low prio) are in the waitlist.

time2:  A releases the lock and B becomes the pending owner. A (or
        another high prio task) continues to run. B's prio is lower than
        A's, so B is merely queued on the runqueue.

time3:  A or the other high prio task sleeps, but some time has passed.
        B's and C's prios have changed in the period (time2 ~ time3) due
        to boosting or deboosting, and C now has a higher priority than
        B. Is it reasonable that C has to boost B and help B get the
        rtmutex?

No. That is spurious, unneeded boosting before B really owns the
rtmutex. We should give C a chance to beat B and win the rtmutex.

This is the motivation of this patch. The patch *ensures* that only the
top waiter or a higher priority task can take the lock.

How?

1) The top waiter is not dequeued on unlock; if the top waiter changes,
   the old top waiter fails to acquire and goes back to sleep.
2) On acquisition, a task gets the lock when the lock is not taken and
   there is no waiter, OR it has a higher priority than the waiters, OR
   it is the top waiter (see the sketch after the changelog).
3) Whenever the top waiter changes, the new top waiter is woken up.

The algorithm is much simpler than before: no pending owner and no
boosting of a pending owner.

Other advantages of this patch:

1) The number of rtmutex states is cut in half, making the code easier
   to read.
2) The code becomes shorter.
3) The top waiter is not dequeued until it really takes the lock, so the
   waiters keep their FIFO order when the lock is stolen.

Neither advantage nor disadvantage:

1) Even though we may wake up multiple waiters (any time the top waiter
   changes), we hardly cause a "thundering herd"; the number of woken
   tasks is likely one or very few.
2) Two APIs change their semantics. rt_mutex_owner() no longer returns a
   pending owner; it returns NULL while the top waiter is about to take
   the lock. rt_mutex_next_owner() always returns the top waiter and
   does not return NULL while there are waiters, because the top waiter
   is not dequeued. The code that uses these APIs has been fixed up.

Still to be updated after this patch is accepted:

1) Documentation/*
2) the testcase scripts/rt-tester/t4-l2-pi-deboost.tst

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
LKML-Reference: <4D3012D5.4060709@cn.fujitsu.com>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
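As a rough illustration of the acquisition rule in point 2) of "How?"
above, the check can be sketched in a few lines of C. This is not the
kernel's rt_mutex code; struct task, struct lock, top_waiter, prio and
can_take_lock() are made-up names used only for the sketch, with a lower
prio value meaning higher priority:

/*
 * Sketch of the acquisition rule from the changelog, not the real
 * rt_mutex implementation. All names here are illustrative.
 */
struct task {
	int prio;			/* lower value == higher priority */
};

struct lock {
	struct task *owner;		/* NULL while nobody owns the lock */
	struct task *top_waiter;	/* highest priority queued waiter */
};

static int can_take_lock(struct lock *l, struct task *t)
{
	if (l->owner)			/* lock already taken */
		return 0;
	if (!l->top_waiter)		/* no waiter at all */
		return 1;
	if (t == l->top_waiter)		/* we are the top waiter */
		return 1;
	/* we outrank the top waiter, and therefore every waiter */
	return t->prio < l->top_waiter->prio;
}

Point 3) then provides progress: whenever the top waiter changes, the
new top waiter is woken up and re-runs this check, while a task that
fails it simply goes back to sleep in the waitlist.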
Diffstat (limited to 'kernel/futex.c')
-rw-r--r--	kernel/futex.c	22
1 file changed, 12 insertions, 10 deletions
diff --git a/kernel/futex.c b/kernel/futex.c
index b766d28accd6..64c38115c7b6 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -1556,10 +1556,10 @@ static int fixup_pi_state_owner(u32 __user *uaddr, struct futex_q *q,
 
 	/*
 	 * We are here either because we stole the rtmutex from the
-	 * pending owner or we are the pending owner which failed to
-	 * get the rtmutex. We have to replace the pending owner TID
-	 * in the user space variable. This must be atomic as we have
-	 * to preserve the owner died bit here.
+	 * previous highest priority waiter or we are the highest priority
+	 * waiter but failed to get the rtmutex the first time.
+	 * We have to replace the newowner TID in the user space variable.
+	 * This must be atomic as we have to preserve the owner died bit here.
 	 *
 	 * Note: We write the user space value _before_ changing the pi_state
 	 * because we can fault here. Imagine swapped out pages or a fork
@@ -1608,8 +1608,8 @@ retry:
 
 	/*
 	 * To handle the page fault we need to drop the hash bucket
-	 * lock here. That gives the other task (either the pending
-	 * owner itself or the task which stole the rtmutex) the
+	 * lock here. That gives the other task (either the highest priority
+	 * waiter itself or the task which stole the rtmutex) the
 	 * chance to try the fixup of the pi_state. So once we are
 	 * back from handling the fault we need to check the pi_state
 	 * after reacquiring the hash bucket lock and before trying to
@@ -1685,18 +1685,20 @@ static int fixup_owner(u32 __user *uaddr, struct futex_q *q, int locked)
 		/*
 		 * pi_state is incorrect, some other task did a lock steal and
 		 * we returned due to timeout or signal without taking the
-		 * rt_mutex. Too late. We can access the rt_mutex_owner without
-		 * locking, as the other task is now blocked on the hash bucket
-		 * lock. Fix the state up.
+		 * rt_mutex. Too late.
 		 */
+		raw_spin_lock(&q->pi_state->pi_mutex.wait_lock);
 		owner = rt_mutex_owner(&q->pi_state->pi_mutex);
+		if (!owner)
+			owner = rt_mutex_next_owner(&q->pi_state->pi_mutex);
+		raw_spin_unlock(&q->pi_state->pi_mutex.wait_lock);
 		ret = fixup_pi_state_owner(uaddr, q, owner);
 		goto out;
 	}
 
 	/*
 	 * Paranoia check. If we did not take the lock, then we should not be
-	 * the owner, nor the pending owner, of the rt_mutex.
+	 * the owner of the rt_mutex.
 	 */
 	if (rt_mutex_owner(&q->pi_state->pi_mutex) == current)
 		printk(KERN_ERR "fixup_owner: ret = %d pi-mutex: %p "
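The fixup_owner() hunk above shows the second API change from the
changelog in action: rt_mutex_owner() can now return NULL while the top
waiter is still in the act of taking the lock, so a caller that needs
the prospective owner has to fall back to rt_mutex_next_owner() under
wait_lock. A minimal sketch of that caller pattern, assuming a
struct rt_mutex *lock is already in scope (it mirrors the diff rather
than adding any new behaviour):

	struct task_struct *newowner;

	raw_spin_lock(&lock->wait_lock);
	/* NULL here means the top waiter is still in the act of acquiring */
	newowner = rt_mutex_owner(lock);
	if (!newowner)
		/* the top waiter stays queued until it really owns the lock */
		newowner = rt_mutex_next_owner(lock);
	raw_spin_unlock(&lock->wait_lock);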