field | value | date |
---|---|---|
author | Paul Jackson <pj@sgi.com> | 2005-05-27 05:02:43 -0400 |
committer | Linus Torvalds <torvalds@ppc970.osdl.org> | 2005-05-27 11:07:26 -0400 |
commit | 2efe86b809d97debaaf9fcc13b041aedf15bd3d2 | |
tree | 87e039397918f4c5b0a21d798589a8ce517bfa2d /kernel | |
parent | 88c1834633341bbb94e315433067496338bff4ad | |
[PATCH] cpuset exit NULL dereference fix

There is a race in the kernel cpuset code between the code that
handles notify_on_release and the code that removes a cpuset.
The notify_on_release code can end up trying to access a
cpuset that has already been removed. In the most common case, this
causes a NULL pointer dereference in the routine cpuset_path.
However, all manner of bad things are possible, in theory at least.

The existing code decrements the cpuset use count and, if the
count goes to zero, processes the notify_on_release request,
if appropriate. However, once the count goes to zero, unless we
are holding the global cpuset_sem semaphore, there is nothing to
stop another task from immediately removing the cpuset entirely
and recycling its memory.
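
The window described above can be made concrete with a small userspace model. This is only a sketch under stated assumptions: `model_cpuset`, `model_remove` and `old_exit_touches_removed` are hypothetical names, not kernel symbols, and the "other task" is simulated by a callback invoked at the exact point where the old code had dropped its reference but not yet taken cpuset_sem.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for a cpuset; not the kernel struct. */
struct model_cpuset {
	atomic_int count;        /* number of tasks attached */
	bool notify_on_release;
	bool removed;            /* set once "another task" has removed it */
};

/* Models a concurrent removal: permitted only when the count is zero. */
static bool model_remove(struct model_cpuset *cs)
{
	assert(cs != NULL);
	if (atomic_load(&cs->count) != 0)
		return false;
	cs->removed = true;      /* the kernel would free the memory here */
	return true;
}

/*
 * Old ordering: decrement first, take the lock afterwards.
 * 'window' runs where another task could be scheduled.  Returns true
 * if the release check would have touched a removed cpuset.
 */
static bool old_exit_touches_removed(struct model_cpuset *cs,
				     bool (*window)(struct model_cpuset *))
{
	assert(cs != NULL);
	/* atomic_fetch_sub returns the old value: 1 means we reached zero */
	if (atomic_fetch_sub(&cs->count, 1) == 1) {
		if (window)
			window(cs);  /* other task runs before we lock */
		/* down(&cpuset_sem) would only happen here, too late */
		return cs->removed;
	}
	return false;
}
```

Passing `model_remove` as the callback reproduces the bad interleaving deterministically; passing `NULL` models the lucky case where nothing else runs in the window.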

The obvious fix would be to always hold the cpuset_sem
semaphore while decrementing the use count and dealing with
notify_on_release. However, we don't want to force a global
semaphore into the mainline task exit path, as that might create
a scaling problem.
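
The actual fix is almost as easy: since this is only an issue
for cpusets using notify_on_release, which the big top-level
cpusets don't normally need to use, take cpuset_sem only
for cpusets using notify_on_release. Those cpusets decrement the
count and check for release while holding the semaphore; all
other cpusets just decrement the count.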
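
The shape of the fix can be sketched in userspace C. This is an illustration under assumptions, not the kernel code: a pthread mutex stands in for cpuset_sem, C11 atomics stand in for the kernel's atomic_t, and `model_cpuset`, `model_check_for_release` and `model_cpuset_exit` are hypothetical names.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Stands in for the global cpuset_sem semaphore. */
static pthread_mutex_t model_sem = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical userspace stand-in for a cpuset; not the kernel struct. */
struct model_cpuset {
	atomic_int count;        /* number of tasks attached */
	bool notify_on_release;
	int release_calls;       /* release checks run; guarded by model_sem */
};

/* Placeholder for the release handling; caller must hold model_sem. */
static void model_check_for_release(struct model_cpuset *cs)
{
	cs->release_calls++;
}

/* Fixed ordering: lock first, then decrement and test under the lock. */
static void model_cpuset_exit(struct model_cpuset *cs)
{
	assert(cs != NULL);
	if (cs->notify_on_release) {
		pthread_mutex_lock(&model_sem);
		/* atomic_fetch_sub returns the old value: 1 means zero now */
		if (atomic_fetch_sub(&cs->count, 1) == 1)
			model_check_for_release(cs);
		pthread_mutex_unlock(&model_sem);
	} else {
		/* Fast path: no global lock, and cs is not touched again. */
		atomic_fetch_sub(&cs->count, 1);
	}
}
```

In this model the flag is read before the decrement, while the exiting task still holds its own reference, so the count cannot yet be zero. The fast path never looks at `cs` after dropping that reference, which is what keeps the unlocked case safe.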

This code has run for hours without a hiccup under a cpuset
create/destroy stress test that could crash the existing
kernel in seconds. This patch applies to the current -linus
git kernel.

Signed-off-by: Paul Jackson <pj@sgi.com>
Acked-by: Simon Derr <simon.derr@bull.net>
Acked-by: Dinakar Guniguntala <dino@in.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Diffstat (limited to 'kernel')
-rw-r--r-- | kernel/cpuset.c | 24 |
1 files changed, 19 insertions, 5 deletions
diff --git a/kernel/cpuset.c b/kernel/cpuset.c
index 961d74044deb..00e8f2575512 100644
--- a/kernel/cpuset.c
+++ b/kernel/cpuset.c
@@ -166,9 +166,8 @@ static struct super_block *cpuset_sb = NULL;
  * The hooks from fork and exit, cpuset_fork() and cpuset_exit(), don't
  * (usually) grab cpuset_sem. These are the two most performance
  * critical pieces of code here. The exception occurs on exit(),
- * if the last task using a cpuset exits, and the cpuset was marked
- * notify_on_release. In that case, the cpuset_sem is taken, the
- * path to the released cpuset calculated, and a usermode call made
+ * when a task in a notify_on_release cpuset exits. Then cpuset_sem
+ * is taken, and if the cpuset count is zero, a usermode call made
  * to /sbin/cpuset_release_agent with the name of the cpuset (path
  * relative to the root of cpuset file system) as the argument.
  *
@@ -1404,6 +1403,18 @@ void cpuset_fork(struct task_struct *tsk)
  *
  * Description: Detach cpuset from @tsk and release it.
  *
+ * Note that cpusets marked notify_on_release force every task
+ * in them to take the global cpuset_sem semaphore when exiting.
+ * This could impact scaling on very large systems. Be reluctant
+ * to use notify_on_release cpusets where very high task exit
+ * scaling is required on large systems.
+ *
+ * Don't even think about derefencing 'cs' after the cpuset use
+ * count goes to zero, except inside a critical section guarded
+ * by the cpuset_sem semaphore. If you don't hold cpuset_sem,
+ * then a zero cpuset use count is a license to any other task to
+ * nuke the cpuset immediately.
+ *
  **/
 
 void cpuset_exit(struct task_struct *tsk)
@@ -1415,10 +1426,13 @@ void cpuset_exit(struct task_struct *tsk)
 	tsk->cpuset = NULL;
 	task_unlock(tsk);
 
-	if (atomic_dec_and_test(&cs->count)) {
+	if (notify_on_release(cs)) {
 		down(&cpuset_sem);
-		check_for_release(cs);
+		if (atomic_dec_and_test(&cs->count))
+			check_for_release(cs);
 		up(&cpuset_sem);
+	} else {
+		atomic_dec(&cs->count);
 	}
 }
 