diff options
author | Paul Jackson <pj@sgi.com> | 2006-05-20 18:00:09 -0400 |
---|---|---|
committer | Linus Torvalds <torvalds@g5.osdl.org> | 2006-05-21 15:59:18 -0400 |
commit | bdd804f478a0cc74bf7db8e9f9d5fd379d1b31ca (patch) | |
tree | 2b8f083b1ca698c0f9321b3714dab036d2531f29 /mm | |
parent | 593ee20766921fec643194dff829e17f30552220 (diff) |
[PATCH] Cpuset: might sleep checking zones allowed fix
Fix a couple of infrequently encountered 'sleeping function called from
invalid context' in the cpuset hooks in __alloc_pages. Could sleep while
interrupts disabled.
The routine cpuset_zone_allowed() is called by code in mm/page_alloc.c
__alloc_pages() to determine if a zone is allowed in the current tasks
cpuset. This routine can sleep, for certain GFP_KERNEL allocations, if the
zone is on a memory node not allowed in the current cpuset, but might be
allowed in a parent cpuset.
But we can't sleep in __alloc_pages() if in interrupt, nor if called for a
GFP_ATOMIC request (__GFP_WAIT not set in gfp_flags).
The rule was intended to be:
Don't call cpuset_zone_allowed() if you can't sleep, unless you
pass in the __GFP_HARDWALL flag set in gfp_flag, which disables
the code that might scan up ancestor cpusets and sleep.
This rule was being violated in a couple of places, due to a bogus change
made (by myself, pj) to __alloc_pages() as part of the November 2005 effort
to cleanup its logic, and also due to a later fix to constrain which swap
daemons were awoken.
The bogus change can be seen at:
http://linux.derkeiler.com/Mailing-Lists/Kernel/2005-11/4691.html
[PATCH 01/05] mm fix __alloc_pages cpuset ALLOC_* flags
This was first noticed on a tight memory system, in code that was disabling
interrupts and doing allocation requests with __GFP_WAIT not set, which
resulted in __might_sleep() writing complaints to the log "Debug: sleeping
function called ...", when the code in cpuset_zone_allowed() tried to take
the callback_sem cpuset semaphore.
We haven't seen a system hang on this 'might_sleep' yet, but we are at
decent risk of seeing it fairly soon, especially since the additional
cpuset_zone_allowed() check was added, conditioning wakeup_kswapd(), in
March 2006.
Special thanks to Dave Chinner, for figuring this out, and a tip of the hat
to Nick Piggin who warned me of this back in Nov 2005, before I was ready
to listen.
Signed-off-by: Paul Jackson <pj@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Diffstat (limited to 'mm')
-rw-r--r-- | mm/page_alloc.c | 5 |
1 files changed, 3 insertions, 2 deletions
diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 813b4ec1298a..bb3416932ab0 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c | |||
@@ -951,7 +951,7 @@ restart: | |||
951 | goto got_pg; | 951 | goto got_pg; |
952 | 952 | ||
953 | do { | 953 | do { |
954 | if (cpuset_zone_allowed(*z, gfp_mask)) | 954 | if (cpuset_zone_allowed(*z, gfp_mask|__GFP_HARDWALL)) |
955 | wakeup_kswapd(*z, order); | 955 | wakeup_kswapd(*z, order); |
956 | } while (*(++z)); | 956 | } while (*(++z)); |
957 | 957 | ||
@@ -970,7 +970,8 @@ restart: | |||
970 | alloc_flags |= ALLOC_HARDER; | 970 | alloc_flags |= ALLOC_HARDER; |
971 | if (gfp_mask & __GFP_HIGH) | 971 | if (gfp_mask & __GFP_HIGH) |
972 | alloc_flags |= ALLOC_HIGH; | 972 | alloc_flags |= ALLOC_HIGH; |
973 | alloc_flags |= ALLOC_CPUSET; | 973 | if (wait) |
974 | alloc_flags |= ALLOC_CPUSET; | ||
974 | 975 | ||
975 | /* | 976 | /* |
976 | * Go through the zonelist again. Let __GFP_HIGH and allocations | 977 | * Go through the zonelist again. Let __GFP_HIGH and allocations |