author Michal Hocko <mhocko@suse.com> 2017-02-22 18:46:22 -0500
committer Linus Torvalds <torvalds@linux-foundation.org> 2017-02-22 19:41:30 -0500
commit 06ad276ac18742c6b281698d41b27a290cd42407
tree a9767802901845a4dc46f27486f599801e6ddd01 /mm/page_alloc.c
parent 9a67f6488eca926f8356b2737fc9f8f6c0cbed85
mm, oom: do not enforce OOM killer for __GFP_NOFAIL automatically

__alloc_pages_may_oom makes sure to skip the OOM killer depending on
the allocation request. This includes lowmem requests, costly high
order requests and others. For a long time __GFP_NOFAIL acted as an
override for all those rules. This is not documented and it can be
quite surprising as well. E.g. GFP_NOFS requests do not invoke the
OOM killer but GFP_NOFS|__GFP_NOFAIL requests do. So if we try to
convert some of the existing open coded loops around the allocator to
nofail requests (and we have done that in the past) then such a change
would have a non-trivial side effect which is far from obvious. Note
that the primary motivation for skipping the OOM killer is to prevent
its premature invocation.

The exception has been added by commit 82553a937f12 ("oom: invoke oom
killer for __GFP_NOFAIL"). The changelog points out that the oom killer
has to be invoked, otherwise the request would be looping forever. But
this argument is rather weak because the OOM killer doesn't really
guarantee forward progress for those exceptional cases:

 - it will hardly help to form a costly order page, which in turn can
   result in a system panic because there is no oom killable task left
   in the end. I believe we certainly do not want to put the system
   down just because there is a nasty driver asking for an order-9
   page with GFP_NOFAIL without realizing all the consequences. It is
   much better for this request to loop forever than to cause a
   massive system disruption

 - lowmem is also highly unlikely to be freed by the OOM killer

 - a GFP_NOFS request could trigger while there is still a lot of
   memory pinned by filesystems.

This patch simply removes the __GFP_NOFAIL special case in order to
have clearer semantics without surprising side effects.

Signed-off-by: Michal Hocko <mhocko@suse.com>
Reported-by: Nils Holland <nholland@tisys.org>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Diffstat (limited to 'mm/page_alloc.c')
-rw-r--r-- mm/page_alloc.c 49
1 file changed, 24 insertions, 25 deletions
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index dd36da6ffef5..1e37740837ac 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3090,32 +3090,31 @@ __alloc_pages_may_oom(gfp_t gfp_mask, unsigned int order,
 	if (page)
 		goto out;
 
-	if (!(gfp_mask & __GFP_NOFAIL)) {
-		/* Coredumps can quickly deplete all memory reserves */
-		if (current->flags & PF_DUMPCORE)
-			goto out;
-		/* The OOM killer will not help higher order allocs */
-		if (order > PAGE_ALLOC_COSTLY_ORDER)
-			goto out;
-		/* The OOM killer does not needlessly kill tasks for lowmem */
-		if (ac->high_zoneidx < ZONE_NORMAL)
-			goto out;
-		if (pm_suspended_storage())
-			goto out;
-		/*
-		 * XXX: GFP_NOFS allocations should rather fail than rely on
-		 * other request to make a forward progress.
-		 * We are in an unfortunate situation where out_of_memory cannot
-		 * do much for this context but let's try it to at least get
-		 * access to memory reserved if the current task is killed (see
-		 * out_of_memory). Once filesystems are ready to handle allocation
-		 * failures more gracefully we should just bail out here.
-		 */
+	/* Coredumps can quickly deplete all memory reserves */
+	if (current->flags & PF_DUMPCORE)
+		goto out;
+	/* The OOM killer will not help higher order allocs */
+	if (order > PAGE_ALLOC_COSTLY_ORDER)
+		goto out;
+	/* The OOM killer does not needlessly kill tasks for lowmem */
+	if (ac->high_zoneidx < ZONE_NORMAL)
+		goto out;
+	if (pm_suspended_storage())
+		goto out;
+	/*
+	 * XXX: GFP_NOFS allocations should rather fail than rely on
+	 * other request to make a forward progress.
+	 * We are in an unfortunate situation where out_of_memory cannot
+	 * do much for this context but let's try it to at least get
+	 * access to memory reserved if the current task is killed (see
+	 * out_of_memory). Once filesystems are ready to handle allocation
+	 * failures more gracefully we should just bail out here.
+	 */
+
+	/* The OOM killer may not free memory on a specific node */
+	if (gfp_mask & __GFP_THISNODE)
+		goto out;
 
-		/* The OOM killer may not free memory on a specific node */
-		if (gfp_mask & __GFP_THISNODE)
-			goto out;
-	}
 	/* Exhausted what can be done so it's blamo time */
 	if (out_of_memory(&oc) || WARN_ON_ONCE(gfp_mask & __GFP_NOFAIL)) {
 		*did_some_progress = 1;