author     Joonsoo Kim <iamjoonsoo.kim@lge.com>            2018-04-10 19:30:15 -0400
committer  Linus Torvalds <torvalds@linux-foundation.org>  2018-04-11 13:28:32 -0400
commit     bad8c6c0b1144694ecb0bc5629ede9b8b578b86e (patch)
tree       4f35d3265bcb009ce44b6cd9fe20c45be1f22bc6 /mm
parent     d3cda2337bbc9edd2a26b83cb00eaa8c048ff274 (diff)
mm/cma: manage the memory of the CMA area by using the ZONE_MOVABLE
Patch series "mm/cma: manage the memory of the CMA area by using the ZONE_MOVABLE", v2.

0. History

This patchset is the follow-up of the discussion about "Introduce ZONE_CMA (v7)" [1]. Please refer to it if more information is needed.

1. What does this patch do?

This patch changes how the memory of the CMA area is managed in the MM subsystem. Currently, the memory of the CMA area is managed by the zone its pfn belongs to. This approach has problems, because the MM subsystem doesn't have enough logic to handle a single zone that contains memory with different characteristics. To solve this issue, this patch manages all the memory of the CMA area through the MOVABLE zone. From the MM subsystem's point of view, the characteristics of the memory in the MOVABLE zone and of the memory in the CMA area are the same, so managing the CMA area's memory through the MOVABLE zone causes no problem.

2. Motivation

There are several problems with the current approach, listed below. Although these problems are not inherent and could be fixed without this conceptual change, doing so would require adding many hooks in various code paths, which would be intrusive to core MM and error-prone. Therefore, I try to solve them with this new approach. The problems with the current implementation are:

o CMA memory utilization

This is the freepage calculation logic in MM:

- For movable allocation: freepage = total freepage
- For unmovable allocation: freepage = total freepage - CMA freepage

Freepages in the CMA area are used only after the normal freepages of the zone the CMA area belongs to are exhausted. At that moment the number of normal freepages is zero, so:

- For movable allocation: freepage = total freepage = CMA freepage
- For unmovable allocation: freepage = 0

If an unmovable allocation arrives at this moment, the request fails the watermark check and reclaim is started. After reclaim, normal freepages exist again, so the freepages in the CMA area remain unused. FYI, there is another attempt [2] to solve this problem on lkml, and, as far as I know, Qualcomm also has an out-of-tree solution for it.

o Useless reclaim

There is no logic to distinguish CMA pages in the reclaim path, so CMA pages are reclaimed even when the system only needs pages usable for kernel allocations.

o Atomic allocation failure

This is also related to the fallback allocation policy for the memory of the CMA area. Consider the situation where the number of normal freepages is *zero* because a bunch of movable allocation requests have come in. Kswapd is not woken up, because of the following freepage calculation:

- For movable allocation: freepage = total freepage = CMA freepage

If an atomic unmovable allocation request arrives at this moment, it fails because of the following logic:

- For unmovable allocation: freepage = total freepage - CMA freepage = 0

This was reported by Aneesh [3].

o Useless compaction

The usual high-order allocation request is an unmovable one and cannot be served from the memory of the CMA area. Nevertheless, during compaction the migration scanner migrates pages into the CMA area and builds high-order pages there. As mentioned above, they cannot be used for unmovable allocation requests, so the work is wasted.

3. Current approach and new approach

The current approach is that the memory of the CMA area is managed by the zone its pfn belongs to.
However, this memory must be distinguishable since it comes with a strong limitation, so it is marked as MIGRATE_CMA in the pageblock flags and handled specially. As mentioned in section 2, the MM subsystem doesn't have enough logic to deal with this special pageblock, which raises many problems.

The new approach is that the memory of the CMA area is managed by the MOVABLE zone. MM already has enough logic to deal with special zones such as HIGHMEM and MOVABLE, so managing the CMA area's memory through the MOVABLE zone naturally works well: the constraint on the memory of the CMA area, that it must always be migratable, is the same as the constraint on the MOVABLE zone.

There is one side effect on the usability of the memory of the CMA area. Use of the MOVABLE zone is only allowed for requests with GFP_HIGHMEM && GFP_MOVABLE, so the memory of the CMA area is now only usable by allocations with this gfp flag combination. Before this patchset, any GFP_MOVABLE request could use it. IMO, this is not a big issue, since most GFP_MOVABLE requests also carry GFP_HIGHMEM, for example file cache pages and anonymous pages. File cache pages for a blockdev file are an exception: requests for them carry no GFP_HIGHMEM flag. There are pros and cons to this exception. In my experience, blockdev file cache pages are one of the top reasons cma_alloc() fails temporarily, so we get a stronger guarantee of cma_alloc() success by giving up this case.

Note that there is no change from the admin's point of view, since this patchset only changes the internal implementation of the MM subsystem. The one minor difference for admins is that the memory statistics for the CMA area are now printed under the MOVABLE zone. That's all.

4. Result

The following experimental result relates to the utilization problem.

8 CPUs, 1024 MB, VIRTUAL MACHINE, make -j16

<Before>
CMA area:        0 MB     512 MB
Elapsed-time:    92.4     186.5
pswpin:          82       18647
pswpout:         160      69839

<After>
CMA area:        0 MB     512 MB
Elapsed-time:    93.1     93.4
pswpin:          84       46
pswpout:         183      92

akpm: "kernel test robot" reported a 26% improvement in vm-scalability.throughput:
http://lkml.kernel.org/r/20180330012721.GA3845@yexl-desktop

[1]: lkml.kernel.org/r/1491880640-9944-1-git-send-email-iamjoonsoo.kim@lge.com
[2]: https://lkml.org/lkml/2014/10/15/623
[3]: http://www.spinics.net/lists/linux-mm/msg100562.html

Link: http://lkml.kernel.org/r/1512114786-5085-2-git-send-email-iamjoonsoo.kim@lge.com
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Tested-by: Tony Lindgren <tony@atomide.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Laura Abbott <lauraa@codeaurora.org>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Michal Nazarewicz <mina86@mina86.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
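[Editor's note] The freepage accounting described in section 2 can be illustrated with a small, self-contained userspace sketch. This is not kernel code: struct zone_model and watermark_ok() are invented names that only model the two formulas quoted above (movable allocations may count CMA freepages toward the watermark, unmovable allocations may not).

#include <stdbool.h>
#include <stdio.h>

/* Toy model of a zone's freepage counters, per section 2 of the message. */
struct zone_model {
        long total_free;        /* all freepages in the zone */
        long cma_free;          /* freepages that live in the CMA area */
};

/*
 * Watermark check following the quoted formulas: unmovable allocations
 * must not count CMA freepages.
 */
static bool watermark_ok(const struct zone_model *z, long mark, bool movable)
{
        long freepage = z->total_free;

        if (!movable)
                freepage -= z->cma_free;

        return freepage > mark;
}

int main(void)
{
        /* Normal freepages exhausted: everything left is CMA memory. */
        struct zone_model z = { .total_free = 4096, .cma_free = 4096 };

        printf("movable passes watermark:   %d\n", watermark_ok(&z, 128, true));
        printf("unmovable passes watermark: %d\n", watermark_ok(&z, 128, false));
        return 0;
}

With normal freepages exhausted, this prints 1 for the movable check and 0 for the unmovable one; the failing unmovable check is what kicks off the needless reclaim described above.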
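[Editor's note] Likewise, the side effect discussed in section 3 follows from the rule that only a request carrying both the highmem and movable hints may be placed in ZONE_MOVABLE. The sketch below is a toy model of that rule using made-up MODEL_GFP_* flags and a pick_zone() helper; it is not the kernel's gfp_zone() implementation.

#include <stdio.h>

/* Hypothetical stand-ins for the two gfp hints relevant here. */
#define MODEL_GFP_HIGHMEM       0x1u
#define MODEL_GFP_MOVABLE       0x2u

enum model_zone { MODEL_ZONE_NORMAL, MODEL_ZONE_HIGHMEM, MODEL_ZONE_MOVABLE };

/*
 * Zone selection as described in section 3: only requests carrying both
 * hints may be served from ZONE_MOVABLE, and therefore (after this
 * patch) from the CMA area.
 */
static enum model_zone pick_zone(unsigned int gfp)
{
        unsigned int both = MODEL_GFP_HIGHMEM | MODEL_GFP_MOVABLE;

        if ((gfp & both) == both)
                return MODEL_ZONE_MOVABLE;
        if (gfp & MODEL_GFP_HIGHMEM)
                return MODEL_ZONE_HIGHMEM;
        return MODEL_ZONE_NORMAL;
}

int main(void)
{
        /* Anonymous / regular file cache pages: highmem + movable. */
        printf("%d\n", pick_zone(MODEL_GFP_HIGHMEM | MODEL_GFP_MOVABLE));
        /* Blockdev page cache: movable but not highmem. */
        printf("%d\n", pick_zone(MODEL_GFP_MOVABLE));
        return 0;
}

The first call selects the movable zone; the second does not, which is why blockdev page cache can no longer land in the CMA area after this change.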
Diffstat (limited to 'mm')
-rw-r--r--   mm/cma.c          83
-rw-r--r--   mm/internal.h      3
-rw-r--r--   mm/page_alloc.c   55
3 files changed, 125 insertions, 16 deletions
diff --git a/mm/cma.c b/mm/cma.c
index 5809bbe360d7..aa40e6c7b042 100644
--- a/mm/cma.c
+++ b/mm/cma.c
@@ -39,6 +39,7 @@
 #include <trace/events/cma.h>
 
 #include "cma.h"
+#include "internal.h"
 
 struct cma cma_areas[MAX_CMA_AREAS];
 unsigned cma_area_count;
@@ -109,23 +110,25 @@ static int __init cma_activate_area(struct cma *cma)
         if (!cma->bitmap)
                 return -ENOMEM;
 
-        WARN_ON_ONCE(!pfn_valid(pfn));
-        zone = page_zone(pfn_to_page(pfn));
-
         do {
                 unsigned j;
 
                 base_pfn = pfn;
+                if (!pfn_valid(base_pfn))
+                        goto err;
+
+                zone = page_zone(pfn_to_page(base_pfn));
                 for (j = pageblock_nr_pages; j; --j, pfn++) {
-                        WARN_ON_ONCE(!pfn_valid(pfn));
+                        if (!pfn_valid(pfn))
+                                goto err;
+
                         /*
-                         * alloc_contig_range requires the pfn range
-                         * specified to be in the same zone. Make this
-                         * simple by forcing the entire CMA resv range
-                         * to be in the same zone.
+                         * In init_cma_reserved_pageblock(), present_pages
+                         * is adjusted with assumption that all pages in
+                         * the pageblock come from a single zone.
                          */
                         if (page_zone(pfn_to_page(pfn)) != zone)
-                                goto not_in_zone;
+                                goto err;
                 }
                 init_cma_reserved_pageblock(pfn_to_page(base_pfn));
         } while (--i);
@@ -139,7 +142,7 @@ static int __init cma_activate_area(struct cma *cma)
 
         return 0;
 
-not_in_zone:
+err:
         pr_err("CMA area %s could not be activated\n", cma->name);
         kfree(cma->bitmap);
         cma->count = 0;
@@ -149,6 +152,41 @@ not_in_zone:
 static int __init cma_init_reserved_areas(void)
 {
         int i;
+        struct zone *zone;
+        pg_data_t *pgdat;
+
+        if (!cma_area_count)
+                return 0;
+
+        for_each_online_pgdat(pgdat) {
+                unsigned long start_pfn = UINT_MAX, end_pfn = 0;
+
+                zone = &pgdat->node_zones[ZONE_MOVABLE];
+
+                /*
+                 * In this case, we cannot adjust the zone range
+                 * since it is now maximum node span and we don't
+                 * know original zone range.
+                 */
+                if (populated_zone(zone))
+                        continue;
+
+                for (i = 0; i < cma_area_count; i++) {
+                        if (pfn_to_nid(cma_areas[i].base_pfn) !=
+                                pgdat->node_id)
+                                continue;
+
+                        start_pfn = min(start_pfn, cma_areas[i].base_pfn);
+                        end_pfn = max(end_pfn, cma_areas[i].base_pfn +
+                                                cma_areas[i].count);
+                }
+
+                if (!end_pfn)
+                        continue;
+
+                zone->zone_start_pfn = start_pfn;
+                zone->spanned_pages = end_pfn - start_pfn;
+        }
 
         for (i = 0; i < cma_area_count; i++) {
                 int ret = cma_activate_area(&cma_areas[i]);
@@ -157,9 +195,32 @@ static int __init cma_init_reserved_areas(void)
                         return ret;
         }
 
+        /*
+         * Reserved pages for ZONE_MOVABLE are now activated and
+         * this would change ZONE_MOVABLE's managed page counter and
+         * the other zones' present counter. We need to re-calculate
+         * various zone information that depends on this initialization.
+         */
+        build_all_zonelists(NULL);
+        for_each_populated_zone(zone) {
+                if (zone_idx(zone) == ZONE_MOVABLE) {
+                        zone_pcp_reset(zone);
+                        setup_zone_pageset(zone);
+                } else
+                        zone_pcp_update(zone);
+
+                set_zone_contiguous(zone);
+        }
+
+        /*
+         * We need to re-init per zone wmark by calling
+         * init_per_zone_wmark_min() but doesn't call here because it is
+         * registered on core_initcall and it will be called later than us.
+         */
+
         return 0;
 }
-core_initcall(cma_init_reserved_areas);
+pure_initcall(cma_init_reserved_areas);
 
 /**
  * cma_init_reserved_mem() - create custom contiguous area from reserved memory
diff --git a/mm/internal.h b/mm/internal.h
index 502d14189794..228dd6642951 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -168,6 +168,9 @@ extern void post_alloc_hook(struct page *page, unsigned int order,
                                         gfp_t gfp_flags);
 extern int user_min_free_kbytes;
 
+extern void set_zone_contiguous(struct zone *zone);
+extern void clear_zone_contiguous(struct zone *zone);
+
 #if defined CONFIG_COMPACTION || defined CONFIG_CMA
 
 /*
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 34a4c12d2675..facc25ee6e2d 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1747,16 +1747,38 @@ void __init page_alloc_init_late(void)
 }
 
 #ifdef CONFIG_CMA
+static void __init adjust_present_page_count(struct page *page, long count)
+{
+        struct zone *zone = page_zone(page);
+
+        /* We don't need to hold a lock since it is boot-up process */
+        zone->present_pages += count;
+}
+
 /* Free whole pageblock and set its migration type to MIGRATE_CMA. */
 void __init init_cma_reserved_pageblock(struct page *page)
 {
         unsigned i = pageblock_nr_pages;
+        unsigned long pfn = page_to_pfn(page);
         struct page *p = page;
+        int nid = page_to_nid(page);
+
+        /*
+         * ZONE_MOVABLE will steal present pages from other zones by
+         * changing page links so page_zone() is changed. Before that,
+         * we need to adjust previous zone's page count first.
+         */
+        adjust_present_page_count(page, -pageblock_nr_pages);
 
         do {
                 __ClearPageReserved(p);
                 set_page_count(p, 0);
-        } while (++p, --i);
+
+                /* Steal pages from other zones */
+                set_page_links(p, ZONE_MOVABLE, nid, pfn);
+        } while (++p, ++pfn, --i);
+
+        adjust_present_page_count(page, pageblock_nr_pages);
 
         set_pageblock_migratetype(page, MIGRATE_CMA);
 
@@ -6208,6 +6230,7 @@ static void __paginginit free_area_init_core(struct pglist_data *pgdat)
 {
         enum zone_type j;
         int nid = pgdat->node_id;
+        unsigned long node_end_pfn = 0;
 
         pgdat_resize_init(pgdat);
 #ifdef CONFIG_NUMA_BALANCING
@@ -6235,9 +6258,13 @@ static void __paginginit free_area_init_core(struct pglist_data *pgdat)
                 struct zone *zone = pgdat->node_zones + j;
                 unsigned long size, realsize, freesize, memmap_pages;
                 unsigned long zone_start_pfn = zone->zone_start_pfn;
+                unsigned long movable_size = 0;
 
                 size = zone->spanned_pages;
                 realsize = freesize = zone->present_pages;
+                if (zone_end_pfn(zone) > node_end_pfn)
+                        node_end_pfn = zone_end_pfn(zone);
+
 
                 /*
                  * Adjust freesize so that it accounts for how much memory
@@ -6286,12 +6313,30 @@ static void __paginginit free_area_init_core(struct pglist_data *pgdat)
                 zone_seqlock_init(zone);
                 zone_pcp_init(zone);
 
-                if (!size)
+                /*
+                 * The size of the CMA area is unknown now so we need to
+                 * prepare the memory for the usemap at maximum.
+                 */
+                if (IS_ENABLED(CONFIG_CMA) && j == ZONE_MOVABLE &&
+                        pgdat->node_spanned_pages) {
+                        movable_size = node_end_pfn - pgdat->node_start_pfn;
+                }
+
+                if (!size && !movable_size)
                         continue;
 
                 set_pageblock_order();
-                setup_usemap(pgdat, zone, zone_start_pfn, size);
-                init_currently_empty_zone(zone, zone_start_pfn, size);
+                if (movable_size) {
+                        zone->zone_start_pfn = pgdat->node_start_pfn;
+                        zone->spanned_pages = movable_size;
+                        setup_usemap(pgdat, zone,
+                                pgdat->node_start_pfn, movable_size);
+                        init_currently_empty_zone(zone,
+                                pgdat->node_start_pfn, movable_size);
+                } else {
+                        setup_usemap(pgdat, zone, zone_start_pfn, size);
+                        init_currently_empty_zone(zone, zone_start_pfn, size);
+                }
                 memmap_init(size, nid, j, zone_start_pfn);
         }
 }
@@ -7932,7 +7977,7 @@ void free_contig_range(unsigned long pfn, unsigned nr_pages)
 }
 #endif
 
-#ifdef CONFIG_MEMORY_HOTPLUG
+#if defined CONFIG_MEMORY_HOTPLUG || defined CONFIG_CMA
 /*
  * The zone indicated has a new number of managed_pages; batch sizes and percpu
  * page high values need to be recalulated.