author:    Jiang Liu <liuj97@gmail.com>  2012-12-12 16:52:12 -0500
committer: Linus Torvalds <torvalds@linux-foundation.org>  2012-12-12 20:38:34 -0500
commit:    9feedc9d831e18ae6d0d15aa562e5e46ba53647b
tree:      cb26ff54b0f02c4905772288b27f99b8b384ad6d /include
parent:    c2d23f919bafcbc2259f5257d9a7d729802f0e3a
mm: introduce new field "managed_pages" to struct zone
Currently a zone's present_pages is calculated as below, which is
inaccurate and may cause trouble for memory hotplug:

	spanned_pages - absent_pages - memmap_pages - dma_reserve
While fixing bugs caused by the inaccurate zone->present_pages, we
found that zone->present_pages has been abused. The field may have
different meanings in different contexts:
1) pages existing in a zone.
2) pages managed by the buddy system.
For more discussions about the issue, please refer to:
http://lkml.org/lkml/2012/11/5/866
https://patchwork.kernel.org/patch/1346751/
This patchset introduces a new field named "managed_pages" to struct
zone, which counts "pages managed by the buddy system", and reverts
zone->present_pages to counting "physical pages existing in a zone",
which also keeps it consistent with pgdat->node_present_pages.
We will set an initial value for zone->managed_pages in function
free_area_init_core() and will adjust it later if the initial value is
inaccurate.
For DMA/normal zones, the initial value is set to:
(spanned_pages - absent_pages - memmap_pages - dma_reserve)
Later, zone->managed_pages will be adjusted to the accurate value when
the bootmem allocator frees all free pages to the buddy system in
free_all_bootmem_node() and free_all_bootmem().
The bootmem allocator doesn't touch highmem pages, so highmem zones'
managed_pages is set to the accurate value "spanned_pages - absent_pages"
in function free_area_init_core() and won't be updated anymore.
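As a rough sketch of the initialization described above (not the
patch's verbatim code; the local variable "freesize" and the use of
is_highmem_idx() are assumptions about the surrounding
free_area_init_core() code):

	/*
	 * Sketch: seeding zone->managed_pages in free_area_init_core().
	 * "freesize" is assumed to already hold
	 *   spanned_pages - absent_pages - memmap_pages - dma_reserve
	 * for this zone, per the formula in the changelog above.
	 */
	if (is_highmem_idx(j)) {
		/* bootmem never touches highmem, so this value is final */
		zone->managed_pages = zone->present_pages;
	} else {
		/*
		 * Refined later, when free_all_bootmem_node() or
		 * free_all_bootmem() releases boot memory to the buddy
		 * system.
		 */
		zone->managed_pages = freesize;
	}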
This patch also adds a new field "managed_pages" to /proc/zoneinfo
and sysrq showmem.
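For illustration, a zone's /proc/zoneinfo entry now carries a
"managed" line next to "spanned" and "present" (the numbers below are
invented):

	Node 0, zone   Normal
	  ...
	        spanned  262144
	        present  259072
	        managed  253952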
[akpm@linux-foundation.org: small comment tweaks]
Signed-off-by: Jiang Liu <jiang.liu@huawei.com>
Cc: Wen Congyang <wency@cn.fujitsu.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Maciej Rutecki <maciej.rutecki@gmail.com>
Tested-by: Chris Clayton <chris2553@googlemail.com>
Cc: "Rafael J . Wysocki" <rjw@sisk.pl>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Minchan Kim <minchan@kernel.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Jianguo Wu <wujianguo@huawei.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Diffstat (limited to 'include')
-rw-r--r--  include/linux/mmzone.h | 41 ++++++++++++++++++++++++++++++++++-------
1 file changed, 34 insertions(+), 7 deletions(-)
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 0c0b1d608a69..cd55dad56aac 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -460,17 +460,44 @@ struct zone {
 	unsigned long		zone_start_pfn;
 
 	/*
-	 * zone_start_pfn, spanned_pages and present_pages are all
-	 * protected by span_seqlock.  It is a seqlock because it has
-	 * to be read outside of zone->lock, and it is done in the main
-	 * allocator path.  But, it is written quite infrequently.
+	 * spanned_pages is the total pages spanned by the zone, including
+	 * holes, which is calculated as:
+	 * 	spanned_pages = zone_end_pfn - zone_start_pfn;
 	 *
-	 * The lock is declared along with zone->lock because it is
+	 * present_pages is physical pages existing within the zone, which
+	 * is calculated as:
+	 *	present_pages = spanned_pages - absent_pages(pags in holes);
+	 *
+	 * managed_pages is present pages managed by the buddy system, which
+	 * is calculated as (reserved_pages includes pages allocated by the
+	 * bootmem allocator):
+	 *	managed_pages = present_pages - reserved_pages;
+	 *
+	 * So present_pages may be used by memory hotplug or memory power
+	 * management logic to figure out unmanaged pages by checking
+	 * (present_pages - managed_pages). And managed_pages should be used
+	 * by page allocator and vm scanner to calculate all kinds of watermarks
+	 * and thresholds.
+	 *
+	 * Locking rules:
+	 *
+	 * zone_start_pfn and spanned_pages are protected by span_seqlock.
+	 * It is a seqlock because it has to be read outside of zone->lock,
+	 * and it is done in the main allocator path.  But, it is written
+	 * quite infrequently.
+	 *
+	 * The span_seq lock is declared along with zone->lock because it is
 	 * frequently read in proximity to zone->lock.  It's good to
 	 * give them a chance of being in the same cacheline.
+	 *
+	 * Write access to present_pages and managed_pages at runtime should
+	 * be protected by lock_memory_hotplug()/unlock_memory_hotplug().
+	 * Any reader who can't tolerant drift of present_pages and
+	 * managed_pages should hold memory hotplug lock to get a stable value.
 	 */
-	unsigned long		spanned_pages;	/* total size, including holes */
-	unsigned long		present_pages;	/* amount of memory (excluding holes) */
+	unsigned long		spanned_pages;
+	unsigned long		present_pages;
+	unsigned long		managed_pages;
 
 	/*
 	 * rarely used fields:
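As a usage note, the split lets callers derive the unmanaged page
count exactly as the new comment suggests. A minimal hypothetical
helper (not part of this patch):

	/*
	 * Hypothetical helper illustrating (present_pages - managed_pages):
	 * pages that exist in the zone but are not owned by the buddy
	 * system, e.g. pages handed out by the bootmem allocator.
	 * Per the comment above, readers that cannot tolerate drift
	 * should take lock_memory_hotplug() around the read.
	 */
	static unsigned long zone_unmanaged_pages(struct zone *zone)
	{
		return zone->present_pages - zone->managed_pages;
	}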