Diffstat (limited to 'Documentation/vm/transhuge.txt')
 Documentation/vm/transhuge.txt | 81 +-------------------------------------
 1 file changed, 2 insertions(+), 79 deletions(-)
diff --git a/Documentation/vm/transhuge.txt b/Documentation/vm/transhuge.txt
index 8785fb87d9c..29bdf62aac0 100644
--- a/Documentation/vm/transhuge.txt
+++ b/Documentation/vm/transhuge.txt
@@ -116,13 +116,6 @@ echo always >/sys/kernel/mm/transparent_hugepage/defrag
 echo madvise >/sys/kernel/mm/transparent_hugepage/defrag
 echo never >/sys/kernel/mm/transparent_hugepage/defrag
 
-By default kernel tries to use huge zero page on read page fault.
-It's possible to disable huge zero page by writing 0 or enable it
-back by writing 1:
-
-echo 0 >/sys/kernel/mm/transparent_hugepage/khugepaged/use_zero_page
-echo 1 >/sys/kernel/mm/transparent_hugepage/khugepaged/use_zero_page
-
 khugepaged will be automatically started when
 transparent_hugepage/enabled is set to "always" or "madvise, and it'll
 be automatically shutdown if it's set to "never".
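
For reference, a minimal user-space sketch of the toggle described in the
paragraph this hunk removes. It assumes the sysfs path exactly as the removed
text spells it, and writing the knob needs root; treat it as an illustration,
not a supported interface:

/*
 * Illustration only: flip the huge zero page knob, using the sysfs
 * path exactly as the removed paragraph spells it. Needs root.
 */
#include <stdio.h>
#include <string.h>

int main(int argc, char **argv)
{
	const char *path =
		"/sys/kernel/mm/transparent_hugepage/khugepaged/use_zero_page";
	const char *val = (argc > 1 && !strcmp(argv[1], "0")) ? "0" : "1";
	FILE *f = fopen(path, "w");

	if (!f) {
		perror(path);
		return 1;
	}
	fprintf(f, "%s\n", val);
	return fclose(f) ? 1 : 0;
}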
@@ -173,76 +166,6 @@ behavior. So to make them effective you need to restart any
 application that could have been using hugepages. This also applies to
 the regions registered in khugepaged.
 
-== Monitoring usage ==
-
-The number of transparent huge pages currently used by the system is
-available by reading the AnonHugePages field in /proc/meminfo. To
-identify what applications are using transparent huge pages, it is
-necessary to read /proc/PID/smaps and count the AnonHugePages fields
-for each mapping. Note that reading the smaps file is expensive and
-reading it frequently will incur overhead.
-
-There are a number of counters in /proc/vmstat that may be used to
-monitor how successfully the system is providing huge pages for use.
-
-thp_fault_alloc is incremented every time a huge page is successfully
-	allocated to handle a page fault. This applies to both the
-	first time a page is faulted and for COW faults.
-
-thp_collapse_alloc is incremented by khugepaged when it has found
-	a range of pages to collapse into one huge page and has
-	successfully allocated a new huge page to store the data.
-
-thp_fault_fallback is incremented if a page fault fails to allocate
-	a huge page and instead falls back to using small pages.
-
-thp_collapse_alloc_failed is incremented if khugepaged found a range
-	of pages that should be collapsed into one huge page but failed
-	the allocation.
-
-thp_split is incremented every time a huge page is split into base
-	pages. This can happen for a variety of reasons but a common
-	reason is that a huge page is old and is being reclaimed.
-
-thp_zero_page_alloc is incremented every time a huge zero page is
-	successfully allocated. It includes allocations which where
-	dropped due race with other allocation. Note, it doesn't count
-	every map of the huge zero page, only its allocation.
-
-thp_zero_page_alloc_failed is incremented if kernel fails to allocate
-	huge zero page and falls back to using small pages.
-
-As the system ages, allocating huge pages may be expensive as the
-system uses memory compaction to copy data around memory to free a
-huge page for use. There are some counters in /proc/vmstat to help
-monitor this overhead.
-
-compact_stall is incremented every time a process stalls to run
-	memory compaction so that a huge page is free for use.
-
-compact_success is incremented if the system compacted memory and
-	freed a huge page for use.
-
-compact_fail is incremented if the system tries to compact memory
-	but failed.
-
-compact_pages_moved is incremented each time a page is moved. If
-	this value is increasing rapidly, it implies that the system
-	is copying a lot of data to satisfy the huge page allocation.
-	It is possible that the cost of copying exceeds any savings
-	from reduced TLB misses.
-
-compact_pagemigrate_failed is incremented when the underlying mechanism
-	for moving a page failed.
-
-compact_blocks_moved is incremented each time memory compaction examines
-	a huge page aligned range of pages.
-
-It is possible to establish how long the stalls were using the function
-tracer to record how long was spent in __alloc_pages_nodemask and
-using the mm_page_alloc tracepoint to identify which allocations were
-for huge pages.
-
 == get_user_pages and follow_page ==
 
 get_user_pages and follow_page if run on a hugepage, will return the
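
The monitoring section this hunk removes describes reading AnonHugePages from
/proc/meminfo and the thp_*/compact_* counters from /proc/vmstat. A minimal
sketch of that approach, assuming only the field names the removed text lists
(which counters actually exist varies by kernel version):

/*
 * Illustration only: print the THP monitoring fields the removed
 * section names. Counter availability depends on the kernel version.
 */
#include <stdio.h>
#include <string.h>

static void print_matching(const char *path, const char **prefixes, int n)
{
	char line[256];
	FILE *f = fopen(path, "r");
	int i;

	if (!f) {
		perror(path);
		return;
	}
	while (fgets(line, sizeof(line), f))
		for (i = 0; i < n; i++)
			if (!strncmp(line, prefixes[i], strlen(prefixes[i])))
				fputs(line, stdout);
	fclose(f);
}

int main(void)
{
	const char *meminfo[] = { "AnonHugePages" };
	const char *vmstat[]  = { "thp_", "compact_" };

	print_matching("/proc/meminfo", meminfo, 1);
	print_matching("/proc/vmstat", vmstat, 2);
	return 0;
}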
@@ -291,7 +214,7 @@ unaffected. libhugetlbfs will also work fine as usual.
 == Graceful fallback ==
 
 Code walking pagetables but unware about huge pmds can simply call
-split_huge_page_pmd(vma, addr, pmd) where the pmd is the one returned by
+split_huge_page_pmd(mm, pmd) where the pmd is the one returned by
 pmd_offset. It's trivial to make the code transparent hugepage aware
 by just grepping for "pmd_offset" and adding split_huge_page_pmd where
 missing after pmd_offset returns the pmd. Thanks to the graceful
@@ -314,7 +237,7 @@ diff --git a/mm/mremap.c b/mm/mremap.c
  		return NULL;
  
  	pmd = pmd_offset(pud, addr);
-+	split_huge_page_pmd(vma, addr, pmd);
++	split_huge_page_pmd(mm, pmd);
  	if (pmd_none_or_clear_bad(pmd))
  		return NULL;
  
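
For context, the walk pattern the quoted mremap.c hunk patches can be sketched
as a whole. This is an illustration only, not a standalone compilable unit: it
assumes kernel-internal headers, the 3.x-era page table layout, and the
two-argument split_huge_page_pmd(mm, pmd) form this diff documents;
walk_to_pte is a hypothetical name:

/*
 * Sketch only (kernel-internal, not standalone): the full walk pattern
 * the "Graceful fallback" text describes, with split_huge_page_pmd
 * called right after pmd_offset so the pte-level code below never
 * encounters a huge pmd.
 */
static pte_t *walk_to_pte(struct mm_struct *mm, unsigned long addr)
{
	pgd_t *pgd;
	pud_t *pud;
	pmd_t *pmd;

	pgd = pgd_offset(mm, addr);
	if (pgd_none_or_clear_bad(pgd))
		return NULL;

	pud = pud_offset(pgd, addr);
	if (pud_none_or_clear_bad(pud))
		return NULL;

	pmd = pmd_offset(pud, addr);
	/* Split any transparent huge pmd back to regular pte mappings. */
	split_huge_page_pmd(mm, pmd);
	if (pmd_none_or_clear_bad(pmd))
		return NULL;

	return pte_offset_map(pmd, addr);
}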