Diffstat (limited to 'Documentation/vm/transhuge.txt')
-rw-r--r--  Documentation/vm/transhuge.txt  | 81
1 file changed, 2 insertions(+), 79 deletions(-)
diff --git a/Documentation/vm/transhuge.txt b/Documentation/vm/transhuge.txt
index 8785fb87d9c..29bdf62aac0 100644
--- a/Documentation/vm/transhuge.txt
+++ b/Documentation/vm/transhuge.txt
@@ -116,13 +116,6 @@ echo always >/sys/kernel/mm/transparent_hugepage/defrag
 echo madvise >/sys/kernel/mm/transparent_hugepage/defrag
 echo never >/sys/kernel/mm/transparent_hugepage/defrag
 
-By default kernel tries to use huge zero page on read page fault.
-It's possible to disable huge zero page by writing 0 or enable it
-back by writing 1:
-
-echo 0 >/sys/kernel/mm/transparent_hugepage/khugepaged/use_zero_page
-echo 1 >/sys/kernel/mm/transparent_hugepage/khugepaged/use_zero_page
-
 khugepaged will be automatically started when
 transparent_hugepage/enabled is set to "always" or "madvise, and it'll
 be automatically shutdown if it's set to "never".
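
The knob removed above can be driven from a program as well as from the
shell. A minimal userspace sketch in C, assuming the sysfs path exactly
as quoted in the removed paragraph (on later kernels the file sits
directly under transparent_hugepage/ rather than khugepaged/):

    /* Minimal sketch: toggle the huge zero page knob described in the
     * removed text. The path is the one quoted above; treat it as an
     * assumption, since the knob's location varies across kernels. */
    #include <stdio.h>

    int main(void)
    {
        const char *knob =
            "/sys/kernel/mm/transparent_hugepage/khugepaged/use_zero_page";
        FILE *f = fopen(knob, "w");

        if (!f) {
            perror("fopen");
            return 1;
        }
        fputs("0", f);  /* "0" disables the huge zero page, "1" enables it */
        fclose(f);
        return 0;
    }
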
@@ -173,76 +166,6 @@ behavior. So to make them effective you need to restart any
 application that could have been using hugepages. This also applies to
 the regions registered in khugepaged.
 
-== Monitoring usage ==
-
-The number of transparent huge pages currently used by the system is
-available by reading the AnonHugePages field in /proc/meminfo. To
-identify what applications are using transparent huge pages, it is
-necessary to read /proc/PID/smaps and count the AnonHugePages fields
-for each mapping. Note that reading the smaps file is expensive and
-reading it frequently will incur overhead.
-
-There are a number of counters in /proc/vmstat that may be used to
-monitor how successfully the system is providing huge pages for use.
-
-thp_fault_alloc is incremented every time a huge page is successfully
-	allocated to handle a page fault. This applies to both the
-	first time a page is faulted and for COW faults.
-
-thp_collapse_alloc is incremented by khugepaged when it has found
-	a range of pages to collapse into one huge page and has
-	successfully allocated a new huge page to store the data.
-
-thp_fault_fallback is incremented if a page fault fails to allocate
-	a huge page and instead falls back to using small pages.
-
-thp_collapse_alloc_failed is incremented if khugepaged found a range
-	of pages that should be collapsed into one huge page but failed
-	the allocation.
-
-thp_split is incremented every time a huge page is split into base
-	pages. This can happen for a variety of reasons but a common
-	reason is that a huge page is old and is being reclaimed.
-
-thp_zero_page_alloc is incremented every time a huge zero page is
-	successfully allocated. It includes allocations which where
-	dropped due race with other allocation. Note, it doesn't count
-	every map of the huge zero page, only its allocation.
-
-thp_zero_page_alloc_failed is incremented if kernel fails to allocate
-	huge zero page and falls back to using small pages.
-
-As the system ages, allocating huge pages may be expensive as the
-system uses memory compaction to copy data around memory to free a
-huge page for use. There are some counters in /proc/vmstat to help
-monitor this overhead.
-
-compact_stall is incremented every time a process stalls to run
-	memory compaction so that a huge page is free for use.
-
-compact_success is incremented if the system compacted memory and
-	freed a huge page for use.
-
-compact_fail is incremented if the system tries to compact memory
-	but failed.
-
-compact_pages_moved is incremented each time a page is moved. If
-	this value is increasing rapidly, it implies that the system
-	is copying a lot of data to satisfy the huge page allocation.
-	It is possible that the cost of copying exceeds any savings
-	from reduced TLB misses.
-
-compact_pagemigrate_failed is incremented when the underlying mechanism
-	for moving a page failed.
-
-compact_blocks_moved is incremented each time memory compaction examines
-	a huge page aligned range of pages.
-
-It is possible to establish how long the stalls were using the function
-tracer to record how long was spent in __alloc_pages_nodemask and
-using the mm_page_alloc tracepoint to identify which allocations were
-for huge pages.
-
 == get_user_pages and follow_page ==
 
 get_user_pages and follow_page if run on a hugepage, will return the
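
The monitoring section removed above boils down to scanning two
plain-text /proc files. A minimal reader sketch, assuming the field
names exactly as listed in the removed text (they can differ across
kernel versions):

    /* Report AnonHugePages from /proc/meminfo plus the thp_* and
     * compact_* counters from /proc/vmstat, as described in the
     * section this patch removes. */
    #include <stdio.h>
    #include <string.h>

    static void grep_file(const char *path, const char *prefix)
    {
        char line[256];
        FILE *f = fopen(path, "r");

        if (!f) {
            perror(path);
            return;
        }
        while (fgets(line, sizeof(line), f))
            if (!strncmp(line, prefix, strlen(prefix)))
                fputs(line, stdout);
        fclose(f);
    }

    int main(void)
    {
        grep_file("/proc/meminfo", "AnonHugePages"); /* system-wide THP usage */
        grep_file("/proc/vmstat", "thp_");           /* fault/collapse counters */
        grep_file("/proc/vmstat", "compact_");       /* compaction overhead */
        return 0;
    }
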
@@ -291,7 +214,7 @@ unaffected. libhugetlbfs will also work fine as usual.
 == Graceful fallback ==
 
 Code walking pagetables but unware about huge pmds can simply call
-split_huge_page_pmd(vma, addr, pmd) where the pmd is the one returned by
+split_huge_page_pmd(mm, pmd) where the pmd is the one returned by
 pmd_offset. It's trivial to make the code transparent hugepage aware
 by just grepping for "pmd_offset" and adding split_huge_page_pmd where
 missing after pmd_offset returns the pmd. Thanks to the graceful
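
To make the "grep for pmd_offset" advice concrete, here is a schematic
in-kernel walk using the (mm, pmd) signature this hunk documents. The
surrounding helpers are the standard pagetable accessors of kernels of
this era; locking and error handling are elided, so treat it as a
sketch rather than a drop-in function:

    pgd_t *pgd = pgd_offset(mm, addr);
    if (pgd_present(*pgd)) {
        pud_t *pud = pud_offset(pgd, addr);
        if (pud_present(*pud)) {
            pmd_t *pmd = pmd_offset(pud, addr);
            /* Split any huge pmd so the pte walk below stays valid. */
            split_huge_page_pmd(mm, pmd);
            if (!pmd_none_or_clear_bad(pmd)) {
                pte_t *pte = pte_offset_map(pmd, addr);
                /* ... operate on the pte ... */
                pte_unmap(pte);
            }
        }
    }
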
@@ -314,7 +237,7 @@ diff --git a/mm/mremap.c b/mm/mremap.c
 		return NULL;
 
 	pmd = pmd_offset(pud, addr);
-+	split_huge_page_pmd(vma, addr, pmd);
++	split_huge_page_pmd(mm, pmd);
 	if (pmd_none_or_clear_bad(pmd))
 		return NULL;
 
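
For context, a helper of the shape the example hunk patches reads
roughly as follows once the quoted line is in place. The function name
and surrounding lines are reconstructed for illustration, following the
get_old_pmd() pattern in mm/mremap.c; only the split_huge_page_pmd()
call comes from the patch itself:

    static pmd_t *get_old_pmd(struct mm_struct *mm, unsigned long addr)
    {
        pgd_t *pgd;
        pud_t *pud;
        pmd_t *pmd;

        pgd = pgd_offset(mm, addr);
        if (pgd_none_or_clear_bad(pgd))
            return NULL;

        pud = pud_offset(pgd, addr);
        if (pud_none_or_clear_bad(pud))
            return NULL;

        pmd = pmd_offset(pud, addr);
        split_huge_page_pmd(mm, pmd);  /* split a huge pmd back to ptes */
        if (pmd_none_or_clear_bad(pmd))
            return NULL;

        return pmd;
    }
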