Diffstat (limited to 'Documentation/vm/unevictable-lru.txt')
-rw-r--r--  Documentation/vm/unevictable-lru.txt | 63
1 file changed, 18 insertions(+), 45 deletions(-)
diff --git a/Documentation/vm/unevictable-lru.txt b/Documentation/vm/unevictable-lru.txt
index 125eed560e5a..0706a7282a8c 100644
--- a/Documentation/vm/unevictable-lru.txt
+++ b/Documentation/vm/unevictable-lru.txt
@@ -137,13 +137,6 @@ shrink_page_list() where they will be detected when vmscan walks the reverse
 map in try_to_unmap(). If try_to_unmap() returns SWAP_MLOCK, shrink_page_list()
 will cull the page at that point.
 
-Note that for anonymous pages, shrink_page_list() attempts to add the page to
-the swap cache before it tries to unmap the page. To avoid this unnecessary
-consumption of swap space, shrink_page_list() calls try_to_munlock() to check
-whether any VM_LOCKED vmas map the page without attempting to unmap the page.
-If try_to_munlock() returns SWAP_MLOCK, shrink_page_list() will cull the page
-without consuming swap space. try_to_munlock() will be described below.
-
 To "cull" an unevictable page, vmscan simply puts the page back on the lru
 list using putback_lru_page()--the inverse operation to isolate_lru_page()--
 after dropping the page lock. Because the condition which makes the page
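
For orientation, the cull path described above boils down to roughly the
following control flow in shrink_page_list()'s per-page loop. This is a
simplified sketch built from the names the documentation itself uses
(try_to_unmap(), putback_lru_page()), not a verbatim excerpt of mm/vmscan.c:

    /* Sketch only: a page that try_to_unmap() reports as mapped by a
     * VM_LOCKED vma is "culled" by putting it back on the LRU, where
     * putback_lru_page() will divert it to the unevictable list. */
    ret = try_to_unmap(page, 0);
    if (ret == SWAP_MLOCK) {
            unlock_page(page);          /* drop the page lock first */
            putback_lru_page(page);     /* inverse of isolate_lru_page() */
            continue;                   /* done with this page */
    }
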
@@ -190,8 +183,8 @@ several places:
 in the VM_LOCKED flag being set for the vma.
 3) in the fault path, if mlocked pages are "culled" in the fault path,
 and when a VM_LOCKED stack segment is expanded.
-4) as mentioned above, in vmscan:shrink_page_list() with attempting to
-reclaim a page in a VM_LOCKED vma--via try_to_unmap() or try_to_munlock().
+4) as mentioned above, in vmscan:shrink_page_list() when attempting to
+reclaim a page in a VM_LOCKED vma via try_to_unmap().
 
 Mlocked pages become unlocked and rescued from the unevictable list when:
 
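
The mlock()/munlock() side of the paths listed above is easy to observe from
user space: locking a range raises the task's VmLck accounting, and munlock()
(one of the rescue paths the text goes on to list) drops it again. A small,
self-contained user-space illustration (it may require RLIMIT_MEMLOCK to be
raised for the 1 MB lock):

    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>

    static void show_vmlck(const char *when)
    {
            char line[128];
            FILE *f = fopen("/proc/self/status", "r");

            while (f && fgets(line, sizeof(line), f))
                    if (!strncmp(line, "VmLck:", 6))
                            printf("%s: %s", when, line);
            if (f)
                    fclose(f);
    }

    int main(void)
    {
            size_t len = 1 << 20;
            char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                           MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

            if (p == MAP_FAILED) {
                    perror("mmap");
                    return 1;
            }
            if (mlock(p, len)) {            /* faults in and mlocks the pages */
                    perror("mlock");
                    return 1;
            }
            show_vmlck("after mlock");
            munlock(p, len);                /* pages are munlocked again */
            show_vmlck("after munlock");
            munmap(p, len);
            return 0;
    }
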
@@ -260,9 +253,9 @@ mlock_fixup() filters several classes of "special" vmas:
 
 2) vmas mapping hugetlbfs page are already effectively pinned into memory.
 We don't need nor want to mlock() these pages. However, to preserve the
-prior behavior of mlock()--before the unevictable/mlock changes--mlock_fixup()
-will call make_pages_present() in the hugetlbfs vma range to allocate the
-huge pages and populate the ptes.
+prior behavior of mlock()--before the unevictable/mlock changes--
+mlock_fixup() will call make_pages_present() in the hugetlbfs vma range
+to allocate the huge pages and populate the ptes.
 
 3) vmas with VM_DONTEXPAND|VM_RESERVED are generally user space mappings of
 kernel pages, such as the vdso page, relay channel pages, etc. These pages
@@ -322,7 +315,7 @@ __mlock_vma_pages_range()--the same function used to mlock a vma range--
 passing a flag to indicate that munlock() is being performed.
 
 Because the vma access protections could have been changed to PROT_NONE after
-faulting in and mlocking some pages, get_user_pages() was unreliable for visiting
+faulting in and mlocking pages, get_user_pages() was unreliable for visiting
 these pages for munlocking. Because we don't want to leave pages mlocked(),
 get_user_pages() was enhanced to accept a flag to ignore the permissions when
 fetching the pages--all of which should be resident as a result of previous
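
The PROT_NONE case above is straightforward to reproduce from user space,
which is why the munlock path cannot rely on normal permission checks when
revisiting the pages: after the sequence below the pages are resident and
mlocked, yet no longer accessible through the vma. A minimal sketch of that
sequence:

    #include <sys/mman.h>

    int main(void)
    {
            size_t len = 64 * 1024;
            char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                           MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

            if (p == MAP_FAILED || mlock(p, len))
                    return 1;               /* fault in + mlock the pages */
            mprotect(p, len, PROT_NONE);    /* vma becomes inaccessible... */
            munlock(p, len);                /* ...yet munlock() must still
                                             * visit and unlock every page */
            munmap(p, len);
            return 0;
    }
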
@@ -416,8 +409,8 @@ Mlocked Pages: munmap()/exit()/exec() System Call Handling
 When unmapping an mlocked region of memory, whether by an explicit call to
 munmap() or via an internal unmap from exit() or exec() processing, we must
 munlock the pages if we're removing the last VM_LOCKED vma that maps the pages.
-Before the unevictable/mlock changes, mlocking did not mark the pages in any way,
-so unmapping them required no processing.
+Before the unevictable/mlock changes, mlocking did not mark the pages in any
+way, so unmapping them required no processing.
 
 To munlock a range of memory under the unevictable/mlock infrastructure, the
 munmap() hander and task address space tear down function call
@@ -517,12 +510,10 @@ couldn't be mlocked.
 Mlocked pages: try_to_munlock() Reverse Map Scan
 
 TODO/FIXME: a better name might be page_mlocked()--analogous to the
-page_referenced() reverse map walker--especially if we continue to call this
-from shrink_page_list(). See related TODO/FIXME below.
+page_referenced() reverse map walker.
 
-When munlock_vma_page()--see "Mlocked Pages: munlock()/munlockall() System
-Call Handling" above--tries to munlock a page, or when shrink_page_list()
-encounters an anonymous page that is not yet in the swap cache, they need to
+When munlock_vma_page()--see "Mlocked Pages: munlock()/munlockall()
+System Call Handling" above--tries to munlock a page, it needs to
 determine whether or not the page is mapped by any VM_LOCKED vma, without
 actually attempting to unmap all ptes from the page. For this purpose, the
 unevictable/mlock infrastructure introduced a variant of try_to_unmap() called
@@ -535,10 +526,7 @@ for VM_LOCKED vmas. When such a vma is found for anonymous pages and file
 pages mapped in linear VMAs, as in the try_to_unmap() case, the functions
 attempt to acquire the associated mmap semphore, mlock the page via
 mlock_vma_page() and return SWAP_MLOCK. This effectively undoes the
-pre-clearing of the page's PG_mlocked done by munlock_vma_page() and informs
-shrink_page_list() that the anonymous page should be culled rather than added
-to the swap cache in preparation for a try_to_unmap() that will almost
-certainly fail.
+pre-clearing of the page's PG_mlocked done by munlock_vma_page.
 
 If try_to_unmap() is unable to acquire a VM_LOCKED vma's associated mmap
 semaphore, it will return SWAP_AGAIN. This will allow shrink_page_list()
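
Taken together, the behaviour described in the two paragraphs above amounts to
roughly the following per-vma check during the reverse map walk. This is a
simplified sketch using only the names the text mentions (VM_LOCKED, the mmap
semaphore, mlock_vma_page()), not verbatim mm/rmap.c:

    if (vma->vm_flags & VM_LOCKED) {
            /* Try to mlock the page on behalf of this vma. */
            if (!down_read_trylock(&vma->vm_mm->mmap_sem))
                    return SWAP_AGAIN;      /* caller can retry or cull later */
            mlock_vma_page(page);           /* re-sets PG_mlocked */
            up_read(&vma->vm_mm->mmap_sem);
            return SWAP_MLOCK;              /* page stays unevictable */
    }
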
@@ -557,10 +545,7 @@ However, the scan can terminate when it encounters a VM_LOCKED vma and can
 successfully acquire the vma's mmap semphore for read and mlock the page.
 Although try_to_munlock() can be called many [very many!] times when
 munlock()ing a large region or tearing down a large address space that has been
-mlocked via mlockall(), overall this is a fairly rare event. In addition,
-although shrink_page_list() calls try_to_munlock() for every anonymous page that
-it handles that is not yet in the swap cache, on average anonymous pages will
-have very short reverse map lists.
+mlocked via mlockall(), overall this is a fairly rare event.
 
 Mlocked Page: Page Reclaim in shrink_*_list()
 
@@ -588,8 +573,8 @@ Some examples of these unevictable pages on the LRU lists are:
 munlock_vma_page() was forced to let the page back on to the normal
 LRU list for vmscan to handle.
 
-shrink_inactive_list() also culls any unevictable pages that it finds
-on the inactive lists, again diverting them to the appropriate zone's unevictable
+shrink_inactive_list() also culls any unevictable pages that it finds on
+the inactive lists, again diverting them to the appropriate zone's unevictable
 lru list. shrink_inactive_list() should only see SHM_LOCKed pages that became
 SHM_LOCKed after shrink_active_list() had moved them to the inactive list, or
 pages mapped into VM_LOCKED vmas that munlock_vma_page() couldn't isolate from
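
SHM_LOCKed pages, mentioned above, come from locking a SysV shared memory
segment rather than from mlock(). A small, self-contained user-space example
of how such pages are created and released (subject to RLIMIT_MEMLOCK or
CAP_IPC_LOCK):

    #include <stdio.h>
    #include <string.h>
    #include <sys/ipc.h>
    #include <sys/shm.h>

    int main(void)
    {
            size_t len = 64 * 1024;
            int id = shmget(IPC_PRIVATE, len, IPC_CREAT | 0600);
            char *p;

            if (id < 0) {
                    perror("shmget");
                    return 1;
            }
            p = shmat(id, NULL, 0);
            if (p == (void *)-1) {
                    perror("shmat");
                    return 1;
            }
            memset(p, 0, len);                  /* fault the pages in */
            if (shmctl(id, SHM_LOCK, NULL))     /* pages become SHM_LOCKed */
                    perror("shmctl(SHM_LOCK)");
            shmctl(id, SHM_UNLOCK, NULL);       /* ...and evictable again */
            shmdt(p);
            shmctl(id, IPC_RMID, NULL);         /* remove the segment */
            return 0;
    }
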
@@ -597,19 +582,7 @@ the lru to recheck via try_to_munlock(). shrink_inactive_list() won't notice
 the latter, but will pass on to shrink_page_list().
 
 shrink_page_list() again culls obviously unevictable pages that it could
-encounter for similar reason to shrink_inactive_list(). As already discussed,
-shrink_page_list() proactively looks for anonymous pages that should have
-PG_mlocked set but don't--these would not be detected by page_evictable()--to
-avoid adding them to the swap cache unnecessarily. File pages mapped into
+encounter for similar reason to shrink_inactive_list(). Pages mapped into
 VM_LOCKED vmas but without PG_mlocked set will make it all the way to
-try_to_unmap(). shrink_page_list() will divert them to the unevictable list when
-try_to_unmap() returns SWAP_MLOCK, as discussed above.
-
-TODO/FIXME: If we can enhance the swap cache to reliably remove entries
-with page_count(page) > 2, as long as all ptes are mapped to the page and
-not the swap entry, we can probably remove the call to try_to_munlock() in
-shrink_page_list() and just remove the page from the swap cache when
-try_to_unmap() returns SWAP_MLOCK. Currently, remove_exclusive_swap_page()
-doesn't seem to allow that.
-
-
+try_to_unmap(). shrink_page_list() will divert them to the unevictable list
+when try_to_unmap() returns SWAP_MLOCK, as discussed above.