Diffstat (limited to 'Documentation/vm')
-rw-r--r--  Documentation/vm/cleancache.txt       | 43
-rw-r--r--  Documentation/vm/page-types.c         |  2
-rw-r--r--  Documentation/vm/pagemap.txt          |  4
-rw-r--r--  Documentation/vm/unevictable-lru.txt  |  8
4 files changed, 32 insertions, 25 deletions
diff --git a/Documentation/vm/cleancache.txt b/Documentation/vm/cleancache.txt
index 36c367c73084..142fbb0f325a 100644
--- a/Documentation/vm/cleancache.txt
+++ b/Documentation/vm/cleancache.txt
@@ -46,10 +46,11 @@ a negative return value indicates failure. A "put_page" will copy a | |||
46 | the pool id, a file key, and a page index into the file. (The combination | 46 | the pool id, a file key, and a page index into the file. (The combination |
47 | of a pool id, a file key, and an index is sometimes called a "handle".) | 47 | of a pool id, a file key, and an index is sometimes called a "handle".) |
48 | A "get_page" will copy the page, if found, from cleancache into kernel memory. | 48 | A "get_page" will copy the page, if found, from cleancache into kernel memory. |
49 | A "flush_page" will ensure the page no longer is present in cleancache; | 49 | An "invalidate_page" will ensure the page no longer is present in cleancache; |
50 | a "flush_inode" will flush all pages associated with the specified file; | 50 | an "invalidate_inode" will invalidate all pages associated with the specified |
51 | and, when a filesystem is unmounted, a "flush_fs" will flush all pages in | 51 | file; and, when a filesystem is unmounted, an "invalidate_fs" will invalidate |
52 | all files specified by the given pool id and also surrender the pool id. | 52 | all pages in all files specified by the given pool id and also surrender |
53 | the pool id. | ||
53 | 54 | ||
54 | An "init_shared_fs", like init_fs, obtains a pool id but tells cleancache | 55 | An "init_shared_fs", like init_fs, obtains a pool id but tells cleancache |
55 | to treat the pool as shared using a 128-bit UUID as a key. On systems | 56 | to treat the pool as shared using a 128-bit UUID as a key. On systems |
@@ -62,12 +63,12 @@ of the kernel (e.g. by "tools" that control cleancache). Or a | |||
62 | cleancache implementation can simply disable shared_init by always | 63 | cleancache implementation can simply disable shared_init by always |
63 | returning a negative value. | 64 | returning a negative value. |
64 | 65 | ||
65 | If a get_page is successful on a non-shared pool, the page is flushed (thus | 66 | If a get_page is successful on a non-shared pool, the page is invalidated |
66 | making cleancache an "exclusive" cache). On a shared pool, the page | 67 | (thus making cleancache an "exclusive" cache). On a shared pool, the page |
67 | is NOT flushed on a successful get_page so that it remains accessible to | 68 | is NOT invalidated on a successful get_page so that it remains accessible to |
68 | other sharers. The kernel is responsible for ensuring coherency between | 69 | other sharers. The kernel is responsible for ensuring coherency between |
69 | cleancache (shared or not), the page cache, and the filesystem, using | 70 | cleancache (shared or not), the page cache, and the filesystem, using |
70 | cleancache flush operations as required. | 71 | cleancache invalidate operations as required. |
71 | 72 | ||
72 | Note that cleancache must enforce put-put-get coherency and get-get | 73 | Note that cleancache must enforce put-put-get coherency and get-get |
73 | coherency. For the former, if two puts are made to the same handle but | 74 | coherency. For the former, if two puts are made to the same handle but |
@@ -77,22 +78,22 @@ if a get for a given handle fails, subsequent gets for that handle will | |||
77 | never succeed unless preceded by a successful put with that handle. | 78 | never succeed unless preceded by a successful put with that handle. |
78 | 79 | ||
79 | Last, cleancache provides no SMP serialization guarantees; if two | 80 | Last, cleancache provides no SMP serialization guarantees; if two |
80 | different Linux threads are simultaneously putting and flushing a page | 81 | different Linux threads are simultaneously putting and invalidating a page |
81 | with the same handle, the results are indeterminate. Callers must | 82 | with the same handle, the results are indeterminate. Callers must |
82 | lock the page to ensure serial behavior. | 83 | lock the page to ensure serial behavior. |
83 | 84 | ||
84 | CLEANCACHE PERFORMANCE METRICS | 85 | CLEANCACHE PERFORMANCE METRICS |
85 | 86 | ||
86 | Cleancache monitoring is done by sysfs files in the | 87 | If properly configured, monitoring of cleancache is done via debugfs in |
87 | /sys/kernel/mm/cleancache directory. The effectiveness of cleancache | 88 | the /sys/kernel/debug/mm/cleancache directory. The effectiveness of cleancache |
88 | can be measured (across all filesystems) with: | 89 | can be measured (across all filesystems) with: |
89 | 90 | ||
90 | succ_gets - number of gets that were successful | 91 | succ_gets - number of gets that were successful |
91 | failed_gets - number of gets that failed | 92 | failed_gets - number of gets that failed |
92 | puts - number of puts attempted (all "succeed") | 93 | puts - number of puts attempted (all "succeed") |
93 | flushes - number of flushes attempted | 94 | invalidates - number of invalidates attempted |
94 | 95 | ||
95 | A backend implementatation may provide additional metrics. | 96 | A backend implementation may provide additional metrics. |
96 | 97 | ||
97 | FAQ | 98 | FAQ |
98 | 99 | ||
@@ -143,7 +144,7 @@ systems. | |||
143 | 144 | ||
144 | The core hooks for cleancache in VFS are in most cases a single line | 145 | The core hooks for cleancache in VFS are in most cases a single line |
145 | and the minimum set are placed precisely where needed to maintain | 146 | and the minimum set are placed precisely where needed to maintain |
146 | coherency (via cleancache_flush operations) between cleancache, | 147 | coherency (via cleancache_invalidate operations) between cleancache, |
147 | the page cache, and disk. All hooks compile into nothingness if | 148 | the page cache, and disk. All hooks compile into nothingness if |
148 | cleancache is config'ed off and turn into a function-pointer- | 149 | cleancache is config'ed off and turn into a function-pointer- |
149 | compare-to-NULL if config'ed on but no backend claims the ops | 150 | compare-to-NULL if config'ed on but no backend claims the ops |
@@ -184,15 +185,15 @@ or for real kernel-addressable RAM, it makes perfect sense for | |||
184 | transcendent memory. | 185 | transcendent memory. |
185 | 186 | ||
186 | 4) Why is non-shared cleancache "exclusive"? And where is the | 187 | 4) Why is non-shared cleancache "exclusive"? And where is the |
187 | page "flushed" after a "get"? (Minchan Kim) | 188 | page "invalidated" after a "get"? (Minchan Kim) |
188 | 189 | ||
189 | The main reason is to free up space in transcendent memory and | 190 | The main reason is to free up space in transcendent memory and |
190 | to avoid unnecessary cleancache_flush calls. If you want inclusive, | 191 | to avoid unnecessary cleancache_invalidate calls. If you want inclusive, |
191 | the page can be "put" immediately following the "get". If | 192 | the page can be "put" immediately following the "get". If |
192 | put-after-get for inclusive becomes common, the interface could | 193 | put-after-get for inclusive becomes common, the interface could |
193 | be easily extended to add a "get_no_flush" call. | 194 | be easily extended to add a "get_no_invalidate" call. |
194 | 195 | ||
195 | The flush is done by the cleancache backend implementation. | 196 | The invalidate is done by the cleancache backend implementation. |
196 | 197 | ||
197 | 5) What's the performance impact? | 198 | 5) What's the performance impact? |
198 | 199 | ||
@@ -222,7 +223,7 @@ Some points for a filesystem to consider: | |||
222 | as tmpfs should not enable cleancache) | 223 | as tmpfs should not enable cleancache) |
223 | - To ensure coherency/correctness, the FS must ensure that all | 224 | - To ensure coherency/correctness, the FS must ensure that all |
224 | file removal or truncation operations either go through VFS or | 225 | file removal or truncation operations either go through VFS or |
225 | add hooks to do the equivalent cleancache "flush" operations | 226 | add hooks to do the equivalent cleancache "invalidate" operations |
226 | - To ensure coherency/correctness, either inode numbers must | 227 | - To ensure coherency/correctness, either inode numbers must |
227 | be unique across the lifetime of the on-disk file OR the | 228 | be unique across the lifetime of the on-disk file OR the |
228 | FS must provide an "encode_fh" function. | 229 | FS must provide an "encode_fh" function. |
@@ -243,11 +244,11 @@ If cleancache would use the inode virtual address instead of | |||
243 | inode/filehandle, the pool id could be eliminated. But, this | 244 | inode/filehandle, the pool id could be eliminated. But, this |
244 | won't work because cleancache retains pagecache data pages | 245 | won't work because cleancache retains pagecache data pages |
245 | persistently even when the inode has been pruned from the | 246 | persistently even when the inode has been pruned from the |
246 | inode unused list, and only flushes the data page if the file | 247 | inode unused list, and only invalidates the data page if the file |
247 | gets removed/truncated. So if cleancache used the inode kva, | 248 | gets removed/truncated. So if cleancache used the inode kva, |
248 | there would be potential coherency issues if/when the inode | 249 | there would be potential coherency issues if/when the inode |
249 | kva is reused for a different file. Alternately, if cleancache | 250 | kva is reused for a different file. Alternately, if cleancache |
250 | flushed the pages when the inode kva was freed, much of the value | 251 | invalidated the pages when the inode kva was freed, much of the value |
251 | of cleancache would be lost because the cache of pages in cleanache | 252 | of cleancache would be lost because the cache of pages in cleanache |
252 | is potentially much larger than the kernel pagecache and is most | 253 | is potentially much larger than the kernel pagecache and is most |
253 | useful if the pages survive inode cache removal. | 254 | useful if the pages survive inode cache removal. |
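The put/get/invalidate semantics that the renamed text describes can be illustrated with a small userspace model. This is only a sketch under stated assumptions: it is not kernel code, every toy_* name is hypothetical, the fixed-size slot table stands in for a real backend, and the implicit invalidate performed by an exclusive get is (by assumption) not counted in the "invalidates" counter.

```c
/* Userspace toy model of cleancache semantics; all toy_* names are hypothetical. */
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TOY_PAGE_SIZE 16	/* tiny "page" to keep the example small */
#define TOY_SLOTS     8		/* capacity of the pretend backend */

struct toy_handle {		/* pool id + file key + page index */
	int pool_id;
	unsigned long file_key;
	unsigned long index;
};

struct toy_slot {
	bool used;
	struct toy_handle h;
	unsigned char data[TOY_PAGE_SIZE];
};

struct toy_cache {
	struct toy_slot slot[TOY_SLOTS];
	bool shared;		/* shared pools keep the page after a get */
	unsigned long succ_gets, failed_gets, puts, invalidates;
};

/* Return the slot currently holding handle h, or NULL if it is absent. */
struct toy_slot *toy_find(struct toy_cache *c, struct toy_handle h)
{
	for (size_t i = 0; i < TOY_SLOTS; i++) {
		struct toy_slot *s = &c->slot[i];

		if (s->used && s->h.pool_id == h.pool_id &&
		    s->h.file_key == h.file_key && s->h.index == h.index)
			return s;
	}
	return NULL;
}

/* Copy a page in. A second put to the same handle overwrites the first,
 * so a later get can never return stale data (put-put-get coherency). */
int toy_put(struct toy_cache *c, struct toy_handle h, const unsigned char *page)
{
	struct toy_slot *s = toy_find(c, h);

	c->puts++;
	for (size_t i = 0; !s && i < TOY_SLOTS; i++)
		if (!c->slot[i].used)
			s = &c->slot[i];
	if (!s)
		return -1;	/* backend full: negative value means failure */
	s->used = true;
	s->h = h;
	memcpy(s->data, page, TOY_PAGE_SIZE);
	return 0;
}

/* Copy a page out if present. On a non-shared pool the page is dropped
 * afterwards, which is what makes the cache "exclusive". */
int toy_get(struct toy_cache *c, struct toy_handle h, unsigned char *page)
{
	struct toy_slot *s = toy_find(c, h);

	if (!s) {
		c->failed_gets++;
		return -1;
	}
	memcpy(page, s->data, TOY_PAGE_SIZE);
	c->succ_gets++;
	if (!c->shared)
		s->used = false;
	return 0;
}

/* Ensure the page is no longer present; counted even if it was absent. */
void toy_invalidate_page(struct toy_cache *c, struct toy_handle h)
{
	struct toy_slot *s = toy_find(c, h);

	c->invalidates++;
	if (s)
		s->used = false;
}
```

With a zero-initialized struct toy_cache, a put followed by two gets on a non-shared pool yields one successful and one failed get; setting shared to true lets both gets succeed until toy_invalidate_page() is called. A hit ratio can then be read off as succ_gets / (succ_gets + failed_gets).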
diff --git a/Documentation/vm/page-types.c b/Documentation/vm/page-types.c
index 7445caa26d05..0b13f02d4059 100644
--- a/Documentation/vm/page-types.c
+++ b/Documentation/vm/page-types.c
@@ -98,6 +98,7 @@ | |||
98 | #define KPF_HWPOISON 19 | 98 | #define KPF_HWPOISON 19 |
99 | #define KPF_NOPAGE 20 | 99 | #define KPF_NOPAGE 20 |
100 | #define KPF_KSM 21 | 100 | #define KPF_KSM 21 |
101 | #define KPF_THP 22 | ||
101 | 102 | ||
102 | /* [32-] kernel hacking assistances */ | 103 | /* [32-] kernel hacking assistances */ |
103 | #define KPF_RESERVED 32 | 104 | #define KPF_RESERVED 32 |
@@ -147,6 +148,7 @@ static const char *page_flag_names[] = { | |||
147 | [KPF_HWPOISON] = "X:hwpoison", | 148 | [KPF_HWPOISON] = "X:hwpoison", |
148 | [KPF_NOPAGE] = "n:nopage", | 149 | [KPF_NOPAGE] = "n:nopage", |
149 | [KPF_KSM] = "x:ksm", | 150 | [KPF_KSM] = "x:ksm", |
151 | [KPF_THP] = "t:thp", | ||
150 | 152 | ||
151 | [KPF_RESERVED] = "r:reserved", | 153 | [KPF_RESERVED] = "r:reserved", |
152 | [KPF_MLOCKED] = "m:mlocked", | 154 | [KPF_MLOCKED] = "m:mlocked", |
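To show how the new bit would be consumed, here is a minimal sketch of testing flag 22 in a 64-bit page-flags word, such as one entry read from /proc/kpageflags. The KPF_KSM and KPF_THP values and the "x:ksm" / "t:thp" strings come from the hunks above; the helper names kpf_is_set() and kpf_short_name() are hypothetical and are not part of page-types.c.

```c
#include <stdbool.h>
#include <stdint.h>

#define KPF_KSM 21
#define KPF_THP 22	/* bit added by this change */

/* Hypothetical helper: true if the given flag bit is set in a flags word. */
bool kpf_is_set(uint64_t flags, unsigned int bit)
{
	return ((flags >> bit) & 1u) != 0;
}

/* Hypothetical helper: short name for the two flags shown here,
 * using the same strings as the page_flag_names[] entries above. */
const char *kpf_short_name(unsigned int bit)
{
	switch (bit) {
	case KPF_KSM:
		return "x:ksm";
	case KPF_THP:
		return "t:thp";
	default:
		return "?";
	}
}
```

For example, kpf_is_set(flags, KPF_THP) is true for a flags word of (uint64_t)1 << 22 and false for one that has only bit 21 set.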
diff --git a/Documentation/vm/pagemap.txt b/Documentation/vm/pagemap.txt
index df09b9650a81..4600cbe3d6be 100644
--- a/Documentation/vm/pagemap.txt
+++ b/Documentation/vm/pagemap.txt
@@ -60,6 +60,7 @@ There are three components to pagemap: | |||
60 | 19. HWPOISON | 60 | 19. HWPOISON |
61 | 20. NOPAGE | 61 | 20. NOPAGE |
62 | 21. KSM | 62 | 21. KSM |
63 | 22. THP | ||
63 | 64 | ||
64 | Short descriptions to the page flags: | 65 | Short descriptions to the page flags: |
65 | 66 | ||
@@ -97,6 +98,9 @@ Short descriptions to the page flags: | |||
97 | 21. KSM | 98 | 21. KSM |
98 | identical memory pages dynamically shared between one or more processes | 99 | identical memory pages dynamically shared between one or more processes |
99 | 100 | ||
101 | 22. THP | ||
102 | contiguous pages which construct transparent hugepages | ||
103 | |||
100 | [IO related page flags] | 104 | [IO related page flags] |
101 | 1. ERROR IO error occurred | 105 | 1. ERROR IO error occurred |
102 | 3. UPTODATE page has up-to-date data | 106 | 3. UPTODATE page has up-to-date data |
diff --git a/Documentation/vm/unevictable-lru.txt b/Documentation/vm/unevictable-lru.txt
index 97bae3c576c2..fa206cccf89f 100644
--- a/Documentation/vm/unevictable-lru.txt
+++ b/Documentation/vm/unevictable-lru.txt
@@ -538,7 +538,7 @@ different reverse map mechanisms. | |||
538 | process because mlocked pages are migratable. However, for reclaim, if | 538 | process because mlocked pages are migratable. However, for reclaim, if |
539 | the page is mapped into a VM_LOCKED VMA, the scan stops. | 539 | the page is mapped into a VM_LOCKED VMA, the scan stops. |
540 | 540 | ||
541 | try_to_unmap_anon() attempts to acquire in read mode the mmap semphore of | 541 | try_to_unmap_anon() attempts to acquire in read mode the mmap semaphore of |
542 | the mm_struct to which the VMA belongs. If this is successful, it will | 542 | the mm_struct to which the VMA belongs. If this is successful, it will |
543 | mlock the page via mlock_vma_page() - we wouldn't have gotten to | 543 | mlock the page via mlock_vma_page() - we wouldn't have gotten to |
544 | try_to_unmap_anon() if the page were already mlocked - and will return | 544 | try_to_unmap_anon() if the page were already mlocked - and will return |
@@ -619,11 +619,11 @@ all PTEs from the page. For this purpose, the unevictable/mlock infrastructure | |||
619 | introduced a variant of try_to_unmap() called try_to_munlock(). | 619 | introduced a variant of try_to_unmap() called try_to_munlock(). |
620 | 620 | ||
621 | try_to_munlock() calls the same functions as try_to_unmap() for anonymous and | 621 | try_to_munlock() calls the same functions as try_to_unmap() for anonymous and |
622 | mapped file pages with an additional argument specifing unlock versus unmap | 622 | mapped file pages with an additional argument specifying unlock versus unmap |
623 | processing. Again, these functions walk the respective reverse maps looking | 623 | processing. Again, these functions walk the respective reverse maps looking |
624 | for VM_LOCKED VMAs. When such a VMA is found for anonymous pages and file | 624 | for VM_LOCKED VMAs. When such a VMA is found for anonymous pages and file |
625 | pages mapped in linear VMAs, as in the try_to_unmap() case, the functions | 625 | pages mapped in linear VMAs, as in the try_to_unmap() case, the functions |
626 | attempt to acquire the associated mmap semphore, mlock the page via | 626 | attempt to acquire the associated mmap semaphore, mlock the page via |
627 | mlock_vma_page() and return SWAP_MLOCK. This effectively undoes the | 627 | mlock_vma_page() and return SWAP_MLOCK. This effectively undoes the |
628 | pre-clearing of the page's PG_mlocked done by munlock_vma_page. | 628 | pre-clearing of the page's PG_mlocked done by munlock_vma_page. |
629 | 629 | ||
@@ -641,7 +641,7 @@ with it - the usual fallback position. | |||
641 | Note that try_to_munlock()'s reverse map walk must visit every VMA in a page's | 641 | Note that try_to_munlock()'s reverse map walk must visit every VMA in a page's |
642 | reverse map to determine that a page is NOT mapped into any VM_LOCKED VMA. | 642 | reverse map to determine that a page is NOT mapped into any VM_LOCKED VMA. |
643 | However, the scan can terminate when it encounters a VM_LOCKED VMA and can | 643 | However, the scan can terminate when it encounters a VM_LOCKED VMA and can |
644 | successfully acquire the VMA's mmap semphore for read and mlock the page. | 644 | successfully acquire the VMA's mmap semaphore for read and mlock the page. |
645 | Although try_to_munlock() might be called a great many times when munlocking a | 645 | Although try_to_munlock() might be called a great many times when munlocking a |
646 | large region or tearing down a large address space that has been mlocked via | 646 | large region or tearing down a large address space that has been mlocked via |
647 | mlockall(), overall this is a fairly rare event. | 647 | mlockall(), overall this is a fairly rare event. |